Multi-task Learning

Abstract: The common approach in machine learning trains and optimizes one task at a time. In the case of supervised algorithms, single-task training requires carefully labeled datasets so that algorithms can learn the relationship between inputs and outputs. In many practical settings, these labeled datasets are often too small, or not quite right for the task at hand, effectively prohibiting the use of advanced, data-hungry machine learning techniques.

In contrast, multi-task learning trains related tasks in parallel, often yielding models that learn shared structure across tasks and generalize better. It is a technique that can be applied to many algorithms and offers a way to make use of several distinct, small but related datasets. In this talk we provide an overview of multi-task learning and recipes for turning single-task algorithms into multi-task algorithms, including data science “workhorse” algorithms (e.g., decision trees, random forests). We also describe a multi-task neural network capable of learning task-agnostic representations, and show a related prototype.
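The core idea of training related tasks in parallel can be sketched as hard parameter sharing: a representation whose parameters are updated by every task's loss, plus one small head per task. The sketch below is a minimal, self-contained illustration of that idea (the layer sizes, learning rate, and synthetic data are invented for the example; it is not the network described in the talk):

```python
import numpy as np

# Minimal sketch of multi-task learning via hard parameter sharing.
# Two related regression tasks are trained jointly; both task losses
# backpropagate into the same shared weight matrix W.
rng = np.random.default_rng(0)

# Synthetic data: two tasks driven by the same underlying signal.
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y1 = X @ w_true + 0.1 * rng.normal(size=200)          # task 1
y2 = 2.0 * (X @ w_true) + 0.1 * rng.normal(size=200)  # task 2, related to task 1

# Shared linear representation plus one linear head per task.
W = rng.normal(size=(5, 3)) * 0.1   # shared parameters
h1 = rng.normal(size=3) * 0.1       # task-1 head
h2 = rng.normal(size=3) * 0.1       # task-2 head
lr = 0.01

def total_loss():
    Z = X @ W
    return np.mean((Z @ h1 - y1) ** 2) + np.mean((Z @ h2 - y2) ** 2)

before = total_loss()
for _ in range(500):
    Z = X @ W
    e1 = Z @ h1 - y1
    e2 = Z @ h2 - y2
    # The shared weights receive gradient signal from BOTH tasks,
    # which is what lets small related datasets help each other.
    grad_W = X.T @ (np.outer(e1, h1) + np.outer(e2, h2)) / len(X)
    W -= lr * grad_W
    h1 -= lr * (Z.T @ e1 / len(X))
    h2 -= lr * (Z.T @ e2 / len(X))
after = total_loss()
print(after < before)  # joint training reduces the combined loss
```

A single-task version of the same sketch would simply drop one head and one error term; the multi-task recipe is the addition of the second head and the summed gradient into the shared weights.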

Bio: Shioulin Sam is a research engineer at Cloudera Fast Forward Labs. In her previous life, she was an angel investor focusing on women-led start-ups. She also worked in the investment management industry designing quantitative trading strategies. She holds a Ph.D. in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology.

Open Data Science Conference