Programming Machine Learning with Weak Supervision


One of the key bottlenecks in building machine learning systems today is creating and managing large labeled training datasets. In this talk, I’ll describe our work on systems that support and accelerate weak supervision: creating training data in higher-level, programmatic, but noisier ways. This work is motivated by the observation that ML developers spend an increasing amount of their time on training data engineering (i.e., labeling, augmenting, reshaping, cleaning, and maintaining training datasets), and that we can better support these emerging workflows with both data management and statistical learning tools and principles. I’ll cover Snorkel, our open-source system for training data labeling, which can reduce training data creation time from months to days; other recent work on data augmentation and multi-task supervision; and applications of this work in domains ranging from medical imaging to unstructured data extraction.


Alex Ratner is a Ph.D. candidate in computer science at Stanford, advised by Chris Re. His research focuses on weak supervision: the idea of using higher-level, noisier input from domain experts to train complex state-of-the-art models when little or no hand-labeled training data is available. He leads the development of the Snorkel framework for weakly supervised ML, which has been applied to machine learning problems in domains like genomics, radiology, and political science. He is supported by a Stanford Bio-X SIGF fellowship.

Open Data Science




