End-to-end AI Application Development with Programmatic Supervision

Abstract: 

One of the major bottlenecks in the development and deployment of AI applications is the need for the massive labeled training datasets that drive modern ML approaches. These training datasets are traditionally labeled by hand at great expense in time and money, and often cannot practically be hand-labeled at all due to privacy, expertise, and/or rate-of-change requirements in real-world settings like finance, government, medicine, telecommunications, and more.

This talk will cover a range of *programmatic* (often called "weak supervision") approaches to building, labeling, augmenting, and structuring training datasets, as well as their broader effects on end-to-end ML and AI application development. Specifically, this talk will cover programmatic labeling techniques, such as the data programming and Snorkel approaches; data augmentation techniques, which add transformed copies of data to a dataset to increase model robustness; data structuring or “slicing” techniques for highlighting, monitoring, and enabling models to attend to critical and/or difficult subsets of the data; and other key techniques for training data management.
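To make the idea of programmatic labeling concrete, here is a minimal sketch in plain Python. It is not Snorkel's API: the labeling functions, the spam/ham task, and the names `majority_vote` and `label_dataset` are all hypothetical and chosen only for illustration. Data programming, as used in Snorkel, learns how accurate each labeling function is and weights its votes accordingly; the sketch below substitutes a simple unweighted majority vote for that step.

```python
from collections import Counter
from typing import Callable, List

# Label values (hypothetical task: spam detection). ABSTAIN means
# "this heuristic has no opinion on this example".
ABSTAIN, HAM, SPAM = -1, 0, 1


# Each labeling function encodes one noisy heuristic from a domain expert.
def lf_contains_link(text: str) -> int:
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_mentions_prize(text: str) -> int:
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_message(text: str) -> int:
    return HAM if len(text.split()) <= 4 else ABSTAIN


LABELING_FUNCTIONS: List[Callable[[str], int]] = [
    lf_contains_link,
    lf_mentions_prize,
    lf_short_message,
]


def majority_vote(votes: List[int]) -> int:
    """Combine votes by unweighted majority; abstain on ties or no votes."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting heuristics with equal support
    return ranked[0][0]


def label_dataset(texts: List[str]) -> List[int]:
    """Apply every labeling function to every example and combine the votes."""
    return [majority_vote([lf(t) for lf in LABELING_FUNCTIONS]) for t in texts]
```

Examples that receive a non-abstain label could then serve as (noisy) training data for a downstream model, while abstentions and ties point to places where another labeling function might help.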

More broadly, this talk will address how these new programmatic approaches lead to a whole new end-to-end ML/AI application development process. Using the example of Snorkel, a new platform for this process, we will cover these ideas and how they extend to model training, monitoring, and analysis. We will also cover the feedback loops that lead to actionable modification or extension of the programmatic supervision, resulting in a more iterative, error-analysis-driven development and deployment process for ML and AI applications overall.

Bio: 

Alex Ratner has a Ph.D. in computer science from Stanford, advised by Chris Re, where his research focuses on weak supervision: the idea of using higher-level, noisier input from domain experts to train complex state-of-the-art models where limited or no hand-labeled training data is available. He leads the development of the Snorkel framework (snorkel.stanford.edu) for weakly supervised ML, which has been applied to machine learning problems in domains like genomics, radiology, and political science. He is supported by a Stanford Bio-X SIGF fellowship.