Applying Interactive Weak Supervision to NLP Tasks


Weakly supervised approaches have gained popularity in the last two years, but there is still a significant amount of overhead in applying these methods to more complex NLP tasks. The performance of weakly supervised systems is contingent on both the quality and quantity of independent sources of weak signal: if a practitioner cannot come up with enough sources themselves, weak supervision is largely impractical.

To overcome this, we can use techniques that interactively generate candidate sources of weak supervision to guide the practitioner, making weak supervision practical for many tasks that would otherwise be difficult to support. In this tutorial, we'll first build a basic weakly supervised system for an NLP task, and then augment it with some of these generative techniques to speed up the iterative process.

Session Outline:

Lesson 1: Foundational Weak Supervision
Familiarize yourself with concepts from weak supervision by implementing your first weakly supervised system from scratch. Learn the primitives that go into any weakly supervised system, and build an intuition about some of the inner workings.
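
As a rough illustration of the primitives mentioned above, the sketch below shows labeling functions that either vote or abstain, a label matrix, and a majority-vote combiner. It is not the session's own material: the spam/ham task, the function names, and the heuristics are all hypothetical and chosen only to keep the example small.

```python
from collections import Counter

# Hypothetical label scheme for a toy spam-detection task.
ABSTAIN, HAM, SPAM = -1, 0, 1

# Each labeling function (LF) encodes one noisy heuristic and may abstain.
def lf_contains_link(text):
    return SPAM if "http" in text.lower() else ABSTAIN

def lf_mentions_prize(text):
    return SPAM if "prize" in text.lower() else ABSTAIN

def lf_short_message(text):
    return HAM if len(text.split()) <= 4 else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_message]

def apply_lfs(texts, lfs=LABELING_FUNCTIONS):
    """Build the label matrix: one row per text, one column per LF."""
    return [[lf(text) for lf in lfs] for text in texts]

def majority_vote(votes):
    """Combine one row of votes; abstain when no LF fires or on a tie."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    top = counts.most_common(2)
    if len(top) == 2 and top[0][1] == top[1][1]:
        return ABSTAIN
    return top[0][0]

texts = [
    "Claim your prize now at http://example.com",
    "see you at lunch",
    "The quarterly report is attached for your review today",
]
label_matrix = apply_lfs(texts)
weak_labels = [majority_vote(row) for row in label_matrix]
```

Majority vote treats every labeling function as equally reliable, which is the simplest possible combiner; a real system would usually weight sources by estimated accuracy.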

Lesson 2: Probabilistic Weak Label Evaluation
Evaluating weakly supervised labels is tricky, and using probabilistic labels in conventional model architectures is even trickier. Work through a simple implementation that should help you build an understanding of how to apply these techniques to more complex tasks.
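
One way to picture probabilistic labels is sketched below, for the binary case only. It assumes each source has a known accuracy and that sources are conditionally independent given the true label; in practice those accuracies would have to be estimated, and the function names here are hypothetical. The second function shows how a soft label can replace a hard one in a standard cross-entropy loss.

```python
import math

ABSTAIN = -1

def probabilistic_label(votes, accuracies, prior=0.5):
    """Return P(y = 1 | votes) for binary votes in {0, 1, ABSTAIN}.

    Assumes each source is right with probability `acc` regardless of the
    true class, and that sources are independent given the true class.
    """
    log_odds = math.log(prior / (1 - prior))
    for vote, acc in zip(votes, accuracies):
        if vote == ABSTAIN:
            continue  # an abstaining source contributes no evidence
        weight = math.log(acc / (1 - acc))
        log_odds += weight if vote == 1 else -weight
    return 1 / (1 + math.exp(-log_odds))

def soft_cross_entropy(p_weak, p_model, eps=1e-12):
    """Binary cross-entropy where the target is a probability, not 0 or 1."""
    p_model = min(max(p_model, eps), 1 - eps)  # guard against log(0)
    return -(p_weak * math.log(p_model) + (1 - p_weak) * math.log(1 - p_model))

# Two sources vote for class 1, a third abstains.
p = probabilistic_label([1, 1, ABSTAIN], accuracies=[0.8, 0.7, 0.6])
loss_close = soft_cross_entropy(p, 0.9)  # model prediction near the soft label
loss_far = soft_cross_entropy(p, 0.2)    # model prediction far from it
```

Because the target is a probability, the loss is smallest when the model's prediction matches the soft label rather than when it saturates at 0 or 1, which is part of what makes evaluation against weak labels less straightforward than against hard ones.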

Lesson 3: Weak Signal Generation
Complete the loop by programmatically generating some sources of weak supervision and measuring how overall label quality is affected. Learn about some potential generation techniques and their limitations.
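
The sketch below illustrates one possible generation technique, not necessarily the one covered in the session: proposing keyword-based labeling functions from a small hand-labeled seed set and ranking them by precision and support, so that a practitioner can review the candidates. The seed data, thresholds, and function names are hypothetical.

```python
from collections import defaultdict

def propose_keyword_lfs(seed_texts, seed_labels, min_support=2, min_precision=0.8):
    """Rank tokens that predict a single label well on the seed set.

    Returns (token, label, precision, support) tuples, best first.
    """
    stats = defaultdict(lambda: defaultdict(int))  # token -> label -> count
    for text, label in zip(seed_texts, seed_labels):
        for token in set(text.lower().split()):
            stats[token][label] += 1

    candidates = []
    for token, by_label in stats.items():
        support = sum(by_label.values())
        label, hits = max(by_label.items(), key=lambda kv: kv[1])
        precision = hits / support
        if support >= min_support and precision >= min_precision:
            candidates.append((token, label, precision, support))
    # Highest precision first, then highest support, then alphabetical.
    candidates.sort(key=lambda c: (-c[2], -c[3], c[0]))
    return candidates

def make_keyword_lf(token, label, abstain=-1):
    """Turn an accepted candidate into a labeling function."""
    def lf(text):
        return label if token in text.lower().split() else abstain
    return lf

seed_texts = [
    "win a free prize today",
    "free prize inside",
    "lunch at noon today",
    "meeting moved to noon",
]
seed_labels = [1, 1, 0, 0]
candidates = propose_keyword_lfs(seed_texts, seed_labels)
generated_lfs = [make_keyword_lf(tok, lab) for tok, lab, _, _ in candidates]
```

A limitation worth noting: candidates scored on a tiny seed set can look perfectly precise by chance, which is one reason to keep a human in the loop before accepting them.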


Shayan Mohanty is the CEO and Co-Founder of Watchful, a company that largely automates the process of creating labeled training data. He has spent over a decade leading data engineering teams at various companies, including Facebook, where he served as lead for the stream processing team responsible for processing 100% of the ads metrics data for all FB products. He is also a Guest Scientist at Los Alamos National Laboratory and has given talks on topics ranging from Automata Theory to Machine Teaching.

Open Data Science




