Best Practices for Data Annotation at Scale


The long-term success of machine learning relies on consistently labeled, high-quality data. While most machine learning initiatives begin in the lab, they take on a life of their own and can create significant challenges once they scale. ML data ops practitioners can find themselves consumed by the logistics of data annotation and management instead of focusing on the science.

Wherever you are in your team’s machine learning journey, you must think about evolving towards large-scale production. Proactively planning a data management strategy can generate progressively better results, but it requires thought and stakeholder buy-in. A key ingredient of this journey is your data labeling and annotation framework.

A data pipeline designed for human judgment and incremental training on edge cases provides that last mile of acceptability, enabling the machine learning solution to go to production. This session will reveal the implications of a live data loop in a production environment and how it significantly impacts the customer experience. Attendees will also take away trends and challenges in combining humans with the machine learning pipeline.

In this session, iMerit's Jai Natarajan reveals best practices for building scalable and repeatable data labeling pipelines that balance tools with humans in the loop. Through collaboration with peers, managers, and machine learning experts, data annotators refine their skills and master tasks well beyond the expertise of crowdsourcing. In a collaborative framework, annotators and ML experts negotiate and create meaning through an iterative feedback process as they identify new concepts and nuances in the data. Attendees will learn concepts such as designing to break the ML, edge-case knowledge management, and workflow management.


Jai Natarajan is the Vice President, Strategic Business Development at iMerit, a global AI data solutions company delivering high-quality data that powers machine learning and artificial intelligence applications for Fortune 500 companies. Bringing more than 24 years of experience, Jai works with more than 5500 data experts who label and enrich data at scale to help customers get better results from their machine learning algorithms. Jai works with iMerit’s partner ecosystem to develop iMerit’s solutions for its customers and provides strategic input to the company. Previously, Jai worked at Lucasfilm and Sony, and founded Xentrix, an Emmy-winning animation studio. He is a board member of the Anudip Foundation. Jai has an M.S. in Computer Science from UCLA and undergraduate degrees from the Birla Institute of Technology and Science.

Open Data Science