How to Get Quality Data Labels

Crowdsourcing training data for AI provides unprecedented capacity for gathering human-generated data labels. However, crowdsourced data can vary significantly in quality. Differences in worker skill, training, and experience affect quality, as do the complexity of the labeling user experience (UX) and natural human error or fatigue. Bad actors can also pollute collected data with fraudulent results. Each of these variations is a potential source of error for a trained model. Data scientists should understand how the way they design and constrain their labeling tasks can affect the quality of their data. This presentation focuses on getting high-quality results from a crowd, with specific insights into how improving the worker experience can yield better labels while also benefiting the workers.


Dr. Cheryl Martin joined Alegion in 2018. She previously directed the Center for Content Understanding and the Cyber Information Assurance and Decision Support Group at the Applied Research Laboratories, The University of Texas at Austin (ARL:UT). Dr. Martin's areas of expertise include distributed artificial intelligence, machine learning, rule-based systems, and dynamically adaptive software. Through her work at ARL:UT, she has applied data mining, detection, and inference techniques to information assurance problems and cyber security challenges such as intrusion detection and insider threat detection. Her work in combining semantic knowledge models, natural language processing, expert systems, and machine learning to categorize and label text has been successful in automatically determining whether documents contain sensitive information that must be protected with respect to classification decisions and review and release.

Open Data Science



