
Abstract: Advances in conversational AI depend on increasingly sophisticated models that can handle more complex and context-dependent linguistic structures. To support these advances, high-quality, high-volume labeled ground-truth data remains as essential as ever for training and validating dialogue models. Developing a successful data annotation pipeline depends on an effective partnership with an annotation team. This talk draws on iMerit's experiences as a natural language annotation partner, laying out our approach to five key processes in the annotation pipeline: (i) exchange of expertise, (ii) annotator training, (iii) workflow design, (iv) feedback cycle, and (v) quality evaluation. We discuss how to set priorities, question assumptions, align requirements with priorities, and evaluate outcomes within each of these processes in order to build a robust annotation pipeline at scale.
Bio: Dr. Teresa O’Neill is a Solutions Architect at iMerit specializing in language annotation services. Before joining iMerit, she worked for a decade in academia as an educator and researcher. At iMerit, she leverages her experience as a linguist with both theoretical and applied specializations to build custom human-in-the-loop annotation pipelines for customers with NLP/NLU use cases.

Teresa O'Neill, PhD
Solutions Architect | iMerit
