An Annotation Pipeline for Domain-Specific Dialogue Systems

Abstract: Task-oriented dialogue systems require large volumes of data with both breadth and depth to perform effectively. Although increasingly powerful unsupervised methods in NLP have led many in the industry to shy away from gathering human-labeled data, high-quality labeled data remains essential to boosting the performance of dialogue systems, particularly in specialized domains like medicine and finance, and for complex linguistic structures like long-distance anaphora and dialogue states.

Obtaining high-quality labeled data to train and validate dialogue models for specialized contexts poses two perennial challenges: quality and scalability. This talk presents a human annotation workflow at iMerit that addresses both concerns while also yielding additional insights to further enrich and refine dialogue systems. It describes how iMerit’s iterative training, annotation, and evaluation process achieves quality and scale at low cost through the collaboration of subject-matter experts and skilled labelers.

Bio: Dr. Teresa O’Neill is a Solutions Architect at iMerit specializing in language annotation services. Before joining iMerit, she worked for a decade in academia as an educator and researcher. At iMerit, she leverages her experience as a linguist with both theoretical and applied specializations to build custom human-in-the-loop annotation pipelines for customers with NLP/NLU and e-commerce use cases.