Training Conversational Agents on Noisy Data


Developing conversational agents that can interact with people in the real world presents many difficult challenges. First, natural language understanding and context perception must be performed on continuous sensor data that is often highly noisy. Second, conversational agents must respond appropriately to natural human behavior, which spans a vast action space, so data-efficient collection strategies are needed to acquire large amounts of training data. In this talk, I will address both sides of the challenge: (1) using data-efficient strategies during the utterance collection and annotation phases to optimize the trade-off between cost and quality when collecting training data, and (2) using data-driven approaches to train a conversational agent and generate its behaviors despite noisy data and a lack of training labels. Finally, I will discuss field studies and past research on a conversational humanoid robot that autonomously generates socially appropriate locomotion and speech behaviors, and that was deployed in the real world to provide guidance and services to people.


Phoebe Liu is a senior data scientist with expertise in robotics and conversational AI. Previously, she was a robotics researcher at the Hiroshi Ishiguro Laboratory of the Advanced Telecommunications Research Institute International (ATR) in Japan, while concurrently earning her PhD from Osaka University. She has more than six years of research experience and three years of industry experience. Her research interests include embodied agents, dialogue systems, multimodal data annotation and behavior generation, topic modeling, learning from demonstration, style transfer, explainable AI, proactiveness, personalization, and artificial curiosity, and she has published many first-author papers in robotics conferences and journals.