Abstract: "Most companies in industry collect and leverage text data for some part of their business operations. Some have text data at the core of their platform, others utilize it behind the scenes, triaging and responding to support requests and customer feedback.
Clustering and classification should always be in the toolkit as the first techniques to try when dealing with this kind of data. They are among the easiest to scale in production, and their ease of use can quickly help businesses address a set of applied problems:
- How do you automatically make the distinction between different categories of sentences?
- How can you find sentences in a dataset that are most similar to a given one?
- How can you extract a rich and concise representation that can then be used for a range of other tasks?
- Most importantly, how do you quickly find out whether these tasks are possible on your dataset at all?
We will focus on a set of solutions, starting from the simplest effective models, before moving on to more subtle solutions.
We will then explore a few methods for interpreting, understanding, and explaining models, to make sure they are capturing relevant information and not noise.
By the end of this session, you will be able to look at your own text data, and quickly prototype ways to understand and leverage it."
Bio: Emmanuel has a profile at the intersection of Artificial Intelligence and Business, having earned an MSc in Artificial Intelligence, an MSc in Computer Engineering, and an MSc in Management from three of France’s top schools. Recently, Emmanuel has worked on implementing and scaling out predictive analytics and machine learning solutions for Local Motion and Zipcar. He is currently an AI Program Director and Machine Learning Engineer at Insight, where he has led dozens of AI products from ideation to polished implementation.
AI Program Director and Machine Learning Engineer at Insight Data Science