Abstract: Advances in Natural Language Processing (NLP) over the last year have been fundamentally changing the way we work with text-based data. We have seen the release of Google’s BERT and OpenAI’s GPT-2, and Hugging Face’s Transformers library has democratized Natural Language Understanding (NLU) and Natural Language Generation (NLG). While these libraries provide access to powerful capabilities, it can be challenging for many users to figure out how to get started and apply them to their own datasets.
To address this challenge, Novetta has developed an intuitive guide structured as an NLP task framework that drastically lowers the barrier to entry for developers to use these advanced capabilities. This high-level guide enables users to fine-tune open pre-trained language models for text classification, question answering, entity extraction, and part-of-speech tagging. The sequence of tutorials will provide quick and easy access to a wide variety of embedding schemes for downstream use, such as in recommendation systems. Each of these tasks can then be stood up quickly as a service for easy integration into existing workflows and applications.
In this hands-on workshop we will use Flair and Transformers to:
- Provide an overview of recent advances in NLP
- Install Flair and Transformers
- Demonstrate our approach to initially set up and build these advanced NLP capabilities
- Train a simple text classifier
- Train a simple Question Answering (QA) model
- Combine QA and NER to develop a custom application to automatically build visual timelines on documents by asking predefined questions on dates, places and people found in the document
- Leverage Kubernetes to deploy the application as a scalable service in the cloud
Bio: Andrew Chang is an Applied Machine Learning Researcher in Novetta’s Machine Learning (ML) Center of Excellence. Andrew is a graduate of Carnegie Mellon University who focuses on researching state-of-the-art machine learning models and rapidly prototyping ML technologies and solutions across the scope of customer problems. He has an interest in open source projects and research in natural language processing, geometric deep learning, reinforcement learning, and computer vision. Andrew is the author and creator of NovettaNLP.