State-of-the-art Text Classification with ULMFiT


Transfer learning has revolutionized image analysis tasks such as object detection and recognition by letting users start from high-quality pretrained deep learning models and fine-tune them for their particular use case. However, transfer learning has not yet been widely applied to tasks outside computer vision. The Universal Language Model Fine-Tuning for Text Classification (ULMFiT) approach has shown that the same method succeeds in NLP, improving on the previous NLP state of the art by 20% or more.

ULMFiT takes the paradigm used for image tasks and applies it to text. Analysts start with a language model trained on a large general corpus, such as Wikipedia, from which the model develops a basic understanding of how language works. They then fine-tune that language model on their own corpus before training the text classifier. A language model can be trained for and applied to any language.

In this session, Matt will give an overview of ULMFiT and then walk through a use case in which he used Amazon SageMaker to train ULMFiT on hand-labelled quotes sourced from thousands of news articles. After demonstrating the success of this method, he will discuss a novel approach that improves on ULMFiT by up to 10% by incorporating article-level metadata such as publication name, author name, and speaker. The session will end with a short discussion of how the model has been deployed to increase the efficiency of the analysts who manually tag quotes.
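
The abstract does not say how the article-level metadata is fed to the model. One plausible way, shown here only as a minimal sketch, is to prepend each metadata field to the quote as marked tokens so that an unmodified text classifier sees it as part of the input. The `Quote` class, the `with_metadata` helper, and the marker tokens (`xxpub`, `xxauth`, `xxspk`, `xxtext`) are all hypothetical names chosen for illustration, not the speaker's implementation.

```python
from dataclasses import dataclass

# Hypothetical marker tokens, one per metadata field (illustrative names only).
FIELD_TOKENS = {"publication": "xxpub", "author": "xxauth", "speaker": "xxspk"}


@dataclass
class Quote:
    """A quote plus the article-level metadata named in the abstract."""
    text: str
    publication: str = ""
    author: str = ""
    speaker: str = ""


def with_metadata(quote: Quote) -> str:
    """Return the quote text prefixed with one marked token per non-empty field."""
    parts = []
    for field, token in FIELD_TOKENS.items():
        value = getattr(quote, field).strip()
        if value:
            # Lowercase and join words with underscores so that a
            # whitespace-based split keeps each value as a single token.
            parts.append(f"{token} {'_'.join(value.lower().split())}")
    # Mark where the quote itself begins.
    parts.append(f"xxtext {quote.text.strip()}")
    return " ".join(parts)


example = Quote(
    text="We expect growth next year.",
    publication="Example Times",
    speaker="Jane Doe",
)
print(with_metadata(example))
```

Under this assumption, the augmented strings would replace the raw quote text in both the language model fine-tuning step and the classifier training step, so the model can learn associations between a source or speaker and a label.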


Matt Teschke is an applied machine learning researcher who has demonstrated experience developing solutions to complex problems for government and commercial customers. As leader of Novetta’s Machine Learning Center of Excellence, Matt directs ML research in NLP, object detection, face recognition, entity resolution, and supervised learning. He is interested in applying open source and cloud technologies to develop practical and innovative solutions for customers.

Open Data Science




