Transfer Learning in NLP


Transfer learning enables leveraging knowledge acquired from related data to improve performance on a target task. Advances in deep learning, together with large labeled datasets such as ImageNet, have made high-performing pre-trained computer vision models possible. In computer vision, transfer learning, in particular fine-tuning a pre-trained model on a target task, has long been far more common than training from scratch.

In NLP, however, due to the lack of models pre-trained on large corpora, the most common transfer learning technique had been using pre-trained word representations. These word embeddings serve as the first layer of the model on the new dataset, and the rest of the model still requires training from scratch with large amounts of labeled data to achieve good performance. Since 2018, thanks to the many large pre-trained language models that have become available (ULMFiT, OpenAI GPT, the BERT family, etc.), transfer learning has become a new paradigm in NLP, producing new state-of-the-art results on many NLP tasks.
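As a minimal sketch of the "embeddings as the first layer" approach: below, a small hand-made table of vectors stands in for real pre-trained embeddings (in practice these would be loaded from GloVe or word2vec files), and a lookup function plays the role of the frozen first layer of a downstream model.

```python
import numpy as np

# Hypothetical pre-trained word vectors; real pipelines load these
# from files such as GloVe or word2vec (values here are illustrative).
pretrained = {
    "the": np.array([0.1, 0.2, 0.3]),
    "cat": np.array([0.4, 0.5, 0.6]),
    "sat": np.array([0.7, 0.8, 0.9]),
}
vocab = {word: i for i, word in enumerate(pretrained)}
dim = 3

# Copy the pre-trained vectors into an embedding matrix, which would
# initialize (and often stay frozen as) the model's first layer.
embedding_matrix = np.zeros((len(vocab), dim))
for word, idx in vocab.items():
    embedding_matrix[idx] = pretrained[word]

def embed(tokens):
    """First layer: map tokens to their pre-trained vectors."""
    ids = [vocab[t] for t in tokens]
    return embedding_matrix[ids]

features = embed(["the", "cat", "sat"])
print(features.shape)  # (3, 3)
```

Everything above the embedding lookup is reused; everything after it (the task-specific layers) is what still has to be trained from scratch, which is why this style of transfer helps less than fine-tuning a fully pre-trained model.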

In this session we'll cover the different types of transfer learning, the architectures of these pre-trained language models, and how different transfer learning techniques can be used to solve various NLP tasks. In addition, we'll show a variety of data augmentation techniques and how, using transfer learning, you can achieve state-of-the-art results on a tiny dataset of 30 examples.
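One of the simplest data augmentation techniques for text is synonym replacement: generate extra training examples by swapping words for synonyms. The sketch below uses a hand-written synonym table purely for illustration; real pipelines draw synonyms from WordNet, embedding neighbors, or back-translation.

```python
import random

# Toy synonym table; entries here are illustrative assumptions,
# not from any particular lexical resource.
SYNONYMS = {
    "quick": ["fast", "speedy"],
    "happy": ["glad", "joyful"],
}

def synonym_replace(sentence, rng=random):
    """Augment a sentence by replacing each word that has
    known synonyms with a randomly chosen one."""
    out = []
    for word in sentence.split():
        out.append(rng.choice(SYNONYMS[word]) if word in SYNONYMS else word)
    return " ".join(out)

random.seed(0)
print(synonym_replace("the quick fox is happy"))
```

On very small datasets (like the 30-example setting mentioned above), augmentation like this multiplies the effective training data, which compounds well with fine-tuning a pre-trained model.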


Joan Xiao is a Principal Data Scientist at Linc Global. In her role, she leads research innovation and applies novel technologies to a broad range of real-world problems. Previously she led the data science team at H5, a leading data search and analytics service company in the e-discovery industry. Prior to that, she led a Big Data Analytics team at HP.
Joan received her Ph.D. in Mathematics and M.S. in Computer Science from the University of Pennsylvania.