Advanced NLP: From Essentials to Deep Transfer Learning


Specializing in domains such as computer vision and natural language processing (NLP) is no longer a luxury but a necessity expected of any data scientist in today’s fast-paced world. With a hands-on, interactive approach, we will cover essential concepts in NLP and work through extensive examples to master state-of-the-art tools, techniques and methodologies for applying NLP to real-world problems. We will leverage machine learning, deep learning and deep transfer learning to solve popular NLP tasks including NER, Classification, Recommendation / Information Retrieval, Summarization, Language Translation, Q&A and Topic Models.

Session Outline
Module 1: NLP Essentials
We start with the basics of how to process and work with text data and strings. We then look at the essential components of an NLP pipeline and get started on some of its key components, including POS tagging, Named Entity Recognition, Spelling Correction and Text Pre-processing. We will look at traditional approaches as well as newer deep transfer learning based approaches for a few of these components.
Key Focus Areas: Text Pre-processing, NER, POS Tagging, Spelling Correction
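
As a rough illustration of the text pre-processing step, here is a minimal sketch using only the Python standard library. The `preprocess` function and the tiny `STOPWORDS` set are hypothetical names made up for this example; they are not taken from the session material, and a real pipeline would likely rely on an NLP library instead.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def preprocess(text):
    """Normalize raw text into a list of lowercase tokens."""
    # Decompose accented characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Case-fold so that "The" and "the" are treated alike.
    text = text.lower()
    # Keep runs of letters and digits; punctuation is discarded.
    tokens = re.findall(r"[a-z0-9]+", text)
    # Remove very common words that carry little content.
    return [t for t in tokens if t not in STOPWORDS]


tokens = preprocess("The Café is in the heart of New York!")
print(tokens)
```

Steps such as POS tagging, NER and spelling correction would normally run on the output of a stage like this one.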

Module 2: Text Representation
Text can't be consumed directly by downstream machine learning and deep learning models, since at heart they are mathematical models that operate on numbers. This module covers both traditional statistics-based methodologies (bag of words, n-grams) and newer representation-learning methodologies that use deep learning to represent text data (word embeddings, universal embeddings and contextual embeddings).
Key Focus Areas: Count-based Representations (Bag of Words, N-grams, TF-IDF), Similarity, Topics, Word Embeddings (Word2Vec, GloVe, FastText), Universal Embeddings, Contextual Embeddings (Transformers)
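
To make the count-based representations concrete, the sketch below builds bag-of-words and TF-IDF vectors by hand for a toy corpus. The corpus and the helper names (`bag_of_words`, `idf`, `tfidf`) are assumptions for illustration. The IDF used here is the plain unsmoothed form log(N / df), whereas libraries typically apply smoothed variants.

```python
import math
from collections import Counter

# Hypothetical toy corpus.
docs = [
    "the sky is blue",
    "the sun is bright",
    "the sun in the sky is bright",
]

tokenized = [d.split() for d in docs]
# Fixed, sorted vocabulary so every vector has the same layout.
vocab = sorted({t for doc in tokenized for t in doc})


def bag_of_words(tokens):
    """Raw term counts, one entry per vocabulary term."""
    counts = Counter(tokens)
    return [counts[t] for t in vocab]


def idf(term):
    """Unsmoothed inverse document frequency: log(N / df)."""
    df = sum(1 for doc in tokenized if term in doc)
    return math.log(len(tokenized) / df)


def tfidf(tokens):
    """Relative term frequency weighted by IDF."""
    counts = Counter(tokens)
    return [counts[t] / len(tokens) * idf(t) for t in vocab]


bow = [bag_of_words(d) for d in tokenized]
weights = [tfidf(d) for d in tokenized]

print(vocab)
print(bow[0])
print(weights[0])
```

Note that a term appearing in every document, such as "the", gets a TF-IDF weight of zero, while a term unique to one document, such as "blue", is weighted most heavily.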

Module 3: NLP Applications (Machine Learning / Deep Learning)
We will look at several popular applications of NLP in this module and go through hands-on examples. These include movie recommendation systems using similarity, topic modeling on research papers, text document summarization, language translation, text classification and sentiment analysis.
Key Focus Areas: Topic Models, Similarity / Information Retrieval, Summarization (TextRank / Transformers), Language Translation (seq2seq / attention), Classification (machine learning & deep learning models)
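
A similarity-based recommender of the kind mentioned above can be sketched as follows: each description becomes a term-count vector, and other titles are ranked by cosine similarity to the query title. The movie titles, descriptions and function names are all hypothetical, and raw counts are used only to keep the sketch short.

```python
import math
from collections import Counter

# Hypothetical toy catalogue: title -> short description.
movies = {
    "Space Quest": "astronauts explore a distant planet in space",
    "Galaxy Wars": "rebels fight an empire in space battles",
    "Love in Paris": "two strangers fall in love in paris",
}


def vectorize(text):
    """Represent a description as a sparse term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(title, top_n=2):
    """Rank all other titles by similarity to the given one."""
    query = vectorize(movies[title])
    scores = [
        (other, cosine(query, vectorize(desc)))
        for other, desc in movies.items()
        if other != title
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


for name, score in recommend("Space Quest"):
    print(f"{name}: {score:.3f}")
```

With raw counts, a common word such as "in" contributes to the score; weighting the vectors with TF-IDF or removing stopwords would reduce that effect.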

Module 4: NLP Applications with Deep Transfer Learning
Finally, we dive into some of the most significant NLP advances of the last few years, made possible by deep transfer learning. We will build a deep conceptual understanding of the transformer architecture and work through hands-on examples of text classification and multi-task NLP using transformers, solving NER, Q&A, sentiment analysis, summarization and translation with effective constructs like the transformers pipeline.
Key Focus Areas: Text Classification (with pre-trained embeddings, universal sentence encoders and transformers), Multi-task NLP with transformer pipelines (sentiment analysis, NER, text generation, summarization, question-answering, translation), Fine-tuning / training transformers (tips / guidelines)

Background Knowledge
Skills: Basic understanding of Machine Learning and Deep Learning (though we will cover some essentials)
Tools / Languages: Python, TensorFlow / Keras / PyTorch, scikit-learn (Basics)


Anuj is a seasoned Machine Learning leader who has incubated and led multiple ML teams. During his career, he has worked in academia, early-stage startups and Fortune 100 companies. He has successfully led efforts to build commercially viable products across a wide spectrum of verticals and functions. He has authored over a dozen research papers and patents.

Anuj is a frequent speaker at international forums. He has delivered technical talks and bootcamps and participated in panel discussions at prestigious forums such as ODSC, Anthill inside, PyData DC, Fifth Elephant, ICDCN and PODC. He was co-editor of “Anthill inside 2018”. He is a well-known name in the Machine Learning ecosystem.