From Document Vector-Models to Word-Embeddings – Natural Language Processing and Deep Learning

Abstract: Deep learning models, especially convolutional neural networks (CNNs), are particularly effective in many natural language processing (NLP) tasks. We begin this session with the classical document vector model coupled with Naïve Bayes classifiers to introduce basic NLP concepts and to create a performance baseline against which to evaluate our deep learning models. We then discuss word-embeddings as a more semantic way of representing documents; Google’s word2vec and Stanford’s GloVe are prime examples. Convolutional neural networks can easily integrate this semantic information into their network architecture, providing substantial performance increases over classical approaches. However, statistically significant performance increases over the traditional Naïve Bayes approach can be achieved by introducing a trainable embedding layer into the convolutional neural network, which frees the network from the fixed interpretations in word2vec and GloVe so that it can discover its own embeddings relevant to the task at hand. “Fake News” detection is our running example throughout this session.
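
To make the baseline concrete, the document vector model with a Naïve Bayes classifier might be sketched as follows. This is a minimal illustration and not the session's actual material: it assumes a whitespace tokenizer, raw word counts as the document vector, add-one smoothing, and a handful of invented toy headlines; the function names `tokenize`, `train_naive_bayes`, and `predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


def train_naive_bayes(docs, labels):
    """Fit a multinomial Naive Bayes model on bag-of-words counts."""
    class_doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "priors": {c: n / len(labels) for c, n in class_doc_counts.items()},
        "word_counts": word_counts,
        "totals": {c: sum(word_counts[c].values()) for c in class_doc_counts},
        "vocab": vocab,
    }


def predict(model, text):
    """Return the label with the highest log-posterior for the text."""
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, prior in model["priors"].items():
        score = math.log(prior)
        for token in tokenize(text):
            if token not in model["vocab"]:
                continue  # ignore words never seen in training
            count = model["word_counts"][label][token]
            # Laplace (add-one) smoothing avoids log(0) for unseen pairs
            score += math.log((count + 1) / (model["totals"][label] + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy headlines, for illustration only (not real data)
train_docs = [
    "aliens secretly control the government",
    "miracle cure doctors hate revealed",
    "city council approves new budget",
    "central bank holds interest rates steady",
]
train_labels = ["fake", "fake", "real", "real"]

model = train_naive_bayes(train_docs, train_labels)
```

Calling `predict(model, "miracle cure revealed")` classifies a new headline using only word counts. Because each word is treated as an independent symbol, the model has no notion that two different words may be related in meaning, which is the gap that word-embeddings are meant to address.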

Bio: Lutz is a veteran software professional, computer scientist, and author with more than 30 years of experience. Research interests include machine learning, data mining, and Big Data as well as computational logic and programming language semantics and implementation. Lutz has 50+ publications in the areas of programming languages, machine learning, and data mining, including a book on machine learning using support vector machines (www.wiley.com/WileyCDA/WileyTitle/productCd-0470371927.html).

As a software professional in the private sector, Lutz has assembled and led teams from project inception to successful release in many capacities, including project lead and VP of development. Most notably, Lutz led a team at Thinking Machines Corporation implementing high-performance parallel programming languages for their custom hardware.

In his spare time Lutz advocates for animal welfare and is involved with horse rescue. He also plays, builds, and restores string instruments. You can follow his instrument building adventures under his Icelandic alter-ego Hilmar Hansson here: http://hilmarhanssoninstruments.blogspot.com

Lutz has a PhD in computer science from the University of Oxford, UK, a master's degree in CS from the University of New Hampshire, and a Bachelor of Science in electrical engineering from the University of Rhode Island.