Deep Learning for Natural Language Processing Using Apache Spark and TensorFlow

Abstract: When interacting with customers, the ability to extract relevant information from communications in real time is critical for success. This presentation will illustrate how Salesforce uses Apache Spark and TensorFlow to monitor customer activities in real time and surface insights. Long Short-Term Memory (LSTM) networks have proven effective at achieving state-of-the-art results on a variety of Natural Language Processing (NLP) tasks. When coupled with word embedding models, they naturally capture the temporal information and semantic meaning of human language.

LSTM networks can be readily built using any of today’s deep learning packages. However, most popular deep learning packages use Python as their native language, which makes productizing such technology a real challenge, because production environments often rely on other technology stacks.

In this talk, we will explain how to build an LSTM classifier using the TensorFlow framework and how to combine the deep learning apparatus of TensorFlow with the distributed data processing power of Spark. We will discuss how to reuse existing Scala data preparation libraries in a TensorFlow training pipeline and unify them in a single notebook, and we will cover strategies for scoring at runtime. We will show that pre-trained Word2Vec embeddings reduce the demand for large volumes of labeled data. The end result is a fast and accurate machine learning model for text classification that can be integrated into a structured streaming production environment.
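
To make the idea of an LSTM classifier over pre-trained embeddings concrete, here is a minimal sketch of the forward pass in plain Python. It is not the implementation presented in the talk: the class name `TinyLSTMClassifier`, the toy three-dimensional vectors in `TOY_EMBEDDINGS` (a stand-in for real Word2Vec vectors), and the random, untrained weights are all assumptions made for illustration. In practice a framework such as TensorFlow would supply the LSTM layer and the training loop.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(weights, values):
    return sum(w * v for w, v in zip(weights, values))


class TinyLSTMClassifier:
    """Illustrative single-layer LSTM with a logistic output (untrained)."""

    def __init__(self, embeddings, hidden_size=4, seed=0):
        rng = random.Random(seed)
        # Pre-trained vectors are only looked up, never updated (frozen).
        self.embeddings = embeddings
        self.hidden_size = hidden_size
        embed_dim = len(next(iter(embeddings.values())))
        in_dim = embed_dim + hidden_size
        # One weight matrix and bias per gate:
        # i = input, f = forget, o = output, g = candidate cell state.
        self.W = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                   for _ in range(hidden_size)]
            for gate in "ifog"
        }
        self.b = {gate: [0.0] * hidden_size for gate in "ifog"}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
        self.b_out = 0.0

    def _gate(self, name, z, activation):
        return [activation(dot(row, z) + bias)
                for row, bias in zip(self.W[name], self.b[name])]

    def step(self, x, h, c):
        # Concatenate the current word vector with the previous hidden state.
        z = list(x) + h
        i = self._gate("i", z, sigmoid)
        f = self._gate("f", z, sigmoid)
        o = self._gate("o", z, sigmoid)
        g = self._gate("g", z, math.tanh)
        # The cell state carries information forward across time steps.
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def predict_proba(self, tokens):
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for token in tokens:
            x = self.embeddings.get(token)
            if x is None:
                continue  # skip out-of-vocabulary words
            h, c = self.step(x, h, c)
        # The final hidden state summarizes the sequence for classification.
        return sigmoid(dot(self.w_out, h) + self.b_out)


# Hypothetical toy vectors standing in for pre-trained Word2Vec embeddings.
TOY_EMBEDDINGS = {
    "schedule": [0.9, 0.1, 0.0],
    "meeting": [0.8, 0.3, 0.1],
    "cancel": [-0.7, 0.2, 0.5],
    "invoice": [0.0, -0.6, 0.9],
}

model = TinyLSTMClassifier(TOY_EMBEDDINGS, hidden_size=4, seed=0)
print(model.predict_proba(["schedule", "meeting"]))
print(model.predict_proba(["meeting", "schedule"]))
```

Because the hidden and cell states are updated word by word, the two calls at the end generally yield different probabilities for the same words in a different order, which is the sense in which an LSTM captures temporal information. Keeping the embedding table frozen means only the gate and output weights would need to be learned from labeled examples.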

Bio: Alexis Roos is director of data science and machine learning at Salesforce, where he leads a team of data scientists and engineers delivering intelligent services for the Einstein platform. Alexis has over 20 years of engineering and management experience, with the last 6 years focused on large-scale (10s of TBs of data and billions of records) data science and engineering, including data analytics, entity resolution, distributed graph processing, machine learning, natural language processing and deep learning. Alexis started coding as a teen, became an avid 68000 programmer, and pursued an MS in CS & Cognitive Sciences when AI was about expert systems. He is constantly learning about and experimenting with AI.

Open Data Science Conference