State of the Art Natural Language Processing at Scale


Natural language processing is a key component of many data science systems that must understand or reason about text. Common use cases include question answering, information extraction, sentiment analysis, natural language BI, language modeling, and disambiguation. Building such systems usually requires combining three types of software libraries: NLP annotation frameworks, machine learning frameworks, and deep learning frameworks.

David Talby presents the open-source Spark NLP package for training custom, distributed, machine-learned natural language pipelines on Apache Spark. The library natively extends Spark ML and includes state-of-the-art deep learning models, language models, and 30+ pre-trained NLP models. The talk walks through the library's goals, design, and APIs using Jupyter notebooks that will be made publicly available after the talk. It also covers best practices and industry use cases where the library has been applied.


David Talby is chief technology officer at Pacific AI, helping fast-growing companies apply big data and data science techniques to solve real-world problems in healthcare, life science, and related fields. David has extensive experience in building and operating web-scale data science and business platforms, as well as building world-class, Agile, distributed teams. Previously, he was with Microsoft’s Bing Group, where he led business operations for Bing Shopping in the US and Europe. He also worked at Amazon in both Seattle and the UK, where he built and ran distributed teams that helped scale Amazon’s financial systems. David holds a Ph.D. in computer science and master’s degrees in both computer science and business administration.

Open Data Science




