
Abstract: Spark NLP is a Natural Language Processing (NLP) library built on top of Apache Spark ML. It provides simple, performant, and accurate NLP annotations for machine learning pipelines that scale easily in a distributed environment. Spark NLP comes with 1100+ pretrained pipelines and models in more than 192 languages. It supports nearly all NLP tasks, with modules that can be used seamlessly in a cluster. Downloaded more than 2.7 million times and with 9x growth since January 2020, Spark NLP is used by 54% of healthcare organizations and is the world’s most widely used NLP library in the enterprise.
In this talk, the speaker will go over the basic components and building blocks of Spark NLP and cover some end-to-end NLP pipelines, including state-of-the-art text classification and named entity recognition models.
Bio: Veysel Kocaman is a Senior Data Scientist and ML Engineer at John Snow Labs with a decade of industry experience. He is also pursuing a PhD in computer science and lecturing at Leiden University (NL), and he holds an MS degree in Operations Research from Penn State University. He is affiliated with Google as a Developer Expert in Machine Learning.

Veysel Kocaman, PhD
Sr. Data Scientist | John Snow Labs
