Building an ML Serving Platform at Scale for Natural Language Processing


NLP-driven AI is a fast-growing technology domain with diverse applications in customer engagement, employee collaboration, marketing, and social media. Does an accurate model by itself mean that we have a successful product, service, or solution? No! The most difficult phase begins after that. We still have the following challenges to solve: How do we create a sellable service around this model? How do we scale this service to handle millions of inference requests in a cost-effective manner? How do we build automated deployment pipelines for software and models? How do we add security, privacy, manageability, and observability to the service? How do we track model drift and analyze model performance? In this session, we will discuss these questions and explore answers. We will walk through the unique challenges that NLP serving poses and the solutions and best practices to overcome them.

Session Outline
1. Challenges in Serving NLP at Scale: Discuss the various use cases for NLP serving. Walk through the unique challenges that NLP poses and how they impact serving performance and scalability.
2. Building Blocks for NLP Serving: Discuss the various building blocks for NLP serving, including APIs, streaming data, synchronous and asynchronous inference models, and data archival and querying.
3. Creating an architecture for Serving: Assemble the building blocks to create an end-to-end architecture. Discuss possible bottlenecks and resolutions.
4. Additional Considerations: Discuss additional considerations like security, privacy, manageability, and observability of the serving platform.
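As a taste of the building blocks in item 2, here is a minimal sketch of the difference between synchronous and asynchronous inference. The `predict` function is a hypothetical stand-in for a real NLP model; in the synchronous style the caller blocks on each request, while in the asynchronous style requests are queued and drained by a worker so callers do not block one another.

```python
import asyncio

# Hypothetical model stub: a real platform would wrap an actual NLP model here.
def predict(text: str) -> str:
    return "positive" if "good" in text else "negative"

# Synchronous inference: the caller blocks until the result is ready.
def infer_sync(text: str) -> str:
    return predict(text)

# Asynchronous inference: requests go onto a queue and a worker task
# processes them, decoupling request submission from model execution.
async def infer_async(texts: list[str]) -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    results: dict[int, str] = {}

    async def worker() -> None:
        while True:
            idx, text = await queue.get()
            results[idx] = predict(text)
            queue.task_done()

    task = asyncio.create_task(worker())
    for i, t in enumerate(texts):
        queue.put_nowait((i, t))
    await queue.join()   # wait until every queued request is served
    task.cancel()
    return [results[i] for i in range(len(texts))]

print(infer_sync("good service"))
print(asyncio.run(infer_async(["good", "bad"])))
```

In a production platform the in-process queue would typically be replaced by a message broker, but the same sync-versus-async trade-off applies: synchronous paths give the lowest latency per request, while asynchronous paths absorb bursts and scale to millions of requests more cost-effectively.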

Background Knowledge
ML, NLP, Software Architecture


Kumaran Ponnambalam is a seasoned veteran in everything data, with a reputation for delivering high-performance database and SaaS applications. He currently specializes in leading Big Data Science and Engineering efforts.