MLOps Spotlight: Scaling NLP Pipelines at IHS Markit


The data science team at IHS Markit has been building sophisticated NLP pipelines that run at scale using the Iguazio MLOps platform and the open-source MLRun framework. In this session they share their journey and offer advice for other data science teams looking to:

Ingest, prepare, classify and index structured and unstructured data (in this case, PDFs and images)
Handle terabytes of data in hours, not months
Work in one unified research and production environment to make deployment seamless
Enable CI/CD for ML
Run complex models that make unstructured data searchable (including computer vision)
Allow for sharing and reuse of components across projects and teams
Use auto-scaling serverless functions to abstract away infrastructure complexity
Build rapidly, iterate faster and focus on the business logic rather than the underlying infrastructure

Nick (IHS Markit) and Yaron (Iguazio) will share their approach to automating the NLP pipeline end to end. They will also explain how capabilities such as Spot integration and Serving Graphs can reduce costs and improve the data science process.


Yaron Haviv is a serial entrepreneur who has been applying his deep technological experience in data, cloud, AI and networking to leading startups and enterprise companies since the late 1990s. As the co-founder and CTO of Iguazio, Yaron drives the strategy for the company's data science platform and leads the shift towards real-time AI. He also initiated and built Nuclio, a leading open-source serverless platform with over 3,400 GitHub stars, as well as MLRun, Iguazio's open-source MLOps orchestration framework.

Prior to co-founding Iguazio in 2014, Yaron was Vice President of Datacenter Solutions at Mellanox (now NVIDIA), where he led technology innovation, software development and solution integrations. He was also CTO and Vice President of R&D at Voltaire, a high-performance computing, IO and networking company that went public on the NYSE in 2007. Yaron is an active contributor to the CNCF Working Group and was one of the foundation's first members. He presents at major industry events and writes technical content for leading publications including The New Stack, Hackernoon, DZone, Towards Data Science and more.

Open Data Science
