Data Drift Identification for NLP Models in the Context of AI Governance for Enterprises


Data drift is a crucial challenge for AI models that perform Natural Language Processing tasks such as Named Entity Recognition. If data drift is not monitored for such NLP models running in production, the models can become unusable by the enterprise applications that consume them. In addition, the inability to monitor an important metric like data drift can make it hard to run the model in production from an AI Governance perspective, as it may violate compliance standards. Data drift can be identified in a variety of ways, namely based on lemmas, synonyms, positions, parts of speech, or context. However, from an AI Governance perspective, implementing data drift identification for AI models deployed in production is a challenge. The distribution of the vectorized word corpus used at model development time is not necessarily available at model monitoring time (when the model is running in production) because of the various constraints and challenges faced in an enterprise. The models may be developed on one technology platform and monitored on another; there may be legal or compliance restrictions on sharing training data containing POI information; the binary of the corpus distribution may be too large to share; and many similar challenges can make data drift monitoring of such NLP models very difficult to implement. In this presentation, we shall cover the specifics of these challenges in detail in the context of enterprise implementations of such models running in production. We shall also cover a working solution addressing those challenges, with a case study of such an NLP model in the financial industry, that can be used as a reference for addressing data drift monitoring challenges for similar NLP models.
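To make the idea of comparing a development-time distribution with production text concrete, the following is a minimal sketch and not the solution presented in the talk. It assumes that only a compact token-frequency summary of the training corpus (rather than the training data itself) is shared with the monitoring platform, and it uses Jensen-Shannon divergence as the drift score. The function names, the whitespace tokenizer, the sample sentences, and the 0.3 alert threshold are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def token_distribution(texts):
    """Relative frequency of each lowercased, whitespace-separated token.

    Whitespace tokenization is a simplifying assumption; a real pipeline
    might compare lemmas or part-of-speech tags instead.
    """
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()} if total else {}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two sparse distributions.

    Returns a value in [0, 1]: 0 for identical distributions,
    1 for distributions with no tokens in common.
    """
    keys = set(p) | set(q)
    # Mixture distribution m = (p + q) / 2 over the union of tokens.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # Kullback-Leibler divergence of a from the mixture m.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


# Baseline summary computed once at model development time (hypothetical data).
baseline = token_distribution([
    "the bank approved the loan",
    "the customer opened a savings account",
])

# Summary of text seen by the model in production (hypothetical data).
production = token_distribution([
    "the wallet transfer failed on the exchange",
    "the customer asked about token staking",
])

DRIFT_THRESHOLD = 0.3  # hypothetical alert threshold, to be tuned per model
drift_score = js_divergence(baseline, production)
drift_detected = drift_score > DRIFT_THRESHOLD
```

Because only the frequency summary crosses the platform boundary, this kind of approach sidesteps sharing raw training text, although a token-level summary may itself need review under the same legal and compliance restrictions.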


Sourav Mazumder is an IBM Data Scientist Thought Leader and The Open Group Distinguished Data Scientist. Sourav has consistently driven business innovation and value through methodologies and technologies related to Artificial Intelligence, Data Science and Big Data, drawing on his knowledge, insights, experience and influencing skills across multiple industries, including Manufacturing, Insurance, Telecom, Banking, Media, Health Care and Retail, in the USA, Europe, Australia, Japan and India. Over the last 10 years, he has influenced key decision makers of several Fortune 500 companies to adopt Artificial Intelligence, Data Science and Big Data related technologies to address complex business needs. Sourav has also consistently provided direction to and successfully led numerous challenging Artificial Intelligence, Data Science and Big Data projects, applying related methodologies such as Descriptive Statistics, Probabilistic Modelling, Algorithmic Modelling and Natural Language Processing to solve critical business problems. Sourav has also successfully partnered with academia in North America, India and South Africa to mentor students and enable them in this field. Sourav has experience working with a variety of Artificial Intelligence, Data Science and Big Data related technologies, such as Watson OpenScale, Watson Natural Language Processing, Watson Machine Learning, IBM Cloud Pak for Data, Spark, Hadoop, BigSQL, HBase, MongoDB, Solr, SystemML, Cognos, R, Python and Scala/Java, and using them in projects spanning phases from creation of a Minimum Viable Product to productionization at an enterprise level. Sourav is an Open Source enthusiast and contributes to Open Source regularly. Sourav holds patents in the Data and AI space. Sourav consistently publishes papers, blogs and articles in various industry forums.
Sourav is co-author, guest editor and chief editor of multiple books in the AI, Data Science and Big Data space. Sourav is regularly invited to speak on this subject at various industry conferences, such as Open Data Science Conference, Spark Summit, IBM Think and Global AI Conference. He can be found on LinkedIn.

Open Data Science




One Broadway
Cambridge, MA 02142
