Abstract: In model serving, two important decisions are when to retrain the model and how to efficiently retrain it. Having one fixed model during the entire often life-long inference process is usually detrimental to model performance, as data distribution evolves over time, resulting in a lack of reliability of the model trained on historical data. It is important to detect drift and retrain the model in time. We present an ensemble drift detection technique utilizing three different signals to capture data and concept drifts. In a practical scenario, ground truth labels of samples are received after a lag in time, which we consider appropriate. Our framework automatically decides what data to use to retrain based on the signals. It also triggers a warning indicating a likelihood of drift.
Model training in serving is not a one-time task but an incremental learning process. We address two challenges of life-long retraining: catastrophic forgetting and efficient retraining. To solve these two issues, we design a retraining model that can select important samples and important weights utilizing multi-armed bandits. To further address forgetting, we propose a new regularization term focusing on synapse and neuron importance.
Only a significant minority of companies unlock the true potential of AI as trained models accumulate dust due to challenges in MLOps. Serving reliable AI predictions to customers involves cost, effort, and planning to set up a continuous deployment pipeline. MLOps for Deep Learning demands a carefully crafted deployment pipeline. We discuss our open-source project which is a robust continuous deployment pipeline by integrating our unique drift detection and model retrain algorithms for serving DL models. We show how to efficiently deploy, monitor, and maintain DL models in production using our solution which is a Kubernetes native POC solution.
Bio: Diego Klabjan is a professor at Northwestern University, Department of Industrial Engineering and Management Sciences. He is also Founding Director, Master of Science in Analytics, and the Deep Learning Lab. His expertise is focused on data science and deep learning with a concentration in finance, insurance, and healthcare. Professor Klabjan has led projects with large companies such as The Chicago Mercantile Exchange Group, Intel, General Motors and many others, and he is also assisting numerous start-ups with their analytics needs. He is also a founder of Opex Analytics.