Zen and the Art of Model Maintenance

Abstract: Rapid advances in technology combined with the enthusiasm of the open-source community have created opportunities for incredible strides in AI & machine-learning capabilities. But as we push bigger and smarter models into production faster, the need to understand the performance of those engines becomes even more important, especially when there are competing solutions in play, often designed in dissimilar environments. In many scenarios, real performance data is available only after the model has long seen its last sunset.

Addressing this gap in performance monitoring can be daunting, especially when accuracy in production is where it counts. There is no magic bullet that fits every situation. Factors that make it particularly challenging include the format and delivery of the raw data (streaming, images, categorical, etc.), the intricacy of the features extracted (text vectorization, tensor conversion, risk tables, etc.), the complexity of the algorithms (anomaly detection, deep learning, etc.), and the reason codes and decisions the model emits. As a result, understanding the business impact of your scoring engine requires a non-trivial, bespoke approach.

During this discussion, we will show you open-source-driven methods for identifying and dealing with signals throughout the modeling process that have proven to be early indicators of model degradation. We will also provide architecture diagrams and code repositories that you can easily adapt to your solutions. A subset of the scenarios we will cover is as follows:
1. Identifying suspicious changes to your input data
2. Monitoring distributions of the features extracted from your input data
3. Appraising shifts in model scoring distributions
4. Examining the model payload (decisions and reason codes) for signs of trouble
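
As an illustration of scenarios 2 and 3, one common way to quantify a shift in a feature or score distribution is the Population Stability Index (PSI). The abstract does not name a specific metric, so the sketch below is an assumption rather than the method presented in the talk. The function name, the bin count, and the synthetic data are all hypothetical.

```python
import numpy as np


def population_stability_index(baseline, current, n_bins=10, eps=1e-6):
    """Compare two samples of one numeric feature or model score.

    Hypothetical helper: bins are taken from baseline quantiles, and
    values outside the baseline range fall into the first or last bin.
    """
    baseline = np.asarray(baseline, dtype=float)
    current = np.asarray(current, dtype=float)

    # Quantile edges give roughly equal-mass bins on the baseline sample.
    edges = np.unique(np.quantile(baseline, np.linspace(0.0, 1.0, n_bins + 1)))
    inner = edges[1:-1]  # drop outer edges so the end bins are open-ended
    k = len(inner) + 1

    # Assign every value to a bin and convert counts to fractions.
    base_frac = np.bincount(np.searchsorted(inner, baseline, side="right"),
                            minlength=k) / baseline.size
    curr_frac = np.bincount(np.searchsorted(inner, current, side="right"),
                            minlength=k) / current.size

    # Clip empty bins to a small value to avoid log(0) and division by zero.
    base_frac = np.clip(base_frac, eps, None)
    curr_frac = np.clip(curr_frac, eps, None)

    return float(np.sum((curr_frac - base_frac) * np.log(curr_frac / base_frac)))


# Synthetic example: scores at training time versus two production windows.
rng = np.random.default_rng(0)
train_scores = rng.normal(0.0, 1.0, 5000)
stable_scores = rng.normal(0.0, 1.0, 5000)   # same distribution
drifted_scores = rng.normal(1.0, 1.0, 5000)  # mean shifted by one std dev

psi_stable = population_stability_index(train_scores, stable_scores)
psi_drifted = population_stability_index(train_scores, drifted_scores)
print(f"stable window PSI:  {psi_stable:.4f}")
print(f"drifted window PSI: {psi_drifted:.4f}")
```

A frequently quoted rule of thumb treats a PSI below about 0.1 as stable and above about 0.25 as a significant shift, but these cutoffs are conventions, not guarantees, and should be tuned to the feature and the business context.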

Bio: Joe Blue is Director of Global Data Science for MapR with 20 years of hands-on experience including concentrations in financial services & healthcare. He and his team help MapR customers think bigger by leveraging the converged platform to discover insights from more data via machine learning, then delivering value from those insights to the business. He also actively presents in the big data & machine learning communities via conferences, meetups and webinars. Prior to MapR, he built solutions for United Healthcare, HNC Software, ID Analytics and Lexis Nexis.

Open Data Science Conference