Logging Machine Learning Data with Whylogs: Why Statistical Profiling is the Key to Data Observability at Scale

Abstract: 

When machine learning applications are deployed to production, debugging, troubleshooting and monitoring become permanent fixtures for the data scientist. Sophisticated tools enable traditional software engineers to quickly identify and resolve issues, continuously improving software robustness. A cornerstone of the DevOps toolchain is logging, which is the foundation of testing and monitoring tools. What does logging look like in an ML system?

In this talk, we will show how data logging can be enabled for a machine learning application using an open source library, whylogs. We’ll discuss how something so simple enables testing, monitoring and debugging of the entire data pipeline. We’ll dive deeper into key properties of a logging library we’ve built that can handle terabytes of data, run with a near-constant memory footprint, and produce statistically accurate profiles of structured and unstructured data. Attendees will leave the talk equipped with best practices to log and understand their data.
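
To make the "near-constant memory footprint" idea concrete, here is a minimal sketch of streaming statistical profiling in plain Python. It is not the whylogs API or its implementation; the names `ColumnProfile`, `track`, `merge` and `profile_stream` are hypothetical and chosen only for illustration. It assumes a single numeric column and keeps a handful of running summaries (count, mean, variance via Welford's update, min, max) instead of the raw values, so memory use does not grow with the amount of data logged.

```python
import math
from dataclasses import dataclass


@dataclass
class ColumnProfile:
    """Hypothetical fixed-size summary of one numeric column."""
    count: int = 0
    null_count: int = 0
    mean: float = 0.0
    m2: float = 0.0               # running sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def track(self, value):
        """Update the summary with one value; the value itself is not stored."""
        if value is None:
            self.null_count += 1
            return
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count           # Welford's online mean update
        self.m2 += delta * (value - self.mean)    # and variance accumulator
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def stddev(self):
        """Sample standard deviation, or 0.0 with fewer than two values."""
        return math.sqrt(self.m2 / (self.count - 1)) if self.count > 1 else 0.0

    def merge(self, other):
        """Combine two summaries as if their data had been logged together."""
        merged = ColumnProfile(
            count=self.count + other.count,
            null_count=self.null_count + other.null_count,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )
        if merged.count == 0:
            return merged
        delta = other.mean - self.mean
        merged.mean = self.mean + delta * other.count / merged.count
        merged.m2 = (self.m2 + other.m2
                     + delta * delta * self.count * other.count / merged.count)
        return merged


def profile_stream(values):
    """Profile any iterable (including a generator) in a single pass."""
    profile = ColumnProfile()
    for value in values:
        profile.track(value)
    return profile


if __name__ == "__main__":
    # Two batches profiled separately, then merged into one summary.
    morning = profile_stream([1.0, 2.0, None])
    evening = profile_stream([3.0, 4.0])
    print(morning.merge(evening))
```

Because summaries like this can be merged, batches profiled on different machines or at different times can be combined into one profile without revisiting the raw data. A production profiler would also need approximate sketches for quantiles and distinct counts, which this sketch leaves out.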

Bio: 

Bernease Herman is a data scientist at WhyLabs and a research scientist at the University of Washington eScience Institute. At WhyLabs, she is building model and data monitoring solutions using approximate statistics techniques. Her academic research focuses on evaluation metrics and interpretable ML, with a specialty in synthetic data and societal implications. Earlier in her career, Bernease built ML-driven solutions for inventory planning at Amazon. Bernease teaches for the University of Washington Master’s Program in Data Science and served as chair of the Rigorous Evaluation for AI Systems (REAIS) workshop series in 2020. She is a PhD student at the University of Washington and holds a Bachelor’s degree in mathematics and statistics from the University of Michigan.

Open Data Science