
Abstract: When machine learning applications are deployed to production, debugging, troubleshooting and monitoring become permanent fixtures for the data scientist. Sophisticated tools enable traditional software engineers to quickly identify and resolve issues, continuously improving software robustness. A cornerstone of the DevOps toolchain is logging: it is the foundation of testing and monitoring tools. What does logging look like in an ML system?
In this talk, we will show how data logging can be enabled for a machine learning application using an open source library, whylogs. We’ll discuss how something so simple enables testing, monitoring and debugging of the entire data pipeline. We’ll dive deeper into key properties of a logging library we've built that can handle terabytes of data, run with a near-constant memory footprint, and produce statistically accurate profiles of structured and unstructured data. Attendees will leave the talk equipped with best practices for logging and understanding their data.
Bio: Bernease Herman is a data scientist at WhyLabs and a research scientist at the University of Washington eScience Institute. At WhyLabs, she is building model and data monitoring solutions using approximate statistics techniques. Her academic research focuses on evaluation metrics and interpretable ML, with a specialty in synthetic data and societal implications. Earlier in her career, Bernease built ML-driven solutions for inventory planning at Amazon. Bernease teaches for the University of Washington Master’s Program in Data Science and served as chair of the Rigorous Evaluation for AI Systems (REAIS) workshop series in 2020. She is a PhD student at the University of Washington and holds a Bachelor’s degree in mathematics and statistics from the University of Michigan.

Bernease Herman
Title
Data Scientist | WhyLabs | University of Washington eScience Institute
