
Abstract: Machine learning affects more of daily life every day. Models are now built into software that makes credit decisions, sifts through resumes, and translates between languages, which makes tracking and reproducing those models increasingly important.
Knowing how a deployed model was trained, and on what data, usually requires layering a tracking system on top of the ML training library. Once captured, this provenance metadata can be used for reporting, experiment tracking, and failure analysis. However, the layering approach breaks down if the tracking is misconfigured or not integrated into every run. The resulting gaps in provenance information make analysis and tracking tasks difficult. They also make models hard to reproduce, whether for regulatory reasons (to verify that a model was trained on the right data) or to retrain the model on new or modified data while preserving the hyperparameters and data pipeline.
In this talk, we'll discuss our approach to provenance tracking and reproducibility: engineering a machine learning library from the ground up with first-class notions of both, so that provenance is captured automatically for all ML computations.
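As a rough illustration of what first-class provenance can look like, the sketch below shows a toy trainer that records its class name, hyperparameters, and a hash of the training data inside the training call itself, so a model cannot exist without that record. Every name here (`ProvenanceSketch`, `Provenance`, `Model`, `RidgeTrainer`, `retrain`, `hash`) is hypothetical and chosen for this example only; it is not the API of Tribuo or any other library, and the one-weight ridge fit is just a stand-in for a real learner.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: every name here is illustrative, not a real library API.
public class ProvenanceSketch {

    /** Immutable description of a training run, safe to share across threads. */
    public static final class Provenance {
        public final String trainerClass;
        public final Map<String, String> hyperparameters;
        public final String dataHash;
        public final int numExamples;

        Provenance(String trainerClass, Map<String, String> hyperparameters,
                   String dataHash, int numExamples) {
            this.trainerClass = trainerClass;
            this.hyperparameters = Map.copyOf(hyperparameters); // unmodifiable copy
            this.dataHash = dataHash;
            this.numExamples = numExamples;
        }
    }

    /** A model can only be constructed together with its provenance. */
    public static final class Model {
        public final double weight;
        public final Provenance provenance;

        Model(double weight, Provenance provenance) {
            this.weight = weight;
            this.provenance = provenance;
        }

        public double predict(double x) { return weight * x; }
    }

    /** Toy trainer: fits y = w * x through the origin with an L2 penalty lambda. */
    public static final class RidgeTrainer {
        private final double lambda;

        public RidgeTrainer(double lambda) { this.lambda = lambda; }

        public Model train(List<double[]> data) {
            double xy = 0.0, xx = 0.0;
            for (double[] row : data) { // each row is {x, y}
                xy += row[0] * row[1];
                xx += row[0] * row[0];
            }
            double w = xy / (xx + lambda);
            // Provenance is recorded here, inside train(), so no caller can skip it.
            Provenance p = new Provenance(getClass().getName(),
                    Map.of("lambda", Double.toString(lambda)), hash(data), data.size());
            return new Model(w, p);
        }
    }

    /** Rebuilds the trainer from recorded hyperparameters and trains on new data. */
    public static Model retrain(Provenance p, List<double[]> newData) {
        double lambda = Double.parseDouble(p.hyperparameters.get("lambda"));
        return new RidgeTrainer(lambda).train(newData);
    }

    /** SHA-256 over the raw feature and label values, as a hex string. */
    public static String hash(List<double[]> data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            ByteBuffer buf = ByteBuffer.allocate(Double.BYTES);
            for (double[] row : data) {
                for (double v : row) {
                    buf.clear();
                    buf.putDouble(v);
                    md.update(buf.array());
                }
            }
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest()) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    public static void main(String[] args) {
        List<double[]> data = List.of(new double[]{1.0, 2.0}, new double[]{2.0, 4.0});
        Model model = new RidgeTrainer(0.1).train(data);
        Provenance p = model.provenance;
        System.out.println(p.trainerClass + " " + p.hyperparameters + " n=" + p.numExamples);
        System.out.println("data unchanged: " + p.dataHash.equals(hash(data)));
        System.out.println("retrained weight: " + retrain(p, data).weight);
    }
}
```

The two uses shown in `main` mirror the use cases above: comparing the stored hash against a dataset checks that a model was trained on the expected data, and `retrain` reuses the recorded hyperparameters on new data. Keeping the provenance object immutable is one simple way to make it safe to read from multiple threads, though a real library would need to track far more than this sketch does.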
Session Outline
Attend this session to learn about:
- Considerations for building provenance into concurrent environments like the JVM
- How to ensure accurate capture of all the information that flows into training a model
- Use cases for provenance information and how to use it to track deployed models
Bio: Adam Pocock is a Machine Learning researcher at Oracle Labs. He's the lead developer of the Tribuo machine learning library, and maintains several other machine learning libraries on the JVM, including TensorFlow-Java and ONNX Runtime's Java API. Adam's research has covered several areas of ML and its applications, from scaling up and parallelizing Bayesian inference to building multilingual NLP systems. He holds a PhD in Computer Science from the University of Manchester, where his research focused on the theoretical underpinnings of feature selection algorithms.

Adam Pocock, PhD
Machine Learning researcher | Oracle
