Automating Machine Learning Lifecycle with Kubeflow

Abstract: In this workshop we will build and automate successive stages of the machine learning lifecycle, from data preparation to model maintenance in production, using Kubeflow, a machine learning toolkit for Kubernetes, and Hydrosphere.io, an open-source machine learning model management platform.

You will learn how to execute and automate the following steps:  
- Data preparation — gather and transform data for later use in training;
- Model training — train a new model on the prepared data using a predefined model architecture, and save the model's artifacts;
- Model cataloguing — extract metadata from the trained model artifacts and upload them to Hydrosphere.io;
- Model deployment — deploy the uploaded model to production and expose REST, gRPC and Kafka endpoints;
- Integration testing — run integration tests on your model against recent production traffic as well as gathered edge cases;
- Model monitoring — supply the deployed model with monitoring services to watch its behaviour in the production environment;
- Model maintenance — repeatedly fine-tune your model with the most relevant production data.
Configuring ML pipelines properly, once done, saves a lot of time and effort in ML engineers' daily routines. Come and see the best practices we have gained working with our customers.

We will provide sandboxes within our AWS infrastructure so that participants can follow along.
Prerequisites for productive participation:
Docker (https://docs.docker.com/)
Kubectl (https://kubernetes.io/docs/tasks/tools/install-kubectl/#install-kubectl)
pip install kfp==0.1.11 Pillow==5.2.0 numpy==1.16.2 "hs>=2.0.0rc6" requests==2.21.0 scikit-learn==0.20.2
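Before the workshop you can sanity-check your environment. This is a small sketch; the import names are assumed from the pip line above (Pillow imports as `PIL`, scikit-learn as `sklearn`).

```python
# Report which of the prerequisite packages are not yet installed.
import importlib.util

# Maps pip package name -> importable module name (assumed mapping).
required = {"kfp": "kfp", "Pillow": "PIL", "numpy": "numpy",
            "hs": "hs", "requests": "requests", "scikit-learn": "sklearn"}

missing = [pkg for pkg, mod in required.items()
           if importlib.util.find_spec(mod) is None]
print("missing packages:", missing or "none")
```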

Bio: Stepan Pushkarev is the CTO of Hydrosphere.io. His background is in engineering data and AI/ML platforms. He has spent the last couple of years building continuous delivery and monitoring tools for machine learning applications, as well as designing streaming data platforms. He works closely with data scientists to make them productive in their daily operations and efficient in delivering value.