Abstract: The lifecycle of any machine learning model, regular or deep, consists of (a) pre-processing, transforming, and augmenting the data, (b) training the model with different hyper-parameter values and learning rates, and (c) computing results on new data or test sets. Whether you use transfer learning or train a model from scratch, this process requires a large amount of computation, management of your experimental process, and quick perusal of the results of your experiments. In this workshop, we will learn how to combine off-the-shelf clustering software such as Kubernetes and Dask with learning systems such as TensorFlow, PyTorch, and scikit-learn, on cloud infrastructure such as AWS, Google Cloud, or Azure, to construct a machine learning system for your data science team. We'll start with an understanding of Kubernetes, move on to analysis pipelines in scikit-learn and Dask, and finally arrive at Kubeflow. Participants should install minikube on their laptops (https://kubernetes.io/docs/tasks/tools/install-minikube/) and create a Google Cloud account.

Bio: Rahul Dave is a lecturer in Bayesian Statistics and Machine Learning at Harvard University, and consults on the same topics at LxPrior. He holds a Ph.D. in Computational Astrophysics from the University of Pennsylvania, and has programmed device drivers for telescopes, bespoke databases for astrophysical data, and machine learning systems in various fields. His new startup, univ.ai, provides corporate training and consulting to help students and companies upgrade the skills and understanding of both their developers and managers for this new AI-driven world.