Abstract: In this talk, I will demonstrate how to train, optimize, and serve distributed machine learning models across various environments, including:
1) Local Laptop
2) Kubernetes Cluster (Running Anywhere)
3) AWS's New SageMaker Service (Announced Last Week at re:Invent)
I'll also present some post-training model optimization techniques that improve model serving performance for TensorFlow running on GPUs. These techniques include 16-bit model training, neural network layer fusion, and 8-bit weight quantization.
Lastly, I'll discuss alternative runtimes for TensorFlow on GPUs, including TensorFlow Lite and NVIDIA's TensorRT.
Bio: Chris Fregly is Founder and Research Engineer at PipelineAI, a streaming machine learning and artificial intelligence startup based in San Francisco. He is also an Apache Spark contributor, a Netflix Open Source committer, founder of the Global Advanced Spark and TensorFlow Meetup, and author of the O’Reilly training and video series "High Performance TensorFlow in Production."
Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member and Principal Engineer at the IBM Spark Technology Center in San Francisco.