Data, I/O, and TensorFlow: Building a Reliable Machine Learning Data Pipeline

Abstract: 

Building an efficient data pipeline for applying machine learning models in production remains a challenge for many data science practitioners and software engineers. While model formats have been largely standardized, data input sources vary widely and almost always require customized processing. Streaming inputs add further complexity: when data has to be consumed continuously, the pipeline architecture must be carefully implemented for reliable production deployment. In TensorFlow 2.0, tf.data is the canonical way to process data for training and inference with tf.keras models. It simplifies data processing for both static and streaming data sources, which eases the production deployment of machine learning models.

In this tutorial, we will guide you through hands-on examples of integrating tf.keras models with different data input sources through tf.data in production environments, from simple CSV/JSON files to SQL databases and cloud data warehouse services such as Google Cloud BigQuery. We will also cover Apache Kafka as a data input to illustrate a streaming data pipeline architecture for continuous data processing with machine learning. As a bonus, attendees will learn the basics of distributed machine learning and its production usage.

● Google Colab;
● Python;
● TensorFlow;
● tensorflow-io;

TensorFlow docs: https://www.tensorflow.org
TensorFlow GitHub repo: https://github.com/tensorflow/tensorflow
TensorFlow I/O docs: https://www.tensorflow.org/io
TensorFlow I/O GitHub repo: https://github.com/tensorflow/io
https://drive.google.com/open?id=1DP60_RctKs8QrK3N34uDFcDyM8tNy-ti

Bio: 

Yong Tang is the Director of Engineering at MobileIron. His most recent focus is on data processing in machine learning. He is a maintainer and the SIG I/O lead of the TensorFlow project. He received the Open Source Peer Bonus Award from Google for his contributions to TensorFlow and is the author of the Kafka Dataset module in TensorFlow. In addition to TensorFlow, Yong Tang contributes to many other projects in the open-source community. He is a maintainer of Docker, CoreDNS, and SwarmKit. Yong Tang received his PhD in Computer Science & Engineering from the University of Florida.

Open Data Science

 

 

 

