Training Sessions

– Taught by World-Class Data Scientists –

Learn the latest data science concepts, tools, and techniques from the best. Forge a connection with these rockstars from industry and academia, who are passionate about molding the next generation of data scientists.

Highly Experienced Instructors

Our instructors are highly regarded in data science, coming from both academia and notable companies.

Real World Applications

Gain the skills and knowledge to use data science in your career and business, without breaking the bank.

Cutting Edge Subject Matter

Find training sessions offered on a wide variety of data science topics from machine learning to data visualization.

ODSC Training Includes

Form a working relationship with some of the world’s top data scientists for follow up questions and advice.

Additionally, your ticket includes access to 50+ talks and workshops.

High quality recordings of each session, exclusively available to premium training attendees.

Equivalent training at other conferences costs much more.

Professionally prepared learning materials, custom tailored to each course.

Opportunities to connect with other ambitious like-minded data scientists.

Optimizing, Deploying, and Scaling Distributed Tensorflow AI Models in a Hybrid-Cloud Environment with Chris Fregly

Apache Spark is becoming increasingly popular for big data science projects. Spark can handle large volumes of data significantly faster and more easily than other platforms, and it includes tools for real-time processing, machine learning, and interactive SQL. It is quickly being adopted by industry to achieve business objectives that require data and data science at scale.

Bio

Chris Fregly is a Research Scientist at PipelineIO – a Machine Learning and Artificial Intelligence Startup in San Francisco.
Chris is an Apache Spark Contributor, Netflix Open Source Committer, Founder of Advanced Spark and TensorFlow Meetup, and Author of the upcoming O’Reilly Video Series, “High Performance Distributed Tensorflow in Production.”
Previously, Chris was a Distributed Systems Engineer at Netflix, Data Solutions Engineer at Databricks, and a Founding Member of the IBM Spark Technology Center in San Francisco.

Curriculum

In this workshop, we will optimize, deploy, and scale various Tensorflow Models in a distributed, hybrid-cloud production environment.  We will use 100% open source tools including Tensorflow, Jupyter Notebook, Docker, Kubernetes, Prometheus, Grafana, and NetflixOSS.
First, we will focus on post-training model optimization techniques such as batch normalization folding, weight rounding, and 8-bit quantization.  Next, we will explore Tensorflow’s XLA JIT and AOT static compiler features.  Last, we will serve, load-test, and monitor our optimized models in a highly-tuned Tensorflow Serving and NetflixOSS Microservice runtime environment.
All code and Docker Images are 100% open source.  Links to GitHub and DockerHub are available at http://pipeline.io.
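The 8-bit quantization step mentioned above can be illustrated outside TensorFlow. This is a minimal sketch of the idea, assuming simple symmetric linear quantization in plain NumPy; it is not the actual TensorFlow tooling used in the workshop:

```python
import numpy as np

def quantize_int8(weights):
    """Linearly map float32 weights onto the int8 range [-127, 127]."""
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights from the int8 representation."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.1, size=(256, 256)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# The int8 tensor is 4x smaller than the float32 original,
# and each weight is off by at most half a quantization step.
print(q.nbytes / w.nbytes)
print(float(np.abs(w - w_hat).max()) <= scale / 2 + 1e-7)
```

Dropping from 32-bit floats to 8-bit integers shrinks the weight tensor by 4x while keeping the per-weight error bounded by half the quantization step, which is why quantized models often lose little accuracy.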

Prerequisites

All code and Docker Images are 100% open source.  Links to GitHub and DockerHub are available at http://pipeline.io.

DS 101 with Todd Cioffi

Curious about Data Science? Self-taught on some aspects, but missing the big picture? Well, you've got to start somewhere, and this session is the place to do it. This session will cover, at a layman's level, some of the basic concepts of data science. In a conversational format, we will discuss: What are the differences between Big Data and Data Science, and why aren't they the same thing? What distinguishes descriptive, predictive, and prescriptive analytics? What purpose do predictive models serve in a practical context? What kinds of models are there, and what do they tell us? What is the difference between supervised and unsupervised learning? What are some common pitfalls that turn good ideas into bad science? During this session, attendees will learn the difference between k-nearest neighbors and k-means clustering, understand why we normalize data and avoid overfitting, and grasp the meaning of No Free Lunch.
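To preview one of the distinctions above, here is an illustrative sketch (not the course materials) contrasting k-nearest neighbors, a supervised method that needs labels, with k-means, an unsupervised method that never sees them, using scikit-learn:

```python
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

# A small labeled dataset: three well-separated groups of points in 2D.
X, y = make_blobs(n_samples=150, centers=3, random_state=0)

# k-nearest neighbors is *supervised*: it uses the labels y to predict classes.
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(knn.score(X, y))  # training accuracy

# k-means is *unsupervised*: it partitions X into k clusters without seeing y.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_[:10])  # arbitrary cluster IDs, not class labels
```

The cluster IDs that k-means assigns have no inherent relationship to the true labels; matching them up is a separate step, which is one practical difference between the two families of methods.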

Bio

For more than 20 years, Todd has been highly respected as both a technologist and a trainer. As a tech, he has seen that world from many perspectives: “data guy” and developer; architect, analyst and consultant. As a trainer, he has designed and covered subject matter from operating systems to end-user applications, with an emphasis on data and programming. As a strong advocate for knowledge sharing, he combines his experience in technology and education to impart real-world use cases to students and users of analytics solutions across multiple industries. He is a regular contributor to the community of analytics and technology user groups in the Boston area, writes and teaches on many topics, and looks forward to the next time he can strap on a dive mask and get wet. Todd is the Senior Data Architect for Enterprise Data Management at MassIT, and is the former Director of Boston Operations for the Open Data Science Conference.


Curriculum


Prerequisites

TBD

Bayesian Statistics in Python with Renowned Author & Professor Allen Downey

Bayesian statistical methods are powerful, versatile tools that provide guidance for decision-making under uncertainty. Data scientists who know how to apply Bayesian methods can craft practical solutions to important problems in science, engineering, and business. But for many people it's hard to get started, and this workshop will help remove the roadblocks imposed by unintuitive mathematical representations. Allen Downey is one of the world's foremost experts on the subject, and attending one of his workshops is a highly rewarding experience.

Bio

Allen Downey is a Professor of Computer Science at Olin College of Engineering in Needham, MA. He is the author of several books related to computer science and data science, including Think Python, Think Stats, Think Bayes, and Think Complexity. Prof Downey has taught at Colby College and Wellesley College, and in 2009 he was a Visiting Scientist at Google. He received his Ph.D. in Computer Science from U.C. Berkeley, and M.S. and B.S. degrees from MIT.

Curriculum

Bayesian statistical methods are becoming more common and more important, but there are not many resources to help beginners get started. People who know Python can use their programming skills to get a head start. I will present simple programs that demonstrate the concepts of Bayesian statistics, and apply them to a range of example problems. Participants will work hands-on with example code and practice on example problems. Attendees should have at least basic Python and basic statistics. If you learned about Bayes's theorem and probability distributions at some time, that's enough, even if you don't remember it! Attendees should bring a laptop with Python and matplotlib. You can work in any environment; you just need to be able to download a Python program and run it. I will provide code to help attendees get set up ahead of time.
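In the spirit of the simple programs described above, here is a small sketch of a grid-based Bayesian update for estimating a coin's bias; the workshop's own examples may differ:

```python
# A minimal Bayesian update: estimate the bias of a coin from observed flips.
# We put a uniform prior on 101 candidate values of p (probability of heads)
# and apply Bayes's theorem after each flip.

hypos = [i / 100 for i in range(101)]         # candidate values of p
prior = [1 / len(hypos)] * len(hypos)         # uniform prior over hypotheses

def update(dist, heads):
    """Multiply the distribution by the likelihood of one flip, renormalize."""
    likelihood = [p if heads else (1 - p) for p in hypos]
    posterior = [d * lk for d, lk in zip(dist, likelihood)]
    total = sum(posterior)
    return [po / total for po in posterior]

# Observe 7 heads and 3 tails.
posterior = prior
for flip in [True] * 7 + [False] * 3:
    posterior = update(posterior, flip)

# With a uniform prior, the posterior mean lands near (7+1)/(10+2) = 8/12,
# Laplace's rule of succession.
mean = sum(p * po for p, po in zip(hypos, posterior))
print(round(mean, 3))
```

The entire method is a loop of multiply-and-renormalize, which is exactly the kind of computational (rather than formula-driven) framing the workshop promises.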

Prerequisites

TBD

Intro to DS with Python with Harish Krishnamurthy

The great debate over whether you can be a data scientist without being a coder still goes on. However, the consensus among employers is "no", and Python is one of the most popular languages to start with. The RefactorED team (from collaberry), with over 20 years of experience in training, will show you how to get started with data science coding in Python.

Bio

Harish Krishnamurthy is the Lead Data Scientist at RefactorED. He has experience designing and teaching data science at General Assembly. Harish's interests are in the broad areas of AI/Machine Learning that include, but are not limited to, Predictive Modeling, Time Series Analysis, Probabilistic and Bayesian methods, Neural Nets, Clustering, and ETL. He has experience working at research labs such as BBN Technologies, Mitsubishi Electric Research Labs (MERL), and Schlumberger-Doll Research, and at companies such as Nokia and MassMutual Financial Group. His master's thesis, completed at BBN and Northeastern University's Communication and Digital Signal Processing group, was in the area of system combination for speech recognition. He has experience working with large data sets from the days before big data tools existed to using abstractions such as Spark. He is also involved in teaching data science and helping several startups identify and solve complex problems in the larger domain of ML/data science.

Curriculum

TBD


Prerequisites

TBD

Introduction to Machine Learning with Andreas Mueller, core developer of scikit-learn

The resurging interest in machine learning is due to multiple factors, including growing volumes and varieties of data and cheaper computational processing, which make it possible to quickly and automatically produce models that can analyze bigger, more complex data and deliver faster, more accurate results at very large scale. scikit-learn (http://scikit-learn.org/) has emerged as one of the most popular open source machine learning toolkits and is now widely used in academia and industry.

Bio

Andreas Mueller received his MS degree in Mathematics (Dipl.-Math.) in 2008 from the Department of Mathematics at the University of Bonn. In 2013, he completed his PhD thesis at the Institute for Computer Science at the University of Bonn. After working as a machine learning scientist at the Amazon Development Center Germany in Berlin for a year, he joined the Center for Data Science at New York University at the end of 2014. In his current position as assistant research engineer at the Center for Data Science, he works on open source tools for machine learning and data science. He has been one of the core contributors to scikit-learn, a machine learning toolkit widely used in industry and academia, for several years, and has authored and contributed to a number of open source projects related to machine learning.

Curriculum

This workshop will cover basic concepts of machine learning, such as supervised and unsupervised learning, cross-validation and model selection, and how they map to programming concepts in scikit-learn. Andreas will demonstrate how to prepare data for machine learning, and go from applying a single algorithm to building a machine learning pipeline. We will cover the trade-offs of learning on large datasets, and describe some techniques to handle larger-than-RAM and streaming data on a single machine.
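As a taste of how these concepts map to scikit-learn's API, here is a small illustrative sketch (not the workshop code) that chains preprocessing and a model into a pipeline and cross-validates it:

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Bundling preprocessing and the model into one estimator keeps
# cross-validation honest: the scaler is re-fit inside each fold,
# so no information from the held-out fold leaks into training.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())  # mean accuracy across the 5 folds
```

Every scikit-learn estimator follows the same fit/predict interface, which is what lets a whole pipeline be dropped into `cross_val_score` as if it were a single model.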

Prerequisites

TBD

Advanced scikit-learn with Andreas Mueller, core developer of scikit-learn

The resurging interest in machine learning is due to multiple factors, including growing volumes and varieties of data and cheaper computational processing, which make it possible to quickly and automatically produce models that can analyze bigger, more complex data and deliver faster, more accurate results at very large scale. scikit-learn (http://scikit-learn.org/) has emerged as one of the most popular open source machine learning toolkits and is now widely used in academia and industry.

scikit-learn provides easy-to-use interfaces in Python to perform advanced analysis and build powerful predictive models.

Bio

Andreas Mueller received his MS degree in Mathematics (Dipl.-Math.) in 2008 from the Department of Mathematics at the University of Bonn. In 2013, he completed his PhD thesis at the Institute for Computer Science at the University of Bonn. After working as a machine learning scientist at the Amazon Development Center Germany in Berlin for a year, he joined the Center for Data Science at New York University at the end of 2014. In his current position as assistant research engineer at the Center for Data Science, he works on open source tools for machine learning and data science. He has been one of the core contributors to scikit-learn, a machine learning toolkit widely used in industry and academia, for several years, and has authored and contributed to a number of open source projects related to machine learning.

Curriculum

TBD

Prerequisites

TBD

Intro to DS with R with Dan Becker & Zachary Deane-Mayer

Bio

Dan started studying how people learn new skills at his first job after college, as a whitewater kayak instructor. He subsequently taught statistics while completing a PhD in Econometrics, and he has been helping people learn machine learning since finishing in 2nd place in Kaggle’s Heritage Health Prize.

Zach is fascinated by predicting the future, and spends all of his free time competing in predictive modeling competitions. He is currently one of the top 500 data scientists on Kaggle, and took 9th place in the Heritage Health Prize as part of the Analytics Inside team.

Curriculum

TBD

Prerequisites

TBD

Machine Learning Fundamentals: Building Models and Applying Them to Real Data with Udacity

Bio

Presented By: Arpan Chakraborty, Kendall Robertson & Adarsh Nair

Arpan Chakraborty: Arpan likes to find computing solutions to everyday problems. He obtained his PhD from North Carolina State University, focusing on biologically-inspired computer vision, and applying it to research areas ranging from robotics to cognitive science. At Udacity, he works with partners from both academia and industry to build practical artificial intelligence and machine learning courses. Arpan enjoys exploring the outdoors through hiking and backpacking.

Curriculum

In this session, we will give a fun, conceptual, and hands-on overview of Machine Learning. We will focus on two aspects: (1) the core fundamentals of machine learning, and (2) a hands-on approach to applying it. We will focus on several algorithms, including Neural Networks, Support Vector Machines, Decision Trees, and Naive Bayes. Then we will study the testing framework in machine learning, the metrics used to evaluate a model's performance, and several techniques to improve these models. We will show how to develop these techniques in Python, more specifically with Pandas and scikit-learn. At the end, you'll have the chance to apply what you've learned on two possible projects: one that builds a spam detector using the Naive Bayes classifier, and another that analyzes census data using several different machine learning algorithms.
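The spam-detector project described above can be previewed with a toy sketch; the messages and labels here are invented stand-ins for the real dataset used in the project:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# A tiny made-up corpus standing in for the project's real spam dataset.
messages = [
    "win a free prize now", "limited offer click here",
    "free money guaranteed", "claim your free reward today",
    "are we still meeting tomorrow", "lunch at noon works for me",
    "please review the attached report", "thanks for your help yesterday",
]
labels = ["spam"] * 4 + ["ham"] * 4

# CountVectorizer turns each message into word counts;
# MultinomialNB models those counts per class with Bayes's theorem.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)

print(model.predict(["free prize click now"]))    # words seen only in spam
print(model.predict(["see you at the meeting"]))  # words seen only in ham
```

Even with eight training messages, words like "free" and "click" pull a message toward spam while "meeting" pulls it toward ham, which is the core intuition behind the Naive Bayes classifier the project builds.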

Prerequisites

TBD