Premium Training Sessions

– Taught by World-Class Data Scientists –

Learn the latest data science concepts, tools and techniques from the best. Forge a connection with these rockstars from industry and academia, who are passionate about molding the next generation of data scientists.

Highly Experienced Instructors

Our instructors are highly regarded in data science, coming from both academia and notable companies.

Real World Applications

Gain the skills and knowledge to use data science in your career and business, without breaking the bank.

Small Class Sizes

Small, individually focused, hands-on training sessions designed for maximum learning and exposure.

Premium Training Includes

Form a working relationship with some of the world’s top data scientists for follow-up questions and advice.
Additionally, your ticket includes access to 50+ talks and workshops.
High quality recordings of each session, exclusively available to premium training attendees.
Equivalent training at other conferences costs much more.
Professionally prepared learning materials, custom tailored to each course.
Opportunities to connect with other ambitious like-minded data scientists.

Building Interactive Visualizations in Python with Tim Hogan

Bio

Tim is a Business Technology Consultant with a focus on Business Intelligence. He helps companies leverage digital applications to enhance operations and better serve their customers. His work spans from ideation facilitation and technology roadmaps to complex implementations. Tim’s past projects range from manufacturing in the government sector (U.S. Treasury, DOT) to supply chain operations for major companies (Intel, Wegmans, MillerCoors, Tom’s Shoes). Cross-domain applications, alternative business models, and automation are of special interest. When Tim is not playing with the 14 virtual machines he has on his computer, he can be seen (or not seen) traveling and skiing.

Curriculum

Coming Soon!

Prerequisites

TBD

Natural Language Processing using Deep Learning with Udacity

Bio

Presented by: Arpan Chakraborty, Kendall Robertson & Adarsh Nair


Arpan Chakraborty: Arpan likes to find computing solutions to everyday problems. He obtained his PhD from North Carolina State University, focusing on biologically-inspired computer vision, and applying it to research areas ranging from robotics to cognitive science. At Udacity, he works with partners from both academia and industry to build practical artificial intelligence and machine learning courses. Arpan enjoys exploring the outdoors through hiking and backpacking.

Curriculum

In this session, you will gain a conceptual understanding of Natural Language Processing (NLP) techniques and learn how to apply them through hands-on projects. You will begin with the preprocessing steps necessary for parsing unstructured text data. You will then transform text using basic representations such as bag-of-words and TF-IDF, and train a Naive Bayes classifier to detect spam emails. Finally, you will learn to transform text using more advanced representations such as Word2Vec, t-SNE and LDA, and explore how you can use them for topic modeling. The projects will be implemented in Python using libraries like Scikit-Learn and NLTK, and starter code will be provided through GitHub.
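For a flavor of the spam-detection exercise, here is a minimal sketch of the bag-of-words/TF-IDF plus Naive Bayes approach in scikit-learn (the toy emails and labels are ours for illustration; the session’s actual starter code lives on GitHub):

```python
# Minimal sketch: TF-IDF features feeding a Naive Bayes spam classifier.
# Toy data for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "win a free prize now",      # spam
    "meeting moved to 3pm",      # ham
    "claim your free reward",    # spam
    "lunch tomorrow?",           # ham
]
labels = [1, 0, 1, 0]            # 1 = spam, 0 = ham

clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(emails, labels)
print(clf.predict(["free prize inside"]))  # should flag this as spam
```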

Prerequisites

TBD

Machine Learning in R with Jared Lander, Founder at Lander Analytics

R is often considered the original lingua franca of data science programming. Even as more languages compete for the title, R maintains a large and passionate following. In fact, one of R’s main strengths is its huge community, which provides open source user-contributed packages (CRAN) for a wide range of data science models and tools. Combined with a very active user support group, this promises to keep R popular for the foreseeable future.

Bio

Jared Lander is the Chief Data Scientist at Lander Analytics, a professor at Columbia University, the author of R for Everyone, and the organizer of the world’s largest R meetup.

Curriculum

Modern statistics has become almost synonymous with machine learning, a collection of techniques that utilize today’s incredible computing power. This course focuses on the methods available for implementing machine learning algorithms in R, and will examine some of the underlying theory behind the curtain, covering the Elastic Net, Decision Trees and cross-validation. Attendees should have a good understanding of linear models and classification, and should have R and RStudio installed, along with the `glmnet`, `rpart`, `rpart.plot`, `boot`, `ggplot2` and `coefplot` packages.
Elastic Net: learn about penalized regression with the Lasso and Ridge; fit models with `glmnet`; understand the coefficient path; view coefficients with `coefplot` (a rough Python analogue follows this list).
Decision Trees: learn how to make classifications (and regressions) using recursive partitioning; fit models with `rpart`; make compelling visualizations with `rpart.plot`.
Cross-Validation: learn the reasoning and process behind cross-validation; cross-validate glm models with `cv.glm`.
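The session itself is taught in R with `glmnet`; purely as a rough analogue for readers coming from Python, here is a minimal Elastic Net fit with a cross-validated penalty in scikit-learn (synthetic data, not course material):

```python
# Rough Python analogue of glmnet-style penalized regression:
# ElasticNetCV blends the Lasso (l1_ratio=1) and Ridge (l1_ratio=0) penalties
# and chooses the regularization strength alpha by cross-validation.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV

X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

enet = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=5).fit(X, y)
print("chosen alpha:", enet.alpha_)
print("chosen l1_ratio:", enet.l1_ratio_)
print("nonzero coefficients:", np.sum(enet.coef_ != 0))
```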

Prerequisites

TBD

Modeling in R with Jared Lander, Founder at Lander Analytics

At one point the open source language R was considered the lingua franca of data science programming. Even as more languages compete for that title, R still has a very passionate following. In fact, one of R’s main strengths is its huge community, which provides open source user-contributed packages (CRAN), documentation and a very active user support group. R packages are collections of R functions and data that make it easy to access the latest techniques and functionality immediately, without needing to develop everything from scratch yourself.

Bio

Jared Lander is the Chief Data Scientist at Lander Analytics, a professor at Columbia University, the author of R for Everyone, and the organizer of the world’s largest R meetup.

Curriculum

The linear model, and its extensions, forms the backbone of statistical analysis. In this course we cover Linear Regression using `lm`, Generalized Linear Models using `glm`, and model assessment using `AIC`, `BIC` and other measures. The focus will be mainly on applied programming, though theoretical properties and derivations will be taught where appropriate. Attendees should already have a basic knowledge of linear models and have R and RStudio installed, along with the `UsingR`, `ggplot2` and `coefplot` packages.
Linear Models: learn about the best-fit line; understand the formula interface in R; understand the design matrix; fit models with `lm`; visualize the coefficients with `coefplot`; make predictions on new data (a rough Python analogue of the formula workflow follows this list).
Generalized Linear Models: learn about Logistic Regression for classification; learn about Poisson Regression for count data; fit models with `glm`; visualize the coefficients with `coefplot`.
Model Assessment: compare models using `AIC` and `BIC`.
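The course uses R’s `lm` and `glm`; for readers who know Python, the statsmodels formula interface deliberately mirrors R’s, so a rough analogue of the workflow looks like this (toy data of ours, not course material):

```python
# Rough Python analogue of R's lm()/glm() formula workflow, via statsmodels.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "price": [200, 250, 300, 180, 320, 280],   # hypothetical home prices
    "sqft":  [1100, 1400, 1800, 950, 2000, 1600],
    "baths": [1, 2, 2, 1, 3, 2],
})

# Analogous to R's lm(price ~ sqft + baths, data = df)
fit = smf.ols("price ~ sqft + baths", data=df).fit()
print(fit.params)          # coefficients, like coef(fit) in R
print(fit.aic, fit.bic)    # the same AIC/BIC model-assessment measures
```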

Prerequisites

TBD

Data Science with Spark: Beyond the Basics (Part 1) with Adam Breindel

In 2016 Spark firmly established itself as one of data scientists’ favorite scalable machine learning platforms. Spark provides a general machine learning library, MLlib, that is designed for simplicity, scalability, and easy integration with other tools. Spark can also process your data locally, in single-machine standalone mode, and can even build models when the input data set is larger than your machine’s memory. Spark provides data scientists with a powerful, unified engine that is both fast (up to 100x faster than Hadoop MapReduce for large-scale data processing) and easy to use. This allows data practitioners to solve their machine learning problems (as well as graph computation, streaming, and real-time interactive query processing) interactively and at much greater scale. Spark also provides many language choices, including Scala, Java, Python, and R.

Bio

Adam Breindel consults and teaches widely on Apache Spark and other technologies. Adam’s experience includes work with banks on neural-net fraud detection, streaming analytics, cluster management code, and web apps, as well as development at a variety of startup and established companies in the travel, productivity, and entertainment industries. He is excited by the way that Spark and other modern big-data tech remove so many old obstacles to system design and make it possible to explore new categories of interesting, fun, hard problems.

Curriculum

This class is aimed at practitioners who are already familiar with the basics of Apache Spark and have tried the machine learning samples in the Spark docs or some of the ML tutorial examples online. We’ll start from there and work to advance our knowledge of Spark ML. After briefly reviewing some fundamentals of Spark, DataFrames and the Spark ML APIs, the class will then explore:
– Performing feature preparation/transformation beyond the Spark built-in tools
– “Borrowing” functionality from scikit-learn to help us pre-process features in Spark
– Converting DataFrame data to access legacy (RDD) MLlib features that are not yet exposed in the Spark ML DataFrame API
– Implementing data prep operations as reusable components by writing new Transformers and Estimators (a small sketch follows this list)
– Adding a reusable parallel machine learning algorithm to Spark by creating our own Estimator and Model classes
– Sharing our reusable components with our Python data science colleagues by creating Python wrappers like those built into Spark
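As a taste of the Transformer idea above, here is a minimal sketch of wrapping a data-prep step as a reusable Spark ML Transformer. This is our own toy example, not the class’s code; it skips Spark’s Params machinery, so unlike the components built in class it won’t participate in Pipeline persistence:

```python
# A minimal custom Spark ML Transformer: min-max scales a numeric column.
# Toy example only; a production version would use Spark's Params mixins.
from pyspark.sql import SparkSession, functions as F
from pyspark.ml import Transformer

class MinMaxColumnScaler(Transformer):
    def __init__(self, inputCol, outputCol):
        super().__init__()
        self.inputCol = inputCol
        self.outputCol = outputCol

    def _transform(self, df):
        # Compute the column's min and max, then rescale to [0, 1].
        lo, hi = df.agg(F.min(self.inputCol), F.max(self.inputCol)).first()
        return df.withColumn(self.outputCol,
                             (F.col(self.inputCol) - lo) / (hi - lo))

spark = SparkSession.builder.master("local[*]").getOrCreate()
df = spark.createDataFrame([(1.0,), (5.0,), (9.0,)], ["x"])
MinMaxColumnScaler("x", "x_scaled").transform(df).show()
```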

Prerequisites

TBD

Data Science with Spark: Beyond the Basics (Part 2) with Adam Breindel

In 2016 Spark firmly established itself as one of data scientists’ favorite scalable machine learning platforms. Spark provides a general machine learning library, MLlib, that is designed for simplicity, scalability, and easy integration with other tools. Spark can also process your data locally, in single-machine standalone mode, and can even build models when the input data set is larger than your machine’s memory. Spark provides data scientists with a powerful, unified engine that is both fast (up to 100x faster than Hadoop MapReduce for large-scale data processing) and easy to use. This allows data practitioners to solve their machine learning problems (as well as graph computation, streaming, and real-time interactive query processing) interactively and at much greater scale. Spark also provides many language choices, including Scala, Java, Python, and R.

Bio

Adam Breindel consults and teaches widely on Apache Spark and other technologies. Adam’s experience includes work with banks on neural-net fraud detection, streaming analytics, cluster management code, and web apps, as well as development at a variety of startup and established companies in the travel, productivity, and entertainment industries. He is excited by the way that Spark and other modern big-data tech remove so many old obstacles to system design and make it possible to explore new categories of interesting, fun, hard problems.

Curriculum

This class is aimed at practitioners who are already familiar with the basics of Apache Spark and have tried the machine learning samples in the Spark docs or some of the ML tutorial examples online. We’ll start from there and work to advance our knowledge of Spark ML. After briefly reviewing some fundamentals of Spark, DataFrames and the Spark ML APIs, the class will then explore:
– Performing feature preparation/transformation beyond the Spark built-in tools
– “Borrowing” functionality from scikit-learn to help us pre-process features in Spark
– Converting DataFrame data to access legacy (RDD) MLlib features that are not yet exposed in the Spark ML DataFrame API
– Implementing data prep operations as reusable components by writing new Transformers and Estimators
– Adding a reusable parallel machine learning algorithm to Spark by creating our own Estimator and Model classes
– Sharing our reusable components with our Python data science colleagues by creating Python wrappers like those built into Spark

Prerequisites

TBD

Data Exploration with Apache Drill with Charles Givre

Drill needs nothing more than ANSI SQL to run queries, uses conventional ODBC/JDBC connectors to provide access to the data, and doesn’t require schemas to be defined before querying. This means less involvement from IT to prepare data for analysis; anyone with a suitable toolset and the proper permissions can plug in and begin querying. Apache Drill works well with self-describing data as well as complex data types, offers low-latency queries, and supports large datasets.
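To give a flavor of the schema-free model, here is a minimal sketch of issuing a query against Drill’s REST endpoint from Python (the file path and data are hypothetical; it assumes a Drill instance running locally on the default port, with the endpoint and response shape as documented in the Drill REST API docs):

```python
# Query a raw JSON file through Apache Drill's REST API; note that no schema
# or ETL step is defined anywhere beforehand. File path is hypothetical.
import requests

resp = requests.post(
    "http://localhost:8047/query.json",
    json={
        "queryType": "SQL",
        "query": "SELECT * FROM dfs.`/tmp/sales.json` LIMIT 5",
    },
)
resp.raise_for_status()
for row in resp.json()["rows"]:   # response shape per the Drill REST docs
    print(row)
```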

Bio

Coming Soon!

Curriculum

Join Charles Givre for a hands-on introduction to data exploration with Apache Drill. Becoming a data-driven business means using all the data you have available, but a common problem in many organizations is that data is not optimally arranged for ad-hoc analysis. Through a combination of lecture and hands-on exercises, you’ll gain the ability to access previously inaccessible data sources and analyze them with ease. You’ll learn how to use Drill to query and analyze structured data, connect multiple data sources to Drill, and perform cross-silo queries. Study after study shows that data scientists and analysts spend between 50% and 90% of their time preparing their data for analysis. Using Drill, you can dramatically reduce the time it takes to go from raw data to insight. This course will show you how.

Prerequisites

TBD

Open Geospatial Machine Learning with Kevin Stofan

Bio

Kevin is a Customer-Facing Data Scientist at DataRobot and an Adjunct Professor at Pennsylvania State University, where he teaches a graduate-level Geographic Information Systems (GIS) course. He has over 16 years of experience using GIS and geospatial analysis to solve real-world business problems. His experience with modeling geospatial phenomena includes geostatistical, spatial econometric, and point pattern analysis.

Curriculum

This workshop will guide attendees through the entire geospatial machine learning workflow. Attendees will be exposed to a variety of open source tools used to process, model, and visualize geospatial data, including PySAL, GDAL, and QGIS. We will work through a supervised machine learning problem to predict the sale price of single-family homes in Pinellas County, Florida, using a large number of property, structural, and socioeconomic features. This workshop will focus on concepts unique to handling geospatial data, such as spatial autocorrelation, coordinate transformations, and edge effects.
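As a small taste of one of those concepts, here is a minimal sketch of measuring spatial autocorrelation with Moran’s I using the PySAL family of packages (the shapefile name and column are hypothetical placeholders, not the workshop’s dataset):

```python
# Moran's I for spatial autocorrelation in sale prices, using the modern
# PySAL packages (libpysal, esda) plus geopandas. Inputs are hypothetical.
import geopandas as gpd
from libpysal.weights import Queen
from esda.moran import Moran

homes = gpd.read_file("pinellas_sales.shp")   # hypothetical parcel data
w = Queen.from_dataframe(homes)               # contiguity-based spatial weights
w.transform = "r"                             # row-standardize the weights
mi = Moran(homes["sale_price"], w)
print(f"Moran's I = {mi.I:.3f} (pseudo p-value = {mi.p_sim:.3f})")
```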

Prerequisites

TBD

Learning from Text: Natural Language Processing with Python with Michelle Gill

Bio

Michelle Gill is a Senior Data Scientist at Metis, where she teaches quarterly bootcamps and conducts corporate training focused on data science, machine learning, big data, and related technologies. Her career as a data scientist has spanned from basic research to management consulting. As a scientist at the National Cancer Institute, she developed machine learning algorithms and software that increased experimental throughput up to 10X. Michelle was also a consultant for The Boston Consulting Group, where she advised clients in industries ranging from pharmaceuticals to financial services on strategic growth and organizational streamlining. She has a Ph.D. in Molecular Biophysics & Biochemistry from Yale University and completed a postdoctoral fellowship at Columbia University focused on elucidating mechanisms of a cancer pathway. Outside of work, Michelle enjoys running, watching college basketball, wine, and tweeting (@modernscientist).

Curriculum

It is estimated that 80% of the world’s data is unstructured, and much of this data exists in text form. Using real-world applications, this workshop will cover the basics of working with textual data using Python. Processing text and creating a bag-of-words representation will be introduced before moving on to model building using machine learning. The workshop will culminate with a discussion of deep learning methods such as word2vec. This workshop is geared towards individuals who have basic Python and machine learning experience but are relatively new to natural language processing. Upon completion of this workshop, participants will be able to use Python to build and explore their own models on text data.
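For a flavor of the word2vec portion, here is a minimal sketch using gensim, a common choice for this method, though the workshop’s own tooling may differ (assumes gensim 4.x, where the dimension argument is `vector_size`):

```python
# Train a tiny word2vec model on a toy, pre-tokenized corpus with gensim.
from gensim.models import Word2Vec

corpus = [
    ["data", "science", "is", "fun"],
    ["python", "makes", "text", "processing", "easy"],
    ["word", "vectors", "capture", "meaning"],
    ["data", "science", "uses", "python"],
]
model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, epochs=50)
print(model.wv.most_similar("data", topn=3))  # nearest words by cosine similarity
```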

Prerequisites

TBD

Fortune-Telling with Python: An Intro to Time Series Modeling with Jonathan Balaban

Bio

Jonathan is a senior data scientist at Metis. Prior to Metis he worked as a data scientist at McKinsey and Booz Allen Hamilton, and he has taught data science at General Assembly. He has led teams to design bespoke data science solutions that have driven revolutionary changes in client operations.

Curriculum

This workshop will provide a Pythonic tour of time series methodologies and packages, including ARIMA, seasonal models, and Markov approaches.
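As a small preview of the ARIMA portion, here is a minimal sketch with statsmodels on a synthetic monthly series (our own toy data, not workshop material):

```python
# Fit an ARIMA(1,1,1) to a synthetic trending monthly series and forecast ahead.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

np.random.seed(0)
rng = pd.date_range("2020-01-01", periods=48, freq="MS")
y = pd.Series(10 + 0.3 * np.arange(48) + np.random.normal(size=48), index=rng)

fit = ARIMA(y, order=(1, 1, 1)).fit()   # AR(1), one difference, MA(1)
print(fit.summary())
print(fit.forecast(steps=6))            # six months ahead
```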

Prerequisites

Intermediate level; basic statistics and familiarity with time series data required.