Virtual Training & Workshop Sessions

– Taught by World-Class Data Scientists –

Learn the latest data science concepts, tools, and techniques from the best. Forge a connection with these rock stars from industry and academia, who are passionate about molding the next generation of data scientists.


Confirmed 2020 Virtual Training & Workshop Sessions

West Trainings: October 27th, October 28th, October 29th, October 30th
West Workshops/Tutorials: October 28th, October 29th, October 30th
Natural Language Processing with PyTorch

Half-Day Training | Deep Learning | Open Source | Beginner – Intermediate

 

Objective: Natural Language Processing (NLP) is the fastest-growing field of deep learning with interest and funding from top AI companies to solve problems of language, text, and unstructured information. This has resulted in a tremendous focus on model building that combines language, mathematics, and computer science. This workshop will focus on problems of text summarization, question answering, and sentiment classification using modern approaches to model-building (GNMT, BERT, and GPT2). We will apply this to real-world problems to create an NLP pipeline on top of the PyTorch framework and spaCy…

 

Yashesh A. Shroff, PhD
Lead Strategy Planner for AI Ecosystem Enabling | Intel
Ravi Ilango
Principal Data Scientist | Intel
Data Visualization: From Jupyter to Dashboards

Half-Day Training | Data Visualization | Intermediate

 

Data visualization is fundamental to the data science process. Using plots and graphs to convey a complex idea makes your data more accessible to everyone. In this session, you will learn the fundamentals of plotting with Pandas in Jupyter by building an interactive visualization prototype that can also run as a standalone web application/dashboard. This session is for anyone who wants to be more familiar with data visualization, hands-on, with Python, Pandas, Matplotlib, interactive widgets, and Flask...

David Yerrington
Data Science Consultant | Yerrington Consulting
Deep Learning (with TensorFlow 2)

Full-Day Training | Deep Learning | Machine Learning | Beginner – Intermediate

 

Relatively obscure a few short years ago, Deep Learning is ubiquitous today across data-driven applications as diverse as machine vision, natural language processing, and super-human game-playing.
This Deep Learning primer brings the revolutionary machine-learning approach behind contemporary artificial intelligence to life with interactive demos featuring TensorFlow 2, the major, cutting-edge revision of the world’s most popular Deep Learning library. To facilitate an intuitive understanding of Deep Learning’s artificial-neural-network foundations, the essential theory will be introduced visually and pragmatically. Paired with tips for overcoming common pitfalls and hands-on Python code run-throughs provided in straightforward Jupyter notebooks, this foundational knowledge empowers you to build powerful state-of-the-art Deep Learning models…

 

Dr. Jon Krohn
Chief Data Scientist, Author of Deep Learning Illustrated | Untapt
Advanced NLP with TensorFlow and PyTorch: LSTMs, Self-attention and Transformers

Full-Day Training | NLP | Advanced

 

Natural Language Processing (NLP) has recently experienced its own “ImageNet” moment. Rapidly evolving language models have enabled practitioners to decipher long-lost languages, translate speech in one language directly into speech in another without converting to text, generate long-form text that adapts to the style and content of human prompts, and translate between language pairs never seen explicitly by computer systems (among many other impressive results).
In this training, you will develop a theoretical understanding of modern NLP along with the hands-on skills needed to develop state-of-the-art models. You will implement a variety of recurrent-layer and transformer-based architectures in both TensorFlow and PyTorch for tasks including text classification, machine translation, and predictive text…

Daniel Whitenack, PhD
Instructor, Data Scientist | SIL International
Rapid Data Exploration and Analysis with Apache Drill

Half-Day Training | Beginner-Intermediate

 

Data analysts and data scientists often struggle with getting data into a usable form. Indeed, research shows that preparing data can consume up to 90% of a data scientist’s time. In this interactive workshop, you will learn how to use Apache Drill to rapidly explore a wide variety of data from a variety of sources without having to write code…

Charles Givre
VP Data & Analytics, Cybersecurity Technology and Controls | JPMorgan Chase & Co
Keras from Soup to Nuts – An Example Driven Tutorial

Half-Day Training | Intermediate

 

This workshop hopes to convince participants that Keras is a worthwhile addition to their Machine Learning toolbelt. It teaches them how to build their own Keras models, initially using components already available in Keras, then extending them by customizing some of these components, and finally exploiting the underlying TensorFlow platform for maximum flexibility and performance. They will also be able to work with the many cool (sometimes SOTA) models shared by the Keras community…

Sujit Pal
Technology Research Director | Elsevier Labs
Getting Started with Pandas for Data Analysis

Half-Day Training | Beginner

 

This tutorial offers a comprehensive introduction to the powerful pandas library for data analysis built on top of the Python programming language. Pandas represents a great step forward for graphical spreadsheet users looking to grow their data manipulation skills. I like to call it “Excel on steroids”. By completing this workshop, you’ll have a strong foundation for using Pandas in your day-to-day data analysis needs. We’ll start out with the basics (importing datasets, selecting rows and columns, filtering rows by criteria) and progress to advanced concepts like grouping values, joining multiple datasets together, and cleaning text…

Boris Paskhaver
Software Engineer | Stride Consulting
Modern and Old Reinforcement Learning Part 1

Half-Day Training | Deep Learning | Machine Learning | Beginner-Intermediate

 

Reinforcement Learning has recently progressed greatly in industry as one of the best techniques for sequential decision making and control policies. DeepMind used RL to greatly reduce energy consumption in Google’s data center. It has been used for text summarization, autonomous driving, dialog systems, and media advertisements, and in finance by JPMorgan Chase. We are at the very beginning of the adoption of these algorithms as systems are required to operate more and more autonomously.
In this workshop we will explore Reinforcement Learning, starting from its fundamentals and ending with creating our own algorithms...

Leonardo De Marchi
Head of Data Science and Analytics | Badoo (now MagicLab, which owns several apps)
Introduction to Scikit-learn: Machine learning in Python

Half-Day Training | Machine Learning | Open Source | Beginner

 

Scikit-learn is a machine learning library in Python that is used by many data science practitioners. Machine learning is a valuable tool used across many domains such as medicine, physics, and finance. We will start this training by learning about scikit-learn’s API for supervised machine learning. scikit-learn’s API mainly consists of three methods: fit, to build models, predict, to make predictions from models, and transform, to change the representation of the input data...

Thomas Fan
Staff Associate - Machine Learning | Columbia University in the City of New York
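
The fit / predict / transform convention named in this session's description can be illustrated with a small standard-library sketch. It assumes two hypothetical toy classes, `MeanScaler` and `ThresholdClassifier`, that only mimic the method names; they are not part of scikit-learn and do not reproduce its behavior.

```python
class MeanScaler:
    """Toy transformer (hypothetical): fit learns the mean, transform subtracts it."""

    def fit(self, values):
        self.mean_ = sum(values) / len(values)
        return self

    def transform(self, values):
        return [v - self.mean_ for v in values]


class ThresholdClassifier:
    """Toy model (hypothetical): fit learns the midpoint between class means."""

    def fit(self, values, labels):
        zeros = [v for v, y in zip(values, labels) if y == 0]
        ones = [v for v, y in zip(values, labels) if y == 1]
        self.threshold_ = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
        return self

    def predict(self, values):
        return [1 if v > self.threshold_ else 0 for v in values]


# Made-up one-dimensional training data with binary labels.
X_train = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
y_train = [0, 0, 0, 1, 1, 1]

scaler = MeanScaler().fit(X_train)                       # fit: learn from data
model = ThresholdClassifier().fit(scaler.transform(X_train), y_train)
predictions = model.predict(scaler.transform([2.5, 7.5]))  # transform, then predict
print(predictions)
```

The point of the shared method names is that preprocessing steps and models can be chained in the same way regardless of what each one learns during `fit`.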
Intermediate Machine Learning with Scikit-learn: Cross-validation, Parameter Tuning, Pandas Interoperability, and Missing Values

Half-Day Training | Machine Learning | Open Source | Intermediate

 

Scikit-learn is a machine learning library in Python that is used by many data science practitioners. In this training, we will learn about cross validation, tuning machine learning algorithms and pandas interoperability. We will start by learning about cross validation for machine learning. Cross validation enables us to evaluate our machine learning models by splitting our data into training and testing datasets. We will cover cross validation schemes such as K-Fold cross-validation and the importance of stratifying your data. Next, we will learn about tuning algorithms in scikit-learn with grid search and random search. These hyper-parameter searching techniques help find hyper-parameter combinations that are suited for your dataset...

Thomas Fan
Staff Associate - Machine Learning | Columbia University in the City of New York
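
As a rough illustration of the K-fold and stratification ideas named in this session's description, the sketch below splits labelled samples into folds so that each fold keeps roughly the same class proportions. It is a from-scratch, standard-library sketch rather than scikit-learn's implementation; the helper name `stratified_k_fold` and the toy labels are hypothetical.

```python
from collections import defaultdict


def stratified_k_fold(labels, k):
    """Yield (train_indices, test_indices) pairs for k stratified folds.

    Hypothetical helper for illustration only.
    """
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Deal each class's indices round-robin across the folds so that every
    # fold receives a similar share of each class.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)

    # Each fold serves once as the test set; the rest form the training set.
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


labels = ["a"] * 6 + ["b"] * 3  # imbalanced toy labels (2:1)
splits = list(stratified_k_fold(labels, k=3))
for train, test in splits:
    print("test:", test, "classes:", [labels[i] for i in test])
```

Without the per-class grouping, a plain split of these ordered labels could leave a test fold with no "b" samples at all, which is the problem stratification is meant to avoid.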
Intermediate Machine Learning with Scikit-learn: Evaluation, Calibration, and Inspection

Half-Day Training | Machine Learning | Open Source | Intermediate

 

Scikit-learn is a machine learning library in Python that is used by many data science practitioners. In this training, we will learn about model evaluation, model calibration, and model inspection. We will start by learning about evaluating a machine learning model after it is trained. We will compare various metrics such as ROC AUC and mean average precision and see how they behave on datasets with different characteristics. We will use scikit-learn’s plotting API to easily visualize the performance of a model and to compare multiple models. Next, we will learn about how to calibrate a machine learning model with scikit-learn. A well-calibrated model will predict probabilities that reflect the true likelihood of an event...

Thomas Fan
Staff Associate - Machine Learning | Columbia University in the City of New York
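
The statement in this session's description that a well-calibrated model predicts probabilities reflecting the true likelihood of an event can be checked with a simple reliability table: group predictions into probability bins and compare the mean predicted probability with the observed fraction of positives in each bin. The sketch below is a minimal standard-library version under that assumption; the function name `reliability_table` and the toy data are hypothetical, and it is not scikit-learn's calibration tooling.

```python
def reliability_table(probabilities, outcomes, n_bins=2):
    """Return (mean predicted probability, observed positive rate) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probabilities, outcomes):
        # Map p in [0, 1] to a bin index; p == 1.0 goes into the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, y))

    table = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            positive_rate = sum(y for _, y in members) / len(members)
            table.append((mean_p, positive_rate))
    return table


# Made-up predictions: five at 0.2 and five at 0.8, with matching outcome rates.
probs = [0.2] * 5 + [0.8] * 5
outcomes = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

table = reliability_table(probs, outcomes)
for mean_p, positive_rate in table:
    print(f"predicted {mean_p:.2f} -> observed {positive_rate:.2f}")
```

In this toy data the two columns agree in each bin, which is what good calibration looks like; a model that predicted 0.8 for events that occur only half the time would show a visible gap.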
Advanced Machine Learning with Scikit-learn: Text Data, Imbalanced Data, and Poisson Regression

Half-Day Training | Machine Learning | Open Source | Intermediate

 

Scikit-learn is a machine learning library in Python that is used by many data science practitioners. In this training, we will learn about processing text data, working with imbalanced data and Poisson regression. We will start by learning about processing text data with scikit-learn’s CountVectorizer and TfidfVectorizer. The CountVectorizer converts a collection of text documents into a matrix of token counts. We will explore the hyper-parameters that the CountVectorizer provides for creating these token counts. The TfidfVectorizer weights the count features into floating point values using the term frequency and inverse document-frequency...

Thomas Fan
Staff Associate - Machine Learning | Columbia University in the City of New York
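
To make the token-count matrix and the term-frequency / inverse-document-frequency weighting from this session's description concrete, here is a small from-scratch sketch using only the standard library. It assumes whitespace tokenization and the textbook weighting tf × log(N / df); scikit-learn's vectorizers use richer tokenization and, by default, a smoothed and normalized variant, so the numbers here will not match theirs. The three documents are made up.

```python
import math
from collections import Counter

docs = ["the cat sat", "the dog sat", "the cat ran home"]

# Tokenize on whitespace (an assumption made for this sketch).
tokenized = [doc.split() for doc in docs]
vocabulary = sorted({token for tokens in tokenized for token in tokens})

# Count matrix: one row per document, one column per vocabulary term.
counts = []
for tokens in tokenized:
    token_counts = Counter(tokens)
    counts.append([token_counts[term] for term in vocabulary])

# Document frequency: how many documents contain each term.
n_docs = len(docs)
doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocabulary))]

# Textbook tf-idf: term count times log(N / document frequency).
tfidf = [
    [row[j] * math.log(n_docs / doc_freq[j]) for j in range(len(vocabulary))]
    for row in counts
]

print(vocabulary)
print(counts)
```

A term such as "the" that appears in every document gets a weight of zero under this formula, while a term unique to one document, such as "dog", gets the largest weight.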
Introduction to Shiny Application Development

Half-Day Training

 

Turning raw data into meaningful information and telling data-driven stories is one of the great challenges of data science. When your data does not have you to speak for it in a live situation, your application needs to communicate your message clearly and provide simple interfaces and meaningful interactions to drive your message home to consumers.
In this session you will learn to use Shiny to build a dashboard, going from a blank page to an interactive application, using the programming language R, the free R development environment RStudio, and Redis. We will use free public data and open source libraries as we sculpt our dashboard together...

Bethany Poulin
Data Science Instructor | General Assembly
Painting with Data: Introduction to d3.js

Half-Day Training | Data Visualization | Intermediate-Advanced

 

In this workshop we will build an interactive data visualization from scratch using d3.js in the browser. The possibilities shown in d3 examples are exciting, but the API surface of d3 and the various browser standards like HTML, CSS, SVG and JavaScript can be overwhelming. Think of this workshop as a guided tour that will point out the important things to pay attention to as we go step-by-step from CSV file to interactive visualization...

Ian Johnson
User Experience Engineer, Organizer | Google, Bay Area D3
Modern and Old Reinforcement Learning Part 2

Half-Day Training | Deep Learning | Machine Learning | Beginner – Intermediate

 

Reinforcement Learning has recently progressed greatly in industry as one of the best techniques for sequential decision making and control policies. DeepMind used RL to greatly reduce energy consumption in Google’s data center. It has been used for text summarization, autonomous driving, dialog systems, and media advertisements, and in finance by JPMorgan Chase. We are at the very beginning of the adoption of these algorithms as systems are required to operate more and more autonomously.
In this workshop we will explore Reinforcement Learning, starting from its fundamentals and ending with creating our own algorithms...

Leonardo De Marchi
Head of Data Science and Analytics | Badoo (now MagicLab, which owns several apps)
State of the art AI methods with TensorFlow: Transfer Learning, RL and GANs

Half-Day Training | Deep Learning | Advanced

 

Although supervised learning has dominated industry machine learning implementations, unsupervised and semi-supervised methods have started to be practically applied to real world problems (outside of playing video games). Generative Adversarial Networks (GANs) are being utilized to augment data and generate dialogue, and Reinforcement Learning (RL) is helping people plan marketing campaigns and control robots. In this training, you will develop a theoretical understanding of these and other related state-of-the-art AI methods along with the hands-on skills needed to train and utilize them. You will implement a variety of models in TensorFlow for tasks including object recognition, image generation and robotics…

Daniel Whitenack, PhD
Instructor, Data Scientist | SIL International
Hands-on Reinforcement Learning with Ray RLlib

Full-Day Training | Machine Learning | Intermediate-Advanced

 

Ray RLlib implements a wide variety of reinforcement learning algorithms and it provides the tools for adding your own. It integrates with popular frameworks like OpenAI Gym, TensorFlow, and PyTorch. It provides concise abstractions for defining the algorithm and tools you want to use, and specifying the cluster resources available. It is extensible for new algorithms, agents, and environments. Ray does the work to leverage the resources, providing state-of-the-art performance…

Dean Wampler, PhD
Principal Software Engineer, Author | Domino Data Lab
Applied Deep Learning: Building a Chess Object Detection Model with TensorFlow

Half-Day Training | Computer Vision | Deep Learning | Intermediate

 

In this tutorial, we will introduce how to build an object detection model. Specifically, we will build an object detection model that identifies chess pieces (a custom dataset provided by the presenter). In doing so, participants will gain insight into the fundamentals of computer vision: structuring a good problem for object detection, dataset collection and annotation, data preparation through preprocessing, data augmentation to support a well-fit model, training a model, debugging a model’s fit, and using the model for inference.

We will introduce Keras and TensorFlow as our specific libraries for writing computer vision models and train a YOLOv3 (You Only Look Once) model for real-time detection. Participants will leverage Colab for GPU compute.

Session Outline
– Introduction to Computer Vision Problems
– Data Preparation
– Data Preprocessing and Augmentation
– Training Our YOLOv3 Model
– Inference and Q&A...

Joseph Nelson
Cofounder, Principal Data Scientist & Faculty | Roboflow.ai, BetaVector, General Assembly
A Hands-On Tutorial for Training Interpretable Variational Autoencoders Using siVAE

Workshop | Deep Learning | Data Visualization | Intermediate

 

In this hands-on tutorial, we will introduce attendees to the siVAE (scalable, interpretable VAE) model that infers a set of factor loadings that explicitly map latent dimensions to the input features that define them, during training of the VAE model. Using standard datasets from computer vision (MNIST, Fashion-MNIST and CIFAR-10), we will walk attendees through the process of training the siVAE model, visualizing the sample embeddings inferred by classic VAEs, and extracting and visualizing the features that contribute to individual latent dimensions. We will also teach attendees how to estimate and visualize feature awareness, a new metric for measuring the overall importance of individual features for embedding a sample in the latent space. At the end of the tutorial, attendees will be able to train an siVAE model on their own datasets and interpret and visualize the latent dimensions inferred…

Gerald Quon, PhD
Assistant Professor | UC Davis Machine Learning & AI Group
State-of-the-Art Natural Language Processing with Spark NLP

Tutorial | NLP | Machine Learning | Intermediate

 

This is a hands-on tutorial on applying the latest advances in deep learning and transfer learning for common NLP tasks such as named entity recognition, document classification, spell checking, and sentiment analysis. Learn to build complete text analysis pipelines using the highly accurate, high-performance, open-source Spark NLP library in Python…

 

David Talby, PhD
CTO | Pacific AI
Solving Problems with Both Text and Numerical Data Using Gradient Boosting

Workshop | Machine Learning | Open-source | Intermediate

 

Gradient boosting is a powerful machine-learning technique that achieves state-of-the-art results in a variety of practical tasks. For a number of years, it has remained the primary method for learning problems with heterogeneous features, noisy data, and complex dependencies: web search, recommendation systems, weather forecasting, and many others.
Some problems contain different types of data, including numerical, categorical and text data. In this case the best solution is either building new numerical features in place of the text and categories and passing them to gradient boosting, or using out-of-the-box solutions for that…

Stanislav Kirillov
Senior Software Developer | Yandex
StructureBoost: Gradient Boosting with Categorical Structure

Workshop | Machine Learning | Research Frontiers | Intermediate-Advanced

 

The values of a categorical variable frequently have a structure that is not ordinal or linear in nature. For example, the months of the year have a circular structure, and the US States have a geographical structure. Standard approaches such as one-hot or numerical encoding are unable to effectively exploit the structural information of such variables. In this tutorial, we will introduce the StructureBoost gradient boosting package, wherein the structure of categorical variables can be represented by a graph, and exploited to improve predictive performance. Moreover, StructureBoost can make informed predictions on categorical values for which there is little or no data, by leveraging the knowledge of the structure. We will walk through examples of how to configure and train models using StructureBoost and demonstrate other features of the package…
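
The circular structure of the months mentioned in this description can be written down as a graph in a few lines. The sketch below uses only the standard library and is purely illustrative: it does not use or reproduce the StructureBoost API, and the names `graph` and `circular_distance` are hypothetical.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Cycle graph: each month is adjacent to the previous and the next month,
# and December wraps around to January.
graph = {
    month: {MONTHS[(i - 1) % 12], MONTHS[(i + 1) % 12]}
    for i, month in enumerate(MONTHS)
}


def circular_distance(a, b):
    """Number of edges on the shortest path between two months in the cycle."""
    gap = abs(MONTHS.index(a) - MONTHS.index(b))
    return min(gap, 12 - gap)


print(sorted(graph["Dec"]))
print(circular_distance("Dec", "Jan"))
print(circular_distance("Jan", "Jul"))
```

A numerical encoding (Jan = 1, ..., Dec = 12) would place December and January eleven steps apart, and a one-hot encoding treats all months as equally unrelated; the graph records that they are neighbors.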