Training & Workshop Sessions

– Taught by World-Class Data Scientists –

Learn the latest data science concepts, tools, and techniques from the best. Forge a connection with these rock stars from industry and academia, who are passionate about molding the next generation of data scientists.

Highly Experienced Instructors

Our instructors are highly regarded in data science, coming from both academia and renowned companies.

Real World Applications

Gain the skills and knowledge to use data science in your career and business, without breaking the bank.

Cutting Edge Subject Matter

Find training sessions offered on a wide variety of data science topics from machine learning to data visualization to DevOps.

Training, Workshop & Tutorial Sessions

ODSC Europe 2019 will host training and workshop sessions on some of the latest and most in-demand techniques, models and frameworks, including:

Training Focus Areas

  • Deep Learning and Reinforcement Learning

  • Machine Learning, Transfer Learning and Adversarial Learning 

  • Computer Vision

  • NLP, Speech, Text Analytics and Spatial Analysis

  • Data Visualization

Quick Facts

  • Choose from 25 training sessions

  • Choose from 30+ workshops

  • Hands-on training sessions are 4 hours in duration

  • Workshops and tutorials are 2 hours in duration

Frameworks

  • TensorFlow, PyTorch, and MXNet

  • Scikit-learn, PyMC3, Pandas, Theano, NLTK, NumPy, SciPy

  • Keras, Apache Spark, Apache Storm, Airflow, Apache Kafka

  • Kubernetes, Kubeflow, Apache Ignite, Hadoop

Europe 2019 Confirmed Instructors

Training Sessions

More sessions added weekly

Training: R for Python Programmers

Should a data scientist use R or should they use Python? The answer to this rather delicate question is, of course, that they should know a bit of each and use the most appropriate language for the task in hand. In this tutorial, we’ll take participants through the R tidyverse, one of the (many!) areas where R shines. The tidyverse is essential for any data scientist who deals with data on a day-to-day basis. By focusing on small key tasks, the tidyverse suite of packages removes the pain of data manipulation. We’ll cover some of the core features of the tidyverse, such as dplyr (the workhorse of the tidyverse), string manipulation, graphics and the concept of tidy data…more details

Instructor Bio

Dr. Colin Gillespie is a Senior Lecturer (Associate Professor) at Newcastle University, UK. His research interests are high-performance statistical computing and Bayesian statistics. He is regularly employed as a consultant by Jumping Rivers and has been teaching R since 2005 at a variety of levels, ranging from beginners to advanced programming.

Dr. Colin Gillespie

Senior Lecturer at Newcastle University

Training: Deep Reinforcement Learning

Reinforcement learning has recently made great progress in industry as one of the best techniques for sequential decision making and control policies.

DeepMind used RL to greatly reduce energy consumption in Google’s data centre. It has been used for text summarisation, autonomous driving, dialog systems, media advertisements and in finance by JPMorgan Chase. We are at the very beginning of the adoption of these algorithms, as systems are required to operate more and more autonomously.
In this workshop we will explore Reinforcement Learning, starting from its fundamentals and ending with creating our own algorithms…more details
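
For readers new to the idea, here is a minimal, self-contained sketch of tabular Q-learning on a made-up corridor problem. It only illustrates the value-update loop at the heart of reinforcement learning; the session itself goes well beyond this into deep RL, and the toy environment below is not material from the course.

```python
# Minimal tabular Q-learning on a toy 5-state corridor (move left/right,
# reward 1 for reaching the rightmost state). Purely illustrative; deep RL
# replaces this table with a neural network.
import numpy as np

n_states, n_actions = 5, 2                 # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
    reward = 1.0 if nxt == n_states - 1 else 0.0
    return nxt, reward, nxt == n_states - 1

def choose(state):
    if rng.random() < epsilon:             # explore
        return int(rng.integers(n_actions))
    best = Q[state]                        # exploit, breaking ties randomly
    return int(rng.choice(np.flatnonzero(best == best.max())))

for episode in range(500):
    state, done, steps = 0, False, 0
    while not done and steps < 100:
        action = choose(state)
        nxt, reward, done = step(state, action)
        # Q-learning update: nudge Q(s, a) toward r + gamma * max_a' Q(s', a')
        Q[state, action] += alpha * (reward + gamma * Q[nxt].max() - Q[state, action])
        state, steps = nxt, steps + 1

print(Q.round(2))                          # learned values favour moving right
```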

Instructor Bio

Leonardo De Marchi holds a Master’s in Artificial Intelligence and has worked as a Data Scientist in the sports world, with clients such as the New York Knicks and Manchester United, and with large social networks like Justgiving.

He now works as Head of Data Science and Analytics at Badoo, the largest dating site with over 420 million users. He is also the lead instructor at ideai.io, a company specialized in Reinforcement Learning, Deep Learning and Machine Learning training.

Leonardo De Marchi

Head of Data Science and Analytics at Badoo

Training: Advanced Machine Learning with Scikit-learn, Part I

Scikit-learn is a machine learning library in Python that has become a valuable tool for many data science practitioners. This training will cover some of the more advanced aspects of scikit-learn, such as building complex machine learning pipelines, advanced model evaluation, feature engineering and working with imbalanced datasets. We will also work with text data using the bag-of-words method for classification.

This workshop assumes familiarity with Jupyter notebooks and the basics of pandas, matplotlib and numpy. It also assumes some familiarity with the API of scikit-learn and how to do cross-validation and grid search with scikit-learn.
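
As a small taste of the pipeline-building covered here, the sketch below chains a bag-of-words vectorizer and a linear classifier inside a scikit-learn pipeline and scores it with cross-validation. The tiny text corpus is a placeholder, not course material.

```python
# Bag-of-words text classification in a scikit-learn Pipeline, evaluated
# with cross-validation. Toy data stands in for a real corpus.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

texts = [
    "great talk on pipelines", "loved the feature engineering demo",
    "clear and useful training", "well structured session",
    "the room was too cold", "audio kept cutting out",
    "slides were hard to read", "venue was difficult to find",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = about content, 0 = about logistics

model = make_pipeline(
    CountVectorizer(),                  # bag-of-words features
    LogisticRegression(max_iter=1000),  # linear classifier on top
)
scores = cross_val_score(model, texts, labels, cv=4)
print("cross-validated accuracy:", scores.mean())
```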

Instructor Bio

Andreas Mueller received his MS degree in Mathematics (Dipl.-Math.) in 2008 from the Department of Mathematics at the University of Bonn. In 2013, he finalized his PhD thesis at the Institute for Computer Science at the University of Bonn. After working as a machine learning scientist at the Amazon Development Center Germany in Berlin for a year, he joined the Center for Data Science at New York University at the end of 2014. In his current position as assistant research engineer at the Center for Data Science, he works on open source tools for machine learning and data science. He has been one of the core contributors of scikit-learn, a machine learning toolkit widely used in industry and academia, for several years, and has authored and contributed to a number of open source projects related to machine learning.

Andreas Mueller, PhD

Author, Research Scientist, Core Contributor of scikit-learn at Columbia Data Science Institute

Training: Advanced Machine Learning with Scikit-learn, Part II

Scikit-learn is a machine learning library in Python that has become a valuable tool for many data science practitioners. This training will cover some advanced topics in using scikit-learn, such as how to perform out-of-core learning with scikit-learn and how to speed up parameter search. We’ll also cover how to build your own models or feature extraction methods that are compatible with scikit-learn, which is important for feature extraction in many domains. We will see how we can customize scikit-learn even further, using custom methods for cross-validation or model evaluation.

This workshop assumes familiarity with Jupyter notebooks and the basics of pandas, matplotlib and numpy. It also assumes experience using scikit-learn and familiarity with its API.
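
One of the topics above, building your own scikit-learn-compatible components, boils down to implementing fit and transform and inheriting the standard mixins. A minimal sketch, with a made-up feature extractor rather than anything from the course, might look like this:

```python
# A custom feature extractor that plugs into scikit-learn: implement
# fit/transform and inherit the mixins so it works inside Pipeline,
# cross-validation and grid search.
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import Ridge

class RowStats(BaseEstimator, TransformerMixin):
    """Appends per-row mean and standard deviation as extra features."""
    def fit(self, X, y=None):
        return self                       # nothing to learn here
    def transform(self, X):
        X = np.asarray(X, dtype=float)
        extra = np.column_stack([X.mean(axis=1), X.std(axis=1)])
        return np.hstack([X, extra])

X = np.random.RandomState(0).rand(100, 5)
y = X.sum(axis=1)
model = make_pipeline(RowStats(), Ridge()).fit(X, y)
print(model.predict(X[:3]))
```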

Instructor Bio

Andreas Mueller received his MS degree in Mathematics (Dipl.-Math.) in 2008 from the Department of Mathematics at the University of Bonn. In 2013, he finalized his PhD thesis at the Institute for Computer Science at the University of Bonn. After working as a machine learning scientist at the Amazon Development Center Germany in Berlin for a year, he joined the Center for Data Science at New York University at the end of 2014. In his current position as assistant research engineer at the Center for Data Science, he works on open source tools for machine learning and data science. He has been one of the core contributors of scikit-learn, a machine learning toolkit widely used in industry and academia, for several years, and has authored and contributed to a number of open source projects related to machine learning.

Andreas Mueller, PhD

Author, Research Scientist, Core Contributor of scikit-learn at Columbia Data Science Institute

Training: Building Recommendation Engines and Deep Learning Models Using Python, R and SAS

Deep learning is the newest area of machine learning and has become ubiquitous in predictive modeling. The complex, brain-like structure of deep learning models is used to find intricate patterns in large volumes of data. These models have greatly improved the performance of general supervised models, time series, speech recognition, object detection and classification, and sentiment analysis.

Factorization machines are a relatively new and powerful tool for modeling high-dimensional and sparse data. Most commonly they are used as recommender systems by modeling the relationship between users and items. For example, factorization machines can be used to recommend your next Netflix binge based on how you and other streamers rate content…more details
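
For intuition, the factorization-machine prediction itself fits in a few lines of NumPy: a global bias, linear weights, and pairwise interactions expressed through latent factors. The numbers below are random placeholders rather than anything from the session.

```python
# Core factorization-machine prediction: bias + linear terms + pairwise
# interactions captured through k-dimensional latent factors.
import numpy as np

rng = np.random.default_rng(0)
n_features, k = 6, 3                  # e.g. one-hot user + item indicators
x = rng.integers(0, 2, n_features).astype(float)   # sparse input row
w0 = 0.1                              # global bias
w = rng.normal(size=n_features)       # linear weights
V = rng.normal(size=(n_features, k))  # latent factor matrix

# Pairwise term in O(k * n) via the standard identity:
# sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2]
linear = w @ x
pairwise = 0.5 * np.sum((V.T @ x) ** 2 - (V.T ** 2) @ (x ** 2))
y_hat = w0 + linear + pairwise
print(y_hat)
```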

Instructor Bio

Jordan Bakerman holds a Ph.D. in statistics from North Carolina State University. His dissertation centered on using social media to forecast real-world events, such as civil unrest and influenza rates. As an intern at SAS, Jordan wrote the SAS Programming for R Users course to help students efficiently transition from R to SAS using a cookbook-style approach. As an employee, Jordan has developed courses demonstrating how to integrate open source software within SAS products. He is passionate about statistics, programming, and helping others become better statisticians.

Jordan Bakerman, PhD

Sr. Analytical Training Consultant at SAS

Training: Building Recommendation Engines and Deep Learning Models Using Python, R and SAS

Deep learning is the newest area of machine learning and has become ubiquitous in predictive modeling. The complex, brain-like structure of deep learning models is used to find intricate patterns in large volumes of data. These models have greatly improved the performance of general supervised models, time series, speech recognition, object detection and classification, and sentiment analysis.

Factorization machines are a relatively new and powerful tool for modeling high-dimensional and sparse data. Most commonly they are used as recommender systems by modeling the relationship between users and items. For example, factorization machines can be used to recommend your next Netflix binge based on how you and other streamers rate content…more details

Instructor Bio

Ari holds bachelor’s degrees in both physics and mathematics from UNC-Chapel Hill. His research focused on collecting and analyzing low energy physics data to better understand the neutrino. Ari taught introductory and advanced physics and scientific programming courses at UC-Berkeley while working on a master’s in physics with a focus on nonlinear dynamics. While at SAS, Ari has worked to develop courses that teach how to use Python code to control SAS analytical procedures.

Ari Zitin

Analytical Training Consultant at SAS

Training: All The Cool Things You Can Do With PostgreSQL To Next Level Your Data Analysis

The intention of this VERY hands-on workshop is to introduce you to, and get you playing with, some of the great features you never knew about in PostgreSQL. You know, and probably already love, PostgreSQL as your relational database. We will show you how you can forget about using ElasticSearch, MongoDB, and Redis for a broad array of use cases. We will add in some nice statistical work with R embedded in PostgreSQL. Finally, we will bring this all together using the gold standard in spatial databases, PostGIS. Unless you have a specialized use case, PostgreSQL is the answer. The session will be very hands-on, with plenty of interactive exercises.

By the end of the workshop participants will leave with hands-on experience doing:
  • Spatial Analysis
  • JSON search
  • Full Text Search
  • Using R for stored procedures and functions
all in PostgreSQL (a small Python sketch of two of these queries appears below).
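
The sketch below drives two of those features from Python with psycopg2. The connection string and the "events" / "articles" tables are hypothetical placeholders and would need to point at your own PostgreSQL instance.

```python
# JSONB containment search and full-text search in PostgreSQL from Python.
import psycopg2

conn = psycopg2.connect("dbname=demo user=postgres")  # placeholder DSN
cur = conn.cursor()

# JSON search: find rows whose JSONB payload contains a key/value pair.
cur.execute(
    "SELECT id, payload FROM events WHERE payload @> %s::jsonb",
    ('{"status": "failed"}',),
)
print(cur.fetchall())

# Full-text search: rank articles matching a plain-language query.
cur.execute(
    """
    SELECT title, ts_rank(to_tsvector('english', body), q) AS rank
    FROM articles, plainto_tsquery('english', %s) AS q
    WHERE to_tsvector('english', body) @@ q
    ORDER BY rank DESC LIMIT 5
    """,
    ("postgres full text search",),
)
print(cur.fetchall())
cur.close(); conn.close()
```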

Instructor Bio

Steve is a Partner and Director of Developer Relations for Crunchy Data (the PostgreSQL people). He goes around and shows off all the great work the PostgreSQL community and Crunchy committers do. He can teach you about data analysis with PostgreSQL, Java, Python, MongoDB, and some JavaScript. He has deep subject-area expertise in GIS/spatial, statistics, and ecology. He has spoken at over 75 conferences and given over 50 workshops, including Monktoberfest, MongoNY, JavaOne, FOSS4G, CTIA, AjaxWorld, GeoWeb, Where2.0, and OSCON. Before Crunchy Data, Steve was a developer evangelist for DigitalGlobe, Red Hat, LinkedIn, deCarta, and ESRI. Steve has a Ph.D. in Ecology.

Steven Pousty, PhD

Director of Developer Relations at Crunchy Data

Training: Machine Learning in R, Part I

Modern statistics has become almost synonymous with machine learning, a collection of techniques that utilize today’s incredible computing power. This two-part course focuses on the available methods for implementing machine learning algorithms in R, and will examine some of the underlying theory behind the curtain. We start with the foundation of it all, the linear model and its generalization, the glm. We look at how to assess model quality with traditional measures and cross-validation, and how to visualize models with coefficient plots. Next we turn to penalized regression with the Elastic Net, and after that to boosted decision trees using xgboost. Attendees should have a good understanding of linear models and classification and should have R and RStudio installed, along with the `glmnet`, `xgboost`, `boot`, `ggplot2`, `UsingR` and `coefplot` packages...more details

Instructor Bio

Jared Lander is the Chief Data Scientist of Lander Analytics, a data science consultancy based in New York City, the Organizer of the New York Open Statistical Programming Meetup and the New York R Conference, and an Adjunct Professor of Statistics at Columbia University. With a master’s in statistics from Columbia University and a bachelor’s in mathematics from Muhlenberg College, he has experience in both academic research and industry. His work for both large and small organizations ranges from music and fundraising to finance and humanitarian relief efforts.
He specializes in data management, multilevel models, machine learning, generalized linear models and statistical computing. He is the author of R for Everyone: Advanced Analytics and Graphics, a book about R programming geared toward data scientists and non-statisticians alike, and is creating a course on glmnet with DataCamp.

Jared Lander

Chief Data Scientist at Lander Analytics, Author of R for Everyone, Professor at Columbia Business School

Training: Introduction to Machine Learning

Machine learning has become an indispensable tool across many areas of research and commercial applications. From text-to-speech for your phone to detecting the Higgs boson, machine learning excels at extracting knowledge from large amounts of data. This talk will give a general introduction to machine learning, as well as introduce practical tools for you to apply machine learning in your research. We will focus on one particularly important subfield of machine learning, supervised learning. The goal of supervised learning is to “learn” a function that maps inputs x to an output y, by using a collection of training data consisting of input-output pairs. We will walk through formalizing a problem as a supervised machine learning problem, creating the necessary training data, and applying and evaluating a machine learning algorithm. The talk should give you all the necessary background to start using machine learning yourself.
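
The supervised-learning recipe described here (inputs x, outputs y, training pairs, evaluation on unseen data) looks roughly like this in scikit-learn, using a built-in dataset purely for illustration:

```python
# End-to-end supervised learning: split input-output pairs, fit a model
# on the training portion, evaluate on data the model has not seen.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)          # inputs x and outputs y
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```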

Instructor Bio

Andreas Mueller received his MS degree in Mathematics (Dipl.-Math.) in 2008 from the Department of Mathematics at the University of Bonn. In 2013, he finalized his PhD thesis at the Institute for Computer Science at the University of Bonn. After working as a machine learning scientist at the Amazon Development Center Germany in Berlin for a year, he joined the Center for Data Science at New York University at the end of 2014. In his current position as assistant research engineer at the Center for Data Science, he works on open source tools for machine learning and data science. He has been one of the core contributors of scikit-learn, a machine learning toolkit widely used in industry and academia, for several years, and has authored and contributed to a number of open source projects related to machine learning.

Andreas Mueller, PhD

Author, Research Scientist, Core Contributor of scikit-learn at Columbia Data Science Institute

Training: TFX: Production ML Pipelines with TensorFlow

Putting machine learning models into production is now mission critical for every business – no matter what size.

TensorFlow is the industry-leading platform for developing, modeling, and serving deep learning solutions. But putting together a complete pipeline for deploying and maintaining a production application of AI and deep learning is much more than training a model. Google has taken years of experience in developing production ML pipelines and offered the open source community TensorFlow Extended (TFX), an open source version of tools and libraries that Google uses internally.

Learn what’s involved in creating a production pipeline, and walk through working code in an example pipeline with experts from Google. You’ll be able to take what you learn and get started on creating your own pipelines for your applications.
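
TFX itself is best learned from the session; as one small, concrete piece of the production hand-off such a pipeline automates, the sketch below only shows exporting a trained model in the SavedModel format that a serving system consumes, using plain TensorFlow/Keras. It is not TFX API code, and the model and paths are placeholders.

```python
# Train a tiny Keras model and export it as a SavedModel -- the artifact a
# serving system (e.g. TensorFlow Serving) loads. A real TFX pipeline also
# automates data validation, transforms, evaluation and pushing.
import numpy as np
import tensorflow as tf

X = np.random.rand(256, 4).astype("float32")
y = (X.sum(axis=1) > 2).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=3, verbose=0)

# Versioned export directory, as a serving system expects (TF 2.x API).
tf.saved_model.save(model, "exported_model/1")
```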

Instructor Bio

Kevin Haas is a senior engineering manager at Google Research, driving the open source adoption of TensorFlow Extended (https://github.com/tensorflow/tfx), one of Google’s production ML platforms. Kevin previously served as an engineering leader for multiple machine learning and infrastructure efforts in Google Cloud, Research, and Infrastructure teams. Prior to Google, Kevin led knowledge and search infrastructure efforts at multiple Internet and software companies, including IBM, Microsoft, and Yahoo! Kevin received his MS in Computer Science from Stanford University with dual specializations in distributed systems and databases.

Kevin Haas

Senior Engineering Manager at Google

Training: TFX: Production ML Pipelines with TensorFlow

Putting machine learning models into production is now mission critical for every business – no matter what size.

TensorFlow is the industry-leading platform for developing, modeling, and serving deep learning solutions. But putting together a complete pipeline for deploying and maintaining a production application of AI and deep learning is much more than training a model. Google has taken years of experience in developing production ML pipelines and offered the open source community TensorFlow Extended (TFX), an open source version of tools and libraries that Google uses internally.

Learn what’s involved in creating a production pipeline, and walk through working code in an example pipeline with experts from Google. You’ll be able to take what you learn and get started on creating your own pipelines for your applications.

Instructor Bio

Jarek is the Technical Program Manager at Google Research responsible for TensorFlow Extended (tensorflow.org/tfx). TFX is Google’s production-scale end-to-end machine learning platform based on TensorFlow. Jarek joined Google in 2010; he has an MS in Software Management from Carnegie Mellon University and a BS in Computer Science with a concentration in AI from The University of Memphis. Prior to joining Google, he worked on building telecommunications systems software at Hewlett Packard, BEA Systems, Alcatel and a couple of startups.

Jarek Wilkiewicz

Technical Program Manager at Google

Training: Opening The Black Box -- Interpretability In Deep Learning

The recent application of deep neural networks to long-standing problems has brought a breakthrough in performance and prediction power. However, high accuracy often comes at the price of interpretability, i.e. many of these models are black boxes that fail to provide explanations for their predictions. This tutorial focuses on illustrating some of the recent advancements in the field of interpretable artificial intelligence. We will show some common techniques that can be used to explain the predictions of pretrained models and to shed light on their inner mechanisms. The tutorial aims to strike the right balance between theoretical input and practical exercises, and has been designed to provide participants not only with the theory behind deep learning interpretability, but also with a set of frameworks, tools and real-life examples that they can apply in their own projects.
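
One of the simpler techniques in this space is an input-gradient ("saliency") explanation, sketched below for a small Keras model on synthetic data. It is only meant to illustrate the idea of attributing a prediction back to input features, and is not drawn from the tutorial's materials.

```python
# Input-gradient saliency: the gradient of the predicted score with respect
# to the input shows which features push the prediction most.
import numpy as np
import tensorflow as tf

X = np.random.rand(512, 10).astype("float32")
y = (X[:, 0] + 2 * X[:, 3] > 1.2).astype("float32")   # features 0 and 3 matter

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(X, y, epochs=20, verbose=0)

x = tf.convert_to_tensor(X[:1])            # one example to explain
with tf.GradientTape() as tape:
    tape.watch(x)
    score = model(x)
saliency = tape.gradient(score, x).numpy()[0]
print(np.abs(saliency).round(3))           # features 0 and 3 should dominate
```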

Instructor Bio

Matteo is a Research Staff Member in Cognitive Health Care and Life Sciences at IBM Research Zürich. He is currently working on the development of multimodal deep learning models for drug discovery using chemical features and omic data. He also researches multimodal learning techniques for the analysis of pediatric cancers in an H2020 EU project, iPC, with the aim of creating treatment models for patients. He received his degree in Mathematical Engineering from Politecnico di Milano in 2013. After getting his MSc he worked at a startup, Moxoff SpA, as a software engineer and analyst for scientific computing. In 2019 he obtained his doctoral degree at the end of a joint PhD program between IBM Research and the Institute of Molecular Systems Biology, ETH Zürich, with a thesis on multimodal learning approaches for precision medicine.

Matteo Manica, PhD

Research Staff Member at Cognitive Health Care & Life Sciences, IBM Research Zürich

Training: Introduction to RMarkdown in Shiny

  • Markdown Primer (45 minutes): Structure Documents with Sections and Subsections, Formatting Text, Creating Ordered and Unordered Lists, Making Links, Number Sections, Include Table of Contents
  • Integrate R Code (30 minutes): Insert Code Chunks, Hide Code, Set Chunk Options, Draw Plots, Speed Up Code with Caching
  • Build RMarkdown Slideshows (20 minutes): Understand Slide Structure, Create Sections, Set Background Images, Include Speaker Notes, Open Slides in Speaker Mode
  • Develop Flexdashboards (30 minutes): Start with the Flexdashboard Layout, Design Columns and Rows, Use Multiple Pages, Create Social Sharing, Include Code
  • Shiny Inputs: Drop Downs, Text, Radio, Checks
  • Shiny Outputs: Text, Tables, Plots
  • Reactive Expressions
  • HTML Widgets: Interactive Plots, Interactive Maps, Interactive Tables
  • Shiny Layouts: UI and Server Files, User Interface

Instructor Bio

Jared Lander is the Chief Data Scientist of Lander Analytics, a data science consultancy based in New York City, the Organizer of the New York Open Statistical Programming Meetup and the New York R Conference, and an Adjunct Professor of Statistics at Columbia University. With a master’s in statistics from Columbia University and a bachelor’s in mathematics from Muhlenberg College, he has experience in both academic research and industry. His work for both large and small organizations ranges from music and fundraising to finance and humanitarian relief efforts.
He specializes in data management, multilevel models, machine learning, generalized linear models and statistical computing. He is the author of R for Everyone: Advanced Analytics and Graphics, a book about R programming geared toward data scientists and non-statisticians alike, and is creating a course on glmnet with DataCamp.

Jared Lander

Chief Data Scientist at Lander Analytics, Author of R for Everyone, Professor at Columbia Business School

Training: Intermediate RMarkdown in Shiny

  • Markdown Primer (45 minutes): Structure Documents with Sections and Subsections, Formatting Text, Creating Ordered and Unordered Lists, Making Links, Number Sections, Include Table of Contents
  • Integrate R Code (30 minutes): Insert Code Chunks, Hide Code, Set Chunk Options, Draw Plots, Speed Up Code with Caching
  • Build RMarkdown Slideshows (20 minutes): Understand Slide Structure, Create Sections, Set Background Images, Include Speaker Notes, Open Slides in Speaker Mode
  • Develop Flexdashboards (30 minutes): Start with the Flexdashboard Layout, Design Columns and Rows, Use Multiple Pages, Create Social Sharing, Include Code
  • Shiny Inputs: Drop Downs, Text, Radio, Checks
  • Shiny Outputs: Text, Tables, Plots
  • Reactive Expressions
  • HTML Widgets: Interactive Plots, Interactive Maps, Interactive Tables
  • Shiny Layouts: UI and Server Files, User Interface

Instructor Bio

Jared Lander is the Chief Data Scientist of Lander Analytics, a data science consultancy based in New York City, the Organizer of the New York Open Statistical Programming Meetup and the New York R Conference, and an Adjunct Professor of Statistics at Columbia University. With a master’s in statistics from Columbia University and a bachelor’s in mathematics from Muhlenberg College, he has experience in both academic research and industry. His work for both large and small organizations ranges from music and fundraising to finance and humanitarian relief efforts.
He specializes in data management, multilevel models, machine learning, generalized linear models and statistical computing. He is the author of R for Everyone: Advanced Analytics and Graphics, a book about R programming geared toward data scientists and non-statisticians alike, and is creating a course on glmnet with DataCamp.

Jared Lander

Chief Data Scientist at Lander Analytics, Author of R for Everyone, Professor at Columbia Business School

Training: Intermediate Machine Learning with Scikit-learn

Scikit-learn is a machine learning library in Python that has become a valuable tool for many data science practitioners. This talk will cover some of the more advanced aspects of scikit-learn, such as building complex machine learning pipelines, model evaluation, parameter search, and out-of-core learning. Apart from metrics for model evaluation, we will cover how to evaluate model complexity, how to tune parameters with grid search and randomized parameter search, and what their trade-offs are. We will also cover out-of-core text feature processing via feature hashing.
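
The grid-search versus randomized-search trade-off mentioned above can be seen in a few lines: the grid enumerates every combination, while the randomized search samples a fixed budget of candidates from distributions. The parameter ranges below are arbitrary choices for illustration.

```python
# Grid search vs. randomized parameter search on a small pipeline.
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))

grid = GridSearchCV(pipe, {"logisticregression__C": [0.01, 0.1, 1, 10]}, cv=5)
grid.fit(X, y)
print("grid best:", grid.best_params_, round(grid.best_score_, 3))

rand = RandomizedSearchCV(
    pipe, {"logisticregression__C": loguniform(1e-3, 1e2)},
    n_iter=10, cv=5, random_state=0,
)
rand.fit(X, y)
print("randomized best:", rand.best_params_, round(rand.best_score_, 3))
```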

Instructor Bio

Andreas Mueller received his MS degree in Mathematics (Dipl.-Math.) in 2008 from the Department of Mathematics at the University of Bonn. In 2013, he finalized his PhD thesis at the Institute for Computer Science at the University of Bonn. After working as a machine learning scientist at the Amazon Development Center Germany in Berlin for a year, he joined the Center for Data Science at New York University at the end of 2014. In his current position as assistant research engineer at the Center for Data Science, he works on open source tools for machine learning and data science. He has been one of the core contributors of scikit-learn, a machine learning toolkit widely used in industry and academia, for several years, and has authored and contributed to a number of open source projects related to machine learning.

Andreas Mueller, PhD

Author, Research Scientist, Core Contributor of scikit-learn at Columbia Data Science Institute

Training: Reinforcement Learning with TF-Agents & TensorFlow 2.0: Hands On

In this workshop you will discover how machines can learn complex behaviors and anticipatory actions. With this approach, autonomous helicopters have flown aerobatic maneuvers and even the Go world champion was beaten. A training dataset containing the “right” answers is not needed, nor is “hard-coded” knowledge. The approach is called “reinforcement learning” and is almost magical.

Using TF-Agents on top of TensorFlow 2.0 we will see how a real-life problem can be turned into a reinforcement learning task. In an accompanying Python notebook, we implement – step by step – all solution elements, highlight the design of Google’s newest reinforcement learning library, point out the role of neural networks and look at optimization opportunities…more details
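
Independently of TF-Agents, "turning a real-life problem into a reinforcement learning task" usually means wrapping it in a reset/step interface that exposes observations, actions and rewards. The toy environment below is a hypothetical illustration of that interface, not code from the workshop or the TF-Agents API.

```python
# A minimal hand-rolled RL environment interface: observations, actions,
# rewards, and an episode-end signal. Libraries such as TF-Agents wrap
# exactly this kind of interface.
import random

class ThermostatEnv:
    """Keep the temperature near 21 degrees by heating or idling."""
    def reset(self):
        self.temp = random.uniform(15.0, 27.0)
        self.t = 0
        return self.temp                       # observation

    def step(self, action):                    # action: 0 = idle, 1 = heat
        self.temp += 0.8 if action == 1 else -0.4
        self.temp += random.uniform(-0.2, 0.2) # disturbance
        self.t += 1
        reward = -abs(self.temp - 21.0)        # closer to target = better
        done = self.t >= 50
        return self.temp, reward, done

env = ThermostatEnv()
obs, done, total = env.reset(), False, 0.0
while not done:
    action = 1 if obs < 21.0 else 0            # placeholder policy
    obs, reward, done = env.step(action)
    total += reward
print("episode return:", round(total, 2))
```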

Instructor Bio

Oliver Zeigermann is a developer and consultant from Hamburg, Germany. He has been involved with AI since his studies in the 90s, has written several books, and recently published the “Deep Learning Crash Course” with Manning. More at http://zeigermann.eu/

Oliver Zeigermann

Consultant at embarc / bSquare

Training: Reinforcement Learning with TF-Agents & TensorFlow 2.0: Hands On

In this workshop you will discover how machines can learn complex behaviors and anticipatory actions. With this approach, autonomous helicopters have flown aerobatic maneuvers and even the Go world champion was beaten. A training dataset containing the “right” answers is not needed, nor is “hard-coded” knowledge. The approach is called “reinforcement learning” and is almost magical.

Using TF-Agents on top of TensorFlow 2.0 we will see how a real-life problem can be turned into a reinforcement learning task. In an accompanying Python notebook, we implement – step by step – all solution elements, highlight the design of Google’s newest reinforcement learning library, point out the role of neural networks and look at optimization opportunities…more details

Instructor Bio

Christian is a consultant at bSquare with a focus on machine learning and .NET development. He has a PhD in computer algebra from ETH Zurich and did a postdoc at UC Berkeley, where he researched online data mining algorithms. Currently he applies reinforcement learning to industrial hydraulics simulations.

Christian Hidber, PhD

Consultant at bSquare

Workshop Sessions

More sessions added weekly

Instructor Bio

Alex Peattie is the co-founder and CTO of Peg, a technology platform helping multinational brands and agencies to find and work with top YouTubers. Peg is used by over 1500 organisations worldwide including Coca-Cola, L’Oreal and Google.

An experienced digital entrepreneur, Alex spent six years as a developer and consultant for the likes of Grubwithus, Huckberry, UNICEF and Nike, before joining coding bootcamp Makers Academy as senior coach, where he trained hundreds of junior developers. Alex was also a technical judge at this year’s TechCrunch Disrupt conference.

Alex Peattie

Co-founder, CTO at Peg

Workshop: Beyond Deep Learning – Differentiable Programming with Flux

Deep learning is a rapidly evolving field, and models are increasingly complex. Recently, researchers have begun to explore “differentiable programming”, a powerful way to combine neural networks with traditional programming. Differentiable programs may include control flow, functions and data structures, and can even incorporate ray tracers, simulations and scientific models, giving us unprecedented power to find subtle patterns in our data.

This workshop will show you how this technique, and particularly Flux – a state-of-the-art deep learning library – is impacting the machine learning world. We will show you how Flux makes it easy to create traditional deep learning models, and explain how the flexibility of the Julia language allows complex physical models to be optimised by the same architecture. We’ll outline important recent work and show how Flux allows us to easily combine neural networks with tools like differential equation solvers.

Instructor Bio

Avik Sengupta has worked on risk and trading systems in investment banking for many years, mostly using Java interspersed with snippets of the exotic R and K languages. This experience left him wondering whether there were better things out there. Avik’s quest came to a happy conclusion with the appearance of Julia in 2012. He has been happily coding in Julia and contributing to it ever since.

Avik Sengupta

VP of Engineering at Julia Computing

Workshop: Automatic Speech Recognition: a Paradigm Change in Motion

Instructor Bio

Ralf Schlüter serves as Academic Director and senior lecturer in the Computer Science Department at RWTH Aachen University, Germany. He leads the Automatic Speech Recognition Group at the Lehrstuhl Informatik 6: Human Language Technology and Pattern Recognition. He studied Physics at RWTH Aachen University, and Edinburgh University, UK, and received the Diplom degree in Physics, the Dr.rer.nat. degree in Computer Science, and completed his habilitation in Computer Science, all at RWTH Aachen University. His research interests cover automatic speech recognition and machine learning in general, discriminative training, neural network modeling, information theory, stochastic modeling, and speech signal analysis.

Ralf Schlüter, PhD

Academic Director at RWTH Aachen University

Workshop: From Research to Production: Performant Cross-Platform ML/DNN Model Inferencing on Cloud and Edge with ONNX Runtime

Powerful Machine Learning models trained using various frameworks such as scikit-learn, PyTorch, TensorFlow, Keras, and others can often be challenging to deploy, maintain, and performantly operationalize for latency-sensitive customer scenarios. Using the standard Open Neural Network Exchange (ONNX) model format and the open source cross-platform ONNX Runtime inference engine, these models can be scalably deployed to cloud solutions on Azure as well as local devices ranging from Windows, Mac, and Linux to various IoT hardware. Once converted to the interoperable ONNX format, the same model can be served using the cross-platform ONNX Runtime inference engine across a wide variety of technology stacks to provide maximum flexibility and reduce deployment friction…more details
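
A minimal version of that round trip, assuming the skl2onnx and onnxruntime packages, might look like the following; the model, tensor name and file name are placeholders.

```python
# Convert a trained scikit-learn model to the interoperable ONNX format
# with skl2onnx, then serve it with the ONNX Runtime inference engine.
import numpy as np
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(random_state=0).fit(X, y)

onnx_model = convert_sklearn(
    clf, initial_types=[("input", FloatTensorType([None, X.shape[1]]))]
)
with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

# Load and run the same model with ONNX Runtime.
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
pred = sess.run(None, {input_name: X[:3].astype(np.float32)})[0]
print(pred)
```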

Instructor Bio

Faith Xu is a Senior Program Manager at Microsoft on the Machine Learning Platform team, focusing on frameworks and tools. She leads efforts to enable efficient and performant productization of inferencing workflows for high-volume Microsoft products and services through the use of ONNX and ONNX Runtime. She is an evangelist for adoption of the open source ONNX standard, working with community partners to promote an open ecosystem in AI.

Faith Xu

Senior Program Manager, Machine Learning at Microsoft

Workshop: From Research to Production: Performant Cross-Platform ML/DNN Model Inferencing on Cloud and Edge with ONNX Runtime

Powerful Machine Learning models trained using various frameworks such as scikit-learn, PyTorch, TensorFlow, Keras, and others can often be challenging to deploy, maintain, and performantly operationalize for latency-sensitive customer scenarios. Using the standard Open Neural Network Exchange (ONNX) model format and the open source cross-platform ONNX Runtime inference engine, these models can be scalably deployed to cloud solutions on Azure as well as local devices ranging from Windows, Mac, and Linux to various IoT hardware. Once converted to the interoperable ONNX format, the same model can be served using the cross-platform ONNX Runtime inference engine across a wide variety of technology stacks to provide maximum flexibility and reduce deployment friction…more details

Instructor Bio

Prabhat Roy is a Data and Applied Scientist at Microsoft, where he is a leading contributor to the scikit-learn to ONNX converter project (https://github.com/onnx/sklearn-onnx). In the past, he worked on ML.NET, an open source ML library for .NET developers, focusing on customer engagements for text and image classification problems.

Prabhat Roy

Data and Applied Scientist, AI Frameworks at Microsoft

Instructor Bio

Yuriy Guts is a Machine Learning Engineer at DataRobot with over 10 years of industry experience in data science and software architecture. His primary interests are productionalizing data science, automated machine learning, time series forecasting, and processing spoken and written language. He teaches AI and ML at UCU, competes on Kaggle, and has led multiple international data science and engineering teams.

Yuriy Guts

Machine Learning Engineer at DataRobot

Instructor Bio

Dr. Girmaw Abebe Tadesse is a Postdoctoral Researcher in Machine Learning for Healthcare at the Institute of Biomedical Engineering, University of Oxford. He primarily develops deep learning techniques to assist diagnosis of cardiovascular disease, cancer and infectious disease. He works on multiple projects with international collaborators, including clinicians in China and Vietnam.

Girmaw completed his PhD at Queen Mary University of London under the Erasmus Mundus Joint Doctorate Program in Interactive and Cognitive Environments with UPC-BarcelonaTech. His PhD research focused on computer vision and machine learning algorithms for human activity recognition using wearable sensors.

Girmaw received his MSc degree in Telecommunications Engineering from Trento University, Italy in 2012, and the BSc degree in Electrical Engineering from Arba Minch University, Ethiopia in 2007. He has worked in various research groups across Europe, including the Technical Research Center for Dependency Care and Autonomous Living in Spain, KU Leuven in Belgium, and INESC-ID in Portugal.

Dr. Girmaw Abebe Tadesse

Postdoctoral Researcher, ML for Healthcare at the Institute of Biomedical Engineering, University of Oxford

Instructor Bio

Giorgio Patrini

CEO, Chief Scientist, Co-Founder at Deeptrace

Instructor Bio

Joaquin Vanschoren is Assistant Professor in Machine Learning at the Eindhoven University of Technology. His research focuses on machine learning, meta-learning, and understanding and automating learning. He founded and leads OpenML.org, an open science platform for machine learning. He has received several demo and open data awards, has been a tutorial speaker at NeurIPS and ECMLPKDD, and an invited speaker at ECDA, StatComp, AutoML@ICML, CiML@NIPS, DEEM@SIGMOD, AutoML@PRICAI, MLOSS@NIPS, and many other occasions. He was general chair at LION 2016, program chair of Discovery Science 2018, demo chair at ECMLPKDD 2013, and he co-organizes the AutoML and meta-learning workshop series at NIPS and ICML. He is also co-editor of the book ’Automatic Machine Learning: Methods, Systems, Challenges’.

Joaquin Vanschoren

Assistant Professor of Machine Learning at the Eindhoven University of Technology

Workshop: Choosing The Right Deep Learning Framework: A Deep Learning Approach

As a developer advocate for IBM, I use machine learning in order to understand machine learning developers. I spend time building models to identify the problems they solve and the tools that they use to do so. This talk will present how deep learning can be used to predict the deep learning framework that most closely resembles the style in which a machine learning developer programs. Many of these frameworks are very new (for instance, TensorFlow is in the process of putting out just its second major release). As the field of deep learning and the frameworks enabling it continue to change rapidly, the community using a particular framework will be much more consistent. This talk will end by allowing developers to test the models themselves by uploading examples of their work via Jupyter Notebooks and predicting the deep learning community and framework that is right for them.

Instructor Bio

Coming Soon!

Nick Acosta

Developer Advocate at IBM

Workshop: How to Make Machine Learning Fair and Accountable

The suitability of machine learning models is traditionally measured by accuracy. Metrics like RMSE, MAPE, AUC, ROC and Gini largely decide the ‘fate’ of machine learning models. However, if one digs deeper, the ‘fate’ of a model goes beyond a few accuracy-driven metrics to its capability of being Fair, Accountable, Transparent and Explainable, a.k.a. FATE.
Machine learning, as the name implies, learns whatever it is taught. It is a ramification of what it is fed. It is a fallacy that ML has no perspective; it has the same perspective as the data that was used to train it. In simple words, algorithms can echo the prejudices that data explicitly or implicitly contains…more details
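
As one concrete example of looking beyond accuracy, the sketch below computes a simple demographic parity gap (the difference in positive prediction rates between two groups) on synthetic data. It illustrates a single fairness metric among many, and is not the session's methodology.

```python
# Demographic parity check: compare a model's positive prediction rate
# across groups. Data and group labels are synthetic placeholders.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": rng.choice(["A", "B"], size=1000),
    "prediction": rng.integers(0, 2, size=1000),
})
# Inject a skew so the gap is visible.
df.loc[df["group"] == "B", "prediction"] = rng.choice(
    [0, 1], size=(df["group"] == "B").sum(), p=[0.7, 0.3]
)

rates = df.groupby("group")["prediction"].mean()
print(rates)
print("demographic parity gap:", round(abs(rates["A"] - rates["B"]), 3))
```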

Instructor Bio

Sray is currently working for Publicis Sapient as a Data Scientist and is based out of London. His expertise lies in predictive modelling, forecasting and advanced machine learning. He possesses a deep understanding of algorithms and advanced statistics. He has a background in management and economics, and has completed a master’s-equivalent program in Data Science and Analytics. His current areas of interest are fair and explainable ML.

Sray Agarwal

Specialist – Data Science at Publicis Sapient

Workshop: Machine Learning Interpretability Toolkit

With the recent popularity of machine learning algorithms such as neural networks and ensemble methods, machine learning models have become more like black boxes, harder to understand and interpret. To gain the end user’s trust, there is a strong need to develop tools and methodologies that help the user understand and explain how predictions are made. Data scientists also need the necessary insights to learn how the model can be improved. Much research has gone into model interpretability, and recently several open source tools, including LIME, SHAP, and GAMs, have been published on GitHub. In this talk, we present Microsoft’s brand new Machine Learning Interpretability toolkit, which incorporates cutting-edge technologies developed by Microsoft and leverages proven third-party libraries. It creates a common API and data structure across the integrated libraries and integrates with Azure Machine Learning services. Using this toolkit, data scientists can explain machine learning models using state-of-the-art technologies in an easy-to-use and scalable fashion at training and inferencing time.
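
The toolkit itself is demonstrated in the session; as a small, library-level illustration, the sketch below calls SHAP (one of the open source libraries named above) directly on a tree model. It does not use the Microsoft toolkit's own API.

```python
# Per-feature attributions for a tree model using the shap package.
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:100])   # per-feature attributions
print(type(shap_values), len(shap_values))
```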

Instructor Bio

Mehrnoosh Sameki is a technical program manager at Microsoft responsible for leading the product efforts on machine learning transparency within the Azure Machine Learning platform. Prior to Microsoft, she was a data scientist at an eCommerce company, Rue Gilt Groupe, incorporating data science and machine learning in the retail space to drive revenue and enhance customers’ personalized shopping experiences; prior to that, she completed a PhD in computer science at Boston University. In her spare time, she enjoys trying new food recipes, watching classic movies and documentaries, and reading about interior design and house decoration.

Mehrnoosh Sameki, PhD

Technical Program Manager at Microsoft

Tutorial Sessions

More sessions added weekly

Instructor Bio

Olga Isupova received the Specialist (eq. to M.Sc.) degree in applied mathematics and computer science in 2012 from Lomonosov Moscow State University, Moscow, Russia, and the Ph.D. degree in 2017 from the University of Sheffield, Sheffield, U.K. She is a Research Assistant in machine learning with the Department of Engineering Science, University of Oxford, Oxford, U.K. Her research interests include machine learning for disaster response and environment protection, Bayesian nonparametrics, and anomaly detection.

Olga Isupova, PhD

Principal Researcher at University of Oxford, Machine Learning Research Group

Tutorial: Introduction to Data Science

Data science is fast becoming a richer and more varied field of work, ranging from research and development all the way through to commercial applications in a wide range of settings and industries. Data science is far from being a single, well-defined discipline; instead it is a combination of different specialisations and practices. For newcomers to the field, navigating this vast ecosystem may be a daunting task, and as the dust settles, getting some perspective may help in understanding the challenges and opportunities in the field.

In this talk we will cover some of the ingredients that make data science an exciting area of work within a business context. There is no doubt that the variety of positions that require data science skills is ever growing and evolving. This means that the availability of roles has expanded, making the field more diverse. This in turn has implications both for the talent that companies are looking for and for the activities that data scientists can get involved in…more details

Instructor Bio

Jesús (aka J.) is a lead data scientist with a strong background and interest in insight based on simulation, hypothesis generation and predictive analytics. He has substantial practical experience with statistical analysis, machine learning and optimisation tools in product development and innovation, finance, media and others.

Jesús is a Data Science Instructor at General Assembly. He has a background in physics and has held positions in both academia and industry, including Imperial College, IBM Data Science Studio, Prudential and Dow Jones, to name a few. Jesús is the author of “Essential MATLAB and Octave,” a book for students in physics, engineering, and other disciplines. He also authored the upcoming data science book entitled “Data Science and Analytics with Python.”

Jesus Rogel-Salazar, PhD

Independent Data Scientist and Consultant

Tutorial: Provenance

A decade of research on provenance, a standardisation of provenance at the World Wide Web Consortium (PROV), and applications, toolkits and services adopting provenance have led to the recognition that provenance is a critical facet of good data governance for businesses, governments and organisations in general. Provenance, which is defined as a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing, is now regarded as an essential function of data-intensive applications, to provide a trusted account of what they performed. In this tutorial, I will explain the motivation for provenance, introduce the standard PROV, present some applications of provenance, and review some tools for provenance. We will aim to conclude with a practical session on provenance modelling.
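
For a feel of PROV in practice, the Python prov package can build and serialize a small provenance document. The entity, activity and agent names below are made up for illustration and are not taken from the tutorial.

```python
# A tiny PROV document: an analysis activity uses a dataset entity and
# generates a chart entity, with an agent responsible for it.
from prov.model import ProvDocument

doc = ProvDocument()
doc.add_namespace("ex", "http://example.org/")

doc.entity("ex:dataset")
doc.entity("ex:chart")
doc.agent("ex:alice")
doc.activity("ex:analysis")

doc.used("ex:analysis", "ex:dataset")
doc.wasGeneratedBy("ex:chart", "ex:analysis")
doc.wasAssociatedWith("ex:analysis", "ex:alice")
doc.wasAttributedTo("ex:chart", "ex:alice")

print(doc.serialize(indent=2))   # PROV-JSON serialization
```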

Instructor Bio

Luc Moreau is a Professor of Computer Science and Head of the Department of Informatics at King’s College London. Before joining King’s, Luc was Head of Web and Internet Science in the Department of Electronics and Computer Science at the University of Southampton.

Luc was co-chair of the W3C Provenance Working Group, which resulted in four W3C Recommendations and nine W3C Notes, specifying PROV, a conceptual data model for provenance on the Web, and its serializations in various Web languages. Previously, he initiated the successful Provenance Challenge series, which saw the involvement of over 20 institutions investigating provenance inter-operability in 3 successive challenges, and which resulted in the specification of the community Open Provenance Model (OPM). Before that, he led the development of provenance technology in the FP6 Provenance project and the Provenance Aware Service Oriented Architecture (PASOA) project. [For further details, please see https://nms.kcl.ac.uk/luc.moreau/about.html]

He is on the editorial board of “PeerJ Computer Science” and previously he was editor-in-chief of the journal “Concurrency and Computation: Practice and Experience” and on the editorial board of “ACM Transactions on Internet Technology”.

Luc Moreau, PhD

Professor & Head of Informatics at King’s College

Tutorial: Bringing Data to the Masses Through Visualisation

Data visualisation is invaluable in helping data scientists and researchers identify patterns and trends. But it also has a crucial role in communicating meaning to non-expert audiences.

Among the wider public, mistrust of data and experts is worryingly widespread, and in some areas growing. Around emotive issues such as vaccinations and climate change, robust data science can lose the argument against well-meaning public ignorance, or active campaigns of misinformation.

More generally, mistrust or misunderstanding of data science can hinder the ability of any organisation to sell the benefits of the discipline to a wider audience – to secure funding, sell products, drive engagement in open data projects, or sway public opinion to change behaviour…more details

Instructor Bio

Alan Rutter is the founder of consultancy Fire Plus Algebra, and is a specialist in communicating complex subjects through data visualisation, writing and design. He has worked as a journalist, product owner and trainer for brands and organisations including Guardian Masterclasses, WIRED, Time Out, the Home Office, the Biotechnology and Biological Sciences Research Council and the Liverpool School of Tropical Medicine.

Alan Rutter

Founder at Fire Plus Algebra

Tutorial: Sequence Modelling with Deep Learning

Much of our data is sequential – think speech, text, DNA, stock prices, financial transactions and customer action histories. Modern methods for modelling sequence data are often deep learning-based, composed of either recurrent neural networks (RNNs) or the attention-based transformer. A tremendous amount of research progress has recently been made in sequence modelling, particularly in the application to NLP problems. However, the inner workings of these sequence models can be difficult to dissect and intuitively understand.

This presentation/tutorial will start from the basics and gradually build upon concepts in order to impart an understanding of the inner mechanics of sequence models – why do we need specific architectures for sequences at all, when we could use standard feed-forward networks?…more details
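
As a concrete anchor for the recurrent case, a minimal Keras sequence classifier looks like this; the token data is random and purely illustrative.

```python
# Integer token sequences go through an embedding, an LSTM reads them step
# by step, and a dense layer makes one prediction per sequence.
import numpy as np
import tensorflow as tf

vocab_size, seq_len = 5000, 40
X = np.random.randint(1, vocab_size, size=(256, seq_len))   # toy token ids
y = np.random.randint(0, 2, size=(256,))                    # toy labels

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 32),   # token id -> vector
    tf.keras.layers.LSTM(64),                    # reads the sequence in order
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
print(model.predict(X[:2]))
```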

Natasha Latysheva

Machine learning research engineer at Welocalize

Tutorial: Integrating Real-Time Video Analysis with Clinical Data to Enable Digital Diagnostics

The rapid digitization of healthcare has accelerated the adoption of artificial intelligence for clinical applications. A major opportunity for predictive clinical analytics is the ability to provide faster diagnoses and treatment plans tailored to individuals. Applications have been developed for multiple care settings, with many novel point-of-care diagnostics that can assess for diseases ranging from malaria to skin cancer. In this talk, we will demonstrate how to apply machine learning algorithms to identify regions of interest for detection of the pupil for point-of-care drug screening…more details
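
As a rough illustration of the region-of-interest step (not the presenters' actual method), a classical OpenCV baseline for finding circular pupil candidates might look like this; the image path is a placeholder.

```python
# Classical baseline for pupil ROI detection: blur the frame and look for
# circular blobs with a Hough transform.
import cv2

frame = cv2.imread("frame.png")                 # hypothetical eye image
if frame is None:
    raise SystemExit("provide an eye image as frame.png")

gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
gray = cv2.medianBlur(gray, 7)                  # suppress noise / eyelashes

circles = cv2.HoughCircles(
    gray, cv2.HOUGH_GRADIENT, dp=1, minDist=50,
    param1=80, param2=30, minRadius=10, maxRadius=80,
)
if circles is not None:
    for x, y, r in circles[0]:
        cv2.circle(frame, (int(x), int(y)), int(r), (0, 255, 0), 2)
cv2.imwrite("frame_annotated.png", frame)       # candidate pupils marked
```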

Instructor Bio

Dr. Schulz is an Assistant Professor of Laboratory Medicine and computational health care researcher at Yale School of Medicine. He received a PhD in Microbiology, Immunology, and Cancer Biology and an MD from the University of Minnesota. He is the Director of Informatics for the Department of Laboratory Medicine, Director of the CORE Center for Computational Health, and Medical Director of Data Science for Yale New Haven Health System. Dr. Schulz has over 20 years’ experience in software development with a focus on enterprise system architecture, and has research interests in the management of large biomedical data sets and the use of real-world data for predictive modeling. At Yale, he has led the implementation of a distributed data analysis and predictive modeling platform, for which he received the Data Summit IBM Cognitive Honors award. Other projects within his research group include computational phenotyping and the development of clinical prescriptive models for precision medicine initiatives. His clinical areas of expertise include molecular diagnostics and transfusion medicine, where he has ongoing work assessing the use, safety, and efficacy of pathogen-reduced blood products.

Wade Schulz, MD, PhD

Assistant Professor and Director of Computational Health at Yale School of Medicine

Tutorial: The Perks of On Board Deep Learning: Train, Deploy and Use Neural Nets on a Raspberry Pi

Machine learning applications are new in the software development landscape and tend to be hard to build. As Google noted in an article (source: https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf), this is mainly because the application is much broader than the model itself. Surprisingly though, machine learning applications follow a double Pareto’s law. On the one hand, 80% of the time spent on building those applications deals with machine learning problems, whereas the remaining 20% is spent on integrating the model into a running application. On the other hand, only 20% of the code lines are specific to machine learning; the vast rest is about integration and running the application.
I would like to first explore the foundations of this trend, and then show why it kills machine learning application development and sustainability.
In order to illustrate the tips and tricks of shipping a deep learning model to production, I will use a live demo of a model designed to recognise car drawings via the camera of a Raspberry Pi. Furthermore, by identifying the main stages of the application life cycle (training, deploying and using), I will lay the emphasis on the common mistakes that get in the way of a successful machine learning product.
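
One common route for the deployment stage on a Raspberry Pi (assuming a Keras model, which the talk does not specify) is TensorFlow Lite; a compact sketch of the convert-then-run cycle is below, with a stand-in model rather than the speaker's drawing classifier.

```python
# Convert a Keras model to TensorFlow Lite and run it with the lightweight
# interpreter, as one would on a Raspberry Pi.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(64,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])

# Convert and save the compact .tflite artifact shipped to the device.
tflite_model = tf.lite.TFLiteConverter.from_keras_model(model).convert()
open("model.tflite", "wb").write(tflite_model)

# On the Pi, only the interpreter is needed (tflite_runtime or tf.lite).
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
interpreter.set_tensor(inp["index"], np.random.rand(1, 64).astype(np.float32))
interpreter.invoke()
print(interpreter.get_tensor(out["index"]))
```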

Instructor Bio

Constant is strongly interested in the creation of value out of data and helps those who believe in such potential by accelerating their transition toward a data-driven company. In order to address these new problems, he focuses on mastering every skill of a complete Data Geek: architecture expertise (data, applications, network), data science mastery (statistical learning, data visualisation, algorithmic theory), and customer and business understanding (model prediction consumption, business metrics, customer needs).

Constant has been working for about two years at OCTO Technology. He is an expert in the industry sector and works on several types of missions, ranging from predictive maintenance of production sites, to prediction of critical KPIs in video games, to real-time monitoring of manufacturing devices. Prior to joining OCTO, Constant worked as a researcher at Data61 (formerly known as NICTA), the best ICT research institute in Australia, applying machine learning to profile GUI users and provide the right amount of information to help them make a decision based on a machine learning prediction.

Constant Bridon

Data Science Consultant at OCTO Technology

ODSC EUROPE 2019 | November 19th – 22nd

Register Now

What To Expect

As we prepare our 2019 schedule, take a look at some of the previous training sessions and workshops we have hosted at ODSC Europe for an idea of what to expect.

  • High Performance, Distributed Spark ML, Tensorflow AI, and GPU

  • Machine Learning with R

  • Deep Learning with Tensorflow for Absolute Beginners

  • Algorithmic Trading with Machine and Deep Learning

  • Deep Learning in Keras

  • Deep Learning – Beyond the Basics

  • Running Intelligent Applications inside a Database: Deep Learning with Python Stored Procedures in SQL

  • Distributed Deep Learning on Hops

  • R and Spark with Sparklyr

  • Towards Biologically Plausible Deep Learning

  • Deep Learning Ensembles in Toupee

  • A Gentle Introduction to Predictive Analytics with R

  • Deep Learning with Tensorflow for Absolute Beginners

  • Graph Data – Modelling and Querying with Neo4j and Cypher

  • Introduction to Data Science with R

  • Introduction to Python in Data Science

  • High Performance, Distributed Spark ML, TensorFlow AI, and GPU

  • The Magic of Dimensionality Reduction

  • Analyze Data, Build a UI and Deploy on the Cloud with Apache Spark, Notebooks and PixieDust

  • Data Science for Executives

  • Data Science Learnathon. From Raw Data to Deployment: the Data Science Cycle with KNIME

  • Distributed Deep Learning on Hops

  • Drug Discovery with KNIME

  • Interactive Visualisation with R (and just R)

  • Introduction to Algorithmic Trading

  • Introduction to Data Science – A Practical Viewpoint

  • Julia for Data Scientists

  • Running Intelligent Applications inside a Database: Deep Learning with Python Stored Procedures in SQL

  • Telling Stories with Data

  • Towards Biologically Plausible Deep Learning

  • Win Kaggle Competitions Using StackNet Meta Modelling Framework