Training & Workshop Sessions

– Taught by World-Class Data Scientists –

Learn the latest data science concepts, tools, and techniques from the best. Forge a connection with these rock stars from industry and academia, who are passionate about molding the next generation of data scientists.

Highly Experienced Instructors

Our instructors are highly regarded in data science, coming from both academia and renowned companies.

Real World Applications

Gain the skills and knowledge to use data science in your career and business, without breaking the bank.

Cutting Edge Subject Matter

Find training sessions offered on a wide variety of data science topics from machine learning to data visualization to DevOps.

Training, Workshop & Tutorial Sessions

ODSC Europe 2019 will host training and workshop sessions on some of the latest in-demand techniques, models and frameworks, including:

Training Focus Areas

  • Deep Learning and Reinforcement Learning

  • Machine Learning, Transfer Learning and Adversarial Learning 

  • Computer Vision

  • NLP, Speech, Text Analytics and Spatial Analysis

  • Data Visualization

Quick Facts

  • Choose from 25 training sessions

  • Choose from 30+ workshops

  • Hands-on training sessions are 4 hours in duration

  • Workshops and tutorials are 2 hours in duration

Frameworks

  • TensorFlow, PyTorch, and MXNet

  • Scikit-learn, PyMC3, Pandas, Theano, NLTK, NumPy, SciPy

  • Keras, Apache Spark, Apache Storm, Airflow, Apache Kafka

  • Kubernetes, Kubeflow, Apache Ignite, Hadoop

Europe 2019 Confirmed Instructors

Training Sessions

More sessions added weekly

Training: R for Python Programmers

Should a data scientist use R or Python? The answer to this rather delicate question is, of course, that they should know a bit of each and use the most appropriate language for the task at hand. In this tutorial, we’ll take participants through the R tidyverse, one of the (many!) areas in which R shines. The tidyverse is essential for any data scientist who deals with data on a day-to-day basis. By focusing on small key tasks, the tidyverse suite of packages removes the pain of data manipulation. We’ll cover some of the core features of the tidyverse, such as dplyr (the workhorse of the tidyverse), string manipulation, graphics and the concept of tidy data…more details
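For participants coming from Python, the dplyr verbs the tutorial covers map closely onto pandas operations they already know; a rough side-by-side sketch (the data frame and column names here are invented for illustration):

```python
import pandas as pd

# A toy data frame; each comment shows the analogous dplyr verb.
df = pd.DataFrame({
    "species": ["a", "a", "b", "b"],
    "mass": [1.2, 1.5, 3.1, 2.9],
})

result = (
    df[df["mass"] > 1.0]                                # dplyr: filter(mass > 1.0)
      .assign(mass_g=lambda d: d["mass"] * 1000)        # dplyr: mutate(mass_g = mass * 1000)
      .groupby("species", as_index=False)               # dplyr: group_by(species)
      .agg(mean_mass=("mass_g", "mean"))                # dplyr: summarise(mean_mass = mean(mass_g))
)
print(result)
```

Each pandas step has a direct dplyr counterpart (filter, mutate, group_by, summarise), which is part of what makes the tidyverse approachable from either side.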

Instructor Bio

Dr. Colin Gillespie is a Senior Lecturer (Associate Professor) at Newcastle University, UK. His research interests are high-performance statistical computing and Bayesian statistics. He is regularly employed as a consultant by Jumping Rivers and has been teaching R since 2005 at a variety of levels, ranging from beginners to advanced programming.

Dr. Colin Gillespie

Senior Lecturer at Newcastle University

Training: Deep Reinforcement Learning

Reinforcement learning has recently made great progress in industry as one of the best techniques for sequential decision making and control policies.

DeepMind used RL to greatly reduce energy consumption in Google’s data centre. It has been used for text summarisation, autonomous driving, dialogue systems and media advertising, and in finance by JPMorgan Chase. We are at the very beginning of the adoption of these algorithms, as systems are required to operate more and more autonomously.
In this workshop we will explore Reinforcement Learning, starting from its fundamentals and ending by creating our own algorithms…more details
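As a taste of the fundamentals the workshop starts from, here is a minimal tabular Q-learning sketch on a toy corridor environment (the environment, rewards and hyperparameters are invented for illustration, not taken from the workshop's own material):

```python
import random

random.seed(0)

N = 5               # corridor states 0..4; reaching state 4 ends the episode
ACTIONS = [-1, 1]   # step left or step right
Q = [[0.0, 0.0] for _ in range(N)]
alpha, gamma, eps = 0.5, 0.9, 0.2

for episode in range(500):
    s = 0
    while s != N - 1:
        # Epsilon-greedy: mostly exploit the current Q-values, sometimes explore.
        a = random.randrange(2) if random.random() < eps else max(range(2), key=lambda i: Q[s][i])
        s2 = min(max(s + ACTIONS[a], 0), N - 1)
        r = 1.0 if s2 == N - 1 else 0.0
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# After training, the greedy policy should step right from every non-terminal state.
policy = [max(range(2), key=lambda i: Q[s][i]) for s in range(N - 1)]
print(policy)
```

No labelled "right answers" appear anywhere: the agent learns the policy purely from the delayed reward signal, which is the point the description above makes.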

Instructor Bio

Leonardo De Marchi holds a Master’s in Artificial Intelligence and has worked as a Data Scientist in the sports world, with clients such as the New York Knicks and Manchester United, and with large social networks like Justgiving.

He now works as Head of Data Science and Analytics at Badoo, the largest dating site, with over 420 million users. He is also the lead instructor at ideai.io, a company specializing in Reinforcement Learning, Deep Learning and Machine Learning training.

Leonardo De Marchi

Head of Data Science and Analytics at Badoo

Training: Advanced Machine Learning with Scikit-learn, Part I

Scikit-learn is a machine learning library in Python that has become a valuable tool for many data science practitioners. This training will cover some of the more advanced aspects of scikit-learn, such as building complex machine learning pipelines, advanced model evaluation, feature engineering and working with imbalanced datasets. We will also work with text data using the bag-of-words method for classification.

This workshop assumes familiarity with Jupyter notebooks and the basics of pandas, matplotlib and numpy. It also assumes some familiarity with the API of scikit-learn and how to do cross-validation and grid search with scikit-learn.
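As a compressed sketch of the kind of pipeline work described above, the following combines bag-of-words text features with a classifier and a cross-validated parameter search (the tiny corpus is invented for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# A toy sentiment corpus: two positive and two negative phrases, repeated.
texts = ["good movie", "great film", "bad movie", "awful film"] * 10
labels = [1, 1, 0, 0] * 10

pipe = Pipeline([
    ("vect", CountVectorizer()),                  # bag-of-words features
    ("clf", LogisticRegression(max_iter=1000)),   # linear classifier on counts
])

# Search over parameters of both pipeline steps at once.
grid = GridSearchCV(pipe, {"vect__ngram_range": [(1, 1), (1, 2)],
                           "clf__C": [0.1, 1.0]}, cv=5)
grid.fit(texts, labels)
print(grid.best_score_)
```

Parameters of any pipeline step can be searched with the step__parameter naming convention, as in vect__ngram_range above.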

Instructor Bio

Andreas Mueller received his MS degree in Mathematics (Dipl.-Math.) in 2008 from the Department of Mathematics at the University of Bonn. In 2013, he finalized his PhD thesis at the Institute for Computer Science at the University of Bonn. After working as a machine learning scientist at the Amazon Development Center Germany in Berlin for a year, he joined the Center for Data Science at New York University at the end of 2014. In his current position as assistant research engineer at the Center for Data Science, he works on open source tools for machine learning and data science. He has been one of the core contributors to scikit-learn, a machine learning toolkit widely used in industry and academia, for several years, and has authored and contributed to a number of open source projects related to machine learning.

Andreas Mueller, PhD

Author, Research Scientist, Core Contributor of scikit-learn at Columbia Data Science Institute

Training: Advanced Machine Learning with Scikit-learn, Part II

Scikit-learn is a machine learning library in Python that has become a valuable tool for many data science practitioners. This training will cover some advanced topics in using scikit-learn, such as how to perform out-of-core learning with scikit-learn and how to speed up parameter search. We’ll also cover how to build your own models or feature extraction methods that are compatible with scikit-learn, which is important for feature extraction in many domains. We will see how we can customize scikit-learn even further, using custom methods for cross-validation or model evaluation.

This workshop assumes familiarity with Jupyter notebooks and the basics of pandas, matplotlib and numpy. It also assumes experience using scikit-learn and familiarity with its API.
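As a flavour of the "build your own models" portion described above, here is a minimal custom transformer that follows scikit-learn's estimator conventions (the transformer itself is invented for illustration):

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class MeanCenterer(BaseEstimator, TransformerMixin):
    """Subtract the per-column training mean, scikit-learn style."""

    def fit(self, X, y=None):
        # Estimators learn state in fit() and store it with a trailing underscore.
        self.mean_ = np.asarray(X).mean(axis=0)
        return self

    def transform(self, X):
        return np.asarray(X) - self.mean_

X = [[1.0, 10.0], [3.0, 30.0]]
Xt = MeanCenterer().fit_transform(X)
print(Xt)
```

Because it implements fit and transform with learned state stored in mean_, it can be dropped into a Pipeline, grid-searched, or cross-validated like any built-in component.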

Instructor Bio

Andreas Mueller received his MS degree in Mathematics (Dipl.-Math.) in 2008 from the Department of Mathematics at the University of Bonn. In 2013, he finalized his PhD thesis at the Institute for Computer Science at the University of Bonn. After working as a machine learning scientist at the Amazon Development Center Germany in Berlin for a year, he joined the Center for Data Science at the New York University in the end of 2014. In his current position as assistant research engineer at the Center for Data Science, he works on open source tools for machine learning and data science. He is one of the core contributors of scikit-learn, a machine learning toolkit widely used in industry and academia, for several years, and has authored and contributed to a number of open source projects related to machine learning.

Andreas Mueller, PhD

Author, Research Scientist, Core Contributor of scikit-learn at Columbia Data Science Institute

Training: Building Recommendation Engines and Deep Learning Models Using Python, R and SAS

Deep learning is the newest area of machine learning and has become ubiquitous in predictive modeling. The complex, brain-like structure of deep learning models is used to find intricate patterns in large volumes of data. These models have greatly improved performance in general supervised modeling, time series, speech recognition, object detection and classification, and sentiment analysis.

Factorization machines are a relatively new and powerful tool for modeling high-dimensional and sparse data. Most commonly they are used as recommender systems by modeling the relationship between users and items. For example, factorization machines can be used to recommend your next Netflix binge based on how you and other streamers rate content…more details
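The core factorization machine scoring function can be sketched in a few lines of NumPy: a global bias, linear weights, and pairwise interactions whose strengths come from dot products of low-rank factor vectors (all weights and data below are invented for illustration):

```python
import numpy as np

def fm_score(x, w0, w, V):
    """Second-order factorization machine: w0 + w.x + sum_{i<j} <v_i, v_j> x_i x_j."""
    linear = w0 + w @ x
    # O(nk) identity for the pairwise term: 0.5 * sum_f ((Vx)_f^2 - sum_i v_if^2 x_i^2)
    vx = V.T @ x
    pairwise = 0.5 * (vx ** 2 - (V ** 2).T @ (x ** 2)).sum()
    return linear + pairwise

rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, 1.0])        # e.g. one-hot user and item indicators
w0, w = 0.1, np.array([0.2, 0.0, 0.3])
V = rng.normal(size=(3, 2))          # k=2 latent factors per feature
print(fm_score(x, w0, w, V))
```

The identity in the pairwise term reduces the naive sum over all feature pairs to linear time in the number of features, which is what makes factorization machines practical on sparse, high-dimensional data such as user-item interactions.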

Instructor Bio

Jordan Bakerman holds a Ph.D. in statistics from North Carolina State University. His dissertation centered on using social media to forecast real-world events, such as civil unrest and influenza rates. As an intern at SAS, Jordan wrote the SAS Programming for R Users course to help students efficiently transition from R to SAS using a cookbook-style approach. As an employee, Jordan has developed courses demonstrating how to integrate open source software within SAS products. He is passionate about statistics, programming, and helping others become better statisticians.

Jordan Bakerman, PhD

Sr. Analytical Training Consultant at SAS

Training: Building Recommendation Engines and Deep Learning Models Using Python, R and SAS

Deep learning is the newest area of machine learning and has become ubiquitous in predictive modeling. The complex, brain-like structure of deep learning models is used to find intricate patterns in large volumes of data. These models have greatly improved performance in general supervised modeling, time series, speech recognition, object detection and classification, and sentiment analysis.

Factorization machines are a relatively new and powerful tool for modeling high-dimensional and sparse data. Most commonly they are used as recommender systems by modeling the relationship between users and items. For example, factorization machines can be used to recommend your next Netflix binge based on how you and other streamers rate content…more details

Instructor Bio

Grégoire de Lassence is a Senior Analytical Training Consultant at SAS, where he supports the architecture, administration and implementation of business analytics platforms. He has also been Pedagogy & Research Manager in the Academic Department at SAS France since 2002. He teaches mainly Big Data, Machine Learning, Text Mining and AI topics at the Master’s level at universities such as Paris Dauphine, Panthéon-Sorbonne, Assas, Paris 12, Paris 13, Toulouse, Aix-Marseille, Bretagne Sud and Clermont-Ferrand. He is also a lecturer at Grandes Écoles such as ESSEC, CentraleSupélec, INSEAD, HEC, SKEMA, Toulouse Business School and the Telecom schools.

Grégoire De Lassence

Data Scientist & Associate Professor at SAS

Training: All The Cool Things You Can Do With Postgresql To Next Level Your Data Analysis

The intention of this VERY hands-on workshop is to get you introduced to, and playing with, some of the great features you never knew about in PostgreSQL. You know, and probably already love, PostgreSQL as your relational database. We will show you how you can forget about using ElasticSearch, MongoDB, and Redis for a broad array of use cases. We will add in some nice statistical work with R embedded in PostgreSQL. Finally, we will bring this all together using the gold standard in spatial databases, PostGIS. Unless you have a specialized use case, PostgreSQL is the answer. The session will be very hands-on, with plenty of interactive exercises.

By the end of the workshop, participants will leave with hands-on experience in:

  • Spatial analysis

  • JSON search

  • Full text search

  • Using R for stored procedures and functions

all in PostgreSQL.

Instructor Bio

Steve is a Partner and Director of Developer Relations for Crunchy Data (the PostgreSQL people). He goes around and shows off all the great work the PostgreSQL community and Crunchy committers do. He can teach you about data analysis with PostgreSQL, Java, Python, MongoDB, and some JavaScript. He has deep subject-area expertise in GIS/spatial, statistics, and ecology. He has spoken at over 75 conferences and run over 50 workshops, including Monktoberfest, MongoNY, JavaOne, FOSS4G, CTIA, AjaxWorld, GeoWeb, Where2.0, and OSCON. Before Crunchy Data, Steve was a developer evangelist for DigitalGlobe, Red Hat, LinkedIn, deCarta, and ESRI. Steve has a Ph.D. in Ecology.

Steven Pousty, PhD

Director of Developer Relations at Crunchy Data

Training: Machine Learning in R, Part I

Modern statistics has become almost synonymous with machine learning, a collection of techniques that utilize today’s incredible computing power. This two-part course focuses on the available methods for implementing machine learning algorithms in R, and will examine some of the underlying theory behind the curtain. We start with the foundation of it all, the linear model and its generalization, the glm. We look at how to assess model quality with traditional measures and cross-validation, and visualize models with coefficient plots. Next we turn to penalized regression with the Elastic Net. After that we turn to Boosted Decision Trees utilizing xgboost. Attendees should have a good understanding of linear models and classification and should have R and RStudio installed, along with the `glmnet`, `xgboost`, `boot`, `ggplot2`, `UsingR` and `coefplot` packages...more details
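For readers more at home in Python, the elastic net step described above looks like this in scikit-learn (a hedged analogue of the glmnet workflow, with invented data; note that scikit-learn's l1_ratio plays the role of glmnet's alpha mixing parameter):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Only the first two features matter; the elastic net should shrink the rest.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# l1_ratio mixes the lasso (1.0) and ridge (0.0) penalties; alpha is the strength.
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(np.round(model.coef_, 2))
```

The fitted coefficients recover the two informative features while driving the noise features toward zero, which is the variable-selection behaviour the course demonstrates with glmnet in R.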

Instructor Bio

Jared Lander is the Chief Data Scientist of Lander Analytics, a data science consultancy based in New York City, the Organizer of the New York Open Statistical Programming Meetup and the New York R Conference, and an Adjunct Professor of Statistics at Columbia University. With a master’s in statistics from Columbia University and a bachelor’s in mathematics from Muhlenberg College, he has experience in both academic research and industry. His work for both large and small organizations ranges from music and fundraising to finance and humanitarian relief efforts.
He specializes in data management, multilevel models, machine learning, generalized linear models and statistical computing. He is the author of R for Everyone: Advanced Analytics and Graphics, a book about R programming geared toward data scientists and non-statisticians alike, and is creating a course on glmnet with DataCamp.

Jared Lander

Chief Data Scientist at Lander Analytics, Author of R for Everyone, Professor at Columbia Business School

Training: Machine Learning in R, Part II

Modern statistics has become almost synonymous with machine learning, a collection of techniques that utilize today’s incredible computing power. This two-part course focuses on the available methods for implementing machine learning algorithms in R, and will examine some of the underlying theory behind the curtain. We start with the foundation of it all, the linear model and its generalization, the glm. We look at how to assess model quality with traditional measures and cross-validation, and visualize models with coefficient plots. Next we turn to penalized regression with the Elastic Net. After that we turn to Boosted Decision Trees utilizing xgboost. Attendees should have a good understanding of linear models and classification and should have R and RStudio installed, along with the `glmnet`, `xgboost`, `boot`, `ggplot2`, `UsingR` and `coefplot` packages...more details

Instructor Bio

Jared Lander is the Chief Data Scientist of Lander Analytics, a data science consultancy based in New York City, the Organizer of the New York Open Statistical Programming Meetup and the New York R Conference, and an Adjunct Professor of Statistics at Columbia University. With a master’s in statistics from Columbia University and a bachelor’s in mathematics from Muhlenberg College, he has experience in both academic research and industry. His work for both large and small organizations ranges from music and fundraising to finance and humanitarian relief efforts.
He specializes in data management, multilevel models, machine learning, generalized linear models and statistical computing. He is the author of R for Everyone: Advanced Analytics and Graphics, a book about R programming geared toward data scientists and non-statisticians alike, and is creating a course on glmnet with DataCamp.

Jared Lander

Chief Data Scientist at Lander Analytics, Author of R for Everyone, Professor at Columbia Business School

Training: Introduction to Machine Learning

Machine learning has become an indispensable tool across many areas of research and commercial applications. From text-to-speech on your phone to detecting the Higgs boson, machine learning excels at extracting knowledge from large amounts of data. This training will give a general introduction to machine learning, as well as introduce practical tools for you to apply machine learning in your research. We will focus on one particularly important subfield of machine learning, supervised learning. The goal of supervised learning is to “learn” a function that maps inputs x to an output y, by using a collection of training data consisting of input-output pairs. We will walk through formalizing a problem as a supervised machine learning problem, creating the necessary training data, and applying and evaluating a machine learning algorithm. The session should give you all the necessary background to start using machine learning yourself.
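The supervised workflow described above (training data of input-output pairs, fitting, evaluating) fits in a few lines; a minimal sketch using scikit-learn's bundled iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Input-output pairs: flower measurements x and species labels y.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Learn" a function mapping x to y, then evaluate it on held-out data.
model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(accuracy)
```

The held-out test set is the key discipline here: the score estimates how the learned function will behave on inputs it has never seen.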

Instructor Bio

Andreas Mueller received his MS degree in Mathematics (Dipl.-Math.) in 2008 from the Department of Mathematics at the University of Bonn. In 2013, he finalized his PhD thesis at the Institute for Computer Science at the University of Bonn. After working as a machine learning scientist at the Amazon Development Center Germany in Berlin for a year, he joined the Center for Data Science at New York University at the end of 2014. In his current position as assistant research engineer at the Center for Data Science, he works on open source tools for machine learning and data science. He has been one of the core contributors to scikit-learn, a machine learning toolkit widely used in industry and academia, for several years, and has authored and contributed to a number of open source projects related to machine learning.

Andreas Mueller, PhD

Author, Research Scientist, Core Contributor of scikit-learn at Columbia Data Science Institute

Training: TFX: Production ML Pipelines with TensorFlow

Putting machine learning models into production is now mission critical for every business – no matter what size.

TensorFlow is the industry-leading platform for developing, modeling, and serving deep learning solutions. But putting together a complete pipeline for deploying and maintaining a production application of AI and deep learning is much more than training a model. Google has taken years of experience in developing production ML pipelines and offered the open source community TensorFlow Extended (TFX), an open source version of tools and libraries that Google uses internally…more details

Instructor Bio

Alex Davies works on the Sciences team at DeepMind, and previously worked in the applied machine learning team in Google, supporting product teams in their use of machine learning, TensorFlow and TFX.

Alex Davies

Machine Learning at DeepMind

Training: TFX: Production ML Pipelines with TensorFlow

Putting machine learning models into production is now mission critical for every business – no matter what size.

TensorFlow is the industry-leading platform for developing, modeling, and serving deep learning solutions. But putting together a complete pipeline for deploying and maintaining a production application of AI and deep learning is much more than training a model. Google has taken years of experience in developing production ML pipelines and offered the open source community TensorFlow Extended (TFX), an open source version of tools and libraries that Google uses internally.

Learn what’s involved in creating a production pipeline, and walk through working code in an example pipeline with experts from Google. You’ll be able to take what you learn and get started on creating your own pipelines for your applications.

Instructor Bio

Jarek is the Technical Program Manager at Google Research responsible for TensorFlow Extended (tensorflow.org/tfx), Google’s production-scale end-to-end machine learning platform based on TensorFlow. Jarek joined Google in 2010; he has an MS in Software Management from Carnegie Mellon University and a BS in Computer Science with a concentration in AI from The University of Memphis. Prior to joining Google, he worked on building telecommunications systems software at Hewlett-Packard, BEA Systems, Alcatel and a couple of startups.

Jarek Wilkiewicz

Technical Program Manager at Google

Training: Opening The Black Box -- Interpretability In Deep Learning

The recent application of deep neural networks to long-standing problems has brought a breakthrough in performance and prediction power. However, high accuracy often comes at the price of interpretability: many of these models are black boxes that fail to provide explanations for their predictions. This tutorial illustrates some of the recent advancements in the field of interpretable artificial intelligence. We will show some common techniques that can be used to explain the predictions of pretrained models and to shed light on their inner mechanisms. The tutorial aims to strike the right balance between theoretical input and practical exercises, and has been designed to provide participants not only with the theory behind deep learning interpretability but also with a set of frameworks, tools and real-life examples that they can implement in their own projects.

Instructor Bio

Matteo is a Research Staff Member in Cognitive Health Care and Life Sciences at IBM Research Zürich. He is currently working on the development of multimodal deep learning models for drug discovery using chemical features and omic data. He also researches multimodal learning techniques for the analysis of pediatric cancers in an H2020 EU project, iPC, with the aim of creating treatment models for patients. He received his degree in Mathematical Engineering from Politecnico di Milano in 2013. After getting his MSc he worked at a startup, Moxoff spa, as a software engineer and analyst for scientific computing. In 2019 he obtained his doctoral degree at the end of a joint PhD program between IBM Research and the Institute of Molecular Systems Biology, ETH Zürich, with a thesis on multimodal learning approaches for precision medicine.

Matteo Manica, PhD

Research Staff Member at Cognitive Health Care & Life Sciences, IBM Research Zürich

Training: Introduction to RMarkdown in Shiny

  • Markdown Primer (45 minutes): structure documents with sections and subsections; format text; create ordered and unordered lists; make links; number sections; include a table of contents

  • Integrate R Code (30 minutes): insert code chunks; hide code; set chunk options; draw plots; speed up code with caching

  • Build RMarkdown Slideshows (20 minutes): understand slide structure; create sections; set background images; include speaker notes; open slides in speaker mode

  • Develop Flexdashboards (30 minutes): start with the flexdashboard layout; design columns and rows; use multiple pages; create social sharing; include code

  • Shiny Inputs: drop-downs, text, radio buttons, checkboxes

  • Shiny Outputs: text, tables, plots

  • Reactive Expressions

  • HTML Widgets: interactive plots, interactive maps, interactive tables

  • Shiny Layouts: UI and server files; user interface

Instructor Bio

Jared Lander is the Chief Data Scientist of Lander Analytics, a data science consultancy based in New York City, the Organizer of the New York Open Statistical Programming Meetup and the New York R Conference, and an Adjunct Professor of Statistics at Columbia University. With a master’s in statistics from Columbia University and a bachelor’s in mathematics from Muhlenberg College, he has experience in both academic research and industry. His work for both large and small organizations ranges from music and fundraising to finance and humanitarian relief efforts.
He specializes in data management, multilevel models, machine learning, generalized linear models and statistical computing. He is the author of R for Everyone: Advanced Analytics and Graphics, a book about R programming geared toward data scientists and non-statisticians alike, and is creating a course on glmnet with DataCamp.

Jared Lander

Chief Data Scientist at Lander Analytics, Author of R for Everyone, Professor at Columbia Business School

Training: Intermediate RMarkdown in Shiny

  • Markdown Primer (45 minutes): structure documents with sections and subsections; format text; create ordered and unordered lists; make links; number sections; include a table of contents

  • Integrate R Code (30 minutes): insert code chunks; hide code; set chunk options; draw plots; speed up code with caching

  • Build RMarkdown Slideshows (20 minutes): understand slide structure; create sections; set background images; include speaker notes; open slides in speaker mode

  • Develop Flexdashboards (30 minutes): start with the flexdashboard layout; design columns and rows; use multiple pages; create social sharing; include code

  • Shiny Inputs: drop-downs, text, radio buttons, checkboxes

  • Shiny Outputs: text, tables, plots

  • Reactive Expressions

  • HTML Widgets: interactive plots, interactive maps, interactive tables

  • Shiny Layouts: UI and server files; user interface

Instructor Bio

Jared Lander is the Chief Data Scientist of Lander Analytics, a data science consultancy based in New York City, the Organizer of the New York Open Statistical Programming Meetup and the New York R Conference, and an Adjunct Professor of Statistics at Columbia University. With a master’s in statistics from Columbia University and a bachelor’s in mathematics from Muhlenberg College, he has experience in both academic research and industry. His work for both large and small organizations ranges from music and fundraising to finance and humanitarian relief efforts.
He specializes in data management, multilevel models, machine learning, generalized linear models and statistical computing. He is the author of R for Everyone: Advanced Analytics and Graphics, a book about R programming geared toward data scientists and non-statisticians alike, and is creating a course on glmnet with DataCamp.

Jared Lander

Chief Data Scientist at Lander Analytics, Author of R for Everyone, Professor at Columbia Business School

Training: Intermediate Machine Learning with Scikit-learn

Scikit-learn is a machine learning library in Python that has become a valuable tool for many data science practitioners. This training will cover some of the more advanced aspects of scikit-learn, such as building complex machine learning pipelines, model evaluation, parameter search, and out-of-core learning. Apart from metrics for model evaluation, we will cover how to evaluate model complexity, how to tune parameters with grid search and randomized parameter search, and what their trade-offs are. We will also cover out-of-core text feature processing via feature hashing.
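The grid-versus-randomized trade-off mentioned above comes down to sampling: randomized search draws a fixed budget of parameter settings from distributions instead of exhausting a grid; a small sketch on synthetic data:

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=300, random_state=0)

# Ten random draws from a continuous log-uniform distribution over C,
# rather than an exhaustive grid of fixed values.
search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": loguniform(1e-3, 1e2)},
    n_iter=10, cv=5, random_state=0,
)
search.fit(X, y)
print(search.best_score_)
```

The budget (n_iter) is fixed regardless of how many parameters are searched, which is why randomized search scales better than a grid when several continuous parameters are tuned at once.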

Instructor Bio

Andreas Mueller received his MS degree in Mathematics (Dipl.-Math.) in 2008 from the Department of Mathematics at the University of Bonn. In 2013, he finalized his PhD thesis at the Institute for Computer Science at the University of Bonn. After working as a machine learning scientist at the Amazon Development Center Germany in Berlin for a year, he joined the Center for Data Science at New York University at the end of 2014. In his current position as assistant research engineer at the Center for Data Science, he works on open source tools for machine learning and data science. He has been one of the core contributors to scikit-learn, a machine learning toolkit widely used in industry and academia, for several years, and has authored and contributed to a number of open source projects related to machine learning.

Andreas Mueller, PhD

Author, Research Scientist, Core Contributor of scikit-learn at Columbia Data Science Institute

Training: Reinforcement Learning with TF-Agents & TensorFlow 2.0: Hands On

In this workshop you will discover how machines can learn complex behaviors and anticipatory actions. Using this approach, autonomous helicopters fly aerobatic maneuvers and even the Go world champion was beaten with it. A training dataset containing the “right” answers is not needed, nor is “hard-coded” knowledge. The approach is called “reinforcement learning” and is almost magical.

Using TF-Agents on top of TensorFlow 2.0 we will see how a real-life problem can be turned into a reinforcement learning task. In an accompanying Python notebook, we implement – step by step – all solution elements, highlight the design of Google’s newest reinforcement learning library, point out the role of neural networks and look at optimization opportunities…more details

Instructor Bio

Oliver Zeigermann is a developer and consultant from Hamburg, Germany. He has been involved with AI since his studies in the 90s and has written several books and has recently published the “Deep Learning Crash Course” with Manning. More on http://zeigermann.eu/

Oliver Zeigermann

Consultant at embarc / bSquare

Training: Reinforcement Learning with TF-Agents & TensorFlow 2.0: Hands On

In this workshop you will discover how machines can learn complex behaviors and anticipatory actions. Using this approach, autonomous helicopters fly aerobatic maneuvers and even the Go world champion was beaten with it. A training dataset containing the “right” answers is not needed, nor is “hard-coded” knowledge. The approach is called “reinforcement learning” and is almost magical.

Using TF-Agents on top of TensorFlow 2.0 we will see how a real-life problem can be turned into a reinforcement learning task. In an accompanying Python notebook, we implement – step by step – all solution elements, highlight the design of Google’s newest reinforcement learning library, point out the role of neural networks and look at optimization opportunities…more details

Instructor Bio

Christian is a consultant at bSquare with a focus on machine learning & .net development. He has a PhD in computer algebra from ETH Zurich and did a postdoc at UC Berkeley where he researched online data mining algorithms. Currently he applies reinforcement learning to industrial hydraulics simulations.

Christian Hidber, PhD

Consultant at bSquare

Training: Latest Deep Learning Models for Natural Language Processing

Big changes are underway in the world of Natural Language Processing (NLP).
The long reign of word embeddings as NLP’s core representation technique has been challenged recently by an exciting new line of pre-trained deep learning models: among them, ELMo, ULMFiT, and the OpenAI transformer. 
These research works made headlines by demonstrating that pretrained language models can be used to achieve state-of-the-art results on a wide range of NLP tasks, such as Natural Language Inference (NLI), Natural Language Understanding (NLU), Machine Translation, Question Answering or Language Generation. 
These new methods gave birth to the notion of Transfer Learning in NLP and may have the same wide-ranging impact on the field as pre-trained ImageNet models had on computer vision.
This workshop gives an overview of these SOTA deep learning models for NLP, and offers tutorials to implement them in TensorFlow 2.0.

Instructor Bio

Alice Martin holds an MSc in Applied Mathematics and Financial Engineering from one of France’s top engineering schools.
She is currently working in the Applied Mathematics lab of École Polytechnique (Paris, France) as a Machine Learning Engineer and a PhD Candidate in Machine Learning.
Her research focuses on new perspectives in deep reinforcement learning applied to natural language processing, with application to goal-oriented dialogue systems with visual context. She also teaches in the fields of reinforcement learning, deep learning and unsupervised learning; she has taught classes in Berlin (Data Science Retreat) and Morocco (Emines Engineering School).
Prior to her work at École Polytechnique, she worked on a machine learning project to predict disease progression for Parkinson’s disease patients. She started her career four years ago as a Sales and Marketing Analyst in the aerospace industry in California.

Alice Martin

Machine Learning Engineer & PhD Candidate | Lecturer | Teacher at École Polytechnique | Mohammed VI Polytechnic University | Data Science Retreat

Workshop Sessions

More sessions added weekly

Workshop: Beyond deep learning - differentiable programming with Flux

Deep learning is a rapidly evolving field, and models are increasingly complex. Recently, researchers have begun to explore “differentiable programming”, a powerful way to combine neural networks with traditional programming. Differentiable programs may include control flow, functions and data structures, and can even incorporate ray tracers, simulations and scientific models, giving us unprecedented power to find subtle patterns in our data.

This workshop will show you how this technique, and particularly Flux – a state-of-the-art deep learning library – is impacting the machine learning world. We will demonstrate how Flux makes it easy to create traditional deep learning models, and explain how the flexibility of the Julia language allows complex physical models to be optimised by the same machinery. We’ll outline important recent work and show how Flux allows us to easily combine neural networks with tools like differential equation solvers.
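Flux’s differentiable-programming style is hard to convey without code. As a language-neutral illustration (Flux itself is a Julia library, and this is not the workshop’s material), here is the core trick – forward-mode automatic differentiation via dual numbers – in a few lines of Python, including differentiating straight through control flow:

```python
# Forward-mode automatic differentiation with dual numbers: the core idea
# behind differentiable programming, sketched in plain Python.

class Dual:
    """A number a + b*eps with eps^2 = 0; the dual part b carries the derivative."""
    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)
    __rmul__ = __mul__

def derivative(f, x):
    """Evaluate df/dx by seeding the dual part with 1."""
    return f(Dual(x, 1.0)).deriv

# Control flow differentiates naturally -- no static computation graph needed:
def piecewise(x):
    return 3 * x * x if x.value > 0 else x

print(derivative(piecewise, 2.0))   # d/dx 3x^2 at x=2 -> 12.0
print(derivative(piecewise, -1.0))  # d/dx x -> 1.0
```

Because the derivative rides along with every arithmetic operation, ordinary program constructs (branches, loops, data structures) become differentiable for free – the property that lets tools like Flux differentiate through simulations and scientific models.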

Instructor Bio

Avik Sengupta has worked on risk and trading systems in investment banking for many years, mostly using Java interspersed with snippets of the exotic R and K languages. This experience left him wondering whether there were better things out there. Avik’s quest came to a happy conclusion with the appearance of Julia in 2012. He has been happily coding in Julia and contributing to it ever since.

Avik Sengupta

VP of Engineering at Julia Computing

Instructor Bio

Ralf Schlüter serves as Academic Director and senior lecturer in the Computer Science Department at RWTH Aachen University, Germany. He leads the Automatic Speech Recognition Group at the Lehrstuhl Informatik 6: Human Language Technology and Pattern Recognition. He studied Physics at RWTH Aachen University, and Edinburgh University, UK, and received the Diplom degree in Physics, the Dr.rer.nat. degree in Computer Science, and completed his habilitation in Computer Science, all at RWTH Aachen University. His research interests cover automatic speech recognition and machine learning in general, discriminative training, neural network modeling, information theory, stochastic modeling, and speech signal analysis.

Ralf Schlüter, PhD

Academic Director at RWTH Aachen University

Workshop: From Research to Production: Performant Cross-Platform ML/DNN Model Inferencing on Cloud and Edge with ONNX Runtime

Powerful Machine Learning models trained using various frameworks such as scikit-learn, PyTorch, TensorFlow, Keras, and others can often be challenging to deploy, maintain, and performantly operationalize for latency-sensitive customer scenarios. Using the standard Open Neural Network Exchange (ONNX) model format and the open source cross-platform ONNX Runtime inference engine, these models can be scalably deployed to cloud solutions on Azure as well as local devices ranging from Windows, Mac, and Linux to various IoT hardware. Once converted to the interoperable ONNX format, the same model can be served using the cross-platform ONNX Runtime inference engine across a wide variety of technology stacks to provide maximum flexibility and reduce deployment friction…more details

Instructor Bio

Faith Xu is a Senior Program Manager at Microsoft on the Machine Learning Platform team, focusing on frameworks and tools. She leads efforts to enable efficient and performant productization of inferencing workflows for high volume Microsoft product and services though usage of ONNX and ONNX Runtime. She is an evangelist for adoption of the open source ONNX standard with community partners to promote an open ecosystem in AI.

Faith Xu

Senior Program Manager, Machine Learning at Microsoft

Workshop: From Research to Production: Performant Cross-Platform ML/DNN Model Inferencing on Cloud and Edge with ONNX Runtime


Instructor Bio

Prabhat Roy is a Data and Applied Scientist at Microsoft, where he is a leading contributor to the scikit-learn to ONNX converter project (https://github.com/onnx/sklearn-onnx). In the past, he worked on ML.NET, an open source ML library for .NET developers, focusing on customer engagements for text and image classification problems.

Prabhat Roy

Data and Applied Scientist, AI Frameworks at Microsoft

Workshops: Tutorial on Automated Machine Learning

Automated machine learning is the science of building machine learning models in a data-driven, efficient, and objective way. It replaces manual trial-and-error with automated, guided processes. In this tutorial, we will guide you through the current state of the art in hyperparameter optimization, pipeline construction, and neural architecture search. We will discuss model-free blackbox optimization methods, Bayesian optimization, as well as evolutionary and other techniques. We will also pay attention to meta-learning, i.e. learning how to build machine learning models based on prior experience. Moreover, we will give practical guidance on how to do meta-learning with OpenML, an online platform for sharing and reusing machine learning experiments, and how to perform automated pipeline construction with GAMA, a novel, research-oriented AutoML tool in Python.
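To make the “model-free blackbox optimization” mentioned above concrete, here is a minimal sketch of the simplest such method – random search over a single hyperparameter – with a made-up validation-error function standing in for actual model training:

```python
import math, random

random.seed(0)

def validation_error(lr):
    # Toy stand-in for training and evaluating a model: best around lr = 1e-2.
    return (math.log10(lr) + 2.0) ** 2

def random_search(n_trials, low=1e-5, high=1e0):
    """Model-free blackbox optimisation: sample log-uniformly, keep the best."""
    best_lr, best_err = None, float("inf")
    for _ in range(n_trials):
        lr = 10 ** random.uniform(math.log10(low), math.log10(high))
        err = validation_error(lr)
        if err < best_err:
            best_lr, best_err = lr, err
    return best_lr, best_err

lr, err = random_search(50)
print(f"best lr ~ {lr:.4g}, error {err:.4g}")
```

Bayesian optimization and evolutionary methods, covered in the tutorial, improve on this baseline by using past evaluations to decide where to sample next.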

Instructor Bio

Joaquin Vanschoren is Assistant Professor in Machine Learning at the Eindhoven University of Technology. His research focuses on machine learning, meta-learning, and understanding and automating learning. He founded and leads OpenML.org, an open science platform for machine learning. He received several demo and open data awards, has been a tutorial speaker at NeurIPS and ECMLPKDD, and an invited speaker at ECDA, StatComp, AutoML@ICML, CiML@NIPS, DEEM@SIGMOD, AutoML@PRICAI, MLOSS@NIPS, and on many other occasions. He was general chair at LION 2016, program chair of Discovery Science 2018, demo chair at ECMLPKDD 2013, and he co-organizes the AutoML and meta-learning workshop series at NIPS and ICML. He is also co-editor of the book “Automatic Machine Learning: Methods, Systems, Challenges”.

Joaquin Vanschoren, PhD

Assistant Professor of Machine Learning at the Eindhoven University of Technology

Workshop: Choosing The Right Deep Learning Framework: A Deep Learning Approach

As a developer advocate for IBM, I use machine learning in order to understand machine learning developers. I spend time building models to identify the problems they solve and the tools that they use to do so. This talk will present how deep learning can be used to predict the deep learning framework that most closely resembles the style in which a machine learning developer programs. Many of these frameworks are very new (for instance, TensorFlow is only now putting out its second major release). While the field of deep learning and the frameworks enabling it continue to change rapidly, the community around a particular framework tends to be much more consistent. The talk will end by letting developers test the models themselves: by uploading examples of their work via Jupyter Notebooks, they can predict the deep learning community and framework that is right for them.

Instructor Bio

Coming Soon!

Nick Acosta

Developer Advocate at IBM

Workshop: How to Make Machine Learning Fair and Accountable

The suitability of a Machine Learning model is traditionally measured by its accuracy. Metrics like RMSE, MAPE, AUC, ROC and Gini largely decide the ‘fate’ of Machine Learning models. However, if one digs deeper, the ‘fate’ of a Machine Learning model goes beyond a few accuracy-driven metrics to its capability of being Fair, Accountable, Transparent and Explainable, a.k.a. FATE.
Machine Learning, as the name implies, learns whatever it is taught. It is a ramification of what it is fed. It is a fallacy that ML has no perspective: it has exactly the perspective of the data that was used to train it. In simple words, algorithms can echo the prejudices that data explicitly or implicitly carries…more details
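As a taste of what “fair” can mean in practice, here is a minimal sketch of one common check, demographic parity – comparing positive-prediction rates across groups. The predictions, groups and numbers below are invented for illustration:

```python
# Demographic parity check: does the model approve members of each group
# at a similar rate? All data here is made up for the example.

def positive_rate(predictions, group, which):
    picked = [p for p, g in zip(predictions, group) if g == which]
    return sum(picked) / len(picked)

# 1 = model approves, 0 = model rejects
preds  = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"]

rate_a = positive_rate(preds, groups, "a")  # 3/5 = 0.6
rate_b = positive_rate(preds, groups, "b")  # 2/5 = 0.4
disparity = abs(rate_a - rate_b)
print(f"approval rates: a={rate_a}, b={rate_b}, disparity={disparity:.1f}")
```

Demographic parity is only one of several competing fairness definitions; which one applies depends on the use case, which is precisely the kind of judgement the workshop addresses.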

Instructor Bio

Sray is currently working for Publicis Sapient as a Data Scientist and is based in London. His expertise lies in predictive modelling, forecasting and advanced machine learning, and he possesses a deep understanding of algorithms and advanced statistics. He has a background in management and economics and has completed a master's-equivalent programme in Data Science and Analytics. His current areas of interest are Fair and Explainable ML.

Sray Agarwal

Specialist – Data Science at Publicis Sapient

Workshop: Make Beautiful Web Apps from Jupyter Notebooks

Are you a data scientist looking for new and powerful ways to ship data products to users within your organization?
This session offers a crash course in new open source Python, Jupyter and PyData tools to rapidly prototype interactive apps that are as expressive and powerful as any notebook you can build!
Come and learn how to quickly turn Jupyter notebooks – a central element in a data science workflow – into beautiful web apps to share with team members and stakeholders in your organization!

What you will learn:
At the end of this session we will have built an interactive app that you would be proud to show to your organization…more details

Instructor Bio

Michal leads a successful data science consultancy, delivering strategy and execution on data science, data engineering and ML projects, with clients in retail, transportation, finance, film, and building automation. Beyond technical contributions, he has developed successful approaches to creating lasting change in organizational collaboration and data literacy, unlocking human potential held back by elusive inefficiencies in process and culture.
He holds a Master's degree in Econometrics and Information Engineering from Poznań University of Economics. He has participated in commercial research projects, including mobile phone data research with published results.

Michal Mucha

Senior Data Scientist (independent consultant) at create.ml

Workshop: PyTorch 101: Building A Model Step-by-Step

Learn the basics of building a PyTorch model using a structured, incremental, first-principles approach. Find out why PyTorch is the fastest-growing deep learning framework and how to make use of its capabilities: autograd, dynamic computation graphs, model classes, data loaders and more.
The main goal of this session is to show you how PyTorch works: we will start with a simple and familiar example in Numpy and “torch” it! At the end of it, you should be able to understand PyTorch’s key components and how to assemble them together into a working model.
We will use Google Colab and work our way together into building a complete model in PyTorch. You should be comfortable using Jupyter notebooks, Numpy and, preferably, object oriented programming.
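For a flavour of the “first principles” starting point described above, here is linear regression with hand-derived gradients in plain Python – the kind of loop that PyTorch’s autograd later automates. This is an illustrative sketch, not the session’s actual material:

```python
# Linear regression from first principles: mean squared error minimised by
# gradient descent with gradients derived by hand (what autograd automates).

# Synthetic data: y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b, lr = 0.0, 0.0, 0.1
for _ in range(200):
    # Gradients of the mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # converges towards w=2, b=1
```

In the session, the same model is rebuilt with tensors, autograd and a model class, so each PyTorch component can be matched against the line of plain code it replaces.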

Instructor Bio

Coming Soon!

Daniel Voigt Godoy

Manager, Financial Advisory Analytics, Dean at Deloitte, Data Science Retreat

Workshop: Machine Learning with R Workshop: Smartphone-Based Indoor Positioning

We will compare some of the most popular R Machine Learning meta-libraries in a workshop with live coding where the audience is invited to code along.
Our goal will be to create a smartphone-based positioning system using a publicly available Wi-Fi fingerprinting dataset (UJIIndoorLoc database).
In the first part of the workshop we will focus on understanding, cleaning, visualising and transforming the data (including dimensionality reduction).
Once the data is ready, we will train classification models (to predict building and floor) and regression models (to predict latitude and longitude) using different R frameworks. We will evaluate our models and make a prediction that can be sent to the researchers of Jaume I University who created the dataset to compete in their ML challenge (they keep a private validation dataset).
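The session itself is taught in R; as a language-agnostic sketch of the underlying idea, here is nearest-neighbour Wi-Fi fingerprinting in a few lines of Python, with invented signal strengths rather than the UJIIndoorLoc data:

```python
import math

# Wi-Fi fingerprinting: match observed signal strengths (RSSI) against a
# database of labelled fingerprints and return the closest known position.
# All values below are made up for illustration.

# (RSSI from 3 access points in dBm) -> known (x, y) position
fingerprints = [
    ((-40, -70, -80), (0.0, 0.0)),
    ((-70, -40, -80), (10.0, 0.0)),
    ((-80, -70, -40), (5.0, 10.0)),
]

def predict_position(rssi, k=1):
    """Average the positions of the k fingerprints nearest in signal space."""
    ranked = sorted(fingerprints, key=lambda fp: math.dist(fp[0], rssi))
    nearest = [pos for _, pos in ranked[:k]]
    return (sum(p[0] for p in nearest) / k,
            sum(p[1] for p in nearest) / k)

print(predict_position((-42, -68, -79)))  # matches the first fingerprint
```

The real dataset has hundreds of access points and thousands of fingerprints, which is where the data cleaning, dimensionality reduction and model comparison in the workshop come in.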

Instructor Bio

In the spring of 2016, an enlightening work experience at a London-based data consultancy pushed Guillem towards a big career leap: from translating and social science teaching to data science. Three years later, having worked as an analytics consultant at Accenture, Guillem is the Senior Data Analytics Mentor at Ubiqum Berlin.

Guillem Perdigó

Lead Data Analytics Mentor at Ubiqum

Tutorial Sessions

More sessions added weekly

TUTORIAL: ML for Social Good: Success Stories and Challenges

Artificial intelligence, machine learning and deep learning have become hot topics recently due to their overwhelming success in various tasks. Nevertheless, can we go beyond superhuman performance in recognizing objects on image datasets? Are AI and ML already able to bring tangible impact in socially meaningful tasks? The United Nations Sustainable Development Goals (UN SDGs) set 17 objectives for peace and prosperity for people and the planet, including Climate Action, Good Health and Well-being, and No Poverty. In this tutorial the recent advances in machine learning for social good related to the UN SDGs will be presented. How can machine learning help with disaster relief and environment protection today? Which applications that bring immediate impact are ready to exploit state-of-the-art artificial intelligence methods? What are the challenges in applying machine learning algorithms in the wild? We will discuss these questions in this tutorial.

Instructor Bio

Olga Isupova received the Specialist (eq. to M.Sc.) degree in applied mathematics and computer science in 2012 from Lomonosov Moscow State University, Moscow, Russia, and the Ph.D. degree in 2017 from the University of Sheffield, Sheffield, U.K. She is a Research Assistant in machine learning with the Department of Engineering Science, University of Oxford, Oxford, U.K. Her research interests include machine learning for disaster response and environment protection, Bayesian nonparametrics, and anomaly detection.

Olga Isupova, PhD

Principal Researcher at University of Oxford, Machine Learning Research Group

Tutorial: Introduction to Data Science

Data science is fast becoming a richer and more varied field, ranging from research and development all the way through to commercial applications in a wide range of settings and industries. Data science is far from being a single, well-defined discipline; instead it is a combination of different specialisations and practices. For newcomers to the field, navigating this vast ecosystem can be a daunting task, and as the dust settles, getting some perspective may help in understanding the challenges and opportunities in the field.

In this talk we will cover some of the ingredients that make data science an exciting area of work within a business context. There is no doubt that the variety of positions that require data science skills is ever growing and evolving. This means that the availability of roles has expanded, making the field more diverse. This in turn has implications in both the talent that companies are looking for as well as in the activities that data scientists can get involved with…more details

Instructor Bio

Jesús (aka J.) is a lead data scientist with a strong background and interest in insight based on simulation, hypothesis generation and predictive analytics. He has substantial practical experience with statistical analysis, machine learning and optimisation tools in product development and innovation, finance, media and others.

Jesús is a Data Science Instructor at General Assembly. He has a background in physics and has held positions in both academia and industry, including Imperial College, IBM Data Science Studio, Prudential and Dow Jones, to name a few. Jesús is the author of “Essential MATLAB and Octave,” a book for students in physics, engineering, and other disciplines. He also authored the upcoming data science book entitled “Data Science and Analytics with Python.”

Jesus Rogel-Salazar, PhD

Independent Data Scientist and Consultant

Tutorial on Provenance

A decade of research on provenance, a standardisation of provenance at the World Wide Web Consortium (PROV), and applications, toolkits and services adopting provenance have led to the recognition that provenance is a critical facet of good data governance for businesses, governments and organisations in general. Provenance, which is defined as a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing, is now regarded as an essential function of data-intensive applications, to provide a trusted account of what they performed. In this tutorial, I will explain the motivation for provenance, introduce the standard PROV, present some applications of provenance, and review some tools for provenance. We will aim to conclude with a practical session on provenance modelling.
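To make the PROV vocabulary concrete, here is a toy, hypothetical provenance record in plain Python using the standard relation names. Real systems would use a PROV library and one of its serializations rather than raw dictionaries:

```python
# A miniature provenance record in the spirit of PROV: entities, activities
# and agents linked by the standard relation names. All names are invented.

provenance = {
    "entities":   {"raw_data", "clean_data", "report"},
    "activities": {"cleaning", "analysis"},
    "agents":     {"alice"},
    "wasGeneratedBy":    {"clean_data": "cleaning", "report": "analysis"},
    "used":              {"cleaning": "raw_data", "analysis": "clean_data"},
    "wasAssociatedWith": {"cleaning": "alice", "analysis": "alice"},
}

def lineage(entity):
    """Trace an entity back through the activities and inputs that produced it."""
    chain = [entity]
    while entity in provenance["wasGeneratedBy"]:
        activity = provenance["wasGeneratedBy"][entity]
        entity = provenance["used"][activity]
        chain += [activity, entity]
    return chain

print(lineage("report"))
```

This backwards walk – from a result to the activities and data that produced it – is exactly the “trusted account of what was performed” that the tutorial motivates.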

Instructor Bio

Luc Moreau is a Professor of Computer Science and Head of the department of Informatics, at King’s College London. Before joining King’s, Luc was Head of the Web and Internet Science, in the department of Electronics and Computer Science, at the University of Southampton.

Luc was co-chair of the W3C Provenance Working Group, which resulted in four W3C Recommendations and nine W3C Notes, specifying PROV, a conceptual data model for provenance on the Web, and its serializations in various Web languages. Previously, he initiated the successful Provenance Challenge series, which saw the involvement of over 20 institutions investigating provenance inter-operability in 3 successive challenges, and which resulted in the specification of the community Open Provenance Model (OPM). Before that, he led the development of provenance technology in the FP6 Provenance project and the Provenance Aware Service Oriented Architecture (PASOA) project. [For further details, please see https://nms.kcl.ac.uk/luc.moreau/about.html]

He is on the editorial board of “PeerJ Computer Science” and previously he was editor-in-chief of the journal “Concurrency and Computation: Practice and Experience” and on the editorial board of “ACM Transactions on Internet Technology”.

Luc Moreau, PhD

Professor & Head of Informatics at King’s College

Tutorial: Bringing Data to the Masses Through Visualisation

Data visualisation is invaluable in helping data scientists and researchers identify patterns and trends. But it also has a crucial role in communicating meaning to non-expert audiences.

Among the wider public, mistrust of data and experts is worryingly widespread, and in some areas growing. Around emotive issues such as vaccinations and climate change, robust data science can lose the argument against well-meaning public ignorance, or active campaigns of misinformation.

More generally, mistrust or misunderstanding of data science can hinder the ability of any organisation to sell the benefits of the discipline to a wider audience – to secure funding, sell products, drive engagement in open data projects, or sway public opinion to change behaviour…more details

Instructor Bio

Alan Rutter is the founder of consultancy Fire Plus Algebra, and is a specialist in communicating complex subjects through data visualisation, writing and design. He has worked as a journalist, product owner and trainer for brands and organisations including Guardian Masterclasses, WIRED, Time Out, the Home Office, the Biotechnology and Biological Sciences Research Council and Liverpool School of Tropical Medicine.

Alan Rutter

Founder at Fire Plus Algebra

Tutorial: Sequence Modelling with Deep Learning

Much of data is sequential – think speech, text, DNA, stock prices, financial transactions and customer action histories. Modern methods for modelling sequence data are often deep learning-based, composed of either recurrent neural networks (RNNs) or the attention-based transformer. A tremendous amount of research progress has recently been made in sequence modelling, particularly in the application to NLP problems. However, the inner workings of these sequence models can be difficult to dissect and intuitively understand.

This presentation/tutorial will start from the basics and gradually build upon concepts in order to impart an understanding of the inner mechanics of sequence models – why do we need specific architectures for sequences at all, when you could use standard feed-forward networks?…more details
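A minimal sketch of why sequences need their own architecture: a vanilla recurrent cell carries a hidden state forward, so input order matters. The scalar weights below are made up for illustration:

```python
import math

# One vanilla-RNN update, scalar version: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b).
# The hidden state h threads through time, so outputs depend on input order.

def rnn_step(x, h, w_x=0.5, w_h=0.8, b=0.0):
    return math.tanh(w_x * x + w_h * h + b)

def run(sequence):
    h = 0.0
    for x in sequence:
        h = rnn_step(x, h)
    return h

# Same multiset of inputs, different order -> different final state,
# something a feed-forward network over a bag of inputs cannot express.
print(run([1.0, 0.0, 0.0]))
print(run([0.0, 0.0, 1.0]))
```

Real RNNs use vectors and learned weight matrices, and transformers replace the recurrence with attention, but the order-sensitivity shown here is the property both families are built to capture.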

INSTRUCTOR BIO

Dr. Natasha Latysheva is a machine learning engineer in the NLP group at Welocalize, a leading language services provider. Her work focuses on machine translation and natural language processing. She previously worked as a data scientist in a video game studio, and before that on a PhD in computational biology at Cambridge.

Dr. Natasha Latysheva

Machine learning research engineer at Welocalize

Tutorial: Integrating Real-Time Video Analysis with Clinical Data to Enable Digital Diagnostics

The rapid digitization of healthcare has accelerated the adoption of artificial intelligence for clinical applications. A major opportunity for predictive clinical analytics is the ability to provide faster diagnoses and treatment plans tailored to individuals. Applications have been developed for multiple care settings, with many novel point-of-care diagnostics that can assess for diseases ranging from malaria to skin cancer. In this talk, we will demonstrate how to apply machine learning algorithms to identify regions of interest for detection of the pupil for point-of-care drug screening. We will highlight state-of-the-art technology to support real-time image analysis, approaches for the use of real-world clinical and patient-generated data, and best practices for translating novel AI algorithms from controlled research and development environments into real-world, clinical settings…more details

Instructor Bio

Dr. Schulz is an Assistant Professor of Laboratory Medicine and computational health care researcher at Yale School of Medicine. He received a PhD in Microbiology, Immunology, and Cancer Biology and an MD from the University of Minnesota. He is the Director of Informatics for the Department of Laboratory Medicine, Director of the CORE Center for Computational Health, and Medical Director of Data Science for Yale New Haven Health System. Dr. Schulz has over 20 years’ experience in software development with a focus on enterprise system architecture and has research interests in the management of large biomedical data sets and the use of real-world data for predictive modeling. At Yale, he has led the implementation of a distributed data analysis and predictive modeling platform, for which he received the Data Summit IBM Cognitive Honors award. Other projects within his research group include computational phenotyping and the development of clinical prescriptive models for precision medicine initiatives. His clinical areas of expertise include molecular diagnostics and transfusion medicine, where he has ongoing work assessing the use, safety, and efficacy of pathogen-reduced blood products.

Wade Schulz, MD, PhD

Assistant Professor and Director of Computational Health at Yale School of Medicine

Tutorial: The Perks of On Board Deep Learning: Train, Deploy and Use Neural Nets on a Raspberry Pi

Machine learning applications are new in the software development landscape and tend to be hard to build. As Google noted in a paper (source: https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf), this is mainly because the application is much broader than the model itself. Surprisingly, machine learning applications follow a double Pareto law. On the one hand, 80% of the time spent building these applications deals with machine learning problems, whereas the remaining 20% is spent on integrating the model into a running application. On the other hand, only 20% of the lines of code are specific to machine learning; the vast majority is about integration and operations…more details

Instructor Bio

Constant is strongly interested in creating value out of data and helps those who believe in this potential by accelerating their transition towards being a data-driven company. To address these new problems, he focuses on mastering every skill of a complete data geek: architecture expertise (data, applications, network), data science (statistical learning, data visualisation, algorithmic theory), and customer and business understanding (model prediction consumption, business metrics, customer needs).

Constant has been working for about two years at OCTO Technology. He is an expert in the industry sector and works on several types of mission, ranging from predictive maintenance of production sites to prediction of critical KPIs in video games, via real-time monitoring of manufacturing devices. Prior to joining OCTO, Constant worked as a researcher at Data61 (formerly known as NICTA), a leading Australian ICT research institute, applying machine learning to profile GUI users and provide the right amount of information to help them make decisions based on a machine learning prediction.

Constant Bridon

Data Science Consultant at OCTO Technology

Tutorial: Learning Gaussian Processes

In this tutorial, I will give a gentle introduction to Gaussian Processes. Gaussian Processes are a non-parametric tool to model complex non-linear relationships from data by assuming a prior distribution over functions. Compared to many classical machine learning approaches, Gaussian Processes have two main advantages:
They require learning only a small set of parameters, which may prevent over-fitting, and they endow all their predictions with uncertainties, which is important in safety-critical applications.

Starting from Gaussian distributions and linear regression, we will derive Gaussian Process regression models. We will then continue to the function-space view where Gaussian Processes can be completely defined by their covariance function. The covariance function encodes all assumptions about the form of function by describing the similarity between the data points…more details
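The function-space view above can be sketched in a few lines: with an RBF covariance function and just two training points, the posterior mean is a small, explicit linear solve. This is a toy illustration, not the tutorial's material:

```python
import math

# Bare-bones GP regression with an RBF covariance function, kept to two
# training points so the linear algebra stays explicit.

def k(a, b, length=1.0):
    """RBF covariance: nearby inputs get similar function values."""
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))

xs, ys, noise = [0.0, 2.0], [1.0, -1.0], 1e-6

# Gram matrix K + noise * I for the two training points (2x2, inverted by hand).
a = k(xs[0], xs[0]) + noise
b = k(xs[0], xs[1])
c = k(xs[1], xs[1]) + noise
det = a * c - b * b
# alpha = (K + noise * I)^{-1} y
alpha = [( c * ys[0] - b * ys[1]) / det,
         (-b * ys[0] + a * ys[1]) / det]

def predict(x):
    """Posterior mean: k(x, X) @ alpha."""
    return k(x, xs[0]) * alpha[0] + k(x, xs[1]) * alpha[1]

print(predict(0.0))  # ~ 1.0 (interpolates the training data)
print(predict(1.0))  # 0.0 by symmetry, midway between the two points
```

Everything about the model's behaviour is encoded in the covariance function `k`, which is exactly the point the tutorial develops; the posterior variance (not shown here) supplies the uncertainty estimates.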

Instructor Bio

Barbara Rakitsch has been working as a research scientist at the Bosch Center for Artificial Intelligence in Renningen since 2017. Her interests lie in the area of Bayesian modeling with a focus on Gaussian processes and time-series data.

In 2014, she received her PhD in probabilistic modeling for computational biology at the Max Planck Institute for Intelligent Systems in Tuebingen. Before joining Bosch, she worked on machine learning problems as a post-doc at the European Bioinformatics Institute in Cambridge and as a researcher in a cancer startup.

Barbara Rakitsch, PhD

Research Scientist at Bosch Center for Artificial Intelligence

Tutorial: An Overview of Responsible Artificial Intelligence

Enabling responsible development of artificial intelligence technologies is one of the major challenges we face as the field moves from research to practice. Researchers and practitioners from different disciplines have highlighted the ethical and legal challenges posed by the use of machine learning in many current and future real-world applications. Now there are calls from across academia, government, and industry for technology creators to ensure that AI is used only in ways that benefit people and “to engineer responsibility into the very fabric of the technology.” Overcoming these challenges and enabling responsible development is essential to ensure a future where AI and machine learning can be widely used. In this talk we will cover six principles for the development and deployment of trustworthy AI systems: four core principles of fairness, reliability/safety, privacy/security, and inclusiveness, underpinned by two foundational principles of transparency and accountability. We present how each principle plays a key role in responsible AI and what it means to take these principles from theory to practice…more details

Instructor Bio

Mehrnoosh Sameki is a technical program manager at Microsoft responsible for leading the product efforts on machine learning transparency within the Azure Machine Learning platform. Prior to Microsoft, she was a data scientist at an eCommerce company, Rue Gilt Groupe, applying data science and machine learning in the retail space to drive revenue and enhance customers' personalized shopping experiences. Before that, she completed a PhD in computer science at Boston University. In her spare time, she enjoys trying new food recipes, watching classic movies and documentaries, and reading about interior design and house decoration.

Mehrnoosh Sameki, PhD

Technical Program Manager at Microsoft

Tutorial: An Overview of Responsible Artificial Intelligence


Instructor Bio

Sarah Bird is a principal program manager at Microsoft, where she leads research and emerging technology strategy for Azure AI. Sarah works to accelerate the adoption and impact of AI by bringing together the latest research innovations with the best of open source and product expertise to create new tools and technologies. She leads the development of responsible AI tools in Azure Machine Learning. She’s also an active member of the Microsoft Aether committee, where she works to develop and drive company-wide adoption of responsible AI principles, best practices, and technologies. Previously, Sarah was one of the founding researchers in the Microsoft FATE research group and worked on AI fairness at Facebook. She’s an active contributor to the open source ecosystem; she co-founded ONNX, an open source standard for machine learning models, and was a leader in the PyTorch 1.0 project. She was an early member of the machine learning systems research community and has been active in growing and forming the community. She co-founded the SysML research conference and the Learning Systems workshops. She holds a PhD in computer science from the University of California, Berkeley, advised by Dave Patterson, Krste Asanovic, and Burton Smith.

Sarah Bird, PhD

Principal Program Manager at Microsoft

Tutorial: Probabilistic Deep Learning in TensorFlow: The Why and How

Bayesian probabilistic techniques allow machine learning practitioners to encode expert knowledge in otherwise-uninformed models and support uncertainty in model output. Probabilistic deep learning models take this further by fitting distributions rather than point estimates to each of the weights in a neural network, allowing its builder to inspect the prediction stability for any given set of input data. Following a slew of recent technical advancements, it’s never been easier to apply probabilistic modeling in a deep learning context, and TensorFlow Probability offers full support for probabilistic layers as a first-class citizen in the TensorFlow 2.0 ecosystem. This tutorial will focus on the motivation for probabilistic deep learning and the trade-offs and design decisions relevant to applying it in practice, with applications and examples demonstrated in TensorFlow Probability.
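The core idea, a distribution over each weight rather than a point estimate, can be illustrated without any deep learning framework. The sketch below is plain NumPy rather than TensorFlow Probability code, with made-up weight means and standard deviations; it samples weights from Gaussian distributions and inspects the spread of the resulting predictions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Point-estimate layer: one fixed weight vector.
# Probabilistic layer: a distribution over weights; repeated forward
# passes with fresh weight samples reveal how stable the prediction
# is for a given input. (Means/stds here are invented for illustration.)
w_mean = np.array([1.5, -0.8])
w_std = np.array([0.1, 0.4])

def predict(x, n_samples=1000):
    # Draw weight samples, then compute the predictive distribution.
    w = rng.normal(w_mean, w_std, size=(n_samples, 2))
    return w @ x  # shape: (n_samples,)

x = np.array([1.0, 2.0])
preds = predict(x)
print(preds.mean(), preds.std())  # predictive mean and uncertainty
```

In TensorFlow Probability the same idea is packaged as probabilistic Keras layers, which draw fresh weight samples on each forward pass, so the practitioner works with predictive distributions instead of single numbers.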

Instructor Bio

Coming Soon!

Zach Anglin

Lead Data Scientist at S&P Global

Tutorial: Rule Induction and Reasoning in Knowledge Graphs

Advances in information extraction have enabled the automatic construction of large knowledge graphs (KGs) like DBpedia, Freebase, YAGO and Wikidata. Learning rules from KGs is a crucial task for KG completion, cleaning and curation. This tutorial presents state-of-the-art rule induction methods, recent advances, research opportunities as well as open challenges along this avenue. We put a particular emphasis on the problems of learning exception-enriched rules from highly biased and incomplete data. Finally, we discuss possible extensions of classical rule induction techniques to account for unstructured resources (e.g., text) along with the structured ones.
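Candidate rules mined from a KG, such as livesIn(X, Y) ← bornIn(X, Y), are typically scored by support and confidence over the graph's triples. A minimal sketch, with toy triples and predicate names invented purely for illustration:

```python
# Hypothetical toy KG as a set of (subject, predicate, object) triples.
kg = {
    ("alice", "bornIn", "paris"), ("alice", "livesIn", "paris"),
    ("bob",   "bornIn", "rome"),  ("bob",   "livesIn", "rome"),
    ("carol", "bornIn", "oslo"),  ("carol", "livesIn", "berlin"),
}

def rule_quality(body_pred, head_pred, kg):
    """Support and confidence of the rule head(X,Y) <- body(X,Y)."""
    body = {(s, o) for s, p, o in kg if p == body_pred}
    head = {(s, o) for s, p, o in kg if p == head_pred}
    support = len(body & head)        # pairs satisfying body AND head
    confidence = support / len(body)  # fraction of body pairs with head
    return support, confidence

print(rule_quality("bornIn", "livesIn", kg))  # support 2, confidence 2/3
```

Here "carol" is an exception to the rule, the kind of case that motivates the exception-enriched rules discussed in the tutorial, since KGs are incomplete and a low confidence may reflect missing facts rather than a bad rule.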

Instructor Bio

Coming Soon!

Daria Stepanova, PhD

Research Scientist at Bosch Center for AI

Tutorial: Price optimisation: From exploration to productionising

Dynamic price optimisation is an increasingly profitable yet challenging undertaking, especially for large and established businesses with long-standing practices and legacy data retention systems. Machine learning models built on often-large volumes of sales data provide opportunities to grow revenue, both by increasing prices and by reducing lost sales. David, Alexey and their team have developed approaches to solving such problems, from initial data exploration to cloud-based deployment. Key insights from this experience will be covered…more details
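As a toy illustration of the underlying optimisation, and not the presenters' actual method, one can fit a demand model to sales data and pick the price that maximises expected revenue. The linear demand curve below is a made-up stand-in for a learned model:

```python
import numpy as np

# Hypothetical demand model "learned" from sales data: expected units
# sold fall linearly with price (a stand-in for a real ML regressor).
def expected_demand(price):
    return np.maximum(100.0 - 4.0 * price, 0.0)

prices = np.linspace(1.0, 25.0, 200)
revenue = prices * expected_demand(prices)  # revenue = price x demand
best = prices[np.argmax(revenue)]
print(best)  # close to the analytic optimum of 12.5 for this curve
```

In practice the demand model, and hence the revenue curve, would come from a fitted ML model with uncertainty, and the search would respect business constraints, which is where the hard deployment work described in the tutorial begins.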

Instructor Bio

David is a senior data scientist at Altius. He has a PhD in particle physics and has spent ten years working in accelerator and detector physics at laboratories in the UK, US and China, where his research focussed on simulation and analysis of rare events along with designing and constructing high-frequency, low-latency data acquisition systems. At Altius he focuses on price optimisation solutions from initial data exploration through to robust cloud-based solutions.

David Adey, PhD

Senior Data Scientist at Altius

Tutorial: Price optimisation: From exploration to productionising

Dynamic price optimisation is an increasingly profitable yet challenging undertaking, especially for large and established businesses with long-standing practices and legacy data retention systems. Machine learning models built on often-large volumes of sales data provide opportunities to grow revenue, both by increasing prices and by reducing lost sales. David, Alexey and their team have developed approaches to solving such problems, from initial data exploration to cloud-based deployment. Key insights from this experience will be covered…more details

Instructor Bio

Alexey is an experienced data scientist. He has designed, implemented and led multiple successful projects, from pre-sales to production, for clients ranging from small organisations to FTSE 100 companies. Alexey holds a PhD in particle physics and has a 20+ year career in science and big data. He worked for 13 years at CERN, focusing on the Higgs boson discovery and its precise mass measurement, and for a couple of years in computational biology, building the most popular NN-based protein secondary structure predictor. Alexey is the author and co-author of dozens of refereed papers, including the development of some widely used machine learning and statistical algorithms.

Alexey Drozdetskiy, PhD

Data Scientist at Altius

Tutorial: Introduction to Kubeflow Pipelines

This tutorial will demonstrate how to use Kubeflow Pipelines to create a full Machine Learning application on Kubernetes.

Productionizing Machine Learning workloads is today one of the key challenges in turning Machine Learning models into reliable drivers of business value. Model training and retraining, model serving and integration with operational systems, model validation, data validation and data generation are the key components of a robust, production-level Machine Learning application, or pipeline. This comes with an inherent complexity that Kubeflow Pipelines attempts to solve, making it manageable for engineers and data scientists alike…more details
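The component decomposition described above can be sketched, independently of Kubeflow itself, as plain Python functions; in a real Kubeflow pipeline each step would run as its own container on Kubernetes, wired together by the pipeline DSL. All names and the toy "model" below are illustrative:

```python
# Sketch of a pipeline's stages as plain functions. In Kubeflow
# Pipelines, each stage would be a containerised component and the
# wiring below would be expressed as a DAG.

def validate_data(raw):
    assert all(isinstance(x, (int, float)) for x in raw), "bad record"
    return raw

def train_model(data):
    # Stand-in "model": just the mean of the training data.
    return sum(data) / len(data)

def validate_model(model, data, tolerance=10.0):
    return abs(model - data[0]) < tolerance  # trivial sanity check

def serve(model):
    return lambda x: x - model  # "prediction": deviation from the mean

# Wiring the steps together, as a pipeline DAG would:
data = validate_data([1.0, 2.0, 3.0])
model = train_model(data)
assert validate_model(model, data)
predict = serve(model)
print(predict(5.0))
```

The point of the decomposition is that each stage can be retried, versioned, and scaled independently, which is precisely what becomes hard to manage by hand and what the tutorial's Kubernetes-based tooling addresses.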

Instructor Bio

Dan Anghel is a Machine Learning Cloud Engineer who has worked at Google for over 4 years. Prior to Google, Dan spent more than 10 years in the French retail industry building scalable e-commerce solutions. Specialised in Machine Learning and Big Data, he helps the largest Google customers build scalable, production-level solutions on Google Cloud.

Dan Anghel

Machine Learning Cloud Engineer at Google

Tutorial: Credit Card Fraud Detection

Fraud detection in credit card transactions is a very wide and complex field. Over the years, a number of techniques have been proposed, mostly stemming from the anomaly detection branch of data science. That said, most techniques address one of two main situations, depending on the available dataset:
Situation 1: The dataset has a sufficient number of fraud examples.
Situation 2: The dataset has no (or only a negligible number of) fraud examples.

The first situation is the more standard one. Here we can approach fraud detection with classic machine learning techniques: any supervised classification algorithm will do, e.g. Random Forest, Logistic Regression, etc…more details
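A minimal sketch of Situation 1 with scikit-learn, using a synthetic imbalanced dataset as a stand-in for real transaction data (the class weighting and metric choice here are illustrative, not the instructor's recipe):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced dataset standing in for card transactions:
# roughly 3% of samples are "fraud" (class 1).
X, y = make_classification(n_samples=5000, n_features=10,
                           weights=[0.97, 0.03], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y,
                                          random_state=0)

# class_weight="balanced" upweights the rare fraud class so the
# model doesn't simply predict "legitimate" everywhere.
clf = RandomForestClassifier(n_estimators=100,
                             class_weight="balanced", random_state=0)
clf.fit(X_tr, y_tr)
print(recall_score(y_te, clf.predict(X_te)))  # fraction of frauds caught
```

Recall on the fraud class is a more informative metric than accuracy here, since a model that never predicts fraud would still be about 97% accurate on such data.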

Instructor Bio

Kathrin Melcher is a Data Scientist at KNIME. She holds a Master’s Degree in Mathematics from the University of Konstanz, Germany. She joined the evangelism team at KNIME in 2017 and has a strong interest in data science and machine learning algorithms. Kathrin enjoys teaching and sharing her data science knowledge with the community – for example in the book “From Excel to KNIME” – as well as on various blog posts and at training courses, workshops, and conference presentations.

Kathrin Melcher

Principal Data Scientist at KNIME

ODSC EUROPE 2019 | November 19th – 22nd

Register Now

What To Expect

As we prepare our 2019 schedule, take a look at some of the previous training and workshops we have hosted at ODSC Europe for an idea of what to expect.

  • High Performance, Distributed Spark ML, TensorFlow AI, and GPU

  • Machine Learning with R

  • Deep Learning with TensorFlow for Absolute Beginners

  • Algorithmic Trading with Machine and Deep Learning

  • Deep Learning in Keras

  • Deep Learning – Beyond the Basics

  • Running Intelligent Applications inside a Database: Deep Learning with Python Stored Procedures in SQL

  • Distributed Deep Learning on Hops

  • R and Spark with Sparklyr

  • Towards Biologically Plausible Deep Learning

  • Deep Learning Ensembles in Toupee

  • A Gentle Introduction to Predictive Analytics with R

  • Graph Data – Modelling and Querying with Neo4j and Cypher

  • Introduction to Data Science with R

  • Introduction to Python in Data Science

  • The Magic of Dimensionality Reduction

  • Analyze Data, Build a UI and Deploy on the Cloud with Apache Spark, Notebooks and PixieDust

  • Data Science for Executives

  • Data Science Learnathon. From Raw Data to Deployment: the Data Science Cycle with KNIME

  • Drug Discovery with KNIME

  • Interactive Visualisation with R (and just R)

  • Introduction to Algorithmic Trading

  • Introduction to Data Science – A Practical Viewpoint

  • Julia for Data Scientists

  • Telling Stories with Data

  • Win Kaggle Competitions Using StackNet Meta Modelling Framework