Training & Workshop Sessions

– Taught by World-Class Data Scientists –

Learn the latest data science concepts, tools and techniques from the best. Forge a connection with these rockstars from industry and academia, who are passionate about molding the next generation of data scientists.

Highly Experienced Instructors

Our instructors are highly regarded in data science, coming from both academia and notable companies.

Real World Applications

Gain the skills and knowledge to use data science in your career and business, without breaking the bank.

Cutting Edge Subject Matter

Find training sessions offered on a wide variety of data science topics from machine learning to data visualization.

ODSC Training Includes

Form a working relationship with some of the world’s top data scientists for follow up questions and advice.

Additionally, your ticket includes access to 50+ talks and workshops.

High quality recordings of each session, exclusively available to premium training attendees.

Equivalent training at other conferences costs much more.

Professionally prepared learning materials, custom tailored to each course.

Opportunities to connect with other ambitious like-minded data scientists.

2018 Training Instructors

We have some of the top names in data science signed up to host training and workshop sessions. More instructors will be added weekly.

Workshop: Multivariate Time Series Forecasting Using Statistical and Machine Learning Models

Time series data is ubiquitous: weekly initial unemployment claims, daily term structure of interest rates, tick level stock prices, weekly company sales, daily foot traffic recorded by mobile devices, and daily number of steps taken recorded by a wearable, just to name a few.

Some of the most important and commonly used data science techniques in time series forecasting are those developed in the field of machine learning and statistics. Data scientists should have at least a few basic time series statistical and machine learning modeling techniques in their toolkit.

This lecture discusses the formulation of Vector Autoregressive (VAR) models, one of the most important classes of multivariate time series statistical models, and neural network-based techniques, which have received a lot of attention in the data science community in the past few years. It demonstrates how they are implemented in practice and compares their advantages and disadvantages. Real-world applications, demonstrated using Python, are used throughout the lecture to illustrate these techniques. While not the focus of this lecture, exploratory time series data analysis using histograms, kernel density plots, time-series plots, scatterplot matrices, plots of autocorrelation (i.e. correlograms), plots of partial autocorrelation, and plots of cross-correlations will also be included in the demo.
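
To make the VAR formulation concrete, here is a minimal sketch of a first-order model, y_t = c + A y_{t-1} + e_t, iterated forward with the noise term set to zero. The function name `var1_forecast`, the intercept `c`, the coefficient matrix `A`, and the starting values are all made up for illustration; they are not taken from the workshop, where such coefficients would be estimated from data.

```python
def var1_forecast(c, A, y_prev, steps):
    """Iterate y_t = c + A @ y_{t-1} (noise set to zero) for `steps` periods."""
    path = []
    y = list(y_prev)
    for _ in range(steps):
        # Each variable depends on the previous value of every variable.
        y = [c[i] + sum(A[i][j] * y[j] for j in range(len(y)))
             for i in range(len(c))]
        path.append(y)
    return path


# Hypothetical two-variable system (illustrative numbers only).
c = [1.0, 0.5]
A = [[0.5, 0.1],
     [0.2, 0.3]]

forecasts = var1_forecast(c, A, y_prev=[2.0, 1.0], steps=2)
for step, values in enumerate(forecasts, start=1):
    print(step, [round(v, 4) for v in values])
```

The off-diagonal entries of `A` are what make the model multivariate: each series is forecast from the lagged values of the other series as well as its own.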

Instructor Bio

Jeffrey is the Chief Data Scientist at AllianceBernstein, a global investment firm managing over $500 billion. He is responsible for building and leading the data science group, partnering with investment professionals to create investment signals using data science, and collaborating with sales and marketing teams to analyze clients. He holds a Ph.D. in economics from the University of Pennsylvania and has taught statistics, econometrics, and machine learning courses at UC Berkeley, Cornell, NYU, the University of Pennsylvania, and Virginia Tech. Previously, Jeffrey held advanced analytic positions at Silicon Valley Data Science, Charles Schwab Corporation, KPMG, and Moody’s Analytics.

Jeffrey Yau, PhD

Chief Data Scientist, AllianceBernstein

Training: Introduction to Machine Learning

Machine learning has become an indispensable tool across many areas of research and commercial applications. From text-to-speech for your phone to detecting the Higgs boson, machine learning excels at extracting knowledge from large amounts of data. This talk will give a general introduction to machine learning, as well as introduce practical tools for you to apply machine learning in your research. We will focus on one particularly important subfield of machine learning, supervised learning. The goal of supervised learning is to “learn” a function that maps inputs x to an output y, by using a collection of training data consisting of input-output pairs. We will walk through formalizing a problem as a supervised machine learning problem, creating the necessary training data and applying and evaluating a machine learning algorithm. The talk should give you all the necessary background to start using machine learning yourself.
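
As a small illustration of learning a function from input-output pairs, the sketch below fits a straight line by ordinary least squares in plain Python. The helper names `fit_line` and `predict` and the training pairs are invented for this example; the talk itself uses its own tools and data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training pairs (x, y) that follow y = 2x + 1 exactly.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(train_x, train_y)


def predict(x):
    """The learned function: maps a new input x to a predicted output y."""
    return slope * x + intercept


print(predict(4.0))  # prediction for an input not seen during training
```

Evaluating such a model properly means checking predictions on pairs that were held out of training, which is the step the abstract refers to as evaluating a machine learning algorithm.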

Instructor Bio

Andreas is a lecturer at the Data Science Institute at Columbia University and author of the O’Reilly book “Introduction to Machine Learning with Python,” which describes a practical approach to machine learning with Python and scikit-learn. He is one of the core developers of the scikit-learn machine learning library, and he has been co-maintaining it for several years. Andreas is also a Software Carpentry instructor. In the past, he worked at the NYU Center for Data Science on open source and open science, and as a Machine Learning Scientist at Amazon. Andreas’s mission is to create open tools to lower the barrier of entry for machine learning applications, promote reproducible science and democratize the access to high-quality machine learning algorithms.

Andreas Mueller, PhD

Author, Lecturer, and Core contributor to scikit-learn

Training: Intermediate Machine Learning with scikit-learn

Scikit-learn is a machine learning library for Python that has become a valuable tool for many data science practitioners. This talk will cover some of the more advanced aspects of scikit-learn, such as building complex machine learning pipelines, model evaluation, parameter search, and out-of-core learning. Apart from metrics for model evaluation, we will cover how to evaluate model complexity, how to tune parameters with grid search and randomized parameter search, and what their trade-offs are. We will also cover out-of-core text feature processing via feature hashing.
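
The idea behind feature hashing can be sketched in a few lines of standard-library Python. This is a simplified stand-in, not scikit-learn's implementation, which differs in detail; the function name `hash_features`, the bucket count, and the use of MD5 as the hash are assumptions made for the example.

```python
import hashlib


def hash_features(text, n_buckets=8):
    """Map tokens to a fixed-length count vector without storing a vocabulary."""
    vector = [0] * n_buckets
    for token in text.lower().split():
        # A stable hash (unlike Python's salted built-in hash()) keeps the
        # token-to-bucket mapping identical across runs and machines.
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        index = int(digest, 16) % n_buckets
        vector[index] += 1
    return vector


doc = "the cat sat on the mat"
vec = hash_features(doc)
print(vec)
```

Because the vector length is fixed up front and no vocabulary is kept in memory, documents can be processed one batch at a time, which is what makes the approach suitable for out-of-core learning. The trade-off is that different tokens can collide in the same bucket.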

Instructor Bio

Andreas is a lecturer at the Data Science Institute at Columbia University and author of the O’Reilly book “Introduction to Machine Learning with Python,” which describes a practical approach to machine learning with Python and scikit-learn. He is one of the core developers of the scikit-learn machine learning library, and he has been co-maintaining it for several years. Andreas is also a Software Carpentry instructor. In the past, he worked at the NYU Center for Data Science on open source and open science, and as a Machine Learning Scientist at Amazon. Andreas’s mission is to create open tools to lower the barrier of entry for machine learning applications, promote reproducible science and democratize the access to high-quality machine learning algorithms.

Andreas Mueller, PhD

Author, Lecturer, and Core contributor to scikit-learn

Workshop: Feature Engineering for Natural Language Processing

Have you ever been curious what features can be extracted from written text? Have you ever wondered what a syntactic dependency is or how it can be useful? During this workshop, we will dive into the language structure and use linguistic intuition to solve a natural language processing problem.

The plan of the workshop:

  1. Introduction to structural linguistics, specifically morphology and syntax
  2. Natural language corpora and ways to process them
  3. Feature design for an error correction task
  4. Ambiguities in natural language

Instructor Bio

Computational linguist passionate about building natural language processing applications. Mariana has professional experience in such areas as syntactic parsing, sentiment analysis, named entity recognition, fact extraction, text anonymization, and a few more. For the last three years, she has been working on error correction and text improvement algorithms at Grammarly. Mariana cares a lot about computational linguistics in Ukraine, constantly looks for talented linguists, and spreads the word about the field of NLP by participating at AI conferences, collaborating with Ukrainian universities and organizing educational events (courses, meetups, summer schools). Her main interest is structural linguistics as a method of formalizing the natural language.

Mariana Romanyshyn

Technical Lead, Grammarly Inc.

Training: Feature Engineering for Time Series Data

Most machine learning algorithms today are not time-aware and are not easily applied to time series and forecasting problems. Leveraging algorithms like XGBoost, or even linear models, typically requires substantial data preparation and feature engineering – for example, creating lagged features, detrending the target, and detecting periodicity. The preprocessing required becomes more difficult in the common case where the problem requires predicting a window of multiple future time points. As a result, most practitioners fall back on classical methods, such as ARIMA or trend analysis, which are time-aware but often less expressive. This talk covers practices for solving this challenge and explores the potential to automate this process in order to apply advanced machine learning algorithms to time series problems.
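
The lagged-feature and detrending steps mentioned above can be sketched in plain Python. The helper names `make_lagged_rows` and `difference` and the toy `sales` series are hypothetical and only illustrate the shape of the preprocessing; first differencing is one simple detrending choice among several.

```python
def make_lagged_rows(series, n_lags, horizon=1):
    """Build (features, target) rows: the previous n_lags values are the
    features, and the value `horizon` steps ahead is the target."""
    rows = []
    for t in range(n_lags, len(series) - horizon + 1):
        features = series[t - n_lags:t]      # values at t-n_lags .. t-1
        target = series[t + horizon - 1]     # value to be predicted
        rows.append((features, target))
    return rows


def difference(series):
    """First differences, a simple way to remove a linear trend."""
    return [b - a for a, b in zip(series, series[1:])]


sales = [10, 12, 14, 16, 18, 20]             # toy series with a linear trend

one_step = make_lagged_rows(sales, n_lags=2)
two_step = make_lagged_rows(sales, n_lags=2, horizon=2)
detrended = difference(sales)

print(one_step[0], two_step[0], detrended)
```

Each row can then be fed to an ordinary tabular learner. Predicting a window of several future points means building one such target per horizon, which is part of why the preprocessing grows quickly.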

Instructor Bio

Michael Schmidt is the Chief Scientist at DataRobot and has been featured in the Forbes list of the world’s top 7 data scientists and MIT’s list of the most innovative 35-under-35. He has authored AI research in the journal Science and has appeared in media outlets such as the New York Times, NPR’s RadioLab, the Science Channel, and Communications of the ACM. In 2011, Michael founded Nutonian and led the development of Eureqa, a machine learning application and service used by over 80,000 users and later acquired by DataRobot in 2017. Most recently, his work has focused on automated machine learning, feature engineering, and time series prediction.

Michael Schmidt, PhD

Chief Scientist, DataRobot

Training: Advanced Machine Learning with scikit-learn Part I

Abstract Coming Soon

Instructor Bio

Andreas is a lecturer at the Data Science Institute at Columbia University and author of the O’Reilly book “Introduction to Machine Learning with Python,” which describes a practical approach to machine learning with Python and scikit-learn. He is one of the core developers of the scikit-learn machine learning library, and he has been co-maintaining it for several years. Andreas is also a Software Carpentry instructor. In the past, he worked at the NYU Center for Data Science on open source and open science, and as a Machine Learning Scientist at Amazon. Andreas’s mission is to create open tools to lower the barrier of entry for machine learning applications, promote reproducible science and democratize the access to high-quality machine learning algorithms.

Andreas Mueller, PhD

Author, Lecturer, and Core contributor to scikit-learn

Training: Advanced Machine Learning with scikit-learn Part II

Abstract Coming Soon

Instructor Bio

Andreas is a lecturer at the Data Science Institute at Columbia University and author of the O’Reilly book “Introduction to Machine Learning with Python,” which describes a practical approach to machine learning with Python and scikit-learn. He is one of the core developers of the scikit-learn machine learning library, and he has been co-maintaining it for several years. Andreas is also a Software Carpentry instructor. In the past, he worked at the NYU Center for Data Science on open source and open science, and as a Machine Learning Scientist at Amazon. Andreas’s mission is to create open tools to lower the barrier of entry for machine learning applications, promote reproducible science and democratize the access to high-quality machine learning algorithms.

Andreas Mueller, PhD

Author, Lecturer, and Core contributor to scikit-learn

Training: Machine Learning in R Part I

Modern statistics has become almost synonymous with machine learning, a collection of techniques that utilize today’s incredible computing power. This two-part course focuses on the available methods for implementing machine learning algorithms in R, and will examine some of the underlying theory behind the curtain. We start with the foundation of it all, the linear model and its generalization, the glm. We look at how to assess model quality with traditional measures and cross-validation and visualize models with coefficient plots. Next we turn to penalized regression with the Elastic Net. After that we turn to Boosted Decision Trees utilizing xgboost. Attendees should have a good understanding of linear models and classification and should have R and RStudio installed, along with the `glmnet`, `xgboost`, `boot`, `ggplot2`, `UsingR` and `coefplot` packages.

Instructor Bio

Jared Lander is the Chief Data Scientist of Lander Analytics, a data science consultancy based in New York City, the Organizer of the New York Open Statistical Programming Meetup and the New York R Conference, and an Adjunct Professor of Statistics at Columbia University. With a master’s from Columbia University in statistics and a bachelor’s from Muhlenberg College in mathematics, he has experience in both academic research and industry. His work for both large and small organizations ranges from music and fund raising to finance and humanitarian relief efforts.

He specializes in data management, multilevel models, machine learning, generalized linear models and statistical computing. He is the author of R for Everyone: Advanced Analytics and Graphics, a book about R programming geared toward data scientists and non-statisticians alike, and is creating a course on glmnet with DataCamp.

Jared Lander

Statistics Professor, Columbia University, Author of R for Everyone

Training: Machine Learning in R Part II

Modern statistics has become almost synonymous with machine learning, a collection of techniques that utilize today’s incredible computing power. This two-part course focuses on the available methods for implementing machine learning algorithms in R, and will examine some of the underlying theory behind the curtain. We start with the foundation of it all, the linear model and its generalization, the glm. We look at how to assess model quality with traditional measures and cross-validation and visualize models with coefficient plots. Next we turn to penalized regression with the Elastic Net. After that we turn to Boosted Decision Trees utilizing xgboost. Attendees should have a good understanding of linear models and classification and should have R and RStudio installed, along with the `glmnet`, `xgboost`, `boot`, `ggplot2`, `UsingR` and `coefplot` packages.

Instructor Bio

Jared Lander is the Chief Data Scientist of Lander Analytics, a data science consultancy based in New York City, the Organizer of the New York Open Statistical Programming Meetup and the New York R Conference, and an Adjunct Professor of Statistics at Columbia University. With a master’s from Columbia University in statistics and a bachelor’s from Muhlenberg College in mathematics, he has experience in both academic research and industry. His work for both large and small organizations ranges from music and fund raising to finance and humanitarian relief efforts.

He specializes in data management, multilevel models, machine learning, generalized linear models and statistical computing. He is the author of R for Everyone: Advanced Analytics and Graphics, a book about R programming geared toward data scientists and non-statisticians alike, and is creating a course on glmnet with DataCamp.

Jared Lander

Statistics Professor, Columbia University, Author of R for Everyone

Training: Learn D3 essentials to get started with web-based data visualization

One of the most popular frameworks today for web-based data visualizations is D3. In this workshop you will learn how to leverage the most common parts of the D3 framework to create data visualizations. This will include making selections, working with data, creating shapes, and more. Although the emphasis is on learning D3, some general visualization design principles will be touched upon briefly as well, to help you make better informed design decisions.

Instructor Bio

Jan Willem Tulp is an independent Data Experience Designer from The Netherlands. With his company TULP interactive he creates custom data visualizations for a variety of clients. He has helped clients such as Google, European Space Agency, Scientific American, Nature and World Economic Forum by creating visualizations, both interactive and in print. His work has appeared in several books and magazines and he speaks regularly at international conferences.

Jan Willem Tulp

Data Experience Designer, TULP Interactive

Training: The New Science of Big Data Analytics, Based on the Geometry and the Topology of Complex, Hierarchic Systems

These foundations of Data Science are solidly based on mathematics and computational science. The hierarchical nature of complex reality is part and parcel of this new, mathematically well-founded way of observing and interacting with (physical, social and all) realities.

These lectures include pattern recognition and knowledge discovery, computer learning and statistics. They address how geometry and topology can uncover and empower the semantics of data. Key themes include: text mining; computational linear time hierarchical clustering, search and retrieval; the Correspondence Analysis platform that performs latent semantic factor space mapping, and accompanying hierarchical clustering.

Various application domains are covered in the case studies. These include text mining of literary text and social media (Twitter), and clustering in astronomy, chemistry, and psychoanalysis. The final discussion addresses the increasingly important domains of smart environments, the Internet of Things, health analytics, and the further general scope of Big Data.

Instructor Bio

Fionn Murtagh is Professor of Data Science and was Professor of Computer Science, including Department Head, in many universities. Following his primary degrees in Mathematics and Engineering Science, and before his MSc in Computer Science (in Information Retrieval) at Trinity College Dublin, his first position, as Statistician/Programmer, was in national-level (first and second level) education research. His PhD at Université P&M Curie, Paris 6, with Prof. Jean-Paul Benzécri, was in conjunction with the national geological research centre, BRGM. After an initial 4 years as lecturer in computer science, there was a period in atomic reactor safety in the European Joint Research Centre, in Ispra (VA), Italy. On the Hubble Space Telescope, as a European Space Agency Senior Scientist, Fionn was based at the European Southern Observatory, in Garching, Munich for 12 years. For 5 years, Fionn was a Director in Science Foundation Ireland, managing mathematics and computing, nanotechnology, and introducing and growing all that is related to environmental science and renewable energy.

Fionn was Editor-in-Chief of the Computer Journal (British Computer Society) for more than 10 years, and is an Editorial Board member of many journals. With over 300 refereed articles and 30 books authored or edited, his fellowships and scholarly academies include: Fellow of: British Computer Society (FBCS), Institute of Mathematics and Its Applications (FIMA), International Association for Pattern Recognition (FIAPR), Royal Statistical Society (FRSS), Royal Society of Arts (FRSA). Elected Member: Royal Irish Academy (MRIA), Academia Europaea (MAE). Senior Member IEEE.

Fionn Murtagh, PhD

Director, Centre of Mathematics and Data Science, University of Huddersfield

Training: Getting to Grips with the Tidyverse (R)

The tidyverse is essential for any statistician or data scientist who deals with data on a day-to-day basis. By focusing on small key tasks, the tidyverse suite of packages removes the pain of data manipulation. In this tutorial, we’ll cover some of the core features of the tidyverse, such as dplyr (the workhorse of the tidyverse), string manipulation, linking directly to databases and the concept of tidy data.

Instructor Bio

Dr. Colin Gillespie is Senior lecturer (Associate Professor) at Newcastle University, UK. His research interests are high performance statistical computing and Bayesian statistics. He is regularly employed as a consultant by Jumping Rivers and has been teaching R since 2005 at a variety of levels, ranging from beginners to advanced programming.

Dr. Colin Gillespie

Author of Efficient R Programming, R Trainer and Consultant, and Senior Lecturer at Newcastle University

Training: Data Science 101

Curious about Data Science? Self-taught on some aspects, but missing the big picture? Well you’ve got to start somewhere and this session is the place to do it. This session will cover, at a layman’s level, some of the basic concepts of data science. In a conversational format, we will discuss: What are the differences between Big Data and Data Science – and why aren’t they the same thing? What distinguishes descriptive, predictive, and prescriptive analytics? What purpose do predictive models serve in a practical context? What kinds of models are there and what do they tell us? What is the difference between supervised and unsupervised learning? What are some common pitfalls that turn good ideas into bad science? During this session, attendees will learn the difference between k-nearest neighbor and k-means clustering, understand the reasons why we do normalize and don’t overfit, and grasp the meaning of No Free Lunch.
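
The difference between k-nearest neighbor and k-means can be shown side by side on one-dimensional toy data. This is a minimal sketch with invented function names (`knn_predict`, `kmeans_1d`), data, and starting centers; it is not material from the session.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Supervised: vote among the labels of the k closest labeled points."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def kmeans_1d(points, centers, iterations=10):
    """Unsupervised: alternate between assigning each point to its nearest
    center and moving each center to the mean of its assigned points."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[idx].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers


labeled = [(1.0, "small"), (1.5, "small"), (2.0, "small"),
           (8.0, "large"), (9.0, "large"), (10.0, "large")]

# k-nearest neighbor needs the labels to make a prediction.
print(knn_predict(labeled, 2.5))

# k-means sees only the values and discovers the two groups itself.
unlabeled = [x for x, _ in labeled]
print(kmeans_1d(unlabeled, centers=[0.0, 5.0]))
```

Both methods rely on distances, which is one reason normalization matters: a feature measured on a much larger scale would otherwise dominate every distance calculation.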

Instructor Bio

For more than 20 years, Todd has been highly respected as both a technologist and a trainer. As a tech, he has seen that world from many perspectives: “data guy” and developer; architect, analyst and consultant. As a trainer, he has designed and covered subject matter from operating systems to end-user applications, with an emphasis on data and programming. As a strong advocate for knowledge sharing, he combines his experience in technology and education to impart real-world use cases to students and users of analytics solutions across multiple industries. He is a regular contributor to the community of analytics and technology user groups in the Boston area, writes and teaches on many topics, and looks forward to the next time he can strap on a dive mask and get wet. Todd is a Data Science Evangelist at DataRobot.

Todd Cioffi

Data Science Evangelist, DataRobot

Workshop: Coming soon

Abstract Coming Soon

Instructor Bio

Yves has a Ph.D. in Mathematical Finance and is the founder and managing partner of The Python Quants GmbH. He is also the author of the books Python for Finance, Derivatives Analytics with Python and Listed Volatility & Variance Derivatives. He lectures for Data Science at htw saar University of Applied Sciences and for Computational Finance at the CQF Program and is the organizer of the Python for Quant Finance Meetup in London.

Yves Hilpisch, PhD

Founder, The Python Quants, Lecturer, and Author of Derivatives Analytics with Python and Python for Finance

Workshop: Probabilistic Graphical Models using PGMPY

Instructor Bio

Harish Kashyap has a master’s in Electrical Engineering from Northeastern University, Boston. He received the Graduate Student Award, a research scholarship as part of which he worked at BBN Technologies in the area of Bayesian machine learning algorithms as applied to speech recognition. He has more than 15 years of experience in the areas of Artificial Intelligence (AI), Digital Signal Processing and software development. He has several publications and patents filed in the areas of ML. He has led various data analytics projects across the US and Europe that led to organizational savings of $8M+. He has built out the curriculum for the machine learning training platform refactored.ai, which is now part of SUNY Buffalo’s graduate ML course. He is currently the founder of Mysuru Consulting Group (MCG.ai) and Diagram.AI, where he works on ML algorithms.

Harish Kashyap

CEO, Mysuru Consulting Group

Workshop: Telling human stories with data

Robust data analysis underpins every business decision, public sector project and non-profit initiative. But data in its raw form often fails to convince crucial lay audiences – either due to its complexity, or due to suspicion and mistrust. And you can’t help guide the world in the right direction if you alienate key decision-makers.
Visualisation and narrative storytelling offer a path to bringing the numbers to life and persuading your audience – without losing the crucial truth that’s in the data.

This workshop, delivered by journalist and data visualisation specialist Alan Rutter, will cover an audience-centred approach to visualising data. It will introduce tried-and-tested techniques for communicating data-driven stories effectively to people from a broad range of backgrounds, and deal with some of the common problems that practitioners encounter.
We will discuss how to avoid landing at either extreme of the data visualisation spectrum: bullet-proof academic analysis that fails to communicate a message; or fantastic aesthetic creations that actually say nothing at all.
We will look at the ethics of data visualisation, and how to be transparent about underlying data and methodologies. And we will discuss the considerations of different media, and how to tailor your visual approach to both your channels and audience. We will also look at how human beings interpret data, and how we can use design and psychology to our advantage.

There is no focus on a specific tool or language, as it is far more important to have a clear idea of the intention and desired action than to obsess over a given language or piece of software. This workshop is suited to anyone who wants to create impact with the data they work with by turning it into compelling stories for other audiences – whether through printed materials, presentations, social media, or websites and apps. 

Instructor Bio

Alan Rutter is the co-founder of consultancy Clever Boxer. He first worked with infographics as a magazine journalist (Time Out, WIRED), before moving into technology roles (Condé Nast, Net-A-Porter) and then training and development (The Guardian, General Assembly). He has taught data visualisation techniques to thousands of students, and for organisations including the Home Office, Department of Health, Biotechnology and Biosciences Research Council, Capita, Novartis and King’s College London.

Alan Rutter

Data Visualization Consultant, Trainer at General Assembly and Co-founder at Clever Boxer

Workshop: Understanding unstructured data with Language Models

As data scientists, we’ve seen a rapid improvement in the last decades in the tools available for working with structured data (be it tabular data, graph data, sensor data etc.). Yet, the vast majority of our data (Merrill Lynch puts the figure at roughly 90%) is *unstructured*, and lives in the form of documents, emails, reviews, reports and chat logs etc.

Many of us are far less familiar with how to analyse and understand this trove of unstructured data. This talk focuses on language models, one of the most fundamental tools for working with unstructured data. Language models are all around us (although we’re probably unaware of them), underpinning everything from Word’s spellchecker to home assistants like Alexa.

While plenty of “out of the box” language modelling libraries exist, the first part of the talk focuses on getting a thorough understanding of what a language model is, and how it works. We’ll touch on key ideas from statistics and information theory, and see how Alan Turing, in developing techniques to break Nazi codes at Bletchley Park, created the smoothing techniques which remain widely used in language models today. We’ll then proceed to the present day, looking at how techniques like word vectors and transfer learning have yielded an improved generation of tools.
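
To show what a language model and smoothing look like at their simplest, here is a sketch of a bigram model in plain Python. It uses add-one (Laplace) smoothing, which is a simpler technique than the one the talk attributes to Turing and is swapped in here only to keep the example short. The function names and the two-sentence corpus are made up for illustration.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count word pairs and their left-hand contexts in a tiny corpus."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(tokens[:-1])              # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))   # adjacent word pairs
    vocab = {word for pair in bigrams for word in pair}
    return contexts, bigrams, vocab


def bigram_prob(prev, word, contexts, bigrams, vocab):
    """P(word | prev) with add-one smoothing, so unseen pairs are never zero."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))


corpus = ["the cat sat", "the dog sat"]           # toy corpus
contexts, bigrams, vocab = train_bigram_model(corpus)

p_seen = bigram_prob("the", "cat", contexts, bigrams, vocab)
p_unseen = bigram_prob("the", "sat", contexts, bigrams, vocab)
print(p_seen, p_unseen)
```

Without smoothing, the pair "the sat" would get probability zero simply because it never appeared in the training text, and any sentence containing it would be scored as impossible. Smoothing reserves a little probability mass for such unseen events.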

In the second half of the talk, we’ll look at how we can practically use language models to understand unstructured data. Specifically we’ll explore:

– Classification – the canonical application of language models, which can help us identify spam, analyse sentiment or perform unsupervised clustering. We’ll look at a famous case where language models were able to successfully identify a Shakespeare forgery.
– Predictive modelling – if I were to look at your Tweets (and nothing else), could I guess your gender? It turns out state-of-the-art techniques can successfully predict it with an 80%+ success rate. We’ll look at how language models can enrich your datasets with additional demographic or contextual data.
– Information retrieval – finally, we’ll see how language models have been used extensively (for example in the legal sector), to extract targeted insights from enormous data sets.

Instructor Bio

Alex Peattie is the co-founder and CTO of Peg, a technology platform helping multinational brands and agencies to find and work with top YouTubers. Peg is used by over 2000 organisations worldwide including Coca-Cola, L’Oreal and Google.

An experienced digital entrepreneur, Alex spent six years as a developer and consultant for the likes of Grubwithus, Huckberry, UNICEF and Nike, before joining coding bootcamp Makers Academy as senior coach, where he trained hundreds of junior developers. Alex was also a technical judge at the 2017 TechCrunch Disrupt conference.

Alex Peattie

CTO, Peg

Workshop: Training Gradient Boosting models on large datasets with CatBoost

Gradient boosting is a powerful machine-learning technique that achieves state-of-the-art results in a variety of practical tasks. For a number of years, it has remained the primary method for learning problems with heterogeneous features, noisy data, and complex dependencies: web search, recommendation systems, weather forecasting, and many others.
Dataset sizes grow all the time. To train gradient boosting on a large dataset you need an efficient and scalable algorithm. CatBoost is a new gradient boosting library that has a set of advantages, including very fast and efficient GPU training and support for Spark for distributed training.

In this tutorial we will walk you through the main features of the library – we will explain how to train the model, how to analyze it, and how to deal with overfitting. We will show examples of training with categorical features and will show how this improves the quality of the resulting model. We will also learn how to train the model on Spark so that you can train your model on very large datasets with a few lines of Python code.

Instructor Bio

Anna Veronika Dorogush graduated from the Faculty of Computational Mathematics and Cybernetics of Lomonosov Moscow State University and from Yandex School of Data Analysis. She used to work at ABBYY, Microsoft, Bing and Google, and has been working at Yandex since 2015, where she currently holds the position of head of the Machine Learning Systems group and is leading the efforts in development of the CatBoost library.

Anna Veronika Dorogush

ML Lead, Yandex

Workshop: Keras for R

The Keras library has made neural networks accessible to the masses through a simple Python API. The R interface to Keras, by RStudio, brings this accessibility to R users. This hands-on workshop, with cloud environment provided, will guide you through creating deep neural networks in R, with problems that don’t need your own data centre to solve.

In this workshop we will get you up and running with Keras for R. We will cover some theoretical background but the focus is on implementation. We will demonstrate how to set up different types of neural networks to solve simple problems with time series, and give you the opportunity to build your own with guided exercises.

 

Instructor Bio

Doug is a Senior Data Scientist at Mango Solutions. A statistical physicist by training, he now practices and teaches a wide range of data science disciplines from machine learning pipelines to graph theory.

Douglas Ashton, PhD

Senior Data Scientist, Mango Solutions

Workshop: How to visualize your data: Beyond the eye into the brain

There are decisions neither humans nor computers can make alone. Computer-supported visualizations are a powerful decision-making tool when we need to reason based on complex data. This workshop focuses on how we can achieve an effective visual data analysis. First, we visualize data based on the capabilities of human vision: which visual mappings for which data and tasks. Second, we visualize data based on the capabilities of the human brain: how much information we can actually process and how cognitive biases can sneak into the process.

Instructor Bio

Evanthia Dimara is a postdoctoral research scientist at the Institute for Intelligent Systems and Robotics (ISIR) Laboratory (HCI group) of Sorbonne University. Her fields of research are human-computer interaction and information visualization. Her focus is on decision making — how to help people make unbiased and informed decisions alone or in groups.

Evanthia Dimara, PhD

Research Scientist, Institute for Intelligent Systems and Robotics

Workshop: AI for Executives Part I

Gain insight into how to drive success in data science. Identify key points in the machine learning life cycle where executive oversight really matters. Learn effective methods to help your team deliver better predictive models, faster. You’ll leave this seminar able to identify business challenges well suited for machine learning, with fully defined predictive analytics projects your team can implement now to improve operational results.

 

Instructor Bio

John Boersma is Director of Education for DataRobot. In this role he oversees the company’s client training operations and relations with academic institutions using DataRobot in analytics courses. Previously, John founded and led Adapt Courseware, an adaptive online college curriculum venture. John holds a PhD in computational particle physics and an MBA in general management.

John Boersma

Director of Education, DataRobot

Workshop: AI for Executives Part II

Gain insight into how to drive success in data science. Identify key points in the machine learning life cycle where executive oversight really matters. Learn effective methods to help your team deliver better predictive models, faster. You’ll leave this seminar able to identify business challenges well suited for machine learning, with fully defined predictive analytics projects your team can implement now to improve operational results.

 

Instructor Bio

John Boersma is Director of Education for DataRobot. In this role he oversees the company’s client training operations and relations with academic institutions using DataRobot in analytics courses. Previously, John founded and led Adapt Courseware, an adaptive online college curriculum venture. John holds a PhD in computational particle physics and an MBA in general management.

John Boersma

Director of Education, DataRobot

Workshop: Pricing optimization with machine learning, potential approaches to reject inferencing, tackling fraud through a mix of supervised and unsupervised techniques

Abstract Coming Soon

 

Instructor Bio

Bio Coming Soon

Jon Dahlberg

Customer Facing Data Scientist, DataRobot

Workshop: Competitive Model Stacking: An Introduction to StackNet Meta Modelling Framework

StackNet is a computational, scalable and analytical framework mainly implemented in Java that resembles a feedforward neural network and uses Wolpert’s stacked generalization in multiple levels to improve the accuracy of predictions. StackNet will be demonstrated through practical examples.
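To make the idea of stacked generalization concrete, here is a minimal pure-Python sketch of a single stacking level. It is not StackNet's API or implementation (StackNet is mainly written in Java and stacks multiple levels); the two base learners, the blend-weight grid, and the toy data are all assumptions chosen for illustration. The key point it shows is that the level-1 model is fitted on out-of-fold predictions, so it only sees predictions the base models made on data they were not trained on.

```python
def fit_linear(xs, ys):
    """Least-squares line; returns a predict function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def fit_nearest(xs, ys):
    """1-nearest-neighbour regressor; returns a predict function."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def out_of_fold(fit, xs, ys, k):
    """Predict every point with a model that never saw it (k-fold)."""
    preds = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), k):
            preds[i] = model(xs[i])
    return preds


def mse(preds, ys):
    """Mean squared error."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def blend(w, preds_a, preds_b):
    """Weighted average of two prediction lists."""
    return [w * a + (1 - w) * b for a, b in zip(preds_a, preds_b)]


def fit_stack(xs, ys, k=4):
    """Level 0: two base learners. Level 1: a blend weight fitted on
    out-of-fold predictions (a deliberately simple meta-model)."""
    oof_a = out_of_fold(fit_linear, xs, ys, k)
    oof_b = out_of_fold(fit_nearest, xs, ys, k)
    grid = [w / 20 for w in range(21)]  # candidate weights 0.0 .. 1.0
    best_w = min(grid, key=lambda w: mse(blend(w, oof_a, oof_b), ys))
    # Refit the base learners on all data for final predictions.
    model_a, model_b = fit_linear(xs, ys), fit_nearest(xs, ys)
    return best_w, (lambda x: best_w * model_a(x) + (1 - best_w) * model_b(x))


# Hypothetical toy data: a straight line plus a small periodic wiggle.
xs = [i / 2 for i in range(24)]
ys = [2.0 * x + (1.0 if i % 3 == 0 else -0.5) for i, x in enumerate(xs)]
weight, stacked = fit_stack(xs, ys)
```

Because the weight grid includes 0 and 1, the blended out-of-fold error can never be worse than that of the better base learner alone. A fuller stack would replace the single blend weight with a trained model and could feed its output into further levels.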

 

Instructor Bio

Marios Michailidis is a research data scientist at H2O.ai. He holds a BSc in Accounting and Finance from the University of Macedonia in Greece and an MSc in Risk Management from the University of Southampton. He has also nearly finished his PhD in machine learning at University College London (UCL), with a focus on ensemble modelling. He has worked in both the marketing and credit sectors in the UK market and has led many analytics projects with various themes, including acquisition, retention, recommenders, uplift, fraud detection, portfolio optimization and more.

He is the creator of KazAnova (http://www.kazanovaforanalytics.com/), a freeware GUI for credit scoring and data mining made 100% in Java, and of the StackNet Meta-Modelling Framework (https://github.com/kaz-Anova/StackNet). In his spare time he loves competing in data science challenges and was ranked 1st out of 500,000 members on the popular Kaggle.com data competition platform. Here (http://blog.kaggle.com/2016/02/10/profiling-top-kagglers-kazanova-new-1-in-the-world/) is a blog post about Marios reaching the top of the Kaggle rankings, in which he shares his knowledge, tricks and ideas.

Marios Michailidis, PhD

Ranked #1 on Kaggle.com, Data Scientist at H2O.ai

Tutorial: Scale your AI

This talk presents the challenges that industry faces in adopting AI, focusing specifically on how to scale AI. We take a deep dive into training large-scale models, looking at both modeling and infrastructure aspects.

Some people say “AI can be an existential threat”, while in reality, as AI scientists, we carry the huge responsibility of finding solutions to the most life-threatening problems of our generation. For instance, self-driving car technology could prevent millions of deaths in road crashes, and machine learning makes early cancer screening possible for whole populations, which can significantly increase the survival rate of cancer patients.

While AI research has accelerated over the past years, its wide adoption has been hampered by scaling challenges. In this talk, I will present the challenges that I have encountered during years of academic and industrial experience with machine learning and computer vision. Specifically, I will dive deep into the scaling challenges that industry faces, including scaling expertise, data, computation and algorithms.

Instructor Bio

Bio Coming Soon

Mohsen Hejrati, PhD

Co-founder and CEO, Clusterone

 

Workshop: Network Analysis using Python

Daenerys or Jon Snow? JFK, ORD or ATL: do these codes look familiar? In this tutorial we build on the fundamentals of network science and look at applications of network analysis to real-world datasets such as the US Airport Dataset and the Game of Thrones character co-occurrence network, and foray into the algorithms and applications of network science.

We will go through an introduction to network science and two case studies of applied network analysis on these datasets. Participants should be comfortable with Python and the basic PyData stack (pandas, matplotlib).

By the end of the tutorial everyone should be comfortable with hacking on the NetworkX API, modelling data as networks, and performing basic analysis on networks using Python.

Repo for notebooks: https://github.com/MridulS/pydata-networkx
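As a small taste of what "modelling data as networks" means, the sketch below builds a tiny airport graph and runs two basic analyses on it: degree centrality and a breadth-first shortest path. It uses only the Python standard library rather than NetworkX, so it runs without dependencies; the routes are hypothetical toy data, not taken from the US Airport Dataset, and the helper names are made up for this illustration.

```python
from collections import deque

# Hypothetical toy routes between airports (undirected).
ROUTES = [
    ("JFK", "ORD"),
    ("JFK", "ATL"),
    ("ORD", "ATL"),
    ("ATL", "MIA"),
    ("ORD", "SEA"),
    ("ATL", "DFW"),
]


def build_graph(edges):
    """Build an undirected adjacency map: node -> set of neighbours."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_centrality(graph):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def shortest_path(graph, source, target):
    """Breadth-first search; returns one shortest path, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the parent links to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nbr in sorted(graph[node]):
            if nbr not in parents:
                parents[nbr] = node
                queue.append(nbr)
    return None


graph = build_graph(ROUTES)
centrality = degree_centrality(graph)
busiest = max(centrality, key=centrality.get)
path = shortest_path(graph, "MIA", "SEA")
print(busiest, centrality[busiest])  # ATL 0.8
print(path)  # ['MIA', 'ATL', 'ORD', 'SEA']
```

In this toy network ATL is the best-connected hub, and a trip from MIA to SEA needs two connections. NetworkX provides these analyses, and many more, as ready-made functions on its graph objects.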

 

Instructor Bio

Mridul is trying to understand the world around us using data and tools from network science, currently at UCLouvain. He is an open-source enthusiast, has been selected for Google Summer of Code, and is involved with NumFOCUS. He has given multiple tutorials at PyCon, PyData, and EuroSciPy to share his love of Python and network science.

Mridul Seth

Research Assistant, Université catholique de Louvain, ICTEAM

Workshop: A.I and Cyber Security

This workshop will explore the application of Machine Learning and A.I techniques as a cyber defence solution. Current cyber defence systems are clearly struggling with the volume and sophistication of cyber-attacks. This is fundamentally a big data and weak signal problem, so it is well suited to the application of machine learning technology to help identify, classify and manage cyber threats.

If applied well, A.I has the potential to significantly reduce the costs and human burden of cyber defence. Its greatest value lies in enhancing and amplifying human analysts’ capacity to process the flood of threat data within a security operations context. It therefore represents an area of significant commercial growth potential. At the same time, the hacker community is applying advanced Machine Learning (ML) techniques to plan and execute sophisticated attacks against ICT systems. An arms race is now in progress to develop ML-driven attacks and countermeasures. Many state actors now see A.I as a geopolitical imperative and have assigned significant resources to developing their capacity in the field. One of the key drivers is to gain an advantage in offensive cyber security. History has shown that, once developed, such state tools tend to leak into the criminal domain and become a major problem.

The key approach presented is to combine advanced visual analytics methods with ML to gain an advantage over attackers. We will explore how the application of ML has evolved and how these techniques augment the capacity of human analysts. We will also address some of the business drivers, including the skills gap, and ask how far ML should be applied and whether a fully automated approach is desirable.

 

Instructor Bio

Bio Coming Soon

Robert Hercock, PhD

Chief Researcher, BT Labs

Workshop: Target leakage in machine learning

Abstract Coming Soon

 

Instructor Bio

Yuriy is a professional with 10 years of experience in the industry. He has extensive experience in data science, ML, R&D and software architecture. Yuriy is a developer of the ML platform DataRobot. He has also worked on speech and written-text processing, and he teaches at CS@UCU, Kyivstar Big Data School, and LITS.

Yuriy Guts

Machine Learning Engineer, DataRobot

ODSC EUROPE 2018 | September 19-22

What To Expect

As we prepare our 2018 schedule, take a look at some of the training sessions and workshops we have previously hosted at ODSC Europe for an idea of what to expect.

  • High Performance, Distributed Spark ML, Tensorflow AI, and GPU

  • Machine Learning with R

  • Deep Learning with Tensorflow for Absolute Beginners

  • Algorithmic Trading with Machine and Deep Learning

  • Deep Learning in Keras

  • Deep Learning – Beyond the Basics

  • Running Intelligent Applications inside a Database: Deep Learning with Python Stored Procedures in SQL

  • Distributed Deep Learning on Hops

  • R and Spark with Sparklyr

  • Towards Biologically Plausible Deep Learning

  • Deep Learning Ensembles in Toupee

  • A Gentle Introduction to Predictive Analytics with R

  • Graph Data – Modelling and Querying with Neo4j and Cypher

  • Introduction to Data Science with R

  • Introduction to Python in Data Science

  • The Magic of Dimensionality Reduction

  • Analyze Data, Build a UI and Deploy on the Cloud with Apache Spark, Notebooks and PixieDust

  • Data Science for Executives

  • Data Science Learnathon. From Raw Data to Deployment: the Data Science Cycle with KNIME

  • Drug Discovery with KNIME

  • Interactive Visualisation with R (and just R)

  • Introduction to Algorithmic Trading

  • Introduction to Data Science – A Practical Viewpoint

  • Julia for Data Scientists

  • Telling Stories with Data

  • Win Kaggle Competitions Using StackNet Meta Modelling Framework