Training & Workshop Sessions

– Taught by World-Class Data Scientists –

Learn the latest data science concepts, tools and techniques from the best. Forge a connection with these rockstars from industry and academia, who are passionate about molding the next generation of data scientists.

Highly Experienced Instructors

Our instructors are highly regarded in data science, coming from both academia and notable companies.

Real World Applications

Gain the skills and knowledge to use data science in your career and business, without breaking the bank.

Cutting Edge Subject Matter

Find training sessions offered on a wide variety of data science topics from machine learning to data visualization.

ODSC Training Includes

Form a working relationship with some of the world’s top data scientists for follow up questions and advice.

Additionally, your ticket includes access to 50+ talks and workshops.

High quality recordings of each session, exclusively available to premium training attendees.

Equivalent training at other conferences costs much more.

Professionally prepared learning materials, custom tailored to each course.

Opportunities to connect with other ambitious like-minded data scientists.

2018 Training Instructors

We have some of the top names in data science signed up to host training and workshop sessions. More instructors will be added weekly.

Workshop: Multivariate Time Series Forecasting Using Statistical and Machine Learning Models

Time series data is ubiquitous: weekly initial unemployment claim, daily term structure of interest rates, tick level stock prices, weekly company sales, daily foot traffic recorded by mobile devices, and daily number of steps taken recorded by a wearable, just to name a few.

Some of the most important and commonly used data science techniques in time series forecasting are those developed in the field of machine learning and statistics. Data scientists should have at least a few basic time series statistical and machine learning modeling techniques in their toolkit.

This lecture discusses the formulation of Vector Autoregressive (VAR) models, one of the most important classes of multivariate time series statistical models, as well as neural network-based techniques, which have received a lot of attention in the data science community in the past few years. It demonstrates how these models are implemented in practice and compares their advantages and disadvantages. Real-world applications, demonstrated using Python, are used throughout the lecture to illustrate these techniques. While not the focus of this lecture, exploratory time series data analysis using histograms, kernel density plots, time-series plots, scatterplot matrices, plots of autocorrelation (i.e., correlograms), plots of partial autocorrelation, and plots of cross-correlations will also be included in the demo.
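To make the VAR formulation concrete, here is a minimal sketch (not the lecture's materials) of a two-variable VAR(1) model, simulated and then re-estimated by ordinary least squares in plain NumPy; the coefficient matrix, sample size, and noise level are illustrative choices:

```python
import numpy as np

# Simulate a stationary 2-variable VAR(1) process: y_t = A @ y_{t-1} + noise
rng = np.random.default_rng(0)
A = np.array([[0.6, 0.2],
              [0.1, 0.5]])          # true coefficient matrix
T = 2000
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A @ y[t - 1] + rng.normal(scale=0.1, size=2)

# Estimate A by ordinary least squares: regress y_t on y_{t-1}
X, Y = y[:-1], y[1:]
A_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T

print(np.round(A_hat, 2))
```

In practice one would rely on a library implementation (for example, statsmodels ships a VAR class) that also handles lag-order selection, intercepts, and inference.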

Instructor Bio

Jeffrey is the Chief Data Scientist at AllianceBernstein, a global investment firm managing over $500 billion. He is responsible for building and leading the data science group, partnering with investment professionals to create investment signals using data science, and collaborating with sales and marketing teams to analyze clients. He holds a Ph.D. in economics from the University of Pennsylvania and has taught statistics, econometrics, and machine learning courses at UC Berkeley, Cornell, NYU, the University of Pennsylvania, and Virginia Tech. Previously, Jeffrey held advanced analytics positions at Silicon Valley Data Science, Charles Schwab Corporation, KPMG, and Moody’s Analytics.

Jeffrey Yau, PhD

Chief Data Scientist, AllianceBernstein

Training: Introduction to Machine Learning

Machine learning has become an indispensable tool across many areas of research and commercial applications. From text-to-speech for your phone to detecting the Higgs boson, machine learning excels at extracting knowledge from large amounts of data. This talk will give a general introduction to machine learning, as well as introduce practical tools for you to apply machine learning in your research. We will focus on one particularly important subfield of machine learning, supervised learning. The goal of supervised learning is to “learn” a function that maps inputs x to an output y, using a collection of training data consisting of input-output pairs. We will walk through formalizing a problem as a supervised machine learning problem, creating the necessary training data, and applying and evaluating a machine learning algorithm. The talk should give you all the necessary background to start using machine learning yourself.
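To give a flavor of the supervised learning workflow described above – input-output pairs, learning a mapping, and evaluating it on held-out data – here is a minimal scikit-learn sketch; the dataset and model are illustrative choices, not the session's materials:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Input-output pairs: flower measurements x and species label y
X, y = load_iris(return_X_y=True)

# Hold out part of the data so we can measure generalization
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Learn" the mapping from x to y on the training pairs
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print(f"test accuracy: {model.score(X_test, y_test):.2f}")
```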

Instructor Bio

Andreas is a lecturer at the Data Science Institute at Columbia University and author of the O’Reilly book “Introduction to Machine Learning with Python,” which describes a practical approach to machine learning with Python and scikit-learn. He is one of the core developers of the scikit-learn machine learning library, and he has been co-maintaining it for several years. Andreas is also a Software Carpentry instructor. In the past, he worked at the NYU Center for Data Science on open source and open science, and as a Machine Learning Scientist at Amazon. Andreas’s mission is to create open tools to lower the barrier of entry for machine learning applications, promote reproducible science, and democratize access to high-quality machine learning algorithms.

Andreas Mueller, PhD

Author, Lecturer, and Core contributor to scikit-learn

Training: Intermediate Machine Learning with scikit-learn

Scikit-learn is a machine learning library in Python that has become a valuable tool for many data science practitioners. This talk will cover some of the more advanced aspects of scikit-learn, such as building complex machine learning pipelines, model evaluation, parameter search, and out-of-core learning. Apart from metrics for model evaluation, we will cover how to evaluate model complexity, how to tune parameters with grid search and randomized parameter search, and what their trade-offs are. We will also cover out-of-core text feature processing via feature hashing.
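To illustrate the pipeline-plus-parameter-search pattern the talk covers, here is a minimal sketch; the dataset, model, and parameter grid are illustrative choices, not the session's materials:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Chain preprocessing and model so scaling is re-fit inside each CV fold,
# preventing information from validation folds leaking into the scaler
pipe = make_pipeline(StandardScaler(), SVC())

# Step names ("svc") prefix the parameter names in the grid
grid = GridSearchCV(pipe, {"svc__C": [0.1, 1, 10],
                           "svc__gamma": [0.01, 0.1, 1]}, cv=5)
grid.fit(X_train, y_train)

print(grid.best_params_, round(grid.best_score_, 3))
```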

Instructor Bio

Andreas is a lecturer at the Data Science Institute at Columbia University and author of the O’Reilly book “Introduction to Machine Learning with Python,” which describes a practical approach to machine learning with Python and scikit-learn. He is one of the core developers of the scikit-learn machine learning library, and he has been co-maintaining it for several years. Andreas is also a Software Carpentry instructor. In the past, he worked at the NYU Center for Data Science on open source and open science, and as a Machine Learning Scientist at Amazon. Andreas’s mission is to create open tools to lower the barrier of entry for machine learning applications, promote reproducible science, and democratize access to high-quality machine learning algorithms.

Andreas Mueller, PhD

Author, Lecturer, and Core contributor to scikit-learn

Workshop: Feature Engineering for Natural Language Processing

Have you ever been curious what features can be extracted from written text? Have you ever wondered what a syntactic dependency is or how it can be useful? During this workshop, we will dive into the language structure and use linguistic intuition to solve a natural language processing problem.

The plan of the workshop:

  1. Introduction to structural linguistics, specifically morphology and syntax
  2. Natural language corpora and ways to process them
  3. Feature design for an error correction task
  4. Ambiguities in natural language

Instructor Bio

Mariana is a computational linguist passionate about building natural language processing applications. She has professional experience in areas such as syntactic parsing, sentiment analysis, named entity recognition, fact extraction, text anonymization, and a few more. For the last three years, she has been working on error correction and text improvement algorithms at Grammarly. Mariana cares a lot about computational linguistics in Ukraine, constantly looks for talented linguists, and spreads the word about the field of NLP by participating in AI conferences, collaborating with Ukrainian universities, and organizing educational events (courses, meetups, summer schools). Her main interest is structural linguistics as a method of formalizing natural language.

Mariana Romanyshyn

Technical Lead, Grammarly Inc.

Training: Feature Engineering for Time Series Data

Most machine learning algorithms today are not time-aware and are not easily applied to time series and forecasting problems. Leveraging algorithms like XGBoost, or even linear models, typically requires substantial data preparation and feature engineering – for example, creating lagged features, detrending the target, and detecting periodicity. The preprocessing required becomes more difficult in the common case where the problem requires predicting a window of multiple future time points. As a result, most practitioners fall back on classical methods, such as ARIMA or trend analysis, which are time-aware but often less expressive. This talk covers practices for solving this challenge and explores the potential to automate this process in order to apply advanced machine learning algorithms to time series problems.
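As an illustration of the kind of preparation described above – lagged features, a trailing statistic, detrending – here is a small pandas sketch; the series and feature choices are hypothetical:

```python
import pandas as pd

# Hypothetical daily series; the goal is to predict y one step ahead
s = pd.Series([10.0, 12.0, 13.0, 12.0, 15.0, 16.0],
              index=pd.date_range("2018-01-01", periods=6))

feats = pd.DataFrame({
    "lag_1": s.shift(1),                             # yesterday's value
    "lag_2": s.shift(2),                             # value two days ago
    "rolling_mean_3": s.shift(1).rolling(3).mean(),  # trailing mean (no leakage)
    "detrended": s - s.shift(1),                     # first difference removes trend
    "target": s,
})
print(feats.dropna())
```

Note that the rolling mean is computed on the shifted series, so every feature uses only information available before the time point being predicted.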

Instructor Bio

Michael Schmidt is the Chief Scientist at DataRobot and has been featured in Forbes’ list of the world’s top 7 data scientists and MIT’s list of the most innovative 35-under-35. He has authored AI research in the journal Science and has appeared in media outlets such as the New York Times, NPR’s RadioLab, the Science Channel, and Communications of the ACM. In 2011, Michael founded Nutonian and led the development of Eureqa, a machine learning application and service used by over 80,000 users and later acquired by DataRobot in 2017. Most recently, his work has focused on automated machine learning, feature engineering, and time series prediction.

Michael Schmidt, PhD

Chief Scientist, DataRobot

Training: Advanced Machine Learning with scikit-learn Part I

Abstract Coming Soon

Instructor Bio

Andreas is a lecturer at the Data Science Institute at Columbia University and author of the O’Reilly book “Introduction to Machine Learning with Python,” which describes a practical approach to machine learning with Python and scikit-learn. He is one of the core developers of the scikit-learn machine learning library, and he has been co-maintaining it for several years. Andreas is also a Software Carpentry instructor. In the past, he worked at the NYU Center for Data Science on open source and open science, and as a Machine Learning Scientist at Amazon. Andreas’s mission is to create open tools to lower the barrier of entry for machine learning applications, promote reproducible science, and democratize access to high-quality machine learning algorithms.

Andreas Mueller, PhD

Author, Lecturer, and Core contributor to scikit-learn

Training: Advanced Machine Learning with scikit-learn Part II

Abstract Coming Soon

Instructor Bio

Andreas is a lecturer at the Data Science Institute at Columbia University and author of the O’Reilly book “Introduction to Machine Learning with Python,” which describes a practical approach to machine learning with Python and scikit-learn. He is one of the core developers of the scikit-learn machine learning library, and he has been co-maintaining it for several years. Andreas is also a Software Carpentry instructor. In the past, he worked at the NYU Center for Data Science on open source and open science, and as a Machine Learning Scientist at Amazon. Andreas’s mission is to create open tools to lower the barrier of entry for machine learning applications, promote reproducible science, and democratize access to high-quality machine learning algorithms.

Andreas Mueller, PhD

Author, Lecturer, and Core contributor to scikit-learn

Training: Machine Learning in R Part I

Modern statistics has become almost synonymous with machine learning, a collection of techniques that utilize today’s incredible computing power. This two-part course focuses on the available methods for implementing machine learning algorithms in R, and will examine some of the underlying theory behind the curtain. We start with the foundation of it all, the linear model and its generalization, the glm. We look at how to assess model quality with traditional measures and cross-validation, and how to visualize models with coefficient plots. Next we turn to penalized regression with the Elastic Net. After that we turn to Boosted Decision Trees utilizing xgboost. Attendees should have a good understanding of linear models and classification, and should have R and RStudio installed, along with the `glmnet`, `xgboost`, `boot`, `ggplot2`, `UsingR` and `coefplot` packages.

Instructor Bio

Jared Lander is the Chief Data Scientist of Lander Analytics, a data science consultancy based in New York City, the Organizer of the New York Open Statistical Programming Meetup and the New York R Conference, and an Adjunct Professor of Statistics at Columbia University. With a master’s from Columbia University in statistics and a bachelor’s from Muhlenberg College in mathematics, he has experience in both academic research and industry. His work for both large and small organizations ranges from music and fundraising to finance and humanitarian relief efforts.

He specializes in data management, multilevel models, machine learning, generalized linear models, and statistical computing. He is the author of R for Everyone: Advanced Analytics and Graphics, a book about R programming geared toward data scientists and non-statisticians alike, and is creating a course on glmnet with DataCamp.

Jared Lander

Statistics Professor, Columbia University, Author of R for Everyone

Training: Machine Learning in R Part II

Modern statistics has become almost synonymous with machine learning, a collection of techniques that utilize today’s incredible computing power. This two-part course focuses on the available methods for implementing machine learning algorithms in R, and will examine some of the underlying theory behind the curtain. We start with the foundation of it all, the linear model and its generalization, the glm. We look at how to assess model quality with traditional measures and cross-validation, and how to visualize models with coefficient plots. Next we turn to penalized regression with the Elastic Net. After that we turn to Boosted Decision Trees utilizing xgboost. Attendees should have a good understanding of linear models and classification, and should have R and RStudio installed, along with the `glmnet`, `xgboost`, `boot`, `ggplot2`, `UsingR` and `coefplot` packages.

Instructor Bio

Jared Lander is the Chief Data Scientist of Lander Analytics, a data science consultancy based in New York City, the Organizer of the New York Open Statistical Programming Meetup and the New York R Conference, and an Adjunct Professor of Statistics at Columbia University. With a master’s from Columbia University in statistics and a bachelor’s from Muhlenberg College in mathematics, he has experience in both academic research and industry. His work for both large and small organizations ranges from music and fundraising to finance and humanitarian relief efforts.

He specializes in data management, multilevel models, machine learning, generalized linear models, and statistical computing. He is the author of R for Everyone: Advanced Analytics and Graphics, a book about R programming geared toward data scientists and non-statisticians alike, and is creating a course on glmnet with DataCamp.

Jared Lander

Statistics Professor, Columbia University, Author of R for Everyone

Training: Learn D3 Essentials to Get Started with Web-Based Data Visualization

One of the most popular frameworks today for web-based data visualizations is D3. In this workshop you will learn how to leverage the most common parts of the D3 framework to create data visualizations. This will include making selections, working with data, creating shapes, and more. Although the emphasis is on learning D3, some general visualization design principles will be touched upon briefly as well, to help you make better informed design decisions.

Instructor Bio

Jan Willem Tulp is an independent Data Experience Designer from the Netherlands. With his company TULP interactive he creates custom data visualizations for a variety of clients. He has helped clients such as Google, the European Space Agency, Scientific American, Nature, and the World Economic Forum by creating visualizations, both interactive and in print. His work has appeared in several books and magazines, and he speaks regularly at international conferences.

Jan Willem Tulp

Data Experience Designer, TULP Interactive

Training: The New Science of Big Data Analytics, Based on the Geometry and the Topology of Complex, Hierarchic Systems

These foundations of Data Science are solidly based on mathematics and computational science. The hierarchical nature of complex reality is part and parcel of this new, mathematically well-founded way of observing and interacting with (physical, social and all) realities.

These lectures include pattern recognition and knowledge discovery, machine learning, and statistics. We address how geometry and topology can uncover and empower the semantics of data. Key themes include: text mining; computational linear-time hierarchical clustering, search and retrieval; and the Correspondence Analysis platform that performs latent semantic factor space mapping, with accompanying hierarchical clustering.

Various application domains are covered in the case studies. These include text mining of literary text and social media (Twitter), and clustering in astronomy, chemistry, and psychoanalysis. The final discussion addresses the increasingly important domains of smart environments, the Internet of Things, health analytics, and the further general scope of Big Data.

Instructor Bio

Fionn Murtagh is Professor of Data Science and has been Professor of Computer Science, including Department Head, at many universities. Following his primary degrees in Mathematics and Engineering Science, and an MSc in Computer Science focused on Information Retrieval, at Trinity College Dublin, his first position as Statistician/Programmer was in national-level (first and second level) education research. His PhD at Université P&M Curie (Paris 6), with Prof. Jean-Paul Benzécri, was in conjunction with the national geological research centre, BRGM. After an initial four years as a lecturer in computer science, there was a period working on atomic reactor safety at the European Joint Research Centre in Ispra (VA), Italy. As a European Space Agency Senior Scientist on the Hubble Space Telescope, Fionn was based at the European Southern Observatory in Garching, Munich, for 12 years. For five years, Fionn was a Director in Science Foundation Ireland, managing mathematics and computing, nanotechnology, and introducing and growing all that is related to environmental science and renewable energy.

Fionn was Editor-in-Chief of the Computer Journal (British Computer Society) for more than 10 years, and is an Editorial Board member of many journals. With over 300 refereed articles and 30 books authored or edited, his fellowships and scholarly academies include: Fellow of: British Computer Society (FBCS), Institute of Mathematics and Its Applications (FIMA), International Association for Pattern Recognition (FIAPR), Royal Statistical Society (FRSS), Royal Society of Arts (FRSA). Elected Member: Royal Irish Academy (MRIA), Academia Europaea (MAE). Senior Member IEEE.

Fionn Murtagh, PhD

Director, Centre of Mathematics and Data Science, University of Huddersfield

Training: Getting to Grips with the Tidyverse (R)

The tidyverse is essential for any statistician or data scientist who deals with data on a day-to-day basis. By focusing on small key tasks, the tidyverse suite of packages removes the pain of data manipulation. In this tutorial, we’ll cover some of the core features of the tidyverse, such as dplyr (the workhorse of the tidyverse), string manipulation, linking directly to databases and the concept of tidy data.

Instructor Bio

Dr. Colin Gillespie is a Senior Lecturer (Associate Professor) at Newcastle University, UK. His research interests are high-performance statistical computing and Bayesian statistics. He is regularly employed as a consultant by Jumping Rivers and has been teaching R since 2005 at a variety of levels, ranging from beginners to advanced programming.

Dr. Colin Gillespie

Author of Efficient R Programming, R Trainer and Consultant, and Senior Lecturer at Newcastle University

Training: The Path to Deep Learning with TensorFlow + Keras

Agenda

  1. Why do we need to create our own models?
  2. Introduction to Deep Learning
  3. Lab: The “Hello World” of TensorFlow + Keras: Logistic Regression
  4. Convolutional Neural Networks: At last! Real Deep Learning
  5. Lab: Computer Vision with CNNs
  6. Beyond Computer Vision

Target Audience

Developers interested in building deep learning models, and researchers interested in comparing this specific implementation with other frameworks. No previous experience is required, as all concepts will be introduced in the theory modules of the workshop; however, a minimum knowledge of machine learning concepts and practices (such as understanding the train/test/validation cycle) would be beneficial.

Practice

The following labs will be done during the course of the workshop:

  1. Environment set-up
  2. Basic Logistic Regression
  3. MNIST classifier (guided): Logistic Regression, then CNN
  4. Playing with the hyperparameters: minibatch sizes and learning rates
  5. MNIST classifier challenge (your turn!)

Requirements

We will perform the installation of the required wheels for using TensorFlow as part of the labs, but having the following prerequisites installed will save time and potential issues during the workshop:

  1. Anaconda distribution with a Python 3.5 environment
  2. Python IDE (VSCode recommended)
  3. Git client
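For a sense of what the “Hello World” logistic regression lab automates, here is a minimal sketch of logistic regression trained by gradient descent in plain NumPy (illustrative data and hyperparameters; the workshop itself uses TensorFlow + Keras):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy binary classification data: two Gaussian blobs in 2-D
X = np.vstack([rng.normal(-1.0, 1.0, (100, 2)),
               rng.normal(1.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Logistic regression = a single sigmoid unit trained by gradient descent
w, b = np.zeros(2), 0.0
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid activation
    grad_w = X.T @ (p - y) / len(y)         # gradient of cross-entropy loss
    grad_b = np.mean(p - y)
    w -= lr * grad_w                        # gradient descent step
    b -= lr * grad_b

p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
acc = np.mean((p > 0.5) == y)
print(f"training accuracy: {acc:.2f}")
```

A framework like Keras replaces the hand-written gradient and update loop with automatic differentiation and an optimizer.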

Instructor Bio

Bio Coming Soon

Juliet Moreiro Bockhop

Support Engineer, Microsoft

Training: Data Science Applications in Quantitative Finance

In this training session I will give an introduction to quantitative finance applications for data scientists. I will start by discussing a few fundamental ideas such as the efficient market hypothesis and the capital asset pricing model. I will also introduce some concepts on investment strategies, including portfolio theory and smart beta investing. Many such strategies are based on fundamental factors revealed by academic research, such as value or profitability, which can yield excess returns when suitably applied. There is currently a lot of interest in using alternative data sets and machine learning tools to uncover further factors, making the field exciting for data scientists.

Throughout the training session we will work on some simplified hands-on examples to illustrate the main ideas.

We will discuss the setup and subtleties of backtesting for investment strategies and define relevant metrics. Typically, a major challenge is to avoid overfitting, which we will address as well.
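As an example of the kind of backtest metric discussed above, here is a sketch of an annualized Sharpe ratio computed from a hypothetical series of daily strategy returns (the return series and parameters are illustrative, not part of the session's materials):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical daily strategy returns from a backtest (decimal, 0.001 = 0.1%)
daily_returns = rng.normal(loc=0.0004, scale=0.01, size=252)

def sharpe_ratio(returns, periods_per_year=252, risk_free=0.0):
    """Annualized Sharpe ratio of a series of periodic returns."""
    excess = returns - risk_free / periods_per_year
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)

print(f"annualized Sharpe: {sharpe_ratio(daily_returns):.2f}")
```

In a real backtest one would also guard against look-ahead bias and report the ratio out-of-sample, since in-sample Sharpe ratios are easily inflated by overfitting.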

Although the focus of the session is investment strategies, I will also illustrate a few other machine learning applications in finance from my experience.

Participants don’t need to have prior experience in finance. However, familiarity with Python and the numpy, pandas, and scikit-learn ecosystem would be very beneficial.

Instructor Bio

Johannes is Data Analytics Associate Director at IHS Markit, a global information and intelligence provider. He technically manages multiple data science projects across various business lines including finance, the automotive industry and the energy sector. He has a keen interest in the full data science spectrum including mathematical statistics, machine learning, databases, distributed computing, and dynamic visualizations.

He holds a PhD in theoretical condensed matter physics from Imperial College and has been active in quantitative research for more than 10 years, including research positions at Harvard University and the Max-Planck Institute in Germany. He has disseminated his research in more than 30 peer-reviewed publications and over 50 talks.

Johannes Bauer, PhD

Data Analytics Associate Director, IHS Markit

Training: From prediction to prescription: multi-objective optimization with PyGMO

With the wealth of machine learning and big data software from the open source community, it is now easier than ever to build powerful predictive models to help people make decisions. On the other hand, even if one can predict the outcome accurately, it is not always trivial to make real-life decisions, especially when there are a large number of choices and trade-offs between multiple objectives. Bio-inspired optimization algorithms can be a great complement to predictive modelling techniques in such scenarios, allowing efficient exploration of the multi-dimensional choice space to find the optimal frontier of the key objectives. Decision makers can then focus on these objectives rather than worrying about the choices at the execution level.

In this session we are going to walk through an example multi-objective optimization problem in the context of a promotion campaign, using the open source package PyGMO (the Python Parallel Global Multiobjective Optimizer) from ESA. We will first briefly touch upon how to build a propensity model for such marketing activities. Then we will see how to optimize our promotion strategy with PyGMO, based on the predictions of the propensity model. We will also go a bit into the details of various algorithms available in PyGMO, as well as how to handle constraints.
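To illustrate the core idea of an optimal frontier, here is a brute-force Pareto-front filter in plain NumPy over hypothetical promotion designs (PyGMO's evolutionary algorithms explore such spaces far more efficiently; this sketch is not PyGMO code):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical promotion designs scored on two objectives to minimize:
# column 0 = campaign cost, column 1 = negative expected uplift
points = rng.uniform(size=(200, 2))

def pareto_front(costs):
    """Indices of non-dominated points when every objective is minimized."""
    n = len(costs)
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        if not keep[i]:
            continue
        # j dominates i if j is <= i everywhere and strictly < somewhere
        dominated = (np.all(costs <= costs[i], axis=1)
                     & np.any(costs < costs[i], axis=1))
        if dominated.any():
            keep[i] = False
    return np.flatnonzero(keep)

front = pareto_front(points)
print(f"{len(front)} of {len(points)} designs are Pareto-optimal")
```

Decision makers then choose among the few frontier designs (cost vs. uplift trade-off) instead of the full choice space.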

Instructor Bio

Dr Jiahang Zhong is the leader of the data science team at Zopa, one of the UK’s earliest fintech companies. He has broad experience of data science projects in credit risk, operational optimization, and marketing, with keen interests in machine learning, optimization algorithms, and big data technologies. Prior to Zopa, he worked as a PhD student and postdoctoral researcher on the Large Hadron Collider project at CERN, with a focus on data analysis, statistics, and distributed computing.

Jiahang Zhong, PhD

Head of Data Science, Zopa Ltd

Training: Interactive Data Visualisation in Python

When creating complex visualisations, interactivity can help communicate your core concepts. It allows the audience to familiarise themselves with the data, and makes understanding the data a step along the journey of understanding your visualisation.

However, creating interactive visualisations adds a layer of complexity to the data science workflow: during modelling and data exploration, interactivity is effectively achieved by re-running chunks of code with different parameters. Giving the reader the ability to achieve the same interactivity without having to change and re-run code therefore requires extra development from the data scientist.

In this workshop, we will go through Python libraries that make this extra development as frictionless as possible, and produce interactive visualisations with as little code as possible. We will also go through options for producing interactivity for the wider public, and what steps need to be taken to achieve resilient interactive graphs.
Libraries to be used include ipywidgets, plotly, and plotly dash.

Instructor Bio

Dr. Jan Freyberg is a data scientist at ASI. He has worked on data science projects in the private and public sector, and his experience ranges from geospatial to unstructured language data. He is an expert in building interactive tools for communicating complex models, and is active in developing open-source data science software.

Jan completed a PhD and a fellowship studying brain activity, vision and consciousness in autism at the University of Cambridge and King’s College London, where he taught statistics and programming at undergraduate and postgraduate level.

Dr. Jan Freyberg

Data Scientist, ASI

Training: Data Science 101

Curious about Data Science? Self-taught on some aspects, but missing the big picture? Well you’ve got to start somewhere and this session is the place to do it. This session will cover, at a layman’s level, some of the basic concepts of data science. In a conversational format, we will discuss: What are the differences between Big Data and Data Science – and why aren’t they the same thing? What distinguishes descriptive, predictive, and prescriptive analytics? What purpose do predictive models serve in a practical context? What kinds of models are there and what do they tell us? What is the difference between supervised and unsupervised learning? What are some common pitfalls that turn good ideas into bad science? During this session, attendees will learn the difference between k-nearest neighbor and k-means clustering, understand the reasons why we do normalize and don’t overfit, and grasp the meaning of No Free Lunch.

Instructor Bio

For more than 20 years, Todd has been highly respected as both a technologist and a trainer. As a tech, he has seen that world from many perspectives: “data guy” and developer; architect, analyst and consultant. As a trainer, he has designed and covered subject matter from operating systems to end-user applications, with an emphasis on data and programming. As a strong advocate for knowledge sharing, he combines his experience in technology and education to impart real-world use cases to students and users of analytics solutions across multiple industries. He is a regular contributor to the community of analytics and technology user groups in the Boston area, writes and teaches on many topics, and looks forward to the next time he can strap on a dive mask and get wet. Todd is a Data Science Evangelist at DataRobot.

Todd Cioffi

Data Science Evangelist, DataRobot

Workshop: Coming soon

Abstract Coming Soon

Instructor Bio

Yves has a Ph.D. in Mathematical Finance and is the founder and managing partner of The Python Quants GmbH. He is also the author of the books Python for Finance, Derivatives Analytics with Python and Listed Volatility & Variance Derivatives. He lectures for Data Science at htw saar University of Applied Sciences and for Computational Finance at the CQF Program and is the organizer of the Python for Quant Finance Meetup in London.

Yves Hilpisch, PhD

Founder, The Python Quants, Lecturer, and Author of Derivatives Analytics with Python and Python for Finance

Workshop: Probabilistic Graphical Models using PGMPY

Abstract Coming Soon

Instructor Bio

Harish Kashyap has a Master’s in Electrical Engineering from Northeastern University, Boston. He received the Graduate Student Award, a research scholarship as part of which he worked at BBN Technologies in the area of Bayesian machine learning algorithms as applied to speech recognition. He has more than 15 years of experience in the areas of artificial intelligence (AI), digital signal processing, and software development. He has several publications and patents filed in the areas of ML. He has led various data analytics projects across the US and Europe that led to organizational savings of $8M+. He built out the curriculum for the machine learning training platform refactored.ai, which is now part of SUNY Buffalo’s graduate ML course. He is currently the founder of Mysuru Consulting Group (MCG.ai) and Diagram.AI, where he works on ML algorithms.

Harish Kashyap

CEO, Mysuru Consulting Group

Workshop: Telling human stories with data

Robust data analysis underpins every business decision, public sector project and non-profit initiative. But data in its raw form often fails to convince crucial lay audiences – either due to its complexity, or due to suspicion and mistrust. And you can't help guide the world in the right direction if you alienate key decision-makers.
Visualisation and narrative storytelling offer a path to bringing the numbers to life and persuading your audience – without losing the crucial truth that's in the data.

This workshop, delivered by journalist and data visualisation specialist Alan Rutter, will cover an audience-centred approach to visualising data. It will introduce tried-and-tested techniques for communicating data-driven stories effectively to people from a broad range of backgrounds, and deal with some of the common problems that practitioners encounter.
We will discuss how to avoid landing at either extreme of the data visualisation spectrum: bullet-proof academic analysis that fails to communicate a message; or fantastic aesthetic creations that actually say nothing at all.
We will look at the ethics of data visualisation, and how to be transparent about underlying data and methodologies. And we will discuss the considerations of different media, and how to tailor your visual approach to both your channels and audience. We will also look at how human beings interpret data, and how we can use design and psychology to our advantage.

There is no focus on a specific tool or language, as it is far more important to have a clear idea of the intention and desired action than to obsess over a given language or piece of software. This workshop is suited to anyone who wants to create impact with the data they work with by turning it into compelling stories for other audiences – whether through printed materials, presentations, social media, or websites and apps. 

Instructor Bio

Alan Rutter is the co-founder of consultancy Clever Boxer. He first worked with infographics as a magazine journalist (Time Out, WIRED), before moving into technology roles (Condé Nast, Net-A-Porter) and then training and development (The Guardian, General Assembly). He has taught data visualisation techniques to thousands of students, and for organisations including the Home Office, Department of Health, Biotechnology and Biosciences Research Council, Capita, Novartis and Kings College London.

Alan Rutter

Data Visualization Consultant, Trainer at General Assembly and Co-founder at Clever Boxer

Workshop: Understanding unstructured data with Language Models

As data scientists, we've seen a rapid improvement over the last decades in the tools available for working with structured data (be it tabular data, graph data, sensor data, etc.). Yet the vast majority of our data (Merrill Lynch puts the figure at roughly 90%) is *unstructured*, and lives in the form of documents, emails, reviews, reports, chat logs, etc.

Many of us are far less familiar with how to analyse and understand this trove of unstructured data. This talk focuses on language models, one of the most fundamental tools for working with unstructured data. Language models are all around us (although we’re probably unaware of them), underpinning everything from Word’s spellchecker to home assistants like Alexa.

While plenty of "out of the box" language modelling libraries exist, the first part of the talk focuses on getting a thorough understanding of what a language model is, and how it works. We'll touch on key ideas from statistics and information theory, and see how Alan Turing, in developing techniques to break Nazi codes at Bletchley Park, created the smoothing techniques which remain widely used in language models today. We'll then proceed to the present day, looking at how techniques like word vectors and transfer learning have yielded an improved generation of tools.

In the second half of the talk, we’ll look at how we can practically use language models to understand unstructured data. Specifically we’ll explore:

– Classification – the canonical application of language models; they can help us identify spam, analyse sentiment or perform unsupervised clustering. We'll look at a famous case where language models were able to successfully identify a Shakespeare forgery.
– Predictive modelling – if I were to look at your Tweets (and nothing else), could I guess your gender? It turns out state-of-the-art techniques can successfully predict it with an 80%+ success rate. We’ll look at how language models can enrich your datasets with additional demographic or contextual data.
– Information retrieval – finally, we’ll see how language models have been used extensively (for example in the legal sector), to extract targeted insights from enormous data sets.
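The counting-and-smoothing core of a language model fits in a few lines. As a toy illustration (not the workshop's materials), here is a bigram model with add-one (Laplace) smoothing; Good–Turing smoothing, the Bletchley Park technique mentioned above, is a refinement of the same count-based idea:

```python
from collections import Counter

def train_bigram_lm(corpus):
    """Count unigrams and bigrams over a corpus of tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, w1, w2):
    """P(w2 | w1) with add-one smoothing: unseen bigrams get a small
    non-zero probability instead of zero."""
    vocab_size = len(unigrams)
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab_size)

unigrams, bigrams = train_bigram_lm([["the", "cat"], ["the", "dog"]])
p = bigram_prob(unigrams, bigrams, "the", "cat")  # a seen bigram
```

A spellchecker, for instance, can rank candidate corrections by exactly these probabilities; without smoothing, any phrase absent from the training corpus would be assigned probability zero.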

 

Instructor Bio

Alex Peattie is the co-founder and CTO of Peg, a technology platform helping multinational brands and agencies to find and work with top YouTubers. Peg is used by over 2000 organisations worldwide including Coca-Cola, L’Oreal and Google.

An experienced digital entrepreneur, Alex spent six years as a developer and consultant for the likes of Grubwithus, Huckberry, UNICEF and Nike, before joining coding bootcamp Makers Academy as senior coach, where he trained hundreds of junior developers. Alex was also a technical judge at the 2017 TechCrunch Disrupt conference.

Alex Peattie

CTO, Peg

Workshop: Training Gradient Boosting models on large datasets with CatBoost

Gradient boosting is a powerful machine-learning technique that achieves state-of-the-art results in a variety of practical tasks. For a number of years, it has remained the primary method for learning problems with heterogeneous features, noisy data, and complex dependencies: web search, recommendation systems, weather forecasting, and many others.
Dataset sizes grow all the time. To train gradient boosting on a large dataset you need an efficient and scalable algorithm. CatBoost is a new gradient boosting library that has a set of advantages, including very fast and efficient GPU training and support for Spark for distributed training.

In this tutorial we will walk you through the main features of the library: we will explain how to train a model, how to analyze it, and how to deal with overfitting. We will show examples of training with categorical features and see how this improves the quality of the resulting model. We will also learn how to train the model on Spark, so that you can train on very large datasets with a few lines of Python code.

 

Instructor Bio

Anna Veronika Dorogush graduated from the Faculty of Computational Mathematics and Cybernetics of Lomonosov Moscow State University and from Yandex School of Data Analysis. She used to work at ABBYY, Microsoft, Bing and Google, and has been working at Yandex since 2015, where she currently holds the position of the head of Machine Learning Systems group and is leading the efforts in development of the CatBoost library.

Anna Veronika Dorogush

ML Lead, Yandex

Workshop: How to visualize your data: Beyond the eye into the brain

There are decisions neither humans nor computers can make alone. Computer-supported visualizations are a powerful decision-making tool when we need to reason based on complex data. This workshop focuses on how we can achieve an effective visual data analysis. First, we visualize data based on the capabilities of human vision: which visual mappings for which data and tasks. Second, we visualize data based on the capabilities of the human brain: how much information we can actually process and how cognitive biases can sneak into the process.

Instructor Bio

Evanthia Dimara is a postdoctoral research scientist at the Institute for Intelligent Systems and Robotics (ISIR) Laboratory (HCI group) of Sorbonne University. Her fields of research are human-computer interaction and information visualization. Her focus is on decision making — how to help people make unbiased and informed decisions alone or in groups.

Evanthia Dimara, PhD

Research Scientist, Institute for Intelligent Systems and Robotics

Workshop: AI for Executives Part I

Gain insight into how to drive success in data science. Identify key points in the machine learning life cycle where executive oversight really matters. Learn effective methods to help your team deliver better predictive models, faster. You’ll leave this seminar able to identify business challenges well suited for machine learning, with fully defined predictive analytics projects your team can implement now to improve operational results.

 

Instructor Bio

John Boersma is Director of Education for DataRobot. In this role he oversees the company’s client training operations and relations with academic institutions using DataRobot in analytics courses. Previously, John founded and led Adapt Courseware, an adaptive online college curriculum venture. John holds a PhD in computational particle physics and an MBA in general management.

John Boersma

Director of Education, DataRobot

Workshop: AI for Executives Part II

Gain insight into how to drive success in data science. Identify key points in the machine learning life cycle where executive oversight really matters. Learn effective methods to help your team deliver better predictive models, faster. You’ll leave this seminar able to identify business challenges well suited for machine learning, with fully defined predictive analytics projects your team can implement now to improve operational results.

 

Instructor Bio

John Boersma is Director of Education for DataRobot. In this role he oversees the company’s client training operations and relations with academic institutions using DataRobot in analytics courses. Previously, John founded and led Adapt Courseware, an adaptive online college curriculum venture. John holds a PhD in computational particle physics and an MBA in general management.

John Boersma

Director of Education, DataRobot

Workshop: Competitive Model Stacking: An Introduction to Stacknet Meta Modelling Framework

StackNet is a computational, scalable and analytical framework mainly implemented in Java that resembles a feedforward neural network and uses Wolpert’s stacked generalization in multiple levels to improve the accuracy of predictions. StackNet will be demonstrated through practical examples.

 

Instructor Bio

Marios Michailidis is a research data scientist at H2O.ai. He holds a BSc in Accounting and Finance from the University of Macedonia in Greece and an MSc in Risk Management from the University of Southampton. He has also nearly finished his PhD in machine learning at University College London (UCL), with a focus on ensemble modelling. He has worked in both the marketing and credit sectors in the UK market and has led many analytics projects on various themes, including acquisition, retention, recommenders, uplift, fraud detection, portfolio optimization and more.

He is the creator of KazAnova (http://www.kazanovaforanalytics.com/), a freeware GUI for credit scoring and data mining made 100% in Java, and also of the StackNet Meta-Modelling Framework (https://github.com/kaz-Anova/StackNet). In his spare time he loves competing in data science challenges and was ranked 1st out of 500,000 members on the popular Kaggle.com data competition platform. Here (http://blog.kaggle.com/2016/02/10/profiling-top-kagglers-kazanova-new-1-in-the-world/) is a blog post about Marios reaching the top rank on Kaggle and sharing his knowledge, tricks and ideas.

Marios Michailidis, PhD

Ranked #1 on Kaggle.com, Data Scientist at H2O.ai

Tutorial: Scale your AI

This talk presents the challenges that industry faces in adopting AI, focusing specifically on how to scale AI. We deep-dive into training large-scale models, looking at both modeling and infrastructure aspects.

Some people say "AI can be an existential threat," while in reality, as AI scientists, we carry the huge responsibility of finding solutions to the most life-threatening problems of our generation. For instance, self-driving car technology could prevent millions of deaths in road crashes, and machine learning makes mass-population early cancer screening possible, which can significantly increase the survival rate of cancer patients.

While AI research progress has accelerated over the past years, its wide adoption has been hampered due to scaling challenges. In this talk, I will present the challenges that I found during years of academic and industrial experience with machine learning and computer vision. Specifically, I will dive deep into the scaling challenges that industry faces, including scaling expertise, data, computation and algorithms.

Instructor Bio

Bio Coming Soon

Mohsen Hejrati, PhD

Co-founder & CEO, Clusterone

 

Workshop: Network Analysis using Python

Daenerys or Jon Snow? JFK, ORD or ATL – do these codes look familiar? In this tutorial we build on the fundamentals of network science and look at various applications of network analysis to real-world datasets like the US Airport Dataset and the Game of Thrones character co-occurrence network, with a foray into the algorithms and applications of network science.

In this tutorial we will go through an introduction to network science and two case studies of applied network analysis. Participants should be comfortable with Python and the basic PyData stack (pandas, matplotlib).

By the end of the tutorial everyone should be comfortable with hacking on the NetworkX API, modelling data as networks, and performing basic analysis on networks using Python.

Repo for notebooks: https://github.com/MridulS/pydata-networkx
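To get a feel for the API before the tutorial, here is a minimal sketch: a toy character co-occurrence graph (names and edge weights invented for illustration; the tutorial uses the real datasets) with degree and betweenness centrality computed via NetworkX:

```python
import networkx as nx

# Toy co-occurrence network in the spirit of the Game of Thrones dataset;
# the edges and weights here are made up for illustration.
G = nx.Graph()
G.add_weighted_edges_from([
    ("Jon", "Daenerys", 5),
    ("Jon", "Sansa", 9),
    ("Sansa", "Arya", 7),
    ("Daenerys", "Tyrion", 4),
])

degree = dict(G.degree())                  # how many characters each co-occurs with
centrality = nx.betweenness_centrality(G)  # who bridges the network?
most_central = max(centrality, key=centrality.get)
```

In this little graph "Jon" sits on the most shortest paths between other characters, which is exactly the kind of question betweenness centrality answers on the airport and co-occurrence datasets.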

 

Instructor Bio

Mridul is trying to understand the world around us using data and tools from network science, currently at UCLouvain. He is an open source enthusiast and has been selected for Google Summer of Code, and is involved with NumFOCUS. He has given multiple tutorials at PyCon, PyData, and EuroScipy to share his love of Python and network science.

Mridul Seth

Research Assistant, Université catholique de Louvain, ICTEAM

Workshop: AI and Cyber Security

This workshop will explore the application of machine learning and AI techniques as a cyber defence solution. Current cyber defence systems are clearly struggling with the volume and sophistication of cyber-attacks. This is fundamentally a big data and weak-signal problem. It is thus well suited to the application of machine learning technology to help identify, classify and manage cyber threats.
If well applied, AI has the potential to significantly reduce the costs and human burden of cyber defence. It has the most value in its ability to enhance and amplify human analysts' capacity to process the flood of threat data within a security operations context. It therefore represents an area of significant commercial growth potential. At the same time, advanced machine learning (ML) techniques are being applied by the hacker community to plan and execute sophisticated attacks against ICT systems. An arms race is now in progress to develop ML-driven attacks and countermeasures. Many state actors now see AI as a geopolitical imperative and have assigned significant resources to develop their capacity in the field. One of the key drivers for this is to gain advantage in offensive cyber security processes. History has shown that once developed, such state tools tend to leak into the criminal domain and become a major problem.
The key approach presented is how, by combining advanced visual analytic methods with ML, we can gain an advantage over the attack threats. We will explore how the application of ML has evolved and how these techniques add value by augmenting the capacity of human analysts. We will also address some of the business drivers, including the skills gap, how far ML should be applied, and whether a fully automated approach is desirable.

 

Instructor Bio

Bio Coming Soon

Robert Hercock, PhD

Chief Researcher, BT Labs

Workshop: Target leakage in machine learning

Target leakage is one of the most difficult problems in developing real-world machine learning models. Leakage occurs when the training data gets contaminated with information that will not be known at prediction time. Additionally, there can be multiple sources of leakage, from data collection and feature engineering to partitioning and model validation. As a result, even experienced data scientists can inadvertently introduce leaks and become overly optimistic about the performance of the models they deploy. In this talk, we will look through real-life examples of data leakage at different stages of the data science project lifecycle, and discuss various countermeasures and best practices for model validation.
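One of the simplest leaks of the kind described above happens during preprocessing: fitting a transformer on the full dataset before cross-validation lets statistics from the validation folds contaminate training. A minimal sketch of the leaky pattern and its pipeline-based fix, on synthetic data (with plain standardization the score gap is small, but the same pattern matters far more for leak-prone transforms such as target encoding):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)

# Leaky pattern: the scaler is fit on the FULL dataset, so information
# from the validation folds bleeds into the training folds.
X_leaky = StandardScaler().fit_transform(X)
leaky_scores = cross_val_score(LogisticRegression(), X_leaky, y, cv=5)

# Safe pattern: the scaler lives inside the pipeline, so it is refit
# on each training fold only.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
safe_scores = cross_val_score(pipe, X, y, cv=5)
```

Keeping every fitted transform inside the cross-validation loop is the general countermeasure: nothing learned from data may see rows it will later be validated on.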

 

Instructor Bio

Yuriy is a professional with 10 years of experience in the industry. He has extensive experience in data science, ML, R&D and software architecture. Yuriy is a developer of the ML platform DataRobot. Moreover, he has worked on oral and written speech processing. He also teaches at CS@UCU, Kyivstar Big Data School and LITS.

Yuriy Guts

Machine Learning Engineer, DataRobot

Workshop: Sentiment analysis with deep learning, machine learning or lexicon based? You choose!

Do you want to know what your customers, users, contacts, or relatives really think? Find out by building your own sentiment analysis application.

In this workshop you will build a sentiment analysis application, step by step, using KNIME Analytics Platform. After an introduction to the most common techniques used for sentiment analysis and text mining we will work in three groups, each one focusing on a different technique.

Deep Learning. This group will work with the visual Keras deep learning integration available in KNIME (completely code free)
Machine Learning. This group will use other machine learning techniques, based on native KNIME nodes
Lexicon Based. This group will focus on a lexicon based approach for sentiment analysis

Workshop Requirements:

Your own laptop preinstalled with KNIME Analytics Platform, which you can download from the KNIME website
KNIME Textprocessing extension. See the video link below about installing KNIME extensions: https://www.youtube.com/watch?v=8HMx3mjJXiw

Help Installing KNIME Analytics Platform:

Here are some links to YouTube videos to help you install KNIME Analytics Platform:
Windows https://www.youtube.com/watch?v=yeHblDxakLk&feature=youtu.be
Mac https://www.youtube.com/watch?v=1jvRWryJ220&feature=youtu.be
Linux https://www.youtube.com/watch?v=wibggQYr4ZA&feature=youtu.be

Instructor Bio

Rosaria Silipo has been mining data, big and small, since her master's degree in 1992. She kept mining data throughout her doctoral program, her postdoctoral program, and most of her subsequent job positions. So many years of experience and passion for data analytics, data visualization, data manipulation, reporting, business intelligence, and KNIME tools naturally led her to become a principal data scientist and an evangelist for data science at KNIME.

Rosaria Silipo, PhD

Principal Data Scientist, KNIME

Workshop: Sentiment analysis with deep learning, machine learning or lexicon based? You choose!

Do you want to know what your customers, users, contacts, or relatives really think? Find out by building your own sentiment analysis application.

In this workshop you will build a sentiment analysis application, step by step, using KNIME Analytics Platform. After an introduction to the most common techniques used for sentiment analysis and text mining we will work in three groups, each one focusing on a different technique.

Deep Learning. This group will work with the visual Keras deep learning integration available in KNIME (completely code free)
Machine Learning. This group will use other machine learning techniques, based on native KNIME nodes
Lexicon Based. This group will focus on a lexicon based approach for sentiment analysis

Workshop Requirements:

Your own laptop preinstalled with KNIME Analytics Platform, which you can download from the KNIME website
KNIME Textprocessing extension. See the video link below about installing KNIME extensions: https://www.youtube.com/watch?v=8HMx3mjJXiw

Help Installing KNIME Analytics Platform:

Here are some links to YouTube videos to help you install KNIME Analytics Platform:
Windows https://www.youtube.com/watch?v=yeHblDxakLk&feature=youtu.be
Mac https://www.youtube.com/watch?v=1jvRWryJ220&feature=youtu.be
Linux https://www.youtube.com/watch?v=wibggQYr4ZA&feature=youtu.be

Instructor Bio

Kathrin Melcher is a Data Scientist at KNIME. She holds a master's degree in Mathematics from the University of Konstanz, Germany. She joined the Evangelism team at KNIME in 2017. She has a strong interest in data science, machine learning and algorithms, and enjoys teaching and sharing her knowledge about them.

Kathrin Melcher

Data Scientist,  KNIME

Workshop: Transfer and Multi-task Learning

In this workshop, we will explore the problem of transfer learning in batch learning and online learning settings. The goal of transfer learning is to either a) generalize learning to a different (but related) domain/task with the knowledge of existing domains/tasks, or b) learn multiple related domains/tasks together in the hope that one may learn them better and quicker by sharing knowledge. Here a domain is the space from which features are sampled, and a task is the decision function that we hope to learn from the labeled data. Consider a subfield of transfer learning called multi-task learning, where very few labeled examples for the task at hand (also called the target task) are available, but ample labeled data is available for a different but related task (also called the source task). Can one use the labeled data of the source task to learn the target task better? If yes, how can one transfer the learning from the source task to the target task? There are other similar problems in transfer learning, like domain adaptation, domain generalization, zero-shot learning, etc. Transfer learning also has exciting applications in the area of multi-armed bandits and reinforcement learning.

In the workshop, first, I will introduce the problem of transfer learning and give motivating applications. Second, I will describe the mathematical framework and provide details of various sub-fields within transfer learning. Third, I will talk about the details of algorithms in some of the sub-fields and their practical applications. Finally, I will give details about how one might use these algorithms in practice for their own problems.

 

Instructor Bio

Bio Coming Soon

Aniket Anand Deshmukh

PhD Researcher, Uni. of Michigan

Workshop: A Gentle Introduction to Survival Models with Applications in Python and R

Survival/duration models are common ways to model the probability of failure/survival at each period in your data set. Though they are common in certain fields of economics, econometrics and biology, they are less commonly applied in data science, despite often being the most appropriate approach to a problem.

This workshop will start with a theoretical introduction to basic non-parametric, semi-parametric and parametric models such as Kaplan-Meier, Cox Proportional Hazards (with and without time-varying covariates), the Aalen additive model, and random survival forests. In the second part of the workshop, we will look at how we can apply these models in Python and R.
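As a preview of the non-parametric starting point: the Kaplan-Meier estimator multiplies, at each observed event time, the fraction of at-risk subjects who survive that time. A minimal hand-rolled sketch on invented data (libraries such as lifelines in Python or survival in R provide production-grade versions):

```python
import numpy as np

def kaplan_meier(durations, events):
    """Kaplan-Meier estimator: at each event time t, multiply the running
    survival probability by (1 - deaths_at_t / at_risk_at_t).
    events: 1 = event observed, 0 = right-censored."""
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(durations[events == 1])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(durations >= t)  # still under observation at t
        deaths = np.sum((durations == t) & (events == 1))
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)

# Four subjects: events at t=1, 2, 3; one subject right-censored at t=2.
times, surv = kaplan_meier([1, 2, 2, 3], [1, 1, 0, 1])
```

Note how the censored subject still counts as at-risk at t=2 but never as a death; handling censoring correctly is exactly what distinguishes survival models from naive failure-rate calculations.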

Instructor Bio

Violeta has been working as a data scientist in the Data Innovation and Analytics department of ABN AMRO bank in Amsterdam, the Netherlands. In her daily job, she works on projects with different business lines, applying the latest machine learning and advanced analytics technologies and algorithms. Before that, she worked for about 1.5 years as a data science consultant at Accenture, the Netherlands. Violeta enjoyed helping clients solve their problems with the use of data and data science, but wanted to be able to develop more sophisticated tools, hence the switch.

Before her position at Accenture, she worked on her PhD, which she obtained from Erasmus University, Rotterdam, in the area of Applied Microeconometrics. In her research she used data to investigate the causal effect of negative experiences on human capital, education, problematic behavior and crime.

Violeta Misheva, PhD

Data Scientist, ABN AMRO Bank N.V.

Workshop: How to learn many, many labels with Machine Learning

Classification is one of the most common types of supervised machine learning tasks. In this talk we will discuss the challenges that arise when one has hundreds or even thousands of distinct classes to deal with, from both performance and engineering perspectives.

 

Instructor Bio

Bio Coming Soon

Yuriy Guts

Chief Scientist, Evolution AI

Workshop: How to Build High Performing Weighted XGBoost ML Model for Real Life Imbalance Dataset

This code pattern – creating an end-to-end ML flow and predicting financial purchases from imbalanced financial data using weighted XGBoost – is for anyone interested in using XGBoost and creating a Scikit-Learn-based end-to-end machine learning pipeline for real datasets, where class imbalance is very common.

1. Introduction and Background.
2. Data Set Description.
3. Statement of Classification Problem.
4. Software and Tools (XGBoost and Scikit-Learn).
5. Visual Data Exploration to understand data (Using seaborn and matplotlib).

We will also explore output and will note the class imbalance issues.

6. Create Scikit learn ML Pipelines for Data Processing.
7. Model Training.
7.1 What and Why of XGBoost.
7.2 Discuss Metrics for Model Performance.
7.3 First Attempt at Model Training and its performance, evaluation and analysis.
7.4 Strategy for a Better Classifier for the Imbalanced Data.
7.5 Second Attempt at Model Training using Weighted Samples and its performance, evaluation and analysis.
7.6 Third Attempt at Model Training using Weighted Samples and Feature Selection and its performance analysis.
8. Inference Discussion
9. Summary 
10. Pointers to Other Advanced Techniques
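The core of the weighting strategy in steps 7.4–7.5 is to up-weight the minority class by the negative-to-positive ratio, which is what XGBoost's scale_pos_weight parameter encodes. A sketch of the same idea using scikit-learn's GradientBoostingClassifier with per-sample weights on synthetic imbalanced data (so the sketch runs without xgboost installed; data and parameters are invented for illustration):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(42)
# Synthetic imbalanced data: only a few percent positives.
X = rng.normal(size=(1000, 4))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=1000) > 2.2).astype(int)

# Up-weight each positive example by the negative/positive ratio,
# mirroring XGBoost's scale_pos_weight heuristic.
pos_weight = (y == 0).sum() / max((y == 1).sum(), 1)
sample_weight = np.where(y == 1, pos_weight, 1.0)

clf = GradientBoostingClassifier(n_estimators=50, max_depth=2)
clf.fit(X, y, sample_weight=sample_weight)
preds = clf.predict(X)
```

Without such weighting a booster can reach high accuracy by predicting the majority class everywhere, which is why step 7.2 stresses choosing metrics (recall, AUC-PR) that expose that failure mode.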

Instructor Bio

Alok Singh is a Principal Engineer at the IBM CODAIT (Center for Open-Source Data and AI Technologies). He has built and architected multiple analytical frameworks and implemented machine learning algorithms along with various data science use cases. His interest is in creating Big Data and scalable machine learning software and algorithms. He has also created many Data Science based applications.

Alok Singh

Principal Engineer, IBM

Workshop: Introduction to Data Science - A Practical Viewpoint

In this talk I will address some of the most important aspects in and around doing data science. We will cover the data science workflow through applications in different industries and explore the data science landscape including tools and methodologies, roles, challenges and opportunities.

Instructor Bio

Jesús (aka J.) is a lead data scientist with a strong background and interest in insight based on simulation, hypothesis generation and predictive analytics. He has substantial practical experience with statistical analysis, machine learning and optimisation tools in product development and innovation, finance, media and others.

Jesús is the Principal Data Scientist at AKQA. He has a background in physics and held positions both in academia and industry, including Imperial College, IBM Data Science Studio, Prudential and Dow Jones to name a few.

Jesús is the author of "Essential MATLAB and Octave," a book for students in physics, engineering, and other disciplines. He also authored the upcoming data science book entitled "Data Science and Analytics with Python."

Dr. Jesús Rogel-Salazar

Data Science Instructor, General Assembly

Workshop: Debugging and Visualizing Tensorflow Programs with Images

Deep learning frameworks provide the critical building blocks for designing, training and validating deep networks. They offer a high level of abstraction with a flexible and simple programming interface to a diverse community of deep learning practitioners.
Undoubtedly, TensorFlow has emerged as the industry's most popular framework, adopted by many software giants due to its highly scalable and flexible system architecture.
When it comes to an easy programming interface and debugging, however, it is still a pain due to its static graph construction approach. Tools like the TensorFlow Debugger and TensorBoard were introduced specifically to make it simpler for users to debug and visualize the state of learning.
In this workshop we will try to understand deep learning concepts applied in computer vision and image processing. When it comes to the myriad of applications with images, these tools help the user understand the state of a learning algorithm.
We will examine a simple deep learning based image recognition program, break it, and learn to fix it effectively with the help of tools like the TensorFlow Debugger and TensorBoard. We will then try to understand the sophisticated concepts of tuning a deep network by visualizing the effects in TensorBoard. We will also explore more complex projects like image segmentation on real-world data and image domain transfer with adversarial networks, using TensorBoard.
We will learn how these handy tools can really make a big difference in deciphering the complexities of the algorithm and remove the "don't know what to do next!" situation in a DL practitioner's life.

Instructor Bio

Muzahid is working as a Research Engineer at Dassault Systèmes 3DEXCITE, Germany. He is responsible for applied deep learning research to improve traditional computer graphics/image rendering approaches. Previously, he worked with Fraunhofer on a machine-learning-based software evolution system for improving the programmer's experience. His academic work focused on exploring the machine learning domain, and he did his thesis on knowledge retrieval in probabilistic graphical models.

Muzahid Hussain

Research Engineer, Dassault Systèmes 3DEXCITE

ODSC EUROPE 2018 | September 19-22

Register Now

What To Expect

As we prepare our 2018 schedule, take a look at some of the previous training and workshop sessions we have hosted at ODSC Europe for an idea of what to expect.

  • High Performance, Distributed Spark ML, Tensorflow AI, and GPU

  • Machine Learning with R

  • Deep Learning with Tensorflow for Absolute Beginners

  • Algorithmic Trading with Machine and Deep Learning

  • Deep Learning in Keras

  • Deep Learning – Beyond the Basics

  • Running Intelligent Applications inside a Database: Deep Learning with Python Stored Procedures in SQL

  • Distributed Deep Learning on Hops

  • R and Spark with Sparklyr

  • Towards Biologically Plausible Deep Learning

  • Machine Learning with R

  • Deep Learning Ensembles in Toupee

  • A Gentle Introduction to Predictive Analytics with R

  • Deep Learning with Tensorflow for Absolute Beginners

  • Graph Data – Modelling and Querying with Neo4j and Cypher

  • Introduction to Data Science with R

  • Introduction to Python in Data Science

  • Machine Learning with R

  • High Performance, Distributed Spark ML, Tensorflow AI, and GPU

  • The Magic of Dimensionality Reduction

  • Analyze Data, Build a UI and Deploy on the Cloud with Apache Spark, Notebooks and PixieDust

  • Data Science for Executives

  • Data Science Learnathon. From Raw Data to Deployment: the Data Science Cycle with KNIME

  • Distributed Deep Learning on Hops

  • Drug Discovery with KNIME

  • Interactive Visualisation with R (and just R)

  • Introduction to Algorithmic Trading

  • Introduction to Data Science – A Practical Viewpoint

  • Julia for Data Scientists

  • Machine Learning with R

  • R and Spark with Sparklyr

  • Running Intelligent Applications inside a Database: Deep Learning with Python Stored Procedures in SQL

  • Telling Stories with Data

  • Towards Biologically Plausible Deep Learning

  • Win Kaggle Competitions Using StackNet Meta Modelling Framework

Open Data Science Conference