Boston | April 14th – April 17th, 2020

R for Data Science

Learn the latest models, packages, and tools from the top practitioners and R Data Scientists

R, long one of the most popular languages for statistical modeling, has evolved into a data science powerhouse platform with dozens of packages and interfaces for machine learning, deep learning, data visualization, and various other data science methods.

This advanced language is used for performing complex statistical modeling and provides support for operations on arrays, matrices, and vectors. Its many graphical libraries allow users to create attractive graphs, and R Shiny lets them develop web applications and embed visualizations in web pages. R also provides several options for advanced data analytics, such as predictive modeling and advanced machine learning algorithms. With interfaces to popular frameworks like TensorFlow and Keras, R users can easily build and deploy the latest deep learning models.

Some of Our Current R Presenters


See our full speaker list
2020 Speakers
R for Data Science Sessions
Wednesday, April 15th
Tuesday, April 14th
Thursday, April 16th
Friday, April 17th
09:00 - 13:00
Machine Learning in R Part I: Penalized Regression and Boosted Trees

Training | Machine Learning | R-programming | All Levels

 

Linear regression is the foundation of supervised learning, though it has its limits. During this workshop we extend regression using penalization for automated variable selection and increased flexibility. We then introduce trees, and in particular boosted trees, via xgboost to get incredibly powerful predictions. We will go over some of the theory and also practical considerations such as hyperparameters…more details

 

 

Jared Lander
Chief Data Scientist, Author of R for Everyone, Professor | Lander Analytics, Columbia Business School
09:30 - 12:30
Machine Learning in R Part III: Forecasting Time Series Data

Training | Machine Learning | R-programming | All Levels

 

Temporal data requires special care to model as it violates several principles of standard machine learning models. R has long had top-of-the-line forecasting tools, though recently new ones have been developed which greatly ease working with time series data. We use the tsibble package for manipulating time series data, feasts for visualization, and fable for building forecasting models such as ETS and ARIMA…more details

Daniel Chen, PhD
Data Science Consultant | Lander Analytics
09:30 - 12:30
Deep Learning (with TensorFlow 2)

Training | Deep Learning | Machine Learning | Beginner-Intermediate

 

Relatively obscure a few short years ago, Deep Learning is ubiquitous today across data-driven applications as diverse as machine vision, natural language processing, and super-human game-playing. This Deep Learning primer brings the revolutionary machine-learning approach behind contemporary artificial intelligence to life with interactive demos featuring TensorFlow 2, the major, cutting-edge revision of the world’s most popular Deep Learning library. To facilitate an intuitive understanding of Deep Learning’s artificial-neural-network foundations, the essential theory will be introduced visually and pragmatically. Paired with tips for overcoming common pitfalls and hands-on Python code run-throughs provided in straightforward Jupyter notebooks, this foundational knowledge empowers you to build powerful state-of-the-art Deep Learning models…more details

Dr. Jon Krohn
Chief Data Scientist, Author of Deep Learning Illustrated | Untapt
09:30 - 12:30
Data Science and Machine Learning in the Cloud for Cloud Novices

Training | Machine Learning | MLOps & Data Engineering | Beginner-Intermediate

 

In this half-day hands-on training, we will use free-tier resources in the Google Cloud Platform (GCP) to introduce learners to the practical use of cloud computing resources in data science and machine learning. Learners should have some experience with data analytics, data science or machine learning. Learners should also have a Gmail account with no former GCP use associated with it, or be willing to create such an account. While fluency in R or Python will be very helpful, it is not rigorously required, as well-annotated scripts will be provided. No previous exposure to or use of cloud computing is required; this is introductory-level in terms of its cloud computing assumptions. This training will be useful for those considering cloud adoption, interested in data engineering, or interested in working with public data as citizen scientists…more details

Joy Payton
Supervisor, Data Education | Children's Hospital of Philadelphia
09:45 - 11:15
Quick Package Development in R and Python (from “Python or R” to “Python and R”)

Workshop | ML for Programmers | Kick-starter | Beginner-Intermediate

 

When building a data science team, it is important to document and record workflows in both R and Python. A data science department can use installable, Git-controlled packages to form a foundation of code and best practices. In addition, libraries dramatically cut down the time and effort required for a team to bring work to production. Package development is also the first step for engineers and scientists aiming to contribute their unique ideas back to the community.

This workshop aims to teach the basics of package development in both R and Python in 90 minutes. We will touch upon why a data science team should strive to be fully fluent in both languages. We will show simple R package development and simple Python package development. Finally, we will demonstrate how one can use an open source package to test the interface similarity between R and Python packages designed to support identical workflows…more details

 

Theodore Bakanas
Senior Data Scientist | Uptake
11:30 - 12:15
Finding Correlated Trends Across Multiple Data Sets Using Matrix Factorization

Talk | Machine Learning | R-programming | Intermediate-Advanced

 

Matrix factorization or latent variable analysis is a powerful approach to identify trends in data or reduce the dimension of high dimensional data. Whilst most data scientists are familiar with principal component analysis (PCA), this talk will describe extensions to PCA that can be used to examine trends and extract the most variant component across many datasets. I will compare matrix factorization approaches for integrative analysis of multiple datasets (including canonical correlation analysis, multiple factor analysis, joint non-negative matrix factorization) and describe how we apply these methods to identify biomarkers of disease in oncology…more details

Aedin Culhane, PhD
Senior Research Scientist | Dana-Farber Cancer Institute, Harvard TH Chan School of Public Health
12:40 - 14:10
Transform your NLP Skills: Using BERT (and Transformers) in Real Life

Workshop | NLP | Machine Learning | Intermediate-Advanced

 

This workshop teaches you the use of transformer neural networks and their incarnations (BERT, RoBERTa, GPT-2) for solving real-world natural language use cases. NLP has advanced tremendously over the last few years and BERT is at the forefront of this success having achieved state-of-the-art results on 11 different NLP tasks. For businesses, BERT has unlocked new NLP use cases that have been previously unattainable.

This workshop will teach you what transformers and systems like BERT and GPT-2 are and how to use and modify them for your needs. Organizations have a wealth of unstructured text sources in every line of business, such as employee feedback in human resources, purchase orders and legal documents in contracting and procurement, communication records throughout the org, and many more. Making sense of this information and organizing it into knowledge and actionable insights to improve business outcomes is a key function every data scientist should be aware of…more details

Niels Kasch, PhD
Data Scientist and Founding Partner | Miner & Kasch
12:45 - 13:30
Biomarker, Sleep, and Activity Patterns Data from a Web-Based Nutrition Platform for Healthy Individuals: Insights for Personalized Recommendations

Talk | Machine Learning | Research Frontiers | Beginner-Intermediate

 

In order to analyze correlations between biomarkers and sleep and day activity patterns to produce actionable biological insights, we used data collected from healthy individuals using an automated, web-based personalized nutrition and lifestyle platform. With a machine learning approach, we found associations between sleep, activity and exercise patterns, resting heart rate, and blood biomarkers. Some of those associations are of particular interest, for example, insights on how physical activity can affect sleeping patterns depending on the intensity and timing of the exercise. Quantifying correlations between lifestyle patterns and measured biomarkers allows us to be more precise in recommending specific interventions (such as lifestyle and diet changes) and increases the chances of bringing an individual’s biomarkers into the optimal zone…more details

Svetlana Vinogradova, PhD
Data Scientist | Research Fellow | Inside Tracker | Dana-Farber Cancer Institute
12:45 - 13:30
ML in Production – Serverless and Painless

Talk | MLOps & Data Engineering | Machine Learning | Intermediate

 

In this session, Oliver will walk through some of the best serverless options for operationalizing ML pipelines within the TensorFlow ecosystem and on Google Cloud Platform, based on actual case studies. One of these real-life case studies will dive into the journey of a global cosmetics brand to become packaging-free with the help of ML. The first step towards this goal allows customers to view product information simply by taking a picture. This completely eliminates the need for packaging and labels in stores. However, in order to do this effectively, an accurate image classification model, accessible on mobile phones, is needed. This session will cover the details of the end-to-end machine learning pipeline that was created to deliver and update performant ML models to mobile users…more details

Oliver Gindele, PhD
Head of Machine Learning | Datatonic
13:00 - 16:00
Machine Learning in R Part IV: Putting R-Based Machine Learning into Production with Plumber and Docker

Training | Machine Learning | R-programming | All Levels

 

After we build various machine learning models we need to make them accessible to others. We use modeldb so we can store models in a database and plumber to expose our model as a REST API that can be hosted in a Docker container…more details

Daniel Chen, PhD
Data Science Consultant | Lander Analytics
13:00 - 16:00
Machine Learning in R Part II: Using workflows to build an ML optimization pipeline

Training | Machine Learning | R-programming | All Levels

 

Modern machine learning is mostly brute forcing through a multitude of hyperparameters. We look at new tools for building and tuning models in R, some so new they are only available on GitHub. We use xgboost and glmnet models (learned in Part I) as motivation for learning how to split data and conduct cross-validation with rsample, perform feature engineering with recipes, build model specifications with parsnip, tune over hyperparameters with dials and tune, evaluate performance with yardstick and put it all together with workflows…more details

Jared Lander
Chief Data Scientist, Author of R for Everyone, Professor | Lander Analytics, Columbia Business School
13:15 - 14:45
SQL Deep Dive for Data Science

Workshop | ML for Programmers | MLOps & Data Engineering | Intermediate-Advanced

 

Organizations have long used relational databases for a wide variety of data-intensive applications. Data scientists and analysts need to understand how to work with relational databases, particularly for common data science tasks, such as finding, exploring, analyzing and extracting data within a relational database. SQL is an expressive declarative language designed for working with tabular data. Since its inception, new features have been added that allow for complex expressions and computation, such as regular expressions, sliding window functions, and analytical operations that create non-relational results. Understanding the advanced features of SQL will help data scientists reduce the need for programming custom data manipulation functionality…more details

Dan Sullivan, PhD
AI and Cloud Architect & Author | New Relic, Inc | Wiley
13:35 - 14:20
Predictive Maintenance: Zero to Deployment in Manufacturing

Talk | Machine Learning | ML for Programmers | Intermediate

 

The manufacturing industry is going through its fourth industrial revolution, in which machines are connected and data is harvested to discover deeper insights and solve problems to achieve a competitive edge in the market. Industry 4.0 has many applications, such as machine connectivity, machine productivity monitoring, predictive maintenance, predictive quality, inventory optimization, supply chain optimization, and data discovery. Hence, there is a significant need for more reliable and robust approaches to achieve the benefits of these applications. By integrating the philosophy of lean manufacturing and “A small success eventually leads to bigger success,” one case of predictive maintenance (PdM) led to a global-scale implementation by developing new technologies using ML & DL models and deploying them on an industrial scale. This case study discusses different levels of analytics, basics of predictive maintenance, anomaly detection, remaining useful life prediction techniques, combining data and SME knowledge, ML models, and how they can be deployed in manufacturing for different use cases. This case study also discusses various modeling and deployment challenges of implementing predictive maintenance in manufacturing…more details

Nagdev Amruthnath, PhD
Data Scientist III | DENSO North America
13:35 - 14:20
Credit Models and Binning Variables Are Winning and I’m Keeping Score!

Talk | Machine Learning | All Levels

 

This talk will discuss the mathematical underpinnings of using conditional inference trees to strategically bin variables, the weight of evidence (WOE) values applied to these bins, and the way to implement these in a variety of classification models. This talk will also cover the implementation of this approach to credit modeling in economically developing countries around the world. These developing countries don’t have the credit institutions that countries like the United States have. That leaves institutions and banks in these countries hampered in their ability to make loan decisions. These techniques allowed these institutions to use advanced modeling, but with an interpretable layer on top for easy implementation and data-driven decisions…more details

Aric LaBarr, PhD
Associate Professor of Analytics | Institute for Advanced Analytics at NC State University
14:15 - 15:45
Data, I/O, and TensorFlow: Building a Reliable Machine Learning Data Pipeline

Workshop | Deep Learning | MLOps & Data Engineering | Advanced

 

In this tutorial, we will guide you through hands-on examples of integrating tf.keras models with different data input sources through tf.data in production environments, from simple CSV/JSON files to SQL databases and to cloud data warehouse services such as Google Cloud BigQuery. We will also cover Apache Kafka as a data input to illustrate the streaming data pipeline architecture for continuous data processing with machine learning. As a bonus, attendees will learn the basics of distributed machine learning and its production usage…more details

Yong Tang, PhD
Director of Engineering | MobileIron
15:15 - 16:00
Open Source Tools for Social Impact

Talk | Data for Good | Open-source | Intermediate


Over the last 5 years, Data Clinic, the data and tech for good philanthropic arm of investment manager Two Sigma, has utilized employee volunteer teams to partner with nonprofits and develop pro bono solutions that enable these organizations to use data more effectively. Over the course of working on these projects, we noticed that some hurdles are common to multiple organizations and found across different sectors. While volunteer efforts can help organizations on an individual basis, we believe that open source tooling that addresses these specific challenges is key to scaling capacity and moving the entire sector forward.

In this vein, some of Data Clinic’s recent efforts have been around developing open source tools that aid nonprofits across various stages of the data pipeline. In this talk, we will highlight two such open source applications, Smooshr and NewerHoods. Aided by machine learning techniques, Smooshr is designed to help consolidate and redefine categories within data sets through an easy-to-use interactive web application. NewerHoods, on the other hand, is a web application that allows users to visualize and interact with spatial data using a geo-clustering technique to help observe underlying spatial patterns…more details

Stuart Lynn, PhD
Data Scientist | Data Clinic, Two Sigma
Kaushik Mohan
Data Scientist | Data Clinic, Two Sigma

See all our talks and hands-on workshop and training sessions
See all sessions

 

What You’ll Learn

Talks, tutorials, and hands-on workshops and training sessions for this focus area include:

Topics

  • Machine Learning with R

  • Deep Learning with R

  • Statistical Modeling

  • Predictive Analytics

  • Data Analysis

  • Data Visualization

  • Text Analytics

  • Natural Language Processing (NLP)

Models

  • Regression

  • Classification

  • Bayesian Models

  • Machine Learning Models

  • Deep Learning Models

  • Gradient Boosting 

  • Time Series Models 

  • Random Forest  

  • KNN, SVM, & LDA 

Tools 

  • MLR

  • R Shiny & tidyr

  • ggplot2

  • Caret 

  • TensorFlow and Keras Interfaces

  • R Markdown

  • XGBoost

  • tm & OpenNLP 

  • e1071 & dplyr

You Will Meet

  • Some of the world’s best data science speakers

  • The brains and authors behind today’s most popular open data science tools, topics, and languages

  • Hundreds of attendees focused on data science

  • Chief Data Scientists

  • Thought leaders working in data science

  • Data Scientists and Analysts

  • Software Developers

  • CEOs, CTOs, CIOs

  • Data Visualization professionals

  • Venture Capitalists and Investors

  • Startup Founders and Executives

  • Attendees from Healthcare, Finance, Education, Business Intelligence, and other industries

  • Big data and data science innovators

Why Attend?

Several of the best minds and biggest names in data science will be presenting

Network with attendees from leading data science companies to learn how others are tackling similar problems

Gain quality training in the hottest data science topics, tools, and languages

Learn the latest in data science from industry leaders without having to make room in the budget — tickets are surprisingly affordable

Sign Up for ODSC EAST 2020 | April 14th – April 17th

Register Now