Boston | April 14th – April 17th, 2020

R for Data Science

Learn the latest models, packages, and tools from top practitioners and R data scientists

R, long one of the most popular languages for statistical modeling, has evolved into a data science powerhouse platform with dozens of packages and interfaces for machine learning, deep learning, data visualization, and various other data science methods.

R is used for complex statistical modeling and supports operations on arrays, matrices, and vectors. Its many graphics libraries let users create attractive plots, and R Shiny lets them build web applications that embed interactive visualizations in web pages. R also offers several options for advanced analytics, such as predictive modeling and advanced machine learning algorithms. With interfaces to popular frameworks like TensorFlow and Keras, R users can easily build and deploy the latest deep learning models.

Some of Our Current R Presenters

See our full speaker list
2020 Speakers
R for Data Science Sessions
Friday, April 17th
Thursday, April 16th
Wednesday, April 15th
Session Title by Dr. Jon Krohn Coming Soon!


Dr. Jon Krohn
Chief Data Scientist, Author of Deep Learning Illustrated | Untapt
Session Title by Joy Payton Coming Soon!


Joy Payton
Supervisor, Data Education | Children's Hospital of Philadelphia
Session Title by Theodore Bakanas Coming Soon!



Theodore Bakanas
Senior Data Scientist | Uptake
11:30 - 12:15
Finding Correlated Trends Across Multiple Data Sets Using Matrix Factorization

Talk | Machine Learning | R-programming | Intermediate-Advanced


Matrix factorization, or latent variable analysis, is a powerful approach for identifying trends in data and reducing the dimensionality of high-dimensional data. While most data scientists are familiar with principal component analysis (PCA), this talk will describe extensions of PCA that can be used to examine trends and extract the components with the greatest variance across many datasets. I will compare matrix factorization approaches for the integrative analysis of multiple datasets (including canonical correlation analysis, multiple factor analysis, and joint non-negative matrix factorization) and describe how we apply these methods to identify disease biomarkers in oncology…

Aedin Culhane, PhD
Senior Research Scientist | Dana-Farber Cancer Institute, Harvard T.H. Chan School of Public Health
12:40 - 14:10
Transform your NLP Skills: Using BERT (and Transformers) in Real Life

Workshop | NLP | Machine Learning | Intermediate-Advanced


This workshop teaches you how to use transformer neural networks and their variants (BERT, RoBERTa, GPT-2) to solve real-world natural language problems. NLP has advanced tremendously over the last few years, and BERT is at the forefront of this progress, having achieved state-of-the-art results on 11 different NLP tasks. For businesses, BERT has unlocked NLP use cases that were previously unattainable.

You will learn what transformers and systems like BERT and GPT-2 are and how to use and modify them for your needs. Organizations have a wealth of unstructured text in every line of business, such as employee feedback in human resources, purchase orders and legal documents in contracting and procurement, communication records throughout the organization, and many more. Making sense of this information and turning it into knowledge and actionable insights that improve business outcomes is a key function every data scientist should be aware of…

Niels Kasch, PhD
Data Scientist and Founding Partner | Miner & Kasch
12:45 - 13:30
Biomarker, Sleep, and Activity Patterns Data from a Web-Based Nutrition Platform for Healthy Individuals: Insights for Personalized Recommendations

Talk | Machine Learning | Research Frontiers | Beginner-Intermediate


To analyze correlations between biomarkers and sleep and daytime activity patterns, and to produce actionable biological insights, we used data collected from healthy individuals through an automated, web-based personalized nutrition and lifestyle platform. Using a machine learning approach, we found associations between sleep, activity and exercise patterns, resting heart rate, and blood biomarkers. Some of these associations are of particular interest, for example, how physical activity can affect sleep patterns depending on the intensity and timing of exercise. Quantifying correlations between lifestyle patterns and measured biomarkers allows us to recommend specific interventions (such as lifestyle and diet changes) more precisely and increases the chances of bringing an individual’s biomarkers into the optimal zone…

Svetlana Vinogradova, PhD
Data Scientist | InsideTracker
13:00 - 16:00
Session Title by Daniel Chen Coming Soon!

Training | Machine Learning | R-programming | All Levels


After we build machine learning models, we need to make them accessible to others. We use modeldb to store models in a database and plumber to expose a model as a REST API that can be hosted in a Docker container…

Daniel Chen, PhD
Data Science Consultant | Lander Analytics
13:15 - 14:45
SQL Deep Dive for Data Science

Workshop | ML for Programmers | MLOps & Data Engineering | Intermediate-Advanced


Organizations have long used relational databases for a wide variety of data-intensive applications. Data scientists and analysts need to understand how to work with relational databases, particularly for common data science tasks such as finding, exploring, analyzing, and extracting data. SQL is an expressive, declarative language designed for working with tabular data. Since its inception, it has gained features that allow for complex expressions and computation, such as regular expressions, sliding window functions, and analytical operations that create non-relational results. Understanding SQL's advanced features will help data scientists reduce the need to write custom data manipulation code…

Dan Sullivan, PhD
AI and Cloud Architect & Author | New Relic, Inc. | Wiley
13:35 - 14:20
Predictive Maintenance: Zero to Deployment in Manufacturing

Talk | Machine Learning | ML for Programmers | Intermediate


The manufacturing industry is going through its fourth industrial revolution (Industry 4.0), in which machines are connected and data is harvested to uncover deeper insights and solve problems that provide a competitive edge in the market. Industry 4.0 has many applications, such as machine connectivity, machine productivity monitoring, predictive maintenance, predictive quality, inventory optimization, supply chain optimization, and data discovery, and the possibilities keep growing. Realizing these benefits requires more reliable and robust approaches. By combining the philosophy of lean manufacturing with the principle that “a small success eventually leads to bigger success,” a single predictive maintenance (PdM) use case grew into a global-scale implementation, developing new technologies with ML and DL models and deploying them at industrial scale. This case study discusses the different levels of analytics, the basics of predictive maintenance, anomaly detection, remaining useful life prediction techniques, combining data with subject-matter expert (SME) knowledge, and ML models, as well as how they can be deployed in manufacturing for different use cases. It also discusses the modeling and deployment challenges of implementing predictive maintenance in manufacturing…

Nagdev Amruthnath, PhD
Data Scientist III | DENSO North America
13:35 - 14:20
Credit Models and Binning Variables Are Winning and I’m Keeping Score!

Talk | Machine Learning | All Levels


This talk will discuss the mathematical underpinnings of using conditional inference trees to strategically bin variables, of the weight of evidence (WOE) values applied to these bins, and of implementing these techniques in a variety of classification models. It will also cover applying this approach to credit modeling in economically developing countries around the world. These countries lack the credit institutions that countries like the United States have, which hampers banks and other lenders in making loan decisions. These techniques allowed those lenders to use advanced modeling with an interpretable layer on top, for easy implementation and data-driven decisions…

Aric LaBarr, PhD
Associate Professor of Analytics | Institute for Advanced Analytics at NC State University
14:15 - 15:45
Data, I/O, and TensorFlow: Building a Reliable Machine Learning Data Pipeline

Workshop | Deep Learning | MLOps & Data Engineering | Advanced


In this tutorial, we will guide you through hands-on examples of integrating tf.keras models with different data input sources in production environments, from simple CSV/JSON files to SQL databases and cloud data warehouse services such as Google Cloud BigQuery. We will also cover Apache Kafka as a data input to illustrate a streaming data pipeline architecture for continuous data processing with machine learning. As a bonus, attendees will learn the basics of distributed machine learning and its use in production…

Yong Tang, PhD
Director of Engineering | MobileIron
15:15 - 16:00
Open Source Tools for Social Impact

Talk | Data for Good | Open-source | Intermediate

Over the last five years, Data Clinic, the data-and-tech-for-good philanthropic arm of investment manager Two Sigma, has used employee volunteer teams to partner with nonprofits and develop pro bono solutions that enable these organizations to use data more effectively. While working on these projects, we noticed that some hurdles are common to many organizations across different sectors. While volunteer efforts can help organizations individually, we believe that open source tooling addressing these shared challenges is key to scaling capacity and moving the entire sector forward.

In this vein, Data Clinic’s recent efforts have focused on developing open source tools that help nonprofits across various stages of the data pipeline. In this talk, we will highlight two such open source applications, Smooshr and NewerHoods. Smooshr, an easy-to-use interactive web application aided by machine learning techniques, helps users consolidate and redefine categories within data sets. NewerHoods is a web application that lets users visualize and interact with spatial data, using a geo-clustering technique to reveal underlying spatial patterns…

Stuart Lynn, PhD
Data Scientist | Data Clinic, Two Sigma
Kaushik Mohan
Data Scientist | Data Clinic, Two Sigma

See all our talks and hands-on workshop and training sessions


What You’ll Learn

Talks, tutorials, and hands-on workshop and training sessions for this focus area include:


  • Machine Learning with R

  • Deep Learning with R

  • Statistical Modeling

  • Predictive Analytics

  • Data Analysis

  • Data Visualization

  • Text Analytics

  • Natural Language Processing (NLP)


  • Regression

  • Classification

  • Bayesian Models

  • Machine Learning Models

  • Deep Learning Models

  • Gradient Boosting 

  • Time Series Models 

  • Random Forest  

  • KNN, SVM, & LDA 


  • MLR

  • R Shiny & tidyr

  • ggplot2

  • Caret 

  • TensorFlow and Keras Interfaces

  • R Markdown

  • XGBoost

  • tm & OpenNLP 

  • e1071 & dplyr

You Will Meet

  • Some of the world’s best data science speakers

  • The brains and authors behind today’s most popular open data science tools, topics, and languages

  • Hundreds of attendees focused on data science

  • Chief Data Scientists

  • Thought leaders working in data science

  • Data Scientists and Analysts

  • Software Developers

  • CEOs, CTOs, CIOs

  • Data Visualization professionals

  • Venture Capitalists and Investors

  • Startup Founders and Executives

  • Attendees from Healthcare, Finance, Education, Business, Intelligence, and other industries

  • Big data and data science innovators

Why Attend?

Several of the best minds and biggest names in data science will be presenting

Network with attendees from leading data science companies to learn how others are tackling similar problems

Gain quality training in the hottest data science topics, tools, and languages

Learn the latest in data science from industry leaders without straining your budget; tickets are surprisingly affordable

Sign Up for ODSC East 2020 | April 14th – April 17th

Register Now