Big Data Analytics and Visualization with R


R is one of the most popular programming languages in data science and is used across industries for statistical analysis. It also offers rapid visualization and deployment features, as well as an extensive set of packages tailored to big data analysis.

In this workshop, you will learn techniques for handling data at scale using open source R packages. You will practice generating interactive plots with dynamic data filtering and explore how to take advantage of R’s native parallelization features to deliver highly scalable solutions. You will also learn how to design, run, and deploy R-based web applications in minutes.

Session Outline:

Lesson 1: Foundational Analysis Tools
(Re)familiarize yourself with R syntax, data structures, and coding conventions. You will learn how to use packages such as ‘data.table’ and ‘dplyr’ to create data frames in preparation for visualization or more complex statistical analysis.
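
As a rough illustration of the kind of data preparation this lesson describes, the sketch below uses dplyr to summarise a small data frame before plotting. It assumes the dplyr package is installed; the `readings` data frame, its column names, and its values are made up for the example and are not taken from the workshop materials.

```r
library(dplyr)

# Hypothetical sensor readings; the data and column names are illustrative only.
readings <- data.frame(
  sensor = c("a", "a", "b", "b", "b"),
  value  = c(1.0, 3.0, 2.0, 4.0, 6.0)
)

# Group by sensor, then compute the row count and mean value per group.
summary_tbl <- readings %>%
  group_by(sensor) %>%
  summarise(n = n(), mean_value = mean(value), .groups = "drop")

print(summary_tbl)
```

The result has one row per sensor, which is a convenient shape to hand to a plotting function.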

Lesson 2: ETL for Big Data Analysis
Efficiently extract, transform, and load high volumes of data using R. At the end of this lesson, you will be able to parallelize R code for faster data ingestion.
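
One possible way to parallelize ingestion, sketched here with the `parallel` package that ships with R, is to read several files on a pool of worker processes and combine the results. The temporary CSV files, the `file_id` and `x` columns, and the choice of two workers are assumptions made for illustration; the workshop may use different packages or a different approach.

```r
library(parallel)

# Write a few small CSV files to a temporary directory (stand-ins for real input).
tmp_dir <- tempfile("ingest_")
dir.create(tmp_dir)
for (i in 1:4) {
  write.csv(
    data.frame(file_id = i, x = 1:5),
    file.path(tmp_dir, sprintf("part_%d.csv", i)),
    row.names = FALSE
  )
}
paths <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Start a small worker pool; two workers is an arbitrary choice for this sketch.
cl <- makeCluster(2)

# Read each file on a worker process, then shut the pool down.
parts <- parLapply(cl, paths, read.csv)
stopCluster(cl)

# Combine the per-file data frames into a single data frame.
combined <- do.call(rbind, parts)
print(dim(combined))
```

For files this small the cost of starting workers outweighs any gain; the pattern tends to pay off only when each file takes noticeably longer to read than the workers take to start.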

Lesson 3: Scalable Data Visualizations
You will practice generating heat maps, box plots, and graphs for trace data using open source packages. In this lesson, you will learn how to quickly render interactive plots with thousands of data points.

Lesson 4: Web Apps
At the end of this lesson, you will be able to create web applications in minutes using R Markdown, Shiny, and flexdashboard. You will learn how to render interactive and downloadable plots and data tables, and you will learn the benefits and limitations of Shiny deployment.

Lesson 5: Modularization
In this lesson, you will learn when it is most appropriate to use R for big data, and how to modularize R code in order to leverage features in other tools or languages. By the end of this lesson, you will be able to containerize your R code and call it from other tools or languages.

Background Knowledge:

Basic knowledge of R


Ysis Tarter is a senior data engineer at Absci, where deep learning AI and synthetic biology are harnessed to translate ideas into drugs. She leads the development of data platforms and pipelines for high-throughput biological data, as well as scientific tools for data analysis. She is also the co-tech lead of the Bay Area chapter of Black Girls Code and teaches data visualization and analytics. She has lectured at several institutions, including Columbia, USC, and UC Berkeley. She holds an MS in Applied Biomedical Engineering from Johns Hopkins University and a BS in Computer Science from Stanford University, where she specialized in biocomputation. She has published peer-reviewed articles in the fields of scalable neuroscience and synthetic biological design.

Open Data Science




One Broadway
Cambridge, MA 02142
