Spark and R with sparklyr


One of the frustrations in data science arises when a problem grows from being manageable on a laptop or a single server to being too large to fit in memory or too slow to process. Crossing that line often means switching to a completely different environment, and sometimes to a different language.

Apache Spark is a leading engine for distributed in-memory data analysis. It comes with advanced machine-learning modules and has interfaces for Scala, Python, and R. The SparkR project brings many of Spark’s capabilities to R but is still missing many of the machine-learning tools available in Python or Scala.

In late 2016, RStudio released the sparklyr package to provide tighter integration between the RStudio IDE and Spark. sparklyr provides a backend for the commonly used dplyr package, so R users who are familiar with dplyr can continue using that interface. It also offers much more in terms of machine learning and feature transformations through Spark's MLlib.
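As a rough illustration of what this looks like in practice, the sketch below connects to Spark, runs a dplyr summary, and fits an MLlib model. It assumes a local Spark installation (for example, one set up with `spark_install()`), and it uses the built-in `mtcars` data set and the table name `mtcars_spark` purely as placeholders; it is not taken from the workshop material.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally; a real cluster would use a different master URL.
sc <- spark_connect(master = "local")

# Copy a small built-in data set into Spark (placeholder for real distributed data).
cars_tbl <- copy_to(sc, mtcars, name = "mtcars_spark", overwrite = TRUE)

# Familiar dplyr verbs are translated to Spark SQL and executed by Spark.
mpg_by_cyl <- cars_tbl %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE), n = n()) %>%
  arrange(cyl) %>%
  collect()  # bring the small summary back into the local R session

# Fit a linear regression with Spark MLlib through sparklyr.
fit <- ml_linear_regression(cars_tbl, mpg ~ wt + cyl)

spark_disconnect(sc)
```

Only the call to `collect()` moves data into R; everything before it stays in Spark, which is what lets the same dplyr code scale beyond a single machine's memory.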

This workshop will offer an overview of Apache Spark and the types of problems it can solve before walking you through hands-on examples covering the basics of working with distributed data, data manipulation, and machine learning. You’ll leave with everything you need to scale your R data analysis to a distributed environment, without learning an entirely new language.

The workshop will be delivered using RStudio Server in the cloud, with the Spark infrastructure provided by us. Attendees will need only a laptop with a modern browser and Wi-Fi connectivity. Basic R knowledge is required.


Doug is a Senior Data Scientist at Mango Solutions. A statistical physicist by training, he now practices and teaches a wide range of data science disciplines from machine learning pipelines to graph theory.
