General Training Session: Modeling big data with R, sparklyr, and Apache Spark


In this workshop I will show you, hands-on, how to work with big data using R and Apache Spark.

I’ll use sparklyr, developed by RStudio in conjunction with IBM, Cloudera, and H2O, as my main tool. The sparklyr package provides an R interface to Spark’s distributed machine-learning algorithms and much more, making practical machine learning scalable and easy. With sparklyr, you can interactively manipulate Spark data using both dplyr and SQL (via DBI); filter and aggregate Spark datasets, then bring them into R for analysis and visualization; orchestrate distributed machine learning from R using either Spark MLlib or H2O Sparkling Water; create extensions that call the full Spark API and provide interfaces to Spark packages; and establish Spark connections and browse Spark data frames within the RStudio IDE.


Dr. John Mount is a principal consultant at Win-Vector LLC, a San Francisco data science consultancy. John has worked as a computational scientist in biotechnology and as a stock-trading algorithm designer, and has managed a research team for (now an eBay company). John is the coauthor of Practical Data Science with R (Manning Publications, 2014). John started his advanced education in mathematics at UC Berkeley and holds a Ph.D. in computer science from Carnegie Mellon.
