Scalable Data Science and Machine Learning using R4ML


Distributed machine learning and data science have their own unique challenges and approaches.

R has long been the de facto language for statistics, but with the rise of big data, many of its ML and statistical functions do not scale.

In this talk, we will introduce R4ML, a new open-source scalable machine learning framework built using Apache Spark and Apache SystemML, which allows R scripts to invoke custom algorithms developed in Apache SystemML. R4ML integrates seamlessly with SparkR, so data scientists can use the best features of SparkR and SystemML together in the same scripts. In addition, the R4ML package provides a number of useful new R functions that simplify common data cleaning and statistical analysis tasks.

Our talk will begin with an overview of the R4ML R package, its API, its supported canned algorithms, and its integration with Spark and SystemML. In this tutorial-style presentation, we will walk through:
- a small example of creating a custom algorithm, and
- a typical end-to-end machine learning flow that uses the built-in scalable algorithms to gain deeper insights:
-- exploratory data analysis
-- typical preprocessing and data cleaning
-- dimension reduction
-- classification using SVM
-- cross-validation
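
As a rough illustration of the last step in the flow above, the sketch below shows k-fold cross-validation in plain Python. It is not the R4ML API: the function names (`k_fold_indices`, `cross_validate`, `fit_threshold`, `predict_threshold`) and the toy data are hypothetical, and a simple threshold rule stands in for a real classifier such as an SVM.

```python
import random


def k_fold_indices(n_rows, k, seed=0):
    """Split row indices 0..n_rows-1 into k shuffled, near-equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th index starting at i, so fold sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]


def cross_validate(rows, labels, k, fit, predict, seed=0):
    """Return the mean held-out accuracy over k folds."""
    accuracies = []
    for held_out in k_fold_indices(len(rows), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(rows)) if i not in held]
        # Train only on rows outside the held-out fold.
        model = fit([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict(model, rows[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


# Hypothetical stand-in classifier: threshold at the midpoint of the two class means.
def fit_threshold(xs, ys):
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Toy, well-separated one-dimensional data (assumed for illustration only).
rows = [0.1, 0.2, 0.3, 0.4, 0.5, 5.1, 5.2, 5.3, 5.4, 5.5]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
mean_accuracy = cross_validate(rows, labels, k=5,
                               fit=fit_threshold, predict=predict_threshold)
print(mean_accuracy)
```

Passing `fit` and `predict` as arguments keeps the splitting logic independent of the model, so the same loop could in principle wrap any classifier.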

The talk will conclude with pointers on how the audience can try out R4ML.


Alok Singh is a Principal Engineer at the IBM Spark Technology Center, where he leads the R4ML project. He has built and architected multiple analytical frameworks and implemented various machine learning algorithms. His interests lie in creating big data and scalable machine learning software and algorithms. He has worked in the software industry for the last 20 years in various roles.
