Matrix Math at Scale with Apache Mahout and Spark


This workshop will provide an overview of the process of building predictions from raw data using the Apache Mahout machine-learning library and the Apache Spark compute framework. We will present recent developments in Mahout including the matrix-math-oriented domain-specific language named “Samsara” as well as a new algorithm development framework. Then we will walk through the steps required to convert a data set to the format required for analytic methods, split the data, train and test a model, and then output predictions.
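The workshop's own source code is not reproduced here. As a hypothetical, single-machine illustration of the same sequence of steps (parse raw text into a numeric matrix, split it, train and test a model, output predictions), the sketch below uses plain Scala with no Mahout or Spark dependency. The toy data, the `x1,x2,y` CSV layout, the every-fourth-row split, and the choice of ordinary least squares via the normal equations are all assumptions made for illustration. In Samsara the matrix products would instead be expressed with distributed-matrix operators along the lines of `drmX.t %*% drmX`.

```scala
// Hypothetical end-to-end sketch in plain Scala (no Mahout/Spark):
// raw text -> numeric matrix -> train/test split -> least-squares fit -> predictions.
object PipelineSketch {
  // Parse assumed "x1,x2,y" lines into feature rows (with a leading 1.0 intercept) and targets.
  def parse(lines: Seq[String]): (Array[Array[Double]], Array[Double]) = {
    val nums = lines.map(_.split(",").map(_.trim.toDouble))
    (nums.map(r => 1.0 +: r.init).toArray, nums.map(_.last).toArray)
  }

  // Ordinary least squares: solve (X^T X) b = X^T y by Gaussian elimination with partial pivoting.
  def fitOls(x: Array[Array[Double]], y: Array[Double]): Array[Double] = {
    val p = x.head.length
    val a = Array.tabulate(p, p)((i, j) => x.indices.map(r => x(r)(i) * x(r)(j)).sum) // X^T X
    val b = Array.tabulate(p)(i => x.indices.map(r => x(r)(i) * y(r)).sum)            // X^T y
    for (c <- 0 until p) {
      // Swap the row with the largest pivot into position c.
      val piv = (c until p).maxBy(r => math.abs(a(r)(c)))
      val tr = a(c); a(c) = a(piv); a(piv) = tr
      val tb = b(c); b(c) = b(piv); b(piv) = tb
      // Eliminate column c below the pivot.
      for (r <- c + 1 until p) {
        val f = a(r)(c) / a(c)(c)
        for (k <- c until p) a(r)(k) -= f * a(c)(k)
        b(r) -= f * b(c)
      }
    }
    // Back substitution on the upper-triangular system.
    val beta = new Array[Double](p)
    for (i <- p - 1 to 0 by -1)
      beta(i) = (b(i) - (i + 1 until p).map(k => a(i)(k) * beta(k)).sum) / a(i)(i)
    beta
  }

  // Prediction is the dot product of each feature row with the coefficients.
  def predict(x: Array[Array[Double]], beta: Array[Double]): Array[Double] =
    x.map(row => row.zip(beta).map { case (v, w) => v * w }.sum)

  def main(args: Array[String]): Unit = {
    // Toy data generated from y = 1 + 2*x1 + 3*x2 (no noise), for illustration only.
    val raw = Seq("0,0,1", "1,0,3", "0,1,4", "1,1,6", "2,1,8", "1,2,9", "2,2,11", "3,1,10")
    val (x, y) = parse(raw)
    // Deterministic split: every 4th row goes to the test set, the rest to training.
    val (test, train) = x.indices.partition(_ % 4 == 3)
    val beta = fitOls(train.map(i => x(i)).toArray, train.map(i => y(i)).toArray)
    val preds = predict(test.map(i => x(i)).toArray, beta)
    val rmse = math.sqrt(test.indices.map(i => math.pow(preds(i) - y(test(i)), 2)).sum / test.size)
    println(s"coefficients = ${beta.mkString(", ")}; test RMSE = $rmse")
  }
}
```

Because the toy data has no noise, the fitted coefficients should come out very close to 1, 2 and 3 and the test error close to zero; on real data the split and the error metric would matter far more.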

This approach takes advantage of a declarative, Scala-based DSL that lets practitioners work with very large data sets by expressing the math symbolically, without first having to acquire significant programming experience to manipulate matrices and other data structures by hand. Attendees will be given example source code and pointers for getting started on their own projects.
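To show what "focusing on the math symbolically" can look like in Scala, the toy sketch below defines a small in-memory matrix type with R-like operators, so that the Gram matrix XᵀX is written as `x.t %*% x`. This is a hypothetical illustration, not Mahout's API: the `Mat` class and its `dense` helper are invented here, and they only borrow the operator spelling (`%*%` for multiplication, `.t` for transpose) that Samsara also uses. Samsara applies the same style of notation to distributed matrices backed by Spark.

```scala
// A toy, in-memory matrix type with R-like operators, written in plain Scala.
// Hypothetical illustration only: this is NOT Mahout's API, though Samsara
// uses similar operator names (%*% for multiply, .t for transpose).
final case class Mat(rows: Vector[Vector[Double]]) {
  require(rows.nonEmpty && rows.forall(_.length == rows.head.length), "ragged matrix")
  def nrow: Int = rows.length
  def ncol: Int = rows.head.length

  // Transpose: element (i, j) becomes element (j, i).
  def t: Mat = Mat(Vector.tabulate(ncol, nrow)((j, i) => rows(i)(j)))

  // Matrix product; inner dimensions must agree.
  def %*%(that: Mat): Mat = {
    require(ncol == that.nrow, s"cannot multiply $nrow x $ncol by ${that.nrow} x ${that.ncol}")
    Mat(Vector.tabulate(nrow, that.ncol) { (i, j) =>
      (0 until ncol).map(k => rows(i)(k) * that.rows(k)(j)).sum
    })
  }
}

object Mat {
  // Build a matrix from row literals, e.g. Mat.dense(Seq(1.0, 2.0), Seq(3.0, 4.0)).
  def dense(rows: Seq[Double]*): Mat = Mat(rows.map(_.toVector).toVector)
}

object DslDemo {
  def main(args: Array[String]): Unit = {
    val x = Mat.dense(Seq(1.0, 2.0), Seq(3.0, 4.0), Seq(5.0, 6.0))
    // The Gram matrix X^T X, written the way it reads on paper.
    val gram = x.t %*% x
    println(gram.rows) // prints Vector(Vector(35.0, 44.0), Vector(44.0, 56.0))
  }
}
```

The point of the design is that the expression in code mirrors the expression on paper; the DSL, not the user, decides how the loops (or, in Samsara's case, the distributed Spark jobs) are carried out.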


Andrew Musselman is a member of the Apache Mahout Project Management Committee, an independent data engineering and analytics consultant, and co-host of the Adversarial Learning podcast. He loves distributed matrix math and lives in Seattle with his wife and kids.

Open Data Science