Data Mastering at Scale


The gap between what businesses need in data quality and availability and the actual state of enterprise data has never been larger. Enterprise data keeps growing exponentially, yet decades of technology have failed to address the challenges of data volume, data variety, and unintended data silos. Integrating these silos often unlocks enormous value: customer cross-selling, “know your customer,” supply chain optimization, and more.

In this talk, Michael Stonebraker dives into why the data accessibility gap exists and the leading methods to solve this data problem.

Traditional rules-based mastering for silo integration does not scale. It is well known that rule systems work as long as the rule base is small (say, 500 rules). When a mastering project requires substantially more rules than that, it tends to fail. Traditional data mastering is therefore fine for small, simple problems, but bigger, more complex projects need a new approach. So, how can we solve this issue? Enter data mastering based on machine learning.

You'll learn the following from Turing Award winner Michael Stonebraker:

- The real silo integration issues organizations face today
- The definition of “data mastering” and why traditional MDM fails
- Strategies for implementing mastering projects in large enterprises
- Why traditional solutions based on extract, transform, and load (ETL) and rules-based systems will never scale to the size of problems encountered in the enterprise today
- Why integration systems based on machine learning (ML) are the only viable alternatives at scale

Join this session to see how Stonebraker uses real-world examples to illustrate these points and showcase actionable insights to implement successful data mastering projects.


Dr. Stonebraker has been a pioneer of database research and technology for more than forty years. He was the main architect of the INGRES relational DBMS and the object-relational DBMS POSTGRES. These prototypes were developed at the University of California at Berkeley, where Stonebraker was a Professor of Computer Science for twenty-five years. More recently at M.I.T., he was a co-architect of the Aurora/Borealis stream processing engine, the C-Store column-oriented DBMS, the H-Store transaction processing engine, the SciDB array DBMS, and the Data Tamer data curation system.
Presently he serves as Chief Technology Officer of Paradigm4 and Tamr, Inc.

Open Data Science