Data Analytics at Scale: A Four-legged Stool


In this talk, I discuss four tactics that enable successful enterprise analytics efforts. The first concerns data integration. Because essentially all enterprise data resides in silos, an integration effort is required before meaningful cross-silo analysis is possible. Data science practitioners routinely report spending at least 80% of their time on “data preparation” (also known as data munging). I describe why this activity is hard and the tactics that can make it less costly. Once clean cross-silo data is available, two further tactics come into play: using an analytics suite and using an information discovery tool. The former is required to do data analytics, while the latter is necessary when one doesn’t know what analysis to perform. I discuss the desired features of each tool and make some comments about machine learning. The fourth tactic concerns data lakes and lakehouses. Please put everything in a DBMS, so that the integration challenge of data lakes is as manageable as possible.


Dr. Stonebraker has been a pioneer of database research and technology for more than forty years. He was the main architect of the INGRES relational DBMS and the object-relational DBMS POSTGRES. These prototypes were developed at the University of California at Berkeley, where Stonebraker was a Professor of Computer Science for twenty-five years. More recently, at M.I.T., he was a co-architect of the Aurora/Borealis stream processing engine, the C-Store column-oriented DBMS, the H-Store transaction processing engine, the SciDB array DBMS, and the Data Tamer data curation system. Presently he serves as Chief Technology Officer of Hopara and Tamr, Inc.

Professor Stonebraker was awarded the ACM System Software Award in 1992 for his work on INGRES. Additionally, he was awarded the first annual SIGMOD Innovation Award in 1994 and was elected to the National Academy of Engineering in 1997. He was awarded the IEEE John von Neumann Medal in 2005 and the 2014 Turing Award, and is presently an Adjunct Professor of Computer Science at M.I.T.

Open Data Science




One Broadway
Cambridge, MA 02142
