How We Got Data Prep (and Machine Learning) All Wrong?


Supernaturally fast and accurate search engines, cars that drive themselves, robots who can pass for humans - the future is here. No one would ever dare deny data scientists their reputation as miracle workers. Think about it: in a decade, they basically brought us voice assistants, chatbots and cheetah-shaped robots capable of doing the dishes for us. Pretty cool, right?

But when things seem too good to be true, that’s usually because they are. And that’s not to say we haven’t made a tremendous amount of progress - we unquestionably have. However, to some extent, we’ve gotten to where we are almost in spite of ourselves. We now have technology at our fingertips that could easily destroy the fabric of our society (just think about DeepFakes). We almost lost control of what we built.

But perhaps the most astonishing thing of all is that we’ve made it here in spite of preestablished misconceptions about the very way Machine Learning works. “Larger models will get us better performance”; “more data is needed when a model overfits”; “autolabeling is the solution to the labeling crisis ML startups are facing today”; “the training set and the test set need to follow the same distribution”; “data augmentation is the silver bullet when not enough data is available”: all of these statements were true… until they were not.

If you are happy with your perception of ML, you might want to skip this talk. Otherwise, if you enjoy controversial positions and tough conversations and are on a quest for the truth about Machine Learning, attend by all means, and you will never think about Machine Learning the same way again!

Cheers to the next phase in the History of ML!


Dr. Jennifer Prendki is the founder and CEO of Alectio, the first startup focused on DataPrepOps, a portmanteau term that she coined to refer to the nascent field focused on automating the optimization of a training dataset. She and her team are on a fundamental mission to help ML teams build models with less data (leading to both the reduction of ML operations costs and CO2 emissions) and have developed technology that dynamically selects and tunes a dataset that facilitates the training process of a specific ML model.

Prior to Alectio, Jennifer was the VP of Machine Learning at Figure Eight; she also built an entire ML function from scratch at Atlassian, and led multiple Data Science projects on the Search team at Walmart Labs. She is recognized as one of the top industry experts on Data Preparation, Active Learning and ML lifecycle management, and is an accomplished speaker who enjoys addressing both technical and non-technical audiences.

Open Data Science
One Broadway
Cambridge, MA 02142
