In-Database Machine Learning with Python


Traditional Python data science libraries such as Pandas and Scikit-Learn are limited by memory: they load the whole dataset into working memory before any analysis can begin. In the era of Big Data, it's not practical to be constantly moving and loading large volumes of data. But what if we still want to work with Pandas-like code and DataFrames? This workshop shows you how to create end-to-end Machine Learning pipelines in Python on large volumes of data without the data ever leaving the database. This results in not just faster load times but also faster model training and deployment.

This workshop will use the open source tools VerticaPy and Jupyter notebooks to showcase examples of Data Ingestion, Preparation, Analysis, and Modeling in a Pandas-like way. The data will be stored in a free Community Edition of Vertica, which will do all the heavy computation, using its MPP (massively parallel processing) architecture for speed. We will cover different modeling and data prep techniques, and use Vertica's AutoML for model selection. Vertica helps bring the analytics to the data, rather than the other way around.
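To make "bringing the analytics to the data" concrete, here is a minimal sketch of the idea. It is not VerticaPy's API and does not use Vertica: Python's built-in sqlite3 module stands in for the database so the example runs anywhere, and the `sales` table, its columns, and both function names are hypothetical. The first function pulls every row to the client before aggregating, while the second lets the database do the aggregation and returns only the result.

```python
import sqlite3

# Assumption: SQLite stands in for the database; the table, columns,
# and sample rows are made up purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("east", 30.0), ("west", 5.0), ("west", 15.0), ("west", 40.0)],
)


def mean_by_region_client_side(conn):
    """Fetch every row to the client, then aggregate in Python."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        running_sum, count = totals.get(region, (0.0, 0))
        totals[region] = (running_sum + amount, count + 1)
    return {region: s / n for region, (s, n) in totals.items()}


def mean_by_region_in_database(conn):
    """Let the database aggregate; only one row per region is returned."""
    rows = conn.execute("SELECT region, AVG(amount) FROM sales GROUP BY region")
    return dict(rows)


client_result = mean_by_region_client_side(conn)
in_db_result = mean_by_region_in_database(conn)
print(client_result == in_db_result)
```

Both functions give the same answer, but the second transfers one row per region instead of the whole table. That difference is what matters once the table no longer fits in client memory.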


Pranjal Singh is a Data Science Solutions Architect at Vertica with a focus on Machine Learning. Pranjal works with customers to understand their business needs and data, and to design and implement solutions using Vertica. He received his Bachelor's degree in Data Science, with a minor in Mathematics, from Northeastern University in Boston, MA. Pranjal has a passion for problem solving using Predictive Analytics and for helping organizations make better decisions with data. He's an avid sports fan, with a special interest in fantasy sports, analytics, and advanced metrics.

Open Data Science
One Broadway
Cambridge, MA 02142
