In-Database Machine Learning in Jupyter


Moving data, transforming data types, taking small samples so they’ll fit in your sandbox – these are all things every data scientist puts up with as routine. And when you’re finished, a data engineer has to build full production pipelines to reproduce all that work at scale. But what if you could leave all the data where it is and analyze it in place?
What if you could jump straight to the meat of the work, and when you’re finished, a single line of code would push it all into production?
In this tutorial, you’ll use familiar pandas- and scikit-learn-style code to build a churn reduction model without ever moving data.
You'll learn:
• Modern in-database machine learning
• How to use Python code and a Jupyter notebook inside a database
• How to manage, train, and evaluate models inside a database
• What makes your model ready for production and how to get it there

Session Outline
Lesson 1: Set up environment and load data
Familiarize yourself with how data is stored and accessed in a Vertica analytical database. Set up your environment with Matplotlib for visualization. Get a quick tour of what is possible in a VerticaPy notebook.

Lesson 2: Prepare Data
Load the data. Explore and visualize correlations, outliers, distributions, and summary statistics. Convert categorical variables to Boolean variables, and identify the variables that are likely to contribute most to accuracy. Modify the data set as needed for algorithm compatibility.
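
The idea of preparing data in place can be sketched with nothing but the Python standard library. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for Vertica, the `customers` table and its columns are hypothetical, and none of this is VerticaPy's API. The point it shows is that statistics and the categorical-to-Boolean conversion are computed by the database, so only small results travel back to the notebook.

```python
import sqlite3

# Assumption: SQLite (in memory) stands in for the analytical database.
# The table, columns, and rows are hypothetical, not the tutorial's data set.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers "
    "(id INTEGER, contract TEXT, monthly_charge REAL, churned INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "monthly", 70.0, 1),
        (2, "monthly", 80.0, 1),
        (3, "yearly", 50.0, 0),
        (4, "yearly", 60.0, 0),
        (5, "monthly", 90.0, 0),
        (6, "yearly", 40.0, 0),
    ],
)

# Summary statistics are computed by the database; one row comes back.
row_count, mean_charge, max_charge = conn.execute(
    "SELECT COUNT(*), AVG(monthly_charge), MAX(monthly_charge) FROM customers"
).fetchone()

# Encode the categorical column as a Boolean (0/1) column in a view,
# so no data is copied out of the database.
conn.execute(
    """
    CREATE VIEW customers_prepared AS
    SELECT id,
           monthly_charge,
           CASE WHEN contract = 'monthly' THEN 1 ELSE 0 END AS is_monthly,
           churned
    FROM customers
    """
)

# Churn rate per encoded group: a quick check of whether the variable
# looks useful for a churn model.
churn_by_contract = dict(
    conn.execute(
        "SELECT is_monthly, AVG(churned) "
        "FROM customers_prepared GROUP BY is_monthly"
    ).fetchall()
)

print(row_count, mean_charge, max_charge)
print(churn_by_contract)
```

In this toy data, the monthly-contract group churns far more often than the yearly group, which is the kind of signal the lesson's exploration step is meant to surface.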

Lesson 3: Train and Evaluate Model
Train and validate a couple of different churn probability models. Evaluate each model and compare results. Save the model in the database, and apply it to new data. Practice model comparison, retraining, and versioning.
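
The workflow in this lesson (train two candidates, compare them on held-out rows, save the winner with a version number, retrain, and score new rows inside the database) can be sketched as follows. Everything here is an assumption made for illustration: SQLite stands in for Vertica, the logistic regression is hand-written rather than taken from VerticaPy or scikit-learn, and the data, the `models` and `new_customers` tables, and the helper names are hypothetical.

```python
import json
import math
import sqlite3


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, features):
    """Churn probability for one row; weights[0] is the bias term."""
    return sigmoid(weights[0] + sum(w * x for w, x in zip(weights[1:], features)))


def train(rows, epochs=5000, lr=0.5):
    """Fit logistic-regression weights by batch gradient descent."""
    weights = [0.0] * (len(rows[0][0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for features, label in rows:
            error = predict(weights, features) - label
            grad[0] += error
            for j, x in enumerate(features, start=1):
                grad[j] += error * x
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def accuracy(weights, rows):
    hits = sum((predict(weights, f) >= 0.5) == bool(y) for f, y in rows)
    return hits / len(rows)


def select(rows, columns):
    """Keep only the chosen feature columns of each row."""
    return [(tuple(f[c] for c in columns), y) for f, y in rows]


# Hypothetical rows: ((is_monthly, monthly_charge / 100), churned).
train_rows = [((1, 0.9), 1), ((1, 0.8), 1), ((1, 0.5), 1),
              ((0, 0.9), 0), ((0, 0.4), 0), ((0, 0.5), 0)]
valid_rows = [((1, 0.6), 1), ((0, 0.8), 0), ((1, 0.85), 1), ((0, 0.45), 0)]

# Two candidate models that differ only in the features they use.
candidates = {"charge_only": (1,), "contract_and_charge": (0, 1)}
results = {}
for name, columns in candidates.items():
    weights = train(select(train_rows, columns))
    results[name] = (accuracy(weights, select(valid_rows, columns)), weights)
best_name = max(results, key=lambda n: results[n][0])

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE models (name TEXT, version INTEGER, weights TEXT, valid_accuracy REAL)"
)


def save_model(name, weights, valid_accuracy):
    """Store weights under the next version number for this model name."""
    (latest,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM models WHERE name = ?", (name,)
    ).fetchone()
    conn.execute(
        "INSERT INTO models VALUES (?, ?, ?, ?)",
        (name, latest + 1, json.dumps(weights), valid_accuracy),
    )
    return latest + 1


# Version 1: the validated winner. Version 2: retrained on all labeled rows.
save_model(best_name, results[best_name][1], results[best_name][0])
columns = candidates[best_name]
retrained = train(select(train_rows + valid_rows, columns))
save_model(best_name, retrained, results[best_name][0])

# Apply the latest stored version to new rows with a SQL expression.
conn.create_function("sigmoid", 1, sigmoid)
conn.execute("CREATE TABLE new_customers (id INTEGER, is_monthly INTEGER, charge REAL)")
conn.executemany(
    "INSERT INTO new_customers VALUES (?, ?, ?)", [(101, 1, 0.75), (102, 0, 0.6)]
)
(stored,) = conn.execute(
    "SELECT weights FROM models WHERE name = ? ORDER BY version DESC LIMIT 1",
    (best_name,),
).fetchone()
# The winning model in this sketch has a bias plus two feature weights.
bias, w_monthly, w_charge = json.loads(stored)
scores = dict(
    conn.execute(
        "SELECT id, sigmoid(? + ? * is_monthly + ? * charge) FROM new_customers",
        (bias, w_monthly, w_charge),
    ).fetchall()
)
print(best_name, scores)
```

Storing each retrained model as a new row rather than overwriting the old one keeps earlier versions available for comparison or rollback, which is the habit the versioning exercise is meant to build.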

Background Knowledge
Basic Python; familiarity with pandas and scikit-learn is helpful
Basic understanding of Jupyter notebooks


Pranjal Singh has been a Data Scientist at Vertica since May 2020. Prior to joining Vertica, Pranjal received his Bachelor's degree in Data Science from Northeastern University in Boston, MA. He has experience with Software Engineering, Data Analytics, and Machine Learning. Pranjal has a passion for ML and Predictive Analytics, and helping organizations make better decisions with data. He's an avid sports fan, with a special interest in sports analytics and data.

Open Data Science




One Broadway
Cambridge, MA 02142
