SQL-driven ML


In this short talk, we'll cover how to run SQL from Jupyter notebooks with a popular open-source tool, JupySQL. The main focus is a real-life industry use case, predicting customer churn, in which we connect to a data source (we'll use an example dataset, but the same approach works with a database, data warehouse, or data lake) and query the data. Once the data is ready, we'll run exploratory data analysis: plotting the data and drawing insights from it. We'll then fit a model and evaluate its results. Once we've settled on the model, dataset, and hyperparameters, we'll create a report to share with the rest of the team and the business stakeholders.
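As a rough illustration of the "connect and query" step, here is a minimal sketch. It uses Python's built-in `sqlite3` module in place of JupySQL's notebook magics so that it runs as a plain script. The `customers` table, its columns, and the toy rows are hypothetical and are not the dataset used in the talk.

```python
import sqlite3

# Hypothetical toy rows standing in for a real customer table:
# (id, plan, churned) where churned is 1 if the customer left.
rows = [
    (1, "monthly", 1),
    (2, "monthly", 0),
    (3, "monthly", 1),
    (4, "annual", 0),
    (5, "annual", 0),
    (6, "annual", 1),
    (7, "annual", 0),
]

# An in-memory database stands in for a warehouse or data lake.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, plan TEXT, churned INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)

# In a notebook with JupySQL, a query like this would typically sit in a
# `%%sql` cell after `%load_ext sql` and a `%sql <connection string>` line.
query = """
SELECT plan, COUNT(*) AS n_customers, AVG(churned) AS churn_rate
FROM customers
GROUP BY plan
ORDER BY plan
"""
churn_by_plan = {plan: (n, rate) for plan, n, rate in conn.execute(query)}
conn.close()

# A first EDA-style look at churn per plan before any modeling.
for plan, (n, rate) in churn_by_plan.items():
    print(f"{plan}: {n} customers, churn rate {rate:.2f}")
```

An aggregate like this is the kind of result one would plot during exploratory analysis before choosing features for a churn model.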

Jupyter is the most common tool for data science today. Its ability to work with data interactively and transform it is matched by no other IDE. We think that with JupySQL and a few other packages, Jupyter can become even better, and that some of the challenges that currently exist in the platform can be overcome.

This lecture is at a beginner/intermediate level and is intended for data scientists, data engineers or data analysts who are looking for a better way to query their data stores from Jupyter notebooks.

This will be the layout of the talk:
1. Introduction to the problem, use case, and background story
2. Introduction to JupySQL and the dataset
3. EDA and fitting the model
4. Generating a stakeholder report
5. Questions and answers


Ido Michael, a seasoned data engineering and science professional, co-founded Ploomber & JupySQL with the mission of empowering data scientists to build faster and more efficient solutions. Prior to this, he led data engineering and science teams at Amazon Web Services (AWS), where he played an instrumental role in building hundreds of data pipelines during various customer engagements, working closely with his team.

A proud alumnus of Columbia University, Ido moved to New York to pursue his Master's degree in Computer Engineering. It was during his time at Columbia that he identified the challenges in working with multiple data sources and Jupyter notebooks for reliable model development. This realization inspired him to concentrate on building Ploomber, a platform designed to address these issues and streamline the data science workflow.

Open Data Science




