SQL for Data Science

Abstract: 

Structured Query Language (SQL) is used to retrieve, shape, and transform data stored in relational databases, and it is among the most commonly used tools in data science. Learning SQL is a valuable step on the path to becoming a data scientist and an excellent starting point for understanding how data is stored, transformed, and evaluated.

By completing this workshop, you will develop an understanding of relational models of data, how SQL is used to retrieve that data, and how to join tables, aggregate information, and answer data science questions. You will also become familiar with many of the common types of SQL databases, how to query a database from the command line, and how to access a database from within Python.

Lesson 1: Relational Databases and Foundational SQL
Familiarize yourself with relational databases and the SQL syntax necessary to retrieve information from tables in a database. At the end of this lesson, you will be able to comfortably explore a database and retrieve, filter, and sort information from a table.
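
As a rough illustration of the kind of query this lesson builds toward, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The `customers` table, its columns, and its rows are hypothetical and are not taken from the workshop materials.

```python
import sqlite3

# Hypothetical example data: the table, columns, and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO customers (name, city, age) VALUES (?, ?, ?)",
    [("Ana", "New York", 34), ("Ben", "Boston", 27),
     ("Cy", "New York", 41), ("Dee", "Chicago", 22)],
)

# Explore: list the tables in the database (sqlite_master is specific to SQLite).
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]

# Retrieve two columns, filter with WHERE, and sort with ORDER BY.
lesson1_rows = conn.execute(
    "SELECT name, age FROM customers WHERE city = ? ORDER BY age DESC",
    ("New York",),
).fetchall()

print(tables)
print(lesson1_rows)
conn.close()
```

Other database systems expose their table catalog differently, so the exploration query would need adjusting outside SQLite.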

Lesson 2: Combining Data From Multiple Tables
Practice joining tables in a database and aggregating that information to answer simple questions about the data in your database. You will be able to identify columns used for joining/combining tables, choose the correct method for joining tables, and perform simple mathematical calculations on your data.
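
A minimal sketch of joining and aggregating, again using Python's sqlite3 module. The `customers` and `orders` tables and the `customer_id` join column are assumptions made for illustration. A LEFT JOIN is chosen here so that customers without orders still appear in the result; an inner JOIN would drop them.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers (id, name) VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders (customer_id, amount) VALUES (1, 20.0), (1, 35.0), (2, 10.0);
""")

# Join on the shared key, then count and sum orders per customer.
# COALESCE turns the NULL sum for customers with no orders into 0.
totals = conn.execute("""
    SELECT c.name,
           COUNT(o.id) AS n_orders,
           COALESCE(SUM(o.amount), 0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

for row in totals:
    print(row)
conn.close()
```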

Lesson 3: Transform Your Data for Analysis
Let’s answer some common analytics and business questions using our database! We’ll put our skills to the test, leveraging existing information in a database to create new columns, filter based on complex conditions, and prepare a dataset for data visualization or more complex statistical analyses.
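
One way the ideas in this lesson might look in practice is sketched below: a CASE expression derives a new column, and a compound WHERE clause filters rows before the data is handed off for plotting or modeling. The `orders` table, its values, and the size thresholds are all hypothetical.

```python
import sqlite3

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL, status TEXT);
    INSERT INTO orders (region, amount, status) VALUES
        ('East', 120.0, 'paid'),
        ('East', 15.0, 'refunded'),
        ('West', 40.0, 'paid'),
        ('West', 300.0, 'pending'),
        ('North', 8.0, 'paid');
""")

# Create a derived column (size_band) and filter on a compound condition:
# exclude refunds, then keep East/West orders or any order under 10.
prepared = conn.execute("""
    SELECT id, region, amount,
           CASE WHEN amount >= 100 THEN 'large'
                WHEN amount >= 25 THEN 'medium'
                ELSE 'small'
           END AS size_band
    FROM orders
    WHERE status <> 'refunded'
      AND (region IN ('East', 'West') OR amount < 10)
    ORDER BY id
""").fetchall()

for row in prepared:
    print(row)
conn.close()
```

The parentheses around the OR condition matter: without them, AND binds more tightly than OR and the refund filter would no longer apply to the low-amount rows.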

Bio: 

Mona is a Data Scientist at Greenhouse Software in New York City, where they contribute to data-informed decision making across the company and to machine learning solutions that improve the hiring process for Greenhouse customers. They previously worked in government, creating analytics and machine learning solutions to improve the lives of New Yorkers, and they continue to be involved in civic projects through a number of volunteer and non-profit organizations. They have also been a statistics and data science educator with DataCamp, Emeritus, and in university settings. They hold a graduate degree in Developmental Psychology and are passionate about contributing to the ethical use of data science methodology in the public and private sectors.