An Introduction to Data Literacy with SQL


Data literacy is an essential foundation for anyone considering data science or machine learning. This session will help you understand core data workflow concepts, including what data is, data generation and collection, data profiling, data transformation, data shaping, and other essential workflow topics. As this is an interactive training session, we will layer SQL and an introduction to relational databases on top of these data topics. As we journey through the data workflow, we will use SQL to wrangle and transform the data as needed. SQL consistently makes the top-5 job requirements list for data scientists, data analysts, machine learning engineers, and other related data roles. SQL is the standard go-to tool for manipulating structured data stores, including relational databases, and SQL-derived query languages are now also widely used with unstructured data stores. With this foundational understanding, you will not only have job-ready data skills but will also be better positioned to proceed to other introductory-level courses in data analysis, programming, data science, and machine learning.

The Lesson Plan is as Follows

Lesson 1: Introduction
Introduction to Data Literacy
Why SQL?

Lesson 2: The Data Life Cycle
What is Data?
Understanding Data Types
Sourcing & Collecting Data

Lesson 3: Relational Databases
Introduction to Relational Databases
Data Normalization

Lesson 4: SQL
Introduction to SQL Syntax

Lesson 5: Data Profiling
Kaggle Example Dataset
Describing Data
Examples: Outliers & Correlations
Data Profiling with SQL

Lesson 6: Data Preparation
Data Transformation
Data Enrichment
Data Preparation with SQL

Lesson 7: Data Shaping
Big Data & Wide Data
Data Shaping with SQL

Lesson 8: An Overview of Machine Learning Features
Feature Selection
Feature Engineering

- No prior programming or data experience is required
- Please have a laptop and a current web browser
- A working knowledge of spreadsheets is helpful
- During training, we will use a simple online SQL client from W3. To try SQL yourself after the training, you can download SQLite from GitHub here. Note: if your Mac prevents you from installing this download, please refer to this video
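
For anyone who wants to try SQL after the session, the following is a minimal sketch of the kind of data profiling query the lesson plan refers to. It assumes Python is available and uses its built-in sqlite3 module, so no separate installation is needed. The people table, its columns, and the sample rows are hypothetical and are not the Kaggle dataset used in the session.

```python
import sqlite3

# Hypothetical sample data; the table and column names are illustrative only.
ROWS = [
    ("Alice", "Boston", 34),
    ("Bob", "Cambridge", 29),
    ("Carol", "Boston", None),  # missing age, to show how NULLs are counted
    ("Dan", "Cambridge", 41),
]


def profile_people(rows):
    """Load rows into an in-memory SQLite table and return simple profile statistics."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE people (name TEXT, city TEXT, age INTEGER)")
        conn.executemany("INSERT INTO people VALUES (?, ?, ?)", rows)

        # Overall profile: row count, number of missing ages, and the age range.
        total, missing_age, min_age, max_age = conn.execute(
            """
            SELECT COUNT(*),
                   SUM(CASE WHEN age IS NULL THEN 1 ELSE 0 END),
                   MIN(age),
                   MAX(age)
            FROM people
            """
        ).fetchone()

        # Per-group summary: number of people and average age in each city.
        # AVG skips NULL values, so Carol's missing age is ignored.
        by_city = conn.execute(
            """
            SELECT city, COUNT(*), AVG(age)
            FROM people
            GROUP BY city
            ORDER BY city
            """
        ).fetchall()
    finally:
        conn.close()

    return {
        "total": total,
        "missing_age": missing_age,
        "min_age": min_age,
        "max_age": max_age,
        "by_city": by_city,
    }


if __name__ == "__main__":
    print(profile_people(ROWS))
```

The two SELECT statements can also be pasted on their own into any SQL client once a similar table exists; the Python wrapper is only there to make the sketch self-contained.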

Level: Beginner

Who This Training Session is for
- Anyone looking to better understand what data is
- Anyone seeking to grasp the various steps in the typical data workflow
- Anyone seeking to take introductory-level courses in data science, machine learning, or deep learning


Sheamus McGovern is the founder of ODSC (The Open Data Science Conference). He is also a software architect, data engineer, and AI expert. He started his career in finance, building stock and bond trading systems and risk assessment platforms, and has worked for numerous financial institutions and quant hedge funds. Over the last decade, Sheamus has consulted with dozens of companies and startups to build leading-edge data-driven applications in finance, healthcare, e-commerce, and venture capital. He holds degrees from Northeastern University, Boston University, and Harvard University, as well as a Certificate in Quantitative Finance (CQF).

Open Data Science
One Broadway
Cambridge, MA 02142
