Getting Started with Pandas for Data Analysis


This tutorial offers a comprehensive introduction to pandas, a powerful data analysis library built on top of the Python programming language. Pandas represents a great step forward for graphical spreadsheet users looking to grow their data manipulation skills. I like to call it "Excel on steroids".
By completing this workshop, you'll have a strong foundation for using Pandas in your day-to-day data analysis needs. We'll start out with the basics -- importing datasets, selecting rows and columns, filtering rows by criteria -- and progress to advanced concepts like grouping values, joining multiple datasets together, and cleaning text.
Students will be exposed to diverse datasets across different disciplines -- sports, finance, entertainment, and more. The training is open to all industries and is aimed at beginners; basic Python knowledge is preferred but not required.

Session Outline
Lesson 01: Foundations of pandas
Discover the 1-dimensional Series and the 2-dimensional DataFrame, the two core data structures in pandas. Learn how to sort values across one or more columns, identify missing values, unique values, and duplicates, count occurrences of values, filter rows based on one or more criteria, and more. By the end of this lesson, you'll know the most popular features of pandas.
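As a rough preview of the operations named in this lesson, here is a minimal sketch. The `players` table and its column names are invented for illustration and are not part of the workshop materials.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
players = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cara", "Dev", "Ben"],
        "team": ["Red", "Blue", "Red", None, "Blue"],
        "points": [31, 18, 25, 12, 18],
    }
)

# A single column of a DataFrame is a 1-dimensional Series.
points = players["points"]

# Sort by team (A to Z), then by points from highest to lowest.
ranked = players.sort_values(["team", "points"], ascending=[True, False])

# Flag missing values and rows that repeat an earlier row exactly.
missing_team = players["team"].isna()
repeated_rows = players.duplicated()

# List the unique values and count how often each one occurs.
teams = players["team"].dropna().unique()
team_counts = players["team"].value_counts()

# Filter rows that satisfy two criteria at once.
red_high_scorers = players[(players["team"] == "Red") & (players["points"] > 20)]
```

Each criterion in the filter is wrapped in parentheses and combined with `&`, because Python's `and` keyword does not work element by element on a Series.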

Lesson 02: Working with Data of Different Types
Data can come in a variety of formats (and be messy to boot)! In this lesson, we'll explore how to convert columns from one data type to another. We'll optimize our dataset to reduce memory consumption. We'll discover how to clean messy text data and how to extract datetime information from text. Finally, we'll have a chance to review all concepts from the previous lesson with new datasets.
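The kinds of conversions described in this lesson might look like the sketch below. The `orders` table is hypothetical, and whether a given conversion actually saves memory depends on the data, so `memory_usage` is shown only as a way to check.

```python
import pandas as pd

# Hypothetical messy data, invented for illustration only.
orders = pd.DataFrame(
    {
        "customer": ["  alice SMITH", "Bob Jones ", "carol  white"],
        "region": ["East", "West", "East"],
        "quantity": ["3", "10", "7"],
        "ordered_on": ["2023-01-15", "2023-02-03", "2023-02-28"],
    }
)

# Convert text digits to integers so arithmetic works.
orders["quantity"] = orders["quantity"].astype(int)

# A column with few distinct values often uses less memory as a category.
orders["region"] = orders["region"].astype("category")

# Clean text: trim the ends, collapse repeated spaces, fix capitalization.
orders["customer"] = (
    orders["customer"]
    .str.strip()
    .str.replace(r"\s+", " ", regex=True)
    .str.title()
)

# Parse text into datetimes, then pull out individual components.
orders["ordered_on"] = pd.to_datetime(orders["ordered_on"], format="%Y-%m-%d")
orders["order_month"] = orders["ordered_on"].dt.month

# Report per-column memory use, including the contents of text columns.
memory_bytes = orders.memory_usage(deep=True)
```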

Lesson 03: Aggregating and Joining Datasets
In this lesson, we'll learn how to merge data across multiple datasets. We'll explore the pandas equivalent of common SQL operations like inner joins, outer joins, left joins, and right joins. We'll also introduce the GroupBy object for grouping rows by shared values across one or more columns. Finally, we'll walk through common aggregation and reshaping operations like pivoting, melting, stacking, unstacking, and more.
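A minimal sketch of the joins, grouping, and reshaping mentioned here follows. The `employees`, `departments`, and `sales` tables are made up for illustration, and the example covers only `merge`, `groupby`, `pivot`, and `melt`.

```python
import pandas as pd

# Hypothetical tables, invented for illustration only.
employees = pd.DataFrame(
    {
        "emp_id": [1, 2, 3, 4],
        "name": ["Ana", "Ben", "Cara", "Dev"],
        "dept_id": [10, 20, 10, 30],
    }
)
departments = pd.DataFrame(
    {"dept_id": [10, 20, 40], "dept": ["Sales", "Finance", "Legal"]}
)

# The `how` argument selects the join type, much like SQL.
inner = employees.merge(departments, on="dept_id", how="inner")
left = employees.merge(departments, on="dept_id", how="left")
right = employees.merge(departments, on="dept_id", how="right")
outer = employees.merge(departments, on="dept_id", how="outer")

sales = pd.DataFrame(
    {
        "region": ["East", "East", "West", "West"],
        "quarter": ["Q1", "Q2", "Q1", "Q2"],
        "revenue": [100, 150, 80, 120],
    }
)

# Group rows that share a region, then total the revenue in each group.
revenue_by_region = sales.groupby("region")["revenue"].sum()

# Pivot long data to wide: one row per region, one column per quarter.
wide = sales.pivot(index="region", columns="quarter", values="revenue")

# Melt the wide table back to long form: one row per region and quarter.
long = wide.reset_index().melt(
    id_vars="region", var_name="quarter", value_name="revenue"
)
```

In this sample, the left join keeps Dev even though department 30 has no match, and the outer join also keeps the unmatched Legal department.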

Background Knowledge
Intermediate knowledge of spreadsheet software (Excel, Google Sheets, Numbers, etc.), basic knowledge of the Python programming language, and a local installation of the Jupyter Notebook development environment.


Boris Paskhaver is a full-stack web developer based in New York City with experience building apps in React / Redux and Ruby on Rails. His favorite part of programming is the never-ending sense that there's always something new to master -- a secret language feature, a popular design pattern, an emerging library, or -- most importantly -- a different way of looking at a problem.

Open Data Science
Innovation Center
101 Main St
Cambridge, MA 02142
