Introduction to Python for Data Analysis


Python is becoming the leading programming language for data analysis. Its broad capabilities, along with its fairly simple syntax, have made it the go-to choice for the task. Python provides various libraries (also known as packages) for analyzing different types of data, with tabular data being the most common among them.

In this workshop, you will get acquainted with the pandas library, which is the most widely used package for reading, analyzing and exporting datasets in Python. You will also learn how to visualize many kinds of tabular data using the plotnine package, along with some tips and tricks on how to make your visualizations stand out. Lastly, you will have the opportunity to make predictions and decisions using data, based on basic statistical methods.

Session Outline:
Lesson 1: DataFrames and Python
You will familiarize yourself with the pandas package and how to use its core object, the DataFrame. You will learn how to import and export tabular files in Python, along with many different operations, such as grouping and transforming your data.
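
To give a flavour of these operations, here is a minimal sketch of importing, grouping, transforming and exporting tabular data with pandas. The sample data, the column names (city, year, sales) and the variable names are made up for illustration and are not taken from the workshop material.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk.
csv_text = """city,year,sales
Athens,2021,100
Athens,2022,120
London,2021,200
London,2022,260
"""

# Import: read tabular data into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Group: total sales per city.
totals = df.groupby("city", as_index=False)["sales"].sum()

# Transform: each row's share of its own city's total sales.
df["share"] = df["sales"] / df.groupby("city")["sales"].transform("sum")

# Export: with no path given, to_csv returns the CSV text as a string;
# pass a file path instead to write the file to disk.
summary_csv = totals.to_csv(index=False)

print(totals)
print(df)
```

Here groupby followed by sum collapses the data to one row per city, while transform keeps the original number of rows, which is why it can be used to add a per-row column.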

Lesson 2: Creating meaningful visualizations
Once you have loaded and transformed your data, you will practice the art of creating meaningful visualizations to get a better understanding of your data using the plotnine package in Python. By the end of this lesson, you will be able to identify the most appropriate visualization technique for your goal and your available data.

Lesson 3: Performing statistical analysis
Visualizations can sometimes be deceiving when it comes to making decisions and predictions. You will be introduced to basic statistical techniques for making inferences from your data, depending on your goal, and how to formulate their assumptions correctly. For this lesson, you will use the numerical and scientific libraries of Python, numpy and scipy.
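
As an illustration of this kind of analysis, the sketch below compares the means of two groups with numpy and scipy. The data are simulated, and the choice of tests (a Shapiro-Wilk normality check followed by Welch's t-test) is an assumption made for this example, not necessarily the method covered in the lesson.

```python
import numpy as np
from scipy import stats

# Simulated measurements for two hypothetical groups (not real data).
rng = np.random.default_rng(seed=0)
group_a = rng.normal(loc=50.0, scale=5.0, size=40)
group_b = rng.normal(loc=53.0, scale=5.0, size=40)

# Assumption check: Shapiro-Wilk tests the null hypothesis that a sample
# comes from a normal distribution (a small p-value suggests it does not).
_, normality_p_a = stats.shapiro(group_a)
_, normality_p_b = stats.shapiro(group_b)

# Inference: Welch's t-test compares the two means without assuming
# equal variances. The null hypothesis is that the means are equal.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"Shapiro-Wilk p-values: {normality_p_a:.3f}, {normality_p_b:.3f}")
print(f"Welch t statistic: {t_stat:.3f}, p-value: {p_value:.4f}")
```

Stating the null hypothesis and checking the test's assumptions before reading the p-value is the part that a plot alone cannot provide.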

Background Knowledge:
Basic knowledge of Python: Basic syntax and data types


Leonidas (Leo) is a Senior Data Scientist at AstraZeneca. His work focuses on machine learning in oncology, including clinical and non-clinical applications. He is also enthusiastic about NLP applications in oncology and how they can be used to improve patient treatment. He is a workshop facilitator at the European Leadership University (ELU), NL, and has been a data science educator at DataCamp. He holds a PhD in bioinformatics and ML from the University of Warwick, UK, an MSc in Statistics from Imperial College London, UK, and a BSc in Statistics and Insurance Science from the University of Piraeus, GR.

Open Data Science