Abstract: Python is becoming the leading programming language for data analysis. Its broad capabilities and fairly simple syntax have made it the go-to choice for the task. Python provides various libraries (also known as packages) for analyzing various types of data, with tabular data being the most common among them.
In this workshop, you will get acquainted with the pandas library, which is the most widely used package for reading, analyzing and exporting datasets in Python. You will also learn how to visualize many kinds of tabular data using the plotnine package, along with some tips and tricks on how to make your visualizations stand out. Lastly, you will have the opportunity to make predictions and decisions using data, based on basic statistical methods.
Lesson 1: DataFrames and Python
You will familiarize yourself with the pandas package and its core object, the DataFrame. You will learn how to import and export tabular files in Python and how to perform many different operations on them, such as grouping and transforming your data.
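As a rough illustration of the import, group, transform and export steps mentioned above, here is a minimal sketch. The dataset, its column names (city, year, sales) and the derived column share_of_city are hypothetical and are not taken from the workshop material; an in-memory string stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical data standing in for a CSV file on disk.
csv_text = """city,year,sales
Athens,2022,120
Athens,2023,150
London,2022,200
London,2023,260
"""

# Import: read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Group: total sales per city.
total_by_city = df.groupby("city")["sales"].sum()

# Transform: each row's share of its own city's total sales.
df["share_of_city"] = df["sales"] / df.groupby("city")["sales"].transform("sum")

# Export: with no path, to_csv returns the CSV text; pass a path to write a file.
exported = df.to_csv(index=False)
```

Passing a file name such as "sales.csv" (hypothetical) to read_csv and to_csv would read from and write to disk instead.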
Lesson 2: Creating meaningful visualizations
Once you have loaded and transformed your data, you will practice the art of creating meaningful visualizations with the plotnine package in Python to get a better understanding of your data. At the end of this lesson, you will be able to identify the most appropriate visualization technique for your goal and your available data.
Lesson 3: Performing statistical analysis
Visualizations can sometimes be deceiving when it comes to making decisions and predictions. You will be introduced to basic statistical techniques for making inferences from your data, depending on your goal, and learn how to state their assumptions correctly. For this lesson, you will use Python's numerical and scientific libraries, numpy and scipy.
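One possible example of such a technique is comparing the means of two groups. The sketch below uses simulated data (the group means, spread and sample sizes are arbitrary assumptions), checks the normality assumption with a Shapiro-Wilk test, and then runs Welch's t-test, which does not assume equal variances. It is not necessarily the method taught in the lesson.

```python
import numpy as np
from scipy import stats

# Simulated samples; all parameters here are arbitrary assumptions.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=10.0, scale=2.0, size=50)
group_b = rng.normal(loc=12.0, scale=2.0, size=50)

# Assumption check: Shapiro-Wilk test of normality for each group.
# A small p-value would suggest the sample is not normally distributed.
shapiro_a = stats.shapiro(group_a)
shapiro_b = stats.shapiro(group_b)

# Welch's t-test: compares the two means without assuming equal variances.
result = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"Shapiro-Wilk p-values: {shapiro_a.pvalue:.3f}, {shapiro_b.pvalue:.3f}")
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```

If the normality check fails, a rank-based alternative such as stats.mannwhitneyu may be a better fit for the data.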
Prerequisites: Basic knowledge of Python (basic syntax and data types)
Bio: Leonidas (Leo) is a Senior Data Scientist at AstraZeneca. His work focuses on machine learning in oncology, including clinical and non-clinical applications. He is also enthusiastic about NLP applications in oncology and how they can be used to improve patient treatment. He is a workshop facilitator at the European Leadership University (ELU), NL, and has been a data science educator at DataCamp. He holds a PhD in bioinformatics and ML from the University of Warwick, UK, an MSc in Statistics from Imperial College London, UK, and a BSc in Statistics and Insurance Science from the University of Piraeus, GR.