DataFrames.jl: A Perfect Sidekick for Your Next Data Science Project

Abstract: 

In many data science ecosystems the data frame is a pivotal object. It is not only very useful conceptually, but also ensures that data transformation operations can be performed efficiently. This is why packages like data.table in R or pandas in Python are star players.
In Julia the situation is different, because the language itself gives you speed out of the box. The DataFrames.jl package is therefore designed to be a sidekick that conveniently supports your core data analysis pipeline. Its functionality is more focused than that of, e.g., pandas, but at the same time it integrates seamlessly with the whole Julia data science ecosystem.
During this workshop, using hands-on examples, I will discuss the design principles behind DataFrames.jl and walk you through the key functionality provided by this package. All presented materials will be made available before the workshop in a blog post at https://bkamins.github.io/.

Session Outline
After completing this workshop you should have a basic knowledge of the key functionality that DataFrames.jl provides and of how it integrates with the data science ecosystem in Julia.
In the session we are going to cover the following topics:
* basic operations on data frames
* reading and writing CSV and Apache Arrow files
* filtering
* sorting
* joining
* reshaping
* transforming
* aggregation using split-apply-combine
* integration with the ecosystem: plotting data, building basic predictive models

Background Knowledge
I assume knowledge of at least one data manipulation framework, such as pandas or data.table. I also assume an entry-level ability to read Julia code.

Bio: 

Bogumił Kamiński is Head of the Decision Analysis and Support Unit at Warsaw School of Economics, Poland, and Adjunct Professor at the Data Science Laboratory, Ryerson University, Canada. His research interests are techniques for large-scale mathematical modeling of complex systems that combine simulation, optimization, and machine learning. His particular areas of expertise are agent-based simulation and the modeling and analysis of complex networks.

Bogumił is one of the core developers of the DataFrames.jl package for data wrangling in the Julia language. He is also a top answerer for the [julia] tag on Stack Overflow and regularly discusses a wide range of data science topics on his blog, https://bkamins.github.io/.

Open Data Science

 

 

 

