Sprinting Pandas

Abstract: 

Sometimes our Pandas code feels slow, and sometimes we can't fit enough data into RAM. Based on recent updates to the 2nd edition of Ian's High-Performance Python book and his public training classes, this talk shows how to get more into RAM (reducing your need for other technologies like Spark), how to quickly compile for significant speedups, how to run in parallel, and which libraries you're missing that unlock additional performance benefits. You'll leave with new techniques to make your DataFrames smaller and many ideas for processing your data faster.

This talk is inspired by Ian's work updating his O'Reilly book High-Performance Python to the 2nd edition for 2020. With over 10 years of evolution, the Pandas DataFrame library has gained a huge amount of functionality and is used by millions of Pythonistas, but the most obvious way to solve a task isn't always the fastest or most RAM-efficient. This talk will help any Pandas user (beginner or beyond) process more data faster, making them more effective at their jobs.

Bio: 

Ian is a Chief Data Scientist and has worked in AI and Data Science, building teams and high-value IP, since 1999. He has published the 2nd edition of his High-Performance Python book with O'Reilly, speaks and gives keynote talks internationally, and co-founded the 11,000-member PyDataLondon community, which has delivered 7 years of volunteer-run meetups and conferences.