Abstract: In this talk we will present Kangas, a new open-source project for exploring and analyzing data that is mixed with multimedia datatypes such as images, video, and audio. Typically, the Pandas DataFrame is the go-to tool for EDA (Exploratory Data Analysis). However, it has limitations that can make working with large mixed-media datasets painful and time-consuming. In this presentation we will explore these pain points for the data scientist:
1. Processing large mixed-media datasets
2. Creating easy visualizations
3. Using visualizations for exploration and analysis
Kangas builds on a modern technology stack to provide a snappy web-based UI, a backend Python server, and a Python SDK API. The UI is designed specifically for EDA, focusing on the core functionality of grouping, sorting, and filtering multimedia and JSON metadata. Kangas uses a Python-based query syntax that is parsed directly into SQL for fast queries. Designed to work alongside Pandas, Kangas can directly import DataFrames, Parquet files, and CSV files. Additionally, using the SDK API, you can construct your own DataGrid. These spreadsheet-like objects can be used to store and visualize almost any kind of data. For visualizations, you can customize both the columns of data and the method of visualization, which lets you add your own charts and computations for use in the DataGrid. Finally, we will demonstrate using Kangas on a variety of problems, including dataset exploration and analysis of a machine learning task.
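To make the idea of a Python-based query syntax parsed into SQL more concrete, here is a minimal sketch using only the Python standard library. It is not Kangas's actual parser or API: the helper names `to_sql` and `translate`, the table name `grid`, and the sample rows are all hypothetical, and only comparisons combined with `and`/`or` are handled.

```python
import ast
import sqlite3

# Illustrative only: a toy translator, not the parser Kangas ships.
_COMPARE = {ast.Gt: ">", ast.GtE: ">=", ast.Lt: "<",
            ast.LtE: "<=", ast.Eq: "=", ast.NotEq: "!="}
_BOOL = {ast.And: "AND", ast.Or: "OR"}


def to_sql(node, params):
    """Recursively turn a small subset of Python expression AST into SQL."""
    if isinstance(node, ast.Expression):
        return to_sql(node.body, params)
    if isinstance(node, ast.BoolOp):
        joiner = f" {_BOOL[type(node.op)]} "
        return "(" + joiner.join(to_sql(v, params) for v in node.values) + ")"
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        op = _COMPARE.get(type(node.ops[0]))
        if op is None:
            raise ValueError("unsupported comparison operator")
        left = to_sql(node.left, params)
        right = to_sql(node.comparators[0], params)
        return f"{left} {op} {right}"
    if isinstance(node, ast.Name):
        # Bare Python names are treated as column names (an assumption).
        return f'"{node.id}"'
    if isinstance(node, ast.Constant):
        # Literals become bound parameters rather than inline SQL text.
        params.append(node.value)
        return "?"
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def translate(expr):
    """Return (where_clause, params) for a Python-style filter string."""
    params = []
    where = to_sql(ast.parse(expr, mode="eval"), params)
    return where, params


# Hypothetical table standing in for a grid of labeled rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grid (label TEXT, score REAL)")
conn.executemany(
    "INSERT INTO grid VALUES (?, ?)",
    [("cat", 0.9), ("dog", 0.4), ("cat", 0.2)],
)

where, params = translate("score > 0.5 and label == 'cat'")
rows = conn.execute(
    f"SELECT label, score FROM grid WHERE {where}", params
).fetchall()
print(where, params, rows)
```

Binding literals as parameters instead of pasting them into the SQL string keeps the generated query safe to execute, and pushing the filter into the database is what allows this style of query to stay fast on large tables.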
Bio: Dr. Blank is Professor Emeritus at Bryn Mawr College and Head of Research at Comet ML. Doug has 30 years of experience in Deep Learning and Robotics, was one of the founders of the area of Developmental Robotics, and is a contributor to the open-source Jupyter Project, a core tool in Data Science. He currently lives in San Francisco, California, with his family and animals.