How to Explore and Analyze Mixed-Media Data Quickly and Easily

Abstract: 

In this talk we will present Kangas, a new open-source project for easily exploring and analyzing data that is mixed with multimedia datatypes such as images, video, and audio. Typically, the Pandas DataFrame is the go-to tool for EDA (Exploratory Data Analysis). However, it has limitations that can make working with large mixed-media datasets painful and time-consuming. In this presentation we will explore three pain points for the data scientist:

1. Processing large mixed-media datasets
2. Creating easy visualizations
3. Using visualizations for exploration and analysis

Kangas builds on a modern technology stack to provide a snappy web-based UI, a backend Python server, and a Python SDK API. The UI is designed specifically for EDA, focusing on the core functionality of grouping, sorting, and filtering multimedia and JSON metadata. Kangas uses a unique Python-based query syntax that is parsed directly into SQL for fast queries. Designed to work alongside Pandas, Kangas can directly import DataFrames, parquet files, and CSV files. Additionally, using the SDK API, you can construct your own DataGrid. These spreadsheet-like objects can be used to store and visualize almost any kind of data. For visualizations, you can customize both which columns of data are shown and how they are visualized, which lets you add your own charts and computations to the DataGrid. Finally, we will demonstrate Kangas on a variety of problems, including dataset exploration and the analysis of a machine learning task.
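
To make the idea of a Python-style filter being parsed into SQL concrete, here is a minimal sketch using only the Python standard library. It is not Kangas's actual parser or query syntax: the function names `python_filter_to_sql` and `_to_sql`, the bare column names, and the small set of supported operators are all assumptions made for illustration.

```python
import ast
import sqlite3

# Hypothetical operator tables; a real query language would support far more.
_COMPARE = {ast.Gt: ">", ast.GtE: ">=", ast.Lt: "<", ast.LtE: "<=",
            ast.Eq: "=", ast.NotEq: "!="}
_BOOL = {ast.And: "AND", ast.Or: "OR"}


def _to_sql(node, params):
    """Recursively turn a small subset of Python expressions into SQL text."""
    if isinstance(node, ast.BoolOp):
        joiner = " " + _BOOL[type(node.op)] + " "
        return "(" + joiner.join(_to_sql(v, params) for v in node.values) + ")"
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        op = _COMPARE.get(type(node.ops[0]))
        if op is None:
            raise ValueError("unsupported comparison operator")
        left = _to_sql(node.left, params)
        right = _to_sql(node.comparators[0], params)
        return left + " " + op + " " + right
    if isinstance(node, ast.Name):
        # Assumption: a bare Python name refers to a column of the same name.
        return '"' + node.id + '"'
    if isinstance(node, ast.Constant):
        # Literals become bound parameters rather than inline SQL text.
        params.append(node.value)
        return "?"
    raise ValueError("unsupported expression: " + type(node).__name__)


def python_filter_to_sql(expression):
    """Return (where_clause, params) for a Python-style filter expression."""
    params = []
    tree = ast.parse(expression, mode="eval")
    return _to_sql(tree.body, params), params


# Usage: filter a tiny in-memory table with a Python-style expression.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grid (label TEXT, score REAL)")
conn.executemany("INSERT INTO grid VALUES (?, ?)",
                 [("cat", 0.9), ("dog", 0.8), ("cat", 0.2)])
where, params = python_filter_to_sql('score > 0.5 and label == "cat"')
rows = conn.execute("SELECT label, score FROM grid WHERE " + where,
                    params).fetchall()
```

Because the filter runs as a SQL `WHERE` clause, the database does the row selection instead of a Python loop over every row, which is the general reason such a translation can keep queries fast on large tables.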

Bio: 

Dr. Blank is Professor Emeritus at Bryn Mawr College and Head of Research at Comet ML. Doug has 30 years of experience in Deep Learning and Robotics, was one of the founders of the area of Developmental Robotics, and is a contributor to the open source Jupyter Project, a core tool in Data Science. He currently lives in San Francisco, California, along with his family and animals.

Open Data Science
