Maximizing Dataset Potential: Challenges, Considerations & Best Practices

Abstract: 

We make data for Machine Learning. Our clients are industry leaders in autonomous vehicles, geospatial analytics, robotics, and tracking technology. They have set their sights on providing driverless and contactless solutions and on evaluating what is happening on and to the world at a global scale. In theory, making data is straightforward: the client gives us data and instructions, and we put the data in a tool and follow the instructions. In reality, it’s a collaboration among many different people. Unexpected twists and turns determine the quality, quantity, accuracy, precision, breadth, depth, and timeliness of the data that will ultimately shape how we live and operate in the next generation. In my talk, I will point out the common pitfalls I’ve identified after working on countless use cases across industries with clients around the world. I hope my insights will guide you in creating the best possible datasets for your unique application.

Bio: 

For the past 4+ years, Soo has been working with Computer Vision and Machine Learning Engineers and Research Scientists across industries to create training datasets. As a Solutions Architect at iMerit, she helps clients by connecting the dots between the technical details of tooling, the design of annotation workflows, and the integration of a remote data labeling team for execution. Previously, Soo served as the Data Operations Manager at a geospatial analytics startup, where she built and scaled a Data Operations team from the ground up, leading a team of 10 analysts.