Budgeting, Building & Scaling Data Labeling Operations

Abstract: 

We make data for Machine Learning. Our clients are industry leaders in autonomous vehicles, geospatial analytics, robotics, and tracking technology. They have set their sights on providing driverless and contactless solutions, and on evaluating what is happening on and to the world at a global scale. Needless to say, if you're teaching an algorithm to drive in different road environments and situations, or scanning the entire world for signs of development or deforestation, you need a lot of data. The debate, therefore, is rarely about whether you need a lot of data, but about how much data you need and how much it will cost. ML/AI and data labeling companies alike have to answer these questions. In my talk, I will offer some perspective on how to determine how much data you need. I will also share a blueprint for estimating the cost and building a calculator that helps you evaluate your costs and budget in a data-driven way.

Bio: 

For the last 4+ years, Soo has worked with Computer Vision and Machine Learning Engineers and Research Scientists across industries to create training datasets. As a Solutions Architect at iMerit, she helps clients by connecting the dots between the technical details of tooling, the design of annotation workflows, and the integration of a remote data labeling team to execute the work. Previously, Soo served as the Data Operations Manager at a geospatial analytics startup, where she built and scaled a Data Operations team from the ground up, leading a team of 10 analysts.