Data-centric AI: The Winning Combination of Human-in-the-Loop Real-World Data and Synthetic Data for Edge Cases


This case study shows how to solve a real-world computer vision problem efficiently by combining labelled real-world data with synthetic data, drawing on the strengths of each data type. It considers best practices for combining the datasets and showcases the benefits of a platform approach, using Appen's platform for real-world sourcing and labelling and the Mindtech Chameleon platform to generate the synthetic data.


Romain is one of Appen’s Senior Managers, overseeing and supporting the company's European client base with Appen’s breadth of Data for the AI Lifecycle services (data sourcing, annotation/labelling, and model evaluation). Romain came from the localization industry, where he saw firsthand the advancements and impact of ML/AI through Machine Translation/Transcription, ASR, and TTS offerings, and he saw a move into the world of AI/ML as the next logical step. He is passionate about ethically sourced, high-quality labelled data that powers Machine Learning/AI programmes for good.

Open Data Science




