When Privacy Meets AI – Your Kick-Start Guide to Machine Learning with Synthetic Data


Over 80% of all AI projects fail, and a large share of projects never even get started because of privacy constraints. Against this backdrop, Gartner predicts that by 2024, 60% of all machine learning training data will be synthetic. High time to kick-start your synthetic data (SD) journey! Join Alexandra for a hands-on tutorial on synthetic data fundamentals: learn how to create synthetic data you can trust, assess its quality, and use it for privacy-preserving ML training. As a bonus, we’ll look into boosting your ML performance with smart upsampling.

Session Outline:

1. What is AI-generated synthetic data? [Objective: Gaining a basic understanding of what synthetic data is and is not]
2. How do you create realistic and representative SD? [Objective: Learning about the different ways to generate SD and their respective pros and cons]
3. The best way to assess synthetic data quality: Train an ML model on it [Objective: Building trust in synthetic data's feasibility for AI training]
4. Bonus exercise: Smart upsampling with synthetic data [Objective: Understanding that synthetic data can be valuable beyond privacy, e.g. by smartly upsampling rare events to help increase the predictive performance of downstream models]
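Outline item 3 (assessing synthetic data quality by training an ML model on it) is commonly called "train on synthetic, test on real" (TSTR). The following is a minimal stdlib-only sketch of that idea, under loudly stated assumptions: the toy dataset, the crude per-class Gaussian generator, and the nearest-centroid classifier are all hypothetical stand-ins chosen for illustration. They are not the tutorial's code and not MOSTLY AI's generator or API.

```python
import random
import statistics

# All helpers below are hypothetical illustrations, not the tutorial's code
# or any MOSTLY AI API.

def make_real_data(n, rng):
    """Toy stand-in for a real dataset: two numeric features, binary label."""
    rows = []
    for _ in range(n):
        label = 1 if rng.random() < 0.3 else 0  # ~30% minority class
        center = 2.0 if label else 0.0
        rows.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return rows

def fit_naive_generator(rows):
    """Crude generator: per-class label share plus per-feature mean/stdev.

    Assumes features are independent Gaussians within each class, which a
    real synthetic data generator would not assume.
    """
    params = {}
    for label in sorted({y for _, y in rows}):
        xs = [x for x, y in rows if y == label]
        cols = list(zip(*xs))  # transpose rows into feature columns
        params[label] = {
            "share": len(xs) / len(rows),
            "mu": [statistics.fmean(c) for c in cols],
            "sd": [statistics.pstdev(c) for c in cols],
        }
    return params

def sample_synthetic(params, n, rng):
    """Draw n synthetic rows from the fitted parameters; no real row is copied."""
    labels = sorted(params)
    weights = [params[label]["share"] for label in labels]
    rows = []
    for _ in range(n):
        label = rng.choices(labels, weights=weights)[0]
        p = params[label]
        rows.append(([rng.gauss(m, s) for m, s in zip(p["mu"], p["sd"])], label))
    return rows

def fit_centroid_classifier(rows):
    """Minimal downstream model: one mean vector (centroid) per class."""
    centroids = {}
    for label in sorted({y for _, y in rows}):
        xs = [x for x, y in rows if y == label]
        centroids[label] = [statistics.fmean(c) for c in zip(*xs)]
    return centroids

def accuracy(centroids, rows):
    """Share of rows whose nearest centroid (squared Euclidean) has the true label."""
    def predict(x):
        return min(centroids,
                   key=lambda lbl: sum((a - b) ** 2 for a, b in zip(x, centroids[lbl])))
    return sum(predict(x) == y for x, y in rows) / len(rows)

rng = random.Random(0)
real = make_real_data(2000, rng)
train_real, holdout = real[:1500], real[1500:]  # holdout is never shown to the generator

generator = fit_naive_generator(train_real)
synthetic = sample_synthetic(generator, 1500, rng)

# Baseline: train on real, test on real (TRTR) vs. train on synthetic, test on real (TSTR)
trtr = accuracy(fit_centroid_classifier(train_real), holdout)
tstr = accuracy(fit_centroid_classifier(synthetic), holdout)
print(f"TRTR accuracy: {trtr:.3f}  TSTR accuracy: {tstr:.3f}")
```

A TSTR score close to the TRTR baseline suggests the synthetic data preserves the patterns the downstream model relies on, while a large gap points to lost utility. Note that this check measures utility only; it says nothing about privacy protection.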

Background Knowledge:

Basic data pre-processing skills, basic R/Python skills


Alexandra Ebert is a Responsible AI, synthetic data & privacy expert and serves as Chief Trust Officer at MOSTLY AI. As a member of the company's senior leadership team, she is engaged in public policy issues in the emerging fields of synthetic data and Ethical AI and is responsible for engaging with the privacy community, regulators, the media, and customers. She regularly speaks at international conferences on AI, privacy, and digital banking and hosts The Data Democratization Podcast, where she discusses emerging digital policy trends as well as Responsible AI and privacy best practices with regulators, policy experts, and senior executives.

Apart from her work at MOSTLY AI, she serves as the chair of the IEEE Synthetic Data IC expert group and was pleased to be invited to join the group of AI experts for the #humanAIze initiative, which aims to make AI more inclusive and accessible to everyone.

Before joining the company, she researched the GDPR's impact on the deployment of artificial intelligence in Europe and its economic, societal, and technological consequences. Besides being an advocate for privacy protection, Alexandra is deeply passionate about Ethical AI and ensuring the fair and responsible use of machine learning algorithms. She is the co-author of an ICLR paper and of a popular blog series on fairness in AI and fair synthetic data, which was featured in Forbes, IEEE Spectrum, and by distinguished AI expert Andrew Ng.

Open Data Science
One Broadway
Cambridge, MA 02142