Abstract: Despite the laudable pursuit of making all data open, many data custodians handle data containing confidential information that they may not be allowed to release or share. In such cases, it may be feasible to create a synthetic dataset that mimics important features of the real data while preserving privacy. Surprisingly, although many techniques are available for synthetic data generation, few are suitable for tabular data. This presentation gives an overview of a modern class of techniques that are suitable for generating synthetic tabular data. We will focus in particular on deep generative models, which can fit complex probability distributions and efficiently generate samples from which synthetic datasets are constructed. We will discuss a variety of deep generative models, how their output can ensure privacy, and how to assess the fidelity of the synthetic samples.
Acknowledgements: The Centre for Data Science at QUT, Australia, and collaborators Dr Robert Salomone and Conor Hassan.
Bio: Kerrie Mengersen is a Distinguished Professor of Statistics and Director of the Centre for Data Science at QUT. Her career in statistical consulting and academic research has taken her across three states of Australia, the USA and France. Kerrie is a Fellow of the Australian Academy of Science, the Australian Academy of Social Sciences, and the Queensland Academy of the Arts and Sciences. Her overall ambition is to 'use data better', particularly in the fields of health, environment and industry. To this end, she has led over 30 major projects, such as the current Long-term Benefits and Impacts Study with Queen's Wharf Brisbane, the online interactive Australian Cancer Atlas, and the Virtual Reef Diver program.