Generating Realistic Data While Preserving Privacy


Individual-level data is very valuable to researchers, as it allows for arbitrary analyses. However, the value of individual-level data must be balanced against the privacy concerns of individuals who appear in data sets.

Traditional approaches to protecting privacy, such as reporting summary statistics or adding noise directly to the data, can throw away more information than necessary and may not even protect privacy. In this talk, I discuss an alternative approach: generating synthetic data sets that preserve relevant statistical properties of input data while also providing privacy guarantees for individuals who appear in the data.

I begin with a high-level introduction to differential privacy, a mathematical framework for analyzing and proving privacy guarantees. I introduce the notion of generative adversarial networks (GANs), which are used to create the synthetic data, and I illustrate how GANs are trained with a toy example using handwritten digits. I explain a small modification to the procedure for training GANs that ensures differential privacy.
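One common way to make gradient-based training differentially private is to bound each example's influence on an update by clipping its gradient, then add calibrated Gaussian noise before averaging. The sketch below illustrates that general clip-and-noise idea in plain Python. It is not necessarily the exact modification presented in the talk, and the function names, `max_norm`, and `noise_multiplier` are illustrative assumptions rather than part of the DPWGAN code.

```python
import math
import random


def clip_by_l2_norm(grad, max_norm):
    """Scale a gradient vector down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average.

    Clipping bounds how much any single record can move the update;
    the noise scale is tied to that bound (sigma = noise_multiplier * max_norm).
    """
    clipped = [clip_by_l2_norm(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]


# Hypothetical per-example gradients for a model with two parameters.
rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
update = private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng)
```

In a GAN, a step like this would typically be applied to the network that sees the real data. A larger `noise_multiplier` gives a stronger privacy guarantee at the cost of noisier updates.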

I offer an implementation of a differentially private generative adversarial network (DPWGAN) using PyTorch. To illustrate how this method works, I apply the DPWGAN to ACS PUMS data, a collection of individual-level records from the U.S. Census Bureau. I show that the DPWGAN models correlations in the data, and that cross tabs on the synthetic data are close to those in the original data. The PyTorch code is available and open source.
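One simple way to check whether cross tabs on synthetic data are close to those on the original data is to compare the share of rows in each cell. The sketch below is a minimal illustration using toy rows. The column names and values are hypothetical, not actual ACS PUMS fields, and this is not the evaluation code from the talk.

```python
from collections import Counter


def crosstab_shares(rows, col_a, col_b):
    """Return the share of rows falling in each (col_a, col_b) cell."""
    counts = Counter((row[col_a], row[col_b]) for row in rows)
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def max_cell_gap(real_rows, synthetic_rows, col_a, col_b):
    """Largest absolute difference in cell shares between the two data sets."""
    real = crosstab_shares(real_rows, col_a, col_b)
    synth = crosstab_shares(synthetic_rows, col_a, col_b)
    cells = set(real) | set(synth)  # include cells missing from either side
    return max(abs(real.get(c, 0.0) - synth.get(c, 0.0)) for c in cells)


# Toy rows with hypothetical columns "region" and "employed".
real_rows = [
    {"region": "A", "employed": "yes"},
    {"region": "A", "employed": "yes"},
    {"region": "B", "employed": "no"},
    {"region": "B", "employed": "yes"},
]
synthetic_rows = [
    {"region": "A", "employed": "yes"},
    {"region": "B", "employed": "no"},
    {"region": "B", "employed": "no"},
    {"region": "B", "employed": "yes"},
]
gap = max_cell_gap(real_rows, synthetic_rows, "region", "employed")
```

A gap near zero means the two cross tabs agree on every cell. A real comparison would use far more rows and would likely look at several pairs of columns.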

Participants will come away with an understanding of the basics of differential privacy and generative adversarial networks, as well as the ability to apply the DPWGAN code to their own data sets.


Joshua Falk is a data scientist at Civis Analytics, where he focuses on causal inference, improving survey methodology and infrastructure, and making pipelines more robust. Prior to Civis, he studied linguistics and statistics at the University of Chicago. Joshua also contributes data analysis and visualizations to the South Side Weekly.

Open Data Science




One Broadway
Cambridge, MA 02142
