Analyzing Sensitive Data Using Differential Privacy


The privacy risks of analyzing and sharing sensitive data about individuals have never been more apparent: increasingly sophisticated attacks show that private information can leak even from aggregate statistics or trained models.
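To see how exact aggregates can leak, consider a differencing attack, a simpler relative of the reconstruction and membership inference attacks discussed in this workshop. This minimal sketch assumes an attacker can obtain two exact counts that differ only in whether one target person is included. The table, names, and `count_with_condition` helper are invented for illustration.

```python
# Hypothetical toy data: (name, has_condition). Invented for illustration only.
records = [
    ("alice", True),
    ("bob", False),
    ("carol", True),
    ("dave", False),
]


def count_with_condition(rows):
    """Exact, non-private aggregate: number of rows with the condition."""
    return sum(1 for _, has_condition in rows if has_condition)


# Two published exact counts: everyone, and everyone except the target.
target = "carol"
total = count_with_condition(records)
without_target = count_with_condition([r for r in records if r[0] != target])

# The difference of the two counts is exactly the target's private bit.
target_has_condition = total - without_target == 1
print(target_has_condition)  # True
```

Neither count names anyone, yet together they reveal the target's value. Adding calibrated noise to each released count blurs that difference so it no longer reliably reveals any one person's data.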

Differential privacy addresses these risks with a rigorous mathematical definition of privacy whose guarantees can be formally proven. It has broad applications, from analytics to machine learning to synthetic data. By adding carefully calibrated noise to statistical calculations, it is possible to provide strong privacy while still obtaining statistically accurate results.
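The calibrated-noise idea can be sketched with the Laplace mechanism for a counting query. This is a minimal illustration that assumes neighboring datasets differ by adding or removing one person. `laplace_noise` and `private_count` are hypothetical helpers, and the naive floating-point sampler is not hardened against known floating-point attacks, which production libraries guard against.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(rows, predicate, epsilon: float, rng: random.Random) -> float:
    """Hypothetical epsilon-DP count via the Laplace mechanism.

    Adding or removing one row changes a count by at most 1 (sensitivity 1),
    so Laplace noise with scale 1 / epsilon gives epsilon-differential privacy.
    Not hardened against floating-point attacks; illustration only.
    """
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)
ages = [23, 35, 41, 52, 67, 29, 44]  # synthetic example data
print(private_count(ages, lambda a: a >= 40, epsilon=1.0, rng=rng))
```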

In this workshop, we introduce differential privacy and show how to perform SQL-like analytics using the Tumult Labs differential privacy platform. Tumult Analytics is used by a variety of organizations, including the US Census Bureau, the US Internal Revenue Service, and Wikimedia, to publicly share aggregate statistics about populations of interest. Tumult Analytics will be made open source this year and is available now to trial users.

Session Outline:
Module 1: Introduction to differential privacy (30 minutes)

In the first module, we motivate the need for strong privacy protection by looking at sophisticated attacks (reconstruction and membership inference attacks) that render conventional privacy protection methods ineffective. We then present differential privacy, describing what it means, where it can be used and how it protects against attacks that are both known and unknown. We provide a brief overview of existing open-source differential privacy tools.

Module 2: Analyzing data under differential privacy (30 minutes)

Next, we use examples to show the key ideas behind accurately analyzing data under differential privacy. Key concepts we will cover include:

The concept of a privacy loss budget and privacy loss accounting

The interplay between noise and privacy loss

How queries must be modified to ensure differential privacy (and therefore reduce the risk of catastrophic privacy loss)
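A minimal sketch of privacy loss accounting and of the interplay between noise and privacy loss, assuming pure ε-differential privacy, basic sequential composition, and Laplace noise. The `PrivacyBudget` class and `expected_abs_error` helper are hypothetical illustrations, not the Tumult Analytics API.

```python
class PrivacyBudget:
    """Toy accountant for pure epsilon-DP using basic sequential composition.

    Queries answered with epsilons e1, e2, ... on the same data together
    cost at most e1 + e2 + ..., so each one is charged against a fixed total.
    """

    def __init__(self, total_epsilon: float):
        self.total = total_epsilon
        self.spent = 0.0

    def remaining(self) -> float:
        return self.total - self.spent

    def charge(self, epsilon: float) -> None:
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so floating-point rounding does not reject a legal query.
        if self.spent + epsilon > self.total + 1e-9:
            raise RuntimeError("privacy loss budget exhausted")
        self.spent += epsilon


def expected_abs_error(sensitivity: float, epsilon: float) -> float:
    """Mean absolute error of Laplace noise with scale sensitivity / epsilon."""
    return sensitivity / epsilon


budget = PrivacyBudget(total_epsilon=1.0)
# Split the budget evenly across four counting queries (sensitivity 1 each).
for _ in range(4):
    budget.charge(0.25)
print(budget.remaining())             # 0.0
print(expected_abs_error(1.0, 0.25))  # 4.0, versus 1.0 for one query at epsilon = 1
```

Spreading a fixed budget over more queries lowers each query's ε and so raises its noise. Basic composition is the simplest accounting rule; tighter accountants exist for other privacy definitions, so a real platform may charge queries differently.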

The focus will be on supporting workflows that involve statistical queries (counts, averages, medians) on structured tabular data, possibly transformed through operations such as group by, filter, join, map, etc. Examples will use the Tumult Analytics programming platform, but the ideas are applicable to any differentially private computation.
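As a sketch of how a statistical query must be modified, the hypothetical function below computes a per-group average under ε-differential privacy, assuming neighboring datasets differ by adding or removing one row. It clamps each value to assumed bounds `[lower, upper]`, fixes the group keys in advance, and spends half the budget on a noisy sum and half on a noisy count. This is one common textbook approach, not necessarily how Tumult Analytics computes averages.

```python
import random
from collections import defaultdict


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) noise as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_mean_by_group(rows, keys, lower, upper, epsilon, rng):
    """Hypothetical sketch: epsilon-DP average of `value` for each group key.

    rows:  iterable of (group, value) pairs.
    keys:  public list of groups, fixed before looking at the data, so the set
           of released groups cannot reveal who is in the dataset.
    lower, upper: assumed clamping bounds that limit one row's influence.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for group, value in rows:
        if group in keys:  # rows outside the public key set are dropped
            sums[group] += min(max(value, lower), upper)
            counts[group] += 1

    sum_sensitivity = max(abs(lower), abs(upper))  # one row's largest effect on a sum
    half_eps = epsilon / 2  # half for the sum, half for the count
    result = {}
    for group in keys:  # every key is released, even with zero rows
        noisy_sum = sums[group] + laplace_noise(sum_sensitivity / half_eps, rng)
        noisy_count = counts[group] + laplace_noise(1.0 / half_eps, rng)
        mean = noisy_sum / max(noisy_count, 1.0)
        result[group] = min(max(mean, lower), upper)  # post-processing costs no budget
    return result


# Synthetic example data; the 1e9 outlier is clamped to 100.
rng = random.Random(7)
rows = [("a", 10.0)] * 10_000 + [("b", 50.0)] * 10_000 + [("a", 1e9)]
print(private_mean_by_group(rows, ["a", "b", "c"], 0.0, 100.0, epsilon=1.0, rng=rng))
```

Because each row falls into exactly one group, releasing every group's average together still costs ε overall (parallel composition). The clamping bounds trade bias for noise: wider bounds clip fewer values but force more noise onto the sum.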

Module 3: Differentially private data release using Tumult Analytics

In the final module, we do a deeper dive into an example application where a data custodian wishes to release aggregate statistics with high accuracy and strong privacy guarantees. We will use the Tumult Analytics programming platform to build an end-to-end solution.


Ashwin Machanavajjhala is an Assistant Professor in the Department of Computer Science at Duke University and an Associate Director at the Information Initiative@Duke (iiD). Previously, he was a Senior Research Scientist in the Knowledge Management group at Yahoo! Research. His primary research interests lie in algorithms for ensuring privacy in statistical databases and augmented reality applications. He received the National Science Foundation Faculty Early Career Development (CAREER) Award in 2013 and an Honorable Mention for the 2008 ACM SIGMOD Jim Gray Dissertation Award. Ashwin received a Ph.D. from the Department of Computer Science at Cornell University and a B.Tech. in Computer Science and Engineering from the Indian Institute of Technology, Madras.

Open Data Science