Weizmann Logo
ECCC
Electronic Colloquium on Computational Complexity

Under the auspices of the Computational Complexity Foundation (CCF)

Login | Register | Classic Style



REPORTS > DETAIL:

Revision(s):

Revision #4 to TR10-017 | 13th July 2014 03:00

PCPs and the Hardness of Generating Private Synthetic Data

RSS-Feed




Revision #4
Authors: Jonathan Ullman, Salil Vadhan
Accepted on: 13th July 2014 03:00
Downloads: 2643
Keywords: 


Abstract:

Assuming the existence of one-way functions, we show that there is no
polynomial-time, differentially private algorithm $\san$ that takes a database $D\in (\{0,1\}^d)^n$ and outputs a ``synthetic database'' $\hat{D}$ all of whose two-way marginals are approximately equal to those of $D$. (A two-way marginal is the fraction of database rows $x\in (\{0,1\})^d$ with a given pair of values in a given pair of columns.) This answers a question of Barak et al. (PODS `07), who gave an algorithm running in time $\mathrm{poly}(n,2^d)$.

Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC `09) with encodings based on the PCP Theorem.

We also present both negative and positive results for generating ``relaxed'' synthetic data, where the fraction of rows in $D$ satisfying a predicate $c$ are estimated by applying $c$ to each row of $\hat{D}$ and aggregating the results in some way.


Revision #3 to TR10-017 | 13th July 2014 02:59

PCPs and the Hardness of Generating Synthetic Data





Revision #3
Authors: Jonathan Ullman, Salil Vadhan
Accepted on: 13th July 2014 02:59
Downloads: 2942
Keywords: 


Abstract:

Assuming the existence of one-way functions, we show that there is no
polynomial-time, differentially private algorithm $\san$ that takes a database $D\in (\{0,1\}^d)^n$ and outputs a ``synthetic database'' $\hat{D}$ all of whose two-way marginals are approximately equal to those of $D$. (A two-way marginal is the fraction of database rows $x\in (\{0,1\})^d$ with a given pair of values in a given pair of columns.) This answers a question of Barak et al. (PODS `07), who gave an algorithm running in time $\mathrm{poly}(n,2^d)$.

Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC `09) with encodings based on the PCP Theorem.

We also present both negative and positive results for generating ``relaxed'' synthetic data, where the fraction of rows in $D$ satisfying a predicate $c$ are estimated by applying $c$ to each row of $\hat{D}$ and aggregating the results in some way.


Revision #2 to TR10-017 | 7th January 2011 02:17

PCPs and the Hardness of Generating Private Synthetic Data


Abstract:

Assuming the existence of one-way functions, we show that there is no
polynomial-time, differentially private algorithm $A$ that takes a database
$D\in (\{0,1\}^d)^n$ and outputs a ``synthetic database'' $\hat{D}$ all of whose two-way
marginals are approximately equal to those of $D$. (A two-way marginal is the fraction
of database rows $x\in \{0,1\}^d$ with a given pair of values in a given pair of columns.)
This answers a question of Barak et al. (PODS `07), who gave an algorithm running in time
$\poly(n,2^d)$.

Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC `09) with PCP-based Levin-reductions from NP search problems to finding approximate solutions to CSPs.


Revision #1 to TR10-017 | 6th November 2010 04:43

PCPs and the Hardness of Generating Synthetic Data


Abstract:

Assuming the existence of one-way functions, we show that there is no
polynomial-time, differentially private algorithm $A$ that takes a database
$D\in (\{0,1\}^d)^n$ and outputs a ``synthetic database'' $\hat{D}$ all of whose two-way
marginals are approximately equal to those of $D$. (A two-way marginal is the fraction
of database rows $x\in \{0,1\}^d$ with a given pair of values in a given pair of columns.)
This answers a question of Barak et al. (PODS `07), who gave an algorithm running in time
$poly(n,2^d)$.

Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC `09) with encodings based on the PCP Theorem.

We also present both negative and positive results for generating ``relaxed'' synthetic data, where the fraction of rows in $D$ satisfying a predicate $c$ are estimated by applying $c$ to each row of $\hat{D}$ and aggregating the results in some way.


Paper:

TR10-017 | 10th February 2010 21:08

PCPs and the Hardness of Generating Synthetic Data


Abstract:

Assuming the existence of one-way functions, we show that there is no
polynomial-time, differentially private algorithm $A$ that takes a database
$D\in (\{0,1\}^d)^n$ and outputs a ``synthetic database'' $\hat{D}$ all of whose two-way
marginals are approximately equal to those of $D$. (A two-way marginal is the fraction
of database rows $x\in \{0,1\}^d$ with a given pair of values in a given pair of columns.)
This answers a question of Barak et al. (PODS `07), who gave an algorithm running in time
$\poly(n,2^d)$.

Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC `09) with PCP-based Levin-reductions from NP search problems to finding approximate solutions to CSPs.



ISSN 1433-8092 | Imprint