Under the auspices of the Computational Complexity Foundation (CCF)

REPORTS > DETAIL:

Revision(s):

Revision #4 to TR10-017 | 13th July 2014 03:00

PCPs and the Hardness of Generating Private Synthetic Data

Revision #4
Accepted on: 13th July 2014 03:00
Keywords:

Abstract:

Assuming the existence of one-way functions, we show that there is no
polynomial-time, differentially private algorithm $\san$ that takes a database $D\in (\{0,1\}^d)^n$ and outputs a synthetic database'' $\hat{D}$ all of whose two-way marginals are approximately equal to those of $D$. (A two-way marginal is the fraction of database rows $x\in (\{0,1\})^d$ with a given pair of values in a given pair of columns.) This answers a question of Barak et al. (PODS 07), who gave an algorithm running in time $\mathrm{poly}(n,2^d)$.

Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC 09) with encodings based on the PCP Theorem.

We also present both negative and positive results for generating relaxed'' synthetic data, where the fraction of rows in $D$ satisfying a predicate $c$ are estimated by applying $c$ to each row of $\hat{D}$ and aggregating the results in some way.

Revision #3 to TR10-017 | 13th July 2014 02:59

PCPs and the Hardness of Generating Synthetic Data

Revision #3
Accepted on: 13th July 2014 02:59
Keywords:

Abstract:

Assuming the existence of one-way functions, we show that there is no
polynomial-time, differentially private algorithm $\san$ that takes a database $D\in (\{0,1\}^d)^n$ and outputs a synthetic database'' $\hat{D}$ all of whose two-way marginals are approximately equal to those of $D$. (A two-way marginal is the fraction of database rows $x\in (\{0,1\})^d$ with a given pair of values in a given pair of columns.) This answers a question of Barak et al. (PODS 07), who gave an algorithm running in time $\mathrm{poly}(n,2^d)$.

Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC 09) with encodings based on the PCP Theorem.

We also present both negative and positive results for generating relaxed'' synthetic data, where the fraction of rows in $D$ satisfying a predicate $c$ are estimated by applying $c$ to each row of $\hat{D}$ and aggregating the results in some way.

Revision #2 to TR10-017 | 7th January 2011 02:17

PCPs and the Hardness of Generating Private Synthetic Data

Revision #2
Accepted on: 7th January 2011 02:17
Keywords:

Abstract:

Assuming the existence of one-way functions, we show that there is no
polynomial-time, differentially private algorithm $A$ that takes a database
$D\in (\{0,1\}^d)^n$ and outputs a synthetic database'' $\hat{D}$ all of whose two-way
marginals are approximately equal to those of $D$. (A two-way marginal is the fraction
of database rows $x\in \{0,1\}^d$ with a given pair of values in a given pair of columns.)
This answers a question of Barak et al. (PODS 07), who gave an algorithm running in time
$\poly(n,2^d)$.

Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC 09) with PCP-based Levin-reductions from NP search problems to finding approximate solutions to CSPs.

Revision #1 to TR10-017 | 6th November 2010 04:43

PCPs and the Hardness of Generating Synthetic Data

Revision #1
Accepted on: 6th November 2010 04:43
Keywords:

Abstract:

Assuming the existence of one-way functions, we show that there is no
polynomial-time, differentially private algorithm $A$ that takes a database
$D\in (\{0,1\}^d)^n$ and outputs a synthetic database'' $\hat{D}$ all of whose two-way
marginals are approximately equal to those of $D$. (A two-way marginal is the fraction
of database rows $x\in \{0,1\}^d$ with a given pair of values in a given pair of columns.)
This answers a question of Barak et al. (PODS 07), who gave an algorithm running in time
$poly(n,2^d)$.

Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC 09) with encodings based on the PCP Theorem.

We also present both negative and positive results for generating relaxed'' synthetic data, where the fraction of rows in $D$ satisfying a predicate $c$ are estimated by applying $c$ to each row of $\hat{D}$ and aggregating the results in some way.

Paper:

TR10-017 | 10th February 2010 21:08

PCPs and the Hardness of Generating Synthetic Data

TR10-017
Publication: 11th February 2010 10:59
Keywords:

Abstract:

Assuming the existence of one-way functions, we show that there is no
polynomial-time, differentially private algorithm $A$ that takes a database
$D\in (\{0,1\}^d)^n$ and outputs a synthetic database'' $\hat{D}$ all of whose two-way
marginals are approximately equal to those of $D$. (A two-way marginal is the fraction
of database rows $x\in \{0,1\}^d$ with a given pair of values in a given pair of columns.)
This answers a question of Barak et al. (PODS 07), who gave an algorithm running in time
$\poly(n,2^d)$.

Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC 09) with PCP-based Levin-reductions from NP search problems to finding approximate solutions to CSPs.

ISSN 1433-8092 | Imprint