ECCC-Report TR10-017https://eccc.weizmann.ac.il/report/2010/017Comments and Revisions published for TR10-017en-usSun, 13 Jul 2014 03:00:11 +0300
Revision 4
| PCPs and the Hardness of Generating Private Synthetic Data |
Jonathan Ullman,
Salil Vadhan
https://eccc.weizmann.ac.il/report/2010/017#revision4Assuming the existence of one-way functions, we show that there is no
polynomial-time, differentially private algorithm $\san$ that takes a database $D\in (\{0,1\}^d)^n$ and outputs a ``synthetic database'' $\hat{D}$ all of whose two-way marginals are approximately equal to those of $D$. (A two-way marginal is the fraction of database rows $x\in (\{0,1\})^d$ with a given pair of values in a given pair of columns.) This answers a question of Barak et al. (PODS `07), who gave an algorithm running in time $\mathrm{poly}(n,2^d)$.
Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC `09) with encodings based on the PCP Theorem.
We also present both negative and positive results for generating ``relaxed'' synthetic data, where the fraction of rows in $D$ satisfying a predicate $c$ are estimated by applying $c$ to each row of $\hat{D}$ and aggregating the results in some way.Sun, 13 Jul 2014 03:00:11 +0300https://eccc.weizmann.ac.il/report/2010/017#revision4
Revision 3
| PCPs and the Hardness of Generating Synthetic Data |
Jonathan Ullman,
Salil Vadhan
https://eccc.weizmann.ac.il/report/2010/017#revision3Assuming the existence of one-way functions, we show that there is no
polynomial-time, differentially private algorithm $\san$ that takes a database $D\in (\{0,1\}^d)^n$ and outputs a ``synthetic database'' $\hat{D}$ all of whose two-way marginals are approximately equal to those of $D$. (A two-way marginal is the fraction of database rows $x\in (\{0,1\})^d$ with a given pair of values in a given pair of columns.) This answers a question of Barak et al. (PODS `07), who gave an algorithm running in time $\mathrm{poly}(n,2^d)$.
Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC `09) with encodings based on the PCP Theorem.
We also present both negative and positive results for generating ``relaxed'' synthetic data, where the fraction of rows in $D$ satisfying a predicate $c$ are estimated by applying $c$ to each row of $\hat{D}$ and aggregating the results in some way.Sun, 13 Jul 2014 02:59:54 +0300https://eccc.weizmann.ac.il/report/2010/017#revision3
Revision 2
| PCPs and the Hardness of Generating Private Synthetic Data |
Jonathan Ullman,
Salil Vadhan
https://eccc.weizmann.ac.il/report/2010/017#revision2Assuming the existence of one-way functions, we show that there is no
polynomial-time, differentially private algorithm $A$ that takes a database
$D\in (\{0,1\}^d)^n$ and outputs a ``synthetic database'' $\hat{D}$ all of whose two-way
marginals are approximately equal to those of $D$. (A two-way marginal is the fraction
of database rows $x\in \{0,1\}^d$ with a given pair of values in a given pair of columns.)
This answers a question of Barak et al. (PODS `07), who gave an algorithm running in time
$\poly(n,2^d)$.
Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC `09) with PCP-based Levin-reductions from NP search problems to finding approximate solutions to CSPs.Fri, 07 Jan 2011 02:17:45 +0200https://eccc.weizmann.ac.il/report/2010/017#revision2
Revision 1
| PCPs and the Hardness of Generating Synthetic Data |
Jonathan Ullman,
Salil Vadhan
https://eccc.weizmann.ac.il/report/2010/017#revision1Assuming the existence of one-way functions, we show that there is no
polynomial-time, differentially private algorithm $A$ that takes a database
$D\in (\{0,1\}^d)^n$ and outputs a ``synthetic database'' $\hat{D}$ all of whose two-way
marginals are approximately equal to those of $D$. (A two-way marginal is the fraction
of database rows $x\in \{0,1\}^d$ with a given pair of values in a given pair of columns.)
This answers a question of Barak et al. (PODS `07), who gave an algorithm running in time
$poly(n,2^d)$.
Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC `09) with encodings based on the PCP Theorem.
We also present both negative and positive results for generating ``relaxed'' synthetic data, where the fraction of rows in $D$ satisfying a predicate $c$ are estimated by applying $c$ to each row of $\hat{D}$ and aggregating the results in some way. Sat, 06 Nov 2010 04:43:12 +0200https://eccc.weizmann.ac.il/report/2010/017#revision1
Paper TR10-017
| PCPs and the Hardness of Generating Synthetic Data |
Jonathan Ullman,
Salil Vadhan
https://eccc.weizmann.ac.il/report/2010/017Assuming the existence of one-way functions, we show that there is no
polynomial-time, differentially private algorithm $A$ that takes a database
$D\in (\{0,1\}^d)^n$ and outputs a ``synthetic database'' $\hat{D}$ all of whose two-way
marginals are approximately equal to those of $D$. (A two-way marginal is the fraction
of database rows $x\in \{0,1\}^d$ with a given pair of values in a given pair of columns.)
This answers a question of Barak et al. (PODS `07), who gave an algorithm running in time
$\poly(n,2^d)$.
Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC `09) with PCP-based Levin-reductions from NP search problems to finding approximate solutions to CSPs.Thu, 11 Feb 2010 10:59:56 +0200https://eccc.weizmann.ac.il/report/2010/017