Assuming the existence of one-way functions, we show that there is no
polynomial-time, differentially private algorithm $\san$ that takes a database $D\in (\{0,1\}^d)^n$ and outputs a ``synthetic database'' $\hat{D}$ all of whose two-way marginals are approximately equal to those of $D$. (A two-way marginal is the fraction of database rows $x\in (\{0,1\})^d$ with a given pair of values in a given pair of columns.) This answers a question of Barak et al. (PODS `07), who gave an algorithm running in time $\mathrm{poly}(n,2^d)$.
Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC `09) with encodings based on the PCP Theorem.
We also present both negative and positive results for generating ``relaxed'' synthetic data, where the fraction of rows in $D$ satisfying a predicate $c$ are estimated by applying $c$ to each row of $\hat{D}$ and aggregating the results in some way.
Assuming the existence of one-way functions, we show that there is no
polynomial-time, differentially private algorithm $\san$ that takes a database $D\in (\{0,1\}^d)^n$ and outputs a ``synthetic database'' $\hat{D}$ all of whose two-way marginals are approximately equal to those of $D$. (A two-way marginal is the fraction of database rows $x\in (\{0,1\})^d$ with a given pair of values in a given pair of columns.) This answers a question of Barak et al. (PODS `07), who gave an algorithm running in time $\mathrm{poly}(n,2^d)$.
Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC `09) with encodings based on the PCP Theorem.
We also present both negative and positive results for generating ``relaxed'' synthetic data, where the fraction of rows in $D$ satisfying a predicate $c$ are estimated by applying $c$ to each row of $\hat{D}$ and aggregating the results in some way.
Assuming the existence of one-way functions, we show that there is no
polynomial-time, differentially private algorithm $A$ that takes a database
$D\in (\{0,1\}^d)^n$ and outputs a ``synthetic database'' $\hat{D}$ all of whose two-way
marginals are approximately equal to those of $D$. (A two-way marginal is the fraction
of database rows $x\in \{0,1\}^d$ with a given pair of values in a given pair of columns.)
This answers a question of Barak et al. (PODS `07), who gave an algorithm running in time
$\poly(n,2^d)$.
Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC `09) with PCP-based Levin-reductions from NP search problems to finding approximate solutions to CSPs.
Assuming the existence of one-way functions, we show that there is no
polynomial-time, differentially private algorithm $A$ that takes a database
$D\in (\{0,1\}^d)^n$ and outputs a ``synthetic database'' $\hat{D}$ all of whose two-way
marginals are approximately equal to those of $D$. (A two-way marginal is the fraction
of database rows $x\in \{0,1\}^d$ with a given pair of values in a given pair of columns.)
This answers a question of Barak et al. (PODS `07), who gave an algorithm running in time
$poly(n,2^d)$.
Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC `09) with encodings based on the PCP Theorem.
We also present both negative and positive results for generating ``relaxed'' synthetic data, where the fraction of rows in $D$ satisfying a predicate $c$ are estimated by applying $c$ to each row of $\hat{D}$ and aggregating the results in some way.
Assuming the existence of one-way functions, we show that there is no
polynomial-time, differentially private algorithm $A$ that takes a database
$D\in (\{0,1\}^d)^n$ and outputs a ``synthetic database'' $\hat{D}$ all of whose two-way
marginals are approximately equal to those of $D$. (A two-way marginal is the fraction
of database rows $x\in \{0,1\}^d$ with a given pair of values in a given pair of columns.)
This answers a question of Barak et al. (PODS `07), who gave an algorithm running in time
$\poly(n,2^d)$.
Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC `09) with PCP-based Levin-reductions from NP search problems to finding approximate solutions to CSPs.