We consider the problem of identifying a planted assignment given a random $k$-SAT formula consistent with the assignment. This problem exhibits a large algorithmic gap: while the planted solution can always be identified given a formula with $O(n\log n)$ clauses, there are distributions over clauses for which the best known efficient algorithms require $n^{k/2}$ clauses. We propose and study a unified model for planted $k$-SAT, which captures well-known special cases. An instance is described by a planted assignment and a distribution on clauses with $k$ literals. We define its distribution complexity as the smallest $r$ for which the distribution is not $r$-wise independent, i.e., the distribution is $(r-1)$-wise independent but not $r$-wise independent ($1 \leq r \leq k$ for any distribution with a planted assignment).
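The definition of distribution complexity can be made concrete with a short computation. A distribution $Q$ on sign patterns $y \in \{\pm 1\}^k$ is $r$-wise independent exactly when $\mathbb{E}_{y\sim Q}[\prod_{i\in S} y_i] = 0$ for every nonempty $S \subseteq [k]$ with $|S| \leq r$. Since $r$-wise independence therefore implies $(r-1)$-wise independence, the quantity that ranges over $1 \leq r \leq k$ is the smallest $r$ at which some coefficient with $|S| = r$ is nonzero. The Python sketch below computes that value by exhaustive enumeration with exact rational arithmetic. It assumes a convention of our own choosing: $y_i = +1$ means the $i$-th literal is true under the planted assignment, so a planted distribution puts zero mass on $(-1,\dots,-1)$. The names `uniform_over`, `fourier_coefficient` and `distribution_complexity` are hypothetical helpers, not code from the paper.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

# A sign pattern y in {+1,-1}^k. Assumed convention for this sketch:
# y[i] = +1 means the i-th literal of the clause is true under the planted assignment.
Pattern = tuple[int, ...]


def uniform_over(patterns) -> dict[Pattern, Fraction]:
    """Uniform distribution over a collection of sign patterns."""
    patterns = list(patterns)
    return {y: Fraction(1, len(patterns)) for y in patterns}


def fourier_coefficient(Q: dict[Pattern, Fraction], S: tuple[int, ...]) -> Fraction:
    """E_{y ~ Q}[prod_{i in S} y_i]; this is 0 for every nonempty S under the uniform distribution."""
    return sum((p * prod(y[i] for i in S) for y, p in Q.items()), Fraction(0))


def distribution_complexity(Q: dict[Pattern, Fraction], k: int) -> int:
    """Smallest r such that some S with |S| = r has a nonzero coefficient (Q not r-wise independent)."""
    if Q.get((-1,) * k, 0) != 0:
        raise ValueError("a planted distribution gives zero mass to the all-false clause")
    for r in range(1, k + 1):
        if any(fourier_coefficient(Q, S) != 0 for S in combinations(range(k), r)):
            return r
    raise ValueError("Q is k-wise independent (uniform), so it is not a planted distribution")


k = 3
patterns = list(product((1, -1), repeat=k))
sat = uniform_over(y for y in patterns if y != (-1,) * k)  # every satisfying pattern
nae = uniform_over(y for y in patterns if len(set(y)) > 1)  # not-all-equal patterns
xor = uniform_over(y for y in patterns if prod(y) == 1)  # even number of false literals
print(distribution_complexity(sat, k), distribution_complexity(nae, k), distribution_complexity(xor, k))  # 1 2 3
```

On these three toy distributions the sketch returns 1, 2 and 3 respectively. Enumerating all $2^k$ patterns and all subsets $S$ is only meant for small $k$.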
Our main result is an unconditional lower bound, tight up to logarithmic factors, of $\tilde\Omega(n^{r/2})$ clauses for statistical (query) algorithms, matching the known upper bound (which, as we show, can be implemented using a statistical algorithm). Since the known approaches for problems over distributions have statistical analogues (spectral methods, MCMC, gradient-based methods, convex optimization, etc.), this lower bound provides a rigorous explanation of the observed algorithmic gap. The proof introduces a new general technique for the analysis of statistical algorithms. It also points to a geometric pairing phenomenon in the space of all planted assignments.
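As an illustration of what it means for an algorithm to be statistical, consider the easiest case $r = 1$. A statistical algorithm never inspects individual clauses; it only asks for approximate expectations $\mathbb{E}[h(C)]$ of bounded functions $h$ of a random clause $C$. The sketch below is a hypothetical example, not the paper's algorithm: it samples clauses with $Q$ uniform over the $2^k - 1$ sign patterns that satisfy the planted assignment $\sigma$, asks one query per variable, $h_i(C)$, the sum of the signs with which $x_i$ occurs in $C$, and answers each query with the empirical mean over the sample (a stand-in, assumed for illustration, for a statistical oracle with some tolerance). Because $\mathbb{E}[h_i] = \frac{k}{n}\cdot\frac{\sigma_i}{2^k-1}$ under this $Q$, the sign of each answer recovers $\sigma_i$ once the sample is large enough. The general algorithm for larger $r$ is more involved and is not reproduced here; `sample_planted_clause` and `estimate_assignment` are hypothetical names.

```python
import random


def sample_planted_clause(sigma, k, rng):
    """Draw one planted k-SAT clause with Q uniform over the 2^k - 1 satisfying sign patterns.

    A clause is a list of (variable, sign) pairs; a literal is true under sigma iff sign == sigma[variable].
    """
    variables = rng.sample(range(len(sigma)), k)  # k distinct variables, uniformly at random
    while True:  # rejection sampling: discard the pattern in which every literal is false
        y = [rng.choice((1, -1)) for _ in range(k)]
        if 1 in y:
            break
    return [(x, sigma[x] * yi) for x, yi in zip(variables, y)]


def estimate_assignment(clauses, n):
    """Answer each query h_i(C) = sum of signs of x_i's occurrences in C by its empirical mean.

    h_i takes values in {-1, 0, 1}; under this Q its expectation has the sign of sigma_i.
    """
    totals = [0] * n
    for clause in clauses:
        for x, s in clause:
            totals[x] += s
    m = len(clauses)
    return [1 if totals[i] / m > 0 else -1 for i in range(n)]


rng = random.Random(0)
n, k, m = 20, 3, 20000
sigma = [rng.choice((1, -1)) for _ in range(n)]  # hidden planted assignment
clauses = [sample_planted_clause(sigma, k, rng) for _ in range(m)]
recovered = estimate_assignment(clauses, n)
print(recovered == sigma)
```

With $n = 20$ and $k = 3$, each query has expected magnitude $3/140 \approx 0.021$, while the standard error of the empirical mean over 20,000 clauses is about $0.003$, so every sign is recovered with overwhelming probability.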
We describe consequences of our lower bounds for Feige's (2002) refutation hypothesis and for lower bounds on general convex programs that solve planted $k$-SAT. Our bounds also extend to other planted $k$-CSP models and, in particular, provide concrete evidence for the security of Goldreich's (2000) one-way function and the associated pseudorandom generator when used with a sufficiently hard predicate.
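Goldreich's function, mentioned above, makes the link to planted CSPs explicit. It fixes a public hypergraph with ordered $k$-tuples $S_1,\dots,S_m$ of distinct indices in $[n]$ and a predicate $P:\{0,1\}^k\to\{0,1\}$, and maps a secret $x\in\{0,1\}^n$ to $(P(x_{S_1}),\dots,P(x_{S_m}))$. The pair (hypergraph, output bits $b_j$) is then a planted CSP instance in which $x$ satisfies every constraint $P(x_{S_j}) = b_j$. The Python sketch below is illustrative only: the predicate $z_0 \oplus z_1 \oplus z_2 \oplus (z_3 \wedge z_4)$ and the helper names `xor_and`, `random_hypergraph` and `goldreich_f` are our own choices, not taken from the paper.

```python
import random
from typing import Callable, Sequence


def xor_and(z: Sequence[int]) -> int:
    """Illustrative 5-ary predicate (an assumption for this sketch): z0 ^ z1 ^ z2 ^ (z3 & z4)."""
    return z[0] ^ z[1] ^ z[2] ^ (z[3] & z[4])


def random_hypergraph(n: int, m: int, k: int, rng: random.Random) -> list[tuple[int, ...]]:
    """m ordered k-tuples of distinct variable indices, each chosen independently at random."""
    return [tuple(rng.sample(range(n), k)) for _ in range(m)]


def goldreich_f(x: Sequence[int], hyperedges, predicate: Callable[[Sequence[int]], int]) -> list[int]:
    """Output bit j is the predicate applied to the bits of x indexed by the j-th hyperedge."""
    return [predicate([x[i] for i in S]) for S in hyperedges]


print(goldreich_f([0, 0, 1, 1, 1], [(0, 1, 2, 3, 4), (4, 3, 2, 1, 0)], xor_and))  # [0, 1]

rng = random.Random(1)
x = [rng.randint(0, 1) for _ in range(50)]  # secret input (the planted assignment)
edges = random_hypergraph(50, 200, 5, rng)  # public hypergraph
y = goldreich_f(x, edges, xor_and)  # public output: one planted constraint per edge
```

Recovering $x$ from `edges` and `y` is exactly the task of finding the planted assignment of this CSP instance.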
1. Significantly revised the section on lower bounds for convex programs. It now relies on results from [FGV15].
2. Generalized the results for planted CSPs. Added a reduction from planted $k$-SAT to planted $k$-CSP and discussed the relationship to the model in [AM15].
3. Added a discussion section.