We introduce a framework for proving lower bounds on computational problems over distributions, for a class of algorithms called statistical algorithms. For such algorithms, access to the input distribution is limited to obtaining an estimate of the expectation of any given function on a sample drawn randomly from the input distribution, rather than directly accessing samples. Most natural algorithms of interest in theory and in practice, e.g., moments-based methods, local search, standard iterative methods for convex optimization, MCMC and simulated annealing rely only on such estimates and can be viewed as statistical algorithms.
Our framework is inspired by and generalizes the statistical query model in learning theory (Kearns, 1998). Our main application is a nearly optimal lower bound on the complexity of any statistical algorithm for detecting planted bipartite clique distributions (or planted dense subgraph distributions) when the planted clique has size $O(n^{1/2-\delta})$ for any constant $\delta > 0$. Variants of this problem have been assumed to be hard for applications in cryptography and to prove hardness of other problems; our lower bounds provide concrete evidence of hardness.
Major revision
We introduce a framework for proving lower bounds on computational problems over distributions, based on a class of algorithms called statistical algorithms. For such algorithms, access to the input distribution is limited to obtaining an estimate of the expectation of any given function on a sample drawn randomly from the input distribution, rather than directly accessing samples.
Most natural algorithms of interest in theory and in practice, e.g., moments-based methods, local search, standard iterative methods for convex optimization, MCMC and simulated annealing, are statistical algorithms or have statistical counterparts. Our framework is inspired by and generalizes the statistical query model in learning theory [Kearns 1998].
Our main application is a nearly optimal lower bound on the complexity of any statistical algorithm for detecting planted bipartite clique distributions (or planted dense subgraph distributions) when the planted clique has size $O(n^{1/2-\delta})$ for any constant $\delta > 0$. Variants of these problems have been assumed to be hard to prove hardness for other problems and for cryptographic applications. Our lower bounds provide concrete evidence of hardness, thus supporting these assumptions.
This is a major revision of the paper:
1) introduction and overview were substantially revised
2) main lower bounds strengthened
3) new results added
4) a number of minor bugs fixed
We develop a framework for proving lower bounds on computational problems over distributions, including optimization and unsupervised learning. Our framework is based on defining a restricted class of algorithms, called statistical algorithms, that instead of accessing samples from the input distribution can only obtain an estimate of the expectation of any given function on a sample drawn randomly from the input distribution. Our definition captures many natural algorithms used in theory and practice, e.g. moments-based methods, local search, MCMC and simulated annealing. Our techniques are inspired by (and generalize) the statistical query model in learning theory, which captures the complexity of PAC learning using essentially all known learning methods [Kearns, 1998].
For specific well-known problems over distributions, we give lower bounds on the complexity of any statistical algorithm. These include an exponential lower bounds for moment maximization in $R^n$, and a nearly optimal lower bound for detecting planted clique distributions when the planted clique has size $O(n^{1/2-\delta})$ for any constant $\delta > 0$. Variants of the latter problem have been assumed to be hard to prove hardness for other problems and for cryptographic applications. Our lower bounds provide concrete evidence supporting these assumptions.