Do natural proofs imply efficient learning algorithms?

Carmosino et al. (2016) demonstrated that natural proofs of circuit lower bounds for $\Lambda$ imply efficient algorithms for learning $\Lambda$-circuits, but only over \textit{the uniform distribution}, with \textit{membership queries}, and provided $\AC^0[p] \subseteq \Lambda$. We consider whether this implication can be generalized to $\Lambda \not\supseteq \AC^0[p]$, and to learning algorithms which use only random examples and learn over arbitrary example distributions (Valiant's PAC-learning model).

We first observe that, if, for any circuit class $\Lambda$, there is an implication from natural proofs for $\Lambda$ to PAC-learning for $\Lambda$, then standard assumptions from lattice-based cryptography do not hold. In particular, we observe that depth-2 majority circuits are a (conditional) counter example to the implication, since Nisan (1993) gave a natural proof, but Klivans and Sherstov (2009) showed hardness of PAC-learning under lattice-based assumptions. We thus ask:

what learning algorithms can we reasonably expect to follow from Nisan's natural proofs?

Our main result is that all natural proofs arising from a type of communication complexity argument, including Nisan's, imply PAC-learning algorithms in a new \textit{distributional} variant (i.e., an ``average-case'' relaxation) of Valiant's PAC model. Our distributional PAC model is stronger than the average-case prediction model of Blum et al. (1993) and the heuristic PAC model of Nanashima (2021), and has several important properties which make it of independent interest, such as being \textit{boosting-friendly}. The main applications of our result are new distributional PAC-learning algorithms for depth-2 majority circuits, polytopes and DNFs over natural target distributions, as well as the nonexistence of encoded-input weak PRFs that can be evaluated by depth-2 majority circuits.

Do strong natural proofs imply efficient learning algorithms? Carmosino et al. (2016) demonstrated that strong natural proofs of circuit lower bounds for $\Lambda$ imply efficient algorithms for learning $\Lambda$-circuits, but only over \textit{the uniform distribution}, with \textit{membership queries}, and provided $\AC^0[p] \in \Lambda$. We consider whether this implication can be generalized to $\Lambda \not\supseteq \AC^0[p]$, and to learning algorithms that utilize only random examples and learn over arbitrary example distributions. We give results of both positive and negative flavor.

On the negative side, we observe that if, for \textit{every} circuit class $\Lambda$, the implication from natural proofs for $\Lambda$ to learning $\Lambda$-circuits in Valiant's PAC model holds, then there is a polynomial time solution to the $O(n^{1.5})$-$\textrm{uSVP}$ (unique Shortest Vector Problem), and polynomial time quantum solutions to $\tilde{O}(n^{1.5})$-$\mathrm{SVP}$ and $\tilde{O}(n^{1.5})$-$\mathrm{SIVP}$. This indicates that whether natural proofs for $\Lambda$ imply efficient learning algorithms for $\Lambda$ in Valiant's PAC model may depend on $\Lambda$.

On the positive side, our main result is that \textit{specific} natural proofs arising from a type of communication complexity argument (e.g., Nisan (1993), for depth-2 majority circuits) imply PAC-learning algorithms in a new \textit{distributional} variant of Valiant's model. Our distributional PAC model is stronger than the average-case prediction model of Blum et al (1993) and the heuristic PAC model of Nanashima (2021), and has several important properties which make it of independent interest, such as being \textit{boosting-friendly}. The main applications of our result are new distributional PAC-learning algorithms for depth-2 majority circuits, polytopes and DNFs over natural target distributions, as well as the nonexistence of encoded-input weak PRFs that may be evaluated by depth-2 majority circuits.

This version of the paper renders the previous version significantly out-of-date. The new version is a significant re-write, for clarity, and new results. The previous version studies a model called 'average-case PAC-learning,' which we have renamed distributional PAC-learning to decrease confusion. Theorem 1.1, 1.4, 1.5, 1.6 are introduced. Theorem 1.2 is the same as Theorem 1.1 in the previous version. Sections 3, 4, 7 are new. We thank an anonymous reviewer for explaining a simple and direct argument for Theorem 1.5 in the previous version, and remove this claim from the paper (though it is correct).

Carmosino et al. (2016) demonstrated that natural proofs of circuit lower bounds imply algorithms for learning circuits with membership queries over the uniform distribution. Indeed, they exercised this implication to obtain a quasi-polynomial time learning algorithm for ${AC}^0[p]$ circuits, for any prime $p$, by leveraging the existing natural proofs from Razborov (1987) and Smolensky (1987). This achievement raises a logical question: can existing natural proofs be adapted into learning algorithms that utilize random examples and learn over unknown, arbitrary example distributions?

In this work, we show that natural circuit lower bounds proven by specific communication complexity arguments (e.g., Nisan (1994)) witness a ``yes'' answer to this question, under the one limitation of average-case learning. Our primary technical contribution demonstrates a connection between the complexity of learning a concept class in the average-case, and the randomized communication complexity of an evaluation game associated with the class. We apply this finding to derive polynomial time average-case PAC-learning algorithms that use only random examples from arbitrary and unknown distributions, for any concept class that may be evaluated by (for instance) a majority vote of linear threshold functions.

Additionally, our work contributes to a better understanding of the optimal parameters in XOR lemmas for communication complexity. We address a question posed by Viola and Wigderson (2007) by demonstrating that certain enhancements of parameters in their XOR lemmas are false, assuming the existence of one-way functions.