The problem of learning $t$-term DNF formulas (for $t = O(1)$) has been studied extensively in the PAC model since its introduction by Valiant (STOC 1984). A $t$-term DNF can be efficiently learnt using a $t$-term DNF only if $t = 1$, i.e., when it is an AND, while even weakly learning a $2$-term DNF using a constant-term DNF was shown to be NP-hard by Khot and Saket (FOCS 2008). On the other hand, Feldman et al. (FOCS 2009) showed the hardness of weakly learning a noisy AND using a halfspace, the latter being a generalization of an AND, while Khot and Saket (STOC 2008) showed that an intersection of two halfspaces is hard to weakly learn using any function of constantly many halfspaces. The question of whether a $2$-term DNF is efficiently learnable using $2$ or constantly many halfspaces remained open.
In this work, we answer this question in the negative by showing the hardness of weakly learning a $2$-term DNF, as well as a noisy AND, using any function of a constant number of halfspaces. In particular, we prove the following.
For any constants $\nu, \zeta > 0$ and $\ell \in \mathbb{N}$, given a distribution over point-value pairs in $\{0,1\}^n \times \{0,1\}$, it is NP-hard to distinguish between the following two cases:
YES Case. There is a $2$-term DNF that classifies all the points of the distribution correctly, and an AND that classifies at least a $1-\zeta$ fraction of the points correctly.
NO Case. Any Boolean function depending on at most $\ell$ halfspaces classifies at most a $1/2 + \nu$ fraction of the points of the distribution correctly.
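To make the objects in this statement concrete, the following is a minimal Python sketch of a toy instance. It is not the reduction of this work: the helper names (`and_of`, `dnf_of`, `halfspace`, `function_of_halfspaces`, `accuracy`), the choice $n = 3$, and the uniform distribution are all assumptions made purely for illustration. It only shows how a hypothesis is scored against a distribution over point-value pairs, and that an AND is a special case of a halfspace.

```python
from fractions import Fraction
from itertools import product

# All names below are hypothetical helpers for illustration only.

def and_of(literals):
    """AND of literals; each literal is (coordinate index, required bit)."""
    return lambda x: int(all(x[i] == b for i, b in literals))

def dnf_of(terms):
    """DNF: OR of the given AND terms."""
    return lambda x: int(any(term(x) for term in terms))

def halfspace(weights, threshold):
    """Halfspace: outputs 1 iff <weights, x> >= threshold."""
    return lambda x: int(sum(w * xi for w, xi in zip(weights, x)) >= threshold)

def function_of_halfspaces(halfspaces, combine):
    """Arbitrary Boolean function `combine` applied to the halfspace outputs."""
    return lambda x: int(combine([h(x) for h in halfspaces]))

def accuracy(hypothesis, distribution):
    """Probability mass of pairs (x, y) with hypothesis(x) == y."""
    return sum((p for x, y, p in distribution if hypothesis(x) == y), Fraction(0))

# Toy target (assumed): the 2-term DNF (x0 AND x1) OR x2 on {0,1}^3.
term_a = and_of([(0, 1), (1, 1)])
term_b = and_of([(2, 1)])
target = dnf_of([term_a, term_b])

# Toy distribution (assumed): uniform over all points, labelled by the target.
points = list(product((0, 1), repeat=3))
distribution = [(x, target(x), Fraction(1, len(points))) for x in points]

# An AND is a halfspace: x0 AND x1 is the same function as x0 + x1 >= 2.
and_as_halfspace = halfspace([1, 1, 0], 2)

# The 2-term DNF written as an OR of two halfspaces (one per term).
dnf_as_halfspaces = function_of_halfspaces(
    [halfspace([1, 1, 0], 2), halfspace([0, 0, 1], 1)], any
)

print(accuracy(target, distribution))             # 1
print(accuracy(term_a, distribution))             # 5/8
print(accuracy(dnf_as_halfspaces, distribution))  # 1
```

On this toy instance a consistent hypothesis is easy to write down because the target is known. The hardness statement concerns the computational task of finding a good hypothesis from the distribution alone.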
Our result generalizes and strengthens the previous best results mentioned above on the hardness of learning a $2$-term DNF, learning an intersection of two halfspaces, and learning a noisy AND.