We prove a general time-space lower bound that applies for a large class of learning problems and shows that for every problem in that class, any learning algorithm requires either a memory of quadratic size or an exponential number of samples.
Our result is stated in terms of the norm of the matrix that corresponds to the learning problem. Let $X$, $A$ be two finite sets. Let $M: A \times X \rightarrow \{-1,1\}$ be a matrix. The matrix $M$ corresponds to the following learning problem: An unknown element $x \in X$ was chosen uniformly at random. A learner tries to learn $x$ from a stream of samples, $(a_1, b_1), (a_2, b_2) \ldots$, where for every $i$, $a_i \in A$ is chosen uniformly at random and $b_i = M(a_i,x)$.
Let $\sigma_{\max}$ be the largest singular value of $M$ and note that always $\sigma_{\max} \leq |A|^{\frac{1}{2}}\cdot |X|^{\frac{1}{2}}$.
We show that if $\sigma_{\max} \leq |A|^{\frac{1}{2}}\cdot |X|^{\frac{1}{2} -
\varepsilon}$, then any learning algorithm for the corresponding learning problem requires either a memory of size at least $\Omega \left( (\varepsilon n)^2 \right) $ or at least $2^{\Omega(\varepsilon n)}$ samples, where $n = \log_2 |X|$.
As a special case, this gives a new proof for the time-space lower bound for
parity learning [Raz 2016].