Time-Space Lower Bounds for Two-Pass Learning

Sumegha Garg; Ran Raz; Avishay Tal

Electronic Colloquium on Computational Complexity

Under the auspices of the Computational Complexity Foundation (CCF)

REPORTS > DETAIL:

Paper:

TR19-071 | 14th May 2019 18:49

Time-Space Lower Bounds for Two-Pass Learning

TR19-071 Authors: Sumegha Garg, Ran Raz, Avishay Tal
Publication: 14th May 2019 19:35
Downloads: 1625

Keywords:

branching program, lower bounds, PAC learning, time-space tradeoff, two-pass streaming

Abstract:

A line of recent works showed that for a large class of learning problems, any learning algorithm requires either super-linear memory size or a super-polynomial number of samples [Raz16,KRT17,Raz17,MM18,BOGY18,GRT18]. For example, any algorithm for learning parities of size $n$ requires either a memory of size $\Omega(n^{2})$ or an exponential number of samples [Raz16].

All these works modeled the learner as a one-pass branching program, allowing only one pass over the stream of samples. In this work, we prove the first memory-samples lower bounds (with a super-linear lower bound on the memory size and super-polynomial lower bound on the number of samples) when the learner is allowed two passes over the stream of samples. For example, we prove that any two-pass algorithm for learning parities of size $n$ requires either a memory of size $\Omega(n^{1.5})$ or at least $2^{\Omega(\sqrt{n})}$ samples.

More generally, a matrix $M: A \times X \rightarrow \{-1,1\}$ corresponds to the following learning problem: An unknown element $x \in X$ is chosen uniformly at random. A learner tries to learn $x$ from a stream of samples, $(a_1, b_1), (a_2, b_2) \ldots$, where for every $i$, $a_i \in A$ is chosen uniformly at random and $b_i = M(a_i,x)$.

Assume that $k,l, r$ are such that any submatrix of $M$ of at least $2^{-k} \cdot |A|$ rows and at least $2^{-l} \cdot |X|$ columns, has a bias of at most $2^{-r}$. We show that any two-pass learning algorithm for the learning problem corresponding to $M$ requires either a memory of size at least $\Omega\left(k \cdot \min\{k,\sqrt{l}\} \right)$, or at least $2^{\Omega(\min\{k,\sqrt{l},r\})}$ samples.

ISSN 1433-8092 | Imprint