A neural network is said to be nonoverlapping if there is at most one
edge outgoing from each node. We investigate the number of examples
that a learning algorithm needs when using nonoverlapping neural
networks as hypotheses. We derive bounds for this sample complexity
in terms of the Vapnik-Chervonenkis dimension. ...
more >>>
We show that the class of monotone $2^{O(\sqrt{\log n})}$-term DNF
formulae can be PAC learned in polynomial time under the uniform
distribution. This is an exponential improvement over previous
algorithms in this model, which could learn monotone
$o(\log^2 n)$-term DNF, and is the first efficient algorithm
for ...
more >>>
We prove two lower bounds on the Statistical Query (SQ) learning
model. The first lower bound is on weak-learning. We prove that for a
concept class of SQ-dimension $d$, a running time of
$\Omega(d/\log d)$ is needed. The SQ-dimension of a concept class is
defined to be the maximum number ...
more >>>
We give the first representation-independent hardness results for
PAC learning intersections of halfspaces, a central concept class
in computational learning theory. Our hardness results are derived
from two public-key cryptosystems due to Regev, which are based on the
worst-case hardness of well-studied lattice problems. Specifically, we
prove that a polynomial-time ...
more >>>
Traditional quantum state tomography requires a number of measurements that grows exponentially with the number of qubits n. But using ideas from computational learning theory, we show that "for most practical purposes" one can learn a state using a number of measurements that grows only linearly with n. Besides possible ... more >>>
We study the properties of the agnostic learning framework of Haussler (1992)and Kearns, Schapire and Sellie (1992). In particular, we address the question: is there any situation in which membership queries are useful in agnostic learning?
Our results show that the answer is negative for distribution-independent agnostic learning and positive ... more >>>
The threshold degree of a Boolean function
$f\colon\{0,1\}\to\{-1,+1\}$ is the least degree of a real
polynomial $p$ such $f(x)\equiv\mathrm{sgn}\; p(x).$ We
construct two halfspaces on $\{0,1\}^n$ whose intersection has
threshold degree $\Theta(\sqrt n),$ an exponential improvement on
previous lower bounds. This solves an open problem due to Klivans
(2002) and ...
more >>>
The threshold degree of a function
$f\colon\{0,1\}^n\to\{-1,+1\}$ is the least degree of a
real polynomial $p$ with $f(x)\equiv\mathrm{sgn}\; p(x).$ We
prove that the intersection of two halfspaces on
$\{0,1\}^n$ has threshold degree $\Omega(n),$ which
matches the trivial upper bound and completely answers
a question due to Klivans (2002). The best ...
more >>>
We initiate a study of learning and testing dynamic environments,
focusing on environment that evolve according to a fixed local rule.
The (proper) learning task consists of obtaining the initial configuration
of the environment, whereas for non-proper learning it suffices to predict
its future values. The testing task consists of ...
more >>>
We extend the line of research initiated by Fortnow and Klivans \cite{FortnowKlivans09} that studies the relationship between efficient learning algorithms and circuit lower bounds. In \cite{FortnowKlivans09}, it was shown that if a Boolean circuit class $\mathcal{C}$ has an efficient \emph{deterministic} exact learning algorithm, (i.e. an algorithm that uses membership and ... more >>>
Monotone Boolean functions, and the monotone Boolean circuits that compute them, have been intensively studied in complexity theory. In this paper we study the structure of Boolean functions in terms of the minimum number of negations in any circuit computing them, a complexity measure that interpolates between monotone functions and ... more >>>
We prove that proper PAC learnability implies compression. Namely, if a concept $C \subseteq \Sigma^X$ is properly PAC learnable with $d$ samples, then $C$ has a sample compression scheme of size $2^{O(d)}$.
In particular, every boolean concept class with constant VC dimension has a sample compression scheme of constant size. ...
more >>>
We define a concept class ${\cal F}$ to be time-space hard (or memory-samples hard) if any learning algorithm for ${\cal F}$ requires either a memory of size super-linear in $n$ or a number of samples super-polynomial in $n$, where $n$ is the length of one sample.
A recent work shows ... more >>>
One can learn any hypothesis class $H$ with $O(\log|H|)$ labeled examples. Alas, learning with so few examples requires saving the examples in memory, and this requires $|X|^{O(\log|H|)}$ memory states, where $X$ is the set of all labeled examples. A question that arises is how many labeled examples are needed in ... more >>>
With any hypothesis class one can associate a bipartite graph whose vertices are the hypotheses H on one side and all possible labeled examples X on the other side, and an hypothesis is connected to all the labeled examples that are consistent with it. We call this graph the hypotheses ... more >>>
A matrix $M: A \times X \rightarrow \{-1,1\}$ corresponds to the following learning problem: An unknown element $x \in X$ is chosen uniformly at random. A learner tries to learn $x$ from a stream of samples, $(a_1, b_1), (a_2, b_2) \ldots$, where for every $i$, $a_i \in A$ is chosen ... more >>>
A line of recent works showed that for a large class of learning problems, any learning algorithm requires either super-linear memory size or a super-polynomial number of samples [Raz16,KRT17,Raz17,MM18,BOGY18,GRT18]. For example, any algorithm for learning parities of size $n$ requires either a memory of size $\Omega(n^{2})$ or an exponential number ... more >>>
It is becoming increasingly important to understand the vulnerability of machine learning models to adversarial attacks. In this paper we study the feasibility of robust learning from the perspective of computational learning theory, considering both sample and computational complexity. In particular, our definition of robust learnability requires polynomial sample complexity. ... more >>>
Consider the following heuristic for building a decision tree for a function $f : \{0,1\}^n \to \{\pm 1\}$. Place the most influential variable $x_i$ of $f$ at the root, and recurse on the subfunctions $f_{x_i=0}$ and $f_{x_i=1}$ on the left and right subtrees respectively; terminate once the tree is an ... more >>>
We consider the following question: using a source of labeled data and interaction with an untrusted prover, what is the complexity of verifying that a given hypothesis is "approximately correct"? We study interactive proof systems for PAC verification, where a verifier that interacts with a prover is required to accept ... more >>>
We establish the first general connection between the design of quantum algorithms and circuit lower bounds. Specifically, let $\mathrm{C}$ be a class of polynomial-size concepts, and suppose that $\mathrm{C}$ can be PAC-learned with membership queries under the uniform distribution with error $1/2 - \gamma$ by a time $T$ quantum algorithm. ... more >>>
This paper aims to derandomize the following problems in the smoothed analysis of Spielman and Teng. Learn Disjunctive Normal Form (DNF), invert Fourier Transforms (FT), and verify small circuits' unsatisfiability. Learning algorithms must predict a future observation from the only $m$ i.i.d. samples of a fixed but unknown joint-distribution $P(G(x),y)$ ... more >>>
A seminal result in learning theory characterizes the PAC learnability of binary classes through the Vapnik-Chervonenkis dimension. Extending this characterization to the general multiclass setting has been open since the pioneering works on multiclass PAC learning in the late 1980s. This work resolves this problem: we characterize multiclass PAC learnability ... more >>>
A long-standing open question in computational learning theory is to prove NP-hardness of learning efficient programs, the setting of which is in between proper learning and improper learning. Ko (COLT'90, SICOMP'91) explicitly raised this open question and demonstrated its difficulty by proving that there exists no relativizing proof of NP-hardness ... more >>>
A polynomial-stretch pseudorandom generator (PPRG) in NC$^0$ (i.e., constant parallel time) is one of the most important cryptographic primitives, especially for constructing highly efficient cryptography and indistinguishability obfuscation. The celebrated work (Applebaum, Ishai, and Kushilevitz, SIAM Journal on Computing, 2006) on randomized encodings yields the characterization of sublinear-stretch pseudorandom generators ... more >>>
The Forster transform is a method of regularizing a dataset
by placing it in {\em radial isotropic position}
while maintaining some of its essential properties.
Forster transforms have played a key role in a diverse range of settings
spanning computer science and functional analysis. Prior work had given
{\em ...
more >>>
Carmosino et al. (2016) demonstrated that natural proofs of circuit lower bounds imply algorithms for learning circuits with membership queries over the uniform distribution. Indeed, they exercised this implication to obtain a quasi-polynomial time learning algorithm for ${AC}^0[p]$ circuits, for any prime $p$, by leveraging the existing natural proofs from ... more >>>
We revisit the problem of characterising the complexity of Quantum PAC learning, as introduced by Bshouty and Jackson [SIAM J. Comput.
1998, 28, 1136–1153]. Several quantum advantages have been demonstrated in this setting, however, none are generic: they apply to particular concept classes and typically only work when the distribution ...
more >>>