Michael Schmitt

A neural network is said to be nonoverlapping if there is at most one

edge outgoing from each node. We investigate the number of examples

that a learning algorithm needs when using nonoverlapping neural

networks as hypotheses. We derive bounds for this sample complexity

in terms of the Vapnik-Chervonenkis dimension. ...
more >>>

Rocco Servedio

We show that the class of monotone $2^{O(\sqrt{\log n})}$-term DNF

formulae can be PAC learned in polynomial time under the uniform

distribution. This is an exponential improvement over previous

algorithms in this model, which could learn monotone

$o(\log^2 n)$-term DNF, and is the first efficient algorithm

for ...
more >>>

Ke Yang

We prove two lower bounds on the Statistical Query (SQ) learning

model. The first lower bound is on weak-learning. We prove that for a

concept class of SQ-dimension $d$, a running time of

$\Omega(d/\log d)$ is needed. The SQ-dimension of a concept class is

defined to be the maximum number ...
more >>>

Adam Klivans, Alexander A. Sherstov

We give the first representation-independent hardness results for

PAC learning intersections of halfspaces, a central concept class

in computational learning theory. Our hardness results are derived

from two public-key cryptosystems due to Regev, which are based on the

worst-case hardness of well-studied lattice problems. Specifically, we

prove that a polynomial-time ...
more >>>

Scott Aaronson

Traditional quantum state tomography requires a number of measurements that grows exponentially with the number of qubits n. But using ideas from computational learning theory, we show that "for most practical purposes" one can learn a state using a number of measurements that grows only linearly with n. Besides possible ... more >>>

Vitaly Feldman

We study the properties of the agnostic learning framework of Haussler (1992)and Kearns, Schapire and Sellie (1992). In particular, we address the question: is there any situation in which membership queries are useful in agnostic learning?

Our results show that the answer is negative for distribution-independent agnostic learning and positive ... more >>>

Alexander A. Sherstov

The threshold degree of a Boolean function

$f\colon\{0,1\}\to\{-1,+1\}$ is the least degree of a real

polynomial $p$ such $f(x)\equiv\mathrm{sgn}\; p(x).$ We

construct two halfspaces on $\{0,1\}^n$ whose intersection has

threshold degree $\Theta(\sqrt n),$ an exponential improvement on

previous lower bounds. This solves an open problem due to Klivans

(2002) and ...
more >>>

Alexander A. Sherstov

The threshold degree of a function

$f\colon\{0,1\}^n\to\{-1,+1\}$ is the least degree of a

real polynomial $p$ with $f(x)\equiv\mathrm{sgn}\; p(x).$ We

prove that the intersection of two halfspaces on

$\{0,1\}^n$ has threshold degree $\Omega(n),$ which

matches the trivial upper bound and completely answers

a question due to Klivans (2002). The best ...
more >>>

Oded Goldreich, Dana Ron

We initiate a study of learning and testing dynamic environments,

focusing on environment that evolve according to a fixed local rule.

The (proper) learning task consists of obtaining the initial configuration

of the environment, whereas for non-proper learning it suffices to predict

its future values. The testing task consists of ...
more >>>

Ilya Volkovich

We extend the line of research initiated by Fortnow and Klivans \cite{FortnowKlivans09} that studies the relationship between efficient learning algorithms and circuit lower bounds. In \cite{FortnowKlivans09}, it was shown that if a Boolean circuit class $\mathcal{C}$ has an efficient \emph{deterministic} exact learning algorithm, (i.e. an algorithm that uses membership and ... more >>>

Eric Blais, Clement Canonne, Igor Carboni Oliveira, Rocco Servedio, Li-Yang Tan

Monotone Boolean functions, and the monotone Boolean circuits that compute them, have been intensively studied in complexity theory. In this paper we study the structure of Boolean functions in terms of the minimum number of negations in any circuit computing them, a complexity measure that interpolates between monotone functions and ... more >>>

Shay Moran, Amir Yehudayoff

We prove that proper PAC learnability implies compression. Namely, if a concept $C \subseteq \Sigma^X$ is properly PAC learnable with $d$ samples, then $C$ has a sample compression scheme of size $2^{O(d)}$.

In particular, every boolean concept class with constant VC dimension has a sample compression scheme of constant size. ...
more >>>

Gillat Kol, Ran Raz, Avishay Tal

We define a concept class ${\cal F}$ to be time-space hard (or memory-samples hard) if any learning algorithm for ${\cal F}$ requires either a memory of size super-linear in $n$ or a number of samples super-polynomial in $n$, where $n$ is the length of one sample.

A recent work shows ... more >>>

Michal Moshkovitz, Dana Moshkovitz

One can learn any hypothesis class $H$ with $O(\log|H|)$ labeled examples. Alas, learning with so few examples requires saving the examples in memory, and this requires $|X|^{O(\log|H|)}$ memory states, where $X$ is the set of all labeled examples. A question that arises is how many labeled examples are needed in ... more >>>

Michal Moshkovitz, Dana Moshkovitz

With any hypothesis class one can associate a bipartite graph whose vertices are the hypotheses H on one side and all possible labeled examples X on the other side, and an hypothesis is connected to all the labeled examples that are consistent with it. We call this graph the hypotheses ... more >>>

Sumegha Garg, Ran Raz, Avishay Tal

A matrix $M: A \times X \rightarrow \{-1,1\}$ corresponds to the following learning problem: An unknown element $x \in X$ is chosen uniformly at random. A learner tries to learn $x$ from a stream of samples, $(a_1, b_1), (a_2, b_2) \ldots$, where for every $i$, $a_i \in A$ is chosen ... more >>>

Sumegha Garg, Ran Raz, Avishay Tal

A line of recent works showed that for a large class of learning problems, any learning algorithm requires either super-linear memory size or a super-polynomial number of samples [Raz16,KRT17,Raz17,MM18,BOGY18,GRT18]. For example, any algorithm for learning parities of size $n$ requires either a memory of size $\Omega(n^{2})$ or an exponential number ... more >>>

Pascale Gourdeau, Varun Kanade, Marta Kwiatkowska, James Worrell

It is becoming increasingly important to understand the vulnerability of machine learning models to adversarial attacks. In this paper we study the feasibility of robust learning from the perspective of computational learning theory, considering both sample and computational complexity. In particular, our definition of robust learnability requires polynomial sample complexity. ... more >>>

Guy Blanc, Jane Lange, Li-Yang Tan

Consider the following heuristic for building a decision tree for a function $f : \{0,1\}^n \to \{\pm 1\}$. Place the most influential variable $x_i$ of $f$ at the root, and recurse on the subfunctions $f_{x_i=0}$ and $f_{x_i=1}$ on the left and right subtrees respectively; terminate once the tree is an ... more >>>

Shafi Goldwasser, Guy Rothblum, Jonathan Shafer, Amir Yehudayoff

We consider the following question: using a source of labeled data and interaction with an untrusted prover, what is the complexity of verifying that a given hypothesis is "approximately correct"? We study interactive proof systems for PAC verification, where a verifier that interacts with a prover is required to accept ... more >>>