Learning Juntas in the Presence of Noise

Jan Arpe

Electronic Colloquium on Computational Complexity

Under the auspices of the Computational Complexity Foundation (CCF)

REPORTS > DETAIL:

Paper:

TR05-088 | 3rd August 2005 00:00

Learning Juntas in the Presence of Noise

TR05-088 Authors: Jan Arpe
Publication: 16th August 2005 17:10
Downloads: 3741

Keywords:

Fourier analysis, juntas, learning of Boolean functions, learning under irrelevant information, noise-tolerant learning

Abstract:

The combination of two major challenges in machine learning is investigated: dealing with large amounts of irrelevant information and learning from noisy data. It is shown that large classes of Boolean concepts that depend on a small number of variables---so-called juntas---can be learned efficiently from random examples corrupted by random attribute and classification noise.
To accomplish this goal, a two-phase algorithm is presented that copes with several problems arising from the presence of noise: firstly, a suitable method for approximating Fourier coefficients in the presence of noise is applied to infer the relevant variables. Secondly, as one cannot simply read off a truth table from the examples as in the noise-free case, an alternative method to build a hypothesis is established and applied to the examples restricted to the relevant variables.
In particular, for the class of monotone juntas depending on $d$ out of $n$ variables, the sample complexity is polynomial in $\log(n/\delta)$, $2^d$, $\gamma_a^{-d}$, and $\gamma_b^{-1}$, where $\delta$ is the confidence parameter and $\gamma_a,\gamma_b>0$ are noise parameters bounding the noise rates away from $1/2$. The running time is bounded by the sample complexity times a polynomial in $n$.
So far, all results hold for the case of uniformly distributed examples---the only case that (apart from side notes) has been studied in the literature yet. We show how to extend our methods to non-uniformly distributed examples and derive new results for monotone juntas.
For the attribute noise, we have to assume that it is generated by a product distribution since otherwise fault-tolerant learning is in general impossible: we construct a noise distribution $P$ and a concept class $\C$ such that it is impossible to learn $\C$ under $P$-noise.

ISSN 1433-8092 | Imprint