__
TR05-088 | 3rd August 2005 00:00
__

#### Learning Juntas in the Presence of Noise

TR05-088
Authors:

Jan Arpe
Publication: 16th August 2005 17:10

Downloads: 1625

Keywords:

**Abstract:**
The combination of two major challenges in machine learning is investigated: dealing with large amounts of irrelevant information and learning from noisy data. It is shown that large classes of Boolean concepts that depend on a small number of variables---so-called juntas---can be learned efficiently from random examples corrupted by random attribute and classification noise.

To accomplish this goal, a two-phase algorithm is presented that copes with several problems arising from the presence of noise: firstly, a suitable method for approximating Fourier coefficients in the presence of noise is applied to infer the relevant variables. Secondly, as one cannot simply read off a truth table from the examples as in the noise-free case, an alternative method to build a hypothesis is established and applied to the examples restricted to the relevant variables.

In particular, for the class of monotone juntas depending on $d$ out of $n$ variables, the sample complexity is polynomial in $\log(n/\delta)$, $2^d$, $\gamma_a^{-d}$, and $\gamma_b^{-1}$, where $\delta$ is the confidence parameter and $\gamma_a,\gamma_b>0$ are noise parameters bounding the noise rates away from $1/2$. The running time is bounded by the sample complexity times a polynomial in $n$.

So far, all results hold for the case of uniformly distributed examples---the only case that (apart from side notes) has been studied in the literature yet. We show how to extend our methods to non-uniformly distributed examples and derive new results for monotone juntas.

For the attribute noise, we have to assume that it is generated by a product distribution since otherwise fault-tolerant learning is in general impossible: we construct a noise distribution $P$ and a concept class $\C$ such that it is impossible to learn $\C$ under $P$-noise.