Approximating and Testing $k$-Histogram Distributions in Sub-linear time

Piotr Indyk; Reut Levi; Ronitt Rubinfeld

Electronic Colloquium on Computational Complexity

Under the auspices of the Computational Complexity Foundation (CCF)

REPORTS > DETAIL:

Revision(s):

Revision #1 to TR11-171 | 31st July 2014 21:36

Approximating and Testing k-Histogram Distributions in Sub-linear time

Revision #1 Authors: Piotr Indyk, Reut Levi, Ronitt Rubinfeld
Accepted on: 31st July 2014 21:36
Downloads: 1406

Keywords:

Abstract:

A discrete distribution $p$, over $[n]$, is a $k$-histogram if its probability distribution function can be represented as a piece-wise constant function with $k$ pieces. Such a function is represented by a list of $k$ intervals and $k$ corresponding values. We consider the following problem: given a collection of samples from a distribution $p$, find a $k$-histogram that (approximately) minimizes the $\ell_2$ distance to the distribution $p$. We give time and sample efficient algorithms for this problem.

We further provide algorithms that distinguish distributions that have the property of being a $k$-histogram from distributions that are $\epsilon$-far from any $k$-histogram in the $\ell_1$ distance and $\ell_2$ distance respectively.

Paper:

TR11-171 | 15th December 2011 14:29

Approximating and Testing $k$-Histogram Distributions in Sub-linear time

TR11-171 Authors: Piotr Indyk, Reut Levi, Ronitt Rubinfeld
Publication: 17th December 2011 00:05
Downloads: 3539

Keywords:

distribution, histogram

Abstract:

A discrete distribution $p$, over $[n]$, is a $k$-histogram if its probability distribution function can be
represented as a piece-wise constant function with $k$ pieces. Such a function
is
represented by a list of $k$ intervals and $k$ corresponding values. We consider
the following problem: given a collection of samples from a distribution $p$,
find a $k$-histogram that (approximately) minimizes the $\ell_2$ distance to the
distribution $p$.
We give time and sample efficient algorithms for this problem.

We further provide algorithms that distinguish distributions that have the
property of being a $k$-histogram from distributions that are $\eps$-far from
any
$k$-histogram in the $\ell_1$ distance and $\ell_2$ distance respectively.

ISSN 1433-8092 | Imprint