Revision #1 Authors: Piotr Indyk, Reut Levi, Ronitt Rubinfeld

Accepted on: 31st July 2014 21:36

Downloads: 822

Keywords:

A discrete distribution $p$, over $[n]$, is a $k$-histogram if its probability distribution function can be represented as a piece-wise constant function with $k$ pieces. Such a function is represented by a list of $k$ intervals and $k$ corresponding values. We consider the following problem: given a collection of samples from a distribution $p$, find a $k$-histogram that (approximately) minimizes the $\ell_2$ distance to the distribution $p$. We give time and sample efficient algorithms for this problem.

We further provide algorithms that distinguish distributions that have the property of being a $k$-histogram from distributions that are $\epsilon$-far from any $k$-histogram in the $\ell_1$ distance and $\ell_2$ distance respectively.

TR11-171 Authors: Piotr Indyk, Reut Levi, Ronitt Rubinfeld

Publication: 17th December 2011 00:05

Downloads: 1866

Keywords:

A discrete distribution $p$, over $[n]$, is a $k$-histogram if its probability distribution function can be

represented as a piece-wise constant function with $k$ pieces. Such a function

is

represented by a list of $k$ intervals and $k$ corresponding values. We consider

the following problem: given a collection of samples from a distribution $p$,

find a $k$-histogram that (approximately) minimizes the $\ell_2$ distance to the

distribution $p$.

We give time and sample efficient algorithms for this problem.

We further provide algorithms that distinguish distributions that have the

property of being a $k$-histogram from distributions that are $\eps$-far from

any

$k$-histogram in the $\ell_1$ distance and $\ell_2$ distance respectively.