A discrete distribution $p$, over $[n]$, is a $k$-histogram if its probability distribution function can be represented as a piece-wise constant function with $k$ pieces. Such a function is represented by a list of $k$ intervals and $k$ corresponding values. We consider the following problem: given a collection of samples from a distribution $p$, find a $k$-histogram that (approximately) minimizes the $\ell_2$ distance to the distribution $p$. We give time and sample efficient algorithms for this problem.
We further provide algorithms that distinguish distributions that have the property of being a $k$-histogram from distributions that are $\epsilon$-far from any $k$-histogram in the $\ell_1$ distance and $\ell_2$ distance respectively.
A discrete distribution $p$, over $[n]$, is a $k$-histogram if its probability distribution function can be
represented as a piece-wise constant function with $k$ pieces. Such a function
is
represented by a list of $k$ intervals and $k$ corresponding values. We consider
the following problem: given a collection of samples from a distribution $p$,
find a $k$-histogram that (approximately) minimizes the $\ell_2$ distance to the
distribution $p$.
We give time and sample efficient algorithms for this problem.
We further provide algorithms that distinguish distributions that have the
property of being a $k$-histogram from distributions that are $\eps$-far from
any
$k$-histogram in the $\ell_1$ distance and $\ell_2$ distance respectively.