Loading jsMath...
Weizmann Logo
ECCC
Electronic Colloquium on Computational Complexity

Under the auspices of the Computational Complexity Foundation (CCF)

Login | Register | Classic Style



REPORTS > DETAIL:

Revision(s):

Revision #1 to TR15-160 | 29th January 2016 16:47

Are Few Bins Enough: Testing Histogram Distributions

RSS-Feed




Revision #1
Authors: Clement Canonne
Accepted on: 29th January 2016 16:47
Downloads: 1403
Keywords: 


Abstract:

A probability distribution over an ordered universe [n]=\{1,\dots,n\} is said to be a k-histogram if it can be represented as a piecewise-constant function over at most k contiguous intervals. We study the following question: given samples from an arbitrary distribution D over [n], one must decide whether D is a k-histogram, or is far in \ell_1 distance from any such succinct representation. We obtain a sample and time-efficient algorithm for this problem, complemented by a nearly-matching information-theoretic lower bound on the number of samples required for this task. Our results significantly improve on the previous state-of-the-art, due to Indyk, Levi, and Rubinfeld (2012) and Canonne, Diakonikolas, Gouleakis, and Rubinfeld (2015).



Changes to previous version:

Added some discussion; corrected some typos, and updated the bibliography.


Paper:

TR15-160 | 30th September 2015 04:05

Are Few Bins Enough: Testing Histogram Distributions





TR15-160
Authors: Clement Canonne
Publication: 30th September 2015 16:21
Downloads: 1568
Keywords: 


Abstract:

A probability distribution over an ordered universe [n]=\{1,\dots,n\} is said to be a k-histogram if it can be represented as a piecewise-constant function over at most k contiguous intervals. We study the following question: given samples from an arbitrary distribution D over [n], one must decide whether D is a k-histogram, or is far in \ell_1 distance from any such succinct representation. We obtain a sample and time-efficient algorithm for this problem, complemented by a nearly-matching information-theoretic lower bound on the number of samples required for this task. Our results significantly improve on the previous state-of-the-art, due to Indyk, Levi, and Rubinfeld (2012) and Canonne, Diakonikolas, Gouleakis, and Rubinfeld (2015).



ISSN 1433-8092 | Imprint