ECCC-Report TR10-180https://eccc.weizmann.ac.il/report/2010/180Comments and Revisions published for TR10-180en-usThu, 18 Nov 2010 16:12:04 +0200
Paper TR10-180
| Estimating the unseen: A sublinear-sample canonical estimator of distributions |
Gregory Valiant,
Paul Valiant
https://eccc.weizmann.ac.il/report/2010/180We introduce a new approach to characterizing the unobserved portion of a distribution, which provides sublinear-sample additive estimators for a class of properties that includes entropy and distribution support size. Together with the lower bounds proven in the companion paper [29], this settles the longstanding question of the sample complexities of these estimation problems (up to constant factors). Our algorithm estimates these properties up to an arbitrarily small additive constant, using $O(n/\log n)$ samples; [29] shows that no algorithm on $o(n/\log n)$ samples can achieve this (where $n$ is a bound on the support size, or in the case of estimating the support size, $1/n$ is a lower bound the probability of any element of the domain). Previously, no explicit sublinear-sample algorithms for either of these problems were known.
Additionally, our algorithm runs in time linear in the number of samples used.Thu, 18 Nov 2010 16:12:04 +0200https://eccc.weizmann.ac.il/report/2010/180