Under the auspices of the Computational Complexity Foundation (CCF)
We introduce a new approach to characterizing the unobserved portion of a distribution, which provides sublinear-sample additive estimators for a class of properties that includes entropy and distribution support size. Together with the lower bounds proven in the companion paper , this settles the longstanding question of the sample complexities of these estimation problems (up to constant factors). Our algorithm estimates these properties up to an arbitrarily small additive constant, using $O(n/\log n)$ samples;  shows that no algorithm on $o(n/\log n)$ samples can achieve this (where $n$ is a bound on the support size, or in the case of estimating the support size, $1/n$ is a lower bound the probability of any element of the domain). Previously, no explicit sublinear-sample algorithms for either of these problems were known.
Additionally, our algorithm runs in time linear in the number of samples used.