Loosely speaking, the effective support size of a distribution is the size of the support of a distribution that is close to it (in totally variation distance).
We study the complexity of estimating the effective support size of an unknown distribution when given samples of the distributions as well as an evaluation oracle (which returns the probability that the queried element appears in the distribution).
In this context, we present several algorithms that exhibit a trade-off between the quality of the approximation and the complexity of obtaining it, and leave open the question of their optimality.
In particular, for any constant $\beta>1$, we present an algorithm that, on input $\epsilon>0$ and oracle access to a distribution $\cal D$, uses $O(1/\epsilon^{1.001})$ samples and queries, and outputs a number $\widetilde{n}$ such that $\cal D$ is $\epsilon$-far from any distribution that has support of size $\widetilde{n}$ but is $\beta\cdot\epsilon$-close to a distribution that has support size $f\cdot\widetilde{n}$, where $f=O(\log\log\log\log\log({\widetilde{n}}/\epsilon))$.
(Indeed, 1.001 stands for any constant larger than 1, and $\log\log\log\log\log$ stands for any constant iterations of the logarithmic function.)
The original presentation contained several inaccuracies and minor errors, which are corrected in the current version. Most importantly, the original Claim 2.1.2 (Part 2) was off by a factor of $\beta-1$, but this has negligible effect on the main results.
In addition, the current provides more technical details.
Loosely speaking, the effective support size of a distribution is the size of the support of a distribution that is close to it (in totally variation distance).
We study the complexity of estimating the effective support size of an unknown distribution when given samples of the distributions as well as an evaluation oracle (which returns the probability that the queried element appears in the distribution).
In this context, we present several algorithms that exhibit a trade-off between the quality of the approximation and the complexity of obtaining it, and leave open the question of their optimality.