ECCC-Report TR19-088

ECCC-Report TR19-088https://eccc.weizmann.ac.il/report/2019/088Comments and Revisions published for TR19-088en-usTue, 26 Oct 2021 18:24:59 +0300 Revision 1 | On the Complexity of Estimating the Effective Support Size | Oded Goldreich https://eccc.weizmann.ac.il/report/2019/088#revision1Loosely speaking, the effective support size of a distribution is the size of the support of a distribution that is close to it (in totally variation distance). We study the complexity of estimating the effective support size of an unknown distribution when given samples of the distributions as well as an evaluation oracle (which returns the probability that the queried element appears in the distribution). In this context, we present several algorithms that exhibit a trade-off between the quality of the approximation and the complexity of obtaining it, and leave open the question of their optimality. In particular, for any constant $\beta>1$, we present an algorithm that, on input $\epsilon>0$ and oracle access to a distribution $\cal D$, uses $O(1/\epsilon^{1.001})$ samples and queries, and outputs a number $\widetilde{n}$ such that $\cal D$ is $\epsilon$-far from any distribution that has support of size $\widetilde{n}$ but is $\beta\cdot\epsilon$-close to a distribution that has support size $f\cdot\widetilde{n}$, where $f=O(\log\log\log\log\log({\widetilde{n}}/\epsilon))$. (Indeed, 1.001 stands for any constant larger than 1, and $\log\log\log\log\log$ stands for any constant iterations of the logarithmic function.) Tue, 26 Oct 2021 18:24:59 +0300https://eccc.weizmann.ac.il/report/2019/088#revision1 Paper TR19-088 | On the Complexity of Estimating the Effective Support Size | Oded Goldreich https://eccc.weizmann.ac.il/report/2019/088Loosely speaking, the effective support size of a distribution is the size of the support of a distribution that is close to it (in totally variation distance). We study the complexity of estimating the effective support size of an unknown distribution when given samples of the distributions as well as an evaluation oracle (which returns the probability that the queried element appears in the distribution). In this context, we present several algorithms that exhibit a trade-off between the quality of the approximation and the complexity of obtaining it, and leave open the question of their optimality. Sun, 16 Jun 2019 18:33:01 +0300https://eccc.weizmann.ac.il/report/2019/088