Collision-based Testers are Optimal for Uniformity and Closeness

Ilias Diakonikolas; Themis Gouleakis; John Peebles; Eric Price

Electronic Colloquium on Computational Complexity

Under the auspices of the Computational Complexity Foundation (CCF)

REPORTS > DETAIL:

Paper:

TR16-178 | 11th November 2016 04:51

Collision-based Testers are Optimal for Uniformity and Closeness

TR16-178 Authors: Ilias Diakonikolas, Themis Gouleakis, John Peebles, Eric Price
Publication: 12th November 2016 09:34
Downloads: 3443

Keywords:

Abstract:

We study the fundamental problems of (i) uniformity testing of a discrete distribution,
and (ii) closeness testing between two discrete distributions with bounded $\ell_2$-norm.
These problems have been extensively studied in distribution testing
and sample-optimal estimators are known for them~\cite{Paninski:08, CDVV14, VV14, DKN:15}.

In this work, we show that the original collision-based testers proposed for these problems
~\cite{GRdist:00, BFR+:00} are sample-optimal, up to constant factors.
Previous analyses showed sample complexity upper bounds for these testers that are optimal
as a function of the domain size $n$, but suboptimal by polynomial factors
in the error parameter $\epsilon$. Our main contribution is a new tight analysis
establishing that these collision-based testers are information-theoretically optimal,
up to constant factors, both in the dependence on $n$ and in the dependence on $\epsilon$.

Comment(s):

Comment #1 to TR16-178 | 6th September 2017 16:30

On the optimal analysis of the collision probability tester

Comment #1 Authors: Oded Goldreich
Accepted on: 6th September 2017 16:30
Downloads: 1841

Keywords:

Abstract:

The collision probability tester, introduced by Goldreich and Ron ({\em ECCC}, TR00-020, 2000),
distinguishes the uniform distribution over $[n]$ from any distribution that is $\e$-far from this distribution using $\poly(1/\epilon)\cdot{\sqrt n}$ samples.
While the original analysis established only an
upper bound of $O(1/\epsilon)^4\cdot{\sqrt n}$ on the sample complexity, a recent analysis of Diakonikolas, Gouleakis, Peebles, and Price
({\em ECCC}, TR16-178, 2016) established the optimal upper bound of $O(1/\epsilon)^2\cdot{\sqrt n}$.
In this note we survey their analysis,
while highlighting the sources of improvement. Specifically:
\begin{itemize}
\item
While the original analysis reduces the testing problem to approximating the collision probability of the unknown distribution up to a $1+\epsilon^2$ factor, the improved analysis capitalizes on the fact that the latter problem
needs only be solved ``at the extreme'' (i.e., it suffices to distinguish the uniform distribution, which has collision probability $1/n$, from any distribution that has collision probability exceeding $(1+4\epsilon^2)/n$).
\item
While the original analysis provides an almost
optimal analysis of the variance of the estimator when $\epsilon=\Omega(1)$, a more careful analysis yields a significantly better bound for the case of $\epsilon=o(1)$, which is the case that is relevant here.
\end{itemize}

ISSN 1433-8092 | Imprint