We study the fundamental problems of (i) uniformity testing of a discrete distribution,
and (ii) closeness testing between two discrete distributions with bounded $\ell_2$-norm.
These problems have been extensively studied in distribution testing
and sample-optimal estimators are known for them~\cite{Paninski:08, CDVV14, VV14, DKN:15}.
In this work, we show that the original collision-based testers proposed for these problems~\cite{GRdist:00, BFR+:00} are sample-optimal, up to constant factors.
Previous analyses showed sample complexity upper bounds for these testers that are optimal
as a function of the domain size $n$, but suboptimal by polynomial factors
in the error parameter $\epsilon$. Our main contribution is a new tight analysis
establishing that these collision-based testers are information-theoretically optimal,
up to constant factors, both in the dependence on $n$ and in the dependence on $\epsilon$.
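For concreteness, the following is a minimal Python sketch of a collision-based uniformity tester in the spirit of~\cite{GRdist:00}; the function name, the acceptance threshold $(1+2\epsilon^2)/n$, and the comments on sample size are illustrative assumptions made here, not the exact constants from the works cited above.
\begin{verbatim}
from collections import Counter

def collision_uniformity_tester(samples, n, eps):
    # samples: a list of m draws from the unknown distribution over {0,...,n-1};
    # for the optimal guarantee one would take m on the order of sqrt(n)/eps^2.
    m = len(samples)
    # Number of colliding pairs among the m samples.
    colliding_pairs = sum(c * (c - 1) // 2 for c in Counter(samples).values())
    # Empirical collision probability: fraction of the C(m,2) pairs that collide.
    estimate = colliding_pairs / (m * (m - 1) / 2)
    # Uniform has collision probability 1/n, while eps-far distributions exceed
    # (1 + 4*eps^2)/n; accept below an intermediate (illustrative) threshold.
    return estimate <= (1 + 2 * eps**2) / n
\end{verbatim}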
The collision probability tester, introduced by Goldreich and Ron ({\em ECCC}, TR00-020, 2000),
distinguishes the uniform distribution over $[n]$ from any distribution that is $\epsilon$-far from it, using $\poly(1/\epsilon)\cdot\sqrt{n}$ samples.
While the original analysis established only an
upper bound of $O(\epsilon^{-4})\cdot\sqrt{n}$ on the sample complexity, a recent analysis by Diakonikolas, Gouleakis, Peebles, and Price
({\em ECCC}, TR16-178, 2016) yields the optimal upper bound of $O(\epsilon^{-2})\cdot\sqrt{n}$.
In this note we survey their analysis,
while highlighting the sources of improvement. Specifically:
\begin{itemize}
\item
While the original analysis reduces the testing problem to approximating the collision probability of the unknown distribution up to a $1+\epsilon^2$ factor, the improved analysis capitalizes on the fact that the latter problem
need only be solved ``at the extreme'' (i.e., it suffices to distinguish the uniform distribution, whose collision probability is $1/n$, from any distribution whose collision probability exceeds $(1+4\epsilon^2)/n$); a short derivation of this gap is given after the list.
\item
While the original analysis provides an almost
optimal bound on the variance of the estimator when $\epsilon=\Omega(1)$, a more careful analysis yields a significantly better bound for the case of $\epsilon=o(1)$, which is the relevant regime here (the underlying variance decomposition is sketched after the list).
\end{itemize}
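To spell out the gap mentioned in the first item: under the convention that ``$\epsilon$-far'' refers to total variation distance at least $\epsilon$ (equivalently, $\ell_1$-distance at least $2\epsilon$), a one-line calculation bounds the collision probability of any such distribution $p$:
\begin{align*}
\sum_{i\in[n]} p_i^2 \;=\; \frac{1}{n} + \|p-u_n\|_2^2 \;\ge\; \frac{1}{n} + \frac{\|p-u_n\|_1^2}{n} \;\ge\; \frac{1+4\epsilon^2}{n},
\end{align*}
where $u_n$ denotes the uniform distribution over $[n]$ and the middle inequality is Cauchy--Schwarz; the uniform distribution itself has collision probability exactly $1/n$.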
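As for the second item, a standard decomposition (given here only as a sketch, not necessarily in the exact form used in either analysis) indicates where the variance bound comes from. Writing $Y=\sum_{1\le i<j\le m}\mathbf{1}[X_i=X_j]$ for the number of colliding pairs among $m$ samples from $p$, and $p_2=\sum_i p_i^2$, $p_3=\sum_i p_i^3$, one has
\begin{align*}
\mathrm{Var}[Y] \;=\; \binom{m}{2}\left(p_2 - p_2^2\right) \;+\; 6\binom{m}{3}\left(p_3 - p_2^2\right).
\end{align*}
The covariance term involves $p_3$, which can always be bounded by $p_2^{3/2}$ (since $\|p\|_3\le\|p\|_2$); in the near-uniform regime relevant when $\epsilon=o(1)$, both $p_2$ and $p_3$ are close to their uniform values $1/n$ and $1/n^2$, which is exactly what a more careful analysis can exploit.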