We consider the problem of verifying the identity of a distribution: Given the description of a distribution over a discrete support $p=(p_1,p_2,\ldots,p_n)$, how many samples (independent draws) must one obtain from an unknown distribution, $q$, to distinguish, with high probability, the case that $p=q$ from the case that the total variation distance ($L_1$ distance) $||p-q||_1 \ge \epsilon$? We resolve this question, up to constant factors, on an instance by instance basis: there exist universal constants $c,c'$ and a function $f(p,\epsilon)$ on distributions and error parameters, such that our tester distinguishes $p=q$ from $||p-q||_1\ge \epsilon$ using $f(p,\epsilon)$ samples with success probability $>2/3$, but no tester can distinguish $p=q$ from $||p-q||_1\ge c\cdot \epsilon$ when given $c'\cdot f(p,\epsilon)$ samples. The function $f(p,\epsilon)$ is upper-bounded by a multiple of $\frac{||p||_{2/3}}{\epsilon^2}$, but is more complicated, and is significantly smaller in cases when $p$ has many small domain elements. This result significantly generalizes and tightens previous results: since distributions of support at most $n$ have $L_{2/3}$ norm bounded by $\sqrt{n},$ this result immediately shows that for such distributions, $O(\sqrt{n}/{\epsilon^2})$ samples suffice, tightening the previous bound of $O(\frac{\sqrt{n} polylog n}{\epsilon^4})$ for this class of distributions, and matching the (tight) results for the case that $p$ is the uniform distribution over support $n$.
The analysis of our very simple testing algorithm involves several hairy inequalities. To facilitate this analysis, we give a complete characterization of a general class of inequalities---generalizing Cauchy-Schwarz, Holder's inequality, and the monotonicity of $L_p$ norms. Specifically, we characterize the set of sequences $a=a_1,\ldots,a_m,$ $b=b_1,\ldots,b_m,$ $c=c_1\ldots,c_m,$ for which it holds that for all finite sequences of positive numbers $x=x_1,\ldots$ and $y=y_1,\ldots,$ $$\prod_i \left( \sum_j x_j^{a_i} y_j^{b_i}\right)^{c_i} \ge 1.$$ The characterization is of a perhaps non-traditional nature in that it uses linear programming to compute a derivation that may otherwise have to be sought through trial and error, by hand. We do not believe such a characterization has appeared in the literature, and hope its computational nature will facilitate analyses like the one here.