__
TR22-052 | 18th April 2022 10:30
__

#### Verifying The Unseen: Interactive Proofs for Label-Invariant Distribution Properties

**Abstract:**
Given i.i.d. samples from an unknown distribution over a large domain $[N]$, approximating several basic quantities, including the distribution's support size, its entropy, and its distance from the uniform distribution, requires $\Theta(N / \log N)$ samples [Valiant and Valiant, STOC 2011].

Suppose, however, that we can interact with a powerful but untrusted prover, who knows the entire distribution (or a good approximation of it). Can we use such a prover to approximate (or rather, to approximately {\em verify}) such statistical quantities more efficiently? We show that this is indeed the case: the support size, the entropy, and the distance from the uniform distribution, can all be approximately verified via a 2-message interactive proof, where the communication complexity, the verifier's running time, and the sample complexity are $\widetilde{O}({\sqrt{N}})$. For all these quantities, the sample complexity is tight up to $\polylog N$ factors (for any interactive proof, regardless of its communication complexity or verification time).

More generally, we give a tolerant interactive proof system with the above sample and communication complexities for verifying a distribution's proximity to any label-invariant property (any property that is invariant to re-labeling of the elements in the distribution's support). The verifier's running time in this more general protocol is also $\widetilde{O}({\sqrt{N}})$, under a mild assumption about the complexity of deciding, given a compact representation of a distribution, whether it is in the property or far from it.