TR24-094 Authors: Tal Herman, Guy Rothblum

Publication: 20th May 2024 04:57

Downloads: 242

Keywords:

Suppose Alice has collected a small number of samples from an unknown distribution, and would like to learn about the distribution. Bob, an untrusted data analyst, claims that he ran a sophisticated data analysis on the distribution, and makes assertions about its properties. Can Alice efficiently verify Bob's claims using fewer resources (say in terms of samples and computation) than would be needed to run the analysis herself?

We construct an interactive proof system for any distribution property that can be decided by uniform polynomial-size circuits of bounded depth: the circuit gets a complete description of the distribution and decides whether it has the property. Taking $N$ to be an upper bound on the size of the distribution's support, the verifier's sample complexity, running time, and the communication complexity are all sublinear in $N$: they are bounded by $\widetilde{O}( N^{1-\alpha} + D)$ for a constant $\alpha > 0$, where $D$ is a bound on the depth of the circuits that decide the property. The honest prover runs in $\text{poly}(N)$ time and has quasi-linear sample complexity. Moreover, the proof system is tolerant: it can be used to approximate the distribution's distance from the property. We show similar results for any distribution property that can be decided by a bounded-space Turing machine (that gets as input a complete description of the distribution).

We remark that even for simple properties, deciding the property without a prover requires quasi-linear sample complexity and running time. Prior work [Herman and Rothblum, FOCS 2023] demonstrated sublinear interactive proof systems, but only for the much more restricted class of label-invariant distribution properties.