Revision #1 Authors: Tal Herman, Guy Rothblum

Accepted on: 5th April 2024 18:24

Downloads: 8

Keywords:

Suppose we have access to a small number of samples from an unknown distribution, and would like learn facts about the distribution.

An untrusted data server claims to have studied the distribution and makes assertions about its properties. Can the untrusted data server prove that its assertions are approximately correct? Can a short efficiently verifiable proof be generated in polynomial time?

We study doubly-efficient interactive proof systems that can be used to verify properties of an unknown distribution over a domain $[N]$. On top of efficient verification, our focus is on proofs that the honest prover can generate in polynomial time. More generally, the complexity of generating the proof should be as close as possible to the complexity of simply running a standalone analysis to determine whether the distribution has the property.

Our main result is a new 2-message doubly-efficient interactive proof protocol for verifying any label-invariant distribution property (any property that is invariant to re-labeling of the elements in the domain of the distribution). The sample complexity, communication complexity and verifier runtime are all $\widetilde{O}({\sqrt{N}})$. The proof can be generated in quasi-linear $\widetilde{O}(N)$ time and sample complexities (the runtimes of the verifier and the honest prover hold under a mild assumption about the property's computational complexity). This improves on prior work, where constructing the proof required super-polynomial time [Herman and Rothblum, STOC 2022]. Our new proof system is directly applicable to proving (and verifying) several natural and widely-studied properties, such as a distribution's support size, its Shannon entropy, and its distance from the uniform distribution. For these (and many other) properties, the runtime and sample complexities for generating the proof are within $\text{polylog} (N)$ factors of the complexities for simply determining whether the property holds.

TR23-161 Authors: Tal Herman, Guy Rothblum

Publication: 1st November 2023 19:14

Downloads: 236

Keywords:

Suppose we have access to a small number of samples from an unknown distribution, and would like learn facts about the distribution.

An untrusted data server claims to have studied the distribution and makes assertions about its properties. Can the untrusted data server prove that its assertions are approximately correct? Can a short efficiently verifiable proof be generated in polynomial time?

We study doubly-efficient interactive proof systems that can be used to verify properties of an unknown distribution over a domain $[N]$. On top of efficient verification, our focus is on proofs that the honest prover can generate in polynomial time. More generally, the complexity of generating the proof should be as close as possible to the complexity of simply running a standalone analysis to determine whether the distribution has the property.

Our main result is a new 2-message doubly-efficient interactive proof protocol for verifying any label-invariant distribution property (any property that is invariant to re-labeling of the elements in the domain of the distribution). The sample complexity, communication complexity and verifier runtime are all $\widetilde{O}({\sqrt{N}})$. The proof can be generated in quasi-linear $\widetilde{O}(N)$ time and sample complexities (the runtimes of the verifier and the honest prover hold under a mild assumption about the property's computational complexity). This improves on prior work, where constructing the proof required super-polynomial time [Herman and Rothblum, STOC 2022]. Our new proof system is directly applicable to proving (and verifying) several natural and widely-studied properties, such as a distribution's support size, its Shannon entropy, and its distance from the uniform distribution. For these (and many other) properties, the runtime and sample complexities for generating the proof are within $\text{polylog} (N)$ factors of the complexities for simply determining whether the property holds.