Revision #1 Authors: Clement Canonne, Ilias Diakonikolas, Alistair Stewart

Accepted on: 8th August 2017 06:03

Downloads: 471

Keywords:

We study the general problem of testing whether an unknown distribution belongs to a specified family of distributions. More specifically, given a distribution family $\mathcal{P}$ and sample access to an unknown discrete distribution $\mathbf{P}$, we want to distinguish (with high probability) between the case that $\mathbf{P} \in \mathcal{P}$ and the case that $\mathbf{P}$ is $\epsilon$-far, in total variation distance, from every distribution in $\mathcal{P}$. This is the prototypical hypothesis testing problem that has received significant attention in statistics and, more recently, in theoretical computer science.

The sample complexity of this general inference task depends on the underlying family $\mathcal{P}$. The gold standard in distribution property testing is to design sample-optimal and computationally efficient algorithms for this task. The main contribution of this work is a simple and general testing technique that is applicable to all distribution families whose Fourier spectrum satisfies a certain approximate sparsity property. To the best of our knowledge, ours is the first use of the Fourier transform in the context of distribution testing.

We apply our Fourier-based framework to obtain near sample-optimal and computationally efficient testers for the following fundamental distribution families: Sums of Independent Integer Random Variables (SIIRVs), Poisson Multinomial Distributions (PMDs), and Discrete Log-Concave Distributions. For the first two, ours are the first non-trivial testers in the literature, vastly generalizing previous work on testing Poisson Binomial Distributions. For the third, our tester improves on prior work in both sample and time complexity.

Restructured, and fixed some typos.

TR17-075 Authors: Clement Canonne, Ilias Diakonikolas, Alistair Stewart

Publication: 29th April 2017 21:31

Downloads: 666

Keywords:

We study the general problem of testing whether an unknown discrete distribution belongs to a given family of distributions. More specifically, given a class of distributions $\mathcal{P}$ and sample access to an unknown distribution $\mathbf{P}$, we want to distinguish (with high probability) between the case that $\mathbf{P} \in \mathcal{P}$ and the case that $\mathbf{P}$ is $\epsilon$-far, in total variation distance, from every distribution in $\mathcal{P}$. This is the prototypical hypothesis testing problem that has received significant attention in statistics and,

more recently, in theoretical computer science.

The sample complexity of this general problem depends on the underlying family $\mathcal{P}$. We are interested in designing sample-optimal and computationally efficient algorithms for this task. The main contribution of this work is a new and simple testing technique that is applicable to distribution families whose Fourier spectrum approximately satisfies a certain sparsity property. As the main applications of our Fourier-based testing technique, we obtain the first non-trivial testers for two fundamental families of discrete distributions: Sums of Independent Integer Random Variables (SIIRVs) and Poisson Multinomial Distributions (PMDs). Our testers for these families are nearly sample-optimal and computationally efficient. We also obtain a tester with improved sample complexity for discrete log-concave distributions. To the best of our knowledge, ours is the first use of the Fourier transform in the context of distribution testing.