Sourav Chakraborty, Eldar Fischer, Yonatan Goldhirsh, Arie Matsliah

In this paper we define and examine the power of the conditional-sampling oracle in the context of distribution-property testing. The conditional-sampling oracle for a discrete distribution $\mu$ takes as input a subset $S \subset [n]$ of the domain, and outputs a random sample $i \in S$ drawn according to $\mu$, ... more >>>

Gregory Valiant, Paul Valiant

We consider the problem of verifying the identity of a distribution: Given the description of a distribution over a discrete support $p=(p_1,p_2,\ldots,p_n)$, how many samples (independent draws) must one obtain from an unknown distribution, $q$, to distinguish, with high probability, the case that $p=q$ from the case that the total ... more >>>

Jayadev Acharya, Clement Canonne, Gautam Kamath

A recent model for property testing of probability distributions enables tremendous savings in the sample complexity of testing algorithms, by allowing them to condition the sampling on subsets of the domain.

In particular, Canonne et al. showed that, in this setting, testing identity of an unknown distribution $D$ (i.e., ...
more >>>

Clement Canonne

The field of property testing originated in work on program checking, and has evolved into an established and very active research area. In this work, we survey the developments of one of its most recent and prolific offspring, distribution testing. This subfield, at the junction of property testing and Statistics, ... more >>>

Oded Goldreich

Inspired by Diakonikolas and Kane (2016), we reduce the class of problems consisting of testing whether an unknown distribution over $[n]$ equals a fixed distribution to this very problem when the fixed distribution is uniform over $[n]$. Our reduction preserves the parameters of the problem, which are $n$ and the ... more >>>

Ilias Diakonikolas, Daniel Kane

We study problems in distribution property testing:

Given sample access to one or more unknown discrete distributions,

we want to determine whether they have some global property or are $\epsilon$-far

from having the property in $\ell_1$ distance (equivalently, total variation distance, or ``statistical distance'').

In this work, we give a ...
more >>>

Eric Blais, Clement Canonne, Tom Gur

We present a new methodology for proving distribution testing lower bounds, establishing a connection between distribution testing and the simultaneous message passing (SMP) communication model. Extending the framework of Blais, Brody, and Matulef [BBM12], we show a simple way to reduce (private-coin) SMP problems to distribution testing problems. This method ... more >>>

Ilias Diakonikolas, Daniel Kane, Alistair Stewart

We prove the first {\em Statistical Query lower bounds} for two fundamental high-dimensional learning problems involving Gaussian distributions: (1) learning Gaussian mixture models (GMMs), and (2) robust (agnostic) learning of a single unknown mean Gaussian. In particular, we show a {\em super-polynomial gap} between the (information-theoretic) sample complexity and the ... more >>>

Constantinos Daskalakis, Nishanth Dikkala, Gautam Kamath

Given samples from an unknown multivariate distribution $p$, is it possible to distinguish whether $p$ is the product of its marginals versus $p$ being far from every product distribution? Similarly, is it possible to distinguish whether $p$ equals a given distribution $q$ versus $p$ and $q$ being far from each ... more >>>

Clement Canonne, Ilias Diakonikolas, Alistair Stewart

We study the general problem of testing whether an unknown discrete distribution belongs to a given family of distributions. More specifically, given a class of distributions $\mathcal{P}$ and sample access to an unknown distribution $\mathbf{P}$, we want to distinguish (with high probability) between the case that $\mathbf{P} \in \mathcal{P}$ and ... more >>>

Ilias Diakonikolas, Daniel Kane, Alistair Stewart

We study the problem of {\em generalized uniformity testing}~\cite{BC17} of a discrete probability distribution: Given samples from a probability distribution $p$ over an {\em unknown} discrete domain $\mathbf{\Omega}$, we want to distinguish, with probability at least $2/3$, between the case that $p$ is uniform on some {\em subset} of $\mathbf{\Omega}$ ... more >>>

Ilias Diakonikolas, Themis Gouleakis, John Peebles, Eric Price

We study the problem of testing identity against a given distribution (a.k.a. goodness-of-fit) with a focus on the high confidence regime. More precisely, given samples from an unknown distribution $p$ over $n$ elements, an explicitly given distribution $q$, and parameters $0< \epsilon, \delta < 1$, we wish to distinguish, {\em ... more >>>

Alessandro Chiesa, Tom Gur

Distribution testing is an area of property testing that studies algorithms that receive few samples from a probability distribution D and decide whether D has a certain property or is far (in total variation distance) from all distributions with that property. Most natural properties of distributions, however, require a large ... more >>>

Constantinos Daskalakis, Gautam Kamath, John Wright

Given samples from an unknown distribution $p$ and a description of a distribution $q$, are $p$ and $q$ close or far? This question of "identity testing" has received significant attention in the case of testing whether $p$ and $q$ are equal or far in total variation distance. However, in recent ... more >>>

Jayadev Acharya, Clement Canonne, Himanshu Tyagi

Independent samples from an unknown probability distribution $\mathbf{p}$ on a domain of size $k$ are distributed across $n$ players, with each player holding one sample. Each player can communicate $\ell$ bits to a central referee in a simultaneous message passing (SMP) model of communication to help the referee infer a ... more >>>

Gautam Kamath, Christos Tzamos

We investigate distribution testing with access to non-adaptive conditional samples.

In the conditional sampling model, the algorithm is given the following access to a distribution: it submits a query set $S$ to an oracle, which returns a sample from the distribution conditioned on being from $S$.

In the non-adaptive setting, ...
more >>>

Oded Goldreich

Loosely speaking, the effective support size of a distribution is the size of the support of a distribution that is close to it (in totally variation distance).

We study the complexity of estimating the effective support size of an unknown distribution when given samples of the distributions as well ...
more >>>

Jayadev Acharya, Clement Canonne, Yanjun Han, Ziteng Sun, Himanshu Tyagi

We study goodness-of-fit of discrete distributions in the distributed setting, where samples are divided between multiple users who can only release a limited amount of information about their samples due to various information constraints. Recently, a subset of the authors showed that having access to a common random seed (i.e., ... more >>>

Clement Canonne, Xi Chen, Gautam Kamath, Amit Levi, Erik Waingarten

We give a nearly-optimal algorithm for testing uniformity of distributions supported on $\{-1,1\}^n$, which makes $\tilde O (\sqrt{n}/\varepsilon^2)$ queries to a subcube conditional sampling oracle (Bhattacharyya and Chakraborty (2018)). The key technical component is a natural notion of random restriction for distributions on $\{-1,1\}^n$, and a quantitative analysis of how ... more >>>

Shafi Goldwasser, Guy Rothblum, Jonathan Shafer, Amir Yehudayoff

We consider the following question: using a source of labeled data and interaction with an untrusted prover, what is the complexity of verifying that a given hypothesis is "approximately correct"? We study interactive proof systems for PAC verification, where a verifier that interacts with a prover is required to accept ... more >>>

Clement Canonne, Karl Wimmer

Motivated by the question of data quantization and "binning," we revisit the problem of identity testing of discrete probability distributions. Identity testing (a.k.a. one-sample testing), a fundamental and by now well-understood problem in distribution testing, asks, given a reference distribution (model) $\mathbf{q}$ and samples from an unknown distribution $\mathbf{p}$, both ... more >>>