Sourav Chakraborty, Eldar Fischer, Yonatan Goldhirsh, Arie Matsliah

In this paper we define and examine the power of the conditional-sampling oracle in the context of distribution-property testing. The conditional-sampling oracle for a discrete distribution $\mu$ takes as input a subset $S \subset [n]$ of the domain, and outputs a random sample $i \in S$ drawn according to $\mu$, ... more >>>

Gregory Valiant, Paul Valiant

We consider the problem of verifying the identity of a distribution: Given the description of a distribution over a discrete support $p=(p_1,p_2,\ldots,p_n)$, how many samples (independent draws) must one obtain from an unknown distribution, $q$, to distinguish, with high probability, the case that $p=q$ from the case that the total ... more >>>

Jayadev Acharya, Clement Canonne, Gautam Kamath

A recent model for property testing of probability distributions enables tremendous savings in the sample complexity of testing algorithms, by allowing them to condition the sampling on subsets of the domain.

In particular, Canonne et al. showed that, in this setting, testing identity of an unknown distribution $D$ (i.e., ...
more >>>

Clement Canonne

The field of property testing originated in work on program checking, and has evolved into an established and very active research area. In this work, we survey the developments of one of its most recent and prolific offspring, distribution testing. This subfield, at the junction of property testing and Statistics, ... more >>>

Oded Goldreich

Inspired by Diakonikolas and Kane (2016), we reduce the class of problems consisting of testing whether an unknown distribution over $[n]$ equals a fixed distribution to this very problem when the fixed distribution is uniform over $[n]$. Our reduction preserves the parameters of the problem, which are $n$ and the ... more >>>

Ilias Diakonikolas, Daniel Kane

We study problems in distribution property testing:

Given sample access to one or more unknown discrete distributions,

we want to determine whether they have some global property or are $\epsilon$-far

from having the property in $\ell_1$ distance (equivalently, total variation distance, or ``statistical distance'').

In this work, we give a ...
more >>>

Eric Blais, Clement Canonne, Tom Gur

We present a new methodology for proving distribution testing lower bounds, establishing a connection between distribution testing and the simultaneous message passing (SMP) communication model. Extending the framework of Blais, Brody, and Matulef [BBM12], we show a simple way to reduce (private-coin) SMP problems to distribution testing problems. This method ... more >>>

Ilias Diakonikolas, Daniel Kane, Alistair Stewart

We prove the first {\em Statistical Query lower bounds} for two fundamental high-dimensional learning problems involving Gaussian distributions: (1) learning Gaussian mixture models (GMMs), and (2) robust (agnostic) learning of a single unknown mean Gaussian. In particular, we show a {\em super-polynomial gap} between the (information-theoretic) sample complexity and the ... more >>>

Constantinos Daskalakis, Nishanth Dikkala, Gautam Kamath

Given samples from an unknown multivariate distribution $p$, is it possible to distinguish whether $p$ is the product of its marginals versus $p$ being far from every product distribution? Similarly, is it possible to distinguish whether $p$ equals a given distribution $q$ versus $p$ and $q$ being far from each ... more >>>

Clement Canonne, Ilias Diakonikolas, Alistair Stewart

We study the general problem of testing whether an unknown discrete distribution belongs to a given family of distributions. More specifically, given a class of distributions $\mathcal{P}$ and sample access to an unknown distribution $\mathbf{P}$, we want to distinguish (with high probability) between the case that $\mathbf{P} \in \mathcal{P}$ and ... more >>>

Ilias Diakonikolas, Daniel Kane, Alistair Stewart

We study the problem of {\em generalized uniformity testing}~\cite{BC17} of a discrete probability distribution: Given samples from a probability distribution $p$ over an {\em unknown} discrete domain $\mathbf{\Omega}$, we want to distinguish, with probability at least $2/3$, between the case that $p$ is uniform on some {\em subset} of $\mathbf{\Omega}$ ... more >>>

Ilias Diakonikolas, Themis Gouleakis, John Peebles, Eric Price

We study the problem of testing identity against a given distribution (a.k.a. goodness-of-fit) with a focus on the high confidence regime. More precisely, given samples from an unknown distribution $p$ over $n$ elements, an explicitly given distribution $q$, and parameters $0< \epsilon, \delta < 1$, we wish to distinguish, {\em ... more >>>

Alessandro Chiesa, Tom Gur

Distribution testing is an area of property testing that studies algorithms that receive few samples from a probability distribution D and decide whether D has a certain property or is far (in total variation distance) from all distributions with that property. Most natural properties of distributions, however, require a large ... more >>>

Constantinos Daskalakis, Gautam Kamath, John Wright

Given samples from an unknown distribution $p$ and a description of a distribution $q$, are $p$ and $q$ close or far? This question of "identity testing" has received significant attention in the case of testing whether $p$ and $q$ are equal or far in total variation distance. However, in recent ... more >>>

Jayadev Acharya, Clement Canonne, Himanshu Tyagi

Independent samples from an unknown probability distribution $\mathbf{p}$ on a domain of size $k$ are distributed across $n$ players, with each player holding one sample. Each player can communicate $\ell$ bits to a central referee in a simultaneous message passing (SMP) model of communication to help the referee infer a ... more >>>

Gautam Kamath, Christos Tzamos

We investigate distribution testing with access to non-adaptive conditional samples.

In the conditional sampling model, the algorithm is given the following access to a distribution: it submits a query set $S$ to an oracle, which returns a sample from the distribution conditioned on being from $S$.

In the non-adaptive setting, ...
more >>>

Oded Goldreich

Loosely speaking, the effective support size of a distribution is the size of the support of a distribution that is close to it (in totally variation distance).

We study the complexity of estimating the effective support size of an unknown distribution when given samples of the distributions as well ...
more >>>

Jayadev Acharya, Clement Canonne, Yanjun Han, Ziteng Sun, Himanshu Tyagi

We study goodness-of-fit of discrete distributions in the distributed setting, where samples are divided between multiple users who can only release a limited amount of information about their samples due to various information constraints. Recently, a subset of the authors showed that having access to a common random seed (i.e., ... more >>>

Clement Canonne, Xi Chen, Gautam Kamath, Amit Levi, Erik Waingarten

We give a nearly-optimal algorithm for testing uniformity of distributions supported on $\{-1,1\}^n$, which makes $\tilde O (\sqrt{n}/\varepsilon^2)$ queries to a subcube conditional sampling oracle (Bhattacharyya and Chakraborty (2018)). The key technical component is a natural notion of random restriction for distributions on $\{-1,1\}^n$, and a quantitative analysis of how ... more >>>

Shafi Goldwasser, Guy Rothblum, Jonathan Shafer, Amir Yehudayoff

We consider the following question: using a source of labeled data and interaction with an untrusted prover, what is the complexity of verifying that a given hypothesis is "approximately correct"? We study interactive proof systems for PAC verification, where a verifier that interacts with a prover is required to accept ... more >>>

Clement Canonne, Karl Wimmer

Motivated by the question of data quantization and "binning," we revisit the problem of identity testing of discrete probability distributions. Identity testing (a.k.a. one-sample testing), a fundamental and by now well-understood problem in distribution testing, asks, given a reference distribution (model) $\mathbf{q}$ and samples from an unknown distribution $\mathbf{p}$, both ... more >>>

Ilias Diakonikolas, Themis Gouleakis, Daniel Kane, John Peebles, Eric Price

We study the problem of testing discrete distributions with a focus on the high probability regime.

Specifically, given samples from one or more discrete distributions, a property $\mathcal{P}$, and

parameters $0< \epsilon, \delta <1$, we want to distinguish {\em with probability at least $1-\delta$}

whether these distributions satisfy $\mathcal{P}$ ...
more >>>

Oded Goldreich, Dana Ron

A distribution is called $m$-grained if each element appears with probability that is an integer multiple of $1/m$.

We prove that, for any constant $c<1$, testing whether a distribution over $[\Theta(m)]$ is $m$-grained requires $\Omega(m^c)$ samples.

Oded Goldreich, Dana Ron

We initiate a study of a new model of property testing that is a hybrid of testing properties of distributions and testing properties of strings.

Specifically, the new model refers to testing properties of distributions, but these are distributions over huge objects (i.e., very long strings).

Accordingly, the ...
more >>>

Sourav Chakraborty, Eldar Fischer, Arijit Ghosh, Gopinath Mishra, Sayantan Sen

The study of distribution testing has become ubiquitous in the area of property testing, both for its theoretical appeal, as well as for its applications in other fields of Computer Science, and in various real-life statistical tasks.

The original distribution testing model relies on samples drawn independently from the distribution ... more >>>

Tal Herman, Guy Rothblum

Suppose we have access to a small number of samples from an unknown distribution, and would like learn facts about the distribution.

An untrusted data server claims to have studied the distribution and makes assertions about its properties. Can the untrusted data server prove that its assertions are approximately correct? ...
more >>>

Tal Herman

Assume we are given sample access to an unknown distribution $D$ over a large domain $[N]$. An emerging line of work has demonstrated that many basic quantities relating to the distribution, such as its distance from uniform and its Shannon entropy, despite being hard to approximate through the samples only, ... more >>>