ECCC - Reports tagged with distribution testing

Electronic Colloquium on Computational Complexity

Under the auspices of the Computational Complexity Foundation (CCF)

REPORTS > KEYWORD > DISTRIBUTION TESTING:

Reports tagged with distribution testing:

TR12-154 | 31st October 2012
Sourav Chakraborty, Eldar Fischer, Yonatan Goldhirsh, Arie Matsliah

On the Power of Conditional Samples in Distribution Testing

Revisions: 1

In this paper we define and examine the power of the conditional-sampling oracle in the context of distribution-property testing. The conditional-sampling oracle for a discrete distribution $\mu$ takes as input a subset $S \subset [n]$ of the domain, and outputs a random sample $i \in S$ drawn according to $\mu$, ... more >>>

TR13-111 | 17th August 2013
Gregory Valiant, Paul Valiant

Instance-by-instance optimal identity testing

We consider the problem of verifying the identity of a distribution: Given the description of a distribution over a discrete support $p=(p_1,p_2,\ldots,p_n)$, how many samples (independent draws) must one obtain from an unknown distribution, $q$, to distinguish, with high probability, the case that $p=q$ from the case that the total ... more >>>

TR14-156 | 26th November 2014
Jayadev Acharya, Clement Canonne, Gautam Kamath

A Chasm Between Identity and Equivalence Testing with Conditional Queries

Revisions: 2

A recent model for property testing of probability distributions enables tremendous savings in the sample complexity of testing algorithms, by allowing them to condition the sampling on subsets of the domain.
In particular, Canonne et al. showed that, in this setting, testing identity of an unknown distribution $D$ (i.e., ... more >>>

TR15-063 | 15th April 2015
Clement Canonne

A Survey on Distribution Testing: Your Data is Big. But is it Blue?

Revisions: 1

The field of property testing originated in work on program checking, and has evolved into an established and very active research area. In this work, we survey the developments of one of its most recent and prolific offspring, distribution testing. This subfield, at the junction of property testing and Statistics, ... more >>>

TR16-015 | 4th February 2016
Oded Goldreich

The uniform distribution is complete with respect to testing identity to a fixed distribution

Revisions: 3

Inspired by Diakonikolas and Kane (2016), we reduce the class of problems consisting of testing whether an unknown distribution over $[n]$ equals a fixed distribution to this very problem when the fixed distribution is uniform over $[n]$. Our reduction preserves the parameters of the problem, which are $n$ and the ... more >>>

TR16-074 | 9th May 2016
Ilias Diakonikolas, Daniel Kane

A New Approach for Testing Properties of Discrete Distributions

We study problems in distribution property testing:
Given sample access to one or more unknown discrete distributions,
we want to determine whether they have some global property or are $\epsilon$-far
from having the property in $\ell_1$ distance (equivalently, total variation distance, or ``statistical distance'').
In this work, we give a ... more >>>

TR16-168 | 2nd November 2016
Eric Blais, Clement Canonne, Tom Gur

Alice and Bob Show Distribution Testing Lower Bounds (They don't talk to each other anymore.)

Revisions: 1

We present a new methodology for proving distribution testing lower bounds, establishing a connection between distribution testing and the simultaneous message passing (SMP) communication model. Extending the framework of Blais, Brody, and Matulef [BBM12], we show a simple way to reduce (private-coin) SMP problems to distribution testing problems. This method ... more >>>

TR16-177 | 11th November 2016
Ilias Diakonikolas, Daniel Kane, Alistair Stewart

Statistical Query Lower Bounds for Robust Estimation of High-dimensional Gaussians and Gaussian Mixtures

Revisions: 1

We prove the first {\em Statistical Query lower bounds} for two fundamental high-dimensional learning problems involving Gaussian distributions: (1) learning Gaussian mixture models (GMMs), and (2) robust (agnostic) learning of a single unknown mean Gaussian. In particular, we show a {\em super-polynomial gap} between the (information-theoretic) sample complexity and the ... more >>>

TR17-006 | 15th December 2016
Constantinos Daskalakis, Nishanth Dikkala, Gautam Kamath

Testing Ising Models

Revisions: 2

Given samples from an unknown multivariate distribution $p$, is it possible to distinguish whether $p$ is the product of its marginals versus $p$ being far from every product distribution? Similarly, is it possible to distinguish whether $p$ equals a given distribution $q$ versus $p$ and $q$ being far from each ... more >>>

TR17-075 | 29th April 2017
Clement Canonne, Ilias Diakonikolas, Alistair Stewart

Fourier-Based Testing for Families of Distributions

Revisions: 1

We study the general problem of testing whether an unknown discrete distribution belongs to a given family of distributions. More specifically, given a class of distributions $\mathcal{P}$ and sample access to an unknown distribution $\mathbf{P}$, we want to distinguish (with high probability) between the case that $\mathbf{P} \in \mathcal{P}$ and ... more >>>

TR17-132 | 7th September 2017
Ilias Diakonikolas, Daniel Kane, Alistair Stewart

Sharp Bounds for Generalized Uniformity Testing

We study the problem of {\em generalized uniformity testing}~\cite{BC17} of a discrete probability distribution: Given samples from a probability distribution $p$ over an {\em unknown} discrete domain $\mathbf{\Omega}$, we want to distinguish, with probability at least $2/3$, between the case that $p$ is uniform on some {\em subset} of $\mathbf{\Omega}$ ... more >>>

TR17-133 | 7th September 2017
Ilias Diakonikolas, Themis Gouleakis, John Peebles, Eric Price

Sample-Optimal Identity Testing with High Probability

We study the problem of testing identity against a given distribution (a.k.a. goodness-of-fit) with a focus on the high confidence regime. More precisely, given samples from an unknown distribution $p$ over $n$ elements, an explicitly given distribution $q$, and parameters $0< \epsilon, \delta < 1$, we wish to distinguish, {\em ... more >>>

TR17-155 | 13th October 2017
Alessandro Chiesa, Tom Gur

Proofs of Proximity for Distribution Testing

Revisions: 1

Distribution testing is an area of property testing that studies algorithms that receive few samples from a probability distribution D and decide whether D has a certain property or is far (in total variation distance) from all distributions with that property. Most natural properties of distributions, however, require a large ... more >>>

TR18-002 | 31st December 2017
Constantinos Daskalakis, Gautam Kamath, John Wright

Which Distribution Distances are Sublinearly Testable?

Given samples from an unknown distribution $p$ and a description of a distribution $q$, are $p$ and $q$ close or far? This question of "identity testing" has received significant attention in the case of testing whether $p$ and $q$ are equal or far in total variation distance. However, in recent ... more >>>

TR18-079 | 19th April 2018
Jayadev Acharya, Clement Canonne, Himanshu Tyagi

Distributed Simulation and Distributed Inference

Revisions: 1

Independent samples from an unknown probability distribution $\mathbf{p}$ on a domain of size $k$ are distributed across $n$ players, with each player holding one sample. Each player can communicate $\ell$ bits to a central referee in a simultaneous message passing (SMP) model of communication to help the referee infer a ... more >>>

TR18-131 | 17th July 2018
Gautam Kamath, Christos Tzamos

Anaconda: A Non-Adaptive Conditional Sampling Algorithm for Distribution Testing

We investigate distribution testing with access to non-adaptive conditional samples.
In the conditional sampling model, the algorithm is given the following access to a distribution: it submits a query set $S$ to an oracle, which returns a sample from the distribution conditioned on being from $S$.
In the non-adaptive setting, ... more >>>

TR19-088 | 16th June 2019
Oded Goldreich

On the Complexity of Estimating the Effective Support Size

Revisions: 1

Loosely speaking, the effective support size of a distribution is the size of the support of a distribution that is close to it (in totally variation distance).
We study the complexity of estimating the effective support size of an unknown distribution when given samples of the distributions as well ... more >>>

TR19-098 | 20th July 2019
Jayadev Acharya, Clement Canonne, Yanjun Han, Ziteng Sun, Himanshu Tyagi

Domain Compression and its Application to Randomness-Optimal Distributed Goodness-of-Fit

We study goodness-of-fit of discrete distributions in the distributed setting, where samples are divided between multiple users who can only release a limited amount of information about their samples due to various information constraints. Recently, a subset of the authors showed that having access to a common random seed (i.e., ... more >>>

TR19-165 | 18th November 2019
Clement Canonne, Xi Chen, Gautam Kamath, Amit Levi, Erik Waingarten

Random Restrictions of High-Dimensional Distributions and Uniformity Testing with Subcube Conditioning

We give a nearly-optimal algorithm for testing uniformity of distributions supported on $\{-1,1\}^n$, which makes $\tilde O (\sqrt{n}/\varepsilon^2)$ queries to a subcube conditional sampling oracle (Bhattacharyya and Chakraborty (2018)). The key technical component is a natural notion of random restriction for distributions on $\{-1,1\}^n$, and a quantitative analysis of how ... more >>>

TR20-058 | 24th April 2020
Shafi Goldwasser, Guy Rothblum, Jonathan Shafer, Amir Yehudayoff

Interactive Proofs for Verifying Machine Learning

Revisions: 1

We consider the following question: using a source of labeled data and interaction with an untrusted prover, what is the complexity of verifying that a given hypothesis is "approximately correct"? We study interactive proof systems for PAC verification, where a verifier that interacts with a prover is required to accept ... more >>>

TR20-062 | 29th April 2020
Clement Canonne, Karl Wimmer

Testing Data Binnings

Motivated by the question of data quantization and "binning," we revisit the problem of identity testing of discrete probability distributions. Identity testing (a.k.a. one-sample testing), a fundamental and by now well-understood problem in distribution testing, asks, given a reference distribution (model) $\mathbf{q}$ and samples from an unknown distribution $\mathbf{p}$, both ... more >>>

TR20-140 | 14th September 2020
Ilias Diakonikolas, Themis Gouleakis, Daniel Kane, John Peebles, Eric Price

Optimal Testing of Discrete Distributions with High Probability

We study the problem of testing discrete distributions with a focus on the high probability regime.
Specifically, given samples from one or more discrete distributions, a property $\mathcal{P}$, and
parameters $0< \epsilon, \delta <1$, we want to distinguish {\em with probability at least $1-\delta$}
whether these distributions satisfy $\mathcal{P}$ ... more >>>

TR21-129 | 6th September 2021
Oded Goldreich, Dana Ron

A Lower Bound on the Complexity of Testing Grained Distributions

A distribution is called $m$-grained if each element appears with probability that is an integer multiple of $1/m$.
We prove that, for any constant $c<1$, testing whether a distribution over $[\Theta(m)]$ is $m$-grained requires $\Omega(m^c)$ samples.

more >>>

TR21-133 | 12th September 2021
Oded Goldreich, Dana Ron

Testing Distributions of Huge Objects

Revisions: 3

We initiate a study of a new model of property testing that is a hybrid of testing properties of distributions and testing properties of strings.
Specifically, the new model refers to testing properties of distributions, but these are distributions over huge objects (i.e., very long strings).
Accordingly, the ... more >>>

TR22-155 | 15th November 2022
Sourav Chakraborty, Eldar Fischer, Arijit Ghosh, Gopinath Mishra, Sayantan Sen

Testing of Index-Invariant Properties in the Huge Object Model

The study of distribution testing has become ubiquitous in the area of property testing, both for its theoretical appeal, as well as for its applications in other fields of Computer Science, and in various real-life statistical tasks.

The original distribution testing model relies on samples drawn independently from the distribution ... more >>>

TR23-161 | 1st November 2023
Tal Herman, Guy Rothblum

Doubly-Efficient Interactive Proofs for Distribution Properties

Revisions: 1

Suppose we have access to a small number of samples from an unknown distribution, and would like learn facts about the distribution.
An untrusted data server claims to have studied the distribution and makes assertions about its properties. Can the untrusted data server prove that its assertions are approximately correct? ... more >>>

TR24-141 | 12th September 2024
Tal Herman

Public Coin Interactive Proofs for Label-Invariant Distribution Properties

Assume we are given sample access to an unknown distribution $D$ over a large domain $[N]$. An emerging line of work has demonstrated that many basic quantities relating to the distribution, such as its distance from uniform and its Shannon entropy, despite being hard to approximate through the samples only, ... more >>>

TR24-196 | 2nd December 2024
Clement Canonne, Sayantan Sen, Qiping Yang

Settling the complexity of testing grainedness of distributions, and application to uniformity testing in the Huge Object model

In this work, we study the problem of testing $m$-\emph{grainedness} of probability distributions over an $n$-element universe $\mathcal{U}$, or, equivalently, of whether a probability distribution is induced by a multiset $S\subseteq \mathcal{U}$ of size $|S|=m$. Recently, Goldreich and Ron (Computational Complexity, 2023) proved that $\Omega(n^c)$ samples are necessary for testing ... more >>>

TR24-200 | 4th December 2024
Sourav Chakraborty, Eldar Fischer, Arijit Ghosh, Amit Levi, Gopinath Mishra, Sayantan Sen

Testing vs Estimation for Index-Invariant Properties in the Huge Object Model

The Huge Object model of property testing [Goldreich and Ron, TheoretiCS 23] concerns properties of distributions supported on $\{0,1\}^n$, where $n$ is so large that even reading a single sampled string is unrealistic. Instead, query access is provided to the samples, and the efficiency of the algorithm is measured by ... more >>>

TR25-076 | 14th June 2025
Jan Seyfried, Sayantan Sen, Marco Tomamichel

Testing (Conditional) Mutual Information

We investigate the sample complexity of mutual information and conditional mutual information testing. For conditional mutual information testing, given access to independent samples of a triple of random variables $(A, B, C)$ with unknown distribution, we want to distinguish between two cases: (i) $A$ and $C$ are conditionally independent, i.e., ... more >>>

TR25-105 | 29th July 2025
Oded Goldreich, Guy Rothblum

Location-Invariant Properties of Functions versus Properties of Distributions: United in Testing but Separated in Verification

Revisions: 2

A property of functions is called location-invariant (or symmetric) if it can be characterized in terms of the frequencies in which each value occurs in the function, regardless of the locations in which each value occurs.
It is known that the (query) complexity of testing location-invariant properties of functions ... more >>>

TR25-129 | 7th September 2025
Irit Dinur, Oded Goldreich

Expansion without Connectivity: A Property Testing Perspective

We consider the query complexity of testing whether a bounded-degree graph is expanding, regardless of whether or not it is connected.

Whereas prior work studied testing the property of being an expander (equiv., testing the set of expander graphs), here we study testing the set of graphs that consist of ... more >>>

TR25-181 | 11th November 2025
Bruno Pasqualotto Cavalar, Eli Goldin, Matthew Gray, Taiga Hiroka, Tomoyuki Morimae

On Cryptography and Distribution Verification, with Applications to Quantum Advantage

One of the most fundamental problems in the field of hypothesis testing is the identity testing problem: whether samples from some unknown distribution $\mathcal{G}$ are actually from some explicit distribution $\mathcal{D}$. It is known that when the distribution $\mathcal{D}$ has support $[N]$, the optimal sample complexity for the identity testing ... more >>>

ISSN 1433-8092 | Imprint