In this work, we study the problem of testing $m$-\emph{grainedness} of probability distributions over an $n$-element universe $\mathcal{U}$, or, equivalently, of whether a probability distribution is induced by a multiset $S\subseteq \mathcal{U}$ of size $|S|=m$. Recently, Goldreich and Ron (Computational Complexity, 2023) proved that $\Omega(n^c)$ samples are necessary for testing ... more >>>
Motivated by the question of data quantization and "binning," we revisit the problem of identity testing of discrete probability distributions. Identity testing (a.k.a. one-sample testing), a fundamental and by now well-understood problem in distribution testing, asks, given a reference distribution (model) $\mathbf{q}$ and samples from an unknown distribution $\mathbf{p}$, both ... more >>>
We give a nearly-optimal algorithm for testing uniformity of distributions supported on $\{-1,1\}^n$, which makes $\tilde O (\sqrt{n}/\varepsilon^2)$ queries to a subcube conditional sampling oracle (Bhattacharyya and Chakraborty (2018)). The key technical component is a natural notion of random restriction for distributions on $\{-1,1\}^n$, and a quantitative analysis of how ... more >>>
We study the problem of finding monotone subsequences in an array from the viewpoint of sublinear algorithms. For fixed $k \in \mathbb{N}$ and $\varepsilon > 0$, we show that the non-adaptive query complexity of finding a length-$k$ monotone subsequence of $f \colon [n] \to \mathbb{R}$, assuming that $f$ is $\varepsilon$-far ... more >>>
We study goodness-of-fit of discrete distributions in the distributed setting, where samples are divided between multiple users who can only release a limited amount of information about their samples due to various information constraints. Recently, a subset of the authors showed that having access to a common random seed (i.e., ... more >>>
Independent samples from an unknown probability distribution $\mathbf{p}$ on a domain of size $k$ are distributed across $n$ players, with each player holding one sample. Each player can communicate $\ell$ bits to a central referee in a simultaneous message passing (SMP) model of communication to help the referee infer a ... more >>>
We study the general problem of testing whether an unknown discrete distribution belongs to a given family of distributions. More specifically, given a class of distributions $\mathcal{P}$ and sample access to an unknown distribution $\mathbf{P}$, we want to distinguish (with high probability) between the case that $\mathbf{P} \in \mathcal{P}$ and ... more >>>
Adaptivity is known to play a crucial role in property testing. In particular, there exist properties for which there is an exponential gap between the power of \emph{adaptive} testing algorithms, wherein each query may be determined by the answers received to prior queries, and their \emph{non-adaptive} counterparts, in which all ... more >>>
We present a new methodology for proving distribution testing lower bounds, establishing a connection between distribution testing and the simultaneous message passing (SMP) communication model. Extending the framework of Blais, Brody, and Matulef [BBM12], we show a simple way to reduce (private-coin) SMP problems to distribution testing problems. This method ... more >>>
A Boolean $k$-monotone function defined over a finite poset domain ${\cal D}$ alternates between the values $0$ and $1$ at most $k$ times on any ascending chain in ${\cal D}$. Therefore, $k$-monotone functions are natural generalizations of the classical monotone functions, which are the $1$-monotone functions.
Motivated by the ... more >>>
The function $f\colon \{-1,1\}^n \to \{-1,1\}$ is a $k$-junta if it depends on at most $k$ of its variables. We consider the problem of tolerant testing of $k$-juntas, where the testing algorithm must accept any function that is $\epsilon$-close to some $k$-junta and reject any function that is $\epsilon'$-far from ... more >>>
A probability distribution over an ordered universe $[n]=\{1,\dots,n\}$ is said to be a $k$-histogram if it can be represented as a piecewise-constant function over at most $k$ contiguous intervals. We study the following question: given samples from an arbitrary distribution $D$ over $[n]$, one must decide whether $D$ is a ... more >>>
The field of property testing originated in work on program checking, and has evolved into an established and very active research area. In this work, we survey the developments of one of its most recent and prolific offspring, distribution testing. This subfield, at the junction of property testing and Statistics, ... more >>>
A recent model for property testing of probability distributions enables tremendous savings in the sample complexity of testing algorithms, by allowing them to condition the sampling on subsets of the domain.
In particular, Canonne et al. showed that, in this setting, testing identity of an unknown distribution $D$ (i.e., ...
more >>>
The communication complexity of many fundamental problems reduces greatly
when the communicating parties share randomness that is independent of the
inputs to the communication task. Natural communication processes (say between
humans) however often involve large amounts of shared correlations among the
communicating players, but rarely allow for perfect sharing of ...
more >>>
Monotone Boolean functions, and the monotone Boolean circuits that compute them, have been intensively studied in complexity theory. In this paper we study the structure of Boolean functions in terms of the minimum number of negations in any circuit computing them, a complexity measure that interpolates between monotone functions and ... more >>>
In this paper, we analyze and study a hybrid model for testing and learning probability distributions. Here, in addition to samples, the testing algorithm is provided with one of two different types of oracles to the unknown distribution $D$ over $[n]$. More precisely, we define both the dual and extended ... more >>>
We study a new framework for property testing of probability distributions, by considering distribution testing algorithms that have access to a conditional sampling oracle. \footnote{Independently from our work, Chakraborty et al. [CFGM13] also considered this framework. We discuss their work in Subsection 1.4.} This is an oracle that takes as ... more >>>