We raise the question of approximating the compressibility of a string with respect to a fixed compression scheme in sublinear time. We study this question in detail for two popular lossless compression schemes: run-length encoding (RLE) and Lempel-Ziv (LZ), and present algorithms and lower bounds for approximating compressibility with respect to ...
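The entry above is truncated. As a concrete illustration of one of the two schemes it names, here is a minimal run-length encoding sketch in Python; the helper names (`rle_encode`, `rle_compressed_length`) are illustrative and not taken from the paper, and the number of runs is used only as a simple proxy for RLE-compressibility.

```python
from itertools import groupby

def rle_encode(s: str) -> list[tuple[str, int]]:
    """Run-length encode s as a list of (character, run length) pairs."""
    return [(ch, sum(1 for _ in run)) for ch, run in groupby(s)]

def rle_compressed_length(s: str) -> int:
    """Number of runs -- a simple proxy for the RLE-compressibility of s."""
    return len(rle_encode(s))

if __name__ == "__main__":
    print(rle_encode("aaabccccd"))             # [('a', 3), ('b', 1), ('c', 4), ('d', 1)]
    print(rle_compressed_length("aaabccccd"))  # 4
```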
We prove two new multivariate central limit theorems; the first relates the sum of independent distributions to the multivariate Gaussian of corresponding mean and covariance, under the earthmover distance metric (also known as the Wasserstein metric). We leverage this central limit theorem to prove a stronger but more specific central ...
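For reference, a standard statement of the earthmover (Wasserstein-1) metric mentioned above, written here for distributions on $\mathbb{R}^d$; this is the textbook coupling formulation, not a definition quoted from the paper:
\[
  W_1(p, q) \;=\; \inf_{\pi \in \Pi(p,\,q)} \int_{\mathbb{R}^d \times \mathbb{R}^d} \lVert x - y \rVert \, \mathrm{d}\pi(x, y),
\]
where $\Pi(p, q)$ denotes the set of all couplings of $p$ and $q$, i.e., joint distributions whose marginals are $p$ and $q$.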
We introduce a new approach to characterizing the unobserved portion of a distribution, which provides sublinear-sample additive estimators for a class of properties that includes entropy and distribution support size. Together with the lower bounds proven in the companion paper [29], this settles the longstanding question of the sample complexities ...
The advent of data science has spurred interest in estimating properties of discrete distributions over large alphabets. Fundamental symmetric properties such as support size, support coverage, entropy, and proximity to uniformity have received the most attention, with each property estimated using a different technique and often intricate analysis tools.
Motivated by the ...
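This entry is also cut off. As background for the properties it lists, the following Python sketch computes the naive empirical (plug-in) estimates of support size, entropy, and distance to uniformity from an i.i.d. sample (support coverage is omitted for brevity). This baseline is standard and is not the unified estimator the abstract alludes to; it typically needs far more samples than the sublinear-sample methods these papers study.

```python
import math
from collections import Counter

def plugin_estimates(sample: list[int], domain_size: int) -> dict[str, float]:
    """Naive plug-in estimates of symmetric properties from an i.i.d. sample
    over the domain {0, ..., domain_size - 1}."""
    n = len(sample)
    counts = Counter(sample)
    freqs = [c / n for c in counts.values()]

    support_size = len(counts)  # number of distinct symbols observed
    entropy = -sum(p * math.log2(p) for p in freqs)
    # Total-variation distance between the empirical distribution and uniform:
    # seen symbols contribute |p_i - 1/N|, each unseen symbol contributes 1/N.
    dist_to_uniform = 0.5 * (
        sum(abs(p - 1 / domain_size) for p in freqs)
        + (domain_size - support_size) / domain_size
    )
    return {
        "support_size": support_size,
        "entropy": entropy,
        "distance_to_uniformity": dist_to_uniform,
    }

if __name__ == "__main__":
    import random
    random.seed(0)
    sample = [random.randrange(100) for _ in range(50)]
    print(plugin_estimates(sample, domain_size=100))
```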
Entropy is a fundamental property of both classical and quantum systems, spanning myriad theoretical and practical applications in physics and computer science. We study the problem of obtaining estimates to within a multiplicative factor $\gamma>1$ of the Shannon entropy of probability distributions and the von Neumann entropy of mixed quantum ...
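For reference, the two entropies named above and the meaning of a multiplicative $\gamma$-approximation, stated with standard textbook definitions rather than text from the paper:
\[
  H(p) = -\sum_i p_i \log p_i, \qquad S(\rho) = -\operatorname{Tr}(\rho \log \rho),
\]
where $p$ is a probability distribution and $\rho$ a density matrix; a multiplicative $\gamma$-approximation ($\gamma > 1$) is an estimate $\hat{H}$ satisfying $H/\gamma \le \hat{H} \le \gamma H$.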
Given i.i.d. samples from an unknown distribution over a large domain $[N]$, approximating several basic quantities, including the distribution's support size, its entropy, and its distance from the uniform distribution, requires $\Theta(N / \log N)$ samples [Valiant and Valiant, STOC 2011].
Suppose, however, that we can interact with a powerful ...