Sofya Raskhodnikova, Dana Ron, Ronitt Rubinfeld, Amir Shpilka, Adam Smith

We raise the question of approximating the compressibility of a string with respect to a fixed compression scheme, in sublinear time. We study this question in detail for two popular lossless compression schemes: run-length encoding (RLE) and Lempel-Ziv (LZ), and present algorithms and lower bounds for approximating compressibility with respect to ...
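
As a concrete illustration, the sketch below (ours, not the paper's; the sublinear approximation algorithms themselves are not shown in the truncated abstract) computes the exact RLE cost of a string under a simple, assumed bit-cost model, together with the resulting compression ratio. This is the quantity a sublinear-time algorithm would approximate while reading only a small part of the string.

    # Exact RLE cost of a string under a simple cost model (assumption:
    # 8 bits per run symbol plus ceil(log2(run_length + 1)) bits per run length).
    # Illustration only; the paper's cost model and algorithms may differ.
    from itertools import groupby
    from math import ceil, log2

    def rle_cost_bits(s: str) -> int:
        """Exact RLE compressed length of s, in bits."""
        cost = 0
        for _, run in groupby(s):
            run_len = len(list(run))
            cost += 8 + ceil(log2(run_len + 1))
        return cost

    def rle_compressibility(s: str) -> float:
        """Compressed length divided by raw length (8 bits per symbol)."""
        return rle_cost_bits(s) / (8 * len(s))

    print(rle_compressibility("aaaaabbbbbbbbbbcc"))  # long runs: highly compressible
    print(rle_compressibility("abcabcabcabcabcab"))  # no runs: incompressible under RLE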

Gregory Valiant, Paul Valiant

We prove two new multivariate central limit theorems; the first relates the sum of independent distributions to the multivariate Gaussian of corresponding mean and covariance, under the earthmover distance metric (also known as the Wasserstein metric). We leverage this central limit theorem to prove a stronger but more specific central ...
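
For reference, the earthmover (Wasserstein-1) distance invoked above can be stated as follows; the notation is ours, not taken from the truncated abstract. For distributions $p$ and $q$ over a metric space with distance $d$,

$$ W_1(p, q) \;=\; \inf_{\pi \in \Pi(p, q)} \; \mathbb{E}_{(X, Y) \sim \pi}\big[\, d(X, Y) \,\big], $$

where $\Pi(p, q)$ denotes the set of couplings of $p$ and $q$, i.e., joint distributions whose marginals are $p$ and $q$.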

Gregory Valiant, Paul Valiant

We introduce a new approach to characterizing the unobserved portion of a distribution, which provides sublinear-sample additive estimators for a class of properties that includes entropy and distribution support size. Together with the lower bounds proven in the companion paper [29], this settles the longstanding question of the sample complexities ...
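
For contrast with the sublinear-sample estimators the abstract refers to, the naive "plug-in" estimates of entropy and support size are sketched below (a baseline of ours, not the paper's method); the plug-in approach needs a number of samples roughly linear in the support size before its answers become accurate.

    # Naive plug-in estimators (baseline only, not the paper's estimators).
    from collections import Counter
    from math import log

    def plugin_entropy(samples) -> float:
        """Entropy (in nats) of the empirical distribution of the samples."""
        counts = Counter(samples)
        m = len(samples)
        return -sum((c / m) * log(c / m) for c in counts.values())

    def plugin_support_size(samples) -> int:
        """Number of distinct symbols seen; underestimates the true support size."""
        return len(set(samples))

    samples = ["a", "b", "a", "c", "b", "a"]
    print(plugin_entropy(samples), plugin_support_size(samples))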

Jayadev Acharya, Hirakendu Das, Alon Orlitsky, Ananda Theertha Suresh

The advent of data science has spurred interest in estimating properties of discrete distributions over large alphabets. Fundamental symmetric properties such as support size, support coverage, entropy, and proximity to uniformity have received the most attention, with each property estimated using a different technique and often intricate analysis tools.

Motivated by the ...

Tom Gur, Min-Hsiu Hsieh, Sathyawageeswar Subramanian

Entropy is a fundamental property of both classical and quantum systems, spanning myriad theoretical and practical applications in physics and computer science. We study the problem of obtaining estimates to within a multiplicative factor $\gamma>1$ of the Shannon entropy of probability distributions and the von Neumann entropy of mixed quantum ...
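
For concreteness, the two entropies in question and the multiplicative guarantee being sought are (standard definitions, stated in our notation):

$$ H(p) = -\sum_i p_i \log p_i, \qquad S(\rho) = -\mathrm{Tr}(\rho \log \rho), $$

and a $\gamma$-multiplicative estimate is a value $\hat{H}$ with $H/\gamma \le \hat{H} \le \gamma H$ (and similarly for $S(\rho)$).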

Tal Herman, Guy Rothblum

Given i.i.d. samples from an unknown distribution over a large domain $[N]$, approximating several basic quantities, including the distribution's support size, its entropy, and its distance from the uniform distribution, requires $\Theta(N / \log N)$ samples [Valiant and Valiant, STOC 2011].

Suppose, however, that we can interact with a powerful ...
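
As a point of reference for the sample-complexity bound above (our illustration, not the interactive protocols studied in the paper), the sketch below computes the empirical estimate of one of the listed quantities, the total variation distance from the uniform distribution over $[N]$. This naive plug-in estimate needs roughly $N$ samples, and even optimal samples-only estimators need $\Theta(N / \log N)$, which is the barrier the interactive setting introduced above aims to circumvent.

    # Empirical (plug-in) estimate of total variation distance from Uniform([N]).
    # Illustration only; not an interactive-proof protocol.
    from collections import Counter

    def empirical_tv_from_uniform(samples, N: int) -> float:
        """TV distance between the empirical distribution of the samples and Uniform([N])."""
        counts = Counter(samples)
        m = len(samples)
        seen = sum(abs(c / m - 1.0 / N) for c in counts.values())
        unseen = (N - len(counts)) / N  # symbols never sampled contribute 1/N each
        return 0.5 * (seen + unseen)

    print(empirical_tv_from_uniform([1, 2, 2, 3, 5, 5, 5], N=8))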