We consider the problem of approximating the entropy of a discrete distribution P on a domain of size q, given access to n independent samples from the distribution. It is known that n > q is necessary, in general, for a good additive estimate of the entropy. A problem of multiplicative entropy estimate was recently addressed by Batu, Dasgupta, Kumar, and Rubinfeld. They show that n = q^alpha suffices for a factor-alpha approximation, alpha < 1.
We introduce a new parameter of a distribution - its *effective alphabet size* q_ef. This is a more intrinsic property of the distribution depending only on its entropy moments. We show q_ef is bounded by q, up to logarithmic factors. When the distribution P is essentially concentrated on a small part of the domain q_ef << q. We strengthen the result of Batu et al. by showing it holds with q_ef replacing q.
This has several implications. In particular the rate of convergence of the maximum-likelihood entropy estimator (empirical entropy) for both finite and infinite alphabets is shown to be dictated by effective alphabet size of the distribution. Several new, and some known, facts about this estimator follow easily.
Our main result is algorithmic. Though effective alphabet size is, in general, an unknown parameter of the distribution, we give an efficient procedure (with access to alphabet size only) that achieves a factor-alpha approximation of the entropy with n a (compicated) function of both q and q_ef. Assuming (for instance) log q_ef << log q, this function is smaller than any power of q. Taking alpha to 1 leads in this case to efficient *additive* estimates for the entropy as well. Several extensions of the results above are discussed.