The Earth Mover Distance (EMD) between two equal-size sets
of points in R^d is defined to be the minimum cost of a
bipartite matching between the two pointsets. It is a natural metric
for comparing sets of features, and as such, it has received
significant interest in computer vision. Motivated by recent
developments in that area, we address computational
problems involving EMD over high-dimensional pointsets.
<br>
A natural approach is to embed the EMD metric into l_1, and use
the algorithms designed for the latter space. However, Khot and
Naor [FOCS'05] show that any embedding of EMD over the d-dimensional
Hamming cube into l_1 must incur a distortion \Omega(d), thus
practically losing all distance information. We circumvent this
roadblock by focusing on sets with cardinalities upper-bounded by a
parameter s, and achieve a distortion of only O(\log s \cdot \log
d). Since in applications the feature sets have bounded size, the
resulting distortion is much smaller than the \Omega(d) lower
bound. Our approach is quite general and easily extends to EMD over
R^d.
<br>
We then provide a strong lower bound on the multi-round communication
complexity of estimating EMD, which in particular strengthens the known
non-embeddability result of [Khot-Naor, FOCS'05].
Our bound exhibits a smooth tradeoff between
approximation and communication, and for example implies that every algorithm
that estimates EMD using constant size sketches can only achieve
\Omega(\log s) approximation.