TR07-048 Authors: Alexandr Andoni, Piotr Indyk, Robert Krauthgamer

Publication: 29th May 2007 12:47

Downloads: 1840

Keywords:

The Earth Mover Distance (EMD) between two equal-size sets

of points in R^d is defined to be the minimum cost of a

bipartite matching between the two pointsets. It is a natural metric

for comparing sets of features, and as such, it has received

significant interest in computer vision. Motivated by recent

developments in that area, we address computational

problems involving EMD over high-dimensional pointsets.

<br>

A natural approach is to embed the EMD metric into l_1, and use

the algorithms designed for the latter space. However, Khot and

Naor [FOCS'05] show that any embedding of EMD over the d-dimensional

Hamming cube into l_1 must incur a distortion \Omega(d), thus

practically losing all distance information. We circumvent this

roadblock by focusing on sets with cardinalities upper-bounded by a

parameter s, and achieve a distortion of only O(\log s \cdot \log

d). Since in applications the feature sets have bounded size, the

resulting distortion is much smaller than the \Omega(d) lower

bound. Our approach is quite general and easily extends to EMD over

R^d.

<br>

We then provide a strong lower bound on the multi-round communication

complexity of estimating EMD, which in particular strengthens the known

non-embeddability result of [Khot-Naor, FOCS'05].

Our bound exhibits a smooth tradeoff between

approximation and communication, and for example implies that every algorithm

that estimates EMD using constant size sketches can only achieve

\Omega(\log s) approximation.