__
TR09-130 | 1st December 2009 21:22
__

#### Optimal lower bounds for locality sensitive hashing (except when $q$ is tiny)

**Abstract:**
We study lower bounds for Locality Sensitive Hashing (LSH) in the strongest setting: point sets in $\{0,1\}^d$ under the Hamming distance. Recall that $\mathcal{H}$ is said to be an $(r, cr, p, q)$-sensitive hash family if all pairs $x,y \in \{0,1\}^d$ with dist$(x,y) \leq r$ have probability at least $p$ of collision under a randomly chosen $h \in \mathcal{H}$, whereas all pairs $x,y \in \{0,1\}^d$ with dist$(x,y) \geq cr$ have probability at most $q$ of collision. Typically, one considers $d \to \infty$, with $c > 1$ fixed and $q$ bounded away from $0$.

For its applications to approximate nearest neighbor search in high dimensions, the quality of an LSH family $\mathcal{H}$ is governed by how small its ``rho parameter'' $\rho = \ln(1/p)/\ln(1/q)$ is as a function of the parameter $c$. The seminal paper of Indyk and Motwani showed that for each $c \geq 1$, the extremely simple family $\mathcal{H} = \{x \mapsto x_i : i \in d\}$ achieves $\rho \leq 1/c$. The only known lower bound, due to Motwani, Naor, and Panigrahy, is that $\rho$ must be at least $(e^{1/c}-1)/(e^{1/c}+1)\geq .46/c$ (minus $o_d(1)$).

In this paper we show an optimal lower bound: $\rho$ must be at least $1/c$ (minus $o_d(1)$). This lower bound for Hamming space yields a lower bound of $1/c^2$ for Euclidean space (or the unit sphere) and $1/c$ for the Jaccard distance on sets; both of these match known upper bounds. Our proof is simple; the essence is that the noise stability of a boolean function at $e^{-t}$ is a log-convex function of $t$.

Like the Motwani--Naor--Panigrahy lower bound, our proof relies on the assumption that $q$ is not ``tiny'', meaning of the form $2^{-\Theta(d)}$. Some lower bound on $q$ is always necessary, as otherwise it is trivial to achieve $\rho = 0$. The range of $q$ for which our lower bound holds is the same as the range of $q$ for which $\rho$ accurately reflects an LSH family's quality. Still, we conclude by discussing why it would be more satisfying to find LSH lower bounds that hold for tiny $q$.