Clustering for Edge-Cost Minimization

Leonard Schulman

Electronic Colloquium on Computational Complexity

Under the auspices of the Computational Complexity Foundation (CCF)

REPORTS > DETAIL:

Paper:

TR99-035 | 6th July 1999 00:00

Clustering for Edge-Cost Minimization

TR99-035 Authors: Leonard Schulman
Publication: 15th September 1999 16:34
Downloads: 4139

Keywords:

Abstract:

We address the problem of partitioning a set of $n$ points into
clusters, so as to minimize the sum, over all intracluster pairs of
points, of the cost associated with each pair. We obtain a randomized
approximation algorithm for this problem, for the cost functions
$\ell_2^2,\ell_1$ and $\ell_2$, as well as any cost function
isometrically embeddable in $\ell_2^2$.

The maximization problem (maximize the costs of intercluster edges) is
approximated with high probability to within multiplicative factor
$(1-\ep)$; while the minimization problem either receives a
multiplicative $1+\ep$ approximation, or else the optimal clustering
is correctly identified except for a mislabelling of an $\ep$ fraction
of the point set. Given a fixed approximation parameter $\ep$, the
runtime is linear in $n$ for $\ell_2^2$ problems of dimension
$o(\log n / \log \log n)$; and $n^{O(\log \log n)}$ in the general
case.

The case $\ell_2^2$ is addressed by combining three elements: (a)
Variable-probability sampling of the given points, to reduce the size
of the data set. (b) Near-isometric dimension reduction. (c) A
deterministic exact algorithm which runs in time exponential in the
dimension (rather than the number of points). The remaining cases are
addressed by reduction to $\ell_2^2$.

ISSN 1433-8092 | Imprint