TR15-113
Authors: Amit Chakrabarti, Tony Wirth

Publication: 17th July 2015 18:16

Set cover, over a universe of size $n$, may be modelled as a data-streaming problem in which the $m$ sets that comprise the instance are read one by one. A semi-streaming algorithm is allowed only $O(n \operatorname{poly}\{\log n, \log m\})$ space to process this stream. For each $p \ge 1$, we give a very simple deterministic algorithm that makes $p$ passes over the input stream and returns an appropriately certified $(p+1)n^{1/(p+1)}$-approximation to the optimum set cover. More importantly, we show that this approximation factor is essentially tight: no $p$-pass semi-streaming algorithm, even a randomised one, can achieve a factor better than $0.99\,n^{1/(p+1)}/(p+1)^2$, which is within a factor of about $(p+1)^3$ of the upper bound. In particular, this implies that achieving a $\Theta(\log n)$-approximation requires $\Omega(\log n/\log\log n)$ passes, which is tight up to a $\log\log n$ factor, since taking $p + 1 = \lceil \ln n \rceil$ gives $n^{1/(p+1)} \le e$ and hence an $O(\log n)$-approximation in $O(\log n)$ passes.
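
The abstract does not spell the algorithm out. One natural reading, sketched below purely as an illustration, is a threshold greedy: in pass $j$ it keeps any set that covers at least $n^{1-j/(p+1)}$ still-uncovered elements, and it also records, for each element, the index of one set containing it (read here as the "certificate"). The function name `multipass_threshold_cover`, the use of a Python list to stand in for a re-readable stream, and the certificate interpretation are all assumptions, not the authors' code.

```python
from typing import List, Optional, Sequence, Set


def multipass_threshold_cover(n: int, sets: Sequence[Set[int]], p: int) -> List[int]:
    """Hypothetical p-pass threshold greedy for set cover on universe {0, ..., n-1}.

    `sets` stands in for the stream: it is re-read once per pass. Returns the
    indices of the chosen sets. Raises ValueError if the sets do not cover
    the universe.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    uncovered: Set[int] = set(range(n))
    # One certificate (index of some set containing u) per element:
    # O(n log m) bits, within the semi-streaming budget.
    cert: List[Optional[int]] = [None] * n
    chosen: List[int] = []

    for j in range(1, p + 1):
        tau = n ** (1 - j / (p + 1))  # threshold n^{1 - j/(p+1)} for pass j
        for i, s in enumerate(sets):
            if j == 1:
                # Record the first set seen containing each element.
                for u in s:
                    if cert[u] is None:
                        cert[u] = i
            gain = uncovered & s  # elements this set would newly cover
            if gain and len(gain) >= tau:
                chosen.append(i)
                uncovered -= gain

    # Post-processing (no extra pass): each leftover element is covered
    # by its certificate set.
    leftover_ids = {cert[u] for u in uncovered}
    if None in leftover_ids:
        raise ValueError("some element lies in no input set")
    chosen.extend(sorted(i for i in leftover_ids if i not in chosen))
    return chosen
```

For example, `multipass_threshold_cover(4, [{0, 1}, {0, 1, 2, 3}], p=1)` returns `[0, 1]`: with threshold $4^{1/2} = 2$, both sets add two new elements when read, even though the second set alone is optimal. Why a scheme like this meets the $(p+1)n^{1/(p+1)}$ bound: write $\tau_j = n^{1-j/(p+1)}$, so $\tau_0 = n$. A set read in pass $j$ is either chosen or has fewer than $\tau_j$ uncovered elements, and that count only shrinks afterwards. Hence at the start of pass $j$ at most $\mathrm{OPT}\cdot\tau_{j-1}$ elements are uncovered, and pass $j$ picks at most $\mathrm{OPT}\cdot\tau_{j-1}/\tau_j = \mathrm{OPT}\cdot n^{1/(p+1)}$ sets. The fewer than $\mathrm{OPT}\cdot\tau_p$ leftovers need at most one certificate set each, giving $p+1$ batches of at most $\mathrm{OPT}\cdot n^{1/(p+1)}$ sets.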

These results extend to a relaxation of the set cover problem in which an $\varepsilon$ fraction of the universe may be left uncovered: the tight bounds on the best approximation factor achievable in $p$ passes turn out to be $\Theta_p(\min\{n^{1/(p+1)}, \varepsilon^{-1/p}\})$, where $\Theta_p$ hides factors depending only on $p$.
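
As a reading aid (ignoring the $p$-dependent factors hidden in $\Theta_p$), the crossover between the two terms of the minimum can be worked out directly; raising both sides to the positive power $p$ preserves the inequality:

```latex
\varepsilon^{-1/p} \le n^{1/(p+1)}
\iff \varepsilon^{-1} \le n^{p/(p+1)}
\iff \varepsilon \ge n^{-p/(p+1)} .
```

So for $\varepsilon \ge n^{-p/(p+1)}$ the bound is $\Theta_p(\varepsilon^{-1/p})$, independent of $n$, while for smaller $\varepsilon$ it coincides with the full set cover bound $\Theta_p(n^{1/(p+1)})$. This is consistent with the unrelaxed problem: if $\varepsilon < 1/n$ then no element may be left uncovered, and indeed $1/n \le n^{-p/(p+1)}$.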

Our lower bounds are based on a construction of a family of high-rank incidence geometries, which may be thought of as vast generalisations of affine planes. This construction, based on algebraic techniques, appears flexible enough to find other applications and is therefore interesting in its own right.
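
For orientation, the sketch below builds the base object being generalised: the affine plane $AG(2, q)$ over the integers mod a prime $q$, viewed as a set cover instance whose universe is the $q^2$ points and whose sets are the $q^2 + q$ lines. It is only a standard textbook construction, not the paper's high-rank geometry; the helper names `affine_plane` and `is_affine_plane` are hypothetical. Every line has $q$ points and any two distinct points lie on exactly one line, so a cover needs at least $q^2/q = q$ lines, and the $q$ vertical lines (one parallel class) achieve this.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Point = Tuple[int, int]


def affine_plane(q: int) -> List[FrozenSet[Point]]:
    """Lines of AG(2, q) for a prime q, using arithmetic mod q."""
    lines: List[FrozenSet[Point]] = []
    # Non-vertical lines y = a*x + b: q slopes times q intercepts.
    for a in range(q):
        for b in range(q):
            lines.append(frozenset((x, (a * x + b) % q) for x in range(q)))
    # Vertical lines x = c: one parallel class of q disjoint lines.
    for c in range(q):
        lines.append(frozenset((c, y) for y in range(q)))
    return lines


def is_affine_plane(q: int, lines: List[FrozenSet[Point]]) -> bool:
    """Check that every pair of distinct points lies on exactly one line."""
    points = [(x, y) for x in range(q) for y in range(q)]
    return all(
        sum(1 for line in lines if p1 in line and p2 in line) == 1
        for p1, p2 in combinations(points, 2)
    )
```

Calling `affine_plane(5)` gives 30 lines of 5 points each. The primality assumption matters: for composite $q$, the integers mod $q$ do not form a field and two points can share more than one line of the form $y = ax + b$.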