Weizmann Logo
Electronic Colloquium on Computational Complexity

Under the auspices of the Computational Complexity Foundation (CCF)

Login | Register | Classic Style



TR15-193 | 26th November 2015 03:30

On the hardness of learning sparse parities



This work investigates the hardness of computing sparse solutions to systems of linear equations over $\mathbb{F}_2$. Consider the $k$-EvenSet problem: given a homogeneous system of linear equations over $\mathbb{F}_2$ on $n$ variables, decide if there exists a nonzero solution of Hamming weight at most $k$ (i.e. a $k$-sparse solution). While there is a simple $O(n^{k/2})$-time algorithm for it, establishing fixed parameter intractability for $k$-EvenSet has been a notorious open problem. Towards this goal, we show that unless $k$-Clique can be solved in $n^{o(k)}$ time, $k$-EvenSet has no $\textrm{poly}(n)\cdot 2^{o(\sqrt{k})}$ time algorithm for all $k$ and no polynomial time algorithm when $k = \omega(\log^2 n)$.

Our work also shows that the non-homogeneous generalization of the problem -- which we call $k$-VectorSum -- is W[1]-hard on instances where the number of equations is $O(k\log n)$, improving on previous reductions which produced $\Omega(n)$ equations. We use the hardness of $k$-VectorSum as a starting point to prove the result for $k$-EvenSet, and additionally strengthen the former to show the hardness of approximately learning $k$-juntas. In particular, we prove that for any constant $\epsilon > 0$, given a system of $O(\textrm{exp}(O(k))\cdot \log n)$ linear equations, it is W[1]-hard to decide if there is a $k$-sparse linear form satisfying all the equations or any function on at most $k$-variables (a $k$-junta) satisfies at most $(1/2 + \epsilon)$-fraction of the equations. In the setting of computational learning, this shows hardness of approximate non-proper learning of $k$-parities. In a similar vein, we use the hardness of $k$-EvenSet to show that that for any constant $d$, unless $k$-Clique can be solved in $n^{o(k)}$ time, there is no $\textrm{poly}(m, n)\cdot 2^{o(\sqrt{k})}$ time algorithm to decide whether a given set of $m$ points in $\mathbb{F}_2^n$ satisfies: (i) there exists a non-trivial $k$-sparse homogeneous linear form evaluating to $0$ on all the points, or (ii) any non-trivial degree $d$ polynomial $P$ supported on at most $k$ variables evaluates to zero on $\approx \Pr_{\mathbb{F}_2^n}[P({\bf z}) = 0]$ fraction of the points i.e., $P$ is fooled by the set of points.

Lastly, we study the approximation in the sparsity of the solution. Let the Gap-$k$-VectorSum problem be: given an instance of $k$-VectorSum of size $n$, decide if there exist a $k$-sparse solution, or every solution is of sparsity at least $k' = (1+\delta_0)k$. Assuming ETH, we show that for some constants $c_0$, $\delta_0 > 0$ there is no $\textrm{poly}(n)$ time algorithm for Gap-$k$-VectorSum when $k = \omega((\log \log n)^{c_0})$.

ISSN 1433-8092 | Imprint