A homogeneous depth three circuit $C$ computes a polynomial
$$f = T_1 + T_2 + ... + T_s ,$$ where each $T_i$ is a product of $d$ linear forms in $n$ variables over some underlying field $\mathbb{F}$. Given black-box access to $f$, can we efficiently reconstruct (i.e. proper learn) a homogeneous depth three circuit computing $f$? Learning various subclasses of circuits is natural and interesting from both theoretical and practical standpoints and in particular, properly learning homogeneous depth three circuits efficiently is stated as an open problem in a work by Klivans and Shpilka (COLT 2003) and is well-studied. Unfortunately, there is substantial amount of evidence to show that this is a hard problem $in~the~worst~case$.
We give a (randomized) $poly(n,d,s)$-time algorithm to reconstruct $non$-$degenerate$ homogeneous depth three circuits for $n = \Omega(d^2)$ (with some additional mild requirements on $s$ and the characteristic of $\mathbb{F}$). We call a circuit $C$ as non-degenerate if the dimension of the partial derivative space of $f$ equals the sum of the dimensions of the partial derivative spaces of the terms $T_1, T_2, \ldots, T_s$. In this sense, the terms are ``independent'' of each other in a non-degenerate circuit. A random homogeneous depth three circuit (where the coefficients of the linear forms are chosen according to the uniform distribution or any other reasonable distribution) is almost surely non-degenerate. In comparison, previous learning algorithms for this circuit class were either improper (with an exponential dependence on $d$), or they only worked for $s < n$ (with a doubly exponential dependence of the running time on $s$).
The main contribution of this work is to formulate the following paradigm for efficiently handling addition gates and to successfully implement it for the class of homogeneous depth three circuits. The problem of finding the children of an addition gate with large fan-in $s$ is first reduced to the problem of decomposing a suitable vector space $U$ into a (direct) sum of $simpler$ subspaces $U_1, U_2, \ldots, U_s$. One then constructs a suitable space of operators $\mathcal{S}$ consisting of linear maps acting on $U$ such that analyzing the $simultaneous~global~structure$ of $\mathcal{S}$ enables us to efficiently decompose $U$. In our case, we exploit the structure of the set of low rank matrices in $\mathcal{S}$ and of the invariant subspaces of $U$ induced by $\mathcal{S}$. We feel that this paradigm is novel and powerful: it should lead to efficient reconstruction of many other subclasses of circuits for which the efficient reconstruction problem had hitherto looked unapproachable because of the presence of large fan-in addition gates.