# Lower Bounds for (Non-monotone) Comparator Circuits* 

Anna Gál ${ }^{\dagger} \quad$ Robert Robere ${ }^{\ddagger}$


#### Abstract

Comparator circuits are a natural circuit model for studying the concept of bounded fan-out computations, which intuitively corresponds to whether or not a computational model can make "copies" of intermediate computational steps. Comparator circuits are believed to be weaker than general Boolean circuits, but they can simulate Branching Programs and Boolean formulas. In this paper we prove the first superlinear lower bounds in the general (non-monotone) version of this model for an explicitly defined function. More precisely, we prove that the $n$-bit Element Distinctness function requires $\Omega\left((n / \log n)^{3 / 2}\right)$ size comparator circuits.


## 1 Introduction

One of the central open problems in circuit complexity is to prove superlinear lower bounds on the size of general Boolean circuits for explicitly defined functions. While theorists have been outstandingly successful at analyzing some restricted circuit models - for instance, exponential lower bounds are known when the circuit is monotone [22], or bounded-depth [11] - the progress remains modest for less restricted classes of circuits:

1. The best lower bounds for the Boolean formula size of explicit functions over $n$ variables are of the form $\Omega\left(n^{3-o(1)}\right)[12,18,26,5,7,19,27,4,8]$. The first such bound is due to Håstad [12] and the current largest bound (improving lower order terms) is due to Tal [27].
2. The best lower bounds for branching programs, which capture small-space computation and can simulate formulas, are $\Omega\left((n / \log n)^{2}\right)$ for the deterministic model by Nechiporuk [21] and $\Omega\left(n^{3 / 2} / \log n\right)$ for the non-deterministic- and parity- models by Pudlák ${ }^{1}$ [15], and Karchmer and Wigderson [15], respectively.

[^0]3. The best lower bounds for span programs over $G F(2)$, which can simulate all of the above models, is $\Omega\left(n^{3 / 2} / \log n\right)$ by Beimel, Gál, and Paterson [3], following the initial $\Omega\left(n \log \log \log ^{*} n\right)$ lower bounds by Karchmer and Wigderson [15].
4. Finally (and, perhaps, most notoriously) the best lower bounds for general Boolean circuits are $5 n-o(n)$, due to Iwama and Morizumi [13].

Improving any of these lower bounds is an outstanding open problem; however, the problem of proving any superlinear lower bound on general Boolean circuits is perhaps the most interesting. Span programs form a remarkable model here, since they can simulate all models above for which we have superlinear lower bounds - this means that improving the lower bounds for span programs is one of the outstanding open question at the "frontier" of current techniques for proving lower bounds in general non-monotone circuit models.

In this paper we introduce a new problem at the frontier: the complexity of nonmonotone comparator circuits. A comparator circuit is a circuit composed purely of comparator gates, each of which maps a pair of bits $(x, y)$ to $(x \wedge y, x \vee y)$. It is known that, just like span programs, comparator circuits can efficiently simulate nondeterministic branching programs [6], and thus lower bounds for comparator circuits imply lower bounds on branching programs and formulas. However, like branching programs, comparator circuits are known to be stronger than boolean formulas, as it is easy to construct comparator circuits computing Parity in $O(n)$ size (unlike formulas, which require $\Omega\left(n^{2}\right)$ size [16]). Similarly, like span programs, comparator circuits are believed to be weaker than general non-monotone circuits [6, 24]. However, it appears that comparator circuits may be incomparable to, or possibly even stronger than span programs: the class of functions computable by polynomial-size comparator circuits is conjectured to be incomparable to NC [25], and this conjecture is supported by oracle separations [6]. In contrast, the class of functions computable by polynomial-size span programs over finite fields is contained in $\mathrm{NC}^{2}$. Furthermore, while we already had superlinear lower bounds for span programs [3, 15], there have been no superlinear lower bounds known for the comparator circuit size of any explicit function.

Continuing the line of work proving superlinear lower bounds for formulas, branching programs, and span programs, we prove the first superlinear lower bounds on the size of comparator circuits computing an explicit Boolean function. Let $n=2 m \log m$, and let $E D_{n}:\{0,1\}^{n} \rightarrow\{0,1\}$ be the $n$-bit Element Distinctness function, which takes $m$ integers in the range $\left\{1,2, \ldots, m^{2}\right\}$ as input (each encoded as $2 \log m$ bits) and outputs 1 iff all of the integers are distinct. Our main result is the following:

Theorem 1.1. The size of any comparator circuit computing the $n$-bit Element Distinctness function $E D_{n}$ is at least $\Omega\left((n / \log n)^{3 / 2}\right)$.

The above result is interesting for several reasons. First, it provides a new and natural class of circuits for which we can obtain superlinear size lower bounds, and thus represents progress towards the goal of proving superlinear lower bounds for general non-monotone circuits. A second reason is that the techniques we employ are notably
different than those used in previous lower bounds for non-monotone models. The current best $\Omega\left(n^{3-o(1)}\right)$ lower bounds for formulas follow by restriction based techniques, which we do not use for our lower bounds. In fact, as we discuss in Section 5, there seem to be obstacles to obtaining superlinear lower bounds for comparator circuits this way. Instead, we use a generalized version of the classic Nechiporuk method [21], which was used for obtaining the current best lower bounds for branching programs and span programs stated above. However, it is not possible to apply Nechiporuk's method directly to estimate the number of gates of comparator circuits, because the partial computations corresponding to subcircuits over disjoint subsets of inputs may significantly overlap (see Remark 4.3 and Section 1.2 for more details.) Instead, we need to exploit some interesting structure of comparator circuits to enable a more general version of the method. For these reasons we believe that our above lower bound (and comparator circuits more generally) are particularly interesting.

### 1.1 Comparator Circuits and Bounded Fanout Computation

Let us first describe comparator circuits more formally. As stated above, a comparator gate computes the map $(x, y) \mapsto(x \wedge y, x \vee y)$ where $x, y \in\{0,1\}$; that is, the gate outputs both the AND and the OR of its input bits. A comparator circuit is a Boolean circuit whose inputs are labeled by input literals (i.e. either variables $x_{i}$ or their negations $\bar{x}_{i}$ ), and is composed entirely of comparator gates; the comparator circuit is said to be monotone if no input literal is negated. We emphasize here that every comparator gate in the circuit has exactly two inputs and two outputs, and the (Boolean) output of the circuit is obtained by fixing some designated output of a comparator gate in the circuit.

The definition of a comparator gate suggests a convenient method of visualizing such circuits, depicted in Figure 1. We draw $m$ lines from left-to-right - we call these lines wires $^{2}$ - that are initially labeled with input literals or the Boolean constants 0,1 . We connect wires together with comparator gates: each gate is drawn connecting a pair of wires; one of the inputs is drawn with a circle (representing the output of the $\wedge$ gate) and the other with an arrow (representing the output of the $\vee$ gate).


Figure 1: Examples of comparator circuits, with outputs designated. The left is a simple monotone comparator circuit, the right is a non-monotone comparator circuit for parity.

[^1]The most studied examples of comparator circuits in the literature are sorting networks, which are typically studied in a more general (i.e. non-Boolean) setting. A sorting network is a mathematical model of a sorting algorithm which is input-oblivious: the algorithm performs the same sequence of compare-and-swap operations on its input sequence of integers for all possible inputs. There are $n$ wires traveling from left to right, each labeled with a distinct input variable, and a sequence of comparator gates connecting the wires. (Note that a comparator gate, intuitively, just sorts its inputs in order, with the larger input sent to the $\vee$ output and the smaller to the $\wedge$ output.) The goal is to construct the fastest network - where fastest can mean "shallowest", with the smallest depth connecting inputs to outputs, or with the fewest possible comparator gates - that sorts all $n$ inputs. Shallow sorting networks have many applications in theory and practice, and thus have been extensively studied in theoretical computer science, with much effort spent on finding the shallowest possible networks (see [1, 2, 17, 9]). A restricted version of comparator circuits appeared implicitly in a paper by Graham [10]. His model is equivalent to read once comparator circuits, where each literal is used on at most one wire. Graham [10] showed that there exists a Boolean function on 11 variables that cannot be computed by this model, and it was proved by Robere [23] that the $s, t$ connectivity problem cannot be solved by read once monotone comparator circuits.

Studying comparator circuits as a circuit model for general computation arguably began with the work of Subramanian [25], who showed that comparator circuits are a natural model for extending the study of bounded fan-out circuits beyond Boolean formulas. Subramanian considered different classes of circuits depending on whether or not the gates in the circuit could simulate a COPY gate: that is, the gate that takes some input $x$ and outputs two copies of $x$. For example, when a gate can "fan-out" its output then we are allowing the gate to implicitly copy its output many times over. On the other hand, it is easy to see that a circuit constructed of comparator gates can not simulate a COPY gate, as the Hamming weight of the output of a comparator gate is always the same as the Hamming weight of the input of a comparator gate. Thus comparator circuits form an interesting intermediate class between Boolean formulas and Boolean circuits: they cannot create copies of intermediate computations, and so are apparently weaker than Boolean circuits; but, the ability to compute AND and OR simultaneously makes them apparently more powerful than Boolean formulas.

Let CC denote the class of languages computable by $\mathrm{AC}^{0}$-uniform polynomial-size comparator circuits. This complexity class has an interesting characterization as the class of problems reducible to the stable marriage problem [25]. Closer to our purposes, Subramanian showed that if a circuit composed from gates chosen from a set $S$ has "bounded fanout", in the sense that the gates from $S$ cannot simulate a COPY gate, and all the gates in $S$ compute monotone functions, then the corresponding circuit value problem for circuits with gates from $S$ is either in NC or is CC-hard [25]. Despite their inability to copy, polynomial-size comparator circuits can compute everything in NL $[20,6]$ (which is conjectured to be strictly more powerful than polynomial-size Boolean formulas), and appear to be incomparable with NC [25]. Therefore, CC is a natural class to continue the study of bounded fan-out computation past $\mathrm{NC}^{1}$, along with being a candidate for a class $C$ satisfying $\mathrm{NL} \subset C \subset \mathrm{P}$ and $C \neq \mathrm{NC}$. The computational
complexity of the class CC was recently analyzed by Cook, Filmus and Le [6] where, among other things, the question of uniformity and oracle separations with other classes was addressed.

However, a natural question left open by [6] is that of lower bounds. In the monotone case, since monotone comparator circuits are simulated by monotone circuits, exponential lower bounds are implied by the known lower bounds for monotone circuits (for instance, by Razborov [22]), and exponential separations between monotone comparator circuits and monotone circuits have been proved in [24]. Lower bounds for general Boolean circuits imply lower bounds for comparator circuits, but other than such implications, we are not aware of any previous nontrivial lower bounds for general (non-monotone) comparator circuits. In particular, before our work, no superlinear lower bounds have been obtained for comparator circuits computing an explicit Boolean function.

### 1.2 Main Results and Techniques

To prove Theorem 1.1 we recruit a classic tool from non-monotone circuit lower bounds: the Nechiporuk method [21]. This method was invented by Nechiporuk to prove lower bounds on the size of deterministic branching programs and formulas, and is currently one of the few techniques capable of proving superlinear lower bounds against nonmonotone computation. The idea of the method is as follows, in the setting of branching programs (the argument for formulas is identical, see e.g. [29]). Consider a branching program $B$ computing a Boolean function $f$ on $n$ variables. First, observe that fixing the values outside of a subset $S \subseteq[n]$ of the variables to constants results in a branching program that computes a subfunction of the original function on the variables in the subset $S$. It follows that as we range over all substitutions to $[n] \backslash S$, the number of different branching programs obtained must be at least as large as the number of different subfunctions of $f$ on $S$. Thus, to obtain a lower bound, take any partition of the input variables $[n]=Y_{1} \cup Y_{2} \cup \cdots \cup Y_{n}$ and let $h_{i}$ be the number of nodes of the branching program $B$ labeled with variables from the set $Y_{i}$. Recall that the size of a deterministic branching program is the number of its nodes, and each node is labeled by exactly one input variable. Thus, the size of the branching program $B$ is $|B|=\sum_{i} h_{i}$ and, in turn, we can lower-bound each $h_{i}$ by relating it to the number of subfunctions of $f$ on the variables $Y_{i}$ obtained by restricting $[n] \backslash Y_{i}$ in all possible ways. This yields lower bounds for functions with large number of different subfunctions.

The Nechiporuk method works well for branching programs as the sets of nodes of the program associated with disjoint subsets of inputs are disjoint. The situation is similar for Boolean formulas: each leaf of the formula is labeled by a variable (or negated variable), and again the sets of leaves associated with disjoint subsets of inputs are disjoint. In fact, this property (that the subcircuits corresponding to disjoint subsets of inputs are disjoint) is true for all prior applications of the method. For example, in the application to non-deterministic and parity branching programs [15], the edges of the program are now labeled by variables, and thus the sets of edges corresponding to disjoint subsets of variables are disjoint. Similarly, in span programs over $G F(2)$ [3],
the rows are labeled by variables and so once again, subsets of rows corresponding to disjoint subset of variables are disjoint.

In principle, one could imagine generalizing this method to other circuit models, as long as the subcomputations corresponding to disjoint subsets of input variables do not overlap much. On the negative side, the method can provably not be applied to general Boolean circuits [28], which intuitively makes sense as circuits can copy intermediate computations as many times as they like.

So: what about comparator circuits? At first glance it seems like the method should not be applicable to comparator circuits either, since the circuits contain gates with fanout two, and this implies that the computation of the circuits on disjoint sets of variables can badly overlap, just like general Boolean circuits and unlike formulas, branching programs or span programs. Indeed, as we discuss in Remark 4.3, the set of gates of a comparator circuit contributing to the computation on a given subset of variables can potentially involve all the gates of the original circuit.

We can, however, overcome this problem. Our key observation is the following: the idea of Nechiporuk's method may be still applicable in situations where the subcomputations badly overlap, as long as we are able to bound the size of the computation by some (not too fast growing) function of the number of occurrences of input literals. This is simply because the occurrences of input literals (or in other words the input queries) made by the computation will always be disjoint over disjoint subsets of variables.

Let us elaborate further in the case of comparator circuits. In comparator circuits, the number of wires - which, we recall, refers to the number of distinct left-to-right lines in Figure 1 - corresponds exactly to the number of occurrences of input literals in the circuit. The fact that comparator circuits cannot simulate COPY gates suggests the following question. If $F$ is a Boolean formula with $\ell$ leaves then it is easy to see that the number of internal gates $s$ of $F$ is exactly $s=\ell-1$. In other words, the size of the formula $F$ is linearly related to the number of its leaves, and this is a result of $F$ having bounded fanout. Does a similar relation hold for comparator circuits between gates and wires?

First, observe that since each gate in the comparator circuit connects two wires, in order for each wire to be connected to the output gate we must have $\ell-1 \leq s$, where $s$ is the number of gates, just like in formulas. However, is it the case that $s=O(\ell)$ ? Or even $s=O(\operatorname{poly}(\ell))$ ? Surprisingly, we are able to show that $s \leq\binom{\ell}{2}$ assuming that none of the gates in the circuit are "useless" in a specific technical sense and can be removed (cf. Section 3); furthermore, we show that this relationship is sharp. For the special case of read once comparator circuits, essentially this statement was proved implicitly by Graham [10].

Theorem 1.2. Let $C$ be any comparator circuit with $\ell$ wires and $s$ gates such that every gate in $C$ is useful. Then $s \leq \ell(\ell-1) / 2$.

It is precisely this relationship between wires and gates which allows us to overcome the "overlapping subcomputations" issue in applying Nechiporuk's method, and it seems to be a remarkable property of this model. Furthermore, our final superlinear lower bound is on the number of wires of the circuit, instead of the number of the gates. As
far as we know, our results are the first generalization of Nechiporuk's method to such a scenario.

## 2 Definitions

For any positive integer $m$ let $[m]:=\{1,2, \ldots, m\}$, and if $n$ is an integer satisfying $m \leq n$ let $[m, n]:=\{m, m+1, \ldots, n\}$. If $S$ is a set then $\wp(S)$ denotes the power set of $S$.

Throughout we will be interested in Boolean functions $f:\{0,1\}^{n} \rightarrow\{0,1\}$. If $x \in\{0,1\}^{n}$ then $x_{i}$ is the value of the $i$ th bit in $x$. If $x, y \in\{0,1\}^{n}$ are binary strings then $x \leq y$ if $x_{i} \leq y_{i}$ for all $i \in[n]$. If $x \in\{0,1\}^{n}$ then $|x|$ is the Hamming weight of $x$.

A comparator gate is the function $C:\{0,1\}^{2} \rightarrow\{0,1\}^{2}$ computing the map $(x, y) \mapsto$ $(x \wedge y, x \vee y)$. It is natural to think of a comparator gate $C$ as "sorting" the input ( $x, y$ ), as the smaller input goes to the first coordinate and the larger input goes to the second. A comparator circuit is a circuit composed of comparator gates with arbitrary Boolean literals as input. For later convenience, we use the following, alternative definition of a comparator circuit. A comparator circuit $C$ is defined by a tuple ( $W, G, I, o$ ), where

- $W$ is a set of elements called wires,
- $I: W \rightarrow\left\{x_{1}, \ldots, x_{n}, \bar{x}_{1}, \ldots, \bar{x}_{n}, 0,1\right\}$ is an initial labeling of each wire with an input literal or Boolean constant,
- $G \subseteq W^{2}$ is an ordered list of comparator gates,
- $o \in W$ is the designated output wire.

Each comparator gate $g=(i, j) \in G$ is specified by the pair of wires the gate connects: the AND output of the gate is interpreted as the first coordinate and the OR output of the gate is interpreted as the second coordinate. A comparator circuit computes a Boolean function in the natural way: given an assignment to the input literals, we evaluate the comparator gates one by one in order according to $G$, and then output the Boolean value labeled on the output wire, o. We define the size of the comparator circuit to be the number of gates in the circuit, and note that the number of wires in the circuit will also be an interesting complexity measure.

## 3 Wires vs. Gates in Comparator Circuits

Definition 3.1. Let $C$ be a comparator circuit with $m$ wires, let $g=(i, j)$ be a gate in $C$, and let $v_{i}(x), v_{j}(x)$ be the values of the wires $i, j$, respectively, on input $x$ just before applying the gate $g$. Then $g$ is useful if there is a pair of inputs $x, y$ to $C$ such that $\left(v_{i}(x), v_{j}(x)\right)=(1,0)$ and $\left(v_{i}(y), v_{j}(y)\right)=(0,1)$. If $g$ is not useful then we say it is useless.

Equivalently, if a gate $g$ is useless then for every pair of inputs $(x, y)$ either $\left(v_{i}(x), v_{j}(x)\right)=$ $\left(v_{i}(y), v_{j}(y)\right)$ or $v_{i}=v_{j}$ for one of $x, y$. The next proposition states that gates that are not useful can be removed without loss of generality.

Proposition 3.2. Let $C$ be a comparator circuit with $\ell$ wires and $s$ gates computing some Boolean function $f$. Then there exists a comparator circuit $C^{\prime}$ with (at most) $\ell$ wires and $s^{\prime} \leq s$ gates computing $f$ such that every gate in $C^{\prime}$ is useful. Moreover, the labels of the wires that are not removed remain the same in $C^{\prime}$ as in $C$.

Proof. Let $g=(i, j)$ be any gate in $C$ such that $g$ is not useful. Let $v_{i}(x), v_{j}(x)$ be the values of the wires $i, j$ before applying $g$ on input $x$. Since $g$ is not useful, then it follows that for every pair of inputs $x, y$ to the circuit either $\left(v_{i}(x), v_{j}(x)\right)=\left(v_{i}(y), v_{j}(y)\right)$ or one of $x, y$ satisfies $v_{i}=v_{j}$.

Let $O \subseteq\{0,1\}^{n}$ be the set of inputs such that for all $x \in O$, exactly one of $v_{i}(x), v_{j}(x)$ is 1 . Since $g$ is useless, it follows that either $\left(v_{i}(x), v_{j}(x)\right)=(0,1)$ for every $x$ in $O$ or $\left(v_{i}(x), v_{j}(x)\right)=(1,0)$ for every $x$ in $O$.

Assume that the outputs of $g$ are ordered so that $g(0,1)=(0,1)$ and $g(1,0)=(0,1)$. The proof for the other type of gates is symmetric. Consider the following two cases.
Case 1: For all $x \in O,\left(v_{i}(x), v_{j}(x)\right)=(0,1)$.
We claim that for all $x \in\{0,1\}^{n}$, applying $g$ does not change the value of any wire, and so $g$ can be removed without affecting the computation $C$. This is because for every $x \in\{0,1\}^{n}$, either $x \in O$ and so applying $g$ does not change any value (since $g(0,1)=(0,1))$, or $x \notin O$, and by the definition of the set $O$ we must have $v_{i}(x)=v_{j}(x)$, and so applying $g$ will not change the values of the wires.
Case 2: For all $x \in O,\left(v_{i}(x), v_{j}(x)\right)=(1,0)$.
By definition $g(1,0)=(0,1)$, and so we perform the following operation: remove $g$, and for each wire $i, j$ find the first gate $h$ using that wire as an input. If it is the same gate for both wires, then $h$ is redundant, so remove it and continue searching forward along each wire. Otherwise, let $h_{i}=(k, \ell)$ where exactly one of $k, \ell$ is $i$ and let $h_{j}=\left(k^{\prime}, \ell^{\prime}\right)$ where exactly one of $k^{\prime}, \ell^{\prime}$ is $j$. Let the wire $i$ feed into the gate $h_{j}$ and the wire $j$ feed into the gate $h_{i}$. Switching the wires this way will simulate the action of the gate $g$ that we just eliminated.

Once you have found a non-redundant gate $h$ for each of the two wires $\{i, j\}$ (or if no such gate exists), then stop.

To see that this operation does not affect the output of the circuit, consider the following two cases:

1. $v_{i}(x)=v_{j}(x)$. In this case, performing this operation will not affect the computation of the circuit as either of the $h$ gates will receive the same input, regardless of which wire they are connected to.
2. $v_{i}(x) \neq v_{j}(x)$. In this case, $\left(v_{i}(x), v_{j}(x)\right)=(1,0)$, and $g(1,0)=(0,1)$, so by removing $g$ and switching the inputs of the $h$ gates we ensure that in the new circuit the computation will be the same.

By proceeding from the inputs of the circuit to the outputs we can apply this operation to remove every useless gate in the circuit. In each case, the output of the circuit will not be affected.

Finally, notice that if a wire is not connected to any remaining gate in $C^{\prime}$, we may remove it, except if the wire is designated as the output of the circuit. The wires that are not removed, retain the same label as in the original circuit $C$, and the proof of the proposition is complete.

Now we can prove the main theorem of this section, which we restate from the Introduction.

Theorem 1.2. Let $C$ be any comparator circuit with $\ell$ wires and $s$ gates such that every gate in $C$ is useful. Then $s \leq \ell(\ell-1) / 2$.

Proof. For $j=1,2, \ldots, s$ let $g_{j}$ denote the $j$ th gate of $C$, and let $C_{j}$ be the subcircuit of $C$ consisting of the first $j$ gates of $C$. Let $C_{0}$ be the circuit $C$ with all of its gates removed. We say that two inputs $x, y \in\{0,1\}^{n}$ are useful with respect to the pair $(i, j)$ if $w_{i}(x)=1, w_{i}(y)=0$ and $w_{j}(x)=0, w_{j}(y)=1$, where $w_{i}, w_{j}$ are the outputs of wires $i$ and $j$ of the comparator circuit $C$. If there exist two inputs $x, y \in\{0,1\}^{n}$ that are useful with respect to $(i, j)$ then we say that $(i, j)$ is a useful pair.

Let $U_{j}$ denote the collection of all useful pairs in the circuit $C_{j}$. We show that the size of $U_{j}$ decreases by at least one after applying each gate: that is $\left|U_{j}\right| \leq\left|U_{j-1}\right|-1$ for all $j$. Since $\left|U_{0}\right| \leq\binom{\ell}{2}$, it follows that the number of gates in $C$ is at most $\binom{\ell}{2}$.

Let $1 \leq j \leq s$ be an integer, and suppose the gate $g_{j}$ connects two wires $(a, b)$. Since $g_{j}$ is useful we have that $(a, b)$ is a useful pair in $U_{j-1}$. It is also clear that $(a, b) \notin U_{j}$, as applying the gate $g_{j}$ removes all useful inputs with respect to $(a, b)$. However, we are not finished, as applying the gate $g_{j}$ could have introduced a new useful pair $(c, d)$ : so, $(c, d) \in U_{j}$ and $(c, d) \notin U_{j-1}$. The proof will be complete once we prove the following claim, which states that if such a new useful pair is introduced, then there must exist another distinct useful pair that was removed.

Claim. Suppose there exists a useful pair $(c, d) \in U_{j}$ such that $(c, d) \notin U_{j-1}$. Then there is another useful pair $(\alpha, \beta)$, uniquely associated with $(c, d)$, such that $(\alpha, \beta) \neq(a, b)$, $(\alpha, \beta) \in U_{j-1}$, and $(\alpha, \beta) \notin U_{j}$.
Proof of Claim. It is clear that while $(c, d) \neq(a, b)$, it cannot be the case that $c, d \notin$ $\{a, b\}$ : that is, at least one of $c$ or $d$ must be equal to $a$ or $b$. This is because applying the gate $g_{j}$ only modifies the outputs of wires $a$ and $b$, and so any new useful pair must include one of these modified outputs.

First assume that $c=a$ (and we note that the other cases $(c=b, d=a, d=b$ ) follow by nearly identical proofs). It follows that $(a, d) \in U_{j}$ and $(a, d) \notin U_{j-1}$. We show that this implies that $(b, d) \in U_{j-1}$ and $(b, d) \notin U_{j}$. For any wire $t$ let $v_{t}$ denote the output of the $t$ th wire before applying the gate $g_{j}$, and let $w_{t}$ denote the output of the $t$ th wire after applying the gate $g_{j}$. Since $(a, d)$ is a useful pair it follows that there exists a pair of inputs $x, y \in\{0,1\}^{n}$ such that $w_{a}(x)=1, w_{d}(x)=0$ and $w_{a}(y)=0, w_{d}(y)=1$. Now, since $(a, d) \notin U_{j-1}$ we can deduce the values of $v_{a}, v_{b}$, and $v_{d}$ on inputs $x$ and $y$. It is
obvious that $v_{d}(x)=w_{d}(x)$ and $v_{d}(y)=w_{d}(y)$ as the gate $g_{j}$ is not connected to the wire $d$. We can also conclude that $v_{a}(x)=v_{b}(x)=1$, as the gate $g_{j}$ connects $a$ to $b$ and $w_{a}(x)=1$ (for if $v_{b}(x)=0$, then $w_{a}(x) \neq 1$, as the 1 would have been moved to the $\vee$ output of $g_{j}$ ). Finally, since $(a, d) \notin U_{j-1}$ we know that $v_{a}(y)=1$ and $v_{b}(y)=0$. For if $v_{a}(y)=0$, it would follow that $(a, d)$ is a useful pair in $U_{j-1}$ (a contradiction). Thus $v_{a}(y)=1$, and since $w_{a}(y)=0$, the only possibility is that the gate $g_{j}$ moved a 1 from wire $a$ to wire $b$ on input $y$. We now record the values of $v_{a}, v_{b}, v_{d}$ and $w_{a}, w_{b}, w_{d}$ on inputs $x$ and $y$ :

|  | $x$ | $y$ |
| :---: | :---: | :---: |
| $v_{a}$ | 1 | 1 |
| $v_{b}$ | 1 | 0 |
| $v_{d}$ | 0 | 1 |
| $w_{a}$ | 1 | 0 |
| $w_{b}$ | 1 | 1 |
| $w_{d}$ | 0 | 1 |

By examining the table it is easy to see that $(b, d)$ is a useful pair in $U_{j-1}$, as $v_{b}(x)=1, v_{d}(x)=0$ and $v_{b}(y)=0, v_{d}(y)=1$. We show that $(b, d) \notin U_{j}$, proving the claim in this case.

To see this, suppose that $(b, d) \in U_{j}$. Then there must exist an input $z$ such that $w_{b}(z)=0, w_{d}(z)=1$, on which we can similarly deduce the values of $v_{a}, v_{b}, v_{d}$. Clearly $v_{d}(z)=1$ since the gate $g_{j}$ connects the two wires $(a, b)$. Since $w_{b}(z)=0$, it must be that $v_{a}(z)=v_{b}(z)=0$. However, this means that $v_{a}(z)=0, v_{d}(z)=1$, and from the table we can see that $v_{a}(x)=1, v_{d}(x)=0$. This means that $(a, d)$ is a useful pair in $U_{j-1}$, a contradiction!

We make a final remark on the uniqueness property. We technically have four cases to prove here, as the new useful pair must be in one of the following forms:

1. $(a, d)$
2. $(b, d)$
3. $(c, a)$
4. $(c, b)$

In the proof above we showed that if the new useful pair is of the form $(a, d)$, then this implies that $(b, d) \in U_{j-1}$ and $(b, d) \notin U_{j}$. In the other three cases (which proceed by identical proofs), other uniquely associated pairs are introduced. In general, if $\alpha \in\{a, b\}$ and $\beta \in\{a, b\}, \beta \neq \alpha$, and the new useful pair is of the form $(\alpha, d)$ then the same proof shows that $(\beta, d)$ is in $U_{j-1}$ and not in $U_{j}$. Similarly, if the new useful pair is of the form $(c, \alpha)$ then a similar proof shows that $(c, \beta)$ is in $U_{j-1}$ and not in $U_{j}$. These facts together prove uniqueness.

To see that the upper bound given in the previous theorem is sharp, consider the following comparator circuit $C_{n}$ which sorts its $n$ inputs via the "bubble sort" method: the circuit incrementally bubbles the largest value from the top wire down as far as it can go, and then the second wire, and so on. It is not hard to see that the resulting circuit has $n$ wires and $\binom{n}{2}$ gates.

## 4 Lower Bound for Element Distinctness

Let $n=2 m \log m$, and recall the definition of the Element Distinctness function $E D_{n}$ : it takes $n=2 m \log m$ input bits divided into $m$ blocks of $2 \log m$ bits each, interpreted as $m$ integers in the range $\left\{1, \ldots, m^{2}\right\}$, and decides whether all $m$ numbers are distinct. In this section we prove our main lower bound, restated here for convenience:

Theorem 1.1. The size of any comparator circuit computing the $n$-bit Element Distinctness function $E D_{n}$ is at least $\Omega\left((n / \log n)^{3 / 2}\right)$.

We note that the lower bound holds for any function with a similar number of subfunctions; for instance, the Indirect Storage Access function [29].

We will need the following Lemma. Recall that in a comparator circuit, each of the wires is labeled with either a constant 0,1 , some input variable or its negation.

Lemma 4.1. Let $\ell \geq 1$ be an integer. For any fixed labeling of $\ell$ wires, the number of different Boolean functions that can be computed by comparator circuits with $\ell$ wires with the given labeling is at most $\ell^{\ell^{2}}$.

Proof. By Proposition 3.2, every comparator circuit $C$ with $\ell$ wires is equivalent to another comparator circuit $C^{\prime}$ with (at most) $\ell$ wires that has no useless gates. By Theorem 1.2 the number of gates of $C^{\prime}$ is at most $\binom{\ell}{2}$. Recall also that $C^{\prime}$ keeps the same labeling of the wires (that are not removed) as $C$; and if a wire is not connected to any remaining gate in $C^{\prime}$, we may remove it, except if the wire is designated as the output of the circuit.

To prove the lemma it is enough to estimate the number of different comparator circuits with at most $\ell$ wires of a given labeling and at most $\binom{\ell}{2}$ gates. For each gate, there are at most $\binom{\ell}{2}$ choices for the pair of wires it takes as inputs, and two choices for the ordering of the $\wedge$ output and $\vee$ output of the gate. In addition, we have at most $\ell$ choices to designate one of the wires as the output. Thus the number of possible such comparator circuits is at most $\ell \cdot\left(2\binom{\ell}{2}\right)^{\binom{\ell}{2}} \leq(\ell)^{\left(\ell^{2}\right)}$.

Remark 4.2. We can also bound the number of different Boolean functions on $n$ variables that are computed by comparator circuits with $s$ gates by $(2(n+1) s)^{2 s}$ using a similar counting argument. Note that here we do not assume a fixed labeling, and we use that the number of wires is at most the number of gates, assuming that each wire is connected to the output gate.

If the subsets of gates used in subcircuits over disjoint subsets of inputs would not overlap much, then a straightforward application of Nechiporuk's method seemingly would
yield nearly quadratic lower bounds using this counting argument. However, as we discuss in some more details in Remark 4.3, the subsets of gates can badly overlap. Thus, even though a counting argument in terms of gates is available, Nechiporuk's argument is not applicable directly to the gates of comparator circuits.

Note that we could have also stated a similar bound for comparator circuits with $\ell$ wires and $n$ input variables, without considering a fixed labeling of the wires. We find the current version of Lemma 4.1 more convenient for our purposes. We are now ready to prove Theorem 1.1.

Proof. (of Theorem 1.1) We prove the stronger statement, that the number of wires of any comparator circuit computing $E D_{n}$ is at least $\Omega\left((n / \log n)^{3 / 2}\right)$. Recall that the size of a comparator circuit is the number of its gates, and the number of gates in a comparator circuit with $w$ wires, where each wire is connected to the output gate, is at least $w-1$. Thus, a lower bound on the number of wires implies lower bounds on the size of the comparator circuit.

Partition the $n=2 m \log m$ input variables into $m$ groups of $2 \log m$ variables each, such that the variables in the $i$-th group represent the $i$-th integer in the input. For $i=1, \ldots, m$ let $S_{i}$ be the set of variables in the $i$-th group. Let $N_{i}$ denote the number of different subfunctions over the variables in $S_{i}$ that can be obtained by fixing all variables outside $S_{i}$ to constants. It is known (see e.g. [14]) that $N_{i}=2^{\Omega(n)}$ for each $i=1, \ldots, m$.

Let $C$ be a comparator circuit computing $E D_{n}$ with $w$ wires. Let $w_{i}$ denote the number of wires of $C$ labeled by a variable from the $i$-th group. Then $w=\sum_{i=1}^{m} w_{i}$.

For a given $i \in[m]$, consider a fixed assignment $\alpha$ of constants 0 or 1 to all variables outside of $S_{i}$. Consider the resulting comparator circuit $C_{i, \alpha}$ over the variables in $S_{i}$. Applying Proposition 3.2 to $C_{i, \alpha}$ we can obtain a comparator circuit $C_{i, \alpha}^{\prime}$ over variables from $S_{i}$ with no useless gates. Notice that if a wire is labeled by constant 0 or 1 then any gate directly using this wire must be useless (in the formal sense of Definition 3.1). Note also that after removing all useless gates, wires with constant label are not connected to any remaining gates. Thus, they can be removed, except when designated as the output wire. Note however, that if a wire with constant label that is not connected to any gate is designated as the output wire, then the function computed is constant 1 or 0 . Thus, unless the function computed by $C_{i, \alpha}^{\prime}$ is constant, all wires in $C_{i, \alpha}^{\prime}$ are labeled by variables or their negation from $S_{i}$, regardless of the particular assignment $\alpha$.

This fact has two important consequences for us. First, it means that the number of wires of $C_{i, \alpha}^{\prime}$ is at most $w_{i}$. Second, it means that for given $i$, unless the function computed by $C_{i, \alpha}^{\prime}$ is constant, the wires of $C_{i, \alpha}^{\prime}$ have the same labels (by variables or negated variables from $S_{i}$ ) regardless of the particular assignment $\alpha$. (To see this, recall that in Proposition 3.2 the labels of the wires that are not removed remain the same as in the original circuit.)

This allows us to conclude using Lemma 4.1 that $N_{i} \leq 2+w_{i}^{\left(w_{i}^{2}\right)}$ for $i=1, \ldots, m$. Thus, we have for $i=1, \ldots, m$ that

$$
w_{i}^{2} \geq \frac{\log \left(N_{i}-2\right)}{\log w_{i}} \geq \frac{\Omega(n)}{\log w_{i}}
$$

Note that if $\log w_{i}>\frac{1}{2} \log n$ then $w_{i} \geq \sqrt{n}$. On the other hand, if $\log w_{i} \leq \frac{1}{2} \log n$ then we get $w_{i}^{2} \geq \frac{\Omega(n)}{\log n}$. Thus, for $i=1, \ldots, m$ we have $w_{i} \geq \Omega\left(\frac{\sqrt{n}}{\sqrt{\log n}}\right)$, which yields $w \geq \Omega\left((n / \log n)^{3 / 2}\right)$.

Remark 4.3. It is crucial in the above argument that the set of wires used by the subcircuits $C_{i, \alpha}^{\prime}$ is the same for fixed $i$ regardless of the assignment $\alpha$, and that these sets do not overlap for different values of $i$. One could try to consider a similar argument directly for gates instead of wires. For instance, one could define $G_{i, \alpha}$ as the set of gates participating in the circuit $C_{i, \alpha}^{\prime}$, and consider $G_{i}=\cup G_{i, \alpha}$. But for different assignments $\alpha$, the circuits $C_{i, \alpha}^{\prime}$ may retain different gates of the original circuit, and the sets $G_{i}$ may badly overlap. In particular, for some values of $i, G_{i}$ may contain all gates of the original circuit.

## 5 Conclusion and Future Work

In this paper we have proved the first superlinear lower bound on the size of comparator circuits computing an explicit Boolean function. As we have remarked above, we actually prove a superlinear lower bound on the number of wires, or equivalently, input queries, of any comparator circuit for $E D_{n}$, which is stronger than a lower bound on the number of gates. Furthermore, by our Theorem 1.2, there is at most a quadratic separation between the number of wires and the number of gates in any minimal comparator circuit. A natural problem is to try and prove a lower bound on the number of gates directly. However, as we discuss above in Remark 4.3 this would require a different technique.

We remark that it seems quite difficult to apply restriction techniques to obtain wire lower bounds for comparator circuits. This is for a simple reason: observe that restricting one input to a single comparator will restrict exactly one output of the gate and re-wire the other input to the other output. This implies that if we have a comparator circuit with $m$ wires, and we restrict values to $t$ of them, then we are left with a new comparator circuit with exactly $m-t$ unrestricted wires after propagating this rewiring process. Note that some wires could possibly be removed if they are "separated" from the output gate, but, if the topology of the circuit is highly connected (e.g. is an expander) then we should expect this to be very unlikely.

Finally, we remark on a second natural open problem. As we discuss in the introduction, the key structural property of comparator circuits that enabled us to apply Nechiporuk's method to comparator circuits is Theorem 1.2, relating the number of wires to the number of gates. The crucial intuition in the proof of this Theorem is that comparator circuits cannot copy intermediate computations. There is a rich structure of circuit classes extending comparator circuits which cannot copy intermediate computations, as explored by Subramanian [25]. Can one extend any of our results to these more general classes?

## References

[1] Miklós Ajtai, János Komlós, and Endre Szemerédi. An $O(n \log n)$ sorting network. In the proceedings of STOC 1983, 1-9.
[2] Kenneth E. Batcher. Sorting networks and their applications. Proceedings of AFIPS Spring Joint Comput. Conf. 32:307-314, 1968.
[3] A. Beimel, A. Gál and M. Paterson: Lower bounds for monotone span programs. Computational Complexity 6 (1996/97), pp. 29-45. Preliminary version in Proc. 36th IEEE FOCS, Milwaukee WI 1995, pp. 674-681.
[4] A. Bogdanov. Small bias requires large formulas. Electronic Colloquium on Computational Complexity (ECCC), 24:91, 2017.
[5] R. Chen, V. Kabanets, A. Kolokolova, R. Shaltiel, and D. Zuckerman. Mining circuit lower bound proofs for meta-algorithms. Computational Complexity, 24(2): pp. 333-392, 2015.
[6] Stephen A. Cook, Yuval Filmus, and Dai Tri Man Le. The complexity of the comparator circuit value problem. ACM Trans. on Computation Theory 6(1) (2014).
[7] I. Dinur and O. Meir. Toward the KRW composition conjecture: Cubic formula lower bounds via communication complexity. In 31st Conference on Computational Complexity, CCC 2016, May 29 to June 1, 2016, Tokyo, Japan, pp. 3:1-3:51, 2016.
[8] A. Gál, A. Tal, A. Trejo Nunez. Cubic Formula Size Lower Bounds Based on Compositions with Majority, In Proceedings of ITCS 2019, pp. 35:1-35:13 (2019).
[9] Michael T. Goodrich. Zig-zag sort: a simple deterministic data-oblivious sorting algorithm running in $O(n \log n)$ time. In the proceedings of STOC 2014, 684-693.
[10] R. J. Graham. A mathematical study of a model of magnetic domain interactions Bell. Syst. Tech. Journal, 49, No. 8, pp. 1627-1644, 1970.
[11] Johan Håstad. Almost optimal lower bounds for small depth circuits. In the proceedings of STOC 2000, 378-387.
[12] Johan Håstad. The Shrinkage Exponent of de Morgan Formulas is 2. SIAM Journal on Computing, 27(1): 48-64, 1998.
[13] Kazuo Iwama and Hiroki Morizumi. An explicit lower bound of $5 n-o(n)$ for Boolean circuits. MFCS 2002, 353-364, 2002.
[14] Stasys Jukna. Boolean function complexity. Springer-Verlag, Berlin-Heidelberg, 2012.
[15] Maurico Karchmer and Avi Wigderson. On span programs. In Proc. 8th Ann. Symp. Structure in Complexity Theory, IEEE 1993, pp. 102-111.
[16] V.M. Khrapchenko (1971). A method of obtaining lower bounds for the complexity of $\Pi$-schemes, Math. Notes Acad. of Sci. USSR 10 (1972) 474479.
[17] Donald Knuth. The art of computer programming, volume 3: sorting and searching. Addison-Wesley, 1997.
[18] I. Komargodski and R. Raz. Average-case lower bounds for formula size. In Symposium on Theory of Computing Conference, STOC'13, Palo Alto, CA, USA, June 1-4, 2013, pages 171-180, 2013.
[19] I. Komargodski, R. Raz, and A. Tal. Improved average-case lower bounds for de morgan formula size: Matching worst-case lower bound. SIAM Journal on Computing, 46(1):37-57, 2017.
[20] E.W. Mayr and A. Subramanian. The complexity of circuit value and network stability. Journal of Computer and System Sciences, 44(2):302323, 1992.
[21] E. I. Nechiporuk. A Boolean function Doklady of the Academy of Sciences of the USSR (in Russian) 169(4), 765-766, 1966. English translation in Soviet Mathematics Doklady, 7(4) 999-1000.
[22] A. A. Razborov. Lower bounds on the monotone complexity of some Boolean function. Soviet Math. Dokl. 31:354-357 (1985).
[23] Robert Robere, Manuscript, 2015.
[24] Robert Robere, Toniann Pitassi, Benjamin Rossman, Stephen A. Cook. Exponential Lower Bounds for Monotone Span Programs. In Proceedings of FOCS 2016, 406415.
[25] Ashok Subramanian. The computational complexity of the circuit value and network stability problems. PhD Thesis, Stanford University, 1990.
[26] Avishay Tal. Shrinkage of de morgan formulae by spectral techniques. In 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2014, Philadelphia, PA, USA, October 18-21, 2014, pages 551-560, 2014.
[27] Avishay Tal. Formula lower bounds via the quantum method. In Proceedings of the 49th Annual ACM Symposium on Theory of Computing, (STOC 2017), 1256-1268, 2017.
[28] D. Uhlig. Boolean functions with a large number of subfunctions and small complexity and depth. In the Proceedings of the 8th FCT, 395404, 1991.
[29] Ingo Wegener. The complexity of Boolean functions. Wiley-Teubner Series in Computer Science, 1987.


[^0]:    *Part of this work was done while visiting the Simons Institute for the Theory of Computing in Berkeley.
    ${ }^{\dagger}$ University of Texas at Austin, Email: panni@cs.utexas.edu
    ${ }^{\ddagger}$ DIMACS at Rutgers University and Institute for Advanced Study in Princeton, Email: robere@cs.toronto.edu
    ${ }^{1}$ Pudlák's result is unpublished, and referenced by Karchmer and Wigderson in their paper [15].

[^1]:    ${ }^{2}$ We note that this differs from the standard usage of the word "wires" in circuit complexity, which is usually used to mean the edges in the underlying directed graph of the circuit.

