Binary search trees are a fundamental data structure and their height
plays a key role in the analysis of divide-and-conquer algorithms like
quicksort. Their worst-case height is linear; their average height,
whose exact value is one of the best-studied problems in average-case
complexity, is logarithmic. We analyze their smoothed height under
additive uniform noise: An adversary chooses a sequence of n real numbers in the
range [0,1]; each number is individually perturbed by adding a value drawn
uniformly at random from an interval of size d; and the resulting numbers are inserted
into a search tree. The expected height of this tree is called smoothed
tree height. If d is very small, namely for d < 1/n, the smoothed tree
height is the same as the worst-case height; if d is very large, the
smoothed tree height approaches the logarithmic average-case height. An
analysis of what happens between these extremes lies at the heart of our
paper: We prove that the smoothed height of binary search trees is
$\Theta (\sqrt{n/d} + \log n)$, where $d \ge 1/n$ may depend on n. This
implies that the logarithmic average-case height becomes manifest only
for $d \in \Omega (n/\log^2 n)$. For the analysis, we first prove that
the smoothed number of left-to-right maxima in a sequence is also
$\Theta (\sqrt{n/d} + \log n)$. We apply these findings to the
performance of the quicksort algorithm, which needs $\Theta(n^2)$
comparisons in the worst case and $\Theta(n \log n)$ on average, and
prove that the smoothed number of comparisons made by quicksort is
$\Theta\bigl(\frac{n}{d+1} \sqrt{n/d} + n \log n\bigr)$. This implies that the
average-case bound becomes manifest already for
$d \in \Omega\bigl(\sqrt[3]{n/\log^2 n}\bigr)$.
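
To see where these two thresholds come from, the following quick calculation simply unpacks the stated bounds (it is not a separate result of the analysis): the average-case terms dominate, up to constant factors, exactly when
\[
  \sqrt{n/d} \le \log n \iff d \ge \frac{n}{\log^2 n},
  \qquad
  \frac{n}{d+1}\sqrt{n/d} \le n \log n \iff \sqrt{n} \le \sqrt{d}\,(d+1)\log n,
\]
and for $d \ge 1$ the latter condition is equivalent, again up to constant factors, to $d \in \Omega\bigl(\sqrt[3]{n/\log^2 n}\bigr)$.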