Details of Average Quicksort Calculation

We will be comparing several sorts whose average number of key comparisons is Θ(n log n) or equivialently Θ(n lg n), and since sorts are so important, and key comparisons such an obvious thing to count, let us for a bound for quicksort:  A(n)   ≤ c n ln n.  The idea of guessing a constant c and then solving for it, is generally useful.  This particular algorithm is so important that I am using it as an example, even though there is a calculus step that I would NOT expect of you.

We consider the case where distinct keys are in random order, and the element chosen as pivot is not based on any comparisons of keys, i.e. always first or always middle element. Since that element is random, it could end up in any index equally likely after sorting, each with probability 1/n.

The power of algebra is that you can manipulate symbols standing for some number, even if you do not yet know what the number is.  That is essential to the technique used here.  Let us prove by induction that the average number of key comparisons for versions of the algorithm discussed above, A(n)  satisfies

A(n)   ≤ c n ln n for some c, and all n >0

The deriviation will lead us to a value of c that works!

Basis:  Immediately true for n = 1 and any c (0 = 0)

Induction step:  Suppose n > 1,  and the hypothysis is true for i in 0 < i < n.  We need to prove
A(n)   ≤ c n ln n for some c.  

Nonrecursive key comparisons all come in the partition step, which compares the pivot to all other elements: n-1 comparisons exactly.  As discussed in the initial assumption about randomness, the final position of the pivot is equally likely to be at any index, each with probabilityt 1/n.  Adding all of these up for the expected value, all of the sizes from 0 to n-1 appearing twice (at the begining or end of the array), and sizes 0 and 1 contributes nothing, so we get the following, using the induction hypothysis for the inequality:
A(n) = n-1 +  (2/n)
A(i) ≤ n-1 +  (2/n)
i = 2
c i ln i

For a further inequality, switch to the real function x ln x, which is increasing for x ≥ 1, so we can compare to a definite integral.  (Do not worry about the calculus - assume you have mathematica to solve it...)

i = 2
 i ln i  ≤ 
x ln x dx  =
{(x2/2) ln x  - x2/4}
 =  (n2/2) ln n  - n2/4 - (2 ln 2 - 1)

 Substituting this in above we get

A(n) ≤ n-1 + (2c/n)[(n2/2) ln n  - n2/4 - (2 ln 2 - 1)]

and then collecting the different orders of n:

= cn ln n  + n(1 - c/2) -  1 - (4c ln 2)/n

≤ cn ln n  + n(1 - c/2)  for any positive c, and

≤ cn ln n  for any c ≥ 2.

We did this deriviation with an unknown c throughout, and the same value of c works for each n.  You could repeat from the top with c = 2 everywhere, and everything would follow the same way and we would end up proving by induction

A(n)   ≤ 2 n ln n, for all n >0.  

In terms of the more common log in algorithms, lg  or log base 2:

A(n)   ≤ 2 ln 2 n lg n ≈ 1.386 n lg n

For mergesort, the coefficient is 1, but quicksort is generally faster.

Back to More Sorts and Divide and Conquer.