
The performance of the whole Quicksort algorithm depends on how well selectpivot does at picking a good pivot point. The worst case for the given selectpivot procedure occurs when the data is already sorted. The higher of the first two elements in a data partition will simply divide the range into a one-element left side and an (n-1)-element right side. This causes the Quicksort to run in quadratic time, $n^2$ to be precise.
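The selectpivot procedure itself is not reproduced here, so the sketch below is a minimal Python model of the "higher of the first two elements" rule, with illustrative function names. Counting comparisons on already-sorted input exposes the quadratic behavior:

```python
def quicksort(a):
    """Quicksort using the 'higher of the first two elements' pivot rule.

    Returns (sorted list, comparison count). Assumes distinct keys; this
    is an illustrative model, not the text's actual selectpivot code.
    """
    if len(a) <= 1:
        return list(a), 0
    pivot = max(a[0], a[1])                  # 1 comparison to pick the pivot
    left = [x for x in a if x < pivot]
    right = [x for x in a if x >= pivot]     # pivot goes with the right side
    sorted_left, cl = quicksort(left)
    sorted_right, cr = quicksort(right)
    # len(a) comparisons were made against the pivot during partitioning
    return sorted_left + sorted_right, 1 + len(a) + cl + cr

# On sorted input each pass strips off only one element from the left side.
_, c100 = quicksort(list(range(100)))
_, c200 = quicksort(list(range(200)))
```

Doubling the input size roughly quadruples the comparison count, which is the signature of quadratic running time.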

Another limitation of the Quicksort is that, while it tends to be an excellent choice for large arrays, it performs badly on very small ones.

The best and average case running efficiency of a Quicksort is $\Theta(n \log_2 n)$.

Quicksorting a single element takes no comparisons. That is:

T(1) = 0

Now, in order to Quicksort n elements we have to select a pivot point, partition the n elements, and recurse on the two partitions. Assume the $i$th element is chosen to be the pivot point (where $1 \leq i \leq n$). Partitioning the n elements will take, at most, (n-1) comparisons, at which point the Quicksort will recurse on both of the two partitions. One partition will consist of the first i-1 elements and the other will contain the remaining n-i elements (assuming the pivot value itself goes with the right partition). Thus, to sort n items we need, at most, (n-1) comparisons in the partitioning phase plus however long it takes to sort each resulting partition:

T(n) = (n-1) + T(i-1) + T(n-i)
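As a sanity check, if the pivot always falls at one end of the range (i = 1, the sorted-input worst case described above), the recurrence telescopes and confirms the quadratic behavior:

\begin{displaymath}T(n) = (n-1) + T(0) + T(n-1) = \sum_{k=1}^{n-1} k = \frac{n(n-1)}{2}\end{displaymath}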

However, the above recurrence relation cannot be simplified in its stated form; to continue, more information about the selection of the pivot point is needed. Assume that each of the n positions is equally likely to be chosen as the pivot point. The above expression can then be rewritten in the form below. Note that the terms T(i-1) and T(n-i) have been summed over all n possible values of i and then divided by n:

\begin{displaymath}T(n) = (n-1) + \frac{1}{n}\sum_{i=1}^{n} \left(T(i-1) + T(n-i)\right)\end{displaymath}

The summation in the above expression can be distributed to both T(x) terms giving:

\begin{displaymath}T(n) = (n-1) + \frac{1}{n}\sum_{i=1}^{n} T(i-1) + \frac{1}{n}\sum_{i=1}^{n} T(n-i)\end{displaymath}


Both sums run over the same values, T(0) through T(n-1) (substitute j = i-1 in the first and j = n-i in the second), so they combine into:

\begin{displaymath}T(n) = (n-1) + \frac{2}{n}\sum_{i=0}^{n-1} T(i)\end{displaymath}
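The index symmetry used to combine the two sums is easy to verify numerically; a small sketch, using arbitrary placeholder values for T since only the indices matter:

```python
# Verify: sum_{i=1}^{n} (T(i-1) + T(n-i)) == 2 * sum_{i=0}^{n-1} T(i)
n = 7
T = [0, 0, 1, 4, 9, 16, 27]          # placeholder values; any will do
lhs = sum(T[i - 1] + T[n - i] for i in range(1, n + 1))
rhs = 2 * sum(T[i] for i in range(n))
assert lhs == rhs
```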

This is a full-history recurrence relation and is solved by subtracting T(n) from T(n+1). First, calculate T(n+1) by substituting (n+1) for every n in the above equation:

\begin{displaymath}T(n+1) = ((n+1)-1) + \frac{2}{n+1} \sum_{i=0}^{n} T(i)\end{displaymath}

These summations cannot be subtracted directly because of their leading fractions. To proceed, multiply each equation by the denominator of its fraction to get:

\begin{displaymath}(n+1)\,T(n+1) = n(n+1) + 2 \sum_{i=0}^{n} T(i)\end{displaymath}

\begin{displaymath}n\,T(n) = n(n-1) + 2 \sum_{i=0}^{n-1} T(i)\end{displaymath}

Now subtract the latter from the former:

(n+1)T(n+1) - nT(n) = (n+1)n - n(n-1) + 2T(n) = 2n + 2T(n)

This implies that:

\begin{displaymath}T(n+1) = \frac{n+2}{n+1}\,T(n) + \frac{2n}{n+1}\end{displaymath}
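Before any approximation is made, this reduced recurrence can be checked against the full-history one; a sketch using exact rational arithmetic (the function name is illustrative):

```python
from fractions import Fraction

def T_full(n, memo={0: Fraction(0), 1: Fraction(0)}):
    """Full-history recurrence: T(n) = (n-1) + (2/n) * sum(T(0..n-1))."""
    if n not in memo:
        memo[n] = (n - 1) + Fraction(2, n) * sum(T_full(i) for i in range(n))
    return memo[n]

# Check T(n+1) = ((n+2)/(n+1)) T(n) + 2n/(n+1) exactly, for a range of n.
for n in range(1, 30):
    reduced = Fraction(n + 2, n + 1) * T_full(n) + Fraction(2 * n, n + 1)
    assert T_full(n + 1) == reduced
```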

This relation is still tricky to solve. To proceed, substitute the value 2 for the fraction $\frac{2n}{n+1}$, a close approximation for large n. Also reverse the order of the terms being added:

\begin{displaymath}T(n+1) = 2 + \frac{n+2}{n+1}\,T(n)\end{displaymath}

So expanding this relation:

\begin{displaymath}T(n) = 2 + \frac{n+1}{n}\left(2 + \frac{n}{n-1}\left(2 + \frac{n-1}{n-2}\left(\cdots\left(2 + \frac{4}{3}\,T(2)\right)\cdots\right)\right)\right)\end{displaymath}

where $T(2) = 2 + \frac{3}{2}T(1) = 2$.

That is:

\begin{displaymath}T(n) = 2\left(1 + \frac{n+1}{n} + \frac{n+1}{n-1} + \cdots + \frac{n+1}{3}\right)\end{displaymath}


T(n) = 2(n+1) (H(n+1)-1.5)

Where H(n) above is the nth harmonic number $(1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots + \frac{1}{n})$. The harmonic number can be approximated as:

\begin{displaymath}H(n) = \ln(n) + \gamma + O(1/n)\end{displaymath}

Where $\gamma = 0.577...$ is Euler's constant. The approximate solution to the average case complexity of the Quicksort is:    

\begin{displaymath}T(n) \leq 2(n+1)(\ln(n) + \gamma - 1.5) + O(1)\end{displaymath}

This is an $O(n \log_2 n)$ solution.
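The cost of the $\frac{2n}{n+1} \approx 2$ substitution can be checked numerically: the closed form overshoots the exact recurrence by only a linear amount, so both grow as roughly $2n \ln n$. A sketch, with illustrative function names:

```python
def T_recurrence(n):
    """Exact average-case recurrence, evaluated iteratively (n >= 2)."""
    T = [0.0, 0.0]                        # T(0) = T(1) = 0
    for m in range(2, n + 1):
        T.append((m - 1) + 2.0 * sum(T) / m)
    return T[n]

def T_closed(n):
    """The closed form derived above: 2(n+1)(H(n+1) - 1.5)."""
    H = sum(1.0 / k for k in range(1, n + 2))
    return 2 * (n + 1) * (H - 1.5)

# The two agree up to a term linear in n, which is small next to 2n ln n,
# so their ratio stays close to 1 for large n.
ratio = T_closed(500) / T_recurrence(500)
```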

Scott Gasch