## More on Sorting: HeapSort, Radix Sort, Sort Comparison, Sort Analysis

### Heapsort (6.4)

In class: Heap definition, reason for shape - array implementation.

Assume D is the depth of the heap;  n elements; D = ceiling(lg (n+1))

FixHeap:  if have a heap, you know the largest element, and can remove it.
Easy way to make a new heap?
Plug the hole with the last element, fix the order with fixheap
comparisons with i left in heap:  2 lg i + O(1):
total
 n-1 ∑ i=1
[2 lg i + O(1)] = 2n lg n  +  O(n)

Initial heap creation with  heapify: fix from bottom up
number of key comparisons?
obviously element placed in  <= 2D steps so all in n*2D = 2n lg n
but D steps possible in only 1 case; look closer

We could also create the heap from top down, keeping on adding one more element at the bottom, which may need to bubble up.  Each step in bubbleUp only takes one comparison to a parent, rather than the 2 in fixHeap to settle a parent down.  All n nodes might need to be fixed, and the maximum number of comparisons for any one is now 1*lg n.  Which approach is better?

A java version is below.  A similar Python version is here.

public class Heapsort
{
public static void heapsort(int[ ] data, int n){
// sort first n elements of data array via heapsort; 0 <= n < data.length
// first make heap
for (int r = (n-2)/2; r >= 0; r -= 1)
// work backwards from last parent node = (last index - 1)/2
heapify(data, r, n);
// sort heap by swapping biggest element to end of list,
//    and make heap again of the rest
while (n > 1){
n -= 1;
int temp = data[n]; // swap last unsorted element with top
data[n] = data;
data = temp;
heapify(data, 0, n);
}
}

private static void heapify(int[ ] data, int root, int n)
// let out-of-place element at index root
//    settle down through tree to make a legal heap
// start assuming any children of root are at tops of legal heaps
// all indices of heap elements have index < n
{
int left = 2*root + 1;
while(left < n) {  // while root is not a leaf
int right = left+1;
int rootdata = data[root]; // need a copy for swaps
if (right == n) { // right past end, only one leaf
if (data[left] > rootdata) {// swap if smaller at root
data[root] = data[left];
data[left] = rootdata;
}
return;  // already fixed child, which is the final leaf
}
else { // two leaves
int bigger = right; // find bigger of two children
if (data[left] > data[bigger])
bigger = left;
if (rootdata >= data[bigger])
return;  // OK, now is legal heap below
// else root out of place
data[root] = data[bigger]; //   swap root with bigger child
data[bigger] = rootdata;
root = bigger;  // descend in heap for next loop
left = 2*root + 1;
}
}
}

There are a variety of online animations for all the basic sorts, for example

Count comparisons (comp.) in heap creation:
For simplicity assume complete tree.  n = 2D+1 - 1:
depth D   2D elements move 0 levels, 0 comp. max for each,  tot 0
depth D-1   2D-1 elements move 1 levels, 1*2 comp. max for each,  tot 2*1*2D-1
depth D-2   2D-2 elements move 2 levels, 2*2 comp. max for each,  tot 2*2*2D-2
depth D-3   2D-3 elements move 3 levels, 3*2 comp. max for each,  tot 2*3*2D-3
...
depth 0   2D-D elements move D levels, D*2 comp. max for each,  tot 2*D*20

grand total
 D-1 ∑ d=0
2(D-d)2d =

 D-1 ∑ d=0
[2D2d - 2d2d] = 2D(2D - 1)  -  2*[(D-2)2D + 2] = 4*2D -  2D - 4 = 2n  -  O(lg n)

!! Smaller order than n lg n

Total comparisons for sort, to build heap and take apart is 2n lg n + O(n)

Creating the heap with the bubbleUp approach - could that worst case actually be O(n), too?  No:  the bottom most full row by itself has about n/2 elements, and each might have to bubble up through lg n  - 1 levels, producing at least Θ(n lg n).

Fine point -  Accelerated Heapsort - can skip:
Accelerated Heapsort, accelerated fixHeap after max of the heap is deleted, and the nth element needs to be inserted somewhere.  This is a curious variation on binary search:
Call the height of the tree h = ⌈lg(n+1)⌉
guess the nth element to be inserted fits in d = h/2 steps below the root, promote larger of two siblings until that far down (1  comparison  per level, h/2 total)
if go too low, bubble up at most h/2 levels, total h/2 down, h/2 up = h
else  do accelerated heapsort on subtree half way down, of height h/2

worse case is when it needs to go all the way to the bottom, down h levels, and after each promotion, checking to see if you are done requires a second comparison on that level, one more than accounted for so far.  The worse case involves promotions all the way to the bottom, calling promotion at most lg h times, so the worst case total number of comparison to delete the maximum element and fix the heap is h + lg h ≈ lg(n+1) + lg lg (n+1) times

Repeating this extraction of the max for all n items in the heap, the time is  <= n * comparisons for one deletion = n (lg n +  lg lg n) + O(n)
This sort can be done in place:  all recursion can be removed, since the action follows a single branch down the tree each time.

In accelerated heapsort the number of comparisons in deleteMax is always at least h/2, whereas in the original heapsort, there could be fewer.  Still I believe the accelerated version improves the average as well as the worst case, since the element to be inserted are always taken from the bottom of the heap, and most of the time the insertion will be more than 1/2 way down the tree, when the original verison would have done more comparisons than the maximum possible with the accelerated version.

(This fixHeapFast could also be used in the initial heap generation.  It is less important, because this does not effect the highest order term.  Also, in this case the comment made about average behavior of the original and accelerated fixHeap do not apply completely:  The elements to be bubbled down are random, not ones originally on the bottom of the heap, so they are not likely to sink down as far.  Still, since half the elements are within one level of the bottom, they are still likely to go pretty far down.)

The worse case number of data moves is comparable to the number of comparisons.

A quicksort with a highly improved pivot finding routine might do fewer data moves  (A lower bound  for the average with quicksort would be with the pivot in the middle and half the elements on the wrong side, requiring only half the elements to be moved at each level, or a total of .5n lg n with the original partition routine.)

I do not find it productive to analyse theoretically past here, where the relative times for comparison, data moves, and index moves on a particular machine all effect the leading coefficient of n lg n.  At some point it is time to do test runs with realistic data in the real environment the program is going to run.

### Bucket Partition (not in book)

This approach take a list, preferably a linked list, and in O(n) time splits the list into an array of sublists (easiest if each sublist is also a linked list), basing the array index of the sublist on a corresponding part of each key.  The name comes from the idea of physically tossing keys into one of a number of buckets.

If partitioning is based on a portion of the key that can have integer values 0, ... k-1:

Create or clear an array A of k empty lists.
For each key in the original list:
calculate the part of  the key examined.   Suppose it has integer value i
add the key to sublist A[i]

Since array lookup is O(1), one pass, partitioning a whole list into sublists is O(n).

For instance ASCII string keys can be partitioned into an array of 128 lists, where each key is put at the end of the sublist whose array index is the binary code in the most significant byte of the key.

This can be used as a first step combined with other sorting methods if the part of the key chosen is the most significant part.  After the sublists are sorted, concatenate sublists A, A, ... A[k-1].  It can also be turned into a full sort (radix sort below), using one section of the keys after another, but in an order that you might not guess:

### Radix sort (not in book)

The approach involves repeated bucket partitions using different portions of the key.  A key is needed that is (or can be converted to) an integer of fixed size.  This obviously works with size limited integer types.  It ls works with size limited ASCII strings padded with nulls or blanks, ordered by the underlying codes, if the raw binary data is treated as an integer.    For instance with a 3 byte key, you could partition on any of 6 different 4 bit sections.  A partition on one 4-bit section would result in an array of 24 = 16 sublists.

One naive way to use only bucket partitions for a complete sort is to take sublists which have the same common starting part of the key and partition them into smaller lists based on the next portion of the keys until you reach lists of length 1, and then concatenate the lists back together.  This recursively generates an enornmous unwieldy collection of lists that you must track.  Instead  radix sort follow an alternate, efficient but non-intuitive approach.

Radix sort starts by partitioning on the least significant part of the key first and recombining all the lists immediately after each completed partition.

Radix sort, given list L
For the least significant through most significant portion of the key:
Bucket partition L into array A
concatenate the sublists in A in order of the array indices to create a revised list L
The final version of L is sorted.

For instance if keys were 5 digit numbers, the sort could first be on the 1's digit, then the 10's digit, ... and finally the most significant digit, the 10,000's digit.  The only requirement to have this work is that the splitting and recombining operations make keys that go in the same bucket end up in the same relative order as the started (i.e. be stable).

Here is a small example, partitioning at individual digits.  For simplicity there are only 3 possible values in each place, and 3 digits:

 Initial data first pass second pass last pass combined 123 222 312 113 231 111 333 121 231 111 121 + 222 312 + 123 113 333 111 312 113 + 121 222 123 + 231 333 111 113 121 123 + 222 231 + 312 333 111 113 121 123 222 231 312 333

Thus if there are m pieces of the key on which partitions are done for a list of n items, the sort is of order O(mn).  In various texts radix sort is referred to as a form of bucket sort.  It is my term, bucket partition, which is analogous to the partition in quicksort, but with more parts, as only a step of some sorts.

Proof that radix sort works
After partittion i, the list is correctly sorted, in a stable sort, on the last i portions of the key.
Basis:  After one partition, the list is directly sorted on the one portion, and key with the same last portion are added to the sublist in the same order as they appeared in the original list, so the sort is stable.
Induction step:  Suppose  partitions and merges are done on the last i portions of the key, and the resulting list is sorted on the last i portions of the key.  Show the next sort will produce a stable sort on the last i+1 portions of the key.  Elements that differ in the latest part of the key considered will clearly be in order.  Elements with the latest part of the key the same appear in the same order as they were in the previous list, when they were in the proper order by the induction hypothesis.
Done.

Radix sort works well for massive datasets that must be stored in the file system (assuming keys are not too long).

### Counting Sort (7.1)

If a large lecture has a quiz with possible integer scores 0-10, how might you sort the scores?  Easier than mergesort or quicksort would be just counting the number of  students  with each possible score.  You need an array with indices 0-10 (length 11), initialized to 0's.  To count each quiz, you just increment the array element corresponding to the score.  After you have the counts, put the calculated number of 0's at the beginning of the sorted grade array, then the calculated number of 1's, ....   In general, if all the values are know lie in a range low <= value < low + m,  and there are n numbers in the list, the time for the sort is O(m+n).

This algorithm is an example where a separate subsidiary data structure is introduced that speeds up the algorithm and requires extra storage so it is a form of space-time tradeoff.  We look further into space-time tradeoffs in various dynamic data structures.

### Decision trees and a lower bound for many sorts (11.2)

An alternate source to the text section 11.2 is  DasGupta ch 2  8th page, showing that any sort where the only use of keys is to test if key1 < key2, then whatever the sorting method, the number of these comparisons is bounded below by an expression with highest order term n lg n.  This is the same leading term as with mergesort (not just order, but also the same constant multiple).

Questions:
1. If  a list of 5 digit integers of length n are sorted using radix sort on the digits, the algorithm is O(5n) , which is less than n lg n for large n.  Is this a contradiction with the lower bound above?
2. If a list of numbers is sorted with counting sort, the order is O(m+n).  In particular, if m=n, this is O(n).  Is this a contradiction?

### Sort Comparisons

Be aware of the advantages and disadvantages of each sort!  Expect questions on this!

Selection Sort
+ short code
+ low overhead for very small datasets
+ reqires limited moving of data
- bad for large datasets
- never better than O(n2)

Insertion Sort
+ short code
+ low overhead for small datasets
+ best case O(n)
-  worse case O(n2) is common

Quicksort:
+ generally the fastest key comparison sort on average
+ essentially in place
- worst case O(n2)
- best case only O(n lg n)

Mergesort:
+ worse case is still fast, n lg n + O(n) comparisons
+ some implementations have O(n) best case (Timsort)
+/- quite fast on average
+ if the list happens to be implemented as a mutable linked list, no extra storage needed
+ works via files as last steps for datasets that are too big for memory
- need O(n) extra storage in array implementations

Heapsort
+ reasonable fast in worst and average cases, n lg n + O(n) comparisons
+ in place in array
- best case still n lg n

+ extremely fast O(n) if  keys short
+ space efficient if data in linked list or for a massive dataset using files for storage
- not useful if keys long
- needs O(n) extra storage if in array in memory.

Counting sort
+ O(n) when it is useful
- Only useful when keys in a small known range that is O(n).
- Uses extra space based on the possible range of values.

Examples.  Choose an appropriate sort:
1. linked list with long keys
2. linked list with relatively small numeric keys
3. in array, with real time bound on allowed time, space constraints
4. list only slightly out of order for sure
5. list often not far out of order, but sometime quite random
6. sorting the letters in a book
7. short list but with large items of data to sort.
8. long keys, data too big for memory.
On to Amortizing for dynamic data structures