Quick Sort Recap: 3-way Partition, Handling Duplicate Keys

Overview

Handling duplicates is tricky, while 3-way partition improves the efficiency of Quick Sort by putting all the items equal to the partitioning item in place.

For equal keys, it is (counter-intuitively) better to stop on keys equal to the partitioning item’s key; An even more desirable way is to put all items equal to the partitioning item in place this is what 3-way partition does here.

This is quick recap about 3-way partition for Quick Sort after learning the material from Algorithm Part 1 from Coursera.

The Duplicate Keys Issue

Interestingly, the Prof gives a Caveat Emptor during the course that Many textbook implementations of Quick Sort go quadratic if array

  • Is sorted or reverse sorted.
  • Has many duplicates (even if randomized!)

The key observation here is that Quick sort will go quadratic unless partitioning stops on equal keys! :

–>
Mistake. Put all items equal to the partitioning item on one side.
Consequence. 0.5 N^2 compares when all keys equal.

–>
Recommended. Stop scans on items equal to the partitioning item.
Consequence. ~N lg N compares when all keys equal.

–>
Desirable. Put all items equal to the partitioning item in place.

One can try to interpret the above three difference cases by considering a simple example, say for an array of size 6 and every key is of the same value 1. Try to derive the number of compares in the above three different cases.

So here is the C++ implementation and I will explain the details based on my own understanding about it.

void sort3Way(int naryNumbers[], int nLo, int nHi) {
	if (nHi <= nLo) return ;
	int nLt = nLo, nGt = nHi;
	int nPivot = naryNumbers[nLo];
	int i = nLo;

	while (i <= nGt) {
		if (naryNumbers[i] == nPivot)
			++i;
		else if (naryNumbers[i] > nPivot)
			std::swap(naryNumbers[i], naryNumbers[nGt--]);
		else {
			std::swap(naryNumbers[i++], naryNumbers[nLt++]);
		}
	}
	sort3Way(naryNumbers, nLo, nLt - 1);
	sort3Way(naryNumbers, nGt + 1, nHi);
}

Let’s try to understand the code: we keep the invariant [0, nLt) which contains strict smaller keys, (nGt, nLen -1] contains strictly larger keys, [nLt, i) contains equal keys, this is the so-called 3-way partition

[i, nGt] contains the items to be processed (i.e., not processed yet), where i is a pointer scanning through [nLt, nGt] initially, in order to keep the invariant:

  1. when a[i] < nPivot, we need to swap a[nLt] and a[i] first to put all the smaller keys on the left side, this breaks [nLt, i) invariant, so we do nLt++ and i++ together to keep it, note that before i++, a[i] holds the key equal to the pivot after the swapping and that element is already processed, so we can move i to process the next one, this is different from the case next
  2. when a[i] > nPivot, we need to swap a[nGt] and a[i] to put the larger values on the right side first, this breaks the invariant for (nGt, nLen – 1], so we do nGt–, note different from step 1, right now a[i] still holds an item not yet processed, so do not move i in this case, we do i++ in the previous case because after swapping, a[i] holds a processed item.
  3. when a[i] == nPivot, just move i++, everything is OK.

This process is finished after the range [i, nGt] (items not processed yet) becomes empty: while (i <= nGt).

Remarks:

Easier than 2-way partition ha? Indeed, one could see the code actually becomes more compact than the 2-way partition and the while loop is easier to implement or interpret than the two embedded while loop in 2-way partition.

However, I want to say 2-way partition has its own advantage, To my knowledge, 3-way partition cannot be adapted to solve Selection (return the kth largest/smallest item in an array) intuitively because the kth element will fail to be identified if it happened to be among the duplicate keys, for example: 1 1 2 2 2 3 3, obviously, if we want to select 3nd element, i.e., a[3] in the sorted sequence, the above 3-way implementation do not have a natural way to return such an index while 2-way partition is easy to understand, it simply put the returned j-th item in its right position. So 2-way partition could be directly adapted to solve the Selection problem which I will talk about in my next post: “Quick Select Recap: 6 Different Methods, Median of Medians Algorithm“.

Summary

Handling duplicates is tricky, while 3-way partition impoves the efficinecy of Quick Sort by putting all the items equal to the partitioning item in place. Here in this post, I just summaried and explained the coding details about 3-way partition hopefully could help the readers to understand this algorithm better. Any comments or corrrecttion are quite welcome!


(Please specify the source  烟客旅人 sigmainfy — http://www.sigmainfy.com  as well as the original permalink

URL for any reprints,  and please do not use it for commercial purpose)

Written on October 22, 2014