# Quick Sort Recap: 3-way Partition, Handling Duplicate Keys

## Overview

Handling duplicates is tricky, while 3-way partition improves the efficiency of Quick Sort by putting all the items equal to the partitioning item in place.

For equal keys, it is (counter-intuitively) better to stop on keys equal to the partitioning item’s key; An even more desirable way is to put all items equal to the partitioning item in place this is what 3-way partition does here.

This is quick recap about 3-way partition for Quick Sort after learning the material from Algorithm Part 1 from Coursera.

## The Duplicate Keys Issue

Interestingly, the Prof gives a Caveat Emptor during the course that Many textbook implementations of Quick Sort go quadratic if array

- Is sorted or reverse sorted.
- Has many duplicates (even if randomized!)

The key observation here is that Quick sort will go quadratic unless partitioning stops on equal keys! :

–>

Mistake. Put all items equal to the partitioning item on one side.

Consequence. 0.5 N^2 compares when all keys equal.

–>

Recommended. Stop scans on items equal to the partitioning item.

Consequence. ~N lg N compares when all keys equal.

–>

Desirable. Put all items equal to the partitioning item in place.

One can try to interpret the above three difference cases by considering a simple example, say for an array of size 6 and every key is of the same value 1. Try to derive the number of compares in the above three different cases.

So here is the C++ implementation and I will explain the details based on my own understanding about it.

void sort3Way(int naryNumbers[], int nLo, int nHi) { if (nHi <= nLo) return ; int nLt = nLo, nGt = nHi; int nPivot = naryNumbers[nLo]; int i = nLo; while (i <= nGt) { if (naryNumbers[i] == nPivot) ++i; else if (naryNumbers[i] > nPivot) std::swap(naryNumbers[i], naryNumbers[nGt--]); else { std::swap(naryNumbers[i++], naryNumbers[nLt++]); } } sort3Way(naryNumbers, nLo, nLt - 1); sort3Way(naryNumbers, nGt + 1, nHi); }

Let’s try to understand the code: we keep the invariant [0, nLt) which contains strict smaller keys, (nGt, nLen -1] contains strictly larger keys, [nLt, i) contains equal keys, this is the so-called 3-way partition

[i, nGt] contains the items to be processed (i.e., not processed yet), where i is a pointer scanning through [nLt, nGt] initially, in order to keep the invariant:

- when a[i] < nPivot, we need to swap a[nLt] and a[i] first to put all the smaller keys on the left side, this breaks [nLt, i) invariant, so we do nLt++ and i++ together to keep it, note that before i++, a[i] holds the key equal to the pivot after the swapping and that element is already processed, so we can move i to process the next one, this is different from the case next
- when a[i] > nPivot, we need to swap a[nGt] and a[i] to put the larger values on the right side first, this breaks the invariant for (nGt, nLen – 1], so we do nGt–, note different from step 1, right now a[i] still holds an item not yet processed, so do not move i in this case, we do i++ in the previous case because after swapping, a[i] holds a processed item.
- when a[i] == nPivot, just move i++, everything is OK.

This process is finished after the range [i, nGt] (items not processed yet) becomes empty: while (i <= nGt).

**Remarks:**

Easier than 2-way partition ha? Indeed, one could see the code actually becomes more compact than the 2-way partition and the while loop is easier to implement or interpret than the two embedded while loop in 2-way partition.

However, I want to say 2-way partition has its own advantage, To my knowledge, 3-way partition cannot be adapted to solve Selection (return the kth largest/smallest item in an array) intuitively because the kth element will fail to be identified if it happened to be among the duplicate keys, for example: 1 1 2 2 2 3 3, obviously, if we want to select 3nd element, i.e., a[3] in the sorted sequence, the above 3-way implementation do not have a natural way to return such an index while 2-way partition is easy to understand, it simply put the returned j-th item in its right position. So 2-way partition could be directly adapted to solve the Selection problem which I will talk about in my next post: “Quick Select Recap: 6 Different Methods, Median of Medians Algorithm“.

## Summary

Handling duplicates is tricky, while 3-way partition impoves the efficinecy of Quick Sort by putting all the items equal to the partitioning item in place. Here in this post, I just summaried and explained the coding details about 3-way partition hopefully could help the readers to understand this algorithm better. Any comments or corrrecttion are quite welcome!