# 3Sum Problem Analysis: Handling Duplicates Without Using Set

## Overview

We discuss the 3sum problem here and give a sorting approach with time complexity O(N^2) to generate all the triplets without using set to remove the repeated triplets at the end. Approach based on hash would be touched a bit too and an open question about using hash to handle duplicates in the input would be raised (copyright @sigmainfy) which for now I haven’t found a good solution to resolve. Hope the readers enjoy and any comments are highly appreciated especially on the questions that are raised here.

## 3Sum Problem Definition

Given an array S of n integers, are there elements abc in S such that a + b + c = 0? Find all unique triplets in the array which gives the sum of zero.

Note:

• Elements in a triplet (a,b,c) must be in non-descending order. (ie, a ≤ b ≤ c)
• The solution set must not contain duplicate triplets.

For example, given array S = {-1 0 1 2 -1 -4}, A solution set is: (-1, 0, 1) (-1, -1, 2). Note the triplets consists of real values in the input rather than the index of the input numbers

## 3Sum Solution Analysis

As I have mentioned a bit in previous two sum problem analysis “Two Sum Problem Analysis 3: Handling Duplicates Input“: We need to distinguish VALUE pairs and INDEX pair (To see more on two sum problems, also check these two posts: post 1, post 2). It is a bit different context here in 3sum, we need to return triplets consisting of the input values rather than the index as the two sum problem requires previously. Of course, you can do the same thing in two sum problem to find value pairs too, but let’s just try something in a different context, and you need to keep this difference in mind, otherwise, you might get confused.

The general idea would be to convert 3 sum into 2 sum problem: pick one value A[i] from the input array, the next would be find the two two sum -A[i] in the rest of the array. And the detailed steps are:

1. Sort the whole input array by increasing order: O(NlogN)
2. Linear scan the sorted array, say the current element to process is A[i], check if A[i] is the same with the previous one, only when they are different we go the next steps.
3. Treat A[i] as the first value in the potential triplet, and solve a two sum problem in the rest of the input starting from i+1 with the target as -A[i], similar trick as in step 2 should be performed to avoid repeated triplets.

Remarks: Several noteworthy things from the above steps are:

• The total time complexity would be O(N^2) rather than O(N^2logN) because we sort the array for only one time at the beginning, many people will interpret it in a wrong way by converting 3 sum into 2 sum problems, that is, they convert it in a simple way, and do a “complete” 2 sum every time for each element in the array.
• Everytime we pick A[i] in step three, we only need to do 2 sum in the rest of the input starting from i + 1, think about why?
• We avoid repeated triplets before generating any of them by comparing the current value A[i] with the previous one A[i-1], this is the issue which bothered me for quite a while (I raised this in my earlier post in Chinese). Now we indeed managed to remove actually avoid duplicates (copyright @sigmainfy) in the final results without using the set data structure.
• As noted in the problem description, it says the solution set must not contain duplicate triplets, a naive and straightforward way to do this is to use set to do a post-filtering after all the triplets are generated. We managed to avoid this in this post.

The following is the accepted code by leetcode OJ:

Be careful about the implementation and the trick in step 2, otherwise, you might get an Output Limit Exceeded error which means you got duplicates result.

## The Hash based Approach Recap

The sort approach above already gives a quite elegant implementation and the time complexity is quite acceptable too. Since two sum problem got a hash based approach which makes it even faster and it could be solved in linear time, it is tempting to try out hash in 3 sum too which actually turns out to make things much more complicated. I show several difficulties first:

1. simple hash won’t work: the idea is to use a hash table to store all the unique values in the input, each value appears only once in the hash table. This won’t work because it cannot distinguish cases like:  1 1 1 5 with target 3, 1 1 5 with target 3, the first has a triplet (1, 1, 1) but the second does not have any solutions.
2. need a separate array with duplicates removed as the new input: do not iterate through the hash table directly, for unordered hash table, to iterate all the elements in the table the time complexity would be O(M) where M is the size of the table and M could be much larger than N.
3. duplicates indeed is very very troublesome!  For example, because of the difficulty in 1), I would come up with the same technique given in ”Two Sum Problem Analysis 3: Handling Duplicates Input“, so each key in the hash table would uniquely identify a list of the index in the original input that contains the key value. Similar with the sort approach, we first pick a value say 4, and we need to find two sum pair with target -4 in the rest of array. When we perform two sum, if the target happen to be equal to the sum of two same values, they will be identified at the same key slot in the hash table, in this case, only when the list of index contains >= 2 elements, it is OK to form the pair. Say one entry in the hash table is (-2, 0 -> 3 -> 5) which means in the original input, A[0] = A[3] = A[5] = -2, and in our case the target for two sum is -4, then we should be able to produce this triplet.

Open Question:

Well, when we reach here, at least to me myself, things are already very complicated. Although in my mind, I think it is still solvable to use hash as long as we resolve the difficulties I summarized above (and I indeed give solutions to solve each of them!), the approah based on hash to solve 3 sum problem could be implemented anyways,  However, I really don’t think it is a good idea to be used to handle 3 sum problems with duplicate input.

So here I raise this open question, I hope it is just myself that think too much or consider it in an unnecessarily complicated way, are there any more elegant, clear, consistent implementation for (copyright @sigmainfy) hash approach with all my concerns above resolved? Any comments or help are highly appreciated!!!

## Summary:

To summarize, I discussed and gave a sorting approach in time complexity O(N^2) to solve the 3sum problem without using set to filter repeated triplets at the end. Approach based on hash is discussed too and several difficulties and concerns are raised up. This is an instance of the general K sum problem and later on I will discuss 4sum and more generally K sum too.

Written on September 22, 2014