Two Sum Problem Analysis 1: Sort and Hash with unique solution

Overview

We analyze how to solve the two sum problem with assumption that there is only one unique solution:

  1. using sort and head/tail pointers with O(NlogN) time complexity

  2. using hash table with O(N) time complexity.

Unique solution is quite a big assumption and we will continue to discuss further more about the two sum problem when we cannot assume unique solution exists (could be zero, one or more than one such solutions) in another two posts: “Two Sum Problem Analysis 2: Not Unique and No Duplicates” and “Two Sum Problem Analysis 3: Handling Duplicates Input” with the duplicates included in the input or not (copyright @sigmainfy). I hope to give a thorough and insightful analysis and any comments are quite welcome and appreciated!

Two Sum Problem Definition

The two sum problem here requires that we return the index pair rather than value pairs (note the difference from 3sum problem ), the exact problem is:

Given an array of integers, find two numbers such that they add up to a specific target number. Write a function twoSum that should return indices of the two numbers such that they add up to the target, where index1 must be less than index2. You may assume that each input would have exactly one solution. Input: numbers={2, 7, 11, 15}, target=9 Output: index1=1, index2=2. Please note in the example the returned answers (both index1 and index2) are not zero-based.

Two Sum Solution Analysis

As described above, we can assume each input have exactly one unique solution. This simplifies a lot of messy stuff. We discuss the two basic popular solutions which are probably quite well known to a lot of people already and we leave the analysis of more complicated cases in later posts.

1) Brute force: for each pair (i, j) in the input, we test if A[i] + A[j] = target, a nested for loop like the following could easily solve the problem. The time complexity is not hard to figure: O(N^2)

for (int i = 0; i < len; ++i)
    for (int j = i; j < len; ++j)
        if (A[i] + A[j] == target) {
            ...
        }

2) Sort method: sorting the input gives the array a good property to keep and  good indication for moving the head and tail pointers. Here is the detailed steps.

  1. Sort the input in increasing order
  2. make two pointers i, j pointing at the head and tail elements of the sorted array
  3. by comparing the related order between the target and the sum of the two elements pointed by the current head/tail pointers, we decide how to move the pointers to test next  (copyright @sigmainfy) potential pairs
  4. If the target is bigger, we move the head pointer forward by one step and thus try to increase the sum, if the target is smaller we move the tail head towards the head by one step and thus try to decrease the sum a bit, if target is equal to the sum, then we are done by the assumption that there is only one such pair
  5. we repeat step 3 and 4 until i >= j or we find a solution

So as we could see, the time complexity for this approach would be O(NlogN). The following is the source code which is acccepted by leetcode OJ for your reference:

vector<int> TwoSum(vector<int> &numbers, int target) {
		vector<int> idxes(2, -1);
		vector<pair<int, int>> number_structs;
		int len = numbers.size();
		for (int i = 0; i < len; ++i)
			number_structs.push_back(make_pair(numbers[i], i));

		sort(number_structs.begin(), number_structs.end());
		int i = 0, j = number_structs.size() - 1, sum = 0;
		while (i < j) {
			sum = number_structs[i].first + number_structs[j].first;
			if (sum < target) ++i;
			else if (sum > target) --j;
			else {
				int reali = number_structs[i].second + 1, realj = number_structs[j].second + 1;
				idxes[0] = min(reali, realj);
				idxes[1] = max(reali, realj);
				break;
			}
		}
		return idxes;
	}

3) Hash method: well, you probably already know, there is a linear time method by using hash techniques. Since it is assumed there is a unique solution which simplifies a lot of things to make hash method easy to come up with and to code it up, later in another post we will discuss the cases where we cannot assume the unique solution and duplicates in the input could cause problems of using hash, we will also give amends for that in another post.

Here in this basic case, the algorithm is not complicated: we do a first linear scan to insert the elements into hash table with (value, index) pairs, we then do another linear scan to process each element A[i], check whether target – A[i] is in the hash table or not, since we use hash, the operation for each element is constant, and thus the total time complexity would be O(N), the following is the accepted code by leetcode OJ:

vector<int> TwoSum(vector<int> &numbers, int target) {
		vector<int> idxes(2, -1);
		unordered_map<int, int> number_indexes;
		int len = numbers.size();
		for (int i = 0; i &lt; len; ++i) number_indexes[numbers[i]] = i;
		for (int i = 0; i &lt; len; ++i) {
			auto it = number_indexes.find(target - numbers[i]);
			if (it != number_indexes.end()) {
				if (it->second <= i) continue;
				idxes[0] = i + 1;
				idxes[1] = it->second + 1;
				break;
			}
		}
		return idxes;
	}

Conclusion

To summarize, we give analysis of the sort and hash method for the classic two sum problem with the assumption that there is only one unique solution. I hope this post could give those who are new to this problem some basic ideas or those who already know the basic solutions some recap  (copyright @sigmainfy) . This post could be regarded as a preparation for further deeper analysis of the same two sum problem but without the unique solution assumption where things could be messy and complicated.

Written on September 21, 2014