LeetCode Anagram: Hashing and Sorting

Overview

LeetCode Anagram: Sort the characters of each string and use the sorted form as the key into a hash table, so that all strings that are anagrams of one another land in the same bucket.

LeetCode Anagram

Given an array of strings, return all groups of strings that are anagrams.

Note: All inputs will be in lower-case.

Solution and Precautions: Hashing and Sorting

First, note what an anagram is. Let's say that words A and B are anagrams of each other if we can get B from A (and equally A from B) by rearranging the letters, using all the original letters exactly once.
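As a small illustration of that definition, here is a sketch of a pairwise check. The class and method names (AnagramCheck, isAnagram) are made up for this example and are not part of the LeetCode interface.

```java
import java.util.Arrays;

class AnagramCheck {
    // Returns true when a and b use exactly the same letters the same number of times.
    static boolean isAnagram(String a, String b) {
        if (a.length() != b.length()) {
            return false; // different lengths can never be rearrangements of each other
        }
        char[] x = a.toCharArray();
        char[] y = b.toCharArray();
        Arrays.sort(x);
        Arrays.sort(y);
        // Two words are anagrams exactly when their sorted letters match.
        return Arrays.equals(x, y);
    }
}
```

With this helper, isAnagram("listen", "silent") is true and isAnagram("abc", "abd") is false.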

Then, one straightforward way is to compare each pair of words to see whether they are anagrams. My first try was just like this: starting from the first word, I searched through all the remaining words and collected its anagrams. Of course, the anagrams found for the first word cannot be anagrams of any word outside that group, so we don't have to consider them in later checks. This method passes the small data set test but gets TLE (Time Limit Exceeded) on the large data set test. The time complexity is O(N^2 * K log K), where N is the number of words and K is the length of the longest word (or, if you prefer, the average length of the words).
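The pairwise approach described above might look roughly like the sketch below. It is a reconstruction for illustration, not my original submission; the class name BruteForceAnagrams and the used[] array are assumptions of this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BruteForceAnagrams {
    // Pairwise check: sort both words and compare, O(K log K) per pair.
    static boolean isAnagram(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        char[] x = a.toCharArray();
        char[] y = b.toCharArray();
        Arrays.sort(x);
        Arrays.sort(y);
        return Arrays.equals(x, y);
    }

    static List<String> anagrams(String[] strs) {
        List<String> results = new ArrayList<>();
        boolean[] used = new boolean[strs.length]; // words already placed in a group
        for (int i = 0; i < strs.length; i++) {
            if (used[i]) {
                continue; // already matched to an earlier word, skip it
            }
            List<String> group = new ArrayList<>();
            group.add(strs[i]);
            for (int j = i + 1; j < strs.length; j++) {
                if (!used[j] && isAnagram(strs[i], strs[j])) {
                    used[j] = true;
                    group.add(strs[j]);
                }
            }
            if (group.size() > 1) {
                results.addAll(group); // keep only words that have an anagram partner
            }
        }
        return results;
    }
}
```

The nested loops are what make this quadratic in the number of words, which is why it times out on the large data set.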

Finally, if we think deeper about this, we find that we don't have to do the N^2 comparisons at all. The key observation is that A and B are anagrams of each other if and only if their sorted forms are exactly the same. As a result, one linear scan through the word list is enough: for each word we get its sorted form in O(K log K) time, and we use a map to collect the words that share the same sorted form. The time complexity is O(N K log K), and this approach passes both the small and the large data tests. The following code implements this idea and passes the LeetCode OJ for this Anagrams problem:

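import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Solution {
    public List<String> anagrams(String[] strs) {
        List<String> results = new ArrayList<String>();
        // Maps the sorted form of a word to all words sharing that form.
        Map<String, List<String>> groups = new HashMap<String, List<String>>();

        for (String s : strs) {
            // The sorted letters are the same for every anagram of s.
            char[] anagram = s.toCharArray();
            Arrays.sort(anagram);
            String sortedS = new String(anagram);

            if (null != groups.get(sortedS))
                groups.get(sortedS).add(s);
            else {
                List<String> g = new ArrayList<String>();
                g.add(s);
                groups.put(sortedS, g);
            }
        }

        // Only groups with more than one word count as anagrams.
        for (List<String> l : groups.values()) {
            if (l.size() > 1)
                results.addAll(l);
        }
        return results;
    }
}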

A final note: the description of this Anagrams problem on the LeetCode OJ is really confusing. We only need to return a group when it contains more than one string, so a word with no anagram partner in the input is left out, as the following test cases indicate:

Input:  [""]
Output: [""]
Expected:   []

Input:  ["a"]
Output: ["a"]
Expected:   []
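
To make that rule concrete, here is a compact variant of the same grouping that can be checked against these cases. It is only a sketch: the class name AnagramGroups is made up, and computeIfAbsent assumes Java 8 or later, which the solution above does not need.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AnagramGroups {
    static List<String> anagrams(String[] strs) {
        Map<String, List<String>> groups = new HashMap<>();
        for (String s : strs) {
            char[] chars = s.toCharArray();
            Arrays.sort(chars);
            // Create the bucket for this sorted form on first use, then add the word.
            groups.computeIfAbsent(new String(chars), k -> new ArrayList<>()).add(s);
        }
        List<String> results = new ArrayList<>();
        for (List<String> group : groups.values()) {
            if (group.size() > 1) {
                results.addAll(group); // a word on its own is not reported
            }
        }
        return results;
    }
}
```

Calling anagrams on [""] or ["a"] returns an empty list, while ["", ""] returns both empty strings, because the two of them form a group of size two. The order of the returned words depends on HashMap iteration order and should not be relied on.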

Summary

LeetCode Anagram: Sort the characters of each string and use the sorted form as the key into a hash table, so that all strings that are anagrams of one another land in the same bucket.

Written on January 11, 2013