LeetCode Edit Distance: DP with O(N) Space

Overview

LeetCode Edit Distance: use DP to derive the recursive formula F(i, j), i.e., the edit distance for prefixes ended at i and j, only O(N) space is needed.

LeetCode Edit Distance

Given two words word1 and word2, find the minimum number of steps required to convert word1 to word2. (each operation is counted as 1 step.)

You have the following 3 operations permitted on a word:

a) Insert a character
b) Delete a character
c) Replace a character

Solution and Precautions: DP with O(N) Space

This could be solve by DP, let F(i, j) denote the edit distance between substring1 = word1[0]…word[i-1] and substring2 = word2[0]…word2[j-1], then F(m, n)  where m is the length of word1 and n is the length of word2 is our answer, we have the following recursive definition for F:

(1) if word1[i-1] == word2[j-1], F(i, j) = F(i-1, j-1)

(2) if word1[i-1] != word2[j-1],  F(i, j) = 1 + min{F(i-1, j-1), F(i-1, j), F(i, j-1)}

The direct implementation is as follows which could pass the LeetCode OJ for this Edit Distance problem after which I will describe how the memory cost could be reduced to O(N) rather than O(M * N):

class Solution {
    public:
        int minDistance(string word1, string word2) {
            int m = word1.size();
            int n = word2.size();

            vector<vector> dp(m + 1, vector(n + 1, 0));
            for (int i = 1; i <= m; ++i) dp[i][0] = i;
            for (int j = 1; j <= n; ++j) dp[0][j] = j;

            for (int i = 1; i <= m; ++i)
                for (int j = 1; j <= n; ++j)
                    dp[i][j] = (word1[i-1] == word2[j-1]) ? dp[i-1][j-1]
                               : 1 + min(min(dp[i-1][j], dp[i][j-1]), dp[i-1][j-1]);
            return dp[m][n];
        }
};

The key fact is that, we see dp[i][j] only depends on three previous values, that is dp[i][j-1], dp[i-1][j] and dp[i-1][j-1], so there is actually no need to store all the previous values. By treating updating the 2-D array dp row by row, we only need one row storage plus one additional value as for dp[i-1][j-1], then the array could update itself iteratively and correctly, the following code with memory optimization again could pass the LeetCode OJ for this Edit Distance problem and the code is more compact:

class Solution {
    public:
        int minDistance(string word1, string word2) {
            int m = word1.size();
            int n = word2.size();

            vector dp(n + 1, 0);
            int prev1 = 0;
            int prev2 = 0;

            dp[0] = 0;
            for (int j = 1; j <= n; ++j)
                dp[j] = j;

            for (int i = 1; i <= m; ++i) {
                prev1 = dp[0];
                dp[0] = i;
                for (int j = 1; j <= n; ++j) {
                    prev2 = dp[j];
                    dp[i][j] = (word1[i-1] == word2[j-1]) ? prev1 : 1 + min(min(dp[j], dp[j-1]), prev1);
                    prev1 = prev2;
                }
            }
            return dp[n];
        }
};

Relation to Longest Common Subsequence (LCS)

This is from Wiki:

For two strings X_{1 \dots m} and Y_{1 \dots n}, the length of the shortest common supersequence is related to the length of the LCS by

\left|SCS(X,Y)\right| = n + m - \left|LCS(X,Y)\right|.

The edit distance when only insertion and deletion is allowed (no substitution), or when the cost of the substitution is the double of the cost of an insertion or deletion, is:

d'(X,Y) = n + m - 2 \cdot \left|LCS(X,Y)\right|.

Summary

LeetCode Edit Distance: use DP to derive the recursive formula F(i, j), i.e., the edit distance for substring ended at i and j, only O(N) space is needed.

Written on June 4, 2013