Monday, January 23, 2012

Dynamic Programming: Longest common substring

Note: This is different from longest common subsequence. Longest common string is continuous while
the longest common subsequence is not.


subsequence is a sequence that can be derived from another sequence by deleting some elements without changing the order of the remaining elements. For example, the sequence  \langle A,B,D \rangle  is a subsequence of  \langle A,B,C,D,E,F \rangle .
Given two sequences X and Y, a sequence G is said to be a common subsequence of X and Y, if G is a subsequence of both X and Y. For example, if
 X = \langle A,C,B,D,E,G,C,E,D,B,G \rangle  and
 Y = \langle B,E,G,C,F,E,U,B,K \rangle
then a common subsequence of X and Y could be
 G = \langle B,E,E \rangle.
This would not be the longest common subsequence, since G only has length 3, and the common subsequence  \langle B,E,E,B \rangle  has length 4. The longest common subsequence of X and Y is  \langle B,E,G,C,E,B \rangle .



 public class LCS{  
      public static String getLCS(String str1, String str2) {  
           StringBuilder sb = new StringBuilder();  
           if (str1 == null || str1.isEmpty() || str2 == null || str2.isEmpty())  
                return "";  
           // ignore case  
           str1 = str1.toLowerCase();  
           str2 = str2.toLowerCase();  
           // java initializes them already with 0  
           int[][] c = new int[str1.length()][str2.length()];  
           int maxlen = 0;  
           int lastSubsBegin = 0;  
           for (int i = 0; i < str1.length(); i++) {  
                for (int j = 0; j < str2.length(); j++) {  
                     if (str1.charAt(i) == str2.charAt(j)) {  
                          if ((i == 0) || (j == 0))  
                               c[i][j] = 1;  
                          else  
                               c[i][j] = 1 + c[i - 1][j - 1];  
                          if (c[i][j] > maxlen) {  
                               maxlen = c[i][j];  
                               int thisSubsBegin = i - c[i][j] + 1;  
                               if (lastSubsBegin == thisSubsBegin) {  
                                    //if the current LCS is the same as the last time this block ran  
                                    sb.append(str1.charAt(i));  
                               } else {  
                                    //this block resets the string builder if a different LCS is found  
                                    lastSubsBegin = thisSubsBegin;  
                                    sb = new StringBuilder();  
                                    sb.append(str1.substring(lastSubsBegin, i + 1));  
                               }  
                          }  
                     }  
                }  
           }  
           System.out.println("The DP table is: ");  
           for (int i=0; i<c.length; i++) {  
                for (int j=0; j<c[i].length; j++) {  
                     System.out.print(" " + c[i][j]);  
                }  
                System.out.println("");  
           }  
           return sb.toString();  
      }  
      public static void main(String args[]){  
           LCS m = new LCS();  
           System.out.println("The LCS is "+m.getLCS("complexity","xiterty"));  
      }  
 }  

No comments:

Post a Comment