#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2012
    Posts
    36
    Rep Power
    7

    Fuzzy Regex Pattern Matching


    Hi, I am having some trouble finding an appropriate library for what I am trying to do in Java. I need to be able to perform approximate regex pattern matching in order to extract variables from a string. Let me provide an example:

    Suppose we have a model, say:
    "my name is <name> and I am <age> years old"

    This model would be converted to a standard Regex expression using groups to specify the variables, so something like:
    "my name is (\\p{Print}*) and I am (\\p{Print}*) years old"

    Then if we were given an input string:
    "name is John and I'm 25 years old"

    This does not match exactly but it is close, using normal Regex, there is no way to isolate "John" as <name> and "25" as <age> because the input doesn't technically match the expression. I have previously used TRE to solve this problem but it is specific to C and I want something native to Java. Are there any libraries out there that would be able to do this type of pattern matching? Note I have already looked at FREJ and it does not have the functionality I need.

    Alternatively, can anyone think of some way other than using Regex to solve this problem?

    Thanks.
    Last edited by ZGorlock; May 10th, 2016 at 07:38 PM.
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2012
    Posts
    36
    Rep Power
    7
    I figured out a solution for this problem so I thought I would reply in case anyone else has the same problem in the future.

    I decided not to use regex but instead use a minimum string distance algorithm with 'dont care' characters allowed which serve as variables in this case.

    Here is the code:

    Code:
            final char varChar = '*';
            String pattern = "ab*fgh*l"; //pattern
            String text = "abcdefghijkl"; //text
    
            //remove double vars
            while (pattern.contains("**")) {
                pattern = pattern.replaceAll("\\*\\*", "*");
            }
    
            int m = pattern.length(); //i
            int n = text.length(); //j
    
            int g = 0; //# of dc chars
            int v = -1;
            for (int pi = 0; pi < m; pi++) {
                if (pattern.charAt(pi) == varChar) {
                    g++;
                    if (v == -1)
                        v = pi;
                }
            }
            if (g == 0)
                v = m + 1;
    
            int[][] D = new int[m + 1][n + 1];
            for (int j = 0; j <= n; j++)
                D[0][j] = 0;
            for (int i = 0; i <= v; i++)
                D[i][0] = i;
            for (int i = v + 1; i <= m; i++)
                D[i][0] = Integer.MAX_VALUE;
    
            for (int i = 1; i <= m; i++) {
                if (pattern.charAt(i - 1) == varChar) {
                    for (int j = 0; j <= n; j++)
                        D[i][j] = Integer.MAX_VALUE;
                    for (int j = 0; j <= n; j++) {
                        for (int h = j; h <= n; h++) { //assuming dont care set could span from 0 chars to the rest of the string
                            if (h <= n && D[i - 1][j] < D[i][h]) {
                                for (h = h; h <= n; h++)
                                    D[i][h] = D[i - 1][j];
                            }
                        }
                        if (D[i][n] == 0) //no need to continue this loop
                            break;
                    }
                } else {
                    for (int j = 1; j <= n; j++) {
                        int match = (pattern.charAt(i - 1) == text.charAt(j - 1)) ? 1 : 0;
                        D[i][j] = 1 + Math.min(Math.min(D[i - 1][j], D[i][j - 1]),
                                        D[i - 1][j - 1] - match);
                    }
                }
            }
    
            System.out.println("Pattern: " + pattern);
            System.out.println("Text:    " + text);
    
            System.out.println("The minimum edit distance is: " + D[m][n]);
    
            String[] vars = new String[g]; //stores the extracted variables
    
            int row = m; //pattern
            int col = n; //text
            int var = g - 1; //variable counter
            while (row > 1 || col > 1) { //start at bottom right corner, move to upper left corner
                if (pattern.charAt(row - 1) == varChar) { //the current row is a *
                    vars[var] = "";
                    while (col >= 1) { //move left to digest variable
                        if (D[row][col - 1] == D[row][col]) {
                            vars[var] = text.charAt(col - 1) + vars[var];
                            col--;
                        }
    
                        if (row > 1 && col >= 1) {
                            if (D[row][col - 1] != D[row][col]) { //stop at the next to last col in the sequence
                                int min = Math.min(D[row - 1][col], D[row - 1][col - 1]); //first move out of * row
                                if (D[row - 1][col] == min) {
                                    row--; //move up
                                }
                                else { //D[row - 1][col - 1] == min
                                    row--; //move diagonally up and left
                                    col--;
                                }
    
                                break; //out of variable
                            }
                        }
                    }
                    var--;
                }
                else if (pattern.charAt(row - 2) == varChar) { //if the next row is a *
                    col--; //move diagonally up and left
                    row--;
                }
                else {
                    if (row > 1) {
                        if (col >= 1) {
                            int min = Math.min(Math.min(D[row - 1][col], D[row - 1][col - 1]), //move
                                    D[row][col - 1]);
                            if (D[row - 1][col] == min) {
                                row--; //move up
                            } else if (D[row - 1][col - 1] == min) {
                                row--; //move diagonally up and left
                                col--;
                            } else { //D[row][col - 1] == min
                                col--; //move left
                            }
                        }
                        else {
                            row--; //move up
                        }
                    }
                    else {
                        col--; //move left
                    }
                }
            }
    
            System.out.println("Extracted Variables:");
            for (int i = 0; i < g; i++) {
                System.out.println("    " + vars[i]);
            }

    Comments on this post

    • Will-O-The-Wisp agrees : Thanks!
  4. #3
  5. Not An Expert
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2015
    Posts
    404
    Rep Power
    4
    Thank you very much for sharing your solution!
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2012
    Posts
    36
    Rep Power
    7
    I have updated this algorithm to return all optimal variable extraction paths instead of just the first one it detects. Below is an example usage:

    Code:
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Objects;
    
    public class FuzzyEditDistance
    {
        
        /**
         * The character used to specify a variable in a string compare.
         */
        public static final String varChar = "";
        
        /**
         * The current index in the vars and tokens list to store extracted vars and tokens for extractVariables().
         *
         * @see #extractVariables(String, String, int[][], int, int, List, List)
         */
        private static int extractionDepth = 0;
        
        
        /**
         * Main method.
         *
         * @param args Arguments to main method.
         */
        public static void main(String[] args)
        {
            String source = "What the hell are lobsters";
            String target = " s";
            
            List<List<String>> vars = new ArrayList<>();
            List<List<String>> tokens = new ArrayList<>();
            
            double score = stringCompare(target, source, vars, tokens, false);
            System.out.println("Score: " + score);
            System.out.println();
            
            System.out.println("---");
            for (int i = 0; i < vars.size(); i++) {
                System.out.println("Extraction #" + i);
                
                System.out.println("  Variables:");
                for (String var : vars.get(i)) {
                    System.out.println("    '" + var + "'");
                }
                System.out.println("  Tokens:");
                for (String token : tokens.get(i)) {
                    System.out.println("    '" + token + "'");
                }
                
                System.out.println("---");
            }
        }
        
        
        /**
         * Compares the closeness of two strings and extracts variables and non-variables is applicable.
         *
         * @param target     The string to compare against.
         * @param source     The string to compare.
         * @param vars       The list to store extracted variables in. This will contain a list of all optimal variable extraction sets.
         * @param tokens     The list to store extracted non-variable tokens in. This will contain a list of all optimal non-variable token extraction sets.
         * @param ignoreCase Whether to ignore case of not.
         * @return The closeness of the two strings, between 0 and 1.
         */
        public static double stringCompare(String target, String source, List<List<String>> vars, List<List<String>> tokens, boolean ignoreCase)
        {
            return (target.isEmpty() || source.isEmpty()) ? 0 : ((double) (target.length() - stringEditDistance(target, source, vars, tokens, ignoreCase)) / target.length());
        }
        
        /**
         * Calculates the distance between two strings with variable collection.
         *
         * @param pattern    The pattern to compare the string against.
         * @param text       The string to compare.
         * @param vars       The list to store extracted variables in. This will contain a list of all optimal variable extraction sets.
         * @param tokens     The list to store extracted non-variable tokens in. This will contain a list of all optimal non-variable token extraction sets.
         * @param ignoreCase Whether to ignore case or not.
         * @return The edit distance between the two strings.
         */
        public static int stringEditDistance(String pattern, String text, List<List<String>> vars, List<List<String>> tokens, boolean ignoreCase)
        {
            String doubleVarChar = varChar + varChar;
            
            while (pattern.contains(doubleVarChar)) {
                pattern = pattern.replaceAll(doubleVarChar, varChar); //remove double vars
            }
            
            int m = pattern.length(); //i
            int n = text.length(); //j
            
            int g = 0; //number of variables
            int v = -1; //index of first variable
            for (int pi = 0; pi < m; pi++) {
                if (Objects.equals(varChar, pattern.substring(pi, pi + 1))) {
                    g++;
                    if (v == -1) {
                        v = pi;
                    }
                }
            }
            if (g == 0) {
                v = m;
            }
            
            int[][] D = new int[m + 1][n + 1];
            
            for (int j = 0; j <= n; j++) {
                D[0][j] = 0;
            }
            for (int i = 0; i <= v; i++) {
                D[i][0] = i;
            }
            for (int i = v + 1; i <= m; i++) {
                D[i][0] = Integer.MAX_VALUE;
            }
            
            for (int i = 1; i <= m; i++) {
                if (Objects.equals(varChar, pattern.substring(i - 1, i))) {
                    for (int j = 0; j <= n; j++) {
                        D[i][j] = Integer.MAX_VALUE;
                    }
                    for (int j = 0; j <= n; j++) {
                        for (int h = j; h <= n; h++) { //assuming don't care set could span from 0 chars to the rest of the string
                            if ((h <= n) && (D[i - 1][j] < D[i][h])) {
                                for (h = h; h <= n; h++) {
                                    D[i][h] = D[i - 1][j];
                                }
                            }
                        }
                        if (D[i][n] == 0) //no need to continue this loop
                        {
                            break;
                        }
                        
                        printDmatch(pattern, text, D, i, j);
                    }
                } else {
                    for (int j = 1; j <= n; j++) {
                        int matchChar;
                        if (ignoreCase) {
                            matchChar = (Character.toUpperCase(pattern.charAt(i - 1)) == Character.toUpperCase(text.charAt(j - 1))) ? 1 : 0;
                        } else {
                            matchChar = (pattern.charAt(i - 1) == text.charAt(j - 1)) ? 1 : 0;
                        }
                        D[i][j] = 1 + Math.min(Math.min(D[i - 1][j], D[i][j - 1]),
                                D[i - 1][j - 1] - matchChar);
                        
                        printDmatch(pattern, text, D, i, j);
                    }
                }
            }
            
            if (tokens == null) {
                tokens = new ArrayList<>();
            }
        
            printDmatch(pattern, text, D);
            
            if ((vars != null) && (g > 0)) //if there are variables
            {
                extractVariables(pattern, text, D, pattern.length(), text.length(), vars, tokens);
            }
            
            return D[m][n];
        }
        
        /**
         * Extracts variables from the array produced by stringEditDistance().
         *
         * @param pattern The pattern to compare the string against.
         * @param text    The string to compare.
         * @param D       The array produced by stringEditDistance().
         * @param r       The row to start extraction from.
         * @param c       The column to start extraction from.
         * @param vars    The list to store extracted variables in. This will contain a list of all optimal variable extraction sets.
         * @param tokens  The list to store extracted non-variable tokens in. This will contain a list of all optimal non-variable token extraction sets.
         */
        private static void extractVariables(String pattern, String text, int[][] D, int r, int c, List<List<String>> vars, List<List<String>> tokens)
        {
            int row = r;
            int col = c;
            
            if ((r == pattern.length()) && (c == text.length())) {
                extractionDepth = 0;
            }
            int i = extractionDepth;
            
            if (vars.size() < (i + 1)) {
                vars.add(new ArrayList<>());
            }
            if (tokens.size() < (i + 1)) {
                tokens.add(new ArrayList<>());
            }
            
            StringBuilder thisVar = new StringBuilder();
            StringBuilder thisToken = new StringBuilder();
            while ((row >= 0) && (col > 0)) { //start at bottom right corner, move to upper left corner
                if ((row > 0) && Objects.equals(varChar, pattern.substring(row - 1, row))) { //the current row is a var char
                    
                    while (col >= 1) { //move left to digest variable
                        if (D[row][col - 1] == D[row][col]) {
                            thisVar.insert(0, text.charAt(col - 1));
                            col--;
                        }
                        
                        if ((row > 1) && (col >= 1)) {
                            if (D[row][col - 1] != D[row][col]) { //stop at the next to last col in the sequence
                                int min = Math.min(D[row - 1][col], D[row - 1][col - 1]); //first move out of * row
                                if (D[row - 1][col] == min) {
                                    row--; //move up
                                } else { //D[row - 1][col - 1] == min
                                    row--; //move diagonally up and left
                                    col--;
                                }
                                break; //out of variable
                                
                            } else if (D[row][col - 1] == D[row - 1][col]) { //detect alternate optimal variable path
                                List<String> popVars = new ArrayList<>(vars.get(i));
                                List<String> popTokens = new ArrayList<>(tokens.get(i));
                                popVars.add(thisVar.toString());
                                vars.add(popVars);
                                tokens.add(popTokens);
                                extractionDepth++;
                                extractVariables(pattern, text, D, row - 1, col, vars, tokens);
                            }
                        }
                        
                        printDmatch(pattern, text, D, row, col);
                    }
                    
                    vars.get(i).add(0, thisVar.toString().trim());
                    thisVar = new StringBuilder();
                    
                } else if ((row > 1) && Objects.equals(varChar, pattern.substring(row - 2, row - 1))) { //if the next row is a var char
                    thisToken.insert(0, text.charAt(col - 1));
                    tokens.get(i).add(0, thisToken.toString());
                    thisToken = new StringBuilder();
                    
                    col--; //move diagonally up and left
                    row--;
                    
                    printDmatch(pattern, text, D, row, col);
                    
                } else {
                    if (row > 0) {
                        int min = Math.min(Math.min(D[row - 1][col], D[row - 1][col - 1]), //move
                                D[row][col - 1]);
                        if (D[row - 1][col] == min) {
                            row--; //move up
                        } else if (col > 0) {
                            if (D[row - 1][col - 1] == min) {
                                thisToken.insert(0, text.charAt(col - 1));
                                row--; //move diagonally up and left
                                col--;
                            } else { //D[row][col - 1] == min
                                thisToken.insert(0, text.charAt(col - 1));
                                col--; //move left
                            }
                        }
                    } else {
                        thisToken.insert(0, text.charAt(col - 1));
                        col--; //move left
                    }
                    
                    if (col == 0) {
                        tokens.get(i).add(0, thisToken.toString());
                    }
        
                    printDmatch(pattern, text, D, row, col);
                }
            }
            
            printDmatch(pattern, text, D);
        }
        
        
        /**
         * Prints out the D matrix used in the stringEditDistance method.
         *
         * @param pattern The pattern to compare the string against.
         * @param text    The string to compare.
         * @param D       The matrix to print.
         * @see #printDmatch(String, String, int[][], int, int)
         */
        private static void printDmatch(String pattern, String text, int[][] D)
        {
            printDmatch(pattern, text, D, -1, -1);
        }
        
        /**
         * Prints out the D matrix used in the stringEditDistance method with an indicator at a location.
         *
         * @param pattern The pattern to compare the string against.
         * @param text    The string to compare.
         * @param D       The matrix to print.
         * @param row     The current row location of the algorithm.
         * @param col     The current column location of the algorithm.
         */
        private static void printDmatch(String pattern, String text, int[][] D, int row, int col)
        {
            int m = pattern.length(); //i
            int n = text.length(); //j
            
            for (int i = 0; i <= text.length(); i++) {
                if (i > 0) {
                    if (i == 1) {
                        System.out.print("   ");
                    }
                    System.out.print(text.charAt(i - 1) + "  ");
                } else {
                    System.out.print("   ");
                }
            }
            System.out.println();
            
            for (int i = 0; i <= m; i++) {
                if (i > 0) {
                    System.out.print(pattern.charAt(i - 1) + " ");
                } else {
                    System.out.print("  ");
                }
                
                for (int j = 0; j <= n; j++) {
                    
                    if ((i == row) && (j == col)) {
                        System.out.print("*");
                    } else {
                        System.out.print(" ");
                    }
                    
                    if (D[i][j] == Integer.MAX_VALUE) {
                        System.out.print("^ ");
                    } else {
                        System.out.print(D[i][j] + " ");
                    }
                    
                }
                System.out.println();
            }
            
            System.out.println();
        }
        
    }
  8. #5
  9. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2012
    Posts
    36
    Rep Power
    7
    Here is another update, there was a logical issue in the previous code causing incorrect distances for certain pattern/text pairs.

    Code:
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Objects;
    
    public class FuzzyEditDistance
    {
        
        /**
         * The character used to specify a variable in a string compare.
         */
        public static final String varChar = "";
        
        /**
         * The current index in the vars and tokens list to store extracted vars and tokens for extractVariables().
         *
         * @see #extractVariables(String, String, int[][], int, int, List, List)
         */
        private static int extractionDepth = 0;
        
        
        /**
         * Main method.
         *
         * @param args Arguments to main method.
         */
        public static void main(String[] args)
        {
            String source = "What the hell are lobsters";
            String target = " s";
            
            List<List<String>> vars = new ArrayList<>();
            List<List<String>> tokens = new ArrayList<>();
            
            double score = stringCompare(target, source, vars, tokens, false);
            System.out.println("Score: " + score);
            System.out.println();
            
            System.out.println("---");
            for (int i = 0; i < vars.size(); i++) {
                System.out.println("Extraction #" + i);
                
                System.out.println("  Variables:");
                for (String var : vars.get(i)) {
                    System.out.println("    '" + var + "'");
                }
                System.out.println("  Tokens:");
                for (String token : tokens.get(i)) {
                    System.out.println("    '" + token + "'");
                }
                
                System.out.println("---");
            }
        }
        
        
        /**
         * Compares the closeness of two strings and extracts variables and non-variables is applicable.
         *
         * @param target     The string to compare against.
         * @param source     The string to compare.
         * @param vars       The list to store extracted variables in. This will contain a list of all optimal variable extraction sets.
         * @param tokens     The list to store extracted non-variable tokens in. This will contain a list of all optimal non-variable token extraction sets.
         * @param ignoreCase Whether to ignore case of not.
         * @return The closeness of the two strings, between 0 and 1.
         */
        public static double stringCompare(String target, String source, List<List<String>> vars, List<List<String>> tokens, boolean ignoreCase)
        {
            return (target.isEmpty() || source.isEmpty()) ? 0 : ((double) (target.length() - stringEditDistance(target, source, vars, tokens, ignoreCase)) / target.length());
        }
        
        /**
         * Calculates the distance between two strings with variable collection.
         *
         * @param pattern    The pattern to compare the string against.
         * @param text       The string to compare.
         * @param vars       The list to store extracted variables in. This will contain a list of all optimal variable extraction sets.
         * @param tokens     The list to store extracted non-variable tokens in. This will contain a list of all optimal non-variable token extraction sets.
         * @param ignoreCase Whether to ignore case or not.
         * @return The edit distance between the two strings.
         */
        public static int stringEditDistance(String pattern, String text, List<List<String>> vars, List<List<String>> tokens, boolean ignoreCase)
        {
            String doubleVarChar = varChar + varChar;
            
            while (pattern.contains(doubleVarChar)) {
                pattern = pattern.replaceAll(doubleVarChar, varChar); //remove double vars
            }
            
            int m = pattern.length(); //i
            int n = text.length(); //j
            
            int g = 0; //number of variables
            int v = -1; //index of first variable
            for (int pi = 0; pi < m; pi++) {
                if (Objects.equals(varChar, pattern.substring(pi, pi + 1))) {
                    g++;
                    if (v == -1) {
                        v = pi;
                    }
                }
            }
            if (g == 0) {
                v = m;
            }
            
            int[][] D = new int[m + 1][n + 1];
            
            for (int j = 0; j <= n; j++) {
                D[0][j] = j;
            }
            for (int i = 0; i <= v; i++) {
                D[i][0] = i;
            }
            for (int i = v + 1; i <= m; i++) {
                D[i][0] = Integer.MAX_VALUE;
            }
            
            for (int i = 1; i <= m; i++) {
                if (Objects.equals(varChar, pattern.substring(i - 1, i))) {
                    for (int j = 0; j <= n; j++) {
                        D[i][j] = Integer.MAX_VALUE;
                    }
                    for (int j = 0; j <= n; j++) {
                        for (int h = j; h <= n; h++) { //assuming don't care set could span from 0 chars to the rest of the string
                            if ((h <= n) && (D[i - 1][j] < D[i][h])) {
                                for (h = h; h <= n; h++) {
                                    D[i][h] = D[i - 1][j];
                                }
                            }
                        }
                        if (D[i][n] == 0) { //no need to continue this loop
                            break;
                        }
                        
                        printDmatch(pattern, text, D, i, j);
                    }
                } else {
                    for (int j = 1; j <= n; j++) {
                        int matchChar;
                        if (ignoreCase) {
                            matchChar = (Character.toUpperCase(pattern.charAt(i - 1)) == Character.toUpperCase(text.charAt(j - 1))) ? 1 : 0;
                        } else {
                            matchChar = (pattern.charAt(i - 1) == text.charAt(j - 1)) ? 1 : 0;
                        }
                        D[i][j] = 1 + Math.min(Math.min(D[i - 1][j], D[i][j - 1]),
                                D[i - 1][j - 1] - matchChar);
                        
                        printDmatch(pattern, text, D, i, j);
                    }
                }
            }
            
            if (tokens == null) {
                tokens = new ArrayList<>();
            }
            
            printDmatch(pattern, text, D);
            
            if ((vars != null) && (g > 0)) { //if there are variables
                extractVariables(pattern, text, D, pattern.length(), text.length(), vars, tokens);
            }
            
            return D[m][n];
        }
        
        /**
         * Extracts variables from the array produced by stringEditDistance().
         *
         * @param pattern The pattern to compare the string against.
         * @param text    The string to compare.
         * @param D       The array produced by stringEditDistance().
         * @param r       The row to start extraction from.
         * @param c       The column to start extraction from.
         * @param vars    The list to store extracted variables in. This will contain a list of all optimal variable extraction sets.
         * @param tokens  The list to store extracted non-variable tokens in. This will contain a list of all optimal non-variable token extraction sets.
         */
        private static void extractVariables(String pattern, String text, int[][] D, int r, int c, List<List<String>> vars, List<List<String>> tokens)
        {
            int row = r;
            int col = c;
            
            if ((r == pattern.length()) && (c == text.length())) {
                extractionDepth = 0;
            }
            int i = extractionDepth;
            
            if (vars.size() < (i + 1)) {
                vars.add(new ArrayList<>());
            }
            if (tokens.size() < (i + 1)) {
                tokens.add(new ArrayList<>());
            }
            
            StringBuilder thisVar = new StringBuilder();
            StringBuilder thisToken = new StringBuilder();
            while ((row >= 0) && (col > 0)) { //start at bottom right corner, move to upper left corner
                if ((row > 0) && Objects.equals(varChar, pattern.substring(row - 1, row))) { //the current row is a var char
                    
                    while (col >= 1) { //move left to digest variable
                        if (D[row][col - 1] == D[row][col]) {
                            thisVar.insert(0, text.charAt(col - 1));
                            col--;
                        }
                        
                        if ((row > 1) && (col >= 1)) {
                            if (D[row][col - 1] != D[row][col]) { //stop at the next to last col in the sequence
                                int min = Math.min(D[row - 1][col], D[row - 1][col - 1]); //first move out of * row
                                if (D[row - 1][col] == min) {
                                    row--; //move up
                                } else { //D[row - 1][col - 1] == min
                                    row--; //move diagonally up and left
                                    col--;
                                }
                                break; //out of variable
                                
                            } else if (D[row][col - 1] == D[row - 1][col]) { //detect alternate optimal variable path
                                List<String> popVars = new ArrayList<>(vars.get(i));
                                List<String> popTokens = new ArrayList<>(tokens.get(i));
                                popVars.add(thisVar.toString());
                                vars.add(popVars);
                                tokens.add(popTokens);
                                extractionDepth++;
                                extractVariables(pattern, text, D, row - 1, col, vars, tokens);
                            }
                        }
                        
                        printDmatch(pattern, text, D, row, col);
                    }
                    
                    vars.get(i).add(0, thisVar.toString().trim());
                    thisVar = new StringBuilder();
                    
                } else if ((row > 1) && Objects.equals(varChar, pattern.substring(row - 2, row - 1))) { //if the next row is a var char
                    thisToken.insert(0, text.charAt(col - 1));
                    tokens.get(i).add(0, thisToken.toString());
                    thisToken = new StringBuilder();
                    
                    col--; //move diagonally up and left
                    row--;
                    
                    printDmatch(pattern, text, D, row, col);
                    
                } else {
                    if (row > 0) {
                        int min = Math.min(Math.min(D[row - 1][col], D[row - 1][col - 1]), //move
                                D[row][col - 1]);
                        if (D[row - 1][col] == min) {
                            row--; //move up
                        } else if (col > 0) {
                            if (D[row - 1][col - 1] == min) {
                                thisToken.insert(0, text.charAt(col - 1));
                                row--; //move diagonally up and left
                                col--;
                            } else { //D[row][col - 1] == min
                                thisToken.insert(0, text.charAt(col - 1));
                                col--; //move left
                            }
                        }
                    } else {
                        thisToken.insert(0, text.charAt(col - 1));
                        col--; //move left
                    }
                    
                    if (col == 0) {
                        tokens.get(i).add(0, thisToken.toString());
                    }
                    
                    printDmatch(pattern, text, D, row, col);
                }
            }
            
            printDmatch(pattern, text, D);
        }
        
        
        /**
         * Prints out the D matrix used in the stringEditDistance method.
         *
         * @param pattern The pattern to compare the string against.
         * @param text    The string to compare.
         * @param D       The matrix to print.
         * @see #printDmatch(String, String, int[][], int, int)
         */
        private static void printDmatch(String pattern, String text, int[][] D)
        {
            printDmatch(pattern, text, D, -1, -1);
        }
        
        /**
         * Prints out the D matrix used in the stringEditDistance method with an indicator at a location.
         *
         * @param pattern The pattern to compare the string against.
         * @param text    The string to compare.
         * @param D       The matrix to print.
         * @param row     The current row location of the algorithm.
         * @param col     The current column location of the algorithm.
         */
        private static void printDmatch(String pattern, String text, int[][] D, int row, int col)
        {
            int m = pattern.length(); //i
            int n = text.length(); //j
            
            for (int i = 0; i <= text.length(); i++) {
                if (i > 0) {
                    if (i == 1) {
                        System.out.print("   ");
                    }
                    System.out.print(text.charAt(i - 1) + "  ");
                } else {
                    System.out.print("   ");
                }
            }
            System.out.println();
            
            for (int i = 0; i <= m; i++) {
                if (i > 0) {
                    System.out.print(pattern.charAt(i - 1) + " ");
                } else {
                    System.out.print("  ");
                }
                
                for (int j = 0; j <= n; j++) {
                    
                    if ((i == row) && (j == col)) {
                        System.out.print("*");
                    } else {
                        System.out.print(" ");
                    }
                    
                    if (D[i][j] == Integer.MAX_VALUE) {
                        System.out.print("^ ");
                    } else {
                        System.out.print(D[i][j] + " ");
                    }
                    
                }
                System.out.println();
            }
            
            System.out.println();
        }
        
    }
  10. #6
  11. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2012
    Posts
    36
    Rep Power
    7
    Here is the latest version of this code. This update fixed edge cases and allows for punctuation to be counted or ignored during scoring. In addition, I have added more detailed javadocs that describe the usage and functionality of these methods.

    I have also attached a unit test to verify the functionality to the methods and to test edge cases: StringComparisonUtilityTest.zip

    Code:
    /*
     * File:    StringComparisonUtility.java
     * Author:  Zachary Gill
     */
    
    import java.util.*;
    import java.util.concurrent.ConcurrentHashMap;
    
    /**
     * A resource class which provides functionality to compare Strings.
     */
    public final class StringComparisonUtility {
        
        //Constants
        
        /**
         * The character used to specify a variable in a string compare.
         */
        public static final String varChar = "";
        
        
        //Static Fields
        
        /**
         * The current index in the vars and tokens list to store extracted vars and tokens for extractVariables().
         *
         * @see #extractVariables(String, String, int[][], int, int, UUID, List, List, boolean, boolean)
         */
        private static final Map<UUID, Integer> extractionDepths = new ConcurrentHashMap<>();
        
        
        //Main Method
        
        /**
         * Main method.
         *
         * @param args Arguments to main method.
         */
        public static void main(String[] args) {
            String source = "Something";
            String target = "Nothing";
        
            List<List<String>> vars = new ArrayList<>();
            List<List<String>> tokens = new ArrayList<>();
            
            double score = stringCompare(target, source, vars, tokens, false, false);
            System.out.println("Score: " + score);
            System.out.println();
            
            source = "What the hell are lobsters";
            target = " s";
            vars.clear();
            tokens.clear();
            
            score = stringCompare(target, source, vars, tokens, false, false);
            System.out.println("Score: " + score);
            System.out.println();
            
            System.out.println("---");
            for (int i = 0; i < vars.size(); i++) {
                System.out.println("Extraction #" + i);
                
                System.out.println("  Variables:");
                for (String var : vars.get(i)) {
                    System.out.println("    '" + var + "'");
                }
                System.out.println("  Tokens:");
                for (String token : tokens.get(i)) {
                    System.out.println("    '" + token + "'");
                }
                
                System.out.println("---");
            }
        }
        
        
        //Functions
        
        /**
         * Compares the closeness of two strings and extracts variables and tokens if applicable.<br/>
         * The text string is compared against the pattern string and the value returned will be equal to the length of the pattern minus the edit distance between the two strings, divided by the length of the pattern.<br/>
         * If non-null vars and tokens lists are supplied as arguments, then following the comparison, the variables and tokens from the text as compared to the pattern will be extracted and stored in these lists.<br/>
         * There may be multiple top-level extractions of variables and tokens from the string, if so, each top-level extraction will be stored in the lists.<br/>
         * If the ignoreCase flag is set to true, then the case of the characters between the text and pattern strings will not be considered for scoring, however the cases of the text string will remain unchanged in the extractions.<br/>
         * If the ignorePunctuation flag is set to true, then punctuation in the text string will not affect the score of the comparison, however the punctuation of the text string will remain in the extractions.<br/>
         * If the ignorePunctuation flag is set to true, punctuation will most likely be assumed to be part of the variable or token to its left, even if the punctuation is specified in the pattern string.
         * In addition, if the ignorePunctuation flag is set to true, all punctuation in the pattern string will be stripped, which may change the length of the pattern string and the resulting comparison score.<br/>
         * If you wish to compare strings including punctuation, set the ignorePunctuation flag to false.
         *
         * @param pattern           The string to compare against.
         * @param text              The string to compare.
         * @param vars              The list to store extracted variables in. This will contain a list of all optimal variable extraction sets.
         * @param tokens            The list to store extracted non-variable tokens in. This will contain a list of all optimal non-variable token extraction sets.
         * @param ignoreCase        Whether to ignore case of not.
         * @param ignorePunctuation Whether to ignore punctuation of not.
         * @return The closeness of the two strings, between 0 and 1.
         */
        public static double stringCompare(String pattern, String text, List<List<String>> vars, List<List<String>> tokens, boolean ignoreCase, boolean ignorePunctuation) {
            if (pattern.isEmpty() || text.isEmpty()) {
                return 0.0;
            }
            
            pattern = pattern.replaceAll("[" + varChar + "]+", varChar); //remove double vars
            pattern = ignorePunctuation ? removePunctuationSoft(pattern, Collections.singletonList(varChar.charAt(0))) : pattern;
            
            if (pattern.isEmpty()) {
                return 0.0;
            }
            
            return (double) (pattern.length() - stringEditDistance(pattern, text, vars, tokens, ignoreCase, ignorePunctuation)) / pattern.length();
        }
        
        /**
         * Calculates the distance between two strings with variable collection.<br/>
         * The text string is compared against the pattern string and the value returned will be equal to the edit distance between the two strings.<br/>
         * If non-null vars and tokens lists are supplied as arguments, then following the comparison, the variables and tokens from the text as compared to the pattern will be extracted and stored in these lists.<br/>
         * There may be multiple top-level extractions of variables and tokens from the string, if so, each top-level extraction will be stored in the lists.<br/>
         * If the ignoreCase flag is set to true, then the case of the characters between the text and pattern strings will not be considered for scoring, however the cases of the text string will remain unchanged in the extractions.<br/>
         * If the ignorePunctuation flag is set to true, then punctuation in the text string will not affect the score of the comparison, however the punctuation of the text string will remain in the extractions.<br/>
         * If the ignorePunctuation flag is set to true, punctuation will most likely be assumed to be part of the variable or token to its left, even if the punctuation is specified in the pattern string.
         * In addition, if the ignorePunctuation flag is set to true, all punctuation in the pattern string will be stripped, which may change the length of the pattern string and the resulting comparison score.<br/>
         * If you wish to compare strings including punctuation, set the ignorePunctuation flag to false.
         *
         * @param pattern           The pattern to compare the string against.
         * @param text              The string to compare.
         * @param vars              The list to store extracted variables in. This will contain a list of all optimal variable extraction sets.
         * @param tokens            The list to store extracted non-variable tokens in. This will contain a list of all optimal non-variable token extraction sets.
         * @param ignoreCase        Whether to ignore case or not.
         * @param ignorePunctuation Whether to ignore punctuation or not.
         * @return The edit distance between the two strings.
         */
        @SuppressWarnings({"AssignmentToMethodParameter", "SillyAssignment", "ConstantConditions"})
        public static int stringEditDistance(String pattern, String text, List<List<String>> vars, List<List<String>> tokens, boolean ignoreCase, boolean ignorePunctuation) {
            pattern = ignorePunctuation ? removePunctuationSoft(pattern, Collections.singletonList(varChar.charAt(0))) : pattern;
            pattern = pattern.replaceAll("[" + varChar + "]+", varChar); //remove double vars
            
            int m = pattern.length(); //i
            int n = text.length(); //j
            
            int g = 0; //number of variables
            int v = -1; //index of first variable
            for (int pi = 0; pi < m; pi++) {
                if (varChar.equals(pattern.substring(pi, pi + 1))) {
                    g++;
                    if (v == -1) {
                        v = pi;
                    }
                }
            }
            if (g == 0) {
                v = m;
            }
            
            int[][] D = new int[m + 1][n + 1];
            
            for (int j = 0; j <= n; j++) {
                D[0][j] = j;
            }
            for (int i = 0; i <= v; i++) {
                D[i][0] = i;
            }
            for (int i = v + 1; i <= m; i++) {
                D[i][0] = Integer.MAX_VALUE;
            }
            
            boolean init = false;
            for (int i = 1; i <= m; i++) {
                
                if (varChar.equals(pattern.substring(i - 1, i))) {
                    boolean hitZero = false;
                    for (int j = 0; j <= n; j++) { //assuming don't care set could span from 0 chars to the rest of the string
                        if (D[i - 1][j] == 0) {
                            D[i][j] = 1;
                            hitZero = true;
                        } else if (hitZero) {
                            D[i][j - 1] = D[i - 1][j - 1];
                            for (j = j; j <= n; j++) {
                                D[i][j] = 0;
                            }
                            hitZero = false;
                            break; //no need to continue this loop
                        } else {
                            D[i][j] = (j > 0) ? Math.min(D[i - 1][j], D[i][j - 1]) : D[i - 1][j];
                        }
                    }
                    if (hitZero) {
                        D[i][n] = D[i - 1][n];
                    }
                    init = true;
                    
                } else {
                    char patternChar = pattern.charAt(i - 1);
                    for (int j = 1; j <= n; j++) {
                        char textChar = text.charAt(j - 1);
                        if (ignorePunctuation && isPunctuation(textChar)) {
                            for (j = j; j <= n; j++) {
                                if (isPunctuation(text.charAt(j - 1))) {
                                    if (!init) {
                                        D[i][j] = D[i][j - 1] - ((j == 1) ? 1 : 0);
                                        D[i - 1][j] = D[i - 1][j - 1];
                                    } else {
                                        D[i][j] = D[i][j - 1];
                                    }
                                } else {
                                    j--;
                                    break;
                                }
                            }
                        } else {
                            int matchChar;
                            if (ignoreCase) {
                                matchChar = (Character.toUpperCase(patternChar) == Character.toUpperCase(textChar)) ? 1 : 0;
                            } else {
                                matchChar = (patternChar == textChar) ? 1 : 0;
                            }
                            D[i][j] = 1 + Math.min(Math.min(D[i - 1][j], D[i][j - 1]),
                                    D[i - 1][j - 1] - matchChar);
                        }
                        init = true;
                    }
                }
            }
            
            printDmatch(pattern, text, D);
            if ((vars != null) && (g > 0)) { //if there are variables
                if (tokens == null) {
                    tokens = new ArrayList<>();
                }
                extractVariables(pattern, text, D, vars, tokens, ignorePunctuation);
            }
            
            return D[m][n];
        }
        
        /**
         * Extracts variables from the array produced by stringEditDistance().
         *
         * @param pattern           The pattern to compare the string against.
         * @param text              The string to compare.
         * @param D                 The array produced by stringEditDistance().
         * @param r                 The row to start extraction from.
         * @param c                 The column to start extraction from.
         * @param e                 The key associated with the algorithm's extraction depth in the extractionDepth map, null initially.
         * @param vars              The list to store extracted variables in. This will contain a list of all optimal variable extraction sets.
         * @param tokens            The list to store extracted non-variable tokens in. This will contain a list of all optimal non-variable token extraction sets.
         * @param varNext           A flag indicating whether the next extracted chunk will be a variable or not.
         * @param ignorePunctuation Whether to ignore punctuation or not.
         */
        @SuppressWarnings({"IfStatementWithIdenticalBranches", "ConstantConditions"})
        private static void extractVariables(String pattern, String text, int[][] D, int r, int c, UUID e, List<List<String>> vars, List<List<String>> tokens, boolean varNext, boolean ignorePunctuation) {
            int row = r;
            int col = c;
            
            if ((e == null) || ((r == pattern.length()) && (c == text.length()))) {
                e = UUID.randomUUID();
                extractionDepths.put(e, 0);
            }
            int i = extractionDepths.get(e);
            
            if (vars.size() < (i + 1)) {
                vars.add(new ArrayList<>());
            }
            if (tokens.size() < (i + 1)) {
                tokens.add(new ArrayList<>());
            }
            
            StringBuilder thisVar = new StringBuilder();
            StringBuilder thisToken = new StringBuilder();
            while ((row >= 0) && (col > 0)) { //start at bottom right corner, move to upper left corner
                if ((row > 0) && varChar.equals(pattern.substring(row - 1, row))) { //the current row is a var char
                    
                    while (col >= 1) { //move left to digest variable
                        if (D[row][col - 1] == D[row][col]) {
                            thisVar.insert(0, text.charAt(col - 1));
                            col--; //move left
                        }
                        
                        if ((row > 1) && (col >= 1)) {
                            if (D[row][col - 1] != D[row][col]) { //stop at the last col in the variable sequence
                                int min = Math.min(D[row - 1][col], D[row - 1][col - 1]); //move out of variable row
                                if (D[row - 1][col] == min) {
                                    row--; //move up
                                } else { //D[row - 1][col - 1] == min
                                    row--; //move diagonally up and left
                                    col--;
                                }
                                break; //out of variable
                                
                            } else if (D[row][col - 1] == D[row - 1][col]) { //detect alternate optimal variable path
                                List<String> popVars = new ArrayList<>(vars.get(i));
                                List<String> popTokens = new ArrayList<>(tokens.get(i));
                                popVars.add(0, thisVar.toString());
                                vars.add(popVars);
                                tokens.add(popTokens);
                                int currentExtractionDepth = extractionDepths.get(e);
                                extractionDepths.replace(e, ++currentExtractionDepth);
                                extractVariables(pattern, text, D, row - 1, col, e, vars, tokens, false, ignorePunctuation);
                            }
                        }
                    }
                    
                    vars.get(i).add(0, thisVar.toString());
                    varNext = false;
                    thisVar = new StringBuilder();
                    
                } else if ((row > 1) && varChar.equals(pattern.substring(row - 2, row - 1))) { //if the next row is a var char
                    
                    if (ignorePunctuation && isPunctuation(text.substring(col - 1, col).charAt(0))) {
                        while ((col > 1) && (D[row][col - 1] == D[row][col])) {
                            thisToken.insert(0, text.charAt(col - 1));
                            col--; //move left
                        }
                    }
                    
                    if ((col > 1) && (D[row - 1][col - 1] == D[row - 1][col]) && (D[row][col] != D[row - 1][col - 1])) {
                        List<String> popVars = new ArrayList<>(vars.get(i));
                        List<String> popTokens = new ArrayList<>(tokens.get(i));
                        popTokens.add(0, text.charAt(col - 1) + thisToken.toString());
                        vars.add(popVars);
                        tokens.add(popTokens);
                        int currentExtractionDepth = extractionDepths.get(e);
                        extractionDepths.replace(e, ++currentExtractionDepth);
                        extractVariables(pattern, text, D, row - 1, col - 1, e, vars, tokens, true, ignorePunctuation);
                        
                        row--; //move up
                    } else {
                        thisToken.insert(0, text.charAt(col - 1));
                        row--; //move diagonally up and left
                        col--;
                    }
                    
                    tokens.get(i).add(0, thisToken.toString());
                    varNext = true;
                    thisToken = new StringBuilder();
                    
                } else {
                    if (row > 0) {
                        int min = Math.min(Math.min(D[row - 1][col], D[row - 1][col - 1]), D[row][col - 1]); //move
                        if (col > 0) {
                            if ((D[row - 1][col] == min) && (D[row - 1][col] == D[row - 1][col - 1]) && (D[row][col] == D[row][col - 1])) {
                                row--; //move up;
                            } else if (D[row - 1][col - 1] == min) {
                                thisToken.insert(0, text.charAt(col - 1));
                                row--; //move diagonally up and left
                                col--;
                            } else if (D[row - 1][col] == min) {
                                row--; //move up;
                            } else { //D[row][col - 1] == min
                                thisToken.insert(0, text.charAt(col - 1));
                                col--; //move left
                            }
                        }
                    } else {
                        thisToken.insert(0, text.charAt(col - 1));
                        col--; //move left
                    }
                    
                    if (col == 0) { //reached beginning of string
                        tokens.get(i).add(0, thisToken.toString());
                        varNext = true;
                        thisToken = new StringBuilder();
                    }
                }
            }
            
            //digest unreached chunks in the pattern
            if ((col == 0) && (row > 0) && (pattern.length() >= row)) {
                while (row > 0) {
                    boolean isVar = (pattern.charAt(row - 1) == varChar.charAt(0));
                    if (!isVar && !varNext) {
                        tokens.get(i).add(0, "");
                        varNext = true;
                    } else if (isVar && varNext) {
                        vars.get(i).add(0, "");
                        varNext = false;
                    }
                    row--; //move up
                }
            }
            
            if ((r == pattern.length()) && (c == text.length())) {
                extractionDepths.remove(e);
            }
        }
        
        /**
         * Extracts variables from the array produced by stringEditDistance().
         *
         * @param pattern           The pattern to compare the string against.
         * @param text              The string to compare.
         * @param D                 The array produced by stringEditDistance().
         * @param vars              The list to store extracted variables in. This will contain a list of all optimal variable extraction sets.
         * @param tokens            The list to store extracted non-variable tokens in. This will contain a list of all optimal non-variable token extraction sets.
         * @param ignorePunctuation Whether to ignore punctuation or not.
         */
        private static void extractVariables(String pattern, String text, int[][] D, List<List<String>> vars, List<List<String>> tokens, boolean ignorePunctuation) {
            extractVariables(pattern, text, D, pattern.length(), text.length(), null, vars, tokens, false, ignorePunctuation);
        }
        
        /**
         * Prints out the D matrix used in the stringEditDistance method.
         *
         * @param pattern The pattern to compare the string against.
         * @param text    The string to compare.
         * @param D       The matrix to print.
         * @see #printDmatch(String, String, int[][], int, int)
         * @see #stringEditDistance(String, String, List, List, boolean, boolean)
         */
        private static void printDmatch(String pattern, String text, int[][] D) {
            printDmatch(pattern, text, D, -1, -1);
        }
        
        /**
         * Prints out the D matrix used in the stringEditDistance method with an indicator at a location.
         *
         * @param pattern The pattern to compare the string against.
         * @param text    The string to compare.
         * @param D       The matrix to print.
         * @param row     The current row location of the algorithm.
         * @param col     The current column location of the algorithm.
         * @see #stringEditDistance(String, String, List, List, boolean, boolean)
         */
        @SuppressWarnings("UseOfSystemOutOrSystemErr")
        private static void printDmatch(String pattern, String text, int[][] D, int row, int col) {
            StringBuilder dMatch = new StringBuilder();
            int m = pattern.length(); //i
            int n = text.length(); //j
            
            int spaces = Math.max(String.valueOf(n).length(), String.valueOf(m).length()) + 2;
            
            for (int i = 0; i <= text.length(); i++) {
                if (i > 0) {
                    dMatch.append(spaces(spaces - 1)).append(text.charAt(i - 1));
                } else {
                    dMatch.append(spaces(spaces + 1));
                }
            }
            dMatch.append(System.lineSeparator());
            
            for (int i = 0; i <= m; i++) {
                if (i > 0) {
                    dMatch.append(pattern.charAt(i - 1));
                } else {
                    dMatch.append(" ");
                }
                
                for (int j = 0; j <= n; j++) {
                    String element = (((i == row) && (j == col)) ? "*" : "") + ((D[i][j] > (Integer.MAX_VALUE / 2)) ? "^" : D[i][j]);
                    dMatch.append(spaces(spaces - element.length())).append(element);
                }
                dMatch.append(System.lineSeparator());
            }
            
            System.out.println(dMatch);
        }
        
        
        //Utility Functions
        
        /**
         * Creates a string of the length specified filled with spaces.
         *
         * @param num The length to make the string.
         * @return A new string filled with spaces to the length specified.
         */
        public static String spaces(int num) {
            char[] chars = new char[num];
            Arrays.fill(chars, ' ');
            return new String(chars);
        }
        
        /**
         * Determines if a character is punctuation or not.
         *
         * @param c The character.
         * @return Whether the character is punctuation or not.
         */
        public static boolean isPunctuation(char c) {
            return !(Character.isLetterOrDigit(c) || Character.isWhitespace(c));
        }
        
        /**
         * Gently removes the punctuation from a tokenized string.<br>
         * Does not remove decimals from numbers or operators from expressions.
         *
         * @param tokens The list of tokens to operate on.
         * @param save   A list of punctuation characters to ignore.
         * @return The tokens with punctuation gently removed.
         */
        private static List<String> removePunctuationSoft(List<String> tokens, List<Character> save) {
            List<String> depuncted = new ArrayList<>();
            
            for (int i = 0; i < tokens.size(); i++) {
                String token = tokens.get(i);
                
                if (((i != 0) && (i != (tokens.size() - 1))) &&
                        (tokenIsOperator(token) && (tokenIsNum(tokens.get(i - 1)) && tokenIsNum(tokens.get(i + 1))))) { //keep operators for expressions
                    depuncted.add(token);
                } else {
                    for (int j = 0; j < token.length(); j++) {
                        char current = token.charAt(j);
                        
                        if (!Character.isLetterOrDigit(current)) { //found a symbol character
                            boolean remove = true;
                            
                            if ((j < (token.length() - 1)) && Character.isDigit(token.charAt(j + 1))) { //if the next digit is a number
                                if (tokenIsOperator(Character.toString(current)) || (current == '.')) //operators and decimals are ok near numbers
                                {
                                    remove = false;
                                }
                            }
                            
                            if (remove) {
                                boolean saveChar = false;
                                for (Character c : save) {
                                    if (c == current) {
                                        saveChar = true;
                                    }
                                }
                                if (!saveChar) {
                                    StringBuilder sb = new StringBuilder(token);
                                    sb.deleteCharAt(j);
                                    token = sb.toString();
                                    j--;
                                }
                            }
                        }
                    }
                    depuncted.add(token);
                }
            }
            
            return depuncted;
        }
        
        /**
         * Gently removes the punctuation from a string.<br>
         * Does not remove decimals from numbers or operators from expressions.
         *
         * @param string The string to operate on.
         * @param save   A list of punctuation characters to ignore.
         * @return The string with punctuation gently removed.
         *
         * @see #removePunctuationSoft(List, List)
         */
        public static String removePunctuationSoft(String string, List<Character> save) {
            List<String> tokens = tokenize(string, " ");
            tokens = removePunctuationSoft(tokens, save);
            
            return detokenize(tokens, " ");
        }
        
        /**
         * Determines if a string token represents a number of not.
         *
         * @param token The token to examine.
         * @return Whether the token represents a number of not.
         */
        public static boolean tokenIsNum(String token) {
            try {
                double d = Double.parseDouble(token);
            } catch (NumberFormatException ignored) {
                return false;
            }
            return true;
        }
        
        /**
         * Determines if a string token represents an operator or not.
         *
         * @param token The token to examine.
         * @return Whether the token represents an operator or not.
         */
        @SuppressWarnings("HardcodedFileSeparator")
        public static boolean tokenIsOperator(String token) {
            String[] operators = new String[]{"+", "-", "*", "/", "\\", "%", ">", "<", "==", "!=", "<>", ">=", "<="};
            for (String operator : operators) {
                if (token.equals(operator)) {
                    return true;
                }
            }
            return false;
        }
        
        /**
         * Tokenizes a passed string into its words and returns a list of those words.
         *
         * @param str   The string to tokenize.
         * @param delim The delimiter to separate tokens by.
         * @return A list of all the tokens of the passed string.
         */
        public static List<String> tokenize(String str, String delim) {
            List<String> tokens = new ArrayList<>();
            
            StringTokenizer st = new StringTokenizer(str, delim);
            while (st.hasMoreTokens()) {
                tokens.add(st.nextToken());
            }
            
            return tokens;
        }
        
        /**
         * Detokenizes a passed list of tokens back into a string.
         *
         * @param tokens The list of tokens to detokenize.
         * @param delim  The delimiter to insert between tokens.
         * @return A string composed of the tokens in the passed list.
         */
        public static String detokenize(List<String> tokens, String delim) {
            StringBuilder str = new StringBuilder();
            for (int i = 0; i < tokens.size(); i++) {
                str.append(tokens.get(i));
                if (i != (tokens.size() - 1)) {
                    str.append(delim);
                }
            }
            return str.toString();
        }
        
    }

IMN logo majestic logo threadwatch logo seochat tools logo