#1

    Regex to find random collection of characters in a string


    I've got a simple anagram puzzle that needs to pick out all the words in a dictionary that contain all the letters in the scrambled anagram string. The problem I've got is that I don't know how to get my Regex to treat each character as a single entity that can only be used once per word.

    So, say I have the word 'potato'. This has been scrambled into 'tootap' for the puzzle. In C#, I can use the expression [top] to return three matches, one for each letter it finds. However, I can't use the expression [to] since it will return 't','o','t' when I only need a single 't' to be found.

    How can I do what I need?
#2
    This isn't a good problem for a regex:

    Code:
    # Matches anagrams of potato
    /(?=.*p)(?=.*o.*o)(?=.*t.*t)(?=.*a)^......$/
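
    For what it's worth, here is a rough sketch of how a pattern like the one above could be generated in C# from any scrambled string instead of being written by hand. The class and method names (AnagramRegex, Build) are made up for illustration. It emits one lookahead per distinct letter, repeated as many times as that letter occurs, plus a length check, so it should only match exact anagrams.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original post.
public static class AnagramRegex
{
    // For "tootap" this builds a pattern equivalent to the hand-written one:
    // one lookahead per distinct letter, then a check on the total length.
    public static Regex Build(string letters)
    {
        var pattern = new StringBuilder("^");

        foreach (var group in letters.GroupBy(c => c))
        {
            // Escape the letter in case it is a regex metacharacter.
            string literal = Regex.Escape(group.Key.ToString());

            // (?=(?:.*x){n}) requires at least n occurrences of x.
            pattern.Append("(?=(?:.*")
                   .Append(literal)
                   .Append("){")
                   .Append(group.Count())
                   .Append("})");
        }

        // Exactly as many characters as the scrambled string has.
        pattern.Append(".{").Append(letters.Length).Append("}$");

        return new Regex(pattern.ToString(), RegexOptions.Singleline);
    }
}
```

    Usage would be something like AnagramRegex.Build("tootap").IsMatch(word) for each dictionary word. It still rescans the word once per distinct letter, which is part of why a regex is an awkward fit here.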
    Here are a few other ways to solve it:

    Keep a count of each letter in the string and compare it with the letter counts of each dictionary word.

    Sort the letters in the string and word, then check equality.
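
    A minimal sketch of both suggestions in C#, in case it helps. The names (AnagramCheck, IsAnagramBySort, ContainsAllLetters) are hypothetical. The sorting version tests for an exact anagram; the counting version assumes you want words that contain every letter of the scrambled string at least as often as the string does, so longer words can pass too.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helpers, not part of the original post.
public static class AnagramCheck
{
    // Sort the letters of both strings, then compare the results.
    public static bool IsAnagramBySort(string letters, string word)
    {
        if (letters.Length != word.Length)
        {
            return false;
        }

        char[] a = letters.ToCharArray();
        char[] b = word.ToCharArray();
        Array.Sort(a);
        Array.Sort(b);
        return a.SequenceEqual(b);
    }

    // Count the letters in the word, then use up one count for each
    // letter of the scrambled string. Each letter is consumed only once.
    public static bool ContainsAllLetters(string letters, string word)
    {
        var counts = new Dictionary<char, int>();

        foreach (char c in word)
        {
            int n;
            counts.TryGetValue(c, out n); // n is 0 if c has not been seen yet
            counts[c] = n + 1;
        }

        foreach (char c in letters)
        {
            int n;
            if (!counts.TryGetValue(c, out n) || n == 0)
            {
                return false; // letter missing or already used up
            }
            counts[c] = n - 1;
        }

        return true;
    }
}
```

    Both run in a single pass (or a single sort) per word, and neither needs a pattern rebuilt for every puzzle.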
    Last edited by OmegaZero; December 22nd, 2008 at 04:14 PM.
#3
    Originally Posted by OmegaZero
    This isn't a good problem for a regex:

    Code:
    # Matches anagrams of potato
    /(?=.*p)(?=.*o.*o)(?=.*t.*t)(?=.*a)^......$/
    Here are a few other ways to solve it:

    Keep a count of each letter in the string and compare it with the letter counts of each dictionary word.

    Sort the letters in the string and word, then check equality.
    I thought it might be an awkward one for Regex. In the end I came up with this (it's in C#) ... haven't tested it yet, though.

    Code:
            /// <summary>
            /// Returns a list of words that contain ALL the characters in the
            /// passed base string.
            /// </summary>
            /// <param name="baseString">The string containing the characters to match</param>
            /// <returns>A list of words containing all the characters in the passed base string</returns>
            public List<string> ReturnWordList(string baseString)
            {
                List<string> wordList = new List<string>();

                foreach (string str in dictionary)
                {
                    // Don't bother with words that are shorter than the baseString
                    if (str.Length < baseString.Length)
                    {
                        continue;
                    }

                    // Working copy of the word; matched letters are removed from
                    // it so that each one can only be used once.
                    string tempWordStr = str;
                    bool allFound = true;

                    // For each letter in the baseString
                    foreach (char currentChar in baseString)
                    {
                        int index = tempWordStr.IndexOf(currentChar);

                        // Character not found, so this word can't be a match
                        if (index < 0)
                        {
                            allFound = false;
                            break;
                        }

                        // Strings are immutable, so keep the string that Remove
                        // returns. The count of 1 removes only the matched character.
                        tempWordStr = tempWordStr.Remove(index, 1);
                    }

                    // Every character in the baseString was found in the current
                    // word, so add it to the word list.
                    if (allFound)
                    {
                        wordList.Add(str);
                    }
                }
                return wordList;
            }
