#1
  1. Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Dec 2004
    Location
    Meriden, Connecticut
    Posts
    1,797
    Rep Power
    154

    Checking the Dictionary


    Is there a way to check if a string contains any words from the U.S. dictionary?
  2. #2
  3. Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    If you had a file containing all the words in the US dictionary then sure, it's very possible. I think a better solution, though, would be to find a web service that offers dictionary lookups and use SOAP or XML-RPC to check whether the word is present.

    Mark.
    Last edited by netytan; February 19th, 2005 at 01:11 PM.
    programming language development: www.netytan.com Hula
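    A rough sketch of the web-service idea above, using XML-RPC from the standard library. No real dictionary service is named in this thread, so the example starts a throwaway local server to stand in for one; the method name is_word, the tiny WORDS set, and the helper names are all made up for illustration.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical word list; a real service would hold a full dictionary.
WORDS = {"hello", "strange", "day", "today"}


def is_word(word):
    """Return True if the lowercased word is in the service's word list."""
    return word.lower() in WORDS


def start_demo_server():
    """Start a throwaway XML-RPC server on a free local port."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(is_word, "is_word")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def lookup_remote(url, words):
    """Ask the service at `url` about each word; return {word: bool}."""
    with ServerProxy(url) as proxy:
        return {w: proxy.is_word(w) for w in words}


server = start_demo_server()
host, port = server.server_address
results = lookup_remote("http://%s:%d" % (host, port), ["Hello", "xyzzy"])
server.shutdown()
server.server_close()
print(results)
```

    Against a real service only lookup_remote would be needed, pointed at whatever URL and method name that service documents.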

  4. #3
  5. Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Dec 2004
    Location
    Meriden, Connecticut
    Posts
    1,797
    Rep Power
    154
    Ok, thanks. I think that's going to be a difficult task.
  6. #4
  7. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2004
    Location
    Albuquerque, New Mexico
    Posts
    137
    Rep Power
    11
    Or, if you are running on a GNU-based platform, /usr/share/dict/words contains a large list of words, one per line.
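    For what it's worth, reading that file might look something like the sketch below. The function name load_words is made up here, and the file is not present on every system, so the sketch returns an empty set when it is missing.

```python
import os


def load_words(path="/usr/share/dict/words"):
    """Read one word per line into a lowercase set; empty set if the file is missing."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8", errors="replace") as handle:
        # strip() removes the trailing newline left on every line
        return {line.strip().lower() for line in handle if line.strip()}


words = load_words()
print(len(words), "words loaded")
```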
  8. #5
  9. Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Dec 2004
    Location
    Meriden, Connecticut
    Posts
    1,797
    Rep Power
    154
    Actually, I found a word list of 27607 words. I was thinking maybe I could just use that. I'm not sure if Python will be able to handle that much data at the same time though.
  10. #6
  11. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2004
    Location
    Albuquerque, New Mexico
    Posts
    137
    Rep Power
    11
    It should be able to. I've done it before.
  12. #7
  13. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2004
    Location
    Albuquerque, New Mexico
    Posts
    137
    Rep Power
    11
    Python is very powerful, so it will handle 27,000 words just fine :P. The most data I've used with Python was 2 million numbers (in a test script comparing the speeds of databases).
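    To put the size question in perspective, here is a small sketch that builds 27,607 made-up entries (stand-ins for the real word list, which isn't posted in this thread) and keeps them in a set, so each lookup is a hash check and not a scan through the whole list.

```python
import sys

# 27,607 made-up entries stand in for the real word list.
fake_words = ["word%05d" % i for i in range(27607)]
word_set = set(fake_words)

print(len(word_set))             # number of entries held in memory
print(sys.getsizeof(word_set))   # bytes used by the set's hash table alone
print("word12345" in word_set)   # membership test via hashing
print("notaword" in word_set)
```

    The exact byte count depends on the Python build, but a list or set of this size is small by any modern standard.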
  14. #8
  15. Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Dec 2004
    Location
    Meriden, Connecticut
    Posts
    1,797
    Rep Power
    154
    Ok, I'll have to try it out later.
  16. #9
  17. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2004
    Location
    Albuquerque, New Mexico
    Posts
    137
    Rep Power
    11
    Well, here's an example of this:
    Code:
     f = open("dict_file")
     words = f.readlines()
     f.close()
     db = []
     for word in words:
         # readlines() keeps the trailing newline, so strip it before storing
         db.append(word.strip().lower())

     # later
     string_to_search = "Hi everyone!"
     for word in string_to_search.split(" "):
         # strip() returns a new string, so the result has to be kept
         word = word.strip("!?")
         if word.lower() in db:
             print("Found %s in db!" % word)
     # this should work, but, it is untested.
  18. #10
  19. Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    Nick, the readlines() method already returns a list of the lines in the file, so you don't need to copy them into another list unless you are changing each line along the way. The thing to remember about readlines() is that the trailing newline character is left on every line, which is particularly important for this kind of lookup.

    You would also get better results from your search if you stripped all the punctuation from around each word (see string.punctuation), for example:

    Code:
    >>> import string
    >>> 
    >>> dictionary = open('/usr/share/dict/words', 'r').readlines()
    >>> aString = 'Hello all, strange day today!'
    >>> 
    >>> for word in aString.split():
    ...     word = word.lower()
    ...     word = word.strip(string.punctuation)
    ...     if (word + '\n') in dictionary:
    ...         print(word + ' is in the dictionary!')
    ... 
    hello is in the dictionary!
    strange is in the dictionary!
    day is in the dictionary!
    today is in the dictionary!
    >>>
    Take care,

    Mark.
    programming language development: www.netytan.com Hula
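    Pulling the points from this thread together, a reusable version might look roughly like this. The function name find_dictionary_words and the tiny sample dictionary are made up for illustration; the dictionary is assumed to be a set of lowercase words with the newlines already stripped.

```python
import string


def find_dictionary_words(text, dictionary):
    """Return the words in `text` (lowercased, punctuation stripped) found in `dictionary`."""
    found = []
    for token in text.split():
        word = token.strip(string.punctuation).lower()
        if word and word in dictionary:
            found.append(word)
    return found


# Tiny stand-in dictionary; in practice this would be built from a word list file.
sample_dictionary = {"hello", "strange", "day", "today"}
print(find_dictionary_words("Hello all, strange day today!", sample_dictionary))
# prints ['hello', 'strange', 'day', 'today']
```

    Note that strip() only removes punctuation from the ends of each token, so something like "don't" keeps its apostrophe and only matches if the word list spells it the same way.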

