February 19th, 2005, 09:11 AM
Checking the Dictionary
Is there a way to check if a string contains any words from the U.S. dictionary?
February 19th, 2005, 01:09 PM
If you had a list of all the words in the U.S. dictionary then sure, it's very possible, although I think a better solution would be to find a web service that offers dictionary lookups and use SOAP or XML-RPC to check whether the word is present.
Last edited by netytan; February 19th, 2005 at 01:11 PM.
February 19th, 2005, 01:11 PM
Ok, thanks. I think that's going to be a difficult task.
February 21st, 2005, 02:44 AM
Or, if you are running on a GNU-based platform, /usr/share/dict/words contains a large list of words, one per line.
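For anyone wanting to try that file, here is a minimal sketch of reading it into a set, assuming the usual one-word-per-line layout. The helper name load_words is made up for illustration, and the path may not exist on every system, so the example checks for it first.

```python
import os

def load_words(path="/usr/share/dict/words"):
    """Return the word list at `path` as a set of lowercased words.

    Assumes one word per line; blank lines are skipped.
    """
    with open(path) as f:
        return set(line.strip().lower() for line in f if line.strip())

# The default path is an assumption about the platform, so guard the call.
if os.path.exists("/usr/share/dict/words"):
    words = load_words()
    print("%d words loaded" % len(words))
```

A set is used rather than a list because membership tests ("is this word in there?") do not have to scan every entry.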
February 21st, 2005, 09:38 AM
Actually, I found a word list of 27607 words. I was thinking maybe I could just use that. I'm not sure if Python will be able to handle that much data at the same time though.
February 21st, 2005, 11:00 AM
It should be able to...I've done it before
February 21st, 2005, 11:53 AM
Python is very powerful, so it will handle 27,000 words just fine :P. The largest amount of data I've used with Python was 2 million numbers (for a test script comparing database speeds).
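To get a feel for the size involved, here is a rough sketch that builds 27,607 made-up four-letter strings as a stand-in for the real word list (which isn't posted in this thread) and holds them in memory. The helper make_fake_words is hypothetical, and the byte counts printed will vary by Python version and platform.

```python
import itertools
import string
import sys

def make_fake_words(count):
    """Generate `count` distinct lowercase strings standing in for real words."""
    letters = string.ascii_lowercase
    # 26**4 = 456,976 possible combinations, far more than we need.
    combos = itertools.product(letters, repeat=4)
    return ["".join(c) for c in itertools.islice(combos, count)]

words = make_fake_words(27607)
word_set = set(words)

# Size of the containers themselves, not counting the string objects they hold.
print("list container bytes: %d" % sys.getsizeof(words))
print("set container bytes: %d" % sys.getsizeof(word_set))
print("aaaa" in word_set)
```

On any reasonably modern machine this runs almost instantly, which fits the experience reported above.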
February 21st, 2005, 12:15 PM
Ok, I'll have to try it out later.
February 21st, 2005, 09:49 PM
Well, here's an example of this:
f = open("dict_file")
words = f.readlines()
f.close()
db = []
for word in words:
    # readlines() keeps the trailing newline, so strip it before storing
    db.append(word.strip().lower())
string_to_search = "Hi everyone!"
for word in string_to_search.split(" "):
    if word.lower() in db:
        print("Found %s in db!" % (word))
# this should work, but, it is untested.
February 22nd, 2005, 03:01 AM
Nick, the readlines() method returns a list of the lines in the file, so you can bypass appending them to another list. The thing to remember about readlines() is that the trailing newline character is left in place on each line, which is particularly important for this kind of lookup.
You would also get better results from your search if you stripped the punctuation from around each word (using string.punctuation), e.g.
>>> import string
>>> dictionary = open('/usr/share/dict/words', 'r').readlines()
>>> aString = 'Hello all, strange day today!'
>>> for word in aString.split():
...     word = word.lower()
...     word = word.strip(string.punctuation)
...     if (word + '\n') in dictionary:
...         print('%s is in the dictionary!' % word)
...
hello is in the dictionary!
strange is in the dictionary!
day is in the dictionary!
today is in the dictionary!
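Pulling the suggestions in this thread together, here is one possible sketch: keep the known words in a set with the newlines already removed, then strip punctuation and lowercase each word of the input before looking it up. The function name find_known_words and the small sample_words set are invented for illustration; in practice the set would come from a word list file.

```python
import string

def find_known_words(text, known_words):
    """Return the words of `text` that appear in `known_words`, in order.

    Each whitespace-separated token has surrounding punctuation removed
    and is lowercased before the lookup.
    """
    found = []
    for token in text.split():
        word = token.strip(string.punctuation).lower()
        if word and word in known_words:
            found.append(word)
    return found

# Tiny stand-in for a real dictionary (an assumption for this example).
sample_words = set(["hello", "all", "strange", "day", "today"])
print(find_known_words("Hello all, strange day today!", sample_words))
# prints ['hello', 'all', 'strange', 'day', 'today']
```

Stripping the newline once when the words are loaded avoids having to add '\n' to every word being checked.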