#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2012
    Posts
    5
    Rep Power
    0

    Checking to see if a word is real


    I'm trying to write a simple program in C to find all possible combinations of letters typed in by the user. I've had no problem with that part, but I want it to pick out all of the real words, rather than just random combinations of the letters. Any ideas how I can check to see if these combinations are actual words? Thanks in advance for any help.
  2. #2
  3. Contributed User
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    Jun 2005
    Posts
    4,413
    Rep Power
    1871
    There are plenty of lists of words on the web.

    - grab a word list
    - load your word list into your program
    - use qsort() to make sure your words are in order
    - use bsearch() to find each candidate word in your sorted dictionary
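
    A minimal sketch of the last two steps, assuming the word list is already in memory as an array of strings. The names cmp_words, sort_dictionary and is_word are made up for illustration, and loading the list from a file is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparator shared by qsort() and bsearch(). Both hand the comparator
   pointers to array elements, i.e. pointers to (const char *). */
static int cmp_words(const void *a, const void *b)
{
    const char *const *wa = a;
    const char *const *wb = b;
    return strcmp(*wa, *wb);
}

/* Sort the dictionary in place so that bsearch() can be used on it. */
void sort_dictionary(const char **words, size_t count)
{
    assert(words != NULL);
    qsort(words, count, sizeof words[0], cmp_words);
}

/* Return 1 if candidate is in the (already sorted) dictionary, else 0.
   The key is passed as a pointer to the string pointer so that it has
   the same shape as an array element. */
int is_word(const char *candidate, const char **words, size_t count)
{
    return bsearch(&candidate, words, count,
                   sizeof words[0], cmp_words) != NULL;
}
```

    Sort once after loading, then call is_word() for every letter combination you generate. The lookup is case-sensitive, so lower-case both the list and the candidates first if that matters.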
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper
  4. #3
  5. Contributing User

    Join Date
    Aug 2003
    Location
    UK
    Posts
    5,117
    Rep Power
    1803
    The question just begs more questions. What constitutes a "real" word? Only in English or any language? What about proper nouns, abbreviations, acronyms, colloquialisms, anachronisms, jargon and dialect? You can probably discard combinations lacking vowels, with very few exceptions.
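
    The vowel filter could be sketched like this. The function name has_vowel is hypothetical, and counting 'y' as a vowel is my assumption so that words like "my" or "rhythm" survive the filter.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Return 1 if s contains at least one vowel, 0 otherwise.
   'y' is treated as a vowel here (an assumption, see above). */
int has_vowel(const char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* The loop stops at the terminator, so strchr() is never asked
           to look for '\0' itself. */
        if (strchr("aeiouy", tolower((unsigned char)*s)) != NULL)
            return 1;
    }
    return 0;
}
```

    This only throws away obvious non-words cheaply; anything that passes still has to be checked against a word list.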

    There is no "algorithm" that could determine whether a word is real without reference to a list of such words. Natural languages evolve all the time, and at any one time, certainly in English, the total vocabulary of words ever used exceeds that listed in any single dictionary. Just because a word does not exist in a particular list or dictionary does not mean it is not real; it is easy to get false negatives. One possibility is to use an API to perform a web search for a particular string; if it gets hits, it at least exists in that combination somewhere. However, that has the opposite problem, that of false positives, since apparently random combinations of letters might appear in part numbers or product codes, for example, or even in other languages. You might require a certain number of hits, or restrict the search to specified online dictionaries, which are likely to be more comprehensive than any in-memory word list you might otherwise use.

    To be honest, the attempt is probably futile if you want definitive word/not-word results. At best, perhaps a range of classifications based on probability could be achieved, such as: definitely-not-word/probably-not-word/possibly-word/probably-word/definitely-word.
    Last edited by clifford; October 21st, 2012 at 05:05 PM.
  6. #4
  7. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,966
    Rep Power
    481
    I've written jumble solvers. I think I first used a wordstar dictionary.

    Subsequently I send my words through an internet dictionary and parse the result. It's straightforward once you're familiar with the output of a dictionary that pleases you.

    Having passed off this idea, I sure hope this is for occasional use only!
    [code]Code tags[/code] are essential for python code and Makefiles!
  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2012
    Posts
    5
    Rep Power
    0
    Originally Posted by b49P23TIvg
    I've written jumble solvers. I think I first used a wordstar dictionary.

    Subsequently I send my words through an internet dictionary and parse the result. It's straightforward once you're familiar with the output of a dictionary that pleases you.

    Having passed off this idea, I sure hope this is for occasional use only!
    This was exactly what I needed. After a little work, it worked perfectly. Thanks to everyone for all the suggestions and help.
  10. #6
  11. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,966
    Rep Power
    481
    Good grief. I suppose I have parsed the output of an internet dictionary. Seriously, use ispell.

    ispell -l

    reads stdin and returns the misspelled words.
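
    From C, one way to use this is to pipe a single word to ispell -l with POSIX popen() and see whether anything comes back. This is only a rough sketch: is_plain_word and ispell_accepts are made-up names, it assumes ispell is installed and exits with status 0 on a normal run, and it starts one process per word, so it will be slow for many candidates.

```c
#define _POSIX_C_SOURCE 200809L  /* for popen()/pclose() */
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Return 1 if s is non-empty and purely alphabetic, so that it is safe
   to place on a shell command line. */
int is_plain_word(const char *s)
{
    assert(s != NULL);
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++)
        if (!isalpha((unsigned char)*s))
            return 0;
    return 1;
}

/* Ask ispell about one word.
   Returns  1 if "ispell -l" printed nothing (word accepted),
            0 if it printed the word back (reported as misspelled),
           -1 if the word was refused up front or the command failed. */
int ispell_accepts(const char *word)
{
    char cmd[256];
    char line[128];
    FILE *p;
    int misspelled;
    int n;

    if (!is_plain_word(word))
        return -1;
    n = snprintf(cmd, sizeof cmd, "echo %s | ispell -l", word);
    if (n < 0 || (size_t)n >= sizeof cmd)
        return -1;                      /* word too long for the buffer */
    p = popen(cmd, "r");
    if (p == NULL)
        return -1;
    misspelled = (fgets(line, sizeof line, p) != NULL);
    /* Assumption: a non-zero status means ispell is missing or failed. */
    if (pclose(p) != 0)
        return -1;
    return misspelled ? 0 : 1;
}
```

    For a whole batch of candidates it is cheaper to write them all to one file and run ispell -l on it once.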
