#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2013
    Posts
    13
    Rep Power
    0

    Help with a program to spell check a file


    Hi,

    I am currently working through Harvard's CS50 course, and I need some help. I have three problem sets to go: the first is to create a spell checker, the second is to decompress Huffman-compressed files, and the third is to create a website for buying and selling stocks. All of that is followed by a test and a final project!

    Luckily, not all hope of finishing in time is lost! I am close to completing the site and am about half done with the decompression program. The one I am asking for help with here is the spell checker.

    With the help of the users on this forum:

    http://cboard.cprogramming.com/c-programming/154936-help-program-spell-check-file.html

    I got a program that is close to working. There was some miscommunication, however, about the provided programs that it must work with, and that thread is now dead. Rather than open a new thread there, I thought I would try here. Properly explaining this might take a bit of text, so if you are prepared to help me continue to learn, grab a drink :cadrunk: and settle in for a long(ish) read.

    I will start by defining what the spell checker must do. In fact, I will copy this directly from the instructions:

    -Alright, the challenge ahead of you is to implement load, check, size, and unload as efficiently as possible, in such a way that TIME IN load, TIME IN check, TIME IN size, and TIME IN unload are all minimized. To be sure, it's not obvious what it even means to be minimized, inasmuch as these benchmarks will certainly vary as you feed speller different values for dictionary and for text. But therein lies the challenge, if not the fun, of this problem set. This problem set is your chance to design. Although we invite you to minimize space, your ultimate enemy is time. But before you dive in, some specifications from us.

    -You may not alter speller.c.

    -You may alter dictionary.c (and, in fact, must in order to complete the implementations of load, check, size, and unload), but you may not alter the declarations of load, check, size, or unload.

    -You may alter dictionary.h, but you may not alter the declarations of load, check, size, or unload.

    -You may alter Makefile.

    -You may add functions to dictionary.c or to files of your own creation so long as all of your code compiles via make.

    -Your implementation of check must be case-insensitive. In other words, if foo is in dictionary, then check should return true given any capitalization thereof; none of foo, foO, fOo, fOO, fOO, Foo, FoO, FOo, and FOO should be considered misspelled.

    -Capitalization aside, your implementation of check should only return true for words actually in dictionary. Beware hard-coding common words (e.g., the), lest we pass your implementation a dictionary without those same words. Moreover, the only possessives allowed are those actually in dictionary. In other words, even if foo is in dictionary, check should return false given foo's if foo's is not also in dictionary.

    -You may assume that check will only be passed strings with alphabetical characters and/or apostrophes.

    -You may assume that any dictionary passed to your program will be structured exactly like ours, lexicographically sorted from top to bottom with one word per line, each of which ends with \n. You may also assume that dictionary will contain at least one word, that no word will be longer than LENGTH (a constant defined in dictionary.h) characters, that no word will appear more than once, and that each word will contain only lowercase alphabetical characters and possibly apostrophes.

    -Your spell-checker may only take text and, optionally, dictionary as input. Although you might be inclined (particularly if among those more comfortable) to "pre-process" our default dictionary in order to derive an "ideal hash function" for it, you may not save the output of any such pre-processing to disk in order to load it back into memory on subsequent runs of your spell-checker in order to gain an advantage.
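
    To make the case-insensitivity and sorted-dictionary rules above concrete, here is a minimal sketch, not the course's solution. It assumes the words are already in memory as an array sorted in strcmp order (which may or may not match the ordering of the real dictionary file). The names lower_copy, find_word, and demo_words are hypothetical and exist only for illustration.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LENGTH 45  // same value as in dictionary.h

// Hypothetical stand-in for a loaded dictionary, sorted in strcmp order.
static const char *demo_words[] = { "cat", "cat's", "dog", "foo" };
static const size_t demo_count = sizeof demo_words / sizeof demo_words[0];

// Copies src into dst in lowercase; dst must hold at least LENGTH + 1 chars.
void lower_copy(char *dst, const char *src)
{
    size_t i = 0;
    for (; src[i] != '\0' && i < LENGTH; i++)
    {
        dst[i] = (char) tolower((unsigned char) src[i]);
    }
    dst[i] = '\0';
}

// Case-insensitive binary search over the sorted array.
bool find_word(const char *word)
{
    char key[LENGTH + 1];
    lower_copy(key, word);

    // Search the half-open range [lo, hi).
    size_t lo = 0, hi = demo_count;
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(demo_words[mid], key);
        if (cmp == 0)
        {
            return true;
        }
        else if (cmp < 0)
        {
            lo = mid + 1;
        }
        else
        {
            hi = mid;
        }
    }
    return false;
}
```

    Lowercasing a copy of the word once, rather than comparing case-insensitively at every step, keeps the search loop a plain strcmp. Possessives are only found when they are stored as their own entry, as the spec requires.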



    So, here is the code they supply:

    speller.c

    Code:
    /****************************************************************************
     * speller.c
     *
     * Computer Science 50
     * Problem Set 5
     *
     * Implements a spell-checker.
     ***************************************************************************/
    
    #include <ctype.h>
    #include <stdio.h>
    #include <sys/resource.h>
    #include <sys/time.h>
    
    #include "dictionary.h"
    
    // default dictionary
    #define DICTIONARY "/home/cs50/pset5/dictionaries/large"
    
    // prototype
    double calculate(const struct rusage* b, const struct rusage* a);
    
    int main(int argc, char* argv[])
    {
        // check for correct number of args
        if (argc != 2 && argc != 3)
        {
            printf("Usage: speller [dictionary] text\n");
            return 1;
        }
    
        // structs for timing data
        struct rusage before, after;
    
        // benchmarks
        double ti_load = 0.0, ti_check = 0.0, ti_size = 0.0, ti_unload = 0.0;
    
        // determine dictionary to use
        char* dictionary = (argc == 3) ? argv[1] : DICTIONARY;
    
        // load dictionary
        getrusage(RUSAGE_SELF, &before);
        bool loaded = load(dictionary);
        getrusage(RUSAGE_SELF, &after);
    
        // abort if dictionary not loaded
        if (!loaded)
        {
            printf("Could not load %s.\n", dictionary);
            return 1;
        }
    
        // calculate time to load dictionary
        ti_load = calculate(&before, &after);
    
        // try to open text
        char* text = (argc == 3) ? argv[2] : argv[1];
        FILE* fp = fopen(text, "r");
        if (fp == NULL)
        {
            printf("Could not open %s.\n", text);
            unload();
            return 1;
        }
    
        // prepare to report misspellings
        printf("\nMISSPELLED WORDS\n\n");
    
        // prepare to spell-check
        int index = 0, misspellings = 0, words = 0;
        char word[LENGTH+1];
    
        // spell-check each word in text
        for (int c = fgetc(fp); c != EOF; c = fgetc(fp))
        {
            // allow only alphabetical characters and apostrophes
            if (isalpha(c) || (c == '\'' && index > 0))
            {
                // append character to word
                word[index] = c;
                index++;
    
                // ignore alphabetical strings too long to be words
                if (index > LENGTH)
                {
                    // consume remainder of alphabetical string
                    while ((c = fgetc(fp)) != EOF && isalpha(c));
    
                    // prepare for new word
                    index = 0;
                }
            }
    
            // ignore words with numbers (like MS Word can)
            else if (isdigit(c))
            {
                // consume remainder of alphanumeric string
                while ((c = fgetc(fp)) != EOF && isalnum(c));
    
                // prepare for new word
                index = 0;
            }
    
            // we must have found a whole word
            else if (index > 0)
            {
                // terminate current word
                word[index] = '\0';
    
                // update counter
                words++;
    
                // check word's spelling
                getrusage(RUSAGE_SELF, &before);
                bool misspelled = !check(word);
                getrusage(RUSAGE_SELF, &after);
    
                // update benchmark
                ti_check += calculate(&before, &after);
    
                // print word if misspelled
                if (misspelled)
                {
                    printf("%s\n", word);
                    misspellings++;
                }
    
                // prepare for next word
                index = 0;
            }
        }
    
        // check whether there was an error
        if (ferror(fp))
        {
            fclose(fp);
            printf("Error reading %s.\n", text);
            unload();
            return 1;
        }
    
        // close text
        fclose(fp);
    
        // determine dictionary's size
        getrusage(RUSAGE_SELF, &before);
        unsigned int n = size();
        getrusage(RUSAGE_SELF, &after);
    
        // calculate time to determine dictionary's size
        ti_size = calculate(&before, &after);
    
        // unload dictionary
        getrusage(RUSAGE_SELF, &before);
        bool unloaded = unload();
        getrusage(RUSAGE_SELF, &after);
    
        // abort if dictionary not unloaded
        if (!unloaded)
        {
            printf("Could not unload %s.\n", dictionary);
            return 1;
        }
    
        // calculate time to unload dictionary
        ti_unload = calculate(&before, &after);
    
        // report benchmarks
        printf("\nWORDS MISSPELLED:     %d\n", misspellings);
        printf("WORDS IN DICTIONARY:  %d\n", n);
        printf("WORDS IN TEXT:        %d\n", words);
        printf("TIME IN load:         %.2f\n", ti_load);
        printf("TIME IN check:        %.2f\n", ti_check);
        printf("TIME IN size:         %.2f\n", ti_size);
        printf("TIME IN unload:       %.2f\n", ti_unload);
        printf("TIME IN TOTAL:        %.2f\n\n", 
         ti_load + ti_check + ti_size + ti_unload);
    
        // that's all folks
        return 0;
    }
    
    /**
     * Returns number of seconds between b and a.
     */
    double calculate(const struct rusage* b, const struct rusage* a)
    {
        if (b == NULL || a == NULL)
        {
            return 0.0;
        }
        else
        {
            return ((((a->ru_utime.tv_sec * 1000000 + a->ru_utime.tv_usec) -
                     (b->ru_utime.tv_sec * 1000000 + b->ru_utime.tv_usec)) +
                    ((a->ru_stime.tv_sec * 1000000 + a->ru_stime.tv_usec) -
                     (b->ru_stime.tv_sec * 1000000 + b->ru_stime.tv_usec)))
                    / 1000000.0);
        }
    }
    dictionary.h

    Code:
    /****************************************************************************
     * dictionary.h
     *
     * Computer Science 50
     * Problem Set 5
     *
     * Declares a dictionary's functionality.
     ***************************************************************************/
    
    #ifndef DICTIONARY_H
    #define DICTIONARY_H
    
    #include <stdbool.h>
    
    // maximum length for a word
    // (e.g., pneumonoultramicroscopicsilicovolcanoconiosis)
    #define LENGTH 45
    
    /**
     * Returns true if word is in dictionary else false.
     */
    bool check(const char* word);
    
    /**
     * Loads dictionary into memory.  Returns true if successful else false.
     */
    bool load(const char* dictionary);
    
    /**
     * Returns number of words in dictionary if loaded else 0 if not yet loaded.
     */
    unsigned int size(void);
    
    /**
     * Unloads dictionary from memory.  Returns true if successful else false.
     */
    bool unload(void);
    
    #endif // DICTIONARY_H
    And here is MY code!

    Code:
    /****************************************************************************
     * dictionary.c
     *
     * Computer Science 50
     * Problem Set 5
     *
     * Implements a dictionary's functionality.
     ***************************************************************************/
    
    #include <stdio.h>
    #include "dictionary.h"
    #include <string.h>
    #include <stdlib.h>
    
    #define MAXWORDS 26
    #define DLENGTH 46
    
    int count=0;
    /**
     * Returns true if word is in dictionary else false.
     */
    bool check(const char* word)
    {
       char *words[LENGTH];
       int n =0;
    
       int j,lo=0,hi=n-1,mid;
       
       while(lo<=hi) 
       {
          mid=(lo+hi)/2; //printf("lo: %d  hi: %d  mid: %d\n",lo,hi,mid);getchar();
          j=strcmp(words[mid],word);
          if(j>0)
          { 
             hi=mid-1;
          } 
          else if(j<0)
          {
             lo=mid+1;
          }
          else
          {
             return true;
          }
       }
       return false;
    }
    /**
     * Loads dictionary into memory.  Returns true if successful else false.
     */
    bool load(const char* dictionary)
    {
     
        int input(FILE *fp,char *words[DLENGTH],int getData);
    
        char **words = NULL;
        char buff[BUFSIZ];  //BUFSIZ or BUFSIZE is a macro for your system - usually 256 or 512 char's in size. A "natural" buffer length, for your system.
     
        FILE *fp=fopen("test.txt","r");
       
        count=input(fp,words,0);   //just counting this time
        rewind(fp);              //going back to the start of the file
     
        //malloc the right number of words here
        words=malloc(count * sizeof(char *));
        for(int i=0;i<count;i++) 
        {
            words[i]=malloc(DLENGTH * sizeof(char));  //#define LENGTH  29
        }
        input(fp,words,1);   //now getting the words
      
        //all the other stuff, here (mostly calling some functions)
        
        printf("%s\n",buff);
     
        return 0;
    }
        int input(FILE *fp, char *words[DLENGTH], int getData)
       { 
            int i=0;
            char buff[128];
            while((fgets(buff, BUFSIZ, fp)) != NULL) 
            {
                 if(getData) 
                 {
                      //remove the newline here
                      strcpy(words[i],buff);
                 }
            ++i;
            }
            if(getData==1)
                return i;
            else
                return -1;
       }
     
    
    /**
     * Returns number of words in dictionary if loaded else 0 if not yet loaded.
     */
    unsigned int size(void)
    {
    //unsigned int count;
    if (count > 1)
        return count;
    else
        return 0;
    }
    /**
     * Unloads dictionary from memory.  Returns true if successful else false.
     */
    bool unload(void)
    {
        if(fclose(*fp)==0)
            return true;
        else
            return false;
    }


    When I try to compile it as it is now, I get this error:

    jharvard@appliance (~/Dropbox/pset5): make
    clang -ggdb -O0 -Qunused-arguments -std=c99 -Wall -Werror -c -o speller.o speller.c
    clang -ggdb -O0 -Qunused-arguments -std=c99 -Wall -Werror -c -o dictionary.o dictionary.c
    dictionary.c:114:16: error: use of undeclared identifier 'fp'
    if(fclose(*fp)==0)
    ^
    1 error generated.
    make: *** [dictionary.o] Error 1

    If I comment out that whole section of code (unloading from memory), it compiles successfully, but I get a segmentation fault when I try to run it.
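
    For reference, the clang error arises because fp is a local variable inside load, so unload cannot see it. The crash probably comes from load itself: it opens "test.txt" rather than the dictionary it was passed, and input returns -1 when it is only counting, so count ends up as -1 and words never gets usable storage before strcpy writes to it. Below is a minimal sketch of one way load, size, and unload could share state through file-scope variables. It is an assumption about a possible fix, not the course's solution. The name word_count is hypothetical (the code above calls it count), and check is left out.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LENGTH 45  // same value as in dictionary.h

// File-scope state: load, size, and unload can all see these.
// (Hypothetical names; the posted code uses a global called count.)
static char **words = NULL;
static unsigned int word_count = 0;

// Frees everything load allocated and resets the state.
bool unload(void)
{
    for (unsigned int i = 0; i < word_count; i++)
    {
        free(words[i]);
    }
    free(words);
    words = NULL;
    word_count = 0;
    return true;
}

// Returns the number of words currently loaded.
unsigned int size(void)
{
    return word_count;
}

// Reads one word per line from the file named by dictionary.
bool load(const char *dictionary)
{
    FILE *fp = fopen(dictionary, "r");
    if (fp == NULL)
    {
        return false;
    }
    unload();  // drop any dictionary loaded earlier

    char buff[LENGTH + 2];  // longest word + '\n' + '\0'
    unsigned int lines = 0;

    // First pass: count the lines, then go back to the start.
    while (fgets(buff, sizeof buff, fp) != NULL)
    {
        lines++;
    }
    rewind(fp);

    // The spec promises at least one word, so an empty file is an error.
    words = malloc(lines * sizeof *words);
    if (lines == 0 || words == NULL)
    {
        free(words);
        words = NULL;
        fclose(fp);
        return false;
    }

    // Second pass: copy each word, minus its trailing newline.
    while (word_count < lines && fgets(buff, sizeof buff, fp) != NULL)
    {
        buff[strcspn(buff, "\n")] = '\0';
        words[word_count] = malloc(strlen(buff) + 1);
        if (words[word_count] == NULL)
        {
            fclose(fp);
            unload();
            return false;
        }
        strcpy(words[word_count], buff);
        word_count++;
    }

    fclose(fp);  // the file is closed here; unload only frees memory
    return true;
}
```

    With this layout the file is opened and closed entirely inside load, so unload has nothing to do with fp and only has to free what load allocated.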

    If you have made it this far, then I thank you for reading, and ask you to give me some hints/help for making the code work.

    If there is anything I forgot to explain, or didn't properly explain, please let me know :cheers:

    Thanks,
    Josh
  2. #2
  3. Contributed User
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    Jun 2005
    Posts
    4,392
    Rep Power
    1871
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2013
    Posts
    13
    Rep Power
    0
    Originally Posted by salem
    Yes it is, I want as much help as I can get!
