#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2013
    Posts
    4
    Rep Power
    0

    C Word Counting help


    Hello everyone...hope you all are having a great time.

    So heres my problem


    SUMMARY:
    27340 words
    2572 unique words
    WORD FREQUENCIES (TOP 10):
    the 1644
    and 872
    to 729
    a 632
    it 595
    she 553
    i 545
    of 514
    said 462
    you 411

    This output was obtained by running the solution program on a text file . The first few lines provide a summary of the numbers of words found (total words and unique words). This summary is important because it indicates whether your program is processing the words correctly. Program must produce such a summary. Following the summary is a list of words and their frequencies sorted by decreasing frequency. We see the word “the” was the most common and occurred 1644 times, followed by “and” occurring 872 times, etc.

    Programming
    A circularly doubly linked list as the core data structure.
    To extract individual words from the text file, use the following function:
    Code:
    int getNextWord(FILE *fp, char *buf, int bufsize) { 
    char *p = buf; 
    char c; 
    //skip all non-word characters 
    do { 
    c = fgetc(fp); 
    if (c == EOF) 
    return 0; 
    } while (!isalpha(c)); 
    //read word chars 
    do { 
    if (p – buf < bufsize – 1) 
    *p++ = tolower(c); 
    c = fgetc(fp); 
    } while (isalpha(c)); 
    //finalize word 
    *p = ‘\0’; 
    return 1; 
    }
    The above function extracts the next word from the file pointer “fp” (which must be open) and stores it in the character array “buf” (which must be allocated). The “bufsize” parameter indicates the size of the character buffer. All words are automatically converted to lower case to make it easier to compare them for equality – you may want to import a C library to make it work. The function returns 1 (true) if a word is successfully read, and 0 (false) if an end-of-file is reached.

    Therefore, all the words can be read with a loop like this:
    Code:
    while (getNextWord(fp, buf, size)) { 
    // do something with buf 
    }
    You must use the following structure to represent a word: 
    #define MAX_WORD 32 
    typedef struct word { 
    char str[MAX_WORD]; 
    int freq; 
    } Word;
    Linked list implementation should contain data comprised of the Word type.

    Command Line Arguments
    Program must accept at least one command line argument, which is the name of the file to be processed. An optional (you must program it, but the user can choose to enter it) integer argument can also be provided which states how many of the most frequent words to list. If that argument is not provided, it must default to 10. Naturally, provide instructions to your users so that they can figure out how to run your program if they attempt to do so incorrectly.
    Executing with the command-line argument “test.txt” should produce the list shown on the first page. On the other hand, running your program with the arguments “test.txt 3” should list only the three most frequent words (“the”, “and”, and “to”). The summary word counts will not change – you must read and count the entire file every time regardless.


    I did work on the problem but i didnt get much success with circularly doubly linked list..can anyone help me doing this?

    Also i am struck at the sorting part..i talked to many people about this problem but most of them are avoiding by saying its too large :mad:

    heres what i have tried so far

    Code:
    #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>
        #include <ctype.h>
        #define MAX_WORD 32
        #define MAX_TEXT_LENGTH 10000
        
    // ===========================================
    //                 STRUCTURE
    //============================================
        
        
        typedef struct word {
        char *str;              /* Stores the word */
        int freq;               /* Stores the frequency */
        struct word *pNext;     /* Pointer to the next word counter in the list */
    } Word;
    
    // ===========================================
    //             FUNCTION PROTOTYPES
    //============================================
    
    int getNextWord(FILE *fp, char *buf, int bufsize);   /* Given function to get words */
    void addWord(char *pWord);                          /* Adds a word to the list or updates exisiting word */
    void show(Word *pWordcounter);        /* Outputs a word and its count of occurrences */
    Word* createWordCounter(char *word);  /* Creates a new WordCounter structure */
    
    // ===========================================
    //             GLOBAL VARIABLES
    //============================================
    
    Word *pStart = NULL;                  /* Pointer to first word counter in the list */
    
    int totalcount = 0;                  /* Total amount of words */
    int uniquecount = 0;                /* Amount of unique words */
    
    
    
    // ===========================================
    //                 MAIN
    //============================================      
    
    
    int main () {
        
        /* File pointer */
        FILE * fp;
        /* Read text from here */
        fp = fopen("alice.txt","r");
        
        /* buf to hold the words */
        char buf[MAX_WORD];
        
        /* Size */
        int size = MAX_TEXT_LENGTH;
        
        
        /* Pointer to Word counter */
        Word *pCounter = NULL;
        
        
        /* Read all words from text file */
        
        while (getNextWord(fp, buf, size)) {
            
            /* Add the word to the list */
            addWord(buf); 
            
            /* Increment the total words counter */
            totalcount++;
        }
        
        
        /* Loop through list and figure out the number of unique words */
        pCounter = pStart;
        while(pCounter != NULL)
        {
            uniquecount++;
            pCounter = pCounter->pNext;
        }
        
        /* Print Summary */
        
        printf("\nSUMMARY:\n");
        printf("   %d words\n", totalcount); /* Print total words */
        printf("   %d unique words\n", uniquecount); /* Print unique words */
        
        
        
        
        /* List the words and their counts */
        pCounter = pStart;
        while(pCounter != NULL)
        {
            show(pCounter);
            pCounter = pCounter->pNext;
        }
        printf("\n");
        
        
        /* Free the allocated  memory*/
        pCounter = pStart;
        while(pCounter != NULL)
        {
            free(pCounter->str);        
            pStart = pCounter;           
            pCounter = pCounter->pNext;  
            free(pStart);                  
        }
        
        /* Close file */
        fclose(fp);
        
        return 0;
        
    }
    
    
    // ===========================================
    //                 FUNCTIONS
    //============================================
    
    
    void show(Word *pWordcounter)
    {
        /* output the word and it's count */
        printf("\n%-30s   %2d", pWordcounter->str,pWordcounter->freq);
        
    }
    
    void addWord(char *word)
    {
        Word *pCounter = NULL;
        Word *pLast = NULL;
        
        if(pStart == NULL)
        {
            pStart = createWordCounter(word);
            return;
        }
        
        /* If the word is in the list, increment its count */
        pCounter = pStart;
        while(pCounter != NULL)
        {
            if(strcmp(word, pCounter->str) == 0)
            {
                ++pCounter->freq;
                
                return;
            }
            pLast = pCounter;            
            pCounter = pCounter->pNext;  
        }
        
        /* Word is not in the list, add it */
        pLast->pNext = createWordCounter(word);
    }
    
    Word* createWordCounter(char *word)
    {
        Word *pCounter = NULL;
        pCounter = (Word*)malloc(sizeof(Word));
        pCounter->str = (char*)malloc(strlen(word)+1);
        strcpy(pCounter->str, word);
        pCounter->freq = 1;
        pCounter->pNext = NULL;
        return pCounter;
    }
    
    int getNextWord(FILE *fp, char *buf, int bufsize) {
        char *p = buf;
        char c;
        
        
        //skip all non-word characters
        do {
            c = fgetc(fp);
            if (c == EOF) 
                return 0;
        } while (!isalpha(c));
        
        //read word chars
        
        do {
            if (p - buf < bufsize - 1)
                *p++ = tolower(c);
            c = fgetc(fp);
        } while (isalpha(c));
        
        //finalize word
        *p = '\0';
        return 1;
    }
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2013
    Posts
    40
    Rep Power
    19
    Implement quick sort algorithm :

    Code:
    int get_freq(Word *wordlist, int n)
    {
        while(wordlist && n) {
            wordlist = wordlist->pNext;
            n--;
        }
        if (wordlist != NULL)
            return wordlist->freq;
        else
            return -1;
    }
    
    void swap(Word *wordlist, int i, int j)
    {
        Word *tmp = wordlist;
        int tmp_i;
        int tmp_j;
        int t_i = i;
        while(tmp && i) {
            i--;
            tmp = tmp->pNext;
        }
        tmp_i = tmp->freq;
        tmp = wordlist;
        while(tmp && j) {
            j--;
            tmp = tmp->pNext;
        }
        tmp_j = tmp->freq;
        tmp->freq = tmp_i;
        tmp = wordlist;
        i = t_i;
        while(tmp && i) {
            i--;
            tmp = tmp->pNext;
        }
        tmp->freq = tmp_j;
    }
    
    Word * my_qsort_list(Word *wordlist, int left, int right)
    {
        int i, j;
        int j_val;
        int pivot;
        i = left + 1;
        if (left + 1 < right) {
            pivot = get_freq(wordlist, left);
            for (j = left + 1; j <= right; j++) {
                j_val = get_freq(wordlist, j);
                if (j_val > pivot && j_val != -1) { /* descending */
                    swap(wordlist, i, j);
                    i++;
                }
            }
            swap(wordlist, i - 1, left);
            my_qsort_list(wordlist, left, i);
            my_qsort_list(wordlist, i, right);
        }
        return wordlist;
    }
    Using in main, somewhere before show();

    Code:
    /* Sorting */
    pCounter = my_qsort_list(pStart, 0, uniquecount);
    Good luck!
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2013
    Posts
    4
    Rep Power
    0
    Thanks..
    but i am not using a circularly double linked list in my program..can anyone tell me how will the algorithm change once i do so?
  6. #4
  7. Contributed User
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    Jun 2005
    Posts
    4,377
    Rep Power
    1871
    Please edit your post and put [code][/code] tags around your code.
    Copy again from your code editor so indentation is preserved.

    Most of the regulars will just ignore your post until the code is readable.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper
  8. #5
  9. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2013
    Posts
    40
    Rep Power
    19
    Originally Posted by krishh3
    Thanks..
    but i am not using a circularly double linked list in my program..can anyone tell me how will the algorithm change once i do so?
    I'm wonder. How can you write a very nice link list program with everything correctly but can not implement doubly link list?

    Why?

    What to do is only add a pointer to previous element. The rest is to make everything double. Just try!

    Code:
    typedef struct word {
        char *str;              /* Stores the word */
        int freq;               /* Stores the frequency */
        struct word *pPrevious; /* Pointer to previous word counter in list */
        struct word *pNext;     /* Pointer to the next word counter in the list */
    } Word;
  10. #6
  11. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2013
    Posts
    4
    Rep Power
    0
    its not about only writing the program with circularly double linked list..all the logic will change to..about inserting and sorting(thats what i am guessing though i havent done yet it)..correct me if i am wrong :(
  12. #7
  13. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2013
    Posts
    40
    Rep Power
    19
    No, logic is not changed. Just only more work.

    With single link list, you can insert new node at front, at end and given position.

    I'll demonstrate here only insert at front and at end.

    1) Create new node

    Code:
    Word *NewNode = (Word *)malloc(sizeof(Word));
    NewNode->str = (char*)malloc(strlen(word)+1);
    strcpy(NewNode->str, word);
    NewNode->freq = 1;
    NewNode->pNext = NULL;
    2) Insert at front

    Code:
    NewNode->pNext = Head;
    Head = NewNode;
    3) Insert at end

    Code:
    pCounter = Head;  
    while(pCounter->pNext != NULL)
    {
       pCounter = pCounter->pNext;
    }
    pCounter->pNext = NewNode;


    With doubly link list..


    1) Create new node

    Code:
    Word *NewNode = (Word *)malloc(sizeof(Word));
    NewNode->str = (char*)malloc(strlen(word)+1);
    strcpy(NewNode->str, word);
    NewNode->freq = 1;
    NewNode->pPrevious = NULL;
    NewNode->pNext = NULL;
    2) Insert at front

    Code:
    Head->pPrevious = NewNode;
    NewNode->pNext = Head;
    Head = NewNode;

    3) Insert at end

    Code:
    pCounter = Head;  
    while(pCounter->pNext != NULL)
    {
       pCounter = pCounter->pNext;
    }
    pCounter->pNext = NewNode;
    NewNode->pPrevious = pCounter;
    or
    Code:
    Tail->pNext = NewNode;
    NewNode->pPrevious = Tail;
    Tail = NewNode;

    I din't see difference.

    Do you got an idea?

    Doubly link list is convenient since you can transverse in both direction, Head to Tail and Tail to Head.
    You can distinguish which is Head or Tail from pointers. Head->pPrevious == NULL, Tail->pNext == NULL. It contains one pointer more that cause weight and slow factor.

    If you need circular link list, you could bind Head and Tail together.

    Code:
    Head->pPrevious = Tail;
    Tail->pNext = Head;
    :cooler:
  14. #8
  15. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2013
    Posts
    4
    Rep Power
    0
    ok thank you so much :)

    i will definitely try this now

IMN logo majestic logo threadwatch logo seochat tools logo