Thread: Text comparison

    #1
    Registered User
    Devshed Newbie (0 - 499 posts)
    Join Date: Jan 2013
    Posts: 3
    Rep Power: 0

    Text comparison


    Hi!

    I am currently working on a program that compares the words in two files and counts the words that appear in both. However, I am stuck and I hope you can help me out.

    Because strtok() modifies the buffer it tokenises, the contents of "US.dic" are destroyed after the first pass of the loop. I therefore have to copy the buffer again into newly allocated memory, and that is where I fail. Thank you very much in advance.

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    int main()
    {
        char text[1000];
       // char lib[1000];
        int correct=0;
        int textcount=0;
    
        FILE *pText=fopen("text.txt","r");
        FILE *pLib=fopen("US.dic","r");
    
        char * dic  ;
        /*---------------------------US.dic---------------------------*/
        if (pLib!=NULL)
        {
            long lSize;
            fseek (pLib , 0 , SEEK_END);
            lSize = ftell(pLib);
            rewind (pLib);
    
            dic = (char*) calloc (lSize+1,sizeof(char));
            fread(dic, sizeof(char),lSize,pLib);
            dic[lSize] = '\0';
            printf("%s",dic);
        }
        else
        {
            perror ("Could not open US.dic");
        }
    
        /*--------------------------Text.txt--------------------------*/
        if(pText!=NULL)
        {
           while(fgets(text,1000,pText)!=NULL)
            {
                /* strip the trailing newline only if one is present */
                text[strcspn(text,"\n")]='\0';
                fputs(text,stdout);
                putc('\n', stdout);
                textcount++;
                
                //char * word = strtok(text,"\n");
               // memcpy(val, text, strlen(text));
                char * elem = strtok(dic,"\n");
               // strncpy(text,elem,strlen(text)); //maybe useless
                while(elem!=NULL)
                {
                 if(strcmp(elem,text)==0)
                 {
                     correct++;
                     break;
                 }
                    elem = strtok(NULL,"\n");
                }
            }
            /*char * pword = strtok(text, "\n");
            textcount++;
            while (pword!=NULL)
            {
                pword=strtok(NULL, "\n");
    
                if(pword!=NULL)
                    textcount++;
            }*/
        }
        else
        {
            perror("File not found. Insert text.txt.");
        }
    
        fclose(pText);
        fclose(pLib);
    
        printf("\nWord count: %d\n", textcount);
        printf("Correct word count: %d", correct);
    
        return 0;
    }
    #2
    Contributed User
    Devshed Specialist (4000 - 4499 posts)
    Join Date: Jun 2005
    Posts: 4,417
    Rep Power: 1871

    Instead of reading the dictionary in one large block and then trying to tokenise it every time you read a new word, why not read the dictionary into an array of tokens just once?

    Then when you read words from the text file, it's just a simple for loop through an array of dictionary tokens.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper
    #3
    Registered User
    Devshed Newbie (0 - 499 posts)
    Join Date: Jan 2013
    Posts: 3
    Rep Power: 0

    Thank you. I am just starting to program, and it took me a long time to get to this point, so I would rather stick with my approach. Do you know how I can solve the problem (with some code)?
    #4
    Contributed User
    Devshed Specialist (4000 - 4499 posts)
    Join Date: Jun 2005
    Posts: 4,417
    Rep Power: 1871

    If you're looking for something suitably messy and hacky, try
    Code:
    #include <stdio.h>
    #include <string.h>
    
    int main()
    {
      char test[] = "one\ntwo\nthree\n";
      char *p = test;
      while ( (p=strtok(p,"\n")) != NULL ) {
        printf("Token=%s\n", p );
        p[strlen(p)] = '\n';    /* put the token back */
        p = NULL;
      }
      printf("Original String=%s\n", test );
      return 0;
    }
    But if you have consecutive tokens, expect this to blow up.
    Like I said - ugly way of doing it.
    #5
    Registered User
    Devshed Newbie (0 - 499 posts)
    Join Date: Jan 2013
    Posts: 3
    Rep Power: 0

    Thanks
