#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    27
    Rep Power
    0

    Looking for key words in a really long string


    I have to run a statistical analysis on data from another program, which saves its logs as text. The idea is to paste the log into a textbox and do everything in code, so I'll have no trouble whenever I use it.
    The text is really long. Is there a limit on how long the text can be? If so, is there another solution for long strings?

    Most of the lines of the log (I'd say 70%) contain none of the information I need, so I assume that removing those lines as the first step of the algorithm would make the string shorter and the program a lot faster. I can tell whether a line has the information I need from its 5th character. Is it possible to remove lines from a string like that? I guess this is easy, but I want to make sure before starting. Something like: check the fifth character in an "if", count how many characters come before the "\n", and then move every later character back by that amount.

    Now the hardest part (again, since I mostly work with numbers, I know just the basics of <string.h>). Is it possible to scan the text, after removing the "useless" part, for certain key words and count them?

    I'm using Visual C++ on Windows 7. I'm not very experienced with C++, but I've already read a lot and written some simple programs to see how it works. Knowing C makes it simpler, but there might still be basic things I haven't learned.
  2. #2
  3. Contributed User
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    Jun 2005
    Posts
    4,376
    Rep Power
    1871
    I guess you could start with this to read a whole file (one line at a time), and test the 5th character.
    Code:
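    char buff[BUFSIZ];
    FILE *fp = fopen("log.txt","r");
    if ( fp != NULL ) {
      while ( fgets(buff,BUFSIZ,fp) != NULL ) {
        // skip lines too short to have a 5th character (strlen needs <string.h>)
        if ( strlen(buff) > 4 && buff[4] == '?' ) {
          // do your thing
        }
      }
      fclose(fp);
    }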
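    For the keyword-counting part of the question, here is a minimal sketch built on the same fgets() loop. The function names count_word and count_in_marked_lines, the marker character and the keyword are all hypothetical and only there for illustration. It counts non-overlapping matches with strstr(), and it assumes every line fits in BUFSIZ characters (a longer line would be split across reads).

```c
#include <stdio.h>
#include <string.h>

/* Count non-overlapping occurrences of `word` in `text`. */
size_t count_word(const char *text, const char *word)
{
    size_t n = 0;
    size_t len = strlen(word);
    const char *p;

    if (len == 0)
        return 0;                      /* an empty keyword matches nothing */
    for (p = strstr(text, word); p != NULL; p = strstr(p + len, word))
        n++;
    return n;
}

/* Read `fp` one line at a time. For each line whose 5th character is
   `marker`, add the occurrences of `word` to the running total. */
size_t count_in_marked_lines(FILE *fp, char marker, const char *word)
{
    char buff[BUFSIZ];
    size_t total = 0;

    while (fgets(buff, sizeof buff, fp) != NULL) {
        /* lines shorter than 5 characters cannot carry the marker */
        if (strlen(buff) > 4 && buff[4] == marker)
            total += count_word(buff, word);
    }
    return total;
}
```

    Calling count_in_marked_lines(fp, 'R', "ERROR") on an open log file would then give the number of times "ERROR" appears on the lines you care about, without ever holding the whole log in memory.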
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    27
    Rep Power
    0
    Originally Posted by salem
    I guess you could start with this to read a whole file (one line at a time), and test the 5th character.
    Code:
    char buff[BUFSIZ];
    FILE *fp = fopen("log.txt","r");
    while ( fgets(buff,BUFSIZ,fp) != NULL ) {
      if ( buff[4] == '?' ) {
        // do your thing
      }
    }
    Well, that's simpler code than what I was writing. I could also use a variable to save which lines have the useful data.

    Also, I just wrote a function that takes the string and its size as parameters, finds the useless lines, and shifts all the following characters back until the whole line is "deleted". This is what I got:

    Code:
    void RemoveUseless(char log[],int size){
        int CorrectTil = -1;
        int Position = 0;
        int CharinLine;
        int i;
    
        while(log[CorrectTil + 1] != '\0'){
            if(log[Position + 4] == 'R'){      
                while(log[Position] != '\n'){    //if I'll use the line, I'll just find in what position that line ends
                    Position++;
                }                               //Now Position holds the position of the end of the lines, so we can say it's correct til this value.
                CorrectTil = Position;
                Position++;
            }
    
            else {
                CharinLine = 0;
                while(log[Position] =! '\n'){
                    CharinLine++;
                    Position++;
                }
                CharinLine++;
                for(i = Position+1; i<size;i++){
                    log[i-CharinLine] = log[i];
                }
                size = size - CharinLine;
                Position = CorrectTil + 1;
            }
        }
    }
    I just need to find a way of putting all the text pasted into a text box into the string, which I'm not sure I can do.
    I'm also not sure how long that "for" would take with millions of characters.
  6. #4
  7. Contributed User
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    Jun 2005
    Posts
    4,376
    Rep Power
    1871
    Code:
                for(i = Position+1; i<size;i++){
                    log[i-CharinLine] = log[i];
                }
    Well this particular piece of code will really kill the performance.
    Every time you want to delete something, you're copying the ENTIRE tail of the string. If this is megabytes in length, you're going to burn Watts of power (and hours of time) doing it.

    Your outermost loop should look like this.
    Code:
    for ( i = 0, j = 0 ; log[i] != '\0' ; i++ ) {
      if ( good(log[i]) ) log[j++] = log[i];
    }
    log[j] = '\0';
    Where good() is whatever series of tests you do to decide whether you want to keep a character (or not). Each character you want to keep is moved exactly once.
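    Applied to whole lines rather than single characters, the same read-index/write-index idea might look like the sketch below. The name keep_marked_lines and the marker parameter are hypothetical, and it assumes the log is one NUL-terminated buffer with lines separated by '\n'. Each kept line is moved at most once, so the total work is linear in the length of the text.

```c
#include <string.h>

/* Keep only the lines of `log` whose 5th character is `marker`,
   compacting the buffer in place. Returns the new length. */
size_t keep_marked_lines(char *log, char marker)
{
    size_t i = 0;   /* read position: start of the line being examined */
    size_t j = 0;   /* write position: end of the text kept so far */

    while (log[i] != '\0') {
        /* length of the current line, including its '\n' if it has one */
        const char *nl = strchr(log + i, '\n');
        size_t len = nl ? (size_t)(nl - (log + i)) + 1 : strlen(log + i);

        /* len > 4 guarantees that log[i + 4] belongs to this line */
        if (len > 4 && log[i + 4] == marker) {
            memmove(log + j, log + i, len);   /* regions may overlap */
            j += len;
        }
        i += len;
    }
    log[j] = '\0';
    return j;
}
```

    memmove() is used instead of memcpy() because the source and destination can overlap when a kept line is longer than the gap it is moved across.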


    Just so we're clear, are you talking about very long files, or very long lines?

    You seem to have just read the entire file into a single block of memory, then proceed to pick your way through it using \n as a delimiter.

    Using fgets() to read a line at a time, then copying what you need to keep elsewhere might be a better alternative.
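    A rough sketch of that alternative, assuming the kept lines are simply written to a second stream: the name copy_marked_lines and the marker parameter are made up for the example. Because fgets() returns a long line in several chunks, the sketch only tests the 5th character on the first chunk of each line.

```c
#include <stdio.h>
#include <string.h>

/* Copy to `out` every line of `in` whose 5th character is `marker`.
   Returns the number of lines copied. */
long copy_marked_lines(FILE *in, FILE *out, char marker)
{
    char buff[BUFSIZ];
    long kept = 0;
    int at_line_start = 1;   /* does buff hold the start of a line? */
    int keeping = 0;         /* did the current line pass the test? */

    while (fgets(buff, sizeof buff, in) != NULL) {
        size_t len = strlen(buff);

        if (at_line_start) {
            keeping = (len > 4 && buff[4] == marker);
            if (keeping)
                kept++;
        }
        if (keeping)
            fputs(buff, out);

        /* no '\n' at the end means the line continues in the next read */
        at_line_start = (len > 0 && buff[len - 1] == '\n');
    }
    return kept;
}
```

    Memory use stays at one buffer no matter how large the log is, which also sidesteps the question of how long a single string can be.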


    > while(log[Position] =! '\n')
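    Ouch.
    You've written an assignment, not a comparison:
    while( log[Position] = (!'\n') )
    Since !'\n' is 0, this stores 0 in log[Position] and the condition is false, so the loop body never runs.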
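    To see the effect in isolation, here is a small sketch with two hypothetical helper functions: one uses the intended != and the other reproduces the =! typo. The extra parentheses in the second loop are only there to silence the compiler warning about an assignment used as a condition; they do not change what the expression does.

```c
#include <stddef.h>

/* Intended version: count the characters before the first '\n'
   (or before the end of the string). */
size_t line_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\n' && s[n] != '\0')
        n++;
    return n;
}

/* Same loop with the `=!` typo. It is parsed as s[n] = (!'\n'),
   which is s[n] = 0: the condition is false on the first test,
   and the first character has been overwritten with '\0'. */
size_t line_length_typo(char *s)
{
    size_t n = 0;
    while ((s[n] =! '\n'))
        n++;
    return n;
}
```

    With text holding "abc\ndef", line_length(text) gives 3, while line_length_typo(text) gives 0 and leaves text as an empty string.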
