C Programming
 
#1
January 9th, 2013, 01:53 PM
VicFS
Looking for key words in a really long string

I have to do a statistical analysis of data from another program, which saves its logs as text. The idea is to paste the log into a text box and handle everything in code, so I'll have no trouble whenever I use it.
The text is really long. Is there a limit on how long the text can be? If so, is there another solution for long strings?

Most of the lines of the log (I'd say 70%) have none of the information I need, so I assume removing those lines as the first step of the algorithm would make the string shorter and the program run a lot faster. I can tell whether a line has the information I need from its 5th letter. Is it possible to remove lines from a string like that? This is easy I guess, but I just want to make sure before starting. Something like using an "if" to check the fifth letter, counting how many characters there are before the "\n", and then moving every later character back by that amount.

Now the hardest part (again, since I mostly work with numbers, I only know the basics of <string.h>). Is it possible to scan the text, after removing the "useless" part, for some key words and count them?

I'm using Visual C++ on Windows 7. I'm not very experienced with C++, but I've already read a lot and written some simple programs to see how it works. Knowing C makes it simpler, but there might still be basic things I haven't learned.

#2
January 9th, 2013, 02:22 PM
salem
I guess you could start with this to read a whole file (one line at a time), and test the 5th character.
Code:
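// needs <stdio.h> and <string.h>
char buff[BUFSIZ];
FILE *fp = fopen("log.txt","r");
if ( fp != NULL ) {
  while ( fgets(buff,sizeof buff,fp) != NULL ) {
    // short lines have no 5th character, so check the length first
    if ( strlen(buff) > 4 && buff[4] == '?' ) {
      // do your thing
    }
  }
  fclose(fp);
}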
#3
January 9th, 2013, 03:20 PM
VicFS
Well, that's simpler code than what I was writing. I could also use a variable to save which lines have the useful data.

Also, I just wrote a function that takes the string and its size as parameters. It finds the useless lines and moves all the following characters back until the whole line is "deleted". This is what I got:

Code:
void RemoveUseless(char log[],int size){
    int CorrectTil = -1;
    int Position = 0;
    int CharinLine;
    int i;

    while(log[CorrectTil + 1] != '\0'){
        if(log[Position + 4] == 'R'){      
            while(log[Position] != '\n'){    //if I'll use the line, I'll just find in what position that line ends
                Position++;
            }                               //Now Position holds the position of the end of the lines, so we can say it's correct til this value.
            CorrectTil = Position;
            Position++;
        }

        else {
            CharinLine = 0;
            while(log[Position] =! '\n'){
                CharinLine++;
                Position++;
            }
            CharinLine++;
            for(i = Position+1; i<size;i++){
                log[i-CharinLine] = log[i];
            }
            size = size - CharinLine;
            Position = CorrectTil + 1;
        }
    }
}


I just need to find a way to get all the text pasted into a text box into the string, which I'm not sure I can do.
I'm also not sure how long that "for" would take if I get millions of characters.

#4
January 10th, 2013, 04:02 AM
salem
Code:
            for(i = Position+1; i<size;i++){
                log[i-CharinLine] = log[i];
            }

Well this particular piece of code will really kill the performance.
Every time you want to delete something, you're copying the ENTIRE tail of the string. If this is megabytes in length, you're going to burn Watts of power (and hours of time) doing it.

Your outermost loop should look like this.
Code:
for ( i = 0, j = 0 ; log[i] != '\0' ; i++ ) {
  if ( good(log[i]) ) log[j++] = log[i];
}
log[j] = '\0';

Where good() is whatever series of tests you do to decide whether you want to keep a character (or not). Each character you want to keep is moved exactly once.
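Since the filter in this thread works on whole lines rather than single characters, the same single-pass idea can be applied one line at a time. What follows is only a rough sketch under a few assumptions: the name keep_lines_with_tag and its tag parameter are made up for illustration, the buffer is a writable NUL-terminated string, and lines end in '\n'.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compacts `log` in place, keeping only the lines
   whose 5th character (index 4) equals `tag`. Each kept character is
   moved at most once. Returns the new length of the string. */
size_t keep_lines_with_tag(char *log, char tag)
{
    char *src = log;   /* start of the line being examined */
    char *dst = log;   /* where the next kept line is written */

    while (*src != '\0') {
        /* Find the end of the current line (newline or end of string). */
        char *nl = strchr(src, '\n');
        size_t len = (nl != NULL) ? (size_t)(nl - src) + 1 : strlen(src);

        /* Characters before the newline, so a short line is never
           indexed past its own text. */
        size_t text_len = (nl != NULL) ? len - 1 : len;

        if (text_len > 4 && src[4] == tag) {
            memmove(dst, src, len);   /* regions may overlap, so memmove */
            dst += len;
        }
        src += len;
    }
    *dst = '\0';
    return (size_t)(dst - log);
}
```

Calling keep_lines_with_tag(log, 'R') would leave only the 'R' lines in log, and the work grows with the length of the text rather than with length times number of deleted lines.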


Just so we're clear, are you talking about very long files, or very long lines?

You seem to have just read the entire file into a single block of memory, then proceed to pick your way through it using \n as a delimiter.

Using fgets() to read a line at a time, then copying what you need to keep elsewhere might be a better alternative.
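As a rough idea of how the line-at-a-time approach could also cover the key word counting asked about in the first post, here is a minimal sketch. The function names count_occurrences and count_keyword are invented for this example, the match is a plain substring search with strstr(), and it assumes every line fits in BUFSIZ - 1 characters (a longer line would be read in pieces and each piece tested separately).

```c
#include <stdio.h>
#include <string.h>

/* Counts non-overlapping occurrences of `word` in `text`.
   Plain substring match: "error" is also found inside "errors". */
static int count_occurrences(const char *text, const char *word)
{
    size_t wlen = strlen(word);
    const char *p;
    int count = 0;

    if (wlen == 0)
        return 0;   /* an empty key word would never advance */

    for (p = strstr(text, word); p != NULL; p = strstr(p + wlen, word))
        count++;
    return count;
}

/* Reads `fp` line by line, skips lines whose 5th character is not `tag`,
   and returns how often `word` appears in the remaining lines. */
int count_keyword(FILE *fp, char tag, const char *word)
{
    char buff[BUFSIZ];
    int total = 0;

    while (fgets(buff, sizeof buff, fp) != NULL) {
        /* Only index buff[4] when the line is long enough to have it. */
        if (strlen(buff) > 4 && buff[4] == tag)
            total += count_occurrences(buff, word);
    }
    return total;
}
```

With this shape nothing has to be deleted at all: uninteresting lines are simply never counted, and only one line is held in memory at a time.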


> while(log[Position] =! '\n')
Ouch.
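You've written
while( log[Position] = (!'\n') )
which assigns 0 to log[Position] instead of comparing, so the loop body never runs and the string is cut off at that position. You meant != here.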
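To see the two spellings side by side in isolation, here is a tiny sketch. Both function names are made up for the demonstration; the first keeps the typo on purpose.

```c
/* Keeps the typo on purpose to show what `c =! '\n'` really does. */
char assign_not_newline(char c)
{
    c =! '\n';          /* parsed as c = (!'\n'); '\n' is non-zero, so !'\n' is 0 */
    return c;           /* always 0, whatever c was */
}

/* The comparison that was intended. */
int differs_from_newline(char c)
{
    return c != '\n';   /* 1 if c is not a newline, 0 if it is */
}
```

Many compilers will warn about an assignment used as a loop condition if warnings are turned up, which is a cheap way to catch this kind of slip.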
