Looking for key words in a really long string
January 9th, 2013, 01:53 PM
Registered User
I have to do a statistical analysis of data from another program, which saves its logs as text. The idea is to paste the log into a text box and handle everything in code, so I'll have no trouble whenever I use it.
The text is really long. Is there a limit on how long the text can be? If so, is there another solution for long strings?
Most of the lines of the log (I'd say 70%) have none of the information I need, so I assume removing those lines as the first step of the algorithm would make the string shorter and the program a lot faster. I can tell whether a line has the information I need from its 5th letter. Is it possible to remove lines from a string like that? This is easy, I guess, but I just want to make sure before starting. Something like: after the "if" that checks the fifth letter, count how many characters come before the "\n", then move every later character back by that amount.
Now the hardest part (again, since I mostly work with numbers, I only know the basics of <string.h>). Is it possible to scan the text, after removing the "useless" part, for some key words and count them?
I'm using Visual C++ on Windows 7. I'm not very experienced with C++, but I've already read a lot and written some simple programs to get how it works. Knowing C makes it simpler, but there might still be basic things I didn't learn.

January 9th, 2013, 02:22 PM
Contributed User
I guess you could start with this to read a whole file (one line at a time), and test the 5th character.
Code:
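// needs <stdio.h> and <string.h>
char buff[BUFSIZ];
FILE *fp = fopen("log.txt", "r");
if ( fp != NULL ) {
    while ( fgets(buff, BUFSIZ, fp) != NULL ) {
        // skip lines too short to have a 5th character
        if ( strlen(buff) > 4 && buff[4] == '?' ) {
            // do your thing
        }
    }
    fclose(fp);
}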
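As a rough sketch of the keyword-counting step asked about in the first post, the same fgets() loop could hand each kept line to a small strstr()-based counter. The names count_word and count_in_marked_lines, and the marker and word parameters, are made up for illustration and do not come from the thread. Lines longer than BUFSIZ would be split by fgets() and are not handled here.

```c
#include <stdio.h>
#include <string.h>

/* Count non-overlapping occurrences of `word` in `text`. */
size_t count_word(const char *text, const char *word)
{
    size_t count = 0;
    size_t len = strlen(word);
    const char *p;

    if (len == 0)
        return 0;                   /* an empty keyword would never advance */
    for (p = strstr(text, word); p != NULL; p = strstr(p + len, word))
        count++;
    return count;
}

/* Read `path` one line at a time. For every line whose 5th character is
   `marker`, add up how many times `word` appears in it.
   Returns -1 if the file cannot be opened. */
long count_in_marked_lines(const char *path, char marker, const char *word)
{
    char buff[BUFSIZ];
    long total = 0;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    while (fgets(buff, sizeof buff, fp) != NULL) {
        /* the length check guards against lines shorter than 5 characters */
        if (strlen(buff) > 4 && buff[4] == marker)
            total += (long)count_word(buff, word);
    }
    fclose(fp);
    return total;
}
```

A call such as count_in_marked_lines("log.txt", 'R', "error") would then total the hits for one keyword, where "error" is only a placeholder. Because only one line is in memory at a time, the length of the file does not matter.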

January 9th, 2013, 03:20 PM
Registered User
Quote: Originally Posted by salem
I guess you could start with this to read a whole file (one line at a time), and test the 5th character.

Well, that's simpler code than what I was writing. I could also use a variable to save which lines have the useful data.
Also, I just wrote a function that takes the string and its size as parameters. It finds the useless lines and moves all the following characters back until the whole line is "deleted". This is what I got:
Code:
void RemoveUseless(char log[],int size){
    int CorrectTil = -1;
    int Position = 0;
    int CharinLine;
    int i;
    while(log[CorrectTil + 1] != '\0'){
        if(log[Position + 4] == 'R'){
            while(log[Position] != '\n'){ //if I'll use the line, I'll just find in what position that line ends
                Position++;
            } //Now Position holds the position of the end of the lines, so we can say it's correct til this value.
            CorrectTil = Position;
            Position++;
        }
        else {
            CharinLine = 0;
            while(log[Position] =! '\n'){
                CharinLine++;
                Position++;
            }
            CharinLine++;
            for(i = Position+1; i<size;i++){
                log[i-CharinLine] = log[i];
            }
            size = size - CharinLine;
            Position = CorrectTil + 1;
        }
    }
}
I just need to find a way of getting all the text pasted into a text box into the string, which I'm not sure I can do.
I'm also not sure how long that "for" would take if I get millions of characters.

January 10th, 2013, 04:02 AM
Contributed User
Code:
for(i = Position+1; i<size;i++){
    log[i-CharinLine] = log[i];
}
Well this particular piece of code will really kill the performance.
Every time you want to delete something, you're copying the ENTIRE tail of the string. If this is megabytes in length, you're going to burn Watts of power (and hours of time) doing it.
Your outermost loop should look like this.
Code:
for ( i = 0, j = 0 ; log[i] != '\0' ; i++ ) {
    if ( good(log[i]) ) log[j++] = log[i];
}
log[j] = '\0';
Where good() is whatever series of tests you do to decide whether you want to keep a character (or not). Each character you want to keep is moved exactly once.
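Applied to whole lines rather than single characters, the read-index/write-index idea above might look like the sketch below. It assumes the "keep" test is the one from the earlier posts (5th character equals some marker such as 'R'). The function name keep_marked_lines and its parameters are hypothetical.

```c
#include <stddef.h>

/* Keep only the lines of `log` whose 5th character is `marker`,
   compacting the string in place. Returns the new length. */
size_t keep_marked_lines(char *log, char marker)
{
    size_t i = 0;   /* read position  */
    size_t j = 0;   /* write position */

    while (log[i] != '\0') {
        size_t end = i;
        size_t len;
        int keep;

        /* Find the end of the current line (newline or end of string). */
        while (log[end] != '\0' && log[end] != '\n')
            end++;
        len = end - i;

        /* Lines shorter than 5 characters are never kept. */
        keep = (len > 4 && log[i + 4] == marker);

        if (log[end] == '\n')
            end++;                      /* the newline travels with its line */

        if (keep) {
            while (i < end)
                log[j++] = log[i++];    /* each kept character moves once */
        } else {
            i = end;                    /* skip the whole line */
        }
    }
    log[j] = '\0';
    return j;
}
```

Since the write position never overtakes the read position, the whole pass is linear in the length of the string, no matter how many lines are dropped.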
Just so we're clear, are you talking about very long files, or very long lines?
You seem to have just read the entire file into a single block of memory, then proceed to pick your way through it using \n as a delimiter.
Using fgets() to read a line at a time, then copying what you need to keep elsewhere might be a better alternative.
> while(log[Position] =! '\n')
Ouch.
You've written
while( log[Position] = (!'\n') )
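To spell out why that matters: '\n' is a nonzero character, so !'\n' is 0. The condition therefore stores 0 in log[Position] and is false on the first test, meaning the loop body never runs and the string is cut off at that position. A small sketch of the two forms follows; the function names are invented for illustration.

```c
#include <stddef.h>

/* Length of the first line of `s`, not counting the newline.
   Uses the intended comparison operator `!=`. */
size_t first_line_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\n' && s[n] != '\0')
        n++;
    return n;
}

/* What `x =! '\n'` evaluates to. It parses as `x = (!'\n')`:
   logical NOT of a nonzero value is 0, and that 0 is assigned to x. */
int value_of_assign_not_newline(void)
{
    char x = 'A';
    return (x = !'\n');
}
```

Many compilers will flag an assignment used as a loop condition if warnings are turned up, which is one cheap way to catch this kind of typo.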