Page 1 of 2
    #1
    hueyii
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2013
    Posts
    8
    Rep Power
    0

    Log parsing using an array


    Hello all. I'm an ex-programmer trying to learn Perl and am having a problem with some code.

    My goal is to write a script I can run against log files to throw out records that are noise so I can see anomalies.

    My log file has a code in each record. I created a list of codes I don't care about and stored it in a file named ASA-Codes.txt. My intention is to read this list into an array and then compare each element of the array to each record in a syslog.log file. Records that do not match any element of the array will be written out to a file.

    (I will also modify the regex used to compare against the array elements, or add a command-line option, to do the reverse: write out to a file every record that matches an array element.)

    I've tried several different ways to do this but every way results in unwanted/invalid output.

    In this script I used grep, but the output is not filtered and the output file is larger. The records also appear to be missing linefeeds:


    # !C:\Perl64\bin


    my $data_file = 'ASA-Codes.txt';
    open DATA, "$data_file" or die "can't open $data_file $!";

    my @array_of_data = <DATA>;
    close (DATA);

    print "The array MAX index is ==> $#array_of_data ...\n";

    open (DATA, "< syslog.log");
    open (OUT,"> filteredlog.txt");

    while( <DATA> )
    {
        #print $_;
        chomp $_;
        $lookfor=$_;

        @match=grep(/$_/,@array_of_data);
        if (@match) {
            #print "MATCH :: << $lookfor >> \n";
            print OUT "$_/n";
            #print "$lookfor/n";
        }
        else
        {
            #print "NO-MATCH :: << $lookfor >> \n";

        }
    }
    close (DATA);
    close(OUT);

    exit;


    Any help would be appreciated. Thanks!
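Two details of the grep call above account for both symptoms. Inside grep, $_ is aliased to each element of @array_of_data, not to the log record, so /$_/ matches every code against itself and @match is never empty: every record gets printed. And "$_/n" prints the chomped record followed by a literal /n rather than a newline. The codes are also never chomped, so each one still ends in "\n". A minimal sketch of the array-based filter with those points fixed, assuming the codes are plain strings to look for anywhere in the record; filter_records is a hypothetical name and the sample records are shortened from the ones posted later in the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: return the records that contain none of the codes.
# Codes are treated as plain substrings (index), so '%' and '-' need no escaping.
sub filter_records {
    my ($codes, $records) = @_;
    my @kept;
    for my $record (@$records) {
        # The record lives in a named variable, so $_ inside grep
        # is the code being tested, not the record.
        my @match = grep { index($record, $_) >= 0 } @$codes;
        push @kept, $record unless @match;
    }
    return @kept;
}

my @codes = ("%ASA-6-302014\n", "%ASA-6-302020\n");
chomp @codes;    # lines read from a file keep their "\n" until chomped

my @records = (
    '2013-06-05T08:37:02+01:00 172.22.0.1 %ASA-6-302014: Teardown TCP connection',
    '2013-06-04T14:46:28+01:00 172.22.0.1 %ASA-6-106100: access-list LAN_INCOMING denied tcp',
);

# Newline is "\n" (backslash), not "/n".
print "$_\n" for filter_records(\@codes, \@records);
```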
    #2
    FishMonger
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Apr 2009
    Posts
    1,873
    Rep Power
    1225
    It would be helpful if you posted sample lines from each file so that we can determine the best way to filter the data.

    My first thought would be to store the ASA codes in a hash instead of an array, but I will need to see the actual data to say for sure whether a hash would be better.

    A few code comments:

    1) You should always include the strict and warnings pragmas near the top of your scripts. They point out problems that could otherwise be difficult to track down.
    Code:
    use strict;
    use warnings;
    2) You should not use DATA as the name of a filehandle you open yourself. DATA is one of Perl's built-in filehandles (it reads whatever follows __DATA__ or __END__ in the script), and it's best practice not to override it.

    3) You should use lexical filehandles instead of barewords such as DATA and OUT.

    4) You should use the 3-arg form of open and ALWAYS check the return value to verify that it was successful, and take action if it wasn't.

    Code:
    open my $log_fh, '<', 'syslog.log' or die "failed to open 'syslog.log' $!";
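A minimal sketch tying points 1, 3 and 4 together; read_lines is a hypothetical helper, and the sketch assumes no file named no-such-file.txt exists. It shows a checked three-argument open on a lexical handle, and strict rejecting an undeclared variable like $lookfor from the first post (inside a string eval, so the sketch itself still compiles):

```perl
use strict;
use warnings;

# Hypothetical helper: read all lines of a file, chomped.
# Three-arg open on a lexical handle; die with $! if the open fails.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "failed to open '$path': $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return @lines;
}

# A failed open is reported instead of the script silently reading nothing.
my $ok = eval { read_lines('no-such-file.txt'); 1 };
print "open failed as expected: $@" unless $ok;

# Under 'use strict', an undeclared global is a compile-time error.
my $compiled = eval q{ $lookfor = 'x'; 1 };
print "strict rejected \$lookfor: $@" unless $compiled;
```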

    Comments on this post

    • Laurent_R agrees
    #3
    Laurent_R
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    776
    Rep Power
    495
    Yes, I agree with FishMonger: a hash is most probably better than an array (much faster, simpler code), but we need to see data samples from both your files to confirm.
    #4
    hueyii
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2013
    Posts
    8
    Rep Power
    0
    Point taken on hashes, but unfortunately I'll need to read up on them before asking questions. I also appreciate the other recommendations and will integrate them into the code.

    Here is a sample of ASA-Codes.txt:
    %ASA-6-302014
    %ASA-6-302013
    %ASA-6-302020
    %ASA-6-302021
    %ASA-6-100609
    %ASA-6-302017
    %ASA-6-302019
    %ASA-6-302025

    And syslog.log:
    2013-06-05T08:37:02+01:00 172.22.0.1 %ASA-6-302014: Teardown TCP connection 7266670 for OLXWAN:192.168.20.102/49522 to OLXLAN:172.22.1.31/3389 duration 0:01:22 bytes 144430 TCP Reset-I
    2013-06-05T08:37:06+01:00 172.22.0.1 %ASA-6-302020: Built inbound ICMP connection for faddr 156.122.44.227/98 gaddr 172.16.150.194/0 laddr 172.22.1.31/0
    2013-06-05T08:37:06+01:00 172.22.0.1 %ASA-6-302021: Teardown ICMP connection for faddr 156.122.44.227/98 gaddr 172.16.150.194/0 laddr 172.22.1.31/0

    syslog.log uses variable-length records with varying field names and types. There are no delimiters.
    #5
    FishMonger
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Apr 2009
    Posts
    1,873
    Rep Power
    1225
    Code:
    #!C:\Perl64\bin\perl
    
    use strict;
    use warnings;
    
    my $data_file = 'ASA-Codes.txt';
    open my $asa_fh, '<', $data_file or die "can't open '$data_file' $!";
    
    my %ASA_code;
    while (my $code = <$asa_fh>) {
        chomp $code;
        $ASA_code{$code}++;
    }
    close $asa_fh;
    
    open my $log_fh, '<', 'syslog.log' or die "failed to open 'syslog.log' $!";
    while (my $log_entry = <$log_fh>) {
        my $code = (split /\s+/, $log_entry)[2];
        $code =~ s/:$//;
        print $log_entry if exists $ASA_code{$code};
    }
    close $log_fh;
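The same logic as a self-contained sketch that runs without the two files: the codes and log lines are hard-coded (shortened from the samples above) and the loop is wrapped in a hypothetical keep_if_listed sub. Each record costs one exists lookup on the hash rather than a scan over every code, which is where the speed advantage over an array comes from. The defined check is an addition here: it skips blank lines, which would otherwise trigger an "uninitialized value" warning in the substitution.

```perl
use strict;
use warnings;

# Build the lookup hash the same way as the script above: one key per code.
my %ASA_code;
$ASA_code{$_}++ for ('%ASA-6-302014', '%ASA-6-302013', '%ASA-6-302020');

# Hypothetical helper: return the lines whose third whitespace-separated
# field, minus its trailing colon, is a key in %$codes.
sub keep_if_listed {
    my ($codes, @lines) = @_;
    my @kept;
    for my $log_entry (@lines) {
        my $code = (split /\s+/, $log_entry)[2];
        next unless defined $code;    # blank or too-short line
        $code =~ s/:$//;
        push @kept, $log_entry if exists $codes->{$code};
    }
    return @kept;
}

my @sample = (
    "2013-06-05T08:37:02+01:00 172.22.0.1 %ASA-6-302014: Teardown TCP connection 7266670\n",
    "\n",
    "2013-06-04T14:46:28+01:00 172.22.0.1 %ASA-6-106100: access-list LAN_INCOMING denied tcp\n",
);
print keep_if_listed(\%ASA_code, @sample);
```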
    #6
    hueyii
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2013
    Posts
    8
    Rep Power
    0
    After making a few changes to get this to print the records to a file, I ran it against the log file, and the output contained everything in syslog.log.

    Did I use the wrong variable to send to the output file?

    Code:
    #!C:\Perl64\bin\perl
    
    use strict;
    use warnings;
    
    my $data_file = 'ASA-Codes.txt';
    open my $asa_fh, '<', $data_file or die "can't open '$data_file' $!";
    
    my %ASA_code;
    while (my $code = <$asa_fh>) {
        chomp $code;
        $ASA_code{$code}++;
    }
    close $asa_fh;
    
    
    open my $log_fh, '<', 'syslog.log' or die "failed to open 'syslog.log' $!";
    open my $output_fh, '>', 'log.txt' or die "failed to open 'log.txt' $!";
    
    while (my $log_entry = <$log_fh>) {
        my $code = (split /\s+/, $log_entry)[2];
        $code =~ s/:$//;
         print $output_fh $log_entry;
    # print $log_entry if exists $ASA_code{$code};
    }
    close $log_fh;
    close $output_fh;
    #7
    FishMonger
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Apr 2009
    Posts
    1,873
    Rep Power
    1225
    You used the wrong print statement.

    You just needed to add the output filehandle to the print statement I used.

    Code:
    print {$output_fh} $log_entry if exists $ASA_code{$code};
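For a plain scalar the braces are optional (print $output_fh $log_entry also works), but the block form makes it explicit that $output_fh is the filehandle and not the first item to print. A small sketch, assuming an in-memory filehandle (open on a reference to a scalar) in place of log.txt so the result can be inspected; the two records are shortened from the thread's samples:

```perl
use strict;
use warnings;

my %ASA_code = ('%ASA-6-302014' => 1);

# Write into a string instead of log.txt; opening a scalar ref is core Perl.
my $buffer = '';
open my $output_fh, '>', \$buffer or die "failed to open in-memory handle: $!";

for my $log_entry (
    "2013-06-05T08:37:02+01:00 172.22.0.1 %ASA-6-302014: Teardown TCP connection 7266670\n",
    "2013-06-04T14:46:28+01:00 172.22.0.1 %ASA-6-106100: access-list LAN_INCOMING denied tcp\n",
) {
    my $code = (split /\s+/, $log_entry)[2];
    $code =~ s/:$//;
    # Block form: the braces mark $output_fh as the filehandle.
    print {$output_fh} $log_entry if exists $ASA_code{$code};
}
close $output_fh;

print $buffer;    # only the %ASA-6-302014 record
```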
    #8
    Laurent_R
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    776
    Rep Power
    495
    Perl Code:
         print $output_fh $log_entry;
    # print $log_entry if exists $ASA_code{$code};


    You commented out the print with the condition, so you are just printing every $log_entry to the output file.

    Having said that, I think the condition in FishMonger's code is just the reverse of what you want.

    You said:

    I created a list of codes I don't care about and stored them as a list in a file named ASA-Codes.txt.
    It seems to me that FishMonger's code keeps the records with the codes you don't care about instead of discarding them. But you only need to reverse the condition:

    Perl Code:
    print $log_entry unless exists $ASA_code{$code};
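Since both behaviours are wanted eventually, one way to avoid keeping two copies of the loop is to pass the mode in. A minimal sketch, where the select_records sub and its $exclude flag are hypothetical: with $exclude true it behaves like the unless version (drop listed codes), otherwise like the original if version (keep only listed codes).

```perl
use strict;
use warnings;

my %ASA_code;
$ASA_code{$_}++ for ('%ASA-6-302014', '%ASA-6-302020');

# Hypothetical helper. $exclude true: drop records whose code is listed
# ('unless exists'); false: keep only records whose code is listed ('if exists').
sub select_records {
    my ($codes, $exclude, @lines) = @_;
    my @out;
    for my $log_entry (@lines) {
        my $code = (split /\s+/, $log_entry)[2];
        next unless defined $code;
        $code =~ s/:$//;
        my $listed = exists $codes->{$code};
        push @out, $log_entry if $exclude ? !$listed : $listed;
    }
    return @out;
}

my @lines = (
    "2013-06-05T08:37:06+01:00 172.22.0.1 %ASA-6-302020: Built inbound ICMP connection\n",
    "2013-06-04T14:46:28+01:00 172.22.0.1 %ASA-6-106100: access-list LAN_INCOMING denied tcp\n",
);
print "exclude mode:\n", select_records(\%ASA_code, 1, @lines);
print "include mode:\n", select_records(\%ASA_code, 0, @lines);
```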
    #9
    hueyii
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2013
    Posts
    8
    Rep Power
    0
    Originally Posted by Laurent_R
    Perl Code:
         print $output_fh $log_entry;
    # print $log_entry if exists $ASA_code{$code};


    You commented out the print with the condition, so you are just printing every $log_entry to the output file.

    Having said that, I think the condition in FishMonger's code is just the reverse of what you want.

    You said:



    It seems to me that FishMonger's code keeps the records with the codes you don't care about instead of discarding them. But you only need to reverse the condition:

    Perl Code:
    print $log_entry unless exists $ASA_code{$code};
    Thanks. I saw that; I meant to change the condition before I posted the code. As I said in the opening post, at some point I'd like to add a command-line option that dictates whether to include or exclude the matches.
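For the command-line option, the core Getopt::Long module is one choice. A minimal sketch, assuming a hypothetical --exclude switch plus --codes and --log options (none of these names come from the thread), with the thread's file names as defaults. GetOptionsFromArray is used so the parsing can be tried on a sample argument list; a real script would call GetOptions, which reads @ARGV.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical options: --exclude switches from "keep listed codes" to
# "drop listed codes"; --codes and --log override the default file names.
sub parse_args {
    my @args = @_;
    my %opt = (exclude => 0, codes => 'ASA-Codes.txt', log => 'syslog.log');
    GetOptionsFromArray(\@args, \%opt, 'exclude!', 'codes=s', 'log=s')
        or die "usage: $0 [--exclude] [--codes FILE] [--log FILE]\n";
    return \%opt;
}

my $opt = parse_args('--exclude', '--log', 'other.log');
printf "exclude=%d codes=%s log=%s\n", $opt->{exclude}, $opt->{codes}, $opt->{log};
# prints "exclude=1 codes=ASA-Codes.txt log=other.log"
```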
    #10
    hueyii
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2013
    Posts
    8
    Rep Power
    0
    This works better in that it no longer selects every record, but it is still selecting records that don't match the codes in the ASA-Codes.txt file.

    For example:

    2013-06-04T14:46:28+01:00 172.22.0.1 %ASA-6-106100: access-list LAN_INCOMING denied tcp OBXLAN/10.50.200.130(58946) -> OBXWAN/208.47.254.65(80) hit-cnt 2 300-second interval [0x887da1fc, 0x0]

    There are others like this that don't match the codes I specify. This also happened with some of the other methods I tried. The only script I wrote that didn't do this was one where I declared a single code in a variable and compared that to every record. Any idea why I get hits on these other codes?
    #11
    FishMonger
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Apr 2009
    Posts
    1,873
    Rep Power
    1225
    Please rephrase your needs.

    Are you wanting all log entries where the ASA_code is NOT in the ASA-Codes.txt file? Which is what your opening post states.

    Or, do you want all log entries where the ASA_code IS in the ASA-Codes.txt file?

    Or do you want something else?
    #12
    hueyii
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2013
    Posts
    8
    Rep Power
    0
    Originally Posted by FishMonger
    Please rephrase your needs.

    Are you wanting all log entries where the ASA_code is NOT in the ASA-Codes.txt file? Which is what your opening post states.

    Or, do you want all log entries where the ASA_code IS in the ASA-Codes.txt file?

    Or do you want something else?
    You are right, I confused myself. Looks like it's working perfectly. I was used to the output of the script before I altered it.

    There is one more thing, though... I would like to be able to put any string into that text file and match on it. Looking at the script in its new form, it doesn't look like it will work for that. Right?

    BTW, thanks for your help on this...
    #13
    FishMonger
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Apr 2009
    Posts
    1,873
    Rep Power
    1225
    You would need to know/provide 2 or 3 bits of information before being able to alter the script.

    1) The field (index) that needs to be compared.

    2) The string to be compared.

    3) You may also need to provide info on the type of match, i.e., whether it is an exact equality match, a substring match, or a pattern match.
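To make those three choices concrete, a sketch with a hypothetical matches() helper, assuming each rule supplies a field index, a string, and a match type of 'exact', 'substr' or 'regex' (names invented here). Only exact matches can use the hash lookup directly; substring and pattern rules have to be tested one by one against each record, so they give up the hash's speed advantage.

```perl
use strict;
use warnings;

# Hypothetical helper: does field $index of $line match $string
# according to $type ('exact', 'substr' or 'regex')?
sub matches {
    my ($line, $index, $string, $type) = @_;
    my $field = (split /\s+/, $line)[$index];
    return 0 unless defined $field;
    return $field eq $string           if $type eq 'exact';
    return index($field, $string) >= 0 if $type eq 'substr';
    return $field =~ /$string/         if $type eq 'regex';
    die "unknown match type '$type'";
}

my $line = '2013-06-04T14:46:28+01:00 172.22.0.1 %ASA-6-106100: access-list LAN_INCOMING denied tcp';

print matches($line, 2, '%ASA-6-106100:', 'exact')  ? "exact: yes\n"  : "exact: no\n";
print matches($line, 4, 'LAN_',           'substr') ? "substr: yes\n" : "substr: no\n";
print matches($line, 2, '^%ASA-6-1\d+',   'regex')  ? "regex: yes\n"  : "regex: no\n";
```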
    #14
    hueyii
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2013
    Posts
    8
    Rep Power
    0
    Ok.

    Let me see if I understand so far...

    This snippet reads through ASA-Codes.txt and indexes the strings:

    Code:
    my %ASA_code;
    while (my $code = <$asa_fh>) {
        chomp $code;
        $ASA_code{$code}++;
    }
    close $asa_fh;
    This part loses me but I'll try:

    Code:
    while (my $log_entry = <$log_fh>) {
        my $code = (split /\s+/, $log_entry)[2];
        $code =~ s/:$//;
        print $log_entry if exists $ASA_code{$code};
    }
    This loops through the records in the syslog.log file and breaks each record into fields, using whitespace as the delimiter(?). I'm not sure what the purpose of the [2] is.

    The "$code =~ s/:$//;" line is unclear to me, but the next line compares each field in the current syslog.log record to the ASA Codes array elements(?)
    #15
    FishMonger
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Apr 2009
    Posts
    1,873
    Rep Power
    1225
    Originally Posted by hueyii
    Ok.

    Let me see if I understand so far...

    This snippet reads through ASA-Codes.txt and indexes the strings:

    Code:
    my %ASA_code;
    while (my $code = <$asa_fh>) {
        chomp $code;
        $ASA_code{$code}++;
    }
    close $asa_fh;
    Correct
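To see what that loop builds: each code becomes a hash key, and ++ counts how often it was seen, so a code listed twice ends up with the value 2. Only the presence of the key matters to the later exists test. A quick sketch, assuming an in-memory string (with one code deliberately repeated) stands in for ASA-Codes.txt:

```perl
use strict;
use warnings;

# Stand-in for ASA-Codes.txt: an in-memory file with one code listed twice.
my $codes_text = "%ASA-6-302014\n%ASA-6-302013\n%ASA-6-302014\n";
open my $asa_fh, '<', \$codes_text or die "can't open in-memory file: $!";

my %ASA_code;
while (my $code = <$asa_fh>) {
    chomp $code;
    $ASA_code{$code}++;    # key = code, value = how many times it was seen
}
close $asa_fh;

for my $code (sort keys %ASA_code) {
    print "$code => $ASA_code{$code}\n";
}
# prints:
# %ASA-6-302013 => 1
# %ASA-6-302014 => 2
```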


    This part loses me but I'll try:

    Code:
    while (my $log_entry = <$log_fh>) {
        my $code = (split /\s+/, $log_entry)[2];
        $code =~ s/:$//;
        print $log_entry if exists $ASA_code{$code};
    }
    This loops through the records in the syslog.log file and breaks each record into fields, using whitespace as the delimiter(?). I'm not sure what the purpose of the [2] is.
    Correct

    The [2] portion is a list slice and means that we only want to extract the third field, which is at index 2.
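Splitting one of the sample records (shortened here) shows what each index holds; the parentheses around split make a list, and [2] picks its third element:

```perl
use strict;
use warnings;

my $log_entry = "2013-06-05T08:37:06+01:00 172.22.0.1 %ASA-6-302020: Built inbound ICMP connection\n";

my @fields = split /\s+/, $log_entry;
print "[$_] $fields[$_]\n" for 0 .. 2;
# [0] 2013-06-05T08:37:06+01:00
# [1] 172.22.0.1
# [2] %ASA-6-302020:

# The list slice does the same thing without the temporary array.
my $code = (split /\s+/, $log_entry)[2];
print "code field: $code\n";    # prints "code field: %ASA-6-302020:"
```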


    The "$code =~ s/:$//;" line is unclear to me, but the next line compares each field in the current syslog.log record to the ASA Codes array elements(?)
    That is a regex substitution that strips the trailing colon (:) from the field we extracted.
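The colon matters because the keys in %ASA_code come from ASA-Codes.txt without one. Also, the next line does not compare every field: it checks whether that single cleaned-up code exists as a key in the hash. A short sketch of the difference the substitution makes:

```perl
use strict;
use warnings;

# Keys as stored from ASA-Codes.txt: no trailing colon.
my %ASA_code = ('%ASA-6-302020' => 1);

# Third field of a syslog record, straight from split: colon still attached.
my $code = '%ASA-6-302020:';
my $before = exists $ASA_code{$code} ? 'found' : 'not found';

$code =~ s/:$//;    # strip the trailing colon
my $after = exists $ASA_code{$code} ? 'found' : 'not found';

print "before: $before, after: $after\n";    # prints "before: not found, after: found"
```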

    Comments on this post

    • hueyii agrees