Thread: File parsing

    #1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2013
    Posts
    7
    Rep Power
    0

    File parsing


    Hi, can anyone help?

    I have a file containing certain information. I want to read each line and capture the information in each pair of curly brackets. If the value at the end of the line is false, I also want to read the next line in the file and capture that.

    A line looks like this:

    2013-04-25 08:11:33,943 [Default_Transport_To_Channel_pool : 71] [tdtim-ValidateAction$$EnhancerByCGLIB$$a859deab--1234567-1234567-tim-191.4.28.17] INFO STATS_LOG - RqValidateOrder 3404 performLimitValidate{BOS=B}{account=1234567}{clientcurrency=EUR}{symbol=SAP.XE}{amount=220.0}{invest =false}{period=3}{channel=W}{price=59.90}{trigger price=null}{orderType=GFTO}{expiryOffset=0}{expiryDate=null} mStatus=0 mSucceeded=true


    I have messed about with the following:

    while (my $line = <>) {
        chomp $line;
        # find the beginning of an action that did not succeed
        if ($line =~ /\[Default_Transport_To_Channel_pool\s\:\s(\d+)\].*\-ValidateAction.*(\d+)\-(\d+)\-tim-(\d+).*RqValidateOrder.*mSucceeded=false/) {
            # the line after a failure holds the extra information
            my $nextline = <>;
            $nextline = '' unless defined $nextline;
            print "$line,$nextline";
        }
    }


    Any help would be appreciated.
  2. #2
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2013
    Posts
    7
    Rep Power
    0
    I would also like to write each captured value to a comma-delimited file.

    Would I need to put it all into an array or something?

    Thanks ... I am super new to Perl and its wonders, so any help would be lovely.
  4. #3
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Location
    Paris area, France
    Posts
    843
    Rep Power
    496
    An easy solution, maybe not sufficient, but it might get you started:

    Code:
    my @d = split /}{|}/, $line;
    Now, with your sample line, the @d array contains:

    Code:
    0  '2013-04-25 08:11:33,943 [Default_Transport_To_Channel_pool : 71] [tdtim-ValidateAction--1234567-1234567-tim-191.4.28.17] INFO STATS_LOG - RqValidateOrder 3404 performLimitValidate{BOS=B'
    1  'account=1234567'
    2  'clientcurrency=EUR'
    3  'symbol=SAP.XE'
    4  'amount=220.0'
    5  'invest =false'
    6  'period=3'
    7  'channel=W'
    8  'price=59.90'
    9  'trigger price=null'
    10  'orderType=GFTO'
    11  'expiryOffset=0'
    12  'expiryDate=null'
    13  ' mStatus=0 mSucceeded=true'
  6. #4
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Location
    Paris area, France
    Posts
    843
    Rep Power
    496
    I was in a hurry when I replied above. A better solution is:

    Perl Code:
    my @d = grep /=/, split /}{|\{|}/, $line;


    which produces the following @d array:

    Code:
    0  'BOS=B'
    1  'account=1234567'
    2  'clientcurrency=EUR'
    3  'symbol=SAP.XE'
    4  'amount=220.0'
    5  'invest =false'
    6  'period=3'
    7  'channel=W'
    8  'price=59.90'
    9  'trigger price=null'
    10  'orderType=GFTO'
    11  'expiryOffset=0'
    12  'expiryDate=null'
    13  ' mStatus=0 mSucceeded=true'
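To follow up on the comma-delimited file asked about earlier in the thread, here is a minimal sketch rather than a finished solution. It uses a global match on {...} instead of split (my substitution, so that the trailing mStatus/mSucceeded text is not picked up). The names brace_fields and rows_from_log are hypothetical. It assumes that the flag at the end of the line is mSucceeded, that the values never contain commas, and that the line following a failed request can simply be appended as one extra column.

```perl
use strict;
use warnings;

# Return the key=value pairs found inside {...} as [key, value] pairs,
# in the order they appear on the line.
sub brace_fields {
    my ($line) = @_;
    my @pairs;
    for my $chunk ($line =~ /\{([^{}]*)\}/g) {
        my ($key, $value) = split /\s*=\s*/, $chunk, 2;
        next unless defined $value;    # skip chunks without an "="
        push @pairs, [$key, $value];
    }
    return @pairs;
}

# Build one comma-delimited row per log line. When a line ends with
# mSucceeded=false (assumption), the next line is consumed and appended.
sub rows_from_log {
    my (@lines) = @_;
    my @rows;
    while (defined(my $line = shift @lines)) {
        chomp $line;
        my @values = map { $_->[1] } brace_fields($line);
        next unless @values;           # not a line with {...} fields
        if ($line =~ /mSucceeded=false\s*$/ && @lines) {
            my $next = shift @lines;
            chomp $next;
            push @values, $next;
        }
        push @rows, join ',', @values;
    }
    return @rows;
}

# Shortened sample lines, made up for the demonstration.
my @sample = (
    'performLimitValidate{BOS=B}{account=1234567}{invest =false} mStatus=0 mSucceeded=true',
    'performLimitValidate{BOS=S}{account=7654321}{invest =true} mStatus=1 mSucceeded=false',
    'some detail about the failure',
);
print "$_\n" for rows_from_log(@sample);
```

Keeping the pairs in an array of [key, value] preserves the column order; a plain hash would lose it.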
  8. #5
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2013
    Posts
    7
    Rep Power
    0
    Oh great, I will give it a go. Thanks.
  10. #6
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2013
    Posts
    7
    Rep Power
    0
    Originally Posted by TimRoss
    Oh great, I will give it a go. Thanks.
    Worked a treat. Thanks
  12. #7
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Location
    Paris area, France
    Posts
    843
    Rep Power
    496
    Hi,

    you are welcome.
  14. #8
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2013
    Posts
    7
    Rep Power
    0
    Originally Posted by Laurent_R
    Hi,

    you are welcome.
    Hi, sorry, another question. I have taken the above information and built the file shown below. How can I parse this file, count the number of errors by error type, and print the result like this?

    Summary:
    =======
    Error|Total|Message
    391  |35   |Stock Blocked from trading no routing
    309  |35   |Insufficient stock for sale
    527  |102  |Client has insufficient Margin: (Margin Available)




    The lines in the file are in this format:

    2013-04-29 07:05:20 account=1111111 BOS=S symbol=SODA.US amount=42.0 : 309 Insufficient stock for sale

    2013-04-29 07:07:37 account=1111111 BOS=S symbol=SODA.US amount=42.0 : 309 Insufficient stock for sale

    2013-04-29 09:04:16 account=2222222 BOS=S symbol=KPNR.AM amount=3623.0 : 391 No route found

    2013-04-29 09:10:31 account=2222222 BOS=S symbol=KPNR.AM amount=3623.0 : 391 No route found

    2013-04-29 09:14:52 account=2222222 BOS=S symbol=KPNR.AM amount=3623.0 : 391 No route found

    2013-04-29 09:19:45 account=3333333 BOS=S symbol=KPNR.AM amount=4100.0 : 391 No route found

    2013-04-29 09:34:53 account=4444444 BOS=B symbol=DIS.US amount=50.0 : 527 Client has insufficient Margin: (Margin Available Spend)

    2013-04-29 09:35:28 account=4444444 BOS=B symbol=DIS.US amount=50.0 : 527 Client has insufficient Margin: (Margin Available Spend)

    2013-04-29 09:55:22 account=5555555 BOS=S symbol=AAPL.US amount=100.0 : 309 Insufficient stock for sale

    2013-04-29 10:11:45 account=6666666 BOS=B symbol=WOMV.P amount=107.0 : 391 No route found

    2013-04-29 10:12:52 account=6666666 BOS=B symbol=CODW.P amount=73.0 : 391 No route found

    2013-04-29 10:25:44 account=7777777 BOS=B symbol=SLW.US amount=50.0 : 527 Client has insufficient Margin: (Margin Available Spend)

    2013-04-29 10:34:21 account=7777777 BOS=B symbol=SLW.US amount=26.0 : 527 Client has insufficient Margin: (Margin Available Spend)

    2013-04-29 10:34:41 account=7777777 BOS=B symbol=SLW.US amount=25.0 : 527 Client has insufficient Margin: (Margin Available Spend)

    2013-04-29 10:44:27 account=5555555 BOS=B symbol=WOMV.P amount=107.0 : 391 No route found
  16. #9
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2013
    Posts
    7
    Rep Power
    0
    Hi, I have this so far. I am just unsure how I would go about counting and printing the output:

    while (<>) {
    my $line = $_;
    my @errortype= split /: /, $line;
    print "$errortype[1]\n";
    }
  18. #10
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Location
    Paris area, France
    Posts
    843
    Rep Power
    496
    You probably need to split on space colon space (" : ") to get the last field, and to split again to get the error number and the error message. Then you store the error count in one hash and the error message in another hash (it could also be a hash of arrays or an array of hashes, but let's keep it simple for the example).

    Assuming your current line is in $line:

    Perl Code:
     
    chomp $line;
    my $end_field = (split / : /, $line)[-1];
    my ($error_number, $error_message) =  split / /, $end_field, 2;
    $error_number =~ s/ //g; #removing extra spaces if any to normalize key
    $error_name{$error_number} = $error_message;
    $error_count{$error_number}++;


    At the end, use the values stored in the %error_name and %error_count hashes to print your output.
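Putting the steps above together might look roughly like this. It is only a sketch: summarize is a made-up name, it keeps the first message seen for each error number, it assumes the error numbers are numeric, and the output layout copies the Error|Total|Message header from the question without any padding.

```perl
use strict;
use warnings;

# Count the errors per error number and build the summary text.
sub summarize {
    my (@lines) = @_;
    my (%error_count, %error_name);
    for my $line (@lines) {
        chomp $line;
        next unless $line =~ / : /;    # skip blank or malformed lines
        my $end_field = (split / : /, $line, 2)[-1];
        my ($error_number, $error_message) = split / /, $end_field, 2;
        $error_message = '' unless defined $error_message;
        # keep the first message seen for this error number
        $error_name{$error_number} = $error_message
            unless exists $error_name{$error_number};
        $error_count{$error_number}++;
    }
    my $out = "Summary:\n=======\nError|Total|Message\n";
    for my $number (sort { $a <=> $b } keys %error_count) {
        $out .= "$number|$error_count{$number}|$error_name{$number}\n";
    }
    return $out;
}

# Three lines taken from the sample file in the question.
my @sample = (
    '2013-04-29 07:05:20 account=1111111 BOS=S symbol=SODA.US amount=42.0 : 309 Insufficient stock for sale',
    '2013-04-29 09:04:16 account=2222222 BOS=S symbol=KPNR.AM amount=3623.0 : 391 No route found',
    '2013-04-29 09:10:31 account=2222222 BOS=S symbol=KPNR.AM amount=3623.0 : 391 No route found',
);
print summarize(@sample);
```

In a real script the sample list would be replaced by the lines of the file, for example by passing the result of reading the input in list context to summarize.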
  20. #11
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Posts
    1
    Rep Power
    0
    Hi Laurent,

    I did not quite understand split /}{|\{|}/, $line;

    Could you please explain it?

    Thanks,
  22. #12
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Location
    Paris area, France
    Posts
    843
    Rep Power
    496
    Hi,

    The idea in the general case is to get the chunks of text between curly braces: {...}. For this, I split on the following regex: /}{/. This way, the text is split wherever a closing curly is immediately followed by an opening one, which leaves the chunks of text that were between the curlies.

    However, this does not take care of the first opening curly brace at the beginning and the last closing curly at the end.
    So, if I have the following text:

    Code:
    {period=3}{channel=W}{price=59.90}{trigger price=null}
    splitting on /}{/ will give me the following array:
    Code:
    {period=3
    channel=W
    price=59.90
    trigger price=null}
    So I am giving two alternate patterns on which to split, /\{/ and /}/, so that if the first pattern, /}{/, does not match, one of the two others can match.

    To put everything together I used the | alternation pattern, which leads to:

    Perl Code:
    /}{|\{|}/


    In effect, this says that, at each position in the string, split tries to match:
    }{ first,
    then {,
    then }.

    The reason for the backslash in /\{/ is that otherwise { might be interpreted as the beginning of a regex quantifier, so backslashing makes sure that it is interpreted as a literal opening curly.
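The difference between the patterns can be checked with a small test. This is just an illustration: split_on is a hypothetical helper, and every brace is escaped here because newer Perl versions may warn about an unescaped { in the middle of a pattern.

```perl
use strict;
use warnings;

# Split $text on a precompiled pattern and return the resulting fields.
sub split_on {
    my ($pattern, $text) = @_;
    return split $pattern, $text;
}

my $text = '{period=3}{channel=W}{price=59.90}{trigger price=null}';

# Only "}{": the outer braces stay attached to the first and last fields.
my @inner_only = split_on(qr/\}\{/, $text);

# "}{", "{" or "}": the outer braces go away, but the "{" at the very start
# of the text produces an empty first field.
my @all_braces = split_on(qr/\}\{|\{|\}/, $text);

# Keeping only the fields that contain "=" removes that empty field.
my @pairs = grep { /=/ } @all_braces;

print join('|', @inner_only), "\n";
print join('|', @all_braces), "\n";
print join('|', @pairs), "\n";
```

The empty first field in the second result is the reason for the grep /=/ in the earlier one-liner; an empty field at the end does not appear because split drops trailing empty fields by default.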
  24. #13
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Apr 2009
    Posts
    1,970
    Rep Power
    1225
    Wouldn't it be cleaner to use a character class instead of alternation in the split?

    Code:
    #!/usr/bin/perl
    
    use warnings;
    use strict;
    use Data::Dumper;
    
    my $str = '2013-04-25 08:11:33,943 [Default_Transport_To_Channel_pool : 71] [tdtim-ValidateAction$$EnhancerByCGLIB$$a859deab--1234567-1234567-tim-191.4.28.17] INFO STATS_LOG - RqValidateOrder 3404 performLimitValidate{BOS=B}{account=1234567}{clientcurrency=EUR}{symbol=SAP.XE}{amount=220.0}{invest =false}{period=3}{channel=W}{price=59.90}{trigger price=null}{orderType=GFTO}{expiryOffset=0}{expiryDate=null} mStatus=0 mSucceeded=true';
    
    my @fields = split(/[{}]+/, $str);
    
    print Dumper \@fields;
    Yields:
    Code:
    $VAR1 = [
              '2013-04-25 08:11:33,943 [Default_Transport_To_Channel_pool : 71] [tdtim-ValidateAction$$EnhancerByCGLIB$$a859deab--1234567-1234567-tim-191.4.28.17] INFO STATS_LOG - RqValidateOrder 3404 performLimitValidate',
              'BOS=B',
              'account=1234567',
              'clientcurrency=EUR',
              'symbol=SAP.XE',
              'amount=220.0',
              'invest =false',
              'period=3',
              'channel=W',
              'price=59.90',
              'trigger price=null',
              'orderType=GFTO',
              'expiryOffset=0',
              'expiryDate=null',
              ' mStatus=0 mSucceeded=true'
            ];
  26. #14
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Location
    Paris area, France
    Posts
    843
    Rep Power
    496
    Yes, Fishmonger, you are right, a character class is simpler than an alternation.

    As I remember my thinking when I proposed that, I originally did not want to use a character class such as /[{}]/ because it would lead to empty string elements, and I did not think at the time about adding the +, which solves that problem very simply.

    Having said that, this was just a quick trick to easily solve the OP's problem, and it works fine.
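For anyone curious about the empty elements mentioned above, here is a small comparison of the character class with and without the +. It is a sketch on a shortened sample string, and brace_chunks is a hypothetical helper name.

```perl
use strict;
use warnings;

my $text = '{period=3}{channel=W}';

# Without "+", the adjacent "}{" counts as two separators with nothing
# between them, so an empty field appears in the middle.
my @single = split /[{}]/, $text;     # ('', 'period=3', '', 'channel=W')

# With "+", "}{" is a single separator. The text still starts with a
# separator, so one empty field remains at the front.
my @run = split /[{}]+/, $text;       # ('', 'period=3', 'channel=W')

# Dropping zero-length fields afterwards covers that case as well.
sub brace_chunks {
    my ($str) = @_;
    return grep { length } split /[{}]+/, $str;
}

print join('|', @single), "\n";
print join('|', @run), "\n";
print join('|', brace_chunks($text)), "\n";
```

With the full log line from the thread the first field is the text before the first {, so it is not empty and brace_chunks keeps it; filtering on /=/ instead would drop it.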
