#1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    3
    Rep Power
    0

    Regex matching multiple lines


    This is a pretty basic question, but I'm trying to match a string on each line and print a portion of the result. The example below looks at each line of a directory listing whose filenames are dates and tries to print the day. I can only get it to match/print one line.

    Code:
    my $string = <DATA>;
    
    foreach (my $string = <DATA>)
    {
    $string =~m/.{38}(\d\d\d\d)(\d\d)(\d\d)/;
    print "\nDay is $3\n";
    }
    __DATA__
    01/23/2013  05:08 AM        15,674,256 20130123.txt
    01/23/2013  05:08 AM        15,674,256 20130224.txt
#2
    !~ /m$/
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    May 2004
    Location
    Reno, NV
    Posts
    4,259
    Rep Power
    1810
    On your first line you read the first line of <DATA> into $string, then never used it.

    In the foreach loop, my $string = <DATA> is a scalar assignment, so it reads only one more line (the second and last one). The loop therefore ran exactly once, and the regex was applied to that single line.
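
A minimal sketch of the difference, assuming an in-memory filehandle (opened on a string reference) in place of DATA so the example is self-contained: in scalar context the readline operator returns one line, while in list context it returns all remaining lines. That is why foreach (my $string = <DATA>) iterates once, whereas foreach my $string (<DATA>) would iterate over every line.

```perl
use strict;
use warnings;

# Three sample lines held in a string; open() on a scalar reference
# gives an in-memory filehandle, so no file on disk is needed.
my $text = "first line\nsecond line\nthird line\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory handle: $!";

# Scalar assignment: <$fh> returns exactly one line.
my $one = <$fh>;

# List assignment: <$fh> returns every remaining line.
my @rest = <$fh>;
close($fh);

print "scalar context got: $one";
print "list context got ", scalar(@rest), " more lines\n";
```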

    The idiomatic way to loop over a file is a while loop, which reads one line per iteration:

    Code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    
    while (my $string = <DATA>) {
    	$string =~m/.{38}(\d\d\d\d)(\d\d)(\d\d)/;
    	print "Day is $3\n";
    }
    
    __DATA__
    01/23/2013  05:08 AM        15,674,256 20130123.txt
    01/23/2013  05:08 AM        15,674,256 20130224.txt
#3
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    3
    Rep Power
    0
    Thanks, that works great. Using the while loop, how would I get it to skip the lines that don't match?

    Example below:

    Code:
    use strict;
    use warnings;
    
    while (my $string = <DATA>) {
    	$string =~m/.{38}(\d\d\d\d)(\d\d)(\d\d)/;
    	print "Day is $3\n";
    }
    
    __DATA__
    01/29/2013  05:02 PM               391 test2.txt
    01/23/2013  05:08 AM        15,674,256 20130123.txt
    01/23/2013  05:08 AM        15,674,256 20130224.txt
    01/28/2013  10:44 AM                53 test.txt

    This gives me a "Use of uninitialized value $3 in concatenation" warning for the "test.txt" files.
#4
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    832
    Rep Power
    496
    This is quite normal. You've got two errors in your code.

    First, if I take your first line of input:

    Code:
    01/29/2013  05:02 PM               391 test2.txt
    This obviously will not match your regex, so you should test whether a match occurred before trying to print $3.

    For example, you could change your code to:

    Perl Code:
    print "Day is $3\n" if defined $3;


    This will remove the warning you obtained.
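
A slightly more robust variant is to test the match itself rather than $3. perlre notes that a failed match does not reset the match variables, so depending on where the code runs, $3 may still hold a value from an earlier successful match; testing the match's return value avoids that question entirely. This is a minimal sketch, not the poster's code: a hypothetical @lines array (copied from the listing above) stands in for <DATA>, and @days is a hypothetical collector used only to show which lines matched.

```perl
use strict;
use warnings;

# Sample listing lines copied from the thread; only two of them
# have a date-style file name.
my @lines = (
    "01/29/2013  05:02 PM               391 test2.txt",
    "01/23/2013  05:08 AM        15,674,256 20130123.txt",
    "01/23/2013  05:08 AM        15,674,256 20130224.txt",
    "01/28/2013  10:44 AM                53 test.txt",
);

my @days;
for my $string (@lines) {
    # The match returns true only when it succeeds, so $3 is read
    # only after a successful match on the current line.
    if ($string =~ m/.{38}(\d\d\d\d)(\d\d)(\d\d)/) {
        print "Day is $3\n";
        push @days, $3;
    }
    # Non-matching lines fall through silently, with no warning.
}
```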

    The second error is more subtle: your code works nonetheless, but not the way you think, and it may be inefficient. If you count 38 characters from the start of the line, you reach the last digit of the file size, so the next character is a space, not a digit. The match still succeeds because, after the attempt at the start of the line fails, the regex engine retries one character further along, so that the ".{38}" eventually matches:

    Code:
    1/23/2013  05:08 AM        15,674,256
    which, even if it does what you want, is not really what you intended. Also, retrying the match at every starting position can be inefficient depending on your input (especially when the match fails). I would change your regex to something like this:

    Perl Code:
    $string =~m/^.{39}(\d\d\d\d)(\d\d)(\d\d)/;


    or

    Perl Code:
    $string =~m/^.{38}\s(\d\d\d\d)(\d\d)(\d\d)/;


    The important point (besides correcting the number of characters before the start of the capture) is the start-of-string anchor ^ at the beginning of the regex, which prevents the engine from pointlessly retrying the match at later positions in the line.
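
The effect of the anchor can be checked directly. This sketch is an illustration rather than code from the thread: it uses Perl's @- array ($-[0] holds the offset where the last successful match began) to show that the unanchored pattern only succeeds once the engine has moved on to offset 1, while the anchored pattern with the corrected count matches from offset 0. The sample lines are copied from the listing above.

```perl
use strict;
use warnings;

my $line = "01/23/2013  05:08 AM        15,674,256 20130123.txt";

# Unanchored: the attempt at offset 0 fails (character 39 is a space),
# so the engine retries from offset 1, where .{38} now ends on that space.
my $unanchored_start;
if ($line =~ m/.{38}(\d\d\d\d)(\d\d)(\d\d)/) {
    $unanchored_start = $-[0];    # offset where the whole match began
    print "unanchored match starts at offset $unanchored_start\n";
}

# Anchored with the corrected count: the pattern is only attempted
# at the start of the line.
my $anchored_day;
if ($line =~ m/^.{39}(\d\d\d\d)(\d\d)(\d\d)/) {
    $anchored_day = $3;
    print "anchored match, day is $anchored_day\n";
}

# A line without a date-style file name simply fails to match.
my $other = "01/28/2013  10:44 AM                53 test.txt";
my $other_matched = ($other =~ m/^.{39}(\d\d\d\d)(\d\d)(\d\d)/) ? 1 : 0;
print "test.txt matched? $other_matched\n";
```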
#5
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    3
    Rep Power
    0
    Awesome. Thank you for the help!
