Perl Programming
 
  #1  
January 31st, 2013, 03:12 PM
vroom92
Regex matching multiple lines

This is a pretty basic question, but I'm trying to match a string on each line and print a portion of the result. The example below reads a directory listing, one line at a time, in which the filenames are dates, and should print the day for each line. I can only get it to match and print one line.

Code:
my $string = <DATA>;

foreach (my $string = <DATA>)
{
$string =~m/.{38}(\d\d\d\d)(\d\d)(\d\d)/;
print "\nDay is $3\n";
}
__DATA__
01/23/2013  05:08 AM        15,674,256 20130123.txt
01/23/2013  05:08 AM        15,674,256 20130224.txt

  #2  
February 1st, 2013, 07:36 AM
keath
On your first line you read the first line from <DATA> into $string and then never used it.

In the foreach loop, the scalar assignment read only one more line (the second), so the loop body ran exactly once.

The idiomatic way to loop over a file is a while loop, which reads one line per iteration until the input is exhausted:

Code:
#!/usr/bin/perl
use strict;
use warnings;

while (my $string = <DATA>) {
	$string =~m/.{38}(\d\d\d\d)(\d\d)(\d\d)/;
	print "Day is $3\n";
}

__DATA__
01/23/2013  05:08 AM        15,674,256 20130123.txt
01/23/2013  05:08 AM        15,674,256 20130224.txt
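
To illustrate why the original foreach ran only once, here is a minimal sketch of how the readline operator behaves in scalar versus list context. The sub name read_scalar_then_list and the sample text are made up for this illustration, and an in-memory filehandle stands in for DATA.

```perl
use strict;
use warnings;

# Hypothetical helper: reads one line in scalar context, then counts
# how many lines are still available in list context.
sub read_scalar_then_list {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my $first = <$fh>;    # scalar context: consumes exactly one line
    my @rest  = <$fh>;    # list context: consumes every remaining line
    close $fh;
    return ($first, scalar @rest);
}

my ($first, $remaining) = read_scalar_then_list("line one\nline two\nline three\n");
print "First line: $first";
print "Lines left for list context: $remaining\n";
```

Assigning <DATA> to a scalar, as in the original foreach, is the scalar-context case, which is why only a single line was processed there.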

  #3  
February 1st, 2013, 05:35 PM
vroom92
Thanks, that works great. Using the while loop, how would I get it to skip lines that don't match?

Example below:

Code:
use strict;
use warnings;

while (my $string = <DATA>) {
	$string =~m/.{38}(\d\d\d\d)(\d\d)(\d\d)/;
	print "Day is $3\n";
}

__DATA__
01/29/2013  05:02 PM               391 test2.txt
01/23/2013  05:08 AM        15,674,256 20130123.txt
01/23/2013  05:08 AM        15,674,256 20130224.txt
01/28/2013  10:44 AM                53 test.txt



This gives me a "Use of uninitialized value $3 in concatenation" warning for the test2.txt and test.txt lines.

  #4  
February 2nd, 2013, 04:16 AM
Laurent_R
This is quite normal. There are two errors in your code.

First, if I take your first line of input:

Code:
01/29/2013  05:02 PM               391 test2.txt


This will obviously not match your regex. You should test if a match occurred before trying to print $3.

For example, you could change your code to:

Perl Code:
print "Day is $3\n" if defined $3;


This will remove the warning you obtained.
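
A variant of the same idea is to test the result of the match itself, since capture variables such as $3 are only meaningful after a successful match. The sketch below is one possible way to write it, not the only one: the sub name day_from_line is invented for illustration, the regex is the one from the question, and the listing lines are kept in an array rather than read from DATA.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the two-digit day from a listing line,
# or undef when the line has no date-style filename.
# The regex is copied unchanged from the question.
sub day_from_line {
    my ($line) = @_;
    if ($line =~ m/.{38}(\d\d\d\d)(\d\d)(\d\d)/) {
        return $3;    # safe: we only get here after a successful match
    }
    return;           # undef in scalar context
}

my @lines = (
    '01/29/2013  05:02 PM               391 test2.txt',
    '01/23/2013  05:08 AM        15,674,256 20130123.txt',
    '01/23/2013  05:08 AM        15,674,256 20130224.txt',
    '01/28/2013  10:44 AM                53 test.txt',
);

for my $line (@lines) {
    my $day = day_from_line($line);
    next unless defined $day;    # skip lines that did not match
    print "Day is $day\n";
}
```

Inside the original while loop the same effect can be had with "next unless $string =~ m/.../;" placed before the print.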

The second error is more subtle: your code works, but not the way you think, and it may be inefficient. If you count 38 characters from the start of the line, you land on the last digit of the file size, so the next character is a space, not a digit. The match nonetheless succeeds because the regex is not anchored: after failing at the start of the line, the engine retries from the next character, so that ".{38}" eventually matches:

Code:
1/23/2013  05:08 AM        15,674,256 


This gives the result you want, but not for the reason you expect. The retrying can also be quite inefficient depending on your input, especially when the match fails, because the engine tries every starting position in the line before giving up. I would change your regex to something like this:

Perl Code:
$string =~ m/^.{39}(\d\d\d\d)(\d\d)(\d\d)/;


or

Perl Code:
$string =~ m/^.{38}\s(\d\d\d\d)(\d\d)(\d\d)/;


The important point (besides correcting the number of characters before the start of the capture) is the start-of-string anchor at the beginning of the regex, which prevents useless retries from later starting positions.
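
The effect of the anchor can be checked with Perl's @- array, whose first element holds the offset at which the last successful match began. This is only a sketch: the sub name match_start is invented here, and the listing line is rebuilt from pieces on the assumption that the column widths are as described above (38 characters up to the last digit of the size, then one space).

```perl
use strict;
use warnings;

# Rebuild one listing line so the column widths are explicit:
# 20 characters of date and time, 8 spaces, a 10-character size, one space.
my $line = '01/23/2013  05:08 AM' . (' ' x 8) . '15,674,256' . ' ' . '20130123.txt';

# Hypothetical helper: offset where the overall match starts, or undef.
sub match_start {
    my ($re, $text) = @_;
    return $text =~ $re ? $-[0] : undef;
}

my $unanchored = qr/.{38}(\d\d\d\d)(\d\d)(\d\d)/;     # regex from the question
my $anchored   = qr/^.{39}(\d\d\d\d)(\d\d)(\d\d)/;    # corrected and anchored

printf "unanchored match starts at offset %d\n", match_start($unanchored, $line);
printf "anchored match starts at offset %d\n",   match_start($anchored,   $line);
```

With the line built this way, the unanchored regex reports offset 1 (it skipped the leading "0"), while the anchored one reports offset 0.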

  #5  
February 4th, 2013, 09:09 AM
vroom92
Awesome. Thank you for the help!
