#1
    Registered User

    Perl regex not matching...


    It's been a while since I had to do anything in Perl, and I've been working in PHP and C++ in the meantime, so it's very possible I've done something stupid. However, I can't get my regex to report a match:

    Code:
    if ($line =~ /(Resource Donation)(March\s\d,\s\d\d\d\d,\s\d\d:\d\d)(\w.+)/i) {
        # rest of program here
    } else {
        print OUTPUT "line doesn't match: ";
        if ($line !~ /Resource Donation/) {
            print OUTPUT "Non-resource ";
        }
        if ($line !~ /(March\s\d,\s\d\d\d\d,\s\d\d:\d\d)/) {
            print OUTPUT "invalid date ";
        }
        if ($line !~ /(\w.+)/) {
            print OUTPUT "no words";
        }
        print OUTPUT "\n";
    }

    My input file consists of several lines in the format:
    Resource Donation March 7, 2009, 9:52 Avallach donated 150000 iron to the clan.

    However, my output file simply consists of several lines of:
    line doesn't match: Non-resource invalid date no words


    Have I done something silly?
#2
    User 165270
    Your regex:

    Code:
    /(Resource Donation)(March\s\d,\s\d\d\d\d,\s\d\d:\d\d)(\w.+)/i
    - has no space between "Donation" and "March";
    - expects the time as HH:MM (\d\d:\d\d), while your example string has H:MM;
    - has no space after the time: \w does not match whitespace.

    That said, this ought to work:

    Code:
    /(Resource\s+Donation)\s+(March\s+\d{1,2},\s+\d{4},\s+\d{1,2}:\d{2})\s+(.+)/i
    Good luck!
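    As a quick check, the pattern above can be run against the sample line from the first post. This is only a minimal sketch: the sub name parse_donation is made up for illustration, and the pattern still only recognises dates in March, exactly as posted.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the three captured
# fields, or an empty list when the line does not match.
sub parse_donation {
    my ($line) = @_;
    if ($line =~ /(Resource\s+Donation)\s+(March\s+\d{1,2},\s+\d{4},\s+\d{1,2}:\d{2})\s+(.+)/i) {
        return ($1, $2, $3);
    }
    return;
}

# Sample line copied from the first post.
my $sample = 'Resource Donation March 7, 2009, 9:52 Avallach donated 150000 iron to the clan.';
my ($type, $date, $rest) = parse_donation($sample);
print "type: $type\ndate: $date\nrest: $rest\n";
```

    Returning the captures as a list keeps $1, $2 and $3 from being overwritten by a later match elsewhere in the program.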
#3
    Registered User
    Thanks; I still had issues after that but it turns out I was using "chomp" incorrectly >.> it works now
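    The post does not say what the chomp mistake was. One common slip, shown here purely as an assumption, is assigning chomp's return value back to the variable. chomp modifies its argument in place and returns the number of characters removed, so the line ends up replaced by a count. That would also be consistent with the first post's output, where even /Resource Donation/ failed to match.

```perl
use strict;
use warnings;

my $raw = "Resource Donation March 7, 2009, 9:52 Avallach donated 150000 iron to the clan.\n";

# Assumed mistake: chomp returns the number of characters removed,
# so $wrong ends up holding 1 instead of the text.
my $wrong = $raw;
$wrong = chomp($wrong);

# Intended use: chomp edits the variable in place; ignore the return value.
my $right = $raw;
chomp($right);

print "wrong: '$wrong'\n";   # prints: wrong: '1'
print "right: '$right'\n";
```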
#4
    User 165270
    Originally Posted by yamikuronue
    Thanks; I still had issues after that but it turns out I was using "chomp" incorrectly >.> it works now
    Good to hear that, and you're welcome.
#5
    Registered User
    Hi there!

    This is probably not the place to ask my question, but I cannot find the place on your page to do this. I'm new to Perl. I'm trying to write a Perl RE to match strings of the format 0[((11)*0)*|((1)(11)*)*], in other words strings starting with 0, followed by 0 or more occurrences of (11)*0 or 0 or more occurrences of 1(11)*. Examples of strings to be accepted:
    0111
    011011110
    To explain a bit more: no adjacent 0s are allowed. Between any two 0s there must be an even number of 1s. If the string ends with a 1, the last group of 1s has an odd number of 1s. The shortest string is 0.

    My expression looks like this, but is not working:
    /0[ ((11)*0)* | ((1)(11)*)*]/

    Your assistance will be appreciated.
    babayPerl
#6
    Contributing User
    Hi,

    I do not have time right now to go through your stuff in depth, but after a quick look I would like to point out a couple of things in your regex that are probably wrong:

    1. Why is there a space as the third character of your regex? It does not seem to fit your requirement.

    2. More importantly: [...] defines a character class, which is obviously not what you want. If you are using the brackets to group an alternation, that is wrong: they should most probably be replaced with parentheses.
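    To illustrate the second point, here is a small sketch (the test strings are made up). Inside [...] the characters (, ), *, | and the space are all literals, so the class matches exactly one character from that set, whereas parentheses with | give a real alternation. The grouped pattern below only shows the difference in syntax; it is not meant as a solution to the matching problem.

```perl
use strict;
use warnings;

# The pattern from the question: the [...] part is a character class that
# matches exactly ONE character from the set: space ( ) 1 0 * |
my $class = qr/0[ ((11)*0)* | ((1)(11)*)*]/;

# With a group, | separates two alternatives (illustration only).
my $group = qr/0(?:(11)*0|1(11)*)/;

for my $s ('0', '0|', '0*', '011') {
    printf "%-4s class:%-3s group:%s\n", $s,
        ($s =~ $class ? 'yes' : 'no'),
        ($s =~ $group ? 'yes' : 'no');
}
```

    Note that the character class version accepts strings such as "0|" and rejects the bare "0" that the question says should be the shortest valid string.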
#7
    Contributing User
    A bit more time now...

    In addition to the main defects already mentioned earlier...

    strings starting with 0 ... easy enough: /^0...

    followed by 0 or more occurrences of (11)*0: this does not work, according to your other rules: /^0(11)*0/ will match "00", which is forbidden. This might be better: /^0((11)+0)?..., but I can't know for sure, because your description is contradictory, I can only try to guess...

    or 0 or more occurrences of 1(11)*.: this also does not work, because an odd number of 1 will match, which also contradicts your other rules.

    In other words, since you are describing your requirement partly with incorrect partial regexes, it is impossible to figure out exactly what you need.

    Please explain your exact matching rules in plain English (no partial regex), in enough detail.

    Overall, it might be easier to use several negative regexes, to rule out some forbidden cases in an easier way, before you try to actually match the real one.

    For example:
    Code:
    next if /00/ or /01(11)?0/ or /^[^0]/; 
    print "matched\n" if /...some regex .../
    This rules out:
    - two consecutive 0, or
    - two 0 separated by an odd number of 1, or
    - lines not starting by 0.

    Once you have excluded those forbidden cases (and possibly others to be added), the real matching regex might be far easier to write, as you no longer need to check, for example, the number of 1s between two 0s: you know that you always have an even number of 1s between any two 0s. It might actually be almost finished.
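    The exclusion approach could be sketched as follows, assuming the rules given in plain English in post #5. The sub name is_valid_filtered is made up, the sketch uses (11)* rather than (11)? so that any odd run of 1s between two 0s is caught, and it adds one extra exclusion for an even run of 1s at the end of the string. Treat it as one reading of the rules, not a confirmed answer.

```perl
use strict;
use warnings;

# Hypothetical helper: rule out forbidden shapes one by one.
sub is_valid_filtered {
    my ($s) = @_;
    return 0 if $s !~ /\A0[01]*\z/;      # must start with 0; only 0 and 1 allowed
    return 0 if $s =~ /00/;              # two consecutive 0s
    return 0 if $s =~ /01(?:11)*0/;      # odd number of 1s between two 0s
    return 0 if $s =~ /0(?:11)+\z/;      # string ends with an even run of 1s
    return 1;                            # nothing forbidden was found
}

for my $s (qw(0 0111 011011110 00 010 011)) {
    printf "%-10s %s\n", $s, is_valid_filtered($s) ? 'accepted' : 'rejected';
}
```

    Each exclusion is short enough to check by eye, which is the main appeal of this style over one large pattern.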
#8
    Registered User
    Hi there,

    Thank you so much for your reply. I will study it in depth a bit later today.
#9
    Originally Posted by Laurent_R
    This rules out:
    - two consecutive 0, or
    - two 0 separated by an odd number of 1, or
    - lines not starting by 0.
    Technically, your middle regexp will rule out two 0s separated by exactly one or three 1s. For any odd number, it would be /01(11)*0/.

    A small nitpick, I know.
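    The difference between the two quantifiers can be seen by testing runs of 1s of increasing length between two 0s. A minimal sketch (the helper name matching_run_lengths is made up for illustration):

```perl
use strict;
use warnings;

# Returns the run lengths n (1..$max) for which "0" . "1" x n . "0" matches $re.
sub matching_run_lengths {
    my ($re, $max) = @_;
    return grep { ('0' . ('1' x $_) . '0') =~ $re } 1 .. $max;
}

my @with_question = matching_run_lengths(qr/01(11)?0/, 7);   # (11)? : optional, at most once
my @with_star     = matching_run_lengths(qr/01(11)*0/, 7);   # (11)* : zero or more times

print "(11)? matches runs of: @with_question\n";   # prints: 1 3
print "(11)* matches runs of: @with_star\n";       # prints: 1 3 5 7
```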
#10
    Contributing User
    Originally Posted by ishnid
    Technically, your middle regexp will rule out two 0s separated by exactly one or three 1s. For any odd number, it would be /01(11)*0/.
    Yes, of course, you are absolutely right, a typo when typing late at night and not re-reading carefully enough...
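    For completeness, the plain-English rules from post #5 can also be read as a single anchored pattern. This is one interpretation and was not posted in the thread; the sub name matches_rules is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, one interpretation of the rules in post #5:
#   \A0             starts with a single 0
#   (?:(?:11)+0)*   then any number of "non-empty even run of 1s, then 0"
#   (?:1(?:11)*)?   optionally ends with an odd run of 1s
sub matches_rules {
    my ($s) = @_;
    return $s =~ /\A0(?:(?:11)+0)*(?:1(?:11)*)?\z/ ? 1 : 0;
}

for my $s (qw(0 0111 011011110 00 010 011)) {
    printf "%-10s %s\n", $s, matches_rules($s) ? 'accepted' : 'rejected';
}
```

    The \A and \z anchors matter here: without them the pattern would match the leading 0 of any string that merely starts with 0.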
