#1
  1. Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2008
    Posts
    29
    Rep Power
    0

    Substituting with a group leaves me with extra stuff, why?


    Hello all,

    I'm confused by the way regexps in Perl are behaving. When I do this:

    Code:
    $somename =~ s/^(.+?)\s+\(\d{4}\).*$/$1/;
    $somename is left with a newline at the end. Why? I could easily chomp the result, but I expected the substitution to strip the newline for me. What am I missing?

    Thanks for any help.
  2. #2
  3. Transforming Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    14,141
    Rep Power
    9398
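    By default, . does not match a newline, and $ matches at the very end of the string or just before a trailing newline (with /m it also matches at the end of each line). So your .*$ stops before the final newline and the newline is never part of the match.

    Try adding the /s flag, which lets . match newlines as well.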
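
    To make the difference concrete, here is a minimal sketch that runs the substitution from the first post with and without /s. The sample string and the variable names ($input, $without_s, $with_s) are made up for illustration; only the pattern comes from the question.

    ```perl
    use strict;
    use warnings;

    # Hypothetical input: a title, a four-digit year in parentheses,
    # some trailing text, and a newline (as if read from a file).
    my $input = "Some Movie (1999) extra text\n";

    # Without /s: .* stops before the newline and $ matches just before it,
    # so the newline lies outside the match and is left in place.
    (my $without_s = $input) =~ s/^(.+?)\s+\(\d{4}\).*$/$1/;

    # With /s: . also matches the newline, so .* consumes it.
    (my $with_s = $input) =~ s/^(.+?)\s+\(\d{4}\).*$/$1/s;

    print "without /s ends with newline: ", ($without_s =~ /\n\z/ ? "yes" : "no"), "\n";  # yes
    print "with /s ends with newline: ",    ($with_s    =~ /\n\z/ ? "yes" : "no"), "\n";  # no
    ```

    If you would rather not change how . behaves, calling chomp on the string before the substitution should give the same result.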

    Comments on this post

    • s2cuts agrees : /s works
    • ishnid agrees
  4. #3
  5. Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2008
    Posts
    29
    Rep Power
    0
    After doing some more reading, it seems this is the normal behavior in Perl and some other languages. Edit: Thanks requinix, yup the /s flag gives the functionality that I'm looking for.

    http://www.regular-expressions.info/anchors.html

    Strings Ending with a Line Break
    Even though \Z and $ only match at the end of the string (when the option for the caret and dollar to match at embedded line breaks is off), there is one exception. If the string ends with a line break, then \Z and $ will match at the position before that line break, rather than at the very end of the string. This "enhancement" was introduced by Perl, and is copied by many regex flavors, including Java, .NET and PCRE. In Perl, when reading a line from a file, the resulting string will end with a line break. Reading a line from a file with the text "joe" results in the string joe\n. When applied to this string, both ^[a-z]+$ and \A[a-z]+\Z will match joe.

    If you only want a match at the absolute very end of the string, use \z (lower case z instead of upper case Z). \A[a-z]+\z does not match joe\n. \z matches after the line break, which is not matched by the character class.
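
    The quoted behaviour is easy to check with a short script. This is only a sketch using the "joe\n" string from the quote; the variable names are my own.

    ```perl
    use strict;
    use warnings;

    # The string you get when reading the line "joe" from a file.
    my $line = "joe\n";

    # $ and \Z both allow a match just before a trailing newline.
    my $dollar  = ($line =~ /^[a-z]+$/)   ? 1 : 0;
    my $upper_z = ($line =~ /\A[a-z]+\Z/) ? 1 : 0;

    # \z only matches at the absolute end of the string, after the newline,
    # and [a-z]+ cannot get past the newline to reach it.
    my $lower_z = ($line =~ /\A[a-z]+\z/) ? 1 : 0;

    print "\$  matches: $dollar\n";   # 1
    print "\\Z matches: $upper_z\n";  # 1
    print "\\z matches: $lower_z\n";  # 0
    ```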
