#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2003
    Posts
    391
    Rep Power
    18

    Negate backreference


    I'm trying to extract the attributes from an img tag that is passed to PHP as a string.

    Say I want the value of the src attribute. I would do something like this:
    Code:
    $src = preg_replace("/^.*src=([\"'])(.+)\\1.*$/i","$2",$img);
    However, $src comes back as:
    http://www.domain.com/folder/file.jpg" width="50" height="50" alt="Dog picture"

    Ideally I would like to put a negated backreference inside the second set of parentheses, like so:
    Code:
    $src = preg_replace("/^.*src=([\"'])([^\\1]+)\\1.*$/i","$2",$img);
    but this fails!

    Does anyone have any ideas?
  2. #2
  3. Come play with me!
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    13,749
    Rep Power
    9397
    Ideally you wouldn't try to suck up every single character and then backtrack to forget the ones you didn't want.
    Code:
    .+?
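    Dropped into the pattern from post #1, that might look like the sketch below. The $img value is a made-up sample (the original post never shows the full tag), and the only change to the pattern is the quantifier in the second group.

```php
<?php
// Hypothetical sample input, modeled on the output quoted in post #1.
$img = '<img src="http://www.domain.com/folder/file.jpg" width="50" height="50" alt="Dog picture">';

// Same call as post #1, with the greedy (.+) swapped for the lazy (.+?).
// The lazy group stops at the first quote that matches the opening one (\1).
$src = preg_replace("/^.*src=([\"'])(.+?)\\1.*$/i", "$2", $img);

echo $src, "\n";
```

    With the lazy quantifier, the second group ends at the first closing quote, so the width, height and alt attributes are no longer swallowed into the capture.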
  4. #3
    Thanks requinix, I like your solution.

    I haven't used lazy quantifiers before, mainly because I didn't understand them, but your solution made me take a second look at my book.

    "Ideally you wouldn't try to suck up every single character and then backtrack to forget the ones you didn't want."
    Is this comment referring to the .* before and after the match I am trying to make? Is there a way to return only the match without having to deal with arrays? Or would returning an array and searching through it be faster and/or use fewer resources than the regex I have?

    I came across an interesting blog post about lazy quantifiers. I'm not certain how PHP's regex engine deals with them, but it's food for thought: http://blog.stevenlevithan.com/archi...zy-performance
    Last edited by aconway; August 23rd, 2010 at 01:06 PM.
  6. #4
    - In essence, greedy quantifiers suck up "as much as they can" while lazy quantifiers suck up "as little as they can". Either way the engine always tries to make a match, and will jump through rings of fire to get it.

    - Yes, the comment was aimed at using a greedy *. Most of the time a .* is used to match "a bunch of stuff", but it has a more sinister meaning that most people don't think about, or don't know: it grabs as much as it possibly can and only gives characters back when the rest of the pattern fails. In most of those circumstances a .*? is better because it actually represents what's expected.

    - If you're interested, I suggest reading a book on regular expressions. Most of them will also teach about how the engines themselves work, and with that knowledge alone you can write better expressions. It also, of course, is great stuff to know for troubleshooting.
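
    To make the greedy/lazy difference concrete, here is a small sketch. The sample tag and the variable names are invented for illustration. The third pattern is one way to express the "anything but the opening quote" idea from post #1: as far as I know, PCRE does not treat \1 as a backreference inside a character class (it is read as an octal character code instead), so [^\1] can't do the job, whereas a negative lookahead on \1 in front of each character can.

```php
<?php
// Hypothetical sample input with two quoted attribute values.
$img = '<img src="a.jpg" alt="Dog picture">';

// Greedy: .+ runs to the end of the string, then backs off to the LAST quote.
preg_match('/src="(.+)"/', $img, $greedy);

// Lazy: .+? grows one character at a time and stops at the FIRST quote.
preg_match('/src="(.+?)"/', $img, $lazy);

// Lookahead version: a character is consumed only if it is not the
// opening quote that was captured in group 1.
preg_match('/src=(["\'])((?:(?!\1).)+)\1/', $img, $tempered);

echo $greedy[1], "\n";   // a.jpg" alt="Dog picture
echo $lazy[1], "\n";     // a.jpg
echo $tempered[2], "\n"; // a.jpg
```

    preg_match fills an array with the captures, so the value has to be read from an index, but the pattern no longer needs the leading and trailing .* that the preg_replace approach relies on to throw away the rest of the string.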
