#1
  1. Prisoner of the Sun

    Abnormal IMG SRC?


    I've wasted too many hours trying to figure this out.

    Can I use a single pattern to extract the correct src attribute of an image tag, even if the tag has multiple src attributes?

    E.g. what pattern would capture the 2nd src attribute of the 1st tag and the only src attribute of the 2nd tag?
    Code:
    <img onError= src="http://images.play.com/SiteCSS/Play/Live2/2010032301/img/proxy/01m.gif" src="http://images.play.com/covers/10667429m.jpg" alt="Tim Burton's Alice In Wonderland" style="border-width:0px;height:178px;width:117px;" />
    
    <IMG SRC="http://images.play.com/banners/content/Alice 6.jpg " ALT="Alice In Wonderland" />
    This is the latest pattern I have tried:
    PHP Code:
    define('IMG_SRC_PATTERN', '#[^onError= ]*src=[\"\']?([^"\']+)#i');

    preg_match(IMG_SRC_PATTERN, $tag, $match);
    EDIT:
    I think I may have stumbled onto the pattern I need, but I'm not sure if it is efficient or not. Can anyone advise?
    PHP Code:
    $pattern = "#(= src=['\"].+[^\"]?)?src=[\"']?([^\"']+)#i";
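    One way to check that pattern is to run it against both sample tags. This is only a minimal sketch, assuming the pattern is applied to one tag at a time (as with $tag earlier); the $tags and $sources names are made up for the example.

    ```php
    <?php
    // The pattern from the edit, with the assignment operator and semicolon written out.
    $pattern = "#(= src=['\"].+[^\"]?)?src=[\"']?([^\"']+)#i";

    // The two sample tags from the question.
    $tags = [
        '<img onError= src="http://images.play.com/SiteCSS/Play/Live2/2010032301/img/proxy/01m.gif" src="http://images.play.com/covers/10667429m.jpg" alt="Tim Burton\'s Alice In Wonderland" style="border-width:0px;height:178px;width:117px;" />',
        '<IMG SRC="http://images.play.com/banners/content/Alice 6.jpg " ALT="Alice In Wonderland" />',
    ];

    $sources = [];
    foreach ($tags as $tag) {
        if (preg_match($pattern, $tag, $match)) {
            // Group 2 holds the URL; trim() drops the stray trailing space in the second tag.
            $sources[] = trim($match[2]);
        }
    }

    print_r($sources);
    ```

    Because .+ is greedy, the optional group swallows everything up to the last src= on the line, so this probably only behaves when each subject string is a single tag rather than a whole page.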
    Last edited by b3n; March 29th, 2010 at 09:57 AM.

    "All matter is merely energy condensed to a slow vibration. We are all one consciousness experiencing itself - subjectively. There is no such thing as death, life is only a dream. We are the imaginations of ourselves."
    - Bill Hicks


    "Truth is hidden in the subtle nature of the heart of everything, although it is invisible. One cannot see it from inside and neither from the surface. One can only live and experience it."
    - Heart Sutra
  2. #2
  3. kill 9, $$;
    Using regexps for parsing markup like HTML is generally a bad idea. It's usually advisable to use a proper tag-aware HTML parser for this.
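    As a rough illustration of the parser route, here is a sketch using PHP's DOMDocument. The imgSources() helper is a hypothetical name, and the sketch makes no claim about which value the parser keeps when a tag carries two src attributes; that would need checking against the real markup.

    ```php
    <?php
    // Hypothetical helper: collect the src of every <img> element the parser finds.
    function imgSources(string $html): array
    {
        $doc = new DOMDocument();

        // Collect libxml warnings about malformed markup internally instead of printing them.
        $previous = libxml_use_internal_errors(true);
        $doc->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $sources = [];
        foreach ($doc->getElementsByTagName('img') as $img) {
            // trim() removes stray whitespace around the attribute value.
            $sources[] = trim($img->getAttribute('src'));
        }
        return $sources;
    }

    print_r(imgSources('<IMG SRC="http://images.play.com/banners/content/Alice 6.jpg " ALT="Alice In Wonderland" />'));
    ```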
  4. #3
  5. Prisoner of the Sun

    Ha

    I started using preg_ functions, then I changed to DOMDocument->loadHTML(), and now I have changed back to preg_ again.

    DOMDocument is slow and doesn't pick up all of the image tags when they are "abnormal".

    I changed primarily due to this thread:

    http://forums.devshed.com/php-develo...es-687448.html
  6. #4
  7. Sarcky
    Your "edit" pattern looks fine. I don't even know how they manage to produce image sources like this; I didn't think it was valid.

    You can also try actually stepping through the string, as distasteful as that may be.
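
    Stepping through the string could look something like the sketch below. It assumes the wanted value is always the last src= in a single tag, which holds for the two samples in this thread but would also match things like data-src=; lastSrc() is a made-up helper name.

    ```php
    <?php
    // Hypothetical helper: return the value of the last src attribute in one tag,
    // or null when no usable src= is found.
    function lastSrc(string $tag): ?string
    {
        // Locate the last "src=" in the tag, ignoring case.
        $pos = strripos($tag, 'src=');
        if ($pos === false) {
            return null;
        }

        $start = $pos + strlen('src=');
        if ($start >= strlen($tag)) {
            return null; // "src=" was the very end of the string
        }

        $quote = $tag[$start];
        if ($quote === '"' || $quote === "'") {
            // Quoted value: read up to the matching closing quote.
            $start++;
            $end = strpos($tag, $quote, $start);
            if ($end === false) {
                return null; // unterminated quote
            }
        } else {
            // Unquoted value: read up to the next whitespace or ">".
            $end = $start + strcspn($tag, " \t\r\n>", $start);
        }

        return trim(substr($tag, $start, $end - $start));
    }
    ```

    Called on the first sample tag this should pick the covers/10667429m.jpg URL, since that is the last src= in the tag.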

    -Dan

    "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin

    "The greatest tragedy of this changing society is that people who never knew what it was like before will simply assume that this is the way things are supposed to be." -2600 Magazine, Fall 2002
