#1
  1. Banned

    Join Date
    Jul 2004
    Location
    The Mews At Windsor Heights
    Posts
    5,326
    Rep Power
    0

    Abnormal IMG SRC?


    I've wasted too many hours trying to figure this out.

    Can I use a single pattern to extract the (correct) src attribute of image tags (even if there are multiple src attributes)?

    E.g. what pattern can get the 2nd src attribute of the 1st tag and the only src attribute of the 2nd tag?
    Code:
    <img onError= src="http://images.play.com/SiteCSS/Play/Live2/2010032301/img/proxy/01m.gif" src="http://images.play.com/covers/10667429m.jpg" alt="Tim Burton's Alice In Wonderland" style="border-width:0px;height:178px;width:117px;" />
    
    <IMG SRC="http://images.play.com/banners/content/Alice 6.jpg " ALT="Alice In Wonderland" />
    This is the latest pattern I have tried:
    PHP Code:
    define('IMG_SRC_PATTERN', '#[^onError= ]*src=[\"\']?([^"\']+)#i');

    preg_match(IMG_SRC_PATTERN, $tag, $match);
    EDIT:
    I think I may have stumbled onto the pattern I need, but I'm not sure if it is efficient or not. Can anyone advise?
    PHP Code:
    $pattern = "#(= src=['\"].+[^\"]?)?src=[\"']?([^\"']+)#i";
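
    For comparison, here is a minimal sketch of another way to get "the last src wins" without the greedy `.+`, which has to backtrack from the end of the tag. It collects every quoted src attribute with preg_match_all() and keeps the final one. The helper name lastImgSrc() is made up for illustration, and the sketch assumes src values are always quoted, which is true of the two sample tags but narrower than the pattern above.

```php
<?php
// Hypothetical helper (not from the thread): return the last quoted src value in one tag.
function lastImgSrc(string $tag): ?string
{
    // Match src="..." or src='...'; the backreference \1 requires the same closing quote.
    // Assumption: src values are always quoted.
    $pattern = '#\bsrc\s*=\s*(["\'])(.*?)\1#is';

    if (!preg_match_all($pattern, $tag, $matches) || count($matches[2]) === 0) {
        return null; // no quoted src attribute found
    }

    // Keep the final occurrence and trim stray whitespace inside the quotes.
    return trim(end($matches[2]));
}

// The two sample tags from the post.
$tags = [
    '<img onError= src="http://images.play.com/SiteCSS/Play/Live2/2010032301/img/proxy/01m.gif" src="http://images.play.com/covers/10667429m.jpg" alt="Tim Burton\'s Alice In Wonderland" style="border-width:0px;height:178px;width:117px;" />',
    '<IMG SRC="http://images.play.com/banners/content/Alice 6.jpg " ALT="Alice In Wonderland" />',
];

foreach ($tags as $tag) {
    echo lastImgSrc($tag), PHP_EOL;
}
```

    The trim() call is there because the second sample tag has a trailing space inside its src value.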
  2. #2
  3. kill 9, $$;
    Devshed Supreme Being (6500+ posts)

    Join Date
    Sep 2001
    Location
    Shanghai, An tSín
    Posts
    6,878
    Rep Power
    3889
    Using regexps for parsing markup like HTML is generally a bad idea. It's usually advisable to use a proper tag-aware HTML parser for this.
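
    As a rough sketch of the parser route, assuming PHP's bundled DOM extension is enabled: load the markup with DOMDocument::loadHTML() and read the src attribute from each img element. Only the well-formed second tag from the first post is used here, because how the malformed first tag gets parsed depends on the underlying libxml library and is not something this sketch promises.

```php
<?php
// Sample markup: the well-formed second tag from the first post.
$html = '<IMG SRC="http://images.play.com/banners/content/Alice 6.jpg " ALT="Alice In Wonderland" />';

// Collect parse warnings internally instead of emitting them as PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$sources = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    // getAttribute() returns an empty string when the attribute is missing.
    $sources[] = trim($img->getAttribute('src'));
}

print_r($sources);
```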
  4. #3
  5. Banned

    Join Date
    Jul 2004
    Location
    The Mews At Windsor Heights
    Posts
    5,326
    Rep Power
    0
    Ha

    I started out using the preg_ functions, then I changed to DOMDocument->loadHTML(), and now I have changed back to preg_ again.

    DOMDocument is slow and doesn't pick up all of the image tags when they are "abnormal".

    I changed primarily due to this thread:

    http://forums.devshed.com/php-develo...es-687448.html
  6. #4
  7. Sarcky
    Devshed Supreme Being (6500+ posts)

    Join Date
    Oct 2006
    Location
    Pennsylvania, USA
    Posts
    10,885
    Rep Power
    6354
    Your "edit" pattern looks fine. I don't even know how they manage to produce image tags like this; I didn't think it was valid.

    You can also try actually stepping through the string, as distasteful as that may be.
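
    One possible reading of "stepping through the string", sketched with plain string functions: find the last "src=" with strripos(), then read up to the matching closing quote. The helper name lastSrcByScanning() is made up for illustration. It assumes quoted values and assumes that "src=" never appears later in the tag inside another attribute's value.

```php
<?php
// Hypothetical helper (not from the thread): find the last src= without a regex.
function lastSrcByScanning(string $tag): ?string
{
    $pos = strripos($tag, 'src=');   // last occurrence, case-insensitive
    if ($pos === false) {
        return null;
    }

    $start = $pos + 4;               // index of the character right after "src="
    $quote = $tag[$start] ?? '';
    if ($quote !== '"' && $quote !== "'") {
        return null;                 // assumption: only quoted values are handled
    }

    $end = strpos($tag, $quote, $start + 1);
    if ($end === false) {
        return null;                 // unterminated value
    }

    // Extract the text between the quotes and trim stray whitespace.
    return trim(substr($tag, $start + 1, $end - $start - 1));
}

// First sample tag from the thread, with its duplicate src attribute.
$tag = '<img onError= src="http://images.play.com/SiteCSS/Play/Live2/2010032301/img/proxy/01m.gif" src="http://images.play.com/covers/10667429m.jpg" alt="Tim Burton\'s Alice In Wonderland" style="border-width:0px;height:178px;width:117px;" />';

echo lastSrcByScanning($tag), PHP_EOL;
```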

    -Dan
