March 29th, 2010, 10:37 AM
Abnormal IMG SRC?
I've wasted too many hours trying to figure this out.
Can I use a single pattern to extract the (correct) src attribute of image tags (even if there are multiple src attributes)?
E.g. what pattern can get the 2nd src attribute of the 1st tag and the only src attribute of the 2nd tag?
<img onError= src="http://images.play.com/SiteCSS/Play/Live2/2010032301/img/proxy/01m.gif" src="http://images.play.com/covers/10667429m.jpg" alt="Tim Burton's Alice In Wonderland" style="border-width:0px;height:178px;width:117px;" />
<IMG SRC="http://images.play.com/banners/content/Alice 6.jpg " ALT="Alice In Wonderland" />
This is the latest pattern I have tried:
define('IMG_SRC_PATTERN', '#[^onError= ]*src=[\"\']?([^"\']+)#i');
preg_match(IMG_SRC_PATTERN, $tag, $match);
I think I may have stumbled onto the pattern I need, but I'm not sure if it is efficient or not. Can anyone advise?
$pattern = "#(= src=['\"].+[^\"]?)?src=[\"']?([^\"']+)#i";
March 29th, 2010, 11:02 AM
Using regexps for parsing markup like HTML is generally a bad idea. It's usually advisable to use a proper tag-aware HTML parser for this.
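For reference, a parser-based approach might look roughly like the sketch below. It uses PHP's bundled DOMDocument with libxml's internal error handling switched on so that malformed markup does not emit warnings. The function name imgSourcesFromDom is hypothetical, and which src value the parser keeps for the malformed first tag depends on libxml's error recovery, so check the output against your own data rather than taking it on trust.

```php
<?php
// Hypothetical helper (name is an assumption, not from the thread):
// returns the trimmed src value of every <img> found in an HTML fragment.
function imgSourcesFromDom(string $html): array
{
    $doc = new DOMDocument();

    // Collect libxml's complaints about malformed markup instead of
    // emitting PHP warnings, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        if ($img->hasAttribute('src')) {
            $sources[] = trim($img->getAttribute('src'));
        }
    }
    return $sources;
}

// The two tags from the question, kept verbatim in a nowdoc.
$html = <<<'HTML'
<img onError= src="http://images.play.com/SiteCSS/Play/Live2/2010032301/img/proxy/01m.gif" src="http://images.play.com/covers/10667429m.jpg" alt="Tim Burton's Alice In Wonderland" style="border-width:0px;height:178px;width:117px;" />
<IMG SRC="http://images.play.com/banners/content/Alice 6.jpg " ALT="Alice In Wonderland" />
HTML;

print_r(imgSourcesFromDom($html));
```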
March 29th, 2010, 11:29 AM
I started using preg_ functions, then I changed to DomDocument->loadHTML(), and now I have changed back to preg_ again.
DomDocument is slow and doesn't pick up all of the image tags when they are "abnormal".
I changed primarily due to this thread:
March 29th, 2010, 12:48 PM
Your "edit" pattern looks fine. I don't even know how they manage to produce image tags like this; I didn't think it was valid.
You can also try actually stepping through the string, as distasteful as that may be.
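A minimal sketch of what stepping through the string could look like, using only plain string functions: find each <img ...> tag, locate the last "src=" inside it, and read the value that follows. The function name lastImgSources is hypothetical. The sketch assumes that ">" never appears inside an attribute value and that "src=" does not occur inside another attribute (for example data-src or an alt text), so treat it as a starting point rather than a robust parser.

```php
<?php
// Hypothetical helper (name is an assumption, not from the thread):
// walks the string tag by tag and returns, for each <img>, the value
// of the LAST src attribute it contains.
function lastImgSources(string $html): array
{
    $sources = [];
    $offset = 0;

    while (($start = stripos($html, '<img', $offset)) !== false) {
        $end = strpos($html, '>', $start);
        if ($end === false) {
            break; // unterminated tag: stop scanning
        }
        $tag = substr($html, $start, $end - $start + 1);
        $offset = $end + 1;

        // Last "src=" in the tag, matched case-insensitively.
        $pos = strripos($tag, 'src=');
        if ($pos === false) {
            continue; // no src attribute in this tag
        }

        $rest = substr($tag, $pos + 4);
        $quote = $rest[0] ?? '';
        if ($quote === '"' || $quote === "'") {
            // Quoted value: read up to the matching closing quote.
            $close = strpos($rest, $quote, 1);
            if ($close === false) {
                continue; // unbalanced quote: skip this tag
            }
            $value = substr($rest, 1, $close - 1);
        } else {
            // Unquoted value: read up to whitespace or ">".
            $value = substr($rest, 0, strcspn($rest, " \t\r\n>"));
        }
        $sources[] = trim($value);
    }
    return $sources;
}

// The two tags from the question, kept verbatim in a nowdoc.
$html = <<<'HTML'
<img onError= src="http://images.play.com/SiteCSS/Play/Live2/2010032301/img/proxy/01m.gif" src="http://images.play.com/covers/10667429m.jpg" alt="Tim Burton's Alice In Wonderland" style="border-width:0px;height:178px;width:117px;" />
<IMG SRC="http://images.play.com/banners/content/Alice 6.jpg " ALT="Alice In Wonderland" />
HTML;

print_r(lastImgSources($html));
```

For the two sample tags this should yield the covers/10667429m.jpg URL for the first tag and the banners URL (with the trailing space trimmed) for the second.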