October 7th, 2000, 08:02 PM
On a quick note, what exactly is the difference between basic regexps and PCRE?
Anyway, I'm trying to parse <a href...> tags on a page and get both the page each one links to and the title of the link.
So from a link I want to get, say, /about.html and "About", and it needs to work with images and their alt attributes too. It's a piece of cake if you're only looking for simple links, but with style sheets and font declarations inside the links it's a pain. Here's what I have so far, which sorta works:
preg_match_all('~<a\s+href\s*=\s*["\']([^"\']*)["\']\s?> (<[^>]+>(.*)<[^>]+></a> | .*alt\s*=\s*["\'](.*)["\'][^>]*></a> )~iUsx', $this->sFile, $matches, PREG_SET_ORDER);
As you can probably see, it's rather complicated. Has anyone got a simpler, better, or properly working way of doing it?
Thanks for the help,
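A minimal sketch of the extraction described above, assuming href values are always quoted and anchors are never nested. The function name extract_links() is hypothetical; preg_match_all(), preg_match() and strip_tags() are standard PHP. Instead of one big alternation, it grabs whatever sits between <a ...> and </a>, strips inner markup such as <font> or <span>, and falls back to an <img> alt attribute when no visible text is left.

```php
<?php
// Sketch: pull (href, title) pairs out of <a> tags with PCRE.
// Assumptions: href values are quoted, links are not nested, and
// extract_links() is a hypothetical helper, not part of PHP.
function extract_links(string $html): array
{
    $links = [];
    // \1 makes the closing quote match the opening one; .*? keeps each match
    // to a single <a>...</a>; the s flag lets a link span several lines.
    $pattern = '~<a\b[^>]*?\bhref\s*=\s*(["\'])(.*?)\1[^>]*>(.*?)</a>~is';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        $inner = $m[3];
        // Drop <font>, <span>, <b> etc. and keep only the visible text.
        $title = trim(strip_tags($inner));
        // Image-only link: fall back to the alt attribute of the <img>.
        if ($title === '' && preg_match('~<img\b[^>]*\balt\s*=\s*(["\'])(.*?)\1~i', $inner, $alt)) {
            $title = $alt[2];
        }
        $links[] = ['href' => $m[2], 'title' => $title];
    }
    return $links;
}

$page = '<a href="/about.html"><font color="red">About</font></a>'
      . "<a class='nav' href='/home.html'><img src='home.gif' alt='Home'></a>";
print_r(extract_links($page));
```

For messy real-world HTML, loading the page with PHP's DOMDocument::loadHTML() and walking getElementsByTagName('a') is usually sturdier than any regexp; the regex version is shown here because that is what the question asks about.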
October 9th, 2000, 05:05 PM
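The "basic" regexps in PHP are the ereg() family, which follow the POSIX extended regular expression syntax common to Unix tools.
PHP's preg_*() functions use the PCRE library (Perl Compatible Regular Expressions), which implements Perl-style regex syntax. That is where features such as lazy quantifiers (.*?) and pattern modifiers like the U and x in your pattern come from.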
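To illustrate one Perl-style feature the preg_*() functions offer and POSIX regexps lack, here is a short sketch comparing greedy and lazy matching on a made-up two-link string. It also shows that the U modifier from the question simply flips the default, so a plain .* behaves like .*?.

```php
<?php
// Greedy vs. lazy matching with PHP's PCRE (preg_*) functions.
// Lazy quantifiers such as .*? come from Perl; POSIX regexps lack them.
$html = '<a href="/a.html">A</a> <a href="/b.html">B</a>';

// Greedy: .* runs to the LAST '">' in the string, so one match swallows both links.
preg_match_all('~href="(.*)">~', $html, $greedy);

// Lazy: .*? stops at the FIRST '">', giving one match per link.
preg_match_all('~href="(.*?)">~', $html, $lazy);

// The U modifier inverts greediness, so plain .* acts like .*? here.
preg_match_all('~href="(.*)">~U', $html, $ungreedy);

print_r($greedy[1]);
print_r($lazy[1]);
print_r($ungreedy[1]);
```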
Where are the URLs coming from? Are you generating them yourself, or could they come from anywhere? Much of the regexp in your example exists only to allow for optional whitespace. If you can rely on a strict HREF format, you could make the regexp much shorter.
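As a sketch of that point, suppose every link is written in one fixed, hypothetical house format: <a href="URL">Title</a>, with double quotes, a single space, no other attributes and no tags inside the link text. Under that assumption the optional-whitespace handling disappears and the pattern shrinks to a single line.

```php
<?php
// Assumes every link follows one strict, hypothetical format:
//   <a href="URL">Title</a>
// double quotes, a single space, no other attributes, no tags inside.
$html = 'See <a href="/about.html">About</a> and <a href="/faq.html">FAQ</a>.';

// With the format fixed there is no optional whitespace to allow for.
preg_match_all('~<a href="([^"]*)">([^<]*)</a>~', $html, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m[1], ' => ', $m[2], "\n";
}
```

The trade-off is brittleness: any link that strays from the format, such as one with single quotes or a <font> tag inside, is silently skipped.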