Thread: Regexps

  1. Junior Member
    Devshed Newbie (0 - 499 posts)
    Join Date: Jan 2000

    Hi all,

    On a quick note, what exactly is the difference between basic regexps and PCRE?

    Anyway, I'm trying to parse <a href... > tags on a page and pull out both the page each one links to and the title of the link.

    So from a link I'd want to get, say, /about.html and "About". It needs to work with images and alt attributes too. It's a piece of cake if you're only looking for simple links, but with style sheets and font declarations inside links it's a pain. Here's what I have so far, which sort of works:

    preg_match_all("'<a\s+href\s*= \s*[\"\']([^\"\'=])*[\"\']\s?> (<[^>]+>(.*)<[^>]+></a> | .*alt\s*=\s*[\"\'](.*)[\"\']></a> )'iUx", $this->sFile, $matches, PREG_SET_ORDER);

    As you can probably see, it's rather complicated. Has anyone got a simpler, better, or properly working way of doing it?

    Thanks for help,

  2. Contributing User
    Devshed Newbie (0 - 499 posts)
    Join Date: Aug 2000

    Hey Billy,

    Basic regexps are regular expressions in the traditional sense, as they are generally defined within the programming community. In PHP that means the POSIX-style ereg functions.

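    PHP's PCRE functions (the preg_ family) use the PCRE library, which implements Perl-compatible regular expression syntax. That adds Perl-style features such as non-greedy quantifiers and shorthand classes like \s.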
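
    As a rough illustration of one Perl-style feature available through the preg_ functions, here is a small sketch comparing a greedy and a non-greedy quantifier. The sample string and variable names are made up for the example and are not from the original post.

```php
<?php
// Hypothetical sample input, invented for illustration.
$html = '<b>one</b> and <b>two</b>';

// Greedy: .* runs on to the LAST </b>, so the capture spans both tags.
preg_match('/<b>(.*)<\/b>/', $html, $greedy);

// Non-greedy (Perl-style): .*? stops at the FIRST </b>.
preg_match('/<b>(.*?)<\/b>/', $html, $lazy);

echo $greedy[1], "\n"; // one</b> and <b>two
echo $lazy[1], "\n";   // one
```

    The non-greedy form is usually what you want when pulling the contents out of a single tag.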

    Where are the URLs coming from? Are you creating them yourself, or could they be coming from anywhere? Much of the regexp syntax in your example is there only to allow for optional whitespace. If you could rely on a strict HREF format, you could greatly reduce the length of the regexp.
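
    A minimal sketch of what that could look like, assuming every link is written exactly as <a href="..."> with double quotes and no other attributes. The sample page, the $links array, and the fallback to the alt attribute are assumptions for illustration, not something from your code.

```php
<?php
// Hypothetical sample page; in the original post the input is $this->sFile.
$page = '<p><a href="/about.html"><font size="2">About</font></a> '
      . '<a href="/home.html"><img src="h.gif" alt="Home"></a></p>';

// Strict format assumed: <a href="..."> and nothing else in the opening tag.
// 'i' = case-insensitive, 's' = dot also matches newlines.
preg_match_all('#<a href="([^"]*)">(.*?)</a>#is', $page, $matches, PREG_SET_ORDER);

$links = [];
foreach ($matches as $m) {
    $inner = $m[2];
    // Prefer the visible text with any nested tags (font, span, ...) stripped.
    $title = trim(strip_tags($inner));
    // If there is no text, fall back to the alt attribute of an image.
    if ($title === '' && preg_match('#alt="([^"]*)"#i', $inner, $alt)) {
        $title = $alt[1];
    }
    $links[] = ['href' => $m[1], 'title' => $title];
}

print_r($links);
```

    Splitting the job in two (one short pattern to find the link, then strip_tags or a second small pattern for the title) keeps each regexp readable. It will miss links that don't follow the assumed format, so it only makes sense if you control the markup.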

