#1

    Identify external links in a string and add text to them


    Guys and girls,

    I'm using PHP. I usually like to work these things out myself but I'm totally stuck.

    Taking a clump of text e.g. from a CMS entry, I want to check all the links in the text and for those that do not contain

    "folder1/folder2"

    in the URL path, I want to add some text along the lines of

    "(this link will take you away from this section)"

    to the text in the anchor tag, for all links in the text provided.

    E.g.
    Code:
    <a href="some/other/place">my link</a>
    would become
    Code:
    <a href="some/other/place">my link (this link will take you away from this section)</a>
    I got as far as this for the expression:
    PHP Code:
    preg_match('/<a.+href=".*folder1/folder2.*".*>.+</a>/i', $testString)
    but couldn't even get that to work...

    Really appreciate some help with this. Many thanks.
#2
    PHP Code:
    preg_replace('#(<a href="((?!folder1/folder2)[^"])+"[^>]*>)(.*?)</a>#is', '$1$3 (this link will take you away from this section)</a>', $text);
    If you need that explained (it's okay if you do) just say something. I would now but I just woke up and I feel weird. Not sick, just... weird. I could probably give a good explanation now but I think it'd be best if I waited a bit before trying to talk at great lengths about something kinda complicated.

    [edit] It's amazing what food can do to you. Except for the Raisin Bran aftertaste I feel pretty good.
    Code:
    (<a href="((?!folder1/folder2)[^"])+"[^>]*>)(.*?)</a>
    The important part is ((?!folder1/folder2)[^"])+. (?!...) is a negative lookahead: it means the text must not continue with ... at this point. Put it before a [^"] and you get "the next character isn't a " and it isn't the start of folder1/folder2 either". That only consumes one character, so it gets grouped with a () and repeated with a +.
    (There are a couple other ways of doing the same thing but that's the method I prefer.)
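    To see the lookahead trick in action, here is a minimal sketch that wraps the same pattern in a function. The function name mark_external_links, its parameters, and the sample URLs are made up for illustration; only the pattern itself comes from the post above. Like the original, it assumes href is the first attribute straight after "<a ".

```php
<?php
// Hypothetical helper: appends a note to every link whose href does NOT
// contain $section. Name and parameters are assumptions, not from the thread.
function mark_external_links(string $html, string $section = 'folder1/folder2'): string
{
    $note = ' (this link will take you away from this section)';

    // Each href character must not be a quote and must not be the start of
    // $section; one such character is grouped and repeated with +.
    $pattern = '#(<a href="((?!' . preg_quote($section, '#') . ')[^"])+"[^>]*>)(.*?)</a>#is';

    // $1 is the opening tag, $3 is the original link text
    // ($2 is just the last href character matched by the repeated group).
    return preg_replace($pattern, '$1$3' . $note . '</a>', $html);
}

$text = '<a href="some/other/place">my link</a> and '
      . '<a href="/folder1/folder2/page">local link</a>';

echo mark_external_links($text), "\n";
```

    Only the first link should pick up the note. For the second one the repeated group stops right before folder1/folder2, the closing quote isn't there yet, and the match fails, so the link is left alone.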

    The rest is simple if you know that .*? is the same as .* except it matches as few characters as possible.
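    If the difference between .* and .*? isn't obvious, a small comparison may help. This is just a sketch with a made-up two-link string; the variable names are arbitrary.

```php
<?php
// Hypothetical input with two links on one line.
$html = '<a href="a">one</a> <a href="b">two</a>';

// Greedy: .* takes as much as it can, so the match runs to the LAST </a>.
preg_match('#<a[^>]*>(.*)</a>#is', $html, $greedy);

// Lazy: .*? takes as little as it can, so the match stops at the FIRST </a>.
preg_match('#<a[^>]*>(.*?)</a>#is', $html, $lazy);

echo $greedy[1], "\n"; // one</a> <a href="b">two
echo $lazy[1], "\n";   // one
```

    With the greedy version, a replacement would swallow everything between the first opening tag and the last closing tag, which is why the pattern above uses .*? for the link text.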
    Last edited by requinix; January 30th, 2009 at 01:48 PM.
