#1
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date: May 2005 | Posts: 48 | Rep Power: 10

    Replacing plain text URLs with links dilemma


    Okay, replacing plain-text URLs with links is pretty easy, but I've run into an issue: the way I'm doing the replacement, it also reaches into the HTML (this is for a BBCode-style class).

    So if I have: http://wat.com
    and <img src="http://wat.com">

    It would replace both, and then I'd end up with:
    <img src="<a href="http://wat.com" height="y" width="x" />

    Etc. Right now I'm using
    Code:
    /((((http|https|ftp):\/\/)|(www\.))(.*?)([,:%#&\/?=\w+\.-]+))/is
    to find and replace the links, but I need it to skip a URL when src=" comes right before it. How can I do that? I've tried adding ^[src.*?], but it has no effect.
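To see why the src attribute gets rewritten, here is a minimal Python translation of the pattern above, using Python's re module in place of PHP's preg_replace (the function name linkify and the replacement format are assumptions for illustration). It also shows what ^[src.*?] actually means: ^ anchors the match to the start of the string, and [src.*?] is a character class that matches exactly one of the characters s, r, c, ., * or ?. Neither part can express "not preceded by src="; that needs a negative lookbehind such as (?<!...).

```python
import re

# The pattern from the post, translated to Python syntax:
# no /.../ delimiters, no \/ escapes, and /i /s become flags.
URL_PATTERN = re.compile(
    r"((((http|https|ftp)://)|(www\.))(.*?)([,:%#&/?=\w+\.-]+))",
    re.IGNORECASE | re.DOTALL,
)


def linkify(text: str) -> str:
    """Hypothetical helper: wrap every match in <a>, with no awareness of HTML."""
    return URL_PATTERN.sub(r'<a href="\1">\1</a>', text)


html = 'See http://wat.com and <img src="http://wat.com">'
# Both URLs are replaced, including the one inside src="...".
print(linkify(html))

# [src.*?] is a character class: it matches ONE of s, r, c, ., *, ?
print(bool(re.fullmatch(r"[src.*?]", "s")))    # True
print(bool(re.fullmatch(r"[src.*?]", "src")))  # False
```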
#2
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date: Jun 2007 | Posts: 1,513 | Rep Power: 1424
    If it doesn't get more complex, this might work:
    Code:
    /(?<!(src|href)=(['"]|))((((http|https|ftp):\/\/)|(www\.))(.*?)([,:%#&\/?=\w+\.-]+))/is
    A regular-expression-based BBCode parser isn't a good idea with every regex library, though. If we're talking about PHP, for example, you'll probably hit a wall pretty soon.

    Regards, Jens
    Last edited by JClasen; April 8th, 2010 at 12:00 PM.
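Whether this pattern compiles depends on the regex engine. Engines that only allow fixed-width lookbehind reject it, because (src|href) has branches of 3 and 4 characters and (['"]|) has branches of 1 and 0 characters. A quick check with Python's re module, which has that restriction, used here in place of PHP (the helper compiles() is hypothetical):

```python
import re

# The first suggestion from post #2, minus the PHP delimiters and \/ escapes.
# Its lookbehind holds (src|href) (3 or 4 chars) and (['"]|) (1 or 0 chars),
# so the lookbehind has no single fixed width.
VARIABLE_LOOKBEHIND = r"""(?<!(src|href)=(['"]|))((((http|https|ftp)://)|(www\.))(.*?)([,:%#&/?=\w+\.-]+))"""


def compiles(pattern: str) -> bool:
    """Hypothetical helper: True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern, re.IGNORECASE | re.DOTALL)
    except re.error as exc:
        # CPython reports e.g. "look-behind requires fixed-width pattern"
        print("re.error:", exc)
        return False
    return True


print(compiles(VARIABLE_LOOKBEHIND))  # False
```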
#3
    Did you steal it?
    Devshed Supreme Being (6500+ posts)

    Join Date: Mar 2007 | Location: Washington, USA | Posts: 14,074 | Rep Power: 9398
    What I do in situations like this is split the string into HTML and non-HTML parts (with PHP it's preg_split). Then I can apply some replacements to HTML text and other replacements to non-HTML text.
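A minimal sketch of that split-then-replace idea, written with Python's re.split rather than PHP's preg_split (the tag pattern, the function name and the replacement format are assumptions). Because the tag pattern is wrapped in a capturing group, re.split keeps the tags in the result list, similar to preg_split with the PREG_SPLIT_DELIM_CAPTURE flag; URLs are then linked only in the text pieces.

```python
import re

# Assumed tag pattern: anything from "<" to the next ">". The capturing group
# makes re.split return the tags too, interleaved with the text between them.
TAG = re.compile(r"(<[^>]*>)")

# The URL pattern from post #1, translated to Python (no delimiters or \/).
URL = re.compile(
    r"((((http|https|ftp)://)|(www\.))(.*?)([,:%#&/?=\w+\.-]+))",
    re.IGNORECASE | re.DOTALL,
)


def linkify_text_only(html: str) -> str:
    """Hypothetical helper: link URLs in text, leave everything inside tags alone."""
    parts = TAG.split(html)
    for i, part in enumerate(parts):
        # With one capturing group, even indexes hold text and odd indexes hold tags.
        if i % 2 == 0:
            parts[i] = URL.sub(r'<a href="\1">\1</a>', part)
    return "".join(parts)


print(linkify_text_only('See http://wat.com and <img src="http://wat.com">'))
```

One limitation of this sketch: text that already sits between <a> and </a> is still treated as plain text, so a URL used as link text would be wrapped a second time; tracking whether the loop is currently inside an <a> element would avoid that.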
#4
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date: May 2005 | Posts: 48 | Rep Power: 10
    Originally Posted by JClasen
    If it doesn't get more complex, this might work:
    Code:
    /(?<!(src|href)=(['"]|))((((http|https|ftp):\/\/)|(www\.))(.*?)([,:%#&\/?=\w+\.-]+))/is
    A regular-expression-based BBCode parser isn't a good idea with every regex library, though. If we're talking about PHP, for example, you'll probably hit a wall pretty soon.

    Regards, Jens
    I don't see why, but that won't work.

    And I'm applying this after the BBCode has already been replaced, but maybe I'll look into another way. Right now I match the 'tags', list them, and then use a switch() to replace the attributes, the inner text, and the tags themselves.

    Is that still an inefficient way to do it?
#5
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date: Jun 2007 | Posts: 1,513 | Rep Power: 1424
    It seems PHP's regex extension (PCRE) doesn't support lookbehind assertions of variable length. Try this instead:

    Code:
    /(?<!src=['"])(?<!href=['"])((((http|https|ftp):\/\/)|(www\.))(.*?)([,:%#&\/?=\w+\.-]+))/is
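For comparison, the same fixed-width version in Python's re module, which, like PHP's PCRE, accepts a lookbehind as long as its width is fixed (each of the two lookbehinds here spans exactly 5 or 6 characters). The function name and replacement format are assumptions. The second call shows a limitation worth knowing: the lookbehind only guards the position where a match starts, and the pattern also accepts a bare www., so a www. host inside an attribute value can still be matched from the www. onward.

```python
import re

# Post #5's pattern: two fixed-width negative lookbehinds, src=" / src=' (5 chars)
# and href=" / href=' (6 chars), instead of one variable-width lookbehind.
URL_NOT_IN_ATTR = re.compile(
    r"""(?<!src=['"])(?<!href=['"])((((http|https|ftp)://)|(www\.))(.*?)([,:%#&/?=\w+\.-]+))""",
    re.IGNORECASE | re.DOTALL,
)


def linkify_outside_attrs(text: str) -> str:
    """Hypothetical helper: wrap each URL not directly after src=/href= in <a>."""
    return URL_NOT_IN_ATTR.sub(r'<a href="\1">\1</a>', text)


# The plain URL is linked; the one right after src=" is left alone.
print(linkify_outside_attrs('See http://wat.com and <img src="http://wat.com">'))

# Limitation: matching can restart at "www." inside the attribute value,
# where the preceding text is "//" rather than src=".
print(linkify_outside_attrs('<img src="http://www.wat.com">'))
```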
    As for your last question: yes, that should work better.

    Regards, J.

    Comments on this post

    • requinix agrees: indeed it does not
#6
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date: May 2005 | Posts: 48 | Rep Power: 10
    That it did

    Awesome :-P good job
