#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2011
    Posts
    8
    Rep Power
    0

    Using regex to NOT select emails but only links


    I currently have this regex that my boss helped me to create:

    (?i)(?<fullpath>(((https?:(//|\\\\))|(mailto:)|(ftp:(//|\\\\)(\b\w+:?\b\w*[^@])?)))?(?<url>[^\s#]*\.(com|net|org|edu|gov|co|uk|us|info)(?!.*[\<\'])(([^\s\.]*(\.\w)?))*))

    I use this regex in Dot Net Nuke's Active Forums module on our website to make links that are posted in the forums clickable (so you don't have to copy and paste). The problem is, it sees emails as links. I created another regex that finds emails and makes them "mailto:" links:

    (?i)(?<fullemail>([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}(?!.*[\<\'])))

    I cannot seem to get the first regex to do two things:

    1. Don't find emails
    2. Find links that are inline (i.e. links that follow other words rather than starting the line)

    Any help would be greatly appreciated!

    Jeff
    Last edited by requinix; August 19th, 2011 at 05:03 PM. Reason: disabled smilies
  2. #2
  3. Did you steal it?
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    14,054
    Rep Power
    9398
    I don't get this. You say the first expression makes links of email addresses and that's a "problem", so then you create something to make links of email addresses?
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2011
    Posts
    8
    Rep Power
    0
    Originally Posted by requinix
    I don't get this. You say the first expression makes links of email addresses and that's a "problem", so then you create something to make links of email addresses?
    My mistake. What I meant to say is that the first expression turns URLs into <a href="url"> links, since Dot Net Nuke doesn't do this itself. If you post a URL in the forum, readers have to copy and paste it into the browser's address bar to navigate there. But I also want posts that contain an email address to display it as an <a href="mailto:email@address.com"> link, if that makes better sense.
  6. #4
  7. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2011
    Posts
    8
    Rep Power
    0
    Still cannot figure this out! Can anyone help?
  8. #5
  9. Sarcky
    Devshed Supreme Being (6500+ posts)

    Join Date
    Oct 2006
    Location
    Pennsylvania, USA
    Posts
    10,905
    Rep Power
    6351
    The first regexp is finding mailto links when it shouldn't, you say. Take out the "mailto" and the "@" from the first expression and it will stop matching emails.

    As for your second question, maybe there is a flag somewhere that's anchoring this expression to the beginning of the line; nothing you've posted suggests it would be anchored that way. Both expressions contain the <full*> thing, maybe that's it? I can't tell without more data.
    HEY! YOU! Read the New User Guide and Forum Rules

    "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin

    "The greatest tragedy of this changing society is that people who never knew what it was like before will simply assume that this is the way things are supposed to be." -2600 Magazine, Fall 2002

    Think we're being rude? Maybe you asked a bad question or you're a Help Vampire. Trying to argue intelligently? Please read this.
  10. #6
  11. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2011
    Posts
    8
    Rep Power
    0
    Originally Posted by ManiacDan
    The first regexp is finding mailto links when it shouldn't, you say. Take out the "mailto" and the "@" from the first expression and it will stop matching emails.

    As for your second question, maybe there is a flag somewhere that's anchoring this expression to the beginning of the line; nothing you've posted suggests it would be anchored that way. Both expressions contain the <full*> thing, maybe that's it? I can't tell without more data.
    Even after I remove the mailto:, it still finds the email. Also, the [^@] is an exclusion: it matches any character BUT @. I've got the other part worked out.
  12. #7
  13. Did you steal it?
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    14,054
    Rep Power
    9398
    If the expression you have is now
    Code:
    (?i)(?<fullpath>(((https?:(//|\\\\))|(ftp:(//|\\\\)(\b\w+:?\b\w*[^@])?)))?(?<url>[^\s#]*\.(com|net|org|edu|gov|co|uk|us|info)(?!.*[\<\'])(([^\s\.]*(\.\w)?))*))
    then there is no way it will match against
    Code:
    mailto:foo@example.com
  14. #8
  15. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2011
    Posts
    8
    Rep Power
    0
    Originally Posted by requinix
    If the expression you have is now
    Code:
    (?i)(?<fullpath>(((https?:(//|\\\\))|(ftp:(//|\\\\)(\b\w+:?\b\w*[^@])?)))?(?<url>[^\s#]*\.(com|net|org|edu|gov|co|uk|us|info)(?!.*[\<\'])(([^\s\.]*(\.\w)?))*))
    then there is no way it will match against
    Code:
    mailto:foo@example.com
    This is correct. That part is fixed. But, it still finds:

    foo@example.com

    and makes it:

    <a href="foo@example.com">foo@example.com</a>

    I need for it not to find foo@example.com.

    The unfortunate part is, I am doing this in Dot Net Nuke and I have no control over which filter goes first. If I did, I'd put the second filter above the first filter and all would work properly.
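    One likely reason the bare address still matches: the [^\s#]* at the start of the url group excludes only whitespace and #, so it happily accepts "foo@example" in front of ".com". Below is a minimal sketch of that behaviour in Python's re module, used here only for illustration (Dot Net Nuke runs .NET regex, which differs in some details). NAIVE_URL and STRICT_URL are hypothetical, simplified stand-ins, not the full expressions from this thread, and the tightened version is just one possible approach.

```python
import re

# Simplified stand-in for the thread's url group: the prefix class [^\s#]*
# allows "@", so an email address is accepted as if it were a URL.
NAIVE_URL = re.compile(r"(?i)[^\s#]*\.(?:com|net|org|edu|gov)")

# One possible tightening (an assumption, not the thread's final answer):
# exclude "@" from the prefix, and refuse to start a match right after
# "@", ".", "-" or a word character so the tail of an address is skipped too.
STRICT_URL = re.compile(r"(?i)(?<![\w@.-])[^\s#@]*\.(?:com|net|org|edu|gov)")

text = "mail bob@example.com or visit www.example.org/docs"

naive_hits = NAIVE_URL.findall(text)    # includes the email address
strict_hits = STRICT_URL.findall(text)  # only the real link remains

print(naive_hits)
print(strict_hits)
```

    Note that excluding "@" from the character class alone would not be enough in this sketch: the search would simply restart after the "@" and pick up "example.com", which is why the lookbehind is there as well.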
  16. #9
  17. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2011
    Posts
    8
    Rep Power
    0
    Originally Posted by dyetube
    This is correct. That part is fixed. But, it still finds:

    foo@example.com

    and makes it:

    <a href="foo@example.com">foo@example.com</a>

    I need for it not to find foo@example.com.

    The unfortunate part is, I am doing this in Dot Net Nuke and I have no control over which filter goes first. If I did, I'd put the second filter above the first filter and all would work properly.
    I did figure out something... Apparently you CAN set the order of the way the filters work. After changing the order, these expressions now work. I do still have one problem though, .co.uk address doesn't work correctly. But this seems to be a Dot Net Nuke issue, not my regex filter. I will try and figure it out.
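    A plausible explanation for why the order matters is the (?!.*[\<\']) lookahead that both expressions share: once one filter has wrapped a match in <a>...</a>, the later "<" stops the other filter from matching it again. Here is a rough Python sketch of the two orderings. EMAIL, URL, link_emails and link_urls are hypothetical, simplified stand-ins, and the replacement strings are assumed from the output shown earlier in the thread rather than taken from Dot Net Nuke.

```python
import re

# Simplified stand-ins (assumptions) that keep the thread's lookahead:
# a match is rejected when "<" or "'" appears later on the same line.
EMAIL = re.compile(r"(?i)[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}(?!.*[<'])")
URL = re.compile(r"(?i)[^\s#]*\.(?:com|net|org|edu|gov)(?!.*[<'])")


def link_emails(text: str) -> str:
    """Wrap each email match in a mailto: anchor."""
    return EMAIL.sub(r'<a href="mailto:\g<0>">\g<0></a>', text)


def link_urls(text: str) -> str:
    """Wrap each URL match in a plain anchor."""
    return URL.sub(r'<a href="\g<0>">\g<0></a>', text)


post = "Mail foo@example.com or see example.org"

# Email filter first: the address is already inside <a>...</a>, so the
# URL filter's lookahead sees "<" after it and leaves it alone.
email_first = link_urls(link_emails(post))

# URL filter first: the address is linked as if it were a URL, and the
# email filter can no longer touch it.
url_first = link_emails(link_urls(post))

print(email_first)
print(url_first)
```

    In this sketch the second ordering reproduces the <a href="foo@example.com"> result described above, while the first one yields the mailto: link.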
  18. #10
  19. Sarcky
    Devshed Supreme Being (6500+ posts)

    Join Date
    Oct 2006
    Location
    Pennsylvania, USA
    Posts
    10,905
    Rep Power
    6351
    Originally Posted by dyetube
    I do still have one problem though, .co.uk address doesn't work correctly.
    Your code contains:
    Code:
    \.(com|net|org|edu|gov|co|uk|us|info)
    That will match .co OR .uk, not .co.uk
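    To illustrate the point about the alternation, here is a small Python sketch that tests the TLD group on its own. ORIGINAL_TLD copies the group quoted above; EXTENDED_TLD and first_tld are hypothetical names, and listing co\.uk ahead of co is only one possible way to treat the suffix as a single unit. In the full expression, the greedy prefix in front of this group may change which alternative ends up being used, so treat this as a demonstration of the group in isolation.

```python
import re

# The TLD group exactly as quoted in the post above.
ORIGINAL_TLD = re.compile(r"(?i)\.(com|net|org|edu|gov|co|uk|us|info)")

# Hypothetical variant: the two-part suffix is listed before its pieces,
# so it is tried first and captured as one unit.
EXTENDED_TLD = re.compile(r"(?i)\.(co\.uk|com|net|org|edu|gov|co|uk|us|info)")


def first_tld(pattern, host):
    """Return what the group captures at the first match, or None."""
    match = pattern.search(host)
    return match.group(1) if match else None


original_parts = ORIGINAL_TLD.findall("example.co.uk")
original_first = first_tld(ORIGINAL_TLD, "example.co.uk")
extended_first = first_tld(EXTENDED_TLD, "example.co.uk")
extended_com = first_tld(EXTENDED_TLD, "example.com")

print(original_parts, original_first, extended_first, extended_com)
```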
  20. #11
  21. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2011
    Posts
    8
    Rep Power
    0
    Originally Posted by ManiacDan
    Your code contains:
    Code:
    \.(com|net|org|edu|gov|co|uk|us|info)
    That will match .co OR .uk, not .co.uk
    Ahhh..... You're correct!
  22. #12
  23. CSS & JS/DOM Adept
    Devshed Supreme Being (6500+ posts)

    Join Date
    Jul 2004
    Location
    USA (verifiably)
    Posts
    20,126
    Rep Power
    4304
    Any particular reason why you're only allowing that subset of top-level domains? Why allow US, UK, and Colombia domains but not Australia and Canada, for example?
    Spreading knowledge, one newbie at a time.


    Remember people spend most of their time on other people's sites (so don't violate web design conventions).
  24. #13
  25. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2011
    Posts
    8
    Rep Power
    0
    Originally Posted by Kravvitz
    Any particular reason why you're only allowing that subset of top-level domains? Why allow US, UK, and Colombia domains but not Australia and Canada, for example?
    Most of the links that will be posted will most likely be top level domains. Our company helps American schoolteachers to teach Advanced Placement classes and most of what will be linked to will be in the top level of domains.
  26. #14
  27. Sarcky
    Devshed Supreme Being (6500+ posts)

    Join Date
    Oct 2006
    Location
    Pennsylvania, USA
    Posts
    10,905
    Rep Power
    6351
    most of what will be linked to will be in the top level of domains.
    What he's saying is that your list doesn't include all of the top-level domains. Youtube.ca and amazon.ca wouldn't be matched.
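    As a rough illustration of that limitation, the Python sketch below compares a fixed TLD list with a pattern that accepts any alphabetic suffix of two or more letters. FIXED_LIST is a simplified stand-in for the expression in this thread and ANY_TLD is a hypothetical alternative, not a recommendation. The trade-off is that the looser pattern would also match things like file names (for example "notes.txt"), so it may need extra checks in practice.

```python
import re

# Simplified stand-in for the thread's expression: only listed TLDs match.
FIXED_LIST = re.compile(r"(?i)[^\s#]*\.(?:com|net|org|edu|gov|co|uk|us|info)")

# Hypothetical looser alternative: one or more dot-separated labels followed
# by any alphabetic suffix of two or more letters. The lookbehind keeps the
# match from starting inside a word or right after "@" (i.e. inside an email).
ANY_TLD = re.compile(r"(?i)(?<![\w@.-])(?:[a-z0-9-]+\.)+[a-z]{2,}\b")

text = "See youtube.ca and amazon.ca"

fixed_hits = FIXED_LIST.findall(text)    # .ca is not in the list
any_hits = ANY_TLD.findall(text)         # both hosts are found
email_hits = ANY_TLD.findall("bob@example.ca")  # address is skipped

print(fixed_hits, any_hits, email_hits)
```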
