#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2004
    Location
    Out by the pool...
    Posts
    179
    Rep Power
    26

    Masking URLs that match pattern [WAS: Regular Expression help]


    I'm trying to write a regular expression to match certain domain names and replace them if they exist in a large string.

    This is what I have so far but it doesn't work 100%.

    Code:
    $string = 'This is a test string http://www.google.com/ http://www.tester.com/index.php?id=94871 http://test.com/ http://ftester.com http://www.ftester.net/?aff=aufsufsa7f5';
    
    $string = preg_replace('/(https?):\/\/(www\.)(tester.com|test.net)(\/.*)/i', 'SPAM', $string);
    So based on that http://www.tester.com/index.php?id=94871 is the only thing that should be replaced with 'SPAM' but instead this is the output:

    This is a test string http://www.google.com/ SPAM

    Can someone please help me with my regex?
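A likely explanation for that output, sketched on the assumption that the greedy `(\/.*)` group is the cause: `.` also matches spaces, so once the pattern finds http://www.tester.com/ it keeps consuming to the end of the string. Swapping `.*` for `\S*` (non-whitespace only) is one way to stop at the end of the URL. The names `$greedy` and `$bounded` are only for illustration.

```php
<?php
$string = 'This is a test string http://www.google.com/ http://www.tester.com/index.php?id=94871 http://test.com/ http://ftester.com http://www.ftester.net/?aff=aufsufsa7f5';

// Original pattern: (\/.*) is greedy and "." also matches spaces,
// so the match runs from the first hit to the end of the string.
$greedy = preg_replace('/(https?):\/\/(www\.)(tester.com|test.net)(\/.*)/i', 'SPAM', $string);

// \S* only consumes non-whitespace, so the match ends with the URL.
$bounded = preg_replace('/(https?):\/\/(www\.)(tester.com|test.net)(\/\S*)/i', 'SPAM', $string);

echo $greedy, "\n";
echo $bounded, "\n";
```

With the second pattern only the http://www.tester.com/... link is replaced and the rest of the string is left alone.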
  2. #2
  3. Contributing User
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2008
    Location
    North Carolina
    Posts
    2,674
    Rep Power
    2674
    There is a regex forum for this kinda stuff.

    But... try this thrown-together regex.
    php Code:
    $result = preg_replace('%https?://www\.tester\.(com|net)([^ ]+)%i', 'SPAM', $subject);
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2004
    Location
    Out by the pool...
    Posts
    179
    Rep Power
    26
    It works better, but it still misses tester.com if the www. is left off. That's why I had it in (www\.), so it would be included if it was present.

    Code:
    $string = 'This is a test string http://www.google.com/ http://www.tester.com/index.php?id=94871 http://tester.com/index.php?id=94871 http://test.com/ http://ftester.com http://www.ftester.net/?aff=aufsufsa7f5';
    
    $string = preg_replace('%https?://www\.(tester.com|test.net)([^ ]+)%i', 'SPAM', $string);
    Returns: This is a test string http://www.google.com/ SPAM http://tester.com/index.php?id=94871 http://test.com/ http://ftester.com http://www.ftester.net/?aff=aufsufsa7f5

    See how tester.com is still there? The domain list (tester.com|test.net) is built from an array with join('|', $array), because there are several hundred domains.
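When the alternation is built from an array like this, one thing that may be worth doing (a sketch, not something taken from the posts in this thread) is passing each entry through preg_quote() first, so the dots in the domain names match literally instead of as "any character". The names `$quoted`, `$with_quote` and `$without_quote` are illustrative, and the `%` delimiter is passed as the second argument to preg_quote() so it gets escaped as well.

```php
<?php
$blocked_urls = array('tester.com', 'test.net');

// Escape regex metacharacters (and the % delimiter) in every domain.
$quoted = array();
foreach ($blocked_urls as $domain) {
    $quoted[] = preg_quote($domain, '%');
}
$blocked = join('|', $quoted);

$string = 'http://www.tester.com/index.php?id=94871 http://www.testerXcom/page';

// Unescaped dots: "tester.com" also matches "testerXcom".
$without_quote = preg_replace('%https?://www\.(' . join('|', $blocked_urls) . ')([^ ]+)%i', 'SPAM', $string);

// Escaped dots: only the literal domain matches.
$with_quote = preg_replace('%https?://www\.(' . $blocked . ')([^ ]+)%i', 'SPAM', $string);

echo $without_quote, "\n";
echo $with_quote, "\n";
```

This keeps the same pattern as above and only changes how the domain list is assembled.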
  6. #4
  7. Sarcky
    Devshed Supreme Being (6500+ posts)

    Join Date
    Oct 2006
    Location
    Pennsylvania, USA
    Posts
    10,908
    Rep Power
    6351
    You want:
    Code:
    '/(https?):\/\/(www\.)?(tester.com|test.net)(\/\S*)/i'
    Do try to use the right forum (there's a regular expression forum) and use the correct tags to highlight your code. [ PHP ] tags properly highlight PHP code, whereas [ CODE ] tags leave it uncolored.
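For illustration, this is how that pattern behaves on the test string from the previous post. It is only a quick check; `$pattern` and `$result` are names picked for the example.

```php
<?php
$string = 'This is a test string http://www.google.com/ http://www.tester.com/index.php?id=94871 http://tester.com/index.php?id=94871 http://test.com/ http://ftester.com http://www.ftester.net/?aff=aufsufsa7f5';

// (www\.)? -> the "www." group may appear zero or one time.
// (\/\S*)  -> a slash followed by any run of non-whitespace characters.
$pattern = '/(https?):\/\/(www\.)?(tester.com|test.net)(\/\S*)/i';

$result = preg_replace($pattern, 'SPAM', $string);
echo $result, "\n";
```

Both tester.com links are replaced, with and without the www. prefix. Note that the pattern still requires a slash after the domain.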

    -Dan
    HEY! YOU! Read the New User Guide and Forum Rules

    "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin

    "The greatest tragedy of this changing society is that people who never knew what it was like before will simply assume that this is the way things are supposed to be." -2600 Magazine, Fall 2002

    Think we're being rude? Maybe you asked a bad question or you're a Help Vampire. Trying to argue intelligently? Please read this.
  8. #5
  9. Contributing User
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2008
    Location
    North Carolina
    Posts
    2,674
    Rep Power
    2674
    Maybe give a bit more detail next time then. You said:
    So based on that http://www.tester.com/index.php?id=94871 is the only thing that should be replaced with 'SPAM'
    You said nothing about with or without the www nor did you provide an example of it in the test string, so you got what you asked for.

    Comments on this post

    • ManiacDan agrees
  10. #6
  11. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2004
    Location
    Out by the pool...
    Posts
    179
    Rep Power
    26
    @simshaun I never expect to get the exact answer that I need. Generally I am just looking for an example that I can play with so I can learn about what I'm doing. It doesn't do much good having someone else figure it out. Your example got me going in the right direction, but once I tested it on site I noticed the ones without www. weren't getting removed. Sorry for any confusion.

    @maniacdan It works great, but I noticed I have some URLs that are wrapped in a sort of BBCode ([url] tags). It works on those too, but it removes the closing [/url]. Is there any way to make this regex stop at a space or at an opening [ bracket, so the closing tag doesn't get removed? It messes up the display when a BBCode tag is left open.

    I also noticed the regex Forum after I posted but I didn't want to make another post in that Forum dealing with the same thing. I kinda hoped maybe a mod would move it.

    Thank you both for your time. I have learned a lot studying the regex you both posted.
  12. #7
  13. Sarcky
    Devshed Supreme Being (6500+ posts)

    Join Date
    Oct 2006
    Location
    Pennsylvania, USA
    Posts
    10,908
    Rep Power
    6351
    Thread moved.

    You still are not being clear as to what you want. Show a before and after string. Do you want the entire BBCode tag removed (including the closing tag) or do you want to simply replace the link inside the tag and leave the rest of the tag alone?

    -Dan
  14. #8
  15. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2004
    Location
    Out by the pool...
    Posts
    179
    Rep Power
    26
    As it is now it works, but some of the URLs are inside BBCode tags.

    Example:

    PHP Code:

    $replacer = 'SPAM';
    $blocked_urls = array('tester.com', 'test.net');
    $blocked = join('|', $blocked_urls);

    $string = 'Hey nice post check out my website... http://www.youtube.com/watch?v=C37eSmMXx6A http://www.test.net/?id=94871 http://www.ftester.com/?id=94871 [url]http://www.tester.com/?id=94871[/url] http://www.google.com/ http://www.tester.com/index.php?id=94871 http://tester.com/index.php?id=94871 http://test.com/ http://www.ftester.com http://www.ftester.net/?aff=aufsufsa7f5 www.ftester.com';

    $string = preg_replace('/(https?):\/\/(www\.)?('.$blocked.')(\/\S*)/i', $replacer, $string);
    Output:
    PHP Code:
    Hey nice post check out my website... http://www.youtube.com/watch?v=C37eSmMXx6A SPAM http://www.ftester.com/?id=94871 [url]SPAM http://www.google.com/ SPAM SPAM http://test.com/ http://www.ftester.com http://www.ftester.net/?aff=aufsufsa7f5 www.ftester.com 
    (I put the output in PHP tags because the forum kept parsing the URLs and the BBCode.)

    OK, see the
    PHP Code:
    [url]SPAM

    Basically I want it to leave the [/url] on those. So if it is BBCode then the output would be...

    PHP Code:
    [url]SPAM[/url]
    Because the open BBCode tag causes issues.
  16. #9
  17. Sarcky
    Devshed Supreme Being (6500+ posts)

    Join Date
    Oct 2006
    Location
    Pennsylvania, USA
    Posts
    10,908
    Rep Power
    6351
    Code:
    '/(https?):\/\/(www\.)?(tester.com|test.net)(\/[^\s\[]*)/i'
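A quick check of that pattern, sketched with a shortened version of the sample string from the previous post: the class `[^\s\[]` stops at whitespace or an opening bracket, so the closing [/url] tag is left in place. `$pattern` and `$result` are illustrative names.

```php
<?php
$string = 'http://www.test.net/?id=94871 [url]http://www.tester.com/?id=94871[/url] http://www.google.com/';

// [^\s\[]* consumes anything except whitespace and "[",
// so the match stops right before "[/url]".
$pattern = '/(https?):\/\/(www\.)?(tester.com|test.net)(\/[^\s\[]*)/i';

$result = preg_replace($pattern, 'SPAM', $string);
echo $result, "\n";
```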

    Comments on this post

    • wonton agrees : Thank you!
  18. #10
  19. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2004
    Location
    Out by the pool...
    Posts
    179
    Rep Power
    26
    OK, that works for the BBCode issue, but now I noticed another issue.

    Some of the URLs don't end with a / or have a QUERY_STRING.

    These should all be replaced if tester.com is a banned URL:

    PHP Code:
    http://www.tester.com/?aff=sdf787sd8f
    http://www.tester.com/
    http://www.tester.com
    http://tester.com/?aff=sdf787sd8f
    http://tester.com/
    http://tester.com
    www.tester.com/?aff=sdf787sd8f
    www.tester.com/
    www.tester.com
    tester.com/?aff=sdf787sd8f
    tester.com/
    tester.com

    [url]http://www.tester.com/?aff=sdf787sd8f[/url]
    [url]http://www.tester.com/[/url]
    [url]http://www.tester.com[/url]
    [url]http://tester.com/?aff=sdf787sd8f[/url]
    [url]http://tester.com/[/url]
    [url]http://tester.com[/url]
    [url]www.tester.com/?aff=sdf787sd8f[/url]
    [url]www.tester.com/[/url]
    [url]www.tester.com[/url]
    [url]tester.com/?aff=sdf787sd8f[/url]
    [url]tester.com/[/url]
    [url]tester.com[/url]
    See where I am going? Trying to block spam bots and people from posting porn sites and referer sites etc.

    The ?aff= is only a sample of what the QUERY_STRING may contain.
    Last edited by wonton; April 1st, 2010 at 03:40 PM. Reason: Typo
  20. #11
  21. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2004
    Location
    Out by the pool...
    Posts
    179
    Rep Power
    26
    OK, I was able to take what you gave me and change it to this:

    Code:
    '/((https?):\/\/)?(www\.)?(tester.com|test.net)([^\s\[<]*)/i'
    Now let me know if I am wrong here...

    I added the ()? around http:// so it's OK if it isn't there. I also added < to the character class at the end, so a following HTML tag wouldn't get messed up. And I removed the \/ so it's OK if the URL doesn't end with one.

    Does it look ok? It works...
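Two things may be worth checking in that pattern (a sketch under stated assumptions, not a confirmed fix). The dots in the domain list are unescaped, and with both prefixes optional the pattern can now match inside a longer host name such as ftester.com. One possible approach is preg_quote() for the domains plus a negative lookbehind, so a match cannot start in the middle of a host. The lookbehind `(?<![\w.-])` and the names `$loose`, `$strict`, `$loose_result` and `$strict_result` are assumptions made for this example.

```php
<?php
$blocked_urls = array('tester.com', 'test.net');
$quoted = array();
foreach ($blocked_urls as $domain) {
    $quoted[] = preg_quote($domain, '/');
}
$blocked = join('|', $quoted);

// Pattern from the post above: matches "tester.com" inside "ftester.com".
$loose = '/((https?):\/\/)?(www\.)?(tester.com|test.net)([^\s\[<]*)/i';

// Sketch: (?<![\w.-]) refuses to start right after a letter, digit,
// underscore, dot or hyphen, i.e. in the middle of another host name.
$strict = '/(?<![\w.-])((https?):\/\/)?(www\.)?(' . $blocked . ')([^\s\[<]*)/i';

$string = 'http://ftester.com [url]tester.com[/url] www.tester.com/?aff=1 http://tester.com';

$loose_result = preg_replace($loose, 'SPAM', $string);
$strict_result = preg_replace($strict, 'SPAM', $string);

echo $loose_result, "\n";
echo $strict_result, "\n";
```

The loose pattern turns http://ftester.com into http://fSPAM, while the stricter one leaves it alone. The trade-offs of this sketch: other subdomains such as sub.tester.com are no longer matched, and nothing marks the end of the domain, so a host like tester.community would still be caught.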
