#1

    [SOLVED] URL Regex Problem


    I can't get Gruber's well-known URL-matching regex working in Python.

    It works fine in many online Python-based regex testers.

    Here's the code I am using:
    Code:
    import re
    string = "http://www.google.com"
    regex = re.compile("(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?]))")
    r = regex.search(string)
    print(r)
    I am using Python 3.2.

    Search Google for "gruber regex" to find where I got it. I am a new member and cannot post URLs yet.

    Any ideas what is causing this to return None?
#2

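    Hmm, well, first I started working through the pattern: (?i) means ignore case.

    Then I got the bright idea: try it as a raw string! Notice I inserted an "r" before the opening '"'. In an ordinary string literal, Python converts "\b" into a backspace character before the re module ever sees it, so the pattern searches for a literal backspace instead of a word boundary and never matches. In a raw string the backslash is kept, and re reads \b as a word boundary. Python has a bunch of ways to express string literals.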

    Code:
    import re
    string = "http://www.google.com"
    regex = re.compile(r"(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?]))")
    r = regex.search(string)
    print(r)
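    To see why the "r" prefix matters, here is a short sketch (an illustration, not code from the thread) that compares the two kinds of literal and simulates the original non-raw pattern. The names GRUBER and BROKEN and the sample text are hypothetical, chosen just for this demo; the BROKEN pattern is built by swapping \b for a backspace, which assumes that is the only escape in this pattern whose regex meaning changes.

```python
import re

# In an ordinary literal, "\b" is ONE character: backspace (U+0008).
# In a raw literal, r"\b" is TWO characters: a backslash and "b",
# which the re module reads as a word boundary. Doubling the
# backslash ("\\b") produces the same two characters.
assert "\b" == "\x08"
assert r"\b" == "\\b"

# Gruber's pattern as posted in the fix, written as a raw string.
GRUBER = re.compile(r"(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?]))")

# Hypothetical simulation of the original non-raw literal: in this
# pattern, \b is the only escape whose regex meaning Python changes.
BROKEN = re.compile(GRUBER.pattern.replace(r"\b", "\b", 1))

url = "http://www.google.com"
print(BROKEN.search(url))           # None: it looks for a literal backspace
print(GRUBER.search(url).group(0))  # http://www.google.com

# The pattern has several capture groups, so re.findall would return
# tuples; finditer plus group(0) yields each whole URL instead.
text = "See http://example.com/foo and www.python.org."
print([m.group(0) for m in GRUBER.finditer(text)])
# ['http://example.com/foo', 'www.python.org']
```

    Writing "\\b" (doubling every backslash) would also work, but with this many backslashes in one pattern the raw string is much easier to keep correct.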
#3

    Thanks! That worked perfectly.
