September 14th, 2012, 06:38 PM
-
[SOLVED] URL Regex Problem
I can't get the famous Gruber URL-parsing regex working with Python.
It works fine on many online Python-based testers.
Here's the code I am using:
Code:
import re
string = "http://www.google.com"
regex = re.compile("(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))")
r = regex.search(string)
print(r)
I am using Python 3.2.
Search Google for "gruber regex" to find where I got it from. I am a new member and cannot post URLs.
Any ideas what is causing this to return None?
September 14th, 2012, 10:20 PM
-
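Hmm, well, first I started to work through the pattern:
(?i) means ignore case,
then I got the bright idea: try it as a raw string! Notice I inserted an "r" before the opening '"'. In a normal string literal Python converts "\b" into a backspace character before the re module ever sees it, so the pattern was looking for a literal backspace instead of a word boundary. A raw string passes the backslash through untouched. Python has a bunch of ways to express string literals.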
Code:
import re
string = "http://www.google.com"
regex = re.compile(r"(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))")
r = regex.search(string)
print(r)
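If you want to see the difference for yourself, here is a small sketch using a cut-down pattern (just "\bwww", which I made up for illustration, not the full Gruber regex). It compares a normal string literal with a raw one. The other escapes in the original pattern, like "\w", "\d" and "\s", happen to survive in a normal string because Python leaves unrecognized escapes alone, which is why only "\b" caused trouble.

```python
import re

# In a normal string literal, "\b" is the backspace character (U+0008).
normal = "\bwww"
# In a raw string literal, r"\b" stays as a backslash followed by "b",
# which the re module interprets as a word-boundary assertion.
raw = r"\bwww"

print(len(normal), len(raw))      # the raw version is one character longer
print(normal[0] == "\x08")        # first character of `normal` is a backspace

text = "http://www.google.com"
no_match = re.search(normal, text)  # looks for a literal backspace, then "www"
match = re.search(raw, text)        # looks for "www" at a word boundary

print(no_match)
print(match.group(0))
```

The first search comes back as None for the same reason the original code did, while the raw-string version finds "www".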
[code]Code tags[/code] are essential for Python code and Makefiles!
September 15th, 2012, 08:30 AM
-
Thanks! That worked perfectly.