March 7th, 2005, 12:32 AM
python with regex
matching string in Python
I have written the following:
import re
s = 'www.test.co.jp/history/company.html'
#s = s.replace("http://","")
file_path = re.sub(r'\/.+$', '', s)
file_nm = re.sub(r'.+\/', '', s)
This gives:
file_path : www.test.co.jp
file_nm : company.html
I am not a beginner with regexes, but in Python
I can't get the result I want.
I want file_path to hold the directory part (www.test.co.jp/history),
but it always returns "www.test.co.jp".
Are there any better ways to do this?
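The pattern r'\/.+$' returns www.test.co.jp because it matches from the *first* slash to the end of the string, so everything after the host is removed. A minimal sketch of a regex that anchors on the *last* slash instead, assuming the input always contains at least one '/'; the file_nm pattern is the one from the post, which already works because a greedy .+ runs to the last slash:

```python
import re

s = 'www.test.co.jp/history/company.html'

# '/[^/]*$' can only match the final slash plus the trailing segment,
# because [^/]* is not allowed to cross another slash before the end.
file_path = re.sub(r'/[^/]*$', '', s)

# Greedy '.+/' consumes everything up to and including the LAST slash.
file_nm = re.sub(r'.+/', '', s)

print(file_path)  # www.test.co.jp/history
print(file_nm)    # company.html
```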
March 7th, 2005, 01:08 AM
Only use regexes when you absolutely have to. Unless you're doing real pattern matching, use string methods instead:
Originally Posted by oppai
url = 'http://www.test.co.jp/history/company.html'
urltype, url = url.split("://")
# Use this for 2.4+ (warning, untested):
path, page = url.rsplit("/", 1)
# Use this for 2.3 and earlier:
url = url.split("/")
path, page = "/".join(url[:-1]), url[-1]
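One caveat with the unpacking form: if the string contains no "/", rsplit("/", 1) returns a one-element list and the two-name assignment raises ValueError. A sketch using str.partition and str.rpartition instead (these were added in Python 2.5, so they are newer than the versions discussed above); split_url_path is a hypothetical helper name used only for illustration:

```python
def split_url_path(url):
    """Split 'scheme://host/dir/page' into (directory, page).

    Hypothetical helper; assumes '/' separates path segments.
    """
    # partition always returns a 3-tuple; sep is '' when "://" is absent.
    head, sep, tail = url.partition("://")
    rest = tail if sep else head
    # rpartition also always returns a 3-tuple, so unpacking never fails.
    path, _, page = rest.rpartition("/")
    return path, page

print(split_url_path('http://www.test.co.jp/history/company.html'))
# ('www.test.co.jp/history', 'company.html')
print(split_url_path('company.html'))
# ('', 'company.html')
```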
March 7th, 2005, 01:39 AM
Originally Posted by hydroxide
The split worked well in my example, thanks.
But why do you say "Only use regexes when you absolutely have to"?
March 7th, 2005, 02:33 AM
Try this if you want to use a regex:
import re
r = re.search(r"^(.*)/([^/]+)$", url)
file_path, file_nm = r.groups()
If you need a method to split URLs into file name and file path, there is an os.path function:
import os
file_path, file_nm = os.path.split(url)
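os.path is the platform's own path module (ntpath on Windows), so for URLs, which always use "/", it may be more explicit to call posixpath directly. A small sketch under that assumption; note that the scheme stays attached to the directory part:

```python
import posixpath

url = 'http://www.test.co.jp/history/company.html'

# posixpath.split always splits on '/', regardless of the host OS.
file_path, file_nm = posixpath.split(url)

print(file_path)  # http://www.test.co.jp/history
print(file_nm)    # company.html
```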
March 7th, 2005, 03:23 PM
Python also includes a module for cutting up URLs, called urlparse (urllib.parse in Python 3), which you might be interested in.
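A minimal sketch of that approach, written for Python 3's urllib.parse (on Python 2 the same urlsplit function lives in the urlparse module). The URL is split into its components first, and only the path component is then split into directory and file name:

```python
from urllib.parse import urlsplit
import posixpath

url = 'http://www.test.co.jp/history/company.html'

# urlsplit separates scheme, network location, path, query and fragment.
parts = urlsplit(url)
print(parts.scheme)  # http
print(parts.netloc)  # www.test.co.jp
print(parts.path)    # /history/company.html

# Split just the path component into directory and file name.
directory, page = posixpath.split(parts.path)
print(directory)  # /history
print(page)       # company.html
```

One thing to watch: without the "http://" prefix, urlsplit cannot recognise the host, so for 'www.test.co.jp/history/company.html' netloc is empty and the whole string ends up in path.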
The reason we (or I) use string methods rather than immediately reaching for regular expressions is that regular expressions have a nasty habit of complicating code, often unnecessarily. If you try to do something with string methods first, the resulting code is often much simpler.
March 7th, 2005, 08:22 PM
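Why avoid regexes? I think Jamie Zawinski said it well:
"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."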
The second problem, of course, is that it takes work to understand a regex, hence all the requests for help debugging regexes ;-)
The general Pythonic aim is simplicity and readability over compactness or speed. If something can be done using string methods (or, even better, as sbkwi and netytan pointed out, using preexisting modules), those should be used, because they are more readable and thus more maintainable.
There is a place for regexes in Pythonic programming, but it should be a last resort and not a first.