#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2005
    Location
    NIPPON
    Posts
    15
    Rep Power
    0

    python with regex


    matching string in Python

    I have written the following:

    import re

    s = 'www.test.co.jp/history/company.html'

    #s = s.replace("http://","")
    file_path = re.sub(r'\/.+$', '', s)
    file_nm = re.sub(r'.+\/', '', s)

    This gives:

    file_path : www.test.co.jp

    file_nm : company.html

    I am not a beginner with regex, but I can't get the result I want in Python.

    I want file_path to hold the value
    "www.test.co.jp/history"
    but it always returns "www.test.co.jp".

    Are there any better ways to do it?
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2005
    Posts
    78
    Rep Power
    10
    Originally Posted by oppai
    matching string in Python

    I am not a beginner with regex, but I can't get the result I want in Python.

    I want file_path to hold the value
    "www.test.co.jp/history"
    but it always returns "www.test.co.jp".

    Are there any better ways to do it?
    Only use regexes when you absolutely have to. Unless you're doing real pattern matching, use string methods instead:
    Code:
    url = 'http://www.test.co.jp/history/company.html'
    # Strip the scheme; this unpacking assumes the URL contains "://" exactly once.
    urltype, url = url.split("://")
    # Use this for 2.4+ (warning, untested):
    path, page = url.rsplit("/", 1)

    # Use this for 2.3 and earlier:
    url = url.split("/")
    path, page = "/".join(url[:-1]), url[-1]
    --OH.
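
The string-method approach above can be wrapped so that it also copes with URLs that have no scheme, like the one in the original question. This is only a minimal sketch: `split_url` is a hypothetical helper name, and it assumes Python 2.5 or later, where `str.partition` and `str.rpartition` are available.

```python
def split_url(url):
    """Return (directory part, last path component) using only str methods."""
    # partition() never raises: if "://" is absent, sep is the empty string
    # and the whole URL is kept as-is.
    scheme, sep, rest = url.partition("://")
    if not sep:
        rest = url
    # rpartition() splits on the LAST "/" only.
    path, _, page = rest.rpartition("/")
    return path, page


print(split_url("http://www.test.co.jp/history/company.html"))
print(split_url("www.test.co.jp/history/company.html"))
# Both calls print ('www.test.co.jp/history', 'company.html')
```

Unlike the tuple unpacking of `url.split("://")`, this does not fail when the scheme is missing.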

    Comments on this post

    • netytan agrees : Couldn't have said it better :).
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2005
    Location
    NIPPON
    Posts
    15
    Rep Power
    0
    Originally Posted by hydroxide
    Only use regexes when you absolutely have to. Unless you're doing real pattern matching use string methods instead:
    Code:
    url = 'http://www.test.co.jp/history/company.html'
    urltype, url = url.split("://")
    # Use this for 2.4+ (warning, untested):
    path, page = url.rsplit("/", 1)
    
    # Use this for 2.3 and earlier:
    url = url.split("/")
    path, page = "/".join(url[:-1]), url[-1]
    --OH.


    The split worked well in my example, thanks.
    But why do you say "Only use regexes when you absolutely have to"?
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2004
    Location
    Regensburg, Germany
    Posts
    147
    Rep Power
    16
    Try this if you want to use regex:
    Code:
    import re

    url = 'www.test.co.jp/history/company.html'
    r = re.search("^(.*)/([^/]+)$", url)
    if r:
        file_path, file_nm = r.groups()
    If you need a method to split URLs into file name and file path, there is an os.path function:
    Code:
    import os

    file_path, file_nm = os.path.split(url)
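
For anyone wondering why the original `re.sub` call returned only the host name, here is a small sketch of what happens and one possible correction. The replacement pattern `/[^/]*$` is just one way to do it, not the only one, and `posixpath.split` is shown as a variant of `os.path.split` that always uses "/" as the separator.

```python
import posixpath
import re

s = "www.test.co.jp/history/company.html"

# The regex engine reports the LEFTMOST match, so r"/.+$" starts at the
# first slash and ".+" then consumes everything up to the end of the string.
print(re.sub(r"/.+$", "", s))  # www.test.co.jp

# Excluding "/" from the tail means only the last slash can match.
file_path = re.sub(r"/[^/]*$", "", s)
# Greedy ".*" followed by "/" consumes everything up to the last slash.
file_nm = re.sub(r"^.*/", "", s)
print(file_path)  # www.test.co.jp/history
print(file_nm)    # company.html

# posixpath.split treats "/" as the separator regardless of the OS.
print(posixpath.split(s))  # ('www.test.co.jp/history', 'company.html')
```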
  8. #5
  9. Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    Python also includes a module for cutting up URLs called urlparse, which you might be interested in.

    http://www.python.org/doc/2.4/lib/module-urlparse.html
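
As a rough illustration of the module linked above: in Python 3 the same functionality lives in `urllib.parse`, which is what this sketch assumes (on Python 2 the import would be `from urlparse import urlsplit`). Combining it with `posixpath.split` is one possible way to get the two values asked for, not something the module does on its own.

```python
import posixpath
from urllib.parse import urlsplit

parts = urlsplit("http://www.test.co.jp/history/company.html")
print(parts.scheme)  # http
print(parts.netloc)  # www.test.co.jp
print(parts.path)    # /history/company.html

# Split the path component on its last "/" and re-attach the host.
directory, file_nm = posixpath.split(parts.path)
file_path = parts.netloc + directory
print(file_path)  # www.test.co.jp/history
print(file_nm)    # company.html
```

Note that the host is only recognised when the URL has a scheme (or starts with "//"); for a bare string like 'www.test.co.jp/history/company.html' everything ends up in `path`, so the string methods may be simpler in that case.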

    The reason we (or I) use the string methods rather than immediately jumping for regular expressions is that regular expressions have a nasty habit of complicating code, often unnecessarily. If you try to do something with string methods first, the resulting code is often much simpler.

    Take care,

    Mark.
    programming language development: www.netytan.com Hula

  10. #6
  11. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2005
    Posts
    78
    Rep Power
    10
    Why avoid regexes? I think Jamie Zawinski said it well:
    "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."
    The second problem, of course, being that it takes work to understand a regex, hence all the requests for help in debugging regexes ;-)

    The general Pythonic aim is for simplicity and readability over compactness or speed - if something can be done using string methods (or even better - as pointed out by sbkwi and netytan - by using preexisting modules) then they should be used because they are more readable, thus more maintainable.

    There is a place for regexes in Pythonic programming, but it should be a last resort and not a first.

    --OH.
