#1
    if a string matches a regex...


    Hi,
    I'm trying to work out whether a string 'fits' a regular expression.

    I have the regular expression '/(\w+/)*(\w+).(\w+)'. If a string matches it, I would like to get True back; if it doesn't, False.

    I was thinking something along the lines of:

    Code:
    import re

    path = """/path/to/file.jpg"""
    r = r'/(\w+/)*(\w+).(\w+)'
    if r:
        ext_r = re.compile(r)
        ext = ext_r.findall(path)

    print(ext)
    but I cannot find out how to tell whether the string exactly matches the expression; I can only print the result.

    Correct me if I'm wrong, but it should match paths like '/directory/file.ext', with any number of directory levels and one file.ext at the end.

    For instance, given the string '/absolute/path/image.jpg' the check should return True, whereas for
    '/absolute/path/image', '/absolute/path' or 'absolute/path/image.jpg' it should return False.

    Sorry, I am totally new to both Python and regex. I have tried the re methods, but I just get object references returned to me.

    Cheers for any help.
#2
    You can use the methods directly without needing them to return True/False. Python accepts more than just True/False in conditional statements: None and empty lists count as false, while a match object counts as true. For instance:

    Code:
    >>> import re
    >>> re.match(r"\d", "1")
    <_sre.SRE_Match object at 0x0156D3A0>
    
    >>> re.match(r"\d", "a")
    
    >>> if re.match(r"\d", "1"):
    ...     print("hi")
    ... 	
    hi
    >>>
    If you actually want the groups from your regex, then you do need the returned objects:

    Code:
    >>> result = re.match(r"/(\w+/)*(\w+).(\w+)", "/dir/x/file.jpg")
    >>> if result:
    ...     print(result.groups())
    ...
    ('x/', 'file', 'jpg')
    >>>
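    Though it doesn't seem to produce exactly what you're looking for: a repeated group keeps only its last repetition, so 'dir/' is missing from the result.

    Regarding "if the string exactly matches the expression":
    You would need to build something into the regex to mean "match this and only this", for example by adding the start and end anchors (^ and $). Also note that an unescaped . matches any character; use \. to match a literal dot.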
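To make the anchoring idea concrete, here is a minimal sketch (not posted in the thread) that adds ^ and $ to the original pattern and escapes the dot. The helper name `is_file_path` is made up for illustration, and the test strings are the ones from the first post.

```python
import re

# Original pattern with ^ and $ anchors added and the dot escaped.
# is_file_path is a hypothetical helper name, not something from the thread.
FILE_PATH = re.compile(r'^/(\w+/)*\w+\.\w+$')

def is_file_path(path):
    """Return True if path looks like /dir/.../name.ext, else False."""
    return FILE_PATH.match(path) is not None

for candidate in ['/absolute/path/image.jpg',
                  '/absolute/path/image',
                  '/absolute/path',
                  'absolute/path/image.jpg']:
    print('%s -> %s' % (candidate, is_file_path(candidate)))
```

One caveat: $ also matches just before a final newline, so a string ending in "\n" can still pass. Using \Z instead of $ avoids that if it matters.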

    I would be tempted to use something much simpler:

    Code:
    >>> path = "/path/to/file.jpg"
    >>> print(path.split('/'))
    ['', 'path', 'to', 'file.jpg']
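Building on the split idea, a rough regex-free check might look like the sketch below. The rules it applies (leading "/", no empty components, last component has both a name and an extension) are one reading of the original question, and `looks_like_file_path` is a made-up name.

```python
import posixpath

def looks_like_file_path(path):
    """Check a /-separated path by splitting instead of using a regex."""
    parts = path.split('/')
    # An absolute path splits into ['', 'dir', ..., 'file.ext'].
    if len(parts) < 2 or parts[0] != '':
        return False
    # Reject empty components, e.g. those produced by '//' or a trailing '/'.
    if not all(parts[1:]):
        return False
    # splitext separates 'file.jpg' into ('file', '.jpg').
    name, ext = posixpath.splitext(parts[-1])
    return bool(name) and len(ext) > 1

print(looks_like_file_path('/path/to/file.jpg'))
print(looks_like_file_path('/path/to/file'))
```

Unlike the \w-based regex, this accepts spaces and other characters inside names, which may or may not be what is wanted.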
    Last edited by sfb; March 9th, 2005 at 05:25 PM.
#3
    Cheers, but really I was looking for something that, as you said, exactly matches the regex. It does need to return True or False, as a function, for this purpose.

    After testing your solution, I'm thinking my regex is incorrect, because it will only print False (I edited it) if the 'path' contains nothing but a "/".

    Is there a way of making all portions of the regex compulsory? I.e. the path must have a "file.jpg" portion as well as "/" (root), and then *possibly* a directory in the form of "directory/".
#4
    Quote:
    Is there a way of making all portions of the regex compulsory? I.e. the path must have a "file.jpg" portion as well as "/" (root), and then *possibly* a directory in the form of "directory/".

    What if the file has no extension?
    What if it is file.jpg.vbs?
    What if the system is one where / isn't the path separator?
    What if the directory name has spaces or dots in it?
    Does the path have to actually exist on the system?
    Do you care about characters that are invalid in a real path? E.g. Windows won't allow < > \ ? and so on in file or folder names.
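To see how a strict \w-based pattern treats some of these cases, here is a small sketch. The pattern is an assumed variant of the original regex (anchored, dot escaped), and the sample paths are invented for illustration.

```python
import re

# Assumed strict pattern: the original \w-based regex, anchored and with the dot escaped.
strict = re.compile(r'^/(\w+/)*\w+\.\w+$')

# Hypothetical sample paths, roughly one per question above.
samples = [
    '/photos/image.jpg',      # plain case
    '/photos/README',         # no extension
    '/photos/file.jpg.vbs',   # two extensions
    '/my photos/image.jpg',   # space in a directory name
    '/v1.2/image.jpg',        # dot in a directory name
]

results = dict((s, bool(strict.match(s))) for s in samples)
for s in samples:
    print('%-24s %s' % (s, results[s]))
```

Only the first sample is accepted, because \w covers just letters, digits and underscores. Whether the other cases should pass depends on the answers to the questions above.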

    This:

    Code:
    pattern = r"^/(.+/)*(.+)$"
    Will match:

    A string starting with a /,
    optionally followed by one or more characters and a /
    (repeated any number of times),
    and ending in one or more characters.

    Everything from
    "/file.jpg"
    through
    "/path/file.jpg"
    "/some/folder/a/b/c/d/e/filename"
    to
    "/a" or "/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a"

    Is that what you're looking for?
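A quick way to check what this pattern accepts is to run it over a few strings. In the sketch below, `loose` is the pattern from the post, while `tight` is an assumed stricter variant (not from the thread) that uses [^/] so that a component cannot itself contain a slash. The helper name `accepts` is made up.

```python
import re

loose = re.compile(r'^/(.+/)*(.+)$')       # the pattern from the post
tight = re.compile(r'^/([^/]+/)*[^/]+$')   # assumed stricter variant, not from the thread

def accepts(regex, text):
    """True if text matches regex from its start."""
    return bool(regex.match(text))

for text in ['/file.jpg', '/path/file.jpg', '/a', '/', '/dir/', '//', 'file.jpg']:
    print('%-16s loose=%-5s tight=%s'
          % (text, accepts(loose, text), accepts(tight, text)))
```

Because . also matches "/", the loose pattern accepts "/dir/" and "//" as well; the tighter variant rejects both.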

    I think you will have to do the True/False yourself with:

    Code:
    if re.match(pattern, text):
        return True
    else:
        return False

    Comments on this post

    • CyBerHigh agrees : good way to put it
#5
    Originally Posted by sfb
    I think you will have to do the True/False yourself with:

    Code:
    if re.match(pattern, text):
        return True
    else:
        return False
    Nope - you can do it much more simply:
    Code:
    return bool(re.match(pattern, text))
    --OH.
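Putting the pieces together as the function the original poster asked for, a minimal sketch could look like this. It assumes Python 3.4 or later, where `re.fullmatch` is available (it did not exist in the Python 2 used in this thread). The name `matches_exactly` and the escaped-dot pattern are illustrative choices.

```python
import re

def matches_exactly(pattern, text):
    """Return True if the whole of text matches pattern, else False."""
    # re.fullmatch only succeeds when the entire string matches,
    # so the pattern does not need its own ^ and $ anchors.
    return re.fullmatch(pattern, text) is not None

# Assumed variant of the original regex with the dot escaped.
pattern = r'/(\w+/)*\w+\.\w+'

print(matches_exactly(pattern, '/absolute/path/image.jpg'))
print(matches_exactly(pattern, '/absolute/path/image.jpg and more'))
# For comparison: re.match only anchors at the start, so trailing text is ignored.
print(bool(re.match(pattern, '/absolute/path/image.jpg and more')))
```

On older Pythons the same effect can be had by keeping `bool(re.match(...))` and anchoring the pattern itself.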
