#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2003
    Posts
    8
    Rep Power
    0

    regular expressions and back references


    Greetings, I am having a problem using back references in my regex and I am having a difficult time figuring out what I am doing wrong. My regex works fine without the back refs, but when I try to use them it won't match my sample. It looks to me like I am using them no differently than my examples and documentation do, but to no avail.

    Here is my pattern:

    macExpression = "^[0-9A-F]{1,2}(\:|\.|\-)[0-9A-F]{1,2}\1[0-9A-F]{1,2}\1[0-9A-F]{1,2}\1[0-9A-F]{1,2}\1[0-9A-F]{1,2}$"

    And this is how I am using it:

    matched = re.match(macExpression, macAddress)

    I am trying to match MAC addresses in the following formats: 0:a0:c9:ee:b2:c0, 0-a0-c9-ee-b2-c0 and 0.a0.c9.ee.b2.c0, etc.

    I wasn't sure how to do it, but then I read about back references and I thought that all was well... Alas, it wasn't. If anyone could lend a hand I would appreciate it very much.

    -matthew
  2. #2
  3. Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    Just a quick glance, but have you tried it without escaping "1"? By back referencing I'm assuming you mean escaping special chars; since "1" isn't a special char, escaping it makes no sense. You said it works before you escape the chars, so hopefully this will do it. Let me know if it doesn't.

    Have fun,
    Mark.
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2003
    Posts
    8
    Rep Power
    0

    back referencing explained


    The 1 is supposed to be escaped. Escaping a number is what makes it a back reference rather than a literal number. The following quote from this site is where I first learned about back referencing, but it is also mentioned in Programming Python by Mark Lutz.
    One powerful option in creating search patterns is specifying that a subexpression that was matched earlier in a regular expression is matched again later in the expression. We do this using backreferences. Backreferences are named by the numbers 1 through 9, preceded by the backslash/escape character when used in this manner. These backreferences refer to each successive group in the match pattern, as in /(one)(two)(three)/\1\2\3/. Each numbered backreference refers to the group that, in this example, has the word corresponding to the number.
    In short, a back reference refers to a previously matched expression by order of occurrence.

    -matthew
  6. #4
  7. Wacky hack
    Devshed Novice (500 - 999 posts)

    Join Date
    Apr 2001
    Location
    London, England
    Posts
    513
    Rep Power
    14
    Just a stab in the dark here, not having investigated backreferences much, but why are you using them here? It seems like you put a backreference in if you expect the same text to crop up later in the regexp, but you're not expecting that. What's wrong with a "normal" regexp?
  8. #5
  9. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2002
    Location
    Norwich, UK
    Posts
    53
    Rep Power
    12
    The problem is your escaping. In a normal Python string literal the backslash is an escape character, so "\1" is read as an octal escape for the single character with code 1 and never reaches the regex engine as a back reference. To put a literal '\' in your string you need to write '\\'. Simply replace all instances of '\' with '\\'. Switching to single quotes alone will not help, since Python treats both quote styles the same way.
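
    To make the escaping point concrete, here is a minimal sketch. The pattern "(a)\1" is a made-up stand-in for the MAC pattern, chosen only to keep the example short; it is not from the original post.

```python
import re

# In a normal string literal, "\1" is an octal escape: one character, code 1.
normal = "(a)\1"
# In a raw string the backslash survives, so the regex engine sees a backreference.
raw = r"(a)\1"

print(len("\1"), len(r"\1"))  # 1 2

# The normal string looks for "a" followed by chr(1), so "aa" does not match.
normal_match = re.match(normal, "aa")
# The raw string looks for "a" followed by whatever group 1 matched.
raw_match = re.match(raw, "aa")

print(normal_match)        # None
print(raw_match.group(0))  # aa
```

    Writing "(a)\\1" gives the same pattern as the raw string; the raw form is just easier to read.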
  10. #6
  11. Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    Hi, sorry, I've never heard of or used backreferences, so I assumed you meant escaping. But I have to say that telex is right: why are you using backreferences when a simple regex would do the job?

    Just make a regex to match the strings and use findall to get all the resulting matches..

    Mark.
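
    One caveat if findall is used together with the separator group: when a pattern has exactly one capturing group, re.findall returns what the group captured, not the whole match. The sketch below assumes a made-up sample line and swaps the ^...$ anchors for \b so the pattern can search inside longer text; both are illustrative assumptions, not part of the original pattern.

```python
import re

# Hypothetical sample text containing two MAC addresses.
text = "eth0 0:a0:c9:ee:b2:c0 up, eth1 0-a0-c9-ee-b2-c1 down"

# One capturing group for the separator, reused through \1.
mac = re.compile(
    r"\b[0-9A-F]{1,2}([:.\-])[0-9A-F]{1,2}(?:\1[0-9A-F]{1,2}){4}\b",
    re.IGNORECASE,
)

# findall yields only the captured separators here.
separators = mac.findall(text)

# finditer gives match objects, so group(0) is the full address.
addresses = [m.group(0) for m in mac.finditer(text)]

print(separators)
print(addresses)
```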
  12. #7
  13. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2003
    Posts
    133
    Rep Power
    12
    Another way to escape \ is to use a raw string, such as:
    Code:
    r'(?<=\n)\t\tdef ([\w\d_])\(.*?\):(?=\n)'
    That will work just as well as using \\ notation, and looks cleaner.

    Also, not knowing for what the RE is used, it seems as if a backreference is in actuality quite necessary here. If he matches '\-' in the text, he might not want to match '\:' later on, but just the '\-'. That's what back-references are for, it's not just repeating an earlier pattern, it's matching against an earlier match.
    Last edited by percivall; August 9th, 2003 at 04:49 AM.
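
    A small sketch of the difference, assuming the goal is to reject addresses that mix separators. The names HEX, SEP, loose and strict are made up for this illustration.

```python
import re

HEX = r"[0-9A-F]{1,2}"
SEP = r"[:.\-]"

# Without a backreference: every separator is matched independently.
loose = re.compile("^" + HEX + (SEP + HEX) * 5 + "$", re.IGNORECASE)

# With a backreference: \1 must repeat exactly what the first separator matched.
strict = re.compile(
    "^" + HEX + "(" + SEP + ")" + HEX + (r"\1" + HEX) * 4 + "$",
    re.IGNORECASE,
)

same = "0-a0-c9-ee-b2-c0"
mixed = "0:a0-c9.ee:b2:c0"

print(bool(loose.match(same)), bool(strict.match(same)))    # True True
print(bool(loose.match(mixed)), bool(strict.match(mixed)))  # True False
```

    If mixed separators are acceptable, the plain pattern is enough; the backreference only earns its place when the separator has to stay consistent.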
  14. #8
  15. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2002
    Location
    Norwich, UK
    Posts
    53
    Rep Power
    12
    In addition to my other post, you'll also want to set the IGNORECASE flag; otherwise the pattern will only match uppercase letters. And percivall was right: you'll need to make it a raw string if you use single quotes.

    So you might use your pattern like this:

    Code:
    import re
    # Raw string, so \1 reaches the regex engine as a backreference
    rex = re.compile(r'^[0-9A-F]{1,2}(\:|\.|\-)[0-9A-F]{1,2}\1[0-9A-F]{1,2}\1[0-9A-F]{1,2}\1[0-9A-F]{1,2}\1[0-9A-F]{1,2}$', re.IGNORECASE)
    matches = rex.match('0:a0:c9:ee:b2:c0')
