Thread: reg expressions

    #1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2005
    Posts
    4
    Rep Power
    0

    reg expressions


    Hello,

    I need to port some code from Python to C++. I need to know what these reg expressions actually do:

    re.sub(r'[%s%s%s%s\\]' % (string.punctuation, string.digits,
    string.ascii_letters, string.whitespace), ' ', s)

    return re.sub(ur'[%s%s%s%s%s%s%s%s%s]' % (1,2,3,4), '', ustr)

    return re.sub(ur'[%s%s%s%s%s%s%s%s]' % (COMMA, SEMICOLON, QUESTION, ZERO,PERCENT, THOUSANDS, STAR,FULL_STOP), '', ustr)

    Simply describe them to me in a few words so I can write my own code in C++.

    Thank you,
    R
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Nov 2003
    Posts
    624
    Rep Power
    35
    re.sub(r'[%s%s%s%s\\]' % (string.punctuation, string.digits,string.ascii_letters, string.whitespace), ' ', s)
    Replaces every upper and lowercase ASCII letter, digit, whitespace and punctuation character (plus the backslash) with a single space ' ' in the string s.

    return re.sub(ur'[%s%s%s%s%s%s%s%s%s]' % (1,2,3,4), '', ustr)
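    Removes the characters 1, 2, 3 or 4 from the string ustr (replaces each with nothing). Note that, as pasted, the pattern has nine %s placeholders but only four values.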

    return re.sub(ur'[%s%s%s%s%s%s%s%s]' % (COMMA, SEMICOLON, QUESTION, ZERO,PERCENT, THOUSANDS, STAR,FULL_STOP), '', ustr)
    Replace any of the named characters with '' in the string ustr.
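
For anyone who wants to check the first call before porting it, here is a minimal Python 3 sketch. The helper name blank_ascii and the sample string are made up for illustration; the pattern itself is the one from the post.

```python
import re
import string

# Same character class as in the post: ASCII punctuation, digits,
# letters and whitespace, plus an explicit backslash.
PATTERN = r'[%s%s%s%s\\]' % (string.punctuation, string.digits,
                             string.ascii_letters, string.whitespace)


def blank_ascii(s):
    """Replace every character in the class with a single space."""
    return re.sub(PATTERN, ' ', s)


if __name__ == '__main__':
    # Characters outside the class (for example 'é') are left alone.
    print(repr(blank_ascii('héllo, 1!')))
```

The output has the same length as the input, since each matched character is swapped for exactly one space. A C++ port could probably do the same with a simple per-character loop instead of a regex.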
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2005
    Posts
    4
    Rep Power
    0
    Thank you very much. That's what I needed to know.

    Just a couple more:

    m1 = re.match(ur'^(\S\S)%s%s(\S)$' % (ALEF, ALEF), ustr)

    m = re.search(ur'^([%s]%s|[%s]%s|[%s]%s|[%s]%s|[%s]%s|[%s]%s|)(.+?)(%s|%s|%s|%s|%s|%s|%s|%s|%s|%s|%s[%s]|%s|%s|%s|%s|%s[%s]|[%s]|)$' % (
    WAW + FEH + BEH,
    ALEF + LAM,
    BEH + YEH + LAM + MEEM + TEH + WAW + SEEN + NOON)


    I will figure out the rest based on these.


    Thanks and best regards,
    R
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Nov 2003
    Posts
    624
    Rep Power
    35
    That is some of the ugliest 'Python' I've seen in a long while

    Plus, I'm surprised some of these aren't raising errors like "TypeError: not enough arguments for format string"...
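
A quick way to confirm that, as a small Python 3 sketch. The function name try_format is hypothetical; the format string is the nine-placeholder one from the second snippet.

```python
def try_format():
    """Format nine %s placeholders with only four values."""
    try:
        return '[%s%s%s%s%s%s%s%s%s]' % (1, 2, 3, 4)
    except TypeError as exc:
        # The % operator refuses to run when values are missing.
        return 'TypeError: %s' % exc


if __name__ == '__main__':
    print(try_format())
```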

    Code:
    m1 = re.match(ur'^(\S\S)%s%s(\S)$' % (ALEF, ALEF), ustr)
    Hmm.
    *Quickly checks the amazing The Regex Coach*

    The \S represents any non-whitespace character, so numbers, letters, punctuation, but not tabs, spaces, returns, etc.

    So, this matches a string starting (^) with two non-whitespace characters, followed by two of whatever "ALEF" contains, followed by another non-whitespace character and the end of the string (nothing else).
    It also stores the non-whitespace characters in two unnamed groups (the first two characters in group 1, the last one in group 2), which can be accessed with m1.group(n), or all at once with m1.groups().
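
A minimal Python 3 sketch of that match. The value of ALEF is an assumption (the thread never shows it), and match_m1 is a made-up helper name. Python 3 has no ur'' prefix, so a plain raw string is used.

```python
import re

# Assumption: ALEF is the Arabic letter alef (U+0627).
ALEF = '\u0627'


def match_m1(ustr):
    """Two non-whitespace chars, ALEF twice, one non-whitespace char."""
    return re.match(r'^(\S\S)%s%s(\S)$' % (ALEF, ALEF), ustr)


if __name__ == '__main__':
    m1 = match_m1('ab' + ALEF * 2 + 'c')
    print(m1.group(1), m1.group(2), m1.groups())
```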

    Code:
    m = re.search(ur'^([%s]%s|[%s]%s|[%s]%s|[%s]%s|[%s]%s|[%s]%s|)(.+?)(%s|%s|%s|%s|%s|%s|%s|%s|%s|%s|%s[%s]|%s|%s|%s|%s|%s[%s]|[%s]|)$' % (
    WAW + FEH + BEH,
    ALEF + LAM,
    BEH + YEH + LAM + MEEM + TEH + WAW + SEEN + NOON)
    As pasted, that can't work: the tuple holds only three values (each a concatenation such as WAW + FEH + BEH), while the format string has 20+ places, so it would raise the TypeError above. The call is also missing the string to search.

    [] denotes a character class, so any characters in it are matched as individual characters.
    | means "or"

    So that's a huge chain of (redundant?) things like:

    Match one of the WAW+FEH+BEH...[0] characters followed by the WAW+FEH+BEH...[1] string
    OR
    Match one of the WAW+FEH+BEH...[2] characters followed by the WAW+FEH+BEH...[3] string
    OR
    Match one of the WAW+FEH+BEH...[4] characters followed by the WAW+FEH+BEH...[5] string
    OR
    An empty string.

    and store it in a group.
    That is followed by as few characters as possible (the +? is non-greedy), but at least one. Store those in a second group.

    Followed by
    WAW+FEH+BEH...[n] string
    OR
    WAW+FEH+BEH...[n] string
    OR
    WAW+FEH+BEH...[n] string
    OR
    WAW+FEH+BEH...[n] string
    OR
    WAW+FEH+BEH...[n] string followed by WAW+FEH+BEH...[n] characters
    OR
    WAW+FEH+BEH...[n] string followed by WAW+FEH+BEH...[n] characters
    OR
    WAW+FEH+BEH...[n] string followed by WAW+FEH+BEH...[n] characters
    OR
    WAW+FEH+BEH...[n] string followed by WAW+FEH+BEH...[n] characters
    OR
    WAW+FEH+BEH...[n] characters.
    OR
    ...
    OR
    An empty string.

    and store those in the third group.


    Hilariously, after that, the string "a" would match (an empty string followed by one or more characters followed by an empty string).
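
The same shape can be tried out on a smaller scale. This is only a sketch: the Latin affixes below are hypothetical stand-ins for the Arabic constants, and split_affixes is a made-up name.

```python
import re

# Optional prefix, non-greedy middle, optional suffix.
# The trailing | in groups 1 and 3 is the "empty string" alternative.
PATTERN = re.compile(r'^(un|re|)(.+?)(ed|s|)$')


def split_affixes(word):
    """Return (prefix, middle, suffix), or None if nothing matches."""
    m = PATTERN.search(word)
    return m.groups() if m else None


if __name__ == '__main__':
    for word in ('unlocked', 'cats', 'a'):
        print(word, split_affixes(word))
```

Because the middle group is non-greedy, it grows one character at a time until one of the suffix alternatives fits before the end of the string.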
  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2005
    Posts
    4
    Rep Power
    0
    Thank you very very much. I didn't paste the whole statements for simplicity. But I get the idea of how to read them.


    Best regards,
    R.
  10. #6
  11. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2005
    Posts
    4
    Rep Power
    0
    Hello,

    One last one :-)

    What is the difference between this and that, i.e. between

    [%s%s]

    and

    [%s,%s]

    in a regular expression. Example:

    m2 = re.match(ur'^(\S)([%s,%s])(\S)%s(\S)$' % (TEH, YEH, ALEF),ustr)
    m4 = re.match(ur'^(\S)%s(\S)([%s%s])(\S)$' % (ALEF, YEH, WAW), ustr)


    Regards,
    R
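
One way to see the difference is to test both classes against a comma. Inside [...] a comma is an ordinary character, not a separator, so [%s,%s] accepts a literal comma as well. This is a minimal Python 3 sketch; the values of TEH and YEH are assumptions, since the thread never shows them.

```python
import re

# Assumptions: the Arabic letters teh (U+062A) and yeh (U+064A).
TEH = '\u062A'
YEH = '\u064A'

# Class with a comma between the two placeholders.
with_comma = re.compile('^[%s,%s]$' % (TEH, YEH))
# Class with the two placeholders side by side.
without_comma = re.compile('^[%s%s]$' % (TEH, YEH))

if __name__ == '__main__':
    print(bool(with_comma.match(',')))     # comma is part of the class
    print(bool(without_comma.match(',')))  # comma is not part of the class
```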
