#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012
    Posts
    2
    Rep Power
    0

    Simple Regular Expression searching for all two-letter clusters


    Hi, I have a string of random characters represented below:

    text = "iemndcidpwpkeejjslsdjnnvdjsskeelxkjwnsnejxx"

    I'm looking for a regular expression that will find all the two-letter character clusters within the above string. In other words, my ideal generated list would be as follows:

    ['ie','em','mn','nd',...] et cetera.

    My code so far is this:

    import re
    x=re.compile(r"..")
    x.findall(text)

    However, upon executing this, the resulting list generated is this:

    ['ie','mn','dc'...] etc.

    There is no overlap! I'm looking for a way to make my regular expression return every two-letter pair, including the ones that overlap.

    Any help would be greatly appreciated!

    Thanks so much.
  2. #2
  3. --
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Jul 2012
    Posts
    3,957
    Rep Power
    1045
    Hi,

    Using a regex to get substrings of a certain length is rather odd. That's not what regular expressions are for.

    If Python doesn't have a built-in method to get consecutive characters, simply use a "for" loop and extract each two-character substring.
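
    A minimal sketch of the loop described above, assuming Python 3 and the "text" string from the first post. The name "pairs" is only illustrative:

```python
text = "iemndcidpwpkeejjslsdjnnvdjsskeelxkjwnsnejxx"

pairs = []
# Stop one index short of the end so every slice is exactly two characters.
for i in range(len(text) - 1):
    pairs.append(text[i:i + 2])

print(pairs[:4])  # ['ie', 'em', 'mn', 'nd']
```

    Each step moves the start index by one, so neighbouring pairs share a character.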
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Feb 2004
    Location
    San Francisco Bay
    Posts
    1,939
    Rep Power
    1313
    As Jacques1 said, this isn't a job for regular expressions. Here's a one-liner that does what you want:
    Code:
    [text[i:i+2] for i in xrange(len(text)-1)]
    That's for Python 2. For Python 3, replace "xrange" with "range".
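
    If a regular expression is still wanted, one commonly used idiom is a zero-width lookahead wrapped around a capture group. Nobody in this thread suggested it, so take it as a sketch under that assumption rather than the recommended answer. The lookahead consumes no characters, so the search advances one position at a time and "findall" returns the captured group at each position. The names "overlapping" and "sliced" are illustrative only:

```python
import re

text = "iemndcidpwpkeejjslsdjnnvdjsskeelxkjwnsnejxx"

# (?=(..)) matches the empty string wherever two characters follow,
# and the inner group captures those two characters.
overlapping = re.findall(r"(?=(..))", text)

# The slicing one-liner from this post (Python 3 form), for comparison.
sliced = [text[i:i + 2] for i in range(len(text) - 1)]

print(overlapping[:4])        # ['ie', 'em', 'mn', 'nd']
print(overlapping == sliced)  # True
```

    Note that "." does not match a newline by default, so the two approaches would differ on multi-line input unless re.DOTALL is passed.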
