#1
    Regular expression only matches once


    Hi,

    I have a regular expression that I run against a web page. The pattern matches more than once in the page, but if I do:
    Code:
    found = rexTableEntries.search(page)
    if found:
        print found.groups()
    only the last match is printed out. The previous matches in the page are not printed.

    How can I change this so that all matches are printed? Any ideas?
#2
    Yes, search() only finds one match and then stops searching.
    You have to use a while loop that searches again from the end of the previous match and stops when search() returns None.
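A minimal sketch of that while-loop approach, assuming rexTableEntries is a compiled pattern (the original post calls .search on it). The pattern, the sample page, and the helper name search_all are made up for illustration; the real ones are not shown in the thread.

```python
import re

# Hypothetical pattern and page, for illustration only.
rexTableEntries = re.compile(r"<td>(.*?)</td>")
page = "<tr><td>alpha</td><td>beta</td><td>gamma</td></tr>"


def search_all(pattern, text):
    """Collect the groups of every match by calling search() repeatedly."""
    results = []
    pos = 0
    while pos <= len(text):
        # A compiled pattern's search() accepts a start position.
        found = pattern.search(text, pos)
        if found is None:
            break  # no further matches
        results.append(found.groups())
        # Continue after this match; step one character past an empty
        # match so the loop cannot stall at the same position.
        pos = found.end() if found.end() > found.start() else found.end() + 1
    return results


print(search_all(rexTableEntries, page))
# [('alpha',), ('beta',), ('gamma',)]
```

This works, but it reimplements by hand what findall and finditer already provide.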
#3
    Alternatively, from the re docs:

    findall( pattern, string[, flags])
    Return a list of all non-overlapping matches of pattern in string. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match. New in version 1.5.2. Changed in version 2.4: Added the optional flags argument.

    finditer( pattern, string[, flags])
    Return an iterator over all non-overlapping matches for the RE pattern in string. For each match, the iterator returns a match object. Empty matches are included in the result unless they touch the beginning of another match. New in version 2.2. Changed in version 2.4: Added the optional flags argument.
    Dave
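To make the quoted documentation concrete, here is a small sketch of how the return values differ. The sample page and both patterns are hypothetical, chosen only to show the one-group and multi-group cases.

```python
import re

# Hypothetical page and patterns, for illustration only.
page = "<tr><td>1</td><td>apple</td></tr><tr><td>2</td><td>pear</td></tr>"

one_group = re.compile(r"<td>(\d+)</td>")
two_groups = re.compile(r"<td>(\d+)</td><td>(\w+)</td>")

# One group in the pattern: findall returns a list of strings.
ids = one_group.findall(page)
print(ids)   # ['1', '2']

# More than one group: findall returns a list of tuples.
rows = two_groups.findall(page)
print(rows)  # [('1', 'apple'), ('2', 'pear')]

# finditer yields match objects, so positions are available as well.
names = []
for m in two_groups.finditer(page):
    names.append(m.group(2))
    print(m.groups(), m.span())
```

findall is the simpler choice when only the captured text is needed; finditer is useful when the match positions matter or the page is large enough that building the whole list at once is undesirable.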
#4
    Originally Posted by DevCoach
    Alternatively, from the re docs:
    Dave
    findall was what I was looking for.
    It works with the following code:
    Code:
    matchList = re.findall(rexTableEntries, page)
    
    for curMatch in matchList:
      print curMatch
    Thanks for the quick replies.
