#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2004
    Posts
    2
    Rep Power
    0

    regex replace with function


    I'm going to apologize in advance if any of this seems somewhat nonsensical. It's 3:20am and I've been up since 4:30am.

    With that out of the way, here is my problem.

    I'm using regular expressions in Python, and I would like to match a substring, pass that substring to a function, and then replace that substring with the value returned by the function.

    includeregex.sub(parseText(getFile("\1")), text)

    That probably gives a better idea of what I'm trying to do. Unfortunately, I cannot figure out what value I need instead of "\1" to stand for the matched value. It's passing what I believe is chr(1).

    If anyone could help me figure out this one, it would be helpful.
  2. #2
  3. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2004
    Posts
    17
    Rep Power
    0
    From what I can determine, you're trying to get all the \1's in a file. If you need a literal \1, then I think "\\1" would give you "\1". If this isn't what you're looking for, try breaking the problem into very simple parts and testing each part. In this case, the first thing to do is see what getFile("\1") does. It looks like you already know it's not returning what you want, so next you'd look into the getFile function and see how it works. Also look into general string manipulation: literals, raw strings, etc. HTH.
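    To make the point about literals and raw strings concrete, here is a small sketch of how Python reads "\1" in an ordinary literal versus a raw string. The sample string "user@host" and the variable names are made up for illustration.

```python
import re

# In an ordinary string literal, "\1" is an octal escape: one character, chr(1).
plain = "\1"
# In a raw string the backslash is kept, so this is two characters: '\' and '1'.
raw = r"\1"

print(len(plain), plain == chr(1))   # 1 True
print(len(raw), raw == "\\1")        # 2 True

# Passed directly as the replacement string, a raw "\1" is a backreference to group 1.
swapped = re.sub(r"(\w+)@(\w+)", r"\2@\1", "user@host")
print(swapped)                       # host@user
```

    Note that the backreference is only expanded by sub() itself. In a call like includeregex.sub(parseText(getFile("\1")), text), Python evaluates getFile("\1") before sub() ever runs, so getFile receives the literal string, not the matched text.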
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2004
    Posts
    2
    Rep Power
    0
    Here is what I'm trying to do.

    I open up a file and read the contents. Let's say the file has a line with <!-- include anotherfile.txt --> in it.

    The regex replaces <!-- include anotherfile.txt --> with the contents of anotherfile.txt. But, let's say anotherfile.txt has a line in it that says <!-- include yetanotherfile.txt -->.

    I basically want to be able to find those lines in the files, and grab the filename part. Then pass that filename part to a function (this is recursion here). When the function returns the contents of the file (with all the includes replaced), those contents have to replace the <!-- include filename.txt --> portion.

    Does that make it a bit easier to understand?
  6. #4
  7. Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    Originally Posted by Anthron
    I'm going to apologize in advance if any of this seems somewhat nonsensical. It's 3:20am and I've been up since 4:30am.
    I know the feeling all too well; it seems to come with the lifestyle: learn to program, lose as much sleep as you get.

    Anyway, since this is such a simple pattern there's not much point in using regular expressions IMO; here's an example function:

    Code:
    #!/usr/bin/env python

    def copyFile(fileName, intoList):
        # Append the lines of fileName to intoList, expanding include lines recursively.
        with open(fileName) as f:
            for line in f:
                # An include line looks like '<!-- include name.txt -->\n'.
                if line.startswith('<!-- include ') and line.endswith(' -->\n'):
                    # Slice off the 13-character prefix and the 5-character suffix.
                    intoList = copyFile(line[13:-5], intoList)
                else:
                    intoList.append(line.strip())
        return intoList

    if __name__ == '__main__':
        print('\n'.join(copyFile('testfile1.txt', [])))
    Note: there is a possibility that you could get yourself into an infinite loop here so be a little careful.

    If you really wanted to use regex for some reason then I would do something a little different, but this will work fine for what you want; if you want an example using regex, let me know.

    Hope this helps,

    Mark.
    programming language development: www.netytan.com Hula

  8. #5
  9. No Profile Picture
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Feb 2004
    Location
    London, England
    Posts
    1,585
    Rep Power
    1373
    It can be done with regex, but not in the way you were trying. From the docs:

    sub( pattern, repl, string[, count])
    <snip>
    If repl is a function, it is called for every non-overlapping occurrence of pattern. The function takes a single match object argument, and returns the replacement string. For example:

    >>> def dashrepl(matchobj):
    ...     if matchobj.group(0) == '-': return ' '
    ...     else: return '-'
    >>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
    'pro--gram files'
    So the solution will look something like this (WARNING: untested code ahead):
    Code:
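    import re

    INCLUDE_REGEX = re.compile(r'<!--\s*include\s+(.*?)\s*-->', re.DOTALL | re.MULTILINE)

    def replaceIncludes(text):
        # doReplace is called once per include marker; its return value replaces the marker.
        return INCLUDE_REGEX.sub(doReplace, text)

    def doReplace(matchObj):
        # group(1) is the filename captured between "include" and "-->".
        with open(matchObj.group(1)) as f:
            text = f.read()
        # Recurse so that includes inside the included file are expanded too.
        return replaceIncludes(text)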
    Unlike netytan's code, this should work even if the <!-- include ... --> marker is spread over more than one line, or sits on a line with other text.
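    A quick way to check that claim without touching the disk is to swap the file reads for a dictionary lookup. This is only a sketch: FAKE_FILES, its contents, and the name expand are made up for the test, and the pattern is the same one as above.

```python
import re

INCLUDE_REGEX = re.compile(r'<!--\s*include\s+(.*?)\s*-->', re.DOTALL | re.MULTILINE)

# Hypothetical stand-in for the file system: filename -> contents.
FAKE_FILES = {
    'inner.txt': 'INNER',
    'outer.txt': 'before <!-- include inner.txt --> after',
}

def expand(text):
    # sub() calls the function once per match and inserts whatever it returns.
    return INCLUDE_REGEX.sub(lambda m: expand(FAKE_FILES[m.group(1)]), text)

# Marker on a line with other text, with a nested include inside outer.txt.
print(expand('x <!-- include outer.txt --> y'))   # x before INNER after y

# Marker split across two lines.
print(expand('<!-- include\n   inner.txt -->'))   # INNER
```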

    As with netytan's code, this will get into an infinite loop if two files include each other. This can be avoided by keeping a global dictionary of the filenames as you include them and throwing an exception if a file has already been included.
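    One possible shape for that guard is sketched below. It is a variation on the idea rather than exactly what is described: instead of a global dictionary of every file ever included, it passes along a set of the files currently being expanded, so the same file can legitimately be included in two places without tripping the check. The names IncludeCycleError, expandFile, readFile, and readFromDisk are made up for this example.

```python
import re

INCLUDE_REGEX = re.compile(r'<!--\s*include\s+(.*?)\s*-->', re.DOTALL | re.MULTILINE)

class IncludeCycleError(Exception):
    """Raised when a file includes itself, directly or indirectly."""

def expandFile(fileName, readFile, active=None):
    # active holds the names of the files currently being expanded.
    if active is None:
        active = set()
    if fileName in active:
        raise IncludeCycleError('recursive include of %s' % fileName)
    active.add(fileName)
    try:
        return INCLUDE_REGEX.sub(
            lambda m: expandFile(m.group(1), readFile, active),
            readFile(fileName))
    finally:
        # Remove the name on the way out, so including the same file twice
        # in different places is not mistaken for a cycle.
        active.discard(fileName)

def readFromDisk(fileName):
    # Default reader: return the whole file as one string.
    with open(fileName) as f:
        return f.read()
```

    You would call it as expandFile('testfile1.txt', readFromDisk). Taking the reader as a parameter is just a convenience so the function can be tried against a dictionary instead of real files.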

    Dave - The Developers' Coach
