December 26th, 2004, 02:23 AM
regex replace with function
I'm going to apologize in advance if any of this seems somewhat nonsensical. It's 3:20am and I've been up since 4:30am.
With that out of the way, here is my problem.
I'm using regular expressions in Python, and I would like to match a substring, pass that substring to a function, and then replace that substring with the value returned from the function.
That probably gives a better idea of what I'm trying to do. Unfortunately, I cannot figure out what value I need instead of "\1" to stand for the matched value. What it's passing is, I believe, chr(1).
If anyone could help me figure out this one, it would be helpful.
December 26th, 2004, 03:13 PM
From what I can determine, you're trying to get all the \1's in a file. If it's a literal \1 you want, then I think "\\1" would give you "\1". If this isn't what you're looking for, try breaking the problem into super simple parts and testing each part. In this case, the first thing you'd do is see what getFile("\1") does. It looks like you already know it's not returning what you want, so then you'd look into the getFile function and see how it works. Also look into general string manipulation with literals, raw strings, etc. hth
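To make the point about literals and raw strings concrete, here is a small sketch of what Python does with "\1". The pattern and sample text are made up for illustration. Note that even a correctly escaped r"\1" only works as a replacement template; something like getFile(r"\1") would still receive the two-character string itself, because the call runs before the regex engine sees anything.

```python
import re

# In a normal string literal, "\1" is an octal escape: one character, chr(1).
plain = "\1"
# A raw string or a doubled backslash keeps the backslash and the digit.
raw = r"\1"
doubled = "\\1"

print(len(plain), plain == chr(1))  # 1 True
print(len(raw), raw == doubled)     # 2 True

# Used as a replacement template, r"\1" inserts group 1 of each match.
result = re.sub(r"<(\w+)>", r"[\1]", "<a> and <b>")
print(result)  # [a] and [b]
```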
December 26th, 2004, 03:26 PM
Here is what I'm trying to do.
I open up a file and read the contents. Let's say the file has a line with <!-- include anotherfile.txt --> in it.
The regex replaces <!-- include anotherfile.txt --> with the contents of anotherfile.txt. But, let's say anotherfile.txt has a line in it that says <!-- include yetanotherfile.txt -->.
I basically want to be able to find those lines in the files, and grab the filename part. Then pass that filename part to a function (this is recursion here). When the function returns the contents of the file (with all the includes replaced), those contents have to replace the <!-- include filename.txt --> portion.
Does that make it a bit easier to understand?
December 26th, 2004, 07:11 PM
I know the feeling too well. It seems to come with the lifestyle: learn to program, lose as much sleep as you get.
Originally Posted by Anthron
Anyway, since this is such a simple pattern there's not much point in using regular expressions IMO; here's an example function:
Note: there is a possibility that you could get yourself into an infinite loop here (for example, if two files include each other), so be a little careful.
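 #!/usr/bin/env python
 def copyFile(fileName, intoList):
     # Append the lines of fileName to intoList, expanding include lines recursively.
     with open(fileName) as f:
         for line in f:
             # An include must fill the whole line: <!-- include name.txt -->
             if line.startswith('<!-- include ') and line.endswith(' -->\n'):
                 # Drop the 13-character prefix and the 5-character suffix.
                 intoList = copyFile(line[13:-5], intoList)
             else:
                 intoList.append(line.strip())
     return intoList

 if __name__ == '__main__':
     print('\n'.join(copyFile('testfile1.txt', [])))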
If you really wanted to use regex for some reason then I would do something a little different, but this will work fine for what you want; if you want an example using regex let me know.
Hope this helps.
December 27th, 2004, 04:27 AM
It can be done with regex, but not in the way you were trying. From the docs:
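The quoted passage did not survive in this copy of the thread. The behavior being relied on, as the re module documents it, is that the replacement argument to re.sub may be a function: it is called once per match with the match object and its return value is substituted. A minimal sketch, with a made-up pattern and a hypothetical helper named shout:

```python
import re

def shout(match):
    # re.sub calls this once per match and substitutes the returned string.
    return match.group(1).upper()

text = "say {hello} to {world}"
result = re.sub(r"\{(\w+)\}", shout, text)
print(result)  # say HELLO to WORLD
```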
So the solution will look something like this (WARNING: untested code ahead):
Unlike netytan's code, this should work even if the <!-- include ... --> is spread over more than one line, or is on a line with other text.
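 import re
 INCLUDE_REGEX = re.compile(r'<!--\s*include\s+(.*?)\s*-->', re.DOTALL | re.MULTILINE)
 def replaceIncludes(text):  # name assumed; the original def line was lost
     return re.sub(INCLUDE_REGEX, doReplace, text)
 def doReplace(matchObj):  # re.sub calls this once per include it finds
     return replaceIncludes(open(matchObj.group(1)).read())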
As with netytan's code, this will get into an infinite loop if two files recursively import each other. This can be avoided by keeping a global dictionary of the filenames as you include them and throwing an exception if it has already been included.
Dave - The Developers' Coach
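The guard against recursive includes could look roughly like the sketch below. It is only one way to do it and makes a few assumptions that are not in the posts above: the names expand_file, do_replace and IncludeCycleError are invented here, a set passed down the call chain stands in for the suggested global dictionary, and included names are resolved relative to the including file. Tracking only the files currently being expanded means the same file can still be included twice in different places without raising.

```python
import os
import re

INCLUDE_REGEX = re.compile(r'<!--\s*include\s+(.*?)\s*-->', re.DOTALL)

class IncludeCycleError(Exception):
    """Raised when a file includes itself, directly or indirectly."""

def expand_file(path, active=None):
    # 'active' holds the files currently being expanded (the include chain).
    active = set() if active is None else active
    key = os.path.abspath(path)
    if key in active:
        raise IncludeCycleError("recursive include of " + path)
    active.add(key)
    try:
        with open(path) as f:
            text = f.read()
        base = os.path.dirname(key)

        def do_replace(match):
            # Assumption: names are relative to the including file.
            return expand_file(os.path.join(base, match.group(1)), active)

        return INCLUDE_REGEX.sub(do_replace, text)
    finally:
        # Done with this file; it may be included again elsewhere.
        active.discard(key)
```

Calling expand_file('testfile1.txt') would return the fully expanded text, or raise IncludeCycleError as soon as a file turns up in its own include chain.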