July 12th, 2004, 11:52 AM
-
Need help with re module
Could anyone write a quick regular expression script to extract anything in between some sort of tag? For example, <title>this needs to become a python var</title> ... thanks!
July 12th, 2004, 03:45 PM
-
There are two answers to this question: (a) yes, it is trivial, and (b) no, it is virtually impossible.
If you do not care about nested tags then the answer is trivial:
regex = r'<title>(.*?)</title>'
will match a <title> element, and the parenthesised group captures just the text between the tags. However, if there is a possibility that tags will be nested, then it cannot be done with a single regex (as far as I know) and needs more complex parsing.
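For what it's worth, here is a minimal sketch of how that kind of pattern could be used to get the text into a Python variable. The sample string and the names html, title_re and title are made up for illustration. It adds a capturing group so that only the text between the tags is returned, and it assumes the tags are not nested.

```python
import re

# Hypothetical sample input; any string containing a <title> element would do.
html = "<html><head><title>this needs to become a python var</title></head></html>"

# The parentheses capture only the text between the tags.
# re.DOTALL lets '.' match newlines, re.IGNORECASE also accepts <TITLE>.
title_re = re.compile(r"<title>(.*?)</title>", re.DOTALL | re.IGNORECASE)

match = title_re.search(html)
# search() returns None when nothing matches, so guard before calling group().
title = match.group(1) if match else None
print(title)
```

If you need every occurrence rather than just the first, title_re.findall(html) returns a list of the captured strings.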
Dave - The Developers' Coach
July 12th, 2004, 06:59 PM
-
And another caveat: if you're going to be doing heavy HTML/XML/SGML parsing, you should use the standard Python libraries for doing so. Parsing via regular expressions is often not the best solution, simply because the regexes themselves tend to get pretty ugly. Simple string processing and/or *ML parsing libs like the ones mentioned above are often better solutions.
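As a rough illustration of the library route, here is a sketch using the standard html.parser module from current Python 3 (older releases shipped it as the HTMLParser module). The TitleExtractor class and the extract_title helper are hypothetical names, not part of the standard library, and this is only one way to do it.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text found inside <title> elements (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        # The parser passes tag names in lower case.
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only keep text seen while inside a <title> element.
        if self._in_title:
            self.title_parts.append(data)


def extract_title(document):
    """Return the title text of an HTML string, or '' if there is none."""
    parser = TitleExtractor()
    parser.feed(document)
    parser.close()
    return "".join(parser.title_parts)


print(extract_title("<html><head><title>Hello World</title></head></html>"))
```

It is longer than the regex, but the parser deals with attributes, letter case and surrounding markup for you, so the matching logic itself stays simple.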