August 1st, 2005, 01:17 PM
Can't ignore whitespace?
Is there really no built-in way for Python to automatically ignore whitespace in parsed XML files? All I need is a simple method, attribute or parameter called 'ignoreWhitespace' or something. Do I really have to write my own function that recursively strips out every piece of whitespace each time an XML file gets loaded? Say it ain't so! I'm currently using xml.dom.minidom, but no luck there. Are there other XML handling modules that do have this functionality?
If so, please let me know!
August 2nd, 2005, 11:59 AM
I've never done more than the most basic XML parsing in Python, so I'm not quite sure which modules, if any, handle this. What whitespace are you trying to strip out?
Also: why do you have to strip the spaces recursively? Python's built-in replace() method for strings should work fine:
lines = [line.replace(' ', '') for line in open('sample.txt')]
This should give you a nice long list with all the spaces stripped out of each line. Alternatively, you could do the replace() once and then split the file into lines using str.splitlines().
Hope this helps,
August 2nd, 2005, 12:03 PM
If you don't want the file as a list, which I assumed above, then you can simply replace() all the spaces in the file once. If you need a file object, you can use the StringIO module.
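A minimal sketch of that idea, assuming a current Python where the file-like wrapper lives at io.StringIO. The sample string stands in for the contents of the file and is made up for illustration:

```python
import io

# Stand-in for open('sample.txt').read(); the contents are an assumption.
text = "a b\nc  d\n"

# Remove every space character in one pass over the whole string.
stripped = text.replace(' ', '')

# Wrap the result so it can be passed to code that expects a file object.
file_like = io.StringIO(stripped)
lines = file_like.readlines()
```

Note that replace(' ', '') removes every space, including the ones between a tag name and its attributes, so on real XML this would change more than just the indentation.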
August 2nd, 2005, 12:48 PM
Thanks for the help!
About your question:
I load my structure from an XML file using xml.dom.minidom.parse("filename.xml"). Let's say the file contains a <root> element with a single <child> element inside it, indented on its own line.
Now, you would think that the <root> element has only one child: <child>. But according to Python it has three:
1. text node (\n\t)
2. element node (<child>)
3. text node (\n)
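The behaviour can be reproduced with a short script. The XML string below is an assumption (the original sample was not preserved in the thread); it is simply the smallest document that matches the three nodes listed above:

```python
from xml.dom import minidom

# Assumed sample document: a root with one child on its own, tab-indented line.
xml_text = "<root>\n\t<child/>\n</root>"

doc = minidom.parseString(xml_text)
root = doc.documentElement

# Print each child: text nodes are named '#text', elements carry their tag name.
for node in root.childNodes:
    print(node.nodeName, repr(node.nodeValue))

kinds = [node.nodeName for node in root.childNodes]
```

Here kinds contains three entries, with the <child> element sitting between two whitespace-only text nodes.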
So that's the whitespace I'm talking about: the spaces used to make the XML file readable. Normally you can tell the XML handler to ignore these whitespace nodes, but apparently that is not an option here. So now, when I want to loop through a node's child nodes, I either have to check every time whether each one is really an element, or strip out all the whitespace nodes after loading the file. Both solutions add unnecessary overhead when handling large XML files. I know there are a dozen ways to deal with whitespace, like the one you provided, but I was just wondering whether there was a setting that ignores whitespace automatically.
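For reference, both workarounds described above could look roughly like this. It is only a sketch: strip_whitespace_nodes is a hypothetical helper name, not something minidom provides, and the sample XML is assumed:

```python
from xml.dom import Node, minidom


def strip_whitespace_nodes(node):
    """Remove whitespace-only text nodes below `node`, in place (hypothetical helper)."""
    # Iterate over a copy, because removeChild() mutates node.childNodes.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            strip_whitespace_nodes(child)


# Assumed sample document; the text inside <child> must survive the cleanup.
xml_text = "<root>\n\t<child>some text</child>\n</root>"
doc = minidom.parseString(xml_text)

# Option 1: leave the tree alone and filter for elements while looping.
elements = [n for n in doc.documentElement.childNodes
            if n.nodeType == Node.ELEMENT_NODE]

# Option 2: strip the whitespace nodes once, right after loading.
strip_whitespace_nodes(doc.documentElement)
remaining = [n.nodeName for n in doc.documentElement.childNodes]
```

Option 2 walks the tree once per load, after which every loop over childNodes sees only real content. Text that contains anything other than whitespace is left untouched.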
August 2nd, 2005, 02:10 PM
All I can say is: strange
Could you show us the code you use to parse this?