June 30th, 2004, 04:20 PM
How to remove blank lines in an XML file before parsing!
I am trying to parse an XML file using minidom.parse.
- I want to remove any blank lines at the beginning of the XML file before I start parsing, so that my root element is always on the first line.
Currently, parsing works fine if there are no blank lines at the top, that is, before the root element, but if I put any blank line before the first element the parser doesn't work.
- Also, is there a way to remove all the blank lines in the XML file before parsing?
Probably the best way to do this would be to iterate through the file and write only the non-blank lines to a temporary file, then pass the name of this temporary file (or the file object itself) to the parse() method.
This can be done as easily as this:
Note: This hasn't been tested and is here to illustrate the idea only, though it should work.
import os, random
from xml.dom import minidom
# Create a random name for the temp file.
path = ''.join(random.sample('temporary_file', 5))
# Create a new temp file to write to.
temp = open(path, 'w')
for line in open('base.xml'):
    # Iterate over each line in the file and if the
    # line is not blank then write it to the temp file.
    if line.strip():
        temp.write(line)
# Close the temp file.
temp.close()
# Parse the temp file using 'path' as the file name.
doc = minidom.parse(path)
# Finally remove the temp file.
os.remove(path)
But you can also create temporary files using the tempnam() function in the os module (Python 2 only; it was removed in Python 3), or by using the tempfile module, though it is just as simple to use random in this case.
In fact, it would not be hard at all to create a temporary file object that could be incredibly easy to use.
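For what it's worth, here is a rough sketch of the same idea using the tempfile module instead of random. The function name parse_without_blank_lines is made up for illustration. The files are opened in binary mode so the sketch does not have to guess the encoding, which assumes the file uses an ASCII-compatible encoding such as UTF-8.

```python
import os
import tempfile
from xml.dom import minidom


def parse_without_blank_lines(xml_path):
    """Copy the non-blank lines of xml_path to a temp file, then parse it.

    Hypothetical helper, not part of minidom or tempfile.
    """
    # delete=False so the file can be reopened by name after it is closed.
    with tempfile.NamedTemporaryFile('wb', suffix='.xml', delete=False) as temp:
        with open(xml_path, 'rb') as source:
            for line in source:
                # A line holding only whitespace strips down to b''.
                if line.strip():
                    temp.write(line)
    try:
        return minidom.parse(temp.name)
    finally:
        # Remove the temp file whether or not parsing succeeded.
        os.remove(temp.name)
```

Calling parse_without_blank_lines('base.xml') should then give you the same Document object that minidom.parse() would return for the cleaned-up file.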
Last edited by netytan; July 1st, 2004 at 01:58 PM.
import re
print(''.join(re.split(r'[\n]+\s*', open('base.xml').read())).strip())
The disadvantage of this is that you're reading the whole file into memory and performing an action on it. If the file is large then this isn't going to be a good thing. For small files it's fine, but use file iterators where possible. In this case you can also avoid the (slight) overhead of importing and using regex.
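Something along these lines is what I mean by using the file iterator: a small generator that hands back one non-blank line at a time, with no regex and without holding the whole file in memory. The name nonblank_lines is just an example name.

```python
def nonblank_lines(path):
    """Yield each line of the file at path that is not blank.

    Hypothetical helper; the file object is iterated lazily, so only
    one line is held in memory at a time.
    """
    with open(path) as source:
        for line in source:
            # strip() returns '' for a line of only whitespace, which is falsy.
            if line.strip():
                yield line
```

You could then loop over nonblank_lines('base.xml') and write each line wherever you need it, for example into the temporary file from the earlier post.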
Why not just make changes to the XML file? Surely, if it's not being treated as valid XML then you want to make it so. So you could write the non-blank lines back into the original file and be done with it.
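A minimal sketch of writing the non-blank lines back, assuming it is acceptable to replace the original file. The cleaned copy is written to a temp file in the same directory first and only then swapped in, so a crash halfway through does not leave you with a half-written XML file. The name strip_blank_lines_in_place is made up for this example.

```python
import os
import tempfile


def strip_blank_lines_in_place(path):
    """Rewrite the file at path so it keeps only its non-blank lines.

    Hypothetical helper, shown only to illustrate the idea.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so the final swap
    # happens within one directory.
    fd, temp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'wb') as temp, open(path, 'rb') as source:
            for line in source:
                if line.strip():
                    temp.write(line)
        # Replace the original file with the cleaned copy.
        os.replace(temp_path, path)
    except BaseException:
        # Something went wrong before the swap: discard the partial copy.
        os.remove(temp_path)
        raise
```

One thing to keep in mind is that the new file is created by mkstemp(), so it may not have the same permissions as the original.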
Last edited by netytan; July 2nd, 2004 at 01:53 AM.