#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2004
    Posts
    101
    Rep Power
    11

    How to remove blank lines in an XML file before parsing!


    Hi,
    I am trying to parse an XML file using the minidom.parse method.
    • I want to remove any blank lines at the beginning of the XML file before I start parsing, so that my root element is always on the first line.
    • Also, is there a way to remove all the blank lines in the XML file before parsing?
    Currently parsing works fine if there are no blank lines at the top, i.e. before the root element, but if I put any blank line before the first element the parser doesn't work.

    Thanks
  2. #2
  3. Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    Probably the best way to do this would be to iterate through the file and write only the non-blank lines to a temporary file, then pass the name of this temporary file (or the file object itself) to the parse() method.

    This can be done as easily as this:

    Code:
    #!/usr/bin/env python
    
    import os, random
    
    #Create a random name for the temp file: five characters
    #sampled from the string, joined into a single name.
    path = ''.join(random.sample('temporary_file', 5))
    #Create a new temp file to write to.
    temp = open(path, 'w')
    
    for line in open('base.xml'):
        #Iterate over each line in the file and if the
        #line is not blank then write it to the temp
        #file.
        if line.strip():
            temp.write(line)
    #close the temp file.
    temp.close()
    
    #...
    #Parse the temp file using 'path' as the file name.
    #...
    
    #Finally remove the temp file.
    os.remove(path)
    Note: This hasn't been tested and is here to illustrate the idea only, though it should work.

    You can also create temporary files using the tempnam() function in the os module, or by using the tempfile module, though it is just as simple to use random in this case.
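
For the tempfile route, something along these lines might do. It is only a sketch: parse_without_blank_lines is a made-up helper name, the syntax is Python 3, and the files are handled as bytes on the assumption that the XML uses an ASCII-compatible encoding such as UTF-8. The module picks the file name itself, so there is no risk of overwriting an existing file:

```python
import os
import tempfile
from xml.dom import minidom


def parse_without_blank_lines(source_path):
    """Copy the non-blank lines of source_path to a temp file and parse that.

    Hypothetical helper for illustration, not part of any library.
    """
    # delete=False so the file survives close() and can be reopened by name.
    with tempfile.NamedTemporaryFile('wb', suffix='.xml', delete=False) as temp:
        with open(source_path, 'rb') as source:
            for line in source:
                # A line holding only whitespace strips down to b''.
                if line.strip():
                    temp.write(line)
    try:
        return minidom.parse(temp.name)
    finally:
        # Remove the temp file whether or not parsing succeeded.
        os.remove(temp.name)
```

Calling parse_without_blank_lines('base.xml') would then return the parsed document, with the temp file already cleaned up.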

    http://www.python.org/doc/2.3.4/lib/...-tempfile.html
    http://www.python.org/doc/2.3.4/lib/module-os.html
    http://www.python.org/doc/2.3.4/lib/module-random.html

    In fact, it would not be hard at all to create a temporary file object that is incredibly easy to use.

    Have fun,

    Mark.
    Last edited by netytan; July 1st, 2004 at 01:58 PM.
    programming language development: www.netytan.com Hula

  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2004
    Posts
    10
    Rep Power
    0

    Try:

    import re
    print(''.join(re.split(r'[\n]+\s*', open('base.xml').read())).strip())
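
If the whole file is being read into memory anyway, the text could go straight to minidom.parseString() without touching the disk. A minimal sketch, using a plain line filter rather than the regex above (swapped in because joining everything onto one line can glue together tokens that were separated only by a newline, such as a tag name and an attribute on the next line). The name parse_xml_text and the sample document are made up for illustration, and the syntax is Python 3:

```python
from xml.dom import minidom


def parse_xml_text(text):
    """Drop blank lines from an XML string and parse what is left."""
    # splitlines(True) keeps each line's newline, so tokens that were
    # separated only by a line break stay separated after joining.
    kept = [line for line in text.splitlines(True) if line.strip()]
    return minidom.parseString(''.join(kept))


# Blank lines before the XML declaration are what normally trip the parser.
sample = '\n\n<?xml version="1.0"?>\n\n<root>\n  <item\n    name="a"/>\n</root>\n'
doc = parse_xml_text(sample)
print(doc.documentElement.tagName)
```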
  6. #4
  7. Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    The disadvantage of this is that you're reading the whole file into memory and performing an action on it. If the file is large then this isn't going to be a good thing. For small files it's fine, but use file iterators where possible. In this case you can also avoid the (slight) overhead of importing and using regex.

    Why not just make changes to the XML file? Surely, if it's not being treated as valid XML then you want to make it so. So you could write the non-blank lines back into the original file and be done with it.
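
Writing the cleaned lines back over the original could look something like this. It's only a sketch: strip_blank_lines is a made-up name, the syntax is Python 3, and it goes through a temp file in the same directory plus os.replace() so that a crash midway doesn't leave a half-written XML file behind:

```python
import os
import tempfile


def strip_blank_lines(path):
    """Rewrite the file at path with its blank lines removed."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so the final rename
    # stays on the same filesystem.
    fd, temp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'wb') as temp, open(path, 'rb') as source:
            # Iterating the file yields one line at a time, so the whole
            # document is never held in memory.
            for line in source:
                if line.strip():
                    temp.write(line)
        # Swap the cleaned copy into place over the original.
        os.replace(temp_path, path)
    except BaseException:
        # Something went wrong before the swap: discard the partial copy.
        os.remove(temp_path)
        raise
```

One thing to watch: the replacement file gets the restrictive permissions mkstemp() gives it, so copy them over with shutil.copymode() before the swap if that matters.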

    Take care,

    Mark.
    Last edited by netytan; July 2nd, 2004 at 01:53 AM.

