#1
  1. Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2005
    Location
    Amsterdam, the Netherlands
    Posts
    12
    Rep Power
    0

    Can't ignore whitespace?


    Is there really no built-in way for Python to automatically ignore whitespace in parsed XML files? All I need is a simple method, attribute or parameter called 'ignoreWhitespace' or something similar. Do I really have to write my own function that recursively strips out every piece of whitespace each time an XML file is loaded? Say it ain't so! I'm currently using xml.dom.minidom, but no luck there. Are there other XML-handling modules that do have this functionality?

    If so, please let me know!
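For reference, the recursive clean-up the question hopes to avoid might look roughly like the sketch below. It assumes xml.dom.minidom and runs once after parsing; `strip_whitespace_nodes` is a hypothetical helper name, not part of minidom.

```python
from xml.dom import minidom

def strip_whitespace_nodes(node):
    """Recursively remove text nodes that contain only whitespace.

    Hypothetical helper: minidom itself has no 'ignoreWhitespace' option.
    """
    # Iterate over a copy, because removeChild() mutates node.childNodes.
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        elif child.hasChildNodes():
            # Descend into elements that still have children of their own.
            strip_whitespace_nodes(child)
    return node

doc = minidom.parseString("<root>\n\t<child />\n</root>")
strip_whitespace_nodes(doc.documentElement)
print(len(doc.documentElement.childNodes))  # → 1
```

Be aware that this also deletes whitespace that may be meaningful in mixed content, such as the single space in `<p><b>a</b> <i>b</i></p>`.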
  2. #2
  3. Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    I've never done more than the most basic XML parsing in Python, so I'm not quite sure which modules, if any, handle this. What whitespace are you trying to strip out?

    Also, why do you have to strip the spaces recursively? Python's built-in str.replace() method should work fine.

    Code:
    with open('sample.txt') as f:
        lines = [line.replace(' ', '') for line in f]
    This should give you a nice long list with all the spaces stripped out of each line. Alternatively, you could do the replace() once and then split the file into lines using str.splitlines().
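Both variants described above can be compared on a small in-memory string; the sample text below is an assumption standing in for 'sample.txt'. One caveat: replace(' ', '') removes every space, including the one between a tag name and its attributes, so a line like `<?xml version="1.0"?>` would become the malformed `<?xmlversion="1.0"?>`.

```python
# Hypothetical contents standing in for 'sample.txt'.
text = "<root>\n    <child />\n</root>\n"

# Variant 1: strip spaces line by line, as in the list comprehension above.
lines = [line.replace(' ', '') for line in text.splitlines()]

# Variant 2: call replace() once on the whole text, then split into lines.
lines_once = text.replace(' ', '').splitlines()

print(lines)                # ['<root>', '<child/>', '</root>']
print(lines == lines_once)  # True
```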

    Hope this helps,

    Mark.
    programming language development: www.netytan.com Hula

  4. #3
    If you don't want the file as a list, which I assumed above, you can simply call replace() once on the whole contents of the file. If you then need a file object, you can use the StringIO module.
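A minimal sketch of the replace-once-then-StringIO suggestion, assuming Python 3's io.StringIO and a hypothetical in-memory sample. Parsing the result with minidom also shows that removing spaces alone does not remove the whitespace text nodes, because the newlines are still there.

```python
import io
from xml.dom import minidom

# Hypothetical file contents; in practice these would come from open(...).read().
text = "<root>\n    <child />\n</root>\n"

# Call replace() once on the whole string...
cleaned = text.replace(' ', '')

# ...then wrap it in an in-memory file object for code that expects one.
fileobj = io.StringIO(cleaned)

# minidom.parse() accepts a file object; the '\n' text nodes survive.
doc = minidom.parse(fileobj)
print(len(doc.documentElement.childNodes))  # → 3
```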

    Mark.

  6. #4
    Thanks for the help!

    About your question:
    What white space are you trying to strip out?
    I load my structure from an XML file using xml.dom.minidom.parse("filename.xml"). Let's say the XML in this file looks like this:
    Code:
    <?xml version="1.0"?>
    <root>
        <child />
    </root>
    Now, you would think that the <root> element has only one child: <child>. But according to Python, it has three:

    1. text node (\n\t)
    2. Element(<child>)
    3. text node (\n)
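The three child nodes listed above can be reproduced with minidom.parseString(); the sample assumes a tab indents `<child />`, matching the `\n\t` in the list.

```python
from xml.dom import minidom

# The sample document from the post, assuming a tab before <child />.
xml_text = '<?xml version="1.0"?>\n<root>\n\t<child />\n</root>'
doc = minidom.parseString(xml_text)

children = doc.documentElement.childNodes
print(len(children))  # → 3
for node in children:
    if node.nodeType == node.TEXT_NODE:
        print('text node', repr(node.data))
    else:
        print('element', node.tagName)
```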

    So that's the whitespace I'm talking about: the spaces used to make the XML file readable. Normally you can tell the XML handler to ignore these whitespace nodes, but apparently that is not an option here. So now, whenever I loop through a node's childNodes, I either have to check every time whether each child is an actual element, or strip all the whitespace out of the file first. Both solutions lead to an unnecessary performance hit when handling large XML files. I know there are a dozen ways to deal with whitespace, like the one you provided, but I was wondering whether there was a setting that ignores whitespace automatically.
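The "check whether it's really an element" option can be wrapped in a small generator so the loop body stays clean. This is a sketch assuming xml.dom.minidom; `element_children` is a hypothetical helper, not a minidom API. The per-node check is a single integer comparison, so it is likely cheap compared with parsing itself.

```python
from xml.dom import minidom

def element_children(node):
    """Yield only element children, skipping whitespace text nodes.

    Hypothetical helper; minidom has no built-in filter like this.
    """
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            yield child

doc = minidom.parseString("<root>\n\t<child />\n\t<child />\n</root>")
names = [el.tagName for el in element_children(doc.documentElement)]
print(names)  # → ['child', 'child']
```

Unlike getElementsByTagName(), which searches all descendants, this only looks at direct children.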
  8. #5
  9. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2004
    Posts
    394
    Rep Power
    51
    Hi!

    All I can say is: strange.
    Could you show us the code you use to parse this?

    Regards, mawe
