#1
  1. No Profile Picture
    Contributing User
    Devshed Loyal (3000 - 3499 posts)

    Join Date
    Jul 2003
    Posts
    3,464
    Rep Power
    594

    XML::Simple String Size Limit?


    I am trying to parse some XML output from a command, and XMLin never finishes (I gave up after more than 12 hours), so my guess is that it is stuck in an endless loop. Admittedly, the string I am trying to parse is very large (114,156,848 bytes). Is there some limit on the size of the string XMLin can handle? Either way, can someone suggest a way to resolve this? Would piping the XML to a file first make any difference? TIA.
    There are 10 kinds of people in the world. Those that understand binary and those that don't.
  2. #2
  3. !~ /m$/
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    May 2004
    Location
    Reno, NV
    Posts
    4,259
    Rep Power
    1810
    How do you know whether your file is too big for XML::Simple to handle? The rule of thumb is that XML expands by a factor of ten when read into memory. The implication is that if you have a few hundred megabytes of free memory on your workstation, XML::Simple should be able to handle XML files that are up to a few tens of megabytes in size.
    Source: IBM.com

    While your XML file isn't impossibly large, you are having to read the entire thing into memory and, at the same time, build a single Perl data structure that contains all the same data. Yes, that is going to be big, and it's the wrong approach.

    When processing an XML file this size, the best approach is to use a stream-based parser, generally one built on SAX.

    It's a different sort of programming experience. The parser is event-based, so you register handlers for certain events and react to them as they fire.

    Here's an old article with one example: xml.com

    I don't know if XML::Parser is still the best choice for Perl; I've done this job more recently in other languages.

    Once you have received the data you are looking for, you usually save it at that point and then clear memory before the next event. You don't try to hold the entire file's worth of data in memory.

    So if the point of loading the entire XML file is to search over the whole data set, either you limit your data collection to just the needed portion (an amount that will fit in memory), or you load the data into an indexed database that can perform the selections and comparisons for you.
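    The event-based approach described above can be sketched with Perl's XML::Parser in handler style. This is a minimal illustration, not the poster's actual code; the element names (`record`, `name`) are hypothetical stand-ins for whatever your XML contains. Note that each value is saved and the buffer cleared as soon as the closing tag fires, so memory use stays flat:

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Parser;

    my @names;            # values harvested so far
    my $buffer  = '';     # text of the element currently being read
    my $in_name = 0;      # are we inside a <name> element?

    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub {
                my ($expat, $elem) = @_;
                $in_name = 1 if $elem eq 'name';
            },
            Char => sub {
                my ($expat, $text) = @_;
                $buffer .= $text if $in_name;
            },
            End => sub {
                my ($expat, $elem) = @_;
                if ($elem eq 'name') {
                    push @names, $buffer;  # save it now...
                    $buffer  = '';         # ...then free the buffer
                    $in_name = 0;
                }
            },
        },
    );

    # For a real 100 MB file you would use $parser->parsefile('huge.xml');
    $parser->parse('<records><record><name>alpha</name></record>'
                 . '<record><name>beta</name></record></records>');
    print join(',', @names), "\n";   # alpha,beta
    ```

    The point is that only the handful of values you asked for survive the parse; the rest of the document streams through and is discarded.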
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Loyal (3000 - 3499 posts)

    Join Date
    Jul 2003
    Posts
    3,464
    Rep Power
    594
    Thanks for the reply. I don't know that an event-based parser is suitable for my script, although admittedly I have not looked at one yet. I don't want to overcomplicate this. All I need is to sequentially examine the nodes and process certain ones, with their sub-nodes, if they exist. Would using 'XMLSimple_Load_File' be just as bad, memory wise?
    There are 10 kinds of people in the world. Those that understand binary and those that don't.
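    For what it's worth, the "sequentially examine the nodes and process certain ones with their sub-nodes" pattern the poster describes maps naturally onto XML::Twig, which wraps a stream parser but hands each matching element to you as a small tree you can query, then lets you discard it. A sketch, assuming hypothetical `<host>`/`<address>` element names:

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Twig;

    my $twig = XML::Twig->new(
        twig_handlers => {
            # Called once per <host> element, with its sub-nodes intact.
            host => sub {
                my ($t, $elem) = @_;
                my $addr = $elem->first_child_text('address');
                print "$addr\n" if $addr;
                $t->purge;   # free everything parsed so far
            },
        },
    );

    $twig->parsefile('huge.xml');
    ```

    Because purge() releases each element after its handler runs, memory stays proportional to one node (and its sub-nodes) rather than to the whole 114 MB file.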
  6. #4
  7. !~ /m$/
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    May 2004
    Location
    Reno, NV
    Posts
    4,259
    Rep Power
    1810
    Would using 'XMLSimple_Load_File' be just as bad, memory wise?
    What is that? A method? Do you have a link?

    I expect it would be, though, since the only two options are to load the entire XML file into memory or to use a stream parser to pick out the parts you want. And using a stream parser means writing the handlers that define what you want.
  8. #5
  9. No Profile Picture
    Contributing User
    Devshed Loyal (3000 - 3499 posts)

    Join Date
    Jul 2003
    Posts
    3,464
    Rep Power
    594
    Rats. I was in a hurry and I'm working on multiple projects. That was a PHP function, not Perl. Sorry. I guess I may be stuck with your suggestion.
    Last edited by gw1500se; November 28th, 2012 at 12:55 PM.
    There are 10 kinds of people in the world. Those that understand binary and those that don't.
