#1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2013
    Posts
    7
    Rep Power
    0

    Dump Using XML::TWIG


    Hi,

    I am using the XML::Twig module to parse an XML file.

    I have started seeing an issue where Perl dumps core when there is an empty line in the middle of the XML file.

    I collected the lines around the point where it dumps core, and it seems the empty line is causing the dump. After that, the rest of the file is not parsed.

    Has anyone seen this, or can anyone suggest a workaround?

    Thanks-
#2
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    776
    Rep Power
    495
    There must be a better solution within XML::Twig, but if worst comes to worst, you could simply preprocess your XML file and remove the empty lines.
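    A minimal sketch of that preprocessing step in plain Perl. The helper name strip_blank_lines and the output file name are hypothetical. It reads one line at a time, so it does not load a large file into memory. Blank lines between elements are normally legal XML whitespace, so treat this as a workaround; it would also drop blank lines that sit inside text content.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in_path to $out_path, dropping lines that are
# empty or contain only whitespace. Streams line by line to keep memory flat.
sub strip_blank_lines {
    my ($in_path, $out_path) = @_;
    open my $in,  '<', $in_path  or die "Cannot read $in_path: $!";
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";
    while (my $line = <$in>) {
        next if $line =~ /^\s*$/;    # skip blank / whitespace-only lines
        print {$out} $line;
    }
    close $out or die "Cannot close $out_path: $!";
    close $in;
    return 1;
}
```

    Usage: strip_blank_lines('dic.xml', 'dic_clean.xml'); and then point the parser at the cleaned copy.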
#3
    !~ /m$/
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    May 2004
    Location
    Reno, NV
    Posts
    4,221
    Rep Power
    1809
    XML::Twig

    Surviving an untimely death
    XML parsers are supposed to react violently when fed improper XML. XML::Parser just dies.

    XML::Twig provides the safe_parse and the safe_parsefile methods which wrap the parse in an eval and return either the parsed twig or 0 in case of failure.
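    A minimal sketch of checking that return value; the helper name parse_or_report is hypothetical. According to the XML::Twig documentation, $@ holds the error message when safe_parsefile fails. Keep in mind that an eval only traps a Perl-level die: if the process is getting a genuine segmentation fault (a real core dump) inside the C parser underneath, safe_parsefile cannot catch it.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: parse $file and return the twig on success,
# or report the error and return nothing on failure.
sub parse_or_report {
    my ($file) = @_;
    my $twig = XML::Twig->new();
    # safe_parsefile wraps parsefile in an eval: it returns the twig on
    # success and 0 on failure, leaving the error message in $@.
    if ($twig->safe_parsefile($file)) {
        return $twig;
    }
    warn "Could not parse $file: $@";
    return;
}
```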
#4
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2013
    Posts
    7
    Rep Power
    0
    Thank you.

    How can I use the logic of safe_parse in my current script?
    Could you give some sample code?
#5
    !~ /m$/
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    May 2004
    Location
    Reno, NV
    Posts
    4,221
    Rep Power
    1809
    You didn't provide your current script, so I can only guess.

    Code:
    my $twig=XML::Twig->new();
    $twig->safe_parse($xml);   # $xml is a string (or filehandle); or
    $twig->safe_parsefile( 'doc.xml');
    Or you could be calling it as a class method. I don't know.

    Somewhere in your code you are calling a parse or parsefile method. Just replace that with the safe version.
#6
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2013
    Posts
    7
    Rep Power
    0
    Thank you,

    I do not think the issue is related to the empty line in the file. It seems the file is parsed up to a certain size and then it crashes. The interesting thing is that the same code worked for 6 months with no issues.

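    The code I have now:

    Code:
    use strict;
    use warnings;
    use XML::Twig;   # module names are case-sensitive: XML::Twig, not XML::TWIG

    # Placeholder output file; the original post does not show where MYFILE is opened.
    open(MYFILE, '>', 'out.txt') or die "Cannot open out.txt: $!";

    my $twig = XML::Twig->new( twig_handlers => { Fruit => \&Fruit } );
    $twig->safe_parsefile('dic.xml') or warn "Parse failed: $@";

    close(MYFILE);   # close once after parsing, not inside the handler

    sub Fruit {
        my ($c, $fruit) = @_;

        # field names without a leading space (' Size' would never match)
        foreach my $food ($fruit->children('Fruit')) {
            print MYFILE join(';', $food->field('Size'),
                                   $food->field('Shape'),
                                   $food->field('State')), "\n";
        }

        # iterate over $fruit's children ($food is out of scope here)
        foreach my $veg ($fruit->children('Veg')) {
            print MYFILE join(';', $veg->field('Size'),
                                   $veg->field('Shape'),
                                   $veg->field('State')), "\n";
        }

        $c->purge;   # free memory once per handled element, after the loops
        return 1;
    }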



    Please note this is a gigabyte-sized file.

    Is there anything I can do to split this file into smaller pieces? I'm not sure whether that would help.

    I am using Perl version 5.8.1.

    I would really appreciate any help.
#7
    !~ /m$/
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    May 2004
    Location
    Reno, NV
    Posts
    4,221
    Rep Power
    1809
    Are you saying that the parser is crashing because the file is much larger than files you have processed in the past?

    If so, you should read up on processing an XML document chunk by chunk in the XML::Twig documentation.
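    A minimal sketch of the chunk-by-chunk pattern: a twig_handlers callback fires as each element is completed, and calling purge inside it discards everything parsed so far, so memory stays bounded. The helper name count_elements_in_chunks and the counting are hypothetical stand-ins for real processing.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: stream through $file, handle each completed <$tag>
# element, and purge right away so only a small part of the tree is kept.
sub count_elements_in_chunks {
    my ($file, $tag) = @_;
    my $count = 0;
    my $twig = XML::Twig->new(
        twig_handlers => {
            $tag => sub {
                my ($t, $elt) = @_;
                $count++;     # real processing of $elt would go here
                $t->purge;    # delete everything parsed so far
                return 1;
            },
        },
    );
    $twig->parsefile($file);
    return $count;
}
```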

    Or you could switch to one of the SAX-type stream parsers.
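    For comparison, a sketch of the same kind of count with an event-based stream parser. It uses XML::Parser's Handlers interface (the parser XML::Twig itself is built on), which is push-style like SAX but not the SAX API proper; modules such as XML::SAX provide that. The helper name count_with_events is hypothetical.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: count start tags named $tag. Only the counter is kept
# in memory; no tree is built, whatever the file size.
sub count_with_events {
    my ($file, $tag) = @_;
    my $count = 0;
    my $parser = XML::Parser->new(
        Handlers => {
            # Start handler receives the Expat object, the element name,
            # and then attribute name/value pairs.
            Start => sub {
                my ($expat, $element, %attrs) = @_;
                $count++ if $element eq $tag;
            },
        },
    );
    $parser->parsefile($file);
    return $count;
}
```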
    Last edited by keath; September 30th, 2013 at 10:46 PM.
#8
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2013
    Posts
    7
    Rep Power
    0
    Thanks

    I don't think it is a memory issue: I tried today on a server with 24 GB of RAM, and it still crashed at the same location.

    I think it has something to do with the child elements in each tag.

    Has anyone seen this issue? I went through some forums and saw a couple of mentions that it might be related to the Perl version, but I'm not sure.
