Hi, I'm parsing a *very* big XML file (600MB+) in order to do multiple inserts
into a MySQL table, but after about 80,000 inserts (roughly 5 minutes) I get a seg fault
and the script obviously stops. I'm expecting it to reach around 1 million rows.

The method I'm using to read the file in follows; one of my element handlers then does the
insert, which I haven't shown in full for the sake of brevity (a simplified sketch of the
handlers comes after the snippet).

<?php
# snipped code

function grab_file($parser, $file) {

    #<snip>

    # feed the parser 4KB at a time; feof() supplies the is_final flag
    # once the last chunk has been read
    while ($data = fread($handle, 4096)) {
        xml_parse($parser, $data, feof($handle));
    }

    #<snip>
}

$parser = xml_parser_create();

xml_set_element_handler($parser, "start_xml_element", "stop_xml_element");
xml_set_character_data_handler($parser, "character_xml_data");
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

grab_file($parser, $file);
xml_parser_free($parser);

?>
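
For context, the handlers follow a buffer-then-insert pattern: character data is collected
for the current element and an INSERT is run when the element closes. This is only a
simplified sketch; the element, table and column names are placeholders rather than my real
schema, and it assumes a MySQL connection is already open:

<?php
$current_data = "";

# called at each opening tag -- start a fresh buffer
function start_xml_element($parser, $name, $attrs) {
    global $current_data;
    $current_data = "";
}

# expat may deliver the text of one element in several chunks
function character_xml_data($parser, $data) {
    global $current_data;
    $current_data .= $data;
}

# called at each closing tag -- insert the buffered text, then discard it
function stop_xml_element($parser, $name) {
    global $current_data;
    if ($name == "row") {   # placeholder element name
        mysql_query("INSERT INTO my_table (my_column) VALUES ('"
                    . addslashes($current_data) . "')");
    }
    $current_data = "";
}
?>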

The script itself seems to be working fine: I tested it with a much smaller data set and it
went like a breeze, so I'm assuming this comes down to system resources.

Under normal load the box has about 20MB of free memory (out of 64MB) and about 300MB of free
swap (out of 400MB+). Just before the seg fault, free memory had fallen to less than 1MB, free
swap was down to 200MB, and the 'size' of the script as shown by ps was 200MB+.

So, when I'm reading the file in, doesn't PHP just buffer the 4KB it's using for that
particular iteration of the loop and throw it away when finished? That doesn't seem to be the
case, but I'm confused as to how the script can be taking up 200MB+. Are the XML parsing
functions perhaps buffering the whole thing?
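
To check where it grows, I could log the script's own resident size from inside the read loop
every few thousand chunks. A rough diagnostic sketch (it assumes a Linux-style ps; the chunk
counter is just made up for this example):

<?php
# print this process's RSS every 5000 chunks so I can see whether
# memory climbs with each fread()/xml_parse() call
$chunks = 0;
while ($data = fread($handle, 4096)) {
    xml_parse($parser, $data, feof($handle));
    if (++$chunks % 5000 == 0) {
        $rss = exec("ps -o rss= -p " . getmypid());
        echo "chunk $chunks: rss ${rss}KB\n";
    }
}
?>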

In short, how can I keep memory usage down (by a large factor)?

I'm using PHP 4.0.1pl2 and the script is running as a CGI.

tia

Bealers

------------------
http://back-end.org