XML Programming
 
Posted by fabiank on November 19th, 2003, 05:52 AM
Losing data while parsing XML with Expat

Hello,

I have a strange problem and need your help and ideas.

I've written a PHP application which imports data in XML format and writes it to a MySQL database for faster access.

The application uses Expat 1.95.7 via PHP to parse the XML data. (The same error occurred with other versions.)

At first everything seemed to work fine, but now I have noticed that something goes wrong:

If the amount of XML data is larger than what I used for testing the application (we're talking about something between 2 and 4 MB), some data gets lost.

If the structure of the file doesn't change, the lost data is always the same.

But if I change the structure of the file, e.g. by adding a line somewhere, the problem occurs in another place.

For example:

<event>
  <SysId>27</SysId>
  <ClientId>1</ClientId>
  <EventNo>9402</EventNo>
  <EventName>Martin Schneider Karben</EventName>
  <category>
    <Type>Keine Veranstaltungsart</Type>
  </category>
  .....
</event>

Let's say that "Mar" is lost from the data between the <EventName> tags, so we get "tin Schneider Karben".

When I insert a line above the <event> block, the "t" of "tin" is lost as well, so we get "in Schneider Karben".

Why?
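One hypothesis that fits these symptoms, offered as an assumption rather than a confirmed diagnosis: the input is handed to Expat in fixed-size blocks (PEAR's XML_Parser reads the input file in blocks; check XML_Parser::parse() in your PEAR version for the exact size), and when a block boundary falls inside a text node, that text reaches the character-data handler in more than one call. The hypothetical helper splitAtChunkBoundary() below does not call Expat at all; it only computes where such a boundary would cut a given string. The 32-byte block size is chosen purely to keep the example small.

```php
<?php
// Hypothetical helper (not part of Expat or PEAR): assuming the document is
// fed to the parser in blocks of $chunkSize bytes, return the two pieces the
// first block boundary cuts $needle into, or null if no boundary falls inside
// $needle. Only the first boundary after the start of $needle is checked.
function splitAtChunkBoundary($doc, $needle, $chunkSize)
{
    $start = strpos($doc, $needle);
    if ($start === false) {
        return null; // $needle does not occur in $doc
    }
    $end = $start + strlen($needle);

    // Offset of the first block boundary strictly after $start.
    $boundary = $start - ($start % $chunkSize) + $chunkSize;
    if ($boundary >= $end) {
        return null; // $needle fits inside one block
    }
    return array(
        substr($doc, $start, $boundary - $start),  // tail of one block
        substr($doc, $boundary, $end - $boundary), // head of the next block
    );
}

// A 32-byte block size keeps the example small; it is not the real buffer size.
$text = '<EventName>Martin Schneider Karben</EventName></r>';
print_r(splitAtChunkBoundary('<r>' . str_repeat("\n", 15) . $text, 'Martin Schneider Karben', 32));
print_r(splitAtChunkBoundary('<r>' . str_repeat("\n", 14) . $text, 'Martin Schneider Karben', 32));
```

With 15 line breaks before the element, the boundary cuts the text after "Mar"; removing one line break moves the cut to after "Mart". Any change in the number of bytes before the element shifts the cut, which would match the observation that editing the file moves the error while an unchanged file always loses the same characters.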

I also tried generating parts of the XML data dynamically with PHP:


//--------------- CODE ------------------------------------------------------//
<?php
// number of datasets
$datasets = 2000;
// build the xml string (start with "=", since $str is not defined yet)
$str = '<?xml version="1.0" encoding="ISO-8859-1"?><program xmlns="http://www.orestes.de">'."\n";
for($i=0; $i<$datasets; $i++){
$str .= '<event>
<SysId>27</SysId>
<ClientId>1</ClientId>
<EventNo>'.$i.'</EventNo>
<EventName>NUM'.$i.'</EventName>
<category>
<Type>Keine Veranstaltungsart</Type>
</category>
<location>
<Name>location_name_'.$i.'</Name>
<Street>Strasse</Street>
<ZIP>32333</ZIP>
<City>City</City>
<Country></Country>
</location>
<Currency>EUR</Currency>
<show>
<ShowNo>1</ShowNo>
<ShowDate>31.12.2004</ShowDate>
<ShowTime>20:00</ShowTime>
<ShowWeekday>Freitag</ShowWeekday>
<ShowPage href="32160001.jsp">TPP Gutscheine</ShowPage>
<Info></Info>
<block number="0">
<FreeSeats>61</FreeSeats>
</block>
</show>
</event>';
}
$str .= "</program>";
// write the data to file
$fp = fopen("../DATA/elektra.xml","w") or die("cannot open ../DATA/elektra.xml for writing");
fputs($fp, $str);
fclose($fp);
?>
//--------------- CODE END --------------------------------------------------//



With this generated file, NUM1644 becomes 1644 and NUM1195 becomes 5. All other data is parsed correctly!
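Expat's documentation for XML_SetCharacterDataHandler states that a single block of contiguous text may still be reported through a sequence of calls to the handler. If that happens here, what survives depends on how the handler stores each piece: a handler that assigns with `=` (as ElektraParser's cdataHandler does, further down) keeps only the last piece. The simulation below uses two hypothetical helper functions and hard-coded pieces; the split points are assumed, chosen to reproduce the two values reported above, and no parser is involved.

```php
<?php
// Simulation only: $pieces stands in for successive calls a parser might make
// to the character-data handler for ONE text node (the split is assumed).

// Overwrite on every call, like "... = $cdata;" in a handler.
function keepLastPiece(array $pieces)
{
    $value = '';
    foreach ($pieces as $cdata) {
        $value = $cdata;  // earlier pieces are discarded
    }
    return $value;
}

// Append on every call instead.
function appendPieces(array $pieces)
{
    $value = '';
    foreach ($pieces as $cdata) {
        $value .= $cdata; // every piece is kept
    }
    return $value;
}

echo keepLastPiece(array('NUM', '1644')), "\n"; // 1644
echo keepLastPiece(array('NUM119', '5')), "\n"; // 5
echo appendPieces(array('NUM', '1644')), "\n";  // NUM1644
```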


Here is the code of the two classes used for parsing and importing (also included in the zip file):


//--------------- CODE ------------------------------------------------------//
<?php
require_once "DB.php";

class ElektraImporter
{
    var $FileHash;
    var $DAO;
    var $XMLDataFile;

    function ElektraImporter(){
        $this->XMLDataFile = Config::getAttribute("Config/Config_Base", "elektra_xml");

        $DB = DB::connect(Config::getAttribute("Config/Config_Base", "dsn"));
        if(DB::isError($DB)){ die(DB::errorMessage($DB)); }
        $this->DAO = Loader::buildObject("XML/ElektraDAO", null, $DB);
    }
    /**
     * checks for changes on the elektra xml data.
     * If there are changes the database will be refreshed
     */
    function checkForUpdate(){
        /* if there are changes */
        if($this->_hasElektraFileChanged($this->DAO->getElektraFileHashCode())){
            /* read the file and update the database */
            $this->DAO->updateElektraData($this->_getElektraData());
        }
        /* otherwise everything is o.k. and there is nothing to do */
    }
    /**
     * parse the xml file and get the needed data
     * @return array $data
     */
    function _getElektraData(){
        $Parser = &Loader::buildObject("XML/ElektraParser", null, array(&$arr));
        if(PEAR::isError($Parser)){
            die($Parser->getMessage());
        }
        /* check the return value of setInputFile(), not the parser object again */
        $result = $Parser->setInputFile($this->XMLDataFile);
        if(PEAR::isError($result)){ die($result->getMessage()); }

        $data = $Parser->getXMLData();

        $data['filehash'] = md5_file($this->XMLDataFile);

        return $data;
    }
    /**
     * checks if the file has changed
     * @return boolean
     */
    function _hasElektraFileChanged($filehash = ""){
        $this->FileHash = md5_file($this->XMLDataFile);

        return $filehash != $this->FileHash;
    }
}
?>
//--------------- CODE END --------------------------------------------------//


The parser class, which extends PEAR's XML_Parser:


//--------------- CODE ------------------------------------------------------//
<?php
require_once "XML/Parser.php";


class ElektraParser extends XML_Parser
{
    var $XMLData;
    var $EventNo;
    var $EventName;
    var $LastEventNo;
    var $ActualEventNo;
    var $EventCnt = 0;
    var $ShowCnt = 0;
    var $Element;  // name of the element currently open (set in startHandler)
    var $Attribs;  // attributes of that element

    function ElektraParser(&$arr){
        $this->XMLData = &$arr;
        $this->XML_Parser("ISO-8859-1", "event", "ISO-8859-1");
    }

    function startHandler($xp, $element, $attribs) {
        $this->Element = $element;
        $this->Attribs = $attribs;
    }

    function endHandler($xp, $element) {
        if ( $element == "EVENT" ){
            /* increase event counter */
            $this->EventCnt++;
            /* set show counter to 0 */
            //Debug::add("Event closes, <br>ShowCnt is at ".$this->ShowCnt."<br>");
            $this->ShowCnt = 0;
            //Debug::add("EventCnt is set to ".$this->EventCnt."<br>ShowCnt is set to 0<br>");
        }
        elseif ( $element == "SHOW" ){
            /* increase show counter for the next show */
            $this->ShowCnt++;
        }
        $this->Element = "";
    }

    function cdataHandler($xp, $cdata) {
        if($this->Element == "DATE"){
            $this->XMLData['creationdate'] = $cdata;
        }
        elseif($this->Element == "TIME"){
            $this->XMLData['creationtime'] = $cdata;
        }
        /* every event has a sysid; the sysid and the eventno make up the unique event id */
        elseif($this->Element == "SYSID"){
            $this->XMLData['event'][$this->EventCnt]['sysid'] = $cdata;
        }
        elseif($this->Element == "CLIENTID"){
            $this->XMLData['event'][$this->EventCnt]['clientid'] = $cdata;
        }
        elseif($this->Element == "EVENTNO"){
            $this->XMLData['event'][$this->EventCnt]['eventno'] = $cdata;
        }
        elseif($this->Element == "EVENTNAME"){
            $this->XMLData['event'][$this->EventCnt]['eventname'] = $cdata;
        }
        elseif($this->Element == "NAME"){
            $this->XMLData['event'][$this->EventCnt]['location'] = $cdata;
        }
        elseif($this->Element == "CITY"){
            $this->XMLData['event'][$this->EventCnt]['city'] = $cdata;

            /* eventgroups */
            /* get the position of the first occurrence of the city in the eventname */
            $pos = strpos($this->XMLData['event'][$this->EventCnt]['eventname'], $cdata);
            /* if the city is in the name (and not at its very start) */
            if( $pos ){
                $this->XMLData['event'][$this->EventCnt]['group'] = trim(substr($this->XMLData['event'][$this->EventCnt]['eventname'], 0, $pos));
            }
            /* otherwise we take the whole eventname as group */
            else {
                $this->XMLData['event'][$this->EventCnt]['group'] = trim($this->XMLData['event'][$this->EventCnt]['eventname']);
            }
        }
        /* get the shows */
        elseif($this->Element == "SHOWNO") {
            $this->XMLData['event'][$this->EventCnt]['show'][$this->ShowCnt]['showno'] = $cdata;
        }
        elseif($this->Element == "SHOWDATE") {
            $this->XMLData['event'][$this->EventCnt]['show'][$this->ShowCnt]['showdate'] = $cdata;
        }
        elseif($this->Element == "SHOWTIME") {
            $this->XMLData['event'][$this->EventCnt]['show'][$this->ShowCnt]['showtime'] = $cdata;
        }
        elseif($this->Element == "SHOWPAGE"){
            $this->XMLData['event'][$this->EventCnt]['show'][$this->ShowCnt]['showpage'] = $this->Attribs['HREF'];
        }
    }
    function defaultHandler($xp, $cdata) {

    }
    function &getXMLData(){
        $p = $this->parse();
        if(PEAR::isError($p)){ die($p->getMessage()); }
        return $this->XMLData;
    }
}
?>
//--------------- CODE END --------------------------------------------------//
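A sketch of one common way to make a character-data handler independent of how the text is delivered: append each piece to a buffer, clear the buffer when an element starts, and read it only in the end-element handler. It uses PHP's built-in xml_* functions directly instead of PEAR's XML_Parser, extracts only a few of the fields ElektraParser handles, and the class BufferedEventParser and its method parseInChunks() are hypothetical names. The deliberately small chunk size in the usage is there to force text nodes to be split across xml_parse() calls.

```php
<?php
// Hypothetical class: collect character data in a buffer and consume it in
// the end-element handler, so a text node split over several handler calls
// is reassembled before it is stored.
class BufferedEventParser
{
    public $events = array();   // one entry per <event>
    private $buffer = '';       // text collected for the current element
    private $current = array(); // fields of the event being parsed

    public function startHandler($xp, $element, $attribs)
    {
        $this->buffer = '';     // start collecting fresh text
    }

    public function cdataHandler($xp, $cdata)
    {
        $this->buffer .= $cdata; // append, never overwrite
    }

    public function endHandler($xp, $element)
    {
        // Element names arrive upper-cased (case folding is on by default).
        switch ($element) {
            case 'SYSID':
            case 'EVENTNO':
            case 'EVENTNAME':
                $this->current[strtolower($element)] = trim($this->buffer);
                break;
            case 'EVENT':
                $this->events[] = $this->current;
                $this->current = array();
                break;
        }
        $this->buffer = '';
    }

    // Parse $xml, feeding it to the parser in pieces of $chunkSize bytes.
    public function parseInChunks($xml, $chunkSize)
    {
        $xp = xml_parser_create('ISO-8859-1');
        xml_set_element_handler($xp, array($this, 'startHandler'), array($this, 'endHandler'));
        xml_set_character_data_handler($xp, array($this, 'cdataHandler'));

        $chunks = str_split($xml, $chunkSize);
        $last = count($chunks) - 1;
        foreach ($chunks as $i => $chunk) {
            // The final chunk is flagged so the parser can check for completeness.
            if (!xml_parse($xp, $chunk, $i === $last)) {
                throw new RuntimeException(xml_error_string(xml_get_error_code($xp)));
            }
        }
        // The parser is released when $xp goes out of scope.
        return $this->events;
    }
}
```

Applied to the generated elektra.xml, something like `$p = new BufferedEventParser(); $events = $p->parseInChunks(file_get_contents("../DATA/elektra.xml"), 4096);` should then return every EventName intact, regardless of where the block boundaries fall. Within PEAR's XML_Parser the same idea would mean appending in cdataHandler and storing the collected value in endHandler.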



This problem is driving me crazy!

I have no idea what the cause is, or even might be:
a bug in Expat? ... I can't really believe that
badly formatted XML? ... not really!
problems with Expat's memory management?
or is it just my fault? ... but where?

It does seem, though, that the problem is tied to the layout of the XML file.
If I remove line breaks or add lines, the error occurs in other places.
But the same structure always produces the same errors.


My XML skills are not that good, so I would be very grateful for any ideas or advice.

Thanks for your help.

Best regards,

Fabian Krüger

© 2003-2008 by Developer Shed. All rights reserved.