#1

    Comparing Large files and rewriting to new file


    Hi,

    I'm comparing a 400 MB XML file against a list containing 10k entries. When I find an entry in the 400 MB file, I write the content of that record to a different file. My main problem is that it's taking hours to produce the new file.
    See the code below:

    PHP Code:
    <?php
    $Master = "NewFile.xml";
    $ProdMaster = fopen($Master, 'w') or die("can't open file");
    // Header information
    fwrite($ProdMaster, "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n");

    $getProdPath = fopen("InitialProductionSkus-1.txt", "r");
    $x = 0;
    $Prodsku = "no";
    while (($data = fgetcsv($getProdPath, 1000, "|")) !== false) {
        $ProdPathSku[$x] = $data[0];
        $thisProdPathSku = $ProdPathSku[$x];
        $ProdComparo = "<Id>" . $thisProdPathSku . "</Id>";
        $x++;

        // The 400 MB file is reopened and rescanned for every SKU in the list
        $file = "400meg.xml";
        $f = fopen($file, "r");
        while (($line = fgets($f, 1000)) !== false) {
            $skuid = trim($line);
            if ($Prodsku == "yes") {
                // Inside a matched record: copy lines through to </Item>
                if ($skuid != "</Attributes>" and $skuid != "</Item>") {
                    fwrite($ProdMaster, $line);
                }
                if ($skuid == "</Attributes>") {
                    // Add the extra attribute just before </Attributes>
                    fwrite($ProdMaster, "   <Attribute name=\"color\" xml:lang=\"en-US\">\n");
                    fwrite($ProdMaster, "     <Value>true</Value>\n");
                    fwrite($ProdMaster, "   </Attribute>\n");
                    fwrite($ProdMaster, $line);
                }
                if ($skuid == "</Item>") {
                    fwrite($ProdMaster, $line);
                    $Prodsku = "no";
                }
            } else {
                if ($skuid == $ProdComparo) {
                    // Start of a matched record
                    fwrite($ProdMaster, "    <Item>\n");
                    fwrite($ProdMaster, $line);
                    $Prodsku = "yes";
                    echo "Here's the line data!!!!!\n";
                }
            }
        }
        // Release the handle before the next SKU reopens the file
        fclose($f);
    }
    fwrite($ProdMaster, "</Item>\n");

    fclose($ProdMaster);
    ?>

    Thanks for your help!
    Last edited by ManiacDan; July 25th, 2013 at 09:11 AM.
  2. #2
    Of course a 400 MB file will not take minutes.
  4. #3
    Originally Posted by paulh1983
    Of course a 400 MB file will not take minutes.
    Of course it won't. I was asking whether there is a more efficient or faster way to process it. When I say hours, I mean that at its current rate it will take more than 24 hours, and I need it to run faster. Any constructive advice would be much appreciated.
  6. #4
    Maybe by moving these two lines
    PHP Code:
    $file = "400meg.xml";
    $f = fopen($file, "r");
    out of the first while() loop?
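One caveat with that change, sketched below under assumptions: once the handle is opened outside the loop, it sits at end-of-file after the first pass, so each later pass needs a rewind() before reading again. The function name countMatches() and its parameters are hypothetical and only there to make the sketch self-contained. This removes the repeated fopen() calls, but the file is still scanned once per list entry, so the gain is probably small.

```php
<?php
// Hypothetical helper: count how many needles occur as a whole (trimmed) line
// in a large file, opening the file once and rewinding between passes.
function countMatches(array $needles, string $bigPath): int
{
    $f = fopen($bigPath, 'r');          // opened once, outside the loop
    if ($f === false) {
        throw new RuntimeException("can't open $bigPath");
    }
    $found = 0;
    foreach ($needles as $needle) {
        rewind($f);                     // otherwise every pass after the first starts at EOF
        while (($line = fgets($f)) !== false) {
            if (trim($line) === $needle) {
                $found++;
                break;                  // stop this pass at the first match
            }
        }
    }
    fclose($f);
    return $found;
}
```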
  8. #5
    Right, for every line in your "master" file, you open this enormous XML file and read the whole thing 1000 characters at a time. Why not load the master file into memory, then loop through the XML file ONCE, comparing each line to EVERY line in your master file? In-memory operations are far faster than filesystem operations.
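A minimal sketch of that single-pass idea, under some assumptions: each `<Id>` sits on its own line (as the original script already assumes), and the function names loadSkuSet() and extractItems() are hypothetical. Instead of comparing each XML line to every list entry, this version stores the list entries as array keys and uses isset(), which is a hash lookup and avoids the 10k comparisons per line.

```php
<?php
// Hypothetical helper: read the pipe-delimited SKU list into an array whose
// KEYS are the "<Id>...</Id>" strings, so isset() can test membership.
function loadSkuSet(string $listPath): array
{
    $fh = fopen($listPath, 'r');
    if ($fh === false) {
        throw new RuntimeException("can't open $listPath");
    }
    $set = [];
    while (($data = fgetcsv($fh, 1000, '|', '"', '\\')) !== false) {
        // fgetcsv() returns [null] for a blank line, so skip empty first fields
        if (isset($data[0]) && $data[0] !== '') {
            $set['<Id>' . $data[0] . '</Id>'] = true;
        }
    }
    fclose($fh);
    return $set;
}

// Hypothetical helper: read the XML file once and copy every record whose
// <Id> line is in $skuSet. Returns the number of records written.
function extractItems(string $xmlPath, array $skuSet, string $outPath): int
{
    $in = fopen($xmlPath, 'r');
    $out = fopen($outPath, 'w');
    if ($in === false || $out === false) {
        throw new RuntimeException("can't open input or output file");
    }
    fwrite($out, "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n");

    $copying = false;   // true while inside a matched record
    $count = 0;
    while (($line = fgets($in)) !== false) {
        $trimmed = trim($line);
        if (!$copying) {
            if (isset($skuSet[$trimmed])) {     // hash lookup, not a scan of the list
                fwrite($out, "    <Item>\n" . $line);
                $copying = true;
                $count++;
            }
            continue;
        }
        if ($trimmed === '</Attributes>') {
            // Same extra attribute the original script adds before </Attributes>
            fwrite($out, "   <Attribute name=\"color\" xml:lang=\"en-US\">\n"
                . "     <Value>true</Value>\n"
                . "   </Attribute>\n");
        }
        fwrite($out, $line);
        if ($trimmed === '</Item>') {
            $copying = false;
        }
    }
    fclose($in);
    fclose($out);
    return $count;
}
```

With the file names from the original post, this would be called as `extractItems('400meg.xml', loadSkuSet('InitialProductionSkus-1.txt'), 'NewFile.xml')`. Two differences from the original script are worth noting: records come out in the order they appear in the XML file rather than in list order, and the trailing `</Item>` write at the end of the original is left out.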