#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2012
    Posts
    3
    Rep Power
    0

    Result isn't showing


    My code should read the sitemap, pick up all the new information, and insert it into the database, but I have no idea why it isn't doing that.

    Please help.

    [PHPNET="PHP code"]PHP

    <?php

    require_once 'simplehtmldom/simple_html_dom.php';

    $_ECHO = FALSE;
    $html = new simple_html_dom();
    $printerListFileName = "pl_printer_list.txt";
    $outputFileName = "pl_printers_new.txt";

    //////////////////////////////////////////////////////////////////////////////////////////////////////////////////
    //
    // cleanBadChars()
    //
    /////////////////////////////////////////////////////////////////////////////////////////////////////////////////
    function cleanBadChars( $plFieldValue )
    {
    $badChars = array('', '', '', '', '', '');

    $cleanFieldValue = str_replace( $badChars, "", $plFieldValue );

    return( trim( $cleanFieldValue ) );
    }

    //////////////////////////////////////////////////////////////////////////////////////////////////////////////////
    //
    // cleanDBField()
    //
    /////////////////////////////////////////////////////////////////////////////////////////////////////////////////
    function cleanDBField( $CLPField, $plFieldValueDirty )
    {
    global $_ECHO;

    if($_ECHO) echo "$CLPField=>$plFieldValueDirty<br />";

    $plFieldValueClean = cleanBadChars( $plFieldValueDirty );

    switch( $CLPField )
    {
    case "productid":
    $clpFieldValueClean = preg_replace( "/Item Number: &nbsp;/", "", $plFieldValueClean );
    $clpFieldValueClean = str_replace( "/", "_", $clpFieldValueClean );
    break;

    case "name":
    $clpFieldValueClean = trim( strstr( $plFieldValueClean, " " ) );
    $clpFieldValueClean = strstr( $clpFieldValueClean, "Minolta" ) ? strstr( $clpFieldValueClean, " " ) : $clpFieldValueClean;
    break;

    case "manufacturer":
    $clpFieldValueClean = ucfirst( strtolower( substr( $plFieldValueClean, 0, strpos( $plFieldValueClean, " " ) ) ) );
    $clpFieldValueClean = strstr( $clpFieldValueClean, "Konica" ) ? $clpFieldValueClean . " Minolta" : $clpFieldValueClean;
    $clpFieldValueClean = strstr( $clpFieldValueClean, "Hp" ) ? "HP" : $clpFieldValueClean;
    break;

    case "format":
    $clpFieldValueClean = $plFieldValueClean;
    break;

    case "platform":
    $arr = array();
    $arr[] = strstr( $plFieldValueClean, "Windows" ) ? "Windows" : "";
    $arr[] = strstr( $plFieldValueClean, "Mac" ) ? "Mac" : "";
    $clpFieldValueClean = count( $arr ) == 2 ? implode( " / ", $arr ) : implode( "", $arr );
    break;

    case "bwppm":
    $clpFieldValueClean = strstr( $plFieldValueClean, "i" ) ? substr( $plFieldValueClean, 0, strpos( $plFieldValueClean, "i" ) ) : substr( $plFieldValueClean, 0, strpos( $plFieldValueClean, "p" ) );
    $clpFieldValueClean = ( $num = intval( $clpFieldValueClean ) ) > 0 ? $num : "NULL";
    break;

    case "cppm":
    $clpFieldValueClean = strstr( $plFieldValueClean, "i" ) ? substr( $plFieldValueClean, 0, strpos( $plFieldValueClean, "i" ) ) : substr( $plFieldValueClean, 0, strpos( $plFieldValueClean, "p" ) );
    $clpFieldValueClean = ( $num = intval( $clpFieldValueClean ) ) > 0 ? $num : "NULL";
    break;

    case "resolution":
    $clpFieldValueClean = substr( $plFieldValueClean, 0, strpos( $plFieldValueClean, "dpi" ) );
    break;

    case "ram":
    $ramValues = explode( " ", $plFieldValueClean );
    $clpFieldValueClean = str_replace( array( "MB", "GB", "KB" ), "", $ramValues[ 0 ] );
    break;

    case "maxram":
    $clpFieldValueClean = $plFieldValueClean;
    break;

    case "ethernet":
    $clpFieldValueClean = !strstr( $plFieldValueClean, "Yes") ? "NULL" : "Yes";
    break;

    case "usb":
    $clpFieldValueClean = strlen ( $plFieldValueClean ) > 1 ? "Yes" : "NULL";
    break;

    case "firstprint":
    $clpFieldValueClean = explode( " ", $plFieldValueClean );
    $clpFieldValueClean = $clpFieldValueClean[ 0 ];
    break;

    case "parallel":
    $clpFieldValueClean = strstr( $plFieldValueClean, "Parallel" ) ? "Yes" : "NULL";
    break;

    case "duplex":
    $clpFieldValueClean = strlen( $plFieldValueClean ) < 2 ? "Manual" : $plFieldValueClean;
    break;

    case "printmethod":
    $clpFieldValueClean = preg_replace( "/ Printer/", "", $plFieldValueClean );
    $clpFieldValueClean = preg_replace( "/ Fax/", "", $clpFieldValueClean );
    break;

    case "category":
    if ( strstr( $plFieldValueClean, "Multifunction" ) )
    $clpFieldValueClean = "Multifunction";
    elseif ( strstr( $plFieldValueClean, "Laser" ) )
    $clpFieldValueClean = strstr( $plFieldValueClean, "Colour" ) ? "Colour Laser" : "Mono Laser";
    elseif ( strstr( $plFieldValueClean, "Fax" ) )
    $clpFieldValueClean = "Fax";
    elseif ( strstr( $plFieldValueClean, "Dot Matrix" ) )
    $clpFieldValueClean = "Dot Matrix";
    elseif ( strstr( $plFieldValueClean, "Inkjet" ) )
    $clpFieldValueClean = "Inkjet";
    elseif ( strstr( $plFieldValueClean, "Label" ) )
    $clpFieldValueClean = "Label";
    elseif ( strstr( $plFieldValueClean, "Scanner" ) )
    $clpFieldValueClean = "Scanner";
    elseif ( strstr( $plFieldValueClean, "Thermal" ) )
    $clpFieldValueClean = "Thermal";
    break;

    case "description":
    $clpFieldValueClean = $plFieldValueClean;
    break;

    case "rrp":
    $clpFieldValueClean = strstr( $plFieldValueClean, "" );
    $clpFieldValueClean = preg_replace( "//", "", $clpFieldValueClean );
    break;

    case "paper":
    $clpFieldValueClean = $plFieldValueClean;
    break;

    case "additional":
    $clpFieldValueClean = $plFieldValueClean;
    break;

    case "offertext":
    if ( $plFieldValueClean == "" )
    $clpFieldValueClean = "New Low price On This Printer";
    else
    {
    $clpFieldValueClean = str_replace( "", "pound;", $plFieldValueClean );
    $clpFieldValueClean = str_replace( "&", "amp;", $clpFieldValueClean );
    $clpFieldValueClean = str_replace( "pound;", "&pound;", $clpFieldValueClean );
    $clpFieldValueClean = str_replace( "amp;", "&amp;", $clpFieldValueClean );

    $clpFieldValueClean .= " <br /><br />Offer ends";
    }
    break;

    case "specialhead":
    if ( $plFieldValueClean == "" )
    $clpFieldValueClean = "New Low price On This Printer";
    else
    {
    $clpFieldValueClean = str_replace( "", "pound;", $plFieldValueClean);
    $clpFieldValueClean = str_replace( "&", "amp;", $clpFieldValueClean );
    $clpFieldValueClean = str_replace( "pound;", "&pound;", $clpFieldValueClean );
    $clpFieldValueClean = str_replace( "amp;", "&amp;", $clpFieldValueClean );
    }
    break;

    default:
    break;
    }

    $clpFieldValueClean = preg_replace( "/\"/", "", $clpFieldValueClean );

    if($_ECHO) echo "$CLPField=>$clpFieldValueClean<br />";

    return( trim( $clpFieldValueClean ) );
    }

    //////////////////////////////////////////////////////////////////////////////////////////////////////////////////
    //
    // fetchPrinterDetails()
    //
    /////////////////////////////////////////////////////////////////////////////////////////////////////////////////
    function fetchPrinterDetails( $printerURL )
    {
    global $html;
    global $outputFileName;
    global $_ECHO;

    $line = "";
    $fileContents = file_get_contents( $printerURL );

    if($_ECHO) echo "Scraping details started for " . $printerURL . "<br />";

    $html->load( $fileContents );

    $stop = FALSE;
    $metas = $html->find( "meta[name=Keywords]" );

    if ( isset( $metas[ 0 ] ) )
    {
    $stop = strstr( strtoupper( $metas[ 0 ]->content ), "EXDEMO" ) ||
    strstr( strtoupper( $metas[ 0 ]->content ), "BOXOPEN" )||
    strstr( strtoupper( $metas[ 0 ]->content ), "BOX OPEN" )||
    strstr( strtoupper( $metas[ 0 ]->content ), "DISCONTINUED" );

    $stop = $stop ? $stop : !( strstr( strtoupper( $metas[ 0 ]->content ), "PRINTER" ) || strstr( strtoupper( $metas[ 0 ]->content ), "FAX" ) );
    $stop = $stop ? $stop : strstr( strtoupper( $metas[ 0 ]->content ), "ACCESSORIES" );
    }
    else
    if ( $_ECHO ) echo "<meta> tag NOT FOUND<br />";

    // Dont bother with non-current printers
    if ( $stop )
    {
    if ( $_ECHO ) echo "Ignoring $printerURL<br />";
    return;
    }

    $DBFields = array
    (
    "productid" => "#ctl00_placeholderMain_lblItem",
    "name" => "#ctl00_placeholderMain_lblProductHead",
    "manufacturer" => "h1",
    "format" => "@Product Group Output",
    "platform" => "@Operating Systems Supported",
    "height" => "NULL",
    "width" => "NULL",
    "depth" => "NULL",
    "weight" => "NULL",
    "bwppm" => "@Speed Monochrome",
    "cppm" => "@Speed Colour",
    "resolution" => "Printer Resolution@Printer Enhanced Resolution",
    "ram" => "@Memory (Maximum)",
    "maxram" => "NULL",
    "ethernet" => "@Network Ready",
    "parallel" => "@Interface Type(s)",
    "usb" => "USB Port@USB Ports",
    "firstprint" => "First Page@Print First Page",
    "warmupprint" => "NULL",
    "duplex" => "@Double Sided Printing",
    "printmethod" => "@Technology",
    "relability" => "NULL",
    "standby" => "NULL",
    "running" => "NULL",
    "category" => "h1",
    "description" => ".productdescriptioncontainer",
    "rrp" => "#ctl00_placeholderMain_lbltxtProductPrice",
    "printspeed" => "NULL",
    "large" => "NULL",
    "discont" => "NULL",
    "pdf" => "DEFAULT=1",
    "paper" => "@Paper Handling Input 1",
    "multi" => "NULL",
    "additional" => "@Paper Handling Input 2",
    "CPppma3" => "NULL",
    "CPppm" => "NULL",
    "CPram" => "NULL",
    "CPmaxram" => "NULL",
    "CPresolution" => "NULL",
    "Fmodem" => "NULL",
    "Fresolution" => "NULL",
    "Fcompatability" => "NULL",
    "Fram" => "NULL",
    "Fmaxram" => "NULL",
    "SCspeed" => "NULL",
    "SCresolution" => "NULL",
    "SCmodes" => "NULL",
    "specialid" => "DEFAULT=1",
    "offertext" => "#ctl00_placeholderMain_lblMareketingText",
    "image" => "NULL",
    "promo" => "NULL",
    "metatag" => "NULL",
    "metadescrip" => "NULL",
    "pricerunner" => "NULL",
    "google" => "NULL",
    "offerdate" => "NULL",
    "specialtext" => "NULL",
    "specialhead" => "#ctl00_placeholderMain_lblMareketingText"
    );

    foreach( $DBFields as $CLPField => $PLField )
    { //echo $PLField;
    if ( $PLField == "NULL" )
    $line .= '"' . trim( $PLField ) . '",';
    elseif ( strstr( $PLField, "DEFAULT=" ) )
    $line .= '"' . str_replace( "DEFAULT=", "", $PLField ) . '",';
    else
    {
    // This is a Spec field so we will need to work out which one
    if ( strstr( $PLField, "@" ) != FALSE )
    {
    // Get all the spec titles
    $specTitles = $html->find( ".specleftitem" );

    // Look for the field title
    if ( isset( $specTitles[ 0 ] ) )
    {
    $clpFieldValue = "NULL";
    $possFields = explode( "@", $PLField );

    // Loop thru all spec items
    foreach( $specTitles as $specTitle )
    {
    // Check all poss fields for a match
    foreach( $possFields as $possField )
    {
    if ( trim( $specTitle->plaintext ) == $possField )
    {
    $clpFieldValue = $specTitle->next_sibling()->plaintext;
    $line .= '"' . cleanDBField( $CLPField, $clpFieldValue ) . '",';
    break;
    }
    }

    if ( $clpFieldValue != "NULL" )
    break;
    }

    if ( $clpFieldValue == "NULL" )
    $line .= '"' . $clpFieldValue . '",';
    }
    }
    else
    {
    $plFieldValue = $html->find( $PLField );

    // Found the field in the PL page ?
    if ( isset( $plFieldValue[ 0 ] ) )
    {
    $clpFieldValue = $plFieldValue[ 0 ]->plaintext;
    $line .= '"' . cleanDBField( $CLPField, $clpFieldValue ) . '",';
    }
    else
    {
    $line .= '"NULL",';
    }
    echo $PLField;
    }
    }
    }

    $line = preg_replace( "/,$/", "\n", trim( $line ) );

    $fp = fopen( $outputFileName, "a" );
    fputs( $fp, $line );
    fclose( $fp );
    //echo "stop:". $stop;
    if($_ECHO) echo "Scraping details completed for " . $printerURL . "<br />";
    }

    //////////////////////////////////////////////////////////////////////////////////////////////////////////////////
    //
    // scrapePrinters()
    //
    /////////////////////////////////////////////////////////////////////////////////////////////////////////////////
    function scrapePrinters()
    {
    global $printerListFileName;
    global $outputFileName;
    global $_ECHO;

    set_time_limit( 0 );

    if ( file_exists( $printerListFileName ) == FALSE )
    {
    if($_ECHO) echo "Cannnot find $printerListFileName so quitting...<br />";
    exit(0);
    }

    if($_ECHO) echo "Deleting existing $outputFileName file...<br />";

    if ( file_exists( $outputFileName ) == TRUE )
    unlink( $outputFileName );

    // List of PL printers taken from sitemap page of PL website
    $fp = fopen( $printerListFileName, "r" );

    if($_ECHO) echo "Fetching printer details started...<br />";

    while ( $printerURL = fgets( $fp ) )
    fetchPrinterDetails( trim( $printerURL ) );

    if($_ECHO) echo "Fetching printer details completed...<br />";

    fclose( $fp );
    }

    scrapePrinters();

    function test()
    {
    global $html;

    $files = array( "OKI-C810n-Box-Opened--P110692.aspx", "HP-1320-P4453.aspx", "Waste-Toner-Cleaner-Pack-12-000-Pages--P48796.aspx", "Lexmark-C543dn-P6117.aspx", "Brother-FAX-T104-P11767.aspx", "Black-Toner-3500-pages--P110364.aspx", "Lexmark-X544dn-P9732.aspx", "EB-05-IEEE-1394-Expansion-Board-P30721.aspx", "Kodak-Photo-Paper-Gloss-A4-210-x-297mm-20-Sheets-165gsm--P13998.aspx", "Xerox-7600-P13571.aspx" );

    echo "STARTING...<br />";

    foreach( $files as $file )
    {
    fetchPrinterDetails( trim( $file ) );
    }

    echo "DONE...<br />";
    }

    //test();
    ?>

    [/PHPNET]
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Loyal (3000 - 3499 posts)

    Join Date
    Jul 2003
    Posts
    3,497
    Rep Power
    594
    Please edit your post and use PHP tags, per ManiacDan's New User Guide. Your code is very difficult to read, and those tags will format it for you. The guide also gives some debugging tips so you can diagnose your own code, and it addresses the issues you are having.
    There are 10 kinds of people in the world. Those that understand binary and those that don't.
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2012
    Posts
    3
    Rep Power
    0



    PHP Code:
    <?php

        
    require_once 'simplehtmldom/simple_html_dom.php';
        
        
    $_ECHO FALSE;
        
    $html = new simple_html_dom();
        
    $printerListFileName "pl_printer_list.txt";
        
    $outputFileName "pl_printers_new.txt";

        
    //////////////////////////////////////////////////////////////////////////////////////////////////////////////////
        //
        //        cleanBadChars()
        //
        /////////////////////////////////////////////////////////////////////////////////////////////////////////////////
        
    function cleanBadChars$plFieldValue )
        {
            
    $badChars = array('''''''''''');
            
            
    $cleanFieldValue str_replace$badChars""$plFieldValue );

            return( 
    trim$cleanFieldValue ) );
        }
        
        
    //////////////////////////////////////////////////////////////////////////////////////////////////////////////////
        //
        //        cleanDBField()
        //
        /////////////////////////////////////////////////////////////////////////////////////////////////////////////////
        
    function cleanDBField$CLPField$plFieldValueDirty )
        {
            global 
    $_ECHO;
            
            if(
    $_ECHO) echo "$CLPField=>$plFieldValueDirty<br />";

            
    $plFieldValueClean cleanBadChars$plFieldValueDirty );
            
            switch( 
    $CLPField )
            {
                case 
    "productid":
                    
    $clpFieldValueClean preg_replace"/Item Number: &nbsp;/"""$plFieldValueClean );
                    
    $clpFieldValueClean str_replace"/""_"$clpFieldValueClean );
                break;

                case 
    "name":
                    
    $clpFieldValueClean trimstrstr$plFieldValueClean" " ) );
                    
    $clpFieldValueClean strstr$clpFieldValueClean"Minolta" ) ? strstr$clpFieldValueClean" " ) : $clpFieldValueClean;
                break;

                case 
    "manufacturer":
                    
    $clpFieldValueClean ucfirststrtolowersubstr$plFieldValueClean0strpos$plFieldValueClean" " ) ) ) );
                    
    $clpFieldValueClean strstr$clpFieldValueClean"Konica" ) ? $clpFieldValueClean " Minolta" $clpFieldValueClean;
                    
    $clpFieldValueClean strstr$clpFieldValueClean"Hp" ) ? "HP" $clpFieldValueClean;
                break;

                case 
    "format":
                    
    $clpFieldValueClean $plFieldValueClean;
                break;
                
                case 
    "platform":
                    
    $arr = array();
                    
    $arr[] = strstr$plFieldValueClean"Windows" ) ? "Windows" "";
                    
    $arr[] = strstr$plFieldValueClean"Mac" ) ? "Mac" "";
                    
    $clpFieldValueClean count$arr ) == implode" / "$arr ) : implode""$arr );
                break;
                
                case 
    "bwppm":
                    
    $clpFieldValueClean strstr$plFieldValueClean"i" ) ? substr$plFieldValueClean0strpos$plFieldValueClean"i" ) ) : substr$plFieldValueClean0strpos$plFieldValueClean"p" ) );
                    
    $clpFieldValueClean = ( $num intval$clpFieldValueClean ) ) > $num "NULL";
                break;
                
                case 
    "cppm":
                    
    $clpFieldValueClean strstr$plFieldValueClean"i" ) ? substr$plFieldValueClean0strpos$plFieldValueClean"i" ) ) : substr$plFieldValueClean0strpos$plFieldValueClean"p" ) );
                    
    $clpFieldValueClean = ( $num intval$clpFieldValueClean ) ) > $num "NULL";
                break;
                
                case 
    "resolution":
                    
    $clpFieldValueClean substr$plFieldValueClean0strpos$plFieldValueClean"dpi"  ) );
                break;
                
                case 
    "ram":
                    
    $ramValues explode" "$plFieldValueClean );
                    
    $clpFieldValueClean str_replace( array( "MB""GB""KB" ), ""$ramValues] );
                break;

                case 
    "maxram":
                    
    $clpFieldValueClean $plFieldValueClean;
                break;

                case 
    "ethernet":
                    
    $clpFieldValueClean = !strstr$plFieldValueClean"Yes") ? "NULL" "Yes";
                break;
                
                case 
    "usb":
                    
    $clpFieldValueClean strlen $plFieldValueClean ) > "Yes" "NULL";
                break;
                
                case 
    "firstprint":
                    
    $clpFieldValueClean explode" ",  $plFieldValueClean );
                    
    $clpFieldValueClean $clpFieldValueClean];
                break;
        
                case 
    "parallel":
                    
    $clpFieldValueClean strstr$plFieldValueClean"Parallel" ) ? "Yes" "NULL";
                break;

                case 
    "duplex":
                    
    $clpFieldValueClean strlen$plFieldValueClean ) < "Manual" $plFieldValueClean;
                break;

                case 
    "printmethod":
                    
    $clpFieldValueClean preg_replace"/ Printer/"""$plFieldValueClean );
                    
    $clpFieldValueClean preg_replace"/ Fax/"""$clpFieldValueClean );
                break;
            
                case 
    "category":
                    if ( 
    strstr$plFieldValueClean"Multifunction" ) )
                        
    $clpFieldValueClean "Multifunction";
                    elseif ( 
    strstr$plFieldValueClean"Laser" ) )
                        
    $clpFieldValueClean strstr$plFieldValueClean"Colour" ) ? "Colour Laser" "Mono Laser";
                    elseif ( 
    strstr$plFieldValueClean"Fax" ) )
                        
    $clpFieldValueClean "Fax";
                    elseif ( 
    strstr$plFieldValueClean"Dot Matrix" ) )
                        
    $clpFieldValueClean "Dot Matrix";
                    elseif ( 
    strstr$plFieldValueClean"Inkjet" ) )
                        
    $clpFieldValueClean "Inkjet";
                    elseif ( 
    strstr$plFieldValueClean"Label" ) )
                        
    $clpFieldValueClean "Label";
                    elseif ( 
    strstr$plFieldValueClean"Scanner" ) )
                        
    $clpFieldValueClean "Scanner";
                    elseif ( 
    strstr$plFieldValueClean"Thermal" ) )
                        
    $clpFieldValueClean "Thermal";
                break;
                
                case 
    "description":
                    
    $clpFieldValueClean $plFieldValueClean;
                break;
                
                case 
    "rrp":
                    
    $clpFieldValueClean strstr$plFieldValueClean"" );
                    
    $clpFieldValueClean preg_replace"//"""$clpFieldValueClean );
                break;

                case 
    "paper":
                    
    $clpFieldValueClean $plFieldValueClean;
                break;
                
                case 
    "additional":
                    
    $clpFieldValueClean $plFieldValueClean;
                break;
        
                case 
    "offertext":
                    if ( 
    $plFieldValueClean == "" )
                        
    $clpFieldValueClean "New Low price On This Printer";
                    else
                    {                    
                        
    $clpFieldValueClean str_replace"""pound;"$plFieldValueClean );
                        
    $clpFieldValueClean str_replace"&""amp;"$clpFieldValueClean );            
                        
    $clpFieldValueClean str_replace"pound;""&pound;"$clpFieldValueClean );
                        
    $clpFieldValueClean str_replace"amp;""&amp;"$clpFieldValueClean );            
                        
                        
    $clpFieldValueClean .= " <br /><br />Offer ends";
                    }
                break;
            
                case 
    "specialhead":
                    if ( 
    $plFieldValueClean == "" )
                        
    $clpFieldValueClean "New Low price On This Printer";
                    else
                    {                    
                        
    $clpFieldValueClean str_replace"""pound;"$plFieldValueClean);
                        
    $clpFieldValueClean str_replace"&""amp;"$clpFieldValueClean );            
                        
    $clpFieldValueClean str_replace"pound;""&pound;"$clpFieldValueClean );
                        
    $clpFieldValueClean str_replace"amp;""&amp;"$clpFieldValueClean );
                    }
                break;
                
                default:
                break;
            }
            
            
    $clpFieldValueClean preg_replace"/\"/"""$clpFieldValueClean );

            if(
    $_ECHO) echo "$CLPField=>$clpFieldValueClean<br />";

            return( 
    trim$clpFieldValueClean ) );    
        }
        
        
    //////////////////////////////////////////////////////////////////////////////////////////////////////////////////
        //
        //        fetchPrinterDetails()
        //
        /////////////////////////////////////////////////////////////////////////////////////////////////////////////////
        
    function fetchPrinterDetails$printerURL )
        {
            global 
    $html;
            global 
    $outputFileName;
            global 
    $_ECHO;
            
            
    $line "";
            
    $fileContents file_get_contents$printerURL );
            
            if(
    $_ECHO) echo "Scraping details started for " $printerURL "<br />";
            
            
    $html->load$fileContents );

            
    $stop FALSE;
            
    $metas $html->find"meta[name=Keywords]" );
            
            if ( isset( 
    $metas] ) )
            {
                
    $stop =     strstrstrtoupper$metas]->content ), "EXDEMO" ) || 
                            
    strstrstrtoupper$metas]->content ), "BOXOPEN" )|| 
                            
    strstrstrtoupper$metas]->content ), "BOX OPEN" )|| 
                            
    strstrstrtoupper$metas]->content ), "DISCONTINUED" );
                
                
    $stop $stop $stop : !( strstrstrtoupper$metas]->content ), "PRINTER" ) || strstrstrtoupper$metas]->content ), "FAX" ) );
                
    $stop $stop $stop :    strstrstrtoupper$metas]->content ), "ACCESSORIES" );
            }
            else
                if ( 
    $_ECHO ) echo "<meta> tag NOT FOUND<br />";

            
    // Dont bother with non-current  printers 
            
    if ( $stop )
            {
                if ( 
    $_ECHO ) echo "Ignoring $printerURL<br />";
                return;
            }
            
            
    $DBFields = array 
            ( 
                
    "productid" => "#ctl00_placeholderMain_lblItem"
                
    "name" => "#ctl00_placeholderMain_lblProductHead"
                
    "manufacturer" => "h1"
                
    "format" => "@Product Group Output",
                
    "platform" => "@Operating Systems Supported"
                
    "height" => "NULL"
                
    "width" => "NULL"
                
    "depth" => "NULL"
                
    "weight" => "NULL"
                
    "bwppm" => "@Speed Monochrome"
                
    "cppm" => "@Speed Colour",
                
    "resolution" => "Printer Resolution@Printer Enhanced Resolution",
                
    "ram" => "@Memory (Maximum)",
                
    "maxram" => "NULL",
                
    "ethernet" => "@Network Ready",
                
    "parallel" => "@Interface Type(s)"
                
    "usb" => "USB Port@USB Ports"
                
    "firstprint" => "First Page@Print First Page"
                
    "warmupprint" => "NULL"
                
    "duplex" => "@Double Sided Printing"
                
    "printmethod" => "@Technology"
                
    "relability" => "NULL"
                
    "standby" => "NULL"
                
    "running" => "NULL"
                
    "category" => "h1"
                
    "description" => ".productdescriptioncontainer"
                
    "rrp" => "#ctl00_placeholderMain_lbltxtProductPrice"
                
    "printspeed" => "NULL"
                
    "large" => "NULL"
                
    "discont" => "NULL"
                
    "pdf" => "DEFAULT=1"
                
    "paper" => "@Paper Handling Input 1"
                
    "multi" => "NULL"
                
    "additional" => "@Paper Handling Input 2"
                
    "CPppma3" => "NULL"
                
    "CPppm" => "NULL"
                
    "CPram" => "NULL"
                
    "CPmaxram" => "NULL"
                
    "CPresolution" => "NULL"
                
    "Fmodem" => "NULL"
                
    "Fresolution" => "NULL"
                
    "Fcompatability" => "NULL"
                
    "Fram" => "NULL"
                
    "Fmaxram" => "NULL"
                
    "SCspeed" => "NULL"
                
    "SCresolution" => "NULL"
                
    "SCmodes" => "NULL"
                
    "specialid" => "DEFAULT=1"
                
    "offertext" => "#ctl00_placeholderMain_lblMareketingText"
                
    "image" => "NULL"
                
    "promo" => "NULL"
                
    "metatag" => "NULL"
                
    "metadescrip" => "NULL"
                
    "pricerunner" => "NULL"
                
    "google" => "NULL"
                
    "offerdate" => "NULL"
                
    "specialtext" => "NULL"
                
    "specialhead" => "#ctl00_placeholderMain_lblMareketingText" 
            
    );

            foreach( 
    $DBFields as $CLPField => $PLField )
            { 
    //echo $PLField;
                
    if ( $PLField == "NULL" )
                    
    $line .= '"' trim$PLField ) . '",';
                elseif ( 
    strstr$PLField"DEFAULT=" ) )
                    
    $line .= '"' str_replace"DEFAULT="""$PLField ) . '",';
                else
                {
                    
    // This is a Spec field so we will need to work out which one
                    
    if ( strstr$PLField"@" ) != FALSE )
                    {
                        
    // Get all the spec titles
                        
    $specTitles $html->find".specleftitem" );
                        
                        
    // Look for the field title
                        
    if ( isset( $specTitles] ) )
                        {
                            
    $clpFieldValue "NULL";
                            
    $possFields explode"@"$PLField );

                            
    // Loop thru all spec items
                            
    foreach( $specTitles as $specTitle )
                            {
                                
    // Check all poss fields for a match
                                
    foreach( $possFields as $possField )
                                {
                                    if ( 
    trim$specTitle->plaintext ) == $possField )
                                    {
                                        
    $clpFieldValue $specTitle->next_sibling()->plaintext;
                                        
    $line .= '"' cleanDBField$CLPField$clpFieldValue ) . '",';
                                        break;
                                    }
                                }
                                
                                if ( 
    $clpFieldValue != "NULL" )
                                    break;
                            }
                            
                            if ( 
    $clpFieldValue == "NULL" )
                                
    $line .= '"' $clpFieldValue '",';
                        }
                    }
                    else
                    {
                        
    $plFieldValue $html->find$PLField );
                        
                        
    // Found the field in the PL page ?
                        
    if ( isset( $plFieldValue] ) )
                        {
                            
    $clpFieldValue $plFieldValue]->plaintext;
                            
    $line .= '"' cleanDBField$CLPField$clpFieldValue ) . '",';
                        }
                        else
                        {
                            
    $line .= '"NULL",';
                        }
                        echo 
    $PLField;
                    }
                }
            }
            
            
    $line preg_replace"/,$/""\n"trim$line ) );

            
    $fp fopen$outputFileName"a" );
            
    fputs$fp$line );
            
    fclose$fp );
            
    //echo "stop:". $stop;
            
    if($_ECHO) echo "Scraping details completed for " $printerURL "<br />";
        }

        
    //////////////////////////////////////////////////////////////////////////////////////////////////////////////////
        //
        //        scrapePrinters()
        //
        /////////////////////////////////////////////////////////////////////////////////////////////////////////////////
        
    function scrapePrinters()
        {
            global 
    $printerListFileName;
            global 
    $outputFileName;
            global 
    $_ECHO;
        
            
            set_time_limit( 0 );    // 0 = no execution time limit

            if ( file_exists( $printerListFileName ) == FALSE )
            {
                if($_ECHO) echo "Cannot find $printerListFileName so quitting...<br />";
                exit(0);
            }

            if($_ECHO) echo "Deleting existing $outputFileName file...<br />";

            if ( file_exists( $outputFileName ) == TRUE )
                unlink( $outputFileName );
            
            
            // List of PL printers taken from sitemap page of PL website
            $fp = fopen( $printerListFileName, "r" );

            if($_ECHO) echo "Fetching printer details started...<br />";

            while ( $printerURL = fgets( $fp ) )
                fetchPrinterDetails( trim( $printerURL ) );

            if($_ECHO) echo "Fetching printer details completed...<br />";

            fclose( $fp );
        }

        
        scrapePrinters();

        function test()
        {
            global $html;

            $files = array( "OKI-C810n-Box-Opened--P110692.aspx", "HP-1320-P4453.aspx", "Waste-Toner-Cleaner-Pack-12-000-Pages--P48796.aspx", "Lexmark-C543dn-P6117.aspx", "Brother-FAX-T104-P11767.aspx", "Black-Toner-3500-pages--P110364.aspx", "Lexmark-X544dn-P9732.aspx", "EB-05-IEEE-1394-Expansion-Board-P30721.aspx", "Kodak-Photo-Paper-Gloss-A4-210-x-297mm-20-Sheets-165gsm--P13998.aspx", "Xerox-7600-P13571.aspx" );

            echo "STARTING...<br />";

            foreach( $files as $file )
            {
                fetchPrinterDetails( trim( $file ) );
            }

            echo "DONE...<br />";
        }

        
    //test();
    ?>
  6. #4
  7. Sarcky
    Devshed Supreme Being (6500+ posts)

    Join Date
    Oct 2006
    Location
    Pennsylvania, USA
    Posts
    10,908
    Rep Power
    6351
    So...your question is "at some point in this 200 line script something goes wrong in some way, tell me why"? Debug this a bit on your own, narrow down the problem.

    Comments on this post

    • ptr2void agrees : WTF is WITH these people?
    HEY! YOU! Read the New User Guide and Forum Rules

    "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin

    "The greatest tragedy of this changing society is that people who never knew what it was like before will simply assume that this is the way things are supposed to be." -2600 Magazine, Fall 2002

    Think we're being rude? Maybe you asked a bad question or you're a Help Vampire. Trying to argue intelligently? Please read this.
  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2012
    Posts
    3
    Rep Power
    0
    I tried to, but no luck.

    On line 200 is

    PHP Code:
    $fileContents = file_get_contents( $printerURL );
  10. #6
  11. No Profile Picture
    Contributing User
    Devshed Loyal (3000 - 3499 posts)

    Join Date
    Jul 2003
    Posts
    3,497
    Rep Power
    594
    So what debugging steps have you taken? What is the error? Did you even bother to read the new user guide?
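
    As a rough sketch of a first debugging step (an illustration, not part of the posted script): the script sets $_ECHO = FALSE at the top, so none of its progress messages are printed, and PHP may also be hiding warnings depending on the server configuration. Turning both on while debugging should at least show where it stops.

```php
<?php
// Report every notice, warning and error while debugging.
// Assumption: this runs on a development machine, not a live site.
error_reporting( E_ALL );
ini_set( 'display_errors', '1' );

// The posted script only prints its progress messages when this flag is TRUE.
$_ECHO = TRUE;
```

    These lines would go at the very top of the script, before the require_once.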
    There are 10 kinds of people in the world. Those that understand binary and those that don't.
  12. #7
  13. Sarcky
    Devshed Supreme Being (6500+ posts)

    Join Date
    Oct 2006
    Location
    Pennsylvania, USA
    Posts
    10,908
    Rep Power
    6351
    I didn't mean "show me line 200," I meant "I'm not reading 200 lines without any context or any ability to run them myself."

    There are debugging steps in the new user guide. In order to be a good programmer (or even a decent programmer) you must be able to at least narrow down the problem. All we know is that your entire program isn't working. Narrow this problem description down.
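
    One possible way to narrow it down, as a sketch only: test the fetch step on its own for a single URL. fetchOrReport() below is a hypothetical helper, not something from the posted script; it wraps the same file_get_contents() call and says whether it worked and what the error was.

```php
<?php
// Hypothetical helper: fetch one page and report exactly what happened.
function fetchOrReport( $printerURL )
{
    // @ hides the raw warning so it can be reported in a readable form below.
    $fileContents = @file_get_contents( $printerURL );

    if ( $fileContents === FALSE )
    {
        // error_get_last() still holds the warning that @ suppressed.
        $error = error_get_last();
        $message = $error ? $error['message'] : 'unknown error';
        echo "Could not fetch $printerURL: $message\n";
        return NULL;
    }

    echo "Fetched " . strlen( $fileContents ) . " bytes from $printerURL\n";
    return $fileContents;
}
```

    Calling it with one line from the printer list file shows whether the problem is in fetching the page or in the parsing that comes after it.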
