#1
  1. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jan 2017
    Posts
    845
    Rep Power
    0

    Need Php Code Explanation


    Php Lovers,

    I opened this thread to paste code from tutorials that I do not understand, so you nice folks can clear things up.
    Look at this code from here:
    Create Simple Web Crawler Using PHP And MySQL

    PHP Code:
    // Database Structure
    CREATE TABLE `webpage_details` (
     `link` text NOT NULL,
     `title` text NOT NULL,
     `description` text NOT NULL,
     `internal_link` text NOT NULL
    ) ENGINE=MyISAM AUTO_INCREMENT=5 DEFAULT CHARSET=latin1

    <?php
    $main_url = "http://samplesite.com";
    $str = file_get_contents($main_url);

    // Gets Webpage Title
    if (strlen($str) > 0)
    {
        $str = trim(preg_replace('/\s+/', ' ', $str)); // supports line breaks inside <title>
        preg_match("/\<title\>(.*)\<\/title\>/i", $str, $title); // ignore case
        $title = $title[1];
    }

    // Gets Webpage Description
    $b = $main_url;
    @$url = parse_url($b);
    @$tags = get_meta_tags($url['scheme'].'://'.$url['host']);
    $description = $tags['description'];

    // Gets Webpage Internal Links
    $doc = new DOMDocument;
    @$doc->loadHTML($str);

    $items = $doc->getElementsByTagName('a');
    foreach ($items as $value)
    {
        $attrs = $value->attributes;
        $sec_url[] = $attrs->getNamedItem('href')->nodeValue;
    }
    $all_links = implode(",", $sec_url);

    // Store Data In Database
    $host = "localhost";
    $username = "root";
    $password = "";
    $databasename = "sample";
    $connect = mysql_connect($host, $username, $password);
    $db = mysql_select_db($databasename);

    mysql_query("insert into webpage_details values('$main_url','$title','$description','$all_links')");

    ?>
    Now, I need to learn what these lines mean. Would you care to explain ?
    PHP Code:
    // Gets Webpage Title
    if (strlen($str) > 0)
    {
        $str = trim(preg_replace('/\s+/', ' ', $str)); // supports line breaks inside <title>
        preg_match("/\<title\>(.*)\<\/title\>/i", $str, $title); // ignore case
        $title = $title[1];
    }
    Q1. I do not understand the BOLD part. What is this array $title, and what is the key "1" doing here ?
    // Gets Webpage Title
    if(strlen($str)>0)
    {
    $str = trim(preg_replace('/\s+/', ' ', $str)); // supports line breaks inside <title>
    preg_match("/\<title\>(.*)\<\/title\>/i",$str,$title); // ignore case
    $title=$title[1];
    }


    Q2.
    PHP Code:
    // Gets Webpage Internal Links
    $doc = new DOMDocument;
    @$doc->loadHTML($str);
    Q2a. Is this line a reference like a pointer to the DOMDocument ?
    $doc = new DOMDocument;

    Q2b. What is this line saying ?
    @$doc->loadHTML($str);
    Is it saying: load the webpage's HTML and extract the links, and find the code (regex and all) to extract the links from the DOMDocument file ?
    It is similar to "include". Right ?

    If I have misunderstood, please explain these 2 lines, as I don't understand them.


    Q3. The following line only extracts the content from the <meta description>:
    PHP Code:
    @$tags = get_meta_tags($url['scheme'].'://'.$url['host']);
    What if I want to extract the content from the <meta keywords> tag as well ? How do I change this line, and what do I change it to ? Please show a sample snippet for learning purposes.


    Thank You!
    Last edited by UniqueIdeaMan; May 23rd, 2018 at 01:59 PM.
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jan 2017
    Posts
    845
    Rep Power
    0
    Php Lovers,

    I have more questions regarding the code mentioned on my above post.
    I get errors and need them explained so I can understand them. I also need answers on how to weed them out.

    Warning: file_get_contents(): SSL operation failed with code 1. OpenSSL Error messages: error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed in C:\xampp\htdocs\cURL\crawler_test.php on line 6
    Why did file_get_contents fail to open a secured page (https) ?

    Warning: file_get_contents(): Failed to enable crypto in C:\xampp\htdocs\cURL\crawler_test.php on line 6
    Failed to enable crypto ? What crypto ? Does it mean it failed to start the decryptor that decrypts the encrypted content of the webpage ? Yes or no ? If so, then what does it mean ?

    Warning: file_get_contents(https://developers.google.com/youtube/): failed to open stream: operation failed in C:\xampp\htdocs\cURL\crawler_test.php on line 6
    Failed to open stream ? What stream ? This is not a streaming video file.
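    A note on the three warnings above: "certificate verify failed" means PHP could not validate the server's TLS certificate, which on a local install is commonly because no CA bundle is configured; "failed to enable crypto" is the TLS handshake giving up as a result; and a "stream" is simply PHP's name for any file or network resource it reads from, not streaming media. Below is a minimal sketch of one way around it, assuming a CA bundle file has been downloaded. The path in the usage comment is hypothetical, and make_tls_context / fetch_page are illustrative helper names that do not appear in the tutorial.

```php
<?php
// Builds a stream context that verifies HTTPS certificates against a CA bundle.
// $caBundle is an assumed path to a CA bundle file; adjust it to your system.
function make_tls_context(string $caBundle)
{
    return stream_context_create([
        'ssl'  => [
            'cafile'           => $caBundle,
            'verify_peer'      => true,
            'verify_peer_name' => true,
        ],
        'http' => ['timeout' => 10],
    ]);
}

// Returns the page body, or null when the download fails.
function fetch_page(string $url, string $caBundle): ?string
{
    $html = @file_get_contents($url, false, make_tls_context($caBundle));
    return $html === false ? null : $html;
}

// Usage (hypothetical path):
// $str = fetch_page('https://developers.google.com/youtube/', 'C:/xampp/php/extras/ssl/cacert.pem');
```

    Pointing the openssl.cafile setting in php.ini at the same bundle is an alternative that needs no code change. Turning verify_peer off would also silence the warning, but it removes the protection HTTPS is there to give.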


    Notice: Undefined variable: sec_url in C:\xampp\htdocs\cURL\crawler_test.php on line 32
    Ok. I understand the variable has not been defined. But I don't know what I should define it as. So, would you care to give the definition and tell me on which line I should insert it ?

    Warning: implode(): Invalid arguments passed in C:\xampp\htdocs\cURL\crawler_test.php on line 32
    PHP: implode - Manual
    According to the manual, implode takes 2 params, and the first param in the tutorial is double quoted.
    The tutorial code looks like this and looks fine to me:
    PHP Code:
    $all_links=implode(",",$sec_url); 
    What is wrong with it ?
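    The implode() call itself is fine; the likely culprit is that $sec_url only comes into existence inside the foreach loop. When the download fails (as in the warnings above), the page has no <a> tags, the loop body never runs, and implode() receives an undefined variable. A minimal sketch of the usual remedy, defining the array before the loop, with $found_hrefs as a made-up stand-in for the links found on the page:

```php
<?php
// Stand-in for the hrefs found on a page; empty here to mimic a failed download.
$found_hrefs = [];

$sec_url = [];                       // define the array before the loop
foreach ($found_hrefs as $href) {
    $sec_url[] = $href;
}

// implode() now always receives an array; an empty array gives an empty string.
$all_links = implode(",", $sec_url);
var_dump($all_links);
```

    In the tutorial code, the equivalent change is a single $sec_url = []; line placed just before the foreach.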

    Fatal error: Uncaught Error: Call to undefined function mysql_query() in C:\xampp\htdocs\cURL\crawler_test.php:42 Stack trace: #0 {main} thrown in C:\xampp\htdocs\cURL\crawler_test.php on line 42
    I believe the mysql_query function is deprecated. Hence this confusing error. Right ? What is all this "stack trace" message ?
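    For context: the old mysql_* functions were deprecated in PHP 5.5 and removed entirely in PHP 7.0, which is why PHP reports the function as undefined rather than deprecated. The stack trace lists the chain of function calls that led to the error; "#0 {main}" means it happened at the top level of the script, not inside a function. Below is a rough sketch of the same insert using mysqli with a prepared statement. The table and column names come from the tutorial; store_page is a hypothetical helper name, and the connection details in the usage comment are the tutorial's own.

```php
<?php
// Hypothetical helper: stores one crawled page with a mysqli prepared statement.
function store_page(mysqli $db, string $link, string $title, string $description, string $links): bool
{
    $stmt = $db->prepare(
        "INSERT INTO webpage_details (link, title, description, internal_link) VALUES (?, ?, ?, ?)"
    );
    if ($stmt === false) {
        return false;                // the statement could not be prepared
    }
    // "ssss" declares four string parameters; values are sent separately from the SQL.
    $stmt->bind_param("ssss", $link, $title, $description, $links);
    $ok = $stmt->execute();
    $stmt->close();
    return $ok;
}

// Usage (needs a running MySQL server and the mysqli extension):
// $db = new mysqli("localhost", "root", "", "sample");
// store_page($db, $main_url, $title, $description, $all_links);
```

    Binding the values also avoids the quoting problems the original query would hit as soon as a title or link contains an apostrophe.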


    Don't forget to address my above post.

    Thanks!
  4. #3
  5. Banned (not really)
    Devshed Supreme Being (6500+ posts)

    Join Date
    Dec 1999
    Location
    Caro, Michigan
    Posts
    14,961
    Rep Power
    4575
    lol
    -- Cigars, whiskey and wild, wild women. --
  6. #4
  7. Contributing User
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2006
    Posts
    2,682
    Rep Power
    1841
    Oh boy ...!
    I may be wrong, 'cos let's face it, I do NOT do php, but ... all that preg_ stuff uses regular expressions (regexes), and those, to cheapen them greatly, look for patterns within a supplied string and return the supplied string broken down into component parts based on the pattern(s) used.

    The preg_replace is changing occurrences of the "special expression" \s+ to a single space. From memory of other languages \s+ means, in effect, one or more whitespace characters, and as a linebreak counts as a whitespace character it will also 'hoover' all of those up and make the string supplied, in effect, a 'single line'.

    The preg_match then takes that and looks for a pattern that should identify the tags specifying the start and end of the title definition, capturing (the brackets mark a capture group) whatever characters sit between them - shown by the .*. What it does is hand back all the bits of that in an array (the third argument, $title) - with element 1 being that 'lump' of text between the delimiters - hence, by definition, the actual title.
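    To see why the whitespace step matters, here is a small sketch with a made-up page whose <title> spans several lines. By default the dot in a PHP regex does not match a newline, so the title pattern finds nothing until the line breaks have been collapsed.

```php
<?php
// Sample HTML with a <title> that spans several lines (made up for illustration).
$html = "<html><head><title>\n  My\n  Sample Page\n</title></head></html>";

// \s+ matches one or more whitespace characters, including newlines and tabs.
$flat = trim(preg_replace('/\s+/', ' ', $html));

$pattern = '/<title>(.*)<\/title>/i';
$before = preg_match($pattern, $html);   // newlines inside the title block the match
$after  = preg_match($pattern, $flat);   // matches once everything is on one line

echo $flat, PHP_EOL;
echo "before: $before, after: $after", PHP_EOL;
```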
    Last edited by SimonJM; May 24th, 2018 at 05:34 PM. Reason: spelling mistakes
    The moon on the one hand, the dawn on the other:
    The moon is my sister, the dawn is my brother.
    The moon on my left and the dawn on my right.
    My brother, good morning: my sister, good night.
    -- Hilaire Belloc
  8. #5
  9. Banned (not really)
    Devshed Supreme Being (6500+ posts)

    Join Date
    Dec 1999
    Location
    Caro, Michigan
    Posts
    14,961
    Rep Power
    4575
    What is this array $title and the key "1" doing here ?
    It's set in the preg_match() line above it and contains the matches. [1] is the second value in the array (zero being the first).
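    A short sketch of what ends up in that array, using a made-up page and the name $matches in place of the tutorial's $title to keep the two roles apart:

```php
<?php
$str = '<html><head><title>Example Domain</title></head><body></body></html>';

// preg_match() fills its third argument with the matches it found.
$found = preg_match('/<title>(.*)<\/title>/i', $str, $matches);

// $matches[0] is the text matched by the whole pattern,
// $matches[1] is the text captured by the first pair of parentheses.
if ($found === 1) {
    echo $matches[0], PHP_EOL; // <title>Example Domain</title>
    echo $matches[1], PHP_EOL; // Example Domain
} else {
    echo "No <title> found", PHP_EOL;
}
```

    The tutorial then overwrites the array with its own element 1 ($title=$title[1];), which is why $title ends up holding just the title text.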

    Is this line a reference like a pointer to the DOMDocument ?
    No, it's creating a new object.

    What is this line saying ?
    It's calling the loadHTML() method of the object.
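    As a rough illustration of those two steps, the sketch below creates the object, loads a made-up HTML string into it, and then asks the object for its <a> elements. It uses libxml_use_internal_errors() instead of the tutorial's @ to keep parser warnings quiet, and it skips anchors that have no href, both of which are choices of this sketch rather than of the tutorial.

```php
<?php
$html = '<html><body><a href="/about">About</a><a name="top">No href</a>'
      . '<a href="https://example.com/">Ext</a></body></html>';

// "new DOMDocument" creates an empty document object; nothing is loaded yet.
$doc = new DOMDocument();

// Collect parser warnings internally instead of silencing them with "@".
libxml_use_internal_errors(true);
$doc->loadHTML($html);   // parses the HTML string into a tree of nodes
libxml_clear_errors();

$links = [];             // defined up front, so it exists even when no links are found
foreach ($doc->getElementsByTagName('a') as $anchor) {
    if ($anchor->hasAttribute('href')) {
        $links[] = $anchor->getAttribute('href');
    }
}

echo implode(",", $links), PHP_EOL;
```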

    It is similar to the "include". Right ?
    No.

    The following line only extracts the content from the <meta description>:
    Wrong. It extracts all named meta tags.
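    In other words, the get_meta_tags() line needs no change; the keywords are already in $tags next to the description. A minimal sketch, which writes a made-up page to a temporary file because get_meta_tags() reads from a file or URL and not from a string:

```php
<?php
// get_meta_tags() reads from a file or URL, so write a sample page to a temp file.
$file = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($file, '<html><head>
<meta name="description" content="A sample page">
<meta name="keywords" content="php, crawler, mysql">
</head><body></body></html>');

$tags = get_meta_tags($file);
unlink($file);

// Keys are the lowercased name attributes; a missing tag simply has no key.
$description = $tags['description'] ?? '';
$keywords    = $tags['keywords'] ?? '';

echo $description, PHP_EOL;
echo $keywords, PHP_EOL;
```

    The ?? '' fallback keeps the script from raising an undefined-index notice on pages that lack one of the tags.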
    -- Cigars, whiskey and wild, wild women. --
  10. #6
  11. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jan 2017
    Posts
    845
    Rep Power
    0
    Originally Posted by Sepodati
    It's set in the preg_match() line above it and contains the matches. [1] is the second value in the array (zero being the first).



    No, it's creating a new object.



    It's calling the loadHTML() method of the object.



    No.



    Wrong. It extracts all named meta tags.
    Thank You!

    You should do this more often.
