PHP folks,
The following three scripts work.
The last two are the same, except the third one suppresses the warnings.
The difference between the first and the second is that the first uses the file_get_html() function while the second (and the third) uses cURL.
Anyway, I am trying to combine the two approaches, for reasons I will get into once this problem is solved.
I get this error:
Fatal error: Uncaught Error: Call to a member function find() on string in C:\xampp\htdocs\cURL\crawler.php:24 Stack trace: #0 {main} thrown in C:\xampp\htdocs\cURL\crawler.php on line 24
WORKING CODE 1
PHP Code:
<?php
/*
FINDING HTML ELEMENTS BASED ON THEIR TAG NAMES
Suppose you wanted to find each and every image on a webpage, or say, each and every hyperlink.
We will be using the "find" function to extract this information from the object. Here's how to do it using Simple HTML DOM Parser:
*/
include('simple_html_dom.php');
$html = file_get_html('http://google.com');
//to fetch all hyperlinks from a webpage
$links = array();
foreach($html->find('a') as $a) {
    $links[] = $a->href;
}
print_r($links);
echo "<br />";

//to fetch all images from a webpage
$images = array();
foreach($html->find('img') as $img) {
    $images[] = $img->src;
}
print_r($images);
echo "<br />";

//to find h1 headers from a webpage
$headlines = array();
foreach($html->find('h1') as $header) {
    $headlines[] = $header->plaintext;
}
print_r($headlines);
echo "<br />";
?>
WORKING CODE 2
PHP Code:
<?php
/*
2a. Scrape Urls & Anchors And Echo Them By NOT Suppressing Warnings.
Using PHP's DOM functions to
fetch hyperlinks and their anchor text
*/
$url = 'https://google.com';
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0);
$data = curl_exec($curl);
// DO NOT hide HTML warnings, so any parse problems will show on screen
$dom = new DOMDocument;
$dom->loadHTML($data);
// echo Links and their anchor text
echo '<pre>';
echo "Link\tAnchor\n";
foreach($dom->getElementsByTagName('a') as $link) {
    $href = $link->getAttribute('href');
    $anchor = $link->nodeValue;
    echo $href,"\t",$anchor,"\n";
}
echo '</pre>';
?>
WORKING CODE 3
PHP Code:
<?php
/*
2b. Scrape Urls & Anchors And Echo Them By Suppressing Warnings.
Using PHP's DOM functions to
fetch hyperlinks and their anchor text
*/
$url = 'https://google.com';
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0);
$data = curl_exec($curl);
// Hide HTML warnings
libxml_use_internal_errors(true);
$dom = new DOMDocument;
if($dom->loadHTML($data, LIBXML_NOWARNING)) {
    // echo Links and their anchor text
    echo '<pre>';
    echo "Link\tAnchor\n";
    foreach($dom->getElementsByTagName('a') as $link) {
        $href = $link->getAttribute('href');
        $anchor = $link->nodeValue;
        echo $href,"\t",$anchor,"\n";
    }
    echo '</pre>';
} else {
    echo "Failed to load html.";
}
?>
Here is the code that gives me the error:
PHP Code:
<?php
/* FROM dom_crawler_NOTES.php file.
2.
FINDING HTML ELEMENTS BASED ON THEIR TAG NAMES
Suppose you wanted to find each and every link on a webpage.
We will be using the "find" function to extract this information from the object. Here's how to do it using Simple HTML DOM Parser:
*/
include('simple_html_dom.php');
$url = 'https://www.yahoo.com';
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0);
$html = curl_exec($curl);
//to fetch all hyperlinks from a webpage
$links = array();
foreach($html->find('a') as $a) {
    $links[] = $a->href;
}
print_r($links);
echo "<br />";
?>
Fatal error: Uncaught Error: Call to a member function find() on string in C:\xampp\htdocs\cURL\crawler.php:24 Stack trace: #0 {main} thrown in C:\xampp\htdocs\cURL\crawler.php on line 24
Why is there an error on this line?
PHP Code:
foreach($html->find('a') as $a) {
Why is the script failing to acknowledge "find" when it exists in the included file:
include('simple_html_dom.php');
The simple_html_dom.php file can be downloaded from here:
https://sourceforge.net/projects/simplehtmldom/files/
Look at the first three scripts in this post. The first script uses this "simple_html_dom.php" and had no problem using "find".
Since my latest script also references this "simple_html_dom.php", and the file resides in the same directory on my XAMPP setup, I should not be getting this silly error, should I?
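For what it is worth, my current guess (unconfirmed) is that curl_exec() hands back the page as a plain string, whereas file_get_html() hands back a simple_html_dom object, and only the object has the find() method. If that is right, then something along the lines of the sketch below would be the fix. It assumes the library's str_get_html() helper can turn the cURL string into such an object. Please correct me if I am assuming wrongly:
PHP Code:
<?php
// Rough sketch only, NOT tested: convert the string returned by cURL into a
// simple_html_dom object with str_get_html() so that find() is available.
include('simple_html_dom.php');
$url = 'https://www.yahoo.com';
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0);
$response = curl_exec($curl); // $response is a plain string here
// Assumption: str_get_html() (from simple_html_dom.php) parses the string
// and returns an object that supports find().
$html = str_get_html($response);
//to fetch all hyperlinks from a webpage
$links = array();
foreach($html->find('a') as $a) {
    $links[] = $a->href;
}
print_r($links);
echo "<br />";
?>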