January 24th, 2013, 07:16 PM
PHP Scrape With Preg_Match
I'm trying to scrape HTML, but I can't figure out how to use preg_match together with foreach or another loop. For example, I would like the PHP code to scan each class="paragraph_style_2" element for the price after the '$' and also the text before the price, and create two separate arrays: one for the prices and another for the types, i.e. walk-in, 10 class pack, 20 class pack. I'd appreciate any help! Even if you just point me in the right direction, I'll try to figure it out.
Here is the HTML:
Code:
<p class="paragraph_style_2"><br /></p>
<p class="paragraph_style_2">Walk-in $18<br /></p>
<p class="paragraph_style_2">10 Class Pack $160<br /></p>
<p class="paragraph_style_2">20 Class Pack $300<br /></p>
<p class="paragraph_style_2">Monthly Unlimited $149<br /></p>
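A minimal sketch of the regex approach for the markup above, assuming the page looks exactly like this sample (the class attribute written exactly as shown, plain text before the price, whole-dollar or two-decimal prices). It uses preg_match_all() instead of calling preg_match() in a loop, so a single call fills both capture groups; group 1 becomes the types array and group 2 the prices array. Regexes over HTML are brittle, so treat this as a starting point rather than a robust scraper.

```php
<?php
// Markup copied from the question (assumed to be the whole input).
$html = '<p class="paragraph_style_2"><br /></p> <p class="paragraph_style_2">Walk-in $18<br /></p> '
      . '<p class="paragraph_style_2">10 Class Pack $160<br /></p> <p class="paragraph_style_2">20 Class Pack $300<br /></p> '
      . '<p class="paragraph_style_2">Monthly Unlimited $149<br /></p>';

// Group 1: text before the '$' ('<' is not allowed, so the empty <br /> paragraph is skipped).
// Group 2: the digits after the '$', with optional cents.
$pattern = '~<p class="paragraph_style_2">\s*([^<$]+?)\s*\$(\d+(?:\.\d{2})?)~';

preg_match_all($pattern, $html, $matches);

$types  = $matches[1]; // price types, e.g. "Walk-in"
$prices = $matches[2]; // prices without the '$', e.g. "18"

print_r($types);
print_r($prices);
```

With the sample markup, $types holds Walk-in, 10 Class Pack, 20 Class Pack and Monthly Unlimited, and $prices holds 18, 160, 300 and 149.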
I tried using simple_html_dom.php, which I found online, but it doesn't separate the price from the price type. I think the code could be a lot simpler, perhaps without using simple_html_dom at all.
Here is what I have so far:
PHP Code:
<?php
include('php/simple_html_dom.php');

function scraping_even() {
    // initialise so the function returns an empty array if nothing matches
    $ret = array();

    // create HTML DOM
    $html = file_get_html('(some URL)');

    foreach ($html->find('.graphic_textbox_layout_style_default') as $article) {
        // get price and price type (still combined in one string per paragraph)
        $item = array();
        $item['p']  = trim($article->find('.paragraph_style_2', 2)->plaintext);
        $item['p1'] = trim($article->find('.paragraph_style_2', 3)->plaintext);
        $item['p2'] = trim($article->find('.paragraph_style_2', 4)->plaintext);
        $item['p3'] = trim($article->find('.paragraph_style_2', 5)->plaintext);
        $ret[] = $item;
    }

    // clean up memory
    $html->clear();
    unset($html);

    return $ret;
}

// test it
// check user_agent header...
ini_set('user_agent', 'My-Application/2.5');
ini_set('display_errors', 1);
error_reporting(E_ALL);

$ret = scraping_even();
foreach ($ret as $v) {
    echo $v['p'] . '<br>';
    echo $v['p1'] . '<br>';
    echo $v['p2'] . '<br>';
    echo $v['p3'] . '<br>';
}
?>
The result is:
Walk-in                      $18
10 Class Pack             $160
20 Class Pack             $300
Monthly Unlimited    $149
This doesn't remove the spaces before the $ and doesn't create two separate arrays.
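One way to get two arrays out of a loop like the one above, sketched with PHP's built-in DOMDocument and DOMXPath in place of simple_html_dom (a swapped-in technique, not what the original script uses). Each matching paragraph's text goes through preg_match(), and the \s* in the pattern absorbs any run of spaces before the '$'. The hard-coded $html stands in for the fetched page and is an assumption for illustration.

```php
<?php
// Sample markup from the question; a real script would fetch the page instead.
$html = '<p class="paragraph_style_2"><br /></p>'
      . '<p class="paragraph_style_2">Walk-in $18<br /></p>'
      . '<p class="paragraph_style_2">10 Class Pack $160<br /></p>'
      . '<p class="paragraph_style_2">20 Class Pack $300<br /></p>'
      . '<p class="paragraph_style_2">Monthly Unlimited $149<br /></p>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // don't print warnings for imperfect real-world HTML
$doc->loadHTML($html);
libxml_clear_errors();

$xpath  = new DOMXPath($doc);
$types  = array();
$prices = array();

// Visit every <p> whose class attribute is exactly "paragraph_style_2".
foreach ($xpath->query('//p[@class="paragraph_style_2"]') as $p) {
    $text = trim($p->textContent);
    // "Label   $123" -> label and price; paragraphs without a '$' (the empty one) are skipped.
    if (preg_match('/^(.*?)\s*\$\s*(\d+(?:\.\d{2})?)$/', $text, $m)) {
        $types[]  = $m[1];
        $prices[] = $m[2];
    }
}

print_r($types);
print_r($prices);
```

Parsing the document first and applying the regex only to each paragraph's text keeps the pattern simple and avoids matching against raw tags.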
Last edited by requinix; January 24th, 2013 at 07:35 PM.
Reason: added a missing quote
January 25th, 2013, 07:15 AM
It's not completely clear to me what you are trying to do, but I think you want to use split to extract your data. Something like:
PHP Code:
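// split() takes the pattern first; '\$' escapes the dollar sign, which is otherwise a regex end anchor
$str = split('\$', $article->find('.paragraph_style_2', 2)->plaintext);
$item['p'] = $str[1];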
January 25th, 2013, 12:40 PM
Originally Posted by gw1500se
I think you want to use split to extract your data.
split() is deprecated and takes a regular expression as its pattern. explode(), like you linked to, is more appropriate for splitting on a literal character such as '$'.
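Following the explode() suggestion, a minimal sketch that splits each item on the literal '$' and fills the two arrays the question asks for. It assumes each string contains one '$' separating the type from the price; the $items values are the strings the original script printed, hard-coded here for illustration.

```php
<?php
// Strings as printed by the original script (hard-coded for illustration).
$items = array(
    'Walk-in                      $18',
    '10 Class Pack             $160',
    '20 Class Pack             $300',
    'Monthly Unlimited    $149',
);

$types  = array();
$prices = array();

foreach ($items as $text) {
    // explode() splits on the literal '$' (no regex); limit 2 keeps everything after the first '$' together.
    $parts = explode('$', $text, 2);
    if (count($parts) === 2) {
        $types[]  = trim($parts[0]); // trim() removes the padding before the '$'
        $prices[] = trim($parts[1]);
    }
}

print_r($types);
print_r($prices);
```

In the original script, the same loop body could run on each trimmed plaintext value instead of the hard-coded strings.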