PHP Development
 
  #1  
Old February 7th, 2013, 01:06 PM
dandy dandy is offline
Contributing User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Sep 2006
Posts: 32 dandy User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 8 h 58 m 35 sec
Reputation Power: 7
PHP-General - Preg_match read between div

Hi

I've not really done any work with PHP for years and am struggling a bit, to be honest (I was never a whizz to begin with).

Anyway, I've been making a WordPress site about a hobby of mine (it's not an overly popular niche, to be honest). What I'm trying to do is make a custom price comparison tool for my users. Only four shops will be searched (like I say, it's not an overly popular hobby). As it's not going to be made to create profit for me, and I don't have the funds for someone to build it for me, I'm looking at attempting it myself lol :S

What I'm thinking is: create a database, input all the URLs (of the products I want included) from the shops I want to search, then create a cron job to recheck these prices on a weekly basis.

I will use file_get_contents on the URLs in my database, then use preg_match to extract the info I need.

I am having some difficulties sorting the PHP out to extract from the following HTML (taken from one of the product pages):

<div class='ShowProductPrices'><span class='ShowProductMainPrices'>Price: £82.50</span></div>( EX VAT @ 20% ) <div>

What would the preg match code be to read just the price??

Any help would be appreciated

  #2  
Old February 7th, 2013, 01:12 PM
gw1500se gw1500se is online now
Contributing User
Dev Shed Frequenter (2500 - 2999 posts)
 
Join Date: Jul 2003
Posts: 2,907 gw1500se User rank is Colonel (50000 - 60000 Reputation Level)
Time spent in forums: 1 Year 1 Month 1 Day 18 h 19 m 52 sec
Reputation Power: 581
Wouldn't it be more flexible to use DOM to parse it out rather than use preg_match?
Comments on this post
requinix agrees!
  #3  
Old February 7th, 2013, 01:18 PM
ManiacDan's Avatar
ManiacDan ManiacDan is offline
Sarcky
Dev Shed God 10th Plane (9500 - 9999 posts)
 
Join Date: Oct 2006
Location: Pennsylvania, USA
Posts: 9,923 ManiacDan User rank is General 77th Grade (Above 100000 Reputation Level)  Folding Points: 127430 Folding Title: Super Ultimate Folder - Level 1
Time spent in forums: 2 Months 3 Weeks 1 Day 11 h 1 m 28 sec
Reputation Power: 6113
PHP Code:
preg_match('/Price:\s*([^<]+)/', $theContents, $foo);
$price = $foo[1];
  #4  
Old February 7th, 2013, 01:35 PM
dandy dandy is offline
Contributing User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Sep 2006
Posts: 32 dandy User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 8 h 58 m 35 sec
Reputation Power: 7
Thanks for the replies

Quote:
Wouldn't it be more flexible to use DOM to parse it out rather than use preg_match?


I have no idea; I've never worked with that before.

  #5  
Old February 7th, 2013, 02:00 PM
ManiacDan's Avatar
ManiacDan ManiacDan is offline
Sarcky
Dev Shed God 10th Plane (9500 - 9999 posts)
 
Join Date: Oct 2006
Location: Pennsylvania, USA
Posts: 9,923 ManiacDan User rank is General 77th Grade (Above 100000 Reputation Level)  Folding Points: 127430 Folding Title: Super Ultimate Folder - Level 1
Time spent in forums: 2 Months 3 Weeks 1 Day 11 h 1 m 28 sec
Reputation Power: 6113
The DOM (Document Object Model) is a rather opaque library available in PHP that parses well-formed HTML and XML documents into object trees you can traverse. It's used to both build and read those kinds of documents, and should be able to parse whatever page you're talking about into a tree which you can then search similar to the results of SimpleXML loading functions. However, it's overkill for this particular application, since you only want a single string.

My regex keeps the currency symbol btw, which means the results are not a number.
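
For comparison, the DOM route mentioned above could look roughly like the sketch below. It assumes the DOM extension is enabled and that the page is UTF-8; the sample markup is copied from the first post, and on a live page `$html` would come from file_get_contents(). The encoding hint prepended to the markup is a common workaround rather than anything this thread prescribes.

```php
<?php
// Sample markup from the first post; a real page would be fetched instead.
$html = "<div class='ShowProductPrices'><span class='ShowProductMainPrices'>Price: £82.50</span></div>";

$doc = new DOMDocument();
// Real-world pages are rarely valid HTML, so collect libxml warnings quietly.
libxml_use_internal_errors(true);
// Assumption: the page is UTF-8. The hint keeps the currency symbol intact.
$doc->loadHTML('<?xml encoding="UTF-8">' . $html);
libxml_clear_errors();

// Find the span by its class attribute and read its text.
$xpath = new DOMXPath($doc);
$nodes = $xpath->query("//span[@class='ShowProductMainPrices']");

// null signals "span not found", e.g. after a site redesign.
$priceText = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
```

`$priceText` still contains the "Price:" label and the currency symbol, so it needs the same clean-up as the regex result before it can be stored as a number.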

  #6  
Old February 7th, 2013, 02:15 PM
dandy dandy is offline
Contributing User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Sep 2006
Posts: 32 dandy User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 8 h 58 m 35 sec
Reputation Power: 7
Thanks for that. Is there a way to ignore the currency symbol, or would I just strip it out with a different function, leaving me with just a number?

Reply With Quote
  #7  
Old February 7th, 2013, 03:12 PM
ManiacDan's Avatar
ManiacDan ManiacDan is offline
Sarcky
Dev Shed God 10th Plane (9500 - 9999 posts)
 
Join Date: Oct 2006
Location: Pennsylvania, USA
Posts: 9,923 ManiacDan User rank is General 77th Grade (Above 100000 Reputation Level)  Folding Points: 127430 Folding Title: Super Ultimate Folder - Level 1
Time spent in forums: 2 Months 3 Weeks 1 Day 11 h 1 m 28 sec
Reputation Power: 6113
PHP Code:
preg_match('/Price:\s*\D([^<]+)/', $theContents, $foo);
$price = $foo[1];
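
One thing to watch: the captured value is still a string, and on a UTF-8 page a single \D without the `u` modifier consumes only the first byte of a multi-byte symbol such as £. A hedged variant, assuming prices look like 82.50 or 1,082.50, captures only the numeric part and casts it. The `$theContents` sample and the null fallback are illustrative assumptions, not part of the code above.

```php
<?php
// Sample input; in the real script $theContents holds the fetched page.
$theContents = "<span class='ShowProductMainPrices'>Price: £82.50</span>";

// \D*? lazily skips whatever non-digit bytes precede the number (the symbol),
// then the group captures digits, optional thousands commas and decimals.
if (preg_match('/Price:\s*\D*?([\d,]+(?:\.\d+)?)/', $theContents, $foo)) {
    // Remove thousands separators before casting to a float.
    $price = (float) str_replace(',', '', $foo[1]);
} else {
    // Pattern not found; the shop may have changed its page layout.
    $price = null;
}
```

Checking the return value of preg_match before reading `$foo[1]` avoids an undefined-index notice when a shop changes its markup.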

  #8  
Old February 7th, 2013, 03:26 PM
dandy dandy is offline
Contributing User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Sep 2006
Posts: 32 dandy User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 8 h 58 m 35 sec
Reputation Power: 7
Cheers guys, I've had a bit of free time, tried that code, and it worked.

Will crack on and try the other divs now hehe. Can I just ask, as I am keen to learn more about PHP:

'/Price:\s*\D([^<]+)/' -- what are all these characters for??

Thanks

  #9  
Old February 7th, 2013, 03:42 PM
ManiacDan's Avatar
ManiacDan ManiacDan is offline
Sarcky
Dev Shed God 10th Plane (9500 - 9999 posts)
 
Join Date: Oct 2006
Location: Pennsylvania, USA
Posts: 9,923 ManiacDan User rank is General 77th Grade (Above 100000 Reputation Level)  Folding Points: 127430 Folding Title: Super Ultimate Folder - Level 1
Time spent in forums: 2 Months 3 Weeks 1 Day 11 h 1 m 28 sec
Reputation Power: 6113
'/Price:\s*\D([^<]+)/'
' ' -- quotes, makes a PHP string
/ / -- the "delimiters" or boundaries of the regular expression
Price: -- literal string 'Price:', from your output
\s -- whitespace
* -- "whatever the previous character was (whitespace, in this case), that character zero or more times"
\D -- "not a digit": exactly one non-digit character, which here swallows the currency symbol
( ) -- a capture group, which is how I got just the price into $foo[1]
[^<] -- NOT a <
+ -- "whatever the previous thing was (not a <), that thing one or more times"
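
The same breakdown can be kept next to the pattern itself by using PCRE's extended mode (the `x` modifier), which ignores whitespace in the pattern and allows # comments. This is only a sketch: the `u` modifier is an added assumption that the page is UTF-8, so that \D matches the whole £ symbol rather than one byte of it, and the sample string stands in for the fetched page.

```php
<?php
// Sample input taken from the first post.
$theContents = "<span class='ShowProductMainPrices'>Price: £82.50</span>";

// x = extended mode (whitespace and # comments are ignored in the pattern)
// u = treat pattern and subject as UTF-8 (assumption about the page encoding)
$pattern = '/
    Price:      # the literal text "Price:"
    \s*         # zero or more whitespace characters
    \D          # exactly one non-digit character (the currency symbol)
    ([^<]+)     # group 1: one or more characters that are not "<"
/xu';

$price = preg_match($pattern, $theContents, $foo) ? $foo[1] : null;
```

Note that a comment inside the pattern must not contain the delimiter character, since an unescaped / would end the pattern early.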
Comments on this post
dandy agrees: Thanks

  #10  
Old February 7th, 2013, 04:01 PM
dandy dandy is offline
Contributing User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Sep 2006
Posts: 32 dandy User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 8 h 58 m 35 sec
Reputation Power: 7
Thanks for your help. Really made some good progress tonight, much more than I thought, to be honest. Think I'll call it a night now though and do a bit more tomorrow. (I'm in no rush for this lol)

Next preg match lol

Code:
				<div class="ProductPageNav">
			<a href='Categories.asp'>Our Products</a>: <a href=COMPONENTS.htm' onmouseover="javascript:document.getCatPre.idcategory.value='40'; CatPrecallxml='1'; return runPreCatXML('cat_40');" onmouseout="javascript: CatPrecallxml=''; hidetip();">COMPONENTS</a> > <a href=c42.htm' onmouseover="javascript:document.getCatPre.idcategory.value='42'; CatPrecallxml='1'; return runPreCatXML('cat_42');" onmouseout="javascript: CatPrecallxml=''; hidetip();">Small Parts</a>
		</div>


The above code is from the website I'm scraping. Now I'd like to extract the link text and insert it in my database as keywords. From what I can see there can be more or fewer links within this div, so I'm guessing some kind of preg_match_all or something? How do you disregard all that other rubbish in the link?
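
One possible approach, sketched here under assumptions rather than taken from any reply in this thread: isolate the ProductPageNav div first, then let preg_match_all collect the text of every link inside it. The sample below is a shortened copy of the markup above (event handlers trimmed), and the sketch assumes no attribute value contains a > character and that the div holds no nested divs.

```php
<?php
// Shortened copy of the breadcrumb markup quoted above.
$html = <<<'HTML'
<div class="ProductPageNav">
    <a href='Categories.asp'>Our Products</a>: <a href=COMPONENTS.htm' onmouseover="return runPreCatXML('cat_40');">COMPONENTS</a> > <a href=c42.htm' onmouseout="hidetip();">Small Parts</a>
</div>
HTML;

$keywords = [];
// Step 1: isolate the breadcrumb div so links elsewhere on the page are ignored.
// The s modifier lets . match newlines; .*? stops at the first closing div.
if (preg_match('/<div class="ProductPageNav">(.*?)<\/div>/s', $html, $div)) {
    // Step 2: [^>]* skips every attribute in the opening tag, however many
    // there are; the capture group keeps only the visible link text.
    preg_match_all('/<a\b[^>]*>([^<]+)<\/a>/i', $div[1], $links);
    $keywords = array_map('trim', $links[1]);
}
```

Because preg_match_all returns every match, the number of links in the div does not matter; `$keywords` simply grows or shrinks with it.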

One other question: probably should have checked this sooner, to be honest, but do you think I will need permission from the shops I'm crawling before I display their prices on my site? I don't think there will be an issue, as it's free advertising for them as such, though maybe not for the most expensive ones lol. What are your thoughts?
