#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2006
    Posts
    32
    Rep Power
    8

    Preg_match read between div


    Hi

    I've not really done any work with PHP for years and am struggling a bit, to be honest (I was never a whizz to begin with).

    Anyway, I've been making a WordPress site about a hobby of mine (it's not an overly popular niche, to be honest). What I'm trying to do is make a custom price comparison tool for my users. Only four shops will be searched (like I say, it's not an overly popular hobby). As it's not going to make a profit for me, and I don't have the funds for someone to build it for me, I'm looking at attempting it myself lol :S

    What I'm thinking is: create a database and input the URLs of the products I want included from the shops I want to search, then create a cron job to recheck these prices on a weekly basis.

    I will use file_get_contents on the URLs in my database, then use preg_match to extract the info I need.

    I am having some difficulty working out the PHP to extract the price from the following HTML (taken from one of the product pages):

    <div class='ShowProductPrices'><span class='ShowProductMainPrices'>Price: 82.50</span></div>( EX VAT @ 20% ) <div>

    What would the preg_match code be to read just the price?

    Any help would be appreciated
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Loyal (3000 - 3499 posts)

    Join Date
    Jul 2003
    Posts
    3,383
    Rep Power
    594
    Wouldn't it be more flexible to use DOM to parse it out rather than use preg_match?

    Comments on this post

    • requinix agrees
    There are 10 kinds of people in the world. Those that understand binary and those that don't.
  4. #3
  5. Sarcky
    Devshed Supreme Being (6500+ posts)

    Join Date
    Oct 2006
    Location
    Pennsylvania, USA
    Posts
    10,853
    Rep Power
    6351
    PHP Code:
    preg_match('/Price:\s*([^<]+)/', $theContents, $foo);
    $price = $foo[1];
    HEY! YOU! Read the New User Guide and Forum Rules

    "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin

    "The greatest tragedy of this changing society is that people who never knew what it was like before will simply assume that this is the way things are supposed to be." -2600 Magazine, Fall 2002

    Think we're being rude? Maybe you asked a bad question or you're a Help Vampire. Trying to argue intelligently? Please read this.
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2006
    Posts
    32
    Rep Power
    8
    Thanks for the replies

    Wouldn't it be more flexible to use DOM to parse it out rather than use preg_match?
    I have no idea, I've never worked with that before.
  8. #5
  9. Sarcky
    Devshed Supreme Being (6500+ posts)

    Join Date
    Oct 2006
    Location
    Pennsylvania, USA
    Posts
    10,853
    Rep Power
    6351
    The DOM (Document Object Model) is a rather opaque library available in PHP that parses well-formed HTML and XML documents into object trees you can traverse. It's used to both build and read those kinds of documents, and should be able to parse whatever page you're talking about into a tree which you can then search similar to the results of SimpleXML loading functions. However, it's overkill for this particular application, since you only want a single string.

    My regex keeps the currency symbol btw, which means the results are not a number.
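
    For comparison, here is a rough sketch of what the DOM route could look like with PHP's bundled DOMDocument and DOMXPath classes. It assumes the price always sits in a span whose class is exactly ShowProductMainPrices, as in the snippet from the first post; the variable names are only illustrative.

```php
<?php
// The product snippet quoted in the first post (assumed to be representative).
$html = "<div class='ShowProductPrices'><span class='ShowProductMainPrices'>Price: 82.50</span></div>";

$doc = new DOMDocument();
// Real pages are rarely valid HTML, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Select the first <span> whose class attribute is exactly ShowProductMainPrices.
$node = $xpath->query("//span[@class='ShowProductMainPrices']")->item(0);

// textContent is the text inside the element with all tags stripped.
$text = $node !== null ? trim($node->textContent) : null;
var_dump($text);
```

    The result is still the whole string inside the span ("Price: ..."), so the number has to be cut out of it afterwards either way.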
  10. #6
  11. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2006
    Posts
    32
    Rep Power
    8
    Thanks for that. Is there a way to ignore the currency symbol, or would I just strip it out with a different function, leaving me with just a number?
  12. #7
  13. Sarcky
    Devshed Supreme Being (6500+ posts)

    Join Date
    Oct 2006
    Location
    Pennsylvania, USA
    Posts
    10,853
    Rep Power
    6351
    PHP Code:
    preg_match('/Price:\s*\D([^<]+)/', $theContents, $foo);
    $price = $foo[1];
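
    If the goal is an actual number for the database, a slightly stricter variant might be worth considering. This is only a sketch: extractPrice is a made-up helper name, and it assumes prices look like 82.50 or 1,082.50, optionally preceded by a currency symbol such as £ (the pasted snippet shows none). The u modifier matters if the page is UTF-8, because without it \D consumes a single byte and a multi-byte symbol would be cut in half.

```php
<?php
// Hypothetical helper: pull a numeric price out of a product page.
function extractPrice(string $html): ?float
{
    // \D* skips any currency symbol; the group keeps only digits, commas and a decimal part.
    // The u modifier treats pattern and subject as UTF-8, so a multi-byte symbol is skipped whole.
    if (preg_match('/Price:\s*\D*(\d[\d,]*(?:\.\d+)?)/u', $html, $m) !== 1) {
        return null; // no match, or the input was not valid UTF-8
    }
    // Drop thousands separators before casting to a float.
    return (float) str_replace(',', '', $m[1]);
}

// Sample input with an assumed pound sign in front of the number.
$sample = "<span class='ShowProductMainPrices'>Price: \u{00A3}82.50</span>";
var_dump(extractPrice($sample));
```

    Returning null when nothing matches lets the cron job skip or log a page whose layout has changed instead of storing a bogus price.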
  14. #8
  15. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2006
    Posts
    32
    Rep Power
    8
    Cheers guys, I've had a bit of free time, tried that code, and it worked.

    Will crack on and try the other divs now hehe. Can I just ask, as I'm keen to learn more about PHP:

    '/Price:\s*\D([^<]+)/' what are all these characters for?

    Thanks
  16. #9
  17. Sarcky
    Devshed Supreme Being (6500+ posts)

    Join Date
    Oct 2006
    Location
    Pennsylvania, USA
    Posts
    10,853
    Rep Power
    6351
    '/Price:\s*\D([^<]+)/'
    ' ' -- quotes, makes a PHP string
    / / -- the "delimiters" or boundaries of the regular expression
    Price: -- literal string 'Price:', from your output
    \s -- whitespace
    * -- "whatever the previous character was (whitespace, in this case), that character zero or more times"
    \D -- "not a digit": any single character other than 0-9 (the currency symbol, in this case)
    ( ) -- a capture group, which is how I got just the price into $foo[1]
    [^<] -- NOT a <
    + -- "whatever the previous thing was (not a <), that thing one or more times"
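
    The same breakdown can be written into the pattern itself using PCRE's x modifier, which makes the engine ignore whitespace and # comments inside the pattern. A small sketch, with the sample string taken from the first post:

```php
<?php
// Same pattern as above, spread over several lines thanks to the x modifier.
$pattern = '/
    Price:     # the literal text "Price:"
    \s*        # zero or more whitespace characters
    \D         # exactly one non-digit character
    ([^<]+)    # capture group 1: one or more characters that are not "<"
/x';

$theContents = "<span class='ShowProductMainPrices'>Price: 82.50</span>";

// preg_match returns 1 when the pattern matched, so only read $foo on success.
if (preg_match($pattern, $theContents, $foo) === 1) {
    $price = $foo[1];
    var_dump($price);
}
```

    Note that the comments inside the pattern must not contain the delimiter (/) or a single quote, since either would end the pattern or the PHP string early.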

    Comments on this post

    • dandy agrees : Thanks
  18. #10
  19. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2006
    Posts
    32
    Rep Power
    8
    Thanks for your help. I really made some good progress tonight, much more than I thought, to be honest. Think I'll call it a night now though and do a bit more tomorrow. (I'm in no rush for this lol)

    Next preg match lol

    Code:
    				<div class="ProductPageNav">
    			<a href='Categories.asp'>Our Products</a>: <a href=COMPONENTS.htm' onmouseover="javascript:document.getCatPre.idcategory.value='40'; CatPrecallxml='1'; return runPreCatXML('cat_40');" onmouseout="javascript: CatPrecallxml=''; hidetip();">COMPONENTS</a> > <a href=c42.htm' onmouseover="javascript:document.getCatPre.idcategory.value='42'; CatPrecallxml='1'; return runPreCatXML('cat_42');" onmouseout="javascript: CatPrecallxml=''; hidetip();">Small Parts</a>
    		</div>
    The above code is from the website I'm scraping. Now I'd like to extract the link text and insert it into my database as keywords. From what I can see, there can be more or fewer links within this div, so I'm guessing some kind of preg_match_all or something? How do you disregard all that other rubbish in the link?
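
    One possible shape for that, as a sketch only: preg_match_all collects every match rather than just the first, and [^>]* can swallow all the attributes inside each opening tag. It assumes no attribute value contains a > character and that the link text contains no nested tags, which holds for the snippet above but may not hold for every page.

```php
<?php
// Breadcrumb markup as pasted above (including the site's own unbalanced quotes).
$html = <<<'HTML'
<div class="ProductPageNav">
<a href='Categories.asp'>Our Products</a>: <a href=COMPONENTS.htm' onmouseover="javascript:document.getCatPre.idcategory.value='40'; CatPrecallxml='1'; return runPreCatXML('cat_40');" onmouseout="javascript: CatPrecallxml=''; hidetip();">COMPONENTS</a> > <a href=c42.htm' onmouseover="javascript:document.getCatPre.idcategory.value='42'; CatPrecallxml='1'; return runPreCatXML('cat_42');" onmouseout="javascript: CatPrecallxml=''; hidetip();">Small Parts</a>
</div>
HTML;

// <a\b[^>]*> : the opening tag, with [^>]* skipping every attribute up to the closing ">"
// ([^<]+)    : capture group 1, the link text, which stops at the next "<"
preg_match_all('/<a\b[^>]*>([^<]+)<\/a>/i', $html, $matches);

// $matches[1] holds the first capture group of every link that was found.
$keywords = array_map('trim', $matches[1]);
print_r($keywords);
```

    Run against a whole page, this would pick up every link on it, so the ProductPageNav div would need to be isolated first (for example with a separate preg_match) before collecting the link text.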

    One other question: I probably should have checked this sooner, to be honest, but do you think I will need permission from the shops I'm crawling before I display their prices on my site? I don't think there will be an issue, as it's free advertising for them, though maybe not for the most expensive ones lol. What are your thoughts?
