#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2008
    Posts
    118
    Rep Power
    7

    Search through huge amount of text chunk for a few strings


    Not sure if I should use strpos, preg, or something else.

    I'm trying to search a webpage for a couple of strings. If it finds both, it should return true; if not, it should keep searching until the end of the page.

    Here's part of the webpage's text:
    Code:
    *Blah blah blah..useless text before this*
    <div class="offer box module" id="p44201123" data-postid="44201123">
                        <div class="avatar st_offline">
                            <a href="/user/265347"><img src="http://media.steampowered.com/steamcommunity/public/images/avatars/14/14b38259b8888c901e043afcfc4106091efc3e3c_medium.jpg" width="60" height="60"></a>
                        </div>
                        <div class="post_data">
                            <div class="title">
                                <div class="caption"><a href="/user/265347"><strong><span class="nickname regular">RT_PT</span></strong></a> <time datetime="2013-09-03T22:19:05UTC">(2 days ago)</time></div>
    *blah blah blah..only search below if the above text is not found*
    Within this little chunk I need it to first find the exact string <div class="offer box module"

    Then continue searching until it finds
    days ago)</time>

    -----------------------
    http://www.tf2outpost.com/trade/14090146
    I'm basically trying to check whether a recent offer exists that is not hidden.
    This trade is just an example, and I can easily change the trade post to test to see if it works.

    So any clue on how I should go about this?
  2. #2
  3. Wiser? Not exactly.
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    May 2001
    Location
    Bonita Springs, FL
    Posts
    5,959
    Rep Power
    4035
    If you're looking for a specific sequence of characters, then use strpos() to locate them. If you need to match some kind of pattern then you'd use preg_match. Based on your description you seem to just want to search for a specific sequence so use strpos.
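
    A minimal sketch of the strpos() approach, assuming the two strings must appear in order. The helper name containsInOrder() is hypothetical, and $html is a shortened version of the markup from the first post, not the full page.

```php
<?php
/**
 * Hypothetical helper: returns true if $first occurs in $haystack
 * and $second occurs somewhere after the end of that first match.
 */
function containsInOrder(string $haystack, string $first, string $second): bool
{
    $start = strpos($haystack, $first);
    if ($start === false) {
        return false; // first string never appears
    }

    // Resume the search just past the first match (third argument is the offset).
    return strpos($haystack, $second, $start + strlen($first)) !== false;
}

// Shortened sample based on the markup quoted in the question.
$html = '<div class="offer box module" id="p44201123" data-postid="44201123">'
      . '<time datetime="2013-09-03T22:19:05UTC">(2 days ago)</time></div>';

var_dump(containsInOrder($html, '<div class="offer box module"', 'days ago)</time>'));
```

    Note the strict comparison with false: strpos() returns 0 when the match is at the very start of the string, and a loose check would treat that as "not found".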
  4. #3
  5. --
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Jul 2012
    Posts
    3,959
    Rep Power
    1014
    Hi,

    using a primitive string search to extract info from HTML is generally a very poor approach. If the markup changes just a little bit (different formatting, additional whitespace, additional classes, whatever), then your whole "solution" falls apart, and you need to fumble with your code again -- until the next change. It also doesn't make a lot of sense, because you do not even want a string. What you want is an HTML element.

    Looking for the "days" keyword to get recent offers also isn't very sensible. What if the text says "1 hour"? Is that not recent? Do you really wanna wait until the offer is at least 2 days old so that your tool recognizes it?

    I mean, if you're just playing around, and if this whole thing isn't really important, then this might be "good enough" as a quick and dirty hack. But if you're serious, you'll need to take a different approach.

    What I would do is parse the HTML and then look for all divs with the class offer but without the class hidden (you can use XPath). And then I'd parse the datetime value from the time element to see if it falls within the given time limit (whatever that is).

    I mean, c'mon, this is nice semantic HTML. They're making it easy to parse the data. Use that!
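
    The parsing approach described above could be sketched roughly like this with PHP's DOMDocument and DOMXPath. Several things here are assumptions, not facts about the site: the function names findRecentOffers() and parseOfferTime() are hypothetical, hidden offers are assumed to carry a class named hidden, the datetime attribute is assumed to always end in a literal "UTC" as in the quoted sample, and the cutoff date is arbitrary.

```php
<?php
/**
 * Hypothetical helper: parses values like "2013-09-03T22:19:05UTC".
 * Assumes the literal "UTC" suffix seen in the sample markup.
 */
function parseOfferTime(string $value): ?DateTimeImmutable
{
    $parsed = DateTimeImmutable::createFromFormat(
        '!Y-m-d\TH:i:s\U\T\C',
        $value,
        new DateTimeZone('UTC')
    );
    return $parsed === false ? null : $parsed;
}

/**
 * Hypothetical helper: returns the data-postid of every non-hidden
 * offer posted at or after $cutoff.
 */
function findRecentOffers(string $html, DateTimeImmutable $cutoff): array
{
    if ($html === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Real pages are rarely valid HTML 4; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Pad the class list with spaces so only whole class names match.
    $classes = 'concat(" ", normalize-space(@class), " ")';
    $query = '//div[contains(' . $classes . ', " offer ")'
           . ' and not(contains(' . $classes . ', " hidden "))]';

    $recent = [];
    foreach ($xpath->query($query) as $offer) {
        // Look for the first <time datetime="..."> inside this offer.
        $time = $xpath->query('.//time[@datetime]', $offer)->item(0);
        if ($time === null) {
            continue;
        }
        $posted = parseOfferTime($time->getAttribute('datetime'));
        if ($posted !== null && $posted >= $cutoff) {
            $recent[] = $offer->getAttribute('data-postid');
        }
    }
    return $recent;
}

// Usage with a shortened sample and an arbitrary cutoff.
$html = '<div class="offer box module" id="p44201123" data-postid="44201123">'
      . '<time datetime="2013-09-03T22:19:05UTC">(2 days ago)</time></div>';
$cutoff = new DateTimeImmutable('2013-09-01 00:00:00', new DateTimeZone('UTC'));
print_r(findRecentOffers($html, $cutoff));
```

    Because the check uses the machine-readable datetime attribute rather than the "(2 days ago)" label, it keeps working when the visible text says "1 hour ago" or the markup gains extra classes or whitespace.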