September 6th, 2013, 12:34 AM
Search through a huge chunk of text for a few strings
I'm not sure whether I should use strpos, preg_match, or something else.
I'm trying to search a webpage for a couple of words. If it finds both words, it should return true; if not, it should continue searching until the end of the page.
Here's part of the webpage's HTML.
Within this little chunk, I need it to first find the string <div class="offer box module"
*Blah blah blah..useless text before this*
<div class="offer box module" id="p44201123" data-postid="44201123">
<div class="avatar st_offline">
<a href="/user/265347"><img src="http://media.steampowered.com/steamcommunity/public/images/avatars/14/14b38259b8888c901e043afcfc4106091efc3e3c_medium.jpg" width="60" height="60"></a>
<div class="caption"><a href="/user/265347"><strong><span class="nickname regular">RT_PT</span></strong></a> <time datetime="2013-09-03T22:19:05UTC">(2 days ago)</time></div>
*blah blah blah..only search below if the above text is not found*
Then continue searching until it finds
I'm basically trying to check whether a recent offer exists that is not hidden.
This trade is just an example, and I can easily change the trade post to test to see if it works.
So any clue on how I should go about this?
September 6th, 2013, 01:05 AM
If you're looking for a specific sequence of characters, then use strpos() to locate them. If you need to match some kind of pattern then you'd use preg_match. Based on your description you seem to just want to search for a specific sequence so use strpos.
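Something along these lines should do it. This is only a minimal sketch: containsAll() is a made-up helper name, and since your post doesn't show the second string, 'days ago' is just a stand-in for it.

```php
<?php
// Hypothetical helper: returns true only if every needle occurs in $haystack.
function containsAll($haystack, array $needles)
{
    foreach ($needles as $needle) {
        // strpos() returns the offset of the match, or false if there is none.
        // Offset 0 is a valid match, so compare with === false.
        if (strpos($haystack, $needle) === false) {
            return false;
        }
    }
    return true;
}

// Shortened copy of the markup from the first post.
$page = '<div class="offer box module" id="p44201123" data-postid="44201123">'
      . '<time datetime="2013-09-03T22:19:05UTC">(2 days ago)</time></div>';

// 'days ago' is an assumed placeholder for the second search string.
var_dump(containsAll($page, array('<div class="offer box module"', 'days ago')));
```

The one thing to watch with strpos is the strict comparison: a match at the very start of the string returns 0, which a loose check would treat as "not found".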
September 6th, 2013, 06:37 AM
Using a primitive string search to extract info from HTML is generally a very poor approach. If the markup changes just a little bit (different formatting, additional whitespace, additional classes, whatever), then your whole "solution" falls apart, and you need to fumble with your code again, until the next change. It also doesn't make a lot of sense, because you don't even want a string. What you want is an HTML element.
Looking for the "days" keyword to get recent offers also isn't very sensible. What if the text says "1 hour"? Is that not recent? Do you really wanna wait until the offer is at least 2 days old so that your tool recognizes it?
I mean, if you're just playing around, and if this whole thing isn't really important, then this might be "good enough" as a quick and dirty hack. But if you're serious, you'll need to take a different approach.
What I would do is parse the HTML and then look for all divs with the class offer but without the class hidden (you can use XPath). And then I'd parse the datetime value from the time element to see if it falls within the given time limit (whatever that is).
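Roughly, that could look like the sketch below. It uses PHP's DOM extension (DOMDocument and DOMXPath). The helper name hasRecentVisibleOffer() is made up, and I'm assuming that hidden offers carry a class literally named hidden and that every datetime attribute follows the format in your sample (2013-09-03T22:19:05UTC). Adjust those to whatever the site actually does.

```php
<?php
// Hypothetical helper: is there a non-hidden offer posted within $maxAgeSeconds of $now?
function hasRecentVisibleOffer($html, $maxAgeSeconds, $now)
{
    if ($html === '') {
        return false; // nothing to parse
    }

    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Pad the class list with spaces so only whole class names match:
    // "offer" must be present, "hidden" (assumed name) must be absent.
    $query = '//div[contains(concat(" ", normalize-space(@class), " "), " offer ")'
           . ' and not(contains(concat(" ", normalize-space(@class), " "), " hidden "))]'
           . '//time[@datetime]';

    foreach ($xpath->query($query) as $time) {
        // Assumed format, taken from the sample markup.
        $posted = DateTime::createFromFormat('Y-m-d\TH:i:sT', $time->getAttribute('datetime'));
        if ($posted === false) {
            continue; // skip timestamps that don't match the assumed format
        }
        $age = $now - $posted->getTimestamp();
        if ($age >= 0 && $age <= $maxAgeSeconds) {
            return true;
        }
    }
    return false;
}

// Shortened copy of the markup from the first post.
$sample = '<div class="offer box module" id="p44201123" data-postid="44201123">'
        . '<div class="caption"><time datetime="2013-09-03T22:19:05UTC">(2 days ago)</time></div>'
        . '</div>';

// Fixed "now" so the example is repeatable; in real use you'd pass time().
$now = gmmktime(0, 34, 0, 9, 6, 2013);
var_dump(hasRecentVisibleOffer($sample, 3 * 86400, $now)); // bool(true)
```

Because the check works on the datetime attribute rather than the "(2 days ago)" label, it doesn't care whether the page says hours, days or anything else, and extra classes or whitespace in the markup won't break it.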
I mean, c'mon, this is nice semantic HTML. They're making it easy to parse the data. Use that!