#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2006
    Posts
    167
    Rep Power
    21

    Scraping the data from website


    Hi guys,

    I really need your help. I am scraping data from a website so that I can display it on my PHP page. My problem is that I can't scrape the right row: the one whose time is 5 hours behind my current time. For example, if my current time is 10pm, the time 5 hours earlier is 5pm, so I want to scrape the data on the same row as the 5pm time.

    Here is the PHP code:

    PHP Code:
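    <?php

    // Download the listings page for station 10179.
    $data = file_get_contents('http://tvlistings.zap2it.com/tvlistings/ZCSGrid.do?stnNum=10179');

    // Capture the link text of every programme title on the page.
    preg_match_all('/<a id="rowTitle\d+" class="zc-ssl-pg-title"[^>]*>([^<]+)<\/a>/im', $data, $matches);

    $titles = $matches[1];

    // Print the 20th title, whatever time slot it belongs to.
    echo $titles[19];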
    I can only scrape the data that is 7 hours behind my current time. I can't figure out how to scrape the data that is 5 hours behind my current time.

    If you know how to scrape the data on the same row as the time 5 hours behind my current time, I would really appreciate it if you could post code that scrapes from that row to the end of the page.

    Any advice would be much appreciated.

    Thanks in advance
    Last edited by stephen100; April 15th, 2013 at 08:28 PM.
  2. #2
  3. --
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Jul 2012
    Posts
    3,957
    Rep Power
    1046
    Hi,

    you don't select any time at all, you just pick whatever happens to be the 20th entry in the list -- and the time of that depends on how long the shows are. You do understand this listing, right?

    You need to go through the list entries (li.zc-ssl-pg), parse the time (span.zc-ssl-pg-time) and then display the title of the entries you're interested in.

    Also, please don't take the "scraping" literally. Instead of fumbling with regexes, take one of the many, many XML/DOM parsers and do it the sane way.
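A minimal sketch of that loop with DOMDocument and DOMXPath. The sample HTML is made up for illustration and only assumes the class names mentioned above; the helper names `hasClass` and `listEntries` are hypothetical, and the real page may be structured differently.

```php
<?php
// Hypothetical sample markup; only the class names come from the thread.
$html = '<ul id="zc-ssl-scrollList">
  <li class="zc-ssl-pg"><span class="zc-ssl-pg-time">9:00 PM</span>
    <a class="zc-ssl-pg-title" href="#">Baseball Tonight</a></li>
  <li class="zc-ssl-pg"><span class="zc-ssl-pg-time">10:00 PM</span>
    <a class="zc-ssl-pg-title" href="#">SportsCenter</a></li>
</ul>';

// Build an XPath test that matches one whole class token in a class attribute.
function hasClass(string $class): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
}

// Return [['time' => ..., 'title' => ...], ...] for every listing entry.
function listEntries(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $entries = [];
    foreach ($xpath->query('//li[' . hasClass('zc-ssl-pg') . ']') as $li) {
        $time  = $xpath->query('.//span[' . hasClass('zc-ssl-pg-time') . ']', $li)->item(0);
        $title = $xpath->query('.//a[' . hasClass('zc-ssl-pg-title') . ']', $li)->item(0);
        if ($time !== null && $title !== null) {
            $entries[] = [
                'time'  => trim($time->textContent),
                'title' => trim($title->textContent),
            ];
        }
    }
    return $entries;
}

foreach (listEntries($html) as $entry) {
    echo $entry['time'], ' ', $entry['title'], PHP_EOL;
}
```

For the live page, the string passed to `listEntries()` would be the result of `file_get_contents()` on the listing URL.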
    The 6 worst sins of security • How to (properly) access a MySQL database with PHP

    Why can't I use certain words like "drop" as part of my Security Question answers?
    There are certain words used by hackers to try to gain access to systems and manipulate data; therefore, the following words are restricted: "select," "delete," "update," "insert," "drop" and "null".
  4. #3
  5. No Profile Picture
    Contributing User
    Yes, I understand where you are coming from, but the trouble is that I am in the UK, which is 5 hours ahead of the US. I need to find out how to pick the row for what is showing right now, which is behind my current time.

    Could you please help me with the timezone? How can I use a DOM parser to extract the full title on the same row as the time that is 5 hours behind my current time?
  6. #4
  7. --
    Well, the current show is the first entry under "today". You don't need the time for that.

    Or do you want to select a specific local time and then get the show for that point of time? Something like "today at 17:25"?

    In any case, start by reading up on the DOM parser and then fetch the list entries.
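If a specific local time does need to be mapped onto the listing's clock, PHP's DateTime classes can do the conversion without manual "minus 5 hours" arithmetic. This is only a sketch: it assumes the listing uses US Eastern time (`America/New_York`) and the reader is in `Europe/London`, neither of which is confirmed in the thread, and `toListingTime` is a hypothetical helper name.

```php
<?php
// Assumption: the listing uses US Eastern time and the reader is in the UK.
$listingZone = new DateTimeZone('America/New_York');
$localZone   = new DateTimeZone('Europe/London');

// Convert a local wall-clock time to the listing's time zone.
function toListingTime(string $localTime, DateTimeZone $from, DateTimeZone $to): DateTimeImmutable
{
    return (new DateTimeImmutable($localTime, $from))->setTimezone($to);
}

// "Now" in the listing's zone, formatted like the time column on the page.
$nowThere = new DateTimeImmutable('now', $listingZone);
echo 'Current listing time: ', $nowThere->format('g:i A'), PHP_EOL;

// A fixed example: 3:00 AM in London on 16 April 2013.
$example = toListingTime('2013-04-16 03:00', $localZone, $listingZone);
echo $example->format('Y-m-d g:i A'), PHP_EOL; // prints "2013-04-15 10:00 PM"
```

Using named zones rather than a fixed offset also keeps the result correct in the weeks when the two countries switch daylight saving time on different dates.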
  8. #5
  9. No Profile Picture
    Contributing User
    No, I want to scrape the title of the programme that is on at the current time in the USA. For example, the US time is 10pm while my current time is 3am.

    Here is an example of the programmes that are showing right now:

    PHP Code:
    10:00 PM Baseball Tonight
        LIVE
    11:00 PM SportsCenter
        LIVE
    Tomorrow
    12:00 AM SportsCenter
        LIVE
    1:00 AM SportsCenter
        LIVE
    2:00 AM SportsCenter
        LIVE
    3:00 AM SportsCenter
    4:00 AM SportsCenter
    I hope you get my point?
  10. #6
  11. --
    I understand what you're trying to do, I just don't get what the issue is and why you keep pointing out the different time zones.

    Again, the current show is listed first under "Today". No need for any time calculations. Just loop through the elements of #zc-ssl-scrollList and search for the first element with the class zc-ssl-sp and the content "Today". That's the label. The next list element is the current show.

    If you're using XPath, you can probably do this in one line without any loops.
  12. #7
  13. Mad Scientist
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Oct 2007
    Location
    North Yorkshire, UK
    Posts
    3,661
    Rep Power
    4123
    the current show is listed first under "Today"
    That's not what I see. I see the full list is rendered and then javascript scrolls me to the current time.

    However, it would appear from the source that the current line has the id row1-1

    Comments on this post

    • Jacques1 agrees
    I said I didn't like ORM!!! <?php $this->model->update($this->request->resources[0])->set($this->request->getData())->getData('count'); ?>

    PDO vs mysql_* functions: Find a Migration Guide Here

    [ Xeneco - T'interweb Development ] - [ Are you a Help Vampire? ] - [ Read The manual! ] - [ W3 methods - GET, POST, etc ] - [ Web Design Hell ]
  14. #8
  15. --
    Yep, you're right. You can also see the JavaScript scrolling to it:

    Code:
    var curView=document.getElementById("row1-1").offsetTop;
    jQuery(this.st).animate({scrollTop:curView+"px", ...});
    That makes it even easier.
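A rough sketch of reading the current show straight from that element. It assumes the element with id `row1-1` is the list item that wraps the time and title, which the thread does not confirm; the sample HTML, the `row0-1` id, and the `currentShow` helper are made up for illustration.

```php
<?php
// Hypothetical sample markup: the assumption is that the element with
// id "row1-1" is the list item for the show that is currently on.
$html = '<ul id="zc-ssl-scrollList">
  <li id="row0-1" class="zc-ssl-pg"><span class="zc-ssl-pg-time">9:00 PM</span>
    <a class="zc-ssl-pg-title" href="#">Earlier Show</a></li>
  <li id="row1-1" class="zc-ssl-pg"><span class="zc-ssl-pg-time">10:00 PM</span>
    <a class="zc-ssl-pg-title" href="#">Baseball Tonight</a></li>
</ul>';

// Return ['time' => ..., 'title' => ...] for the current row, or null if it is missing.
function currentShow(string $html): ?array
{
    $doc = new DOMDocument();
    // Collect parser warnings from imperfect HTML instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $row = $xpath->query('//*[@id="row1-1"]')->item(0);
    if ($row === null) {
        return null;
    }

    // Look for the time and title only inside the current row.
    $time  = $xpath->query('.//span[contains(@class, "zc-ssl-pg-time")]', $row)->item(0);
    $title = $xpath->query('.//a[contains(@class, "zc-ssl-pg-title")]', $row)->item(0);
    if ($time === null || $title === null) {
        return null;
    }

    return ['time' => trim($time->textContent), 'title' => trim($title->textContent)];
}

$show = currentShow($html);
echo $show === null ? 'No current row found' : $show['time'] . ' ' . $show['title'], PHP_EOL;
```

Because the page itself marks the current row, no time zone calculation is involved in this variant.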
  16. #9
  17. No Profile Picture
    Contributing User
    Originally Posted by Jacques1
    I understand what you're trying to do, I just don't get what the issue is and why you keep pointing out the different time zones.

    Again, the current show is listed first under "Today". No need for any time calculations. Just loop through the elements of #zc-ssl-scrollList and search for the first element with the class zc-ssl-sp and the content "Today". That's the label. The next list element is the current show.

    If you're using XPath, you can probably do this in one line without any loops.
    I am glad that you understand. I tried to figure out how to scrape the title for the current US time, e.g. the US time is 10:00pm and "Baseball Tonight" is showing, but I don't know how.

    Could you please post some source code that would help me?
  18. #10
  19. No Profile Picture
    Contributing User
    I think I have found a solution: extracting the title using 'rowTitle1'. There is a problem, though. I can only extract one of these titles; I can't extract both of them at the same time.

    PHP Code:
    <?php

    $data = file_get_contents('http://tvlistings.zap2it.com/tvlistings/ZCSGrid.do?stnNum=10179');

    preg_match_all('/<a id="rowTitle1\" class="zc-ssl-pg-title"[^>]*>([^<]+)<\/a>/im', $data, $matches);

    preg_match_all('/<a id="rowTitle2\" class="zc-ssl-pg-title"[^>]*>([^<]+)<\/a>/im', $data, $matches);

    $titles = $matches[1];
    echo $titles[0];
    ?>
    Can you please help??
  20. #11
  21. Mad Scientist
    That code overwrites the first $matches with the result of the second preg_match_all call.

    Either

    PHP Code:
    $data = file_get_contents('http://tvlistings.zap2it.com/tvlistings/ZCSGrid.do?stnNum=10179');

    preg_match_all('/<a id="rowTitle1\" class="zc-ssl-pg-title"[^>]*>([^<]+)<\/a>/im', $data, $matches);

    $titles = $matches[1];

    echo $titles[0];

    preg_match_all('/<a id="rowTitle2\" class="zc-ssl-pg-title"[^>]*>([^<]+)<\/a>/im', $data, $matches);

    $titles = $matches[1];

    echo $titles[0];

    or

    PHP Code:
    $data = file_get_contents('http://tvlistings.zap2it.com/tvlistings/ZCSGrid.do?stnNum=10179');
    preg_match_all('/<a id="rowTitle1\" class="zc-ssl-pg-title"[^>]*>([^<]+)<\/a>/im', $data, $matches1);
    preg_match_all('/<a id="rowTitle2\" class="zc-ssl-pg-title"[^>]*>([^<]+)<\/a>/im', $data, $matches2);
    $titles1 = $matches1[1];
    $titles2 = $matches2[1];
    echo $titles1[0];
    echo $titles2[0];
    or similar.

    Personally, I'd try a DOM-based approach as already suggested. HTML can be viewed as a structured data language and, as such, that structure can be queried for data. The DOMDocument and SimpleXML classes in PHP can turn a (well-formed) HTML string into an object whose child elements can be identified.
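As a rough DOM-based counterpart to the two regex calls above, the titles can be looked up by element id in one pass. This is a sketch under assumptions: the sample HTML is invented and only reuses the `rowTitle1` and `rowTitle2` ids from the regexes, and `titlesById` is a hypothetical helper name.

```php
<?php
// Hypothetical sample markup using the ids targeted by the regexes above.
$html = '<div>
  <a id="rowTitle1" class="zc-ssl-pg-title" href="#">Baseball Tonight</a>
  <a id="rowTitle2" class="zc-ssl-pg-title" href="#">SportsCenter</a>
</div>';

// Return the link text for each requested element id, keyed by id.
// The ids are trusted constants here; do not build XPath from user input this way.
function titlesById(string $html, array $ids): array
{
    $doc = new DOMDocument();
    // Collect parser warnings from imperfect HTML instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $titles = [];
    foreach ($ids as $id) {
        $node = $xpath->query('//a[@id="' . $id . '"]')->item(0);
        if ($node !== null) {
            $titles[$id] = trim($node->textContent);
        }
    }
    return $titles;
}

print_r(titlesById($html, ['rowTitle1', 'rowTitle2']));
```

Each result is stored under its own key, so the second lookup cannot overwrite the first the way the shared `$matches` variable did.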
    Last edited by Northie; April 18th, 2013 at 02:42 AM.
  22. #12
  23. No Profile Picture
    Contributing User
    Thank you very much for your help. Do you know how I can extract the time, the title, and the "new" and "live" text from this source and output them in my PHP?

    PHP Code:
    <span id="row1Time" class="zc-ssl-pg-time">9:00 AM</span>
    <a id="rowTitle1" class="zc-ssl-pg-title" href='http://tvlistings.zap2it.com/tv/sportscenter/EP00019917'>SportsCenter</a>
    <ul class="zc-icons">
        <li class="zc-ic zc-ic-span"><span class="zc-ic-live">LIVE</span></li></ul>

    And do you know how I can work out how long each programme lasts, e.g. 30 minutes, 1 hour, etc.?
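One possible approach to both questions, offered as a sketch rather than a tested solution for the live site: read the time, title and LIVE flag from each entry with DOMXPath, then estimate a programme's length as the gap to the next entry's start time. The wrapping `<li class="zc-ssl-pg">`, the second sample entry, and the helper names `parseListing` and `minutesBetween` are assumptions for illustration.

```php
<?php
// Sample markup based on the snippet quoted above. The wrapping
// <li class="zc-ssl-pg"> and the second entry are assumptions.
$html = '<ul>
  <li class="zc-ssl-pg">
    <span id="row1Time" class="zc-ssl-pg-time">9:00 AM</span>
    <a id="rowTitle1" class="zc-ssl-pg-title" href="#">SportsCenter</a>
    <ul class="zc-icons"><li class="zc-ic zc-ic-span"><span class="zc-ic-live">LIVE</span></li></ul>
  </li>
  <li class="zc-ssl-pg">
    <span id="row2Time" class="zc-ssl-pg-time">10:30 AM</span>
    <a id="rowTitle2" class="zc-ssl-pg-title" href="#">Baseball Tonight</a>
  </li>
</ul>';

// Return [['time' => ..., 'title' => ..., 'live' => bool], ...] in page order.
function parseListing(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep warnings about imperfect HTML quiet
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $shows = [];
    $rows = $xpath->query('//li[contains(concat(" ", normalize-space(@class), " "), " zc-ssl-pg ")]');
    foreach ($rows as $li) {
        $time  = $xpath->query('.//span[contains(@class, "zc-ssl-pg-time")]', $li)->item(0);
        $title = $xpath->query('.//a[contains(@class, "zc-ssl-pg-title")]', $li)->item(0);
        if ($time === null || $title === null) {
            continue;
        }
        $shows[] = [
            'time'  => trim($time->textContent),
            'title' => trim($title->textContent),
            'live'  => $xpath->query('.//span[contains(@class, "zc-ic-live")]', $li)->length > 0,
        ];
    }
    return $shows;
}

// Minutes between two "g:i A" start times, assuming the second one is
// less than 24 hours after the first (so 11:00 PM to 12:00 AM is 60).
function minutesBetween(string $start, string $next): int
{
    $utc = new DateTimeZone('UTC');
    $a = DateTimeImmutable::createFromFormat('!g:i A', $start, $utc);
    $b = DateTimeImmutable::createFromFormat('!g:i A', $next, $utc);
    if ($a === false || $b === false) {
        throw new InvalidArgumentException('Unrecognised time format');
    }
    $minutes = intdiv($b->getTimestamp() - $a->getTimestamp(), 60);
    return $minutes < 0 ? $minutes + 1440 : $minutes;
}

$shows = parseListing($html);
foreach ($shows as $i => $show) {
    // The last entry has no successor, so its length cannot be derived this way.
    $length = isset($shows[$i + 1])
        ? minutesBetween($show['time'], $shows[$i + 1]['time']) . ' min'
        : 'unknown';
    echo $show['time'], ' ', $show['title'], $show['live'] ? ' [LIVE]' : '', ' (', $length, ')', PHP_EOL;
}
```

Deriving the length from consecutive start times only works if the listing has no gaps between programmes, which is an assumption about this page.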
  24. #13
  25. No Profile Picture
    Contributing User
    Does anyone know how?
  26. #14
  27. Sarcky
    Devshed Supreme Being (6500+ posts)

    Join Date
    Oct 2006
    Location
    Pennsylvania, USA
    Posts
    10,853
    Rep Power
    6351
    Do you not understand regular expressions at all? Northie has given a few examples of proper regular expressions which fetch data from HTML source. Instead of simply asking him to keep doing each bullet point on your to-do list until he does all your work for you, maybe you can read his work, try to understand it, and adapt it to your new needs?

    Comments on this post

    • Matt1776 agrees
    HEY! YOU! Read the New User Guide and Forum Rules

    "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin

    "The greatest tragedy of this changing society is that people who never knew what it was like before will simply assume that this is the way things are supposed to be." -2600 Magazine, Fall 2002

    Think we're being rude? Maybe you asked a bad question or you're a Help Vampire. Trying to argue intelligently? Please read this.
