#1
    Contributing User
    Devshed Newbie (0 - 499 posts)

    HttpURLConnection doesn't quite cut it


    I am using HttpURLConnection to grab search results from web pages.
    I want to use it on this site:
    http://www.twitch.tv/search?query=dan+dinh#stq=dan%20dinh&stp=1
    but the source code returned does not contain the title, description, etc.
    Instead, it has placeholders that look like {{title}} and {{description}}.
    I have searched all of the JavaScript files and all the other files referenced in the source for the values of these variables and cannot find them.
    But if the web browser can translate these variables into their values, there must be a way I can do it in Java.
    Does anyone know of other classes like HttpURLConnection that have the functionality I am looking for?
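A minimal sketch of the fetch the question describes, assuming Java 8 or later and UTF-8 content (the charset the page declares in its `<meta>` tag). The names `PageFetcher`, `fetch` and `readAll` are illustrative only. HttpURLConnection hands back exactly the bytes the server sends; it does not execute JavaScript, so `{{title}}`-style placeholders in the raw HTML come through untouched. Note also that everything after `#` in the URL above is a fragment, which a client never sends to the server.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class PageFetcher {
    // Reads every byte until end-of-stream, so nothing further down the page
    // (such as the <script type='text/html'> templates) is lost.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        // Assumption: the body is UTF-8, as the page's <meta> tag declares
        return buf.toString(StandardCharsets.UTF_8.name());
    }

    // Plain HTTP GET. Only the raw HTML is returned; no JavaScript runs,
    // so client-side placeholders such as {{title}} stay as they are.
    // The "#..." fragment of the address is not sent to the server.
    static String fetch(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestMethod("GET");
        try (InputStream in = conn.getInputStream()) {
            return readAll(in);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PageFetcher "<url>"
        String html = fetch(args[0]);
        System.out.println(html.contains("{{title}}") ? "raw template found" : "no template");
    }
}
```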
#2
    Contributing User
    Devshed Expert (3500 - 3999 posts)

    Quote: "but the source code returned"
    Are you talking about the contents of the html page that the site returns?

    Can you post some of what is read from the site?
    When I read from the site I get what looks like valid html.
    <!DOCTYPE html>
    <html lang='en' xml:lang='en' xmlns:fb='http://www.facebook.com/2008/fbml' xmlns:og='http://opengraphprotocol.org/schema/' xmlns='http://www.w3.org/1999/xhtml'>
    <head>
    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
    <title>Twitch</title>
    ...
    Last edited by NormR; August 23rd, 2012 at 07:17 AM.
#3
    Contributing User
    Devshed Newbie (0 - 499 posts)

    This is how they are posting the search results:
    Code:
    <script id='search-users' type='text/html'>
    <li class='users result user clearfix'>
    <a class='thumb' href='{{profilePath}}'>
    <img class='p60' src='{{profileImage}}'>
    </a>
    <div class='user_meta'>
    <p class='title'>
    <a href='{{profilePath}}'>{{name}}</a>
    </p>
    <p class='desc'>{{bio}}</p>
    <p class='user_stats'>
    <!-- / %span.stat.views_count {{views}} -->
    <span class='stat followers_count'>{{followers}}</span>
    </p>
    </div>
    </li>
    </script>
    <script id='search-broadcasts' type='text/html'>
    <li class='broadcasts result archive video clearfix'>
    <div class='cap_and_profile'>
    <a class='thumb' href='{{path}}'>
    <img class='cap' src='{{thumbnail}}'>
    <a class='profile' href='{{profilePath}}'>
    <img class='p50' src='{{profileImage}}'>
    </a>
    </a>
    </div>
    <div class='video_meta'>
    <p class='title'>
    <img alt='Recorded' class='video_type' src='/images/xarth/g/g18_camera-00000080.png'>
    <a href='{{path}}'>
    {{title}}
    {{^title}}
    Untitled Broadcast
    {{/title}}
    </a>
    </p>
    <p class='video_stats search_topstats'>
    <span class='stat channelname'>
    on
    <a href='{{profilePath}}'>{{user}}</a>
    {{#game}}
    playing
    <a href='{{gamePath}}'>{{game}}</a>
    {{/game}}
    </span>
    </p>
    <p class='desc'>
    <span class='content'>{{description}}</span>
    </p>
    <p class='video_stats'>
    <span class='stat time_ago'>
    Recorded
    <time datetime='{{startTime}}'></time>
    </span>
    <span class='stat length'>{{length}}</span>
    <span class='stat views_count'>{{views}}</span>
    </p>
    </div>
    </li>
    </script>
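Those `<script type='text/html'>` blocks look like client-side templates: the `{{name}}` variables, the `{{#game}}...{{/game}}` sections and the `{{^title}}...{{/title}}` inverted sections match Mustache syntax. Presumably the page's JavaScript obtains the actual results separately and fills these templates in inside the browser. A minimal sketch of that substitution step for plain variables only, assuming Java 9 or later; `TemplateFill` and `render` are hypothetical names, and this is not a full Mustache implementation.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: fills simple {{name}} placeholders from a map.
// Section tags such as {{#game}}, {{^title}} and {{/title}} are NOT handled;
// they do not match the pattern and are left in the output unchanged.
final class TemplateFill {
    private static final Pattern VAR = Pattern.compile("\\{\\{(\\w+)\\}\\}");

    static String render(String template, Map<String, String> data) {
        Matcher m = VAR.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // Missing keys render as empty text, as Mustache does
            String value = data.getOrDefault(m.group(1), "");
            // quoteReplacement keeps '$' or '\' in the value from being read as group refs
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String tpl = "<a href='{{profilePath}}'>{{name}}</a>";
        System.out.println(render(tpl, Map.of("profilePath", "/example", "name", "Example")));
    }
}
```

The point is that the values never appear in the HTML itself; they have to come from wherever the page's script gets its data.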
#4
    Contributing User
    Devshed Expert (3500 - 3999 posts)

    What are "search results"?

    What are the first few lines returned by the server for the URL you posted? I posted what I receive when I send an HTTP GET to that URL. It looks like a full HTML page.
#5
    Contributing User
    Devshed Newbie (0 - 499 posts)

    I'm not talking about the first few lines; I'm talking about the section of the source where they display the search results, on line 324 of the page I linked.
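Line numbers in a page source may shift from one request to the next, so when reading the page from Java it can be more reliable to locate the templates by their content than by a fixed line such as 324. A minimal sketch, assuming Java 8 or later; `TemplateLocator` and `placeholderLines` are hypothetical names, and searching for `{{` is an assumption about how the templates are marked.

```java
import java.util.ArrayList;
import java.util.List;

final class TemplateLocator {
    // Returns the 1-based line numbers of lines containing a "{{" placeholder marker.
    // Handles both "\n" and "\r\n" line endings.
    static List<Integer> placeholderLines(String html) {
        List<Integer> hits = new ArrayList<>();
        String[] lines = html.split("\r?\n", -1);
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].contains("{{")) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String html = "<ul>\n<script id='search-users' type='text/html'>\n"
                + "<a href='{{profilePath}}'>{{name}}</a>\n</script>";
        System.out.println(placeholderLines(html)); // prints [3]
    }
}
```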
#6
    Contributing User
    Devshed Expert (3500 - 3999 posts)

    I'm not sure what the problem is. If you send an HTTP GET to a server, it returns an HTML page.
    Does the code you are using read the complete page that the server sends?
    What more do you want than that?
#7
    rebel with a cause
    Devshed God 1st Plane (5500 - 5999 posts)

    Those tags aren't HTML. My first guess is that they're targets for jQuery or some other AJAX code. You may not be able to get the data you want just by scraping the results page.

    There's a link to a Developer API at the bottom of the page. Why not look there for a way to get the data you want?
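If the Developer API exposes search, a program would request structured data from it instead of scraping the HTML. The thread does not say what the endpoint or its parameters are, so the endpoint below (`https://api.example.com/search`) and the `query` parameter name are placeholders, not the real API. The sketch, assuming Java 8 or later, only shows building the request URL with a properly encoded search term; `ApiQuery` and `buildSearchUrl` are hypothetical names. The request itself could then be sent with HttpURLConnection as in the original question.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class ApiQuery {
    // Hypothetical: the endpoint and the "query" parameter name are placeholders;
    // the real ones would come from the site's Developer API documentation.
    static String buildSearchUrl(String endpoint, String query) throws UnsupportedEncodingException {
        // URLEncoder produces application/x-www-form-urlencoded text:
        // spaces become '+', and characters such as '&' are percent-encoded.
        return endpoint + "?query=" + URLEncoder.encode(query, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println(buildSearchUrl("https://api.example.com/search", "dan dinh"));
    }
}
```

Encoding "dan dinh" this way yields `dan+dinh`, the same form as the `query=` value in the URL from the first post.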
