#1
  1. No Profile Picture
    Registered User

    Grabbing next set of results from webpage


    I have written a program that pulls a result from a webpage and prints it to the screen. However, I would like it to print more than one result, covering the different players. Obviously this would involve some form of loop, probably a for loop, but when I try that, all it does is return the same name every time. I would like it to eliminate each result from the selection after processing it.

    Code:
    import urllib.request

    # Fetch the scorers page and decode the raw bytes into text.
    data = urllib.request.urlopen('http://www.football-league.co.uk/page/DivisionalScorers/0,,10794~20127,00.html')
    e = data.read()
    m = e.decode('utf8')

    # Keep only the contents of the statistics block.
    splitted_page = m.split('<div class="statistics">')
    splitted_page = splitted_page[1].split('</div>')
    # Keep only the first "rowDark" table row.
    splitted_page2 = splitted_page[0].split('<tr class="rowDark">')
    splitted_page2 = splitted_page2[1].split('</tr>')
    # Text of the first centre-aligned cell in that row.
    splitted_page3 = splitted_page2[0].split('<td style="text-align:center;">')
    splitted_page3 = splitted_page3[1].split('</td>')
    # Contents of the first plain <td> cell in that row.
    splitted_page4 = splitted_page2[0].split('<td>')
    splitted_page4 = splitted_page4[1].split('</td>')
    # Text inside the first tag of that cell (the player's name).
    splitted_page5 = splitted_page4[0].split('>')
    splitted_page5 = splitted_page5[1].split('<')
    print(splitted_page5[0])
    print(splitted_page3[0])
    What it does is split the page into progressively smaller parts until all that remains is what I am looking for.
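    For reference, a minimal sketch of how the same split-based approach could be repeated for every row rather than only the first. It assumes every row of interest uses the "rowDark" class and has the same cell layout as the first one, which the thread does not confirm. The function name extract_rows and the SAMPLE markup are made up for illustration; only the player names come from the output shown later in the thread.

```python
def extract_rows(page):
    """Return (name, value) pairs for every 'rowDark' row in the statistics block."""
    # Same narrowing as the original code: contents of the statistics block.
    stats = page.split('<div class="statistics">')[1].split('</div>')[0]
    results = []
    # The first chunk is whatever precedes the first row, so skip it.
    for chunk in stats.split('<tr class="rowDark">')[1:]:
        row = chunk.split('</tr>')[0]
        # Text of the first centre-aligned cell in this row.
        value = row.split('<td style="text-align:center;">')[1].split('</td>')[0]
        # Text inside the first tag of the first plain <td> cell.
        cell = row.split('<td>')[1].split('</td>')[0]
        name = cell.split('>')[1].split('<')[0]
        results.append((name, value))
    return results


# Made-up markup that mimics the structure the original code relies on.
SAMPLE = (
    '<div class="statistics"><table>'
    '<tr class="rowDark"><td style="text-align:center;">A</td>'
    '<td><a href="#">Glenn Murray</a></td></tr>'
    '<tr class="rowDark"><td style="text-align:center;">B</td>'
    '<td><a href="#">Charlie Austin</a></td></tr>'
    '</table></div>'
)

if __name__ == '__main__':
    for name, value in extract_rows(SAMPLE):
        print(name)
        print(value)
```

    With the real page, the decoded text m would be passed in place of SAMPLE. Because each iteration works on its own chunk, no result has to be removed from the selection after it is processed.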

    Edit: I'm using Python 3.
    Last edited by noskiw; March 10th, 2013 at 10:34 AM. Reason: Forgot to mention python 3
  2. #2
  3. Contributing User
    This bash command prints the players to my screen:
    Code:
    $ wget -O /dev/stdout 'http://www.football-league.co.uk/page/DivisionalScorers/0,,10794~20127,00.html' 2>/dev/null | gawk '1==a{print;a=2}3==a{a=0}/www[.]player/{++a}'
    Glenn Murray
    Charlie Austin
    Jordan Rhodes
    Matej Vydra
    Tom Ince
    Chris Wood
    David Nugent
    ...
    Danny Batth
    Stephen Ward
    Matt Doherty
    Richard Stearman
    So get on with it: install Cygwin/X11 or MinGW.
    [code]Code tags[/code] are essential for python code and Makefiles!
  4. #3
  5. No Profile Picture
    Registered User
    How does this work, and how does it help me achieve what I'm after? It all has to be done in Python for my A2 coursework.
  6. #4
  7. Contributing User
    It might not be particularly good for your Python homework.

    The wget command does the same job as your code up through the definition of m: it fetches the page.

    I suppose for your class you'd then use xml.sax or an HTML parser such as html.parser. These are easy to use if you follow the example code in the documentation. (My opinion.)
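    A minimal sketch of the HTML-parser route, using html.parser from the standard library. It collects the text of every table cell, grouped by row. The class name RowCollector, the helper parse_rows, and the SAMPLE markup are made up for illustration; the real page's layout is assumed to resemble what the split-based code in the first post expects.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the current row, or None outside <tr>
        self._cell = None  # text pieces of the current cell, or None outside <td>

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (such as a link) is still inside the cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_rows(html):
    """Return a list of rows, each a list of cell texts."""
    parser = RowCollector()
    parser.feed(html)
    return parser.rows


# Made-up markup for illustration only.
SAMPLE = (
    '<table>'
    '<tr class="rowDark"><td style="text-align:center;">A</td>'
    '<td><a href="#">Glenn Murray</a></td></tr>'
    '<tr class="rowDark"><td style="text-align:center;">B</td>'
    '<td><a href="#">Charlie Austin</a></td></tr>'
    '</table>'
)

if __name__ == '__main__':
    for row in parse_rows(SAMPLE):
        print(row)
```

    Unlike repeated splitting, this does not depend on the exact attribute text of each tag, so it should be less fragile if the page's styling changes.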

    Anyway, my solution was to look at the page source and observe that every other occurrence of www.player is on the line immediately before a line you want to keep.
    The gawk program implements a finite state machine with 4 states (0 to 3). The state advances by one each time gawk finds www.player. If the state is 1, it prints the line and sets the state to 2. If the state is 3, the state is reset to 0. Works.
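    Since the coursework has to be in Python, here is a rough line-by-line port of that gawk state machine. The function name lines_after_marker and the sample lines are made up for illustration; the three checks run in the same order as the three gawk rules.

```python
import re

# Same pattern as the gawk program: a literal "www.player".
MARKER = re.compile(r'www[.]player')


def lines_after_marker(lines):
    """Keep the line that follows every other line containing the marker."""
    state = 0
    kept = []
    for line in lines:
        # Rule 1: in state 1, keep this line and move to state 2.
        if state == 1:
            kept.append(line)
            state = 2
        # Rule 2: in state 3, start over.
        if state == 3:
            state = 0
        # Rule 3: every line containing the marker advances the state.
        if MARKER.search(line):
            state += 1
    return kept


# Made-up lines that mimic the pattern described above.
SAMPLE_LINES = [
    'first link to www.player page',
    'Glenn Murray',
    'second link to www.player page',
    'something else',
    'first link to www.player page',
    'Charlie Austin',
    'second link to www.player page',
    'something else',
]

if __name__ == '__main__':
    for line in lines_after_marker(SAMPLE_LINES):
        print(line)
```

    With the real page, the decoded text from the first post could be fed in as lines_after_marker(m.splitlines()).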
