Python Programming
 
#1
March 10th, 2013, 10:26 AM
noskiw
Grabbing next set of results from webpage

I have written a program that pulls a result from a webpage and prints it to the screen. However, I would like it to print more than one result, covering the different players. Obviously this would involve some form of loop, probably a for loop, but when I try one, all it does is return the same name each time. I would like it to eliminate each result from the selection after processing it.

Code:
import urllib.request

# Fetch the page and decode the raw bytes to text.
data = urllib.request.urlopen('http://www.football-league.co.uk/page/DivisionalScorers/0,,10794~20127,00.html')
e = data.read()
m = e.decode('utf8')

# Narrow the page down to the statistics block.
splitted_page = m.split('<div class="statistics">')
splitted_page = splitted_page[1].split('</div>')
# Index [1] keeps only the first "rowDark" row.
splitted_page2 = splitted_page[0].split('<tr class="rowDark">')
splitted_page2 = splitted_page2[1].split('</tr>')
# Text of the first centred cell in that row.
splitted_page3 = splitted_page2[0].split('<td style="text-align:center;">')
splitted_page3 = splitted_page3[1].split('</td>')
# First plain cell, then the text between the tags inside it (the name).
splitted_page4 = splitted_page2[0].split('<td>')
splitted_page4 = splitted_page4[1].split('</td>')
splitted_page5 = splitted_page4[0].split('>')
splitted_page5 = splitted_page5[1].split('<')
print(splitted_page5[0])
print(splitted_page3[0])


What it is doing is splitting the page up into several parts until all that remains is what I am looking for.
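A minimal sketch of how the same splitting could visit every row instead of only the first one. The point is that split() already returns all the chunks, and indexing with [1] throws away everything after the first; looping over the list from index 1 onward visits each row once, so nothing has to be removed by hand. The SAMPLE string, the scorers function and the player names and numbers are made up for illustration, and the real page's markup may differ.

```python
# Hypothetical stand-in for the decoded page text `m`; the names and
# numbers are placeholders, not data from the real page.
SAMPLE = (
    '<div class="statistics"><table>'
    '<tr class="rowDark"><td><a href="#">Player A</a></td>'
    '<td style="text-align:center;">11</td></tr>'
    '<tr class="rowDark"><td><a href="#">Player B</a></td>'
    '<td style="text-align:center;">7</td></tr>'
    '</table></div>'
)


def scorers(page):
    """Return (name, value) pairs for every rowDark row in the statistics block."""
    block = page.split('<div class="statistics">')[1].split('</div>')[0]
    results = []
    # Chunk 0 is the text before the first row, so start from chunk 1.
    for chunk in block.split('<tr class="rowDark">')[1:]:
        row = chunk.split('</tr>')[0]
        value = row.split('<td style="text-align:center;">')[1].split('</td>')[0]
        cell = row.split('<td>')[1].split('</td>')[0]
        name = cell.split('>')[1].split('<')[0]
        results.append((name, value))
    return results


for name, value in scorers(SAMPLE):
    print(name, value)
```

With the real page, scorers(m) would take the place of scorers(SAMPLE), assuming every row of interest uses the rowDark class.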

Edit: I'm using python 3

Last edited by noskiw : March 10th, 2013 at 10:34 AM. Reason: Forgot to mention python 3

#2
March 10th, 2013, 02:30 PM
b49P23TIvg
This bash command lists the players to my screen:
Code:
$ wget -O /dev/stdout 'http://www.football-league.co.uk/page/DivisionalScorers/0,,10794~20127,00.html' 2>/dev/null | gawk '1==a{print;a=2}3==a{a=0}/www[.]player/{++a}'
Glenn Murray
Charlie Austin
Jordan Rhodes
Matej Vydra
Tom Ince
Chris Wood
David Nugent
...
Danny Batth
Stephen Ward
Matt Doherty
Richard Stearman
So get on with it: install cygwin/x11 or mingw.
#3
March 10th, 2013, 02:39 PM
noskiw
How does this work, and how does it help me achieve what I'm after? It all has to be done in Python for my A2 coursework.

#4
March 10th, 2013, 03:06 PM
b49P23TIvg
It might not be particularly good for your python homework.

The wget command does the same job as your code up through the definition of m: it fetches the page.

I suppose for your class you'd then use the xml.sax module or an HTML parser such as html.parser. These are easy to use following the example code in the documentation. (My opinion.)
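As a rough illustration of the parser route, here is a sketch using html.parser from the standard library. It assumes, based only on the code in the first post, that each row of interest is a <tr class="rowDark"> whose <td> cells hold the values; SAMPLE_HTML, RowDarkParser and parse_rows are hypothetical names, and the sample data is made up.

```python
from html.parser import HTMLParser

# Hypothetical sample markup; the real page may be laid out differently.
SAMPLE_HTML = (
    '<table>'
    '<tr class="rowDark"><td><a href="#">Player A</a></td>'
    '<td style="text-align:center;">11</td></tr>'
    '<tr class="rowDark"><td><a href="#">Player B</a></td>'
    '<td style="text-align:center;">7</td></tr>'
    '</table>'
)


class RowDarkParser(HTMLParser):
    """Collect the text of every <td> found inside <tr class="rowDark"> rows."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell texts per matching row
        self._in_row = False
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr" and dict(attrs).get("class") == "rowDark":
            self._in_row = True
            self.rows.append([])
        elif tag == "td" and self._in_row:
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "tr":
            self._in_row = False
        elif tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Text may arrive in pieces, so append to the current cell.
        if self._in_row and self._in_cell:
            self.rows[-1][-1] += data


def parse_rows(html):
    parser = RowDarkParser()
    parser.feed(html)
    parser.close()
    return parser.rows


for row in parse_rows(SAMPLE_HTML):
    print(row)
```

Unlike chained split() calls, this does not depend on the exact spelling of each opening tag, only on the tag names and the class attribute.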

Anyway, my solution was to look at the page source and observe that every other occurrence of www.player is on the line immediately preceding a line you want to retain.
The gawk program implements a finite state machine with 4 states (0 to 3). For each input line it first checks the state: if the state is 1 it prints the line and sets the state to 2; if the state is 3 it resets the state to 0. Then, if the line contains www.player, the state advances by one. Works.
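For the coursework, the same state machine can be written in Python. This is a sketch that mirrors the three gawk rules in order; the function name lines_after_marker and the sample lines are made up for illustration.

```python
import re

# Same pattern as the gawk rule /www[.]player/.
MARKER = re.compile(r"www[.]player")


def lines_after_marker(lines):
    """Keep the line that follows every other line containing www.player."""
    state = 0
    kept = []
    for line in lines:
        if state == 1:          # gawk: 1==a{print;a=2}
            kept.append(line)
            state = 2
        if state == 3:          # gawk: 3==a{a=0}
            state = 0
        if MARKER.search(line):  # gawk: /www[.]player/{++a}
            state += 1
    return kept


# Hypothetical lines standing in for the page source.
sample_lines = [
    '<a href="http://www.player.example/1">',
    'Player A',
    '<img src="http://www.player.example/1.png">',
    'something else',
    '<a href="http://www.player.example/2">',
    'Player B',
]
print(lines_after_marker(sample_lines))
```

With the page text from the first post, the input would be m.splitlines().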
