November 12th, 2005, 10:50 PM
Python & The Web: Scenario
Basically my script's goal is to:
1. access a local.yahoo.com results page that I have downloaded onto my hard drive.
2. visit a particular link on that results page.
3. parse info from that particular link.
4. go back to results page and visit the "Next" link.
5. Repeat steps 2-4.
At the end, one file will have all the parsed info from all the visited webpages.
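The link-finding part of the steps above could be sketched roughly as follows. This is only a minimal sketch, assuming Python 3's standard library (html.parser and urllib.parse). The names LinkCollector and find_link, the sample HTML, and the idea that the link text is exactly "Next" are all illustrative assumptions, not something the real results page is known to guarantee.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (absolute_url, link_text) pairs from an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_link(html, base_url, text):
    """Return the URL of the first link whose text equals `text`, else None."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    for url, label in collector.links:
        if label == text:
            return url
    return None


# Hypothetical results page, used only for illustration.
sample = '<a href="/biz/1">Joe\'s Pizza</a> <a href="/results?page=2">Next</a>'
next_url = find_link(sample, "http://local.yahoo.com/", "Next")
```

A full script would wrap this in a loop: fetch a page, parse the link of interest, append the parsed info to the output file, then call find_link for "Next" and stop once it returns None.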
I am new to Python's internet capabilities, and after reading all the documentation on them, I am a little confused. What I want is to fetch and read webpages via the "Next" link without having a browser pop up as a result. I looked at webbrowser, urllib, SGMLParser, and HTMLParser in the Python documentation, but it didn't clearly describe what I was looking for.
Is there a way to access and process HTML pages without opening any browser windows in the process?
Any suggestions would be greatly appreciated.
November 13th, 2005, 03:11 PM
Yes, urllib2 will be the best bet for doing this, I would say. It lets you handle a connection to a web site as if it were a file. There is also an HTML parser module, but I don't know if that would be good for you. Your best bet will be to google some information about urllib2 for Python and get the basics; then you can play with the HTML parser in Python to see if you like it or if it would help.
For an example of urllib2:
This will print all the lines of the source of that page.
import urllib2
website = urllib2.urlopen("http://local.yahoo.com")
for line in website:
    print line,  # trailing comma: each line already ends with a newline
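For anyone on Python 3, where urllib2 became urllib.request, the same file-like idea could look roughly like this. It is a minimal sketch: the helper name read_lines and the temporary results.html file are assumptions made up for illustration. A file:// URL is used so the example runs without a network connection, which also shows that a page already saved to the hard drive can be read in the same way as a live one.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen


def read_lines(url):
    """Return the decoded lines of whatever `url` points to."""
    with urlopen(url) as page:  # no browser window is opened
        # The response iterates line by line, like a file opened in binary mode.
        return [raw.decode("utf-8", errors="replace") for raw in page]


# Stand-in for a results page saved to disk (hypothetical content).
saved = Path(tempfile.mkdtemp()) / "results.html"
saved.write_bytes(b"<html>\n<body>saved results</body>\n</html>\n")

lines = read_lines(saved.as_uri())  # as_uri() builds a file:// URL
for line in lines:
    print(line, end="")
```

Passing an http:// URL to read_lines instead should behave the same way, although a real page may use an encoding other than UTF-8.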
November 13th, 2005, 04:41 PM
I don't fully understand what you mean by having downloaded the page onto your computer; it would be much easier to access it directly from your program with the urllib module.
November 14th, 2005, 12:51 PM
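To parse HTML I use a function built around this:
import formatter, htmllib, StringIO
txt = StringIO.StringIO()  # buffer that collects the plain text
w = formatter.DumbWriter(txt)
f = formatter.AbstractFormatter(w)
p = htmllib.HTMLParser(f)
p.feed(html); p.close()  # html is assumed to hold the page source; txt.getvalue() is the text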
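The htmllib module no longer exists in Python 3, and formatter has since been removed as well, so a similar HTML-to-text step there might be sketched with html.parser instead. This is a rough approximation under that assumption, not a drop-in replacement: the names TextExtractor and html_to_text and the sample page are made up for illustration, and the output layout differs from what DumbWriter produces.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of a page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep non-blank text that is not inside a skipped element.
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the text chunks of `html`, one per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


# Hypothetical page source, used only for illustration.
sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Results</h1><p>Joe's Pizza</p></body></html>"
)
text = html_to_text(sample)
```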