#1
webcrawler in python
Hey everybody, I'm new here and pretty new to Python... I need to write a small script that can fetch web pages and submit information to those web pages. Could anybody point me in the right direction, please?
Thanks, your help is greatly appreciated, Corey
edit: I would also like information on client sessions (how to use cookies client side).
#2
#3
I'd also check out urllib's urlopen, which provides an easy way of getting a web page (source).
I've got a small webcrawler that I never finished for you to have a look at, but I'll post that later. Have fun, Mark.
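For anyone reading this thread later, here is a minimal sketch of the three things asked about in the first post: fetching a page, submitting form data, and keeping cookies between requests. It assumes Python 3, where urlopen lives in urllib.request and cookie handling in http.cookiejar (the thread itself predates those names). The helper names make_opener, encode_form and fetch are made up for illustration, as are any URLs or field names you pass in.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_opener():
    """Build an opener that remembers cookies between requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def encode_form(fields):
    """Encode a dict of form fields as a URL-encoded POST body (bytes)."""
    return urllib.parse.urlencode(fields).encode("ascii")


def fetch(opener, url, form_fields=None, timeout=10):
    """GET the page, or POST form_fields to it when they are given."""
    data = encode_form(form_fields) if form_fields is not None else None
    with opener.open(url, data=data, timeout=timeout) as response:
        return response.read()
```

Usage would look something like `opener, jar = make_opener()` followed by `fetch(opener, login_url, {"user": "corey"})` and then further `fetch(opener, ...)` calls. Any cookies the server sets on the first request are stored in `jar` and sent back automatically on later requests made through the same opener.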
#4
Hi,
Here's the script I wrote. It goes through a list of stored websites and checks whether they have changed since they were last checked. If one has, it stores the new MD5 value for that site to be checked against later. The websites can be viewed in the template. Not exactly what you want, but the basics are there. Hope this helps, Take care, Mark.
Last edited by netytan : August 8th, 2003 at 11:39 AM.
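The attached script is not reproduced in this thread, so the following is only a rough sketch of the idea described above, not Mark's code: keep one MD5 checksum per site and report the sites whose checksum differs from the stored one. It assumes Python 3 and a JSON file as the store; the function names and the file format are illustrative choices. Fetching is left out so the comparison logic can be tried without a network connection.

```python
import hashlib
import json
import os


def checksum(page_bytes):
    """Return the MD5 hex digest of a page's raw bytes."""
    return hashlib.md5(page_bytes).hexdigest()


def load_checksums(path):
    """Load the stored {url: checksum} mapping, or {} if no file exists yet."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def find_changed(pages, stored):
    """Compare fetched pages against the stored checksums.

    pages maps url -> page bytes. Returns (changed_urls, updated_mapping).
    A URL that is seen for the first time counts as changed.
    """
    updated = dict(stored)
    changed = []
    for url, body in pages.items():
        digest = checksum(body)
        if stored.get(url) != digest:
            changed.append(url)
            updated[url] = digest
    return changed, updated


def save_checksums(path, mapping):
    """Write the {url: checksum} mapping back to disk as JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(mapping, handle, indent=2)
```

A run would load the mapping, fetch each site into a `pages` dict, call `find_changed`, and save the updated mapping. Storing hex digests rather than raw digests keeps the file plain text.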
#5
Also, have a look at HarvestMan and spider.
#6
Thanks!
I'll take a look at your code, netytan, and use python.org as a resource.
#7
Hi Alpha,
Please disregard the code I posted; it seems I posted the wrong version, which doesn't actually work. It is just the template side of it. Sorry if I wasted any of your time. I will try to find the right code for you a little later tonight. In the meantime you can take a look at this code.
Code:
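#!/usr/bin/env python
# Python 2 script: report whether a page has changed since the last run.
import urllib, md5

# Fetch the page source and compute its MD5 checksum as a hex string,
# which is safe to store in a text file (digest() returns raw bytes).
page = urllib.urlopen('http://australianit.news.com.au').read()
checksum = md5.new(page).hexdigest()

# md5.txt must already exist; it holds the checksum from the previous run.
if open('md5.txt', 'r').read().strip() != checksum:
    print 'Page has been changed\n'
    open('md5.txt', 'w').write(checksum)
else:
    print 'Page has not been changed\n'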
You will need to create md5.txt before running, but it should get the source code from any web page sent to it. This is then converted to an MD5 checksum and stored for comparison. If the page has changed, the new checksum is stored and the 'Page has been changed' line is printed. Hope this is of more help, Mark.
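The script above targets the Python of the time. As a hedged sketch only, the same check might look like this on Python 3, assuming hashlib and urllib.request in place of the old md5 and urllib modules. The function names page_checksum and check_page are illustrative. One deliberate difference: a missing md5.txt is treated as "no previous checksum" instead of having to be created by hand.

```python
import hashlib
import os
import urllib.request


def page_checksum(url, timeout=10):
    """Fetch a page and return the MD5 hex digest of its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return hashlib.md5(response.read()).hexdigest()


def check_page(url, store_path="md5.txt"):
    """Return True if the page differs from the stored checksum.

    The stored checksum is updated whenever a change is detected.
    """
    new = page_checksum(url)
    old = ""
    if os.path.exists(store_path):
        with open(store_path, "r") as handle:
            old = handle.read().strip()
    if old == new:
        return False
    with open(store_path, "w") as handle:
        handle.write(new)
    return True
```

Calling `check_page(url)` on the first run returns True and writes md5.txt; later calls return True only when the page content has changed. Note that pages with dynamic content (timestamps, ads) will report a change on nearly every run.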