Dev Shed Forums > Programming Languages > Python Programming
webcrawler in python
July 30th, 2003, 10:28 AM
Hey everybody, I'm new here and pretty new to Python... I need to write a small script that can fetch web pages and submit information to those web pages. Could anybody point me in the right direction, please?
Thanks, your help is greatly appreciated,
Corey
Edit: I would also like information on client sessions (how to use cookies on the client side).

July 30th, 2003, 12:06 PM

July 30th, 2003, 03:32 PM
I'd also check out urllib's urlopen, which provides an easy way of getting a web page's source.
I've got a small webcrawler that I never finished for you to have a look at, but I'll post that later.
Have fun,
Mark.
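
For anyone reading this with a current Python 3, here is a minimal sketch of the three things asked about above: fetching a page, submitting form data, and keeping cookies on the client side. It assumes the Python 3 standard library, where urlopen lives in urllib.request and cookies are handled by http.cookiejar. The helper names make_session, fetch and form_request are made up for illustration, not part of any library.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def make_session():
    """Return (opener, jar); the opener stores cookies in the jar and resends them."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch(opener, url):
    """GET a page through the opener and return its raw bytes."""
    with opener.open(url) as response:
        return response.read()


def form_request(url, fields):
    """Build a request carrying URL-encoded form fields (sent as a POST)."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(url, data=data)
```

To submit a form you would pass the request to the same opener, for example opener.open(form_request(url, {"name": "Corey"})) with the URL of the form's target. Any cookies the server sets end up in the jar and are sent back on later requests made with that opener.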

August 1st, 2003, 10:40 AM
Hi,
Here's the script I wrote. It goes through a list of stored websites and checks whether they have changed since they were last checked. If one has, it stores the new MD5 value for that site to be checked against later. The websites can be viewed in the template. Not exactly what you want, but the basics are there.
Hope this helps,
Take care,
Mark.
Last edited by netytan : August 8th, 2003 at 11:39 AM.
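
The attached script is not reproduced in the thread, so the following is only a rough sketch of the idea described above, not Mark's code. It assumes Python 3, keeps the stored MD5 values in a plain dict keyed by URL, and treats a site with no stored value as changed. The names check_sites and fetch_page are hypothetical.

```python
import hashlib
import urllib.request


def fetch(url):
    """Download the raw bytes of a page."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def check_sites(urls, stored, fetch_page=fetch):
    """Return the URLs whose MD5 differs from the one in `stored`.

    `stored` maps URL -> hex digest and is updated in place, so the new
    value is the one compared against on the next run. A URL that has no
    stored value yet is reported as changed (an assumption of this sketch).
    """
    changed = []
    for url in urls:
        checksum = hashlib.md5(fetch_page(url)).hexdigest()
        if stored.get(url) != checksum:
            changed.append(url)
            stored[url] = checksum
    return changed
```

Passing the download function in as fetch_page keeps the comparison logic separate from the network, which makes it easy to try out with canned page contents.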

August 3rd, 2003, 08:38 AM

August 8th, 2003, 10:09 AM
Thanks!
I'll take a look at your code, netytan, and use python.org as a resource.

August 8th, 2003, 11:54 AM
Hi Alpha,
Please disregard the code I posted. It seems I posted the wrong version, which doesn't actually work; it is just the template side of it. Sorry if I wasted any of your time.
I will try to find the right code for you a little later tonight. In the meantime, you can take a look at this code:
Code:
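#!/usr/bin/env python
# Python 2 script: report whether a web page has changed since the last run.
import urllib, md5

# Download the page source.
page = urllib.urlopen('http://australianit.news.com.au').read()
# hexdigest() returns printable hex; the raw bytes from digest() may not
# survive a text-mode file and strip().
checksum = md5.new(page).hexdigest()

# Compare with the checksum saved by the previous run.
if open('md5.txt', 'r').read().strip() != checksum:
    print 'Page has been changed\n'
    open('md5.txt', 'w').write(checksum)
else:
    print 'Page has not been changed\n'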
You will need to create md5.txt before running it, but it should get the source of any web page you give it. The source is then converted to an MD5 checksum and compared with the stored one. If the page has changed, the new checksum is stored and the 'Page has been changed' line is printed.
Hope this is of more help,
Mark.
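
The script above is written for the Python of the time. As a hedged sketch of the same idea in Python 3, hashlib replaces the old md5 module and urllib.request.urlopen replaces urllib.urlopen. The function names page_checksum and has_changed are made up for this example, and treating a missing md5.txt as "changed" (so the file need not be created by hand) is an assumption of the sketch, not something the original script does.

```python
import hashlib
import urllib.request
from pathlib import Path


def page_checksum(url):
    """Fetch `url` and return the MD5 hex digest of its bytes."""
    with urllib.request.urlopen(url) as response:
        return hashlib.md5(response.read()).hexdigest()


def has_changed(checksum, store=Path("md5.txt")):
    """Compare `checksum` with the stored one and save it if it differs.

    A missing store file counts as changed (assumption of this sketch).
    """
    previous = store.read_text().strip() if store.exists() else ""
    if previous == checksum:
        return False
    store.write_text(checksum)
    return True
```

Calling has_changed(page_checksum(url)) with the page's URL then plays the role of the if/else in the original: True corresponds to 'Page has been changed', False to 'Page has not been changed'.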