Python Programming
 
Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
User Name:
Password:
Remember me
Go Back   Dev Shed ForumsProgramming LanguagesPython Programming

Reply
Add This Thread To:
  Del.icio.us   Digg   Google   Spurl   Blink   Furl   Simpy   Y! MyWeb 
Thread Tools Search this Thread Rate Thread Display Modes
 
Unread Dev Shed Forums Sponsor:
Stay one step ahead of the competition. Evaluate and give feedback on some of the hottest web development tools on the market today. Make your opinion heard! Click Here
  #1  
Old July 30th, 2003, 10:28 AM
A|pha_N3rd A|pha_N3rd is offline
Junior Member
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Jul 2003
Posts: 10 A|pha_N3rd User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
webcrawler in python

Hey everybody, I'm new here and pretty new to python... I need to write a small script that can fetch web pages and submit information to those webpages. Could anybody point me in the right direction please??

Thanks, your help is greatly appreciationed,

Corey

edit: also I would like information on client session(how to use cookies client side)

Reply With Quote
  #2  
Old July 30th, 2003, 12:06 PM
Scorpions4ever's Avatar
Scorpions4ever Scorpions4ever is offline
Banned ;)
Dev Shed God 5th Plane (7000 - 7499 posts)
 
Join Date: Nov 2001
Location: Glendale, Los Angeles County, California, USA
Posts: 7,432 Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level)Scorpions4ever User rank is Major General (70000 - 90000 Reputation Level) 
Time spent in forums: 4 Weeks 1 Day 22 h 29 m 51 sec
Reputation Power: 784
__________________
Up the Irons
What Would Jimi Do? Smash amps. Burn guitar. Take the groupies home.
"Death Before Dishonour, my Friends!!" - Bruce D ickinson, Iron Maiden Aug 20, 2005 @ OzzFest
Down with Sharon Osbourne

Puzzle of the Month solved by sizeablegrin, etienne141 and L7Sqr, superior C/C++ programmers of the month

Reply With Quote
  #3  
Old July 30th, 2003, 03:32 PM
netytan's Avatar
netytan netytan is offline
Hello World :)
Dev Shed Frequenter (2500 - 2999 posts)
 
Join Date: Mar 2003
Location: Hull, UK
Posts: 2,529 netytan User rank is Second Lieutenant (5000 - 10000 Reputation Level)netytan User rank is Second Lieutenant (5000 - 10000 Reputation Level)netytan User rank is Second Lieutenant (5000 - 10000 Reputation Level)netytan User rank is Second Lieutenant (5000 - 10000 Reputation Level)netytan User rank is Second Lieutenant (5000 - 10000 Reputation Level)netytan User rank is Second Lieutenant (5000 - 10000 Reputation Level)netytan User rank is Second Lieutenant (5000 - 10000 Reputation Level) 
Time spent in forums: 1 Week 2 Days 17 h 19 m 5 sec
Reputation Power: 63
Send a message via ICQ to netytan Send a message via AIM to netytan Send a message via MSN to netytan Send a message via Yahoo to netytan
I'd also check out urllib's urlopen which prvides an easy way of getting a web page (source).

I've got a small webcrawler that i never finished for you to have a look at but i'll post that latter.

Have fun,
Mark.

Reply With Quote
  #4  
Old August 1st, 2003, 10:40 AM
netytan's Avatar
netytan netytan is offline
Hello World :)
Dev Shed Frequenter (2500 - 2999 posts)
 
Join Date: Mar 2003
Location: Hull, UK
Posts: 2,529 netytan User rank is Second Lieutenant (5000 - 10000 Reputation Level)netytan User rank is Second Lieutenant (5000 - 10000 Reputation Level)netytan User rank is Second Lieutenant (5000 - 10000 Reputation Level)netytan User rank is Second Lieutenant (5000 - 10000 Reputation Level)netytan User rank is Second Lieutenant (5000 - 10000 Reputation Level)netytan User rank is Second Lieutenant (5000 - 10000 Reputation Level)netytan User rank is Second Lieutenant (5000 - 10000 Reputation Level) 
Time spent in forums: 1 Week 2 Days 17 h 19 m 5 sec
Reputation Power: 63
Send a message via ICQ to netytan Send a message via AIM to netytan Send a message via MSN to netytan Send a message via Yahoo to netytan
Hi,

here's the script I wrote, it goes through a list of stored websites and checks if they have changed since they were last checked. If they have it stores the new MD5 value for that site to be checked against latter. The wesites can be viewed in the template. Not exacty what you want but the basic's are there.

Hope this helps,

Take care,
Mark.

Last edited by netytan : August 8th, 2003 at 11:39 AM.

Reply With Quote
  #5  
Old August 3rd, 2003, 08:38 AM
percivall percivall is offline
Contributing User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Jul 2003
Posts: 133 percivall User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 6
Also, have a look at HarvestMan and spider.

Reply With Quote
  #6  
Old August 8th, 2003, 10:09 AM
A|pha_N3rd A|pha_N3rd is offline
Junior Member
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Jul 2003
Posts: 10 A|pha_N3rd User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
Thanks!

I'll take a look at your code netytan, and use python.org as a resource

Reply With Quote
  #7  
Old August 8th, 2003, 11:54 AM
netytan's Avatar
netytan netytan is offline
Hello World :)
Dev Shed Frequenter (2500 - 2999 posts)
 
Join Date: Mar 2003
Location: Hull, UK
Posts: 2,529 netytan User rank is Second Lieutenant (5000 - 10000 Reputation Level)netytan User rank is Second Lieutenant (5000 - 10000 Reputation Level)netytan User rank is Second Lieutenant (5000 - 10000 Reputation Level)netytan User rank is Second Lieutenant (5000 - 10000 Reputation Level)netytan User rank is Second Lieutenant (5000 - 10000 Reputation Level)netytan User rank is Second Lieutenant (5000 - 10000 Reputation Level)netytan User rank is Second Lieutenant (5000 - 10000 Reputation Level) 
Time spent in forums: 1 Week 2 Days 17 h 19 m 5 sec
Reputation Power: 63
Send a message via ICQ to netytan Send a message via AIM to netytan Send a message via MSN to netytan Send a message via Yahoo to netytan
Hi Alpha,

please dicregard the code i posted, it seams i posted the wrong version which doesn't actually work , this is just the template side of it.. sorry if i wasted any of you time.

I will try and find the right code for you a little latter tonight. In the mean time you can take a look at this code..

Code:
#!/usr/bin/env python 
 
import urllib, md5

page = urllib.urlopen('http://australianit.news.com.au').read() 

checksum = md5.new(page).digest()

if open('md5.txt', 'r').read().strip() != checksum:
	print 'Page has been changed\n'
	open('md5.txt', 'w').write(checksum)
else:
	print 'Page has not been changed\n'


you will need to create md5.txt before running but it should, get the source code from any webpage sent to it. this is the converted to an MD5 checksum and stored for comparison. If the page has changed then the new checksum is stored and the 'Page has been changed' line will be outputted.

Hope this is of more help,
Mark.

Reply With Quote
Reply

Viewing: Dev Shed ForumsProgramming LanguagesPython Programming > webcrawler in python


Thread Tools  Search this Thread 
Search this Thread:

Advanced Search
Display Modes  Rate This Thread 
Rate This Thread:


Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
View Your Warnings | New Posts | Latest News | Latest Threads | Shoutbox
Forum Jump


Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
  
 





© 2003-2008 by Developer Shed. All rights reserved. DS Cluster 3 hosted by Hostway