Python Programming
 
  #1  
January 6th, 2003, 05:16 PM
bullwinkle
 
How to write a spider in Python?

I need to write a program that retrieves the text from web sites: I'd supply a list of URLs, and it would get me all the text under each one.

  #2  
January 9th, 2003, 12:10 AM
Blaktyger
 
You can use the urllib module to retrieve text from a web page.
The urlopen function returns a file-like object; calling read() on it gives you the HTML of the specified web page.

Example taken from the Python manual:

Quote:
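import urllib.parse
import urllib.request
# Python 3: urlencode lives in urllib.parse, urlopen in urllib.request
# Passing data makes this a POST; the data must be bytes
params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0}).encode('ascii')
f = urllib.request.urlopen("http://www.musi-cal.com/cgi-bin/query", params)
print(f.read().decode('utf-8'))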
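
Since the goal is the text rather than the raw HTML, here is a rough sketch of stripping the tags with the standard library's HTMLParser. It is written for Python 3, where urlopen lives in urllib.request. The names TextExtractor, html_to_text and fetch_text are made up for illustration, and decoding as UTF-8 is an assumption about the pages being fetched.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def fetch_text(url):
    """Download a page and return its text (assumes UTF-8 content)."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return html_to_text(html)
```

For a list of URLs, you would loop over the list and call fetch_text on each entry.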

  #3  
January 9th, 2003, 05:42 AM
bullwinkle
 
Thank you so much!

I will check out the urllib module in the manual.

I also need to know about following links from the web page; I assume I will find the info in the urllib module.

  #4  
January 11th, 2003, 04:54 PM
alper
 
urllib will only return the HTML of the URL you feed it.

If you want to follow links from that webpage you can get them out with either regular expressions (re module) or a parser (HTMLParser module).
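
For the parser route, something like the sketch below might do. It uses html.parser.HTMLParser from the Python 3 standard library and resolves relative links with urllib.parse.urljoin. The names LinkCollector and extract_links are invented for this example, and it only looks at the href attribute of a tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Gathers the href of every <a> tag as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the absolute URLs linked from an HTML string."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links
```

You would call extract_links with the HTML you got from urlopen and the URL it came from.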

Unless you have a compelling reason to use a parser, I'd recommend ripping the URLs out quick and dirty with a regular expression. You can then feed them to urllib again.
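
The quick and dirty approach could look roughly like this in Python 3. The pattern, the names find_links, fetch and crawl, and the max_pages limit are all assumptions made for the sake of the example; the regular expression only catches quoted href values and will miss or mangle anything unusual.

```python
import re
from collections import deque
from urllib.parse import urljoin
from urllib.request import urlopen

# Matches href="..." or href='...'; unquoted attributes are not handled
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def find_links(html, base_url):
    """Pull href values out of the HTML and make them absolute."""
    return [urljoin(base_url, href) for href in HREF_RE.findall(html)]


def fetch(url):
    """Download a page as text (assumes UTF-8 content)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_urls, max_pages=10, fetcher=fetch):
    """Breadth-first crawl; returns a dict mapping URL to HTML."""
    seen = set()
    queue = deque(start_urls)
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        try:
            html = fetcher(url)
        except OSError:
            continue  # skip pages that fail to load
        pages[url] = html
        # Feed the extracted links back into the queue
        for link in find_links(html, url):
            if link.startswith(("http://", "https://")) and link not in seen:
                queue.append(link)
    return pages
```

The max_pages cap is there because a spider that follows every link never really finishes. A real one should also respect robots.txt and pause between requests.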
