#1
How to write a spider in Python?
I need to write a program that retrieves the text from web sites: I'd supply a list of URLs, and it would fetch all the text under them.
#2
You can use the urllib module to retrieve text from a web page.
The urlopen function returns a file-like object; reading from it gives you the HTML code of the specified web page. Example taken from the Python manual:
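A minimal sketch of the idea, not the manual's own example. It assumes Python 3, where urlopen lives in urllib.request (in Python 2 it was urllib.urlopen). The function name fetch_html, the 10-second timeout and the UTF-8 fallback are my own choices for illustration.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Return the body of `url` as text (hypothetical helper name)."""
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()  # the page body as bytes
        # Use the charset the server declared, if any; UTF-8 is an assumed fallback.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

To cover a list of URLs you would call it in a loop, e.g. `pages = {url: fetch_html(url) for url in url_list}`, where url_list is whatever list you supply. Expect some sites to raise urllib.error.URLError or HTTPError, so a real spider would wrap the call in try/except.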
#3
Thank you so much!
I will check out urllib in the manual. I also need to know about following links from the web page; I assume I will find that info in the urllib module as well.
#4
urllib will only return the HTML of the URL you feed it; it does not follow links for you.
If you want to follow links from that web page, you can pull them out with either regular expressions (the re module) or a parser (the HTMLParser module). Unless you have a compelling reason to use a parser, I'd recommend ripping the URLs out quick and dirty with a regular expression. You can then feed them to urllib again.
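Both approaches can be sketched roughly like this. It assumes Python 3, where the HTMLParser module is html.parser. The pattern HREF_RE and the names links_by_regex, LinkCollector and links_by_parser are made up for illustration, and the regex is deliberately crude: it only catches quoted href values.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Quick and dirty: capture the value of href="..." or href='...'.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def links_by_regex(html, base_url):
    """Return every quoted href in `html`, made absolute against `base_url`."""
    return [urljoin(base_url, href) for href in HREF_RE.findall(html)]


class LinkCollector(HTMLParser):
    """Collect the href of each <a> tag while the parser walks the page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def links_by_parser(html, base_url):
    """Same job as links_by_regex, but only looks at real <a> tags."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

Note the difference: the regex version also picks up href attributes on other tags such as <link>, while the parser version only returns anchors. Either list can be fed back to urlopen; keeping a set of URLs you have already visited stops the spider from looping.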