SunQuest
           Python Programming
  #1  
February 25th, 2004, 09:47 AM
7imz
again htmllib

How can I parse a web page so that I get all the links on it? Is it something like this?

import htmllib
import string
import urllib

file = urllib.urlopen("http://www.python.org")
html = file.read()
file.close()

p = htmllib.HTMLParser()
p.feed(html)
p.close()

for v in p.anchorlist:
    print v

(My problem is that I've only been learning Python for two days, so this is all somewhat new to me.)
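
For reference, the attempt above is written for Python 2. There, htmllib.HTMLParser expects a formatter argument (for example formatter.NullFormatter()), and the parser fills anchorlist on its own as it meets <a> tags. htmllib no longer exists in Python 3. A minimal sketch of the same idea with the standard html.parser module might look like the following; the LinkCollector class and the sample markup are assumptions made for illustration, not part of the original post.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Sample markup stands in for a downloaded page (hypothetical, no network needed).
sample_html = (
    '<p><a href="./doc/">Docs</a> and '
    "<a class=\"ext\" href='http://www.jython.org/'>Jython</a></p>"
)

collector = LinkCollector()
collector.feed(sample_html)
collector.close()

for link in collector.links:
    print(link)
```

To run it against a live page, the string passed to feed() would be the decoded body of the response, for instance from urllib.request.urlopen(...).read().decode().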

  #2  
February 25th, 2004, 04:24 PM
netytan
I've never had much reason to use htmllib, but I think you might need to call the anchor_bgn() method to tell htmllib what you want to collect. Anyway, here's an example using a regex with urllib.

Code:
>>> import re, urllib
>>> re.findall('<a href="(.+?)">', urllib.urlopen('http://www.python.org/').read())
['./', './search/', './download/', './doc/', './Help.html', './dev/', './community/', './sigs/', 'doc/Summary.html', 'doc/faq/', '2.3.3/', 'doc/2.3.3/', '2.2.3/', 'doc/2.2.3/', 'download/download_mac.html', 'http://www.jython.org/', 'http://www.python.org/pypi', ...
>>> 
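
The pattern above only matches anchors written exactly as <a href="...">, and most of the results are relative to the page they came from. A rough Python 3 sketch that tolerates extra attributes and single quotes, and resolves each link against a base URL with urllib.parse.urljoin, could look like this. The helper name extract_links, the HREF_PATTERN constant and the sample markup are assumptions for illustration. A regex remains an approximate tool for HTML, so treat this as a quick hack, not a parser.

```python
import re
from urllib.parse import urljoin

# Tolerates extra attributes, either quote style, and upper-case tags.
HREF_PATTERN = re.compile(
    r'<a\s[^>]*?href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE
)


def extract_links(html, base_url):
    """Return every anchor href found in html, resolved against base_url."""
    return [urljoin(base_url, href) for href in HREF_PATTERN.findall(html)]


# Sample markup stands in for the downloaded page (hypothetical, not real python.org HTML).
sample_html = (
    '<a href="./search/">Search</a> '
    "<A class='nav' HREF='doc/faq/'>FAQ</A> "
    '<a href="http://www.jython.org/">Jython</a>'
)

for url in extract_links(sample_html, "http://www.python.org/"):
    print(url)
```

Relative entries such as './search/' come back as absolute URLs under the base, while links that are already absolute pass through unchanged.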


Mark.
© 2003-2008 by Developer Shed. All rights reserved. DS Cluster 1 hosted by Hostway