
February 25th, 2004, 04:24 PM
Never had much reason to use htmllib, but I think you might need to override the anchor_bgn() method to tell htmllib what you want to collect. Anyway, here's an example using a regex with urllib.
Code:
>>> import re, urllib
>>> re.findall('<a href="(.+?)">', urllib.urlopen('http://www.python.org/').read())
['./', './search/', './download/', './doc/', './Help.html', './dev/', './community/', './sigs/', 'doc/Summary.html', 'doc/faq/', '2.3.3/', 'doc/2.3.3/', '2.2.3/', 'doc/2.2.3/', 'download/download_mac.html', 'http://www.jython.org/', 'http://www.python.org/pypi', ...
>>>
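The regex above trips up on links that carry extra attributes or use single quotes, so here is a rough sketch of the same idea with a real parser. It assumes Python 3, where htmllib is gone and html.parser is the standard-library replacement. LinkCollector and extract_links are names made up for this example, and the sample string stands in for a downloaded page.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Return every href found in the given HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample markup; a real page would be fetched and decoded first.
sample = '<p><a href="./doc/">Docs</a> and <a class="x" href=\'download/\'>Download</a></p>'
print(extract_links(sample))  # ['./doc/', 'download/']
```

To run it against a live page you would pass in the decoded text from urllib.request.urlopen(url).read() instead of the sample string.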
Mark.
__________________
programming language development: www.netytan.com – Hula