
May 6th, 2004, 06:10 AM
Newbie to Python... Need help with HTMLParser
This is a Parser class that I wrote:
================================================
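from html.parser import HTMLParser  # on Python 2: from HTMLParser import HTMLParser

class MyParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.keywords = []  # text chunks collected so far

    def handle_data(self, data):
        # Called for every run of text between tags
        self.keywords.append(data)

    def key(self):
        return self.keywords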
================================================
but I keep getting unwanted data in the list:
1. The data inside the <!-- some code inside --> tag
2. The CSS data included at the beginning of the HTML page
How do I keep this data out of the list? I've thought about using the handle_comment function, but I only need the data between tags, not the data inside them. Can anyone give me some advice?
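One possible approach, as a minimal sketch: it assumes the unwanted text comes from <style> and <script> elements. HTMLParser passes the contents of those elements (including any <!-- --> wrapper inside them) to handle_data, while ordinary comments go to handle_comment and never reach handle_data. The names FilteredParser, SKIP_TAGS and skip_depth are made up for this illustration, and the import shown is the Python 3 one.

```python
from html.parser import HTMLParser  # on Python 2: from HTMLParser import HTMLParser


class FilteredParser(HTMLParser):
    """Collects text data, ignoring anything inside <style> or <script>."""

    SKIP_TAGS = ("style", "script")  # hypothetical list; extend as needed

    def __init__(self):
        HTMLParser.__init__(self)
        self.keywords = []
        self.skip_depth = 0  # greater than 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep the text only when we are not inside <style>/<script>
        if self.skip_depth == 0:
            self.keywords.append(data)

    def key(self):
        return self.keywords


parser = FilteredParser()
parser.feed(
    "<html><head><style><!-- p { color: red } --></style></head>"
    "<body><!-- a comment --><p>Hello</p><script>var x = 1;</script>"
    "<p>World</p></body></html>"
)
parser.close()
print(parser.key())
```

With this sample input only the two paragraph texts should end up in the list. Real pages usually also produce whitespace-only chunks between tags, which could be dropped in handle_data with a check such as `if data.strip()` if they are not wanted.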