#1
HTML tag extraction
Hi folks,
I'm fresh to the coding game, so please bear with me. I'm completing an MSc in Computation at UMIST, Manchester (UK). My project is to use Perl regexps to scan for and strip out certain HTML tags, to make Web pages more accessible for users with low vision. The code will eventually sit on a proxy server, though for now I'm focusing on a hotchpotch HTML page on my hard drive. I've only just started. Any pointers? Cheers, Rich
#2
Have a look at the HTML::Parser module on CPAN. It sounds like you'd find it useful with what you're doing.
~ishnid
#3
Also look at HTML::TagFilter, which will do conditional tag stripping for you and is a subclass of the very powerful HTML::Parser.
Using regexes to parse arbitrary HTML is very, very difficult to do correctly; you *have* to have a tag-aware parser.
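To make the tag-aware approach concrete, here is a minimal sketch using HTML::Parser's version 3 handler API. It assumes HTML::Parser is installed from CPAN; the `strip_tags` helper and the choice of `font` as the tag to drop are made up for illustration and are not from the posts above. It removes the named tags but keeps the text between them, and it copes with a `>` inside a quoted attribute, which a naive pattern like `<[^>]*>` would trip over.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: return $html with the named tags removed,
# keeping whatever text sits between them.
sub strip_tags {
    my ($html, @tags) = @_;
    my %strip = map { lc($_) => 1 } @tags;
    my $out   = '';

    # Start and end tags are copied through unless they are on the strip list.
    # HTML::Parser passes 'tagname' lower-cased, so the lookup is case-insensitive.
    my $keep_unless_stripped = sub {
        my ($tagname, $text) = @_;
        $out .= $text unless $strip{$tagname};
    };

    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h     => [ $keep_unless_stripped, 'tagname, text' ],
        end_h       => [ $keep_unless_stripped, 'tagname, text' ],
        # Everything else (text, comments, declarations) passes through unchanged.
        default_h   => [ sub { $out .= shift }, 'text' ],
    );

    $parser->parse($html);
    $parser->eof;
    return $out;
}

# Sample input with a '>' inside a quoted attribute value.
my $page = '<p><FONT size="1" title="a > b">Tiny <b>text</b></FONT></p>';
print strip_tags($page, 'font'), "\n";
```

For the proxy-server stage, the same parser object can be fed a page in chunks with repeated `parse` calls before the final `eof`, so the whole document need not be held in memory first.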