#1
|
||||
|
||||
|
strip certain html
Is there a standard Perl module or function that can strip HTML, just like the strip_tags() function in PHP? If so, which one?
Marc
#2
|
|||
|
|||
|
Huh
I don't know about a module, but you can use a regular expression to do it:
$line = "<b>This is some text";
$line =~ s/<[^>]*>//g;
The variable will now equal "This is some text". Good luck.
Josh
#3
|
|||
|
|||
|
Why not just change the < and > characters into &lt; and &gt;? You could still use < and >, but they would not be treated as HTML.
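A minimal Perl sketch of that escaping step, assuming the goal is simply to show user input as literal text. The sub name escape_html is illustrative only; HTML::Entities (distributed with HTML::Parser) provides encode_entities for the same job. The ampersand has to be replaced first, or the & in a freshly produced &lt; would be escaped a second time.

```perl
use strict;
use warnings;

# Replace the characters HTML treats as markup with entities, so the
# browser displays them as literal text.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must come first, or the '&' in '&lt;' gets escaped again
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;   # only matters inside attribute values, harmless elsewhere
    return $text;
}

print escape_html(q{<script>alert("hi")</script> & <b>bold</b>}), "\n";
# prints: &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt; &amp; &lt;b&gt;bold&lt;/b&gt;
```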
#4
|
||||
|
||||
|
Well, that's the whole problem: I want to allow some HTML, like <img ...>, <a href ...>, <b>, <u> and so on. With PHP I can use strip_tags($variable, "allowable tags"); is there something like this in Perl? I don't want anyone to post PHP or script tags and stuff like that, but I do want to allow them to post links and images.
#5
|
||||
|
||||
|
Look into HTML::Parser on CPAN. This is an incredibly powerful module to do a multitude of things with HTML tags.
Doing this type of parsing with your own hand-rolled regular expression is doomed to failure. The posted regular expression would fail on a multi-line comment, or on any tag that has a newline in it:
<a href="/areallylongurlwithaline
breakinit">Long Url</a>
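A minimal sketch of an allowlist filter built on HTML::Parser (plus HTML::Entities, which ships in the same CPAN distribution), roughly the Perl counterpart of strip_tags() with allowed tags. The %ALLOWED table, the helper names allow_tags and safe_url, and the rule of keeping only http(s) and relative URLs are assumptions for illustration, not anything from the thread. Each allowed tag is rebuilt from scratch rather than copied through, so attributes such as onclick or onerror, and javascript: URLs, are dropped.

```perl
use strict;
use warnings;
use HTML::Parser ();
use HTML::Entities qw(encode_entities);

# Hypothetical allowlist: tag name => attributes that may be kept.
my %ALLOWED = (
    a   => [qw(href title)],
    img => [qw(src alt width height)],
    b   => [],
    i   => [],
    u   => [],
);

# Keep http(s) URLs and scheme-less (relative) URLs; reject anything
# else, e.g. javascript: or data: URLs.
sub safe_url {
    my ($url) = @_;
    return 1 if $url =~ m{^https?://}i;
    return 1 if $url !~ m{^[^/?#]*:};
    return 0;
}

sub allow_tags {
    my ($html) = @_;
    my $out = '';

    my $p = HTML::Parser->new(
        api_version => 3,
        # Rebuild allowed start tags, copying only allowlisted
        # attributes (re-escaped); every other tag is dropped.
        start_h => [ sub {
            my ($tag, $attr) = @_;
            my $keep = $ALLOWED{$tag} or return;
            my $s = "<$tag";
            for my $name (@$keep) {
                my $val = $attr->{$name};
                next unless defined $val;
                next if ($name eq 'href' || $name eq 'src') && !safe_url($val);
                $s .= sprintf ' %s="%s"', $name, encode_entities($val, '<>&"');
            }
            $out .= "$s>";
        }, 'tagname, attr' ],
        end_h => [ sub {
            my ($tag) = @_;
            $out .= "</$tag>" if $ALLOWED{$tag} && $tag ne 'img';
        }, 'tagname' ],
        # Text arrives entity-decoded ('dtext') and is re-escaped on output.
        text_h => [ sub {
            my ($text) = @_;
            $out .= encode_entities($text, '<>&');
        }, 'dtext' ],
        # No comment, declaration or process handlers are registered, so
        # those events are simply discarded.
    );
    $p->ignore_elements(qw(script style));   # drop these tags *and* their contents
    $p->parse($html);
    $p->eof;
    return $out;
}

print allow_tags(q{<b>hi</b><script>alert(1)</script><a href="javascript:alert(1)" onclick="x()">x</a>}), "\n";
# prints: <b>hi</b><a>x</a>
```

This sketch does not balance tags: a <b> with no matching </b> is passed through as-is.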
#6
|
|||
|
|||
|
Quote:
I've managed to write a short program that takes out all the < and > characters from an input and then translates predefined things such as [u] to <u>. You could instead "make safe" the < and > characters and then search through the text, replacing &lt;u&gt; with <u>. It would be slightly more complicated with the img and a href ones, but possible nonetheless.
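A rough Perl sketch of the escape-then-re-enable idea, assuming only attribute-free tags are switched back on. The list of tags (b, i, u) and the sub name escape_then_allow are hypothetical. Because the pattern requires &gt; immediately after the tag name, anything carrying attributes, such as an onclick handler, stays escaped:

```perl
use strict;
use warnings;

# Hypothetical list of simple tags that may be switched back on.
my @SIMPLE = qw(b i u);

sub escape_then_allow {
    my ($text) = @_;

    # Step 1: make everything safe.
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;

    # Step 2: turn exact, attribute-free tags such as &lt;u&gt; and
    # &lt;/u&gt; back into markup (tag names are lowercased). Since &gt;
    # must follow the name directly, "&lt;b onclick=...&gt;" stays escaped.
    my $names = join '|', @SIMPLE;
    $text =~ s{&lt;(/?)($names)&gt;}{'<' . $1 . lc($2) . '>'}gie;

    return $text;
}

print escape_then_allow(q{<u>under</u> <b onclick="x()">no</b> <script>}), "\n";
# prints: <u>under</u> &lt;b onclick="x()"&gt;no</b> &lt;script&gt;
```

Links and images need their attribute values checked, which is where this approach gets fiddly.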
#7
|
|||
|
|||
|
Oh yeah?
You can use the regex for multiline operations:
$line =~ s/<[^>]*>//gm;
It's the "m" at the end: multiline.
Josh
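A small Perl check of what the m modifier actually does. In Perl, /m only makes ^ and $ match at line boundaries, and the negated class [^>] already matches a newline, so the original s/<[^>]*>//g copes with a tag split across lines as long as the whole tag is in one string. What breaks it is processing input one line at a time (as a loop over $line would), because each half of the tag then arrives separately. The sample string reuses the long-URL example from earlier in the thread:

```perl
use strict;
use warnings;

my $html = qq{<a href="/areallylongurlwithaline\nbreakinit">Long Url</a>};

# [^>] is a negated character class, so it already matches "\n":
# the whole tag is removed even without any modifier.
(my $plain = $html) =~ s/<[^>]*>//g;
print "$plain\n";                                       # prints: Long Url

# /m only changes what ^ and $ match; this pattern uses neither,
# so adding it makes no difference to the result.
(my $with_m = $html) =~ s/<[^>]*>//gm;
print(($plain eq $with_m) ? "same\n" : "different\n");  # prints: same

# Line-at-a-time processing is what breaks: the first half of the tag
# has no '>' in its string, so it survives untouched.
my $joined = '';
for my $line (split /(?<=\n)/, $html) {
    (my $copy = $line) =~ s/<[^>]*>//g;
    $joined .= $copy;
}
print "$joined\n";
# prints the first line unchanged, then: breakinit">Long Url
```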
#8
|
||||
|
||||
|
Assuming your HTML is valid, that regex would be fine.
It would flame out on something like <> Dan >, and it would make a hatchet job of JavaScript. It also strips everything, which isn't what the original poster wanted anyway. My only point is that stuff with this level of complexity is best left to the experts. I'd suggest looking at HTML::Parser if you want to do it right; with HTML::Parser you could also allow certain tags.
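A quick Perl demonstration of a few of those failure modes, using the strip-everything regex from earlier in the thread wrapped in a hypothetical strip_all sub: a > inside a quoted attribute value or a comment ends the match early, and a < comparison inside JavaScript runs the match on to the next >:

```perl
use strict;
use warnings;

# The "strip every tag" substitution from earlier in the thread.
sub strip_all {
    my ($s) = @_;
    $s =~ s/<[^>]*>//g;
    return $s;
}

# '>' inside a quoted attribute value: the match stops there and the
# rest of the tag is left behind as visible text.
print strip_all(q{<a title="x > y" href="/">link</a>}), "\n";
# prints: ' y" href="/">link'

# '>' inside a comment: the tail of the comment leaks.
print strip_all(q{<!-- 1 > 0 -->text}), "\n";
# prints: ' 0 -->text'

# '<' inside JavaScript: the match runs on to the next '>', so a
# fragment of the script survives as text.
print strip_all(q{<script>if (a < b) { alert(1) }</script>ok}), "\n";
# prints: 'if (a ok'
```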
#9
|
||||
|
||||
|
Thanks, I'll go and try that then.
Dev Shed Forums > Programming Languages > Perl Programming > strip certain html