#1
I need a regular expression that strips out all HTML tags EXCEPT the ones I've allowed.
I think I almost have it, but I can't get the negation right: "/</?(^IMG|A|FONT|B|I|U|STRONG|EM|CODE|PRE|H1|H2|H3|H4|H5|H6)(.*)>?/i". The ^ isn't negating because it's not in a character class. So how would I negate all those tags (meaning match anything EXCEPT those)? Also, what's a better alternative to the .* match, so that they can't just throw a newline in there and mess things up?
#2
As you note, the negation doesn't work because '^' only acts as a negator inside a character class (the [] construct).
Possibly you could use a negated match instead? (Change the =~ to !~?) As for your second question, about a better alternative to the .* match so that a stray newline can't break things: I've read that [^>]* will match everything (including \n) up to the close of the tag; will this help?
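One way to get "anything EXCEPT those tags" in a single pattern is a negative lookahead, (?!...), combined with the [^>]* idea above. What follows is only a rough sketch, not a tested sanitizer: the sub name strip_tags and the sample string are made up for illustration, and the tag list is copied from the pattern in the first post. It assumes no allowed tag carries a literal ">" inside an attribute value, and it does nothing about dangerous attributes on the tags it keeps.

```perl
use strict;
use warnings;

# Whitelist copied from the pattern in the first post.
my @allowed = qw(IMG A FONT B I U STRONG EM CODE PRE H1 H2 H3 H4 H5 H6);
my $allowed = join '|', @allowed;

# (?!...) is a negative lookahead: right after "<", the match is abandoned
# if an optional "/" plus an allowed tag name (ending at a word boundary)
# comes next. The "/?" sits inside the lookahead so that closing tags such
# as </b> are protected as well.
# [^>]* matches any character except ">", newlines included.
my $strip = qr{<(?!/?(?:$allowed)\b)[^>]*>}i;

# Hypothetical helper: remove every tag that is not on the whitelist.
sub strip_tags {
    my ($html) = @_;
    $html =~ s/$strip//g;
    return $html;
}

my $sample = qq{<b>bold</b> <script\nsrc="x">evil</script> <a href="#">link</a>};
print strip_tags($sample), "\n";
```

The \b after the tag list is what keeps <br> from slipping through as if it were <b>. For anything security-sensitive, a real HTML parser is likely a safer bet than a regex.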
| Viewing: Dev Shed Forums > Programming Languages > Perl Programming > Stripping HTML tags with regular expressions |