Perl Programming
 
#1  yhcmarc, July 20th, 2001, 09:37 AM
strip certain html

Is there a standard Perl module or function that can strip HTML, just like the strip_tags() function in PHP? If so, which one?

Marc

#2  JSchoof, July 20th, 2001, 09:59 AM
Huh

I don't know about a module, but you can use a regular expression to do it:

$line = "<b>This is some text";
$line =~ s/<[^>]*>//g;

The variable will then equal "This is some text".

Good luck.

Josh

#3  Acid Reign, July 20th, 2001, 10:09 AM
Why not just change the < and > characters into &lt; and &gt;? That way you could still use < and > in the text, but they would not be treated as HTML.
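
As a rough illustration of that idea, here is a minimal sketch in plain Perl. The helper name escape_html is made up for this example and is not from any module. It also escapes & and ", which the post does not mention, on the assumption that the text ends up inside an HTML page.

```perl
use strict;
use warnings;

# Hypothetical helper: turn HTML-significant characters into entities.
# The ampersand is escaped first, so the "&" in the entities added by
# the later substitutions is not escaped a second time.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

print escape_html('<script>alert("hi")</script> 1 < 2'), "\n";
# prints: &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt; 1 &lt; 2
```

Nothing is lost this way: the browser shows the tags as literal text instead of interpreting them.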

#4  yhcmarc, July 20th, 2001, 10:16 AM
Well, that's the whole problem: I want to allow some HTML, like

<img ...> <a href ...> <b> <u> and so on.

With PHP I can use strip_tags($variable, "allowable tags");

Is there something like this in Perl? I don't want anyone to post PHP or script tags and stuff like that, but I do want to allow them to post links and images.

#5  Hero Zzyzzx, July 20th, 2001, 10:52 AM
Look into HTML::Parser on CPAN. It is an incredibly powerful module that can do a multitude of things with HTML tags.

Doing this type of parsing with your own hand-rolled regular expression is doomed to failure.

When the input is processed one line at a time, the posted regular expression would fail on a multi-line comment, or on any tag that has a newline in it:

<a href="/areallylongurlwithaline
breakinit">Long Url</a>
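
A quick way to see when the line break matters is to run the pattern from post #2 both ways. This is only a sketch; the variable names are illustrative. The negated class [^>] also matches a newline, so the outcome depends on whether the substitution sees the whole text or one line at a time.

```perl
use strict;
use warnings;

my $html = qq{<a href="/areallylongurlwithaline\nbreakinit">Long Url</a>};

# Whole string at once: [^>] matches the newline inside the tag,
# so both the opening and the closing tag are removed.
(my $whole = $html) =~ s/<[^>]*>//g;

# One line at a time: the opening tag is split across two lines and
# never matches; only the closing </a> is removed.
my @lines = split /\n/, $html;
s/<[^>]*>//g for @lines;
my $by_line = join "\n", @lines;

print "whole:   [$whole]\n";
print "by line: [$by_line]\n";
```

So a script that reads a file with while (<FH>) and strips each line separately would leave the broken halves of the tag behind.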

#6  Acid Reign, July 20th, 2001, 11:19 AM
Quote:
Originally posted by yhcmarc
Well, that's the whole problem: I want to allow some HTML, like

<img ...> <a href ...> <b> <u> and so on.

With PHP I can use strip_tags($variable, "allowable tags");

Is there something like this in Perl? I don't want anyone to post PHP or script tags and stuff like that, but I do want to allow them to post links and images.


I've managed to write a short program that takes out all the < and > from an input and then translates predefined things such as [u] to <u>.

You could instead 'make safe' the < and > characters and then search through the text, replacing &lt;u&gt; with <u>. It would be slightly more complicated with the img and a href ones, but possible nonetheless.
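
The second approach could look roughly like the sketch below. The sub name make_safe and the list of allowed tags are assumptions made for this example. It only handles tags without attributes; img and a href would need their attribute values checked as well, which is the more complicated part.

```perl
use strict;
use warnings;

# Hypothetical whitelist of attribute-free tags (not from the thread).
my @allowed = qw(b i u);

sub make_safe {
    my ($text) = @_;

    # Step 1: neutralise everything that could start a tag or entity.
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;

    # Step 2: turn back only exact, attribute-free whitelisted tags,
    # for example "&lt;u&gt;" and "&lt;/u&gt;".
    my $names = join '|', @allowed;
    $text =~ s{&lt;(/?)($names)&gt;}{<$1$2>}gi;

    return $text;
}

print make_safe('<u>hi</u> <script>x</script>'), "\n";
# prints: <u>hi</u> &lt;script&gt;x&lt;/script&gt;
```

Because everything is escaped first, anything the second step does not recognise stays harmless text.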

#7  JSchoof, July 20th, 2001, 11:23 AM
Oh yeah?

You can use the regex for multiline operations:

$line =~ s/<[^>]*>//gm;

It's the "m" at the end: multiline.

Josh
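
For what it's worth, a small check suggests the flag changes nothing for this particular pattern. According to perlre, /m only changes where ^ and $ match, and the pattern uses neither. The sample string below is made up for the test.

```perl
use strict;
use warnings;

# A tag with a newline inside it, followed by a second line.
my $html = "<a\nhref=\"x\">link</a>\n<b>bold</b>";

(my $plain = $html) =~ s/<[^>]*>//g;     # without /m
(my $multi = $html) =~ s/<[^>]*>//gm;    # with /m

print $plain eq $multi ? "same result\n" : "different result\n";
# prints: same result
```

Both versions strip the tag that spans two lines, as long as the whole text is in one string.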

#8  Hero Zzyzzx, July 20th, 2001, 11:59 AM
Assuming your HTML is valid, that regex would be fine.

It would flame out on something like

<> Dan >

and would be a hatchet job with JavaScript.

It strips everything, which isn't what the poster wanted anyway.

My only point is that stuff with this level of complexity is best left to the experts. I'd suggest looking at HTML::Parser if you want to do it right; with HTML::Parser you could also allow certain tags.
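
To give an idea of what allowing certain tags with HTML::Parser might look like, here is a minimal sketch using its version 3 handler API. The sub name strip_tags (borrowed from PHP), the helper esc, the whitelist and the URL rule are all assumptions made for this example, not something the module provides. Treat it as a starting point rather than a vetted sanitiser.

```perl
use strict;
use warnings;
use HTML::Parser;    # CPAN module, not part of core Perl

# Hypothetical whitelist (tag => attributes to keep); adjust to taste.
my %allowed = (
    a   => ['href'],
    img => ['src', 'alt'],
    b   => [],
    u   => [],
);

# The parser hands over decoded values, so re-encode before output.
sub esc {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

sub strip_tags {
    my ($html) = @_;
    my $out = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag, $attr) = @_;
            return unless $allowed{$tag};
            $out .= "<$tag";
            for my $name (@{ $allowed{$tag} }) {
                my $value = $attr->{$name};
                next unless defined $value;
                # Assumed policy: only http(s) or site-relative URLs.
                next if $name ne 'alt' && $value !~ m{^(?:https?://|/)}i;
                $out .= sprintf ' %s="%s"', $name, esc($value);
            }
            $out .= '>';
        }, 'tagname, attr' ],
        end_h  => [ sub { $out .= "</$_[0]>" if $allowed{$_[0]} }, 'tagname' ],
        text_h => [ sub { $out .= esc($_[0]) }, 'dtext' ],
    );
    $parser->ignore_elements(qw(script style));    # drop tag and contents
    $parser->parse($html);
    $parser->eof;
    return $out;
}

print strip_tags('<b>hi</b><script>alert(1)</script><a href="/x" onclick="evil()">link</a>'), "\n";
```

Rebuilding each allowed tag from its name and a fixed attribute list, instead of passing the original tag text through, means things like onclick never reach the output.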

#9  yhcmarc, July 21st, 2001, 04:55 PM
Thanks, I'll go and try that then.

© 2003-2008 by Developer Shed. All rights reserved.