Regex Programming
#1
November 22nd, 2008, 07:36 PM
Ahhk
Join Date: Jun 2008
Location: Texas, Y'all!
Posts: 133
Stripping Only Certain HTML Tags (and contents)

OK, I've been trying every way I can think of to do this, and nothing is working right with PHP.

I basically want to strip a set of HTML tags from a string, removing the content between those tags as well, and allowing for differences in case and for stray spaces in the tags (such as < img src...>).

strip_tags removes everything except for a whitelist. This is the opposite of what I need. I only want to strip a certain set of tags:

a, img, script, meta, etc

Things like this don't work (obviously):

preg_replace('@<\s*(a|img|script|meta)\b.*?>.*?</\1>@si', '', $htmlstring);


Any help? Please!
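
One likely reason the pattern above falls short (an observation, not something stated in the post): every match has to end in a closing tag, so elements such as img and meta, which have no closing tag, are never matched. A minimal sketch of that behaviour, using a made-up sample string and hypothetical variable names:

```php
<?php
// Hypothetical sample input, not taken from the thread.
$html = '<a href="x.php">link</a> <IMG src="a.png"> <meta name="k" />';

// The pattern from the post: a match must end in </tagname>.
$pattern = '@<\s*(a|img|script|meta)\b.*?>.*?</\1>@si';

$result = preg_replace($pattern, '', $html);

// The <a>...</a> pair is removed. The img and meta tags stay,
// because there is no </img> or </meta> for the pattern to find.
echo $result, PHP_EOL;
```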

#2
November 22nd, 2008, 07:52 PM
liljim
Join Date: Jul 2001
Location: England
Posts: 967
PHP Code:
$htmlstring = 'Hi there, < a href="http://www.google.com">What</a> do you want? < img src="someimage.jpg" alt="whatever" />

<script type="text/javascript">Whatever</script>

<meta name="keywords" />';

$htmlstring = preg_replace('!<\s*(a|img|script|meta).*?>((.*?)</\1>)?!is', '\3', $htmlstring);
echo $htmlstring;


Something like that?

#3
November 22nd, 2008, 08:07 PM
Ahhk
Quote:
Originally Posted by liljim
PHP Code:
$htmlstring = 'Hi there, < a href="http://www.google.com">What</a> do you want? < img src="someimage.jpg" alt="whatever" />

<script type="text/javascript">Whatever</script>

<meta name="keywords" />';

$htmlstring = preg_replace('!<\s*(a|img|script|meta).*?>((.*?)</\1>)?!is', '\3', $htmlstring);
echo $htmlstring;


Something like that?


Wow! Thanks for the response. That's much closer than I have gotten.

Two issues I see, though.

It's not removing the content between the tags. For example, the anchor text remains when the A tag is stripped.

It doesn't seem to be working on IMG tags - maybe because they don't have closing tags? I can remove the IMGs separately if that is the reason.

#4
November 22nd, 2008, 08:52 PM
liljim
Content between the tags... Not sure what you mean there.... As in something like below would get totally stripped out?

<badstuff>You want this removed?</badstuff>

#5
November 22nd, 2008, 08:54 PM
Ahhk
Quote:
Originally Posted by liljim
Content between the tags... Not sure what you mean there.... As in something like below would get totally stripped out?

<badstuff>You want this removed?</badstuff>


Yes! Exactly like that.

#6
November 22nd, 2008, 09:14 PM
liljim
Just remove the \3 from the replacement argument of preg_replace(), so you're left with an empty pair of single quotes ('').
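
A sketch of that change, reusing the sample string from the earlier post. The names $pattern, $kept and $stripped are only for illustration:

```php
<?php
// Sample string from the earlier post in this thread.
$htmlstring = 'Hi there, < a href="http://www.google.com">What</a> do you want? < img src="someimage.jpg" alt="whatever" />

<script type="text/javascript">Whatever</script>

<meta name="keywords" />';

$pattern = '!<\s*(a|img|script|meta).*?>((.*?)</\1>)?!is';

// With '\3' the text captured between the tags is put back.
$kept = preg_replace($pattern, '\3', $htmlstring);

// With '' the tags and whatever sits between them are both dropped.
$stripped = preg_replace($pattern, '', $htmlstring);

echo $kept, PHP_EOL, '-----', PHP_EOL, $stripped, PHP_EOL;
```

Note that this pattern has no word boundary after the tag name, so it would also match a tag such as <abbr> as if it were <a>; the \b used in the first post's pattern guards against that.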

#7
November 22nd, 2008, 10:06 PM
Ahhk
Quote:
Originally Posted by liljim
Just remove the \3 from the replacement argument of preg_replace(), so you're left with an empty pair of single quotes ('').


Thanks! That part I initially figured out, but there's something funky going on when I apply it to a large chunk of HTML.

When I pass a simple "<a href='blah.php'>anchor text</a>" it works fine. But when I pass in an entire page of code, it leaves the anchor text behind. How odd is that? I'll look into it further.

One thing I forgot to mention: how would I make this case-insensitive, since there is no pregi_replace?

Thanks much for your help!

#8
November 22nd, 2008, 10:23 PM
liljim
It's already case-insensitive - the 'i' modifier, which is at the end of the expression in the first argument to preg_replace(), takes care of that.
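
A small sketch of what the 'i' modifier changes, using the pattern from earlier in the thread on a made-up input string. The variable names are hypothetical:

```php
<?php
// Hypothetical input mixing upper- and lower-case tag names.
$html = '<SCRIPT>alert(1)</script> kept <Img SRC="a.png">';

// Pattern from earlier in the thread: the trailing "i" makes matching
// case-insensitive, and "s" lets "." match newlines.
$insensitive = '!<\s*(a|img|script|meta).*?>((.*?)</\1>)?!is';
$withI = preg_replace($insensitive, '', $html);

// Same pattern without "i": the upper-case tag names no longer match.
$sensitive = '!<\s*(a|img|script|meta).*?>((.*?)</\1>)?!s';
$withoutI = preg_replace($sensitive, '', $html);

echo $withI, PHP_EOL, $withoutI, PHP_EOL;
```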

Please post the 'code' you're having problems with, since otherwise, it's like peeing in the dark.

Goodnight.

#9
November 22nd, 2008, 11:05 PM
Ahhk
Quote:
Originally Posted by liljim
It's already case-insensitive - the 'i' modifier, which is at the end of the expression in the first argument to preg_replace(), takes care of that.

Please post the 'code' you're having problems with, since otherwise, it's like peeing in the dark.

Goodnight.


Sorry, I was busy wipe'n up the floor in the bathroom...hehe

All I'm doing to test is pasting in the source from this page:

http://developer.yahoo.com/yui/calendar/

#10
November 23rd, 2008, 08:39 PM
Ahhk
Figured out the problem.

I was using htmlspecialchars_decode() instead of html_entity_decode().

So I wasn't decoding all the encoded characters after the POST/GET. Duh...
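
The thread doesn't say which entities were involved, so the following is only a general illustration of the difference between the two functions, on a made-up encoded string: htmlspecialchars_decode() handles just the handful of special characters, while html_entity_decode() handles the other named entities too.

```php
<?php
// Hypothetical encoded input, not taken from the thread.
$encoded = '&lt;a href=&quot;blah.php&quot;&gt;caf&eacute;&nbsp;&copy;&lt;/a&gt;';

// Decodes only the special characters (&amp;, quotes, &lt;, &gt;).
$partial = htmlspecialchars_decode($encoded, ENT_QUOTES);

// Decodes every named entity it knows, including &eacute;, &nbsp;, &copy;.
$full = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');

echo $partial, PHP_EOL; // still contains &eacute;, &nbsp; and &copy;
echo $full, PHP_EOL;    // no entities left
```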

Thanks liljim!
