#1
  1. No Profile Picture
    Swimming in a fish bowl....
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2008
    Location
    Texas, Y'all!
    Posts
    133
    Rep Power
    17

    Stripping Only Certain HTML Tags (and contents)


    OK, I've been trying every way I can think of to do this, and nothing is working right in PHP.

    I basically want to strip a set of HTML tags from a string, removing the content between those tags as well, while allowing for differences in case and stray spaces inside the tags (such as < img src...>).

    strip_tags removes everything except a whitelist of allowed tags. That is the opposite of what I need. I only want to strip a certain set of tags:

    a, img, script, meta, etc.

    Things like this don't work (obviously):

    preg_replace('@<\s*(a|img|script|meta)\b.*?>.*?</\1>@si', '', $htmlstring);


    Any help? Please!
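
For readers landing here, a minimal sketch of the blacklist approach described in this post. It assumes the tags fall into two groups: paired tags (removed together with their contents) and void tags such as img and meta, which have no closing tag. The function name strip_listed_tags and its parameters are made up for illustration and do not come from the thread. Being regex-based, it will trip over nested tags of the same name or a ">" inside an attribute value, so treat it as a starting point rather than a sanitizer.

```php
<?php
// Hypothetical helper, not from the thread. $paired tags are removed together
// with their contents; $void tags (no closing tag) are removed on their own.
function strip_listed_tags(string $html, array $paired, array $void): string
{
    if ($paired) {
        $names = implode('|', array_map('preg_quote', $paired));
        // "<", optional spaces, tag name, \b so "a" does not match "abbr",
        // attributes, ">", contents, then the matching closing tag.
        $re = '@<\s*(' . $names . ')\b[^>]*>.*?<\s*/\s*\1\s*>@is';
        $html = preg_replace($re, '', $html) ?? $html;
    }
    if ($void) {
        $names = implode('|', array_map('preg_quote', $void));
        // Covers both <img ...> and <img ... />.
        $re = '@<\s*(?:' . $names . ')\b[^>]*>@is';
        $html = preg_replace($re, '', $html) ?? $html;
    }
    return $html;
}

$html = 'x< A href="#">link text</a>y<IMG src="i.png" />z<abbr>kept</abbr>';
echo strip_listed_tags($html, ['a', 'script'], ['img', 'meta']);
// prints: xyz<abbr>kept</abbr>
```

Handling void tags in a separate pass sidesteps the question of whether a closing tag exists, and the return value of preg_replace has to be assigned back, since the function does not modify its input in place.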
  2. #2
  3. Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jul 2001
    Location
    England
    Posts
    967
    Rep Power
    14
    PHP Code:
    $htmlstring = 'Hi there, < a href="http://www.google.com">What</a> do you want? < img src="someimage.jpg" alt="whatever" />

    <script type="text/javascript">Whatever</script>

    <meta name="keywords" />';

    $htmlstring = preg_replace('!<\s*(a|img|script|meta).*?>((.*?)</\1>)?!is', '\3', $htmlstring);
    echo $htmlstring;
    Something like that?
  4. #3
  5. No Profile Picture
    Swimming in a fish bowl....
    Devshed Newbie (0 - 499 posts)

    Wow! Thanks for the response. That's much closer than I have gotten.

    Two issues I see, though.

    It's not removing the content between the tags. For example, the anchor text remains when the A tag is stripped.

    It doesn't seem to be working on IMG tags, maybe because they don't have closing tags? I can remove the IMGs separately if that is the reason.
  6. #4
  7. Contributing User
    Devshed Novice (500 - 999 posts)

    Content between the tags... not sure what you mean there. As in, something like the below would get totally stripped out?

    <badstuff>You want this removed?</badstuff>
  8. #5
  9. No Profile Picture
    Swimming in a fish bowl....
    Devshed Newbie (0 - 499 posts)

    Yes! Exactly like that.
  10. #6
  11. Contributing User
    Devshed Novice (500 - 999 posts)

    Just remove the \3 from the replacement argument of preg_replace, so you're left with an empty pair of single quotes ('').
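
To make the effect of that change concrete, here is a small sketch using the pattern from post #2. The test string is shortened and made up for illustration. In the replacement, \3 refers to the third capture group, which holds the text between the opening and closing tag, so putting it back keeps the anchor text while an empty replacement drops it.

```php
<?php
// Pattern from post #2; the input string is a shortened, made-up sample.
$pattern = '!<\s*(a|img|script|meta).*?>((.*?)</\1>)?!is';
$html = '< a href="#">What</a> and < img src="x.jpg" />';

// '\3' re-inserts whatever sat between <a ...> and </a>.
$kept = preg_replace($pattern, '\3', $html);

// An empty replacement removes the tags and their contents.
$dropped = preg_replace($pattern, '', $html);

var_export($kept);    // 'What and '
echo "\n";
var_export($dropped); // ' and '
```

Note that the img tag is removed in both cases: the closing-tag part of the pattern is optional, so a tag without a closing partner still matches on its own.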
  12. #7
  13. No Profile Picture
    Swimming in a fish bowl....
    Devshed Newbie (0 - 499 posts)

    Thanks! That part I initially figured out, but there's something funky going on when I apply it to a large chunk of HTML.

    When I pass a simple "<a href='blah.php'>anchor text</a>" it works fine. But when I pass in an entire page of code, it leaves the anchor text behind. How odd is that? I'll look into it further.

    One thing I forgot to mention: how would I make this case-insensitive, since there is no pregi_replace?

    Thanks much for your help!
  14. #8
  15. Contributing User
    Devshed Novice (500 - 999 posts)

    It's already case-insensitive: the 'i' modifier, at the end of the expression in the first argument to preg_replace(), takes care of that.

    Please post the 'code' you're having problems with, since otherwise, it's like peeing in the dark.

    Goodnight.
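
A quick way to see the 'i' modifier at work is to run the same pattern with and without it. This is only an illustrative sketch: the pattern is a simplified anchor-only version and the input string is made up.

```php
<?php
// Made-up input with one upper-case and one lower-case anchor.
$html = '<A HREF="#">one</A><a href="#">two</a>';

// Without "i", only the lower-case <a>...</a> matches.
$sensitive = preg_replace('!<\s*a\b[^>]*>.*?</a>!s', '', $html);

// With "i", both anchors match regardless of case.
$insensitive = preg_replace('!<\s*a\b[^>]*>.*?</a>!is', '', $html);

var_export($sensitive);   // '<A HREF="#">one</A>'
echo "\n";
var_export($insensitive); // ''
```

The modifiers go after the closing delimiter of the pattern, which is why PCRE needs no separate case-insensitive function.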
  16. #9
  17. No Profile Picture
    Swimming in a fish bowl....
    Devshed Newbie (0 - 499 posts)

    Sorry, I was busy wipe'n up the floor in the bathroom...hehe

    All I'm doing to test is pasting in the source from this page:

    http://developer.yahoo.com/yui/calendar/
  18. #10
  19. No Profile Picture
    Swimming in a fish bowl....
    Devshed Newbie (0 - 499 posts)

    Figured out the problem.

    I was using htmlspecialchars_decode instead of html_entity_decode, so I wasn't decoding all of the encoded characters after the POST/GET. Duh...

    Thanks liljim!
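
For anyone hitting the same thing: the thread does not show the decoding code, so the following is only a sketch of the general difference between the two functions, using a made-up input string. htmlspecialchars_decode converts just the handful of special-character entities (such as &lt;, &gt; and &amp;), while html_entity_decode also converts the other named entities.

```php
<?php
// Made-up sample: encoded markup containing a named entity (&copy;).
$encoded = '&lt;p&gt;&copy; 2008&lt;/p&gt;';

// Only the special characters are decoded; &copy; is left untouched.
$partial = htmlspecialchars_decode($encoded);

// All named entities are decoded; flags and charset are given explicitly
// here so the result does not depend on the PHP version's defaults.
$full = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML401, 'UTF-8');

var_export($partial); // '<p>&copy; 2008</p>'
echo "\n";
var_export($full);    // '<p>© 2008</p>'
```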
