
February 4th, 2003, 09:39 PM
A timely topic for me too.
My RSS-generating script (created by jpenn, from the PHP forum) encountered an "&" and choked. I tried htmlentities() and htmlspecialchars() in the hope of sliding by, but no dice.
I'm assuming that we must create a mega preg_replace script like the one below, except replacing with ASCII?
PHP Code:
// $document should contain an HTML document.
// This will remove HTML tags, javascript sections
// and white space. It will also convert some
// common HTML entities to their text equivalent.
$search = array ("'<script[^>]*?>.*?</script>'si",  // Strip out javascript
                 "'<[\/\!]*?[^<>]*?>'si",           // Strip out html tags
                 "'([\r\n])[\s]+'",                 // Strip out white space
                 "'&(quot|#34);'i",                 // Replace html entities
                 "'&(amp|#38);'i",
                 "'&(lt|#60);'i",
                 "'&(gt|#62);'i",
                 "'&(nbsp|#160);'i",
                 "'&(iexcl|#161);'i",
                 "'&(cent|#162);'i",
                 "'&(pound|#163);'i",
                 "'&(copy|#169);'i",
                 "'&#(\d+);'e");                    // evaluate as php
$replace = array ("",
                  "",
                  "\\1",
                  "\"",
                  "&",
                  "<",
                  ">",
                  " ",
                  chr(161),
                  chr(162),
                  chr(163),
                  chr(169),
                  "chr(\\1)");
$text = preg_replace ($search, $replace, $document);
as shown in the manual (http://www.php.net/manual/en/function.preg-replace.php),
or is there a more direct way?
Alex
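For what it's worth, one possibly more direct route is sketched below. It is only a sketch under assumptions: the helper name rss_escape() is hypothetical (it is not part of jpenn's script), the input is assumed to be UTF-8, and it needs a reasonably modern PHP (7 or later) for the type hints and the ENT_XML1 / ENT_HTML5 flags. The idea is to decode whatever HTML entities are already in the text, then escape only the characters XML treats specially, so a bare "&" and an existing "&amp;" both come out as "&amp;".

```php
<?php
// Hypothetical helper, not from the original script: make a string safe
// to place inside an RSS/XML element. Assumes UTF-8 input.
function rss_escape(string $text): string
{
    // Step 1: turn existing HTML entities (&amp;, &lt;, &#38; ...) back into
    // real characters, so that nothing ends up escaped twice.
    $decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Step 2: escape only the characters that are special in XML.
    return htmlspecialchars($decoded, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

echo rss_escape('Tom & Jerry <b>'), "\n";  // prints: Tom &amp; Jerry &lt;b&gt;
echo rss_escape('AT&amp;T'), "\n";         // prints: AT&amp;T
```

If the tags should be removed rather than escaped, strip_tags() could be applied to the text before calling the helper.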