
June 12th, 2000, 08:11 AM
I doubt you could do this with a single regular expression. If the page contains JavaScript, for example, there might be many < characters that would throw your regexp off:
<script language='javascript'>
if (x < 10 && y > 20) { ...
In this case, you would get < 10 && y > as an HTML tag. Similarly, if the content of the page contained any literal < characters, your regexp would be thrown off in the same way.
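To see the problem concretely, here is a small sketch in Python. The pattern `<[^>]*>` is only my guess at the kind of "match anything between < and >" regexp you might try, and the sample page is made up for illustration:

```python
import re

# A made-up page fragment containing a comparison inside a script block.
html = (
    "<script language='javascript'>\n"
    "if (x < 10 && y > 20) { x = 1; }\n"
    "</script>"
)

# Naive approach (an assumption about what a first attempt looks like):
# treat everything between a '<' and the next '>' as a tag.
naive_tags = re.findall(r"<[^>]*>", html)

for tag in naive_tags:
    print(repr(tag))
```

The second item in the result is the bogus "tag" `< 10 && y >`, sitting between the real opening and closing script tags.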
You might want to write a loop that checks the file one character at a time. When it hits a <script or <SCRIPT, make it skip ahead until it gets to a </script or </SCRIPT (unless, of course, you also want to capture any HTML tags embedded in the JavaScript). Then you'll have to come up with similar rules to handle any < characters that might be in the page content.
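A rough sketch of that loop, in Python, might look like the following. The names `extract_tags` and `_starts_with_ci` are just ones I made up for illustration, and this only covers the script-skipping rule. It does not handle stray < characters in the page content, HTML comments, or > characters inside attribute values, and it would also treat a tag like <scripted> as a script tag.

```python
def _starts_with_ci(text, prefix, pos):
    """Case-insensitive check that `text` has `prefix` at index `pos`."""
    return text[pos:pos + len(prefix)].lower() == prefix


def extract_tags(html):
    """Walk the text one character at a time and collect tags,
    skipping over the body of any <script> ... </script> block."""
    tags = []
    i = 0
    n = len(html)
    while i < n:
        if html[i] != "<":
            i += 1
            continue
        end = html.find(">", i)
        if end == -1:
            break  # unterminated tag: give up rather than guess
        tags.append(html[i:end + 1])
        if _starts_with_ci(html, "<script", i):
            # Skip ahead until the closing </script (any letter case).
            j = end + 1
            while j < n and not _starts_with_ci(html, "</script", j):
                j += 1
            i = j  # resume at the closing tag so it is collected too
        else:
            i = end + 1
    return tags


page = (
    "<html><SCRIPT language='javascript'>"
    "if (x < 10 && y > 20) { }"
    "</SCRIPT><b>hi</b></html>"
)
print(extract_tags(page))
```

With this sample page, the comparison inside the script block is skipped and only the real tags come back. If you do want tags embedded in the JavaScript, you would drop the skipping branch and need some other rule to tell them apart from comparisons.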