February 19th, 2013, 09:42 PM
Need help with HTML tags Regex
Ok so I have the following regex expression for identifying html tags:
However, this highlights everything :/ please assist asap. Thanks!
February 20th, 2013, 07:20 AM
So :%! and 123 are valid HTML tags? That would surprise me ...
Your regex just consists of a repeated character class, so any (non-empty) combination of those characters is considered valid. That obviously makes no sense. HTML tags usually look like this:
<input type="text" name="password" />
But maybe you mean something different?
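To illustrate the point about repeating a character class, here is a minimal Python sketch. The pattern from the first post is not shown in this thread, so the character class below is a hypothetical stand-in, and `looks_valid` is just an illustrative helper name.

```python
import re

# Hypothetical stand-in for the pattern in the first post (the real one is not shown).
# It is a single character class repeated one or more times.
CHAR_CLASS_ONLY = re.compile(r'[A-Za-z0-9<>/="\s:%!]+')

def looks_valid(text):
    """Return True if the whole string matches the repeated character class."""
    return CHAR_CLASS_ONLY.fullmatch(text) is not None

# Any non-empty mix of the listed characters passes, whether it is a tag or not.
for candidate in [':%!', '123', '<input type="text" name="password" />']:
    print(repr(candidate), looks_valid(candidate))
```

All three candidates are accepted, because nothing in such a pattern requires a `<` at the start or a `>` at the end.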
Note that processing HTML with regexes is a really, really bad idea 99% of the time -- although many people seem to love it. Contrary to popular belief, regexes are not an all-powerful parsing tool. They are in fact very limited and can only parse subsets of HTML. So whenever you find yourself trying to parse HTML with regexes, step back and consider using a different approach. Every mainstream language has specialized HTML parsers for exactly that purpose.
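As one example of such a parser, here is a rough sketch using `html.parser.HTMLParser` from the Python standard library. The thread does not say which language is in use, so Python is an assumption here, and `TagCollector` and `collect_tags` are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Collect opening (and self-closing) tags together with their attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.tags.append((tag, dict(attrs)))

def collect_tags(markup):
    """Feed the markup to the parser and return the collected (tag, attributes) pairs."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.tags

print(collect_tags('<input type="text" name="password" />'))
```

The parser does the tokenizing, so the calling code only deals with tag names and attribute dictionaries rather than raw text.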
February 20th, 2013, 11:45 AM
How would I modify this?
I am trying to identify all HTML tags that have characters, numbers, or symbols between them. How would I do that?
February 20th, 2013, 04:02 PM
As Jacques said, it is almost always a bad idea to use regexes to try to parse HTML (or XML, for that matter).
If you really want to go this way (which could possibly be tolerated for extremely simple operations), you could try something like this:
which means an opening <, followed by any number of characters other than a closing >, followed by a closing >.
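The pattern itself is not shown in this thread. Going only by the description, it would be something like `<[^>]+>` (whether the original used `+` or `*` is a guess). A minimal Python sketch under that assumption, with `find_tags` as a hypothetical helper name:

```python
import re

# Assumed pattern, reconstructed from the description:
# "<", then one or more characters that are not ">", then ">"
TAG = re.compile(r'<[^>]+>')

def find_tags(text):
    """Return every substring that looks like a tag, in order of appearance."""
    return TAG.findall(text)

print(find_tags('Password: <input type="text" name="password" /> <br>'))
```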
This is simplistic, but at least it will not consider this:
<center><b><font face="Verdana">Foo Bar </font></b></center>
as one single long tag running from the opening < at the beginning of the line to the closing > at the end of the line, but will be more or less able to match tags individually.
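The difference can be checked with a short Python sketch that compares a greedy "match anything" pattern with a negated character class on that line. Both patterns are assumptions for illustration, since the thread does not show the originals.

```python
import re

line = '<center><b><font face="Verdana">Foo Bar </font></b></center>'

# "." also matches ">", so the greedy match runs on to the last ">" of the line.
greedy = re.findall(r'<.+>', line)

# "[^>]" cannot cross a ">", so each match stops at the first ">" after its "<".
per_tag = re.findall(r'<[^>]+>', line)

print(greedy)    # one match covering the whole line
print(per_tag)   # six separate tags
```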
However, this will break, for example, if a tag spans more than one line and the input is read line by line, and in many other circumstances. In brief, don't do that except possibly as a one-shot script for extremely simple substitutions.
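A few of those failure cases, sketched in Python with the pattern assumed from the description in this thread (`<[^>]+>`). The sample strings are made up for illustration.

```python
import re

TAG = re.compile(r'<[^>]+>')  # assumed pattern, reconstructed from the description

# 1. A ">" inside a quoted attribute value ends the match too early.
attr_case = TAG.findall('<a title="a > b">link</a>')
print(attr_case)

# 2. A tag split across two lines is missed when the input is matched line by line.
text = '<input type="text"\n       name="password" />'
multiline_case = [m for line in text.splitlines() for m in TAG.findall(line)]
print(multiline_case)

# 3. Text that is not a tag at all can still match.
not_a_tag = TAG.findall('if a < b and c > d')
print(not_a_tag)
```

The first case returns a truncated tag, the second returns nothing, and the third reports a "tag" in plain text. Matching the whole text at once instead of line by line would rescue the second case, since `[^>]` also matches newlines, but not the other two.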