August 15th, 2012, 04:18 AM
Asp.net regex replace (VB)
Hi all. I'm trying to remove the <font face> tag but leave the data found within the tag intact. So far I'm using:
story = Regex.Replace(story, "</?(font face)[^>]*>", string.Empty, RegexOptions.IgnoreCase)
This works well, but it doesn't remove the </font> tag. How can I also remove the closing tag? Thanks.
August 15th, 2012, 04:30 AM
Since you require "face" in the pattern, a pure "font" tag won't be found.
This won't really work, anyway, because you'd need to find the exact end tag. That's not possible with regular expressions (if the elements can be nested).
The clean solution would be to use an HTML parser and actually remove the elements. Otherwise you'll have to combine the regex with some kind of tag counter to find the matching end tag (which is really not that pretty).
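For illustration, the "regex plus tag counter" idea could look roughly like the sketch below. It is only one possible reading of that suggestion, not code from this thread: the module name, the function name StripFontFaceElements and both patterns are made up for the example, and it assumes the font tags in the input are properly balanced. Every font start tag pushes a flag onto a stack saying whether it was dropped, and every end tag pops that flag to decide whether it should be dropped too.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module FontTagStripper
    ' Matches any opening or closing font tag; the "close" group captures the slash of an end tag.
    Private ReadOnly FontTag As New Regex("<(?<close>/)?font\b[^>]*>", RegexOptions.IgnoreCase)
    ' Detects a face attribute inside an opening tag (hypothetical helper pattern).
    Private ReadOnly FaceAttr As New Regex("\sface\s*=", RegexOptions.IgnoreCase)

    Function StripFontFaceElements(html As String) As String
        ' One entry per currently open font element: True if its start tag was dropped.
        Dim dropped As New Stack(Of Boolean)()
        Return FontTag.Replace(html,
            Function(m As Match) As String
                If m.Groups("close").Success Then
                    ' End tag: drop it only if the matching start tag was dropped.
                    If dropped.Count > 0 AndAlso dropped.Pop() Then Return String.Empty
                    Return m.Value
                End If
                ' Start tag: remember whether it carries a face attribute.
                Dim hasFace As Boolean = FaceAttr.IsMatch(m.Value)
                dropped.Push(hasFace)
                Return If(hasFace, String.Empty, m.Value)
            End Function)
    End Function

    Sub Main()
        Console.WriteLine(StripFontFaceElements("<font face=""arial"">a <font color=""red"">b</font> c</font>"))
        ' prints: a <font color="red">b</font> c
    End Sub
End Module
```

Unbalanced markup (a stray </font>, or a start tag that is never closed) will throw the counter off, which is one more reason a real HTML parser is the safer route.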
August 15th, 2012, 04:34 AM
I see. What about leaving the font tags but simply removing anything found in "face"?
Edit: done using
story = Regex.Replace(story, "font face=[^>]*", "font", RegexOptions.IgnoreCase)
<font face="arial">test</font> BECOMES <font>test</font>
Last edited by nshack31; August 15th, 2012 at 04:49 AM.
August 15th, 2012, 06:20 AM
Well, then I don't get what you're trying to do.
I thought you wanted to remove the tags of every "font" element with a "face" attribute? If you remove the attributes, you can no longer distinguish between "normal" font elements and those with a "face" attribute.
August 15th, 2012, 06:42 AM
The font family is standardised in the CSS, so I didn't want any <font face> tags in the body overriding this, but I wanted to keep the font size and color tags.
August 15th, 2012, 02:12 PM
Just add .Replace("</font>", "") to the end of that line so it's:
story = Regex.Replace(story, "</?(font face)[^>]*>", string.Empty, RegexOptions.IgnoreCase).Replace("</font>", "")
August 16th, 2012, 04:42 AM
You're confusing the words. "font" is the HTML element, and "face", "color" etc. are attributes of this element.
So if I understand you correctly now, you want to remove the "face" attribute from any "font" element. The cleanest way for this would really be to use a DOM parser. Alternatively, you could try this pattern:
(You might need to escape the double quotes)
And then concatenate both matching groups to build the replacement string.
However, this will only find the double-quoted attribute syntax face="...".
It won't find face='...' or other variants. If you want those, too, the regular expression will become more and more complex.
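The pattern itself did not survive in this copy of the thread, so the following is only a guess at what a two-group approach could look like, not the pattern originally posted. The module name, the function name RemoveFaceAttribute and the pattern are assumptions made for the example. Group 1 captures everything before the face attribute, group 2 everything after it, and the replacement "$1$2" concatenates the two.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module FaceAttributeRemover
    ' Group 1: "<font" plus any attributes that come before face.
    ' Group 2: any attributes after face, up to and including ">".
    ' The double quotes are doubled because this is a VB string literal.
    Private Const Pattern As String = "(<font\b[^>]*?)\s+face\s*=\s*""[^""]*""([^>]*>)"

    Function RemoveFaceAttribute(html As String) As String
        ' "$1$2" glues both groups back together, leaving out the face attribute.
        Return Regex.Replace(html, Pattern, "$1$2", RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        Console.WriteLine(RemoveFaceAttribute("<font face=""arial"" size=""2"">test</font>"))
        ' prints: <font size="2">test</font>
    End Sub
End Module
```

Unlike "font face=[^>]*", this keeps size and color attributes whether they come before or after face. As noted, it only handles double-quoted values.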
October 4th, 2012, 05:36 AM
Sorry, another question regarding something similar.
If I wanted to remove all instances of "font-family", e.g.
I can use the following:
story = Regex.Replace(story, "font-family[^>]*", """", RegexOptions.IgnoreCase)
But if the line was
it would become
I'd need to leave the font-size intact and only remove the font-family attribute.
Last edited by nshack31; October 4th, 2012 at 06:12 AM.
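One possible way to do that is to stop the match at the end of the font-family declaration instead of at the closing ">". This is a rough sketch rather than a tested solution: the module name, the function name RemoveFontFamily, the pattern and the sample input are all made up for illustration (the original example lines are not shown above), and it assumes the style attribute is wrapped in double quotes.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module FontFamilyRemover
    ' Matches a single "font-family: ...;" declaration: the property name, a colon,
    ' then everything up to the next semicolon, double quote, "}" or ">".
    Private Const Pattern As String = "font-family\s*:[^;""}>]*;?\s*"

    Function RemoveFontFamily(html As String) As String
        ' Only the font-family declaration is removed; other declarations stay in place.
        Return Regex.Replace(html, Pattern, String.Empty, RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        Console.WriteLine(RemoveFontFamily("<span style=""font-family: arial; font-size: 12px"">x</span>"))
        ' prints: <span style="font-size: 12px">x</span>
    End Sub
End Module
```

If font-family is the only declaration, this leaves an empty style="" behind, and a style attribute written with single quotes would need the character class adjusted.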