August 15th, 2012, 03:18 AM
-
Asp.net regex replace (VB)
Hi all. I'm trying to remove the <font face> tag but leave the data that is found within the tag intact. So far I'm using
Code:
story = Regex.Replace(story, "</?(font face)[^>]*>", string.Empty, RegexOptions.IgnoreCase)
This works well, but it doesn't remove the </font> tag. How can I also remove the closing tag? Thanks.
August 15th, 2012, 03:30 AM
-
Hi,
Since you require "face" in the pattern, a pure "font" tag won't be found.
This won't really work, anyway, because you'd need to find the exact end tag. That's not possible with regular expressions (if the elements can be nested).
The clean solution to this would be to use an HTML parser and actually remove the elements. Otherwise you'll have to mix the regex with a kind of tag counter to find the matching end tag (which is really not that pretty).
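To make the parser idea concrete, here is a minimal sketch in Python rather than VB, using only the standard library's html.parser. The names FontFaceUnwrapper and strip_font_face are made up for this illustration. It keeps a stack with one entry per open <font> so that each </font> is paired with the right opening tag, which is exactly the part a plain regex cannot do. It assumes reasonably well-formed markup and writes end tags in lower case.

```python
from html.parser import HTMLParser


class FontFaceUnwrapper(HTMLParser):
    """Drop <font face=...> tags and their matching </font>, keep the content."""

    def __init__(self):
        # convert_charrefs=False so entities such as &amp; pass through unchanged
        super().__init__(convert_charrefs=False)
        self.out = []         # pieces of the rewritten markup
        self.font_stack = []  # one bool per open <font>: True means "tag was dropped"

    def handle_starttag(self, tag, attrs):
        if tag == "font":
            drop = any(name == "face" for name, _ in attrs)
            self.font_stack.append(drop)
            if drop:
                return  # swallow the opening tag, content still flows through
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # self-closing tags like <br/> are copied as they are
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # pop the flag of the <font> this end tag closes; drop it if the start was dropped
        if tag == "font" and self.font_stack and self.font_stack.pop():
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def strip_font_face(html):
    """Hypothetical helper: return html with every <font face=...> element unwrapped."""
    parser = FontFaceUnwrapper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Font elements without a "face" attribute keep both of their tags, even when they are nested inside one that gets removed. In .NET you would do the same with whatever HTML parsing library you have available.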
August 15th, 2012, 03:34 AM
-
I see. What about leaving the font tags but simply removing anything found in "face"?
e.g.
Code:
<font face="arial">test</font> BECOMES.. <font>test</font>
Edit: done using
Code:
story = Regex.Replace(story, "font face=[^>]*", "font", RegexOptions.IgnoreCase)
Last edited by nshack31; August 15th, 2012 at 03:49 AM.
August 15th, 2012, 05:20 AM
-
Well, then I don't get what you're trying to do.
I thought you wanted to remove the tags of every "font" element with a "face" attribute? If you remove the attributes, you can no longer distinguish between "normal" font elements and those with a "face" attribute.
August 15th, 2012, 05:42 AM
-
Originally Posted by Jacques1
Well, then I don't get what you're trying to do.
I thought you wanted to remove the tags of every "font" element with a "face" attribute? If you remove the attributes, you can no longer distinguish between "normal" font elements and those with a "face" attribute.
The font family is standardised in the CSS, so I didn't want any <font face> tags in the body overriding this, but I wanted to keep font size and color tags.
August 15th, 2012, 01:12 PM
-
Just add .replace("</font>", "") to the end of that line so it's:
Code:
story = Regex.Replace(story, "</?(font face)[^>]*>", string.Empty, RegexOptions.IgnoreCase).replace("</font>", "")
Good?
August 16th, 2012, 03:42 AM
-
Originally Posted by nshack31
The font family is standardised in the CSS, so I didn't want any <font face> tags in the body overriding this, but I wanted to keep font size and color tags.
You're confusing the words. "font" is the HTML element, and "face", "color" etc. are attributes of this element.
So if I understand you correctly now, you want to remove the "face" attribute from any "font" element. The cleanest way for this would really be to use a DOM parser. Alternatively, you could try this pattern:
(<font[^>]+)face="[^"]*"([^>]*>)
(In a VB string literal you need to double each double quote.)
Then concatenate both matching groups to build the replacement string, i.e. "$1$2".
However, this will only find the attribute syntax
face="..."
It won't find face='...' or an unquoted value. If you want those, too, the regular expression will become more and more complex.
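For what it's worth, here is a rough sketch of what the "more complex" version could look like, written in Python rather than VB. FACE_ATTR and remove_face are names invented for this example. The pattern accepts double-quoted, single-quoted and unquoted values and removes only the first "face" attribute in each <font> tag. It is still a regex, so it can be fooled by odd markup such as a ">" or the text " face=" inside another attribute's value.

```python
import re

# Group 1 captures "<font" plus any attributes that come before face.
# The face value may be "double-quoted", 'single-quoted', or unquoted.
FACE_ATTR = re.compile(
    r"""(<font\b[^>]*?)\s+face\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
    re.IGNORECASE,
)


def remove_face(html):
    """Hypothetical helper: strip the face attribute, keep the rest of each font tag."""
    return FACE_ATTR.sub(r"\1", html)


print(remove_face('<font face="arial" size="2">test</font>'))
print(remove_face("<font color=red face='Times New Roman'>test</font>"))
```

The pattern uses nothing Python-specific, so it should carry over to Regex.Replace with "$1" as the replacement and RegexOptions.IgnoreCase, but I have not run it in .NET.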
October 4th, 2012, 04:36 AM
-
Sorry, another question regarding something similar.
If I wanted to remove all instances of "font-family", e.g.
font-family:'Calibri','sans-serif';
I can use the following:
Code:
story = Regex.Replace(story, "font-family[^>]*", """", RegexOptions.IgnoreCase)
But if the line was
Code:
line-height:115%;font-family:'Calibri','sans-serif';font-size:12pt
it would become
Code:
line-height:115%;
I'd need to leave the font-size intact and only remove the font-family declaration.
Last edited by nshack31; October 4th, 2012 at 05:12 AM.
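One possible way to do this is to stop the match at the end of the declaration instead of at ">". The sketch below is in Python rather than VB, and the names FONT_FAMILY and remove_font_family are invented for the example. It assumes the style attribute is delimited by double quotes and that the value contains no ";" (so no HTML entities such as &quot; inside it). The lookbehind keeps it from touching longer property names that merely end in "font-family".

```python
import re

# Match one font-family declaration: the property name, the colon, and the
# value up to (and including) the next ";", stopping early at '"' or ">".
FONT_FAMILY = re.compile(
    r"""(?<![\w-])font-family\s*:[^;">]*;?\s*""",
    re.IGNORECASE,
)


def remove_font_family(text):
    """Hypothetical helper: delete font-family declarations, keep their neighbours."""
    return FONT_FAMILY.sub("", text)


print(remove_font_family(
    "line-height:115%;font-family:'Calibri','sans-serif';font-size:12pt"
))
# prints: line-height:115%;font-size:12pt
```

The same pattern with String.Empty as the replacement should behave the same way in Regex.Replace, though I have only checked it in Python.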