#1
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2005
    Posts
    387
    Rep Power
    21

    Asp.net regex replace (VB)


    Hi all. I'm trying to remove the <font face> tag but leave the content inside the tag intact. So far I'm using

    Code:
    story = Regex.Replace(story, "</?(font face)[^>]*>", String.Empty, RegexOptions.IgnoreCase)
    This works well, but it doesn't remove the </font> tag. How can I also remove the closing tag? Thanks.
  2. #2
  3. --
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Jul 2012
    Posts
    3,957
    Rep Power
    1046
    Hi,

    Since you require "face" in the pattern, a pure "font" tag won't be found.

    This won't really work, anyway, because you'd need to find the exact end tag. That's not possible with regular expressions (if the elements can be nested).

    The clean solution would be to use an HTML parser and actually remove the elements. Otherwise you'll have to combine the regex with some kind of tag counter to find the matching end tag (which is really not that pretty).
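
    To illustrate the tag-counter idea, here is a rough sketch in VB. The module and function names are made up for the example, and it assumes the font tags are well-formed and properly nested; it is not a substitute for a real parser.

    ```vb
    Imports System
    Imports System.Collections.Generic
    Imports System.Text.RegularExpressions

    Module FontUnwrapSketch
        ' Drops every <font ...> opening tag that has a face attribute, together
        ' with its matching </font>. Other font elements are left untouched.
        Function UnwrapFaceFonts(html As String) As String
            ' One entry per currently open font element: True if its opening tag was dropped.
            Dim removed As New Stack(Of Boolean)()
            Dim evaluator As MatchEvaluator =
                Function(m As Match) As String
                    If m.Groups(1).Value = "/" Then
                        ' Closing tag: drop it only if its opening tag was dropped.
                        If removed.Count > 0 AndAlso removed.Pop() Then Return String.Empty
                        Return m.Value
                    End If
                    Dim hasFace As Boolean = Regex.IsMatch(m.Value, "\sface\s*=", RegexOptions.IgnoreCase)
                    removed.Push(hasFace)
                    Return If(hasFace, String.Empty, m.Value)
                End Function
            Return Regex.Replace(html, "<(/?)font\b[^>]*>", evaluator, RegexOptions.IgnoreCase)
        End Function

        Sub Main()
            Dim story As String = "<font face=""arial"">a <font color=""red"">b</font> c</font>"
            Console.WriteLine(UnwrapFaceFonts(story))
            ' prints: a <font color="red">b</font> c
        End Sub
    End Module
    ```

    The stack is what lets the nested </font> survive while the outer one is removed. With unbalanced or broken markup the pairing will drift, which is why a parser is still the safer route.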
  4. #3
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2005
    Posts
    387
    Rep Power
    21
    I see. What about leaving the font tags but simply removing anything found in "face"?

    E.g.
    Code:
    <font face="arial">test</font> BECOMES <font>test</font>
    Edit: done using
    Code:
    story = Regex.Replace(story, "font face=[^>]*", "font", RegexOptions.IgnoreCase)
    Last edited by nshack31; August 15th, 2012 at 03:49 AM.
  6. #4
  7. --
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Jul 2012
    Posts
    3,957
    Rep Power
    1046
    Well, then I don't get what you're trying to do.

    I thought you wanted to remove the tags of every "font" element with a "face" attribute? If you remove the attributes, you can no longer distinguish between "normal" font elements and those with a "face" attribute.
  8. #5
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2005
    Posts
    387
    Rep Power
    21
    Originally Posted by Jacques1
    Well, then I don't get what you're trying to do.

    I thought you wanted to remove the tags of every "font" element with a "face" attribute? If you remove the attributes, you can no longer distinguish between "normal" font elements and those with a "face" attribute.
    The font family is standardised in the CSS, so I didn't want any <font face> tags in the body overriding it, but I wanted to keep the font size and color tags.
  10. #6
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2011
    Posts
    5
    Rep Power
    0
    Just add .replace("</font>", "") to the end of that line so it's:
    Code:
     story = Regex.Replace(story, "</?(font face)[^>]*>", string.Empty, RegexOptions.IgnoreCase).replace("</font>", "")
    Good?

    Comments on this post

    • delboy31 agrees : that works too, thanks
  12. #7
  13. --
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Jul 2012
    Posts
    3,957
    Rep Power
    1046
    Originally Posted by nshack31
    The font family is standardised in the CSS, so I didn't want any <font face> tags in the body overriding it, but I wanted to keep the font size and color tags.
    You're confusing the words. "font" is the HTML element, and "face", "color" etc. are attributes of this element.

    So if I understand you correctly now, you want to remove the "face" attribute from any "font" element. The cleanest way for this would really be to use a DOM parser. Alternatively, you could try this pattern:

    (<font[^>]+)face="[^"]*"([^>]*>)

    (You might need to escape the double quotes)

    And then concatenate both matching groups to build the replacement string.

    However, this will only find the attribute syntax
    face="..."
    It won't find face='...' or something. If you want that, too, the regular expression will become more and more complex.
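
    As a sketch of how this could look in VB: the two groups are joined with $1$2 in the replacement string, and the pattern is widened a little (my own variation, not tested against your data) so that it accepts both face="..." and face='...'. Unquoted values such as face=arial are still not covered.

    ```vb
    Imports System
    Imports System.Text.RegularExpressions

    Module RemoveFaceSketch
        Sub Main()
            Dim story As String = "<font face=""arial"" size=""2"">test</font>"
            ' Group 1: the tag up to the face attribute. Group 2: the rest of the tag.
            ' Double quotes inside a VB string literal are escaped by doubling them.
            Dim pattern As String = "(<font\b[^>]*?)\s+face\s*=\s*(?:""[^""]*""|'[^']*')([^>]*>)"
            story = Regex.Replace(story, pattern, "$1$2", RegexOptions.IgnoreCase)
            Console.WriteLine(story)
            ' prints: <font size="2">test</font>
        End Sub
    End Module
    ```

    Because only the face attribute is cut out, other attributes such as size and color stay in place, and the closing </font> tags remain balanced.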

    Comments on this post

    • delboy31 agrees : thanks!
  14. #8
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2005
    Posts
    387
    Rep Power
    21
    Sorry, another question about something similar.

    If I wanted to remove all instances of "font-family", e.g.

    font-family:'Calibri','sans-serif';

    I can use the following:

    Code:
    story = Regex.Replace(story, "font-family[^>]*", "", RegexOptions.IgnoreCase)
    But if the line was

    Code:
    line-height:115%;font-family:'Calibri','sans-serif';font-size:12pt
    it would become

    Code:
    line-height:115%;
    I'd need to leave the font-size intact and only remove the font-family property.
    Last edited by nshack31; October 4th, 2012 at 05:12 AM.
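
    One possible approach, as a sketch only: stop the match at the next semicolon instead of at ">", so that just the font-family declaration is removed. This assumes the value itself never contains a semicolon or a double quote, for instance when the style attribute is delimited with double quotes as in the example above.

    ```vb
    Imports System
    Imports System.Text.RegularExpressions

    Module RemoveFontFamilySketch
        Sub Main()
            Dim story As String = "line-height:115%;font-family:'Calibri','sans-serif';font-size:12pt"
            ' Match the property name, then its value up to the next semicolon
            ' (or double quote), plus the semicolon itself if there is one.
            story = Regex.Replace(story, "font-family\s*:[^;""]*;?", "", RegexOptions.IgnoreCase)
            Console.WriteLine(story)
            ' prints: line-height:115%;font-size:12pt
        End Sub
    End Module
    ```

    The earlier pattern font-family[^>]* keeps matching until the end of the tag, which is why font-size disappeared as well.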
