#1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2018
    Posts
    5
    Rep Power
    0

    match words with parentheses


    Original String = <p>This is a test (<abbr title="TestTestTest">TTT</abbr> and <abbr title="HelpHelpHelp">HHH</abbr>)</p>
    Wanted String = <p>This is a test (TTT and HHH)</p>

    This should only be applied if the abbr tags are within parentheses.

    Currently I'm searching for \(<abbr title="(.*?)">(.*?)<\/abbr>\) and replacing matches with $2.

    This works, but only if there is just 1 instance of the abbr tag in the parentheses. If there are 2 or more, it messes up.

    Any help would be greatly appreciated!!
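To make the failure concrete, here is a minimal sketch that reproduces it. It is written in Python rather than the C# tool's code (which isn't shown in this thread), and the names `ORIGINAL_PATTERN`, `apply_original`, `one_tag` and `two_tags` are made up for illustration. The regex itself is the one quoted above; `\2` is Python's spelling of the `$2` replacement.

```python
import re

# The pattern from the post. It needs "(" directly before <abbr ...>
# and ")" directly after </abbr>.
ORIGINAL_PATTERN = re.compile(r'\(<abbr title="(.*?)">(.*?)</abbr>\)')

one_tag = '<p>Test (<abbr title="TestTestTest">TTT</abbr>)</p>'
two_tags = ('<p>This is a test (<abbr title="TestTestTest">TTT</abbr> and '
            '<abbr title="HelpHelpHelp">HHH</abbr>)</p>')


def apply_original(html: str) -> str:
    # Replace the whole match (parentheses included) with capture group 2.
    return ORIGINAL_PATTERN.sub(r'\2', html)


print(apply_original(one_tag))
print(apply_original(two_tags))
```

With two tags, the only `</abbr>` that is directly followed by `)` is the last one, so the lazy second group has to stretch across everything in between, and the replacement leaves a stray `</abbr>` and an unclosed `<abbr ...>` behind. Note also that, as written, the literal parentheses are part of the match, so a bare `$2` replacement drops them as well.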
  2. #2
  3. Headless Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    16,931
    Rep Power
    9647
    Are you doing this through code? What language? Because the best solution to get rid of the ABBRs is to use something that can parse HTML, find the tags, and remove them.
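As a rough illustration of the parser-based approach, here is a sketch using Python's standard `html.parser` module. The class and function names (`AbbrUnwrapper`, `unwrap_abbr_in_parens`) are hypothetical, and this is not the tool's actual code. It assumes parentheses in the text are balanced, and it only passes through tags, text and character references; comments and doctype declarations are dropped, which a real tool would need to handle.

```python
from html.parser import HTMLParser


class AbbrUnwrapper(HTMLParser):
    """Re-emit the HTML, dropping <abbr> tags that open inside parentheses."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []      # pieces of the rebuilt document
        self.depth = 0     # parenthesis nesting depth seen in text so far
        self.dropped = []  # one flag per open <abbr>: was its start tag dropped?

    def handle_starttag(self, tag, attrs):
        if tag == "abbr":
            drop = self.depth > 0
            self.dropped.append(drop)
            if drop:
                return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are passed through unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "abbr" and self.dropped and self.dropped.pop():
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        for ch in data:
            if ch == "(":
                self.depth += 1
            elif ch == ")" and self.depth > 0:
                self.depth -= 1
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def unwrap_abbr_in_parens(html: str) -> str:
    parser = AbbrUnwrapper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


sample = ('<p>This is a test (<abbr title="TestTestTest">TTT</abbr> and '
          '<abbr title="HelpHelpHelp">HHH</abbr>)</p>')
print(unwrap_abbr_in_parens(sample))
```

The parser tracks whether the current text position is inside parentheses, so it does not depend on how many ABBRs sit between them.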
  4. #3
    Registered User
    Yes, through code. We have a small C# tool that applies ABBRs to HTML code using a database of acronyms. Once they are added, it removes any ABBRs that were applied to acronyms inside parentheses. I didn't build this tool and am simply trying to adjust the regex line to fix the issue when there is more than one ABBR within parentheses.
  6. #4
  7. Headless Moderator
    But... if the problem is that ABBRs are being added to places they shouldn't be, wouldn't that mean the proper solution would be to make it stop doing that?
  8. #5
    Registered User
    I guess I'm not trying to reinvent the tool. As I mentioned, this is something that was built by someone else. The process they chose was to add ABBRs everywhere and then remove any that had been added inside parentheses. They simply forgot to account for cases where 2 or more were added between the same pair of parentheses. Other than this function, the rest of the tool works perfectly. So if I could find a quick regex line that would address this issue, I'd be golden. Any help would be appreciated.
  10. #6
  11. Headless Moderator
    Can you safely assume that parentheses are balanced? The normal tactic for this problem is to find the thing (ABBR) then make sure that somewhere following it is a closing parenthesis without an opening parenthesis in between.
    Code:
    #<abbr title="(.*?)">(.*?)</abbr>(?=[^(\r\n]*\))#
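A quick way to try this out, sketched in Python since the C# tool's code isn't shown here. The `#` characters at each end are PHP-style delimiters and are not part of the pattern itself. `POSTED` is the pattern above; `STRICT` is a tighter variant that is my own assumption, not something from this thread. It swaps the two `.*?` groups for `[^"]*` and `[^<]*`, which assumes the title contains no double quotes and the ABBR contains no nested tags. All names below are made up for illustration.

```python
import re

# The pattern from the post, minus the PHP-style # delimiters.
POSTED = re.compile(r'<abbr title="(.*?)">(.*?)</abbr>(?=[^(\r\n]*\))')

# Assumed stricter variant: the groups cannot run past a quote or a tag.
STRICT = re.compile(r'<abbr title="([^"]*)">([^<]*)</abbr>(?=[^(\r\n]*\))')


def strip_abbr(pattern: re.Pattern, html: str) -> str:
    # \2 is Python's spelling of $2: keep only the text inside the <abbr>.
    return pattern.sub(r'\2', html)


sample = ('<p>This is a test (<abbr title="TestTestTest">TTT</abbr> and '
          '<abbr title="HelpHelpHelp">HHH</abbr>)</p>')

# One ABBR outside parentheses, followed by one inside.
mixed = '<p><abbr title="X">XX</abbr> is fine (<abbr title="Y">YY</abbr>)</p>'

print(strip_abbr(POSTED, sample))
print(strip_abbr(STRICT, sample))
print(strip_abbr(POSTED, mixed))
print(strip_abbr(STRICT, mixed))
```

Both patterns handle the sample from the first post. On `mixed`, the lazy `(.*?)` in the posted pattern can stretch from the first ABBR across to the second one's closing tag in order to satisfy the lookahead, which mangles the markup; the stricter variant leaves the ABBR outside the parentheses alone. In .NET the replacement string would be `$2` rather than `\2`.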
  12. #7
    Registered User
    Originally Posted by requinix
    Can you safely assume that parentheses are balanced? The normal tactic for this problem is to find the thing (ABBR) then make sure that somewhere following it is a closing parenthesis without an opening parenthesis in between.
    Code:
    #<abbr title="(.*?)">(.*?)</abbr>(?=[^(\r\n]*\))#
    Yes, there would be no opening parenthesis in between for sure.
