#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2016
    Posts
    4
    Rep Power
    0

    Checking several conditions in a regular expression


    Hi,

    I want to write a regular expression that matches only when the text does not contain the string 'andmark'.
    It is complicated, though, because another character (a double quotation mark) must be present,
    and the whole string must be inside single quotation marks.
    Any other character may appear as well, except these two: }{
    Text outside the single quotation marks is not subject to any condition.

    Code:
    pattern= r'\'([^"^\'\}\{]*(?!andmark)[^"\}\{]*)"([^\'\}\{]*(?!andmark)[^\'\}\{]*)\'([^".]*)'
    This pattern does not work for this input:
    'Succeeded by': {'William Drayton' andmark "Anthony Butler (as Chargé d'affaires)" andmark 'John Bell'}
    But it works correctly for these two:
    'Spouse(s)': {'Greta Elizabeth "Lizzie" Wallace (née McMahon)'}
    'Height': {'5\'9"'}
    This matters because whenever the pattern matches the string, I replace the match with another string (so that every occurrence of the condition is found) and then check the pattern again, until it no longer matches.
    But the pattern does not match that string at all.

    What should I do?
    Thank you
  2. #2
  3. Lazy Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    16,333
    Rep Power
    9645
    I'm lost. It doesn't sound like you simply want to check if a given input does not contain "andmark" somewhere within it. Why are the quotes and }{ relevant?
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2016
    Posts
    4
    Rep Power
    0
    Originally Posted by requinix
    I'm lost. It doesn't sound like you simply want to check if a given input does not contain "andmark" somewhere within it. Why are the quotes and }{ relevant?
    Because I have a JSON-like string in which the value of every key is wrapped in '{}', and every item inside it may be in single or double quotation marks.
    I want to find every double quotation mark inside an item and replace it with an equivalent string (like 'twoquotation').
    A double quotation mark between two single quotation marks is only a true match if there is no '{' or '}' between them; otherwise one of the single quotation marks could wrongly be taken from another value.
    'andmark' must not appear between them either, because otherwise one of the single quotation marks could wrongly be taken from another item.

    {x:{'hi' andmark "flower" andmark 'Johnney"box'}, x22:{'hi22' andmark "flower22" andmark 'Johnney"box22'}}
    'hi'
    "flower"
    'Johnney"box'
    I hope you understood my point.
  6. #4
  7. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2016
    Posts
    4
    Rep Power
    0
    'Johnney"box' is a target, because it is an item that correctly contains a double quotation mark.
  8. #5
  9. Lazy Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    16,333
    Rep Power
    9645
    Does the "andmark" actually matter? It looks like you're just trying to capture everything within each set of matching quotes. So
    Code:
    (["']).*?(\1)
    https://regex101.com/r/cNoDIw/1

    Why does your example not match hi22, flower22, and Johnney"box22 as well as the first three?
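
    A minimal Python sketch of what the pattern above does, assuming straight quotes in the input. The sample strings are cut down from the examples in this thread; the variable names are only illustrative.

```python
import re

# The pattern from the post: an opening quote of either kind, the shortest
# possible content, then the same quote again.
QUOTED = re.compile(r"""(["']).*?(\1)""")

# Straight-quote version of the first three items from the earlier example.
sample = "{x:{'hi' andmark \"flower\" andmark 'Johnney\"box'}}"
matches = [m.group(0) for m in QUOTED.finditer(sample)]
print(matches)

# The height value contains a single quote of its own, so the lazy match
# stops at that inner quote instead of the real closing quote.
height = "{'5\\'9\"'}"
height_matches = [m.group(0) for m in QUOTED.finditer(height)]
print(height_matches)
```

    The second case shows why the closing quote probably needs more context (such as a following '}' or 'andmark') before it can be trusted as the end of an item.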
  10. #6
  11. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2016
    Posts
    4
    Rep Power
    0
    Thank you for your help. It was really useful.
    I was able to improve my pattern using yours. Because your pattern is not tailored to items and returns everything in single quotation marks, I use '}' and 'andmark' as delimiters to control where a string ends. So:
    Code:
    {'5\'9"'...Lost, Erik Arakawa'}
    is returned correctly, instead of:
    Code:
    '5\'
    For this example:
    Code:
    {'Born': {'(1991-09-05) September 5, 1991 (age\xa025)\nSebastian, Florida, U.S.'}, 'Weight': {'185 lbs'}, 'Favorite maneuvers': {'Airs and “barrels'}, 'Height': {'5\'9"'...Lost, Erik Arakawa'}, 'there is table': {'true'}, 'Website': {'http://islandtocity.com/' andmark  'hello'}, 'exceptional_header_count': {'0'}}
    This is the new pattern:
    Code:
    (['])([^{}]*?)(\1( andmark |}))
    Now I need to find only the matches that contain at least one double quotation mark, like:
    Code:
    'Airs and "barrels'}
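
    One possible way to do that last step, sketched in Python: keep the pattern as it is and filter the matches in code rather than in the regex. The sample text is a shortened, hypothetical version of the example above, and `replace_double_quotes` is an illustrative helper name, not something from the thread.

```python
import re

# The pattern from the post: a single quote, brace-free content, then the
# closing quote followed by " andmark " or "}".
ITEM = re.compile(r"(['])([^{}]*?)(\1( andmark |}))")

# Hypothetical sample, cut down from the example in the post.
text = (
    "{'Weight': {'185 lbs'}, "
    "'Favorite maneuvers': {'Airs and \"barrels'}, "
    "'Height': {'5\\'9\"'}, "
    "'Website': {'http://islandtocity.com/' andmark 'hello'}}"
)

# Group 2 is the item content between the single quotes.
items = [m.group(2) for m in ITEM.finditer(text)]

# Keep only the items that contain at least one double quotation mark.
with_double_quote = [item for item in items if '"' in item]
print(with_double_quote)

def replace_double_quotes(match):
    # Rebuild the match, swapping each double quote inside the item
    # for the placeholder string mentioned earlier in the thread.
    inner = match.group(2).replace('"', "twoquotation")
    return match.group(1) + inner + match.group(3)

replaced = ITEM.sub(replace_double_quotes, text)
print(replaced)
```

    Passing a function to `sub` handles every item in one pass, so the match/replace/re-check loop may not be needed for this step.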
  12. #7
  13. Lazy Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    16,333
    Rep Power
    9645
    Okay, let's revise this a little bit further.

    With your example I see 8 separate values:
    Code:
    {'(1991-09-05) September 5, 1991 (age\xa025)\nSebastian, Florida, U.S.'}
    {'185 lbs'}
    {'Airs and “barrels'}
    {'5\'9"'...Lost, Erik Arakawa'}
    {'true'}
    {'http://islandtocity.com/' andmark
    andmark  'hello'}
    {'0'}
    and you only want what's inside the quotes, right?

    If so, each of the values follows a pattern:
    1. '{' or 'andmark' + whitespace, followed by a quote (of either kind?)
    2. Content that may or may not include more of those quotes
    3. The same quote from before, followed by '}' or whitespace + 'andmark'

    You can express that as
    Code:
    (?:(?<=\{)|(?<=andmark)\s+)(['"])((?>[^'"]+)\1?)\1(?=\s+andmark|\})
    (which uses ?: to not capture that part, ?> for performance, ?<= to do a non-capturing lookbehind, and ?= to do a non-capturing lookahead)

    https://www.regex101.com/r/Fzi2UN/1

    The "hello" match includes the leading whitespace - I can't think of a way to avoid that, but you can use the second capturing group to get just the value between the quotes.
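
    A rough Python port of the pattern above, reading the value from the second capturing group. One assumption made for portability: the atomic group `(?>...)` is written as a plain group, since Python's `re` module only accepts atomic groups from version 3.11 on. For this pattern that should only affect performance, not what is matched. The sample text is a shortened, hypothetical version of the example.

```python
import re

# Port of the pattern from the post, with the atomic group replaced by a
# plain group (assumption: needed for Python versions before 3.11).
VALUE = re.compile(
    r"""(?:(?<=\{)|(?<=andmark)\s+)(['"])([^'"]+\1?)\1(?=\s+andmark|\})"""
)

# Hypothetical sample, cut down from the example in the thread.
text = (
    "{'Weight': {'185 lbs'}, "
    "'there is table': {'true'}, "
    "'Website': {'http://islandtocity.com/' andmark  'hello'}}"
)

# Group 0 is the whole match; group 2 is just the value between the quotes.
full_matches = [m.group(0) for m in VALUE.finditer(text)]
values = [m.group(2) for m in VALUE.finditer(text)]
print(values)

# An item with a straight double quote inside is skipped by this sketch,
# because the content class [^'"]+ excludes both quote characters.
skipped = VALUE.findall("{'Favorite maneuvers': {'Airs and \"barrels'}}")
print(skipped)
```

    The keys ('Weight', 'Website', ...) are not matched because they are not followed by '}' or 'andmark'. The last check suggests the content part would still need adjusting if items with an embedded double quote are the actual target.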
