#1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2012
    Posts
    3
    Rep Power
    0

    Match quoted strings excluding the quotes..?


    Is it possible to write a simple regex that matches the contents of a quoted
    string _excluding_ the quotes?

    Consider the following text:

    | This is a quoted string "string1", and this is another string: "string2"

    Per the tutorial I am reading, the textbook regex that matches "string1", then
    "string2" including the quotes would go something like:

    | "[^"]+"

    Match:

    | " : a double quote,
    |
    | followed by..
    |
    | [^"] : any character that is not a double quote,
    | + : 1 to n times,
    |
    | followed by..
    |
    | " : a double quote
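A minimal Perl sketch of this textbook pattern. Perl is used because the tutorial is described as using perl-compatible syntax, and the variable names are only illustrative. The pattern has no capturing group, so a global match in list context returns each whole match, and the quotes come along:

```perl
use strict;
use warnings;

# Sample text from the question above.
my $text = 'This is a quoted string "string1", and this is another string: "string2"';

# The textbook pattern: a quote, one or more non-quote characters, a quote.
# With no capturing group, //g in list context returns each whole match,
# so the surrounding quotes are part of every result.
my @with_quotes = $text =~ /"[^"]+"/g;

print "$_\n" for @with_quotes;    # prints "string1" then "string2", quotes included
```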

    I tried using zero-length matches for the opening/closing quotes, but with the
    above sample text, I ended up with something that also matches:

    | ', and this is another string: '

    Which is clearly not what I want..

    My (limited) understanding of what is going on is that excluding the closing
    quote from the match by specifying a zero-length match causes the regex engine
    to start at/before the closing quote's location when looking for the next match.
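The behaviour described above can be reproduced with a short Perl sketch. It assumes the zero-length matches were a lookbehind `(?<=")` and a lookahead `(?=")`, because the post does not show the exact pattern. Neither assertion consumes a quote, so the closing quote of one match is still available as the "opening" quote for the next:

```perl
use strict;
use warnings;

my $text = 'This is a quoted string "string1", and this is another string: "string2"';

# (?<=") is a lookbehind and (?=") a lookahead: both are zero-length.
# Each match therefore ends *before* its closing quote, and the next search
# resumes there. One character later, that same closing quote satisfies the
# lookbehind, so the text between the two quoted strings matches too.
my @lookaround = $text =~ /(?<=")[^"]+(?=")/g;

print "[$_]\n" for @lookaround;
# [string1]
# [, and this is another string: ]
# [string2]
```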

    I can think of something clumsy that would involve doing my zero-length match on
    either:

    (1) start-of-text followed by 1 to n 'non-quotes', or

    (2) a double-quoted string followed by 1 to n 'non-quotes',

    ... followed in both cases by my target string's opening quote..

    Looks like this strategy might work, but I'm wondering if anyone knows of
    a simple/obvious solution to this.. something that would force the regex to
    consume the zero-length matched closing quote before it starts looking for the
    next match, perhaps..?
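Perl does have a way to make the regex consume the left-over closing quote. The `\G` assertion matches only where the previous `//g` match ended. It can be combined with `\K` (Perl 5.10 and later), which drops everything matched so far from the reported match. The sketch below is one possible pattern along those lines and does not come from this thread. It assumes balanced quotes and no escaped `\"` inside strings:

```perl
use strict;
use warnings;

my $text = 'This is a quoted string "string1", and this is another string: "string2"';

# (?:\A|\G")  first attempt: anchor at the start of the text; later attempts:
#             anchor where the previous match ended (\G) and consume the
#             closing quote that the previous match left behind.
# [^"]*"      skip any unquoted text, then consume the opening quote.
# \K          drop everything matched so far from the reported match.
# [^"]*(?=")  the quoted contents; the closing quote is checked, not consumed.
my @contents = $text =~ /(?:\A|\G")[^"]*"\K[^"]*(?=")/g;

print "$_\n" for @contents;    # string1, then string2
```

It works, but it is hard to read. Capturing the contents with parentheses, as discussed later in the thread, is the more usual answer.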

    Not really concerned about implementation at this point.. I believe the tutorial
    I'm working with uses a perl-compatible syntax.

    Thanks..!
#2
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2012
    Location
    spaceBAR Central
    Posts
    225
    Rep Power
    41
    Try this and see if it is what you want:
    Code:
    echo 'This is a quoted string "string1", and this is another string: "string2"' | perl -wnE 'say for /"([0-9a-zA-Z]*)"/g'
#3
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2012
    Posts
    3
    Rep Power
    0
    Originally Posted by spacebar208
    Try this and see if it is what you want:
    Code:
    echo 'This is a quoted string "string1", and this is another string: "string2"' | perl -wnE 'say for /"([0-9a-zA-Z]*)"/g'
    That was quick..!

    I made it more general to (hopefully) include special characters, etc. like so:

    | % echo 'ascii string: "string_1", unicode string: "κορδόνι"' | perl -wnE 'say for /"([^"]*)"/g'
    | string_1
    | κορδόνι

    Then I noticed that if I remove the parentheses, I get the following:

    | % echo 'ascii string: "string_1", unicode string: "κορδόνι"' | perl -wnE 'say for /"[^"]*"/g'
    | "string_1"
    | "κορδόνι"

    So it's the parentheses that do the trick..!
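A small Perl sketch of why the parentheses matter. According to perlop, a `//g` match in list context returns the captured groups when the pattern has any. Otherwise it returns each whole match. The sample text here is an ASCII-only variant of the one above:

```perl
use strict;
use warnings;

my $text = 'ascii string: "string_1", another string: "string2"';

# No capturing group: each element is the whole match, quotes included.
my @whole = $text =~ /"[^"]*"/g;

# One capturing group: each element is only what the parentheses captured.
my @captured = $text =~ /"([^"]*)"/g;

print "whole:    @whole\n";       # whole:    "string_1" "string2"
print "captured: @captured\n";    # captured: string_1 string2
```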

    I'm not familiar with perl's intricacies, but perhaps you could direct me to the
    perl doc that describes this feature..?

    Thanks for help..!
#4
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2012
    Location
    spaceBAR Central
    Posts
    225
    Rep Power
    41
    It's called a "capturing group". Just google "Perl regular expressions"; this is just one link you can look at:
    http://www.tutorialspoint.com/perl/p...expression.htm
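The one-liners above read captures in list context. Captured text can also be read one match at a time. Below is a minimal sketch using `$1` inside a `while` loop, followed by the same loop with a named group (`(?<name>...)`, read back through `%+`, Perl 5.10 and later). The variable and group names are illustrative:

```perl
use strict;
use warnings;

my $text = 'This is a quoted string "string1", and this is another string: "string2"';

# In scalar context, //g finds one match per call and remembers where it
# stopped (pos $text), so the loop visits every quoted string in turn.
# $1 holds what the first pair of parentheses captured on that match.
my @found;
while ($text =~ /"([^"]*)"/g) {
    push @found, $1;
}

# A named capture group makes the intent explicit; the failed final match
# of the previous loop reset pos, so this loop starts from the beginning.
my @named;
while ($text =~ /"(?<contents>[^"]*)"/g) {
    push @named, $+{contents};
}

print "$_\n" for @found;    # string1, then string2
```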
#5
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2012
    Posts
    3
    Rep Power
    0
    "capturing group" is exactly what I was looking for..

    Thanks again for help.
