October 7th, 2012, 03:12 PM
Match quoted strings excluding the quotes..?
Is it possible to write a simple regex that matches the contents of a quoted
string _excluding_ the quotes?
Consider the following text:
| This is a quoted string "string1", and this is another string: "string2"
Per the tutorial I am reading, the textbook regex that matches "string1", then
"string2" including the quotes would go something like:
| " : a double quote,
| followed by..
| [^"] : any character that is not a double quote,
| + : 1 to n times,
| followed by..
| " : a double quote
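As a rough sketch of that textbook pattern: the snippet below uses Python's `re` module purely as a stand-in for a perl-compatible engine (an assumption, since the tutorial's tool isn't named), and the variable names are made up for illustration.

```python
import re

text = 'This is a quoted string "string1", and this is another string: "string2"'

# A double quote, then one or more non-quote characters, then a double quote.
with_quotes = re.findall(r'"[^"]+"', text)

print(with_quotes)  # ['"string1"', '"string2"']
```

Both matches come back with their surrounding quotes, which is exactly the part I'd like to get rid of.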
I tried using zero-length matches for the opening/closing quotes, but with the
above sample text, I ended up with something that also matches:
| ', and this is another string: '
Which is clearly not what I want..
My (limited) understanding of what is going on is that excluding the closing
quote from the match by specifying a zero-length match causes the regex engine
to start at/before the closing quote's location when looking for the next match.
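To make the problem concrete, here is a minimal sketch of what such an attempt might look like. The exact pattern is an assumption (a lookbehind `(?<=")` and a lookahead `(?=")` around the textbook character class), and Python's `re` module is again only a stand-in for a perl-compatible engine.

```python
import re

text = 'This is a quoted string "string1", and this is another string: "string2"'

# Zero-length assertions: the quotes must be present on both sides,
# but they are not consumed as part of the match.
lookaround = re.findall(r'(?<=")[^"]+(?=")', text)

print(lookaround)
# ['string1', ', and this is another string: ', 'string2']
```

Because the closing quote of "string1" is never consumed, it is still available to satisfy the lookbehind for the next match, so the text between the two quoted strings gets matched as well.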
I can think of something clumsy that would involve doing my zero-length match on
(1) start-of-text followed by 1 to n 'non-quotes', or
(2) a double-quoted string followed by 1 to n 'non-quotes',
... followed in both cases by my target string's opening quote..
Looks like this strategy might work, but I'm wondering if anyone knows of
a simple/obvious solution to this.. something that would force the regex to
consume the zero-length matched closing quote, before it starts looking for the
next match, perhaps..?
Not really concerned about implementation at this point.. I believe the tutorial
I'm working with uses a perl-compatible syntax.
October 7th, 2012, 04:42 PM
Try this and see if it is what you want:
echo 'This is a quoted string "string1", and this is another string: "string2"' | perl -wnE 'say for /"([0-9a-zA-Z]*)"/g'
October 7th, 2012, 07:38 PM
That was quick..!
Originally Posted by spacebar208
I made it more general to (hopefully) include special characters, etc. like so:
| % echo 'ascii string: "string_1", unicode string: "κορδόνι"' | perl -wnE 'say for /"([^"]*)"/g'
I then noticed that if I remove the parentheses, I get the following:
| % echo 'ascii string: "string_1", unicode string: "κορδόνι"' | perl -wnE 'say for /"[^"]*"/g'
So it's the parentheses that do the trick..!
I'm not familiar with perl's intricacies, but perhaps you could direct me to the
perl doc that describes this feature..?
Thanks for the help..!
October 8th, 2012, 01:33 AM
October 8th, 2012, 02:43 PM
"capturing group" is exactly what I was looking for..
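For anyone landing here later, a small sketch of the difference the parentheses make. It uses Python's `re.findall` as a stand-in (an assumption on my part, not what was used in this thread); it behaves much like the perl one-liners above, returning the group's contents when the pattern has one capturing group and the whole match otherwise.

```python
import re

text = 'ascii string: "string_1", unicode string: "κορδόνι"'

# One capturing group: the quotes are consumed by the match,
# but only the text captured inside the parentheses is returned.
contents = re.findall(r'"([^"]*)"', text)

# No capturing group: each whole match is returned, quotes included.
whole = re.findall(r'"[^"]*"', text)

print(contents)  # ['string_1', 'κορδόνι']
print(whole)     # ['"string_1"', '"κορδόνι"']
```

Since the quotes are actually consumed here, the closing quote of one string can't double as the opening quote of the next, which is what went wrong with the zero-length approach.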
Thanks again for the help.