August 14th, 2013, 08:48 AM
Optional character in look-around
I'm not a programmer, nor will I ever be. Still, I need to learn enough of regex to solve some issues I'm working on. I have learned about optional characters and look-around. Now I need to combine these but I'm running into problems and I can't figure it out. I'm sure an experienced programmer could help me along in a minute....?
I'm testing with Regex Powertoy (Java regex syntax, match only).
(can't post the url though, but it is a link I found somewhere on these forums)
My target text is:
wordleft STRING. Wordright
I need to match:
And all is well...
Until I found out that in some cases my regex did not return anything. It turns out that the . (dot) is sometimes omitted in my target text.
So I changed my regex to:
In my target text without the . this results in: STRING (hurray)
In my target text with the . this results in: STRING. (booo)
I need to match STRING whether the . is there or not.
So I need the . to be optional in my look-around, but when it is there, I don't want to match it...
Now I tried all sorts of quantifiers but without any luck. What did I miss? Any help would be appreciated!
August 14th, 2013, 09:04 AM
This is a typical problem, and it's also the reason why your regex is extremely inefficient.
The dot matches any character (by default, anything except a line terminator), and the greedy pattern .+ swallows everything up to the end of the line.
So what happens is that you first read in the whole string. Then the regex realizes you want a string after this, so it shortens the match by one character and tries again. The lookahead still doesn't match, so it shortens the match again and retries. This trial and error goes on until the regex finally arrives just before the " Wordright". At that point the lookahead already succeeds with the optional dot matching nothing, so the backtracking stops while the dot is still part of the match.
This is obviously extremely inefficient and not what you want. As a rule of thumb: Never use the dot unless you really, truly know what you're doing.
In this case, you can fix the problem by replacing the greedy + quantifier (which reads as much as it can) with the non-greedy +? quantifier (which reads as little as it can):
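A minimal sketch of the difference in Java, assuming a lookbehind for "wordleft " and a lookahead for an optional dot followed by " Wordright". Both patterns and the class name GreedyVsLazy are illustrative assumptions, not necessarily the exact regexes used in the posts above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Assumed patterns: only the quantifier differs (.+ versus .+?).
    static final Pattern GREEDY = Pattern.compile("(?<=wordleft ).+(?=\\.? Wordright)");
    static final Pattern LAZY = Pattern.compile("(?<=wordleft ).+?(?=\\.? Wordright)");

    // Returns the first match of p in input, or null if there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String target = "wordleft STRING. Wordright";
        // Greedy: backtracks from the end of the line and stops at the first
        // position where the lookahead succeeds, with the dot still inside the match.
        System.out.println(firstMatch(GREEDY, target)); // STRING.
        // Lazy: grows one character at a time and stops right after STRING,
        // where the lookahead can consume the dot itself.
        System.out.println(firstMatch(LAZY, target));   // STRING
    }
}
```

The lazy version also does less work here, because it stops extending the match as soon as the lookahead succeeds instead of reading the whole line first.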
August 14th, 2013, 09:26 AM
Thank you for the tip. It is frustrating when you feel you are close to a solution but can't find the glitch. I see now that I was trying to fix my boundary, while I needed to fix my match...
I did realize that .+ is 'all consuming'. At this point, STRING can be literally anything so I really need to use the .
I now see that .+? changes from 'get everything you can' to 'get as little as you can'.
Thanks a bunch
August 14th, 2013, 09:35 AM
I still have a question though....
matches: STRING in
wordleft STRING. Wordright
But does not match anything in
wordleft STRING Wordright
Can I match STRING in both targets with one regex?
will do just that. Sorry for the extra post...
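For readers following along, here is a hedged Java sketch of one regex that handles both targets. It assumes the same lookbehind for "wordleft " as before and compares a lookahead that requires the dot with one where the dot is optional. The patterns and the class name OptionalDotLookahead are assumptions for illustration, since the exact regexes from these posts are not shown.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalDotLookahead {
    // Assumed pattern with a mandatory dot in the lookahead.
    static final Pattern REQUIRED_DOT = Pattern.compile("(?<=wordleft ).+?(?=\\. Wordright)");
    // Assumed pattern with an optional dot (\.?) in the lookahead.
    static final Pattern OPTIONAL_DOT = Pattern.compile("(?<=wordleft ).+?(?=\\.? Wordright)");

    // Returns the first match of p in input, or null if there is none.
    static String extract(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String withDot = "wordleft STRING. Wordright";
        String withoutDot = "wordleft STRING Wordright";

        System.out.println(extract(REQUIRED_DOT, withDot));    // STRING
        System.out.println(extract(REQUIRED_DOT, withoutDot)); // null
        System.out.println(extract(OPTIONAL_DOT, withDot));    // STRING
        System.out.println(extract(OPTIONAL_DOT, withoutDot)); // STRING
    }
}
```

The lazy quantifier and the optional dot have to be used together: the lazy .+? stops as early as possible, and the \.? in the lookahead lets that early stop succeed whether or not a dot follows.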