November 27th, 2013, 04:36 PM
How to suppress greedy behavior?
My task is to extract the "select ... from" body from an SQL clause like:
"Select Sum(OTTime) Into @iOT
Where HoursID in (Select "Hour ID" From Hours Where ID = @EmplID and
"Adjust Date" >= @StartDate and "Adjust Date" <= @EndDate);"
The pattern I build is:
m3 = String.Concat(@"[_A-z0-9\.", "\"", @"\@\'\(\)\,\s\n\<\>\+\-\*\|]+\s?\n?");
reg = new Regex(String.Concat(@"select\s", m3, @"from[\s\n]+"), RegexOptions.IgnoreCase);
It works OK, but only if the query is not nested, i.e. there is no other "from" within it.
If the query is nested (as in the one shown above), it returns everything up to the last "from" occurrence.
How can I prevent that behavior and get only the first "select ... from" subclause?
November 27th, 2013, 04:52 PM
.NET uses the same construct as most other languages: add a ? to the quantifier. + becomes +? and * becomes *?.
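As a rough illustration of the difference, here is a minimal sketch. The class and method names are made up for this example, and the body pattern is simplified to [\s\S] (any character, including newlines) instead of the character class from the question, so treat it as an assumption-laden demo rather than a drop-in fix.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyQuantifierDemo
{
    // Hypothetical helper: returns the text of the first "select ... from"
    // match, or an empty string if the pattern does not match.
    public static string FirstSelectFrom(string sql, bool lazy)
    {
        // "+" is greedy: it takes as much as possible, then backtracks.
        // "+?" is lazy: it takes as little as possible, then expands.
        string body = lazy ? @"[\s\S]+?" : @"[\s\S]+";
        var reg = new Regex(@"select\s" + body + @"from\s+", RegexOptions.IgnoreCase);
        Match m = reg.Match(sql);
        return m.Success ? m.Value : "";
    }
}
```

For the input "Select a From t1 Where id in (Select b From t2 Where c = 1)", calling FirstSelectFrom(sql, false) should return everything up to and including the last "From ", while FirstSelectFrom(sql, true) should stop at the first one and return "Select a From ".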
November 27th, 2013, 05:10 PM
You cannot use regexes for this.
I understand that regexes are very popular because they look easy to understand and seem able to parse just about everything. They can't. Regexes describe only very primitive grammars. They're fine for simple, flat patterns like dates, but they fail at anything with nested structure.
SQL is a very complex language. It's not as simple as reading from one keyword to the next, because you can have all kinds of expressions in all kinds of places. Here are just a few examples that will break your regex, whether it's greedy or not:
(SELECT MAX(b) FROM c)
EXTRACT(YEAR FROM a)
Understanding those queries requires the parser to understand nested expressions. Regexes don't have that (or rather: only in a very limited sense).
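To make the failure concrete, here is a small sketch that runs a lazy "select ... from" pattern over queries built around the two fragments above. The helper name, the simplified pattern, and the surrounding queries (the outer SELECT and the table names t and d) are assumptions added for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyRegexLimits
{
    // Hypothetical helper: applies a lazy "select ... from" pattern and
    // returns the matched text, or an empty string if nothing matches.
    public static string Extract(string sql)
    {
        // The lazy quantifier stops at the FIRST "from", wherever it occurs,
        // because the regex has no notion of parentheses or nesting depth.
        Match m = Regex.Match(sql, @"select\s[\s\S]+?from\s+", RegexOptions.IgnoreCase);
        return m.Success ? m.Value : "";
    }
}
```

With this helper, Extract("SELECT EXTRACT(YEAR FROM a) FROM t") should return "SELECT EXTRACT(YEAR FROM ", cutting the select list in the middle of the function call, and Extract("SELECT a, (SELECT MAX(b) FROM c) FROM d") should return "SELECT a, (SELECT MAX(b) FROM ", stopping inside the subquery. In both cases the FROM that actually ends the select list is the one the regex never reaches.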
Long story short: You need a real parser. I'm sure somebody has already written an SQL parser for C#.