#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2013
    Posts
    3
    Rep Power
    0

    How to supress greedy behavior?


    Hi there
    My task is to extract the "select ... from" body from the SQL caluse like:

    "Select Sum(OTTime) Into @iOT
    From HoursOT
    Where HoursID in (Select "Hour ID" From Hours Where ID = @EmplID and
    "Adjust Date" >= @StartDate and "Adjust Date" <= @EndDate);"

    The pattern

    Code:
    m3 = String.Concat(@"[_A-z0-9\.", "\"", @"\@\'\(\)\,\s\n\<\>\+\-\*\|]+\s?\n?");
    Code:
    reg = new Regex(String.Concat(@"select\s", m3, @"from[\s\n]+"), RegexOptions.IgnoreCase);
    works Ok but only if the querry is not nested i.e. there is no other "from" within it.

    If the querry is nested (it is shown above), it returns all stuff up to the last "from" occurance.
    How to prevent that behavior and get only first "select ... from" subclause?
    Thanks.
  2. #2
  3. Did you steal it?
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    14,068
    Rep Power
    9398
    .NET uses the same construct as most other languages: add a ? to the quantifier. + becomes +? and * becomes *?.
  4. #3
  5. --
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Jul 2012
    Posts
    3,959
    Rep Power
    1014
    Hi,

    you cannot use regexes for this.

    I understand that regexes are very popular, because they're easy to understand, and it looks like they could parse just about everything. This is wrong. Regexes are very primitive grammars. They're fine for simple patterns like dates or something, but they completely fail at anything more complex.

    SQL is a very complex language. It's not as simple as reading from one keyword to the next, because you can have all kinds of expressions at all kinds of places. Just a few examples which will break your regex, no matter if it's greedy or not:

    sql Code:
    SELECT
    	a,
    	(SELECT MAX(b) FROM c)
    FROM
    	(
    		SELECT
    			d
    		FROM
    			e
    	) f
    ;

    sql Code:
    SELECT
    	EXTRACT(YEAR FROM a)
    FROM
    	(
    		SELECT
    			b
    		FROM
    			c
    	) d
    ;

    Understanding those queries requires the parser to understand nested expressions. Regexes don't have that (or rather: only in a very limited sense).

    Long story short: You need a real parser. I'm sure somebody has already written an SQL parser for C#.
    The 6 worst sins of security ē How to (properly) access a MySQL database with PHP

    Why canít I use certain words like "drop" as part of my Security Question answers?
    There are certain words used by hackers to try to gain access to systems and manipulate data; therefore, the following words are restricted: "select," "delete," "update," "insert," "drop" and "null".

IMN logo majestic logo threadwatch logo seochat tools logo