#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Posts
    1
    Rep Power
    0

    Advice on using regex or not


    Hi, I'm trying to do a high-level parsing of Oracle SQL scripts to isolate single commands to execute, in an Oracle compilation plugin I'm writing for notepad++

    E.g.
    Code:
    CREATE TABLE x (column_a data_type(length), ...);
    
    ALTER TABLE y (column_a data_type)
    /
    
    CREATE OR REPLACE PACKAGE z IS ...
       ...
    END;
    /
    
    CREATE OR REPLACE PACKAGE BODY z AS ...
       -- remark...
       -- END;
    END;
    /
    
    INSERT INTO TABLE z (column_a) VALUES ('123');
    This is a kind of script you usually can give to sqlplus for execution, with / or ; ending commands (usually ddl with / and dml with ; )

    Now... I've this regex that correctly identifies each command with object owner, name and type, and I'm using it to check system tables for error after compilation:

    Code:
    (CREATE(\s+OR\s+REPLACE)?|ALTER)\s+(?<object_type>PACKAGE(?!\s+BODY)|PACKAGEs+BODY|VIEW|PROCEDURE|FUNCTION|TABLE|VIEW|TRIGGER)\s+((?<owner_name>[^\s\.]+?)\.)*(?<object_name>\w+)(\s+\w+|\s+IS|\s+AS|\s*\()
    but I need something more reliable to isolate each command, considering remarks and dml instructions inside packages and procedures, that are to be ignored.

    Right now, I'm using regex only to identify each object and I parse the script looking for a single / on a line, and it's working, but I can't handle single dml commands as the first and last in the example above. I was wondering if I'm using the right tool or if maybe it's better to learn antlr and build a grammar.
  2. #2
  3. No Profile Picture
    Still Learning
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Location
    Montreal, Canada
    Posts
    55
    Rep Power
    39

    limits of what regexs can do


    There are limits of what regex can do. I am pretty sure a single regex no matter how complex, cannot parse SQL. Language parsers tend to use more complex techniques in addition to REs.

    I think you will have to use a procedural languages with calls to RE to match to specific patterns.

IMN logo majestic logo threadwatch logo seochat tools logo