December 9th, 2012, 04:37 PM
Advice on using regex or not
Hi, I'm trying to do a high-level parsing of Oracle SQL scripts to isolate single commands to execute, in an Oracle compilation plugin I'm writing for notepad++
This is a kind of script you usually can give to sqlplus for execution, with / or ; ending commands (usually ddl with / and dml with ; )
CREATE TABLE x (column_a data_type(length), ...);
ALTER TABLE y (column_a data_type)
CREATE OR REPLACE PACKAGE z IS ...
CREATE OR REPLACE PACKAGE BODY z AS ...
INSERT INTO TABLE z (column_a) VALUES ('123');
Now... I've this regex that correctly identifies each command with object owner, name and type, and I'm using it to check system tables for error after compilation:
but I need something more reliable to isolate each command, considering remarks and dml instructions inside packages and procedures, that are to be ignored.
Right now, I'm using regex only to identify each object and I parse the script looking for a single / on a line, and it's working, but I can't handle single dml commands as the first and last in the example above. I was wondering if I'm using the right tool or if maybe it's better to learn antlr and build a grammar.
December 13th, 2012, 08:06 PM
limits of what regexs can do
There are limits of what regex can do. I am pretty sure a single regex no matter how complex, cannot parse SQL. Language parsers tend to use more complex techniques in addition to REs.
I think you will have to use a procedural languages with calls to RE to match to specific patterns.