April 17th, 2013, 05:08 PM
-
Remove T-SQL comment
Hello everybody,
I have a problem with searching for T-SQL comments. I've searched a lot, but no solution I found solved my problem.
I want to use a regex replace to strip out SQL comments. But I don't want to search and replace within strings, so I look for the following things:
1.) Search for strings: they begin with ', followed by 0-n '' or other characters, and end with '
2.) Look for --: everything after it up to the end of the line is a comment
3.) Look for /* followed by 0-n characters, ending with */
So I created this regex pattern:
(?<string>'(?:''|[^'])*')+|--[^\r\n]*|/\*[^*]*\*+(?:[^/*][^*]*\*+)*/
But for point 3) the following case is possible:
/* blabla
/* comment within comment */
blabla
*/
With the pattern above it will only find:
/* blabla
/* comment within comment */
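To reproduce this outside .NET, the pattern can be ported to Python as a rough sketch. Two assumptions: Python spells the named group `(?P<string>...)` instead of `(?<string>...)`, and the helper name `strip_comments` is made up for illustration. Matches where the string group is set are put back unchanged; everything else the pattern matches is treated as a comment and dropped.

```python
import re

# The pattern from the post, with the named group in Python syntax (?P<name>...).
PATTERN = re.compile(
    r"(?P<string>'(?:''|[^'])*')+"          # one or more adjacent string literals
    r"|--[^\r\n]*"                           # line comment up to end of line
    r"|/\*[^*]*\*+(?:[^/*][^*]*\*+)*/"       # block comment, not nesting-aware
)

def strip_comments(sql: str) -> str:
    """Hypothetical helper: remove comments, keep string literals as they are."""
    def keep_strings(m):
        # If the 'string' group took part in the match, it is a literal: keep it.
        return m.group(0) if m.group("string") is not None else ""
    return PATTERN.sub(keep_strings, sql)

simple = "SELECT 'it''s -- not a comment' AS s -- trailing\nFROM t /* block */ WHERE 1=1"
nested = "/* blabla\n/* comment within comment */\nblabla\n*/"

simple_result = strip_comments(simple)   # strings survive, both comments go
nested_result = strip_comments(nested)   # stops at the first */, rest is left over
```

On the nested input the match ends at the first `*/`, so the trailing `blabla` and the final `*/` stay in the output, which is exactly the problem described above.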
Does anybody know a solution for this problem? Is there a way in regex to search forward and, after a match, continue again from the first opening /*?
Thanks a lot for your replies.
April 17th, 2013, 06:01 PM
-
A regular expression isn't a suitable solution for something like this; you need an actual T-SQL lexer in order to guarantee that you handle the query correctly.
In addition to the problem you've already pointed out, consider how your definition of a string (1) would handle a query like this:
Code:
/* This is a comment that ends in a single quote ' */
some actual sql code
/* This is another comment that ends in a single quote ' */
So you would have to amend your definition of a string in order to not include quotes that are inside comments. However, since your definition of a comment is based on your definition of a string that becomes a bit of a problem...
In the end, you will continue to run into problems like that as long as you're trying to do this with a regular expression.
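As a rough idea of what the non-regex route could look like, here is a minimal hand-written scanner in Python. It is only a sketch under stated assumptions: the function name `strip_tsql_comments` is hypothetical, and it knows about single-quoted strings, `--` line comments, and nested `/* */` block comments only. It ignores other T-SQL constructs such as bracketed or double-quoted identifiers, so it is not a full lexer.

```python
def strip_tsql_comments(sql: str) -> str:
    """Hypothetical sketch: drop -- and nested /* */ comments, keep '...' strings."""
    out = []
    i, n = 0, len(sql)
    while i < n:
        two = sql[i:i + 2]
        if sql[i] == "'":
            # String literal: copy through the closing quote; '' is an escaped quote.
            j = i + 1
            while j < n:
                if sql[j] == "'":
                    if sql[j + 1:j + 2] == "'":
                        j += 2          # escaped quote, still inside the string
                        continue
                    j += 1              # closing quote
                    break
                j += 1
            out.append(sql[i:j])
            i = j
        elif two == "--":
            # Line comment: skip up to (not including) the line break.
            while i < n and sql[i] not in "\r\n":
                i += 1
        elif two == "/*":
            # Block comment: count nesting depth until it returns to zero.
            depth = 1
            i += 2
            while i < n and depth > 0:
                pair = sql[i:i + 2]
                if pair == "/*":
                    depth += 1
                    i += 2
                elif pair == "*/":
                    depth -= 1
                    i += 2
                else:
                    i += 1
        else:
            out.append(sql[i])
            i += 1
    return "".join(out)

nested_out = strip_tsql_comments("a /* x /* y */ z */ b")
string_out = strip_tsql_comments("SELECT '/* not a comment */' -- c\nFROM t")
```

The depth counter is the part a plain regular expression cannot express, and because strings and comments are handled in one left-to-right pass, a quote inside a comment or a comment marker inside a string does not confuse the other rule.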
April 17th, 2013, 11:59 PM
-
Hello E-Oreo,
But the string isn't a problem in this scenario; it is handled correctly. The regex first finds the /* and therefore continues to the end of the comment, including the '
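A quick way to check this claim, as a sketch with the pattern ported to Python (the `(?P<string>...)` group syntax is the Python spelling, not the original one): the alternatives are tried at each position from left to right, so the /* is reached before the quote inside the comment.

```python
import re

# Pattern from the first post, with the named group rewritten for Python.
PATTERN = re.compile(
    r"(?P<string>'(?:''|[^'])*')+|--[^\r\n]*|/\*[^*]*\*+(?:[^/*][^*]*\*+)*/"
)

query = (
    "/* This is a comment that ends in a single quote ' */\n"
    "some actual sql code\n"
    "/* This is another comment that ends in a single quote ' */"
)

# Collect every match and note whether it was classified as a string literal.
matches = [(m.group(0), m.group("string") is not None) for m in PATTERN.finditer(query)]

# Remove everything that is not a string literal.
stripped = PATTERN.sub(lambda m: m.group(0) if m.group("string") is not None else "", query)
```

Both matches here are whole block comments and neither is classified as a string, so only the SQL line remains. This says nothing about the nested-comment case, which the pattern still cannot handle.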
I am looking for a pattern something like this:
Search for /* followed by any characters, including /* any characters */, then any characters, then */
All /*...*/ have to be paired.
But they could also be nested like this:
/* begin comment1
/* begin comment2
/* begin comment3 end */
End comment 1*/
There are so many regex patterns I found, but ....
Maybe there is a "nearly perfect" solution, a better one than mine.
End 3*/
April 18th, 2013, 04:32 AM
-
I agree with E-Oreo, a regular expression is not suitable for this type of problem. There are too many special cases. Trying to handle them with a regex would quickly become a nightmare.
You need a real parser.
April 18th, 2013, 05:06 AM
-
Shi*, but thanks a lot for the quick answer