Regex Programming
 
  #1  
September 19th, 2008, 10:01 AM
barefeet
Breaking text into matching chunks

Hi all,

I'm trying to break up a string into quoted strings, commented strings, brackets (just the character) and other strings. It happens to be SQL, but it's not SQL specific.

For example, if I have:

create view "My View Name" -- this is where I define the name
as select
myFirstColumn || (select me from mine)
, ' a static literal' as mySecondColumn /* weird */

Then I want to break it up into successive chunks:

1: create view
2: "My View Name"
3: -- this is where I define the name\n
4: as select\n myFirstColumn ||
5: (
6: select me from mine
7: )\n ,
8: ' a static literal'
9: as mySecondColumn
10: /* weird */

I tried this regex:

m{(?xism)
( # one of the following
".*?"|'.*?'|\[.*?\] # quoted
| --.*?(?:\r\n|\n|\r)|/[*].*?[*]/ # comments
| \(|\) # brackets
| .+? # other
)
}g

but it's not quite right. Any ideas?

Thanks,
Tom
BareFeet

#2
September 20th, 2008, 06:33 PM
requinix
Having had time to think about it, I don't think I'd use a regular expression for this.

What programming language are you working with?

#3
September 22nd, 2008, 09:40 AM
ManiacDan
I agree. I started working on this on Friday and didn't get anywhere. He appears to be using Perl. I would walk through the string one character at a time and set flags for things like comments and quoted strings.
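
One way that character-by-character walk might look is sketched below. It is written in Python rather than Perl, and the function name `scan_chunks` and the single `state` variable (standing in for separate flags) are illustrative choices, not anything from this thread. It assumes quotes have no escape sequences and that a `--` comment includes its trailing newline.

```python
def scan_chunks(text):
    """Split text into quoted strings, comments, brackets and other runs."""
    chunks = []
    buf = []        # characters of the chunk currently being collected
    state = None    # None = ordinary text; otherwise '"', "'", "--" or "/*"
    i = 0

    def flush():
        # Emit the collected characters as one chunk, if there are any.
        if buf:
            chunks.append("".join(buf))
            buf.clear()

    while i < len(text):
        ch = text[i]
        two = text[i:i + 2]
        if state is None:
            if ch in "\"'":                 # opening quote
                flush()
                state = ch
                buf.append(ch)
                i += 1
            elif two in ("--", "/*"):       # opening comment marker
                flush()
                state = two
                buf.append(two)
                i += 2
            elif ch in "()":                # a bracket is a chunk by itself
                flush()
                chunks.append(ch)
                i += 1
            else:                           # ordinary text
                buf.append(ch)
                i += 1
        elif state in ("'", '"'):           # inside a quoted string
            buf.append(ch)
            i += 1
            if ch == state:
                flush()
                state = None
        elif state == "--":                 # inside an end-of-line comment
            buf.append(ch)
            i += 1
            if ch == "\n":
                flush()
                state = None
        else:                               # inside a /* ... */ comment
            if two == "*/":
                buf.append(two)
                i += 2
                flush()
                state = None
            else:
                buf.append(ch)
                i += 1
    flush()                                 # whatever is left at end of input
    return chunks


print(scan_chunks('a "b(" -- c\n(d) /* e */'))
```

Because every character is appended exactly once, joining the chunks gives back the original text, which makes the scanner easy to check.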

-Dan

#4
September 22nd, 2008, 03:16 PM
requinix
Maybe you could use a regex. Multiple, actually.

One generic regex reads the string up to the next delimiter: a quote, --, (, ), or /*.
Code:
~(.*?)(['"()]|--|/\*)~s

Depending on that second group, you use another regex for the next part:
Code:
'  -> /(.*?)'(?<!\\')/s
"  -> /(.*?)"(?<!\\")/s
-- -> /(.*)/
/* -> ~(.*?)\*/~s

( doesn't have one (it's just the character) and ) uses the generic.
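
A rough sketch of how that two-stage loop could be driven, written in Python with its `re` module instead of Perl. The patterns are carried over from the post above; the names `GENERIC`, `BODY` and `tokenize` are made up for the illustration, and treating an unterminated quote or comment as running to the end of the input is an assumption.

```python
import re

# Generic pattern: plain text (group 1) up to the next delimiter (group 2).
GENERIC = re.compile(r"(.*?)(['\"()]|--|/\*)", re.S)

# Follow-up pattern per opening delimiter; "(" and ")" have none.
BODY = {
    "'": re.compile(r"(.*?)'(?<!\\')", re.S),
    '"': re.compile(r'(.*?)"(?<!\\")', re.S),
    "--": re.compile(r"(.*)"),            # no DOTALL: stops before the newline
    "/*": re.compile(r"(.*?)\*/", re.S),
}


def tokenize(text):
    chunks = []
    pos = 0
    while pos < len(text):
        m = GENERIC.match(text, pos)
        if m is None:                      # no delimiter left: the rest is plain text
            chunks.append(text[pos:])
            break
        before, delim = m.groups()
        if before:
            chunks.append(before)
        pos = m.end()
        body = BODY.get(delim)
        if body is None:                   # "(" or ")": the character is the chunk
            chunks.append(delim)
            continue
        b = body.match(text, pos)
        end = b.end() if b else len(text)  # assumption: unterminated runs to the end
        chunks.append(delim + text[pos:end])
        pos = end
    return chunks


print(tokenize("a 'b' -- c\n(d)"))
```

Since the `--` pattern has no `s` flag, the comment chunk stops before the newline, and the newline becomes part of the following plain-text chunk.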

#5
September 30th, 2008, 11:51 PM
barefeet

Quote:
Originally Posted by requinix
Having had time to think about it, I don't think I'd use a regular expression for this.


I managed to do it by modifying my regular expression to add a lookahead after the "other" alternative that checks for any of the known groups. It basically says "find one of the following, with 'other' text running up to just before the next occurrence of another match".

m{(?xis)
( # one of the following
".*?"|'.*?'|\[.*?\] # quoted
| --.*?(?:\r\n|\n|\r) # comments at end of line
| /[*].*?[*]/ # comments in line
| \(|\) # brackets
| .+? # other
(?= # look ahead for one of the following
".*?"|'.*?'|\[.*?\] # quoted
| --.*?(?:\r\n|\n|\r) # comments at end of line
| /[*].*?[*]/ # comments in line
| \(|\) # brackets
)
)}g

It gives me the result I needed:

1: create view
2: "My View Name"
3: <space>
4: -- this is where I define the name\n
5: as select\n myFirstColumn ||
6: (
7: select me from mine
8: )
9: \n ,
10: ' a static literal'
11: as mySecondColumn
12: /* weird */

Obviously this RegEx is scanning some of the text twice, which is slightly inefficient. I thought that the /g option would be smart enough to interpret a .+? within the RegEx as meaning up to just before the next iteration of the grouping, but I guess not, so I have to include the lookahead to do this specifically.
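
The reason is that a lazy `.+?` with nothing after it is already satisfied by a single character, and a global match simply starts the next attempt where the previous one ended. A small sketch of the difference, using Python's `re.findall` as a stand-in for Perl's `/g` and a cut-down alternation (double quotes and brackets only) on a made-up input:

```python
import re

# Cut-down version of the first attempt: quoted, brackets, then "other".
naive = re.compile(r'".*?"|\(|\)|.+?', re.S)

# Same alternation, with "other" forced to run up to the next delimiter.
with_lookahead = re.compile(r'".*?"|\(|\)|.+?(?="|\(|\))', re.S)

text = 'ab "c d" (e)'
naive_chunks = naive.findall(text)
lookahead_chunks = with_lookahead.findall(text)

print(naive_chunks)       # "other" text comes back one character at a time
print(lookahead_chunks)   # "other" text comes back as whole runs
```

Here the first pattern splits `ab ` into three one-character matches, while the second returns it as a single chunk.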

And of course I can simplify the lookahead to only look for the starting characters of the quotes and comments, so the RegEx is:

m{(?xis)
( # one of the following
".*?"|'.*?'|\[.*?\] # quoted
| --.*?(?:\r\n|\n|\r) # comments at end of line
| /[*].*?[*]/ # comments in line
| \(|\) # brackets
| .+? # other
(?= # look ahead for one of the following
"|'|\[ # quoted
| -- # comments at end of line
| /[*] # comments in line
| \(|\) # brackets
)
)}g
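
For anyone wanting to try the simplified pattern outside Perl, here is a sketch of it in Python's `re` module, run against the sample from the first post (written without indentation). The names `CHUNK` and `sample` are just for the illustration. The `\Z` alternative in the lookahead is an addition that is not in the pattern above: without it, "other" text at the very end of the input has no delimiter after it and would not be matched.

```python
import re

CHUNK = re.compile(r"""(?xis)
(                               # one of the following
    ".*?" | '.*?' | \[.*?\]     # quoted
  | --.*?(?:\r\n|\n|\r)         # comments at end of line
  | /[*].*?[*]/                 # comments in line
  | \( | \)                     # brackets
  | .+?                         # other
    (?=                         # look ahead for one of the following
        " | ' | \[              # quoted
      | --                      # comments at end of line
      | /[*]                    # comments in line
      | \( | \)                 # brackets
      | \Z                      # assumption, not in the original: end of input
    )
)
""")

sample = (
    'create view "My View Name" -- this is where I define the name\n'
    'as select\n'
    'myFirstColumn || (select me from mine)\n'
    ", ' a static literal' as mySecondColumn /* weird */"
)

chunks = CHUNK.findall(sample)
for number, chunk in enumerate(chunks, start=1):
    print(f"{number}: {chunk!r}")
```

Joining the chunks back together should reproduce the input exactly, which is a quick way to confirm nothing was skipped.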

Quote:
Originally Posted by requinix
What programming language are you working with?


I'm using AppleScript, currently calling perl to do the actual regex, but may move that function into a scripting addition that does regex, or even some Cocoa subroutine. So I'm after a generic PCRE RegEx that I can use in whatever environment.

Thanks,
Tom
BareFeet
