Regex Programming
 
Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
User Name:
Password:
Remember me

The Shed is going Social! Join us on FaceBook and Twitter and chime in on the conversation.

Go Back   Dev Shed ForumsProgramming Languages - MoreRegex Programming

Reply
Add This Thread To:
  Del.icio.us   Digg   Google   Spurl   Blink   Furl   Simpy   Y! MyWeb 
Thread Tools Search this Thread Rate Thread Display Modes
 
Unread Dev Shed Forums Sponsor:
  #1  
Old August 7th, 2012, 10:07 AM
Torsten79 Torsten79 is offline
Registered User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Aug 2012
Posts: 3 Torsten79 User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 1 h 25 m 9 sec
Reputation Power: 0
Find a text, but not inside another one

Hi all,

I'm searching for a regex which will find a text, but this text should not be inside another text (comment).

e.g. searching string "<data>" inside the following comment sign (starting "<!---" till "--->") should not be found
Code:
<!---
this is a comment about my <data>.
but the <data> should not be found 
--->

but on the other hand in this case are two possible hits
Code:
<data> hits first here.
<!---
this is a comment about my <data>.
but the <data> should not be found 
--->
And here comes hit no. 2 for <data>


Thanks in advance for your helps.

Reply With Quote
  #2  
Old August 7th, 2012, 11:35 AM
requinix's Avatar
requinix requinix is offline
Still alive
Click here for more information.
 
Join Date: Mar 2007
Location: Washington, USA
Posts: 12,717 requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)  Folding Points: 417516 Folding Title: Super Ultimate Folder - Level 1Folding Points: 417516 Folding Title: Super Ultimate Folder - Level 1Folding Points: 417516 Folding Title: Super Ultimate Folder - Level 1Folding Points: 417516 Folding Title: Super Ultimate Folder - Level 1Folding Points: 417516 Folding Title: Super Ultimate Folder - Level 1Folding Points: 417516 Folding Title: Super Ultimate Folder - Level 1
Time spent in forums: 5 Months 1 Week 4 Days 7 h 29 m 55 sec
Reputation Power: 8969
Send a message via AIM to requinix Send a message via MSN to requinix Send a message via Yahoo to requinix Send a message via Google Talk to requinix
If that's XML then you should be using XML methods, not regular expressions. Exactly how depends on what language(s) you're using.

Reply With Quote
  #3  
Old August 8th, 2012, 01:55 AM
Torsten79 Torsten79 is offline
Registered User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Aug 2012
Posts: 3 Torsten79 User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 1 h 25 m 9 sec
Reputation Power: 0
Hi requinix,

Thanks for your reply.

But its no XML, its a Coldfusion source file - so you should expect it as a normal string. I will also not only search for certain tags, but searching for normal text too.

With "(<!---.*?--->)" I can find any comment. My question is now how to invert this regex, so it would find any text outside of any comment and at the next step this regex should also find the text I'm searching for ("<data>" in the example above).

Reply With Quote
  #4  
Old August 8th, 2012, 02:00 AM
requinix's Avatar
requinix requinix is offline
Still alive
Click here for more information.
 
Join Date: Mar 2007
Location: Washington, USA
Posts: 12,717 requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)  Folding Points: 417516 Folding Title: Super Ultimate Folder - Level 1Folding Points: 417516 Folding Title: Super Ultimate Folder - Level 1Folding Points: 417516 Folding Title: Super Ultimate Folder - Level 1Folding Points: 417516 Folding Title: Super Ultimate Folder - Level 1Folding Points: 417516 Folding Title: Super Ultimate Folder - Level 1Folding Points: 417516 Folding Title: Super Ultimate Folder - Level 1
Time spent in forums: 5 Months 1 Week 4 Days 7 h 29 m 55 sec
Reputation Power: 8969
Send a message via AIM to requinix Send a message via MSN to requinix Send a message via Yahoo to requinix Send a message via Google Talk to requinix
Then it's HTML (or looks like it)? DOM is the answer.

I'm trying to steer you away from regular expressions because it's a nightmare to write something that respects HTML or XML grammar. Simply ignoring comments is not simple.

Reply With Quote
  #5  
Old August 8th, 2012, 02:45 AM
Jacques1's Avatar
Jacques1 Jacques1 is online now
pollyanna
Click here for more information.
 
Join Date: Jul 2012
Location: Germany
Posts: 1,881 Jacques1 User rank is Lieutenant General (80000 - 90000 Reputation Level)Jacques1 User rank is Lieutenant General (80000 - 90000 Reputation Level)Jacques1 User rank is Lieutenant General (80000 - 90000 Reputation Level)Jacques1 User rank is Lieutenant General (80000 - 90000 Reputation Level)Jacques1 User rank is Lieutenant General (80000 - 90000 Reputation Level)Jacques1 User rank is Lieutenant General (80000 - 90000 Reputation Level)Jacques1 User rank is Lieutenant General (80000 - 90000 Reputation Level)Jacques1 User rank is Lieutenant General (80000 - 90000 Reputation Level)Jacques1 User rank is Lieutenant General (80000 - 90000 Reputation Level)Jacques1 User rank is Lieutenant General (80000 - 90000 Reputation Level)Jacques1 User rank is Lieutenant General (80000 - 90000 Reputation Level)Jacques1 User rank is Lieutenant General (80000 - 90000 Reputation Level)Jacques1 User rank is Lieutenant General (80000 - 90000 Reputation Level)Jacques1 User rank is Lieutenant General (80000 - 90000 Reputation Level)Jacques1 User rank is Lieutenant General (80000 - 90000 Reputation Level) 
Time spent in forums: 1 Month 2 Weeks 2 Days 8 h 56 m 41 sec
Reputation Power: 813
Hi,

You might look for both comments and the actual search pattern and then skip the comments afterwards.

i. e.
Code:
/<!---.*?--->|YOURPATTERN/


However, this isn't really a good solution. I fully agree with requinix that you should use a parser rather than fumble with regular expressions.

Reply With Quote
  #6  
Old August 9th, 2012, 02:52 AM
Torsten79 Torsten79 is offline
Registered User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Aug 2012
Posts: 3 Torsten79 User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 1 h 25 m 9 sec
Reputation Power: 0
Well, thanks for your efforts. So it seems that there is no way to skip text within a search.

I solved it now with two steps as suggested by Jaques1. First I delete all unwanted "<!---.*?--->" from text. In second step I'm searching for the "<data>".

DOM and other possibilities are too complex (and therefore time consumpting) for only looking if "<data>" is inside a text or not.

Reply With Quote
  #7  
Old August 9th, 2012, 05:14 PM
requinix's Avatar
requinix requinix is offline
Still alive
Click here for more information.
 
Join Date: Mar 2007
Location: Washington, USA
Posts: 12,717 requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)requinix User rank is General 120th Grade (Above 100000 Reputation Level)  Folding Points: 417516 Folding Title: Super Ultimate Folder - Level 1Folding Points: 417516 Folding Title: Super Ultimate Folder - Level 1Folding Points: 417516 Folding Title: Super Ultimate Folder - Level 1Folding Points: 417516 Folding Title: Super Ultimate Folder - Level 1Folding Points: 417516 Folding Title: Super Ultimate Folder - Level 1Folding Points: 417516 Folding Title: Super Ultimate Folder - Level 1
Time spent in forums: 5 Months 1 Week 4 Days 7 h 29 m 55 sec
Reputation Power: 8969
Send a message via AIM to requinix Send a message via MSN to requinix Send a message via Yahoo to requinix Send a message via Google Talk to requinix
Quote:
Originally Posted by Torsten79
DOM and other possibilities are too complex

Huh. Then you must be using some language that doesn't feature any kind of DOM support whatsoever. Congratulations on being an exception to the rule.

Reply With Quote
  #8  
Old August 11th, 2012, 05:32 PM
spacebar208's Avatar
spacebar208 spacebar208 is offline
Contributing User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Apr 2012
Location: spaceBAR Central
Posts: 191 spacebar208 User rank is Sergeant Major (2000 - 5000 Reputation Level)spacebar208 User rank is Sergeant Major (2000 - 5000 Reputation Level)spacebar208 User rank is Sergeant Major (2000 - 5000 Reputation Level)spacebar208 User rank is Sergeant Major (2000 - 5000 Reputation Level)spacebar208 User rank is Sergeant Major (2000 - 5000 Reputation Level)spacebar208 User rank is Sergeant Major (2000 - 5000 Reputation Level) 
Time spent in forums: 2 Days 9 h 59 m 25 sec
Reputation Power: 41
If I understand you correctly, You can do it with sed:
Code:
$ cat t
<data> hits first here.
<!---
this is a comment about my <data>.
but the <data> should not be found
--->
And here comes hit no. 2 for <data>


Print lines with search data ignoring comment blocks:
Code:
$ sed -n -e '/^<\!---/,/^--->/d' -e '/<data>/p' t
<data> hits first here.
And here comes hit no. 2 for <data>


Print lines with search data ignoring comment blocks and their line numbers:
Code:
$ sed -n -e '/^<\!---/,/^--->/d' -e '/<data>/=;p' t
1
<data> hits first here.
6
And here comes hit no. 2 for <data>


And a couple of examples of printing line number on same line with text found:
Code:
$ sed = t | sed 'N;s/\n/\t/' | sed -n -e '/^[0-9]\{1,\}\t<\!---/,/^[0-9]\{1,\}\t--->/d' -e '/<data>/p'
1       <data> hits first here.
6       And here comes hit no. 2 for <data>

$ sed = t | sed 'N;s/\n/ - /' | sed -n -e '/^[0-9]\{1,\} - <\!---/,/^[0-9]\{1,\} - --->/d' -e '/<data>/p'
1 - <data> hits first here.
6 - And here comes hit no. 2 for <data>

Reply With Quote
Reply

Viewing: Dev Shed ForumsProgramming Languages - MoreRegex Programming > Find a text, but not inside another one

Developer Shed Advertisers and Affiliates



Thread Tools  Search this Thread 
Search this Thread:

Advanced Search
Display Modes  Rate This Thread 
Rate This Thread:


Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
View Your Warnings | New Posts | Latest News | Latest Threads | Shoutbox
Forum Jump

Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
  
 


Powered by: vBulletin Version 3.0.5
Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.

© 2003-2013 by Developer Shed. All rights reserved. DS Cluster - Follow our Sitemap