Find a text, but not inside another one

August 7th, 2012, 10:07 AM
Registered User
Join Date: Aug 2012
Posts: 3
Hi all,
I'm looking for a regex that finds a piece of text, but only when that text is not inside another text (a comment).
For example, when searching for the string "<data>", occurrences inside the following comment (which starts with "<!---" and ends with "--->") should not be found:
Code:
<!---
this is a comment about my <data>.
but the <data> should not be found
--->
On the other hand, in this case there are two valid hits:
Code:
<data> hits first here.
<!---
this is a comment about my <data>.
but the <data> should not be found
--->
And here comes hit no. 2 for <data>
Thanks in advance for your help.

August 7th, 2012, 11:35 AM
Still alive
Join Date: Mar 2007
Location: Washington, USA
If that's XML then you should be using XML methods, not regular expressions. Exactly how depends on what language(s) you're using.

August 8th, 2012, 01:55 AM
Registered User
Hi requinix,
Thanks for your reply.
But it's not XML, it's a ColdFusion source file, so you should treat it as a plain string. I also won't be searching only for certain tags, but for ordinary text too.
With "(<!---.*?--->)" I can find any comment. My question now is how to invert this regex so that it matches any text outside of a comment and, as a next step, also finds the text I'm searching for ("<data>" in the example above).

August 8th, 2012, 02:00 AM
Still alive
Then it's HTML (or looks like it)? DOM is the answer.
I'm trying to steer you away from regular expressions because it's a nightmare to write something that respects HTML or XML grammar. Simply ignoring comments is not simple.
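As a rough illustration of the parser route, here is a minimal sketch in Python (the thread does not settle on a host language, so Python is an assumption, as are the names `TagFinder`, `find_tag_lines` and `SAMPLE`). It relies on the standard library's `html.parser`, which reports comments separately and therefore never calls the start-tag handler for markup inside them. It assumes the file is HTML-like enough to parse and that comments are not nested.

```python
from html.parser import HTMLParser

# The sample text from the first post.
SAMPLE = """<data> hits first here.
<!---
this is a comment about my <data>.
but the <data> should not be found
--->
And here comes hit no. 2 for <data>"""


class TagFinder(HTMLParser):
    """Records the line number of each start tag with a given name."""

    def __init__(self, tag_name):
        super().__init__()
        self.tag_name = tag_name
        self.hit_lines = []

    def handle_starttag(self, tag, attrs):
        # Markup inside a comment goes to handle_comment instead,
        # so it never reaches this method.
        if tag == self.tag_name:
            line, _column = self.getpos()
            self.hit_lines.append(line)


def find_tag_lines(text, tag_name="data"):
    """Return the line numbers of <tag_name> tags outside comments."""
    finder = TagFinder(tag_name)
    finder.feed(text)
    finder.close()
    return finder.hit_lines


print(find_tag_lines(SAMPLE))
```

To search for ordinary text rather than tags, the same class could override `handle_data` instead.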

August 8th, 2012, 02:45 AM
pollyanna
Join Date: Jul 2012
Location: Germany
Hi,
You might look for both comments and the actual search pattern and then skip the comments afterwards.
i.e.
Code:
/<!---.*?--->|YOURPATTERN/
However, this isn't really a good solution. I fully agree with requinix that you should use a parser rather than fumble with regular expressions.
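A minimal sketch of that idea in Python (an assumed language; `find_outside_comments` and `SAMPLE` are hypothetical names): the comment alternative comes first and consumes whole comments, the search pattern sits in a capture group, and only matches where that group took part are kept. It assumes comments are not nested.

```python
import re

# The sample text from the first post.
SAMPLE = """<data> hits first here.
<!---
this is a comment about my <data>.
but the <data> should not be found
--->
And here comes hit no. 2 for <data>"""


def find_outside_comments(text, pattern):
    """Return start offsets of `pattern` matches that are not inside a comment."""
    # (?s:...) lets "." match newlines inside the comment part only,
    # so the caller's pattern keeps its usual semantics.
    combined = re.compile(r"(?s:<!---.*?--->)|(" + pattern + ")")
    hits = []
    for match in combined.finditer(text):
        # Group 1 is None when the comment alternative matched: skip those.
        if match.group(1) is not None:
            hits.append(match.start(1))
    return hits


print(find_outside_comments(SAMPLE, re.escape("<data>")))
```

Because the search pattern is wrapped in group 1, any numbered groups or backreferences inside it shift by one.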

August 9th, 2012, 02:52 AM
Registered User
Well, thanks for your efforts. So it seems there is no way to skip text within a single search.
I have now solved it in two steps, as suggested by Jaques1. First I delete everything matching "<!---.*?--->" from the text. In the second step I search for "<data>".
DOM and other possibilities are too complex (and therefore too time-consuming) for just checking whether "<data>" occurs in a text or not.
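The two steps might look roughly like this in Python (an assumed language; the helper names `strip_comments` and `contains_outside_comments` are hypothetical). The sketch assumes comments are not nested and that "." must be allowed to match newlines, since the comments span several lines.

```python
import re

# The sample text from the first post.
SAMPLE = """<data> hits first here.
<!---
this is a comment about my <data>.
but the <data> should not be found
--->
And here comes hit no. 2 for <data>"""

# re.DOTALL lets ".*?" run across line breaks inside a comment.
COMMENT_RE = re.compile(r"<!---.*?--->", re.DOTALL)


def strip_comments(text):
    """Step 1: delete every <!--- ... ---> block from the text."""
    return COMMENT_RE.sub("", text)


def contains_outside_comments(text, needle):
    """Step 2: plain substring search on the comment-free text."""
    return needle in strip_comments(text)


print(contains_outside_comments(SAMPLE, "<data>"))
```

One caveat with deleting outright: the text on either side of a removed comment is joined, which can create a match that was not in the original. Substituting a space instead of the empty string avoids that.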

August 9th, 2012, 05:14 PM
Still alive
Quote (originally posted by Torsten79): "DOM and other possibilities are too complex"
Huh. Then you must be using some language that doesn't feature any kind of DOM support whatsoever. Congratulations on being an exception to the rule.

August 11th, 2012, 05:32 PM
Contributing User
Join Date: Apr 2012
Location: spaceBAR Central
If I understand you correctly, you can do it with sed:
Code:
$ cat t
<data> hits first here.
<!---
this is a comment about my <data>.
but the <data> should not be found
--->
And here comes hit no. 2 for <data>
Print lines with search data ignoring comment blocks:
Code:
$ sed -n -e '/^<\!---/,/^--->/d' -e '/<data>/p' t
<data> hits first here.
And here comes hit no. 2 for <data>
Print lines with search data ignoring comment blocks and their line numbers:
Code:
$ sed -n -e '/^<\!---/,/^--->/d' -e '/<data>/{=;p;}' t
1
<data> hits first here.
6
And here comes hit no. 2 for <data>
And a couple of examples that print the line number on the same line as the matching text:
Code:
$ sed = t | sed 'N;s/\n/\t/' | sed -n -e '/^[0-9]\{1,\}\t<\!---/,/^[0-9]\{1,\}\t--->/d' -e '/<data>/p'
1 <data> hits first here.
6 And here comes hit no. 2 for <data>
$ sed = t | sed 'N;s/\n/ - /' | sed -n -e '/^[0-9]\{1,\} - <\!---/,/^[0-9]\{1,\} - --->/d' -e '/<data>/p'
1 - <data> hits first here.
6 - And here comes hit no. 2 for <data>
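For comparison, a similar line-numbered report can be sketched in Python (an assumed language; `mask_comments` and `hits_with_line_numbers` are hypothetical names). Instead of deleting comment lines, it overwrites each comment with spaces while keeping the newlines, so line numbers stay aligned with the original file. Unlike the sed address range, this does not require the comment markers to start at the beginning of a line. It assumes comments are not nested.

```python
import re

# The sample text from the first post.
SAMPLE = """<data> hits first here.
<!---
this is a comment about my <data>.
but the <data> should not be found
--->
And here comes hit no. 2 for <data>"""

COMMENT_RE = re.compile(r"<!---.*?--->", re.DOTALL)


def mask_comments(text):
    """Blank out comments, keeping newlines so line numbers do not shift."""
    return COMMENT_RE.sub(lambda m: re.sub(r"[^\n]", " ", m.group(0)), text)


def hits_with_line_numbers(text, needle):
    """Return (line number, original line) for lines with a hit outside comments."""
    masked_lines = mask_comments(text).split("\n")
    original_lines = text.split("\n")
    results = []
    for number, (masked, original) in enumerate(
        zip(masked_lines, original_lines), start=1
    ):
        if needle in masked:
            results.append((number, original))
    return results


for number, line in hits_with_line_numbers(SAMPLE, "<data>"):
    print(f"{number} - {line}")
```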