Quoted String regex

October 13th, 2008, 02:58 AM
Contributing User
Join Date: Jul 2008
Location: USA
Posts: 42
Time spent in forums: 12 h 33 m 37 sec
Reputation Power: 5
Quoted String regex
Hi people,
I want to match and extract quoted strings from HTML text, but not the ones inside the HTML tags.
( '([^\\']|\\.)*' | "([^\\"]|\\.)*" ) <- does the job of selecting quoted strings: the first alternative handles single-quoted strings and the second handles double-quoted ones.
But it also picks up the HTML tag attribute values.
eg. <p class="strong"> Here its mostly sunny. But it is raining outside.</p>
<span id="new" class="strong"> What you see now is "Some quoted text". This is 'single-quoted text'.</span>
The regex will also match "strong" and "new", which I don't want. Any ideas on how to modify the regex?
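The behaviour described in this post can be reproduced with a short sketch. Python is used here purely for illustration (the thread itself uses Perl and PHP), and the names QUOTED and SAMPLE are made up for the example. The pattern is the one from the post, with the groups made non-capturing.

```python
import re

# The pattern from the post, in verbose mode so the layout whitespace is ignored.
QUOTED = re.compile(r"""
    '(?:[^\\']|\\.)*'      # single-quoted string, backslash escapes allowed
  | "(?:[^\\"]|\\.)*"      # double-quoted string, backslash escapes allowed
""", re.VERBOSE)

# The sample HTML from the post.
SAMPLE = (
    '<p class="strong"> Here its mostly sunny. But it is raining outside.</p>\n'
    '<span id="new" class="strong"> What you see now is "Some quoted text". '
    "This is 'single-quoted text'.</span>"
)

matches = [m.group(0) for m in QUOTED.finditer(SAMPLE)]
print(matches)
# prints ['"strong"', '"new"', '"strong"', '"Some quoted text"', "'single-quoted text'"]
```

The first three matches are attribute values, which is exactly the unwanted part.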

October 13th, 2008, 04:29 AM
Still alive
Join Date: Mar 2007
Location: Washington, USA
Just strip out the tags beforehand. PHP has a strip_tags function for that exact purpose.
If your language doesn't have something similar, replace /<[^>]*>/ with nothing and then run your quoted-string regex on what is left.
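A minimal sketch of this two-step approach, written in Python only as an illustration (the asker's language is Perl, where the same substitution applies). The function name quoted_strings_outside_tags and the constant names are hypothetical. The tag pattern is the simple one from the reply and assumes no attribute value contains a literal ">".

```python
import re

# Tag pattern from the reply: "<", anything that is not ">", then ">".
TAG = re.compile(r"<[^>]*>")

# Quoted-string pattern from the original question (verbose mode).
QUOTED = re.compile(r"""
    '(?:[^\\']|\\.)*'      # single-quoted string, backslash escapes allowed
  | "(?:[^\\"]|\\.)*"      # double-quoted string, backslash escapes allowed
""", re.VERBOSE)

def quoted_strings_outside_tags(html_text):
    """Remove the tags first, then collect quoted strings from the remaining text."""
    text = TAG.sub("", html_text)
    return [m.group(0) for m in QUOTED.finditer(text)]

SAMPLE = (
    '<p class="strong"> Here its mostly sunny. But it is raining outside.</p>\n'
    '<span id="new" class="strong"> What you see now is "Some quoted text". '
    "This is 'single-quoted text'.</span>"
)

print(quoted_strings_outside_tags(SAMPLE))
# prints ['"Some quoted text"', "'single-quoted text'"]
```

Because the attributes disappear together with their tags, "strong" and "new" are no longer candidates for the second regex.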

October 13th, 2008, 05:37 AM
kill 9, $$;
Join Date: Sep 2001
Location: Shanghai, An tSín
Also, don't forget the HTML character entities for quotes - &quot; is for double quotes. I don't remember singles off the top of my head.
Last edited by ishnid : October 13th, 2008 at 07:15 AM.

October 13th, 2008, 05:46 AM
Permanently Banned
Join Date: Jun 2006
Location: In a whale
Quote: I don't remember singles off the top of my head.

&#39; and, at times, &apos; as well (certain browsers don't like this one, though).
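One way to account for these entities is to decode them after the tags are gone and before the quoted-string regex runs. The following is a sketch in Python, chosen only for illustration. The function name quoted_strings_after_decoding is hypothetical, and the tag pattern is the simple /<[^>]*>/ suggested earlier in the thread. html.unescape is the standard-library call that turns &quot;, &#39; and &apos; into literal quote characters.

```python
import html
import re

TAG = re.compile(r"<[^>]*>")

QUOTED = re.compile(r"""
    '(?:[^\\']|\\.)*'      # single-quoted string, backslash escapes allowed
  | "(?:[^\\"]|\\.)*"      # double-quoted string, backslash escapes allowed
""", re.VERBOSE)

def quoted_strings_after_decoding(html_text):
    """Strip tags, decode character entities, then collect quoted strings."""
    # Strip first: decoding first could turn &lt; and &gt; into fake tags.
    text = TAG.sub("", html_text)
    text = html.unescape(text)
    return [m.group(0) for m in QUOTED.finditer(text)]

SAMPLE = (
    '<p class="note">She said &quot;hello&quot;, '
    "then &#39;bye&#39; and &apos;later&apos;.</p>"
)

print(quoted_strings_after_decoding(SAMPLE))
# prints ['"hello"', "'bye'", "'later'"]
```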

October 13th, 2008, 11:13 AM
Contributing User
Thank you all for the replies.
@requinix: My language is Perl. And sorry, I don't get what you are trying to say.
I want to be able to match quoted strings other than the ones in the HTML tags. Even if I strip off the tags, the attribute values will match the regex.
@ishnid and ryon420: Yeah, I will keep the HTML entities in mind.

October 13th, 2008, 11:29 AM
kill 9, $$;
Join Date: Sep 2001
Location: Shanghai, An tSín
Quote: Originally Posted by m4st3rm1nd
@requinix: My language is Perl. And sorry, I don't get what you are trying to say.
I want to be able to match quoted strings other than the ones in the HTML tags. Even if I strip off the tags, the attribute values will match the regex.

If you strip out the tags, the attribute values won't be there anymore, so they can't possibly match.

October 13th, 2008, 12:33 PM
Contributing User
Oh, that's right. Got it. Don't know what I was thinking earlier. I ain't a morning person, as you can tell.

October 14th, 2008, 08:23 AM
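Something like...
PHP Code:
<?php
// Sample input from the question: quotes appear in tag attributes and in the text.
$str = 'eg. <p class="strong"> Here its mostly sunny. But it is raining outside.</p>
<span id="new" class="strong"> What you see now is "Some quoted text". This is \'single-quoted text\'.</span>';
// (?![^<]+>) rejects quotes inside a tag; \1 makes the closing quote match the opening one.
preg_match_all ( '/(?![^<]+>)(["\'])(.+?)\1/', $str, $out );
print_r ( $out[2] ); // the quoted text, without the surrounding quotes
?>
Or, if you don't want to match inside <a>...</a> elements either, it would be:
PHP Code:
// Same pattern, plus an alternative that rejects quotes followed by </a> before any ">".
preg_match_all ( '/(?!(?:[^<]+>|[^>]+<\/a>))(["\'])(.+?)\1/', $str, $out );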