#1

    Quoted String regex


    Hi people,

    I want to match and extract quoted strings from HTML text (but not the ones inside HTML tags).

    ('([^\\']|\\.)*'|"([^\\"]|\\.)*") does the job of selecting quoted strings: the first alternative matches single-quoted strings and the second matches double-quoted ones.

    But it also picks up the values of HTML tag attributes.

    For example:

    <p class="strong"> Here its mostly sunny. But it is raining outside.</p>
    <span id="new" class="strong"> What you see now is "Some quoted text". This is 'single-quoted text'.</span>

    The regex also matches "strong" and "new", which I don't want. Any ideas on how to modify the regex?
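To make the problem concrete, here is a minimal reproduction sketch in PHP (chosen because it is the language of the later reply in this thread; the asker works in Perl). It assumes the display spaces in the question's pattern are dropped, since without the x modifier they would be matched literally. The pattern is written as a nowdoc so its backslashes need no extra escaping.

```php
<?php
// Sample HTML from the question.
$html = '<p class="strong"> Here its mostly sunny. But it is raining outside.</p>
<span id="new" class="strong"> What you see now is "Some quoted text". This is \'single-quoted text\'.</span>';

// The question's regex; group 1 wraps the whole alternation.
$pattern = <<<'RE'
/('([^\\']|\\.)*'|"([^\\"]|\\.)*")/
RE;

preg_match_all($pattern, $html, $m);
$quoted = $m[1];
// Attribute values ("strong", "new") come back alongside the real quotes.
print_r($quoted);
```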
#2

    Just strip out the tags beforehand. PHP has a strip_tags function for that exact purpose.
    If your language doesn't have something similar, replace every match of /<[^>]*>/ with an empty string.
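A minimal PHP sketch of that two-step approach, assuming the sample HTML from the question and the asker's regex with its display spaces removed. In Perl, the asker's language, the stripping step would be s/<[^>]*>//g.

```php
<?php
// Sample HTML from the question.
$html = '<p class="strong"> Here its mostly sunny. But it is raining outside.</p>
<span id="new" class="strong"> What you see now is "Some quoted text". This is \'single-quoted text\'.</span>';

// Step 1: drop the tags, and with them every attribute value.
$text = strip_tags($html);
// Regex fallback for languages without strip_tags():
// $text = preg_replace('/<[^>]*>/', '', $html);

// Step 2: the quoted-string regex from the question (spaces removed).
$pattern = <<<'RE'
/('([^\\']|\\.)*'|"([^\\"]|\\.)*")/
RE;

preg_match_all($pattern, $text, $m);
$quoted = $m[0];   // full matches, quotes included
print_r($quoted);
```

The regex fallback assumes no literal ">" appears inside an attribute value; strip_tags is the safer choice where it is available.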
#3

    Also, don't forget the HTML character entities for quotes: &quot; is for double quotes. I don't remember singles off the top of my head.
    Last edited by ishnid; October 13th, 2008 at 07:15 AM.
#4

    Originally Posted by ishnid
    I don't remember singles off the top of my head.
    It's &#39;, and sometimes &apos; as well (certain browsers don't like that one, though).
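One way to handle those entities, sketched in PHP on a hypothetical input string: strip the tags first, then decode, so that an escaped &lt;b&gt; in the text cannot turn into a real-looking tag that strip_tags would then remove. ENT_QUOTES makes html_entity_decode convert both quote kinds, and ENT_HTML5 is needed for &apos;, which the HTML 4.01 entity table lacks (likely why some older browsers rejected it).

```php
<?php
// Hypothetical input where the quotes arrive as character entities.
$html = '<p class="x">Say &quot;hello&quot; or &#39;hi&#39; or &apos;hey&apos;.</p>';

// Strip tags first, then turn &quot;, &#39; and &apos; into real quotes.
$text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

// The quoted-string regex from the question (spaces removed).
$pattern = <<<'RE'
/('([^\\']|\\.)*'|"([^\\"]|\\.)*")/
RE;

preg_match_all($pattern, $text, $m);
print_r($m[0]);
```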
#5

    Thank you all for the replies.

    @requinix: My language is Perl. And sorry, I don't get what you're trying to say.

    I want to be able to match quoted strings other than the ones in the HTML tags. Even if I strip off the tags, the attribute values will still match the regex.

    @ishnid and ryon420: Yeah, I will keep the HTML entities in mind.
#6

    Originally Posted by m4st3rm1nd
    @requinix: My language is Perl. And sorry, I don't get what you're trying to say.

    I want to be able to match quoted strings other than the ones in the HTML tags. Even if I strip off the tags, the attribute values will still match the regex.
    If you strip out the tags, the attribute values won't be there anymore, so they can't possibly match.
#7

    Oh, that's right, got it. Don't know what I was thinking earlier. I ain't a morning person, as you can tell.
#8

    Something like...

    PHP Code:
    <?php

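    $str = 'eg. <p class="strong"> Here its mostly sunny. But it is raining outside.</p>
    <span id="new" class="strong"> What you see now is "Some quoted text". This is \'single-quoted text\'.</span>';

    // (?![^<]+>) rejects any position where the next ">" comes before the next "<",
    // i.e. positions inside a tag, such as attribute values.
    // The U modifier makes (.+) lazy, so each match stops at the next quote.
    preg_match_all("/(?![^<]+>)(?:\"|')(.+)(?:\"|')/U", $str, $out);

    print_r($out[1]);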

    ?>

    Or, if you also don't want to match quoted text inside <a>...</a> elements, it would be:


    PHP Code:
    preg_match_all("/(?!(?:[^<]+>|[^>]+\<\/a\>))(?:\"|')(.+)(?:\"|')/U", $str, $out);
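Both patterns in this post let either quote character close a match, so a double-quoted phrase containing an apostrophe ends early. A minimal sketch of one alternative, swapping in a backreference (\1) so the closing quote must be the same character as the opening one; the sample string is hypothetical and the tag-skipping lookahead is kept unchanged.

```php
<?php
// Hypothetical sample: a double-quoted phrase containing an apostrophe.
$str = '<p class="note">He said "it\'s fine" and left.</p>';

// Pattern from the post above: either quote type may close the match.
preg_match_all("/(?![^<]+>)(?:\"|')(.+)(?:\"|')/U", $str, $loose);

// Variant: capture the opening quote and require the same one to close it
// via the backreference \1. The lookahead still skips text inside tags.
preg_match_all('/(?![^<]+>)(["\'])(.+?)\1/', $str, $paired);

print_r($loose[1]);   // stops at the apostrophe
print_r($paired[2]);  // keeps the whole phrase
```

Apostrophes in single-quoted text (e.g. 'it's') would still end the match early with either pattern; that case needs the backslash-escape convention from the question or some other rule.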
