#1
    REGEXP help


    I have a large chunk of text, a representative snippet of which looks something like this:
    Code:
    ... 139 Elegant Mews and other mews houses in the area, including 246 CHARMING MEWS SOUTH which was the last to be...
    From this chunk, I'd like to extract a list of mews streets, so:

    Elegant Mews
    Charming Mews South

    In plain English, I think the rules for extraction might look something like this:

    1. Find any occurrence of the word 'Mews' in which the 'M' is capitalized.

    2. Backtrack two or three words, until a number is found.

    3. If the word 'Mews' is immediately followed by one of the words 'North', 'South', 'East' or 'West', also include that word in the extraction.

    4. Echo each found occurrence on a new line, in title case.

    As always, any help appreciated.
  2. #2
  3. Sarcky
    PHP Code:
    $string = '... 139 Elegant Mews and other mews houses in the area, including 246 CHARMING MEWS SOUTH which was the last to be...';
    // Capital "M" is required; every other letter may be either case.
    preg_match_all("/\d+\s*([A-Z-a-z\s]+\s*M[Ee][Ww][Ss]\s*(?:[Nn][Oo][Rr][Tt][Hh]|[Ss][Oo][Uu][Tt][Hh]|[Ee][Aa][Ss][Tt]|[Ww][Ee][Ss][Tt])?)/", $string, $matches);
    echo implode("<br />\n", $matches[1]);
    Thread moved to the regex forum.

    -Dan
    HEY! YOU! Read the New User Guide and Forum Rules

    "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin

    "The greatest tragedy of this changing society is that people who never knew what it was like before will simply assume that this is the way things are supposed to be." -2600 Magazine, Fall 2002

    Think we're being rude? Maybe you asked a bad question or you're a Help Vampire. Trying to argue intelligently? Please read this.
  4. #3
  5. Contributing User
    Only because I'm bored of work for the moment..
    php Code:
    <?php
    error_reporting(E_ALL);
     
    // Will match "Elegant Mews" and "CHARMING MEWS SOUTH"
    $subject = '... 139 Elegant Mews and other mews houses in the area, including 246 CHARMING MEWS SOUTH which was the last to be...';
     
    // Will match "Elegant and other mews"
    // $subject = '... 139 Elegant and other mews houses in the area, including 246 CHARMING which was the last to be...';
     
    preg_match_all('/\d+\s+(.*? Mews\s?(North|South|East|West)?)/i', $subject, $results, PREG_PATTERN_ORDER);
     
    foreach ($results[1] AS $result) {
    	echo htmlentities(trim($result), ENT_COMPAT, 'UTF-8') .'<br />';
    }
  6. #4
  7. Sarcky
    Mine is so sloppy because the spec said he wanted to match only "Mews" with an initial capital letter, while everything else had to be case-insensitive, so I had to do the stupid M[Ee][Ww][Ss] thing.
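
One possible way around the per-letter character classes is PCRE's inline modifier groups, which switch case-insensitivity on for only part of a pattern. This is a minimal sketch rather than a drop-in replacement: the variable names are made up for illustration, it requires at least one whitespace character after the house number, and the `\b` word boundaries are an added assumption so that something like "Southampton" is not read as "South".

```php
<?php
$text = '... 139 Elegant Mews and other mews houses in the area, including 246 CHARMING MEWS SOUTH which was the last to be...';

// (?i:...) makes only the enclosed part case-insensitive, so the leading
// "M" of "Mews" must still be a capital while "ews" may be in any case.
$pattern = '/\d+\s+((?i:[a-z\s-]+?)\s+M(?i:ews)\b(?:\s+(?i:north|south|east|west)\b)?)/';

preg_match_all($pattern, $text, $found);

// $found[1] holds the captured street names, one per match.
echo implode("\n", $found[1]), "\n";
```

With the sample text this should capture "Elegant Mews" and "CHARMING MEWS SOUTH" and skip the lower-case "mews houses".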

    Neither one of us wrapped the output in something that would generate title-caps, but he can do that.
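
For the title-caps step, a rough sketch could look like the following. It assumes plain ASCII street names; the `$streets` array is hypothetical and simply stands in for the captured matches from either pattern above.

```php
<?php
// Hypothetical input: street names as captured by preg_match_all().
$streets = ['Elegant Mews ', 'CHARMING MEWS SOUTH'];

// Trim stray whitespace, lower-case everything, then capitalise the
// first letter of each word.
$titled = array_map(function ($street) {
    return ucwords(strtolower(trim($street)));
}, $streets);

// Echo each street on its own line, escaped for HTML output.
foreach ($titled as $street) {
    echo htmlentities($street, ENT_COMPAT, 'UTF-8'), "<br />\n";
}
```

Names with accented characters would probably need `mb_convert_case()` with `MB_CASE_TITLE` instead, since `strtolower()` and `ucwords()` are not multibyte-aware.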

    -Dan
  8. #5
    Brilliant, chaps. Brilliant.

    The speed and thoroughness of the responses is humbling.

    Comments on this post

    • ManiacDan agrees
