#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2008
    Posts
    14
    Rep Power
    0

    Difficulty with line breaks


    I'm not really good with regex, as I don't use it often. And I'm giving it my best shot. I have text like this:
    page: 404;http://www.sbcri.info:80/talkingbrochure2/data/swf/notes/notesbig4.swf
    page found /help/error_pages/404b.php
    user: John Smith
    email:
    IP: http://ws.arin.net/whois/?queryinput=66.222.124.242
    Number of pages found: 0
    Suggestions:


    referring page: http://www.sbcri.info/talkingbrochure2/player/playershell.swf
    404 version: 060111

    User agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/530.5 (KHTML, like Gecko) Chrome/2.0.172.33 Safari/530.5

    Sent to: Brad (98), Jay (99)
    I want to capture several pieces of information, but I'll start with just two: the URL after "page: 404;" and the user name. Here's the statement I have right now, which isn't working. I think I'm getting stuck on the line returns or something:

    PHP Code:
    preg_match('/page: 404;(?P<URL>.+)\nuser= (?P<user>.+)/sm', $Body, $match);
    Can you help me with the expression, and give me a clue what I was doing wrong?

    I greatly appreciate the help!

    Jay
  2. #2
  3. No Profile Picture
    User 165270
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2005
    Posts
    497
    Rep Power
    938
    You're using the m (multi-line) and s (DOT-ALL) modifiers, but you only need DOT-ALL. With the multi-line modifier, the regex meta characters ^ and $ match not only the start and end of the entire string, but also the start and end of each line in the text. Since you're not using these meta characters, you can leave that modifier out. DOT-ALL causes the DOT to match new line characters as well (by default it doesn't). When enabling this option, you have to be careful with greedy DOT-PLUS and DOT-STAR in your regex, because they will "eat" everything up to the end of the string! I've made them reluctant (un-greedy) by adding a question mark after them:

    Code:
    .*?
    And I also replaced some of them with a negated character class:

    Code:
    [^\r\n]+
    which means: match one or more characters of any type, except new line characters.
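
    To make the difference concrete, here is a minimal sketch comparing a greedy DOT-PLUS, a reluctant one, and the negated character class under the s modifier. The sample string and the variable names ($greedy, $lazy, $negated) are made up for illustration and are not taken from the original log.

    ```php
    <?php
    // Hypothetical sample, invented for this illustration.
    $text = "page: 404;http://example.com/a.swf\nuser: John Smith\nemail:\n";

    // Greedy .+ with the s modifier: the dot also matches newlines,
    // so the capture runs all the way to the end of the string.
    preg_match('/page: 404;(?P<URL>.+)/s', $text, $greedy);

    // Reluctant .+? expands only until the rest of the pattern can match.
    preg_match('/page: 404;(?P<URL>.+?)\nuser:/s', $text, $lazy);

    // Negated character class: it can never cross a line break,
    // with or without the s modifier.
    preg_match('/page: 404;(?P<URL>[^\r\n]+)/', $text, $negated);

    var_dump($greedy['URL']);   // spans several lines
    var_dump($lazy['URL']);     // the URL only
    var_dump($negated['URL']);  // the URL only
    ```

    The negated class is usually the safer choice for "rest of this line" captures, because its behaviour does not depend on what follows it in the pattern.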

    So, the final regex might look like this:

    PHP Code:
    preg_match('/page:\s+404;(?P<URL>[^\r\n]+).*?user:\s+(?P<user>[^\r\n]+)/s', $text, $match);
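
    As a rough check, the pattern can be run against the first few lines of the message from the question. This is only a sketch: the \r\n line endings are an assumption (the real message body may use plain \n, which the pattern handles equally well), and $pattern is just a helper name. Note that the pattern looks for "user:" as it appears in the text, where the original attempt had "user=", and that .*? is what skips over the "page found" line in between.

    ```php
    <?php
    // Sample body assembled from the opening lines of the message in the
    // question; the \r\n line endings are assumed for illustration.
    $text = "page: 404;http://www.sbcri.info:80/talkingbrochure2/data/swf/notes/notesbig4.swf\r\n"
          . "page found /help/error_pages/404b.php\r\n"
          . "user: John Smith\r\n"
          . "email:\r\n";

    $pattern = '/page:\s+404;(?P<URL>[^\r\n]+).*?user:\s+(?P<user>[^\r\n]+)/s';

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    if (preg_match($pattern, $text, $match) === 1) {
        echo $match['URL'], "\n";   // named group URL
        echo $match['user'], "\n";  // named group user
    } else {
        echo "no match\n";
    }
    ```

    One caveat: \s also matches line breaks, so if the user field were ever empty, user:\s+ would run on to the next line and capture that instead. Using [ \t]* in place of \s+ would be a stricter alternative.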
