    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2008

    Difficulty with line breaks

    I'm not very good with regex, as I don't use it often, but I'm giving it my best shot. I have text like this:
    page: 404;http://www.sbcri.info:80/talkingbrochure2/data/swf/notes/notesbig4.swf
    page found /help/error_pages/404b.php
    user: John Smith
    IP: http://ws.arin.net/whois/?queryinput=
    Number of pages found: 0

    referring page: http://www.sbcri.info/talkingbrochure2/player/playershell.swf
    404 version: 060111

    User agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/530.5 (KHTML, like Gecko) Chrome/ Safari/530.5

    Sent to: Brad (98), Jay (99)
    I want to capture several pieces of information, but I'll start with just two: the URL after "page: 404;" and the user name. Here's the statement I have right now, which isn't working. I seem to be getting stuck on the line breaks or something:

    PHP Code:
    preg_match('/page: 404;(?P<URL>.+)\nuser= (?P<user>.+)/sm', $Body, $match);
    Can you help me with the expression, and give me a clue what I was doing wrong?

    I greatly appreciate the help!

  2. #2
    User 165270
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2005
    You're using both the m (multi-line) and s (DOT-ALL) modifiers, while you only need DOT-ALL. With the multi-line modifier, the regex metacharacters ^ and $ match not only at the start and end of the entire string, but also at the start and end of each line in the text. Since you're not using these metacharacters, you can leave the m modifier out. DOT-ALL causes the DOT to match new line characters as well (by default it doesn't). With this option enabled, you have to be careful with the greedy DOT-PLUS and DOT-STAR in your regex, because they will "eat" everything up to the end of the string, line breaks included! I've made them reluctant (un-greedy) by adding a question mark after them, i.e. .+? instead of .+

    And I also replaced some of them with a negated character class such as [^\r\n]+, which means: match one or more characters of any type, except new line characters.
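    To make the difference concrete, here is a minimal sketch comparing a greedy DOT-PLUS, a reluctant DOT-PLUS and a negated character class, all with the s modifier enabled. The two-line $text and the example.com URL are made up for illustration and are not the message from the question.

```php
<?php
// Hypothetical two-line sample, not the full message from the question.
$text = "page: 404;http://example.com/a.swf\nuser: John Smith\n";

// Greedy DOT-PLUS with the s modifier: the dot also matches "\n",
// so the capture runs on to the end of the string.
preg_match('/page: 404;(.+)/s', $text, $greedy);

// Reluctant DOT-PLUS: stops as soon as the rest of the pattern (the "\n") can match.
preg_match('/page: 404;(.+?)\n/s', $text, $lazy);

// Negated character class: cannot cross a line break, with or without the s modifier.
preg_match('/page: 404;([^\r\n]+)/s', $text, $negated);

var_dump($greedy[1]);
var_dump($lazy[1]);
var_dump($negated[1]);
```

    The greedy capture contains both lines, while the reluctant and negated-class captures contain only the URL.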

    So, the final regex might look like this:

    PHP Code:
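    A minimal sketch of such a pattern, under a few assumptions: the text is in $Body as in the question, the sample below is an abbreviated copy of the posted message, and the literal is "user: " (with a colon) because that is what the sample text contains, whereas the pattern in the question looks for "user= ". This is one way to combine the pieces described above, not necessarily the exact expression from the reply.

```php
<?php
// Abbreviated copy of the sample text from the question; $Body is the name used there.
$Body = "page: 404;http://www.sbcri.info:80/talkingbrochure2/data/swf/notes/notesbig4.swf\n"
      . "page found /help/error_pages/404b.php\n"
      . "user: John Smith\n"
      . "Number of pages found: 0\n";

// [^\r\n]+ captures up to the end of the current line only;
// .*? (with the s modifier) reluctantly skips over the lines in between.
$pattern = '/page: 404;(?P<URL>[^\r\n]+).*?user: (?P<user>[^\r\n]+)/s';

if (preg_match($pattern, $Body, $match)) {
    echo $match['URL'], "\n";
    echo $match['user'], "\n";
}
```

    The s modifier is still needed here because .*? has to cross the "page found" line that sits between the URL and the user name.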
