July 10th, 2009, 04:24 PM
Difficulty with line breaks
I'm not very good with regex, as I don't use it often, but I'm giving it my best shot. I have text like this:
I want to capture several pieces of information, but I'll start with just the two I've highlighted above. Here's the statement I have right now, which isn't working--I'm getting stuck with the line returns or something:
preg_match('/page: 404;(?P<URL>.+)\nuser= (?P<user>.+)/sm', $Body, $match)
Can you help me with the expression, and give me a clue what I was doing wrong?
I greatly appreciate the help!
July 11th, 2009, 02:05 AM
You're using both the m (multi-line) and s (DOT-ALL) modifiers, but you only need DOT-ALL. With the multi-line modifier, the regex meta characters ^ and $ match not only the start and end of the entire string, but also the start and end of each line in the text. Since you're not using these meta characters, you can leave the m modifier out. DOT-ALL causes the DOT to match new line characters as well (by default it doesn't). With this option enabled, you have to be careful with the greedy DOT-PLUS and DOT-STARs in your regex, because they will "eat" everything up to the end of the string! I've made them reluctant (un-greedy) by adding a question mark after them, as in .*?
And I also replaced some of them with a negated character class, [^\r\n]+, which means: match one or more characters of any type, except new line characters (carriage return or line feed).
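To make the difference concrete, here is a small sketch of the behaviour described above. The two-line $text sample and the "name:" label are made up for illustration and are not from the original post.

```php
<?php
// Hypothetical two-line sample; not the original poster's data.
$text = "name: alice\nname: bob\n";

// Greedy DOT-PLUS with /s: the dot also matches "\n", so .+ runs to the
// end of the string. $greedy[1] is "alice\nname: bob\n".
preg_match('/name: (.+)/s', $text, $greedy);

// Reluctant DOT-PLUS with /s: .+? stops as soon as the following \n can
// match. $lazy[1] is "alice".
preg_match('/name: (.+?)\n/s', $text, $lazy);

// Negated character class: it can never cross a line break, with or
// without /s. $negated[1] is "alice".
preg_match('/name: ([^\r\n]+)/', $text, $negated);

// The m modifier only matters when ^ or $ is used: here they anchor to
// each line, so both lines are found. $perLine[1] is ['alice', 'bob'].
preg_match_all('/^name: (.+)$/m', $text, $perLine);

var_dump($greedy[1], $lazy[1], $negated[1], $perLine[1]);
```

The negated class is usually the safer choice for "rest of this line", since it does not depend on which modifiers are set.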
So, the final regex might look like this:
preg_match('/page:\s+404;(?P<URL>[^\r\n]+).*?user:\s+(?P<user>[^\r\n]+)/s', $text, $match);
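As a rough usage sketch, the pattern above can be tried like this. The sample text in the thread is not shown, so the $text body here (including the "referrer" line and the Windows-style line endings) is purely an assumption; adjust it to the real input format.

```php
<?php
// Hypothetical message body: the original sample text is not shown in
// the thread, so this layout is an assumption.
$text = "page: 404;/missing/page.html\r\nreferrer: none\r\nuser: jsmith\r\n";

$pattern = '/page:\s+404;(?P<URL>[^\r\n]+).*?user:\s+(?P<user>[^\r\n]+)/s';

if (preg_match($pattern, $text, $match)) {
    // Named groups are available by name as well as by numeric index.
    echo "URL:  {$match['URL']}\n";
    echo "user: {$match['user']}\n";
} else {
    echo "No match\n";
}
```

Because the character class excludes both \r and \n, the captured values carry no trailing carriage return even when the input uses \r\n line endings.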