#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2008
    Posts
    112
    Rep Power
    76

    Ignoring line breaks in regular expression?


    Hi there,

    I'm using preg_match() to get the title tag from a website. I can get the text "Example Website" without any problems, but if there are line breaks inside the tag, it doesn't match at all.

    Working
    Code:
    <title>Example Website</title>
    Not working
    Code:
    <title>
    Example Website
    </title>
    I've tried modifying my regular expression by adding "\n" to ignore any line breaks, but I can't get it working.

    I've included my regex pattern below in case anyone can help:
    Code:
    @<title[^>]*>(.*?)</title>@i
    Last edited by MagSafe; June 18th, 2010 at 02:44 PM.
  2. #2
  3. Did you steal it?
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    13,965
    Rep Power
    9397
    Add the /s flag.
  4. #3
  5. No Profile Picture
    Contributing User
    Originally Posted by requinix
    Add the /s flag.
    Thanks for that.

    I tried the pattern below, but it doesn't work at all:
    Code:
    @<title[^>]*>(.*?)</title>\s@i
    Whereas this one works if there are line breaks, but doesn't if there aren't:
    Code:
    @<title[^>]*>\s(.*?)\s</title>@i
    Sorry, bit confused!
  6. #4
  7. Did you steal it?
    "/foo" is the typical way of writing a flag (at least as far as I've seen). Since delimiters often change, you're supposed to replace the "/" with whatever delimiter you're using.

    By "flag" I mean the same kind of thing as the "i" after the closing "@" in your pattern. Flags go after the closing delimiter, not inside the pattern.
    Code:
    @<title[^>]*>(.*?)</title>@is
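    To illustrate the difference, here is a minimal sketch comparing the pattern with and without the "s" modifier (PCRE_DOTALL), which lets "." match newlines as well. The sample strings and the extract_title() helper are assumptions made up for this example. Note that "\s" inside a pattern is the whitespace character class, which is unrelated to the "s" modifier placed after the closing delimiter.

```php
<?php
// Hypothetical sample inputs, not taken from a real site.
$singleLine = "<title>Example Website</title>";
$multiLine  = "<title>\nExample Website\n</title>";

// Original pattern: "." does not match a newline by default.
$withoutS = '@<title[^>]*>(.*?)</title>@i';
// Same pattern with the "s" modifier added after the closing delimiter.
$withS = '@<title[^>]*>(.*?)</title>@is';

// extract_title() is a hypothetical helper name used only in this sketch.
function extract_title(string $pattern, string $html): ?string
{
    if (preg_match($pattern, $html, $m) === 1) {
        // With "s", the capture includes the surrounding line breaks, so trim them.
        return trim($m[1]);
    }
    return null; // no match (or a regex error)
}

var_dump(extract_title($withoutS, $singleLine)); // string(15) "Example Website"
var_dump(extract_title($withoutS, $multiLine));  // NULL
var_dump(extract_title($withS, $multiLine));     // string(15) "Example Website"
```

    The trim() call is a design choice for this sketch: once "." can match newlines, the captured group keeps whatever whitespace sits between the tags.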
  8. #5
  9. No Profile Picture
    Contributing User
    Originally Posted by requinix
    "/foo" is the typical way of writing a flag (at least as far as I've seen). Since delimiters often change, you're supposed to replace the "/" with whatever delimiter you're using.

    By "flag" I mean the same kind of thing as the "i" after the closing "@" in your pattern. Flags go after the closing delimiter, not inside the pattern.
    Code:
    @<title[^>]*>(.*?)</title>@is
    That seems so obvious now that you've explained it.

    Thanks so much! It works perfectly.
  10. #6
  11. kill 9, $$;
    Devshed Supreme Being (6500+ posts)

    Join Date
    Sep 2001
    Location
    Shanghai, An tSín
    Posts
    6,897
    Rep Power
    3886
    I would always advise against using regexps to parse HTML: it's fraught with problems and there are proper tag-aware HTML parsers out there that you can use.
  12. #7
  13. Sarcky
    Devshed Supreme Being (6500+ posts)

    Join Date
    Oct 2006
    Location
    Pennsylvania, USA
    Posts
    10,846
    Rep Power
    6351
    There's a debate over what to use. For PHP in particular, the DOM has gotten better recently, but I still prefer string parsing functions for looking at strings. If I'm not trying to treat it as an HTML document, I treat it like a string. I'd use regexp to pull every instance of "Mister Lastname" out of a report, so I'd use it to pull every instance of "<img src='Imagename'" as well.

    MagSafe, in this particular instance, if you used the DOMDocument to load your HTML, you could easily pull just the title of the document, with no regex required.
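    A rough sketch of that DOMDocument approach might look like the following. The sample markup and the extract_title_dom() helper name are assumptions for illustration. It also assumes the input string is non-empty, since DOMDocument::loadHTML() rejects an empty string.

```php
<?php
// Hypothetical sample markup with line breaks inside the title tag.
$html = "<html><head><title>\nExample Website\n</title></head><body><p>Hi</p></body></html>";

// extract_title_dom() is a hypothetical helper name used only in this sketch.
function extract_title_dom(string $html): ?string
{
    // Collect libxml warnings internally; real-world HTML is rarely well-formed.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);

    // Discard collected warnings and restore the previous error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length === 0) {
        return null; // document has no <title> element
    }

    // textContent keeps the line breaks inside the tag, so trim them off.
    return trim($titles->item(0)->textContent);
}

var_dump(extract_title_dom($html)); // string(15) "Example Website"
```

    Because the parser works on elements rather than characters, line breaks and attributes on the title tag need no special handling here.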

    -Dan
