#1
  1. Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2009
    Posts
    1
    Rep Power
    0

    Multiple HTML Tags


    Hi there, I'm having a nightmare getting a regex to work. I'm new to this and have spent a few hours on it with no luck.

    I need to extract data from a string of HTML. The string looks like one of the following two forms:

    <html1>Data I Need</html1>

    or

    <html1><html2>Data I Need</html1></html2>

    I can't seem to put together a regex that saves the data in one variable for both forms. I can capture it in the first case but not the second, or vice versa. Any help?

    Thanks!
  2. #2
  3. User 165270
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2005
    Posts
    497
    Rep Power
    937
    1 - I have no idea what you're trying to do;
    2 - the second example is not valid html, it should be <html1><html2>Data I Need</html2></html1>
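
For readers who still want a regex for this narrow case, here is a minimal sketch in Python. It assumes the nesting is corrected as described above, that `html1` and `html2` are the placeholder tag names from the question, and that the inner pair is simply optional. The function name `extract_with_regex` is made up for illustration. It is not a general way to handle HTML.

```python
import re

# Placeholder tag names from the question; the inner <html2> pair is optional.
# The lazy group (.*?) captures the data in both forms.
PATTERN = re.compile(
    r"<html1>(?:<html2>)?(.*?)(?:</html2>)?</html1>",
    re.DOTALL,
)


def extract_with_regex(text):
    """Return the captured data, or None if the pattern does not match."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_with_regex("<html1>Data I Need</html1>"))
    print(extract_with_regex("<html1><html2>Data I Need</html2></html1>"))
```

Both calls capture the same text in group 1, so the data ends up in one variable either way. The pattern breaks down as soon as the tags carry attributes or nest more deeply.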
  4. #3
  5. kill 9, $$;
    Devshed Supreme Being (6500+ posts)

    Join Date
    Sep 2001
    Location
    Shanghai, An tSín
    Posts
    6,897
    Rep Power
    3887
    Originally Posted by prometheuzz
    1 - I have no idea what you're trying to do;
    2 - the second example is not valid html, it should be <html1><html2>Data I Need</html2></html1>
    There's no 'html1' or 'html2' tag that I'm aware of in the HTML specification either. Is this HTML you're parsing or something else that just has similarities?

    I would personally never dream of trying to write regexps for parsing HTML. Any mainstream programming language will have an HTML parsing library available for it. Those are generally far more robust than any hand-rolled regexp-based solution you might come up with yourself.
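
As a rough illustration of the parser approach, here is a sketch using Python's standard `html.parser` module. The thread does not say which language the original poster uses, so Python is an assumption, as are the names `TextCollector` and `extract_with_parser`. The idea is to collect any text seen while inside the outer tag, whether or not an inner tag wraps it.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect text found inside a given outer tag, ignoring nested tags."""

    def __init__(self, outer_tag):
        super().__init__()
        self.outer_tag = outer_tag
        self.depth = 0      # how many outer tags are currently open
        self.chunks = []    # text pieces seen while inside the outer tag

    def handle_starttag(self, tag, attrs):
        if tag == self.outer_tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.outer_tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_with_parser(markup, outer_tag="html1"):
    """Return the text inside outer_tag, with surrounding whitespace removed."""
    parser = TextCollector(outer_tag)
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks).strip()


if __name__ == "__main__":
    print(extract_with_parser("<html1>Data I Need</html1>"))
    print(extract_with_parser("<html1><html2>Data I Need</html2></html1>"))
```

Because this parser is event-based and lenient, the same function also copes with the mis-nested form from the first post.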

    Comments on this post

    • jzd agrees
    • prometheuzz agrees
  6. #4
  7. User 165270
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2005
    Posts
    497
    Rep Power
    937
    Originally Posted by ishnid
    There's no 'html1' or 'html2' tag that I'm aware of in the HTML specification either. Is this HTML you're parsing or something else that just has similarities?
    I assumed that it was just an example of two different tags.

    Originally Posted by ishnid
    I would personally never dream of trying to write regexps for parsing HTML. Any mainstream programming language will have an HTML parsing library available for it. Those are generally far more robust than any hand-rolled regexp-based solution you might come up with yourself.
    I couldn't agree more with you!
  8. #5
  9. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2009
    Posts
    159
    Rep Power
    208
    You can use recursive regular expressions, but using XPath is better.
  10. #6
  11. User 165270
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2005
    Posts
    497
    Rep Power
    937
    Originally Posted by chorny_cpan
    You can use recursive regular expressions,
    Few regex implementations offer this functionality.

    Originally Posted by chorny_cpan
    but using XPath is better.
    Yes, a lot better!
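
A small sketch of the XPath route, assuming Python and its standard `xml.etree.ElementTree` module, which supports only a limited subset of XPath. The function name `extract_with_xpath` is hypothetical. The input must be well-formed, so the nesting has to be corrected first. The path `.//*` selects every descendant element, and the first non-empty text found is returned.

```python
import xml.etree.ElementTree as ET


def extract_with_xpath(markup):
    """Return the first non-empty element text, or None if there is none.

    Raises xml.etree.ElementTree.ParseError if the markup is not well-formed.
    """
    root = ET.fromstring(markup)
    # ".//*" is an XPath expression for all descendant elements;
    # the root itself is checked first.
    for element in [root, *root.findall(".//*")]:
        if element.text and element.text.strip():
            return element.text.strip()
    return None


if __name__ == "__main__":
    print(extract_with_xpath("<html1>Data I Need</html1>"))
    print(extract_with_xpath("<html1><html2>Data I Need</html2></html1>"))
```

Unlike a lenient HTML parser, this rejects the mis-nested form from the first post with a parse error, which can be useful for catching malformed input early.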
