#1

    How to find a specific HTML div and its matching closing tag


    Hello

    I am trying to use a regex in VBScript to find a specific HTML div by ID, such as '<div id="page_content_top_stage">', together with everything inside that div, without any nested div tags throwing off the match.

    Here is one example of this type of HTML code:
    Code:
     <div id="page_content_top_stage"> <br>
          <!-- InstanceBeginEditable name="content up" -->
          <table border="0" cellpadding="0" cellspacing="0">
            <tr>
              <td width="1" height="100%"><img src="../spacer.gif" width="1" height="627"></td>
              <td width="740" valign="top"><img src="../gui_assets/gui_images/spacer.gif" width="740" height="1"><br>
              </td>
            </tr>
          </table>
          <!-- InstanceEndEditable --> </div>

    The pattern below worked somewhat, but if there was a nested div tag it would only match up to that nested '</div>' and miss the content after that point. In other words, it was catching the wrong closing tag.
    Code:
    <div \b[^>]*>(.*?)</div>
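    To make the failure concrete, here is a small sketch in Python rather than VBScript, since the lazy-match behaviour is the same in both regex engines. The sample HTML and the id-anchored variant of the pattern are assumptions for illustration, not the exact code from the script.

```python
import re

# Hypothetical sample: the target div contains one nested div.
html = (
    '<div id="page_content_top_stage">'
    'before <div class="inner">nested</div> after'
    '</div>'
)

# Variant of the pattern above, anchored to the id (an assumption for this demo).
# DOTALL lets "." match newlines as well.
pattern = re.compile(
    r'<div\b[^>]*id="page_content_top_stage"[^>]*>(.*?)</div>',
    re.DOTALL,
)

match = pattern.search(html)
captured = match.group(1)

# The lazy (.*?) stops at the FIRST </div>, which belongs to the nested div,
# so the trailing " after" text is lost.
print(captured)  # before <div class="inner">nested
```

    Making the group greedy does not help either, because it would then run to the last '</div>' in the whole document instead of the matching one.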
#2
    Regular expressions are not really suitable for parsing complex structures like HTML. They work OK if you're just trying to grab really small snippets out of it, but if you need to actually match related tags it's a bit beyond their scope. Consider looking for a library that's actually able to parse the HTML as HTML.
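    As a rough sketch of what "parsing the HTML as HTML" can look like, the example below uses Python's standard-library html.parser and counts div nesting depth to find the matching close. It is in Python, not VBScript; the class name DivExtractor, the helper inner_html_of_div, and the sample markup are hypothetical. Doing the same in a .vbs file would need an HTML-aware component available to the script, which is not shown here.

```python
from html.parser import HTMLParser


class DivExtractor(HTMLParser):
    """Collects the inner HTML of the first div with a given id (hypothetical helper)."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # number of open divs, counting the target; 0 = outside
        self.done = False   # set once the target's matching </div> is seen
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth > 0:
            if tag == "div":
                self.depth += 1          # nested div: one more close to wait for
            self.parts.append(self.get_starttag_text())
        elif tag == "div" and dict(attrs).get("id") == self.target_id:
            self.depth = 1               # entered the target div

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: copy it through, depth unchanged.
        if self.depth > 0 and not self.done:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done or self.depth == 0:
            return
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True         # this is the target's own closing tag
                return
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)

    def handle_comment(self, data):
        if self.depth > 0:
            self.parts.append(f"<!--{data}-->")


def inner_html_of_div(html, target_id):
    parser = DivExtractor(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


# Hypothetical sample with a nested div and unrelated divs around the target.
html = (
    '<div id="other">skip me</div>'
    '<div id="page_content_top_stage">'
    'before <div class="inner">nested</div> after'
    '</div>'
    '<div>trailing</div>'
)

result = inner_html_of_div(html, "page_content_top_stage")
print(result)  # before <div class="inner">nested</div> after
```

    Note that this rebuilds the content from parser events, so character references in text are decoded and the output is not guaranteed to be byte-for-byte identical to the source.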
#3
    Originally Posted by E-Oreo
    Regular expressions are not really suitable for parsing complex structures like HTML. They work OK if you're just trying to grab really small snippets out of it, but if you need to actually match related tags it's a bit beyond their scope. Consider looking for a library that's actually able to parse the HTML as HTML.
    I have never used anything like that. Would it work in a VBScript (.vbs) file?
