#1
  1. No Profile Picture
    Contributing User
    Devshed Loyal (3000 - 3499 posts)

    Join Date
    Jul 2003
    Posts
    3,497
    Rep Power
    594

    Parsing HTML Using HTML::TreeBuilder


    I am trying to parse an HTML file that contains several tables. I can find the table I need, which has many rows, but my task is to pull the data from one particular row. Unfortunately, the only way to identify that row is by an embedded <script>. Here is the row I am trying to find:
    Code:
    <TR>
              <TD width=156 bgColor=#e7e7e7 height=25>&nbsp;</TD>
              <TD width=8 background=UI_04.gif height=25>&nbsp;</TD>
              <TD colSpan=3 height=25>&nbsp;</TD>
              <TD><FONT style="FONT-SIZE: 8pt"><script>Capture(share.ipaddr)</script>:</FONT></TD>
    
              <TD><FONT style="FONT-SIZE: 8pt"><B>74.176.153.107</B></FONT></TD>
              <TD width=13 height=25>&nbsp;</TD>
              <TD width=15 background=UI_05.gif height=25>&nbsp;</TD></TR>
    <TR>
    I cannot figure out how to get the text of that script. Here is my code segment:
    Code:
    my @tables=$html->look_down('_tag','table');
    foreach my $table (@tables) {
            if (defined($table->attr('id')) && $table->attr('id') eq "AutoNumber9") {
                    my @tds=$table->look_down('_tag','td');
                    foreach my $td (@tds) {
                            my $script=$td->look_down('_tag','script');
                            if (defined($script)) {
                                  print($script->as_text."\n");
                            }
                    }
            }
    }
    While the logic finds the script tags, it appears that the 'as_text' method returns an empty string for them. How can I determine the script text in that row? TIA.
    There are 10 kinds of people in the world. Those that understand binary and those that don't.
  2. #2
    Never mind. It turns out that $script->as_HTML gave me what I needed.
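    For context, as_text deliberately skips the contents of <script> and <style> elements, which is why it came back empty, while as_HTML returns the element's markup with the tags included. If only the script text is wanted, one option is to join the element's text children from content_list. The following is a minimal sketch under assumptions: the helper names script_source and value_for_script are made up for illustration, and the sample row is a trimmed copy of the one posted above.

    ```perl
    use strict;
    use warnings;
    use HTML::TreeBuilder;

    # Trimmed copy of the row from the question (assumption: same nesting).
    my $html = q{<table id="AutoNumber9"><tr>
      <td><font><script>Capture(share.ipaddr)</script>:</font></td>
      <td><font><b>74.176.153.107</b></font></td>
    </tr></table>};

    # Raw text inside a <script> element: join its plain-text children,
    # since as_text() ignores script content.
    sub script_source {
        my ($script) = @_;
        return join '', grep { !ref } $script->content_list;
    }

    # Hypothetical helper: find the <script> whose body equals $wanted and
    # return the text of the <td> that follows the script's own <td>.
    sub value_for_script {
        my ($root, $wanted) = @_;
        for my $script ($root->look_down(_tag => 'script')) {
            next unless script_source($script) eq $wanted;
            my $td = $script->look_up(_tag => 'td') or next;
            # right() in list context returns all following siblings;
            # keep the first one that is a <td> element.
            my ($next) = grep { ref && $_->tag eq 'td' } $td->right;
            return $next->as_text if $next;
        }
        return;
    }

    my $tree  = HTML::TreeBuilder->new_from_content($html);
    my $value = value_for_script($tree, 'Capture(share.ipaddr)');
    print defined $value ? "$value\n" : "not found\n";
    $tree->delete;
    ```

    Matching on the script body and then stepping to the neighbouring cell avoids stripping the <script> tags out of the as_HTML string by hand.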
