#1
    Swimming in a fish bowl....
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jun 2008
    Location
    Texas, Y'all!
    Posts
    133
    Rep Power
    17

    Retrieve content within a specified (cell order) TD cell


    Hi!

    Can anyone help me figure out how to grab the data out of every occurrence of the 5th TD in a TR?

    This is basically what I have:
    Code:
    <tr blahblah>
     <td>random text</td>
     <td>random text</td>
     <td>random text</td>
     <td>random text</td>
     <td>DATA I WANT</td>
     <td>random text</td>
    </tr>
    
    <tr blahblah>
     <td>random text</td>
     <td>random text</td>
     <td>random text</td>
     <td>random text</td>
     <td>DATA I WANT</td>
     <td>random text</td>
    </tr>
    
    ...
    The TRs have some parameters but the TDs do not.

    There could be 10, 20, 50 of these TR rows in the table code that I want to build a list out of.

    This is way over my RegEx skills :/

    TIA
  2. #2
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Jan 2005
    Posts
    1,586
    Rep Power
    275
    Something like...

    PHP Code:
    <?php

    // Sample input: three rows; the fifth <td> of each row holds the wanted data.
    $str = '<tr blahblah>
     <td>random text</td>
     <td>random text</td>
     <td>random text</td>
     <td>random text</td>
     <td>DATA I WANT 1</td>
     <td>random text</td>
    </tr>

    <tr blahblah>
     <td>random text</td>
     <td>random text</td>
     <td>random text</td>
     <td>random text</td>
     <td>DATA I WANT 2</td>
     <td>random text</td>
    </tr>


    <tr blahblah>
     <td>random text</td>
     <td>random text</td>
     <td>random text</td>
     <td>random text</td>
     <td>DATA I WANT 3</td>
     <td>random text</td>
     <td>random text</td>
     <td>random text</td>
     <td>random text</td>
     <td>random text</td>
    </tr>
    ';

    // /U makes quantifiers lazy, /i ignores case, /s lets "." match newlines.
    // (?:...){4} skips the first four cells; (.+) captures the fifth.
    preg_match_all( '/(?=<tr.*>(?:.*<td.*>.*<\/td>){4}.*<td.*>(.+)<\/td>.*<\/tr>)/Uis', $str, $out );

    // $out[1] holds the captured fifth cells.
    print_r( $out[1] );

    ?>
    That should work, but it's untested. I use (?=), the zero-width positive lookahead, even though it's not needed. I use it so $out[0] will hold only empty strings, because the full matches are not needed. I call it a hack because PHP's preg_match_all always fills $out[0] with the full matches, even when you only want the captured group, so I use the lookahead to remove the junk. Sure, the array elements remain, but the unneeded junk values are gone.
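To see what the lookahead changes, here is a small sketch that runs the same pattern body with and without (?=...). The sample rows and the variable names ($html, $body, $plain, $peek) are made up for illustration; only the pattern itself comes from the post above.

```php
<?php
// Two sample rows; the attributes and cell text are placeholders.
$html = '<tr class="a"><td>1</td><td>2</td><td>3</td><td>4</td><td>FIRST</td><td>6</td></tr>'
      . '<tr class="b"><td>1</td><td>2</td><td>3</td><td>4</td><td>SECOND</td><td>6</td></tr>';

// The pattern body from the post, without delimiters or the lookahead wrapper.
$body = '<tr.*>(?:.*<td.*>.*<\/td>){4}.*<td.*>(.+)<\/td>.*<\/tr>';

// Without the lookahead, $plain[0] holds each full matched row.
preg_match_all('/' . $body . '/Uis', $html, $plain);

// With the lookahead, every match is zero-width, so $peek[0] holds empty strings.
preg_match_all('/(?=' . $body . ')/Uis', $html, $peek);

print_r($plain[0]); // the two complete <tr>...</tr> rows
print_r($peek[0]);  // two empty strings
print_r($peek[1]);  // FIRST, SECOND
```

Either way the captured cells end up in index 1, so the lookahead only affects what sits in index 0.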

    Comments on this post

    • Ahhk agrees
  4. #3
  5. kill 9, $$;
    Devshed Supreme Being (6500+ posts)

    Join Date
    Sep 2001
    Location
    Shanghai, An tSín
    Posts
    6,897
    Rep Power
    3887
    I tend to avoid using regexps to parse HTML or similar. Most languages will have well-established and thoroughly tested HTML parsers available for them. I'd always go for one of those before trying to roll my own.
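For PHP, one option is the bundled DOM extension. The following is a minimal sketch, assuming the rows sit inside a table and that ext-dom is enabled; the sample markup and variable names are placeholders, not the original poster's data.

```php
<?php
// Sample markup; the attribute and cell text are placeholders.
$html = '<table>
<tr class="row"><td>a</td><td>b</td><td>c</td><td>d</td><td>DATA I WANT 1</td><td>f</td></tr>
<tr class="row"><td>a</td><td>b</td><td>c</td><td>d</td><td>DATA I WANT 2</td><td>f</td></tr>
</table>';

$doc = new DOMDocument();
// Real-world HTML is often malformed; collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$cells = [];
// td[5] selects the fifth <td> child of each <tr> (XPath positions start at 1).
foreach ($xpath->query('//tr/td[5]') as $td) {
    $cells[] = trim($td->textContent);
}

print_r($cells); // DATA I WANT 1, DATA I WANT 2
```

The XPath expression carries the "fifth cell of every row" rule directly, so there is no need to count cells by hand.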

    Comments on this post

    • Ahhk agrees
  6. #4
    Originally Posted by printf
    Something like...

    PHP Code:
    <?php

    preg_match_all( '/(?=<tr.*>(?:.*<td.*>.*<\/td>){4}.*<td.*>(.+)<\/td>.*<\/tr>)/Uis', $str, $out );

    print_r( $out[1] );

    ?>
    That should work, but it's untested. I use (?=), the zero-width positive lookahead, even though it's not needed. I use it so $out[0] will hold only empty strings, because the full matches are not needed. I call it a hack because PHP's preg_match_all always fills $out[0] with the full matches, even when you only want the captured group, so I use the lookahead to remove the junk. Sure, the array elements remain, but the unneeded junk values are gone.
    Oh sweeet! Thanks, printf.

    I seem to have problems trying to do anything but simple RegEx from scratch. I'm REALLY hoping that it all clicks/sticks one of these days.

    Thanks again for your time!
  8. #5
    Originally Posted by ishnid
    I tend to avoid using regexps to parse HTML or similar. Most languages will have well-established and thoroughly tested HTML parsers available for them. I'd always go for one of those before trying to roll my own.
    Yeah, I keep hearing that. But, while I have limited (enough to break things REALLY well) RegEx experience, I have ZERO with a DOM parser.

    Plus, wouldn't a parser create a lot more overhead/code/etc than a masterful one line of RegEx like printf came up with?

    I just DL'd PHP Simple HTML DOM Parser ...and am going to check it out.

    Thanks!
  10. #6
  11. Transforming Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    14,119
    Rep Power
    9398
    Originally Posted by Ahhk
    Plus, wouldn't a parser create a lot more overhead/code/etc than a masterful one line of RegEx like printf came up with?
    You're forgetting the overhead that comes with invoking a regular expression engine.
    A regex isn't just a command to be executed - it's code that has to be parsed, compiled, and interpreted.

    A regex isn't necessarily slower than some DOM parser, but it isn't necessarily faster either.
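If you want numbers for your own data rather than a rule of thumb, a rough timing harness along these lines can help. It is only a sketch: the 50-row table is synthetic, the function names (fifthCellsByRegex, fifthCellsByDom, secondsFor) are made up for this example, and hrtime() needs PHP 7.3 or later. Note that PHP caches compiled patterns, so the compile cost is paid on the first call only.

```php
<?php
// Build a synthetic table of 50 rows (the upper row count mentioned in the question).
$rows = '';
for ($i = 1; $i <= 50; $i++) {
    $rows .= "<tr class=\"r\"><td>a</td><td>b</td><td>c</td><td>d</td><td>DATA $i</td><td>f</td></tr>\n";
}
$html = "<table>\n$rows</table>";

// Regex approach: the pattern from earlier in the thread.
function fifthCellsByRegex(string $html): array {
    preg_match_all('/(?=<tr.*>(?:.*<td.*>.*<\/td>){4}.*<td.*>(.+)<\/td>.*<\/tr>)/Uis', $html, $out);
    return $out[1];
}

// Parser approach: DOMDocument plus an XPath query for the fifth cell of each row.
function fifthCellsByDom(string $html): array {
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    $cells = [];
    foreach ((new DOMXPath($doc))->query('//tr/td[5]') as $td) {
        $cells[] = $td->textContent;
    }
    return $cells;
}

// Run $fn repeatedly and return the elapsed wall-clock time in seconds.
function secondsFor(callable $fn, string $html, int $runs = 200): float {
    $start = hrtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $fn($html);
    }
    return (hrtime(true) - $start) / 1e9;
}

printf("regex: %.4fs\n", secondsFor('fifthCellsByRegex', $html));
printf("dom:   %.4fs\n", secondsFor('fifthCellsByDom', $html));
```

The timings depend on the PHP version, the input size and how messy the markup is, so treat the result as a measurement of your case only.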

    Comments on this post

    • Ahhk agrees
  12. #7
    Originally Posted by requinix
    You're forgetting the overhead that comes with invoking a regular expression engine.
    A regex isn't just a command to be executed - it's code that has to be parsed, compiled, and interpreted.

    A regex isn't necessarily slower than some DOM parser, but it isn't necessarily faster either.
    Never really thought about that. Guess I continue to learn something new every day.
  14. #8
    Originally Posted by Ahhk
    Plus, wouldn't a parser create a lot more overhead/code/etc than a masterful one line of RegEx like printf came up with?
    If you're absolutely 100% confident that the regexp will cater for every possible situation you may come across, then you can start worrying about overhead. A properly written and tested HTML parser will give you reliable results, which is far more important.
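As one concrete case the regex from earlier in the thread does not cover, consider a row with fewer than five cells. The sketch below uses made-up sample data: the lazy ".*" with the /s flag runs across the row boundary and borrows cells from the next row, while a DOM query (PHP's bundled DOMDocument here, as one possible parser) simply skips the short row.

```php
<?php
// First row has only three cells; second row has the fifth cell we want.
$html = '<table>'
      . '<tr><td>a1</td><td>a2</td><td>a3</td></tr>'
      . '<tr><td>b1</td><td>b2</td><td>b3</td><td>b4</td><td>WANTED</td><td>b6</td></tr>'
      . '</table>';

// Regex from the thread: starting at the short row, it counts a1, a2, a3, b1
// as the four skipped cells and captures b2 as the "fifth".
preg_match_all('/(?=<tr.*>(?:.*<td.*>.*<\/td>){4}.*<td.*>(.+)<\/td>.*<\/tr>)/Uis', $html, $out);
$byRegex = $out[1];

// DOM + XPath: td[5] is evaluated per row, so the short row yields nothing.
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$byDom = [];
foreach ((new DOMXPath($doc))->query('//tr/td[5]') as $td) {
    $byDom[] = $td->textContent;
}

print_r($byRegex); // b2, WANTED
print_r($byDom);   // WANTED
```

The regex could be tightened to stop at </tr>, but each such fix is another case you have to think of yourself.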

    Comments on this post

    • prometheuzz agrees : Couldn't agree more!
