#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2009
    Posts
    12
    Rep Power
    0

    Grab HTML from Source code pasted into Text Field


    I posted this question in the PHP forum, and to reduce the risk of spamming the boards I am just going to link this thread to that post.

    Link to PHP Board Post

    Can anyone point me in the right direction on this?

    Thanks
  2. #2
  3. Did you steal it?
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    13,961
    Rep Power
    9397
    Hey, you look familiar. Have we met? Sorry to be a pain but I'm trying to get people to post their regex questions in the right forum.

    PHP Code:
    $before = "The status of the <B>[^<]+</B> of <B>[^<]+</B>\.<P>";
    $after = '<TR><TD>Net Income</TD><TD ALIGN=RIGHT>\$[\d,]+</TD></TR>';
    preg_match("!$before(.*?)$after!is", $text, $matches);
    // $matches[1] is the text in between

    $before = "<TABLE>
    <TR><TH COLSPAN=2 BGCOLOR=#000040>Land Distribution</TH></TR>";
    $after = "<TR><TD>SDI</TD><TD align=right>[\d,]+</TD></TR></TABLE></TD></TR>
    </TABLE>";
    preg_match("!$before(.*?)$after!is", $text, $matches);
    // again, $matches[1] is the text in between
    I think that should suffice.

    Comments on this post

    • Winters agrees : Unusually polite. Feeling alright?
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2009
    Posts
    12
    Rep Power
    0
    I'm a little confused here: how does this code work? I would like to understand it instead of just asking someone to do it for me.

    I don't understand how this will print or echo back to the user.

    I tried using the code snippet you provided and I get nothing back.

    I would really like to understand how this is supposed to work.

    Thanks for the reply.
  6. #4
  7. No Profile Picture
    User 165270
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2005
    Posts
    497
    Rep Power
    937
    Originally Posted by requinix
    ...
    I think that should suffice.
    Since that input is coming from a text field filled by a user, I highly doubt that it will always be the same. So you won't have a fixed $before and $after string.
  8. #5
  9. No Profile Picture
    User 165270
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2005
    Posts
    497
    Rep Power
    937
    Originally Posted by PigeonMarine
    I posted this question in the PHP forum, and to reduce the risk of spamming the boards I am just going to link this thread to that post.

    Link to PHP Board Post

    Can anyone point me in the right direction on this?

    Thanks
    Before I can answer your question (with an explanation), could you explain the rules for the strings you want to preserve, or the other way around: explain the rules for the strings that should be removed? You gave just a single source and said you want this and that to be preserved, but what about other forms of input?

    Before being able to tell the regex engine what should and should not be removed, you should explain it in great detail here.

    Good luck.
  10. #6
  11. Did you steal it?
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    13,961
    Rep Power
    9397
    Originally Posted by prometheuzz
    Since that input is coming from a text field filled by a user, I highly doubt that it will always be the same. So you won't have a fixed $before and $after string.
    Right. Which is why I looked at the two strings and guessed what parts of them would change and what would not.
    If I missed something he would likely mention it.

    Originally Posted by PidgeonMarine
    I'm a little confused here: how does this code work? I would like to understand it instead of just asking someone to do it for me.

    I don't understand how this will print or echo back to the user.

    I tried using the code snippet you provided and I get nothing back.

    I would really like to understand how this is supposed to work.

    Thanks for the reply.
    It stuffs that "everything from... to..." into two variables. You're supposed to... I don't know, all you said was that you wanted what was in between them.
    I assumed you were going to do something else, like use an HTML parser, or maybe just print out the stuff literally. You didn't really say what you were going to do next and I didn't ask.

    Literally, all that code does is search for the $before string (which is generalized a bit) and the $after string (also generalized) and get everything in between them. That's it.
    In both cases, $matches[1] contains the text. If you want to do something then you use that. Keep in mind that it contains HTML so if you simply echo/print it out then you'll get (invalid) HTML-formatted text.
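
    To make the "get everything in between" step concrete, here is a minimal sketch of the first pattern run against a made-up input. The `$text` sample is an assumption, not the real page source from the other thread, and `$between` is just an illustrative variable name. It checks whether `preg_match()` actually matched before touching `$matches[1]`, and escapes the result so the markup is shown as text rather than rendered.

    ```php
    <?php
    // Hypothetical sample input; the real page source was posted in the other thread.
    $text = '<HTML><BODY>The status of the <B>Republic</B> of <B>Canyon Land (#750)</B>.<P>'
          . '<TABLE><TR><TD>Turns Left</TD><TD ALIGN=RIGHT>10</TD></TR>'
          . '<TR><TD>Net Income</TD><TD ALIGN=RIGHT>$5,956,807</TD></TR>'
          . '</TABLE></BODY></HTML>';

    $before = "The status of the <B>[^<]+</B> of <B>[^<]+</B>\.<P>";
    $after  = '<TR><TD>Net Income</TD><TD ALIGN=RIGHT>\$[\d,]+</TD></TR>';

    // preg_match() returns 1 on a match, 0 on no match and false on error.
    if (preg_match("!$before(.*?)$after!is", $text, $matches) === 1) {
        // $matches[1] is whatever the (.*?) group captured between the two markers.
        $between = $matches[1];
        // Escape it so the browser shows the markup instead of rendering it.
        echo htmlspecialchars($between), "\n";
    } else {
        $between = null;
        echo "No match: check that \$before and \$after really occur in the input.\n";
    }
    ```

    If the else branch fires on the real input, the usual suspects are whitespace or attribute differences between the pattern and the pasted source.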

    If you need an explanation or tutorial on regular expressions then check the sticky here: it has a bunch of links you should look at.


    If it's not working then
    Originally Posted by prometheuzz
    could you explain the rules for the strings you want to preserve, or the other way around: explain the rules for the strings that should be removed? You gave just a single source and said you want this and that to be preserved, but what about other forms of input?

    Before being able to tell the regex engine what should and should not be removed, you should explain it in great detail here.
  12. #7
  13. No Profile Picture
    User 165270
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2005
    Posts
    497
    Rep Power
    937
    Originally Posted by requinix
    Right. Which is why I looked at the two strings and guessed what parts of them would change and what would not.
    ...
    Aha, I should have read your post with more attention: I missed that completely! Sorry.
  14. #8
  15. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2009
    Posts
    12
    Rep Power
    0
    The data I posted in the other thread is a page called advisor in an online game.

    This page is the same for all users, except the data in the table and the main Title.

    Everything else is the same.

    An example of what I am trying to do is at this link.
    Code:
    http://evolution2025.com/qzStatusTidy.php
    Copy the code (Source Code) I posted before, paste it into that page, check the preview table box, and click the button.

    This will show you what I am trying to learn how to do.

    I wish I could explain more, but like I said, I am trying to learn how to do this, but not sure where to start.

    Thanks
  16. #9
  17. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2009
    Posts
    12
    Rep Power
    0
    I have been messing around with this and got it to return the $before lines of each block of code, but it will not return the other data or the $after lines.

    Also, it is not stripping the JavaScript or other tags out; it is just clearing all the white space from the code.
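
    For the "stripping the JavaScript or other tags out" part, one possible approach (a sketch, not something posted in this thread) is to remove script and style blocks first and then let `strip_tags()` drop everything except the table tags. The function name `keep_only_tables` and the `$pasted` sample are made up for illustration.

    ```php
    <?php
    // Hypothetical helper name; not from the thread.
    function keep_only_tables(string $html): string
    {
        // Drop <script> and <style> blocks including their contents, because
        // strip_tags() alone would leave the code between the tags behind.
        $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1\s*>#is', '', $html) ?? $html;

        // Remove every remaining tag except the table-related ones.
        $html = strip_tags($html, '<table><tr><th><td>');

        return trim($html);
    }

    // Made-up input standing in for the pasted page source.
    $pasted = '<html><head><script>alert("x");</script></head><body>'
            . '<p>Intro</p><TABLE><TR><TD>Land</TD><TD align=right>18865</TD></TR></TABLE>'
            . '</body></html>';

    echo keep_only_tables($pasted), "\n";
    ```

    Note that text outside the tables (the "Intro" here) survives, since only tags are removed; cutting the input down to the part between known markers first would take care of that.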
  18. #10
  19. No Profile Picture
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Jan 2005
    Posts
    1,586
    Rep Power
    275
    Why not just put it into an array? Then you can format it any way you want. This is an old script that still works...


    PHP Code:
    <?php

    // $html would be the Status Report you want to process!

    // sub string from where we need our data to where our data ends...

    // start

    $html = substr ( $html, stripos ( $html, 'the status' ) );

    // end

    $html = substr ( $html, 0, strripos ( $html, '<br><br>' ) );

    // setup the html... (get it ready to convert);

    $regex = array ( '#<th.*>#Uis', '#<tr.*>#Uis',
                '#<td.*>#Uis', '#<\/?table.*>#Uis',
                '#<\/th.*><\/tr.*>#Uis', '#&nbsp;#Uis',
                '#<\/tr.*>#Uis', '#<\/td.*>#Uis' );

    $replace = array ( '<th>', '<tr>',
                '<td>', '',
                '', '',
                '</tr>', '</td>' );

    $html = preg_replace ( $regex, $replace, $html );

    // split the data up starting at each title element (header)

    $parts = explode ( '<tr><th>', $html );

    // set our output container

    $out = array ();

    // set our main header text

    $out['header'] = strip_tags ( $parts[0] );

    // remove $parts[0] = (our header text) and reset the $parts array!

    array_shift ( $parts );

    // now build our data array

    foreach ( $parts AS $data )
    {
        // split the data fields (IE: <td>name</td>||<td>value</td>)

        $data = str_replace ( '</td><td>', '</td>||<td>', $data );

        // create a new data array for each new (<tr><td>)

        $data = explode ( '<tr><td>', $data );

        // the first element $data[0] is always the header for this data block

        $header = trim ( array_shift ( $data ) );

        // go through each <TR> tag set (IE: <tr><td>? = name</td><td>? = value</td></tr>)

        foreach ( $data AS $item )
        {
            // get the name value pairs

            list ( $name, $value ) = array_map ( 'trim', explode ( '</td>||<td>', substr ( $item, 0, strpos ( $item, '</td></tr>' ) ) ) );

            // there is one case where the structure may cause a false positive, so we catch it here...

            if ( ! empty ( $name ) && ! empty ( $value ) )
            {
                $out[$header][$name] = $value;
            }
        }
    }

    // print out the result array...

    print_r ( $out );

    ?>
    It will output this...

    Code:
    Array
    (
        [header] => The status of the Republic of Canyon Land (#750).
    
    
    
        [The Basics] => Array
            (
                [Turns Left] => 10
                [Turns Taken] => 2938
                [Rank] => 38
                [Networth] => $18,143,148
            )
    
        [Current Status] => Array
            (
                [Money] => $238,681,220
                [Population] => 501,921
                [Land] => 18865 Acres
                [Food] => 2,165,120 bushels
                [Production] => 7 bushels
                [Consumption] => 23,525 bushels
                [Net Change] => -23,518 bushels
                [Oil] => 1,140,920 barrels
            )
    
        [Economics] => Array
            (
                [Tax Revenues] => $10,375,953
                [Tax Rate] => 35%
                [Per Capita Income] => $59.06
                [Expenses] => $4,419,146
                [Military] => $4,112,567
                [Alliance/GDI] => $117,929
                [Land] => $188,650
                [Net Income] => $5,956,807
            )
    
        [Land Distribution] => Array
            (
                [Enterprise Zones] => 8663
                [Residences] => 8663
                [Industrial Complexes] => 260
                [Military Bases] => 960
                [Construction Sites] => 300
                [Unused Lands] => 19
            )
    
        [Military Forces] => Array
            (
                [Spies] => 218,990
                [Troops] => 4,807,197
                [Jets] => 9,142,002
                [Turrets] => 4,348,609
                [Tanks] => 1,328,015
                [Nuclear Missiles] => 5
                [Chemical Missiles] => 12
                [Cruise Missiles] => 9
            )
    
        [Technology] => Array
            (
                [Military] => 288,889
                [Medical] => 18,716
                [Business] => 508,812
                [Residential] => 509,326
                [Agricultural] => 2316
                [Warfare] => 2931
                [Military Strategy] => 9334
                [Weapons] => 827
                [Industrial] => 8651
                [Spy] => 3587
                [SDI] => 190,345
            )
    
    )
  20. #11
  21. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2009
    Posts
    12
    Rep Power
    0
    Thanks, but I only want to pull the table out of the code. Putting it into an array is not useful; I want the output to look just like the original table, and I am going to add some things in after it strips the useless data off.
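
    For what it's worth, an array of that shape can be turned back into a table, which also leaves room for adding extra rows along the way. The following is only a sketch under that assumption: `render_status_table` is a made-up name, the `$out` sample is a cut-down copy of the output shown above, and the markup it produces is a plain table rather than a copy of the game's original styling.

    ```php
    <?php
    // Hypothetical helper; $out has the shape printed above
    // (a 'header' string plus one sub-array per section).
    function render_status_table(array $out): string
    {
        $html = "<table>\n";
        foreach ($out as $section => $rows) {
            if (!is_array($rows)) {
                continue; // skip the plain 'header' string
            }
            // One heading row per section, then one row per name/value pair.
            $html .= '<tr><th colspan="2">' . htmlspecialchars((string) $section) . "</th></tr>\n";
            foreach ($rows as $name => $value) {
                $html .= '<tr><td>' . htmlspecialchars((string) $name) . '</td><td align="right">'
                       . htmlspecialchars((string) $value) . "</td></tr>\n";
            }
        }
        return $html . "</table>\n";
    }

    // Cut-down sample in the same shape as the print_r output above.
    $out = array(
        'header'     => 'The status of the Republic of Canyon Land (#750).',
        'The Basics' => array('Turns Left' => '10', 'Rank' => '38'),
    );

    echo render_status_table($out);
    ```

    Extra rows or columns could be appended to the sub-arrays before rendering, which may be easier than editing the raw HTML string.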
