#1
    Contributing User
    Devshed Loyal (3000 - 3499 posts)

    Join Date
    Jul 2003
    Posts
    3,232
    Rep Power
    593

    Complex (for me) string manipulations


    I am trying to write a script that modifies tables in Wiki markup (not that the markup type really matters) without changing the rest of the page content. The makeup of my string is as follows:
    1) Text preceding a table
    2) A table
    3) Text following the table
    4) If another table follows, go back to 2)
    5) Otherwise, done

    That is essentially how I want to split the string. The start of a table is identified by a double pipe (||). There are many pipes (including double pipes) inside the table to delimit cells, and the end of a row is denoted by a pipe followed by a newline. The real problem (for me) is finding the end of the table. When a row ends (|\n) and something other than whitespace or a pipe follows, the table has ended. But that is only known after the fact.

    I could use some help figuring out how to construct the logic for this. I can find the start of a table with index and the double pipe, then split the string at that point for 1). What I don't know how to do is find the last pipe of that table without running into the first pipe of the next table. Once I can do that I'm home free, as I can just look for the double pipe again in what is left of the string. TIA
    Last edited by gw1500se; February 28th, 2013 at 11:18 AM.
    There are 10 kinds of people in the world. Those that understand binary and those that don't.
  2. #2
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    776
    Rep Power
    495
    If the volume of your data is not too big, one easy approach would be to store your input in an array of lines. It is then easier to go back to the previous line when you find a line end.

    Another approach is deferred action: always keep two lines (current and next) in memory, and only do what you want to do to line n after you have checked what line n + 1 contains.

    It would be much easier to help you if you provided some sample lines of your input.
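The array-of-lines idea above could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread: the sub name `split_segments` is hypothetical, a line is treated as part of a table whenever it starts with a pipe, and blank lines are held back until the next non-blank line shows which side they belong to. A cell that contains an embedded newline would be misclassified by this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: split wiki text into alternating text/table segments.
# Assumption: every table row starts with a pipe at the beginning of a line.
sub split_segments {
    my ($input) = @_;
    my @segments;        # each entry: { type => 'text' | 'table', body => '...' }
    my $pending = '';    # blank lines whose owner is not known yet

    for my $line (split /^/m, $input) {    # each line keeps its newline
        if ($line =~ /^\s*$/) {            # blank: decide later
            $pending .= $line;
            next;
        }
        my $type = $line =~ /^\s*\|/ ? 'table' : 'text';
        if (@segments && $segments[-1]{type} eq $type) {
            # Same kind as before: blank lines were inside this segment.
            $segments[-1]{body} .= $pending . $line;
        }
        elsif ($type eq 'table') {
            # A table starts: held-back blank lines stay with the text before it.
            if (@segments)          { $segments[-1]{body} .= $pending }
            elsif (length $pending) { push @segments, { type => 'text', body => $pending } }
            push @segments, { type => 'table', body => $line };
        }
        else {
            # Text starts: held-back blank lines lead the new text segment.
            push @segments, { type => 'text', body => $pending . $line };
        }
        $pending = '';
    }

    # Trailing blank lines are treated as text.
    if (length $pending) {
        if (@segments && $segments[-1]{type} eq 'text') { $segments[-1]{body} .= $pending }
        else { push @segments, { type => 'text', body => $pending } }
    }
    return @segments;
}
```

Joining the `body` fields back together reproduces the input unchanged, so the table segments can be modified in place and the page reassembled without touching the surrounding text.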
  4. #3
    Contributing User
    Devshed Loyal (3000 - 3499 posts)

    Join Date
    Jul 2003
    Posts
    3,232
    Rep Power
    593
    Thanks for the reply. This exists as a single string, but I suppose I could break it up on newline characters. What I can't predict is how well behaved the string will be. Suppose a newline character shows up in the middle of a cell (although unusual, it is not illegal)?

    Here is a sample of the Wiki markup:
    Code:
    h2. Available FBE's\\

    || h4. Products ||
    
    | [ADBU-FBE-1.2_rh72_WAAS|PBE:ADBU-FBE-1.2_rh72_WAAS] | [FBE-1.0-fc10-i386_Gateway-7908|PBE:FBE-1.0-fc10-i386_Gateway-7908] | [SPVTG_FBE-1.0_fc10-i386_SCM-G6-RP|PBE:SPVTG_FBE-1.0_fc10-i386_SCM-G6-RP] |
    | [SPVTG_FBE-1.0_fc10-i386_SCM-GW7908|PBE:SPVTG_FBE-1.0_fc10-i386_SCM-GW7908] | [SPVTG_FBE-1.0_fc13-i386_SCM-G8|PBE:SPVTG_FBE-1.0_fc13-i386_SCM-G8] | [SPVTG_FBE-1.0_fc13-i386_SCM-G8-RP|PBE:SPVTG_FBE-1.0_fc13-i386_SCM-G8-RP]\\ |
    | [SPVTG_FBE-1.0_fc6-i386_SCM-RTN|PBE:SPVTG_FBE-1.0_fc6-i386_SCM-RTN] | [SPVTG_FBE-1.0_ubuntu10.04-i386_CSWBU-YBE|PBE:SPVTG_FBE-1.0_ubuntu10.04-i386_CSWBU-YBE] | [SPVTG_FBE-1.1_fc10-i386_SCM-G6-RP|PBE:SPVTG_FBE-1.1_fc10-i386_SCM-G6-RP] |
    | [SPVTG_FBE-1.0_fc3-i386_SCM-NGP|PBE:SPVTG_FBE-1.0_fc3-i386_SCM-NGP]\\ |
    
    h5.  \\
    
    || h4. Shrinkwraps ||
    
    | [cel5.03-i386-1.0|PBE:cel5.03-i386-1.0] | [cel5.03-i386-2.0|PBE:cel5.03-i386-2.0] | [cel5.03-i386-2.1|PBE:cel5.03-i386-2.1] | [cel5.03-x86_64-1.0|PBE:cel5.03-x86_64-1.0] |
    | [cel5.03-x86_64-2.0|PBE:cel5.03-x86_64-2.0] | [cel5.03-x86_64-2.1|PBE:cel5.03-x86_64-2.1] | [cel5.50-x86_64-1.0|PBE:cel5.50-x86_64-1.0] | [cel5.50-x86_64-1.1|PBE:cel5.50-x86_64-1.1] |
    | [cel6.20-x86_64-1.0|PBE:cel5.50-x86_64-1.0] | [cel6.20-x86_64-1.1|PBE:cel5.50-x86_64-1.1] | [f10-i386_1.0|PBE:f10-i386_1.0] | [f13-i386_1.0|PBE:f13-i386_1.0] |
    | [f15-i386_1.0|PBE:f15-i386_1.0] | [fc10-1.1_i386-SCM-G6-RP|PBE:fc10-1.1_i386-SCM-G6-RP] | [fc3-1.1_i386-SCM_NGP|PBE:fc3-1.1_i386-SCM_NGP] | [fc3-i386_1.0|PBE:fc3-i386_1.0] |
    | [fc6-i386_1.0|PBE:fc6-i386_1.0] | [GIS-suse-8.1-i386_1.0|PBE:GIS-suse-8.1-i386_1.0]\\ | [rh7.3-i386_1.0|PBE:rh7.3-i386_1.0] | [rh72_adbu_i386_1.0|PBE:rh72_adbu_i386_1.0] |
    | [rhel3-i386_1.0|PBE:rhel3-i386_1.0] | [rhel4.8-i386_1.0|PBE:rhel4.8-i386_1.0] | [suse-8.1-i386_1.0|PBE:suse-8.1-i386_1.0] | [ubuntu10.04-amd64_1.0-ALPHA|PBE:ubuntu10.04-amd64_1.0-ALPHA] |
    | [ubuntu10.04-i386_1.0|PBE:ubuntu10.04-i386_1.0] |
    
    h5.  \\
  6. #4
    Contributing User
    Devshed Loyal (3000 - 3499 posts)

    Join Date
    Jul 2003
    Posts
    3,232
    Rep Power
    593
    I guess I got it. It was a brute-force method using index and rindex, but it works.
  8. #5
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    776
    Rep Power
    495
    Originally Posted by gw1500se
    Thanks for the reply. This exists as a single string, but I suppose I could break it up on newline characters. What I can't predict is how well behaved the string will be. Suppose a newline character shows up in the middle of a cell (although unusual, it is not illegal)?
    You could split on pipe + newline:

    Perl Code:
    my @lines = split /\|\n/, $input;
  10. #6
    Contributing User
    Devshed Loyal (3000 - 3499 posts)

    Join Date
    Jul 2003
    Posts
    3,232
    Rep Power
    593
    Thanks, but that would not distinguish one table from the next, nor separate out the intervening text.
  12. #7
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    776
    Rep Power
    495
    It would not distinguish by itself, but it would enable you to look up what comes at the beginning of the next line to figure out whether you are at the end of a table or not.
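A minimal sketch of that look-ahead, for illustration only. The sub names `row_chunks` and `table_end_flags` are hypothetical. Note that a plain `split /\|\n/` discards the pipe and newline it splits on, so this version uses a zero-width lookbehind to keep them, and it assumes rows end in exactly a pipe followed by a newline, with no trailing spaces.

```perl
use strict;
use warnings;

# Split after every "|\n" without consuming it, so each chunk keeps
# its own row terminator.
sub row_chunks {
    my ($input) = @_;
    return split /(?<=\|\n)/, $input;
}

# For every chunk that ends a row, report 1 if the table ends there,
# i.e. the next chunk does not begin with optional whitespace and a pipe.
sub table_end_flags {
    my @chunks = @_;
    my @flags;
    for my $i (0 .. $#chunks) {
        next unless $chunks[$i] =~ /\|\n\z/;     # chunk is not a finished row
        my $next = $i < $#chunks ? $chunks[$i + 1] : '';
        push @flags, ($next =~ /\A\s*\|/ ? 0 : 1);
    }
    return @flags;
}

# Hypothetical sample: two rows, then ordinary text.
my @chunks = row_chunks("| a | b |\n| c |\n\nsome text\n");
print join(' ', table_end_flags(@chunks)), "\n";
```

As the objection above points out, the text between two tables still ends up glued to the front of the next table's first chunk, so a further step is needed to find the double pipe inside that chunk.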
  14. #8
    Contributing User
    Devshed Loyal (3000 - 3499 posts)

    Join Date
    Jul 2003
    Posts
    3,232
    Rep Power
    593
    That brings me back to my brute-force method: finding non-table text after any amount of whitespace, then using rindex to get back to the end of that table. I'm satisfied that what I have is doing what I need. Thanks.
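The poster's actual script was never shown, so the following is only a guess at what such an index/rindex approach might look like. The sub name `find_table` is hypothetical, and a regex is used for the forward scan to the first non-table character, which may differ from what was really written.

```perl
use strict;
use warnings;

# Hypothetical helper: return (start, end) offsets of the next table at or
# after $from, or an empty list when no "||" is left.
sub find_table {
    my ($str, $from) = @_;
    my $start = index($str, '||', $from);
    return if $start < 0;

    # Scan forward for a row end (pipe, optional blanks, newline) followed,
    # after any whitespace, by a character that is neither whitespace nor a pipe.
    pos($str) = $start;
    my $last_pipe;
    if ($str =~ /\|[ \t]*\n\s*[^\s|]/g) {
        # pos() is just past the first non-table character; search backwards
        # from that character for the pipe that closed the table.
        $last_pipe = rindex($str, '|', pos($str) - 1);
    }
    else {
        # No non-table text follows: the table runs to the last pipe.
        $last_pipe = rindex($str, '|');
    }
    return ($start, $last_pipe);
}

# Usage: walk the string table by table (made-up sample input).
my $text = "intro\n|| h ||\n| a |\n\nmiddle\n|| h2 ||\n| b |\n";
my $from = 0;
while (my ($s, $e) = find_table($text, $from)) {
    print "table at $s..$e\n";
    $from = $e + 1;
}
```

Everything between one table's end offset and the next table's start offset is the intervening text, which covers steps 1) and 3) of the original outline. A newline inside a cell followed by non-pipe text would still be mistaken for the end of the table.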
