
#1
February 28th, 2013, 09:33 AM
 gw1500se
Contributing User

Join Date: Jul 2003
Posts: 2,875
Time spent in forums: 1 Year 1 Week 6 Days 5 h 34 m 24 sec
Reputation Power: 581
Complex (for me) string manipulations

I am trying to write a script that modifies tables in Wiki markup (not that that really matters) without changing the rest of the page content. The makeup of my string is as follows:
1) Text preceding a table
2) A table
3) Text following the table
4) If another table then 2)
5) else done

That is essentially how I want to split the string. The start of a table is identified by a double vertical bar, or double pipe (||). There are many pipes (including double pipes) inside the table to delimit cells, and the end of a row is denoted by a pipe followed by a new line. The real problem (for me) is finding the end of the table. When a row ends (|\n) and the next thing found is something other than white space or a pipe, the table has ended. But that is only known after the fact.

I could use some help figuring out how to construct the logic for this. I can find the start of a table with index and the double pipe then split the string at that point for 1). What I don't know how to do is find the last pipe of that table without running into the first pipe of the next table. Once I do that I'm home free as I can just look for the double pipe again in what is left of the string. TIA
__________________
There are 10 kinds of people in the world. Those that understand binary and those that don't.

Last edited by gw1500se : February 28th, 2013 at 11:18 AM.

#2
February 28th, 2013, 10:57 AM
 Laurent_R
Contributing User

Join Date: Jun 2012
Posts: 504
Time spent in forums: 4 Days 18 h 54 m 9 sec
Reputation Power: 385
If the volume of your data is not too big, one easy approach would be to store your input in an array of lines. It is then easier to go back to the previous line when you find a line end.

Another approach is deferred action: always keep two lines (current and next) in memory, and only do what you want to do to line n after you have checked what line n + 1 contains.
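A minimal sketch of the line-array idea with look-ahead, not taken from anyone's actual script. It assumes that every table line starts with optional white space and a pipe, and that two tables are always separated by at least one non-blank, non-table line. The name `split_segments` is made up for illustration. A new line in the middle of a cell would break the first assumption.

```perl
use strict;
use warnings;

# Split wiki text into alternating segments of plain text and tables.
# Returns a list of [ kind, text ] pairs, kind being 'text' or 'table'.
sub split_segments {
    my ($input) = @_;
    my @lines = split /^/m, $input;    # keep the new line on every line
    my @segments;
    my $in_table = 0;

    for my $i (0 .. $#lines) {
        my $line = $lines[$i];
        my $is_table;
        if ($line =~ /^\s*\|/) {
            $is_table = 1;             # assumed: table lines start with a pipe
        }
        elsif ($in_table && $line =~ /^\s*$/) {
            # Blank line inside a table: look ahead to the next non-blank
            # line before deciding whether the table has ended.
            my $j = $i + 1;
            $j++ while $j <= $#lines && $lines[$j] =~ /^\s*$/;
            $is_table = ($j <= $#lines && $lines[$j] =~ /^\s*\|/) ? 1 : 0;
        }
        else {
            $is_table = 0;
        }

        my $kind = $is_table ? 'table' : 'text';
        if (@segments && $segments[-1][0] eq $kind) {
            $segments[-1][1] .= $line;          # extend the current segment
        }
        else {
            push @segments, [ $kind, $line ];   # start a new segment
        }
        $in_table = $is_table;
    }
    return @segments;
}
```

Joining the second element of every pair gives back the original string, so the text segments can be left alone while only the table segments are modified.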

It would be much easier to help you if you provided some sample lines of your input.

#3
February 28th, 2013, 11:12 AM
 gw1500se
Thanks for the reply. This exists as a single string, but I suppose I could break it up on new line characters. What I can't predict is how well behaved the string will be. Suppose a new line character shows up in the middle of a cell (although unusual, it is not illegal)?

Here is a sample of the Wiki markup:
Code:
```
h2. Available FBE's\\

|| h4. Products ||

| [SPVTG_FBE-1.0_fc10-i386_SCM-GW7908|PBE:SPVTG_FBE-1.0_fc10-i386_SCM-GW7908] | [SPVTG_FBE-1.0_fc13-i386_SCM-G8|PBE:SPVTG_FBE-1.0_fc13-i386_SCM-G8] | [SPVTG_FBE-1.0_fc13-i386_SCM-G8-RP|PBE:SPVTG_FBE-1.0_fc13-i386_SCM-G8-RP]\\ |
| [SPVTG_FBE-1.0_fc6-i386_SCM-RTN|PBE:SPVTG_FBE-1.0_fc6-i386_SCM-RTN] | [SPVTG_FBE-1.0_ubuntu10.04-i386_CSWBU-YBE|PBE:SPVTG_FBE-1.0_ubuntu10.04-i386_CSWBU-YBE] | [SPVTG_FBE-1.1_fc10-i386_SCM-G6-RP|PBE:SPVTG_FBE-1.1_fc10-i386_SCM-G6-RP] |
| [SPVTG_FBE-1.0_fc3-i386_SCM-NGP|PBE:SPVTG_FBE-1.0_fc3-i386_SCM-NGP]\\ |

h5. &nbsp;\\

|| h4. Shrinkwraps ||

| [cel5.03-i386-1.0|PBE:cel5.03-i386-1.0] | [cel5.03-i386-2.0|PBE:cel5.03-i386-2.0] | [cel5.03-i386-2.1|PBE:cel5.03-i386-2.1] | [cel5.03-x86_64-1.0|PBE:cel5.03-x86_64-1.0] |
| [cel5.03-x86_64-2.0|PBE:cel5.03-x86_64-2.0] | [cel5.03-x86_64-2.1|PBE:cel5.03-x86_64-2.1] | [cel5.50-x86_64-1.0|PBE:cel5.50-x86_64-1.0] | [cel5.50-x86_64-1.1|PBE:cel5.50-x86_64-1.1] |
| [cel6.20-x86_64-1.0|PBE:cel5.50-x86_64-1.0] | [cel6.20-x86_64-1.1|PBE:cel5.50-x86_64-1.1] | [f10-i386_1.0|PBE:f10-i386_1.0] | [f13-i386_1.0|PBE:f13-i386_1.0] |
| [f15-i386_1.0|PBE:f15-i386_1.0] | [fc10-1.1_i386-SCM-G6-RP|PBE:fc10-1.1_i386-SCM-G6-RP] | [fc3-1.1_i386-SCM_NGP|PBE:fc3-1.1_i386-SCM_NGP] | [fc3-i386_1.0|PBE:fc3-i386_1.0] |
| [rhel3-i386_1.0|PBE:rhel3-i386_1.0] | [rhel4.8-i386_1.0|PBE:rhel4.8-i386_1.0] | [suse-8.1-i386_1.0|PBE:suse-8.1-i386_1.0] | [ubuntu10.04-amd64_1.0-ALPHA|PBE:ubuntu10.04-amd64_1.0-ALPHA] |
| [ubuntu10.04-i386_1.0|PBE:ubuntu10.04-i386_1.0] |

h5. &nbsp;\\

```

#4
February 28th, 2013, 01:23 PM
 gw1500se
I guess I got it. It was a brute force method using index and rindex but it works.
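The actual script is not shown in the thread, so the following is only one possible reading of an index/rindex approach. It assumes a table starts at a double pipe and ends at the last pipe before the first non-white-space, non-pipe text that follows a row end. The name `split_tables` is made up for illustration, and a small regex is used to locate the start of the following text.

```perl
use strict;
use warnings;

# Returns alternating pieces: text, table, text, table, ..., text.
sub split_tables {
    my ($text) = @_;
    my @parts;
    my $from = 0;

    while ((my $start = index($text, '||', $from)) >= 0) {
        # Find where non-table text begins: a row end (pipe, new line)
        # followed by white space and then something that is not a pipe.
        pos($text) = $start;
        my $after = ($text =~ /\|[ \t]*\n\s*(?=[^\s|])/g)
                  ? $+[0]             # offset of the first non-table character
                  : length($text);    # the table runs to the end of the string

        # Step back to the last pipe of the table.
        my $last_pipe = rindex($text, '|', $after - 1);

        push @parts, substr($text, $from, $start - $from);            # text before
        push @parts, substr($text, $start, $last_pipe - $start + 1);  # the table
        $from = $last_pipe + 1;
    }
    push @parts, substr($text, $from);    # whatever follows the last table
    return @parts;
}
```

Tables sit at the odd indices of the returned list. Joining all pieces with an empty string reproduces the input, so the rest of the page stays unchanged.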

#5
March 1st, 2013, 01:43 AM
 Laurent_R
Quote:
 Originally Posted by gw1500se: Thanks for the reply. This exists as a single string, but I suppose I could break it up on new line characters. What I can't predict is how well behaved the string will be. Suppose a new line character shows up in the middle of a cell (although unusual, it is not illegal)?

You could split on pipe + new line:

Perl Code:
```
# Each element is one row (or leading text) without its trailing "|\n".
my @lines = split /\|\n/, $input;
```

#6
March 1st, 2013, 06:52 AM
 gw1500se
Thanks but that would not distinguish one table from the next nor separate out the intervening text.

#7
March 1st, 2013, 09:20 AM
 Laurent_R
It would not distinguish by itself, but it would enable you to look up what comes at the beginning of the next line to figure out whether you are at the end of a table or not.
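As a rough sketch of that look-up, assuming a table continues whenever the chunk after a row end begins with optional white space and a pipe. The helper name `table_end_indices` is made up for illustration. The `-1` limit keeps a trailing empty chunk, so the last element is always whatever follows the final row end.

```perl
use strict;
use warnings;

# Split on pipe + new line, then report which chunks are the last row
# of a table by peeking at the beginning of the following chunk.
sub table_end_indices {
    my ($input) = @_;
    my @chunks = split /\|\n/, $input, -1;
    my @ends;

    # Every chunk except the last one was terminated by "|\n".
    for my $i (0 .. $#chunks - 1) {
        # If the next chunk does not start with a pipe (after optional
        # white space), the table ended with chunk $i.
        push @ends, $i unless $chunks[$i + 1] =~ /^\s*\|/;
    }
    return (\@chunks, \@ends);
}
```

The chunk that follows a table end can still hold both the intervening text and the first row of the next table, so a further search for the double pipe inside that chunk would be needed to separate them.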

#8
March 1st, 2013, 10:23 AM
 gw1500se
That brings me back to my brute force method: finding non-table text after any amount of white space, then using rindex to go back to the end of that table. I'm satisfied that what I have is doing what I need. Thanks.

 Viewing: Dev Shed Forums > Programming Languages > Perl Programming > Complex (for me) string manipulations