February 28th, 2013, 09:33 AM
Complex (for me) string manipulations
I am trying to write a script that modifies tables in Wiki markup (not that the markup really matters) without changing the rest of the page content. The makeup of my string is as follows:
1) Text preceding a table
2) A table
3) Text following the table
4) If there is another table, go back to 2)
5) Otherwise, done
That is essentially how I want to split the string. A table start is identified by a double vertical bar, or double pipe (||). There are many pipes (including double pipes) inside the table to denote cells, and the end of a row is denoted by a pipe followed by a newline. The real problem (for me) is finding the end of the table. When a row ends (|\n) and the next thing found is something other than whitespace or a pipe, the table has ended. But that is only known after the fact.
I could use some help figuring out how to construct the logic for this. I can find the start of a table with index and the double pipe, then split the string at that point to get 1). What I don't know how to do is find the last pipe of that table without running into the first pipe of the next table. Once I do that I'm home free, as I can just look for the double pipe again in what is left of the string. TIA
__________________
There are 10 kinds of people in the world. Those that understand binary and those that don't.
Last edited by gw1500se : February 28th, 2013 at 11:18 AM.

February 28th, 2013, 10:57 AM
If the volume of your data is not too big, one easy approach would be to store your input in an array of lines. It is then easy to go back to the previous line when you find that a table has ended.
Another approach is deferred action: always keep two lines (current and next) in memory, and only do what you want to do to line n after you have checked what line n + 1 contains.
It would be much easier to help you if you provided some sample lines of your input.
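To make the "check line n + 1 before acting on line n" idea concrete, here is a minimal sketch. The name find_table_ends is made up for illustration, and it assumes that a table row is any line starting with optional whitespace and a pipe, and that at least one non-table line separates two tables.

```perl
use strict;
use warnings;

# Hypothetical helper: return the indices of the lines that are the
# last row of a table. Assumption: a row is any line that starts with
# optional whitespace followed by a pipe.
sub find_table_ends {
    my ($input) = @_;
    my @lines = split /\n/, $input;
    my @last_rows;

    for my $n (0 .. $#lines) {
        # Only table rows can close a table.
        next unless $lines[$n] =~ /^\s*\|/;

        # Deferred decision: peek at line n + 1 before deciding about line n.
        my $next = $n < $#lines ? $lines[ $n + 1 ] : '';
        push @last_rows, $n unless $next =~ /^\s*\|/;
    }
    return @last_rows;
}
```

Since the whole page is held in @lines, the end of each table is known as soon as the following line has been inspected, without having to back up through the string. A newline inside a cell would defeat this line-based sketch.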

February 28th, 2013, 11:12 AM
Thanks for the reply. This exists as a single string, but I suppose I could break it up on newline characters. What I can't predict is how well behaved the string will be. Suppose a newline character shows up in the middle of a cell (although unusual, it is not illegal)?
Here is a sample of the Wiki markup:
Code:
h2. Available FBE's\\
|| h4. Products ||
| [ADBU-FBE-1.2_rh72_WAAS|PBE:ADBU-FBE-1.2_rh72_WAAS] | [FBE-1.0-fc10-i386_Gateway-7908|PBE:FBE-1.0-fc10-i386_Gateway-7908] | [SPVTG_FBE-1.0_fc10-i386_SCM-G6-RP|PBE:SPVTG_FBE-1.0_fc10-i386_SCM-G6-RP] |
| [SPVTG_FBE-1.0_fc10-i386_SCM-GW7908|PBE:SPVTG_FBE-1.0_fc10-i386_SCM-GW7908] | [SPVTG_FBE-1.0_fc13-i386_SCM-G8|PBE:SPVTG_FBE-1.0_fc13-i386_SCM-G8] | [SPVTG_FBE-1.0_fc13-i386_SCM-G8-RP|PBE:SPVTG_FBE-1.0_fc13-i386_SCM-G8-RP]\\ |
| [SPVTG_FBE-1.0_fc6-i386_SCM-RTN|PBE:SPVTG_FBE-1.0_fc6-i386_SCM-RTN] | [SPVTG_FBE-1.0_ubuntu10.04-i386_CSWBU-YBE|PBE:SPVTG_FBE-1.0_ubuntu10.04-i386_CSWBU-YBE] | [SPVTG_FBE-1.1_fc10-i386_SCM-G6-RP|PBE:SPVTG_FBE-1.1_fc10-i386_SCM-G6-RP] |
| [SPVTG_FBE-1.0_fc3-i386_SCM-NGP|PBE:SPVTG_FBE-1.0_fc3-i386_SCM-NGP]\\ |
h5. \\
|| h4. Shrinkwraps ||
| [cel5.03-i386-1.0|PBE:cel5.03-i386-1.0] | [cel5.03-i386-2.0|PBE:cel5.03-i386-2.0] | [cel5.03-i386-2.1|PBE:cel5.03-i386-2.1] | [cel5.03-x86_64-1.0|PBE:cel5.03-x86_64-1.0] |
| [cel5.03-x86_64-2.0|PBE:cel5.03-x86_64-2.0] | [cel5.03-x86_64-2.1|PBE:cel5.03-x86_64-2.1] | [cel5.50-x86_64-1.0|PBE:cel5.50-x86_64-1.0] | [cel5.50-x86_64-1.1|PBE:cel5.50-x86_64-1.1] |
| [cel6.20-x86_64-1.0|PBE:cel5.50-x86_64-1.0] | [cel6.20-x86_64-1.1|PBE:cel5.50-x86_64-1.1] | [f10-i386_1.0|PBE:f10-i386_1.0] | [f13-i386_1.0|PBE:f13-i386_1.0] |
| [f15-i386_1.0|PBE:f15-i386_1.0] | [fc10-1.1_i386-SCM-G6-RP|PBE:fc10-1.1_i386-SCM-G6-RP] | [fc3-1.1_i386-SCM_NGP|PBE:fc3-1.1_i386-SCM_NGP] | [fc3-i386_1.0|PBE:fc3-i386_1.0] |
| [fc6-i386_1.0|PBE:fc6-i386_1.0] | [GIS-suse-8.1-i386_1.0|PBE:GIS-suse-8.1-i386_1.0]\\ | [rh7.3-i386_1.0|PBE:rh7.3-i386_1.0] | [rh72_adbu_i386_1.0|PBE:rh72_adbu_i386_1.0] |
| [rhel3-i386_1.0|PBE:rhel3-i386_1.0] | [rhel4.8-i386_1.0|PBE:rhel4.8-i386_1.0] | [suse-8.1-i386_1.0|PBE:suse-8.1-i386_1.0] | [ubuntu10.04-amd64_1.0-ALPHA|PBE:ubuntu10.04-amd64_1.0-ALPHA] |
| [ubuntu10.04-i386_1.0|PBE:ubuntu10.04-i386_1.0] |
h5. \\

February 28th, 2013, 01:23 PM
I guess I got it. It was a brute force method using index and rindex but it works.

March 1st, 2013, 01:43 AM
Quote (originally posted by gw1500se): Thanks for the reply. This exists as a single string, but I suppose I could break it up on newline characters. What I can't predict is how well behaved the string will be. Suppose a newline character shows up in the middle of a cell (although unusual, it is not illegal)?
You could split on pipe + newline:
Perl Code:
my @lines = split /\|\n/, $input;

March 1st, 2013, 06:52 AM
Thanks, but that would not distinguish one table from the next, nor separate out the intervening text.

March 1st, 2013, 09:20 AM
It would not distinguish by itself, but it would enable you to look up what comes at the beginning of the next line to figure out whether you are at the end of a table or not.
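As a rough sketch of that idea: split after every pipe + newline, then look at how each chunk begins. The name split_tables and the [kind, text] pairs are invented for this example. It assumes that a chunk starting with optional whitespace and a pipe continues the current table, and that any other chunk is plain text, possibly followed by the start of a new table at the first double pipe.

```perl
use strict;
use warnings;

# Hypothetical helper: return a list of [ 'text' | 'table', $string ] pairs.
sub split_tables {
    my ($input) = @_;
    my @segments;

    # Append to the last segment when the kind matches, else start a new one.
    my $add = sub {
        my ($kind, $text) = @_;
        return if $text eq '';
        if (@segments && $segments[-1][0] eq $kind) {
            $segments[-1][1] .= $text;
        }
        else {
            push @segments, [ $kind, $text ];
        }
    };

    # The lookbehind splits after each "|\n" and keeps it with its chunk.
    for my $chunk (split /(?<=\|\n)/, $input) {
        if ($chunk =~ /^\s*\|/) {          # begins with a pipe: a table row
            $add->('table', $chunk);
            next;
        }
        my $start = index($chunk, '||');   # text, maybe followed by a new table
        if ($start < 0) {
            $add->('text', $chunk);
            next;
        }
        $add->('text',  substr($chunk, 0, $start));
        $add->('table', substr($chunk, $start));
    }
    return @segments;
}
```

Because the split happens only where a pipe is directly followed by a newline, a newline in the middle of a cell stays inside its row. Joining the text of all segments gives back the original string, so the non-table content is left untouched.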

March 1st, 2013, 10:23 AM
That brings me back to my brute-force method: finding non-table text after any amount of whitespace, then using rindex to go back to the end of that table. I'm satisfied that what I have is doing what I need. Thanks.
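The actual script was not posted, so the following is only a guess at what such an index/rindex approach might look like. The name split_page is made up, and the regex that looks for "a newline, optional whitespace, then something that is neither a pipe nor whitespace" is an assumption about how the end of a table is detected.

```perl
use strict;
use warnings;

# Hypothetical reconstruction: return alternating pieces
# (text, table, text, table, ..., text).
sub split_page {
    my ($page) = @_;
    my @pieces;

    while ((my $start = index($page, '||')) >= 0) {
        # Search from the table start for the first newline that is
        # followed (after optional whitespace) by non-table content.
        pos($page) = $start;
        my $non_table = ($page =~ /\n\s*[^|\s]/g) ? $-[0] : length $page;

        # rindex back from that point to the last pipe of the table.
        my $end = rindex($page, '|', $non_table) + 1;

        push @pieces, substr($page, 0, $start),
                      substr($page, $start, $end - $start);
        $page = substr($page, $end);
    }
    push @pieces, $page;
    return @pieces;
}
```

Each table piece can then be modified on its own and the pieces joined back together. Note that this sketch would cut a table short if a newline inside a cell were followed by ordinary text, which is the case raised earlier in the thread.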