#1
  1. C Neophyte.
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2004
    Location
    Melbourne Australia
    Posts
    405
    Rep Power
    46

    Graceful extraction of all tagged items


    It's been a while.

    I've been writing a parser that extracts tagged items from a templating engine. The templated file is of the format:

    Code:
    content xyz &abc&&cde&&qrt&
    &abc&&srt&
    &pax&

    My code so far is such:
    perl Code:
     
    use strict;
    use warnings;
     
    $/ = '';
     
    open IN, 'list-of-tags.txt';
    my $str = '';
     
    while(<IN>) {
    	chomp;
    	$str = $_;	
    }
     
    while($str =~ m!(&(.+?)&)!i) {
    	my $extract = $1;
    	$str =~ s!$extract!!i;
    	print $extract."\n";
    }
    close IN;



    I feel it could be optimised further; the way I'm doing it is just plain nasty.

    What's the cleanest way to extract individual tags denoted by &..&?
    For one to know everything, first one must accept he knows nothing.
  2. #2
  3. Contributed User
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    Jun 2005
    Posts
    4,379
    Rep Power
    1871
    Originally Posted by man page
    Several special variables also refer back to portions of the previous
    match. $+ returns whatever the last bracket match matched. $& returns
    the entire matched string. (At one point $0 did also, but now it
    returns the name of the program.) "$`" returns everything before the
    matched string. "$'" returns everything after the matched string. And
    $^N contains whatever was matched by the most-recently closed group
    (submatch). $^N can be used in extended patterns (see below), for
    example to assign a submatch to a variable.
    So you could replace
    $str =~ s!$extract!!i;
    with
    $str = $';
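
    As a rough sketch of that replacement (the sub name extract_with_postmatch is made up for illustration, not something from the original script), the loop could look like the following. One caveat: on Perls older than 5.20, mentioning $' anywhere in a program slows down every regex match, so treat this as an illustration rather than a recommendation.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): collect &...& tags by
# matching once, then continuing with whatever followed the match.
sub extract_with_postmatch {
    my ($str) = @_;
    my @tags;
    while ( $str =~ m/(&.+?&)/ ) {
        push @tags, $1;
        $str = $';    # keep only the text after the match so the loop advances
    }
    return @tags;
}

print "$_\n" for extract_with_postmatch('content xyz &abc&&cde&&qrt&');
```

    Assigning $' avoids interpolating the extracted text back into a second pattern, which is where the original s!$extract!! could misbehave if a tag ever contained regex metacharacters.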


    Or you could do this.
    Code:
    #!/usr/bin/perl -w
    use strict;
    
    my $line = "skip&WANT&skip&WANT&skip&WANT&skip";
    my @fields = split(/&/,$line);
    for ( my $i = 1 ; $i <= $#fields ; $i += 2 ) {
        print "$fields[$i]\n";
    }
    
    $ perl foo.pl
    WANT
    WANT
    WANT
    Each &...& item you want is followed by a stretch between two & characters that you don't want. So instead, just use & as a field separator and take alternate fields.
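
    The same idea seems to hold for the adjacent tags in the original sample, since the text between two neighbouring tags is just an empty field. A minimal sketch, assuming every & on the line belongs to a well-formed pair (tags_by_split is a name invented here):

```perl
use strict;
use warnings;

# Hypothetical helper: split on & and keep the odd-numbered fields.
# Assumes every & in the line is part of a well-formed &tag& pair.
sub tags_by_split {
    my ($line) = @_;
    my @fields = split /&/, $line;
    # Fields 1, 3, 5, ... sit between an opening & and its closing &.
    return map { $fields[$_] } grep { $_ % 2 } 0 .. $#fields;
}

print "$_\n" for tags_by_split('content xyz &abc&&cde&&qrt&');
```

    Note that this returns the tag names without the surrounding & characters, and a single stray & anywhere on the line would shift which fields count as tags.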
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    May 2007
    Posts
    765
    Rep Power
    929
    It would help if you explained exactly what you're trying to do. I suspect that the twisty matching and feeding back into a substitution has more to do with your solution than your problem.

    Is this what you want to do?
    Code:
    F:\>perl -le "$str = 'stuff &capture& other stuff &get&'; print for $str =~ m/(&[^&]+&)/g"
    &capture&
    &get&
    (m//g in list context returns all matches)

    If you do need to remove the &XXX&'s from the string, s/// can both substitute and set the match variables:

    Code:
    F:\>perl -le "$str = 'stuff &capture& other stuff &get&'; print $& while $str =~ s/&[^&]+&//"
    &capture&
    &get&
    (BTW, your file slurp probably isn't doing what you expect: $/ = '' selects paragraph mode rather than slurp mode, and the loop overwrites $str on every read, so only the last record read is saved in $str)
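
    For a multi-line file, one way to combine a proper slurp with m//g might look like this. It is only a sketch: tags_from_handle is a made-up name, and an in-memory string stands in for list-of-tags.txt so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: read everything from a filehandle and return
# every &tag& found, using the same pattern as the one-liners above.
sub tags_from_handle {
    my ($fh) = @_;
    local $/;                      # undef (not '') means "read the whole file"
    my $text = <$fh>;
    return () unless defined $text;
    my @tags = $text =~ m/(&[^&]+&)/g;   # list context: all matches at once
    return @tags;
}

# In-memory sample standing in for list-of-tags.txt
my $sample = "content xyz &abc&&cde&&qrt&\n&abc&&srt&\n&pax&\n";
open my $in, '<', \$sample or die "Cannot open sample: $!";
print "$_\n" for tags_from_handle($in);
close $in;
```

    Because [^&]+ also matches newlines, an unpaired & on one line would pair up with the next & on a later line, so this assumes the tags in the file are well formed.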

    Comments on this post

    • ishnid agrees
    sub{*{$::{$_}}{CODE}==$_[0]&& print for(%:: )}->(\&Meh);
  6. #4
  7. C Neophyte.
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2004
    Location
    Melbourne Australia
    Posts
    405
    Rep Power
    46
    Thanks guys

    As always, much to think about... I knew my hacked-together thing was nasty, but I was just trying to show what I was trying to do



    My input is actually a file with multiple lines; some lines may have one or more occurrences of the tagged value



    EDIT: I didn't know that you could do a "for $str" like that

    mind blown !!!

    I understand why it's possible (the regex match returns a list), but I didn't know that a global (/g) match in list context actually returns every instance
    Last edited by fuzzybunny; October 8th, 2012 at 07:15 PM.
