#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2017
    Posts
    4
    Rep Power
    0

    how to extract specific portion of a line


    Hi,
    I am not very experienced with Perl. I have an input file whose format looks something like this:

    Comparing test alignment in alignment.msf
    with reference alignment in ../msf_file/2hsdA_ref2.msf

    SP score= 0.149123

    TC score= 0.000000

    Comparing test alignment in align.msf
    with reference alignment in ../msf_file/2hsdA_ref2.msf

    SP score= 0.071595

    TC score= 0.000000

    Comparing test alignment in alignment.msf
    with reference alignment in ../msf_file/2hsdA_ref2.msf

    SP score= 0.218159

    TC score= 0.000000


    Now I want to extract the numeric values after "SP score=" and "TC score=" and store them in an output file.

    For that I have built a Perl script as shown below:

    #! /usr/bin/perl/

    use warnings;

    if($#ARGV!=0){
        die "please provide the file name in the command line\n";
    }

    my ($file_name)=@ARGV;

    open(INP,"$file_name") or die "cannot open the file $!\n";
    open(OUTP,">result_table_$file_name.txt") or die "cannot open the file $!\n";

    @file=<INP>;

    foreach $line(@file){
        if($line =~/^\s*$/){
            next;
        }
        elsif($line=~/SP score/){
            ($sop1)=($line=~/^\s*SP score=(\d*)/);
        }
        elsif($line=~/TC score/){
            ($tc1)=($line=~/^\s*TC score=(\d*)/);
        }
        print OUTP"$sop1\t\t$tc1\n";
    }
    close(INP);
    close(OUTP);

    However, when I run the program, I get a warning like:

    Use of uninitialized value $sop1 in concatenation (.) or string at extract_parameters.pl line ....

    Please help me to resolve this issue. It is very urgent for me.
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2017
    Location
    Minnesota, USA
    Posts
    30
    Rep Power
    66
    Given input that looks like:

    Comparing test alignment in alignment.msf
    with reference alignment in ../msf_file/2hsdA_ref2.msf

    SP score= 0.149123

    TC score= 0.000000

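    You're finding the SP and TC lines, and dropping the blank lines. But what about the "Comparing" and "with" lines? They aren't accounted for at all, so they fall through to the print statement before $sop1 and $tc1 have been assigned, which is what triggers the "uninitialized value" warning.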

    Also, your code assumes one output line per input line, but you need to process several input lines in order to produce the output line. You need a bit more logic to handle the input of multiple lines before you produce the output line.
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2017
    Location
    Minnesota, USA
    Posts
    30
    Rep Power
    66
    First pass, I changed the tests to:

    if($line !~/score/){
        next;
    }
    elsif($line=~/SP\sscore=\s*([0123456789.]*)/){
        print "found SP = $1\n";
        $sop1 = $1;
    }
    elsif($line=~/TC\sscore=\s*([0123456789.]*)/){
        print "found TC = $1\n";
        $tc1 = $1;
    }
    print OUTP "$sop1\t\t$tc1\n";

    \d doesn't include the period, so I expanded the class of characters to match, and I added \s* to skip the space after the "=". Also, there was no reason to use two separate regular expressions to capture the number; it can be done right in the test. Your list assignment for $sop1 and $tc1 was valid Perl (a match in list context returns its captures), but since the original pattern allowed nothing between "=" and the digits, (\d*) captured only an empty string. Reading the captured value from $1 inside the test is simply more direct.
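
    To see the difference between the two patterns in isolation, here is a small sketch run against one sample line from the first post. The variable names $old and $new are only for illustration, and [\d.]+ is my shorthand for the longer character class above (with + instead of * so that an empty capture cannot match):

    ```perl
    use strict;
    use warnings;

    my $line = "    SP score= 0.149123\n";

    # Original pattern: nothing may sit between "=" and the digits, and \d
    # excludes ".", so the match succeeds but captures an empty string.
    my ($old) = ( $line =~ /^\s*SP score=(\d*)/ );

    # Revised pattern: skip optional whitespace, then capture digits and periods.
    my ($new) = ( $line =~ /SP\sscore=\s*([\d.]+)/ );

    print "old capture: '$old'\n";    # prints: old capture: ''
    print "new capture: '$new'\n";    # prints: new capture: '0.149123'
    ```

    Both assignments use a match in list context, so either style of capture works once the pattern itself is right.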

    The code still has the many-lines-to-one-record mismatch I cited above, but fixing that is just a matter of keeping track of where you are in the sequence (i.e. "Did I get an SP? Did I get a TC? Then output a record and clear my statuses").
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2017
    Location
    Minnesota, USA
    Posts
    30
    Rep Power
    66
    And as a final pass, to keep track of the necessary values to produce an output line:

    $sop1 = "";
    $tc1 = "";

    foreach $line(@file){
        if($line !~/score/){
            next;
        }
        elsif($line=~/SP\sscore=\s*([0123456789.]*)/){
            print "found SP = $1\n";
            $sop1 = $1;
        }
        elsif($line=~/TC\sscore=\s*([0123456789.]*)/){
            print "found TC = $1\n";
            $tc1 = $1;
        }

        if ($sop1 ne "" && $tc1 ne "") {
            print OUTP "$sop1\t\t$tc1\n";
            $sop1 = "";
            $tc1 = "";
        }
    }
    close(INP);
    close(OUTP);
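
    For anyone who wants to try the idea without the input file, here is one possible self-contained sketch of the same record-tracking logic. It is not the script from the first post: the sub name extract_scores, the use of strict with lexical filehandles, and the in-memory sample are my own assumptions, and it reads line by line instead of slurping the file into @file.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: read lines from an open filehandle and return one
    # "SP<tab><tab>TC" string per block that contains both scores.
    sub extract_scores {
        my ($in) = @_;
        my ( $sp, $tc );
        my @records;
        while ( my $line = <$in> ) {
            if    ( $line =~ /SP\sscore=\s*([\d.]+)/ ) { $sp = $1 }
            elsif ( $line =~ /TC\sscore=\s*([\d.]+)/ ) { $tc = $1 }

            # Emit a record only once both values of a block have been seen.
            if ( defined $sp && defined $tc ) {
                push @records, "$sp\t\t$tc";
                ( $sp, $tc ) = ( undef, undef );
            }
        }
        return @records;
    }

    # In-memory copy of the first two blocks of the sample input.
    my $sample = join "\n",
        'Comparing test alignment in alignment.msf',
        'with reference alignment in ../msf_file/2hsdA_ref2.msf',
        '',
        'SP score= 0.149123',
        '',
        'TC score= 0.000000',
        '',
        'Comparing test alignment in align.msf',
        'with reference alignment in ../msf_file/2hsdA_ref2.msf',
        '',
        'SP score= 0.071595',
        '',
        'TC score= 0.000000',
        '';

    # Opening a reference to a scalar gives a read-only in-memory filehandle.
    open( my $in, '<', \$sample ) or die "cannot open in-memory sample: $!\n";
    print "$_\n" for extract_scores($in);
    close($in);
    ```

    To use it on a real file, open the file with the three-argument form, open(my $in, '<', $file_name), and pass that handle to the sub. Testing with defined rather than comparing against "" means no variable is ever printed before it has been set.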
