#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2017
    Posts
    3
    Rep Power
    0

    How do I extract specific lines from an input file based on a matching pattern?


    Hello,
    I am trying to open an input file and find a pattern that starts with '=', then a single space, and then a score that
    starts with 0.8 or 0.9. If the pattern is found, I want to write those lines and their previous lines to an output file.

    Code:
    #! /usr/bin/perl/
    use strict;
    use warnings;
    if($#ARGV!=0){
    die "please provide the file name in the command line\n";
    }
    
    my $input_file=@ARGV;
    open (IN,"<$input_file")
    or die "cannot open the file\n";
    open(OUT,">extracted_baliscore.out")
    or die "cannot open the file\n";
    while(my $line=<IN>){
    if($line!~/^$/){
    $line=(s/\s+/ /g);
    @arr_line=split(' ',$line);
    for(my $count=0;$count<scalar@arr_line;$count++){
    $pattern=$arr_line[$count];
     if($pattern=~/= [8-9]+/){
      print OUT "$pattern\t";
      print OUT "$arr_line[$count-1]\n";
     }
    }
    }
    }
    But whenever I try to execute it, it reports compilation errors.
    I am running the command: perl programname.pl inputfilename
    Please help, as I am new to Perl and need to get this running as soon as I can.
  2. #2
  3. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2003
    Location
    SC
    Posts
    3
    Rep Power
    0
    I need a little more information to give you a complete answer, but here's my take on what you have...
    - Can you post a small sample of the input file, so we can determine how many 'previous lines' you want written to the output file?

    Otherwise, in your script...
    Code:
    my $input_file=@ARGV;
    Should be either
    Code:
    my $input_file= shift @ARGV;
    or
    Code:
    my $input_file=$ARGV[0];
    Both will do what you want. As written, my $input_file=@ARGV; evaluates the array in scalar context, so $input_file receives the number of arguments (1 here) rather than the file name. The first example pulls the first element out of the @ARGV array and puts it into the $input_file variable. The second does the same thing, but it explicitly gets the 0th element and leaves it in the @ARGV array. (If more elements were in @ARGV in the first example, they would all shift down by one.)
    The difference only becomes important when you pass more arguments to the Perl script, or expect arguments in a certain order. (perlfile.pl (input file) (output file))
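    As a rough illustration of the difference, here is a minimal sketch. It uses a local array @args as a stand-in for @ARGV, and the file names are made up for the example.

```perl
use strict;
use warnings;

# @args stands in for @ARGV here; the file names are hypothetical.
my @args = ('scores.txt', 'result.out');

my $count   = @args;        # scalar context: number of elements
my ($first) = @args;        # list context: first element
my $indexed = $args[0];     # explicit index; @args is left unchanged
my $shifted = shift @args;  # removes and returns the first element

print "count=$count first=$first indexed=$indexed shifted=$shifted\n";
print "remaining: @args\n";
```

    After the shift, only the second argument is left in the array, which is why the choice matters once a script takes more than one argument.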

    The compilation errors I got can be resolved by putting
    Code:
    my
    before
    Code:
    @arr_line
    in line 14, and before
    Code:
    $pattern
    in line 16.
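    To see why the declaration matters, here is a small sketch that compiles two snippets at run time with a string eval, so the failure can be inspected without aborting the script. The snippets are illustrative and are not the original program.

```perl
use strict;
use warnings;

# Snippet without "my": under strict this fails to compile, so eval returns undef.
my $without_my = eval q{ use strict; @arr_line = split(' ', 'a b c'); 1 };
my $error      = $@;

# Snippet with "my": compiles and returns the number of fields.
my $with_my = eval q{ use strict; my @arr_line = split(' ', 'a b c'); scalar @arr_line };

print "without my: ", (defined $without_my ? "compiled" : "failed"), "\n";
print "error: $error" if $error;
print "with my: $with_my fields\n";
```

    The error text comes from use strict, which requires every variable to be declared (for example with my) before it is used.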

    The line @arr_line=split(' ',$line); splits on whitespace and puts the pieces into an array. A single element never contains a space, so the original test /= [8-9]+/ against $pattern cannot match. You'd have to test for the 0th element to be an '=', and the 1st element to begin with '0.8' or '0.9'. You can do that in a nested if-then structure, but you can also test it without breaking the line into an array.
    To match a line that begins with an equals sign, some whitespace, and a 0.8 or 0.9, the following should work:
    Code:
    if($line=~ /^=\s+0\.[89]/ ) {
    Once that pattern is matched, you can split the line up afterwards.
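    Putting the pieces together, one possible sketch follows. It assumes that "previous lines" means the single line directly before each match, and it uses made-up sample data in @lines because the real input format was not posted.

```perl
use strict;
use warnings;

# Hypothetical sample input; the real file format was not shown in the thread.
my @lines = (
    "alignment_A\n",
    "= 0.85 extra\n",
    "alignment_B\n",
    "= 0.42 extra\n",
    "alignment_C\n",
    "= 0.91\n",
);

my @extracted;   # collected output lines
my $previous;    # the line seen just before the current one

for my $line (@lines) {
    chomp(my $current = $line);
    if ($current =~ /^=\s+0\.[89]/) {
        # Split only after the match: field 0 is '=', field 1 is the score.
        my @fields = split ' ', $current;
        push @extracted, $previous if defined $previous;
        push @extracted, "$fields[0] $fields[1]";
    }
    $previous = $current;
}

print "$_\n" for @extracted;
```

    To work on a real file, the loop over @lines could be replaced by while (my $line = <$in>) after open(my $in, '<', $input_file) or die "cannot open $input_file: $!"; and the print could go to an output filehandle instead of standard output.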
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2017
    Posts
    3
    Rep Power
    0
    Thank you very much for your reply. I think I can now write the code. If I run into any problems, I will contact you again. One thing I want to clarify: is it okay if I write
    the pattern matching part like: if($line=~ /^=\s+0.[89]/ ) instead of putting a backslash (escape character) after the zero?
  6. #4
  7. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2003
    Location
    SC
    Posts
    3
    Rep Power
    0
    If you leave the regular expression like 0.[89] then the dot will match any one character, rather than literally matching the decimal point. It may not be an issue. It just depends on what's in $line.
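    A quick sketch of the difference, using made-up test strings (only the first one is a genuine 0.8/0.9 score):

```perl
use strict;
use warnings;

# Hypothetical test strings; only the first is a real 0.8/0.9 score.
my @samples = ('= 0.85', '= 0x8f', '= 0-9');

my (%unescaped, %escaped);
for my $s (@samples) {
    $unescaped{$s} = $s =~ /^=\s+0.[89]/  ? 1 : 0;   # dot matches any character
    $escaped{$s}   = $s =~ /^=\s+0\.[89]/ ? 1 : 0;   # dot matches only a literal '.'
    print "$s  unescaped: $unescaped{$s}  escaped: $escaped{$s}\n";
}
```

    The unescaped pattern accepts all three strings, while the escaped one accepts only the first.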
  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2017
    Posts
    3
    Rep Power
    0
    Okay. I got that. Thank you very much.
