#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2013
    Posts
    1
    Rep Power
    0

    Quick help with a Perl script


    Hi all,

    I'm writing a program to replace A's and G's in a text file I have. This is the code I have:

    #!usr/bin/perl

    open(FILE, "whatshappenin") or die $!;

    #QUICKEST WAY OF READING A LINE BY LINE INTO AN ARRAY
    foreach $line (<FILE>)
    {
        chomp($line);

        my @array_of_chars = split(//,$line);
        my $array_of_charsSize = $#array_of_chars + 1;
        if($array_of_charsSize < 1480)
        {
            #THIS PRINTS THE SEQUENCE TITLES
            print"$line\n";
        }else{
            #LOOP THROUGH SEQUENCE DATA ONE CHARACTER AT A TIME
            for($i=0;$i<$array_of_charsSize;$i++)
            {
                #ONLY EDIT ONCE YOU REACH GAP PARTITION (1416 FOR TRN)
                if($i>1416)
                {
                    #CHANGE PRESENCE OF DATA 'A' TO 1
                    if($array_of_chars[$i] == "A" || $array_of_chars[$i] == "a")
                    {
                        $array_of_chars[$i] = 0;
                    #CHANGE PRESENCE OF GAP 'G' TO 0
                    }elsif($array_of_chars[$i] == "G" || $array_of_chars[$i] == "g")
                    {
                        $array_of_chars[$i] = 1;
                    }
                }
            }
            #PRINT EDITED SEQUENCE
            $str = "@array_of_chars";
            $str =~ s/(.)\s/$1/seg;
            print"$str\n";

        }
    }

    Unfortunately, it does not recognize any G's and just changes every character to 0 once $i>1416. I suspect the problem is the equality tests in my if/elsif statements, but I can't figure it out. Do I need to use ASCII values or something?

    Any help is much appreciated!
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    May 2007
    Posts
    765
    Rep Power
    929
    First, please put code between [code]...[/code] tags to preserve formatting and otherwise protect it from being mangled by the forum software.

    Second, you've picked up a lot of bad habits. That might just be indicative of a background in a language like C or Java--but if you've been using a tutorial or similar reference I'd recommend you find a different one. The official perl documentation is pretty good for the basics (perldoc perlintro is a good place to start).


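    The problem you're having at the moment is that "$x == $y" performs a numeric comparison, and any string that doesn't start with a number is treated as zero. Both "A" and "G" therefore become 0, so the first test is always true and the elsif branch is never reached. To compare text you need to use the string comparison "$x eq $y".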

    Code:
    # Prints;  'a' and 'b' are non-numeric and are
    # both converted to 0
    print 'equal' if 'a' == 'b';
    
    # Doesn't print
    print 'equal' if 'a' eq 'b';
    But there's a lot more you should think about with this script. First, perl would have told you about this problem had you turned on warnings. It's good practice to start a script with "use warnings;". "use strict;" is also a good idea for catching typos in variable names and other common errors.
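
    As a rough illustration of that point (a sketch, not part of the original script), the snippet below traps the warnings Perl emits when "==" is given non-numeric strings. The $SIG{__WARN__} handler and the variable names are only there for the demonstration.

    ```perl
    use strict;
    use warnings;

    # Collect warnings instead of printing them, purely for demonstration.
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };

    my $char = 'G';

    # Numeric comparison: both sides become 0, so this is true (and warns).
    my $numeric_match = ( $char == 'A' ) ? 1 : 0;

    # String comparison: 'G' and 'A' differ, so this is false.
    my $string_match = ( $char eq 'A' ) ? 1 : 0;

    print "numeric: $numeric_match, string: $string_match\n";
    print "warning: $_" for @warnings;
    ```

    This prints "numeric: 1, string: 0", followed by the trapped "isn't numeric" warnings.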

    Next, when you recombine @array_of_chars into a string via interpolation you introduce a mess of spaces which you later have to strip out. Perl has a join() function that would do this much more neatly. (Or if you look in "perldoc perlvar" you'll see how to control the way "@array" joins the values.)

    Code:
    @array = ('a', 'b', 'c');
    print join '-', @array;  # prints a-b-c
    $" = '+';
    print "@array";  # prints a+b+c
    But there's no need to split apart the line in the first place. Perl has many useful tools for manipulating text. Substituting one character for another can be more easily done with the s/// or tr/// operators. You can manipulate part of a string with the substr() and length() functions.

    Code:
    if( length($line) > some_amount ) {
      # Divide the line into the part that doesn't get changed
      # and the part you want to change
      $start = substr( $line, 0, some_amount );
      $end = substr( $line, some_amount );
    
      # Make some substitutions in the line
      $end =~ s/replace-this/with-that/g;
      $end =~ tr/change-these-chars/to-these-chars/;
    }
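    To make the template above concrete, here is a minimal sketch with made-up values. The cut-off of 5 and the sample line are assumptions chosen to keep the example short; they are not values from the original data.

    ```perl
    use strict;
    use warnings;

    my $line   = 'AAGGTAGCAG';   # hypothetical sample sequence
    my $offset = 5;              # hypothetical cut-off (stands in for some_amount)

    if ( length($line) > $offset ) {
        # Keep the first $offset characters untouched
        my $start = substr( $line, 0, $offset );
        # Everything from $offset onwards gets edited
        my $end = substr( $line, $offset );

        # Map A/a to 0 and G/g to 1 in the tail only
        $end =~ tr/AaGg/0011/;

        $line = $start . $end;
    }

    print "$line\n";   # prints AAGGT01C01
    ```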
    Using a while loop is better practice than a for loop when reading a file: foreach evaluates <FILE> in list context and reads the entire file into memory before the loop starts, whereas while reads one line at a time. For simple programs based around reading a line and modifying it, the -p or -n switches can make your code easier to read by hiding the boilerplate. Finally, there are some hazards in using open() with just two arguments. Putting that together you'll end up with something like:

    Code:
    #!perl
    use strict;
    use warnings;
    
    open FILE, '<', 'input.txt'
      or die $!;
    
    while( defined( my $line = <FILE> ) ) {  # "my" is required under "use strict"
       # ...
       # Perform some manipulation of $line
       # ...
    
       print $line;
    }
    
    close FILE;
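    A variation on the skeleton above, offered as a sketch rather than a correction: it uses a lexical filehandle (my $fh) instead of the bareword FILE, and reads from an in-memory string so it runs without an input file. The sample data and the uc step are placeholders.

    ```perl
    use strict;
    use warnings;

    # Stand-in for the contents of input.txt (hypothetical data)
    my $data = "first line\nsecond line\n";

    # Three-argument open: the mode is separate from the name, so a name
    # beginning with '>' or '|' cannot change what open() does.
    open my $fh, '<', \$data
      or die "Cannot open input: $!";

    my @output;
    while ( defined( my $line = <$fh> ) ) {
        chomp $line;
        push @output, uc $line;   # placeholder manipulation
    }

    close $fh;

    print "$_\n" for @output;
    ```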
    If you end up thinking about individual characters or writing long comparisons in perl, there's probably a better way. Perl is very good at describing your intent at a high level. For example, your problem can be solved with just a single line! For the moment, a while() { substr; tr/// } construction may be easiest to understand, but with practice the following (please excuse my showing off) should become natural:

    Code:
    #!perl -p
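    # The length check skips short title lines; an lvalue substr() that
    # starts past the end of the string is a fatal error.
    substr($_,1416) =~ tr/AaGg/0011/ if length($_) > 1416;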
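
    For anyone unfamiliar with -p: perlrun describes it as wrapping the script in a "while (<>) { ... } continue { print }" loop. Below is a rough spelled-out version of the same idea, written as a sub so it can be tried on short strings. The sub name mask_tail, the offset of 3, the sample lines, and the length check for short lines are all assumptions made for the demonstration.

    ```perl
    use strict;
    use warnings;

    # Translate A/a to 0 and G/g to 1 from $offset onwards; lines that are
    # too short are returned unchanged.
    sub mask_tail {
        my ( $line, $offset ) = @_;
        substr( $line, $offset ) =~ tr/AaGg/0011/ if length($line) > $offset;
        return $line;
    }

    # -p reads each input line into $_, runs the body, then prints $_.
    # Here the lines come from a list instead of <>.
    my @lines  = ( "id\n", "ACGAGTA\n" );
    my @result = map { mask_tail( $_, 3 ) } @lines;

    print @result;   # prints "id" and "ACG01T0" on separate lines
    ```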

    Comments on this post

    • Axweildr agrees
    • Laurent_R agrees
    sub{*{$::{$_}}{CODE}==$_[0]&& print for(%:: )}->(\&Meh);
