    #16
  1. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    776
    Rep Power
    495
    Hi,

    I'll explain the code a bit.

    There are two main instructions in the code I posted. I'll describe in detail the second one (lines 7 to 9), because the first one is a bit tricky with nested instructions and more difficult to explain to someone not used to these things.

    This is the second one:

    Perl Code:
         my $out_line = $start . "##" . join ";",
    		map {$_->[0]} 
    		sort {$a->[1] <=> $b->[1]} @fields;


    Before executing this line, @fields is an array of arrays containing the following data structure (assuming we've just read the first line of your example):

    Code:
    0  ARRAY(0x3031ada8)
       0  'MF01624.15'
       1  13
    1  ARRAY(0x3031ae08)
       0  'MF01624.15'
       1  900
    2  ARRAY(0x3031add8)
       0  'MF05188.12'
       1  133
    3  ARRAY(0x3031ae48)
       0  'MF05192.13'
       1  273
    4  ARRAY(0x3031ae88)
       0  'MF05190.13'
       1  430
    5  ARRAY(0x3031aec8)
       0  'MF00488.16'
       1  569
    Each of the arrays within the main array contains a pair of values: the ID and the priority. We can now sort this array in accordance with the priority (second field, index 1).

    Perl Code:
    my @sorted_array = sort {$a->[1] <=> $b->[1]} @fields;


    The following is the structure of @sorted_array, the same thing as before but ordered in accordance with the priority.

    Code:
    0  ARRAY(0x3031ada8)
       0  'MF01624.15'
       1  13
    1  ARRAY(0x3031add8)
       0  'MF05188.12'
       1  133
    2  ARRAY(0x3031ae48)
       0  'MF05192.13'
       1  273
    3  ARRAY(0x3031ae88)
       0  'MF05190.13'
       1  430
    4  ARRAY(0x3031aec8)
       0  'MF00488.16'
       1  569
    5  ARRAY(0x3031ae08)
       0  'MF01624.15'
       1  900
    Now, we need to extract the ID from this array. The map function below looks at each element of the main array (i.e. the references to the arrays), extracts the first field (index 0) and returns the list of those first fields.

    Perl Code:
    my @final_array = map {$_->[0]} @sorted_array;


    The @final_array array now contains the list of IDs, correctly sorted:

    Code:
    0  'MF01624.15'
    1  'MF05188.12'
    2  'MF05192.13'
    3  'MF05190.13'
    4  'MF00488.16'
    5  'MF01624.15'
    The join function merges the array into a string (with elements separated by semi-colons).

    Perl Code:
    my $reordered_line = join ";", @final_array;


    Finally, the line header is prepended to the resulting string:

    Perl Code:
    my $out_line = $start . "##" . $reordered_line;
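    To check that the step-by-step version and the single chained statement give the same result, here is a minimal self-contained sketch. The @fields data is copied from the dump above; the value of $start ("header") is only a placeholder I made up, since the real line header is not shown here.

```perl
use strict;
use warnings;

# Data copied from the dump of @fields shown above: [ID, priority] pairs.
my @fields = (
    ['MF01624.15', 13],
    ['MF01624.15', 900],
    ['MF05188.12', 133],
    ['MF05192.13', 273],
    ['MF05190.13', 430],
    ['MF00488.16', 569],
);
my $start = "header";    # placeholder: the real header comes from the input line

# Step-by-step version, with temporary variables.
my @sorted_array   = sort { $a->[1] <=> $b->[1] } @fields;
my @final_array    = map  { $_->[0] } @sorted_array;
my $reordered_line = join ";", @final_array;
my $step_by_step   = $start . "##" . $reordered_line;

# Chained version, as in the single instruction.
my $out_line = $start . "##" . join ";",
    map  { $_->[0] }
    sort { $a->[1] <=> $b->[1] } @fields;

print "$out_line\n";
print "Same result\n" if $out_line eq $step_by_step;
```

    Both variables should hold the same string; the temporary arrays only make each intermediate state visible.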


    The point with these list operators is that each one takes a list of items on its right side, does something with that list (sorting it, modifying the items, removing some of them, etc.) and returns the new list on its left side, where it can be fed to another such operator, so that you don't need temporary intermediate variables. In the end, this is very powerful, but not necessarily very clear.
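    As a small illustration of this chaining, unrelated to your data, here is a sketch with a made-up list of numbers. It uses sort, grep (which removes items) and map (which modifies items), each one feeding the operator written above it.

```perl
use strict;
use warnings;

my @numbers = (5, 3, 8, 1);    # made-up sample data

# Read from the bottom up: sort the list numerically,
# keep only the odd values, then double each of them.
my @result =
    map  { $_ * 2 }
    grep { $_ % 2 }
    sort { $a <=> $b } @numbers;

print join(";", @result), "\n";    # prints 2;6;10
```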

    Lines 3 to 9 of my previous code could actually be merged into a single instruction (rather than 2) in the same way, without the need for the @fields array:

    Perl Code:
    while (my $line = <$FILE_IN>) {	
    	my ($start, $end) = split /##/, $line;
    	my $out_line = $start . "##" . join ";", 
    		map {$_->[0]} 
    		sort {$a->[1] <=> $b->[1]}  
    		map { my ($id, @values) = split /;/, $_; 
    			@values = map { (split /-/, $_)[0]} @values;
    			map {[$id, $_]} @values;} 
    		split /@/, $end;
    }


    I did not do it this way in what I posted yesterday, because this is getting quite hairy to understand and difficult to debug if something is wrong. With two steps, I could check that my data structure (the @fields array) as shown above was the way I wanted it.

    If you want to read this type of instruction, you have to start from the bottom and work upwards, and from right to left, to understand how the data gets transformed step by step by each successive operator.
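    To make the bottom-up reading concrete, here is a sketch of the same merged statement unrolled into numbered steps with temporary variables. The input line is made up: I only guessed its layout from the split patterns (## between header and data, @ between ID groups, ; between fields, - inside a value), so adapt it to your real file.

```perl
use strict;
use warnings;

# Hypothetical input line, layout guessed from the split patterns only.
my $line = 'header##MF01624.15;13-50;900-950@MF05188.12;133-200@MF05192.13;273-300';

my ($start, $end) = split /##/, $line;

# Step 1 (bottom line of the chained statement): one string per ID group.
my @groups = split /@/, $end;

# Step 2 (the big inner map): turn each group into [ID, priority] pairs.
my @fields;
for my $group (@groups) {
    my ($id, @values) = split /;/, $group;
    @values = map { (split /-/, $_)[0] } @values;    # keep the part before the dash
    push @fields, map { [$id, $_] } @values;
}

# Steps 3 to 5: sort on the priority, extract the IDs, join them.
my @sorted   = sort { $a->[1] <=> $b->[1] } @fields;
my @ids      = map  { $_->[0] } @sorted;
my $out_line = $start . "##" . join ";", @ids;

print "$out_line\n";
```

    With this made-up line, @fields ends up with four pairs, and the ID that has two priorities (13 and 900) appears both first and last in the output.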

    I hope this helps. Please let me know if you need further information.
    Last edited by Laurent_R; April 16th, 2013 at 04:12 PM. Reason: Corrected a couple of typos and improved layout
  2. #17
  3. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2013
    Posts
    8
    Rep Power
    0

    To find tandem repeats


    Hi

    I have a file with some IDs. I need to find the tandem repeats.

    input file:

    AB;CD;AC;DE
    AD;CD;CA;AD;CD;CA;AD;CD;CA
    AF;BF;GF;AF;BF;GF;AF;BF;GF
    AF;CD;BF

    Output file:
    AD;CD;CA;AD;CD;CA;AD;CD;CA
    AF;BF;GF;AF;BF;GF;AF;BF;GF
  4. #18
  5. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    776
    Rep Power
    495
    Hi,

    assuming you are only dealing with neighbouring IDs, I think you should probably build a hash with each pair of IDs found in the file as a key. Each time, before inserting a pair into the hash, check whether it is already there.

    As you are not giving too many details, I'll just give you a very basic solution that you might want to complete:

    Code:
    my %seen;
    while (my $line = <$INPUT>) {
    	chomp $line;
    	my @ids = split /;/, $line;
    	foreach (0.. $#ids-1)  {
    		my $key = join ";", $ids[$_], $ids[$_+1];
    		if (exists $seen{$key}) {
    			print $key, "\n";
    		} else {
    			$seen{$key} = 1;
    		}
    	}
    }
  6. #19
  7. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2013
    Posts
    8
    Rep Power
    0
    Thank you. It looks somewhat complicated. If you could make it simpler, it would be helpful for me, since I am a beginner.
  8. #20
  9. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    776
    Rep Power
    495
    I can explain it, but it will be hard to make it much simpler.

    I'll try later to provide a slightly simplified code (not using implicit defaults) along with comments in the code.
  10. #21
  11. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    776
    Rep Power
    495
    Hi,

    the general idea is that I read the file line by line. Suppose that I am reading this line: "AB;CD;AC;DE". I first split the line on the semi-colon so as to get an array containing 4 elements: AB, CD, AC, DE. I then walk through the array to create the following pairs: "AB;CD", "CD;AC" and "AC;DE". I store each of these pairs into a hash, unless the pair already exists in the hash, in which case I have already seen this pair, meaning that I have found a "tandem repeat".



    Perl Code:
    my %seen;     # hash to store the pairs of IDs and find out if a pair has already been seen
    while (my $line = <$INPUT>) {      # reading the file line by line
    	chomp $line;      # removing trailing newline character from the line
    	my @ids = split /;/, $line;      # splitting the line into an array of elements
    	my $max_nr = $#ids-1;        # $#ids is the subscript of the last element of the array
    	foreach my $subscript (0.. $max_nr)  {      # iterating on every number between 0 and $max_nr
    		my $key = join ";", $ids[$subscript], $ids[$subscript + 1];      # concatenating the n th element of the array with the n+1 th element to create the hash key
    		if (exists $seen{$key}) {      # if this hash element exists, this is a pair that has already been seen
    			print $key, "\n";      # we print the pair that has already been seen
    		} else {
    			$seen{$key} = 1; # we create the hash element for future searches
    		}
    	}
    }


    I hope this is now clear for you.
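    If you want to try it without creating a file first, here is a self-contained sketch of the same loop. The four input lines are copied from your post and read through an in-memory filehandle; the @repeats array is something I added so that the results can be inspected after the loop instead of only being printed.

```perl
use strict;
use warnings;

# Sample input copied from the question, read through an in-memory filehandle.
my $data = <<'END';
AB;CD;AC;DE
AD;CD;CA;AD;CD;CA;AD;CD;CA
AF;BF;GF;AF;BF;GF;AF;BF;GF
AF;CD;BF
END
open my $INPUT, '<', \$data or die "Cannot open in-memory file: $!";

my %seen;       # pairs of IDs already met
my @repeats;    # pairs met more than once, in order of discovery (my addition)
while (my $line = <$INPUT>) {
    chomp $line;
    my @ids = split /;/, $line;
    foreach my $subscript (0 .. $#ids - 1) {
        my $key = join ";", $ids[$subscript], $ids[$subscript + 1];
        if (exists $seen{$key}) {
            push @repeats, $key;    # pair already seen: record it
        } else {
            $seen{$key} = 1;        # first time: remember it
        }
    }
}
close $INPUT;

print "$_\n" for @repeats;
```

    Note that this reports the repeated pairs, not the whole lines shown in your expected output, and that %seen is shared between lines, so a pair seen on one line also counts as a repeat on a later line. If you need whole lines, or a fresh hash for each line, that should be a small change.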