#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2013
    Posts
    1
    Rep Power
    0

    Frequency Report


    I have a pipe-delimited file with 3 columns

    example:

    A B C
    A B B
    B B A
    B B C
    C A C

    I want to generate a frequency report showing how often each value appears in each column, something like this:

    A 2 1 1
    B 2 4 1
    C 1 0 3

    I am new to Perl, so I am not sure how to accomplish this

    Thanks - Daniel
  2. #2
  3. !~ /m$/
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    May 2004
    Location
    Reno, NV
    Posts
    4,259
    Rep Power
    1810
    Code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    
    use Data::Dumper;
    
    my %hash;
    while (<DATA>) {
    	# remove newlines and split the columns on tabs
    	# (for a pipe-delimited file, split on /\|/ instead)
    	chomp;
    	my @row = split /\t/;
    	
    	# get the number of columns
    	my $n = @row;
    	
    	# for each column in the array
    	for (my $i=0; $i < $n; $i++) {
    		# store the element as the key, and increment the count for that column number
    		# each key in the hash will have its own array of column counts
    		$hash{$row[$i]}[$i]++;
    	}
    }
    
    # uncomment the next line to see the built structure
    #print Dumper \%hash;
    
    # find the widest count array, so every row prints the same number of columns
    my $max_cols = 0;
    foreach my $counts (values %hash) {
    	$max_cols = @$counts if @$counts > $max_cols;
    }
    
    # sort the hash to prepare for printing
    foreach my $key (sort keys %hash) {
    	# get a reference to the stored array to make access a bit easier
    	my $ref = $hash{$key};
    	# build a string of the results by walking every column index
    	# and setting any undefined entries to zero, because we didn't
    	# start with all array positions initialized
    	my $result_string = join ' ', map { $ref->[$_] // 0 } 0 .. $max_cols - 1;
    	print "$key $result_string\n";
    }
    
    __DATA__
    A	B	C
    A	B	B
    B	B	A
    B	B	C
    C	A	C
    Last edited by keath; November 27th, 2013 at 01:34 AM.
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2013
    Posts
    30
    Rep Power
    1
    Originally Posted by dgeehot
    I am new to Perl, so I am not sure how to accomplish this

    Thanks - Daniel
    Hi dgeehot,
    You should at least have shown how far you got with your own code.
    However, here is one way to do it using a hash of hashes.
    That is, you split each line on whitespace, then put each letter into a hash for its column. The letter is the key, and the value is the number of times that letter occurs in that column.
    Like this:

    Code:
    use strict;
    use warnings;
    use Data::Dumper;
    
    my %hash;
    
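    while (<DATA>) {
        # column index, reset for every line
        my $counter = 0;
    
        # split ' ' splits on runs of whitespace and skips leading whitespace;
        # each field increments the count for its (column, letter) pair
        $hash{ $counter++ }{$_}++ for split ' ', $_;
    }
    {
        # sort the keys so the dump is printed in a stable order
        local $Data::Dumper::Sortkeys = 1;
        print Dumper \%hash;
    }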
    
    __DATA__
    A       B	C
    A	 B	B
    B	 B	A
    B	 B	C
    C	 A	C
    Which produces this:
    Code:
    $VAR1 = {
              '0' => {
                       'A' => 2,
                       'B' => 2,
                       'C' => 1
                     },
              '1' => {
                       'A' => 1,
                       'B' => 4
                     },
              '2' => {
                       'A' => 1,
                       'B' => 1,
                       'C' => 3
                     }
            };
    This uses the Data::Dumper module to show the result.
    You can also check perldsc (the Perl data structures cookbook).
    From here, producing your desired printout is the easy part.
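
    For what it's worth, here is a minimal sketch of that last step: turning a column-keyed hash of hashes like the one above into the report layout from the first post. The sub names count_columns and format_report and the hard-coded sample rows are made up for this illustration, and it assumes whitespace-separated fields.

    ```perl
    use strict;
    use warnings;

    # Build { column index => { letter => count } } from a list of lines.
    # (count_columns and format_report are illustrative names, not from the thread.)
    sub count_columns {
        my @lines = @_;
        my %hash;
        for my $line (@lines) {
            my $col = 0;
            $hash{ $col++ }{$_}++ for split ' ', $line;
        }
        return \%hash;
    }

    # Turn the structure into report lines: one line per letter, one count per column.
    sub format_report {
        my ($hash) = @_;
        my @cols = sort { $a <=> $b } keys %$hash;

        # Collect every letter seen in any column.
        my %seen;
        for my $col (@cols) {
            $seen{$_} = 1 for keys %{ $hash->{$col} };
        }

        # A letter missing from a column counts as zero.
        return map {
            my $letter = $_;
            join ' ', $letter, map { $hash->{$_}{$letter} // 0 } @cols;
        } sort keys %seen;
    }

    my @sample = ( 'A B C', 'A B B', 'B B A', 'B B C', 'C A C' );
    print "$_\n" for format_report( count_columns(@sample) );
    ```

    With the sample rows this should print the same three lines as the report in the first post (A 2 1 1, B 2 4 1, C 1 0 3). The `//` operator needs Perl 5.10 or later.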
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    832
    Rep Power
    496
    You could start by transposing your data (swapping rows and columns), which can be done this way.

    Data:

    Code:
    $ cat test_data.txt
    A B C
    A B B
    B B A
    B B C
    C A C
    One-liner script to transpose your data:
    Perl Code:
    perl -ne '$i = 0; $c[$i++] .= $_ for split; END {print "$_\n" for @c}' test_data.txt


    Which prints:
    Code:
    AABBC
    BBBBA
    CBACC
    It is now straightforward to replace the print part of the one-liner above with something that counts each letter in each element of the array and outputs the numbers (basically replacing what is in the END block). I leave this last part to you. Don't hesitate to ask for more help if you don't succeed, but please show us what you've tried.
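
    In case it helps later readers, here is one possible sketch of that last part, written as a small script rather than a one-liner. It assumes single-character, whitespace-separated values as in the sample data; the sub names transpose_lines and count_letter are made up for this example.

    ```perl
    use strict;
    use warnings;

    # Transpose: element $i of the result is the concatenation of column $i.
    # (transpose_lines and count_letter are illustrative names, not from the thread.)
    sub transpose_lines {
        my @cols;
        for my $line (@_) {
            my $i = 0;
            $cols[ $i++ ] .= $_ for split ' ', $line;
        }
        return @cols;
    }

    # Count how often $letter occurs in $string.
    sub count_letter {
        my ( $string, $letter ) = @_;
        my $count = () = $string =~ /\Q$letter\E/g;
        return $count;
    }

    my @cols = transpose_lines( 'A B C', 'A B B', 'B B A', 'B B C', 'C A C' );

    # Every distinct letter across all columns.
    my %seen;
    $seen{$_} = 1 for map { split //, $_ } @cols;

    # One output line per letter, one count per transposed column.
    for my $letter ( sort keys %seen ) {
        print join( ' ', $letter, map { count_letter( $_, $letter ) } @cols ), "\n";
    }
    ```

    Counting inside each transposed string works here only because every value is a single character; for longer values, counting per field while reading the file is the safer route.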
