July 10th, 2013, 04:35 PM

Statistics in Perl (subroutines)
So I have three sets of data:
Data1: (5, 6, 7, 8, 10)
Data2: (10,11,12)
Data3: (16,48,49)
I am trying to write a program that calculates and prints, for each of the data sets, the number of measurements, the average, the variance and the standard deviation (subroutines should be used for all of these calculations). The program should call the subroutines for each data set and print out the results.
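For reference, here is what the four quantities work out to for Data1, using the population variance (dividing by n), which is what the standard_deviation code below computes. If the assignment asks for the sample variance instead, divide by n - 1, which gives 14.8 / 4 = 3.7 for Data1.

$$
n = 5, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{5 + 6 + 7 + 8 + 10}{5} = \frac{36}{5} = 7.2
$$

$$
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{(-2.2)^2 + (-1.2)^2 + (-0.2)^2 + 0.8^2 + 2.8^2}{5} = \frac{14.8}{5} = 2.96
$$

$$
\sigma = \sqrt{2.96} \approx 1.7205
$$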
So far, what I have is bits and pieces, and I am stuck: I can't link them together so that they make sense.
```perl
use strict;
use warnings;

my @data = (5, 6, 7, 8, 10);
my $avg = average(\@data);
print "The average is $avg \n";

# Returns the arithmetic mean of the array passed in by reference
sub average {
    @_ == 1 or die ('Sub usage: $average = average(\@array);');
    my ($array_ref) = @_;
    my $sum = 0;
    my $count = scalar @$array_ref;
    foreach (@$array_ref) { $sum += $_; }
    return $sum / $count;
}

print standard_deviation(5, 6, 7, 8, 10) . "\n";

# Returns the (population) standard deviation of a list of numbers
sub standard_deviation {
    my (@data) = @_;
    # Prevent division by 0 error in case you get junk data
    return undef unless (scalar(@data));
    # Step 1, find the mean of the numbers
    my $total1 = 0;
    foreach my $num (@data) {
        $total1 += $num;
    }
    my $mean1 = $total1 / (scalar @data);
    # Step 2, find the mean of the squares of the differences
    # between each number and the mean (this is the variance)
    my $total2 = 0;
    foreach my $num (@data) {
        $total2 += ($mean1 - $num)**2;
    }
    my $mean2 = $total2 / (scalar @data);
    # Step 3, standard deviation is the square root of the
    # above mean
    my $std_dev = sqrt($mean2);
    return $std_dev;
}
```
July 10th, 2013, 05:42 PM

There are a number of statistics modules on CPAN that will do all the calculations for you and that have been thoroughly tested. For the rest of this post, I will assume that you don't want to use them because doing it yourself is part of your training.
Hmm, if you are a beginner, do it as a beginner, i.e. step by step. Just temporarily remove everything about standard deviation, variance and what have you, and test the simple average part on its own. Does it work or not? If not, try to fix it. If you don't succeed, ask us for help. (My first impression, after looking very quickly at the code, is that it should probably work, but only serious tests can really confirm that.) Once it works, and only then, move on to the next step and implement your standard deviation procedure. If you try to pack in everything at once as a beginner, you will simply get too many errors and will not know where to start or where to look.
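As a sketch of that first step, the program below tests only the average subroutine from the question against means worked out by hand for the three data sets (36/5, 11 and 113/3). It uses Test::More, which ships with Perl; the test names, the 1e-9 tolerance and the extra check of the usage message are illustrative choices, not part of the original code.

```perl
use strict;
use warnings;
use Test::More;

# The average() subroutine from the question, unchanged apart from
# initialising $sum: it takes exactly one array reference.
sub average {
    @_ == 1 or die ('Sub usage: $average = average(\@array);');
    my ($array_ref) = @_;
    my $sum = 0;
    my $count = scalar @$array_ref;
    foreach (@$array_ref) { $sum += $_; }
    return $sum / $count;
}

# Each case: a data set and its mean worked out by hand.
my @cases = (
    [ [5, 6, 7, 8, 10], 36 / 5  ],
    [ [10, 11, 12],     11      ],
    [ [16, 48, 49],     113 / 3 ],
);

for my $case (@cases) {
    my ($data, $expected) = @$case;
    # Compare with a small tolerance instead of exact floating-point equality.
    ok(abs(average($data) - $expected) < 1e-9, "average of (@$data)");
}

# The usage check should reject anything other than a single argument.
eval { average(1, 2) };
like($@, qr/Sub usage/, 'average() dies when called with a list instead of a reference');

done_testing();
```

Only once every line reports "ok" is it worth moving on to the standard deviation.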
One additional point: you have an average procedure. Let's assume that it works; even if it does not yet, we said earlier that step 1 was to get it working. When you want to calculate the standard deviation, one of the steps is to calculate the deviation from the mean (or average), so why do you try to recalculate this mean? This is useless work. You already have your average subroutine, which does just that. Use it. So your step 1 should simply be to call average on the data for which you want the standard deviation. DON'T RECODE something you have already coded (and tested).
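A minimal sketch of what that reuse could look like, assuming the population variance (dividing by n, as the question's code does) and array-reference arguments throughout. The variance and standard_deviation subroutines here are illustrative rewrites, not the original poster's code: variance calls average twice, once for the mean and once for the mean of the squared deviations, and a final loop calls all the subroutines for each of the three data sets.

```perl
use strict;
use warnings;

# average() as in the question: takes a single array reference.
sub average {
    @_ == 1 or die ('Sub usage: $average = average(\@array);');
    my ($array_ref) = @_;
    my $sum = 0;
    $sum += $_ for @$array_ref;
    return $sum / scalar @$array_ref;
}

# Population variance: the mean of the squared deviations from the mean.
# Step 1 reuses average() instead of recomputing the mean by hand.
sub variance {
    my ($array_ref) = @_;
    return undef unless @$array_ref;    # avoid division by zero on empty data
    my $mean = average($array_ref);
    my @squared_deviations = map { ($_ - $mean) ** 2 } @$array_ref;
    return average(\@squared_deviations);
}

# Standard deviation: the square root of the variance.
sub standard_deviation {
    my ($array_ref) = @_;
    my $var = variance($array_ref);
    return defined $var ? sqrt($var) : undef;
}

my %data_sets = (
    Data1 => [5, 6, 7, 8, 10],
    Data2 => [10, 11, 12],
    Data3 => [16, 48, 49],
);

# Call every subroutine for each data set and print the results.
for my $name (sort keys %data_sets) {
    my $set = $data_sets{$name};
    printf "%s: n = %d, average = %.4f, variance = %.4f, std dev = %.4f\n",
        $name, scalar @$set, average($set), variance($set), standard_deviation($set);
}
```

For Data1 this should report n = 5, an average of 7.2000 and a variance of 2.9600. Passing array references everywhere keeps the same calling convention as the existing average subroutine, so the three subroutines can call one another freely. Note that variance returns undef rather than a bare return, so it still occupies one slot in the printf argument list.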
Last edited by Laurent_R; July 10th, 2013 at 05:46 PM.