#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2013
    Posts
    4
    Rep Power
    0

    Filtering and sum data in CSV files


    Good evenning,

    I'm new in use perl language and I'm wanting some suggestions on how to generate a script to perform the filtering and sum of some datas that I have in a csv table. I have a table with columns of values ​​measured for N individuals. In this table, the same individual may have more than one value for each measure.

    For exemple:

    individual measure1 measure2
    1 5 6
    1 6 7
    1 10 12
    2 11 13
    2 12 15
    3 15 16
    3 17 18
    3 19 25

    From the data, I would like to generate a file containing the sum of the values ​​of each individual, to a certain extent in a certain range. For example, the sum of the values ​​of measure2 between 15 and 20 for the three individuals. Which in this case would be: individual 1 = 0; individual2 = 15; individual3 = 34.
    *
    Which way do you suggest?
    *
    Thank you.
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Location
    /dev/null
    Posts
    162
    Rep Power
    18
    Two questions:
    1. What is the desired output? (Something like.. this is what my output should look like)
    2. What have you tried?
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2013
    Posts
    4
    Rep Power
    0
    The output that I desire is something like a csv table where one column has the identification of the individual and the other column has the sum of the measures in the determined interval. Using the same example that I had been said, I believe that will be something like this:

    Individual measure2
    1 0
    2 15
    3 34

    I could explain it simply?

    As for what I've tried, I'll put down my last attempt at script (remember that I am still beginner in Perl).

    open (MYFILE, "MYFILE.csv");
    $line= <MYFILE>;
    open (MYFILE2, ">MYFILE_trimmed.csv);
    While($line=<MYFILE>){
    chomp $line;
    $id= substr($line,0,6);
    $lenght= substr($line,27,7);
    push (@id,$id);
    push(@lenght,$lenght);

    After this moment I have been trying a lot of possibilities, but any had sucess. I try use the function split instead of the function substr. But, the problem is the same: I can't continue with the script.

    Thanks
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Location
    /dev/null
    Posts
    162
    Rep Power
    18
    Code:
    perl -lane 'if($. == 1){print "individual measure"; next}; if($F[2] >= 15 && $F[2] <= 20){$x{$F[0]}+=$F[2]}else{$x{$F[0]}+=0} END {for (sort {$a <=> $b} keys %x) {print "$_ $x{$_}"}}' input.csv
  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2013
    Posts
    4
    Rep Power
    0
    Thank you for the code. Like I said, I'm very new in the use of perl language, so, my main interest is understand like the code works. If you could me explain to me some parts, like the use of the functions new and next I will be very glad.

    Sincerily,
  10. #6
  11. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Location
    /dev/null
    Posts
    162
    Rep Power
    18
    That's a one liner. Here's the equivalent program. This prints to STDOUT. Hope you know how to print to a file.

    perl Code:
    #! /usr/bin/perl -w
    use strict;
     
    my $line = undef;
    my @fields = ();
    my %fld_msre = ();
     
    open I, "< file";
    while ($line = <I>) {
     
        if ($. == 1) { # if line number is 1, then print header and skip to next iteration
            print "indivudual measure\n";
            next;
        }
     
        chomp ($line); # chomp $line to remove the newline char at end of line
        @fields = split(' ', $line); # split line with a space as field separator
        if ($fields[2] >= 15 && $fields[2] <= 20) { # check condition
            $fld_msre{$fields[0]} += $fields[2]; # use a hash and add measure2 to existing value
        }
        else {
            $fld_msre{$fields[0]} += 0; # add 0 if condition not met
        }
    }
    close I;
     
    # for loop to iterate through keys (numerically sorted)
    for (sort {$a <=> $b} keys %fld_msre) {
        print "$_ $fld_msre{$_}\n";
    }
  12. #7
  13. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2013
    Posts
    4
    Rep Power
    0
    Thank you so much for this. I know how print to a file. I will test the code and I tell you if the answer.
  14. #8
  15. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    776
    Rep Power
    495
    Hi,

    Originally Posted by pablo_bio
    open (MYFILE, "MYFILE.csv");
    $line= <MYFILE>;
    open (MYFILE2, ">MYFILE_trimmed.csv);
    While($line=<MYFILE>){
    chomp $line;
    $id= substr($line,0,6);
    $lenght= substr($line,27,7);
    push (@id,$id);
    push(@lenght,$lenght);
    OK, noobie1000 has given you a solution, which I did not try, but since she or he asked for your code, I think it would be fair to give some comments on this code. I am not giving to give you a solution, but, if you are serious about learning Perl, this might be more useful for you.

    Your code has obviously no chance to compile, so I will concentrate on syntax errors plus a few others.

    Code:
    open (MYFILE, "MYFILE.csv");
    You should always test the status of a system operation: the file might not exist, not be where you think, etc., so you need to raise an error if the file could not be opened.

    Try this instead:
    Code:
    open (MYFILE, "MYFILE.csv") or die "could not open MYFILE.csv $!";
    Or, even much better:

    Code:
    open my $MYFILE, "<", "MYFILE.csv" or die "could not open MYFILE.csv $!";

    Next line:
    Code:
    $line= <MYFILE>;
    You are not doing anything with the content of this line. Probably an error.

    Code:
    open (MYFILE2, ">MYFILE_trimmed.csv);
    Same comment as above

    Code:
    While($line=<MYFILE>){
    While, with an uppercase W, is not an existing Perl keyword.

    It would be better to have:

    Code:
    while (my $line=<MYFILE>){

    Code:
    $lenght= substr($line,27,7);
    ...
    push(@lenght,$lenght);
    You have a typo on $length vs $lenght. And @lenght is not declared, nor used anywhere. You MUST use the use strict; and
    Code:
    use warning;
    pragmas at the top of your program, many of your mistakes would be revealed by the compiler. This will require you to declare your variables with "my", you might initially find it painful, but I am sure you will find that enabling the compiler telling you about some of your mistakes is very useful.

    There are a number of other things to improve, but you should implement those that I have given, try to do it and get rid of remaining errors. And post your new code, together with possible errors you get. If you do it this way, you will learn much more than if you use a ready made solution.

IMN logo majestic logo threadwatch logo seochat tools logo