#1
  1. Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2013
    Posts
    4
    Rep Power
    0

    Compare two files, print header field if


    Hi, I have two files:

    File 1 (with header)
    gene h1 h2 h3 h4 h5 h6 h7 h8...
    gene_name1 e1 e2 e3 e4 e5 e6 e7 e8
    gene_name2 ...
    gene_name3 ...
    ...

    File 2 (no header)

    gene_name1 mean1 mean2 sd1 sd2
    gene_name2 ...
    gene_name3 ...

    I would like to print the header field for every value e1, e2, e3, ... that is greater than mean1 + 3*sd1 for the corresponding gene_name. For example, if e1 > mean1 + 3*sd1 is true, then print:

    gene_name1 h1

    If e2 also satisfies the condition, then append h2:

    gene_name1 h1 h2

    Do this for every line whose $1 appears in both files.

    Desired output:

    gene_name1 h1 h2
    gene_name2
    gene_name3 h5 h6 h8
    gene_name4 h1 h5
    gene_name5 h3
    gene_name6
    gene_name7 h2 h5 h7 h8
    ...

    I was thinking of something like:

    awk 'FNR==NR{a[$1]=$2+3*$4;next} $1 in a ... followed by a for loop over each field in File 1, but I do not know how to store the header fields.

    Thanks for your help.
    aec
  2. #2
  3. Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Sep 2006
    Posts
    840
    Rep Power
    387

    Cool


    Post some sample data and expected result.
  4. #3
  5. Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Sep 2006
    Posts
    840
    Rep Power
    387
    I tested with some data, check it out:
    Code:
    cat - <<! >File1
    gene      h1 h2 h3 h4 h5 h6 h7 h8
    gene_name1 1  2  3  4  5  6  7 18
    gene_name2 9 10  3  4 11  5  13 6
    gene_name3 1  5 22 44  3 17  0  6
    !
    
    cat - <<! >File2
    #gene_name mean1 mean2 sd1 sd2
    gene_name1     1 0 1 0
    gene_name2     2 0 2 0
    gene_name3     3 0 1 0 
    !
    
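    awk 'BEGIN {
           # Load File2 into f2, keyed by gene name (field 1)
           while ((getline <"File2") > 0) {f2[$1]=$0}
         }
    # Header row of File1: hdr[i] is the label of column i
    NR==1{split($0,hdr," "); next}
    # Only process genes that appear in both files
    ($1 in f2) {o=" ";
         split(f2[$1],m0," ")   # m0[2]=mean1, m0[4]=sd1
         for (i=2;i<=NF;i++){
           # Collect the header of every value above mean1 + 3*sd1
           if($i>(m0[2]+3*m0[4])){o=o" "hdr[i]}
        }
      print $1,o
    }' File1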
    Results:
    Code:
    ==> ./m0
    gene_name1   h5 h6 h7 h8
    gene_name2   h1 h2 h5 h7
    gene_name3   h3 h4 h6
    ==>
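For comparison, here is a rough sketch of the same check written with the FNR==NR two-file idiom from the original question. It is only one possible way to do it, not the method posted above. The array names limit and hdr, the variable result, and the temporary directory are made up for this illustration. It assumes File2 is not empty (otherwise FNR==NR would also match File1's lines) and that File2 has no header line.

```shell
# Recreate the sample data from this thread in a scratch directory
# (the directory and the variable names are assumptions for this sketch).
tmp=$(mktemp -d)

cat > "$tmp/File1" <<'EOF'
gene       h1 h2 h3 h4 h5 h6 h7 h8
gene_name1 1  2  3  4  5  6  7 18
gene_name2 9 10  3  4 11  5 13  6
gene_name3 1  5 22 44  3 17  0  6
EOF

cat > "$tmp/File2" <<'EOF'
gene_name1 1 0 1 0
gene_name2 2 0 2 0
gene_name3 3 0 1 0
EOF

# File2 is listed first, so FNR==NR is true only while reading File2.
result=$(awk '
# File2: store the threshold mean1 + 3*sd1 for each gene
FNR == NR { limit[$1] = $2 + 3 * $4; next }
# First line of File1: remember the header label of each column
FNR == 1  { for (i = 2; i <= NF; i++) hdr[i] = $i; next }
# Remaining lines of File1: only genes that also appear in File2
$1 in limit {
    out = $1
    for (i = 2; i <= NF; i++)
        if ($i + 0 > limit[$1]) out = out " " hdr[i]
    print out
}' "$tmp/File2" "$tmp/File1")

printf '%s\n' "$result"
rm -r "$tmp"
```

With the sample data the thresholds come out as 4, 8 and 6, so this prints the same header fields as the results above, separated by single spaces. A gene with no value above its threshold is printed as a bare name, and a gene missing from File2 is skipped.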
    Last edited by LKBrwn_DBA; April 19th, 2013 at 08:49 AM.
  6. #4
  7. Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2013
    Posts
    4
    Rep Power
    0
    thanks!
