#1

    How to update values in a column using awk


    Hi guys,

    I really need help updating my file. This is quite urgent: I have thousands of records to process.

    file1.txt
    Code:
    ID   P_6
    START    235411
    END    18763
    //
    ID    P_133
    START    389417
    END    314124
    //
    ID    P_60
    START    3112
    END    3281
    //
    ID    P_10
    START    631012
    END    32814
    //
    file2.txt
    Code:
    ex    37193    37735    P_10    
    S     37193    37735    P_10	exNum 5
    ex    37862    38019    P_10    
    S     37862    38019    P_10	exNum 4
    ex    38076    38835    P_10    
    S     38076    38835    P_10	exNum 3
    ex    38880    39050    P_10    
    S     38880    39050    P_10	exNum 2
    ex    39093    39644    P_10    
    S     39093    39644    P_10	exNum 1
    ex    21204    22151    P_6    
    S     21204    22151    P_6	exNum 2
    ex    22217    22765    P_6    
    S     22217    22765    P_6	exNum 1
    ex    42305    42440    P_133    
    S     42305    42440    P_133	exNum 3
    ex    42496    42656    P_133    
    S     42496    42656    P_133	exNum 2
    ex    42657    42674    P_133    
    S     42657    42674    P_133	exNum 1
    I've been trying to update the data in file1.txt, without success. The script should work like this:
    if the IDs match, the START value ($2 in file1.txt) should be replaced by the value in $3 of
    file2.txt, and unmatched IDs should remain unchanged. The $3 value should be taken only from the line that has "exNum 1" (exNum in $5, 1 in $6). I tried
    something like this:

    Code:
    awk 'NR==FNR{if ($5~/exonNumber_1/) b[$2]=$3;f[$2]=$4;next}
    $1=="ID" {id=substr($2,index($2,"_")+1)}
    id in b {$2=($1=="START")?b[id]:$2}
    1' file2.txt file1.txt
    but it does not change anything: the script just prints file1.txt without any updates from file2.txt.

    The correct output should look like this:
    Code:
    ID    P_200
    START    12412
    END    12444
    //
    ID    P_6
    START   22765
    END    18763
    //
    ID    P_10
    START    39644
    END    32814
    //
    ID    P_60
    START    3112
    END    3281
    //
    ID    P_9
    START    5812
    END    6112
    //
    ID    P_133
    START    42674
    END    314124
    //
    where the START values for P_6, P_10 and P_133 are updated with the new
    values from file2.txt, while the START values for P_200, P_60
    and P_9 remain unchanged because none of them has a match in file2.txt.
    (The updated START values come from $3 of file2.txt and replace the previous values in file1.txt.)

    Any help is highly appreciated. Thanks!
#2
    It is very difficult to understand what you are asking here.
#3
    Try this:

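    Code:
    awk '
    BEGIN {
      # Read file2.txt first; keep $3 of each "S ... exNum 1" line, keyed by ID ($4)
      while ((getline < "file2.txt") > 0) {
        if ($1=="S" && $5=="exNum" && $6=="1") { new_ID[$4]=$3 }
      }
      close("file2.txt")
    }
    {
      # On each ID line, look up its replacement ("" if file2.txt has none)
      if ($1=="ID") { new = ($2 in new_ID) ? new_ID[$2] : "" }
      # Overwrite START only when the current ID had a match
      if ($1=="START" && new!="") { $2=new }
      # Print every line, changed or not
      print
    }
    ' file1.txt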
    It won't overwrite the file in place, so redirect the output to a new file, check it, and then move/rename it.
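The redirect, check and rename steps can be wrapped in a small shell function. This is a minimal sketch: safe_rewrite and the .new suffix are made-up names, and the "check" is only a printed diff, so with thousands of records you may prefer to inspect the .new file yourself before the mv.

```shell
# safe_rewrite FILE CMD [ARGS...]  (hypothetical helper name)
# Runs CMD ARGS... FILE, writes its output to FILE.new, shows what changed,
# and only then renames FILE.new over FILE. If CMD fails, FILE is untouched.
safe_rewrite() {
  target=$1
  shift
  # 1. Redirect the output to a new file instead of overwriting the input.
  if ! "$@" "$target" > "$target.new"; then
    rm -f "$target.new"
    return 1
  fi
  # 2. Check it: print the changed lines (diff exits 1 when files differ).
  diff "$target" "$target.new"
  # 3. Move/rename the checked copy over the original.
  mv "$target.new" "$target"
}
```

For example, with the awk program from the reply saved without its surrounding single quotes in a file named update.awk (also a made-up name), running safe_rewrite file1.txt awk -f update.awk writes file1.txt.new, prints a diff of the changed START lines, and then renames it to file1.txt.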
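For comparison, the attempt in #1 can also be repaired while keeping its two-file NR==FNR structure. As far as I can tell, it prints file1.txt unchanged because the array b never gets a usable entry: /exonNumber_1/ never matches (the lines contain "exNum 1", which awk splits into $5 = exNum and $6 = 1), b is keyed by $2 (the start coordinate) instead of $4 (the ID), and the substr() call turns P_6 into 6. So "id in b" is never true, and the trailing 1 just prints every line. The sketch below fixes those three points. update_start is a hypothetical function name, and the sketch assumes file2.txt is not empty (if it were, NR==FNR would also hold while reading file1.txt).

```shell
# update_start FILE2 FILE1  (hypothetical helper name)
# Prints FILE1 with each START value replaced by $3 of the "S ... exNum 1"
# line in FILE2 that carries the same ID in $4; other lines pass through.
update_start() {
  awk '
    # NR == FNR is only true while reading the first file (FILE2).
    # Assumes FILE2 is not empty; otherwise this would also match FILE1.
    NR == FNR {
      # "exNum 1" is two fields: $5 is "exNum" and $6 is "1".
      if ($1 == "S" && $5 == "exNum" && $6 == "1") newstart[$4] = $3
      next
    }
    # FILE1: remember the full ID (e.g. P_6), without stripping "P_".
    $1 == "ID" { id = $2 }
    # Replace START only for IDs that had an exNum 1 line in FILE2.
    $1 == "START" && (id in newstart) { $2 = newstart[id] }
    # Print every FILE1 line, changed or not.
    { print }
  ' "$1" "$2"
}
```

Usage: update_start file2.txt file1.txt > file1.txt.new. As with the reply's version, assigning to $2 rebuilds the changed line with single spaces, so "START    235411" comes out as "START 22765"; the other lines keep their original spacing.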
