Hi guys,
I really need help updating my file. This is quite urgent: I have thousands of records that I need to work on.
file1.txt
Code:
ID P_6
START 235411
END 18763
//
ID P_133
START 389417
END 314124
//
ID P_60
START 3112
END 3281
//
ID P_10
START 631012
END 32814
//
file2.txt
Code:
ex 37193 37735 P_10
S 37193 37735 P_10 exNum 5
ex 37862 38019 P_10
S 37862 38019 P_10 exNum 4
ex 38076 38835 P_10
S 38076 38835 P_10 exNum 3
ex 38880 39050 P_10
S 38880 39050 P_10 exNum 2
ex 39093 39644 P_10
S 39093 39644 P_10 exNum 1
ex 21204 22151 P_6
S 21204 22151 P_6 exNum 2
ex 22217 22765 P_6
S 22217 22765 P_6 exNum 1
ex 42305 42440 P_133
S 42305 42440 P_133 exNum 3
ex 42496 42656 P_133
S 42496 42656 P_133 exNum 2
ex 42657 42674 P_133
S 42657 42674 P_133 exNum 1
I've been trying to update my data in file1.txt but failed. The script should work like this:
If the IDs match, the START value ($2 in file1.txt) should be replaced by the value in $3 of file2.txt; records whose ID has no match should remain unchanged. The value in $3 should be taken only from lines that have "exNum_1" in $5. I tried
code something like this:
Code:
awk 'NR==FNR{if ($5~/exonNumber_1/) b[$2]=$3;f[$2]=$4;next}
$1=="ID" {id=substr($2,index($2,"_")+1)}
id in b {$2=($1=="START")?b[id]:$2}
1' file2.txt file1.txt
but it does not change anything. I guess the script is not working, because it prints out file1.txt without any update from file2.txt.
The correct output should be like this:
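One way to check why nothing is updated is to print the fields the attempt relies on. This is only a small diagnostic sketch; it assumes the lines are whitespace-separated exactly as pasted above, and the variable names `fields` and `lookup_key` are made up for illustration.

```shell
# Show how awk splits one "exNum 1" line from file2.txt.
# With whitespace splitting the ID is in $4, "exNum" is in $5 and the number is in $6,
# so a test for /exonNumber_1/ on $5 would never match and b[] would stay empty.
fields=$(echo 'S 39093 39644 P_10 exNum 1' |
    awk '{ print NF ": $2=" $2 " $4=" $4 " $5=" $5 " $6=" $6 }')
echo "$fields"

# Show the key that the attempt computes for a file1.txt ID line.
# It strips everything up to the underscore, so it would look up "10" rather than "P_10".
lookup_key=$(echo 'ID P_10' | awk '{ print substr($2, index($2, "_") + 1) }')
echo "$lookup_key"
```

If the real files use a different separator (tabs, or an underscore between exNum and the number), the field positions would differ from what this prints.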
Code:
ID P_200
START 12412
END 12444
//
ID P_6
START 22765
END 18763
//
ID P_10
START 39644
END 32814
//
ID P_60
START 3112
END 3281
//
ID P_9
START 5812
END 6112
//
ID P_133
START 42674
END 314124
//
where the START values for P_6, P_10 and P_133 should be updated with the new
values from file2.txt, while the START values for IDs P_200, P_60
and P_9 should remain unchanged, as there is no match in file2.txt for any
of them. (The new START values are taken from $3 of file2.txt and replace the previous values in file1.txt.)
Any help on this is highly appreciated. Thanks.
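A minimal sketch of the update described above, not a definitive solution. It assumes the files are whitespace-separated as pasted, so that in file2.txt the ID is in $4, the literal word exNum is in $5 and the exon number is in $6. The array name `new_start`, the variable `result` and the temporary working directory are made up for illustration, and the sample data is a subset of the lines shown in the post.

```shell
# Work in a scratch directory so the sample files do not overwrite real data.
cd "$(mktemp -d)" || exit 1

cat > file1.txt <<'EOF'
ID P_6
START 235411
END 18763
//
ID P_60
START 3112
END 3281
//
ID P_10
START 631012
END 32814
//
EOF

cat > file2.txt <<'EOF'
ex 39093 39644 P_10
S 39093 39644 P_10 exNum 1
ex 21204 22151 P_6
S 21204 22151 P_6 exNum 2
ex 22217 22765 P_6
S 22217 22765 P_6 exNum 1
EOF

result=$(awk '
    # First file (file2.txt): keep $3 from lines marked exNum 1, keyed by the full ID in $4.
    NR == FNR { if ($5 == "exNum" && $6 == 1) new_start[$4] = $3; next }
    # Second file (file1.txt): remember the ID of the current record.
    $1 == "ID" { id = $2 }
    # Replace START only when the current ID was seen in file2.txt.
    $1 == "START" && (id in new_start) { $2 = new_start[id] }
    # Print every line, changed or not.
    { print }
' file2.txt file1.txt)

printf '%s\n' "$result"
```

With this sample, the START lines for P_6 and P_10 become 22765 and 39644, while P_60 keeps 3112 because it has no entry in file2.txt. The full ID (for example P_10) is used as the array key in both files, so nothing has to be stripped from it.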