#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2014
    Posts
    1
    Rep Power
    0

    Comparing two files using awk


    Dear All,

    I have two big files.

    File 1 looks like the following:
    10 2864001 2864012
    10 5942987 5943316

    File 2 looks like the following:
    10 2864000 28
    10 2864001 28
    10 2864002 28
    10 2864003 27
    10 2864004 28
    10 2864005 26
    10 2864006 26
    10 2864007 26
    10 2864008 26
    10 2864009 26
    10 2864010 26
    10 2864011 26
    10 2864012 26

    For each line of File 1, I want to loop over File 2 so that:
    (1) the first column of File 1 matches the first column of File 2, AND
    (2) the loop starts where the second column of File 2 equals the second column of File 1, AND
    (3) the third column of File 2 is summed until the second column of File 2 reaches the third column of File 1.

    So for the first line of File 1 in the example above, the output should be the sum of the third column of File 2, which is 347. I tried to use NR and FNR, but I have not been able to get it working so far. Could you please help me write an awk script?

    Thank you so much
  2. #2
  3. Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Sep 2006
    Posts
    834
    Rep Power
    387



    Originally Posted by hsmart
    Dear All,

    I have two big files.

    File 1 . . .
    . . . E t c . . .
    For each line of File 1, I want to loop over File 2 so that:
    (1) the first column of File 1 matches the first column of File 2, AND
    (2) the loop starts where the second column of File 2 equals the second column of File 1, AND
    (3) the third column of File 2 is summed until the second column of File 2 reaches the third column of File 1.

    So for the first line of File 1 in the example above, the output should be the sum of the third column of File 2, which is 347. I tried to use NR and FNR, but I have not been able to get it working so far. Could you please help me write an awk script?
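    Actually the sum is 319: the rows from 2864001 through 2864012 add up to 319, and you only get 347 if the 2864000 row (28) is counted as well. You can try this: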
    Code:
    ==> cat m3
    cat - <<! >File1
    10 2864001 2864012
    10 5942987 5943316
    !
    cat - <<! >File2
    10 2864000 28
    10 2864001 28
    10 2864002 28
    10 2864003 27
    10 2864004 28
    10 2864005 26
    10 2864006 26
    10 2864007 26
    10 2864008 26
    10 2864009 26
    10 2864010 26
    10 2864011 26
    10 2864012 26
    !
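    awk 'BEGIN{
        # Load every File1 interval: chr[i] = column 1, f1[i] = start, f2[i] = end
        while ((getline tmp < "File1") > 0){
            line[++i]=tmp; split(tmp,f); chr[i]=f[1]; f1[i]=f[2]; f2[i]=f[3]}
        close("File1")}
    # For each File2 row, add column 3 to every interval with the same column 1 that contains column 2
    {for(i in line){if($1==chr[i] && $2>=f1[i] && $2<=f2[i]) s[i]=s[i]+$3;}}
    END{
        # for-in order is unspecified, so the intervals may print out of order
        for(i in line){print i, line[i],s[i]}
    }' File2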
    
    ==> ./m3
    2 10 5942987 5943316
    1 10 2864001 2864012 319
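
Since the question mentions NR and FNR, here is a rough sketch of the same idea using the two-file `NR == FNR` idiom instead of `getline`. It is only one possible variant, not the script posted above: the function name `sum_intervals`, the temporary directory, and the choice to print 0 for an interval with no matching rows are all assumptions made for illustration. It keeps the File 1 lines in input order and also checks that column 1 matches.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): sum_intervals INTERVALS DEPTHS
# INTERVALS has "key start end" per line, DEPTHS has "key position value".
sum_intervals() {
    awk '
    # First file: remember each interval in input order
    NR == FNR { n++; chr[n] = $1; lo[n] = $2 + 0; hi[n] = $3 + 0; next }
    # Second file: add column 3 to every interval that contains column 2
    {
        for (i = 1; i <= n; i++)
            if ($1 == chr[i] && $2 + 0 >= lo[i] && $2 + 0 <= hi[i])
                sum[i] += $3
    }
    # Print the intervals in File 1 order; "+ 0" turns an empty sum into 0
    END {
        for (i = 1; i <= n; i++)
            print chr[i], lo[i], hi[i], sum[i] + 0
    }' "$1" "$2"
}

# Sample data copied from this thread, written to a scratch directory
dir=$(mktemp -d)
printf '%s\n' "10 2864001 2864012" "10 5942987 5943316" > "$dir/File1"
printf '%s\n' \
    "10 2864000 28" "10 2864001 28" "10 2864002 28" "10 2864003 27" \
    "10 2864004 28" "10 2864005 26" "10 2864006 26" "10 2864007 26" \
    "10 2864008 26" "10 2864009 26" "10 2864010 26" "10 2864011 26" \
    "10 2864012 26" > "$dir/File2"

sum_intervals "$dir/File1" "$dir/File2"
rm -rf "$dir"
```

Like the script above, this compares every File 2 row against every File 1 interval, so with two big files it may be slow. One caveat of the idiom: if File 1 is empty, `NR == FNR` is also true for File 2, so nothing is summed.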
