The Shed is going Social! Join us on FaceBook and Twitter and chime in on the conversation.
|
 |
|
Dev Shed Forums
> Operating Systems
> Linux Help
|
AWK: rm duplicates based on muli fields
Discuss AWK: rm duplicates based on muli fields in the Linux Help forum on Dev Shed. AWK: rm duplicates based on muli fields Linux Help forum discussing topics including usage, troubleshooting, modules, and distributions. Linux is an open source OS, based on UNIX.
|
|
 |
|
|
|
|

Dev Shed Forums Sponsor:
|
|
|

February 7th, 2011, 03:22 PM
|
 |
strongbad dance now
|
|
Join Date: Oct 2004
Location: Bermuda
|
|
|
AWK: rm duplicates based on muli fields
Ok, long winded title
I'm trying to use awk to remove rows that are duplicates based on 3 fields, and I want to keep the on that has the higher value in another field. I'm working in C-Shell. For example the below is greped out of a larger data set to use in here as example:
Input (Field separator is a comma:
Code:
4180,-6999,MA,BARNSTABLE,BOURNE,1,1.7,1700,PM,1/26
4180,-6999,MA,BARNSTABLE,BOURNE,1,3.5,2025,PM,1/26
4180,-6999,MA,BARNSTABLE,BOURNE,1,1.0,1511,PM,1/26
4180,-6999,MA,BARNSTABLE,BOURNE,1,5.7,0540,AM,1/27
I want to identify duplicates based on fields 1, 2, and 10, and the one with the highest value in field 7. I want the output to be:
Output
Code:
4180,-6999,MA,BARNSTABLE,BOURNE,1,3.5,2025,PM,1/26
4180,-6999,MA,BARNSTABLE,BOURNE,1,5.7,0540,AM,1/27
I can sort the data, it won't cause issues. So I tried
Code:
sort -r -t, master.txt | awk -F, '{ keys=$1$2$10 ; if ( data[keys]++ == 0 ) lines[++count] = $0 } END {for ( i = 1 ; i <= count ; i++ ) print lines[i] }'
When I try the above command piping "grep BOURNE" onto the end of it, it can't find any BOURNE. Howeverm, if instead of reverse sorting, I just use:
Code:
sort -t, master.txt | awk -F, '{ keys=$1$2$10 ; if ( data[keys]++ == 0 ) lines[++count] = $0 } END {for ( i = 1 ; i <= count ; i++ ) print lines[i] }'
and grep BOURNE on the above, output, I get the lowest values of field seven to return:
Code:
4180,-6999,MA,BARNSTABLE,BOURNE,1,1.0,1511,PM,1/26
4180,-6999,MA,BARNSTABLE,BOURNE,1,5.7,0540,AM,1/27
I'd really appreciate any help!
|

February 8th, 2011, 07:56 AM
|
 |
strongbad dance now
|
|
Join Date: Oct 2004
Location: Bermuda
|
|
|
Solved
Following up with the solution I came up with:
Code:
sort -t, master.txt | awk -F, '{ s = $1 $2 $10; if(s != prevs) { if (FNR > 1 ) print prevline; preval = $7; prevline = $0;} else if ($7 > preval) { preval = $7; prevline = $0;} prevs =s;} END {print prevline }'
Hope someone finds it helpful.
|
Developer Shed Advertisers and Affiliates
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Rate This Thread |
Linear Mode
|
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|