Linux Help
 
  #1  
February 7th, 2011, 03:22 PM
bkolts
AWK: rm duplicates based on multiple fields

OK, long-winded title.

I'm trying to use awk to remove rows that are duplicates based on 3 fields, keeping the one that has the highest value in another field. I'm working in C-Shell. For example, the rows below were grepped out of a larger data set to use here as an example:

Input (field separator is a comma):
Code:
4180,-6999,MA,BARNSTABLE,BOURNE,1,1.7,1700,PM,1/26
4180,-6999,MA,BARNSTABLE,BOURNE,1,3.5,2025,PM,1/26
4180,-6999,MA,BARNSTABLE,BOURNE,1,1.0,1511,PM,1/26
4180,-6999,MA,BARNSTABLE,BOURNE,1,5.7,0540,AM,1/27


I want to identify duplicates based on fields 1, 2, and 10, and keep the one with the highest value in field 7. I want the output to be:
Output
Code:
4180,-6999,MA,BARNSTABLE,BOURNE,1,3.5,2025,PM,1/26
4180,-6999,MA,BARNSTABLE,BOURNE,1,5.7,0540,AM,1/27


I can sort the data; that won't cause issues. So I tried:
Code:
sort -r -t, master.txt | awk -F, '{ keys=$1$2$10 ; if ( data[keys]++ == 0 ) lines[++count] = $0 } END {for ( i = 1 ; i <= count ; i++ ) print lines[i] }' 


When I pipe the output of the above command into "grep BOURNE", it can't find any BOURNE rows. However, if instead of reverse sorting I just use:
Code:
sort -t, master.txt | awk -F, '{ keys=$1$2$10 ; if ( data[keys]++ == 0 ) lines[++count] = $0 } END {for ( i = 1 ; i <= count ; i++ ) print lines[i] }' 

and grep BOURNE on the output, I get the rows with the lowest values of field seven:
Code:
4180,-6999,MA,BARNSTABLE,BOURNE,1,1.0,1511,PM,1/26
4180,-6999,MA,BARNSTABLE,BOURNE,1,5.7,0540,AM,1/27


I'd really appreciate any help!
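
For anyone reading along, here is a minimal sketch of one way to make the first-seen filter above keep the highest field 7. It is not from the thread. The filter keeps whichever row of each key group reaches awk first, so for rows like the sample, where fields 3 to 6 are identical, an ascending whole-line sort hands it the smallest field 7. Sorting on the key fields and then on field 7 numerically, largest first, puts the wanted row first instead. The temporary file stands in for master.txt, the names `tmp`, `result`, `key`, and `seen` are made up for illustration, and the script assumes a POSIX sh rather than C-Shell.

```shell
#!/bin/sh
# Sample rows from the thread, written to a temporary file
# (a hypothetical stand-in for master.txt).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
4180,-6999,MA,BARNSTABLE,BOURNE,1,1.7,1700,PM,1/26
4180,-6999,MA,BARNSTABLE,BOURNE,1,3.5,2025,PM,1/26
4180,-6999,MA,BARNSTABLE,BOURNE,1,1.0,1511,PM,1/26
4180,-6999,MA,BARNSTABLE,BOURNE,1,5.7,0540,AM,1/27
EOF

# Sort by the key fields (1, 2, 10), then by field 7 as a number, largest
# first, so the row to keep is the first one awk sees for each key.
# LC_ALL=C keeps the decimal point handling independent of the user's locale.
result=$(LC_ALL=C sort -t, -k1,1 -k2,2 -k10,10 -k7,7nr "$tmp" |
  awk -F, '{ key = $1 FS $2 FS $10; if (seen[key]++ == 0) print }')

rm -f "$tmp"
printf '%s\n' "$result"
```

On the four sample rows this prints the 3.5 row for 1/26 and the 5.7 row for 1/27. Joining the key with `FS` instead of plain concatenation avoids two different field combinations collapsing into the same string. As for the reverse-sort attempt finding no BOURNE rows: one guess, which the thread does not confirm, is that the full data set contains rows for another town with the same fields 1, 2, and 10 that sort ahead of BOURNE in reverse order and so take its place.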

  #2  
February 8th, 2011, 07:56 AM
bkolts
Solved

Following up with the solution I came up with:

Code:
sort -t, master.txt | awk -F, '{ s = $1 $2 $10; if(s != prevs) { if (FNR > 1 ) print prevline; preval = $7; prevline = $0;} else if ($7 > preval) { preval = $7; prevline = $0;} prevs =s;} END {print prevline }'


Hope someone finds it helpful.
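
A possible variation, sketched here under assumptions rather than taken from the thread: the one-liner above depends on rows with the same fields 1, 2, and 10 being adjacent after the sort, and a plain `sort -t,` with no `-k` option compares whole lines, so that adjacency is not guaranteed on every data set. Keeping the best row per key in awk arrays avoids the sort altogether. The array names `best`, `line`, and `order`, the temporary file standing in for master.txt, and the use of a POSIX sh instead of C-Shell are all illustrative choices.

```shell
#!/bin/sh
# Sample rows from the thread, written to a temporary file
# (a hypothetical stand-in for master.txt).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
4180,-6999,MA,BARNSTABLE,BOURNE,1,1.7,1700,PM,1/26
4180,-6999,MA,BARNSTABLE,BOURNE,1,3.5,2025,PM,1/26
4180,-6999,MA,BARNSTABLE,BOURNE,1,1.0,1511,PM,1/26
4180,-6999,MA,BARNSTABLE,BOURNE,1,5.7,0540,AM,1/27
EOF

result=$(awk -F, '
{
    key = $1 FS $2 FS $10              # key built from fields 1, 2 and 10
    if (!(key in best)) {              # first row seen for this key
        order[++n] = key               # remember first-appearance order
        best[key] = $7
        line[key] = $0
    } else if ($7 + 0 > best[key] + 0) {   # "+ 0" forces a numeric comparison
        best[key] = $7
        line[key] = $0
    }
}
END {
    # Print one row per key, in the order the keys first appeared.
    for (i = 1; i <= n; i++) print line[order[i]]
}' "$tmp")

rm -f "$tmp"
printf '%s\n' "$result"
```

With the sample rows this gives the same two lines as the desired output in the first post. When two rows tie on field 7, the earlier one is kept. The trade-off is memory: one stored line per distinct key, which is usually fine unless the file is very large.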
