UNIX Help
 
  #1  
December 11th, 2004, 09:56 PM
awk_newbie
awk help needed please

I'm writing an awk program to parse an /etc/passwd file that contains records like:

jones::100:100:John Jones:/usr/bin/csh
fred::200:0:Fred Franklin:/usr/bin/csh
bob::300:100:Bob Smith:/usr/bin/csh
jjones::600:100:John Jones:/usr/bin/csh
admin::200:100:Fred Franklin:/usr/bin/csh
useless::200:0::/usr/bin/csh
...
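Why is the full name `$5` even though the password field is empty? The small sketch below splits two of the sample records on `:` to show it. It assumes a POSIX awk, and the records are copied from the post. A real /etc/passwd line normally has seven fields, with the home directory between the name and the shell, but the full name is still field 5.

```shell
#!/bin/sh
# With -F: every colon starts a new field, even when the text between two
# colons is empty (the blank password field here), so the name stays in $5.
printf '%s\n' 'jones::100:100:John Jones:/usr/bin/csh' 'useless::200:0::/usr/bin/csh' |
    awk -F: '{ printf "%s: %d fields, name=\"%s\"\n", $1, NF, $5 }'
```

The `useless` record still has six fields. Its fifth field is simply the empty string.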

From this file I used cut and egrep to create another file named "names". It contains the 5th field of /etc/passwd with the blank lines removed, and it looks like:

John Jones
Fred Franklin
Bob Smith
John Jones
Fred Franklin
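The post doesn't show the exact cut/egrep command, so the following is only a plausible sketch. It assumes the sample records are saved in a hypothetical file `passwd.sample` instead of the real /etc/passwd. It also spells egrep as `grep -E`, which is its POSIX equivalent.

```shell
#!/bin/sh
# Hypothetical test input: the sample records from the post.
cat > passwd.sample <<'EOF'
jones::100:100:John Jones:/usr/bin/csh
fred::200:0:Fred Franklin:/usr/bin/csh
bob::300:100:Bob Smith:/usr/bin/csh
jjones::600:100:John Jones:/usr/bin/csh
admin::200:100:Fred Franklin:/usr/bin/csh
useless::200:0::/usr/bin/csh
EOF

# cut -d: -f5 keeps only the full-name field; grep -E -v '^$' drops the
# empty line produced by records such as "useless" that have no name.
cut -d: -f5 passwd.sample | grep -E -v '^$' > names
cat names
```

On a real system, replace `passwd.sample` with `/etc/passwd`.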

Now, I'm trying to use awk to examine each line of the /etc/passwd file and calculate the following:
- total number of records in /etc/passwd
- % of records that are duplicates
- print the duplicate users (their full names) and how many entries in /etc/passwd each duplicate user has.

So I'm using the "names" file to create an associative array called users_array, but I am completely stuck after that point. Here is what I have so far; any help would be greatly appreciated!

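BEGIN {
    # Test getline > 0: it returns -1 if "names" cannot be opened,
    # and a bare while (getline ...) would then loop forever.
    while ((getline var < "names") > 0) {
        users_array[var] = 0;
        print var;
        print users_array[var];
    }
}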
# this is a lot harder than perl in my opinion
# for each name process each line in /etc/passwd

END {
print FILENAME
}
#print output
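One way to continue from this point is to have the `getline` loop count each name instead of setting it to 0. This is a sketch, not the poster's code. After the loop, every entry of `users_array` with a value above 1 is a duplicate user. The helper name `count_names` and the sample `names` file written by the script are hypothetical, and "duplicates" here means the records beyond each name's first. Because `names` holds only non-empty names, the total record count for /etc/passwd still has to come from /etc/passwd itself.

```shell
#!/bin/sh
# Hypothetical sample "names" file (the five names from the post).
printf '%s\n' 'John Jones' 'Fred Franklin' 'Bob Smith' 'John Jones' 'Fred Franklin' > names

# count_names FILE: hypothetical helper that counts each full name in FILE.
count_names() {
    awk -v file="$1" '
    BEGIN {
        # Increment instead of assigning 0, so each entry holds a count.
        # Testing getline > 0 stops at end of file and on a missing file.
        while ((getline var < file) > 0) {
            users_array[var]++
            total++
        }
        close(file)

        # A name stored more than once is a duplicate user.
        # for-in visits the names in no particular order.
        for (name in users_array)
            if (users_array[name] > 1) {
                printf "%s: %d entries\n", name, users_array[name]
                extra += users_array[name] - 1   # copies beyond the first
            }
        if (total > 0)
            printf "named records: %d, duplicates: %.2f%%\n", total, 100 * extra / total
    }'
}

count_names names
```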

  #2  
December 13th, 2004, 03:00 PM
jim mcnamara
Try something like this. I can't test it, but it should be close:
Code:
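# Extract the full-name field, drop empty names, and sort so that
# identical names end up on adjacent lines.
awk -F: '$5 != "" { print $5 }' /etc/passwd | sort > newfile

awk 'BEGIN { dups = 0; counter = 0; old = "" }
     {
       # Compare whole lines ($0): with the default field separator,
       # $1 would be only the first word of the name.
       if (NR == 1) {
         old = $0
         counter = 1
       } else if ($0 == old) {
         counter++
         dups++                  # each repeat after the first is a duplicate
       } else {
         if (counter > 1)        # report only names that occur more than once
           print old, counter
         old = $0
         counter = 1
       }
     }
     END {
       # The last group has no following line to trigger its report.
       if (counter > 1)
         print old, counter
       if (NR > 0)
         printf("# duplicates %d:  percent duplicates %.2f\n", dups, 100 * dups / NR)
     }' newfile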
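As a cross-check of this sort-then-compare approach, the standard `sort | uniq -c` pipeline does the adjacent-line counting by itself. The sketch below uses a hypothetical helper named `dup_names` and feeds it the sample names from the first post. It keeps only names with a count above 1 and prints them in the same `name count` form as the script above.

```shell
#!/bin/sh
# dup_names [FILE...]: hypothetical helper; reads FILE or standard input.
dup_names() {
    # uniq -c only merges adjacent lines, so sort first; it prefixes each
    # distinct line with its count.
    sort "$@" | uniq -c | awk '$1 > 1 {
        c = $1
        sub(/^ *[0-9]+ /, "")    # strip the count prefix, leaving the name
        print $0, c
    }'
}

printf '%s\n' 'John Jones' 'Fred Franklin' 'Bob Smith' 'John Jones' 'Fred Franklin' | dup_names
```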
          
     

  #3  
December 27th, 2004, 07:19 AM
zlutovsky
Quote:
Originally Posted by awk_newbie
I'm writing an awk program to parse an /etc/passwd file that contains records like:


Your problem is not too complicated. You do not need to generate the "names" file, because everything can be done in memory. You only need to read /etc/passwd twice in one awk script. Take the following as my Xmas present:

Code:
awk '
BEGIN {
    FS = ":"                      # /etc/passwd fields are colon-separated

    # First pass: count all records and the records with a name
    TNR = 0                       # total number of records
    N = 0                         # records whose name (5th) field is not empty
    while ((getline < "/etc/passwd") > 0) {
        TNR++
        if ($5 != "")             # is a name given?
            N++
    }
    close("/etc/passwd")

    # Second pass: count the records carrying each name
    while ((getline < "/etc/passwd") > 0) {
        if ($5 != "")
            DUP[$5]++
    }
    close("/etc/passwd")

    # Generate the output
    ND = 0                        # records that repeat an earlier name
    for (U in DUP) {
        if (DUP[U] > 1) {
            printf "duplicate: %s - %d times\n", U, DUP[U]
            ND += DUP[U] - 1      # the first record of a name is not a duplicate
        }
    }
    printf "total number of records %d\n", TNR
    printf "total number of named records %d\n", N
    if (N > 0)
        printf "percent of duplicates: %10.2f\n", 100 * ND / N

    exit 0                        # everything happened in BEGIN; no input is read
}
'

Perl is really better, but it is a "write-only language": you cannot decipher your own code...
(Tested on Linux)
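The script above reads /etc/passwd twice. The same counts can also be collected in a single pass with an ordinary main rule and an END block. The sketch below assumes the file name is given as an argument and defaults to /etc/passwd when it is not. `dup_report` and `passwd.sample` are hypothetical names, and the sketch counts every record beyond a name's first as a duplicate.

```shell
#!/bin/sh
# Hypothetical sample input: the six records from the first post.
cat > passwd.sample <<'EOF'
jones::100:100:John Jones:/usr/bin/csh
fred::200:0:Fred Franklin:/usr/bin/csh
bob::300:100:Bob Smith:/usr/bin/csh
jjones::600:100:John Jones:/usr/bin/csh
admin::200:100:Fred Franklin:/usr/bin/csh
useless::200:0::/usr/bin/csh
EOF

# dup_report [FILE]: hypothetical helper; reads FILE (default /etc/passwd) once.
dup_report() {
    awk -F: '
    {
        total++                      # every record counts toward the total
        if ($5 != "") {              # skip records with an empty name field
            named++
            count[$5]++              # records seen so far for this name
        }
    }
    END {
        # for-in visits the names in no particular order
        for (name in count)
            if (count[name] > 1) {
                printf "duplicate: %s - %d times\n", name, count[name]
                extra += count[name] - 1     # copies beyond the first
            }
        printf "total number of records %d\n", total
        printf "total number of named records %d\n", named
        if (named > 0)
            printf "percent of duplicates: %10.2f\n", 100 * extra / named
    }' "${1:-/etc/passwd}"
}

dup_report passwd.sample
```

`for (name in count)` visits the names in an unspecified order, so the `duplicate:` lines can come out in any order. Pipe them through `sort` if you need a stable order.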


© 2003-2008 by Developer Shed. All rights reserved. DS Cluster 3 hosted by Hostway