#1
awk help needed please
I'm writing an awk program to parse an /etc/passwd file that contains records like:
jones::100:100:John Jones:/usr/bin/csh
fred::200:0:Fred Franklin:/usr/bin/csh
bob::300:100:Bob Smith:/usr/bin/csh
jjones::600:100:John Jones:/usr/bin/csh
admin::200:100:Fred Franklin:/usr/bin/csh
useless::200:0::/usr/bin/csh
...
From this file I created another file named "names" using cut and egrep. It contains the 5th field from /etc/passwd with the blank lines removed, and looks like:
John Jones
Fred Franklin
Bob Smith
John Jones
Fred Franklin
Now I'm trying to use awk to examine each line of the /etc/passwd file and calculate the following:
- total number of records in /etc/passwd
- % of records that are duplicates
- print the duplicate users (their full name) and how many entries in /etc/passwd each dupe user has.
So I'm using the "names" file to create an associative array called users_array, but I am completely stuck after that point. Here is what I have so far, any help would be greatly appreciated!!
BEGIN {
    while ( getline var<"names" ) {
        users_array[var]=0;
        print var;
        print users_array[var];
    }
}
# this is a lot harder than perl in my opinion
# for each name process each line in /etc/passwd
END {
    print FILENAME
}
#print output
#2
Try something like this. I can't test it, but it should be close:
Code:
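awk 'BEGIN { FS=":" }
$5 != "" { print $5 }' /etc/passwd | sort > newfile
awk 'BEGIN { dups=0; counter=0; old="" }
{
    # compare whole lines ($0): with the default FS, $1 is only the first name
    if (NR > 1 && old == $0) {
        # same full name as the previous line: one more duplicate entry
        counter++
        dups++
    } else {
        # name changed: report the previous name if it occurred more than once
        if (counter > 1)
            print old, counter
        old = $0
        counter = 1
    }
}
END {
    if (counter > 1)    # the last group is never followed by a name change
        print old, counter
    if (NR > 0)
        printf("# duplicates %d: percent duplicates %.2f\n", dups, 100*(dups/NR))
}' newfile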
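The sort-then-count idea can also be sketched with standard tools, since `sort` places equal names next to each other and `uniq -c` counts each run. This is only a rough sketch, not the method from the post above: the function name `dup_names` is made up for illustration, and it assumes the full name is field 5 and that `uniq -c` prints the count followed by a space, as it does in the sample records from post #1.

```shell
# dup_names: read passwd-style records on stdin and print each full name
# that occurs more than once, followed by its number of entries.
# (Hypothetical helper name; assumes the full name is field 5.)
dup_names() {
    cut -d: -f5 |          # keep only the full-name field
        grep -v '^$' |     # drop records with an empty name
        sort |             # bring identical names together
        uniq -c |          # prefix each distinct name with its count
        awk '$1 > 1 { n = $1; sub(/^ *[0-9]+ /, ""); print $0 ": " n }'
}

# Sample records copied from post #1.
dup_names <<'EOF'
jones::100:100:John Jones:/usr/bin/csh
fred::200:0:Fred Franklin:/usr/bin/csh
bob::300:100:Bob Smith:/usr/bin/csh
jjones::600:100:John Jones:/usr/bin/csh
admin::200:100:Fred Franklin:/usr/bin/csh
useless::200:0::/usr/bin/csh
EOF
```

On the six sample records this should report Fred Franklin and John Jones with two entries each, and skip Bob Smith and the record with no name.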
#3
Quote:
Your problem is not too complicated. You do not need to generate the "names" file, because everything can be done in memory. You only have to read /etc/passwd twice in one awk script. Take the following as my Xmas present:
Code:
awk '
BEGIN {
    # Extract all user names
    FS = ":"    # Use this field separator
    TNR = 0     # Initialize total number of records
    I = 0       # Initialize counter
    while ((getline < "/etc/passwd") > 0) {
        TNR++
        if ($5 != "")    # The name given?
            USER[++I] = $5
    }
    N = I       # Total number of records with non-empty name field
    close("/etc/passwd")

    # Reread /etc/passwd and process it
    while ((getline < "/etc/passwd") > 0) {
        if ($5 != "")
            DUP[$5]++    # Remember how many entries this name has
    }
    close("/etc/passwd")

    # Generate the output
    ND = 0      # Initialize number of duplicated names
    for (U in DUP) {
        if (DUP[U] > 1) {
            printf "duplicate: %s - %d times\n", U, DUP[U]
            ND++
        }
    }
    printf "total number of records %d\n", TNR
    printf "total number of named records %d\n", N
    if (N > 0)  # Avoid dividing by zero when no record has a name
        printf "percent of duplicates: %10.2f\n", 100 * ND / N
    exit 0      # The end
}
'
Perl really is better, but it is a "write-only language". One cannot decipher one's own code...
(Tested on Linux)
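Since an awk associative array can hold the per-name counts, the same figures can probably be collected in a single pass as well. The following is a minimal sketch under some assumptions, not a drop-in replacement: the function name `passwd_stats` is invented here, and "% of records that are duplicates" is read as "records whose full name occurs more than once, divided by all records", which is only one possible interpretation of the original question and differs from the ratio printed above.

```shell
# passwd_stats [FILE]: single-pass duplicate report for passwd-style input.
# (Hypothetical helper; reads stdin when no file is given.)
passwd_stats() {
    awk -F: '
        { total++ }                 # every record counts toward the total
        $5 != "" { seen[$5]++ }     # tally the entries per full name
        END {
            for (name in seen)      # iteration order is unspecified in awk
                if (seen[name] > 1) {
                    printf "duplicate: %s - %d times\n", name, seen[name]
                    dup_records += seen[name]
                }
            printf "total number of records %d\n", total
            if (total > 0)
                printf "percent of records with a duplicated name: %.2f\n",
                       100 * dup_records / total
        }
    ' "$@"
}
```

It would be called as `passwd_stats /etc/passwd`. The duplicate lines can come out in any order, so pipe them through `sort` if a stable listing matters.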