#1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2004
    Posts
    1

    awk help needed please


    I'm writing an awk program to parse an /etc/passwd file that contains records like:

    jones::100:100:John Jones:/usr/bin/csh
    fred::200:0:Fred Franklin:/usr/bin/csh
    bob::300:100:Bob Smith:/usr/bin/csh
    jjones::600:100:John Jones:/usr/bin/csh
    admin::200:100:Fred Franklin:/usr/bin/csh
    useless::200:0::/usr/bin/csh
    ...

    From this file I used cut and egrep to create another file named "names" that holds the 5th field of each /etc/passwd record, with blank lines removed. It looks like:

    John Jones
    Fred Franklin
    Bob Smith
    John Jones
    Fred Franklin
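
    A minimal sketch of how such a "names" file can be built. The exact cut/egrep command from the question is not shown, so this is an assumption: it writes the sample records above to a hypothetical file passwd.sample (on a real system you would read /etc/passwd) and uses plain grep, which is enough for the blank-line pattern. The sample records have six fields rather than the usual seven, but the full name is still field 5.

```shell
#!/bin/sh
# Hypothetical sample input: the records quoted in the question.
printf '%s\n' \
  'jones::100:100:John Jones:/usr/bin/csh' \
  'fred::200:0:Fred Franklin:/usr/bin/csh' \
  'bob::300:100:Bob Smith:/usr/bin/csh' \
  'jjones::600:100:John Jones:/usr/bin/csh' \
  'admin::200:100:Fred Franklin:/usr/bin/csh' \
  'useless::200:0::/usr/bin/csh' > passwd.sample

# Take colon-separated field 5 (the full name), then drop empty lines.
cut -d: -f5 passwd.sample | grep -v '^$' > names
cat names
```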

    Now, I'm trying to use awk to examine each line of the /etc/passwd file and calculate the following:
    - total number of records in /etc/passwd
    - % of records that are duplicates
    - print each duplicated user (by full name) and how many entries in /etc/passwd that user has.

    So I'm using the "names" file to build an associative array called users_array, but I'm completely stuck after that point. Here is what I have so far; any help would be greatly appreciated!

    BEGIN {
        # (getline ...) > 0 also stops on a read error (-1), not just at end of file
        while ((getline var < "names") > 0) {
            users_array[var] = 0;
            print var;
            print users_array[var];
        }
    }
    # this is a lot harder than perl in my opinion
    # for each name process each line in /etc/passwd

    END {
        print FILENAME
    }
    #print output
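
    One way the users_array approach could be finished, as a minimal sketch: count each name in BEGIN (incrementing instead of assigning 0), let awk read the passwd file as ordinary input so that NR holds the record count, and report everything in END. It assumes a hypothetical sample file passwd.sample holding the records quoted above, a hypothetical shell function names_report, and one possible reading of "% duplicates": records beyond the first for each name, divided by all records.

```shell
#!/bin/sh
# Hypothetical sample input and "names" file, rebuilt from the question.
printf '%s\n' \
  'jones::100:100:John Jones:/usr/bin/csh' \
  'fred::200:0:Fred Franklin:/usr/bin/csh' \
  'bob::300:100:Bob Smith:/usr/bin/csh' \
  'jjones::600:100:John Jones:/usr/bin/csh' \
  'admin::200:100:Fred Franklin:/usr/bin/csh' \
  'useless::200:0::/usr/bin/csh' > passwd.sample
cut -d: -f5 passwd.sample | grep -v '^$' > names

names_report() {
    awk '
    BEGIN {
        # Count how often each full name occurs in the "names" file.
        while ((getline var < "names") > 0)
            users_array[var]++
        close("names")
    }
    # No per-line action is needed: awk counts the input records in NR.
    END {
        dup_records = 0
        for (name in users_array)
            if (users_array[name] > 1) {
                printf "%s: %d entries\n", name, users_array[name]
                dup_records += users_array[name] - 1
            }
        printf "total records in %s: %d\n", FILENAME, NR
        if (NR > 0)
            printf "percent duplicates: %.2f\n", 100 * dup_records / NR
    }' "$1"
}

names_report passwd.sample
```

    The order of the "entries" lines depends on awk's for-in iteration order, which is unspecified; pipe through sort if a stable order matters.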
#2
    Devshed Beginner (1000 - 1499 posts)

    Join Date
    Jun 2004
    Posts
    1,345
    Try something like this. I can't test it, but it should be close:
    Code:
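    awk 'BEGIN { FS = ":" }
         $5 != "" { print $5 }' /etc/passwd | sort > newfile
    awk 'BEGIN { dups = 0; counter = 0; old = "" }
         {
           # Compare whole lines ($0): with the default FS,
           # $1 would be only the first word of the name.
           if (NR == 1) {
             old = $0
             counter = 1
           } else if (old == $0) {
             counter++
             dups++
           } else {
             if (counter > 1) print old, counter   # report the finished group
             old = $0
             counter = 1
           }
         }
         END {
           if (counter > 1) print old, counter     # report the last group too
           if (NR > 0)
             printf("# duplicates %d:  percent duplicates %.2f\n", dups, 100 * dups / NR)
         }' newfile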
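
    The "sort, then count runs of equal lines" step above is exactly what sort | uniq -c does. A minimal sketch, assuming a hypothetical sample file passwd.sample and a hypothetical function name dup_names; the final awk keeps only names that uniq -c counted more than once.

```shell
#!/bin/sh
# Hypothetical sample input: the records quoted in the question.
printf '%s\n' \
  'jones::100:100:John Jones:/usr/bin/csh' \
  'fred::200:0:Fred Franklin:/usr/bin/csh' \
  'bob::300:100:Bob Smith:/usr/bin/csh' \
  'jjones::600:100:John Jones:/usr/bin/csh' \
  'admin::200:100:Fred Franklin:/usr/bin/csh' \
  'useless::200:0::/usr/bin/csh' > passwd.sample

# uniq -c prefixes each distinct line of its (sorted) input with a count.
dup_names() {
    cut -d: -f5 "$1" | grep -v '^$' | sort | uniq -c | awk '$1 > 1'
}

dup_names passwd.sample
```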
#3
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2004
    Location
    Prague, Czech Rep.
    Posts
    117
    Originally Posted by awk_newbie
    I'm writing an awk program to parse an /etc/passwd file that contains records like: ...

    Your problem is not too complicated. You do not need to generate the "names" file, because everything can be done in memory; you only need to read /etc/passwd twice in one awk script. Take the following as my Xmas present:

    Code:
    awk '
    BEGIN {

        # Extract all user names
        FS = ":"     # Use this field separator
        TNR = 0      # Initialize total number of records
        I = 0        # Initialize counter
        while ((getline < "/etc/passwd") > 0) {
            TNR++
            if ($5 != "")        # Is a name given?
                USER[++I] = $5
        }
        N = I        # Total number of records with a non-empty name field
        close("/etc/passwd")

        # Reread /etc/passwd and process it
        while ((getline < "/etc/passwd") > 0) {
            if ($5 != "")
                DUP[$5]++        # Count how many records carry this name
        }
        close("/etc/passwd")

        # Generate the output
        ND = 0       # Initialize number of duplicated names
        for (U in DUP) {
            if (DUP[U] > 1) {
                printf "duplicate: %s - %d times\n", U, DUP[U]
                ND++
            }
        }
        printf "total number of records %d\n", TNR
        printf "total number of named records %d\n", N
        if (N > 0)
            printf "percent of duplicates: %10.2f\n", 100 * ND / N

        exit 0       # The end
    }
    '

    Perl really is better, but it is a "write-only language": one cannot decipher one's own code...
    (Tested on Linux)
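
    For comparison, a minimal one-pass sketch of the same report: awk already counts every record in NR, so the first read of /etc/passwd is not strictly needed. It keeps the percentage formula used above (duplicated names divided by named records); the sample file passwd.sample and the function name dup_report_1pass are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample input: the records quoted in the question.
printf '%s\n' \
  'jones::100:100:John Jones:/usr/bin/csh' \
  'fred::200:0:Fred Franklin:/usr/bin/csh' \
  'bob::300:100:Bob Smith:/usr/bin/csh' \
  'jjones::600:100:John Jones:/usr/bin/csh' \
  'admin::200:100:Fred Franklin:/usr/bin/csh' \
  'useless::200:0::/usr/bin/csh' > passwd.sample

# One pass: NR counts every record, count[] counts each non-empty full name.
dup_report_1pass() {
    awk -F: '
    $5 != "" { named++; count[$5]++ }
    END {
        nd = 0
        for (u in count)
            if (count[u] > 1) {
                printf "duplicate: %s - %d times\n", u, count[u]
                nd++
            }
        printf "total number of records %d\n", NR
        printf "total number of named records %d\n", named
        if (named > 0)
            printf "percent of duplicates: %.2f\n", 100 * nd / named
    }' "$1"
}

dup_report_1pass passwd.sample
```

    To run it against the real file, call dup_report_1pass /etc/passwd instead.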
