Thread: Awk RS trouble

    #1
  1. manwich
    Devshed Novice (500 - 999 posts)

    Join Date
    Oct 2003
    Location
    Canadanistan
    Posts
    578
    Rep Power
    384

    Awk RS trouble


    I'm currently trying to adapt a script to parse our mail servers' mailq output and display various stats, but one server is giving me output that is difficult to parse with awk.
    Code:
    23 Sep 2009 22:09:31 GMT  #26200266  3106535  <user@external.com>
            local   internal.com-user@internal.com
    23 Sep 2009 12:53:56 GMT  #26201531  1616  <>
            local   internal.com-user@internal.com
    24 Sep 2009 01:51:22 GMT  #26200795  2862  <user@external.com>
            remote  user@external.com
    On the other servers I've been working on there is an extra newline between each record, which is much easier to catch; this one has proven much more difficult. I've run the output through hexdump hoping there might be an extra CR character I can latch on to, but the line endings are purely \n, with a \t at the start of each record's second line.

    I would be very appreciative if someone could help me out with a proper RS to chunk this down into the correct, bite-sized pieces.

    TIA
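
    One way to approach the two-line records described above is to skip RS altogether and join lines instead. The following is only a sketch, assuming the mailq output arrives on stdin and that every continuation line begins with a tab, as the hexdump suggests. The function name join_records is hypothetical, and this is a line-joining technique rather than a true RS setting.

```shell
# join_records: hypothetical helper, reads mailq-style output on stdin
# and prints one line per record.
# Assumption: a line starting with a tab continues the previous record.
join_records() {
    awk '
        /^\t/ {                      # continuation line
            sub(/^\t+/, "")          # drop the leading tab(s)
            rec = rec " " $0         # append it to the record being built
            next
        }
        {
            if (rec != "") print rec # a new record starts: flush the old one
            rec = $0
        }
        END { if (rec != "") print rec }  # flush the final record
    '
}
```

    Used as `mailq | join_records | awk '{ ... }'`, each record then sits on a single line, so the default RS should be enough for the stats script downstream.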
  2. #2
  3. manwich
    Bumpdate:

    A coworker suggested a different approach to the problem: replace each occurrence of \n\t with something else, so that all the info for each record ends up on one line, and go from there.

    Naturally I looked into sed and/or tr, but I'm having far too much trouble catching the newline with a sed expression, and tr replaces \n and \t separately, as opposed to as one entity.

    Ideally I'd like to use:
    Code:
    sed 's/\n\t/X/g'
    but for some reason the escape sequences do not seem to work as expected.

    I've moved on to a backquoted printf to insert the literal characters into the expression:
    Code:
    sed "`printf 's/\\\n\t/X/g'`"
    which does work to catch the tab character, but ignores the newline.

    I've realized that this is most likely due to sed's behaviour of reading input one line at a time, without the newline. While I can see this potentially causing a memory issue, is it at all possible to get sed to read the input all in one go, as opposed to line by line?

    I am open to other lines of thought as well, should anyone have another method in mind.
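
    On the question of getting sed to read the input in one go: a common idiom is to loop with the N command until the last line, so the whole input sits in the pattern space before the substitution runs. The sketch below assumes the input is small enough to hold in memory (the concern raised above) and that a space is an acceptable replacement. The function name join_with_sed is hypothetical. The tab is inserted literally via printf, since \t inside a sed regex is not portable.

```shell
# join_with_sed: hypothetical helper, reads input on stdin, slurps it all
# into sed's pattern space, then replaces every newline+tab pair with a space.
join_with_sed() {
    tab=$(printf '\t')               # literal tab character
    sed -e ':a' \
        -e '$!N' \
        -e '$!ba' \
        -e "s/\n${tab}/ /g"
    # :a    defines a label
    # $!N   appends the next input line unless already on the last line
    # $!ba  jumps back to the label until the last line has been read
    # s///  then sees the embedded newlines and can match newline+tab
}
```

    Because the pattern space grows with the input, this trades memory for simplicity; for very large queues a line-by-line tool may be the safer choice.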
