September 24th, 2009, 01:26 PM
Awk RS trouble
I'm currently trying to adapt a script that parses our mail servers' mailq output to display various stats, but one server is giving me output that is difficult to parse with awk.
On the other servers I've been working with, there is an extra newline between records, which is easy to catch. This server has no such separator, which has proven much more difficult. I've run the output through hexdump hoping there might be an extra CR character I could latch onto, but it's purely \n, with a \t at the start of each record's second line.
23 Sep 2009 22:09:31 GMT #26200266 3106535 <email@example.com>
23 Sep 2009 12:53:56 GMT #26201531 1616 <>
24 Sep 2009 01:51:22 GMT #26200795 2862 <firstname.lastname@example.org>
I would be very appreciative if someone could help me out with a proper RS to chunk this down into the correct, bite-sized pieces.
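One possible workaround avoids RS entirely. POSIX leaves a multi-character RS unspecified (gawk and mawk treat it as a regular expression), so the sketch below keeps awk's default one-line records and glues any tab-led line onto the record above it. It assumes every record starts on a line with no leading tab and that continuation lines begin with a tab; the function name join_records and the " | " separator are made up for illustration.

```shell
#!/bin/sh
# Sketch: print each mailq record (a header line plus any following
# tab-led continuation lines) as a single output line.
# join_records and the " | " separator are hypothetical choices.
join_records() {
    awk '
        # A line starting with a tab continues the current record.
        /^\t/ {
            sub(/^\t/, "")        # drop the leading tab
            rec = rec " | " $0    # append it to the record being built
            next
        }
        # Any other line starts a new record: flush the previous one first.
        {
            if (have) print rec
            rec = $0
            have = 1
        }
        # Flush the final record at end of input.
        END { if (have) print rec }
    '
}
```

Something like `mailq | join_records` should then yield one line per record, which the existing stats script could consume with ordinary field splitting.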
October 4th, 2009, 02:33 PM
A coworker suggested a different approach: replace each occurrence of \n\t with something else, so that all the info for each record ends up on one line, and go from there.
Naturally I looked into sed and tr, but I'm having far too much trouble matching the newline in a sed expression, and tr translates \n and \t individually rather than as a single two-character sequence.
Ideally I'd like to use:
but for some reason the escape sequences do not seem to work as expected.
I've moved on to a backquoted printf to insert the literal characters into the expression:
sed "`printf 's/\\\n\t/X/g'`"
which does catch the tab character, but ignores the newline.
I've realized that this is most likely because sed reads its input one line at a time and strips the trailing newline, so there is never a \n in front of the tab for the expression to match. While I can see reading everything at once potentially causing memory issues, is it at all possible to get sed to read the whole input in one go rather than line by line?
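To answer the direct question, sed can be made to accumulate the whole input in its pattern space with a loop on N; once the lines are joined there, a \n in the regex does match the embedded newlines. This is a minimal sketch: slurp_join is a hypothetical name, "X" is the placeholder from the question, and the literal tab is built with printf because \t inside a sed regex is a GNU extension. As suspected, the entire input is held in memory.

```shell
#!/bin/sh
# Sketch: slurp all input into sed's pattern space, then replace every
# newline+tab pair with "X". slurp_join is a hypothetical name.
slurp_join() {
    tab=$(printf '\t')    # literal tab; \t in a sed regex is a GNU extension
    # :a / $!N / $!ba loop: keep appending lines until the last one is read.
    # The final s///g then sees the embedded newlines and can match them.
    sed -e ':a' -e '$!N' -e '$!ba' -e "s/\\n$tab/X/g"
}
```

Guarding N with `$!` keeps it from running on the last line, where sed implementations differ on whether the pattern space is printed before quitting.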
I am open to other lines of thought as well, should anyone have another method in mind.
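If memory is the worry, a commonly used sed idiom keeps only the current record in the pattern space: append the next line with N, splice it on if it starts with a tab, and otherwise print and discard the finished first line with P and D. Again this is only a sketch, assuming continuation lines always begin with a tab immediately after the newline; join_stream is a hypothetical name and "X" is the placeholder from the question.

```shell
#!/bin/sh
# Sketch: join tab-led continuation lines while holding only the
# current record in sed's pattern space. join_stream is hypothetical.
join_stream() {
    tab=$(printf '\t')    # literal tab; \t in a sed regex is a GNU extension
    # :a    loop label
    # $!N   append the next line (skipped on the last line)
    # s     if the appended line began with a tab, splice it on with "X"
    # ta    after a successful join, loop back to try the next line too
    # P     otherwise print the finished record (up to the first newline)
    # D     drop that record; a remaining line starts the next cycle
    #       without reading new input
    sed -e ':a' -e '$!N' -e "s/\\n$tab/X/" -e 'ta' -e 'P' -e 'D'
}
```

Usage would be along the lines of `mailq | join_stream`; because each record is printed as soon as the next non-tab line arrives, memory use stays proportional to the longest record rather than the whole queue.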