Thread: line stitching

    #1
    Hi

    I have data in files as shown below.

    this is line one.

    this is line two.

    this is line three.
    this is line four.
    this is line five.

    I got rid of the empty lines by running the command sed '/^$/d',
    which leaves the file in the format below.

    this is line one.
    this is line two.
    this is line three.
    this is line four.
    this is line five.

    I am asking for your help/advice to end up with a file
    in the format below.

    this is line one.this is line two.this is line three.this is line four.this is line five.

    Basically, starting from the original file, I want to get rid
    of all the blank lines, then remove all the
    newline characters and stitch all the lines into one big single line
    of data.

    thanks for your time
  2. #2
    nawk '!/^$/ { printf }' file
  4. #3
    Originally Posted by vgersh99
    nawk '!/^$/ { printf }' file

    Works like a charm, thank you for your time.
  6. #4
    I will keep this short.
    Can you please help/advise on making the command work for large files?

    When I run the command
    nawk '!/^$/ { printf }' file
    on large files, it fails with the message shown below:

    too long
    source line number 1

    thank you for your time
  8. #5
    if on Solaris, try /usr/xpg4/bin/awk instead of nawk.
    if you have gawk installed on your system - try gawk.
  10. #6
    vgersh99,

    I am working on Solaris 8 and gawk is not available to me, so
    I am executing the command below:

    /usr/xpg4/bin/awk '!/^$/ { printf }' filename

    I get this error message:

    /usr/xpg4/bin/awk: syntax error Context is:
    >>> !/^$/ { printf } <<<

    Any advice, please?
  12. #7
    sorry - should've tried it first myself:
    Code:
    /usr/xpg4/bin/awk '!/^$/ { printf $0}' filename
  14. #8
    or you can try something like:

    Code:
    sed -e '/^$/d' filename | tr -d '\n'
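
A minimal sketch of the sed | tr pipeline above, run on a small made-up sample rather than the poster's file (the sample text and the variable name `stitched` are assumptions for illustration only):

```shell
# Hypothetical sample: three data lines with empty lines between them.
# sed drops the empty lines, then tr deletes every remaining newline.
stitched=$(printf 'this is line one.\n\nthis is line two.\n\n\nthis is line three.\n' |
  sed '/^$/d' | tr -d '\n')
printf '%s\n' "$stitched"
```

tr works byte by byte, so the length of the resulting single line should not matter to it. Since an empty line is just two consecutive newlines, `tr -d '\n' < filename` on its own should give the same result when the blank lines are truly empty. Note that the stitched output has no trailing newline.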
  16. #9
    vgersh99, thanks for your help.

    I am running this command:

    /usr/xpg4/bin/awk '!/^$/ { printf $0}' Com.txt > c.txt

    Com.txt size is 53926201 bytes (nearly 54 MB).

    After processing the input file for some time, it throws
    this message:
    /usr/xpg4/bin/awk: line 0 (NR=54): insufficient arguments to printf or sprintf

    But the same command works fine on larger files.

    I think it's because of some special characters?

    Any advice? Can we skip such occurrences and continue
    processing successfully?
  18. #10
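    Most likely. printf $0 uses the line itself as the format string, so a % in the data is read as a conversion that expects an argument. Pass the line as an argument instead: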

    Code:
    /usr/xpg4/bin/awk '!/^$/ { printf("%s", $0)}' Com.txt
    what does line 54 look like?
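
A small sketch of the difference, on made-up input that contains % characters (the sample text and the variable names `input` and `stitched` are assumptions, not the poster's data). With `printf $0` the line itself becomes the format string; with `printf("%s", $0)` it is only ever treated as data:

```shell
# Hypothetical sample: two lines containing '%', separated by an empty line.
input='rate is 5%s per year

disk is 100% full'

# Safe form: the format string is fixed and the line is passed as an argument.
stitched=$(printf '%s\n' "$input" | awk '!/^$/ { printf("%s", $0) }')
printf '%s\n' "$stitched"

# Unsafe form, left commented out: the '%s' in the data would be taken as a
# conversion with no argument, and what happens next depends on the awk used.
# printf '%s\n' "$input" | awk '!/^$/ { printf $0 }'
```

The safe form should print both lines joined together with their % characters intact.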
  20. #11
    use vgersh99's sed | tr suggestion;
    that's really the fastest one.
    note that the '-e' option for sed (I know, it's mentioned in the man page) is unnecessary with a single script.
    working on Solaris[5-9], preferred languages french and C.
  22. #12
    Following up on this, I would like to add something and request some help.
    Over the past few days I have found one of the fastest ways of processing a file in the format shown below.

    this is line one#@#@#this is line two#@#@#this is line three#@#@#this is line four#@#@#this is line five#@#@#


    As you know, I have encountered files up to 500 MB in my case,
    and the best approach I have found so far is this:

    awk 'BEGIN { RS="#\@#\@#"} {print $0}' test.txt | sed -e "/@/d;s#'~'#|#g"

    I had to run awk first because the whole text in the file is on ONE BIG LINE.

    Now, coming to my question:
    when I use the RS expression shown above in my command,
    I get output like this:
    this is line one
    @
    @
    this is line two
    @
    @
    this is line three
    @
    @
    this is line four
    @
    @
    this is line five
    @
    @

    so I end up using sed to delete the lines that contain just
    the @.

    Can one of you comment on what I am doing wrong, or
    how to make the awk command recognise my RS string
    completely, as #@#@#?
    Right now, even though I am escaping the characters, it is not taking the 5-character string as the separator.

    thank you for your time
  24. #13
    from Solaris' 'man nawk':
    RS The first character of the string value of RS is
    the input record separator; a newline character by
    default. If RS contains more than one character,
    the results are unspecified. If RS is null, then
    records are separated by sequences of one or more
    blank lines: leading or trailing blank lines do
    not produce empty records at the beginning or end
    of input, and the field separator is always newline,
    no matter what the value of FS.
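
The one multi-line behaviour the excerpt does guarantee is the empty-RS case. A minimal sketch of it on made-up input (the sample text and the variable name `paragraphs` are assumptions; FS is set to a newline here only so that NF counts lines):

```shell
# Hypothetical sample: a leading blank line, a two-line block,
# two blank lines, then a one-line block.
paragraphs=$(printf '\nthis is line one.\nthis is line two.\n\n\nthis is line four.\n' |
  awk 'BEGIN { RS = ""; FS = "\n" } { print NR ": " NF " line(s)" }')
printf '%s\n' "$paragraphs"
```

Each run of blank lines ends a record, and the leading blank line does not produce an empty record, so this should report two records. A multi-character RS such as "#@#@#" falls under the "unspecified" clause, which is why it behaves differently from one awk to another.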
  26. #14
    vgersh99,

    So you think there is no way to split the input
    on the expression #@#@# using awk?
    If there is, can you please let me know?

    I can't rely on splitting on the # sign, as my data
    might have # as a character in one of the fields.

    Also, using sed directly on huge files takes a lot of time; it runs for 4-5 hours. I have seen the same job run in 4-5
    minutes when I use awk first and then process the output
    using sed.

    Any advice?
  28. #15
    If you have gawk installed, use it with RS as you outlined.
    If you don't, use something like this with FS:
    Code:
    nawk 'BEGIN{FS="#@#@#"} {for (i=1;i<=NF;i++) print $i}'
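
A quick sketch of the FS approach on a made-up one-line sample (the sample text and the variable name `records` are assumptions; plain `awk` stands in for nawk). A multi-character FS is treated as a regular expression, so the whole "#@#@#" string acts as the separator and a lone # inside a field is left alone. The `if ($i != "")` guard is an addition to the command above: it skips the empty field that follows the trailing separator.

```shell
# Hypothetical sample: one long line, one field containing a lone '#',
# and a trailing separator as in the poster's data.
records=$(printf '%s\n' 'this is line one#@#@#this is line #2#@#@#this is line three#@#@#' |
  awk 'BEGIN { FS = "#@#@#" } { for (i = 1; i <= NF; i++) if ($i != "") print $i }')
printf '%s\n' "$records"
```

awk still reads the whole line as a single record before splitting it, so whether a very long line fits probably depends on the awk implementation in use.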