UNIX Help
 
#1
February 8th, 2005, 09:05 AM
dsshed
line stitching

Hi,

I have data in files as shown below.

this is line one.

this is line two.

this is line three.

this is line four.
this is line five.

I got rid of the empty lines by running the command sed '/^$/d',
which gives the file in the format below.

this is line one.
this is line two.
this is line three.
this is line four.
this is line five.

I am asking for your help/advice to end up with a file
in the format below.

this is line one.this is line two.this is line three.this is line four.this is line five.

Basically, from the original file I want to get rid
of all the blank lines and then remove all
newline characters, stitching all lines into one BIG SINGLE line
of data.

Thanks for your time.

#2
February 8th, 2005, 09:46 AM
vgersh99
nawk '!/^$/ { printf }' file

#3
February 8th, 2005, 01:07 PM
dsshed
Quote:
Originally Posted by vgersh99
nawk '!/^$/ { printf }' file



Works like a charm, thank you for your time.

#4
February 9th, 2005, 01:25 PM
dsshed
I will make this short.
Can you please help/advise on making the command work for large files?

When I run the command
nawk '!/^$/ { printf }' file
on large files, it fails with the message shown below:

too long
source line number 1

Thank you for your time.

#5
February 9th, 2005, 01:48 PM
vgersh99
If you are on Solaris, try /usr/xpg4/bin/awk instead of nawk.
If you have gawk installed on your system, try gawk.

#6
February 9th, 2005, 02:02 PM
dsshed
vgersh99,

I am working on Solaris 8 and gawk is not available to me, so
I am executing the command below:

/usr/xpg4/bin/awk '!/^$/ { printf }' filename

I get this error message:

/usr/xpg4/bin/awk: syntax error Context is:
>>> !/^$/ { printf } <<<

Any advice, please?

#7
February 9th, 2005, 02:06 PM
vgersh99
sorry - should've tried it first myself:
Code:
/usr/xpg4/bin/awk '!/^$/ { printf $0}' filename

#8
February 9th, 2005, 02:23 PM
vgersh99
or you can try something like:

Code:
sed -e '/^$/d' filename | tr -d '\n'
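
As a minimal sketch of how this pipeline behaves, the snippet below builds a small sample file (the file name and contents are made up for illustration), strips the empty lines with sed, and then deletes every newline with tr:

```shell
# Build a small sample file containing blank lines (hypothetical name and contents).
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.txt"
printf 'this is line one.\n\nthis is line two.\n\n\nthis is line three.\n' > "$sample"

# Drop empty lines, then delete every newline so all text lands on one line.
joined=$(sed '/^$/d' "$sample" | tr -d '\n')

printf '%s\n' "$joined"
rm -rf "$tmpdir"
```

Because an empty line is nothing but a newline, `tr -d '\n' < filename` on its own should give the same result here; the sed stage only matters if the pattern is adapted, for example to drop lines that contain only whitespace. Note that the pipeline's output does not end with a newline.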

#9
February 9th, 2005, 03:16 PM
dsshed
vgersh99, thanks for your help.

I am running this command:

/usr/xpg4/bin/awk '!/^$/ { printf $0}' Com.txt > c.txt

Com.txt size is 53926201 bytes (nearly 54 MB).

After processing the input file for some time, it throws
this message:
/usr/xpg4/bin/awk: line 0 (NR=54): insufficient arguments to printf or sprintf

But the same command works fine on larger files.

I think it's because of some special characters?

Any advice? Can we skip such occurrences and continue
processing successfully?

#10
February 9th, 2005, 03:53 PM
vgersh99
most likely.

Code:
/usr/xpg4/bin/awk '!/^$/ { printf("%s", $0)}' Com.txt


what does line 54 look like?
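
The likely cause: `printf $0` uses each input line as the format string, so a `%` in the data is read as the start of a format specifier that has no matching argument. Passing "%s" as the format and $0 as its argument avoids that. A minimal sketch, assuming a POSIX awk is on the PATH as `awk` (substitute /usr/xpg4/bin/awk on Solaris); the sample file name and contents are hypothetical:

```shell
# Sample input whose first line contains a literal "%s" (hypothetical data).
tmpdir=$(mktemp -d)
sample="$tmpdir/percent.txt"
printf 'price up 5%%s today\n\nsecond line\n' > "$sample"

# "%s" is the format string; $0 is passed as data, so a % in the input
# is printed as-is instead of being parsed as a format specifier.
joined=$(awk '!/^$/ { printf "%s", $0 }' "$sample")

printf '%s\n' "$joined"
rm -rf "$tmpdir"
```

With `printf $0` in place of `printf "%s", $0`, the first sample line would be interpreted as a format, and the outcome depends on the awk implementation.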

#11
February 11th, 2005, 06:12 AM
guggach
Use vgersh99's sed | tr suggestion;
that's really the fastest one.
Note that the '-e' option for sed (I know, it is mentioned in the man page) is unnecessary here.

#12
February 22nd, 2005, 08:51 AM
dsshed
Following up on this, I would like to add something and request some help.
Over the past few days I have found one of the fastest ways of processing a file in the format shown below.

this is line one#@#@#this is line two#@#@#this is line three#@#@#this is line four#@#@#this is line five#@#@#


As you know, I have encountered files of up to 500 MB in my case,
and the best I have found so far is this:

awk 'BEGIN { RS="#\@#\@#"} {print $0}' test.txt | sed -e "/@/d;s#'~'#|#g"

I had to run awk first, as the whole text in the file is in ONE BIG LINE.

Now, coming to my question:
when I use the RS expression as shown above in my command,
I get output like this:
this is line one
@
@
this is line two
@
@
this is line three
@
@
this is line four
@
@
this is line five
@
@

So I end up using sed to delete the lines with just
the @ in them.

Can one of you comment on what I am doing wrong, or
how to make the awk command recognise my RS string
completely, as #@#@#?
Right now, even though I am escaping the characters, it is not taking the 5-character string as the separator.

Thank you for your time.

#13
February 22nd, 2005, 09:10 AM
vgersh99
from Solaris' 'man nawk':
Quote:
RS The first character of the string value of RS is
the input record separator; a newline character by
default. If RS contains more than one character,
the results are unspecified. If RS is null, then
records are separated by sequences of one or more
blank lines: leading or trailing blank lines do
not produce empty records at the beginning or end
of input, and the field separator is always
newline, no matter what the value of FS.
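
This accounts for the stray @ lines in the output above: only the first character, #, acts as the separator, so each #@#@# is cut at three places and the two @ characters between them become records of their own. A small sketch of the effect, with RS set to # explicitly so that it behaves the same on awks such as gawk that accept a multi-character RS; the sample input is made up:

```shell
# Split on the single character "#", which is what an awk following the
# first-character rule effectively does with RS="#@#@#".
split_on_hash=$(printf 'one#@#@#two#@#@#' | awk 'BEGIN { RS = "#" } { print }')

# Each "@" between two "#" characters shows up as a record of its own.
printf '%s\n' "$split_on_hash"
```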


#14
February 22nd, 2005, 09:35 AM
dsshed
vgersh99,

So you think there is no way to split the input
based on the expression #@#@# using awk?
If there is, can you please let me know?

I can't rely on splitting it by the # sign, as my data
might have # as a character in one of the fields.

And using sed directly on huge files takes a lot of time; it runs for 4-5 hours. I have seen the same job run in 4-5
minutes when I use awk first and then process the output
using sed.

Any advice?

#15
February 22nd, 2005, 09:45 AM
vgersh99
If you have gawk installed, use it with RS as you outlined.
If you don't, use something like this with FS:
Code:
nawk 'BEGIN{FS="#@#@#"} {for (i=1;i<=NF;i++) print $i}'
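
A minimal sketch of the FS approach on sample data, assuming a POSIX awk on the PATH as `awk` (nawk on Solaris should behave the same way for this program). The file name, the contents, and the check that skips empty fields are additions for illustration; a multi-character FS is treated as a regular expression, and #@#@# contains no regex metacharacters, so it matches literally:

```shell
# One long line with #@#@# between records; one record contains a lone "#"
# (hypothetical data).
tmpdir=$(mktemp -d)
sample="$tmpdir/records.txt"
printf 'this is line one#@#@#this has a # inside#@#@#this is line three#@#@#\n' > "$sample"

# Split each input line into fields on the 5-character separator and print
# one field per line; the empty field after the trailing separator is skipped.
records=$(awk 'BEGIN { FS = "#@#@#" }
               { for (i = 1; i <= NF; i++) if ($i != "") print $i }' "$sample")

printf '%s\n' "$records"
rm -rf "$tmpdir"
```

A lone # inside a record is left untouched, because only the full five-character sequence matches FS.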
