#1

    Sed search and replace


    Hi all,

    I want to write a sed or awk routine that finds each place where a line feed and a double quote sit together and replaces the pair with just the double quote. I can strip every line feed, but that leaves one gigantic line of data, which is no good either. I'm not having much success.

    Here is my brilliant, errrr, not so successful code to date:

    sed -e 's/"\012/"/g' File1 > File2


    TIA for any help.

    joe.
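One likely reason the sed command above changes nothing: sed reads one line at a time and removes the trailing newline before it tries the pattern, so the pattern never sees a line feed. The Python sketch below only models that cycle under simplifying assumptions (it is not how sed is implemented, and Python's regex syntax differs from sed's). `per_line_substitute` is a made-up name for illustration.

```python
import re


def per_line_substitute(text, pattern, replacement):
    """Rough model of sed's default cycle (an illustration, not sed itself)."""
    out = []
    for line in text.splitlines():  # each line arrives without its newline
        out.append(re.sub(pattern, replacement, line))
    # the newline is put back only when the line is written out
    return "".join(line + "\n" for line in out)


sample = '9876|1000|1|"This sample\n"\n'
# The pattern asks for a quote followed by a line feed, as in the original attempt.
unchanged = per_line_substitute(sample, '"\n', '"')
print(unchanged == sample)
```

Because no single line ever contains a line feed, the substitution has nothing to match and the text comes back as it went in.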
#2
    Originally Posted by jabs
    sed -e 's/"\012/"/g' File1 > File2
    You'll want to use awk for this. sed works one line at a time, so a pattern that contains a newline never gets a chance to match. Try this:

    Code:
    awk '/\"$/{printf "%s",$0; next}{print}' file1 > file2
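For readers who don't know awk, here is a rough Python rendering of what that one-liner does: a line that ends with a double quote is written without a newline, and every other line is written normally. `join_after_quote` is a hypothetical name, and the sample lines are taken from the data posted later in this thread.

```python
def join_after_quote(lines):
    """Approximate the awk rule: no newline after lines that end with a quote."""
    out = []
    for line in lines:
        if line.endswith('"'):
            out.append(line)         # like printf "%s": no newline added
        else:
            out.append(line + "\n")  # like print: newline added
    return "".join(out)


sample_lines = [
    '1234|320|1|"Sample of data"',
    '9876|1000|1|"This sample',
    '"',
    '56574|1015|1|"Another bad one',
    '"',
]
joined = join_after_quote(sample_lines)
print(joined.count("\n"))
```

On data where nearly every line ends with a quote, most newlines disappear, so records run together instead of being repaired.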
#3
    Thanks for the help. Unfortunately, it made the file one huge line.

    Here is an example of the data I'm using:

    1234|320|1|"Sample of data"
    1234|321|2|"for DevShed"
    These are good.

    Here are the bad lines:
    9876|1000|1|"This sample
    "
    9876|1001|2|"show the bad rec
    "
    56574|1015|1|"Another bad one
    "

    For the good data, obviously, I'd like it to remain the same. For the bad, I'd like to move the " to the end of the preceding line (or add a " to the end of line 1 and delete line 2).

    Thanks again for looking at this!

    Joe.
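The request above (move the stray quote to the end of the preceding line) can also be sketched by treating the file as one string instead of line by line. This is a minimal sketch, assuming the file fits in memory and that every stray quote sits alone on its own line. `reattach_stray_quotes` is a hypothetical name.

```python
import re


def reattach_stray_quotes(text):
    """Move a quote that sits alone on a line up to the end of the previous line."""
    # Match a newline plus a quote, but only when the quote is the whole line
    # (followed by another newline or by the end of the text).
    return re.sub(r'\n"(?=\n|\Z)', '"', text)


sample = (
    '1234|320|1|"Sample of data"\n'
    '1234|321|2|"for DevShed"\n'
    '9876|1000|1|"This sample\n'
    '"\n'
    '9876|1001|2|"show the bad rec\n'
    '"\n'
)
fixed = reattach_stray_quotes(sample)
print(fixed, end="")
```

Good records pass through untouched because none of their lines start with a lone quote.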
#4
    Ahhhh... I understand your problem a little better now. Sorry for the confusion.

    If you know that every offending line begins with a double quote, you can simply remove those lines with grep. Then any remaining line that doesn't end with a double quote can have one added.

    There are probably a billion ways to do this, but here are two:
    Code:
    Method 1 - grep & sed
    
       grep "^[^\"]" file1 | sed -e "s/\([^\"]\)$/\1\"/g" > file2
    
    Method 2 - grep & awk
    
       grep "^[^\"]" file1 | awk '/[^\"]$/{printf "%s\"\n",$0; next}{print}' > file2
    Hope this helps!
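To make the two pipeline stages easier to follow, here is a rough Python equivalent that uses the same two patterns. It is only a sketch: `drop_then_quote` is a made-up name, the records are copied from the sample data earlier in the thread, and Python's regex syntax is not identical to grep's or sed's.

```python
import re


def drop_then_quote(lines):
    # Stage 1, like the grep step: keep lines whose first character is not a quote.
    kept = [line for line in lines if re.search(r'^[^"]', line)]
    # Stage 2, like the sed step: add a quote when the last character is not one.
    return [re.sub(r'([^"])$', r'\1"', line) for line in kept]


records = [
    '1234|320|1|"Sample of data"',
    '9876|1000|1|"This sample',
    '"',
    '56574|1015|1|"Another bad one',
    '"',
]
repaired = drop_then_quote(records)
for record in repaired:
    print(record)
```

Note that the first pattern needs at least one character, so empty lines are dropped along with the lone quotes.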
#5
    Change it to %s\n in the printf to add a newline in.
#6
    Yes - The grep/sed got me most of the way there. I just did a couple of manual edits and it worked fine. Thanks again for your help.
#7
    Originally Posted by Ehlanna
    Change it to %s\n in the printf to add a newline in.
    Thanks for your help! (I like that avatar too!)
#8
    Originally Posted by Ehlanna
    Change it to %s\n in the printf to add a newline in.
    The \n is already in there; it just has a \" right before it. He wanted to add a quote before the newline.
#9
    Here's a Python alternative, without regular expressions
    Input:
    1234|320|1|"Sample of data"
    1234|321|2|"for DevShed"
    9876|1000|1|"This sample
    "
    9876|1001|2|"show the bad rec
    "
    56574|1015|1|"Another bad one
    "


    Code:
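    >>> for line in open("input.txt"):
    ... 	line = line.strip()  # strip the newline and any surrounding whitespace
    ... 	if line != '"':  # skip lines that hold only the stray quote
    ... 		if not line.endswith('"'):
    ... 			print(line + '"')  # put back the missing closing quote
    ... 		else:
    ... 			print(line)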
    ... 
    1234|320|1|"Sample of data"
    1234|321|2|"for DevShed"
    9876|1000|1|"This sample"
    9876|1001|2|"show the bad rec"
    56574|1015|1|"Another bad one"
#10
    Originally Posted by ghostdog74
    Here's a Python alternative, without regular expressions
    I've never used Python, but I'll give this a try. Cheers!
#11
    Originally Posted by stanleypane
    Code:
    Method 1 - grep & sed
    
       grep "^[^\"]" file1 | sed -e "s/\([^\"]\)$/\1\"/g" > file2
    Some of the characters can be omitted. Here is a trimmed version:
    Code:
    grep -v '^"' file1 | sed 's/\([^"]$\)/\1"/' > file2
    The pattern is anchored with $, so it can match at most once per line, which means the 'g' flag can be omitted.

    That was just my 50 cents
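A quick way to check that dropping the 'g' flag is safe: a pattern anchored to the end of the line can match only once, so replacing the first match and replacing all matches give the same result. The sketch below shows this with Python's `re` module, whose syntax differs slightly from sed's, and the sample line is taken from the data earlier in the thread.

```python
import re

line = '9876|1000|1|"This sample'
pattern = r'([^"])$'  # a non-quote character at the very end of the line

first_only = re.sub(pattern, r'\1"', line, count=1)  # like sed without 'g'
all_matches, hits = re.subn(pattern, r'\1"', line)   # like sed with 'g'

print(first_only == all_matches, hits)
```

Both calls add a single closing quote, and the match count reported by `re.subn` is 1.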
