August 28th, 2012, 08:54 AM
Matching multiple lines with fixed starting strings and intervening other strings
I am a Perl newbie, a grep novice, and I can't figure out how to handle the following.
Say I have a file that contains info about Andy, Bill and Chris, like this:
Andy has a dog.
Bill has a cat.
Chris has a frog.
Andy is a chemist.
Bill is a painter.
Chris is a pilot.
... and so on.
My goal is to create a new file like this:
Andy has a dog. Bill has a cat. Chris has a frog.
Andy is a chemist. Bill is a painter. Chris is a pilot.
This I can do, as long as none of the relevant data is on intervening lines, i.e. lines that do not start with Andy, Bill or Chris. But the file does contain such lines. The data guarantees that everything between an Andy line and a Bill line, between a Bill line and a Chris line, or between a Chris line and an Andy line is part of the description of the first of each of these pairs. So, e.g., parts of the file are like this:
Andy is tall
and likes football
and baseball.
Bill is friendly
and jocular.
Chris is cheerful.
and the result I want is this:
Andy is tall and likes football and baseball. Bill is friendly and jocular. Chris is cheerful. [a single line]
Andy is... [a new line]
Any pointers anyone could provide would be awesome (I would rather not confess how many hours I have spent on this so far).
August 28th, 2012, 10:44 AM
I'm not entirely sure what you want. Would removing every newline character in your file, except those that come right before Andy, fit what you are trying to do?
August 28th, 2012, 11:04 AM
Yes. I have tried and failed to do that with something like:
$new_line =~ s/[/r|/n]// if $new_line !~ m/^Andy/;
August 28th, 2012, 02:02 PM
Originally Posted by jackvio
$new_line =~ s/[/r|/n]// if $new_line !~ m/$Andy/;
August 28th, 2012, 02:11 PM
Something like that should work, I think, but it really depends on what code you have around that.
But you have to correct some errors:
- It should be \n, not /n, and \r, not /r
- you don't need to put an alternation in a character class, so remove the "|" in the s/// statement. Also add a "g" modifier.
So, in brief, change your s/// statement to:
$new_line =~ s/[\r\n]//g if $new_line !~ m/^Andy/;
Actually, I doubt the "\r" is needed, but it depends which OS your file is coming from and on which OS you are processing it.
August 28th, 2012, 03:52 PM
Sorry - I am working on one machine and typing here on another (security situation). The exact line I have now (except that "Andy" is an obfuscation) is this:
$NewLine =~ s/\n//g if $NewLine !~ m/\nAndy/;
and somehow this removes all line breaks from the file. In vi, I checked that \nAndy matches exactly the lines that precede the ones beginning with Andy, i.e. those that should be skipped by the replacement expression. What the heck? I don't know if this is helpful, but in vi, in the output file, each line that was a single line in the input file now ends with a blue (other text is white) ^M character sequence. These do not appear in the input file.
August 29th, 2012, 02:45 AM
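Thinking about it, your code cannot work as you wish. It will not remove the line feeds at the right place. Assuming you read the file one line at a time, each $NewLine holds a single line whose only \n is at the very end, so m/\nAndy/ never matches and the s/// strips the newline from every line. The ^M you see in vi are most likely carriage returns (\r) from Windows-style line endings, which s/\n//g leaves behind.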
I think you should do it in two steps:
- First remove all line feed characters from your text
- Then add a line feed right before each occurrence of Andy.
Something like this, assuming all your text is in a multiline string variable $line:
$line =~ s/\n//g;
$line =~ s/Andy/\nAndy/g;
August 29th, 2012, 09:26 AM
perl -pe 's/\n/ /; $_ = qq(\n$_) if /^Andy/' test.txt > fixed.txt
August 29th, 2012, 07:04 PM
Thanks! That works perfectly.
Originally Posted by Laurent_R
August 30th, 2012, 09:01 AM
Removing newlines from the end of a string is better achieved using chomp.