#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2012
    Posts
    5
    Rep Power
    0

    Matching multiple lines with fixed starting strings and intervening other strings


    I am a perl newbie, a grep novice, and I can't figure out how to handle the following.

    Say I have a file that contains info about Andy, Bill and Chris, like this.

    Andy has a dog.
    Bill has a cat.
    Chris has a frog.
    Andy is a chemist.
    Bill is a painter.
    Chris is a pilot.
    ... and so on.

    My goal is to create a new file like this:

    Andy has a dog. Bill has a cat. Chris has a frog.
    Andy is a chemist. Bill is a painter. Chris is a pilot.

    This I can do, assuming that none of the relevant data is on intervening lines, i.e. lines that do not start with Andy, Bill or Chris. The data guarantees that everything between an Andy line and a Bill line, between a Bill line and a Chris line, or between a Chris line and an Andy line is part of the description of the first of each of these pairs. So, e.g., parts of the file look like this.

    Andy is tall
    and likes football
    and baseball.
    Bill is friendly
    and jocular.
    Chris is cheerful
    but lonesome.
    Andy is...

    and the result I want is this:

    Andy is tall and likes football and baseball. Bill is friendly and jocular. Chris is cheerful but lonesome. [a single line]
    Andy is... [a new line]

    Any pointers anyone could provide would be awesome (I would rather not confess how many hours I have spent on this so far).
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Location
    Paris area, France
    Posts
    843
    Rep Power
    496
    I'm not entirely sure what you want. Would removing every newline character in your file, except where it comes right before Andy, do what you are trying to do?
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2012
    Posts
    5
    Rep Power
    0
    Yes. I have tried and failed to do that with something like:

    $new_line =~ s/[/r|/n]// if $new_line !~ m/^Andy/;
  6. #4
  7. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2012
    Posts
    5
    Rep Power
    0
    Originally Posted by jackvio
    Yes. I have tried and failed to do that with something like:

    $new_line =~ s/[/r|/n]// if $new_line !~ m/^Andy/;

    I "meant":

    $new_line =~ s/[/r|/n]// if $new_line !~ m/$Andy/;
  8. #5
  9. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Location
    Paris area, France
    Posts
    843
    Rep Power
    496
    Something like that should work, I think, but it really depends on what code you have around that.

    But you have to correct some errors:
    - It should be \n, not /n, and \r, not /r.
    - You don't need an alternation inside a character class, so remove the "|" from the s/// statement. Also add the "g" modifier.

    So, in brief, change your s/// statement to:

    Code:
    s/[\r\n]//g
    Actually, I doubt the "\r" is needed, but that depends on which OS your file comes from and which OS you are processing it on.
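
    A minimal sketch of the corrected substitution, assuming the input has Windows-style CRLF line endings. The helper name strip_line_endings and the sample string are illustrative only and do not come from the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every carriage return and line feed.
sub strip_line_endings {
    my ($text) = @_;
    # A character class needs no "|"; the /g modifier removes all matches,
    # not just the first one.
    $text =~ s/[\r\n]//g;
    return $text;
}

# Illustrative sample with CRLF line endings.
my $sample = "Andy has a dog.\r\nBill has a cat.\r\n";
print strip_line_endings($sample), "\n";
```

    Note that this joins the lines with nothing in between; substituting a space instead of the empty string would keep the sentences apart.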
  10. #6
  11. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2012
    Posts
    5
    Rep Power
    0
    Sorry - I am working on one machine and typing here on another (security situation). The exact line I have now (except that "Andy" is an obfuscation) is this:

    $NewLine =~ s/\n//g if $NewLine !~ m/\nAndy/;

    and somehow this removes all line breaks from the file. In vi, I checked that \nAndy matches exactly the lines that precede the ones beginning with Andy, i.e. those that should be skipped by the replacement expression. What the heck? I don't know if this is helpful, but in vi, in the output file, each line that was a single line in the input file now ends with a blue (other text is white) ^M character sequence. These do not appear in the input file.
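
    A small sketch of one likely cause, assuming the file is read one line at a time (the surrounding loop is not shown in this thread, so that is a guess). Each line then holds a single newline at its very end, so the pattern \nAndy can never match inside it and the substitution runs on every line. The helper name count_skipped and the sample lines are illustrative.

```perl
use strict;
use warnings;

# Each element mimics what a line-by-line read returns:
# one line of text with its newline at the end.
my @lines = (
    "Andy is tall\n",
    "and likes football\n",
    "Bill is friendly\n",
    "Andy is...\n",
);

# Hypothetical helper: count the lines that the condition
# "!~ m/\nAndy/" would protect from the substitution.
sub count_skipped {
    my @input = @_;
    my $skipped = 0;
    for my $line (@input) {
        # True only if a newline is immediately followed by "Andy"
        # inside the same string.
        $skipped++ if $line =~ m/\nAndy/;
    }
    return $skipped;
}

print "lines protected from the substitution: ", count_skipped(@lines), "\n";
```

    The ^M characters are carriage returns. If the input has CRLF line endings, removing only \n leaves the \r behind, which would explain why they show up in the output.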
  12. #7
  13. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Location
    Paris area, France
    Posts
    843
    Rep Power
    496
    Thinking about it, your code cannot work as you wish. It will not remove the line feeds at the right place.

    I think you should do it in two steps:
    - First remove all line feed characters from your text
    - Then add a line feed right before each occurrence of Andy.

    Something like this, assuming all your text is in a multiline string variable $line:

    Perl Code:
    $line =~ s/\n//g;
    $line =~ s/Andy/\nAndy/g;
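
    The two steps can also be sketched as a small self-contained script. This variant rests on some assumptions that are not in the code above: the line endings are replaced with a space (to match the desired output), CRLF input is tolerated, and a break is restored only before "Andy " when it follows a space. The helper name regroup and the sample text are illustrative.

```perl
use strict;
use warnings;

# Hypothetical helper: produce one output line per "Andy ..." record.
sub regroup {
    my ($text) = @_;
    $text =~ s/\r?\n/ /g;        # step 1: turn every line ending (LF or CRLF) into a space
    $text =~ s/ (?=Andy )/\n/g;  # step 2: put a line break back before each "Andy "
    $text =~ s/ \z/\n/;          # end the last record with a newline instead of a space
    return $text;
}

# Illustrative input, modelled on the example in the first post.
my $input = "Andy is tall\nand likes football\nand baseball.\n"
          . "Bill is friendly\nand jocular.\n"
          . "Chris is cheerful\nbut lonesome.\n"
          . "Andy is...\n";

print regroup($input);
```

    Like the original two-step version, this also breaks the line wherever "Andy " appears in the middle of a sentence, so it is only safe if the name occurs solely at the start of a record.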
  14. #8
  15. No Profile Picture
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Apr 2009
    Posts
    1,967
    Rep Power
    1225
    Code:
    perl -pe 's/\n/ /; $_ = qq(\n$_) if /^Andy/' test.txt > fixed.txt
  16. #9
  17. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2012
    Posts
    5
    Rep Power
    0
    Originally Posted by Laurent_R
    Thinking about it, your code cannot work as you wish. It will not remove the line feeds at the right place.

    I think you should do it in two steps:
    - First remove all line feed characters from your text
    - Then add a line feed right before each occurrence of Andy.

    Something like this, assuming all your text is in a multiline string variable $line:

    Perl Code:
    $line =~ s/\n//g;
    $line =~ s/Andy/\nAndy/g;
    Thanks! That works perfectly.
  18. #10
  19. kill 9, $$;
    Devshed Supreme Being (6500+ posts)

    Join Date
    Sep 2001
    Location
    Shanghai, An tSín
    Posts
    6,898
    Rep Power
    3887
    Removing newlines from the end of a string is better achieved using chomp.
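
    A sketch of how chomp could be used here, assuming the lines are processed one at a time. The helper name join_records and the sample data are illustrative; the extra \r removal is only needed if the input has CRLF line endings and $/ is a plain "\n".

```perl
use strict;
use warnings;

# Hypothetical helper: join lines with spaces, starting a new
# output line whenever an input line begins with "Andy".
sub join_records {
    my @lines = @_;
    my $out = '';
    for my $line (@lines) {
        chomp $line;               # drop the trailing newline, if there is one
        $line =~ s/\r\z//;         # chomp leaves the CR of a CRLF pair when $/ is "\n"
        if (length $out) {
            # Separator: newline before an Andy line, space otherwise.
            $out .= ($line =~ /^Andy\b/) ? "\n" : ' ';
        }
        $out .= $line;
    }
    return "$out\n";
}

# Illustrative input, modelled on the example in the first post.
print join_records(
    "Andy has a dog.\n",
    "Bill has a cat.\n",
    "Chris has a frog.\n",
    "Andy is a chemist.\n",
);
```

    Because the test is anchored with ^, an "Andy" in the middle of a line does not start a new record.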
