#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2009
    Posts
    34
    Rep Power
    9

    Using Perl to merge specific lines of two different files


    Hi,
    If I have two files, how can I merge them based on the content they share?
    For example, File1:
    Fly1 Hum11 Ali
    Fly3 Hum7 Abu
    Fly7 Hum36 Amy

    File2:
    Fly1 Hum11 Ali overeating 4
    Fly2 Ta9 Ahmad stupid 5
    Fly4 Hu10 Pat overlazy 12
    Fly3 Hum7 Abu oversleeping 32
    Fly7 Hum36 Amy light-sensing 22

    Desired output result:
    Fly1 Hum11 Ali overeating 4
    Fly3 Hum7 Abu oversleeping 32
    Fly7 Hum36 Amy light-sensing 22


    My senior advised me to start from the following command line and modify it:


    perl -e ' $col1=1; $col2=0; ($f1,$f2)=@ARGV; open(F2,$f2); while (<F2>) { s/\r?\n//; @F=split /\t/, $_; $line2{$F[$col2]} .= "$_\n" }; $count2 = $.; open(F1,$f1); while (<F1>) { s/\r?\n//; @F=split /\t/, $_; $x = $line2{$F[$col1]}; if ($x) { $num_changes = ($x =~ s/^/$_\t/gm); print $x; $merged += $num_changes } } warn "\nJoining $f1 column $col1 with $f2 column $col2\n$f1: $. lines\n$f2: $count2 lines\nMerged file: $merged lines\n"; ' file1 file2


    Does anybody have a better idea for modifying this command line to generate the desired output?

    The desired output should contain only those lines of file2 whose first three columns exactly match a line in file1.

    I would appreciate any advice on modifying the command line above, or any better command line or script that solves this problem.

    Thanks a lot for your advice.
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2006
    Posts
    177
    Rep Power
    237
    Why don't you ask your senior to explain it to you? It's faster that way, isn't it?
  4. #3
    Originally Posted by ghostdog74
    Why don't you ask your senior to explain it to you? It's faster that way, isn't it?
    Actually, my senior only gave me some cues as tips for solving this problem.
    I have just started learning Perl, so I am asking the forum for more explanation and suggestions.
    I hope to learn from all of your advice.
    Do you have any idea how to solve this problem?
    Thanks a lot.
  6. #4
    Code:
    # grep -i -f file1 file2
    Fly1 Hum11 Ali overeating 4
    Fly3 Hum7 Abu oversleeping 32
    Fly7 Hum36 Amy light-sensing 22
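Note that `grep -f file1` treats every line of file1 as a pattern and matches it anywhere in a line of file2, so the pattern `Fly1 Hum11 Ali` would also select a line that begins `Fly1 Hum11 Alice`. If an exact match on the first three columns is required, a hash lookup in Perl is one option. The following is only a minimal sketch: the function name `filter_by_key` is hypothetical, the data is the sample from the first post, and columns are assumed to be separated by whitespace.

```perl
use strict;
use warnings;

# Return the lines of @$lines2 whose first $n whitespace-separated
# fields exactly match the first $n fields of some line in @$lines1.
# The output keeps the order of @$lines2.
sub filter_by_key {
    my ($lines1, $lines2, $n) = @_;
    my %wanted;
    for my $line (@$lines1) {
        my @f = split ' ', $line;
        next if @f < $n;
        $wanted{ join "\t", @f[0 .. $n - 1] } = 1;
    }
    my @out;
    for my $line (@$lines2) {
        my @f = split ' ', $line;
        next if @f < $n;
        push @out, $line if $wanted{ join "\t", @f[0 .. $n - 1] };
    }
    return @out;
}

# Sample data from the first post.
my @file1 = ("Fly1 Hum11 Ali", "Fly3 Hum7 Abu", "Fly7 Hum36 Amy");
my @file2 = (
    "Fly1 Hum11 Ali overeating 4",
    "Fly2 Ta9 Ahmad stupid 5",
    "Fly4 Hu10 Pat overlazy 12",
    "Fly3 Hum7 Abu oversleeping 32",
    "Fly7 Hum36 Amy light-sensing 22",
);
print "$_\n" for filter_by_key(\@file1, \@file2, 3);
```

Each file is read once and every lookup is a single hash access, so the running time should grow roughly linearly with the size of the two files.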
  8. #5
    Originally Posted by ghostdog74
    Code:
    # grep -i -f file1 file2
    Fly1 Hum11 Ali overeating 4
    Fly3 Hum7 Abu oversleeping 32
    Fly7 Hum36 Amy light-sensing 22
    Hi,
    If file1 and file2 each contain a long list of entries, will your command line (grep -i -f file1 file2) still work?
    Thanks a lot for your explanation ^_^
  10. #6
  11. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2009
    Location
    Charlotte, NC
    Posts
    111
    Rep Power
    12
    I find that when you have a delimited file, both grep and join are good solutions. The Perl command you're trying is a bit complex for the command line.
    Code:
    join -t" " -o 1.1,1.2,1.3,2.4,2.5 file1 file2
  12. #7
    Originally Posted by ldapswandog
    I find that when you have a delimited file, both grep and join are good solutions. The Perl command you're trying is a bit complex for the command line.
    Code:
    join -t" " -o 1.1,1.2,1.3,2.4,2.5 file1 file2
    Hi, I have already tried the command line you suggested.
    It runs, but it doesn't generate my desired output.
    Do you have any better suggestion?
    Thanks a lot for your advice.
  14. #8
    Originally Posted by ghostdog74
    Code:
    # grep -i -f file1 file2
    Fly1 Hum11 Ali overeating 4
    Fly3 Hum7 Abu oversleeping 32
    Fly7 Hum36 Amy light-sensing 22
    Hi,
    I have already tried the command line you suggested.
    It takes a long time on larger data files.
    Do you have any suggestion for speeding up the command line you gave?
    I think grep and join both take a long time on large files;
    awk or Perl may work better on huge files.
  16. #9
    I have edited your Perl command; see how it works on a large file. You can compare the execution times of the different commands provided so far with the 'time' utility.
    Code:
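    perl -e '
      # Join on one column of each file (0-based index); here the second column.
      $col1=1;
      $col2=1;
      $merged=0;
      ($f1,$f2)=@ARGV;
      open(F2,$f2) or die "Cannot open $f2: $!";
      open(F1,$f1) or die "Cannot open $f1: $!";
      # Index every line of the second file by its key column.
      while (<F2>) {
        chomp;
        @F2=split /\s/, $_;
        $line2{$F2[$col2]} .= "$_\n"
      }
      $count2 = $.;
      # Print the stored lines whose key also appears in the first file.
      while (<F1>) {
        chomp;
        @F1=split /\s/, $_;
        $x = $line2{$F1[$col1]};
        if (defined $x) {
          print $x;
          $merged += ($x =~ tr/\n//);
        }
      }
      warn "\nJoining $f1 column $col1 with $f2 column $col2\n$f1: $. lines\n$f2: $count2 lines\nMerged file: $merged lines\n";
    ' test1 test2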
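The 'time' utility measures a whole command from the shell. To time individual steps inside a Perl script instead, one option (not what the post above uses) is the core Time::HiRes module. A minimal sketch follows; the helper name `time_it` and the workload are made up for illustration.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Run a code reference once and return the elapsed wall-clock seconds.
sub time_it {
    my ($code) = @_;
    my $start = [gettimeofday];
    $code->();
    return tv_interval($start);
}

# Illustrative workload: build a hash of 100_000 made-up keys,
# then look every key up again.
my $elapsed = time_it(sub {
    my %seen;
    $seen{"key$_"} = 1 for 1 .. 100_000;
    my $hits = grep { $seen{"key$_"} } 1 .. 100_000;
});
printf "hash build and lookup took %.3f s\n", $elapsed;
```

Wrapping the read loop of each file in its own `time_it` call would show which step dominates on large inputs.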
    Last edited by ldapswandog; April 6th, 2009 at 05:46 PM.
  18. #10
    Originally Posted by ldapswandog
    I have edited your Perl command; see how it works on a large file. You can compare the execution times of the different commands provided so far with the 'time' utility.
    Hi,
    The command line you suggested doesn't generate the desired output.
    Do you have any better suggestion?
    Thanks a lot for your advice.
  20. #11
    The script works for me. You now have three methods of joining the two files. I suggest you now spend some time learning how they work and then modify them to get the result you need. Have fun!
    Last edited by ldapswandog; April 7th, 2009 at 08:38 PM.
  22. #12
    Originally Posted by ldapswandog
    The script works for me. You now have three methods of joining the two files. I suggest you now spend some time learning how they work and then modify them to get the result you need. Have fun!
    Really?
    I will try running it again.
    Thanks a lot for your advice and suggestions.
    Have a nice day.
  24. #13
    The Perl script splits each line on whitespace: see the '\s' in "@F2=split /\s/, $_;". Your original split on {tab} '\t' only, so you may need to change the delimiter to match your data. Have fun learning.
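The effect of the delimiter can be checked directly. Below is a small sketch using one line of the thread's sample data in a space-separated and a tab-separated form; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $spaced = "Fly1 Hum11 Ali overeating 4";
my $tabbed = "Fly1\tHum11\tAli\tovereating\t4";

# /\t/ splits on tabs only, so a space-delimited line stays in one piece.
my @a = split /\t/, $spaced;
# /\s/ splits on any single whitespace character, so both lines split.
my @b = split /\s/, $spaced;
my @c = split /\s/, $tabbed;
# Two adjacent separators yield an empty field with /\s/ ...
my @d = split /\s/, "Fly1  Hum11";
# ... whereas the special pattern ' ' treats a run of whitespace as one separator.
my @e = split ' ', "Fly1  Hum11";

# Number of fields in each case; prints: 1 5 5 3 2
printf "%d %d %d %d %d\n", scalar @a, scalar @b, scalar @c, scalar @d, scalar @e;
```

If the input files might mix tabs and repeated spaces, `split ' '` is usually the more forgiving choice.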
  26. #14
    Originally Posted by ldapswandog
    The Perl script splits each line on whitespace: see the '\s' in "@F2=split /\s/, $_;". Your original split on {tab} '\t' only, so you may need to change the delimiter to match your data. Have fun learning.
    Hi, I think I get what you mean now.
    I hope it works now.
    Thanks a lot for the reminder.
    Although I am still new to Perl, I find it quite interesting.
  28. #15
  29. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2009
    Posts
    4
    Rep Power
    0
    Here is a Perl solution that I think is easy to understand.

    ------------------------------------------------------------------------

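    use strict;
    use warnings;

    my %h = ();

    # Step 1: store each line of file1 as a key.
    open(my $f1, '<', 'file1') or die "Cannot open file1: $!";
    while (<$f1>) {
        chomp;
        $h{$_} = '';
    }
    close($f1);

    # Step 2: for each line of file2, split off the first three
    # tab-separated columns; if they match a key, store the rest.
    open(my $f2, '<', 'file2') or die "Cannot open file2: $!";
    while (<$f2>) {
        chomp;
        if (/^(\S+\t\S+\t\S+)\t(.*)/) {
            if (exists $h{$1}) {
                $h{$1} = $2;
            }
        }
    }
    close($f2);

    # Step 3: print the results, sorted by key.
    for my $k (sort keys %h) {
        print $k . "\t" . $h{$k} . "\n";
    }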


    ------------------------------------------------------------------------

    In summary:

    1. Read file1. Store each line as a key in %h.

    2. Read file2. Check whether the first three columns of each line match a key in %h. If so, store the rest of the line as the value for that key.

    3. Print the results.
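The three steps can also be sketched as a function that accepts any whitespace as the column separator and prints only the keys that were actually found in file2 (the script above also prints file1 lines that have no match). This is a variant under those assumptions, not the poster's code: the name `merge_on_first_three` is hypothetical, in-memory filehandles stand in for the two files, and the `Fly9` line is made up to show an unmatched key.

```perl
use strict;
use warnings;

# Hypothetical helper: takes two open filehandles and returns the matching
# lines of the second, with columns re-joined by tabs.
sub merge_on_first_three {
    my ($fh1, $fh2) = @_;
    my %h;
    # Step 1: store the first three columns of each file1 line as a key.
    while (my $line = <$fh1>) {
        my @f = split ' ', $line;
        $h{ join "\t", @f[0 .. 2] } = undef if @f >= 3;
    }
    # Step 2: when a file2 line has a known key, keep the remaining columns.
    while (my $line = <$fh2>) {
        my @f = split ' ', $line;
        next if @f < 3;
        my $key = join "\t", @f[0 .. 2];
        $h{$key} = join("\t", @f[3 .. $#f]) if exists $h{$key};
    }
    # Step 3: return only the keys that received a value, sorted by key.
    return map { "$_\t$h{$_}" } grep { defined $h{$_} } sort keys %h;
}

# In-memory stand-ins for file1 and file2; the Fly9 line is invented.
my $data1 = "Fly1 Hum11 Ali\nFly3 Hum7 Abu\nFly9 Hum1 Zed\n";
my $data2 = "Fly3 Hum7 Abu oversleeping 32\n"
          . "Fly2 Ta9 Ahmad stupid 5\n"
          . "Fly1 Hum11 Ali overeating 4\n";
open(my $fh1, '<', \$data1) or die "Cannot open in-memory file: $!";
open(my $fh2, '<', \$data2) or die "Cannot open in-memory file: $!";
print "$_\n" for merge_on_first_three($fh1, $fh2);
```

With real files, the two `open` calls would take the file names instead of the scalar references.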
