#1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2011
    Posts
    6
    Rep Power
    0

    Merging Files in Unix


    We have two sets of files; let's say they are in directories a and b, with file names a1 a2 a3 a4 ... and b1 b2 b3 b4.

    What I need to do is create a third set of files in directory c, in which the files from directories a and b are merged. I need to compare each line of file1 in directory a with file1 in directory b and write the result into file1 of directory c. I need this automated in a shell script. Can this be done? Any suggestions?

    Example:

    file1 of a:
    ffffff eee ccc r 12 ddd fff k

    file1 of b:
    ccc bbbb zzz nnn eeeee aaaaaaaaa 3

    file1 of c:
    12 3 aaaaaaaaa bbbb ccc ddd eee eeeee fff ffffff k nnn r zzz
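A minimal sketch of the merge in the example above, assuming "merge" means what the sample output shows: every word from both files, sorted in byte order with duplicate words removed, joined back into one line. The function names merge_words and merge_dirs are hypothetical, and the a1/b1/c1 naming is taken from the question.

```shell
#!/bin/bash
# Hypothetical helper: merge the words of two files into one line,
# sorted in byte order (LC_ALL=C) with duplicate words removed.
# $1 = file from directory a, $2 = matching file from directory b
# awk prints each whitespace-separated word on its own line,
# sort -u sorts them and drops repeats, paste -s rejoins them with spaces.
merge_words() {
  awk '{ for (i = 1; i <= NF; i++) print $i }' "$1" "$2" |
    LC_ALL=C sort -u |
    paste -s -d ' ' -
}

# Hypothetical driver, assuming names a1 a2 ... map to b1 b2 ... and c1 c2 ...
merge_dirs() {
  local dira=$1 dirb=$2 dirc=$3 f n
  for f in "$dira"/a*; do
    [ -f "$f" ] || continue      # skip sub-directories and an unmatched glob
    n=${f##*/}                   # strip the directory, e.g. "a12"
    n=${n#a}                     # strip the prefix, e.g. "12"
    merge_words "$f" "$dirb/b$n" > "$dirc/c$n"
  done
}
```

Usage would be something like merge_dirs /dira /dirb /dirc. Splitting with awk rather than tr means a file without a trailing newline does not glue its last word onto the first word of the next file.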
#2
    Contributing User
    Devshed Regular (2000 - 2499 posts)

    Join Date
    Mar 2006
    Posts
    2,436
    Rep Power
    1688
    If you don't care about error handling, you should be able to do this with a simple script and some awk. Find a simple way to identify the fixed parts of the file names, and do the work in a loop driven by the list of file names in either directory a or b. Inside the loop, build the file names for the other directories (b or a, plus c). Use the -v option to pass awk the name of the secondary file; then, for each line of the driving file, grab the next line of the secondary file, build the needed output line and print it.
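For what it's worth, POSIX awk can read a second file itself with getline var < file, so the -v approach described above can be sketched as follows. The names pair_lines and pair_dirs are hypothetical, the a1/b1/c1 naming is assumed from the question, and the output line here is simply the two input lines side by side; replace the print with whatever field order you need.

```shell
#!/bin/bash
# Hypothetical helper: pair line N of the driving file with line N of a
# secondary file, which awk reads itself via "getline var < file".
# $1 = driving file (directory a), $2 = secondary file (directory b)
pair_lines() {
  awk -v other="$2" '
    {
      # getline returns 1 for a line, 0 at end of file, -1 on error
      if ((getline line < other) <= 0)
        line = ""
      # build the output line; change this to order the fields as needed
      print (line == "" ? $0 : $0 OFS line)
    }' "$1"
}

# Hypothetical driver, assuming names a1 a2 ... map to b1 b2 ... and c1 c2 ...
pair_dirs() {
  local dira=$1 dirb=$2 dirc=$3 f n
  for f in "$dira"/a*; do
    [ -f "$f" ] || continue      # skip sub-directories and an unmatched glob
    n=${f##*/}
    n=${n#a}
    pair_lines "$f" "$dirb/b$n" > "$dirc/c$n"
  done
}
```

If the secondary file runs out first, the remaining lines of the driving file are printed on their own.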
    The moon on the one hand, the dawn on the other:
    The moon is my sister, the dawn is my brother.
    The moon on my left and the dawn on my right.
    My brother, good morning: my sister, good night.
    -- Hilaire Belloc
#3
    Thanks a lot for the guidance. I am working on it now and will update the thread with my progress. Any chance you have a general sample of what the script could look like?
#4
    Rats - sorry, I don't know why but I thought you could get awk to read a different file to the one being used to drive the process. It's been too long since I have done this!
    But, fear not - there will be a way, and almost certainly a better or more efficient way than what I am thinking of!

    Code:
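    dir_a="/dira"
    dir_b="/dirb"
    dir_c="/dirc"
    dir_t="/tmp"
    
    # the temp files can be set up once, outside the loop
    tfile="${dir_t}/tempfile.tmp"
    t1="${dir_t}/t1.tmp"
    t2="${dir_t}/t2.tmp"
    for aname in "${dir_a}"/*
    do
      [ -f "$aname" ] || continue   # skip sub-directories
      fname="${aname##*/}"
    # get the filename parts here, using whatever is needed
    # (for names like a1/b1/c1 this could be: bprefix="b"; cprefix="c"; anumber="${fname#a}")
      bprefix=???
      cprefix=???
      anumber=???
      bname="${dir_b}/${bprefix}${anumber}"
      cname="${dir_c}/${cprefix}${anumber}"
    # add a zero-padded dummy line number to each line: join needs its key
    # sorted as text, and an unpadded "10" would sort before "2"
      awk '{printf "%06d %s\n", NR, $0}' "$aname" > "$t1"
      awk '{printf "%06d %s\n", NR, $0}' "$bname" > "$t2"
      join "$t1" "$t2" > "${tfile}"
    # order the output fields as needed; $1 is the dummy line number
      awk '{print $6, $16, ...}' "$tfile" > "$cname"
    done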
    We cheat by adding a dummy line number to each file and using it as the join key to pair up the lines, then run awk over the joined temporary file to order the fields as you wish, remembering that $1 will be the dummy number we added.
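A swapped-in alternative to the line-number/join trick, assuming all you need is line N of one file next to line N of the other: paste pairs lines by position, with no dummy key and no temp files, and where a plain join drops unmatched lines, paste treats the missing lines of a shorter file as empty. The function name pair_and_reorder and the field order in its awk are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: pair line N of file a with line N of file b using paste,
# then reorder the fields with awk.
# $1 = file from directory a, $2 = matching file from directory b
# paste -d ' ' glues the two lines together with a single space.
# The awk field order ($NF, $1) is only an example: list the fields you need.
pair_and_reorder() {
  paste -d ' ' "$1" "$2" | awk '{ print $NF, $1 }'
}
```

Unlike the join version, $1 here is the first field of the line from directory a, since no line number is added.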
#5
    If the directories and files used in the script below already exist, would this work? The files would contain content on multiple lines, like:

    aaa
    aaa


    bbb
    bbb

    11
    11

    Would this simple script work to take the files from dira and dirb and create a new file in dirc whose content is sorted and has no duplicates?

    #!/bin/bash

    #must have three directories (dira dirb dirc)

    #must have 6 files (/dira/aone /dira/atwo /dira/athree /dirb/bone /dirb/btwo /dirb/bthree)

    for filename in one two three

    do
    cat /dira/a$filename /dirb/b$filename > /dirc/c$filename

    $ cat /dirc/c$filename

    sort | uniq /dirc/c$filename

    $ cat /dirc/c$filename


    done


    thanks!
#6
    Ok, we have a movable feast - requirements have changed somewhat!

    Try this (untested!):
    Code:
    #!/bin/bash
    
    #must have three directories (dira dirb dirc)
    #must have 6 files (/dira/aone /dira/atwo /dira/athree /dirb/bone /dirb/btwo /dirb/bthree)
    
    for filename in one two three
    do
      # sort reads both files itself; -u drops duplicate lines, -o names the output file
      sort -u -o "/dirc/c$filename" "/dira/a$filename" "/dirb/b$filename"
    done
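A quick way to check the sort -u approach on sample content like that in post #5. The function name merge_unique is hypothetical, and LC_ALL=C is added only to make the output order predictable. It also shows why the line sort | uniq /dirc/c$filename in the question could not do the job: with no file operand, sort waits for standard input, while uniq reads the file, prints to the screen and only removes adjacent duplicates, so the file in dirc is never changed.

```shell
#!/bin/bash
# Hypothetical helper: merge two files into a third, sorted, duplicate lines removed.
# $1 = file from dira, $2 = file from dirb, $3 = output file in dirc
# sort reads both files itself (no cat needed), -u keeps one copy of each
# distinct line, and -o writes the result to the named file.
merge_unique() {
  LC_ALL=C sort -u -o "$3" -- "$1" "$2"
}
```

Because sort reads all of its input before writing, -o may even name one of the input files.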
#7
    This is what I think will work, as long as the newly combined file has an even number of characters and all the characters are on one line. Let me know what you think, and whether there is another piece of code that would solve the even-number-of-characters problem. Thanks again for all your input.

    #!/bin/bash
    # all files must contain an even number of characters or else this shell will error out
    y=1
    echo "Input the # of file(s) in each directory"
    read x
    while [ $y -le $x ]
    do
    filea="/dira/a"$y
    fileb="/dirb/b"$y
    filetemp="/temp/temp"$y
    filec="/dirc/c"$y

    paste $filea $fileb > $filetemp
    tsort $filetemp |sort -d |awk '{ str1=str1 $0 " "}END{ print str1 }' > $filec
    cat $filec

    done
#8
    Sad to say, it won't work, since you don't seem to be incrementing $y inside your (currently infinite) loop.
    Not sure why you're doing two sorts (tsort and then sort), but the rest seems to be OK.
    I don't know why the 'even number of characters' problem is happening, nor where ...!
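On the 'even number of characters' puzzle: tsort performs a topological sort and reads its input as pairs of whitespace-separated items (each pair meaning the first item comes before the second), so it rejects input with an odd number of words, not characters. Since the aim is just a sorted list of the distinct words, a sketch that splits out the words itself and lets sort -u do the rest avoids that restriction. The function name sorted_words is hypothetical, and LC_ALL=C is only there to make the order predictable.

```shell
#!/bin/bash
# Hypothetical tsort-free replacement: print the distinct words of the given
# files on one line, sorted. Works for any number of words, odd or even.
# awk prints every word (split on spaces or tabs) on its own line,
# sort -u sorts them and drops repeats, paste -s joins them with single spaces.
sorted_words() {
  awk '{ for (i = 1; i <= NF; i++) print $i }' "$@" |
    LC_ALL=C sort -u |
    paste -s -d ' ' -
}
```

It would take the place of the tsort line, e.g. sorted_words "$filetemp" > "$filec"; add -d to the sort if you want the dictionary order the original used.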
#9
    How about now? Will it run now with the newly added second to last line?

    #!/bin/bash
    # all files must contain an even number of characters or else this shell will error out
    y=1
    echo "Input the # of file(s) in each directory"
    read x
    while [ $y -le $x ]
    do
    filea="/dira/a"$y
    fileb="/dirb/b"$y
    filetemp="/temp/temp"$y
    filec="/dirc/c"$y

    paste $filea $fileb > $filetemp
    tsort $filetemp |sort -d |awk '{ str1=str1 $0 " "}END{ print str1 }' > $filec
    cat $filec
    y=$(( $y + 1 ))
    done
#10
    Yes, that should work. Four comments:
    You are not checking that the number entered is actually numeric.
    If you want to process all the files, it would be better to drive the loop from the files rather than from user input.
    The temp file can be defined just once, outside the loop, since it is, after all, temporary and always overwritten.
    The awk could be rewritten as awk '{ printf("%s ", $0) } END { print "" }', which may be better if there are a lot of lines to be appended.
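A sketch of the script with the four suggestions applied, assuming the a1/b1/c1 naming from the thread: the loop is driven by the files in the a directory (so there is no typed-in count), one temp file is created with mktemp (widely available, though not POSIX) outside the loop, and the final awk uses the printf form. As a swap, the tsort step is replaced by plain word splitting, since tsort rejects odd word counts. The names merge_all and is_number are hypothetical.

```shell
#!/bin/bash
# Hypothetical check for suggestion 1, if you keep a typed-in count:
# succeeds only for a non-empty string of digits.
is_number() { case $1 in ''|*[!0-9]*) return 1 ;; *) return 0 ;; esac; }

# Hypothetical rewrite: $1 = directory a, $2 = directory b, $3 = directory c
merge_all() {
  local dira=$1 dirb=$2 dirc=$3 filea fileb n tmp
  tmp=$(mktemp) || return 1            # one temp file, reused for every pair
  for filea in "$dira"/a*; do          # loop driven by the files themselves
    [ -f "$filea" ] || continue        # skip sub-directories and an unmatched glob
    n=${filea##*/}
    n=${n#a}                           # "a12" -> "12"
    fileb="$dirb/b$n"
    [ -f "$fileb" ] || { echo "missing $fileb" >&2; continue; }
    paste "$filea" "$fileb" > "$tmp"
    # one word per line, sort with duplicates removed, then append with printf
    awk '{ for (i = 1; i <= NF; i++) print $i }' "$tmp" |
      LC_ALL=C sort -u |
      awk '{ printf("%s ", $0) } END { print "" }' > "$dirc/c$n"
  done
  rm -f "$tmp"
}
```

Like the original str1 version, each output line ends with a trailing space before the newline.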
#11
    Thanks, everything worked out successfully! I really appreciate all the feedback.
#12
    Glad it all worked out!
#13
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2011
    Posts
    99
    Rep Power
    0
    I use Unix very rarely, but it's good to know in case I need to merge files in Unix in the future. Thanks.
