#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2005
    Posts
    3
    Rep Power
    0

    Unix "Join" help please!!


    Hi, I'm new to all this, so bear with me.

    I have two lists. List A has the old names and some values for each of the names

    e.g.
    List A:
    snpa 3.2
    snpb -2
    snpc 0


    List B has the old names and their new-name equivalents:
    snpa rs1234
    snpb rs2345
    snpc rs3456

    Now I have over 50,000 lines in each and want it to end up as

    Final list after join.
    rs1234 3.2
    rs2345 -2
    rs3456 0

    The code I have been using:

    join fileA.txt fileB.txt | awk '{print $3, $2}' | sort -k1 > final_list.txt

    But all I get is the first few lines rather than the 50k lines I should get.

    Why is it working, but only for a small part of the data?
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Regular (2000 - 2499 posts)

    Join Date
    Mar 2006
    Posts
    2,448
    Rep Power
    1751
    Do you have a complete, one-to-one relationship between the two files? Remove the commands from the pipeline one at a time, starting from the right, and check the exit code each step produces (echo $?) to see where, if anywhere, an error is happening.
    Being memory based, it is possible that the join is losing some data.
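
    The stage-by-stage check suggested above could look roughly like the sketch below. The sample rows and the scratch directory are made up for illustration, and fileA.txt is left out of order on purpose, which is only one possible cause of missing lines. Whether join reports a non-zero status or prints a diagnostic for such input depends on the implementation, so the line counts are the more portable signal.

    ```shell
    # Scratch directory with a small made-up sample (not the poster's real data).
    tmpdir=$(mktemp -d)
    printf 'snpc 0\nsnpa 3.2\nsnpb -2\n' > "$tmpdir/fileA.txt"   # deliberately unsorted
    printf 'snpa rs1234\nsnpb rs2345\nsnpc rs3456\n' > "$tmpdir/fileB.txt"

    # Stage 1: join on its own. Capture the status without aborting the script.
    status=0
    join "$tmpdir/fileA.txt" "$tmpdir/fileB.txt" \
        > "$tmpdir/stage1.txt" 2> "$tmpdir/stage1.err" || status=$?
    echo "join exit status: $status"
    echo "lines in fileA:   $(wc -l < "$tmpdir/fileA.txt")"
    echo "lines after join: $(wc -l < "$tmpdir/stage1.txt")"
    cat "$tmpdir/stage1.err"    # any diagnostics join printed

    # Stage 2: add awk back and count again.
    status=0
    awk '{print $3, $2}' "$tmpdir/stage1.txt" > "$tmpdir/stage2.txt" || status=$?
    echo "awk exit status: $status"
    echo "lines after awk:  $(wc -l < "$tmpdir/stage2.txt")"
    ```

    If the count already drops after the first stage, the later stages (awk and sort) are not where the lines go missing.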
    The moon on the one hand, the dawn on the other:
    The moon is my sister, the dawn is my brother.
    The moon on my left and the dawn on my right.
    My brother, good morning: my sister, good night.
    -- Hilaire Belloc
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2005
    Posts
    3
    Rep Power
    0
    No, file B has more entries than file A.
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Regular (2000 - 2499 posts)

    Join Date
    Mar 2006
    Posts
    2,448
    Rep Power
    1751
    In that case your output file can contain at most as many lines as there are in file A (assuming each name appears only once in file B). That number is further reduced by any lines in file A that have no match in file B.
    What happens if you also use -a 1, -a 2 or -e Empty?
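
    As a rough illustration of those options, here is a sketch on a small made-up sample (the snpd and snpe rows are invented so that each file has one name the other lacks). Note that -e only fills in fields selected with -o, so the sketch adds an -o list; that list is an addition for the example, not something from the original command.

    ```shell
    # Sorted sample inputs; snpd exists only in A, snpe only in B (made-up rows).
    tmpdir=$(mktemp -d)
    printf 'snpa 3.2\nsnpb -2\nsnpd 7\n' > "$tmpdir/fileA.txt"
    printf 'snpa rs1234\nsnpb rs2345\nsnpe rs5678\n' > "$tmpdir/fileB.txt"

    # -a 1 and -a 2 also print lines that found no partner in the other file.
    # -o 0,1.2,2.2 selects: join field, value from A, new name from B.
    # -e Empty fills any selected field that is missing on an unpaired line.
    LC_ALL=C join -a 1 -a 2 -e Empty -o 0,1.2,2.2 \
        "$tmpdir/fileA.txt" "$tmpdir/fileB.txt" > "$tmpdir/all_rows.txt"

    cat "$tmpdir/all_rows.txt"
    ```

    Lines containing Empty mark names that exist in only one of the two files, which shows how many rows a plain join would silently drop.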
  8. #5
  9. Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Sep 2006
    Posts
    854
    Rep Power
    387

    Originally Posted by qnc
    No, file B has more entries than file A.
    And are they SORTED in the same key order?
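
    join expects both inputs to be sorted on the join field, using the same collation order, so a sketch of the sorted version of the pipeline might look like this. The sample rows come from the first post but are shuffled here on purpose; the scratch directory and the .sorted file names are assumptions for illustration.

    ```shell
    # Work in a scratch directory so nothing real is overwritten.
    tmpdir=$(mktemp -d)

    # Sample data from the first post, deliberately left unsorted.
    printf 'snpc 0\nsnpa 3.2\nsnpb -2\n' > "$tmpdir/fileA.txt"
    printf 'snpb rs2345\nsnpc rs3456\nsnpa rs1234\n' > "$tmpdir/fileB.txt"

    # Sort both files on the first field, forcing the same locale for
    # sort and join so they agree on the ordering.
    LC_ALL=C sort -k1,1 "$tmpdir/fileA.txt" > "$tmpdir/fileA.sorted"
    LC_ALL=C sort -k1,1 "$tmpdir/fileB.txt" > "$tmpdir/fileB.sorted"

    # Joined lines look like "snpa 3.2 rs1234"; print the new name, then the value.
    LC_ALL=C join "$tmpdir/fileA.sorted" "$tmpdir/fileB.sorted" \
        | awk '{print $3, $2}' \
        | LC_ALL=C sort -k1,1 > "$tmpdir/final_list.txt"

    cat "$tmpdir/final_list.txt"
    ```

    To test whether an existing file is already in order, sort -c -k1,1 fileA.txt exits with a non-zero status if it is not.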
