December 10th, 2013, 08:20 AM
Remove duplicates in flat files
I have an issue while loading a flat file into the DB: the load is taking a long time.
When I analyzed it, I found that there are duplicate entries in the flat file.
There are 2 types of duplicate entries.
1) The entire row is a duplicate. (I can use sort | uniq to remove these.)
2) The columns that make up the composite primary key (PK) are the same for 2 records, but the other columns differ. The loader rejects these as well, so only one of them gets loaded. Please find an example below.
My PK columns are 1, 4, 6 and 8 of the flat file that is going to be loaded into the DB.
Column names: 1 2 3 4 5 6 7 8 9 10
Record 1:     a b c d e f g h i j
Record 2:     a k l d m f n h o p
Since only the PK columns match and the remaining columns differ, the loader omits these records. Can you suggest a script with which I can drop record 2 from the file?
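For reference, the behaviour being asked for could be sketched roughly as follows in Python. This is only a minimal sketch under several assumptions that the post does not confirm: the file is whitespace-delimited as in the example, the first record seen for each key is the one to keep, and lines with too few fields are passed through unchanged. The names dedupe_by_key and KEY_COLUMNS are made up for illustration. Rows that are duplicated in full are dropped as well, since they share the same key.

```python
# Hypothetical helper: keep only the first record for each composite key.
KEY_COLUMNS = (1, 4, 6, 8)  # 1-based PK column positions, taken from the post


def dedupe_by_key(lines, key_columns=KEY_COLUMNS, delimiter=None):
    """Yield the first line seen for each composite key, in input order.

    delimiter=None splits on runs of whitespace (an assumption about the
    file format); pass e.g. "|" or "," for other flat-file layouts.
    """
    seen = set()
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) < max(key_columns):
            # Assumption: blank or short lines are passed through untouched.
            yield line
            continue
        key = tuple(fields[i - 1] for i in key_columns)
        if key not in seen:
            seen.add(key)
            yield line


# Demo with the two records from the example above.
sample = [
    "a b c d e f g h i j\n",
    "a k l d m f n h o p\n",
]
for kept in dedupe_by_key(sample):
    print(kept, end="")
```

To apply this to a real file, an open file object can be passed in place of the sample list and the yielded lines written to a new file. Note that the set of keys is held in memory, so memory use grows with the number of distinct keys.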
Please help. We are on the brink of a deadline, and these issues need to be fixed before tomorrow evening.
Thanks in advance