|
|
|||||||||
|
|||||||||
| |||||||||
|
|
|
| |||||||||
![]() |
|
|
«
Previous Thread
|
Next Thread
»
|
Thread Tools | Search this Thread | Rate Thread | Display Modes |
|
#1
|
|||
|
|||
|
How can I delete all but one instance of a duplicate row?
I have several rows in a table that contain duplicate information. I want to trim them down to one instance per row, without other duplicates.
I have several columns, but only 2 of the columns are used to determine the "Identical Rows", however one of the other columns could also be used to determine which of the others are deleted. Let's just say I have a table called DATA. In DATA I have the following columns: C1: ID/Key column (Unique numbers not associated with any other row) C2: Specific Data column 1 C3: Specific Data Column 2 C3: Keeper if not null C4: Misc Info C5: Misc Info Basically IF C2 & C3 are duplicated, then I want to keep one of these and ditch the others. The row I want to keep is the one that has info in C4, but at times C4 does not have anything in it on any of them, at that point I just care about keeping one of the C2 & C3 Duplicate rows. The other issue I face is C2 will always have something in it, but at times C3 will not and C4 will. So I want to keep the one duplicate row that has the most info in it. At minimum, I want to always keep C2 & C3 if they both contain data. For example, if I had the following duplicate rows: (C2) red (C3) 123 (C4) XYZ (C2) red (C3) 123 (C4) XYZ (C2) red (C3) 123 (C4) XYZ (C2) red (C3) (C4) XYZ (C2) red (C3) 123 (C4) I want to keep: (C2) red (C3) 123 (C4) XYZ I have some rows that have three exact duplicates, some that have four, some that have two. How can I tell the Query Manager to delete all but one instance of each duplicate row? I did do a query to find out how many duplicates I have and found it to be around 232,000+. Basically I will be cutting this in half if I can get rid of the duplicates. I have read 30 different ways on many sites and I am now officially confused Thank you for your help. Last edited by confuzed2xover : October 7th, 2004 at 11:10 PM. Reason: Realized another issue |
|
#2
|
|||
|
|||
|
Well I have gotten this far with the above question:
Code:
select C1,C2,C3, C4,C5,C6,C7,C8, C9 from DATA WHERE EXISTS (Select C2, C3, C4 from DATA GROUP BY C2, C3,C4 HAVING count(*) > 1) ORDER BY C2, C3 Where I am having issues is turning this into a statement to delete all but one row of each unique row. I want to keep one of the rows and delete the other duplicates. Any ideas? |
|
#3
|
||||
|
||||
|
which one do you want to keep?
you have to have some rule, like pick the row with the largest C4 or something |
|
#4
|
|||
|
|||
|
Still confused
Where I am having the issue is.............
I want to keep one of the rows, but I want to keep the row that has the most information in it. If I have a row that has C2, C3, & C4 populated, and that has three other exact matches, I want to keep one of those rows and delete the others. If I have a row that has C2, C3, C4, & C5 that has several other matches with C2, C3, & C4, then I want to keep the row that has C2, C3, C4, & C5 populated. I know this sounds kind of confusing, but I'll explain what I have and why....... I have a table that is populated with several columns of information about parts. C2 is the part number column C3 is the alternate number C4 is the description C5 is the type of part C6 is the manufacturer C7 is the manufacturer code C8 is the Revision or the Series/make/model. Basically I am trying to keep the one row that has the most information in it, and delete the other rows. All rows will have at least part number & manufacturer. From there is where it gets more detailed. Basically I want to keep one row with the most information in it. The other problem I face is that though part number and manufactuer will be consistent, there are many times I will have the same part number with several different alternate part numbers. I need to keep one row of each that has the most information in it with the alternate part number as well. Okay, that probably just got confusing which is why I am having problems figuring out how to do this. Some part numbers do not have or will never have alternate part numbers, but some have many different alternate part numbers for the same part number, that is why I have a seperate column for the alternate part numbers. This way when I look up a part number, I can see all the alternate numbers that one part number has as well, or if I look up the alternate, I can see the industry common part number. So with that in mind, I want to have only one reference of a particular part number, alternate, and description at minimum. Though there are several duplicates that are just part number & manufacturer & type. What I am faced with is multiple scenerios of duplicates that are a wide range of exact matches. It could be C2, C3, C4, & C5 that match several, it could be C2, C4, C5, & C8 that have exact matches, etc. My main objective is to keep one row that has the most information of a particular part number & or alternate part number & manufacturer. |
|
#5
|
||||
|
||||
|
okay, i have an idea
do an ordinary SELECT, pull out every row, and assign each row a "score" this score will be based on how many columns have information save the result into a table then you can delete all rows from the main table where they match a row in the saved table that has a lower than maximum score for its group make sense? |
|
#6
|
|||
|
|||
|
Not being a SQL guru or anything, I will have to research the "Score" thing, but that starts me in the right direction.
I think I can see where you are going with this and it sort of makes sense, though I will have to research how to get a score for each, I think I can see how this would work. I'll hammer on it...................... Thank you so much for your input, it is invaluable. I really appreciate your suggestions, you have helped me more than once and I am very grateful. |
|
#7
|
||||
|
||||
|
okay, good luck
remember to use CASE expressions, and you can nest them if necessary e.g. Code:
select case when C4 is not null
then case when C5 is null
then 3
else 5
end
else 12
end as score
from yourtable
|
![]() |
| Viewing: Dev Shed Forums > Databases > MS SQL Development > How can I delete all but one instance of a duplicate row? |
| Thread Tools | Search this Thread |
| Display Modes | Rate This Thread |
|
|
|
|