August 14th, 2013, 11:58 AM
Updating Rows and Merging Duplicates
I'm a relative beginner to MySql, and I think this should be a relatively simple question. However, I've searched online and on these forums for an answer for a while and I haven't found it.
Here's my problem, simplified to it's core.
I have a database of addresses. The full address (Number + Street name + city) is the key index.
Here's an example:
Number Street_name City
10 Main St Centerville
12 Main St Centerville
12 Main St Centervill
15 Main St Centerville
15 Main St Centervill
Here's the thing: all of the data entries without the 'e' at the end of 'Centerville' are mis-spellings. I need to clean the data. When I try to run a simple update, I am getting an error stating that there is duplication in the key index (the update script runs properly when there isn't duplication).
I have another table with data on what is at each location, using foreign keys to point to this table. I can't just delete the mis-spellings without losing the foreign keys.
Right now, I'm being told to manually trawl through the addresses and just insert a 0 before each address with a duplicate (thus making the key indexes different, and SQL will now accept the updates). The database has thousands of entries and over 150 misspellings, some of which are duplicated 100+ times. The process they want me to do is time-consuming, clunky, and we still end up with multiple entries for the same address.
Is there a way to update the city names and merge any resulting duplicates without losing the foreign keys?
Thanks for any help you can give me. Even if you just want to point me to a relevant article, that would help a ton.
August 15th, 2013, 04:51 AM
When you have foreign keys you must always traverse the dependency structure until you find the last table that doesn't depend on anyone else and fix the foreign key so that it points to the right reacord and then you can change the actual record in the "mother" table.
Originally Posted by JSVG
I would say you don't have to manually go through everything, if it was me I would probably run a few
SELECT someField, COUNT(*) FROM someTable GROUP BY someField
to get to know how many duplicates of different records you have (erroneous and correct).
And then I would probably write separate updates to fix a certain error.
This of course depends on that there are enough similar errors. If there is only on row wrong with each address all the writing of queries would probably take longer time than just fix the errors.
As I've only written this very briefly I hope that you can find it useful, otherwise just ask again.