#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2013
    Posts
    1
    Rep Power
    0

    Updating Rows and Merging Duplicates


    Hey,

    I'm a relative beginner to MySql, and I think this should be a relatively simple question. However, I've searched online and on these forums for an answer for a while and I haven't found it.

    Here's my problem, simplified to it's core.

    I have a database of addresses. The full address (Number + Street name + city) is the key index.

    Here's an example:
    Number Street_name City
    10 Main St Centerville
    12 Main St Centerville
    12 Main St Centervill
    15 Main St Centerville
    15 Main St Centervill


    Here's the thing: all of the data entries without the 'e' at the end of 'Centerville' are mis-spellings. I need to clean the data. When I try to run a simple update, I am getting an error stating that there is duplication in the key index (the update script runs properly when there isn't duplication).

    I have another table with data on what is at each location, using foreign keys to point to this table. I can't just delete the mis-spellings without losing the foreign keys.

    Right now, I'm being told to manually trawl through the addresses and just insert a 0 before each address with a duplicate (thus making the key indexes different, and SQL will now accept the updates). The database has thousands of entries and over 150 misspellings, some of which are duplicated 100+ times. The process they want me to do is time-consuming, clunky, and we still end up with multiple entries for the same address.

    Is there a way to update the city names and merge any resulting duplicates without losing the foreign keys?

    Thanks for any help you can give me. Even if you just want to point me to a relevant article, that would help a ton.
  2. #2
  3. No Profile Picture
    Problem Solver
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    Jan 2001
    Location
    Stockholm, Sweden
    Posts
    4,495
    Rep Power
    534
    Originally Posted by JSVG
    I have another table with data on what is at each location, using foreign keys to point to this table. I can't just delete the mis-spellings without losing the foreign keys.
    When you have foreign keys you must always traverse the dependency structure until you find the last table that doesn't depend on anyone else and fix the foreign key so that it points to the right reacord and then you can change the actual record in the "mother" table.

    I would say you don't have to manually go through everything, if it was me I would probably run a few
    SELECT someField, COUNT(*) FROM someTable GROUP BY someField
    to get to know how many duplicates of different records you have (erroneous and correct).
    And then I would probably write separate updates to fix a certain error.
    This of course depends on that there are enough similar errors. If there is only on row wrong with each address all the writing of queries would probably take longer time than just fix the errors.

    As I've only written this very briefly I hope that you can find it useful, otherwise just ask again.
    Good Luck!
    /Stefan

IMN logo majestic logo threadwatch logo seochat tools logo