#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2012
    Posts
    5
    Rep Power
    0

    Finding duplicates based on multiple columns AND deleting the second occurrence.


    Good morning.

    I currently have the following MySQL query that finds rows in a table that have identical values for "tutor", "startTime", "endTime", and "date":

    select tutor, date, startTime, endTime, count(*) cnt from reservations group by date, tutor, startTime, endTime having cnt>1 order by cnt asc;
    I would like to expand this query to delete all but the first occurrence of each of the duplicates.

    In the table, the only difference between the two rows is a field called "appointmentID". Therefore, the end result of the query should be that the first appointmentID row is kept while any subsequent duplicate rows are deleted.

    Is this possible? I've searched here and through many other sites and, while I've found numerous examples for finding and deleting duplicates, none have dealt with doing so based on the values in several columns.

    Thanks so much!

    Richard
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Mar 2008
    Posts
    1,928
    Rep Power
    378
    Which one's first?

    Think before you answer!
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2012
    Posts
    5
    Rep Power
    0
    Originally Posted by cafelatte
    Which one's first?

    Think before you answer!
    Hi Cafe Latte!

    Thank you for writing!

    The "first" one would be defined as the one with the lowest "appointmentID" as sorted numerically.

    Is this what you're asking?

    Richard
  6. #4
  7. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2012
    Posts
    6
    Rep Power
    0
    Originally Posted by heyrichard
    I currently have the following MySQL query that finds rows in a table that have identical values for "tutor", "startTime", "endTime", and "date":

    I would like to expand this query to delete all but the first occurrence of each of the duplicates.
    Hopefully this will be a job carried out only once! After you have taken care of this, you should add a unique constraint on those columns in your table. Otherwise, you'll have to clean things up over and over again...
  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2012
    Posts
    5
    Rep Power
    0
    Originally Posted by bobhairgrove
    Hopefully this will be a job carried out only once! After you have taken care of this, you should add a unique constraint on those columns in your table. Otherwise, you'll have to clean things up over and over again...
    It is a one time job, and it's not my original code. But, I'm still looking for ideas on the find and delete if anyone has any!
  10. #6
  11. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2012
    Posts
    6
    Rep Power
    0
    Try this:
    Code:
    CREATE TABLE tmp_reservations AS 
    SELECT 
      `date`, -- reserved word ... I would choose another name
      tutor, 
      startTime, 
      endTime, 
      MIN(appointmentID) AS appointmentID
    FROM reservations 
    GROUP BY `date`, tutor, startTime, endTime;
    
    TRUNCATE TABLE reservations;
    
    INSERT INTO reservations (appointmentID,`date`,tutor,startTime,endTime)
    SELECT appointmentID,`date`,tutor,startTime,endTime
    FROM tmp_reservations;
    
    DROP TABLE tmp_reservations;
    Caveat emptor, I didn't try this at home!
  12. #7
  13. No Profile Picture
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Mar 2008
    Posts
    1,928
    Rep Power
    378
    Or just...
    Code:
    DELETE x 
      FROM my_table x
      JOIN my_table y
        ON y.id = x.id
       AND y.date < x.date;
  14. #8
  15. SQL Consultant
    Devshed Supreme Being (6500+ posts)

    Join Date
    Feb 2003
    Location
    Toronto Canada
    Posts
    27,163
    Rep Power
    4274
    Originally Posted by cafelatte
    Or just...
    um... no

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
  16. #9
  17. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2012
    Posts
    5
    Rep Power
    0
    Originally Posted by r937
    um... no

    Since there doesn't seem to be a good solution without creating a new table, is there a way, in the original request, to have MySQL output both duplicated appointmentIDs? That way, I could manually construct a deletion run, I guess...
  18. #10
  19. SQL Consultant
    Devshed Supreme Being (6500+ posts)

    Join Date
    Feb 2003
    Location
    Toronto Canada
    Posts
    27,163
    Rep Power
    4274
    Originally Posted by heyrichard
    Since there doesn't seem to be a good solution without creating a new table...
    i dispute this assertion

    cafelatte was on the right track but didn't get the self-join conditions right
    Code:
    DELETE x 
      FROM my_table x
      JOIN my_table y
        ON y.tutor = x.tutor
       AND y.date = x.date
       AND y.startTime = x.startTime
       AND y.endTime = x.endTime
       AND y.id < x.id;
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
  20. #11
  21. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2012
    Posts
    5
    Rep Power
    0
    Originally Posted by r937
    i dispute this assertion
    Thank you for disputing and helping! I'm out to play with this now. I appreciate it and understand how it works, so thank you.

IMN logo majestic logo threadwatch logo seochat tools logo