Tuesday, July 31, 2012

Removing Duplicates in MySQL

I inherited a database that had some performance problems.

To improve performance on one of the tables we added a few indexes based on common searches for that table. The problem was that there were a number of duplicates that prevented creation of the indexes.

To fix this problem I ran the following query all at once in phpmyadmin as temporary tables only last the duration of the connection.



CREATE TEMPORARY TABLE bad_temp AS SELECT DISTINCT * FROM bad_table;
 
DELETE FROM bad_table;

INSERT INTO bad_table(email,nid,timestamp) SELECT email,nid,timestamp FROM bad_temp;

DROP TABLE bad_temp;

This took care of the problem and I was able to add the index.
ALTER TABLE `good_table` 
ADD PRIMARY KEY (`email`, `nid`, `timestamp`)
A nod to Database Journal that gave me the idea.