Friday, August 29, 2014

Delete duplicate data from redshift table.

Purpose:

I am tired searching a way to eliminate duplicate data in redshift tables.

Here I am with a simple query to avoid duplicate records in redshift.
 

SELECT * FROM (
SELECT *, ROW_NUMBER() OVER ( PARTITION BY column1,column2,column3,... ORDER BY sortcolumn ASC ) rownum 
FROM tablename)
WHERE rownum = 1;

ORDER BY clause places a major role in this query. Usually, there is a single column which is responsible for the duplicate records. Use that column in the ORDER BY clause.

How this query avoids duplicate?

Follow the below approach to avoid to the duplicate in records or delete duplicate records.

Consider the below Table "Employee" which has two duplicate records



EmpId
EmpName
Salary
1
AAAA
20000
2
BBBB
30000
2
BBBB
30000
3
CCCC
25000
4
DDDD
20000
4
DDDD
20000
 

Now create a temporary table of the same schema and insert data into that temp table using the below query


INSERT into Employee_temp 
SELECT EmpId, EmpName, Salary 
From(
     SELECT *, 
                ROW_NUMBER() OVER ( PARTITION BY EmpId,EmpName,Salary ORDER BY EmpId ASC ) rownum 
     FROM
Employee)
WHERE rownum = 1;


Now the temp table will have unique data as below



EmpId
EmpName
Salary
1
AAAA
20000
2
BBBB
30000
3
CCCC
25000
4
DDDD
20000

Now drop the Employee table and Rename the Employee_temp table to Employee.

Hope this should avoid duplicate records or delete duplicate records.


2 comments:

  1. Thanks a lot for putting this together! I spent hours trying to get this done using a different method, in vain.
    Just one thing though, your first query:
    Select *, ROW_NUMBER() OVER ( PARTITION BY column1,column2,column3,... ORDER BY sortcolumn ASC ) rownum
    FROM tablename
    WHERE rownum = 1;

    Doesn't run on Rehshift. I had to change it to:
    Select * from (
    Select *, ROW_NUMBER() OVER ( PARTITION BY column1,column2,column3,... ORDER BY sortcolumn ASC ) rownum
    FROM tablename)
    WHERE rownum = 1;

    ReplyDelete
    Replies
    1. Thanks for the feedback. I have updated my blog

      Delete