A usual situation we come across in development: a table missed a unique key and allowed duplicate rows to be entered unchecked. Now we want to find and delete those duplicates.
In this article, we’ll see how to achieve this.
Consider the table dup_emp with these values:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
| SQL> select * 2 from dup_emp 3 order by empno; EMPNO ENAME SAL ---------- -------------- ---------- 1 Adam 400 2 Sandy 300 2 Ted 450 3 Mark 450 4 Alan 450 4 Carol 200 4 Peter 250 5 David 350 8 rows selected. |
As you can see, there are two entries for empno 2 and three entries for empno 4. We want each empno to correspond to a single employee only.
The query below finds all duplicate records i.e. all empnos with more than one entry in the table:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
| SQL> select * 2 from 3 ( select d.* 4 , count (*) over 5 (partition by empno) cnt 6 from dup_emp d 7 ) 8 where cnt > 1; EMPNO ENAME SAL CNT ---------- ------ ---- ---------- 2 Sandy 300 2 2 Ted 450 2 4 Alan 450 3 4 Carol 200 3 4 Peter 250 3 |
Deleting the duplicate records
The usual need is to delete all but one such row. The first thing is to know the deciding factor -out of the duplicates, which one is to be retained?
The next SQL retains the row with the minimum rowid:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
| SQL> delete from dup_emp a 2 where rowid > 3 ( select min (rowid) 4 from dup_emp b 5 where a.empno = b.empno); 3 rows deleted. SQL> select * 2 from dup_emp; EMPNO ENAME SAL ---------- ------ ---- 1 Adam 400 2 Sandy 300 3 Mark 450 4 Alan 450 5 David 350 |
Change the WHERE condition according to your needs.
This comment has been removed by the author.
ReplyDeleteThanks for the post :-) This Qs was asked to me in interview
ReplyDelete