SQLYoga: SQL SERVER: Find Duplicate Rows with Row

March 19, 2009

SQL SERVER: Find Duplicate Rows with Row_Number() SQL SERVER 2005

This is very known question to find Duplicate Rows from the Table and Delete Duplicate rows from the table
I came to know about "ROW_NUMBER()" of SQL SERVER 2005.
Lets see how we can use this feature:
There is one Table:

CREATE TABLE FindDuplicates(                
EmpID INT IDENTITY(1,1),                 
EmpName VARCHAR(500)                 
)

INSERT some data into the table:

INSERT INTO FindDuplicates VALUES('A')                
INSERT INTO FindDuplicates VALUES('A')                 
INSERT INTO FindDuplicates VALUES('B')                 
INSERT INTO FindDuplicates VALUES('C')                 
INSERT INTO FindDuplicates VALUES('C')                 
INSERT INTO FindDuplicates VALUES('D')                 
INSERT INTO FindDuplicates VALUES('D')

Now we have data like:
Find Duplicate Rows

Now, we need to Find the duplicate Rows from the table. So we need to get "A", "C" and "D".
Lets see the use of Row_Number():

;With CTE AS(                
SELECT ROW_NUMBER() OVER(PARTITION BY EmpName ORDER BY EmpName) As RowID,                 
EmpID,                 
EmpName                 
FROM FindDuplicates                 
)                 
SELECT *                 
FROM CTE                 
WHERE RowID > 1

Output will be like:
Find Duplicate Rows

So we can Delete Duplicate Rows as we have EmpID (Primary Key)
I Used:
PARTITION BY: which will find same rows and assign ID accordingly.
ORDER BY: In which order we want to assign ID. By Default its Ascending

6 comments:

AnonymousJuly 18, 2009 at 12:27 AM
If you have am identity column you don't need anything else:

CREATE TABLE FindDuplicates(
EmpID INT IDENTITY(1,1),
EmpName VARCHAR(500)
)

----INSERT some data into the table:

INSERT INTO FindDuplicates VALUES('A')
INSERT INTO FindDuplicates VALUES('A')
INSERT INTO FindDuplicates VALUES('B')
INSERT INTO FindDuplicates VALUES('C')
INSERT INTO FindDuplicates VALUES('C')
INSERT INTO FindDuplicates VALUES('D')
INSERT INTO FindDuplicates VALUES('D')
INSERT INTO FindDuplicates VALUES('D')
INSERT INTO FindDuplicates VALUES('E')
INSERT INTO FindDuplicates VALUES('E')
INSERT INTO FindDuplicates VALUES('E')
INSERT INTO FindDuplicates VALUES('E')

select
max(EmpID)
from
FindDuplicates
group by
EmpName
having
count(*)>1
while @@rowcount>0
delete from
FindDuplicates
where
EmpID in
(
select
max(EmpID)
from
FindDuplicates
group by
EmpName
having
count(*)>1
)

select * from FindDuplicates
ReplyDelete
Replies
Tejas ShahJuly 20, 2009 at 4:30 PM
Hi,

Yes we can also use the way that you suggested to remove duplicate rows on basis of IDENTITY column.

But when we need to identify the duplicate rows based on another columns too, then it better to use ROW_NUMBER.

You can also find that syntax simplicity with the use of ROW_NUMBER.

Tejas
ReplyDelete
Replies
PremJuly 1, 2010 at 12:59 PM
Hi All,

I want to display all duplicate records in the table. My query has to fetch all the records which are duplicate(First Name or Last Name). Also I want the ability to also pull names where there might be a middle initial placed in the end of the first name field, (i.e., "Maria Z. " vs. "Maria") as well.

Please guide me to find this.

Table:

ID FirstName LastName

1 Zach H Hoffman

2 Zach Hoffman

3 Troy Hoffman

4 Shawn Livermore

5 Prem S

6 Jony Hoffman H

7 Zach Modan

I need the query to filter.........

ID FirstName LastName

1 Zach H Hoffman

2 Zach Hoffman

3 Troy Hoffman

6 Jony Hoffman H

7 Zach Modan

I hope this example will give you clear idea.....

Thanks in Advance

Prem
ReplyDelete
Replies
Manthan DevaniMay 15, 2012 at 7:01 PM
Thanks for this article
ReplyDelete
Replies
UnknownAugust 20, 2012 at 8:02 PM
Can We Do The Following With Identity Column:

Remove Duplicates While Keeping One Row Of Each Duplicate(I Mean, On Encountering Duplicate Record, We Should Keep The One Row Out Of Those Duplicate Records).
ReplyDelete
Replies
Tejas ShahAugust 21, 2012 at 9:27 AM
Hi Singh,

Yes, we can do the same with the ROW_NUMBER. What we need to do is just write following query:

;With CTE AS(
SELECT ROW_NUMBER() OVER(PARTITION BY EmpName ORDER BY EmpName) As RowID,
EmpID,
EmpName
FROM FindDuplicates
)
DELETE
FROM CTE
WHERE RowID > 1

This above query, will keep row having RowID = 1 and duplicate rows will be removed from the table: "FindDuplicates".

Thanks,
Tejas
SQLYoga.com
ReplyDelete
Replies

Add comment