SQL Yoga: SQL SERVER: Find Duplicate Rows with Row_Number() SQL SERVER 2005

March 19, 2009

SQL SERVER: Find Duplicate Rows with Row_Number() SQL SERVER 2005

 

This is very known question to find Duplicate Rows from the Table and Delete Duplicate rows from the table

I came to know about "ROW_NUMBER()" of SQL SERVER 2005.

Lets see how we can use this feature:

There is one Table:

CREATE TABLE FindDuplicates(
EmpID INT IDENTITY(1,1),
EmpName VARCHAR(500)
)

INSERT some data into the table:

INSERT INTO FindDuplicates VALUES('A')
INSERT INTO FindDuplicates VALUES('A')
INSERT INTO FindDuplicates VALUES('B')
INSERT INTO FindDuplicates VALUES('C')
INSERT INTO FindDuplicates VALUES('C')
INSERT INTO FindDuplicates VALUES('D')
INSERT INTO FindDuplicates VALUES('D')

Now we have data like:

Find Duplicate Rows

Now, we need to Find the duplicate Rows from the table. So we need to get "A", "C" and "D".

Lets see the use of Row_Number():

;With CTE AS(
SELECT ROW_NUMBER() OVER(PARTITION BY EmpName ORDER BY EmpName) As RowID,
EmpID,
EmpName
FROM FindDuplicates
)
SELECT *
FROM CTE
WHERE RowID > 1

Output will be like:

Find Duplicate Rows

So we can Delete Duplicate Rows as we have EmpID (Primary Key)

I Used:

PARTITION BY: which will find same rows and assign ID accordingly.

ORDER BY: In which order we want to assign ID. By Default its Ascending

6 comments :

Anonymous said...

If you have am identity column you don't need anything else:

CREATE TABLE FindDuplicates(
EmpID INT IDENTITY(1,1),
EmpName VARCHAR(500)
)

----INSERT some data into the table:

INSERT INTO FindDuplicates VALUES('A')
INSERT INTO FindDuplicates VALUES('A')
INSERT INTO FindDuplicates VALUES('B')
INSERT INTO FindDuplicates VALUES('C')
INSERT INTO FindDuplicates VALUES('C')
INSERT INTO FindDuplicates VALUES('D')
INSERT INTO FindDuplicates VALUES('D')
INSERT INTO FindDuplicates VALUES('D')
INSERT INTO FindDuplicates VALUES('E')
INSERT INTO FindDuplicates VALUES('E')
INSERT INTO FindDuplicates VALUES('E')
INSERT INTO FindDuplicates VALUES('E')


select
max(EmpID)
from
FindDuplicates
group by
EmpName
having
count(*)>1
while @@rowcount>0
delete from
FindDuplicates
where
EmpID in
(
select
max(EmpID)
from
FindDuplicates
group by
EmpName
having
count(*)>1
)

select * from FindDuplicates

Tejas Shah said...

Hi,

Yes we can also use the way that you suggested to remove duplicate rows on basis of IDENTITY column.

But when we need to identify the duplicate rows based on another columns too, then it better to use ROW_NUMBER.

You can also find that syntax simplicity with the use of ROW_NUMBER.

Tejas

Prem said...

Hi All,

I want to display all duplicate records in the table. My query has to fetch all the records which are duplicate(First Name or Last Name). Also I want the ability to also pull names where there might be a middle initial placed in the end of the first name field, (i.e., "Maria Z. " vs. "Maria") as well.

Please guide me to find this.



Table:

ID FirstName LastName

1 Zach H Hoffman

2 Zach Hoffman

3 Troy Hoffman

4 Shawn Livermore

5 Prem S

6 Jony Hoffman H

7 Zach Modan



I need the query to filter.........

ID FirstName LastName

1 Zach H Hoffman

2 Zach Hoffman

3 Troy Hoffman

6 Jony Hoffman H

7 Zach Modan



I hope this example will give you clear idea.....




Thanks in Advance

Prem

Manthan Devani said...

Thanks for this article

Ravinder Pal Singh said...

Can We Do The Following With Identity Column:

Remove Duplicates While Keeping One Row Of Each Duplicate(I Mean, On Encountering Duplicate Record, We Should Keep The One Row Out Of Those Duplicate Records).

Tejas Shah said...

Hi Singh,

Yes, we can do the same with the ROW_NUMBER. What we need to do is just write following query:

;With CTE AS(
SELECT ROW_NUMBER() OVER(PARTITION BY EmpName ORDER BY EmpName) As RowID,
EmpID,
EmpName
FROM FindDuplicates
)
DELETE
FROM CTE
WHERE RowID > 1

This above query, will keep row having RowID = 1 and duplicate rows will be removed from the table: "FindDuplicates".

Thanks,
Tejas
SQLYoga.com

Post a Comment