How to Delete Duplicate Rows in SQL?

Ayushi Trivedi Last Updated : 28 Aug, 2024

Introduction

Managing databases often means dealing with duplicate records that can complicate data analysis and operations. Whether you’re cleaning up customer lists, transaction logs, or other datasets, removing duplicate rows is vital for maintaining data quality. This guide will explore practical techniques for deleting duplicate rows in SQL databases, including detailed syntax and real-world examples to help you efficiently address and eliminate these duplicates.

Overview

Identify the common causes of duplicate records in SQL databases.
Discover various methods to pinpoint and remove duplicate entries.
Understand SQL syntax and practical approaches for duplicate removal.
Learn best practices to ensure data integrity while cleaning up duplicates.

How to Delete Duplicate Rows in SQL?

Removing duplicate rows in SQL can be achieved through several methods. Each approach has its own advantages depending on the database system you’re using and the specific needs of your task. Below are some effective techniques for deleting duplicate records.

Common Causes of Duplicate Rows

Duplicate rows can appear in your database for several reasons:

Data Entry Errors: Human mistakes during data input.
Merging Datasets: Combining data from multiple sources without proper de-duplication.
Improper Import Procedures: Incorrect data import processes can lead to duplication.

Identifying Duplicate Rows

Before deleting duplicates, you need to locate them. A row counts as a duplicate when it shares identical values with another row in the columns that should be unique, such as an email address. Here’s how to identify such duplicates:

Syntax:

SELECT column1, column2, COUNT(*)
FROM table_name
GROUP BY column1, column2
HAVING COUNT(*) > 1;

Example:

Suppose you have a table employees with the following data:

id	name	email
1	Alice	[email protected]
2	Bob	[email protected]
3	Carol	[email protected]
4	Alice	[email protected]
5	Dave	[email protected]

To find duplicate emails:

SELECT email, COUNT(*)
FROM employees
GROUP BY email
HAVING COUNT(*) > 1;

Output:

email	COUNT(*)
[email protected]	2

This query identifies emails that appear more than once in the table.
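
The counts alone do not show which rows are involved. As a minimal sketch, assuming SQLite accessed through Python's built-in sqlite3 module (the article does not name a database engine) and placeholder email addresses that are not taken from the original data, the same GROUP BY ... HAVING filter can serve as a subquery that lists every row behind a duplicated email:

```python
import sqlite3

# In-memory SQLite database; the engine and the email values are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO employees (id, name, email) VALUES (?, ?, ?)",
    [
        (1, "Alice", "alice@example.com"),
        (2, "Bob", "bob@example.com"),
        (3, "Carol", "carol@example.com"),
        (4, "Alice", "alice@example.com"),
        (5, "Dave", "dave@example.com"),
    ],
)

# Return the full rows whose email occurs more than once.
duplicate_rows = conn.execute(
    """
    SELECT id, name, email
    FROM employees
    WHERE email IN (
        SELECT email
        FROM employees
        GROUP BY email
        HAVING COUNT(*) > 1
    )
    ORDER BY email, id
    """
).fetchall()

print(duplicate_rows)
```

Seeing the ids side by side makes it easier to decide which copy to keep before any DELETE is run.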

Deleting Duplicates Using `ROW_NUMBER()`

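A powerful method for removing duplicates involves the ROW_NUMBER() window function, which assigns a sequential number to each row within a partition. Partitioning by the columns that define a duplicate gives the first row in each group rn = 1 and every extra copy rn > 1. Deleting directly through the CTE, as in the syntax below, works in SQL Server; engines such as PostgreSQL and MySQL do not accept a CTE as the target of a DELETE. ORDER BY (SELECT NULL) leaves the choice of surviving row arbitrary, so order by a real column such as id when you care which copy is kept.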

Syntax:

WITH CTE AS (
    SELECT column1, column2, 
           ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY (SELECT NULL)) AS rn
    FROM table_name
)
DELETE FROM CTE
WHERE rn > 1;

Example:

To eliminate duplicate rows from the employees table based on email:

WITH CTE AS (
    SELECT id, name, email, 
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
    FROM employees
)
DELETE FROM CTE
WHERE rn > 1;

Output:

After running the above query, the table will be cleaned up, resulting in:

id	name	email
1	Alice	[email protected]
2	Bob	[email protected]
3	Carol	[email protected]
5	Dave	[email protected]

The duplicate row with id = 4 has been removed.
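
Where a DELETE cannot be aimed at the CTE itself, one possible variation uses the CTE only to pick the ids to remove. The sketch below is an illustration under stated assumptions: SQLite 3.25 or later (needed for window functions) through Python's sqlite3 module, and placeholder email addresses that do not come from the original table.

```python
import sqlite3

# In-memory SQLite database; the engine and the email values are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO employees (id, name, email) VALUES (?, ?, ?)",
    [
        (1, "Alice", "alice@example.com"),
        (2, "Bob", "bob@example.com"),
        (3, "Carol", "carol@example.com"),
        (4, "Alice", "alice@example.com"),
        (5, "Dave", "dave@example.com"),
    ],
)

# Number the rows inside each email group, lowest id first, then delete
# every row that is not the first of its group.
conn.execute(
    """
    WITH ranked AS (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
        FROM employees
    )
    DELETE FROM employees
    WHERE id IN (SELECT id FROM ranked WHERE rn > 1)
    """
)

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM employees ORDER BY id")]
print(remaining_ids)
```

Because the DELETE targets the base table by id, this form relies on the table having a unique key to identify each row.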

Deleting Duplicates Using a Self Join

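Another effective strategy uses a self join: each row is matched against other rows with the same values, and the comparison on id decides which copy survives. With t1.id > t2.id, every row that has a lower-id twin is deleted, so the earliest row in each group is kept; reversing the comparison keeps the latest row instead. The DELETE ... FROM ... JOIN form shown here is accepted by MySQL and SQL Server; other engines need a different form, such as DELETE ... USING in PostgreSQL.

Syntax:

DELETE t1
FROM table_name t1
JOIN table_name t2
ON t1.column1 = t2.column1
AND t1.column2 = t2.column2
AND t1.id > t2.id;

Example:

To remove duplicate entries from the employees table, keeping the row with the lowest id for each email:

DELETE e1
FROM employees e1
JOIN employees e2
ON e1.email = e2.email
AND e1.id > e2.id;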

Output:

After executing this query, the table will look like:

id	name	email
1	Alice	[email protected]
2	Bob	[email protected]
3	Carol	[email protected]
5	Dave	[email protected]

The row with id = 4 is deleted, leaving only unique entries.
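
On engines without a DELETE ... JOIN form, a correlated EXISTS subquery can express the same self-comparison. This is a sketch of that swapped-in technique, not the join syntax above, and it assumes SQLite through Python's sqlite3 module with placeholder email addresses:

```python
import sqlite3

# In-memory SQLite database; the engine and the email values are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO employees (id, name, email) VALUES (?, ?, ?)",
    [
        (1, "Alice", "alice@example.com"),
        (2, "Bob", "bob@example.com"),
        (3, "Carol", "carol@example.com"),
        (4, "Alice", "alice@example.com"),
        (5, "Dave", "dave@example.com"),
    ],
)

# Delete a row when another row with the same email and a lower id exists.
# The lowest id in each group never matches, so it is the copy that is kept.
conn.execute(
    """
    DELETE FROM employees
    WHERE EXISTS (
        SELECT 1
        FROM employees AS keeper
        WHERE keeper.email = employees.email
          AND keeper.id < employees.id
    )
    """
)

remaining = conn.execute("SELECT id, name FROM employees ORDER BY id").fetchall()
print(remaining)
```

Flipping the comparison to keeper.id > employees.id would keep the highest id in each group instead.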

Deleting Duplicates Using `DISTINCT` in a New Table

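Sometimes it is simpler to build a new table that contains only unique records and swap it in for the old one. SELECT DISTINCT compares every selected column, so it only collapses rows that match in all of them. Keep in mind that CREATE TABLE ... AS SELECT generally copies the data but not the original table's indexes, primary keys, or other constraints, so these have to be recreated on the new table.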

Syntax:

CREATE TABLE new_table AS
SELECT DISTINCT *
FROM old_table;

DROP TABLE old_table;

ALTER TABLE new_table RENAME TO old_table;

Example:

To clean up duplicates in the employees table, leave id out of the select list. Rows 1 and 4 differ in id, so SELECT DISTINCT * would treat them as different rows and keep both:

CREATE TABLE employees_unique AS
SELECT DISTINCT name, email
FROM employees;

DROP TABLE employees;

ALTER TABLE employees_unique RENAME TO employees;

Output:

The new employees table will now have:

name	email
Alice	[email protected]
Bob	[email protected]
Carol	[email protected]
Dave	[email protected]

The employees table is now free of duplicates, at the cost of the original id column.
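
If the id column has to survive the rebuild, DISTINCT alone is not enough, because SELECT DISTINCT * sees rows 1 and 4 as different. One possible variation, sketched here with GROUP BY and MIN(id) in place of DISTINCT, keeps the lowest id for each name and email pair. It assumes SQLite through Python's sqlite3 module and placeholder email addresses that are not from the original data.

```python
import sqlite3

# In-memory SQLite database; the engine and the email values are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO employees (id, name, email) VALUES (?, ?, ?)",
    [
        (1, "Alice", "alice@example.com"),
        (2, "Bob", "bob@example.com"),
        (3, "Carol", "carol@example.com"),
        (4, "Alice", "alice@example.com"),
        (5, "Dave", "dave@example.com"),
    ],
)
conn.commit()

# DISTINCT over all columns keeps both Alice rows because their ids differ.
distinct_all = conn.execute("SELECT DISTINCT * FROM employees").fetchall()
# DISTINCT over name and email collapses them into one row.
distinct_no_id = conn.execute("SELECT DISTINCT name, email FROM employees").fetchall()

# Rebuild the table, keeping the lowest id per (name, email) pair.
conn.executescript(
    """
    CREATE TABLE employees_unique AS
    SELECT MIN(id) AS id, name, email
    FROM employees
    GROUP BY name, email;

    DROP TABLE employees;

    ALTER TABLE employees_unique RENAME TO employees;
    """
)

rebuilt = conn.execute("SELECT id, name, email FROM employees ORDER BY id").fetchall()
print(len(distinct_all), len(distinct_no_id))
print(rebuilt)
```

The rebuilt table has no primary key, since CREATE TABLE ... AS SELECT in SQLite copies only column names and data.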

Best Practices for Avoiding Duplicates

Implement Data Validation Rules: Ensure data is validated before insertion.
Use Unique Constraints: Apply unique constraints to columns to prevent duplicate entries.
Regular Data Audits: Periodically check for duplicates and clean data to maintain accuracy.
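
As an illustration of the unique-constraint practice, the following sketch declares email as UNIQUE and shows the database rejecting a second row with the same address. It assumes SQLite through Python's sqlite3 module; the table definition and the email value are hypothetical.

```python
import sqlite3

# In-memory SQLite database; schema and values are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, email TEXT UNIQUE)"
)

insert_sql = "INSERT INTO employees (name, email) VALUES (?, ?)"
conn.execute(insert_sql, ("Alice", "alice@example.com"))

# A second insert with the same email violates the UNIQUE constraint.
try:
    conn.execute(insert_sql, ("Alice", "alice@example.com"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(rejected, row_count)
```

A constraint like this can only be added to an existing table after the current duplicates have been removed.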

Conclusion

Effectively managing duplicate rows is a crucial aspect of database maintenance. By using methods like ROW_NUMBER(), self joins, or creating new tables, you can efficiently remove duplicates and maintain a clean dataset. Each method offers different advantages depending on your needs, so select the one that best suits your specific scenario. Always remember to back up your data before performing any deletion operations to safeguard against accidental loss.

Frequently Asked Questions

Q1. What are some common reasons for duplicate rows in SQL databases?

A. Duplicates can arise from data entry errors, issues during data import, or incorrect merging of datasets.

Q2. How can I avoid accidentally deleting important data when removing duplicates?

A. Make sure to back up your data before performing deletions and carefully review your queries to target only the intended records.
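
One way to review a deletion before it becomes permanent is to run it inside a transaction, inspect how many rows it touched, and roll back if the number looks wrong. The sketch below only demonstrates the rollback path. It assumes SQLite through Python's sqlite3 module with its default transaction handling, and the email addresses are placeholders.

```python
import sqlite3

# In-memory SQLite database; the engine and the email values are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO employees (id, name, email) VALUES (?, ?, ?)",
    [
        (1, "Alice", "alice@example.com"),
        (2, "Bob", "bob@example.com"),
        (3, "Carol", "carol@example.com"),
        (4, "Alice", "alice@example.com"),
        (5, "Dave", "dave@example.com"),
    ],
)
conn.commit()  # the sample data is saved; later changes can be rolled back


def count_rows():
    """Return the current number of rows in employees."""
    return conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]


rows_before = count_rows()

# sqlite3 opens a transaction implicitly before this DELETE.
cursor = conn.execute(
    """
    DELETE FROM employees
    WHERE EXISTS (
        SELECT 1
        FROM employees AS keeper
        WHERE keeper.email = employees.email
          AND keeper.id < employees.id
    )
    """
)
rows_deleted = cursor.rowcount
rows_during = count_rows()

# Undo the deletion; calling conn.commit() here instead would make it permanent.
conn.rollback()
rows_after_rollback = count_rows()

print(rows_before, rows_deleted, rows_during, rows_after_rollback)
```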

Q3. Is it possible to remove duplicates without affecting the original table?

A. Yes, you can write the unique records to a new table and leave the original untouched until you are ready to replace it with the new one.

Q4. What distinguishes ROW_NUMBER() from DISTINCT for removing duplicates?

A. ROW_NUMBER() provides more control by letting you choose which row in each group to keep, based on an ordering you define, whereas DISTINCT only collapses rows that are identical in every selected column.
