SQL `NOT IN` Operator: Usage, Pitfalls, and Best Practices

Abhishek Kumar 20 Jun, 2024
5 min read

Introduction

In simpler terms, imagine you have a giant list of information. The NOT IN operator in SQL acts like a filter to help you find exactly what you’re looking for. It can exclude any items on the list that match certain criteria, like specific values or even results from another search. By using the NOT IN function, you can quickly and accurately get the information you need without all the unwanted stuff.

If you’re just starting out to explore SQL, here’s a beginner’s guide to help you: SQL For Data Science: A Beginner Guide

Mastering SQL `NOT IN`: Usage, Pitfalls, and Best Practices

Overview

  • Understand what the ‘NOT IN’ operator in SQL does.
  • Learn the syntax and applications of the function.
  • Learn how to use the ‘NOT IN’ function in SQL through an example.
  • Learn the best practices to follow and how to avoid common mistakes while using ‘NOT IN’.

Basics of SQL `NOT IN`

Imagine you have a basket full of clothes. The NOT IN operator is like taking out specific colors of shirts. You can give it a list of colors to remove, like red, blue, and green. This way, you’re left with only the shirts that aren’t those colors. It works the same way with data in SQL. You can tell it to exclude any entries that match certain values, like specific numbers or even results from another search.

Syntax of ‘NOT IN’ Function in SQL

SELECT column_name(s)

FROM table_name

WHERE column_name NOT IN (value1, value2, ...);

Sample Data

Let’s understand how ‘NOT IN’ operator works in SQL, through a real-world example. For this, we first need some sample data. We’ll be using two tables in our example: `students` and `courses`.

Table: students

Create students table

CREATE TABLE students (

    student_id INT PRIMARY KEY,

    student_name VARCHAR(50)

);

Create courses table

CREATE TABLE courses (

    course_id INT PRIMARY KEY,

    course_name VARCHAR(50),

    student_id INT,

    FOREIGN KEY (student_id) REFERENCES students(student_id)

);

Insert data into students table

INSERT INTO students (student_id, student_name) VALUES

(1, 'John'),

(2, 'Alice'),

(3, 'Bob'),

(4, 'Carol'),

(5, 'David');
SQL NOT IN Operator

Insert data into courses table

INSERT INTO courses (course_id, course_name, student_id) VALUES

(101, 'Math', 1),

(102, 'Science', 2),

(103, 'History', 3),

(104, 'Art', NULL),

(105, 'Literature', 2);
SQL NOT IN function

Practical Application

Now let’s apply the ‘NOT IN’ SQL operator to our sample data and see how it works.

 1. Specific Values

Use the `NOT IN` function to exclude rows that match any value in a given list of values.

Example:

SELECT employee_id, employee_name

FROM employees

WHERE department_id NOT IN (1, 2, 3);

This query retrieves all employees who are not in departments 1, 2, or 3.

 2. With Subqueries

Use `NOT IN` operator with subqueries to exclude rows based on a subquery result.

Example:

SELECT student_id, student_name
FROM students
WHERE student_id NOT IN (SELECT student_id FROM courses WHERE course_name = 'Science');

This query retrieves all students who are not enrolled in the Science course.

Common Pitfalls and How to Avoid Them

Now that you’ve understood how to use the ‘NOT IN’ function, let’s learn how to avoid some of the most common mistakes or errors that can occur while using it.

1. Handling NULL Values

The `NOT IN` operator can behave unexpectedly if the list contains NULL values. Any comparison with NULL results in UNKNOWN, which can lead to no rows being returned.

Example:

SELECT student_id, student_name
FROM students
WHERE student_id NOT IN (1, 2, NULL);

To avoid this, ensure that the list or subquery does not contain NULL values.

Solution:

SELECT student_id, student_name
FROM students
WHERE student_id NOT IN (SELECT student_id FROM courses 
WHERE course_name = 'Science' AND student_id IS NOT NULL);

 2. Performance Issues

Using `NOT IN` with large subqueries can lead to performance issues. Ensure that the subquery is optimized and the columns used in the subquery are indexed.

Alternatives to SQL `NOT IN`

Now let’s explore some SQL functions that can be used in certain cases instead of the ‘NOT IN’ operator.

 1. `NOT EXISTS`

`NOT EXISTS` is often more efficient and handles NULL values more gracefully.

Example:

SELECT student_id, student_name
FROM students s
WHERE NOT EXISTS (SELECT 1 FROM courses c 
WHERE s.student_id = c.student_id AND c.course_name = 'Science');
SQL NOT IN function

 2. LEFT JOIN with IS NULL

Using a `LEFT JOIN` with `IS NULL` can also serve as an alternative to `NOT IN`.

Example:

SELECT s.student_id, s.student_name
FROM students s
LEFT JOIN courses c ON s.student_id = c.student_id AND c.course_name = 'Science'
WHERE c.student_id IS NULL;
SQL NOT IN function

Best Practices for Using the ‘NOT IN’ Operator

  1. Avoid NULL Values: Ensure that the list or subquery used with `NOT IN` does not contain NULL values to avoid unexpected results.
  2. Optimize Subqueries: Ensure that the subqueries are optimized and the columns involved are indexed for better performance.
  3. Use Alternatives When Appropriate: Consider using `NOT EXISTS` or `LEFT JOIN … IS NULL` when dealing with large datasets or when NULL values are involved.

Conclusion

Think of NOT IN as a sieve for your data. It lets you shake out unwanted information and keeps only the results you need. By learning the ins and outs of NOT IN, like when to use it and what to watch out for, you can become a pro at filtering data in your SQL queries. Using it the right way makes your queries accurate and efficient.

Learn More: SQL: A Full Fledged Guide from Basics to Advanced Level

Frequently Asked Questions

Q1. What does `NOT IN` do in SQL?

A. The `NOT IN` operator in SQL excludes rows that match any value in a specified list or subquery result.

Q2. How does `NOT IN` handle NULL values?

A. The `NOT IN` function can return no rows if the list or subquery result contains NULL values. This is because the comparison with NULL results in UNKNOWN.

Q3. What are the alternatives to `NOT IN`?

A. Alternatives of ‘NOT IN’ operator include `NOT EXISTS` and `LEFT JOIN … IS NULL`, which can be more efficient and handle NULL values better.

Q4. When should I use `NOT EXISTS` over `NOT IN`?

A. Use `NOT EXISTS` when dealing with subqueries that might return NULL values or when you need better performance with large datasets.

Q5. How can I optimize queries using `NOT IN`?

A. You can optimize SQL queries using the ‘NOT IN’ operator by optimizing the subqueries and indexing the columns involved. Avoid using the `NOT IN` function with lists or subqueries that contain NULL values.

Abhishek Kumar 20 Jun, 2024

Hello, I'm Abhishek, a Data Engineer Trainee at Analytics Vidhya. I'm passionate about data engineering and video games I have experience in Apache Hadoop, AWS, and SQL,and I keep on exploring their intricacies and optimizing data workflows :)

Frequently Asked Questions

Lorem ipsum dolor sit amet, consectetur adipiscing elit,