Nested Queries in SQL

ayushi9821704 24 Sep, 2024
8 min read

Introduction

Imagine you’re trying to find a specific piece of information from a giant library where some books have other smaller books inside them. To find the right answer, you may need to first look at the smaller books, then use that information to find the larger one. This is exactly how nested queries in SQL work! By placing one query inside another, you can extract complex data with ease. In this guide, we’ll explore how nested queries function and how you can harness their power in SQL for more efficient database management.

Understanding Nested Queries in SQL

Learning Outcome

  • Understand what nested queries (subqueries) are in SQL.
  • Write and implement nested queries within various SQL statements.
  • Differentiate between correlated and non-correlated nested queries.
  • Optimize SQL queries using nested structures for improved performance.

What Are Nested Queries in SQL?

A nested query, also known as a subquery, is an SQL query placed inside another SQL query. The result of the inner query (the subquery) is used by the outer query to achieve the desired outcome. This approach is particularly useful when the results from the inner query depend on the data retrieved by the outer query.

Basic Syntax

SELECT column_name(s)  
FROM table_name  
WHERE column_name = (SELECT column_name FROM table_name WHERE condition);

Types of Nested Queries in SQL

Nested queries, also known as subqueries, allow you to perform complex data retrieval by embedding one SQL query within another. This functionality is essential for writing efficient SQL code and handling intricate database operations. In this section, we’ll explore the different types of nested queries, complete with examples and expected outputs.

Single-row Subquery in SQL

A single-row subquery is a nested type of query that results in one or more columns in just a single row. It is very common with SQL statements where you want to use a comparison operator or a condition against a single value, such as =, <, >, etc.

Key Characteristics of Single-row Subqueries

  • Returns One Row: Owing to the name assigned, one can expect a single row of data.
  • Usually Used with Comparison Operators: Usually used with operators such as =, >, <, >=, <= etc.
  • Can Return One or More Columns: Though it returns a single row, yet it can return multiple columns.

Example: Find Employees Earning More Than the Average Salary

Table: employees

employee_idfirst_namelast_namesalarydepartment_id
1JohnDoe900001
2JaneSmith950001
3AliceJohnson600002
4BobBrown650002
5CharlieDavis400003
6EveAdams750003

Table: departments

department_iddepartment_namelocation_id
1Sales1700
2Marketing1700
3IT1800
4HR1900
SELECT first_name, last_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

Output:

| first_name | last_name | salary |
|------------|-----------|--------|
| John       | Doe       | 90000  |
| Jane       | Smith     | 95000  |

For instance, in the example, the inner query (SELECT AVG(salary) FROM employees) finds what all the employees’ average salaries are. The outer query gets the first name, last name and salary of all employees whose earnings are above this number.

Multi-row Subquery in SQL

Multi-row subquery is a kind of nested query that returns hence, more than one row of data. It Is usually used with IN, ANY, or ALL operators to compare a column with a set of values returned by the subquery. One of the advantages of using multi-row subquery is that it combines the results on a list of values and applies multiple rows in making computations.

Example: Find Employees in Certain Departments

SELECT first_name, last_name
FROM employees
WHERE department_id IN (SELECT department_id FROM departments WHERE location_id = 1700);

Output:

| first_name | last_name |
|------------|-----------|
| Alice      | Johnson   |
| Bob        | Brown     |

Here, the inner query retrieves department_ids from the departments table where the location_id is 1700. The outer query then finds employees who work in these departments.

Correlated Subquery in SQL

A correlated subquery is a type of nested query in SQL. It depends on the outer query for its values. While a regular subquery can execute independently, a correlated subquery calculates in relation to every row processed by the outer query, hence dynamic and context-sensitive.

Characteristics of Correlated Subqueries

  • Dependency: The inner query references columns from the outer query, establishing a direct dependency.
  • Row-by-row Execution: The inner query runs multiple times—once for each row processed by the outer query.
  • Performance Considerations: Because the inner query runs repeatedly, correlated subqueries can be slower than their non-correlated counterparts, especially on large datasets.

Example: Find Employees with Salaries Above Their Department’s Average

SELECT first_name, salary
FROM employees e1
WHERE salary > (SELECT AVG(salary) FROM employees e2 WHERE e1.department_id = e2.department_id);

Output:

| first_name | salary |
|------------|--------|
| John       | 90000  |
| Jane       | 95000  |

In this case, the inner query calculates the average salary for each department as the outer query processes each employee. The outer query selects employees who earn more than their department’s average salary.

Nested Subqueries in SQL

A nested subquery is also known as a nested query. This is an inner query or a query positioned inside another query where one query appears inside another. Such queries become quite handy for accessing difficult data and transforming it in rather very specific ways, allowing complex problems to break into more constituent, manageable parts, making it much easier to query relational databases.

Structure of Nested Subqueries

A nested subquery typically consists of two main components:

  • Outer Query: This is the main query that contains the subquery. It uses the result of the subquery to filter or manipulate data.
  • Inner Query (Subquery): This query is embedded within the outer query and provides a result set that can be utilized by the outer query.

Example: Find Departments with Employees Earning More Than the Average Salary

SELECT department_id, department_name
FROM departments
WHERE department_id IN (
    SELECT department_id
    FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
);

Output:

| department_id | department_name |
|---------------|------------------|
| 1             | Sales            |
| 2             | Marketing        |

In this example, the innermost query (SELECT AVG(salary) FROM employees) is taking the average. The middle query will fetch department_ids of employees making above that average and the outer query will retrieve the department names assigned to it.

Scalar Subquery

A scalar subquery is defined as a subquery which gives out a single value, a single row and a single column. So scalar subqueries are pretty handy to use wherever there is a requirement of a single value in the main query. Scalar subqueries can be utilized within many SQL clauses like SELECT, WHERE and HAVING.

Characteristics of Scalar Subqueries

  • Returns One Value: As its name suggests, scalar subquery only returns a single value. Any subquery which tries to return a row other than a single one or a column other than one will lead to an error.
  • Used in Various Clauses: Derived columns can be calculated in the SELECT statements, narrowing down the results in WHERE clauses, and adding conditions on a collection of data within a HAVING clause all with the help of these scalar subqueries within this clause.
  • Efficient for Comparisons: They are often used for making comparisons against a single value derived from another query.

Example: Retrieve Employees and Their Salary Difference from the Average Salary

SELECT first_name, last_name, salary - (SELECT AVG(salary) FROM employees) AS salary_difference
FROM employees;

Output:

| first_name | last_name | salary_difference |
|------------|-----------|-------------------|
| John       | Doe       | 10000             |
| Jane       | Smith     | 15000             |

In this case, the scalar subquery computes the average salary once, and the outer query calculates the difference for each employee’s salary from the average.

Use Cases for Nested Queries

Nested queries, or subqueries, are powerful tools in SQL that can solve a variety of complex data retrieval challenges. Here are some common use cases:

Data Filtering

Nested queries can be used to filter results based on values derived from another table.

Example: Find employees whose salaries are above the average salary in their respective departments.

SELECT first_name, last_name, salary
FROM employees e1
WHERE salary > (SELECT AVG(salary) FROM employees e2 WHERE e1.department_id = e2.department_id);

Calculating Aggregates

You can calculate aggregates in a nested query and use those results in the outer query.

Example: Retrieve departments with an average salary greater than the overall average salary.

SELECT department_id, AVG(salary) AS average_salary
FROM employees
GROUP BY department_id
HAVING AVG(salary) > (SELECT AVG(salary) FROM employees);

Conditional Logic

Nested queries allow you to implement conditional logic within your SQL statements.

Example: List employees who belong to departments located in a specific city.

SELECT first_name, last_name
FROM employees
WHERE department_id IN (SELECT department_id FROM departments WHERE city = 'New York');

Correlated Subqueries for Row-Level Calculations

Correlated subqueries enable row-level calculations based on values from the current row in the outer query.

Example: Get a list of products with a price higher than the average price of products in the same category.

SELECT product_name, price
FROM products p1
WHERE price > (SELECT AVG(price) FROM products p2 WHERE p1.category_id = p2.category_id);

Differences Between Nested Queries and Other SQL Queries

Let us now look into the difference between nested queries and other SQL queries below:

FeatureNested QueriesJoinsSimple Queries
DefinitionA query placed inside another queryCombines rows from two or more tables based on a related columnA single SQL statement that retrieves data
ExecutionExecutes the inner query for each row processed by the outer queryExecutes simultaneously for all rows from both tablesExecutes independently without any dependencies
Use CaseUseful for complex calculations and filtering based on another queryIdeal for combining related data from multiple tablesSuitable for straightforward data retrieval
PerformanceMay lead to slower performance due to repeated execution of the inner queryGenerally more efficient as it processes data in one goFastest for simple data retrieval
ComplexityCan become complex and difficult to readCan also be complex but typically clearer with explicit relationshipsSimple and easy to understand
Data DependencyThe inner query can depend on the outer query’s resultData from joined tables is independent of each otherData retrieved is independent, no subqueries involved
ExampleSELECT first_name FROM employees WHERE salary > (SELECT AVG(salary) FROM employees);SELECT e.first_name, d.department_name FROM employees e JOIN departments d ON e.department_id = d.department_id;SELECT * FROM employees;

Common Mistakes with Nested Queries

While nested queries can be incredibly useful, they also come with pitfalls. Here are some common mistakes to watch out for:

Returning Multiple Rows

A scalar subquery must return a single value; if it returns multiple rows, it will cause an error.

Mistake:

SELECT first_name
FROM employees
WHERE salary = (SELECT salary FROM employees);

Solution: Ensure the inner query uses aggregation or filtering to return a single value.

Performance Issues

Nested queries can sometimes lead to performance bottlenecks, especially if they are executed for each row in the outer query.

Mistake: Using a nested query inside a large outer query without considering performance implications.

Solution: Analyze query execution plans and consider alternative methods, like joins, when dealing with large datasets.

Improper Use of Parentheses

Incorrect placement of parentheses can lead to unexpected results or errors.

Mistake:

SELECT first_name
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees WHERE department_id);

Solution: Ensure the logic of your query is clear, and parentheses are used appropriately to group conditions.

Not Considering NULL Values

Nested queries can produce unexpected results when NULL values are present in the data.

SELECT first_name
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees WHERE department_id IS NOT NULL);

Solution: Handle NULL values explicitly using functions like COALESCE to avoid unintended filtering.

Conclusion

SQL nested queries, also known as subqueries, are very useful in carrying out highly complex data retrieval operations efficiently. You can embed a query inside another, to do any calculations on data that cannot be done by simple queries alone. Having the knowledge of four main types of these will be helpful: single-row, multi-row, correlated, and scalar subqueries. Applying best practices and avoiding some common pitfalls, you can tap into the full potential of nested queries to improve your database management and performance.

Frequently Asked Questions

Q1. What is a nested query in SQL?

A. A nested query, or subquery, is an SQL query placed inside another query. The inner query’s result is used by the outer query to perform complex data retrieval.

Q2. What are the types of nested queries?

A. The main types include single-row subqueries, multi-row subqueries, correlated subqueries, and scalar subqueries, each serving different use cases.

Q3. When should I use a correlated subquery?

A. Use a correlated subquery when the inner query needs to reference a column from the outer query, allowing for dynamic row-by-row evaluations.

Q4. Can nested queries impact performance?

A. Yes, nested queries can lead to performance issues, especially if they are executed for every row in the outer query. Analyzing execution plans and considering alternatives like joins can help improve efficiency.

ayushi9821704 24 Sep, 2024

My name is Ayushi Trivedi. I am a B. Tech graduate. I have 3 years of experience working as an educator and content editor. I have worked with various python libraries, like numpy, pandas, seaborn, matplotlib, scikit, imblearn, linear regression and many more. I am also an author. My first book named #turning25 has been published and is available on amazon and flipkart. Here, I am technical content editor at Analytics Vidhya. I feel proud and happy to be AVian. I have a great team to work with. I love building the bridge between the technology and the learner.

Frequently Asked Questions

Lorem ipsum dolor sit amet, consectetur adipiscing elit,