Handling NULL Values in SQL

Ayushi Trivedi Last Updated : 26 Sep, 2024

Introduction

In the world of databases, NULL values can often feel like the proverbial black sheep. They represent missing, undefined, or unknown data, and can pose unique challenges in data management and analysis. Imagine you’re analyzing a sales database, and some entries lack customer feedback or order quantities. Understanding how to effectively handle NULL values in SQL is crucial for ensuring accurate data retrieval and meaningful analysis. In this guide, we’ll delve into the nuances of NULL values, explore how they affect SQL operations, and provide practical techniques for managing them.

Learning Outcomes

Understand what NULL values represent in SQL.
Identify the impact of NULL values on data queries and calculations.
Utilize SQL functions and techniques to handle NULL values effectively.
Implement best practices for managing NULLs in database design and querying.

What Are NULL Values in SQL?

Impact of NULL Values on SQL Queries

Techniques for Handling NULL Values

Best Practices for Managing NULL Values

Common Mistakes to Avoid with NULLs

Frequently Asked Questions

What Are NULL Values in SQL?

NULL is a special marker in SQL indicating that a value is unknown or missing. NULL is not the same as an empty string ('') or zero; it signals the absence of any value. A column of any data type, whether integer, string, or date, can hold NULL unless a constraint forbids it.

Example of NULL Values

Consider a table named employees:

employee_id	first_name	last_name	email	department_id
1	John	Doe	[email protected]	NULL
2	Jane	Smith	[email protected]	3
3	Alice	Johnson	NULL	2
4	Bob	Brown	[email protected]	NULL

In this table, the department_id for John and Bob is NULL, indicating that their department is unknown. Alice’s email is also NULL, meaning there is no email recorded.
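The difference between NULL, zero, and an empty string can be checked directly. The sketch below is a minimal illustration that assumes SQLite, driven through Python's built-in sqlite3 module; the engine choice is an assumption made for the demo, not something the article prescribes.

```python
import sqlite3

# In-memory SQLite database, used only as a convenient engine for the demo.
conn = sqlite3.connect(":memory:")

# IS NULL is true (1) only for NULL itself, not for 0 or an empty string.
checks = conn.execute("SELECT NULL IS NULL, 0 IS NULL, '' IS NULL").fetchone()
print(checks)  # (1, 0, 0)

# The sqlite3 driver maps SQL NULL to Python's None.
null_value = conn.execute("SELECT NULL").fetchone()[0]
print(null_value)  # None

conn.close()
```

Only the first check returns 1, which matches the point that 0 and '' are real values while NULL is the absence of one.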

Impact of NULL Values on SQL Queries

NULL marks a column value that holds no data, and it changes how queries behave and what results they return. Knowing how NULL behaves is essential for writing correct queries and working with data accurately. This section covers how NULL affects comparisons, logical operations, aggregate calculations, and DISTINCT.

Comparisons with NULL

When performing comparisons in SQL, it’s essential to understand that NULL values do not equate to zero or an empty string. Instead, NULL represents an unknown value. As a result, any direct comparison involving NULL will yield an UNKNOWN result, rather than TRUE or FALSE.

Example:

SELECT * FROM employees WHERE department_id = NULL;

Output: No rows will be returned because comparisons to NULL using = do not evaluate to TRUE.

To correctly check for NULL values, use:

SELECT * FROM employees WHERE department_id IS NULL;

Assuming the employees table has:

employee_id	first_name	department_id
1	John	101
2	Jane	NULL
3	Bob	102
4	Alice	NULL

Output:

employee_id	first_name	department_id
2	Jane	NULL
4	Alice	NULL
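The two queries above can be run side by side. This is a small sketch assuming SQLite through Python's sqlite3 module, with the four sample rows from the table above loaded into an in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER, first_name TEXT, department_id INTEGER)"
)
# Python None is stored as SQL NULL.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "John", 101), (2, "Jane", None), (3, "Bob", 102), (4, "Alice", None)],
)

# "= NULL" evaluates to UNKNOWN for every row, so nothing is returned.
eq_rows = conn.execute(
    "SELECT first_name FROM employees WHERE department_id = NULL"
).fetchall()

# "IS NULL" is the predicate that actually matches missing values.
is_rows = conn.execute(
    "SELECT first_name FROM employees WHERE department_id IS NULL ORDER BY employee_id"
).fetchall()

print(eq_rows)  # []
print(is_rows)  # [('Jane',), ('Alice',)]
conn.close()
```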

Boolean Logic and NULLs

NULL values affect boolean logic in SQL queries. SQL uses three-valued logic (TRUE, FALSE, UNKNOWN): a comparison with NULL yields UNKNOWN, and UNKNOWN propagates through AND, OR, and NOT unless the other operand decides the result on its own (FALSE AND UNKNOWN is FALSE; TRUE OR UNKNOWN is TRUE). Because a WHERE clause keeps only rows whose condition is TRUE, rows that evaluate to UNKNOWN are silently dropped, which can lead to unexpected results.

Example:

SELECT * FROM employees WHERE first_name = 'John' AND department_id = NULL;

Output: This query will return no results, as the condition involving NULL will evaluate to UNKNOWN.

For correct logical operations, explicitly check for NULL:

SELECT * FROM employees WHERE first_name = 'John' AND department_id IS NULL;

Output (in the sample table above, John's department_id is 101, so no row matches):

employee_id	first_name	department_id
(no rows)
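The three-valued logic described in this section can be tabulated with a single query. The sketch assumes SQLite via Python's sqlite3 module, where UNKNOWN surfaces as NULL (Python None), TRUE as 1, and FALSE as 0; the expression (NULL = 1) is used only as a convenient way to produce an UNKNOWN operand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# (NULL = 1) is UNKNOWN; combine it with known operands.
truth = conn.execute(
    """
    SELECT (NULL = 1) AND 0,   -- FALSE decides the AND
           (NULL = 1) AND 1,   -- stays UNKNOWN
           (NULL = 1) OR 1,    -- TRUE decides the OR
           (NULL = 1) OR 0,    -- stays UNKNOWN
           NOT (NULL = 1)      -- NOT UNKNOWN is still UNKNOWN
    """
).fetchone()

print(truth)  # (0, None, 1, None, None)
conn.close()
```

Only the cases where the known operand settles the outcome produce a definite value; everything else remains UNKNOWN and would be filtered out by a WHERE clause.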

Aggregation Functions

NULL values have a unique impact on aggregate functions such as SUM, AVG, COUNT, and others. Most aggregate functions ignore NULL values, which means they will not contribute to the result of calculations. This behavior can lead to misleading conclusions if you are not aware of the NULLs present in your dataset.

Example:

SELECT AVG(salary) FROM employees;

Assuming the employees table has:

employee_id	salary
1	50000
2	NULL
3	60000
4	NULL

Output:

AVG(salary)
55000

The average is calculated from the non-NULL salaries (50000 and 60000).

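To count the non-NULL salaries in the same table:

SELECT COUNT(salary) FROM employees;

Output:

COUNT(salary)
2

COUNT(salary) counts only non-NULL values, so the two NULL salaries are excluded, whereas COUNT(*) would count all four rows. If every salary were NULL, COUNT(salary) would return 0, while SUM and AVG would return NULL.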
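The aggregate behavior can be verified on the four sample salaries. This sketch assumes SQLite through Python's sqlite3 module; the second query restricts the input to the NULL salaries to show what aggregates return when there is nothing non-NULL to work with.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(1, 50000), (2, None), (3, 60000), (4, None)],
)

# COUNT(*) counts rows; COUNT(salary), SUM and AVG skip NULL salaries.
stats = conn.execute(
    "SELECT COUNT(*), COUNT(salary), SUM(salary), AVG(salary) FROM employees"
).fetchone()
print(stats)  # (4, 2, 110000, 55000.0)

# With only NULL inputs, COUNT(salary) is 0 while SUM and AVG are NULL.
all_null = conn.execute(
    "SELECT COUNT(salary), SUM(salary), AVG(salary) FROM employees WHERE salary IS NULL"
).fetchone()
print(all_null)  # (0, None, None)
conn.close()
```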

DISTINCT and NULL Values

When using the DISTINCT keyword, NULL values are treated as a single unique value. Thus, if you have multiple rows with NULLs in a column, the DISTINCT query will return only one instance of NULL.

Example:

SELECT DISTINCT department_id FROM employees;

Assuming the employees table has:

employee_id	department_id
1	101
2	NULL
3	102
4	NULL

Output:

department_id
101
NULL
102

Even if there are multiple NULLs, only one NULL appears in the result.
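A quick check of the DISTINCT behavior, sketched with SQLite via Python's sqlite3 module (an assumption for the demo). Row order from DISTINCT is not guaranteed, so the result is compared as a set; COUNT(DISTINCT ...) is included to show that, like other aggregates, it skips NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, department_id INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(1, 101), (2, None), (3, 102), (4, None)],
)

# The two NULLs collapse into a single row in the DISTINCT result.
distinct_rows = conn.execute("SELECT DISTINCT department_id FROM employees").fetchall()
distinct_values = {row[0] for row in distinct_rows}
print(len(distinct_rows))  # 3

# COUNT(DISTINCT ...) ignores NULL, so only 101 and 102 are counted.
distinct_count = conn.execute(
    "SELECT COUNT(DISTINCT department_id) FROM employees"
).fetchone()[0]
print(distinct_count)  # 2
conn.close()
```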

Techniques for Handling NULL Values

Handling NULL values is crucial for maintaining data integrity and ensuring accurate query results. Here are some effective techniques:

Using IS NULL and IS NOT NULL

The most straightforward way to filter rows on NULL values is to use the IS NULL and IS NOT NULL predicates. They let you explicitly include or exclude rows whose column is NULL.

Example:

SELECT * FROM employees WHERE department_id IS NULL;

Output:

employee_id	first_name	department_id
2	Jane	NULL
4	Alice	NULL

To find employees with a department assigned:

SELECT * FROM employees WHERE department_id IS NOT NULL;

Output:

employee_id	first_name	department_id
1	John	101
3	Bob	102

Using COALESCE Function

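The COALESCE function returns the first non-NULL value in its list of arguments. This is useful for providing default values when NULL is encountered. The arguments should share a compatible data type: because department_id is numeric and the default here is a string, the column is cast to a string first. Some databases, such as PostgreSQL and SQL Server, otherwise reject the mix of types (in MySQL, cast to CHAR instead of VARCHAR).

Example:

SELECT first_name, COALESCE(CAST(department_id AS VARCHAR(20)), 'No Department') AS department FROM employees;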

Output:

first_name	department
John	101
Jane	No Department
Bob	102
Alice	No Department
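The COALESCE default can be tried out as follows. This is a sketch assuming SQLite through Python's sqlite3 module; the cast to TEXT is an addition made here so that both COALESCE arguments are strings, which stricter engines require.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER, first_name TEXT, department_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "John", 101), (2, "Jane", None), (3, "Bob", 102), (4, "Alice", None)],
)

# COALESCE returns the first non-NULL argument; the cast keeps both
# arguments as text so the result column has a single type.
coalesced = conn.execute(
    """
    SELECT first_name,
           COALESCE(CAST(department_id AS TEXT), 'No Department') AS department
    FROM employees
    ORDER BY employee_id
    """
).fetchall()

print(coalesced)
# [('John', '101'), ('Jane', 'No Department'), ('Bob', '102'), ('Alice', 'No Department')]
conn.close()
```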

Using NULLIF Function

The NULLIF function returns NULL if the two arguments are equal; otherwise, it returns the first argument. This is handy for converting a placeholder value, such as 0 or an empty string, back to NULL so that it is handled like any other missing value.

Example:

SELECT first_name, NULLIF(department_id, 0) AS department_id FROM employees;

This assumes that department_id is stored as 0 instead of NULL for Jane and Alice:

Output:

first_name	department_id
John	101
Jane	NULL
Bob	102
Alice	NULL
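A runnable version of the NULLIF example, sketched with SQLite via Python's sqlite3 module. The sample rows are hypothetical and store 0 as a placeholder department for Jane and Alice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER, first_name TEXT, department_id INTEGER)"
)
# 0 is used here as a placeholder meaning "no department".
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "John", 101), (2, "Jane", 0), (3, "Bob", 102), (4, "Alice", 0)],
)

# NULLIF(department_id, 0) turns the placeholder 0 into NULL
# and leaves every other value unchanged.
cleaned = conn.execute(
    "SELECT first_name, NULLIF(department_id, 0) FROM employees ORDER BY employee_id"
).fetchall()

print(cleaned)  # [('John', 101), ('Jane', None), ('Bob', 102), ('Alice', None)]
conn.close()
```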

Using the CASE Statement

The CASE expression allows for conditional logic in SQL queries. You can use it to replace NULL values with meaningful substitutes based on specific conditions. All branches should return a compatible data type, so the numeric department_id is cast to a string to match the text label.

Example:

SELECT first_name, 
       CASE 
           WHEN department_id IS NULL THEN 'Unknown Department'
           ELSE CAST(department_id AS VARCHAR(20)) 
       END AS department 
FROM employees;

Output:

first_name	department
John	101
Jane	Unknown Department
Bob	102
Alice	Unknown Department
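The CASE substitution can be checked with the same sample rows. This sketch assumes SQLite via Python's sqlite3 module and casts department_id to TEXT so that both CASE branches return strings; that cast is an addition for type consistency.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER, first_name TEXT, department_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "John", 101), (2, "Jane", None), (3, "Bob", 102), (4, "Alice", None)],
)

# The WHEN branch handles NULL explicitly; the ELSE branch passes the
# existing value through as text.
labelled = conn.execute(
    """
    SELECT first_name,
           CASE
               WHEN department_id IS NULL THEN 'Unknown Department'
               ELSE CAST(department_id AS TEXT)
           END AS department
    FROM employees
    ORDER BY employee_id
    """
).fetchall()

print(labelled)
# [('John', '101'), ('Jane', 'Unknown Department'), ('Bob', '102'), ('Alice', 'Unknown Department')]
conn.close()
```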

Using Aggregate Functions with NULL Handling

When using aggregate functions like COUNT, SUM, AVG, etc., it’s essential to remember that they ignore NULL values. You can combine these functions with COALESCE or similar techniques to manage NULLs in aggregate results.

Example:

To count how many employees have a department assigned:

SELECT COUNT(department_id) AS AssignedDepartments FROM employees;

Output:

AssignedDepartments
2

If you want to include a count of NULL values:

SELECT COUNT(*) AS TotalEmployees, 
       COUNT(department_id) AS AssignedDepartments,
       COUNT(*) - COUNT(department_id) AS UnassignedDepartments 
FROM employees;

Output:

TotalEmployees	AssignedDepartments	UnassignedDepartments
4	2	2
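Combining an aggregate with COALESCE changes what is being measured, which is worth seeing with numbers. The sketch below assumes SQLite via Python's sqlite3 module and reuses the hypothetical salary values from the earlier aggregation example; whether a missing salary should count as 0 is a business decision, not something SQL decides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(1, 50000), (2, None), (3, 60000), (4, None)],
)

summary = conn.execute(
    """
    SELECT AVG(salary),                -- average over the 2 known salaries
           AVG(COALESCE(salary, 0)),   -- average over all 4 rows, NULL treated as 0
           COUNT(*) - COUNT(salary)    -- number of rows with a NULL salary
    FROM employees
    """
).fetchone()

print(summary)  # (55000.0, 27500.0, 2)
conn.close()
```

The two averages differ by a factor of two here, so it helps to state explicitly in reports which interpretation was used.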

Best Practices for Managing NULL Values

We will now look at best practices for managing NULL values.

Use NULL Purposefully: Only use NULL to indicate the absence of a value. This distinction is crucial; NULL should not be confused with zero or an empty string, as each has its own meaning in data context.
Establish Database Constraints: Implement NOT NULL constraints wherever applicable to prevent unintentional NULL entries in critical fields. This helps enforce data integrity and ensures that essential information is always present.
Normalize Your Database Schema: Properly design your database schema to minimize the occurrence of NULL values. By organizing data into appropriate tables and relationships, you can reduce the need for NULLs and promote clearer data representation.
Utilize Sensible Default Values: When designing tables, consider using sensible default values to fill in for potential NULL entries. This approach helps avoid confusion and ensures that users understand the data’s context without encountering NULL.
Document NULL Handling Strategies: Clearly document your approach to handling NULL values within your organization. This includes establishing guidelines for data entry, reporting, and analysis to promote consistency and understanding among team members.
Regularly Review and Audit Data: Conduct periodic reviews and audits of your data to identify and manage NULL values effectively. This practice helps maintain data quality and integrity over time.
Educate Team Members: Explain to team members what NULL values mean and how to handle them properly. A shared understanding is crucial for making sound decisions about data and reporting.
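The constraint and default-value practices above can be sketched in a table definition. This example assumes SQLite via Python's sqlite3 module; the status column and its 'active' default are hypothetical, added only to illustrate a sensible default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employees (
        employee_id   INTEGER PRIMARY KEY,
        first_name    TEXT NOT NULL,                   -- must always be present
        status        TEXT NOT NULL DEFAULT 'active',  -- hypothetical column with a default
        department_id INTEGER                          -- NULL allowed: may be unassigned
    )
    """
)

# Omitted columns fall back to their default (or NULL if none is defined).
conn.execute("INSERT INTO employees (first_name) VALUES ('John')")

# Violating NOT NULL is rejected by the database itself.
try:
    conn.execute("INSERT INTO employees (first_name) VALUES (NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored = conn.execute(
    "SELECT first_name, status, department_id FROM employees"
).fetchall()
print(rejected)  # True
print(stored)    # [('John', 'active', None)]
conn.close()
```

Here NULL remains available for department_id, where "not yet assigned" is a legitimate state, while the constraint keeps it out of columns that must always have a value.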

Common Mistakes to Avoid with NULLs

Let us now explore the common mistakes that we can avoid with NULLs.

Confusing NULL with Zero or Empty Strings: The most frequent mistake is treating NULL as equivalent to zero or an empty string. Recognizing that NULL denotes the absence of a value is crucial to avoid misinterpreting data.
Using the Equality Operator for NULL Comparisons: Do not use the equality operator (=) to test for NULL; the comparison evaluates to UNKNOWN, so no rows match. Use the IS NULL or IS NOT NULL predicates instead.
Neglecting NULLs in Aggregate Functions: A common oversight is forgetting that aggregate functions such as SUM, AVG, and COUNT(column) skip NULL values, which can produce misleading totals and averages. Check for NULLs before aggregating, even in columns that appear to contain only numbers.
Not Considering NULLs in Business Logic: Failing to account for NULL values in business logic can lead to unexpected outcomes in applications and reports. Always include checks for NULL when performing logical operations.
Overusing NULLs: While NULLs can be useful, overusing them can complicate data analysis and reporting. Strive for a balance, ensuring that NULLs are used appropriately without cluttering the dataset.
Ignoring Documentation: Neglecting to document your strategies for managing NULL values can lead to confusion and inconsistency among team members. Clear documentation is essential for effective data management.
Neglecting Regular Audits of NULL Values: Regular audits of NULL values help maintain data integrity and quality. Ignoring this step can result in accumulating errors and misinterpretations in your data analysis.

Conclusion

Handling NULL values in SQL requires careful attention, because mishandled NULLs can skew analysis. Using NULL intentionally, enforcing constraints in the database, and auditing data regularly address most NULL-related issues. Knowing the common pitfalls, such as confusing NULL with zero or failing to account for NULLs in logical operations, makes data manipulation more reliable. Ultimately, managing NULL values well makes queries and reports more trustworthy and supports better decisions and insights from the data.

Frequently Asked Questions

Q1. What does NULL mean in SQL?

A. NULL represents a missing or undefined value in SQL, indicating the absence of data.

Q2. How can I check for NULL values in a query?

A. Use IS NULL or IS NOT NULL to check for NULL values in SQL queries.

Q3. Will NULL values affect aggregate functions?

A. Yes. Most aggregate functions, such as SUM, AVG, and COUNT(column), ignore NULL values, which can change the results; COUNT(*) counts every row regardless of NULLs.

Q4. How can I replace NULL values with a default value?

A. You can use COALESCE, which is standard SQL, or a vendor-specific function such as IFNULL (for example in MySQL and SQLite) or ISNULL (SQL Server) to replace NULL values with a specified default.

Q5. Is it a good practice to allow NULL values in my database?

A. While NULLs can be necessary, it’s often best to minimize their use by enforcing NOT NULL constraints and providing default values where appropriate.
