Who says only lawyers have to deal with challenging cases and solve them? For sure, they’ve never talked to a data enthusiast — whether a data analyst, data scientist, or any other role in the vast realm of data (be sure that the data realm is too vast to have only a few roles of superheroes).
However, don’t try that at home! Discussing data with a data lover means diving into a conversation that might never end.
In the world of data, we have our own CASEs to deal with and logic to implement, to give, in the end, Caesar what belongs to Caesar (aka our client). Unlike lawyers, we don’t have piles of papers to read and interpret to save our client’s skin, but we have an important mission too…to ensure the system operates flawlessly and delivers the most accurate information whenever the client asks.
It’s the same in some ways — we both save our clients, but from different challenges, right?
Imagine the far-reaching consequences of making a mistake, particularly within the banking sector. An error could result in substantial financial losses for either the bank or the client simply due to incorrect logic in a CASE statement.
The CASE statement is a must-have tool in our survival toolkit, especially in complex conditions. Its versatility makes it invaluable for handling missing values, creating calculated fields, and managing nested conditions.
As data enthusiasts, we always look forward to making sense of the darker chaos within datasets and extracting the most valuable insights. It’s like solving a complex puzzle with thousands of pieces, often involving twisted logic brought by our clients.
Let’s delve deeper into this CASE of ours. We’ll explore everything from the syntax to real-world applications, providing practical examples and best practices. By the end of this article, you will be well-equipped to master this essential tool, making your data work more effective and insightful.
Another way to ask how? Yes, with CASE
In some SQL environments, the IF statements aren’t as free to use as in other programming languages. Because it’s almost impossible not to have any criteria presented by the client (though life would be a lot easier in this situation), the solution comes in the form of CASE.
After a short search on Google, we see that:
CASE statements in SQL are similar to the IF-ELSE logic from other programming languages, allowing the developer to implement different logic based on specific conditions.
Any statement in SQL has its way of telling the IDE that something is going to be done so that the IDE can recognize and prepare to interpret the expression or function we will use.
The syntax of a CASE statement in SQL is pretty simple and similar to an IF-ELSE statement.
SELECT
CASE expression
WHEN value1 THEN result1
COUNT(CASE WHEN OrderStatus = 'Pending' THEN 1 END) AS PendingOrders,
COUNT(CASE WHEN OrderStatus = 'Cancelled' THEN 1 END) AS CancelledOrders
FROM Orders;
Conditional Formatting in Reports — when generating reports, CASE statements can be used to apply conditional formatting, such as flagging important records or highlighting anomalies.
SELECT
CASE
WHEN Salary > 70000 THEN 'Executive'
WHEN Salary BETWEEN 50000 AND 70000 THEN 'Professional'
ELSE 'Entry-Level'
END AS SalaryCategory
FROM Employees;
Like any other statement, we need to know that understanding their impact on query
WHEN value2 THEN result2
WHEN value3 THEN result3
....
ELSE resultN
END AS your_alias
FROM your_table_name
Now, let’s shed some light on the code above:
This type of statement allows you to map or transform values based on the ‘surprises’ (requirements/rules) each client comes with, providing an easier and more readable way to handle different scenarios in your data.
I was surprised when I heard that CASE comes in two flavors: simple and searched. It’s most likely to use one or another without knowing that the magic you have written already has a name (don’t worry it happens to me a lot, and it’s normal to not know everything).
To help you get a clearer picture, let’s dive into each type and see how they work.
As the name suggests, this is the most used type of CASE. It allows you to compare an expression to a set of possible values to determine the correct value for each scenario. It’s straightforward and really helpful when you need to evaluate a single expression against multiple values.
The syntax of a simple CASE statement is as follows:
SELECT
CASE expression
WHEN value1 THEN result1
WHEN value2 THEN result2
WHEN value3 THEN result3
....
ELSE resultN
END AS your_alias
FROM your_table_name
Example:
Let’s assume we have the ‘Orders’ table with the following structure and data:
CREATE TABLE Orders (
OrderID INT PRIMARY KEY,
CustomerName VARCHAR(100),
OrderDate DATE,
OrderStatus VARCHAR(50),
Amount DECIMAL(10, 2)
);
INSERT INTO Orders (OrderID, CustomerName, OrderDate, OrderStatus, Amount) VALUES
(1, 'John Doe', '2023-07-01', 'Shipped', 150.00),
(2, 'Jane Smith', '2023-07-05', 'Pending', 200.00),
(3, 'Emily Johnson', '2023-07-10', 'Cancelled', 50.00),
(4, 'Michael Brown', '2023-07-12', 'Shipped', 300.00),
(5, 'Sarah Davis', '2023-07-15', 'Pending', 120.00),
(6, 'David Wilson', '2023-07-20', 'Shipped', 75.00),
(7, 'Laura Garcia', '2023-07-22', 'Cancelled', 100.00),
(8, 'James Martinez', '2023-07-25', 'Shipped', 250.00),
(9, 'Linda Anderson', '2023-07-30', 'Pending', 180.00),
(10, 'Robert Thomas', '2023-08-01', 'Cancelled', 90.00);
We aim to categorize all order statuses into 3 main categories: Pending, Processed, and Others.
To do that, we use a simple CASE statement that looks like this:
SELECT
OrderID,
CustomerName,
OrderDate,
OrderStatus AS FormerStatus,
CASE OrderStatus
WHEN 'Shipped' THEN 'Processed'
WHEN 'Pending' THEN 'Pending'
ELSE 'Others'
END AS NewStatus
FROM Orders;
Result:
To better understand the effects of using CASE in this example, I also kept the ‘OrderStatus’ column but gave it the alias ‘FormerStatus’. In the new column, the one we created using the CASE statement, called ‘NewStatus’, we see the 3 statuses: Processed for the orders that have been Shipped and Pending for those that are still in Pending status.
I included the ELSE clause to ensure that values that do not fit into any of the categories specified in the WHEN clauses are classified as “Other”. This approach helps avoid NULL values, which can affect further analysis.
On the other hand, a search CASE statement deals with multiple comparisons by evaluating a set of boolean expressions to determine the result.
Comparative with the simple CASE, it provides more flexibility and power, allowing you to perform complex condition checks.
The syntax of a searched CASE statement is as follows:
SELECT
CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
WHEN condition3 THEN result3
...
ELSE resultN
END AS your_alias
FROM your_table_name;
Now, let’s shed some light on the code above:
Example:
In our workplace, salary raises are determined based on the current salary. Employees are categorized into three salary levels:
Based on these categories, the salary raise is applied as follows:
We have the ‘Employees’ table with the below structure. We need to create an SQL query to calculate the new salary after the raise and categorize the salaries into the specified levels.
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
EmployeeName VARCHAR(100),
Salary DECIMAL(10, 2)
);
INSERT INTO Employees (EmployeeID, EmployeeName, Salary) VALUES
(1, 'John Doe', 75000.00),
(2, 'Jane Smith', 65000.00),
(3, 'Emily Johnson', 45000.00),
(4, 'Michael Brown', 55000.00),
(5, 'Sarah Davis', 30000.00);
To have both the raise calculation and the salary categorization in the same query we need to implement 2 CASE statements: one for salary categorization, and one for the new salary after the raise.
SELECT
EmployeeID,
EmployeeName,
Salary AS CurrentSalary,
CASE
WHEN Salary > 70000 THEN Salary * 1.02
WHEN Salary BETWEEN 50000 AND 70000 THEN Salary * 1.05
ELSE Salary * 1.10
END AS NewSalary,
CASE
WHEN Salary > 70000 THEN 'Executive'
WHEN Salary BETWEEN 50000 AND 70000 THEN 'Professional'
ELSE 'Entry-Level'
END AS SalaryCategory
FROM Employees;
Result:
What we did was:
Simple CASE statements in SQL are used to determine the result value by evaluating an expression against a set of specified values. The corresponding result is returned when the expression matches a specified value.
On the other hand, Searched CASE statements determine the result value by evaluating a set of Boolean expressions. Each Boolean expression is evaluated sequentially, and the corresponding result is returned when a true condition is found. This allows for more complex conditional logic to be applied in SQL queries.
Now that you have become more comfortable with CASE statements, let me introduce you to NESTED CASE. In some projects, you might encounter situations when a single CASE , regardless of its type, won’t be enough to handle your complex logic.
Well, these are the scenarios when a Nested Case comes and sets the stage by allowing you to have a CASE statement embedded within another, deal with intricate decision-making processes, and can help simplify complex conditional logic by breaking it down into smaller and more manageable parts.
Example:
Sometimes, the bank creates customized loans for the oldest client of their bank. To determine which type of loan can be offered to each client, it must check its credit score and the years it’s been with the bank. So the logic might look like this:
Credit Score:
Years with the bank:
Based on these, the bank established the following types of loans:
To find the needed answer we use a nested CASE :
SELECT
CustomerID,
CustomerName,
CreditScore,
YearsWithBank,
CASE
WHEN CreditScore > 750 THEN
CASE
WHEN YearsWithBank > 5 THEN 'Premium Loan'
WHEN YearsWithBank BETWEEN 1 AND 5 THEN 'Standard Loan'
ELSE 'Basic Loan'
END
WHEN CreditScore BETWEEN 600 AND 750 THEN
CASE
WHEN YearsWithBank > 5 THEN 'Standard Loan'
ELSE 'Basic Loan'
END
ELSE 'Basic Loan'
END AS LoanType
FROM Customers;
Result:
Based on the conditions, only Alice Johnson received a Premium Loan offer because her credit score is higher than 750, and she’s been a bank client for about six years already.
Also read: SQL For Data Science: A Beginner Guide!
We have already seen that the CASE statement is a powerful tool for implementing conditional logic directly in your queries. Below are some of the most common use cases where CASE statements saved the situation:
SELECT
Name,
Age,
CASE
WHEN Age < 18 THEN 'Minor'
WHEN Age BETWEEN 18 AND 64 THEN 'Adult'
ELSE 'Senior'
END AS AgeGroup
FROM Employees;
SELECT
COUNT(CASE WHEN OrderStatus = 'Shipped' THEN 1 END) AS ShippedOrders,
COUNT(CASE WHEN OrderStatus = 'Pending' THEN 1 END) AS PendingOrders,
COUNT(CASE WHEN OrderStatus = 'Cancelled' THEN 1 END) AS CancelledOrders
FROM Orders;
SELECT
CASE
WHEN Salary > 70000 THEN 'Executive'
WHEN Salary BETWEEN 50000 AND 70000 THEN 'Professional'
ELSE 'Entry-Level'
END AS SalaryCategory
FROM Employees;
Performance considerations with CASE statements
Like any other statement, we need to know that understanding their impact on query performance is crucial. Here are some key points to consider when including the CASE statement within your queries:
Also Read: SQL: A Full Fledged Guide from Basics to Advance Level
The CASE statement in SQL is an essential tool for data enthusiasts. It provides a powerful and flexible way to handle complex conditional logic within queries. Similar to how lawyers solve challenging cases, data professionals use the CASE statement to ensure the accuracy and reliability of their analyses and reports. This tool is indispensable for transforming and categorizing data, performing conditional aggregations, and applying conditional formatting in reports, which makes data insights more meaningful and actionable.
In this article, we have explored both the syntax and practical applications of simple and searched CASE statements, demonstrated their use in real-world scenarios, and highlighted best practices for optimizing their performance. By mastering the CASE statement, data analysts and scientists can enhance the effectiveness of their SQL queries, ensuring they deliver precise and insightful results to their clients.