This article was published as a part of the Data Science Blogathon
SQL is one of the most important skills for an Analyst. SQL helps you query your data, which helps you find answers to all your questions and validate hypotheses. In this article, I’ve listed some of the most important functions you should know if you are starting out as an Analyst.
Let us imagine we have a database called ‘PRODUCTS’ which contains information on different products available in a store. What if we want to look at the entire data at once? We can do that using the SELECT statement and FROM statement in SQL. You can use the code below:
SELECT Category, Vendor, Price FROM PRODUCTS;
You will get the following output:
The SELECT statement allows you to select columns from the database. The FROM statement is to indicate which table the data needs to be picked up from.
In case you don’t want to type out individual column names, and want to select all the columns in the table, you can use * to denote that:
SELECT * FROM PRODUCTS;
This will return all columns present in the PRODUCTS table.
From the data, it is clear that the category column contains a set of fixed values. What if we want to select all distinct values present in the category column? We can use the DISTINCT statement for that. You can use the code below:
SELECT DISTINCT Category FROM PRODUCTS;
You will get the following output:
What if we only want to select only those records when a certain criterion is met? We can use the WHERE statement in such a situation. WHERE acts as a filter that helps us to filter rows basis certain conditions. For instance, if we want to filter rows where the price is more than 60, we can use the code below:
SELECT Category, Vendor, Price, Rating FROM PRODUCTS WHERE Price > 60;
You will get the following output:
You can see that the price is greater than 60 in all records.
We can also use multiple conditions (filters) under where statement using the AND clause. For instance, the below code will filter out records where the price is more than 60 and the rating is more than 3.
SELECT Category, Vendor, Price, Rating FROM PRODUCTS WHERE Price > 60 AND Rating > 3;
What if we want to see all records sorted by rating, or by price? The ORDER BY clause in SQL will help us do that. ORDER BY helps us in ordering records based on the values present in a column.
SELECT Category, Vendor, Price, Rating FROM PRODUCTS WHERE Price > 50 ORDER BY Rating;
The above code will list all records where the price is more than 50 in increasing order of rating. The output of the above piece of code will look like this:
It will order records in ascending order by default, but we can also arrange records in descending order by specifying DESC at the end of the ORDER BY statement.
SELECT Category, Vendor, Price, Rating FROM PRODUCTS WHERE Price >= 50 ORDER BY Rating DESC;
The above code will list out records in decreasing order of rating.
MIN helps us in identifying the minimum value in a column, MAX in the maximum, and AVG in the average value in a column. Let us use these functions to calculate the minimum, maximum, and average value in the price column. You can use the code below:
SELECT MIN(Price), MAX(Price), AVG(Price) FROM PRODUCTS;
The output will look like this:
If you want to change the column (field) name and replace it with an alias, you can use the AS clause.
SELECT MIN(Price) AS "Minimum Price", MAX(Price) AS "Maximum Price", AVG(Price) AS "Average Price" FROM PRODUCTS;
The output will look like this:
The GROUP BY clause in SQL is used to group records that have the same value in a particular column. For example, you can group on the category column in our dataset and see what is the average price in each category. You can use the below code:
SELECT Category, AVG(Price) AS "Average Price" FROM PRODUCTS GROUP BY Category;
The output will look something like this:
The GROUP BY clause is often used in combination with aggregate functions like AVG, MIN, MAX, and COUNT. You can also use the HAVING clause in conjunction with the GROUP BY clause to filter records.
SELECT Category, AVG(Price) AS "Average Price" FROM PRODUCTS GROUP BY Category HAVING MIN(Price) > 20;
The HAVING statement at the end will help us filter out categories where the minimum price is less than 20. The output will now contain only those categories where the minimum price is more than 20.
The IN operator helps you give a range of values instead of a single value. It is often used with the WHERE clause. For example, look at the code below:
SELECT Title, Category, Vendor, Price, Rating FROM PRODUCTS WHERE Category IN ('Gizmo', 'Gadget') ORDER BY Price DESC;
The above code will give us all products in the Gizmo and Gadget category and arrange them in decreasing order of price. The output will look like this:
BETWEEN is a useful operator that helps us filter values between a range. You can specify the minimum and maximum values of the range, and BETWEEN will filter records that lie in between these values. Keep in mind that BETWEEN is inclusive on both ends.
SELECT Title, Category, Vendor, Price FROM PRODUCTS WHERE Price BETWEEN 25 AND 50;
The above code will filter out values where the Price value is between 25 and 50. BETWEEN is often used with dates to filter records within a certain date range.
In this article, we learnt about the most commonly used SQL functions and operators that are important for every Analyst. This article is written by Vishesh Arora. You can connect with me on LinkedIn.