This article was published as a part of the Data Science Blogathon
SQL is one of the most widely used skills when it comes to Data Analysis. Analysts around the world rely on SQL for their analysis. In this article, we will cover the basics of Data Analysis using SQL.
You can also learn the basics of SQL in my previous article on Analytics Vidhya.
The ORDER BY keyword is used to order records based on values in a certain column. It allows you to arrange records in ascending and descending order. For example, let us assume we have sales records in a table called “Orders”. Let us have a look at the data in this table.
SELECT * FROM Orders LIMIT 10;
The output will look like this:
The table contains information about the buyer (USER_ID), the product they bought (PRODUCT_ID), the amount they spent (SUBTOTAL and TOTAL), quantity, and the date of purchase.
Let us order these records in decreasing order of amount spent. This way, the highest-grossing orders will come on top, and smaller orders will come at the bottom. We will use the ORDER BY and DESC keywords for this.
SELECT * FROM Orders ORDER BY Total DESC;
The output will look like this:
You can see that the records are now arranged in decreasing order of the order total. In case you want to see records in increasing (ascending) order, you can use the ASC keyword followed by the ORDER BY. Alternately, you can omit the ASC keyword, and SQL will order records in increasing order by default.
The GROUP BY statement is used to group records based on values in a certain column. For example, we can group all records for a particular user together and then use aggregate functions (SUM, AVG, COUNT, MAX, MIN) on top of it.
Let us try to see the average order value of each user. We will use the GROUP BY statement and the AVG function for this. Also, let us arrange them in decreasing order of average order value.
SELECT User_id, AVG(Total) AS "Average Order Value" FROM Orders GROUP BY User_id ORDER BY AVG(Total) DESC;
The output will look like this:
You can see the users with the highest average order value appear on top.
Aggregate functions are often used in conjunction with the GROUP BY statement. Let us see another example to understand better. Let us see the number of orders placed by each user. To do this, we will use the GROUP BY statement and the COUNT function.
SELECT User_id, COUNT(*) AS "Number of orders" FROM Orders GROUP BY User_id;
COUNT(*) will count the number of records associated with each USER_ID. The output will look like this:
We can see the number of orders associated with each user.
JOINs in SQL allow you to join two or more tables on a common column (field). JOINs are extremely useful because they help us combine data from multiple tables in a single output. Let us understand this with an example.
The Orders table contains information about customers’ orders. The Products table contains information about different products. Let us join these two tables.
SELECT User_id, Total, Discount, Product_id, Vendor, Price, Rating FROM Orders JOIN Products ON Orders.Product_id = Products.id;
The JOIN clause allows us to join the Orders table with the Products table on a common column. The ON clause lets us specify the columns on which the tables should be joined.
The output will look like this:
We can see the Orders table has been joined on the Products table. We can see columns from both tables together in a single output.
We can also combine JOINs with the GROUP BY statement or the ORDER BY keyword and do more meaningful analyses. Let us see an example of this.
We will see the total amount spent by different users, and the total cost of the products they had bought. We will then arrange records by the total amount spent by users in decreasing order.
SELECT User_id, SUM(Total) AS "Total amount spent", SUM(Price) AS "Total cost", AVG(Rating) AS "Average Rating" FROM Orders JOIN Products ON Orders.Product_id = Products.id GROUP BY User_id ORDER BY SUM(Total) DESC;
The output will look like this:
We have combined the two tables, Orders and Products, and then used the GROUP BY statement and ORDER BY in the same query to see the total amount spent and the total cost of products bought by different users.
JOINs can be used in a variety of ways to solve different business use cases. There are 5 major types of JOINs in SQL.
1. INNER JOIN: An Inner Join returns all records from both the tables where the join condition is satisfied.
2. LEFT (OUTER) JOIN: A Left Join returns all records in the left table, and matches records from the right table.
3. RIGHT (OUTER) JOIN: A Right Join returns all records in the right table, and matches records from the left table. A right join works similar to a left join.
4. FULL (OUTER) JOIN: A Full Join returns all records from both tables.
5. SELF JOIN: A Self Join is used to join a table on its own. It is quite useful in some use cases.
Source: https://learnsql.com/blog/learn-and-practice-sql-joins/
The WHERE statement helps us in filtering records based on values in a column. Let us look at an example to see how it works. We will look at orders where the total field is more than 100.
SELECT * FROM Orders WHERE Total > 100;
The output will look like this:
We can see that the result only contains orders where the total is more than 100.
The LIKE operator is used with the WHERE statement to look for a pattern in a column. We can also use wildcards (like % and _) with the LIKE operator. The % sign represents 0 or more than 0 characters. The _ sign represents exactly 1 character.
If we want to filter records where the product name contains “Shoes”, we can use the LIKE operator.
SELECT * FROM Products WHERE Title LIKE '%Shoes%';
The output will look like this:
We can see that all the records have “Shoes” in their title. The LIKE operator is very useful when columns contain string values.
The AND and OR operators are used to combine multiple conditions. For instance, if we want to filter records where the product title contains shoes and the price is less than 50, we can use the AND operator.
SELECT * FROM Products WHERE Title LIKE '%Shoes%' AND Price < 50;
This will filter out records where the product title contains shoes, and the price is less than 50. The output will only contain records where both these conditions are met. The output will look like this:
Only 3 records are returned in the output. Well, guess what. The shoes are expensive!
If we want to filter records where the price is less than 25 or the rating is more than 4.5, we can use the OR operator.
SELECT * FROM Products WHERE Price < 25 OR Rating > 4.5;
The output will look like this:
We can see that the output contains records where the price is less than 25 or the rating is more than 4.5. It returns all records where at least one of these conditions is met.
The AND and OR operators can be used in conjunction also to create custom filters.
In this article, we looked at the basics of Data Analysis in SQL. This article was written by Vishesh Arora. You can connect with me on LinkedIn. You can also read my other blogs on Analytics Vidhya. To know more about me, you can check out my portfolio.