This article was published as a part of the Data Science Blogathon.
SQL stands for Structured Query Language
As its name suggests, SQL follows a structured pattern to query or fetch data from a relational database. In SQL, JOINs are powerful tools that make it easier to get the data stored in different tables.
In the real world, companies usually store the data on different small tables. So getting the required data for analysis essentially needs to go through the process of connecting different tables. Hence, interviewers always want to know how efficiently you can use SQL JOINs. And that’s why in data science job interviews, JOINs are always favorite topics.
I listed here the 5 most frequently asked interview questions on SQL JOINs and how you can answer them. I gathered them through my interview experience at different companies.
Let’s start…
This is the most straightforward interview question.
The JOIN
command in SQL is used to combine data from two or more tables based on the common columns between these tables.
When the data you need resides in different tables, SQL JOIN allows you to combine these tables in one or more common columns. And then you can select the records, which may or may not have columns from both the tables.
Based on how you would like to combine two or more tables and select records, there are 4 fundamental types of JOINs in SQL.
INNER JOIN
: It combines two tables so that the common columns on which the join operation is performed have matching values in both the tables. Therefore it is used to retrieve records that are common in both tables. Instead of writing INNER JOIN, you can write only JOIN. By default, it will be taken as INNER JOIN.LEFT JOIN
: This is used to retrieve all records from the left table and the matching records from the right table. However, when there is no matching row in the right table, this type of JOIN returns NULL for certain rows in the left table. This join is also called a LEFT OUTER JOIN.RIGHT JOIN
: It is used to retrieve all records from the right table and the matching records from the left table. Similar to the previous join type, for certain records in the right table, when there is no matching row in the left table, this type of JOIN returns NULL. This join is also called a RIGHT OUTER JOIN.FULL JOIN
: As its name suggests, it is used to retrieve all the records from both the tables. So the result set of this join will contain all the rows from both the tables. When there is no matching row in the left or right table, a NULL value is returned at that position. This join is also called a FULL OUTER JOINHere instead of stating only the types of JOINs, you should try to explain them in 1–2 sentences. This certainly gives the interviewer an idea that you know more about the joins.
This returns all the possible combinations of two tables. So each row of the first table combines with each row of the second table.
Ultimately, the total number of records returned by CROSS JOIN
is essentially the multiplication of the number of records in both the tables, if no WHERE
a clause is used.
The result generated from this JOIN is called the Cartesian Product of the two tables.
Unlike other types of joins, you do not need to mention the ON clause in CROSS JOIN. It can be as simple as shown below.
SELECT t1.column1, t1.column2, t1.column3 FROM Table1 AS t1 CROSS JOIN Table2 AS t2
This is used to combine a table with itself. It might sound confusing if you are a beginner. However, it is beneficial for comparing rows within the same table.
Remember, there is no SELF JOIN
keyword, so you just use JOIN
where both tables involved in the join are the same table. As both the table names are the same, it is essential to use the table alias in the case of SELF JOIN. ✅
Write a SQL query that finds out employees who earn more than their managers
— One of the most frequently asked interview question onSELF JOIN
let’s take this as an example and create a Dummy_Employees dataset as below.
Dummy employees dataset | Image by Author
And try to get find out which employees handle more orders than their manager using this query,
SELECT t1.EmployeeName, t1.TotalOrders FROM Dummy_Employees AS t1 JOIN Dummy_Employees AS t2 ON t1.ManagerID = t2.EmployeeID WHERE t1.TotalOrders > t2.TotalOrders
As expected, it returned employees — Abdul and Maria — who handled more orders than their manager — Pablo.
I got this question in almost 80% of the interviews I faced. So, it is the classic use-case of SELF JOIN
.
You can read more about SELF JOIN and its use in one of my most-read articles –>
And last but not least,
Combining multiple tables is the same as using JOIN between two tables.
This type of join requires consecutive JOIN
operations → first, you join the first and the second table and get an intermediate result set; then, you join another table to this intermediate table.
Let’s see how this works using the example below.
Suppose you want to retrieve orderID, sales in USD, and quantity which are present in three different tables as shown below.
And orderID is the common column between them. So, the JOIN operation will be performed on this column as shown below.
SELECT orders.orderID, sales.salesUSD, products.quantity FROM orders INNER JOIN sales ON orders.orderID = sales.orderID INNER JOIN products ON orders.orderID = products.orderID
This gives only a single row output, as you used INNER JOIN
, and only orderID = A1234
is present in all the tables.
Likewise, you can use other types of JOINs as well!
I hope you found this article useful to know and understand interview questions about SQL JOINs. Knowing different types of JOINs and their uses will certainly assist you in acing the job interview.
Ultimately, you learned in this article –
Along with basic types of JOINs, you now have a more conceptual understanding of CROSS JOINs and SELF JOINs.
If you found this piece of article useful, feel free to join me on Medium & Sign up for my E-mail list
Thank you for reading!
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.