SQL proficiency is crucial in data science. In this article, we'll walk through two SQL questions that product companies use to screen applicants for data scientist roles. Both questions are taken from the StrataScratch website.
StrataScratch is an excellent tool for anyone who wants to get started in data science and improve their SQL and Python skills. The platform offers coding questions as well as non-coding topics related to data science, such as statistics and probability. I strongly advise you to create an account on the StrataScratch website and practice the questions alongside this article. I will use a PostgreSQL database to solve the problems.
If you know SQL well, you stand a better chance of clearing data science interviews and handling day-to-day tasks efficiently. This article focuses on the approach to solving each problem, so that after reading it you will have a clearer idea of how to break a given problem down into a solution. Let's get to the questions.
Find the total number of downloads for paying and non-paying users by date. Include only records where non-paying customers have more downloads than paying customers. The output should be sorted by earliest date first and contain 3 columns: date, non-paying downloads, and paid downloads.
Interview Question Date: November 2020, Company: Microsoft, Difficulty Level: Medium, Interview Question ID: 10300, Tables: ms_user_dimension (fields: user_id (int), acc_id (int)), ms_acc_dimension (fields: acc_id (int), paying_customer (varchar)), ms_download_facts (fields: date (datetime), user_id (int), downloads (int))
Preview of table ms_user_dimension:
Preview of table ms_acc_dimension:
Preview of table ms_download_facts:
Approach:
Three tables are provided. To solve the problem, we must determine the number of downloads made each day by paying and non-paying customers. The problem can be divided into three parts. In the first part, we join all of the tables. In the second, we calculate the paid and non-paid downloads for each date. Finally, we keep only the dates with more non-paid downloads than paid downloads.
Join the three tables so that every download row carries its account's paying status:
select date, downloads, paying_customer
from ms_user_dimension
inner join ms_acc_dimension
on ms_user_dimension.acc_id = ms_acc_dimension.acc_id
inner join ms_download_facts
on ms_user_dimension.user_id = ms_download_facts.user_id;
Total the downloads for each date, split into paying and non-paying customers:
select date,
sum(case when paying_customer = 'yes' then downloads end) as paid_downloads,
sum(case when paying_customer = 'no' then downloads end) as non_paid_downloads
from ms_user_dimension inner join ms_acc_dimension
on ms_user_dimension.acc_id = ms_acc_dimension.acc_id
inner join ms_download_facts
on ms_user_dimension.user_id = ms_download_facts.user_id
group by date;
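To see what the conditional sums produce, the aggregation can be run against a tiny in-memory database. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the sample rows are hypothetical, not the data behind the StrataScratch question.

```python
import sqlite3

# Hypothetical sample rows; the real StrataScratch tables hold different data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table ms_user_dimension (user_id int, acc_id int);
    create table ms_acc_dimension (acc_id int, paying_customer varchar);
    create table ms_download_facts (date text, user_id int, downloads int);

    insert into ms_user_dimension values (1, 10), (2, 20);
    insert into ms_acc_dimension values (10, 'yes'), (20, 'no');
    insert into ms_download_facts values
        ('2020-08-15', 1, 5), ('2020-08-15', 2, 8),
        ('2020-08-16', 1, 9), ('2020-08-16', 2, 3);
""")

# Each case expression passes a row's downloads to only one of the two sums,
# so every date ends up with one paid total and one non-paid total.
daily_totals = conn.execute("""
    select date,
           sum(case when paying_customer = 'yes' then downloads end) as paid_downloads,
           sum(case when paying_customer = 'no' then downloads end) as non_paid_downloads
    from ms_user_dimension
    inner join ms_acc_dimension
        on ms_user_dimension.acc_id = ms_acc_dimension.acc_id
    inner join ms_download_facts
        on ms_user_dimension.user_id = ms_download_facts.user_id
    group by date
    order by date
""").fetchall()

for row in daily_totals:
    print(row)
```

With these sample rows, user 1 belongs to a paying account and user 2 to a non-paying one, so each date collapses into a single row holding both totals.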
Keep only the dates where non-paying downloads exceed paying downloads, and sort by date:
select date,
sum(case when paying_customer = 'no' then downloads end) as non_paid_downloads,
sum(case when paying_customer = 'yes' then downloads end) as paid_downloads
from ms_user_dimension inner join ms_acc_dimension
on ms_user_dimension.acc_id = ms_acc_dimension.acc_id
inner join ms_download_facts
on ms_user_dimension.user_id = ms_download_facts.user_id
group by date
having sum(case when paying_customer = 'no'
then downloads end) >
sum(case when paying_customer = 'yes' then downloads end)
order by date;
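One detail of this query is worth checking on sample data: a case expression without an else branch yields NULL, so a date on which one group has no downloads at all gets a NULL total, and the having comparison then drops that date. The sketch below, again using sqlite3 as a stand-in for PostgreSQL with hypothetical rows, runs the query as written above and a variant with "else 0". The variant and the helper build_query are illustrative assumptions, not part of the original solution.

```python
import sqlite3

# Hypothetical sample rows chosen to cover three situations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table ms_user_dimension (user_id int, acc_id int);
    create table ms_acc_dimension (acc_id int, paying_customer varchar);
    create table ms_download_facts (date text, user_id int, downloads int);

    insert into ms_user_dimension values (1, 10), (2, 20);
    insert into ms_acc_dimension values (10, 'yes'), (20, 'no');
    insert into ms_download_facts values
        ('2020-08-15', 1, 5), ('2020-08-15', 2, 8),   -- non-paying ahead
        ('2020-08-16', 1, 9), ('2020-08-16', 2, 3),   -- paying ahead
        ('2020-08-17', 2, 4);                         -- only non-paying downloads
""")


def build_query(fallback: str) -> str:
    """Build the final query; fallback is "" (as in the article) or "else 0"."""
    non_paid = f"sum(case when paying_customer = 'no' then downloads {fallback} end)"
    paid = f"sum(case when paying_customer = 'yes' then downloads {fallback} end)"
    return f"""
        select date, {non_paid} as non_paid_downloads, {paid} as paid_downloads
        from ms_user_dimension
        inner join ms_acc_dimension
            on ms_user_dimension.acc_id = ms_acc_dimension.acc_id
        inner join ms_download_facts
            on ms_user_dimension.user_id = ms_download_facts.user_id
        group by date
        having {non_paid} > {paid}
        order by date
    """


article_rows = conn.execute(build_query("")).fetchall()
variant_rows = conn.execute(build_query("else 0")).fetchall()

print(article_rows)  # 2020-08-17 is missing: its paid total is NULL
print(variant_rows)  # 2020-08-17 is kept: its paid total is 0
```

Whether such dates should appear in the output depends on how the question is read, so treat the "else 0" form as an option to consider rather than a correction.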
This medium-level question was asked by Facebook/Meta in one of its interviews. The question is named Highest Energy Consumption, and its details are given below:
Find the date with the highest total energy consumption from the Meta/Facebook data centers. Output the date along with the total energy consumption across all data centers.
Interview Question Date: March 2020, Company: Meta/Facebook, Difficulty Level: Medium, Interview Question ID: 10064, Tables: fb_eu_energy (fields: date (datetime), consumption (int)), fb_asia_energy (fields: date (datetime), consumption (int)), fb_na_energy (fields: date (datetime), consumption (int))
Preview of table fb_eu_energy:
Preview of table fb_asia_energy:
Preview of table fb_na_energy:
Approach: The problem will be divided into three sections. We will combine the records from the tables in the first section. The total energy consumed each day will be determined in the second part. Finally, we must determine the date on which the most energy was consumed and return the result.
Step 1: Combine the Tables
Because the data is spread across three tables, we must first combine all of their records. We can't use union here, because union removes duplicate rows and the same row can appear in more than one table. For instance, the record (2020-01-01, 400) appears in both fb_eu_energy and fb_na_energy, so union would keep only one copy and the total for that date would come out too low. We therefore use union all, which keeps every row, duplicates included.
select date, consumption from fb_eu_energy
union all
select date, consumption from fb_asia_energy
union all
select date, consumption from fb_na_energy;
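The difference between the two operators is easy to check on a small dataset. The sketch below uses Python's sqlite3 module as a stand-in for PostgreSQL. The row (2020-01-01, 400) in fb_eu_energy and fb_na_energy comes from the example above; every other row, and the helper name combined_rows, is a hypothetical choice for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table fb_eu_energy (date text, consumption int);
    create table fb_asia_energy (date text, consumption int);
    create table fb_na_energy (date text, consumption int);

    -- (2020-01-01, 400) appears in both the EU and NA tables.
    insert into fb_eu_energy values ('2020-01-01', 400), ('2020-01-02', 350);
    insert into fb_asia_energy values ('2020-01-01', 650);
    insert into fb_na_energy values ('2020-01-01', 400), ('2020-01-02', 125);
""")


def combined_rows(operator: str) -> list:
    """Combine the three tables with either 'union' or 'union all'."""
    query = f"""
        select date, consumption from fb_eu_energy
        {operator}
        select date, consumption from fb_asia_energy
        {operator}
        select date, consumption from fb_na_energy
    """
    return conn.execute(query).fetchall()


union_rows = combined_rows("union")
union_all_rows = combined_rows("union all")

# union collapses the two identical rows into one, union all keeps both.
print(len(union_rows), len(union_all_rows))
```

Summing the 2020-01-01 rows gives 1050 after union but 1450 after union all, which is why the deduplicating form would understate that day's consumption.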
Step 2: Calculate the Total Amount of Energy Consumed for Each Day
After combining all the records, we sum the energy consumption for each day by grouping the combined rows on the date column.
select date, sum(consumption) as total_consumption
from (
select date, consumption from fb_eu_energy
union all
select date, consumption from fb_asia_energy
union all
select date, consumption from fb_na_energy
) E
group by date;
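As a quick check of the grouping step, the same query can be run on a few rows. This is a sketch with sqlite3 standing in for PostgreSQL, and apart from the shared row (2020-01-01, 400) the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table fb_eu_energy (date text, consumption int);
    create table fb_asia_energy (date text, consumption int);
    create table fb_na_energy (date text, consumption int);

    insert into fb_eu_energy values ('2020-01-01', 400), ('2020-01-02', 350);
    insert into fb_asia_energy values ('2020-01-01', 650);
    insert into fb_na_energy values ('2020-01-01', 400), ('2020-01-02', 125);
""")

# The inner query stacks all rows from the three tables; the outer query
# then adds up consumption once per date.
daily_consumption = conn.execute("""
    select date, sum(consumption) as total_consumption
    from (
        select date, consumption from fb_eu_energy
        union all
        select date, consumption from fb_asia_energy
        union all
        select date, consumption from fb_na_energy
    ) E
    group by date
    order by date
""").fetchall()

for row in daily_consumption:
    print(row)
```

The order by date at the end is added here only to make the printed rows deterministic; it is not needed for the totals themselves.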
Step 3: Filter the Records and Format the Result as Specified
Now we must format the query result as the question specifies: the date with the highest total energy consumption across all data centers. To arrange the rows in descending order of total consumption, we use an order by clause on the total consumption. The first row then holds the date with the highest energy consumption, and limit 1 outputs only that row.
select date, sum(consumption) as total_consumption
from (
select date, consumption from fb_eu_energy
union all
select date, consumption from fb_asia_energy
union all
select date, consumption from fb_na_energy
) E
group by date
order by sum(consumption) desc
limit 1;
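One property of limit 1 to keep in mind is that it returns a single row even when several dates share the highest total. The sketch below, using sqlite3 in place of PostgreSQL and hypothetical rows built to produce a tie, compares the query above with a variant that filters on the maximum total. The variant, including the CTE name daily, is an illustrative alternative and not the article's solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table fb_eu_energy (date text, consumption int);
    create table fb_asia_energy (date text, consumption int);
    create table fb_na_energy (date text, consumption int);

    -- Hypothetical rows: both dates add up to 1400.
    insert into fb_eu_energy values ('2020-01-01', 400), ('2020-01-02', 500);
    insert into fb_asia_energy values ('2020-01-01', 600), ('2020-01-02', 500);
    insert into fb_na_energy values ('2020-01-01', 400), ('2020-01-02', 400);
""")

COMBINED = """
    select date, consumption from fb_eu_energy
    union all
    select date, consumption from fb_asia_energy
    union all
    select date, consumption from fb_na_energy
"""

# The article's form: sort by total and keep the first row only.
top_one = conn.execute(f"""
    select date, sum(consumption) as total_consumption
    from ({COMBINED}) E
    group by date
    order by sum(consumption) desc
    limit 1
""").fetchall()

# Variant: keep every date whose total equals the maximum total.
top_with_ties = conn.execute(f"""
    with daily as (
        select date, sum(consumption) as total_consumption
        from ({COMBINED}) E
        group by date
    )
    select date, total_consumption
    from daily
    where total_consumption = (select max(total_consumption) from daily)
    order by date
""").fetchall()

print(top_one)        # one row; which tied date appears is not guaranteed
print(top_with_ties)  # both tied dates
```

If the data is known to have a single maximum, the two forms return the same row and the shorter limit 1 query is sufficient.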
In this article, we looked at two SQL questions and how to solve them efficiently. We have seen union all, group by, the having clause, case expressions, and order by with limit, and how they are used to solve the questions. When attempting to solve any complex problem, keep the following points in mind
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.