SQL is a universal tool in data science. Irrespective of which language you use as your main tool – you need to know SQL. There are no 2 ways about it. For most of the organisations, SQL is the way to store and retrieve structured data form underlying systems. So, if you are an aspiring data scientist or already a data science professional, having . expertise in SQL is a big boon.
To help our community test themselves on SQL, we designed a SQL skill test as part of DataFest 2017. More than 1500 people registered in the skill test and more than 500 people took the test. Below is the distribution of scores:
You can access the final scores here. Here are a few statistics about the distribution:
Mean Score: 18.41
Median Score: 20
Mode Score: 28
1) Which of the following option(s) is/are correct?
A) 1
B) 2
C) Both 1 and 2
D) None of these
Solution: (A)
SQL is a querying language and it is not case sensitive.
2) What is true for a Null value in SQL?
A) 1 and 3
B) 2 and 4
C) 1 and 4
D) 2 and 3
Solution: (A)
NULL represents a unknown value so adding anything to null value will give a null value in result.
3) Which of the following is not an aggregrate function in SQL?
A) MIN()
B) MAX()
C) DISTINCT()
D) COUNT()
Solution: (C)
All of the functions except DISTINCT function given in the question is an example of aggregate function.
4) Which of the following option is true for the below queries?
Query1: select 1 in (1,2,1,2); Query2: select 1 in (1,2);
B) Both queries will give same outputs.
C) Can’t say
Solution: (B)
Both query will return the same output.
5) Which of the following cannot be a superkey in a relational schema with attributes A,B,C,D,E and primary key AD?
B) A C D E
C) A B C E
D) A B D E
Solution: (C)
The attributes “A”, “D” should be present in the super key. Option C doen’t have “D” attribute so C would be the right answer.
Questions Context 6-7
You run the following Queries in the following order:
Create a table “Me” using the below SQL query.
Query1: Create table Me(name varchar(20), salary int);
Next, you create a view based on “Me” table by using the following query.
Query2: Create view me_view as select name from me;
Finally, you run the following query:
Query3: Drop table Me;
6) Which of the following statements are true?
A) 1 and 3
B) 1 and 4
C) 2 and 3
D) 2 and 4
Solution: (B)
Query 2 is used for creating the view on table “Me” so it would run fine but if you run the Query3 it will generate the below error.
ERROR: cannot drop table me because other objects depend on it
DETAIL: view me_view depends on table me
HINT: Use DROP … CASCADE to drop the dependent objects too.
7) Now, you have changed the ‘Query3’ as below.
Query3: DROP TABLE Me CASCADE;
And, you also want to run below query on the same table.
Query4: select * from me_view;
Which of the following statements are true for such cases?
A) 1 and 3
B) 1and 4
C) 2 and 3
D) 2 and 4
Solution: (C)
If you drop the base table using cascade it will drop the base table as well as view table also so Query 3 will run fine but Query 4 will give an error.
8) Imagine, you have a column ‘A’ in table1 which is a primary key and it refers to column ‘B’ in table2.
Further, column ‘A’ has only three values (1,2,3). Which of the following options is / are correct?
A) 1 and 2
B) 2 and 3
C) 1 and 2
D) 3 and 4
Solution: (B)
You can insert any value except the duplicate values in column A in table 1 but you cannot insert the values other then 1,2 and 3 in columns B in table2 due to foreign key integrity because it is referenced by the column A.
9) Consider a table T1 which contains ‘Column A’ in it. Column A is declared as a primary key. Which of the following is true for column “A” in T1?
A) 1 and 4
B) 2 and 4
C) 1 and 3
D) 2 and 3
Solution: (B)
A primary key column cannot contain duplicate and null values.
10) Imagine you have a table “T1” which has three columns “A”, “B” and “C” where A is a “primary key”.
Which of the following query will return number of rows present in the table T1
Query1: SELECT COUNT(A) FROM T1; Query2: SELECT COUNT(*) FROM T1; Query3: SELECT COUNT(A,B) FROM T1;
B) 2 and 3
C) 1 and 3
D) 1, 2 and 3
Solution: (A)
Query1 and Query2 will return the same output.
11) Which of the following statement describes the capabilities of UPDATE command most appropriately?
B) It can update multiple values of a single column
C) It can update one value of multiple columns
D) It can update multiple values of multiple columns
Solution: (D)
12) What is true about indexing in a database?
B) Search will be slower after using indexing
C) Indexing has nothing to do with search
D) None of these
Solution: (A)
Option A is correct. Read more here.
13) Consider three tables T1, T2 and T3. Table T1, T2 and T3 have 10, 20, 30 number of records respectively. Further, there are some records which are common in all three tables.
You want to apply a cartesian product on these three tables. How many rows will be available in cartesian product of these tables?
B) More than 6000
C) Less than 6000
D) None of these
Solution: (A)
14) Tables A, B have three columns (namely: ‘id’, ‘age’, ‘name’) each. These tables have no null values and there are 100 records in each of the table.
Here are two queries based on these two tables A and B.
Query1: SELECT A.id FROM A WHERE A.age > ALL (SELECT B.age FROM B WHERE B.name = 'Ankit') Query2: SELECT A.id FROM A WHERE A.age > ANY (SELECT B.age FROM B WHERE B.name = 'Ankit')
Which of the following statement is correct for the output of each query?
B) The number of tuples in the output of Query 1 will be equal to the output of Query 2
C) The number of tuples in the output Query 1 will be less than or equal to the output of Query 2
D) Can’t say
Solution: (C)
Answer C is correct because natural join always give either same or less number of rows if you compare it with cartesian product. To know more read from this tutorial.
15) What will be the output of the following query in PostgreSQL?
Query 1: SELECT DATE_PART('year', '2012-01-01'::date) - DATE_PART('year', '2011-10-02'::date);
?column? |
365 |
(1 row) |
B)
?column? |
9 |
(1 row) |
C)
?column? |
1 |
(1 row) |
D)
?column? |
305 |
(1 row) |
Solution: (C)
It will give the year differece in output so answer C is correct.
16) Imagine you are given the following table named “AV”.
ID | NAME | DOB |
1 | ANKIT | 1990-09-19 |
2 | FAIZAN | 1993-01-01 |
3 | SUNIL | 1985-11-02 |
4 | SAURABH | 1994-11-12 |
5 | KUNAL | 1983-11-12 |
And you want to run the following queries Q1, Q2 and Q3 given below.
Q1: DROP TABLE AV; Q2: DELETE FROM AV; Q3: SELECT * FROM AV;
Which sequence for the three queries will not result in an error?
B) Q2 -> Q1 -> Q3
C) Q3 -> Q1 -> Q2
D) Q2 -> Q3 -> Q1
Solution: (D)
“DROP TABLE” will drop the table as well as it’s reference. So, you can’t access the table once you have dropped it. But in case of “DELETE TABLE” reference will not be droped so you can still access the table if you use “DELETE TABLE” command.
17) Imagine you are given the following table named “AV”.
id | name | dob | sal |
1 | ANKIT | 1990-09-19 | 200 |
2 | FAIZAN | 1993-01-01 | 300 |
3 | SUNIL | 1985-11-02 | 500 |
4 | SAURABH | 1994-11-12 | 350 |
5 | KUNAL | 1983-11-12 | 600 |
You apply the following query Q1 on AV, which is given below:
Q1: SELECT * FROM AV WHERE SAL BETWEEN 200 AND 500;
What will be the output for query Q1?
Id | Name | Dob | sal |
1 | ANKIT | 1990-09-19 | 200 |
2 | FAIZAN | 1993-01-01 | 300 |
3 | SUNIL | 1985-11-02 | 500 |
4 | SAURABH | 1994-11-12 | 350 |
B)
Id | Name | Dob | sal |
1 | ANKIT | 1990-09-19 | 200 |
2 | FAIZAN | 1993-01-01 | 300 |
4 | SAURABH | 1994-11-12 | 350 |
C)
Id | Name | Dob | Sal |
1 | FAIZAN | 1993-01-01 | 300 |
2 | SUNIL | 1985-11-02 | 500 |
3 | SAURABH | 1994-11-12 | 350 |
D)
Id | Name | Dob | Sal |
1 | FAIZAN | 1993-01-01 | 300 |
2 | SAURABH | 1994-11-12 | 350 |
Solution: (A)
The boundary salaries (200 and 500) will also be in the out so A is the right answer.
18) Imagine you are given the following table named “AV”.
id | name | dob | sal |
1 | ANKIT | 1990-09-19 | 200 |
2 | FAIZAN | 1993-01-01 | 300 |
3 | SUNIL | 1985-11-02 | 500 |
4 | SAURABH | 1994-11-12 | 350 |
5 | KUNAL | 1983-11-12 | 600 |
What would be the output for the following query?
Query: SELECT ID, SUBSTRING(NAME,2,5) "sub_name" FROM AV;
Id | Sub_name |
1 | NKI |
2 | AIZ |
3 | UNI |
4 | AUR |
5 | UNA |
B)
Id | Sub_name |
1 | NKIT |
2 | AIZAN |
3 | UNIL |
4 | AURAB |
5 | UNAL |
C)
Id | Sub_name |
1 | NK |
2 | AI |
3 | UN |
4 | AU |
5 | UN |
Solution: (B)
Question Context 19-21
Assume you are given the two tables AV1 and AV2 which represent two different departments of AV.
AV1 TABLE
Id | name |
1 | ANKIT |
2 | FAIZAN |
3 | SUNIL |
4 | SAURABH |
5 | KUNAL |
AV2 TABLE
Id | name |
1 | DEEPAK |
2 | SWATI |
3 | DEEPIKA |
4 | PRANAV |
5 | KUNAL |
5 | SUNIL |
19) Now, you want the names of all people who work in both the departments. Which of the following SQL query would you write?
B) SELECT NAME FROM AV1 UNION SELECT NAME FROM AV2;
C) SELECT NAME FROM AV1 DIFFERENCE SELECT NAME FROM AV2;
D) None of these
Solution: (A)
INTERSECT would be used for such output.
20) What is the output for the below query?
Query: SELECT NAME FROM AV1 EXCEPT SELECT NAME FROM AV2;
name |
FAIZAN |
SAURABH |
ANKIT |
B)
name |
FAIZAN |
SAURABH |
ANKIT |
SUNIL |
KUNAL |
C) ERROR
D) None of these
Solution: (A)
This query will give the names in AV1 which are not present in AV2.
21) What will be the output of below query?
Query: SELECT NAME FROM AV1 NATURAL JOIN AV2;
name |
SUNIL |
KUNAL |
B)
name |
SUNIL |
C) None of these
Solution: (B)
Question Context 22-24
Suppose you are given the below table called A_V.
Id | Name | Sal | dept |
1 | ANKIT | 100 | DS |
2 | FAIZAN | 200 | DS |
3 | SUNIL | 800 | ALL |
4 | SAURABH | INTERN | |
5 | KUNAL | 1000 | ALL |
22) What is the output for the below query?
Query: SELECT DEPT, AVG(SAL) FROM A_V GROUP BY DEPT,NAME;
dept | avg |
DS | 100.0000000000000000 |
ALL | 800.0000000000000000 |
ALL | 1000.0000000000000000 |
DS | 200.0000000000000000 |
B)
dept | avg |
INTERN | |
DS | 100.0000000000000000 |
ALL | 800.0000000000000000 |
ALL | 1000.0000000000000000 |
DS | 200.0000000000000000 |
C) ERROR
D) None of these
Solution: (B)
23) What is the output for the below query?
Query: SELECT COALESCE(sal,2)+100 AS sal FROM A_V;
Sal |
202 |
302 |
902 |
Null |
1102 |
B)
Sal |
200 |
300 |
900 |
102 |
1100 |
C)
Sal |
202 |
302 |
902 |
102 |
1102 |
D) None of these
Solution: (B)
First replace null value will be replaced to 2 using COALESCE then 100 will be added.
24) What is the output for the below query?
Query: SELECT * FROM a_v WHERE name In ('Ankit', 'Faizan');
Id | Name | Sal | Dept |
1 | ANKIT | 100 | DS |
2 | FAIZAN | 200 | DS |
B) Empty output
C)Error
D)None of these
Solution: (B)
SQL is not case sensitive but when you search for something in a string column it becomes case sensitive. So output will have zero rows because ‘Ankit’ != ‘ANKIT’ and ‘Faizan’ != ‘FAIZAN’.
25) You are given a string ” AnalyticsVidhya “. The string contains two unnecessary spaces – one at the start and another at the end. You find out the length of this string by applying the below queries.
Query1: SELECT length(rtrim(' AnalyticsVidhya ')); Query2: SELECT length(ltrim(' AnalyticsVidhya ')); Query3: SELECT length(rtrim(ltrim(' AnalyticsVidhya '))); Query4: SELECT length(ltrim(rtrim(' AnalyticsVidhya ')));
If op1, op2, op3, op4 are the output of the Query 1, 2, 3 and 4 respectively, what will be the correct relation between these four queries?
B) 2
C) 3
D) 1 and 4
Solution: (D)
Option D is correct. For more information read from this tutorial.
Questions Context 26-27
Below you are given a table “split”.
uk | id |
ANKIT-001-1000-AV1 | 1 |
SUNIL-002-2000-AV2 | 2 |
FAIZY-007-3000-AV1 | 3 |
26) Now, you want to apply a query on this.
Query: SELECT SPLIT_PART(uk, '-', 0) FROM SPLIT;
What is the output for the above query?
split_part |
ANKIT |
SUNIL |
FAIZY |
B)
split_part |
001 |
002 |
007 |
C)
split_part |
1000 |
2000 |
3000 |
D)
split_part |
AV1 |
AV2 |
AV1 |
E) Error
Solution:(E)
The query will give the below error.
ERROR: field position must be greater than zero
27) In the above table “split”, you want to replace some characters using “translate” command. Which of the following will be the output of the following query?
Query: SELECT TRANSLATE(UK, 'ANK', '123') FROM SPLIT;
translate |
123IT-001-1000-1V1 |
SU2IL-002-2000-1V2 |
F1IZY-007-3000-1V1 |
B)
translate |
123IT-001-1000-AV1 |
SUNIL-002-2000-AV2 |
FAIZY-007-3000-AV1 |
C) Error
D)None of these
Solution: (A)
In the above query character “A” will replace to “1”, “B” to 2 and “C” to 3.
28) Which of the following query will list all station names which contain their respective city names. For example, station “Mountain View Caltrain Station” is for city “Mountain View”.
Refer to the table below this question.
Index | Station_name | City |
1 | Mountain View Caltrain Station | Mountain View |
2 | Dlf Square Phase 2 | Dlf Square |
3 | Sikandarpur Metro Gurgaon | Gurgaon |
4 | Akola Station | Akola |
B) select * from station where city like ‘%’ || station_name || ‘%’ ;
C) Error
D) None of these
Solution: (A)
29) Consider the following legal instance of a relational schema S with attributes ABC.
Which of following functional dependencies is/are not possible?
30) Suppose you have a table called “Student” and this table has a column named “marks”. Now, you apply Query1 on “Student” table.
Query 1: SELECT * FROM Student where marks * 100 > 70;
After this, you create an index on column “marks” and then you re-run Query 2 (same as Query 1).
Query 2: SELECT * FROM Student where marks * 100 > 70;
If Query 1 is taking time T1 and Query 2 is taking time T2.
Which of the following is true for the query time?
B) T2 > T1
C) T1 ~ T2
D) None of these
Solution: (C)
To search fast you need to create the index on marks*100 but in the question we have created the index on marks.
31) Suppose you have 1000 records in a table called “Customers”. You want to select top 100 records from it. Which of the below commands can you use?
1. SELECT TOP 100 * FROM Customers;2. SELECT TOP 10 PERCENT * FROM Customers;
B) 2
C) 1 and 2
D) None of them
Solution: (C)
Both query can be used to get the desired output.
32) Which of the following is the outcome of the following query?
Query: SELECT REPLACE( 'Faizan and Ankit are close friends', 'Faizan', 'Ankit' )
B) Ankit and Ankit are close friends
C) Faizan and Faizan are close friends
D) Ankit and Faizan are close friends
Solution: (B)
“Faizan” will be replaced by “Ankit”.
33) Which one of the following queries always gives the same answer as the nested “Query” shown below.
Query: select * from R where a in (select S.a from S)
B) select distinct R.* from R,S where R.a=S.a
C) select R.* from R,(select distinct a from S) as S1 where R.a=S1.a
D) None of above
Solution: (C)
Option C is correct.
Question Context 34-35
Consider the following table “avian” (id, name, sal).
Id | Name | Sal |
1 | Ankit | 20 |
1 | Faizan | 10 |
2 | Faizan | 10 |
3 | Faizan | 20 |
1 | Sunil | 10 |
2 | Sunil | 20 |
1 | Kunal | 10 |
10 | Nikam | 30 |
34) Which of the following options will be required at the end of the following SQL query?
Query: SELECT P1.name FROM avian P1
So that the appended query finds out the name of the employee who has the maximum salary?
B) WHERE P1.sal <= All(select max(P2.sal) from avian P2)
C) WHERE P1.sal > Any (select max(P2.sal) from avian P2)
D) WHERE P1.sal >= Any (select max(P2.sal) from avian P2)
Solution: (D)
B – Returns the addresses of all theaters.
C – Returns null set. max() returns a single value and there won’t be any value > max.
D – Returns null set. Same reason as C. All and ANY works the same here as max returns a single value.
35) Which of the following options can be used to find the name of the person with second highest salary?
B) Both
C) None of these
Solution: (B)
Query in the option B
”(select max(sal) from avian)”
first return the highest salary(say H) then the query
“(select max(sal) from avian where sal < H )”
will search for highest salary which is less then H.
Question Context 36-39
Suppose you are given a database of bike sharing which has three tables: Station, Trip and Weather.
Station Table
station_id | station_name | city | zip_code |
2 | M | S1 | 95113 |
3 | N | S2 | 95112 |
4 | L | S3 | 95114 |
5 | G | S4 | 95115 |
6 | O | San Jose | 95115 |
1 | K | San Jose | 95115 |
Trip Table
Id | Duration | start_time | start_station_name | start_station_id | end_time | end_station_name | end_station_name | bike_id |
5081 | 183 | 2013-08-29 22:08:00 | M | 2 | 2013-08-29 22:12:00 | M | 2 | 309 |
5082 | 100 | 2013-08-01 22:08:00 | N | 3 | 2013-08-01 22:12:00 | L | 4 | 301 |
5083 | 283 | 2013-08-02 22:08:00 | O | 6 | 2013-08-02 22:12:00 | G | 5 | 303 |
5084 | 23 | 2013-08-09 22:08:00 | M | 2 | 2013-08-10 22:12:00 | O | 7 | 305 |
Weather Table
zip_code | max_temp | min_temp |
95113 | 74 | 61 |
95112 | 70 | 21 |
95115 | 91 | 40 |
36) Imagine, you run following query on above schema.
Query: select city , count( station_id ) as cnt from station group by city order by cnt desc , city asc;
Which of the following option is correct for this query?
B) This query will print city name and number of stations sorted by city name in increasing magnitude. For cities with same name, it will print by decreasing order of number of stations.
C) None of these
Solution: (A)
A is correct answer.
37) Which of the following query will find the percentage (round to 5 decimal places) of self-loop trips (which start and end at the same station) among all trips?
select round(self_loop_cnt.cnt * 1.0/trip_cnt.cnt,5) as percentage from (select count(*) as cnt from trip where start_station_id = end_station_id) as self_loop_cnt ,(select count(*) as cnt from trip) as trip_cnt;
B)
select round (self_loop_cnt.cnt ? 1.0 , 5 ) as percentage from ( select count(?) as cnt from trip where start_station_id = end_station_id) as self_loop_cnt;
C)
select round (trip_cnt.cnt , 5 ) as percentage from ( select count(?) as cnt from trip) as trip_cnt;
D) None of these
Solution: (A)
Query in option A will give the desired result
38 Which of the following statements is / are true for the below query?
Query: select station_name from station where zip_code = (select zip_code from weather where max_temp = (select max(max_temp) from weather))
Note: All the zip_code are present in table weather also present in station table
B) 1 and 3
C) 1
D) 1,2 and 3
Solution: (A)
39) What will be the output of the following query?
Query: select end_time , (select sum(duration) from trip as i where i.bike_id = 301 and i.end_time <= o.end_time ) as ac from trip as o where o.bike_id = 301 order by ac asc ;
A)
end_time | ac ---------------------+----- 2013-08-09 22:12:00 | 653
B.
end_time | ac ---------------------+----- 2013-08-01 22:12:00 | 100 2013-08-09 22:12:00 | 653
C) Error
D) None of these
Solution: (B)
This query will find a cumulative traveling durations of bike 301.
Question Context 40-42
Suppose you are given 4 tables: Team, Player, Game and GameStats. Below are the SQL statements which create these tables.
CREATE TABLE Team ( name varchar(50) PRIMARY KEY, city varchar(50)); CREATE TABLE Player ( playerID integer PRIMARY KEY, name varchar(50), position varchar(10), height integer, weight integer, team varchar(50) REFERENCES Team(name), CHECK (position='Guard' OR position='Center' OR position='Forward'));; CREATE TABLE Game ( gameID integer PRIMARY KEY, hometeam varchar(50) REFERENCES Team(name) NOT NULL, awayteam varchar(50) REFERENCES Team(name) NOT NULL, homescore integer, awayscore integer, CHECK (hometeam <> awayteam)); CREATE TABLE GameStats ( playerID integer REFERENCES Player(playerID) NOT NULL, gameID integer REFERENCES Game(gameID) NOT NULL, points integer, assists integer, rebounds integer, PRIMARY KEY (playerID, gameID)
40) Which of the following query will return distinct names of the players who play at “Guard” Position and their name contains “Jo”. (ORDER BY A)
B) SELECT name FROM player WHERE position=’Guard’ AND name LIKE ‘%Jo%’ ORDER BY name
C) Both of them
D) None of them
Solution: (B)
This query Finds any values that have “Jo” in any position using ‘%jo%’ expression in command. Notice that ‘Jo’ is different then ‘jo’ because expression in like operator is case sensitive.
41) What will be the output for the below query?
Query: SELECT COUNT(*) AS num_of_games FROM player p1, player p2, gamestats g1, gamestats g2 WHERE p1.name='Saurabh' AND p2.name='Faizan' AND g1.playerid=p1.playerid AND g2.playerid=p2.playerid AND g1.gameid=g2.gameid AND g1.points > g2.points
B) Return the number of games where ‘Saurabh’ scored less points than ‘Faizan’
C) Error
D) None of these
Solution: (A)
42) What is the expected output of the following query?
Query: SELECT s.playerid, AVG(s.points) AS average_points FROM (SELECT st.playerid, st.points FROM player p, game g, gamestats st WHERE st.gameid=g.gameid AND p.team=g.hometeam AND p.playerid= st.playerid) s GROUP BY s.playerid ORDER BY s.playerid
B) List all players’ playerIDs and their average points in all games that they played in (ORDER BY Players’ playerID)
C) Error
D) None of these
Solution: (A)
I hope you enjoyed the questions and were able to test your knowledge about SQL. Irrespective of what role you are in data science, you need to know SQL. If you haven’t done already, take time out to undergo the test and reflect on where you went wrong.
If you have any questions or doubts, feel free to post them below.
really helpful questions to gauge your basic understanding of sql.Nice work
thanks for your positive feedback
Hi Ankit, I enjoyed your quiz. It made me think and I had a lot of fun unearthing the answers. I have doubts about a couple of questions. 1. For Q.33, isn't option A the more correct option than Option C. 2. For Q.38, aren't both options 1 and 2 correct, given that it has been told all zip codes present in weather are also present in station. Thank you for the quiz.
Hi Nishanthh, I am glad you found them helpful. In question number 33 answer, option A could not be possible because number of rows would be greater than the number of rows asked in the query. IN (20,30,20,30) will be same as IN (20,30) and the output would be (20,30) Thanks for noticing question number 38. Regarding,
Hi Ankit, As mentioned in the solution for Question 1 "SQL is a querying language and it is not case sensitive." the correct option should be "D" not "A". Please let me know if i am missing somethig here.
Hi Sashikant, In option 1, it is written that SQL is case "insensitive" ( not case sensitive ) which is actually true. Best! Ankit Gupta