Learning SQL after Excel couldn’t be simpler!
I’ve spent more than a decade working on Excel. Yet, there is so much to learn. If you dislike coding, excel could be your rescue into data science world (to some extent). Once you understand Excel operations, learning SQL is very easy.
Now at this stage, you might ask, why can’t I use excel for all my work. There are several reasons for it:
Moving to SQL would address point 1 and point 2 to some extent. Moreover, SQL is one of the most sought out skills in a data scientist.
If you don’t know SQL yet and have worked in Excel, you can get started right now. I’ve designed this tutorial with keeping in mind, the most commonly used excel operations. Your previous experience blended with this tutorial can quickly make you a SQL expert. (Note: If you find any trouble, please write to me in comments section below.
Related : Basics of SQL and RDBMS for Beginners
Here is the list of commonly used excel operation. In this tutorial, I’ve performed all these operations in SQL:
To perform operations listed above, I’ll use the data listed below (Employee)\:
In excel, we can view all the records directly. But, SQL requires a command to process this request. This can be done by using SELECT command.
Syntax:
SELECT column_1,column_2,…column_n | * FROM table_name;
Exercise:
A. View all data of Employee table
Select * from Employee;
B. View only ECODE and Gender data of Employee table
Select ECODE, Gender from Employee
Organizing information becomes important when you have more data. It helps to generate quick inferences. You can quickly organize an excel worksheet by sorting your data in ascending or descending order.
Syntax:
SELECT column_1,column_2,…column_n | * FROM table_name order by column_1 [desc], column_2 [desc];
Exercise:
A. Arrange records of Employee table in Descending order of Total_Payout.
Select * from Employee order by Total_Payout desc;
B. Arrange records of Employee table by City (ascending) and Total_Payout(descending).
Select * from Employee order by City, Total_Payout desc;
In addition to sorting, we often apply filter to analyze data in a better way. When data is filtered, only rows that meet the filter criteria is displayed while other rows get hidden. Also, we can apply multiple criterion to filter data.
Syntax:
SELECT column_1,column_2,…column_n | * FROM table_name where column_1 operator value;
Below are the common list of operators, we can use to form a condition.
Operator | Description |
---|---|
= | Equal |
<> | Not equal. Note: In some versions of SQL this operator may be written as != |
> | Greater than |
< | Less than |
>= | Greater than or equal |
<= | Less than or equal |
BETWEEN | Between an inclusive range |
LIKE | Search for a pattern |
IN | To specify multiple possible values for a column |
Exercise:
A. Filter observations associated with city “Delhi”
Select * from Employee where City="Delhi";
B. Filter observations of Department “Admin” and Total_Payout >= 500
Select * from Employee where Department="Admin" and Total_Payout >=500;
Deleting records or columns is a commonly used operation in Excel. In excel, we simply press ‘Delete’ key on keyboard to delete a record. Similarly, SQL has command DELETE to remove records from a table.
Syntax:
DELETE FROM table_name WHERE some_column=some_value;
Exercise:
A. Delete observations which have Total_Payout >=600
Delete * from Employee where Total_Payout >=600;
It removes two record only since these two observations satisfy the condition stated above. But be careful! if we do not provide any condition, it will remove all records from a table.
B. Delete observations which have Total_Payout >=600 and Department =”Admin”
Delete * from Employee where Total_Payout >=600 and Department ="Admin";
Above command will remove only one record which satisfies the condition.
We have seen methods to remove records, we can also add records to SQL table as we do in excel. INSERT command helps to perform this operation.
Syntax:
INSERT INTO table_name VALUES (value1, value2, value3,…); -> Insert values to all columns
OR,
INSERT INTO table_name (column1,column2,column3,…) VALUES (value1,value2,value3,…); -> Insert values to selected columns
Exercise:
A. Add below records to the table ‘Employee’
Insert into employee values('A002','05-Nov-12',0.8,'Female','Admin',12.05,26,313.3,'Mumbai');
Select * from Employee where ECODE='A002';
B. Insert values to ECODE (A016) and Department (HR) only.
Insert into employee (ECODE, Department) values('A016','HR');
Select * from Employee where Department='HR';
Suppose, we want to update the name of “HR” department to “Manpower” for all employees. For such cases, SQL has a command UPDATE which performs this function.
Syntax:
UPDATE table_name SET column1=value1,column2=value2,… WHERE some_column=some_value;
Exercise: Rename Department “HR” to “Manpower”
Update Employee SET Department='Manpower' where Department='HR';
We can show unique values of variable(s) by applying DISTINCT keyword prior to variable name.
Syntax:
SELECT DISTINCT column_name,column_name FROM table_name;
Exercise: Show unique values of City
Select distinct City from Employee;
In excel, we can create a column, based on existing column using functions or operators. This can be done in SQL using the commands below.
Exercise:
A. Create a new column Incentive which is 10% of Total_Payout
Select *, Total_Payout*01 as Incentive from Employee;
B. Create a new column City_Code which has first three characters of City.
Select *, Left(City,3) as City_Code from Employee where Department="Admin";
For more details on SQL functions, I would recommend you to refer this link.
The most used function of excel by any BI professional / data analyst is VLOOKUP(). It helps to map data from other table to parent table. In other words, we can say that it is the ‘excel’ way of joining 2 data sets through a common key.
In SQL, we have similar functionality known as JOIN.
SQL JOIN is used to combine rows from two or more tables, based on a common field between them. It has multiple types:
Syntax:
SELECT table1.column1, table2.column2..... FROM table1 INNER | LEFT| RIGHT| FULL JOIN table2 ON table1.column = table2.column;
Exercise: Below is city category table “City_Cat”, now I want to map city category to Employee table and show all records of Employee table.Here, I want to show all records of table Employee. So, we will use left join.
SELECT Employee.*,City_Cat.City_Category FROM Employee LEFT JOIN City_Cat ON Employee.City = City_Cat.City;
To know more about JOIN operations, I would recommend you to refer this link.
Pivot Table is an advanced way of analyzing data in excel. Not only it is useful, but it allows you to extract the hidden insights from data.
Moreover, it helps us to generate inference by summarizing data and allow us to manipulate it in different ways. This operation can be done in SQL by using aggregate functions and GROUP BY command.
Syntax:
SELECT column, aggregate_function(column) FROM table WHERE column operator value GROUP BY column;
Exercise:
A. Show sum of Total_Payout by Gender
SELECT Gender, Sum(Total_Payout) from Employee Group by Gender;
B. Show sum of Total_Payout and Count of Records by Gender and City
SELECT Gender, City, Count(City), Sum(Total_Payout) from Employee Group by Gender, City;
Did you find the article useful? Do let us know your thoughts about this transition guide in the comments section below
Just to make sure... Does this mean you can use SQL in Excel instead of filters? or is this an article on how to use SQL by exporting it to a RDBMS?
I am completely new for this (SQL). I want to learn SQL because it will help me in learning of Analytics (R). This beginner information is invaluable for me. Thanks for sharing ! Trust you will continue your Zeal in New Year 2016. I wish you a very Happy New Year 2016 in advance ! Regards, A. Kumar
Really useful blog for beginner like I am.. very basic question .. pls dont laugh :) but on which platform should we write these command .. in excel only ? or any SQL has separate application where we connect excel as a database.