This article was published as a part of the Data Science Blogathon.
The SQL (Structured Query Language) programming language is used to store, manipulate, and query information that is stored in relational databases. SQL first appeared in 1974 when a group at IBM created a prototype for a relational database. Relational Software (later known as Oracle) released the first commercial relational database.
There are standards for SQL. However, there are different flavors of SQL that can be used on each major RDBMS today. In part, this is due to two factors:
1) the SQL command standard is fairly complex, and it is not practical to implement the entire standard.
2) each database vendor needs a way to differentiate its product.
With the advent of Big Data, SQL has become even more versatile. At first, Big Data technology was geared toward programmers. Eventually, however, Big Data vendors realized they needed to provide a SQL-like layer to enable analysts to query the data set, and they started developing their own versions of SQL.
One of the biggest differences between these types of SQL and SQL that runs on relational databases is the latter does not have the ability to manage data at the row level. It is not possible, for instance, to update an individual row.
This database language is mainly designed for maintaining the data in the relational database management system.
SQL is the standard language for accessing and manipulating databases.
Without training, people view SQL and other coding languages as too difficult to learn. This is one of the reasons employers value SQL-trained workers.
SQL plays a pivotal role in most database management, thus even if an employer isn’t familiar with it, workers can easily explain why it’s important to know it.
In so doing, workers are often able to increase their desirability to employers by developing SQL skills. Here are some of the most common positions that require SQL expertise:
In Data Science – common tasks that use SQL will be.
There are lot more things you can do with SQL:
An effective way to think about what we can be using SQL is with the acronym C.R.U.D.
While this might seem like a slightly informal acronym it is actually a really effective way to describe the core functions or operations that can be performed on a relational database…
Let us take a Look!
A relational database consists of logical units called tables, where tables are related to one another within a database. With relational databases, data can be divided into logical, smaller, and more manageable units for easier maintenance and better performance.
In a relational database system, tables are linked through common keys or fields, so if the desired data is in more than one table, you can easily join them together to get a combined data set using a single query.
Normalization is the process of minimizing redundancy from a relation or set of relations. Redundancy in relation may cause insertion, deletion, and update anomalies. So, it helps to minimize the redundancy in relations. Normal forms are used to eliminate or reduce redundancy in database tables.
The main reason for normalizing the relations is to remove these anomalies. Failure to eliminate anomalies leads to data redundancy and can cause data integrity and other problems as the database grows. Normalization consists of a series of guidelines that helps to guide you in creating a good database structure.
Insertion Anomaly: Insertion Anomaly refers to when one cannot insert a new tuple into a relationship due to a lack of data.
Deletion Anomaly: The delete anomaly refers to the situation where the deletion of data results in the unintended loss of some other important data.
Updatation Anomaly: The update anomaly is when an update of a single data value requires multiple rows of data to be updated.
Partial Dependency: If the proper subset of candidate keys determines a non-prime attribute, it is called partial dependency.
X is a super key.
Y is a prime attribute (each element of Y is part of some candidate key).
We can specify our keys after the table has been created if we do not specify them at table creation time. SQL tables support the following key types:
For communicating with the database, SQL commands are used. By using basic SQL commands, you can do everything from creating a table and adding data to modifying a table and setting user permissions.
Database management systems such as Oracle Database, MySQL, Microsoft SQL Server, PostgreSQL, and MariaDB are some of the most popular options.
There are five types of SQL commands: DDL, DML, DCL, TCL, and DQL.
The Data Definition Language is a set of SQL commands that are used to create a database schema. DDL is a set of SQL commands that create, modify, and delete database structures without affecting the data. It deals with descriptions of the database schema and is used to create and modify database objects in the database. In most cases, a general user would not use these commands, rather they would use an application to access the database.
Here are some commands that come under DDL:
CREATE
ALTER
DROP
TRUNCATE
Syntax:
CREATE TABLE TABLE_NAME (COLUMN_NAME DATATYPES[,....]);
Example:
CREATE TABLE STUDENT(Name VARCHAR2(20), Email VARCHAR2(100), DOB DATE);Â Â
Syntax:
To add a new column in the table
ALTER TABLE table_name ADD column_name COLUMN-definition;
To modify the existing column in the table:
ALTER TABLE table_name MODIFY(column_definitions....);
Example :
ALTER TABLE STUDENT_DETAILS ADD(ADDRESS VARCHAR2(20)); ALTER TABLE STUDENT _DETAILS MODIFY (NAME VARCHAR2(20));Â Â
Syntax
DROP TABLE table_name;Â Â
Example
DROP TABLE STUDENT;Â Â
Syntax:
TRUNCATE TABLE table_name;
Example:
TRUNCATE TABLE STUDENT;Â Â
The SQL commands which manipulate the data in the database belong to DML, or Data Manipulation Language, which includes the majority of SQL statements. It is the part of a SQL statement that controls access to data and to the database. They are usually grouped together with DML statements.
Here are some commands that come under DML:
INSERT
DELETE
UPDATE
Syntax:
INSERT INTO TABLE_NAME (col1, col2, col3,.... col N) VALUES (value1, value2, value3, .... valueN);
Or
INSERT INTO TABLE_NAME VALUES (value1, value2, value3, .... valueN);
Example:
INSERT INTO SQLJOINT (Author, Subject) VALUES ("Roger", "DBMS");Â Â
Syntax:
DELETE FROM table_name [WHERE condition];
Example:
DELETE FROM SQLJOINT WHERE Author=" Roger ";Â
Syntax:
UPDATE table_name SET [column_name1= value1,...column_nameN = valueN] [WHERE CONDITION]
Example:
UPDATE employee SET User_Name = ' Roger ' WHERE Emp_Id = '5'Â Â
Data control language, DCL, is a way to manage data that is stored on databases. It includes commands such as GRANT and REVOKE which mainly deal with the rights, permissions, and other controls of the data.
DCL can be used to grant or revoke access to tables in a database from any database user. The grant command gives users or groups access to specific tables while the revoke command takes away their permissions.
Here are some commands that come under DCL:
Grant
Revoke
Example
GRANT SELECT, UPDATE ON MY_TABLE TO SOME_USER, ANOTHER_USER;Â Â
Example
REVOKE SELECT, UPDATE ON MY_TABLE FROM USER1, USER2;Â
TCL commands can only use with DML commands like INSERT, DELETE and UPDATE only.
These operations are automatically committed in the database that’s why they cannot be used while creating tables or dropping them.
Commands that come under TCL:
Syntax:
COMMIT;
Example:
DELETE FROM USERS WHERE AGE = 22; COMMIT;Â Â
Syntax:
ROLLBACK;
Example:
DELETE FROM USERS WHERE AGE = 22; ROLLBACK;Â Â
Syntax:
SAVEPOINT SAVEPOINT_NAME;Â
DQL is used to fetch the data from the database.
It uses only one command:
SELECT
Syntax:
SELECT expressions FROM TABLES WHERE conditions;
Example:
SELECT emp_name FROM employee WHERE age > 20;
Syntax :
SELECT column
FROM table_name
WHERE conditions
GROUP BY column
ORDER BY columnÂ
Syntax:
SELECT column1, column2 FROM table_name WHERE conditions GROUP BY column1, column2 HAVING conditions ORDER BY column1, column2;Â Â
Syntax:
SELECT column1, column2 FROM table_name WHERE condition ORDER BY column1, column2... ASC|DESC;
Where:
ASC: It is used to sort the result set in ascending order by expression.
DESC: It sorts the result set in descending order by expression.
Syntax :
AVG() or AVG( [ALL|DISTINCT] expression )Â Â
Syntax :
MIN() or MIN( [ALL|DISTINCT] expression )Â Â
Syntax :
MAX() or MAX( [ALL|DISTINCT] expression )
Syntax :
SUM() or SUM( [ALL|DISTINCT] expression )Â Â
Syntax :
COUNT(*) or COUNT( [ALL|DISTINCT] expression )Â Â
Window functions apply to aggregate and ranking functions over a particular window (set of rows). OVER clause is used with window functions to define that window. OVER clause does two things :
Note :
If partitions are not done, then ORDER BY orders all rows of the table.
 Syntax :
SELECT coulmn_name1,
window_function(cloumn_name2),
OVER([PARTITION BY column_name1] [ORDER BY column_name3]) AS new_column
FROM table_name;
window_function= any aggregate or ranking function
column_name1= column to be selected
coulmn_name2= column on which window function is to be applied
column_name3= column on whose basis partition of rows is to be done
new_column= Name of new column
table_name= Name of table
Example :
Find the average salary of employees for each department and order employees within a department by age.
SELECT Name, Age, Department, Salary,
AVERAGE(Salary) OVER( PARTITION BY Department ORDER BY Age) AS Avg_Salary
FROM employee
RANK():Â As the name suggests, the rank function assigns a rank to all the rows within every partition. Rank is assigned such that rank 1 is given to the first row and rows having the same value are assigned the same rank. For the next rank after two same rank values, one rank value will be skipped.
DENSE_RANK():Â It assigns a rank to each row within a partition. Just like the rank function first row is assigned rank 1 and rows having the same value have the same rank. The difference between RANK() and DENSE_RANK() is that in DENSE_RANK(), for the next rank after two same ranks, the consecutive integer is used, and no rank is skipped.
ROW_NUMBER():Â It assigns consecutive integers to all the rows within a partition. Within a partition, no two rows can have the same row number.
Note :
ORDER BY() should be specified compulsorily while using rank window functions.
Values Window Functions :Â
A SQL Join is used to fetch or combine data (rows or columns) from two or more tables based on the defined conditions
Syntax :
SELECT table1.column1,table1.column2,table2.column1,.... FROM table1 INNER JOIN table2 ON table1.matching_column = table2.matching_column;
table1: First table.
table2: Second table
matching_column: Column common to both the tables.
Note: We can also write JOIN instead of INNER JOIN. JOIN is the same as INNER JOIN.
Syntax :
SELECT table1.column1,table1.column2,table2.column1,.... FROM table1 LEFT JOIN table2 ON table1.matching_column = table2.matching_column;
table1: First table.
table2: Second table
matching_column: Column common to both the tables.
Note: We can also use LEFT OUTER JOIN instead of LEFT JOIN, both are the same.
Syntax :
SELECT table1.column1,table1.column2,table2.column1,.... FROM table1 RIGHT JOIN table2 ON table1.matching_column = table2.matching_column;
table1: First table.
table2: Second table
matching_column: Column common to both the tables.
Note:
We can also use RIGHT OUTER JOIN instead of RIGHT JOIN, both are the same
Syntax :
SELECT table1.column1,table1.column2,table2.column1,....
FROM table1
FULL JOIN table2
ON table1.matching_column = table2.matching_column;
table1: First table.
table2: Second table
matching_column: Column common to both the tables.
CROSS JOIN: It is also known as CARTESIAN JOIN, which returns the Cartesian product of two or more joined tables. The CROSS JOIN produces a table that merges each row from the first table with each second table row. It is not required to include any condition in CROSS JOIN.
Syntax:
Select * from table_1 cross join table_2; Or, Select column1, column2, column3 FROM table_1, table_2;Â Â
Syntax:
Select column1, column2, column(s) FROM table_1 Tbl1, table_2 Tbl2 WHERE condition;
The article discusses the importance of SQL in data analysis. A brief explanation of SQL and how it facilitates the analysis of data in business processes is provided.
SQL is used in every data-driven sector, emphasizing the importance of big data processing.
Data scientists, Data analysts, and developers must access data from databases, so SQL has become a regular part of any data-driven enterprise.
Databases are managed using SQL. Data analysts use SQL to query tables of data and gain insights. Similarly, data scientists use SQL to load data into models. Using SQL, data engineers and database administrators can ensure that everybody in their organization has easy and intuitive access to their data.
Almost every website has a SQL backend, which stores everything from user details to engagement across a particular app.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.