SQL is a database programming language created for managing and retrieving data from Relational databases like MySQL, Oracle, and SQL Server. SQL(Structured Query Language) is the common language for all databases. In other terms, SQL is a language that communicates with databases. It is a query language used to store and retrieve data from databases. The knowledge of SQL is necessary for students and working professionals who want to become software engineers, especially if they work in the software development world.
Learning Objectives
This article was published as a part of the Data Science Blogathon.
A database is a grouping of structured or unstructured data that has been organized. The database’s primary goal is to store a vast amount of data. Data is information that has been converted into a format useful for processing or moving around. Everything we express or communicate is data. Data in the memory can be text, numbers, or bytes. Every form of communication involves the transfer of data, which is the most important part of the computer world. Tables are used to store data in databases, depending on the type of database.
Data Base Management System (DBMS) is software for managing databases. Database management solutions allow users to share data quickly, securely, and efficiently. It provides a better platform for implementing security and privacy laws.
Using a database, it is easy to manage data efficiently. Any number of records can be stored. As the data is well structured, it is easy to locate data quickly. It is simple to import data into other programs. Data in a database is safer than in physical paper files.
1. Relational Database: A Relational database contains data and gives users access, and these points are connected to one another. Each row in a table is a record in a Relational database, and the key identifies each record individually. The attributes of the data are stored in the table’s columns, and since each record has a value for each attribute, it is simple to determine the relations between points.
2. Object-Oriented Database: An object-oriented database is a system used in object-oriented programming to represent information or data as objects. It combines object-oriented principles with ideas from Relational databases. Data encapsulation, inheritance, and polymorphism are the three pillars of OOPs. Maintenance is easy, and there is less code to write.
3. Hierarchical Database: A hierarchical database is a data model in which information is kept in records. These are arranged into a parent-child architecture that resembles a tree, with one parent node having several child nodes connected by links.
4. Network Database: A Network database is another type in which several records or files may be connected to owner files and vice versa. Using a Network database, entities can have flexible relationships. This model is an improvement to the hierarchical database model. Computer network systems are built using these models.
The abbreviation “ACID” stands for “atomicity, consistency, isolation, and durability” in the context of transaction processing. These four properties are required for any SQL transaction. Let’s see them one by one.
Atomicity: Any operation performed on the data should be executed either entirely or not executed at all. Partial execution is not possible in SQL transactions.
Let’s see it in detail with an example.
Here there are two accounts, A and B. Before the transaction, there was a balance of 3000 in A and 1000 in B. The operation here is 1000 has to be transferred from A to B. Read operation happens before the transaction, and the write operation happens after the transaction. In total, there are two operations here. Money has to be debited from A, and money has to be credited to B. If only one operation happens and the transaction has stopped, then money will be debited from A, but it won’t be credited to B, which results in a loss. To avoid these kinds of situations, SQL follows the atomicity property. If the second operation doesn’t place, then the first operation will not be considered, and money won’t be debited. In this way, transactions take place safely.
Consistency: Before and after the transaction, the data must be consistent. The most important thing when it comes to data is to maintain integrity.
In the previous example, Before the transaction, there was a balance of 3000 in A and 1000 in B. After the transaction, there are 2000 in A and 2000 in B.
Total amount before transaction = 3000+1000 = 4000
Total amount after transaction = 2000+2000 = 4000
The total amount is the same before and after the transaction. Data is consistent here.
Isolation: The word isolation tells that transactions are isolated from other transactions. That means if multiple transactions take place at the same time, one must not affect another.
For example, take two transactions. The first transaction includes transferring 500 to B from A. Second transaction includes transferring 1000 to C from A. So two transactions have to take place from A. These two transactions will occur independently. They won’t affect each other at any cost. This property is called Isolation, where transactions are isolated from each other.
Durability: Durability ensures that data stays in the database permanently when a transaction is successfully completed. Data should be so perfectly durable that it can still function even in the case of a system failure or crash. Changes to data are maintained after a transaction is successful and cannot be undone.
In an example of a money transaction, once the money is debited from one account and credited to another, this transaction cannot be undone and maintains durability.
To preserve the consistency and accessibility of the data in the database, the DBMS’s ACID property is vital.
A clustered index is a type of index that arranges the table’s data rows according to their key values, in contrast to a non-clustered index, which keeps the data in one place and the indices in a different place. There is only one clustered index per table in the database. The order in which data is kept in the table, which can only be sorted in one direction, is determined by a clustered index. As a result, each table can only have a single clustered index. A clustered index can be created in an RDBMS using the primary key.
In a non-clustered index, the data is kept in one place and the indexes in another. Pointers connect data and index. The index contains pointers. These pointers store the address of the data where it is stored. Because indices are stored at different locations, a table can have multiple non-clustered indices. The non-clustered index is much less than the clustered index in size.
Extract, Transform, and Load is referred to as ETL. ETL is a process used to extract data from different available sources and transform the data depending on our needs. The data transformation could involve several processes like filtering, sorting, joining, aggregating, cleaning, analyzing data, etc., and finally, loading this data to the final destination location. ETL processes can be used to illustrate any system in which data is extracted from one or more source systems and copied into a destination system where the data is represented differently than at the source.
A stored procedure in SQL is a group of statements that perform some actions. So whenever these actions are to be performed, Instead of writing all the statements repeatedly, we run this SP. Like functions in programming, an SP in SQL. Both of them operate identically. Whenever it is needed, a function is called.
Similarly to that, the SP is executed when needed. By simply calling SP, users and applications can quickly and easily use and reuse them. With the ALTER TABLE command, you can alter the statements in the SP whenever you need to.
Let’s see how to create an SP.
For this, we have to use CREATE PROCEDURE command followed by some SQL statements to implement a particular action.
CREATE PROCEDURE name
AS
variables;
BEGIN;
//statements;
END;
For example, if we have to create an SP that will fetch the student_id and first_name of students from the college table, the code will be,
CREATE PROCEDURE get_names AS
BEGIN
SELECT student_id, first_name
from college
END
To execute it, use the command EXEC procedure_name
EXEC get_names
These are of two types.
1. User-defined procedures: If the user creates it, it is called a User-defined procedure. This can be created in a user-defined or system database but not a resource database.
2. System Procedures: System procedures are included with SQL Server. They are logically present in each system’s sys schema and physically stored in its hidden internal Resource database. The sp_ prefix is used to identify system-stored procedures. So while naming user-defined procedures, we must avoid the prefix sp_.
The most popular language for storing and manipulating data is SQL, which is used in the all the fields like banking, education, security, etc. If you are beginning to learn a new programming language, it is highly recommended to learn SQL because it is quite simple to understand.
The key takeaways of this article are as follows:
I hope you found this article useful. Connect with me on LinkedIn.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.