As an SQL Developer, you regularly work with enormous amounts of data stored in different tables that are present inside databases. It often becomes difficult to extract the information if the data is unorganized. We can solve this problem using Normalization by structuring the database in different forms or stages. This article will help you understand the concept of normalization in DBMS with step-by-step instructions and examples of tables.
We’ll discuss the functional dependencies that may exist in a table and anomalies that occur due to these functional dependencies. We will see the conversion of tables into normal forms to eliminate those anomalies. In this article you will get to understand about the normalization in database with example tables and get proper guide on database normalization with examples
This article was published as a part of the Data Science Blogathon.
Normalization is a technique for organizing the data into multiple related tables to minimize Data Redundancy and Data Inconsistency. It aims to eliminate anomalies in data.
Data inconsistency results from anything that affects data integrity. This can cause the data to be correct in one place and wrong elsewhere it is stored. This can lead to unreliable and meaningless information. It occurs between tables when similar data is stored in different formats in two different tables.
For example, consider the following tables:
LibraryVisitors (StudentID, Student_Name, Student_Address, InTime, OutTime);
Students (StudentID, Student_Name, Student_Address, Department, RollNo, CourseRegistered);
In the above tables, Student_Address is stored in both tables. For each student_id, the address must be the same in those two tables. Both these relations must be considered to retrieve or update the correct address. The issues mentioned arise due to poorly designed/structured databases.
We can eliminate data inconsistency in databases by using constraints on the relations.
Data Redundancy is the condition where the same data is stored at different locations leading to the wastage of storage space.
Examples:
Student Id | Student Name | Course ID | Course Name |
111 | John | C08 | English |
112 | Alice | C08 | English |
111 | John | C02 | French |
In the above table, we have stored student name John twice as he registered for two different courses and course name English twice as two students registered for it. This is called data redundancy. Data redundancy causes many problems in databases.
We can eliminate data redundancy in the databases by the normalization of relations.
Also Read: Different Types of Normalization Techniques
Before diving into normalization, we need to know clearly about functional dependencies.
An attribute is dependent on another attribute if another attribute uniquely identifies it.
It is denoted by A –> B, meaning A determines B, and B depends upon A.
Example: We can find the Student’s name using the Student_ID.
An anomaly is an unexpected side effect of trying to insert, update, or delete a row. Essentially more data must be provided to accomplish an operation than expected.
Consider the following relation:
Retail_Outlet_ID | Outlet_Location | Item_Code | Description | Qty_Available | Retail_Unit_Price |
R1001 | King Street, Hyderabad, 540001 | I1001 | Britannia Marie Gold | 25 | 1600 |
R1002 | Rajaji Nagar, Bangalore, 600341 | I1106 | Cookies | 58 | 1289 |
R1003 | MVP Colony, Visakhapatnam, 500021 | I1200 | Best Rice | 22 | 2000 |
R1001 | King street, Hyderabad | I1309 | Dal | 20 | 1500 |
Here are some of the most common anomalies that happen in database management.
These occur when we cannot insert a new tuple into the table due to a lack of data.
What happens if we try to insert(add) the details of a new retail outlet with no items in its stock?
They happen when the deletion of some data deletes the other required data also (Unintended data loss)
What happens if we try to delete the item of item code I1106?
These happen when an update of a single record requires an update in multiple records.
How many rows will be updated if the retail outlet location of R1002 is changed from King Street to Victoria Street?
This happens when new items are supplied to a retail outlet.
What details do we need to insert?
We have seen insert, delete, update anomalies, and data redundancy in the above-given example. Functional dependencies may lead to anomalies. To minimize anomalies, there is a need to refine functional dependencies using normalization.
Also Read: A beginner’s Guide to Database: Part 1
Database normalization is the process of organizing a relational database in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. It was first proposed by Edgar F. Codd.
“Normal Forms” (NF) are the different stages of Normalization in DBMS:
4NF to 6NF applies to multivalued dependencies and complex table scenarios. In this article, we discuss up to BCNF.
Also Read: Understanding the need for DBMS
A relation R is said to be in 1 NF (First Normal) if and only if:
In the above-taken example of the Retail_Outlets table, we have stored multiple values in an address field, such as street name, city name, and pin code.
What if we want to know about all retail outlets in a given city? We may need to perform some string operations on the address field, which is not preferable. So we need to store all these atomic values in separate fields.
A multi-valued attribute is an attribute that can have multiple values like Contact numbers. They should also be separated like ContactNo1, ContanctNo2,.. to achieve 1st Normal form.
Advantage: 1 NF allows users to use the database queries effectively as it removes ambiguity by removing the non-atomic and multi-valued attributes, which creates major issues in the future while updating and extracting the data from the database.
Limitation: Data redundancy still exists even after 1st Normal form, so we need further normalization in DBMS.
A relation R is said to be in 2 NF (Second Normal) form if and only if:
Suppose we have a composite primary or candidate key in our table. Partial dependency occurs when a part of the primary key (Key attribute) determines the non-key attribute.
In the Retail Outlets table, the Item_Code and Retail_Outlet_ID are key attributes. The item description is partially dependent on Item_Code only. Outlet_Location depends on Retail_Outlet_ID. These are partial dependencies.
To achieve normalization in DBMS, we need to eliminate these dependencies by decomposing the relations.
From the above decomposition, we eliminated the partial dependency.
Advantage: 2 NF attempts to reduce the amount of redundant data in a table by extracting it, placing it in a new table(s), and creating relationships between those tables.
Limitation: There are still some anomalies, as there might be some indirect dependencies between Non-Key attributes, leading to redundant data.
A relation R is said to be in 3 NF (Third Normal Form) if and only if:
A transitive dependency exists when another non-key attribute determines a non-key attribute. In other words, If A determines B and B determines C, then automatically, A determines C.
Some other examples:
Advantage: 3 NF ensures data integrity. It also reduces the amount of data duplication.
It is an upgraded version of the 3rd Normal form. It is also called as 3.5 Normal Form.
A relation R is said to be in 3 NF (Third Normal Form) if and only if:
In simple words, if A –> B, then A cannot be a non-prime Attribute if B is a prime attribute which means that A non-prime attribute cannot determine a prime attribute.
You must be wondering how’s this possible. but Yes, there can be some cases in which the Non-Prime attribute will determine the prime attributes even if the relationship was in the 3rd Normal form. BCNF does not allow this kind of dependency.
Let us understand this better with an example. Look at the below Relation of Student Enrollments table.
Student_ID | Course_Name | Professor |
101 | JAVA | Prof. Java |
102 | C++ | Prof. CPP |
101 | Python | Prof. Python |
103 | JAVA | Prof. Java_2 |
104 | Python | Prof. Python_2 |
In the above relation:
So the (Student_ID & Course_Name) will form the primary key. These 2 will compositely determine all other attributes in the relation. In our case, it is only the professor.
If we observe here, the “Professor” attribute, a non-prime attribute, can determine the Course_Name as each professor teaches only one course. But Course_Name is a prime attribute, and Professor is not a Super Key. That means a non-prime attribute determines the prime attribute.
This is not allowed in BCNF. So, how do we decompose this relation?
Until here, we have seen normal forms up to BCNF. Here are some guidelines to follow while normalizing the database.
The below picture summarizes how to reach the third normal form from an unnormalized form:
Any relational database without normalization may lead to problems like large tables, difficulty maintaining the database as it involves searching many records, poor disk space utilization, and inconsistencies. If we fail to eliminate this kind of problem, it would lead to data integrity and redundancy problems. Normalization of a relational database helps to solve these problems. Normalization applies to a series of transformations in terms of normal forms. Any relation in a database must be normalized to get efficient access to the database. Each Normal form eliminates each type of dependency and improves the data integrity.
Normalization helps a lot with organizing data. Here are some of its advantages:
This article was aimed at making you understand the normalization process and how to apply it when you design a database system. There is another multi-valued dependency that 4NF and 5NF can eliminate. Try to explore those also.
I hope this article helped you to understand the concept of normalization better and also we have talk about the normalization in database with examples. If you have any questions, please let me know in the comments. I wish you great learning ahead.
A. Database normalization is the process of organizing data in a database efficiently. It involves reducing redundancy and dependency by dividing large tables into smaller tables and defining relationships between them.
A. The four types of database normalization are:
a) First Normal Form (1NF)
b) Second Normal Form (2NF)
c) Third Normal Form (3NF)
d) Boyce-Codd Normal Form (BCNF)
A. The 5 rules of database normalization, often referred to as normal forms, are:
a) First Normal Form (1NF): Each column in a table must contain atomic values, and there should be no repeating groups or arrays.
b) Second Normal Form (2NF): The table must be in 1NF, and all non-key attributes must be fully dependent on the primary key.
c) Third Normal Form (3NF): The table must be in 2NF, and there should be no transitive dependencies, meaning no non-key attribute should depend on another non-key attribute.
d) Boyce-Codd Normal Form (BCNF): A stronger version of 3NF where every determinant is a candidate key.
A. 1NF, 2NF, and 3NF refer to different stages of normalization:
a) 1NF (First Normal Form): Ensures that the data is organized without repeating groups and each attribute contains atomic values.
b) 2NF (Second Normal Form): Builds upon 1NF and ensures that non-key attributes are fully dependent on the entire primary key.
c) 3NF (Third Normal Form): Builds upon 2NF and eliminates transitive dependencies by ensuring that non-key attributes are not dependent on other non-key attributes.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
Thanks brother this was helpful