Normalization in DBMS
Normalization in DBMS
Normalization in DBMS is a technique using which you can organize the data in the database
tables so that:
It is a multi-step process that puts data into tabular form, removes duplicate data, and set
up the relationship between tables.
Insertion, Updation, and Deletion Anomalies are very frequent if the database is not
normalized.
In the table above, we have data for four Computer Sci. students.
As we can see, data for the fields branch, hod(Head of Department), and office_tel are
repeated for the students who are in the same branch in the college, this is Data
Redundancy.
Suppose for a new admission, until and unless a student opts for a branch, data of
the student cannot be inserted, or else we will have to set the branch information
as NULL.
Also, if we have to insert data for 100 students of the same branch, then the branch
information will be repeated for all those 100 students.
These scenarios are nothing but Insertion anomalies.
If you have to repeat the same data in every row of data, it's better to keep the
data separately and reference that data in each row.
So in the above table, we can keep the branch information separately, and just use
the branch_id in the student table, where branch_id can be used to get the branch
information.
What if Mr. X leaves the college? or Mr. X is no longer the HOD of the computer
science department? In that case, all the student records will have to be updated,
and if by mistake we miss any record, it will lead to data inconsistency.
This is an Updation anomaly because you need to update all the records in your table
just because one piece of information got changed.
In our Student table, two different pieces of information are kept together,
the Student information and the Branch information.
So if only a single student is enrolled in a branch, and that student leaves the college,
or for some reason, the entry for the student is deleted, we will lose the branch
information too.
So never in DBMS, we should keep two different entities together, which in the above
example is Student and branch,
The solution for all the three anomalies described above is to keep the student information
and the branch information in two different tables. And use the branch_id in the student
table to reference the branch.
As you can see in the table above, the student_id column is a primary key because using
the student_id value we can uniquely identify each row of data, hence the remaining
columns then become the non-key attributes.
Let's cover all the Database Normal forms one by one with some basic examples to help you
understand the DBMS normal forms.
For a table to be in the First Normal Form, it should follow the following 4 rules:
Watch this YouTube video to understand First Normal Form (if you like videos) - DBMS First
Normal Form 1NF with Example
Hence the above table fails to pass the First Normal form.
So how do you fix the above table? There are two ways to do this:
1. Remove the emp_skills column from the Employee table and keep it in some other
table.
2. Or add multiple rows for the employee and each row is linked with one skill.
emp_id emp_skill
1 Python
1 JavaScript
2 HTML
2 CSS
2 JavaScript
emp_id emp_skill
3 Java
3 Linux
3 C++
You can also simply add multiple rows to add multiple skills. This will lead to repetition of the
data, but that can be handled as you further Normalize your data using the Second Normal
form and the Third Normal form.
If you want to learn about the First Normal Form in detail, check out DBMS First Normal
Form tutorial.
Watch this YouTube video to understand Second Normal Form (if you like videos) - DBMS
Second Normal Form 2NF with Example
Let's take an example to understand Partial dependency and the Second Normal Form.
When a table has a primary key that is made up of two or more columns, then all the
columns(not included in the primary key) in that table should depend on the entire primary
key and not on a part of it. If any column(which is not in the primary key) depends on a part
of the primary key then we say we have Partial dependency in the table.
If we have two tables Students and Subjects, to store student information and information
related to subjects.
Student table:
1 Akon CSE
2 Bkon Mechanical
Subject Table:
subject_id subject_name
1 C Language
2 DSA
3 Operating System
And we have another table Score to store the marks scored by students in any subject like
this,
1 1 70 Miss. C
student_id subject_id marks teacher_name
1 2 82 Mr. D
2 1 65 Mr. Op
Now in the above table, the primary key is student_id + subject_id, because both these
information are required to select any row of data.
But in the Score table, we have a column teacher_name, which depends on the subject
information or just the subject_id, so we should not keep that information in
the Score table.
The column teacher_name should be in the Subjects table. And then the entire system will
be Normalized as per the Second Normal Form.
1 C Language Miss. C
2 DSA Mr. D
1 1 70
1 2 82
2 1 65
To understand what is Partial Dependency and how you can normalize a table to 2nd normal
form, jump to the DBMS Second Normal Form tutorial.
1. It satisfies the First Normal Form and the Second Normal form.
2. And, it doesn't have Transitive Dependency.
Watch this YouTube video to understand the Third Normal Form (if you like videos) - DBMS
Third Normal Form 3NF with Example
In a table we have some column that acts as the primary key and other columns depends on
this column. But what if a column that is not the primary key depends on another column
that is also not a primary key or part of it? Then we have Transitive dependency in our table.
Let's take an example. We had the Score table in the Second Normal Form above. If we
have to store some extra information in it, like,
1. exam_type
2. total_marks
To store the type of exam and the total marks in the exam so that we can later calculate the
percentage of marks scored by each student.
1 1 70 Theory 100
1 2 82 Theory 100
2 1 42 Practical 50
You can create a separate table for ExamType and use it in the Score table.
1 Practical 50 45
We have created a new table ExamType and we have added more related information in it
like duration(duration of exam in mins.), and now we can use the exam_type_id in
the Score table.
Here is the DBMS Third Normal Form tutorial. But we suggest you first study the second
normal form and then head over to the third normal form.
Boyce and Codd Normal Form is a higher version of the Third Normal Form.
This form deals with a certain type of anomaly that is not handled by 3NF.
A 3NF table that does not have multiple overlapping candidate keys is said to be
in BCNF.
For a table to be in BCNF, the following conditions must be satisfied:
o R must be in the 3rd Normal Form
o and, for each functional dependency ( X → Y ), X should be a Super Key.
You can also watch our YouTube video to learn about BCNF - DBMS BCNF with Example
To learn about BCNF in detail with a very easy-to-understand example, head to the Boye-
Codd Normal Form tutorial.
You can also watch our YouTube video to learn about Fourth Normal Form - DBMS Fourth
Normal Form 4NF with Example
Here is the Fourth Normal Form tutorial. But we suggest you understand other normal
forms before you head over to the fourth normal form.
The fifth normal form is also called the PJNF - Project-Join Normal Form
It is the most advanced level of Database Normalization.
Using Fifth Normal Form you can fix Join dependency and reduce data redundancy.
It also helps in fixing Update anomalies in DBMS design.
We have an amazing video to showcase the Fifth Normal Form with the help of Examples
and to explain when it occurs and how you can fix it, check out the video on YouTube - Fifth
Normal Form in DBMS
FAQs
Here are some frequently asked questions related to the normalization in DBMS.
Database Normalization helps you design and structure your table properly so that you have
proper relationships between tables. It helps you with the following:
1. Data Integrity
2. Data consistency
3. Better relationship between tables
4. More scalable design for tables.
5. No large tables, small tables with a proper relationship.
6. Removing dependencies, like Partial Dependency, Transitive Dependency, Join
Dependency, etc.
A Primary key is a column that can be used to uniquely identify each row in a table. It can
be a single column, or it can be multiple columns together. Yes, a primary key can have two
columns or even more than two columns in it.
All the columns that are not a primary key or not a part of the primary key are called as non-
Key columns in a Table.
No. BCNF is a higher version of the Third Normal Form. The purpose of the Third Normal
Form or 3NF is to remove Transitive dependency whereas BCNF is more strict than 3NF, and
it focuses on removing all non-trivial functional dependencies.
Q. What is PJNF?
PJNF stands for Project-Join Normal Form. This is a name given to the Fifth Normal Form
because the Fifth Normal Form or 5NF is used to fix Join dependency in tables.
The Fifth Normal Form is also known as PJNF or Project-Join Normal Form. The fifth normal
form fixes the Join dependency in tables hence it is called PJNF. This is an advanced Normal
form that helps in reducing Data redundancy and Updation anomaly.