Unit IV - Database Normalization
4 HRS
Notes By: Raju Poudel (Mechi Multiple Campus)
Introduction
Database normalization, or simply normalization, is the process of restructuring a
relational database in accordance with a series of so-called normal forms in order
to reduce data redundancy and improve data integrity.
Data redundancy is the existence of data that is additional to the actual data, for example the same fact stored in more than one row.
Data integrity is the maintenance of, and the assurance of the accuracy and
consistency of, data over its entire life-cycle.
Normalization divides larger tables into smaller tables and links them using relationships.
Advantages of Normalization
• A smaller database can be maintained because normalization eliminates duplicate
data, which reduces the overall size of the database.
• As the database becomes smaller, passes through the data become faster
and shorter, improving response time and speed.
• More flexible data structure, i.e. we can add new rows and data values
easily.
• Better understanding of data.
• Easier-to-maintain data structure, i.e. it is easy to perform operations, and complex
queries can be handled easily.
Disadvantages of Normalization
• Database systems are complex, difficult, and time-consuming to design.
• On normalizing the relations to higher normal forms, e.g. 4NF and 5NF, performance
degrades because queries must join more tables.
Multivalued Dependency
• A multivalued dependency occurs when two attributes in a table are independent of each
other but both depend on a third attribute.
• Because it involves at least two attributes that depend on a third attribute, a
multivalued dependency always requires at least three attributes.
• For example, consider a table with the columns BIKE_MODEL, MANUF_YEAR and COLOR, where
MANUF_YEAR and COLOR are independent of each other but both depend on BIKE_MODEL.
AS555 Harry
AS222 George
AS999 Kevin
Non-Trivial Functional Dependency
• If a functional dependency X -> Y holds where Y is not a subset of X, then the
dependency is called a non-trivial functional dependency.
• For example, consider an employee table with three attributes: emp_id, emp_name, emp_address.
The dependency {emp_id} -> {emp_name} is non-trivial because emp_name is not a subset of {emp_id}.
Transitive Dependency
• If non-primary-key attributes depend on other non-primary-key attributes, then a
transitive dependency occurs.
• A transitive dependency is a type of functional dependency that is formed indirectly by
two functional dependencies.
• A transitive functional dependency exists when changing a non-key column might cause any of
the other non-key columns to change.
• Consider a table in which changing the non-key column Full Name may change Salutation.
Transitive Dependency
• For example,
Company      CEO              Age
Microsoft    Satya Nadella    51
Google       Sundar Pichai    46
Alibaba      Jack Ma          54
• {Company} -> {CEO} (if we know the company, we know its CEO's name)
• {CEO} -> {Age} (if we know the CEO, we know the age)
• Therefore {Company} -> {Age} holds transitively: if we know the company, we can determine its CEO's age.
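The dependencies in the Company/CEO/Age table can be checked mechanically. The following is a minimal Python sketch; the helper name `holds` and the dictionary layout are assumptions made for illustration. Note that a sample of rows can only show that a dependency is not contradicted by the data; it cannot prove that the dependency holds for every possible row.

```python
# Rows copied from the Company / CEO / Age table in the notes.
ceo_table = [
    {"Company": "Microsoft", "CEO": "Satya Nadella", "Age": 51},
    {"Company": "Google", "CEO": "Sundar Pichai", "Age": 46},
    {"Company": "Alibaba", "CEO": "Jack Ma", "Age": 54},
]


def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


company_to_ceo = holds(ceo_table, ["Company"], ["CEO"])
ceo_to_age = holds(ceo_table, ["CEO"], ["Age"])
# The transitive dependency follows from the two dependencies above.
company_to_age = holds(ceo_table, ["Company"], ["Age"])

print(company_to_ceo, ceo_to_age, company_to_age)
```

In 3NF the transitive part would be moved out, for instance by keeping (Company, CEO) in one table and (CEO, Age) in another.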
Insertion Anomaly
• For example, we can't add a new course unless we have at least one student enrolled on the
course.
• If we want to add a new course, the student details will be null. So, a course can't be
inserted without having student details. This scenario forms an insertion anomaly.
Deletion Anomaly
• A Delete Anomaly exists when certain attributes are lost because of the deletion of other
attributes.
sid sname cid cname
S10 Ram C03 Java
S11 Shyam C03 Java
S12 Hari C04 C
S13 Sita C05 C++
• For example, consider what happens if student S13 is the last student to leave course C05: all
information about the course is lost.
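The deletion anomaly can be reproduced with the table above. This is a small Python sketch; the function names `courses` and `delete_student` are assumptions made for illustration.

```python
# Rows copied from the table above: (sid, sname, cid, cname).
enrolment = [
    ("S10", "Ram", "C03", "Java"),
    ("S11", "Shyam", "C03", "Java"),
    ("S12", "Hari", "C04", "C"),
    ("S13", "Sita", "C05", "C++"),
]


def courses(rows):
    """Return the set of (cid, cname) pairs that the table still knows about."""
    return {(cid, cname) for _, _, cid, cname in rows}


def delete_student(rows, sid):
    """Return the rows that remain after removing one student."""
    return [row for row in rows if row[0] != sid]


before = courses(enrolment)
after = courses(delete_student(enrolment, "S13"))
lost = before - after

print(lost)  # {('C05', 'C++')}
```

Deleting S10 instead loses no course, because S11 is still enrolled in C03.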
Modification Anomaly
• A modification anomaly occurs when a record in the relation is updated. Changing the value of
a specific attribute requires changing every record in which that value occurs; if any record
is missed, the data becomes inconsistent.
• For example, if we update the cid of a student then we need to update the cname of that student too.
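A short Python sketch of a modification anomaly, using the student/course rows from the Deletion Anomaly slide. The new course name "Core Java" and the function names are hypothetical, chosen only to show an incomplete update.

```python
# Rows from the student/course table in these notes: (sid, sname, cid, cname).
enrolment = [
    ("S10", "Ram", "C03", "Java"),
    ("S11", "Shyam", "C03", "Java"),
    ("S12", "Hari", "C04", "C"),
    ("S13", "Sita", "C05", "C++"),
]


def rename_course_for_student(rows, sid, new_cname):
    """Change cname in the row of one student only (an incomplete update)."""
    return [
        (s, name, cid, new_cname if s == sid else cname)
        for s, name, cid, cname in rows
    ]


def inconsistent_courses(rows):
    """Return each cid that is stored with more than one cname."""
    names = {}
    for _, _, cid, cname in rows:
        names.setdefault(cid, set()).add(cname)
    return {cid: sorted(found) for cid, found in names.items() if len(found) > 1}


updated = rename_course_for_student(enrolment, "S10", "Core Java")
conflicts = inconsistent_courses(updated)

print(conflicts)  # {'C03': ['Core Java', 'Java']}
```

Storing cname once, in a separate course table keyed by cid, would make the rename a single-row change.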
First Normal Form (1NF)
• For a table to be in the First Normal Form, it should follow these rules:
a) It should only have single (atomic) valued attributes/columns.
b) All the columns in a table should have unique names.
• Consider the following table Student,
sid sname salutation address phone sub_id sub_name
• The above table does not satisfy 1NF because the column phone contains multiple values. Hence, we
need to create a new table, Contact, to store phone numbers.
First Normal Form (1NF)
• The following normalized tables satisfy 1NF:
Student
Contact
contact_id phone sid
1 9864912123 1
2 9854964126 1
3 9824912345 2
4 9824923456 3
5 9825612723 4
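The step from a multi-valued phone column to the Contact table can be sketched in Python. The phone numbers per sid are taken from the Contact table above; storing several numbers as one comma-separated string is an assumption about how the original Student table held them, and `split_phones` is a hypothetical helper name.

```python
# Un-normalized input: one phone string per sid (comma-separated format is assumed).
student_phones = {
    1: "9864912123, 9854964126",
    2: "9824912345",
    3: "9824923456",
    4: "9825612723",
}


def split_phones(phones_by_sid):
    """Return Contact rows (contact_id, phone, sid) with one atomic phone per row."""
    contact = []
    for sid in sorted(phones_by_sid):
        for phone in phones_by_sid[sid].split(","):
            # contact_id is a running number starting at 1.
            contact.append((len(contact) + 1, phone.strip(), sid))
    return contact


contact = split_phones(student_phones)
for row in contact:
    print(row)
```

Each row of the result holds exactly one phone number, and sid links it back to the Student table.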
Second Normal Form (2NF)
• For a table to be in the Second Normal Form, it must satisfy two conditions:
a) The table should be in the First Normal Form.
b) There should be no partial dependency.
• Partial functional dependency occurs only in relations with composite keys.
• Partial functional dependency occurs when one or more non-key attributes depend on only a
part of the primary key.
Example:
Table: Stud_id, Course_id, Stud_name, Course_Name
Where: Primary Key = Stud_id + Course_id
Then: To determine the name of a student we use only Stud_id, which is part of the primary key.
{Stud_id} -> {Stud_name}
Second Normal Form (2NF)
Student
sid sname salutation address sub_id
1 Ram Mr Btm 1
2 Shyam Mr Ktm 1
3 Gita Mrs Btm 2
4 Sita Ms Ktm 3
2 Shyam Mr Ktm 2
Subject
sub_id sub_name
1 Database
2 Java
3 C
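After this split, the subject name is stored once per sub_id and can be recovered with a join. The following Python sketch uses the rows from the two tables above; the function name `join_subject` is an assumption made for illustration.

```python
# Student rows from the table above: (sid, sname, salutation, address, sub_id).
student = [
    (1, "Ram", "Mr", "Btm", 1),
    (2, "Shyam", "Mr", "Ktm", 1),
    (3, "Gita", "Mrs", "Btm", 2),
    (4, "Sita", "Ms", "Ktm", 3),
    (2, "Shyam", "Mr", "Ktm", 2),
]

# Subject table from above, keyed by sub_id.
subject = {1: "Database", 2: "Java", 3: "C"}


def join_subject(student_rows, subject_names):
    """Append sub_name to every student row by looking up its sub_id."""
    return [row + (subject_names[row[4]],) for row in student_rows]


joined = join_subject(student, subject)
for row in joined:
    print(row)
```

Renaming a subject now means changing one entry in `subject`, and every joined row picks up the new name.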
Third Normal Form (3NF)
• The official qualifications for 3NF are:
a) The table is already in 2NF.
b) Non-primary-key attributes do not depend on other non-primary-key
attributes (i.e. no transitive dependencies).
• All transitive dependencies are removed by moving the dependent attributes to another table.
• Consider the following student table,
sid sname salutation address sub_id
1 Ram Mr Btm 1
2 Shyam Mr Ktm 1
3 Gita Mrs Btm 2
4 Sita Ms Ktm 3
2 Shyam Mr Ktm 2
• In the above table, there is a transitive dependency between sname and salutation. Both are
non-primary-key attributes. A change in sname might cause a change in salutation. For example, if
we change the name Ram to Maya then we need to change the salutation too.
Third Normal Form (3NF)
Original table
sid sname salutation address sub_id
1 Ram Mr Btm 1
2 Shyam Mr Ktm 1
3 Gita Mrs Btm 2
4 Sita Ms Ktm 3
2 Shyam Mr Ktm 2
Normalized tables.
Functional Dependencies
Fourth Normal Form (4NF)
• A table is in 4NF if it is in 3NF and has no non-trivial multivalued dependencies, other than
those whose left-hand side is a super key.
• A multivalued dependency exists when there are at least three attributes (say X, Y and Z) in a
relation and, for each value of X, there is a well-defined set of values of Y and a well-defined
set of values of Z, where the set of values of Y is independent of the set of values of Z and vice versa.
• Suppose a student can have more than one subject and more than one activity.
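A minimal Python sketch of this situation follows. The sample rows are hypothetical and do not come from the notes. It splits a (student, subject, activity) table into two tables and checks that joining them on student gives back exactly the original rows, which is what we expect when subjects and activities are independent of each other.

```python
from itertools import product

# Hypothetical rows (student, subject, activity); not taken from the notes.
enrolment = {
    ("Ram", "Java", "Football"),
    ("Ram", "Java", "Chess"),
    ("Ram", "C", "Football"),
    ("Ram", "C", "Chess"),
    ("Sita", "Database", "Music"),
}

# Decompose into two tables, one per independent multivalued fact.
student_subject = {(student, subject) for student, subject, _ in enrolment}
student_activity = {(student, activity) for student, _, activity in enrolment}

# Join the two tables on student to rebuild the original rows.
rejoined = {
    (s1, subject, activity)
    for (s1, subject), (s2, activity) in product(student_subject, student_activity)
    if s1 == s2
}

print(rejoined == enrolment)  # True
```

With the split tables, adding a new activity for Ram is one new row in `student_activity` instead of one row per subject.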
Fourth Normal Form (4NF)
Original table:
Normalized tables:
Boyce-Codd Normal Form (BCNF)
• Boyce-Codd Normal Form (BCNF) is one of the forms of database normalization. A
database table is in BCNF if and only if there are no non-trivial functional
dependencies of attributes on anything other than a superset of a candidate key.
• BCNF is also sometimes referred to as 3.5NF, or 3.5 Normal Form.
• For a table to satisfy the Boyce-Codd Normal Form, it should satisfy the following two
conditions:
a) It should be in the Third Normal Form.
b) And, for any dependency A → B, A should be a super key.
• The second point sounds a bit tricky, right? In simple words, it means that for a
dependency A → B, A cannot be a non-prime attribute if B is a prime attribute.
Boyce-Codd Normal Form (BCNF)
• Below we have a college enrolment table with columns student_id, subject and
professor.
student_id subject professor
101 Java P.Java
101 C++ P.Cpp
102 Java P.Java2
103 C# P.Chash
104 Java P.Java
• In the above table, student_id and subject together form the primary key, because using
student_id and subject we can find all the columns of the table.
• Also, there is a dependency between subject and professor, where subject depends on
the professor name: {professor} -> {subject}.
Boyce-Codd Normal Form (BCNF)
• This table satisfies the 1st Normal Form because all the values are atomic, column
names are unique and all the values stored in a particular column are of the same domain.
• This table also satisfies the 2nd Normal Form as there is no partial dependency.
• And there is no transitive dependency, hence the table also satisfies the 3rd
Normal Form.
• But this table is not in Boyce-Codd Normal Form, because subject depends on professor and
professor is not a super key.
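The BCNF condition can be checked against the enrolment rows from the earlier slide. This is a Python sketch; the helper names `holds` and `is_superkey` are assumptions made for illustration, and checking sample rows only shows what the data does not contradict.

```python
COLUMNS = ("student_id", "subject", "professor")

# Rows copied from the college enrolment table in the notes.
enrolment = [
    (101, "Java", "P.Java"),
    (101, "C++", "P.Cpp"),
    (102, "Java", "P.Java2"),
    (103, "C#", "P.Chash"),
    (104, "Java", "P.Java"),
]


def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        record = dict(zip(COLUMNS, row))
        key = tuple(record[col] for col in lhs)
        value = tuple(record[col] for col in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_superkey(rows, attrs):
    """A set of attributes is a super key if it determines every column."""
    return holds(rows, attrs, COLUMNS)


fd_holds = holds(enrolment, ["professor"], ["subject"])
professor_is_superkey = is_superkey(enrolment, ["professor"])
violates_bcnf = fd_holds and not professor_is_superkey

print(fd_holds, professor_is_superkey, violates_bcnf)  # True False True
```

The professor P.Java appears for both student 101 and student 104, so professor alone cannot identify a row, yet it still determines the subject. That combination is what breaks BCNF.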
Boyce-Codd Normal Form (BCNF)
• To make this relation (table) satisfy BCNF, we will decompose this table into two tables:
a student table and a professor table.
• Below we have the structure for both the tables.