Normalization
Unit-IV
Topics to be covered
• Functional Dependency
• Definition and types of FD
• Closure of FD set
• Closure of attribute set
• Irreducible set of FD
• Normalization and normal forms
• 1NF
• 2NF
• 3NF
• BCNF
• 4NF
• 5NF
What is Functional Dependency?
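Let R be a relation schema having n attributes A1, A2, A3, …, An, and let X and Y be subsets of these attributes.
A functional dependency X → Y holds on R if any two tuples that agree on the values of X also agree on the values of Y.
Student
RollNo  Name  SPI  BL
101     Raj   8    0
102     Meet  7    1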
Diagrammatic representation
X → Y        {X1, X2} → Y        X → {Y1, Y2}
Example
Consider the relation Account(account_no, balance, branch).
account_no can determine balance and branch.
So, there is a functional dependency from account_no to balance and branch.
This can be denoted by account_no → {balance, branch}.
A functional dependency (FD) is a constraint between two sets of attributes in a relation of a database.
FD is a generalization of the concept of a key.
X → Y {X determines Y, or X decides Y}
For a single value of the LHS we get a single value of the RHS.
Rollno  Name
101     John
102     Mike
103     John
Rollno → Name   Yes
Name → Rollno   No
A   B   C   D
A1  B1  C1  D1
A1  B2  C2  D1
A2  B2  C1  D2
A3  B3  C2  D2
A4  B4  C4  D4
A5  B3  C3  D3
A → B  N      B → A  N      C → A  N
A → C  N      B → C  N      C → B  N
A → D  Y      B → D  N      C → D  N
AB → C, AB → D, BC → A, ……..
So with the help of these FDs, can we determine every other attribute uniquely?
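The Y/N checks on the A, B, C, D table can be reproduced with a short Python sketch. The function name fd_holds and the row encoding are assumptions made for illustration. Note that a table instance can only refute an FD; it cannot prove that the FD holds for every legal instance of the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first RHS value seen for this LHS value
        if seen.setdefault(key, value) != value:
            return False
    return True

# The A, B, C, D table from the slide, one dict per row.
table = [dict(zip("ABCD", r.split())) for r in [
    "A1 B1 C1 D1", "A1 B2 C2 D1", "A2 B2 C1 D2",
    "A3 B3 C2 D2", "A4 B4 C4 D4", "A5 B3 C3 D3",
]]

print(fd_holds(table, "A", "D"))   # True
print(fd_holds(table, "A", "B"))   # False
print(fd_holds(table, "AB", "C"))  # True
```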
Application of FD
We can determine additional FDs.
We can identify keys (primary key, super key, candidate key, …).
We can check the equivalence of two FD sets.
We can find a minimal FD set (the same information represented with a smaller number of FDs).
Armstrong's axioms (inference rules)
Armstrong's axioms are a set of rules used to infer (derive) all the functional
dependencies on a relational database.
1. Reflexivity
– If B is a subset of A then A → B always holds
2. Augmentation
– If A → B then AC → BC always holds
3. Transitivity
– If A → B and B → C then A → C always holds
4. Pseudo Transitivity
– If A → B and BD → C then AD → C
5. Self-determination
– A → A
6. Decomposition
– If A → BC then A → B and A → C
7. Union
– If A → B and A → C then A → BC
8. Composition
– If A → B and C → D then AC → BD
Rules for Functional Dependency-
Rule-01: A functional dependency X → Y will always hold if all the values of X are unique (different)
irrespective of the values of Y.
Rule-02: A functional dependency X → Y will always hold if all the values of Y are same irrespective of the
values of X.
The following functional dependencies will always hold since all
the values of attribute ‘C’ are same-
• A→C
• AB → C
• ABDE → C
• DE → C
• AE → C
What is closure of a set of FDs?
The closure of a set of functional dependencies F is the complete set of all functional dependencies that can be derived from F using the inference rules known as Armstrong’s axioms.
If “F” is a set of functional dependencies, then its closure is denoted by {F}+ or F+.
Example of closure of a set of FDs
Suppose we are given a relation schema R(A,B,C,G,H,I) and the set
of functional dependencies are:
• F = (A → B, A → C, CG → H, CG → I, B → H)
The functional dependency A → H is logically implied.
We have A → B and B → H, so by the transitivity rule A → H.
The functional dependency CG → HI is logically implied.
We have CG → H and CG → I, so by the union rule CG → HI.
The functional dependency AG → I is logically implied.
We have A → C and CG → I, so by the pseudo-transitivity rule AG → I.
The functional dependency AG → I can also be derived in two steps.
From A → C, the augmentation rule gives AG → CG.
From AG → CG and CG → I, the transitivity rule gives AG → I.
What is a closure of attribute sets?
Given a set of attributes X, the closure of X under F is the set of attributes that are functionally determined by X under F.
It is denoted by X+.
Algorithm to find closure of attribute sets
Input : Let F be a set of FDs for relation R and X be a set of attributes of R.
Steps:
1. X+ = X //initialize X+ to X
2. Repeat //keep scanning F until X+ stops growing
For each FD : Y -> Z in F Do
If Y ⊆ X+ Then //If Y is contained in X+
X+ = X+ ∪ Z //add Z to X+
End If
End For
Until X+ does not change
3. Return X+ //Return closure of X
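The steps above can be sketched in Python as follows. The encoding is an assumption made for brevity: every attribute is a single character, and each FD is a pair of strings (left side, right side). The loop repeats until a full pass over F adds nothing new.

```python
def closure(attrs, fds):
    """Attribute closure X+ of attrs under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:                      # repeat until X+ stops growing
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)      # add Z to X+
                changed = True
    return result

# F on R(A, B, C, G, H, I) from the closure examples above
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print(sorted(closure("AG", F)))  # ['A', 'B', 'C', 'G', 'H', 'I']
print(sorted(closure("A", F)))   # ['A', 'B', 'C', 'H']
```

An FD X → Y is implied by F exactly when Y is contained in X+, so the first line of output confirms AG → I.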
Examples
Example-1 : Consider the table student_details having (Roll_No, Name,Marks, Location) as the attributes and
having two functional dependencies.
FD1 : Roll_No → Name, Marks
FD2 : Name → Marks, Location
Now, we will calculate the closure of each attribute present in the relation using the steps of the algorithm above.
{Roll_no}+ = {Roll_No}
{Roll_no}+ = {Roll_No, Name, Marks}
Therefore, complete closure of Roll_No will be :
{Roll_no}+ = {Roll_No, Marks, Name, Location}
Similarly, we can calculate closure for other attributes too i.e “Name”.
{Name}+ = {Name}
{Name}+ = {Name, Marks, Location}
{Name}+ = {Name, Marks, Location}
{Marks}+ = {Marks}
and
{Location}+ = { Location}
Example-2 : Consider a relation R(A,B,C,D,E) having below
mentioned functional dependencies.
FD1 : A → BC
FD2 : C → B
FD3 : D → E
FD4 : E → D
{A}+ = {A, B, C}
{B}+ = {B}
{C}+ = {B, C}
{D}+ = {D, E}
{E}+ = {D, E}
Closure Of Functional Dependency : Calculating Candidate Key
A Candidate Key of a relation is a minimal attribute or set of attributes that can determine the whole relation, i.e. its closure contains all the attributes of the relation and no proper subset of it has this property.
Example-1 : Consider the relation R(A,B,C) with given functional dependencies :
FD1 : A → B
FD2 : B → C
{A}+ = {A, B, C}
{B}+ = {B, C}
{C}+ = {C}
Clearly, “A” is the candidate key, as its closure contains all the attributes present in the relation “R”.
In some relations, no single attribute can determine all the attributes on its own. In that case, we need to combine two or more attributes to form a candidate key.
NOTE : Any relation “R” can have either a single candidate key or multiple candidate keys.
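A brute-force way to list candidate keys is to try attribute sets in order of increasing size and keep those whose closure covers the whole relation. The sketch below assumes single-character attribute names and FDs written as string pairs; the function names are illustrative. It is exponential in the number of attributes, so it only suits small classroom schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """All minimal attribute sets whose closure is the full relation."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys

print(candidate_keys("ABC", [("A", "B"), ("B", "C")]))
# R(A, B, C, D, E) with A -> BC, C -> B, D -> E, E -> D
keys = candidate_keys("ABCDE", [("A", "BC"), ("C", "B"), ("D", "E"), ("E", "D")])
print(["".join(sorted(k)) for k in keys])  # ['AD', 'AE']
```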
Exercise
1. Consider the relation scheme R = {E, F, G, H, I, J, K, L, M, N} and the set of functional dependencies {{E, F} → {G}, {F} → {I, J}, {E, H} → {K, L}, K → {M}, L → {N}} on R. What is the key for R?
A. {E, F}
B. {E, F, H}
C. {E, F, H, K, L}
D. {E}
2. In a schema with attributes A, B, C, D and E, the following set of functional dependencies is given:
{A → B, A → C, CD → E, B → D, E → A}
Which of the following functional dependencies is NOT implied by the above set?
A. CD → AC
B. BD → CD
C. BC → CD
D. AC → BC
What are extraneous attributes?
Let us consider a relation R with schema R = (A, B, C) and set of functional dependencies F
= { AB → C, A → C }.
In AB → C, B is extraneous attribute. The reason is, there is another FD A → C, which
means when A alone can determine C, the use of B is unnecessary (extra).
An attribute of a functional dependency is said to be extraneous if we can remove it
without changing the closure of the set of functional dependencies.
What is canonical cover?
A canonical cover of F is a minimal set of functional dependencies equivalent to F, having
no redundant dependencies or redundant parts of dependencies.
It is denoted by Fc
A canonical cover for F is a set of dependencies Fc such that
1. F logically implies all dependencies in Fc and
2. Fc logically implies all dependencies in F and
3. No functional dependency in Fc contains an extraneous attribute.
Note: Fc is not unique; there may be more than one canonical cover for a given set of functional dependencies.
Need-
Working with the set containing extraneous functional dependencies increases the computation time.
Therefore, the given set is reduced by eliminating the useless functional dependencies.
This reduces the computation time and working with the irreducible set becomes easier
Steps to Find Canonical cover
Step-01:
Write the given set of functional dependencies in such a way that each functional dependency contains
exactly one attribute on its right side.
Example-
Step-02:
• Consider each functional dependency one by one from the set obtained in Step-01.
• Determine whether it is essential or non-essential.
To determine whether a functional dependency is essential or not, compute the closure of its left side-
• Once by considering that the particular functional dependency is present in the set
• Once by considering that the particular functional dependency is not present in the set
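Then following two cases are possible-
Case-01: Both results are the same. The functional dependency is non-essential and can be eliminated.
Case-02: The results are different. The functional dependency is essential and can not be eliminated.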
NOTE-
• Eliminate the non-essential functional dependency from the set as soon as it is discovered.
• Do not consider it while checking the essentiality of other functional dependencies.
Step-03:
• Consider the newly obtained set of functional dependencies after performing Step-02.
• Check if there is any functional dependency that contains more than one attribute on its left side.
Case-01: No-
• There exists no functional dependency containing more than one attribute on its left side.
• In this case, the set obtained in Step-02 is the canonical cover.
Case-02: Yes-
• There exists at least one functional dependency containing more than one attribute on its left side.
• In this case, consider all such functional dependencies one by one.
• Check if their left side can be reduced.
Step-01:
Write all the functional dependencies such that each contains exactly one attribute on its right side-
X→W
WZ → X
WZ → Y
Y→W
Y→X
Y→Z
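Step-02:
Check the essentiality of each functional dependency one by one.
For X → W:
• Considering X → W, (X)+ = { X , W }
• Ignoring X → W, (X)+ = { X }
Now,
• Clearly, the two results are different.
• Thus, we conclude that X → W is essential and can not be eliminated.
For WZ → X:
• Considering WZ → X, (WZ)+ = { W , X , Y , Z }
• Ignoring WZ → X, (WZ)+ = { W , X , Y , Z }
Now,
• Clearly, the two results are same.
• Thus, we conclude that WZ → X is non-essential and can be eliminated.
Eliminating WZ → X, our set of functional dependencies reduces to-
X→W
WZ → Y
Y→W
Y→X
Y→Z
Now, we will consider this reduced set in further checks.
For WZ → Y:
• Considering WZ → Y, (WZ)+ = { W , X , Y , Z }
• Ignoring WZ → Y, (WZ)+ = { W , Z }
Now,
• Clearly, the two results are different.
• Thus, we conclude that WZ → Y is essential and can not be eliminated.
For Y → W:
• Considering Y → W, (Y)+ = { W , X , Y , Z }
• Ignoring Y → W, (Y)+ = { W , X , Y , Z }
Now,
• Clearly, the two results are same.
• Thus, we conclude that Y → W is non-essential and can be eliminated.
Eliminating Y → W, our set of functional dependencies reduces to-
X→W
WZ → Y
Y→X
Y→Z
For Y → X:
• Considering Y → X, (Y)+ = { W , X , Y , Z }
• Ignoring Y → X, (Y)+ = { Y , Z }
Now,
• Clearly, the two results are different.
• Thus, we conclude that Y → X is essential and can not be eliminated.
For Y → Z:
• Considering Y → Z, (Y)+ = { W , X , Y , Z }
• Ignoring Y → Z, (Y)+ = { W , X , Y }
Now,
• Clearly, the two results are different.
• Thus, we conclude that Y → Z is essential and can not be eliminated.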
Step-03:
• Consider the functional dependencies having more than one attribute on their left side.
• Check if their left side can be reduced.
In our set,
• Only WZ → Y contains more than one attribute on its left side.
• Considering WZ → Y, (WZ)+ = { W , X , Y , Z }
Now,
• Consider all the possible subsets of WZ.
• Check if the closure result of any subset matches to the closure result of WZ.
(W)+ = { W }
(Z)+ = { Z }
Clearly,
• None of the subsets has the same closure as the entire left side.
• Thus, we conclude that we can not write WZ → Y as W → Y or Z → Y.
• Thus, set of functional dependencies obtained in step-02 is the canonical cover.
Finally, the canonical cover is-
X→W
WZ → Y
Y→X
Y→Z
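The three steps can be sketched in Python. This is a minimal sketch under stated assumptions: attributes are single characters, FDs are string pairs, and the input set X → W, WZ → XY, Y → WXZ is inferred from the Step-01 list above. For Step-03 it uses the usual textbook test (an attribute is dropped from a left side if the remaining attributes still determine the right side), which may differ in detail from the subset-closure comparison on the slide. The result depends on the order in which FDs are examined.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    # Step-01: exactly one attribute on every right side
    work = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]
    # Step-02: drop each FD that the remaining FDs already imply
    i = 0
    while i < len(work):
        lhs, a = work[i]
        rest = work[:i] + work[i + 1:]
        if a in closure(lhs, rest):
            work = rest          # non-essential: eliminate immediately
        else:
            i += 1               # essential: keep and move on
    # Step-03: try to shrink left sides with more than one attribute
    for i, (lhs, a) in enumerate(work):
        for b in sorted(lhs):
            smaller = lhs - {b}
            if smaller and a in closure(smaller, work):
                lhs = smaller
        work[i] = (lhs, a)
    return ["".join(sorted(lhs)) + "->" + a for lhs, a in work]

cover = minimal_cover([("X", "W"), ("WZ", "XY"), ("Y", "WXZ")])
print(cover)  # ['X->W', 'WZ->Y', 'Y->X', 'Y->Z']
```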
Exercise
1. Consider the following set F of functional dependencies and find its minimal cover:
F = {
A → BC
B → C
A → B
AB → C
}
Canonical Cover = {
A → B
B → C
}
3. Find out canonical cover and minimal cover of following FDs:
{
A → C
AC → D
E → AD
E → H
}
At last, the minimal FDs are:
A → C
A → D
E → A
E → H
Hence the canonical form is (canonical form means LHS should not be repeated):
A → CD
E → AH
3.5 Find out canonical cover and minimal cover of following FDs:
R(VWXYZ)
V → W
VW → X
Y → VXZ
Minimal cover:
V → W
V → X
Y → V
Y → Z
4. Consider a relation scheme R = (A, B, C, D, E, H) on which the following functional dependencies
hold: {A–>B, BC–>D, E–>C, D–>A}. What are the candidate keys of R?
(a) AE, BE
(b) AE, BE, DE
(c) AEH, BEH, BCH
(d) AEH, BEH, DEH
5. In a schema with attributes A, B, C, D and E, following set of functional dependencies are given:
A->B
A->C
CD->E
B->D
E->A
Which of the following functional dependencies is NOT implied by the above set?
(a) CD->AC (b) BD->CD (c) BC->CD (d) AC->BC
6. The following functional dependencies are given:
AB->CD, AF->D, DE->F, C->G , F->E, G->A
Which one of the following options is false?
(a)CF+ = {ACDEFG} (b)BG+ = {ABCDG}
(c)AF+ = {ACDEFG} (d)AB+ = {ABCDFG}
7. Relation R has eight attributes ABCDEFGH. Fields of R contain only atomic values.
F={CH->G,
A->BC,
B->CFH,
E->A,
F->EG}
is a set of functional dependencies (FDs) so that F + is exactly the set of FDs that hold for R.
What is decomposition?
Decomposition is the process of breaking down given relation into two or more
relations.
Relation R is replaced by two or more relations in such a way that:
1. Each new relation contains a subset of the attributes of R
2. Together, they all include all tuples and attributes of R
Types of decomposition
1. Lossy decomposition
2. Lossless decomposition (non-loss decomposition)
What is an anomaly in database design?
Anomalies are problems that can occur in poorly planned, un-normalized database
where all the data are stored in one table.
The three types of anomalies that can arise in the database because of redundancy are
1. Insert anomaly
2. Delete anomaly
3. Update / Modification anomaly
Insert anomaly
Consider a relation
• emp_dept (E#, Ename, Address, D#, Dname, Dmgr#) with E# as a primary key
E#  Ename  Address  D#  Dname  Dmgr#
1   Raj    Rajkot   1   CE     1
2   Meet   Surat    1   CE     1
Want to insert new department detail (IT)
What is Insert anomaly?
An insert anomaly occurs when certain attributes cannot be
inserted into the database without the presence of another
attribute.
Delete anomaly
Consider a relation
• emp_dept (E#, Ename, Address, D#, Dname, Dmgr#) E# as a primary key
E#  Ename  Address  D#  Dname  Dmgr#
1   Raj    Rajkot   1   CE     1
2   Meet   Surat    1   IT     2
Want to delete Meet employee's detail
Now consider there is only one employee in some department (IT) and
that employee leaves the organization.
So we need to delete tuple of that employee (Meet).
But in addition to that, the information about the department is also deleted.
This kind of problem in the relation where deletion of some tuples can
lead to loss of some other data not intended to be removed is known as
delete anomaly.
What is Delete anomaly?
A delete anomaly exists when certain attributes are lost because
of the deletion of another attribute.
Update anomaly
Consider a relation
• emp_dept (E#, Ename, Address, D#, Dname, Dmgr#) E# as a primary key
E#  Ename  Address  D#  Dname  Dmgr#
1   Raj    Rajkot   1   CE     M1
2   Meet   Surat    2   IT     M2
3   Jay    Rajkot   2   CE     M2
Want to update CE department’s manager
What is Update anomaly?
An update anomaly exists when one or more records (instances) of
duplicated data are updated, but not all.
Anomaly (Summary)
EmpID EmpName Address DeptID DeptName DeptMngr
E1 Raj Rajkot D1 C.E. Patel
E2 Samir Rajkot D2 Civil Shah
E3 Meet Baroda D1 Computer Patel
E4 Deepak Surat D1 C.E Patel
E5 Suresh Surat D3 Electrical Joshi
null null null D4 Chemical null
Update Anomaly
An update anomaly exists when one or more records of duplicated data is updated, but
not all.
How to deal with insert anomaly
EmpID  EmpName  Address  DeptID  DeptName    DeptMngr
E1     Raj      Rajkot   D1      Computer    Patel
E2     Samir    Rajkot   D2      Civil       Shah
E3     Meet     Baroda   D1      Computer    Patel
E4     Deepak   Surat    D1      Computer    Patel
E5     Suresh   Surat    D3      Electrical  Joshi
null   null     null     D4      Chemical    null
Do not allow inserting the new department “Chemical” until an employee is assigned to it.
Or split the relation into two tables, so that a department can be stored without any employee:
EmpID  EmpName  Address  DeptID
E1     Raj      Rajkot   D1
E2     Samir    Rajkot   D2
E3     Meet     Baroda   D1
E4     Deepak   Surat    D1
E5     Suresh   Surat    D3
DeptID  DeptName    DeptMngr
D1      Computer    Patel
D2      Civil       Shah
D3      Electrical  Joshi
D4      Chemical    null
How to deal with delete anomaly
EmpID  EmpName  Address  DeptID  DeptName    DeptMngr
E1     Raj      Rajkot   D1      Computer    Patel
E2     Samir    Rajkot   D2      Civil       Shah
E3     Meet     Baroda   D1      Computer    Patel
E4     Deepak   Surat    D1      Computer    Patel
E5     Suresh   Surat    D3      Electrical  Joshi
If we delete the employee having ID “E2”, then the Civil department will also be deleted, because
there is only one record of the Civil dept.
How to deal with update anomaly
EmpID  EmpName  Address  DeptID  DeptName    DeptMngr
E1     Raj      Rajkot   D1      Computer    Patel
E2     Samir    Rajkot   D2      Civil       Shah
E3     Meet     Baroda   D1      Computer    Patel
E4     Deepak   Surat    D1      Computer    Patel
E5     Suresh   Surat    D3      Electrical  Joshi
Changing the name of department D1 from “Computer” to “IT” may update one or
more records, but not all.
Summary
S-ID  Name  Age  Br_code  Br_name  HOD
1     A     18   101      CSE      AAA
2     B     19   101      CSE      AAA
3     C     18   101      CSE      AAA
4     D     20   102      EC       BBB
5     E     18   102      EC       BBB
6     F     19   103      ME       CCC
Idea: In this table we have stored the entire college data.
Result: Entire branch data is repeated for every student of the same branch.
Redundancy: When the same data is stored multiple times.
Updation anomalies: When we want to update a single piece of data, it must be updated
at all the places (e.g. the HOD changed, but one row was missed somewhere).
How can anomalies in database design be solved?
What is normalization?
Normalization is the process of removing redundant data from
tables to improve data integrity, scalability and storage
efficiency.
1. data integrity (completeness, accuracy and consistency of data)
2. scalability (ability of a system to continue to function well in a
growing amount of work)
3. storage efficiency (ability to store and manage data that consumes
the least amount of space)
What we do in normalization?
Normalization generally involves splitting an existing table into
multiple (more than one) tables, which can be re-joined or linked
each time a query is issued (executed).
How many normal forms are there?
Normal forms:
1. 1NF (First normal form)
2. 2NF (Second normal form)
3. 3NF (Third normal form)
4. BCNF (Boyce–Codd normal form)
5. 4NF (Fourth normal form)
6. 5NF (Fifth normal form)
1NF (First Normal Form)
Conditions for 1NF
A relation R is in first normal form (1NF) if and only if each cell of the table contains only an atomic value.
OR
A relation R is in first normal form (1NF) if the attribute of every tuple is either single valued or a null value.
1NF (First Normal Form) [Composite attribute]
Customer
CustomerID  Name  Address
C01         Raj   Jamnagar Road, Rajkot
C02         Meet  Nehru Road, Jamnagar
Problem: It is difficult to retrieve the list of customers living in ‘Jamnagar’ from above
table.
The reason is that address attribute is composite attribute which contains road name as
well as city name in single cell.
It is possible that city name word is also there in road name.
In our example, ‘Jamnagar’ word occurs in both records, in first record it is a part of road
name and in second one it is the name of city.
Solution: Divide composite attributes into number of sub-attributes and insert value in
proper sub-attribute.
Customer
CustomerID  Name  Road           City
C01         Raj   Jamnagar Road  Rajkot
C02         Meet  Nehru Road     Jamnagar
1NF (First Normal Form) [Multivalued attribute]
Student
RollNo  Name  FailedinSubjects
101     Raj   DS, DBMS
102     Meet  DBMS, DS
The relation is not in 1NF.
Problem: It is difficult to retrieve the list of students failed in ‘DBMS’ as well as ‘DS’ but
not in other subjects from above table.
The reason is that FailedinSubjects attribute is multi-valued attribute so it contains more
than one value.
Student
RollNo  Name   FailedinSubjects
101     Raj    DS, DBMS
102     Meet   DBMS, DS
103     Jeet   DS, DBMS, DE
104     Harsh  DBMS, DE, DS
105     Nayan  DE, DBMS, DS
Student
RollNo  Name
101     Raj
102     Meet
103     Jeet
104     Harsh
105     Nayan
Result
RID  RollNo  Subject
1    101     DS
2    101     DBMS
3    102     DBMS
4    102     DS
5    103     DS
…    …       …
Solution: Split the table into two tables in such a way that
• the first table contains all attributes except multi-valued attribute with
same primary key and
• other table contains multi-valued attribute and place a primary key in it.
• insert the primary key of first table in the second table as a foreign key.
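The split described above can be sketched in Python. The tuple layout and the function name split_multivalued are assumptions for illustration; RID is generated as a running number, as in the Result table.

```python
def split_multivalued(rows):
    """Split (RollNo, Name, FailedinSubjects) rows into two 1NF tables."""
    student, result = [], []
    rid = 1
    for roll_no, name, failed in rows:
        student.append((roll_no, name))          # table without the multi-valued attribute
        for subject in failed.split(","):
            # one row per subject, RollNo kept as a foreign key
            result.append((rid, roll_no, subject.strip()))
            rid += 1
    return student, result

rows = [
    (101, "Raj", "DS, DBMS"),
    (102, "Meet", "DBMS, DS"),
    (103, "Jeet", "DS, DBMS, DE"),
]
student, result = split_multivalued(rows)
print(student)     # [(101, 'Raj'), (102, 'Meet'), (103, 'Jeet')]
print(result[:5])  # first five rows match the Result table above
```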
2NF (Second Normal Form)
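Conditions for 2NF
A relation R is in second normal form (2NF) if and only if it is in 1NF and every non-prime attribute is fully functionally dependent on the key, i.e. no non-prime attribute depends on a proper subset of a candidate key.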
Consider a relation- R ( V , W , X , Y , Z ) with functional dependencies-
VW → XY
Y→V
WX → YZ
The candidate keys of this relation are VW, WX and WY.
From here,
• Prime attributes = { V , W , X , Y }
• Non-prime attributes = { Z }
Consider a relation- R ( A , B , C , D , E ) with functional dependencies-
A → BC
CD → E
B→D
E→A
The possible candidate keys for this relation are-
A , E , CD , BC
From here,
• Prime attributes = { A , B , C , D , E }
• There are no non-prime attributes
Now,
• It is clear that there are no non-prime attributes in the relation.
• In other words, all the attributes of relation are prime attributes.
• Thus, all the attributes on RHS of each functional dependency are prime
attributes.
3NF (Third Normal Form)
AccountNO Balance BranchName BranchAddress
A01 50000 Rajkot Kalawad Road
A02 40000 Rajkot Kalawad Road
A03 35000 Rajkot Kalawad Road
A04 25000 Rajkot Kalawad Road
Solution: Decompose the relation in such a way that the resultant relations do not have any transitive FD.
• Remove the transitively dependent attributes from the relation that violates 3NF.
• Place them in a new relation along with the non-prime attributes due to which the transitive dependency occurred.
• The primary key of the new relation will be the non-prime attributes due to which the transitive dependency occurred.
• Keep the other attributes in the table with the same primary key, and add the prime attributes of the other relation into it as a foreign key.
ANO  Balance  BName   BAddress
A01  50000    Rajkot  Kalawad Road
A02  40000    Rajkot  Kalawad Road
A03  35000    Rajkot  Kalawad Road
A04  25000    Rajkot  Kalawad Road
Table 1
BName   BAddress
Rajkot  Kalawad Road
Table 2
ANO  Balance  BName
A01  50000    Rajkot
A02  40000    Rajkot
A03  35000    Rajkot
A04  25000    Rajkot
BCNF (Boyce-Codd Normal Form)
Conditions for BCNF
A relation R is in Boyce-Codd normal form (BCNF) if and only if-
1. Relation already exists in 3NF.
2. For each non-trivial functional dependency A → B, A must be a super key of the relation.
Now, we can observe that LHS of each given functional dependency is a candidate key.
Thus, we conclude that the given relation is in BCNF.
BCNF (Boyce-Codd Normal Form)
Student  Language  Faculty
Mita     JAVA      Patel
Nita     VB        Shah
Sita     JAVA      Jadeja
Gita     VB        Dave
Rita     VB        Shah
Nita     JAVA      Patel
Mita     VB        Dave
Rita     JAVA      Jadeja
FD1: Student, Language → Faculty
FD2: Faculty → Language
So {Student, Language} → Language (using transitivity rule)
Here, one faculty teaches only one subject, but a subject may be taught by more than one faculty.
In FD2, the determinant is Faculty, which is not a super key. So the table is not in BCNF.
Problem: If one student has more than one project with different guides (faculty), then records will be stored
repeatedly for each student, language and guide (faculty) combination, which occupies more space.
Solution: Decompose the relation in such a way that, in each resultant relation, the determinant of every non-trivial FD is a super key.
4NF (Fourth Normal Form)
A relation R is in fourth normal form (4NF)
• if and only if it is in BCNF and
• for every non-trivial multivalued dependency X →→ Y, X is a super key
• The following table is not in 4NF, since stu_id →→ course and stu_id →→ hobby
4NF (Fourth Normal Form)
studentID subject activity
101 DS Cricket
101 DBMS Cricket
101 DS Football
101 DBMS Football
Decompose
Functional dependency & Multivalued dependency
A table can have both functional dependencies and multivalued dependencies together.
• studentID → address
• studentID →→ subject
• studentID →→ activity
5NF (Fifth Normal Form)
A relation R is in fifth normal form (5NF)
• if and only if it is in 4NF and
• if, whenever the table is decomposed further to eliminate redundancy and anomalies, re-joining the
decomposed tables by means of candidate keys neither loses original data nor creates new
records. In simple words, joining two or more decomposed tables should not lose
records nor create new records.
• R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn = R
• Or it cannot be decomposed into any number of smaller tables without loss of data.
Example
Consider an example of different Subjects taught by different lecturers and the lecturers taking classes for different
semesters.
Note: Please consider that Semester 1 has Mathematics, Physics and Chemistry and Semester 2 has only Mathematics in its
academic year!!
• In above table, Rose takes both Mathematics and Physics class for Semester 1, but she does not take Physics class for
Semester 2.
• In this case, combination of all these 3 fields is required to identify a valid data.
• Imagine we want to add a new class – Semester3 but do not know which Subject and who will be taking that subject.
• We would simply be inserting a new entry with Class as Semester3 and leaving Lecturer and Subject as NULL. As we
discussed above, it’s not good to have such entries. Moreover, since all the three columns together act as a primary key,
we cannot leave the other two columns blank!
• Hence we have to decompose the table in such a way that it satisfies all the rules till 4NF.
• And when we join them by using keys, it should yield the correct records. Here, we can represent each lecturer’s Subject
area and their classes in a better way. We can divide above table into three – (SUBJECT, LECTURER), (LECTURER,
CLASS), (SUBJECT, CLASS)
Now, each of the combinations is in a different table. If we need to identify who is teaching which subject to
which semester, we need to join the three tables on their keys and get the result.
Points to remember
Point-1
Remember the following diagram which implies-
• A relation in BCNF will surely be in all other normal forms.
• A relation in 3NF will surely be in 2NF and 1NF.
• A relation in 2NF will surely be in 1NF.
Point-2
The above diagram also implies-
• BCNF is stricter than 3NF.
• 3NF is stricter than 2NF.
• 2NF is stricter than 1NF.
Point-06
• Third Normal Form (3NF) is considered adequate for normal relational database design.
• Every binary relation (a relation with only two attributes) is always in BCNF.
• BCNF is free from redundancies arising out of functional dependencies (zero redundancy).
A relation with only trivial functional dependencies is always in BCNF.
In other words, a relation with no non-trivial functional dependencies is always in BCNF.
Summary
Normal Form  Description
1NF   A relation is in 1NF if every attribute contains only atomic values.
2NF   A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key. {No PA → NPA}.
3NF   A relation will be in 3NF if it is in 2NF and no transitive dependency exists.
BCNF  A relation is in BCNF if it is in 3NF and, for each non-trivial functional dependency A → B, A is a super key.
4NF   A relation is in 4NF if it is in BCNF and has no multivalued dependencies.
5NF   A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining is lossless {i.e. the relation can not be decomposed further and has a lossless join}.
Step by step: Candidate key
Consider another relation R(A, B, C, D, E) having the Functional dependencies :
FD1 : A → BC
FD2 : C → B
FD3 : D → E
FD4 : E → D
R(A B C D E)
• Now we will find the attribute with no incoming edge, which is A here. This means that A does not appear on
the right hand side of any FD and is not determined by any other attribute, so it must be part of every candidate
key, because no closure of the other attributes contains A.
• Having considered all the cases, the final candidate keys for the relation are AD and AE.
Decomposition of relation
2NF
Step 1: Create a separate relation for each partial dependency
Step 2: Remove the right hand side attribute of the partial dependency from the relation that is
being decomposed.
Step 3: We always create one table for the CK if it is not a part of a relation.
Example: R(A,B,C)
B → C
Solution:
key: AB
step 1: R1(B,C)
step 2: R(A,B) or R2(A,B)
Why ?
AB is the key
B may be null
So B → C ? null → C
R
A  B  C
a  1  X
b  2  Y
a  3  Z
c  3  Z
d  3  Z
e  3  Z
R2
A  B
a  1
b  2
a  3
c  3
d  3
e  3
R1
B  C
1  X
2  Y
3  Z
Examples
Assume a relation R (A, B, C, D, E) with the following set of functional dependencies;
F = {AB → C, B → D, E → D}
The key for this relation is ABE. Then, all three given FDs are partial dependencies, viz., AB → C, B
→ D, and E → D.
Step 1: separate tables for partial dependencies; hence, R1 (ABC), R2 (BD) and R3 (ED).
Step 2: remove RHS of these three partial FDs from R; hence, R4(A, B, E).
Step 3: Check whether the candidate key is part of any relation. Here R4(A, B, E) contains the key ABE, so no extra relation is needed.
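The three steps of this example can be sketched in Python for one chosen candidate key. This is a simplification and not a general 2NF algorithm: it treats every given FD whose left side is a proper subset of the key as a partial dependency, and it assumes single-character attribute names. The function name decompose_2nf is illustrative.

```python
def decompose_2nf(attributes, key, fds):
    """Follow the three slide steps for one chosen candidate key."""
    key = set(key)
    remaining = set(attributes)
    relations = []
    for lhs, rhs in fds:
        if set(lhs) < key:                          # left side is a proper subset of the key
            relations.append(set(lhs) | set(rhs))   # step 1: separate relation
            remaining -= set(rhs)                   # step 2: remove RHS from R
    relations.append(remaining)
    if not any(key <= rel for rel in relations):    # step 3: make sure the key survives
        relations.append(key)
    return ["".join(sorted(rel)) for rel in relations]

F = [("AB", "C"), ("B", "D"), ("E", "D")]
print(decompose_2nf("ABCDE", "ABE", F))  # ['ABC', 'BD', 'DE', 'ABE']
```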
Step 2: Remove the right hand side attribute of the partial dependency from the relation that is
being decomposed.
Step 3: We always create one table for the CK if it is not a part of a relation.
Example: R(A,B,C)
A → B
B → C
Solution:
key: A
step 1: R1(B,C)
step 2: R(A,B) or R2(A,B)
R
A  B  C
a  1  X
b  1  X
c  1  X
d  2  Y
e  2  Y
f  3  Z
g  3  Z
R2(A,B)
A  B
a  1
b  1
c  1
d  2
e  2
f  3
g  3
R1(B,C)
B  C
1  X
2  Y
3  Z
Example 1: R(A,B,C,D,E)
A → B
B → E
C → D
Solution:
key: AC
step 1: R1(A,B,E) (still not in 3NF)
--- R11(A,B)
--- R12(B,E)
step 2: R2(C,D)
step 3: R3(A,C)
Example 2: R(A B C D E F G H I J)
AB → C
A → DE
B → F
F → GH
D → IJ
Solution:
key: AB
step 1: R1(A,D,E,I,J) (still not in 3NF)
--- R11(A,D,E)
--- R12(D,I,J)
step 2: R2(B,F,G,H)
--- R21(B,F)
--- R22(F,G,H)
step 3: R3(A,B,C) (AB keeps C with itself)
Example 3: R(A,B,C,D,E)
AB → C
B → D
D → E
Solution:
Key: AB
Step 1: R1(BDE)
--- R11(BD)
--- R12(DE)
Step 2: R2(ABC)
Example 4: R(ABCDEFGHIJ)
AB → C
AD → GH
BD → EF
A → I
H → J
Key: ABD
R1(ABCI) = R11(ABC) R12(AI)
R2(ADGHJ) = R21(ADGH) R22(HJ)
R3(BDEF)
R4(ABD)
BCNF
Step 1: Create a separate relation for each violated dependency
Step 2: Remove the right hand side attribute of the dependency from the relation that is being
decomposed.
Step 3: We always create one table for the CK if it is not a part of a relation.
Example: R(A,B,C)
AB → C
C → B {violation}
Solution:
key: AB, AC
step 1: R1(C,B)
step 2: R(A,C) or R2(A,C)
We can decompose the relation with reference to only one key; selected: AC
R
A  B  C
a  1  X
b  2  Y
c  2  Z
c  2  Z
d  3  W
e  3  W
g  3  W
R2(A,C)
A  C
a  X
b  Y
c  Z
c  Z
d  W
e  W
g  W
R1(C,B)
C  B
X  1
Y  2
Z  2
W  3
Find the highest Normal Form of the relation
R(ABCDEFGH): AB → C, A → DE, B → F, F → GH
R(ABCDE): CE → D, D → B, C → A
R(ABCDEF): AB → C, DC → AE, E → F
R(ABC): A → B, B → C, C → A
Key: A, B, C
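Exercises of this kind can be cross-checked with a small Python sketch. It assumes the relation is already in 1NF, uses single-character attribute names, and finds candidate keys by brute force, so it is only practical for small schemas. It stops at BCNF because 4NF and 5NF need multivalued and join dependencies, which FDs alone do not describe. All function names are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            if any(k <= set(combo) for k in keys):
                continue  # not minimal
            if closure(combo, fds) == attributes:
                keys.append(set(combo))
    return keys

def highest_normal_form(attributes, fds):
    attributes = set(attributes)
    keys = candidate_keys(attributes, fds)
    prime = set().union(*keys)
    # non-trivial FDs, split to one attribute on the right side
    simple = [(set(l), a) for l, r in fds for a in r if a not in l]
    superkey = lambda x: closure(x, fds) == attributes
    if all(superkey(l) for l, _ in simple):
        return "BCNF"
    if all(superkey(l) or a in prime for l, a in simple):
        return "3NF"
    # 2NF fails if a proper subset of some key determines a non-prime attribute
    partial = any(
        closure(part, fds) - prime
        for key in keys
        for size in range(1, len(key))
        for part in combinations(sorted(key), size)
    )
    return "1NF" if partial else "2NF"

print(highest_normal_form("ABC", [("A", "B"), ("B", "C"), ("C", "A")]))  # BCNF
print(highest_normal_form("ABC", [("A", "B"), ("B", "C")]))              # 2NF
```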
4 NF
Step 1: Create a separate relation for each violated multi valued dependency
Step 2: Remove the right hand side attribute of the dependency from the relation that is
being decomposed.
Step 3: we always create one table for CK if it is not a
Part of a relation.
Solution:
key: Person
step 1: R1(Person, Mobile)
step 2: R(Person, Hobby)
Decomposition of a Relation- The process of breaking up or dividing a single relation into two or
more sub relations is called as decomposition of a relation.
The following two properties must be followed when decomposing a given relation-
1. Lossless decomposition-
2. Dependency Preservation-
Consider the following relation R( A , B , C )-
Consider this relation is decomposed into two sub relations as R1( A , C ) and R2(
B , C )-
Now, let us check whether this decomposition is lossy or not.
For lossy decomposition, we must have-
R1 ⋈ R2 ⊃ R
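The condition R1 ⋈ R2 ⊃ R can be checked on a concrete instance with a short Python sketch. The rows of R(A, B, C) below are hypothetical sample data chosen for illustration, and the helper names relation, project and natural_join are assumptions. Each tuple is stored as a frozenset of (attribute, value) pairs so that relations behave as sets.

```python
def relation(attrs, tuples):
    """Build a relation as a set of frozensets of (attribute, value) pairs."""
    return {frozenset(zip(attrs, t)) for t in tuples}

def project(rel, attrs):
    """Keep only the given attributes; duplicates collapse automatically."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rel}

def natural_join(r1, r2):
    joined = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            # tuples combine only if they agree on every shared attribute
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                joined.add(frozenset({**d1, **d2}.items()))
    return joined

# Hypothetical instance of R(A, B, C)
R = relation("ABC", [(1, 2, 1), (2, 5, 3), (3, 3, 3)])

lossy = natural_join(project(R, "AC"), project(R, "BC"))
print(len(R), len(lossy))  # 3 5  (two spurious tuples appear)
print(lossy > R)           # True, so R1(A, C), R2(B, C) is lossy here

ok = natural_join(project(R, "AB"), project(R, "BC"))
print(ok == R)             # True for this instance
```

A check on one instance can show that a decomposition is lossy, but a lossless result on one instance does not prove losslessness for the schema in general.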
Lossless decomposition
Ano Balance Bname
A01 5000 Rajkot
A02 5000 Surat
Dependency Preserving Decomposition
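A decomposition D = { R1, R2, R3, …, Rn } of R is dependency preserving with respect to a set F of functional
dependencies if (F1 ∪ F2 ∪ … ∪ Fn)+ = F+, where Fi is the set of dependencies in F+ that involve only the attributes of Ri.
In dependency preservation, every dependency must be satisfied by at least one decomposed table.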
If a relation R is decomposed into relation R1 and R2, then the dependencies of R either must be a
part of R1 or R2 or must be derivable from the combination of functional dependencies of R1 and
R2.
Exercise
Problem: Let a relation R (A, B, C, D ) and F+= {A –> B, B –> C, C –> D,D –>B}. Relation R is
decomposed into R1( A, B), R2(B,C) and R3(B, D). Check whether decomposition is
dependency preserving or not.
Solution:
R1(A,B) R2(B,C) R3(B,D)
F1 ∪ F2 ∪ F3 = F+
Problem: Let a relation R (A, B, C, D ) and F+= {AB –> CD, D –> A}. Relation R is
decomposed into R1( A, D) and R2(B,C,D). Check whether decomposition is dependency
preserving or not.
FDs to test on R1( A, D): A → D, D → A
FDs to test on R2(B,C,D): B → CD, C → BD, D → BC, BC → D, BD → C, CD → B
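Both exercises can be cross-checked with a Python sketch of a standard textbook test that avoids listing every projected FD: for each FD X → Y in F, start from X and repeatedly add whatever the closure of the current set, restricted to one sub-relation at a time, contributes. The FD is preserved if Y ends up in the result. Single-character attribute names and the function name preserves are assumptions for illustration.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserves(fds, schemas):
    """True if every FD in fds can be enforced using the sub-relations alone."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for schema in schemas:
                s = set(schema)
                # what the part of result inside this schema determines there
                gained = (closure(result & s, fds) & s) - result
                if gained:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True

F1 = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "B")]
print(preserves(F1, ["AB", "BC", "BD"]))  # True

F2 = [("AB", "CD"), ("D", "A")]
print(preserves(F2, ["AD", "BCD"]))       # False: AB -> CD cannot be recovered
```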
Points to remember:
• BCNF decomposition is always lossless but not always dependency preserving.
• Sometimes, going for BCNF may not preserve functional dependencies.
• So, go for BCNF only if the lost functional dependencies are not required else normalize till 3NF only.
• There exist many more normal forms even after BCNF like 4NF and more.
• But in the real world database systems, it is generally not required to go beyond BCNF.
• Lossy decomposition is not allowed in 2NF, 3NF and BCNF.
• So, if the decomposition of a relation has been done in such a way that it is lossy, then the decomposition
will never be in 2NF, 3NF and BCNF.
• Unlike BCNF, Lossless and dependency preserving decomposition into 3NF and 2NF is always possible.
• A prime attribute can be transitively dependent on a key in a 3NF relation.
• A prime attribute can not be transitively dependent on a key in a BCNF relation.
• If a relation consists of only singleton candidate keys and it is in 3NF, then it must also be in BCNF.
• If a relation consists of only one candidate key and it is in 3NF, then the relation must also be in BCNF.