Functional Dependencies &
Normalization for Relational DBs
Chapter 5
Contents
1 Introduction
2 Functional dependencies
3 Normalization
Overview of Database Design Process
(Figure: starting from the miniworld, REQUIREMENTS COLLECTION & ANALYSIS produces functional requirements and data requirements. Database design proceeds through CONCEPTUAL DESIGN (conceptual schema), LOGICAL DESIGN / DATA MODEL MAPPING (database schema), and PHYSICAL DESIGN (internal schema). Application design proceeds through FUNCTIONAL ANALYSIS (high-level transaction specification), APPLICATION PROGRAM DESIGN, and TRANSACTION IMPLEMENTATION (application programs). Functional analysis and conceptual design are DBMS-independent; the later phases are DBMS-specific.)
Two Approaches to Database Design
Bottom-Up Design (Design by Synthesis):
Starts from individual attributes and their binary
relationships.
Builds relation schemas from the ground up.
Limitations:
Difficult to gather all necessary attribute relationships in
practice.
Rarely used in real-world systems due to complexity and
inefficiency.
Top-Down Design (Design by Analysis):
Begins with natural groupings of attributes (e.g., invoices,
forms, reports).
These groups are analyzed and decomposed into smaller,
refined relations.
Advantages:
Aligns with real-world scenarios.
More intuitive and easier to implement.
Introduction
Each relation schema consists of a number of
attributes and the relational database schema consists
of a number of relation schemas
Attributes are grouped to form a relation schema
Need some formal measure of why one grouping of
attributes into a relation schema may be better than
another
“Goodness” measures:
Making sure that the semantics of the attributes is clear in the schema
Reducing redundant information in tuples and update anomalies (modification, deletion, insertion)
Reducing the NULL values in tuples
Disallowing the possibility of generating spurious tuples
Redundant information in tuples: the attribute values
pertaining to a particular department (DNUMBER,
DNAME, DMGRSSN) are repeated for every employee
who works for that department.
Update anomalies: modification, deletion, insertion
Modification
When the manager of a department changes, we have to update the manager values in the tuples of all employees who work for that department
It is easy to make the DB inconsistent, e.g., by updating only some of those tuples
Deletion: if Borg James E., the only employee of dept. 1, leaves, deleting his tuple also loses the information that dept. 1 exists, the name of dept. 1, and who manages dept. 1
Insertion:
How can we create a department before any employees are assigned to it?
Reducing the NULL values in tuples
If many employees are not assigned to any department, their department attributes hold NULLs, which wastes storage space
NULLs also cause other difficulties, e.g., with aggregate operations (COUNT, SUM) and joins
Disallowing the possibility of generating spurious tuples
Consider decomposing EMP_PROJ (SSN, PNumber, Hours, EName, PName, PLocation) into:
EMP_LOCS (EName, PLocation)
EMP_PROJ1 (SSN, PNumber, Hours, PName, PLocation)
Joining EMP_LOCS and EMP_PROJ1 generates invalid (spurious) tuples:
PLocation is the only attribute that relates EMP_LOCS and EMP_PROJ1, and PLocation is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1
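The problem can be reproduced on a tiny instance. The Python sketch below uses two hypothetical EMP_PROJ tuples whose projects share a location. It projects them onto EMP_LOCS and EMP_PROJ1 and then rejoins them on the common attribute PLocation. The helper names `project` and `natural_join` are illustrative.

```python
# Hypothetical EMP_PROJ instance: two projects that happen to share a location.
EMP_PROJ = [
    {"SSN": "111", "PNumber": 1, "Hours": 10, "EName": "Smith",
     "PName": "ProductX", "PLocation": "Bellaire"},
    {"SSN": "222", "PNumber": 2, "Hours": 20, "EName": "Wong",
     "PName": "ProductY", "PLocation": "Bellaire"},
]


def project(rows, attrs):
    """Relational projection with duplicate elimination."""
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result


def natural_join(r1, r2):
    """Combine every pair of tuples that agree on all shared attribute names."""
    if not r1 or not r2:
        return []
    common = set(r1[0]) & set(r2[0])
    return [{**a, **b} for a in r1 for b in r2
            if all(a[c] == b[c] for c in common)]


emp_locs = project(EMP_PROJ, ["EName", "PLocation"])
emp_proj1 = project(EMP_PROJ, ["SSN", "PNumber", "Hours", "PName", "PLocation"])
joined = natural_join(emp_locs, emp_proj1)
spurious = [t for t in joined if t not in EMP_PROJ]
print(len(joined), len(spurious))  # 4 2
```

Both original tuples come back from the join. So do two extra tuples, each pairing an employee with the other employee's project.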
“Goodness” measures:
Redundant information in tuples
Update anomalies: modification, deletion, insertion
Reducing the NULL values in tuples
Disallowing the possibility of generating spurious tuples
These problems are addressed by normalization
Normalization helps DB designers determine the best relation schemas:
It provides a formal framework for analyzing relation schemas based on their keys and on the functional dependencies among their attributes
It provides a series of normal form tests that can be carried out on individual relation schemas so that the relational database can be normalized to any desired degree
It is based on the concept of normal forms: 1NF, 2NF, 3NF, BCNF, 4NF, 5NF
It is a process which ensures that the data is structured in such a way that attributes are grouped with the PK
Attributes that do not directly depend on the PK may be extracted to form a new relation
There are two important properties of decompositions:
1) the non-additive (lossless) join property
2) preservation of the functional dependencies
Note that property (1) is extremely important and cannot be sacrificed. Property (2) is less stringent and may be sacrificed
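The slides give no test for property (1). For a decomposition of R into two relations R1 and R2, a standard test (Elmasri & Navathe call it property NJB) states that the join is non-additive if and only if (R1 ∩ R2) → (R1 − R2) or (R1 ∩ R2) → (R2 − R1) is in F+. The Python sketch below checks this condition with attribute closure. The FD set is the EMP_PROJ example used in these slides, and the function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Lossless iff (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) holds."""
    r1, r2 = set(r1), set(r2)
    c = closure(r1 & r2, fds)
    return (r1 - r2) <= c or (r2 - r1) <= c


F = [({"SSN"}, {"EName"}),
     ({"PNumber"}, {"PName", "PLocation"}),
     ({"SSN", "PNumber"}, {"Hours"})]

# EMP_LOCS / EMP_PROJ1 share only PLocation, which determines nothing else.
print(is_lossless_binary({"EName", "PLocation"},
                         {"SSN", "PNumber", "Hours", "PName", "PLocation"}, F))  # False
# Splitting off (SSN, EName) is lossless because SSN -> EName.
print(is_lossless_binary({"SSN", "EName"},
                         {"SSN", "PNumber", "Hours", "PName", "PLocation"}, F))  # True
```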
Functional Dependencies (FDs)
Definition of FDs
Direct, indirect, partial dependencies
Inference Rules for FDs
Equivalence of Sets of FDs
Minimal Sets of FDs
Functional dependencies (FDs) are used to specify formal measures of the "goodness" of relational designs
FDs and keys are used to define normal forms for relations
FDs are constraints that are derived from the meaning and interrelationships of the data attributes
A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y:
X → Y
X → Y holds if whenever two tuples have the same value for X, they must have the same value for Y
For any two tuples t1 and t2 in any relation instance r(R):
If t1[X] = t2[X], then t1[Y] = t2[Y]
X → Y in R specifies a constraint on all relation instances r(R)
Examples:
Social security number determines employee name:
SSN → EName
Project number determines project name and location:
PNumber → {PName, PLocation}
Employee SSN and project number determine the hours per week that the employee works on the project:
{SSN, PNumber} → Hours
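An FD is a constraint on all instances r(R), so a single instance can refute an FD but can never prove it. Below is a minimal Python sketch of that check. It assumes tuples are represented as dictionaries, and the sample rows are hypothetical.

```python
def fd_holds(rows, X, Y):
    """Return False if some pair of tuples agrees on X but differs on Y."""
    seen = {}  # X-value -> Y-value of the first tuple with that X-value
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False  # counterexample found: X -> Y is violated
    return True  # no counterexample in this particular instance


# Hypothetical instance of (part of) EMP_PROJ.
rows = [
    {"SSN": "111", "PNumber": 1, "Hours": 10, "EName": "Smith"},
    {"SSN": "111", "PNumber": 2, "Hours": 5, "EName": "Smith"},
    {"SSN": "222", "PNumber": 1, "Hours": 20, "EName": "Wong"},
]
print(fd_holds(rows, ["SSN"], ["EName"]))             # True
print(fd_holds(rows, ["SSN"], ["Hours"]))             # False
print(fd_holds(rows, ["SSN", "PNumber"], ["Hours"]))  # True
```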
If K is a key of R, then K functionally determines all
attributes in R (since we never have two distinct tuples
with t1[K]=t2[K])
Direct and full dependency: all attributes in a relation R must be fully functionally dependent on the primary key (i.e., the PK is a determinant of all attributes in R)
SSN → {Name, BDate, Address, DNO}
EMPLOYEE (SSN, Name, BDate, Address, DNO)
Indirect dependency: the value of an attribute is not determined directly by the primary key
DNO → DName
SSN → DName is a transitive dependency through DNO (SSN → DNO and DNO → DName)
EMP_DEPT (SSN, Name, BDate, Address, DNO, DName)
Partial dependency
Composite determinant: when more than one attribute is required to determine the value of another attribute, the combination is called a composite determinant
{SSN, PNumber} in EMP_PROJ: {SSN, PNumber} → Hours
Partial dependency: if the value of an attribute depends not on an entire composite determinant but only on part of it, the relationship is called a partial dependency
SSN → EName, PNumber → {PName, PLocation}
EMP_PROJ (SSN, PNumber, Hours, EName, PName, PLocation)
Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold
Armstrong's inference rules:
IR1. (Reflexive) If Y ⊆ X, then X → Y
IR2. (Augmentation) If X → Y, then XZ → YZ
(Notation: XZ stands for X ∪ Z)
IR3. (Transitive) If X → Y and Y → Z, then X → Z
Some additional inference rules that are useful:
1. Decomposition: If X → YZ, then X → Y and X → Z
2. Union: If X → Y and X → Z, then X → YZ
3. Pseudotransitivity: If X → Y and WY → Z, then WX → Z
The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property)
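As an illustration of the completeness claim, each of the three additional rules can be derived from IR1 to IR3 in a few steps:

```latex
\begin{align*}
&\textbf{Decomposition: } X \to YZ \vdash X \to Y\\
&\quad YZ \to Y && \text{IR1, since } Y \subseteq YZ\\
&\quad X \to Y && \text{IR3 with } X \to YZ \text{ (likewise } X \to Z\text{)}\\[4pt]
&\textbf{Union: } X \to Y,\ X \to Z \vdash X \to YZ\\
&\quad X \to XY && \text{IR2: augment } X \to Y \text{ with } X \ (XX = X)\\
&\quad XY \to YZ && \text{IR2: augment } X \to Z \text{ with } Y\\
&\quad X \to YZ && \text{IR3 on the two lines above}\\[4pt]
&\textbf{Pseudotransitivity: } X \to Y,\ WY \to Z \vdash WX \to Z\\
&\quad WX \to WY && \text{IR2: augment } X \to Y \text{ with } W\\
&\quad WX \to Z && \text{IR3 with } WY \to Z
\end{align*}
```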
Closure of a set F of FDs is the set F+ of all FDs that can
be inferred from F
Closure of a set of attributes X with respect to F is the
set X+ of all attributes that are functionally determined
by X
X+ can be calculated by repeatedly applying IR1, IR2,
IR3 using the FDs in F
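In practice, X+ is usually computed iteratively. Start with X (reflexivity), then repeatedly add the right-hand side of any FD whose left-hand side is already in the result. Below is a Python sketch of that loop, using the EMP_PROJ FDs from these slides as the example. FDs are assumed to be pairs of attribute sets.

```python
def attribute_closure(X, F):
    """Compute X+ under F, where F is a list of (lhs, rhs) pairs of attribute sets."""
    result = set(X)  # reflexivity: X determines itself
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            # If lhs is already determined, everything in rhs is determined too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


F = [({"SSN"}, {"EName"}),
     ({"PNumber"}, {"PName", "PLocation"}),
     ({"SSN", "PNumber"}, {"Hours"})]

print(sorted(attribute_closure({"SSN"}, F)))  # ['EName', 'SSN']
print(sorted(attribute_closure({"SSN", "PNumber"}, F)))
```

The second call returns every attribute of EMP_PROJ, which is how attribute closure shows that {SSN, PNumber} is a superkey.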
Exercise
Consider a relation R(A, B, C, D, E) with the following dependencies F:
(1) AB → C,
(2) CD → E,
(3) DE → B
Find {A, B}+ and {A, B, D}+
Find F+
Exercise
Compute the closure X+ under F for each set of attributes X that appears as the left-hand side of a functional dependency in F:
F = { AB → C, A → DE, B → M, M → GH, D → IJ }
1. {A}+ = ???
2. {B}+ = ???
3. {M}+ = ???
4. {D}+ = ???
5. {A, B}+ = ???
Two sets of FDs F and G are equivalent if F+ = G+
Definition:
F covers G if G+ ⊆ F+
F and G are equivalent if F covers G and G covers F
There is an algorithm for checking equivalence: F covers G if and only if, for every FD X → Y in G, Y ⊆ X+, where X+ is computed with respect to F
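Below is a minimal Python sketch of that check. FDs are pairs of attribute sets. `fd` is a hypothetical helper for single-letter attribute names, and the example sets F and G are illustrative.

```python
def closure(X, F):
    """Attribute closure X+ under F (a list of (lhs, rhs) attribute sets)."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(F, G):
    """F covers G iff every FD X -> Y in G satisfies Y ⊆ X+ computed under F."""
    return all(set(rhs) <= closure(lhs, F) for lhs, rhs in G)


def equivalent(F, G):
    return covers(F, G) and covers(G, F)


def fd(lhs, rhs):
    """Hypothetical helper: fd("AC", "D") stands for AC -> D."""
    return set(lhs), set(rhs)


F = [fd("A", "C"), fd("AC", "D"), fd("E", "AD"), fd("E", "H")]
G = [fd("A", "CD"), fd("E", "AH")]
print(equivalent(F, G))               # True
print(equivalent(F, [fd("A", "C")]))  # False: A -> C alone does not imply AC -> D
```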
A set of FDs F is minimal if it satisfies the following conditions:
Every dependency in F has a single attribute for its right-hand side.
We cannot remove any dependency from F and still have a set of dependencies that is equivalent to F.
We cannot replace any dependency X → A in F with a dependency Y → A, where Y ⊂ X (Y is a proper subset of X), and still have a set of dependencies that is equivalent to F
Every set of FDs has an equivalent minimal set
There can be several equivalent minimal sets
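A minimal set equivalent to F can be computed by a straightforward procedure: split every right-hand side into single attributes, remove extraneous left-hand-side attributes, then remove redundant dependencies; different processing orders can yield different minimal sets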
To synthesize a set of relations, we assume that we
start with a set of dependencies that is a minimal set
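A common procedure for computing a minimal set works in three steps. It splits right-hand sides into single attributes, drops extraneous left-hand-side attributes, and then drops redundant FDs. Below is a Python sketch of that procedure with an illustrative FD set. FDs are pairs of attribute sets, and the function names are illustrative.

```python
def closure(X, F):
    """Attribute closure X+ under F (a list of (lhs, rhs) attribute sets)."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(F):
    """Return one minimal set of FDs equivalent to F."""
    # 1. Single-attribute right-hand sides (decomposition rule), duplicates removed.
    G = list(dict.fromkeys((frozenset(lhs), frozenset({a}))
                           for lhs, rhs in F for a in rhs))
    # 2. B is extraneous in X -> A if A is still in (X - {B})+ under the current set.
    for i, (lhs, rhs) in enumerate(G):
        lhs = set(lhs)
        for b in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {b}, G):
                lhs.discard(b)
        G[i] = (frozenset(lhs), rhs)
    G = list(dict.fromkeys(G))
    # 3. X -> A is redundant if the remaining FDs already imply it.
    for dep in list(G):
        rest = [g for g in G if g != dep]
        if dep[1] <= closure(dep[0], rest):
            G = rest
    return G


F = [(set("B"), set("A")), (set("D"), set("A")), (set("AB"), set("D"))]
cover = minimal_cover(F)
print(sorted(("".join(sorted(l)), "".join(sorted(r))) for l, r in cover))
# [('B', 'D'), ('D', 'A')]
```

In this example, A is extraneous in AB → D because B → A. After that step, B → A becomes redundant via B → D and D → A.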
Normalization
Normalization: The process of decomposing
unsatisfactory "bad" relations by breaking up their
attributes into smaller relations
Normal form: Using keys and FDs of a relation to
certify whether a relation schema is in a particular
normal form
Normalization is carried out in practice so that the
resulting designs are of high quality and meet the
desirable properties
Database designers need not normalize to the highest possible normal form; in practice, designs are usually normalized up to 3NF, BCNF, or 4NF
Two new concepts:
A prime attribute is a member of some candidate key
A nonprime attribute is not a prime attribute: it is not a member of any candidate key
1NF and dependency problems
2NF – solves partial dependency
3NF – solves indirect dependency
BCNF – well-normalized relations
First normal form (1NF): there is only one value at the intersection of each row and column of a relation; no set-valued attributes are allowed in 1NF
1NF disallows composite attributes, multivalued attributes, and nested relations
1NF is considered part of the formal definition of a relation in the basic (flat) relational model
1NF
EMP_PROJ (SSN, PNumber, Hours, EName, PName, PLocation)
1. {SSN, PNumber} → Hours
2. SSN → EName
3. PNumber → {PName, PLocation}
Decomposing EMP_PROJ according to these FDs gives:
EMP_PROJ (SSN, PNumber, Hours)
EMP (SSN, EName)
PROJ (PNumber, PName, PLocation)
1. {SSN, PNumber} → Hours
2. SSN → EName
3. PNumber → {PName, PLocation}
Second normal form (2NF): every nonprime attribute must be fully functionally dependent on the primary key
2NF solves partial dependency problem in 1NF
Method: identify primary keys and group attributes
that relate to the key together to form separate new
relations
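The Method step can be sketched in Python. For a given primary key, the sketch lists every proper subset of the key together with the nonprime attributes that subset already determines. Each group is a candidate for a separate relation. The sketch assumes a single primary key and FDs given as pairs of attribute sets. The example uses the EMP_PROJ FDs from these slides.

```python
from itertools import combinations


def closure(X, F):
    """Attribute closure X+ under F (a list of (lhs, rhs) attribute sets)."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, nonprime, F):
    """Return (proper subset of key, nonprime attributes it determines) pairs."""
    found = []
    key = sorted(key)
    for size in range(1, len(key)):  # proper, nonempty subsets only
        for part in combinations(key, size):
            determined = closure(part, F) & set(nonprime)
            if determined:
                found.append((set(part), determined))
    return found


F = [({"SSN", "PNumber"}, {"Hours"}),
     ({"SSN"}, {"EName"}),
     ({"PNumber"}, {"PName", "PLocation"})]
for part, attrs in partial_dependencies({"SSN", "PNumber"},
                                        {"Hours", "EName", "PName", "PLocation"}, F):
    print(sorted(part), "->", sorted(attrs))
```

Each printed group, together with the remaining key dependency, corresponds to one of the relations EMP, PROJ, and EMP_PROJ (SSN, PNumber, Hours) in the decomposition shown earlier.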
A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key
NOTE:
In X → Y and Y → Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency.
E.g., consider EMP (SSN, Emp#, Salary).
Here, SSN → Emp# → Salary, and Emp# is a candidate key
3NF solves the indirect (transitive) dependency problem that can remain in 1NF and 2NF relations
Method: identify all transitive dependencies and each
transitive dependency will form a new relation, with
non-prime attributes participating in the transitive
dependency and the attribute which determines others
as the attributes for the new relation
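Below is a rough Python sketch of this method. It assumes R is already in 2NF and has a single candidate key, so every attribute outside the key is nonprime. For each transitive dependency, the sketch moves the nonprime right-hand side, together with its determinant, into a new relation. The example uses EMP_DEPT with the FDs shown earlier, and the function names are illustrative.

```python
def closure(X, F):
    """Attribute closure X+ under F (a list of (lhs, rhs) attribute sets)."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def split_transitive(R, key, F):
    """Move each transitive dependency lhs -> rhs into its own relation."""
    R, key = set(R), set(key)
    remaining = set(R)
    new_relations = []
    for lhs, rhs in F:
        lhs, rhs = set(lhs), set(rhs)
        if closure(lhs, F) >= R or lhs <= key:
            continue  # dependency on the key itself, or a partial one (2NF's job)
        moved = rhs - lhs - key  # nonprime attributes determined via lhs
        if moved:
            new_relations.append(lhs | moved)
            remaining -= moved
    return remaining, new_relations


EMP_DEPT = {"SSN", "Name", "BDate", "Address", "DNO", "DName"}
F = [({"SSN"}, {"Name", "BDate", "Address", "DNO"}),
     ({"DNO"}, {"DName"})]
remaining, new_relations = split_transitive(EMP_DEPT, {"SSN"}, F)
print(sorted(remaining), [sorted(r) for r in new_relations])
```

DNO → DName is the only transitive dependency, so DName moves to a new relation (DNO, DName), and DNO stays behind in EMP_DEPT as the link.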
Exercise
Consider the universal relation R = {A, B, C, D, E, F, G, H, I, J} and the set of functional dependencies:
1. {A, B} → C
2. A → {D, E}
3. B → F
4. F → {G, H}
5. D → {I, J}
What is the key for R? Decompose R into 2NF, then 3NF relations.
SUMMARY OF NORMAL FORMS based on Primary Keys
General Normal Form Definitions
The above definitions consider the primary key only
The following more general definitions take into
account relations with multiple candidate keys
A relation schema R is in second normal form (2NF) if
every nonprime attribute A in R is not partially dependent
on any key of R.
A relation schema R is in third normal form (3NF) if whenever a nontrivial FD X → A holds in R, then either:
(a) X is a superkey of R, or
(b) A is a prime attribute of R
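The general 3NF condition can be tested one FD at a time. The Python sketch below assumes the candidate keys are already known and passed in. When testing R itself (not its projections), checking only the FDs listed in F is sufficient. The TEACH example assumes FD1: {Student, Course} → Instructor and FD2: Instructor → Course. The function name is illustrative.

```python
def closure(X, F):
    """Attribute closure X+ under F (a list of (lhs, rhs) attribute sets)."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def third_nf_violations(R, F, candidate_keys):
    """(X, A) pairs from F where X -> A is nontrivial, X is no superkey, A is nonprime."""
    R = set(R)
    prime = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in F:
        lhs = set(lhs)
        if closure(lhs, F) >= R:
            continue  # condition (a): X is a superkey
        for a in sorted(set(rhs) - lhs):  # ignore trivial parts (A in X)
            if a not in prime:  # condition (b) also fails
                violations.append((lhs, a))
    return violations


TEACH = {"Student", "Course", "Instructor"}
TEACH_F = [({"Student", "Course"}, {"Instructor"}), ({"Instructor"}, {"Course"})]
print(third_nf_violations(TEACH, TEACH_F,
                          [{"Student", "Course"}, {"Student", "Instructor"}]))  # []

EMP_DEPT = {"SSN", "Name", "BDate", "Address", "DNO", "DName"}
EMP_DEPT_F = [({"SSN"}, {"Name", "BDate", "Address", "DNO"}), ({"DNO"}, {"DName"})]
print(third_nf_violations(EMP_DEPT, EMP_DEPT_F, [{"SSN"}]))  # [({'DNO'}, 'DName')]
```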
A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X → A holds in R, then X is a superkey of R
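BCNF only asks whether each left-hand side is a superkey, and attribute closure answers that directly. Below is a Python sketch that checks the FDs listed in F against R itself. The TEACH FDs are assumed to be FD1: {Student, Course} → Instructor and FD2: Instructor → Course, and the function name is illustrative.

```python
def closure(X, F):
    """Attribute closure X+ under F (a list of (lhs, rhs) attribute sets)."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(R, F):
    """Nontrivial FDs X -> Y in F whose left-hand side X is not a superkey of R."""
    R = set(R)
    return [(set(lhs), set(rhs)) for lhs, rhs in F
            if not set(rhs) <= set(lhs) and not closure(lhs, F) >= R]


TEACH = {"Student", "Course", "Instructor"}
TEACH_F = [({"Student", "Course"}, {"Instructor"}), ({"Instructor"}, {"Course"})]
print(bcnf_violations(TEACH, TEACH_F))  # [({'Instructor'}, {'Course'})]
```

Unlike the 3NF test, this check ignores whether Course is prime. As a result, FD2 is reported as a violation.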
BCNF
Boyce-Codd normal form. (a) BCNF normalization of LOTS1A with the
functional dependency FD2 being lost in the decomposition. (b) A
schematic relation with FDs; it is in 3NF, but not in BCNF.
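TEACH (Student, Course, Instructor)
FD1: {Student, Course} → Instructor
FD2: Instructor → Course
TEACH is in 3NF (Course is a prime attribute) but not in BCNF (Instructor is not a superkey). Three possible pairs of relations for decomposing it:
1. {Student, Instructor} and {Student, Course}
2. {Course, Instructor} and {Course, Student}
3. {Instructor, Course} and {Instructor, Student}
All three decompositions lose the functional dependency FD1. The desirable decomposition among those just shown is 3, because it will not generate spurious tuples after a join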
Nonadditive Join Decomposition into BCNF
Schemas
Algorithm 16.5. Relational Decomposition into BCNF with
Nonadditive Join Property
Input: A universal relation R and a set of functional
dependencies F on the attributes of R.
1. Set D := {R} ;
2. While there is a relation schema Q in D that is not in BCNF do
{
choose a relation schema Q in D that is not in BCNF;
find a functional dependency X→Y in Q that violates BCNF;
replace Q in D by two relation schemas (Q – Y) and (X ∪ Y);
};
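Below is a Python sketch of Algorithm 16.5. A schema Q produced by splitting carries only the projection of F. To find a violating X → Y on Q, the sketch therefore computes X+ for every subset X of Q, taking Y = (X+ ∩ Q) − X. This search is exponential, so the sketch only suits small examples. Names are illustrative, and the TEACH FDs are assumed to be FD1: {Student, Course} → Instructor and FD2: Instructor → Course.

```python
from itertools import combinations


def closure(X, F):
    """Attribute closure X+ under F (a list of (lhs, rhs) attribute sets)."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_bcnf_violation(Q, F):
    """Find X -> Y holding on Q (via F+) with Y nonempty and X not a superkey of Q."""
    Q = set(Q)
    for size in range(1, len(Q)):
        for X in combinations(sorted(Q), size):
            X = set(X)
            Y = (closure(X, F) & Q) - X
            if Y and not (X | Y) >= Q:
                return X, Y
    return None


def bcnf_decompose(R, F):
    D = [set(R)]  # step 1: D := {R}
    while True:
        for Q in D:
            violation = find_bcnf_violation(Q, F)
            if violation:
                break
        else:
            return D  # every schema in D is in BCNF
        X, Y = violation
        D.remove(Q)
        D += [Q - Y, X | Y]  # replace Q by (Q - Y) and (X ∪ Y)


TEACH = {"Student", "Course", "Instructor"}
TEACH_F = [({"Student", "Course"}, {"Instructor"}), ({"Instructor"}, {"Course"})]
D = bcnf_decompose(TEACH, TEACH_F)
print(sorted(sorted(q) for q in D))  # [['Course', 'Instructor'], ['Instructor', 'Student']]
```

Each split is non-additive because the shared attributes X determine X ∪ Y. The algorithm terminates because both new schemas are strictly smaller than Q.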
Two Approaches to Database Design
Bottom-up design (design by synthesis) is based on a model called the binary relational model (NIAM methodology, Verheijen and VanBekkum, 1982).
Identify the attributes to be stored, the functional dependencies, and normalize.
Exercise
Ho Chi Minh City University of Technology (Trường ĐH Bách Khoa TPHCM)
GRADE REPORT - Semester 1, 2024
Student ID: 2134567
Student name: Nguyễn Văn A
Address: KTX Bách Khoa (Bach Khoa dormitory)
Faculty: Computer Science & Engineering
Course code | Course name | Class group | Credits | Test score | Exam score | Final score
CO1040 | CTDL&GT (Data Structures & Algorithms) | L01 | 3 | 7 | 8 | 7.8
CO2013 | Hệ CSDL (Database Systems) | L05 | 3 | 6.5 | 8.5 | 8.1
Note: each course may use different weights for the test score and the exam score. The final score is computed from the test score, the exam score, and these weights.
Exercise
Consider the universal relation R = {A, B, C, D, E, F, G, H, I, J}
and the set of functional dependencies:
1) {A, B} → C
2) {B, D} → {E, F}
3) {A, D} → {G, H}
4) A → I
5) H → J
Decompose R into 2NF, then 3NF relations.
Exercise
Consider the relation:
BOOK (Book_Name, Author, Edition, Year)
Based on a common-sense understanding of the data,
what are the possible candidate keys of this relation?
Key finding algorithms
Extended part
Key and super key
Superkey of R: A set of attributes SK of R such that no
two tuples in any valid relation instance r(R) will have the
same value for SK. That is, for any distinct tuples t1 and
t2 in r(R), t1[SK] ≠ t2[SK]
Key of R: A "minimal" superkey; that is, a superkey K
such that removal of any attribute from K results in a set
of attributes that is not a superkey
If K is a key of R, then K functionally determines all
attributes in R
Key-finding algorithm (1) By Elmasri and Navathe
Input: A relation R and a set of functional dependencies F
on the attributes of R.
Output: a key K of R
1. Set K to contain all attributes in R
2. For each attribute A in K {
compute (K – A)+ with respect to F;
if (K – A)+ contains all attributes in R, then
set K := K – {A}
};
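Below is a Python sketch of algorithm (1). The optional `order` parameter is an addition for illustration that controls the order in which attributes are tried for removal. The example assumes the TEACH FDs FD1: {Student, Course} → Instructor and FD2: Instructor → Course. With these FDs, different orders return different keys.

```python
def closure(X, F):
    """Attribute closure X+ under F (a list of (lhs, rhs) attribute sets)."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_one_key(R, F, order=None):
    """Start with K = R and drop each attribute whose removal keeps K a superkey."""
    R = set(R)
    K = set(R)
    for A in (order or sorted(R)):  # the removal order decides which key is found
        if closure(K - {A}, F) >= R:
            K = K - {A}
    return K


TEACH = {"Student", "Course", "Instructor"}
TEACH_F = [({"Student", "Course"}, {"Instructor"}), ({"Instructor"}, {"Course"})]
print(sorted(find_one_key(TEACH, TEACH_F)))  # ['Instructor', 'Student']
print(sorted(find_one_key(TEACH, TEACH_F, order=["Student", "Instructor", "Course"])))
# ['Course', 'Student']
```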
In algorithm (1), we start by setting K to all the attributes of R; we then remove one attribute at a time and check whether the remaining attributes still form a superkey.
Algorithm (1) determines only one of the possible candidate keys of R; the key returned depends on the order in which attributes are removed from K in step 2.
Key-finding algorithm (2) by Hossein Saiedian & Thomas Spencer
Input: A relation R and a set of functional dependencies F on the attributes of R.
Output: all candidate keys of R
Let:
U contain all attributes of R
Ul contain attributes of R that occur only on the left-hand side of FDs in F
Ur contain attributes of R that occur only on the right-hand side of FDs in F
Ub contain attributes of R that occur on both sides of FDs in F
Note:
Ul ∩ Ur = ∅, Ul ∩ Ub = ∅ and Ur ∩ Ub = ∅
Ul ∪ Ur ∪ Ub = U (assuming every attribute of R appears in some FD in F; an attribute that appears in no FD must also be part of every candidate key)
For every attribute A ∈ U, if A ∈ Ul, then A must be part of every candidate key of R.
For every attribute A ∈ U, if A ∈ Ur, then A will not be part of any candidate key of R.
Input: A relation R and a set of functional dependencies F on the attributes of R.
Output: all candidate keys of R
1. Determine Ul, Ur and Ub
2. If Ul+ = U under F, then Ul forms the only key of R and the algorithm stops here.
Else: move to step 3 // Ul+ ≠ U under F
3. Consider every nonempty subset Ubi ⊆ Ub, in order of increasing size:
For each Ubi, if (Ul ∪ Ubi)+ = U under F, then Ki = (Ul ∪ Ubi) is a candidate key of R (*)
(*) If Ki = (Ul ∪ Ubi) is a candidate key of R, then we need not check any Ubj ⊆ Ub with Ubi ⊂ Ubj
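Below is a Python sketch of algorithm (2), assuming FDs are given as pairs of attribute sets. One assumption goes beyond the slides: attributes that appear in no FD are placed in Ul, since they too must be part of every key. Subsets of Ub are tried in increasing size, so supersets of keys already found can be skipped. The TEACH FDs (FD1: {Student, Course} → Instructor, FD2: Instructor → Course) serve as the example.

```python
from itertools import combinations


def closure(X, F):
    """Attribute closure X+ under F (a list of (lhs, rhs) attribute sets)."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def all_candidate_keys(R, F):
    U = set(R)
    lhs_attrs = set().union(*(set(l) for l, _ in F))
    rhs_attrs = set().union(*(set(r) for _, r in F))
    Ul = U - rhs_attrs          # never on a right-hand side (includes attributes in no FD)
    Ur = rhs_attrs - lhs_attrs  # only on right-hand sides: in no key (not used below)
    Ub = lhs_attrs & rhs_attrs  # on both sides
    if closure(Ul, F) >= U:     # step 2
        return [Ul]
    keys = []
    for size in range(1, len(Ub) + 1):  # step 3, smallest subsets first
        for extra in combinations(sorted(Ub), size):
            candidate = Ul | set(extra)
            if any(k <= candidate for k in keys):
                continue  # superset of a key already found, so not minimal
            if closure(candidate, F) >= U:
                keys.append(candidate)
    return keys


TEACH = {"Student", "Course", "Instructor"}
TEACH_F = [({"Student", "Course"}, {"Instructor"}), ({"Instructor"}, {"Course"})]
print([sorted(k) for k in all_candidate_keys(TEACH, TEACH_F)])
# [['Course', 'Student'], ['Instructor', 'Student']]
```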
A simple categorization of attributes into the sets Ul, Ur and Ub allows us to distinguish between the attributes that will participate in the candidate keys of a relational database schema and those that will not.
Algorithm (2) finds all candidate keys.
Exercise 1
Consider the universal relation R = {A, B, C, D, E, F} and
the set of functional dependencies:
1) A → B
2) {C, D} → A
3) {B, C} → D
4) {A, E} → F
5) {C, E} → D
What is the key for R?
Ul = {C, E}, Ur = {F}, Ub = {A, B, D}
Ul+ = {C, E, D, A, B, F} = U
Exercise 2
Consider the universal relation R = {A, B, C, D, E, F} and
the set of functional dependencies:
1) {A, D} → B
2) {A, B} → E
3) C → D
4) B → C
5) {A, C} → F
What is the key for R? Decompose R into 2NF, then 3NF
relations. (General Normal Form Definitions)
Exercise 3
Consider the universal relation R = {A, B, C, D, E, F} and the set
of functional dependencies:
1) A → B
2) C → {A, D}
3) {A, F} → {C, E}
What is the key for R? Decompose R into 2NF, 3NF, and BCNF relations.
With single-attribute right-hand sides:
1 A → B
2.1 C → A
2.2 C → D
3.1 {A, F} → C
3.2 {A, F} → E
Which of the following dependencies may hold in the above relation?