The Relational Model and Normalization
The Relational Model (page 113)
Broad, flexible model
Basis for almost all DBMS products
E.F. Codd defined well-structured relations,
normal forms of relations, and
normalization
Relational Data Model
A relational data model organizes data as a
set of relations, or two-dimensional tables.
A relation is viewed as a two-dimensional
table, with the following properties:
Each column contains values about the same
attribute, and each table cell must hold a single, simple value
Each column has a distinct name (attribute
name), and the order of columns is immaterial
Each row is distinct; duplicate rows are not
allowed
The sequence of the rows is immaterial
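The properties above can be illustrated with a minimal Python sketch that models a relation as a set of named tuples. The `Employee` type, its field names, and the variable names are illustrative assumptions, not part of the original material.

```python
from typing import NamedTuple

class Employee(NamedTuple):
    # Each field is one named attribute holding a single simple value.
    employee_number: int
    name: str
    department_number: str

# A relation modeled as a set of tuples: duplicates collapse, order is not stored.
relation = {
    Employee(28719, "Smith Tom", "172"),
    Employee(53730, "Jones Bill", "044"),
    Employee(28719, "Smith Tom", "172"),  # duplicate row, dropped by the set
}

same_rows_other_order = {
    Employee(53730, "Jones Bill", "044"),
    Employee(28719, "Smith Tom", "172"),
}

row_count = len(relation)                                # duplicate rows are not kept
order_is_immaterial = relation == same_rows_other_order  # row sequence does not matter
```

A set is only an approximation of a relation (it does not check domains, for example), but it captures the "distinct rows, no ordering" properties directly.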
An Example Relation
Employee  Employee      Department  Salary              Date
Number    Name          Number                          Started
(Key)     (Candidate    (Foreign    (Non-key            (Non-key
          Key)          Key)        Attribute)          Attribute)
28719     Smith Tom     172         18,000              12/03/84
53730     Jones Bill    044         20,000              01/05/83
79313     Ropley Ed     044         11,000              18/09/81
51616     Fair Carolyn  090         50,000              05/12/79
61930     Hall Albert   090         25,000              21/06/82
Terminology in a Relation
Tuple - a row or record
Column - values of an attribute
Domain - a set of possible values for
an attribute
Key
Primary key - a unique identifier for each tuple
Concatenated key - two or more attributes
used together to identify a record (e.g., Student
ID & Course ID to identify a Grade record)
Foreign key (cross-reference key) - a non-key
attribute in one relation that also appears as
the primary key in another relation
An E-R Model for Student Registration System
[Figure: E-R diagram. Entities: Course, Instructor, Student, Course Enrollment. Relationships: Teaches, Advises, with cardinality labels 1 and M. Attributes shown: Course Number, Description, Room, Instructor ID, Name, Rank, Student Number, Student Name, Major, Grade.]
Convert E-R Model to Relational Tables
Create one table for each entity
with key and attributes
Introduce a foreign key into the
many side to represent a 1:m
relationship
A Relational Model For Student Registration System
Course Table (Course ID, Description, Credit, Instructor ID)
Instructor Table (Instructor ID, Instructor Name, Rank)
Student Table (Student ID, Student Name, Major, Advisor ID)
Enrollment Table (Course ID, Student ID, Grade)
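As a rough sketch, the four tables could be declared as follows using Python's built-in `sqlite3` module. The column types, the sample rows, and the placement of the Advisor ID foreign key on the student table (the "many" side of Advises) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE instructor (
    instructor_id   INTEGER PRIMARY KEY,
    instructor_name TEXT,
    rank            TEXT
);
CREATE TABLE course (
    course_id     TEXT PRIMARY KEY,
    description   TEXT,
    credit        INTEGER,
    instructor_id INTEGER REFERENCES instructor(instructor_id)  -- 1:m Teaches
);
CREATE TABLE student (
    student_id   INTEGER PRIMARY KEY,
    student_name TEXT,
    major        TEXT,
    advisor_id   INTEGER REFERENCES instructor(instructor_id)   -- 1:m Advises (assumed)
);
CREATE TABLE enrollment (
    course_id  TEXT    REFERENCES course(course_id),
    student_id INTEGER REFERENCES student(student_id),
    grade      TEXT,
    PRIMARY KEY (course_id, student_id)  -- concatenated key
);
""")

# Hypothetical sample rows, invented for this sketch.
conn.execute("INSERT INTO instructor VALUES (1, 'Lee', 'Professor')")
conn.execute("INSERT INTO course VALUES ('DB101', 'Databases', 3, 1)")
conn.execute("INSERT INTO student VALUES (100, 'Kim', 'MIS', 1)")
conn.execute("INSERT INTO enrollment VALUES ('DB101', 100, 'A')")

rows = conn.execute("""
    SELECT s.student_name, c.description, e.grade
    FROM enrollment e
    JOIN student s ON s.student_id = e.student_id
    JOIN course  c ON c.course_id  = e.course_id
""").fetchall()
```

The join at the end shows how the foreign keys act as logical pointers that reassemble a grade listing from the separate tables.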
Relational Database
Advantages
Easy to understand and use
Powerful data manipulation capability
Implicit association to meet different needs. Flexible,
best for DSS
Normalization theory for database design
Disadvantages
Keys are stored redundantly as logical pointers to
implement relationships
Inefficiency for high-volume transaction processing
Lack of semantic quality control
Equivalent Relational Terms
[Figure 5-1, page 114. 2000 Prentice Hall]
Normalization
Reduce complex user views to a
set of small, stable data structures
Eliminate errors and
inconsistencies related to the
adding, deleting or updating of
record occurrences
Modification Anomalies
Insertion anomalies - cannot add a
record because of a missing value for
one or more fields
Deletion anomalies - the deletion of a
record causes an unintended deletion of
information
Update anomalies - updating is made
needlessly complicated due to
redundancy
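The three anomalies can be seen in a small Python sketch over an unnormalized enrollment list. All rows and names here are hypothetical, invented only to make the anomalies visible.

```python
# Hypothetical unnormalized rows:
# (student_id, student_name, course_id, course_title, grade)
rows = [
    (1, "Ann", "C1", "Databases", "A"),
    (2, "Bob", "C1", "Databases", "B"),
    (2, "Bob", "C2", "Networks", "A"),
]

# Insertion anomaly: a new course with no students has no student_id,
# so the key (student_id + course_id) would be incomplete.
new_course = (None, None, "C3", "Security", None)
can_insert = new_course[0] is not None

# Deletion anomaly: removing Bob's enrollment in C2 also removes the only
# record of the course title "Networks".
after_delete = [r for r in rows if not (r[0] == 2 and r[2] == "C2")]
networks_still_known = "Networks" in {r[3] for r in after_delete}

# Update anomaly: renaming course C1 must touch every row repeating its title.
rows_to_update = sum(1 for r in rows if r[2] == "C1")
```

Here `can_insert` and `networks_still_known` both end up false, and `rows_to_update` counts two rows for a single logical change.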
Functional Dependence
Given a relation R, attribute Y of R
is functionally dependent on
attribute X of R if and only if,
whenever two tuples of R agree on
their X- value, they must
necessarily agree on their Y-value.
We write R.X --> R.Y
Example:
(Student ID, Student Name, Course ID, Course Title, Grade)
Student ID --> Student Name,
Course ID --> Course Title
Student ID -/-> Course ID (does not determine)
Course Title -/-> Student Name (does not determine)
Student ID -/-> Grade (does not determine)
Course ID -/-> Grade (does not determine)
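The definition of functional dependence translates almost directly into a check over sample tuples. The sketch below is a minimal illustration; the helper name `holds` and the sample rows are assumptions, not part of the original material.

```python
def holds(rows, x, y):
    """Return True if every pair of rows agreeing on attributes x also agrees on y."""
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        # Remember the first y-value seen for this x-value; any later mismatch refutes X --> Y.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True

# Hypothetical sample tuples for (Student ID, Student Name, Course ID, Course Title, Grade).
rows = [
    {"student_id": 1, "student_name": "Ann", "course_id": "C1", "course_title": "Databases", "grade": "A"},
    {"student_id": 1, "student_name": "Ann", "course_id": "C2", "course_title": "Networks", "grade": "B"},
    {"student_id": 2, "student_name": "Bob", "course_id": "C1", "course_title": "Databases", "grade": "C"},
]

name_fd = holds(rows, ["student_id"], ["student_name"])
title_fd = holds(rows, ["course_id"], ["course_title"])
grade_by_student = holds(rows, ["student_id"], ["grade"])
grade_by_course = holds(rows, ["course_id"], ["grade"])
grade_by_both = holds(rows, ["student_id", "course_id"], ["grade"])
```

Note that sample data can only refute a dependency. Whether a dependency truly holds is a statement about the meaning of the data, not about one particular set of rows.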
Normal Forms
A relation is said to be in a
particular normal form if it satisfies
a certain specified set of
constraints
Normal Forms
1 NF (no repeating groups)
2 NF (no partial dependencies)
3 NF (no transitive dependencies)
Boyce-Codd NF
4 NF (no multi-value dependencies)
5 NF
Domain-Key NF
First Normal Form
A relation is in first normal form if it
contains no repeating groups
First Normal Form
An un-normalized relation contains
repeating groups
First Normal Form
Grade Report with repeating group of courses for
each student
(Student ID, Student Name, Campus Address,
Major, Course ID, Course Title, Instructor Name,
Instructor Location, Grade)
Remove repeating group
(Student ID, Student Name, Campus Address,
Major) (3NF)
(Student ID, Course ID, Course Title, Instructor
Name, Instructor Location, Grade) (1NF)
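Removing the repeating group can be sketched in Python as splitting one nested record into a student row and one flat row per course. The nested `report` structure and all of its values are hypothetical.

```python
# Hypothetical un-normalized record: one student with a repeating group of courses.
report = {
    "student_id": 1,
    "student_name": "Ann",
    "campus_address": "Hall 1",
    "major": "MIS",
    "courses": [
        {"course_id": "C1", "course_title": "Databases",
         "instructor_name": "Lee", "instructor_location": "B101", "grade": "A"},
        {"course_id": "C2", "course_title": "Networks",
         "instructor_name": "Kim", "instructor_location": "B202", "grade": "B"},
    ],
}

# (Student ID, Student Name, Campus Address, Major)
student = (report["student_id"], report["student_name"],
           report["campus_address"], report["major"])

# (Student ID, Course ID, Course Title, Instructor Name, Instructor Location, Grade)
# The student key is copied into every course row so the rows stay linked.
student_course = [
    (report["student_id"], c["course_id"], c["course_title"],
     c["instructor_name"], c["instructor_location"], c["grade"])
    for c in report["courses"]
]
```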
Second Normal Form
A relation is in second normal
form if it is already in first
normal form and any partial
functional dependencies on the
primary key have been
removed
Second Normal Form
[Figure: partial functional dependencies on the primary key]
Second Normal Form
(Student ID, Course ID, Course Title, Instructor
Name, Instructor Location, Grade) (1NF)
Primary key is Student ID + Course ID
Student ID + Course ID --> Grade
Course ID --> Course Title (partial dependency)
Removing partial dependencies
(Student ID, Course ID, Grade) (3NF)
(Course ID, Course Title, Instructor Name,
Instructor Location ) (2NF)
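Removing the partial dependencies amounts to projecting the 1NF relation onto two attribute sets. The following is a minimal sketch; the `project` helper and the sample rows are assumptions for illustration.

```python
def project(rows, attrs):
    """Project dict rows onto attrs, dropping duplicate tuples."""
    return {tuple(row[a] for a in attrs) for row in rows}

# Hypothetical 1NF rows keyed by Student ID + Course ID.
rows = [
    {"student_id": 1, "course_id": "C1", "course_title": "Databases",
     "instructor_name": "Lee", "instructor_location": "B101", "grade": "A"},
    {"student_id": 2, "course_id": "C1", "course_title": "Databases",
     "instructor_name": "Lee", "instructor_location": "B101", "grade": "B"},
    {"student_id": 2, "course_id": "C2", "course_title": "Networks",
     "instructor_name": "Kim", "instructor_location": "B202", "grade": "A"},
]

# Attributes that depend on the whole key stay with the whole key.
grades = project(rows, ["student_id", "course_id", "grade"])

# Attributes that depend only on Course ID move to a relation keyed by Course ID.
courses = project(rows, ["course_id", "course_title",
                         "instructor_name", "instructor_location"])
```

After the split, the title of course C1 is stored once in `courses` instead of once per enrolled student.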
Third Normal Form
A relation is in third normal form
if it is already in second normal
form and contains no transitive
dependencies
transitive dependency - One
nonkey attribute is dependent on
one or more nonkey attributes
Third Normal Form
[Figure: transitive dependencies]
Third Normal Form
(Course ID, Course Title, Instructor Name,
Instructor Location ) (2NF)
Course ID --> Instructor Name --> Instructor
Location
Instructor Name is nonkey
Instructor Location is dependent on Instructor
Name
Remove transitive dependency
(Course ID, Course Title, Instructor Name) (3NF)
(Instructor Name, Instructor Location ) (3NF)
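A short Python sketch of this step, using hypothetical rows, splits the 2NF relation and then joins the two results back on Instructor Name to confirm that no information is lost.

```python
# Hypothetical 2NF rows: (course_id, course_title, instructor_name, instructor_location)
courses_2nf = [
    ("C1", "Databases", "Lee", "B101"),
    ("C2", "Networks", "Lee", "B101"),
    ("C3", "Security", "Kim", "B202"),
]

# (Course ID, Course Title, Instructor Name)
course = {(cid, title, instr) for cid, title, instr, _ in courses_2nf}

# (Instructor Name, Instructor Location): each location is now stored once per instructor.
instructor = {(instr, loc) for _, _, instr, loc in courses_2nf}

# Natural join on instructor name rebuilds the original rows.
rejoined = {
    (cid, title, instr, loc)
    for cid, title, instr in course
    for instr2, loc in instructor
    if instr == instr2
}
lossless = rejoined == set(courses_2nf)
```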
Third Normal Form
[Figure 5-7: a relation is in third normal form if it is in second normal form and has no transitive dependencies]
Practice: Mountain View Community Hospital
Mountain View Community Hospital
Physician Report
Physician: A Campbell
Specialty: Internal Medicine
Date      Patient-Code  Patient-Name     Procedure     Charge
---------------------------------------------------------------
10/17/96  32968         Baker, Marry S.  Examination   35.00
                                         X-ray         75.00
10/17/96  39271         Emery, Nancy     Examination   35.00
                                         Chemotherapy  50.00
10/18/96  32968         Baker, Marry S.  Examination   35.00
---------------------------------------------------------------
Normalize a table
Report (Doctor Name, Specialty, Date, Patient Code,
Patient Name, Procedure Name, Charge)
Analyzing functional dependency:
Assume no duplicate Doctor Name. Otherwise
introduce a doctor ID
Assume no duplicate Procedure Name. Otherwise
introduce a Procedure code
Assume charge is determined by procedure.
Assume a patient may visit a doctor more than
once during the same day.
Answer
Doctors (Doctor ID, Doctor Name, Specialty)
Patients (Patient Code, Patient Name)
Visit (Visit ID, Doctor ID, Patient Code, Date)
Treatment (Visit ID, Procedure ID)
Procedure (Procedure ID, Procedure Name,
Charge)
Here the Visit ID is automatically generated by
the system
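One way to check the answer is to load the report data into the five tables and total the charges with a join. This sketch uses Python's built-in `sqlite3` module; the pluralized table names, the column types, and all ID values (doctor, visit, and procedure IDs) are assumptions, since the report itself does not show them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE doctors    (doctor_id INTEGER PRIMARY KEY, doctor_name TEXT, specialty TEXT);
CREATE TABLE patients   (patient_code INTEGER PRIMARY KEY, patient_name TEXT);
CREATE TABLE procedures (procedure_id INTEGER PRIMARY KEY, procedure_name TEXT, charge REAL);
CREATE TABLE visits     (visit_id INTEGER PRIMARY KEY,
                         doctor_id INTEGER REFERENCES doctors(doctor_id),
                         patient_code INTEGER REFERENCES patients(patient_code),
                         visit_date TEXT);
CREATE TABLE treatments (visit_id INTEGER REFERENCES visits(visit_id),
                         procedure_id INTEGER REFERENCES procedures(procedure_id),
                         PRIMARY KEY (visit_id, procedure_id));
""")

# Names, dates and charges come from the report; every ID below is hypothetical.
conn.execute("INSERT INTO doctors VALUES (1, 'A Campbell', 'Internal Medicine')")
conn.executemany("INSERT INTO patients VALUES (?, ?)",
                 [(32968, "Baker, Marry S."), (39271, "Emery, Nancy")])
conn.executemany("INSERT INTO procedures VALUES (?, ?, ?)",
                 [(1, "Examination", 35.0), (2, "X-ray", 75.0), (3, "Chemotherapy", 50.0)])
conn.executemany("INSERT INTO visits VALUES (?, ?, ?, ?)",
                 [(1, 1, 32968, "10/17/96"), (2, 1, 39271, "10/17/96"), (3, 1, 32968, "10/18/96")])
conn.executemany("INSERT INTO treatments VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1), (2, 3), (3, 1)])

# Rebuild the report lines by joining the normalized tables.
report_lines = conn.execute("""
    SELECT v.visit_date, p.patient_code, p.patient_name, pr.procedure_name, pr.charge
    FROM treatments t
    JOIN visits v      ON v.visit_id = t.visit_id
    JOIN patients p    ON p.patient_code = v.patient_code
    JOIN procedures pr ON pr.procedure_id = t.procedure_id
    ORDER BY v.visit_id, pr.procedure_id
""").fetchall()
total_charge = sum(line[4] for line in report_lines)
```

Each procedure's charge is stored once in `procedures`, yet the join still yields one line per treatment, matching the five detail lines of the report.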
An E-R Model for Hospital Treatment Charge
[Figure: E-R diagram. Entities: Procedure, Doctors, Patients, Visit, Treatment, with cardinality labels 1 and M. Attributes shown: Procedure ID, Description, Rate, Doctor ID, Name, Specialty, Visit ID, Date/Time, Patient Code, Patient Name.]
E-R model improvement criteria vs. Normalization Theory
Each entity must have a key (simple or
composite) (basic requirement of a
relational table)
Introduce composite entity to convert a m:n
relation into two 1:m relations. Introduce a
composite key (the way of presenting m:n
relationships in relational database)
Convert a multivalued attribute into an
attribute entity or weak entity (1 NF)
E-R model improvement criteria vs. Normalization Theory
Make each entity represent a simple
object or concept (2 NF and 3 NF)
Divide complex entity into several
related simple entities (2 NF and 3 NF)
Make each attribute associate with only
one entity unless it is a foreign key (3
NF)
A good E-R model usually satisfies 3 NF.
Boyce-Codd Normal Form
[Figure 5-8: a relation is in Boyce-Codd normal form if every determinant is a candidate key]
Boyce-Codd Normal Form
(Student, Major, Advisor) (3NF)
or (Student, Advisor, Major) (1NF)
Student may have more than one major
with one advisor in each major
Student + Major --> Advisor
Student + Advisor --> Major
Advisor --> Major (Advisor determines major but Advisor
is not a candidate key)
(Student, Advisor) (BCNF)
(Advisor, Major) (BCNF)
Boyce-Codd Normal Form
A relation is in BCNF if and only if it is in 3NF
and every determinant is a candidate key
A determinant is any attribute (simple or
composite) on which some other attribute is
fully functionally dependent
Situations in which a 3NF relation may fail BCNF:
1. There are multiple candidate keys
2. Those candidate keys are composite
3. The candidate keys overlap
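The "every determinant is a candidate key" test can be sketched with the standard attribute-closure computation. The function names `closure` and `bcnf_violations` are illustrative, and the check below only tests that each determinant is a superkey, which is the usual practical form of the rule.

```python
def closure(attrs, fds):
    """Return all attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already covered, the right side follows.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(relation, fds):
    """List non-trivial FDs whose determinant does not determine the whole relation."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not set(relation) <= closure(lhs, fds)
    ]

fds = [
    (("Student", "Major"), ("Advisor",)),
    (("Advisor",), ("Major",)),
]

original = bcnf_violations(("Student", "Major", "Advisor"), fds)
student_advisor = bcnf_violations(("Student", "Advisor"), [])  # no non-trivial FDs apply here
advisor_major = bcnf_violations(("Advisor", "Major"), [(("Advisor",), ("Major",))])
```

For the original relation the only violation reported is Advisor --> Major, and neither decomposed relation reports any.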
Fourth Normal Form
A relation is in fourth normal form if it is in
BCNF and contains no multivalued
dependencies
Multivalued Dependency
There are three attributes (e.g. A, B, C) in a
relation.
For each value of A there is a well-defined set
of values of B and a well-defined set of values of
C.
The set of values of B is independent of the set
of values of C, and vice versa.
Fourth Normal Form
(Course, Instructor, Textbook)
(BCNF)
One course is taught by several
instructors
Every instructor of a course uses the
same set of textbooks for that course
(Course, Textbook) (4NF)
(Course, Instructor) (4NF)
Fourth Normal Form
Course  Instructor  Textbook
1ka3    David       Intro. Web design
1ka3    Smith       Intro. Web design
1ka3    David       Intro. Access
1ka3    Smith       Intro. Access

Course  Instructor
1ka3    David
1ka3    Smith

Course  Textbook
1ka3    Intro. Web design
1ka3    Intro. Access
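The decomposition can be checked with a few lines of Python: project the three-column relation onto its two halves, then join them back on Course. The variable names are illustrative; the data is the 1ka3 example.

```python
original = {
    ("1ka3", "David", "Intro. Web design"),
    ("1ka3", "Smith", "Intro. Web design"),
    ("1ka3", "David", "Intro. Access"),
    ("1ka3", "Smith", "Intro. Access"),
}

# Project onto (Course, Instructor) and (Course, Textbook).
course_instructor = {(c, i) for c, i, _ in original}
course_textbook = {(c, t) for c, _, t in original}

# Natural join on Course pairs every instructor with every textbook of the course.
rejoined = {
    (c, i, t)
    for c, i in course_instructor
    for c2, t in course_textbook
    if c == c2
}
lossless = rejoined == original
```

Four rows shrink to two plus two, and because instructors and textbooks are independent of each other, the join reproduces the original exactly.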
Fifth Normal Form (page 125)
Every join dependency is a
consequence of the relation's keys
Not in 5NF: Person-using-skills-on-jobs (Person, Skill, Job)
5NF: Has-skill (Person, Skill)
Need-skill (Skill, Job)
Assigned-to-job (Person, Job)
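A small Python sketch can show why all three projections are needed. The sample rows are hypothetical and were chosen so that the three-way join dependency holds; whether it holds in a real system is a business rule, not something the data alone can prove.

```python
# Hypothetical (Person, Skill, Job) rows.
original = {
    ("Ann", "SQL", "J1"),
    ("Ann", "SQL", "J2"),
    ("Bob", "SQL", "J1"),
}

has_skill = {(p, s) for p, s, _ in original}    # Has-skill (Person, Skill)
need_skill = {(s, j) for _, s, j in original}   # Need-skill (Skill, Job)
assigned = {(p, j) for p, _, j in original}     # Assigned-to-job (Person, Job)

# Joining only two projections invents a row: Bob was never assigned to J2.
two_way = {(p, s, j) for p, s in has_skill for s2, j in need_skill if s == s2}
spurious = two_way - original

# Joining all three projections filters the invented row out again.
three_way = {(p, s, j) for p, s, j in two_way if (p, j) in assigned}
lossless = three_way == original
```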
Domain Key Normal Form (page 125)
A relation is in DK/NF if every constraint on
the relation is a logical consequence of the
definition of keys and domains
Constraint - a rule governing
static values of attributes
Key - unique identifier of a
tuple
Domain - description of an
attribute's allowed values
Example of non DK/NF
Enrollment (Student ID, Course ID, Grade)
Key constraint: Student ID + Course ID --> Grade
Domain constraint:
Student ID: 7 digits, Course ID: 3 digits, Grade:
A,B,C,D,F,P
General constraint
If Course ID < 900 then Grade in {A,B,C,D,F}
else Grade in {P,F}
Since the general constraint cannot be inferred from
the key constraint or the domain constraints, the
relation is not in DK/NF.
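The gap between the domain constraints and the general constraint can be made concrete with a short Python sketch. The function names and the sample row are assumptions for illustration; the rules themselves are the ones listed above.

```python
GRADE_DOMAIN = {"A", "B", "C", "D", "F", "P"}

def in_domain(student_id: str, course_id: str, grade: str) -> bool:
    """Domain constraints: 7-digit student ID, 3-digit course ID, grade in the grade domain."""
    return (
        len(student_id) == 7 and student_id.isdigit()
        and len(course_id) == 3 and course_id.isdigit()
        and grade in GRADE_DOMAIN
    )

def general_constraint(course_id: str, grade: str) -> bool:
    """General constraint: courses below 900 are letter-graded, the rest are pass/fail."""
    allowed = {"A", "B", "C", "D", "F"} if int(course_id) < 900 else {"P", "F"}
    return grade in allowed

# Hypothetical row: a letter grade recorded for a pass/fail course.
row = ("1234567", "950", "A")
passes_domain = in_domain(*row)
passes_general = general_constraint(row[1], row[2])
```

The row satisfies every domain constraint yet breaks the general constraint, which is exactly why that constraint is not implied by keys and domains.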
Remarks on Normalization
The notions of dependency and
normalization are semantic in
nature
The normalization guidelines
should be regarded primarily as a
discipline to help the database
designer
Limitations of normalization
May not be natural, e.g. zip code, area code for
phone #
May ignore operational considerations: some data need not
change, other data may change over time, e.g. (order#,
prod#, description, unit-price, quantity)
Difficult to enforce integrity control
(Order#, Prod#, quantity)
(Prod#, Description, Unit-price)
Prod# may not be valid.
Integrity control of this kind is now provided by the relational
DBMS
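As an illustration of DBMS-provided integrity control, the sketch below declares Prod# as a foreign key and lets the database reject an order line for a product that does not exist. It uses Python's built-in `sqlite3` module; the table names, column names, and sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute(
    "CREATE TABLE product (prod_no INTEGER PRIMARY KEY, description TEXT, unit_price REAL)"
)
conn.execute("""
    CREATE TABLE order_line (
        order_no INTEGER,
        prod_no  INTEGER REFERENCES product(prod_no),
        quantity INTEGER,
        PRIMARY KEY (order_no, prod_no)
    )
""")

conn.execute("INSERT INTO product VALUES (1, 'Widget', 9.5)")
conn.execute("INSERT INTO order_line VALUES (100, 1, 3)")  # valid Prod#

try:
    conn.execute("INSERT INTO order_line VALUES (100, 99, 1)")  # Prod# 99 does not exist
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

line_count = conn.execute("SELECT COUNT(*) FROM order_line").fetchone()[0]
```

The invalid row never reaches the table, so the decomposed design no longer depends on application code to keep Prod# values valid.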
Denormalization
Normalization is only one of many database
design goals.
Normalized (decomposed) tables require
additional processing, reducing system
speed.
Normalization purity is often difficult to
sustain in the modern database
environment. The conflicts between design
efficiency, information requirements, and
processing speed are often resolved through
compromises that include denormalization.