IT - 304
Database Systems
for BS (IT)
Lecture 7:
Normalization
Hasan Raza
Lecturer CS & IT
Govt. Postgraduate College, Sheikhupura
Objectives
In this chapter, students will learn:
What normalization is and what role it plays in the database design
process
About the normal forms 1NF, 2NF, 3NF, BCNF,
and 4NF
How normal forms can be transformed from lower normal forms to
higher normal forms
That normalization and ER modeling are used concurrently to
produce a good database design
That some situations require denormalization to generate information
efficiently
Database Systems, 10th Edition
Database Tables and Normalization
Table is basic building block in database design
Table’s structure is of great interest
Two cases:
Possible poor table structures in a good database design
Modifying an existing database that has poor table structures
Normalization can help recognize a poor table and
convert it to good tables with good structure
Database Tables and Normalization
Normalization
Process for evaluating and correcting table structures to minimize
data redundancies
Reduces data anomalies
Series of stages called normal forms:
First normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)
Fourth normal form (4NF)
Database Tables and Normalization (cont’d.)
Normalization (continued)
2NF is better than 1NF; 3NF is better than 2NF
For most business database design purposes, 3NF is as high as
needed in normalization
Highest level of normalization is not always most desirable
Denormalization produces a lower normal form
Denormalization gives increased performance but greater data redundancy
The Need for Normalization
Example: Company which manages building projects.
Building projects
Project number
Project name
Employees assigned
…
Employee
Employee number
Employee name
Job classification
The Need for Normalization
The business rules are:
Company charges its clients by billing hours spent on each contract
Hourly billing rate is dependent on employee’s position
Periodically, a report is generated that contains information
such as that displayed in Table 6.1
Classic control-break report. A common type of report from a
database.
DESIRED OUTPUT
The Need for Normalization (cont’d.)
Figure Observation
Structure of data set in Figure 6.1 does not handle data very
well
Primary key – PROJ_NUM contains nulls
Table entries invite data inconsistencies
Table displays data redundancies which yield the following
anomalies
Update
Insertion
Deletion
The Need for Normalization (cont’d.)
Figure Observation
Report may yield different results depending on what data
anomaly has occurred
Relational database environment is suited to help designer
avoid data integrity problems
The Need for Normalization (cont’d.)
Insertion Anomaly
Occurs when extra data beyond the desired data must be added to
the database (a new employee must be assigned to a project)
Update Anomaly
Occurs when multiple rows must be changed to modify only
a single fact (modifying JOB_CLASS)
Deletion Anomaly
Occurs when deleting a row causes unwanted deletions (if an
employee is deleted, other vital data is lost)
Data Anomalies: Example
Suppose a manufacturing company stores employee details in a table
named employee that has four attributes: emp_id, emp_name, emp_address, and
emp_dept. At some point in time the table looks
like this:
emp_id emp_name emp_address emp_dept
101 Ali Lahore D001
101 Ali Lahore D002
123 Usman Karachi D890
166 Omar Islamabad D900
166 Omar Islamabad D004
Update Anomaly:
There are two rows for employee Ali because he belongs to two departments of the
company. To update Ali’s address, we must update it in both rows; otherwise
the data becomes inconsistent. If the correct address is updated in one row
but not in the other, then according to the database Ali has two different
addresses, which is incorrect.
emp_id emp_name emp_address emp_dept
101 Ali Lahore D001
101 Ali Lahore D002
123 Usman Karachi D890
166 Omar Islamabad D900
166 Omar Islamabad D004
Insert Anomaly:
Suppose a new employee joins the company who is under
training and is not yet assigned to any department. We
cannot insert the employee into the table if the emp_dept
field does not allow nulls.
emp_id emp_name emp_address emp_dept
101 Ali Lahore D001
101 Ali Lahore D002
123 Usman Karachi D890
166 Omar Islamabad D900
166 Omar Islamabad D004
Delete Anomaly:
Suppose that after some time the company closes department D890. Deleting
the rows that have emp_dept D890 would also delete all
information about employee Usman, since he is assigned only to this department.
emp_id emp_name emp_address emp_dept
101 Ali Lahore D001
101 Ali Lahore D002
123 Usman Karachi D890
166 Omar Islamabad D900
166 Omar Islamabad D004
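The three anomalies can be reproduced on the employee table above. This is a minimal sketch using Python's built-in sqlite3 module. The NOT NULL constraint on emp_dept, the new address 'Multan', and the trainee (200, 'Sara', 'Quetta') are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER, emp_name TEXT, emp_address TEXT,"
    " emp_dept TEXT NOT NULL)"  # NOT NULL is an assumption (needed for the insert anomaly)
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (101, "Ali", "Lahore", "D001"),
        (101, "Ali", "Lahore", "D002"),
        (123, "Usman", "Karachi", "D890"),
        (166, "Omar", "Islamabad", "D900"),
        (166, "Omar", "Islamabad", "D004"),
    ],
)

# Update anomaly: the address is changed in only one of Ali's two rows.
conn.execute(
    "UPDATE employee SET emp_address = 'Multan' "  # hypothetical new address
    "WHERE emp_id = 101 AND emp_dept = 'D001'"
)
ali_addresses = {
    row[0]
    for row in conn.execute("SELECT emp_address FROM employee WHERE emp_id = 101")
}

# Insert anomaly: a trainee with no department cannot be stored.
try:
    conn.execute("INSERT INTO employee VALUES (200, 'Sara', 'Quetta', NULL)")  # hypothetical trainee
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Delete anomaly: closing department D890 also removes every trace of Usman.
conn.execute("DELETE FROM employee WHERE emp_dept = 'D890'")
usman_rows = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE emp_id = 123"
).fetchone()[0]

print(ali_addresses, insert_rejected, usman_rows)
```

After the partial update Ali has two different addresses, the trainee insert is rejected, and no row for Usman remains.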
The Normalization Process
Each table represents a single subject. For example, a course table will
contain only data that directly pertain to courses. Similarly, a student table
will contain only student data.
No data item will be unnecessarily stored in more than one table (in short,
tables have minimum controlled redundancy). The reason for this
requirement is to ensure that the data are updated in only one place.
All nonprime attributes in a table are dependent on the primary key—the
entire primary key and nothing but the primary key. The reason for this
requirement is to ensure that the data are uniquely identifiable by primary
key value.
Each table is void of insertion, update, or deletion anomalies. This is to
ensure the integrity and consistency of the data.
The Normalization Process (cont’d.)
Objective of normalization is to ensure that all tables are in at
least 3NF
Higher forms are not likely to be encountered in business
environment
Normalization works one relation at a time
Progressively breaks table into new set of relations based on
identified dependencies
The Normalization Process (cont’d.)
Partial dependency
Exists when there is a functional dependence in which the determinant is
only part of the primary key
Transitive dependency
Exists when one nonprime attribute is functionally dependent on another
nonprime attribute
Normalization starts by identifying the dependencies of a given relation and
progressively breaking up the relation (table) into a set of new relations (tables)
based on the identified dependencies.
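A functional dependency can be checked against sample rows. The sketch below uses the employee rows from the anomaly example and a hypothetical helper named holds. Note that a dependency that holds in sample data is only a hint; real dependencies come from the business rules. The composite key (emp_id, emp_dept) is an assumption made for this illustration.

```python
def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent is not contradicted by rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # The same determinant value must always map to the same dependent value.
        if seen.setdefault(key, value) != value:
            return False
    return True


columns = ("emp_id", "emp_name", "emp_address", "emp_dept")
data = [
    (101, "Ali", "Lahore", "D001"),
    (101, "Ali", "Lahore", "D002"),
    (123, "Usman", "Karachi", "D890"),
    (166, "Omar", "Islamabad", "D900"),
    (166, "Omar", "Islamabad", "D004"),
]
employee = [dict(zip(columns, r)) for r in data]

# Assumed primary key: (emp_id, emp_dept).
# emp_id alone determines emp_name, so emp_name depends on only part of the key.
name_depends_on_id = holds(employee, ["emp_id"], ["emp_name"])
# emp_id does not determine emp_dept (Ali and Omar each have two departments).
dept_depends_on_id = holds(employee, ["emp_id"], ["emp_dept"])

print(name_depends_on_id, dept_depends_on_id)
```

Under the assumed key, emp_id -> emp_name is a partial dependency, which is what normalization will later remove.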
Conversion to First Normal Form
Repeating group
Group of multiple entries of same type can exist for any single key
attribute occurrence
Relational table must not contain repeating groups
Normalizing table structure will reduce data redundancies
Normalization is three-step procedure
Conversion to First Normal Form (cont’d.)
Step 1: Eliminate the Repeating Groups
Eliminate nulls: each repeating group attribute contains an
appropriate data value
Step 2: Identify the Primary Key
Primary key must uniquely identify each attribute value
A new (composite) key may have to be composed
Step 3: Identify All Dependencies
Dependencies are depicted with a diagram
Conversion to 1NF
Step 1: Eliminate the Repeating Groups
A repeating group derives its name from the fact that a group of
multiple entries of the same type can exist for any single key
attribute occurrence
Present data in tabular format, where each cell has a single value
and there are no repeating groups
To eliminate repeating groups, eliminate the nulls by making sure that each
repeating group attribute contains an appropriate data value
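Eliminating the nulls in a repeating group amounts to copying the last seen value down into the blank cells. A minimal sketch, assuming blanks are represented as None; the function name fill_down and the second sample row are hypothetical.

```python
def fill_down(rows, columns):
    """Replace None in the given columns with the value from the row above."""
    last = {}
    filled = []
    for row in rows:
        new_row = dict(row)
        for col in columns:
            if new_row[col] is None:
                new_row[col] = last.get(col)  # stays None if nothing seen yet
            else:
                last[col] = new_row[col]
        filled.append(new_row)
    return filled


report_rows = [
    {"PROJ_NUM": 15, "PROJ_NAME": "Evergreen", "EMP_NUM": 103},
    {"PROJ_NUM": None, "PROJ_NAME": None, "EMP_NUM": 901},  # hypothetical second employee
]

flat_rows = fill_down(report_rows, ["PROJ_NUM", "PROJ_NAME"])
for row in flat_rows:
    print(row)
```

Every row now carries its own PROJ_NUM and PROJ_NAME, so each cell holds a single value and no row depends on its position in the report.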
Step 1: Eliminate the Repeating Groups
Conversion to 1NF
Step 2 - Identify the Primary Key
Review determination and attribute dependence.
All attribute values in the occurrence are ‘determined’ by the primary
key. The primary key must uniquely identify the attribute(s).
Resulting composite key: PROJ_NUM and EMP_NUM
For example, using the data shown in the table, if you know that PROJ_NUM = 15 and
EMP_NUM = 103, the entries for the attributes PROJ_NAME, EMP_NAME, JOB_CLASS,
CHG_HOUR, and HOURS must be Evergreen, June E. Arbough, Elect. Engineer, $84.50,
and 23.8, respectively
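Whether a set of attributes can serve as the key can be tested by checking that no combination of values repeats. A small sketch with a hypothetical helper is_unique_key; only the first row comes from the example above, and the other two rows are made up for illustration.

```python
from collections import Counter


def is_unique_key(rows, key):
    """Return True if no two rows share the same values for the key attributes."""
    counts = Counter(tuple(row[a] for a in key) for row in rows)
    return all(n == 1 for n in counts.values())


assignments = [
    {"PROJ_NUM": 15, "EMP_NUM": 103, "HOURS": 23.8},  # from the example above
    {"PROJ_NUM": 15, "EMP_NUM": 901, "HOURS": 10.0},  # hypothetical
    {"PROJ_NUM": 99, "EMP_NUM": 103, "HOURS": 5.0},   # hypothetical
]

print(is_unique_key(assignments, ["PROJ_NUM"]))
print(is_unique_key(assignments, ["EMP_NUM"]))
print(is_unique_key(assignments, ["PROJ_NUM", "EMP_NUM"]))
```

Neither attribute is unique by itself in these rows, but the combination is, which is why the composite key is needed.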
Step 2 – Identify Primary key
Conversion to 1NF
Step 3 - Identify All Dependencies
Depicts all dependencies found within given table structure
Helpful in getting bird’s-eye view of all relationships among table’s
attributes
PROJ_NUM, EMP_NUM -> PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS
PROJ_NUM -> PROJ_NAME
EMP_NUM -> EMP_NAME, JOB_CLASS, CHG_HOUR
JOB_CLASS -> CHG_HOUR
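The dependencies listed above can be sorted into full, partial, and transitive by comparing each determinant with the primary key. This is a rough sketch, assuming the composite key is the only candidate key; the function name classify is hypothetical.

```python
PRIMARY_KEY = {"PROJ_NUM", "EMP_NUM"}

# (determinant, dependent attributes), as listed in the dependency diagram
DEPENDENCIES = [
    ({"PROJ_NUM", "EMP_NUM"},
     {"PROJ_NAME", "EMP_NAME", "JOB_CLASS", "CHG_HOUR", "HOURS"}),
    ({"PROJ_NUM"}, {"PROJ_NAME"}),
    ({"EMP_NUM"}, {"EMP_NAME", "JOB_CLASS", "CHG_HOUR"}),
    ({"JOB_CLASS"}, {"CHG_HOUR"}),
]


def classify(determinant, primary_key):
    if determinant == primary_key:
        return "full"        # depends on the entire primary key
    if determinant < primary_key:
        return "partial"     # determinant is only part of the primary key
    if not determinant & primary_key:
        return "transitive"  # a nonprime attribute determines another attribute
    return "other"


kinds = [classify(det, PRIMARY_KEY) for det, _ in DEPENDENCIES]
for (det, dep), kind in zip(DEPENDENCIES, kinds):
    print(sorted(det), "->", sorted(dep), ":", kind)
```

The two partial dependencies are the targets of the 2NF conversion, and the transitive one is the target of the 3NF conversion.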
Conversion to First Normal Form (cont’d.)
Dependency diagram:
Depicts all dependencies found within given table structure
Helpful in getting bird’s-eye view of all relationships among table’s
attributes
Makes it less likely that you will overlook an important dependency
1NF Summarized
All key attributes defined
No repeating groups in table
All attributes dependent on primary key
Every column of the table contains only
single values
Example (multiple values): for an airline
Not in 1NF:
Flight Weekdays
UA59 Mo We Fr
UA73 Mo Tu We Th Fr
In 1NF:
Flight Weekday
UA59 Mo
UA59 We
UA59 Fr
UA73 Mo
UA73 We
… …
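The airline conversion can be sketched in a few lines of Python: each multivalued Weekdays cell is split into one row per weekday. The function name to_1nf is hypothetical.

```python
# One row per flight, with a multivalued Weekdays column (not in 1NF)
flights = [
    ("UA59", "Mo We Fr"),
    ("UA73", "Mo Tu We Th Fr"),
]


def to_1nf(rows):
    """Produce one (flight, weekday) row for every weekday of every flight."""
    return [(flight, day) for flight, days in rows for day in days.split()]


flight_days = to_1nf(flights)
for row in flight_days:
    print(row)
```

In the result, every cell holds exactly one value and the key becomes the combination (Flight, Weekday).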
First Normal Form (1NF)
A relation is said to be in 1NF if it contains no repeating group (RG).
An RG is a collection of multivalued attributes; an RG also exists when more than
one field stores the same kind of information in a single table.
To eliminate an RG, the value at the intersection of each row and column must be
atomic (having one value).
If you developed a logical design by transforming an ER diagram into relations,
there should not be any multivalued attributes remaining.
Consider the following relation:
Student (RegNo, Name, Program, C_Code, C_Title, C_Grade)
This relation has a repeating group consisting of C_Code, C_Title, and C_Grade, and
therefore it has insert, delete, and update anomalies.
Multiple values create problems in performing operations such as select or join.
First Normal Form (1NF)
The relation Student can be converted into 1NF using either of the following
methods:
a) Change the PK of the relation and define a composite key of RegNo and C_Code.
Fill the blanks by duplicating the non-repeating data. This approach is
commonly referred to as flattening the table.
b) Split the relation into two relations by placing the repeating data, along with a
copy of the original key attribute(s), in a separate relation. The new relation
will always have a concatenated key.
Student (RegNo, Name, Program)
Course (RegNo, C_Code, C_Title, C_Grade)
Example 2: STD (stId, stName, stAdr, prName, bkId, bkTitle, i-date)
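Method (b) for the Student relation can be sketched as follows. The function name split_student and all sample values (registration numbers, names, grades, and the second course) are hypothetical; the input is assumed to be already flattened as in method (a).

```python
flat_student = [
    # RegNo, Name, Program, C_Code, C_Title, C_Grade  (all values hypothetical)
    ("R1", "Ali", "BSIT", "IT-304", "Database Systems", "A"),
    ("R1", "Ali", "BSIT", "IT-305", "Operating Systems", "B"),
    ("R2", "Omar", "BSIT", "IT-304", "Database Systems", "B"),
]


def split_student(rows):
    """Split flattened rows into Student(RegNo, ...) and Course(RegNo, C_Code, ...)."""
    student = {}
    course = []
    for reg_no, name, program, c_code, c_title, c_grade in rows:
        student[reg_no] = (reg_no, name, program)           # one row per RegNo
        course.append((reg_no, c_code, c_title, c_grade))   # key: RegNo + C_Code
    return list(student.values()), course


student, course = split_student(flat_student)
print(student)
print(course)
```

The non-repeating data is stored once per student, while the repeating data keeps a copy of RegNo so the two relations can be joined again.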
SECOND NORMAL FORM
Conversion to Second Normal Form
Step 1: Make New Tables to Eliminate Partial Dependencies
Write each key component on separate line, then write original
(composite) key on last line
Each component will become key in new table
Step 2: Reassign Corresponding Dependent Attributes
Determine attributes that are dependent on other attributes
At this point, most anomalies have been eliminated
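The two steps can be sketched for the project example, using the dependencies identified during the 1NF conversion. This is an illustrative sketch rather than a general algorithm; the function name to_2nf is hypothetical, and the transitive dependency JOB_CLASS -> CHG_HOUR is deliberately left for the 3NF step.

```python
def to_2nf(primary_key, dependencies):
    """Return {key attributes: dependent attributes} for each new table."""
    # Step 1: each key component with a partial dependency becomes the key of a new table.
    tables = {}
    for determinant, dependents in dependencies:
        if determinant < primary_key:  # proper subset of the key: partial dependency
            tables.setdefault(frozenset(determinant), set()).update(dependents)
    # Step 2: attributes not moved out stay with the original composite key.
    moved = set().union(*tables.values())
    remaining = set()
    for determinant, dependents in dependencies:
        if determinant == primary_key:
            remaining |= dependents - moved
    tables[frozenset(primary_key)] = remaining
    return tables


pk = {"PROJ_NUM", "EMP_NUM"}
deps = [
    (pk, {"PROJ_NAME", "EMP_NAME", "JOB_CLASS", "CHG_HOUR", "HOURS"}),
    ({"PROJ_NUM"}, {"PROJ_NAME"}),
    ({"EMP_NUM"}, {"EMP_NAME", "JOB_CLASS", "CHG_HOUR"}),
]

tables_2nf = to_2nf(pk, deps)
for key, attrs in tables_2nf.items():
    print(sorted(key), "->", sorted(attrs))
```

Three tables result: one keyed by PROJ_NUM, one keyed by EMP_NUM, and one keyed by the composite key that keeps only HOURS.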
Second Normal Form (2NF)
A relation is in 2NF if:
It is in 1NF
Every non-key attribute is fully functionally dependent on the primary key
A partial functional dependency arises when the PK of a relation is
composite and a non-key attribute is functionally dependent on part (but not
all) of the PK.
Referring to the Course relation:
Course (RegNo, C_Code, C_Title, C_Grade)
The functional dependencies are:
C_Code -> C_Title (Partial FD)
RegNo, C_Code -> C_Grade (Full FD)
Second Normal Form (2NF)
Since not all non-key attributes are fully functionally dependent on the
PK (there is a partial functional dependency in the relation), the relation is
not in 2NF.
The anomalies associated with the Course relation are:
a) Insert Anomaly:
A course cannot be inserted without a student (RegNo).
b) Delete Anomaly:
Deleting a student will unnecessarily delete course data.
c) Update Anomaly:
A course cannot be updated independently; its title must be changed in every row where it appears.
Second Normal Form (2NF)
The relation Course
Course (RegNo, C_Code, C_Title, C_Grade)
can be converted into 2NF by decomposing it into the following
relations:
Course (C_Code, C_Title)
Result (RegNo, C_Code, C_Grade)
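The decomposed relations can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the column types, the sample students R1 and R2, their grades, and the revised title are made up. The REFERENCES clause is declared for documentation; SQLite enforces it only when foreign keys are switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Course (
    C_Code  TEXT PRIMARY KEY,
    C_Title TEXT NOT NULL
);
CREATE TABLE Result (
    RegNo   TEXT NOT NULL,
    C_Code  TEXT NOT NULL REFERENCES Course (C_Code),
    C_Grade TEXT,
    PRIMARY KEY (RegNo, C_Code)
);
""")

# Insert: a course can now be stored before any student takes it.
conn.execute("INSERT INTO Course VALUES ('IT-304', 'Database Systems')")
courses_without_students = conn.execute("SELECT COUNT(*) FROM Course").fetchone()[0]

# Update: the title is changed in exactly one row, however many results exist.
conn.execute("INSERT INTO Result VALUES ('R1', 'IT-304', 'A')")  # hypothetical
conn.execute("INSERT INTO Result VALUES ('R2', 'IT-304', 'B')")  # hypothetical
conn.execute("UPDATE Course SET C_Title = 'Database Systems I' WHERE C_Code = 'IT-304'")

# Delete: removing every student's result leaves the course data intact.
conn.execute("DELETE FROM Result")
course_titles = conn.execute("SELECT C_Title FROM Course").fetchall()

print(courses_without_students, course_titles)
```

Each of the three anomalies listed for the original Course relation no longer occurs in this structure.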
2NF Summarized
In 1NF
Includes no partial dependencies
No attribute dependent on a portion of primary key
Still possible to exhibit transitive dependency
Attributes may be functionally dependent on nonkey
attributes
THIRD NORMAL FORM
Conversion to Third Normal Form
Step 1: Make New Tables to Eliminate Transitive
Dependencies
For every transitive dependency, write its determinant as PK for new
table
Determinant: any attribute whose value determines other values
within a row
Conversion to Third Normal Form (cont’d.)
Step 2: Reassign Corresponding Dependent Attributes
Identify attributes dependent on each determinant identified in Step 1
Identify dependency
Name table to reflect its contents and function
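For the project example, the transitive dependency JOB_CLASS -> CHG_HOUR can be removed as sketched below. The table name JOB, the function name to_3nf, and the two extra employees with their rates are assumptions for illustration; only employee 103 comes from the earlier example.

```python
employee_2nf = [
    # EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR
    (103, "June E. Arbough", "Elect. Engineer", 84.50),
    (901, "Employee A", "Elect. Engineer", 84.50),  # hypothetical
    (902, "Employee B", "Programmer", 40.00),       # hypothetical
]


def to_3nf(rows):
    """Move the determinant JOB_CLASS and its dependent CHG_HOUR into a new table."""
    job = {}       # Step 1: the determinant JOB_CLASS becomes the PK of the new table
    employee = []  # Step 2: EMPLOYEE keeps JOB_CLASS as a link to JOB
    for emp_num, emp_name, job_class, chg_hour in rows:
        job[job_class] = chg_hour
        employee.append((emp_num, emp_name, job_class))
    return employee, job


employee, job = to_3nf(employee_2nf)
print(employee)
print(job)
```

The hourly rate for each job classification is now stored once, so changing a rate means changing a single entry in JOB.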
3NF Summarized
In 2NF
Contains no transitive dependencies
Improving the Design
Table structures should be cleaned up to eliminate initial partial and
transitive dependencies
Normalization cannot, by itself, be relied on to make good designs
It reduces data redundancy and builds controlled redundancy.
The higher the NF,
– the more entities one has,
– the more flexible the database will be,
– the more joins are needed (and the less efficient queries become).
Improving the Design (cont’d.)
Issues to address, in order, to produce a good normalized set of
tables:
Evaluate PK Assignments
Evaluate Naming Conventions
Refine Attribute Atomicity
Identify New Attributes
Identify New Relationships
Refine Primary Keys as Required for Data Granularity
Maintain Historical Accuracy
Evaluate Using Derived Attributes
THE BOYCE-CODD NORMAL FORM
The Boyce-Codd Normal Form
Every determinant in table is a candidate key
Candidate key: has same characteristics as primary key but, for some reason, was not
chosen to be primary key
When table contains only one candidate key, the 3NF and the
BCNF are equivalent
BCNF can be violated only when table contains more than
one candidate key
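The BCNF condition can be tested mechanically with an attribute closure. The sketch below assumes a hypothetical relation R(A, B, C, D) with dependencies A, B -> C, D and C -> B, and it checks that each determinant is a superkey (its closure covers every attribute), which is the usual way of testing the "every determinant is a candidate key" rule.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Nontrivial dependencies whose determinant does not determine every attribute."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(all_attrs)
    ]


# Hypothetical relation R(A, B, C, D)
R = {"A", "B", "C", "D"}
FDS = [
    (frozenset({"A", "B"}), frozenset({"C", "D"})),
    (frozenset({"C"}), frozenset({"B"})),
]

violations = bcnf_violations(R, FDS)
print(violations)
```

Here C -> B is reported: C is a determinant, but its closure is only {B, C}, so the hypothetical table fails BCNF even though it has no partial or transitive dependencies on nonkey attributes.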
The Boyce-Codd Normal Form (cont’d.)
Most designers consider the BCNF as a special case of 3NF
Table is in 3NF when it is in 2NF and there are no transitive
dependencies
Table can be in 3NF and fail to meet BCNF
Such a table contains no partial dependencies and no transitive dependencies
But a nonkey attribute is the determinant of a key attribute
C (nonprime attribute) is determinant. C is Candidate key.
Example: BCNF conversion
Decomposition into BCNF
Normalization and Database Design
Normalization should be part of the design process
Make sure that proposed entities meet required normal form
before table structures are created
Many real-world databases have been improperly designed or
burdened with anomalies
You may be asked to redesign and modify existing databases
Normalization and Database Design (cont’d.)
ER diagram
Identify relevant entities, their attributes, and their relationships
Identify additional entities and attributes
Normalization procedures
Focus on characteristics of specific entities
Micro view of entities within ER diagram
Denormalization
Creation of normalized relations is important database design
goal
Processing requirements should also be a goal
If tables are decomposed to conform to normalization
requirements:
Number of database tables expands
Denormalization (cont’d.)
Joining the larger number of tables reduces system speed
Conflicts are often resolved through compromises that may
include denormalization
Defects of unnormalized tables:
Data updates are less efficient because tables are larger
Indexing is more cumbersome
No simple strategies for creating virtual tables known as views
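The trade-off can be seen by answering the same question from normalized tables (with a join) and from a denormalized copy (without one). A minimal sqlite3 sketch; the table and column names and employee 901 are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job (job_class TEXT PRIMARY KEY, chg_hour REAL);
CREATE TABLE employee (
    emp_num   INTEGER PRIMARY KEY,
    emp_name  TEXT,
    job_class TEXT REFERENCES job (job_class)
);
INSERT INTO job VALUES ('Elect. Engineer', 84.50);
INSERT INTO employee VALUES (103, 'June E. Arbough', 'Elect. Engineer');
INSERT INTO employee VALUES (901, 'Employee A', 'Elect. Engineer');  -- hypothetical

-- Denormalized copy: the hourly rate is repeated in every employee row.
CREATE TABLE employee_denorm AS
    SELECT e.emp_num, e.emp_name, e.job_class, j.chg_hour
    FROM employee e JOIN job j ON j.job_class = e.job_class;
""")

# Normalized design: the rate is found through a join.
joined = conn.execute(
    "SELECT e.emp_num, j.chg_hour FROM employee e "
    "JOIN job j ON j.job_class = e.job_class ORDER BY e.emp_num"
).fetchall()

# Denormalized design: no join needed, but the rate is stored redundantly.
flat = conn.execute(
    "SELECT emp_num, chg_hour FROM employee_denorm ORDER BY emp_num"
).fetchall()
rate_copies = conn.execute(
    "SELECT COUNT(*) FROM employee_denorm WHERE job_class = 'Elect. Engineer'"
).fetchone()[0]

print(joined == flat, rate_copies)
```

Both queries return the same rows, but the denormalized table holds the rate once per employee, which brings back the update anomaly in exchange for avoiding the join.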
Summary
Normalization minimizes data redundancies
First three normal forms (1NF, 2NF, and 3NF) are the most
commonly encountered
Table is in 1NF when:
Atomicity of attributes is ensured
A Primary key is defined
Summary (cont’d.)
Table is in 2NF when it is in 1NF and contains no partial
dependencies
Table is in 3NF when it is in 2NF and contains no transitive
dependencies
Table that is not in 3NF may be split into new tables until all of
the tables meet 3NF requirements
Normalization is important part—but only part—of the design
process
Summary (cont’d.)
Tables are sometimes denormalized to yield less I/O, which
increases processing speed
Textbook
Carlos Coronel, Steve Morris, “Database Systems: Design, Implementation, and Management”, 12th Ed.,
Course Technology, 2016 (chapter 6)
Reference book
Jeffrey Hoffer, “Modern Database Management”, 10th Ed.
Thomas Connolly, “Database Systems: A Practical Approach to Design,
Implementation, and Management”, 6th Ed. (chapter 13)
Elmasri, “Fundamentals of Database Systems”, 7th Ed.