Database Design and Normalization - continued
CHAPTER 5
Topics
• Informal Design Guidelines for Relation Schemas
• Functional Dependencies
• Normal Forms Based on Primary Keys
• General Definitions of Second and Third Normal Forms
• Boyce-Codd Normal Form
• Multivalued Dependency and Fourth Normal Form
• Join Dependencies and Fifth Normal Form
Dependency
Refers to the relationship that exists between attributes or relations in a database. These dependencies help to define the rules that govern how data is stored and accessed within the database.
The three main types of dependencies:
1. Functional Dependency
2. Multivalued Dependency
3. Join Dependency
Type of Dependency
1. Functional Dependency: This type of dependency occurs when one attribute in a relation uniquely determines another attribute in the same relation.
2. Multivalued Dependency: This type of dependency occurs when the presence of certain values in one or more attributes of a relation implies the presence of certain other values in the same relation.
3. Join Dependency: This type of dependency occurs when a relation can be expressed as a join of two or more other relations.
1. Functional Dependency: This type of dependency
occurs when one attribute in a relation uniquely
determines another attribute in the same relation.
Employee ID   First Name   Last Name   Department
001           John         Smith       Marketing
002           Jane         Doe         HR
003           Alex         Kim         IT
004           Sarah        Lee         Marketing
In the table above, there is a functional dependency from the Employee ID to the other attributes: the Employee ID uniquely determines the first name, last name, and department of each employee. For example, if we know that the Employee ID is 001, we can determine that the first name is John, the last name is Smith, and the department is Marketing.
The reverse does not hold. If we only know the department, we cannot determine the other attributes: Marketing matches both employee 001 and employee 004. A first name cannot be relied on either, because two employees may share one.
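The check described above can be sketched in a few lines of Python. This is a minimal illustration, not a standard library routine: the function name `holds_fd` and the dictionary-based row layout are assumptions made for this example. Note that sample data can only disprove a functional dependency; whether one truly holds is a statement about the meaning of the attributes.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


# The employee table from the example above.
employees = [
    {"EmployeeID": "001", "FirstName": "John", "LastName": "Smith", "Department": "Marketing"},
    {"EmployeeID": "002", "FirstName": "Jane", "LastName": "Doe", "Department": "HR"},
    {"EmployeeID": "003", "FirstName": "Alex", "LastName": "Kim", "Department": "IT"},
    {"EmployeeID": "004", "FirstName": "Sarah", "LastName": "Lee", "Department": "Marketing"},
]

id_determines_rest = holds_fd(
    employees, ["EmployeeID"], ["FirstName", "LastName", "Department"]
)
department_determines_id = holds_fd(employees, ["Department"], ["EmployeeID"])

print(id_determines_rest)        # True
print(department_determines_id)  # False: Marketing maps to 001 and 004
```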
2. Multivalued Dependency: This type of dependency
occurs when the presence of certain values in one or
more attributes of a relation implies the presence of
certain other values in the same relation.
StudentID   CourseID   CourseName   Professor
001         101        Math         Dr. Smith
001         102        Science      Dr. Jones
002         101        Math         Dr. Smith
002         102        Science      Dr. Jones
In this table, there is a multivalued dependency between the "StudentID" and "CourseID" attributes. A "StudentID" value does not determine a single "CourseID" value, as a functional dependency would; it determines a set of them. For example, StudentID 001 is associated with CourseID 101 and CourseID 102, and so is StudentID 002.
This multivalued dependency can be represented using the notation:
StudentID →→ CourseID
which means "each value of StudentID determines a set of CourseID values." Formally, the dependency holds only if that set is independent of the remaining attributes of the relation. Here "CourseName" and "Professor" are determined by "CourseID", so they belong with it on the right-hand side: StudentID →→ {CourseID, CourseName, Professor}.
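The formal test for a multivalued dependency X →→ Y can be sketched as follows: whenever two rows agree on X, the row that takes its Y values from the first and its remaining values from the second must also be present. The function name `holds_mvd` and the sample relation with a `Hobby` attribute are hypothetical, chosen only to give the dependency some independent "remaining" attribute to work against.

```python
def holds_mvd(rows, attrs, lhs, rhs):
    """Check X ->> Y on a list of dict rows; attrs lists every attribute."""
    rest = [a for a in attrs if a not in lhs and a not in rhs]
    present = {tuple(row[a] for a in attrs) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in lhs):
                # Required row: X and Y values from t1, remaining values from t2.
                t3 = {a: (t2[a] if a in rest else t1[a]) for a in attrs}
                if tuple(t3[a] for a in attrs) not in present:
                    return False
    return True


attrs = ["StudentID", "CourseID", "Hobby"]

# Hypothetical data: courses and hobbies of a student are independent facts,
# so every course/hobby combination appears.
independent = [
    {"StudentID": "001", "CourseID": "101", "Hobby": "Chess"},
    {"StudentID": "001", "CourseID": "101", "Hobby": "Music"},
    {"StudentID": "001", "CourseID": "102", "Hobby": "Chess"},
    {"StudentID": "001", "CourseID": "102", "Hobby": "Music"},
]
# Dropping combinations breaks the dependency.
incomplete = [independent[0], independent[3]]

mvd_holds = holds_mvd(independent, attrs, ["StudentID"], ["CourseID"])
mvd_broken = holds_mvd(incomplete, attrs, ["StudentID"], ["CourseID"])

print(mvd_holds)   # True
print(mvd_broken)  # False
```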
3. Join Dependency: This type of dependency occurs
when a relation can be expressed as a join of two or more
other relations.
Order_ID   Customer_ID   Order_Date
1          1             2022-03-10
2          2             2022-03-11
3          1             2022-03-11
4          3             2022-03-12

Customer_ID   Customer_Name   Email
1             John            john@[Link]
2             Jane            jane@[Link]
3             Bob             bob@[Link]
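As a rough sketch of what "expressed as a join" means, the two tables above can be joined on Customer_ID, and the combined relation can then be split back into the same two projections and rebuilt without gaining or losing rows. The helper names `project` and `natural_join` are assumptions for this example, and the Email column is left out because the addresses are truncated in the source.

```python
def project(rows, attrs):
    """Keep only attrs from each row and drop duplicate rows."""
    result = []
    for row in rows:
        reduced = {a: row[a] for a in attrs}
        if reduced not in result:
            result.append(reduced)
    return result


def natural_join(left, right):
    """Pair up rows that agree on every attribute the two sides share."""
    joined = []
    for l in left:
        for r in right:
            shared = set(l) & set(r)
            if all(l[a] == r[a] for a in shared):
                joined.append({**l, **r})
    return joined


orders = [
    {"Order_ID": 1, "Customer_ID": 1, "Order_Date": "2022-03-10"},
    {"Order_ID": 2, "Customer_ID": 2, "Order_Date": "2022-03-11"},
    {"Order_ID": 3, "Customer_ID": 1, "Order_Date": "2022-03-11"},
    {"Order_ID": 4, "Customer_ID": 3, "Order_Date": "2022-03-12"},
]
customers = [
    {"Customer_ID": 1, "Customer_Name": "John"},
    {"Customer_ID": 2, "Customer_Name": "Jane"},
    {"Customer_ID": 3, "Customer_Name": "Bob"},
]

combined = natural_join(orders, customers)

# Split the combined relation into its two projections and join them again.
rebuilt = natural_join(
    project(combined, ["Order_ID", "Customer_ID", "Order_Date"]),
    project(combined, ["Customer_ID", "Customer_Name"]),
)

print(len(combined))        # 4
print(rebuilt == combined)  # True: the join dependency holds for this data
```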
Keys
1. Primary key: A primary key is a
column or set of columns that
uniquely identifies each row in a
table. It is used to enforce data
integrity and ensure that each row
can be uniquely identified.
2. Foreign key: A foreign key is a
column or set of columns in one
table that refers to the primary key
of another table. It is used to create
relationships between tables.
3. Relationship: A relationship is a
connection between two or more
tables, typically created using
foreign keys. There are different
types of relationships, including
one-to-one, one-to-many, and
many-to-many.
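The following sketch shows primary and foreign keys being enforced, using Python's built-in sqlite3 module with an in-memory database. The table and column names are illustrative assumptions. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, so the sketch turns that on first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite

conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    """CREATE TABLE orders (
           order_id    INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
           order_date  TEXT NOT NULL
       )"""
)
conn.execute("INSERT INTO customer VALUES (1, 'John')")
conn.execute("INSERT INTO orders VALUES (1, 1, '2022-03-10')")


def try_insert(sql):
    """Run one INSERT and report whether a key constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


# Reusing customer_id 1 violates the primary key.
duplicate_pk = try_insert("INSERT INTO customer VALUES (1, 'Jane')")
# Customer 99 does not exist, so the foreign key is violated.
dangling_fk = try_insert("INSERT INTO orders VALUES (2, 99, '2022-03-11')")

print(duplicate_pk, dangling_fk)  # rejected rejected
```

One customer row can be referenced by many order rows, which makes this a one-to-many relationship.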
Approaches to practical relational design projects
■ Perform a conceptual schema design using a conceptual model such as ER or EER and map the conceptual design into a set of relations.
■ Design the relations based on external knowledge derived from an existing implementation of files or forms or reports.
Normalization process
• first proposed by Codd (1972)
• takes a relation schema through a series of tests to certify whether it satisfies a certain normal form
• a top-down process which evaluates each relation against the criteria for normal forms and decomposes relations as necessary (relational design by analysis)
Normalization
Normalization involves splitting large tables into smaller ones and defining relationships between them. It is based on a set of rules called normal forms, which are a series of guidelines that help to eliminate data redundancy and ensure that the data is stored in the most efficient and effective way possible.
The goal of normalization is to minimize data
redundancy and dependency by organizing data into
smaller, more manageable tables. This reduces the
risk of data inconsistencies and errors, and makes it
easier to update and modify data without affecting
other parts of the database.
Redundancy
Student Table
As you can see, there is redundancy in the table. For example, the information about
the Program Adviser and their office is repeated for each student in the program.
This duplication of data can lead to problems such as data inconsistency and update
anomalies.
To remove this redundancy, we can break the table into two separate
tables: one for student information and another for program
information. We can then link the two tables using a foreign key. Here's
an example of the normalized tables:
Student Table
Now, the program information is stored in a separate table and is linked to the student
table using the Program ID. This eliminates the redundancy and ensures data
consistency.
Normal Forms
Normal form: refers to a set of rules that a database table must follow in order to eliminate redundancy and maintain data integrity. The main goal of normalization is to minimize data redundancy and avoid data anomalies (inconsistencies) that can occur when data is inserted, updated, or deleted.
The most commonly used normal forms are:
1NF (First Normal Form): A table is in first normal form if it has no repeating groups or arrays, and all data is atomic (indivisible).
2NF (Second Normal Form): A table is in second normal form if it is in 1NF and all non-key attributes are dependent on the whole primary key, not just part of it.
3NF (Third Normal Form): A table is in third normal form if it is in 2NF and all non-key attributes are dependent only on the primary key and not on any other non-key attributes.
Boyce-Codd Normal Form (BCNF): A stricter version of 3NF, BCNF requires that the determinant (left-hand side) of every non-trivial functional dependency is a superkey. This means that no partial dependencies or transitive dependencies are allowed.
4NF (Fourth Normal Form): A table is in fourth normal form if it is in BCNF and has no non-trivial multivalued dependencies other than those whose determinant is a superkey.
5NF (Fifth Normal Form): A table is in fifth normal form if it is in 4NF and every non-trivial join dependency is implied by its candidate keys.
1NF (First Normal Form)
The first step in the process of organizing the data in a relational database.
1NF requires that the table meet the following requirements:
Each table cell should contain a single value, meaning that it
should be atomic. In other words, the table should not
contain any repeating groups or arrays.
Each row in the table should be unique, meaning that it
should have a primary key that uniquely identifies each
record.
Each column in the table should have a unique name.
Each table cell should contain a single value, meaning
that it should be atomic. In other words, the table
should not contain any repeating groups or arrays.
In this example, the "Subjects" column contains multiple subjects for each student, violating the atomicity requirement. This can make it difficult to query and update the data, and can lead to data inconsistency.
To bring this table into 1NF, we need to split the "Subjects" column into separate rows, with each row representing a single subject for a single student:
Now each table cell contains a single value, meeting the atomicity requirement of 1NF.
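The split described above can be sketched as follows. The student names, subjects, and the helper name `to_1nf` are hypothetical, since the original table is not reproduced here; the sketch also assumes the subjects are stored as a comma-separated string.

```python
def to_1nf(rows, column, separator=","):
    """Emit one row per value found in a multi-valued column."""
    result = []
    for row in rows:
        for value in row[column].split(separator):
            atomic_row = dict(row)            # copy the other columns unchanged
            atomic_row[column] = value.strip()
            result.append(atomic_row)
    return result


# Hypothetical table that violates atomicity.
students = [
    {"StudentID": 1, "Name": "Ana", "Subjects": "Math, Science"},
    {"StudentID": 2, "Name": "Ben", "Subjects": "History"},
]

atomic_rows = to_1nf(students, "Subjects")
for row in atomic_rows:
    print(row)
```

After the split, StudentID alone no longer identifies a row; the combination of StudentID and the subject does.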
Each row in the table should be unique, meaning that
it should have a primary key that uniquely identifies
each record.
In this example, we have duplicate records for John
Doe with the same Student ID and Course. This can
make it difficult to uniquely identify a particular
record and can lead to data inconsistency.
To bring this table into 1NF, we need to
add a primary key column that
uniquely identifies each record:
Now each row in the table is unique,
meeting the uniqueness requirement
of 1NF.
Each column in the table should have a unique name.
In this example, we have two columns
named "Course", which violates the
unique column name requirement of 1NF.
This can make it difficult to refer to a
specific column and can lead to data
inconsistency.
To bring this table into 1NF, we need to
give each column a unique name:
Now each column in the table has a
unique name, meeting the unique
column name requirement of 1NF.
Second Normal Form (2NF)
In this example, the primary key is the
combination of Order ID and Product
Name. However, the Supplier Name and
Supplier Address are dependent only on
the Order ID, creating a partial
dependency on the primary key.
To bring this table into 2NF, we
need to split it into two tables:
Now the Supplier Name and Supplier Address are moved to a separate table that is related to the original table by the Order ID, ensuring that every non-key attribute is fully functionally dependent on the whole primary key of its table. This meets the requirements of 2NF.
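A minimal sketch of the same reasoning in code: given the primary key and the known functional dependencies, find the dependencies whose left-hand side is only part of the key, then move the dependent attributes into their own table. The function names and the attribute spellings (`OrderID`, `SupplierName`, and so on) are assumptions for this example.

```python
def partial_dependencies(primary_key, fds):
    """Return FDs whose left side is a proper subset of the primary key."""
    key = set(primary_key)
    found = []
    for lhs, rhs in fds:
        dependents = [a for a in rhs if a not in key]
        if set(lhs) < key and dependents:
            found.append((list(lhs), dependents))
    return found


def decompose_2nf(attributes, primary_key, fds):
    """Split off one table per partial dependency; the first table is what remains."""
    remaining = list(attributes)
    new_tables = []
    for lhs, dependents in partial_dependencies(primary_key, fds):
        new_tables.append(lhs + dependents)
        remaining = [a for a in remaining if a not in dependents]
    return [remaining] + new_tables


attributes = ["OrderID", "ProductName", "SupplierName", "SupplierAddress"]
primary_key = ["OrderID", "ProductName"]
# Supplier details depend on OrderID alone, as in the example above.
fds = [(["OrderID"], ["SupplierName", "SupplierAddress"])]

tables = decompose_2nf(attributes, primary_key, fds)
for table in tables:
    print(table)
```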
3NF (Third Normal Form)
In this table, we can see that there is a transitive dependency: Customer Name and Product Category depend on the key only through other attributes (Customer ID and Product Name), which violates 3NF. To bring this table into 3NF, we need to remove this transitive dependency.
In this example, we can identify two
functional dependencies:
Customer ID → Customer Name
Product Name → Product Category
Next, we can create the following
tables:
Now the tables are in 3NF, with no repeating groups and each table having its own primary key. The Order table serves as a junction table that links the Customer and Product tables together. All non-key attributes depend on the respective primary keys and not on any other non-key attributes, satisfying the rules of 3NF.
Boyce-Codd Normal Form
Boyce-Codd normal form (BCNF) was proposed as a simpler form of 3NF, but it was
found to be stricter than 3NF. That is, every relation in BCNF is also in 3NF; however,
a relation in 3NF is not necessarily in BCNF.
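The BCNF condition can be tested mechanically: compute the attribute closure of each left-hand side and check whether it covers the whole relation. The sketch below uses a hypothetical relation R(Student, Course, Instructor) of the kind often used to show a relation that is in 3NF but not in BCNF; the function names are assumptions, and the functional dependencies are taken as given.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left side is not a superkey."""
    all_attrs = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)          # ignore trivial dependencies
        and closure(lhs, fds) != all_attrs   # left side must be a superkey
    ]


attributes = ("Student", "Course", "Instructor")
fds = [
    (("Student", "Course"), ("Instructor",)),  # a student has one instructor per course
    (("Instructor",), ("Course",)),            # each instructor teaches one course
]

violations = bcnf_violations(attributes, fds)
print(violations)  # [(('Instructor',), ('Course',))]
```

Instructor → Course breaks BCNF because Instructor is not a superkey, yet the relation is still in 3NF because Course is part of the candidate key {Student, Course}.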
4NF (Fourth Normal Form)
In this table, there is a multi-valued dependency between the course ID and the
instructor ID and name, since one course can have multiple instructors and one
instructor can teach multiple courses.
To remove this multi-valued
dependency and normalize the
table up to 4NF, we can split the
table into four separate tables:
By doing so, we eliminate the
multi-valued dependency
between the course ID and the
instructor ID and name,
resulting in a more efficient and
normalized database design.
5NF (Fifth Normal Form)
In this table, there is a join dependency among the employee ID, project ID, and department ID. This means that the table can be reconstructed exactly, with no missing or spurious rows, by joining smaller tables projected from it. Knowing two of the values does not, in general, determine the third.
To remove this join dependency and
normalize the table up to 5NF, we
can split the table into five separate
tables:
By doing so, we eliminate the join
dependency and have a more
efficient and normalized database
design that allows for more flexibility
in querying and manipulating the
data.
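Whether a join dependency holds for a given set of rows can be checked directly: project the table onto each component, join the projections, and compare the result with the original. The sketch below assumes the three pairwise projections as components; the data values and function names are hypothetical.

```python
def project(rows, attrs):
    """Project dict rows onto attrs, as a set of frozen (attribute, value) pairs."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}


def natural_join(left, right):
    """Join two sets of frozen rows on the attributes they share."""
    joined = set()
    for l in left:
        for r in right:
            merged = dict(l)
            # setdefault adds a new attribute or returns the existing value to compare
            if all(merged.setdefault(a, v) == v for a, v in r):
                joined.add(frozenset(merged.items()))
    return joined


def satisfies_jd(rows, components):
    """True if joining the projections onto components gives back rows exactly."""
    result = project(rows, components[0])
    for attrs in components[1:]:
        result = natural_join(result, project(rows, attrs))
    return result == project(rows, list(rows[0]))


def make_rows(triples):
    names = ("EmployeeID", "ProjectID", "DepartmentID")
    return [dict(zip(names, triple)) for triple in triples]


components = [
    ["EmployeeID", "ProjectID"],
    ["ProjectID", "DepartmentID"],
    ["EmployeeID", "DepartmentID"],
]

lossless = make_rows([("E1", "P1", "D1"), ("E1", "P2", "D1"), ("E2", "P1", "D1")])
lossy = make_rows([("E1", "P1", "D2"), ("E1", "P2", "D1"), ("E2", "P1", "D1")])

jd_holds = satisfies_jd(lossless, components)
jd_fails = satisfies_jd(lossy, components)
print(jd_holds)  # True
print(jd_fails)  # False: the join produces the spurious row (E1, P1, D1)
```

A decomposition is safe only in the first case; in the second, splitting the table would invent a row that was never recorded.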
Summary
• pitfalls in relational database design using intuitive arguments
• measures for indicating whether a relation schema is good or bad
• informal guidelines for a good design
• limiting NULL values
• formal concepts that allow us to do relational design in a top-down fashion by analyzing relations individually
• process of design by analysis and decomposition by introducing the process of normalization
• functional dependencies and their properties
• normalization process
• treatment of successive normalization based on a predefined primary key in each relation
• more general definitions of second normal form (2NF) and third normal form (3NF) that take all candidate keys of a relation into account
• Boyce-Codd normal form (BCNF)
• decomposition of a non-BCNF relation
• test for the nonadditive join property of binary decompositions
• general algorithm to convert any relation not in BCNF into a set of BCNF relations
• multivalued dependency (MVD)
• fifth normal form