Unit 2: Relational Database Design
1. Introduction to Relational Database Design
The primary goal of relational database design is to create a set of relation schemas that allow
for efficient information storage without unnecessary redundancy, while also enabling easy
data retrieval. A key approach to achieving this is by designing schemas that adhere to
appropriate normal forms through a process known as normalization. This process helps in
creating a database that is robust, flexible, and free from data anomalies.
2. The Purpose of Normalization
Normalization is a formal, step-by-step process for analyzing and organizing data in a
database to minimize data redundancy and improve data integrity. The main purposes of
normalization are:
● Identify Suitable Relations: To create a set of relations that effectively and accurately
represent the data requirements of an enterprise.
● Minimize Attributes: To ensure relations contain the minimum number of attributes
necessary to support the data requirements.
● Minimize Redundancy: To reduce the amount of redundant data by ensuring that each
piece of information is stored in only one place. This saves storage space and improves
consistency.
● Group Logically Related Data: To group attributes that have a close logical relationship
(as defined by functional dependencies) within the same relation.
3. Data Redundancy and Update Anomalies
A major aim of relational database design is to group attributes into relations in a way that
minimizes data redundancy. Relations with redundant data can suffer from update
anomalies, which compromise data integrity. These are categorized as follows:
● Insertion Anomalies: These occur when a new piece of information cannot be inserted
into the database without also supplying data about another, unrelated entity. For
example, if branch and staff details are stored in the same table, a new branch cannot be
added until at least one staff member is assigned to it.
● Deletion Anomalies: This happens when the deletion of a piece of information
unintentionally leads to the loss of other, unrelated information. For instance, if a table
stores both staff and branch details, deleting the record of the last staff member at a
particular branch might also delete all information about the branch itself.
● Modification Anomalies: This occurs when a change to a single piece of information
requires updates in multiple places within the database. If all instances are not updated
correctly, the database becomes inconsistent. For example, changing a branch address
would require updating the records of all staff members working at that branch.
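The modification and deletion anomalies above can be reproduced with Python's built-in `sqlite3` module. This is a minimal sketch only: the table name `StaffBranch`, its columns and the sample rows are hypothetical, chosen to mirror the staff and branch example in the text.

```python
import sqlite3

# Hypothetical un-normalized table: branch details are repeated on every staff row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE StaffBranch ("
    "staffNo TEXT PRIMARY KEY, sName TEXT, branchNo TEXT, bAddress TEXT)"
)
conn.executemany(
    "INSERT INTO StaffBranch VALUES (?, ?, ?, ?)",
    [
        ("SL21", "John White", "B005", "22 Deer Rd, London"),
        ("SG37", "Ann Beech", "B003", "163 Main St, Glasgow"),
        ("SG14", "David Ford", "B003", "163 Main St, Glasgow"),
    ],
)

# Modification anomaly: the address is changed on only one of the two B003 rows.
conn.execute(
    "UPDATE StaffBranch SET bAddress = '1 New St, Glasgow' WHERE staffNo = 'SG37'"
)
addresses = conn.execute(
    "SELECT COUNT(DISTINCT bAddress) FROM StaffBranch WHERE branchNo = 'B003'"
).fetchone()[0]
print(addresses)  # 2: branch B003 now has two conflicting addresses

# Deletion anomaly: deleting the only staff member at B005 erases the branch too.
conn.execute("DELETE FROM StaffBranch WHERE staffNo = 'SL21'")
remaining = conn.execute(
    "SELECT COUNT(*) FROM StaffBranch WHERE branchNo = 'B005'"
).fetchone()[0]
print(remaining)  # 0: nothing about branch B005 is left
```

Storing branches in their own table, with staff rows holding only `branchNo`, would make the branch address a single stored fact and remove both problems.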
4. Functional Dependencies
A functional dependency is a constraint that describes the relationship between attributes
in a relation. It states that the value of one set of attributes determines the value of another
set of attributes: if X → Y holds, then any two tuples that agree on the values of X must
also agree on the values of Y.
● Notation: X → Y, which is read as "X functionally determines Y".
● Determinant: The attribute or set of attributes on the left-hand side of the arrow (X) is
called the determinant.
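The "two tuples that agree on X must agree on Y" reading can be checked mechanically against a table instance. The sketch below is illustrative; `fd_holds`, the row-as-dict encoding and the sample data are all hypothetical. Because a functional dependency is a constraint on every legal state of the relation, a single instance can refute a dependency but never prove it.

```python
from typing import Dict, Iterable, Sequence, Tuple

Row = Dict[str, object]


def fd_holds(rows: Iterable[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Check X -> Y on one instance: rows that agree on lhs must agree on rhs."""
    seen: Dict[Tuple[object, ...], Tuple[object, ...]] = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # The first time an X value appears, remember its Y; afterwards Y must match.
        if seen.setdefault(x, y) != y:
            return False
    return True


staff = [
    {"staffNo": "SG37", "branchNo": "B003", "bAddress": "163 Main St"},
    {"staffNo": "SG14", "branchNo": "B003", "bAddress": "163 Main St"},
    {"staffNo": "SL21", "branchNo": "B005", "bAddress": "22 Deer Rd"},
]
print(fd_holds(staff, ["branchNo"], ["bAddress"]))  # True
print(fd_holds(staff, ["bAddress"], ["staffNo"]))   # False
```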
Types of Functional Dependencies
● Full Functional Dependency: A dependency X → Y is full if the removal of any attribute
from the determinant X will cause the dependency to no longer hold.
● Partial Dependency: A dependency X → Y is partial if an attribute can be removed from the
determinant X and the dependency still holds. In practice, partial dependencies arise in
relations with composite primary keys, where a non-key attribute depends on only part of
the key.
● Transitive Functional Dependency: A condition where A → B and B → C, which implies
that A → C. Here, C is transitively dependent on A through B (provided that A is not
functionally dependent on B or C).
● Trivial Functional Dependency: A dependency A → B where B is a subset of A. (e.g.,
{StudentID, StudentName} → StudentName).
● Nontrivial Functional Dependency: A dependency A → B where B is not a subset of A.
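The trivial, full and partial cases can be told apart by computing attribute closures under a given set of dependencies. This is a sketch under stated assumptions: attributes are single letters, so the string "SC" stands for {S, C}, and the example relation (S = student ID, C = course ID, G = grade, N = student name) is hypothetical. The function checks triviality first, so a trivial dependency is reported as "trivial" rather than "partial".

```python
from typing import Iterable, Set, Tuple

FD = Tuple[str, str]  # ("SC", "G") means {S, C} -> {G}; attributes are single letters


def closure(attrs: Iterable[str], fds: Iterable[FD]) -> Set[str]:
    """Attribute closure: keep applying FDs whose left side is already covered."""
    result = set(attrs)
    rules = [(set(lhs), set(rhs)) for lhs, rhs in fds]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def classify(lhs: str, rhs: str, fds: Iterable[FD]) -> str:
    fds = list(fds)
    x, y = set(lhs), set(rhs)
    if y <= x:
        return "trivial"
    if not y <= closure(x, fds):
        return "does not hold"
    # Closure is monotone, so trying each subset with one attribute dropped is enough.
    for a in x:
        if y <= closure(x - {a}, fds):
            return "partial"
    return "full"


F = [("SC", "G"), ("S", "N")]
print(classify("SC", "N", F))  # partial: S alone determines N
print(classify("SC", "G", F))  # full
print(classify("SN", "N", F))  # trivial
```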
Inference Rules for Functional Dependencies (Armstrong's Axioms)
Given a set of functional dependencies (F), other dependencies can be inferred. The set of all
functional dependencies implied by F is called the closure (F+). Armstrong's axioms are a
set of sound and complete rules used to find the closure:
● Reflexivity: If B is a subset of A, then A → B.
● Augmentation: If A → B, then AC → BC.
● Transitivity: If A → B and B → C, then A → C.
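As a small worked example of inference, the union rule (if A → B and A → C, then A → BC) can be derived from the three axioms alone. As in the augmentation rule, juxtaposition such as AB denotes the union of attribute sets.

```latex
\begin{aligned}
&(1)\quad A \to B   && \text{given} \\
&(2)\quad A \to C   && \text{given} \\
&(3)\quad A \to AB  && \text{augmentation of (1) with } A \text{, since } AA = A \\
&(4)\quad AB \to BC && \text{augmentation of (2) with } B \text{, since } CB = BC \\
&(5)\quad A \to BC  && \text{transitivity on (3) and (4)}
\end{aligned}
```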
Closure of an Attribute Set
The closure of an attribute set A, denoted as A+, is the set of all attributes that can be
functionally determined from A using the given functional dependencies.
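The closure A+ is usually computed by starting from A and repeatedly adding the right-hand side of any dependency whose left-hand side is already contained in the result, until nothing changes. The sketch below assumes single-letter attribute names, so "CD" means {C, D}; the function name and the example dependency set are hypothetical.

```python
from typing import Iterable, Set, Tuple

FD = Tuple[str, str]  # ("CD", "E") means {C, D} -> {E}


def attribute_closure(attrs: Iterable[str], fds: Iterable[FD]) -> Set[str]:
    """Compute attrs+ by repeatedly firing any FD whose left side is already covered."""
    result = set(attrs)
    rules = [(set(lhs), set(rhs)) for lhs, rhs in fds]
    changed = True
    while changed:  # stop once a full pass adds nothing new
        changed = False
        for lhs, rhs in rules:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


F = [("A", "B"), ("B", "C"), ("CD", "E")]
print(sorted(attribute_closure("A", F)))   # ['A', 'B', 'C']
print(sorted(attribute_closure("AD", F)))  # ['A', 'B', 'C', 'D', 'E']
```

For a relation R(A, B, C, D, E) with these dependencies, {A, D}+ contains every attribute, so {A, D} is a superkey, while {A}+ is not.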
Identifying Keys Using Functional Dependencies
● Superkey: A set of attributes whose closure contains all attributes of the relation.
● Candidate Key: A minimal superkey, meaning no proper subset of the key is also a
superkey.
● Primary Key: A candidate key chosen by the database designer to uniquely identify
tuples.
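These definitions suggest a brute-force way to list candidate keys: test attribute sets in order of increasing size, keep those whose closure is the whole relation, and skip any set that already contains a key found earlier (it is a superkey but not minimal). This is an exponential sketch suitable only for small classroom examples; the single-letter encoding and the example relation are assumptions.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[str, str]  # ("AB", "C") means {A, B} -> {C}


def closure(attrs: Iterable[str], fds: Iterable[FD]) -> Set[str]:
    result = set(attrs)
    rules = [(set(lhs), set(rhs)) for lhs, rhs in fds]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(relation: str, fds: Iterable[FD]) -> List[FrozenSet[str]]:
    fds = list(fds)
    attrs = frozenset(relation)
    keys: List[FrozenSet[str]] = []
    # Smaller sets are tried first, so every key recorded is minimal.
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            cand = frozenset(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key: a superkey, but not a candidate key
            if closure(cand, fds) == attrs:
                keys.append(cand)  # its closure covers R, so it is a superkey
    return keys


F = [("AB", "C"), ("C", "A")]
print(sorted("".join(sorted(k)) for k in candidate_keys("ABCD", F)))  # ['ABD', 'BCD']
```

Here R(A, B, C, D) has two candidate keys, and the designer would pick one of them as the primary key.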
5. The Process of Normalization
Normalization is a step-by-step process of decomposing relations with anomalies to produce
smaller, well-structured relations.
Normal Forms
● Unnormalized Form (UNF): A table that contains one or more repeating groups
(multi-valued attributes).
● First Normal Form (1NF): A relation is in 1NF if all its attributes are atomic (i.e., the
intersection of each row and column contains one and only one value). To achieve 1NF,
repeating groups must be removed.
● Second Normal Form (2NF): A relation is in 2NF if it is in 1NF and every non-primary-key
attribute is fully functionally dependent on the primary key. To achieve 2NF, all partial
dependencies must be removed by placing the partially dependent attributes in a
separate table along with a copy of their determinant.
● Third Normal Form (3NF): A relation is in 3NF if it is in 2NF and no non-primary-key
attribute is transitively dependent on the primary key. To achieve 3NF, all transitive
dependencies must be removed by placing the transitively dependent attributes in a new
table along with a copy of their determinant.
● Boyce-Codd Normal Form (BCNF): A stricter version of 3NF. A relation is in BCNF if and
only if for every non-trivial functional dependency X → Y, X is a superkey. In simpler
terms, every determinant must be a candidate key.
● Fourth Normal Form (4NF): A relation is in 4NF if it is in BCNF and contains no
non-trivial multivalued dependency whose determinant is not a superkey. A multivalued
dependency exists when there are at least three attributes (e.g., A, B, and C) in a relation,
and for each value of A there is a well-defined set of values for B and a well-defined set of
values for C, but the set of values for B is independent of the set of values for C. This is
denoted as A ↠ B and A ↠ C.
● Fifth Normal Form (5NF): Also known as Project-Join Normal Form (PJNF), 5NF is
designed to reduce redundancy in relational databases that record multi-valued facts by
isolating semantically related multiple relationships. A relation is in 5NF if and only if every
join dependency in it is implied by the candidate keys.
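The BCNF condition can be tested by checking, for each non-trivial dependency X → Y, whether X+ covers the whole relation. This sketch assumes the dependency set F is given for R itself (not projected from a larger relation); under that assumption it is enough to check the dependencies in F rather than all of F+. The relation R(S, C, I) (student, course, instructor) and the single-letter encoding are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

FD = Tuple[str, str]  # ("SC", "I") means {S, C} -> {I}


def closure(attrs: Iterable[str], fds: Iterable[FD]) -> Set[str]:
    result = set(attrs)
    rules = [(set(lhs), set(rhs)) for lhs, rhs in fds]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation: str, fds: Iterable[FD]) -> List[Tuple[Set[str], Set[str]]]:
    """Return the given FDs that are non-trivial and whose determinant is not a superkey."""
    fds = list(fds)
    attrs = set(relation)
    violations = []
    for lhs, rhs in fds:
        x, y = set(lhs), set(rhs)
        if y <= x:
            continue  # trivial dependencies never violate BCNF
        if closure(x, fds) >= attrs:
            continue  # X is a superkey, so the dependency is allowed
        violations.append((x, y))
    return violations


F = [("SC", "I"), ("I", "C")]
print(bcnf_violations("SCI", F))  # [({'I'}, {'C'})]
```

In this example {S, C} and {S, I} are both candidate keys, so C is a prime attribute and the relation satisfies 3NF; it still fails BCNF because the determinant I is not a superkey.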
6. Decomposition
Decomposition is the process of breaking down a single relation into two or more smaller
relations to eliminate redundancy and anomalies. There are two important properties of
decomposition:
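● Lossless-Join Property: This property ensures that the original relation can be perfectly
reconstructed by performing a natural join on the decomposed relations. The join always
contains every original tuple; the risk is that it also contains spurious tuples that were never
in the original relation, and a lossless decomposition guarantees that none are generated. A
decomposition of R into R1 and R2 is lossless if R1 ∩ R2 → R1 or R1 ∩ R2 → R2, i.e., if the
common attributes form a superkey of at least one of the two relations.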
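● Dependency Preservation Property: This property ensures that all the functional
dependencies of the original relation are preserved across the decomposed relations.
This means that every original functional dependency can be enforced by checking the
decomposed relations individually, without joining them. Formally, if Fi is the set of
dependencies in F+ that involve only attributes of Ri, the decomposition preserves
dependencies if (F1 ∪ F2 ∪ … ∪ Fn)+ = F+. A simple sufficient condition is that every
dependency in F lies entirely within a single decomposed relation.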
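Both properties can be tested with attribute closures. The sketch below uses the binary lossless-join condition stated above and, for dependency preservation, one common closure-based test: for each X → Y, grow a set starting from X using only closures restricted to one decomposed relation at a time, then check whether Y was reached. The single-letter encoding and the example relation R(A, B, C) are assumptions for illustration.

```python
from typing import Iterable, Sequence, Set, Tuple

FD = Tuple[str, str]  # ("A", "B") means {A} -> {B}


def closure(attrs: Iterable[str], fds: Iterable[FD]) -> Set[str]:
    result = set(attrs)
    rules = [(set(lhs), set(rhs)) for lhs, rhs in fds]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1: str, r2: str, fds: Sequence[FD]) -> bool:
    """Lossless if the common attributes determine all of R1 or all of R2."""
    common_plus = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_plus or set(r2) <= common_plus


def preserves_dependencies(parts: Sequence[str], fds: Sequence[FD]) -> bool:
    """Test each FD X -> Y using only closures restricted to one part at a time."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in map(set, parts):
                gained = closure(result & part, fds) & part
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False  # X -> Y cannot be enforced within the parts
    return True


F = [("A", "B"), ("B", "C")]
print(is_lossless_binary("AB", "BC", F), preserves_dependencies(["AB", "BC"], F))  # True True
print(is_lossless_binary("AB", "AC", F), preserves_dependencies(["AB", "AC"], F))  # True False
print(is_lossless_binary("AC", "BC", F))  # False
```

Splitting R(A, B, C) into (A, B) and (A, C) is lossless but loses B → C, which could then only be checked by joining the two tables; splitting into (A, C) and (B, C) is not even lossless, because C determines neither side.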