
Database Normalization Techniques Explained

part of ktu notes cgip

Uploaded by

jzf802606

Normalization

MODULE IV
● Different anomalies in designing a database, The idea of
normalization, Functional dependency, Armstrong’s Axioms (proofs
not required), Closures and their computation, Equivalence of
Functional Dependencies (FD), Minimal Cover (proofs not required).
● First Normal Form (1NF), Second Normal Form (2NF), Third Normal
Form (3NF), Boyce Codd Normal Form (BCNF), Lossless join and
dependency preserving decomposition, Algorithms for checking
Lossless Join (LJ) and Dependency Preserving (DP) properties
Normalization

● Normalization in databases is the process of organizing the attributes
and tables of a relational database to minimize redundancy and
undesirable dependencies.
● The main objective of normalization is to eliminate data anomalies
such as insertion, update, and deletion anomalies, which can occur
when a database is not properly structured.
● Normalization typically involves breaking down a large table into
smaller tables and defining relationships between them.
Normalization of Relations
● Normalization is usually achieved through a series of normal forms
● Normal forms are a series of guidelines used to structure relational
databases effectively, ensuring data integrity and reducing redundancy.
● Types of Normal forms:
○ First Normal Form
○ Second Normal Form
○ Third Normal Form
○ Boyce Codd Normal Form
○ Fourth Normal Form
○ Fifth Normal Form
Super Key

● A superkey of a relation schema R = {A1, A2, ..., An} is a set of
attributes S ⊆ R with the property that no two distinct tuples t1 and
t2 in any legal relation state r of R will have t1[S] = t2[S].
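
As a rough illustration of this definition, the Python sketch below tests whether a set of attributes takes distinct values on every pair of tuples in one relation state. The helper name `is_superkey_in_state` and the sample rows are assumptions made for illustration. Passing the check on a single state only shows that the set could be a superkey, because the definition quantifies over every legal state.

```python
def is_superkey_in_state(rows, attrs):
    """Return True if no two tuples in `rows` agree on all of `attrs`."""
    seen = set()
    for row in rows:
        projected = tuple(row[a] for a in attrs)  # this is t[S]
        if projected in seen:
            return False  # two tuples share the same t[S]
        seen.add(projected)
    return True


# Hypothetical relation state used only for this sketch
rows = [
    {"ROLL_NO": 1, "NAME": "Danish", "SUBJECT": "OS"},
    {"ROLL_NO": 1, "NAME": "Danish", "SUBJECT": "DBMS"},
    {"ROLL_NO": 3, "NAME": "Denik", "SUBJECT": "Java"},
]

print(is_superkey_in_state(rows, ["ROLL_NO"]))
print(is_superkey_in_state(rows, ["ROLL_NO", "SUBJECT"]))
```

In this state ROLL_NO alone repeats, while ROLL_NO together with SUBJECT does not.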
Definitions of Keys and Attributes
● If a relation schema has more than one key, each is called a
candidate key.
○ One of the candidate keys is arbitrarily designated to be
the primary key, and the others are called secondary
keys.
● A prime attribute is an attribute that is a member of some
candidate key. A prime attribute is also known as a key
attribute.
● A non-prime attribute is one that is not a member of any
candidate key.
First Normal Form (1NF)

● For a table to be in the First Normal Form, it should follow the
following rules:
○ It should only have single (atomic) valued attributes/columns.
○ Values stored in a column should be of the same domain.
○ All the columns in a table should have unique names.
○ The order in which data is stored does not matter.
Example
● Table does not satisfy 1NF

ROLL_NO NAME SUBJECT
1 Danish OS, DBMS
3 Denik Java
2 Daryl C, C++

● Why? The SUBJECT column holds more than one value in some rows.
● How to solve? Store each subject in a separate row.

Example
● 1NF
○ Atomic Values
○ Unique Column Names
○ Order of Instances
○ Same domain

ROLL_NO NAME SUBJECT
1 Danish OS
1 Danish DBMS
3 Denik Java
2 Daryl C
2 Daryl C++

● By doing so, a few values are repeated, but the values in the
subject column are now atomic for each record/row.
● Converting to the First Normal Form increases data redundancy, because the same
values now appear in several rows, but each row as a whole is still unique.
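
The conversion above can be sketched in a few lines of Python. This is a minimal sketch that assumes the multi-valued column is stored as a comma-separated string; the function name `to_1nf` is made up for the example.

```python
def to_1nf(rows, column, sep=","):
    """Split a multi-valued `column` so each output row holds one atomic value."""
    result = []
    for row in rows:
        for value in row[column].split(sep):
            atomic = dict(row)             # copy the other columns unchanged
            atomic[column] = value.strip()
            result.append(atomic)
    return result


students = [
    {"ROLL_NO": 1, "NAME": "Danish", "SUBJECT": "OS, DBMS"},
    {"ROLL_NO": 3, "NAME": "Denik", "SUBJECT": "Java"},
    {"ROLL_NO": 2, "NAME": "Daryl", "SUBJECT": "C, C++"},
]

for row in to_1nf(students, "SUBJECT"):
    print(row["ROLL_NO"], row["NAME"], row["SUBJECT"])
```

ROLL_NO and NAME repeat across the new rows, which is the redundancy the bullets above describe.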
Convert the table into 1NF
Second Normal Form (2NF)

● For a table to be in the Second Normal Form (2NF), it must
satisfy two conditions:
○ The table should be in the First Normal Form.
○ There should be no Partial Dependency.
What is Dependency?

● Let's take an example of a Student table with
columns student_id, name, reg_no, branch and address.
● In this table, student_id is the primary key and will be unique for every
row, hence we can use student_id to fetch any row of data from this
table.
● Even when two students have the same name, if we know
the student_id we can easily fetch the correct record. (primary key)
● Every other column is determined by student_id. This is called
dependency. (functional dependency)
Partial Dependency

SUBJECT

subject_id (Primary Key) subject_name
1 Java
2 C++
3 OS

SCORE

score_id student_id subject_id marks teacher
1 10 1 70 X
2 10 2 75 Y
3 11 1 80 X
Partial Dependency
● Together, student_id + subject_id forms a Candidate Key
for the Score table, which can be the Primary key.
● That is, the Primary key for this table is a composite of two columns,
student_id and subject_id, but the teacher's name depends only
on subject_id, not on the entire primary key.
● This is Partial Dependency, where an attribute in a table
depends on only a part of the primary key and not on the
whole key.
● A functional dependency X->Y is a partial dependency if Y is
also functionally dependent on some proper subset of X.
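
The definition can be turned into a small check. The sketch below is illustrative only: it assumes the given key is the only candidate key (so "non-key" and "non-prime" coincide), the function name `partial_dependencies` is hypothetical, and the FD from the whole key to marks is an assumption about the Score table rather than something stated above.

```python
def partial_dependencies(key, fds):
    """Return the FDs whose left side is a proper subset of `key`
    and whose right side contains at least one non-key attribute."""
    key = frozenset(key)
    found = []
    for lhs, rhs in fds:
        lhs, rhs = frozenset(lhs), frozenset(rhs)
        # `lhs < key` is Python's proper-subset test on sets
        if lhs < key and rhs - key:
            found.append((set(lhs), set(rhs - key)))
    return found


score_key = {"student_id", "subject_id"}
score_fds = [
    ({"student_id", "subject_id"}, {"marks"}),  # assumed: marks need the whole key
    ({"subject_id"}, {"teacher"}),              # stated in the text above
]

print(partial_dependencies(score_key, score_fds))
```

Only subject_id -> teacher is reported, since its left side is a proper subset of the key.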
How to remove Partial Dependency

● The simplest solution is to remove the teacher column from the Score
table and add it to the Subject table.
● The Score table then has no partial dependency.
Third Normal Form (3NF)

● A relation will be in 3NF
○ if it is in 2NF and does not contain any transitive dependency
of a non-prime attribute on a key.
● 3NF is used to reduce data duplication and to
achieve data integrity.
● If you have a table where attribute C depends on
attribute B, which in turn depends on attribute A (A -> B
-> C), then you would move attribute C to a separate
table along with B, which becomes that table's primary key.
Employee Table

EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY

222 Harry 201010 UP Noida

333 Stephan 02228 US Boston

444 Lan 60007 US Chicago


● Super keys in the table above:
○ {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_ZIP}, and so on
● Non-prime attributes: In the given table, all attributes
except EMP_ID are non-prime.
● Here
○ (EMP_ID → EMP_ZIP)
○ (EMP_ZIP → EMP_STATE, EMP_ZIP → EMP_CITY)
○ The non-prime attributes (EMP_STATE, EMP_CITY) are transitively
dependent on the super key (EMP_ID). This violates the rule of third
normal form. (a → b and b → c give a → c)
○ So we need to move EMP_CITY and EMP_STATE to a new
EmployeeZip table, with EMP_ZIP as its Primary key.
3NF forms

Employee Table

EMP_ID EMP_NAME EMP_ZIP
222 Harry 201010
333 Stephan 02228
444 Lan 60007

EmployeeZip Table

EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
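
One way to picture this split is as two projections of the original Employee table. The sketch below assumes rows are held as Python dictionaries; the helper `project` is a made-up name, and ZIP codes are kept as strings so that leading zeros survive.

```python
def project(rows, attrs):
    """Project rows onto `attrs`, dropping duplicate tuples (first-seen order)."""
    seen, result = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


employee = [
    {"EMP_ID": 222, "EMP_NAME": "Harry", "EMP_ZIP": "201010",
     "EMP_STATE": "UP", "EMP_CITY": "Noida"},
    {"EMP_ID": 333, "EMP_NAME": "Stephan", "EMP_ZIP": "02228",
     "EMP_STATE": "US", "EMP_CITY": "Boston"},
    {"EMP_ID": 444, "EMP_NAME": "Lan", "EMP_ZIP": "60007",
     "EMP_STATE": "US", "EMP_CITY": "Chicago"},
]

employee_3nf = project(employee, ["EMP_ID", "EMP_NAME", "EMP_ZIP"])
employee_zip = project(employee, ["EMP_ZIP", "EMP_STATE", "EMP_CITY"])

for row in employee_zip:
    print(row["EMP_ZIP"], row["EMP_STATE"], row["EMP_CITY"])
```

If several employees shared a ZIP code, the duplicate removal in `project` would store that ZIP's state and city only once in `employee_zip`.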


Boyce Codd Normal Form (BCNF)

● BCNF is the advanced version of 3NF. It is stricter than 3NF.
○ A table is in BCNF if, for every non-trivial functional dependency
X → Y, X is a super key of the table.
○ For BCNF, the table should be in 3NF, and for every FD, the
LHS is a super key.
EMPLOYEE table
EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO

264 India Designing D394 283

264 India Testing D394 300

364 UK Stores D283 232


● In the above table, the functional dependencies are as follows:
○ EMP_ID → EMP_COUNTRY
○ EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
● Candidate key: {EMP_ID, EMP_DEPT}
● The table is not in BCNF because neither EMP_DEPT nor EMP_ID
alone is a key, yet each is the left side of a functional dependency.
● To convert the given table into BCNF, we decompose it into three
tables:
BCNF

First table

EMP_ID EMP_COUNTRY
264 India
364 UK

Second table

EMP_DEPT DEPT_TYPE EMP_DEPT_NO
Designing D394 283
Testing D394 300
Stores D283 232

Third table

EMP_ID EMP_DEPT_NO
264 283
264 300
364 232
● Candidate keys:
○ For the first table: EMP_ID
○ For the second table: EMP_DEPT
○ For the third table: {EMP_ID, EMP_DEPT_NO}
● Functional dependencies:
○ EMP_ID → EMP_COUNTRY
○ EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
● Now, this is in BCNF because the left side of each functional
dependency is a key of the table in which it holds.
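
The BCNF condition can be checked mechanically with attribute closures. The following is a minimal sketch, not a complete normalizer: `closure` and `bcnf_violations` are names chosen for the example, only the FDs that are listed are examined, and for each decomposed table the sketch is simply handed the FD that falls inside it.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left side is not a super key."""
    attributes = set(attributes)
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= attributes]


fds = [
    ({"EMP_ID"}, {"EMP_COUNTRY"}),
    ({"EMP_DEPT"}, {"DEPT_TYPE", "EMP_DEPT_NO"}),
]
employee = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}

# Original table: both FDs have a left side that is not a super key
print(len(bcnf_violations(employee, fds)))

# First and second decomposed tables, each with the FD that lies inside it
print(bcnf_violations({"EMP_ID", "EMP_COUNTRY"}, fds[:1]))
print(bcnf_violations({"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}, fds[1:]))
```

The third table has no listed FD inside it, so under the same assumption it has nothing to violate.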
Examples
Steps to decompose a non-2NF relation to a 2NF
relation
Step 1: Create a separate relation for each partial dependency
Step 2: Remove the right hand side attribute of the partial dependency
from the relation that is being decomposed.
Example 1

Consider the Flight_Schedule table with attributes (Flight_ID, Flight_Day, Pilot, Boarding_Gate) and the following
set of functional dependencies:
F = { {Flight_ID, Flight_Day} → {Pilot, Boarding_Gate}, Flight_ID → Boarding_Gate }

The key is (Flight_ID, Flight_Day).

These two attributes together identify the Pilot value uniquely. But to identify the other attribute,
Boarding_Gate, the attribute Flight_ID alone is enough. So there is a partial dependency.

Step 1: Create a separate relation for each partial dependency.

Flight_ID → Boarding_Gate is the partial dependency.
Hence we need to create a separate relation for this FD: Boarding (Flight_ID, Boarding_Gate).

Step 2: Remove the right hand side attribute of the partial dependency from the relation that is being
decomposed.
The attribute Boarding_Gate should be removed as per this condition.
Hence, Flight_Schedule (Flight_ID, Flight_Day, Pilot).

Thus, Flight_Schedule (Flight_ID, Flight_Day, Pilot, Boarding_Gate) is decomposed into

Flight_Schedule (Flight_ID, Flight_Day, Pilot)

Boarding (Flight_ID, Boarding_Gate).
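
The two steps can be written as a short Python sketch. It assumes the partial dependencies have already been identified and are passed in; the function name `decompose_2nf` is hypothetical.

```python
def decompose_2nf(attrs, partial_fds):
    """Apply the two steps to a relation with attribute list `attrs`.

    Step 1: create one new relation per partial dependency (lhs + rhs).
    Step 2: remove each partial dependency's right side from the original.
    Returns (remaining_attributes, new_relations).
    """
    remaining = list(attrs)
    new_relations = []
    for lhs, rhs in partial_fds:
        new_relations.append(list(lhs) + list(rhs))               # Step 1
        remaining = [a for a in remaining if a not in rhs]        # Step 2
    return remaining, new_relations


flight_schedule = ["Flight_ID", "Flight_Day", "Pilot", "Boarding_Gate"]
partial = [(["Flight_ID"], ["Boarding_Gate"])]

remaining, new_relations = decompose_2nf(flight_schedule, partial)
print(remaining)
print(new_relations)
```

The first printed list corresponds to the reduced Flight_Schedule relation and the second to the Boarding relation.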
Example 2
Assume a relation R (A, B, C, D, E) with the following set of functional
dependencies: F = {AB → C, B → D, E → D}. Find the key and decompose
to 2NF.

The key for this relation is ABE, because A, B and E do not appear on the
right side of any FD and the closure of ABE is ABCDE. Then, all three given
FDs are partial dependencies, viz., AB → C, B → D, and E → D.
Step 1: separate tables for partial dependencies; hence, R1 (ABC), R2
(BD) and R3 (ED).
Step 2: remove RHS of these partial FDs from R; hence, R4 (ABE).
Thus, we have four tables R1 (ABC), R2 (BD), R3 (ED) and R4 (ABE).
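
Finding the key in examples like this one comes down to computing attribute closures. The sketch below is a brute-force search that assumes single-letter attribute names and a small schema; `closure` and `candidate_keys` are names chosen for the example.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return every minimal attribute set whose closure is the whole schema."""
    attributes = sorted(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):      # smallest sets first
        for combo in combinations(attributes, size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue                            # contains a smaller key
            if closure(candidate, fds) >= set(attributes):
                keys.append(candidate)
    return keys


fds = [({"A", "B"}, {"C"}), ({"B"}, {"D"}), ({"E"}, {"D"})]
print([sorted(k) for k in candidate_keys("ABCDE", fds)])  # [['A', 'B', 'E']]
```

The search tries every subset, so it is only practical for schemas with a handful of attributes.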
Example 3
Example 4
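For the given relation R(ABCDE) and F : {A->C, B->DE, D->C}, check
which functional dependency (FD) violates the 2NF and decompose R
into 2NF.

The key is AB, since the closure of AB is ABCDE. A->C and B->DE violate
2NF because their left sides are proper subsets of the key.

Decomposition of relation R is R1(AC), R2(BDE), R3(AB).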


Relational Decomposition

● When a relation in the relational model is not in an
appropriate normal form, decomposition of the
relation is required.
● In a database, decomposition breaks a table into multiple tables.
● If a relation is not decomposed properly, it may
lead to problems like loss of information.
● Decomposition is used to eliminate some of the
problems of bad design like anomalies, inconsistencies,
and redundancy.
Types of Decomposition
Lossless Decomposition

● If no information is lost from the relation that is decomposed,
then the decomposition is lossless.
● A lossless decomposition guarantees that the join of the decomposed
relations results in the same relation that was decomposed.
● A decomposition is said to be lossless if the natural join of
all the decomposed relations gives the original relation.
EMPLOYEE_DEPARTMENT table:
EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME

22 Denim 28 Mumbai 827 Sales

33 Alina 25 Delhi 438 Marketing

46 Stephan 30 Bangalore 869 Finance

52 Katherine 36 Mumbai 575 Production

60 Jack 40 Noida 678 Testing


The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT

EMP_ID EMP_NAME EMP_AGE EMP_CITY


22 Denim 28 Mumbai
33 Alina 25 Delhi
46 Stephan 30 Bangalore
52 Katherine 36 Mumbai
60 Jack 40 Noida
DEPARTMENT table
DEPT_ID EMP_ID DEPT_NAME

827 22 Sales

438 33 Marketing

869 46 Finance

575 52 Production

678 60 Testing
Employee ⋈ Department
EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME

22 Denim 28 Mumbai 827 Sales

33 Alina 25 Delhi 438 Marketing

46 Stephan 30 Bangalore 869 Finance

52 Katherine 36 Mumbai 575 Production

60 Jack 40 Noida 678 Testing
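
The join shown above can be reproduced with a small nested-loop natural join. This is a sketch for illustration, assuming rows are stored as dictionaries; `natural_join` is a made-up helper, not a library function.

```python
def natural_join(left, right):
    """Join two lists of dict rows on every attribute name they share."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    joined = []
    for l_row in left:
        for r_row in right:
            if all(l_row[a] == r_row[a] for a in common):
                joined.append({**l_row, **r_row})
    return joined


emp_cols = ["EMP_ID", "EMP_NAME", "EMP_AGE", "EMP_CITY"]
dept_cols = ["DEPT_ID", "EMP_ID", "DEPT_NAME"]

employee = [dict(zip(emp_cols, t)) for t in [
    (22, "Denim", 28, "Mumbai"), (33, "Alina", 25, "Delhi"),
    (46, "Stephan", 30, "Bangalore"), (52, "Katherine", 36, "Mumbai"),
    (60, "Jack", 40, "Noida"),
]]
department = [dict(zip(dept_cols, t)) for t in [
    (827, 22, "Sales"), (438, 33, "Marketing"), (869, 46, "Finance"),
    (575, 52, "Production"), (678, 60, "Testing"),
]]

joined = natural_join(employee, department)
for row in joined:
    print(row["EMP_ID"], row["EMP_NAME"], row["DEPT_ID"], row["DEPT_NAME"])
```

The join produces one row per employee, with no extra (spurious) tuples, which is what a lossless decomposition requires.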


To check whether a decomposition of R into R1 and R2 is a lossless join
decomposition using the FD set, the following conditions must hold:

● The union of the attributes of R1 and R2 must be equal to the attributes
of R. Each attribute of R must be in R1 or in R2.
○ Att(R1) U Att(R2) = Att(R)
● The intersection of the attributes of R1 and R2 must not be empty.
○ Att(R1) ∩ Att(R2) ≠ Φ
● The common attributes must form a key for at least one relation
(R1 or R2).
○ Att(R1) ∩ Att(R2) -> Att(R1) or Att(R1) ∩ Att(R2) -> Att(R2)
A relation R (A, B, C, D) has FD set {A->BC}. Perform a decomposition
and check whether it is lossy or lossless.
● R is decomposed into R1(ABC) and R2(AD)
● First condition holds true as Att(R1) U Att(R2) = (ABC) U (AD) =
(ABCD) = Att(R).
● Second condition holds true as Att(R1) ∩ Att(R2) = (ABC) ∩ (AD) ≠ Φ
● Third condition holds true as Att(R1) ∩ Att(R2) = A is a key of
R1(ABC) because A->BC is given. (The common attribute must be a key
of at least one relation.)
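
The three conditions translate directly into a closure-based check for a decomposition into two relations. The sketch below assumes single-letter attribute names; `closure` and `is_lossless_binary` are names chosen for the example.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r, r1, r2, fds):
    """Check the three conditions for decomposing R into R1 and R2."""
    r, r1, r2 = set(r), set(r1), set(r2)
    if r1 | r2 != r:            # condition 1: union covers R
        return False
    common = r1 & r2
    if not common:              # condition 2: intersection is not empty
        return False
    reach = closure(common, fds)
    return r1 <= reach or r2 <= reach   # condition 3: common attrs are a key


fds = [({"A"}, {"B", "C"})]
print(is_lossless_binary("ABCD", "ABC", "AD", fds))  # True
print(is_lossless_binary("ABCD", "AB", "CD", fds))   # False
```

The second call fails because AB and CD share no attribute.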
Algorithm - Testing for Lossless decomposition
Algorithm to check if decomposition is lossy or lossless
Step 1 − Create a table with M rows and N columns
M = number of decomposed relations.
N = number of attributes of the original relation.
Step 2 − If a decomposed relation Ri has attribute A then
Insert a symbol (say ‘a’) at position (Ri, A)
Step 3 − Consider each FD X->Y
If two or more rows have symbols in every column of X and
at least one of those rows has a symbol in column Y then
Insert the symbol in column Y of the other rows as well.
Repeat Step 3 until no more symbols can be inserted.
Step 4 − If any row is completely filled with symbols then
Decomposition is lossless.
Else
Decomposition is lossy.
Problem

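Consider R(A,B,C,D,E), F:{A->B, BC->E, ED->A}

R is decomposed into R1(AB) and R2(ACDE). Check whether the decomposition
is lossy or lossless.
Step 1
Create a table with 2 rows (R1, R2) and 5 columns (A, B, C, D, E).

Step 2
Insert ‘a’ for every attribute that each relation contains.

   A B C D E
R1 a a
R2 a   a a a

Step 3
Both rows have ‘a’ in column A and R1 has ‘a’ in column B.
Now let us insert symbol ‘a’ for A->B in second column, second row.

   A B C D E
R1 a a
R2 a a a a a

R2 is completely filled => decomposition is lossless.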
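
A Python sketch of this test is given below. Note that it uses the fuller textbook tableau (chase) variant rather than the single-symbol table: a cell holds the distinguished symbol 'a' when the relation contains the attribute and a row-specific placeholder such as 'b0' otherwise, and two rows that agree on X are made to agree on Y. The function name `is_lossless` is an assumption for the example.

```python
def is_lossless(attributes, decomposition, fds):
    """Tableau (chase) test for a lossless-join decomposition."""
    attributes = list(attributes)
    # Steps 1-2: one row per relation; 'a' where the relation has the attribute
    table = [{A: "a" if A in Ri else f"b{i}" for A in attributes}
             for i, Ri in enumerate(decomposition)]
    changed = True
    while changed:                       # Step 3, repeated until stable
        changed = False
        for lhs, rhs in fds:
            for i, row_i in enumerate(table):
                for row_j in table[i + 1:]:
                    if not all(row_i[x] == row_j[x] for x in lhs):
                        continue
                    for y in rhs:
                        if row_i[y] == row_j[y]:
                            continue
                        # Equate the two symbols, preferring 'a'
                        pair = (row_i[y], row_j[y])
                        new = "a" if "a" in pair else min(pair)
                        old = pair[1] if pair[0] == new else pair[0]
                        for row in table:
                            if row[y] == old:
                                row[y] = new
                        changed = True
    # Step 4: lossless if some row consists only of 'a'
    return any(all(v == "a" for v in row.values()) for row in table)


fds = [({"A"}, {"B"}), ({"B", "C"}, {"E"}), ({"E", "D"}, {"A"})]
print(is_lossless("ABCDE", [set("AB"), set("ACDE")], fds))  # True
```

For the problem above, applying A->B turns the B cell of the R2 row into 'a', so that row becomes all 'a' and the function reports a lossless decomposition.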


Dependency Preserving

● Dependency preservation is an important property of a decomposition.
● In dependency preservation, every dependency must be satisfied by at
least one decomposed table or be derivable from the dependencies that are.
● If a relation R is decomposed into relations R1 and R2, then each
dependency of R must either be a part of R1 or R2 or be derivable
from the combination of the functional dependencies of R1 and R2.
● For example, suppose there is a relation R (A, B, C, D) with functional
dependency set (A->BC).
● The relation R is decomposed into R1(ABC) and R2(AD), which is
dependency preserving because FD A->BC is a part of relation R1(ABC).
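
Dependency preservation can also be tested without computing the projected FD sets explicitly. The sketch below follows the standard closure-based test: for each FD X->Y it grows X using only what each decomposed relation can see, then checks that Y was reached. The names `closure` and `preserves_dependencies` are chosen for the example.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves_dependencies(decomposition, fds):
    """Return True if every FD can be enforced using the decomposed relations."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                # What Ri alone can derive from the attributes found so far
                gained = closure(result & ri, fds) & ri
                if not gained <= result:
                    result |= gained
                    changed = True
        if not rhs <= result:
            return False
    return True


fds = [({"A"}, {"B", "C"})]
print(preserves_dependencies([set("ABC"), set("AD")], fds))  # True
```

By the same test, the decomposition R1(AC), R2(BDE), R3(AB) from Example 4 does not preserve D->C, since no decomposed relation contains both D and C and the FD cannot be derived from the others.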
End of Module-IV
