Database Normalization

The document discusses database normalization. It begins by defining normalization and its goals of reducing data duplication and related problems. It then discusses the reasons for normalization, including examples of data duplication problems. The document outlines the various normal forms from 1NF to BCNF and how to achieve each form through decomposition. It defines key concepts like functional dependencies and explains how different normal forms address partial and transitive functional dependencies that can cause issues. Finally, it summarizes the goals and techniques of normalization for the first three normal forms and BCNF in removing unnecessary data duplication.

Uploaded by

Ram Bler
Copyright
© Attribution Non-Commercial (BY-NC)

Database Normalisation

What is Database Normalisation?
Why do we need Database Normalisation?
How do we perform Database Normalisation?
Summary of Database Normalisation.

What is Normalisation

Normalisation uses a set of restrictions to exclude undesirable properties from a database design.

The restrictions are called normal forms.

Normalisation achieves a good database design by reducing the amount of data duplication.

Why need Normalisation


Review of logical database design:

Data analysis
To identify the natural structure of the information, called the conceptual data model.
Data analysis methods: top-down and bottom-up.

Data design
To determine the logical database structure, represented by a set of relations (tables), their attributes and their keys.

Why need Normalisation


Example:

Order_No  Area    S_Off   C_No  C_Name     P_NO  P_Name      QTY
01        W York  Leeds   C1    Nippers    P1    Pantaloons  100
01        W York  Leeds   C1    Nippers    P2    Pantaloons  50
02        Middl   Oxford  C2    Tots-Gear  P1    Pantaloons  100
02        Middl   Oxford  C2    Tots-Gear  P5    Pinafore    200
03        Middl   Oxford  C2    Tots-Gear  P3    Socks       50
04        Middl   Oxford  C9    Kid-Naps   P3    Socks       50

What problems are there in this table?

Why need Normalisation


Three undesirable consequences of data duplication:

Unnatural representation of information.

The database is difficult to update.

More storage space is required.

How to do Normalisation
Normalisation makes a more natural representation of the information by:

Removing database update problems;

Making the database smaller.

It is carried out in stages, using:
First Normal Form (1NF);
Second Normal Form (2NF) and Third Normal Form (3NF);
Boyce-Codd Normal Form (BCNF);
Fourth and Fifth Normal Form (4NF and 5NF).

First Normal Form (1NF)


1NF: tuples contain only atomic attributes, which cannot be split into any smaller components.

1NF ensures that each attribute represents a single fact.

Attributes that contain sub-attributes, and repeating groups of attributes, are not permitted.

First Normal Form (1NF)


Example: 0NF_CUSTOMER_ORDER

CUSTOMER = (C_No, C_Name); ORDER_LINES = repeating group of (PRODUCT = (P_NO, P_Name), QTY)

Order_No  Area    S_Off   C_No  C_Name     P_NO  P_Name      QTY
01        W York  Leeds   C1    Nippers    P1    Pantaloons  100
                                           P2    Pantaloons  50
02        Middl   Oxford  C2    Tots-Gear  P1    Pantaloons  100
                                           P5    Pinafore    200
03        Middl   Oxford  C2    Tots-Gear  P3    Socks       50
04        Middl   Oxford  C9    Kid-Naps   P3    Socks       50

First Normal Form (1NF)


1NF: Remove the attributes that are combinations of other attributes.

1NF_CUSTOMER_ORDER

Order_No  Area    S_Off   C_No  C_Name     P_NO  P_Name      QTY
01        W York  Leeds   C1    Nippers    P1    Pantaloons  100
01        W York  Leeds   C1    Nippers    P2    Pantaloons  50
02        Middl   Oxford  C2    Tots-Gear  P1    Pantaloons  100
02        Middl   Oxford  C2    Tots-Gear  P5    Pinafore    200
03        Middl   Oxford  C2    Tots-Gear  P3    Socks       50
04        Middl   Oxford  C9    Kid-Naps   P3    Socks       50

Again, what is the problem in this table?

2NF and 3NF (Key)


Review of the concept of the key of a relation:
A key is one attribute, or a group of attributes, that uniquely identifies the tuples of a relation. A relation may have several keys, called candidate keys. The candidate key that is chosen for tuple identification is called the primary key. A primary key must never have a null value, and it may be used as a foreign key to represent relationships between objects.

2NF and 3NF (FD)


Functional Dependency (FD)
An FD defines the relationship between a key (identifying attributes) and other attributes (representing single-valued facts).

Employee_No  Name      Home_Address
123          J Bloggs  25 High Street, Leeds
314          D Brown   5 Low Street, Leeds
127          R Bloggs  25 High Street, Leeds
500          J Smith   19 Grand Way, Leeds
512          J Smith   20 Parsons View, Leeds

An employee is identified by the Employee_No. Name and Home_Address represent single-valued facts about the employee.

2NF and 3NF (FD)


Relationships between Employee_No, Name and Home_Address:

Employee_No is the key, so:
Employee_No → Name
Employee_No → Home_Add
Employee_No → Name, Home_Add

X → Y: X determines Y.

What about Name → Employee_No, or Name → Home_Address?

They are not correct! Two employees are both named J Smith, but they have different employee numbers and different home addresses.

2NF and 3NF (FD)


Functional Dependency (FD)
X → Y means: the value of Y represents a single-valued fact, and this fact is identified by the value of X.
X is called the determinant.
Y is called the determined attribute(s).
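The definition can be tried out on the employee data with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name fd_holds and the list-of-dicts layout are assumptions made for illustration, and the rows are the employee table as read from the slide. Note that a check like this can only show that sample data does not contradict an FD; whether the FD really holds depends on the meaning of the attributes.

```python
def fd_holds(rows, determinant, determined):
    """Return True if no two rows agree on `determinant` but differ on `determined`."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in determinant)
        y = tuple(row[a] for a in determined)
        # The first time we see x, remember its y; afterwards it must match.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Employee rows as read from the slide (hypothetical encoding as dicts).
EMPLOYEE = [
    {"Employee_No": 123, "Name": "J Bloggs", "Home_Address": "25 High Street, Leeds"},
    {"Employee_No": 314, "Name": "D Brown", "Home_Address": "5 Low Street, Leeds"},
    {"Employee_No": 127, "Name": "R Bloggs", "Home_Address": "25 High Street, Leeds"},
    {"Employee_No": 500, "Name": "J Smith", "Home_Address": "19 Grand Way, Leeds"},
    {"Employee_No": 512, "Name": "J Smith", "Home_Address": "20 Parsons View, Leeds"},
]

key_determines_rest = fd_holds(EMPLOYEE, ["Employee_No"], ["Name", "Home_Address"])
name_determines_no = fd_holds(EMPLOYEE, ["Name"], ["Employee_No"])
name_determines_address = fd_holds(EMPLOYEE, ["Name"], ["Home_Address"])
```

Here key_determines_rest is True, while the two checks that use Name as the determinant are False, because the two J Smith rows disagree.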

2NF and 3NF (Partial FD)


Partial FD on a key: the determinant is part of a key.

In the 1NF_CUSTOMER_ORDER table (Order_No, Area, S_Off, C_No, C_Name, P_No, P_Name, QTY), what are the FDs?

(Order_No, P_No) → QTY
Order_No → C_No, C_Name, Area, S_Off
P_No → P_Name
C_No → C_Name, Area, S_Off
Area → S_Off

2NF and 3NF (Partial FD)


Partial FD on a key: the determinant is part of a key.

(Order_No, P_No) → QTY
Order_No → C_No, C_Name, Area, S_Off
P_No → P_Name
C_No → C_Name, Area, S_Off
Area → S_Off

There are two partial FDs on the key (Order_No, P_No):
Order_No → C_No, C_Name, Area, S_Off
P_No → P_Name
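The test "the determinant is part of a key" can be written as a proper-subset check. The sketch below is illustrative only: the function name is_partial_on_key and the encoding of each FD as a (determinant, determined) pair of tuples are assumptions, not something the slides prescribe.

```python
def is_partial_on_key(determinant, key):
    """An FD is partial on `key` when its determinant is a non-empty proper subset of the key."""
    return bool(determinant) and set(determinant) < set(key)


KEY = ("Order_No", "P_No")

# The five FDs of 1NF_CUSTOMER_ORDER, each as (determinant, determined).
FDS = [
    (("Order_No", "P_No"), ("QTY",)),
    (("Order_No",), ("C_No", "C_Name", "Area", "S_Off")),
    (("P_No",), ("P_Name",)),
    (("C_No",), ("C_Name", "Area", "S_Off")),
    (("Area",), ("S_Off",)),
]

partial_fds = [fd for fd in FDS if is_partial_on_key(fd[0], KEY)]
```

Only the FDs with determinants Order_No and P_No are selected. The FDs on C_No and Area are not partial, because their determinants are not part of the key.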

2NF and 3NF (Partial FD)


A partial FD on the key occurs when the key includes the identifiers of more than one object.
The existence of a partial FD on the key results in data duplication and makes the database inconvenient to update.
2NF and 3NF prohibit the situation where each tuple includes information about more than one object.

To design a database in 2NF, it is necessary to split the relation using the project operation of relational algebra.

2NF (Method)
The second normalisation of a relation (2NF):
Step 1: For each partial FD (X → Y) on a key, form a new relation (X, Y), in which X is the key.

Step 2: Form another relation from the key of the original relation and the attributes that are not functionally determined by X.

2NF (Example)
Example: attributes (Order_No, Area, S_Off, C_No, C_Name, P_No, P_Name, QTY)
Partial FDs on a key:
Order_No → C_No, C_Name, Area, S_Off
P_No → P_Name

Step 1: (X, Y)
2NF_Product(P_No, P_Name)
2NF_CUSTOMER_ORDER(Order_No, C_No, C_Name, Area, S_Off)

Step 2: (original key, others)
2NF_ORDER_LINE(Order_No, P_No, QTY)

2NF (Example)
Results:
1NF_CUSTOMER_ORDER(Order_No, Area, S_Off, C_No, C_Name, P_No, P_Name, QTY)
is split into:
2NF_Product(P_No, P_Name)
2NF_CUSTOMER_ORDER(Order_No, C_No, C_Name, Area, S_Off)
2NF_ORDER_LINE(Order_No, P_No, QTY)
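The split into the three 2NF relations is a set of projections of the 1NF table. The following Python sketch shows the effect on the example data. The helper name project and the list-of-dicts representation are assumptions for illustration; the rows are those of the 1NF_CUSTOMER_ORDER example.

```python
def project(rows, attributes):
    """Relational projection: keep only `attributes` and drop duplicate tuples."""
    result = []
    for row in rows:
        t = tuple(row[a] for a in attributes)
        if t not in result:
            result.append(t)
    return result


COLUMNS = ("Order_No", "Area", "S_Off", "C_No", "C_Name", "P_No", "P_Name", "QTY")
DATA = [
    ("01", "W York", "Leeds", "C1", "Nippers", "P1", "Pantaloons", 100),
    ("01", "W York", "Leeds", "C1", "Nippers", "P2", "Pantaloons", 50),
    ("02", "Middl", "Oxford", "C2", "Tots-Gear", "P1", "Pantaloons", 100),
    ("02", "Middl", "Oxford", "C2", "Tots-Gear", "P5", "Pinafore", 200),
    ("03", "Middl", "Oxford", "C2", "Tots-Gear", "P3", "Socks", 50),
    ("04", "Middl", "Oxford", "C9", "Kid-Naps", "P3", "Socks", 50),
]
ROWS = [dict(zip(COLUMNS, r)) for r in DATA]

# Step 1: one relation (X, Y) per partial FD X -> Y.
product = project(ROWS, ("P_No", "P_Name"))
customer_order = project(ROWS, ("Order_No", "C_No", "C_Name", "Area", "S_Off"))

# Step 2: the original key plus the attributes not determined by part of it.
order_line = project(ROWS, ("Order_No", "P_No", "QTY"))
```

On this data the product relation has four tuples, the customer order relation has four, and the order line relation keeps all six, so each product name and each order's customer details are now stored once instead of once per order line.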

2NF and 3NF


The third normalisation of a relation (3NF):
2NF prohibits a relation from containing more than one object when key attributes are used as the identifiers of the objects.

3NF, building on 2NF, prohibits a relation from containing more than one object when a non-key attribute is used as an identifier.

3NF (Transitive FD)


The third normalisation of a relation (3NF):
Transitive FD of a non-key attribute on a key: a non-key attribute is determined by another non-key attribute, and the latter is determined by a key.

If attribute X is a key, and attributes Y and Z are not included in any key, then there is a transitive FD of Z on X if X → Y and Y → Z.

3NF (Example)
Example:
2NF_CUSTOMER_ORDER(Order_No, C_No, C_Name, Area, S_Off)
Order_No → C_No, C_Name, Area, S_Off
C_No → C_Name, Area, S_Off
Order_No → C_No → C_Name, Area, S_Off

This relation can be split into two relations. One is identified by the key (Order_No) and the other is identified by the non-key attribute (C_No).

3NF (Method)
The third normalisation method:
Step 1: For X → Y → Z, a new relation is formed by (Y, Z).

Step 2: Another new relation is formed by (X, Y, others).

2NF_CUSTOMER_ORDER(Order_No, C_No, C_Name, Area, S_Off)
Order_No → C_No → C_Name, Area, S_Off

is split into:
3NF_ORDER(Order_No, C_No)
3NF_CUSTOMER(C_No, C_Name, Area, S_Off)
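A short Python sketch can confirm that this split loses no information: joining the two new relations on C_No gives back the original tuples. The variable names and the tuple layout (Order_No, C_No, C_Name, Area, S_Off) are assumptions for illustration, and the rows are taken from the customer order example.

```python
# 2NF_CUSTOMER_ORDER tuples: (Order_No, C_No, C_Name, Area, S_Off)
CUSTOMER_ORDER = [
    ("01", "C1", "Nippers", "W York", "Leeds"),
    ("02", "C2", "Tots-Gear", "Middl", "Oxford"),
    ("03", "C2", "Tots-Gear", "Middl", "Oxford"),
    ("04", "C9", "Kid-Naps", "Middl", "Oxford"),
]

# Step 1: relation (Y, Z) = (C_No, C_Name, Area, S_Off), duplicates removed.
customer = sorted({row[1:] for row in CUSTOMER_ORDER})

# Step 2: relation (X, Y) = (Order_No, C_No).
order = sorted({row[:2] for row in CUSTOMER_ORDER})

# Natural join of the two relations on C_No.
rejoined = sorted(
    (order_no, c_no, c_name, area, s_off)
    for (order_no, c_no) in order
    for (cust_no, c_name, area, s_off) in customer
    if c_no == cust_no
)
```

The customer relation holds three tuples (C1, C2, C9), so the details of customer C2 are stored once rather than once per order, and rejoined equals the original four tuples.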

Strong 3NF: BCNF


BCNF (Boyce-Codd Normal Form) requires that:

1) Each tuple represents only one object;

2) Every determinant of an FD (X in X → Y) is a candidate key.

BCNF is stronger than 3NF, as 3NF does not require X to be a candidate key.

Strong 3NF: BCNF


3NF_STOCK

P_No  Ware_house  Bin_No  QTY
P1    WH1         B1      100
P1    WH1         B3      200
P2    WH3         B2      3000
P5    WH4         B9      50
P5    WH4         B10     50
P5    WH4         B11     50

Keys: (Ware_house, Bin_No); (P_No, Bin_No)

If each type of product is required to be stored at only one site, i.e. P_No → Ware_house, then we have:
P_No → Ware_house
(P_No, Bin_No) → QTY
(Ware_house, Bin_No) → P_No, QTY

Strong 3NF: BCNF


In 3NF_STOCK:
P_No → Ware_house
(P_No, Bin_No) → QTY
(Ware_house, Bin_No) → P_No, QTY

Keys: (Ware_house, Bin_No); (P_No, Bin_No)

Is P_No a key?

Strong 3NF: BCNF


As P_No is not a key, this relation is not in BCNF.
A BCNF design is generated by repeatedly applying the following process: for a relation (X, Y, Z) in which X → Y but X is not a key, form a new relation (X, Y) and another new relation (X, Z).

3NF_STOCK(P_No, Ware_house, Bin_No, Quantity)

P_No → Ware_house, but P_No is not a key.

The relation is therefore split into:
(P_No, Ware_house)
(P_No, Bin_No, Quantity)
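Whether a determinant is a key can be checked mechanically from the FDs. The sketch below uses attribute closure, a standard textbook technique that these slides do not describe: a determinant is a key (or contains one) when its closure covers every attribute of the relation. The function names closure and bcnf_violations are assumptions for illustration, QTY stands for the Quantity attribute, and only the non-trivial FDs listed for the stock relation are considered.

```python
def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole determinant, we also get what it determines.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """FDs whose determinant does not determine every attribute of `relation`."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(relation) <= closure(lhs, fds)]


STOCK = ("P_No", "Ware_house", "Bin_No", "QTY")
STOCK_FDS = [
    (("P_No",), ("Ware_house",)),
    (("P_No", "Bin_No"), ("QTY",)),
    (("Ware_house", "Bin_No"), ("P_No", "QTY")),
]

violations = bcnf_violations(STOCK, STOCK_FDS)

# After the split, each new relation is checked against the FDs that apply to it.
site_ok = bcnf_violations(("P_No", "Ware_house"), [(("P_No",), ("Ware_house",))])
bin_ok = bcnf_violations(("P_No", "Bin_No", "QTY"), [(("P_No", "Bin_No"), ("QTY",))])
```

For the original relation the only violation reported is P_No → Ware_house, since the closure of P_No is just {P_No, Ware_house}. Both relations produced by the split report no violations.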

4NF and 5NF


1NF to 3NF and BCNF are concerned with single-valued facts. 4NF and 5NF are concerned with multi-valued facts.
4NF: Removes repeating groups of INDEPENDENT multi-valued facts.

5NF: Removes redundancy when there are INTERDEPENDENT multi-valued facts.

Summary: Normalisation
Unnecessary repetition of data causes an inability to represent information, update anomalies, and excessive database size. Normal forms are restrictions that exclude this unnecessary repetition. Normal forms (NFs) are defined in terms of functional dependency:

X → Y models the relationship between the attributes X and the attributes Y.

Normalisation is the process of applying the constraints of the NFs.

Summary:1-3NFs and BCNF


1NF, 2NF, 3NF and BCNF are defined to remove unnecessary duplication of data that represents single-valued facts.

1NF: the use of atomic attributes.

2NF: every non-key attribute depends on the WHOLE key (no partial FDs on a key).
3NF: every non-key attribute depends directly on the key, not on another non-key attribute (no transitive FDs).

BCNF: 3NF plus all determinants in the relation are candidate keys.

Reading: 4.10-4.14

Exercises: 4.8-4.17
