Database Design-Functional Dependencies and Normalization
UNIT-
Chapter 10
Database Design: Functional Dependencies and Normalization for Relational Databases
10.1 Informal Design Guidelines for Relational Schemas
Informal measures of quality for relational schemas:

1. Semantics of the Attributes
[Figure: Department relation]
Guideline 1
• Design a relation schema so that it is easy to explain its meaning.
• Do not combine attributes from multiple entity types and relationship types into a single relation.
• Emp_dept mixes attributes of employees and departments.
• Emp_proj mixes attributes of employees and projects.
• Although nothing is logically wrong with these relations, they are considered poor designs.
[Figure: Emp_dept and Emp_proj relations]
2. Redundant Information in Tuples and Update Anomalies
• Minimize redundancy so that storage space is not wasted.
• In Emp_dept, the attribute values pertaining to a particular department are repeated for every employee who works for that department.
• In contrast, each department's information appears only once in the Department relation.
• Redundancy leads to insertion, deletion and modification (update) anomalies.
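Insertion anomaly:
• To insert a new employee who works in department 5, we must enter the attribute values of department 5 so that they are consistent with the values already stored in the other tuples for department 5; otherwise the database becomes inconsistent.
• To insert a new department that has no employees yet, we would have to place NULLs in the employee attributes, but this is not possible because SSN is the primary key of Emp_dept and a primary key cannot be NULL.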
Deletion anomaly:
• If we delete the employee tuple that represents the last employee working for a department, the information about that department is lost from the database.
Modification anomaly:
• If we change the value of an attribute of a particular department (e.g., its manager), we must update the tuples of all employees who work in that department.
• If we fail to update some of these tuples, the database becomes inconsistent.
Guideline 2
• Design the base relation schemas so that no insertion, deletion or modification anomalies are present in the relations.
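The modification anomaly can be reproduced in a few lines. This is a minimal Python sketch on a hypothetical Emp_dept instance (the rows and the helper names update_one_tuple and inconsistent_departments are illustrative assumptions, not part of the original design): it renames department 5 in only one tuple and then reports which Dnum values are now stored with more than one Dname.

```python
# Hypothetical Emp_dept instance: department data is repeated per employee.
emp_dept = [
    {"ssn": "111", "ename": "Smith", "dnum": 5, "dname": "Research"},
    {"ssn": "222", "ename": "Wong", "dnum": 5, "dname": "Research"},
    {"ssn": "333", "ename": "Zelaya", "dnum": 4, "dname": "Administration"},
]


def update_one_tuple(rows, ssn, new_dname):
    """Careless update: changes Dname only in the tuple of one employee."""
    for row in rows:
        if row["ssn"] == ssn:
            row["dname"] = new_dname


def inconsistent_departments(rows):
    """Return the Dnum values that are now stored with more than one Dname."""
    names = {}
    for row in rows:
        names.setdefault(row["dnum"], set()).add(row["dname"])
    return sorted(dnum for dnum, dnames in names.items() if len(dnames) > 1)


update_one_tuple(emp_dept, "111", "R&D")
print(inconsistent_departments(emp_dept))  # [5]
```

With a separate Department relation the name would be stored once, so this kind of inconsistency could not arise from a single-tuple update.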
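3. NULL Values in Tuples
• NULLs waste storage space.
• It is unclear how to account for them when aggregate functions such as COUNT and SUM are applied.
• A NULL is ambiguous: it may mean that the value is unknown, that the attribute does not apply to the tuple, or that the value exists but is absent (not recorded).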
Guideline 3
• As far as possible, avoid placing attributes whose values may frequently be NULL in a base relation.
• If NULLs are unavoidable, make sure that they apply only in exceptional cases and not to the majority of tuples in the relation.
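4. Generation of Spurious Tuples
• Suppose Emp_proj1 and Emp_locs are used as base relations instead of Emp_proj.
• We cannot recover the information that was originally in Emp_proj by joining Emp_proj1 and Emp_locs: the natural join produces extra (spurious) tuples that were not in Emp_proj.
• This happens because the two relations can be joined only on Ploc, an attribute that is neither a primary key nor a foreign key in either relation.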
[Figure: Emp_proj decomposed into Emp_locs(Ename, Ploc) and Emp_proj1]
Guideline 4
• Design relation schemas so that they can be joined with equality conditions on primary key and foreign key attributes, in a way that guarantees that no spurious tuples are generated.
• Avoid relations that contain matching attributes that are not (primary key, foreign key) combinations, because joining on such attributes may produce spurious tuples.
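The problem can be seen on a tiny instance. The Python sketch below uses a hypothetical Emp_proj state (the tuples and the helper names project, natural_join and as_set are assumptions for illustration): it projects Emp_proj onto Emp_proj1 and Emp_locs, joins them back on Ploc, and counts the tuples that were not in the original relation.

```python
# Hypothetical Emp_proj instance: two employees on different projects
# that happen to share the same location.
emp_proj = [
    {"ssn": "123", "pnum": 1, "hours": 30, "ename": "Smith", "pname": "ProductX", "ploc": "Bellaire"},
    {"ssn": "453", "pnum": 2, "hours": 20, "ename": "English", "pname": "ProductY", "ploc": "Bellaire"},
]


def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result


def natural_join(r, s):
    """Natural join on all attributes that the two relations have in common."""
    result = []
    for t1 in r:
        for t2 in s:
            common = t1.keys() & t2.keys()
            if all(t1[a] == t2[a] for a in common):
                result.append({**t1, **t2})
    return result


def as_set(rows):
    """Order-independent set of tuples, for comparing relation states."""
    return {frozenset(row.items()) for row in rows}


emp_proj1 = project(emp_proj, ["ssn", "pnum", "hours", "pname", "ploc"])
emp_locs = project(emp_proj, ["ename", "ploc"])

joined = natural_join(emp_proj1, emp_locs)  # the only common attribute is ploc
spurious = as_set(joined) - as_set(emp_proj)
print(len(joined), len(spurious))  # 4 2
```

The join contains every original tuple plus two spurious ones (Smith paired with ProductY and English paired with ProductX), so the decomposition is not lossless.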
Functional Dependencies
• A functional dependency (FD), denoted by X → Y, between two sets of attributes X and Y that are subsets of R specifies a constraint on the possible tuples that can form a relation state r of R.
• The constraint is that, for any two tuples t1 and t2 in r that have t1[X] = t2[X], they must also have t1[Y] = t2[Y].
• The values of the X component of a tuple uniquely (functionally) determine the values of the Y component; equivalently, Y is functionally dependent on X.
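The definition translates directly into a test on one relation state r. This is a minimal Python sketch; the function name fd_holds and the sample tuples are hypothetical. A single state can only show that an FD is violated; a state that satisfies the test does not prove that the FD holds for all legal states.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two tuples agree on lhs but differ on rhs."""
    seen = {}  # maps each X-value to the Y-value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # t1[X] = t2[X] but t1[Y] != t2[Y]
    return True


# Hypothetical relation state with attributes SSN, Pnum, Ename, Hours.
r = [
    {"SSN": "123", "Pnum": 1, "Ename": "Smith", "Hours": 32},
    {"SSN": "123", "Pnum": 2, "Ename": "Smith", "Hours": 8},
    {"SSN": "456", "Pnum": 1, "Ename": "Wong", "Hours": 10},
]

print(fd_holds(r, ["SSN"], ["Ename"]))          # True
print(fd_holds(r, ["Pnum"], ["Hours"]))         # False
print(fd_holds(r, ["SSN", "Pnum"], ["Hours"]))  # True
```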
• Consider the relation Emp_proj1(SSN, Ename, Pnum, Pname, Ploc, Hours).
• From the semantics of the attributes, the following FDs hold:
  - SSN → Ename
  - Pnum → {Pname, Ploc}
  - {SSN, Pnum} → Hours
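• An FD cannot be inferred automatically from a given relation state; a particular state can only show that an FD does not hold.
• FDs must be defined explicitly by someone who knows the semantics of the attributes of the relation.
• E.g., Course → Teacher holds only if the rules of the enterprise say that each course is taught by a single teacher.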
Inference Rules for FDs
• The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F, denoted by F+.
• E.g., if Dnum → Mgrssn and Mgrssn → Mgrphone, then Dnum → Mgrphone can be inferred.
• To infer dependencies systematically from a given set of dependencies, the following inference rules are used:
• IR1 (Reflexive rule): If X ⊇ Y, then X → Y.
• IR2 (Augmentation rule): If X → Y, then XZ → YZ.
  IR2 holds because, if t1[XZ] = t2[XZ], then t1[X] = t2[X] and t1[Z] = t2[Z]; since X → Y, t1[Y] = t2[Y]; therefore t1[YZ] = t2[YZ].
• IR3 (Transitive rule): If X → Y and Y → Z, then X → Z.
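• IR4 (Decomposition rule): If X → YZ, then X → Y.
  Proof: YZ → Y (IR1); from X → YZ and YZ → Y, X → Y (IR3).
• IR5 (Union rule): If X → Y and X → Z, then X → YZ.
  Proof: X → XY (IR2, augmenting X → Y with X, since XX = X); XY → YZ (IR2, augmenting X → Z with Y); therefore X → YZ (IR3).
• IR6 (Pseudotransitive rule): If X → Y and WY → Z, then WX → Z.
  Proof: WX → WY (IR2, augmenting X → Y with W); from WX → WY and WY → Z, WX → Z (IR3).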
• A set of functional dependencies F is said to cover another set of functional dependencies E if every dependency in E can be inferred from F, i.e., E ⊆ F+.
• Two sets E and F are equivalent if E+ = F+, i.e., E covers F and F covers E.
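Computing F+ explicitly is exponential, so in practice membership in F+ is tested with the closure X+ of an attribute set: X → Y is in F+ exactly when Y ⊆ X+. The Python sketch below is the standard attribute-closure algorithm (it is not spelled out on these slides; the helper names fd, attribute_closure, implies, covers and equivalent are hypothetical) applied to the Dnum/Mgrssn/Mgrphone example above.

```python
def fd(lhs, rhs):
    """Build an FD from space-separated attribute names, e.g. fd("A B", "C")."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def attribute_closure(attrs, fds):
    """Compute X+: every attribute functionally determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the closure already contains the left side, add the right side.
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def implies(fds, dep):
    """X -> Y is in F+ exactly when Y is a subset of X+ computed under F."""
    lhs, rhs = dep
    return rhs <= attribute_closure(lhs, fds)


def covers(f, e):
    """F covers E if every FD in E can be inferred from F."""
    return all(implies(f, d) for d in e)


def equivalent(f, e):
    """E and F are equivalent if each covers the other (E+ = F+)."""
    return covers(f, e) and covers(e, f)


F = [fd("Dnum", "Mgrssn"), fd("Mgrssn", "Mgrphone")]
E = [fd("Dnum", "Mgrssn Mgrphone"), fd("Mgrssn", "Mgrphone")]

print(implies(F, fd("Dnum", "Mgrphone")))  # True
print(equivalent(F, E))                    # True
```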
Normal Forms Based on Primary Keys
• The normalization process, first proposed by Codd (1972), takes a relation schema through a series of tests to certify whether it satisfies a certain normal form.
• The process proceeds in a top-down fashion, evaluating each relation against the criteria for a normal form and decomposing it if necessary.
• Codd proposed three normal forms (1NF, 2NF and 3NF), all based on a single analytical tool: functional dependencies.
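• Later, fourth normal form (4NF) and fifth normal form (5NF) were proposed, based on multivalued dependencies and join dependencies, respectively.
• This top-down approach is called relational design by analysis (the bottom-up approach is called relational design by synthesis).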
• Normalization of data is a process of analyzing the given relation schemas based on their FDs and primary keys to achieve:
  1) minimizing redundancy
  2) minimizing insertion, deletion and update anomalies
• The process of storing the join of higher-normal-form relations as a base relation, which is in a lower normal form, is known as denormalization.
• Normalization through decomposition should also confer two additional properties:
  1) Lossless join (nonadditive join) property: guarantees that the spurious tuple problem does not occur.
  2) Dependency preservation property: ensures that each FD is represented in some individual relation resulting after decomposition.
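Both properties can be checked mechanically. The Python sketch below (helper names are hypothetical) uses two standard tests that are not stated on these slides: a binary decomposition of R into R1 and R2 is lossless if (R1 ∩ R2) → (R1 − R2) or (R1 ∩ R2) → (R2 − R1) is in F+, and dependency preservation is checked by growing each left-hand side using only FDs that hold on the individual parts. It assumes Emp_proj1 holds every Emp_proj attribute except Ename.

```python
def attribute_closure(attrs, fds):
    """X+ under fds (repeatedly apply every FD whose left side is contained)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def is_lossless_binary(r1, r2, fds):
    """Lossless if (R1 ∩ R2) determines R1 - R2 or R2 - R1."""
    closure = attribute_closure(r1 & r2, fds)
    return (r1 - r2) <= closure or (r2 - r1) <= closure


def preserves_dependencies(parts, fds):
    """Check that every FD in fds follows from the FDs that hold on the parts."""
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in parts:
                # Attributes of ri determined by z ∩ ri, i.e. via FDs projected on ri.
                gained = attribute_closure(z & ri, fds) & ri
                if not gained <= z:
                    z |= gained
                    changed = True
        if not rhs <= z:
            return False
    return True


# Teach(Student, Course, Instructor) from the BCNF section.
teach_fds = [({"Student", "Course"}, {"Instructor"}), ({"Instructor"}, {"Course"})]
teach1, teach2 = {"Student", "Instructor"}, {"Instructor", "Course"}
print(is_lossless_binary(teach1, teach2, teach_fds))        # True
print(preserves_dependencies([teach1, teach2], teach_fds))  # False

# Emp_locs and Emp_proj1 share only Ploc.
emp_fds = [({"SSN"}, {"Ename"}), ({"Pnum"}, {"Pname", "Ploc"}), ({"SSN", "Pnum"}, {"Hours"})]
emp_locs = {"Ename", "Ploc"}
emp_proj1 = {"SSN", "Pnum", "Hours", "Pname", "Ploc"}
print(is_lossless_binary(emp_locs, emp_proj1, emp_fds))     # False
```

The Teach result illustrates that a decomposition can be lossless without preserving every FD; the Emp_locs/Emp_proj1 result matches the spurious tuples discussed under Guideline 4.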
• Superkey: a superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
• Key: a key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
• If a relation schema has more than one key, each is called a candidate key.
• One of the candidate keys is designated as the primary key; the others are called secondary keys.
• An attribute of R is called prime if it is a member of some candidate key; all other attributes are called nonprime.
• Unique key: an attribute (or set of attributes) that uniquely identifies each row in a table; unlike a primary key, it allows NULL values.
• Primary key: an attribute or combination of attributes that uniquely identifies a row in a table and cannot be NULL.
• Foreign key: an attribute or combination of attributes whose values match a primary key in another table.
• Composite (compound) key: a key that consists of two or more attributes.
• Candidate key: any key of a table that could be chosen as the primary key.
• Alternate (secondary) key: a candidate key that was not chosen as the primary key.
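Keys can be found from the FDs with attribute closure: S is a superkey when S+ contains every attribute of R, and a key is a superkey with no smaller key inside it. The Python sketch below is a brute-force search (exponential, so only suitable for small schemas; helper names are hypothetical) run on the lots relation with FD1 to FD4 from the 2NF example later in this section.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """X+ under fds (repeatedly apply every FD whose left side is contained)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def is_superkey(attrs, schema, fds):
    """S is a superkey of R if S+ contains every attribute of R."""
    return attribute_closure(attrs, fds) >= schema


def candidate_keys(schema, fds):
    """Brute force: minimal superkeys, tried from the smallest size upward."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(s, schema, fds):
                keys.append(s)
    return keys


def prime_attributes(schema, fds):
    """Attributes that belong to at least one candidate key."""
    return set().union(*candidate_keys(schema, fds))


lots = {"Prop_id", "Country_name", "Lotno", "Area", "Price", "Taxrate"}
lots_fds = [
    ({"Prop_id"}, {"Country_name", "Lotno", "Area", "Price", "Taxrate"}),  # FD1
    ({"Country_name", "Lotno"}, {"Prop_id", "Area", "Price", "Taxrate"}),  # FD2
    ({"Country_name"}, {"Taxrate"}),                                       # FD3
    ({"Area"}, {"Price"}),                                                 # FD4
]

print(sorted(sorted(k) for k in candidate_keys(lots, lots_fds)))
# [['Country_name', 'Lotno'], ['Prop_id']]
print(sorted(prime_attributes(lots, lots_fds)))  # ['Country_name', 'Lotno', 'Prop_id']
```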
First Normal Form (1NF)
• 1NF states that the domain of an attribute must include only atomic (simple, indivisible) values, and that the value of any attribute in a tuple must be a single value from the domain of that attribute.
• 1NF disallows a set of values, a tuple of values, or a combination of both as an attribute value of a single tuple.
• Consider the Department relation: it is not in 1NF because Dloc is not atomic (a department can have several locations).
• Since Dnum is the primary key, each Dnum value is associated with a whole set of Dloc values. There are three main techniques to bring the relation into 1NF:
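1) Remove the attribute Dloc that violates 1NF and place it in a separate relation Dept_locations, together with the primary key Dnum of Department. The primary key of Dept_locations is {Dnum, Dloc}. This yields two 1NF relations.
2) Expand the key so that there is a separate tuple in Department for each location of a department. The primary key becomes {Dnum, Dloc}; this introduces redundancy.
3) If the maximum number of values of Dloc is known (e.g., three), replace Dloc by three atomic attributes (Dloc1, Dloc2, Dloc3) and use NULL where a department has fewer locations. This introduces NULL values and spurious semantics (an artificial ordering among the location values).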
Department
Dname           Dnum  Dmgrssn
Research        5     3334445
Administration  4     9876543
Headquarters    1     8886677

Dept_locations
Dnum  Dloc
5     Bellaire
5     Sugarland
5     Houston
4     Stafford
1     Houston
Department (key expanded to {Dnum, Dloc})
Dname           Dnum  Dmgrssn  Dloc
Research        5     3334445  Bellaire
Research        5     3334445  Sugarland
Research        5     3334445  Houston
Administration  4     9876543  Stafford
Headquarters    1     8886677  Houston
Emp_proj (not in 1NF: Pnumber and Hours are multivalued)
SSN   Ename            Pnumber  Hours
134   Smith, John      1, 2     3, 43
5678  Narayan, Joyce   3        40
913   Ramesh, Rakesh   4, 5     0, 10
4567  Wong, Franklin   6, 7     35, 10
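The first two 1NF techniques are simple transformations of the data. In this Python sketch the set-valued Dloc attribute is modelled as a Python list named Dlocs, Dmgrssn is omitted for brevity, and the helper names split_off and expand_key are hypothetical.

```python
# Department with a multivalued Dloc attribute (not in 1NF).
department = [
    {"Dname": "Research", "Dnum": 5, "Dlocs": ["Bellaire", "Sugarland", "Houston"]},
    {"Dname": "Administration", "Dnum": 4, "Dlocs": ["Stafford"]},
    {"Dname": "Headquarters", "Dnum": 1, "Dlocs": ["Houston"]},
]


def split_off(rows, key, multi, new_attr):
    """Technique 1: move a multivalued attribute into its own relation."""
    base = [{a: v for a, v in row.items() if a != multi} for row in rows]
    moved = [{key: row[key], new_attr: value} for row in rows for value in row[multi]]
    return base, moved


def expand_key(rows, multi, new_attr):
    """Technique 2: one tuple per value; the key grows to include new_attr."""
    return [
        {**{a: v for a, v in row.items() if a != multi}, new_attr: value}
        for row in rows
        for value in row[multi]
    ]


dept, dept_locations = split_off(department, "Dnum", "Dlocs", "Dloc")
expanded = expand_key(department, "Dlocs", "Dloc")

for t in dept_locations:
    print(t["Dnum"], t["Dloc"])
print(len(expanded))  # 5
```

Technique 1 stores each department name once; technique 2 repeats "Research" three times, which is the redundancy noted above.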
Second Normal Form (2NF)
• 2NF is based on the concept of full functional dependency. A relation schema R is in 2NF if it is in 1NF and every nonprime attribute A in R is fully functionally dependent on the primary key of R.
• An FD X → Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more.
• An FD X → Y is a partial dependency if some attribute A ∈ X can be removed from X and the dependency still holds.
• SSN → Ename is an example of a full functional dependency in Emp_dept (its left-hand side has a single attribute).
• In contrast, {SSN, Pnumber} → Ename is a partial dependency in Emp_proj, because SSN → Ename holds.
• Emp_proj, whose primary key is {SSN, Pnumber}, is in 1NF but not in 2NF.
• The given FDs are:
  - {SSN, Pnumber} → Hours
  - {SSN, Pnumber} → Ename
  - {SSN, Pnumber} → {Pname, Plocations}
• All of these hold, but Ename, Pname and Plocations are only partially dependent on the primary key, because SSN → Ename and Pnumber → {Pname, Plocations}.
Emp_proj(SSN, Pnumber, Hours, Ename, Pname, Plocations) is decomposed into:
EP1(SSN, Pnumber, Hours)
EP2(SSN, Ename)
EP3(Pnumber, Pname, Plocations)
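Partial dependencies can be detected by taking the closure of every proper subset of each candidate key. The Python sketch below (helper names are hypothetical) reports them for Emp_proj and then moves each partially dependent group into its own relation together with its determinant. This simple decomposition is only a sketch that works for single-key cases like this one, not a general 2NF algorithm.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """X+ under fds (repeatedly apply every FD whose left side is contained)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def candidate_keys(schema, fds):
    """Brute force: minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if not any(k <= s for k in keys) and attribute_closure(s, fds) >= schema:
                keys.append(s)
    return keys


def partial_dependencies(schema, fds):
    """Map each proper subset of a key to the nonprime attributes it determines."""
    keys = candidate_keys(schema, fds)
    prime = set().union(*keys)
    found = {}
    for key in keys:
        for size in range(1, len(key)):
            for sub in combinations(sorted(key), size):
                s = frozenset(sub)
                dependents = attribute_closure(s, fds) - s - prime
                if dependents:
                    found[s] = found.get(s, set()) | dependents
    return found


def decompose_2nf(schema, fds):
    """Move each partially dependent group out together with its determinant."""
    partial = partial_dependencies(schema, fds)
    moved = set().union(*partial.values())
    return [set(schema) - moved] + [set(s) | deps for s, deps in partial.items()]


emp_proj = {"SSN", "Pnumber", "Hours", "Ename", "Pname", "Plocations"}
emp_proj_fds = [
    ({"SSN", "Pnumber"}, {"Hours"}),
    ({"SSN"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocations"}),
]

for rel in decompose_2nf(emp_proj, emp_proj_fds):
    print(sorted(rel))
```

The three relations printed correspond to EP1, EP3 and EP2 above.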
Third Normal Form (3NF)
• 3NF is based on the concept of transitive dependency.
• A relation schema R is in 3NF if it is in 2NF and no nonprime attribute of R is transitively dependent on the primary key.
• An FD X → Y in a relation schema R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R, and both X → Z and Z → Y hold.
• In Emp_dept, SSN → Dmgrssn is transitive via Dnum, because both SSN → Dnum and Dnum → Dmgrssn hold and Dnum is neither a key nor a subset of the key of Emp_dept.
• Because of this transitive dependency of Dmgrssn on SSN via Dnum, Emp_dept is not in 3NF.
• We therefore decompose Emp_dept into two relations that are free of the transitive dependency, such that the natural join of the two reproduces the original Emp_dept relation.
• It is not strictly necessary to remove partial dependencies before transitive dependencies, but the normal forms are defined in sequence: 2NF removes partial dependencies and 3NF removes transitive dependencies.
[Figure: Emp_dept decomposed into ED1(Ename, SSN, Bdate, Address, Dnum) and ED2]
• General definition of 2NF: a relation schema R is in 2NF if it is in 1NF and every nonprime attribute A in R is not partially dependent on any key of R.
• General definition of 3NF: a relation schema R is in 3NF if it is in 2NF and, whenever a nontrivial functional dependency X → A holds in R, either (a) X is a superkey of R, or (b) A is a prime attribute of R.
• Alternative definition of 3NF: a relation schema R is in 3NF if every nonprime attribute of R meets both of the following conditions:
  - it is fully functionally dependent on every key of R;
  - it is nontransitively dependent on every key of R.
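The general definition can be applied FD by FD: each nontrivial X → A must have a superkey X or a prime A. This Python sketch (helper names hypothetical) checks the FDs exactly as given for a schema, which is sufficient when they are the FDs defined on that schema; FDs for a decomposed relation would have to be projected first. It uses the lots example that follows, with FD1 to FD4, and then lots1 with the FDs restricted to its attributes.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """X+ under fds (repeatedly apply every FD whose left side is contained)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def candidate_keys(schema, fds):
    """Brute force: minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if not any(k <= s for k in keys) and attribute_closure(s, fds) >= schema:
                keys.append(s)
    return keys


def third_nf_violations(schema, fds):
    """List FDs X -> A that break the general 3NF definition."""
    prime = set().union(*candidate_keys(schema, fds))
    violations = []
    for lhs, rhs in fds:
        if attribute_closure(lhs, fds) >= schema:
            continue  # condition (a): X is a superkey
        for a in sorted(rhs - lhs):  # split X -> Y into X -> A, skipping trivial A
            if a not in prime:       # condition (b) fails: A is nonprime
                violations.append((sorted(lhs), a))
    return violations


lots = {"Prop_id", "Country_name", "Lotno", "Area", "Price", "Taxrate"}
lots_fds = [
    ({"Prop_id"}, {"Country_name", "Lotno", "Area", "Price", "Taxrate"}),  # FD1
    ({"Country_name", "Lotno"}, {"Prop_id", "Area", "Price", "Taxrate"}),  # FD2
    ({"Country_name"}, {"Taxrate"}),                                       # FD3
    ({"Area"}, {"Price"}),                                                 # FD4
]
print(third_nf_violations(lots, lots_fds))
# [(['Country_name'], 'Taxrate'), (['Area'], 'Price')]

lots1 = lots - {"Taxrate"}
lots1_fds = [
    ({"Prop_id"}, {"Country_name", "Lotno", "Area", "Price"}),
    ({"Country_name", "Lotno"}, {"Prop_id", "Area", "Price"}),
    ({"Area"}, {"Price"}),
]
print(third_nf_violations(lots1, lots1_fds))  # [(['Area'], 'Price')]
```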
Example: 2NF
[Figure: lots decomposed into lots1 and lots2(Country_name, Taxrate)]
• Prop_id is the primary key of lots, and {Country_name, Lotno} is a candidate key.
• FD1: Prop_id → {Country_name, Lotno, Area, Price, Taxrate}
• FD2: {Country_name, Lotno} → {Prop_id, Area, Price, Taxrate}
• FD3: Country_name → Taxrate; hence Taxrate is partially dependent on the candidate key {Country_name, Lotno}
• FD4: Area → Price
• (FD5: Area → Country_name)
• (FD5 is considered only in the BCNF example.)
• Because of FD3, lots is not in 2NF, so we decompose lots into lots1 and lots2.
Example: 3NF
[Figure: lots1(Prop_id, Country_name, Lotno, Area, Price) decomposed into lots1A and lots1B(Area, Price)]
• FD4 violates 3NF because Area is not a superkey and Price is not a prime attribute.
• To bring lots1 into 3NF, we decompose it into lots1A and lots1B, removing the transitive dependency.
• Price is transitively dependent on each of the candidate keys of lots1 via Area.
• lots1A is in 3NF but not in BCNF because of FD5: Area is not a superkey, and lots1A satisfies 3NF only because Country_name is a prime attribute. So we decompose lots1A into lots1Ax and lots1Ay.
Boyce-Codd Normal Form (BCNF)
• BCNF was proposed as a simpler form of 3NF, but it is stricter than 3NF: every relation in BCNF is also in 3NF, but a relation in 3NF is not necessarily in BCNF.
• A relation schema R is in BCNF if, whenever a nontrivial functional dependency X → A holds in R, X is a superkey of R.
• In lots1A, FD5 (Area → Country_name) holds; lots1A is in 3NF but not in BCNF, because Country_name is a prime attribute (part of the candidate key {Country_name, Lotno}) but Area is not a superkey.
• For example, consider Teach(Student, Course, Instructor), where {Student, Course} is the primary key and the following FDs hold:
  - FD1: {Student, Course} → Instructor
  - FD2: Instructor → Course
• Teach is in 3NF but not in BCNF, so we decompose it into Teach1(Instructor, Student) and Teach2(Instructor, Course).
[Figure: Teach decomposed into Teach1(Student, Instructor) and Teach2(Instructor, Course)]
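The decompositions above follow the usual BCNF decomposition loop: find a violating X → Y in a relation, then split that relation into (X ∪ Y) and (R − Y). The Python sketch below is a brute-force version for small schemas (helper names are hypothetical); it looks for violations among all attribute subsets using closures under the original FDs, so it also catches FDs that only hold on a projection.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """X+ under fds (repeatedly apply every FD whose left side is contained)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def find_bcnf_violation(rel, fds):
    """Find X inside rel that determines other attributes of rel without being a superkey."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            closure = attribute_closure(x, fds)
            determined = (closure & rel) - x
            if determined and not closure >= rel:
                return x, determined
    return None


def bcnf_decompose(schema, fds):
    """Split on a violating X -> Y into (X | Y) and (R - Y) until all parts are in BCNF."""
    pending, done = [set(schema)], []
    while pending:
        rel = pending.pop()
        violation = find_bcnf_violation(rel, fds)
        if violation is None:
            done.append(rel)
        else:
            x, y = violation
            pending.append(x | y)
            pending.append(rel - y)
    return done


teach = {"Student", "Course", "Instructor"}
teach_fds = [({"Student", "Course"}, {"Instructor"}), ({"Instructor"}, {"Course"})]
for rel in bcnf_decompose(teach, teach_fds):
    print(sorted(rel))

lots1a = {"Prop_id", "Country_name", "Lotno", "Area"}
lots1a_fds = [
    ({"Prop_id"}, {"Country_name", "Lotno", "Area"}),
    ({"Country_name", "Lotno"}, {"Prop_id", "Area"}),
    ({"Area"}, {"Country_name"}),  # FD5
]
for rel in bcnf_decompose(lots1a, lots1a_fds):
    print(sorted(rel))
```

On these inputs the sketch yields Teach1/Teach2 and lots1Ax/lots1Ay as described in this section. Each split is lossless, but, as with Teach, FD1 {Student, Course} → Instructor is no longer enforced within a single relation.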
Example: BCNF
[Figure: lots1A(Prop_id, Country_name, Lotno, Area) decomposed into lots1Ax(Prop_id, Area, Lotno) and lots1Ay(Area, Country_name)]
Benefits (Advantages) of Normalization:
• Greater overall database organization