
Database Design-Functional Dependencies and Normalization

The document discusses database design and normalization. It begins by outlining informal guidelines for relational schemas, including imparting clear semantics to attributes, reducing redundant information and null values, and avoiding spurious tuples. It then defines functional dependencies and describes inference rules for determining dependencies. Finally, it introduces the concept of normal forms based on primary keys and the normalization process of decomposing relations to eliminate anomalies and achieve higher normal forms.

Uploaded by Neha Jagadish
Copyright © Attribution Non-Commercial (BY-NC)

UNIT-VI
Chapter 10

Database Design: Functional Dependencies and Normalization for Relational Databases

10.1 Informal Design Guidelines for Relational Schemas
Informal measures of the quality of relational schemas:
• Semantics of the attributes
• Reducing the redundant information in tuples
• Reducing the NULL values in tuples
• Disallowing the possibility of generating spurious tuples

1. Imparting clear semantics to attributes in relations
• Semantics refers to the interpretation of the attribute values in a tuple.
• Meaning of the Employee table: each tuple represents one employee, recording Ename, SSN, birth date, address, and Dno (department number).
• The Department and Project tables are also straightforward.
• The semantics of Dept_locations and Works_on are more complex.
• Dept_locations represents a multivalued attribute, and Works_on represents an M:N relationship between employees and projects.
Employee(Ename, SSN, Bdate, Address, Dno)
Department(Dname, Dnumber, Dmgr_SSN)
Dept_Loc(Dnumber, Dlocation)
Project(Pname, Pnumber, Plocation, Dnum)
Works_on(SSN, Pnumber, Hours)

Guideline 1
• Design a relation schema so that it is easy to explain its meaning.
• Do not combine attributes from multiple entity types and relationship types into a single relation.
• For example, Emp_dept mixes attributes of employees and departments.
• Emp_proj mixes attributes of employees and projects.
• Although there is nothing logically wrong with these relations, they are considered poor designs.

Emp_dept(Ename, SSN, Bdate, Add, Dnum, Dname, Dmgr_SSN)
Emp_proj(SSN, Pnum, Hours, Ename, Pname, Plocation)

2. Redundant information in tuples and update anomalies
• Minimize redundancy so that storage space is not wasted.
• In Emp_dept, the attribute values pertaining to a particular department are repeated for every employee who works for that department.
• In contrast, each department's information appears only once in the Department relation.
• Such redundancy may lead to insertion, deletion, and modification (update) anomalies.

Insert anomaly:
• To insert a new employee who works in department 5, we must also enter the attribute values of department 5 so that they are consistent with the values in the other tuples for department 5; otherwise a consistency problem arises.
• To insert a new department that has no employees yet, we would have to place NULLs in the employee attributes, which is not allowed because SSN, the primary key, cannot be NULL.
Delete anomaly:
• If we delete the employee tuple that represents the last employee of a department, the information about that department is lost from the database.


Modification anomaly:
• If we change the value of an attribute of a particular department (for example, its manager), we must make the change in the tuples of all employees who work in that department.
• If we fail to update some of these tuples, the database becomes inconsistent.
Guideline 2
• Design the base relation schemas so that no insertion, deletion, or modification anomalies are present.
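The modification and deletion anomalies described above can be reproduced in a few lines of Python. This is a minimal sketch, assuming Emp_dept is held as a list of dictionaries; the employee names, SSNs, and manager numbers are hypothetical sample values, and the helper names change_manager_in_one_tuple and inconsistent_departments are illustrative only.

```python
# Hypothetical Emp_dept tuples: department data (Dname, Dmgr_SSN) is repeated
# in the tuple of every employee who works for that department.
emp_dept = [
    {"Ename": "Smith", "SSN": "111", "Dnum": 5, "Dname": "Research", "Dmgr_SSN": "333"},
    {"Ename": "Wong", "SSN": "222", "Dnum": 5, "Dname": "Research", "Dmgr_SSN": "333"},
    {"Ename": "Zelaya", "SSN": "999", "Dnum": 4, "Dname": "Administration", "Dmgr_SSN": "987"},
]


def change_manager_in_one_tuple(rows, dnum, new_mgr):
    """Modification anomaly: update the manager in only the first matching tuple."""
    for row in rows:
        if row["Dnum"] == dnum:
            row["Dmgr_SSN"] = new_mgr
            return  # the other tuples of this department keep the old value


def inconsistent_departments(rows):
    """Return the department numbers whose tuples disagree on Dmgr_SSN."""
    first_seen = {}
    bad = set()
    for row in rows:
        mgr = first_seen.setdefault(row["Dnum"], row["Dmgr_SSN"])
        if mgr != row["Dmgr_SSN"]:
            bad.add(row["Dnum"])
    return bad


change_manager_in_one_tuple(emp_dept, 5, "444")
print(inconsistent_departments(emp_dept))  # {5}

# Deletion anomaly: removing the only employee of department 4
# also removes every fact about department 4.
remaining = [row for row in emp_dept if row["SSN"] != "999"]
print(sorted({row["Dnum"] for row in remaining}))  # [5]
```

In the decomposed design (Employee plus Department), the manager is stored once per department, so neither problem can arise.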

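3. NULL values in tuples
• NULLs waste storage space.
• It is unclear how to account for them in aggregate functions such as COUNT, SUM, and AVG.
• Their meaning is ambiguous: a NULL may mean that a value is unknown, unavailable, or not applicable (absent).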
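The aggregate-function problem can be observed directly with Python's built-in sqlite3 module. This sketch relies on standard SQL aggregate semantics as implemented by SQLite; the emp table and its salary values are hypothetical.

```python
import sqlite3

# In-memory table with one NULL salary (Python None is stored as SQL NULL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ssn TEXT PRIMARY KEY, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("111", 30000), ("222", 50000), ("333", None)],
)

# COUNT(*) counts every row; COUNT, SUM and AVG on a column skip NULLs.
stats = conn.execute(
    "SELECT COUNT(*), COUNT(salary), SUM(salary), AVG(salary) FROM emp"
).fetchone()
print(stats)  # (3, 2, 80000, 40000.0)
conn.close()
```

AVG(salary) is the average of the two known salaries, not of all three employees; whether that is the intended answer depends on what the NULL means.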

Guideline 3
• Avoid placing attributes in a base relation whose values may frequently be NULL.
• If NULLs are unavoidable, make sure that they apply in exceptional cases only and not to the majority of tuples in the relation.

4. Generation of spurious tuples
• Suppose emp_proj1 and emp_locs are used as the base relations instead of emp_proj.
• Then we cannot recover the information that was originally in emp_proj by joining emp_proj1 and emp_locs.
• This is because the attribute the two relations share, Ploc, is neither a primary key nor a foreign key, so joining on it produces spurious tuples.

Emp_proj(SSN, Pnum, Hours, Ename, Pname, Ploc)
Emp_locs(Ename, Ploc)
Emp_proj1(SSN, Pnum, Hours, Pname, Ploc)
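The effect of joining on Ploc can be checked with a small Python sketch. The two Emp_proj tuples are hypothetical sample data, and the projection and natural join are written as set comprehensions rather than with any database library.

```python
# Hypothetical Emp_proj tuples: (SSN, Pnum, Hours, Ename, Pname, Ploc)
emp_proj = {
    ("111", 1, 32, "Smith", "ProductX", "Bellaire"),
    ("222", 2, 20, "Wong", "ProductY", "Bellaire"),
}

# Project onto Emp_locs(Ename, Ploc) and Emp_proj1(SSN, Pnum, Hours, Pname, Ploc).
emp_locs = {(ename, ploc) for (_, _, _, ename, _, ploc) in emp_proj}
emp_proj1 = {(ssn, pnum, hours, pname, ploc)
             for (ssn, pnum, hours, _, pname, ploc) in emp_proj}

# Natural join on Ploc, which is neither a primary key nor a foreign key.
joined = {
    (ssn, pnum, hours, ename, pname, ploc)
    for (ssn, pnum, hours, pname, ploc) in emp_proj1
    for (ename, ploc2) in emp_locs
    if ploc == ploc2
}

spurious = joined - emp_proj
print(len(joined), len(spurious))  # 4 2
```

Both employees work in Bellaire, so every Emp_proj1 tuple matches both Emp_locs tuples; the two extra tuples (for example, Smith working on ProductY) are spurious.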

Guideline 4
• Design relation schemas so that they can be joined with equality conditions on attributes that are (primary key, foreign key) pairs, in a way that guarantees that no spurious tuples are generated.
• Avoid relations that contain matching attributes that are not (foreign key, primary key) combinations, because joining on such attributes may produce spurious tuples.


Functional Dependencies
• A functional dependency, denoted by X → Y, between two sets of attributes X and Y that are subsets of R specifies a constraint on the possible tuples that can form a relation state r of R.
• The constraint is that, for any two tuples t1 and t2 in r that have t1[X] = t2[X], they must also have t1[Y] = t2[Y].
• The values of the X component of a tuple uniquely (functionally) determine the values of the Y component; equivalently, Y is functionally dependent on X.
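The definition translates directly into a check over one relation state. A minimal Python sketch, assuming tuples are dictionaries keyed by attribute name; the function name fd_holds and the sample rows are hypothetical. A single state can only refute an FD: a True result means this state satisfies X → Y, not that the FD holds in every legal state.

```python
def fd_holds(rows, X, Y):
    """Return True if the relation state `rows` satisfies the FD X -> Y.

    rows is a list of dicts (attribute name -> value); X and Y are lists of names.
    """
    y_for_x = {}
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        # Two tuples with equal X values must also have equal Y values.
        if y_for_x.setdefault(x_val, y_val) != y_val:
            return False
    return True


# Hypothetical state of emp_proj1
rows = [
    {"SSN": "111", "Ename": "Smith", "Pnum": 1, "Hours": 32},
    {"SSN": "111", "Ename": "Smith", "Pnum": 2, "Hours": 8},
    {"SSN": "222", "Ename": "Wong", "Pnum": 1, "Hours": 10},
]
print(fd_holds(rows, ["SSN"], ["Ename"]))          # True
print(fd_holds(rows, ["Pnum"], ["Hours"]))         # False
print(fd_holds(rows, ["SSN", "Pnum"], ["Hours"]))  # True
```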


• Consider the relation emp_proj1(SSN, Ename, Pnum, Pname, Ploc, Hours).
• From the semantics of the attributes, the following FDs should hold:
• SSN → Ename
• Pnum → {Pname, Ploc}
• {SSN, Pnum} → Hours


• In some cases an FD cannot be inferred from a given relation state alone.
• An FD must be specified explicitly by someone who knows the semantics of the attributes of the relation.
• Example: Course → Teacher cannot be assumed to hold in all legal states.
If a course is taught by two teachers, the same Course value appears with two different Teacher values, so Teacher is not functionally dependent on Course.


Inference rules for FD
• The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F, denoted by F+.
• Example: if DeptNo → MgrSSN and MgrSSN → MgrPhone, then DeptNo → MgrPhone can be inferred.
• To infer dependencies systematically from a given set of dependencies, we use the following inference rules:
• IR1 (Reflexive rule): If X ⊇ Y, then X → Y.
• IR2 (Augmentation rule): If X → Y, then XZ → YZ.
• IR2 holds because if t1[XZ] = t2[XZ], then t1[X] = t2[X] and t1[Z] = t2[Z]; since X → Y, also t1[Y] = t2[Y], and therefore t1[YZ] = t2[YZ].
• IR3 (Transitive rule): If X → Y and Y → Z, then X → Z.
• IR4 (Decomposition or projection rule): If X → YZ, then X → Y and X → Z.
• IR5 (Union or additive rule): If X → Y and X → Z, then X → YZ.
• IR6 (Pseudotransitive rule): If X → Y and WY → Z, then WX → Z.
IR1 generates dependencies that are always true; such dependencies are called trivial dependencies, and all others are called nontrivial dependencies.
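In practice the rules are rarely applied one at a time; the usual systematic procedure computes the attribute closure X+ (every attribute that X determines under F) and then tests whether Y ⊆ X+. The Python sketch below is that standard closure algorithm, which the slides do not spell out; the attribute names follow the department-manager example, spelled DeptNo, MgrSSN, and MgrPhone here, and the helper names are hypothetical.

```python
def attribute_closure(X, F):
    """Compute X+, the set of attributes functionally determined by X under F.

    F is a list of (lhs, rhs) pairs of frozensets of attribute names.
    """
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            # Once lhs is contained in X+, everything in rhs is determined too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(F, X, Y):
    """F implies X -> Y exactly when Y is a subset of X+."""
    return set(Y) <= attribute_closure(X, F)


F = [
    (frozenset({"DeptNo"}), frozenset({"MgrSSN"})),
    (frozenset({"MgrSSN"}), frozenset({"MgrPhone"})),
]
print(sorted(attribute_closure({"DeptNo"}, F)))  # ['DeptNo', 'MgrPhone', 'MgrSSN']
print(implies(F, {"DeptNo"}, {"MgrPhone"}))      # True
print(implies(F, {"MgrPhone"}, {"DeptNo"}))      # False
```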

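Proofs of IR4 to IR6 using IR1 to IR3:
• IR4: Given X → YZ. Since YZ ⊇ Y, YZ → Y (IR1); hence X → Y (IR3). Similarly, X → Z.
• IR5: Given X → Y and X → Z. Then X → XY (IR2, augmenting X → Y with X, since XX = X), XY → YZ (IR2, augmenting X → Z with Y), and therefore X → YZ (IR3).
• IR6: Given X → Y and WY → Z. Then WX → WY (IR2, augmenting X → Y with W), and therefore WX → Z (IR3).
• A set of functional dependencies F is said to cover another set of functional dependencies E if every dependency in E can be inferred from F.
• Two sets E and F are equivalent if E+ = F+, that is, if E covers F and F covers E.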
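Cover and equivalence can also be tested mechanically with attribute closure: F covers E when, for every X → Y in E, Y is contained in X+ computed under F. A Python sketch, assuming single-letter attribute names for brevity; the sets E, F, and G and the helper names are hypothetical examples.

```python
def attribute_closure(X, F):
    """X+ under F; F is a list of (lhs, rhs) pairs of frozensets."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(F, E):
    """F covers E if every X -> Y in E follows from F (Y is a subset of X+ under F)."""
    return all(rhs <= attribute_closure(lhs, F) for lhs, rhs in E)


def equivalent(E, F):
    """E+ = F+ exactly when E covers F and F covers E."""
    return covers(E, F) and covers(F, E)


def fd(lhs, rhs):
    """Build an FD from two strings of single-letter attribute names."""
    return frozenset(lhs), frozenset(rhs)


F = [fd("A", "B"), fd("B", "C")]
E = [fd("A", "BC"), fd("B", "C")]
G = [fd("A", "B")]
print(equivalent(E, F))  # True
print(equivalent(G, F))  # False: G does not imply B -> C
```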


Normal forms based on primary keys
• The normalization process, first proposed by Codd (1972), takes a relation schema through a series of tests to certify whether it satisfies a certain normal form.
• The process proceeds in a top-down fashion; this approach is called relational design by analysis.
• Codd proposed three normal forms (1NF, 2NF, and 3NF), all based on a single analytical tool: functional dependencies.
• Later, the fourth normal form (4NF) and the fifth normal form (5NF) were proposed, based on multivalued dependencies and join dependencies respectively.
• Normalization of data is a process of analyzing the given relation schemas based on their FDs and primary keys to achieve:
1) minimizing redundancy
2) minimizing insertion, deletion, and update anomalies
Storing the join of higher normal form relations as a base relation, which is then in a lower normal form, is known as denormalization.

• The process of normalization through decomposition should also confer two additional properties:
1) Lossless join (nonadditive join) property: guarantees that no spurious tuples are generated when the decomposed relations are joined back together.
2) Dependency preservation property: ensures that each FD is represented in some individual relation after decomposition.
Superkey: A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].


• A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
• If a relation schema has more than one key, each is called a candidate key.
• One of the candidate keys is designated as the primary key; the others are called secondary keys.
• An attribute of a relation schema R is called prime if it is a member of some candidate key of R; the other attributes are called nonprime attributes.
• A unique key is an attribute (or set of attributes) that uniquely identifies each row in a table; unlike a primary key, it may allow NULL values.


• A primary key is an attribute or combination of attributes that uniquely identifies a row in a table and cannot be NULL.
• A foreign key is an attribute or combination of attributes whose values match the primary key values of another table.
• A composite (compound) key consists of two or more attributes.
• A candidate key is any key of a table, that is, a minimal set of attributes that could be chosen as the primary key.
• An alternate (secondary) key is a candidate key that is not chosen as the primary key.
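These key definitions can be made concrete by computing the candidate keys and prime attributes of Emp_proj from its FDs (SSN → Ename, Pnumber → {Pname, Plocation}, {SSN, Pnumber} → Hours, as used in the 2NF discussion). A brute-force Python sketch, fine for the handful of attributes on these slides but exponential in general; candidate_keys and is_superkey are hypothetical helper names.

```python
from itertools import combinations


def attribute_closure(X, F):
    """X+ under F; F is a list of (lhs, rhs) pairs of frozensets."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(S, R, F):
    """S is a superkey of R if S+ contains every attribute of R."""
    return attribute_closure(S, F) >= R


def candidate_keys(R, F):
    """Brute-force search for minimal superkeys, smallest sets first (exponential in |R|)."""
    keys = []
    for size in range(1, len(R) + 1):
        for combo in combinations(sorted(R), size):
            S = set(combo)
            # Skip supersets of a key already found: they are superkeys but not keys.
            if not any(k <= S for k in keys) and is_superkey(S, R, F):
                keys.append(S)
    return keys


R = {"SSN", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}
F = [
    (frozenset({"SSN"}), frozenset({"Ename"})),
    (frozenset({"Pnumber"}), frozenset({"Pname", "Plocation"})),
    (frozenset({"SSN", "Pnumber"}), frozenset({"Hours"})),
]
keys = candidate_keys(R, F)
prime = set().union(*keys)
print([sorted(k) for k in keys])  # [['Pnumber', 'SSN']]
print(sorted(R - prime))          # ['Ename', 'Hours', 'Plocation', 'Pname']
```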


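First Normal Form (1NF)
• 1NF states that the domain of an attribute must include only atomic (simple, indivisible) values, and that the value of any attribute in a tuple must be a single value from the domain of that attribute.
• 1NF disallows a set of values, a tuple of values, or a combination of both as the value of an attribute in a single tuple.
• Consider the Department relation: it is not in 1NF because Dloc is not atomic (a department can have several locations).
• Since Dnum is the primary key, all locations of a department are packed into the single Dloc value of its one tuple. There are three main techniques to bring the relation into 1NF:
1) Remove Dloc and place it in a separate relation Dept_locations together with Dnum; the primary key of Dept_locations is the combination {Dnum, Dloc}. This produces two 1NF relations.
2) Expand the key so that there is a separate tuple in Department for each location of a department; the primary key becomes the combination {Dnum, Dloc}. This introduces redundancy.
3) If the maximum number of values of Dloc is known (say three), replace Dloc with three atomic location attributes and place NULL where a department has fewer locations. This introduces many NULL values and spurious semantics (an artificial ordering among the location values).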

Department (not in 1NF)
Dname           Dnum  Dmgrssn  Dloc
Research        5     3334445  {Bellaire, Sugarland, Houston}
Administration  4     9876543  {Stafford}
Headquarters    1     8886677  {Houston}

Technique 1: Department and Dept_locations
Dname           Dnum  Dmgrssn
Research        5     3334445
Administration  4     9876543
Headquarters    1     8886677

Dnum  Dloc
5     Bellaire
5     Sugarland
5     Houston
4     Stafford
1     Houston

Technique 2: Department with the expanded key {Dnum, Dloc}
Dname           Dnum  Dmgrssn  Dloc
Research        5     3334445  Bellaire
Research        5     3334445  Sugarland
Research        5     3334445  Houston
Administration  4     9876543  Stafford
Headquarters    1     8886677  Houston
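Techniques 1 and 2 are both simple flattening steps. A Python sketch using the Department values shown above; representing each tuple as a Python tuple with a set for the non-atomic Dloc value is an assumption made for illustration.

```python
# Department tuples with Dloc held as a set, so this is not 1NF.
department = [
    ("Research", 5, "3334445", {"Bellaire", "Sugarland", "Houston"}),
    ("Administration", 4, "9876543", {"Stafford"}),
    ("Headquarters", 1, "8886677", {"Houston"}),
]

# Technique 1: Department(Dname, Dnum, Dmgrssn) plus Dept_locations(Dnum, Dloc).
dept = [(dname, dnum, mgr) for dname, dnum, mgr, _ in department]
dept_locations = sorted(
    (dnum, loc) for _, dnum, _, locs in department for loc in locs
)

# Technique 2: one tuple per location; the key becomes {Dnum, Dloc}
# and Dname/Dmgrssn are repeated for every location of a department.
dept_expanded = sorted(
    (dname, dnum, mgr, loc) for dname, dnum, mgr, locs in department for loc in locs
)

print(len(dept), len(dept_locations), len(dept_expanded))  # 3 5 5
print(dept_locations[:3])  # [(1, 'Houston'), (4, 'Stafford'), (5, 'Bellaire')]
```

Both results contain five location tuples, but technique 2 repeats the name and manager of Research three times, while technique 1 stores them once.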

Emp_proj (not in 1NF: Pnumber and Hours hold several values per employee)
SSN   Ename           Pnumber  Hours
1234  Smith, John     1, 2     3, 43
5678  Narayan, Joyce  3        40
9123  Ramesh, Rakesh  4, 5     0, 10
4567  Wong, Franklin  6, 7     35, 10

Second Normal Form (2NF)
• 2NF is based on the concept of full functional dependency. A relation schema R is in 2NF if it is in 1NF and every nonprime attribute A in R is fully functionally dependent on the primary key of R.
• A functional dependency X → Y is a full functional dependency if removal of any attribute A from X means that the dependency no longer holds.
• X → Y is a partial dependency if some attribute A ∈ X can be removed from X and the dependency still holds.
• SSN → Ename is an example of a full functional dependency in the Emp_dept relation.
• In contrast, {SSN, Pnum} → Ename is a partial dependency in the Emp_proj relation, because SSN → Ename already holds.
• Emp_proj, whose primary key is the composite {SSN, Pnumber}, is in 1NF but not in 2NF.


• The given FDs are:
• {SSN, Pnumber} → Hours
• {SSN, Pnumber} → Ename
• {SSN, Pnumber} → {Pname, Plocations}
• All three hold, but Ename, Pname, and Plocations are only partially dependent on the key, because SSN → Ename and Pnumber → {Pname, Plocations}.

Emp_Proj(SSN, Pnumber, Hours, Ename, Pname, Plocations) is decomposed into:
EP1(SSN, Pnumber, Hours)
EP2(SSN, Ename)
EP3(Pnumber, Pname, Plocations)
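The decomposition of Emp_Proj into EP1, EP2, and EP3 can be derived mechanically: for each proper subset of the primary key, compute its closure and move the nonprime attributes it determines into a relation of their own. A Python sketch of that procedure, assuming the primary key is the only candidate key; decompose_2nf is a hypothetical helper, not a library function.

```python
from itertools import combinations


def attribute_closure(X, F):
    """X+ under F; F is a list of (lhs, rhs) pairs of frozensets."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def decompose_2nf(R, key, F):
    """Split off nonprime attributes that depend on a proper subset of `key`.

    Assumes `key` is the only candidate key, so every attribute outside it is nonprime.
    """
    nonprime = R - key
    partial = {}  # nonprime attribute -> smallest key subset that determines it
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            for attr in attribute_closure(part, F) & nonprime:
                partial.setdefault(attr, frozenset(part))
    groups = {}
    for attr, part in partial.items():
        groups.setdefault(part, set()).add(attr)
    # The key with its fully dependent attributes, then one relation per key subset.
    relations = [R - set(partial)]
    relations += [set(part) | attrs for part, attrs in groups.items()]
    return relations


R = {"SSN", "Pnumber", "Hours", "Ename", "Pname", "Plocations"}
F = [
    (frozenset({"SSN", "Pnumber"}), frozenset({"Hours"})),
    (frozenset({"SSN"}), frozenset({"Ename"})),
    (frozenset({"Pnumber"}), frozenset({"Pname", "Plocations"})),
]
for rel in decompose_2nf(R, {"SSN", "Pnumber"}, F):
    print(sorted(rel))
# ['Hours', 'Pnumber', 'SSN']
# ['Plocations', 'Pname', 'Pnumber']
# ['Ename', 'SSN']
```

The three printed relations correspond to EP1, EP3, and EP2.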


Third Normal Form (3NF)
• 3NF is based on the concept of transitive dependency.
• A relation schema R is in 3NF if it is in 2NF and no nonprime attribute of R is transitively dependent on the primary key.
• An FD X → Y in R is a transitive dependency if there exists a set of attributes Z in R that is neither a candidate key nor a subset of any key of R, and both X → Z and Z → Y hold.
• In Emp_dept, SSN → Dmgrssn is a transitive dependency because SSN → Dnum and Dnum → Dmgrssn hold, and Dnum is neither a key nor a subset of the key of Emp_dept.
• Because of this transitive dependency of Dmgrssn on SSN via Dnum, Emp_dept is not in 3NF.
• We therefore decompose it into two relations that are free of the transitive dependency, such that a natural join of the two reproduces the original Emp_dept relation.
• It is not strictly necessary to remove partial dependencies before transitive dependencies; however, the normal forms are defined in sequence, so that 2NF removes partial dependencies and 3NF removes transitive dependencies.

Emp_Dept(Ename, SSN, Bdate, Address, Dnum, Dname, Dmgrssn) is decomposed into:
ED1(Ename, SSN, Bdate, Address, Dnum)
ED2(Dnum, Dname, Dmgrssn)
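Because Dnum is the primary key of ED2, joining ED1 and ED2 on Dnum reproduces Emp_Dept exactly, with no spurious tuples. A Python sketch with hypothetical sample tuples, using set comprehensions for the projections and the natural join.

```python
# Hypothetical Emp_Dept tuples: (Ename, SSN, Bdate, Address, Dnum, Dname, Dmgrssn)
emp_dept = {
    ("Smith", "111", "b1", "a1", 5, "Research", "333"),
    ("Wong", "222", "b2", "a2", 5, "Research", "333"),
    ("Zelaya", "999", "b3", "a3", 4, "Administration", "987"),
}

# Projections onto ED1(Ename, SSN, Bdate, Address, Dnum) and ED2(Dnum, Dname, Dmgrssn).
ed1 = {t[:5] for t in emp_dept}
ed2 = {t[4:] for t in emp_dept}  # the repeated Research tuple collapses to one

# Natural join on Dnum: the primary key of ED2 and a foreign key in ED1.
rejoined = {e + d[1:] for e in ed1 for d in ed2 if e[4] == d[0]}

print(len(ed2))              # 2
print(rejoined == emp_dept)  # True
```

Each department's name and manager are now stored once in ED2, yet no information is lost.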

• General definition of 2NF: A relation schema R is in 2NF if it is in 1NF and no nonprime attribute A in R is partially dependent on any key of R.
• General definition of 3NF: A relation schema R is in 3NF if, whenever a nontrivial functional dependency X → A holds in R, either X is a superkey of R or A is a prime attribute of R.
• Alternative definition of 3NF: A relation schema R is in 3NF if every nonprime attribute of R meets both of the following conditions:
It is fully functionally dependent on every key of R.
It is nontransitively dependent on every key of R.
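The general definition of 3NF can be tested mechanically. A Python sketch, assuming the FDs of the lots example that follows, with FD1 and FD2 restricted to the attributes of lots1; it finds prime attributes by brute-force key search and, as is usual when testing a relation against its own FD set, checks only the given FDs rather than all of F+. The helper names are hypothetical.

```python
from itertools import combinations


def closure(X, F):
    """Attribute closure X+ under F (a list of (lhs, rhs) frozenset pairs)."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def prime_attributes(R, F):
    """Union of all candidate keys, found by brute force (smallest sets first)."""
    keys = []
    for size in range(1, len(R) + 1):
        for combo in combinations(sorted(R), size):
            S = set(combo)
            if not any(k <= S for k in keys) and closure(S, F) >= R:
                keys.append(S)
    return set().union(*keys)


def violations_3nf(R, F):
    """List (X, A) for given FDs X -> A where X is not a superkey and A is not prime."""
    prime = prime_attributes(R, F)
    bad = []
    for lhs, rhs in F:
        if closure(lhs, F) >= R:
            continue  # X is a superkey, so the FD is allowed
        bad += [(sorted(lhs), a) for a in sorted(rhs - lhs) if a not in prime]
    return bad


def fd(lhs, rhs):
    return frozenset(lhs), frozenset(rhs)


lots1 = {"Prop_id", "Country_name", "Lotno", "Area", "Price"}
lots1_fds = [
    fd({"Prop_id"}, {"Country_name", "Lotno", "Area", "Price"}),  # FD1
    fd({"Country_name", "Lotno"}, {"Prop_id", "Area", "Price"}),  # FD2
    fd({"Area"}, {"Price"}),                                      # FD4
]
print(violations_3nf(lots1, lots1_fds))  # [(['Area'], 'Price')]

lots1a = {"Prop_id", "Country_name", "Lotno", "Area"}
lots1a_fds = [
    fd({"Prop_id"}, {"Country_name", "Lotno", "Area"}),
    fd({"Country_name", "Lotno"}, {"Prop_id", "Area"}),
    fd({"Area"}, {"Country_name"}),                               # FD5
]
print(violations_3nf(lots1a, lots1a_fds))  # []
```

FD4 makes lots1 fail 3NF, while lots1A passes even with FD5, because Country_name is prime.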
Example: 1NF
lots(Prop_id, Country_name, Lotno, Area, Price, Taxrate), with functional dependencies FD1, FD2, FD3, and FD4
Example: 2NF
lots1(Prop_id, Country_name, Lotno, Area, Price)
lots2(Country_name, Taxrate)

• Prop_id is the primary key, and {Country_name, Lotno} is also a candidate key.
• FD1: Prop_id → {Country_name, Lotno, Area, Price, Taxrate}
• FD2: {Country_name, Lotno} → {Prop_id, Area, Price, Taxrate}
• FD3: Country_name → Taxrate; hence Taxrate is partially dependent on the candidate key {Country_name, Lotno}.
• FD4: Area → Price
• (FD5: Area → Country_name)
• (FD5 is considered only in the BCNF example.)
• Because of FD3, lots is not in 2NF, so we decompose lots into lots1 and lots2.

Example: 3NF
lots1(Prop_id, Country_name, Lotno, Area, Price) is decomposed into:
lots1A(Prop_id, Country_name, Lotno, Area)
lots1B(Area, Price)

• FD4 violates 3NF because Area is not a superkey and Price is not a prime attribute.
• To bring lots1 into 3NF, we decompose it into lots1A and lots1B, removing the transitive dependency.
• Price is transitively dependent on each of the candidate keys of lots1 via Area.
• lots1A is in 3NF but not in BCNF because of FD5: Area is not a superkey, and FD5 satisfies 3NF only because Country_name is a prime attribute. We therefore decompose lots1A into lots1Ax and lots1Ay.

Boyce-Codd Normal Form
• BCNF was proposed as a simpler form of 3NF, but it is stricter (stronger) than 3NF: every relation in BCNF is also in 3NF, but not vice versa.
• A relation schema R is in BCNF if, whenever a nontrivial functional dependency X → A holds in R, X is a superkey of R.
• In relation lots1A, Area → Country_name holds; lots1A is in 3NF but not in BCNF, because Country_name is a prime attribute (part of the candidate key {Country_name, Lotno}) but Area is not a superkey.
• Another example: consider the relation Teach(Student, Course, Instructor), where {Student, Course} is the primary key:
• {Student, Course} → Instructor
• Instructor → Course
• Teach is in 3NF (Course is a prime attribute) but not in BCNF (Instructor is not a superkey), so we decompose it into {Instructor, Student} and {Instructor, Course}.
Teach(Student, Course, Instructor) is decomposed into:
Teach1(Student, Instructor)
Teach2(Instructor, Course)
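The BCNF condition and the lossless-join property of the Teach decomposition can both be checked with attribute closure. A Python sketch; the binary test used here (a decomposition of R into R1 and R2 is lossless if R1 ∩ R2 determines R1 − R2 or R2 − R1) is a standard result that these slides do not state, only the given FDs are examined, and the helper names are hypothetical.

```python
def closure(X, F):
    """Attribute closure X+ under F (a list of (lhs, rhs) frozenset pairs)."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    return frozenset(lhs), frozenset(rhs)


def bcnf_violations(R, F):
    """Given nontrivial FDs X -> Y whose left side X is not a superkey of R."""
    return [
        (sorted(lhs), sorted(rhs - lhs))
        for lhs, rhs in F
        if rhs - lhs and not closure(lhs, F) >= R
    ]


def lossless_binary(R1, R2, F):
    """{R1, R2} is lossless if R1 & R2 determines R1 - R2 or R2 - R1."""
    common = closure(R1 & R2, F)
    return (R1 - R2) <= common or (R2 - R1) <= common


teach = {"Student", "Course", "Instructor"}
teach_fds = [
    fd({"Student", "Course"}, {"Instructor"}),
    fd({"Instructor"}, {"Course"}),
]
print(bcnf_violations(teach, teach_fds))  # [(['Instructor'], ['Course'])]
print(lossless_binary({"Instructor", "Student"}, {"Instructor", "Course"}, teach_fds))  # True
print(lossless_binary({"Student", "Course"}, {"Course", "Instructor"}, teach_fds))      # False
```

The chosen decomposition is lossless because the shared attribute Instructor determines Course; splitting on Course instead would generate spurious tuples.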

è
lots1A Example:BCNF
Prop_id Country_na Lotno Area
me

lots1Ax
Prop_i Area lotno
d

lots1Ay

Area Countryna
me
Benefits or advantages of normalization:
• Greater overall database organization
• Reduction of redundant values
• Data consistency within the database
• A much more flexible design
• Better handling of database security
