CENG 3005
Database Management Systems
Week 8
• Functional Dependencies
• Normal Forms
OUTLINE
• SQL
• Ordering
• Set Operations
• Inner Join/Left Outer Join/Right Outer Join/Full Join
• Aggregate Functions
• Modifying the database (Delete/Insert/Update)
• Views
• Stored procedures
• Functional Dependencies
• Normal Forms
Today and next week we will learn:
What is normalization? Why do we need it?
What does functional dependency mean? Why do we need it?
What role does it play in the database design process?
What do the normal forms 1NF, 2NF, 3NF, and BCNF mean?
How can tables be transformed from lower normal forms to higher normal forms?
Why do some situations require denormalization to generate information efficiently?
Watch [Link]
1NF
Watch [Link]
2NF
3NF
Functional Dependencies (FDs), and
how to find Keys using them
[Link]
i4sk4h2lhtU
Functional Dependencies (formal definition)
Definition: A functional dependency (FD) on a relation schema R is a constraint X -> Y, where X and Y are subsets of the attributes (columns) of R.
An FD X -> Y is satisfied in an instance r of R if, for every pair of tuples t and s in r: if t and s agree on all attributes in X, then they must also agree on all attributes in Y.
In other words, for a dependency to hold on a table R, any two rows that have the same values for the left-side attribute set (X) must also have the same values for the right-side attribute set (Y).
If you know the left side, you know the right side!
• SSN -> SSN, Name, Address
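The definition above can be checked mechanically on a concrete table. The following is a minimal sketch, not a standard library routine: the function name `fd_holds` and the sample rows are made up for illustration. It groups rows by their X values and reports a violation as soon as two rows agree on X but differ on Y.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs is satisfied by this table instance."""
    seen = {}  # maps each X value combination to the Y values first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x appears, otherwise returns the stored one
        if seen.setdefault(x, y) != y:
            return False  # two rows agree on X but disagree on Y
    return True

# Hypothetical sample data, invented for this example
people = [
    {"SSN": "111", "Name": "Ann", "Address": "1 Main St"},
    {"SSN": "222", "Name": "Bob", "Address": "1 Main St"},
    {"SSN": "333", "Name": "Ann", "Address": "9 Oak Ave"},
]

print(fd_holds(people, ["SSN"], ["SSN", "Name", "Address"]))  # True
print(fd_holds(people, ["Name"], ["Address"]))                # False
```

Note that a check like this can only show that an FD is violated by the data, or not violated so far. Whether an FD really holds is a design decision about the schema, not a property of one sample.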
Functional Dependencies (Examples)
• Address -> ZipCode
– Stony Brook’s ZIP is 11733
• ArtistName -> BirthYear
– Picasso was born in 1881
• Autobrand -> Manufacturer, Engine type
– Pontiac is built by General Motors (Manufacturer) with a gasoline engine (Engine type)
• Author, Title -> PublDate
– Shakespeare’s Hamlet was published in 1600
Entailment, Closure, Equivalence
Definition: If F is a set of FDs on schema R and f is another FD on R, then F entails f if every instance r of R that satisfies every FD in F also satisfies f.
Ex: F = {A -> B, B -> C} entails f = A -> C
• If Streetaddr -> Town and Town -> Zip, then Streetaddr -> Zip
Definition: The closure of F, denoted F+, is the set of all FDs entailed by F.
Definition: F and G are equivalent if F entails every FD in G and G entails every FD in F.
Properties of Functional Dependencies
• Reflexivity: If Y is a subset of X then X -> Y (trivial FD)
– Name, Address -> Name
• Augmentation: If X -> Y then XZ -> YZ
– If Town -> Zip then Town, Name -> Zip, Name
• Transitivity: If X -> Y and Y -> Z then X -> Z
Derived inference rules
Union: if X -> Y and X -> Z, then X -> YZ.
Decomposition: if X -> YZ, then X -> Y and X -> Z.
Pseudotransitivity: if X -> Y and WY -> Z, then WX -> Z.
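Generating F+
F (the functional dependency list for your table): AB -> C, A -> D, D -> E
1. A -> D gives AB -> BD (augmentation with B)
2. AB -> C and AB -> BD give AB -> BCD (union)
3. D -> E gives BCD -> BCDE (augmentation with BC)
4. AB -> BCD and BCD -> BCDE give AB -> BCDE (transitivity)
5. AB -> BCDE gives AB -> CDE (decomposition)
Thus, F+ includes AB -> BD, AB -> BCD, AB -> BCDE, and AB -> CDE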
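Applying the inference rules by hand gets tedious, so in practice one usually computes the closure of an attribute set instead: X -> Y is in F+ exactly when Y is contained in the closure of X. The sketch below uses the FDs from this slide; the function names `closure` and `entails` are chosen here for illustration.

```python
def closure(attrs, fds):
    """Return the set of all attributes determined by attrs under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def entails(fds, lhs, rhs):
    """Return True if lhs -> rhs is in F+ (follows from the given FDs)."""
    return set(rhs) <= closure(lhs, fds)

# F from the slide: AB -> C, A -> D, D -> E (one letter per attribute)
F = [("AB", "C"), ("A", "D"), ("D", "E")]

print(sorted(closure("AB", F)))  # ['A', 'B', 'C', 'D', 'E']
print(entails(F, "AB", "CDE"))   # True
print(entails(F, "B", "C"))      # False
```

Since the closure of AB contains every attribute, AB determines the whole table, which is the property a key must have.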
If you have understood functional dependencies so far, normalization is easy!
Why do we need to normalize our
databases?
We need a criterion for determining a
table's degree of vulnerability to
1) logical inconsistencies and
2) anomalies.
Normalization of DB Tables
Normalization
– Process for evaluating and correcting table structures
• determines the optimal assignments of attributes to entities
– Normalization provides micro view of entities
• focuses on characteristics of specific entities
• may yield additional entities
– Works through a series of stages called normal forms
• 1NF -> 2NF -> 3NF -> 4NF (optional)
– The higher the normal form, the slower the database response tends to be
• more joins are required to answer end-user queries
So if the response is slow, why do the database people
normalize?
1. Reduce uncontrolled data redundancies
• Help eliminate data anomalies
2. Produce controlled redundancies to link tables
Normalization Forms
(simple simple simple)
“Data depends on the key [1NF], the whole key [2NF], and nothing but the key [3NF]”
“All the arrows in the FDs come out of a candidate key” [BCNF]
Normalization
is the process for evaluating and correcting
table structures to minimize data redundancies
– reduces data anomalies
works through a series of stages called normal
forms
– First normal form (1NF)
– Second normal form (2NF)
– Third normal form (3NF)
Edgar Codd, inventor of the Relational Model, introduced the normal forms in the 1970s
Normalization Example
We have a table named R1, containing orders in an online store
Each entry in the table represents an item on a particular order
Primary key is {Order, Product}
Columns
– Order
– Product
– Customer
– Address
– Quantity
– UnitPrice
Functional Dependencies
All the functional dependencies for this table R1
{Order, Product, Customer, Address, Quantity, UnitPrice}
1. Each order is for a single customer: {Order} -> {Customer}
2. Each customer has a single address: {Customer} -> {Address}
3. Each product has a single price: {Product} -> {UnitPrice}
4. From FDs 1 and 2 and transitivity: {Order} -> {Address}
R1 contains a partial dependency: Address depends on only part of the primary key
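Partial dependencies like this one can be found systematically by taking each proper subset of the primary key and asking which non-key attributes it already determines. The sketch below is illustrative only: the function names are invented, and the FD {Order, Product} -> {Quantity} is an assumption added here because the slide declares {Order, Product} to be the primary key but does not list that FD explicitly.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the set of all attributes determined by attrs under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(key, attrs, fds):
    """Map each proper subset of the key to the non-key attributes it determines."""
    non_key = set(attrs) - set(key)
    found = {}
    for size in range(1, len(key)):  # proper, non-empty subsets only
        for part in combinations(sorted(key), size):
            determined = closure(part, fds) & non_key
            if determined:
                found[part] = sorted(determined)
    return found

ATTRS = {"Order", "Product", "Customer", "Address", "Quantity", "UnitPrice"}
KEY = {"Order", "Product"}
FDS = [
    ({"Order"}, {"Customer"}),
    ({"Customer"}, {"Address"}),
    ({"Product"}, {"UnitPrice"}),
    ({"Order", "Product"}, {"Quantity"}),  # assumption: implied by the stated key
]

print(partial_dependencies(KEY, ATTRS, FDS))
# {('Order',): ['Address', 'Customer'], ('Product',): ['UnitPrice']}
```

Quantity does not show up in the result because it needs the whole key, which is why it can stay in a table keyed by {Order, Product}.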
Normalization to 3NF
R{Order, Product, Customer, Address, Quantity, UnitPrice} has now been split into 3 relations
– R1={Order, Customer, Address}
– R3={Product, UnitPrice}
– R4={Order, Product, Quantity}
R3 and R4 are in 3NF
BUT!!! R1 has a transitive FD on its key!!
To remove this transitive FD from R1
{Order} -> {Customer} -> {Address}
We decompose R1 over
– {Order, Customer}
– {Customer, Address}
Normalization
1NF:
– {Order, Product, Customer, Address, Quantity, UnitPrice}
2NF (no partial dependency on key)
– {Product, UnitPrice}
– {Order, Product, Quantity}
– {Order, Customer, Address}
3NF (no transitive dependency on a key)
– {Product, UnitPrice}
– {Order, Product, Quantity}
– {Order, Customer}
– {Customer, Address}
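A decomposition is only useful if no information is lost, meaning that joining the smaller tables gives back exactly the original rows. The sketch below illustrates this for the 3NF schema above. The three sample rows and the helper names (`project`, `natural_join`, `as_set`) are hypothetical and exist only for this example.

```python
def project(rows, attrs):
    """Project rows onto attrs, removing duplicate rows."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in unique]

def natural_join(left, right):
    """Join two relations on all attributes they have in common."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[a] == r_row[a] for a in shared):
                joined.append({**l_row, **r_row})
    return joined

def as_set(rows):
    """Order-independent view of a relation, for comparing two relations."""
    return {frozenset(row.items()) for row in rows}

# Hypothetical sample rows for the original single table
orders = [
    {"Order": 1, "Product": "Pen", "Customer": "Ann", "Address": "1 Main St",
     "Quantity": 10, "UnitPrice": 2},
    {"Order": 1, "Product": "Ink", "Customer": "Ann", "Address": "1 Main St",
     "Quantity": 1, "UnitPrice": 5},
    {"Order": 2, "Product": "Pen", "Customer": "Bob", "Address": "9 Oak Ave",
     "Quantity": 3, "UnitPrice": 2},
]

# The four 3NF relations from the slide
product_price = project(orders, ["Product", "UnitPrice"])
order_items = project(orders, ["Order", "Product", "Quantity"])
order_customer = project(orders, ["Order", "Customer"])
customer_address = project(orders, ["Customer", "Address"])

rejoined = natural_join(
    natural_join(natural_join(order_items, product_price), order_customer),
    customer_address,
)

print(as_set(rejoined) == as_set(orders))  # True
print(len(customer_address))               # 2
```

In the original table Ann's address is stored twice, once per order line. After decomposition each customer's address is stored once, which is the redundancy that normalization removes.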
Another example: Course Schedules
Consider a relation, Schedule, which stores information about times for various schedules of courses
For example: labs for first years
• Each course has several schedules
• Only one schedule (of any course at all) takes place at any given time
• Each student taking a course is assigned to a single schedule for it
FDs in the Course Schedule Relation
Candidate keys: {Student, Course} and {Student, Time}
Schedule has the following non-trivial FDs
– {Student, Course} -> {Time}
– {Time} -> {Course}
Every attribute belongs to some candidate key, so no non-key attribute has a partial dependency (2NF) or a transitive dependency (3NF) on a key; the Schedule table is in 3NF
The Schedule Relation
Student   Course        Time
John      Databases     12:00
Mary      Databases     12:00
Richard   Databases     15:00
Richard   Programming   10:00
Can you find the candidate keys?
Candidate keys: {Student, Course} and {Student, Time}
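One way to answer the question is a brute-force search: try attribute sets from smallest to largest and keep those whose closure is the whole schema, skipping any set that already contains a smaller key. This is a sketch for small schemas only (it is exponential in the number of attributes), and the function names are chosen here for illustration.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the set of all attributes determined by attrs under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attrs, fds):
    """Return all minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(attrs):
                keys.append(candidate)
    return keys

ATTRS = {"Student", "Course", "Time"}
FDS = [
    ({"Student", "Course"}, {"Time"}),
    ({"Time"}, {"Course"}),
]

for key in candidate_keys(ATTRS, FDS):
    print(sorted(key))
# ['Course', 'Student']
# ['Student', 'Time']
```

The search confirms the two candidate keys given on the slide. No single attribute is a key: Time determines Course but not Student.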
Anomalies in Schedule
INSERT anomalies
– A new student with no class cannot be added: you can’t add an empty schedule item
UPDATE anomalies
– Moving the 12:00 class to 9:00 means changing two rows
DELETE anomalies
– Deleting Rebecca removes one class (time/date) from the schedule
Student   Course        Time
John      Databases     12:00
Mary      Databases     12:00
Richard   Databases     15:00
Richard   Programming   10:00
Mary      Programming   10:00
Rebecca   Programming   13:00
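The update and delete anomalies can be seen directly on the table above. The short sketch below stores the six rows as tuples and counts what each operation touches; the variable names are illustrative.

```python
# The Schedule table from the slide as (Student, Course, Time) tuples
schedule = [
    ("John", "Databases", "12:00"),
    ("Mary", "Databases", "12:00"),
    ("Richard", "Databases", "15:00"),
    ("Richard", "Programming", "10:00"),
    ("Mary", "Programming", "10:00"),
    ("Rebecca", "Programming", "13:00"),
]

# UPDATE anomaly: one real-world change (12:00 class moves to 9:00)
# requires editing every row that repeats that fact.
rows_to_change = [row for row in schedule if row[2] == "12:00"]
print(len(rows_to_change))  # 2

# DELETE anomaly: removing a student can remove the only record of a class time.
def class_times(rows):
    """Return the set of (Course, Time) pairs recorded in the table."""
    return {(course, time) for _, course, time in rows}

without_rebecca = [row for row in schedule if row[0] != "Rebecca"]
lost = class_times(schedule) - class_times(without_rebecca)
print(lost)  # {('Programming', '13:00')}
```

Both problems come from the same cause: the fact "this time slot belongs to this course" is stored once per enrolled student instead of once overall.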