Unit-4
Schema refinement, Functional dependencies and Normal Form
Schema refinement
Database Design
• coming up with a ‘good’ schema is very important
How do we characterize the “goodness” of a schema ?
If two or more alternative schemas are available
how do we compare them ?
What are the problems with “bad” schema designs ?
Normal Forms:
• Each normal form specifies certain conditions
• If the conditions are satisfied by the schema, certain kinds of problems are avoided
An Example
•student relation with attributes: studName, rollNo, gen, studDept
•department relation with attributes: deptName, officePhone, hod
• Several students belong to a department.
• studDept gives the name of the student’s department.
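•Correct schema: two relations
• Student(studName, rollNo, gen, studDept)
• Department(deptName, officePhone, HOD)
•Incorrect schema: a single combined relation
• StudentDept(studName, rollNo, gen, deptName, officePhone, HOD)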
•What are the problems that arise ?
Problems of Redundancy
Redundancy is at the root of several problems associated with relational
schemas:
redundant storage, insert/delete/update anomalies
Integrity constraints, in particular functional dependencies, can be used
to identify schemas with such problems and to suggest refinements.
Role of FDs in detecting redundancy:
Consider a relation R with 3 attributes, ABC.
No FDs hold: There is no redundancy.
Given A → B: Several tuples can have the same A value,
• and if so, they’ll all have the same B value (Redundancy)
Schema refinement technique: decomposition (replacing ABCD with,
say, AB and BCD, or ACD and ABD).
Update Anomalies
• If a database design is poor, it may contain anomalies that make the
data hard to keep consistent.
Update anomalies − If the same data item is stored in several places, an
update may change some copies while others are left with old values. Such
instances leave the database in an inconsistent state.
Deletion anomalies − Deleting a record also removes other facts stored in
the same record, even though we did not intend to lose them.
Insert anomalies − A fact cannot be recorded because the other data that
must be stored in the same record does not exist yet.
• Normalization is a method to remove these anomalies and keep the
database in a consistent state.
EXAMPLE
• Attributes: Social Security number (S), Name (N), Lot (L), Rating (R), Hourly wage (W), Hours per week (H)
S N L R W H
123-22-3666 Attishoo 48 8 10 40
231-31-5368 Smiley 22 8 10 30
131-24-3650 Smethurst 35 5 7 30
434-26-3751 Guldu 35 5 7 32
612-67-4134 Madayan 35 8 10 40
• FDs: S → {S, N, L, R, W, H} and R → W
• Problems due to the dependency R → W:
• Update anomaly: Can we change W in just the 1st tuple of SNLRWH?
• Insertion anomaly: What if we want to insert an employee and don’t
know the hourly wage for his rating?
• Deletion anomaly: If we delete all employees with rating 5, we lose the
information about the wage for rating 5
• Null Values: null values do not provide a general solution to the problems of redundancy, even though they
can help in some cases
Example Solution
Will 2 smaller tables be better?
Hourly_Emps
S            N          L   R  W   H
123-22-3666  Attishoo   48  8  10  40
231-31-5368  Smiley     22  8  10  30
131-24-3650  Smethurst  35  5  7   30
434-26-3751  Guldu      35  5  7   32
612-67-4134  Madayan    35  8  10  40
Wages
R  W
8  10
5  7
Hourly_Emps2
S            N          L   R  H
123-22-3666  Attishoo   48  8  40
231-31-5368  Smiley     22  8  30
131-24-3650  Smethurst  35  5  30
434-26-3751  Guldu      35  5  32
612-67-4134  Madayan    35  8  40
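The split above can be sketched in a few lines of Python. This is only an illustration on the five example rows: the variable names are made up for the sketch, and using a dictionary for the (R, W) projection is safe only because R → W holds in this instance.

```python
# Hourly_Emps rows from the example, in the order (S, N, L, R, W, H).
hourly_emps = [
    ("123-22-3666", "Attishoo", 48, 8, 10, 40),
    ("231-31-5368", "Smiley", 22, 8, 10, 30),
    ("131-24-3650", "Smethurst", 35, 5, 7, 30),
    ("434-26-3751", "Guldu", 35, 5, 7, 32),
    ("612-67-4134", "Madayan", 35, 8, 10, 40),
]

# Project onto (R, W). Each rating keeps exactly one wage because R -> W holds.
wages = {r: w for (_s, _n, _l, r, w, _h) in hourly_emps}

# Project onto (S, N, L, R, H): the wage column is dropped.
hourly_emps2 = [(s, n, l, r, h) for (s, n, l, r, _w, h) in hourly_emps]

# Rejoin the two projections on R to rebuild the original rows.
rejoined = [(s, n, l, r, wages[r], h) for (s, n, l, r, h) in hourly_emps2]

print(wages)                    # {8: 10, 5: 7}
print(rejoined == hourly_emps)  # True
```

After the split, the wage for a rating is stored once, so changing it is a single update in `wages` rather than one update per employee.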
Relational Decomposition
Decomposition addresses redundancy of data
Use FDs to identify “good” ways to split relations
Split R into 2+ smaller relations having less redundancy
Split up F into subsets which apply to the new relations
Decomposition should be used judiciously:
Is there a reason to decompose a relation?
What problems (if any) does the decomposition cause?
A good decomposition does not :
lose information
complicate checking of constraints
contain anomalies (or at least contains fewer anomalies)
Types of Relational Decomposition
• Lossless decomposition
• Dependency-preserving decomposition
Lossless Decomposition
If no information is lost from the relation that is decomposed, then the
decomposition is lossless.
A lossless decomposition guarantees that the join of the decomposed relations
results in the same relation that was decomposed.
A decomposition is said to be lossless if the natural join of all the
decomposed relations gives back the original relation.
Dependency Preserving
Dependency preservation is an important constraint of the database.
In a dependency-preserving decomposition, every dependency must be satisfied
by at least one decomposed table.
If a relation R is decomposed into relations R1 and R2, then each dependency
of R must either be a part of R1 or R2, or be derivable from the combination
of the functional dependencies of R1 and R2.
Functional Dependency
A functional dependency (FD) is a constraint between two sets of attributes in a
relation.
Functional dependency is represented by an arrow sign (→), that is, X→Y,
where X functionally determines Y. The left-hand side attributes determine the
values of attributes on the right-hand side.
• For any two tuples t1 and t2: if t1.X = t2.X, then t1.Y = t2.Y
• Example:
• The FD AB->C is satisfied by the following instances.
A B C D
a1 b1 c1 d1
a1 b1 c1 d2
a1 b2 c2 d1
a2 b1 c3 d1
• Adding the tuple <a1, b1, c2, d1> to this instance is not possible because it
violates the FD: it agrees with the first tuple on A and B but differs on C.
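The definition can be checked mechanically on a given instance. The sketch below is a minimal illustration, not a standard API: the function name `fd_violations` and the dictionary-per-row layout are assumptions made for the example.

```python
def fd_violations(rows, lhs, rhs):
    """Return pairs of rows that agree on lhs but differ on rhs."""
    first_seen = {}   # lhs values -> first row seen with those values
    violations = []
    for row in rows:
        key = tuple(row[a] for a in lhs)
        first = first_seen.setdefault(key, row)
        if any(first[a] != row[a] for a in rhs):
            violations.append((first, row))
    return violations

instance = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1"},
    {"A": "a1", "B": "b1", "C": "c1", "D": "d2"},
    {"A": "a1", "B": "b2", "C": "c2", "D": "d1"},
    {"A": "a2", "B": "b1", "C": "c3", "D": "d1"},
]

# AB -> C holds on the instance from the slide.
print(fd_violations(instance, ["A", "B"], ["C"]))  # []

# Adding <a1, b1, c2, d1> breaks it.
extended = instance + [{"A": "a1", "B": "b1", "C": "c2", "D": "d1"}]
print(len(fd_violations(extended, ["A", "B"], ["C"])))  # 1
```

Note that an instance can only show that an FD is violated or not violated by the current data. Whether the FD is a real constraint is a design decision about all legal instances.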
Reasoning About FDs
Given some FDs, we can usually infer additional FDs:
ssn → did, did → lot implies ssn → lot
An FD f is implied by a set of FDs F if f holds whenever all
FDs in F hold.
F+ = closure of F is the set of all FDs that are implied by F.
Armstrong’s Axioms (X, Y, Z are sets of attributes):
Reflexivity: If X ⊆ Y, then Y → X
Augmentation: If X → Y, then XZ → YZ for any Z
Transitivity: If X → Y and Y → Z, then X → Z
These are sound and complete inference rules for FDs!
Reasoning About FDs(Contd.)
A couple of additional rules (that follow from Armstrong’s Axioms):
Union: If X → Y and X → Z, then X → YZ
Decomposition: If X → YZ, then X → Y and X → Z
Example: Contracts(cid,sid,jid,did,pid,qty,value), and:
C is the key: C → CSJDPQV
Project purchases each part using single contract: JP → C
Dept purchases at most one part from a supplier: SD → P
JP → C, C → CSJDPQV imply JP → CSJDPQV
SD → P implies SDJ → JP
SDJ → JP, JP → CSJDPQV imply SDJ → CSJDPQV
Set of Functional Dependencies F+
• Informal Definition
• F+ is the set of all FDs logically implied by F
• Usually F+ is too large to enumerate
• Some FDs are trivial (EXAMPLE: A → A)
• Formal Definition
• If F is a set of FDs, then F+ = { X → Y | F ⊨ X → Y }
Functional Dependencies – Examples
•Consider the schema:
• Student ( studName, rollNo, gen, dept, hostelName, roomNo)
•Since rollNo is a key, rollNo → {studName, gen, dept, hostelName, roomNo}
•Suppose that each student is given a hostel room exclusively, then
• hostelName, roomNo → rollNo
•Suppose boys and girls are accommodated in separate hostels, then
• hostelName → gen
•FDs are additional constraints that can be specified by designers
Trivial / Non-Trivial FDs
•An FD X → Y where Y ⊆ X
- called a trivial FD, it always holds good
•An FD X → Y where Y ⊈ X
- non-trivial FD
•An FD X → Y where X ∩ Y = ∅
- completely non-trivial FD
Deriving new FDs
•Given that a set of FDs F holds on R,
• we can infer that a certain new FD must also hold on R
•For instance,
• given that X → Y, Y → Z hold on R, we can infer that X → Z must also hold
•How to systematically obtain all such new FDs ?
•Unless all FDs are known, a relation schema is not fully specified
Entailment relation
• We say that a set of FDs F ⊨ {X → Y}
(read as F entails X → Y, or F logically implies X → Y)
• if in every instance r of R on which the FDs in F hold, the FD X → Y also holds.
• Armstrong came up with several inference rules
for deriving new FDs from a given set of FDs
• We define F+ = {X → Y | F ⊨ X → Y}. F+ is the closure of F.
Armstrong’s Inference Rules
1. Reflexive rule
• F ⊨ {X → Y | Y ⊆ X} for any X. Trivial FDs
2. Augmentation rule
– {X → Y} ⊨ {XZ → YZ}, Z ⊆ R.
– Here XZ denotes X ⋃ Z
3. Transitive rule
• {X → Y, Y → Z} ⊨ {X → Z}
4. Decomposition or Projective rule
• {X → YZ} ⊨ {X → Y, X → Z}
5. Union or Additive rule
• {X → Y, X → Z} ⊨ {X → YZ}
6. Pseudo transitive rule
• {X → Y, WY → Z} ⊨ {WX → Z}
Attribute closure
To check whether a given dependency X → Y is in the closure
F+, we do not need to compute F+.
First compute the attribute closure X+ with respect to F, which is the
set of attributes A such that X → A can be inferred using Armstrong’s
Axioms. Then X → Y is in F+ exactly when Y ⊆ X+.
The algorithm for computing the attribute closure of a set X of
attributes is:
– Closure = X;
– Repeat until there is no change:
–{
• If there is an FD U → V in F such that U ⊆ Closure,
• then set Closure = Closure ∪ V
–}
Attribute Closure Example
Relation R(A, B, C, D)
FDs:
A->B
B->D
C->B
Attribute closure for A:
A+ = ABD
(A → B adds B, then B → D adds D; C → B never applies because C is not in the closure)
Closure(A) = ABD
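The closure algorithm is short enough to write out directly. The following is a minimal sketch, assuming attributes are single characters and each FD is given as a (left side, right side) pair of strings; the function name `attribute_closure` is chosen for the example.

```python
def attribute_closure(attrs, fds):
    """Compute the closure of attrs under fds by repeated application."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply U -> V when U is inside the closure and V adds something new.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

# FDs of R(A, B, C, D) from the example: A -> B, B -> D, C -> B.
fds = [("A", "B"), ("B", "D"), ("C", "B")]

print(sorted(attribute_closure("A", fds)))   # ['A', 'B', 'D']
print(sorted(attribute_closure("AC", fds)))  # ['A', 'B', 'C', 'D']

# A -> D is in F+ because D is in the closure of A; A -> C is not.
print(set("D") <= attribute_closure("A", fds))  # True
print(set("C") <= attribute_closure("A", fds))  # False
```

The second call shows that AC determines every attribute of R, so AC is a super key of this relation under the given FDs.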
NORMALIZATION & TYPES OF NORMALIZATION
NORMALIZATION
Normalization can be defined as follows:
Normalization is the process of organizing the data in the
database.
Normalization is used to minimize redundancy in a
relation or set of relations. It is also used to eliminate
undesirable characteristics such as insertion, update and deletion
anomalies.
Normalization divides a larger table into smaller tables and
links them using relationships.
Types of normalization
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
First Normal Form (1NF)
First normal form enforces these criteria:
A relation is in 1NF if every attribute contains only atomic values.
An attribute of a table cannot hold multiple values;
it must be single-valued.
First normal form disallows multi-valued attributes,
composite attributes, and their combinations.
First Normal Form
Table_Product
Product_id  Colour       Price
1           Black, red   Rs.210
2           Green        Rs.150
3           Red          Rs.110
4           Green, blue  Rs.260
5           Black        Rs.100
This table is not in first normal form because the “Colour” column contains
multiple values.
After decomposing it into first normal form it looks like:
Product_id  Price
1           Rs.210
2           Rs.150
3           Rs.110
4           Rs.260
5           Rs.100
Product_id  Colour
1           Black
1           Red
2           Green
3           Red
4           Green
4           Blue
5           Black
EMPLOYEE table:
EMP_ID  EMP_NAME  EMP_PHONE               EMP_STATE
14      John      7272826385, 9064738238  UP
20      Harry     8574783832              Bihar
12      Sam       7390372389, 8589830302  Punjab
The decomposition of the EMPLOYEE table into 1NF:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385 UP
14 John 9064738238 UP
20 Harry 8574783832 Bihar
12 Sam 7390372389 Punjab
12 Sam 8589830302 Punjab
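The same flattening can be sketched in Python. This is an illustration only: it assumes the multi-valued column is stored as a comma-separated string, and the helper name `to_1nf` is invented for the example.

```python
# EMPLOYEE rows before normalization: (EMP_ID, EMP_NAME, EMP_PHONE, EMP_STATE).
employee = [
    (14, "John", "7272826385, 9064738238", "UP"),
    (20, "Harry", "8574783832", "Bihar"),
    (12, "Sam", "7390372389, 8589830302", "Punjab"),
]

def to_1nf(rows, multi_col):
    """Emit one row per value found in the comma-separated column multi_col."""
    result = []
    for row in rows:
        for value in row[multi_col].split(","):
            new_row = list(row)
            new_row[multi_col] = value.strip()
            result.append(tuple(new_row))
    return result

for row in to_1nf(employee, 2):
    print(row)
```

The result has five rows, one per phone number. EMP_ID alone no longer identifies a row in this table, so {EMP_ID, EMP_PHONE} takes over as the key.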
Second Normal Form (2NF)
A table is said to be in 2NF if both the following conditions hold:
The table is in 1NF (first normal form).
No non-prime attribute is dependent on a proper subset of
any candidate key of the table. Such a dependency is called a partial
dependency.
A table in 2NF must not have partial dependencies.
An attribute that is not part of any candidate key is known
as a non-prime attribute.
SECOND NORMAL FORM
Table purchase detail
Customer_id  Store_id  Location
1            1         Patna
1            3         Noida
2            1         Patna
3            2         Delhi
4            3         Noida
This table has a composite primary key, i.e. {customer id, store id}. The
non-key attribute is location. In this case location depends only on store id,
which is part of the primary key.
After decomposing it into second normal form it looks like:
Table Purchase
Customer_id  Store_id
1            1
1            3
2            1
3            2
4            3
Table Store
Store_id  Location
1         Patna
2         Delhi
3         Noida
Third Normal Form (3NF)
A table design is said to be in 3NF if both the following conditions hold:
The table is in 2NF.
No non-prime attribute is transitively dependent on a candidate key.
An attribute that is not part of any candidate key is known as a non-prime
attribute; an attribute that is part of some candidate key is known as a
prime attribute.
In other words: a table is in 3NF if it is in 2NF and for each non-trivial
functional dependency X → Y at least one of the following conditions holds:
X is a super key of the table
Y is a prime attribute of the table
THIRD NORMAL FORM
Table Book Details
Book_id  Genre_id  Genre type  Price
1        1         Fiction     100
2        2         Sports      110
3        1         Fiction     120
4        3         Travel      130
5        2         Sports      140
In the table, book_id determines genre_id and genre_id determines genre
type. Therefore book_id determines genre type via genre_id, and we have a
transitive functional dependency.
After decomposing it into third normal form it looks like:
TABLE BOOK
Book_id  Genre_id  Price
1        1         100
2        2         110
3        1         120
4        3         130
5        2         140
TABLE GENRE
Genre_id  Genre type
1         Fiction
2         Sports
3         Travel
Boyce-Codd Normal Form (BCNF)
It is an advanced version of 3NF, which is why it is also referred to as
3.5NF.
BCNF is stricter than 3NF.
A table complies with BCNF if it is in 3NF and for every non-trivial
functional dependency X → Y, X is a super key of the
table.
Boyce-Codd Normal Form
Student  Course  Teacher
Aman     DBMS    AYUSH
Aditya   DBMS    RAJ
Abhinav  E-COMM  RAHUL
Aman     E-COMM  RAHUL
Abhinav  DBMS    RAJ
KEY: {Student, Course}
Functional dependencies: {Student, Course} → Teacher, Teacher → Course
Problem: Teacher is not a super key but determines Course.
After decomposing it into Boyce-Codd normal form on the violating
dependency Teacher → Course, it looks like:
Student  Teacher
Aman     AYUSH
Aditya   RAJ
Abhinav  RAHUL
Aman     RAHUL
Abhinav  RAJ
Teacher  Course
AYUSH    DBMS
RAJ      DBMS
RAHUL    E-COMM
Splitting into (Student, Course) and (Course, Teacher) instead would be lossy,
because Course does not determine Teacher (DBMS is taught by both AYUSH and RAJ).
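The BCNF condition can be tested from the FDs alone: an FD violates BCNF when it is non-trivial and the closure of its left side does not cover the whole schema. The sketch below is a minimal illustration under that definition; the function names and the tuple-based FD encoding are assumptions made for the example.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute tuples)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Return the non-trivial FDs whose left side is not a super key."""
    schema = set(schema)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not schema <= closure(lhs, fds)
    ]

schema = ("Student", "Course", "Teacher")
fds = [
    (("Student", "Course"), ("Teacher",)),
    (("Teacher",), ("Course",)),
]

print(bcnf_violations(schema, fds))
# [(('Teacher',), ('Course',))]

# A relation holding only Teacher and Course, with Teacher -> Course, is in BCNF.
print(bcnf_violations(("Teacher", "Course"), [(("Teacher",), ("Course",))]))
# []
```

The check reports Teacher → Course as the only violation, which matches the problem stated on the slide.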
Loss less Join Decomposition
• Lossless decomposition ensures-
No information is lost from the original relation during decomposition.
When the sub relations are joined back, the same relation is obtained
that was decomposed.
A decomposition should always be lossless.
Consider there is a relation R which is decomposed into sub
relations R1 , R2 , …. , Rn.
This decomposition is called lossless join decomposition when the join
of the sub relations results in the same relation R that was
decomposed.
For lossless join decomposition, we always have-
• R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn = R
• where ⋈ is a natural join operator
Example 1/2
• Consider the following relation R( A , B , C )
• R( A , B , C )
A B C
1 2 1
2 5 3
3 3 3
• Consider this relation is decomposed into two sub relations
R1( A , B ) and R2( B , C )
Example 2/2
• The two sub relations are-
• R1( A , B )
A B
1 2
2 5
3 3
• R2( B , C )
B C
2 1
5 3
3 3
• For lossless decomposition, we must have-
• R1 ⋈ R2 = R
• Now, if we perform the natural join ( ⋈ ) of the sub relations R1 and R2 , we get-
A B C
1 2 1
2 5 3
3 3 3
This relation is the same as the original relation R.
Thus, we conclude that the above decomposition is a lossless join decomposition.
NOTE
Lossless join decomposition is also known as non-
additive join decomposition.
This is because the resultant relation after joining the
sub relations is same as the decomposed relation.
No extraneous tuples appear after joining of the sub-
relations.
Lossy Join Decomposition
Consider there is a relation R which is decomposed into sub relations R 1 ,
R2 , …. , Rn.
This decomposition is called lossy join decomposition when the join of the
sub relations does not result in the same relation R that was decomposed.
The natural join of the sub relations is always found to have some
extraneous tuples.
For lossy join decomposition, we always have
R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn ⊃ R
where ⋈ is a natural join operator
Example 1/2
• Consider the following relation R( A , B , C )
• R( A , B , C )
•
A B C
1 2 1
2 5 3
3 3 3
• Consider this relation is decomposed into two sub relations as R 1( A , C ) and
R2( B , C )
Example 2/2
• The two sub relations are
• R1( A , C )
A C
1 1
2 3
3 3
• R2( B , C )
B C
2 1
5 3
3 3
• For lossy decomposition, we must have
• R1 ⋈ R2 ⊃ R
• Now, if we perform the natural join ( ⋈ ) of the sub relations R1 and R2 we get
A B C
1 2 1
2 5 3
2 3 3
3 5 3
3 3 3
This relation is not the same as the original relation R and contains some
extraneous tuples.
Clearly, R1 ⋈ R2 ⊃ R.
Thus, we conclude that the above decomposition is lossy.
NOTE
Lossy join decomposition is also known as careless
decomposition.
This is because extraneous tuples get introduced in
the natural join of the sub-relations.
Extraneous tuples make the identification of the
original tuples difficult.
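Both examples (the lossless split into AB and BC, and the lossy split into AC and BC) can be reproduced with a small natural join. The sketch below is illustrative only: it models a tuple as a frozenset of (attribute, value) pairs, and the helper names `relation`, `project` and `natural_join` are invented for the example.

```python
def relation(attrs, tuples):
    """Build a relation as a set of frozensets of (attribute, value) pairs."""
    return {frozenset(zip(attrs, t)) for t in tuples}

def project(rows, attrs):
    """Keep only the given attributes; duplicates collapse automatically."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rows}

def natural_join(r1, r2):
    """Combine every pair of tuples that agree on all shared attributes."""
    joined = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                joined.add(frozenset({**d1, **d2}.items()))
    return joined

R = relation("ABC", [(1, 2, 1), (2, 5, 3), (3, 3, 3)])

lossless = natural_join(project(R, "AB"), project(R, "BC"))
lossy = natural_join(project(R, "AC"), project(R, "BC"))

print(lossless == R)          # True
print(lossy > R, len(lossy))  # True 5
```

The second join returns five tuples, a proper superset of R: the two extra tuples come from pairing every A value with every B value that shares C = 3.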
Types of Functional Dependencies
• Multivalued dependency
• Trivial functional dependency
• Non-trivial functional dependency
• Transitive dependency
• Inclusion dependency
Multivalued dependency in DBMS
Multivalued dependency occurs in the situation where there are
multiple independent multivalued attributes in a single table.
A multivalued dependency is a complete constraint between
two sets of attributes in a relation.
It requires that certain tuples be present in a relation.
Example
Car_model Maf_year Color
H001 2017 Metallic
H001 2017 Green
H005 2018 Metallic
H005 2018 Blue
H010 2015 Metallic
H033 2012 Gray
• In this example, maf_year and color are independent of each other
but each depends on car_model. These two columns are said to
be multivalue dependent on car_model.
• This dependence is written with a double arrow:
• car_model →→ maf_year
• car_model →→ color
Inclusion Dependency
Multivalued dependencies and join dependencies can be used to guide database
design, although both are less common than functional dependencies.
Inclusion dependencies are quite common, but they typically have little influence on
the design of the database.
An inclusion dependency is a statement that some columns of a relation are
contained in other columns (usually of a second relation).
A foreign key is an example of an inclusion dependency: the referring column(s)
in one relation must be contained in the primary key column(s) of the referenced
relation.
Suppose we have two relations R and S that were obtained by translating two
entity sets such that every R entity is also an S entity.
An inclusion dependency holds if projecting R on its key attributes yields
a relation that is contained in the relation obtained by projecting S on its key
attributes.
When decomposing, we should not split groups of attributes that participate
in an inclusion dependency.
In practice, most inclusion dependencies are key-based, that is, they involve only keys.
Fourth Normal Form
(4NF)
A relation is in 4NF if it is in Boyce-Codd normal form and has
no non-trivial multi-valued dependency other than those whose left side is a super key.
A multi-valued dependency A →→ B holds if, for a single value of A, there is a set
of B values that is independent of the remaining attributes.
FOURTH NORMAL FORM
Student  Major       Hobby
Aman     Management  Football
Aman     Management  Cricket
Raj      Management  Football
Raj      Medical     Football
Ram      Management  Cricket
Aditya   Btech       Football
Abhinav  Btech       Cricket
Key: {Student, Major, Hobby}
MVD: Student →→ Major, Student →→ Hobby
After decomposing it into fourth normal form it looks like:
Student  Major
Aman     Management
Raj      Management
Raj      Medical
Ram      Management
Aditya   Btech
Abhinav  Btech
Student  Hobby
Aman     Football
Aman     Cricket
Raj      Football
Ram      Cricket
Aditya   Football
Abhinav  Cricket
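Whether a multivalued dependency holds in a given instance can be checked directly: within each group of rows sharing the same left-side value, every right-side value must appear together with every value of the remaining attributes. The sketch below assumes rows are tuples ordered like the attribute list; `mvd_holds` is a name made up for the example.

```python
from itertools import product

def mvd_holds(rows, attrs, x, y):
    """Return True if the MVD X ->> Y holds in rows (tuples ordered like attrs)."""
    ix = [attrs.index(a) for a in x]
    iy = [attrs.index(a) for a in y]
    iz = [i for i in range(len(attrs)) if i not in ix and i not in iy]
    groups = {}  # X value -> (Y values, Z values, observed (Y, Z) pairs)
    for row in rows:
        key = tuple(row[i] for i in ix)
        y_val = tuple(row[i] for i in iy)
        z_val = tuple(row[i] for i in iz)
        ys, zs, pairs = groups.setdefault(key, (set(), set(), set()))
        ys.add(y_val)
        zs.add(z_val)
        pairs.add((y_val, z_val))
    # Within each X group, every Y value must occur with every Z value.
    return all(pairs == set(product(ys, zs)) for ys, zs, pairs in groups.values())

attrs = ["Student", "Major", "Hobby"]
rows = [
    ("Aman", "Management", "Football"),
    ("Aman", "Management", "Cricket"),
    ("Raj", "Management", "Football"),
    ("Raj", "Medical", "Football"),
    ("Ram", "Management", "Cricket"),
    ("Aditya", "Btech", "Football"),
    ("Abhinav", "Btech", "Cricket"),
]

print(mvd_holds(rows, attrs, ["Student"], ["Major"]))  # True

# If Aman instead had (Management, Football) and (Medical, Cricket) only,
# majors and hobbies would no longer be independent.
broken = [r for r in rows if r != ("Aman", "Management", "Cricket")]
broken.append(("Aman", "Medical", "Cricket"))
print(mvd_holds(broken, attrs, ["Student"], ["Major"]))  # False
```

When the check returns True, the table can be split into (Student, Major) and (Student, Hobby) without losing information.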
Fifth Normal Form
(5NF)
A relation is said to be in 5NF if and only if:
It is in 4NF.
If the table is decomposed further to eliminate redundancy and
anomalies, re-joining the decomposed tables on their common attributes
must not lose any original records or create new ones. In simple words,
joining two or more decomposed tables should neither lose records nor
create new records.
FIFTH NORMAL FORM
Seller   Company            Product
Aman     Coca cola company  Thumps Up
Aditya   Unilever           Ponds
Aditya   Unilever           Axe
Aditya   Unilever           Lakme
Abhinav  P&G                Vicks
Abhinav  Pepsico            Pepsi
Key: {Seller, Company, Product}
Dependency: Seller is related to Company and to Product, and Product is related
to Company, so the table equals the join of its three pairwise projections
(a join dependency).
After decomposing it into fifth normal form it looks like:
Seller   Product
Aman     Thumps Up
Aditya   Ponds
Aditya   Axe
Aditya   Lakme
Abhinav  Vicks
Abhinav  Pepsi
Seller   Company
Aman     Coca cola company
Aditya   Unilever
Abhinav  P&G
Abhinav  Pepsico
Company            Product
Coca cola company  Thumps Up
Unilever           Ponds
Unilever           Axe
Unilever           Lakme
Pepsico            Pepsi
P&G                Vicks
Introduction to Database Keys
What are DBMS Keys
Keys are a very important part of the relational database model. They
are used to establish and identify relationships between tables and
also to uniquely identify any record or row of data in a
relation (table).
A Key can be a single attribute or a group of attributes, where the
combination may act as a key.
Example
• Student
SID NAME BRANCH EMAIL
3 Adam IT [email protected]
SID and EMAIL are keys for Student table
Why we need DBMS Keys
To identify any row of data in a table
uniquely.
To enforce the identity of data and ensure that the integrity
of data is maintained.
To establish and identify relationships between tables.
Types of DBMS Keys
• Super key
• Candidate key
• Primary key
• Alternate key
• Foreign key
• Composite & Compound key
• Surrogate key
Super Key
Super Key is defined as a set of attributes within a table that can
uniquely identify each record within a table. Every candidate key is a
super key, and any superset of a candidate key is also a super key.
SID  Reg_ID  NAME  BRANCH  EMAIL
1    CS-37   John  CSE     [email protected]
2    CS-02   Adam  CSE     [email protected]
3    IT-01   Adam  IT      [email protected]
4    ECE-07  Elly  ECE     [email protected]
Keys:
• SID
• Reg_ID
• EMAIL
• SID + Reg_ID
• Reg_ID + EMAIL
• EMAIL + SID
• SID + Reg_ID + EMAIL
Candidate Key
A candidate key is a minimal set of attributes that can uniquely
identify any data row in the table.
It is an attribute or a set of attributes that can act as a primary
key for a table to uniquely identify each record in that table.
There can be more than one candidate key for a table.
A candidate key can never be NULL or empty, and its value
should be unique.
A candidate key can be a combination of more than one
column (attribute).
If any proper subset of a super key is itself a super key, then that
super key cannot be a candidate key.
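Given a set of FDs, candidate keys can be found by searching for minimal attribute sets whose closure covers the whole schema. The sketch below is a brute-force illustration, fine for small schemas only. It reuses the Student schema and the three FDs from the functional dependency examples earlier in this unit; the function names are assumptions made for the example.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute tuples)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Return every minimal attribute set whose closure covers the schema."""
    schema = set(schema)
    keys = []
    # Try smaller sets first so that any set containing a found key is skipped.
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) >= schema:
                keys.append(candidate)
    return keys

schema = ("studName", "rollNo", "gen", "dept", "hostelName", "roomNo")
fds = [
    (("rollNo",), ("studName", "gen", "dept", "hostelName", "roomNo")),
    (("hostelName", "roomNo"), ("rollNo",)),
    (("hostelName",), ("gen",)),
]

for key in candidate_keys(schema, fds):
    print(sorted(key))
# ['rollNo']
# ['hostelName', 'roomNo']
```

Any superset of one of these two sets is a super key but not a candidate key, because it is not minimal.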
Primary Key
The primary key is selected from the candidate
keys and becomes the identifying key of a table.
It can uniquely identify any data row of the table.
It is chosen by the database designer or DBA.
Foreign Key
A foreign key is an attribute (or set of attributes) in one table that
refers to the primary key of another table.
Hence, the foreign key is useful in linking together two tables.
Data should be entered in the foreign key column with great
care, as wrongly entered data can invalidate the relationship
between the two tables.
Composite Key
If no single attribute of a table is capable of being the key, i.e.
none can identify a row uniquely, then we combine two or more
attributes to form a key. This is known as a composite key.
Any key with more than one attribute is called a composite key.
In the super key example above, (SID, Reg_ID), (Reg_ID, EMAIL),
(EMAIL, SID) and (SID, Reg_ID, EMAIL) are all composite keys.
Compound Key
If a composite key has at least one attribute which is a foreign
key, then it is called a compound key.
Surrogate Key
If a relation has no attribute that can naturally be used to identify
the data stored in it, then we create an attribute for this purpose.
It adds no meaning to the data but serves the sole purpose of
identifying rows uniquely in a table.
Closure of FD (Example)
GIVEN:
Step 1. A → B
Step 2. B → C
Step 3. AB → D
Step 4. From A → B, A → AB (augmentation of 1 with A)
Step 5. From A → B and B → C, A → C (transitivity, 1 and 2)
Step 6. From A → AB and AB → D, A → D (transitivity, 4 and 3), so A → ABD (union with 4)
Step 7. From A → ABD and A → C, A → ABCD (union, 5 and 6)