PROBLEM ON NORMALIZATION
NOTE:
Functional Dependency
Try to find an attribute or set of attributes from which all the attributes of
the relation can be derived. Such a set is a KEY.
Example #1
Take a look at these functional dependencies in the relation A (P, Q, R, S, T)
Here,
P -> QR,
RS -> T,
Q -> S,
T -> P
In the relation given above, the candidate keys are {P}, {T}, {R, S} and
{Q, R}. Every attribute P, Q, R, S and T belongs to at least one of these
candidate keys, so all the attributes of A are prime.
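The key search described in the note above can be checked mechanically. Below is a minimal brute-force sketch in Python, assuming each FD is written as a (left side, right side) pair of sets of single-letter attribute names; `closure` and `candidate_keys` are hypothetical helper names, not part of any library. It computes attribute closures and tries attribute sets from smallest to largest, skipping supersets of keys already found. That is fine for textbook-sized relations but exponential in general.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure attrs+ under fds (a list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD once its whole left side is already in the closure.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            # A superset of a key already found cannot be minimal.
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys

# Example #1: A(P, Q, R, S, T) with P -> QR, RS -> T, Q -> S, T -> P.
fds = [({"P"}, {"Q", "R"}), ({"R", "S"}, {"T"}), ({"Q"}, {"S"}), ({"T"}, {"P"})]
keys = candidate_keys("PQRST", fds)
print(["".join(sorted(k)) for k in keys])  # prints ['P', 'T', 'QR', 'RS']
```

The output matches the candidate keys listed for Example #1.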
Example #2 Identify and discuss each of the indicated dependencies in the
dependency diagram shown in the figure below.
Example #3 Identify and discuss each of the indicated dependencies in the
dependency diagram shown in the figure below.
Example #4: The solution for the problem below is at [Link]
1. To keep track of students and courses, a new college uses the table
structure in Figure a.
Draw the dependency diagram for this table.
Figure a: For question 1, by A. Watt.
2. Using the dependency diagram you just drew, show the tables (in their
third normal form) you would create to fix the problems you encountered.
Draw the dependency diagram for the fixed table.
3. An agency called Instant Cover supplies part-time/temporary staff to hotels
in Scotland. Figure b lists the time spent by agency staff working at various
hotels. The national insurance number (NIN) is unique for every member
of staff. Use Figure 12.4 to answer questions (a) and (b).
Figure b: For question 8, by A. Watt.
(a) This table is susceptible to update anomalies. Provide examples of
insertion, deletion and update anomalies.
(b) Normalize this table to third normal form. State any assumptions.
1. Boyce-Codd Normal Form (BCNF)
When a table has more than one candidate key, anomalies may result
even though the relation is in 3NF. Boyce-Codd normal form is a
special case of 3NF. A relation is in BCNF if, and only if, every
determinant is a candidate key.
2. BCNF Example 1
Consider the following table (St_Maj_Adv).
Student_id Major Advisor
111 Physics Smith
111 Music Chan
320 Math Dobbs
671 Physics White
803 Physics Smith
The semantic rules (business rules applied to the database) for this table are:
1. Each Student may major in several subjects.
2. For each Major, a given Student has only one Advisor.
3. Each Major has several Advisors.
4. Each Advisor advises only one Major.
5. Each Advisor advises several Students in one Major.
The functional dependencies for this table are listed below. The determinant of
the first one is a candidate key; the determinant of the second is not.
1. Student_id, Major → Advisor
2. Advisor → Major
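Anomalies for this table include:
1. Delete: deleting a student's row can also delete the only record of which
major an advisor advises.
2. Insert: a new advisor cannot be recorded until a student is assigned to them.
3. Update: changing an advisor's major requires updating several rows, and
missing one leaves the data inconsistent.
Note: No single attribute is a candidate key.
The primary key can be {Student_id, Major} or {Student_id, Advisor}.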
To reduce the St_Maj_Adv relation to BCNF, you create two new tables:
1. St_Adv (Student_id, Advisor)
2. Adv_Maj (Advisor, Major)
St_Adv table
Student_id Advisor
111 Smith
111 Chan
320 Dobbs
671 White
803 Smith
Adv_Maj table
Advisor Major
Smith Physics
Chan Music
Dobbs Math
White Physics
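A sample table can only show that an FD is not violated, never prove it holds, so the check below is a sanity test rather than a proof. It is a minimal Python sketch, assuming the rows are copied from the St_Maj_Adv table above; `fd_holds` is a hypothetical helper. It confirms that the sample rows respect Advisor → Major, and that joining St_Adv and Adv_Maj on Advisor gives back exactly the original rows.

```python
# Rows of St_Maj_Adv as (Student_id, Major, Advisor), copied from the table above.
st_maj_adv = {
    (111, "Physics", "Smith"),
    (111, "Music", "Chan"),
    (320, "Math", "Dobbs"),
    (671, "Physics", "White"),
    (803, "Physics", "Smith"),
}

def fd_holds(rows, lhs, rhs):
    """Hypothetical helper: True if the columns at positions lhs determine
    the columns at positions rhs in these rows."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        value = tuple(row[i] for i in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

# Project onto the two BCNF tables (sets drop duplicate rows, as a projection does).
st_adv = {(s, a) for s, m, a in st_maj_adv}
adv_maj = {(a, m) for s, m, a in st_maj_adv}

# Natural join on Advisor rebuilds (Student_id, Major, Advisor) rows.
rejoined = {(s, m, a) for s, a in st_adv for a2, m in adv_maj if a == a2}

print(fd_holds(st_maj_adv, lhs=[2], rhs=[1]))  # Advisor -> Major: True
print(rejoined == st_maj_adv)                  # join is lossless here: True
```

The decomposition is lossless in general, not just for these rows, because the shared column Advisor is a key of Adv_Maj.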
3. BCNF Example 2
Consider the following table (Client_Interview).
ClientNo InterviewDate InterviewTime StaffNo RoomNo
CR76 13-May-02 10.30 SG5 G101
CR56 13-May-02 12.00 SG5 G101
CR74 13-May-02 12.00 SG37 G102
CR56 1-July-02 10.30 SG5 G102
FD1: ClientNo, InterviewDate → InterviewTime, StaffNo, RoomNo (primary key: PK)
FD2: StaffNo, InterviewDate, InterviewTime → ClientNo (candidate key: CK)
FD3: RoomNo, InterviewDate, InterviewTime → StaffNo, ClientNo (CK)
FD4: StaffNo, InterviewDate → RoomNo
A relation is in BCNF if, and only if, every determinant is a candidate key.
FD4 breaks this rule: its determinant (StaffNo, InterviewDate) is not a
candidate key. We therefore create one table that incorporates the first three
FDs (Client_Interview2 table) and another table (StaffRoom table) for the
fourth FD.
Client_Interview2 table
ClientNo InterviewDate InterviewTime StaffNo
CR76 13-May-02 10.30 SG5
CR56 13-May-02 12.00 SG5
CR74 13-May-02 12.00 SG37
CR56 1-July-02 10.30 SG5
StaffRoom table
StaffNo InterviewDate RoomNo
SG5 13-May-02 G101
SG37 13-May-02 G102
SG5 1-July-02 G102
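Client_Interview is the classic case of a relation that is in 3NF but not in BCNF. The Python sketch below classifies each FD, assuming single-letter abbreviations for the columns (C = ClientNo, D = InterviewDate, T = InterviewTime, S = StaffNo, R = RoomNo) and taking the candidate keys from the FD list above; `classify` is a hypothetical helper. FD4 passes the 3NF test only because RoomNo is prime (it belongs to the key RDT), and it fails the stricter BCNF test.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds (a list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Abbreviations (assumed for brevity): C=ClientNo, D=InterviewDate,
# T=InterviewTime, S=StaffNo, R=RoomNo.
ATTRS = set("CDTSR")
FDS = [
    (set("CD"), set("TSR")),   # FD1
    (set("SDT"), set("C")),    # FD2
    (set("RDT"), set("SC")),   # FD3
    (set("SD"), set("R")),     # FD4
]
KEYS = [set("CD"), set("SDT"), set("RDT")]  # candidate keys named in the text
PRIME = set().union(*KEYS)

def classify(lhs, rhs):
    """'BCNF' if lhs is a superkey; '3NF only' if every non-trivial
    right-side attribute is prime; otherwise 'violates 3NF'."""
    if closure(lhs, FDS) == ATTRS:
        return "BCNF"
    if (rhs - lhs) <= PRIME:
        return "3NF only"
    return "violates 3NF"

for i, (lhs, rhs) in enumerate(FDS, start=1):
    print(f"FD{i}:", classify(lhs, rhs))
```

FD1 to FD3 should come out as BCNF and FD4 as "3NF only", which is why FD4 is moved into the StaffRoom table.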
Solution:
Can we factor this out?
The LOTS1 relation above (EN figure above) is not in 3NF because of the
transitive dependency Area ⟶ Price.
So we factor on Area ⟶ Price, dividing LOTS1 into LOTS1A(property_ID,
county, lot_num, area) and LOTS1B(area, price). Another approach would be to
drop price entirely, if it is in fact proportional to area, and simply treat it as a
computed attribute.
Example: Suppose there is a company wherein employees work in more than
one department. They store the data like this:
emp_id  emp_nationality  emp_dept                      dept_type  dept_no_of_emp
1001    Austrian         Production and planning       D001       200
1001    Austrian         stores                        D001       250
1002    American         design and technical support  D134       100
1002    American         Purchasing department         D134       600
Functional dependencies in the table above:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate key: {emp_id, emp_dept}
The table is not in BCNF because neither emp_id nor emp_dept alone is a key, yet each is the determinant of a functional dependency.
To make the table comply with BCNF, we can break it into three tables like
this:
emp_nationality table:
emp_id  emp_nationality
1001    Austrian
1002    American
emp_dept table:
emp_dept                      dept_type  dept_no_of_emp
Production and planning       D001       200
stores                        D001       250
design and technical support  D134       100
Purchasing department         D134       600
emp_dept_mapping table:
emp_id  emp_dept
1001    Production and planning
1001    stores
1002    design and technical support
1002    Purchasing department
Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
This is now in BCNF because, in both functional dependencies, the left-hand
side is a key of the table that holds it.
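To confirm that each new table is in BCNF, the FDs have to be projected onto that table's columns. The Python sketch below does this by brute force: for every subset X of a table's columns, it computes X+ restricted to the table and flags a violation if X determines extra columns without being a superkey of that table. The names `closure` and `is_bcnf` are hypothetical helpers, and the subset enumeration is only practical for narrow tables like these.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure attrs+ under fds (a list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_bcnf(table, fds):
    """Check every projected FD X -> (X+ intersected with table)."""
    table = set(table)
    for size in range(1, len(table)):
        for combo in combinations(sorted(table), size):
            x = set(combo)
            implied = closure(x, fds) & table
            if implied != x and implied != table:
                return False  # X determines extra columns but is not a superkey
    return True

FDS = [({"emp_id"}, {"emp_nationality"}),
       ({"emp_dept"}, {"dept_type", "dept_no_of_emp"})]

tables = {
    "original": {"emp_id", "emp_nationality", "emp_dept", "dept_type", "dept_no_of_emp"},
    "emp_nationality": {"emp_id", "emp_nationality"},
    "emp_dept": {"emp_dept", "dept_type", "dept_no_of_emp"},
    "emp_dept_mapping": {"emp_id", "emp_dept"},
}
for name, cols in tables.items():
    print(name, is_bcnf(cols, FDS))
```

It should report False for the original table and True for each of the three decomposed tables.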
Q.1: Consider the relation scheme R = {E, F, G, H, I, J, K, L, M, N} and the
set of functional dependencies {{E, F} -> {G}, {F} -> {I, J}, {E, H} -> {K, L},
{K} -> {M}, {L} -> {N}} on R. What is the key for R? (GATE-CS-2014)
A. {E, F}
B. {E, F, H}
C. {E, F, H, K, L}
D. {E}
Solution:
Finding attribute closure of all given options, we get:
{E,F}+ = {EFGIJ}
{E,F,H}+ = {EFHGIJKLMN}
{E,F,H,K,L}+ = {EFHGIJKLMN}
{E}+ = {E}
Both {E,F,H}+ and {E,F,H,K,L}+ contain all the attributes, but only EFH is
minimal (none of its proper subsets determines all the attributes). So EFH is
the candidate key, and the correct option is (B).
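The closures of the four options can be computed directly. In this minimal Python sketch, `closure` and `is_candidate_key` are hypothetical helpers and attributes are single letters. Minimality is tested by removing one attribute at a time. That is enough because closure is monotone: if any proper subset were a superkey, some subset with exactly one attribute removed would be a superkey too.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds (a list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

R = set("EFGHIJKLMN")
FDS = [(set("EF"), set("G")), (set("F"), set("IJ")), (set("EH"), set("KL")),
       (set("K"), set("M")), (set("L"), set("N"))]

def is_candidate_key(attrs):
    """A superkey none of whose one-smaller subsets is a superkey."""
    attrs = set(attrs)
    if closure(attrs, FDS) != R:
        return False
    return all(closure(attrs - {a}, FDS) != R for a in attrs)

for option in ["EF", "EFH", "EFHKL", "E"]:
    print(option, "".join(sorted(closure(set(option), FDS))), is_candidate_key(option))
```

Only EFH should print True. EFHKL reaches every attribute but is not minimal.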
Q.2: How to check whether an FD can be derived from a given FD set?
Solution:
To check whether an FD A->B can be derived from an FD set F,
1. Find (A)+ using FD set F.
2. If B is a subset of (A)+, then A->B is implied by F; otherwise it is not.
Q.3: In a schema with attributes A, B, C, D and E, the following set of
functional dependencies is given:
{A -> B, A -> C, CD -> E, B -> D, E -> A}
Which of the following functional dependencies is NOT implied by the
above set? (GATE IT 2005)
A. CD -> AC
B. BD -> CD
C. BC -> CD
D. AC -> BC
Solution:
Using FD set given in question,
(CD)+ = {CDEAB}, which means CD -> AC holds.
(BD)+ = {BD}, which means BD -> CD cannot hold. So this FD is not
implied by the FD set, and (B) is the required option.
The others can be checked in the same way.
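The Q.2 test (A → B follows from F exactly when B is a subset of A+) can be applied to every option of Q.3 at once. This is a minimal Python sketch with hypothetical helper names `closure` and `implies`, assuming single-letter attributes.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds (a list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implies(fds, lhs, rhs):
    """Q.2 test: lhs -> rhs follows from fds iff rhs is a subset of lhs+."""
    return set(rhs) <= closure(set(lhs), fds)

FDS = [(set("A"), set("B")), (set("A"), set("C")), (set("CD"), set("E")),
       (set("B"), set("D")), (set("E"), set("A"))]

options = {"A": ("CD", "AC"), "B": ("BD", "CD"), "C": ("BC", "CD"), "D": ("AC", "BC")}
for label, (lhs, rhs) in options.items():
    print(label, lhs, "->", rhs, implies(FDS, lhs, rhs))
```

Only option B should print False, since (BD)+ = {B, D} does not contain C.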
Q.4: Consider a relation scheme R = (A, B, C, D, E, H) on which the
following functional dependencies hold: {A–>B, BC–> D, E–>C, D–>A}.
What are the candidate keys of R? [GATE 2005]
(a) AE, BE
(b) AE, BE, DE
(c) AEH, BEH, BCH
(d) AEH, BEH, DEH
Solution:
(AE)+ = {ABECD}, which does not contain H, so AE is not a candidate
key. Hence options (a) and (b) are wrong.
(AEH)+ = {ABCDEH}
(BEH)+ = {BEHCDA}
(DEH)+ = {DEHACB}
(BCH)+ = {BCHDA}, which does not contain E, so BCH is not a
candidate key. Hence option (c) is wrong.
So the correct answer is (d).
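The search for keys can be narrowed with the same observation the arrow diagrams in the following sections rely on: an attribute that appears on no right-hand side can never be derived, so it must belong to every key. In Q.4 those attributes are E and H. The Python sketch below starts from them and adds the remaining attributes in increasing numbers. `closure` and `candidate_keys` are hypothetical helper names, and attributes are assumed to be single letters.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure attrs+ under fds (a list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    attributes = set(attributes)
    # Attributes on no right-hand side cannot be derived, so every key
    # must contain them (the "arrow diagram" observation).
    derived = set().union(*(rhs for _, rhs in fds))
    core = attributes - derived
    rest = sorted(attributes - core)
    keys = []
    for size in range(len(rest) + 1):
        for extra in combinations(rest, size):
            cand = core | set(extra)
            if any(k <= cand for k in keys):
                continue  # superset of a known key, not minimal
            if closure(cand, fds) == attributes:
                keys.append(cand)
    return keys

FDS = [(set("A"), set("B")), (set("BC"), set("D")), (set("E"), set("C")), (set("D"), set("A"))]
keys = candidate_keys("ABCDEH", FDS)
print(sorted("".join(sorted(k)) for k in keys))  # prints ['AEH', 'BEH', 'DEH']
```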
Question on Second Normal Form (2NF):
1. Given a relation R(A, B, C, D) and functional dependency set FD = {AB
→ CD, B → C}, determine whether R is in 2NF. If not, convert it
into 2NF.
Solution: Let us construct an arrow diagram on R using FD to calculate the
candidate key.
From the arrow diagram on R above, we can see that the attributes A and B are not
determined by any of the given FDs, hence AB will be an integral part of the
candidate key: no matter what the candidate keys are, or how many there
are, all of them must contain AB.
Let us calculate the closure of AB
AB + = ABCD (from the method we studied earlier)
Since the closure of AB contains all the attributes of R, AB is a candidate
key.
From the definition of a candidate key (a super key none of whose proper
subsets is a super key): since every key must contain AB, and we have proved
that AB is a candidate key, any proper superset of AB is a super key but not a
candidate key.
Hence there is only one candidate key, AB.
Definition of 2NF: No non-prime attribute should be partially dependent on the
candidate key.
Since R has 4 attributes, A, B, C and D, and the candidate key is AB, the prime
attributes (part of the candidate key) are A and B, while the non-prime attributes
are C and D.
a) FD AB → CD satisfies the definition of 2NF: the non-prime attributes C and
D are fully dependent on the candidate key AB.
b) FD B → C does not satisfy the definition of 2NF, as the non-prime attribute C
is partially dependent on the candidate key AB (it depends on B, which is only
part of the key).
Because of FD B → C, the table R(A, B, C, D) is not in 2NF.
Convert the table R(A, B, C, D) into 2NF:
Since FD B → C kept the table out of 2NF, let us decompose the table:
R1(B, C)
Since the key is AB, FD AB → CD suggests R2(A, B, C, D), but that table would
again have the partial dependency B → C; since C is already stored in R1, we
use R2(A, B, D).
Finally, the decomposed tables, which are in 2NF, are:
a) R1(B, C)
b) R2(A, B, D)
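The 2NF test used in this solution asks whether any non-prime attribute depends on only part of the key. It can be sketched in Python as below. The sketch assumes a single candidate key, as in these questions, and `partial_dependencies` is a hypothetical helper. With several keys, the prime attributes would be the union of all keys and each key would be checked. Because it uses closures rather than only the listed FDs, it also catches partial dependencies reached through a chain.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure attrs+ under fds (a list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(attributes, fds, key):
    """(part, attrs) pairs where a proper part of the key determines
    non-prime attributes, i.e. 2NF violations (single key assumed)."""
    non_prime = set(attributes) - set(key)
    found = []
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            extra = closure(set(part), fds) & non_prime
            if extra:
                found.append(("".join(part), "".join(sorted(extra))))
    return found

FDS = [(set("AB"), set("CD")), (set("B"), set("C"))]
print(partial_dependencies("ABCD", FDS, key="AB"))  # prints [('B', 'C')]
```

For Question 4 later in this section (FDs A → B, B → E, C → D, key AC), the same function reports ('A', 'BE') and ('C', 'D'), because A determines E through B.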
2. Given a relation R(P, Q, R, S, T) and functional dependency set FD = {
PQ → R, S → T}, determine whether R is in 2NF. If not, convert it
into 2NF.
Solution: Let us construct an arrow diagram on R using FD to calculate the
candidate key.
From the arrow diagram on R above, we can see that the attributes P, Q and S are not
determined by any of the given FDs, hence PQS will be an integral part of the
candidate key: no matter what the candidate keys are, or how many there
are, all of them must contain PQS.
Let us calculate the closure of PQS
PQS + = PQSRT (from the method we studied earlier)
Since the closure of PQS contains all the attributes of R, PQS is a
candidate key.
From the definition of a candidate key (a super key none of whose proper
subsets is a super key): since every key must contain PQS, and we have proved
that PQS is a candidate key, any proper superset of PQS is a super key but not a
candidate key.
Hence there is only one candidate key, PQS.
Definition of 2NF: No non-prime attribute should be partially dependent on the
candidate key.
Since R has 5 attributes, P, Q, R, S and T, and the candidate key is PQS, the
prime attributes (part of the candidate key) are P, Q and S, while the non-prime
attributes are R and T.
a) FD PQ → R does not satisfy the definition of 2NF, as the non-prime attribute
R is partially dependent on part (PQ) of the candidate key PQS.
b) FD S → T does not satisfy the definition of 2NF, as the non-prime attribute T
is partially dependent on part (S) of the candidate key PQS.
Hence, because of FDs PQ → R and S → T, the table R(P, Q, R, S, T) is not in
2NF.
Convert the table R(P, Q, R, S, T) into 2NF:
Since FDs PQ → R and S → T kept the table out of 2NF, let us decompose
the table:
R1(P, Q, R) (in table R1, PQ → R is a full FD, hence R1 is in 2NF)
R2(S, T) (in table R2, S → T is a full FD, hence R2 is in 2NF)
And create one table for the key, since the key is PQS:
R3(P, Q, S)
Finally, the decomposed tables, which are in 2NF, are:
a) R1(P, Q, R)
b) R2(S, T)
c) R3(P, Q, S)
3. Given a relation R(P, Q, R, S, T, U, V, W, X, Y) and functional
dependency set FD = {PQ → R, PS → VW, QS → TU, P → X, W → Y},
determine whether R is in 2NF. If not, convert it into 2NF.
Solution: Let us construct an arrow diagram on R using FD to calculate the
candidate key.
From the arrow diagram on R above, we can see that the attributes P, Q and S are not
determined by any of the given FDs, hence PQS will be an integral part of the
candidate key: no matter what the candidate keys are, or how many there
are, all of them must contain PQS.
Let us calculate the closure of PQS
PQS + = P Q S R T U V W X Y (from the closure method we studied earlier)
Since the closure of PQS contains all the attributes of R, PQS is a
candidate key.
From the definition of a candidate key (a super key none of whose proper
subsets is a super key): since every key must contain PQS, and we have proved
that PQS is a candidate key, any proper superset of PQS is a super key but not a
candidate key.
Hence there is only one candidate key, PQS.
Definition of 2NF: No non-prime attribute should be partially dependent on the
candidate key.
Since R has 10 attributes, P, Q, R, S, T, U, V, W, X and Y, and the candidate key
calculated using FD = {PQ → R, PS → VW, QS → TU, P → X, W → Y} is PQS,
the prime attributes (part of the candidate key) are P, Q and S, while the
non-prime attributes are R, T, U, V, W, X and Y.
a. FD PQ → R does not satisfy the definition of 2NF, as the non-prime
attribute R is partially dependent on part of the candidate key PQS.
b. FD PS → VW does not satisfy the definition of 2NF, as the non-prime
attributes V and W are partially dependent on part of the candidate key PQS.
c. FD QS → TU does not satisfy the definition of 2NF, as the non-prime
attributes T and U are partially dependent on part of the candidate key PQS.
d. FD P → X does not satisfy the definition of 2NF, as the non-prime attribute
X is partially dependent on part of the candidate key PQS.
e. FD W → Y does not by itself violate the definition of 2NF, as the non-prime
attribute Y depends on the non-prime attribute W, not directly on part of
the key.
Hence, because of FDs PQ → R, PS → VW, QS → TU and P → X, the table
R(P, Q, R, S, T, U, V, W, X, Y) is not in 2NF.
Convert the table R(P, Q, R, S, T, U, V, W, X, Y) into 2NF:
Since FDs PQ → R, PS → VW, QS → TU and P → X kept the table out of
2NF, let us decompose the table:
R1(P, Q, R) (in table R1, PQ → R is a full FD, hence R1 is in 2NF)
R2(P, S, V, W) (in table R2, PS → VW is a full FD, hence R2 is in 2NF)
R3(Q, S, T, U) (in table R3, QS → TU is a full FD, hence R3 is in 2NF)
R4(P, X) (in table R4, P → X is a full FD, hence R4 is in 2NF)
R5(W, Y) (in table R5, W → Y is a full FD, hence R5 is in 2NF)
And create one table for the key, since the key is PQS:
R6(P, Q, S)
Finally, the decomposed tables, which are in 2NF, are:
R1(P, Q, R)
R2(P, S, V, W)
R3(Q, S, T, U)
R4(P, X)
R5(W, Y)
R6(P, Q, S)
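Every decomposition above should be lossless: joining the tables must give back exactly R. One standard way to test this from the FDs alone is the chase (tableau) test, sketched below in Python as an illustration. `chase_lossless` is a hypothetical helper, and attributes are assumed to be single letters. Each table gets a tableau row with the distinguished symbol "a" in its own columns and a row-specific placeholder elsewhere. The FDs then force symbols to be equal. The decomposition is lossless if some row ends up as all "a".

```python
def chase_lossless(attributes, tables, fds):
    """Chase test: True if joining `tables` always reproduces the relation."""
    cols = sorted(attributes)
    # One tableau row per table: "a" where the table has the column,
    # otherwise a placeholder unique to that row.
    rows = [{c: "a" if c in t else f"b{i}" for c in cols}
            for i, t in enumerate(tables)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in rows:
                for r2 in rows:
                    if r1 is r2 or any(r1[c] != r2[c] for c in lhs):
                        continue
                    for c in rhs:
                        if r1[c] == r2[c]:
                            continue
                        # Prefer the distinguished symbol when equating.
                        keep = "a" if "a" in (r1[c], r2[c]) else min(r1[c], r2[c])
                        drop = {r1[c], r2[c]} - {keep}
                        for row in rows:  # rename throughout the column
                            if row[c] in drop:
                                row[c] = keep
                        changed = True
    return any(all(v == "a" for v in row.values()) for row in rows)

FDS = [(set("PQ"), set("R")), (set("PS"), set("VW")), (set("QS"), set("TU")),
       (set("P"), set("X")), (set("W"), set("Y"))]
tables = ["PQR", "PSVW", "QSTU", "PX", "WY", "PQS"]
print(chase_lossless("PQRSTUVWXY", tables, FDS))  # prints True
```

Here the row for the key table R6(P, Q, S) is filled in step by step by the FDs, which is why the separate key table matters. Without a table that contains the key, a decomposition like this can lose information.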
4. Given a relation R(A, B, C, D, E) and functional dependency set FD = {
A → B, B → E, C → D}, determine whether R is in 2NF. If not,
convert it into 2NF.
Solution: Let us construct an arrow diagram on R using FD to calculate the
candidate key.
From the arrow diagram on R above, we can see that the attributes A and C are not
determined by any of the given FDs, hence AC will be an integral part of the
candidate key: no matter what the candidate keys are, or how many there
are, all of them must contain AC.
Let us calculate the closure of AC
AC + = ACBED( from the closure method we studied earlier)
Since the closure of AC contains all the attributes of R, AC is a candidate
key.
From the definition of a candidate key (a super key none of whose proper
subsets is a super key): since every key must contain AC, and we have proved
that AC is a candidate key, any proper superset of AC is a super key but not a
candidate key.
Hence there is only one candidate key, AC.
Definition of 2NF: No non-prime attribute should be partially dependent on the
candidate key.
Since R has 5 attributes, A, B, C, D and E, and the candidate key is AC, the
prime attributes (part of the candidate key) are A and C, while the non-prime
attributes are B, D and E.
a. FD A → B does not satisfy the definition of 2NF, as the non-prime
attribute B is partially dependent on the candidate key AC (it depends on A,
which is only part of the key).
b. FD B → E does not by itself violate the definition of 2NF, as the non-prime
attribute E depends on the non-prime attribute B, not directly on part of
the key.
c. FD C → D does not satisfy the definition of 2NF, as the non-prime
attribute D is partially dependent on the candidate key AC (it depends on C,
which is only part of the key).
Hence, because of FDs A → B and C → D, the table R(A, B, C, D, E) is
not in 2NF.
Convert the table R(A, B, C, D, E) into 2NF:
Since FDs A → B and C → D kept the table out of 2NF, let us decompose
the table:
R1(A, B, E) (from FDs A → B and B → E: A determines both B and E, since
A+ = ABE)
R2(C, D) (in table R2, C → D is a full FD, hence R2 is in 2NF)
And create one table for the candidate key AC:
R3(A, C)
Finally, the decomposed tables, which are in 2NF, are:
a. R1(A, B, E)
b. R2(C, D)
c. R3(A, C)
Procedure: To verify whether a given relational schema R is in 2NF and, if not,
convert it to 2NF:
STEP 1: Calculate the candidate key of R using an arrow diagram on
R.
STEP 2: Verify each FD against the definition of 2NF (no non-prime attribute
should be partially dependent on the candidate key).
STEP 3: Make a set of the FDs which do not satisfy 2NF, i.e., all those FDs which
are partial.
STEP 4: Convert R into 2NF by decomposing it so that each table, based on an
FD, satisfies the definition of 2NF.
STEP 5: Once the decomposition based on FDs is completed, create a separate
table of the attributes in the candidate key.
STEP 6: All the tables obtained from STEP 4 and STEP 5 form the
required decomposition, in which each table is in 2NF.
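STEPs 3 to 6 can be sketched in Python as below. This is one illustrative reading of the procedure, not a standard routine. It assumes a single candidate key, as in all four questions, and `decompose_2nf` is a hypothetical name. Instead of following the listed FDs one by one, it uses attribute closures, so an attribute reached through a chain such as A → B → E is placed together with A.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure attrs+ under fds (a list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def decompose_2nf(attributes, fds, key):
    """Give each proper part of the (single, assumed) key its own table holding
    the non-prime attributes it determines, then add the key table."""
    attributes, key = set(attributes), set(key)
    non_prime = attributes - key
    tables, placed = [], set()
    # Smaller parts first, so an attribute goes with the smallest determinant.
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            extra = (closure(set(part), fds) & non_prime) - placed
            if extra:
                tables.append(set(part) | extra)
                placed |= extra
    # Whatever is left depends on the whole key and stays with it (STEP 5).
    tables.append(key | (non_prime - placed))
    return tables

FDS = [(set("A"), set("B")), (set("B"), set("E")), (set("C"), set("D"))]
for t in decompose_2nf("ABCDE", FDS, key="AC"):
    print("".join(sorted(t)))
```

On Question 4 this prints ABE, CD and AC, i.e. R1, R2 and R3 above. On Question 3 it places Y with P and S, giving one table (P, S, V, W, Y) instead of R2(P, S, V, W) plus R5(W, Y). That table is still in 2NF. Splitting off (W, Y) is the extra step 3NF would require.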
To solve questions on 3NF, we must understand both of its definitions:
Definition 1: A relational schema R is in 3NF if, first, it is in 2NF and, second,
no non-prime attribute is transitively dependent on the key of the table.
If X → Y and Y → Z hold, then X → Z also holds; this is a transitive
dependency, and it should not exist.
Definition 2: First, R should be in 2NF, and for every non-trivial
dependency X → Y between two sets of attributes X and Y (i.e., Y is
not a subset of X):
a. Either X is a super key
b. Or Y is a prime attribute.
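Definition 2 translates directly into a per-FD check. This minimal Python sketch examines only the listed FDs, as the worked questions do. It assumes the candidate keys have already been found, and `violations_3nf` is a hypothetical helper. The trivial part of each right-hand side (attributes already in X) is ignored before testing primality.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds (a list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def violations_3nf(attributes, fds, keys):
    """FDs that break Definition 2: X is not a super key and some attribute
    of Y outside X is non-prime."""
    attributes = set(attributes)
    prime = set().union(*keys)
    bad = []
    for lhs, rhs in fds:
        new = rhs - lhs  # drop the trivial part of X -> Y
        if not new:
            continue
        superkey = closure(lhs, fds) == attributes
        if not superkey and not new <= prime:
            bad.append(("".join(sorted(lhs)), "".join(sorted(rhs))))
    return bad

# FD set of Question 1 that follows: R(X, Y, Z), X -> Y, Y -> Z, candidate key X.
print(violations_3nf("XYZ", [(set("X"), set("Y")), (set("Y"), set("Z"))], keys=[set("X")]))
# prints [('Y', 'Z')]
```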
Question 1: Given a relation R(X, Y, Z) and functional dependency set FD = {
X → Y, Y → Z}, determine whether R is in 3NF. If not, convert it
into 3NF.
Solution: Let us construct an arrow diagram on R using FD to calculate the
candidate key.
From the arrow diagram on R above, we can see that the attribute X is not determined
by any of the given FDs, hence X will be an integral part of the candidate key:
no matter what the candidate keys are, or how many there are, all of them
must contain X.
Let us calculate the closure of X
X + = XYZ (from the closure method we studied earlier)
Since the closure of X contains all the attributes of R, X is a candidate key.
From the definition of a candidate key (a super key none of whose proper
subsets is a super key): since every key must contain X, and we have proved
that X is a candidate key, any proper superset of X is a super key but not a
candidate key.
Hence there is only one candidate key, X.
Definition of 3NF: A relational schema R is in 3NF if, first, it is in 2NF and,
second, no non-prime attribute is transitively dependent on the key of the
table.
If X → Y and Y → Z hold, then X → Z also holds; this is a transitive
dependency, and it should not exist.
Since R has 3 attributes, X, Y and Z, and the candidate key is X, the prime
attribute (part of the candidate key) is X, while the non-prime attributes are Y and Z.
Given FDs are X → Y and Y → Z.
So we can derive X → Z (a transitive dependency).
In FD X → Z, the non-prime attribute Z is transitively dependent on the
key of the table (X); hence, as per the definition of 3NF, R is not in 3NF,
because no non-prime attribute should be transitively dependent on the key of
the table.
Now check whether the table is in 2NF.
a. FD X → Y is in 2NF (the key is not broken, and Y is fully functionally
dependent on it).
b. FD Y → Z is also in 2NF (it does not violate the definition of 2NF).
Hence the table R(X, Y, Z) is in 2NF but not in 3NF.
We can also prove the same from Definition 2: First, R should be in 2NF, and for
every non-trivial dependency X → Y between two sets of attributes X and Y
(i.e., Y is not a subset of X):
a. Either X is a super key
b. Or Y is a prime attribute.
Since we have just proved that R is in 2NF, let us check it for 3NF using
Definition 2.
a. FD X → Y is in 3NF (as X is a super key).
b. FD Y → Z is not in 3NF (as Y is not a key and Z is not a prime attribute).
Hence, because of Y → Z, using Definition 2 of 3NF we can say that the table
R is not in 3NF.
Convert the table R(X, Y, Z) into 3NF:
Since FD Y → Z kept the table out of 3NF, let us decompose the table.
FD Y → Z was creating the issue, hence one table R1(Y, Z).
Create one table for the key X: R2(X, Y), since X → Y.
Hence the decomposed tables, which are in 3NF, are:
R1(Y, Z)
R2(X, Y)
Question 2: Given a relation R(X, Y, Z, W, P) and functional dependency set
FD = {X → Y, Y → P, Z → W}, determine whether R is in 3NF.
If not, convert it into 3NF.
Solution: Let us construct an arrow diagram on R using FD to calculate the
candidate key.
From the arrow diagram on R above, we can see that the attributes X and Z are not
determined by any of the given FDs, hence XZ will be an integral part of the
candidate key: no matter what the candidate keys are, or how many there
are, all of them must contain XZ.
Let us calculate the closure of XZ
XZ + = XZYPW (from the closure method that we studied earlier)
Since the closure of XZ contains all the attributes of R, XZ is a candidate
key.
From the definition of a candidate key (a super key none of whose proper
subsets is a super key): since every key must contain XZ, and we have proved
that XZ is a candidate key, any proper superset of XZ is a super key but not a
candidate key.
Hence there is only one candidate key, XZ.
Definition of 3NF: First, R should be in 2NF, and for every non-trivial
dependency X → Y between two sets of attributes X and Y (i.e., Y is
not a subset of X):
a. Either X is a super key
b. Or Y is a prime attribute.
Since R has 5 attributes, X, Y, Z, W and P, and the candidate key is XZ, the
prime attributes (part of the candidate key) are X and Z, while the non-prime
attributes are Y, W and P.
Given FDs are X → Y, Y → P and Z → W, and the super key / candidate key is XZ.
a. FD X → Y does not satisfy the definition of 3NF, as X is not a super
key and Y is not a prime attribute.
b. FD Y → P does not satisfy the definition of 3NF, as Y is not a super
key and P is not a prime attribute.
c. FD Z → W does not satisfy the definition of 3NF, as Z is not a super key and
W is not a prime attribute.
Convert the table R(X, Y, Z, W, P) into 3NF:
Since none of the FDs {X → Y, Y → P, Z → W} is in 3NF, let us
convert R into 3NF:
R1(X, Y) {Using FD X → Y}
R2(Y, P) {Using FD Y → P}
R3(Z, W) {Using FD Z → W}
And create one table for Candidate Key XZ
R4( X, Z) { Using Candidate Key XZ }
All the decomposed tables R1, R2, R3 and R4 are in 2NF (as there is no partial
dependency) as well as in 3NF.
Hence the decomposed tables are:
R1(X, Y), R2(Y, P), R3(Z, W) and R4(X, Z)
Question 3: Given a relation R(P, Q, R, S, T, U, V, W, X, Y) and functional
dependency set FD = {PQ → R, P → ST, Q → U, U → VW, S → XY},
determine whether R is in 3NF. If not, convert it into 3NF.
Solution: Let us construct an arrow diagram on R using FD to calculate the
candidate key.
From the arrow diagram on R above, we can see that the attributes P and Q are not
determined by any of the given FDs, hence PQ will be an integral part of the
candidate key: no matter what the candidate keys are, or how many there
are, all of them must contain PQ.
Let us calculate the closure of PQ
PQ + = P Q R S T U X Y V W (from the closure method we studied earlier)
Since the closure of PQ contains all the attributes of R, PQ is a candidate
key.
From the definition of a candidate key (a super key none of whose proper
subsets is a super key): since every key must contain PQ, and we have proved
that PQ is a candidate key, any proper superset of PQ is a super key but not a
candidate key.
Hence there is only one candidate key, PQ.
Definition of 3NF: First, R should be in 2NF, and for every non-trivial
dependency X → Y between two sets of attributes X and Y (i.e., Y is
not a subset of X):
a) Either X is a super key
b) Or Y is a prime attribute.
Since R has 10 attributes, P, Q, R, S, T, U, V, W, X and Y, and the candidate
key is PQ, the prime attributes (part of the candidate key) are P and Q, while the
non-prime attributes are R, S, T, U, V, W, X and Y.
Given FDs are {PQ → R, P → ST, Q → U, U → VW, S → XY}, and the super
key / candidate key is PQ.
a. FD PQ → R satisfies the definition of 3NF, as PQ is a super key.
b. FD P → ST does not satisfy the definition of 3NF, as P is not a super
key and S and T are not prime attributes.
c. FD Q → U does not satisfy the definition of 3NF, as Q is not a super
key and U is not a prime attribute.
d. FD U → VW does not satisfy the definition of 3NF, as U is not a
super key and V and W are not prime attributes.
e. FD S → XY does not satisfy the definition of 3NF, as S is not a super
key and X and Y are not prime attributes.
Convert the table R(P, Q, R, S, T, U, V, W, X, Y) into 3NF:
Since the FDs {P → ST, Q → U, U → VW, S → XY} are not in
3NF, let us convert R into 3NF:
R1(P, S, T) {Using FD P → ST }
R2(Q, U) {Using FD Q → U }
R3( U, V, W) { Using FD U → VW }
R4( S, X, Y) { Using FD S → XY }
R5( P, Q, R) { Using FD PQ → R, and candidate key PQ }
All the decomposed tables R1, R2, R3, R4 and R5 are in 2NF (as there is no
partial dependency) as well as in 3NF.
Hence decomposed tables are:
R1(P, S, T), R2(Q, U), R3(U, V, W), R4( S, X, Y), and R5( P, Q, R)
Conclusion: From the above three examples, we can conclude that the following
steps are followed to check whether a given relational schema R is in 3NF and,
if not, to decompose it into 3NF.
STEP 1: Calculate the candidate key of R using an arrow diagram and
then the closure of attributes on R, so that, from the calculated
candidate key, we can separate the prime attributes and non-prime attributes.
STEP 2: Verify each FD against the definition of 3NF (first, R should be in 2NF,
and for every non-trivial dependency X → Y between two sets of attributes X
and Y (i.e., Y is not a subset of X), either X is a super key or Y is a
prime attribute).
STEP 3: Make a set of the FDs which do not satisfy 3NF, i.e., all those FDs whose
left side is not a super key and whose right side is not a prime attribute.
STEP 4: Convert R into 3NF by decomposing it so that each table, based on an
FD, satisfies the definition of 3NF.
STEP 5: Once the decomposition based on FDs is completed, create a separate
table of the attributes in the candidate key (unless a table from STEP 4
already contains the whole key, as R5(P, Q, R) does in Question 3).
STEP 6: All the tables obtained from STEP 4 and STEP 5 form the
required decomposition, in which each table is in 3NF.
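STEPs 4 and 5 are essentially the classic 3NF synthesis approach: one table per FD, plus a key table. Below is a minimal Python sketch, assuming the given FDs are already a minimal cover with right-hand sides grouped by left-hand side (true for Questions 1 to 3). The name `synthesize_3nf` is hypothetical. The key table is added only when no table already contains the candidate key. That matches what Question 3 does by storing PQ → R and the key PQ together in R5(P, Q, R).

```python
def synthesize_3nf(fds, key):
    """One table per FD (left side plus right side), plus a key table only if
    no table already holds the whole key. Assumes a minimal cover of FDs."""
    tables = [set(lhs) | set(rhs) for lhs, rhs in fds]
    if not any(set(key) <= t for t in tables):
        tables.append(set(key))
    # Drop tables contained in another table, and exact repeats.
    result = []
    for t in tables:
        if not any(t < u for u in tables) and t not in result:
            result.append(t)
    return result

FDS = [(set("PQ"), set("R")), (set("P"), set("ST")), (set("Q"), set("U")),
       (set("U"), set("VW")), (set("S"), set("XY"))]
print(["".join(sorted(t)) for t in synthesize_3nf(FDS, key="PQ")])
# prints ['PQR', 'PST', 'QU', 'UVW', 'SXY']
```

For Question 2 (X → Y, Y → P, Z → W, key XZ), no table contains XZ, so the sketch appends (X, Z), matching R4.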
To solve questions on BCNF, we must understand its definition:
Definition: First, R should be in 3NF, and for every non-trivial dependency X → Y
between two sets of attributes X and Y (i.e., Y is not a subset of
X):
a) X is a super key
The relation between 3NF and BCNF is:
Every relation in BCNF is also in 3NF, but a relation in 3NF may or may not be in BCNF.
Question 1: Given a relation R(X, Y, Z) and functional dependency set FD = {
XY → Z, Z → Y}, determine whether R is in BCNF. If not,
convert it into BCNF.
Solution: Let us construct an arrow diagram on R using FD to calculate the
candidate key.
From the arrow diagram on R above, we can see that the attribute X is not
determined by any of the given FDs, hence X will be an integral part of the
candidate key: no matter what the candidate keys are, or how many there
are, all of them must contain X.
Let us calculate the closure of X:
X+ = X (from the closure method we studied earlier)
Since the closure of X contains only X, X alone is not a candidate key.
Let us check the combinations of X with the other attributes, i.e., XY and XZ.
a) XY+ = XYZ (from the closure method we studied earlier)
Since the closure of XY contains all the attributes of R, XY is a candidate
key.
b) XZ+ = XZY (from the closure method we studied earlier)
Since the closure of XZ contains all the attributes of R, XZ is a candidate
key.
Hence there are two candidate keys, XY and XZ.
Since R has 3 attributes, X, Y and Z, and the candidate keys are XY and XZ, the
prime attributes (part of a candidate key) are X, Y and Z, and there is no
non-prime attribute.
Using the definition of 3NF to check whether R is in 3NF: First, R should be
in 2NF, and for every non-trivial dependency X → Y between two sets of attributes
X and Y (i.e., Y is not a subset of X):
a) Either X is a super key
b) Or Y is a prime attribute.
Given FDs are XY → Z and Z → Y, and the super keys / candidate keys are XZ and
XY.
a) FD XY → Z satisfies the definition of 3NF, as XY is a super key (and Z is also
a prime attribute).
b) FD Z → Y satisfies the definition of 3NF: even though Z is not a super key,
Y is a prime attribute.
Since both FDs of R, XY → Z and Z → Y, satisfy the definition of 3NF, R is
in 3NF.
Using the definition of BCNF to check whether R is in BCNF: First, R
should be in 3NF, and for every non-trivial dependency X → Y between two sets of
attributes X and Y (i.e., Y is not a subset of X):
a) X is a super key
Given FDs are XY → Z and Z → Y, and the super keys / candidate keys are XZ and
XY.
a) FD XY → Z satisfies the definition of BCNF, as XY is a super key.
b) FD Z → Y does not satisfy the definition of BCNF, as Z is not a super key.
Since FD Z → Y violates the definition of BCNF, R is in 3NF but not
in BCNF.
Convert the table R( X, Y, Z) into BCNF:
Since due to FD: Z → Y, our table was not in BCNF, let's decompose the table
FD: Z→ Y was creating an issue, hence one table R1( Z, Y )
Create Table for key XY R2(X, Y) as XY was candidate key
Create Table for key XZ R2(X, Z) as XZ was candidate key
Note: When we have more than one key( eg: XY and XY) then while
decomposing keep in mind that you compare both R2 and R3 with R1 such that
PROBLEM ON NORMALIZATION
among R1 and R2 or R1 and R3 there should be at least one common attribute
and, that common attribute must be key in any of the table.
Considering R1(Z, Y) and R2(X, Y): both tables share the attribute Y, but Y is not a key of either R1 or R2, hence we discard R2(X, Y), i.e. we discard candidate key XY.
Considering R1(Z, Y) and R3(X, Z): both tables share the attribute Z, and Z is the key of R1, hence we keep R3(X, Z), i.e. we keep candidate key XZ.
Hence the decomposed tables, which are in BCNF, are:
R1(Z, Y)
R3(X, Z)
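The "common attribute must be a key of one table" rule is the standard lossless-join test for splitting a relation into two tables: the closure of the shared attributes must cover all of one side. A minimal sketch, assuming the same FD encoding; `lossless_pair` is a hypothetical helper name:

```python
def closure(attrs, fds):
    """Attribute closure under fds (list of (LHS, RHS) strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lossless_pair(r1, r2, fds):
    """Binary lossless-join test: the shared attributes must be a key of r1 or of r2."""
    common = set(r1) & set(r2)
    reach = closure(common, fds)
    return set(r1) <= reach or set(r2) <= reach


FDS = [("XY", "Z"), ("Z", "Y")]
print(lossless_pair("ZY", "XY", FDS))  # False: Y determines nothing else
print(lossless_pair("ZY", "XZ", FDS))  # True: Z is the key of R1(Z, Y)
```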
Question 2: Given a relation R(X, Y, Z) and the functional dependency set FD = {X → Y, Y → Z}, determine whether R is in BCNF. If not, convert it into BCNF.
Solution: Let us construct an arrow diagram on R using the FDs to find the candidate key.
From the above arrow diagram on R, we can see that attribute X is not determined by any of the given FDs, so X must be part of every candidate key: no matter which or how many candidate keys R has, all of them contain X.
Let us calculate the closure of X:
X+ = XYZ (using the closure method studied earlier)
Since the closure of X contains all the attributes of R, X is a super key. By definition, a candidate key is a super key no proper subset of which is a super key; X is a single attribute, so X is a candidate key. Because every candidate key must contain X, X is the only candidate key.
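The arrow-diagram argument can be checked mechanically: attributes that never appear on a right-hand side must belong to every key, and if their closure already covers R, they form the only candidate key. A minimal sketch, assuming the same FD encoding as before:

```python
# Question 2: R(X, Y, Z) with X -> Y and Y -> Z.
R = "XYZ"
FDS = [("X", "Y"), ("Y", "Z")]


def closure(attrs, fds):
    """Attribute closure under fds (list of (LHS, RHS) strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Attributes never on a right-hand side cannot be derived from anything,
# so every candidate key must contain them (the arrow-diagram argument).
on_rhs = set().union(*(set(rhs) for _, rhs in FDS))
must_have = set(R) - on_rhs
print(sorted(must_have))                # ['X']
print(sorted(closure(must_have, FDS)))  # ['X', 'Y', 'Z']

# If the forced attributes already reach every attribute, they are the one
# and only candidate key: any other key would have to contain them anyway.
if closure(must_have, FDS) == set(R):
    print("Only candidate key:", "".join(sorted(must_have)))
```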
Using the definition of BCNF to check whether R is in BCNF: R should first be in 3NF, and for every non-trivial dependency X → Y between two sets of attributes X and Y (i.e. Y is not a subset of X),
a) X must be a super key.
First, we check whether the table is in 3NF.
Using the definition of 3NF: for every non-trivial dependency X → Y between two sets of attributes X and Y (i.e. Y is not a subset of X),
a) either X is a super key,
b) or Y is a prime attribute.
a) FD X → Y satisfies 3NF (X is a super key).
b) FD Y → Z violates 3NF (Y is not a super key and Z is not a prime attribute).
Hence, because of Y → Z, R is not in 3NF, and therefore not in BCNF either.
Convert the table R(X, Y, Z) into 3NF:
Since FD Y → Z keeps the table out of 3NF, we decompose it.
FD Y → Z causes the violation, hence one table R1(Y, Z).
Create one table for key X: R2(X, Y), since X → Y.
Hence the decomposed tables, which are in 3NF, are:
R1(Y, Z)
R2(X, Y)
Both R1(Y, Z) and R2(X, Y) are also in BCNF, since the only determinant in each table (Y in R1, X in R2) is its key.
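To confirm that each decomposed table is in BCNF, the FDs have to be projected onto that table. A minimal brute-force sketch: any attribute set S inside a table that determines some other attribute of the table must determine the whole table. `is_bcnf` is a hypothetical helper name, and enumerating all subsets is only practical for small tables.

```python
from itertools import combinations

FDS = [("X", "Y"), ("Y", "Z")]  # Question 2


def closure(attrs, fds):
    """Attribute closure under fds (list of (LHS, RHS) strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_bcnf(table, fds):
    """BCNF test on the FDs projected onto `table` (exponential in its width)."""
    cols = set(table)
    for size in range(1, len(cols)):
        for combo in combinations(sorted(cols), size):
            s = set(combo)
            derived = closure(s, fds) & cols  # what S determines inside the table
            if derived != s and derived != cols:
                return False  # S determines something but is not a super key here
    return True


for name, table in [("R1", "YZ"), ("R2", "XY")]:
    print(name, table, "in BCNF:", is_bcnf(table, FDS))  # True for both
print("original R in BCNF:", is_bcnf("XYZ", FDS))      # False
```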
Conclusion: From the above three examples, we can conclude that the following steps check whether a given relational schema R is in BCNF and, if not, decompose it into BCNF.
STEP 1: Calculate the candidate key(s) of R using an arrow diagram and the closure of attributes, then separate the prime attributes from the non-prime attributes.
STEP 2: Verify each FD against the definition of BCNF (R should be in 3NF, and for every non-trivial dependency X → Y between two sets of attributes X and Y, i.e. Y is not a subset of X, X must be a super key).
STEP 3: Collect the FDs that do not satisfy BCNF, i.e. all FDs whose left-hand side is not a super key.
STEP 4: Convert R into BCNF by decomposing it so that each table created from a violating FD satisfies the definition of BCNF.
STEP 5: Once the FD-based decomposition is complete, create a separate table containing the attributes of the candidate key.
STEP 6: All the tables obtained in STEP 4 and STEP 5 together form the required decomposition, in which every table is in BCNF.
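The steps above can be sketched with the standard recursive BCNF decomposition algorithm. This is a swapped-in textbook technique, not the exact bookkeeping of STEP 3 to STEP 5: it repeatedly finds an attribute set S that violates BCNF inside a table and splits that table into (S plus what S determines) and (S plus everything else). It does not add a separate candidate-key table, because every split shares S, which is a key of the first half, so each split is already lossless. Helper names are hypothetical, and subset enumeration is brute force.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure under fds (list of (LHS, RHS) strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(table, fds):
    """Return (S, closure of S inside table) for a BCNF-violating S, or None."""
    cols = set(table)
    for size in range(1, len(cols)):
        for combo in combinations(sorted(cols), size):
            s = set(combo)
            derived = closure(s, fds) & cols
            if derived != s and derived != cols:
                return s, derived
    return None


def bcnf_decompose(relation, fds):
    """Split tables until every one is in BCNF; returns sorted table names."""
    pending = [set(relation)]
    done = []
    while pending:
        table = pending.pop()
        hit = find_violation(table, fds)
        if hit is None:
            done.append(table)
        else:
            s, derived = hit
            pending.append(derived)                # S plus what S determines
            pending.append(s | (table - derived))  # S plus the remaining attributes
    return sorted("".join(sorted(t)) for t in done)


print(bcnf_decompose("XYZ", [("X", "Y"), ("Y", "Z")]))   # ['XY', 'YZ']
print(bcnf_decompose("XYZ", [("XY", "Z"), ("Z", "Y")]))  # ['XZ', 'YZ']
```

For Question 2 this reproduces R1(Y, Z) and R2(X, Y); for Question 1 it reproduces R1(Z, Y) and R3(X, Z).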
Example 1
Consider the following student table.
Stu_ID | Stu_Branch | Stu_Course | Branch_Number | Stu_Course_No
101 | Computer Science & Engineering | DBMS | B_001 | 201
101 | Computer Science & Engineering | Computer Networks | B_001 | 202
102 | Electronics & Communication Engineering | VLSI Technology | B_003 | 401
102 | Electronics & Communication Engineering | Mobile Communication | B_003 | 402
The functional dependencies of the above table are:
Stu_ID −> Stu_Branch
Stu_Course −> {Branch_Number, Stu_Course_No}
The candidate key of the above table is {Stu_ID, Stu_Course}.
Why Is This Table Not in BCNF?
This table is not in BCNF because the determinants Stu_ID and Stu_Course are not super keys on their own. For a table to be in BCNF, the left-hand side X of every functional dependency X −> Y must be a super key; both dependencies above fail this condition, so the table is not in BCNF.
How to Satisfy BCNF?
To bring this table into BCNF, we decompose it into smaller tables. We first split the main table into two tables, Stu_Branch and Stu_Course, and then add a third table linking Stu_ID to Stu_Course_No.
Stu_Branch Table
Stu_ID | Stu_Branch
101 | Computer Science & Engineering
102 | Electronics & Communication Engineering
Candidate Key for this table: Stu_ID.
Stu_Course Table
Stu_Course | Branch_Number | Stu_Course_No
DBMS | B_001 | 201
Computer Networks | B_001 | 202
VLSI Technology | B_003 | 401
Mobile Communication | B_003 | 402
Candidate Key for this table: Stu_Course.
Stu_ID to Stu_Course_No Table
Stu_ID | Stu_Course_No
101 | 201
101 | 202
102 | 401
102 | 402
Candidate Key for this table: {Stu_ID, Stu_Course_No}.
After this decomposition, every table is in BCNF: in each one, the left-hand side X of every functional dependency X −> Y is a super key.
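One way to see that this decomposition loses no information is to natural-join the three tables and compare the result with the original table. A minimal sketch in plain Python, with rows copied from the tables above; the list-of-dicts representation and the helpers `natural_join` and `as_set` are hypothetical illustration choices. The join relies on Stu_Course_No identifying a course in this data, which is what lets the third table connect back to Stu_Course.

```python
CSE = "Computer Science & Engineering"
ECE = "Electronics & Communication Engineering"

# Decomposed tables, one dict per row.
stu_branch = [{"Stu_ID": 101, "Stu_Branch": CSE}, {"Stu_ID": 102, "Stu_Branch": ECE}]
stu_course = [
    {"Stu_Course": "DBMS", "Branch_Number": "B_001", "Stu_Course_No": 201},
    {"Stu_Course": "Computer Networks", "Branch_Number": "B_001", "Stu_Course_No": 202},
    {"Stu_Course": "VLSI Technology", "Branch_Number": "B_003", "Stu_Course_No": 401},
    {"Stu_Course": "Mobile Communication", "Branch_Number": "B_003", "Stu_Course_No": 402},
]
stu_id_course_no = [
    {"Stu_ID": 101, "Stu_Course_No": 201},
    {"Stu_ID": 101, "Stu_Course_No": 202},
    {"Stu_ID": 102, "Stu_Course_No": 401},
    {"Stu_ID": 102, "Stu_Course_No": 402},
]

# The original (undecomposed) table, for comparison.
original = [
    {"Stu_ID": 101, "Stu_Branch": CSE, "Stu_Course": "DBMS", "Branch_Number": "B_001", "Stu_Course_No": 201},
    {"Stu_ID": 101, "Stu_Branch": CSE, "Stu_Course": "Computer Networks", "Branch_Number": "B_001", "Stu_Course_No": 202},
    {"Stu_ID": 102, "Stu_Branch": ECE, "Stu_Course": "VLSI Technology", "Branch_Number": "B_003", "Stu_Course_No": 401},
    {"Stu_ID": 102, "Stu_Branch": ECE, "Stu_Course": "Mobile Communication", "Branch_Number": "B_003", "Stu_Course_No": 402},
]


def natural_join(left, right):
    """Join two non-empty lists of row dicts on every column name they share."""
    shared = set(left[0]) & set(right[0])
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in right
        if all(l_row[c] == r_row[c] for c in shared)
    ]


def as_set(rows):
    """Order-independent view of a table, so row order does not matter."""
    return {tuple(sorted(row.items())) for row in rows}


rejoined = natural_join(natural_join(stu_id_course_no, stu_course), stu_branch)
print(len(rejoined))                         # 4
print(as_set(rejoined) == as_set(original))  # True
```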
Example 2
Find the highest normal form of a relation R(A, B, C, D, E) with the FD set:
{ BC->D, AC->BE, B->E }
Explanation:
Step-1: As we can see, (AC)+ = {A, C, B, E, D}, but neither A nor C alone determines all attributes of the relation, so AC is a candidate key. Since A and C do not appear on the right-hand side of any FD, neither can be derived from other attributes, so every candidate key must contain both; hence there is only one candidate key, {AC}.
Step-2: Prime attributes are those that are part of a candidate key, {A, C} in this example; the others, {B, D, E}, are non-prime.
Step-3: The relation R is in 1st normal form as a relational DBMS does not
allow multi-valued or composite attributes.
The relation is in 2nd normal form because BC->D is in 2nd normal form (BC is
not a proper subset of candidate key AC) and AC->BE is in 2nd normal form
(AC is candidate key) and B->E is in 2nd normal form (B is not a proper subset
of candidate key AC).
The relation is not in 3rd normal form: in BC->D, BC is not a super key and D is not a prime attribute, and in B->E, B is not a super key and E is not a prime attribute. To satisfy 3rd normal form, either the LHS of an FD must be a super key or the RHS must be a prime attribute. So the highest normal form of the relation is 2nd normal form.
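The three-step reasoning above (find keys, split prime from non-prime attributes, test 2NF, 3NF and BCNF in turn) can be sketched as a single function. This is a minimal illustration, assuming 1NF (atomic values, as in Step-3) and single-letter attributes; `highest_normal_form` and its helpers are hypothetical names, and key finding is brute force.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure under fds (list of (LHS, RHS) strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            attrs = set(combo)
            if not any(k <= attrs for k in keys) and closure(attrs, fds) == set(relation):
                keys.append(attrs)
    return keys


def highest_normal_form(relation, fds):
    """Return '1NF', '2NF', '3NF' or 'BCNF'; atomic values (1NF) are assumed."""
    rel = set(relation)
    keys = candidate_keys(relation, fds)
    prime = set().union(*keys)
    non_prime = rel - prime

    # 2NF: no proper subset of a candidate key may determine a non-prime attribute.
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if (closure(part, fds) - set(part)) & non_prime:
                    return "1NF"

    def is_superkey(lhs):
        return closure(lhs, fds) == rel

    nontrivial = [(lhs, rhs) for lhs, rhs in fds if not set(rhs) <= set(lhs)]
    # 3NF: LHS is a super key, or every RHS attribute outside the LHS is prime.
    if any(not is_superkey(l) and not (set(r) - set(l)) <= prime for l, r in nontrivial):
        return "2NF"
    # BCNF: every non-trivial FD has a super-key LHS.
    if any(not is_superkey(l) for l, _ in nontrivial):
        return "3NF"
    return "BCNF"


print(highest_normal_form("ABCDE", [("BC", "D"), ("AC", "BE"), ("B", "E")]))  # 2NF
```

Treating the following examples as relations over R(A, B, C), the same function returns `3NF` for {AB->C, C->B, AB->B} and `BCNF` for {A->BC, B->A}.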
Note: A prime attribute cannot be transitively dependent on a key in a BCNF relation.
Consider these functional dependencies of some relation R
AB ->C
C ->B
AB ->B
Suppose it is known that the only candidate key of R is AB. On careful observation, the above is a transitive dependency: the prime attribute B transitively depends on the key AB through C. The first and third FDs satisfy BCNF, as both have the candidate key (or simply KEY) on their left sides. The second dependency, C ->B, does not satisfy BCNF but does satisfy 3NF, because its right side B is a prime attribute. So the highest normal form of R is 3NF, as all three FDs satisfy the conditions of 3NF.
Example 3
For example, consider the relation R(A, B, C) with FDs:
A -> BC,
B -> A
Both A and B are super keys (A+ = B+ = {A, B, C}), so the above relation is in BCNF.
Note: A dependency-preserving BCNF decomposition is not always possible; however, a lossless-join BCNF decomposition always is. For example,
relation R (V, W, X, Y, Z), with functional dependencies:
V, W -> X
Y, Z -> X
W -> Y
It would not satisfy dependency preserving BCNF decomposition.
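Whether a particular decomposition preserves the FDs can be tested with the standard algorithm: start from each FD's left-hand side and grow it using only what each table can infer on its own; the FD is preserved if its right-hand side is reached. A minimal sketch, assuming the same FD encoding; `preserved` and `lost_fds` are hypothetical helper names. The decomposition (W, Y), (V, W, X), (V, W, Z) used below is our own illustration, obtained by splitting on W -> Y first and then on V, W -> X; it is not taken from the notes.

```python
def closure(attrs, fds):
    """Attribute closure under fds (list of (LHS, RHS) strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserved(lhs, rhs, tables, fds):
    """Standard dependency-preservation test for one FD over the given tables."""
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for t in tables:
            gained = closure(z & set(t), fds) & set(t)  # what table t can infer locally
            if not gained <= z:
                z |= gained
                changed = True
    return set(rhs) <= z


def lost_fds(tables, fds):
    """FDs that can no longer be enforced by checking each table separately."""
    return [(l, r) for l, r in fds if not preserved(l, r, tables, fds)]


FDS = [("VW", "X"), ("YZ", "X"), ("W", "Y")]
print(lost_fds(["WY", "VWX", "VWZ"], FDS))  # [('YZ', 'X')]

# Question 1's BCNF decomposition R1(Z, Y), R3(X, Z) loses XY -> Z.
print(lost_fds(["ZY", "XZ"], [("XY", "Z"), ("Z", "Y")]))  # [('XY', 'Z')]
```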
Note: Redundancies are sometimes still present in a BCNF relation as it is not
always possible to eliminate them completely.
There are also some higher-order normal forms, like the 4th Normal Form and
the 5th Normal Form.
For more, refer to the 4th and 5th Normal Forms.
4. [20 pts] R(A, B, C, D, E) (All attributes contain only atomic values.)
FD1: B -> D
FD2: A -> B, C
FD3: A -> E
FD4: C -> D, E
(a) [5 pts] Compute A+, the attribute closure of attribute A. Show your work as
well as the final result.
(b) [5 pts] List the candidate keys of R. Is this related to your answer to (a)?
Why?
(c) [5 pts] What's the highest normal form that R satisfies and why?
(d) [5 pts] If R is not already at least in 3NF, then normalize R into 3NF and
show the resulting relation(s) and specify their candidate keys. Make sure that
your 3NF decomposition is both lossless-join and dependency-preserving. Note:
If R was already in 3NF, then just list the candidate keys of R. What is the
highest normal form that your answer now satisfies?