0% found this document useful (0 votes)
5 views17 pages

Functional Dependency and Normalization

Uploaded by

aditidagar00
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views17 pages

Functional Dependency and Normalization

Uploaded by

aditidagar00
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Functional Dependency

The functional dependency is a relationship that exists between two attributes. It typically
exists between the primary key and non-key attribute within a table.

1. X → Y

The left side of FD is known as a determinant, the right side of the production is known
as a dependent.

For example:

Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.

Here Emp_Id attribute can uniquely identify the Emp_Name attribute of employee table
because if we know the Emp_Id, we can tell that employee name associated with it.

Functional dependency can be written as:

Emp_Id → Emp_Name

We can say that Emp_Name is functionally dependent on Emp_Id.

Types of Functional dependency


1. Trivial functional dependency

o A → B has trivial functional dependency if B is a subset of A.


o The following dependencies are also trivial like: A → A, B → B

Example:

1. Consider a table with two columns Employee_Id and Employee_Name.


2. {Employee_id, Employee_Name} → Employee_Id is a trivial functional depende
ncy as
3. Employee_Id is a subset of {Employee_Id, Employee_Name}.
4. Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name ar
e trivial dependencies too.

2. Non-trivial functional dependency

o A → B has a non-trivial functional dependency if B is not a subset of A.


o When A intersection B is NULL, then A → B is called as complete non-trivial.

Example:

1. ID → Name,
2. Name → DOB
Inference Rule (IR):
o The Armstrong's axioms are the basic inference rule.
o Armstrong's axioms are used to conclude functional dependencies on a relational
database.
o The inference rule is a type of assertion. It can apply to a set of FD(functional
dependency) to derive other FD.
o Using the inference rule, we can derive additional functional dependency from the
initial set.

The Functional dependency has 6 types of inference rule:

1. Reflexive Rule (IR1)


In the reflexive rule, if Y is a subset of X, then X determines Y.

If X ⊇ Y then X → Y

Example:

X = {a, b, c, d, e}
Y = {a, b, c}

2. Augmentation Rule (IR2)


The augmentation is also called as a partial dependency. In augmentation, if X determines
Y, then XZ determines YZ for any Z.

If X → Y then XZ → YZ

Example:

For R(ABCD), if A → B then AC → BC

3. Transitive Rule (IR3)


In the transitive rule, if X determines Y and Y determine Z, then X must also determine Z.
If X → Y and Y → Z then X → Z

4. Union Rule (IR4)


Union rule says, if X determines Y and X determines Z, then X must also determine Y and
Z.

If X → Y and X → Z then X → YZ

5. Decomposition Rule (IR5)


Decomposition rule is also known as project rule. It is the reverse of union rule.

This Rule says, if X determines Y and Z, then X determines Y and X determines Z separately.

If X → YZ then X → Y and X → Z

6. Pseudo transitive Rule (IR6)


In Pseudo transitive Rule, if X determines Y and YZ determines W, then XZ determines W.

If X → Y and YZ → W then XZ → W
Normalization
A large database defined as a single relation may result in data duplication. This repetition
of data may result in:

o Making relations very large.


o It isn't easy to maintain and update data as it would involve searching many records
in relation.
o Wastage and poor utilization of disk space and resources.
o The likelihood of errors and inconsistencies increases.

So to handle these problems, we should analyze and decompose the relations with
redundant data into smaller, simpler, and well-structured relations that are satisfy
desirable properties. Normalization is a process of decomposing the relations into
relations with fewer attributes.

What is Normalization?
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or set of
relations. It is also used to eliminate undesirable characteristics like Insertion,
Update, and Deletion Anomalies.
o Normalization divides the larger table into smaller and links them using
relationships.
o The normal form is used to reduce redundancy from the database table.

Why do we need Normalization?

The main reason for normalizing the relations is removing these anomalies. Failure to
eliminate anomalies leads to data redundancy and can cause data integrity and other
problems as the database grows. Normalization consists of a series of guidelines that
helps to guide you in creating a good database structure.

• Update anomalies − If data items are scattered and are not linked to each
other properly, then it could lead to strange situations. For example, when
we try to update one data item having its copies scattered over several
places, a few instances get updated properly while a few others are left
with old values. Such instances leave the database in an inconsistent state.
• Deletion anomalies − We tried to delete a record, but parts of it was left
undeleted because of unawareness, the data is also saved somewhere
else.
• Insert anomalies − We tried to insert data in a record that does not exist
at all.

First Normal Form


First Normal Form is defined in the definition of relations (tables) itself. This rule defines
that all the attributes in a relation must have atomic domains. The values in an atomic
domain are indivisible units.

We re-arrange the relation (table) as below, to convert it to First Normal Form.

Each attribute must contain only a single value from its pre-defined domain.

Second Normal Form


Before we learn about the second normal form, we need to understand the following −
• Prime attribute − An attribute, which is a part of the candidate-key, is
known as a prime attribute.
• Non-prime attribute − An attribute, which is not a part of the prime-key, is
said to be a non-prime attribute.
If we follow second normal form, then every non-prime attribute should be fully
functionally dependent on prime key attribute. That is, if X → A holds, then there should
not be any proper subset Y of X, for which Y → A also holds true.

We see here in Student_Project relation that the prime key attributes are Stu_ID and
Proj_ID. According to the rule, non-key attributes, i.e. Stu_Name and Proj_Name must
be dependent upon both and not on any of the prime key attribute individually. But we
find that Stu_Name can be identified by Stu_ID and Proj_Name can be identified by
Proj_ID independently. This is called partial dependency, which is not allowed in
Second Normal Form.

We broke the relation in two as depicted in the above picture. So there exists no partial
dependency.

Third Normal Form


For a relation to be in Third Normal Form, it must be in Second Normal form and the
following must satisfy −

• No non-prime attribute is transitively dependent on prime key attribute.


• For any non-trivial functional dependency, X → A, then either −
o X is a superkey or,

o A is prime attribute.

We find that in the above Student_detail relation, Stu_ID is the key and only prime key
attribute. We find that City can be identified by Stu_ID as well as Zip itself. Neither Zip
is a superkey nor is City a prime attribute. Additionally, Stu_ID → Zip → City, so there
exists transitive dependency.
To bring this relation into third normal form, we break the relation into two relations as
follows −

Boyce-Codd Normal Form


Boyce-Codd Normal Form (BCNF) is an extension of Third Normal Form on strict terms.
BCNF states that −

• For any non-trivial functional dependency, X → A, X must be a super-key.


In the above image, Stu_ID is the super-key in the relation Student_Detail and Zip is the
super-key in the relation ZipCodes. So,
Stu_ID → Stu_Name, Zip
and
Zip → City
Which confirms that both the relations are in BCNF.
Joins
Join is a combination of a Cartesian product followed by a selection process. A Join
operation pairs two tuples from different relations, if and only if a given join condition is
satisfied.
We will briefly describe various join types in the following sections.

Theta (θ) Join


Theta join combines tuples from different relations provided they satisfy the theta
condition. The join condition is denoted by the symbol θ.

Notation
R1 ⋈θ R2
R1 and R2 are relations having attributes (A1, A2, .., An) and (B1, B2,.. ,Bn) such that the
attributes don’t have anything in common, that is R1 ∩ R2 = Φ.
Theta join can use all kinds of comparison operators.

Student

SID Name Std

101 Alex 10

102 Maria 11

Subjects

Class Subject

10 Math

10 English

11 Music

11 Sports

Student_Detail −

STUDENT ⋈Student.Std = Subject.Class SUBJECT


Student_detail

SID Name Std Class Subject

101 Alex 10 10 Math

101 Alex 10 10 English

102 Maria 11 11 Music

102 Maria 11 11 Sports

Equijoin
When Theta join uses only equality comparison operator, it is said to be equijoin. The
above example corresponds to equijoin.

Natural Join (⋈)


Natural join does not use any comparison operator. It does not concatenate the way a
Cartesian product does. We can perform a Natural Join only if there is at least one
common attribute that exists between two relations. In addition, the attributes must
have the same name and domain.
Natural join acts on those matching attributes where the values of attributes in both the
relations are same.

Courses

CID Course Dept

CS01 Database CS

ME01 Mechanics ME

EE01 Electronics EE

HoD

Dept Head
CS Alex

ME Maya

EE Mira

Courses ⋈ HoD

Dept CID Course Head

CS CS01 Database Alex

ME ME01 Mechanics Maya

EE EE01 Electronics Mira

Outer Joins
Theta Join, Equijoin, and Natural Join are called inner joins. An inner join includes only
those tuples with matching attributes and the rest are discarded in the resulting relation.
Therefore, we need to use outer joins to include all the tuples from the participating
relations in the resulting relation. There are three kinds of outer joins − left outer join,
right outer join, and full outer join.

Left Outer Join(R S)


All the tuples from the Left relation, R, are included in the resulting relation. If there are
tuples in R without any matching tuple in the Right relation S, then the S-attributes of
the resulting relation are made NULL.

Left

A B

100 Database

101 Mechanics

102 Electronics

Right
A B

100 Alex

102 Maya

104 Mira

Courses HoD

A B C D

100 Database 100 Alex

101 Mechanics --- ---

102 Electronics 102 Maya

Right Outer Join: ( R S)


All the tuples from the Right relation, S, are included in the resulting relation. If there
are tuples in S without any matching tuple in R, then the R-attributes of resulting relation
are made NULL.

Courses HoD

A B C D

100 Database 100 Alex

102 Electronics 102 Maya

--- --- 104 Mira

Full Outer Join: ( R S)


All the tuples from both participating relations are included in the resulting relation. If
there are no matching tuples for both relations, their respective unmatched attributes
are made NULL.
Courses HoD

A B C D

100 Database 100 Alex

101 Mechanics --- ---

102 Electronics 102 Maya

--- --- 104 Mira


Relational Decomposition
o When a relation in the relational model is not in appropriate normal form then the
decomposition of a relation is required.
o In a database, it breaks the table into multiple tables.
o If the relation has no proper decomposition, then it may lead to problems like loss
of information.
o Decomposition is used to eliminate some of the problems of bad design like
anomalies, inconsistencies, and redundancy.

Types of Decomposition

Lossless Decomposition

o If the information is not lost from the relation that is decomposed, then the
decomposition will be lossless.
o The lossless decomposition guarantees that the join of relations will result in the
same relation as it was decomposed.
o The relation is said to be lossless decomposition if natural joins of all the
decomposition give the original relation.

Example:
EMPLOYEE_DEPARTMENT table:

EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME

22 Denim 28 Mumbai 827 Sales

33 Alina 25 Delhi 438 Marketing

46 Stephan 30 Bangalore 869 Finance

52 Katherine 36 Mumbai 575 Production

60 Jack 40 Noida 678 Testing

The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT

EMPLOYEE table:

EMP_ID EMP_NAME EMP_AGE EMP_CITY

22 Denim 28 Mumbai

33 Alina 25 Delhi

46 Stephan 30 Bangalore

52 Katherine 36 Mumbai

60 Jack 40 Noida

DEPARTMENT table

DEPT_ID EMP_ID DEPT_NAME

827 22 Sales

438 33 Marketing

869 46 Finance
575 52 Production

678 60 Testing

Now, when these two relations are joined on the common column "EMP_ID", then the
resultant relation will look like:

Employee ⋈ Department

EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME

22 Denim 28 Mumbai 827 Sales

33 Alina 25 Delhi 438 Marketing

46 Stephan 30 Bangalore 869 Finance

52 Katherine 36 Mumbai 575 Production

60 Jack 40 Noida 678 Testing

Hence, the decomposition is Lossless join decomposition.

Dependency Preserving

o It is an important constraint of the database.


o In the dependency preservation, at least one decomposed table must satisfy every
dependency.
o If a relation R is decomposed into relation R1 and R2, then the dependencies of R
either must be a part of R1 or R2 or must be derivable from the combination of
functional dependencies of R1 and R2.
o For example, suppose there is a relation R (A, B, C, D) with functional dependency
set (A->BC). The relational R is decomposed into R1(ABC) and R2(AD) which is
dependency preserving because FD A->BC is a part of relation R1(ABC).

You might also like