Functional Dependency
The functional dependency is a relationship that exists between two attributes. It typically
exists between the primary key and non-key attribute within a table.
1. X → Y
The left side of FD is known as a determinant, the right side of the production is known
as a dependent.
For example:
Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.
Here Emp_Id attribute can uniquely identify the Emp_Name attribute of employee table
because if we know the Emp_Id, we can tell that employee name associated with it.
Functional dependency can be written as:
Emp_Id → Emp_Name
We can say that Emp_Name is functionally dependent on Emp_Id.
Types of Functional dependency
1. Trivial functional dependency
o A → B has trivial functional dependency if B is a subset of A.
o The following dependencies are also trivial like: A → A, B → B
Example:
1. Consider a table with two columns Employee_Id and Employee_Name.
2. {Employee_id, Employee_Name} → Employee_Id is a trivial functional depende
ncy as
3. Employee_Id is a subset of {Employee_Id, Employee_Name}.
4. Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name ar
e trivial dependencies too.
2. Non-trivial functional dependency
o A → B has a non-trivial functional dependency if B is not a subset of A.
o When A intersection B is NULL, then A → B is called as complete non-trivial.
Example:
1. ID → Name,
2. Name → DOB
Inference Rule (IR):
o The Armstrong's axioms are the basic inference rule.
o Armstrong's axioms are used to conclude functional dependencies on a relational
database.
o The inference rule is a type of assertion. It can apply to a set of FD(functional
dependency) to derive other FD.
o Using the inference rule, we can derive additional functional dependency from the
initial set.
The Functional dependency has 6 types of inference rule:
1. Reflexive Rule (IR1)
In the reflexive rule, if Y is a subset of X, then X determines Y.
If X ⊇ Y then X → Y
Example:
X = {a, b, c, d, e}
Y = {a, b, c}
2. Augmentation Rule (IR2)
The augmentation is also called as a partial dependency. In augmentation, if X determines
Y, then XZ determines YZ for any Z.
If X → Y then XZ → YZ
Example:
For R(ABCD), if A → B then AC → BC
3. Transitive Rule (IR3)
In the transitive rule, if X determines Y and Y determine Z, then X must also determine Z.
If X → Y and Y → Z then X → Z
4. Union Rule (IR4)
Union rule says, if X determines Y and X determines Z, then X must also determine Y and
Z.
If X → Y and X → Z then X → YZ
5. Decomposition Rule (IR5)
Decomposition rule is also known as project rule. It is the reverse of union rule.
This Rule says, if X determines Y and Z, then X determines Y and X determines Z separately.
If X → YZ then X → Y and X → Z
6. Pseudo transitive Rule (IR6)
In Pseudo transitive Rule, if X determines Y and YZ determines W, then XZ determines W.
If X → Y and YZ → W then XZ → W
Normalization
A large database defined as a single relation may result in data duplication. This repetition
of data may result in:
o Making relations very large.
o It isn't easy to maintain and update data as it would involve searching many records
in relation.
o Wastage and poor utilization of disk space and resources.
o The likelihood of errors and inconsistencies increases.
So to handle these problems, we should analyze and decompose the relations with
redundant data into smaller, simpler, and well-structured relations that are satisfy
desirable properties. Normalization is a process of decomposing the relations into
relations with fewer attributes.
What is Normalization?
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or set of
relations. It is also used to eliminate undesirable characteristics like Insertion,
Update, and Deletion Anomalies.
o Normalization divides the larger table into smaller and links them using
relationships.
o The normal form is used to reduce redundancy from the database table.
Why do we need Normalization?
The main reason for normalizing the relations is removing these anomalies. Failure to
eliminate anomalies leads to data redundancy and can cause data integrity and other
problems as the database grows. Normalization consists of a series of guidelines that
helps to guide you in creating a good database structure.
• Update anomalies − If data items are scattered and are not linked to each
other properly, then it could lead to strange situations. For example, when
we try to update one data item having its copies scattered over several
places, a few instances get updated properly while a few others are left
with old values. Such instances leave the database in an inconsistent state.
• Deletion anomalies − We tried to delete a record, but parts of it was left
undeleted because of unawareness, the data is also saved somewhere
else.
• Insert anomalies − We tried to insert data in a record that does not exist
at all.
First Normal Form
First Normal Form is defined in the definition of relations (tables) itself. This rule defines
that all the attributes in a relation must have atomic domains. The values in an atomic
domain are indivisible units.
We re-arrange the relation (table) as below, to convert it to First Normal Form.
Each attribute must contain only a single value from its pre-defined domain.
Second Normal Form
Before we learn about the second normal form, we need to understand the following −
• Prime attribute − An attribute, which is a part of the candidate-key, is
known as a prime attribute.
• Non-prime attribute − An attribute, which is not a part of the prime-key, is
said to be a non-prime attribute.
If we follow second normal form, then every non-prime attribute should be fully
functionally dependent on prime key attribute. That is, if X → A holds, then there should
not be any proper subset Y of X, for which Y → A also holds true.
We see here in Student_Project relation that the prime key attributes are Stu_ID and
Proj_ID. According to the rule, non-key attributes, i.e. Stu_Name and Proj_Name must
be dependent upon both and not on any of the prime key attribute individually. But we
find that Stu_Name can be identified by Stu_ID and Proj_Name can be identified by
Proj_ID independently. This is called partial dependency, which is not allowed in
Second Normal Form.
We broke the relation in two as depicted in the above picture. So there exists no partial
dependency.
Third Normal Form
For a relation to be in Third Normal Form, it must be in Second Normal form and the
following must satisfy −
• No non-prime attribute is transitively dependent on prime key attribute.
• For any non-trivial functional dependency, X → A, then either −
o X is a superkey or,
o A is prime attribute.
We find that in the above Student_detail relation, Stu_ID is the key and only prime key
attribute. We find that City can be identified by Stu_ID as well as Zip itself. Neither Zip
is a superkey nor is City a prime attribute. Additionally, Stu_ID → Zip → City, so there
exists transitive dependency.
To bring this relation into third normal form, we break the relation into two relations as
follows −
Boyce-Codd Normal Form
Boyce-Codd Normal Form (BCNF) is an extension of Third Normal Form on strict terms.
BCNF states that −
• For any non-trivial functional dependency, X → A, X must be a super-key.
In the above image, Stu_ID is the super-key in the relation Student_Detail and Zip is the
super-key in the relation ZipCodes. So,
Stu_ID → Stu_Name, Zip
and
Zip → City
Which confirms that both the relations are in BCNF.
Joins
Join is a combination of a Cartesian product followed by a selection process. A Join
operation pairs two tuples from different relations, if and only if a given join condition is
satisfied.
We will briefly describe various join types in the following sections.
Theta (θ) Join
Theta join combines tuples from different relations provided they satisfy the theta
condition. The join condition is denoted by the symbol θ.
Notation
R1 ⋈θ R2
R1 and R2 are relations having attributes (A1, A2, .., An) and (B1, B2,.. ,Bn) such that the
attributes don’t have anything in common, that is R1 ∩ R2 = Φ.
Theta join can use all kinds of comparison operators.
Student
SID Name Std
101 Alex 10
102 Maria 11
Subjects
Class Subject
10 Math
10 English
11 Music
11 Sports
Student_Detail −
STUDENT ⋈Student.Std = Subject.Class SUBJECT
Student_detail
SID Name Std Class Subject
101 Alex 10 10 Math
101 Alex 10 10 English
102 Maria 11 11 Music
102 Maria 11 11 Sports
Equijoin
When Theta join uses only equality comparison operator, it is said to be equijoin. The
above example corresponds to equijoin.
Natural Join (⋈)
Natural join does not use any comparison operator. It does not concatenate the way a
Cartesian product does. We can perform a Natural Join only if there is at least one
common attribute that exists between two relations. In addition, the attributes must
have the same name and domain.
Natural join acts on those matching attributes where the values of attributes in both the
relations are same.
Courses
CID Course Dept
CS01 Database CS
ME01 Mechanics ME
EE01 Electronics EE
HoD
Dept Head
CS Alex
ME Maya
EE Mira
Courses ⋈ HoD
Dept CID Course Head
CS CS01 Database Alex
ME ME01 Mechanics Maya
EE EE01 Electronics Mira
Outer Joins
Theta Join, Equijoin, and Natural Join are called inner joins. An inner join includes only
those tuples with matching attributes and the rest are discarded in the resulting relation.
Therefore, we need to use outer joins to include all the tuples from the participating
relations in the resulting relation. There are three kinds of outer joins − left outer join,
right outer join, and full outer join.
Left Outer Join(R S)
All the tuples from the Left relation, R, are included in the resulting relation. If there are
tuples in R without any matching tuple in the Right relation S, then the S-attributes of
the resulting relation are made NULL.
Left
A B
100 Database
101 Mechanics
102 Electronics
Right
A B
100 Alex
102 Maya
104 Mira
Courses HoD
A B C D
100 Database 100 Alex
101 Mechanics --- ---
102 Electronics 102 Maya
Right Outer Join: ( R S)
All the tuples from the Right relation, S, are included in the resulting relation. If there
are tuples in S without any matching tuple in R, then the R-attributes of resulting relation
are made NULL.
Courses HoD
A B C D
100 Database 100 Alex
102 Electronics 102 Maya
--- --- 104 Mira
Full Outer Join: ( R S)
All the tuples from both participating relations are included in the resulting relation. If
there are no matching tuples for both relations, their respective unmatched attributes
are made NULL.
Courses HoD
A B C D
100 Database 100 Alex
101 Mechanics --- ---
102 Electronics 102 Maya
--- --- 104 Mira
Relational Decomposition
o When a relation in the relational model is not in appropriate normal form then the
decomposition of a relation is required.
o In a database, it breaks the table into multiple tables.
o If the relation has no proper decomposition, then it may lead to problems like loss
of information.
o Decomposition is used to eliminate some of the problems of bad design like
anomalies, inconsistencies, and redundancy.
Types of Decomposition
Lossless Decomposition
o If the information is not lost from the relation that is decomposed, then the
decomposition will be lossless.
o The lossless decomposition guarantees that the join of relations will result in the
same relation as it was decomposed.
o The relation is said to be lossless decomposition if natural joins of all the
decomposition give the original relation.
Example:
EMPLOYEE_DEPARTMENT table:
EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME
22 Denim 28 Mumbai 827 Sales
33 Alina 25 Delhi 438 Marketing
46 Stephan 30 Bangalore 869 Finance
52 Katherine 36 Mumbai 575 Production
60 Jack 40 Noida 678 Testing
The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT
EMPLOYEE table:
EMP_ID EMP_NAME EMP_AGE EMP_CITY
22 Denim 28 Mumbai
33 Alina 25 Delhi
46 Stephan 30 Bangalore
52 Katherine 36 Mumbai
60 Jack 40 Noida
DEPARTMENT table
DEPT_ID EMP_ID DEPT_NAME
827 22 Sales
438 33 Marketing
869 46 Finance
575 52 Production
678 60 Testing
Now, when these two relations are joined on the common column "EMP_ID", then the
resultant relation will look like:
Employee ⋈ Department
EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME
22 Denim 28 Mumbai 827 Sales
33 Alina 25 Delhi 438 Marketing
46 Stephan 30 Bangalore 869 Finance
52 Katherine 36 Mumbai 575 Production
60 Jack 40 Noida 678 Testing
Hence, the decomposition is Lossless join decomposition.
Dependency Preserving
o It is an important constraint of the database.
o In the dependency preservation, at least one decomposed table must satisfy every
dependency.
o If a relation R is decomposed into relation R1 and R2, then the dependencies of R
either must be a part of R1 or R2 or must be derivable from the combination of
functional dependencies of R1 and R2.
o For example, suppose there is a relation R (A, B, C, D) with functional dependency
set (A->BC). The relational R is decomposed into R1(ABC) and R2(AD) which is
dependency preserving because FD A->BC is a part of relation R1(ABC).