
Understanding Functional Dependencies in Databases


Functional Dependency

 A functional dependency is a constraint between two sets of attributes
from the database.
 Suppose that our relational database schema has n attributes A1,
A2, ..., An; let us think of the whole database as being described by a
single universal relation schema R = {A1, A2, ... , An}.
Definition.
A functional dependency, denoted by X → Y, between two sets of
attributes X and Y that are subsets of R specifies a constraint on the possible
tuples that can form a relation state r of R. The constraint is that, for any two
tuples t1 and t2 in r that have t1[X] = t2[X], they must also have t1[Y] =
t2[Y].

 This means that the values of the Y component of a tuple in r depend
on, or are determined by, the values of the X component.
 We also say that there is a functional dependency from X to Y, or that Y
is functionally dependent on X.
 The abbreviation for functional dependency is FD or f.d. The set of
attributes X is called the left-hand side of the FD, and Y is called the
right-hand side.
 Consider the relation schema EMP_PROJ in the figure. From the semantics
of the attributes and the relation, we know that the following functional
dependencies should hold:
a. Ssn → Ename
b. Pnumber → {Pname, Plocation}
c. {Ssn, Pnumber} → Hours

These functional dependencies specify that (a) the value of an
employee’s Social Security number (Ssn) uniquely determines the
employee name (Ename), (b) the value of a project’s number
(Pnumber) uniquely determines the project name (Pname) and location
(Plocation), and (c) a combination of Ssn and Pnumber values uniquely
determines the number of hours the employee currently works on the
project per week (Hours).
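
As a rough illustration, the definition can be checked mechanically against a single relation state. The Python sketch below assumes tuples are stored as dictionaries; the function name satisfies_fd and all sample values are invented for illustration. Note that a state can only refute an FD: satisfying it in one state does not prove that the FD holds for the schema.

```python
from itertools import combinations

def satisfies_fd(rows, lhs, rhs):
    """True if every pair of rows that agrees on lhs also agrees on rhs."""
    for t1, t2 in combinations(rows, 2):
        same_lhs = all(t1[a] == t2[a] for a in lhs)
        same_rhs = all(t1[a] == t2[a] for a in rhs)
        if same_lhs and not same_rhs:
            return False
    return True

# Hypothetical EMP_PROJ state (values are made up).
emp_proj = [
    {"Ssn": "111", "Pnumber": 1, "Hours": 20, "Ename": "Smith",
     "Pname": "ProductX", "Plocation": "Bellaire"},
    {"Ssn": "111", "Pnumber": 2, "Hours": 10, "Ename": "Smith",
     "Pname": "ProductY", "Plocation": "Sugarland"},
    {"Ssn": "222", "Pnumber": 1, "Hours": 15, "Ename": "Wong",
     "Pname": "ProductX", "Plocation": "Bellaire"},
]

print(satisfies_fd(emp_proj, ["Ssn"], ["Ename"]))                   # True
print(satisfies_fd(emp_proj, ["Pnumber"], ["Pname", "Plocation"]))  # True
print(satisfies_fd(emp_proj, ["Ssn", "Pnumber"], ["Hours"]))        # True
print(satisfies_fd(emp_proj, ["Ssn"], ["Hours"]))                   # False
```

The last call fails because the two tuples with Ssn "111" disagree on Hours, which is why Hours needs both Ssn and Pnumber on the left-hand side.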

Normalization of Relations
 The normalization process, as first proposed by Codd (1972a), takes a
relation schema through a series of tests to certify whether it satisfies
a certain normal form.
 The process, which proceeds in a top-down fashion by evaluating each
relation against the criteria for normal forms and decomposing
relations as necessary, can thus be considered as relational design by
analysis.
 Initially, Codd proposed three normal forms, which he called first,
second, and third normal form.
 A stronger definition of 3NF—called Boyce-Codd normal form (BCNF)—
was proposed later by Boyce and Codd.
 All these normal forms are based on a single analytical tool: the
functional dependencies among the attributes of a relation. Later, a
fourth normal form (4NF) and a fifth normal form (5NF) were proposed,
based on the concepts of multivalued dependencies and join
dependencies, respectively.
Normalization of data can be considered a process of analyzing the given
relation schemas based on their FDs and primary keys to achieve the
desirable properties of (1) minimizing redundancy and (2) minimizing the
insertion, deletion, and update anomalies.
Definition: The normal form of a relation refers to the highest normal form
condition that it meets, and hence indicates the degree to which it has
been normalized.

Definition of Superkey

 A superkey of a relation schema R = {A1, A2, ..., An} is a set of
attributes S ⊆ R with the property that no two tuples t1 and t2 in any
legal relation state r of R will have t1[S] = t2[S].


 A key K is a superkey with the additional property that removal of any
attribute from K will cause K not to be a superkey any more.
 The difference between a key and a superkey is that a key has to be
minimal; that is, if we have a key K = {A1, A2, ..., Ak} of R, then K –
{Ai } is not a key of R for any Ai , 1≤ i ≤ k.

 {Ssn} is a key for EMPLOYEE, whereas {Ssn}, {Ssn, Ename}, {Ssn,
Ename, Bdate}, and any set of attributes that includes Ssn are all
superkeys.
 If a relation schema has more than one key, each is called a candidate
key.
 One of the candidate keys is arbitrarily designated to be the primary
key, and the others are called secondary keys.
Definition. An attribute of relation schema R is called a prime attribute of
R if it is a member of some candidate key of R. An attribute is called
nonprime if it is not a prime attribute—that is, if it is not a member of any
candidate key.
(Note: Prime attribute means key attribute, nonprime means non key
attribute)
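
The definitions of superkey, key, and prime attribute can be sketched with an attribute-closure routine. The sketch below is a brute-force illustration for small schemas only; the function names are assumptions, and the schema and FDs are the EMP_PROJ ones listed earlier.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, schema, fds):
    return set(schema) <= closure(attrs, fds)

def candidate_keys(schema, fds):
    """Minimal superkeys, trying attribute sets from smallest to largest."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            # Skip supersets of a key already found: they are not minimal.
            if is_superkey(s, schema, fds) and not any(k <= s for k in keys):
                keys.append(s)
    return keys

schema = {"Ssn", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}
fds = [
    ({"Ssn"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
    ({"Ssn", "Pnumber"}, {"Hours"}),
]

keys = candidate_keys(schema, fds)
prime = set().union(*keys)
print([sorted(k) for k in keys])   # [['Pnumber', 'Ssn']]
print(sorted(prime))               # ['Pnumber', 'Ssn']
print(sorted(schema - prime))      # ['Ename', 'Hours', 'Plocation', 'Pname']
```

Trying every subset is exponential in the number of attributes, so this is only meant to make the definitions concrete.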

Boyce-Codd Normal Form (BCNF):


 Boyce-Codd normal form (BCNF) was proposed as a simpler form of
3NF, but it was found to be stricter than 3NF. That is, every relation in
BCNF is also in 3NF; however, a relation in 3NF is not necessarily in
BCNF.
 Definition. A relation schema R is in BCNF if whenever a nontrivial
functional dependency X → A holds in R, then X is a superkey of R.
 Consider an example relation TEACH with the following dependencies:
FD1: {Student, Course} → Instructor
FD2: Instructor → Course

Note that {Student, Course} is a candidate key for this relation and
that the dependencies shown follow the pattern in the figure, with Student
as A, Course as B, and Instructor as C. Hence this relation is in 3NF but
not BCNF. Decomposition of this relation schema into two schemas is
not straightforward because it may be decomposed into one of the
three following possible pairs:
1. {Student, Instructor} and {Student, Course}.
2. {Course, Instructor} and {Course, Student}.
3. {Instructor, Course} and {Instructor, Student}
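
The BCNF definition can be tested on TEACH with a short sketch. The helper names below are assumptions, and the check only looks at the FDs that are listed, not at every dependency they imply.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Listed nontrivial FDs whose left-hand side is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        nontrivial = not set(rhs) <= set(lhs)
        lhs_is_superkey = set(schema) <= closure(lhs, fds)
        if nontrivial and not lhs_is_superkey:
            violations.append((lhs, rhs))
    return violations

teach = {"Student", "Course", "Instructor"}
teach_fds = [
    ({"Student", "Course"}, {"Instructor"}),  # FD1
    ({"Instructor"}, {"Course"}),             # FD2
]

print(bcnf_violations(teach, teach_fds))  # [({'Instructor'}, {'Course'})]
```

FD2 is reported because the closure of {Instructor} is only {Instructor, Course}, so Instructor is not a superkey of TEACH.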

Algorithm for Minimal Cover set


Step 1: Split the FDs such that RHS contains single attribute.
Step 2: Find the redundant FDs and delete them from the set
Step 3: Find the redundant attributes on LHS and delete them
Example:
Find the Minimal Cover set for a set of functional dependencies.
{A->C, AC->D, E->AD, E->H}
Sol:
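1. As per step one, each RHS should contain a single attribute.
{A->C, AC->D, E->A, E->D, E->H}
2. Now find the redundant FDs and delete them from the set. To test an FD
X->Y, compute the closure of X using only the remaining FDs; if the closure
contains Y, the FD is redundant.
A->C: A+ = A, so A->C is needed
AC->D: [AC]+ = AC, so AC->D is needed
E->A: E+ = EDH, so E->A is needed
E->D: E+ = EACDH, which contains D, so E->D is redundant
E->H: E+ = EACD, so E->H is needed
After removing the redundant FD, the set is {A->C, AC->D, E->A, E->H}
3. Now find redundant attributes on the LHS in the set {A->C, AC->D, E->A,
E->H}. Only AC->D has more than one attribute on the LHS.
AC->D
A+ = ACD
C+ = C
Here A+ already contains C and D, so C is redundant on the LHS and AC->D
reduces to A->D.
The minimal cover is {A->C, A->D, E->A, E->H}.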
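
The three steps can be sketched in Python as follows. This is a minimal sketch, assuming single-letter attribute names so that an FD can be written as a pair of strings; the function names are invented, and the final extra pass that re-checks for redundant FDs after step 3 is an addition to the three steps listed above.

```python
def closure(attrs, fds):
    """Closure of attrs; each FD is (frozenset lhs, single attribute)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result

def drop_redundant_fds(fds):
    """Step 2: drop X->A when A is still in X+ computed without that FD."""
    kept = list(fds)
    for fd in list(kept):
        rest = [g for g in kept if g != fd]
        if fd[1] in closure(fd[0], rest):
            kept = rest
    return kept

def minimal_cover(fds):
    # Step 1: split every right-hand side into single attributes.
    split = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            fd = (frozenset(lhs), a)
            if fd not in split:
                split.append(fd)
    # Step 2: remove redundant FDs.
    kept = drop_redundant_fds(split)
    # Step 3: remove a left-hand attribute b when (lhs - b)+ still reaches rhs.
    reduced = []
    for lhs, rhs in kept:
        for b in sorted(lhs):
            if len(lhs) > 1 and rhs in closure(lhs - {b}, kept):
                lhs = lhs - {b}
        reduced.append((lhs, rhs))
    # Extra pass (not one of the three listed steps): shrinking a left-hand
    # side can make another FD redundant, so check once more.
    return drop_redundant_fds(list(dict.fromkeys(reduced)))

F = [("A", "C"), ("AC", "D"), ("E", "AD"), ("E", "H")]
cover = minimal_cover(F)
print([f"{''.join(sorted(lhs))}->{rhs}" for lhs, rhs in cover])
# ['A->C', 'A->D', 'E->A', 'E->H']
```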

Algorithms for Relational Database Schema Design


 Normal forms are insufficient on their own for good relational
database schema design.
 The relations in a database must also satisfy two other properties:
1. Dependency preserving property of a decomposition
2. Lossless (or nonadditive) join property

Dependency Preserving Property

If we decompose a relation R into relations R1 and R2, all dependencies
of R must either be a part of R1 or R2 or be derivable from a
combination of the functional dependencies of R1 and R2.

Example:
Let R(A, B, C, D) be a relation with functional dependencies {AB –> C, C –> D,
D –> A}. Relation R is decomposed into R1(A, B, C) and R2(C, D). Check
whether the decomposition is dependency preserving or not.
Sol:
Given Relation R is decomposed into R1 and R2.
FD: {AB->C, C->D, D->A}
R1(A, B, C) and R2(C, D)
Let us find F1 and F2, the sets of dependencies that hold on R1 and R2.
To find F1, consider every combination of the attributes of R1, i.e., find the
closures of A, B, C, AB, BC and AC, and keep only the attributes of R1.

closure(A) = { A } // Trivial
closure(B) = { B } // Trivial
closure(C) = {C, A, D}, but D is dropped because it is not an attribute of R1.
= {C, A}
C--> A // Removing C from right side as it is trivial attribute

closure(AB) = {A, B, C, D}
= {A, B, C}
AB --> C // Removing AB from right side as these are trivial attributes

closure(BC) = {B, C, D, A}
= {A, B, C}
BC --> A // Removing BC from right side as these are trivial attributes

closure(AC) = {A, C, D}
= {A, C} // Only trivial attributes remain, so no dependency is added

F1 = {C --> A, AB --> C, BC --> A}.
Similarly, F2 = {C --> D}.

The original relation has the dependencies {AB --> C, C --> D, D --> A}.

AB --> C is present in F1.
C --> D is present in F2.
D --> A is not preserved: the closure of D under F1 U F2 is {D}, which does
not contain A.

F1 U F2 does not imply every dependency in F, so the given decomposition is
not dependency preserving.
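
The same check can be sketched in Python by projecting F onto each decomposed relation and then testing every original FD against the union of the projections. The function names are assumptions, and the subset enumeration is exponential, so this is only suitable for small examples such as the one above.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def project_fds(fds, sub):
    """Nontrivial FDs X -> (X+ restricted to sub) - X for each nonempty X in sub."""
    projected = []
    for size in range(1, len(sub) + 1):
        for combo in combinations(sorted(sub), size):
            x = set(combo)
            rhs = (closure(x, fds) & set(sub)) - x
            if rhs:
                projected.append((x, rhs))
    return projected

def lost_dependencies(fds, parts):
    """FDs of the original set that the projections together do not imply."""
    union = [fd for part in parts for fd in project_fds(fds, part)]
    return [(l, r) for l, r in fds if not set(r) <= closure(l, union)]

F = [({"A", "B"}, {"C"}), ({"C"}, {"D"}), ({"D"}, {"A"})]
R1, R2 = {"A", "B", "C"}, {"C", "D"}

print(project_fds(F, R2))             # [({'C'}, {'D'})]
print(lost_dependencies(F, [R1, R2])) # [({'D'}, {'A'})]
```

An empty list from lost_dependencies would mean the decomposition is dependency preserving; here D --> A is reported as lost.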

Lossless (or Non additive) join property

Lossless join decomposition is a decomposition of a relation R into relations
R1 and R2 such that a natural join of R1 and R2 returns exactly the original
relation R.
In other words, a lossless decomposition makes it possible to reconstruct
the relation R from the decomposed tables R1 and R2 by using joins.
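
A small sketch can make the property concrete by projecting one relation state onto R1 and R2, joining the projections, and comparing the result with the original. The relation R(A, B, C), its values, and the function names are invented for illustration. The check covers a single state only, whereas the property itself must hold for every legal state.

```python
def project(rows, attrs):
    """Projection with duplicates removed; a tuple is a frozenset of pairs."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}

def natural_join(r1, r2):
    """Combine tuples that agree on all attributes the two relations share."""
    joined = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                joined.add(frozenset({**d1, **d2}.items()))
    return joined

def joins_back(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly this state."""
    original = {frozenset(row.items()) for row in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original

r = [
    {"A": 1, "B": "x", "C": "p"},
    {"A": 2, "B": "x", "C": "q"},
]

print(joins_back(r, ["A", "B"], ["A", "C"]))  # True
print(joins_back(r, ["A", "B"], ["B", "C"]))  # False
```

In the second call both tuples share B = "x", so the join produces four tuples, two of which are spurious.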
Introduction to Relational Algebra
 Historically, the relational algebra and calculus were developed before the SQL
language. In fact, in some ways, SQL is based on concepts from both the algebra and the
calculus.
 The basic set of operations for the relational model is the relational algebra. These
operations enable a user to specify basic retrieval requests as relational algebra
expressions. The result of a retrieval is a new relation, which may have been formed
from one or more relations.
Unary Relation Operations
SELECT Operation
 The SELECT operation is used to choose a subset of the tuples from a relation that
satisfies a selection condition.
 For example, SELECT can retrieve the EMPLOYEE tuples whose department is 4, or those
whose salary is greater than $30,000.

 A single SELECT with a compound condition can retrieve the tuples for all employees who
either work in department 4 and make over $25,000 per year, or work in department 5 and
make over $30,000.
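
The SELECT operation behaves like a filter over tuples. The sketch below is only an illustration in Python: the attribute names Dno and Salary, the sample rows, and the function name select are assumptions and do not come from the original schema.

```python
def select(rows, condition):
    """Keep the tuples for which the selection condition is true."""
    return [row for row in rows if condition(row)]

# Hypothetical EMPLOYEE state.
employee = [
    {"Name": "Ann",  "Dno": 4, "Salary": 43000},
    {"Name": "Bob",  "Dno": 4, "Salary": 25000},
    {"Name": "Cara", "Dno": 5, "Salary": 38000},
    {"Name": "Dev",  "Dno": 5, "Salary": 30000},
    {"Name": "Eve",  "Dno": 1, "Salary": 55000},
]

dept4 = select(employee, lambda t: t["Dno"] == 4)
compound = select(
    employee,
    lambda t: (t["Dno"] == 4 and t["Salary"] > 25000)
    or (t["Dno"] == 5 and t["Salary"] > 30000),
)

print([t["Name"] for t in dept4])     # ['Ann', 'Bob']
print([t["Name"] for t in compound])  # ['Ann', 'Cara']
```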

PROJECT Operation
 If we think of a relation as a table, the SELECT operation chooses some of the rows from
the table while discarding other rows. The PROJECT operation, on the other hand,
selects certain columns from the table and discards the other columns.
 For example, to list each employee’s first and last name and salary, we can use the
PROJECT operation as follows:

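 The general form of the PROJECT operation is π<attribute list>(R), where π (pi) is the
symbol for the PROJECT operation and <attribute list> is the desired sublist of attributes
of relation R.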

 For example, the following operation:


Sequences of Operations and the RENAME Operation
 For example, to retrieve the first name, last name, and salary of all employees who work
in department number 5, we must apply a SELECT and a PROJECT operation.

 Alternatively, we can explicitly show the sequence of operations, giving a name to each
intermediate relation, as follows:
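
Both styles can be sketched in Python: one nested expression, or a sequence of steps with a named intermediate relation. The attribute names (Fname, Lname, Salary, Dno), the sample rows, and the names select, project, and dep5_emps are assumptions made for the sketch.

```python
def select(rows, condition):
    """Keep the tuples for which the selection condition is true."""
    return [row for row in rows if condition(row)]

def project(rows, attrs):
    """Keep only attrs and drop duplicate tuples, preserving first-seen order."""
    seen, result = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result

# Hypothetical EMPLOYEE state.
employee = [
    {"Fname": "John",  "Lname": "Smith",   "Salary": 30000, "Dno": 5},
    {"Fname": "Joyce", "Lname": "English", "Salary": 25000, "Dno": 5},
    {"Fname": "Alice", "Lname": "Zed",     "Salary": 25000, "Dno": 4},
]

# Single in-line expression: PROJECT applied to the result of SELECT.
inline = project(select(employee, lambda t: t["Dno"] == 5),
                 ["Fname", "Lname", "Salary"])

# Same query as a sequence, naming the intermediate relation.
dep5_emps = select(employee, lambda t: t["Dno"] == 5)
result = project(dep5_emps, ["Fname", "Lname", "Salary"])

print(result == inline)               # True
print([t["Lname"] for t in result])   # ['Smith', 'English']
```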

Relational Algebra Operations from Set Theory


The UNION, INTERSECTION, and MINUS Operations
UNION
For example, to retrieve the Social Security numbers of all employees who either work in
department 5 or directly supervise an employee who works in department 5, we can use the
UNION operation as follows:

The relation RESULT1 has the Ssn of all employees who work in department 5, whereas
RESULT2 has the Ssn of all employees who directly supervise an employee who works in
department 5. The UNION operation produces the tuples that are in either RESULT1 or RESULT2
or both, as shown in the figure.
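
Because relations are sets of tuples, the UNION step maps directly onto a set union. The Python sketch below assumes attribute names Ssn, Super_ssn, and Dno and uses made-up rows.

```python
# Hypothetical EMPLOYEE state; Super_ssn is the supervisor's Ssn.
employee = [
    {"Ssn": "101", "Super_ssn": "201", "Dno": 5},
    {"Ssn": "102", "Super_ssn": "101", "Dno": 5},
    {"Ssn": "201", "Super_ssn": None,  "Dno": 1},
    {"Ssn": "301", "Super_ssn": "201", "Dno": 4},
]

# RESULT1: Ssn of every employee who works in department 5.
result1 = {e["Ssn"] for e in employee if e["Dno"] == 5}

# RESULT2: Ssn of every direct supervisor of a department 5 employee.
result2 = {e["Super_ssn"] for e in employee
           if e["Dno"] == 5 and e["Super_ssn"] is not None}

# UNION keeps each Ssn once, even if it appears in both inputs.
result = result1 | result2

print(sorted(result1))  # ['101', '102']
print(sorted(result2))  # ['101', '201']
print(sorted(result))   # ['101', '102', '201']
```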
Cartesian Product
 The CARTESIAN PRODUCT operation, also known as CROSS PRODUCT or CROSS JOIN, is
denoted by ×.
 In its binary form, this set operation produces a new element by combining every member
(tuple) from one relation (set) with every member (tuple) from the other relation (set).
 In general, the result of R(A1, A2, ..., An) × S(B1, B2, ..., Bm) is a relation Q of degree n + m
with attributes Q(A1, A2, ..., An, B1, B2, ..., Bm), in that order.
 The resulting relation Q has one tuple for each combination of tuples—one from R and one from
S.
 Hence, if R has nR tuples (denoted as |R| = nR), and S has nS tuples, then R × S will have nR * nS
tuples.
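
The degree and cardinality rules can be checked with a short Python sketch. It assumes tuples are stored as Python tuples and that the two attribute lists do not share names; the function name and the sample data are made up.

```python
from itertools import product

def cartesian_product(r_attrs, r_rows, s_attrs, s_rows):
    """Pair every tuple of R with every tuple of S, attributes in R-then-S order."""
    attrs = list(r_attrs) + list(s_attrs)
    rows = [r + s for r, s in product(r_rows, s_rows)]
    return attrs, rows

r_attrs, r_rows = ["A1", "A2"], [(1, "a"), (2, "b"), (3, "c")]
s_attrs, s_rows = ["B1"], [("x",), ("y",)]

q_attrs, q_rows = cartesian_product(r_attrs, r_rows, s_attrs, s_rows)

print(q_attrs)      # ['A1', 'A2', 'B1']  (degree n + m = 3)
print(len(q_rows))  # 6                   (nR * nS = 3 * 2)
print(q_rows[0])    # (1, 'a', 'x')
```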
Tuple Relational Calculus
 Relational calculus is another formal query language for the relational model.
 The variant described here is known as tuple relational calculus.
 A calculus expression specifies what is to be retrieved rather than how to retrieve it.
 Therefore, the relational calculus is considered to be a nonprocedural language.
 A simple tuple relational calculus query is of the form:
{ t | COND(t) }
where t is a tuple variable and COND(t) is a conditional (Boolean) expression involving t that
evaluates to either TRUE or FALSE for different assignments of tuples to the variable t.
 For example, to find all employees whose salary is above Rs. 50,000, we can write the following
tuple calculus expression:
{t | EMPLOYEE(t) AND t.Salary>50000}
 The above query retrieves all attribute values for each selected EMPLOYEE tuple t. To retrieve
only some of the attributes, say the first and last names, we write
{t.Fname, t.Lname | EMPLOYEE(t) AND t.Salary>50000}
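
A tuple calculus query of the form { t | COND(t) } reads much like a comprehension: it states which tuples qualify, not how to find them. The Python sketch below is a loose analogy only; the attribute names Fname, Lname, and Salary and the sample rows are assumptions.

```python
# Hypothetical EMPLOYEE state.
employee = [
    {"Fname": "John",     "Lname": "Smith", "Salary": 30000},
    {"Fname": "James",    "Lname": "Borg",  "Salary": 55000},
    {"Fname": "Jennifer", "Lname": "Wall",  "Salary": 62000},
]

# Counterpart of: all attributes of each EMPLOYEE tuple t with salary above 50000.
high_paid = [t for t in employee if t["Salary"] > 50000]

# Counterpart of: only the first and last names of those tuples.
names = [(t["Fname"], t["Lname"]) for t in employee if t["Salary"] > 50000]

print(len(high_paid))  # 2
print(names)           # [('James', 'Borg'), ('Jennifer', 'Wall')]
```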

Expressions and Formulas in Tuple Relational Calculus


 A general expression of the tuple relational calculus is of the form

 Two special symbols called quantifiers can appear in formulas.


 These are universal quantifier (∀) and the existential quantifier (∃)
 If F is a formula, then so are (∃t)(F) and (∀t)(F), where t is a tuple variable.
 The (∃) quantifier is called an existential quantifier because a formula (∃t)(F) is TRUE if there
exists some tuple that makes F TRUE.
 For the universal quantifier, (∀t)(F) is TRUE if F evaluates to TRUE for every possible tuple
that can be assigned to free occurrences of t in F.
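
Over a finite relation, the two quantifiers correspond roughly to Python's any() and all(). The sketch below restricts t to the tuples of one hypothetical EMPLOYEE relation, which is a simplification: in the calculus, t ranges over all possible tuples unless the formula restricts it.

```python
# Hypothetical EMPLOYEE state; attribute names are assumptions.
employee = [
    {"Lname": "Smith", "Salary": 30000},
    {"Lname": "Borg",  "Salary": 55000},
]

# Existential: TRUE if some tuple makes the condition TRUE.
some_above_50k = any(t["Salary"] > 50000 for t in employee)

# Universal: TRUE only if every tuple makes the condition TRUE.
all_above_50k = all(t["Salary"] > 50000 for t in employee)
all_above_20k = all(t["Salary"] > 20000 for t in employee)

print(some_above_50k)  # True
print(all_above_50k)   # False
print(all_above_20k)   # True
```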
The Domain Relational Calculus

Query Processing and Optimization
1. A query expressed in a high-level query language such as SQL must first be scanned,
parsed, and validated.
2. The scanner identifies the query tokens (such as SQL keywords, attribute names, and
relation names) that appear in the text of the query.
3. The parser checks the query syntax to determine whether it is formulated according to
the syntax rules (rules of grammar) of the query language.
4. The query must also be validated by checking that all attribute and relation names are
valid and semantically meaningful names in the schema of the particular database being
queried.
5. An internal representation of the query is then created, usually as a tree data structure
called a query tree. It is also possible to represent the query using a graph data structure
called a query graph.
6. The DBMS must then devise an execution strategy or query plan for retrieving the
results of the query from the database files.
7. A query typically has many possible execution strategies, and the process of choosing a
suitable one for processing a query is known as query optimization.
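
The scanning and validation steps can be illustrated with a toy example. The sketch below is not how any particular DBMS implements them: the keyword list, the token categories, the function names, and the schema names are all assumptions, and only a tiny subset of SQL is recognized.

```python
import re

KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR"}
# One token per match: a number, a word, or an operator/punctuation symbol.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(<=|>=|<>|[=<>,*().]))")

def scan(query):
    """Split the query text into (kind, text) tokens."""
    tokens, pos = [], 0
    query = query.rstrip()
    while pos < len(query):
        match = TOKEN_RE.match(query, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        number, word, symbol = match.groups()
        if number:
            tokens.append(("NUMBER", number))
        elif word:
            kind = "KEYWORD" if word.upper() in KEYWORDS else "NAME"
            tokens.append((kind, word))
        else:
            tokens.append(("SYMBOL", symbol))
        pos = match.end()
    return tokens

def unknown_names(tokens, schema_names):
    """Validation: attribute and relation names that the schema does not define."""
    return [text for kind, text in tokens
            if kind == "NAME" and text not in schema_names]

tokens = scan("SELECT Lname FROM EMPLOYEE WHERE Salary > 50000")
print(tokens[:3])
# [('KEYWORD', 'SELECT'), ('NAME', 'Lname'), ('KEYWORD', 'FROM')]
print(unknown_names(tokens, {"EMPLOYEE", "Lname", "Salary"}))  # []
print(unknown_names(tokens, {"EMPLOYEE", "Lname"}))            # ['Salary']
```

A real parser would then check the token sequence against the grammar and build the query tree; that part is omitted here.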
Translating SQL queries into Relational Algebra

 In practice, an SQL query is first translated into an equivalent extended relational algebra
expression (represented as a query tree data structure) that is then optimized.
 Typically, SQL queries are decomposed into query blocks, which form the basic units that can be
translated into the algebraic operators and optimized.
