Relational data model
The relational data model is the primary data model, used widely
around the world for data storage and processing. The model is simple, yet
it has all the properties and capabilities required to process data with
storage efficiency.
Concepts of Relational Data Model
Tables − In the relational data model, relations are stored in the form of
tables. This format stores the relation among entities. A table has rows and
columns, where rows represent records and columns represent
attributes.
Tuple − A single row of a table, which contains a single record for that
relation, is called a tuple.
Relation instance − A finite set of tuples in the relational database system
represents a relation instance. Relation instances do not have duplicate
tuples.
Relation schema − A relation schema describes the relation name (table
name), attributes, and their names.
Relation key − Each row has one or more attributes, known as relation
key, which can identify the row in the relation (table) uniquely.
Attribute domain − Every attribute has some pre-defined value scope,
known as attribute domain.
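The concepts above can be sketched in Python. This is only an illustration, not how a database engine stores data: the class name Student and its fields are assumptions made for the example. The named tuple plays the role of the relation schema and a Python set plays the role of the relation instance.

```python
from typing import NamedTuple

# Relation schema: the relation name plus its attribute names
# (the type hints stand in for attribute domains).
class Student(NamedTuple):
    sid: int
    name: str

# Relation instance: a finite set of tuples. A set cannot hold duplicates.
students = {
    Student(101, "Alex"),
    Student(102, "Maria"),
    Student(101, "Alex"),  # duplicate tuple, dropped by the set
}

print(len(students))    # 2
print(Student._fields)  # ('sid', 'name')
```

Using a set rather than a list mirrors the rule that a relation instance has no duplicate tuples.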
Constraints
Every relation has some conditions that must hold for it to be a valid
relation. These conditions are called Relational Integrity Constraints.
There are three main integrity constraints −
Key constraints
Domain constraints
Referential integrity constraints
Key Constraints
There must be at least one minimal subset of attributes in the relation
that can identify a tuple uniquely. This minimal subset of attributes is
called a key for that relation. If there is more than one such minimal
subset, they are called candidate keys.
Key constraints require that −
In a relation with a key attribute, no two tuples can have identical values for key
attributes.
A key attribute cannot have NULL values.
Key constraints are also referred to as Entity Constraints.
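As a rough illustration, both key rules can be checked over a list of rows. The function name satisfies_key and the sample rows are assumptions for this sketch, and Python's None stands in for NULL.

```python
def satisfies_key(rows, key):
    """Return True if no key attribute is None and no two rows share a key value."""
    seen = set()
    for row in rows:
        value = tuple(row[attr] for attr in key)
        if any(v is None for v in value) or value in seen:
            return False
        seen.add(value)
    return True

students = [
    {"sid": 101, "name": "Alex"},
    {"sid": 102, "name": "Maria"},
]

print(satisfies_key(students, ["sid"]))                                   # True
print(satisfies_key(students + [{"sid": 101, "name": "Bob"}], ["sid"]))   # False
print(satisfies_key(students + [{"sid": None, "name": "Eve"}], ["sid"]))  # False
```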
Domain Constraints
Attributes have specific values in real-world scenarios. For example, age can
only be a non-negative integer. The same kind of restriction is applied to
the attributes of a relation: every attribute is bound to a specific
range of values. For example, age cannot be less than zero and telephone
numbers cannot contain a character outside the digits 0-9.
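A domain constraint can be pictured as a predicate attached to each attribute. The sketch below is an assumption-laden illustration: the DOMAINS mapping and the function domain_violations are made-up names, and the two rules simply encode the age and telephone examples from the text.

```python
# Hypothetical domains: each attribute maps to a predicate its values must satisfy.
DOMAINS = {
    "age": lambda v: isinstance(v, int) and v >= 0,
    "phone": lambda v: isinstance(v, str) and v != ""
    and all(c in "0123456789" for c in v),
}

def domain_violations(row):
    """Return the attributes of row whose values fall outside their domain."""
    return [attr for attr, is_valid in DOMAINS.items() if not is_valid(row[attr])]

print(domain_violations({"age": 30, "phone": "5551234"}))   # []
print(domain_violations({"age": -1, "phone": "555-1234"}))  # ['age', 'phone']
```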
Referential Integrity Constraints
Referential integrity constraints work on the concept of foreign keys. A
foreign key is an attribute (or set of attributes) of one relation that refers
to a key attribute of another relation, or of the same relation.
A referential integrity constraint states that if a relation refers to a key
attribute of a different or the same relation, then that key value must exist.
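The rule can be sketched as a check that every foreign key value appears among the referenced key values. The function name dangling_references and the employee and department rows are assumptions for illustration only.

```python
def dangling_references(child_rows, fk, parent_rows, pk):
    """Return the child rows whose foreign key value has no matching parent key."""
    parent_keys = {row[pk] for row in parent_rows}
    return [row for row in child_rows if row[fk] not in parent_keys]

departments = [{"dept_id": 10}, {"dept_id": 20}]
employees = [
    {"emp_id": 1, "dept_id": 10},
    {"emp_id": 2, "dept_id": 30},  # department 30 does not exist
]

print(dangling_references(employees, "dept_id", departments, "dept_id"))
# [{'emp_id': 2, 'dept_id': 30}]
```

An empty result means the referential integrity constraint holds for these rows.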
Mapping ER Model to Relational Model
The ER model, when conceptualized into diagrams, gives a good overview of
entity relationships, which is easier to understand. ER diagrams can be
mapped to a relational schema; that is, it is possible to create a relational
schema from an ER diagram.
ER diagrams mainly comprise −
Entity and its attributes
Relationship, which is an association among entities.
Mapping Entity
An entity is a real-world object with some attributes.
Mapping Process (Algorithm)
Create a table for each entity.
The entity's attributes should become fields of the table with their respective data types.
Declare the primary key.
Mapping Relationship
A relationship is an association among entities.
Mapping Process
Create a table for the relationship.
Add the primary keys of all participating entities as fields of the table with their
respective data types.
If the relationship has any attributes, add each attribute as a field of the table.
Declare a primary key composed of all the primary keys of the participating entities.
Declare all foreign key constraints.
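The entity and relationship mapping steps can be tried out with Python's built-in sqlite3 module. The entities student and project, the relationship works_on, and its attribute hours are all invented for this sketch; only the mapping steps themselves come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    -- One table per entity, with a declared primary key.
    CREATE TABLE student (stu_id INTEGER PRIMARY KEY, stu_name TEXT);
    CREATE TABLE project (proj_id INTEGER PRIMARY KEY, proj_name TEXT);
    -- One table for the relationship: the participants' keys plus its own attribute.
    CREATE TABLE works_on (
        stu_id  INTEGER,
        proj_id INTEGER,
        hours   INTEGER,
        PRIMARY KEY (stu_id, proj_id),
        FOREIGN KEY (stu_id)  REFERENCES student (stu_id),
        FOREIGN KEY (proj_id) REFERENCES project (proj_id)
    );
""")
conn.execute("INSERT INTO student VALUES (1, 'Alex')")
conn.execute("INSERT INTO project VALUES (10, 'Compiler')")
conn.execute("INSERT INTO works_on VALUES (1, 10, 5)")

try:
    conn.execute("INSERT INTO works_on VALUES (2, 10, 3)")  # student 2 does not exist
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```

The second insert is refused because its stu_id has no matching row in student, which is the foreign key constraint at work.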
Mapping Weak Entity Sets
A weak entity set is one which does not have any primary key associated
with it.
Mapping Process
Create a table for the weak entity set.
Add all its attributes to the table as fields.
Add the primary key of the identifying entity set.
Declare all foreign key constraints.
Mapping Hierarchical Entities
ER specialization or generalization comes in the form of hierarchical entity
sets.
Mapping Process
Create tables for all higher-level entities.
Create tables for lower-level entities.
Add the primary keys of higher-level entities to the tables of lower-level entities.
In lower-level tables, add all other attributes of lower-level entities.
Declare the primary key of each higher-level table and of each lower-level
table.
Declare foreign key constraints.
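A small sqlite3 sketch of these steps follows. The higher-level entity person, the lower-level entity student, and the attribute roll_no are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Higher-level entity.
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
    -- Lower-level entity: carries the higher-level key plus its own attributes.
    CREATE TABLE student (
        person_id INTEGER PRIMARY KEY,
        roll_no   TEXT,
        FOREIGN KEY (person_id) REFERENCES person (person_id)
    );
    INSERT INTO person  VALUES (1, 'Alex'), (2, 'Maria');
    INSERT INTO student VALUES (1, 'R-17');
""")

# Joining on the shared key recovers the full description of each student.
rows = conn.execute("""
    SELECT p.person_id, p.name, s.roll_no
    FROM student AS s JOIN person AS p ON p.person_id = s.person_id
""").fetchall()
print(rows)  # [(1, 'Alex', 'R-17')]
```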
Functional Dependency
A functional dependency (FD) is a constraint between two sets of attributes
in a relation. It typically holds between the primary key (PK) and the
non-key attributes of a table. For any relation R, attribute
Y is functionally dependent on attribute X (usually the PK) if, in every valid
instance of R, each value of X uniquely determines the value of Y.
A functional dependency is represented by an arrow sign (→), that is, X→Y,
where X functionally determines Y. The left-hand side attributes determine
the values of the attributes on the right-hand side.
Example
This case represents an example where multiple functional dependencies are
embedded in a single representation of data. Note that because an employee
can only be a member of one department, the unique ID of that employee
determines the department.
Employee ID → Employee Name
Employee ID → Department ID
In addition to these dependencies, the table also has a functional dependency
through a non-key attribute:
Department ID → Department Name
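A functional dependency can be tested against sample data by checking that equal left-hand side values never come with different right-hand side values. The function name holds and the employee rows are assumptions for this sketch. Note that sample data can only disprove a dependency; whether it holds in general is a design decision.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value.
        if seen.setdefault(x, y) != y:
            return False
    return True

employees = [
    {"emp_id": 1, "emp_name": "Alex",  "dept_id": 10, "dept_name": "Sales"},
    {"emp_id": 2, "emp_name": "Maria", "dept_id": 10, "dept_name": "Sales"},
    {"emp_id": 3, "emp_name": "Alex",  "dept_id": 20, "dept_name": "Research"},
]

print(holds(employees, ["emp_id"], ["emp_name"]))    # True
print(holds(employees, ["emp_id"], ["dept_id"]))     # True
print(holds(employees, ["dept_id"], ["dept_name"]))  # True
print(holds(employees, ["emp_name"], ["dept_id"]))   # False
```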
Types of Functional Dependencies
Trivial functional dependency
The dependency of an attribute on a set of attributes is known as trivial
functional dependency if the set of attributes includes that attribute.
Symbolically: A -> B is a trivial functional dependency if B is a subset of A.
The following dependencies are also trivial: A -> A and B -> B.
For example: Consider a table with two columns Student_id and
Student_Name.
{Student_Id, Student_Name} -> Student_Id
is a trivial functional dependency as Student_Id is a subset of {Student_Id,
Student_Name}. That makes sense because if we know the values of
Student_Id and Student_Name then the value of Student_Id can be uniquely
determined. Also, Student_Id -> Student_Id & Student_Name ->
Student_Name are trivial dependencies too.
Non-trivial functional dependency
If a functional dependency X->Y holds true where Y is not a subset of X, then
this dependency is called a non-trivial functional dependency.
For example:
An employee table with three attributes: emp_id, emp_name, emp_address.
The following functional dependencies are non-trivial:
emp_id -> emp_name (emp_name is not a subset of emp_id)
emp_id -> emp_address (emp_address is not a subset of emp_id)
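The distinction between trivial and non-trivial dependencies reduces to a subset test on attribute sets. A minimal sketch, with is_trivial as an assumed helper name:

```python
def is_trivial(lhs, rhs):
    """A dependency lhs -> rhs is trivial when rhs is a subset of lhs."""
    return set(rhs) <= set(lhs)

print(is_trivial({"Student_Id", "Student_Name"}, {"Student_Id"}))  # True
print(is_trivial({"Student_Id"}, {"Student_Id"}))                  # True
print(is_trivial({"emp_id"}, {"emp_name"}))                        # False
print(is_trivial({"emp_id"}, {"emp_address"}))                     # False
```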
Multivalued dependency
A multivalued dependency occurs when there is more than
one independent multivalued attribute in a table. For example: Consider
a bike manufacturing company, which produces two colors (black and white)
in each model every year.
Here the columns manuf_year and color are independent of each other and
dependent on bike_model. In this case these two columns are said to be
multivalued dependent on bike_model. These dependencies can be
represented like this:
bike_model ->> manuf_year
bike_model ->> color
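One way to see what the multivalued dependency demands: for each bike_model, every manufacturing year must appear with every color. The sketch below checks exactly that for a three-attribute table. The function name mvd_holds and the sample rows (model names and years) are invented for illustration.

```python
from itertools import product

def mvd_holds(rows, x, y, z):
    """Check x ->> y in rows, where z is the one remaining attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[x], set()).add((row[y], row[z]))
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        # Independence means every y value is paired with every z value.
        if pairs != set(product(ys, zs)):
            return False
    return True

bikes = [
    {"bike_model": m, "manuf_year": y, "color": c}
    for m, y, c in [
        ("M1", 2007, "Black"), ("M1", 2007, "White"),
        ("M1", 2008, "Black"), ("M1", 2008, "White"),
        ("M2", 2009, "Black"), ("M2", 2009, "White"),
    ]
]

print(mvd_holds(bikes, "bike_model", "manuf_year", "color"))      # True
print(mvd_holds(bikes[1:], "bike_model", "manuf_year", "color"))  # False
```

Dropping a single row breaks the dependency, because one year and color combination for model M1 is then missing.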
Transitive dependency
A functional dependency is said to be transitive if it is indirectly formed by
two functional dependencies.
For example:
X -> Z is a transitive dependency if the following three functional
dependencies hold true:
X->Y
Y does not ->X
Y->Z
Note: A transitive dependency can only occur in a relation of three or more
attributes.
For instance, {Book} -> {Author_age} is a transitive dependency because:
{Book} ->{Author} (if we know the book, we know the author name)
{Author} does not ->{Book}
{Author} -> {Author_age}
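Transitive dependencies can be derived mechanically by computing an attribute closure: start from a set of attributes and keep adding whatever the known dependencies determine. Attribute closure is a standard technique that the text does not describe; it is used here only to illustrate the Book example, and the function name closure is an assumption.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the dependencies in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever lhs is already covered and rhs adds something.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [({"Book"}, {"Author"}), ({"Author"}, {"Author_age"})]

print(sorted(closure({"Book"}, fds)))    # ['Author', 'Author_age', 'Book']
print(sorted(closure({"Author"}, fds)))  # ['Author', 'Author_age']
```

Author_age appears in the closure of Book only by way of Author, and Book does not appear in the closure of Author, which matches the three conditions listed above.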
Normalization
If a database design is not perfect, it may contain anomalies, which are like
a bad dream for any database administrator. Managing a database with
anomalies is next to impossible.
Update anomalies − If data items are scattered and are not linked to each
other properly, it can lead to strange situations. For example, when we
try to update one data item whose copies are scattered over several places, a
few instances get updated properly while a few others are left with old values.
Such instances leave the database in an inconsistent state.
Deletion anomalies − We try to delete a record, but parts of it are left
undeleted because, unknown to us, the data is also saved somewhere else.
Insert anomalies − We try to insert data in a record that does not exist at
all.
Normalization is a method to remove all these anomalies and bring the
database to a consistent state.
First Normal Form
First Normal Form is defined in the definition of relations (tables) itself. This
rule defines that all the attributes in a relation must have atomic domains.
The values in an atomic domain are indivisible units.
We re-arrange the relation (table) as below, to convert it to First Normal
Form.
Each attribute must contain only a single value from its pre-defined domain.
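The conversion to First Normal Form can be sketched as splitting every multi-valued cell into one row per value. The course and content data below are made up for illustration and are not taken from the original example table.

```python
# Hypothetical un-normalized rows: the "content" cell holds several values.
courses = [
    {"course": "Programming", "content": "Java, C++"},
    {"course": "Web", "content": "HTML, PHP, ASP"},
]

# One row per (course, single content value): every cell is now atomic.
first_nf = [
    {"course": row["course"], "content": item.strip()}
    for row in courses
    for item in row["content"].split(",")
]

for row in first_nf:
    print(row["course"], row["content"])
```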
Second Normal Form
Before we learn about the second normal form, we need to understand the
following −
Prime (key) attribute − An attribute that is part of a candidate key is
known as a prime attribute.
Non-prime (non-key) attribute − An attribute that is not part of any
candidate key is said to be a non-prime attribute.
If we follow second normal form, then every non-prime attribute should be
fully functionally dependent on the prime key attributes. That is, if X → A
holds, there should be no proper subset Y of X for which Y → A also holds.
We see here in the Student_Project relation that the prime key attributes are
Stu_ID and Proj_ID. According to the rule, the non-key attributes, i.e.
Stu_Name and Proj_Name, must depend on both together and not on either
prime key attribute individually. But we find that Stu_Name can be
identified by Stu_ID and Proj_Name can be identified by Proj_ID
independently. This is called partial dependency, which is not allowed in
Second Normal Form.
We broke the relation in two as depicted in the above picture. So there
exists no partial dependency.
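Removing the partial dependencies amounts to projecting the relation onto smaller attribute sets. The sketch below uses invented rows, and the three-way split it produces is one common decomposition that may differ from the one in the original picture.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result

student_project = [
    {"Stu_ID": 1, "Proj_ID": 10, "Stu_Name": "Alex",  "Proj_Name": "Compiler"},
    {"Stu_ID": 1, "Proj_ID": 20, "Stu_Name": "Alex",  "Proj_Name": "Database"},
    {"Stu_ID": 2, "Proj_ID": 10, "Stu_Name": "Maria", "Proj_Name": "Compiler"},
]

student = project(student_project, ["Stu_ID", "Stu_Name"])      # Stu_ID -> Stu_Name
projects = project(student_project, ["Proj_ID", "Proj_Name"])   # Proj_ID -> Proj_Name
assignment = project(student_project, ["Stu_ID", "Proj_ID"])    # who works on what

print(len(student), len(projects), len(assignment))  # 2 2 3
```

Each name is now stored once per student or project instead of once per assignment.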
Third Normal Form
For a relation to be in Third Normal Form, it must be in Second Normal Form
and must satisfy the following −
No non-prime attribute is transitively dependent on a prime key attribute.
For any non-trivial functional dependency X → A, either −
X is a superkey, or
A is a prime attribute.
We find that in the above Student_detail relation, Stu_ID is the key and
the only prime key attribute. We find that City can be identified by Stu_ID as
well as by Zip itself. Zip is not a superkey, nor is City a prime attribute.
Additionally, Stu_ID → Zip → City, so there exists a transitive dependency.
If we know the zip code 20001, we can determine the city is Washington DC.
To bring this relation into third normal form, we break the relation into two
relations as follows −
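The split can be sketched in Python as follows. The attribute Stu_Name and all rows except the 20001 / Washington DC pair are assumptions for illustration. City moves to its own Zip to City table, and rejoining on Zip shows that no information is lost.

```python
student_detail = [
    {"Stu_ID": 1, "Stu_Name": "Alex",  "Zip": "20001", "City": "Washington DC"},
    {"Stu_ID": 2, "Stu_Name": "Maria", "Zip": "20001", "City": "Washington DC"},
    {"Stu_ID": 3, "Stu_Name": "Mira",  "Zip": "10001", "City": "New York"},
]

# Relation 1: everything that depends directly on Stu_ID.
student = [{k: row[k] for k in ("Stu_ID", "Stu_Name", "Zip")} for row in student_detail]

# Relation 2: Zip -> City, stored once per zip code.
zip_codes = {row["Zip"]: row["City"] for row in student_detail}

# Joining the two relations on Zip reproduces the original rows.
rejoined = [{**s, "City": zip_codes[s["Zip"]]} for s in student]

print(zip_codes)
print(rejoined == student_detail)  # True
```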
Relational Algebra
Relational database systems are expected to be equipped with a query
language that can assist its users to query the database instances. There
are two kinds of query languages − relational algebra and relational
calculus.
Relational Algebra
Relational algebra is a procedural query language, which takes instances of
relations as input and yields instances of relations as output. It uses
operators to perform queries. An operator can be either unary or binary.
They accept relations as their input and yield relations as their output.
Relational algebra is performed recursively on a relation and intermediate
results are also considered relations.
The fundamental operations of relational algebra are as follows −
Select
Project
Union
Set difference
Cartesian product
Rename
We will discuss all these operations in the following sections.
Select Operation (σ)
It selects tuples that satisfy the given predicate from a relation.
Notation − σp(r)
Where σ denotes the selection operator, p is the selection predicate, and r
stands for the relation. p is a propositional logic formula which may use
connectors like and, or, and not.
These terms may use relational operators like − =, ≠, ≥, <, >, ≤.
For example −
σsubject = "database"(Books)
Output − Selects tuples from books where subject is 'database'.
σsubject = "database" and price = "450"(Books)
Output − Selects tuples from books where subject is 'database' and 'price' is
450.
σsubject = "database" and price = "450" or year > "2010"(Books)
Output − Selects tuples from books where subject is 'database' and 'price' is
450 or those books published after 2010.
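The select operation can be mimicked in Python by filtering rows with a predicate. The helper name select and the book rows are assumptions for this sketch, and price and year are stored as numbers rather than quoted strings.

```python
def select(rows, predicate):
    """Return the rows of a relation that satisfy predicate."""
    return [row for row in rows if predicate(row)]

books = [
    {"title": "T1", "subject": "database", "price": 450, "year": 2008},
    {"title": "T2", "subject": "database", "price": 300, "year": 2012},
    {"title": "T3", "subject": "networks", "price": 450, "year": 2015},
    {"title": "T4", "subject": "networks", "price": 200, "year": 2005},
]

by_subject = select(books, lambda b: b["subject"] == "database")
by_both = select(books, lambda b: b["subject"] == "database" and b["price"] == 450)
# "and" binds tighter than "or", so this reads (subject and price) or year.
by_either = select(
    books,
    lambda b: b["subject"] == "database" and b["price"] == 450 or b["year"] > 2010,
)

print([b["title"] for b in by_subject])  # ['T1', 'T2']
print([b["title"] for b in by_both])     # ['T1']
print([b["title"] for b in by_either])   # ['T1', 'T2', 'T3']
```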
Project Operation (∏)
It projects the listed column(s) of a relation and discards the rest.
Notation − ∏A1, A2, ..., An (r)
Where A1, A2, ..., An are attribute names of relation r.
Duplicate rows are automatically eliminated, as a relation is a set.
For example −
∏subject, author (Books)
Selects and projects columns named as subject and author from the
relation Books.
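A projection keeps only the named attributes and, because the result is a set, removes duplicates. In this sketch the helper name project, the book rows, and the second author name are assumptions for illustration.

```python
def project(rows, attrs):
    """Keep only attrs; the set result drops duplicate tuples."""
    return {tuple(row[a] for a in attrs) for row in rows}

books = [
    {"title": "T1", "subject": "database", "author": "Elmasri"},
    {"title": "T2", "subject": "database", "author": "Elmasri"},
    {"title": "T3", "subject": "networks", "author": "Korth"},
]

result = project(books, ["subject", "author"])
print(sorted(result))  # [('database', 'Elmasri'), ('networks', 'Korth')]
```

Three input rows yield two output tuples because the first two agree on subject and author.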
Union Operation (∪)
It performs binary union between two given relations and is defined as −
r ∪ s = { t | t ∈ r or t ∈ s}
Notation − r ∪ s
Where r and s are either database relations or relation result sets
(temporary relations).
For a union operation to be valid, the following conditions must hold −
r and s must have the same number of attributes.
Attribute domains must be compatible.
Duplicate tuples are automatically eliminated.
∏ author (Books) ∪ ∏ author (Articles)
Output − Projects the names of the authors who have either written a book
or an article or both.
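With relations held as Python sets of tuples, union maps directly onto the set union operator. The function name union and the author names other than Elmasri are assumptions, and the arity check is a simplified stand-in for the compatibility conditions.

```python
def union(r, s):
    """Union of two relations given as sets of equal-length tuples."""
    if len({len(t) for t in r | s}) > 1:
        raise ValueError("relations must have the same number of attributes")
    return r | s

book_authors = {("Elmasri",), ("Korth",)}
article_authors = {("Korth",), ("Codd",)}

print(sorted(union(book_authors, article_authors)))
# [('Codd',), ('Elmasri',), ('Korth',)]
```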
Set Difference (−)
The result of the set difference operation is the set of tuples that are
present in one relation but not in the second relation.
Notation − r − s
Finds all the tuples that are present in r but not in s.
∏ author (Books) − ∏ author (Articles)
Output − Provides the names of authors who have written books but not articles.
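Set difference likewise corresponds to Python's set subtraction. The author names other than Elmasri are assumed sample data.

```python
book_authors = {("Elmasri",), ("Korth",)}
article_authors = {("Korth",), ("Codd",)}

# Tuples present in book_authors but not in article_authors.
only_books = book_authors - article_authors

print(only_books)  # {('Elmasri',)}
```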
Cartesian Product (×)
Combines information of two different relations into one.
Notation − r × s
Where r and s are relations and their output will be defined as −
r × s = { q t | q ∈ r and t ∈ s}
σauthor = 'Elmasri'(Books × Articles)
Output − Yields a relation, which shows all the books and articles written
by Elmasri.
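A Cartesian product pairs every tuple of one relation with every tuple of the other. In the sketch below the helper name cartesian, the titles, and the authors other than Elmasri are assumptions. Attribute names are prefixed with the relation name so that the two author columns stay distinct, and the selection is applied to both of them.

```python
from itertools import product

def cartesian(r, s, r_name, s_name):
    """Pair every row of r with every row of s, prefixing attribute names."""
    return [
        {
            **{f"{r_name}.{k}": v for k, v in q.items()},
            **{f"{s_name}.{k}": v for k, v in t.items()},
        }
        for q, t in product(r, s)
    ]

books = [
    {"title": "Book A", "author": "Elmasri"},
    {"title": "Book B", "author": "Korth"},
]
articles = [
    {"title": "Article X", "author": "Elmasri"},
    {"title": "Article Y", "author": "Codd"},
]

pairs = cartesian(books, articles, "Books", "Articles")
by_elmasri = [
    p for p in pairs
    if p["Books.author"] == "Elmasri" and p["Articles.author"] == "Elmasri"
]

print(len(pairs))       # 4
print(len(by_elmasri))  # 1
```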
Rename Operation (ρ)
The results of relational algebra are also relations but without any name.
The rename operation allows us to name the output relation. The rename
operation is denoted by the lowercase Greek letter rho (ρ).
Notation − ρ x (E)
Where the result of expression E is saved under the name x.
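One way to picture rename is as an operation that changes only the name attached to a relation and leaves its tuples alone. The Relation class, the rename helper, and the sample data are all assumptions made for this sketch.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Relation:
    name: str
    attrs: tuple
    rows: frozenset

def rename(new_name, relation):
    """Return the same relation under a new name; the tuples are unchanged."""
    return replace(relation, name=new_name)

books = Relation("Books", ("title", "subject"), frozenset({("Book A", "database")}))
db_books = rename("DbBooks", books)

print(db_books.name)                # DbBooks
print(db_books.rows == books.rows)  # True
```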
We understand the benefits of taking a Cartesian product of two relations,
which gives us all the possible tuples that are paired together. But in
certain cases it might not be feasible to take a Cartesian product, for
example when we encounter huge relations with thousands of tuples and a
considerably large number of attributes.
DBMS - Joins
Join is a combination of a Cartesian product followed by a selection
process. A Join operation pairs two tuples from different relations, if and
only if a given join condition is satisfied.
We will briefly describe various join types in the following sections.
Theta (θ) Join
Theta join combines tuples from different relations provided they satisfy the
theta condition. The join condition is denoted by the symbol θ.
Notation
R1 ⋈θ R2
R1 and R2 are relations having attributes (A1, A2, ..., An) and (B1, B2, ...,
Bm) such that the attributes have nothing in common, that is, R1 ∩
R2 = Φ.
Theta join can use all kinds of comparison operators.
Student
SID  Name   Std
101  Alex   10
102  Maria  11
Subjects
Class  Subject
10     Math
10     English
11     Music
11     Sports
Student_Detail −
Student ⋈ Student.Std = Subjects.Class Subjects
Student_detail
SID  Name   Std  Class  Subject
101  Alex   10   10     Math
101  Alex   10   10     English
102  Maria  11   11     Music
102  Maria  11   11     Sports
Equijoin
When a theta join uses only the equality comparison operator, it is said to
be an equijoin. The above example corresponds to an equijoin.
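A theta join can be sketched as a Cartesian product filtered by the join condition. The helper name theta_join is an assumption; the data is the Student and Subjects example above, joined on Std = Class.

```python
def theta_join(r, s, theta):
    """Pair rows of r and s, keeping only the pairs that satisfy theta."""
    return [{**q, **t} for q in r for t in s if theta(q, t)]

student = [
    {"SID": 101, "Name": "Alex", "Std": 10},
    {"SID": 102, "Name": "Maria", "Std": 11},
]
subjects = [
    {"Class": 10, "Subject": "Math"},
    {"Class": 10, "Subject": "English"},
    {"Class": 11, "Subject": "Music"},
    {"Class": 11, "Subject": "Sports"},
]

# Equality is the only comparison used, so this theta join is an equijoin.
student_detail = theta_join(student, subjects, lambda q, t: q["Std"] == t["Class"])

for row in student_detail:
    print(row["SID"], row["Name"], row["Std"], row["Class"], row["Subject"])
```

Passing a different comparison, such as q["Std"] < t["Class"], would give a theta join that is not an equijoin.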
Natural Join ( ⋈ )
Natural join does not use any comparison operator. It does not concatenate
the way a Cartesian product does. We can perform a Natural Join only if
there is at least one common attribute that exists between two relations. In
addition, the attributes must have the same name and domain.
Natural join acts on those matching attributes where the values of the
attributes in both relations are the same.
Courses
CID   Course       Dept
CS01  Database     CS
ME01  Mechanics    ME
EE01  Electronics  EE
HoD
Dept  Head
CS    Alex
ME    Maya
EE    Mira
Courses ⋈ HoD
Dept  CID   Course       Head
CS    CS01  Database     Alex
ME    ME01  Mechanics    Maya
EE    EE01  Electronics  Mira
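A natural join can be sketched by matching rows on whatever attribute names the two relations share. The helper name natural_join is an assumption, and the sketch assumes both relations are non-empty so that the attribute names can be read from the first row.

```python
def natural_join(r, s):
    """Join rows of r and s that agree on all attributes the relations share."""
    common = set(r[0]) & set(s[0])
    return [
        {**q, **t}
        for q in r
        for t in s
        if all(q[a] == t[a] for a in common)
    ]

courses = [
    {"CID": "CS01", "Course": "Database", "Dept": "CS"},
    {"CID": "ME01", "Course": "Mechanics", "Dept": "ME"},
    {"CID": "EE01", "Course": "Electronics", "Dept": "EE"},
]
hod = [
    {"Dept": "CS", "Head": "Alex"},
    {"Dept": "ME", "Head": "Maya"},
    {"Dept": "EE", "Head": "Mira"},
]

for row in natural_join(courses, hod):
    print(row["Dept"], row["CID"], row["Course"], row["Head"])
```

The shared attribute Dept appears only once in each result row, unlike in a Cartesian product.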
Outer Joins
Theta Join, Equijoin, and Natural Join are called inner joins. An inner join
includes only those tuples with matching attributes and the rest are
discarded in the resulting relation. Therefore, we need to use outer joins to
include all the tuples from the participating relations in the resulting
relation. There are three kinds of outer joins − left outer join, right outer
join, and full outer join.
Left Outer Join (R ⟕ S)
All the tuples from the Left relation, R, are included in the resulting relation.
If there are tuples in R without any matching tuple in the Right relation S,
then the S-attributes of the resulting relation are made NULL.
Left
A    B
100  Database
101  Mechanics
102  Electronics
Right
A    B
100  Alex
102  Maya
104  Mira
Courses ⟕ HoD
A    B            C    D
100  Database     100  Alex
101  Mechanics    ---  ---
102  Electronics  102  Maya
Right Outer Join (R ⟖ S)
All the tuples from the Right relation, S, are included in the resulting
relation. If there are tuples in S without any matching tuple in R, then the
R-attributes of resulting relation are made NULL.
Courses ⟖ HoD
A    B            C    D
100  Database     100  Alex
102  Electronics  102  Maya
---  ---          104  Mira
Full Outer Join (R ⟗ S)
All the tuples from both participating relations are included in the resulting
relation. Tuples that have no matching tuple in the other relation get NULL
for that relation's attributes.
Courses ⟗ HoD
A    B            C    D
100  Database     100  Alex
101  Mechanics    ---  ---
102  Electronics  102  Maya
---  ---          104  Mira
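The three outer joins can be sketched on top of one helper. The function names are assumptions, Python's None stands in for NULL, and the attributes of the Right table are named C and D here to match the column headings of the result tables above.

```python
def left_outer_join(r, s, match, s_attrs):
    """Keep every row of r; pad with None where no row of s matches."""
    result = []
    for q in r:
        partners = [t for t in s if match(q, t)]
        if partners:
            result.extend({**q, **t} for t in partners)
        else:
            result.append({**q, **{a: None for a in s_attrs}})
    return result

def full_outer_join(r, s, match, r_attrs, s_attrs):
    """Left outer join plus the rows of s that matched nothing in r."""
    result = left_outer_join(r, s, match, s_attrs)
    for t in s:
        if not any(match(q, t) for q in r):
            result.append({**{a: None for a in r_attrs}, **t})
    return result

left = [
    {"A": 100, "B": "Database"},
    {"A": 101, "B": "Mechanics"},
    {"A": 102, "B": "Electronics"},
]
right = [
    {"C": 100, "D": "Alex"},
    {"C": 102, "D": "Maya"},
    {"C": 104, "D": "Mira"},
]

def same_id(q, t):
    return q["A"] == t["C"]

left_result = left_outer_join(left, right, same_id, ["C", "D"])
# A right outer join is a left outer join with the roles of the relations swapped.
right_result = left_outer_join(right, left, lambda t, q: same_id(q, t), ["A", "B"])
full_result = full_outer_join(left, right, same_id, ["A", "B"], ["C", "D"])

print(len(left_result), len(right_result), len(full_result))  # 3 3 4
```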