ICT 252: Theory of Databases Lecture Notes

This document provides an overview of database concepts and the relational model. It discusses key topics such as the components of a database application including data independence, database users, and application architecture. Database models like hierarchical, network, and relational models are covered. The relational model section explains concepts like tables, relations, domains, attributes, schemas, relationships, keys, constraints, and relational algebra operations. The entity-relationship modeling approach is also summarized which involves identifying entity types and attributes, relationships between entities, and converting the model into a relational database design.

Uploaded by

yararayan

FBE Computer Science Department Lecture Notes Theory of Databases

ICT 252: Theory of Databases


Lecture Notes
Contents
1 Introduction to Databases
1.1 Course Overview
1.2 Introduction
1.2.1 What is a database?
1.2.2 What is a DBMS?
1.2.3 Why use databases
2 Database Models
2.1 File-Processing Systems
2.2 Network Navigation Systems
2.3 Relational Model
2.4 Object-oriented Model
2.5 Deductive Model
2.6 Summary
3 Components of a Database Application
3.1 Data Independence
3.2 Different Types of User
3.3 Three-Tier Application Architecture
3.4 Features of a DBMS
3.5 Types of DBMS
4 The Relational Model
4.1 Tables & Relations
4.2 Domains & Attributes
4.3
4.3.1 More about attributes
4.4 Schemas
4.5 Relationships
4.6 Keys
4.6.1 Superkeys
4.6.2 Candidate Keys
4.6.3 Primary key
4.7 Constraints
4.7.1 Functional Dependencies
4.7.2 Entity Integrity
4.7.3 Referential Integrity
4.7.4 Triggers
5 Relational Algebra
5.1 Basic Operations
5.1.1 Select
5.1.2 Project
5.1.3 Union
5.1.4 Set Difference
5.1.5 Cartesian Product
5.1.6 Rename
5.2 Additional Operations
5.2.1 Set Intersection
5.2.2 Natural Join
5.2.3 Division
5.2.4 Assignment
5.3 Extended Relational-Algebra Operations
5.3.1 Generalised Projection
5.3.2 Aggregate Functions
5.3.3 Outer Join
5.4 Database Modifications
5.4.1 Deletion
5.4.2 Insertion
5.4.3 Update
6 Entity-Relationship Modelling
6.1 Introduction
6.2 Overview - Entities, Attributes, Relationships
6.3 Identify Entity Types
6.4 Identify Attributes of the Entities
6.5 Select Identifiers for Entity Types
6.6 Identify Relationships Between Entities
6.6.1 Relationship Degree
6.6.2 Relationship Cardinality
6.7 Weak Entities
6.8 Associative Entities
6.9 Generalisation Hierarchies
6.10 Approach to E-R Diagrams
6.11 Convert an E-R Diagram to a Relational Database Design
6.12 Normalisation of the Data Model
6.13 Physical Database Design
6.14 UML
7 Physical Database Design
7.1 Overview - Physical DB Design
7.2 Choosing Data Types
7.2.1 Calculated Fields
7.2.2 Coding/Compression
7.3 Controlling Data Integrity
7.3.1 Default values
7.3.2 Range Control
7.3.3 Referential Integrity
7.3.4 Null Value Control
8 Indexing
8.1 Physical vs Logical views of data
8.2 File Structures
8.2.1 Sequential files
8.2.2 Hash files
8.3 Index-Sequential File Organization
8.3.1 What is an index?
8.3.2 Primary/Clustering Index
8.3.3 Secondary/Non-clustering Index
8.3.4 Dense & Sparse Indices
8.4 Modifying Index-Sequential files (insert, update, delete)
8.4.1 Delete from the data file
8.4.2 Insert to the data file
8.4.3 Update the data file
8.4.4 Algorithm for inserting to the index
8.4.5 Algorithm for deleting from the index
8.5 Multilevel Indices
8.6 Summary - Index-Sequential File Organization
8.7 Binary Search Trees - Recap
8.7.1 Binary Search Tree as an Index
8.8 M-way search trees
8.9 B Trees
8.10 Indexed Sequential Files - B+ Tree
8.11 Inserting & Deleting to/from a B+ Tree
8.12 Summary of index file structures
8.13 Indices on Multiple Keys
8.14 Enforcing Uniqueness with an Index
8.15 Why use indexes & choosing fields to index
8.15.1 Choosing indices
8.15.2 Query Optimization - how indexes are used
8.15.3 Use of Statistics
8.16 Creating Indexes with SQL
8.17 Quiz
9 SQL
10 Introduction to Transactions & Concurrency
10.1 Introduction
10.2 ACID Properties
10.3 DBMS Services for Transactions
10.3.1 Atomicity
10.3.2 Consistency
10.3.3 Isolation
10.3.4 Durability
10.4 Concurrency Control
10.5 Locks
10.6 Implicit Transactions
10.7 Demo in Lab
11 Introduction to Security
1 Introduction to Databases
Objective: To introduce students to the course, and to learn what a database is and what a DBMS is.
Reading Material:
Preparation: have copies of the course outline ready to give to students (or have already given them out).
1.1 Course Overview
Review the course outline.
Discuss course texts: emphasise which books are available in the library and which ones they can get from the bookstore.
Emphasise the importance of reading books: I will not be providing detailed handouts, as the texts are available.
You should also make sure to attend lectures and take notes, as I will explain concepts to you and then you can read more about them in the textbooks.
As you do your reading, you should try to make notes too. Being a university student is not just about copying things from the board; you must think and analyse for yourself!
1.2 Introduction
1.2.1 What is a database?
Ask class, elicit ideas, write up on board.
Can they give examples of a database?
Think about a list of names & phone numbers: for example, the phone book, your mobile phone, even a list written on a piece of paper.
Think about the library (want them to recognize that the card catalogue is a database, arranged by author and by title).
Data in a database is related in some way; a collection of random data is not really a database in the true sense of the word.
When finished, summarise and introduce this definition of a database:
A database is a collection of related data.
This is quite a loose definition of database. For example, all the words on a printed page of text could be seen as a database. For our purposes, the definition of a database is more restricted.
For this course, we are looking at databases that are used mostly in organizations. In this sense, a database has the following properties:
1. A database represents some aspect of the real world; this is sometimes called the miniworld.
2. A database is a collection of data that is logically coherent. In other words, it is not just random data put together; there is some connection between the different data in the database.
3. A database is designed, built and populated with data for a specific purpose. It has an intended group of users and some applications (uses) in which those users are interested.
To summarise: a database has some source that provides its data, it interacts with events in the real world and it has an interested audience of users.
The size and complexity of a database varies a lot. Your list of names and phone numbers may be fewer than 100 records of data, each record having a simple structure. The registrar's office of the university would have a more complex database that reflects the links between students, courses, teachers and so on, with 1000s of records in it.
1.2.2 What is a DBMS?
DBMS: Database Management System.
Ask the class: what do you think a DBMS is?
A DBMS is a set of programs that enables users to create a database and access the data in the database.
A DBMS allows the user to define the structure of a database, put the data into the database and manipulate the data in the database.
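As a rough illustration of those three activities (define the structure, put data in, manipulate the data), here is a minimal sketch using Python's built-in sqlite3 module. SQLite is not one of the DBMS packages used in this course; it is an assumption chosen only because it runs with no setup. The table name, column names and phone numbers are also made up for the example.

```python
import sqlite3

# An in-memory SQLite database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# 1. Define the structure of the database (hypothetical table and columns).
conn.execute("CREATE TABLE contact (name TEXT, phone TEXT)")

# 2. Put data into the database.
conn.executemany(
    "INSERT INTO contact (name, phone) VALUES (?, ?)",
    [("Tesfay", "04 407600"), ("Solomon", "04 123456")],
)

# 3. Manipulate the data: change one phone number, then read everything back.
conn.execute("UPDATE contact SET phone = ? WHERE name = ?", ("04 111111", "Tesfay"))
rows = conn.execute("SELECT name, phone FROM contact ORDER BY name").fetchall()
print(rows)

conn.close()
```

Notice that the program never says how the rows are laid out in storage; that is the part the DBMS takes care of.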
There are various DBMS packages available to buy or as freeware. They also provide other functions, such as ways to control who can access the data.
You could also write your own DBMS application (e.g. using C++, VB or Java) because data is stored in structures that you can manipulate programmatically.
But why reinvent the wheel every time?
So, most database applications are created using an existing DBMS, so you don't have to worry too much about underlying data structures and how to work with them.
We can call the database and the DBMS together a database system.
We will start using the Microsoft Access database in the labs and we will also look at Microsoft SQL Server later.
Most DBMSs now have a client-server architecture so data can easily be shared on a network. The database resides on the server and the clients access the database using special software applications.
We will look at the functions of a DBMS and some types of database system in more detail a little later in the course.
1.2.3 Why use databases
You tell me: now that you know what a database is, why do you think we should use databases?
Some reasons:
- To organize information
- To be able to get reports from data
- To protect data: the security features of a database allow you to specify who has access, what data they can see and what they can do with the data
- To be able to share data
2 Database Models
Objective: to briefly describe the evolution of database systems and to understand some of the main data models used in the past and in use now.
Reading: Post Chapter 1; Mannino Chapter 1.
Preparation: maybe photocopy Chapter 4 of Everest (Logical Data Structures) for extra reading; it may aid students' understanding of the different models.
There are different ways of storing the data in a database. The way you are probably most familiar with is storing data as rows in a table, e.g. a list of names, addresses and phone numbers in Excel.
[To illustrate, draw a simple table on the board with column titles and a couple of rows of data.]
This is the basis of one database model: the relational model.
There are other models for databases. The data model is a way of describing the data structures used in the database.
Over time, database technology has evolved (show on board like this with a simple timeline):
1960s File-based models
1970s Network navigation models (network & hierarchical models)
1980s Relational model
1990s Object models
Now, many database systems use the relational model or object-oriented models.
To give you an idea of how database systems have evolved, we will look at some of the earlier models in brief.
After this, the course will focus on the relational model. Most database systems now use the relational model, although OO databases are becoming more widely used.
2.1 File-Processing Systems
The earliest database systems during the 1960s were really just file-processing systems. Data was stored in flat text files in the operating system.
Data was also grouped into records, e.g. a record of Customer data in a bank might look like this:
Number, Name, FathersName, PhoneNumber, Address, SavingsAccountNumber, SavingsAccountBalance, CurrentAccountNumber, CurrentAccountBalance
123, Tesfay, Kinfe, 04 407600, Kebele 7 Mekelle, 1234, 1000, 9876, 546.50
This data would be stored in a flat text file on the computer. A file contains data about one entity, e.g. Customer is an entity. Each record in the file represents an instance of that entity.
Each row is a record; the file represents a record type.
A record is a fundamental unit in any database system. In fact, we will see later that the data models used now still store data in records, but in different data structures.
Programmers had to write specific programs to carry out tasks, e.g. to retrieve all customer records or to find the customer record for a given customer number.
Different programs had to be written for each task, so a lot of work was involved.
It could also lead to data being duplicated in different files, e.g. think of a bank's database. A customer's name and phone number could appear in a file containing all savings accounts. If the same customer also has a current account, their name and phone number would also appear in the file containing all current accounts.
If the customer changes their phone number, it has to be changed in both files.
OR: if a customer has 2 savings accounts, we need 2 rows in the file, and the name and phone number are repeated.
This system worked, but didn't allow data to be related, e.g. to have the customer name and phone number in one place, and relate a savings account to it and a current account to it.
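To make the file-processing approach concrete, the sketch below treats a few lines of text as a flat customer file and writes one small function per task. It is only an illustration under assumptions: the field list is a subset of the layout above, customer 124 and its rows are made up, and the function names are hypothetical. Customer 124 has two savings accounts, so the name and phone number are stored twice.

```python
import csv
import io

# A subset of the fields from the record layout above.
FIELDS = ["Number", "Name", "FathersName", "PhoneNumber",
          "SavingsAccountNumber", "SavingsAccountBalance"]

# A stand-in for a flat text file. The rows for customer 124 are made up;
# the name and phone number are repeated, once per savings account.
FLAT_FILE = """123,Tesfay,Kinfe,04 407600,1234,1000
124,Solomon,Tesfay,04 123456,2001,5400
124,Solomon,Tesfay,04 123456,2002,300
"""

def read_records(text):
    """Parse every line of the flat file into a dict keyed by field name."""
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(FIELDS, row)) for row in reader]

def find_by_number(records, number):
    """One task-specific program: scan the whole file for a customer number."""
    return [r for r in records if r["Number"] == number]

def change_phone(records, number, new_phone):
    """Another task: every duplicated row must be updated, or the data disagrees."""
    for r in records:
        if r["Number"] == number:
            r["PhoneNumber"] = new_phone

records = read_records(FLAT_FILE)
matches = find_by_number(records, "124")
print(len(matches))  # number of rows stored for this one customer

change_phone(records, "124", "04 999999")
phones = {r["PhoneNumber"] for r in find_by_number(records, "124")}
print(phones)
```

Each new question about the data (for example, all accounts above a certain balance) would need yet another function like these, which is the extra work described above.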
2.2 Network Navigation Systems
The next generation of systems during the 1970s came about as developers recognized that data in a system was related to other data in the same system. This could be modelled in hierarchies or networks.
For example, in our bank system, if we used a hierarchical model, the customer record type would be at the top of the hierarchy. Like this:
[Figure: hierarchy of record types. Customer "has" Account; Account "contains" Transaction.]
The legs on the links between the record types indicate a 1-to-many relationship (try to elicit this from students; they may already have seen relationships in the lab). This is the hierarchical structure. A Customer has Accounts. An Account contains Transactions.
[To explain further, compare to a filing cabinet: there is a section for each customer; inside the section for a Customer, there is a folder for each Account they have. Inside the Account folder, details of each transaction made in the account are stored.]
The data is organized into a tree structure. The tree consists of data records (as before) and links between them.
Customer is a record type: each of the bank's customers has a record of type Customer.
A record is a collection of attributes or fields, e.g. for Customer, the fields might be Name, Phone, Address. For Account, the fields might be type (savings or current), account number and balance.
The tree represents parent-child relationships: one Customer can have many Account records associated with it, e.g. a savings Account and a current Account. An Account has Transactions associated with it (e.g. deposit money, withdraw money).
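To help understanding, draw a tree showing a particular customer and his/her accounts like this:
[Figure: tree for one customer. Root: Solomon Tesfay, 04 123456, Kebele 7. Children: Savings 123456 (balance 5400) and Current 278654 (balance 2000). Leaves: Transaction 1 to Transaction 5 under the accounts.]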
The data is still stored in a flat file, like previously, but now the flat file has 3 different record types in it and the record types are related:
Customer: 123, Solomon, Tesfay, 04 407600, Kebele 7,
Account: Savings, 123456, 5400
Transaction: ...
Transaction: ...
Account: Current, 278654, 2000
Customer: ...
There are still problems with this model: a child node in the tree cannot have more than one parent.
So if the same account is associated with 2 customers (e.g. a husband and wife have a joint account and also separate accounts), we cannot link the one account record to 2 different customer records.
But we can have the account record appearing in two tree branches. This can lead to duplicated data, and to inconsistent data if the account is not updated in all the branches.
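The joint-account problem can be sketched in a few lines. The classes, the account number 555 and the amounts below are all hypothetical; the sketch only assumes what is stated above, namely that each child record sits under exactly one parent, so a shared account has to be copied into both branches.

```python
from dataclasses import dataclass, field

@dataclass
class Account:
    kind: str
    number: str
    balance: int
    transactions: list = field(default_factory=list)

@dataclass
class Customer:
    name: str
    accounts: list = field(default_factory=list)  # child records in the tree

# A child may have only one parent, so the joint account is stored twice:
# one copy under each customer.
husband = Customer("Husband", [Account("Savings", "555", 1000)])
wife = Customer("Wife", [Account("Savings", "555", 1000)])

# A deposit that is applied to only one branch...
husband.accounts[0].balance += 200

# ...leaves the two copies of account 555 disagreeing.
copies = [c.accounts[0].balance for c in (husband, wife)]
print(copies)
```

The two balances now differ even though they describe the same account, which is the inconsistency the hierarchical model cannot prevent on its own.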
A programmer still has to write the program to access the account information, but now the program navigates through the hierarchy, e.g. to find the balance of the savings account for a given customer number.
Accessing data in this type of structure is fast, but only if you are accessing data from the top, e.g. by customer name or number. If you want to find all savings accounts with a balance greater than 1000, you have to write a program that accesses each customer and then the savings account data for each customer.
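The difference between the two kinds of access can be sketched as follows. Nested dictionaries stand in for the tree here, which is an assumption made for brevity, and customer 124 is a made-up record. The first function enters at the top with a customer number and walks one branch; the second has no way in except the top, so it visits every customer.

```python
# Hierarchical data as nested dictionaries: Customer -> Accounts.
customers = {
    "123": {"name": "Solomon Tesfay",
            "accounts": [
                {"type": "Savings", "number": "123456", "balance": 5400},
                {"type": "Current", "number": "278654", "balance": 2000},
            ]},
    "124": {"name": "Hypothetical Customer",  # made-up record
            "accounts": [
                {"type": "Savings", "number": "999001", "balance": 800},
            ]},
}

def savings_balance(customer_number):
    """Access from the top: pick one customer, then walk down one branch."""
    for account in customers[customer_number]["accounts"]:
        if account["type"] == "Savings":
            return account["balance"]
    return None

def savings_over(limit):
    """Access by account data: every customer has to be visited in turn."""
    found = []
    for customer in customers.values():
        for account in customer["accounts"]:
            if account["type"] == "Savings" and account["balance"] > limit:
                found.append(account["number"])
    return found

print(savings_balance("123"))
print(savings_over(1000))
```

With only two customers the cost is invisible, but the second function does work proportional to the whole database, however few accounts match.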
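The network model (nothing to do with LAN) also uses records and links but in a different way.
Instead of putting data in a hierarchy, it structures data in a network like this:
[Figure: network diagram. The Customer, Account and Transaction record types are linked together, each with an entry point.]
All the points in the network are linked together, in a chain.
So the network for the same customer again would be like this:
[Figure: network for one customer. Solomon Tesfay (04 123456, Kebele 7) is linked to Savings 123456 (balance 5400) and Current 278654 (balance 2000), which are linked to transactions Trans1 and Trans2.]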
It is a network of links between the different records. Again, Customer, Account and Transaction are record types.
Relationships are modelled using sets. In a set, there is one owner record type (Customer) and 1 or more member record types (Account).
This is a set type; let us name it CustomerAccount. It models the relationship between Customer & Account: that a Customer can have 1 or more accounts.
There will be many occurrences of the CustomerAccount set in the database, one for each Customer. A set occurrence relates one record from the owner record type (Customer) to the set of records from the member record type related to it (Account).
For each CustomerAccount set, there is a network of links from the Customer record to the Account records.
The data is still stored in files, but now we also have to define the set types.
Question to class: Can you identify another set type in this model? A: Account-Transaction; an account has 0 or more transactions.
The entry points are recor$s that can be searche$. For e;amp*e= to fin$ a** savin-s
accounts ith a ba*ance over 1222= a pro-ram can enter the netor/ at the 5ccount
entry point an$ then search throu-h a** the 5ccount recor$s. 5n entry point is
imp*emente$ as an in$e; on a *ist of recor$s so the in$e; must be create$ before
such Aueries can be run a-ainst the $atabase.
[Diagram: network for the same customer. Record types: Customer, Account, Transaction,
with an entry point. Records shown: Solomon Tesfay, 04 123456, Kebele 7; Savings 123456
5400; Current 278654 2000; Trans1; Trans2.]
[What is an index? It is a list of all possible values in a particular field e.g. all the
balance values. Each value in the list has pointers to all the records that have that
value in the field. We will learn more about indexing later in the course.]
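As a rough sketch of the index idea, the Python below maps each balance value to the positions of the records that hold it. The third record and all the names are made up for illustration, and a list position stands in for a pointer; a real DBMS index is a disk-based structure.

```python
from collections import defaultdict

accounts = [
    {"number": "123456", "type": "Savings", "balance": 5400},
    {"number": "278654", "type": "Current", "balance": 2000},
    {"number": "300001", "type": "Savings", "balance": 2000},  # made-up record
]

# Build the index: each balance value maps to the positions ("pointers")
# of all the records that have that value in the balance field.
balance_index = defaultdict(list)
for position, record in enumerate(accounts):
    balance_index[record["balance"]].append(position)

# Use the index: look at the list of values, then follow the pointers
# only for the values that qualify.
matches = [accounts[p]["number"]
           for value in sorted(balance_index) if value > 1000
           for p in balance_index[value]]
```

The index has to be built before it can be used, which matches the point that an entry point must exist before such queries can run.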
The arrows can be followed by the DBMS programs to find matching data e.g. to find
the savings account that a transaction belongs to; to find the customer who is the
owner of the savings account.
The arrows represent efficient connections between data elements.
Question: in terms of data structures, can you think of how this could be
implemented (a way of linking records in a list or set)? A: pointers (if they have done
the Data Structures course, they should be able to figure this out).
The arrows are actually implemented as pointers embedded in the files. Each record
has a pointer to the next and previous record in the network.
Remember: a pointer points to a storage location.
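The next and previous pointers can be pictured with a small linked structure. This is a sketch under assumptions: the Record class and the link function are invented for the example, and Python object references stand in for the storage addresses a network DBMS would embed in its files.

```python
class Record:
    """One record in a set occurrence, with links to its neighbours."""
    def __init__(self, data):
        self.data = data
        self.next = None   # reference to the next record in the chain
        self.prev = None   # reference to the previous record in the chain

def link(records):
    """Chain the records together in both directions and return the first one."""
    for earlier, later in zip(records, records[1:]):
        earlier.next = later
        later.prev = earlier
    return records[0] if records else None

owner = Record("Customer: Solomon Tesfay")
members = [Record("Account: Savings 123456"), Record("Account: Current 278654")]
head = link([owner] + members)

# Follow the "arrows" forward from the owner record to its members.
visited = []
node = head
while node is not None:
    visited.append(node.data)
    node = node.next
```

Following prev from an account record leads back to the owner, which is how a program could find the customer for a given account.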
Both file and network type systems used procedural programming to access data:
languages where the code had step by step instructions of what to do with the data.
They store data in files, consisting of records of data; think of a text file that you
type characters into.
The development of the network model was helped by a committee of experts who
defined a DDL (Data Definition Language) and a DML (Data Manipulation
Language) for the model. The DDL was designed to be independent of the language
being used to manipulate the data. The DML was included in different languages e.g.
with FORTRAN.
The notions of a DDL and a DML are still in use now, in SQL, as we will see later.
During the 1980s, another type of model evolved that used nonprocedural languages
to access data.
2.3 Relational Model
This type of model is the relational model, which we are going to focus on in this
course.
Relational approach: originated by E.F. Codd in the 1970s, became widely used in
the 1980s and 1990s.
Based on mathematics: relational algebra.
Software programmers developed code that was efficient at doing it…hardware
improved in a way that made it all possible, so the relational model became the most
widely used.
In a relational database, data is stored in tables (also called relations). Each table is
physically separate from other tables in the system, unlike network or hierarchy models,
where there must be physical file links between data sets.
A table stores data about one specific entity in the mini-world represented by the
database e.g. about Customers. A row in a table represents an instance of the entity i.e.
a row in the Customers table represents one Customer record. A row in the Accounts
table represents one Account record.
The columns in a table represent attributes of the entity. Therefore the Customers
table would have columns for name, phone number, address.
There are no physical links between tables; instead, the links between the data are
modelled by storing matching data in each table. For example, we can store the
unique Customer ID number with each Account record, so CustomerID becomes
an attribute or column in the Accounts table.
Customers
CustomerID | Name | PhoneNumber | Address
1 | Solomon Tesfay | 04 123456 | Kebele 7
2 | …
Accounts
Type | Number | Balance | CustomerID
Savings | 123456 | 5400 | 1
Current | 278654 | 2000 | 1
Question: why do you think we would use a CustomerID instead of Customer Name
in the Accounts table?
A: try to get students to think about this: what if two different customers have the
same name? The answer is that we can ensure that CustomerID is different or unique
for each customer; we cannot make them all have different names. Mention that this
is the idea of keys in relational database tables; we will learn more about this soon.
Ask them what they think is the unique value for Accounts. A: Number.
The big advantage of the relational model is that if the data is well-designed, any
queries can be answered without writing specific programs. This is different to the
network and hierarchical models, where the data design would have to take into
account what queries will be made on the data.
The SQL (Structured Query Language) was developed to define and manipulate data
in relational database tables. If the data is well-designed, almost any possible query
on the data can be answered using SQL.
SQL has DDL & DML elements.
For example, SQL can be used to find all accounts with a balance of more than 1000 or
to find the customer phone number for the account number 123456.
SQL is not a procedural language: it is declarative. That means you do not have to
write code that specifies how to do the work. You simply state what you want from
the database and the DBMS engine does the rest of the work. This makes it a very
powerful language: less effort is required to get more results, compared to older
database models. It is also easier for programmers to learn.
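The two example queries can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, assuming SQLite as a stand-in DBMS and the Customers and Accounts tables from the example above; the column types are assumptions. The CREATE TABLE statements are DDL, while INSERT and SELECT are DML.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
conn.executescript("""
    -- DDL: define the tables
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT,
                            PhoneNumber TEXT, Address TEXT);
    CREATE TABLE Accounts  (Type TEXT, Number TEXT PRIMARY KEY,
                            Balance REAL, CustomerID INTEGER);
    -- DML: put the sample rows in
    INSERT INTO Customers VALUES (1, 'Solomon Tesfay', '04 123456', 'Kebele 7');
    INSERT INTO Accounts  VALUES ('Savings', '123456', 5400, 1);
    INSERT INTO Accounts  VALUES ('Current', '278654', 2000, 1);
""")

# "Find all accounts with a balance of more than 1000": we state what we want,
# not how to loop through the records.
big_accounts = conn.execute(
    "SELECT Number FROM Accounts WHERE Balance > 1000 ORDER BY Number"
).fetchall()

# "Find the customer phone number for account number 123456": the matching
# CustomerID values link the two tables.
phone = conn.execute(
    """SELECT c.PhoneNumber
       FROM Customers c JOIN Accounts a ON a.CustomerID = c.CustomerID
       WHERE a.Number = ?""",
    ("123456",),
).fetchone()[0]
```

Neither query needed the tables to be designed with that particular question in mind.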
2.4 Object-oriented Model
OO (Object-oriented) concepts started out being used for programming: Java is an
OO language, as is C++.
The OO approach of defining objects that can be used in many programs is now also
being applied to database systems.
An object can have properties (or attributes) but also behaviour, which is modelled in
methods (functions) in the object.
In an OO db, each type of object in the database's mini-world is modelled by a class
(Customer class, Account class), like tables in the relational model. A class has
properties (attributes).
A class also has methods that are stored with the class definition e.g. the code to
create a new Customer object belongs in the Customer class. When an application is
working with data in the database, the app creates (instantiates) objects from the class
definitions.
One advantage of the OO model is sub-classes. As there are different types of
account, they can be modelled as sub-classes of the Account class: SavingsAccount
and CurrentAccount. This makes sense because the different account types have some
different behaviour, e.g. gaining interest in a savings account, but some behaviour the
same, e.g. lodging or withdrawing cash. This is the inheritance concept of OO
programming.
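The sub-class idea can be sketched in Python. The class names follow the notes; the method names, the interest calculation and the overdraft rule are assumptions made for the example, not a prescribed design.

```python
class Account:
    """Behaviour shared by every type of account."""
    def __init__(self, number, balance=0.0):
        self.number = number
        self.balance = balance

    def lodge_money(self, amount):
        self.balance += amount

    def withdraw_money(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount

class SavingsAccount(Account):
    """Adds behaviour that only savings accounts have: gaining interest."""
    def __init__(self, number, balance=0.0, interest_rate=0.0):
        super().__init__(number, balance)
        self.interest_rate = interest_rate

    def add_interest(self):
        self.balance += self.balance * self.interest_rate

class CurrentAccount(Account):
    """Changes inherited behaviour: withdrawals may go into overdraft."""
    def __init__(self, number, balance=0.0, overdraft_limit=0.0):
        super().__init__(number, balance)
        self.overdraft_limit = overdraft_limit

    def withdraw_money(self, amount):
        if amount > self.balance + self.overdraft_limit:
            raise ValueError("overdraft limit exceeded")
        self.balance -= amount

savings = SavingsAccount("123456", 5400, interest_rate=0.05)
savings.lodge_money(600)       # inherited from Account
savings.add_interest()         # specific to SavingsAccount

current = CurrentAccount("278654", 2000, overdraft_limit=500)
current.withdraw_money(2300)   # allowed because of the overdraft limit
```

Both sub-classes reuse lodge_money unchanged, which is the shared behaviour the notes mention.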
Diagram: class name at the top, properties in the middle, methods at the bottom.
You should be learning the ideas of OO in your Java course, Internet Programming.
We will not cover the OO data model in this course, but you should be aware that it
exists. Many developers are using OO databases now, but the relational model is still
very widely used and probably the most widely used.
2.5 Deductive Model
Another model for databases is the deductive model. In a deductive db system, rules
can be defined; the rules deduce or infer additional information from the facts stored
in the database.
Deductive databases are really a type of knowledge base, used in the area of AI
(Artificial Intelligence).
A deductive db has facts and rules in it. Facts are stored similar to relations in a
relational database, but attribute names are not necessary.
Rules are specifications that can be applied to the facts to produce new information.
The rules are defined using a declarative language (what, rather than how…).
The system has an inference engine that deduces new facts from the db by interpreting
the rules.
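To make facts and rules a little more concrete, here is a very small Python sketch. The facts, the "premium customer" rule and the function name are all invented for illustration, and a single hard-coded rule is of course not a general inference engine.

```python
# Facts, stored like tuples in relations (no attribute names needed).
owns = {("Solomon", "123456"), ("Solomon", "278654")}   # (customer, account)
balance = {("123456", 5400), ("278654", 2000)}          # (account, amount)

def infer_premium(owns_facts, balance_facts):
    """Hypothetical rule: a customer is 'premium' if they own an account
    holding more than 5000. The rule derives new facts from stored ones."""
    return {customer
            for (customer, account) in owns_facts
            for (other_account, amount) in balance_facts
            if account == other_account and amount > 5000}

premium = infer_premium(owns, balance)
```

The fact that Solomon is a premium customer is not stored anywhere; it is deduced from the stored facts by applying the rule.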
[Class diagram:
Customer: properties CustomerID, Name, PhoneNumber, Address; methods newCustomer,
removeCustomer, …
Account: properties AccountNumber, Balance, CustomerID; methods lodgeMoney,
withdrawMoney
SavingsAccount (sub-class of Account): property InterestRate
CurrentAccount (sub-class of Account): property OverdraftLimit; method cashCheque]
The deductive model is closely related to the relational model; it also has its basis in a
branch of mathematics (domain relational calculus).
In a deductive database system, the emphasis is on deriving new knowledge from
existing data by supplying rules based on knowledge of the real world.
2.6 Summary
The data models we have discussed (name them…) are models for implementation of
databases. These models are not very useful for modelling data in a way that end-
users of a system understand; they are more about how data is stored on the
computer.
But we also have conceptual data models; these provide ways of modelling data that
are close to the way end-users perceive the data in their system. One of these is
Entity-Relationship modelling, which we will cover later in this course.
ER modelling is often used as a step in designing a relational database.
The languages used to access data in databases have also evolved. For the earlier
models, procedural languages were necessary: code that had all the steps needed to
access or process data. The code had to include loops to step through all the records in
a set, for example.
The relational and OO models have declarative (non-procedural) languages to work
with data. This type of language is easier for programmers to learn: you only have to
write statements that say what you want to do, not how to do it. For example, in SQL,
you can ask to get a list of all the records in a table. You do not have to write the loop
that goes through the table to read each of the column values for each row.
3 Components of a Database Application
Objective: to know what are the major components of a database application and
what are the main features of a DBMS.
Reading material: Silberschatz et al Chapter 1, sections 1.8 & 1.9
Preparation: photocopy/print diagram of 3-tier architecture (Handout 1); maybe also
diagram on Pg 19 of Mannino (user types); students can write notes on these.
Before we move on to look at relational databases, we will first look at the major
components of a database application.
3.1 Data Independence
Early systems (file processing, hierarchical, network): close link between database
and programs to access it; definition of database was a part of the programs
accessing it.
Conceptual (data definitions) not separate from physical storage on disk: data stored
in records inside files.
Problem with this:
- Changes to db definitions mean changing all code that accesses the data: a lot of
inspection of code to make all the necessary changes.
- Expensive manual work.
- Performance tuning (making changes to a database to make it run faster)
would have to recompile lots of programs for one change.
This led to the concept of data independence in database systems: data definitions
should be separate from applications/programs that use the data. Overall, this makes
databases easier to maintain and to optimise. [aside: students may have talked about
data-oriented vs. process-oriented approach in SAD; data independence is something
that came about as the approach became more data-oriented].
Database definitions are part of the database schema. The schema is a description of
the database including what tables are in the database, the columns in each table, the
data types for the columns and other information.
The schema is specified when designing the database and can usually be shown in
diagram form by the DBMS (in SQL Server look in Diagrams in a database; in
Access look in Relationships).
The schema can be changed without affecting existing data or programs that access
the data.
For example, to add a new column to a table (StartDate to Employees), we can change
the schema and existing programs still work. We only need to change programs that need
to access the new column.
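The StartDate example can be tried with Python's sqlite3 module. This is a sketch under assumptions: SQLite stands in for the DBMS, and the EmployeeID and Name columns and the list_names function are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("INSERT INTO Employees VALUES (1, 'Sara Negash')")

def list_names(connection):
    """An 'existing program': it names the columns it needs and knows
    nothing about columns added later."""
    rows = connection.execute("SELECT Name FROM Employees ORDER BY EmployeeID")
    return [row[0] for row in rows]

before = list_names(conn)

# Schema change: add the new column. Existing rows get NULL for StartDate.
conn.execute("ALTER TABLE Employees ADD COLUMN StartDate TEXT")

after = list_names(conn)   # the old program still works, unchanged

# Column names as now recorded in the schema (name is field 1 of each row).
columns = [row[1] for row in conn.execute("PRAGMA table_info(Employees)")]
```

A program that selected every column by position, rather than by name, would be more exposed to this kind of change.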
3.2 Different Types of User
Another factor that influences the architecture of database applications: different
types of user.
Bank example: can you think of different groups of people who would use the
system? Think about what they need to be able to do [ask students to think about this
in pairs for 5 minutes].
- Managers: view reports
- Tellers/cashiers: carry out transactions, open new accounts, close accounts
- Customers: view their own accounts
- Programmers: write new programs, change existing programs
- Database administrator: change the schema, make the database perform better
So we have different types of user in a system, all with different needs; they need to
see different views of the data. Some need to change data, others only to read data,
some to see the schema.
We can categorise users of the system by the roles they have:
Functional users
- Indirect user: receives reports of data from the db, usually from someone who
is a more direct user of the system (next user type) e.g. in a bank, a teller who
looks up a printed list of customer names and account numbers.
- Parametric user: uses predefined forms and reports, where he/she simply has
to click a button to run it. May enter input values (parameters) e.g. all accounts
with a balance greater than 1000; all the lodgements in a given date range.
- Power user: can write his/her own reports/forms as needed, not using
predefined reports, e.g. could build their own form or report in an Access
database or even write a SQL query.
IT users
- DBA (Database Administrator): works with functional and IT users; makes
schema changes, monitors database performance and tunes it to improve
performance.
- Analyst/programmer: gathers requirements, designs applications, implements
applications, so they need to create programs that access the data.
- Management: project managers supervise development of applications; they do
not often use the database directly but may want to see schema diagrams or
other design information.
If we can keep the database design & definitions (the schema) separate from programs,
then we can write different programs or applications for the different users.
For example: a networked VB application for bank staff to maintain accounts. This
application could have a section for managers to view summary reports e.g. at the end
of each day. Customers could access their accounts on a web-based application to
view their accounts and maybe also to transfer money between different accounts, to
pay bills online etc.
Data independence makes this easier. Let's have a look at a general architecture for
database systems, where the programs are kept separate from the data definitions (the
schema).
3.3 Three-Tier Application Architecture
Most database systems are now run on a client-server network.
- allows sharing of data: database on server, clients access it
- internet is an extension of this: web pages that access a central database
In practice, an application that makes a lot of use of a database will have components
to present/display data to end-users and to process data as well as the DBMS itself.
This fits well with the concept of data independence.
The approach to systems design now tries to separate presentation of data from
business rules as much as possible.
[Draw this diagram then discuss.]
Three-tier architecture for a database application
[Diagram: three tiers, from top to bottom.
Presentation (user view): display data, forms to change/enter data. On client PCs.
There can be more than one presentation (Presentation 2…). Will also have an
application component to deal with making connections to the application server &
calling the right functions on the application server e.g. call openNewAccount()
function.
Application server: business rules in programs/code e.g. Java, CGI, ASP, VB. On
server. Accepts calls to functions; the functions build queries for the database in the
appropriate language (SQL for a relational db) and pass them to the database. Uses an
interface standard such as ODBC or JDBC.
Data: on server.]
Conceptually, this architecture has 3 levels, or tiers.
3-D cylinder shape used to denote a database in system diagrams.
At the presentation layer, there can be different views of the data: different
presentations to different types of user.
In a client-server system, this part usually communicates with the application server
using the network.
At the application layer there is a lot of code that reflects the 'business rules' e.g.
code to calculate the interest to apply to an account. If something changes, e.g. the
bank changes the way it applies interest, then we only need to change it here.
The data layer is where the actual database is. It could be an Access db or a
SQL Server db or an Oracle db: any DBMS. The DBA may work directly with the
database through the DBMS itself, but programmers generally work at the application
layer.
This architecture is suited to internet applications, e.g. e-commerce, email, where it
would be very difficult to put the business rules into the client (the web browser).
The web browser can send HTTP requests to the web server; the web server can make
calls to the application server, or sometimes, direct to the database server.
The web server is then the application server in our diagram. Web scripts written in a
language like Perl or ASP can use database interfaces & drivers based on ODBC to
access the database.
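The separation between the three tiers can be sketched in a single Python file, with one group of functions per tier. This is only an illustration under assumptions: in a real system the tiers run on different machines and talk over the network, SQLite stands in for the DBMS, and the minimum opening balance rule and the function names other than the idea of an open-new-account call are invented.

```python
import sqlite3

# --- Data tier: the database itself (SQLite as a stand-in DBMS) ---
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Accounts (Number TEXT PRIMARY KEY, Type TEXT, Balance REAL)")

# --- Application tier: business rules live here, and queries are built here ---
MINIMUM_OPENING_BALANCE = 100   # assumed business rule, for illustration only

def open_new_account(number, account_type, opening_balance):
    if opening_balance < MINIMUM_OPENING_BALANCE:
        raise ValueError("opening balance too low")
    db.execute("INSERT INTO Accounts VALUES (?, ?, ?)",
               (number, account_type, opening_balance))

def get_balance(number):
    row = db.execute("SELECT Balance FROM Accounts WHERE Number = ?",
                     (number,)).fetchone()
    return None if row is None else row[0]

# --- Presentation tier: only formats data for the user, no SQL and no rules ---
def show_balance(number):
    balance = get_balance(number)
    if balance is None:
        return f"Account {number}: no such account"
    return f"Account {number}: balance {balance:.2f}"

open_new_account("123456", "Savings", 5400)
screen_text = show_balance("123456")
```

If the bank changes the opening balance rule, only the application tier changes; the presentation code and the table definition stay as they are.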
NB: in the Mannino book, pg 16, there is a diagram showing the Three Schema
Architecture. This is different to this 3-tier architecture for database applications. The
Three Schema Architecture refers to a standard for DBMSs. You can see this
implemented in, for example, MS Access: the internal schema (file storage, you do
not need to see it), a conceptual schema (the table design) and views (queries to see
data from particular tables).
3.4 Features of a DBMS
A database application has at its core a DBMS to manage the database itself.
A DBMS provides an environment that allows storage and retrieval of data in a
database, and provides ways of carrying out database administration tasks.
[Connector labels from the three-tier diagram: "network"; "Network & db interfaces".]
Next year's course, ICT352, will go more in-depth into the architecture of a DBMS.
For now, we will talk about the general functions provided by a DBMS:
- Storage & retrieval: can be done independent of internal structures of the
db. The DBMS will have its own internal data structures for storing data, but a
user of the DBMS should not have to know anything about those structures;
you can store and retrieve data in a logical/conceptual view of the data. Think
of an Access database: we work with data in tables. We do not need to know
anything about the underlying data structures; an Access db is all inside a
.mdb file, which we do not have to look into.
- Catalog: describes all the data items stored in the db which are accessible to
users; includes data definitions e.g. for a column, what is the data-type and
what is the size. This is the database schema.
- Shared update: to support concurrency i.e. when more than 1 user is
updating the database at the same time. This is done with transactions; you
will get an introduction to transactions towards the end of this course and
cover them in more depth in the ICT352 course next year.
- Recovery: if the db is damaged, we need to be able to restore a working copy. A
DBMS provides backup and restore functions. It is usually possible to
schedule a backup to occur on a regular basis e.g. every night or every 4 hours
(if there is time, I will show you how to do this with SQL Server, in the lab).
- Security: access restricted to authorised users; users assigned permissions to
carry out certain actions (e.g. to update or delete data); usually password-
protected access. Data can also be encrypted for further protection (will have a
brief intro to this in this course, more detail in ICT352; will look at how to
create logins & users on SQL Server).
- Integrity: mechanisms to ensure data integrity and referential integrity. Data
types, formats, check constraints and key constraints are all used for this (will
learn about keys, data types & constraints in the relational model).
- Data independence: the manipulation of the data is independent of where
the data is physically stored. In other words, data manipulation works with a
logical view of the data, and the process that is manipulating the data does not
need to know where or how the data is stored (we will use SQL to manipulate
data).
- Utility services: provides ways to import & export data, query the data etc.
(SQL Server has the Query Analyzer tool; Access also has tools).
3.5 Types of DBMS
There are various DBMS packages available on the market.
Many of them are based on the relational model: RDBMS packages.
Microsoft has 2: Access and SQL Server.
Access is good for small scale applications that do not have large numbers of users.
SQL Server is better for large applications where the number of connections and
database transactions is big.
SQL Server provides better performance, security and data protection than Access.
The current versions (as of March 2005) are Access 2003 and SQL Server 2000. You
may still see Access 97, Access 2000 and SQL Server 7.0 in use.
The Oracle Corporation has the Oracle DBMS, which is a competitor to SQL Server;
these two between them have most of the market share. The current version is 9i.
IBM also has an RDBMS: DB2.
Microsoft, Oracle and IBM have most of the market share between them.
In the open-source world, MySQL is a DBMS application. It can be downloaded from
the internet; I also have a copy if anyone would like to borrow it to install it.
Another open-source DBMS is PostgreSQL, for relational databases, which originated
at the University of California, Berkeley, in the US.
Some other commercial packages are Ingres and Informix.
4 The Relational Model
Objective: in this unit, you will learn about the theory behind the relational model.
First we will cover some basics and terminology associated with the model, then we
will look in more detail at the concepts.
As most databases now are based on the relational model, if you get a good
understanding of the theoretical concepts behind the model, you are more likely to
become a proficient database designer and developer.
Reading Material: Mannino, Chapter 2; Silberschatz et al, Chapter 3.
The Mannino book does not really cover this using relational algebra terminology
but Silberschatz et al does; I recommend that you read both.
Preparation: Duplicate Handout 2, a one page handout showing the relation
schemas and relations that are used as examples during this unit: student,
programme, course, studentCourse (to save time for students writing them out; they
can also write notes on the page). Shows the relations as in section 4.5, but without
the relationship connectors.
4.1 Tables & Relations
A relational database consists of a collection of tables.
Table: also called a relation, because a row in a table represents a set of related
values (the values in the columns).
Relation is a mathematical term used in relational algebra. As we will see later, the
relational model is based on relational algebra.
A table has columns and rows, or a relation has attributes and tuples.
Relation, tuple, attribute: the more formal terminology. We will use both.
4.2 Domains & Attributes
Let us look at the relational model in more detail…
A domain D is a set of values.
Each attribute of a relation has a set of permitted values: the domain for that
attribute.
An attribute represents a characteristic of the entity that is represented by the
relation.
We can combine two domains, D1 and D2, to get the Cartesian product:
D1 × D2
This is all the possible combinations of the values in D1 and D2.
Example:
D1: names
D2: phone numbers
Think of a relation that has n attributes. Each tuple in the relation is some combination
of values from each of the attribute domains. But there will not be a tuple for every
possible combination.
So, a relation is a subset of the Cartesian product of a list of domains:
D1 × D2 × … × Dn-1 × Dn
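A small Python sketch can make the subset idea concrete. The two domains are kept tiny so that the whole Cartesian product can be listed; the particular names and phone numbers are sample values chosen for the example.

```python
from itertools import product

names = {"Sara", "Tekle"}                     # domain D1
phone_numbers = {"04 407600", "04 123456"}    # domain D2

# D1 x D2: every possible (name, phone number) combination, 2 * 2 = 4 pairs.
cartesian = set(product(names, phone_numbers))

# A relation holds only the combinations that are actually true in the
# mini-world, so it is a subset of the Cartesian product.
relation = {("Sara", "04 407600"), ("Tekle", "04 123456")}
is_relation = relation <= cartesian   # subset test
```

With realistic domains the product is far too large to list, which is why a relation stores only the tuples that occur.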
4.3
When working with relations, we use mathematical terminology like this:
t[col_name]
denotes the value of the attribute with the name col_name in the tuple t.
e.g. t[phone_number] = 04 407600
t ∈ r
denotes that the tuple t is in the relation r.
In a relation, the order of the tuples is not important because, mathematically,
elements of a set are not ordered.
But… in a computer-based file, the records must be physically stored on the disk
somewhere, so they are stored in some order. And when you view the rows in a
relation (e.g. in a table in Access), you are viewing them in some order (it may not be
the same as the physical order on disk).
When defining a relation, we do not define anything about the order of the tuples.
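The notation can be mirrored in Python as a rough sketch. The helper names make_tuple and value are assumptions made for the example; a frozenset of (attribute, value) pairs is used so that a tuple can be an element of a set.

```python
def make_tuple(**attributes):
    """Build a relational tuple as a hashable set of (attribute, value) pairs."""
    return frozenset(attributes.items())

def value(t, col_name):
    """t[col_name]: the value of the attribute col_name in the tuple t."""
    return dict(t)[col_name]

t = make_tuple(name="Sara", phone_number="04 407600")
u = make_tuple(name="Tekle", phone_number="04 123456")

# A relation is a set of tuples, so the order of insertion does not matter.
r1 = {t, u}
r2 = {u, t}

in_relation = t in r1      # corresponds to: t is in the relation r
same_relation = r1 == r2   # same tuples, different order, same relation
```

A Python list would behave differently here: two lists with the same rows in a different order are not equal, which is closer to the physical file than to the relation.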
4.3.1 More about attributes
For all relations r, the domains of all attributes must be atomic: the values must be
indivisible i.e. a simple, single value, that cannot be further divided.
Example: the set of all integers is atomic.
The set of sets of integers is not atomic, because a set is not a simple, single value; it
is a collection of integer values. So this could not be the domain for an attribute in a
relation.
When determining if a domain is atomic or not, you need to consider the usage in the
database. For example:
- The set of all possible names of people is atomic.
- The set of full names (first name and father's name) is not atomic, as you
can split it into first name and father's name. Generally, this is what is done in a
relational database. You could, in theory, have an attribute for full name if
you take the view that names cannot be split. But then it would be difficult to
search for a person knowing just their first name or just their father's name.
Multiple attributes can have the same domain. For example, in a database for the
university, Student_name and Staff_name have the same domain while they belong
to different relations (Student, Staff). Likewise, Student_FirstName and
Student_FathersName attributes can both be in the same relation and have the same
domain.
Think about attributes like Student_FirstName and Department_Name.
Take a physical view i.e. think about what is stored on the disk. Both are just a
string of characters e.g. T, e, s, f, a, y for the name 'Tesfay' or 'Computer Science'. So
they could be the same domain i.e. all character strings. But only certain strings make
up person names and only certain strings make up department names.
So logically, we know they have different domains.
On the other hand: some domains are obviously different e.g. GPA is a number and
Student_FirstName is character data.
Null value is a special value that can be a member of any possible domain. Null
means the actual value is unknown or does not exist. For example, if the Staff relation
has a Phone_Number attribute, it should be null and not 0 if you do not know the
number or if there is no phone number for the person.
But…null values can cause some problems when working with data. If possible, you
should try not to allow them in database tables; we will see how to do this later in the
course.
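Here is a small sketch of null values in practice, using Python's sqlite3 module. SQLite as the DBMS, the Staff_name column and the sample rows are assumptions for the example. It shows one of the problems with nulls (comparing with = never matches a null) and one common way of not allowing them (a NOT NULL constraint).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Staff_name may never be null; Phone_Number is allowed to be null.
conn.execute("CREATE TABLE Staff (Staff_name TEXT NOT NULL, Phone_Number TEXT)")
conn.execute("INSERT INTO Staff VALUES ('Sara', '04 407600')")
conn.execute("INSERT INTO Staff VALUES ('Tekle', NULL)")   # number unknown

# Nulls are found with IS NULL ...
unknown = conn.execute(
    "SELECT Staff_name FROM Staff WHERE Phone_Number IS NULL").fetchall()

# ... because a comparison with = against NULL is never true.
equals_null = conn.execute(
    "SELECT Staff_name FROM Staff WHERE Phone_Number = NULL").fetchall()

# The NOT NULL constraint makes the DBMS reject a row with a null name.
try:
    conn.execute("INSERT INTO Staff VALUES (NULL, '04 123456')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Storing 0 instead of null would make the unknown number look like a real value, which is why the notes advise against it.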
4.4 Schemas
The database schema is the logical design of the database.
A database instance is a snap-shot (picture) of the data in the database at any given
instant in time.
In the relational model, we also talk about a relation schema. This refers to the
definition of one relation. Think of a relation as being like a variable in programming,
while a relation schema is like the type definition for the variable.
For example:
We have a relation called student (like a variable).
The relation schema for student is:
Student-schema = (student_id, student_firstname, student_fathersname, programme_code)
(like the variable type definition)
programme_code is an alpha-numeric value that identifies the degree or diploma the
student is registered for e.g. Computer Science Degree (CompSciDeg), Management
Diploma (MgtDip).
We can show that student is a relation on Student-schema (a variable of the type…)
like this:
student (Student-schema)
Let us say we also have:
Programme-schema = (programme_code, prog_name, prog_description)
And
programme (Programme-schema)
We have the attribute programme_code in both relation schemas; this is a way to
relate the tuples in two different relations. The programme_code in the student
relation tells you what programme that student is registered for. You can get the
programme details (name, description) in the programme relation.
With this structure, we can answer questions like
'get a list of all students registered for the Computer Science Degree'
[elaborate on this: explain how you can use the value CompSciDeg to look up the
student relation]
This works well because we know that one student can be registered in one
programme only.
Consider students taking courses. A student can take many different courses.
Course-schema = (course_code, course_name, course_description, credit_hours)
course (Course-schema)
Question: how would you show this in these relations?
First answer likely to be: put course_code in the student schema. If someone suggests
a separate relation, find out what they think should be in it, congratulate them if they
are right but say that first we'll look at what happens if you put course_code in the
student relation.
Student-schema = (student_id, student_firstname, student_fathersname, programme_code,
course_code)
Each student takes several courses…so now there would be several tuples for each
student in the student relation.
Question: Can you see any problem with this?
A: student name, id and programme code are duplicated: repeated in each tuple.
And what if a student is registered but has no courses selected yet? Then the tuple for
that student is incomplete and we have to put a null value for the course_code.
Question: Can you see how we can fix the duplication problem and not have to use
null values?
A: make a new relation like this (elicit from students what attributes they would put in
the new relation)
Student-Course-Schema = (student_id, course_code)
studentCourse (Student-Course-Schema)
We create tuples in the studentCourse relation only when a student registers for a
particular course. Then we do not have nulls.
And, as before, we can look up a given student_id in studentCourse to find what
courses the student is taking.
Now, we are using a relation to describe an association between the student and
course entities. So, a relation can describe an entity (student or course or programme)
or it can describe an association between entities.
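The two designs can be compared with plain Python sets of tuples. This is only a sketch: the rows are sample values, and student 101 is assumed to have no courses yet purely to show where the null would be needed.

```python
# Design 1: course_code inside the student relation.
student_wide = {
    (100, "Sara", "Negash", "CSDEG", "ICT252"),
    (100, "Sara", "Negash", "CSDEG", "ICT231"),   # Sara's details repeated
    (101, "Tekle", "Haimanot", "CSDEG", None),    # no course yet: null needed
}

# Design 2: a separate relation for the association.
student = {
    (100, "Sara", "Negash", "CSDEG"),
    (101, "Tekle", "Haimanot", "CSDEG"),
}
student_course = {(100, "ICT252"), (100, "ICT231")}   # (student_id, course_code)

# How many tuples carry Sara's name in each design?
copies_wide = sum(1 for t in student_wide if t[0] == 100)
copies_separate = sum(1 for t in student if t[0] == 100)

# How many nulls does design 1 need?
nulls_needed = sum(1 for t in student_wide if t[4] is None)

# Look up a given student_id in studentCourse to find the courses taken.
courses_of_100 = sorted(code for (sid, code) in student_course if sid == 100)
courses_of_101 = sorted(code for (sid, code) in student_course if sid == 101)
```

In the second design a student with no courses simply has no tuples in student_course, so no null is needed.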
4.5 Relationships
We now have a number of relations, based on these schemas:
Student-schema = (student_id, student_firstname, student_fathersname, programme_code)
Programme-schema = (programme_code, prog_name, prog_description)
Course-schema = (course_code, course_name, course_description, credit_hours)
Student-Course-Schema = (student_id, course_code)
When working with a relational database, we can represent the relations in a schema
diagram like this [first draw without the relationships; ask class to put them in
themselves; possibly have a handout with the tables on it already. Also show some
sample data records for each table].
[Schema diagram: four boxes, each with the relation name on top and its attributes below.
student: student_id, student_firstname, student_fathersname, programme_code
programme: programme_code, prog_name, prog_description
course: course_code, course_name, course_description, credit_hours
studentCourse: student_id, course_code
Connectors between related attributes are labelled 1 on the one side and ∞ on the
many side.]
The relation name appears in the top bit of the rectangle. The attribute names appear
in the bottom bit. Different authors/texts use variations of this way of drawing a
schema diagram; for example, some put the relation name outside the box.
Sample data:
student
student_ID | student_firstname | student_fathersname | programme_code
100 | Sara | Negash | CSDEG
101 | Tekle | Haimanot | CSDEG
102 | Terhas | Girma | CSDEG
103 | Solomon | Kebede | CSDIP
pro-ramme
"ro&ra!!e;code "ro&;na!e "ro&;descri"tion
CSDE< Computer Science De-ree ) Hear De-ree in Computer Science
CSD#, Computer Science Dip*oma Dip*oma in Computer Science
course
course;code course;na!e course;descri"tion credit;hours
#CT"(" Theory of
Databases
#ntro$uction to $atabases= DB'S= $atabase
mo$e*sK focus on re*ationa* mo$e*K usin- E+
1 mo$e**in- to $esi-n $atabases.
!
#CT")1 #nternet 6 %eb
,a-e Deve*opment
Basic s/i**s reAuire$ for eb $eve*opment=
inc*u$in- >T'L an$ scriptin-.
)
stu$entCourse
student;ID course;code
122 #CT"("
121 #CT"("
122 #CT")1
12" #CT")1
12) #CT"("
12) #CT")1
There are connections between the data in the different relations. We call these relationships because the rows in a table can be related to rows in another table. They are related by values that match in the different tables.
Question: Can you identify relationships between tables on this diagram?
A student is registered for one programme (must be 1).
A programme has many students registered for it (0 or more).
So: a 1-many relationship between the Programme and Student relations.
A student can enrol for many courses (1 or more).
A course can have many students registered for it (0 or more).
So: 1-many between Student and StudentCourse, and 1-many between Course and StudentCourse. There is a many-many relationship between Student and Course, but we cannot show that directly between the tables, so we use the StudentCourse relation to model the association between the 2 relations.
Now add the relationships to your diagram: put a line between the related attributes, with a 1 for the one side and a ∞ (infinity) symbol for the many side.
Some designers use a different notation: a line with an arrow head pointing to the many side of the relationship.
We can also have a 1-1 relationship, e.g. if we introduce a Teacher relation and assume that each course is taught by one teacher only and that each teacher teaches one course only.
Relationships between relations are important because often we need to extract data from 2 or more tables for the data to be meaningful. For example, if the registrar wants a list showing each student and what courses he/she has registered for, the data must come from the student and studentCourse relations. We can get this data by matching the student_id field in both tables.
This way of combining tables to get data from them is called a join. We can use SQL to join tables.
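As a rough illustration of what "matching the student_id field in both tables" means, here is a minimal Python sketch. The sample tuples come from the tables above; the function name courses_per_student and the list/dict layout are assumptions made for this example only, not part of any DBMS.

```python
# Sample tuples from the notes; the Python layout is an assumption.
student = [
    {"student_id": 100, "student_firstname": "Sara", "student_fathersname": "Negash"},
    {"student_id": 101, "student_firstname": "Tekle", "student_fathersname": "Haimanot"},
    {"student_id": 102, "student_firstname": "Terhas", "student_fathersname": "Girma"},
    {"student_id": 103, "student_firstname": "Solomon", "student_fathersname": "Kebede"},
]
student_course = [
    (100, "ICT252"), (101, "ICT252"), (100, "ICT231"),
    (102, "ICT231"), (103, "ICT252"), (103, "ICT231"),
]

def courses_per_student(students, registrations):
    """Pair each student with the course codes whose student_id matches."""
    listing = {}
    for s in students:
        name = f'{s["student_firstname"]} {s["student_fathersname"]}'
        listing[name] = sorted(
            code for (sid, code) in registrations if sid == s["student_id"]
        )
    return listing

registrar_list = courses_per_student(student, student_course)
for name, codes in registrar_list.items():
    print(name, codes)
```

The only thing linking the two relations is the shared student_id value, which is exactly what a join relies on.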
It is important to understand the relationships between your db tables for extracting meaningful & useful data from them.
We will learn more about relationships when we look at E-R modelling.
4.6 Keys
In a relation, each tuple (row) represents an instance of the real-world entity that the relation models.
We need some way to distinguish the instances from each other: the values of the attributes of an instance must uniquely identify that instance.
Put another way: no two instances (tuples/rows) can have exactly the same values for all the attributes.
A key is an attribute or set of attributes in a relation that uniquely identifies each tuple in the relation.
4.6.1 Superkeys
Look at the Student-Schema relation schema: if we take the combination of all attributes, it would uniquely identify each row. Or we could take the combination of student_id, student_firstname, student_fathersname. But we could not use student_firstname, student_fathersname on their own, as different students could have the same name.
Each of the first two combinations is called a superkey.
A superkey is an attribute or combination of attributes containing unique values for each tuple in the relation.
4.6.2 Candidate Keys
If we take away student_fathersname from the student_id, student_firstname, student_fathersname combination, we still have a superkey.
If we then take away student_firstname, we have just student_id left and this still uniquely identifies each tuple.
There is no proper subset of this set of attributes that is itself a superkey: we have reduced the superkey as much as we can.
What we are left with is called a candidate key.
A candidate key is a superkey for which no proper subset is itself a superkey.
We can also say that a candidate key is a minimal superkey: it is minimal if removing any attribute makes it no longer unique.
Look at the Programme-Schema. Starting with all the attributes as a superkey, can you identify one or more candidate keys? Assume that no two Programmes have the same Programme name.
A: 2 candidate keys, Programme_Code and Prog_Name, because each can uniquely identify a programme. The combination of both is a superkey but not a candidate key, because it has proper subsets (Programme_Code on its own, and Prog_Name on its own) that are superkeys.
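The reduction from superkey to candidate key can be sketched in Python. This is only an illustration: the function names is_superkey and candidate_keys are made up for this example, and the check looks at one set of sample rows only. A key is really a statement about every possible instance of the schema, so a check like this can show that something is not a key, but it cannot prove that something is one.

```python
from itertools import combinations

ATTRS = ("programme_code", "prog_name", "prog_description")
programme = [
    ("CSDEG", "Computer Science Degree", "3 Year Degree in Computer Science"),
    ("CSDIP", "Computer Science Diploma", "Diploma in Computer Science"),
]

def is_superkey(rows, attrs, key):
    """True if the values of the key attributes are unique in these rows."""
    idx = [attrs.index(a) for a in key]
    projected = [tuple(row[i] for i in idx) for row in rows]
    return len(set(projected)) == len(projected)

def candidate_keys(rows, attrs):
    """Superkeys (for these rows) that contain no smaller superkey."""
    found = []
    for size in range(1, len(attrs) + 1):
        for key in combinations(attrs, size):
            minimal = not any(set(k) <= set(key) for k in found)
            if minimal and is_superkey(rows, attrs, key):
                found.append(key)
    return found

print(candidate_keys(programme, ATTRS))
```

With only two sample rows, prog_description also comes out as unique, so it is reported alongside programme_code and prog_name. That is an accident of the sample data, which is why the designer decides on keys from the meaning of the data and not from one instance.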
4.6.3 Primary key
Now we have identified 2 candidate keys for the Programme-Schema.
When designing a relational database, you (as the designer) must choose one of these to be the Primary Key for the relation.
Question: Which would you choose for this relation?
See what the students choose. Find someone who chooses Programme_Code and ask them why they chose it.
The reason is that the code is less likely to change over time, whereas the university might change programme names. A good candidate for primary key is an attribute whose values are least likely to have to be changed over time.
A primary key is a candidate key chosen to be the main way to uniquely identify tuples in the relation. It represents a constraint in the real world that the database models. For example, in choosing Programme_Code as the primary key, we are reflecting the fact that the Code must be unique for every Programme. Likewise for Student_ID.
The decision is up to the designer, but you should put some thought into it. Sometimes it is obvious what the primary key should be, e.g. in the Student-Schema. But sometimes it isn't, e.g. in Student-Course-Schema.
After you have some experience of designing databases, you will not often think about superkeys and candidate keys, as your experience will guide you and you will be able to tell quite quickly what should be the primary key!
A primary key that consists of more than one attribute is a composite primary key.
Now, on your diagram of the relation schema, underline the primary keys in each relation.
4.7 Constraints
I mentioned earlier that a primary key reflects a constraint in the real world of the database.
A constraint is a rule that restricts the possible values that can go into a relation (table) in a relational database.
Besides keys, there are some other types of constraint in the relational model.
Some constraints are value-based, some are value-neutral.
Value-based: comparison of an attribute value to some constant value, e.g. CreditHours >= 0 to reflect that a course's credit hours must be greater than or equal to 0.
Value-neutral: comparison of attribute values to other attribute values. For example, if we had start date and finish date attributes for a Student, the finish date should be later than the start date.
Q: Keys are a form of constraint. Do you think they are value-based or value-neutral?
A: value-neutral, because they compare attribute values in a given column to other values in the same column.
We are going to look at these types of constraint:
- Functional dependency
- Entity integrity
- Referential integrity
- Triggers
4.7.1 Functional Dependencies
[cover as part of normalisation]
4.7.2 Entity Integrity
Entity integrity means that each relation must have an attribute or combination of attributes whose values uniquely identify each tuple in the relation.
In other words, no two tuples in the relation can have the same value for that attribute or combination of attributes.
This is to ensure that entities from the real world are uniquely identified in the database, e.g. students, courses, programmes.
Entity integrity is enforced with primary keys: no two tuples in a relation can contain the same values for the primary key attribute(s).
Also, the primary key of a relation cannot have a null value in any tuple, so primary key attributes must have an additional constraint that does not allow nulls.
If nulls were allowed, then two or more rows could have the null value, which violates the entity integrity constraint.
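To see entity integrity being enforced by a DBMS, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. SQLite is used only because it needs no setup (the labs use MS Access); the helper try_insert is a made-up name for this example. Note that the NOT NULL constraint is written out explicitly next to PRIMARY KEY, matching the "additional constraint" mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE programme ("
    " programme_code TEXT NOT NULL PRIMARY KEY,"
    " prog_name TEXT)"
)
conn.execute("INSERT INTO programme VALUES ('CSDEG', 'Computer Science Degree')")

def try_insert(programme_code, prog_name):
    """Return 'accepted', or the reason the DBMS rejected the row."""
    try:
        conn.execute(
            "INSERT INTO programme VALUES (?, ?)", (programme_code, prog_name)
        )
        return "accepted"
    except sqlite3.IntegrityError as err:
        return f"rejected: {err}"

print(try_insert("CSDIP", "Computer Science Diploma"))  # new key value
print(try_insert("CSDEG", "Another programme"))         # duplicate key value
print(try_insert(None, "No code yet"))                   # null key value
```

Only the first insert should be accepted; the duplicate and the null key value are both refused by the database itself, not by the application.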
4.7.3 Referential Integrity
Look back at the related tables, student and studentCourse.
StudentID is the primary key in the student relation.
We know that if we look at studentCourse, the values for StudentID match values in Student.StudentID (point out this dot notation, table.column/relation.attribute, to the students).
In fact, we require that the values in studentCourse.studentID match values in student.StudentID. This is referential integrity, where the values in columns of one table must match values in columns of other tables.
Referential integrity is enforced using another type of key, a foreign key.
StudentID is a foreign key in the studentCourse relation.
A formal definition for a foreign key:
A relation r_1 can have an attribute that is the primary key of another relation, r_2. This is a foreign key from r_1, referencing r_2. r_1 is the referencing relation. r_2 is the referenced relation.
In a database instance, given any tuple, say t_a, from the r_1 relation, there must be some tuple, t_b, in the r_2 relation where the value of the foreign key attribute of t_a is the same as the value of the primary key attribute of t_b.
The one exception is that the value of the foreign key attribute of t_a can also be null.
However, the db designer can decide whether or not to allow nulls in the foreign key attribute. It depends on the usage in the real world.
Q: For example, in the studentCourse relation, does it make sense to allow studentID or courseID to be null?
A: no, because both are part of the primary key and the primary key cannot be null.
Q: What about in the student relation: could the programme_code be null?
A: depends on usage. Can a student be registered but not have selected a programme? I think no, as the student would have to choose a programme. They can always change it later. If you allow nulls here, you could end up with data that shows students that are not registered for a programme but that are registered for courses.
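A foreign key constraint can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite stands in for whatever DBMS you use, the tables are cut down to the columns needed, and register is a hypothetical helper. In SQLite, foreign key checking has to be switched on per connection with PRAGMA foreign_keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE student (
    student_id        INTEGER PRIMARY KEY,
    student_firstname TEXT
);
CREATE TABLE studentCourse (
    student_id  INTEGER NOT NULL REFERENCES student(student_id),
    course_code TEXT    NOT NULL,
    PRIMARY KEY (student_id, course_code)
);
INSERT INTO student VALUES (100, 'Sara');
""")

def register(student_id, course_code):
    """Try to add a studentCourse tuple; False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO studentCourse VALUES (?, ?)", (student_id, course_code)
        )
        return True
    except sqlite3.IntegrityError:
        return False

print(register(100, "ICT252"))  # student 100 exists in student
print(register(999, "ICT252"))  # no student 999 to reference
```

Here studentCourse is the referencing relation and student is the referenced relation. The second insert should fail because there is no matching tuple in student.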
4.7.4 Triggers
Many DBMSs include a capability to define rules that are processed (or triggered) when certain events occur.
For example, if the balance on a current account becomes negative (less than 0), the account should be marked as being overdrawn. This could be modelled with an attribute named Overdrawn in the Account relation, which can have the values true or false (or yes/no).
We want the database to automatically update the Overdrawn column when the balance changes from being positive to negative or vice-versa.
This is called a trigger in the database.
This is not strictly a constraint in the relational model, but it is a feature that has been implemented in many RDBMS packages.
The trigger is a rule defined in the database itself. The rule for this example would be something like this:
After a row in the Account relation has been updated
FOR EACH updated row
    IF new_balance < 0 AND old_balance >= 0 THEN
        set Overdrawn = true
    ELSE IF old_balance < 0 AND new_balance >= 0 THEN
        set Overdrawn = false
    END IF
END FOR
A trigger has 3 parts to it:
- Event (Account is updated; can be any event, e.g. delete or insert)
- Condition (balance changes from being positive to negative or vice-versa; can look at the values in the row before and after the event occurred)
- Actions (set the Overdrawn flag; can carry out any SQL action, e.g. insert to another table)
A trigger can be created using SQL and it is then part of the database schema, like tables, columns and other objects in the database.
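The overdrawn rule can be tried out as a real trigger. The sketch below uses SQLite through Python's built-in sqlite3 module; the table and trigger names (account, account_overdrawn) and the helper set_balance are assumptions for this example, and trigger syntax differs between DBMS packages. The three parts (event, condition, action) are marked in the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (
    account_id INTEGER PRIMARY KEY,
    balance    REAL    NOT NULL,
    overdrawn  INTEGER NOT NULL DEFAULT 0   -- 0 = false, 1 = true
);

CREATE TRIGGER account_overdrawn
AFTER UPDATE OF balance ON account          -- event
FOR EACH ROW
WHEN (NEW.balance < 0) <> (OLD.balance < 0) -- condition: sign has changed
BEGIN
    UPDATE account                          -- action
    SET overdrawn = (NEW.balance < 0)
    WHERE account_id = NEW.account_id;
END;

INSERT INTO account (account_id, balance) VALUES (1, 50.0);
""")

def set_balance(amount):
    """Update the balance of account 1 and return its overdrawn flag."""
    conn.execute("UPDATE account SET balance = ? WHERE account_id = 1", (amount,))
    row = conn.execute("SELECT overdrawn FROM account WHERE account_id = 1")
    return row.fetchone()[0]

print(set_balance(-20.0))  # balance goes negative
print(set_balance(10.0))   # balance is non-negative again
```

The application only ever updates the balance column; the database keeps the overdrawn flag in step by itself.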
As a database designer or programmer, you should be careful in your use of triggers, as they can slow down the operation of the database: the trigger will run every time a row or group of rows in the table is updated.
Sometimes, you can find another way of carrying out the action, so think about it first and use a trigger only if you cannot find another way of doing it.
5 Relational Algebra
Objective: to learn the basic and additional operations of relational algebra, as these form the basis for SQL. This is done by looking at each operation, doing some examples and giving the class exercises to do the operations themselves.
Preparation: duplicate Handout 3, which is a short worksheet on relational algebra. Give to students after the basic operations have been covered. They should do the exercises outside of class, and the instructor can briefly run through the solutions at the beginning of the next class.
We have looked at the relational model:
- Relations/attributes/tuples (tables/columns/records)
- Domains for attributes
- Database schemas
- Keys
- Constraints
As a foundation for learning SQL, we will take some time to look at some of the operators of relational algebra, as SQL is based on it.
Relational algebra has operators that operate on relations.
It is procedural in nature: the operators generally take 1 or 2 relations as input and produce a new relation as the output.
SQL is based on relational algebra, but is itself mostly a declarative language.
The basic operations are:
- Select
- Project
- Union
- Set difference
- Cartesian product
- Rename
Some others, which are themselves defined in terms of the basic operations:
- Set intersection
- Natural join
- Division
- Assignment
5.1 Basic Operations
Unary operators: operate on one relation (select, project, rename)
Binary operators: operate on pairs of relations (union, set difference, Cartesian product)
The result of an operation is a new relation.
5.1.1 Select
The select operator selects the tuples from a relation that satisfy a given predicate (condition).
We use the Greek letter sigma (σ) for the operator.
[Students should have to hand the handout showing the student, course, programme relations and sample data].
To select tuples from the student relation, where the students are on the CSDEG programme:
σ_{programme_code = "CSDEG"}(student)
student is the argument relation
programme_code = "CSDEG" is the predicate
The result of this operation is a new relation containing 3 tuples.
The predicate has a comparison operator (>, <, =, ≠ (not equal), ≥, ≤). You can compare values in different attributes or compare an attribute to a constant value.
It can also include the logical AND, OR and NOT operators: ∧, ∨, ¬.
Exercises:
Write operations for the following:
Select all courses that have more than 3 credit hours.
A: σ_{credit_hours > 3}(course)
Select students that are in the CSDEG programme and whose name is Tekle.
A: σ_{programme_code = "CSDEG" ∧ student_firstname = "Tekle"}(student)
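If it helps to see the select operator "run", here is a minimal Python sketch. It assumes a relation is held as a list of dicts (one dict per tuple); the function name select is our own, and the predicate is passed in as a Python function.

```python
student = [
    {"student_id": 100, "student_firstname": "Sara", "programme_code": "CSDEG"},
    {"student_id": 101, "student_firstname": "Tekle", "programme_code": "CSDEG"},
    {"student_id": 102, "student_firstname": "Terhas", "programme_code": "CSDEG"},
    {"student_id": 103, "student_firstname": "Solomon", "programme_code": "CSDIP"},
]

def select(relation, predicate):
    """sigma: keep the tuples for which predicate(t) is true."""
    return [t for t in relation if predicate(t)]

# sigma_{programme_code = "CSDEG"}(student)
csdeg = select(student, lambda t: t["programme_code"] == "CSDEG")

# sigma_{programme_code = "CSDEG" AND student_firstname = "Tekle"}(student)
tekle = select(
    student,
    lambda t: t["programme_code"] == "CSDEG" and t["student_firstname"] == "Tekle",
)

print(len(csdeg), [t["student_id"] for t in tekle])
```

The result is again a list of tuples with the same attributes as the argument relation; only the number of tuples changes.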
5.1.2 Project
The project operator can be used to get a sub-set of the attributes from a relation.
The result is a new relation, containing all the tuples in the operand relation, but only the specified attributes.
We use the Greek letter pi (π) for the operator.
To get only the first name and father's name from the student relation:
π_{student_firstname, student_fathersname}(student)
Because the result of an operation is a relation, we can use the result as a relation input to another operation.
So, if you want to get the first name and father's name for all students on the CSDEG programme, you can combine this operation with the select operation:
π_{student_firstname, student_fathersname}(σ_{programme_code = "CSDEG"}(student))
The result of this is a new relation with attributes student_firstname and student_fathersname and containing only those tuples that have CSDEG in the programme_code attribute of the student relation (put another way: the names of all students on the CSDEG programme).
This is a relational algebra expression, as it combines different operations.
Exercise:
Write an expression to get the course code and names of courses that have 4 or more credit hours.
A: π_{course_code, course_name}(σ_{credit_hours ≥ 4}(course))
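The exercise answer can be checked with a small Python sketch that feeds the output of a select into a project. As before, the list-of-dicts layout and the function names select and project are assumptions for illustration. Because a relation is a set, this project also drops any duplicate tuples that appear once attributes are removed.

```python
course = [
    {"course_code": "ICT252", "course_name": "Theory of Databases", "credit_hours": 4},
    {"course_code": "ICT231", "course_name": "Internet & Web Page Development",
     "credit_hours": 3},
]

def select(relation, predicate):
    """sigma: keep the tuples for which predicate(t) is true."""
    return [t for t in relation if predicate(t)]

def project(relation, attrs):
    """pi: keep only attrs, dropping duplicate tuples (a relation is a set)."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            result.append(dict(zip(attrs, row)))
    return result

# pi_{course_code, course_name}(sigma_{credit_hours >= 4}(course))
answer = project(
    select(course, lambda t: t["credit_hours"] >= 4),
    ["course_code", "course_name"],
)
print(answer)
```

The inner operation runs first and its result relation becomes the input of the outer one, just as in the algebra expression.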
Aside: at this point, you should start to see how these operations are used in SQL. Think of the project and select operators we have just looked at.
Now think of doing a query in a database with these relations in it:
In Access Query Designer, you would add the student table, then choose the columns stud_id, stud_firstname, stud_fathersname from the student table, then add the criteria that prog_code must be equal to CSDEG.
The query would look like this if you wrote it in SQL (in the Query Designer, right-click in the tables area, choose SQL View and you can see the SQL for the query. Right-click in the title bar to go back to Query Design.):
select stud_id, stud_firstname, stud_fathersname
from student
where prog_code = 'CSDEG'
What part is equivalent to a relational algebra select operation? Can you see an argument relation and a predicate?
A: the from clause and the where clause are the argument relation and the predicate.
What part is equivalent to the project operation?
A: the line 'select stud_id, stud_firstname, stud_fathersname'
So, it's a bit confusing that SQL uses the word 'select' for what is actually the project operation. For this reason, some textbooks use a different name for the select operation, restrict, because it is restricting the number of tuples.
- Select: a subset of tuples
- Project: a subset of attributes
The basic SQL query does selection and projection. You can picture it like this:
[Figure: the student table, with the selection (the rows where programme_code = CSDEG) highlighted in red and the projection (the columns student_ID, student_firstname, student_fathersname) highlighted in yellow]
student_ID  student_firstname  student_fathersname  programme_code
100         Sara               Negash               CSDEG
101         Tekle              Haimanot             CSDEG
102         Terhas             Girma                CSDEG
103         Solomon            Kebede               CSDIP
Resulting relation looks like this:
student_ID  student_firstname  student_fathersname
100         Sara               Negash
101         Tekle              Haimanot
102         Terhas             Girma
5.1.3 Union
We use the union operator (∪) to get tuples from 2 different relations.
Let us add a teacher relation based on this schema:
Teacher-Schema = (teacherid, teacher_firstname, teacher_fathersname, teacher_email)
And let us also add an email address attribute for students:
Student-schema = (student_id, student_firstname, student_fathersname, programme_code, student_email)
Now let us say we want to get a list of all student and teacher names, along with their email addresses, and put all the data into a single relation. If we project from each one, we have 2 relations:
π_{student_firstname, student_fathersname, student_email}(student)
π_{teacher_firstname, teacher_fathersname, teacher_email}(teacher)
But we can use the union to put the 2 relations together, like this:
π_{student_firstname, student_fathersname, student_email}(student) ∪ π_{teacher_firstname, teacher_fathersname, teacher_email}(teacher)
The result is one new relation.
A relation behaves like a set. Does a set allow duplicate values?
A: no.
So, duplicate tuples are eliminated from the resulting relation. If 2 tuples have the same values in all the attributes, only 1 tuple is included in the result.
When doing a union operation:
- Must use compatible relations: it must make sense to union the 2 relations. For example, it does not make sense to union the student and course relations.
- The relations being union-ed must have the same number of attributes. We say that the relations have the same arity.
- The domains of the corresponding attributes must be the same. In other words: for r ∪ s, the domain of the i-th attribute of r must be the same domain as that for the i-th attribute of s.
In this example, the teacher and student relations do not have the same number of attributes, so we use the projection operation to get 2 relations that have got the same number of attributes in them.
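A short Python sketch of union follows. Everything in the data is hypothetical (the notes give no email addresses or teacher names), and one person is deliberately listed as both student and teacher to show duplicate elimination. A relation is assumed to be a pair (attribute names, set of tuples); the function checks arity only, since checking domains would need type information this sketch does not carry.

```python
def union(r, s):
    """r UNION s for relations given as (attribute_names, set_of_tuples)."""
    (r_attrs, r_rows), (s_attrs, s_rows) = r, s
    if len(r_attrs) != len(s_attrs):
        raise ValueError("relations must have the same arity")
    # The result keeps the attribute names of the first operand.
    return (r_attrs, r_rows | s_rows)

# Hypothetical projected relations (names and emails are invented).
student_names = (
    ("firstname", "fathersname", "email"),
    {("Sara", "Negash", "sara@example.edu"),
     ("Tekle", "Haimanot", "tekle@example.edu")},
)
teacher_names = (
    ("firstname", "fathersname", "email"),
    {("Abebe", "Tadesse", "abebe@example.edu"),
     ("Sara", "Negash", "sara@example.edu")},  # also appears as a student
)

attrs, rows = union(student_names, teacher_names)
print(len(rows))
```

Because the rows are held in a Python set, the tuple that appears in both operands is kept only once in the result.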
5.1.4 Set Difference
Let's say we want to find students who are taking one particular course but are not taking another particular course.
We can use the set difference operator (−) to find tuples that are in one relation but not in another. To get tuples that are in relation r but not in relation s:
r − s
As for union, the operation should be on relations that are compatible.
To find students who are taking the ICT252 course but who are not taking the ICT231 course:
(get students to do this themselves)
1. Get a relation showing students taking ICT252 (just get the student ID):
π_{student_id}(σ_{course_code = "ICT252"}(studentCourse))
2. Get a relation showing students taking ICT231 (should be compatible with the first relation):
π_{student_id}(σ_{course_code = "ICT231"}(studentCourse))
Now we can get the difference between these two:
π_{student_id}(σ_{course_code = "ICT252"}(studentCourse)) − π_{student_id}(σ_{course_code = "ICT231"}(studentCourse))
The result is, again, a new relation, with one attribute and a number of tuples.
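The two steps and the difference can be worked through on the sample studentCourse data with Python sets. The helper name taking is an assumption for this sketch; it does the select and the project in one go.

```python
student_course = {
    (100, "ICT252"), (101, "ICT252"), (100, "ICT231"),
    (102, "ICT231"), (103, "ICT252"), (103, "ICT231"),
}

def taking(course_code):
    """pi_{student_id}(sigma_{course_code = ...}(studentCourse)) as a set of IDs."""
    return {sid for (sid, code) in student_course if code == course_code}

step1 = taking("ICT252")          # students taking ICT252
step2 = taking("ICT231")          # students taking ICT231
only_ict252 = step1 - step2       # set difference

print(sorted(step1), sorted(step2), sorted(only_ict252))
```

With the sample data, students 100 and 103 take both courses, so student 101 is the only one left in the difference.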
5.1.5 Cartesian Product
We already talked about the Cartesian product of domains. Remember that a relation is a subset of the Cartesian product of a set of domains.
In the same way, we can combine 2 relations with a Cartesian product operator. The result is a relation that has all the attributes from both relations and whose tuples are all the possible combinations of the tuples from each of the 2 relations.
The operator is ×, e.g. r_1 × r_2
We could have the same attribute name in the 2 relations, so we have to make sure we can distinguish between the attributes in the resulting relation.
r = student × programme
The attributes in r are:
(student.stud_id, student.stud_firstname, student.stud_fathersname, student.prog_code, programme.prog_code, programme.prog_name, programme.prog_desc)
We use the relation name as a prefix to indicate which schema each attribute comes from. But if the name occurs in one of the relations only, we can omit the relation name. Here, only prog_code occurs in both, so we have to put the relation names in front of those 2 attributes.
This also means that the argument relations must have different names.
r contains tuples for every possible pair of student & programme tuples, e.g. (based on data in handout 2). So if we have n tuples in student and m tuples in programme, r contains (n × m) tuples.
student × programme
stud_id  stud_firstname  stud_fathersname  student.prog_code  programme.prog_code  prog_name                 prog_desc
100      Sara            Negash            CSDEG              CSDEG                Computer Science Degree   3 Year Degree in Computer Science
100      Sara            Negash            CSDEG              CSDIP                Computer Science Diploma  Diploma in Computer Science
101      Tekle           Haimanot          CSDEG              CSDEG                Computer Science Degree   3 Year Degree in Computer Science
101      Tekle           Haimanot          CSDEG              CSDIP                Computer Science Diploma  Diploma in Computer Science
102      Terhas          Girma             CSDEG              CSDEG                Computer Science Degree   3 Year Degree in Computer Science
102      Terhas          Girma             CSDEG              CSDIP                Computer Science Diploma  Diploma in Computer Science
103      Solomon         Kebede            CSDIP              CSDEG                Computer Science Degree   3 Year Degree in Computer Science
103      Solomon         Kebede            CSDIP              CSDIP                Computer Science Diploma  Diploma in Computer Science
OR, give a simple example:
r1 (R1-Schema)
R1-Schema = (A, B, C)
A  B  C
1
2
r2 (R2-Schema)
R2-Schema = (D, E)
We can end up with a lot of tuples in the result, depending on how many tuples there are in the argument relations.
In some tuples, student.prog_code = programme.prog_code, but in others they are not equal:
t[student.prog_code] ≠ t[programme.prog_code]
If we do r_1 × r_2, the schema for the resulting relation r is a concatenation of the schemas R_1 and R_2.
What use is this operation?
We can use it to answer questions like 'get a list of all students on the Computer Science Degree programme'. If we do not know the prog_code, we can use the select operator to get only those tuples where the prog_name is 'Computer Science Degree', from the Cartesian product relation.
σ_{prog_name = "Computer Science Degree"}(student × programme)
How many tuples will this get?
A: 4, because each student tuple has a tuple that has prog_name = "Computer Science Degree" in the Cartesian product.
But is this correct? How many students are doing the degree programme?
A: 3. We can see this in the table, but how can we write an algebra expression to do this, using another operation?
Elicit from students: we need to find only those tuples where ... what is true?
A: where the prog_code matches in both relations
Use another select to do this:
σ_{student.prog_code = programme.prog_code}(σ_{prog_name = "Computer Science Degree"}(student × programme))
Note that we prefix with the relation name for prog_code but not for prog_name, because prog_name occurs only in the programme relation.
Now how many tuples do we have? 3, because only 3 have matching values.
One more step: the resulting relation has all the attributes in it, so we can use a projection to get only the student name attributes:
r = π_{stud_firstname, stud_fathersname}(σ_{student.prog_code = programme.prog_code}(σ_{prog_name = "Computer Science Degree"}(student × programme)))
Exercise:
Write an expression to get a list of all students taking the course ICT252. The list should show each student's full name.
A:
r = π_{stud_firstname, stud_fathersname}(σ_{student.stud_id = studentCourse.stud_id}(σ_{course_code = "ICT252"}(student × studentCourse)))
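The tuple counts in the worked example above (8, then 4, then 3) can be reproduced with a short Python sketch. Tuples are plain Python tuples here and attributes are picked out by position, which is an assumption of this sketch and not how a DBMS names them.

```python
from itertools import product

# (stud_id, stud_firstname, stud_fathersname, prog_code)
student = [
    (100, "Sara", "Negash", "CSDEG"),
    (101, "Tekle", "Haimanot", "CSDEG"),
    (102, "Terhas", "Girma", "CSDEG"),
    (103, "Solomon", "Kebede", "CSDIP"),
]
# (prog_code, prog_name)
programme = [
    ("CSDEG", "Computer Science Degree"),
    ("CSDIP", "Computer Science Diploma"),
]

# student x programme: every student tuple paired with every programme tuple
pairs = list(product(student, programme))

# sigma_{prog_name = "Computer Science Degree"}
by_name = [(s, p) for (s, p) in pairs if p[1] == "Computer Science Degree"]

# sigma_{student.prog_code = programme.prog_code}
matched = [(s, p) for (s, p) in by_name if s[3] == p[0]]

# pi_{stud_firstname, stud_fathersname}
names = [(s[1], s[2]) for (s, p) in matched]

print(len(pairs), len(by_name), len(matched), names)
```

The first select alone still keeps Solomon Kebede paired with the degree programme; it is the second select, on matching prog_code values, that removes that meaningless pairing.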
5.1.6 Rename
As we write more complex expressions, it would be nice not to have to write out long relation names like 'studentCourse'.
We can rename a result to a shorter name, using the rename operator ρ.
This is also useful if you want to do a Cartesian product of a relation with itself, as the two operands must have different names.
To rename the relation given by an expression E to the name x:
ρ_{x}(E)
For example:
ρ_{csdeg_students}(σ_{programme_code = "CSDEG"}(student))
The operation returns the relation given by the expression, and the relation is named csdeg_students. We can now use that name in further operations.
To find any students named Solomon who are on the CSDEG programme:
σ_{stud_firstname = "Solomon"}(ρ_{csdeg_students}(σ_{programme_code = "CSDEG"}(student)))
Or, we can simply rename a relation, as a relation is itself a trivial relational algebra expression:
ρ_{student2}(student)
We can also rename the attributes in the relation, using syntax like this:
ρ_{x(A1, A2, …, An)}(E)
if the expression E has arity n (n attributes); A1 is a name for the first attribute, An is a name for the n-th attribute.
Example: find the highest credit hours (this is a trivial example as we can easily see from the data, but an equivalent question might be 'find the account with the highest balance', which would be more difficult to see).
1. Make a relation containing all courses that do not have the highest credit hours, i.e. where the credit hours value is less than the credit hours value in some other tuple in the relation.
2. Do a set difference between all courses and those that are in the relation made in step 1.
This means we are doing an operation that requires 2 operands, but where the 2 operands are based on the same relation.
Step 1:
[use different colours to show each new bit you add to this expression]
σ_{course.credit_hours < ...}
To complete the comparison, we have to have a different relation containing the same attributes, so we make a new relation that is just the course relation renamed:
ρ_{c}(course)
Now we can do a Cartesian product of course with the renamed relation:
σ_{course.credit_hours < c.credit_hours}(course × ρ_{c}(course))
Now let's project to get just the credit_hours attribute from the course relation, because we have selected those tuples where course.credit_hours is less than c.credit_hours:
π_{course.credit_hours}(σ_{course.credit_hours < c.credit_hours}(course × ρ_{c}(course)))
Step 2:
Now we have a relation that has one attribute, credit_hours. Each tuple represents some course that has credit_hours less than some other course, i.e. all courses that do not have the highest credit hours.
So if we now do a set difference of this relation from all credit_hours, whatever is remaining must be the highest credit_hours. Remember that for set difference, the argument relations must be union-compatible.
π_{course.credit_hours}(course) − π_{course.credit_hours}(σ_{course.credit_hours < c.credit_hours}(course × ρ_{c}(course)))
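The same two steps can be followed in a Python sketch on the sample course data. Giving the relation a second name (c below) plays the part of the rename operator; representing tuples as (course_code, credit_hours) pairs is an assumption of this sketch.

```python
from itertools import product

# (course_code, credit_hours)
course = [("ICT252", 4), ("ICT231", 3)]

# rho_c(course): the same relation under a second name
c = list(course)

# Step 1: pi_{course.credit_hours}(sigma_{course.credit_hours < c.credit_hours}(course x c))
not_highest = {a[1] for (a, b) in product(course, c) if a[1] < b[1]}

# Step 2: pi_{credit_hours}(course) - (result of step 1)
all_hours = {a[1] for a in course}
highest = all_hours - not_highest

print(not_highest, highest)
```

Both operands of the difference are sets of single credit_hours values, so they are union-compatible, and what survives is the value that is not less than any other.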
5.2 Additional Operations
The basic relational algebra operators that we have just looked at are sufficient to express any relational algebra query.
But even with these, some types of common query that we make on relations become complex and lengthy to express.
To make some of these easier, there are some additional operations that simplify some common types of query.
These are:
- Set intersection
- Natural join
- Division
- Assignment
Each one of these can be expressed in terms of the basic operations also.
5.2.1 Set Intersection
Suppose we want to find out what students are taking the course ICT252 and the course ICT231.
This can be viewed as the intersection of 2 sets or relations:
π_{stud_id}(σ_{course_code = "ICT252"}(studentCourse)) ∩ π_{stud_id}(σ_{course_code = "ICT231"}(studentCourse))
∩ is the set intersection operator.
r ∩ s can be written using the basic operations like this:
r − (r − s)
(r − s) gives tuples that are in r but not in s.
If we take those tuples away from r, we are left with only tuples that are in r and in s.
It is easier just to use ∩.
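We can check the identity r ∩ s = r − (r − s) on the sample data with Python sets. The helper name taking is an assumption for this sketch.

```python
student_course = {
    (100, "ICT252"), (101, "ICT252"), (100, "ICT231"),
    (102, "ICT231"), (103, "ICT252"), (103, "ICT231"),
}

def taking(course_code):
    """IDs of the students registered for the given course."""
    return {sid for (sid, code) in student_course if code == course_code}

r = taking("ICT252")
s = taking("ICT231")

direct = r & s            # set intersection
via_basic = r - (r - s)   # the same thing using set difference only

print(sorted(direct), sorted(via_basic))
```

Both expressions give the students who take both courses, which is why intersection counts as an additional operation and not a basic one.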
5.2.2 Natural Join
Think back to the Cartesian product operation. Usually, to get something meaningful from the results, we do a select operation with some predicate on the Cartesian product result, because we are looking for matching values in the tuples.
For example, we did this operation to get a list of students who are taking the Computer Science Degree programme (assuming we do not know the prog_code for it):
π_{stud_firstname, stud_fathersname}(σ_{student.prog_code = programme.prog_code}(σ_{prog_name = "Computer Science Degree"}(student × programme)))
We can do this in a simpler expression using the natural join operator (⋈, sometimes typed as |x|), like this:
π_{stud_firstname, stud_fathersname}(σ_{prog_name = "Computer Science Degree"}(student ⋈ programme))
The natural join does a Cartesian product but considers only the pairs of tuples where the attribute that appears in both schemas has equal values. It also removes one occurrence of the attribute that appears in both.
So, in this case only tuples where student.prog_code = programme.prog_code are considered. The resulting relation has the prog_code once only, because one occurrence is removed.
If we want just a list of students and the programmes they are on, we can do this:
π_{stud_firstname, stud_fathersname, prog_code, prog_name}(student ⋈ programme)
Here, we don't need the select on the prog_name = "Computer Science Degree" predicate. We also do not have to prefix prog_code with a relation name, as the natural join operator has already removed one of the prog_code columns.
This operator depends on the matching attribute having the same name in both relation schemas.
We can break the natural join down into 3 steps. These are the 3 steps we had to do before to get the same result, using the basic operations:
- Cartesian product to combine the tuples
- Select to remove tuples where the common attribute does not have matching values
- Project to get only the attributes we are interested in (usually also to remove one of the matching attributes)
There is also a formal definition of the natural join:
Consider that we operate on two relations, r and s. r has relation schema R and s has relation schema S.
If we consider the relation schemas to be sets of attributes:
R ∩ S is the intersection of the sets, i.e. the attributes that appear in both schemas.
R ∪ S is the union of the sets, i.e. the attributes that appear in R, in S or in both.
So, we can say that
r ⋈ s = π_{R ∪ S}(σ_{r.A_1 = s.A_1 ∧ r.A_2 = s.A_2 ∧ … ∧ r.A_n = s.A_n}(r × s))
where R ∩ S = {A_1, A_2, …, A_n}.
Note that this shows that we can do a natural join on more than one attribute: there could be 2 or more attributes in the relations that have matching values in them.
Aside: in your labs, you have been doing queries using the MS Access Query
Designer. You have done some queries with 2 tables, where there is a relationship
between the tables. The query in this case is a join between the 2 tables.
If the matching columns have the same name in the 2 relations and if the query
removes one of the matching columns from the result, it is a natural join operation.
A more general name for this type of query is equi-join, where the join is based on
equality between rows in one or more attributes. In an equi-join, the matching
attributes do not have to have the same name.
Examples:
1. Joining more than 2 relations
Suppose we want to get a list of all Computer Science Degree students, showing the
student name, course codes and course names for the courses they are taking. We need
to join the student, studentCourse and course relations.
We can do a natural join between student and studentCourse and then natural join the
result to course:
    π_{stud_firstname, stud_fathersname, course_code, course_name}(σ_{prog_code = 'CSDEG'}(student ⋈ studentCourse ⋈ course))
We could write (student ⋈ studentCourse) ⋈ course or student ⋈ (studentCourse ⋈
course) and get the same result, because the order in which the operations are
executed does not matter. We say that the natural join is associative as an operation.
2. Using a join instead of set intersection
Remember our example for set intersection: find the IDs of all students who are
taking both the ICT252 and ICT231 courses:
    π_{stud_id}(σ_{course_code = 'ICT252'}(studentCourse)) ∩ π_{stud_id}(σ_{course_code = 'ICT231'}(studentCourse))
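This can also be expressed as a natural join:
    π_{stud_id}(σ_{course_code = 'ICT252'}(studentCourse)) ⋈ π_{stud_id}(σ_{course_code = 'ICT231'}(studentCourse))
Each side is projected onto stud_id before the join, so that stud_id is the only
common attribute. If course_code were still present on both sides, the natural join
would also require the course_code values to match and the result would be empty.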
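A small Python check of this equivalence, as a sketch only. It assumes the studentCourse relation is held as a set of (stud_id, course_code) pairs; the tuples are the illustrative sample values used in these notes, and the helper name ids_taking is hypothetical.

```python
# Illustrative studentCourse tuples: (stud_id, course_code).
student_course = {
    (100, "ICT252"), (101, "ICT252"), (100, "ICT231"), (102, "ICT231"),
    (103, "ICT252"), (103, "ICT231"), (102, "ICT252"),
}

def ids_taking(course_code):
    """Projection on stud_id of the selection course_code = <course_code>."""
    return {sid for sid, code in student_course if code == course_code}

# Set intersection of the two projections.
by_intersection = ids_taking("ICT252") & ids_taking("ICT231")

# Natural join of two single-attribute relations on stud_id:
# a pair survives only when the stud_id values are equal.
by_join = {a for a in ids_taking("ICT252") for b in ids_taking("ICT231") if a == b}
```

Both expressions give the same set of student IDs, because joining two relations that have exactly the same schema keeps only the tuples that appear in both.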
3. Joining relations that do not have a common attribute
If we join 2 relations that do not have a common attribute, the result of the natural join
is the same as that of the Cartesian product:
    r ⋈ s = r × s   when R ∩ S = ∅ (empty set)
4. Combine a selection and a Cartesian product into a single operation. Earlier, we
did this:
    π_{stud_firstname, stud_fathersname}(σ_{prog_name = 'Computer Science Degree'}(student ⋈ programme))
We can combine the selection predicate with the join:
    r ⋈_θ s = σ_θ(r × s)
where θ (theta) is a predicate on attributes in the schema R ∪ S.
Because the theta join starts from the Cartesian product, not the natural join, the
predicate must also state the equality on prog_code. So we can do:
    π_{stud_firstname, stud_fathersname}(student ⋈_{student.prog_code = programme.prog_code ∧ prog_name = 'Computer Science Degree'} programme)
The ⋈_θ operator is called a theta join.
5.2.3 Division
The division operation is used to answer queries like 'find courses that are being taken
by all students on the CSDEG programme' or 'find students who are taking all
courses'.
In contrast, a natural join can find courses that are being taken by any student.
Example: to find courses that are being taken by all students on the CSDEG
programme
    studentCourse ÷ (π_{stud_id}(σ_{prog_code = 'CSDEG'}(student)))
Refer to your Handout 2, the studentCourse sample data. But first, add a new tuple to
it: (102, ICT252).
studentCourse
stud_id   course_code
100       ICT252
101       ICT252
100       ICT231
102       ICT231
103       ICT252
103       ICT231
102       ICT252
What tuples are in π_{stud_id}(σ_{prog_code = 'CSDEG'}(student))?
Answer:
stud_id
100
101
102
Now if we 'divide' studentCourse by this, we will get a result that has only the
attribute course_code. Is there any course_code that has a tuple for every one of {100,
101, 102}?
A: ICT252.
When dividing, we are looking for values in the studentCourse.course_code column
that are associated with every value in the stud_id column of the relation resulting
from selecting with the predicate prog_code = 'CSDEG' on student.
To make it more clear, we should add a projection to the relation on the left, to show
what attributes it has:
    π_{stud_id, course_code}(studentCourse) ÷ (π_{stud_id}(σ_{prog_code = 'CSDEG'}(student)))
Formal definition:
r(R) and s(S) are relations with schemas R and S.
S ⊆ R, that is, every attribute in schema S is also in schema R.
The relation r ÷ s is a relation on the schema R − S (all attributes in schema R that are
not in schema S).
For this example: the result has only the attribute course_code.
A tuple t is in the resulting relation if these 2 conditions hold:
1. t is in π_{R−S}(r)
2. for every tuple t_s in s, there is a tuple t_r in r that satisfies both of the following:
   a. t_r[S] = t_s[S]
   b. t_r[R − S] = t
For this example:
Condition 1: π_{R−S}(r) gives all the course_code values in studentCourse: {ICT252,
ICT252, ICT231, ICT231, ICT252, ICT231, ICT252}, but duplicate tuples are
eliminated by a projection, so we have {ICT252, ICT231}.
Condition 2:
We have {100, 101, 102} in our s relation.
Part (a): For every tuple t_s in s, there is a tuple t_r in r that satisfies:
    t_r[S] = t_s[S]
In other words, we want to see whether there is a matching row in studentCourse for
every stud_id in s (for every course_code value in the set {ICT252, ICT231}).
For every tuple t_s in s: this is the set of tuples {100, 101, 102}.
100: the tuples t_r with t_r[S] = 100 are {<100, ICT252>, <100, ICT231>}
101: the tuples t_r with t_r[S] = 101 are {<101, ICT252>}
102: the tuples t_r with t_r[S] = 102 are {<102, ICT231>, <102, ICT252>}
To fulfil part (b), we have to take the above tuples and check that t_r[R − S] = t, where
t is in {ICT252, ICT231}.
For 100:
    <100, ICT252>[R − S] = <ICT252>
    there is a matching tuple t_r[R − S] for every t
For 101:
    there is NOT a matching tuple t_r[R − S] for every t: ICT231 does not have a match.
    So we must eliminate ICT231 from the set of tuples t in π_{R−S}(r).
For 102:
    there is a matching tuple t_r[R − S] for every t
Express r ÷ s using the basic operations
    r ÷ s = π_{R−S}(r) − π_{R−S}((π_{R−S}(r) × s) − π_{R−S,S}(r))
Let's break this down.
The steps are:
1. Get all the course_code values
2. Eliminate course_code values that do not have all possible student-course
combinations in studentCourse. We do it like this:
    π_{R−S}(r) × s
gives us every possible pairing of course_code and stud_id
course_code   stud_id
ICT252        100
ICT252        101
ICT252        102
ICT231        100
ICT231        101
ICT231        102
    π_{R−S,S}(r)
gives us the tuples in r, but with the course_code attribute first, then stud_id (to do set
difference, the relations must be union compatible: same domain for the i-th attribute
in each relation).
course_code   stud_id
ICT252        100
ICT252        101
ICT231        100
ICT231        102
ICT252        103
ICT231        103
ICT252        102
Set difference of the 2 above leaves the tuple (ICT231, 101), the only tuple in the top
relation that is not in the bottom relation.
This gives us the possible combinations of tuples (from the Cartesian product of
course_code and stud_id) that do not actually appear in studentCourse.
Doing the projection π_{R−S} on this leaves the course_code only (ICT231).
Now do set difference between π_{R−S}(r) and (ICT231) to find the course_code
values that have all possible combinations existing in studentCourse.
The tuples for π_{R−S}(r) are (ICT252, ICT231).
The set difference is ICT252.
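The same breakdown can be followed step by step in Python. This is a minimal sketch, assuming r is held as a set of (stud_id, course_code) pairs and s as a set of stud_id values; the function name divide and the variable names are hypothetical.

```python
# r: studentCourse as (stud_id, course_code) pairs, including the added tuple (102, ICT252).
student_course = {
    (100, "ICT252"), (101, "ICT252"), (100, "ICT231"), (102, "ICT231"),
    (103, "ICT252"), (103, "ICT231"), (102, "ICT252"),
}
# s: the stud_id values of the CSDEG students.
csdeg_ids = {100, 101, 102}

def divide(r, s):
    """r ÷ s, built only from projection, Cartesian product and set difference."""
    courses = {code for _sid, code in r}                    # projection on R-S of r
    all_pairs = {(code, sid) for code in courses for sid in s}  # every possible pairing
    actual = {(code, sid) for sid, code in r}               # r with course_code first
    missing = all_pairs - actual                            # pairings that never occur
    disqualified = {code for code, _sid in missing}         # projection on R-S of those
    return courses - disqualified

result = divide(student_course, csdeg_ids)
```

With the sample data, the only missing pairing is (ICT231, 101), so ICT231 is removed and ICT252 remains.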
5.2.4 Assignment
It would be nice if we could assign the result of an operation to a variable and then
use that variable in subsequent expressions, like in programming languages.
Well, we can assign an expression to a temporary relation variable, like this:
    tempRel1 ← E
This assigns the result of the expression E to the relation variable tempRel1. The
variable can then be used in subsequent expressions.
Let's take one of the longer expressions we have used as an example:
    π_{stud_firstname, stud_fathersname}(σ_{student.prog_code = programme.prog_code}(σ_{prog_name = 'Computer Science Degree'}(student × programme)))
To make it easier to read, we can do an assignment or two:
    csdegStudents ← σ_{student.prog_code = programme.prog_code}(σ_{prog_name = 'Computer Science Degree'}(student × programme))
    result ← π_{stud_firstname, stud_fathersname}(csdegStudents)
This means that a query can be written as a sequential program consisting of a series
of assignments, followed by an expression whose value is displayed as the result of
the query. It also makes it easier to express complex queries.
In relational algebra queries, assignment must always be made to a temporary
variable. An assignment to a permanent, pre-existing relation would be a modification
to the database that changes that relation.
5.3 Extended Relational-Algebra Operations
We have now looked at basic operations and some additional operations that are
defined in terms of the basic operations.
For working with relational databases, the relational algebra has had some extensions
added to it.
5.3.1 Generalised Projection
The projection operation has been extended to allow arithmetic functions to be used in
the projection list.
For example, suppose we had a different schema for the course relation, like this:
    Course-Schema = (course_code, course_name, course_desc, theory_hours, lab_hours)
The formula for calculating credit hours is theory_hours + (lab_hours/2).
So we could do a projection on the course relation like this:
    π_{course_code, course_name, theory_hours + (lab_hours/2)}(course)
This gives us the credit hours for each course. The credit hours attribute in the
resulting relation does not have a name. We can give it a name using 'as', like this:
    π_{course_code, course_name, theory_hours + (lab_hours/2) as credit_hours}(course)
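A short Python sketch of the named generalised projection. The course tuples, including the course names and hour values, are hypothetical sample data chosen only for illustration.

```python
# Hypothetical course tuples; theory_hours and lab_hours are invented sample values.
course = [
    {"course_code": "ICT252", "course_name": "Theory of Databases",
     "theory_hours": 3, "lab_hours": 2},
    {"course_code": "ICT231", "course_name": "Example Course",
     "theory_hours": 2, "lab_hours": 2},
]

# Projection list: course_code, course_name, theory_hours + (lab_hours/2) as credit_hours.
projected = [
    {
        "course_code": c["course_code"],
        "course_name": c["course_name"],
        "credit_hours": c["theory_hours"] + c["lab_hours"] / 2,
    }
    for c in course
]
```

The key "credit_hours" plays the role of the 'as' clause: it names the computed attribute in the result.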
5.3.2 Aggregate Functions
Sometimes it is useful to be able to calculate an aggregate, or summary, value for
something. For example, to find out the total credit hours being taken by each student.
For this purpose, there are a number of aggregate functions defined, like sum, avg,
max, min, count.
These can be used to get an aggregate value for some attribute.
An aggregate function operates on a set of values. For example, we can apply the sum
function to the set of values
    {4, 2, 5, 1}
The result is 12.
Applying the avg function returns a value of 3.
The set can have the same value appearing more than once, e.g. the set could be
    {4, 2, 4, 5, 1}
With relations, we can apply aggregate functions to all the values in a particular
attribute.
Example 1: sum all the credit hours (this does not really make sense, but an
equivalent operation would be to get a sum of all employee salaries).
We use a G in calligraphic font for this (calligraphic G):
    𝒢_{sum(credit_hours)}(course)
This would give a result of 10.
A more useful thing would be to find the total number of credit_hours to be taken by
each student. To do this, we can first get a relation that has attributes stud_id,
course_code, credit_hours. Then we can partition the relation into groups based on the
stud_id and apply the sum function to each group.
Example 2: find the total credit_hours to be taken by each student.
First, we need to do a natural join of course and studentCourse and project to get the
attributes we want:
    π_{stud_id, course_code, credit_hours}(course ⋈ studentCourse)
This is now the argument relation for the aggregate. We must also specify what
attribute to group on: stud_id, as a subscript at the left of the operator:
    _{stud_id}𝒢_{sum(credit_hours)}(π_{stud_id, course_code, credit_hours}(course ⋈ studentCourse))
The groups look like this:
stud_id   course_code   credit_hours
100       ICT252        4
100       ICT231        3
101       ICT252        4
102       ICT231        3
102       ICT252        4
103       ICT252        4
103       ICT231        3
The resulting relation would look like this:
stud_id   Sum of credit_hours
100       7
101       4
102       7
103       7
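The partition-then-sum idea can be sketched in Python as follows, assuming the argument relation is held as a list of (stud_id, course_code, credit_hours) tuples taken from the table above.

```python
from collections import defaultdict

# Argument relation: (stud_id, course_code, credit_hours), as in the groups table.
rows = [
    (100, "ICT252", 4), (100, "ICT231", 3), (101, "ICT252", 4),
    (102, "ICT231", 3), (102, "ICT252", 4), (103, "ICT252", 4),
    (103, "ICT231", 3),
]

# Partition on stud_id and apply sum to the credit_hours of each group.
totals = defaultdict(int)
for stud_id, _course_code, credit_hours in rows:
    totals[stud_id] += credit_hours

# One result tuple per group: (stud_id, sum of credit_hours).
result = sorted(totals.items())
```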
When an aggregate is applied over a set of values that includes duplicates, there are
cases where we want to eliminate the duplicates. For example, the count function
counts the number of tuples in the relation:
    𝒢_{count(course_code)}(studentCourse)
This would return a tuple with the value 7, as there are 7 tuples in the studentCourse
relation.
But if we want to count how many courses are being taken by students overall, we
do not want to count duplicate values of course_code more than once. To do this, we
use a distinct aggregate, like this:
Example 3: get the count of how many courses are being taken by students.
    𝒢_{count-distinct(course_code)}(studentCourse)
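The difference between count and count-distinct can be seen in a few lines of Python, assuming the course_code column of the sample studentCourse relation is held as a list (which keeps duplicates).

```python
# The course_code column of the sample studentCourse relation, duplicates included.
course_codes = ["ICT252", "ICT252", "ICT231", "ICT231", "ICT252", "ICT231", "ICT252"]

# count: one per tuple, duplicates included.
count_all = len(course_codes)

# count-distinct: duplicates are eliminated before counting.
count_distinct = len(set(course_codes))
```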
The general form of the aggregate operation 𝒢 is:
    _{G1, G2, …, Gn}𝒢_{F1(A1), F2(A2), …, Fm(Am)}(E)
where E is any relational-algebra expression;
G1, G2, …, Gn are a list of attributes on which to group (we can group on more than
one attribute);
each Fi is an aggregate function;
and each Ai is an attribute name.
The tuples in E are partitioned into groups such that
- all tuples in a group have the same values for G1, G2, …, Gn
and
- tuples in different groups have different values for G1, G2, …, Gn.
So, the groups can be identified by the values of the attributes G1, G2, …, Gn.
For each group, there is a tuple in the result:
    (g1, g2, …, gn, a1, a2, …, am)
where, for each i, ai is the result of applying the aggregate function Fi on the set of
values for the attribute Ai in the group.
[illustrate by showing in the previous example]
A special case is when the list of attributes G1, G2, …, Gn is empty. Then there is only
one single group that contains all the tuples in E. This is aggregation without
grouping.
Example 4: use two aggregate functions. Find the total credit hours for each student
and the maximum credit hours each student has.
    _{stud_id}𝒢_{sum(credit_hours), max(credit_hours)}(π_{stud_id, course_code, credit_hours}(course ⋈ studentCourse))
Ask students to write what the result would look like.
The resulting relation would look like this:
stud_id   Sum of credit_hours   Max of credit_hours
100       7                     4
101       4                     4
102       7                     4
103       7                     4
As we did in generalised projection, we can give a name for the new attributes:
    _{stud_id}𝒢_{sum(credit_hours) as sum-credit_hours, max(credit_hours) as max-credit_hours}(π_{stud_id, course_code, credit_hours}(course ⋈ studentCourse))
5.3.3 Outer Join
The outer join operation extends the natural join operation to deal with missing
information. For example, if we want to get a list of all registered students and the
courses they are taking, we would do a natural join:
    student ⋈ studentCourse
Now, let us add a new student tuple:
stud_id   stud_firstname   stud_fathersname   prog_code
104       Fikir            Yohannis           CSDEG
Will this tuple appear in the result relation of the natural join?
A: no, because it does not have a matching tuple in studentCourse.
So, sometimes, we want to do a join where we get not only the tuples that have
matches (the natural join), but also the tuples that do not have a match.
There are 3 types of outer join: left outer join, right outer join and full outer join.
In each form, the join computes a natural join and then adds some extra tuples to the
result.
Left outer join: adds all tuples from the left relation that did not have a matching tuple
in the right relation. The attributes of the right relation are filled with null values for
the non-matched tuples in the result.
Right outer join: is the opposite of the left outer join. It adds all tuples from the
right relation that did not have a matching tuple in the left relation. The attributes of
the left relation are filled with null values for the non-matched tuples.
Full outer join: does both the left and right outer join operations, filling the
non-matched tuples from both sides with null values.
Left outer join:
    student ⟕ studentCourse
The result has all the matching tuples and also a tuple for the stud_id 104, with a
null value for the course_code.
stud_id   stud_firstname   stud_fathersname   prog_code   course_code
100       Sara             Negash             CSDEG       ICT252
100       Sara             Negash             CSDEG       ICT231
101       Tekle            Haimanot           CSDEG       ICT252
102       Terhas           Girma              CSDEG       ICT231
102       Terhas           Girma              CSDEG       ICT252
103       Solomon          Kebede             CSDIP       ICT252
103       Solomon          Kebede             CSDIP       ICT231
104       Fikir            Yohannis           CSDEG       null
To illustrate a right outer join with the same 2 relations, suppose that our relations did
not always enforce the referential integrity for the stud_id foreign key and the
studentCourse relation has a tuple:
stud_id   course_code
200       ICT252
Right outer join:
    student ⟖ studentCourse
The result for this one is:
stud_id   stud_firstname   stud_fathersname   prog_code   course_code
100       Sara             Negash             CSDEG       ICT252
100       Sara             Negash             CSDEG       ICT231
101       Tekle            Haimanot           CSDEG       ICT252
102       Terhas           Girma              CSDEG       ICT231
102       Terhas           Girma              CSDEG       ICT252
103       Solomon          Kebede             CSDIP       ICT252
103       Solomon          Kebede             CSDIP       ICT231
200       null             null               null        ICT252
Note that the tuple for stud_id 104 is not in this result because it does not have a
matching tuple in studentCourse and it is in the left relation, not the right relation.
But if we do a full outer join, we will have it and the stud_id 200 tuple:
    student ⟗ studentCourse
stud_id   stud_firstname   stud_fathersname   prog_code   course_code
100       Sara             Negash             CSDEG       ICT252
100       Sara             Negash             CSDEG       ICT231
101       Tekle            Haimanot           CSDEG       ICT252
102       Terhas           Girma              CSDEG       ICT231
102       Terhas           Girma              CSDEG       ICT252
103       Solomon          Kebede             CSDIP       ICT252
103       Solomon          Kebede             CSDIP       ICT231
104       Fikir            Yohannis           CSDEG       null
200       null             null               null        ICT252
Aside: In relational databases, outer joins are often used to check for missing or bad
data. For example, if the referential integrity on the stud_id foreign key in
studentCourse was missing for some reason, you could do an outer join to check for
rows that were added with stud_id values for students that do not exist in the database.
In database terminology, such rows are sometimes called orphans because they are
child records of a 1-to-many relationship that do not have a corresponding parent
record.
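The three outer joins, and the orphan check described in the aside, can be sketched in Python. This is a simplified illustration, assuming relations are lists of dictionaries and the join is on a single named attribute; the function outer_join and its parameters are hypothetical, None stands for null, and the data is a cut-down version of the sample relations.

```python
student = [
    {"stud_id": 100, "stud_firstname": "Sara"},
    {"stud_id": 104, "stud_firstname": "Fikir"},
]
student_course = [
    {"stud_id": 100, "course_code": "ICT252"},
    {"stud_id": 200, "course_code": "ICT252"},   # no such student: an orphan
]

def outer_join(left, right, key, keep_left=True, keep_right=True):
    """Join on one attribute, padding non-matched tuples with None (null)."""
    left_attrs = {a for row in left for a in row}
    right_attrs = {a for row in right for a in row}
    result, matched_right = [], set()
    for l in left:
        matches = [(i, r) for i, r in enumerate(right) if r[key] == l[key]]
        for i, r in matches:
            matched_right.add(i)
            result.append({**l, **r})
        if not matches and keep_left:
            # Left tuple with no match: right-hand attributes become null.
            result.append({**dict.fromkeys(right_attrs), **l})
    if keep_right:
        for i, r in enumerate(right):
            if i not in matched_right:
                # Right tuple with no match: left-hand attributes become null.
                result.append({**dict.fromkeys(left_attrs), **r})
    return result

left_join = outer_join(student, student_course, "stud_id", keep_right=False)
right_join = outer_join(student, student_course, "stud_id", keep_left=False)
full_join = outer_join(student, student_course, "stud_id")

# Orphans: studentCourse rows whose student attributes came back as null.
orphans = [row for row in right_join if row["stud_firstname"] is None]
```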
5.4 Database Modifications
So far, all the relational algebra we have looked at is to retrieve data from relations.
In terms of SQL, all of the operations we have looked at can be done using the SQL
Select query statement. It has different clauses and keywords for selection,
projection, aggregate functions, joins, set difference and rename.
In a relational database, we also need to be able to modify the data in the relations.
There are 3 types of modification.
Ask class: what are they?
A: insert, update, delete.
In SQL, there are different query statements for these: insert, update, delete.
We'll look briefly at the relational algebra operations for each of these.
We can use the assignment operator to assign the result of an expression to an existing
relation.
5.4.1 Deletion
To delete selected tuples from the database.
Can only delete whole tuples; cannot delete values only from particular attributes.
    r ← r − E
where r is a relation and E is a relational-algebra query or expression.
Example: delete all the records showing students taking course code ICT231
    studentCourse ← studentCourse − σ_{course_code = 'ICT231'}(studentCourse)
Exercise: delete all records for the student whose ID is 102.
Allow students to try themselves first.
Need to delete from 2 relations: student and studentCourse.
    studentCourse ← studentCourse − σ_{stud_id = 102}(studentCourse)
    student ← student − σ_{stud_id = 102}(student)
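Deletion as "assign back the set difference" maps directly onto Python sets. A minimal sketch, assuming studentCourse is held as a set of (stud_id, course_code) pairs from the sample data:

```python
student_course = {
    (100, "ICT252"), (101, "ICT252"), (100, "ICT231"), (102, "ICT231"),
    (103, "ICT252"), (103, "ICT231"), (102, "ICT252"),
}

# E: the selection course_code = 'ICT231' on studentCourse.
to_delete = {t for t in student_course if t[1] == "ICT231"}

# r <- r - E: whole tuples are removed, never single attribute values.
student_course = student_course - to_delete
```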
If you tried to delete from student first, what do you think would happen?
A: if referential integrity is enforced, you might get an error, because the foreign key
values in studentCourse would now not be values existing in student.
When defining a foreign key, you can specify if deletes of the primary key should
cascade to the foreign key.
If delete cascade is enabled, when a tuple in the primary key relation is deleted, any
related tuples in the foreign key relation are also deleted.
In MS Access you can see this option in Relationships when you are defining a
relationship or if you double-click on an existing relationship.
But beware: it is often not a good idea to turn on the cascade delete option, as it means
that you may lose data in the FK relation unintentionally. Better to have it off, and
allow the user or application to first delete the related tuples.
5.4.2 Insertion
Specify a tuple to be inserted or write a query whose result is a set of tuples to be
inserted.
The tuples to be inserted must have the correct arity (number of attributes) for the
relation being inserted into and the values specified must be in the attribute domains.
    r ← r ∪ E
E is a relational algebra expression.
To insert a specified tuple, E is a constant relation containing that tuple.
Example: insert a new student to the student relation, with ID 105, name Ababa
Girma, programme CSDEG.
    student ← student ∪ {(105, 'Ababa', 'Girma', 'CSDEG')}
Exercise: insert a record to show that the new student is taking course ICT231.
    studentCourse ← studentCourse ∪ {(105, 'ICT231')}
Example: insert tuples to studentCourse to have all CSDEG students taking the
ICT311 course.
First, we can project and select on student to get the student IDs we want:
    r1 ← π_{stud_id}(σ_{prog_code = 'CSDEG'}(student))
Second, combine each student ID with the course code ICT311 and put those tuples
into studentCourse:
    studentCourse ← studentCourse ∪ (r1 × {('ICT311')})
This is how we specify a set of tuples to insert into a relation.
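Both kinds of insertion can be sketched with Python sets. The starting contents of student and student_course below are a hypothetical cut-down sample; tuples follow the attribute order used in the notes.

```python
# student: (stud_id, stud_firstname, stud_fathersname, prog_code)
student = {
    (100, "Sara", "Negash", "CSDEG"),
    (101, "Tekle", "Haimanot", "CSDEG"),
    (103, "Solomon", "Kebede", "CSDIP"),
}
# studentCourse: (stud_id, course_code)
student_course = {(100, "ICT252")}

# Insert a specified tuple: E is a constant relation holding that one tuple.
student = student | {(105, "Ababa", "Girma", "CSDEG")}

# r1: stud_id values of all CSDEG students (project on a selection).
r1 = {(t[0],) for t in student if t[3] == "CSDEG"}

# Insert a set of tuples: r1 x {('ICT311')}, unioned into studentCourse.
student_course = student_course | {(sid, code) for (sid,) in r1 for code in {"ICT311"}}
```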
5.4.3 Update
Sometimes, we want to change the value of one or more attributes in a tuple, without
changing all the values in the tuple.
We can update the values in all tuples in the relation or update only some tuples.
Using the employee relation (your handout 3).
Example: increase all Employee salaries by 10%.
To do this, we can project on employee to get all attributes, use an arithmetic function
on the salary attribute and assign the result back to the employee relation.
    employee ← π_{emp_id, emp_firstname, emp_fathersname, emp_phone, salary + (.1 * salary), dept_id}(employee)
But if we wanted to only do the increase for employees in the department that has
dept_id 2, we have to use a select to only change those tuples, and make sure we
assign the changed tuples AND the unchanged tuples back to the relation.
    employee ← π_{emp_id, emp_firstname, emp_fathersname, emp_phone, salary + (.1 * salary), dept_id}(σ_{dept_id = 2}(employee)) ∪ (σ_{dept_id ≠ 2}(employee))
The union is necessary to take the updated tuples and the non-updated tuples.
The general form is:
    r ← π_{F1, F2, …, Fn}(r)
where each Fi is an attribute of r or an expression, involving only constants and
attributes of r, that gives a new value for the attribute.
For updating only a sub-set of tuples, the general form is:
    r ← π_{F1, F2, …, Fn}(σ_P(r)) ∪ (r − σ_P(r))
in other words, a projection on a selection from r, union-ed with the tuples not
included in the selection.
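The selective update can be sketched in Python as a projection over the selected tuples, combined with the tuples that were not selected. The employee tuples are hypothetical sample data with only some of the attributes, and raise_salary is an illustrative helper name.

```python
# Hypothetical employee tuples (only some attributes shown).
employee = [
    {"emp_id": 1, "salary": 1000, "dept_id": 2},
    {"emp_id": 2, "salary": 2000, "dept_id": 1},
]

def raise_salary(t):
    """Generalised projection: every attribute unchanged except salary."""
    return {**t, "salary": t["salary"] + 0.1 * t["salary"]}

# Projection on the selection dept_id = 2 ...
updated = [raise_salary(t) for t in employee if t["dept_id"] == 2]
# ... union-ed with the tuples not included in the selection.
unchanged = [t for t in employee if t["dept_id"] != 2]

employee = updated + unchanged
```

Leaving out the second part would silently delete every employee outside department 2, which is why the union is needed.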
That concludes the unit on relational algebra.
We covered:
- Basic operations: select, project, union, set difference, Cartesian product,
rename
- Additional operations: set intersection, natural join, division, assignment
- Extended operations: generalised projection, aggregate functions, outer join
- Database modifications: delete, insert, update
As you start to learn SQL in the labs, you should be able to make connections
between these operations and SQL statements. We'll talk about SQL again later in the
theory part of the course.
[could do now but doing in lab, and you need to learn E-R for projects, so we'll
move on.]
6 Entity-Relationship Modelling
Objectives: by the end of the unit, students will know how to create E-R models and
use them as a tool for conceptual data modelling; also how to convert an E-R model to
a relational database schema.
Preparation: ER Assistant (on CD-ROM with Mannino book) installed in Lab 1.
Copy Handout 4 for students.
It would also be good to add some more practical exercises to this section, perhaps in
the form of tutorials where students have to create E-R models based on given
information. Use last year's assignment, for example.
Reading Material: Mannino ch. 7, Silberschatz et al ch. 2.
Systems Analysis & Design: my handouts (Handouts 9, 10, 11), but note that these
were written for a different course, so not all material is relevant.
Assignment 1: describe a scenario and students have to create an E-R diagram,
convert it to a relational database schema and write a report on it, including writing 2
simple SQL queries on the database.
6.1 Introduction
E-R is a conceptual data-modelling tool that can be used in the initial stages of
designing a relational database.
Because it is conceptual, it allows you to model entities in the mini-world of the
database system and the links between them in ways that end-users of the proposed
system will understand.
An E-R model is a diagram that is used to show how data is organized in a system.
It usually comes before the relation schema for a database: after completing an E-R
model, you can convert it to a relation schema for a relational database.
Now you know about relational databases, let's learn how to go about making them
from scratch.
Note: in class, we will use simple examples. Read the textbooks for different, more
complex examples.
We will look at E-R modelling in terms of the steps you could follow to create an E-R
model:
1. Identify entity types
2. Identify attributes of the entity types
3. Select the identifier for each entity type
4. Identify relationships between entity types
6.2 Overview - Entities, Attributes, Relationships
An E-R diagram shows
- entities (objects) in the mini-world
- relationships between the entities
- attributes of the entities and of the relationships.
Example: Take for example a personnel system in a company, in which employees
belong to departments and employees are assigned to projects.
Q: what are the entities in this system?
A: Department
   Employee
   Project
Q: what do you think are relationships in this system?
A: A Department has Employees
   An Employee manages projects
   Employees work on projects
Some of the Attributes are:
   A Department has a name and a location
   An Employee has a name, a phone number and a salary
Entities are described by nouns.
Relationships are described by verbs.
Let's begin by drawing an E-R diagram to show the entities, attributes and
relationships in a simple system that has employees and departments only.
Then we'll look at the steps to building this diagram.
There are symbols used in E-R diagrams.
- Entity type: rectangle
- Relationship: connecting line between entity types, with a diamond shape
showing the name of the relationship. The diamond is sometimes omitted. The many
side of a relationship is indicated with something called the Crow's Foot
notation.
- Attributes: sometimes shown in an oval shape connected to the entity type;
sometimes as a list under the entity type name. We will use the list.
[draw this diagram on the board, refer back to it through the class; students can use
this to complete the diagram on their handout 4.]
6.3 Identify Entity Types
Entity: a person, place, object, event or concept in the system.
- has its own identity that distinguishes it from every other entity.
For example, each Employee has a unique ID that distinguishes it from every other
Employee.
Entity type: a collection of entities that share common properties or characteristics.
Each entity type in an E-R model is given a name. The name is singular and is a
simple noun.
So we have an entity type called "Employee"; it is not called "Employees". This is
because the name represents a set of entities.
In an E-R diagram, an entity type is represented by a rectangle, and the name is
indicated in capital letters.
Entity instance: a single occurrence of an entity type, e.g. an employee, a department.
An entity type is described just one time in the data model, but many instances of that
entity type may be represented by data stored in the database.
For example, there may be hundreds or thousands of employees in an organization;
each one is an instance of the Employee entity type.
In data modelling, the term 'entity' is often used to refer to an entity type. The term
'entity instance' is usually used to indicate an instance of the entity.
6.4 Identify Attributes of the Entities
An entity type has a set of attributes (properties or characteristics) associated with
it. An attribute is a fact about the entity that is of interest to the organization or
system.
So, for example, the EMPLOYEE entity has attributes of Employee_ID,
Employee_Name, Salary.
In E-R diagrams, attributes are named with an initial capital letter followed by
lowercase letters.
An attribute can be represented by:
- an ellipse (oval) shape with a line connecting it to the associated entity
or
- by listing them within the entity rectangle, under the entity name.
[Diagram: EMPLOYEE (EmployeeID, Employee_Name, Salary, Address) and DEPARTMENT (DepartmentID, DepartmentName), connected by the Manages and Has relationships]
Add attributes to diagram:
Note that at this stage, we are doing a conceptual model as end-users see the system.
End-users typically think about the name of a person, not the first name and father's
name. Or they look at the address, but do not think about it in terms of postcode,
town, region etc.
6.5 Select Identifiers for Entity Types
Now you need to identify candidate keys for each entity type.
Remember from before: a candidate key is a minimal superkey.
When working with an E-R model, a candidate key for an entity is an attribute or
combination of attributes that uniquely identifies each instance of the entity.
(Before, we talked about a key uniquely identifying each tuple; this is the equivalent
for entities.)
We need to select one candidate key as an identifier for the entity.
Q: Choosing an identifier is like choosing a primary key. If the choices for
Employee are Employee_Name and Address (if we assume that every Employee has a
different address) or Employee_ID, which would you choose and why?
A: Employee_ID, because it is less likely to change over time.
The identifier attribute(s) is (are) underlined in the diagram.
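Whether a proposed identifier is unique and never null can be checked against sample data. The Python sketch below is only an aid for experimenting: the function is_identifier and the employee instances are hypothetical, and passing the check on a sample does not prove that the attribute will stay unique in the real system, nor does it check that the key is minimal.

```python
def is_identifier(rows, attrs):
    """True if attrs are never null and never repeat across the sample rows."""
    seen = set()
    for row in rows:
        key = tuple(row.get(a) for a in attrs)
        if any(v is None for v in key) or key in seen:
            return False
        seen.add(key)
    return True

# Hypothetical instances: two employees sharing a name and an address.
employees = [
    {"Employee_ID": 1, "Employee_Name": "Sara Negash", "Address": "Addis Ababa"},
    {"Employee_ID": 2, "Employee_Name": "Sara Negash", "Address": "Addis Ababa"},
]

id_ok = is_identifier(employees, ["Employee_ID"])
name_address_ok = is_identifier(employees, ["Employee_Name", "Address"])
```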
Identifiers are critical to data integrity in a database, so when selecting identifiers, you
should be careful. Some more guidelines are:
- Choose a candidate key that is guaranteed to always have valid values and not
be null for each instance. It may be necessary to use validation controls in the
DBMS to eliminate the possibility of errors (e.g. nulls not allowed in a
column, validation rules). For a candidate key that includes more than one
attribute, all the parts of the key should be guaranteed to have valid values and
not be null.
- Avoid using keys where part of the value indicates some classification of the
entity instance, or some other property of the entity. For example, in a
database that tracks computer maintenance, computers may be named with a
code like "FBE01" where the first three characters indicate the location of the
computer. If the location of the computer changes, then the code needs to
change. Therefore the code is not a good identifier.
- If a candidate key is a composite of two or more attributes, consider creating a
new key with a single value. For example, if a football league is being tracked,
a candidate key for a GAME entity type might be the Home_Team_Name and
the Away_Team_Name. This could be substituted with a new attribute called
Game_ID.
[Diagram: EMPLOYEE entity type with attributes Employee_ID, Employee_Name, Salary, Address]
6.6 Identify Relationships Between Entities
Relationships connect the various components of an E-R model. A relationship is an
association between the instances of one or more entity types that is of interest to the
organization or the system.
Either: a natural link between entities, like 'a department has employees', or
some event that occurs, like 'a computer is maintained by a staff member'.
Label relationships with verb phrases: a department has employees; an employee
manages a department.
Relationships are shown in one of two ways:
- a line connecting the entities, with a diamond shape containing the description
of the relationship. This is from the Chen notation for ER diagrams.
- a line connecting the entities, with the description on the line, as we have
drawn above. This is from the Crow's Foot notation for ER diagrams.
To show the diamond notation:
6.6.1 $elationshi" De&ree
The above re*ationships each invo*ve " entities so they are binary.
1e*ationships can a*so be unary invo*vin- 1 entity.
For e;amp*e= if e a$$ that an emp*oyee is mana-e$ by another emp*oyeeG
Ternary re*ationships are a*so possib*e beteen ) entities.
,a-e (( of 87
'ana-es
E',LOHEE
Emp*oyee#D
Emp*oyeeRName
Sa*ary
5$$ress
DE,51T'ENT
Department#D
DepartmentName
'ana-es
>as
E',LOHEE
Emp*oyee#D
Emp*oyeeRName
Sa*ary
5$$ress
DE,51T'ENT
Department#D
DepartmentName
>as
FBE Computer Science Department Lecture Notes Theory of Databases
5*so hi-her $e-ree= N+ary re*ationships but these $o not often occur an$ they ma/e
thin-s comp*e;.
',erciseG
1. 5$$ a ,ro3ect entity to your $ia-ram. 5 ,ro3ect has a co$e an$ a start $ate. The
pro3ect co$e i** be the i$entifier.
". Emp*oyees or/ on pro3ects put this re*ationship on your $ia-ram.

6.6.2 Relationship Cardinality

Consider the relationships on our diagram.
Department Has Employees: we know that every employee must be in a department and one department only. We also know that a department must have 1 or more employees.
We can show this on the diagram. This is called relationship cardinality. It means the number of relationships in which a given entity instance can appear. An entity instance can appear in:
• one (1) relationship between its entity type and the other entity type, or
• any variable number (N) of relationships between its entity type and the other entity type.
An Employee can be in only one Department, so an instance of Employee can appear in only one relationship with a Department instance.
A Department can have many Employees, so an instance of Department can appear in many (N) relationships with Employee instances.
[Draw diagram but do not add cardinality indicators just yet; add them as we go through them, below.]
[Diagram: EMPLOYEE (EmployeeID, Employee_Name, Salary, Address), DEPARTMENT (DepartmentID, DepartmentName) and PROJECT (ProjectCode, StartDate), with the relationships Manages, Has and Works on]
To show that a Department instance can be related to many Employee instances, put an N indicator at the Employee side of the connector.
This is called the crow's foot symbol.
To show that an Employee instance can be related to only one Department instance, put a line across the connector at the Department side.
A Department can have at most N Employees. The Department must also have at least one Employee. The minimum cardinality for Department in the relationship is therefore 1. The maximum cardinality is N. We show this with a horizontal bar across the connector, again at the Employee side.
Q: What do you think is the minimum cardinality for an Employee instance in the relationship? What is the least number of departments an employee can be in?
A: 1, if we assume that every employee is always assigned to a department.
So show this on your diagram also, by putting another horizontal bar at the Department side.
Look at your diagram now... read it by starting at the Employee instance.
You read the cardinality for Employee in the relationship by looking at the symbols at the Department side of the connector.
The two lines next to Department show that an Employee instance must be in 1 and only 1 relationship with a Department instance.
The Employee entity has mandatory participation in the relationship because it must be in at least 1 Department.
The Department entity also has mandatory participation because it must have at least one Employee, i.e. the minimum cardinality is 1.
We also say that an entity that has a min cardinality of 1 in a relationship is existence dependent on the relationship: instances of the entity cannot exist without a related instance of the other entity. An Employee instance cannot exist unless there is a Department instance that the Employee instance is related to.
Optional participation is also possible, if the minimum cardinality is 0. Consider the Employee-works-on-Project relationship. Let us say that an Employee can work on projects but does not have to; also that an Employee can work on more than 1 project at a time.
[Diagram: EMPLOYEE (EmployeeID, Employee_Name, Salary, Address) and DEPARTMENT (DepartmentID, DepartmentName) connected by Has, with cardinality symbols]
A Project always has at least 1 Employee working on it.
Can you show the cardinality for this relationship on your diagram?
You can put a 0 to indicate a cardinality of zero.
If the maximum cardinality is 1, the relationship is single-valued or functional. The Has relationship is single-valued for Employee.
Q: Can you see the relationship types (1-1, 1-to-many etc) on the diagram?
Department-Employee: 1-to-many; max cardinality is 1 in one direction, many in the other direction.
Employee-Project: many-to-many; max cardinality is many in both directions.
If the max cardinality is 1 in both directions, it is a 1-1 relationship.
To get the relationship type, we look only at the maximum cardinalities.
This shows one difference between an E-R diagram and a relation schema in a database: the schema on its own cannot show every minimum cardinality. Some of them are rules that must be implemented elsewhere in the system, maybe by applications inserting data to the db, e.g. that a Project must have at least one Employee working on it.
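As a rough illustration of such an application-level rule, the sketch below uses Python's built-in sqlite3 module to find projects that break the "at least one Employee" rule. The table names, column names and sample rows are assumptions made up for this example; they are not part of the course database.

```python
import sqlite3

# Hypothetical tables, loosely based on the Employee/Project example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Project (ProjectCode TEXT PRIMARY KEY, StartDate TEXT);
    CREATE TABLE Assignment (
        EmpID INTEGER NOT NULL,
        ProjectCode TEXT NOT NULL REFERENCES Project(ProjectCode),
        PRIMARY KEY (EmpID, ProjectCode)
    );
    INSERT INTO Project VALUES ('P1', '2024-01-10'), ('P2', '2024-02-01');
    INSERT INTO Assignment VALUES (1, 'P1'), (2, 'P1');
""")


def projects_without_employees(connection):
    """Return the codes of projects that have no Employee assigned."""
    rows = connection.execute("""
        SELECT p.ProjectCode
        FROM Project p
        LEFT JOIN Assignment a ON a.ProjectCode = p.ProjectCode
        WHERE a.EmpID IS NULL
        ORDER BY p.ProjectCode
    """).fetchall()
    return [code for (code,) in rows]


# The application would run this check and refuse or report the violations.
violations = projects_without_employees(conn)
print(violations)
```

The table definitions accept a project with no assignments, so the minimum cardinality has to be checked by a query like this one.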
If the maximum cardinality is a specific number, e.g. a Department can have a maximum of 50 employees, then we can show that by putting the number beside the crow's foot.
Homework: input the Employee, Department and Project entities to E-R Assistant and produce the diagram. Use this to figure out how to use the tool; you will probably get an assignment using it later on.
Exercise: add the cardinality for the manages relationship, given that an Employee can manage many other Employees, but an Employee has to be managed by one and only one person.
[Diagram: EMPLOYEE (EmployeeID, Employee_Name, Salary, Address), DEPARTMENT (DepartmentID, DepartmentName) and PROJECT (ProjectCode, StartDate), with the relationships Has, Works on and Manages]
Exercise: draw an E-R diagram for the following system, including cardinalities for the relationships. (There is some space on the back of your Handout 4.)
• Computers have a unique code, an operating system and a location.
• Staff have an ID, a name and a phone number.
• Computers are maintained by staff when they need maintenance.

6.7 Weak Entities

Consider the Operating System attribute of Computer.
What if we say that a computer can have 2 or more different operating systems installed on it?
Could say that OperatingSystem is an attribute that can have 2 values for one computer: it is multi-valued. We don't want this situation in a relational database (atomic rule for attributes).
So we can make a separate entity type for Operating System. But... in terms of this system, an operating system is an entity that does not exist unless it is installed on a computer.
So, Operating System is a weak entity: an entity that does not exist without the existence of some other entity. To identify instances of the weak entity, we need to associate them with instances of another (strong) entity.
A weak entity gets part or all of its identifier from the other entity. In this case, Operating System gets part of its identifier from the Computer entity.
[Draw new entity onto diagram, remove OperatingSystem attribute from Computer.]
Q: What is the identifier for this new entity?
A: The OperatingSystem name uniquely identifies each instance for a particular Computer instance; we say that OperatingSystem is the discriminator of the weak entity. But to uniquely identify any instance from all others, the identifier is OperatingSystem AND ComputerID together.
Draw the cardinalities on your diagram: assume a computer must have 1 OS and can have up to 3.
A weak entity is indicated by diagonal lines in the corners of the entity rectangle.
Normal relationships should have a dotted line, while the relationship between a weak entity and its owning entity is a complete (solid) line.
[Diagram, before: COMPUTER (ComputerID, Name, OperatingSystem) maintained by STAFF (StaffID, Name, FathersName). After: COMPUTER (ComputerID, Name) maintained by STAFF (StaffID, Name, FathersName), with OperatingSystem removed from COMPUTER]
The relationship between the weak entity and the entity from which it gets part of its identifier is called an identifying relationship.
This is called identification dependency, a specialised kind of existence dependency (when there is a min cardinality of 1): the weak entity is dependent on the identifying relationship. The weak entity also borrows part of its identifier from the other entity.
Another type of situation in which you can have a weak entity is when you have an entity type that is closely associated with another entity type and in fact does not have a separate identity. If we say that an Employee has an Office and an Office is in a Building, Office & Building are new entity types. Every Office is in a building, so Office is a weak entity with an identifying relationship between it and Building. See the Building-Room example on pg. 218 of Mannino.
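To make the identifier of a weak entity concrete, here is a minimal sketch using Python's sqlite3 module. It assumes one possible table layout for Computer and Operating System (the column types and sample values are invented for illustration): the weak entity's primary key combines the borrowed ComputerID with the OperatingSystem discriminator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE Computer (ComputerID TEXT PRIMARY KEY, Name TEXT);
    CREATE TABLE OperatingSystem (
        ComputerID TEXT NOT NULL REFERENCES Computer(ComputerID),
        OperatingSystem TEXT NOT NULL,             -- the discriminator
        PRIMARY KEY (ComputerID, OperatingSystem)  -- borrowed key + discriminator
    );
    INSERT INTO Computer VALUES ('C1', 'Lab PC 1'), ('C2', 'Lab PC 2');
    INSERT INTO OperatingSystem VALUES ('C1', 'Linux'), ('C2', 'Linux');
""")


def try_insert(computer_id, os_name):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO OperatingSystem VALUES (?, ?)", (computer_id, os_name)
            )
        return True
    except sqlite3.IntegrityError:
        return False


second_os = try_insert("C1", "Windows")  # a different OS on the same computer
duplicate = try_insert("C1", "Linux")    # same computer and same OS again
orphan = try_insert("C9", "Linux")       # no such computer exists
print(second_os, duplicate, orphan)
```

The same OS name can appear for two computers, but not twice for one computer, and never for a computer that does not exist. The "up to 3" maximum is not enforced by this layout and would need a separate check.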

6.8 Associative Entities

Look again at the computer maintenance diagram.
The maintains relationship itself has attributes because we need to know when the maintenance happened, as well as which staff member did it.
So the Date is an attribute of the maintains relationship. If the staff member records some notes about the maintenance, we might also have a Notes attribute.
Many-to-many (M-N) and 1-to-many (1-M) relationships can have attributes.
Add them by putting a connecting line from the relationship name to each attribute.
Relationship attributes are associated with the relationship only, not with just one of the entities.
[Diagram: OPERATING_SYSTEM (OperatingSystem) linked to COMPUTER (ComputerID, Name) by has, with a maximum cardinality of 3; STAFF (StaffID, Name, FathersName) maintains COMPUTER; the maintains relationship has the attributes Date and Notes]
We can change the relationship into an entity, but make it a weak entity because the identifier for instances of the weak entity consists of identifiers from both the other entities. An entity should have a noun name so we'll call it Maintenance.
Q: Of the other attributes, do you think either of them identifies instances?
If no answer is forthcoming, suggest that a computer is maintained only once in a given day.
A: Date.
We also have to change the cardinalities and participation for Staff and Computer with the new entity.
What we have done now is to replace the many-to-many relationship between Staff and Computer with 2 1-many identifying relationships and an associative entity.
The Maintenance entity associates the other 2 entities and also will get its primary key as a combination of the other primary keys.
Other notation: some notations depict an associative entity as a diamond inside a rectangle.
Exercise: let us say that when an Employee works on a project, he/she is on the project for a fixed length of time, so the assignment has a start date and a finish date. Amend your diagram to show this; you will use an associative entity for the relationship, call it Assignment.
Associative entities can be used to associate more than 2 entities: N-ary relationships. These are rare but they do occur sometimes; ternary relationships, between 3 entities, can happen.
For example: take the Assignment associative entity. Suppose we also say that Role is an entity in the system, e.g. Project Manager, Requirements Coordinator. When an Employee is assigned to a Project, he/she is assigned in a particular Role. So Assignment is now a 3-way relationship. It gets its primary key from all 3 entities.
See also Mannino pg 222, Part-Supplier-Project example. Note that a ternary relationship can sometimes be replaced with 2 1-M relationships instead.
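The Maintenance associative entity could be sketched as a table along the following lines, again with Python's sqlite3 module. The column name MaintenanceDate stands in for the Date attribute, and the column types and sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Computer (ComputerID TEXT PRIMARY KEY, Name TEXT);
    CREATE TABLE Staff (StaffID INTEGER PRIMARY KEY, Name TEXT);
    -- Associative entity: its key combines the keys of both related
    -- entities plus the date, and it carries the relationship attributes.
    CREATE TABLE Maintenance (
        ComputerID TEXT NOT NULL REFERENCES Computer(ComputerID),
        StaffID INTEGER NOT NULL REFERENCES Staff(StaffID),
        MaintenanceDate TEXT NOT NULL,
        Notes TEXT,
        PRIMARY KEY (ComputerID, StaffID, MaintenanceDate)
    );
    INSERT INTO Computer VALUES ('C1', 'Lab PC 1');
    INSERT INTO Staff VALUES (1, 'Abebe'), (2, 'Sara');
    INSERT INTO Maintenance VALUES ('C1', 1, '2024-03-01', 'Replaced fan');
    INSERT INTO Maintenance VALUES ('C1', 2, '2024-03-05', NULL);
    INSERT INTO Maintenance VALUES ('C1', 1, '2024-03-09', 'Reinstalled OS');
""")

# Which staff members have maintained computer C1?
staff_for_c1 = [
    name
    for (name,) in conn.execute("""
        SELECT DISTINCT s.Name
        FROM Maintenance m JOIN Staff s ON s.StaffID = m.StaffID
        WHERE m.ComputerID = 'C1'
        ORDER BY s.Name
    """)
]
print(staff_for_c1)
```

Because the date is part of the key, the same staff member can maintain the same computer on different days, which a key of (ComputerID, StaffID) alone would not allow.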
[Diagram: COMPUTER (ComputerID, Name) and STAFF (StaffID, Name, FathersName) connected through the associative entity MAINTENANCE (Date, Notes)]

6.9 Generalisation Hierarchies

Some entity types can be classified into different sub-categories.
For example, some organizations have salaried employees and voluntary employees.
We can show this in an E-R diagram using a generalisation hierarchy.
Let us say there are 2 types of Employee entities: SalaryEmp and VoluntaryEmp. Salaried employees are paid a salary while voluntary employees receive a daily allowance. Other attributes, like ID and name, are common to both types.
Depict this as a hierarchy in the E-R diagram [draw diagram below].
Employee entity type is the supertype or parent. SalaryEmp and VoluntaryEmp are subtypes or children.
We can say that a SalaryEmp is an Employee and a VoluntaryEmp is an Employee; this type of relationship is often called IS-A.
The common attributes are inherited by the subtypes, e.g. EmployeeID also applies to SalaryEmp and to VoluntaryEmp.
Each subtype can also have its own direct attributes: Salary, DailyAllowance.
The D and the C indicate some constraints on the hierarchy.
D is for a Disjointness constraint: when subtypes in the hierarchy do not have any entity instances in common.
This one is disjoint because an Employee cannot be both a SalaryEmp and a VoluntaryEmp.
C is for a Completeness constraint: it means that every entity instance of a supertype must be an entity instance in one of the subtypes.
This one is complete.
If we said that some Employees can be both types, then the hierarchy is not disjoint; it is overlapping.
If we said that some Employees are neither type, then it is not complete, e.g. some employees are voluntary but do not take an allowance.
The hierarchy can be extended to have more levels.
[Diagram: generalisation hierarchy with supertype EMPLOYEE (EmployeeID, Employee_Name, Address) and subtypes SALARYEMP (Salary) and VOLUNTARYEMP (DailyAllowance), marked D,C]

6.10 Approach to E-R Diagrams

To help you in drawing E-R diagrams, consider these points (printed on back of Handout 4):
Key thing is: Keep Things Clear and Simple
1. Each Entity Type should model only one concept; remember that an Entity Type is a collection of entities that share common characteristics.
2. A Relationship should model one interaction between Entity Types.
3. Attributes should model simple concepts. This means that attributes should not be multi-valued (i.e. possible for one instance to have multiple values for the attribute) or be structured (have different parts).
4. If an attribute is multi-valued, it can be changed to a weak entity that is connected to the original entity type by a relationship connector.
Recap: steps to go through:
1. Identify entity types: if you have DFDs, these are a good starting point (data stores & external entities); read through your descriptions of the system and underline the nouns, as these may indicate entity types
2. Identify attributes of the entity types, from your description of the system
3. Select the identifier for each entity type
4. Identify relationships between entity types, again from the description of the system
Now let's add [prompt students: add weak entities for what...; add associative entities for what...]:
5. Add weak entities for multi-valued attributes and identification-dependent entities
6. Add associative entities for M-N relationships
These steps are a guide; you might identify more entity types as you look at relationships, for example.
There are tools available for creating E-R diagrams; we only have the E-R Assistant here, which is quite simple and easy to use. Software like MS Visio includes templates with the E-R symbols.
Reading: Mannino, a full example going through all the steps: section 7.5, pg 230. Note in particular the Completeness & Consistency Checks (pg 237).

6.11 Convert an E-R Diagram to a Relational Database Design

Now... once you have a nice E-R diagram, what do you do with it?
Answer: you use it to design your database. We'll look at how to convert to a relational database.
Group activity
Look at your Computers E-R diagram.
Working in groups of 3-4, can you define some rules for converting the diagram to a relational database?
Think about... how to identify the relations, the attributes in the relations, the primary keys, the foreign keys.
Allow 5 minutes for students working together.
Then each group turn to a group beside them and put your rules together.
Ask different groups for a rule each... put on board.
Tie into the following.
1. Represent entities: each normal entity type becomes a relation, with the identifier being the primary key and the other attributes being non-primary-key attributes of the relation. For a weak entity: it does have a relation, but we will add to the PK later. For sub-types: they do have a relation but we will add to the PK later. Only create relations for parent entity types at this stage.
Egs: Staff, Computer, Operating System (weak entity)
2. Represent 1-Many relationships: the primary key of the 1 side becomes a foreign key in the many side (e.g. 1 Dept has many Employees: DeptID is FK in Employee relation).
If min cardinality = 1 on the 1 side then the FK cannot allow null values (e.g. an Employee must be in 1 Dept, so the DeptID FK cannot allow nulls).
Consider this example:
[Diagram: Instructor (InstructorID, InstName) teaches Course (CourseCode, CourseName)]
Using this rule, the PK of Instructor becomes a FK in Course.
But... the relationship is optional for Course (min card = 0 on 1 side) so a Course can have no Instructor (e.g. ICT202 is the project course; there is no instructor assigned to it). This means the FK can have null values...
Some db designers prefer to not have a situation where nulls are necessary.
Q: Can you think how to address this?
A: create a new table: CourseInst (CourseCode, InstID). CourseCode is the PK, InstID is a FK.
This adds another table to the db, so queries to get data about instructors and courses now have to join 3 instead of 2 tables: more complex.
Some designers keep the FK in Course and allow nulls, or you could have a default value of 0 that you decide means no instructor is assigned.
I would keep the FK and allow nulls... but it depends on the system.
The rule is:
3. Optional rule for Optional 1-Many Relationships: if the min cardinality on the 1 side is 0, then the FK can allow null values. Can avoid this by creating a new relation with PK being the PK of the entity type on the M side. PK of the 1 side is FK in the new relation.
4. Represent Many-Many relationships (associative entities): each one becomes a separate relation. The PK is a combination of the PKs in each of the entity types in the relationship and maybe another attribute of the relationship.
Examples: Maintenance: PK is ComputerID, StaffID and Date. Emp-Project Assignment entity: PK is EmpID, ProjectCode (as an Employee is assigned to a particular project one time only, no need for another attribute in the PK).
5. Represent identifying dependencies (solid connecting lines): each one adds an attribute to the primary key (PK) of the connected entity type, e.g. OperatingSystem: add ComputerID to the PK.
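Rules 2 and 3 can be tried out with a small sketch in Python's sqlite3 module. The table and column names follow the Dept/Employee and Instructor/Course examples, but the exact names, data types and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, DeptName TEXT NOT NULL);
    CREATE TABLE Employee (
        EmpID INTEGER PRIMARY KEY,
        EmpName TEXT NOT NULL,
        -- min cardinality 1 on the 1 side: the FK may not be null
        DeptID INTEGER NOT NULL REFERENCES Department(DeptID)
    );
    CREATE TABLE Instructor (InstID INTEGER PRIMARY KEY, InstName TEXT NOT NULL);
    CREATE TABLE Course (
        CourseCode TEXT PRIMARY KEY,
        CourseName TEXT NOT NULL,
        -- min cardinality 0 on the 1 side: the FK may be null
        InstID INTEGER REFERENCES Instructor(InstID)
    );
    INSERT INTO Department VALUES (1, 'Computer Science');
    INSERT INTO Instructor VALUES (10, 'Instructor A');
    INSERT INTO Course VALUES ('ICT252', 'Theory of Databases', 10);
    INSERT INTO Course VALUES ('ICT202', 'Project', NULL);
""")


def insert_employee(emp_id, name, dept_id):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Employee VALUES (?, ?, ?)", (emp_id, name, dept_id)
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = insert_employee(1, "Tekle", 1)
rejected_null = insert_employee(2, "Sara", None)        # no department given
rejected_unknown_dept = insert_employee(3, "Abebe", 99)  # department 99 is missing
courses_without_instructor = [
    code
    for (code,) in conn.execute("SELECT CourseCode FROM Course WHERE InstID IS NULL")
]
print(accepted, rejected_null, rejected_unknown_dept, courses_without_instructor)
```

Here the mandatory relationship is enforced by NOT NULL on the foreign key, while the optional one simply leaves the foreign key nullable, which is the "keep the FK and allow nulls" choice discussed above.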
Exercises to do in class:
If you haven't already done so, convert the Emp-Dept diagram on Handout 4.
Mannino pg 236 (figure 7.32), pg 238 (figures 7.33 & 7.35)
6. Generalization hierarchy: the parent entity type and each sub-type becomes a relation. The sub-type relations inherit the PK from the parent but the other inherited attributes appear only in the parent.
The PK of a sub-type entity is a FK to the parent relation.
[Diagram from E-R Assistant: Employee (EmpID, Emp_FathersName, Emp_FirstName) with subtypes SalaryEmp ((EmpID), Salary, (Emp_FathersName), (Emp_FirstName)) and VoluntaryEmp ((EmpID), DailyAllowance, (Emp_FathersName), (Emp_FirstName)), marked D,C]
Parentheses show inherited attributes in E-R Assistant; there is no need to show them when drawing this diagram.
Get these tables (ask students what type of relationships they think there are: should be 1-1):
Employee (EmpID, Emp_FathersName, Emp_FirstName)
SalaryEmp (EmpID, Salary)
VoluntaryEmp (EmpID, DailyAllowance)
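One way to sketch rule 6 is shown below using Python's sqlite3 module. The table and column names come from the diagram; the data types and sample rows are assumptions. Each sub-type table uses EmpID both as its primary key and as a foreign key to Employee, which gives the 1-1 relationships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Employee (
        EmpID INTEGER PRIMARY KEY,
        Emp_FathersName TEXT,
        Emp_FirstName TEXT
    );
    -- The PK of each sub-type table is also a FK to the parent table.
    CREATE TABLE SalaryEmp (
        EmpID INTEGER PRIMARY KEY REFERENCES Employee(EmpID),
        Salary REAL
    );
    CREATE TABLE VoluntaryEmp (
        EmpID INTEGER PRIMARY KEY REFERENCES Employee(EmpID),
        DailyAllowance REAL
    );
    INSERT INTO Employee VALUES (1, 'Kebede', 'Tekle'), (2, 'Haile', 'Sara');
    INSERT INTO SalaryEmp VALUES (1, 5000.0);
    INSERT INTO VoluntaryEmp VALUES (2, 50.0);
""")

# Inherited attributes live only in Employee, so a join is needed to see them.
salaried = conn.execute("""
    SELECT e.Emp_FirstName, s.Salary
    FROM Employee e JOIN SalaryEmp s ON s.EmpID = e.EmpID
""").fetchall()
print(salaried)
```

Note that nothing in these table definitions enforces the D (disjoint) or C (complete) constraints; an employee could still be inserted into both sub-type tables or into neither, so those rules would need extra checks.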

6.12 Normalisation of the Data Model

This will be covered in ICT352 and also in this semester's SAD course.

6.13 Physical Database Design

Data types etc
[check with Anthoni: are they doing this in SAD? If so, will not do it in this course; will focus instead on index file structures]

6.14 UML

7 Physical Database Design

Objective: by the end of this unit, students will know what tasks are involved in physical database design.
Reading material:
Preparation:

7.1 Overview: Physical DB Design

When you convert an E-R model to a relational database schema, you have the tables & columns, PKs and FKs.
As well as the logical database design (from E-R), there are other types of information required to complete the physical database design.
These are:
• Normalised relations, including estimates of the volume of data (number of rows) that will be stored in each relation
  We are not covering normalisation on this course; you should be covering it in SAD and also on next year's ICT352 (Advanced Databases) course.
  Normalisation is to convert complex data structures to more simple, stable structures, with no data redundancies.
• Definitions of each attribute
  Type of data (text, number, date etc), number of characters allowed, does it allow null values and so on.
  o the data types [we will be learning SQL data types]
  o the entity integrity constraints (PKs and uniqueness constraints)
  o the referential integrity constraints (FKs)
  o definitions of any triggers necessary
  o validation rules, which in SQL can be done with check constraints
• Descriptions of where and when data are used (entered, retrieved, deleted, updated), including frequencies
  For a small system, this may not be necessary but for a large system, it will help later on, e.g. when deciding on indexes.
• Expectations and/or requirements for response time
  How fast does the db have to respond? This will affect what indexes you choose.
• Knowledge and understanding of the technologies to be used for file storage and for the DBMS
  If you have an understanding of the file & data structures that a DBMS uses, it will help you to understand your database better.
[May be able to relate these to what SQL has been covered in the labs: if the Tech Assistant has covered the SQL for tables, they will know how to create different constraints (PK, FK, unique, check).]

7.2 Choosing Data Types

Every field must have a data type. The data type is a coding scheme used by the DBMS to represent data. The data type determines what the valid values are for the field and also certain rules of behaviour for the field.
It also determines how much space the field takes up on disk, for each row of data.
Different DBMS applications offer different data types.
For example, for a numerical value, MS Access has types such as long integer and double.
When selecting data types, you need to balance these 5 objectives:
• Minimise storage space
• Represent all possible values for the field
• Improve the data integrity of the field
• Support all data manipulations required on the field
• Ensure that the data type will be suitable for future as well as present needs, e.g. that the range allowed by a number field covers future growth in values.
Some DBMS packages provide other capabilities for certain data types. Examples are calculated fields and coding/compression techniques.

7.2.1 Calculated Fields

A calculated field is one where the value is derived from other field values. So a formula can be specified for the field. Some DBMS applications allow a calculated field to be explicitly defined along with other raw data fields.

7.2.2 Coding/Compression

If the values in a field are from a limited range (number or character), consider assigning a code to each value. For example, suppose manufactured furniture items are being stored in a database table, and one attribute is the wood from which the item is made. The possible values may be birch, oak, mahogany, pine, and eucalyptus. These values require a character field of length 10. However, if a code, e.g. a single letter or a number, is assigned to each one, the field data type can be character of length 1 or integer. The storage space required for the field is thus reduced.
This can also have disadvantages: users may not recognise the code values, so they need to be decoded by programs that read them.
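The coding idea can be sketched in a few lines of Python. The single-letter codes and the row count below are arbitrary choices for illustration; the wood names are the ones listed above.

```python
# Hypothetical one-letter codes for the wood values in the example.
WOOD_CODES = {
    "B": "birch",
    "O": "oak",
    "M": "mahogany",
    "P": "pine",
    "E": "eucalyptus",
}
WOOD_TO_CODE = {name: code for code, name in WOOD_CODES.items()}


def encode(wood_name):
    """Translate a wood name into the code that is stored in the table."""
    return WOOD_TO_CODE[wood_name]


def decode(code):
    """Translate a stored code back into the name shown to users."""
    return WOOD_CODES[code]


# Width of a fixed-length character field able to hold every full name.
full_width = max(len(name) for name in WOOD_CODES.values())

# Characters saved for an assumed table of 1000 rows when the field
# shrinks from full_width characters to 1.
rows = 1000
characters_saved = rows * (full_width - 1)
print(full_width, characters_saved, decode(encode("oak")))
```

The decode step is the disadvantage mentioned above: every program that displays the field has to translate the code back for the user.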

7.3 Controlling Data Integrity

Most DBMS applications provide further means of controlling the integrity of data. Some of these are as follows.

7.3.1 Default values

A default value is a value that will be assigned to a field if no explicit value is provided. It is also useful when a field often has the same value.

Picture Control (Patterns)

Picture controls allow for a pattern to be specified for a field. The pattern can specify, for example, that the character in position 1 must be in the range A-Z and that the second character must be a digit. These can also be used to format currency values.

7.3.2 Range Control

When the values allowed for a field must be in a specified range, the range can be defined. For example, where a number must be in the range 0 to 100 or a month field must have the values Jan, Feb, Mar etc.

7.3.3 Referential Integrity

Referential integrity controls can be used where there is cross-referencing between attribute values in different relations. This is usually used for foreign key attributes, i.e. the values in the foreign key field must exist in the related primary key field.

7.3.4 Null Value Control

A field can be specified to allow nulls or not; this depends on the nature of the attribute. For example, when entering a new customer into a system, the customer name should be known, so this field should not allow nulls. However, it is reasonable to expect that the customer's phone number may not yet be known, so this field can allow nulls.
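Several of these controls can be tried in one small table. The sketch below uses Python's sqlite3 module; the Customer columns other than name and phone (Status, Code, Discount) are invented for illustration, and the pattern uses SQLite's GLOB operator, which is specific to SQLite rather than standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY,
        CustomerName TEXT NOT NULL,                      -- null value control
        Phone TEXT,                                      -- nulls allowed
        Status TEXT NOT NULL DEFAULT 'active',           -- default value
        Code TEXT CHECK (Code GLOB '[A-Z][0-9]*'),       -- pattern: letter, digit
        Discount INTEGER NOT NULL DEFAULT 0
            CHECK (Discount BETWEEN 0 AND 100)           -- range control
    )
""")


def accepts(name, phone=None, code=None, discount=0):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Customer (CustomerName, Phone, Code, Discount) "
                "VALUES (?, ?, ?, ?)",
                (name, phone, code, discount),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok_minimal = accepts("Tekle")                     # phone unknown: allowed
no_name = accepts(None)                           # name missing: rejected
bad_pattern = accepts("Sara", code="a1")          # lower-case first character
good_pattern = accepts("Sara", code="B7")
out_of_range = accepts("Abebe", discount=150)     # outside 0 to 100
default_status = conn.execute(
    "SELECT Status FROM Customer WHERE CustomerName = 'Tekle'"
).fetchone()[0]
print(ok_minimal, no_name, bad_pattern, good_pattern, out_of_range, default_status)
```

Referential integrity is left out here because it needs a second table; it works through foreign key constraints.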

8 Indexing

Objective: by the end of this unit, students will have learned what an index is, why indexes are important, how indexes are implemented and how to choose indexes in a database.
Reading material: Mannino chapter 10; Silberschatz et al chapter 11 (for revision of physical storage) and chapter 12 (Indexing & Hashing)
Useful also for students to review their notes from the Data Structures course, particularly binary search trees.
SQL Server Books Online: look up indexes (on the Index tab) and read the architecture topic, which gives some information on how SQL Server manages indexes and mentions B-trees.
Also use the search tab, e.g. a search for 'b-tree' brings up several results that will give you more information on how SQL Server uses B-trees.
Preparation
• Print/copy Handout 8 for the section on File Structures
• Print/copy Handout 9 for the section on Index-Sequential File Organization
• Print/copy Handout 10 (lab worksheet for tables/indices) for the section on Why Use Indexes.
• Print/copy Handout 11 (quiz) to use in class on completion of the unit. Alternatively, you can just call out the questions and students can write them into their own notes.

8.1 Physical vs Logical views of data

At an even lower level, the actual physical storage of the data on the disk is also important. However, nowadays, the DBMS you use will take care of this for you, based on the physical db design you specify (constraints, data types etc as mentioned earlier).
For this course, we are going to look at some file structures that are used by DBMSs for data storage. If you recall your Computer Org & Architecture course, you learned about different types of storage: magnetic disk, RAM and so on.
When you design a relational database, you specify the relations (tables) and the attributes (columns) in them.
Data is then inserted into the tables by some means (SQL insert, import to the database, using a data entry tool).
The db user has a logical view of that data... but the way the data is actually stored on the physical disk may not be the same.
Remember that in a DBMS, the storage, retrieval and manipulation of data should be independent of the internal structures; this means we can work with the logical view of the data without having to know how the DBMS stores the data on the physical disk.
We have also discussed the concept of data independence: that applications using the data should be separate from the data structures and data storage.
But... for this course, you will learn how this physical storage relates to the logical view.
We will also consider how physical storage affects speed: how fast records can be retrieved from the disk.
[Mannino, pg 317: useful diagrams]
Recall (from Computer Organization & Architecture): a block, or physical record (PR), is the smallest amount of data that can be transferred between secondary storage and primary storage in a single access.
Primary storage: can be directly accessed by the CPU; gives fast access but low capacity. Volatile: main memory.
Secondary storage: magnetic/optical disk, tapes; typically slower access but high capacity. Stable storage.
Blocks or physical records are organized into files on the disk.
Logical record (LR): a row in a table.
One PR could contain several LRs from one table.
Or an LR could be contained in 2 or more PRs.
Or a PR could contain several LRs from different tables.
But typically a PR contains several LRs.
Typical size of a PR: a number of bytes that is a power of 2, e.g. 1024 bytes (1 KB, 2^10) or 4096 bytes (4 KB, 2^12).
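As a worked example of what the block size means in practice, the Python sketch below computes how many logical records fit in one physical record and how many physical records a table needs. It assumes fixed-length logical records of 200 bytes that are never split across blocks and a table of 10,000 rows; both figures are invented for the arithmetic.

```python
def records_per_block(block_size, record_size):
    """Whole logical records that fit in one physical record (block)."""
    return block_size // record_size


def blocks_needed(num_records, block_size, record_size):
    """Physical records needed to hold num_records logical records."""
    per_block = records_per_block(block_size, record_size)
    return -(-num_records // per_block)  # ceiling division


RECORD_SIZE = 200    # bytes per logical record (assumed)
NUM_RECORDS = 10_000  # rows in the table (assumed)

for block_size in (2**10, 2**12):
    print(
        block_size,
        records_per_block(block_size, RECORD_SIZE),
        blocks_needed(NUM_RECORDS, block_size, RECORD_SIZE),
    )
```

With these assumed figures, a 1024-byte block holds 5 records and the table needs 2000 blocks, while a 4096-byte block holds 20 records and the table needs 500 blocks. Reading the whole table therefore takes a quarter of the physical record transfers with the larger block.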
The OS looks after the file organization on disk. The DBMS knows about the logical view of the data. The 2 (OS & DBMS) must work together to transfer the bytes of data to applications when needed.
Applications will ask for data in logical records in different ways:
• Sequential retrieval, e.g. all the rows in a table
• Retrieval by a search key, e.g. all Customers whose first name is Tekle or all Customers ordered by the CustomerName.
The different applications running under the OS usually have their own areas of memory, called buffers. Bytes of data for logical records must be transferred to the appropriate buffer when needed.
If data is requested and it is already in the buffer, it does not need to be transferred again.
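The effect of a buffer can be illustrated with a toy model in Python. This is only a sketch under simplifying assumptions: the class name BufferPool is made up, the "disk" is a dictionary of blocks, and the buffer never runs out of space, so no block is ever evicted.

```python
class BufferPool:
    """Toy buffer: counts how many physical record transfers are needed."""

    def __init__(self, disk):
        self.disk = disk      # block number -> list of logical records
        self.buffer = {}      # blocks already transferred to main memory
        self.transfers = 0    # physical record transfers so far

    def read_block(self, block_no):
        # Transfer the block from "disk" only if it is not yet buffered.
        if block_no not in self.buffer:
            self.buffer[block_no] = self.disk[block_no]
            self.transfers += 1
        return self.buffer[block_no]


disk = {0: ["Abebe", "Sara"], 1: ["Tekle", "Hana"]}
pool = BufferPool(disk)

for block_no in (0, 1, 0, 0, 1):
    pool.read_block(block_no)

print(pool.transfers)
```

Five requests are made but only two blocks are transferred, because the repeated requests are answered from the buffer.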
The overa** performance of app*ications usin- the $b i** $epen$ on a number of
factors. Base$ on hat e have been ta*/in- about= these factors areG
,hysica* recor$ transfers Cto buffersD
C,4 operations?cyc*es CreAuire$ to $o thin-s *i/e sortin- the ros in a tab*e to
or$er by a particu*ar co*umnD
'ain memory Cho much memory is avai*ab*eD
Dis/ space Cho much $is/ space is avai*ab*eD
<enera**y= you are or/in- ith fi;e$ amounts of memory an$ $is/ space. But you
can increase them by -ettin- a ne ,C or a$$in- to the e;istin- one.
%hat e can attempt to contro* is the number of physica* recor$ transfers by tryin-
to or-aniBe the $ata on $is/ in a ay that minimises ho much has to be transferre$
for the types of reAuest our app*ications ma/e to the $atabase.
%e can a*so try to minimise the C,4 operations reAuire$ by consi$erin- hat the
most freAuent Aueries are on the $ata an$ ho the $ata shou*$ be or$ere$ on the $is/.
Example: consider the fact that the logical records can be ordered on the disk, in the
physical records, only one way e.g. in order of CustomerID or CustomerName but
not by both. But applications will want to get the Customer data in different orders for
different purposes or they will want to get one customer given a CustomerID value or
a CustomerName value. If it is more common to query the Customer data by the
CustomerName, then it would make more sense to have the physical order be by
CustomerName, even if CustomerID is the primary key.
This is why we have indexes: we can index a table by the primary key and also by
other columns.
8.2 File Structures
In order to get an understanding of indexes, why they are used and how they are
implemented, we will start by looking at some different file structures that can be
used by DBMSs.
You will be familiar with some of these from your Data Structures & Algorithms
course.
We will look at these types of file organization:
- Sequential files
- Hash files
- B trees
Distribute Handout 8 - tell students to use during class, they should try to fill in the
spaces by listening to what you are saying. The purpose of this is 2-fold: 1, to give
them a note-taking aid and 2, to give them a chance to practice 'active listening', an
important skill - they should be able to listen to what you are saying and pick out the
important points.
8.2.1 Sequential files
Unordered sequential file
Simplest file structure - logical records stored in the order in which they are inserted.
New records appended to the end of the file.
The file is unordered because there is no ordering based on values in the records e.g.
based on CustomerName. But if each new Customer is given the next CustomerID in
a numerical sequence, then the file happens to be ordered by the CustomerID.
Also called a heap file.
Deleting a logical record leaves a space that can be filled by a new record. 2 ways to
approach:
- keep adding records at the end of the file; periodically reorganize the file to
  free up the space.
- mark the deleted record as free space; when adding, look for free space and fill
  it.
Advantage: fast insertion (at the end, or into spaces created by deleted records); fast
to retrieve in the order in which records were entered.
Ordered sequential file - logical records arranged in order of a key (one of the
columns). Usually the key is the PK column but not always.
Advantage: fast retrieval if retrieving a subset of records ordered by the key; fast for
sequential searches.
Disadvantage: slow insertion, because records have to be reordered to keep the key
order. E.g. if key is CustomerName, when a new Customer record is inserted, have to
put it in the right position for the name value so the records are still ordered by the
name.
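A minimal sketch of why insertion into an ordered sequential file is slow, assuming the file is modelled as an in-memory Python list of (CustomerID, CustomerName) tuples ordered by name. The function name and the sample customers are made up for illustration.

```python
import bisect

def insert_ordered(records, new_record, key):
    """Insert new_record into a list that is kept sorted by key(record)."""
    keys = [key(r) for r in records]                  # the ordering column of the file
    pos = bisect.bisect_right(keys, key(new_record))  # slot after any equal keys
    records.insert(pos, new_record)                   # every later record shifts by one
    return pos

# Hypothetical Customer file, ordered by CustomerName.
customers = [(3, "Abebe"), (1, "Hana"), (2, "Tekle")]
position = insert_ordered(customers, (4, "Marta"), key=lambda r: r[1])
```

Finding the position is cheap, but `list.insert` has to move every record after it, which mirrors the records that must be moved on disk.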
8.2.2 Hash files
[expect that students know hash files already from the Data Structures course]
Sequential file access is not fast when you want to access individual records by key
value.
A hash file consists of: a list of key values & record addresses (pointers) and the
records.
A hash function is applied to each key value to give a physical record address where
the information associated with the key is stored.
Mod is an example of a simple hash function.
Collisions can occur where two keys hash to the same physical address. If there is
not enough space in the physical record for both logical records, there is a problem.
This requires collision handling - for example, putting the logical record in the next
available physical record space (this is called linear probe collision handling; there
are other techniques).
The hash function can be chosen to minimise collisions e.g. using a prime number for
the mod function.
A file that is not full is less likely to have collisions.
If the file gets full, reorganization is necessary to insert all the logical records into a
new, bigger hash file.
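The mod hash function and linear probing described above can be sketched as follows. This is a toy model, assuming one logical record per slot and integer keys; the class name `HashFile` and the sample keys are illustrative only.

```python
class HashFile:
    """Toy hash file: one logical record per slot, linear probing on collision."""

    def __init__(self, size=11):            # a prime size for the mod function
        self.size = size
        self.slots = [None] * size          # each slot stands for a physical record

    def _home(self, key):
        return key % self.size              # mod as the hash function

    def insert(self, key, record):
        addr = self._home(key)
        for _ in range(self.size):
            if self.slots[addr] is None or self.slots[addr][0] == key:
                self.slots[addr] = (key, record)
                return addr
            addr = (addr + 1) % self.size   # collision: try the next slot
        raise RuntimeError("hash file is full; reorganise into a bigger file")

    def find(self, key):
        addr = self._home(key)
        for _ in range(self.size):
            if self.slots[addr] is None:    # empty slot: the key was never stored
                return None
            if self.slots[addr][0] == key:
                return self.slots[addr][1]
            addr = (addr + 1) % self.size
        return None

hf = HashFile(size=11)
first = hf.insert(15, "Tekle")    # 15 mod 11 = 4
second = hf.insert(26, "Hana")    # 26 mod 11 = 4 as well, so it is probed on to 5
```

Deletion is left out on purpose: with linear probing a deleted slot has to be marked rather than emptied, otherwise later lookups stop too early.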
To avoid this, dynamic hash files can be used. A dynamic hash file can grow
automatically. The space allocated for the hash file is divided into buckets - a bucket
can hold multiple logical records.
If a bucket is full, it is split into 2 buckets and the records distributed over the 2
buckets.
Advantage: fast for insertion & retrieval (if there are no collisions); fast for searches
by the key value e.g. find a Customer given the CustomerID, if the CustomerID is the
key value.
Disadvantage: not so fast for sequential searches e.g. get all Customers ordered by
the CustomerID or the CustomerName. This is because a good hash function tends to
spread the logical records uniformly across the physical records in the hash file.
So many physical record accesses are required - each access costs in terms of
resources (like CPU & transferring to buffers).
8.3 Index-Sequential File Organization
What is really necessary in a DBMS is fast sequential and key access to data.
Sequential files & hash files are ways of organizing data on disk.
To get faster access, we use something called an index.
We are going to talk about:
- What is an index?
- How are indexes implemented by a DBMS?
- Why are indices important in a database?
8.3.1 What is an index?
Think of an index in a book - you can look up a word in the index and it gives you a
page number or numbers to look at. In terms of file structures, an index is similar - it
is a list of key values and addresses for records.
More formally:
an index is a structured collection of key value & address pairs. The purpose of
an index is to facilitate access to a collection of records.
[Refer to Handout 9, figure 1 & figure 2 - an example database table and an index on
it.]
8.3.2 Primary/Clustering Index
Right side of figure 2 - an ordered sequential file of the Customer data ordered on
CustomerID.
We can have an index for the file - the index consists of a list of possible values for
the column that is used to order the file i.e. CustomerID. The values in the index are
also stored in order. This is shown at the left of Figure 2.
Call the CustomerID values the search keys. Each search key is paired with a pointer
to the record that has that CustomerID value.
Pointer: consists of the identifier of a disk block and an offset within the disk block to
identify the logical record within the block.
Because CustomerID is a candidate key (also happens to be the primary key), each
search key in the index corresponds to only 1 logical record in the data file.
What if the search key is not a candidate key? Can have many records with that key
value - so the pointer is to the first record in the 1st block containing records with that
key value. Because the file is an ordered sequential file, all records for that key value
are in sequence.
If a file is ordered by a non-candidate key, it is usually organized so that logical
records with different key values are in different blocks.
The index file itself can also be stored as a separate sequential file - but it takes up a
lot less space than the file containing the data so it takes up fewer blocks.
[during this session, you can test how well the students recall what they learned in the
Data Structures & Algorithms course by asking some questions about the algorithms
& their complexity]
Q: What algorithm could be used to search the sequential index file?
A: A binary search algorithm.
Q: What is the Big-O complexity of a binary search algorithm?
A: O(log n) complexity
If we say n is the number of blocks (physical records) that the file is stored in, n is
smaller for the index file than for the data file. So it is faster to search the index file.
If a file occupies b blocks, a binary search needs to read up to:
⌈log_2(b)⌉ blocks (⌈ ⌉ means round up i.e. ceiling).
If the search key is the value that orders the sequential file, the index is a primary
index (different to primary key).
Also called a clustering index. In other words - the physical order of records on disk
is the same as the order of values in the index.
Exercise: take a few minutes to read Handout 9 Example 1.
Review: in this example, the number of index entries (r_i) is equal to the number of
blocks in the data file. This type of index does not have an entry for every possible
search key value. This is called a sparse index - see Figure 3 on Handout 9. The
opposite, where the index has an entry for every possible search key value, is a dense
index. We'll talk more about these later.
Make sure all students have understood the improvement in performance by asking
these questions:
a) Which is faster - to search the ordered data file or to search the index file?
[index]
b) Why? [smaller number of block accesses required]
c) Recalculate if the index is dense i.e. has an entry for every search key, what
is the number of block accesses required?
bfr_i = 68 (as before)
r_i = 30,000 (number of index entries)
b_i = ⌈30,000/68⌉ = 442 (number of blocks required)
⌈log_2(442)⌉ = 9 (block accesses to search the index)
9 + 1 = 10 (to access the data block)
d) Based on the answer to (c), which do you think is better - a dense or a sparse
index with a pointer to each block of the data file? [dense index is faster than
searching the data file but only just... sparse index is better]
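The arithmetic in part (c) can be checked with a short sketch. It uses only the figures quoted above (30,000 index entries, 68 entries per index block); the function name is an illustrative choice.

```python
import math

def index_search_cost(index_entries, entries_per_block):
    """Return (index blocks, binary-search block reads, total reads incl. data block)."""
    blocks = math.ceil(index_entries / entries_per_block)   # b_i
    reads = math.ceil(math.log2(blocks))                    # ceiling of log2(b_i)
    return blocks, reads, reads + 1                         # +1 to fetch the data block

dense = index_search_cost(30_000, 68)
```

Calling the same function with the number of entries in a sparse index (one per data block) gives the comparison asked for in part (d).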
8.3.3 Secondary/Non-clustering Index
[Refer to Handout 9, figure 4 - note that one of the index values has 2 pointers from
it]
Can have an index on a search key that is not the value that the records are ordered by.
E.g. an index on CustomerName. Each CustomerName points to all the records that
have that name value. This is a secondary or non-clustering index.
Consider the improvement in searching this gives us...
Q: Big-O complexity for a linear search on the data file?
A: O(n) (linear complexity), n = number of blocks
(could have to read b/2 blocks on average)
But a binary search on the secondary index will be much faster.
Exercise: take a few minutes to read Handout 9 example 2.
Shows that there is a greater improvement than that of a primary index over binary
search of the data file.
A sequential data file can have more than one index but only one of them can be a
primary index because the logical records can be in blocks in one order only.
8.3.4 Dense & Sparse Indices
There are 2 types of ordered index:
Dense index: every search key value has an entry in the index. To find a logical data
record based on a key value, find the key value in the index, then follow the pointer to
the block & record.
Sparse index: the index has entries only for some of the search key values. To find a
logical data record based on a key value, find the largest key value that is less than or
equal to the search value; follow the pointer to the block and then follow the pointers
inside the file until the logical record is found.
Usually store an index entry for each block so don't have to read another block to
find the key value.
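The sparse-index lookup described above can be sketched as below, assuming one index entry per block that holds the first key in the block. The blocks and key values are made up, and Python's `bisect` stands in for the binary search over the index.

```python
import bisect

# Hypothetical ordered data file: three blocks of (key, record) pairs.
blocks = [
    [(10, "a"), (20, "b")],
    [(30, "c"), (40, "d")],
    [(50, "e"), (60, "f")],
]
# Sparse index: the first key of each block; the list position is the block pointer.
sparse_keys = [blk[0][0] for blk in blocks]

def sparse_lookup(search_key):
    # Largest index key that is <= search_key.
    i = bisect.bisect_right(sparse_keys, search_key) - 1
    if i < 0:
        return None                      # smaller than every key in the file
    for key, record in blocks[i]:        # scan inside the single block fetched
        if key == search_key:
            return record
    return None
```

For example, `sparse_lookup(40)` follows the index entry for key 30 and then scans that block to reach the record for 40.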
Exercise: You are given the CustID value C3499503 - use the binary search
algorithm to find the logical record containing this search key value, using first figure
2 and then figure 3.
Q: What value in the index do you follow a pointer from in each figure?
A: in figure 2: from the C3499503 key value; in figure 3, from the C3340959 value
(because it is the highest value that is less than the one we want).
Q: Refer to Handout 9 figures 2, 3 & 4 - which is dense and which is sparse?
A: figure 2 is dense, figure 3 is sparse, figure 4 is dense.
Q: Could the secondary index in figure 4 be sparse?
A: no, because the order of the logical records may not be the same as the order of the
keys so the algorithm to look sequentially for the search key value won't work.
A primary index can be dense or sparse. The algorithm for searching the sparse index
will work because the order of the keys in the index is the same as the ordering of the
logical records.
- If it is dense - every logical record has a pointer to it
- If it is sparse - only some logical records have a pointer to them
Still works if the search key is a candidate key or not.
So, a secondary index always has to be dense.
Secondary index on a candidate key - looks like a dense primary index but the
pointers are not to successive logical records in the file. Each value in the index has a
pointer to one record only.
Secondary index on a non-candidate key - each search key value in the index may
have pointers to more than one record. Cannot have a pointer to the first record with
the value because other records may be scattered throughout the file.
[Refer to Handout 9, figure 5 - different to figure 4]
Sometimes the secondary index has an extra level of indirection so that each search
key has only 1 pointer from it. This keeps the index entries to a fixed length.
The pointer is to a bucket (a block of space that can hold multiple data records). The
bucket holds pointers to all the logical records that have the key value in them. If the
bucket cannot hold all the pointers necessary, it can expand automatically e.g. by
using a linked list to pointers outside the bucket.
Comparison of dense & sparse indices
Sparse: takes up less space, and less work for inserting/deleting (don't always have to
insert or delete a key value from the index)
Dense: takes up more space but faster to find a logical record as do not have to scan
logical records when searching for a key value that is not in the index.
8.4 Modifying Index-Sequential files (insert, update, delete)
The algorithms for inserting and deleting vary slightly depending on the type of the
index.
Both the index and the data file must be updated.
8.4.1 Delete from the data file
We already talked about how deletion of logical records can be handled (2 methods:
keep adding to the end and do a file reorg periodically, or mark free space).
8.4.2 Insert to the data file
Insertions for an ordered file - have to find the correct position, based on the
ordering value, then make space to put it in. Involves moving records - on average,
have to move half the records.
Other options for insertion:
1. Keep some unused space in each block; but when they fill up, same
   problem.
2. Insert to a temporary unordered file called an overflow file.
   Periodically, the overflow file is sorted and records merged with the
   main/master file. But this makes the search algorithms more complex -
   have to do linear searches on the overflow file.
8.4.3 Update the data file
Modifications/updates to data - 2 factors:
1. search condition to find the logical record
2. the field to be modified
Have to first find the record to be updated - either binary search if value of the search
key in the record is known OR linear search on the data file.
Then do the update - if a non-ordering field, change the record and rewrite in same
address.
If the ordering field - may have to change position in file => delete old record then
insert new record.
8.4.4 Algorithm for inserting to the index
[ref Pg 451 in Silberschatz et al]
Perform a look up for the search key value that is in the new record
Dense index:
- If key is not in the index, insert an index entry with the new value, in
  the correct position
- Else if [key is in the index and] index has pointers to all records for
  the key value, add a pointer to the new record
- Else if [key is in the index and] index has pointers only to first record
  for the key value, ensure the record is after other records with the same
  key value.
Sparse index (entry for each block):
- If a new block has been created, insert the first key value in the block
  into the index.
- Else if the new record has the lowest key value in its block, update the
  index entry pointing to that block, so it has the new key value
- Else no change to the index.
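A minimal sketch of the dense-index case above, covering only the variant in which the index keeps pointers to all records for a key value. The index is modelled as a sorted list of keys plus a dictionary of record addresses; these structures, the function name and the (block, offset) addresses are assumptions for illustration, not the textbook's layout.

```python
import bisect

def dense_index_insert(keys, pointers, key, address):
    """keys: sorted list of search-key values; pointers: key -> list of addresses."""
    if key not in pointers:
        bisect.insort(keys, key)        # new entry, placed in the correct position
        pointers[key] = [address]
    else:
        pointers[key].append(address)   # key already indexed: add one more pointer

keys, pointers = [], {}
dense_index_insert(keys, pointers, "Tekle", (2, 0))   # (block, offset), made up
dense_index_insert(keys, pointers, "Hana", (0, 1))
dense_index_insert(keys, pointers, "Tekle", (5, 3))
```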
8.4.5 Algorithm for deleting from the index
[ref Pg 451 in Silberschatz et al]
Look up the record to be deleted
Dense index:
- If deleted record was only one with the search key value, delete the
  index entry
- Else if the index has pointers to all records with the same key value,
  delete the pointer to the deleted record
- Else if the index has only a pointer to the first record with the key
  value - if deleted record was the first record, update the pointer to
  point to the next record.
Sparse Index:
- If index contains an entry for the key value of the deleted record
  - If deleted record was only record with the key value, replace
    the index entry with an entry for the next search key value (in
    order). If next search key value already has an entry, delete the
    index entry instead of replacing it.
  - Else if index entry for the key value points to the deleted
    record, update the index entry to point to the next record with
    the same key value.
- Else do nothing (no change required).
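The dense-index deletion rules can be sketched in the same spirit, again only for the variant with pointers to all records for a key value. The list-plus-dictionary model and the sample addresses are illustrative assumptions.

```python
def dense_index_delete(keys, pointers, key, address):
    """Remove one record address; drop the index entry when no records remain."""
    addresses = pointers[key]
    addresses.remove(address)       # delete the pointer to the deleted record
    if not addresses:               # it was the only record with this key value
        del pointers[key]
        keys.remove(key)            # delete the index entry itself

keys = ["Hana", "Tekle"]
pointers = {"Hana": [(0, 1)], "Tekle": [(2, 0), (5, 3)]}
dense_index_delete(keys, pointers, "Tekle", (2, 0))   # entry stays, one pointer left
dense_index_delete(keys, pointers, "Hana", (0, 1))    # entry disappears
```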
8.5 Multilevel Indices
Index-sequential file organization - binary search to find logical records needs
⌈log_2(b)⌉ accesses if the index file is in b blocks - each step of the algorithm reduces
the part of the index file that we continue to search by a factor of 2.
Even with a sparse index, the index file can become very big thus making the binary
search less efficient. For example: it's not uncommon to have a database table
containing 100,000 records.
Blocking factor, bfr = 10 => 10,000 data blocks; assume one index entry per block
=> 10,000 records in the index.
Assume 100 index records in a block (index records smaller than data records), stored
as a sequential file on disk.
[record: here, we are talking about a basic unit of storage on disk - similar to a record
in a database table but not quite the same concept]
If index is small enough - can keep in main memory => fast access.
If large - keep on disk => need several block accesses to do the search.
100 index records in a block; 10,000 index entries => 10,000/100 = 100 blocks -
binary search needs ⌈log_2(100)⌉ = 7 block reads.
If a block read takes 30ms, search takes up to 210ms - quite slow.
If we can reduce the part of the index that we continue to search during the binary
search algorithm, we can reduce the number of block accesses.
We do this with a multi-level index, like this:
- Create an index on the index file itself - this is the second (or outer) level of
  the index.
- The original index is the first (or inner) level.
- The second level index has an entry for the key value of the first entry in each
  block of the first level index; the pointer is to the block. This second level is
  itself a primary index on the index.
Search algorithm:
- Binary search on second level index - find record for largest search key value
  <= search value
- Follow the pointer - binary search the block for largest search key value <=
  search value
- Follow the pointer - will be to the record (for a candidate key dense index) or
  the block containing the record (candidate key sparse index or non-candidate
  key)
The process can be repeated - create a third level index on the second level and so on
- keep repeating if the new level needs more than one block for storage i.e. until the
entries in a level all fit in 1 block.
Consider the performance:
Assume first level has r_1 entries and the blocking factor is bfr => number of blocks
b = r_1/bfr
And also second level has r_2 = r_1/bfr entries.
A third level would have r_3 = r_2/bfr entries.
And so on.
Search of a multi-level index needs approx. log_bfr(b) block accesses - an
improvement on log_2(b) if bfr > 2.
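The level-by-level reasoning can be sketched with the figures from the earlier example (10,000 first-level index entries, 100 index records per block). The function name is illustrative, and the sketch assumes one block read per index level.

```python
import math

def index_levels(first_level_entries, bfr):
    """Blocks needed by each level of a multi-level index, first level first."""
    levels = []
    entries = first_level_entries
    while True:
        blocks = math.ceil(entries / bfr)
        levels.append(blocks)
        if blocks == 1:            # this level fits in 1 block: stop adding levels
            break
        entries = blocks           # next level has one entry per block below it
    return levels

levels = index_levels(10_000, 100)
single_level_reads = math.ceil(math.log2(levels[0]))   # binary search of level 1
multi_level_reads = len(levels)                        # one block read per level
```

Under these assumptions the search of the index drops from 7 block reads to 2, before the final read of the data block.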
Exercise: take a few minutes to read Handout 9 example 3.
8.6 Summary - Index-Sequential File Organization
Review
Ordered sequential files with an index
- fast sequential access
- fast key search (using indices)
Performance degrades as the data file and the index file grow in size - particularly if
overflow file is used (linear searches).
Can fix this by reorganizing the file but frequent reorganization is not desirable.
A commonly used file structure for indices is a B Tree - maintains its efficiency
despite insertion & deletion. Based on search trees - so let's do a quick recap of
binary search trees & search trees - which you should have covered in the Data
Structures course.
[perhaps do this by asking the class to come up with what they know about binary
search trees - what are the characteristics]
8.7 Binary Search Trees - Recap
A binary tree has a maximum of 2 branches from each node.
In a binary search tree, for each node, the values in the left sub-tree are lower; the
values in the right sub-tree are higher.
To retrieve all the key values from the tree, in sequential order, do an in-order
traversal.
[from Data Structures: tree traversal: depth-first -> read a whole branch before the
next; breadth-first -> read a whole level before the next; traversal means to visit all the
nodes in the tree.
order: when the node is visited => in-order = left tree, node, right tree; pre-order =
node, left tree, right tree; post-order = left tree, right tree, node]
Q: Algorithm for a binary search?
A: compare search value with the root - if it is equal, search terminates. If it is
greater, proceed down the right sub-tree. If it is less, proceed down the left sub-tree.
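The search and in-order traversal just recapped can be sketched as follows. The `Node` class and the shape of the sample tree are assumptions for illustration; the town names are simply placeholder keys.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def search(node, key):
    """Compare with the current node; go right if greater, left if less."""
    while node is not None:
        if key == node.key:
            return node
        node = node.right if key > node.key else node.left
    return None

def in_order(node):
    """Left sub-tree, node, right sub-tree: yields keys in sequential order."""
    if node is None:
        return []
    return in_order(node.left) + [node.key] + in_order(node.right)

root = Node("Dessie",
            Node("Awassa", Node("Addis Ababa"), Node("Bahir Dar")),
            Node("Mekelle", Node("Gonder"), Node("Nazret")))
found = search(root, "Gonder")
ordered = in_order(root)
```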
8.7.1 Binary Search Tree as an Index
A binary search tree can be used as the structure for the index files we have discussed,
instead of storing an index in a sequential file.
Rather than storing whole data records in a tree (this would make the tree very big in
terms of disk space), we store the index entries (value & pointer pairs) in the tree.
Each node can store a search key value and a pointer to the data record containing that
search key (or a bucket of pointers to multiple records).
A node can also hold up to 2 more pointers, to its left and right sub-trees.
The value stored in the index is the search key. Each key has associated information.
Show a diagram like this (complete to show pointers to data records from every node
in the tree; also put a 0 in the pointers from leaf nodes to show that they are null
pointers).
[Diagram: a binary search tree whose nodes hold the key values Addis Ababa, Bahir
Dar, Nazret, Awassa, Dessie, Gonder and Mekelle, with pointers from the nodes to the
records (Addis Ababa..., Awassa..., ...) in the data file]
Figure 1 - a binary search tree with pointers to data records
This extension of the binary search tree to include pointers to data records outside the
tree itself makes it an index.
The addresses of the records in the data file are determined by the strategy used to
insert records to the data file.
8.8 M-way search trees
An m-way search tree can have up to m branches from each node.
This means that a node can store up to (m-1) key values and can have up to m
branches from it.
[Ref Handout 8 - m-way search trees diagram]
Draw this diagram on the board - P indicates a pointer, K indicates a key value.
This shows a node in an m-way search tree, where the node has n key values in it, so
it is of order (n+1).
| P_0 | K_0 | P_1 | K_1 | P_2 | ... | P_{n-1} | K_{n-1} | P_n |
Figure 2 - node in an m-way search tree (sub-trees under the arrows)
The key values inside a node are stored in ascending order i.e. K_i < K_{i+1} for
i = 0 to n-2
As before, the branches from a node are pointers to the root nodes of its sub-trees.
In an m-way tree, a node can contain up to m-1 values.
A node containing n key values (n <= m-1) has n+1 pointers or branches.
For each key value, the values in the sub-tree to the left of it are less than it and the
values in the sub-tree to the right of it are greater than it.
In other words:
all key values in the nodes pointed to by P_i are less than K_i and are greater than
K_{i-1} for i = 1 to n-1
The sub-trees in the m-way tree are all themselves m-way search trees.
For a given number of key values, an m-way search tree will have a smaller height
than a binary search tree.
The maximum search length is the height of the tree i.e. to find a given key value, the
max number of nodes that must be read is the height of the tree. No more than one
node is visited on any given level of the tree.
If the tree height is minimized, then the search time can be minimized.
To do this, the tree should have as many branches from each node as possible i.e. a
higher value of m. In other words, a low-height, bushy tree gives faster searches.
The algorithm for searching an m-way tree is similar to that for a binary search tree,
except it has to scan the array of key values in each node to find the key or to find the
pointer to the next branch to follow.
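A minimal sketch of the m-way search, assuming each node holds an ascending list of keys and one more child pointer than keys. The `MNode` class and the sample 3-way tree are made up, and `bisect` is used here in place of a plain scan of the key array.

```python
import bisect

class MNode:
    def __init__(self, keys, children=None):
        self.keys = keys                                        # ascending key values
        self.children = children or [None] * (len(keys) + 1)   # n+1 pointers

def mway_search(node, key):
    """Return (found, nodes visited); at most one node is visited per level."""
    visited = 0
    while node is not None:
        visited += 1
        i = bisect.bisect_left(node.keys, key)   # slot of the key, or branch to take
        if i < len(node.keys) and node.keys[i] == key:
            return True, visited
        node = node.children[i]                  # follow pointer P_i into the sub-tree
    return False, visited

# Hypothetical 3-way search tree of height 2.
root = MNode([30, 60], [MNode([10, 20]), MNode([40, 50]), MNode([70])])
```

The visit count returned alongside the result shows that a search never reads more nodes than the height of the tree.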
Ref: Handout 9, figure 6 shows a 3-way search tree.
8.9 B Trees
The B-tree structure is a special form of an m-way search tree that has become the
most popular for organizing index structures because it gives good performance for
both sequential and key searches. It is also more efficient for inserting/deleting
compared to the more basic index-sequential file organization.
A B-tree is an m-way search tree with these properties:
- Each node of the tree, except for the root and the leaf nodes, has at least ⌈m/2⌉
  sub-trees and no more than m sub-trees i.e. each node is at least half-full.
- The root of the tree must have at least 2 subtrees, unless it is itself a leaf
  node. This forces the tree to branch early so searching is faster.
- All leaf nodes of the tree must be on the same level. This gives faster
  searching.
Note: Some texts say that the last property above is that the tree should be balanced.
But this is a different definition of balanced to what students have learned in Data
Structures course. To check their knowledge from Data Structures, ask these
questions:
Q: What is a height-balanced tree?
A: When the difference between the heights of the left and right sub-trees is 0 or 1 for
all nodes in the tree.
Q: What is a perfectly balanced tree?
A: a tree that is height-balanced and for which all leaf nodes are on 1 or 2 levels.
To avoid confusion, we will not use the word balanced here - we will say that all leaf
nodes must be on the same level.
All of these properties help to optimise the performance of a search tree in terms of
searching and insertion/deletion of keys.
In addition, a B-tree is often implemented so that each node is a block on disk.
The capacity of each node i.e. the value of m is then determined by the physical
record size, the key size and the pointer size (all in bytes).
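As a rough sketch of how m follows from these sizes, assume a node stores m tree pointers plus (m-1) keys, each key with its own pointer to a data record, and that all of this must fit in one block. The 9-byte key and 6-byte pointers below are assumed values for illustration, not figures from these notes.

```python
def btree_order(block_size, key_size, tree_ptr_size, data_ptr_size):
    """Largest m such that m tree pointers and (m-1) key/data-pointer pairs fit."""
    # Constraint: m*tree_ptr + (m-1)*(key + data_ptr) <= block_size
    per_key = key_size + data_ptr_size
    return (block_size + per_key) // (tree_ptr_size + per_key)

m = btree_order(block_size=1024, key_size=9, tree_ptr_size=6, data_ptr_size=6)
used_bytes = m * 6 + (m - 1) * (9 + 6)
```

Real systems also reserve some bytes per block for header information, so the usable order is usually a little lower than this estimate.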
The height of the tree can be minimised by maximising the number of keys stored in
each node.
The height also then determines how many physical record accesses are required to
find a given key value.
Ref: Handout 9, figure 7. This is a B tree of order 3 containing the same key values as
the 3-way search tree in figure 6. Check for yourself that all three properties are
fulfilled.
Insertion & deletion in a B-tree is a bit more complicated than to an m-way search
tree because the balance and the minimum number of keys per node must be
preserved.
Insertion algorithm - Loomis pg 377/Mannino pg 329
Tell students to read Mannino, chapter 10, pages 326-328 - with particular reference
to the diagram on pg 329, showing how values are inserted to nodes of a B-tree.
Deletion - Loomis pg 382
Cost - cover formulas? Mannino pg 328; Loomis pg 376.
8.10 Indexed Sequential Files - B+ Tree
B-tree problem - for sequential search, performance is not so good.
Have to do an in-order traversal of the tree. Each node has to be visited once for each
key value in it.
A variation of the B-tree - a B+Tree - is often used because it gives good performance
for sequential and key searches.
A B+ Tree is essentially a multi-level index, where the bottom level of the tree is
equivalent to the first level of a multi-level index.
The structure of a B+ Tree is different to a multi-level index-sequential file:
- Consists of 2 parts - an index set (as in a B tree) and a sequence set.
- Refer to Handout 9 figure 10. The first 3 levels in this one are the index set,
  the bottom level is the sequence set.
- Sequence set - the leaf nodes.
- Only the leaf nodes have pointers to the data records - whereas in a B tree, all
  nodes have pointers to data records.
Structure of a leaf node in a B+Tree of order (n+1):
| P_0 | K_0 | P_1 | K_1 | P_2 | ... | P_{n-1} | K_{n-1} | P_n |
Figure 3 - structure of a leaf node in a B+Tree
Note that the pointer P_n is different to the others.
For i < n, P_i is a pointer to the record containing the search key value K_i.
Or, if the search key is not a candidate key and the file is not ordered by the search
key, a pointer to a bucket of pointers to records with the search key value K_i (an extra
level of indirection).
P_n is a pointer to the next leaf node in the sequence set - so the last pointer in the leaf
nodes chains the leaf nodes together to form the sequence set.
For an m-way search tree (and a B Tree), we said that all key values in the
sub-tree pointed to by P_i are less than K_i and greater than K_{i-1}.
For a B+ Tree, every search key value must appear at least once in a leaf
node and possibly also in a non-leaf node. That means that a key value K_i in a
non-leaf node will also appear in a leaf node.
So all key values in the sub-tree pointed to by P_i are less than or equal to
K_i and greater than K_{i-1}. So the key value K_i appears in the sub-tree pointed
to by P_i.
Or sometimes, the key values in the sub-tree pointed to by P_i are less than K_i
and greater than or equal to K_{i-1}. In that case, the key value K_i appears in the
sub-tree pointed to by P_{i+1}.
[Ref figure 10 on Handout 9 - consider if the value 20 was moved to the node
(m) from the node (l)]
A leaf node can hold up to n key values, and has a minimum of n/2 values.
Ranges in the leaf nodes do not overlap - they are sequential i.e. if L_i and L_j are leaf
nodes, where i < j, every search key value in L_i is less than every search key in L_j.
The leaf nodes form a dense index for the data file.
The non-leaf nodes form a multi-level, sparse index on the leaf nodes.
Q: Now, can you see how a B+ Tree gives better performance than a B-Tree for
sequential searches?
A: because the leaf nodes form a sequence set - can follow the pointers from node to
node to get all records in order of the search key. This is faster than doing a tree
traversal.
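The sequence set can be sketched as a chain of leaf nodes, assuming each leaf holds sorted (key, record pointer) pairs plus a `next` link playing the role of P_n. The `Leaf` class, the helper names and the sample keys are illustrative only.

```python
class Leaf:
    def __init__(self, entries):
        self.entries = entries    # sorted (key, record pointer) pairs
        self.next = None          # P_n: pointer to the next leaf in the sequence set

def link(leaves):
    """Chain the leaves left to right and return the first one."""
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves[0]

def scan(first_leaf):
    """Sequential access: walk the chain without touching the index set."""
    leaf = first_leaf
    while leaf is not None:
        for key, pointer in leaf.entries:
            yield key, pointer
        leaf = leaf.next

first = link([Leaf([(5, "r5"), (10, "r10")]),
              Leaf([(20, "r20")]),
              Leaf([(30, "r30"), (40, "r40")])])
keys_in_order = [key for key, _ in scan(first)]
```

Each leaf is read exactly once, whereas an in-order traversal of a B-tree returns to the same node once for each key value in it.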
8.11 Inserting & Deleting to/from a B+ Tree
Inserting and deleting to/from a B+ Tree is similar to a B Tree, except that the
sequence set must be considered as well as the index nodes.
The algorithms must keep the non-leaf nodes (except for the root node) at least half
full.
8.12 Summary of index file structures
[Ref Silberschatz pg 446, pg 477/478]
We have discussed several different file organizations for indexes:
Index-sequential file
- Ordered sequential data file with an index that is itself an ordered sequential
  file
- Fast access by search key
- Fast sequential access by ordering field
B-Tree
- Ordered sequential data file with an index in a B-tree structure
- Fast access by search key
- Fast sequential access by ordering field
- Slow sequential access by search key
B+ Tree
- Ordered sequential data file with a multi-level index in a B+ tree structure
- Fast access by search key
- Fast sequential access by ordering field
- Fast sequential access by search key
#n a$$ition to these= the hash fi*e or-aniBation can a*so be use$ to bui*$ an in$e;.
>ash #n$e;
5$$resses of *o-ica* recor$s in the $ata fi*e $etermine$ by a hash function on
the search /ey va*ues
Search /ey va*ues a*so store$ in a hash fi*e consistin- of buc/ets
The hash function use$ to $etermine the buc/et in hich each search /ey is
store$
Each search /ey has pointerCsD to $ata fi*e recor$s
To fin$ a /ey va*ue= the hash function is app*ie$ to it an$ then the in$e; entry
can be *ocate$
5 hash in$e; is a secon$ary in$e; as the or$er of the search /eys $oes not
correspon$ to the or$er of the $ata fi*e. But can use the hash function to easi*y
fin$ recor$s in the $ata fi*e.
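A minimal sketch of the hash index idea described above, in Python. The bucket count, the sample records and the function names are all made up for illustration; a real DBMS stores buckets in disk blocks and deals with overflow, which is left out here.

```python
# Hypothetical sketch of a hash index over an unordered data file.
NUM_BUCKETS = 4  # assumed bucket count, chosen only for illustration

data_file = [  # a record's position in this list stands in for its disk address
    {"id": 17, "name": "Tekle"},
    {"id": 4, "name": "Tesfay"},
    {"id": 9, "name": "Ababa"},
]

def bucket_of(key):
    """Hash function: maps a search key value to a bucket number."""
    return hash(key) % NUM_BUCKETS

def build_hash_index(records, key_field):
    """Store (search key, pointer) pairs in the bucket chosen by the hash function."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for position, record in enumerate(records):
        key = record[key_field]
        buckets[bucket_of(key)].append((key, position))
    return buckets

def lookup(buckets, records, key):
    """Hash the key, scan only that bucket, then follow the pointers."""
    return [records[pos] for k, pos in buckets[bucket_of(key)] if k == key]

index = build_hash_index(data_file, "id")
found = lookup(index, data_file, 9)
```

Note that the data file itself stays in its original order; only the index entries are arranged by hash value, which is why this behaves as a secondary index.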
Remember also the non-ordered sequential file or heap file as a method of file
organization. If sequential and key searches are not frequently needed, a heap file is as
good a way as any to store data.
Many DBMSs now use B Trees or B+ Trees for their indexing. They also use heap
files for unordered data.
Generally, if you are using a DBMS for your database, you don't have a choice over
the file structure it uses. But you can decide what indexes are built for the data.
MS SQL Server 2000 uses B-Trees, not sure if it also uses B+ Trees.
Don't know what Access uses, probably the same.
MS SQL Server & MS Access both create an index on the primary key by default, for
example.
If you do not tell it otherwise, the index will be a primary/clustered index.
But in SQL Server, you can tell it to be a non-clustered/secondary index and create a
primary/clustered index on another field. In Access (as far as I know), you cannot
create a primary index on a field other than the primary key. SQL Server has greater
flexibility over indices.
[aside: Programs->SQL Server->Books Online: use the Index or Search tab to find
articles about b-tree and index]
[maybe omit this next bit]
Some factors that may influence the DBMS designer's choice of file structures are:
- Access types: sequential or by search key value; by individual search key
  value or in a range
- Access time: how long it takes to find a particular data item or set of items,
  e.g. binary search of a tree
- Insertion time: time to insert the data record and to update the index
- Deletion time: time to delete a data record and to update the index
- Space overhead: how much space the index takes up. It is usually worthwhile to
  go for some extra space, e.g. a dense index, if the other factors are improved.
8.13 Indices on Multiple Keys
All the indexes we have considered so far were on a search key value based on one
attribute. But an index can also be on a combination of values e.g. CustomerName
and CustomerFathersName.
The index structure is still the same, except that the key values stored are really tuples
of values e.g. all the combinations of Name-FathersName that exist in the Customer
table. The index entries are alphabetically ordered on both values e.g.
Ababa Tekle comes before Ababa Tesfay.
The pointers are to data records containing both the values.
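To see the ordering on both values concretely, here is a small Python sketch. It assumes the index entries can be modelled as (key tuple, row pointer) pairs; the names and row numbers are invented sample data, and a sorted list stands in for the real index structure.

```python
import bisect

# Hypothetical (Name, FathersName) keys with pointers (row numbers) into the Customer table.
entries = [
    (("Ababa", "Tesfay"), 2),
    (("Kebede", "Alemu"), 0),
    (("Ababa", "Tekle"), 1),
]
entries.sort()  # tuples sort on the first value, then on the second

keys = [key for key, _ in entries]

def find(name, fathers_name):
    """Binary search on the full composite key; returns the row pointer or None."""
    i = bisect.bisect_left(keys, (name, fathers_name))
    if i < len(keys) and keys[i] == (name, fathers_name):
        return entries[i][1]
    return None
```

After sorting, ("Ababa", "Tekle") sits before ("Ababa", "Tesfay"), matching the example in the text, and find("Ababa", "Tesfay") returns the pointer to that customer's row.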
8.14 Enforcing Uniqueness with an Index
You may often want to specify that a column other than the primary key should have
unique values.
Most DBMSs allow this by creating an index and letting you specify that data records
must have unique values for the search key of the index.
This is because checking index entries to see if the search key value in a new record
already exists or not is more efficient than checking in the data file.
See for example MS Access: when you are defining a column, you can choose that it
is indexed with duplicates allowed or duplicates not allowed.
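A quick way to try this out without SQL Server or Access is SQLite from Python. This is only a sketch: the customer table, the email column and the index name are invented for the example, and other DBMSs report the error differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT)")
# The unique index is on a column that is NOT the primary key.
conn.execute("CREATE UNIQUE INDEX idx_customer_email ON customer (email)")
conn.execute("INSERT INTO customer (id, email) VALUES (1, 'a@example.com')")

try:
    # Same email, different primary key: the index check rejects it.
    conn.execute("INSERT INTO customer (id, email) VALUES (2, 'a@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

The second insert fails on the index, so only one row remains in the table.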
8.15 Why use indexes & choosing fields to index
Now that you know how indexes are implemented, let's talk about why you would
want to use them.
Put simply, indexes can speed up queries in a database. To understand why, we'll look
at how queries are executed.
Generally, the DBMS stores each db table in one file. It may or may not have an index
(or indices) on the file.
Indexing will make some operations faster, e.g. searching, but others slower, e.g.
insert/delete, so the choice is a trade-off.
To make good choices, you need to understand your db system and what queries will
be most frequently made in it. A good starting point would be to try to write down the
queries that are frequently run e.g. look at forms in the interface and think about
searches the users will need to do.
8.15.1 Choosing indices
You can have only one clustering index per table because the records can be in
only one order on secondary storage. So choose this index carefully.
Some DBMSs will automatically make the PK the clustering index, e.g. Access &
SQL Server. Some will allow you to change it.
Guidelines for choosing it:
- Useful on a column that is often searched for a range of values, because the
  logical records are likely to be in the same block. If, for example, a date
  column is often queried to find a range of dates, the date column should be in
  a clustered index.
- If there is a column that is used frequently to sort the data retrieved from a
  table, then that column is a candidate for a clustering index. This is because
  the rows are already physically sorted by the search key, so it is not necessary
  for the DBMS to sort the rows again. For example, in the pubs database, the
  employee table has the clustered index on the lname, fname and minit columns
  as it is likely that most queries on employees will sort the data by name, not
  by emp_id.
- Use for a primary key or other unique column because it makes queries that
  need to find a specific value in the column very fast.
In summary: think about the data and what fields are most often used in the WHERE
part of queries.
Remember also that you can create an index on 2 or more columns if the values are
often accessed together. In addition, the DBMS may use the index even when only
one of the columns is being searched.
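You can check the last point yourself. The sketch below uses SQLite from Python rather than SQL Server, with a cut-down, made-up employee table, and asks the optimiser for its plan when only the first column of a two-column index is searched. Whether an index is chosen is always up to the optimiser, so treat this as one observed case rather than a rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, lname TEXT, fname TEXT, job TEXT)"
)
# One index on two columns that are often used together.
conn.execute("CREATE INDEX idx_employee_name ON employee (lname, fname)")

# Search on lname only and ask SQLite how it would run the query.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM employee WHERE lname = 'Tekle'"
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan_rows)  # last field is the description
uses_index = "idx_employee_name" in plan_text
```

In this sketch the searched column is the first one in the index; a search on fname alone is generally much less likely to benefit, because the entries are ordered by lname first.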
See also chapter 10, section 10.5 of Mannino, pgs 336 - 342. This gives some more
insight and some good rules to use. We will not discuss it here in class, but consider this
as part of the course, so it can be in exams. You can ask me about them either
personally or through the discussion forums.
8.15.2 Query Optimization: how indexes are used
Reading: Mannino, chapter 10 (as before); Silberschatz et al Chapter 13, section 13.5
on Join Operations (specifically Nested-Loop Join on pg 503 and Merge Join, pg 506).
When you define or write a SQL query, the DBMS carries out a process of translating
and analysing the query to produce an execution plan.
Read: section 10.4 of Chapter 10, Mannino, pg 332.
1. check for syntax & semantic errors: if there are errors, stop processing. Syntax
   = misused keywords e.g. FROM in the wrong place or misspelled. Semantic =
   columns or tables wrongly used e.g. comparing values in columns that have
   incompatible data types.
2. query transformation: transform to a standard format, usually based on
   relational algebra. Can involve rearrangement to make the query faster, called
   query optimisation.
3. access plan evaluation: based on the relational algebra expression, come up
   with an access plan that lists the individual file operations needed e.g. to use a B
   Tree, to merge records, to sort records.
4. execution of the access plan: interpret and execute the list of file operations.
Example (a join of 3 tables, with join condition in the WHERE clause)
SELECT t.title, t.type, advance, notes, a.au_id, a.au_fname, a.au_lname
FROM titles t, titleauthor ta, authors a
WHERE t.title_id = ta.title_id AND ta.au_id = a.au_id
AND t.type = 'business' AND au_fname LIKE 'm%'
This query does 2 inner joins: one to join title and titleauthor and then another to join
the result to author. Title and author are related through a many-many relationship,
modelled by the titleauthor table.
Step 1: syntax & semantic check. If there was an error e.g. one of the keywords
misspelled or a missing AND, an error will be reported.
Step 2: query transformation: transform to relational algebra operations. Putting 2
tables in the FROM clause indicates a Cartesian product. To do an inner join, the
standard syntax is to use the INNER JOIN/ON keywords. But SQL allows you to
specify the matching fields in the WHERE clause, so a Cartesian product is
converted to a natural join if there is a join condition in the WHERE clause.
One possible expression would be:
π title, type, advance, notes, au_id, au_fname, au_lname
  ( σ title.type='business' (Title) nat.join TitleAuthor nat.join
    ( σ author.fname LIKE 'm%' (Author) ) )
Note that the select operations are done before the natural joins. This eliminates some
rows before doing the join, making the joins more efficient.
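The saving from selecting first can be counted directly. The Python sketch below uses a few made-up rows and a simple nested-loop join (the function name and the au_id values are invented for illustration) and counts how many row pairs each ordering has to compare.

```python
# Hypothetical sample rows; only the columns needed for the join are shown.
titles = [
    {"title_id": "BU1032", "type": "business"},
    {"title_id": "PC8888", "type": "popular_comp"},
    {"title_id": "BU2075", "type": "business"},
    {"title_id": "PS2091", "type": "psychology"},
]
titleauthor = [
    {"title_id": "BU1032", "au_id": "A1"},
    {"title_id": "PC8888", "au_id": "A2"},
    {"title_id": "BU2075", "au_id": "A3"},
    {"title_id": "PS2091", "au_id": "A4"},
]

def nested_loop_join(left, right):
    """Join on title_id; return the matches and the number of row pairs compared."""
    matches, comparisons = [], 0
    for l_row in left:
        for r_row in right:
            comparisons += 1
            if l_row["title_id"] == r_row["title_id"]:
                matches.append({**l_row, **r_row})
    return matches, comparisons

# Plan 1: join everything, then select type = 'business'.
joined, cost_join_first = nested_loop_join(titles, titleauthor)
late = [row for row in joined if row["type"] == "business"]

# Plan 2: select first, then join only the surviving rows.
business = [t for t in titles if t["type"] == "business"]
early, cost_select_first = nested_loop_join(business, titleauthor)
```

Both plans return the same two rows, but plan 2 compares 8 row pairs where plan 1 compares 16. On real tables the gap grows with the number of rows the selection removes.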
Step 3: Access plan evaluation: file operations. The access plan looks like a tree
structure. The leaf nodes are the starting points: the tables in the query. It shows
what indices will be used. This is one possible access plan.
[Diagram: access plan tree, read from the leaf nodes (tables) upward]
Merge join (get matching rows)
    Sort (au_id)
        Merge join (get matching rows)
            Titles: use B-Tree index on Title_ID to get rows in sequence,
                    filtering to get only those with type='business'
            TitleAuthor: use B-Tree index on Title_ID to get rows in sequence
    Authors: use B-Tree index on Au_id to get rows in sequence, filtering to get
             only those where au_fname begins with m
Another possible plan for this query might be to first join Authors & TitleAuthor.
The DBMS will evaluate the different plans. Each operation type (e.g. read a B Tree
index, sort, merge join) has an associated cost. The DBMS can estimate costs by
estimating the number of rows that will result from each operation.
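The merge join operation named in the plan relies on both inputs arriving in order of the join column, which is exactly what reading through an ordered index gives you. A minimal Python sketch, assuming each input is a list of dicts already sorted on the join key (the function name and sample rows are invented):

```python
def merge_join(left, right, key):
    """Join two lists of dicts that are already sorted on `key`."""
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][key] < right[j][key]:
            i += 1                      # left row has no partner yet, move on
        elif left[i][key] > right[j][key]:
            j += 1                      # right row has no partner, move on
        else:
            # Pair the current left row with every right row sharing this key.
            k = j
            while k < len(right) and right[k][key] == left[i][key]:
                result.append({**left[i], **right[k]})
                k += 1
            i += 1
    return result

# Hypothetical rows, both lists in title_id order.
titles = [
    {"title_id": "BU1032", "title": "T1"},
    {"title_id": "BU2075", "title": "T2"},
]
titleauthor = [
    {"title_id": "BU1032", "au_id": "A1"},
    {"title_id": "BU1032", "au_id": "A2"},
    {"title_id": "BU2075", "au_id": "A3"},
    {"title_id": "PC8888", "au_id": "A4"},
]
rows = merge_join(titles, titleauthor, "title_id")
```

Each input is read front to back once (with a short look-ahead for duplicate keys), so the cost depends on the number of rows read rather than on the number of row pairs. That is why the plan bothers to sort on au_id before the second merge join.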
You can see a representation of the plan created by SQL Server in the Query Analyzer.
Before running the query, use Query menu, Show Execution Plan. When you run the
query, an extra tab appears in the results pane: Execution Plan. This shows a
diagram, similar to the one above. If you point the mouse at a node in the diagram,
you see more information about what is happening e.g. exactly what index is being
used, what operation is being carried out, the CPU cost etc.
In this case, you will see that the indices on Titles and Authors are clustered so
accessing those in order is fast. The index on TitleAuthor.Title_ID is non-clustered, so
it may not be so fast, but as the number of Titles has already been reduced
(type='business'), the number of matching rows is small. Only those blocks pointed
to by index entries for the remaining Title_ID values need to be accessed.
Step 4: execution of access plan: interpret the plan and execute it.
8.15.3 Use of Statistics
The query optimisation part of the DBMS needs statistics about the data in the
database to make good decisions.
I mentioned that in evaluating the access plan, the DBMS estimates the cost of each
operation, by estimating the number of rows that will be returned.
It does this by referencing table profiles and statistics, such as number of rows in
each table and distribution of values in a column (e.g. of CustomerName values).
A DBMS like SQL Server keeps these statistics in its system databases. It also updates
the statistics regularly, automatically.
The DBMS will also analyse the available indices on a table and try to use the one
that gives the lowest cost for a particular operation.
Sometimes, you may find that it does not use the index you expect it to, or want it to.
SQL Server allows you to force it to use a particular index in a query, using
something called a table hint (use Books Online if you want to find out more).
8.16 Creating Indexes with SQL
You can create indices using SQL queries or with the Enterprise Manager.
In SQL:
CREATE [UNIQUE] [CLUSTERED | NONCLUSTERED] INDEX index_name
ON table_name (column_name [ASC | DESC], ...)
Unique: if unique, no 2 rows can have the same value for the index column(s)
Clustered/nonclustered: clustering or not
ASC/DESC: ascending order or descending order of search keys in the index. For a
clustered index, this will affect the order of the rows in the data file.
You can also work with indices using the Enterprise Manager: see Handout 8,
section 7, under the course Database with SQL Server for details.
In MS Access, indexing is one of the properties of a column. You can specify if the
column is to be indexed and if yes, if duplicates are allowed or not and also if it is
ascending or descending. You can also use the Indexes button on the toolbar, useful
if you want to create an index on two or more columns together.
[During this unit, arrange with the technical assistant to do a lab session on indexing.
Give him/her and the students a copy of Handout 10 (Tables & Indices Worksheet) to
use in the lab. Ideally near the end of the unit, in the same week as the last few
sections above are covered in class.]
8.17 Quiz
Handout 11 is a quiz for students to help check their own learning of this unit.
Either give them each a copy or call out the questions in class.
Give them about 20 minutes to answer the questions, working on their own.
Then get them to swap with the person beside them.
Then you go through the questions and their answers. Students can mark each other's
papers. Use this as an opportunity to discuss the answers: if students have different
answers to the suggested answers, check them and explain why they are not quite
right or accept if they are good answers.
The solutions to the quiz are in the doc Handout 11 Indexing Quiz Solutions.doc.
9 SQL
(mostly to be covered in the labs by the course technical assistant)
10 Introduction to Transactions & Concurrency
Objectives: students to know what database transactions are, the ACID properties of a
transaction, what concurrency means and to understand the issues involved in
concurrency control.
Reading: Mannino, chapter 13; Silberschatz et al chapters 15 & 16.
10.1 Introduction
A transaction usually means an interaction among 2 or more parties in the conduct of
business e.g. a customer lodging money into a bank account, a customer withdrawing
money from the bank account, the purchase of a car or booking a flight.
In a DBMS, this type of transaction may require several database operations to take
place. For example, to lodge money into a bank account, the operations that must
take place could be:
1. Insert a new record to the AccountHistory table
2. Update the account balance in the Accounts table
To withdraw money, the operations might be:
1. Check the account balance to make sure the withdrawal amount is
   available
2. Insert a new record to the AccountHistory table
3. Update the account balance in the Accounts table
[If you need another example: transfer money between 2 accounts. Need to debit
one account, credit the other and update both balances. Will refer back to these
examples, so keep on the board if possible.]
So, in a database system, one transaction can involve any number of reads from and
writes to the database. Any one of the operations e.g. check the account balance does
not make sense to the end user if taken on its own. The end user may ask the system
to perform a lodgement or a withdrawal, and does not need to know about the
operations that take place.
In other words, a collection of operations together is a single unit from the point of
view of the database user.
In database terminology, a transaction is: a sequence of operations performed as a
single, logical unit of work.
The operations performed as part of a transaction make changes to the data i.e. they
are data modification operations.
At any point in time, there may be many transactions happening. For example, in a
bank's system, there are customers coming into different branches and making
lodgements and withdrawals at all times during the day. So the system has to be able
to handle concurrent transactions (transactions happening at the same time, and using
the same database tables).
The key feature of a transaction is that all the operations must succeed. If any one
of the operations fails, then the transaction must fail: it must undo any of the
operations that did succeed.
For example, if the system removes money from the savings account but fails to add
the money to the current account, there will be a problem with the customer's
account.
So, database transactions must have ways of checking that all operations have
completed.
10.2 ACID Properties
For a logical unit of work to be considered a transaction, it should have certain
properties. These properties help to ensure the integrity of data in a database.
They are:
- Atomicity
- Consistency
- Isolation
- Durability
These are known as the ACID properties of a transaction (from the first letters).
[write on board but leave room to put extra info for each one: add notes as you go
through the following section that describes each property]
Atomicity
All data changes made by the operations are reflected in the database or none of them
are (all data modifications performed or none).
For example, to withdraw money from an account requires 3 steps:
1. Check the account balance to make sure the withdrawal amount is
   available
2. Insert a new record to the AccountHistory table
3. Update the account balance in the Accounts table
If all 3 steps succeed, the transaction is complete: we say it is committed.
Let us suppose the first step succeeds, then the second step fails: for some reason, a
record cannot be inserted to the AccountHistory table.
In this case, the entire transaction must fail because the account cannot be debited.
If the second step does succeed and then the third step fails, the second step must be
undone, because the account cannot be debited without updating the account balance.
If a transaction fails, the process of undoing the previous steps and going back to the
initial state is called rolling back the transaction.

Thus, Atomicity means that a DBMS must be capable of recovering transactions if
something goes wrong while the operations are being executed. If a transaction is
partially completed, it must undo the completed operations.
There are different situations that can cause this. Possible causes are:
- 2 sets of operations (transactions) get into a deadlock because both are
  waiting to use the same data. Eventually one of them will be terminated; this
  is called transaction recovery.
- a crash of the system: if the DBMS has a system failure while transactions
  are taking place, after the db is recovered, there may be partially completed
  transactions (where some of the operations took place but others didn't). This
  is called crash recovery.
Consistency
When completed, a transaction must preserve the consistency of a database. This
means that after completion, all data must be in a correct state and comply with all
data constraints and validation rules.
e.g. for a bank transaction that transfers money between 2 customer accounts, this
would mean that the total amount of money recorded in the customers' accounts
should still be the same.
Consistency also means that data integrity in the database must be maintained. This
means avoiding the situation where a transaction can read 'dirty data'. Dirty data is
data that has been changed but not yet committed to the database.
To help ensure consistency, the DBMS must be able to deal with concurrent
transactions. We will talk more about this later.
Isolation
In summary, this means that any transaction must be unaware of other transactions
executing in the system concurrently (meaning at the same time).
No other transactions or elements of the database can see the changes resulting from a
transaction until the transaction completes. Other transactions should see the data in
the state it was in before the transaction or after it completes, not in between.
Or, in other words, a transaction must see a consistent database: a transaction cannot
read or write data that is being modified by another transaction.
Possible consequences of transactions that are not isolated:
- Lost updates: an update by one user overwrites an update by another user
- Uncommitted dependency (also known as a dirty read): when one
  transaction reads data written by another transaction before that other
  transaction commits
- Phantom Rows: when a transaction A reads data rows, then another
  transaction B does something to update the rows, then transaction A carries out
  the same read again but gets different records from the first time.
We will talk more about each of these when we cover Concurrency Control.
Durability
After a transaction completes successfully, the changes it has made in the database
will persist (remain) even if there is a system failure (e.g. a disk crash). In other
words, the changes must be permanent and cannot be erased from the database (after
they are committed).
e.g. for banking: if the system crashes, the customer will still see that she moved
B100 to the current account. Or if customer X transferred B1000 to the account of
customer Y, the money must still show as having been transferred.
10.3 DBMS Services for Transactions
A DBMS should provide services to help meet the ACID properties of transactions.
SQL also has some features that give the db designer control over transactions.
10.3.1 Atomicity
SQL has statements to begin/commit/rollback transactions.
The DBMS automatically carries out transaction & crash recovery.
10.3.1.1 SQL statements
If there is a sequence of operations that form a transaction, the programmer should
begin a transaction before the first one and commit the transaction after the last one.
You also have to put logic in the code to make all the operations fail if any one of
them fails. SQL has a statement to roll back a transaction also.
Begin transaction
Operation 1
If failure
    Rollback transaction
    Exit
End if
Operation 2
If failure
    Rollback transaction
    Exit
End if
Commit transaction
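The same begin / check / rollback / commit pattern can be tried out with SQLite from Python. This is a sketch under assumptions: the accounts table, its CHECK constraint and the transfer function are invented for the example, and SQL Server code would use its own syntax and error handling.

```python
import sqlite3

# isolation_level=None: we issue BEGIN / COMMIT / ROLLBACK ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('savings', 100), ('current', 0)")

def transfer(amount):
    """Move money from savings to current as one unit of work."""
    conn.execute("BEGIN")
    try:
        # Operation 1: credit the current account.
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = 'current'", (amount,)
        )
        # Operation 2: debit the savings account (fails if it would go negative).
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = 'savings'", (amount,)
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # undo the credit that already succeeded
        return False

def balances():
    return dict(conn.execute("SELECT id, balance FROM accounts"))

ok_small = transfer(30)   # both updates succeed and are committed
ok_large = transfer(500)  # the debit fails, so the credit is rolled back too
```

After both calls the balances are savings 70 and current 30: the failed transfer left no trace, even though its first operation had already succeeded.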
10.3.2 Consistency
Data integrity constraints are checked when updates are made.
The relational model has built-in integrity rules like entity integrity (primary key
constraint, uniqueness constraint), referential integrity (foreign key constraint), check
constraints (validation rules), null checks.
When a database operation is carried out, the DBMS checks that all the changed or
added data still meets all the constraints on it. If it does not, an error occurs.
In SQL Server, certain SQL statements e.g. CREATE TABLE, INSERT are
automatically rolled back if an error occurs.
For other statements or combinations of statements, the programmer can check for
such errors and roll back the transaction if an error occurs.
10.3.3 Isolation
Handling of concurrent transactions is called concurrency control. A mechanism called
locking is used to help control concurrency.
To prevent other transactions accessing a table it is using, a transaction can lock the
table. A lock tells other transactions that the table cannot be accessed.
But there are different levels of locking possible. For example, if one step in a
transaction is to simply read data from a particular table, there is no need to stop other
transactions from reading the table.
We will talk more about this later.
10.3.4 Durability
A database can be backed up regularly. Then if there is a system failure, the database
backup can be restored to recover the database.
But if the backup was taken 3 hours before the failure, the last 3 hours of data
changes will not be in it, so the transactions made in that time are not durable.
To address this, the DBMS can keep a log of transactions. A log file can be added to
all the time and does not take as much space as a complete backup.
The log is like a list of all the changes that have taken place.
If the backup is restored, then the log file from the time of backup can be applied to
the database. This will bring it right up to date, up to the point in time where it
crashed.
This is known as a transaction log.
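The restore-then-replay idea can be modelled in a few lines of Python. This is a toy, hypothetical model: the "database" is a dict of balances and the log is a list kept separately from it, where a real DBMS logs to a file, ideally on a different drive.

```python
# Hypothetical in-memory model: the "database" is a dict of account balances.
database = {"savings": 100, "current": 0}
log = []  # committed changes, appended in order, stored apart from the data

def commit_change(account, new_balance):
    """Record the change in the log first, then apply it to the database."""
    log.append((account, new_balance))
    database[account] = new_balance

backup = dict(database)            # full backup taken now
log_position_at_backup = len(log)  # remember where the log stood at backup time

commit_change("savings", 70)       # changes made after the backup
commit_change("current", 30)

database = None                    # simulate losing the data drive

# Recovery: restore the backup, then replay the log from the backup point onward.
database = dict(backup)
for account, new_balance in log[log_position_at_backup:]:
    database[account] = new_balance
```

After recovery the database again shows savings 70 and current 30, so the two changes committed after the backup were not lost.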
[Ask: have you noticed that when you create a database on SQL Server, there are two
file locations specified? One is for the data itself and the other is for the log file. They
can be put in different drives, so if the data drive crashes, you may still have the log
file.]
10.4 Concurrency Control
Concurrent transactions are transactions that are running at the same time. A DBMS
should be able to handle this and still maintain the isolation property of transactions.
What isolation actually means:
modifications made by concurrent transactions must be isolated from modifications
made by any other concurrent transactions.
Let us suppose there are two transactions, A and B, occurring concurrently and using
the same tables, then transaction A should see the data in the state it was in before
transaction B is carried out or after transaction B was completed. Transaction A
should not see the data in any intermediate state (e.g. after one update but before
another one).
This is because a transaction can make changes to data and it can consist of several
different operations to make those changes. Suppose Trans A is making changes. If
Trans B sees the data while A is executing, it may read data before A changes it. Then
A completes, but B has read some data that is now incorrect.
e.g. for banking: cannot have another transaction trying to take money out of the
savings account while money is being moved to the current account. What if there
will be no money left in the account after the transaction completes?
Let's look at a simple example to show what can happen when transactions run
concurrently. Draw a time line (T1-T6) and a series to show 2 transactions running at
the same time, operating on the same data.
The text in parentheses is information you (the teacher) can talk about/add after
drawing the diagram.
[Possible class activity to do for this, to involve the students and to help your visual
and kinaesthetic learners [1]. You can have students act out the process of the different
transactions and what they do.
Make some cards: one saying Transaction A, one saying Transaction B. Get a
volunteer to hold up each one, or stick on the wall or board. Then have a student for
each transaction. Take the steps written in the table for each transaction at each time
(T1, T2 etc) and write them on pieces of card/paper as instructions. Have something to
be the database e.g. written on the board or another student holding the value of X in
the db. Then hand each 'transaction' student a piece of paper with an instruction on it:
give the T1 instructions first, let them do the action and then continue with T2 and
so on. Remember that each transaction reads data into its own buffer, so the value it
has in its buffer doesn't change if the value in the db changes i.e. if Trans A writes X,
Trans B does not get the new value until it does a new read of X.
Hopefully this will help students to understand how concurrent transactions operate
and to see how a lost update can occur.]
[1] One learning theory says that learners are visual, auditory or kinesthetic. In short, this means some
people learn best by seeing diagrams, pictures etc; some by listening and some by doing things. Most
people are visual or auditory but some students will be kinesthetic. They'll be bored by plain old chalk
& talk teaching!
Time  Transaction A                        Transaction B
T1    read X (result: 1)
T2    x := X + 50 (result: x=51)           read X (result: 1, because Trans A has
                                           not yet written the new value of x)
T3    write X (X in db now has value 51)   X := X + 20 (result: 21)
T4                                         write X (X in db now has value 21)
T5    Commit transaction
T6                                         Commit transaction
Q: What is the value of X after T6?
A: 21.
Q: What should it be?
A: 1 + 20 + 50 = 71.
If this was your bank account, would you be happy?!
This is called a lost update: the update made by Transaction A was lost.
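The timeline can be replayed step by step in Python. This is only a simulation of the interleaving: two plain variables stand in for the two transactions' private buffers (the names are invented), and no real concurrency is involved.

```python
db = {"X": 1}

# Each transaction works on a private copy (its buffer) of what it read.
buffer_a = db["X"]          # T1: A reads X (1)
buffer_a = buffer_a + 50    # T2: A computes 51 ...
buffer_b = db["X"]          # T2: ... while B reads X, which is still 1
db["X"] = buffer_a          # T3: A writes 51
buffer_b = buffer_b + 20    # T3: B computes 21 from its stale copy
db["X"] = buffer_b          # T4: B writes 21, overwriting A's update

lost_update_result = db["X"]

# Run one after the other, B reads A's written value instead.
db = {"X": 1}
db["X"] = db["X"] + 50      # Transaction A, start to finish
db["X"] = db["X"] + 20      # Transaction B, start to finish
serial_result = db["X"]
```

The interleaved run ends with X = 21, while running the two transactions one after the other gives the expected 71.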
Uncommitted dependency (or dirty read) example:
Time  Transaction A                        Transaction B
T1    read X (result: 1)
T2    x := X - 1 (result: x=0)
T3    write X (X in db now has value 0)
T4                                         read X (result: 0, because Trans A has
                                           now written the new value of x)
T5    rollback transaction
After T5, Transaction B has an incorrect value for X. If it performs an operation e.g.
to add to X, the result will be incorrect because Transaction A rolled back its
operations, putting X back to a value of 1.
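Here is the same kind of step-by-step replay for the dirty read. Again a hypothetical model: one dict holds the last committed state, another holds the working state that Transaction A is changing, and the "add 10" at the end is an invented follow-up operation by B.

```python
committed = {"X": 1}             # last committed state
uncommitted = dict(committed)    # working state that Transaction A is changing

uncommitted["X"] = uncommitted["X"] - 1   # T1-T3: A reads X, subtracts 1, writes 0
buffer_b = uncommitted["X"]               # T4: B reads A's uncommitted value
uncommitted = dict(committed)             # T5: A rolls back, X is 1 again

b_result = buffer_b + 10                  # B adds to the value it read
correct_result = committed["X"] + 10      # what B should have computed
```

B ends up with 10 although the value that was actually committed would give 11. B depended on a change that, as far as the database is concerned, never happened.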
Incorrect summary (phantom rows) example:
Occurs when Transaction A is updating data while Transaction B is reading the data to
calculate a summary.
Time  Transaction A                        Transaction B
T1    read X (result: 1)
T2    x := X - 1 (result: x=0)
T3    write X (X in db now has value 0)
T4                                         read X (result: 0, because Trans A has
                                           now written the new value of x)
T5                                         sum = sum + X
T6                                         read Y (result: 3)
T7                                         sum = sum + Y (result: 0+3 = 3)
T8    read Y (result: 3)
T9    Y = Y - 1 (result: 2)
T10   write Y (result: Y=2 in the db)
Here, Transaction B has read X after Transaction A updated it, but read Y before
Transaction A updated it.
So Transaction B has inconsistent data. Remember that isolation means that
Transaction B should see the data in the state it was in before Transaction A modified
it, or after it modified it, not in between the two states.
10.5 Locks
All of the above problems can be avoided by using a mechanism called locks.
When a transaction is making changes to data in a table, the transaction can get a lock
on the table.
While the table is locked by the transaction, other transactions wishing to access the
data cannot do so. They must wait until the lock is released.
[You can show how this would prevent any of the previous 3 examples. In each case,
Transaction B would have to wait for Transaction A to release its lock before
accessing the data.]
This seems like a good solution. Now, transactions join a queue (a line in US
English) when they want to access data in a particular table.
When the lock is available, it is granted to the next transaction in the queue.
But consider what this means: if there are many transactions concurrently accessing
the same data, they have to wait in the queue. This can cause delays in the
applications completing their transactions.
So this can reduce the db performance, slowing things down. And with this,
transactions are not really concurrent any more. Instead they wait and a transaction is
the only one updating the data at a given point in time.
This is called serialization, where transactions access data in sequence, forming a
queue.
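A lock on a shared value behaves much the same way in ordinary program code, so the effect can be sketched with Python threads. Assumptions: a single threading.Lock stands in for the table lock, and each thread plays the role of one transaction; a DBMS lock manager is of course far more elaborate.

```python
import threading

db = {"X": 1}
table_lock = threading.Lock()  # one lock for the whole "table"

def add_to_x(amount):
    """Read, compute and write X while holding the lock, so no other transaction interleaves."""
    with table_lock:            # other transactions wait here until the lock is released
        value = db["X"]         # read
        value = value + amount  # compute in the private buffer
        db["X"] = value         # write

trans_a = threading.Thread(target=add_to_x, args=(50,))
trans_b = threading.Thread(target=add_to_x, args=(20,))
trans_a.start()
trans_b.start()
trans_a.join()
trans_b.join()

final_x = db["X"]
```

Whichever thread gets the lock first, the other has to wait for it, so the final value is always 71 and neither update is lost. The price is that the two "transactions" now run one after the other rather than at the same time.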
This is one extreme of isolation level. The other extreme is to allow transactions to
read data before it is committed by other transactions i.e. to accept the possibility of
uncommitted dependency happening.
In SQL Server, this isolation level is called read uncommitted.
It means the DBA (db administrator) is allowing more transactions to access the same
data at the same time, knowing that sometimes this will result in an uncommitted
dependency and thus some inconsistent data.
But this may be acceptable if it does not happen often.
It also depends on the data e.g. sensitive data like bank account transactions vs
customer address data. A change in the bank account balance should not be read until
it is committed; a change to the customer's address does not happen often and could
be read before committing. If the change is committed, the effect of having the data
before it changed is minimal.
A DBMS will have different levels of isolation. SQL Server has 4 levels.
The choice of isolation level is a trade-off between concurrency and data consistency.
The highest level of isolation, Serializable, gives low concurrency but high
consistency.
The lowest level, read uncommitted, gives high concurrency but low consistency.
Draw a diagram like this (don't need the grid lines):
                                    Concurrency   Consistency
Serializable (high isolation)       low           high
Repeatable Read
Read Committed
Read Uncommitted (low isolation)    high          low
10.6 Lock Types
10.7 Implicit Transactions
We have not been using begin/commit transaction when we write SQL, but SQL
Server is smart: it does something called Implicit Transactions. It does these for
certain statements:
Alter, create, drop, delete, insert, update, select.
An implicit transaction occurs automatically. So if you do an insert, if it creates an
error (e.g. you try to put a character value into an int column), SQL Server
automatically rolls back the transaction.
This default transaction mode is called Autocommit Transactions because it
automatically commits if the statement execution is successful and it automatically
rolls back if the statement fails.
When you connect to the SQL Server through the Query Analyzer, you can choose a
different Transaction Mode. Others are Implicit Transactions and Explicit
Transactions.
[get explanations from Handout 7 pages 5/6]
Implicit Transactions:
Explicit Transactions
10.8 Demo in Lab
To demonstrate:
On one client, change the Implicit_Transactions setting to ON
Start an implicit transaction to update the name of a title:
update titles set title = 'The Busy Executive''s Database Guide updated by xxx'
where title_id = 'BU1032'
(where xxx is the student's user name).
If other clients now try to select from the titles table, they should find that the query is
taking some time, because the transaction has a lock on the table.
Commit the transaction on the first client:
Commit tran
Now the select on the other clients should complete.
11 Introduction to Security
Types of Database: didn't cover earlier in great detail
Could do:
Analytic (OLAP)
Operational (OLTP)