Unit 3 - DBMS
Query Processing & Optimization
4.0 Introduction :
- This unit mainly focuses on query processing and query optimization in a database.
- It explains query processing along with the various steps involved in it.
- It describes measures of query cost estimation.
- It explains various join operations such as nested loop join, block nested loop join, index nested loop join, merge join and hash join.
- It also describes the concepts of materialization and pipelining.
- It discusses problems based on join operations.
- It describes the concept of query optimization.
- It lists various equivalence rules used for transformation of relational expressions.
- It also explains techniques of query optimization.
- At the end, the concept of materialized view is explained.
4.1 Query Processing :
Q. Describe the steps involved in query processing. Explain the functioning of each step.
Q. List and explain the basic steps involved in query processing.
Q. Explain query processing in DBMS with a neat sketch. [S-12, 14]
- Query processing is the process of transforming a high level query written in SQL into a correct and efficient execution plan expressed in a low level language, which performs the required retrieval and manipulation in the database.
- Query processing involves a set of steps in order to access the database.
- When the database system receives a query for update or retrieval of information, the query goes through a series of query compilation steps before it is evaluated and the result is given to the user. These steps are:
(1) Parsing and Translation
(2) Optimization
(3) Evaluation
Fig. Query Processing Steps
(1) Parsing and Translation
- It mainly involves conversion of the SQL query into a relational algebra query. The user writes the query in SQL, which is a high level language, but the database engine can only understand relational algebra, so a conversion from SQL to relational algebra is required.
- This process is similar to the work performed by the parser of a compiler.
- The parser checks the syntax of the user query and verifies that the relation names appearing in the query are names of relations in the database.
- It also generates a parse tree representation of the query, which it then translates into a relational algebra expression.
(2) Optimization
- It is the process of selecting one efficient query plan out of many, with the aim of reducing the amount of time taken to execute the query.
- There are a number of ways available to execute a single query.
- Each SQL query can itself be translated into a relational algebra expression in several ways.
- Consider the SQL query

Select balance
From account
Where balance < 2500

This query can be translated into either of the following relational algebra expressions:
(1) $\sigma_{balance < 2500}(\Pi_{balance}(account))$
(2) $\Pi_{balance}(\sigma_{balance < 2500}(account))$
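The equivalence of the two expressions above can be checked on a small sample. This is a minimal sketch, assuming hypothetical account rows and modelling relations as Python lists of dicts; `select`, `project` and `as_set` are illustrative helpers, not part of any DBMS API.

```python
def select(pred, rel):
    """sigma: keep only the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]


def project(attrs, rel):
    """pi: keep only attrs and drop duplicate tuples, as relational algebra does."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def as_set(rel):
    """Compare relations as sets of tuples (tuple order is irrelevant)."""
    return {tuple(sorted(t.items())) for t in rel}


def low_balance(t):
    return t["balance"] < 2500


# Hypothetical sample data for the account relation.
account = [
    {"account_number": "A-101", "balance": 500},
    {"account_number": "A-102", "balance": 4000},
    {"account_number": "A-215", "balance": 700},
]

plan1 = select(low_balance, project(["balance"], account))   # expression (1)
plan2 = project(["balance"], select(low_balance, account))   # expression (2)
print(as_set(plan1) == as_set(plan2))  # True
```

Both plans return the same set of tuples; they differ only in how much data each intermediate step handles.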
- A query plan is a set of steps to obtain the desired output.
- For the above relational algebra expressions, the query plans below can be generated.
Fig. Query Evaluation Plan
- In order to select one query plan, the optimizer must know the cost of each plan, which is the selection parameter for it.
- Different evaluation plans for a given query can have different costs.
- The exact cost is hard to compute, since it depends on many parameters such as the actual memory available to the operation.
- It is possible to get a rough estimate of the execution cost for each operation.
(3) Evaluation
- Once the query plan is chosen, the query is evaluated with that plan by the evaluation engine, which sends the output of the query to the user.
4.2 Measures of Query Cost Estimation :
Q. Explain the measures of query cost estimation.
- The cost of query evaluation can be measured in terms of a number of different resources, including disk accesses and the CPU time to execute a query.
- In database systems, disk access is the most important cost, since disk access is slow compared to memory operations.
- Estimating the CPU time is relatively hard compared to estimating the disk access cost.
- We use the number of block transfers from disk as a measure of the actual cost. To simplify our computation of disk access cost, we assume all transfers of blocks have the same cost.
- This assumption ignores the variance arising from rotational latency and seek time.
- To get more precise numbers, we need to distinguish between sequential I/O, where the blocks read are contiguous on disk, and random I/O, where the blocks are non contiguous.
- We also need to distinguish between reads and writes of blocks, since it takes more time to write a block to disk than to read a block from disk.
- A more accurate measure would estimate:
(1) The number of seek operations performed
(2) The number of blocks read
(3) The number of blocks written
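A more accurate cost measure of this kind can be written as a weighted sum of the three counts. The sketch below assumes illustrative per-operation times (`t_seek`, `t_read`, `t_write`, in milliseconds); these values are placeholders, not measurements of any real disk.

```python
def estimated_io_time(seeks, blocks_read, blocks_written,
                      t_seek=4.0, t_read=0.1, t_write=0.2):
    """Estimated I/O time (ms) from the three counts listed above.

    The default per-operation times are hypothetical; writes are assumed
    to cost more than reads, as the text notes.
    """
    return seeks * t_seek + blocks_read * t_read + blocks_written * t_write


# Reading 100 contiguous blocks needs one seek; 100 scattered blocks need 100.
sequential = estimated_io_time(seeks=1, blocks_read=100, blocks_written=0)
random_io = estimated_io_time(seeks=100, blocks_read=100, blocks_written=0)
print(sequential, random_io)
```

Under these assumptions the random case is dominated by seek time, which is why sequential and random I/O are distinguished.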
4.3 Selection Operation
- In query processing, the file scan is the lowest level operator to access data. File scans are search algorithms that locate and retrieve records that fulfill a selection condition.
4.3.1 Basic Algorithms
Q. Explain two basic algorithms to implement the selection operation.
- Consider a selection operation on a relation whose tuples are stored together in one file. The two scan algorithms used to implement the selection operation are:
(a) Linear Search :
- In linear search the system scans each file block and tests all records to see whether they satisfy the selection condition.
- For a selection on a key attribute, the system can terminate the scan as soon as the required record is found, without looking at the other records of the relation.
- The cost of linear search in terms of number of I/O operations is br, where br denotes the number of blocks in the file (br/2 on average for a selection on a key attribute).
(b) Binary Search :
- If the file is ordered on an attribute and the selection condition is an equality comparison on that attribute, we can use a binary search to locate the records that satisfy the selection.
- The system performs the binary search on the blocks of the file.
- The number of blocks that need to be examined to find a block containing the required records is ⌈log2(br)⌉, where br denotes the number of blocks in the file.

4.3.2 Selection Using Indices :
- Index structures are referred to as access paths, since they provide a path through which data can be located and accessed.
- A primary index is an index that allows the records of a file to be read in an order that corresponds to the physical order in the file.
- Ordered indices such as B+ trees also permit access to tuples in a sorted order, which is useful for implementing range queries.
- Indices can provide fast, direct and ordered access.
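The two basic algorithms of Section 4.3.1 can be compared by counting block reads. This is a sketch under stated assumptions: a file is modelled as a Python list of blocks, each block a sorted list of key values, and `make_file`, `linear_search` and `binary_search` are hypothetical helpers.

```python
import math


def make_file(keys, records_per_block):
    """Split sorted keys into blocks (a hypothetical file layout)."""
    return [keys[i:i + records_per_block]
            for i in range(0, len(keys), records_per_block)]


def linear_search(blocks, key, key_attribute=True):
    """Scan every block; stop early only for a selection on a key attribute.

    Returns (matching records, blocks read).
    """
    matches, reads = [], 0
    for block in blocks:
        reads += 1
        for rec in block:
            if rec == key:
                matches.append(rec)
                if key_attribute:
                    return matches, reads
    return matches, reads


def binary_search(blocks, key):
    """Binary search on the blocks of a file ordered on key.

    Returns (found, blocks read).
    """
    lo, hi, reads = 0, len(blocks) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]
        reads += 1
        if key < block[0]:
            hi = mid - 1
        elif key > block[-1]:
            lo = mid + 1
        else:
            return key in block, reads
    return False, reads


blocks = make_file(list(range(0, 2000, 2)), records_per_block=10)   # br = 100
print(linear_search(blocks, 1998))                                  # ([1998], 100)
print(binary_search(blocks, 1998), math.ceil(math.log2(len(blocks))))  # (True, 7) 7
```

With br = 100, the linear scan for a key in the last block reads all 100 blocks, while the binary search reads ⌈log2(100)⌉ = 7 blocks.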
4.4 Sorting
Q. Explain the external sort merge algorithm.
- Sorting of data plays an important role in database systems for two reasons.
- First, SQL queries can specify that the output be sorted.
- Second, several of the relational operations, such as joins, can be implemented efficiently if the input relations are first sorted.
- We can sort a relation by building an index on the sort key and then using that index to read the relation in sorted order. However, such a process orders the relation only logically through an index, rather than physically.
- For relations that fit in memory, techniques like quick sort can be used, but for relations that don't fit in memory, external sort merge is a good choice.
Let M denote memory size (in pages).
Creation of Runs :
  i = 0;
  Repeat
    Read M blocks of the relation into memory
    Sort the in-memory part of the relation
    Write the sorted data to run file Ri
    i = i + 1
  Until end of relation
Merging of Runs :
  We assume that N (total no. of runs) < M (memory size).
  Use N blocks of memory to buffer input runs and 1 block to buffer output.
  Read the first block of each run into its buffer page.
  Repeat
    Select the first record (in sort order) among all buffer pages.
    Write the record to the output buffer. If the output buffer is full, write it to disk.
    Delete the record from its input buffer page.
    If the buffer page becomes empty
      then read the next block (if any) of the run into the buffer.
  Until all input buffer pages are empty.
If N ≥ M, several merge passes are required.
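The creation of runs and the merge passes described above can be sketched in Python. For simplicity this assumes one record per block (as in the figure below), so M records stand for M blocks. `create_runs`, `merge_runs` and `external_sort` are illustrative names, and `heapq` is used to pick the smallest front record among the input buffers.

```python
import heapq


def create_runs(relation, M):
    """Read M blocks at a time, sort them in memory and write out a run."""
    return [sorted(relation[i:i + M]) for i in range(0, len(relation), M)]


def merge_runs(runs):
    """Merge sorted runs by repeatedly taking the smallest front record."""
    heap = [(run[0], idx, 0) for idx, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        value, idx, pos = heapq.heappop(heap)
        out.append(value)                         # write to the output buffer
        if pos + 1 < len(runs[idx]):              # next record of the same run
            heapq.heappush(heap, (runs[idx][pos + 1], idx, pos + 1))
    return out


def external_sort(relation, M):
    """Return (sorted records, number of merge passes)."""
    if M < 3:
        raise ValueError("need at least 2 input buffers and 1 output buffer")
    runs = create_runs(relation, M)
    passes = 0
    while len(runs) > 1:
        # M - 1 input buffers and 1 output buffer: merge M - 1 runs at a time.
        runs = [merge_runs(runs[i:i + M - 1])
                for i in range(0, len(runs), M - 1)]
        passes += 1
    return (runs[0] if runs else []), passes


data = [24, 19, 31, 33, 14, 16, 21, 3, 2, 7, 14, 21, 7]   # hypothetical sort keys
result, passes = external_sort(data, M=3)
print(result, passes)
```

Each merge pass combines at most M - 1 runs, leaving one buffer block for output, so more passes are needed when there are many runs.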
- The below figure illustrates the steps of external sort merge on an example relation. For illustration we assume that only one tuple fits in a block and that memory holds at most 3 page frames. During the merge stage, 2 page frames are used for input and 1 for output.
Fig. Merging Process
4.5 Join Operation
- This section discusses several algorithms for computing the join of relations and analyzes their cost.
- It covers the following join operations:
(1) Nested loop join
(2) Block nested loop join
(3) Index nested loop join
(4) Merge join
(5) Hash join
4.5.1 Nested loop join
- This algorithm is called nested loop join as it basically consists of a pair of nested for loops.
- This algorithm is used to compute the theta join $r \bowtie_{\theta} s$ of two relations r and s.
- The relation r is called the outer relation and s is called the inner relation.
- The nested loop join algorithm is expensive, as it examines every pair of tuples in the two relations.
Algorithm :
for each tuple tr in r do begin
  for each tuple ts in s do begin
    test pair (tr, ts) to see if they satisfy the join condition θ
    if they do, add tr · ts to the result
  end
end
- The computation cost of the above algorithm can be calculated as follows.
- In the worst case, the buffer can hold only one block of each relation, and a total of nr * bs + br block accesses would be required:
Total block access = nr * bs + br
Here nr = number of tuples in relation r
bs = no. of blocks of relation s
br = no. of blocks of relation r
- In the best case, there is enough space for both relations to fit in memory, so each block has to be read only once; hence
Total block access = br + bs
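A minimal sketch of the tuple-at-a-time algorithm, with a counter for block reads under the worst-case assumption (one buffer block per relation). Relations are modelled as lists of blocks of tuples, and `nested_loop_join` is an illustrative name.

```python
def nested_loop_join(r_blocks, s_blocks, theta):
    """Tuple-at-a-time nested loop join; r is the outer relation, s the inner.

    Block reads are counted assuming the buffer holds only one block of each
    relation (the worst case), so s is re-read for every tuple of r.
    """
    result, block_reads = [], 0
    for r_block in r_blocks:
        block_reads += 1                      # read one block of r
        for tr in r_block:
            for s_block in s_blocks:
                block_reads += 1              # read one block of s
                for ts in s_block:
                    if theta(tr, ts):
                        result.append(tr + ts)    # concatenation tr . ts
    return result, block_reads


r = [[(1, "a"), (2, "b")], [(3, "c"), (4, "d")]]   # nr = 4, br = 2 (hypothetical)
s = [[(1, "x")], [(3, "y")], [(5, "z")]]           # bs = 3
result, reads = nested_loop_join(r, s, lambda tr, ts: tr[0] == ts[0])
print(result, reads)   # [(1, 'a', 1, 'x'), (3, 'c', 3, 'y')] 14
```

Here nr = 4, br = 2 and bs = 3, so the count is 4 * 3 + 2 = 14, matching nr * bs + br.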
4.5.2 Block Nested Loop Join
- If the buffer is too small to hold either relation entirely in memory, we can still obtain a major saving in block accesses if we process the relations on a per block basis rather than on a per tuple basis.
- Block nested loop join is a variation of nested loop join where every block of the inner relation is paired with every block of the outer relation.
for each block Br of r do begin
  for each block Bs of s do begin
    for each tuple tr in Br do begin
      for each tuple ts in Bs do begin
        test pair (tr, ts) to see if they satisfy the join condition
        if they do, add tr · ts to the result
      end
    end
  end
end
- Computation cost of the above algorithm :
- Worst case
In the worst case, there will be
Total block access = br * bs + br
where br and bs denote the number of blocks containing records of r and s respectively.
- Best case
Total block access = br + bs
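The block oriented variant changes only the loop order, so the inner relation is scanned once per outer block instead of once per outer tuple. The modelling assumptions are the same as for the previous sketch; `block_nested_loop_join` is an illustrative name.

```python
def block_nested_loop_join(r_blocks, s_blocks, theta):
    """Block nested loop join; counts block reads with one buffer block per relation."""
    result, block_reads = [], 0
    for r_block in r_blocks:
        block_reads += 1                   # each block of r is read once
        for s_block in s_blocks:
            block_reads += 1               # s is scanned once per block of r
            for tr in r_block:
                for ts in s_block:
                    if theta(tr, ts):
                        result.append(tr + ts)
    return result, block_reads


r = [[(1, "a"), (2, "b")], [(3, "c"), (4, "d")]]   # br = 2 (hypothetical)
s = [[(1, "x")], [(3, "y")], [(5, "z")]]           # bs = 3
result, reads = block_nested_loop_join(r, s, lambda tr, ts: tr[0] == ts[0])
print(result, reads)   # [(1, 'a', 1, 'x'), (3, 'c', 3, 'y')] 8
```

With br = 2 and bs = 3 the count is 2 * 3 + 2 = 8, matching br * bs + br.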
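4.5.3 Indexed Nested Loop Join
- In a nested loop join, if an index is available on the inner relation's join attribute, index lookups can replace file scans.
- For each tuple tr in the outer relation r, the index is used to look up the tuples in s that satisfy the join condition with tuple tr.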
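A sketch of the indexed variant, assuming an in-memory hash index (a Python dict) on the inner relation's join attribute stands in for the disk-based index a real DBMS would use. `build_index` and `indexed_nested_loop_join` are hypothetical helpers.

```python
from collections import defaultdict


def build_index(s, pos):
    """Hypothetical in-memory hash index on the join attribute of s."""
    index = defaultdict(list)
    for ts in s:
        index[ts[pos]].append(ts)
    return index


def indexed_nested_loop_join(r, index, pos):
    """For each tuple tr of the outer relation, look up matching tuples of s."""
    result = []
    for tr in r:
        for ts in index.get(tr[pos], []):     # index lookup instead of a scan of s
            result.append(tr + ts)
    return result


r = [(1, "a"), (2, "b"), (3, "c")]
s = [(1, "x"), (3, "y"), (3, "z")]
index = build_index(s, 0)
joined = indexed_nested_loop_join(r, index, 0)
print(joined)   # [(1, 'a', 1, 'x'), (3, 'c', 3, 'y'), (3, 'c', 3, 'z')]
```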
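4.5.4 Merge Join
- Merge join can be used to compute natural joins and equi joins. Both relations are sorted on the join attributes and then merged, so each block needs to be read only once during the merge.
Best case (relations already sorted) :
Total block access = br1 + br2
Worst case (both relations must first be sorted) :
Total block access for r2 = br2 * [2 log2(br2/3)] + 1
Total block access for r1 = br1 * [2 log2(br1/3)] + 1
Total block access required = (block access for r2) + (block access for r1)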
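The merge step can be sketched as follows, assuming both inputs are already sorted on the join attribute and fit in Python lists. Groups of tuples with equal join values are paired with each other. `merge_join` is an illustrative name, and the sorting phase is omitted.

```python
from operator import itemgetter


def merge_join(r, s, key):
    """Equi join of r and s, both already sorted on key(tuple)."""
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        kr, ks = key(r[i]), key(s[j])
        if kr < ks:
            i += 1
        elif kr > ks:
            j += 1
        else:
            # Find the group of s tuples with this join value ...
            j_end = j
            while j_end < len(s) and key(s[j_end]) == kr:
                j_end += 1
            # ... and pair it with every r tuple that has the same value.
            while i < len(r) and key(r[i]) == kr:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result


r = [(1, "a"), (3, "b"), (3, "c"), (5, "d")]   # sorted on the first attribute
s = [(3, "x"), (3, "y"), (4, "z"), (5, "w")]
joined = merge_join(r, s, itemgetter(0))
print(joined)
```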
4.5.5 Hash Join
- Like the merge join algorithm, the hash join algorithm can be used to implement natural joins and equi joins.
- In the hash join algorithm, a hash function h is used to partition the tuples of both relations. The basic idea is to partition the tuples of each of the relations into sets that have the same hash value on the join attributes.
- In best case :
Total block access = 3 (br + bs) or 3 (br1 + br2)
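A sketch of a partitioned hash join for an equi join, assuming everything fits in memory so the partitioning phase only illustrates the idea. In the disk-based version, partitioning reads and writes both relations once and the build and probe phase reads them again, which is where the 3 (br + bs) estimate comes from. `hash_join` and `n_partitions` are illustrative names.

```python
from collections import defaultdict
from operator import itemgetter


def hash_join(r, s, r_key, s_key, n_partitions=4):
    """Equi join of r and s using partitioning with a hash function h."""
    def h(value):
        return hash(value) % n_partitions

    # Partitioning phase: tuples with equal join values land in the same partition.
    r_parts, s_parts = defaultdict(list), defaultdict(list)
    for tr in r:
        r_parts[h(r_key(tr))].append(tr)
    for ts in s:
        s_parts[h(s_key(ts))].append(ts)

    # Build and probe phase, one pair of partitions at a time.
    result = []
    for p in range(n_partitions):
        build = defaultdict(list)                 # in-memory hash table on s_p
        for ts in s_parts[p]:
            build[s_key(ts)].append(ts)
        for tr in r_parts[p]:                     # probe with tuples of r_p
            for ts in build.get(r_key(tr), []):
                result.append(tr + ts)
    return result


r = [(1, "a"), (2, "b"), (3, "c"), (3, "d")]
s = [(3, "x"), (1, "y"), (7, "z")]
joined = sorted(hash_join(r, s, itemgetter(0), itemgetter(0)))
print(joined)   # [(1, 'a', 1, 'y'), (3, 'c', 3, 'x'), (3, 'd', 3, 'x')]
```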
4.6 Materialization :
Q. What is materialization? Explain it with the help of an example.
- To evaluate a query containing multiple operations, one way is to evaluate one operation at a time in a specific order, and the result of each evaluation is materialized in a temporary relation for subsequent use.
- Materialization is the process of storing the output of an operation in a temporary relation for processing by the next operation.
- The materialization process starts from the lowest level operations in the expression, which are at the bottom of the query tree.
- Consider the below relational algebra query :
$\Pi_{customer\_name}(\sigma_{balance < 2500}(account) \bowtie customer)$
- The pictorial representation of the above query is shown below.
Fig. Pictorial representation of query
- In order to evaluate the above query, we apply materialization, which starts from the lowest level operation, that is, the selection operation on the account relation.
- The result obtained by the selection operation is stored in a temporary relation, which is used as input to the operation at the next level in the tree.
- Here the next level operation is the join, which takes as input the temporary relation created before and the customer relation, and produces another temporary relation. This is given as input to the projection operation at the root of the tree, and hence the complete query is evaluated.
- This type of query evaluation is called materialized evaluation, since the results of each intermediate operation are created (materialized) and then used for evaluation of the next level operation.
- The main problem with this approach is the need to construct temporary relations, which must be written to disk.
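Materialized evaluation of a query of this shape can be sketched with each operator writing its complete output to a temporary file before the next operator reads it back. The rows, the join attribute `customer_name` and the helpers `materialize` and `read_relation` are assumptions for illustration.

```python
import json
import os
import tempfile


def materialize(rows, tmpdir, name):
    """Write an operator's complete output to a temporary relation on disk."""
    path = os.path.join(tmpdir, name + ".json")
    with open(path, "w") as f:
        json.dump(rows, f)
    return path


def read_relation(path):
    with open(path) as f:
        return json.load(f)


# Hypothetical rows; customer_name as the join attribute is an assumption.
account = [{"customer_name": "Hayes", "balance": 400},
           {"customer_name": "Jones", "balance": 3000},
           {"customer_name": "Smith", "balance": 1200}]
customer = [{"customer_name": "Hayes", "city": "Harrison"},
            {"customer_name": "Jones", "city": "Harrison"},
            {"customer_name": "Smith", "city": "Rye"}]

with tempfile.TemporaryDirectory() as tmpdir:
    # Lowest level: selection on account, stored as temporary relation temp1.
    temp1 = materialize([t for t in account if t["balance"] < 2500], tmpdir, "temp1")
    # Next level: join reads temp1 back and stores its output as temp2.
    temp2 = materialize([{**a, **c} for a in read_relation(temp1) for c in customer
                         if a["customer_name"] == c["customer_name"]],
                        tmpdir, "temp2")
    # Root: projection over temp2 gives the final result.
    answer = sorted({t["customer_name"] for t in read_relation(temp2)})
print(answer)  # ['Hayes', 'Smith']
```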
4.7 Pipelining :
Q. Explain the pipelining process.
- Pipelining is the process of passing the result of one relational operator to another operator directly, without storing it in a temporary relation.
- Pipelining is used to improve the performance of queries. As we know, in materialized evaluation the results of intermediate algebra operations are stored on secondary storage temporarily.
- This process of temporarily writing the results of intermediate algebra operations is called materialization.
- The materialization process starts from the lowest level operation on the input relations at the bottom of the query tree, and the output is stored in a temporary relation, which is then used to execute the operation at the next level of the tree. The same process is repeated; finally the operation at the root of the tree is evaluated and we get the final result.
- The efficiency of query evaluation can be improved by reducing the number of temporary files produced. So several relational operations are combined into a pipeline of operations, in which the result of one operation is pipelined to another operation without storing it in a temporary relation.
- A pipeline is implemented as a separate process within the DBMS.
- Each pipeline takes a stream of tuples from its inputs and produces a stream of tuples as its output.
- A buffer is created for each pair of adjacent operations to hold the tuples being passed from the first operation to the second one.
- Pipelining eliminates the cost of reading and writing temporary relations.
Example of Pipelining
- Consider a query of the form $\sigma_{\theta}(\Pi_{L}(account))$, where a selection is applied to a projection of the account relation.
- In this query pipelining can be used: the projection operation is performed first, and its output is given directly as input to the selection operation instead of being stored in a temporary relation.
Pipelines can be executed in either of two ways :
(1) Demand driven
(2) Producer driven
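4.7.1 Demand Driven Pipelining :
- In demand driven pipelining, the system makes repeated requests for tuples from the operation at the top of the pipeline.
- Each time an operation receives a request for tuples, it computes the next tuple to be returned and then returns that tuple.
- The system keeps track of what has been returned so far.
- If the operation has pipelined inputs, it also makes requests for tuples from its pipelined inputs. Using the tuples received from its pipelined inputs, the operation computes tuples for its output and passes them up to its parent.
- Demand driven pipelining involves pulling data up from the operation at the top.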
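Python generators behave like demand driven operators: nothing is computed until the consumer asks for the next tuple. In this sketch (hypothetical data and operator names; `project` here does not remove duplicates), the `reads` list records which tuples the scan has actually fetched.

```python
def scan(relation, log):
    """Leaf operator: fetches a stored tuple only when asked for the next one."""
    for t in relation:
        log.append(t["id"])      # record each tuple actually read
        yield t


def select(pred, child):
    """Requests tuples from its pipelined input and passes qualifying ones up."""
    for t in child:
        if pred(t):
            yield t


def project(attrs, child):
    """Passes up only the listed attributes (no duplicate elimination here)."""
    for t in child:
        yield tuple(t[a] for a in attrs)


account = [{"id": 1, "balance": 500},
           {"id": 2, "balance": 4000},
           {"id": 3, "balance": 700}]
reads = []
top = project(["id"], select(lambda t: t["balance"] < 2500, scan(account, reads)))

first = next(top)               # the system requests one tuple from the top
reads_after_first = list(reads)
rest = list(top)                # pull the remaining tuples
print(first, reads_after_first, rest, reads)   # (1,) [1] [(3,)] [1, 2, 3]
```

After the first request only one tuple has been read by the scan; the rest are pulled only when more output is requested.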
4.7.2 Producer Driven Pipelining :
- In producer driven pipelining, operations do not wait for requests to produce tuples, but generate the tuples eagerly.
- Each operation at the bottom of a pipeline continually generates output tuples and puts them in its output buffer until the buffer is full.
- An operation at any other level of the pipeline generates output tuples when it gets input tuples from the lower operation in the pipeline, until its output buffer is full.
- Once the operation uses a tuple from a pipelined input, it removes the tuple from its input buffer.
- Once the output buffer is full, the operation waits until its parent operation removes tuples from the buffer, so that the buffer has space for more tuples.
- At this point, the operation generates more tuples, until the buffer is full again.
- The operation repeats this process until all the output tuples have been generated.
- Producer driven pipelining involves pushing data up from below.
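Producer driven pipelining can be sketched with one thread per operation and a bounded queue as the buffer between each pair of adjacent operations: `put()` blocks while the buffer is full, which mirrors an operation waiting for its parent to remove tuples. The data, the `DONE` marker and the buffer size of 2 are assumptions.

```python
import queue
import threading

DONE = object()   # end-of-stream marker


def scan_producer(relation, out_buf):
    """Bottom operation: generates tuples without waiting for requests.
    put() blocks while the output buffer is full."""
    for t in relation:
        out_buf.put(t)
    out_buf.put(DONE)


def selection(pred, in_buf, out_buf):
    """Takes tuples from its input buffer and pushes qualifying ones upward."""
    while True:
        t = in_buf.get()              # the tuple is removed from the input buffer
        if t is DONE:
            out_buf.put(DONE)
            return
        if pred(t):
            out_buf.put(t)


balances = [500, 4000, 700, 2600, 100]                       # hypothetical input
buf1, buf2 = queue.Queue(maxsize=2), queue.Queue(maxsize=2)  # one buffer per pair
workers = [
    threading.Thread(target=scan_producer, args=(balances, buf1)),
    threading.Thread(target=selection, args=(lambda b: b < 2500, buf1, buf2)),
]
for w in workers:
    w.start()

result = []
while (t := buf2.get()) is not DONE:     # the consumer at the top of the pipeline
    result.append(t)
for w in workers:
    w.join()
print(result)   # [500, 700, 100]
```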
Problem Based On Join Operation :
Q. Let relations r1(A, B, C) and r2(C, D, E) have the following properties: r1 has 20,000 tuples and r2 has 45,000 tuples; 25 tuples of r1 fit on 1 block and 30 tuples of r2 fit on 1 block. Estimate the number of block accesses required using each of the following join strategies for $r1 \bowtie r2$ :
(a) Nested loop join
(b) Block nested loop join
(c) Merge join
(d) Hash join
[S-10, 11, 13]
No. of blocks for r1 = 20,000 / 25 = 800 blocks (br1)
No. of blocks for r2 = 45,000 / 30 = 1500 blocks (br2)
(a) For Nested Loop Join :
Here r1 is the outer relation and r2 is the inner relation.
Worst Case :
Total block access = nr1 * br2 + br1
Here nr1 = no. of tuples of r1,
br1 and br2 = no. of blocks of r1 and r2
Total block access = 20,000 * 1500 + 800
= 30,000,800
Best Case :
Total block access = br1 + br2
= 800 + 1500
= 2300
(b) For Block Nested Loop Join :
Worst Case :
Total block access = br1 * br2 + br1
= 800 * 1500 + 800
= 1,200,800
Best Case :
Total block access = br1 + br2
= 800 + 1500
= 2300
(c) For Merge Join :
Best Case :
Total block access = br1 + br2
= 800 + 1500
= 2300
Worst Case :
Total block access for r2 = br2 * [2 log2(br2/3)] + 1
= 1500 * [2 log2(1500/3)] + 1
= 26898.35
Total block access for r1 = br1 * [2 log2(br1/3)] + 1
= 800 * [2 log2(800/3)] + 1
= 12895.23
Total block access required = 26898.35 + 12895.23
= 39793.58
≈ 39794
(d) For Hash Join :
Total block access = 3 (br1 + br2)
= 3 (800 + 1500)
= 6900
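The arithmetic above can be checked with a small calculator. It simply encodes the formulas used in this unit (including the merge join worst case formula exactly as written above), so it inherits their assumptions; `join_costs`, `block_count` and `sort_cost` are illustrative names.

```python
import math


def block_count(n_tuples, tuples_per_block):
    """No. of blocks = no. of tuples / no. of tuples per block (rounded up)."""
    return math.ceil(n_tuples / tuples_per_block)


def sort_cost(b):
    """Worst case sorting cost for b blocks, as written in this unit."""
    return b * (2 * math.log2(b / 3)) + 1


def join_costs(nr1, fr1, nr2, fr2):
    """Block access estimates for r1 join r2, with r1 as the outer relation."""
    br1, br2 = block_count(nr1, fr1), block_count(nr2, fr2)
    return {
        "nested_loop_worst": nr1 * br2 + br1,
        "block_nested_loop_worst": br1 * br2 + br1,
        "best_case": br1 + br2,   # nested loop, block nested loop and merge join
        "merge_join_worst": math.ceil(sort_cost(br1) + sort_cost(br2)),
        "hash_join": 3 * (br1 + br2),
    }


costs1 = join_costs(20_000, 25, 45_000, 30)   # this problem
costs2 = join_costs(10_000, 25, 5_000, 50)    # the next problem
print(costs1)
print(costs2)
```

The second call reproduces the figures of the next problem.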
Q. Let relations r1(A, B, C) and r2(C, D, E) have the following properties: r1 has 10,000 tuples and r2 has 5000 tuples; 25 tuples of r1 fit on one block and 50 tuples of r2 fit on one block. Estimate the number of block accesses required using each of the following join strategies for $r1 \bowtie r2$ :
(a) Nested loop join
(b) Block nested loop join
(c) Merge join
(d) Hash join
- Calculate the no. of blocks required for each relation using the formula :
No. of blocks required for relation = No. of tuples in relation / No. of tuples in 1 block
br1 = 10,000 / 25 = 400
br2 = 5000 / 50 = 100
(a) For Nested Loop Join :
Worst Case :
Total block access = nr1 * br2 + br1
= 10,000 * 100 + 400
= 1,000,400
Best Case :
Total block access = br1 + br2
= 400 + 100
= 500
(b) For Block Nested Loop Join :
Worst Case :
Total block access = br1 * br2 + br1
= 400 * 100 + 400
= 40,400
Best Case :
Total block access = br1 + br2
= 400 + 100
= 500
(c) For Merge Join :
Best Case :
Total block access = br1 + br2
= 400 + 100
= 500
Worst Case :
Total block access for r2 = br2 * [2 log2(br2/3)] + 1
= 100 * [2 log2(100/3)] + 1
= 1012.78
Total block access for r1 = br1 * [2 log2(br1/3)] + 1
= 400 * [2 log2(400/3)] + 1
= 5648.11
Total block access required = 1012.78 + 5648.11
= 6660.89
≈ 6661
(d) For Hash Join :
Best Case :
Total block access = 3 (br1 + br2)
= 3 (400 + 100)
= 1500
4.8 Query Optimization :
Q. Explain the query optimization process in detail. [S-12]
Q. Explain query optimization. [W-12]
- It is the process of selecting the most efficient query evaluation plan from the many strategies used for processing a given query, especially when the query is complex.
- Generally, we cannot expect users to write their queries so that they can be processed efficiently; rather, we expect the system to construct a query evaluation plan that minimizes the cost of query evaluation.
- Query optimization occurs at the relational algebra level, where the system attempts to find an expression that is equivalent to the given expression but more efficient to execute.
- The query optimizer takes a relational algebra expression as input and produces an efficient query plan as output.
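- Consider the following relational algebra expression, which finds the names of depositors of accounts at branches located in Nagpur :
$\Pi_{name}(\sigma_{Bcity = "Nagpur"}(branch \bowtie (account \bowtie depositor)))$
- We are interested only in those tuples which are related to branches in Nagpur, so we can first select the tuples of the branch relation located in Nagpur, giving the expression
$\Pi_{name}((\sigma_{Bcity = "Nagpur"}(branch)) \bowtie (account \bowtie depositor))$
- The above expression is advantageous since the selection reduces the branch relation before it is combined with the account and depositor tables, and only then is the join operation performed.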
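- As there are many equivalent transformations of the same expression, the role of the query optimizer is to choose the one that minimizes resource usage.
- It is the role of the query optimizer to come up with a query evaluation plan that computes the same result as the given expression and is the least costly way of generating the result.
- Computing the precise cost of evaluation of a plan is usually not possible without actually evaluating the plan. Instead, optimizers use statistical information about the relations, such as relation sizes and index depths, to estimate the cost of a plan.
- Disk access is slow compared to memory access and usually dominates the cost of processing a query.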
- Using these statistics with cost formulae allows us to estimate the cost of individual operations.
- The individual costs are combined to determine the estimated cost of evaluating a given relational algebra expression.
- Generation of query evaluation plans for an expression involves :
+ Generating expressions that are logically equivalent to the given expression.
+ Annotating the resultant expressions to get alternative query plans.
4.9 Transformation of Relational Expressions
- Two relational algebra expressions are said to be equivalent if, on every legal database instance, the two expressions generate the same set of tuples.
- The order of the tuples is irrelevant, which means two expressions may generate tuples in different orders but would be considered equivalent as long as the set of tuples is the same.
- A set of equivalence rules can be used to generate another expression equivalent to the previous one.
4.9.1 Equivalence Rules :
Q. Explain various equivalence rules for transformation of relational expressions and give their pictorial representation.
- An equivalence rule says that expressions of two forms are equivalent, which means we can replace an expression of the first form by an expression of the second form or vice versa, since the two expressions would generate the same result on any valid database.
- The optimizer uses equivalence rules to transform expressions into other logically equivalent expressions.
- In the equivalence rules, θ, θ1 and θ2 denote predicates, L1, L2, L3 and L4 denote lists of attributes, and E, E1, E2 and E3 denote relational algebra expressions.
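Various equivalence rules are given as follows :
1. Conjunctive selection operations can be deconstructed into a sequence of individual selections.
$\sigma_{\theta_1 \wedge \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E))$
2. Selection operations are commutative.
$\sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E))$
3. Only the last in a sequence of projection operations is needed; the others can be omitted.
$\Pi_{L_1}(\Pi_{L_2}(\ldots(\Pi_{L_n}(E))\ldots)) = \Pi_{L_1}(E)$
4. Selections can be combined with Cartesian products and theta joins.
a. $\sigma_{\theta}(E_1 \times E_2) = E_1 \bowtie_{\theta} E_2$
This expression is just the definition of theta join.
b. $\sigma_{\theta_1}(E_1 \bowtie_{\theta_2} E_2) = E_1 \bowtie_{\theta_1 \wedge \theta_2} E_2$
5. Theta join operations are commutative.
$E_1 \bowtie_{\theta} E_2 = E_2 \bowtie_{\theta} E_1$
6. a. Natural join operations are associative.
$(E_1 \bowtie E_2) \bowtie E_3 = E_1 \bowtie (E_2 \bowtie E_3)$
b. Theta joins are associative in the following manner.
$(E_1 \bowtie_{\theta_1} E_2) \bowtie_{\theta_2 \wedge \theta_3} E_3 = E_1 \bowtie_{\theta_1 \wedge \theta_3} (E_2 \bowtie_{\theta_2} E_3)$
Here θ2 involves attributes from only E2 and E3.
7. The selection operation distributes over the theta join operation under the following two conditions :
a. When all the attributes in selection condition θ0 involve only the attributes of one of the expressions (say E1) being joined :
$\sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_{\theta} E_2$
b. When selection condition θ1 involves only the attributes of E1 and θ2 involves only the attributes of E2 :
$\sigma_{\theta_1 \wedge \theta_2}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_1}(E_1)) \bowtie_{\theta} (\sigma_{\theta_2}(E_2))$
8. The projection operation distributes over the theta join operation as follows :
a. If the join condition θ involves only attributes from L1 ∪ L2, then
$\Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = (\Pi_{L_1}(E_1)) \bowtie_{\theta} (\Pi_{L_2}(E_2))$
b. Consider a join $E_1 \bowtie_{\theta} E_2$. Let L1 and L2 be sets of attributes from E1 and E2 respectively. Let L3 be the attributes of E1 that are involved in join condition θ but are not in L1 ∪ L2, and let L4 be the attributes of E2 that are involved in θ but are not in L1 ∪ L2. Then
$\Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = \Pi_{L_1 \cup L_2}((\Pi_{L_1 \cup L_3}(E_1)) \bowtie_{\theta} (\Pi_{L_2 \cup L_4}(E_2)))$
9. The set operations union and intersection are commutative.
$E_1 \cup E_2 = E_2 \cup E_1$
$E_1 \cap E_2 = E_2 \cap E_1$
10. Set union and intersection are associative.
$(E_1 \cup E_2) \cup E_3 = E_1 \cup (E_2 \cup E_3)$
$(E_1 \cap E_2) \cap E_3 = E_1 \cap (E_2 \cap E_3)$
11. The selection operation distributes over ∪, ∩ and −.
$\sigma_{\theta}(E_1 - E_2) = \sigma_{\theta}(E_1) - \sigma_{\theta}(E_2)$
and similarly for ∪ and ∩ in place of −. Also
$\sigma_{\theta}(E_1 - E_2) = \sigma_{\theta}(E_1) - E_2$
and similarly for ∩ in place of −, but not for ∪.
12. The projection operation distributes over union.
$\Pi_{L}(E_1 \cup E_2) = (\Pi_{L}(E_1)) \cup (\Pi_{L}(E_2))$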
Fig. Pictorial representation of equivalences (e.g. rule 5 : theta join operations are commutative)
Example of transformation using equivalence rule :
- Consider the schema
Branch (Bname, Bcity, ...)
Account (accno, Bname, bal)
Depositor (name, accno)
- Based on the above schema, suppose the query asked is
$\Pi_{name}(\sigma_{Bcity = "Nagpur"}(branch \bowtie (account \bowtie depositor)))$
- The above relational algebra expression can be transformed to an equivalent expression using rule 7a :
$\Pi_{name}((\sigma_{Bcity = "Nagpur"}(branch)) \bowtie (account \bowtie depositor))$
- This expression generates smaller intermediate relations compared to the original expression.
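The effect of rule 7a can be seen on a tiny instance of the schema. The branch names, balances and depositor names below are made up for illustration, and `natural_join` is a hypothetical helper that joins lists of dicts on their common attribute names.

```python
def natural_join(r, s):
    """Natural join of two lists of dicts on their common attribute names."""
    result = []
    for tr in r:
        for ts in s:
            common = tr.keys() & ts.keys()
            if all(tr[a] == ts[a] for a in common):
                result.append({**tr, **ts})
    return result


def in_nagpur(t):
    return t["Bcity"] == "Nagpur"


# Hypothetical instance of Branch, Account and Depositor.
branch = [{"Bname": "Sitabuldi", "Bcity": "Nagpur"},
          {"Bname": "Deccan", "Bcity": "Pune"},
          {"Bname": "Andheri", "Bcity": "Mumbai"}]
account = [{"accno": 1, "Bname": "Sitabuldi", "bal": 900},
           {"accno": 2, "Bname": "Deccan", "bal": 1500},
           {"accno": 3, "Bname": "Andheri", "bal": 700},
           {"accno": 4, "Bname": "Sitabuldi", "bal": 300}]
depositor = [{"name": "Asha", "accno": 1}, {"name": "Ravi", "accno": 2},
             {"name": "Meena", "accno": 3}, {"name": "Kiran", "accno": 4}]

ad = natural_join(account, depositor)

# Original expression: join everything, then select.
big = natural_join(branch, ad)
original = {t["name"] for t in big if in_nagpur(t)}

# After rule 7a: select on branch first, then join.
small = natural_join([t for t in branch if in_nagpur(t)], ad)
transformed = {t["name"] for t in small}

print(original == transformed, len(big), len(small))   # True 4 2
```

Both expressions give the same names, but the transformed one joins a 2-row intermediate relation instead of a 4-row one.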
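4.10 Estimating Statistics of Expression Result
- The cost of an operation depends on the size and other statistics of its inputs. Given an expression such as $a \bowtie (b \bowtie c)$, to estimate the cost of joining a with $(b \bowtie c)$ we need estimates of statistics such as the size of $b \bowtie c$.
4.10.1 (a) Catalog Information :
The DBMS catalog stores the following statistics about database relations :
nr : The number of tuples in relation r.
br : The number of blocks containing tuples of relation r.
lr : Size of a tuple of relation r in bytes.
fr : The blocking factor of relation r, i.e. the number of tuples of relation r that fit into one block.
V(A, r) : The number of distinct values that appear in relation r for attribute A.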
4.10.2 (b) Join Size Estimation
- The Cartesian product r × s contains nr * ns tuples. Each tuple of r × s occupies lr + ls bytes, from which we can calculate the size of the Cartesian product.
- Estimating the size of a natural join is somewhat more complicated than estimating the size of a Cartesian product. Let A be the attribute common to r and s.
- By considering all tuples in r, we estimate that there are nr * ns / V(A, s) tuples in $r \bowtie s$. If we reverse the roles of r and s in the preceding estimate, we obtain an estimate of nr * ns / V(A, r) tuples in $r \bowtie s$.
Example :
- To illustrate the way of estimating join size, consider the expression $depositor \bowtie customer$, assuming the following catalog information about the 2 relations :
$n_{customer}$ = 10,000 and
$f_{customer}$ = 25, which implies $b_{customer}$ = 10,000/25 = 400
$n_{depositor}$ = 5000 and
$f_{depositor}$ = 50, which implies $b_{depositor}$ = 5000/50 = 100
- V(custname, depositor) = 2500, which implies that on average each customer has two accounts.
- Let us compute the size estimate for $depositor \bowtie customer$ without using information about foreign keys :
V(custname, depositor) = 2500 and
V(custname, customer) = 10,000
- The two estimates we get here by using the formula are
(5000 * 10,000) / 2500 = 20,000 and
(5000 * 10,000) / 10,000 = 5000
and we choose the lower one.
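The estimate can be written as a small function. This sketch assumes r and s share a single join attribute A and returns the lower of the two estimates, as the example does; the function names are illustrative.

```python
import math


def blocks_needed(n_tuples, blocking_factor):
    """b = n / f, rounded up to whole blocks."""
    return math.ceil(n_tuples / blocking_factor)


def natural_join_size(n_r, n_s, v_a_r, v_a_s):
    """Estimated number of tuples in r join s on a single common attribute A."""
    estimate_1 = n_r * n_s / v_a_s    # every tuple of r, matched against s
    estimate_2 = n_r * n_s / v_a_r    # roles of r and s reversed
    return min(estimate_1, estimate_2)


print(blocks_needed(10_000, 25), blocks_needed(5_000, 50))   # 400 100
# r = depositor, s = customer, A = custname
print(natural_join_size(5_000, 10_000, 2_500, 10_000))       # 5000.0
```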
4.11 Choice of Evaluation Plans :
- Generation of expressions is only part of the query optimization process, since each operation in the expression can be implemented with different algorithms.
- An evaluation plan therefore defines exactly what algorithm is used for each operation and how the execution of the operations is coordinated.
- Consider a relational algebra query of the form
$\Pi_{name}(\sigma_{\theta_1}(branch) \bowtie \sigma_{\theta_2}(account) \bowtie depositor)$
where θ1 and θ2 are selection conditions on branch and account.
- One possible evaluation plan for the above expression is shown in the figure below.
- In the figure, the edges from the selection operations to the merge join operation are marked as pipelined; pipelining is feasible if the selection operations generate their output sorted on the join attributes. They would do so if the indices on branch and account store records with equal values for the index attributes sorted by branch name.
Fig. An evaluation plan (merge join of the selections on branch and account, followed by a hash join with depositor)
Classification of Query Evaluation Plans
- Query evaluation plans may be classified into the following :
- Left deep tree query evaluation plan
- Right deep tree query evaluation plan
- Linear tree query evaluation plan
- Bushy (non linear) tree query evaluation plan
(1) Left deep tree query evaluation plan
- In this type of query evaluation plan, execution starts from the relations at the bottom of the tree and proceeds upward, with the left hand side input of each operation being an intermediate result and the right hand side input being a stored relation.
Fig. Left deep execution plan
(2) Right deep tree query evaluation plan
- In this type of evaluation plan, again execution starts from the relations at the bottom of the tree and proceeds upward, with the right hand side input of each operation being an intermediate result and the left hand side input being a stored relation. Its structure is shown below.
Fig. Right deep execution plan
(3) Linear tree query evaluation plan
- It is a combination of left deep and right deep trees, where the relation on one side of each operator is always a base relation. Its structure is shown below.
(4) Bushy tree query evaluation plan
- This type of plan is the most general type of plan; it allows both inputs into a binary operation to be intermediate results. Its structure is shown below.
Fig. Bushy execution plan
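The four plan shapes can be told apart mechanically. In this sketch (an assumed representation, not a DBMS data structure) a base relation is a string and a join is a pair `(left, right)`.

```python
def is_base(t):
    return isinstance(t, str)


def is_left_deep(t):
    """Every join's right input is a stored (base) relation."""
    return is_base(t) or (is_base(t[1]) and is_left_deep(t[0]))


def is_right_deep(t):
    """Every join's left input is a stored (base) relation."""
    return is_base(t) or (is_base(t[0]) and is_right_deep(t[1]))


def is_linear(t):
    """At least one input of every join is a base relation."""
    if is_base(t):
        return True
    left, right = t
    return (is_base(left) and is_linear(right)) or (is_base(right) and is_linear(left))


def classify(t):
    if is_left_deep(t):
        return "left deep"
    if is_right_deep(t):
        return "right deep"
    if is_linear(t):
        return "linear"
    return "bushy"


print(classify(((("r1", "r2"), "r3"), "r4")))   # left deep
print(classify(("r1", ("r2", ("r3", "r4")))))   # right deep
print(classify(("r1", (("r2", "r3"), "r4"))))   # linear
print(classify((("r1", "r2"), ("r3", "r4"))))   # bushy
```

A single two-relation join is both left deep and right deep; the sketch reports it as left deep.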
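4.12 Evaluation Techniques / Optimization Techniques
Q. Explain different query optimization techniques.
- One way to select an evaluation plan is simply to choose the cheapest algorithm for each operation.
- We can choose any ordering of the operations that ensures that operations at lower levels of the tree are executed before operations at higher levels of the tree.
- Choosing the cheapest algorithm for each operation independently is not necessarily a good idea. For example, although a merge join at a given level may be costlier than a hash join, it may provide a sorted output that makes evaluating a later operation cheaper.
- We can use two broad approaches :
(1) Cost based optimization : It estimates the cost of each evaluation plan using statistics, searches all the plans and chooses the best plan in a cost based fashion.
(2) Heuristic based optimization : It uses heuristics to choose a plan.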
4.12.1 Cost Based Optimization
Q. Explain the cost based optimization technique in detail. [S-12]
Q. Write a note on cost based optimization. [S-11]
- A cost based optimizer generates a range of query evaluation plans from the given query by using equivalence rules and chooses the one with the least cost.
- For a complex query, the number of different query plans that are equivalent to a given plan can be large.
- Among the generated plans, the optimizer chooses the cheapest plan based on estimated cost.
- The cost of a plan can be estimated by using the following information :
+ Statistical information about relations, such as the number of tuples and the number of distinct values for an attribute.
+ Statistics estimation for intermediate results, to compute the cost of complex expressions.
- It is important to find the optimal join order to have an efficient query. Consider the expression $r1 \bowtie r2 \bowtie r3$; for this expression 12 different join orderings are possible, such as $r1 \bowtie (r2 \bowtie r3)$, $r3 \bowtie (r1 \bowtie r2)$, and so on.
- In general, with n relations there are $\frac{(2(n-1))!}{(n-1)!}$ different join orders.
- We use a dynamic programming algorithm to find the optimal join order. Dynamic programming algorithms store the results of computations and reuse them.
- The figure below shows the dynamic programming algorithm for join order optimization.
Dynamic-programming algorithm for join order optimization
procedure FindBestPlan(S)
  if (bestplan[S].cost ≠ ∞) /* bestplan[S] already computed */
    return bestplan[S]
  if (S contains only 1 relation)
    set bestplan[S].plan and bestplan[S].cost based on the best way of accessing S
  else for each non-empty subset S1 of S such that S1 ≠ S
    P1 = FindBestPlan(S1)
    P2 = FindBestPlan(S - S1)
    A = best algorithm for joining results of P1 and P2
    cost = P1.cost + P2.cost + cost of A
    if cost < bestplan[S].cost
      bestplan[S].cost = cost
      bestplan[S].plan = "execute P1.plan; execute P2.plan; join results of P1 and P2 using A"
  return bestplan[S]
Fig. Dynamic Programming Algorithm
- This algorithm stores evaluation plans in an associative array called bestplan, which is indexed by sets of relations.
- Each element of the associative array contains two components: the cost of the best plan and the plan itself. The value of bestplan[S].cost is assumed to be initialized to ∞ if bestplan[S] has not yet been computed.
- The procedure first checks if the best plan for computing the join of the given set of relations S has been computed already; if so, it returns the already computed plan.
- Otherwise, the procedure tries every way of dividing S into 2 disjoint subsets. For each division, the procedure recursively finds the best plans for each of the two subsets and then computes the cost of the overall plan by using that division.
- The procedure picks the cheapest plan from among all alternatives for dividing S into 2 sets.
- The cheapest plan and its cost are stored in the array bestplan and returned by the procedure.
- The complexity of the procedure can be shown to be $O(3^n)$.
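A compact Python version of FindBestPlan is sketched below. The cost model is an assumption made only for illustration: accessing a base relation costs its size, a join costs the estimated size of its result, and result sizes come from hypothetical pairwise join selectivities. A Python dict plays the role of the bestplan array, so a missing entry stands in for cost ∞.

```python
from itertools import combinations


def find_best_plan(sizes, selectivity):
    """Dynamic programming join ordering in the style of FindBestPlan(S).

    sizes: relation name -> tuple count; selectivity: frozenset of two
    names -> join selectivity (a missing pair means a Cartesian product).
    Returns (cost, plan), where a plan is a name or a pair of plans.
    """
    best_plan = {}            # frozenset of relations -> (cost, plan)

    def result_size(S):
        size = 1.0
        for r in S:
            size *= sizes[r]
        for a, b in combinations(sorted(S), 2):
            size *= selectivity.get(frozenset((a, b)), 1.0)
        return size

    def find(S):
        if S in best_plan:                    # already computed
            return best_plan[S]
        if len(S) == 1:
            (r,) = S
            best_plan[S] = (float(sizes[r]), r)
            return best_plan[S]
        best = (float("inf"), None)
        members = sorted(S)
        for k in range(1, len(members)):      # every non-empty proper subset S1
            for s1 in combinations(members, k):
                S1 = frozenset(s1)
                c1, p1 = find(S1)
                c2, p2 = find(S - S1)
                cost = c1 + c2 + result_size(S)   # hypothetical cost of join A
                if cost < best[0]:
                    best = (cost, (p1, p2))
        best_plan[S] = best
        return best

    return find(frozenset(sizes))


sizes = {"r1": 1000, "r2": 10, "r3": 100}             # hypothetical tuple counts
selectivity = {frozenset(("r1", "r2")): 0.01,         # hypothetical selectivities
               frozenset(("r2", "r3")): 0.01}
cost, plan = find_best_plan(sizes, selectivity)
print(cost, plan)   # 1220.0 ('r1', ('r2', 'r3'))
```

Here joining r2 and r3 first is cheapest, because that join produces the smallest intermediate result.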
4.12.2 Heuristic Optimization :
Q. Explain heuristic optimization. [S-10, W-10, 11]
Q. Write a note on heuristic optimization. [W-12, S-11]
Q. What do you mean by heuristic optimization? Discuss the main heuristics applied during it. [S-13]
- Cost based optimization is expensive, even with dynamic programming.
- Systems may use heuristics to reduce the number of choices that must be made in a cost based fashion.
- Heuristic optimization transforms the query tree by using a set of rules that typically (but not in all cases) improve execution performance.
- Some of the heuristic rules are :
(1) Perform selection as early as possible
- It is usually better to perform selection earlier than projection, since selections have the potential to reduce the size of relations.
(2) Perform projection early
- The projection operation, like the selection, reduces the size of relations. Whenever we need to generate a temporary relation, it is beneficial to apply immediately any projections that are possible.
(3) Perform the most restrictive selection and join operations before other similar operations.
(4) Some systems use only heuristics; others combine heuristics with partial cost based optimization.
Steps in Heuristic Optimization :
(1) Deconstruct conjunctive selections into a sequence of single selection operations. It is based on Equivalence Rule 1 and facilitates moving selection operations down the query tree.
(2) Move selection operations down the query tree for the earliest possible execution. It is based on the equivalence rules that allow a selection to be commuted with, and distributed over, other operations.
(3) Execute first those selection and join operations that will produce the smallest relations. It is based on the associativity rule for joins.
(4) Replace Cartesian product operations that are followed by a selection condition with join operations. It is based on Equivalence Rule 4a. The Cartesian product is expensive to implement.
(5) Deconstruct and move as far down the tree as possible the lists of projection attributes, creating new projections where needed. It is based on the equivalence rules for projection.
(6) Identify those subtrees whose operations can be pipelined, and execute them using pipelining.
~ The heuristics listed above reorder an initial query tree representation in such a way that the operations that reduce the size of intermediate results are applied first.
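Steps (1), (2) and (4) can be illustrated with a small query-tree rewriter. This is a minimal sketch, not a full optimizer: the node classes and function names are hypothetical, each selection predicate carries the set of attributes it references, a selection is pushed into whichever child supplies all of those attributes, and a selection over a Cartesian product that needs attributes from both sides becomes a join.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rel:
    name: str
    attrs: frozenset

@dataclass(frozen=True)
class Select:
    pred: str          # predicate text, kept for display only
    uses: frozenset    # attributes referenced by the predicate
    child: object

@dataclass(frozen=True)
class Product:
    left: object
    right: object

@dataclass(frozen=True)
class Join:
    pred: str
    left: object
    right: object

def attrs(n):
    """Attributes available in the output of node n."""
    if isinstance(n, Rel):
        return n.attrs
    if isinstance(n, Select):
        return attrs(n.child)
    return attrs(n.left) | attrs(n.right)

def deconstruct(conjuncts, child):
    """Step 1: sigma_{c1 and c2 and ...}(E) becomes sigma_c1(sigma_c2(...(E)))."""
    for pred, uses in reversed(conjuncts):
        child = Select(pred, frozenset(uses), child)
    return child

def rebuild(node, left, right):
    """Recreate a binary node (product or join) with new children."""
    if isinstance(node, Product):
        return Product(left, right)
    return Join(node.pred, left, right)

def push(pred, uses, child):
    """Steps 2 and 4: place one selection as low as its attributes allow."""
    if isinstance(child, Select):
        # Selections commute, so this one may move below an existing selection.
        return Select(child.pred, child.uses, push(pred, uses, child.child))
    if isinstance(child, (Product, Join)):
        if uses <= attrs(child.left):
            return rebuild(child, push(pred, uses, child.left), child.right)
        if uses <= attrs(child.right):
            return rebuild(child, child.left, push(pred, uses, child.right))
        if isinstance(child, Product):
            # Step 4: a selection over a Cartesian product becomes a join.
            return Join(pred, child.left, child.right)
    return Select(pred, uses, child)

def optimize(n):
    """Rewrite the tree bottom-up, pushing every selection down."""
    if isinstance(n, Rel):
        return n
    if isinstance(n, Product):
        return Product(optimize(n.left), optimize(n.right))
    if isinstance(n, Join):
        return Join(n.pred, optimize(n.left), optimize(n.right))
    return push(n.pred, n.uses, optimize(n.child))

# Hypothetical query: sigma_{branch='Pune' and account.accno=depositor.accno}(account x depositor)
account = Rel("account", frozenset({"account.accno", "branch"}))
depositor = Rel("depositor", frozenset({"cname", "depositor.accno"}))
tree = deconstruct(
    [("account.accno = depositor.accno", {"account.accno", "depositor.accno"}),
     ("branch = 'Pune'", {"branch"})],
    Product(account, depositor),
)
print(optimize(tree))
```

The result is a join of sigma_{branch='Pune'}(account) with depositor: the single-relation selection has moved below the product, and the cross-relation selection has turned the product into a join.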
Advantages :
1. The access plan selection phase of the heuristic optimizer chooses the most efficient strategy for each operation.
2. It restructures the tree so that the system performs the most restrictive selection and join operations before other similar operations.
3. It considers alternative sequences of operations to produce a set of candidate evaluation plans.
4.13 Materialized View :
Q What is a materialized view? Explain it with suitable example.
Q Explain the concept of materialized view along with its maintenance.
Ans. :
~ A materialized view is a sub table created from an original table; it stores the result obtained by firing a query on the original table.
~ An ordinary view is virtual in nature, but a materialized view is physically stored on disk.
~ Materialized views are often used in data warehousing and business intelligence applications, where querying large tables with thousands of rows takes more time.
~ A materialized view helps by storing frequently accessed data of a large table.
~ A materialized view also provides faster access to data and hence enhances query performance.
~ Undoubtedly, materialization of views improves query performance, but it adds the cost of keeping the views up to date.
~ Materializing all possible views gives the best query performance, but at the highest view maintenance cost.
~ Keeping all the views virtual gives the lowest view maintenance cost but poor query performance. Hence, to balance the two, we may materialize some of the views while leaving others virtual, so as to minimize query response time in a data warehouse while keeping the cost of view maintenance low.
Syntax for creating materialized view :
Create materialized view [Materialized view name]
as select att_name from table_name;
~ For example, suppose we created table Bank with attributes as
Bank (accno, cname, adds, DOB, Bal, contno)
~ The bank needs to access only cname and accno frequently rather than adds, DOB, Bal and contno. So, in order to save time, we store accno and cname in a separate sub table called a materialized view. Accessing accno and cname from the original table takes more time as compared to accessing them from the materialized view.
- Here we create a materialized view called V1 as shown below.
Create materialized view V1
as select accno, cname from Bank;
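Systems such as PostgreSQL and Oracle accept the CREATE MATERIALIZED VIEW statement directly. SQLite, which ships with Python, does not, so the sketch below emulates V1 by storing the query result in an ordinary table with CREATE TABLE ... AS SELECT. The customer rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Bank (accno INTEGER PRIMARY KEY, cname TEXT, adds TEXT, "
    "DOB TEXT, Bal REAL, contno TEXT)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Bank VALUES (?, ?, ?, ?, ?, ?)",
    [
        (101, "Asha", "Pune", "1990-01-01", 5000.0, "9000000001"),
        (102, "Ravi", "Nagpur", "1988-05-12", 12000.0, "9000000002"),
    ],
)

# Emulated materialized view: the query result is physically stored as table V1.
conn.execute("CREATE TABLE V1 AS SELECT accno, cname FROM Bank")

rows = conn.execute("SELECT accno, cname FROM V1 ORDER BY accno").fetchall()
print(rows)  # [(101, 'Asha'), (102, 'Ravi')]
```

Queries that need only accno and cname can now read the narrow table V1 instead of scanning every column of Bank.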
- In creation of a materialized view we face two problems :
(1) View selection problem
(2) View maintenance problem
View selection problem :
Materializing all the views created on an original table is not possible because of storage space constraints, so we must select which views to materialize.
View maintenance problem :
A problem with materialized views is that they must be kept up to date when the data used in the base table changes. For example, suppose the cname of a particular customer changes in the Bank table; then the materialized view V1 becomes inconsistent and should be updated. The task of keeping a materialized view up to date with the underlying data is known as view maintenance.
~ A view can be maintained by manually written code, that is, every piece of code that updates cname in Bank must also update the materialized view V1.
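The sketch below shows both sides of manual maintenance, again emulating V1 as a stored table in SQLite. The helper `update_cname` is hypothetical application code: updates that go through it change Bank and V1 in one transaction, while an update that bypasses it leaves V1 stale.

```python
import sqlite3

def make_db():
    """Create Bank with hypothetical rows and an emulated materialized view V1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Bank (accno INTEGER PRIMARY KEY, cname TEXT, Bal REAL)")
    conn.executemany(
        "INSERT INTO Bank VALUES (?, ?, ?)",
        [(101, "Asha", 5000.0), (102, "Ravi", 12000.0)],
    )
    # SQLite has no CREATE MATERIALIZED VIEW, so V1 is stored as a table.
    conn.execute("CREATE TABLE V1 AS SELECT accno, cname FROM Bank")
    return conn

def update_cname(conn, accno, new_name):
    """Hypothetical application helper: change cname in Bank and V1 together."""
    with conn:  # commits both updates together, or rolls both back on error
        conn.execute("UPDATE Bank SET cname = ? WHERE accno = ?", (new_name, accno))
        conn.execute("UPDATE V1 SET cname = ? WHERE accno = ?", (new_name, accno))

# Code that updates Bank directly and forgets V1 leaves the view stale.
conn = make_db()
conn.execute("UPDATE Bank SET cname = 'Asha P' WHERE accno = 101")
stale = conn.execute("SELECT cname FROM V1 WHERE accno = 101").fetchone()[0]

# Code that goes through the helper keeps V1 consistent.
conn2 = make_db()
update_cname(conn2, 101, "Asha P")
fresh = conn2.execute("SELECT cname FROM V1 WHERE accno = 101").fetchone()[0]

print(stale, "|", fresh)  # Asha | Asha P
```

The weakness of this approach is visible in the first case: correctness depends on every programmer remembering to route updates through the helper.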
~ Another option for maintaining a materialized view is to define triggers on insert, delete and update of each relation in the view definition. The triggers must modify the contents of the materialized view to take into account the change that caused the trigger to fire. A simple way of doing so is to completely recompute the materialized view on every update.
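A minimal sketch of trigger-based maintenance by complete recomputation, using SQLite triggers on the Bank table from the example (V1 is again an ordinary table standing in for the materialized view, and the rows are hypothetical). Each trigger throws away V1 and rebuilds it from Bank.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Bank (accno INTEGER PRIMARY KEY, cname TEXT, Bal REAL);
CREATE TABLE V1 (accno INTEGER PRIMARY KEY, cname TEXT);  -- emulated materialized view

-- Each trigger fires after a change to Bank and rebuilds V1 from scratch.
CREATE TRIGGER v1_after_insert AFTER INSERT ON Bank BEGIN
  DELETE FROM V1;
  INSERT INTO V1 SELECT accno, cname FROM Bank;
END;
CREATE TRIGGER v1_after_delete AFTER DELETE ON Bank BEGIN
  DELETE FROM V1;
  INSERT INTO V1 SELECT accno, cname FROM Bank;
END;
CREATE TRIGGER v1_after_update AFTER UPDATE ON Bank BEGIN
  DELETE FROM V1;
  INSERT INTO V1 SELECT accno, cname FROM Bank;
END;
""")

conn.executemany(
    "INSERT INTO Bank VALUES (?, ?, ?)",
    [(101, "Asha", 5000.0), (102, "Ravi", 12000.0)],
)
conn.execute("UPDATE Bank SET cname = 'Asha P' WHERE accno = 101")
conn.execute("DELETE FROM Bank WHERE accno = 102")

rows = conn.execute("SELECT accno, cname FROM V1 ORDER BY accno").fetchall()
print(rows)  # [(101, 'Asha P')]
```

SQLite triggers fire once per affected row, so inserting two rows rebuilds V1 twice; the work done per change grows with the size of Bank, which is why recomputation becomes expensive for large tables.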
~ A better option is to modify only the affected parts of the materialized view, which is known as incremental view maintenance.
~ Modern database systems provide more direct support for incremental view maintenance.
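Incremental maintenance of V1 can be sketched with SQLite triggers that touch only the view row affected by each change, using the OLD and NEW row values. This sketch relies on an assumption that holds for V1: the view keeps the key accno, so each change to Bank maps to exactly one view row. A projection that dropped the key would need extra bookkeeping, such as a count of how many base rows produce each view row. The trigger names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Bank (accno INTEGER PRIMARY KEY, cname TEXT, Bal REAL);
CREATE TABLE V1 (accno INTEGER PRIMARY KEY, cname TEXT);  -- emulated materialized view

-- Insert: add only the new row's projection.
CREATE TRIGGER v1_inc_insert AFTER INSERT ON Bank BEGIN
  INSERT INTO V1 VALUES (NEW.accno, NEW.cname);
END;
-- Delete: remove only the deleted row's projection.
CREATE TRIGGER v1_inc_delete AFTER DELETE ON Bank BEGIN
  DELETE FROM V1 WHERE accno = OLD.accno;
END;
-- Update: fires only when a column used by V1 changes.
CREATE TRIGGER v1_inc_update AFTER UPDATE OF accno, cname ON Bank BEGIN
  DELETE FROM V1 WHERE accno = OLD.accno;
  INSERT INTO V1 VALUES (NEW.accno, NEW.cname);
END;
""")

conn.executemany(
    "INSERT INTO Bank VALUES (?, ?, ?)",
    [(101, "Asha", 5000.0), (102, "Ravi", 12000.0)],
)
conn.execute("UPDATE Bank SET Bal = Bal + 100 WHERE accno = 102")  # V1 untouched
conn.execute("UPDATE Bank SET cname = 'Asha P' WHERE accno = 101")
conn.execute("DELETE FROM Bank WHERE accno = 102")

rows = conn.execute("SELECT accno, cname FROM V1 ORDER BY accno").fetchall()
print(rows)  # [(101, 'Asha P')]
```

Each change now costs a constant number of single-row operations on V1 instead of a full rebuild, and the UPDATE OF column list means a change to Bal, which V1 does not use, does no view work at all.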
5.0 Introduction :