Unit IV: Query Processing
QUERY PROCESSING
Example
A sequence of primitive operations that can be used to evaluate a query is a Query Execution Plan or
Query Evaluation Plan.
The above diagram indicates that the query execution engine takes a query execution plan and
returns the answers to the query.
The query optimizer chooses the query execution plan that minimizes the estimated cost of query
evaluation.
MEASURES OF QUERY COST
Figure: Query processing steps and evaluation plan
Although a system can create multiple plans for a query, the plan it chooses should be the best
of them. The choice is made by comparing the possible plans in terms of their estimated cost. To
calculate the net estimated cost of a plan, the cost of each operation within the plan is
determined, and these costs are combined.
The cost estimation of a query evaluation plan is calculated in terms of various resources that
include:
To estimate the cost of a query evaluation plan, we use the number of blocks transferred from the
disk and the number of disk seeks. Suppose the disk has an average block access time of
tS seconds and takes an average of tT seconds to transfer one block of data. The block access
time is the sum of the disk seek time and the rotational latency. A plan that transfers b blocks
and performs S seeks then takes b*tT + S*tS seconds. For example, with a block size of 4 KB and a
transfer rate of 40 MB per second, tT = 0.1 ms; a typical value for tS is 4 ms. With these
values, we can calculate the estimated cost of a given query evaluation plan.
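The formula above can be sketched in a few lines of Python. This is only an illustration: the function name and the example plan (100 block transfers, 5 seeks) are assumptions, not values from these notes, and 4 KB and 40 MB are taken as decimal units.

```python
# Disk parameters from the text: tT = 0.1 ms per block, tS = 4 ms per seek.
T_TRANSFER = 0.0001  # tT in seconds
T_SEEK = 0.004       # tS in seconds


def estimated_cost_seconds(b, s, t_transfer=T_TRANSFER, t_seek=T_SEEK):
    """Estimated time for b block transfers and s seeks: b*tT + S*tS."""
    return b * t_transfer + s * t_seek


# tT follows from the block size and the transfer rate (decimal units assumed).
BLOCK_SIZE_BYTES = 4 * 1000               # 4 KB
TRANSFER_RATE_BYTES_PER_S = 40 * 1000000  # 40 MB per second
t_transfer_from_rate = BLOCK_SIZE_BYTES / TRANSFER_RATE_BYTES_PER_S

if __name__ == "__main__":
    # Hypothetical plan: 100 * 0.1 ms + 5 * 4 ms = 30 ms.
    cost = estimated_cost_seconds(b=100, s=5)
    print(f"estimated cost: {cost * 1000:.1f} ms")
    print(f"tT from block size and rate: {t_transfer_from_rate * 1000:.1f} ms")
```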
The response time for a query-evaluation plan (that is, the wall-clock time required to execute
the plan), assuming no other activity is going on in the computer, would account for all these
costs, and could be used as a measure of the cost of the plan. Unfortunately, the response time of
a plan is very hard to estimate without actually executing the plan, for the following reasons:
1. The response time depends on the contents of the buffer when the query begins execution; this
information is not available when the query is optimized, and is hard to account for even if it
were available.
2. In a system with multiple disks, the response time depends on how accesses are distributed
among disks, which is hard to estimate without detailed knowledge of data layout on disk.
SELECT (symbol: σ)
PROJECT (symbol: π)
RENAME (symbol: ρ)
UNION (∪)
INTERSECTION (∩)
DIFFERENCE (-)
CARTESIAN PRODUCT (×)
JOIN
DIVISION
Evaluation of Expressions
One way to evaluate an expression is to evaluate one operation at a time, in an appropriate
order. The result of each evaluation is materialized in a temporary relation for subsequent use.
A disadvantage of this approach is the need to construct the temporary relations, which (unless
they are small) must be written to disk. An alternative approach is to evaluate several
operations simultaneously in a pipeline, with the results of one operation passed on to the next,
without the need to store a temporary relation. The two approaches are:
1. Materialization
2. Pipelining
In the materialization method, the expression is evaluated one relational operation at a time,
in an appropriate order. The output of each operation is materialized in a temporary relation
for subsequent use. The disadvantage of this method is that these temporary relations must be
constructed, and they are written to disk unless they are small.
Pipelining
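As described earlier, pipelining passes the results of one operation on to the next without storing a temporary relation. The sketch below contrasts the two approaches using Python lists (standing in for temporary relations) and generators (standing in for a pipeline). The sample rows, column names, and function names are hypothetical and chosen only for illustration.

```python
def select(rows, predicate):
    """Selection: yield only the rows that satisfy the predicate."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows, columns):
    """Projection: yield each row restricted to the given columns."""
    for row in rows:
        yield {c: row[c] for c in columns}


def evaluate_materialized(rows):
    """Evaluate one operation at a time; each result is fully stored."""
    temp = list(select(rows, lambda r: r["dept_id"] == 10))  # temporary relation
    return list(project(temp, ["emp_id"]))


def evaluate_pipelined(rows):
    """Chain the operations; rows flow through one at a time."""
    selected = select(rows, lambda r: r["dept_id"] == 10)  # nothing computed yet
    return project(selected, ["emp_id"])                   # still lazy


# Hypothetical EMP rows.
EMP_ROWS = [
    {"emp_id": 1, "dept_id": 10},
    {"emp_id": 2, "dept_id": 20},
    {"emp_id": 3, "dept_id": 10},
]

if __name__ == "__main__":
    print(evaluate_materialized(EMP_ROWS))
    print(list(evaluate_pipelined(EMP_ROWS)))
```

Both functions return the same rows; the difference is that the pipelined version never builds the intermediate list.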
Optimizer Components
Query processing is done with the following aims −
Minimize the response time of a query (the time taken to produce the results of the user's query).
Maximize system throughput (the number of requests processed in a given amount of time).
Reduce the amount of memory and storage required for processing.
Increase parallelism.
Query Transformer
For some statements, the query transformer determines whether it is advantageous to rewrite the
original SQL statement into a semantically equivalent SQL statement with a lower cost.
When a viable alternative exists, the database calculates the cost of each alternative separately
and chooses the lowest-cost one. For example, the query transformer can rewrite an input query
that uses OR into an output query that uses UNION ALL.
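The OR-to-UNION ALL rewrite can be illustrated with a small experiment using Python's built-in sqlite3 module. The emp table and its rows are hypothetical, and the rewritten query is written by hand here; it is a sketch of what a semantically equivalent form looks like, not the output of any particular optimizer. The second branch excludes rows already returned by the first branch so that UNION ALL does not produce duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, dept_id INTEGER, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(1, 10, "Joseph"), (2, 20, "Joseph"), (3, 10, "Smith"),
     (4, 30, "Lee"), (5, None, "Joseph")],
)

# Input query: a single scan with an OR predicate.
or_query = "SELECT emp_id FROM emp WHERE dept_id = 10 OR last_name = 'Joseph'"

# Equivalent query: two branches combined with UNION ALL. The extra condition
# in the second branch skips rows the first branch already returned.
union_all_query = """
    SELECT emp_id FROM emp WHERE dept_id = 10
    UNION ALL
    SELECT emp_id FROM emp
    WHERE last_name = 'Joseph' AND (dept_id <> 10 OR dept_id IS NULL)
"""

or_rows = sorted(conn.execute(or_query).fetchall())
union_rows = sorted(conn.execute(union_all_query).fetchall())

if __name__ == "__main__":
    print(or_rows)
    print(union_rows)
```

Each branch of the UNION ALL form has a simple predicate, which may let the database use a separate index per branch; whether that is cheaper depends on the cost estimates.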
Approaches to Query Optimization
Among the approaches to query optimization, exhaustive search and heuristic-based algorithms are
the most widely used.
Exhaustive Search Optimization
In these techniques, all possible plans for a query are generated first, and then the best plan
is selected. Although these techniques find the best solution, they have exponential time and
space complexity because the solution space is large. The dynamic programming technique is an
example.
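The idea of generating every plan and keeping the cheapest can be sketched as a brute-force search over join orders. Note that this is plain enumeration, not the dynamic programming technique mentioned above. The table cardinalities, the join selectivities, and the cost model (sum of estimated intermediate result sizes for a left-deep join order) are all assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical statistics: rows per table and selectivity of each join predicate.
CARDINALITY = {"EMP": 500, "DEPT": 5, "PROJECT": 200}
SELECTIVITY = {
    frozenset(("EMP", "DEPT")): Fraction(1, 5),
    frozenset(("EMP", "PROJECT")): Fraction(1, 500),
    # DEPT and PROJECT share no join predicate, so joining them directly is a
    # Cartesian product (selectivity 1).
}


def plan_cost(order):
    """Assumed cost model: sum of the estimated intermediate result sizes."""
    joined = [order[0]]
    size = Fraction(CARDINALITY[order[0]])
    cost = Fraction(0)
    for table in order[1:]:
        selectivity = Fraction(1)
        for earlier in joined:
            selectivity *= SELECTIVITY.get(frozenset((earlier, table)), Fraction(1))
        size = size * CARDINALITY[table] * selectivity
        cost += size
        joined.append(table)
    return cost


def exhaustive_search(tables):
    """Generate every join order and return the cheapest one."""
    return min(permutations(tables), key=plan_cost)


if __name__ == "__main__":
    tables = ["EMP", "DEPT", "PROJECT"]
    for order in permutations(tables):
        print(order, plan_cost(order))
    print("best:", exhaustive_search(tables))
```

With n tables there are n! left-deep orders to examine, which is why this approach becomes impractical as the number of tables grows.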
Heuristic Based Optimization
Heuristic-based optimization uses rule-based approaches to query optimization.
These algorithms have polynomial time and space complexity, which is lower than the
exponential complexity of exhaustive search-based algorithms. However, they do not
necessarily produce the best query plan.
Some of the common heuristic rules are −
Perform select and project operations before join operations. This is done by moving the select and project operations down the query tree. This reduces the number of tuples available for the join.
Perform the most restrictive select/project operations first, before the other operations.
EXAMPLE:
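A minimal sketch of the first heuristic (perform selections before joins) is shown below. The EMP and DEPT rows are hypothetical, and a simple nested-loop join is assumed so that the number of tuple pairs compared can be counted for each plan.

```python
def nested_loop_join(left, right, key):
    """Join two lists of rows on key; also count the tuple pairs compared."""
    result, compared = [], 0
    for l_row in left:
        for r_row in right:
            compared += 1
            if l_row[key] == r_row[key]:
                result.append({**l_row, **r_row})
    return result, compared


# Hypothetical data: 100 employees spread evenly over 5 departments (10..50).
EMP = [{"emp_id": i, "dept_id": 10 * (i % 5 + 1)} for i in range(1, 101)]
DEPT = [{"dept_id": d, "dept_name": f"D{d}"} for d in (10, 20, 30, 40, 50)]

# Plan A: join first, then select DEPT_ID = 10.
joined, compared_a = nested_loop_join(EMP, DEPT, "dept_id")
plan_a = [row for row in joined if row["dept_id"] == 10]

# Plan B: push the selection down to both inputs, then join.
emp_10 = [row for row in EMP if row["dept_id"] == 10]
dept_10 = [row for row in DEPT if row["dept_id"] == 10]
plan_b, compared_b = nested_loop_join(emp_10, dept_10, "dept_id")

if __name__ == "__main__":
    print("same result:", plan_a == plan_b)
    print("pairs compared, join then select:", compared_a)
    print("pairs compared, select then join:", compared_b)
```

Both plans return the same rows, but the second plan feeds far fewer tuples into the join.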
Estimating Statistics of Expression results in DBMS
To determine the ideal plan for evaluating a query, the optimizer checks various details about
the tables, which are stored in the data dictionary. This information is collected when a
table is created and when DDL / DML operations are performed on it. The optimizer
checks the data dictionary for:
Total number of records in a table, nr. This helps determine which table should be accessed
first. Usually smaller tables are accessed first to reduce the size of the intermediate
tables, so it is one of the important factors to check.
Total number of records in each block, fr. This is the blocking factor, and it is needed to
determine whether the table fits in memory.
Total number of blocks assigned to a table, br. Together with nr, this gives the number of
records stored in each block. Suppose a table has 100 records and 20 blocks; then
fr = nr/br = 100/20 = 5.
Total length of the records in the table, lr. This is an important factor when the size of
the records varies significantly between the tables in the query. If the record length is
fixed, it has no significant effect. When variable-length records are involved in the query,
the average length or the actual length must be used, depending on the type of operation.
Number of unique values for a column, dAr. This is useful when a query uses aggregation or
projection. It provides an estimate of the number of distinct values returned by a
projection, and of the number of groups of records when an aggregation operation (e.g., SUM,
MAX, MIN, COUNT) is used in the query.
Levels of index, x. This indicates whether single-level indexes (such as primary key or
secondary key indexes) or multi-level indexes (such as B+ tree indexes) are used. The number
of index levels gives the number of block accesses required to retrieve the data.
Selection cardinality of a column, sA. This is the average number of records that have the
same value of column A. It is calculated as nr/dAr, i.e., the total number of records divided
by the number of distinct values of A. For example, suppose the EMP table has 500 records and
DEPT_ID has 5 distinct values. Then the selection cardinality of DEPT_ID in the EMP table is
500/5 = 100. That is, on average each department has 100 employees.
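The two worked calculations above (fr = nr/br and sA = nr/dAr) can be written as small helper functions. The function names are chosen here for illustration; the numbers are the ones used in the text.

```python
def blocking_factor(n_r, b_r):
    """fr = nr / br: average number of records stored per block."""
    return n_r / b_r


def selection_cardinality(n_r, d_ar):
    """sA = nr / dAr: average number of records per distinct value of A."""
    return n_r / d_ar


if __name__ == "__main__":
    # 100 records in 20 blocks: 100/20 records per block.
    print("fr =", blocking_factor(100, 20))
    # EMP has 500 records and DEPT_ID has 5 distinct values: 500/5 per value.
    print("sA =", selection_cardinality(500, 5))
```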
One way to choose an evaluation plan for a query expression is simply to choose, for each
operation, the cheapest algorithm for evaluating it. We can choose any ordering of the operations
that ensures that operations lower in the tree are executed before operations higher in the tree.
To choose among candidate plans, statistics for them are collected using cost-based evaluation
and heuristic methods. The optimizer checks the costs of the different techniques seen so far. It
reads the operators, join types, indexes, number of records, selectivity of records, distinct
values, etc. from the data dictionary. Once all this information is collected, the best
evaluation plan is chosen.
EXAMPLE:
Or
π EMP_ID, DEPT_NAME (σ DEPT_ID = 10 AND EMP_LAST_NAME = 'Joseph' (EMP ⋈ DEPT))
Or
σ DEPT_ID = 10 AND EMP_LAST_NAME = 'Joseph' (π EMP_ID, DEPT_NAME, DEPT_ID (EMP ⋈ DEPT))
MATERIALIZED VIEW
A materialized view is a database object that contains the results of a query. For example, it
may be a local copy of data located remotely, a subset of the rows and/or columns of a table or
join result, or a summary computed with an aggregate function.
The basic difference between a view and a materialized view is that views are not stored
physically on disk. ... A view can be defined as a virtual table created as the result of a
query expression. A materialized view, however, is a physical copy, picture, or snapshot of the
base table.
DIFFERENCES BETWEEN VIEWS AND MATERIALIZED VIEWS:
View | Materialized view
A view need not be updated every time the relation on which it is defined is updated, because the tuples of the view are computed each time the view is accessed. | A materialized view must be updated, because its tuples are stored in the database system. It can be updated in one of three ways, depending on the database system.
It has no storage cost associated with it. | It has a storage cost associated with it.
It has no update cost associated with it. | It has an update cost associated with it.
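The difference can be illustrated with Python's built-in sqlite3 module. SQLite has ordinary views but no materialized views, so the materialized view is emulated here with a table created from the query result and a manual refresh function; this is a sketch of the concept, not how any particular DBMS implements it. The table, view, and function names are hypothetical.

```python
import sqlite3

QUERY = "SELECT dept_id, COUNT(*) AS n FROM emp GROUP BY dept_id"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, dept_id INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [(1, 10), (2, 10), (3, 20)])

# A view stores only the query; its tuples are computed on every access.
conn.execute(f"CREATE VIEW dept_count_v AS {QUERY}")
# Emulated materialized view: the query result is stored as a physical table.
conn.execute(f"CREATE TABLE dept_count_mv AS {QUERY}")


def read(name):
    """Return the rows of a view or table, ordered by dept_id."""
    return conn.execute(f"SELECT dept_id, n FROM {name} ORDER BY dept_id").fetchall()


def refresh_materialized_view():
    """Recompute the stored copy from the base table (the update cost)."""
    conn.execute("DELETE FROM dept_count_mv")
    conn.execute(f"INSERT INTO dept_count_mv {QUERY}")


# Change the base table after both objects were created.
conn.execute("INSERT INTO emp VALUES (4, 20)")
view_rows = read("dept_count_v")             # reflects the new row immediately
stale_rows = read("dept_count_mv")           # still the old snapshot
refresh_materialized_view()
refreshed_rows = read("dept_count_mv")       # up to date after the refresh

if __name__ == "__main__":
    print("view:", view_rows)
    print("materialized (before refresh):", stale_rows)
    print("materialized (after refresh):", refreshed_rows)
```

The stored copy costs space and must be refreshed, but reading it does not re-run the aggregate query.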