IV Unit Query Processing

Query processing involves compiling and executing database queries. It has two phases: compile-time and runtime. During compile-time, the query is optimized and an execution plan is generated. During runtime, the execution plan is carried out and results are returned. The goal of query optimization is to find the most efficient execution plan with the lowest estimated cost in terms of resources like disk access, CPU usage, and memory usage. Common techniques for query optimization include exhaustive search, heuristics-based rules, and query rewriting.

Uploaded by

Keshava Varma
Copyright
© All Rights Reserved

IV UNIT

QUERY PROCESSING

Definition. Query processing denotes the compilation and execution of a query specification,
usually expressed in a declarative database query language such as the
Structured Query Language (SQL). Query processing consists of a compile-time phase and a
runtime phase.

Introduction to Query Processing

 Query processing is the translation of high-level queries into low-level expressions.
 It is a stepwise process that covers translation at the physical level of the file system, query
optimization, and actual execution of the query to get the result.
 It requires the basic concepts of relational algebra and file structure.
 It refers to the range of activities involved in extracting data from the database.
 It includes the translation of queries in high-level database languages into expressions that can be
implemented at the physical level of the file system.
 In query processing, we study how queries are processed and how they are optimized.

Query processing proceeds in the following steps:

 The first step is to transform the query into a standard form.
 The SQL query is translated into a relational-algebra expression. During this process, the
parser checks the syntax and verifies the relations and attributes used in the query.
 The second step is query optimization. The query optimizer transforms the query into equivalent
expressions that are more efficient to execute and produces a query execution plan.
 The third step is query evaluation. It executes the query execution plan and returns the
result.

Translating SQL Queries into Relational Algebra

Example

SELECT Ename FROM Employee
  WHERE Salary > 5000;

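Translated into Relational Algebra Expression

π Ename (σ Salary > 5000 (Employee))
                                        OR
π Ename (σ Salary > 5000 (π Ename, Salary (Employee)))

The selection on Salary must be applied while the Salary attribute is still present, so a
projection onto Ename alone cannot be performed before the selection.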
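The translation can be illustrated with a minimal Python sketch. The sample rows and the helper names select and project are assumptions made for illustration; they are not part of any DBMS. Note that relational projection removes duplicates, whereas an SQL SELECT without DISTINCT does not.

```python
# Hypothetical sample data; the attribute names follow the query in the text.
employee = [
    {"Ename": "Asha", "Salary": 7000},
    {"Ename": "Ravi", "Salary": 4500},
    {"Ename": "Meena", "Salary": 5200},
]

def select(rows, predicate):
    """Selection (sigma): keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def project(rows, attributes):
    """Projection (pi): keep only the named attributes, dropping duplicates."""
    seen, result = set(), []
    for row in rows:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result

# pi Ename (sigma Salary > 5000 (Employee))
result = project(select(employee, lambda r: r["Salary"] > 5000), ["Ename"])
print(result)
```

Applying the selection first and the projection second mirrors the order in which the expression is nested.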

A sequence of primitive operations that can be used to evaluate a query is a Query Execution Plan or
Query Evaluation Plan.
 The query execution engine takes a query execution plan, executes it, and
returns the answers to the query.
 The optimizer aims to choose the query execution plan that minimizes the cost of query evaluation.
MEASURES OF QUERY COST

Although a system can create multiple evaluation plans for a
query, the chosen plan should be the best of all. It is found by comparing the possible
plans in terms of their estimated cost. To calculate the net estimated cost of a plan, the cost
of each operation within the plan is determined, and these costs are combined to get the net estimated cost
of the query evaluation plan.

The cost estimation of a query evaluation plan is calculated in terms of various resources that
include:

o Number of disk accesses
o Execution time taken by the CPU to execute a query
o Communication costs in distributed or parallel database systems.

To estimate the cost of a query evaluation plan, we use the number of blocks transferred from the
disk and the number of disk seeks. Suppose the disk takes an average of tS seconds per block
access (seek) and an average of tT seconds to transfer one block of data. The block access time is
the sum of the disk seek time and the rotational latency. A plan that transfers b blocks and
performs S seeks then takes b*tT + S*tS seconds. Typical values are tS = 4 ms and tT = 0.1 ms,
assuming a block size of 4 KB and a transfer rate of 40 MB per second. With these values, we can
calculate the estimated cost of a given query evaluation plan.

The response time for a query-evaluation plan (that is, the wall-clock time required to execute
the plan), assuming no other activity is going on in the computer, would account for all these
costs, and could be used as a measure of the cost of the plan. Unfortunately, the response time of
a plan is very hard to estimate without actually executing the plan, for the following reasons:

1. The response time depends on the contents of the buffer when the query begins execution; this
information is not available when the query is optimized, and is hard to account for even if it
were available.
2. In a system with multiple disks, the response time depends on how accesses are distributed
among disks, which is hard to estimate without detailed knowledge of data layout on disk.

tT – time to transfer one block
tS – time for one seek
Cost for b block transfers plus S seeks:
b * tT + S * tS
We ignore CPU costs for simplicity.
Real systems do take CPU cost into account.
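The cost formula b * tT + S * tS can be applied directly to compare plans. The following Python sketch uses the values tT = 0.1 ms and tS = 4 ms from the text; the two plans and their block and seek counts are hypothetical numbers chosen only to show the arithmetic.

```python
def plan_cost_ms(block_transfers, seeks, t_transfer_ms=0.1, t_seek_ms=4.0):
    """Estimated I/O cost b*tT + S*tS in milliseconds (CPU cost is ignored)."""
    return block_transfers * t_transfer_ms + seeks * t_seek_ms

# Hypothetical plans: (name, block transfers b, seeks S).
plans = [("linear scan", 1000, 1), ("index lookup", 40, 40)]

costs = {name: plan_cost_ms(b, s) for name, b, s in plans}
cheapest = min(costs, key=costs.get)

for name, cost in costs.items():
    print(f"{name}: {cost:.1f} ms")
print("cheapest plan:", cheapest)
```

With these assumed numbers the scan costs 1000 * 0.1 + 1 * 4 = 104 ms and the index lookup costs 40 * 0.1 + 40 * 4 = 164 ms, which shows how a plan with fewer block transfers can still lose because of its seeks.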
SELECTION OPERATION

Unary Relational Operations

 SELECT (symbol: σ)
 PROJECT (symbol: π)
 RENAME (symbol: ρ)

Relational Algebra Operations From Set Theory

 UNION (∪)
 INTERSECTION (∩)
 DIFFERENCE (−)
 CARTESIAN PRODUCT (×)

Binary Relational Operations

 JOIN
 DIVISION

Evaluation of Expressions

To evaluate a relational-algebra expression means to compute the relation it denotes from the
stored relations it refers to. An expression usually contains several operations, so the obvious
approach is to evaluate one operation at a time, in an appropriate order.

The result of each evaluation is materialized in a temporary relation for subsequent use. A
disadvantage to this approach is the need to construct the temporary relations, which (unless they
are small) must be written to disk. An alternative approach is to evaluate several operations
simultaneously in a pipeline, with the results of one operation passed on to the next, without the
need to store a temporary relation.

The two evaluation methods are:

1. Materialization
2. Pipelining

Each method is discussed briefly below.



Materialization

In this method, the expression is evaluated one relational operation at a time, in an
appropriate order. After each operation is evaluated, its output is materialized in a
temporary relation for subsequent use. This is also the disadvantage of the method: the
temporary relations have to be constructed, and they must be written to disk unless they
are small.

Pipelining

Pipelining is an alternative to materialization. It evaluates several relational operations of
the expression simultaneously in a pipeline: as one operation produces output, that output is
passed on to the next operation, and the chain continues until all the relational operations
have been evaluated. Thus, there is no need to store a temporary relation.
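The difference between the two methods can be sketched in Python, where a list plays the role of a temporary relation and a generator plays the role of a pipeline. The sample tuples and function names are assumptions for illustration; a real engine pipelines operators over disk blocks, not Python objects.

```python
# Hypothetical (name, salary) tuples.
employee = [("Asha", 7000), ("Ravi", 4500), ("Meena", 5200)]

def materialized(rows):
    # Each step builds a complete temporary list before the next step starts.
    selected = [r for r in rows if r[1] > 5000]   # temporary relation
    return [r[0] for r in selected]

def pipelined(rows):
    # Generators hand each tuple to the next operator as soon as it is produced,
    # so no intermediate list is ever stored.
    selected = (r for r in rows if r[1] > 5000)
    return (r[0] for r in selected)

print(materialized(employee))
print(list(pipelined(employee)))
```

Both functions produce the same names; only the pipelined version avoids holding the intermediate result of the selection.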
Query Optimization

A query optimizer translates a query into a sequence of physical operators that can be directly
carried out by the query execution engine. ... The goal of query optimization is to derive an
efficient execution plan in terms of relevant performance measures, such as memory usage
and query response time.

Optimizer Components
Query processing is carried out with the following aims −
 Minimize the response time of a query (the time taken to produce the results of the user's
query).
 Maximize system throughput (the number of requests that are processed in a given
amount of time).
 Reduce the amount of memory and storage required for processing.
 Increase parallelism.

Query Transformer

For some statements, the query transformer determines whether it is advantageous to rewrite the
original SQL statement into a semantically equivalent SQL statement with a lower cost.

When a viable alternative exists, the database calculates the cost of the alternatives separately
and chooses the lowest-cost alternative. The following graphic shows the query transformer
rewriting an input query that uses OR into an output query that uses UNION ALL.

[Figure: Query Transformer]
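The rewrite of an OR query into a UNION ALL query can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the emp table, its columns and its rows are hypothetical, and SQLite is used only to check that both forms return the same rows, not to reproduce any particular database's transformer. The second branch excludes rows already returned by the first branch so that UNION ALL does not produce duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_id INTEGER, dept_id INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(1, 10, 4000), (2, 20, 9000), (3, 10, 9500), (4, 30, 3000)],
)

# Input query: a single scan with an OR condition.
with_or = "SELECT emp_id FROM emp WHERE dept_id = 10 OR salary > 8000"

# Rewritten query: one branch per condition. "IS NOT 10" is SQLite's
# NULL-safe inequality, so the second branch skips rows the first returned.
with_union_all = """
SELECT emp_id FROM emp WHERE dept_id = 10
UNION ALL
SELECT emp_id FROM emp WHERE salary > 8000 AND dept_id IS NOT 10
"""

rows_or = sorted(conn.execute(with_or).fetchall())
rows_union = sorted(conn.execute(with_union_all).fetchall())
print(rows_or)
print(rows_union)
```

Whether the rewritten form is cheaper depends on the available indexes and statistics, which is why the transformer costs both alternatives before choosing.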
Approaches to Query Optimization

Among the approaches for query optimization, exhaustive search and heuristics-based
algorithms are the most widely used.
Exhaustive Search Optimization
In these techniques, all possible plans for a query are generated first and then the best
plan is selected. Although these techniques find the best solution, they have exponential time
and space complexity owing to the large solution space. The dynamic programming technique is
an example.
Heuristic Based Optimization
Heuristic-based optimization uses rule-based approaches for query optimization.
These algorithms have polynomial time and space complexity, which is lower than the
exponential complexity of exhaustive search-based algorithms. However, they do
not necessarily produce the best query plan.
Some of the common heuristic rules are −
 Perform select and project operations before join operations. This is done by moving the
select and project operations down the query tree, which reduces the number of tuples
available for the join.
 Perform the most restrictive select/project operations first, before the other operations.

Estimating Statistics of Expression results in DBMS

To determine the ideal plan for evaluating a query, the optimizer checks various details about the
tables that are stored in the data dictionary. This information is collected when a
table is created and when DDL / DML operations are performed on it. The optimizer
checks the data dictionary for:

 Total number of records in a table, nr. This helps to determine which table should be
accessed first. Usually smaller tables are accessed first to reduce the size of the
intermediate tables, so it is one of the important factors to check.
 Total number of records in each block, fr. This is the blocking factor, and it is needed
to determine whether the table fits in memory.
 Total number of blocks assigned to a table, br. Together with nr, this gives the
number of records that fit in each block. Suppose a table has 100 records in 20 blocks;
then fr = nr/br = 100/20 = 5.
 Total length of a record in the table, lr. This is an important factor when record sizes
differ significantly between the tables in the query. If the record length
is fixed, it has no significant effect. But when variable-length records are involved in
the query, the average or the actual length must be used, depending on the type of
operation.
 Number of unique values for a column A, dAr. This is useful when a query uses
aggregation or projection. It gives an estimate of the number of distinct
values produced by a projection, and of the number of groups of records when an
aggregation such as SUM, MAX, MIN or COUNT is used.
 Levels of index, x. This indicates whether a single-level index (such as a primary key
index or a secondary key index) or a multi-level index (such as a B+ tree index) is used.
The number of index levels determines the number of block accesses required to retrieve
the data.
 Selection cardinality of a column A, sA. This is the average number of records that have
the same value for column A. It is calculated as nr/dAr, that is, the total number of
records divided by the number of distinct values of A. For example, suppose the EMP table
has 500 records and DEPT_ID has 5 distinct values. Then the selection cardinality of
DEPT_ID in EMP is 500/5 = 100, which means that on average each department has 100 employees.
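The two worked figures above (fr = 5 and a selection cardinality of 100) can be reproduced with a short Python sketch. The function and parameter names are assumptions chosen to mirror the symbols nr, br, fr, dAr and sA; they do not correspond to any catalog API.

```python
def blocking_factor(n_r, b_r):
    """fr = nr / br: average number of records stored in each block."""
    return n_r / b_r

def selection_cardinality(n_r, d_a_r):
    """sA = nr / dAr: average number of records per distinct value of A."""
    return n_r / d_a_r

# 100 records stored in 20 blocks.
f_r = blocking_factor(n_r=100, b_r=20)

# EMP has 500 records and DEPT_ID has 5 distinct values.
s_dept_id = selection_cardinality(n_r=500, d_a_r=5)

print("fr =", f_r)
print("sA(DEPT_ID) =", s_dept_id)
```

The selection cardinality is an average: it is exact only if the records are spread evenly over the distinct values, which is the usual simplifying assumption in such estimates.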

Choice of Evaluation Plans

One way to choose an evaluation plan for a query expression is simply to choose, for each operation, the cheapest
algorithm for evaluating it. We can choose any ordering of the operations that ensures that operations
lower in the tree are executed before operations higher in the tree.

For each candidate plan, statistics are collected using cost-based evaluation and heuristic methods. The
optimizer checks the costs of the different techniques that we have seen so far. It checks the
operators, join types, indexes, number of records, selectivity of records, distinct values etc. from
the data dictionary. Once all this information is collected, the best evaluation plan is chosen.

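EXAMPLE:

Consider the relations EMP and DEPT and a query that returns EMP_ID and DEPT_NAME for the
employees of department 10 whose last name is ‘Joseph’. The query can be written as any of the
following equivalent expressions:

∏ EMP_ID, DEPT_NAME (σ DEPT_ID = 10 AND EMP_LAST_NAME = ‘Joseph’ (EMP) ⋈ DEPT)

Or
∏ EMP_ID, DEPT_NAME (σ DEPT_ID = 10 AND EMP_LAST_NAME = ‘Joseph’ (EMP ⋈ DEPT))

Or
∏ EMP_ID, DEPT_NAME (σ DEPT_ID = 10 AND EMP_LAST_NAME = ‘Joseph’ (∏ EMP_ID, EMP_LAST_NAME,
DEPT_ID, DEPT_NAME (EMP ⋈ DEPT)))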
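The first two expressions can be compared on a small data set with the Python sketch below. The rows of EMP and DEPT and the helper functions are hypothetical, and the nested-loop join is only one of several possible join algorithms; the count of compared tuple pairs stands in for the cost of the join.

```python
# Hypothetical contents of EMP and DEPT.
emp = [
    {"EMP_ID": 1, "EMP_LAST_NAME": "Joseph", "DEPT_ID": 10},
    {"EMP_ID": 2, "EMP_LAST_NAME": "Mathew", "DEPT_ID": 10},
    {"EMP_ID": 3, "EMP_LAST_NAME": "Joseph", "DEPT_ID": 20},
    {"EMP_ID": 4, "EMP_LAST_NAME": "Rose", "DEPT_ID": 30},
]
dept = [
    {"DEPT_ID": 10, "DEPT_NAME": "Design"},
    {"DEPT_ID": 20, "DEPT_NAME": "Testing"},
    {"DEPT_ID": 30, "DEPT_NAME": "Sales"},
]

def natural_join(left, right, attr):
    """Nested-loop join on one shared attribute; also reports pairs compared."""
    pairs = len(left) * len(right)
    joined = [{**l, **r} for l in left for r in right if l[attr] == r[attr]]
    return joined, pairs

def wanted(row):
    return row["DEPT_ID"] == 10 and row["EMP_LAST_NAME"] == "Joseph"

def project(rows):
    return [(r["EMP_ID"], r["DEPT_NAME"]) for r in rows]

# Plan 1: select on EMP first, then join with DEPT.
joined1, pairs1 = natural_join([e for e in emp if wanted(e)], dept, "DEPT_ID")
plan1 = project(joined1)

# Plan 2: join EMP with DEPT first, then select.
joined2, pairs2 = natural_join(emp, dept, "DEPT_ID")
plan2 = project([r for r in joined2 if wanted(r)])

print(plan1, "pairs compared:", pairs1)
print(plan2, "pairs compared:", pairs2)
```

Both plans return the same answer, but the plan that selects first joins one EMP tuple instead of four, which is the effect the optimizer looks for when it compares equivalent expressions.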
MATERIALIZED VIEW

A materialized view is a database object that contains the results of a query. For example, it may
be a local copy of data located remotely, a subset of the rows and/or columns of a table or
join result, or a summary using an aggregate function.

The basic difference between a view and a materialized view is that views are not stored
physically on the disk. ... A view can be defined as a virtual table created as a result of a query
expression. However, a materialized view is a physical copy, picture or snapshot of the base
table.
DIFFERENCES BETWEEN VIEWS AND MATERIALIZED VIEWS:

| Views | Materialized Views |
|---|---|
| The query expression is stored in the database system, not the resulting tuples of the query expression. | The resulting tuples of the query expression are stored in the database system. |
| A view need not be updated every time the relation on which it is defined is updated, as the tuples of the view are computed every time the view is accessed. | Materialized views must be updated, as their tuples are stored in the database system. They can be updated in one of three ways, depending on the database system. |
| It does not have any storage cost associated with it. | It does have a storage cost associated with it. |
| It does not have any update cost associated with it. | It does have an update cost associated with it. |
| There is an SQL standard for defining a view. | There is no SQL standard for defining a materialized view; the functionality is provided by some database systems as an extension. |
| Views are useful when the view is accessed infrequently. | Materialized views are efficient when the view is accessed frequently, as storing the results beforehand saves computation time. |
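The storage and update behaviour in the comparison can be observed with Python's built-in sqlite3 module. SQLite has ordinary views but no materialized views, so in this sketch the materialized view is emulated, as an assumption, by a table created from the query and refreshed by hand; the table and view names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_id INTEGER, dept_id INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(1, 10, 4000), (2, 10, 6000), (3, 20, 5000)],
)

summary = "SELECT dept_id, SUM(salary) AS total FROM emp GROUP BY dept_id"

# A view stores only the query expression.
conn.execute(f"CREATE VIEW dept_total_view AS {summary}")
# Emulated materialized view: the resulting tuples are stored in a table.
conn.execute(f"CREATE TABLE dept_total_mat AS {summary}")

# The base table changes after both objects were created.
conn.execute("INSERT INTO emp VALUES (4, 20, 1000)")

view_rows = conn.execute("SELECT * FROM dept_total_view ORDER BY dept_id").fetchall()
stale_rows = conn.execute("SELECT * FROM dept_total_mat ORDER BY dept_id").fetchall()

# Manual refresh: recompute the stored tuples from the base table.
conn.execute("DELETE FROM dept_total_mat")
conn.execute(f"INSERT INTO dept_total_mat {summary}")
fresh_rows = conn.execute("SELECT * FROM dept_total_mat ORDER BY dept_id").fetchall()

print(view_rows, stale_rows, fresh_rows)
```

The view reflects the new row immediately because it is recomputed on access, while the stored copy stays stale until it is refreshed, which is the update cost listed in the comparison.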
