Ch-2 Query Processing and Optimization
• EMPLOYEE ⟗ FACT_WORKERS
Cont’d...
• Therefore, relational algebra serves as a backbone for querying,
manipulating, and optimizing relational databases.
• It provides a formal and mathematical framework for working with
relational data and is an essential tool for designing, implementing,
and analyzing database systems.
• Relational algebra serves as the foundation for query optimization in
database management systems (DBMS). The algebraic expressions
generated by the query optimizer are based on relational algebra
operations.
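The three operators used later in this chapter are selection σ, projection Π and Cartesian product ×. Each can be sketched as a plain Python function to make the idea concrete. This is a minimal illustration that assumes relations are represented as lists of dictionaries. The function names `select`, `project` and `cartesian` and the sample tables are hypothetical and are not part of any DBMS.

```python
from typing import Callable

Row = dict[str, object]
Relation = list[Row]


def select(pred: Callable[[Row], bool], rel: Relation) -> Relation:
    """Selection (sigma): keep only the rows that satisfy the predicate."""
    return [row for row in rel if pred(row)]


def project(cols: list[str], rel: Relation) -> Relation:
    """Projection (pi): keep only the listed columns and drop duplicate rows."""
    result: Relation = []
    for row in rel:
        narrowed = {c: row[c] for c in cols}
        if narrowed not in result:  # relations are sets, so remove duplicates
            result.append(narrowed)
    return result


def cartesian(left: Relation, right: Relation, lname: str, rname: str) -> Relation:
    """Cartesian product (x): pair every left row with every right row,
    prefixing column names with the relation name to keep them distinct."""
    return [
        {**{f"{lname}.{k}": v for k, v in lrow.items()},
         **{f"{rname}.{k}": v for k, v in rrow.items()}}
        for lrow in left
        for rrow in right
    ]


# Hypothetical sample relations.
employee = [{"name": "Ann", "dept": 1}, {"name": "Ben", "dept": 2}]
dept = [{"id": 1, "title": "HR"}, {"id": 2, "title": "IT"}]

# pi_{employee.name}(sigma_{employee.dept = dept.id AND dept.title = 'IT'}(employee x dept))
result = project(
    ["employee.name"],
    select(
        lambda r: r["employee.dept"] == r["dept.id"] and r["dept.title"] == "IT",
        cartesian(employee, dept, "employee", "dept"),
    ),
)
print(result)  # [{'employee.name': 'Ben'}]
```

Nesting the calls mirrors nesting the algebraic expression. This nested structure is what an optimizer rearranges.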
Cont’d...
• Query processing translates high-level queries into low-level
expressions that can be used at the physical level of the file system,
optimizes the query, and executes it to obtain the actual result.
Cont’d...
• Step-1:
• Parser: During the parse call, the database performs three checks
(syntax check, semantic check and shared pool check) and converts the
query into relational algebra.
• The parser performs the following checks (refer to the detailed diagram):
• Syntax check – verifies that the SQL statement is syntactically valid. Example:
• SELECT * FORM employee
• Here this check reports an error because FROM is misspelled as FORM.
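Any real DBMS performs the syntax check with its own parser. Python's built-in `sqlite3` module can serve as a quick, hedged illustration: SQLite rejects the misspelled statement in the same way. SQLite is used here only because it ships with Python, and the helper `syntax_check` is hypothetical. Note that `execute` also runs a valid statement. SQLite raises `sqlite3.OperationalError` for other kinds of errors too, so this sketch only shows that malformed SQL is rejected before anything runs.

```python
import sqlite3
from typing import Optional


def syntax_check(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Return None if SQLite accepts the statement, else SQLite's error message.

    execute() also runs a valid statement; this is a demonstration only,
    not a standalone SQL parser."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.OperationalError as exc:
        return str(exc)


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")

print(syntax_check(conn, "SELECT * FROM employee"))  # None: the statement is valid
print(syntax_check(conn, "SELECT * FORM employee"))  # SQLite's syntax-error message
```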
Cont’d...
• Semantic check – determines whether the statement is meaningful.
Example: a query that refers to a table that does not exist is rejected
by this check.
• Shared pool check – every query is assigned a hash code when it is
executed.
• This check determines whether that hash code already exists in the shared
pool. If it does, the database skips the additional optimization steps
and reuses the existing execution plan.
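The semantic and shared pool checks can be sketched together as a toy parse call. This illustration rests on several assumptions and is not how any particular DBMS implements these checks:

- The `Parser` class and the `CATALOG` dictionary are hypothetical.
- SHA-256 is used as the hash function.
- The table and column names are passed in directly instead of being extracted by a real parser.

```python
import hashlib

# Hypothetical catalog: table name -> column names.
CATALOG = {"employee": {"emp_id", "name", "salary"}}


class Parser:
    """Toy parse call: semantic check, then shared pool check (hypothetical)."""

    def __init__(self) -> None:
        self.shared_pool: dict[str, str] = {}  # hash of SQL text -> cached plan
        self.hard_parses = 0                   # plans built by the optimizer
        self.reuses = 0                        # plans found in the shared pool

    def semantic_check(self, table: str, columns: list[str]) -> None:
        """Raise LookupError if the table or any column is not in the catalog."""
        if table not in CATALOG:
            raise LookupError(f"table {table!r} does not exist")
        missing = [c for c in columns if c not in CATALOG[table]]
        if missing:
            raise LookupError(f"unknown columns in {table!r}: {missing}")

    def parse(self, sql: str, table: str, columns: list[str]) -> str:
        # Assumption: table and columns were already extracted after the syntax check.
        self.semantic_check(table, columns)
        key = hashlib.sha256(sql.encode("utf-8")).hexdigest()
        if key in self.shared_pool:   # hash found: skip optimization
            self.reuses += 1
            return self.shared_pool[key]
        self.hard_parses += 1         # hash not found: optimize and cache the plan
        plan = f"PLAN for: {sql}"     # stand-in for the optimizer's output
        self.shared_pool[key] = plan
        return plan


p = Parser()
p.parse("SELECT name FROM employee", "employee", ["name"])
p.parse("SELECT name FROM employee", "employee", ["name"])  # reused from the pool
p.parse("select name from employee", "employee", ["name"])  # different text, new hash
print(p.hard_parses, p.reuses)  # 2 1

try:
    p.parse("SELECT * FROM employe", "employe", [])
except LookupError as err:
    print(err)  # table 'employe' does not exist
```

The key is a hash of the exact SQL text. As a result, two statements that differ only in letter case get different hash codes in this sketch and are optimized separately.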
Cont’d...
• Step-2:
• Optimizer: During the optimization stage, the database must perform a hard
parse at least once for every unique DML statement, and it performs the
optimization during this parse.
• The database never optimizes DDL unless it includes a DML component,
such as a subquery, that requires optimization.
• Optimization is the process in which multiple execution plans for satisfying a
query are examined and the most efficient plan is selected for execution.
• The database catalog stores the execution plans, and the optimizer passes the
lowest-cost plan on for execution.
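One common way for an optimizer to produce multiple candidate plans is to try different join orders. The sketch below enumerates every left-deep join order for three tables and keeps the cheapest. It assumes a deliberately simple cost model: the sum of the estimated sizes of the intermediate results. The row counts, the selectivities and the `branch` table are made-up values for illustration only.

```python
from itertools import permutations

# Hypothetical statistics: estimated number of rows in each table.
ROWS = {"customer": 1_000, "account": 5_000, "branch": 50}

# Hypothetical selectivity of the join predicate between each pair of tables;
# 1.0 means there is no predicate, so the pair forms a Cartesian product.
SELECTIVITY = {
    frozenset({"customer", "account"}): 1 / 1_000,
    frozenset({"account", "branch"}): 1 / 50,
    frozenset({"customer", "branch"}): 1.0,
}


def plan_cost(order: tuple[str, ...]) -> float:
    """Cost of a left-deep join in the given order, taken to be the sum of the
    estimated sizes of all intermediate results (a simplifying assumption)."""
    joined = [order[0]]
    size = float(ROWS[order[0]])
    cost = 0.0
    for table in order[1:]:
        sel = 1.0
        for prev in joined:  # apply every predicate linking the new table
            sel *= SELECTIVITY[frozenset({prev, table})]
        size = size * ROWS[table] * sel
        cost += size
        joined.append(table)
    return cost


# Examine every candidate plan (join order) and keep the cheapest.
candidates = {order: plan_cost(order) for order in permutations(ROWS)}
best = min(candidates, key=candidates.get)
print(best, candidates[best])
```

With these made-up statistics, some orders start by combining `customer` and `branch`. That pair has no join predicate, so it forms a Cartesian product. Those orders cost about 55,000 under this model, compared with about 10,000 for the others, so the enumeration avoids them.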
Cont’d...
• For any given query, there may be a number of different ways to
execute it.
• The process of choosing a suitable one for processing a query is
known as query optimization.
• The two forms of query optimization are as follows −
• Heuristic optimization − Here the query execution is refined based
on heuristic rules for reordering the individual operations.
• Cost based optimization − the overall cost of executing the query is
systematically reduced by estimating the costs of executing several
different execution plans.
Cont’d...
• Example
• SELECT customer.name FROM customer, account WHERE customer.name = account.name
AND account.balance > 2000;
• There are two evaluation plans −
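• $\Pi_{\text{customer.name}}\big(\sigma_{\text{customer.name} = \text{account.name} \,\wedge\, \text{account.balance} > 2000}(\text{customer} \times \text{account})\big)$
• $\Pi_{\text{customer.name}}\big(\sigma_{\text{customer.name} = \text{account.name}}(\text{customer} \times \sigma_{\text{account.balance} > 2000}(\text{account}))\big)$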
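• The cost evaluator estimates the cost of each evaluation plan and chooses the
plan with the lowest cost. Disk access time, CPU time, number of operations, number
of tuples and size of tuples are considered in the cost calculation. The second plan
is usually cheaper because the selection on account.balance shrinks account before
the Cartesian product is formed.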
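The effect of the two plans on the number of tuples can be checked on a tiny, made-up data set. Number of tuples is one of the cost factors listed above. This sketch assumes both relations fit in memory as Python lists. It counts only the tuples in the Cartesian product, which is a crude stand-in for a real cost estimate.

```python
from itertools import product

# Hypothetical sample data.
customer = [{"name": n} for n in ["Ann", "Ben", "Cal", "Dee"]]
account = [
    {"name": "Ann", "balance": 1500},
    {"name": "Ben", "balance": 3000},
    {"name": "Cal", "balance": 2500},
    {"name": "Dee", "balance": 500},
]


def plan1():
    """Select on the full product customer x account, then project."""
    pairs = list(product(customer, account))  # 4 x 4 = 16 tuples
    kept = [(c, a) for c, a in pairs
            if c["name"] == a["name"] and a["balance"] > 2000]
    return sorted({c["name"] for c, _ in kept}), len(pairs)


def plan2():
    """Select on account first, then form the (smaller) product and project."""
    rich = [a for a in account if a["balance"] > 2000]
    pairs = list(product(customer, rich))
    kept = [(c, a) for c, a in pairs if c["name"] == a["name"]]
    return sorted({c["name"] for c, _ in kept}), len(pairs)


print(plan1())  # (['Ben', 'Cal'], 16)
print(plan2())  # (['Ben', 'Cal'], 8)
```

Both plans return the same names. However, the second one builds a product of 8 tuples instead of 16, because the selection on `balance` runs before the product.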
Cont’d...
• Rules
• Heuristic optimization transforms the expression tree using a set of rules that
usually improve performance. These rules are as follows −
• Perform SELECTION operations as early as possible in the query. Applying selections
first reduces the number of records that later operations must process, instead of
carrying every row of every table through the query.
• Perform projections as early as possible in the query. Like early selection, this
shrinks intermediate results, but by reducing the number of columns rather than rows.
• Perform the most restrictive joins and selection operations first.
• That is, use first only those tables and/or views that yield relatively few records
and are strictly necessary for the query. A query generally executes faster when
tables with few records are joined.
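The first rule, performing selections early, can be sketched as a rewrite over an expression tree. This minimal illustration rests on three assumptions:

- The node classes are hypothetical.
- Predicates are kept as display strings, together with the set of relations they refer to.
- The conjunctive selection has already been split into a cascade of single-condition selections.

Applied to the first evaluation plan from the example, the rewrite yields the second plan.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Rel:
    name: str


@dataclass(frozen=True)
class Product:
    left: Node
    right: Node


@dataclass(frozen=True)
class Select:
    pred: str              # predicate text, used only for display
    refs: frozenset[str]   # relations the predicate refers to
    child: Node


@dataclass(frozen=True)
class Project:
    cols: tuple[str, ...]
    child: Node


Node = Union[Rel, Product, Select, Project]


def relations(node: Node) -> frozenset[str]:
    """Base relations that appear under a node."""
    if isinstance(node, Rel):
        return frozenset({node.name})
    if isinstance(node, Product):
        return relations(node.left) | relations(node.right)
    return relations(node.child)


def push_selections(node: Node) -> Node:
    """Heuristic rule: move each selection below a Cartesian product whenever
    its predicate refers to relations on only one side of that product."""
    if isinstance(node, Rel):
        return node
    if isinstance(node, Product):
        return Product(push_selections(node.left), push_selections(node.right))
    if isinstance(node, Project):
        return Project(node.cols, push_selections(node.child))
    child = push_selections(node.child)
    if isinstance(child, Product):
        if node.refs <= relations(child.left):
            return Product(push_selections(Select(node.pred, node.refs, child.left)), child.right)
        if node.refs <= relations(child.right):
            return Product(child.left, push_selections(Select(node.pred, node.refs, child.right)))
    return Select(node.pred, node.refs, child)


def show(node: Node) -> str:
    """Render a tree in a compact relational-algebra-like notation."""
    if isinstance(node, Rel):
        return node.name
    if isinstance(node, Product):
        return f"{show(node.left)} x {show(node.right)}"
    if isinstance(node, Select):
        return f"sigma[{node.pred}]({show(node.child)})"
    return f"pi[{', '.join(node.cols)}]({show(node.child)})"


# First plan from the example, with the AND already split into two selections.
plan = Project(
    ("customer.name",),
    Select("customer.name = account.name", frozenset({"customer", "account"}),
           Select("account.balance > 2000", frozenset({"account"}),
                  Product(Rel("customer"), Rel("account")))),
)
print(show(plan))
print(show(push_selections(plan)))
```

The join predicate stays above the product because it refers to both relations. Only the balance condition moves down onto `account`.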
Cont’d...
• Row Source Generation –
• The row source generator is software that receives the optimal
execution plan from the optimizer and produces an iterative execution
plan that is usable by the rest of the database. The iterative plan is a
binary program that, when executed by the SQL engine, produces the
result set.
• Step-3:
• Execution Engine: finally runs the query and displays the required
result.
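The iterative plan and its execution can be pictured with the iterator (pull-based) model. In this model, each row source produces one row at a time for its parent. In this sketch, Python generators stand in for the binary program described above, and the `execute` function stands in for the execution engine. All function names and the sample data are hypothetical.

```python
from typing import Callable, Iterable, Iterator

Row = dict[str, object]


def table_scan(rows: Iterable[Row]) -> Iterator[Row]:
    """Row source that returns each stored row in turn."""
    for row in rows:
        yield row


def filter_rows(pred: Callable[[Row], bool], child: Iterator[Row]) -> Iterator[Row]:
    """Row source that passes on only the rows satisfying the predicate."""
    for row in child:
        if pred(row):
            yield row


def project_rows(cols: list[str], child: Iterator[Row]) -> Iterator[Row]:
    """Row source that keeps only the listed columns."""
    for row in child:
        yield {c: row[c] for c in cols}


def execute(plan: Iterator[Row]) -> list[Row]:
    """Stand-in execution engine: pull every row from the root row source."""
    return list(plan)


account = [{"name": "Ann", "balance": 1500}, {"name": "Ben", "balance": 3000}]
plan = project_rows(["name"], filter_rows(lambda r: r["balance"] > 2000, table_scan(account)))
result = execute(plan)
print(result)  # [{'name': 'Ben'}]
```

No row is read from `account` until `execute` starts pulling. This lazy behavior is the point of an iterative plan: each row flows through scan, filter and projection one at a time.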