Ch-2 Query Processing and Optimization

Chapter Two covers query processing and optimization in databases, detailing the steps involved such as parsing, translation, optimization, and evaluation. It explains the role of relational algebra in query representation and optimization, as well as the importance of constructing an efficient query evaluation plan. The chapter also discusses various types of joins and the process of query optimization, including heuristic and cost-based methods.

Uploaded by

jecksewa

Chapter Two

Query Processing and Optimization


Outline
• Query Processing
• Relational Algebra
• Query Optimization
• Transaction Processing
• Concurrency Control
• Recovery System
• Database Security
Introduction
• Query processing is the set of activities involved in extracting data from a
database.
• Fetching the data from the database takes several steps:
  • Parsing and translation
  • Optimization
  • Evaluation
Parsing and Translation
• Query processing includes several activities for retrieving data.
• Users write their queries in a high-level database language such as SQL.
• Each query is first translated into expressions that can be used at the physical
level of the file system.
• After this, a variety of query-optimizing transformations and the actual
evaluation of the query take place.
• SQL (Structured Query Language) is well suited for humans.
• However, it is not well suited for the system's internal representation of a
query.
Cont’d...
• Relational algebra is well suited for the internal representation of a
query.
• The translation step in query processing works like the parser of a
compiler.
• When a user executes a query, the parser generates the internal form of the
query: it checks the syntax of the query and verifies that the relation
names and attribute names used in the query exist in the database.
• The parser creates a tree of the query, known as the 'parse tree'.
Cont’d...
• Consider the following query:
Select emp_name from Employee where salary>10000;
• To make the system understand the user query, it needs to be
translated into relational algebra.
• This query can be written in relational algebra in either of these
equivalent forms:
πemp_name (σsalary>10000 (Employee))
πemp_name (σsalary>10000 (πemp_name, salary (Employee)))
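The translation can be checked on a small example. The sketch below evaluates the query as a selection followed by a projection, and again with the rows narrowed to the needed attributes first. The sample rows and the `dept` attribute are hypothetical, since the slides do not show the contents of Employee.

```python
# Hypothetical contents of the Employee relation (not given in the slides).
employee = [
    {"emp_name": "Abel", "salary": 8000, "dept": "Sales"},
    {"emp_name": "Sara", "salary": 12000, "dept": "IT"},
    {"emp_name": "Dawit", "salary": 15000, "dept": "IT"},
]

# Form A: selection (sigma) first, then projection (pi) on emp_name.
selected = [row for row in employee if row["salary"] > 10000]
form_a = [row["emp_name"] for row in selected]

# Form B: keep only the attributes the query needs, then select, then project.
narrowed = [{"emp_name": r["emp_name"], "salary": r["salary"]} for r in employee]
form_b = [r["emp_name"] for r in narrowed if r["salary"] > 10000]

print(form_a)
print(form_a == form_b)  # both forms describe the same result
```

Both forms return the same names; they differ only in how much data the intermediate steps carry, which is what the optimizer later exploits.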
Cont’d...
• After translating the given query, the system can execute each relational
algebra operation using one of several different algorithms.
• This is how query processing begins its work.
Evaluation
• To evaluate a query, translating it into relational algebra is not enough.
The translated relational algebra expression must also be annotated with
instructions that specify how to evaluate each operation.
• Thus, after translating the user query, the system executes a query
evaluation plan.
Query Evaluation Plan
• To fully evaluate a query, the system needs to construct a query evaluation
plan.
• The annotations in the evaluation plan may specify the algorithm to use for a
particular operation or the index to use.
• A relational algebra operation annotated in this way is referred to as an
evaluation primitive. The evaluation primitives carry the instructions needed
to evaluate the operation.
• Thus, a query evaluation plan defines a sequence of primitive operations used for
evaluating a query. The evaluation plan is also referred to as the query execution
plan.
• The query execution engine is responsible for generating the output of the given query.
• It takes the query execution plan, executes it, and returns the output for the
user query.
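As a rough illustration, an evaluation plan can be pictured as a list of annotated operations that an engine runs in order. The sketch below is a toy model under that assumption: the dictionary layout, the `algorithm` labels, and the `execute` function are all hypothetical and do not come from any real database system.

```python
# Each entry is an "evaluation primitive": a relational algebra operation
# plus an annotation naming the (hypothetical) algorithm chosen for it.
plan = [
    {"op": "select", "algorithm": "linear scan",
     "predicate": lambda row: row["salary"] > 10000},
    {"op": "project", "algorithm": "single pass",
     "attributes": ["emp_name"]},
]

def execute(plan, relation):
    """Toy execution engine: apply the primitives of the plan in sequence."""
    rows = relation
    for step in plan:
        if step["op"] == "select":
            rows = [r for r in rows if step["predicate"](r)]
        elif step["op"] == "project":
            rows = [{a: r[a] for a in step["attributes"]} for r in rows]
        else:
            raise ValueError(f"unknown operation: {step['op']}")
    return rows

# Hypothetical sample data.
employee = [
    {"emp_name": "Abel", "salary": 8000},
    {"emp_name": "Sara", "salary": 12000},
]
result = execute(plan, employee)
print(result)
```

In this toy engine the annotation is only a label; in a real system it would decide which physical algorithm (for example an index lookup instead of a scan) actually runs.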
Optimization
• The cost of evaluating a query can vary widely between different evaluation
plans for the same query. Because the system is responsible for constructing
the evaluation plan, the user does not need to write the query efficiently.
• Usually, a database system generates an efficient query evaluation plan,
one that minimizes the evaluation cost.
• This task, performed by the database system, is known as query
optimization.
• To optimize a query, the query optimizer needs an estimated cost for each
operation.
• This is because the overall cost depends on factors such as the memory
allocated to each operation and the execution cost of each operation.
• Finally, after selecting an evaluation plan, the system evaluates the query and
provides its output.
Relational Algebra
• Relational algebra is a procedural query language. It gives a step-by-step
process for obtaining the result of a query.
• It uses operators to perform queries.
Cont’d...
Selection (σ):
• The select operation selects the tuples that satisfy a given predicate.
• Retrieve all records where the age is greater than 30:
• σ(age > 30)(Employees)
Projection (π):
• The project operation returns only the attributes that we wish to appear in the result.
• Retrieve only the customer IDs and emails of all customers:
• π(customer_id, email)(Customers)
Union (∪):
• The union operation returns every tuple that appears in either of two relations
with the same attributes, with duplicates removed.
• Retrieve all distinct records from Employees and Managers:
• Employees ∪ Managers
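The three operators above can be sketched as small Python functions over relations stored as lists of dictionaries. This representation and the sample rows are assumptions made for illustration only.

```python
def _distinct(rows):
    """Drop duplicate rows while keeping their first-seen order."""
    seen, result = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result

def select(relation, predicate):
    """sigma: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """pi: keep only the listed attributes, removing duplicate tuples."""
    return _distinct([{a: row[a] for a in attributes} for row in relation])

def union(r, s):
    """Union: every tuple found in r or in s, without duplicates."""
    return _distinct(list(r) + list(s))

# Hypothetical sample relations with the same attributes.
employees = [{"name": "Abel", "age": 28}, {"name": "Sara", "age": 41}]
managers = [{"name": "Sara", "age": 41}, {"name": "Hana", "age": 35}]

older = select(employees, lambda row: row["age"] > 30)
names = project(employees, ["name"])
everyone = union(employees, managers)
print(older, names, everyone, sep="\n")
```

Sara appears in both relations but only once in the union, which is the "distinct records" behaviour described above.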
Cont’d...
Join (⋈):
• A join operation combines related tuples from different relations if
and only if a given join condition is satisfied. It is denoted by ⋈.
• Example: ∏EMP_NAME, SALARY (EMPLOYEE ⋈ SALARY)
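A minimal sketch of a natural join, assuming the two relations share a common attribute. The attribute `EMP_CODE` and the sample rows are hypothetical, since the slides do not show the EMPLOYEE and SALARY tables.

```python
def natural_join(r, s):
    """Combine tuples of r and s that agree on every attribute name they share."""
    result = []
    for left in r:
        for right in s:
            common = set(left) & set(right)
            if all(left[a] == right[a] for a in common):
                result.append({**left, **right})
    return result

# Hypothetical sample relations; EMP_CODE is the assumed common attribute.
employee = [
    {"EMP_CODE": 101, "EMP_NAME": "Stephan"},
    {"EMP_CODE": 102, "EMP_NAME": "Jack"},
    {"EMP_CODE": 103, "EMP_NAME": "Harry"},
]
salary = [
    {"EMP_CODE": 101, "SALARY": 50000},
    {"EMP_CODE": 102, "SALARY": 30000},
]

joined = natural_join(employee, salary)
# Projection on EMP_NAME and SALARY, as in the expression above.
names_and_salaries = [
    {"EMP_NAME": row["EMP_NAME"], "SALARY": row["SALARY"]} for row in joined
]
print(names_and_salaries)
```

Harry has no row in `salary`, so he does not appear in the result. The outer joins on the following slides exist to keep such unmatched tuples.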
Cont’d...
Left outer join:
• A left outer join contains all combinations of tuples in R and S
that are equal on their common attribute names.
• It also contains the tuples in R that have no matching tuples in S, with the
attributes of S set to NULL.
• It is denoted by ⟕.
• Example: Using the above EMPLOYEE table and FACT_WORKERS
table
• Input:
• EMPLOYEE ⟕ FACT_WORKERS
Cont’d...
Right outer join:
• A right outer join contains all combinations of tuples in R and S
that are equal on their common attribute names.
• It also contains the tuples in S that have no matching tuples in R, with the
attributes of R set to NULL.
• It is denoted by ⟖.
• Example: Using the above EMPLOYEE table and FACT_WORKERS
table
• Input:
• EMPLOYEE ⟖ FACT_WORKERS
Cont’d...
Full outer join:
• A full outer join is like a left or right outer join except that it contains all
rows from both tables.
• In addition to the matching combinations, it contains the tuples in R that have
no matching tuples in S and the tuples in S that have no matching tuples in R on
their common attribute names. The missing attributes are set to NULL.
• It is denoted by ⟗.
• Example: Using the above EMPLOYEE table and FACT_WORKERS table
• Input:
• EMPLOYEE ⟗ FACT_WORKERS
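The three outer joins differ only in which unmatched tuples they keep, which a single sketch can show. The function `outer_join`, its flags, and the sample rows are hypothetical; `None` stands in for NULL, and the real EMPLOYEE and FACT_WORKERS tables are not shown in the slides.

```python
def outer_join(r, s, keep_left=False, keep_right=False):
    """Natural join of r and s that can also keep unmatched tuples.

    keep_left=True gives a left outer join, keep_right=True a right outer
    join, and both together a full outer join. Missing values become None.
    """
    r_attrs = {a for row in r for a in row}
    s_attrs = {a for row in s for a in row}
    common = r_attrs & s_attrs
    result, matched_right = [], set()
    for left in r:
        found = False
        for j, right in enumerate(s):
            if all(left[a] == right[a] for a in common):
                result.append({**left, **right})
                matched_right.add(j)
                found = True
        if keep_left and not found:
            result.append({**{a: None for a in s_attrs}, **left})
    if keep_right:
        for j, right in enumerate(s):
            if j not in matched_right:
                result.append({**{a: None for a in r_attrs}, **right})
    return result

# Hypothetical sample relations sharing the attribute EMP_NAME.
employee = [{"EMP_NAME": "Ram", "CITY": "Mumbai"},
            {"EMP_NAME": "Hari", "CITY": "Delhi"}]
fact_workers = [{"EMP_NAME": "Ram", "BRANCH": "Infosys"},
                {"EMP_NAME": "Kim", "BRANCH": "TCS"}]

left = outer_join(employee, fact_workers, keep_left=True)
right = outer_join(employee, fact_workers, keep_right=True)
full = outer_join(employee, fact_workers, keep_left=True, keep_right=True)
print(len(left), len(right), len(full))
```

Here Hari survives only in the left and full joins, and Kim only in the right and full joins, each padded with `None` for the attributes of the other relation.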
Cont’d...
• Relational algebra therefore serves as a backbone for querying,
manipulating, and optimizing relational databases.
• It provides a formal and mathematical framework for working with
relational data and is an essential tool for designing, implementing,
and analyzing database systems.
• Relational algebra serves as the foundation for query optimization in
database management systems (DBMS). The algebraic expressions
that the query optimizer generates are built from relational algebra
operations.
Cont’d...
• Query processing includes the translation of high-level queries into
low-level expressions that can be used at the physical level of the file
system, query optimization, and the actual execution of the query to get
the result.
Cont’d...
• Step-1:
• Parser: During the parse call, the database performs the following
checks as part of converting the query into relational algebra: syntax
check, semantic check, and shared pool check.
• The parser performs the following checks (refer to the detailed diagram):
• Syntax check – verifies that the SQL is syntactically valid. Example:
• SELECT * FORM employee
• Here, this check reports the misspelling of FROM as an error.
Cont’d...
• Semantic check – determines whether the statement is meaningful.
Example: a query that refers to a table name which does not exist is
rejected by this check.
• Shared pool check – Every query is assigned a hash code during its
execution.
• This check determines whether that hash code already exists in the shared
pool. If it does, the database skips the additional optimization steps and
reuses the existing plan for execution.
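The idea behind the shared pool check can be sketched as a cache keyed by a hash of the SQL text. This is only an illustration: the dictionary `shared_pool`, the `hard_parse` stand-in, the labels returned by `parse`, and the choice of SHA-256 are all assumptions, not the mechanism of any particular database.

```python
import hashlib

# Hypothetical shared pool: hash of the SQL text -> cached execution plan.
shared_pool = {}

def hard_parse(sql):
    """Stand-in for the expensive optimization and plan-generation steps."""
    return f"plan for: {sql}"

def parse(sql):
    """Return (plan, kind), reusing a cached plan when the hash is known."""
    key = hashlib.sha256(sql.encode("utf-8")).hexdigest()
    if key in shared_pool:
        return shared_pool[key], "reused"
    plan = hard_parse(sql)
    shared_pool[key] = plan
    return plan, "hard parse"

first = parse("SELECT * FROM employee")
second = parse("SELECT * FROM employee")
print(first[1], second[1])
```

The second call finds the hash in the pool and skips `hard_parse`, which mirrors the database skipping optimization for a statement it has already seen.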
Cont’d...
• Step-2:
• Optimizer: During the optimization stage, the database must perform a hard
parse at least once for every unique DML statement, and it performs
optimization during this parse.
• The database never optimizes DDL unless it includes a DML component,
such as a subquery, that requires optimization.
• Optimization is a process in which multiple execution plans that satisfy a
query are examined and the most efficient one is selected for execution.
• The database catalog stores the execution plans, and the optimizer passes the
lowest-cost plan on for execution.
Cont’d...
• For any given query, there may be a number of different ways to
execute it.
• The process of choosing the most suitable one for processing a query is
known as query optimization.
• The two forms of query optimization are as follows −
• Heuristic optimization − the query execution is refined based
on heuristic rules for reordering the individual operations.
• Cost-based optimization − the overall cost of executing the query is
systematically reduced by estimating the costs of several
different execution plans.
Cont’d...
• Example
• Select customer.name from customer, account where customer.name=account.name and
account.balance>2000;
• There are two evaluation plans −
• Πcustomer.name (σcustomer.name=account.name ∧ account.balance>2000 (customer × account))
• Πcustomer.name (σcustomer.name=account.name (customer × σaccount.balance>2000 (account)))
• The cost evaluator estimates the cost of the different evaluation plans and chooses
the plan with the lowest cost. Disk access time, CPU time, number of
operations, number of tuples, and size of tuples are considered in the cost calculation.
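To see why the two plans differ in cost, the sketch below runs both on hypothetical data and counts the tuples in the intermediate Cartesian product. Using that count as the cost is a simplifying assumption; a real cost evaluator estimates costs from statistics instead of executing every plan.

```python
def plan_1(customer, account):
    """Cartesian product first, then both conditions in one selection."""
    product = [(c, a) for c in customer for a in account]
    names = [c["name"] for c, a in product
             if c["name"] == a["name"] and a["balance"] > 2000]
    return names, len(product)

def plan_2(customer, account):
    """Filter account on balance first, then product and join condition."""
    rich = [a for a in account if a["balance"] > 2000]
    product = [(c, a) for c in customer for a in rich]
    names = [c["name"] for c, a in product if c["name"] == a["name"]]
    return names, len(product)

# Hypothetical sample relations.
customer = [{"name": "Abel"}, {"name": "Sara"}, {"name": "Hana"}]
account = [{"name": "Abel", "balance": 500},
           {"name": "Sara", "balance": 3000},
           {"name": "Hana", "balance": 1500}]

plans = {"plan 1": plan_1, "plan 2": plan_2}
costs = {label: fn(customer, account)[1] for label, fn in plans.items()}
best = min(costs, key=costs.get)
print(costs, best)
```

Both plans return the same names, but the second builds a product of 3 tuples instead of 9 because the selection on balance runs before the product.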
Cont’d...
• Rules
• Heuristic optimization transforms the expression tree by using a set of rules that
improve performance. These rules are as follows −
• Perform SELECTION operations as early as possible in the query. This should be the
first action on any table. Doing so decreases the number of records carried through
the rest of the query, rather than using every row of every table.
• Perform PROJECTION operations as early as possible in the query. This is similar to
selection, but it decreases the number of columns carried through the query.
• Perform the most restrictive join and selection operations first.
• That is, start with the tables and/or views that produce the fewest records and are
strictly necessary for the query. A query generally executes faster when tables with
few records are joined.
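The first rule, performing selections early, can be sketched as a rewrite on an expression tree. The tuple encoding of the tree and the function `push_selection` are hypothetical, and the sketch handles only one case: a selection that refers to a single relation sitting directly above a Cartesian product.

```python
def push_selection(tree):
    """Move a selection below a product when it refers to only one input.

    Tree nodes (hypothetical encoding):
      ("relation", name)
      ("product", left, right)
      ("select", condition, relation_used, child)
    """
    if tree[0] == "select":
        _, condition, relation_used, child = tree
        if child[0] == "product":
            _, left, right = child
            if left == ("relation", relation_used):
                return ("product",
                        ("select", condition, relation_used, left), right)
            if right == ("relation", relation_used):
                return ("product", left,
                        ("select", condition, relation_used, right))
    return tree  # no rule applies, leave the tree unchanged

before = ("select", "balance > 2000", "account",
          ("product", ("relation", "customer"), ("relation", "account")))
after = push_selection(before)
print(after)
```

After the rewrite the selection is applied to `account` alone, so the product receives fewer tuples.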
Cont’d...
• Row Source Generation –
• The row source generator is software that receives the optimal
execution plan from the optimizer and produces an iterative execution
plan that is usable by the rest of the database. The iterative plan is a
binary program that, when executed by the SQL engine, produces the
result set.
• Step-3:
• Execution Engine: Finally, the execution engine runs the query and displays
the required result.
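One way to picture an iterative plan is as a chain of row sources, each pulling rows from the one below it on demand. The sketch below uses Python generators for this. It is an analogy under that assumption, not how any database implements its plans, and the function names and sample rows are hypothetical.

```python
def scan(table):
    """Row source that yields every row of a stored table."""
    for row in table:
        yield row

def filter_rows(source, predicate):
    """Row source that passes on only the rows satisfying the predicate."""
    for row in source:
        if predicate(row):
            yield row

def project(source, attributes):
    """Row source that keeps only the listed attributes of each row."""
    for row in source:
        yield {a: row[a] for a in attributes}

# Hypothetical sample data.
employee = [
    {"emp_name": "Abel", "salary": 8000},
    {"emp_name": "Sara", "salary": 12000},
]

# Build the chain of row sources; nothing runs until rows are requested.
pipeline = project(
    filter_rows(scan(employee), lambda row: row["salary"] > 10000),
    ["emp_name"],
)
result = list(pipeline)  # the "execution engine" pulls rows to the end
print(result)
```

Building `pipeline` corresponds to producing the plan, and consuming it with `list` corresponds to the execution engine running the plan and returning the result set.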
