Chapter 2 - Query Processing and Optimization

Query processing transforms a SQL query into an efficient execution plan. It involves three main steps:
1. Query decomposition breaks the query down into a query tree of relational algebra operations.
2. Query optimization chooses an efficient execution plan by applying heuristic rules that reduce intermediate results, such as pushing selections and projections down below joins.
3. Execution runs the optimized plan to retrieve the query results from the database.
Optimization aims to minimize disk I/O and the cost of joins by reducing the size of intermediate results through logical transformations.

Uploaded by

Dinksraw

Chapter Content

1. Query Processing
 Steps of Processing

2. Methods of Optimization
 Heuristic (Logical Transformations)
 Transformation Rules
 Heuristic Optimization Guidelines
 Cost Based (Physical Execution Costs)
 Data Storage/Access Refresher

What is a Query?
A query is a request for data or information from a database table or
combination of tables.
What is Query Processing?
 The steps required to transform a high-level SQL query into a correct and efficient
strategy for execution and retrieval.
What is Query Optimization?
 The activity of choosing a single efficient execution strategy (from possibly
hundreds of alternatives), guided by database catalog statistics.
 Which relational algebra expression, equivalent to the given query,
will lead to the most efficient solution plan?
 For each algebraic operator, which algorithm (of several available) do
we use to compute that operator?
 How do operations pass data (main memory buffer, disk buffer, …)?
 Will this plan minimize resource usage (CPU, response time, disk)?

 Staff
S_ID  FName   LName    Position  Salary  branchNo
001   Tola    Waqjira  Cashier   5000    777
002   Habiba  Ahmed    Manager   8000    999
003   Abdi    Jiregna  Cashier   6000    555
004   Lelise  Boru     DBA       7000    999

 Branch
branchNo  BranchName  City
999       Batu        Batu
777       Abba Gada   Adama
555       Bishoftu    Bishoftu

Consider these tables for the next slides.

Example: Identify all managers who work in Adama City

SELECT * FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo AND s.position = 'Manager' AND
b.city = 'Adama';

• This query corresponds to the following equivalent relational algebra expressions:

(1) σ_{(position='Manager') ∧ (city='Adama') ∧ (Staff.branchNo=Branch.branchNo)} (Staff × Branch)

(2) σ_{(position='Manager') ∧ (city='Adama')} (Staff ⋈_{Staff.branchNo=Branch.branchNo} Branch)

(3) [σ_{position='Manager'} (Staff)] ⋈_{Staff.branchNo=Branch.branchNo} [σ_{city='Adama'} (Branch)]
Assume:
• 1000 tuples in Staff
• 50 Managers
• 50 tuples in Branch
• 5 Adama branches
• No indexes or sort keys
• All temporary results are written back to disk (memory is small)
• Tuples are accessed one at a time (not in blocks)
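To see that the three expressions really are equivalent, here is a minimal Python sketch that evaluates each one over the sample Staff and Branch rows. The helper names (`select`, `cross`, `join`, `plan1` to `plan3`, `names`) are illustrative assumptions, not part of any DBMS API. The in-memory lists stand in for relations on disk, and Branch's key is renamed `b_branchNo` here only to keep the two columns apart after concatenation.

```python
# Sample rows from the Staff and Branch tables (subset of columns).
STAFF = [
    {"S_ID": "001", "FName": "Tola", "Position": "Cashier", "branchNo": 777},
    {"S_ID": "002", "FName": "Habiba", "Position": "Manager", "branchNo": 999},
    {"S_ID": "003", "FName": "Abdi", "Position": "Cashier", "branchNo": 555},
    {"S_ID": "004", "FName": "Lelise", "Position": "DBA", "branchNo": 999},
]
BRANCH = [
    {"b_branchNo": 999, "City": "Batu"},
    {"b_branchNo": 777, "City": "Adama"},
    {"b_branchNo": 555, "City": "Bishoftu"},
]

def select(pred, rel):
    """Selection: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]

def cross(r, s):
    """Cartesian product: concatenate every pair of tuples."""
    return [{**a, **b} for a in r for b in s]

def join(r, s):
    """Equi-join on branchNo, expressed as a filtered product."""
    return select(lambda t: t["branchNo"] == t["b_branchNo"], cross(r, s))

def plan1(city):
    # (1) one big selection over the Cartesian product
    return select(lambda t: t["Position"] == "Manager" and t["City"] == city
                  and t["branchNo"] == t["b_branchNo"], cross(STAFF, BRANCH))

def plan2(city):
    # (2) join first, then select
    return select(lambda t: t["Position"] == "Manager" and t["City"] == city,
                  join(STAFF, BRANCH))

def plan3(city):
    # (3) select on each relation first, then join the reduced relations
    return join(select(lambda t: t["Position"] == "Manager", STAFF),
                select(lambda t: t["City"] == city, BRANCH))

def names(rows):
    """First names in the result, sorted for easy comparison."""
    return sorted(t["FName"] for t in rows)
```

With the four sample Staff rows, no manager works in Adama, so all three plans return an empty result for "Adama". Asking for "Batu" instead makes each plan return the single manager, Habiba. The plans differ only in how much intermediate data they build along the way.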
Query 1 (Bad)
σ_{(position='Manager') ∧ (city='Adama') ∧ (Staff.branchNo=Branch.branchNo)} (Staff × Branch)
 Requires (1000+50) disk accesses to read the Staff and Branch relations
 Creates a temporary relation for the Cartesian product with (1000*50) tuples, which takes (1000*50) disk accesses to write
 Requires (1000*50) disk accesses to read the temporary relation back in and test the
predicate
 Total Work = (1000+50) + 2*(1000*50) = 101,050 I/O operations
Query 2 (Better)
σ_{(position='Manager') ∧ (city='Adama')} (Staff ⋈_{Staff.branchNo=Branch.branchNo} Branch)

• Again requires (1000+50) disk accesses to read Staff and Branch
• Joins Staff and Branch on branchNo, producing 1000 tuples (each employee works at
exactly one branch), which are written to disk
• Requires (1000) disk accesses to read the joined relation back in and check the predicate
• Total Work = (1000+50) + 2*(1000) = 3,050 I/O operations
• About 33 times fewer I/O operations than Query 1 (101,050 / 3,050 ≈ 33)

Query 3 (Best)

[σ_{position='Manager'} (Staff)] ⋈_{Staff.branchNo=Branch.branchNo} [σ_{city='Adama'} (Branch)]

 Read the Staff relation to determine the managers (1000 reads)

 Create a 50-tuple relation (50 writes)

 Read the Branch relation to determine the Adama branches (50 reads)

 Create a 5-tuple relation (5 writes)

 Join the reduced relations and check the predicate (50 + 5 reads)

 Total Work = 1000 + 2*(50) + 5 + (50 + 5) = 1,160 I/O operations
 About 87 times fewer I/O operations than Query 1 (101,050 / 1,160 ≈ 87)

Query Processing Steps

• Processing can be divided into four phases:

• Decomposition,
• Optimization,
• Code generation, and
• Execution
1. Query Decomposition
 It is the process of transforming a high-level query into a relational algebra
query.
 It also checks that the query is syntactically and semantically correct.
 It consists of parsing and validation.

Typical stages in query decomposition are:

a. Analysis
 Lexical and syntactic analysis of the query (correctness), based on attributes,
data types, etc.
 A query tree is built for the query, containing a leaf node for each base relation and
one or more non-leaf nodes for the relations produced by relational algebra
operations.
 The root node represents the result of the query.
 The sequence of operations runs from the leaves to the root.
SELECT * FROM Catalog c, Author a WHERE a.authorid = c.authorid AND
c.price > 200 AND a.country = 'USA';
b. Normalization
 Converts the query into a normalized form.
 The WHERE predicate is converted to conjunctive (∧) or disjunctive (∨) normal form.
c. Semantic Analysis
 Rejects normalized queries that are incorrectly formulated
or contradictory.
 Incorrect: some components do not contribute to generating the result.
 Contradictory: the predicate cannot be satisfied by any tuple.

Example:
(catalog = "BS" ∧ catalog = "CS") is contradictory, since a given book can be
classified in only one category at a time.
d. Simplification
 Detects redundant qualifications and eliminates common sub-
expressions.
 Transforms the query into a semantically equivalent but simpler form.
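The contradiction check and the removal of redundant qualifications can be illustrated with a small Python sketch. It assumes the predicate is already a conjunction made up only of `attribute = constant` terms, represented as (attribute, value) pairs. The function names `is_contradictory` and `simplify` are hypothetical, and a real analyzer handles far richer predicates.

```python
def is_contradictory(conjuncts):
    """Return True if a conjunction of equality terms can never be satisfied.

    conjuncts: iterable of (attribute, constant) pairs, all ANDed together.
    Two terms that bind the same attribute to different constants
    cannot both hold for any single tuple.
    """
    seen = {}
    for attr, value in conjuncts:
        if attr in seen and seen[attr] != value:
            return True
        seen[attr] = value
    return False

def simplify(conjuncts):
    """Drop repeated terms (redundant qualifications), keeping first-seen order."""
    return list(dict.fromkeys(conjuncts))

print(is_contradictory([("catalog", "BS"), ("catalog", "CS")]))   # True
print(simplify([("catalog", "BS"), ("catalog", "BS")]))           # [('catalog', 'BS')]
```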

2. Query Optimization
 Everyone wants the performance of their database to be optimal.
 In particular, there is often a requirement for a specific query, or an
object that is based on a query, to run faster.
 The problem of query optimization is to find the sequence of steps that
produces the answer to the user's request in the most efficient manner,
given the database structure.
 The performance of a query is affected by the tables or queries that
underlie it and by the complexity of the query itself.
 Given a request for data manipulation or retrieval, an optimizer chooses
a good plan for evaluating the request from among the
various alternative strategies, i.e. there are many ways (access
paths) of accessing the desired file/record.
 Hence, the DBMS is responsible for picking the best execution strategy
based on various considerations (e.g. the least amount of I/O and CPU resources).

 Example: Consider relations r(A, B) and s(C, D). We require r × s.
 Method 1:
a. Load the next record of r into RAM.
b. Load all records of s, one at a time, and concatenate each with the current record of r.
c. Have all records of r been concatenated?
 NO: go to a.
 YES: exit (the result is in RAM or on disk).
 Performance: too many accesses, because s is read in full once for every record of r.
 Method 2: Improvement
a. Load as many blocks of r as possible, leaving room for one block of s.
b. Run through the s file completely, one block at a time.
c. Repeat from a. until all blocks of r have been processed.
 Performance: reduces the number of times the s blocks are loaded by a factor
equal to the number of r records that can fit in main memory.
 Considerations during query optimization:
 Narrow down intermediate result sets quickly: apply SELECT and PROJECT
before JOIN.
 Use access structures (indexes).
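Methods 1 and 2 above can be compared with a short Python sketch that counts how often records of s are loaded. This is a simplified model under stated assumptions: single records stand in for disk blocks, the function names and the `r_records_in_memory` parameter are hypothetical, and the tiny relations are toy data chosen only to make the counts easy to follow.

```python
def product_record_at_a_time(r, s):
    """Method 1: for every record of r, re-read all of s."""
    result, s_loads = [], 0
    for a in r:
        for b in s:
            s_loads += 1            # each record of s is loaded again
            result.append(a + b)    # concatenate the two tuples
    return result, s_loads

def product_blocked(r, s, r_records_in_memory):
    """Method 2: hold a chunk of r in memory and scan s once per chunk."""
    result, s_loads = [], 0
    for start in range(0, len(r), r_records_in_memory):
        chunk = r[start:start + r_records_in_memory]
        for b in s:
            s_loads += 1            # s is loaded once per chunk, not per record
            for a in chunk:
                result.append(a + b)
    return result, s_loads

r = [(i, i * 10) for i in range(6)]     # toy r(A, B) with 6 records
s = [(c, c * 100) for c in range(4)]    # toy s(C, D) with 4 records
out1, loads1 = product_record_at_a_time(r, s)
out2, loads2 = product_blocked(r, s, r_records_in_memory=3)
print(loads1, loads2)                   # 24 8
```

Both methods produce the same 24 result tuples, but holding 3 records of r in memory cuts the loads of s from 24 to 8, a factor of 3, which matches the performance claim for Method 2.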

Approaches to Query Optimization
A. Heuristics Approach
 Uses knowledge of the characteristics of the relational algebra operations
and of the relationships between the operators to optimize the query.
 Thus the heuristic approach to optimization makes use of:
 Properties of individual operators
 Association between operators
Query Tree: a graphical representation of the operators, relations, attributes
and processing sequence during query processing. It is composed of three
main parts:
a) The Leaves: the base relations used for processing the query / extracting the
required information
b) The Root: the final result/relation, produced as output by the operations on the
relations used for query processing
c) Nodes: intermediate results or relations produced before reaching the final result.
The sequence of execution of operations in a query tree starts from the leaves,
continues through the intermediate nodes and ends at the root.
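The leaves-to-root execution order can be made concrete with a minimal Python sketch of a query tree. The `Node` class, its field names and the `evaluate` function are assumptions made for this illustration. Only selection and join are modelled, and the rows are a subset of the Staff and Branch sample tables (Branch's key is renamed `b_branchNo` to keep the columns apart).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Node:
    """A query tree node: a leaf holds a base relation, an inner node an operator."""
    op: str                              # "relation", "select" or "join"
    children: List["Node"]
    relation: Optional[list] = None      # rows, for leaves only
    pred: Optional[Callable] = None      # predicate, for select and join

def evaluate(node):
    """Evaluate the children first, then this node, finishing at the root."""
    if node.op == "relation":
        return node.relation
    inputs = [evaluate(child) for child in node.children]
    if node.op == "select":
        return [t for t in inputs[0] if node.pred(t)]
    if node.op == "join":
        return [{**a, **b} for a in inputs[0] for b in inputs[1] if node.pred(a, b)]
    raise ValueError(f"unknown operator {node.op}")

# Leaves: base relations.
staff = Node("relation", [], relation=[
    {"FName": "Habiba", "Position": "Manager", "branchNo": 999},
    {"FName": "Tola", "Position": "Cashier", "branchNo": 777},
])
branch = Node("relation", [], relation=[
    {"b_branchNo": 999, "City": "Batu"},
    {"b_branchNo": 777, "City": "Adama"},
])
# Intermediate node: selection on Staff. Root: join of the two inputs.
managers = Node("select", [staff], pred=lambda t: t["Position"] == "Manager")
root = Node("join", [managers, branch],
            pred=lambda a, b: a["branchNo"] == b["b_branchNo"])
result = evaluate(root)
```

Here `result` holds one joined tuple, for Habiba at the Batu branch. The recursion reaches the leaves before any operator runs, so each internal node is evaluated only once its operands are available.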
Using Heuristics in Query Optimization

 Process for heuristic optimization

 The parser of a high-level query generates an initial internal
representation.
 Heuristic rules are applied to optimize the internal representation.
 A query execution plan is generated to execute groups of
operations based on the access paths available on the files
involved in the query.
 The main heuristic is to apply first the operations that reduce the size
of intermediate results.
 E.g. apply SELECT and PROJECT operations before applying
the JOIN or other binary operations.
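One way to picture the "SELECT before JOIN" heuristic is as a rewrite of the query tree. The Python sketch below is an assumption-laden illustration, not how any particular optimizer is implemented. Trees are nested tuples of the form `("rel", name, attrs)`, `("select", label, used_attrs, child)` and `("join", condition, left, right)`, and the function names are hypothetical. A selection is pushed below a join only when every attribute it uses comes from one side.

```python
def attrs_of(tree):
    """Set of attribute names produced by a (sub)tree."""
    kind = tree[0]
    if kind == "rel":
        return set(tree[2])
    if kind == "select":
        return attrs_of(tree[3])
    return attrs_of(tree[2]) | attrs_of(tree[3])       # join

def push_down(tree):
    """Move each selection below a join when its predicate uses one side only."""
    kind = tree[0]
    if kind == "rel":
        return tree
    if kind == "join":
        return ("join", tree[1], push_down(tree[2]), push_down(tree[3]))
    _, label, used, child = tree                        # a selection node
    child = push_down(child)
    if child[0] == "join":
        _, cond, left, right = child
        if used <= attrs_of(left):
            return ("join", cond, push_down(("select", label, used, left)), right)
        if used <= attrs_of(right):
            return ("join", cond, left, push_down(("select", label, used, right)))
    return ("select", label, used, child)               # cannot be moved

staff = ("rel", "Staff", ["S_ID", "position", "branchNo"])
branch = ("rel", "Branch", ["b_branchNo", "city"])
initial = ("select", "position='Manager'", {"position"},
           ("select", "city='Adama'", {"city"},
            ("join", "branchNo=b_branchNo", staff, branch)))
optimized = push_down(initial)
```

For the managers-in-Adama query, `optimized` is a join whose left input is the selection on Staff and whose right input is the selection on Branch, which is the shape of expression (3) in the earlier example.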

Query block:
 The basic unit that can be translated into algebraic
operators and optimized.
 Contains a single SELECT-FROM-WHERE expression, as
well as GROUP BY and HAVING clauses if these are part of
the block.
Nested queries
 Nested queries within a query are identified as separate query
blocks.

Query tree
 A tree data structure that corresponds to a relational
algebra expression.
 It represents the input relations of the query as leaf nodes
of the tree, and represents the relational algebra operations
as internal nodes.
 An execution of the query tree consists of executing an
internal node operation whenever its operands are
available and then replacing that internal node by the
relation that results from executing the operation.
Query graph
 A graph data structure that corresponds to a relational calculus
expression.

Example:
 For every project located in 'Stafford', retrieve the project number, the controlling
department number and the department manager's last name, address and birthdate.
 Relational algebra:

π_{PNUMBER, DNUM, LNAME, ADDRESS, BDATE} (((σ_{PLOCATION='STAFFORD'} (PROJECT))
⋈_{DNUM=DNUMBER} (DEPARTMENT)) ⋈_{MGRSSN=SSN} (EMPLOYEE))
 SQL query:
Q2: SELECT P.PNUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE
FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E
WHERE P.DNUM = D.DNUMBER AND
D.MGRSSN = E.SSN AND
P.PLOCATION = 'STAFFORD';

Summary of Heuristics for Algebraic Optimization:
 The same query could correspond to many different relational
algebra expressions and hence many different query trees.
 The main heuristic is to apply first the operations that reduce the size of
intermediate results.
 Perform select operations as early as possible to reduce the number of
tuples.
 Perform project operations as early as possible to reduce the number of
attributes. (This is done by moving select and project operations as far
down the tree as possible.)
 The select and join operations that are most restrictive should be executed
before other similar operations.
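The "most restrictive first" rule can be sketched in Python as sorting selections by estimated selectivity, meaning the fraction of input tuples expected to qualify. The function names are hypothetical. The 0.05 figure corresponds to the earlier assumption of 50 managers among 1000 staff, while the `salary > 4000` predicate and its 0.9 estimate are invented purely for illustration.

```python
def order_by_restrictiveness(selections):
    """Sort selections so the one passing the smallest fraction of tuples runs first.

    selections: list of (label, estimated_selectivity) pairs.
    """
    return sorted(selections, key=lambda sel: sel[1])

def tuples_examined(n_input, selections):
    """Total predicate evaluations when the selections run as a cascade."""
    total, remaining = 0, n_input
    for _, selectivity in selections:
        total += remaining                     # every surviving tuple is tested
        remaining = remaining * selectivity    # only the qualifying share moves on
    return total

selections = [("salary > 4000", 0.9), ("position = 'Manager'", 0.05)]
as_written = tuples_examined(1000, selections)
reordered = tuples_examined(1000, order_by_restrictiveness(selections))
```

Under these estimates the cascade examines about 1900 tuples as written and about 1050 once the more restrictive selection runs first. The final result is the same either way, since selections commute.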
B. Cost Estimation Approach to Query Optimization
 The main idea is to minimize the cost of processing a query.
 The cost function is composed of:
 I/O cost + CPU processing cost + communication cost + storage cost
 These components might have different weights in different
processing environments.
 The DBMS uses information stored in the system catalog to
estimate cost.
 The main target of query optimization is to minimize the size of the
intermediate relations. Their size affects the cost of:
1) Access Cost of Secondary Storage
2) Storage Cost
3) Computation Cost
4) Communication Cost
5) Memory usage cost
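The idea of a cost function whose components carry different weights in different environments can be sketched in Python as a weighted sum. Everything here is an assumption for illustration: the `PlanCost` fields, the weight parameters and the two made-up plans with their numbers in arbitrary cost units. A real optimizer derives such estimates from catalog statistics.

```python
from dataclasses import dataclass

@dataclass
class PlanCost:
    """Estimated cost components of one candidate plan (arbitrary units)."""
    io: float
    cpu: float
    communication: float
    storage: float

def total_cost(c, w_io=1.0, w_cpu=1.0, w_comm=1.0, w_storage=1.0):
    """Weighted sum of the I/O, CPU, communication and storage components."""
    return (w_io * c.io + w_cpu * c.cpu
            + w_comm * c.communication + w_storage * c.storage)

def cheapest(plans, **weights):
    """Return the name of the plan with the lowest weighted cost."""
    return min(plans, key=lambda name: total_cost(plans[name], **weights))

plans = {
    "A": PlanCost(io=100, cpu=10, communication=0, storage=5),
    "B": PlanCost(io=20, cpu=60, communication=50, storage=5),
}
print(cheapest(plans))               # A  (115 vs 135 with equal weights)
print(cheapest(plans, w_comm=0.0))   # B  (115 vs 85 when communication is free)
```

The same pair of plans ranks differently once the communication weight changes, which is why the weights depend on the processing environment.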
1. Access Cost of Secondary Storage
 Data is accessed from secondary storage. The disk access cost can
in turn be analyzed in terms of:
 Searching,
 Reading, and
 Writing the data blocks used to store some portion of a relation.
 Remark: The disk access cost varies depending on
 the file organization used and the access method implemented for that file
organization;
 whether the data is stored contiguously or scattered across the disk.
2. Storage Cost
• Because any query is composed of many
database operations, there can be one or more intermediate results before
the final output is reached. These intermediate results must be stored in
primary memory for further processing.
• The bigger the intermediate relation, the larger the memory requirement,
which puts pressure on the limited space available. This is
counted as the cost of storage.
3. Computation Cost
 A query is composed of many operations. These can be
database operations such as reading from and writing to disk, or
mathematical and other operations such as:
 Searching
 Sorting
 Merging
 Computation on field values
4. Communication Cost
• In most database systems the database resides at one station and is
accessed by queries that originate from different terminals. This
affects the performance of the system and adds to the cost of
query processing. Thus, the cost of transporting data between the
database site and the terminal where the query originates should
be analyzed.
5. Memory Usage Cost
 The cost pertaining to the number of memory buffers needed during
query execution.
Large databases
 The access cost to secondary storage is the main emphasis.
Smaller databases
 The emphasis is on minimizing computation cost.
Distributed databases
 Communication cost must also be minimized.

End