Unit-2(Query Optimization and Processing)
1. Execution time: Execution time is the time it takes for a query to execute and
return results. This measure is typically expressed in seconds or milliseconds and
is a useful measure of the performance of a query.
2. CPU time: CPU time is the amount of time that the query spends executing on
the CPU. This measure is typically expressed in seconds or milliseconds and is a
useful measure of the amount of processing power consumed by a query.
3. Memory usage: Memory usage is the amount of memory that the query
consumes while executing. This measure is typically expressed in bytes or
megabytes and is a useful measure of the amount of memory required to execute
a query.
4. Disk I/O: Disk I/O is the number of times that the query reads from or writes to
disk. This measure is typically expressed in the number of I/O operations or the
amount of data read or written.
5. Network bandwidth: Network bandwidth is the amount of data transferred
between the client and the server during the execution of the query. This
measure is typically expressed as the number of bytes transferred or as a rate in
megabytes per second, and is a useful measure of the amount of network resources
consumed by a query.
Measuring the cost of a query in a DBMS is important for optimizing query performance
and improving system scalability. By understanding the resources consumed by a query,
DBMS professionals can make informed decisions about query optimization, indexing,
partitioning, and other performance tuning techniques.
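As a rough illustration of the first two measures, the sketch below times a query using Python's standard library against an in-memory SQLite database. The "orders" table, its data, and the measure_query helper are assumptions made for this example only; a production DBMS normally reports such figures through its own monitoring tools.

```python
import sqlite3
import time


def measure_query(conn, sql, params=()):
    """Run a query and report row count, elapsed (wall-clock) time and CPU time."""
    wall_start = time.perf_counter()   # execution time as seen by the caller
    cpu_start = time.process_time()    # CPU time used by this process
    rows = conn.execute(sql, params).fetchall()
    return {
        "rows": len(rows),
        "wall_seconds": time.perf_counter() - wall_start,
        "cpu_seconds": time.process_time() - cpu_start,
    }


# Hypothetical table with 10,000 rows (amounts 0.0 .. 9999.0).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders (amount) VALUES (?)",
    [(float(i),) for i in range(10_000)],
)

stats = measure_query(conn, "SELECT * FROM orders WHERE amount > ?", (5000.0,))
print(stats)
```

The timings vary from run to run, so in practice a query is usually measured several times and the results are averaged.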
The general syntax of a selection query in SQL is:
SELECT column1, column2, ...
FROM table
WHERE condition;
In this syntax, SELECT is used to specify the columns to be retrieved, FROM is used to
specify the table to be queried, and WHERE is used to specify the condition that the rows
must meet in order to be retrieved.
The condition in the WHERE clause is typically a comparison between a column in the
table and a constant value, a comparison between two columns in the table, or a logical
combination of multiple conditions using logical operators such as AND, OR, and NOT.
For example, the following SQL statement retrieves all rows from the "customers" table
where the "city" column is equal to "New York":
SELECT *
FROM customers
WHERE city = 'New York';
The selection operation is an important part of querying data in a DBMS, and it can be
used to filter and retrieve specific subsets of data based on a wide range of conditions.
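The sketch below runs two selections with combined conditions against an in-memory SQLite database. The "customers" rows and the "age" column are invented for illustration and are not part of the notes above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO customers (name, city, age) VALUES (?, ?, ?)",
    [
        ("Ann", "New York", 34),
        ("Bob", "Boston", 41),
        ("Cara", "New York", 19),
        ("Dev", "Chicago", 52),
    ],
)

# Two column-to-constant comparisons combined with AND.
ny_adults = conn.execute(
    "SELECT name FROM customers WHERE city = ? AND age >= ? ORDER BY name",
    ("New York", 21),
).fetchall()

# NOT and OR: customers outside New York, or anyone younger than 21.
others = conn.execute(
    "SELECT name FROM customers "
    "WHERE NOT (city = 'New York') OR age < 21 ORDER BY name"
).fetchall()

print(ny_adults)  # [('Ann',)]
print(others)     # [('Bob',), ('Cara',), ('Dev',)]
```

Passing values through ? placeholders rather than pasting them into the SQL string is the usual way to keep user input out of the query text.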
Sorting in DBMS:
Sorting is the process of arranging data in a specific order, typically in ascending or
descending order, based on the values of one or more columns. Sorting is an important
operation in a DBMS and is commonly used to display query results in a meaningful way
or to prepare data for further analysis.
In SQL, the ORDER BY clause is used to sort the results of a query based on one or more
columns. Each column listed after ORDER BY can be sorted in ascending order (ASC, the
default) or in descending order (DESC).
For example, the following SQL statement retrieves all rows from the "employees" table
where the "department" column is equal to "Sales", and sorts the results by the "salary"
column in descending order:
SELECT *
FROM employees
WHERE department = 'Sales'
ORDER BY salary DESC;
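A small runnable version of this query is sketched below using an in-memory SQLite database. The employee rows are made up, and a second sort column ("name") is added here only to show how ties on salary can be broken.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ann", "Sales", 5000),
        ("Bob", "Sales", 7000),
        ("Cara", "HR", 6000),
        ("Dev", "Sales", 7000),
    ],
)

# Highest salary first; employees with equal salaries are ordered by name.
sales_sorted = conn.execute(
    "SELECT name, salary FROM employees "
    "WHERE department = 'Sales' "
    "ORDER BY salary DESC, name ASC"
).fetchall()

print(sales_sorted)  # [('Bob', 7000), ('Dev', 7000), ('Ann', 5000)]
```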
1. Inner join: The inner join operation returns only the rows from both tables where
the join condition is true.
SELECT *
FROM table1
INNER JOIN table2
ON table1.column = table2.column;
2. Left join: The left join operation returns all the rows from the left table and
matching rows from the right table. If there is no matching row in the right table,
the result set will contain NULL values for the right table columns.
SELECT *
FROM table1
LEFT JOIN table2
ON table1.column = table2.column;
3. Right join: The right join operation returns all the rows from the right table and
matching rows from the left table. If there is no matching row in the left table, the
result set will contain NULL values for the left table columns.
SELECT *
FROM table1
RIGHT JOIN table2
ON table1.column = table2.column;
4. Full outer join: The full outer join operation returns all the rows from both tables,
including those with no matching rows in the other table. If there is no matching row in
one of the tables, the result set will contain NULL values for the missing columns.
SELECT *
FROM table1
FULL OUTER JOIN table2
ON table1.column = table2.column;
The join operation is an important part of querying data in a DBMS, and it can be used
to combine data from multiple tables based on a wide range of conditions. Joining tables
can be an expensive operation, especially on large datasets, so it is important to optimize
queries by using appropriate indexes and limiting the number of columns included in the
SELECT clause.
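The difference between an inner join and a left join can be seen in the sketch below, which uses an in-memory SQLite database with invented "customers" and "orders" rows. Right and full outer joins are left out because SQLite only supports them from version 3.39 onwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cara');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# Inner join: only customers that have at least one matching order.
inner_rows = conn.execute(
    "SELECT c.name, o.total FROM customers AS c "
    "INNER JOIN orders AS o ON c.id = o.customer_id "
    "ORDER BY o.id"
).fetchall()

# Left join: every customer; Cara has no order, so her total is NULL (None).
left_rows = conn.execute(
    "SELECT c.name, o.total FROM customers AS c "
    "LEFT JOIN orders AS o ON c.id = o.customer_id "
    "ORDER BY c.id, o.id"
).fetchall()

print(inner_rows)  # [('Ann', 25.0), ('Ann', 40.0), ('Bob', 15.0)]
print(left_rows)   # [('Ann', 25.0), ('Ann', 40.0), ('Bob', 15.0), ('Cara', None)]
```

Both queries name only the columns they need instead of using SELECT *, in line with the advice above about limiting the columns carried through a join.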
Apart from the selection and join operations, there are several other operations that are
commonly used in a DBMS to manipulate data. These include:
Set operations: Set operations are used to combine the results of two or more queries
into a single result set. The common set operations include UNION, INTERSECT, and
EXCEPT. In SQL, a UNION is written by placing the UNION keyword between two SELECT
statements that return the same number of columns with compatible data types.
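The three set operations can be compared side by side with the sketch below, run against an in-memory SQLite database. The two customer tables and the run_set_operation helper are assumptions introduced for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE online_customers (name TEXT);
    CREATE TABLE store_customers (name TEXT);
    INSERT INTO online_customers VALUES ('Ann'), ('Bob'), ('Cara');
    INSERT INTO store_customers VALUES ('Bob'), ('Cara'), ('Dev');
""")


def run_set_operation(operator):
    """Combine the two SELECTs with UNION, INTERSECT or EXCEPT."""
    if operator not in ("UNION", "INTERSECT", "EXCEPT"):
        raise ValueError(f"unsupported set operator: {operator}")
    sql = (
        "SELECT name FROM online_customers "
        f"{operator} "
        "SELECT name FROM store_customers "
        "ORDER BY name"
    )
    return [row[0] for row in conn.execute(sql)]


union_names = run_set_operation("UNION")          # in either table, duplicates removed
intersect_names = run_set_operation("INTERSECT")  # in both tables
except_names = run_set_operation("EXCEPT")        # online only

print(union_names)      # ['Ann', 'Bob', 'Cara', 'Dev']
print(intersect_names)  # ['Bob', 'Cara']
print(except_names)     # ['Ann']
```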
1. Parsing: The query parser first reads the query and breaks it down into smaller
units (tokens), such as keywords, identifiers, operators, and constants, and checks
that they form a syntactically valid statement. This process is called parsing.
2. Semantic analysis: After parsing, the DBMS performs semantic analysis to check
that the syntactically valid query is also meaningful. It checks for errors such as
undefined tables or columns, ambiguous column names, and invalid operators.
3. Optimization: Once the query is parsed and validated, the query optimizer
identifies the most efficient way to execute the query. This involves selecting the
most appropriate access path, join order, and join method.
4. Execution: After optimization, the query engine executes the query by evaluating
each expression in the query. Expressions can be evaluated using either tuple-at-
a-time or block-at-a-time processing.
• Tuple-at-a-time processing: In tuple-at-a-time processing, each row of data is
processed one at a time. This approach is best suited for small result sets or
queries with complex expressions.
• Block-at-a-time processing: In block-at-a-time processing, a block of rows is
processed at once. This approach is best suited for large result sets or queries
with simple expressions.
5. Result generation: Finally, the query engine generates the result set by combining
the rows of data that meet the query conditions. The result set is returned to the
user or application that issued the query.
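The two evaluation styles named in step 4 can be sketched in plain Python as follows. This is a simplified model rather than how any particular engine is implemented; the function names, the sample rows, and the block size are all assumptions.

```python
from itertools import islice


def tuple_at_a_time(rows, predicate):
    """Hand qualifying rows to the caller one at a time."""
    for row in rows:
        if predicate(row):
            yield row


def block_at_a_time(rows, predicate, block_size=3):
    """Read block_size input rows per step and return the qualifying ones together."""
    iterator = iter(rows)
    while True:
        block = list(islice(iterator, block_size))
        if not block:
            break
        yield [row for row in block if predicate(row)]


employees = [("Ann", 5000), ("Bob", 7000), ("Cara", 6000), ("Dev", 4000), ("Eve", 8000)]


def high_paid(row):
    return row[1] >= 6000


one_by_one = list(tuple_at_a_time(employees, high_paid))
in_blocks = list(block_at_a_time(employees, high_paid, block_size=2))

print(one_by_one)  # [('Bob', 7000), ('Cara', 6000), ('Eve', 8000)]
print(in_blocks)   # [[('Bob', 7000)], [('Cara', 6000)], [('Eve', 8000)]]
```

Both functions return the same rows; they differ only in how many rows are handled per step, which is what changes the per-row overhead in a real engine.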
1. Algebraic simplification: In this step, the query optimizer simplifies the relational
expression using algebraic identities and properties. For example, because selection
distributes over union, a filter applied to the union of two tables can be rewritten
as the union of the two filtered tables.
2. Predicate push-down: In this step, the optimizer pushes down predicates
(conditions) in a query to the lowest possible level in the expression tree. This
reduces the number of rows that need to be processed and improves query
performance.
3. Join reordering: In this step, the optimizer reorders the join operations in a query
to reduce the size of the intermediate results that need to be stored and
processed. This can significantly improve query performance for queries with
multiple joins.
4. Subquery optimization: In this step, the optimizer optimizes subqueries, for
example by rewriting them as joins where possible, and by selecting the most
efficient access path and join method. This can reduce the overall cost of the
query execution.
5. Index selection: In this step, the optimizer selects the most appropriate indexes to
use for the query. This can improve query performance by reducing the number
of disk accesses needed to retrieve the data.
6. View merging: In this step, the optimizer merges the defining query of a view into
the query that references it, so that both are optimized together. This eliminates
redundant computations and reduces the number of intermediate results that
need to be stored and processed.
7. Query rewrite: In this step, the optimizer rewrites the query using alternative
expressions that have the same meaning but are more efficient to execute.
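Predicate push-down (step 2) is sketched below on plain Python lists. The tables, the join helper, and the data volumes are invented for illustration; the point is only that filtering before the join gives the same answer while feeding far fewer rows into the join.

```python
def join(left, right, key):
    """Naive nested-loop equi-join of two lists of dicts on a shared column."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


# Hypothetical data: 100 customers (10 of them in New York), 1000 orders.
customers = [
    {"cust_id": i, "city": "New York" if i % 10 == 0 else "Boston"}
    for i in range(100)
]
orders = [{"order_id": n, "cust_id": n % 100} for n in range(1000)]

# Plan A: join everything first, apply the city filter afterwards.
joined_all = join(customers, orders, "cust_id")
plan_a = [row for row in joined_all if row["city"] == "New York"]

# Plan B: push the city filter below the join.
ny_customers = [c for c in customers if c["city"] == "New York"]
plan_b = join(ny_customers, orders, "cust_id")

print(len(joined_all), len(plan_a))    # 1000 100
print(len(ny_customers), len(plan_b))  # 10 100
```

Plan A builds a 1000-row intermediate result before filtering, while Plan B joins only 10 customer rows against the orders and produces the same 100 result rows.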
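A DBMS estimates statistics of expression results, such as the number of rows that each
operation is expected to produce, in order to compare the cost of alternative plans.
The following are some of the choices the query optimizer makes when selecting an
evaluation plan: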
1. Index selection: The query optimizer may choose to use an index for a particular
query if it can significantly reduce the number of rows that need to be processed.
However, the use of an index may not always be the most efficient option,
especially if the index is not selective enough or if the cost of accessing the index
is too high.
2. Join order selection: In a query with multiple join operations, the order in which
the joins are performed can significantly affect the query performance. The query
optimizer may try different join orders and choose the one that results in the
lowest cost.
3. Join algorithm selection: There are several algorithms for performing joins, such
as nested loop join, hash join, and sort-merge join. The choice of join algorithm
depends on the size of the input tables, the available memory, and the available
CPU resources.
4. Parallelism: The query optimizer may choose to execute parts of a query in
parallel if there are multiple CPU cores available. Parallelism can significantly
improve query performance by allowing multiple operations to be executed
simultaneously.
5. Materialization: The query optimizer may choose to materialize intermediate
results if it can improve query performance. Materialization involves storing
intermediate results in a temporary table, which can be used later in the query
execution.
6. Subquery optimization: The query optimizer may choose to optimize subqueries
by selecting the most efficient access path and join method. This can reduce the
overall cost of the query execution.
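Two of the join algorithms named in item 3 are sketched below on in-memory tuples. This is a teaching sketch under simplifying assumptions (everything fits in memory, equality join on one column); the function names and sample data are not from the notes.

```python
from collections import defaultdict


def nested_loop_join(left, right, left_col, right_col):
    """Compare every left row with every right row: O(len(left) * len(right))."""
    result = []
    for l in left:
        for r in right:
            if l[left_col] == r[right_col]:
                result.append(l + r)
    return result


def hash_join(left, right, left_col, right_col):
    """Build a hash table on the left input, then probe it once per right row."""
    buckets = defaultdict(list)
    for l in left:                       # build phase
        buckets[l[left_col]].append(l)
    result = []
    for r in right:                      # probe phase
        for l in buckets.get(r[right_col], []):
            result.append(l + r)
    return result


departments = [(1, "Sales"), (2, "HR"), (3, "Legal")]
employees = [("Ann", 1), ("Bob", 2), ("Cara", 1), ("Dev", 4)]

nl_rows = nested_loop_join(departments, employees, 0, 1)
hj_rows = hash_join(departments, employees, 0, 1)

# Same rows, possibly in a different order.
print(sorted(nl_rows) == sorted(hj_rows))  # True
```

The hash join touches each input row once but needs memory for the hash table, which is one reason the choice of algorithm depends on table sizes and available memory.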
The choice of evaluation plans in a DBMS is a complex process that involves considering
various factors and selecting the plan that results in the lowest cost. By selecting the
most efficient evaluation plan, the DBMS can improve the performance of query
processing and reduce the overall cost of executing queries.
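A heavily simplified sketch of cost-based plan selection is shown below. It enumerates left-deep join orders for three hypothetical tables and scores each order by the sum of its estimated intermediate result sizes. The row counts, the join selectivities, and this cost formula are all assumptions chosen for illustration; real optimizers use much richer statistics and cost models.

```python
from itertools import permutations

# Hypothetical catalog statistics.
row_counts = {"customers": 1_000, "orders": 50_000, "items": 200_000}

# Assumed fraction of the cross product that survives each join predicate.
# Pairs without an entry have no join predicate (cross product, factor 1.0).
selectivity = {
    frozenset(["customers", "orders"]): 1 / 1_000,
    frozenset(["orders", "items"]): 1 / 50_000,
}


def plan_cost(order):
    """Sum of the estimated intermediate result sizes for a left-deep join order."""
    joined = {order[0]}
    size = row_counts[order[0]]
    cost = 0.0
    for table in order[1:]:
        factor = 1.0
        for other in joined:
            factor *= selectivity.get(frozenset([table, other]), 1.0)
        size = size * row_counts[table] * factor
        joined.add(table)
        cost += size
    return cost


plans = {order: plan_cost(order) for order in permutations(row_counts)}
best_order = min(plans, key=plans.get)

for order, cost in sorted(plans.items(), key=lambda item: item[1]):
    print(" -> ".join(order), round(cost))
```

Under these assumed numbers the cheapest orders join "customers" with "orders" first, and the orders that start with a cross product of "customers" and "items" are several hundred times more expensive.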
Materialized Views in DBMS:
A materialized view in a DBMS is a precomputed table that stores the results of a query.
Materialized views are used to improve the performance of queries by reducing the
amount of work that needs to be done at query execution time. They are most useful
for queries that repeatedly compute the same expensive joins or aggregations.
However, materialized views also have some disadvantages. The precomputed results
may become stale if the underlying data changes, which can lead to inconsistent query
results. To address this issue, the DBMS may need to refresh the materialized views
periodically or in response to changes in the underlying data. Materialized views can
also consume significant storage space, which can be a concern for large databases.
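The staleness problem and a periodic refresh can be sketched as follows. SQLite has no built-in materialized views, so this example emulates one with an ordinary summary table that is recomputed on demand; the table names and the refresh_sales_by_region helper are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('East', 100), ('East', 50), ('West', 70);
    -- Stands in for the materialized view: stores a precomputed aggregate.
    CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total INTEGER);
""")


def refresh_sales_by_region(conn):
    """Complete refresh: recompute the stored result from the base table."""
    conn.execute("DELETE FROM sales_by_region")
    conn.execute(
        "INSERT INTO sales_by_region "
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    )


def read_totals(conn):
    """Read the precomputed totals without touching the base table."""
    return dict(conn.execute("SELECT region, total FROM sales_by_region"))


refresh_sales_by_region(conn)
fresh = read_totals(conn)        # {'East': 150, 'West': 70}

conn.execute("INSERT INTO sales VALUES ('West', 30)")
stale = read_totals(conn)        # still {'East': 150, 'West': 70}

refresh_sales_by_region(conn)
refreshed = read_totals(conn)    # {'East': 150, 'West': 100}

print(fresh, stale, refreshed)
```

Reads from the summary table stay cheap, but they return outdated totals until the next refresh, which is the trade-off described above.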
Overall, materialized views can be a useful tool for improving query performance in a
DBMS, but their use should be carefully considered and balanced against the potential
disadvantages.