Unit-2(Query Optimization and Processing)

This document discusses query optimization and processing in DBMS, detailing measures of query cost such as execution time, CPU time, memory usage, disk I/O, and network bandwidth. It covers various operations including selection, sorting, joining, and other data manipulation techniques, along with the evaluation of expressions and the transformation of relational expressions for improved query execution efficiency. Additionally, it highlights the importance of estimating statistics and choosing evaluation plans to optimize query performance, as well as the benefits of using materialized views.
UNIT -2(QUERY OPTIMIZATION AND PROCESSING)

Measures of Query Cost in DBMS:

In a DBMS, the cost of a query is typically measured in terms of the resources it
consumes, such as CPU time, memory, I/O operations, and network bandwidth. There
are several measures of query cost in DBMS, including:

1. Execution time: Execution time is the time it takes for a query to execute and
return results. This measure is typically expressed in seconds or milliseconds and
is a useful measure of the performance of a query.
2. CPU time: CPU time is the amount of time that the query spends executing on
the CPU. This measure is typically expressed in seconds or milliseconds and is a
useful measure of the amount of processing power consumed by a query.
3. Memory usage: Memory usage is the amount of memory that the query
consumes while executing. This measure is typically expressed in bytes or
megabytes and is a useful measure of the amount of memory required to execute
a query.
4. Disk I/O: Disk I/O is the number of times that the query reads from or writes to
disk. This measure is typically expressed as the number of I/O operations or the
amount of data read or written, and is a useful measure of the load a query
places on the storage system.
5. Network bandwidth: Network bandwidth is the amount of data transferred
between the client and the server during the execution of the query. This
measure is typically expressed in bytes or megabytes (or, as a transfer rate, in
bytes or megabytes per second) and is a useful measure of the amount of network
resources consumed by a query.

Measuring the cost of a query in a DBMS is important for optimizing query performance
and improving system scalability. By understanding the resources consumed by a query,
DBMS professionals can make informed decisions about query optimization, indexing,
partitioning, and other performance tuning techniques.
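As a rough illustration of the first two measures, the sketch below times one query against an in-memory SQLite database using Python's standard library. The `orders` table, its contents, and the helper names are hypothetical and exist only for this example; memory usage, disk I/O, and network bandwidth are not measured here.

```python
import sqlite3
import time


def measure_query(conn, sql, params=()):
    """Run one query and return (rows, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall-clock timer: execution time
    cpu_start = time.process_time()    # CPU time consumed by this process
    rows = conn.execute(sql, params).fetchall()
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return rows, wall_seconds, cpu_seconds


def build_demo_db(n=10_000):
    """Create a hypothetical in-memory table with n rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
    conn.executemany(
        "INSERT INTO orders (id, amount) VALUES (?, ?)",
        [(i, i % 100) for i in range(n)],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    rows, wall, cpu = measure_query(
        conn, "SELECT COUNT(*) FROM orders WHERE amount < ?", (10,)
    )
    print(f"matching rows: {rows[0][0]}, wall: {wall:.6f}s, cpu: {cpu:.6f}s")
```

Timings from a single run are noisy, so in practice a query is usually run several times and the measurements compared.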

Selection Operation in DBMS:

In a DBMS, the selection operation is used to retrieve a subset of rows from a table that
meet a specified condition. In SQL, selection corresponds to the WHERE clause, not to the
SELECT keyword, which lists the columns to return.
The syntax for the selection operation in SQL is as follows:

SELECT column1, column2, ...
FROM table
WHERE condition;

In this syntax, SELECT is used to specify the columns to be retrieved, FROM is used to
specify the table to be queried, and WHERE is used to specify the condition that the rows
must meet in order to be retrieved.

The condition in the WHERE clause is typically a comparison between a column in the
table and a constant value, a comparison between two columns in the table, or a logical
combination of multiple conditions using logical operators such as AND, OR, and NOT.

For example, the following SQL statement retrieves all rows from the "customers" table
where the "city" column is equal to "New York":

SELECT *
FROM customers
WHERE city = 'New York';

The selection operation is an important part of querying data in a DBMS, and it can be
used to filter and retrieve specific subsets of data based on a wide range of conditions.
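The selection described above can be tried out with a small, self-contained sketch using Python's built-in `sqlite3` module. The `customers` rows, the `age` column, and the function names are made up for illustration; the second function shows a condition that combines two comparisons with AND.

```python
import sqlite3


def build_customers_demo():
    """Create a hypothetical customers table in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT, city TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [("Ana", "New York", 34), ("Raj", "Boston", 41), ("Li", "New York", 29)],
    )
    return conn


def customers_in_city(conn, city):
    """Selection with a single comparison between a column and a constant."""
    # The ? placeholder passes the constant as a parameter instead of
    # pasting it into the SQL text.
    return conn.execute(
        "SELECT name, city FROM customers WHERE city = ? ORDER BY name",
        (city,),
    ).fetchall()


def customers_in_city_older_than(conn, city, min_age):
    """Selection with two conditions combined by AND."""
    return conn.execute(
        "SELECT name FROM customers WHERE city = ? AND age > ? ORDER BY name",
        (city, min_age),
    ).fetchall()


if __name__ == "__main__":
    conn = build_customers_demo()
    print(customers_in_city(conn, "New York"))
    print(customers_in_city_older_than(conn, "New York", 30))
```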

Sorting in DBMS:

Sorting is the process of arranging data in a specific order, typically in ascending or
descending order, based on the values of one or more columns. Sorting is an important
operation in a DBMS and is commonly used to display query results in a meaningful way
or to prepare data for further analysis.

In SQL, the ORDER BY clause is used to sort the results of a query based on one or more
columns. The syntax for the ORDER BY clause is as follows:

SELECT column1, column2, ...
FROM table
WHERE condition
ORDER BY column1 [ASC|DESC], column2 [ASC|DESC], ...;

In this syntax, the ORDER BY clause is used to specify one or more columns by which the
results should be sorted. The optional ASC or DESC keyword is used to specify whether the
sort should be in ascending or descending order; ascending is the default when neither is
given.

For example, the following SQL statement retrieves all rows from the "employees" table
where the "department" column is equal to "Sales", and sorts the results by the "salary"
column in descending order:

SELECT *
FROM employees
WHERE department = 'Sales'
ORDER BY salary DESC;

Sorting can be an expensive operation, especially on large datasets. To improve query
performance, it is common to create indexes on the columns used in the ORDER BY clause, so
that rows can be read already in order, as well as to limit the number of columns included
in the SELECT clause to only those needed.
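To give a feel for why sorting large datasets is costly, here is a minimal sketch of a two-phase merge sort, one common way to sort more data than fits in memory at once. The notes above do not describe a specific algorithm, so treat this as an assumption; Python lists stand in for the temporary files a real DBMS would write, and `external_sort` and `run_size` are names chosen for this example.

```python
import heapq


def external_sort(values, run_size):
    """Sort a list in two phases and return (sorted_values, number_of_runs).

    Phase 1 cuts the input into chunks of at most run_size items and sorts
    each chunk on its own. Phase 2 merges the sorted chunks ("runs").
    In a real DBMS each run would be written to disk; plain lists stand in
    for those temporary files here (an assumption of this sketch).
    """
    if run_size < 1:
        raise ValueError("run_size must be at least 1")
    runs = []
    for start in range(0, len(values), run_size):
        runs.append(sorted(values[start:start + run_size]))
    # heapq.merge lazily merges already-sorted inputs into one sorted stream.
    return list(heapq.merge(*runs)), len(runs)


if __name__ == "__main__":
    print(external_sort([5, 3, 8, 1, 9, 2, 7], run_size=3))
```

Every value is handled once when its run is built and again during the merge, which is part of why reading rows from an index that is already in order can be cheaper than sorting.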

Join Operation in DBMS:


The join operation is used in a DBMS to combine rows from two or more tables based
on a related column between them. The join operation is an important part of querying
data in a relational database and allows users to combine data from multiple tables into
a single result set.

In SQL, there are several types of join operations, including:

1. Inner join: The inner join operation returns only the rows from both tables where
the join condition is true.
SELECT *
FROM table1
INNER JOIN table2
ON table1.column = table2.column;

2. Left join: The left join operation returns all the rows from the left table and
matching rows from the right table. If there is no matching row in the right table,
the result set will contain NULL values for the right table columns.
SELECT *
FROM table1
LEFT JOIN table2
ON table1.column = table2.column;

3. Right join: The right join operation returns all the rows from the right table and
matching rows from the left table. If there is no matching row in the left table, the
result set will contain NULL values for the left table columns.
SELECT *
FROM table1
RIGHT JOIN table2
ON table1.column = table2.column;

4. Full outer join: The full outer join operation returns all the rows from both tables,
including those with no matching rows in the other table. If there is no matching row in
one of the tables, the result set will contain NULL values for the missing columns.
SELECT *
FROM table1
FULL OUTER JOIN table2
ON table1.column = table2.column;

The join operation is an important part of querying data in a DBMS, and it can be used
to combine data from multiple tables based on a wide range of conditions. Joining tables
can be an expensive operation, especially on large datasets, so it is important to optimize
queries by using appropriate indexes and limiting the number of columns included in the
SELECT clause.
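The difference between an inner join and a left join is easiest to see on a tiny dataset. The sketch below uses Python's built-in `sqlite3` module with two hypothetical tables, `departments` and `employees`, invented for this example. Right and full outer joins are left out because older SQLite versions do not support them.

```python
import sqlite3


def build_join_demo():
    """Create two small hypothetical tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
        INSERT INTO departments VALUES (1, 'Sales'), (2, 'Engineering');
        INSERT INTO employees VALUES
            (10, 'Asha', 1), (11, 'Ben', 2), (12, 'Chen', NULL);
        """
    )
    return conn


def inner_join(conn):
    # Only employees whose dept_id matches a department row are returned.
    return conn.execute(
        """
        SELECT e.name, d.name
        FROM employees AS e
        INNER JOIN departments AS d ON e.dept_id = d.dept_id
        ORDER BY e.emp_id
        """
    ).fetchall()


def left_join(conn):
    # Every employee is returned; unmatched ones get NULL (None in Python).
    return conn.execute(
        """
        SELECT e.name, d.name
        FROM employees AS e
        LEFT JOIN departments AS d ON e.dept_id = d.dept_id
        ORDER BY e.emp_id
        """
    ).fetchall()


if __name__ == "__main__":
    conn = build_join_demo()
    print(inner_join(conn))
    print(left_join(conn))
```

The employee with no department appears only in the left join result, paired with a NULL department name.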

Other Operations in DBMS:

Apart from the selection and join operations, there are several other operations that are
commonly used in a DBMS to manipulate data. These include:

1. Projection: The projection operation is used to select specific columns from a
table, while discarding the rest. The syntax for the projection operation in SQL is:
SELECT column1, column2, ...
FROM table;

2. Aggregation: The aggregation operation is used to calculate summary statistics
for groups of rows in a table. Common aggregation functions include COUNT,
SUM, AVG, MIN, and MAX. The syntax for the aggregation operation in SQL is:
SELECT column1, aggregate_function(column2)
FROM table
GROUP BY column1;

3. Subquery: A subquery is a query that is embedded within another query.
Subqueries can be used to retrieve data that will be used in the main query, or to
perform complex filtering or aggregation operations. The syntax for a subquery in
SQL is:
SELECT column1, column2, ...
FROM table1
WHERE column1 IN (SELECT column1 FROM table2 WHERE condition);

4. Set operations: Set operations are used to combine the results of two or more queries
into a single result set. The common set operations include UNION, INTERSECT, and
EXCEPT. The syntax for the UNION set operation in SQL is:
SELECT column1, column2, ...
FROM table1
UNION
SELECT column1, column2, ...
FROM table2;

5. Modification operations: Modification operations are used to modify the data in a
table. The common modification operations include INSERT, UPDATE, and
DELETE. The syntax for the INSERT operation in SQL is:
INSERT INTO table (column1, column2, ...)
VALUES (value1, value2, ...);
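Three of the operations listed above (aggregation, a subquery, and UNION) can be exercised together on a small example. This is only a sketch using Python's built-in `sqlite3` module; the `sales` and `targets` tables and all function names are hypothetical.

```python
import sqlite3


def build_sales_demo():
    """Create two small hypothetical tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sales (region TEXT, amount INTEGER);
        CREATE TABLE targets (region TEXT);
        INSERT INTO sales VALUES ('East', 100), ('East', 50), ('West', 70);
        INSERT INTO targets VALUES ('East'), ('North');
        """
    )
    return conn


def totals_by_region(conn):
    # Aggregation: one summary row per group of rows sharing a region.
    return conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()


def sales_in_target_regions(conn):
    # Subquery: the inner query supplies the list of regions to keep.
    return conn.execute(
        "SELECT region, amount FROM sales "
        "WHERE region IN (SELECT region FROM targets) ORDER BY amount"
    ).fetchall()


def all_regions(conn):
    # Set operation: UNION combines both results and removes duplicates.
    return conn.execute(
        "SELECT region FROM sales UNION SELECT region FROM targets ORDER BY region"
    ).fetchall()


if __name__ == "__main__":
    conn = build_sales_demo()
    print(totals_by_region(conn))
    print(sales_in_target_regions(conn))
    print(all_regions(conn))
```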

Evaluation of Expressions in DBMS:

In a DBMS, expressions are used to perform calculations or comparisons on data. The
evaluation of expressions is an important part of processing queries and retrieving data
from a database. The steps involved in evaluating expressions in a DBMS are:

1. Parsing: The query parser first reads the query, breaks it down into smaller
units such as keywords, identifiers, operators, and constants, and checks that
these units form a syntactically valid statement. This process is called parsing.
2. Semantic analysis: After parsing, the DBMS performs semantic analysis to
check that the syntactically valid query is also meaningful. It checks for errors such
as references to undefined tables or columns, ambiguous column names, and
operators applied to incompatible data types.
3. Optimization: Once the query is parsed and validated, the query optimizer
identifies the most efficient way to execute the query. This involves selecting the
most appropriate access path, join order, and join method.
4. Execution: After optimization, the query engine executes the query by evaluating
each expression in the query. Expressions can be evaluated using either tuple-at-
a-time or block-at-a-time processing.
• Tuple-at-a-time processing: In tuple-at-a-time processing, each row of data is
processed one at a time. This approach is best suited for small result sets or
queries with complex expressions.
• Block-at-a-time processing: In block-at-a-time processing, a block of rows is
processed at once. This approach is best suited for large result sets or queries
with simple expressions.
5. Result generation: Finally, the query engine generates the result set by combining
the rows of data that meet the query conditions. The result set is returned to the
user or application that issued the query.

The evaluation of expressions in a DBMS is a complex process that involves several
steps, including parsing, semantic analysis, optimization, execution, and result
generation. By optimizing the evaluation of expressions, a DBMS can efficiently process
large amounts of data and retrieve results quickly.
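The two execution styles named in step 4 can be contrasted with a short sketch. It is a simplified model, not how any particular DBMS is implemented: rows are Python dictionaries, and the function names and `block_size` parameter are assumptions made for this example.

```python
def evaluate_tuple_at_a_time(rows, predicate, expression):
    """Yield expression(row) for each qualifying row, one row per step."""
    for row in rows:
        if predicate(row):
            yield expression(row)


def evaluate_block_at_a_time(rows, predicate, expression, block_size):
    """Yield one list of results for each block of block_size input rows.

    rows must be a list so that it can be sliced into blocks.
    """
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    for start in range(0, len(rows), block_size):
        block = rows[start:start + block_size]
        yield [expression(row) for row in block if predicate(row)]


if __name__ == "__main__":
    employees = [{"salary": s} for s in (1000, 2000, 3000, 4000, 5000)]

    def well_paid(row):
        return row["salary"] > 1500

    def tenth(row):
        return row["salary"] // 10

    print(list(evaluate_tuple_at_a_time(employees, well_paid, tenth)))
    print(list(evaluate_block_at_a_time(employees, well_paid, tenth, 2)))
```

Both functions compute the same values; they differ only in how many rows are handled per step, which is where the per-row overhead is saved or spent.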

Transformation of Relational Expressions in DBMS:


In a relational database, queries are expressed as relational expressions using operators
such as SELECT, PROJECT, JOIN, and UNION. These expressions are then transformed or
optimized by the query optimizer to improve the efficiency of the query execution. The
transformation of relational expressions in a DBMS involves the following steps:

1. Algebraic simplification: In this step, the query optimizer simplifies the relational
expression using algebraic identities and properties. For example, the optimizer
can use the distributive property of selection over union to rewrite a selection
applied to the union of two tables as the union of two smaller selections.
2. Predicate push-down: In this step, the optimizer pushes down predicates
(conditions) in a query to the lowest possible level in the expression tree. This
reduces the number of rows that need to be processed and improves query
performance.
3. Join reordering: In this step, the optimizer reorders the join operations in a query
to reduce the number of intermediate results that need to be stored and
processed. This can significantly improve query performance for queries with
multiple joins.
4. Subquery optimization: In this step, the optimizer optimizes subqueries by
selecting the most efficient access path and join method. This can reduce the
overall cost of the query execution.
5. Index selection: In this step, the optimizer selects the most appropriate indexes to
use for the query. This can improve query performance by reducing the number
of disk accesses needed to retrieve the data.
6. View merging: In this step, the optimizer combines views in a query to eliminate
redundant computations and reduce the number of intermediate results that
need to be stored and processed.
7. Query rewrite: In this step, the optimizer rewrites the query using alternative
expressions that have the same meaning but are more efficient to execute.

The transformation of relational expressions in a DBMS is an important aspect of query
optimization. By applying these transformations, the optimizer can improve the
efficiency of query execution and reduce the overall cost of processing queries.
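Predicate push-down (step 2 above) can be illustrated with a small sketch that evaluates the same query two ways and compares the size of the intermediate result. The tables are plain Python lists of dictionaries, and the `orders` and `customers` data and all function names are hypothetical.

```python
def join(left, right, key):
    """Equi-join two lists of dicts on a shared key (simple nested loops)."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def filter_after_join(orders, customers, city):
    """Join first, then apply the city condition to the joined rows."""
    joined = join(orders, customers, "customer_id")
    result = [row for row in joined if row["city"] == city]
    return result, len(joined)  # second value: intermediate result size


def filter_before_join(orders, customers, city):
    """Push the city condition down to customers, then join."""
    selected = [c for c in customers if c["city"] == city]
    joined = join(orders, selected, "customer_id")
    return joined, len(joined)


if __name__ == "__main__":
    customers = [
        {"customer_id": 1, "city": "Pune"},
        {"customer_id": 2, "city": "Delhi"},
    ]
    orders = [
        {"order_id": 100, "customer_id": 1},
        {"order_id": 101, "customer_id": 2},
        {"order_id": 102, "customer_id": 2},
    ]
    print(filter_after_join(orders, customers, "Delhi"))
    print(filter_before_join(orders, customers, "Delhi"))
```

Both plans return the same rows, but the pushed-down version never builds the joined rows that the condition would discard anyway.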

Estimating Statistics of Expression Results in DBMS:


In a DBMS, statistics are used to estimate the size and selectivity of query expressions.
The estimation of statistics is an important part of query optimization, as it helps the
query optimizer to choose the most efficient query plan.

The following are some of the techniques used in a DBMS to estimate statistics of
expression results:

1. Sampling: In this technique, a subset of the data is randomly selected and
analyzed to estimate the statistics of the full dataset. This is a commonly used
technique as it is fast and can provide reasonably accurate estimates.
2. Histograms: In this technique, the range of values in a column is divided into
several buckets, and the number of values that fall into each bucket is recorded.
These counts are then used to estimate the selectivity of the query.
3. Index statistics: In this technique, the optimizer uses statistics from the database
index to estimate the selectivity of the query. This technique is often used for
queries that involve indexed columns.
4. Cost-based estimation: In this technique, the query optimizer uses a cost-based
model to estimate the statistics of the expression results. The cost model
considers factors such as the size of the data, the number of join operations, and
the selectivity of the query to estimate the cost of executing the query.

The accuracy of the statistics estimation depends on the quality and quantity of the data
available. To improve the accuracy of the estimation, the DBMS may use a combination
of techniques and may continuously update the statistics based on the changes in the
database. By accurately estimating the statistics of expression results, the DBMS can
optimize query plans and improve the efficiency of query processing.
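As an illustration of the histogram technique, the sketch below builds an equal-width histogram and uses it to estimate the selectivity of a condition of the form `column < x`. The equal-width layout and the assumption that values are spread evenly inside each bucket are choices made for this example; real systems may use other bucket layouts. All names are hypothetical.

```python
def build_histogram(values, low, high, num_buckets):
    """Count the values falling in each of num_buckets equal-width ranges
    that together cover the interval [low, high)."""
    width = (high - low) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        if low <= v < high:
            # min() guards against floating-point rounding at the top edge.
            index = min(int((v - low) / width), num_buckets - 1)
            counts[index] += 1
    return counts


def estimate_selectivity_less_than(counts, low, high, x):
    """Estimate the fraction of rows with value < x.

    Assumes values are spread evenly inside each bucket (an assumption of
    this sketch), so a partly covered bucket contributes proportionally.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    width = (high - low) / len(counts)
    estimated_rows = 0.0
    for i, count in enumerate(counts):
        bucket_low = low + i * width
        bucket_high = bucket_low + width
        if x >= bucket_high:
            estimated_rows += count  # bucket lies entirely below x
        elif x > bucket_low:
            estimated_rows += count * (x - bucket_low) / width  # partial bucket
    return estimated_rows / total


if __name__ == "__main__":
    ages = list(range(100))  # hypothetical column values 0..99
    hist = build_histogram(ages, 0, 100, 4)
    print(hist)
    print(estimate_selectivity_less_than(hist, 0, 100, 30))
```

The optimizer would multiply such a selectivity by the table's row count to estimate how many rows a condition lets through.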

Choice of Evaluation Plans in DBMS:

The choice of evaluation plans in a DBMS depends on several factors, such as the size of
the database, the complexity of the query, and the available hardware resources. The
query optimizer in a DBMS selects the most efficient evaluation plan based on these
factors. The following are some of the decisions the optimizer makes when choosing an
evaluation plan:

1. Index selection: The query optimizer may choose to use an index for a particular
query if it can significantly reduce the number of rows that need to be processed.
However, the use of an index may not always be the most efficient option,
especially if the index is not selective enough or if the cost of accessing the index
is too high.
2. Join order selection: In a query with multiple join operations, the order in which
the joins are performed can significantly affect the query performance. The query
optimizer may try different join orders and choose the one that results in the
lowest cost.
3. Join algorithm selection: There are several algorithms for performing joins, such
as nested loop join, hash join, and sort-merge join. The choice of join algorithm
depends on the size of the input tables, the available memory, and the available
CPU resources.
4. Parallelism: The query optimizer may choose to execute parts of a query in
parallel if there are multiple CPU cores available. Parallelism can significantly
improve query performance by allowing multiple operations to be executed
simultaneously.
5. Materialization: The query optimizer may choose to materialize intermediate
results if it can improve query performance. Materialization involves storing
intermediate results in a temporary table, which can be used later in the query
execution.
6. Subquery optimization: The query optimizer may choose to optimize subqueries
by selecting the most efficient access path and join method. This can reduce the
overall cost of the query execution.

The choice of evaluation plans in a DBMS is a complex process that involves considering
various factors and selecting the plan that results in the lowest cost. By selecting the
most efficient evaluation plan, the DBMS can improve the performance of query
processing and reduce the overall cost of executing queries.
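Join algorithm selection (item 3 above) can be made concrete with a sketch of two of the named algorithms and a toy rule for choosing between them. The cost formulas in `choose_join` are deliberately simplistic assumptions for illustration, not the cost model of any real optimizer, and all names are hypothetical.

```python
def nested_loop_join(left, right, key):
    """Compare every left row with every right row."""
    comparisons = 0
    result = []
    for l in left:
        for r in right:
            comparisons += 1
            if l[key] == r[key]:
                result.append((l, r))
    return result, comparisons


def hash_join(left, right, key):
    """Build a hash table on the right input, then probe it once per left row."""
    table = {}
    for r in right:
        table.setdefault(r[key], []).append(r)
    probes = 0
    result = []
    for l in left:
        probes += 1
        for r in table.get(l[key], []):
            result.append((l, r))
    return result, probes


def choose_join(left_rows, right_rows, memory_rows):
    """Pick a join method from a toy cost model (assumption of this sketch):
    nested loop costs left*right comparisons, hash join costs left+right
    row visits but needs the right input to fit in memory."""
    nested_cost = left_rows * right_rows
    hash_cost = left_rows + right_rows
    if right_rows <= memory_rows and hash_cost < nested_cost:
        return "hash join"
    return "nested loop join"


if __name__ == "__main__":
    left = [{"k": i % 3, "a": i} for i in range(6)]
    right = [{"k": j, "b": j * 10} for j in range(3)]
    print(nested_loop_join(left, right, "k")[1])
    print(hash_join(left, right, "k")[1])
    print(choose_join(1000, 100, memory_rows=500))
```

Both algorithms return the same pairs of rows; what changes is the amount of work, which is what the optimizer compares when it picks a plan.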

Materialized Views in DBMS:

A materialized view in a DBMS is a precomputed table that stores the results of a query.
Materialized views are used to improve the performance of queries by reducing the
amount of work that needs to be done at query execution time. The following are some
of the benefits of using materialized views in a DBMS:

1. Faster query performance: Materialized views can significantly improve the
performance of queries that access large amounts of data. By precomputing the
results of a query and storing them in a materialized view, the DBMS can avoid
expensive join and aggregation operations at query execution time.
2. Reduced workload on the database: Materialized views can reduce the workload
on the database by precomputing the results of a query and storing them in a
table. This can reduce the amount of work that needs to be done at query
execution time and improve the overall performance of the database.
3. Better scalability: Materialized views can improve the scalability of a database by
reducing the amount of work that needs to be done at query execution time. This
can allow the database to handle larger amounts of data and more complex
queries.
4. Query optimization: Materialized views can be used by the query optimizer to
improve query plans. By using materialized views, the query optimizer can choose
a more efficient plan that involves accessing the precomputed results rather than
performing expensive join and aggregation operations.
5. Data consistency: A materialized view gives every query that reads it the same
precomputed snapshot of the underlying tables. The snapshot is only as current
as its last refresh, so it stays consistent with the underlying data only if the
DBMS refreshes it when that data changes.

However, materialized views also have some disadvantages. The precomputed results
may become stale if the underlying data changes, which can lead to inconsistent query
results. To address this issue, the DBMS may need to refresh the materialized views
periodically or in response to changes in the underlying data. Materialized views can
also consume significant storage space, which can be a concern for large databases.

Overall, materialized views can be a useful tool for improving query performance in a
DBMS, but their use should be carefully considered and balanced against the potential
disadvantages.
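The trade-off between fast reads and stale results can be seen in a small sketch. SQLite has no built-in materialized views, so this example emulates one with an ordinary summary table that is recomputed on demand; the `sales` and `sales_summary` tables and the function names are hypothetical, and a full recomputation is only one possible refresh strategy.

```python
import sqlite3


def create_schema(conn):
    """Create a base table and a summary table that stores a query result."""
    conn.executescript(
        """
        CREATE TABLE sales (region TEXT, amount INTEGER);
        CREATE TABLE sales_summary (region TEXT PRIMARY KEY, total INTEGER);
        """
    )


def refresh_summary(conn):
    """Recompute the stored result from the base table (a full refresh)."""
    with conn:  # commit both statements together, or roll back on error
        conn.execute("DELETE FROM sales_summary")
        conn.execute(
            "INSERT INTO sales_summary "
            "SELECT region, SUM(amount) FROM sales GROUP BY region"
        )


def read_summary(conn):
    """Read the precomputed totals without touching the base table."""
    return conn.execute(
        "SELECT region, total FROM sales_summary ORDER BY region"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.execute("INSERT INTO sales VALUES ('East', 100), ('West', 70)")
    refresh_summary(conn)
    conn.execute("INSERT INTO sales VALUES ('East', 50)")
    print(read_summary(conn))   # still shows the totals from before the insert
    refresh_summary(conn)
    print(read_summary(conn))   # totals now include the new row
```

Reading the summary is cheap because the aggregation was done ahead of time, but the stored totals lag behind the base table until the next refresh.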
