Database Performance and Query Optimization
IBM i 7.1
Note
Before using this information and the product it supports, read the information in “Notices,” on
page 393.
This edition applies to IBM i 7.1 (product number 5770-SS1) and to all subsequent releases and modifications until
otherwise indicated in new editions. This version does not run on all reduced instruction set computer (RISC)
models nor does it run on CISC models.
Copyright International Business Machines Corporation 1998, 2010.
US Government Users Restricted Rights: Use, duplication or disclosure restricted by GSA ADP Schedule Contract
with IBM Corp.
Contents
Database performance and query
optimization . . . . . . . . . . . . . 1
What's new for IBM i 7.1 . . . . . . . . . . 1
PDF file for Database performance and query
optimization . . . . . . . . . . . . . . 2
Query engine overview. . . . . . . . . . . 2
SQE and CQE engines . . . . . . . . . . 3
Query dispatcher . . . . . . . . . . . . 4
Statistics manager . . . . . . . . . . . 4
Global Statistics Cache . . . . . . . . . . 5
Plan cache . . . . . . . . . . . . . . 5
Data access methods. . . . . . . . . . . . 7
Permanent objects and access methods . . . . 8
Temporary objects and access methods . . . . 22
Objects processed in parallel. . . . . . . . 50
Spreading data automatically . . . . . . . 51
Processing queries: Overview . . . . . . . . 51
How the query optimizer makes your queries
more efficient. . . . . . . . . . . . . 52
General query optimization tips . . . . . . 52
Access plan validation. . . . . . . . . . 52
Single table optimization . . . . . . . . . 53
Join optimization . . . . . . . . . . . 54
Distinct optimization . . . . . . . . . . 65
Grouping optimization . . . . . . . . . 66
Ordering optimization. . . . . . . . . . 73
View implementation . . . . . . . . . . 74
Materialized query table optimization . . . . 76
Recursive query optimization . . . . . . . 85
Adaptive Query Processing . . . . . . . . 95
Optimizing query performance using query
optimization tools . . . . . . . . . . . . 99
DB2 for IBM i Health Center . . . . . . . 99
Monitoring your queries using the Database
Monitor . . . . . . . . . . . . . . 121
Using System i Navigator with detailed
monitors . . . . . . . . . . . . . . 132
Index advisor . . . . . . . . . . . . 137
Viewing your queries with Visual Explain . . . 141
Optimizing performance using the Plan Cache 146
Verifying the performance of SQL applications 155
Examining query optimizer debug messages in
the job log . . . . . . . . . . . . . 155
Print SQL Information . . . . . . . . . 156
Query optimization tools: Comparison . . . . 157
Changing the attributes of your queries . . . 158
Collecting statistics with the statistics manager 186
Displaying materialized query table columns 192
Managing check pending constraints columns 193
Creating an index strategy . . . . . . . . . 194
Binary radix indexes . . . . . . . . . . 194
Encoded vector indexes . . . . . . . . . 201
Comparing binary radix indexes and encoded
vector indexes . . . . . . . . . . . . 207
Indexes & the optimizer . . . . . . . . . 208
Indexing strategy . . . . . . . . . . . 217
Coding for effective indexes . . . . . . . 218
Using indexes with sort sequence . . . . . 221
Index examples. . . . . . . . . . . . 222
Application design tips for database performance 230
Using live data . . . . . . . . . . . . 230
Reducing the number of open operations . . . 232
Retaining cursor positions . . . . . . . . 234
Programming techniques for database performance 237
Use the OPTIMIZE clause . . . . . . . . 237
Use FETCH FOR n ROWS . . . . . . . . 238
Use INSERT n ROWS . . . . . . . . . 239
Control database manager blocking . . . . . 240
Optimize the number of columns that are
selected with SELECT statements. . . . . . 241
Eliminate redundant validation with SQL
PREPARE statements . . . . . . . . . . 241
Page interactively displayed data with
REFRESH(*FORWARD) . . . . . . . . . 242
Improve concurrency by avoiding lock waits 242
General DB2 for i performance considerations . . 243
Effects on database performance when using
long object names . . . . . . . . . . . 243
Effects of precompile options on database
performance. . . . . . . . . . . . . 243
Effects of the ALWCPYDTA parameter on
database performance . . . . . . . . . 244
Tips for using VARCHAR and VARGRAPHIC
data types in databases . . . . . . . . . 245
Using field procedures to provide column level
encryption . . . . . . . . . . . . . 247
SYSTOOLS . . . . . . . . . . . . . . 250
Using SYSTOOLS . . . . . . . . . . . 250
Database monitor formats . . . . . . . . . 252
Database monitor SQL table format . . . . . 252
Optional database monitor SQL view format 259
Query optimizer messages reference. . . . . . 350
Query optimization performance information
messages . . . . . . . . . . . . . . 350
Query optimization performance information
messages and open data paths . . . . . . 374
PRTSQLINF message reference . . . . . . 379
Appendix. Notices . . . . . . . . . 393
Programming interface information . . . . . . 395
Trademarks . . . . . . . . . . . . . . 395
Terms and conditions. . . . . . . . . . . 395
Copyright IBM Corp. 1998, 2010 iii
iv IBM i: Database Performance and Query Optimization
Database performance and query optimization
The goal of database performance tuning is to minimize the response time of your queries by making the
best use of your system resources. The best use of these resources involves minimizing network traffic,
disk I/O, and CPU time. This goal can only be achieved by understanding the logical and physical
structure of your data, the applications used on your system, and how the conflicting uses of your
database might affect performance.
The best way to avoid performance problems is to ensure that performance issues are part of your
ongoing development activities. Many of the most significant performance improvements are realized
through careful design at the beginning of the database development cycle. To most effectively optimize
performance, you must identify the areas that yield the largest performance increases over the widest
variety of situations. Focus your analysis on these areas.
Many of the examples within this publication illustrate a query written through either an SQL or an
OPNQRYF query interface. The interface chosen for a particular example does not indicate an operation
exclusive to that query interface, unless explicitly noted. It is only an illustration of one possible query
interface. Most examples can be easily rewritten into whatever query interface that you prefer.
Note: By using the code examples, you agree to the terms of the “Code license and disclaimer
information” on page 390.
What's new for IBM i 7.1
The following information was added or updated in this release of the information:
v “Global Statistics Cache” on page 5
v “Encoded vector index index-only access” on page 16
v “Encoded vector index symbol table scan” on page 17
v “Encoded vector index symbol table probe” on page 20
v “Encoded vector index INCLUDE aggregates” on page 17
v “Array unnest temporary table” on page 48
v “Adaptive Query Processing” on page 95
v “Health Center SQL procedures” on page 100
v “Sparse indexes” on page 195
v “View index build status” on page 213
v “Improve concurrency by avoiding lock waits” on page 242
v “Using field procedures to provide column level encryption” on page 247
v “SYSTOOLS” on page 250
How to see what's new or changed
To help you see where technical changes have been made, this information uses:
v A begin-change image marks where new or changed information begins.
v An end-change image marks where new or changed information ends.
To find other information about what's new or changed this release, see the Memo to users.
PDF file for Database performance and query optimization
View and print a PDF of this information.
To view or download the PDF version of this document, select Database performance and query
optimization (about 5537 KB).
Other information
You can also view or print any of the following PDF files:
v Preparing for and Tuning the SQL Query Engine on DB2 for i5/OS.
Saving PDF files
To save a PDF on your workstation for viewing or printing:
1. Right-click the PDF in your browser (right-click the preceding link).
2. Click the option that saves the PDF locally.
3. Navigate to the directory in which you want to save the PDF.
4. Click Save.
Downloading Adobe Reader
You need Adobe Reader installed on your system to view or print these PDF files. You can download a
free copy from Adobe (http://get.adobe.com/reader/).
Query engine overview
IBM DB2 for i provides two query engines to process queries: Classic Query Engine (CQE) and SQL
Query Engine (SQE).
The CQE processes queries originating from non-SQL interfaces: OPNQRYF, Query/400, and QQQQry
API. SQL-based interfaces, such as ODBC, JDBC, CLI, Query Manager, Net.Data, RUNSQLSTM, and
embedded or interactive SQL, run through the SQE. For ease of use, the routing decision for processing
the query by either CQE or SQE is pervasive and under the control of the system. The requesting user or
application program cannot control or influence this behavior. However, a better understanding of the
engines and process that determines which path a query takes can give you a better understanding of
query performance.
Within SQE, several more components were created and other existing components were updated.
Additionally, new data access methods are possible with SQE that are not supported under CQE.
Related information
Embedded SQL programming
SQL programming
Query (QQQQRY) API
Open Query File (OPNQRYF) command
Run SQL Statements (RUNSQLSTM) command
SQE and CQE engines
It is important to understand the implementation differences of query management and processing in
CQE versus SQE.
The following figure shows an overview of the IBM DB2 for i architecture. It shows the delineation
between CQE and SQE, how query processing is directed by the query dispatcher, and where each SQE
component fits. The functional separation of each SQE component is clearly evident. This division of
responsibility enables IBM to more easily deliver functional enhancements to the individual components
of SQE, as and when required. Notice that most of the SQE Optimizer components are implemented
below the MI. This implementation translates into enhanced performance efficiency.
Performance and query optimization 3
As seen in the previous graphic, the query runs from any query interface to the optimizer and the query
dispatcher. The query dispatcher determines whether the query is implemented with CQE or SQE.
Query dispatcher
The function of the dispatcher is to route the query request to either CQE or SQE, depending on the
attributes of the query. All queries are processed by the dispatcher. It cannot be bypassed.
Currently, the dispatcher routes an SQL statement to CQE if it finds that the statement references or
contains any of the following:
v INSERT WITH VALUES statement or the target of an INSERT with subselect statement
v tables with Read triggers
v Read-only queries with more than 1000 dataspaces, or updatable queries with more than 256
dataspaces.
v DB2 Multisystem tables
v multi-format logical files
v non-SQL queries, for example the QQQQry API, Query/400, or OPNQRYF
As new functionality is added in the future, the dispatcher will route more queries to SQE and
fewer to CQE.
Related reference
“MQT supported function” on page 76
Although an MQT can contain almost any query, the optimizer only supports a limited set of query
functions when matching MQTs to user specified queries. The user-specified query and the MQT query
must both be supported by the SQE optimizer.
Statistics manager
In CQE, the retrieval of statistics is a function of the Optimizer. When the Optimizer needs to know
information about a table, it looks at the table description to retrieve the row count and table size. If an
index is available, the Optimizer might extract information about the data in the table. In SQE, the
collection and management of statistics is handled by a separate component called the statistics manager.
The statistics manager leverages all the same statistical sources as CQE, but adds more sources and
capabilities.
The statistics manager does not actually run or optimize the query. Instead, it controls the access to the
metadata and other information that is required to optimize the query. It uses this information to answer
questions posed by the query optimizer. The statistics manager always provides answers to the optimizer.
In cases where it cannot provide an answer based on actual existing statistics information, it is designed
to provide a predefined answer.
The Statistics manager typically gathers and tracks the following information:
Cardinality of values
The number of unique or distinct occurrences of a specific value in a single column or multiple
columns of a table.
Selectivity
Also known as a histogram, this information is an indication of how many rows are selected by
any given selection predicate or combination of predicates. Using sampling techniques, it
describes the selectivity and distribution of values in a given column of the table.
Frequent values
The top nn most frequent values of a column together with a count of how frequently each value
occurs. This information is obtained by using statistical sampling techniques. Built-in algorithms
eliminate the possibility of data skewing. For example, NULL values and default values that can
influence the statistical values are not taken into account.
Metadata information
Includes the total number of rows in the table, indexes that exist over the table, and which
indexes are useful for implementing the particular query.
Estimate of I/O operations
An estimate of the number of I/O operations that are required to process the table or the identified
index.
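As an illustration, the cardinality statistic described above corresponds to the number of distinct values in a column, which is exactly what a COUNT(DISTINCT ...) query computes. The table and column names here are hypothetical:

```sql
-- Hypothetical table SALES with column CUSTNO.
-- The cardinality statistic for CUSTNO corresponds to the number of
-- distinct values, which this query computes directly:
SELECT COUNT(DISTINCT CUSTNO) AS CUSTNO_CARDINALITY
  FROM SALES;
```

The statistics manager derives this kind of answer from indexes or stored column statistics rather than by running such a query at optimization time.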
The Statistics manager uses a hybrid approach to manage database statistics. Most of this information can
be obtained from existing indexes. In cases where the required statistics cannot be gathered from existing
indexes, statistical information is constructed on single columns of a table and stored internally. By
default, this information is collected automatically by the system, but you can manually control the
collection of statistics. Unlike indexes, however, statistics are not maintained immediately as data in the
tables change.
Related reference
“Collecting statistics with the statistics manager” on page 186
The collection of statistics is handled by a separate component called the statistics manager. Statistical
information can be used by the query optimizer to determine the best access plan for a query. Since the
query optimizer bases its choice of access plan on the statistical information found in the table, it is
important that this information is current.
Global Statistics Cache
In SQE, the DB2 Statistics Manager stores actual row counts into a Global Statistics Cache. In this manner,
the Statistics Manager refines its estimates over time as it learns where estimates have deviated from
actual row counts.
Both completed queries and currently executing queries might be inspected by the Adaptive Query
Processing (AQP) task (see “Adaptive Query Processing” on page 95), which compares estimated row
counts to actual row counts. If there are any significant discrepancies, the AQP task notifies the DB2
Statistics Manager (SM). The SM stores this actual row count (also called observed row count) into a
Global Statistics Cache (GSC).
If the query which generated the observed statistic in the GSC is reoptimized, the actual row count
estimate is used in determining a new query plan. Further, if a different query asks for the same or a
similar row count, the SM could return the stored actual row count from the GSC. Faster query plans can
be generated by the query optimizer.
Typically, observed statistics are for complex predicates such as with a join. A simple example is a query
joining three files A, B, and C. There is a discrepancy between the estimate and actual row count of the
join of A and B. The SM stores an observed statistic into the GSC. Later, if a different join query of A, B,
and Z is submitted, the SM recalls the observed statistic of the A and B join. The SM considers that
observed statistic in its estimate of the A, B, and Z join.
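The scenario above can be sketched with two queries; the files A, B, C, and Z come from the text, while the join column K is an illustrative assumption:

```sql
-- Hypothetical files joined on an illustrative key K.
-- If the optimizer's estimate for the A-B join deviates from the actual
-- row count at run time, AQP stores the observed count in the GSC:
SELECT *
  FROM A JOIN B ON A.K = B.K
         JOIN C ON B.K = C.K;

-- A later query that repeats the A-B join can have its estimate
-- refined from that observed statistic:
SELECT *
  FROM A JOIN B ON A.K = B.K
         JOIN Z ON B.K = Z.K;
```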
The Global Statistics Cache is an internal DB2 object, and the contents of it are not directly observable.
Plan cache
The plan cache is a repository that contains the access plans for queries that were optimized by SQE.
Access plans generated by CQE are not stored in the plan cache; instead, they are stored in SQL
packages, the system-wide statement cache, and job cache. The purposes of the plan cache are to:
v Facilitate the reuse of a query access plan when the same query is re-executed
v Store runtime information for subsequent use in future query optimizations
v Provide performance information for analysis and tuning
Once an access plan is created, it is available for use by all users and all queries, regardless of where the
query originates. Furthermore, when an access plan is tuned, for example, when creating an index, all
queries can benefit from this updated access plan. This updated access plan eliminates the need to
reoptimize the query, resulting in greater efficiency.
The following graphic shows the concept of reusability of the query access plans stored in the plan cache:
As shown in the previous graphic, statements from packages and programs are stored in unique plans in
the plan cache. If Statement 3 exists in both SQL package 1 and SQL package 2, the plan is stored once in
the plan cache. The plan cache is interrogated each time a query is executed. If an access plan exists that
satisfies the requirements of the query, it is used to implement the query. Otherwise a new access plan is
created and stored in the plan cache for future use.
The plan cache is automatically updated with new query access plans as they are created. When new
statistics or indexes become available, an existing plan is updated the next time the query is run. The
plan cache is also automatically updated by the database with runtime information as the queries are run.
It is created with an overall size of 512 MB.
Each plan cache entry contains the original query, the optimized query access plan, and cumulative
runtime information gathered during the runs of the query. In addition, several instances of query
runtime objects are stored with a plan cache entry. These runtime objects are the real executable objects
and temporary storage containers (hash tables, sorts, temporary indexes, and so on) used to run the
query.
When the plan cache exceeds its designated size, a background task is automatically scheduled to remove
plans from the plan cache. Access plans are deleted based upon their age, how frequently they are used,
and how many cumulative resources (CPU/IO) they consumed.
The total number of access plans stored in the plan cache depends largely upon the complexity of the
SQL statements that are being executed. In certain test environments, there have typically been between
10,000 to 20,000 unique access plans stored in the plan cache. The plan cache is cleared when a system
Initial Program Load (IPL) is performed.
Multiple access plans for a single SQL statement can be maintained in the plan cache. Although the SQL
statement is the primary key into the plan cache, different environmental settings can cause additional
access plans to be stored. Examples of these environmental settings include:
v Different SMP Degree settings for the same query
v Different library lists specified for the query tables
v Different settings for the share of available memory for the job in the current pool
v Different ALWCPYDTA settings
v Different selectivity based on changing host variable values used in selection (WHERE clause)
Currently, the plan cache can maintain a maximum of three different access plans for the same SQL
statement. As new access plans are created for the same SQL statement, older access plans are discarded
to make room for the new access plans. There are, however, certain conditions that can cause an existing
access plan to be invalidated. Examples of these conditions include:
v Specifying REOPTIMIZE_ACCESS_PLAN(*YES) or (*FORCE) in the QAQQINI table or in Run SQL
Scripts
v Deleting or recreating the table that the access plan refers to
v Deleting an index that is used by the access plan
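The first invalidation condition above is driven by the QAQQINI query options file. A minimal sketch of setting it for a job follows; the library name MYLIB is illustrative, and it assumes a copy of the QAQQINI file already exists in that library (QQPARM and QQVAL are the parameter and value columns of the file):

```sql
-- Illustrative sketch: force access plans to be reoptimized for jobs
-- that use MYLIB/QAQQINI as their query options file.
UPDATE MYLIB.QAQQINI
   SET QQVAL = '*YES'
 WHERE QQPARM = 'REOPTIMIZE_ACCESS_PLAN';
```

The options file is then put into effect for a job with the Change Query Attributes CL command, for example CHGQRYA QRYOPTLIB(MYLIB).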
Related reference
“Effects of the ALWCPYDTA parameter on database performance” on page 244
Some complex queries can perform better by using a sort or hashing method to evaluate the query
instead of using or creating an index.
“Changing the attributes of your queries” on page 158
You can modify different types of query attributes for a job with the Change Query Attributes
(CHGQRYA) CL command. You can also use the System i Navigator Change Query Attributes interface.
This option is used when you connect to an IBM i as the application server (AS) where the application
requester (AR) machine is an ASCII-based platform.
*DEFAULT
The default value is set to *NO.
*YES
Translate ASCII SQL statement text to the CCSID of the
IBM i job.
*NO
Translate ASCII SQL statement text to the EBCDIC
CCSID associated with the ASCII CCSID.
SQL_XML_DATA_CCSID
Specifies the CCSID to be used for XML columns, host
variables, parameter markers, and expressions, if not
explicitly specified.
See “SQL_XML_DATA_CCSID QAQQINI option” on
page 177.
*DEFAULT The default value is set to 1208.
*JOB
The job CCSID is used for XML columns, host
variables, parameter markers, and expressions, if not
explicitly specified. If the job CCSID is 65535, the
default CCSID of 1208 is used.
Integer Value
The CCSID used for XML columns, host variables,
parameter markers, and expressions, if not explicitly
specified. This value must be a valid single-byte or
mixed EBCDIC CCSID or Unicode CCSID. The value
cannot be 65535.
Table 46. Query Options Specified on QAQQINI Command (continued)
Parameter Value Description
STAR_JOIN
Note: Only modifies the environment for the Classic
Query Engine.
Specifies enhanced optimization for hash queries
where both a hash join table and a Distinct List of
values is constructed from the data. This Distinct List
of values is appended to the selection against the
primary table of the hash join.
Any EVI indexes built over these foreign key columns
can be used to perform bitmap selection against the
table before matching the join values.
The use of this option does not guarantee that star join
is chosen by the optimizer. It only allows the use of
this technique if the optimizer has decided to
implement the query by using a hash join.
*DEFAULT
The default value is set to *NO
*NO
The EVI Star Join optimization support is not enabled.
*COST
Allow query optimization to cost the usage of EVI Star
Join support.
The optimizer determines whether the Distinct List
selection is used based on how much benefit can be
derived from using that selection.
STORAGE_LIMIT
Specifies a temporary storage limit for database
queries. If a query is expected to use more than the
specified amount of storage, the query is not allowed
to run. The value specified is in megabytes.
*DEFAULT The default value is set to *NOMAX.
*NOMAX
Never stop a query from running because of storage
concerns.
Integer Value
The maximum amount of temporary storage in
megabytes that can be used by a query. This value is
checked against the estimated amount of temporary
storage required to run the query as calculated by the
query optimizer. If the estimated amount of temporary
storage is greater than this value, the query is not
started. Valid values range from 0 through 2147352578.
SYSTEM_SQL_STATEMENT_ CACHE
Specifies whether to disable the system-wide SQL
Statement Cache for SQL queries.
*DEFAULT The default value is set to *YES.
*YES
Examine the system-wide SQL Statement Cache when
an SQL prepare request is processed. If a matching
statement exists in the cache, use the results of that
prepare. This option allows the application to
potentially have better performing prepares.
*NO
Specifies that the system-wide SQL Statement Cache is
not examined when processing an SQL prepare request.
TEXT_SEARCH_DEFAULT_TIMEZONE
Specifies the time zone to apply to any date or
dateTime value specified in an XML text search using
the CONTAINS or SCORE function. The time zone is
the offset from UTC (Greenwich mean time). It is only
applicable when a specific time zone is not given for
the value.
*DEFAULT
Use the default as defined by database. This option is
equivalent to UTC.
sHH:MM
A time zone formatted value where:
v s is the sign, + or -
v HH is the hour
v MM is the minute
The valid range for HH is 00 - 23. The valid range for
MM is 00 - 59. The format is specific. All values are
required, including sign. If HH or MM is less than 10,
it must have a leading zero specified.
UDF_TIME_OUT
Note: Only modifies the environment for the Classic
Query Engine.
Specifies the amount of time, in seconds, that the
database waits for a User Defined Function (UDF) to
finish processing.
*DEFAULT
The amount of time to wait is determined by the
database. The default is 30 seconds.
*MAX
The maximum amount of time that the database waits
for the UDF to finish.
integer value
Specify the number of seconds that the database waits
for a UDF to finish. If the value given exceeds the
database maximum wait time, the maximum wait time
is used by the database. Minimum value is 1 and
maximum value is system defined.
Table 46. Query Options Specified on QAQQINI Command (continued)
Parameter Value Description
VARIABLE_LENGTH_ OPTIMIZATION
Specifies whether aggressive optimization techniques
are used on variable length columns.
*DEFAULT The default value is set to *YES.
*YES
Enables aggressive optimization of variable-length
columns, including index-only access. It also allows
constant value substitution when an equal predicate is
present against the columns. As a consequence, the
length of the data returned for the variable-length
column might not include any trailing blanks that
existed in the original data. As a result, the application
can receive the substituted value back instead of the
original data. Function calls could operate on the
substituted value instead of the original string value.
*NO
Do not allow aggressive optimization of variable length
columns.
SQL_XML_DATA_CCSID QAQQINI option:
The SQL_XML_DATA_CCSID QAQQINI option has several settings that affect SQL processing.
The SQL_XML_DATA_CCSID QAQQINI setting is applied within SQL in the following SQL processing:
Table 47. SQL_XML_DATA_CCSID setting application within SQL
SQL Processing item Description
Valid values for the QAQQINI option are
CCSIDs allowed on an XML column.
Valid values are all EBCDIC SBCS and mixed CCSIDs, and
Unicode 1208, 1200, and 13488 CCSIDs.
Does not affect the promotion of SQL data
types.
Other SQL data types cannot be directly promoted to the SQL
XML data type.
XMLPARSE untyped parameter markers. The QAQQINI setting applies to untyped parameter markers
passed as string-expression. The type is CLOB(2G) for SBCS,
mixed, and UTF-8 values. The type is DBCLOB(1G) for Unicode
1200 and 13488.
XMLCOMMENT, XMLTEXT, XMLPI untyped
parameter markers.
The QAQQINI setting applies to untyped parameter markers
passed as string-expression. The type is VARCHAR(32740) for
SBCS, mixed, and UTF-8 values. The type is VARGRAPHIC(16370)
for Unicode 1200 and 13488.
Applies to parameter marker casts to the XML
type for XMLCONCAT, and XMLDOCUMENT.
Applies to an untyped parameter marker passed as an
XML-expression. Unless an explicit CCSID clause is specified, the
CCSID of the parameter marker is obtained from the QAQQINI
setting.
The QAQQINI setting does not affect storage
and retrieval assignment rules.
The CCSID of the host variables and table columns apply.
String to column assignment on SQL INSERT
and UPDATE.
An implicit or explicit XMLPARSE is required on the column
assignment.
String to host variable assignment. An implicit or explicit XMLSERIALIZE is required on the host
variable assignment.
Column to column assignment. When the target column is XML, an implicit XMLPARSE is applied
if the source column is not XML. The target XML column has a
defined XML CCSID. When the source column is XML, an explicit
XMLSERIALIZE is required if the target column is not XML.
Host variable to column assignment. The target column has a defined CCSID.
UNION ALL (if XML publishing functions in
query).
The XML result CCSID is obtained from the QAQQINI setting.
Table 47. SQL_XML_DATA_CCSID setting application within SQL (continued)
SQL Processing item Description
Does not apply to SQL constants. UX constants are defined as UTF-16. FX constants are defined as
UTF-8.
Result type of XML data built-in functions. If the first operand of XMLPARSE and XMLVALIDATE is an
untyped parameter marker, the CCSID is set from the QAQQINI
setting, which then affects the XML result CCSID. The QAQQINI
setting is used for XMLSERIALIZE for CHAR, VARCHAR, and
LOB AS data-type. UTF-16 is used for GRAPHIC, DBCLOB, and
NCHAR.
Result type of XML publishing functions -
XMLAGG, XMLGROUP, XMLATTRIBUTES,
XMLCOMMENT, XMLCONCAT,
XMLDOCUMENT, XMLELEMENT,
XMLFOREST, XMLNAMESPACES, XMLPI,
XMLROW, and XMLTEXT.
The XML result CCSID for XML publishing functions is obtained
from the QAQQINI setting.
Result type of XML publishing functions in a
view.
The XML result CCSID is set when the view is created.
XML data type on external procedure XML AS
parameters.
The XML parameter CCSID is set when the procedure is created.
XML data type on external user-defined
functions.
The XML parameter and result CCSID are set when the function is
created.
CREATE TABLE XML column. The QAQQINI setting is used for dynamic SQL. The QAQQINI
setting is set in *PGM, *SRVPGM, and *SQLPKG objects when
created.
MQTs containing select-statement with XML
publishing functions.
The CCSID is set when the MQT is created. The CCSID is
maintained for an ALTER TABLE.
ALTER TABLE ADD MATERIALIZED QUERY
definition.
The QAQQINI setting is used if the select-statement contains XML
publishing functions.
XML AS CLOB CCSID The QAQQINI setting is built into *PGM and *SRVPGM objects
when the program is created. The CCSID defaults to UTF-8 for
CLOB when QAQQINI setting is UTF-16 or UCS2.
XML AS DBCLOB CCSID The default for DBCLOB is always UTF-16 for XML.
SQL GET and SET DESCRIPTOR XML data
type.
QAQQINI setting applied to XML data type.
SQL Global variables. QAQQINI setting applied to global variables with the XML data
type.
Related information
XML values
SQL statements and SQL/XML functions
Setting resource limits with the Predictive Query Governor
The DB2 for i Predictive Query Governor can stop the initiation of a query if the estimated run time
(elapsed execution time) or estimated temporary storage for the query is excessive. The governor acts
before a query is run instead of while a query is run. The governor can be used in any interactive or batch
job on the system. It can be used with all DB2 for i query interfaces and is not limited to use with SQL
queries.
The ability of the governor to predict and stop queries before they are started is important because:
v Running a long-running query and ending it abnormally before obtaining any results wastes system
resources.
178 IBM i: Database Performance and Query Optimization
v Some CQE operations within a query cannot be interrupted by the End Request (ENDRQS) CL
command. The creation of a temporary index or a query using a column function without a GROUP
BY clause are two examples of these types of queries. It is important to not start these operations if
they take longer than the user wants to wait.
The governor in DB2 for i is based on two measurements:
v The estimated runtime for a query.
v The estimated temporary storage consumption for a query.
If the estimated runtime or temporary storage usage of a query exceeds the user-defined limits, the
initiation of the query can be stopped.
To define a time limit (in seconds) for the governor to use, do one of the following:
v Use the Query Time Limit (QRYTIMLMT) parameter on the Change Query Attributes (CHGQRYA) CL
command. The command parameter is the first place where the query optimizer attempts to find the
time limit.
v Set the Query Time Limit option in the query options file. The query options file is the second place
where the query optimizer attempts to find the time limit.
v Set the QQRYTIMLMT system value. Allow each job to use the value *SYSVAL on the Change Query
Attributes (CHGQRYA) CL command, and set the query options file to *DEFAULT. The system value is
the third place where the query optimizer attempts to find the time limit.
To define a temporary storage limit (in megabytes) for the governor to use, do the following:
v Use the Query Storage Limit (QRYSTGLMT) parameter on the Change Query Attributes (CHGQRYA)
CL command. The command language used is the first place where the query optimizer attempts to
find the limit.
v Set the Query Storage Limit option STORAGE_LIMIT in the query options file. The query options file
is the second place where the query optimizer attempts to find the storage limit.
The time and temporary storage values generated by the optimizer are only estimates. The actual query
runtime might be more or less than the estimate. In certain cases when the optimizer does not have full
information about the data being queried, the estimate could vary considerably from the actual resource
used. In those cases, you might need to artificially adjust your limits to correspond to an inaccurate
estimate.
When setting the time limit for the entire system, set it to the maximum allowable time that any query
should be allowed to run. Setting the limit too low risks preventing some queries from completing, and
thus preventing the application from finishing successfully. Many functions use the query component to
perform query requests internally. These requests are also compared to the user-defined time limit.
You can check the inquiry message CPA4259 for the predicted runtime and storage. If the query is
canceled, debug messages are still written to the job log.
You can also register a Query Governor Exit Program that is called when the estimated runtime or
temporary storage exceeds the specified limits.
Related information
Query Governor Exit Program
End Request (ENDRQS) command
Change Query Attributes (CHGQRYA) command
Using the Query Governor:
The resource governor works with the query optimizer.
When a user issues a request to the system to run a query, the following occurs:
1. The query access plan is created by the optimizer.
As part of the evaluation, the optimizer predicts or estimates the runtime for the query. This estimate
helps determine the best way to access and retrieve the data for the query. In addition, as part of the
estimating process, the optimizer also computes the estimated temporary storage usage for the query.
2. The estimated runtime and estimated temporary storage are compared against the user-defined query
limit currently in effect for the job or user session.
3. If the estimates for the query are less than or equal to the specified limits, the query governor lets the
query run without interruption. No message is sent to the user.
4. If a query limit is exceeded, inquiry message CPA4259 is sent to the user. The message states the
estimates as well as the specified limits. Only one limit needs to be exceeded for the message to be
sent, so the message might show that only one limit was exceeded. Also, if no limit was explicitly
specified by the user, a large integer value is shown for that limit.
Note: A default reply can be established for this message so that the user does not have the option to
reply. The query request is always ended.
5. If a default message reply is not used, the user chooses to do one of the following:
v End the query request before it is run.
v Continue and run the query even though the estimated value exceeds the associated governor limit.
Setting the resource limits for jobs other than the current job
You can set either or both resource limits for a job other than the current job. You set these limits by
using the JOB parameter on the Change Query Attributes (CHGQRYA) command. Specify either a query
options file library to search (QRYOPTLIB), a specific QRYTIMLMT or QRYSTGLMT value, or both, for
that job.
Using the resource limits to balance system resources
After the source job runs the Change Query Attributes (CHGQRYA) command, the effects of the governor
on the target job do not depend on the source job. The query resource limits remain in effect for the
duration of the job or user session, or until a resource limit is changed by a Change Query Attributes
(CHGQRYA) command.
Under program control, a user might be given different limits depending on the application function
performed, time of day, or system resources available. These limits provide a significant amount of
flexibility when trying to balance system resources with temporary query requirements.
Cancel a query with the Query Governor:
When a query is expected to take more resources than the set limit, the governor issues inquiry message
CPA4259.
You can respond to the message in one of the following ways:
v Enter a C to cancel the query. Escape message CPF427F is issued to the SQL runtime code. SQL returns
SQLCODE -666.
v Enter an I to ignore the exceeded limit and let the query run to completion.
Control the default reply to the query governor inquiry message:
The system administrator can control whether the interactive user has the option of ignoring the database
query inquiry message by using the Change Job (CHGJOB) CL command.
Changes made include the following:
v If a value of *DFT is specified for the INQMSGRPY parameter of the Change Job (CHGJOB) CL
command, the interactive user does not see the inquiry messages. The query is canceled immediately.
v If a value of *RQD is specified for the INQMSGRPY parameter of the Change Job (CHGJOB) CL
command, the interactive user sees the inquiry. The user must reply to the inquiry.
v If a value of *SYSRPYL is specified for the INQMSGRPY parameter of the Change Job (CHGJOB) CL
command, a system reply list is used to determine whether the interactive user sees the inquiry and
whether a reply is necessary. The system reply list entries can be used to customize different default
replies based on user profile name, user ID, or process names. The fully qualified job name is available
in the message data for inquiry message CPA4259. This algorithm allows the keyword CMPDTA to be
used to select the system reply list entry that applies to the process or user profile. The user profile
name is 10 characters long and starts at position 51. The process name is 10 characters long and starts at
position 27.
v The following example adds a reply list element that causes the default reply of C to cancel requests for
jobs whose user profile is QPGMR.
ADDRPYLE SEQNBR(56) MSGID(CPA4259) CMPDTA(QPGMR 51) RPY(C)
The following example adds a reply list element that causes the default reply of C to cancel requests for
jobs whose process name is QPADEV0011.
ADDRPYLE SEQNBR(57) MSGID(CPA4259) CMPDTA(QPADEV0011 27) RPY(C)
Related information
Change Job (CHGJOB) command
Testing performance with the query governor:
You can use the query governor to test the performance of your queries.
To test the performance of a query with the query governor, do the following:
1. Set the query time limit to zero ( QRYTIMLMT(0) ) using the Change Query Attributes (CHGQRYA)
command or in the INI file. This forces an inquiry message from the governor stating that the
estimated time to run the query exceeds the query time limit.
2. Prompt for message help on the inquiry message and find the same information that you can find by
running the Print SQL Information (PRTSQLINF) command.
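For example, step 1 can be performed for the current job with the following Change Query Attributes
(CHGQRYA) command:
CHGQRYA JOB(*) QRYTIMLMT(0)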
The query governor lets you optimize performance without having to run through several iterations of
the query.
Additionally, if the query is canceled, the query optimizer evaluates the access plan and sends the
optimizer debug messages to the job log. This process occurs even if the job is not in debug mode. You
can then review the optimizer tuning messages in the job log to see if additional tuning is needed to
obtain optimal query performance.
This method allows you to try several permutations of the query with different attributes, indexes, and
syntax, or both. You can then determine what performs better through the optimizer without actually
running the query to completion. This process saves on system resources because the actual query of the
data is never done. If the tables to be queried contain many rows, this method represents a significant
savings in system resources.
Be careful when you use this technique for performance testing, because all query requests are stopped
before they are run. This caution is especially important for a CQE query that cannot be implemented in
a single query step. For these types of queries, separate multiple query requests are issued, and then their
results are accumulated before returning the final results. Stopping the query in one of these intermediate
steps gives you only the performance information for that intermediate step, and not for the entire query.
Related information
Print SQL Information (PRTSQLINF) command
Change Query Attributes (CHGQRYA) command
Examples of setting query time limits:
You can set the query time limit for the current job or user session using query options file QAQQINI.
Specify the QRYOPTLIB parameter on the Change Query Attributes (CHGQRYA) command. Use a user
library where the QAQQINI file exists with the parameter set to QUERY_TIME_LIMIT, and the value set
to a valid query time limit.
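For example, assuming a copy of the QAQQINI file exists in a user library named MYLIB (a hypothetical
library name used here for illustration) with its QUERY_TIME_LIMIT parameter set, the following
command directs the job to that options file:
CHGQRYA QRYOPTLIB(MYLIB)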
To set the query time limit for 45 seconds you can use the following Change Query Attributes
(CHGQRYA) command:
CHGQRYA JOB(*) QRYTIMLMT(45)
This command sets the query time limit at 45 seconds. If the user runs a query with an estimated runtime
equal to or less than 45 seconds, the query runs without interruption. The time limit remains in effect for
the duration of the job or user session, or until the time limit is changed by the Change Query Attributes
(CHGQRYA) command.
Assume that the query optimizer estimated the runtime for a query as 135 seconds. A message is sent to
the user stating that the estimated runtime of 135 seconds exceeds the query time limit of 45 seconds.
To set or change the query time limit for a job other than your current job, the Change Query Attributes
(CHGQRYA) command is run using the JOB parameter. To set the query time limit to 45 seconds for job
123456/USERNAME/JOBNAME use the following Change Query Attributes (CHGQRYA) command:
CHGQRYA JOB(123456/USERNAME/JOBNAME) QRYTIMLMT(45)
This command sets the query time limit at 45 seconds for job 123456/USERNAME/JOBNAME. If job
123456/USERNAME/JOBNAME tries to run a query with an estimated runtime equal to or less than 45
seconds the query runs without interruption. If the estimated runtime for the query is greater than 45
seconds, for example, 50 seconds, a message is sent to the user. The message states that the estimated
runtime of 50 seconds exceeds the query time limit of 45 seconds. The time limit remains in effect for the
duration of job 123456/USERNAME/JOBNAME, or until the time limit for job 123456/USERNAME/
JOBNAME is changed by the Change Query Attributes (CHGQRYA) command.
To set or change the query time limit to the QQRYTIMLMT system value, use the following Change
Query Attributes (CHGQRYA) command:
CHGQRYA QRYTIMLMT(*SYSVAL)
The QQRYTIMLMT system value is used for duration of the job or user session, or until the time limit is
changed by the Change Query Attributes (CHGQRYA) command. This use is the default behavior for the
Change Query Attributes (CHGQRYA) command.
Note: The query time limit can also be set in the INI file, or by using the Change System Value
(CHGSYSVAL) command.
Related information
Change Query Attributes (CHGQRYA) command
Change System Value (CHGSYSVAL) command
Test temporary storage usage with the query governor:
The predictive storage governor specifies a temporary storage limit for database queries. You can use the
query governor to test if a query uses any temporary object, such as a hash table, sort, or temporary
index.
To test for usage of a temporary object, do the following:
v Set the query storage limit to zero (QRYSTGLMT(0)) using the Change Query Attributes (CHGQRYA)
command or in the INI file. This forces an inquiry message from the governor anytime a temporary
object is used for the query. The message is sent regardless of the estimated size of the temporary
object.
v Prompt for message help on the inquiry message and find the same information that you can find by
running the Print SQL Information (PRTSQLINF) command. This command allows you to see what
temporary objects were involved.
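The first step above can be issued for the current job with the following Change Query Attributes
(CHGQRYA) command:
CHGQRYA JOB(*) QRYSTGLMT(0)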
Related information
Print SQL Information (PRTSQLINF) command
Change Query Attributes (CHGQRYA) command
Examples of setting query temporary storage limits:
The temporary storage limit can be specified either in the QAQQINI file or on the Change Query
Attributes (CHGQRYA) command.
You can set the query temporary storage limit for a job using query options file QAQQINI. Specify the
QRYOPTLIB parameter on the Change Query Attributes (CHGQRYA) command. Use a user library where
the QAQQINI file exists with a valid value set for parameter STORAGE_LIMIT.
To set the query temporary storage limit on the Change Query Attributes (CHGQRYA) command itself,
specify a valid value for the QRYSTGLMT parameter.
If a value is specified both on the Change Query Attributes (CHGQRYA) command QRYSTGLMT
parameter and in the QAQQINI file specified on the QRYOPTLIB parameter, the QRYSTGLMT value is
used.
To set the temporary storage limit for 100 MB in the current job, you can use the following Change Query
Attributes (CHGQRYA) command:
CHGQRYA JOB(*) QRYSTGLMT(100)
If the user runs any query with an estimated temporary storage consumption equal to or less than 100
MB, the query runs without interruption. If the estimate is more than 100 MB, the database sends the
CPA4259 inquiry message. To set or change the query temporary storage limit for a job other than your
current job, run the CHGQRYA command using the JOB parameter. To set the same limit for job
123456/USERNAME/JOBNAME use the following CHGQRYA command:
CHGQRYA JOB(123456/USERNAME/JOBNAME) QRYSTGLMT(100)
This command sets the query temporary storage limit to 100 MB for job 123456/USERNAME/JOBNAME.
Note: Unlike the query time limit, there is no system value for temporary storage limit. The default
behavior is to let any queries run regardless of their temporary storage usage. The query
temporary storage limit can be specified either in the INI file or on the Change Query Attributes
(CHGQRYA) command.
Related information
Change Query Attributes (CHGQRYA) command
Controlling parallel processing for queries
There are two types of parallel processing available. The first is parallel I/O, which is available at no
charge. The second is DB2 Multisystem, a feature that you can purchase. You can turn parallel processing
on and off.
Even if parallelism is enabled for a system or job, the individual queries that run in a job might not
actually use a parallel method. This decision might be because of functional restrictions, or the optimizer
might choose a non-parallel method because it runs faster.
Queries processed with parallel access methods aggressively use main storage, CPU, and disk resources.
The number of queries that use parallel processing must be limited and controlled.
Controlling system-wide parallel processing for queries:
You can use the QQRYDEGREE system value to control parallel processing for a system.
The current value of the system value can be displayed or modified using the following CL commands:
v WRKSYSVAL - Work with System Value
v CHGSYSVAL - Change System Value
v DSPSYSVAL - Display System Value
v RTVSYSVAL - Retrieve System Value
The special values for QQRYDEGREE control whether parallel processing is allowed by default for all
jobs on the system. The possible values are:
*NONE
No parallel processing is allowed for database query processing.
*IO
I/O parallel processing is allowed for queries.
*OPTIMIZE
The query optimizer can choose to use any number of tasks for either I/O or SMP parallel processing
to process the queries. SMP parallel processing is used only if the DB2 Multisystem feature is
installed. The query optimizer chooses to use parallel processing to minimize elapsed time based on
the job share of the memory in the pool.
*MAX
The query optimizer can choose to use either I/O or SMP parallel processing to process the query.
SMP parallel processing can be used only if the DB2 Multisystem feature is installed. The choices
made by the query optimizer are like the choices made for parameter value *OPTIMIZE. The
exception is that the optimizer assumes that all active memory in the pool can be used to process the
query.
The default QQRYDEGREE system value is *NONE. You must change the value if you want parallel
query processing as the default for jobs run on the system.
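For example, the following Change System Value (CHGSYSVAL) command allows the optimizer to use
parallel processing for all jobs whose DEGREE query attribute is *SYSVAL:
CHGSYSVAL SYSVAL(QQRYDEGREE) VALUE('*OPTIMIZE')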
Changing this system value affects all jobs that are run or are currently running on the system whose
DEGREE query attribute is *SYSVAL. However, queries that have already been started or queries using
reusable ODPs are not affected.
Controlling job level parallel processing for queries:
You can also control query parallel processing at the job level using the DEGREE parameter of the
Change Query Attributes (CHGQRYA) command or in the QAQQINI file. You can also use the
SET_CURRENT_DEGREE SQL statement.
Using the Change Query Attributes (CHGQRYA) command
You can specify the parallel processing option allowed and, optionally, the number of tasks that can be
used when running database queries in the job. You can prompt on the Change Query Attributes
(CHGQRYA) command in an interactive job to display the current values of the DEGREE query attribute.
Changing the DEGREE query attribute does not affect queries that have already been started or queries
using reusable ODPs.
The parameter values for the DEGREE keyword are:
*SAME
The parallel degree query attribute does not change.
*NONE
No parallel processing is allowed for database query processing.
*IO
Any number of tasks can be used when the database query optimizer chooses to use I/O parallel
processing for queries. SMP parallel processing is not allowed.
*OPTIMIZE
The query optimizer can choose to use any number of tasks for either I/O or SMP parallel processing
to process the query. SMP parallel processing can be used only if the DB2 Multisystem feature is
installed. Use of parallel processing and the number of tasks used is determined by:
v the number of system processors available
v the job share of active memory available in the pool
v whether the expected elapsed time is limited by CPU processing or I/O resources
The query optimizer chooses an implementation that minimizes elapsed time based on the job share
of the memory in the pool.
*MAX
The query optimizer can choose to use either I/O or SMP parallel processing to process the query.
SMP parallel processing can be used only if the DB2 Multisystem feature is installed. The choices
made by the query optimizer are like the choices made for parameter value *OPTIMIZE. The
exception is that the optimizer assumes that all active memory in the pool can be used to process the
query.
*NBRTASKS number-of-tasks
Specifies the number of tasks to be used when the query optimizer chooses to use SMP parallel
processing to process a query. I/O parallelism is also allowed. SMP parallel processing can be used
only if the DB2 Multisystem feature is installed.
Using a number of tasks less than the number of system processors available restricts the number of
processors used simultaneously for running a query. A larger number of tasks ensures that the query
is allowed to use all the processors available on the system to run the query. Too many tasks can
degrade performance because of the over commitment of active memory and the overhead cost of
managing all the tasks.
*SYSVAL
Specifies that the processing option used is set to the current value of the QQRYDEGREE system
value.
The initial value of the DEGREE attribute for a job is *SYSVAL.
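For example, the following command allows SMP parallel processing with four tasks (an illustrative task
count) for the current job:
CHGQRYA JOB(*) DEGREE(*NBRTASKS 4)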
Using the SET CURRENT DEGREE SQL statement
You can use the SET CURRENT DEGREE SQL statement to change the value of the CURRENT_DEGREE
special register. The possible values for the CURRENT_DEGREE special register are:
1 No parallel processing is allowed.
2 through 32767
Specifies the degree of parallelism that is used.
ANY
Specifies that the database manager can choose to use any number of tasks for either I/O or SMP
parallel processing. Use of parallel processing and the number of tasks used is determined by:
v the number of system processors available
v the job share of active memory available in the pool
v whether the expected elapsed time is limited by CPU processing or I/O resources
The database manager chooses an implementation that minimizes elapsed time based on the job
share of the memory in the pool.
NONE
No parallel processing is allowed.
MAX
The database manager can choose to use any number of tasks for either I/O or SMP parallel
processing. MAX is like ANY except the database manager assumes that all active memory in the
pool can be used.
IO
Any number of tasks can be used when the database manager chooses to use I/O parallel processing
for queries. SMP is not allowed.
The value can be changed by invoking the SET CURRENT DEGREE statement.
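For example, the following statement allows the database manager to choose the degree of parallelism
for the current session:
SET CURRENT DEGREE = 'ANY'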
The initial value of CURRENT DEGREE comes from the CHGQRYA CL command, PARALLEL_DEGREE
parameter in the current query options file (QAQQINI), or the QQRYDEGREE system value.
Related information
Set Current Degree statement
Change Query Attributes (CHGQRYA) command
DB2 Multisystem
Collecting statistics with the statistics manager
The collection of statistics is handled by a separate component called the statistics manager. Statistical
information can be used by the query optimizer to determine the best access plan for a query. Since the
query optimizer bases its choice of access plan on the statistical information found in the table, it is
important that this information is current.
On many platforms, statistics collection is a manual process that is the responsibility of the database
administrator. With IBM i products, the database statistics collection process is handled automatically, and
only rarely is it necessary to update statistics manually.
The statistics manager does not actually run or optimize the query. It controls the access to the metadata
and other information that is required to optimize the query. It uses this information to answer questions
posed by the query optimizer. The answers can either be derived from table header information, from
existing indexes, or from single-column statistics.
The statistics manager must always provide an answer to the questions from the optimizer. It uses the
best method available to provide the answers. For example, it could use a single-column statistic or
perform a key range estimate over an index. Along with the answer, the statistics manager returns a
confidence level to the optimizer that the optimizer can use to provide greater latitude for sizing
algorithms. If the statistics manager provides a low confidence in the number of groups estimated for a
grouping request, the optimizer can increase the size of the temporary hash table allocated.
Related concepts
Statistics manager on page 4
In CQE, the retrieval of statistics is a function of the Optimizer. When the Optimizer needs to know
information about a table, it looks at the table description to retrieve the row count and table size. If an
index is available, the Optimizer might extract information about the data in the table. In SQE, the
collection and management of statistics is handled by a separate component called the statistics manager.
The statistics manager leverages all the same statistical sources as CQE, but adds more sources and
capabilities.
Automatic statistics collection
When the statistics manager prepares its responses to the optimizer, it tracks the responses that were
generated using default filter factors. Default filter factors are used when column statistics or indexes are
not available. The statistics manager uses this information to automatically generate a statistic collection
request for the columns. This request occurs while the access plan is written to the plan cache. If system
resources allow, statistics collections occur in real time for direct use by the current query, avoiding a
default answer to the optimizer.
Otherwise, as system resources become available, the requested column statistics are collected in the
background. The next time the query is executed, the missing column statistics are available to the
statistics manager. This process allows the statistics manager to provide more accurate information to the
optimizer at that time. More statistics make it easier for the optimizer to generate a better performing
access plan.
If a query is canceled before or during execution, the requests for column statistics are still processed.
These requests occur if the execution reaches the point where the generated access plan is written to the
Plan Cache.
To minimize the number of passes through a table during statistics collection, the statistics manager
groups multiple requests for the same table. For example, suppose two queries are executed against table
T1. The first query has selection criteria on column C1 and the second on column C2. If no statistics are
available for the table, the statistics manager identifies both of these columns as good candidates for
column statistics. When the statistics manager reviews requests, it looks for multiple requests for the
same table and groups them into one request. This grouping allows both column statistics to be created
with only one pass through table T1.
One thing to note is that column statistics are usually automatically created when the statistics manager
must answer questions from the optimizer using default filter factors. However, when an index is
available that might be used to generate the answer, then column statistics are not automatically
generated. In this scenario, there might be cases where optimization time would benefit from column
statistics. Using column statistics to answer questions from the optimizer is more efficient than using the
index data. So if query optimization time seems long, verify that there are indexes over the relevant
columns in your query. If so, try manually generating column statistics for these columns.
As stated before, statistics collection occurs as system resources become available. If you have a low
priority job permanently active on your system that is supposed to use all spare CPU cycles for
processing, your statistics collection is never active.
Automatic statistics refresh
Column statistics are not maintained when the underlying table data changes. The statistics manager
determines whether column statistics are still valid or whether they no longer accurately represent the
column (stale).
This validation is done each time one of the following occurs:
v A full open occurs for a query where column statistics were used to create the access plan
v A new plan is added to the plan cache, either because a new query was optimized or because an
existing plan was reoptimized.
To validate the statistics, the statistics manager checks whether any of the following apply:
v The number of rows in the table has changed by more than 15% of the total table row count
v The number of rows changed in the table is more than 15% of the total table row count
If the statistics are stale, the statistics manager still uses them to answer the questions from the optimizer.
However, the statistics manager marks the statistics as stale in the plan cache and generates a request to
refresh them.
Viewing statistics requests
You can view the current statistics requests by using System i Navigator or by using Statistics APIs.
To view requests in System i Navigator, right-click Database and select Statistic Requests. This window
shows all user requested statistics collections that are pending or active. The view also shows all system
requested statistics collections that are being considered, are active, or have failed. You can change the
status of the request, order the request to process immediately, or cancel the request.
Related reference
Statistics manager APIs on page 191
You can use APIs to implement the statistics function of System i Navigator.
Indexes and column statistics
While performing similar functions, indexes and column statistics are different.
If you are trying to decide whether to use statistics or indexes to provide information to the statistics
manager, keep in mind the following differences.
One major difference between indexes and column statistics is that indexes are permanent objects that are
updated when changes to the underlying table occur. Column statistics are not updated. If your data is
constantly changing, the statistics manager might need to rely on stale column statistics. However,
maintaining an index after each table change might use more system resources than refreshing stale
column statistics after a group of changes have occurred.
Another difference is the effect that the existence of new indexes or column statistics has on the
optimizer. When new indexes become available, the optimizer considers them for implementation. If they
are candidates, the optimizer reoptimizes the query and tries to find a better implementation. However,
this is not true for column statistics. When new or refreshed column statistics become available, the
statistics manager interrogates them immediately. Reoptimization occurs only if the answers are different
from the ones that were given before the statistics were refreshed. It is therefore possible to use refreshed
statistics without causing a reoptimization of an access plan.
When trying to determine the selectivity of predicates, the statistics manager considers column statistics
and indexes as resources for its answers in the following order:
1. It tries to use a multi-column keyed index when ANDed or ORed predicates reference multiple
columns.
2. If there is no perfect index that contains all the columns in the predicates, it tries to find a
combination of indexes that can be used.
3. For single-column questions, it uses available column statistics.
4. If the answer derived from the column statistics shows a selectivity of less than 2%, indexes are used
to verify this answer.
188 IBM i: Database Performance and Query Optimization
Accessing column statistics to answer questions is faster than trying to obtain these answers from
indexes.
Column statistics can only be used by SQE. For CQE, all statistics are retrieved from indexes.
Finally, column statistics can be used only for query optimization. They cannot be used for the actual
implementation of a query, whereas indexes can be used for both.
Monitoring background statistics collection
The system value QDBFSTCCOL controls who is allowed to create statistics in the background.
The following list provides the possible values:
*ALL
Allows all statistics to be collected in the background. *ALL is the default setting.
*NONE
Restricts everyone from creating statistics in the background. *NONE does not prevent immediate
user-requested statistics from being collected, however.
*USER
Allows only user-requested statistics to be collected in the background.
*SYSTEM
Allows only system-requested statistics to be collected in the background.
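As a minimal sketch, the system value can be displayed and changed with standard CL commands; verify the exact syntax on your release:

DSPSYSVAL SYSVAL(QDBFSTCCOL)
CHGSYSVAL SYSVAL(QDBFSTCCOL) VALUE(*USER)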
When you switch the system value to something other than *ALL or *SYSTEM, the statistics manager
continues to place statistics requests in the plan cache. When the system value is switched back to *ALL,
for example, background processing analyzes the entire plan cache and looks for any existing column
statistics requests. This background task also identifies column statistics that have been used by a plan in
the plan cache. The task determines if these column statistics have become stale. Requests for the new
column statistics, as well as requests to refresh the stale column statistics, are then executed.
All background statistic collections initiated by the system or submitted by a user are performed by the
system job QDBFSTCCOL. User-initiated immediate requests are run within the user job. This job uses
multiple threads to create the statistics. The number of threads is determined by the number of
processors that the system has. Each thread is then associated with a request queue.
There are four types of request queues based on who submitted the request and how long the collection
is estimated to take. The default priority assigned to each thread can determine to which queue the
thread belongs:
v Priority 90 - short user requests
v Priority 93 - long user requests
v Priority 96 - short system requests
v Priority 99 - long system requests
Background statistics collections attempt to use as much parallelism as possible. This parallelism is
independent of the SMP feature installed on the system. However, for immediate statistics collection,
parallel processing is allowed only if the SMP feature is installed on the system. The job that requests the
column statistics must also allow parallelism.
Related information
Performance system values: Allow background database statistics collection
Replication of column statistics with CRTDUPOBJ versus CPYF
You can replicate column statistics with the Create Duplicate Object (CRTDUPOBJ) or the Copy File
(CPYF) commands.
Statistics are not copied to new tables when using the Copy File (CPYF) command. If statistics are needed
immediately after using this command, then you must manually generate the statistics using System i
Navigator or the statistics APIs. If statistics are not needed immediately, then they could be created
automatically by the system after the first touch of a column by a query.
Statistics are copied when using Create Duplicate Object (CRTDUPOBJ) command with DATA(*YES). You
can use this command as an alternative to creating statistics automatically after using a Copy File (CPYF)
command.
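As an illustration, a duplicate of a table including its data (and therefore its column statistics) might be created as follows; the library and file names are hypothetical:

CRTDUPOBJ OBJ(SALES) FROMLIB(PRODLIB) OBJTYPE(*FILE) TOLIB(TESTLIB) DATA(*YES)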
Related information
Create Duplicate Object (CRTDUPOBJ) command
Copy File (CPYF) command
Determining what column statistics exist
You can determine what column statistics exist in a couple of ways.
The first is to view statistics by using System i Navigator. Right-click a table or alias and select Statistic
Data. Another way is to create a user-defined table function and call that function from an SQL statement
or stored procedure.
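As a sketch, a call to such a user-defined table function might look like the following. The function name, schema, and result columns are entirely hypothetical; the function would need to be implemented on top of the statistics APIs described later in this topic:

SELECT COLUMN_NAME, STATS_STATE, LAST_REFRESH
FROM TABLE(MYLIB.COLUMN_STATS('MYLIB', 'SALES')) AS S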
Manually collecting and refreshing statistics
You can manually collect and refresh statistics through System i Navigator or by using statistics APIs.
To collect statistics using System i Navigator, right-click a table or alias and select Statistic Data. On the
Statistic Data dialog, click New. Then select the columns that you want to collect statistics for. Once you
have selected the columns, you can collect the statistics immediately or collect them in the background.
To refresh a statistic using System i Navigator, right-click a table or alias and select Statistic Data. Click
Update. Select the statistic that you want to refresh. You can collect the statistics immediately or collect
them in the background.
There are several scenarios in which the manual management (create, remove, refresh, and so on) of
column statistics could be beneficial and recommended.
High Availability (HA) solutions
High availability solutions replicate data to a secondary system by using journal entries.
However, column statistics are not journaled. That means that, on your backup system, no
column statistics are available when you first start using that system. To prevent this "warm-up"
effect, you might want to propagate the column statistics that were gathered on your production
system by recreating them manually on your backup system.
ISV (Independent Solution Provider) preparation
An ISV might want to deliver a customer solution that already includes column statistics
frequently used in the application, rather than waiting for the automatic statistics collection to
create them. Run the application on the development system for some time and examine which
column statistics were created automatically. You can then generate a script file to execute on the
customer system after the initial data load takes place. The script file can be shipped as part of
the application.
Business Intelligence environments
In a large Business Intelligence environment, it is common for large data load and update
operations to occur overnight. Column statistics are marked stale only when they are touched by
the statistics manager, and then refreshed after first touch. You might want to consider refreshing
the column statistics manually after loading the data.
You can do this refresh easily by toggling the system value QDBFSTCCOL to *NONE and then
back to *ALL. This process causes all stale column statistics to be refreshed. It also starts
collection of any column statistics previously requested by the system but not yet available. Since
this process relies on the access plans stored in the plan cache, avoid performing a system initial
program load (IPL) before toggling QDBFSTCCOL. An IPL clears the plan cache.
This procedure works only if you do not delete (drop) the tables and recreate them in the process
of loading your data. When deleting a table, access plans in the plan cache that refer to this table
are deleted. Information about column statistics on that table is also lost. The process in this
environment is either to add data to your tables or to clear the tables instead of deleting them.
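Under the assumptions in this scenario, the toggle described above could be performed with two CL commands (a sketch only; verify the syntax on your release):

CHGSYSVAL SYSVAL(QDBFSTCCOL) VALUE(*NONE)
CHGSYSVAL SYSVAL(QDBFSTCCOL) VALUE(*ALL)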
Massive data updates
Updating rows in a column statistics-enabled table can significantly change the cardinality, add
new ranges of values, or change the distribution of data values. These updates can affect query
performance on the first query run against the new data. On the first run of such a query, the
optimizer uses stale column statistics to determine the access plan. At that point, it starts a
request to refresh the column statistics.
Prior to this data update, you might want to toggle the system value QDBFSTCCOL to *NONE
and back to *ALL or *SYSTEM. This toggle causes an analysis of the plan cache. The analysis
includes searching for column statistics used in access plan generation, analyzing them for
staleness, and requesting updates for the stale statistics.
If you massively update or load data, and run queries against these tables at the same time, the
automatic column statistics collection tries to refresh every time 15% of the data is changed. This
processing can be redundant since you are still updating or loading the data. In this case, you
might want to block automatic statistics collection for the tables and unblock it again after the
data update or load finishes. An alternative is to turn off automatic statistics collection for the
whole system before updating or loading the data. Switch it back on after the updating or
loading has finished.
Backup and recovery
When thinking about backup and recovery strategies, keep in mind that creation of column
statistics is not journaled. Column statistics that exist at the time a save operation occurs are
saved as part of the table and restored with the table. Any column statistics created after the save
took place are lost and cannot be recreated by using techniques such as applying journal entries.
If you have a long interval between save operations and rely on journaling to restore your
environment, consider tracking column statistics that are generated after the latest save operation.
Related information
Performance system values: Allow background database statistics collection
Statistics manager APIs
You can use APIs to implement the statistics function of System i Navigator.
v Cancel Requested Statistics Collections (QDBSTCRS, QdbstCancelRequestedStatistics) immediately
cancels statistics collections that have been requested, but are not yet completed or not successfully
completed.
v Delete Statistics Collections (QDBSTDS, QdbstDeleteStatistics) immediately deletes existing completed
statistics collections.
v List Requested Statistics Collections (QDBSTLRS, QdbstListRequestedStatistics) lists all the columns
and combination of columns and file members that have background statistic collections requested, but
not yet completed.
v List Statistics Collection Details (QDBSTLDS, QdbstListDetailStatistics) lists additional statistics data for
a single statistics collection.
v List Statistics Collections (QDBSTLS, QdbstListStatistics) lists all the columns and combination of
columns for a given file member that have statistics available.
v Request Statistics Collections (QDBSTRS, QdbstRequestStatistics) allows you to request one or more
statistics collections for a given set of columns of a specific file member.
v Update Statistics Collection (QDBSTUS, QdbstUpdateStatistics) allows you to update the attributes and
to refresh the data of an existing single statistics collection.
Related reference
Viewing statistics requests on page 188
You can view the current statistics requests by using System i Navigator or by using Statistics APIs.
Displaying materialized query table columns
You can display materialized query tables associated with another table using System i Navigator.
To display materialized query tables, follow these steps:
1. In the System i Navigator window, expand the system that you want to use.
2. Expand Databases and the database that you want to work with.
3. Expand Schemas and the schema that you want to work with.
4. Right-click a table and select Show Materialized Query Tables.
Table 48. Columns used in Show materialized query table window
Column name Description
Name The SQL name for the materialized query table
Schema Schema or library containing the materialized query table
Partition Partition detail for the materialized query table. Possible values:
v <blank>, which means "For all partitions"
v For Each Partition
v the specific name of the partition
Owner The user ID of the owner of the materialized query table.
System Name System table name for the materialized query table
Enabled Whether the materialized query table is enabled. Possible values are:
v Yes
v No
If the materialized query table is not enabled, it cannot be used for
query optimization. It can, however, be queried directly.
Creation Date The timestamp of when the materialized query table was created.
Last Refresh Date The timestamp of the last time the materialized query table was
refreshed.
Last Query Use The timestamp when the materialized query table was last used by
the optimizer to replace user specified tables in a query.
Last Query Statistics Use The timestamp when the materialized query table was last used by
the statistics manager to determine an access method.
Query Use Count The number of instances the materialized query table was used by
the optimizer to replace user specified tables in a query.
Query Statistics Use Count The number of instances the materialized query table was used by
the statistics manager to determine an access method.
Last Used Date The timestamp when the materialized query table was last used.
Days Used Count The number of days the materialized query table has been used.
Date Reset Days Used Count The year and date when the days-used count was last set to 0.
Current Number of Rows The total number of rows included in this materialized query table
at this time.
Current Size The current size of the materialized query table.
Last Changed The timestamp when the materialized query table was last changed.
Table 48. Columns used in Show materialized query table window (continued)
Column name Description
Maintenance The maintenance for the materialized query table. Possible values
are:
v User
v System
Initial Data Whether the initial data was inserted immediately or deferred.
Possible values are:
v Deferred
v Immediate
Refresh Mode The refresh mode for the materialized query table. A materialized
query table can be refreshed whenever a change is made to the
table or deferred to a later time.
Isolation Level The isolation level for the materialized query table.
Sort Sequence The alternate character sorting sequence for National Language
Support (NLS).
Language Identifier The language code for the object.
SQL Statement The SQL statement that is used to populate the table.
Text The text description of the materialized query table.
Table Schema and table name.
Table Partition Table partition.
Table System Name System name of the table.
Managing check pending constraints columns
You can view and change constraints that have been placed in a check pending state by the system.
Check pending refers to a state in which a mismatch exists between a parent and foreign key
in a referential constraint. A mismatch can also occur between the column value and the check constraint
definition in a check constraint.
To view constraints that have been placed in a check pending state, follow these steps:
1. Expand the system name and Databases.
2. Expand the database that you want to work with.
3. Expand the Database Maintenance folder.
4. Select Check Pending Constraints.
5. From this interface, you can view the definition of the constraint and the rows that are in violation of
the constraint rules. Select the constraint that you want to work with and then select Edit Check
Pending Constraint from the File menu.
6. You can either alter or delete the rows that are in violation.
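The same constraint states can also be worked with from a CL command line. As a sketch (the file name is hypothetical), the Work with Physical File Constraints and Edit Check Pending Constraints commands can be used:

WRKPFCST FILE(MYLIB/ORDERS) TYPE(*ALL)
EDTCPCST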
Table 49. Columns used in Check pending constraints window
Column name Description
Name of Constraint in Check Pending Displays the name of the constraint that is in a check pending state.
Schema Schema containing the constraint that is in a check pending state.
Type Displays the type of constraint that is in check pending. Possible
values are:
v Check constraint
v Foreign key constraint
Table 49. Columns used in Check pending constraints window (continued)
Column name Description
Table name The name of the table associated with the constraint in check
pending state.
Enabled Displays whether the constraint is enabled. The constraint must be
disabled or the relationship taken out of the check pending state
before any input/output (I/O) operations can be performed.
Creating an index strategy
DB2 for i provides two basic means for accessing tables: a table scan and index-based retrieval.
Index-based retrieval is typically more efficient than a table scan when fewer than 20% of the table rows
are selected.
There are two kinds of persistent indexes: binary radix tree indexes, which have been available since
1988, and encoded vector indexes (EVIs), which became available in 1998 with V4R2. Both types of
indexes are useful in improving performance for certain kinds of queries.
Binary radix indexes
A radix index is a multilevel, hybrid tree structure that allows many key values to be stored efficiently
while minimizing access times. A key compression algorithm assists in this process. The lowest level of
the tree contains the leaf nodes, which contain the base table row addresses associated with the key
value. The key value is used to quickly navigate to the leaf node with a few simple binary search tests.
The binary radix tree structure is good for finding a few rows because it finds a given row with a
minimal amount of processing. For example, create a binary radix index over a customer number column.
Then create a typical OLTP request like "find the outstanding orders for a single customer". The binary
index results in fast performance. An index created over the customer number column is considered the
"perfect" index for this type of query. The index allows the database to find the rows it needs and
perform a minimal number of I/Os.
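A minimal sketch of this scenario follows; the table and column names are hypothetical:

CREATE INDEX MYLIB/CUSTNO_IX on MYLIB/ORDERS (CUSTNO)

SELECT ORDNO, ORDDATE, AMOUNT
FROM MYLIB/ORDERS
WHERE CUSTNO = 12345

With the index in place, the optimizer can probe directly to the few rows for that customer instead of scanning the table.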
In some situations, however, you do not always have the same level of predictability. Many users want
"on demand" access to the detail data. For example, they might run a report every week to look at sales
data. Then they want to "drill down" for more information related to a particular problem area they
found in the report. In this scenario, you cannot write all the queries in advance on behalf of the end
users. Without knowing what queries might run, it is impossible to build the "perfect" index.
Related information
SQL Create Index statement
Derived key index
You can use the SQL CREATE INDEX statement to create a derived key index using an SQL expression.
Traditionally, an index key could specify only column names from the table the index was based on.
With this support, an index key can instead be an expression, such as a built-in function or some other
valid expression. Additionally, you can use the SQL CREATE INDEX statement to create a sparse index
using a WHERE condition.
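As a sketch (the table and column names are hypothetical), a derived key index over an expression might be created like this:

CREATE INDEX MYLIB/LASTNAME_IX on MYLIB/EMPLOYEE (UPPER(LASTNAME))

Such an index could then match queries that select on UPPER(LASTNAME), which a plain column-keyed index could not.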
For restrictions and other information about derived indexes, see the Create Index statement and Using
derived indexes.
Related reference
Using derived indexes on page 220
SQL indexes can be created where the key is specified as an expression. This type of key is also referred
to as a derived key.
Related information
SQL Create Index statement
Sparse indexes
You can use the SQL CREATE INDEX statement to create a sparse index using SQL selection predicates.
In the previous release, users were given the ability to use the SQL CREATE INDEX statement to create a
sparse index using a WHERE condition. With this support, the query optimizer recognizes and considers sparse
indexes during its optimization. If the query WHERE selection is a subset of the sparse index WHERE
selection, then the sparse index is used to implement the query. Use of the sparse index usually results in
improved performance.
Examples
In this example, the query selection is a subset of the sparse index selection and an index scan over the
sparse index is used. The remaining query selection (COL3=30) is executed following the index scan.
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL3)
WHERE COL1=10 and COL2=20
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL1=10 and COL2=20 and COL3=30
In this example, the query selection is not a subset of the sparse index selection and the sparse index
cannot be used.
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL3)
WHERE COL1=10 and COL2=20 and COL3=30
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL1=10 and COL2=20
Related reference
Using sparse indexes on page 221
SQL indexes can be created using WHERE selection predicates. These indexes can also be referred to as
sparse indexes. The advantage of a sparse index is that fewer entries are maintained in the index. Only
those entries matching the WHERE selection criteria are maintained in the index.
Related information
SQL Create Index statement
Sparse index optimization:
An SQL sparse index is like a select/omit access path. Both the sparse index and the select/omit logical
file contain only keys that meet the selection specified. For a sparse index, the selection is specified with a
WHERE clause. For a select/omit logical file, the selection is specified in the DDS using the COMP
operation.
The reason for creating a sparse index is to provide performance enhancements for your queries. The
performance enhancement is done by precomputing and storing results of the WHERE selection in the
sparse index. The database engine can use these results instead of recomputing them for a user specified
query. The query optimizer looks for any applicable sparse index and can choose to implement the query
using a sparse index. The decision is based on whether using a sparse index is a faster implementation
choice.
For a sparse index to be used, the WHERE selection in the query must be a subset of the WHERE
selection in the sparse index. That is, the set of records in the sparse index must contain all the records to
be selected by the query. It might contain extra records, but it must contain all the records to be selected
by the query. This comparison of WHERE selection is performed by the query optimizer during
optimization. It is like the comparison that is performed for Materialized Query Tables (MQT).
Besides the comparison of the WHERE selection, the optimization of a sparse index is identical to the
optimization that is performed for any Binary Radix index.
Refer to the section "Indexes & the optimizer" for more details on how Binary Radix indexes are optimized.
Related concepts
Indexes & the optimizer on page 208
Since the optimizer uses cost based optimization, more information about the database rows and columns
makes for a more efficient access plan created for the query. With the information from the indexes, the
optimizer can make better choices about how to process the request (local selection, joins, grouping, and
ordering).
Related reference
Using sparse indexes on page 221
SQL indexes can be created using WHERE selection predicates. These indexes can also be referred to as
sparse indexes. The advantage of a sparse index is that fewer entries are maintained in the index. Only
those entries matching the WHERE selection criteria are maintained in the index.
Sparse index matching algorithm:
This topic is a generalized discussion of how the sparse index matching algorithm works.
The selection in the query must be a subset of the selection in the sparse index in order for the sparse
index to be used. This statement is true whether the selection is ANDed together, ORed together, or a
combination of the two. For selection where all predicates are ANDed together, all WHERE selection
predicates specified in the sparse index must also be specified in the query. The query can contain
additional ANDed predicates. The selection for the additional predicates will be performed after the
entries are retrieved from the sparse index. See examples A1, A2, and A3 following.
Example A1
In this example, the query selection exactly matches the sparse index selection and an index scan over the
sparse index can be used.
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL3)
WHERE COL1=10 and COL2=20 and COL3=30
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL1=10 and COL2=20 and COL3=30
Example A2
In this example, the query selection is a subset of the sparse index selection and an index scan over the
sparse index can be used. The remaining query selection (COL3=30) is executed following the index scan.
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL3)
WHERE COL1=10 and COL2=20
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL1=10 and COL2=20 and COL3=30
Example A3
In this example, the query selection is not a subset of the sparse index selection and the sparse index
cannot be used.
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL3)
WHERE COL1=10 and COL2=20 and COL3=30
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL1=10 and COL2=20
For selection where all predicates are ORed together, all WHERE selection predicates specified in the
query must also be specified in the sparse index. The sparse index can contain additional ORed
predicates. All the ORed selection in the query is executed after the entries are retrieved from the
sparse index. See examples O1, O2, and O3 following.
Example O1
In this example, the query selection exactly matches the sparse index selection and an index scan over the
sparse index can be used. The query selection is executed following the index scan.
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL3)
WHERE COL1=10 or COL2=20 or COL3=30
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL1=10 or COL2=20 or COL3=30
Example O2
In this example, the query selection is a subset of the sparse index selection and an index scan over the
sparse index can be used. The query selection is executed following the index scan.
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL3)
WHERE COL1=10 or COL2=20 or COL3=30
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL1=10 or COL2=20
Example O3
In this example, the query selection is not a subset of the sparse index selection and the sparse index
cannot be used.
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL3)
WHERE COL1=10 or COL2=20
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL1=10 or COL2=20 or COL3=30
The previous examples used simple selection, all ANDed, or all ORed together. These examples are not
typical, but they demonstrate how the selection of the sparse index is compared to the selection of the
query. Obviously, the more complex the selection the more difficult it becomes to determine compatibility.
In the next example T1, the constant 'MN' was replaced by a parameter marker in the query selection.
The sparse index had the local selection COL1='MN' applied to it when it was created. The sparse
index matching algorithm matches the parameter marker to the constant 'MN' in the query predicate
COL1=?. It verifies that the value of the parameter marker is the same as the constant in the sparse
index; therefore, the sparse index can be used.
The sparse index matching algorithm attempts to match where the predicates between the sparse index
and the query are not the same. An example is a sparse index with a predicate SALARY > 50000, and a
query with the predicate SALARY > 70000. The sparse index contains the rows necessary to run the
query. The sparse index is used in the query, but the predicate SALARY > 70000 remains as selection in
the query (it is not removed).
Example T1
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL3)
WHERE COL1='MN' or COL2='TWINS'
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL1=? or COL2='TWINS' or COL3='WIN'
In the next example T2, the keys of the sparse index match the ORDER BY fields in the query. For the
sparse index to satisfy the specified ordering, the optimizer must verify that the query selection is a
subset of the sparse index selection. In this example, the sparse index can be used.
Example T2
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL1, COL3)
WHERE COL1='MN' or COL2='TWINS'
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL2='TWINS'
ORDER BY COL1, COL3
Related reference
Using sparse indexes on page 221
SQL indexes can be created using WHERE selection predicates. These indexes can also be referred to as
sparse indexes. The advantage of a sparse index is that fewer entries are maintained in the index. Only
those entries matching the WHERE selection criteria are maintained in the index.
Details on the MQT matching algorithm on page 80
What follows is a generalized discussion of how the MQT matching algorithm works.
Sparse index examples:
This topic shows examples of how the sparse index matching algorithm works.
In example S1, the query selection is a subset of the sparse index selection and consequently an index
scan over the sparse index is used. The remaining query selection (COL3=30) is executed following the
index scan.
Example S1
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL3)
WHERE COL1=10 and COL2=20
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL1=10 and COL2=20 and COL3=30
In example S2, the query selection is not a subset of the sparse index selection and the sparse index
cannot be used.
Example S2
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL3)
WHERE COL1=10 and COL2=20 and COL3=30
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL1=10 and COL2=20
In example S3, the query selection exactly matches the sparse index selection and an index scan over the
sparse index can be used.
Example S3
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL3)
WHERE COL1=10 and COL2=20 and COL3=30
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL1=10 and COL2=20 and COL3=30
In example S4, the query selection is a subset of the sparse index selection and an index scan over the
sparse index can be used. The remaining query selection (COL3=30) is executed following the index scan.
Example S4
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL3)
WHERE COL1=10 and COL2=20
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL1=10 and COL2=20 and COL3=30
In example S5, the query selection is not a subset of the sparse index selection and the sparse index
cannot be used.
Example S5
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL3)
WHERE COL1=10 and COL2=20 and COL3=30
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL1=10 and COL2=20
In example S6, the query selection exactly matches the sparse index selection and an index scan over the
sparse index can be used. The query selection is executed following the index scan to eliminate excess
records from the sparse index.
Example S6
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL3)
WHERE COL1=10 or COL2=20 or COL3=30
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL1=10 or COL2=20 or COL3=30
In example S7, the query selection is a subset of the sparse index selection and an index scan over the
sparse index can be used. The query selection is executed following the index scan to eliminate excess
records from the sparse index.
Example S7
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL3)
WHERE COL1=10 or COL2=20 or COL3=30
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL1=10 or COL2=20
In example S8, the query selection is not a subset of the sparse index selection and the sparse index
cannot be used.
Example S8
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL3)
WHERE COL1=10 or COL2=20
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL1=10 or COL2=20 or COL3=30
In the next example S9, the constant 'MN' was replaced by a parameter marker in the query selection.
The sparse index had the local selection COL1='MN' applied to it when it was created. The sparse
index matching algorithm matches the parameter marker to the constant 'MN' in the query predicate
COL1=?. It verifies that the value of the parameter marker is the same as the constant in the sparse
index; therefore, the sparse index can be used.
Example S9
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL3)
WHERE COL1='MN' or COL2='TWINS'
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
Where Col3='WIN' and (Col1=? or Col2='TWINS')
In the next example S10, the keys of the sparse index match the order by fields in the query. For the
sparse index to satisfy the specified ordering, the optimizer must verify that the query selection is a
subset of the sparse index selection. In this example, the sparse index can be used.
Example S10
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL1, COL3)
WHERE COL1='MN' or COL2='TWINS'
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
Where Col3='WIN' and (Col1='MN' or Col2='TWINS')
ORDER BY COL1, COL3
In the next example S11, the keys of the sparse index do not match the order by fields in the query. But
the selection in sparse index SPR1 is a superset of the query selection. Depending on size, the optimizer
might choose an index scan over sparse index SPR1 and then use a sort to satisfy the specified ordering.
Example S11
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL2, COL4)
WHERE COL1='MN' or COL2='TWINS'
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
Where Col3='WIN' and (Col1='MN' or Col2='TWINS')
ORDER BY COL1, COL3
The next example S12 represents the classic optimizer decision: is it better to do an index probe using
index IX1, or an index scan using sparse index SPR1? Both indexes retrieve the same number of index
entries and have the same cost from that point forward. For example, both indexes have the same cost to
retrieve the selected records from the dataspace, based on the retrieved entries/keys. The cost to retrieve
the index entries is the deciding criterion. In general, if index IX1 is large, an index scan over sparse
index SPR1 has a lower cost to retrieve the index entries. If index IX1 is small, an index probe over index
IX1 has the lower cost to retrieve the index entries. Another cost consideration is reusability. The plan
using sparse index SPR1 is not as reusable as the plan using index IX1 because of the static selection built
into the sparse index.
Example S12
CREATE INDEX MYLIB/IX1 on MYLIB/T1 (COL1, COL2, COL3)
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL3)
WHERE COL1=10 and COL2=20 and COL3=30
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL1=10 and COL2=20 and COL3=30
Specify PAGESIZE on index creates
You can use the PAGESIZE parameter to specify the access path logical page size used by the system
when the access path is created. Use the PAGESIZE parameter when creating keyed files or indexes using
the Create Physical File (CRTPF) or Create Logical File (CRTLF) commands, or the SQL CREATE INDEX
statement.
The logical page size is the number of bytes of the access path that can be moved from auxiliary storage
to the job storage pool in a single page fault.
Consider using the default of *KEYLEN for this parameter, except in rare circumstances. The page size is
then determined by the system based on the total length of the keys. When the access path is used
by selective queries (for example, individual key lookup), a smaller page size is typically more efficient.
When the query-selected keys are grouped in the access path with many records selected, or the access
path is scanned, a larger page size is more efficient.
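The trade-off can be made concrete with rough arithmetic. The following sketch is illustrative only; the key size and page sizes are assumed values, not system measurements:

```python
# Rough intuition for PAGESIZE (illustrative arithmetic only, not measured
# system behavior). Assume a 1,000,000-key index with a fixed entry size.
import math

keys = 1_000_000
key_size = 32                     # bytes per key entry (assumed)

def pages_for_full_scan(page_size):
    # Scanning touches every page once: bigger pages -> fewer page faults.
    return math.ceil(keys * key_size / page_size)

def bytes_for_one_probe(page_size):
    # A single key lookup faults in roughly one page per index level touched;
    # smaller pages move fewer wasted bytes into the storage pool.
    return page_size

small, large = 8 * 1024, 64 * 1024
print(pages_for_full_scan(small), pages_for_full_scan(large))   # 3907 489
print(bytes_for_one_probe(small), bytes_for_one_probe(large))   # 8192 65536
```

The larger page cuts the scan from 3907 faults to 489, but each random probe pulls in eight times as many bytes, which is the selective-lookup case where a smaller page wins.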
Related information
Create Logical File (CRTLF) command
Create Physical File (CRTPF) command
SQL Create Index statement
General index maintenance
Whenever indexes are created and used, there is a potential for a decrease in I/O velocity due to
maintenance. Therefore, consider the maintenance cost of creating and using additional indexes. For radix
indexes with MAINT(*IMMED), maintenance occurs when inserting, updating, or deleting rows.
To reduce the maintenance of your indexes consider:
v Minimizing the number of table indexes by creating composite (multiple column) key indexes.
Composite indexes can be used for multiple different situations.
v Dropping indexes during batch inserts, updates, and deletes
v Creating in parallel. Either create indexes, one at a time, in parallel using SMP or create multiple
indexes simultaneously with multiple batch jobs
v Maintaining indexes in parallel using SMP
The goal of creating indexes is to improve query performance by providing statistics and implementation
choices. Maintain a reasonable balance on the number of indexes to limit maintenance overhead.
Encoded vector indexes
An encoded vector index (EVI) is used to provide fast data access in decision support and query
reporting environments.
EVIs are a complementary alternative to existing index objects (binary radix tree structure - logical file or
SQL index) and are a variation on bitmap indexing. Because of their compact size and relative simplicity,
EVIs provide for faster scans of a table that can also be processed in parallel.
An EVI is a data structure that is stored as two components:
v The symbol table contains statistical and descriptive information about each distinct key value
represented in the table. Each distinct key is assigned a unique code, either 1 byte, 2 bytes or 4 bytes in
size.
By specifying INCLUDE on the create, additional aggregate values can be maintained in real time as an
extension of the key portion of the symbol table entry. These aggregated values are over non-key data
in the table grouped by the specified EVI key.
v The vector is an array of codes listed in the same ordinal position as the rows in the table. The vector
does not contain any pointers to the actual rows in the table.
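The two components can be sketched in a few lines of Python. This is a toy illustration of the structure only, not DB2's implementation; real symbol tables also carry descriptive statistics:

```python
# Toy EVI build (illustration of the structure, not DB2's implementation).
# The symbol table maps each distinct key to a code plus a row count; the
# vector holds one code per table row, in row order, with no row pointers.
def build_evi(column_values):
    symbol_table = {}   # key -> {"code": ..., "count": ...}
    vector = []
    for value in column_values:
        if value not in symbol_table:
            symbol_table[value] = {"code": len(symbol_table), "count": 0}
        symbol_table[value]["count"] += 1
        vector.append(symbol_table[value]["code"])
    return symbol_table, vector

region = ["EAST", "WEST", "EAST", "NORTH", "WEST", "EAST"]
symbols, vec = build_evi(region)
print(symbols["EAST"])   # {'code': 0, 'count': 3}
print(vec)               # [0, 1, 0, 2, 1, 0]
```

Because the vector holds only small fixed-size codes in row order, scanning it is much cheaper than scanning the table itself.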
Advantages of EVIs:
v Require less storage
v May have better build times than radix, especially if the number of unique values in the columns
defined for the key is relatively small.
v Provide more accurate statistics to the query optimizer
v Considerably better performance for certain grouping types of queries
v Good performance characteristics for decision support environments.
v Can be further extended for certain types of grouping queries with the addition of INCLUDE values.
Provides ready-made numeric aggregate values maintained in real time as part of index maintenance.
INCLUDE values become an extension of the EVI symbol table. Multiple include values can be
specified over different aggregating columns and maintained in the same EVI provided the group by
values are the same. This technique can reduce overall maintenance.
Disadvantages of EVIs:
v Cannot be used in ordering.
v Use for grouping is specialized. Supports:
– COUNT, DISTINCT requests over key columns
– aggregate requests over key columns where all other selection can be applied to the EVI symbol
table keys
– INCLUDE aggregates
– MIN or MAX, if the aggregating value is part of the symbol table key.
v Use with joins is always done in cooperation with hash table processing.
v Some additional maintenance idiosyncrasies.
Related reference
Encoded vector index on page 14
An encoded vector index is a permanent object that provides access to a table. This access is done by
assigning codes to distinct key values and then representing those values in a vector.
Related information
SQL Create Index statement
SQL INCLUDE statement
How the EVI works
EVIs work in different ways for costing and implementation.
For costing, the optimizer uses the symbol table to collect metadata information about the query.
For implementation, the optimizer can use the EVI in one of the following ways:
v Selection (WHERE clause)
The database engine uses the vector to build a dynamic bitmap or list of selected row ids. The bitmap
or list contains 1 bit for each row in the table. The bit is turned on for each selected row. Like a bitmap
index, these intermediate dynamic bitmaps (or lists) can be ANDed and ORed together to satisfy a
query.
For example, a user wants to see sales data for a specific region and time period. You can define an
EVI over the region and quarter columns of the table. When the query runs, the database engine
builds dynamic bitmaps using the two EVIs. The bitmaps are ANDed together to produce a single
bitmap containing only the relevant rows for both selection criteria.
This ANDing capability drastically reduces the number of rows that the system must read and test.
The dynamic bitmaps exist only as long as the query is executing. Once the query is completed, the
dynamic bitmaps are eliminated.
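The region-and-quarter example above can be sketched as follows. This is a toy model of the technique, not engine code; the codes and row data are invented for illustration:

```python
# Toy dynamic-bitmap selection over two EVI vectors (illustrative sketch).
# Each vector lists one code per row; selecting a key turns on the bit for
# every row whose code matches, and the per-index bitmaps are ANDed.
def bitmap_for(vector, code):
    return [c == code for c in vector]

def bitwise_and(a, b):
    return [x and y for x, y in zip(a, b)]

# Row order is shared by both vectors (they index the same table).
region_vec = [0, 1, 0, 2, 1, 0]    # assume code 0 = 'EAST'
quarter_vec = [3, 3, 1, 3, 1, 3]   # assume code 3 = 'Q4'

# WHERE region = 'EAST' AND quarter = 'Q4'
selected = bitwise_and(bitmap_for(region_vec, 0), bitmap_for(quarter_vec, 3))
print([i for i, bit in enumerate(selected) if bit])   # [0, 5]
```

Only rows 0 and 5 survive the AND, so only those two rows are read from the table and tested.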
v Grouping or Distinct
The symbol table within the EVI contains distinct values for the specified columns in the key
definition. The symbol table also contains a count of the number of records in the base table that have
each distinct value. Queries involving grouping or distinct, based solely on columns in the key, are
candidates for a technique that uses the symbol table directly to determine the query result.
The symbol table contains only the key values and their associated counts, unless INCLUDE is
specified. Therefore, queries involving column function COUNT are eligible for this technique. But
queries with column functions MIN or MAX on other non-key columns are not eligible. MIN and MAX
values are not stored in the symbol table.
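This symbol-table-only technique can be sketched as follows. The sketch is illustrative; real symbol table entries also hold codes and other statistics:

```python
# Toy symbol-table-only grouping (illustrative sketch, not engine code).
# Because each symbol table entry already carries a row count, a grouping
# COUNT query never has to touch the table or the vector.
symbol_table = {
    "EAST":  {"code": 0, "count": 3},
    "WEST":  {"code": 1, "count": 2},
    "NORTH": {"code": 2, "count": 1},
}

# SELECT region, COUNT(*) FROM sales GROUP BY region
result = {key: entry["count"] for key, entry in symbol_table.items()}
print(result)   # {'EAST': 3, 'WEST': 2, 'NORTH': 1}
```

A MIN or MAX over some other column cannot be answered this way because that value is simply not present in the entries.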
v EVI INCLUDE aggregates
Including additional aggregate values further extends the ability of the symbol table to provide
ready-made results. Aggregate data is grouped by the specified columns in the key definition.
Therefore, aggregate data must be over columns in the table other than those columns specified as EVI
key values.
For performance, these included aggregates are limited to numeric results (SUM, COUNT, AVG,
VARIANCE) as they can be maintained directly from the inserted or removed row.
MIN or MAX values would occasionally require other row comparisons during maintenance and
therefore are not supported with the INCLUDE keyword.
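The maintenance distinction can be sketched directly. This toy Python shows why SUM and COUNT (and hence AVG) are cheap to keep current, while MIN or MAX is not:

```python
# Why INCLUDE aggregates are limited to SUM/COUNT-style results (toy sketch,
# not engine code). SUM and COUNT can be kept current from just the inserted
# or removed row; AVG derives from them. A stored MIN cannot always be fixed
# up on delete without re-reading rows.
entry = {"sum": 0.0, "count": 0}     # aggregates for one symbol-table key

def on_insert(entry, value):
    entry["sum"] += value            # O(1): only the new row is needed
    entry["count"] += 1

def on_delete(entry, value):
    entry["sum"] -= value            # O(1): only the removed row is needed
    entry["count"] -= 1

for v in (10.0, 4.0, 6.0):
    on_insert(entry, v)
on_delete(entry, 4.0)
print(entry["sum"], entry["count"])      # 16.0 2
print(entry["sum"] / entry["count"])     # AVG = 8.0

# By contrast, if MIN were stored as 4.0 and the row holding 4.0 is deleted,
# the new minimum is unknown without scanning the remaining rows.
```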
EVI symbol table only access is used to satisfy distinct or grouping requests when the query is run
with commitment control *NONE or *CHG.
INCLUDE for additional aggregate values can be used in join queries. When possible, the existence of
EVIs with INCLUDE aggregates causes the group by process to be pushed down to each table as
necessary. See the following EVI INCLUDE grouping push down example: EVI INCLUDE aggregate
example on page 69
Related reference
Encoded vector index index-only access on page 16
The encoded vector index can also be used for index-only access.
Encoded vector index symbol table scan on page 17
An encoded vector index symbol table scan operation is used to retrieve the entries from the symbol
table portion of the index.
Encoded vector index symbol table probe on page 20
An encoded vector index symbol table probe operation is used to retrieve entries from the symbol table
portion of the index. Scanning the entire symbol table is not necessary.
Index grouping implementation on page 67
There are two primary ways to implement grouping using an index: Ordered grouping and
pre-summarized processing.
Related information
SQL INCLUDE statement
When to create EVIs
There are several instances to consider creating EVIs.
Consider creating encoded vector indexes when any one of the following is true:
v You want to gather live statistics
v Full table scan is currently being selected for the query
v Selectivity of the query is 20%-70% and using skip sequential access with dynamic bitmaps speeds up
the scan
v A star schema join is expected to be used for the query.
v Grouping or distinct queries are specified against a column, the column has few distinct values,
and only the COUNT column function, if any, is used.
v When ready-made aggregate results grouped by the specified key columns would benefit query
performance.
Create encoded vector indexes with:
v Single key columns with a low number of distinct values expected
v Key columns with low volatility (do not change often)
v Maximum number of distinct values expected using the WITH n DISTINCT VALUES clause
v Single key over foreign key columns for a star schema model
EVI with INCLUDE vs Materialized Query Tables
Although EVIs with INCLUDE are not a substitute for materialized query tables (MQTs), INCLUDE
EVIs have an advantage over single-table aggregate MQTs. The advantage is
that the ready-made aggregate results are maintained in real time, not requiring explicit REFRESH TABLE
requests. For performance and read access to aggregate results, consider turning your single table,
aggregate MQTs into INCLUDE EVIs. Keep in mind that the other characteristics of a good EVI are
applicable, such as a relatively low number of distinct key values.
As indexes, these EVIs are found during optimization just as any other indexes are found. Unlike MQTs,
there is no INI setting to enable and no second pass through the optimizer to cost the application of this
form of ready-made aggregate. In addition, EVIs with INCLUDE can be used to populate MQT summary
tables if the EVI is a match for a portion of the MQT definition.
Related reference
Encoded vector index symbol table scan on page 17
An encoded vector index symbol table scan operation is used to retrieve the entries from the symbol
table portion of the index.
Index grouping implementation on page 67
There are two primary ways to implement grouping using an index: Ordered grouping and
pre-summarized processing.
Related information
SQL INCLUDE statement
EVI maintenance
There are unique challenges to maintaining EVIs. The following table shows a progression of how EVIs
are maintained, the conditions under which EVIs are most effective, and where EVIs are least effective,
based on the EVI maintenance characteristics.
Table 50. EVI Maintenance Considerations

When inserting an existing distinct key value (most effective):
v Minimum overhead
v Symbol table key value looked up and statistics updated
v Vector element added for new row, with existing byte code
v Minimal additional path length to maintain any INCLUDEd aggregate values (the increment of a
COUNT or adding to an accumulating SUM)

When inserting a new distinct key value - in order, within byte code range:
v Minimum overhead
v Symbol table key value added, byte code assigned, statistics assigned
v Vector element added for new row, with new byte code
v Minimal additional path length to maintain any INCLUDEd aggregate values (the increment of a
COUNT or adding to an accumulating SUM)

When inserting a new distinct key value - out of order, within byte code range:
v Minimum overhead if contained within the overflow area threshold
v Symbol table key value added to the overflow area, byte code assigned, statistics assigned
v Vector element added for new row, with new byte code
v Considerable overhead if the overflow area threshold is reached
v Access path invalidated - not available
v EVI refreshed, overflow area keys incorporated, new byte codes assigned (symbol table and vector
elements updated)
v Minimal additional path length to maintain any INCLUDEd aggregate values (the increment of a
COUNT or adding to an accumulating SUM)

When inserting a new distinct key value - out of byte code range (least effective):
v Considerable overhead
v Access plan invalidated - not available
v EVI refreshed, next byte code size used, new byte codes assigned (symbol table and vector elements
updated)
v Not applicable to EVIs with INCLUDE, as by definition the maximum allowed byte code is used
Related reference
Encoded vector index on page 14
An encoded vector index is a permanent object that provides access to a table. This access is done by
assigning codes to distinct key values and then representing those values in a vector.
Related information
SQL INCLUDE statement
Recommendations for EVI use
Encoded vector indexes are a powerful tool for providing fast data access in decision support and query
reporting environments. To ensure the effective use of EVIs, use the following guidelines.
Create EVIs on
v Read-only tables or tables with a minimum of INSERT, UPDATE, DELETE activity.
v Key columns that are used in the WHERE clause - local selection predicates of SQL requests.
v Single key columns that have a relatively small set of distinct values.
v Multiple key columns that result in a relatively small set of distinct values.
v Key columns that have a static or relatively static set of distinct values.
v Non-unique key columns, with many duplicates.
Create EVIs with the maximum byte code size expected
v Use the WITH n DISTINCT VALUES clause on the CREATE ENCODED VECTOR INDEX statement.
v If unsure, use a number greater than 65,535 to create a 4 byte code. This method avoids the EVI
maintenance involved in switching byte code sizes.
v EVIs with INCLUDE are always created with a 4 byte code.
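The guideline above can be sketched as a simple threshold rule. The 65,535 boundary for a 2 byte code follows the text; the 255 limit for a 1 byte code is an assumed value for illustration:

```python
# Choosing an EVI byte-code size (illustrative sketch). A 2-byte code covers
# up to 65,535 distinct values, per the guideline above; the 255 limit for a
# 1-byte code is an assumption for this sketch.
def code_size_bytes(expected_distinct_values):
    if expected_distinct_values <= 255:
        return 1
    if expected_distinct_values <= 65_535:
        return 2
    return 4

print(code_size_bytes(50), code_size_bytes(1_000), code_size_bytes(100_000))
# 1 2 4
```

Declaring WITH n DISTINCT VALUES with n above 65,535 therefore fixes the code at 4 bytes from the start, avoiding a disruptive code-size switch later.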
When loading data
v Drop EVIs, load data, create EVIs.
v EVI byte code size is assigned automatically based on the number of actual distinct key values found
in the table.
v Symbol table contains all key values, in order, no keys in overflow area.
v EVIs with INCLUDE always use 4 byte code
Consider adding INCLUDE values to existing EVIs
An EVI index with INCLUDE values can be used to supply ready-made aggregate results. The existing
symbol table and vector are still used for table selection, when appropriate, for skip sequential plans over
large tables, or for index ANDing and ORing plans. If you already have EVIs, consider creating new ones
with additional INCLUDE values, and then drop the pre-existing index.
Consider specifying multiple INCLUDE values on the same EVI create
If you need different aggregates over different table values for the same GROUP BY columns specified as
EVI keys, define those aggregates in the same EVI. This definition cuts down on maintenance costs and
allows for a single symbol table and vector.
For example:
Select SUM(revenue) from sales group by Country
Select SUM(costOfGoods) from sales group by Country, Region
Both queries could benefit from the following EVI:
CREATE ENCODED VECTOR INDEX eviCountryRegion on Sales(country,region)
INCLUDE(SUM(revenue), SUM(costOfGoods))
The optimizer does additional grouping (regrouping) if the EVI key values are wider than the
corresponding GROUP BY request of the query. This additional grouping would be the case in the first
example query.
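Regrouping can be sketched as follows. The country and revenue figures are invented for illustration; the point is that the ready-made (country, region) sums are re-aggregated rather than re-reading the table:

```python
# Regrouping sketch (illustrative, not engine code). The EVI keys are
# (country, region), but the first example query groups by country alone,
# so the optimizer can re-aggregate the ready-made INCLUDE sums.
from collections import defaultdict

# Ready-made SUM(revenue) results, one entry per (country, region) key.
evi_sums = {
    ("US", "East"): 100.0,   # invented figures
    ("US", "West"): 250.0,
    ("CA", "East"): 40.0,
}

# SELECT SUM(revenue) FROM sales GROUP BY Country, via regrouping.
by_country = defaultdict(float)
for (country, _region), revenue_sum in evi_sums.items():
    by_country[country] += revenue_sum
print(dict(by_country))   # {'US': 350.0, 'CA': 40.0}
```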
If an aggregate request is specified over a null capable result, an implicit COUNT over that same result is
included as part of the symbol table entry. The COUNT is used to facilitate index maintenance when a
requested aggregate must account for null values. It can also assist with pushing aggregation through a
join if the optimizer determines this push is possible. The COUNT is then used to help compensate for
the reduced join activity caused by the pushed-down grouping.
Consider SMP and parallel index creation and maintenance
Symmetrical Multiprocessing (SMP) is a valuable tool for building and maintaining indexes in parallel.
The results of using the optional SMP feature of i5/OS are faster index build times, and faster I/O
velocities while maintaining indexes in parallel. Using an SMP degree value of either *OPTIMIZE or
*MAX, additional multiple tasks and additional system resources are used to build or maintain the
indexes. With a degree value of *MAX, expect linear scalability on index creation. For example, creating
indexes on a 4-processor system can be four times as fast as on a 1-processor system.
Checking values in the overflow area
You can also use the Display File Description (DSPFD) command (or System i Navigator - Database) to
check how many values are in the overflow area. Once the DSPFD command is issued, check the
overflow area parameter for details on the initial and actual number of distinct key values in the
overflow area.
Using CHGLF to rebuild the access path of an index
Use the Change Logical File (CHGLF) command with the attribute Force Rebuild Access Path set to YES
(FRCRBDAP(*YES)). This command accomplishes the same thing as dropping and recreating the index,
but it does not require that you know about how the index was built. This command is especially
effective for applications where the original index definitions are not available, or for refreshing the
access path.
Related information
SQL Create Index statement
SQL INCLUDE statement
Change Logical File (CHGLF) command
Display File Description (DSPFD) command
Comparing binary radix indexes and encoded vector indexes
DB2 for IBM i makes indexes a powerful tool.
The following table summarizes some of the differences between binary radix indexes and encoded
vector indexes:
Table 51. Comparison of radix and EVI indexes (Binary Radix Indexes / Encoded Vector Indexes)
v Basic data structure: a wide, flat tree / a symbol table and a vector
v Interface for creating: Command, SQL, System i Navigator / SQL, System i Navigator
v Can be created in parallel: Yes / Yes
v Can be maintained in parallel: Yes / Yes
v Used for statistics: Yes / Yes
v Used for selection: Yes / Yes, with dynamic bitmaps or RRN list
v Used for joining: Yes / Yes (with a hash table)
v Used for grouping: Yes / Yes
v Used for ordering: Yes / No
v Used to enforce unique Referential Integrity constraints: Yes / No
v Source for predetermined or ready-made numeric aggregate results: No / Yes, with INCLUDE
keyword option on create
Indexes & the optimizer
Since the optimizer uses cost-based optimization, more information about the database rows and columns
enables it to create a more efficient access plan for the query. With the information from the indexes, the
optimizer can make better choices about how to process the request (local selection, joins, grouping, and
ordering).
The CQE optimizer attempts to examine most, if not all, indexes built over a table unless it times out.
However, the SQE optimizer considers only those indexes that are returned by the Statistics Manager.
These include only indexes that the Statistics Manager decides are useful in performing local selection
based on the WHERE clause predicates. Consequently, the SQE optimizer does not time out.
The primary goal of the optimizer is to choose an implementation that efficiently eliminates the rows that
are not interesting or required to satisfy the request. Normally, query optimization is thought of as trying
to find the rows of interest. A proper indexing strategy assists the optimizer and database engine with
this task.
Instances where an index is not used
DB2 for i does not use indexes in certain instances.
v For a column that is expected to be updated; for example, when using SQL, your program might
include the following:
EXEC SQL
DECLARE DEPTEMP CURSOR FOR
SELECT EMPNO, LASTNAME, WORKDEPT
FROM CORPDATA.EMPLOYEE
WHERE (WORKDEPT = 'D11' OR
WORKDEPT = 'D21') AND
EMPNO = '000190'
FOR UPDATE OF EMPNO, WORKDEPT
END-EXEC.
When using the OPNQRYF command, for example:
OPNQRYF FILE((CORPDATA/EMPLOYEE)) OPTION(*ALL)
QRYSLT('(WORKDEPT *EQ ''D11'' *OR WORKDEPT *EQ ''D21'')
*AND EMPNO *EQ ''000190''')
Even if you do not intend to update the employee department, the system cannot use an index with a
key of WORKDEPT.
An index can be used if all the index updatable columns are also used within the query as an
isolatable selection predicate with an equal operator. In the previous example, the system uses an index
with a key of EMPNO.
The system can operate more efficiently if the FOR UPDATE OF column list only names the column
you intend to update: WORKDEPT. Therefore, do not specify a column in the FOR UPDATE OF
column list unless you intend to update the column.
If you have an updatable cursor because of dynamic SQL, or FOR UPDATE was not specified and the
program contains an UPDATE statement, then all columns can be updated.
v For a column being compared with another column from the same row. For example, when using SQL,
your program might include the following:
EXEC SQL
DECLARE DEPTDATA CURSOR FOR
SELECT WORKDEPT, DEPTNAME
FROM CORPDATA.EMPLOYEE
WHERE WORKDEPT = ADMRDEPT
END-EXEC.
When using the OPNQRYF command, for example:
OPNQRYF FILE (EMPLOYEE) FORMAT(FORMAT1)
QRYSLT('WORKDEPT *EQ ADMRDEPT')
Even though there is an index for WORKDEPT and another index for ADMRDEPT, DB2 for i does not
use either index. The index has no added benefit because every row of the table needs to be looked at.
Display indexes for a table
You can display indexes that are created on a table using System i Navigator.
To display indexes for a table, follow these steps:
1. In the System i Navigator window, expand the system that you want to use.
2. Expand Databases and the database that you want to work with.
3. Expand Schemas and the schema that you want to work with.
4. Right-click a table and select Show Indexes.
The Show index window includes the following columns:
Table 52. Columns used in Show index window
Column name Description
Name The SQL name for the index
Type The type of index displayed. Possible values are:
v Keyed Physical File
v Keyed Logical File
v Primary Key Constraint
v Unique Key Constraint
v Foreign Key Constraint
v Index
Schema Schema or library containing the index or access path
Owner User ID of the owner of this index or access path
System Name System table name for the index or access path.
Text The text description of the index or access path
Index partition Partition detail for the index. Possible values:
v <blank>, For all partitions
v For Each Partition
v specific name of the partition
Valid Whether the access path or index is valid. The possible values are
Yes or No.
Creation Date The timestamp of when the index was created.
Last Build The last time that the access path or index was rebuilt.
Last Query Use Timestamp when the access path was last used by the optimizer.
Last Query Statistics Use Timestamp when the access path was last used for statistics
Query Use Count Number of times the access path has been used for a query
Query Statistics Use Count Number of times the access path has been used for statistics
Last Used Date Timestamp when the access path or index was last used.
Days Used Count The number of days the index has been used.
Date Reset Days Used Count The year and date when the days-used count was last set to 0.
Number of Key Columns The number of key columns defined for the access path or index.
Key Columns The key columns defined for the access path or index.
Current Key Values The number of current key values.
Current Size The size of the access path or index.
Current Allocated Pages The current number of pages allocated for the access path or index.
Logical Page Size The number of bytes used for the access path or the logical page
size of the index. Indexes with larger logical page sizes are typically
more efficient when scanned during query processing. Indexes with
smaller logical page sizes are typically more efficient for simple
index probes and individual key look ups. If the access path or
index is an encoded vector, the value 0 is returned.
Duplicate Key Order How the access path or index handles duplicate key values.
Possible values are:
v Unique - all values are unique.
v Unique where not null - all values are unique unless null is
specified.
Maximum Key Length The maximum key length for the access path or index.
Unique Partial Key Values The number of unique partial keys for the key fields 1 through 4. If
the access path is an encoded vector, this number represents the
number of full key distinct values.
Overflow Values The number of overflow values for this encoded vector index.
Key Code Size The length of the code assigned to each distinct key value of the
encoded vector index.
Sparse Is the index considered sparse. Sparse indexes only contain keys for
rows that satisfy the query. Possible values are:
v Yes
v No
Derived Key Is the index considered derived. A derived key is a key that is the
result of an operation on the base column. Possible values are:
v Yes
v No
Partitioned Is the index partition created for each data partition defined for the
table using the specified columns. Possible values are:
v Yes
v No
Maximum Size The maximum size of the access path or index.
Sort Sequence The alternate character sorting sequence for National Language
Support (NLS).
Language Identifier The language code for the object.
Estimated Rebuild Time The estimated time in seconds required to rebuild the access path
or index.
Held Is a rebuild of an access path or index held. Possible values are:
v Yes
v No
Maintenance For objects with key fields or join logical files, the type of access
path maintenance used. The possible values are:
v Do not wait
v Delayed
v Rebuild
Delayed Maintenance Keys The number of delayed maintenance keys for the access path or
index.
Recovery When the access path is rebuilt after damage to the access path is
recognized. The possible values are:
v After IPL
v During IPL
v Next Open
Index Logical Reads The number of access path or index logical read operations since
the last IPL.
WHERE Clause Specifies the condition to apply for a row to be included in the
index.
WHERE Clause Has UDF Does the WHERE clause have a UDF. Possible values are:
v Yes
v No
Table Table name of the table that the index is based on.
Table Partition Partition name of the table that the index is based on.
Table System Name System name of the table that the index is based on.
Last Rebuild Number Keys Number of keys in the index when the index was last rebuilt.
Last Rebuild Parallel Degree Parallel degree used when the index was last rebuilt.
Last Rebuild Time Amount of time in seconds it took to rebuild the index the last time
the index was rebuilt.
Keep in Memory Is the index kept in memory. Possible values are:
v Yes
v No
Sort Sequence Schema Schema of the sort sequence table if one is used.
Sort Sequence Name Name of the sort sequence table if one is used.
Random Reads The number of reads that have occurred in a random fashion.
Random means that the location of the row or key could not be
predicted ahead of time.
Media Preference Indicates preference whether the storage for the table, partition, or
index is allocated on Solid State Disk (SSD), if available.
Determine unnecessary indexes
You can easily determine which indexes are being used for query optimization.
Before V5R3, it was difficult to determine unnecessary indexes. Using the Last Used Date was not
dependable, as it was only updated when the logical file was opened using a native database application
(for example, an RPG application). Furthermore, it was difficult to find all the indexes over a physical
file. Indexes are created as part of a keyed physical file, keyed logical file, join logical file, SQL index,
primary key or unique constraint, or referential constraint. However, System i Navigator and i5/OS
functionality now make it easy to find all indexes over a table and to retrieve statistics on index
usage, including how often an index is used in a query. These statistics can assist you in tuning
performance.
To access index information through the System i Navigator, navigate to Database > Schemas > Tables.
Right-click your table and select Show Indexes.
You can show all indexes for a schema by right-clicking on Tables or Indexes and selecting Show
indexes.
Note: You can also view the statistics through the Retrieve Member Description (QUSRMBRD) API.
Certain fields available in the Show Indexes window can help you to determine any unnecessary
indexes. Those fields are:
Last Query Use
States the timestamp when the index was last used to retrieve data for a query.
Last Query Statistic Use
States the timestamp when the index was last used to provide statistical information.
Query Use Count
Lists the number of instances the index was used in a query.
Query Statistics Use
Lists the number of instances the index was used for statistical information.
Last Used Date
The century and date this index was last used.
Days Used Count
The number of days the index was used. If the index does not have a last used date, the count is
0.
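If you prefer an SQL interface, similar usage information can be retrieved from the database catalog. The following is a hedged sketch only: it assumes the QSYS2.SYSINDEXSTAT catalog view and the column names shown, which should be verified against your release before use.

```sql
-- Assumed catalog view and column names; verify on your release.
SELECT INDEX_NAME, LAST_QUERY_USE, LAST_STATISTICS_USE,
       QUERY_USE_COUNT, QUERY_STATISTICS_COUNT
FROM QSYS2.SYSINDEXSTAT
WHERE TABLE_SCHEMA = 'MYLIB'
  AND TABLE_NAME = 'MYTABLE'
```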
Date Reset Days Used Count
The date that the days used count was last reset. You can reset the days used count by
using the Change Object Description (CHGOBJD) command.
The fields start and stop counting based on your situation, or the actions you are currently performing on
your system. The following list describes what might affect one or both of your counters:
v The SQE and CQE query engines increment both counters. As a result, the statistics field is updated
regardless of which query interface is used.
v A save and restore procedure does not reset the statistics counter if the index is restored over an
existing index. If an index is restored that does not exist on the system, the statistics are reset.
Related information
Retrieve Member Description (QUSRMBRD) API
Change Object Description (CHGOBJD) command
Reset usage counts
Resetting the usage counts for a table allows you to determine how the changes you made to your
indexing strategy affected the indexes and constraints on that table. For example, if your new strategy
causes an index to never be used, you could then delete that index. Resetting usage counts on a table
affects all indexes and constraints that are created on that object.
Note: Resetting usage counts for a keyed physical file or a constraint in the Show Indexes window resets
the counts of all constraints and keyed access for that file or table.
You can reset index usage counts by right-clicking a specific index in the Indexes folder or in the Show
Indexes dialog and selecting Reset Usage Counts.
View index build status
You can view a list of indexes that are being built by the database. This view might be helpful in
determining when the index becomes usable to your applications.
To display indexes that are being built, follow these steps:
1. In the System i Navigator window, expand the system that you want to use.
2. Expand Databases.
3. Expand the database that you want to work with and then expand the Database Maintenance folder.
Select Index Builds.
Manage index rebuilds
You can manage the rebuild of your indexes using System i Navigator. You can view a list of access paths
that are rebuilding and either hold the access path rebuild or change the priority of a rebuild.
To display access paths to rebuild, follow these steps:
1. In the System i Navigator window, expand the system that you want to use.
2. Expand Databases.
3. Expand the database that you want to work with and then expand the Database Maintenance folder.
Select Index Rebuilds.
The access paths to rebuild dialog includes the following columns:
Table 53. Columns used in Index rebuilds window
Column name Description
Name Name of access path being rebuilt.
Schema Schema name where the index is located.
System Name The system name of the file that owns the index to be rebuilt.
System Schema System schema name of access path being rebuilt.
Type The type of index displayed. Possible values are:
Keyed Physical File
Keyed Logical File
Primary Key
Unique Key
Foreign Key
Index
Status Displays the status of the rebuild. Possible values are:
1-99: Rebuild priority
Running: Rebuilding
Held: Held from being rebuilt
Rebuild Priority Displays the priority in which the rebuild for this access path is
run. Also referred to as sequence number. Possible values are:
1-99: Order to rebuild
Held
Open
Rebuild Reason Displays the reason why this access path needs to be rebuilt.
Possible values are:
Create or build index
IPL
Runtime error
Change file or index sharing
Other
Not needed
Change End of Data
Restore
Alter table
Change table
Change file
Reorganize
Enable a constraint
Alter table recovery
Change file recovery
Index shared
Runtime error
Verify constraint
Convert member
Restore recovery
Rebuild Reason Subtype Displays the subtype reason why this access path needs to be
rebuilt. Possible values are:
Unexpected error
Index in use during failure
Unexpected error during update, delete, or insert
Delayed maintenance overflow or catch-up error
Other
No event
Change End of Data
Delayed maintenance mismatch
Logical page size mismatch
Partial index restore
Index conversion
Index not saved and restored
Partitioning mismatch
Partitioning change
Index or key attributes change
Original index invalid
Index attributes change
Force rebuild of index
Index not restored
Asynchronous rebuilds requested
Job ended abnormally
Alter table
Change constraint
Index invalid or attributes change
Invalid unique index found
Invalid constraint index found
Index conversion required
If there is no subtype, this field displays 0.
Invalidation Reason Displays the reason why this access path was invalidated. Possible
values are:
User requested (See Invalidation Reason type for more
information)
Create or build Index
Load (See Invalidation Reason type for more information)
Initial Program Load (IPL)
Runtime error
Modify
Journal failed to build the index
Marked index as fixable during runtime
Marked index as fixable during IPL
Change end of data
Invalidation Reason Type Displays the reason type for why this access path was invalidated.
Possible reason types for User requested:
Invalid because of REORG
It is a copy
Alter file
Converting new member
Change to *FRCRBDAP
Change to *UNIQUE
Change to *REBLD
Possible reason types for Load:
The index was marked for invalidation but the system
crashed before the invalidation could actually occur
The index was associated with the overlaid data space header
during a load, therefore it was invalidated
Index was in IMPI format. The header was converted and
now it is invalidated to be rebuilt in RISC format
The RISC index was converted to V5R1 format
Index invalidated due to partial load
Index invalidated due to a delayed maintenance mismatch
Index invalidated due to a pad key mismatch
Index invalidated due to a significant fields bitmap fix
Index invalidated due to a logical page size mismatch
Index was not restored. File might have been saved with
ACCPTH(*NO) or index did not exist when file was saved.
Index was rebuilt because file was saved in an inconsistent
state with SAVACT(*SYSDFN).
For other invalidation codes, this field displays 0.
Estimated Rebuild Time Estimated amount of time in seconds that it takes to rebuild the
index access path.
Rebuild Start Time Time when the rebuild was started.
Elapsed Rebuild Time Amount of time that has elapsed in seconds since the start of the
rebuild of the access path.
Unique Indicates whether the rows in the access path are unique. Possible
values are:
v Yes
v No
Last Query Use Timestamp when the access path was last used
Last Query Statistics Use Timestamp when the access path was last used for statistics
Query Use Count Number of times the access path has been used for a query
Query Statistics Use Count Number of times the access path has been used for statistics
Partition Partition detail for the index. Possible values:
v <blank>, which means for all partitions
v For Each Partition
v The specific name of the partition
Owner User ID of the owner of this access path.
Parallel Degree Number of processors to be used to rebuild the index.
Text Text description of the file owning the index.
You can also use the Edit Rebuild of Access Paths (EDTRBDAP) command to manage rebuilding of access
paths.
Related information
Rebuild access paths
Edit Rebuild of Access Paths (EDTRBDAP) command
Indexing strategy
There are two approaches to index creation: proactive and reactive. Proactive index creation involves
anticipating which columns are most often used for selection, joining, grouping, and ordering, and
then building indexes over those columns. In the reactive approach, indexes are created based on
optimizer feedback, the query implementation plan, and system performance measurements.
It is useful to initially build indexes based on the database model and applications and not any particular
query. As a starting point, consider designing basic indexes based on the following criteria:
v Primary and foreign key columns based on the database model
v Commonly used local selection columns, including columns that are dependent, such as an
automobile's make and model
v Commonly used join columns not considered primary or foreign key columns
v Commonly used grouping columns
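As a sketch of this starting point, the following statements build a basic index set for a hypothetical ORDERS table; all table, column, and index names are invented for the example:

```sql
-- Hypothetical table: ORDERS(ORDER_ID, CUST_ID, STATUS, ORDER_DATE)
-- Primary and foreign key columns from the database model:
CREATE INDEX ORDERS_PK_IX ON ORDERS (ORDER_ID)
CREATE INDEX ORDERS_FK_IX ON ORDERS (CUST_ID)

-- Commonly used local selection column:
CREATE INDEX ORDERS_STS_IX ON ORDERS (STATUS)

-- Commonly used grouping and ordering column:
CREATE INDEX ORDERS_DT_IX ON ORDERS (ORDER_DATE)
```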
Related information
Indexing and statistics strategies for DB2 for i5/OS
Reactive approach to tuning
To perform reactive tuning, build a prototype of the proposed application without any indexes and start
running some queries. Or you could build an initial set of indexes and start running the application to
see which ones get used and which do not. Even with a smaller database, the slow-running queries
become obvious quickly.
The reactive tuning method is also used when trying to understand and tune an existing application that
is not performing up to expectations. Use the appropriate debugging and monitoring tools, described in
the next section, to view the database feedback messages:
v the indexes the optimizer recommends for local selection
v the temporary indexes used for a query
v the query implementation methods the optimizer chose
If the database engine is building temporary indexes to process joins or perform grouping and selection
over permanent tables, build permanent indexes over the same columns to eliminate the temporary
index creation. In some cases, a temporary index is built over a temporary table, so a permanent index
cannot be built for those tables. You can use the optimization tools listed in the previous section to
note the temporary index creation, the reason it was created, and the key columns.
Proactive approach to tuning
Typically you create an index for the most selective columns and create statistics for the least
selective columns in a query. Creating an index tells the optimizer that the column is selective, and it
gives the optimizer the ability to use that index to implement the query.
In a perfect radix index, the order of the columns is important. In fact, it can make a difference as to
whether the optimizer uses it for data retrieval at all. As a general rule, order the columns in an index in
the following way:
v Equal predicates first. That is, any predicate that uses the = operator may narrow down the range of
rows the fastest and should therefore be first in the index.
v If all predicates have an equal operator, then order the columns as follows:
Selection predicates + join predicates
Join predicates + selection predicates
Selection predicates + group by columns
Selection predicates + order by columns
In addition to the guidelines above, in general, the most selective key columns should be placed first in
the index.
Consider the following SQL statement:
SELECT b.col1, b.col2, a.col1
FROM table1 a, table2 b
WHERE b.col1='some_value' AND
b.col2=some_number AND
a.join_col=b.join_col
GROUP BY b.col1, b.col2, a.col1
ORDER BY b.col1
With a query like this, the proactive index creation process can begin. The basic rules are:
v Custom-build a radix index for the largest or most commonly used queries. Example using the query
above:
radix index over join column(s) - a.join_col and b.join_col
radix index over most commonly used local selection column(s) - b.col2
v For ad hoc online analytical processing (OLAP) environments or less frequently used queries, build
single-key EVIs over the local selection column(s) used in the queries. Example using the query above:
EVI over non-unique local selection columns - b.col1 and b.col2
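Applied to the query above, these rules might translate into index definitions like the following sketch (the index names are invented for the example):

```sql
-- Radix indexes over the join columns:
CREATE INDEX A_JOIN_IX ON TABLE1 (JOIN_COL)
CREATE INDEX B_JOIN_IX ON TABLE2 (JOIN_COL)

-- Radix index over the most commonly used local selection column:
CREATE INDEX B_COL2_IX ON TABLE2 (COL2)

-- Single-key EVIs over the non-unique local selection columns:
CREATE ENCODED VECTOR INDEX B_COL1_EVI ON TABLE2 (COL1)
CREATE ENCODED VECTOR INDEX B_COL2_EVI ON TABLE2 (COL2)
```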
Coding for effective indexes
The following topics provide suggestions to help you design code that allows DB2 for i to take
advantage of available indexes:
Avoid numeric conversions
When a column value and a host variable (or constant value) are being compared, try to specify the same
data types and attributes. DB2 for i might not use an index for the named column if the host variable or
constant value has a greater precision than the precision of the column. If the two items being compared
have different data types, DB2 for i needs to convert one or the other of the values, which can result in
inaccuracies (because of limited machine precision).
To avoid problems for columns and constants being compared, use the following:
v same data type
v same scale, if applicable
v same precision, if applicable
For example, EDUCLVL is a halfword integer value (SMALLINT). When using SQL, specify:
... WHERE EDUCLVL < 11 AND
EDUCLVL >= 2
instead of
... WHERE EDUCLVL < 1.1E1 AND
EDUCLVL > 1.3
When using the OPNQRYF command, specify:
... QRYSLT('EDUCLVL *LT 11 *AND EDUCLVL *GE 2')
instead of
... QRYSLT('EDUCLVL *LT 1.1E1 *AND EDUCLVL *GT 1.3')
If an index was created over the EDUCLVL column, the optimizer might not use the index in the
second example, because the constant precision is greater than the column precision and DB2 for i
attempts to convert the constant to the precision of the column. In the first example, the optimizer
considers using the index, because the precisions are equal.
Avoid arithmetic expressions
Do not use an arithmetic expression as an operand to compare to a column in a row selection predicate.
The optimizer does not use an index on a column compared to an arithmetic expression. While this
technique might not cause the column index to become unusable, it prevents any estimates and possibly
the use of index scan-key positioning. The primary thing that is lost is the ability to use and extract any
statistics that might be useful in the optimization of the query.
For example, when using SQL, specify the following:
... WHERE SALARY > 16500
instead of
... WHERE SALARY > 15000*1.1
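The same principle applies when the value is not known until run time: perform the arithmetic in the application and pass the result through a host variable. In this sketch, :raise_limit is an invented host variable assumed to already hold the computed value of 15000*1.1:

```sql
SELECT * FROM EMPLOYEE
WHERE SALARY > :raise_limit
```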
Avoid character string padding
Try to use the same data length when comparing a fixed-length character string column value to a host
variable or constant value. DB2 for i might not use an index if the constant value or host variable is
longer than the column length.
For example, EMPNO is CHAR(6) and DEPTNO is CHAR(3). For example, when using SQL, specify the
following:
... WHERE EMPNO > '000300' AND
DEPTNO < 'E20'
instead of
... WHERE EMPNO > '000300 ' AND
DEPTNO < 'E20 '
When using the OPNQRYF command, specify:
... QRYSLT('EMPNO *GT "000300" *AND DEPTNO *LT "E20"')
instead of
... QRYSLT('EMPNO *GT "000300 " *AND DEPTNO *LT "E20 "')
Avoid the use of LIKE patterns beginning with % or _
The percent sign (%) and the underline (_), when used in the pattern of a LIKE (OPNQRYF
%WLDCRD) predicate, specify a character string that is similar to the column value of the rows to
select. They can take advantage of indexes when used to denote characters in the middle or at the
end of a character string.
For example, when using SQL, specify the following:
... WHERE LASTNAME LIKE 'J%SON%'
When using the OPNQRYF command, specify the following:
... QRYSLT('LASTNAME *EQ %WLDCRD(''J*SON*'')')
However, when used at the beginning of a character string, they can prevent DB2 for i from using any
indexes that might be defined on the LASTNAME column to limit the number of rows scanned using
index scan-key positioning. Index scan-key selection, however, is allowed. For example, in the following
queries index scan-key selection can be used, but index scan-key positioning cannot.
In SQL:
... WHERE LASTNAME LIKE '%SON'
In OPNQRYF:
... QRYSLT('LASTNAME *EQ %WLDCRD(''*SON'')')
Avoid patterns that begin with % so that you get the best performance from key processing on the
predicate. If possible, provide a partial string to search so that index scan-key positioning can be used.
For example, if you were looking for the name Smithers but typed only S%, the query returns all
names starting with S. Adjust the query to return all names starting with Smi%. By forcing the use of
partial strings, you might get better performance in the long term.
Using derived indexes
SQL indexes can be created where the key is specified as an expression. This type of key is also referred
to as a derived key.
For example, look at the following:
CREATE INDEX TOTALIX ON EMPLOYEE(SALARY+BONUS+COMM AS TOTAL)
In this example, return all the employees whose total compensation is greater than 50000.
SELECT * FROM EMPLOYEE
WHERE SALARY+BONUS+COMM > 50000
ORDER BY SALARY+BONUS+COMM
The optimizer can use the index TOTALIX with an index probe to satisfy both the WHERE selection
and the ordering criteria.
Some special considerations for derived key index usage and matching include:
v There is no matching for index key constants to query host variables. This non-match includes implicit
parameter marker conversion performed by the database manager.
CREATE INDEX D_IDX1 ON EMPLOYEE (SALARY/12 AS MONTHLY)
In this example, return all employees whose monthly salary is greater than 3000.
long months = 12;
EXEC SQL SELECT * FROM EMPLOYEE WHERE SALARY/:months > 3000
However, in this case the optimizer does not use the index since there is no support for matching the
host variable value months in the query to the constant 12 in the index.
The QAQQINI option PARAMETER_MARKER_CONVERSION with value *NO can be used to prevent
conversion of constants to parameter markers. This technique allows for improved derived index key
matching. However, because of the performance implications of this QAQQINI setting, use it with
care.
v In general, expressions in the index must match the expression in the query:
.... WHERE SALARY+COMM+BONUS > 50000
In this case, the expression SALARY+COMM+BONUS in the WHERE clause is different from the
index key SALARY+BONUS+COMM and would not match.
v It is recommended that derived index keys be kept as simple as possible. The more complex the
query expression and the index key expression are, the less likely it is that the index is used.
v The CQE optimizer has limited support for matching derived key indexes.
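As a hedged sketch of how the QAQQINI option mentioned above is typically set: duplicate the shipped options file into your own library, update the row for the option, and point the query attributes at that library. The column names QQPARM and QQVAL are the standard QAQQINI columns, but verify the exact procedure for your release:

```
CRTDUPOBJ OBJ(QAQQINI) FROMLIB(QSYS) OBJTYPE(*FILE) TOLIB(MYLIB) DATA(*YES)

UPDATE MYLIB/QAQQINI SET QQVAL = '*NO'
WHERE QQPARM = 'PARAMETER_MARKER_CONVERSION'

CHGQRYA QRYOPTLIB(MYLIB)
```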
Related reference
Derived key index on page 194
You can use the SQL CREATE INDEX statement to create a derived key index using an SQL expression.
Related information
SQL Create Index statement
Using sparse indexes
SQL indexes can be created using WHERE selection predicates. These indexes can also be referred to as
sparse indexes. The advantage of a sparse index is that fewer entries are maintained in the index. Only
those entries matching the WHERE selection criteria are maintained in the index.
In general, the query WHERE selection must be a subset of the sparse index WHERE selection in order
for the sparse index to be used.
Here is a simple example of when a sparse index can be used:
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL3)
WHERE COL1=10 and COL2=20
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL1=10 and COL2=20 and COL3=30
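Conversely, here is a counter-example based on the same sparse index: the index cannot be used for the following query, because the query selection does not imply the index WHERE selection (rows where COL2 is not 20 are absent from the index):

```sql
SELECT COL1, COL3
FROM MYLIB/T1
WHERE COL1=10 and COL3=30
```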
It is recommended that the WHERE selection in the sparse index be kept as simple as possible. The
more complex the WHERE selection, the more difficult it becomes to match the sparse index WHERE
selection to the query WHERE selection, and the less likely it is that the sparse index is used. The
CQE optimizer does not support sparse indexes; it does, however, support select/omit logical files.
The SQE optimizer matches the CQE optimizer in its support for select/omit logical files and has
nearly full support for sparse indexes.
Related reference
Sparse indexes on page 195
You can use the SQL CREATE INDEX statement to create a sparse index using SQL selection predicates.
Related information
SQL Create Index statement
Using indexes with sort sequence
The following sections provide useful information about how indexes work with sort sequence tables.
Using indexes and sort sequence with selection, joins, or grouping
Before using an existing index, DB2 for i ensures the attributes of the columns (selection, join, or
grouping columns) match the attributes of the key columns in the existing index. The sort sequence table
is an additional attribute that must be compared.
The query sort sequence table (specified by the SRTSEQ and LANGID) must match the index sort
sequence table. DB2 for i compares the sort sequence tables. If they do not match, the existing index
cannot be used.
There is an exception to this rule, however. If the sort sequence table associated with the query is a
unique-weight sequence table (including *HEX), DB2 for i acts as though no sort sequence table is
specified for selection, join, or grouping columns that use the following operators and predicates:
v equal (=) operator
v not equal (^= or <>) operator
v LIKE predicate (OPNQRYF %WLDCRD and *CT)
v IN predicate (OPNQRYF %VALUES)
When these conditions are true, DB2 for i is free to use any existing index where the key columns match
the columns and either:
v The index does not contain a sort sequence table or
v The index contains a unique-weight sort sequence table
Note:
1. The table does not need to match the unique-weight sort sequence table associated with the
query.
2. Bitmap processing has a special consideration when multiple indexes are used for a table. If
two or more indexes have a common key column referenced in the query selection, then those
indexes must either use the same sort sequence table or no sort sequence table.
Using indexes and sort sequence with ordering
Unless the optimizer chooses a sort to satisfy the ordering request, the index sort sequence table must
match the query sort sequence table.
When a sort is used, the translation is done during the sort. Since the sort is handling the sort sequence
requirement, this technique allows DB2 for i to use any existing index that meets the selection criteria.
Index examples
The following index examples are provided to help you create effective indexes.
For the purposes of the examples, assume that three indexes are created.
Assume that an index HEXIX was created with *HEX as the sort sequence.
CREATE INDEX HEXIX ON STAFF (JOB)
Assume that an index UNQIX was created with a unique-weight sort sequence.
CREATE INDEX UNQIX ON STAFF (JOB)
Assume that an index SHRIX was created with a shared-weight sort sequence.
CREATE INDEX SHRIX ON STAFF (JOB)
Index example: Equal selection with no sort sequence table
Equal selection with no sort sequence table (SRTSEQ(*HEX)).
SELECT * FROM STAFF
WHERE JOB = 'MGR'
When using the OPNQRYF command, specify:
OPNQRYF FILE((STAFF))
QRYSLT('JOB *EQ ''MGR''')
SRTSEQ(*HEX)
The system can use either index HEXIX or index UNQIX.
Index example: Equal selection with a unique-weight sort sequence table
Equal selection with a unique-weight sort sequence table (SRTSEQ(*LANGIDUNQ) LANGID(ENU)).
SELECT * FROM STAFF
WHERE JOB = 'MGR'
When using the OPNQRYF command, specify:
OPNQRYF FILE((STAFF))
QRYSLT('JOB *EQ ''MGR''')
SRTSEQ(*LANGIDUNQ) LANGID(ENU)
The system can use either index HEXIX or index UNQIX.
Index example: Equal selection with a shared-weight sort sequence table
Equal selection with a shared-weight sort sequence table (SRTSEQ(*LANGIDSHR) LANGID(ENU)).
SELECT * FROM STAFF
WHERE JOB = 'MGR'
When using the OPNQRYF command, specify:
OPNQRYF FILE((STAFF))
QRYSLT('JOB *EQ ''MGR''')
SRTSEQ(*LANGIDSHR) LANGID(ENU)
The system can only use index SHRIX.
Index example: Greater than selection with a unique-weight sort sequence table
Greater than selection with a unique-weight sort sequence table (SRTSEQ(*LANGIDUNQ)
LANGID(ENU)).
SELECT * FROM STAFF
WHERE JOB > 'MGR'
When using the OPNQRYF command, specify:
OPNQRYF FILE((STAFF))
QRYSLT('JOB *GT ''MGR''')
SRTSEQ(*LANGIDUNQ) LANGID(ENU)
The system can only use index UNQIX.
Index example: Join selection with a unique-weight sort sequence table
Join selection with a unique-weight sort sequence table (SRTSEQ(*LANGIDUNQ) LANGID(ENU)).
SELECT * FROM STAFF S1, STAFF S2
WHERE S1.JOB = S2.JOB
or the same query using the JOIN syntax.
SELECT *
FROM STAFF S1 INNER JOIN STAFF S2
ON S1.JOB = S2.JOB
When using the OPNQRYF command, specify:
OPNQRYF FILE(STAFF STAFF)
FORMAT(FORMAT1)
JFLD((1/JOB 2/JOB *EQ))
SRTSEQ(*LANGIDUNQ) LANGID(ENU)
The system can use either index HEXIX or index UNQIX for either query.
Index example: Join selection with a shared-weight sort sequence table
Join selection with a shared-weight sort sequence table (SRTSEQ(*LANGIDSHR) LANGID(ENU)).
SELECT * FROM STAFF S1, STAFF S2
WHERE S1.JOB = S2.JOB
or the same query using the JOIN syntax.
SELECT *
FROM STAFF S1 INNER JOIN STAFF S2
ON S1.JOB = S2.JOB
When using the OPNQRYF command, specify:
OPNQRYF FILE(STAFF STAFF) FORMAT(FORMAT1)
JFLD((1/JOB 2/JOB *EQ))
SRTSEQ(*LANGIDSHR) LANGID(ENU)
The system can only use index SHRIX for either query.
Index example: Ordering with no sort sequence table
Ordering with no sort sequence table (SRTSEQ(*HEX)).
SELECT * FROM STAFF
WHERE JOB = 'MGR'
ORDER BY JOB
When using the OPNQRYF command, specify:
OPNQRYF FILE((STAFF))
QRYSLT('JOB *EQ ''MGR''')
KEYFLD(JOB)
SRTSEQ(*HEX)
The system can only use index HEXIX.
Index example: Ordering with a unique-weight sort sequence table
Ordering with a unique-weight sort sequence table (SRTSEQ(*LANGIDUNQ) LANGID(ENU)).
SELECT * FROM STAFF
WHERE JOB = 'MGR'
ORDER BY JOB
When using the OPNQRYF command, specify:
OPNQRYF FILE((STAFF))
QRYSLT('JOB *EQ ''MGR''')
KEYFLD(JOB) SRTSEQ(*LANGIDUNQ) LANGID(ENU)
The system can only use index UNQIX.
Index example: Ordering with a shared-weight sort sequence table
Ordering with a shared-weight sort sequence table (SRTSEQ(*LANGIDSHR) LANGID(ENU)).
SELECT * FROM STAFF
WHERE JOB = 'MGR'
ORDER BY JOB
When using the OPNQRYF command, specify:
OPNQRYF FILE((STAFF))
QRYSLT('JOB *EQ ''MGR''')
KEYFLD(JOB) SRTSEQ(*LANGIDSHR) LANGID(ENU)
The system can only use index SHRIX.
Index example: Ordering with ALWCPYDTA(*OPTIMIZE) and a unique-weight sort
sequence table
Ordering with ALWCPYDTA(*OPTIMIZE) and a unique-weight sort sequence table
(SRTSEQ(*LANGIDUNQ) LANGID(ENU)).
SELECT * FROM STAFF
WHERE JOB = 'MGR'
ORDER BY JOB
When using the OPNQRYF command, specify:
OPNQRYF FILE((STAFF))
QRYSLT('JOB *EQ ''MGR''')
KEYFLD(JOB)
SRTSEQ(*LANGIDUNQ) LANGID(ENU)
ALWCPYDTA(*OPTIMIZE)
The system can use either index HEXIX or index UNQIX for selection. Ordering is done during the sort
using the *LANGIDUNQ sort sequence table.
Index example: Grouping with no sort sequence table
Grouping with no sort sequence table (SRTSEQ(*HEX)).
SELECT JOB FROM STAFF
GROUP BY JOB
When using the OPNQRYF command, specify:
OPNQRYF FILE((STAFF)) FORMAT(FORMAT2)
GRPFLD((JOB))
SRTSEQ(*HEX)
The system can use either index HEXIX or index UNQIX.
Index example: Grouping with a unique-weight sort sequence table
Grouping with a unique-weight sort sequence table (SRTSEQ(*LANGIDUNQ) LANGID(ENU)).
SELECT JOB FROM STAFF
GROUP BY JOB
When using the OPNQRYF command, specify:
OPNQRYF FILE((STAFF)) FORMAT(FORMAT2)
GRPFLD((JOB))
SRTSEQ(*LANGIDUNQ) LANGID(ENU)
The system can use either index HEXIX or index UNQIX.
Index example: Grouping with a shared-weight sort sequence table
Grouping with a shared-weight sort sequence table (SRTSEQ(*LANGIDSHR) LANGID(ENU)).
SELECT JOB FROM STAFF
GROUP BY JOB
When using the OPNQRYF command, specify:
OPNQRYF FILE((STAFF)) FORMAT(FORMAT2)
GRPFLD((JOB))
SRTSEQ(*LANGIDSHR) LANGID(ENU)
The system can only use index SHRIX.
The following examples assume that three more indexes are created over columns JOB and SALARY. The
CREATE INDEX statements precede the examples.
Assume an index HEXIX2 was created with *HEX as the sort sequence.
CREATE INDEX HEXIX2 ON STAFF (JOB, SALARY)
Assume that an index UNQIX2 was created and the sort sequence is a unique-weight sort sequence.
CREATE INDEX UNQIX2 ON STAFF (JOB, SALARY)
Assume an index SHRIX2 was created with a shared-weight sort sequence.
CREATE INDEX SHRIX2 ON STAFF (JOB, SALARY)
Index example: Ordering and grouping on the same columns with a unique-weight
sort sequence table
Ordering and grouping on the same columns with a unique-weight sort sequence table
(SRTSEQ(*LANGIDUNQ) LANGID(ENU)).
SELECT JOB, SALARY FROM STAFF
GROUP BY JOB, SALARY
ORDER BY JOB, SALARY
When using the OPNQRYF command, specify:
OPNQRYF FILE((STAFF)) FORMAT(FORMAT3)
GRPFLD(JOB SALARY)
KEYFLD(JOB SALARY)
SRTSEQ(*LANGIDUNQ) LANGID(ENU)
The system can use UNQIX2 to satisfy both the grouping and ordering requirements. If index UNQIX2
did not exist, the system creates an index using a sort sequence table of *LANGIDUNQ.
Index example: Ordering and grouping on the same columns with
ALWCPYDTA(*OPTIMIZE) and a unique-weight sort sequence table
Ordering and grouping on the same columns with ALWCPYDTA(*OPTIMIZE) and a unique-weight sort
sequence table (SRTSEQ(*LANGIDUNQ) LANGID(ENU)).
SELECT JOB, SALARY FROM STAFF
GROUP BY JOB, SALARY
ORDER BY JOB, SALARY
When using the OPNQRYF command, specify:
OPNQRYF FILE((STAFF)) FORMAT(FORMAT3)
GRPFLD(JOB SALARY)
KEYFLD(JOB SALARY)
SRTSEQ(*LANGIDUNQ) LANGID(ENU)
ALWCPYDTA(*OPTIMIZE)
The system can use UNQIX2 to satisfy both the grouping and ordering requirements. If index UNQIX2
did not exist, the system does one of the following actions:
v Create an index using a sort sequence table of *LANGIDUNQ or
v Use index HEXIX2 to satisfy the grouping and to perform a sort to satisfy the ordering
Index example: Ordering and grouping on the same columns with a shared-weight
sort sequence table
Ordering and grouping on the same columns with a shared-weight sort sequence table
(SRTSEQ(*LANGIDSHR) LANGID(ENU)).
SELECT JOB, SALARY FROM STAFF
GROUP BY JOB, SALARY
ORDER BY JOB, SALARY
When using the OPNQRYF command, specify:
OPNQRYF FILE((STAFF)) FORMAT(FORMAT3)
GRPFLD(JOB SALARY)
KEYFLD(JOB SALARY)
SRTSEQ(*LANGIDSHR) LANGID(ENU)
The system can use SHRIX2 to satisfy both the grouping and ordering requirements. If index SHRIX2 did
not exist, the system creates an index using a sort sequence table of *LANGIDSHR.
Index example: Ordering and grouping on the same columns with
ALWCPYDTA(*OPTIMIZE) and a shared-weight sort sequence table
Ordering and grouping on the same columns with ALWCPYDTA(*OPTIMIZE) and a shared-weight sort
sequence table (SRTSEQ(*LANGIDSHR) LANGID(ENU)).
SELECT JOB, SALARY FROM STAFF
GROUP BY JOB, SALARY
ORDER BY JOB, SALARY
When using the OPNQRYF command, specify:
OPNQRYF FILE((STAFF)) FORMAT(FORMAT3)
GRPFLD(JOB SALARY)
KEYFLD(JOB SALARY)
SRTSEQ(*LANGIDSHR) LANGID(ENU)
ALWCPYDTA(*OPTIMIZE)
The system can use SHRIX2 to satisfy both the grouping and ordering requirements. If index SHRIX2 did
not exist, the system creates an index using a sort sequence table of *LANGIDSHR.
Index example: Ordering and grouping on different columns with a unique-weight
sort sequence table
Ordering and grouping on different columns with a unique-weight sort sequence table
(SRTSEQ(*LANGIDUNQ) LANGID(ENU)).
SELECT JOB, SALARY FROM STAFF
GROUP BY JOB, SALARY
ORDER BY SALARY, JOB
When using the OPNQRYF command, specify:
OPNQRYF FILE((STAFF)) FORMAT(FORMAT3)
GRPFLD(JOB SALARY)
KEYFLD(SALARY JOB)
SRTSEQ(*LANGIDUNQ) LANGID(ENU)
The system can use index HEXIX2 or index UNQIX2 to satisfy the grouping requirements. A temporary
result is created containing the grouping results. A temporary index is then built over the temporary
result using a *LANGIDUNQ sort sequence table to satisfy the ordering requirements.
Index example: Ordering and grouping on different columns with
ALWCPYDTA(*OPTIMIZE) and a unique-weight sort sequence table
Ordering and grouping on different columns with ALWCPYDTA(*OPTIMIZE) and a unique-weight sort
sequence table (SRTSEQ(*LANGIDUNQ) LANGID(ENU)).
SELECT JOB, SALARY FROM STAFF
GROUP BY JOB, SALARY
ORDER BY SALARY, JOB
When using the OPNQRYF command, specify:
OPNQRYF FILE((STAFF)) FORMAT(FORMAT3)
GRPFLD(JOB SALARY)
KEYFLD(SALARY JOB)
SRTSEQ(*LANGIDUNQ) LANGID(ENU)
ALWCPYDTA(*OPTIMIZE)
The system can use index HEXIX2 or index UNQIX2 to satisfy the grouping requirements. A sort is
performed to satisfy the ordering requirements.
Index example: Ordering and grouping on different columns with
ALWCPYDTA(*OPTIMIZE) and a shared-weight sort sequence table
Ordering and grouping on different columns with ALWCPYDTA(*OPTIMIZE) and a shared-weight sort
sequence table (SRTSEQ(*LANGIDSHR) LANGID(ENU)).
SELECT JOB, SALARY FROM STAFF
GROUP BY JOB, SALARY
ORDER BY SALARY, JOB
When using the OPNQRYF command, specify:
OPNQRYF FILE((STAFF)) FORMAT(FORMAT3)
GRPFLD(JOB SALARY)
KEYFLD(SALARY JOB)
SRTSEQ(*LANGIDSHR) LANGID(ENU)
ALWCPYDTA(*OPTIMIZE)
The system can use index SHRIX2 to satisfy the grouping requirements. A sort is performed to satisfy the
ordering requirements.
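The shared-weight behavior in the preceding examples has a close analogue in other SQL engines: a shared-weight sort sequence treats characters such as 'a' and 'A' as equal, much like a case-insensitive collation. The following sketch is not DB2 for i code; it uses SQLite's COLLATE NOCASE as an assumed stand-in for SRTSEQ(*LANGIDSHR), and the table and index names simply mirror the STAFF/SHRIX2 names used above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# COLLATE NOCASE plays the role of a shared-weight sort sequence:
# 'Mgr' and 'MGR' compare equal, so they group and order together.
conn.execute("CREATE TABLE staff (job TEXT COLLATE NOCASE, salary INTEGER)")
conn.executemany("INSERT INTO staff VALUES (?, ?)",
                 [("Mgr", 100), ("MGR", 100), ("clerk", 50), ("Clerk", 50)])
# Index keyed on (JOB, SALARY), analogous to SHRIX2.
conn.execute("CREATE INDEX shrix2 ON staff (job, salary)")

rows = conn.execute(
    "SELECT job, salary, COUNT(*) FROM staff "
    "GROUP BY job, salary ORDER BY job, salary").fetchall()
# Two groups rather than four: the shared weight merges the case variants.
print(rows)
```

Because the collation merges 'Mgr' and 'MGR' into one group, the single index can satisfy both the grouping and the ordering, which is the point the SHRIX2 examples make for DB2 for i.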
Sparse index examples
This topic shows examples of how the sparse index matching algorithm works.
In example S1, the query selection is a subset of the sparse index selection and consequently an index
scan over the sparse index is used. The remaining query selection (COL3=30) is executed following the
index scan.
Example S1
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL3)
WHERE COL1=10 and COL2=20
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL1=10 and COL2=20 and COL3=30
In example S2, the query selection is not a subset of the sparse index selection and the sparse index
cannot be used.
Example S2
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL3)
WHERE COL1=10 and COL2=20 and COL3=30
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL1=10 and COL2=20
In example S3, the query selection exactly matches the sparse index selection and an index scan over the
sparse index can be used.
Example S3
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL3)
WHERE COL1=10 and COL2=20 and COL3=30
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL1=10 and COL2=20 and COL3=30
In example S4, the query selection is a subset of the sparse index selection and an index scan over the
sparse index can be used. The remaining query selection (COL3=30) is executed following the index scan.
Example S4
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL3)
WHERE COL1=10 and COL2=20
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL1=10 and COL2=20 and COL3=30
In example S5, the query selection is not a subset of the sparse index selection and the sparse index
cannot be used.
Example S5
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL3)
WHERE COL1=10 and COL2=20 and COL3=30
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL1=10 and COL2=20
In example S6, the query selection exactly matches the sparse index selection and an index scan over the
sparse index can be used. The query selection is executed following the index scan to eliminate excess
records from the sparse index.
Example S6
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL3)
WHERE COL1=10 or COL2=20 or COL3=30
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL1=10 or COL2=20 or COL3=30
In example S7, the query selection is a subset of the sparse index selection and an index scan over the
sparse index can be used. The query selection is executed following the index scan to eliminate excess
records from the sparse index.
Example S7
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL3)
WHERE COL1=10 or COL2=20 or COL3=30
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL1=10 or COL2=20
In example S8, the query selection is not a subset of the sparse index selection and the sparse index
cannot be used.
Example S8
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL3)
WHERE COL1=10 or COL2=20
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL1=10 or COL2=20 or COL3=30
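The subset rule that examples S1 through S8 illustrate can be observed directly in any engine with partial indexes. The sketch below is not DB2 for i code; it uses SQLite, whose partial indexes follow the same principle, as an assumed analogue of sparse indexes: the index is usable when the query selection is a subset of (implies) the index selection, and unusable when it is not. The names T1 and SPR1 mirror the examples above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (col1 INTEGER, col2 INTEGER, "
             "col3 INTEGER, col4 INTEGER)")
# Sparse (partial) index as in example S1: selection COL1=10 AND COL2=20.
conn.execute("CREATE INDEX spr1 ON t1 (col3) "
             "WHERE col1 = 10 AND col2 = 20")

def plan(sql):
    """Return the access plan SQLite chooses for the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(r[-1] for r in rows)

# S1 pattern: the query selection is a subset of the index selection,
# so a scan over the sparse index is legal and is chosen.
s1 = plan("SELECT col1, col2, col3, col4 FROM t1 "
          "WHERE col1 = 10 AND col2 = 20 AND col3 = 30")

# S2/S8 pattern: the query selection is NOT a subset (COL2=20 is not
# implied), so the sparse index cannot be used -- it omits needed rows.
s_bad = plan("SELECT col1, col2, col3, col4 FROM t1 WHERE col1 = 10")

print(s1)
print(s_bad)
```

The first plan names the partial index; the second falls back to a full table scan, exactly as the matching algorithm described above requires.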
In the next example S9, the constant 'MN' was replaced by a parameter marker for the query selection.
The sparse index had the local selection COL1='MN' applied to it when it was created. The sparse
index matching algorithm matches the parameter marker to the constant 'MN' in the query predicate
COL1=?. It verifies that the value of the parameter marker is the same as the constant in the sparse
index; therefore the sparse index can be used.
Example S9
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL3)
WHERE COL1='MN' or COL2='TWINS'
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL3='WIN' and (COL1=? or COL2='TWINS')
In the next example S10, the keys of the sparse index match the order by fields in the query. For the
sparse index to satisfy the specified ordering, the optimizer must verify that the query selection is a
subset of the sparse index selection. In this example, the sparse index can be used.
Example S10
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL1, COL3)
WHERE COL1='MN' or COL2='TWINS'
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL3='WIN' and (COL1='MN' or COL2='TWINS')
ORDER BY COL1, COL3
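The ordering check in example S10 can likewise be illustrated with a partial index. Again this is not DB2 for i code: SQLite stands in for the sparse index support, and the AND-only selection COL1='MN' is a simplification of S10's OR selection, assumed here purely for the illustration. When the query selection implies the index selection and the ORDER BY columns match the index keys, the index scan itself delivers the ordering and no separate sort step appears in the plan:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (col1 TEXT, col2 TEXT, col3 TEXT, col4 TEXT)")
# Partial index keyed (col1, col3) with selection COL1='MN'
# (simplified from S10's OR selection for this sketch).
conn.execute("CREATE INDEX spr1 ON t1 (col1, col3) WHERE col1 = 'MN'")

steps = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT col1, col2, col3, col4 FROM t1 "
    "WHERE col1 = 'MN' ORDER BY col1, col3").fetchall()
details = " ".join(r[-1] for r in steps)
# The plan uses the sparse index and shows no temporary
# B-tree step for ORDER BY: the index satisfies the ordering.
print(details)
```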
In the next example S11, the keys of the sparse index do not match the order by fields in the query. But
the selection in sparse index SPR1 is a superset of the query selection. Depending on size, the optimizer
might choose an index scan over sparse index SPR1 and then use a sort to satisfy the specified ordering.
Example S11
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL2, COL4)
WHERE COL1='MN' or COL2='TWINS'
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL3='WIN' and (COL1='MN' or COL2='TWINS')
ORDER BY COL1, COL3
The next example S12 represents the classic optimizer decision: is it better to do an index probe using
index IX1 or is it better to do an index scan using sparse index SPR1? Both indexes retrieve the same
number of index entries and have the same cost from that point forward. For example, both indexes have
the same cost to retrieve the selected records from the dataspace, based on the retrieved entries/keys.
The cost to retrieve the index entries is the deciding criteria. In general, if index IX1 is large then an
index scan over sparse index SPR1 has a lower cost to retrieve the index entries. If index IX1 is rather
small then an index probe over index IX1 has a lower cost to retrieve the index entries. Another cost
decision is reusability. The plan using sparse index SPR1 is not as reusable as the plan using index IX1
because of the static selection built into the sparse selection.
Example S12
CREATE INDEX MYLIB/IX1 on MYLIB/T1 (COL1, COL2, COL3)
CREATE INDEX MYLIB/SPR1 on MYLIB/T1 (COL3)
WHERE COL1=10 and COL2=20 and COL3=30
SELECT COL1, COL2, COL3, COL4
FROM MYLIB/T1
WHERE COL1=10 and COL2=20 and COL3=30
Application design tips for database performance
There are some design tips that you can apply when designing SQL applications to maximize your
database performance.
Using live data
The term live data refers to the type of access that the database manager uses when it retrieves data
without making a copy of the data. Using this type of access, the data, which is returned to the program,
always reflects the current values of the data in the database. The programmer can control whether the
database manager uses a copy of the data or retrieves the data directly. This control is done by specifying
the allow copy data (ALWCPYDTA) parameter on the precompiler commands or the Start SQL (STRSQL)
command.
Specifying ALWCPYDTA(*NO) instructs the database manager to always use live data. In most cases,
forcing live data access is a detriment to performance. It severely limits the possible plan choices that the
optimizer could use to implement the query. Avoid it in most cases. However, in specialized cases
involving a simple query, live data access can be used as a performance advantage. The cursor does not
need to be closed and opened again to refresh the data being retrieved.
An example application demonstrating this advantage is one that produces a list on a display. If the
display can show only 20 list elements at a time, then, after the initial 20 elements are displayed, the
programmer can request that the next 20 rows be displayed. A typical SQL application designed for an
operating system other than the i5/OS operating system might be structured as follows:
EXEC SQL
DECLARE C1 CURSOR FOR
SELECT EMPNO, LASTNAME, WORKDEPT
FROM CORPDATA.EMPLOYEE
ORDER BY EMPNO
END-EXEC.
EXEC SQL
OPEN C1
END-EXEC.
* PERFORM FETCH-C1-PARA 20 TIMES.
MOVE EMPNO to LAST-EMPNO.
EXEC SQL
CLOSE C1
END-EXEC.
* Show the display and wait for the user to indicate that
* the next 20 rows should be displayed.
EXEC SQL
DECLARE C2 CURSOR FOR
SELECT EMPNO, LASTNAME, WORKDEPT
FROM CORPDATA.EMPLOYEE
WHERE EMPNO > :LAST-EMPNO
ORDER BY EMPNO
END-EXEC.
EXEC SQL
OPEN C2
END-EXEC.
* PERFORM FETCH-C2-PARA 20 TIMES.
* Show the display with these 20 rows of data.
EXEC SQL
CLOSE C2
END-EXEC.
In the preceding example, notice that an additional cursor had to be opened to continue the list and to
get current data. This technique can result in creating an additional ODP that increases the processing
time on the system. In place of the preceding example, the programmer can design the application
specifying ALWCPYDTA(*NO) with the following SQL statements:
EXEC SQL
DECLARE C1 CURSOR FOR
SELECT EMPNO, LASTNAME, WORKDEPT
FROM CORPDATA.EMPLOYEE
ORDER BY EMPNO
END-EXEC.
EXEC SQL
OPEN C1
END-EXEC.
* PERFORM FETCH-C1-PARA 20 TIMES.
* Display the screen with these 20 rows of data.
* Show the display and wait for the user to indicate that
* the next 20 rows should be displayed.
* PERFORM FETCH-C1-PARA 20 TIMES.
EXEC SQL
CLOSE C1
END-EXEC.
In the preceding example, the query might perform better if the FOR 20 ROWS clause was used on the
multiple-row FETCH statement. Then, the 20 rows are retrieved in one operation.
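The single-cursor pattern above can also be sketched outside COBOL. The illustration below uses Python's DB-API with SQLite, not DB2 for i; only the structure carries over: one cursor stays open for the life of the list display, and each page is fetched from where the previous fetch left off, instead of closing and reopening a second cursor with WHERE EMPNO > :LAST-EMPNO. The EMPLOYEE column names mirror the example above; the generated data is invented for the sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (empno TEXT, lastname TEXT, workdept TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(f"{n:06d}", f"NAME{n}", "A00") for n in range(45)])

# One cursor stays open for the life of the list display,
# mirroring the ALWCPYDTA(*NO) example: no second cursor, no reopen.
cur = conn.execute(
    "SELECT empno, lastname, workdept FROM employee ORDER BY empno")

page1 = cur.fetchmany(20)   # first screen of 20 rows
# ... user requests the next page ...
page2 = cur.fetchmany(20)   # next 20 rows from the same open cursor
print(len(page1), len(page2))
print(page2[0][0])          # continues directly after page1's last row
```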
Related information
Start SQL Interactive Session (STRSQL) command
Reducing the number of open operations
The SQL data manipulation language statements must do database open operations in order to create an
open data path (ODP) to the data. An open data path is the path through which all input/output
operations for the table are performed. In a sense, it connects the SQL application to a table. The number
of open operations in a program can significantly affect performance.
A database open operation occurs on:
v An OPEN statement
v A SELECT INTO statement
v An INSERT statement with a VALUES clause
v An UPDATE statement with a WHERE condition
v An UPDATE statement with a WHERE CURRENT OF cursor and SET clauses that refer to operators or
functions
v A SET statement that contains an expression
v A VALUES INTO statement that contains an expression
v A DELETE statement with a WHERE condition
An INSERT statement with a select-statement requires two open operations. Certain forms of subqueries
could also require one open per subselect.
To minimize the number of opens, DB2 for i leaves the open data path (ODP) open and reuses the ODP if
the statement is run again, unless:
v The ODP used a host variable to build a subset temporary index. The optimizer could choose to build
a temporary index with entries for only the rows that match the row selection specified in the SQL
statement. If a host variable was used in the row selection, the temporary index does not have the
entries required for a different host variable value.
v Ordering was specified on a host variable value.
v An Override Database File (OVRDBF) or Delete Override (DLTOVR) CL command has been issued
since the ODP was opened, which affects the SQL statement execution.
Note: Only overrides that affect the name of the table being referred to cause the ODP to be closed
within a given program invocation.
v The join is a complex join that requires temporary objects to contain the intermediate steps of the join.
v The query requires a complex sort that uses a temporary file; such an ODP might not be reusable.
v A change to the library list since the last open has occurred, which changes the table selected by an
unqualified referral in system naming mode.
v The join was implemented by the CQE optimizer using hash join.
For embedded static SQL, DB2 for i only reuses ODPs opened by the same statement. An identical
statement coded later in the program does not reuse an ODP from any other statement. If the identical
statement must be run in the program many times, code it once in a subroutine and call the subroutine to
run the statement.
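The same advice can be followed in any host language: route every execution of the statement through one routine, so the engine sees a single statement to prepare and reuse. A hedged sketch in Python with SQLite, whose per-connection statement cache is only an analogue of ODP reuse on DB2 for i; the table and data are invented for the illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (empno TEXT, workdept TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [("000010", "A00"), ("000020", "B01"), ("000030", "A00")])

# The statement text lives in exactly one place, and every caller goes
# through this routine -- the analogue of coding the statement once in a
# subroutine so the same prepared form (ODP) is reused on each run.
FIND_BY_DEPT = "SELECT empno FROM employee WHERE workdept = ? ORDER BY empno"

def employees_in_dept(dept):
    return [row[0] for row in conn.execute(FIND_BY_DEPT, (dept,))]

a00 = employees_in_dept("A00")
b01 = employees_in_dept("B01")
print(a00, b01)
```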
The ODPs opened by DB2 for i are closed when any of the following occurs:
v a CLOSE, INSERT, UPDATE, DELETE, or SELECT INTO statement completes and the ODP required a
temporary result that was not reusable or a subset temporary index.
v the Reclaim Resources (RCLRSC) command is issued. A Reclaim Resources (RCLRSC) is issued when
the first COBOL program on the call stack ends or when a COBOL program issues the STOP RUN
COBOL statement. Reclaim Resources (RCLRSC) does not close the ODPs created for programs
precompiled using CLOSQLCSR(*ENDJOB). For interaction of Reclaim Resources (RCLRSC) with
non-default activation groups, see the WebSphere Development Studio ILE programmers' guides.
v The scope of the ODP is controlled by the close SQL cursor (CLOSQLCSR) precompile option:
-- The *ENDPGM option, the default for non-Integrated Language Environment (ILE) programs,
causes all SQL resources to be accessible only by the same invocation of a program. Once an
*ENDPGM program has completed, if it is called again, the SQL resources are no longer active.
-- The *ENDMOD option causes all SQL resources to be accessible only by the same invocation of
the module.
-- The *ENDACTGRP option, which is the default for ILE modules, allows the user to keep the
SQL resources active for the duration of the activation group.
SQL7911 - ODP reused
Message Text: ODP reused.
Cause Text: An ODP that was previously created has been reused. There was a reusable Open Data Path
(ODP) found for this SQL statement, and it has been used. The reusable ODP may have been
from the same call to a program or a previous call to the program. A reuse of an ODP will not
generate an OPEN entry in the journal.
Recovery Text: None
SQL7912 - ODP created
Message Text: ODP created.
Cause Text: An Open Data Path (ODP) has been created. No reusable ODP could be found. This occurs in the
following cases:
-- This is the first time the statement has been run.
-- A RCLRSC has been issued since the last run of this statement.
-- The last run of the statement caused the ODP to be deleted.
-- If this is an OPEN statement, the last CLOSE of this cursor caused the ODP to be deleted.
-- The Application Server (AS) has been changed by a CONNECT statement.
Recovery Text: If a cursor is being opened many times in an application, it is more efficient to use a reusable
ODP, and not create an ODP every time. This also applies to repeated runs of INSERT, UPDATE,
DELETE, and SELECT INTO statements. If ODPs are being created on every open, see the close
message to determine why the ODP is being deleted.
The first time that the statement is run or the cursor is opened for a process, an ODP must always be
created. However, if this message appears on every statement run or cursor open, use the tips
recommended in Retaining cursor positions for non-ILE program calls on page 235 in your application.
SQL7913 - ODP deleted
Message Text: ODP deleted.
Cause Text: The Open Data Path (ODP) for this statement or cursor has been deleted. The ODP was not
reusable. This could be caused by using a host variable in a LIKE clause, ordering on a host
variable, or because the query optimizer chose to accomplish the query with an ODP that was not
reusable.
Recovery Text: See previous query optimizer messages to determine how the cursor was opened.
SQL7914 - ODP not deleted
Message Text: ODP not deleted.
Cause Text: The Open Data Path (ODP) for this statement or cursor has not been deleted. This ODP can be
reused on a subsequent run of the statement. This will not generate an entry in the journal.
Recovery Text: None
SQL7915 - Access plan for SQL statement has been built
Message Text: Access plan for SQL statement has been built.
Cause Text: SQL had to build the access plan for this statement at run time. This occurs in the following
cases:
-- The program has been restored from a different release and this is the first time this statement
has been run.
-- All the files required for the statement did not exist at precompile time, and this is the first
time this statement has been run.
-- The program was precompiled using SQL naming mode, and the program owner has changed
since the last time the program was called.
Recovery Text: This is normal processing for SQL. Once the access plan is built, it will be used on subsequent
runs of the statement.
SQL7916 - Blocking used for query
Message Text: Blocking used for query.
Cause Text: Blocking has been used in the implementation of this query. SQL will retrieve a block of records
from the database manager on the first FETCH statement. Additional FETCH statements have to
be issued by the calling program, but they do not require SQL to request more records, and
therefore will run faster.
Recovery Text: SQL attempts to utilize blocking whenever possible. In cases where the cursor is not update
capable and commitment control is not active, there is a possibility that blocking will be used.
SQL7917 - Access plan not updated
Message Text: Access plan not updated.
Cause Text: The query optimizer rebuilt the access plan for this statement, but the program could not be
updated. Another job may be running the program. The program cannot be updated with the
new access plan until a job can obtain an exclusive lock on the program. The exclusive lock
cannot be obtained if another job is running the program, if the job does not have proper
authority to the program, or if the program is currently being saved. The query will still run, but
access plan rebuilds will continue to occur until the program is updated.
Recovery Text: See previous messages from the query optimizer to determine why the access plan has been
rebuilt. To ensure that the program gets updated with the new access plan, run the program
when no other active jobs are using it.
SQL7918 - Reusable ODP deleted
Message Text: Reusable ODP deleted. Reason code &1.
Cause Text: An existing Open Data Path (ODP) was found for this statement, but it could not be reused for
reason &1. The statement now refers to different files or uses different override options than are
in the ODP. Reason codes and their meanings are:
1 -- Commitment control isolation level is not compatible.
2 -- The statement contains SQL special register USER, CURRENT DEBUG MODE, CURRENT
DECFLOAT ROUNDING MODE, or CURRENT TIMEZONE, and the value for one of these
registers has changed.
3 -- The PATH used to locate an SQL function has changed.
4 -- The job default CCSID has changed.
5 -- The library list has changed, such that a file is found in a different library. This only affects
statements with unqualified table names, when the table exists in multiple libraries.
6 -- The file, library, or member for the original ODP was changed with an override.
7 -- An OVRDBF or DLTOVR command has been issued. A file referred to in the statement now
refers to a different file, library, or member.
8 -- An OVRDBF or DLTOVR command has been issued, causing different override options, such
as different SEQONLY or WAITRCD values.
9 -- An error occurred when attempting to verify the statement override information is
compatible with the reusable ODP information.
10 -- The query optimizer has determined the ODP cannot be reused.
11 -- The client application requested not to reuse ODPs.
Recovery Text: Do not change the library list, the override environment, or the values of the special registers if
reusable ODPs are to be used.
SQL7919 - Data conversion required on FETCH or embedded SELECT
Message Text: Data conversion required on FETCH or embedded SELECT.
Cause Text: Host variable &2 requires conversion. The data retrieved for the FETCH or embedded SELECT
statement cannot be directly moved to the host variables. The statement ran correctly.
Performance, however, would be improved if no data conversion was required. The host variable
requires conversion for reason &1.
-- Reason 1 - host variable &2 is a character or graphic string of a different length than the value
being retrieved.
-- Reason 2 - host variable &2 is a numeric type that is different than the type of the value being
retrieved.
-- Reason 3 - host variable &2 is a C character or C graphic string that is NUL-terminated, the
program was compiled with option *CNULRQD specified, and the statement is a multiple-row
FETCH.
-- Reason 4 - host variable &2 is a variable length string and the value being retrieved is not.
-- Reason 5 - host variable &2 is not a variable length string and the value being retrieved is.
-- Reason 6 - host variable &2 is a variable length string whose maximum length is different than
the maximum length of the variable length value being retrieved.
-- Reason 7 - a data conversion was required on the mapping of the value being retrieved to host
variable &2, such as a CCSID conversion.
-- Reason 8 - a DRDA connection was used to get the value being retrieved into host variable &2.
The value being retrieved is either null capable or varying-length, is contained in a partial row, or
is a derived expression.
-- Reason 10 - the length of host variable &2 is too short to hold a TIME or TIMESTAMP value
being retrieved.
-- Reason 11 - host variable &2 is of type DATE, TIME or TIMESTAMP, and the value being
retrieved is a character string.
-- Reason 12 - too many host variables were specified and records are blocked. Host variable &2
does not have a corresponding column returned from the query.
-- Reason 13 - a DRDA connection was used for a blocked FETCH and the number of host
variables specified in the INTO clause is less than the number of result values in the select list.
-- Reason 14 - a LOB Locator was used and the commitment control level of the process was not
*ALL.
Recovery Text: To get better performance, attempt to use host variables of the same type and length as their
corresponding result columns.
SQL7939 - Data conversion required on INSERT or UPDATE
Message Text: Data conversion required on INSERT or UPDATE.
Cause Text: The INSERT or UPDATE values cannot be directly moved to the columns because the data type
or length of a value is different than one of the columns. The INSERT or UPDATE statement ran
correctly. Performance, however, would be improved if no data conversion was required. The
reason data conversion is required is &1.
-- Reason 1 is that the INSERT or UPDATE value is a character or graphic string of a different
length than column &2.
-- Reason 2 is that the INSERT or UPDATE value is a numeric type that is different than the type
of column &2.
-- Reason 3 is that the INSERT or UPDATE value is a variable length string and column &2 is
not.
-- Reason 4 is that the INSERT or UPDATE value is not a variable length string and column &2
is.
-- Reason 5 is that the INSERT or UPDATE value is a variable length string whose maximum
length is different than the maximum length of column &2.
-- Reason 6 is that a data conversion was required on the mapping of the INSERT or UPDATE
value to column &2, such as a CCSID conversion.
-- Reason 7 is that the INSERT or UPDATE value is a character string and column &2 is of type
DATE, TIME, or TIMESTAMP.
-- Reason 8 is that the target table of the INSERT is not an SQL table.
Recovery Text: To get better performance, try to use values of the same type and length as their corresponding
columns.
PRTSQLINF message reference
The following messages are returned from PRTSQLINF.
SQL400A - Temporary distributed result file &1 was created to contain join result
Message Text: Temporary distributed result file &1 was created to contain join result. Result file was directed.
Cause Text: Query contains join criteria over a distributed file and a distributed join was performed in
parallel. A temporary distributed result file was created to contain the results of the distributed
join.
Recovery Text: For more information about processing of distributed files, refer to the Distributed database
programming topic collection.
SQL400B - Temporary distributed result file &1 was created to contain join result
Message Text: Temporary distributed result file &1 was created to contain join result. Result file was broadcast.
Cause Text: Query contains join criteria over a distributed file and a distributed join was performed in
parallel. A temporary distributed result file was created to contain the results of the distributed
join.
Recovery Text: For more information about processing of distributed files, refer to the Distributed database
programming topic collection.
SQL400C - Optimizer debug messages for distributed query step &1 of &2 follow
Message Text: Optimizer debug messages for distributed query step &1 of &2 follow:
Cause Text: A distributed file was specified in the query which caused the query to be processed in multiple
steps. The optimizer debug messages that follow provide the query optimization information
about the current step.
Recovery Text: For more information about processing of distributed files, refer to the Distributed database
programming topic collection.
SQL400D - GROUP BY processing generated
Message Text: GROUP BY processing generated.
Cause Text: GROUP BY processing was added to the query step. Adding the GROUP BY reduced the number
of result rows which should, in turn, improve the performance of subsequent steps.
Recovery Text: For more information refer to the SQL programming topic collection.
SQL400E - Temporary distributed result file &1 was created while processing distributed subquery
Message Text: Temporary distributed result file &1 was created while processing distributed subquery.
Cause Text: A temporary distributed result file was created to contain the intermediate results of the query.
The query contains a subquery which requires an intermediate result.
Recovery Text: Generally, if the fields correlated between the query and subquery do not match the partition
keys of the respective files, the query must be processed in multiple steps and a temporary
distributed file will be built to contain the intermediate results. For more information about
processing of distributed files, refer to the Distributed database programming topic collection.
SQL4001 - Temporary result created
Message Text: Temporary result created.
Cause Text: Conditions exist in the query which cause a temporary result to be created. One of the following
reasons may be the cause for the temporary result:
-- The table is a join logical file and its join type (JDFTVAL) does not match the join type
specified in the query.
-- The format specified for the logical file refers to more than one physical table.
-- The table is a complex SQL view requiring a temporary table to contain the results of the SQL
view.
-- The query contains grouping columns (GROUP BY) from more than one table, or contains
grouping columns from a secondary table of a join query that cannot be reordered.
Recovery Text: Performance may be improved if the query can be changed to avoid temporary results.
SQL4002 - Reusable ODP sort used
Message Text: Reusable ODP sort used.
Cause Text: Conditions exist in the query which cause a sort to be used. This allowed the open data path
(ODP) to be reusable. One of the following reasons may be the cause for the sort:
-- The query contains ordering columns (ORDER BY) from more than one table, or contains
ordering columns from a secondary table of a join query that cannot be reordered.
-- The grouping and ordering columns are not compatible.
-- DISTINCT was specified for the query.
-- UNION was specified for the query.
-- The query had to be implemented using a sort. Key length of more than 2000 bytes, more than
120 ordering columns, or an ordering column containing a reference to an external user-defined
function was specified for ordering.
-- The query optimizer chose to use a sort rather than an index to order the results of the query.
Recovery Text: A reusable ODP generally results in improved performance when compared to a non-reusable
ODP.
SQL4003 - UNION
Message Text: UNION, EXCEPT, or INTERSECT.
Cause Text: A UNION, EXCEPT, or INTERSECT operator was specified in the query. The messages preceding
this keyword delimiter correspond to the subselect preceding the UNION, EXCEPT, or
INTERSECT operator. The messages following this keyword delimiter correspond to the subselect
following the UNION, EXCEPT, or INTERSECT operator.
Recovery Text: None
SQL4004 - SUBQUERY
Message Text: SUBQUERY.
Cause Text: The SQL statement contains a subquery. The messages preceding the SUBQUERY delimiter
correspond to the subselect containing the subquery. The messages following the SUBQUERY
delimiter correspond to the subquery.
Recovery Text: None
SQL4005 - Query optimizer timed out for table &1
Message Text: Query optimizer timed out for table &1.
Cause Text: The query optimizer timed out before it could consider all indexes built over the table. This is not
an error condition. The query optimizer may time out in order to minimize optimization time.
The query can be run in debug mode (STRDBG) to see the list of indexes which were considered
during optimization. The table number refers to the relative position of this table in the query.
Recovery Text: To ensure an index is considered for optimization, specify the logical file of the index as the table
to be queried. The optimizer will first consider the index of the logical file specified on the SQL
select statement. Note that SQL-created indexes cannot be queried. An SQL index can be deleted
and recreated to increase the chances it will be considered during query optimization. Consider
deleting any indexes no longer needed.
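The recovery steps above can be sketched as follows; the library, index, and column names are illustrative assumptions:

```sql
-- Dropping and recreating an SQL index can increase the chance
-- the optimizer examines it before timing out.
DROP INDEX mylib.emp_dept_ix;

CREATE INDEX mylib.emp_dept_ix
    ON mylib.employee (workdept, lastname);
```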
SQL4006 - All indexes considered for table &1
Message Text: All indexes considered for table &1.
Cause Text: The query optimizer considered all indexes built over the table when optimizing the query. The
query can be run in debug mode (STRDBG) to see the list of indexes which were considered
during optimization. The table number refers to the relative position of this table in the query.
Recovery Text: None
SQL4007 - Query implementation for join position &1 table &2
Message Text: Query implementation for join position &1 table &2.
Cause Text: The join position identifies the order in which the tables are joined. A join position of 1 indicates
this table is the first, or left-most, table in the join order. The table number refers to the relative
position of this table in the query.
Recovery Text: Join order can be influenced by adding an ORDER BY clause to the query. Refer to Join
optimization on page 54 for more information about join optimization and tips to influence join
order.
SQL4008 - Index &1 used for table &2
Message Text: Index &1 used for table &2.
Cause Text: The index was used to access rows from the table for one of the following reasons:
-- Row selection.
-- Join criteria.
-- Ordering/grouping criteria.
-- Row selection and ordering/grouping criteria.
The table number refers to the relative position of this table in the query.
The query can be run in debug mode (STRDBG) to determine the specific reason the index was
used.
Recovery Text: None
SQL4009 - Index created for table &1
Message Text: Index created for table &1.
Cause Text: A temporary index was built to access rows from the table for one of the following reasons:
-- Perform specified ordering/grouping criteria.
-- Perform specified join criteria.
The table number refers to the relative position of this table in the query.
Recovery Text: To improve performance, consider creating a permanent index if the query is run frequently. The
query can be run in debug mode (STRDBG) to determine the specific reason the index was
created and the key columns used when creating the index. NOTE: If a permanent index is created,
it is possible the query optimizer may still choose to create a temporary index to access the rows
from the table.
SQL401A - Processing grouping criteria for query containing a distributed table
Message Text: Processing grouping criteria for query containing a distributed table.
Cause Text: Grouping for queries that contain distributed tables can be implemented using either a one or
two step method. If the one step method is used, the grouping columns (GROUP BY) match the
partitioning keys of the distributed table. If the two step method is used, the grouping columns
do not match the partitioning keys of the distributed table or the query contains grouping criteria
but no grouping columns were specified. If the two step method is used, message SQL401B will
appear followed by another SQL401A message.
Recovery Text: For more information about processing of distributed tables, refer to the Distributed database
programming topic collection.
SQL401B - Temporary distributed result table &1 was created while processing grouping criteria
Message Text: Temporary distributed result table &1 was created while processing grouping criteria.
Cause Text: A temporary distributed result table was created to contain the intermediate results of the query.
Either the query contains grouping columns (GROUP BY) that do not match the partitioning keys
of the distributed table or the query contains grouping criteria but no grouping columns were
specified.
Recovery Text: For more information about processing of distributed tables, refer to the Distributed database
programming topic collection.
SQL401C - Performing distributed join for query
Message Text: Performing distributed join for query.
Cause Text: Query contains join criteria over a distributed table and a distributed join was performed in
parallel. See the following SQL401F messages to determine which tables were joined together.
Recovery Text: For more information about processing of distributed tables, refer to the Distributed database
programming topic collection.
SQL401D - Temporary distributed result table &1 was created because table &2 was directed
Message Text: Temporary distributed result table &1 was created because table &2 was directed.
Cause Text: Temporary distributed result table was created to contain the intermediate results of the query.
Data from a distributed table in the query was directed to other nodes.
Recovery Text: Generally, a table is directed when the join columns do not match the partitioning keys of the
distributed table. When a table is directed, the query is processed in multiple steps and processed
in parallel. A temporary distributed result file is required to contain the intermediate results for
each step. For more information about processing of distributed tables, refer to the Distributed
database programming topic collection.
SQL401E - Temporary distributed result table &1 was created because table &2 was broadcast
Message Text: Temporary distributed result table &1 was created because table &2 was broadcast.
Cause Text: Temporary distributed result table was created to contain the intermediate results of the query.
Data from a distributed table in the query was broadcast to all nodes.
Recovery Text: Generally, a table is broadcast when join columns do not match the partitioning keys of either
table being joined or the join operator is not an equal operator. When a table is broadcast, the
query is processed in multiple steps and processed in parallel. A temporary distributed result
table is required to contain the intermediate results for each step. For more information about
processing of distributed tables, refer to the Distributed database programming topic collection.
SQL401F - Table &1 used in distributed join
Message Text: Table &1 used in distributed join.
Cause Text: Query contains join criteria over a distributed table and a distributed join was performed in
parallel.
Recovery Text: For more information about processing of distributed tables, refer to the Distributed database
programming topic collection.
SQL4010 - Table scan access for table &1
Message Text: Table scan access for table &1.
Cause Text: Table scan access was used to select rows from the table. The table number refers to the relative
position of this table in the query.
Recovery Text: Table scan is generally a good performing option when selecting a high percentage of rows from
the table. The use of an index, however, may improve the performance of the query when
selecting a low percentage of rows from the table.
SQL4011 - Index scan-key row positioning used on table &1
Message Text: Index scan-key row positioning used on table &1.
Cause Text: Index scan-key row positioning is defined as applying selection against the index to position
directly to ranges of keys that match some or all of the selection criteria. Index scan-key row
positioning only processes a subset of the keys in the index and is a good performing option
when selecting a small percentage of rows from the table.
The table number refers to the relative position of this table in the query.
Recovery Text: Refer to Data access methods on page 7 for more information about index scan-key row
positioning.
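A minimal sketch of a query that permits index scan-key row positioning, assuming a hypothetical index keyed on the selection column:

```sql
-- With an index keyed on (workdept), the equal predicate positions
-- directly to the range of matching keys instead of processing
-- every key in the index.
SELECT empno, lastname
  FROM employee
 WHERE workdept = 'D11'
```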
SQL4012 - Index created from index &1 for table &2
Message Text: Index created from index &1 for table &2.
Cause Text: A temporary index was created using the specified index to access rows from the queried table
for one of the following reasons:
-- Perform specified ordering/grouping criteria.
-- Perform specified join criteria.
The table number refers to the relative position of this table in the query.
Recovery Text: Creating an index from an index is generally a good performing option. Consider creating a
permanent index for frequently run queries. The query can be run in debug mode (STRDBG) to
determine the key columns used when creating the index. NOTE: If a permanent index is created,
it is possible the query optimizer may still choose to create a temporary index to access the rows
from the table.
SQL4013 - Access plan has not been built
Message Text: Access plan has not been built.
Cause Text: An access plan was not created for this query. Possible reasons may include:
-- Tables were not found when the program was created.
-- The query was complex and required a temporary result table.
-- Dynamic SQL was specified.
Recovery Text: If an access plan was not created, review the possible causes. Attempt to correct the problem if
possible.
SQL4014 - &1 join column pair(s) are used for this join position
Message Text: &1 join column pair(s) are used for this join position.
Cause Text: The query optimizer may choose to process join predicates as either join selection or row
selection. The join predicates used in join selection are determined by the final join order and the
index used. This message indicates how many join column pairs were processed as join selection
at this join position. Message SQL4015 provides detail on which columns comprise the join
column pairs.
If 0 join column pairs were specified then index scan-key row positioning with row selection was
used instead of join selection.
Recovery Text: If fewer join pairs are used at a join position than expected, it is possible no index exists which
has keys matching the desired join columns. Try creating an index whose keys match the join
predicates.
If 0 join column pairs were specified then index scan-key row positioning was used. Index
scan-key row positioning is normally a good performing option. Message SQL4011 provides more
information on index scan-key row positioning.
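The suggested recovery can be sketched as follows (library, table, and column names are hypothetical): create an index on the join-to table whose leading key columns match the join predicate:

```sql
-- Index keys match the join column, so the optimizer can process
-- the predicate as join selection rather than row selection.
CREATE INDEX mylib.proj_dept_ix
    ON mylib.project (deptno);

SELECT e.empno, p.projname
  FROM mylib.employee e
  JOIN mylib.project p ON p.deptno = e.workdept;
```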
SQL4015 - From-column &1.&2, to-column &3.&4, join operator &5, join predicate &6
Message Text: From-column &1.&2, to-column &3.&4, join operator &5, join predicate &6.
Cause Text: Identifies which join predicate was implemented at the current join position. The replacement text
parameters are:
-- &1: The join from table number. The table number refers to the relative position of this table
in the query.
-- &2: The join from column name. The column within the join from table which comprises the
left half of the join column pair. If the column name is *MAP, the column is an expression
(derived field).
-- &3: The join to table number. The table number refers to the relative position of this table in
the query.
-- &4: The join to column name. The column within the join to table which comprises the
right half of the join column pair. If the column name is *MAP, the column is an expression
(derived field).
-- &5: The join operator. Possible values are EQ (equal), NE (not equal), GT (greater than), LT
(less than), GE (greater than or equal), LE (less than or equal), and CP (cross join or cartesian
product).
-- &6: The join predicate number. Identifies the join predicate within this set of join pairs.
Recovery Text: Refer to Join optimization on page 54 for more information about joins.
SQL4016 - Subselects processed as join query
Message Text: Subselects processed as join query.
Cause Text: The query optimizer chose to implement some or all of the subselects with a join query.
Implementing subqueries with a join generally improves performance over implementing
alternative methods.
Recovery Text: None
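As a hedged illustration of this rewrite (names are hypothetical), a subquery such as the first statement may be processed internally like the join in the second:

```sql
-- Original form with a subselect:
SELECT empno
  FROM employee
 WHERE workdept IN (SELECT deptno
                      FROM department
                     WHERE location = 'NYC');

-- Roughly equivalent join form the optimizer may use:
SELECT DISTINCT e.empno
  FROM employee e
  JOIN department d ON e.workdept = d.deptno
 WHERE d.location = 'NYC';
```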
SQL4017 - Host variables implemented as reusable ODP
Message Text: Host variables implemented as reusable ODP.
Cause Text: The query optimizer has built the access plan allowing for the values of the host variables to be
supplied when the query is opened. This query can be run with different values being provided
for the host variables without requiring the access plan to be rebuilt. This is the normal method
of handling host variables in access plans. The open data path (ODP) that will be created from
this access plan will be a reusable ODP.
Recovery Text: Generally, reusable open data paths perform better than non-reusable open data paths.
SQL4018 - Host variables implemented as non-reusable ODP
Message Text: Host variables implemented as non-reusable ODP.
Cause Text: The query optimizer has implemented the host variables with a non-reusable open data path
(ODP).
Recovery Text: This can be a good performing option in special circumstances, but generally a reusable ODP
gives the best performance.
SQL4019 - Host variables implemented as file management row positioning reusable ODP
Message Text: Host variables implemented as file management row positioning reusable ODP.
Cause Text: The query optimizer has implemented the host variables with a reusable open data path (ODP)
using file management row positioning.
Recovery Text: Generally, a reusable ODP performs better than a non-reusable ODP.
SQL402A - Hashing algorithm used to process join
Message Text: Hashing algorithm used to process join.
Cause Text: The hash join algorithm is typically used for longer running join queries. The original query will
be subdivided into hash join steps. Each hash join step will be optimized and processed
separately. Access plan implementation information for each of the hash join steps is not available
because access plans are not saved for the individual hash join dials. Debug messages detailing
the implementation of each hash dial can be found in the joblog if the query is run in debug
mode using the STRDBG CL command.
Recovery Text: The hash join method is usually a good implementation choice, however, if you want to disallow
the use of this method specify ALWCPYDTA(*YES). Refer to the &qryopt. for more information
on hashing algorithm for join processing.
SQL402B - Table &1 used in hash join step &2
Message Text: Table &1 used in hash join step &2.
Cause Text: This message lists the table number used by the hash join steps. The table number refers to the
relative position of this table in the query. If there are two or more of these messages for the same
hash join step, then that step is a nested loop join. Access plan implementation information for
each of the hash join steps is not available because access plans are not saved for the individual
hash steps. Debug messages detailing the implementation of each hash step can be found in the
joblog if the query is run in debug mode using the STRDBG CL command.
Recovery Text: Refer to Data access methods on page 7 for more information about hashing.
SQL402C - Temporary table created for hash join results
Message Text: Temporary table created for hash join results.
Cause Text: The results of the hash join were written to a temporary table so that query processing could be
completed. The temporary table was required because the query contained one or more of the
following:
-- GROUP BY or summary functions
-- ORDER BY
-- DISTINCT
-- Expression containing columns from more than one table
-- Complex row selection involving columns from more than one table
Recovery Text: Refer to Data access methods on page 7 for more information about the hashing algorithm for
join processing.
SQL402D - Query attributes overridden from query options file &2 in library &1
Message Text: Query attributes overridden from query options file &2 in library &1.
Cause Text: None
Recovery Text: None
SQL4020 - Estimated query run time is &1 seconds
Message Text: Estimated query run time is &1 seconds.
Cause Text: The total estimated time, in seconds, of executing this query.
Recovery Text: None
SQL4021 - Access plan last saved on &1 at &2
Message Text: Access plan last saved on &1 at &2.
Cause Text: The date and time reflect the last time the access plan was successfully updated in the program
object.
Recovery Text: None
SQL4022 - Access plan was saved with SRVQRY attributes active
Message Text: Access plan was saved with SRVQRY attributes active.
Cause Text: The access plan that was saved was created while SRVQRY was active. Attributes saved in the
access plan may be the result of SRVQRY.
Recovery Text: The query will be re-optimized the next time it is run so that SRVQRY attributes will not be
permanently saved.
SQL4023 - Parallel table prefetch used
Message Text: Parallel table prefetch used.
Cause Text: The query optimizer chose to use a parallel prefetch access method to reduce the processing time
required for the table scan.
Recovery Text: Parallel prefetch can improve the performance of queries. Even though the access plan was
created to use parallel prefetch, the system will actually run the query only if the following are
true:
-- The query attribute degree was specified with an option of *IO or *ANY for the application
process.
-- There is enough main storage available to cache the data being retrieved by multiple I/O
streams. Normally, 5 megabytes would be a minimum. Increasing the size of the shared pool may
improve performance.
For more information about parallel table prefetch, refer to Data access methods on page 7.
SQL4024 - Parallel index preload access method used
Message Text: Parallel index preload access method used.
Cause Text: The query optimizer chose to use a parallel index preload access method to reduce the processing
time required for this query. This means that the indexes used by this query will be loaded into
active memory when the query is opened.
Recovery Text: Parallel index preload can improve the performance of queries. Even though the access plan was
created to use parallel preload, the system will actually use parallel preload only if the following
are true:
-- The query attribute degree was specified with an option of *IO or *ANY for the application
process.
-- There is enough main storage to load all of the index objects used by this query into active
memory. Normally, 5 megabytes would be a minimum. Increasing the size of the
shared pool may improve performance.
For more information about parallel index preload, refer to Data access methods on page 7.
SQL4025 - Parallel table preload access method used
Message Text: Parallel table preload access method used.
Cause Text: The query optimizer chose to use a parallel table preload access method to reduce the processing
time required for this query. This means that the data accessed by this query will be loaded into
active memory when the query is opened.
Recovery Text: Parallel table preload can improve the performance of queries. Even though the access plan was
created to use parallel preload, the system will actually use parallel preload only if the following
are true:
-- The query attribute degree must have been specified with an option of *IO or *ANY for the
application process.
-- There is enough main storage available to load all of the data in the file into active memory.
Normally, 5 megabytes would be a minimum. Increasing the size of the shared pool may improve
performance.
For more information about parallel table preload, refer to Data access methods on page 7.
SQL4026 - Index only access used on table number &1
Message Text: Index only access used on table number &1.
Cause Text: Index only access is primarily used in conjunction with either index scan-key row positioning or
index scan-key selection. This access method will extract all of the data from the index rather
than performing random I/O to the data space. The table number refers to the relative position of
this table in the query.
Recovery Text: Refer to Data access methods on page 7 for more information about index only access.
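A sketch of a query eligible for index only access, assuming a hypothetical index whose keys cover every column the query references:

```sql
-- All referenced columns (workdept, lastname, salary) are keys of
-- the index, so no random I/O to the data space is needed.
CREATE INDEX mylib.emp_cover_ix
    ON mylib.employee (workdept, lastname, salary);

SELECT lastname, salary
  FROM mylib.employee
 WHERE workdept = 'D11';
```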
SQL4027 - Access plan was saved with DB2 Multisystem installed on the system
Message Text: Access plan was saved with DB2 Multisystem installed on the system.
Cause Text: The access plan saved was created while the system feature DB2 Multisystem was installed on
the system. The access plan may have been influenced by the presence of this system feature.
Having this system feature installed may cause the implementation of the query to change.
Recovery Text: For more information about how the system feature DB2 Multisystem can influence a query, refer
to the Controlling parallel processing for queries on page 184.
SQL4028 - The query contains a distributed table
Message Text: The query contains a distributed table.
Cause Text: A distributed table was specified in the query which may cause the query to be processed in
multiple steps. If the query is processed in multiple steps, additional messages will detail the
implementation for each step. Access plan implementation information for each step is not
available because access plans are not saved for the individual steps. Debug messages detailing
the implementation of each step can be found in the joblog if the query is run in debug mode
using the STRDBG CL command.
Recovery Text: For more information about how a distributed table can influence the query implementation refer
to the Distributed database programming topic collection.
SQL4029 - Hashing algorithm used to process the grouping
Message Text: Hashing algorithm used to process the grouping.
Cause Text: The grouping specified within the query was implemented with a hashing algorithm.
Recovery Text: Implementing the grouping with the hashing algorithm is generally a performance advantage
since an index does not have to be created. However, if you want to disallow the use of this
method simply specify ALWCPYDTA(*YES). Refer to Data access methods on page 7 for more
information about the hashing algorithm.
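An illustrative grouping query (names are hypothetical) of the kind the optimizer may implement with the hashing algorithm when no suitable index exists:

```sql
-- Grouping columns have no supporting index; each group is built
-- in a hash table rather than by ordering the rows.
SELECT workdept, COUNT(*) AS emps, AVG(salary) AS avg_sal
  FROM employee
 GROUP BY workdept;
```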
SQL4030 - &1 tasks specified for parallel scan on table &2
Message Text: &1 tasks specified for parallel scan on table &2.
Cause Text: The query optimizer has calculated the optimal number of tasks for this query based on the
query attribute degree. The table number refers to the relative position of this table in the query.
Recovery Text: Parallel table or index scan can improve the performance of queries. Even though the access plan
was created to use the specified number of tasks for the parallel scan, the system may alter that
number based on the availability of the pool in which this job is running or the allocation of the
table's data across the disk units. Refer to Data access methods on page 7 for more information
about parallel scan.
SQL4031 - &1 tasks specified for parallel index create over table &2
Message Text: &1 tasks specified for parallel index create over table &2.
Cause Text: The query optimizer has calculated the optimal number of tasks for this query based on the
query attribute degree. The table number refers to the relative position of this table in the query.
Recovery Text: Parallel index create can improve the performance of queries. Even though the access plan was
created to use the specified number of tasks for the parallel index build, the system may alter
that number based on the availability of the pool in which this job is running or the allocation of
the table's data across the disk units. Refer to Data access methods on page 7 for more
information about parallel index create.
SQL4032 - Index &1 used for bitmap processing of table &2
Message Text: Index &1 used for bitmap processing of table &2.
Cause Text: The index was used, in conjunction with query selection, to create a bitmap. The bitmap, in turn,
was used to access rows from the table. This message may appear more than once per table. If
this occurs, then a bitmap was created from each index of each message. The bitmaps were then
combined into one bitmap using boolean logic and the resulting bitmap was used to access rows
from the table. The table number refers to the relative position of this table in the query.
Recovery Text: The query can be run in debug mode (STRDBG) to determine more specific information. Also,
refer to Data access methods on page 7 for more information about bitmap processing.
SQL4033 - &1 tasks specified for parallel bitmap create using &2
Message Text: &1 tasks specified for parallel bitmap create using &2.
Cause Text: The query optimizer has calculated the optimal number of tasks to use to create the bitmap based
on the query attribute degree.
Recovery Text: Using parallel index scan to create the bitmap can improve the performance of queries. Even
though the access plan was created to use the specified number of tasks, the system may alter
that number based on the availability of the pool in which this job is running or the allocation of
the file's data across the disk units. Refer to Data access methods on page 7 for more
information about parallel scan.
SQL4034 - Multiple join classes used to process join
Message Text: Multiple join classes used to process join.
Cause Text: Multiple join classes are used when join queries are written that have conflicting operations or
cannot be implemented as a single query. Each join class will be optimized and processed as a
separate step of the query with the results written out to a temporary table. Access plan
implementation information for each of the join classes is not available because access plans are
not saved for the individual join class dials. Debug messages detailing the implementation of
each join dial can be found in the joblog if the query is run in debug mode using the STRDBG
CL command.
Recovery Text: Refer to Join optimization on page 54 for more information about join classes.
SQL4035 - Table &1 used in join class &2
Message Text: Table &1 used in join class &2.
Cause Text: This message lists the table numbers used by each of the join classes. The table number refers to
the relative position of this table in the query. All of the tables listed for the same join class will
be processed during the same step of the query. The results from all of the join classes will then
be joined together to return the final results for the query. Access plan implementation
information for each of the join classes is not available because access plans are not saved for
the individual classes. Debug messages detailing the implementation of each join class can be
found in the joblog if the query is run in debug mode using the STRDBG CL command.
Recovery Text: Refer to Join optimization on page 54 for more information about join classes.
Code license and disclaimer information
IBM grants you a nonexclusive copyright license to use all programming code examples from which you
can generate similar function tailored to your own specific needs.
SUBJECT TO ANY STATUTORY WARRANTIES WHICH CANNOT BE EXCLUDED, IBM, ITS
PROGRAM DEVELOPERS AND SUPPLIERS MAKE NO WARRANTIES OR CONDITIONS EITHER
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OR
CONDITIONS OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND
NON-INFRINGEMENT, REGARDING THE PROGRAM OR TECHNICAL SUPPORT, IF ANY.
UNDER NO CIRCUMSTANCES IS IBM, ITS PROGRAM DEVELOPERS OR SUPPLIERS LIABLE FOR
ANY OF THE FOLLOWING, EVEN IF INFORMED OF THEIR POSSIBILITY:
1. LOSS OF, OR DAMAGE TO, DATA;
2. DIRECT, SPECIAL, INCIDENTAL, OR INDIRECT DAMAGES, OR FOR ANY ECONOMIC
CONSEQUENTIAL DAMAGES; OR
3. LOST PROFITS, BUSINESS, REVENUE, GOODWILL, OR ANTICIPATED SAVINGS.
SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION OR LIMITATION OF DIRECT,
INCIDENTAL, OR CONSEQUENTIAL DAMAGES, SO SOME OR ALL OF THE ABOVE LIMITATIONS
OR EXCLUSIONS MAY NOT APPLY TO YOU.
Appendix. Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries.
Consult your local IBM representative for information on the products and services currently available in
your area. Any reference to an IBM product, program, or service is not intended to state or imply that
only that IBM product, program, or service may be used. Any functionally equivalent product, program,
or service that does not infringe any IBM intellectual property right may be used instead. However, it is
the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or
service.
IBM may have patents or pending patent applications covering subject matter described in this
document. The furnishing of this document does not grant you any license to these patents. You can send
license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.
For license inquiries regarding double-byte (DBCS) information, contact the IBM Intellectual Property
Department in your country or send inquiries, in writing, to:
Intellectual Property Licensing
Legal and Intellectual Property Law
IBM Japan, Ltd.
3-2-12, Roppongi, Minato-ku, Tokyo 106-8711
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION
PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some
states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this
statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically
made to the information herein; these changes will be incorporated in new editions of the publication.
IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this
publication at any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in
any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of
the materials for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without
incurring any obligation to you.
Licensees of this program who wish to have information about it for the purpose of enabling: (i) the
exchange of information between independently created programs and other programs (including this
one) and (ii) the mutual use of the information which has been exchanged, should contact:
IBM Corporation
Software Interoperability Coordinator, Department YBWA
3605 Highway 52 N
Rochester, MN 55901
U.S.A.
Such information may be available, subject to appropriate terms and conditions, including in some cases,
payment of a fee.
The licensed program described in this document and all licensed material available for it are provided
by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement,
IBM License Agreement for Machine Code, or any equivalent agreement between us.
Any performance data contained herein was determined in a controlled environment. Therefore, the
results obtained in other operating environments may vary significantly. Some measurements may have
been made on development-level systems and there is no guarantee that these measurements will be the
same on generally available systems. Furthermore, some measurements may have been estimated through
extrapolation. Actual results may vary. Users of this document should verify the applicable data for their
specific environment.
Information concerning non-IBM products was obtained from the suppliers of those products, their
published announcements or other publicly available sources. IBM has not tested those products and
cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM
products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of
those products.
All statements regarding IBM's future direction or intent are subject to change or withdrawal without
notice, and represent goals and objectives only.
All IBM prices shown are IBM's suggested retail prices, are current, and are subject to change without
notice. Dealer prices may vary.
This information is for planning purposes only. The information herein is subject to change before the
products described become available.
This information contains examples of data and reports used in daily business operations. To illustrate
them as completely as possible, the examples include the names of individuals, companies, brands, and
products. All of these names are fictitious and any similarity to the names and addresses used by an
actual business enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs
in any form without payment to IBM, for the purposes of developing, using, marketing or distributing
application programs conforming to the application programming interface for the operating platform for
which the sample programs are written. These examples have not been thoroughly tested under all
conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these
programs. The sample programs are provided AS IS, without warranty of any kind. IBM shall not be
liable for any damages arising out of your use of the sample programs.
Each copy or any portion of these sample programs or any derivative work, must include a copyright
notice as follows:
© (your company name) (year). Portions of this code are derived from IBM Corp. Sample Programs.
© Copyright IBM Corp. _enter the year or years_.
If you are viewing this information softcopy, the photographs and color illustrations may not appear.
Programming interface information
This Database performance and query optimization publication documents intended Programming
Interfaces that allow the customer to write programs to obtain the services of IBM i.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be
trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at
Copyright and trademark information at www.ibm.com/legal/copytrade.shtml.
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks
of Adobe Systems Incorporated in the United States, and/or other countries.
Other company, product, or service names may be trademarks or service marks of others.
Terms and conditions
Permission for the use of these publications is granted subject to the following terms and conditions.
Personal Use: You may reproduce these publications for your personal, noncommercial use provided that
all proprietary notices are preserved. You may not distribute, display or make derivative works of these
publications, or any portion thereof, without the express consent of IBM.
Commercial Use: You may reproduce, distribute and display these publications solely within your
enterprise provided that all proprietary notices are preserved. You may not make derivative works of
these publications, or reproduce, distribute or display these publications or any portion thereof outside
your enterprise, without the express consent of IBM.
Except as expressly granted in this permission, no other permissions, licenses or rights are granted, either
express or implied, to the publications or any information, data, software or other intellectual property
contained therein.
IBM reserves the right to withdraw the permissions granted herein whenever, in its discretion, the use of
the publications is detrimental to its interest or, as determined by IBM, the above instructions are not
being properly followed.
You may not download, export or re-export this information except in full compliance with all applicable
laws and regulations, including all United States export laws and regulations.
IBM MAKES NO GUARANTEE ABOUT THE CONTENT OF THESE PUBLICATIONS. THE
PUBLICATIONS ARE PROVIDED AS-IS AND WITHOUT WARRANTY OF ANY KIND, EITHER
EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO IMPLIED WARRANTIES OF
MERCHANTABILITY, NON-INFRINGEMENT, AND FITNESS FOR A PARTICULAR PURPOSE.
Printed in USA