Advanced Query Tuning With IBM Data Studio

The document discusses using IBM Data Studio to improve query performance through SQL tuning. It provides an overview of Data Studio's visual explain feature which can help understand access paths, filter factors, how the optimizer determines access paths, and how to navigate Data Studio for SQL tuning. Examples are given demonstrating how to interpret access paths, predicates, joins, sorts, and predicate generation in Data Studio.

Advanced Query Tuning with IBM Data Studio

David Simpson
Themis Training
dsimpson@themisinc.com
www.themisinc.com
Platform: Both Db2 LUW and z/OS
David Simpson

David Simpson is currently the Vice President of Themis Inc. He teaches courses on SQL,
Application Programming, Database Administration as well as optimization, performance
and tuning. He also installs and maintains the database systems used for training at Themis
and works with our network of instructors to deliver high quality training solutions to our
customers worldwide.

Since 1993 David has worked as a developer and DBA in support of very large
transactional and business intelligence systems. David is a certified DB2 DBA on both z/OS
and LUW. David was voted Best User Speaker and Best Overall Speaker at IDUG North
America 2006. He was also voted Best User Speaker at IDUG Europe 2006 and is a
member of the IDUG Speakers Hall of Fame. David is also an IBM Gold Consultant.

dsimpson@themisinc.com
www.themisinc.com
@ThemisDave
@ThemisTraining

Find my work:
www.themisinc.com/webinars
www.idug.org/content
Objectives
By the end of this presentation, you should:

• Know how to use Data Studio to help improve query performance.
• Know the different access paths and understand how they are presented
• Understand filter factors
• Better understand how the DB2 optimizer determines access paths
• Better understand how to use and navigate Data Studio for SQL tuning
Improving SQL Performance

• System Tuning

• Change the SQL

• Gather / Alter Statistics

• Change Physical Design


Developers Should Focus On

• Appropriate use of indexes
• Predicate Types
• Access Path Choice
• Filter Factors
• Known Statistics
• Data Outliers
• Clustering order of data
• Knowing ‘why’ any table space scan
• Stage 1 Predicates / Stage 2 / Residual
• Minimal Sorts
• Possible Rewrites
Optimizer

Inputs to the optimizer:
• Catalog Statistics
• Object Definitions
• Rid Pools, Sort Pools
• CPU Speed, # CPUs
• Access Path Hint

Output: Access Path
Explain

EXPLAIN PLAN SET QUERYNO = 10 FOR
SELECT LASTNAME, SALARY
FROM EMP
WHERE EMPNO BETWEEN '000000' AND '099999'
AND SALARY < 40000

OR

BIND PACKAGE with option EXPLAIN(YES)

z/OS                          LUW
PLAN_TABLE                    EXPLAIN_STATEMENT
DSN_STATEMNT_TABLE            EXPLAIN_PREDICATE
DSN_FUNCTION_TABLE            & a bunch of “other” tables
& a bunch of “other” tables
Data Studio Explaining Queries

Open Visual Explain

Select Subsystem Connection
z/OS Data Studio Access Path Graphs
LUW Data Studio Access Path Graphs
Data Studio Access Path Graphs

Sources of Data
Data Studio Access Path Graphs

Data Retrieval Operations
z/OS Stage 1, Stage 2 Predicates
z/OS Stage 2 Predicates Rewrites

value NOT BETWEEN COL1 AND COL2


value BETWEEN col expr and col expr
COL op ANY
COL op ALL
COL = SUBSTR(expr,1,x)
YEAR(date column) = expr
DATE(timestamp column) = expr
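
The rewrites for predicates like these usually replace a function applied to a column with an equivalent range on the bare column, which the optimizer can evaluate as an indexable range. The following is a minimal sketch of that idea for the YEAR(date column) = expr case. The helper name `year_predicate_to_range` and the column name HIREDATE are illustrative assumptions, not taken from the slides.

```python
from datetime import date


def year_predicate_to_range(column: str, year: int) -> str:
    """Restate YEAR(column) = year as a BETWEEN range on the bare column.

    Hypothetical helper for illustration only; it builds predicate text
    and does not talk to Db2.
    """
    first_day = date(year, 1, 1)
    last_day = date(year, 12, 31)
    return (
        f"{column} BETWEEN '{first_day.isoformat()}' "
        f"AND '{last_day.isoformat()}'"
    )


# HIREDATE is an assumed column name used only for this example.
print(year_predicate_to_range("HIREDATE", 2020))
# HIREDATE BETWEEN '2020-01-01' AND '2020-12-31'
```

The same pattern applies to DATE(timestamp column) = expr, where the range would run from the first to the last timestamp of the given day.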
LUW Predicates – Sargable (similar to S1)
Data Studio Access Path Graphs

Where do the filter factors come from ?


Is it defaulting to 1/column cardinality?
FIRSTNME has 188 values. 1/188 = .005319
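
The arithmetic on this slide can be checked with a short sketch. It assumes the default estimate for an equality predicate is simply 1 divided by the column cardinality, which is what the slide suggests; the function name is an illustrative choice.

```python
def default_filter_factor(column_cardinality: int) -> float:
    """Default equality filter factor, assuming values are evenly distributed."""
    if column_cardinality <= 0:
        raise ValueError("column cardinality must be positive")
    return 1.0 / column_cardinality


# FIRSTNME has 188 distinct values according to the slide.
ff = default_filter_factor(188)
print(f"{ff:.6f}")  # 0.005319
```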
Data Studio Access Path Graphs
Tablespace Scan
SELECT EMPNO, LASTNAME, SALARY
FROM EMP
WHERE EMPNO BETWEEN '000000' AND '099999'
AND SALARY < 40000

Plan Table
PLANNO  METHOD  TNAME  ACCESSTYPE  MATCHCOLS  ACCESSNAME  INDEXONLY  PREFETCH
1       0       EMP    R           0                      N          S

Data Studio: Estimated number of rows
Why Would the Optimizer Choose a Table Space Scan?

1. Are any predicates coded in a non-indexable way that takes away possible index choices from the optimizer?
2. Do the predicates in the query not match any available indexes on the table? Know your indexes on a table!
3. The table could be small, and Db2 decides a table scan may be faster than index processing.
4. The catalog statistics could say the table is small. This is more common in test environments where the Runstats utility is not executed very often.
5. Are the predicates such that Db2 thinks the query is going to retrieve a large enough amount of data to require a table scan? Some explain tools show the number of rows Db2 thinks will be returned by a query (the IBM Data Studio tool is very good at this).
6. Are the predicates such that Db2 picks a non-clustered index, and the rows needed are scattered throughout the table such that the number of data pages to retrieve is high enough, based on the total number of pages in the table, to require a table scan? Know how data is physically clustered in the tablespace!
7. Are the tablespace files or index files physically out of shape and in need of a REORG?
8. Are there no predicates? Then the query wants all the rows.
9. Sometimes there are just too many conditions in the logic to return the results needed any other way. This is quite typical with many predicates that are OR’d together.
z/OS Index Scan - Matching
SELECT * FROM EMP
WHERE LASTNAME = ?
AND FIRSTNME = 'Michelle';

PLAN_TABLE
PLANNO  METHOD  TNAME  ACCESSTYPE  MATCHCOLS  ACCESSNAME  INDEXONLY  PREFETCH
1       0       EMP    I           2          XEMP03      N

Notice FF of .00105 for LASTNAME Predicate. 1/947 = .00105
z/OS Index Screening

INDEX XEMP03 on (LASTNAME, FIRSTNME, MIDINIT)

SELECT * FROM EMP
WHERE LASTNAME = 'Coldsmith'
AND MIDINIT = 'R';

Index Screening Predicate: MIDINIT = 'R'

PLAN_TABLE
PLANNO  METHOD  TNAME  ACCESSTYPE  MATCHCOLS  ACCESSNAME  INDEXONLY  PREFETCH
1       0       EMP    I           1          XEMP03      N
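
Why this query gets one matching column and one screening predicate can be sketched as follows. This is a simplified model that assumes only equality predicates: matching columns are the unbroken leading prefix of the index that has predicates, and any later index column with a predicate is screened. The function name is hypothetical, and the real optimizer rules for ranges, IN-lists, and other predicate types are richer than this.

```python
def split_index_predicates(index_columns, equality_columns):
    """Return (matching, screening) columns for equality predicates on one index.

    Simplified illustration: the matching prefix ends at the first index
    column that has no predicate.
    """
    matching = []
    for col in index_columns:
        if col in equality_columns:
            matching.append(col)
        else:
            break  # the first gap ends the matching prefix
    remaining = index_columns[len(matching):]
    screening = [col for col in remaining if col in equality_columns]
    return matching, screening


xemp03 = ["LASTNAME", "FIRSTNME", "MIDINIT"]
matching, screening = split_index_predicates(xemp03, {"LASTNAME", "MIDINIT"})
print(len(matching), screening)  # 1 ['MIDINIT']
```

With predicates on FIRSTNME and MIDINIT only, the same function returns no matching columns, which corresponds to a nonmatching index scan.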
z/OS Index Screening
LUW Index Scan - Matching (Start/Stop Keys)
SELECT * FROM EMP
WHERE EMPNO BETWEEN '000000' AND '099999'

LUW Index Screening
SELECT * FROM EMP
WHERE LASTNAME = 'Smith'
AND MIDINIT = 'R'
z/OS Index Scan - Nonmatching

SELECT * FROM EMP
WHERE FIRSTNME = 'Michelle'
AND MIDINIT = 'R';

PLAN_TABLE
PLANNO  METHOD  TNAME  ACCESSTYPE  MATCHCOLS  ACCESSNAME  INDEXONLY  PREFETCH
1       0       EMP    I           0          XEMP03      N
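
The plan table rows shown so far differ only in access type and matching columns. A small sketch of how those values map to the access paths named in the slide titles is given below. It covers only the access types R and I that appear in this deck; other access type codes exist and are reported as "other" here. The function name is an illustrative choice.

```python
def describe_access(access_type: str, match_cols: int, index_only: str) -> str:
    """Translate a few PLAN_TABLE values into the access path names used here."""
    if access_type == "R":
        return "table space scan"
    if access_type == "I":
        if match_cols > 0:
            kind = "matching index scan"
        else:
            kind = "nonmatching index scan"
        if index_only == "Y":
            kind += " (index only)"
        return kind
    return f"other access type {access_type!r}"


# (access type, matching columns, index only) from the rows in these slides
plan_rows = [("R", 0, "N"), ("I", 2, "N"), ("I", 1, "N"), ("I", 0, "N")]
for row in plan_rows:
    print(row, "->", describe_access(*row))
```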
LUW Index Scan – Non Matching (Sargable)
SELECT * FROM EMP
WHERE FIRSTNME = 'Michelle'
AND MIDINIT = 'R';

No Start/Stop
Index Only Access

SELECT LASTNAME, FIRSTNME, MIDINIT
FROM EMP
WHERE LASTNAME LIKE 'Jo%'
z/OS Nested Loop Join
Nested Loop Join

LUW Nested Loop Join
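
The slides show this join method only as access path graphs. As a rough sketch of the general technique, a nested loop join reads each qualifying outer row and looks for matching rows in the inner table. The sample rows below are made up for illustration, and the inner table is rescanned for every outer row, whereas Db2 would normally probe an index on the inner table's join column when one is available.

```python
def nested_loop_join(outer_rows, inner_rows, join_column):
    """Join two lists of dict rows on join_column using nested loops."""
    joined = []
    for outer in outer_rows:          # outer table: read once
        for inner in inner_rows:      # inner table: visited per outer row
            if outer[join_column] == inner[join_column]:
                joined.append({**outer, **inner})
    return joined


# Hypothetical sample data, not taken from the presentation.
dept = [
    {"DEPTNO": "A00", "DEPTNAME": "HQ"},
    {"DEPTNO": "B01", "DEPTNAME": "PLANNING"},
]
emp = [
    {"EMPNO": "000010", "DEPTNO": "A00"},
    {"EMPNO": "000020", "DEPTNO": "B01"},
    {"EMPNO": "000030", "DEPTNO": "A00"},
]

result = nested_loop_join(dept, emp, "DEPTNO")
print([row["EMPNO"] for row in result])  # ['000010', '000030', '000020']
```

The cost grows with the number of outer rows times the cost of each inner lookup, which is why filtering on the outer table and a usable index on the inner table matter for this method.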


Which Join Method

1) Depends on the predicates
2) How much filtering on the tables
3) Possible indexes
4) Optimization level
5) Clustering of table data
Sort Activities

Data Sorts
• ORDER BY
• GROUP BY
• DISTINCT
• UNION
• Subqueries
• JOIN

RID Sorts
• List Prefetch
• Multiple Index Access
• Hybrid Join
z/OS Data Sorts via Data Studio
LUW Data Sorts via Data Studio
Predicate Generation Through Transitive Closure

The Premise
If A must equal B
And A is RED,
Then B must also be RED.
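
The premise can be sketched as a small routine that derives extra column-equals-literal predicates from column-equals-column predicates. This is only an illustration of the logic, under the assumption that equality predicates are the only inputs; it is not how Db2 implements predicate generation, and the function name is hypothetical.

```python
def generate_transitive_predicates(column_equalities, literal_equalities):
    """Derive new col = literal predicates implied by col = col predicates.

    column_equalities: list of (column_a, column_b) pairs
    literal_equalities: dict mapping column -> literal value
    Returns only the newly derived column -> literal entries.
    """
    known = dict(literal_equalities)
    changed = True
    while changed:  # repeat until no new predicate can be derived
        changed = False
        for col_a, col_b in column_equalities:
            for source, target in ((col_a, col_b), (col_b, col_a)):
                if source in known and target not in known:
                    known[target] = known[source]
                    changed = True
    return {col: val for col, val in known.items() if col not in literal_equalities}


# DEPTNO = ADMRDEPT AND ADMRDEPT = 'A00' implies DEPTNO = 'A00'
derived = generate_transitive_predicates(
    [("DEPTNO", "ADMRDEPT")], {"ADMRDEPT": "A00"}
)
print(derived)  # {'DEPTNO': 'A00'}
```

The derived predicate matters because it gives the optimizer a filter on a second column, and therefore a second index to consider.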
Predicate Generation Through Transitive Closure

Single Table

Index XDEPT1 on DEPTNO
Index XDEPT3 on ADMRDEPT

Original query:
SELECT . . . .
FROM DEPT
WHERE DEPTNO = ADMRDEPT
AND ADMRDEPT = 'A00' ;

With DB2 Generated Predicate:
SELECT . . . .
FROM DEPT
WHERE DEPTNO = ADMRDEPT
AND ADMRDEPT = 'A00'
AND DEPTNO = 'A00' ;

XDEPT1 index chosen !


Predicate Generation Through Transitive Closure
DB2 Generated Predicate

Index XDEPT1 on DEPTNO
Index XDEPT3 on ADMRDEPT

Original query (DEPT joined to EMP):
SELECT . . . .
FROM DEPT D, EMP E
WHERE D.DEPTNO = E.DEPTNO
AND D.DEPTNO IN ('A00', 'B01', 'C11') ;

With DB2 Generated Predicate:
SELECT . . . .
FROM DEPT D, EMP E
WHERE D.DEPTNO = E.DEPTNO
AND D.DEPTNO IN ('A00', 'B01', 'C11')
AND E.DEPTNO IN ('A00', 'B01', 'C11') ;
Predicate Transitive Closure

SELECT . . . .
FROM DEPT
WHERE DEPTNO = ADMRDEPT
AND ADMRDEPT = 'A00' ;

Note: Index on DEPTNO chosen
LUW Predicate Transitive Closure
Tuning a Query

Tune a Query
Tuning a Query
Tuning a Query Output

All the output from ‘Tune a Query’.

Note that some items are not available in the ‘no charge’ version.
Statistics Advisor

Opens Advisor details tab to view information

Click on this line to expand statistics recommendations
Statistics Advisor Summary Report
z/OS Tune a query – Query Transformation
z/OS Tune a query – Query Transformation

Note: Non Correlated
Note: Correlated
Case #1: 2 Possible Indexes

SELECT * FROM EMP
WHERE LASTNAME = 'Smith'
AND FIRSTNME = 'Joe'
AND DEPTNO = 'A00';

XEMP02 Col: 51,834 * .0092 = 477 rows
XEMP03 Cols: 51,834 * .001 * .0057 = < 1 row (Winner!)
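
The two estimates on this slide are the table cardinality multiplied by the filter factor of each predicate the index can apply. A minimal sketch of that arithmetic follows; it assumes the predicates are treated as independent, which is what multiplying the filter factors implies. The function name is an illustrative choice.

```python
from math import prod


def estimated_rows(table_cardinality: int, filter_factors) -> float:
    """Estimated qualifying rows: cardinality times every filter factor."""
    return table_cardinality * prod(filter_factors)


xemp02_estimate = estimated_rows(51_834, [0.0092])          # DEPTNO predicate
xemp03_estimate = estimated_rows(51_834, [0.001, 0.0057])   # LASTNAME, FIRSTNME

print(round(xemp02_estimate))   # 477
print(xemp03_estimate < 1)      # True
```

The index with the smaller estimate looks cheaper to the optimizer, so the quality of the choice depends on how close these filter factors are to the real data.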
Comparing Estimates with Reality

SELECT COUNT(*)
FROM EMP
WHERE LASTNAME = 'Smith'
AND FIRSTNME = 'Joe';
Actual: 1,289 rows (Db2 est. <1)

SELECT COUNT(*)
FROM EMP
WHERE DEPTNO = 'A00';
Actual: 4 rows (Db2 est. 477)
Additional Statistics

A00 has far fewer rows than the average dept:
RUNSTATS INDEX (THEMIS82.XEMP02)
KEYCARD FREQVAL NUMCOLS 1 COUNT 5 BOTH

Joe Smith has far more rows than the average name combination:
RUNSTATS INDEX (THEMIS82.XEMP03)
KEYCARD FREQVAL NUMCOLS 2 COUNT 5 MOST
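
The effect of frequency statistics on the estimates can be sketched as a lookup that prefers a recorded frequency over the uniform default. This is a simplified model: it assumes the skewed values are among those captured by FREQVAL and that the recorded frequency equals the actual row counts quoted earlier (4 and 1,289 of 51,834 rows). The function and variable names are hypothetical.

```python
def filter_factor(value, default_ff, frequencies=None):
    """Use a collected frequency for the value if one exists, else the default."""
    if frequencies and value in frequencies:
        return frequencies[value]
    return default_ff


TABLE_ROWS = 51_834

# Frequencies as fractions of the table, derived from the actual counts.
dept_frequencies = {"A00": 4 / TABLE_ROWS}
name_frequencies = {("Smith", "Joe"): 1_289 / TABLE_ROWS}

dept_estimate = TABLE_ROWS * filter_factor("A00", 0.0092, dept_frequencies)
name_estimate = TABLE_ROWS * filter_factor(
    ("Smith", "Joe"), 0.001 * 0.0057, name_frequencies
)

print(round(dept_estimate), round(name_estimate))  # 4 1289
```

With these corrected estimates the DEPTNO predicate is the far more selective one, which is the opposite of the ranking produced by the default filter factors.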
Frequency Stats
Result of Adding Frequency Stats

Better Index Chosen
Case #2: Join Order

SELECT *
FROM EMP E, PROJ P
WHERE E.EMPNO = P.EMPNO
AND E.COMM = 0
AND P.PRSTAFF > 3
Comparing Estimates with Reality

SELECT COUNT(*)
FROM EMP
WHERE COMM = 0;
Actual (# Rows on EMP): 51,803 rows (Db2 est. 1,620)

SELECT COUNT(*)
FROM PROJ
WHERE PRSTAFF > 3;
Actual: 14,598 rows (Db2 est. 50,242 * .2906 = 14,600, where .2906 is the Filter Factor)
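
Comparing estimates with reality amounts to checking how far each estimate is from the actual count. The sketch below computes that ratio for the two predicates in this case. The threshold of 10 used to flag a predicate is an arbitrary assumption for illustration, not a Db2 or Data Studio rule, and the function name is hypothetical.

```python
def misestimate_factor(estimated: float, actual: float) -> float:
    """Ratio of the larger to the smaller of estimate and actual (at least 1)."""
    low, high = sorted((max(estimated, 1), max(actual, 1)))
    return high / low


# (Db2 estimate, actual COUNT(*)) taken from the figures in this case
checks = {
    "EMP: COMM = 0": (1_620, 51_803),
    "PROJ: PRSTAFF > 3": (14_600, 14_598),
}

for label, (estimate, actual) in checks.items():
    factor = misestimate_factor(estimate, actual)
    verdict = "investigate statistics" if factor >= 10 else "ok"
    print(f"{label}: off by {factor:.1f}x -> {verdict}")
```

Only the COMM predicate is badly misestimated, which points to the statistics on that column rather than to the PROJ side of the join.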
Additional Statistics

COMM = 0 appears far more often than other values appear

RUNSTATS TABLESPACE THEMIS82.TS00EMP
TABLE(THEMIS82.EMP)
COLUMN(COMM)
COLGROUP(COMM) FREQVAL COUNT 1
SORTDEVT SYSDA SORTNUM 4

Runstats needed on non-uniform distribution of COMM data
Adjusted Join Order

SELECT *
FROM EMP E, PROJ P
WHERE E.EMPNO = P.EMPNO
AND E.COMM = 0
AND P.PRSTAFF > 3
Thank You for Attending!

“There is always time for an Explain”

“I have noticed that when the developers get educated, good SQL programming standards are in place, program walkthroughs and Explains are executed correctly, incident reporting stays low, CPU costs do not get out of control, and most performance issues are found before promoting code to production.”
David Simpson
Themis Inc.
dsimpson@themisinc.com

Tony Andrews
Themis Inc.
tandrews@themisinc.com
