IBM DB2 RUNSTATS Utility and Real-Time Statistics
Bryan F. Smith IBM Tuesday, August 14, 2007 Tuesday, 6 November 2007 11:45 am 12:45 pm Session 1316
Platform: DB2 for z/OS
Abstract
This presentation reviews the basics of the RUNSTATS utility (what it does, why you need to run it, and how DB2 uses the information) and explores new statistics collected on data and indexes, including partition-level information on Data Partitioned Secondary Indexes, non-uniform distribution statistics on non-indexed columns, and historical statistics. The real-time statistics are also reviewed. Upon completion of this session, the attendee, whose skill level may range from low to high, will understand how to get the most out of DB2's statistics and operate at optimal efficiency.
Topics
Why RUNSTATS?
Invoking RUNSTATS
Commonly asked questions (about the stats)
Real-time Statistics
Rebinding considerations
Reorg recommendations
When is RUNSTATS needed?
New/changed data statistics
New/changed index statistics
Handling part level statistics for DPSIs
Distribution Statistics Enhanced
HISTORY statistics changes
Flushing the dynamic statement cache
What statistics should I gather?
Why RUNSTATS?
The RUNSTATS utility computes statistics on a specified table space or index and updates the DB2 catalog
Two types of statistics
Access path statistics
Those used by BIND/PREPARE in its process of optimization to determine access path (some can also be used to help determine when to reorg)
Space
Those used by the DBA to monitor space usage; to assist in capacity planning; to help determine when to reorg; etc.
Statistics gathered by RUNSTATS
[Figure: catalog tables updated by RUNSTATS. The legend marks each column as an access path statistic, an access path statistic that is not used, or a space statistic; shows whether a table is in DSNDB06.SYSDBASE, DSNDB06.SYSHIST, or DSNDB06.SYSSTATS; shows whether values are collected from the table space scan or the index scan; and shows which tables are aggregates of others.]
SYSIBM.SYSTABLES_HIST: CARD/F, NPAGES/F, PCTPAGES, PCTROWCOMP, AVGROWLEN, SPACEF
SYSIBM.SYSTABLESPACE: NACTIVE/F, AVGROWLEN, SPACEF
SYSIBM.SYSINDEXES_HIST: CLUSTERRATIO/F, CLUSTERED, FIRSTKEYCARD/F, FULLKEYCARD/F, NLEAF, NLEVELS, AVGKEYLEN, SPACEF
SYSIBM.SYSINDEXPART_HIST: AVGKEYLEN, CARDF, DSNUM, EXTENTS, FAROFFPOSF, LEAFNEAR, LEAFFAR, NEAROFFPOS, LEAFDIST, PSEUDO_DEL_ENTRIES, SPACEF, PQTY, SECQTYI
SYSIBM.SYSTABSTATS_HIST: CARD/F, NPAGES, PCTPAGES, NACTIVE, PCTROWCOMP
SYSIBM.SYSTABLEPART_HIST: AVGROWLEN, CARD/F, DSNUM, EXTENTS, NEARINDREF, FARINDREF, PAGESAVE, PERCACTIVE, PERCDROP, SPACE/F, PQTY, SQTY, SECQTYI
SYSIBM.SYSINDEXSTATS_HIST: FIRSTKEYCARD/F, FULLKEYCARD/F, NLEAF, NLEVELS, IOFACTOR, PREFETCHFACTOR, KEYCOUNT/F, CLUSTERRATIO/F, FULLKEYCARDDATA
SYSIBM.SYSCOLUMNS_HIST: COLCARD/F, HIGH2KEY, LOW2KEY, STATS_FORMAT
SYSIBM.SYSCOLSTATS: COLCARD, HIGHKEY, HIGH2KEY, LOWKEY, LOW2KEY, COLCARDDATA, STATS_FORMAT
SYSIBM.SYSCOLDIST_HIST: CARDF, COLGROUPCOLNO, COLVALUE, TYPE, FREQUENCY/F, NUMCOLUMNS
SYSIBM.SYSCOLDISTSTATS: CARDF, COLGROUPCOLNO, COLVALUE, TYPE, FREQUENCY/F, NUMCOLUMNS, KEYCARDDATA
SYSIBM.SYSLOBSTATS_HIST: FREESPACE, ORGRATIO, AVGSIZE
Invoking RUNSTATS
[Syntax diagram annotations]
RUNSTATS TABLESPACE: scans the table space
RUNSTATS INDEX: scans the index
Column specifications, including colgroup-spec: affect the collection of column statistics from the table space scan (expensive)
KEYCARD (Recommended)
Collects all of the distinct values in all of the 1 to n key column combinations for the specified indexes, where n is the number of columns in the index. For example, suppose that you have an index defined on three columns: A, B, and C. If you specify KEYCARD, RUNSTATS collects cardinality statistics for column A, column set A and B, and column set A, B, and C. So these are cardinality statistics across column sets. If we had a 3-column index that had these values:
Col1  Col2  Col3
A     B     C
A     B     D
A     B     E
A     B     E
A     C     A
A     C     A
A     D     A
B     B     B
then these stats would be collected:
Col1 cardinality = 2
Col1 and Col2 cardinality = 4
Col1, Col2, and Col3 cardinality = 6
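The KEYCARD arithmetic above can be checked with a short Python sketch. This is only an illustration of counting distinct leading-column combinations; the function name keycard and the row layout are assumptions made for this example, not anything RUNSTATS exposes.

```python
def keycard(rows):
    """Return the number of distinct values for each leading column prefix.

    Element n-1 of the result is the cardinality of the first n key columns,
    which is the quantity KEYCARD is described as collecting.
    """
    ncols = len(rows[0])
    return [len({row[:n] for row in rows}) for n in range(1, ncols + 1)]


# The eight index entries from the example (Col1, Col2, Col3).
index_rows = [
    ("A", "B", "C"),
    ("A", "B", "D"),
    ("A", "B", "E"),
    ("A", "B", "E"),
    ("A", "C", "A"),
    ("A", "C", "A"),
    ("A", "D", "A"),
    ("B", "B", "B"),
]

cards = keycard(index_rows)
print(cards)
```

For the example rows this gives 2, 4, and 6, matching the cardinalities listed above.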
Commonly asked questions about the stats
What is SYSIBM.SYSINDEXPART.LEAFDIST?
LEAFDIST is 100 times the average number of pages between successive leaf pages of the index
LEAFDIST = 100 x (summation of distance between pages) / (number of leaf pages)
index leaf pages: 1 2 3 4 5 6 7 8 9
gaps: 0 0 0 0 0 0 0 0
number of leaf pages = 9
summation of gaps = 0
LEAFDIST = 100 * (0/9) = 0 (%)
Another example of LEAFDIST
index leaf pages: 1 2 4 8 9
gaps: 0 1 3 0
number of leaf pages = 5
summation of gaps = 4
LEAFDIST = 100 * (4/5) = 80
If there were more gaps than active pages, LEAFDIST would be larger
FREEPAGE on an index can certainly affect the calculation of LEAFDIST
We used to use this value to determine when to reorg an index, but now we have better stats to determine this (LEAFFAR/NEAR)
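The two LEAFDIST examples can be reproduced with a small Python sketch. It follows the simplified formula on these slides (gap = number of pages skipped between successive leaf pages); the function name leafdist is an assumption for illustration, and this is not a statement of how DB2 computes the catalog value internally.

```python
def leafdist(leaf_pages):
    """100 x (sum of gaps between successive leaf pages) / (number of leaf pages)."""
    pages = list(leaf_pages)
    # A gap is the count of pages skipped between two successive leaf pages.
    gaps = [nxt - cur - 1 for cur, nxt in zip(pages, pages[1:])]
    return 100 * sum(gaps) / len(pages)


contiguous = leafdist(range(1, 10))      # leaf pages 1..9, no gaps
scattered = leafdist([1, 2, 4, 8, 9])    # gaps 0, 1, 3, 0
print(contiguous, scattered)
```

The first call corresponds to the nine contiguous leaf pages (LEAFDIST 0) and the second to the five scattered pages (LEAFDIST 80).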
What is SYSIBM.SYSINDEXPART.LEAFNEAR and LEAFFAR?
LEAFNEAR/FAR measure the disorganization of physical leaf pages
Number of pages that are not in an optimal position due to
index pages being deleted or index leaf page splits caused by an insert that cannot fit onto a full page
Logical and physical views of an index in which LEAFNEAR=1 and LEAFFAR=3
SYSIBM.SYSINDEXES.CLUSTERRATIO
An access path statistic that can also help in determining when to reorg
% of the rows that are in cluster order
Rows are counted as being clustered if they are on a page number greater than or equal to that of the previous row
This is a statistic that describes the data in the table(space), even though it is reported in SYSINDEXES
REORG INDEX will never affect this statistic
CLUSTERRATIO
[Figure: cluster count example]
Data pages:
page 1: A B D
page 2: E F K H
page 3: I C
page 4: J L G
Clustering index (key, page#), with < marking each entry that increments the cluster count (CC):
A, 1 <B, 1 <C, 3 D, 1 <E, 2 <F, 2 <G, 4 H, 2 <I, 3 <J, 4 K, 2 <L, 4
CC incremented 8 times
Optimal would be 11
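The cluster count in this example can be reproduced with a few lines of Python. This sketch only applies the counting rule stated earlier (an entry counts when its page number is greater than or equal to the previous entry's page number); the function name cluster_count is an assumption for illustration, and the sketch does not attempt the CLUSTERRATIO formula itself.

```python
def cluster_count(index_entries):
    """Count entries whose page number is >= the page number of the previous entry."""
    pages = [page for _key, page in index_entries]
    return sum(1 for prev, cur in zip(pages, pages[1:]) if cur >= prev)


# Clustering index entries (key, page#) in key order, as in the figure.
entries = [
    ("A", 1), ("B", 1), ("C", 3), ("D", 1), ("E", 2), ("F", 2),
    ("G", 4), ("H", 2), ("I", 3), ("J", 4), ("K", 2), ("L", 4),
]

count = cluster_count(entries)
optimal = len(entries) - 1   # every entry after the first would increment the count
print(count, optimal)
```

With the twelve entries from the figure, the count is 8 against an optimal 11.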
How do NEAR|FAR INDREF and NEAR|FAR OFFPOS contribute to CLUSTERRATIO?
*INDREF correlates closely with the cluster count if the keys are in cluster order and rows are then relocated to another page, but we can create cases where these stats are correlated and cases where they are not
*OFFPOS directly affects the cluster count. A single jump counts as two OFFPOS, so almost always the cluster count falls short of optimal by half of the sum of the *OFFPOS.
SYSIBM.SYSTABLEPART_HIST
NEARINDREF FARINDREF
SYSIBM.SYSINDEXES_HIST
CLUSTERRATIO/F
SYSIBM.SYSINDEXPART_HIST
FAROFFPOSF NEAROFFPOS
Example where INDREF is correlated with Cluster Count -> CLUSTERRATIO
[Figure: four data pages, each with PCTFREE free space; X marks relocated rows (INDREF)]
Clustering index (key, page#), with < marking each entry that increments the cluster count:
A, 1 <B, 1 <C, 3 D, 1 <E, 2 <F, 2 <G, 4 H, 2 <I, 3 <J, 4 K, 2 <L, 4
Cluster count = 8
Optimal would be 11
Example where INDREF is not correlated with Cluster Count -> CLUSTERRATIO
[Figure: four data pages, each with PCTFREE free space; X marks relocated rows (INDREF)]
page 1: A B C
page 2: D E F
page 3: G H I
page 4: J K L
Clustering index (key, page#), with < marking each entry that increments the cluster count:
A, 1 <B, 1 <C, 1 <D, 2 <E, 2 <F, 2 <G, 3 <H, 3 <I, 3 <J, 4 <K, 4 <L, 4
Cluster count = 11 (optimal): cluster count is perfect
Example where OFFPOS is correlated with Cluster Count -> CLUSTERRATIO
[Figure: four data pages; six X marks show the OFFPOS counts]
page 1: A B D
page 2: E F K H
page 3: I C
page 4: J L G
Clustering index (key, page#), with < marking each entry that increments the cluster count:
A, 1 <B, 1 <C, 3 D, 1 <E, 2 <F, 2 <G, 4 H, 2 <I, 3 <J, 4 K, 2 <L, 4
Cluster count = 8
Optimal would be 11
OFFPOS = 6, and OFFPOS / 2 = 3
So, Cluster Count is off by 3
Exercise for the reader
We just saw an example where *OFFPOS is correlated to the cluster count (which is used to compute CLUSTERRATIO). Can an example be created showing non-correlation between these two metrics?
Commonly asked questions
Can you collect stats and have them stored in the catalog without affecting any binds/prepares?
Yes (by specifying REPORT YES UPDATE NONE or UPDATE NONE HISTORY ALL)
Should you collect statistics on the DB2 Catalog?
Yes.
Will it benefit DB2 processing like BIND or PREPARE?
No, but SQL against the catalog can benefit
Is there any difference between running
RUNSTATS TABLESPACE DB1.TS1 INDEX (ALL)
vs.
RUNSTATS TABLESPACE DB1.TS1
RUNSTATS INDEX(ALL) TABLESPACE DB1.TS1
?
No, they are semantically equivalent, but you could run the latter two utility statements in parallel to reduce overall elapsed time
Can/should you update the statistics in the DB2 Catalog?
It depends
What is the semantic difference between RUNSTATS TABLESPACE and RUNSTATS TABLESPACE TABLE (ALL)?
The TABLE keyword triggers collection of column statistics
Extra credit
Is there any difference between running
RUNSTATS TABLESPACE DB1.TS1 TABLE (ALL) INDEX (ALL)
vs.
RUNSTATS TABLESPACE DB1.TS1 TABLE (ALL)
RUNSTATS INDEX(ALL) TABLESPACE DB1.TS1
?
There is a difference. What is it?
Real-time Statistics
Introduced in V7
Contain space and some access path statistics in user-defined tables:
SYSIBM.TABLESPACESTATS (one row per partition)
SYSIBM.INDEXSPACESTATS (one row per partition)
In DB2 9, these are moved into the DB2 Catalog (DSNDB06.SYSRTSTS) as
SYSIBM.SYSTABLESPACESTATS
SYSIBM.SYSINDEXSPACESTATS
Intended to eliminate running RUNSTATS solely to decide which utilities to run (running utilities by exception)
Access path selection doesn't use RTS in V7, V8 or V9
[Figure: real-time statistics tables, in DSNRTSDB.DSNRTSTS (V7/V8) or DSNDB06.SYSRTSTS (V9). One item in the figure is labeled "New in V9".]
SYSIBM.SYSTABLESPACESTATS, index SYSIBM.DSNRTX01 (dbid, psid, partition, instance): Reorg, Runstats, Copy, Global, and Incremental statistics
SYSIBM.SYSINDEXSPACESTATS, index SYSIBM.DSNRTX02 (dbid, isobid, partition, instance): Reorg, Runstats, Copy, Global, and Incremental statistics
RTS
SYSTABLESPACESTATS
Global: NACTIVE, NPAGES, EXTENTS, SPACE, TOTALROWS, DATASIZE, UNCOMPRESSEDDATASIZE, UPDATESTATSTIME
Incremental Statistics
REORG Statistics: LASTTIME, INSERTS, UPDATES, DELETES, DISORGLOB, UNCLUSTINS, MASSDELETE, NEARINDREF, FARINDREF
COPY Statistics: LASTTIME, UPDATEDPAGES, CHANGES, UPDATELRSN, UPDATETIME
RUNSTATS Statistics: LASTTIME, INSERTS, UPDATES, DELETES, MASSDELETE
SYSINDEXSPACESTATS
Global: NACTIVE, NLEVELS, NPAGES, NLEAF, EXTENTS, SPACE, TOTALENTRIES, LASTUSED, UPDATESTATSTIME
Incremental Statistics
REORG Statistics: REBUILDLASTTIME, LASTTIME, INSERTS, UPDATES, DELETES, APPENDINSERT, PSEUDODELETES, MASSDELETE, LEAFNEAR, LEAFFAR, NUMLEVELS
COPY Statistics: LASTTIME, UPDATEDPAGES, CHANGES, UPDATELRSN, UPDATETIME
RUNSTATS Statistics: LASTTIME, INSERTS, DELETES, MASSDELETE
Enable/Disable Real Time Statistics in V7/V8
START DATABASE (DSNRTSDB)
Validate table space, table and index definitions
Enable real-time statistics collection
Issue this command to enable RTS after the statistics tables and indexes are first created
Data may not be accurate until a new REORG/RUNSTATS/COPY is done
START DB2
Implicitly enables real-time statistics if DSNRTSDB is not STOPPED and the DB2 Catalog is accessible
STOP DATABASE(DSNRTSDB)
Flush all in-memory statistics
In V9, RTS are a part of the catalog and are always enabled
Collect Real Time Statistics in Memory
Data Sharing Member DB2A
Real-time Statistics Tables
Data Sharing Member DB2B
Allocate RTS blocks
At first update for table spaces since the pageset/partition is opened
At open time for indexes, since we collect SYSINDEXSPACESTATS.LASTUSED
In DBM1 address space (~140 bytes per pageset/partition; moved above the bar in V9: 0/32KB per pageset/partition above the bar)
Free RTS blocks when
Pagesets/Partitions are closed After statistics are written to RTS tables
In a data sharing system, statistics are collected by each member
In-memory statistics are always collected even if RTS is not enabled
When to externalize in-memory statistics?
On a timer interval
STATSINT in ZPARM - default 30 minutes
REAL TIME STATS in DSNTIPO install panel
Range: 1 to 1,440 minutes
STOP/START DATABASE SPACENAM command
Flush in-memory statistics for all target objects
STOP/START DATABASE(DSNRTSDB) in V7/8
Flush all in-memory statistics
STOP DB2 MODE(QUIESCE)
A utility operation (e.g. LOAD, REORG, RUNSTATS, COPY, REBUILD, RECOVER)
Process to externalize in-memory statistics
RTS manager externalizes in-memory statistics to the RTS Tables
RTS manager runs under a system task in DBM1 address space
CPU time is included in DBM1's SRB time
The system task is created during START DB2
RTS manager is triggered on a timer interval
Default is 30 minutes
Scan in-memory statistics blocks
Free dormant statistics blocks that belong to closing data sets
Order active statistics blocks in clustering order
Insert/update rows in the RTS tables via the clustering index
Each data sharing member externalizes its own statistics
When to collect statistics for DB2 Objects?
Newly created table spaces and indexes
Rows are inserted into RTS tables at CREATE
Loadrlasttime and Reorglasttime are set to the CREATE timestamp
Stats/Copylasttime are set to NULL
Totalrows/Totalentries are set to zero, all other global counters are set to null or a known value, incremental counters are set to zero
Table spaces and indexes that existed before RTS was enabled
Rows are inserted when the objects are first updated
At the next STATSINT timer interval
All statistics values are set to NULL (except for Nactive, Space, Extents)
Reorg/Stats/Copy/Loadr-lasttime are set to NULL
Statistics values will be set after the first REORG, RUNSTATS, or COPY
No RTS rows for table space objects that are only read (LASTUSED will be updated for read-only indexes)
How SQL affects table space statistics?
CREATE/DROP TABLESPACE
Insert/delete a row in SYSIBM.TABLESPACESTATS
Insert
Increment Inserts, Totalrows, Copy Changes counters
May update Nactive, Space, Extents, Uncluster_Inserts, Distinct Updated Pages, Update LRSN, Update Timestamp, Datasize
Update
Increment Updates, Copy Changes counters
May update NearIndRef/FarIndRef, Nactive, Space, Extents for VARCHAR Tables, Distinct Updated Pages, Update LRSN, Update Timestamp, Datasize
Delete
Increment Deletes, Copy Changes counters, Datasize
How SQL affects table space statistics? ...
Delete without the WHERE clause or DROP TABLE for Segmented Table Spaces
Increments the Mass Deletes/Drops counter
Rollback
Insert
Increment Deletes counter
Delete
Increment Inserts counter
Update
Increment Update counter
Mass Delete/Drop Table
Will not decrement the Mass Deletes/Drops counter
Statistics counters will not be updated during DB2 Restart
Triggers may cause statistics to be updated for other tables
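The counter rules above, including the rollback behavior, can be modeled with a toy Python class. This is a sketch of the rules as stated on these slides only: the class and method names are hypothetical, and only the Inserts, Updates, Deletes, and Mass Deletes counters are modeled.

```python
from dataclasses import dataclass


@dataclass
class TableSpaceCounters:
    """Toy model of the incremental counters; names are illustrative only."""
    inserts: int = 0
    updates: int = 0
    deletes: int = 0
    mass_deletes: int = 0

    def apply(self, op, rollback=False):
        """Record an operation, or the rollback of an earlier operation."""
        if op == "mass_delete":
            # A rollback does not decrement the mass delete counter.
            if not rollback:
                self.mass_deletes += 1
            return
        if op not in ("insert", "update", "delete"):
            raise ValueError(f"unknown operation: {op}")
        if rollback:
            # Rolling back an insert counts as a delete and vice versa;
            # rolling back an update counts as another update.
            op = {"insert": "delete", "delete": "insert", "update": "update"}[op]
        if op == "insert":
            self.inserts += 1
        elif op == "update":
            self.updates += 1
        else:
            self.deletes += 1


counters = TableSpaceCounters()
counters.apply("insert")                       # Inserts = 1
counters.apply("insert", rollback=True)        # Deletes = 1
counters.apply("update", rollback=True)        # Updates = 1
counters.apply("mass_delete")                  # Mass Deletes = 1
counters.apply("mass_delete", rollback=True)   # Mass Deletes stays 1
print(counters)
```

The point of the model is that the counters only ever grow: a rolled-back insert leaves both Inserts and Deletes incremented, which is why the counters measure activity rather than net change.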
How SQL affects index space statistics?
CREATE/DROP INDEX
Insert/delete a row in/from SYSIBM.INDEXSPACESTATS
Insert
Increment Inserts,TotalEntries counters May update Append_Inserts, LeafNear, LeafFar, ReorgNumLevels, Nactive, Space, Extents, Nleaf
Delete
Increment Deletes counter May update Pseudo Deletes, ReorgNumLevels
COPY YES indexes (Insert/Delete)
Maintain Copy Changes, Distinct Updated Pages, Update LRSN, Update Timestamp
Delete without a WHERE clause or DROP TABLE
Increment Mass Deletes counter
Rollbacks/Restart - same as for table space statistics
How Utility affects real-time statistics?
REORG
Set Last_REORG_Timestamp
Reset REORG related statistics
Log apply changes for online REORG will be treated as Inserts/Deletes/Updates
RUNSTATS
Set Last_RUNSTATS_Timestamp
Reset RUNSTATS related statistics
COPY
Set Last_COPY_Timestamp
Reset COPY related statistics
LOAD REPLACE
Set Last_Load_Replace Timestamp
Reset REORG related statistics
How Utility affects real-time statistics? ...
REORG/LOAD REPLACE PART
Will not reset REORG statistics for non-partitioned indexes
Statistics for NPIs will be updated as INSERT and DELETE
COPY with the DSNUM option
Will not reset Last_Copy_Timestamp
Will not reset COPY related statistics
We maintain statistics if DSNUM <> 0 refers to a partitioned object
If DSNUM references a data set, statistics are NOT maintained for the data set
RECOVER TORBA/TOCOPY
Set Last_REORG, Last_RUNSTATS, Last_COPY, Last_Load_Replace, Last_Rebuild_Index to NULL
Reset REORG, RUNSTATS, COPY statistics to NULL
REBUILD INDEX
Set Last_Rebuild_Index_Timestamp
Reset REORG related statistics
Online LOAD Resume
Treated as Inserts
Accuracy of the statistics
Always delayed by the timer interval
Controlled by ZPARM STATSINT (default 30 minutes)
All in-memory statistics are lost when DB2 crashes or is stopped with STOP DB2 MODE(FORCE)
Unable to externalize statistics when DSNRTSDB is stopped or statistics tables are unavailable
Need to run REORG, RUNSTATS, COPY to establish a reference point
Statistics could be inaccurate if running vendor utilities without flushing the in-memory statistics
Only physical space statistics (i.e. Nactive, Space, Extents) are maintained for DSNDB07 and the TEMP databases
Guideline for SQL/Utility to access RTS objects
Avoid Timeouts or Deadlocks with RTS manager
Use Uncommitted Read lock isolation when accessing RTS tables
Use SHRLEVEL CHANGE when running REORG, RUNSTATS, COPY on the RTS objects
Don't mix RTS objects with other user objects in a utility list operation
If mixed, RTS statistics will not be reset for all objects in the list
For Disaster Recovery
Recover RTS objects after DB2 catalog and directory objects are recovered
Explicitly issue START DATABASE(DSNRTSDB) after RTS objects are recovered
What is DSNACCOR?
A DB2 stored procedure that accesses the RTS tables and makes IFI calls to gain -DISPLAY status on DB2 objects
Primary purpose: to recommend any DB2 object that requires a
REORG
RUNSTATS
IMAGE COPY
New version of DSNACCOR in DB2 9 is named DSNACCOX
Historical RTS
There is no historical capability in RTS
This can easily be built manually
Create SYSIBM.TABLESPSTATS_HIST and SYSIBM.INDEXSPSTATS_HIST LIKE SYSIBM.SYSTABLESPACESTATS and SYSIBM.SYSINDEXSPACESTATS, and add CAPTURE_TIME AS TIMESTAMP NOT NULL WITH DEFAULT columns
Periodically (daily?) insert into the history tables, with a subselect from the RTS tables, those rows that aren't already in the history tables, and delete old information
Code this up in a stored proc?
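The capture-and-prune pattern described above can be sketched in Python. SQLite is used here purely as a runnable stand-in for DB2, so the SQL dialect, the reduced column list, the table names, and the function name capture_history are all assumptions for illustration; a real implementation would run against the RTS tables, for example from a stored procedure.

```python
import sqlite3

# Cut-down stand-ins for an RTS table and its history table (illustrative columns only).
SCHEMA = """
CREATE TABLE tablespacestats (
    dbname TEXT, name TEXT, part INTEGER,
    totalrows INTEGER, updatestatstime TEXT
);
CREATE TABLE tablespacestats_hist (
    dbname TEXT, name TEXT, part INTEGER,
    totalrows INTEGER, updatestatstime TEXT,
    capture_time TEXT NOT NULL
);
"""


def capture_history(conn, capture_time, purge_before):
    """Copy RTS rows that are not yet in the history table, then delete old history."""
    conn.execute(
        """
        INSERT INTO tablespacestats_hist
        SELECT r.dbname, r.name, r.part, r.totalrows, r.updatestatstime, ?
        FROM tablespacestats r
        WHERE NOT EXISTS (
            SELECT 1 FROM tablespacestats_hist h
            WHERE h.dbname = r.dbname AND h.name = r.name AND h.part = r.part
              AND h.updatestatstime = r.updatestatstime)
        """,
        (capture_time,),
    )
    conn.execute(
        "DELETE FROM tablespacestats_hist WHERE capture_time < ?", (purge_before,)
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO tablespacestats VALUES ('DB1', 'TS1', 1, 100, '2007-11-01 00:30:00')"
)
capture_history(conn, "2007-11-01 06:00:00", "2007-10-01 00:00:00")
# Nothing changed since the last capture, so no new history row is added.
capture_history(conn, "2007-11-01 12:00:00", "2007-10-01 00:00:00")
conn.execute(
    "UPDATE tablespacestats SET totalrows = 150, updatestatstime = '2007-11-02 00:30:00'"
)
capture_history(conn, "2007-11-02 06:00:00", "2007-10-01 00:00:00")

hist_count = conn.execute("SELECT COUNT(*) FROM tablespacestats_hist").fetchone()[0]
print(hist_count)
```

The NOT EXISTS subselect is what keeps repeated captures from duplicating rows; here a row counts as already captured when its key and its statistics timestamp match a history row, which is one possible choice among several.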
Rebinding considerations
Consider the following guidelines regarding when to rebind
CLUSTERRATIOF changes to less or more than 80% (a value of 0.80)
NLEAF changes more than 20% from the previous value
NLEVELS changes
NPAGES changes more than 20% from the previous value
NACTIVEF changes more than 20% from the previous value
The range of HIGH2KEY to LOW2KEY changes more than 20% from the range previously recorded
Cardinality changes more than 20% from the previous value
Distribution statistics change the majority of the frequent column values
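Several of these guidelines are simple comparisons between an old and a new set of statistics, which can be sketched in Python. The function names and the dictionary layout are assumptions for illustration, and only the numeric rules (the 0.80 crossing, NLEVELS, and the 20% changes) are covered; the HIGH2KEY/LOW2KEY range and distribution statistics rules are left out.

```python
def pct_change(old, new):
    """Relative change from old to new, as a fraction of the old value."""
    if old == 0:
        return 0.0 if new == 0 else float("inf")
    return abs(new - old) / abs(old)


def rebind_reasons(old, new, threshold=0.20):
    """Return the guideline checks that the change from old to new statistics trips."""
    reasons = []
    # CLUSTERRATIOF moving from one side of 0.80 to the other.
    if (old["CLUSTERRATIOF"] < 0.80) != (new["CLUSTERRATIOF"] < 0.80):
        reasons.append("CLUSTERRATIOF crossed 0.80")
    if old["NLEVELS"] != new["NLEVELS"]:
        reasons.append("NLEVELS changed")
    for col in ("NLEAF", "NPAGES", "NACTIVEF", "CARDF"):
        if pct_change(old[col], new[col]) > threshold:
            reasons.append(f"{col} changed more than {threshold:.0%}")
    return reasons


# Illustrative values only.
old_stats = {"CLUSTERRATIOF": 0.95, "NLEVELS": 3, "NLEAF": 1000,
             "NPAGES": 5000, "NACTIVEF": 5000, "CARDF": 100000}
new_stats = {"CLUSTERRATIOF": 0.75, "NLEVELS": 3, "NLEAF": 1100,
             "NPAGES": 6500, "NACTIVEF": 5100, "CARDF": 105000}

reasons = rebind_reasons(old_stats, new_stats)
print(reasons)
```

In the sample data only two checks fire: the cluster ratio dropped below 0.80 and NPAGES grew by 30%.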
Reorg recommendations
These are generic and do not apply in all cases
There is no absolutely reliable statistic as to when reorganization of table spaces or indexes should occur; however, understanding the rules of thumb will help in understanding data disorganization
If reorg for performance, then track performance over time
DSNACCOR (V7/8) / DSNACCOX (V9) usage
Reorg table space (incl. LOBs in V9) recommendations
Consider running REORG TABLESPACE in the following situations:
Real-time statistics (TABLESPACESTATS)
REORGUNCLUSTINS (number of records inserted since the last Reorg that are not well clustered)/TOTALROWS > 10%
Irrelevant if predominantly random access
REORGUNCLUSTINS is only an indication of the insert behavior and is correlated to the cluster ratio only if there are no updates or deletes. To prevent DSNACCOR/X from triggering on these, identify such objects and put them in exception list
(REORGNEARINDREF+REORGFARINDREF (number of overflow rows since the last Reorg))/TOTALROWS > 5% in data sharing, > 10% in non-data sharing
REORGINSERTS (number of records inserted since the last Reorg)/TOTALROWS > 25%
REORGDELETES (number of records deleted since the last Reorg)/TOTALROWS > 25%
EXTENTS (number of extents) > 254
REORGDISORGLOB (number of LOBs inserted since the last Reorg that are not perfectly chunked)/TOTALROWS > 50%
SPACE > 2 * (DATASIZE / 1024) (when free space is more than used space)
REORGMASSDELETE > 0 (mass deletes on seg tsp and DROP on multi-table tsps)
RUNSTATS
PERCDROP > 10%
SYSIBM.SYSLOBSTATS.ORGRATIO < 50% (changed to a value 0-100 in PQ96460 on V7/V8)
(NEARINDREF + FARINDREF) / CARDF > 10% non-data-sharing, > 5% if data sharing
FAROFFPOSF / CARDF > 10%
Or, if index is a clustering index, CLUSTERRATIOF < 90% (irrelevant if predominantly random access)
Other
Tsp is in adv reorg pending status (AREO*) as result of an ALTER TABLE stmnt
Index on the tsp is in adv REBUILD pend state (ARBDP) as result of an ALTER stmnt
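The real-time statistics thresholds above can be evaluated mechanically, in the spirit of what DSNACCOR/DSNACCOX is described as doing. The Python sketch below is only an illustration of those rules of thumb: the function name, the short labels it returns, and the sample values are assumptions, and it is not the logic of either stored procedure.

```python
def reorg_tablespace_reasons(s, data_sharing=False):
    """Return labels of the RTS rules of thumb that the statistics row s exceeds."""
    rows = s["TOTALROWS"]

    def per_row(n):
        # Guard against an empty object, where the ratios are undefined.
        return n / rows if rows else 0.0

    indref_limit = 0.05 if data_sharing else 0.10
    checks = [
        ("REORGUNCLUSTINS", per_row(s["REORGUNCLUSTINS"]) > 0.10),
        ("INDREF", per_row(s["REORGNEARINDREF"] + s["REORGFARINDREF"]) > indref_limit),
        ("REORGINSERTS", per_row(s["REORGINSERTS"]) > 0.25),
        ("REORGDELETES", per_row(s["REORGDELETES"]) > 0.25),
        ("EXTENTS", s["EXTENTS"] > 254),
        ("REORGDISORGLOB", per_row(s["REORGDISORGLOB"]) > 0.50),
        ("SPACE", s["SPACE"] > 2 * (s["DATASIZE"] / 1024)),
        ("REORGMASSDELETE", s["REORGMASSDELETE"] > 0),
    ]
    return [name for name, exceeded in checks if exceeded]


# Illustrative values only.
sample = {
    "TOTALROWS": 1000, "REORGUNCLUSTINS": 50,
    "REORGNEARINDREF": 40, "REORGFARINDREF": 30,
    "REORGINSERTS": 300, "REORGDELETES": 100,
    "EXTENTS": 10, "REORGDISORGLOB": 0,
    "SPACE": 4000, "DATASIZE": 1024000, "REORGMASSDELETE": 0,
}

non_sharing = reorg_tablespace_reasons(sample)
sharing = reorg_tablespace_reasons(sample, data_sharing=True)
print(non_sharing, sharing)
```

The same row can trip different rules depending on the environment: the 7% overflow ratio in the sample exceeds the 5% data sharing limit but not the 10% non-data-sharing limit.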
Don't use RUNSTATS statistics as a trigger to consider running REORG
Reorganizing LOBs in V7 and V8
Generally not recommended
Only possible with SHRLEVEL NONE
Small performance gain that can be achieved is outweighed by
Loss of availability
Likelihood of increasing the size of the LOB table space
With DB2 9's REORG support of LOBs with SHRLEVEL REFERENCE
Chunkiness (REORGDISORGLOB/TOTALROWS > 50%)
Space reclamation (SPACE > 2 * (DATASIZE / 1024))
Reorg index recommendations
Consider running REORG INDEX in the following cases:
Real-time statistics (INDEXSPACESTATS)
REORGPSEUDODELETES (number of index entries pseudo-deleted since the last Reorg)/TOTALENTRIES > 10% in non-data sharing, 5% if data sharing, as a pseudo-deleted entry can cause S-lock/unlock in Insert for unique index
REORGLEAFFAR (number of index leaf page splits since the last Reorg where the new leaf page is far from the original leaf page)/NACTIVE > 10%
REORGINSERTS (number of index entries inserted since the last Reorg)/TOTALENTRIES > 25%
REORGDELETES (number of index entries deleted since the last Reorg)/TOTALENTRIES > 25%
REORGAPPENDINSERT / TOTALENTRIES > 20%
EXTENTS (number of extents) > 254
RUNSTATS
LEAFFAR / NLEAF > 10% (NLEAF is a column in SYSIBM.SYSINDEXES and SYSIBM.SYSINDEXPART)
PSEUDO_DEL_ENTRIES / CARDF > 10% for non-data sharing and > 5% for data sharing
Other
The index is in advisory REORG-pending status (AREO*) or advisory REBUILD-pending status (ARBDP) as the result of an ALTER statement
When is RUNSTATS needed?
When the data changes sufficiently to warrant new statistics
REORG of tablespace or index (use inline stats!)
LOAD REPLACE of tablespace (use inline stats!)
After "significant" application changes for the tablespace or index
Periodically (weekly, monthly) except for read only data?
Application tracks updates with activity tables?
After percentage of pages changed since last RUNSTATS (RTS)?
Understand implications for access paths!
SHRLEVEL
REFERENCE drains writers
CHANGE runs like application with ISOLATION (UR) (claim reader for allocation duration)
New/Changed Data Statistics (V8)
SPACEF at the table space level
4096 partitions can hold a lot of data!
HIGHKEY/HIGH2KEY/LOWKEY/LOW2KEY expanded
From CHAR(8) to VARCHAR(2000)
8 bytes not adequate for multi-byte character representations especially with Unicode
Optimizer has better information to estimate filter factors and determine access paths
AVGROWLEN at the table space/partition level
V7 only collected at the table level
Useful for estimating current number of rows of table space from file size without having to run RUNSTATS
Conversely, can calculate table space size allocation more accurately
UNLOAD utility space allocation
REORG & LOAD space allocation: work datasets, sort space
[Figure: AVGROWLEN and SPACEF in SYSIBM.SYSTABLESPACE; AVGROWLEN in SYSIBM.SYSTABLEPART_HIST]
Part level statistics for DPSIs
Statistics are not kept at the partition level for logical partitions of NPIs
Data Partitioned Secondary Indexes need to have the same partition independence and capabilities (from a statistics gathering perspective) as classic partitioning indexes
Partition level statistics for DPSIs are stored in SYSCOLDISTSTATS with rollup to SYSCOLDIST
Rollup requires SYSCOLDISTSTATS rows to be sorted, requiring new parameters
SORTDEVT (defaults to SYSALLDA)
SORTNUM
If not specified then SORT will use sort product defaults
Can also use FORCEROLLUP to aggregate partition level statistics when not all partitions have statistics
Distribution Statistics Enhanced
As queries become more complex and less predictable, data skew becomes more important
Problem with skewed data and regular statistics
Optimizer assumes inaccurate distribution of values
Less efficient join sequence could be chosen
Less efficient method of accessing individual tables
DSTATS program could be downloaded to collect statistical data for non-indexed columns
Great improvement in access path selection, however
Run separate from RUNSTATS
Slow with big impact to DB2 work file database
Filter factors and catalog statistics
SYSCOLDIST contains frequency (or distribution)
If frequency statistics do not exist, DB2 assumes that the data is uniformly distributed
For example:
AGE_CATEGORY  FREQUENCY
INFANT        5%
CHILD         15%
ADOLESCENT    25%
ADULT         40%
SENIOR        15%
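The effect of the uniform-distribution assumption on this example can be shown with a small Python sketch. It is a deliberately simplified model of an equality predicate's filter factor: the function name and arguments are assumptions for illustration, and the real optimizer's calculation involves more than this.

```python
# Frequencies from the AGE_CATEGORY example above.
frequencies = {
    "INFANT": 0.05, "CHILD": 0.15, "ADOLESCENT": 0.25,
    "ADULT": 0.40, "SENIOR": 0.15,
}


def filter_factor(value, colcard, freq_stats=None):
    """Estimated fraction of rows matching COL = value.

    With frequency statistics, use the recorded frequency for the value;
    without them, assume every distinct value is equally likely (1 / COLCARD).
    """
    if freq_stats and value in freq_stats:
        return freq_stats[value]
    return 1.0 / colcard


uniform_ff = filter_factor("ADULT", colcard=5)
skewed_ff = filter_factor("ADULT", colcard=5, freq_stats=frequencies)
print(uniform_ff, skewed_ff)
```

Without frequency statistics, AGE_CATEGORY = 'ADULT' is estimated at 20% of the rows; with them, the estimate is the recorded 40%, twice as many rows.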
Distribution Statistics Enhanced
Non-uniform distribution statistics on non-indexed columns
Now part of RUNSTATS
Significant performance improvement: no impact on DB2 work file, and data only has to be scanned once
Uses external sort requiring new parameters
SORTDEVT
SORTNUM
If not specified then SORT will use sort product defaults
Extend non-uniform to collect on index or non-index
most frequent values
least frequent values
both
As part of this, the previous limit of 10 names in the COLUMN parameter has been removed.
Distribution Statistics Enhanced
Changed/new syntax
RUNSTATS INDEX REBUILD, REORG INDEX RUNSTATS TABLESPACE
KEYCARD versus Distribution Statistics from an index
KEYCARD collects all of the distinct values in all of the 1 to n key column combinations. So these are cardinality statistics across column sets. If we had a 3-column index on State, City, Zipcode:
State  City       Zipcode
CA     San Jose   95123
CA     San Jose   95110
CA     San Jose   95141
CA     San Jose   95141
CA     Riverside  92504
CA     Riverside  92504
CA     Glendora   91741
TX     Austin     78732
KEYCARD collects:
Numcolumns  Card
1           2
2           4
3           6
FREQVAL NUMCOLS 3 collects:
Colvalue               Frequency
CA, San Jose, 95123    1/8 = 0.125
CA, San Jose, 95110    1/8 = 0.125
CA, San Jose, 95141    2/8 = 0.25
CA, Riverside, 92504   2/8 = 0.25
CA, Glendora, 91741    1/8 = 0.125
TX, Austin, 78732      1/8 = 0.125
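The frequencies in this example can be reproduced with Python's collections.Counter. This sketch only illustrates the counting; the function name freqval and its arguments are assumptions for this example and do not mirror the utility's implementation.

```python
from collections import Counter

# The eight index entries from the example (State, City, Zipcode).
rows = [
    ("CA", "San Jose", "95123"),
    ("CA", "San Jose", "95110"),
    ("CA", "San Jose", "95141"),
    ("CA", "San Jose", "95141"),
    ("CA", "Riverside", "92504"),
    ("CA", "Riverside", "92504"),
    ("CA", "Glendora", "91741"),
    ("TX", "Austin", "78732"),
]


def freqval(rows, numcols, count):
    """Most frequent values of the first numcols columns, with their frequencies."""
    counts = Counter(row[:numcols] for row in rows)
    return [(value, n / len(rows)) for value, n in counts.most_common(count)]


three_col = dict(freqval(rows, numcols=3, count=10))
one_col = freqval(rows, numcols=1, count=10)
print(three_col)
print(one_col)
```

With three columns there are six distinct values, two of which have frequency 0.25; with one column the skew is much clearer, since CA accounts for 7 of the 8 entries.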
Distribution Statistics Enhanced
Example: Collect distribution statistics for specific columns in a table space and retrieve the most and least frequently occurring values. Collect statistics for the columns EMPLEVEL, EMPGRADE, and EMPSALARY and use the FREQVAL and COUNT keywords to collect the 10 most frequently occurring values for each column and the 10 least frequently occurring values for each column.
RUNSTATS TABLESPACE DSN8D81A.DSN8S81E
  TABLE(DSN8810.DEPT)
  COLGROUP(EMPLEVEL,EMPGRADE,EMPSALARY) FREQVAL COUNT 10 BOTH
Not currently collected via in-line statistics from LOAD and REORG
HISTORY statistics without updating main statistics
V7 required update of main catalog statistics if history statistics were wanted
V8 relaxes this and history statistics can now be kept without updating current statistics.
Monitor statistics such as SYSTABLES.CARDF
No surprises for dynamic SQL access paths
CAUTION: If you use this, you have to remember that your static packages bound in that time frame may not have used the statistics in the history tables.
For example,
in V7 UPDATE NONE HISTORY OPTIMIZER was prohibited.
in V8 UPDATE NONE HISTORY OPTIMIZER is allowed and you can monitor statistics changes over time without concern that access paths may change.
Flushing the dynamic statement cache
RUNSTATS with UPDATE NONE REPORT NO
Any statement in the Dynamic Statement Cache which is dependent on the affected table space or index space will be removed from the cache.
Why? If users manually update the statistics in the catalog tables, the related dynamic SQL in the cache needs to be invalidated and the next prepare of the statements will cause the access paths to be reevaluated.
Granularity is at the table space/index level (not the table level)
What statistics should I gather?
No simple answer
Some collect no or insufficient statistics
Prime reason for poor performing access paths
Do you want to collect statistics on every column and permutations of combination of columns?
No way!
Requires similar analysis of SQL as for index design
Have to include columns which you may not benefit from adding to an index
Analysis of queries labor intensive
Iterative process analyzing explain data (as always)
Input SQL, Click start
Suggestions for one Siebel query
Click here to run. That's it!
Statistics Advisor Current Status
Statistics Advisor is integrated with VE now as a no-charge item
Used as a serviceability tool
Service team use prototype on real problems
Demonstrates research of automation of query analysis
Identifying, addressing areas of improvement
Move forward from prototype status
DB2 V9 for z/OS Changes to RUNSTATS
New histogram statistics
Think of these as frequency distribution statistics on a range of data
Ideal for numeric, date, and time data types
CPU reduction for RUNSTATS INDEX: 30-40%
Summary
Why RUNSTATS?
Invoking RUNSTATS
Commonly asked questions (about the stats)
Real-time Statistics
Rebinding considerations
Reorg recommendations
When is RUNSTATS needed?
New/changed data statistics
New/changed index statistics
Handling part level statistics for DPSIs
Distribution Statistics Enhanced
HISTORY statistics changes
Flushing the dynamic statement cache
What statistics should I gather?
References
DB2 UDB for z/OS home page
http://www.software.ibm.com/data/db2/os390/
utilities@work
http://www.ibm.com/software/data/db2imstools/details/html/us_text.html
The IDUG Solutions Journal March 1999 - Volume 6, Number 1
Improving DB2 for OS/390 Query Performance with DSTATS By Steve Bower http://www.idug.org/neo_apps/cfmfiles/mainnavbar.cfm?body=/journal/index.html
DB2 UDB for z/OS and OS/390 Version 7 Performance Topics, SG24-6129
DB2 UDB for z/OS and OS/390 Version 7: Using the Utilities Suite, SG24-6289
DB2 UDB for z/OS Version 8 What's New
http://www-3.ibm.com/software/data/db2/os390/v8/dsnwnj1.pdf
DB2 UDB for z/OS Version 8 Administration Guide DB2 UDB for z/OS Version 8 Utilities Guide and Reference
DB2 UDB for z/OS information resources
Take advantage of the following information resources available for DB2 UDB for z/OS:
Information center
http://publib.boulder.ibm.com/infocenter/dzichelp/index.jsp
Information roadmap
http://ibm.com/software/db2zos/roadmap.html
DB2 UDB for z/OS library page
http://ibm.com/software/db2zos/library.html
Examples trading post
http://ibm.com/software/db2zos/exHome.html
DB2 for z/OS support
http://ibm.com/software/db2zos/support.html
Official Introduction to DB2 for z/OS
http://ibm.com/software/data/education/bookstore
Disclaimers & Trademarks*
Information in this presentation about IBM's future plans reflect current thinking and is subject to change at IBM's business discretion. You should not rely on such information to make business plans. Any discussion of OEM products is based upon information which has been publicly available and is subject to change. The following terms are trademarks or registered trademarks of the IBM Corporation in the United States and/or other countries: AIX, AS/400, DATABASE 2, DB2, OS/390, OS/400, ES/9000, MVS/ESA, Netfinity, RISC, RISC SYSTEM/6000, SYSTEM/390, SQL/DS, VM/ESA, IBM, Lotus, NOTES. The following terms are trademarks or registered trademarks of the MICROSOFT Corporation in the United States and/or other countries: MICROSOFT, WINDOWS, ODBC