IBM DS8900F Performance Best Practices and Monitoring
Peter Kimmel
Sherri Brunson
Lisa Martinez
Luiz Moreira
Ewerson Palacio
Rick Pekosh
Ali Rizvi
Paul Smith
IBM Redbooks
December 2021
SG24-8501-00
Note: Before using this information and the product it supports, read the information in “Notices” on
page xi.
This edition applies to Version 9, Release 2, of the DS8000 family of storage systems.
Notices  xi
Trademarks  xii
Preface  xiii
Authors  xiii
Now you can become a published author, too!  xv
Comments welcome  xv
Stay connected to IBM Redbooks  xv
7.8 End-to-end analysis of I/O performance problems  167
7.9 Performance analysis examples  176
7.9.1 Example 1: Array bottleneck  176
7.9.2 Example 2: Hardware connectivity part 1  179
7.9.3 Example 3: Hardware connectivity part 2  180
7.9.4 Example 4: Port bottleneck  181
7.10 IBM Storage Insights Pro in mixed environments  182
Notices
This information was developed for products and services offered in the US. This material might be available
from IBM in other languages. However, you may be required to own a copy of the product or product version in
that language in order to access it.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user’s responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not grant you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, MD-NC119, Armonk, NY 10504-1785, US
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any
manner serve as an endorsement of those websites. The materials at those websites are not part of the
materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you provide in any way it believes appropriate without
incurring any obligation to you.
The performance data and client examples cited are presented for illustrative purposes only. Actual
performance results may vary depending on specific configurations and operating conditions.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
Statements regarding IBM’s future direction or intent are subject to change or withdrawal without notice, and
represent goals and objectives only.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to actual people or business enterprises is entirely
coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs. The sample programs are
provided “AS IS”, without warranty of any kind. IBM shall not be liable for any damages arising out of your use
of the sample programs.
The following terms are trademarks or registered trademarks of International Business Machines Corporation,
and might also be trademarks or registered trademarks in other countries.
AIX® IBM® PowerPC®
CICS® IBM Cloud® Redbooks®
Cognos® IBM FlashSystem® Redbooks (logo) ®
Db2® IBM Research® Storwize®
DB2® IBM Spectrum® System z®
DS8000® IBM Z® WebSphere®
Easy Tier® IBM z Systems® z Systems®
Enterprise Storage Server® IBM z14® z/Architecture®
FICON® Interconnect® z/OS®
FlashCopy® Parallel Sysplex® z/VM®
GDPS® POWER® z/VSE®
Global Technology Services® POWER8® z13®
HyperSwap® POWER9™ z15™
The registered trademark Linux® is used pursuant to a sublicense from the Linux Foundation, the exclusive
licensee of Linus Torvalds, owner of the mark on a worldwide basis.
Microsoft, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States,
other countries, or both.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its
affiliates.
Red Hat is a trademark or registered trademark of Red Hat, Inc. or its subsidiaries in the United States and
other countries.
UNIX is a registered trademark of The Open Group in the United States and other countries.
VMware and the VMware logo are registered trademarks or trademarks of VMware, Inc. or its subsidiaries in
the United States and/or other jurisdictions.
Other company, product, or service names may be trademarks or service marks of others.
This IBM® Redbooks® publication is intended for individuals who want to maximize the
performance of their DS8900 storage systems and investigate the planning and monitoring
tools that are available.
Performance: Any sample performance measurement data that is provided in this book is
for comparative purposes only. The data was collected in controlled laboratory
environments at a specific point in time by using the configurations, hardware, and firmware
levels that were available then. The performance in real-world environments can vary. Actual
throughput or performance that any user experiences also varies depending on
considerations, such as the I/O access methods in the user’s job, the I/O configuration, the
storage configuration, and the workload processed. The data is intended only to help
illustrate how different hardware technologies behave in relation to each other. Contact
your IBM representative or IBM Business Partner if you have questions about the expected
performance capability of IBM products in your environment.
Authors
This book was produced by a team of specialists from around the world:
Peter Kimmel is an IT Specialist and Advanced Technical Skills team lead of the Enterprise
Storage Solutions team at the ESCC in Frankfurt, Germany. He joined IBM Storage in 1999,
and since then has worked with various DS8000® generations, with a focus on architecture
and performance. Peter co-authored several DS8000 IBM publications. He holds a Diploma
(MSc) degree in physics from the University of Kaiserslautern.
Sherri Brunson joined IBM in March of 1985 and worked as a large-system IBM service
representative before becoming a Top Gun in 1990. Sherri is a Top Gun in the Eastern US
for all storage products, Power Systems servers, and IBM Z®. She has supported and
implemented DS8000 and scaled-out network appliance storage products globally, and
developed and taught educational classes. She also has taught IBM Z classes in the
United States.
Lisa Martinez has been working as a client technical specialist within the Advanced
Technology Group since January 2012. Her primary focus has been with pre-sales support
for DS8000 and IBM Copy Services Manager. Her experience includes roles as a storage
architect in the Specialty Services Area in IBM Global Technology Services® (IBM GTS)
and a test architect in disk storage. Lisa holds degrees in computer science from New
Mexico Highlands University and electrical engineering from the University of New Mexico.
Rick Pekosh is a Certified IT Specialist working in the IBM Advanced Technology Group
as a DS8000 subject matter expert specializing in High Availability, Disaster Recovery, and
Logical Corruption Protection solutions. He works with customers, IBM Business Partners,
and IBMers in North America. Rick began working with the DS8000 in early 2005 while
working as a technical sales specialist and functioning as a regional designated specialist.
He joined IBM in 2001 after spending 20 years in application development in various roles
for the Bell System and as a consultant. Rick earned a BS degree in Computer Science
from Northern Illinois University; and an MBA from DePaul University.
Paul Smith joined IBM in August 1994 and worked as a Mid-Range system and storage
IBM service representative before joining Lab Services in the United Kingdom.
Paul has supported the IBM storage portfolio, including DS8000 from its inception, through to SAN
Volume Controller, Storwize® V7000, and IBM FlashSystem® 9200, along with the associated
SAN technology from Cisco and Brocade. He also has experience with data migration between
storage systems, as well as with solution design and providing education classes to colleagues and
clients.
Thanks to the authors of the previous editions of the DS8000 performance Redbooks.
Authors of DS8800 Performance Monitoring and Tuning, SG24-8013, published in 2012,
were:
Gero Schmidt, Bertrand Dufrasne, Jana Jamsek, Peter Kimmel, Hiroaki Matsuno, Flavio
Morais, Lindsay Oxenham, Antonio Rainero, Denis Senin
Authors of IBM System Storage DS8000 Performance Monitoring and Tuning,
SG24-8318, published in 2016, were:
Axel Westphal, Bert Dufrasne, Wilhelm Gardt, Jana Jamsek, Peter Kimmel, Flavio Morais,
Paulus Usong, Alexander Warmuth, Kenta Yuge.
Find out more about the residency program, browse the residency index, and apply online at:
[Link]/redbooks/[Link]
Comments welcome
Your comments are important to us!
We want our books to be as helpful as possible. Send us your comments about this book or
other IBM Redbooks publications in one of the following ways:
Use the online Contact us review Redbooks form found at:
[Link]/redbooks
Send your comments in an email to:
redbooks@[Link]
Mail your comments to:
IBM Corporation, IBM Redbooks
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400
Part 1
Data continually flows from one component to another within a storage server. The objective
of server design is hardware that has sufficient throughput to keep data flowing smoothly
without waiting because a particular component is busy. When data stops flowing because a
component is busy, a bottleneck forms. Obviously, it is preferable to minimize the frequency
and severity of bottlenecks.
The ideal storage server is one in which all components are used effectively and efficiently
with minimal bottlenecks. This scenario is the case if the following conditions are met:
The storage system is designed well, with all hardware components in balance. To provide
balance over a range of workloads, a storage server must allow a range of hardware
component options.
The storage system is appropriately sized for the client workload. Where options exist, the
correct quantities of each option are chosen to maximize price/performance.
The storage system is set up well. Where options exist in hardware installation and logical
configuration, these options are chosen correctly.
Easy Tier® automatic rebalancing and tiering options can help achieve and maintain optimum
performance even in an environment of ever-changing workload patterns, but they cannot
replace a correct sizing of the storage system.
Throughput numbers are achieved in controlled tests that push as much data as possible
through the storage server as a single component. At the point of maximum throughput, the
system is so overloaded that response times are greatly extended. Trying to achieve such
throughput numbers in a normal business environment brings protests from the users of the
system because response times are poor.
To assure yourself that the DS8900F family offers the fastest technology, consider the
performance numbers for the individual flash drives, host adapters, and other components of
the various models, and for the storage system as a whole. The DS8900F family uses the
most current technology available, but a rigorous approach is still needed when planning the
DS8000 hardware configuration to meet the requirements of a specific environment.
For more information about StorM, see Chapter 6.1, “IBM Storage Modeller” on page 126.
Spreading the workload maximizes the usage and performance of the storage server as a
whole. Isolating a workload is a way to maximize the workload performance, making the
workload run as fast as possible. Automatic I/O prioritization can help avoid a situation in
which less-important workloads dominate the mission-critical workloads in shared
environments, and it allows environments to be shared more broadly.
If you expect growing loads, for example, when replacing one system with a new one that also
has a much bigger capacity, add some contingency for this amount of foreseeable I/O growth.
This means there is a wide range of possible hardware configurations available to fit a
multitude of performance requirements in both workload type and size.
The DS8900Fs include and share many common features that improve performance
including:
Dual-clustered POWER9 servers with symmetrical multiprocessor (SMP) architecture
Multi-threaded by design, simultaneous multithreading (SMT)
Intelligent caching algorithms developed by IBM Research®
PCIe 16 Gb/sec and 32 Gb/sec Fibre Channel/FICON® Host Adapters
High-bandwidth, fault-tolerant internal interconnections with a switched PCIe Gen3
architecture
High Performance Flash Enclosures (HPFE) Gen2
Three distinct tiers of flash media
Easy Tier which is a set of built-in advanced functions that autonomically rebalance
backend resources to reduce or eliminate hot spots
In addition to this list of hardware and microcode-based features, the following sections
provide additional information on those and other performance-enhancing features that are not listed above.
Table 1-1 provides a high-level overview of the DS8900F models for comparison. Note that
all three models are available with either single-phase or three-phase power.
Table 1-1   DS8900F models
                            DS8910F (993)    DS8910F (994)    DS8950F (996)    DS8980F (998)
IBM POWER9 CPCs             MTM 9009-22A     MTM 9009-22A     MTM 9009-42A     MTM 9009-42A
System memory (Min / Max)   192 GB / 512 GB  192 GB / 512 GB  512 GB / 3.4 TB  4.3 TB / 4.3 TB
The following sections provide a short description of the main hardware components.
The IBM POWER® processor architecture offers superior performance and availability
features, compared to other conventional processor architectures on the market. POWER7+
allowed up to four intelligent simultaneous threads per core, which is twice what larger x86
processors offer. Like the DS8880s, which were based on the POWER8® architecture, the
DS8900Fs allow up to eight simultaneous threads per core.
Typically, DS8900Fs are configured with just one or two flash tiers. Depending on the
particular workload, that may mean a single tier of high performance flash or high capacity
flash or a combination of the two. That flexibility is a testament to the overall performance
capabilities of DS8900Fs and the effectiveness of Easy Tier which autonomically rebalances
backend resources by eliminating or greatly reducing hot spots.
For example, a workload with extremely high transaction rates, that demands the lowest
latency possible, may require a single tier of high performance flash; a workload with high
capacity requirements and low I/O rates may be satisfied with a single tier of high capacity
flash; or, for smaller workloads, suitable for a DS8910F, a combination of 800 GB HPF and
1.92 TB HCF may be appropriate and cost effective; or, for some very large workloads,
suitable for a DS8950F, a combination of 3.2 TB HPF and 7.68 TB HCF may be appropriate
and cost effective. The point is that the appropriate configuration is workload dependent, and the
goal is to balance price and performance.
Host adapters
Host Adapters serve two purposes. One, they provide connectivity between the host server
and the DS8000 storage server for I/O (input/output or reads and writes). Two, they provide
connectivity between DS8000s to carry synchronous (Metro Mirror) and asynchronous
(Global Mirror and Global Copy) replication traffic.
The DS8900F models offer enhanced connectivity with the availability of both 16 Gb/sec and
32 Gb/sec four-port Fibre Channel/IBM FICON Host Adapters. Each adapter can
auto-negotiate two generations back. That is the Fibre Channel standard for backward
compatibility and is known as N-2 compatibility. The 16 Gb/sec adapter can auto-negotiate to
8 Gb/sec and 4 Gb/sec. Similarly, the 32 Gb/sec adapter can auto-negotiate to 16 Gb/sec and
8 Gb/sec. Both are available with either short wave or long wave Small Form-factor Pluggable
(SFP). With this flexibility, you can benefit from the higher performance while maintaining
compatibility with existing host processors and SAN infrastructures. Also, you can configure
the ports individually which enables Fibre Channel Protocol (FCP) and FICON intermix on the
same adapter. That enables the potential for very cost effective and efficient use of host
adapter resources to meet both performance and availability requirements.
AMP provides provably optimal sequential read performance and maximizes the sequential
read throughput of all RAID arrays where it is used, and therefore of the system.
SARC, AMP, and IWC play complementary roles. Although SARC is carefully dividing the
cache between the RANDOM and the SEQ lists to maximize the overall hit ratio, AMP is
managing the contents of the SEQ list to maximize the throughput that is obtained for the
sequential workloads. IWC manages the write cache and decides the order and rate to
destage to the flash media.
Note: Without getting too deep into technical details, this note is a brief description of the
backend resources that Easy Tier monitors and optimizes. Flash drivesets come in groups
of sixteen like-type drives, with eight flash drives comprising an array, or rank. That means
for every flash driveset there are two ranks. Ranks are grouped into storage pools during
logical configuration. Volumes are carved from one or more “extents”, which are internal
units of storage, within a particular storage pool. For both Count Key Data (CKD) and Fixed
Block (FB) based operating systems, the DS8000 offers a choice of using all small or all
large extents within a given storage pool. (Our recommendation, with few exceptions, is to
configure storage pools using small extents, rather than large extents, because they
provide far more granularity; therefore, small extents enable Easy Tier to work more
efficiently and effectively.) Flash drivesets physically reside in High-Performance Flash
Enclosures, which are connected to the CPCs (POWER9 servers) via device adapter (DA)
pairs. So, the backend resources that Easy Tier is monitoring and optimizing are the ranks
and device adapters.
For a deep dive on the DS8000’s internal virtualization, see “IBM DS8900F Architecture
and Implementation Release 9.2” at:
[Link]
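To put the granularity difference noted above into numbers, the following Python sketch (illustrative only; the volume sizes are arbitrary examples) computes how many FB extents a volume of a given size occupies with 16 MiB small extents versus 1 GiB large extents:

# Illustrative only: compare FB extent counts for 16 MiB (small) and
# 1 GiB (large) extents, as described in the note above.
GIB = 1024 ** 3
MIB = 1024 ** 2
SMALL_EXTENT = 16 * MIB
LARGE_EXTENT = 1 * GIB

def extents_needed(volume_gib, extent_bytes):
    # Number of extents that must be allocated for the volume (rounded up).
    return -(-(volume_gib * GIB) // extent_bytes)

for size_gib in (100, 500, 2048):          # example volume sizes
    small = extents_needed(size_gib, SMALL_EXTENT)
    large = extents_needed(size_gib, LARGE_EXTENT)
    print(f"{size_gib:>5} GiB volume: {small:>6} small extents, {large:>5} large extents")

The finer allocation unit is what gives Easy Tier much more granular data placement to work with.
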
While there are likely noticeable or predictable patterns over time (across a day, week, or
month), different volumes, or portions of them, typically experience different periods of
extremely busy activity while at other times they may be idle. Some volumes, or portions of
them, may rarely be busy. All of that is common and normal. Easy Tier is constantly
monitoring and evaluating backend activity to decide what to move and whether to move it
based on a cost/benefit analysis. It maintains statistics on backend activity collected on
five-minute intervals and kept for seven days with the most recent twenty-four hours being the
most important and the last or oldest twenty-four the least. Easy Tier collects statistics on
large block I/Os, small block I/Os, reads and writes, latency (response time), I/O rates, and
the amount of data transferred. As a new day’s statistics are collected, the oldest day’s statistics
roll off. Hot extents are those that are overly busy compared with other extents in the same
storage pool; cold extents are those with little to no activity. So, hot extents are eligible for
promotion and cold extents for demotion.
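As a rough illustration of the hot/cold idea only (this is not the Easy Tier algorithm; the decay factor and thresholds below are arbitrary assumptions), the following Python sketch scores each extent from its seven days of I/O counts, weighting the most recent day most heavily, and flags extents that are far above or below the pool average:

# Illustrative sketch: classify extents as hot or cold relative to the pool
# average. The decay factor and thresholds are assumptions, not Easy Tier values.
def weighted_activity(daily_io_counts, decay=0.5):
    # daily_io_counts[0] is the most recent 24 hours; up to 7 entries are kept.
    return sum(count * (decay ** age) for age, count in enumerate(daily_io_counts))

def classify(pool):
    # pool maps extent_id -> list of daily I/O counts (newest first).
    scores = {ext: weighted_activity(days) for ext, days in pool.items()}
    average = sum(scores.values()) / len(scores)
    hot = [ext for ext, s in scores.items() if s > 2 * average]     # promotion candidates
    cold = [ext for ext, s in scores.items() if s < 0.1 * average]  # demotion candidates
    return hot, cold

pool = {"ext-0001": [900, 800, 750, 700, 650, 600, 550],
        "ext-0002": [5, 2, 0, 1, 0, 0, 3],
        "ext-0003": [120, 100, 90, 110, 95, 100, 105]}
print(classify(pool))    # (['ext-0001'], ['ext-0002'])
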
Easy Tier is able to move extents between storage tiers within a storage pool with two or three
tiers; that is known as inter-tier rebalancing. Think of this as vertical movement between tiers.
Extents only move between adjacent tiers; there is no double promotion nor double demotion
in three-tier storage pools. Easy Tier builds the inter-tier heatmap approximately once every
24 hours.
Easy Tier has two operating modes, automatic mode and manual mode. In automatic mode,
Easy Tier promotes and demotes extents (sub volume) between different flash tiers in the
same storage pool (inter-tier rebalancing) for multi-tier storage pools; performs intra-tier
rebalancing within a given tier in a storage pool; and, rebalances extents when ranks are
added to a storage pool. Manual mode enables actions such as dynamic volume relocation from
one like-type storage pool to another; dynamic storage pool merge of two like-type storage pools;
rank depopulation; and restriping a volume within the same pool.
Note: This discussion on Easy Tier is not meant to be exhaustive. For additional details on
Easy Tier see, IBM DS8000 Easy Tier, REDP-4667 at:
[Link]
Summary
Easy Tier is a built-in advanced function designed to help optimize performance with minimal
to no involvement of a storage administrator. The DS8000s are really smart storage systems,
and our recommendation is to let them do what they are designed to do:
Leave Easy Tier automatic mode running 24 x 7 x 365. The only exception to that is if you
want to pause and resume Easy Tier learning before and after, respectively, an unusual
workload runs.
With few exceptions, configure storage pools using small extents, rather than large
extents, because they provide far more granularity and enable Easy Tier to work more
efficiently and effectively.
Implement the Heat Map Transfer Utility to propagate the primary’s heatmap to the other
DS8000 storage systems in the solution.
Monitor Easy Tier via the built-in reporting integrated into the GUI.
Understanding the hardware components, the functions they perform, and the DS8000
technology as a whole can help you select the appropriate DS8900F model and quantities of
each component for the target workload(s). However, do not focus too much on any one
component. Instead, focus on the storage system as a whole because they are designed to
work as fully integrated systems. The ultimate criteria for storage system performance are
response times (latency) and the total throughput (MBps or GBps).
With few exceptions, most of those components can be scaled independently from the others
which enables very flexible custom configurations to address very specific performance
requirements. For example, a DS8910F Flexibility Class, configured with sixteen POWER9
cores, 192 GB of system memory, and all 1.92 TB High Capacity Flash drives, may be
appropriate for a moderately intensive workload; a DS8980F Analytic Class, which includes
forty-four POWER9 cores and 4.3 TB of system memory, configured with the smallest
available High Performance Flash drives (800 GB), may be required for a very I/O intensive
workload requiring the lowest latency; or, a DS8950F Agility Class, configured with forty
POWER9 cores, 1024 GB of system memory, and a mix of HPF and HCF may be appropriate
for other workloads.
The point is the DS8900F family offers configuration flexibility across and within the models to
address widely varying customer performance and capacity needs. The best way to
determine which DS8900F model and specific configuration is most appropriate for a given
workload is to create a performance model of the actual workload. The performance modeling
tool we use is StorM. To learn more about performance modeling and StorM see “IBM
Storage Modeller” on page 126.
The pair of CPCs in the two DS8910F models are IBM Power System 9009-22A servers; the
DS8950F and DS8980F each utilize a pair of 9009-42A servers. The number of cores in
those CPCs and available system memory options are directly related and detailed below. On
all DS8900F models, each central processor complex has its own dedicated system memory.
Within each processor complex, the system memory is divided into:
Memory used for the DS8000 control program
Cache
Non-Volatile Storage (NVS) or persistent memory
The DS8900F models and their respective number of processor cores and system memory
options are shown below. The amount allocated as NVS or persistent memory scales
according to the system memory configured:
DS8910F - Flexibility Class (models 993 & 994)
– 16 POWER9 cores with system memory options:
• 192 GB (NVS memory 8 GB) or
• 512 GB (NVS memory 32 GB)
DS8950F - Agility Class (model 996)
– 20 POWER9 cores with 512 GB system memory (NVS memory 32 GB)
– 40 POWER9 cores with system memory options:
• 1,024 GB (NVS memory 64 GB) or
• 2,048 GB (NVS memory 128 GB) or
• 3.4 TB (NVS memory 128 GB)
DS8980F - Analytic Class (model 998)
– 44 POWER9 cores with 4.3 TB system memory (NVS memory 128 GB)
The DS8900F can be equipped with up to 4.3 TB of system memory, most of which is
configured as cache. With its POWER9 processors, the server architecture of the DS8900F
makes it possible to manage such large caches with small cache segments of 4 KB (and
large segment tables). The POWER9 processors have the power to support these
sophisticated caching algorithms, which contribute to the outstanding performance that is
available in the DS8900F.
These algorithms and the small cache segment size optimize cache hits and cache utilization.
Cache hits are also optimized for different workloads, such as sequential workloads and
transaction-oriented random workloads, which might be active at the same time. Therefore,
the DS8900F provides excellent I/O response times.
Write data is always protected by maintaining a second copy of cached write data in the NVS
of the other internal server until the data is destaged to the flash media.
The DS8900F series cache is organized in 4-KB pages that are called cache pages or slots.
This unit of allocation (which is smaller than the values that are used in other storage
systems) ensures that small I/Os do not waste cache memory. The decision to copy data into
the DS8900F cache can be triggered from the following policies:
Demand paging
Eight disk blocks (a 4 KB cache page) are brought in only on a cache miss. Demand paging
is always active for all volumes and ensures that I/O patterns with some locality find at
least the recently used data in the cache.
Prefetching
Data is copied into the cache speculatively even before it is requested. To prefetch, a
prediction of likely data accesses is needed. Because effective, sophisticated prediction
schemes need an extensive history of page accesses (which is not feasible in real
systems), SARC uses prefetching for sequential workloads. Sequential access patterns
naturally arise in video-on-demand, database scans, copy, backup, and recovery. The goal
of sequential prefetching is to detect sequential access and effectively prefetch the likely
cache data to minimize cache misses. Today, prefetching is ubiquitously applied in web
servers and clients, databases, file servers, on-disk caches, and multimedia servers.
In this manner, the DS8900F monitors application read I/O patterns and dynamically
determines whether it is optimal to stage into cache the following I/O elements:
Only the page requested.
The page that is requested plus the remaining data on the disk track.
An entire disk track (or a set of disk tracks) that was not requested.
The decision of when and what to prefetch is made in accordance with the Adaptive
Multi-stream Prefetching (AMP) algorithm. This algorithm dynamically adapts the number and
timing of prefetches optimally on a per-application basis (rather than a system-wide basis).
For more information about AMP, see “Adaptive Multi-stream Prefetching” on page 19.
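The following Python sketch shows the general shape of sequential detection with prefetch-ahead; it is a simplified stand-in, not the SARC/AMP implementation, and the detection window and prefetch depth are arbitrary assumptions (AMP adapts both per stream):

# Simplified illustration of sequential detection with prefetch-ahead.
from collections import deque

class SequentialDetector:
    def __init__(self, window=3, prefetch_depth=4):
        self.recent = deque(maxlen=window)     # most recent track numbers of this stream
        self.prefetch_depth = prefetch_depth

    def access(self, track):
        # Return the tracks to stage ahead of the host request, if any.
        self.recent.append(track)
        consecutive = (len(self.recent) == self.recent.maxlen and
                       all(b - a == 1 for a, b in zip(self.recent, list(self.recent)[1:])))
        if consecutive:
            return [track + i for i in range(1, self.prefetch_depth + 1)]
        return []

detector = SequentialDetector()
for t in (10, 11, 12, 13):
    print(t, "->", detector.access(t))    # prefetching begins once the pattern looks sequential
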
To decide which pages are evicted when the cache is full, sequential and random
(non-sequential) data is separated into separate lists. The SARC algorithm for random and
sequential data is shown in Figure 2-2.
Figure 2-2   The SARC RANDOM and SEQ lists, each managed from MRU to LRU, with a desired size boundary for the SEQ list
A page that was brought into the cache by simple demand paging is added to the head of the
Most Recently Used (MRU) section of the RANDOM list. Without further I/O access, it moves
down to the bottom of the LRU section. A page that was brought into the cache by a sequential
access or by sequential prefetching is added to the head of the MRU section of the SEQ list and
then moves down in that list. Other rules control the migration of pages between the lists so that
the system does not keep the same pages in memory twice.
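A minimal sketch of that two-list structure follows (illustrative only; the fixed seq_target share is an assumption, because SARC adapts the desired SEQ size dynamically, which this sketch does not do):

# Illustrative two-list cache in the spirit of Figure 2-2 (not SARC itself).
from collections import OrderedDict

class TwoListCache:
    def __init__(self, capacity, seq_target=0.5):
        self.capacity = capacity
        self.seq_target = seq_target          # assumed fixed share for the SEQ list
        self.random = OrderedDict()           # first item is the LRU end
        self.seq = OrderedDict()

    def insert(self, page, sequential=False):
        target = self.seq if sequential else self.random
        target[page] = True                   # new pages enter at the MRU end
        if len(self.random) + len(self.seq) > self.capacity:
            seq_over = len(self.seq) > self.capacity * self.seq_target
            victim = self.seq if (seq_over and self.seq) else (self.random or self.seq)
            victim.popitem(last=False)        # evict from the LRU end of the chosen list

cache = TwoListCache(capacity=4)
for p in ("r1", "r2", "r3"):
    cache.insert(p)                           # demand-paged data -> RANDOM list
for p in ("s1", "s2", "s3"):
    cache.insert(p, sequential=True)          # prefetched data -> SEQ list
print(list(cache.random), list(cache.seq))    # ['r2', 'r3'] ['s2', 's3']
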
In the DS8900F, AMP, an algorithm that was developed by IBM Research, manages the SEQ
list. AMP is an autonomic, workload-responsive, and self-optimizing prefetching technology
that adapts the amount of prefetch and the timing of prefetch on a per-application basis to
maximize the performance of the system.
By carefully choosing the prefetching parameters, AMP provides optimal sequential read
performance and maximizes the aggregate sequential read throughput of the system. The
amount that is prefetched for each stream is dynamically adapted according to the
application’s needs and the space that is available in the SEQ list. The timing of the
prefetches is also continuously adapted for each stream to avoid misses and any cache
pollution.
SARC and AMP play complementary roles. SARC carefully divides the cache between the
RANDOM and the SEQ lists to maximize the overall hit ratio. AMP manages the contents of
the SEQ list to maximize the throughput that is obtained for the sequential workloads. SARC
affects cases that involve both random and sequential workloads. However, AMP helps any
workload that has a sequential read component, including pure sequential read workloads.
AMP dramatically improves performance for common sequential and batch processing
workloads. It also provides excellent performance synergy with Db2 by preventing table scans
from being I/O-bound and improves performance of index scans and Db2 utilities, such as
Copy and Recover. Furthermore, AMP reduces the potential for array hot spots that result
from extreme sequential workload demands.
The CLOCK algorithm uses temporal ordering. It keeps a circular list of pages in memory,
with the clock hand pointing to the oldest page in the list. When a page must be inserted in
the cache, the R (recency) bit is inspected at the clock hand’s location. If R is zero, the
new page is put in place of the page that the clock hand points to, and R is set to 1. Otherwise,
the R bit is cleared (set to zero). Then, the clock hand moves one step clockwise and
the process is repeated until a page is replaced.
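Because CLOCK is a generic algorithm, its mechanics can be shown in a few lines of Python (a textbook sketch, not DS8000 code):

# Generic CLOCK page replacement, as described above.
class Clock:
    def __init__(self, nframes):
        self.pages = [None] * nframes      # circular list of cache frames
        self.rbit = [0] * nframes          # recency bit per frame
        self.hand = 0                      # clock hand

    def insert(self, page):
        while True:
            if self.rbit[self.hand] == 0:              # victim found
                self.pages[self.hand] = page
                self.rbit[self.hand] = 1               # the new page starts "recent"
                self.hand = (self.hand + 1) % len(self.pages)
                return
            self.rbit[self.hand] = 0                   # second chance: clear the bit
            self.hand = (self.hand + 1) % len(self.pages)

clock = Clock(3)
for p in ("a", "b", "c", "d"):
    clock.insert(p)
print(clock.pages)    # ['d', 'b', 'c']: "a" was the oldest page and was replaced
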
The CSCAN algorithm uses spatial ordering. The CSCAN algorithm is the circular variation of
the SCAN algorithm. The SCAN algorithm tries to minimize the disk head movement when
the disk head services read and write requests. It maintains a sorted list of pending requests
with the position on the drive of the request. Requests are processed in the current direction
of the disk head until it reaches the edge of the disk. At that point, the direction changes. In
the CSCAN algorithm, the requests are always served in the same direction. After the head
arrives at the outer edge of the disk, it returns to the beginning of the disk and services the
new requests in this one direction only. This process results in more equal performance for all
head positions.
The basic idea of IWC is to maintain a sorted list of write groups, as in the CSCAN algorithm.
The smallest and the highest write groups are joined, forming a circular queue. The new idea
is to maintain a recency bit for each write group, as in the CLOCK algorithm. A write group is
always inserted in its correct sorted position and the recency bit is set to zero at the
beginning. When a write hit occurs, the recency bit is set to one.
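A simplified sketch of that combination follows (illustrative only, not the DS8900F implementation): write groups are kept sorted by back-end location, and the destage pointer sweeps through them in that order; by analogy with CLOCK, a group whose recency bit is set is assumed here to be skipped once and have its bit cleared:

# Illustrative write cache: CSCAN-style ordering plus a CLOCK-style recency bit.
import bisect

class WriteCache:
    def __init__(self):
        self.groups = []        # write-group addresses, kept sorted (CSCAN order)
        self.recency = {}       # address -> recency bit
        self.hand = 0           # destage pointer into the sorted list

    def write(self, address):
        if address in self.recency:
            self.recency[address] = 1               # write hit: set the recency bit
        else:
            bisect.insort(self.groups, address)     # insert in sorted position
            self.recency[address] = 0               # new group starts with bit = 0

    def next_destage(self):
        # Pick the next group to destage, giving "recent" groups one more pass.
        for _ in range(len(self.groups)):
            idx = self.hand % len(self.groups)
            address = self.groups[idx]
            if self.recency[address] == 0:
                self.groups.pop(idx)
                del self.recency[address]
                self.hand = idx                     # list shifted left; keep the position
                return address
            self.recency[address] = 0               # second chance: clear the bit
            self.hand = idx + 1
        return None

cache = WriteCache()
for a in (40, 10, 30, 20):
    cache.write(a)
cache.write(10)                                     # write hit: 10 becomes "recent"
print([cache.next_destage() for _ in range(4)])     # [20, 30, 40, 10]
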
In the DS8900F implementation, an IWC list is maintained for each rank. The dynamically
adapted size of each IWC list is based on workload intensity on each rank. The rate of
destage is proportional to the portion of NVS that is occupied by an IWC list. The NVS is
shared across all ranks in a cluster. Furthermore, destages are smoothed out so that write
bursts are not converted into destage bursts.
Another enhancement to IWC is an update to the cache algorithm that increases the
residency time of data in NVS. This improvement focuses on maximizing throughput with
excellent average response time (latency).
In summary, IWC has better or comparable peak throughput to the best of CSCAN and
CLOCK across a wide variety of write cache sizes and workload configurations. In addition,
even at lower throughputs, IWC has lower average response times than CSCAN and CLOCK.
The DS8900F models offer both 16 Gbps and 32 Gbps four-port Fibre Channel/IBM FICON
Host Adapters. Each adapter can auto-negotiate up to two generations back. That is the Fibre
Channel standard for backward compatibility and is known as N-2 compatibility. The
16 Gbps adapter can auto-negotiate down to 8 Gbps and 4 Gbps. Similarly, the 32 Gbps
adapter can auto negotiate down to 16 Gbps and 8 Gbps. Both HAs are available with either
all short wave (SW) or all long wave (LW) SFPs. An individual HA does not offer an intermix of
SW and LW nor an intermix of 16 Gbps and 32 Gbps ports. But, a DS8900F can have an
intermix of SW and LW HAs running at either 16 Gbps or 32 Gbps.
With flexibility like that, you can benefit from the higher performance while maintaining
compatibility with existing host processor and SAN infrastructures. Additionally, you can
configure ports individually (and dynamically) which enables FICON and FCP intermix on the
same adapter. It is also possible to scale up host adapter connectivity within the first frame
(all DS8900F models) or into a second frame (models DS8950F and DS8980F).
For more information on DS8000 Host Adapter best practices, see “DS8000 Host Adapter
Configuration Guidelines”, created and maintained by IBM’s Advanced Technology Group. It
is available at:
[Link]
2.4 zHyperLinks
zHyperLink is a point-to-point optical connection designed to provide ultra-low response times
(latency) between IBM Z servers (z14 and later) and DS8900F (and DS8880) storage
systems. It works alongside the FICON infrastructure, whether that is direct point-to-point
connectivity or connectivity through a storage area network (SAN); zHyperLink does not replace FICON
connectivity. It reduces I/O latency, for selected eligible I/Os, to the point that z/OS no longer
needs to undispatch the running task. zHyperLink enables response times (latency) as low as
~20 microseconds (µs) and can be enabled for both read and write I/Os.
zHyperLinks are typically added in pairs for redundancy and have a point-to-point distance
limitation of 150 m (492 feet) between the IBM Z server and the DS8000 storage system. The shorter
the route, the lower the latency added by the physical infrastructure. zHyperLink requires specific
zHyperLink adapters on IBM Z, zHyperLink cabling, and zHyperLink adapters on the
DS8900F (and DS8880). For two-frame DS8900F (and DS8880) storage systems, we
recommend spreading the zHyperLinks evenly, or as evenly as possible, across the frames.
zHyperLink is one of the many Z Synergy features on the DS8900Fs. For more information
about zHyperLink, see IBM DS8000 and IBM Z Synergy, REDP-5186 at:
[Link]
The DS8900F models, like the predecessor DS8880 all flash array models, can be configured
with custom flash drive placement (custom placement) to scale the “back-end” or internal
bandwidth. There are some workloads that would benefit from more Device Adapters (which
provide the internal bandwidth), but do not require additional storage capacity. In other words,
it is not necessary to add additional flash drivesets in order to add more device adapters. Two
examples of custom placement are shown below to introduce the concept. More elaborate
configurations are possible.
The first example shows a DS8910F, on the left, configured with 2 x 3.84 TB HCF Tier 1 flash
drivesets installed in 1 x HPFE. The DS8910F, on the right in the same example, is configured
with 4 x 1.92 TB HCF Tier 2 flash drivesets, spread evenly across 2 x HPFEs via custom
placement. For reference see Figure 2-3 on page 24.
Both configurations have the same usable capacity (~73.35 TiB). The differences are noted
on the diagram and restated here. The DS8910F on the left has flash drives with a higher
performance profile (HCF Tier 1). The DS8910F on the right has flash drives with a lower
performance profile (HCF Tier 2), but twice the internal bandwidth due to the 2 x HPFEs
which add a second set of device adapters. Both may be suitable for the same workload. In
some cases the higher performing drives may be required; in other cases the additional
internal bandwidth may be the deciding factor. Or, physical space limitations may ultimately
be the determining factor which is to say, in this example, the configuration on the left may be
required.
The second example shows a DS8950F, on the left, configured with 6 x 3.84 TB HCF Tier 1
flash drivesets installed in 2 x HPFEs. On the right in this example, the DS8950F is also
configured with 6 x 3.84 TB HCF Tier 1 flash drivesets, but they are spread evenly across 3 x
HPFEs via custom placement. That provides 50% additional internal bandwidth due to an
additional pair of device adapters. For reference see Figure 2-4 on page 25.
The two DS8950Fs are configured with the same number of 3.84 TB HCF Tier 1 flash
drivesets, but there is a slight difference in usable capacity. As noted on the diagram, the
DS8950F on the left has ~227 Tebibyte (TiB) usable capacity, but the DS8950F on the right
has ~220 TiB because it has 2 x additional spare drives (see blue disk icons labeled with an
S). Both may be capable of delivering the same response time (latency) for a particular
workload. If there is a significant sequential workload, requiring more internal bandwidth, then
the configuration on the right may be more appropriate.
Device Adapters and Flash Tiers are discussed in the sections that follow immediately below.
For more information about HPFEs, see IBM DS8000 High-Performance Flash Enclosure
Gen2, REDP-5422 at:
[Link]
The goal is to configure a storage system that meets the performance and capacity
requirements effectively and efficiently. In other words, achieve a balanced price for
performance configuration. The best way to determine the appropriate flash tier or tiers is to
create a performance model of the actual workload. For details on performance modeling,
see “IBM Storage Modeller” on page 126.
After you determine your throughput workload requirements, you must choose the
appropriate number of connections to put between your Open Systems hosts and the
DS8000 storage system to sustain this throughput. Use an appropriate number of HA cards
to satisfy high throughput demands. The number of host connections per host system is
primarily determined by the required bandwidth.
Host connections frequently go through various external connections between the server and
the DS8000 storage system. Therefore, you need enough host connections for each server
so that if half of the connections fail, processing can continue at the level before the failure.
This availability-oriented approach requires that each connection carry only half the data
traffic that it otherwise might carry. These multiple lightly loaded connections also help to
minimize the instances when spikes in activity might cause bottlenecks at the HA or port. A
multiple-path environment requires at least two connections. Four connections are typical,
and eight connections are not unusual. Typically, these connections are spread across as
many I/O enclosures of the DS8000 storage system as are equipped with HAs.
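As a simple illustration of this availability-oriented sizing (the usable throughput per port is an assumed planning value for this example, not an official DS8900F rating), the following Python sketch sizes the connection count so that half of the paths can carry the full workload:

# Illustrative host-connection sizing with an N+N availability margin.
import math

def host_connections(required_mbps, usable_mbps_per_port=1200, minimum=2):
    ports_for_bandwidth = math.ceil(required_mbps / usable_mbps_per_port)
    ports = 2 * ports_for_bandwidth          # half of the connections may fail
    return max(ports, minimum)

for workload_mbps in (800, 3000, 6000):      # example bandwidth requirements
    print(workload_mbps, "MBps ->", host_connections(workload_mbps), "connections")
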
Usually, SAN directors or switches are used. Use two separate switches to avoid a single
point of failure.
In a z Systems environment, you must select a SAN switch or director that also supports
FICON. An availability-oriented approach applies to z Systems environments just as it does to
Open Systems environments. Plan enough host connections for each server so that if half of
the connections fail, processing can continue at the level before the failure.
For more information, see 4.10.1, “Fibre Channel host adapters: 16 Gbps and 32 Gbps” on
page 100.
[Link]
Your IBM representative or IBM Business Partner has access to these and additional white
papers and can provide them to you.
Cache processing improves the performance of the I/O operations that are done by the host
systems that attach to the DS8000 storage system. Cache size, the efficient internal
structure, and algorithms of the DS8000 storage system are factors that improve I/O
performance. The significance of this benefit is determined by the type of workload that is run.
Read operations
These operations occur when a host sends a read request to the DS8000 storage system:
A cache hit occurs if the requested data is in the cache. In this case, the I/O operation
does not disconnect from the channel/bus until the read is complete. A read hit provides
the highest performance.
A cache miss occurs if the data is not in the cache. The I/O operation logically disconnects
from the host. Other I/O operations occur over the same interface. A stage operation from
the flash media back end occurs.
The data remains in the cache and persistent memory until it is destaged, at which point it is
flushed from cache. Destage operations of sequential write operations to RAID 5 arrays are
done in parallel mode, writing a stripe to all disks in the RAID set as a single operation. An
entire stripe of data is written across all the disks in the RAID array. The parity is generated
one time for all the data simultaneously and written to the parity disk. This approach reduces
the parity generation penalty that is associated with write operations to RAID 5 arrays. For
RAID 6, data is striped on a block level across a set of drives, similar to RAID 5
configurations. A second set of parity is calculated and written across all the drives. This
technique does not apply for the RAID 10 arrays because there is no parity generation that is
required. Therefore, no penalty is involved other than a double write when writing to RAID 10
arrays.
It is possible that the DS8000 storage system cannot copy write data to the persistent cache
because it is full, which can occur if all data in the persistent cache waits for destage to disk.
In this case, instead of a fast write hit, the DS8000 storage system sends a command to the
host to retry the write operation. Having full persistent cache is not a good situation because
it delays all write operations. On the DS8000 storage system, the amount of persistent cache
is sized according to the total amount of system memory. The amount of persistent cache is
designed so that the probability of full persistent cache occurring in normal processing is low.
It is a common approach to base the amount of cache on the amount of disk capacity, as
shown in these common general rules:
For Open Systems, each TB of drive capacity needs 1 GB - 2 GB of cache.
For z Systems, each TB of drive capacity needs 2 GB - 4 GB of cache.
For SAN Volume Controller attachments, consider the SAN Volume Controller node cache in
this calculation, which might lead to a slightly smaller DS8000 cache. However, most
installations come with a minimum of 128 GB of DS8000 cache. Using flash in the DS8000
storage system does not typically change the prior values. HPFE with its flash media are
beneficial with cache-unfriendly workload profiles because they reduce the cost of cache
misses.
Most storage servers support a mix of workloads. These general rules can work well, but
many times they do not. Use them like any general rule, but only if you have no other
information on which to base your selection.
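Expressed as a quick calculation (this simply restates the general rules above; it is a starting point, not a substitute for a StorM model):

# Restates the cache-sizing general rules above; for orientation only.
def cache_rule_of_thumb(capacity_tb, workload="open"):
    gb_per_tb = (1, 2) if workload == "open" else (2, 4)   # z Systems: 2 - 4 GB per TB
    return capacity_tb * gb_per_tb[0], capacity_tb * gb_per_tb[1]

low, high = cache_rule_of_thumb(200, workload="z")
print(f"200 TB of z Systems capacity -> {low} GB to {high} GB of cache")
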
When coming from an existing disk storage server environment and you intend to consolidate
this environment into DS8000 storage systems, follow these recommendations:
Choose a cache size for the DS8000 series that has a similar ratio between cache size
and disk storage to that of the configuration that you use.
When you consolidate multiple disk storage servers, configure the sum of all cache from
the source disk storage servers for the target DS8000 processor memory or cache size.
For example, consider replacing four DS8880 storage systems, each with 200 TB and
512 GB cache, with a single DS8950F storage system. The ratio between cache size and
disk storage for each DS8880 storage system is 0.25% (512 GB/200 TB). The new DS8950F
storage system is configured with 900 TB to consolidate the four 200 TB DS8880 storage
systems, plus provide some capacity for growth. This DS8950F storage system should be fine
with 2 TB of cache to keep mainly the original cache-to-disk storage ratio. If the requirements
are somewhere in between or in doubt, round up to the next available memory size. When
using a SAN Volume Controller in front, round down in some cases for the DS8000 cache.
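The arithmetic behind this example can be written out as follows:

# Worked version of the consolidation example above.
existing_cache_gb = 512
existing_capacity_gb = 200 * 1024                  # 200 TB per DS8880
ratio = existing_cache_gb / existing_capacity_gb
print(f"Cache-to-capacity ratio: {ratio:.2%}")     # 0.25%

new_capacity_gb = 900 * 1024                       # four 200 TB systems consolidated, plus growth
suggested_cache_gb = ratio * new_capacity_gb
print(f"Suggested cache: ~{suggested_cache_gb:.0f} GB")   # ~2,304 GB, close to the 2,048 GB option
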
The cache size is not an isolated factor when estimating the overall DS8000 performance.
Consider it with the DS8000 model, the capacity and speed of the disk drives, and the
number and type of HAs. Larger cache sizes mean that more reads are satisfied from the
cache, which reduces the load on DAs and the disk drive back end that is associated with
reading data from disk. To see the effects of different amounts of cache on the performance
of the DS8000 storage system, run a StorM or Disk Magic model, which is described in
Chapter 6.1, “IBM Storage Modeller” on page 126.
The DS8900F uses virtualization, which is the abstraction of physical drives into one
or more logical volumes. The virtual drive is presented to hosts and systems as though it is a
physical drive. This process allows the host to think that it is using a storage device that
belongs to it, but the device is implemented in the storage system.
From a physical point of view, the DS8900F offers all-flash storage. Flash storage has no
moving parts and lower energy consumption. Its performance advantages are the elimination of
mechanical seek time and a much lower average access time. Within flash, there is a wide choice between
high-performing Flash Tier 0 and the more economical Flash Tier 1 and 2 drives. In a
combination, these tiers provide a solution that is optimized for both performance and cost.
Array sites are the building blocks that are used to define arrays. See Figure 3-1 on page 33.
Figure 3-1   Array site
3.1.2 Arrays
An array is created from one array site. When an array is created, its RAID level, array type,
and array configuration are defined. This process is also called defining an array. In all IBM
DS8000 series implementations, one array is always defined as using one array site.
A request for price quotation (RPQ) is required to use RAID 5 with flash drives larger than 1 TB
(the RPQ is not available for drive sizes of 4 TB and over).
Each HPFE Gen2 pair can contain up to six array sites. The first set of 16 flash drives creates
two 8-flash-drive array sites. RAID 6 arrays are created by default on each array site. RAID 5
is optional for flash drives smaller than 1 TB, but is not recommended. RAID 10 is optional for
all flash drive sizes.
During logical configuration, RAID 6 arrays and the required number of spares are created.
Each HPFE Gen2 pair has two global spares that are created from the first increment of 16
flash drives. The first two arrays to be created from these array sites are 5+P+Q. Subsequent
RAID 6 arrays in the same HPFE Gen2 Pair are 6+P+Q.
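To illustrate the effect of those array formats, the following sketch computes the approximate raw data capacity per 16-drive increment for the first increment (two 5+P+Q arrays, with the spares taken from these array sites) and for later increments (two 6+P+Q arrays). Formatting and metadata overhead are ignored, so real usable capacity is lower:

# Approximate raw data capacity per 16-drive increment of an HPFE Gen2 pair.
def increment_data_capacity_tb(drive_tb, first_increment):
    data_drives_per_array = 5 if first_increment else 6     # 5+P+Q(+S) versus 6+P+Q
    return 2 * data_drives_per_array * drive_tb              # two arrays per 16 drives

for drive_tb in (1.6, 3.84, 7.68):
    first = increment_data_capacity_tb(drive_tb, first_increment=True)
    later = increment_data_capacity_tb(drive_tb, first_increment=False)
    print(f"{drive_tb} TB drives: first increment ~{first:.1f} TB, later increments ~{later:.1f} TB")
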
Important: Using RAID 6 is recommended, and it is the default in the DS Graphical User
Interface (DS GUI). With large drives in particular, the RAID rebuild times (after a single
drive failure) get ever longer. Using RAID 6 reduces the danger of data loss due to a
double-drive failure.
Smart Rebuild
Smart Rebuild is a function that is designed to help reduce the possibility of secondary
failures and data loss of RAID arrays. It can be used to rebuild a RAID 6 array when certain
drive errors occur and a normal determination is made that it is time to use a spare to
proactively replace a failing flash drive. If the suspect drive is still available for I/O, it is kept in
the array rather than being rejected, as it would be under a standard RAID rebuild. A spare is
brought into the array as an extra member, concurrently.
Note: Smart Rebuild for flash drives is available for DS8900F. It is available for RAID 6
arrays only.
Smart Rebuild is not applicable in all situations, so it is not always used. Smart Rebuild runs
only for healthy RAID 6 arrays. If two drives with errors are in a RAID 6 configuration, or if the
drive mechanism failed to the point that it cannot accept any I/O, the standard RAID rebuild
procedure is used for the RAID array. If communications across a drive fabric are
compromised, such as a SAS path link error that causes the drive to be bypassed, standard
RAID rebuild procedures are used because the suspect drive is not available for a one-to-one
copy with a spare. If Smart Rebuild is not possible or does not provide the designed benefits,
a standard RAID rebuild occurs.
dscli> lsarray -l
Array State Data RAIDtype arsite Rank DA Pair DDMcap (10^9B) diskclass encrypt
=========================================================================================
A0 Assigned Normal 6 (6+P+Q) S6 R0 11 1600.0 FlashTier0 supported
A1 Assigned Normal 6 (5+P+Q+S) S2 R1 10 7680.0 FlashTier2 supported
A2 Assigned Normal 6 (6+P+Q) S5 R2 11 1600.0 FlashTier0 supported
A3 Assigned Normal 6 (5+P+Q+S) S1 R3 10 7680.0 FlashTier2 supported
A4 Assigned Normal 6 (5+P+Q+S) S3 R4 11 1600.0 FlashTier0 supported
A5 Assigned Normal 6 (5+P+Q+S) S4 R5 11 1600.0 FlashTier0 supported
Using the DS CLI command mkarray or through the DS GUI, six arrays (one array site
<-> one array) were created:
A0 <- S6 (6+P+Q)
A1 <- S2 (5+P+Q+S)
A2 <- S5 (6+P+Q)
A3 <- S1 (5+P+Q+S)
A4 <- S3 (5+P+Q+S)
A5 <- S4 (5+P+Q+S)
RAID 6 (the default) was used to create the arrays, and as a result we have the following:
5+P+Q+S RAID 6 configuration: The array consists of five data drives and two parity
drives (P and Q). The remaining drive on the array site is used as a spare (S).
6+P+Q RAID 6 configuration: The array consists of six data drives and two parity drives
(P and Q).
See Figure 3-2 on page 35 for how a RAID 6 array is created from an array site, and Figure 3-3
on page 36 for the RAID 5, RAID 6, and RAID 10 array types and the resulting drive configurations.
Figure 3-2   Creation of a RAID 6 array from one array site
3.1.3 Ranks
After the arrays are created, the next task is to define a rank. A rank is a logical representation
of the physical array that is formatted for use as FB or CKD storage types. In the DS8900F,
ranks are defined in a one-to-one relationship to arrays.
An FB rank features an extent size of either 1 GB (more precisely a gibibyte (GiB), which is a
binary gigabyte that is equal to 2³⁰ bytes), called large extents, or an extent size of 16 MiB,
called small extents.
IBM Z storage CKD is defined in terms of the original 3390 volume sizes. A 3390 Model 3 is
three times the size of a Model 1. A Model 1 features 1113 cylinders, which are about
0.946 GB. The extent size of a CKD rank is one 3390 Model 1, or 1113 cylinders. A 3390
Model 1 (1113 cylinders) is the large extent size for CKD ranks. The smallest CKD extent size
is 21 cylinders, which corresponds to the z/OS allocation unit for Extended Address Volume
(EAV) volumes larger than 65520 cylinders. z/OS changes the addressing modes and
allocates storage in 21 cylinder units.
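The sizes that are quoted above follow directly from the 3390 geometry of 15 tracks per cylinder and 56,664 bytes per track:

# The CKD extent sizes above, derived from 3390 geometry.
BYTES_PER_TRACK = 56_664        # 3390 track capacity
TRACKS_PER_CYL = 15

def cylinders_to_bytes(cylinders):
    return cylinders * TRACKS_PER_CYL * BYTES_PER_TRACK

large = cylinders_to_bytes(1113)    # one 3390 Model 1 = large CKD extent
small = cylinders_to_bytes(21)      # small CKD extent
print(f"Large CKD extent: {large / 10**9:.3f} GB")    # ~0.946 GB
print(f"Small CKD extent: {small / 2**20:.1f} MiB")   # ~17.0 MiB
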
DS8900F ranks are created using the DS CLI command mkrank or through the DS GUI.
Six ranks were created (see Example 3-2), and we can verify the following:
4 Fixed Block (fb) ranks - R0 to R3
– Small extents of 16 MiB
2 CKD (ckd) ranks - R4 and R5
– Small extents of 21 cylinders
Figure 3-4 on page 38 shows an example of an array that is formatted for FB data with 1 GiB
extents (the squares in the rank indicate that the extent is composed of several blocks from
DDMs).
Figure 3-4   Creation of an FB rank with 1 GiB extents
If you want Easy Tier to automatically optimize rank utilization, configure more than one rank
in an extent pool. A rank can be assigned to only one extent pool. There can be as many
extent pools as there are ranks.
Heterogeneous extent pools, with a mixture of Tier 0, Tier 1, and Tier 2 flash drives can take
advantage of the capabilities of Easy Tier to optimize I/O throughput. Easy Tier moves data
across different storage tiering levels to optimize the placement of the data within the extent
pool.
With storage pool striping, you can create logical volumes that are striped across multiple
ranks to enhance performance. To benefit from storage pool striping, more than one rank in
an extent pool is required.
Storage pool striping can enhance performance significantly. However, in the unlikely event
that a whole RAID array fails, the loss of the associated rank affects the entire extent pool
because data is striped across all ranks in the pool. For data protection, consider mirroring
your data to another DS8000 storage system.
A minimum of two extent pools must be configured to balance the capacity and workload
between the two servers. One extent pool is assigned to internal server 0. The other extent
pool is assigned to internal server 1. In a system with both FB and CKD volumes, four extent
pools provide one FB pool for each server and one CKD pool for each server.
Figure 3-5 on page 40 shows an example of a mixed environment with CKD and FB extent
pools. Additional extent pools might be desirable to segregate workloads.
Extent pools are expanded by adding more ranks to the pool. Ranks are organized into two
rank groups: Rank group 0 is controlled by storage server 0 (processor complex 0), and rank
Group 1 is controlled by storage server 1 (processor complex 1).
Fixed-block LUNs
A logical volume that is composed of FB extents is called a LUN. An FB LUN is composed of
one or more 1 GiB large extents or one or more 16 MiB small extents from one FB extent
pool. A LUN cannot span multiple extent pools, but a LUN can have extents from multiple
ranks within the same extent pool. You can construct LUNs up to 16 TiB when using large
extents.
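For illustration only, the following DSCLI sketch creates four FB LUNs of capacity 100 each in extent pool P0 (the capacity unit depends on the -type parameter of mkfbvol). The pool ID, volume name, and volume IDs (which also select LSS X'10') are hypothetical.
dscli> mkfbvol -extpool P0 -cap 100 -name ITSO_#h 1000-1003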
Before a CKD volume can be created, an LCU must be defined that provides up to 256
possible addresses that can be used for CKD volumes. Up to 255 LCUs can be defined.
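A hedged DSCLI sketch of defining an LCU and creating CKD base volumes in it follows. The LCU ID, subsystem ID (SSID), extent pool, capacity unit, and volume IDs are hypothetical, so check the mklcu and mkckdvol command references for the exact parameters at your code level.
dscli> mklcu -qty 1 -id 80 -ss 8080
dscli> mkckdvol -extpool P1 -cap 1113 -name ITSO_#h 8000-8007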
Classically, to start an I/O to a base volume, z/OS can select an alias address only from the
same LCU to perform the I/O. With SuperPAV, z/OS can use alias addresses from other
LCUs to perform an I/O for a base address.
The restriction is that the LCU of the alias address must belong to the same DS8000 server
(processor complex). In other words, if the base address is from an even (odd) LCU, the alias
address that z/OS selects must also be from an even (odd) LCU.
The DS8900F maintains a sequence of ranks. The first rank in the list is randomly picked at
each power-on of the storage system. The DS8900F tracks the rank in which the last
allocation started. The allocation of the first extent for the next volume starts from the next
rank in that sequence. See Figure 3-6 on page 42.
Note: Although storage pool striping was the preferred storage allocation method in the
past, it is now a better choice to let Easy Tier manage the storage pool extents. Rotate
extents is described here for completeness, but it is now mostly irrelevant.
If more than one volume is created in one operation, the allocation for each volume starts on
a different rank.
Dynamic extent pool merge allows one extent pool to be merged into another extent pool
while the logical volumes in both extent pools remain accessible to the host servers. Dynamic
extent pool merge can be used for the following scenarios:
Use dynamic extent pool merge for the consolidation of smaller extent pools of the same
storage type into a larger homogeneous extent pool that uses storage pool striping.
Creating a larger extent pool allows logical volumes to be distributed evenly over a larger
number of ranks, which improves overall performance by minimizing skew and reducing
the risk of a single rank that becomes a hot spot. In this case, a manual volume rebalance
must be initiated to restripe all existing volumes evenly across all available ranks in the
new pool. Newly created volumes in the merged extent pool allocate capacity
automatically across all available ranks by using the rotate extents EAM (storage pool
striping), which is the default.
Use dynamic extent pool merge for consolidating extent pools with different storage tiers
to create a merged multitier extent pool with a mix of storage classes (High-Performance
flash and High-Capacity flash) for automated management by Easy Tier automatic mode.
Under certain circumstances, use dynamic extent pool merge for consolidating extent
pools of the same storage tier but different drive types or RAID levels that can eventually
benefit from storage pool striping and Easy Tier automatic mode intra-tier management
(auto-rebalance) by using the Easy Tier micro-tiering capabilities.
Important: You can apply dynamic extent pool merge only among extent pools that are
associated with the same DS8000 storage system affinity (storage server 0 or storage
server 1) or rank group. All even-numbered extent pools (P0, P2, P4, and so on) belong to
rank group 0 and are serviced by storage server 0. All odd-numbered extent pools (P1, P3,
P5, and so on) belong to rank group 1 and are serviced by storage server 1 (unless one
DS8000 storage system failed or is quiesced with a failover to the alternative storage
system).
Additionally, the dynamic extent pool merge is not supported in these situations:
If source and target pools have different storage types (FB and CKD).
If you select an extent pool that contains volumes that are being migrated.
For more information about dynamic extent pool merge, see IBM DS8000 Easy Tier,
REDP-4667.
With the Easy Tier feature, you can easily add capacity and even single ranks to existing
extent pools without concern about performance.
For further information about manual volume rebalance, see IBM DS8000 Easy Tier,
REDP-4667.
Auto-rebalance
With Easy Tier automatic mode enabled for single-tier or multitier extent pools, you can
benefit from Easy Tier automated intratier performance management (auto-rebalance), which
relocates extents based on rank utilization, and reduces skew and avoids rank hot spots.
Easy Tier relocates subvolume data at the extent level based on actual workload patterns and
rank utilization (workload rebalance) rather than balancing the capacity of a volume across all
ranks in the pool (capacity rebalance, as achieved with manual volume rebalance).
When adding capacity to managed pools, Easy Tier automatic mode performance
management, auto-rebalance, takes advantage of the new ranks and automatically populates
the new ranks that are added to the pool when rebalancing the workload within a storage tier
and relocating subvolume data. Auto-rebalance can be enabled for hybrid and homogeneous
extent pools.
Tip: For brand new DS8000 storage systems, the Easy Tier automatic mode switch is set
to Tiered, which means that Easy Tier works only in hybrid pools. A better practice is to
have Easy Tier automatic mode working in all pools, including single-tier pools. To do so,
set the Easy Tier automatic mode switch to All.
For more information about auto-rebalance, see IBM DS8000 Easy Tier, REDP-4667.
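As a sketch of how this switch can be set (the storage image ID is a placeholder, and the exact parameter spelling can differ by DS CLI level), enabling Easy Tier automatic mode for all pools and checking the result might look like this:
dscli> chsi -etautomode all IBM.2107-75XXXXX
dscli> showsi IBM.2107-75XXXXX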
In managed homogeneous extent pools with only a single storage tier, the initial extent
allocation for a new volume is the same as with rotate extents or storage-pool striping. For a
volume, the appropriate DSCLI command, showfbvol or showckdvol, which is used with the
-rank option, allows the user to list the number of allocated extents of a volume on each
associated rank in the extent pool.
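The following abbreviated output is illustrative only (the volume ID, rank IDs, and extent counts are hypothetical, and the real command reports additional fields):
dscli> showfbvol -rank 8099
...
============Rank extents============
rank extents
============
R0 57
R1 43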
The EAM attribute of any volume that is created or already in a managed extent pool is
changed to managed after Easy Tier automatic mode is enabled for the pool. When enabling
Easy Tier automatic mode for all extent pools, that is, hybrid and homogeneous extent pools,
all volumes immediately become managed by Easy Tier. Once set to managed, the EAM
attribute setting for the volume is permanent. All previous volume EAM attribute information,
such as rotate extents or rotate volumes, is lost.
New volumes are allocated to the home tier. When the allocation policy High Utilization is in
effect, the home tier is some middle tier, and when High Performance is in effect, the home
tier is the highest available flash tier. The extents of a new volume are distributed in a rotate
extents or storage pool striping fashion across all available ranks in this home tier in the
extent pool if sufficient capacity is available. Only when all capacity on the home tier in an
extent pool is used does volume creation continue on the ranks of another tier. The allocation
order is the following:
High Utilization: Flash Tier 1 -> Flash Tier 2 -> Flash Tier 0
High Performance: Flash Tier 0 -> Flash Tier 1 -> Flash Tier 2
Note: The default allocation order is High Performance for all-flash pools.
You can change the allocation policy according to your needs with the chsi command and its
-ettierorder highutil or highperf option.
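For example, a hedged sketch of switching to the High Utilization allocation order (the storage image ID is a placeholder):
dscli> chsi -ettierorder highutil IBM.2107-75XXXXX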
Important: Before you can expand a volume, you must remove any Copy Services
relationships that involve that volume.
Important: DVR can be applied among extent pools that are associated with the same
DS8000 storage system affinity (storage server 0 or storage server 1) or rank group only.
All volumes in even-numbered extent pools (P0, P2, P4, and so on) belong to rank group 0
and are serviced by storage server 0. All volumes in odd-numbered extent pools (P1, P3,
P5, and so on) belong to rank group 1 and are serviced by storage server 1. Additionally,
the DVR is not supported if source and target pools are different storage types (FB and
CKD).
If the same extent pool is specified and rotate extents is used as the EAM, the volume
migration is carried out as manual volume rebalance, as described in “Manual volume
rebalance” on page 44. Manual volume rebalance is designed to redistribute the extents of
volumes within a non-managed, single-tier (homogeneous) pool so that workload skew and
hot spots are less likely to occur on the ranks. During extent relocation, only one extent at a
time is allocated rather than preallocating the full volume and only a minimum amount of free
capacity is required in the extent pool.
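A hedged DSCLI sketch of starting such a migration of volumes back into their own pool (manual volume rebalance) follows. The volume IDs and pool are placeholders, and the action and parameter names should be verified in the managefbvol command reference for your code level.
dscli> managefbvol -action migstart -extpool P0 1000-1003
dscli> showfbvol 1000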
Important: A volume migration with DVR back into the same extent pool (for example,
manual volume rebalance for restriping purposes) is not supported in managed or hybrid
extent pools. Hybrid pools are always supposed to be prepared for Easy Tier automatic
management. In pools under control of Easy Tier automatic mode, the volume placement
is managed automatically by Easy Tier. It relocates extents across ranks and storage tiers
to optimize storage performance and storage efficiency. However, it is always possible to
migrate volumes across extent pools, no matter if those pools are managed,
non-managed, or hybrid pools.
For more information about this topic, see IBM DS8000 Easy Tier, REDP-4667.
On the DS8000 series, there is no fixed binding between a rank and an LSS. The capacity of
one or more ranks can be aggregated into an extent pool. The logical volumes that are
configured in that extent pool are not necessarily bound to a specific rank. Different logical
volumes on the same LSS can even be configured in separate extent pools. The available
capacity of the storage facility can be flexibly allocated across LSSs and logical volumes. You
can define up to 255 LSSs on a DS8000 storage system.
For each LUN or CKD volume, you must select an LSS when creating the volume. The LSS is
part of the volume ID ‘abcd’ and must be specified upon volume creation.
You can have up to 256 volumes in one LSS. However, there is one restriction. Volumes are
created from extents of an extent pool, and an extent pool is associated with one
DS8000 storage system (also called a central processor complex (CPC)): server 0 or
server 1. The LSS number also reflects this affinity to one of these DS8000 storage systems.
All even-numbered LSSs (X'00', X'02', X'04', up to X'FE') are serviced by storage server 0,
and all odd-numbered LSSs (X'01', X'03', X'05', up to X'FD') are serviced by storage server 1.
All logical volumes in an LSS must be either CKD or FB. LSSs are grouped into address
groups of 16 LSSs. All LSSs within one address group must be of the same storage type,
either CKD or FB. The first digit of the LSS ID or volume ID specifies the address group. For
more information, see 3.1.8, “Address groups” on page 47.
IBM Z users are familiar with a logical control unit (LCU). z Systems operating systems
configure LCUs to create device addresses. There is a one-to-one relationship between an
LCU and a CKD LSS (LSS X'ab' maps to LCU X'ab'). Logical volumes have a logical volume
number X'abcd' in hexadecimal notation where X'ab' identifies the LSS and X'cd' is one of the
256 logical volumes on the LSS. This logical volume number is assigned to a logical volume
when a logical volume is created and determines with which LSS the logical volume is
associated. The 256 possible logical volumes that are associated with an LSS are mapped to
the 256 possible device addresses on an LCU (logical volume X'abcd' maps to device
address X'cd' on LCU X'ab'). When creating CKD logical volumes and assigning their logical
volume numbers, consider whether PAVs are required on the LCU and reserve addresses on
the LCU for alias addresses.
For Open Systems, LSSs do not play an important role other than associating a volume with a
specific rank group and server affinity (storage server 0 or storage server 1) or grouping hosts
and applications together under selected LSSs for the DS8000 Copy Services relationships
and management.
Tip: Certain management actions in Metro Mirror, Global Mirror, or Global Copy operate at
the LSS level. For example, the freezing of pairs to preserve data consistency across all
pairs is at the LSS level. The option to put all or a set of volumes of a certain application in
one LSS can make the management of remote copy operations easier under certain
circumstances.
Important: LSSs for FB volumes are created automatically when the first FB logical
volume on the LSS is created, and deleted automatically when the last FB logical volume
on the LSS is deleted. CKD LSSs require user parameters to be specified and must be
created before the first CKD logical volume can be created on the LSS. They must be
deleted manually after the last CKD logical volume on the LSS is deleted.
All devices in an address group must be either CKD or FB. LSSs are grouped into address
groups of 16 LSSs. LSSs are numbered X'ab', where a is the address group. So, all LSSs
within one address group must be of the same type, CKD or FB. Each address group therefore
provides 16 x 256 = 4096 possible device addresses. The first LSS defined in an
address group sets the type of that address group. For example, LSS X'10' to LSS X'1F' are
all in the same address group and therefore can all be used only for the same storage type,
either FB or CKD. Figure 3-7 on page 48 shows the concept of volume IDs, LSSs, and
address groups.
Figure 3-7 Volume IDs, logical subsystems, and address groups on the DS8000 storage systems
The volume ID X'gabb' in hexadecimal notation is composed of the address group X'g', the
LSS ID X'ga', and the volume number X'bb' within the LSS. For example, LUN X'2101'
denotes the second (X'01') LUN in LSS X'21' of address group 2.
Host attachment
Host bus adapters (HBAs) are identified to the DS8000 storage system in a host attachment
or host connection construct that specifies the HBA worldwide port names (WWPNs). A set of
host ports (host connections) can be associated through a port group attribute in the DSCLI
that allows a set of HBAs to be managed collectively. This group is called a host attachment
within the GUI.
Each host attachment can be associated with a volume group to define which LUNs that HBA
is allowed to access. Multiple host attachments can share the volume group. The host
attachment can also specify a port mask that controls which DS8000 I/O ports the HBA is
allowed to log in to. Whichever ports the HBA logs in to, it sees the same volume group that is
defined on the host attachment that is associated with this HBA.
When used with Open Systems hosts, a host attachment object that identifies the HBA is
linked to a specific volume group. You must define the volume group by indicating which FB
logical volumes are to be placed in the volume group. Logical volumes can be added to or
removed from any volume group dynamically.
One host connection can be assigned to one volume group only. However, the same volume
group can be assigned to multiple host connections. An FB logical volume can be assigned to
one or more volume groups. Assigning a logical volume to different volume groups allows a
LUN to be shared by hosts that are each configured with their own dedicated volume group
and set of volumes (for cases where the set of shared volumes is not identical between the hosts).
The maximum number of volume groups is 8,320 for the DS8000 storage system.
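As an illustrative sketch (the WWPN, volume IDs, host connection name, and the assigned volume group ID V0 are all hypothetical), creating a volume group and a host connection with the DSCLI might look as follows:
dscli> mkvolgrp -type scsimask -volume 1000-1003 ITSO_AIX_vg
dscli> mkhostconnect -wwname 10000000C9ABCDEF -profile "IBM pSeries - AIX" -volgrp V0 ITSO_AIX_1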
To summarize, this section described the creation of logical volumes within the extent pools
(optionally striping the volumes), assigning them a logical volume number that determines the
LSS with which they are associated and indicates which server manages them. Space-efficient
volumes can be created immediately or within a repository of the extent pool. Then, the LUNs
can be assigned to one or more volume groups. Finally, the HBAs are configured into a host
attachment that is associated with a volume group.
This concept applies when working with the DSCLI. When working with the DS Storage
Manager GUI, some of this complexity is hidden: instead of array sites, arrays, and ranks,
the GUI speaks only of Arrays or Managed Arrays, which are included under Pools. Also,
the concept of a volume group is not directly visible when working with the GUI.
This virtualization concept provides for greater flexibility. Logical volumes can dynamically be
created, deleted, migrated, and resized. They can be grouped logically to simplify storage
management. Large LUNs and CKD volumes reduce the required total number of volumes,
which also contributes to a reduction of management efforts.
As a best practice, use IBM Storage Modeler to size the cache and I/O adapter requirements.
It is important to understand how the volume data is placed on ranks and extent pools. This
understanding helps you decide how to create extent pools and choose the required number
of ranks within an extent pool. It also helps you understand and detect performance problems
or optimally tweak overall system performance.
It can also help you fine-tune system performance from an extent pool perspective, for
example, sharing the resources of an extent pool evenly between application workloads or
isolating application workloads to dedicated extent pools. Data placement can help you when
planning for dedicated extent pools with different performance characteristics and storage
tiers without using Easy Tier automatic management. Plan your configuration carefully to
meet your performance goals by minimizing potential performance limitations that might be
introduced by single resources that become a bottleneck because of workload skew. For
example, use rotate extents as the default EAM to help reduce the risk of single ranks that
become a hot spot and limit the overall system performance because of workload skew.
If workload isolation is required in your environment, you can isolate workloads and I/O on the
rank and DA levels on the DS8000 storage systems, if required.
You can manually manage storage tiers that are related to different homogeneous extent
pools of the same storage class and plan for appropriate extent pools for your specific
performance needs. Easy Tier provides optimum performance and a balanced resource
utilization at a minimum configuration and management effort.
Even in single-tier homogeneous extent pools, you can benefit from Easy Tier automatic
mode (by running the DSCLI chsi ETautomode=all command). It manages the subvolume
data placement within the managed pool based on rank utilization and thus reduces workload
skew and hot spots (auto-rebalance).
In multitier hybrid extent pools, you can fully benefit from Easy Tier automatic mode (by
running the DSCLI chsi ETautomode=all|tiered command). It provides full automatic
storage performance and storage economics management by optimizing subvolume data
placement in a managed extent pool across different storage tiers and even across ranks
within each storage tier (auto-rebalance). Easy Tier automatic mode and hybrid extent pool
configurations offer the most efficient way to use different storage tiers. It optimizes storage
performance and storage economics across three drive tiers to manage more applications
effectively and efficiently with a single DS8000 storage system at an optimum price versus the
performance and footprint ratio.
Before configuring extent pools and volumes, be aware of the basic configuration principles
about workload sharing, isolation, and spreading, as described in 4.2, “Configuration
principles for optimal performance” on page 66.
The first example, which is shown in Figure 3-9 on page 53, illustrates an extent pool with
only one rank, which is also referred to as a single-rank extent pool. This approach is common if
you plan to use a configuration that uses the maximum isolation that you can achieve on the
rank/extpool level. In this type of a single-rank extent pool configuration, all volumes that are
created are bound to a single rank. This type of configuration requires careful logical
configuration and performance planning because single ranks are likely to become a hot spot
and might limit overall system performance. It also requires the highest administration and
management effort because workload skew typically varies over time. You might need to
constantly monitor your system performance and react to hot spots. It also considerably limits
the benefits that a DS8000 storage system can offer regarding its virtualization and Easy Tier
automatic management capabilities.
Figure 3-9 on page 53 shows the data placement of two volumes created in an extent pool
with a single rank. Volumes that are created in this extent pool always use extents from rank
R6, and are limited to the capacity and performance capability of this single rank. Without the
use of any host-based data and workload striping methods across multiple volumes from
different extent pools and ranks, this rank is likely to experience rank hot spots and
performance bottlenecks.
Figure 3-9 Single-tier extent pool with one rank (R6) that contains Host B LUN 6 and Host C LUN 7
Also, in this example, one host can easily degrade the whole rank, depending on its I/O
workload, and affect multiple hosts that share volumes on the same rank if you have more
than one LUN allocated in this extent pool.
The second example, which is shown in Figure 3-10 on page 53, illustrates an extent pool
with multiple ranks of the same storage class or storage tier, which is referred to as a
homogeneous or single-tier extent pool. In general, an extent pool with multiple ranks also is
called a multi-rank extent pool.
Figure 3-10 Single-tier extent pool P1 with ranks R1 - R4 that contains LUNs 1 - 5 for hosts A, B, and C
Although in principle both EAMs (rotate extents and rotate volumes) are available for
non-managed homogeneous extent pools, it is preferable to use the default allocation method
of rotate extents (storage pool striping). Use this EAM to distribute the data and thus the
workload evenly across all ranks in the extent pool and minimize the risk of workload skew
and a single rank that becomes a hot spot.
Figure 3-10 on page 53 is an example of storage-pool striping for LUNs 1 - 4. It shows more
than one host and more than one LUN distributed across the ranks. In contrast to the
preferred practice, it also shows an example of LUN 5 being created with the rotate volumes
EAM in the same pool. The storage system tries to allocate contiguous space for this volume
on a single rank (R1) until insufficient capacity is left on that rank, and then it spills over to
the next available rank (R2). All workload on this LUN is limited to these
two ranks. This approach considerably increases the workload skew across all ranks in the
pool and the likelihood that these two ranks might become a bottleneck for all volumes in the
pool, which reduces overall pool performance.
Multiple hosts with multiple LUNs, as shown in Figure 3-10 on page 53, share the resources
(resource sharing) in the extent pool, that is, ranks, DAs, and physical spindles. If one host or
LUN has a high workload, I/O contention can result and easily affect the other application
workloads in the pool, especially if all applications have their workload peaks at the same
time. Alternatively, applications can benefit from a much larger amount of disk spindles and
thus larger performance capabilities in a shared environment in contrast to workload isolation
and only dedicated resources. With resource sharing, expect that not all applications peak at
the same time, so that each application typically benefits from the larger amount of disk
resources that it can use. Resource sharing with storage pool striping in non-managed
extent pools is a good approach for most cases if no other requirements, such as
workload isolation or specific quality of service (QoS) requirements, dictate another
approach.
Enabling Easy Tier automatic mode for homogeneous, single-tier extent pools is always an
additional, and preferred, option to let the DS8000 storage system manage system
performance in the pools based on rank utilization (auto-rebalance). The EAM of all volumes
in the pool becomes managed in this case. With Easy Tier and its advanced micro-tiering
capabilities that take different RAID levels and drive characteristics into account for
determining the rank utilization in managed pools, even a mix of different drive characteristics
and RAID levels of the same storage tier might be an option for certain environments.
With Easy Tier the DS8900F family offers advanced features when taking advantage of
resource sharing to minimize administration efforts and reduce workload skew and hot spots
while benefiting from automatic storage performance, storage economics, and workload
priority management. The use of these features in DS8000 environments is highly
encouraged. These features generally help provide excellent overall system performance
while ensuring quality of service (QoS) levels by prioritizing workloads in shared environments
at a minimum of administration effort and at an optimum price-performance ratio.
When you create a volume in a managed extent pool, that is, an extent pool that is managed
by Easy Tier automatic mode, the EAM of the volume always becomes managed. This
situation is true no matter which EAM is specified at volume creation. The volume is under
control of Easy Tier. Easy Tier moves extents to the most appropriate storage tier and rank in
the pool based on performance aspects. Any specified EAM, such as rotate extents or rotate
volumes, is ignored.
Mixing different storage tiers combined with Easy Tier automatic performance and economics
management on a subvolume level can considerably increase the performance versus price
ratio, increase energy savings, and reduce the overall footprint. The use of Easy Tier
automated subvolume data relocation and the addition of a flash tier are good for mixed
environments with applications that demand both IOPS and bandwidth at the same time. For
example, database systems might have different I/O demands according to their architecture.
Costs might be too high to allocate a whole database on High-Performance flash.
Mixing different drive technologies, for example, High-Performance flash with High-Capacity
flash, and efficiently allocating the data capacity on the subvolume level across the tiers with
Easy Tier can highly optimize price, performance, the footprint, and the energy usage. Only
the hot part of the data allocates into High-Performance flash instead of provisioning this type
of flash capacity for full volumes. Therefore, you can achieve considerable system
performance at a reduced cost and footprint with only a few High-Performance flash drives.
The ratio of High-Performance flash to High-Capacity flash in a hybrid pool depends on the
workload characteristics and skew.
For a two-tier configuration, always configure an equal or greater number of HPFE Enclosure
Pairs and drive sets for the upper tier compared to the lower tier. You should not use fewer
drives of High-Performance flash (Tier 0) than of High-Capacity flash (Tier 2). For instance,
when mixing 3.2 TB High-Performance flash with 15.36 TB High-Capacity flash under these
conditions, you already reach almost 20% of the net capacity in Flash Tier 0.
The DS8000 GUI also provides guidance for High-Performance flash planning based on the
existing workloads on a DS8000 storage system with Easy Tier monitoring capabilities. For
more information about the DS8900F GUI, see [Link] .
Use the hybrid extent pool configurations under automated Easy Tier management. It
provides ease of use with minimum administration and performance management efforts
while optimizing the system performance, price, footprint, and energy costs.
After the initial extent allocation of a volume in the pool, the extents and their placement on
the different storage tiers and ranks are managed by Easy Tier. Easy Tier collects workload
statistics for each extent in the pool and creates migration plans to relocate the extents to the
appropriate storage tiers and ranks. The extents are promoted to higher tiers or demoted to
lower tiers based on their actual workload patterns. The data placement of a volume in a
managed pool is no longer static or determined by its initial extent allocation. The data
placement of the volume across the ranks in a managed extent pool is subject to change over
time to constantly optimize storage performance and storage economics in the pool. This
process is ongoing and always adapting to changing workload conditions. After Easy Tier
data collection and automatic mode are enabled, it might take a few hours before the first
migration plan is created and applied. For more information about Easy Tier migration plan
creation and timings, see IBM DS8000 Easy Tier, REDP-4667.
The DSCLI showfbvol -rank or showckdvol -rank commands, and the showfbvol -tier or
showckdvol -tier commands, can help show the current extent distribution of a volume
across the ranks and tiers, as shown in Example 3-5. In this example, volume 8099 is
managed by Easy Tier and distributed across ranks R0 (FlashTier0 - High-Performance
flash) and R1 (FlashTier2 - High-Capacity flash). You can use the lsarray -l -rank Rxy
command to show the storage class and DA pair of a specific rank Rxy.
Example 3-5 showfbvol -tier and showfbvol -rank commands to show the volume-to-rank relationship in
a multitier pool
dscli> showfbvol -tier 8099
extpool P0
===========Tier Distribution============
Tier %allocated
=====================
FlashTier0 78
FlashTier2 22
The volume heat distribution (volume heat map), which is provided by DS GUI helps you
identify hot, warm, and cold extents for each volume and its distribution across the storage
tiers in the pool. For more information about the DS GUI volume heat distribution, see
Chapter 5 of IBM DS8000 Easy Tier, REDP-4667.
Another good configuration for heavier loads and high access densities is 100% of the
capacity in Flash Tier 1 (3.84 TB drives).
Only for very high loads and very heavy access densities is a storage system
configuration of 100% High-Performance flash (Flash Tier 0) usually needed.
These are general configuration hints for an initial storage system configuration and do not
replace a proper sizing with StorM.
Tip: For a two-tier configuration always configure an equal or greater number of HPFE
Enclosure Pairs and drive sets for the upper tier (for instance Tier 0) compared to the lower
tier (for instance Tier 2).
Also note that for certain types of large sequential load, 2 drive sets per HPFE instead of 3
can be recommended. A configuration with 3 HPFEs of two 1.92 TB Flash Tier 2 drive sets
each may be faster and cheaper for that kind of workload than a configuration with 2 HPFEs
that are filled with three 1.6 TB Flash Tier 0 drive sets each.
Again, a StorM sizing gives the precise answer for your workload profile.
Important: Before reading this chapter, familiarize yourself with the material that is
covered in Chapter 3, “Logical configuration concepts and terminology” on page 31.
This chapter introduces a step-by-step approach to configuring the IBM Storage System
DS8900F depending on workload and performance considerations:
DS8900F Starter drive choices
Reviewing the tiered storage concepts and Easy Tier
Understanding the configuration principles for optimal performance:
– Workload isolation
– Workload resource-sharing
– Workload spreading
Analyzing workload characteristics to determine isolation or resource-sharing
Planning allocation of the DS8900F drive and host connection capacity to identified
workloads
Planning spreading volumes and host connections for the identified workloads
Planning array sites
Planning RAID arrays and ranks with RAID-level performance considerations
Planning extent pools with single-tier and multitier extent pool considerations
Planning address groups, Logical SubSystems (LSSs), volume IDs, and Count Key Data
(CKD) Parallel Access Volumes (PAVs)
Planning I/O port IDs, host attachments, and volume groups
Logical configuration
4.1.1 DS8910F
DS8910F has the following characteristics:
Two IBM Power Systems POWER9 MTM 9009-22A
8 POWER9 cores per CEC
From 192 GB to 512 GB of System Memory
From 8 to 64 Host Adapter Ports
From 16 to 192 Flash drives
From 12.8 TB to 2,949 TB of capacity (with 15.36 TB Flash Drives)
Maximum of 860K IOPS (4K 70/30 R/W mix)
Maximum sequential throughput of 21 GB/s (read) / 17 GB/s (write)
For the DS8910F, the following starter drive choices are suggested (see Table 4-1 on
page 61).
4.1.2 DS8950F
DS8950F has the following characteristics:
2 × IBM Power Systems POWER9 MTM 9009-42A
20/40 POWER9 cores per CEC
From 512 GB to 3,456 GB of System Memory
From 8 to 128 Host Adapter Ports
From 16 to 384 Flash drives
From 12.8 TB to 5,898 TB of capacity (with 15.36 TB Flash Drives)
Maximum of 2,300K IOPS (4K 70/30 R/W mix)
Maximum sequential throughput of 63 GB/s (read) / 32 GB/s (write)
Table 4-2 shows suggested Starter drive choices for the DS8950F, which can be used along
with some storage sizing tool such as StorM.
4.1.3 DS8980F
DS8980F has the following characteristics:
2 × IBM Power Systems POWER9 MTM 9009-42A
44 POWER9 cores per CEC
4,352 GB of System Memory
From 8 to 128 Host Adapter Ports
From 16 to 384 Flash drives
From 12.8 TB to 5,898 TB of capacity (with 15.36 TB Flash Drives)
Maximum of 2,300K IOPS (4K 70/30 R/W mix)
Maximum sequential throughput of 63 GB/s (read) / 32 GB/s (write)
The DS8980F is usually intended for very heavy workloads, or for consolidating several older
DS8000 systems into one new system. As a suggested starter drive choice, the Heavy
Workload configuration that is shown for the DS8950F in Table 4-2 on page 61 could apply,
but again, use a tool like StorM for the final sizing.
When using the DS8910F starter drive choices with 100 TB for a typical workload and the
additional custom-placement option (Feature 06060), it is often better to have the 1.92 TB drives
distributed across 3 HPFEs, as shown in the right-hand alternative. The slightly higher cost
for the additional HPFE is more than offset by the lower price of Flash Tier 2 drives. And
especially if the workload profile is heavily sequential, the right-hand alternative gives
you even higher MBps throughput, at a lower price.
DS8900F: 6 x 1.6 TB HPF flash drive sets in 2 x HPFE, versus DS8900F: 6 x 1.92 TB HCF Tier 2
flash drive sets in 3 x HPFE (custom placement)
Example 2
In the second example, Figure 4-3, around 300 TB is distributed between 1.6 TB
High-Performance Flash drives and 7.68 TB High-Capacity Flash drives. The left-hand
alternative uses only 2 HPFEs in total.
DS8900F: 3 x 1.6 TB HPF and 3 x 7.68 TB HCF Tier 2 flash drive sets in 2 x HPFE, versus
DS8900F: the same drive sets in 4 x HPFE (custom placement for HPF)
Using the sizing tool with your specific workload profile will tell if such an alternative option is
beneficial for your case.
For performance analysis, I/O has always been the trickiest resource to analyze, and
some general rules apply:
Avoid many I/O operations on the same volume or LUN.
No I/O is the best I/O. In other words, make use of all kinds of cache:
– Memory cache – use host memory to avoid I/O operations.
– DS8900F cache – this is the reason for bigger DS8000 caches and better microcode
caching techniques. A very high cache-hit ratio is always desirable.
Place highly important I/Os or heavy workloads on High-Performance flash
technology, and less important I/Os or near-idle workloads on High-Capacity flash
technology.
Before the Easy Tier technology was available on the DS8000, performance analysts performed
I/O studies (called seek analysis) to understand how the I/O operations were
distributed, and used this information to allocate the data sets, files, and databases across the
different disk technologies according to the performance objectives.
In a certain way this was frustrating because some time later (weeks or months) the access
patterns changed and the placement of the data had to be changed again.
Easy Tier is a DS8900F microcode feature that, when enabled, performs continuous I/O
analysis so that hot data automatically goes to the best technology and cold data goes to
less expensive drives.
The enclosures must be installed in pairs. The HPFEs connect to the I/O enclosures over a
PCIe fabric, which increases bandwidth and transaction-processing capability.
With dramatically high I/O rates, low response times, and IOPS-energy-efficient
characteristics, flash addresses the highest performance needs and also potentially can
achieve significant savings in operational costs. It is critical to choose the correct mix of
storage tiers and the correct data placement to achieve optimal storage performance and
economics across all tiers at a low cost.
With the DS8900F storage system, you can easily implement tiered storage environments
that use high-performance flash and high-capacity flash storage tiers. Still, different storage
tiers can be isolated to separate extent pools and volume placement can be managed
manually across extent pools where required. Or, better and highly encouraged, volume
placement can be managed automatically on a subvolume level (extent level) in hybrid extent
pools by Easy Tier automatic mode with minimum management effort for the storage
administrator. Easy Tier is a no-cost feature on DS8900F storage systems. For more
information about Easy Tier, see 1.2.4, “Easy Tier” on page 11.
Consider Easy Tier automatic mode and hybrid or multi-tier extent pools for managing tiered
storage on the DS8900F storage system. The overall management and performance
monitoring effort increases considerably when manually managing storage capacity and
storage performance needs across multiple storage classes and does not achieve the
efficiency provided with Easy Tier automatic mode data relocation on the subvolume level
(extent level). With Easy Tier, client configurations show less potential to waste flash capacity
than with volume-based tiering methods.
With Easy Tier, you can configure hybrid or multi-tier extent pools (mixed high-performance
flash/high-capacity flash storage pools) and turn Easy Tier on. It then provides automated
data relocation across the storage tiers and ranks in the extent pool to optimize storage
performance and storage economics. It also rebalances the workload across the ranks within
each storage tier (auto-rebalance) based on rank utilization to minimize skew and hot spots.
Furthermore, it constantly adapts to changing workload conditions. There is no longer any
need to bother with tiering policies that must be manually applied to accommodate changing
workload dynamics.
In environments with homogeneous system configurations or isolated storage tiers that are
bound to different homogeneous extent pools, you can benefit from Easy Tier automatic
mode. Easy Tier provides automatic intra-tier performance management by rebalancing the
workload across ranks (auto-rebalance) in homogeneous single-tier pools based on rank
utilization. Easy Tier automatically minimizes skew and rank hot spots and helps to reduce
the overall management effort for the storage administrator.
Depending on the particular storage requirements in your environment, with the DS8900F
architecture, you can address a vast range of storage needs combined with ease of
management. On a single DS8900F storage system, you can perform these tasks:
Isolate workloads to selected extent pools (or down to selected ranks and DAs).
Share resources of other extent pools with different workloads.
Use Easy Tier to manage automatically multitier extent pools with different storage tiers
(or homogeneous extent pools).
Adapt your logical configuration easily and dynamically at any time to changing
performance or capacity needs by migrating volumes across extent pools, merging extent
pools, or removing ranks from one extent pool (rank depopulation) and moving them to
another pool.
For many initial installations, an approach with two extent pools (with or without different
storage tiers) and enabled Easy Tier automatic management might be the simplest way to
start if you have FB or CKD storage only; otherwise, four extent pools are required. You can
plan for more extent pools based on your specific environment and storage needs, for
example, workload isolation for some pools, different resource sharing pools for different
departments or clients, or specific Copy Services considerations.
Easy Tier provides a significant benefit for mixed workloads, so consider it for resource-
sharing workloads and isolated workloads dedicated to a specific set of resources.
Furthermore, Easy Tier automatically supports the goal of workload spreading by distributing
the workload in an optimum way across all the dedicated resources in an extent pool. It
provides automated storage performance and storage economics optimization through
dynamic data relocation on extent level across multiple storage tiers and ranks based on their
access patterns. With auto-rebalance, it rebalances the workload across the ranks within a
storage tier based on utilization to reduce skew and avoid hot spots. Auto-rebalance applies
to managed multitier pools and single-tier pools and helps to rebalance the workloads evenly
across ranks to provide an overall balanced rank utilization within a storage tier or managed
single-tier extent pool. Figure 4-4 shows the effect of auto-rebalance in a single-tier extent
pool that starts with a highly imbalanced workload across the ranks at T1. Auto-rebalance
rebalances the workload and optimizes the rank utilization over time.
Isolation provides ensured availability of the hardware resources that are dedicated to the
isolated workload. It removes contention with other applications for those resources.
However, isolation limits the isolated workload to a subset of the total DS8900F hardware so
that its maximum potential performance might be reduced. Unless an application has an
entire DS8900F storage system that is dedicated to its use, there is potential for contention
with other applications for any hardware (such as cache and processor resources) that is not
dedicated. Typically, isolation is implemented to improve the performance of certain
workloads by separating different workload types.
One traditional approach to isolation is to identify lower-priority workloads with heavy I/O
demands and to separate them from all of the more important workloads. You might be able
to isolate multiple lower priority workloads with heavy I/O demands to a single set of hardware
resources and still meet their lower service-level requirements, particularly if their peak I/O
demands are at different times.
Multiple resource-sharing workloads can have logical volumes on the same ranks and can
access the same DS8900F HAs or I/O ports. Resource-sharing allows a workload to access
more DS8900F hardware than can be dedicated to the workload, providing greater potential
performance, but this hardware sharing can result in resource contention between
applications that impacts overall performance at times. It is important to allow
resource-sharing only for workloads that do not consume all of the DS8900F hardware
resources that are available to them. Pinning volumes to one certain tier can also be
considered temporarily, and then you can release these volumes again.
Easy Tier extent pools typically are shared by multiple workloads because Easy Tier with its
automatic data relocation and performance optimization across multiple storage tiers
provides the most benefit for mixed workloads.
To better understand the resource-sharing principle for workloads on disk arrays, see 3.2.3,
“Extent pool considerations” on page 51.
You must allocate the DS8900F hardware resources to either an isolated workload or multiple
resource-sharing workloads in a balanced manner, that is, you must allocate either an
isolated workload or resource-sharing workloads to the DS8900F ranks that are assigned to
DAs and both processor complexes in a balanced manner. You must allocate either type of
workload to I/O ports that are spread across HAs and I/O enclosures in a balanced manner.
You must distribute volumes and host connections for either an isolated workload or a
resource-sharing workload in a balanced manner across all DS8900F hardware resources
that are allocated to that workload.
You should create volumes as evenly distributed as possible across all ranks and DAs
allocated to those workloads.
One exception to the recommendation of spreading volumes might be when specific files or
data sets are never accessed simultaneously, such as multiple log files for the same
application where only one log file is in use at a time. In that case, you can place the volumes
required by these data sets or files on the same resources.
You must also configure host connections as evenly distributed as possible across the I/O
ports, HAs, and I/O enclosures that are available to either an isolated or a resource-sharing
workload. Then, you can use host server multipathing software to optimize performance over
multiple host connections. For more information about multipathing software, see Chapter 8,
“Host attachment” on page 187.
Additionally, you might identify any workload that is so critical that its performance can never
be allowed to be negatively impacted by other workloads.
Then, identify the remaining workloads that are considered appropriate for resource-sharing.
Next, define a balanced set of hardware resources that can be dedicated to any isolated
workloads, if required. Then, allocate the remaining DS8000 hardware for sharing among the
resource-sharing workloads. Carefully consider the appropriate resources and storage tiers
for Easy Tier and multitier extent pools in a balanced manner.
The next step is planning extent pools and assigning volumes and host connections to all
workloads in a way that is balanced and spread. By default, the standard allocation method
when creating volumes is stripes with one-extent granularity across all arrays in a pool, so on
the rank level, this distribution is done automatically.
Without the explicit need for workload isolation or any other requirements for multiple extent
pools, starting with two extent pools (with or without different storage tiers) and a balanced
distribution of the ranks and DAs/HPFEs might be the simplest configuration to start with
using resource-sharing throughout the whole DS8900F storage system and Easy Tier
automatic management if you have either FB or CKD storage. Otherwise, four extent pools
are required.
The final step is the implementation of host-level striping (when appropriate) and multipathing
software, if needed. If you planned for Easy Tier, do not consider host-level striping because it
dilutes the workload skew and is counterproductive to the Easy Tier optimization.
For example, the ratio of High-Performance flash capacity to High-Capacity flash capacity in a
hybrid pool depends on the workload characteristics and skew.
For a two-tier configuration, always configure an equal or greater number of HPFE Enclosure
Pairs and drive sets for the upper tier compared to the lower tier. You should not use fewer
drives of High-Performance flash (Tier 0) than of High-Capacity flash (Tier 2). For instance,
when mixing 3.2 TB High-Performance flash with 15.36 TB High-Capacity flash under these
conditions, you already reach almost 20% of the net capacity in Flash Tier 0.
The DS8900F Storage Management GUI also can provide guidance for capacity planning of
the available storage tiers based on the existing workloads on a DS8900F storage system
with Easy Tier monitoring enabled.
You must also consider organizational and business considerations in determining which
workloads to isolate. Workload priority (the importance of a workload to the business) is a key
consideration. Application administrators typically request dedicated resources for
high-priority workloads.
The most important consideration is preventing lower-priority workloads with heavy I/O
requirements from impacting higher priority workloads. Lower-priority workloads with heavy
random activity must be evaluated for rank isolation. Lower-priority workloads with heavy,
large-blocksize, sequential activity must be evaluated for I/O port, and possibly DA
(HPFE), isolation.
Workloads that require different disk drive types (capacity and speed), different RAID types
(RAID 5, RAID 6, or RAID 10), or different storage types (CKD or FB) dictate isolation to
different DS8000 arrays, ranks, and extent pools, unless this situation can be solved by
pinning volumes to one certain tier. For more information about the performance implications
of various RAID types, see “RAID-level performance considerations” on page 77.
Workloads that use different I/O protocols (FCP or FICON) dictate isolation to different I/O
ports. However, workloads that use the same drive types, RAID type, storage type, and I/O
protocol can be evaluated for separation or isolation requirements.
Workloads with heavy, continuous I/O access patterns must be considered for isolation to
prevent them from consuming all available DS8900F hardware resources and impacting the
performance of other types of workloads. Workloads with large blocksize and sequential
activity can be considered for separation from those workloads with small blocksize and
random activity.
Isolation of only a few workloads that are known to have high I/O demands can allow all the
remaining workloads (including the high-priority workloads) to share hardware resources and
achieve acceptable levels of performance. More than one workload with high I/O demands
might be able to share the isolated DS8900F resources, depending on the service level
requirements and the times of peak activity.
The following examples are I/O workloads, files, or data sets that might have heavy and
continuous I/O access patterns:
Sequential workloads (especially those workloads with large-blocksize transfers)
Log files or data sets
Sort or work data sets or files
Business Intelligence and Data Mining
Disk copies (including Point-in-Time Copy background copies, remote mirroring target
volumes, and tape simulation on disk)
Video and imaging applications
Engineering and scientific applications
Certain batch workloads
You must consider workloads for all applications for which DS8900F storage is allocated,
including current workloads to be migrated from other installed storage systems and new
workloads that are planned for the DS8900F storage system. Also, consider projected growth
for both current and new workloads.
For existing applications, consider historical experience first. For example, is there an
application where certain data sets or files are known to have heavy, continuous I/O access
patterns? Is there a combination of multiple workloads that might result in unacceptable
performance if their peak I/O times occur simultaneously? Consider workload importance
(workloads of critical importance and workloads of lesser importance).
Estimate the requirements for new application workloads and for current application workload
growth. You can obtain information about general workload characteristics in Chapter 5,
“Understanding your workload” on page 107.
As new applications are rolled out and current applications grow, you must monitor
performance and adjust projections and allocations. You can obtain more information about
this topic in Chapter 7, “Practical performance management” on page 139.
You can use the StorM modeling tool to model the current or projected workload and estimate
the required DS8900F hardware resources. They are described in Chapter 6, “Performance
planning tools” on page 125.
The DS8900F Storage Management GUI can also provide workload information and capacity
planning recommendations that are associated with a specific workload to reconsider the
need for isolation and evaluate the potential benefit when using a multitier configuration and
Easy Tier.
Choose the DS8900F resources to dedicate in a balanced manner. If ranks are planned for
workloads in multiples of two, half of the ranks can later be assigned to extent pools managed
by processor complex 0, and the other ranks can be assigned to extent pools managed by
processor complex 1. You may also note the DAs and HPFEs to be used. If I/O ports are
allocated in multiples of four, they can later be spread evenly across all I/O enclosures in a
DS8900F frame if four or more HA cards are installed. If I/O ports are allocated in multiples of
two, they can later be spread evenly across left and right I/O enclosures.
Easy Tier later provides automatic intra-tier management in single-tier and multitier pools
(auto-rebalance) and cross-tier management in multitier pools for the resource-sharing
workloads.
Host connection: In this chapter, we use host connection in a general sense to represent
a connection between a host server (either z Operating Systems or Open Systems) and
the DS8000 storage system.
After the spreading plan is complete, use the DS8900F hardware resources that are identified
in the plan as input to order the DS8900F hardware.
There also can be Open Systems host server or multipathing software considerations that are
related to the number or the size of volumes, so you must consider these factors in addition to
workload requirements.
There are significant performance implications with the assignment of logical volumes to
ranks and DAs. The goal of the entire logical configuration planning process is to ensure that
volumes for each workload are on ranks and DAs that allow all workloads to meet
performance objectives.
To spread volumes across allocated hardware for each isolated workload, and then for each
workload in a group of resource-sharing workloads, complete the following steps:
1. Review the required number and the size of the logical volumes that are identified during
the workload analysis.
2. Review the number of ranks that are allocated to the workload (or group of
resource-sharing workloads) and the associated DA pairs.
3. Evaluate the use of multi-rank or multitier extent pools. Evaluate the use of Easy Tier in
automatic mode to automatically manage data placement and performance.
4. Assign the volumes, preferably with the default allocation method rotate extents (DSCLI
term: rotateexts, GUI: rotate capacity).
There are significant performance implications from the assignment of host connections to
I/O ports, HAs, and I/O enclosures. The goal of the entire logical configuration planning
process is to ensure that host connections for each workload access I/O ports and HAs that
allow all workloads to meet the performance objectives.
To spread host connections across allocated hardware for each isolated workload, and then
for each workload in a group of resource-sharing workloads, complete the following steps:
1. Review the required number and type (SW, LW, FCP, or FICON) of host connections that
are identified in the workload analysis. You must use a minimum of two host connections
to different DS8900F HA cards to ensure availability. Some Open Systems hosts might
impose limits on the number of paths and volumes. In such cases, you might consider not
exceeding four paths per volume, which in general is a good approach for performance
and availability. The DS8900F front-end host ports are 16 Gbps and 32 Gbps capable and
if the expected workload is not explicitly saturating the adapter and port bandwidth with
high sequential loads, you might share ports with many hosts.
2. Review the HAs that are allocated to the workload (or group of resource-sharing
workloads) and the associated I/O enclosures.
3. Review requirements that need I/O port isolation, for example, remote replication Copy
Services, or SAN Volume Controller. If possible, try to split them as you split hosts among
the available I/O ports and HA cards.
When using the DS Storage Manager GUI to create managed arrays and pools, the GUI
automatically chooses a good distribution of the arrays across all DAs, and initial formatting
with the GUI gives optimal results for many cases. Only for specific requirements (for
example, isolation by DA pairs) is the command-line interface (DSCLI) advantageous
because it gives more options for a certain specific configuration.
After the DS8900F hardware is installed, you can use the output of the DS8000 DSCLI
lsarraysite command to display and document array site information, including flash drive
type and DA pair. Check the disk drive type and DA pair for each array site to ensure that
arrays, ranks, and ultimately volumes that are created from the array site are created on the
DS8900F hardware resources required for the isolated or resource-sharing workloads.
The result of this step is the addition of specific array site IDs to the plan of workload
assignment to ranks.
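For example, the following command lists all array sites with their DA pair and drive class information (the exact output columns can vary slightly by code level):
dscli> lsarraysite -l
Compare the DA pair and drive class that is reported for each array site with the spreadsheet that maps array sites to the planned workloads.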
Storage servers: Array sites, arrays, and ranks do not have a fixed or predetermined
relationship to any DS8900F processor complex (storage server) before they are finally
assigned to an extent pool and a rank group (rank group 0/1 is managed by processor
complex 0/1).
RAID 6 is now the default and preferred setting for the DS8900F. RAID 5 can be configured
for drives of less than 1 TB, but this configuration is not preferred and requires a risk
acceptance. Flash Tier 0 drive sizes larger than 1 TB can be configured by using RAID 5, but
require an RPQ and an internal control switch to be enabled. RAID 10 continues to be an
option for all drive types.
When configuring arrays from array sites, you must specify the RAID level, either RAID 5,
RAID 6, or RAID 10. These RAID levels meet different requirements for performance, usable
storage capacity, and data protection. However, you must determine the correct RAID types
and the physical flash drives (speed and capacity) that are related to initial workload
performance objectives, capacity requirements, and availability considerations before you
order the DS8900F hardware.
Each HPFE Gen2 pair can contain up to six array sites. The first set of 16 flash drives creates
two 8-flash drive array sites. RAID 6 arrays are created by default on each array site.
During logical configuration, RAID 6 arrays and the required number of spares are created.
Each HPFE Gen2 pair has two global spares, created from the first increment of 16 flash
drives. The first two arrays to be created from these array sites are 5+P+Q. Subsequent
RAID 6 arrays in the same HPFE Gen2 Pair will be 6+P+Q.
RAID 5 has long been one of the most commonly used levels of RAID protection because
it offers cost-effective performance while maximizing usable capacity through data
striping. It provides fault tolerance if one disk drive fails by using XOR parity for redundancy.
Hot spots within an array are avoided by distributing data and parity information across all of
the drives in the array. The capacity of one drive in the RAID array is lost because it holds the
parity information. RAID 5 provides a good balance of performance and usable storage
capacity.
RAID 6 provides a higher level of fault tolerance than RAID 5 against disk failures, but also provides
less usable capacity than RAID 5 because the capacity of two drives in the array is set aside
to hold the parity information. As with RAID 5, hot spots within an array are avoided by
distributing data and parity information across all of the drives in the array. Still, RAID 6 offers
more usable capacity than RAID 10 while providing an efficient method of data protection against
double disk errors, such as two drive failures, two coincident medium errors, or a drive failure
and a medium error during a rebuild. Because the likelihood of media errors increases with
the capacity of the physical disk drives, consider the use of RAID 6 with large-capacity disk
drives and higher data availability requirements. For example, consider RAID 6 where
rebuilding the array after a drive failure takes a long time.
RAID 10 optimizes high performance while maintaining fault tolerance for disk drive failures.
The data is striped across several disks, and the first set of disk drives is mirrored to an
identical set. RAID 10 can tolerate at least one, and in most cases, multiple disk failures if the
primary copy and the secondary copy of a mirrored disk pair do not fail at the same time.
Regarding random-write I/O operations, the different RAID levels vary considerably in their
performance characteristics. With RAID 10, each write operation at the disk back end initiates
two disk operations to the rank. With RAID 5, an individual random small-block write
operation to the disk back end typically causes a RAID 5 write penalty, which initiates four I/O
operations to the rank by reading the old data and the old parity block before finally writing the
new data and the new parity block. For RAID 6 with two parity blocks, the write penalty
increases to six required I/O operations at the back end for a single random small-block write
operation. This assumption is a worst-case scenario that is helpful for understanding the
back-end impact of random workloads with a certain read/write ratio for the various RAID
levels. It permits a rough estimate of the expected back-end I/O workload and helps you to
plan for the correct number of arrays. On a heavily loaded system, it might take fewer I/O
operations than expected on average for RAID 5 and RAID 6 arrays. The optimization of the
queue of write I/Os waiting in cache for the next destage operation can lead to a high number
of partial or full stripe writes to the arrays with fewer required back-end disk operations for the
parity calculation.
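For example, assume a purely illustrative workload of 10,000 host IOPS with a 70:30 read/write ratio and, as a worst case, no cache hits. The back end then services about 7,000 + (3,000 x 4) = 19,000 disk operations per second with RAID 5, 7,000 + (3,000 x 6) = 25,000 with RAID 6, and 7,000 + (3,000 x 2) = 13,000 with RAID 10. Dividing such an estimate by the I/O capability of a single rank gives a first approximation of the number of arrays that are needed.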
On modern disk systems, such as the DS8900F storage system, write operations are cached
by the storage subsystem and thus handled asynchronously with short write response times
for the attached host systems. So, any RAID 5 or RAID 6 write penalties are shielded from the
attached host systems in disk response time. Typically, a write request that is sent to the
DS8900F subsystem is written into storage server cache and persistent cache, and the I/O
operation is then acknowledged immediately to the host system as complete. If there is free
space in these cache areas, the response time that is seen by the application is only the time
to get data into the cache, and it does not matter whether RAID 5, RAID 6, or RAID 10 is
used.
There is also the concept of rewrites. If you update a cache segment that is still in write cache
and not yet destaged, the storage system updates the segment in the cache and eliminates the
RAID penalty for the previous write. However, if the host systems send data to the cache areas faster than the
storage server can destage the data to the arrays (that is, move it from cache to the physical
disks), the cache can occasionally fill up with no space for the next write request. Therefore,
the storage server signals the host system to retry the I/O write operation. In the time that it
takes the host system to retry the I/O write operation, the storage server likely can destage
part of the data, which provides free space in the cache and allows the I/O operation to
complete on the retry attempt.
RAID 10 is not as commonly used as RAID 5 or RAID 6 for the following reason: RAID 10
requires more raw disk capacity for every TB of effective capacity.
Table 4-3 shows a short overview of the advantages and disadvantages of the RAID levels for
reliability, space efficiency, and random write performance.
Table 4-3 RAID-level comparison of reliability, space efficiency, and write penalty
RAID level   Reliability (number of erasures)             Space efficiency                               Write penalty (number of disk operations)
RAID 5       1                                            Capacity of one drive per array for parity     4
RAID 6       2                                            Capacity of two drives per array for parity    6
RAID 10      1 (more, if not in the same mirrored pair)   About 50% of the array capacity                2
Because RAID 5, RAID 6, and RAID 10 perform equally well for both random and sequential
read operations, RAID 5 or RAID 6 might be a good choice for space efficiency and
performance for standard workloads with many read requests. RAID 6 offers a higher level of
data protection than RAID 5, especially for large capacity drives.
For array rebuilds, RAID 5, RAID 6, and RAID 10 require approximately the same elapsed
time, although RAID 5 and RAID 6 require more disk operations and therefore are more likely
to affect other disk activity on the same disk array.
Today, High-Capacity flash drives are mostly used as Tier 1 (3.84 TB High-Capacity flash) and
lower tiers in a hybrid pool, where most of the IOPS are handled by Tier 0
High-Performance flash drives. Yet, even if the High-Performance flash tier handles, for
example, 70% or more of the load, the High-Capacity flash drives still handle a considerable
amount of the workload because of their large bulk capacity.
Finally, using the lsarray -l and lsrank -l commands can give you an idea of which DA
pair (HPFE) is used by each array and rank respectively, as shown in Example 4-1. You can
do further planning from here.
Example 4-1 lsarray -l and lsrank -l commands showing Array ID sequence and DA pair
dscli> lsarray -l
Array State Data RAIDtype arsite Rank DA Pair DDMcap (10^9B) diskclass encrypt
=========================================================================================
A0 Assigned Normal 6 (6+P+Q) S6 R0 11 1600.0 FlashTier0 supported
A1 Assigned Normal 6 (5+P+Q+S) S2 R1 10 7680.0 FlashTier2 supported
A2 Assigned Normal 6 (6+P+Q) S5 R2 11 1600.0 FlashTier0 supported
A3 Assigned Normal 6 (5+P+Q+S) S1 R3 10 7680.0 FlashTier2 supported
A4 Assigned Normal 6 (5+P+Q+S) S3 R4 11 1600.0 FlashTier0 supported
A5 Assigned Normal 6 (5+P+Q+S) S4 R5 11 1600.0 FlashTier0 supported
dscli> lsrank -l
ID Group State datastate Array RAIDtype extpoolID extpoolnam stgtype exts usedexts keygrp marray extsize
===========================================================================================================
R0 0 Normal Normal A0 6 P0 Open_ET fb 554647 547943 1 MA6 16MiB
R1 0 Normal Normal A1 6 P0 Open_ET fb 2219583 276917 1 MA2 16MiB
R2 1 Normal Normal A2 6 P1 Open_ET fb 554647 553725 1 MA5 16MiB
R3 1 Normal Normal A3 6 P1 Open_ET fb 2219583 1840518 1 MA1 16MiB
R4 0 Normal Normal A4 6 P2 CKD_z/OS ckd 429258 35783 1 MA3 21cyl
R5 1 Normal Normal A5 6 P3 CKD_z/OS ckd 429258 39931 1 MA4 21cyl
Extent pools are automatically numbered with system-generated IDs starting with P0, P1, and
P2 in the sequence in which they are created. Extent pools that are created for rank group 0
are managed by processor complex 0 and have even-numbered IDs (P0, P2, and P4, for
example). Extent pools that are created for rank group 1 are managed by processor complex
1 and have odd-numbered IDs (P1, P3, and P5, for example). Only in a failure condition or
during a concurrent code load is the ownership of a certain rank group temporarily moved to
the alternative processor complex.
To achieve a uniform storage system I/O performance and avoid single resources that
become bottlenecks (called hot spots), it is preferable to distribute volumes and workloads
evenly across all of the ranks and DA pairs that are dedicated to a workload by creating
appropriate extent pool configurations.
The assignment of the ranks to extent pools together with an appropriate concept for the
logical configuration and volume layout is the most essential step to optimize overall storage
system performance. A rank can be assigned to any extent pool or rank group. Each rank
provides a particular number of storage extents of a certain storage type (either FB or CKD)
to an extent pool. Finally, an extent pool aggregates the extents from the assigned ranks and
provides the logical storage capacity for the creation of logical volumes for the attached host
systems.
On the DS8900F storage system, you can configure homogeneous single-tier extent pools
with ranks of the same storage class, and hybrid multitier extent pools with ranks from
different storage classes. The Extent Allocation Methods (EAM), such as rotate extents or
storage-pool striping, provide easy-to-use capacity-based methods of spreading the workload
data across the ranks in an extent pool. Furthermore, the use of Easy Tier automatic mode to
automatically manage and maintain an optimal workload distribution across these resources
over time provides excellent workload spreading with the best performance at a minimum
administrative effort.
The following sections present concepts for the configuration of single-tier and multitier extent
pools to spread the workloads evenly across the available hardware resources. Also, the
benefits of Easy Tier with different extent pool configurations are outlined. Unless otherwise
noted, assume that enabling Easy Tier automatic mode refers to enabling the automatic
management capabilities of Easy Tier and Easy Tier monitoring.
Single-tier extent pools consist of one or more ranks that can be referred to as single-rank or
multi-rank extent pools.
With single-rank extent pools, you choose a configuration design that limits the capabilities of
a created volume to the capabilities of a single rank for capacity and performance. A single
volume cannot exceed the capacity or the I/O performance provided by a single rank. So, for
demanding workloads, you must create multiple volumes from enough ranks from different
extent pools and use host-level-based striping techniques, such as volume manager striping,
to spread the workload evenly across the ranks that are dedicated to a specific workload. You are also
likely to waste storage capacity if unused extents remain on ranks in different extent pools
because a single volume can be created only from extents within a single extent pool, not
across extent pools.
Furthermore, you benefit less from the advanced DS8000 virtualization features, such as
dynamic volume expansion (DVE), storage pool striping, Easy Tier automatic performance
management, and workload spreading, which use the capabilities of multiple ranks within a
single extent pool.
Single-rank extent pools are selected for environments where isolation or management of
volumes on the rank level is needed, such as in some z/OS environments. Single-rank extent
pools are selected for configurations by using storage appliances, such as the SAN Volume
Controller, where the selected RAID arrays are provided to the appliance as simple back-end
storage capacity and where the advanced virtualization features on the DS8000 storage
system are not required or not wanted, to avoid multiple layers of data striping. However, the
popular approach is to use homogeneous multi-rank extent pools and storage pool striping,
which minimizes the storage administrative effort by shifting performance management from
the rank level to the extent pool level and by letting the DS8000 storage system maintain a
balanced data distribution across the ranks within a specific pool. This approach provides
excellent performance in relation to the reduced management effort.
Also, you do not need to strictly use only single-rank extent pools or only multi-rank extent
pools on a storage system. You can base your decision on individual considerations for each
workload group that is assigned to a set of ranks and thus extent pools. The decision to use
single-rank and multi-rank extent pools depends on the logical configuration concept that is
chosen for the distribution of the identified workloads or workload groups for isolation and
resource-sharing.
In general, single-rank extent pools might not be good in the current complex and mixed
environments unless you know that this level of isolation and micro-performance
management is required for your specific environment. If not managed correctly, workload
skew and rank hot spots that limit overall system performance are likely to occur.
With a homogeneous multi-rank extent pool, you take advantage of the advanced DS8900F
virtualization features to spread the workload evenly across the ranks in an extent pool to
achieve a well-balanced data distribution with considerably less management effort.
Performance management is shifted from the rank level to the extent pool level. An extent
pool represents a set of merged ranks (a larger set of disk spindles) with a uniform workload
distribution. So, the level of complexity for standard performance and configuration management is considerably reduced.
The DS8900F capacity allocation methods take care of spreading the volumes and thus the
individual workloads evenly across the ranks within homogeneous multi-rank extent pools.
Rotate extents is the default and preferred EAM to distribute the extents of each volume
successively across all ranks in a pool to achieve a well-balanced capacity-based distribution
of the workload. Rotate volumes is rarely used today, but it can help to implement a strict
volume-to-rank relationship. It reduces the configuration effort compared to single-rank extent
pools by easily distributing a set of volumes to different ranks in a specific extent pool for
workloads where the use of host-based striping methods is still preferred.
The size of the volumes must fit the available capacity on each rank. The number of volumes
that are created for this workload in a specific extent pool must match the number of ranks (or
be at least a multiple of this number). Otherwise, the result is an imbalanced volume and
workload distribution across the ranks, and rank bottlenecks might emerge. However, efficient
host-based striping must be ensured in this case to spread the workload evenly across all
ranks, possibly from two or more extent pools. For more information about the EAMs and
how the volume data is spread across the ranks in an extent pool, see “Extent Allocation
Method” on page 41.
Even multi-rank extent pools that are not managed by Easy Tier provide some level of control
over the volume placement across the ranks in cases where it is necessary to manually
enforce a special volume allocation scheme: You can use the DSCLI command chrank
-reserve to reserve all of the extents from a rank in an extent pool from being used for the
next creation of volumes. Alternatively, you can use the DSCLI command chrank -release to
release a rank and make the extents available again.
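The following sketch shows this technique with hypothetical rank, pool, capacity, and volume IDs. Rank R12 is reserved so that the new volumes are allocated only on the remaining ranks of pool P6, and it is released again afterward:
dscli> chrank -reserve R12
dscli> mkfbvol -extpool P6 -cap 200 -name app1_#h 1600-1603
dscli> chrank -release R12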
Multi-rank extent pools that use storage pool striping are the general configuration approach
today on modern DS8900F storage systems to spread the data evenly across the ranks in a
homogeneous multi-rank extent pool and thus reduce skew and the likelihood of single-rank
hot spots. Without Easy Tier automatic mode management, such non-managed,
homogeneous multi-rank extent pools consist only of ranks of the same drive type and RAID
level. Although not required (and probably not realizable for smaller or heterogeneous
configurations), you can take the effective rank capacity into account and group ranks with and
without spares into different extent pools when using storage pool striping. This grouping
ensures a strictly balanced workload distribution across all ranks up to the last extent. Otherwise,
give additional consideration to the volumes that are created from the last extents in a mixed
homogeneous extent pool that contains ranks with and without spares, because these
volumes are probably allocated only on the subset of ranks with the larger capacity and without
spares.
In combination with Easy Tier, a more efficient and automated way of spreading the
workloads evenly across all ranks in homogeneous multi-rank extent pool is available. The
automated intra-tier performance management (auto-rebalance) of Easy Tier efficiently
spreads the workload evenly across all ranks. It automatically relocates the data across the
ranks of the same storage class in an extent pool based on rank utilization to achieve and
maintain a balanced distribution of the workload, minimizing skew and avoiding rank hot
spots. You can enable auto-rebalance for homogeneous extent pools by setting the Easy Tier
management scope to all extent pools (ETautomode=all).
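As a minimal sketch, with a placeholder storage image ID, the following command sets this management scope:
dscli> chsi -etautomode all IBM.2107-75ABC01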
In addition, Easy Tier automatic mode can also handle storage device variations within a tier
by using a micro-tiering capability.
With multi-rank extent pools, you can fully use the features of the DS8900F virtualization
architecture and Easy Tier that provide ease of use when you manage more applications
effectively and efficiently with a single DS8900F storage system. Consider multi-rank extent
pools and the use of Easy Tier automatic management especially for mixed workloads that
will be spread across multiple ranks. Multi-rank extent pools help simplify management and
volume creation. They also allow the creation of single volumes that can span multiple ranks
and thus exceed the capacity and performance limits of a single rank.
Easy Tier manual mode features, such as dynamic extent pool merge, dynamic volume
relocation (volume migration), and rank depopulation, also help you to easily manage complex
configurations with different extent pools. You can migrate volumes from one highly used
extent pool to another less used one, or from an extent pool with a lower storage class to
another one associated with a higher storage class, and merge smaller extent pools to larger
ones. You can also redistribute the data of a volume within a pool by using the manual volume
rebalance feature, for example, after capacity is added to a pool or two pools are merged, to
manually optimize the data distribution and workload spreading within a pool. However,
manual extent pool optimization and performance management, such as manual volume
rebalance, is not required (and not supported) if the pools are managed by Easy Tier
automatic mode. Easy Tier automatically places the data in these pools even if the pools are
merged or capacity is added to a pool.
For more information about data placement in extent pool configurations, see 3.2.3, “Extent
pool considerations” on page 51.
Important: Multi-rank extent pools offer numerous advantages with respect to ease of use,
space efficiency, and the DS8900F virtualization features. Multi-rank extent pools, in
combination with Easy Tier automatic mode, provide both ease of use and excellent
performance for standard environments with workload groups that share a set of
homogeneous resources.
A multitier extent pool can consist of one of the following storage class combinations with up
to three storage tiers, for instance:
HPFE High-Performance flash cards (Tier 0) + HPFE High-Capacity flash cards (Tier 2)
HPFE High-Performance flash cards (Tier 0) + HPFE High-Capacity flash of 3.84 TB (Tier 1) +
HPFE High-Capacity flash cards (Tier 2)
Multitier extent pools are especially suited for mixed, resource-sharing workloads. Tiered
storage, as described in 4.1, “DS8900F Models” on page 60, is an approach of using different types of
storage throughout the storage infrastructure: a mix of higher-performing/higher-cost storage
with lower-performing/lower-cost storage.
Always create hybrid extent pools for Easy Tier automatic mode management. The extent
allocation for volumes in hybrid extent pools differs from the extent allocation in homogeneous
pools. Any specified EAM, such as rotate extents or rotate volumes, is ignored when a new
volume is created in, or migrated into, a hybrid pool. The EAM is changed to managed when
the Easy Tier automatic mode is enabled for the pool, and the volume is under the control of
Easy Tier. Easy Tier then automatically moves extents to the most appropriate storage tier
and rank in the pool based on performance aspects.
Easy Tier automatically spreads workload across the resources (ranks and DAs) in a
managed hybrid pool. Easy Tier automatic mode adapts to changing workload conditions and
automatically promotes hot extents from the lower tier to the next upper tier. It demotes colder
extents from the higher tier to the next lower tier. The auto-rebalance feature evenly
distributes extents across the ranks of the same tier based on rank utilization to minimize skew
and avoid hot spots. Auto-rebalance takes different device characteristics into account when
different devices or RAID levels are mixed within the same storage tier (micro-tiering).
Regarding the requirements of your workloads, you can create one or multiple pairs of extent
pools with different two-tier or three-tier combinations that depend on your needs and
available hardware resources. You can create three-tier extent pools for mixed, large
resource-sharing workload groups and benefit from fully automated storage performance and
economics management at a minimum management effort. You can boost the performance of
your high-demand workloads with High-Performance flash and reduce the footprint and costs
with High-Capacity flash for the lower-demand data.
You can use the DSCLI showfbvol/showckdvol -rank or -tier commands to display the
current extent distribution of a volume across the ranks and tiers, as shown in Example 3-5 on
page 56. Additionally, the volume heat distribution (volume heat map), provided by the Easy
Tier Reporting in the DS8900F GUI, can help identify the amount of hot, warm, and cold
extents for each volume and its distribution across the storage tiers in the pool. For more
information about Easy Tier Reporting in the DS8900F GUI, see IBM DS8000 Easy Tier,
REDP-4667.
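For example, the following commands (volume ID 1000 is hypothetical) display the extent distribution of an FB volume across ranks and across tiers:
dscli> showfbvol -rank 1000
dscli> showfbvol -tier 1000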
The ratio of High-Performance flash and High-Capacity flash drive capacity in a hybrid pool
depends on the workload characteristics and skew and must be planned when ordering the
drive hardware for the identified workloads.
With the Easy Tier manual mode features, such as dynamic extent pool merge, dynamic
volume relocation, and rank depopulation, you can modify existing configurations easily,
depending on your needs. You can grow from a manually managed single-tier configuration
into a partially or fully automatically managed tiered storage configuration. You add tiers or
merge appropriate extent pools and enable Easy Tier at any time. For more information about
Easy Tier, see IBM DS8000 Easy Tier, REDP-4667.
Important: Multitier extent pools and Easy Tier help you to implement a tiered storage
architecture on a single DS8900F storage system with all its benefits at a minimum
management effort and ease of use. Easy Tier and its automatic data placement within
and across tiers spread the workload efficiently across the available resources in an extent
pool. Easy Tier constantly optimizes storage performance and storage economics and
adapts to changing workload conditions. Easy Tier can reduce overall performance
management efforts and help consolidate more workloads efficiently and effectively on a
single DS8000 storage system. It optimizes performance and reduces energy costs and
the footprint.
When thin-provisioned volumes are used, only the used capacity is allocated in the pool, and Easy Tier does not move unused
extents around or move hot extents on a large scale up from the lower tiers to the upper tiers.
However, thin-provisioned volumes are not fully supported by all DS8000 Copy Services or
advanced functions and platforms yet, so it might not be a valid approach for all environments
at this time. For more information about the initial volume allocation in hybrid extent pools, see
“Extent allocation in hybrid and managed extent pools” on page 44.
You can change the allocation policy according to your needs with the chsi command and its
-ettierorder option, which accepts the highutil or highperf value.
The default allocation order for the all-flash pools of the DS8900F is High Performance.
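As a minimal sketch, with a placeholder storage image ID, the following commands switch the allocation order to favor the high-utilization (capacity) tiers and back to high performance:
dscli> chsi -ettierorder highutil IBM.2107-75ABC01
dscli> chsi -ettierorder highperf IBM.2107-75ABC01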
In a migration scenario, if you are migrating all the servers at once, some server volumes, for
reasons of space, might be placed completely on Flash tier 0 first, although these servers
also might have higher performance requirements. For more information about the initial
volume allocation in hybrid extent pools, see “Extent allocation in hybrid and managed extent
pools” on page 44.
When bringing a new DS8900F storage system into production to replace an older one, with
the older storage system often not using Easy Tier, consider the timeline of the
implementation stages by which you migrate all servers from the older to the new storage
system.
One good option is to consider a staged approach when migrating servers to a new multitier
DS8900F storage system:
Assign the resources for the high-performing and response time sensitive workloads first,
then add the less performing workloads.
Split your servers into several subgroups, where you migrate each subgroup one by one,
and not all at once. Then, allow Easy Tier several days to learn and optimize. Some
extents are moved to High-Performance flash and some extents are moved to
High-Capacity flash. After a server subgroup learns and reaches a steady state, the next
server subgroup can be migrated. You gradually allocate the capacity in the hybrid extent
pool by optimizing the extent distribution of each application one by one.
Another option that can help in some cases of new deployments is to reset the Easy Tier
learning heatmap for a certain subset of volumes, or for some pools. This action cuts off all
the previous days of Easy Tier learning, and the next upcoming internal auto-migration
plan is based on brand new workload patterns only.
With the default rotate extents (rotateexts in DSCLI, Rotate capacity in the GUI) algorithm,
the extents (1 GiB for FB volumes and 1113 cylinders or approximately 0.86 GiB for CKD
volumes) of each single volume are spread across all ranks within an extent pool and thus
across more drives. This approach reduces the occurrences of I/O hot spots at the rank level
within the storage system. Storage pool striping helps to balance the overall workload evenly
across the back-end resources. It reduces the risk of single ranks that become performance
bottlenecks while providing ease of use with less administrative effort.
The rotate extents and rotate volumes EAMs determine the initial data distribution of a
volume and thus the spreading of workloads in non-managed, single-tier extent pools. With
Easy Tier automatic mode enabled for single-tier (homogeneous) or multitier (hybrid) extent
pools, this selection becomes unimportant. The data placement and thus the workload
spreading is managed by Easy Tier. The use of Easy Tier automatic mode for single-tier
extent pools is highly encouraged for an optimal spreading of the workloads across the
resources. In single-tier extent pools, you can benefit from the Easy Tier automatic mode
feature auto-rebalance. Auto-rebalance constantly and automatically balances the workload
across ranks of the same storage tier based on rank utilization, minimizing skew and avoiding
the occurrence of single-rank hot spots.
Certain, if not most, application environments might benefit from the use of storage pool
striping (rotate extents):
Operating systems that do not directly support host-level striping.
VMware datastores.
Microsoft Exchange.
Windows clustering environments.
Older Solaris environments.
Environments that need to suballocate storage from a large pool.
Applications with multiple volumes and volume access patterns that differ from day to day.
Resource sharing workload groups that are dedicated to many ranks with host operating
systems that do not all use or support host-level striping techniques or application-level
striping techniques.
However, there might also be valid reasons for not using storage-pool striping, for example, to
avoid unnecessary layers of striping and reorganization of I/O requests, which might increase
latency without achieving a more evenly balanced workload distribution. Multiple
independent striping layers might be counterproductive. For example, creating a number of
volumes from a single multi-rank extent pool that uses storage pool striping and then
additionally using host-level striping or application-based striping on the same set of volumes
might compromise performance. In this case, two layers of striping are combined with no
overall performance benefit. In contrast, creating four volumes from four different extent pools
from both rank groups that use storage pool striping, and then using host-based striping or
application-based striping on these four volumes to aggregate the performance of the ranks in
all four extent pools and both processor complexes, is reasonable.
DS8000 storage-pool striping is based on spreading extents across different ranks. So, with
extents of 1 GiB (FB) or 0.86 GiB (1113 cylinders/CKD), the size of a data chunk is rather
large. For distributing random I/O requests, which are evenly spread across the capacity of
each volume, this chunk size is appropriate. However, depending on the individual access
pattern of a specific application and the distribution of the I/O activity across the volume
capacity, certain applications perform better with more granular stripe sizes. In these cases,
optimize the distribution of the application I/O requests across different RAID arrays by using
host-level striping techniques with smaller stripe sizes, or have the application manage the
workload distribution across independent volumes from different ranks.
Consider the following points for selected applications or environments to use storage-pool
striping in homogeneous configurations:
Db2: Excellent opportunity to simplify storage management by using storage-pool striping.
You might prefer to use Db2 traditional recommendations for Db2 striping for
performance-sensitive environments.
Db2 and similar data warehouse applications, where the database manages storage and
parallel access to data. Consider independent volumes on individual ranks with a careful
volume layout strategy that does not use storage-pool striping. Containers or database
partitions are configured according to suggestions from the database vendor.
Oracle: Excellent opportunity to simplify storage management for Oracle. You might prefer
to use Oracle traditional suggestions that involve ASM and Oracle striping capabilities for
performance-sensitive environments.
Small, highly active logs or files: Small highly active files or storage areas smaller than
1 GiB with a high access density might require spreading across multiple ranks for
performance reasons. However, storage-pool striping offers a striping granularity on extent
levels only around 1 GiB, which is too large in this case. Continue to use host-level striping
techniques or application-level striping techniques that support smaller stripe sizes.
In general, storage-pool striping helps improve overall performance and reduces the effort of
performance management by evenly distributing data and workloads across a larger set of
ranks, which reduces skew and hot spots. Certain application workloads can also benefit from
the higher number of disk spindles behind one volume. But, there are cases where host-level
striping or application-level striping might achieve a higher performance, at the cost of higher
overall administrative effort. Storage-pool striping might deliver good performance in these
cases with less management effort, but manual striping with careful configuration planning
can achieve the highest levels of performance. So, for overall performance and
ease of use, storage-pool striping might offer an excellent compromise for many
environments, especially for larger workload groups where host-level striping techniques or
application-level striping techniques are not widely used or available.
You must distribute the I/O workloads evenly across the available front-end resources:
I/O ports
HA cards
I/O enclosures
You must distribute the I/O workloads evenly across both DS8900F processor complexes
(called storage server 0/CEC#0 and storage server 1/CEC#1) as well.
Configuring the extent pools determines the balance of the workloads across the available
back-end resources, ranks, DA pairs, and both processor complexes.
Tip: If you use the GUI for an initial configuration, most of this balancing is done
automatically for you.
Each extent pool is associated with an extent pool ID (P0, P1, and P2, for example). Each
rank has a relationship to a specific DA pair and can be assigned to only one extent pool. You
can have as many (non-empty) extent pools as you have ranks. Extent pools can be
expanded by adding more ranks to the pool. However, when assigning a rank to a specific
extent pool, the affinity of this rank to a specific DS8900F processor complex is determined.
There is no predefined hardware affinity of ranks to a processor complex. All ranks
that are assigned to even-numbered extent pools (P0, P2, and P4, for example) form rank
group 0 and are serviced by DS8900F processor complex 0. All ranks that are assigned to
odd-numbered extent pools (P1, P3, and P5, for example) form rank group 1 and are serviced
by DS8900F processor complex 1.
To spread the overall workload across both DS8900F processor complexes, a minimum of
two extent pools is required: one assigned to processor complex 0 (for example, P0) and one
assigned to processor complex 1 (for example, P1).
For a balanced distribution of the overall workload across both processor complexes and both
DA cards of each DA pair, apply the following rules for each type of rank, that is, for each
combination of RAID level, storage type (FB or CKD), and drive characteristics (flash type,
HCF/HPF, and capacity):
Assign half of the ranks to even-numbered extent pools (rank group 0) and assign half of
them to odd-numbered extent pools (rank group 1).
Spread ranks with and without spares evenly across both rank groups.
Distribute ranks from each DA pair evenly across both rank groups.
Use the GUI for creating a new configuration; even though it offers fewer controls, the GUI
takes care of the rank and DA distribution and of this balancing when creating pools.
Multiple homogeneous extent pools, each with different storage classes, easily allow tiered
storage concepts with dedicated extent pools and manual cross-tier management. For
example, you can have extent pools with slow, large-capacity drives for backup purposes and
other extent pools with high-speed, small capacity drives or flash for performance-critical
transaction applications. Also, you can use hybrid pools with Easy Tier and introduce fully
automated cross-tier storage performance and economics management.
Using dedicated extent pools with an appropriate number of ranks and DA pairs for selected
workloads is a suitable approach for isolating workloads.
The minimum number of required extent pools depends on the following considerations:
The number of isolated and resource-sharing workload groups
The number of different storage types, either FB for Open Systems or IBM i, or CKD for
z Systems
Definition of failure boundaries (for example, separating logs and table spaces to different
extent pools)
In some cases, Copy Services considerations
Although you are not restricted from assigning all ranks to only one extent pool, the minimum
number of extent pools, even with only one workload on a homogeneously configured
DS8000 storage system, must be two (for example, P0 and P1). You need one extent pool for
each rank group (or storage server) so that the overall workload is balanced across both
processor complexes.
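As a minimal DSCLI sketch (the pool names and rank IDs are only examples), the following commands create one FB extent pool per rank group and assign one rank to each pool; the system assigns the pool IDs P0 and P1 in creation order:
dscli> mkextpool -rankgrp 0 -stgtype fb fb_pool_0
dscli> mkextpool -rankgrp 1 -stgtype fb fb_pool_1
dscli> chrank -extpool P0 R0
dscli> chrank -extpool P1 R1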
To optimize performance, the ranks for each workload group (either isolated or
resource-sharing workload groups) must be split across at least two extent pools with an
equal number of ranks from each rank group. So, at the workload level, each workload is
balanced across both processor complexes. Typically, you assign an equal number of ranks
from each DA pair to extent pools assigned to processor complex 0 (rank group 0: P0, P2,
and P4, for example) and to extent pools assigned to processor complex 1 (rank group 1: P1,
P3, and P5, for example). In environments with FB and CKD storage (Open Systems and z
Systems), you additionally need separate extent pools for CKD and FB volumes. It is often
useful to have a minimum of four extent pools to balance the capacity and I/O workload
between the two DS8900F processor complexes. Additional extent pools might be needed to
meet individual needs, such as ease of use, implementing tiered storage concepts, or
addressing Copy Services considerations.
However, the maximum number of extent pools is given by the number of available ranks (that
is, creating one extent pool for each rank).
In most cases, accepting the configurations that the GUI offers during the initial setup and
formatting already gives excellent results. For specific situations, creating dedicated extent
pools with dedicated back-end resources for separate workloads allows individual
performance management for business-critical and performance-critical applications.
Compared to “share and spread everything” storage systems that offer no possibility to
implement workload isolation concepts, this capability is an outstanding feature of the
DS8900F as an enterprise-class storage system. With it, you can consolidate and manage
various application demands with different performance profiles, which are typical in
enterprise environments, on a single storage system.
Before configuring the extent pools, collect all the hardware-related information of each rank
for the associated DA pair, disk type, available storage capacity, RAID level, and storage type
(CKD or FB) in a spreadsheet. Then, plan the distribution of the workloads across the ranks
and their assignments to extent pools.
Plan an initial assignment of ranks to your planned workload groups, either isolated or
resource-sharing, and extent pools for your capacity requirements. After this initial
assignment of ranks to extent pools and appropriate workload groups, you can create
additional spreadsheets to hold more details about the logical configuration and finally the
volume layout of the array site IDs, array IDs, rank IDs, DA pair association, extent pools IDs,
and volume IDs, and their assignments to volume groups and host connections.
4.9 Planning address groups, LSSs, volume IDs, and CKD PAVs
After creating the extent pools and evenly distributing the back-end resources (DA pairs and
ranks) across both DS8900F processor complexes, you can create host volumes from these
pools. When creating the host volumes, it is important to follow a volume layout scheme that
evenly spreads the volumes of each application workload across all ranks and extent pools
that are dedicated to this workload to achieve a balanced I/O workload distribution across
ranks, DA pairs, and the DS8900F processor complexes.
So, the next step is to plan the volume layout and thus the mapping of address groups and
LSSs to volumes created from the various extent pools for the identified workloads and
workload groups. For performance management and analysis reasons, it can be useful to
easily relate the volumes of a specific I/O workload to the ranks that provide the physical
drives servicing the workload I/O requests and that determine the I/O processing capabilities.
Therefore, an overall logical configuration concept that easily relates volumes to workloads,
extent pools, and ranks is desirable.
Each volume is associated with a hexadecimal four-digit volume ID that must be specified
when creating the volume. An example for volume ID 1101 is shown in Table 4-4.
The first digit of the hexadecimal volume ID specifies the address group, 0 - F, of that volume.
Each address group can be used only by a single storage type, either FB or CKD. The first
and second digit together specify the logical subsystem ID (LSS ID) for Open Systems
volumes (FB) or the logical control unit ID (LCU ID) for z Systems volumes (CKD). There are
16 LSS/LCU IDs per address group. The third and fourth digits specify the volume number
within the LSS/LCU, 00 - FF. There are 256 volumes per LSS/LCU. The volume with volume
ID 1101 is the volume with volume number 01 of LSS 11, and it belongs to address group 1
(first digit).
The LSS/LCU ID is related to a rank group. Even LSS/LCU IDs are restricted to volumes that
are created from rank group 0 and serviced by processor complex 0. Odd LSS/LCU IDs are
restricted to volumes that are created from rank group 1 and serviced by processor complex
1. So, the volume ID also reflects the affinity of that volume to a DS8000 processor complex.
All volumes, which are created from even-numbered extent pools (P0, P2, and P4, for
example) have even LSS IDs and are managed by DS8000 processor complex 0. All volumes
that are created from odd-numbered extent pools (P1, P3, and P5, for example) have odd
LSS IDs and are managed by DS8000 processor complex 1.
In the past, for performance analysis reasons, it was useful to easily identify the association of
specific volumes to ranks or extent pools when investigating resource contention. But, since
the introduction of storage-pool striping, the use of multi-rank extent pools is the preferred
configuration approach for most environments. Multitier extent pools are managed by Easy
Tier automatic mode anyway, constantly providing automatic storage intra-tier and cross-tier
performance and storage economics optimization. For single-tier pools, turn on Easy Tier
management. In managed pools, Easy Tier automatically relocates the data to the
appropriate ranks and storage tiers based on the access pattern, so the extent allocation
across the ranks for a specific volume is likely to change over time. With storage-pool striping
or extent pools that are managed by Easy Tier, you no longer have a fixed relationship
between the performance of a specific volume and a single rank. Therefore, planning for a
hardware-based LSS/LCU scheme and relating LSS/LCU IDs to hardware resources, such as
ranks, is no longer reasonable. Performance management focus is shifted from ranks to
extent pools. However, a numbering scheme that relates only to the extent pool might still be
viable, but it is less common and less practical.
The common approach that is still valid today with Easy Tier and storage pool striping is to
relate an LSS/LCU to a specific application workload with a meaningful numbering scheme
for the volume IDs for the distribution across the extent pools. Each LSS can have 256
volumes, with volume numbers 00 - FF. So, relating the LSS/LCU to a certain application
workload, and additionally reserving a specific range of volume numbers for different extent
pools, is a straightforward way to organize the volume layout.
This approach provides a logical configuration concept that offers ease of use for storage
management operations and reduces management effort when using the DS8900F related
Copy Services because basic Copy Services management steps (such as establishing
Peer-to-Peer Remote Copy (PPRC) paths and consistency groups) are related to LSSs. Even
if Copy Services are not planned now, consider them in the volume layout because overall
management is easier if you must introduce Copy Services in the future (for example, when
migrating to a new DS8000 storage system that uses Copy Services).
However, the strategy for the assignment of LSS/LCU IDs to resources and workloads can
still vary depending on the particular requirements in an environment.
The following section introduces suggestions for LSS/LCU and volume ID numbering
schemes to help to relate volume IDs to application workloads and extent pools.
Typically, when using LSS/LCU IDs that relate to application workloads, the simplest
approach is to reserve a suitable number of LSS/LCU IDs according to the total number of
volumes requested by the application. Then, populate the LSS/LCUs in sequence, creating
the volumes from offset 00. Ideally, all volumes that belong to a certain application workload
or a group of related host systems are within the same LSS. However, because the volumes
must be spread evenly across both DS8000 processor complexes, at least two logical
subsystems are typically required per application workload. One even LSS is for the volumes
that are managed by processor complex 0, and one odd LSS is for volumes managed by
processor complex 1 (for example, LSS 10 and LSS 11). Moreover, consider the future
capacity demand of the application when planning the number of LSSs to be reserved for an
application. So, for those applications that are likely to increase the number of volumes
beyond the range of one LSS pair (256 volumes per LSS), reserve a suitable number of LSS
pair IDs for them from the beginning.
Figure 4-5 shows an example of an application-based LSS numbering scheme. This example
shows three applications, application A, B, and C, that share two large extent pools. Hosts A1
and A2 both belong to application A and are assigned to LSS 10 and LSS 11, each using a
different volume ID range from the same LSS range. LSS 12 and LSS 13 are assigned to
application B, which runs on host B. Application C is likely to require more than 512 volumes,
so use LSS pairs 28/29 and 2a/2b for this application.
Figure 4-5 Application-related volume layout example for two shared extent pools
In Figure 4-6, the workloads are spread across four extent pools. Again, assign two LSS/LCU
IDs (one even, one odd) to each workload to spread the I/O activity evenly across both
processor complexes (both rank groups). Additionally, reserve a certain volume ID range for
each extent pool based on the third digit of the volume ID. With this approach, you can quickly
create volumes with successive volume IDs for a specific workload per extent pool with a
single DSCLI mkfbvol or mkckdvol command.
Hosts A1 and A2 belong to the same application A and are assigned to LSS 10 and LSS 11.
For this workload, use volume IDs 1000 - 100f in extent pool P0 and 1010 - 101f in extent pool
P2 on processor complex 0. Use volume IDs 1100 - 110f in extent pool P1 and 1110 - 111f in
extent pool P3 on processor complex 1. In this case, the administrator of the host system can
easily relate volumes to different extent pools and thus different physical resources on the
same processor complex by looking at the third digit of the volume ID. This numbering
scheme can be helpful when separating, for example, DB table spaces from DB logs on to
volumes from physically different pools.
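With such a scheme, the volumes for application A can be created with four DSCLI commands, one per extent pool; the capacity of 100 GiB is only an example:
dscli> mkfbvol -extpool P0 -cap 100 -name appA_#h 1000-100F
dscli> mkfbvol -extpool P2 -cap 100 -name appA_#h 1010-101F
dscli> mkfbvol -extpool P1 -cap 100 -name appA_#h 1100-110F
dscli> mkfbvol -extpool P3 -cap 100 -name appA_#h 1110-111F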
Figure 4-6 Application and extent pool-related volume layout example for four shared extent pools
The example that is depicted in Figure 4-7 on page 97 provides a numbering scheme that can
be used in a FlashCopy scenario. Two different pairs of LSS are used for source and target
volumes. The address group identifies the role in the FlashCopy relationship: address group 1
is assigned to source volumes, and address group 2 is used for target volumes. This
numbering scheme allows a symmetrical distribution of the FlashCopy relationships across
source and target LSSs. For example, source volume 1007 in P0 uses the volume 2007 in P2
as the FlashCopy target. In this example, the third digit of the volume ID within an LSS is used as
a marker to indicate that source volumes 1007 and 1017 are from different extent pools. The
same approach applies to the target volumes, for example, volumes 2007 and 2017 are from
different pools.
However, for the simplicity of the Copy Services management, you can choose a different
extent pool numbering scheme for source and target volumes (so 1007 and 2007 are not from
the same pool) to implement the recommended extent pool selection of source and target
volumes in accordance with the FlashCopy guidelines. Source and target volumes must stay
on the same rank group but different ranks or extent pools. For more information about this
topic, see the FlashCopy performance chapters in IBM DS8000 Copy Services, SG24-8367.
Figure 4-7 Application and extent pool-related volume layout example in a FlashCopy scenario
Tip: Use the GUI advanced Custom mode to select a specific LSS range when creating
volumes. Choose this Custom mode and also the appropriate Volume definition mode, as
shown in Figure 4-8.
Figure 4-8 Create volumes in Custom mode and specify an LSS range
For high availability, each host system must use a multipathing device driver, such as the
native MPIO of the respective operating system. Each host system must have a minimum of
two host connections to HA cards in different I/O enclosures on the DS8900F storage system.
Preferably, they are evenly distributed between left side (even-numbered) I/O enclosures and
right side (odd-numbered) I/O enclosures. The number of host connections per host system is
primarily determined by the required bandwidth. Use an appropriate number of HA cards to
satisfy high throughput demands.
With typical transaction-driven workloads that show high numbers of random, small-block
I/O operations, all ports in an HA card can be used alike. For the best performance with
workloads of different I/O characteristics, consider isolating large-block sequential
and small-block random workloads at the I/O port level or the HA card level.
The preferred practice is to use dedicated I/O ports for Copy Services paths and host
connections. For more information about performance aspects that are related to Copy
Services, see the performance-related chapters in IBM DS8000 Copy Services, SG24-8367.
To assign FB volumes to the attached Open Systems hosts by using LUN masking with the
DSCLI, these volumes must be grouped into DS8900F volume groups. A volume
group can be assigned to multiple host connections, and each host connection is specified by
the worldwide port name (WWPN) of the host FC port. A set of host connections from the
same host system is called a host attachment. The same volume group can be assigned to
multiple host connections; however, a host connection can be associated only with one
volume group. To share volumes between multiple host systems, the most convenient way is
to create a separate volume group for each host system and assign the shared volumes to
each of the individual volume groups as required. A single volume can be assigned to multiple
volume groups. Only if a group of host systems shares a set of volumes, and there is no need
to assign additional non-shared volumes independently to particular hosts of this group, can
you consider using a single shared volume group for all host systems to simplify
management. Typically, there are no significant DS8000 performance implications because of
the number of DS8000 volume groups or the assignment of host attachments and volumes to
the DS8000 volume groups.
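A minimal DSCLI sketch of this assignment follows; the WWPNs, volume IDs, and names are hypothetical, and the volume group ID (V0 in this example) is assigned by the system and can be verified with the lsvolgrp command:
dscli> mkvolgrp -type scsimask -volume 1000-100F hostA_vg
dscli> mkhostconnect -wwname 10000000C9AAAA01 -hosttype pSeries -volgrp V0 hostA_fcs0
dscli> mkhostconnect -wwname 10000000C9AAAA02 -hosttype pSeries -volgrp V0 hostA_fcs1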
Do not omit additional host attachment and host system considerations, such as SAN zoning,
multipathing software, and host-level striping.
After the DS8900F storage system is installed, you can use the DSCLI lsioport command to
display and document I/O port information, including the I/O ports, HA type, I/O enclosure
location, and WWPN. Use this information to add specific I/O port IDs, the required protocol
(FICON or FCP), and the DS8000 I/O port WWPNs to the plan of host and remote mirroring
connections that are identified in 4.4, “Planning allocation of disk and host connection
capacity” on page 73.
Additionally, the I/O port IDs might be required as input to the DS8900F host definitions if host
connections must be restricted to specific DS8900F I/O ports by using the -ioport option of
the mkhostconnect DSCLI command. If host connections are configured to allow access to all
DS8000 I/O ports, which is the default, typically the paths must be restricted by SAN zoning.
The I/O port WWPNs are required as input for SAN zoning. The lshostconnect -login command lists the host port WWPNs that are logged in to the DS8900F I/O ports, which can help you verify the SAN zoning and path configuration.
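Continuing the earlier hypothetical host connection sketch, the following commands restrict a host connection to two specific I/O ports in different I/O enclosures and then list the WWPNs that are logged in (the port IDs and WWPN are placeholders):
dscli> mkhostconnect -wwname 10000000C9AAAA03 -hosttype pSeries -volgrp V0 -ioport I0030,I0131 hostB_fcs0
dscli> lshostconnect -login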
The DS8900F I/O ports use predetermined, fixed DS8900F logical port IDs in the form I0xyz,
where:
x: I/O enclosure
y: Slot number within the I/O enclosure
z: Port within the adapter
Slot numbers: The slot numbers for logical I/O port IDs are one less than the physical
location numbers for HA cards, as shown on the physical labels and in IBM Spectrum
Control/Storage Insights, for example, I0101 is R1-XI2-C1-T2.
A simplified example of spreading the DS8000 I/O ports evenly across two redundant SAN fabrics
is shown in Figure 4-9. The SAN implementations can vary, depending on individual
requirements, workload considerations for isolation and resource-sharing, and available
hardware resources.
Figure 4-9 Example of spreading DS8000 I/O ports evenly across two redundant SAN fabrics
Each adapter type is available in both longwave (LW) and shortwave (SW) versions. The
DS8900F I/O bays support up to four host adapters for each bay, allowing up to 128 ports
maximum for each storage system. This configuration results in a theoretical aggregated host
I/O bandwidth of around 128 x 32 Gbps (about 4 Tbps). Each port provides industry-leading throughput and I/O
rates for FICON and FCP.
The host adapters that are available in the DS8900F have the following characteristics:
32 Gbps FC HBAs (32 GFC):
– Four FC ports
– FC Gen7 technology
– New IBM Custom ASIC with Gen3 PCIe interface
– Quad-core PowerPC processor
– Negotiation to 32, 16, or 8 Gbps (4 Gbps or less is not possible)
16 Gbps FC HBAs (16 GFC):
– Four FC ports
– Gen2 PCIe interface
– Quad-core PowerPC processor
– Negotiation to 16, 8, or 4 Gbps (2 Gbps or less is not possible)
The DS8900F supports a mixture of 32 Gbps and 16 Gbps FC adapters. Hosts with slower
FC speeds like 4 Gbps are still supported if their HBAs are connected through a switch.
The 32 Gbps FC adapter is encryption-capable. Encrypting the host bus adapter (HBA) traffic
usually does not cause any measurable performance degradation.
With FC adapters that are configured for FICON, the DS8900F series provides the following
configuration capabilities:
Fabric or point-to-point topologies
A maximum of 128 host adapter ports, depending on the DS8900F system memory and
processor features
A maximum of 509 logins for each FC port
A maximum of 8192 logins for each storage unit
A maximum of 1280 logical paths on each FC port
Access to all 255 control-unit images (65,280 Count Key Data (CKD) devices) over each
FICON port
A maximum of 512 logical paths for each control unit image
An IBM Z server supports 32,000 devices per FICON host channel. To fully access 65,280
devices, it is necessary to connect multiple FICON host channels to the storage system. You
can access the devices through an FC switch or FICON director to a single storage system
FICON port.
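As a quick arithmetic check of these limits: 255 control-unit images x 256 devices per image = 65,280 devices, while one FICON host channel can address at most 32,000 of them. Purely from an addressing standpoint, at least three FICON channels (65,280 / 32,000 = about 2.04, rounded up) are therefore required to reach the full device range; in practice, more channels are configured for performance and availability.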
Figure 4-10 on page 101 illustrates read/write performance with the zHPF protocol.
Look at Example 4-2, which shows a DS8900F storage system with a selection of different
HAs:
Example 4-2 DS8000 HBA example - DSCLI lsioport command output (shortened)
dscli> lsioport -l
ID    WWPN             State   Type              topo     portgrp Speed   Frame I/O Enclosure HA Card
======================================================================================================
I0200 500507630A1013E7 Online Fibre Channel-SW SCSI-FCP 0 16 Gb/s 1 3 1
I0201 500507630A1053E7 Online Fibre Channel-SW FICON 0 16 Gb/s 1 3 1
I0202 500507630A1093E7 Online Fibre Channel-SW FICON 0 16 Gb/s 1 3 1
I0203 500507630A10D3E7 Online Fibre Channel-SW FICON 0 16 Gb/s 1 3 1
I0230 500507630A1313E7 Offline Fibre Channel-LW - 0 32 Gb/s 1 3 4
I0231 500507630A1353E7 Offline Fibre Channel-LW - 0 32 Gb/s 1 3 4
I0232 500507630A1393E7 Offline Fibre Channel-LW - 0 32 Gb/s 1 3 4
I0233 500507630A13D3E7 Offline Fibre Channel-LW - 0 32 Gb/s 1 3 4
I0240 500507630A1413E7 Online Fibre Channel-SW FICON 0 16 Gb/s 1 3 5
I0241 500507630A1453E7 Offline Fibre Channel-SW - 0 16 Gb/s 1 3 5
I0242 500507630A1493E7 Online Fibre Channel-SW SCSI-FCP 0 16 Gb/s 1 3 5
I0243 500507630A14D3E7 Offline Fibre Channel-SW - 0 16 Gb/s 1 3 5
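If a port must be switched between the FICON and FCP personalities to match such a layout, the DSCLI setioport command can be used, as in the following sketch; the port ID is a placeholder, and the change should be planned because it interrupts traffic on that port:
dscli> setioport -topology ficon I0230
dscli> lsioport -l I0230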
When planning the paths for the host systems, ensure that each host system uses a
multipathing device driver and a minimum of two host connections to two different HA cards in
different I/O enclosures on the DS8900F. Preferably, they are evenly distributed between left
side (even-numbered) I/O enclosures and the right side (odd-numbered) I/O enclosures for
highest availability. Multipathing additionally optimizes workload spreading across the
available I/O ports, HA cards, and I/O enclosures.
You must tune the SAN zoning scheme to balance both the oversubscription and the
estimated total throughput for each I/O port to avoid congestion and performance bottlenecks.
PPRC fine-tuning
For optimal use of your PPRC connections, observe the following guidelines:
Avoid using FCP ports for both host attachment and replication.
With bi-directional PPRC, avoid using a specific adapter port for both outbound and inbound replication I/Os; instead, consider using each port in one direction only.
When you have a larger number of replication ports, assign even and odd LSSs/LCUs to different ports. For example, with four paths, you could use two ports only for all even LSSs to be replicated and the other two ports for all odd LSSs to be replicated (see the sketch after this list).
These guidelines help the adapters to reduce internal switching, which slightly increases their throughput for PPRC.
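As a sketch of the even/odd guideline, PPRC paths for an even and an odd LSS could be established over different port pairs, as shown below. The storage image IDs, remote WWNN, LSS IDs, and port IDs are placeholders only:
dscli> mkpprcpath -dev IBM.2107-75ABC11 -remotedev IBM.2107-75XYZ21 -remotewwnn 5005076303FFD123 -srclss 10 -tgtlss 10 I0200:I0100 I0240:I0140
dscli> mkpprcpath -dev IBM.2107-75ABC11 -remotedev IBM.2107-75XYZ21 -remotewwnn 5005076303FFD123 -srclss 11 -tgtlss 11 I0202:I0102 I0242:I0142
In this sketch, all even LSSs would follow the first pattern and all odd LSSs the second one, so each adapter port carries replication traffic for either even or odd LSSs only.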
The information in this chapter is not specific to the IBM System Storage DS8900F; you can apply
it generally to other flash or disk storage systems.
In general, you describe the workload in these terms. The following sections cover the details
and describe the different workload types.
To help you with the implementation of storage systems with Microsoft Exchange Server 2019,
you can use the Microsoft Jetstress tool. This tool provides the ability to simulate and verify the
performance and stability of storage subsystems before putting them into production
environments. The Microsoft Exchange documentation can be found at:
[Link]
See also the Microsoft Exchange Solution Reviewed Program (ESRP) Storage, which is primarily
designed for the testing of third-party storage solutions. Refer to the following Microsoft
Exchange article:
[Link]
It is not always easy to divide known workload types into cache-friendly and cache-unfriendly
categories. An application can change its behavior several times during the day. When users work
with data, the workload is cache friendly; when batch processing or reporting starts, it is not. A high
percentage of random accesses usually indicates a cache-unfriendly workload. However, if the amount of
data that is accessed randomly is not large, for example 10%, it can be placed entirely into the
disk system cache and becomes cache friendly.
Sequential workloads are always cache friendly because of the prefetch algorithms in the
DS8900 storage system. A sequential workload is easy to prefetch: you know that the next
10 or 100 blocks will be accessed, and you can read them in advance. Random workloads are
different. However, there are no purely random workloads in real applications, so some access
patterns can still be predicted. The DS8900 storage systems use the following
powerful read-caching algorithms to deal with cache-unfriendly workloads:
Sequential Prefetching in Adaptive Replacement Cache (SARC)
Adaptive Multi-stream Prefetching (AMP)
Intelligent Write Caching (IWC)
The write workload is always cache friendly because every write request goes to the cache
first, and the application receives the reply as soon as the request is placed into cache. Write requests
take at least twice as long to service on the back end as read requests, and you always need to
wait for the write acknowledgment, which is why cache is used for every write request.
To learn more about the DS8900 caching algorithms, see Chapter 2.2.1, “DS8000 caching
algorithms” on page 17.
Table 5-1 on page 111 provides a summary of the characteristics of the various types of
workloads.
The database environment is often difficult to typify because I/O characteristics differ greatly.
A database query has a high read content and is of a sequential nature. It also can be
random, depending on the query type and data structure. Transaction environments are more
random in behavior and are sometimes cache unfriendly. At other times, they have good hit
ratios. You can implement several enhancements in databases, such as sequential prefetch
and the exploitation of I/O priority queuing, that affect the I/O characteristics. Users must
understand the unique characteristics of their database capabilities before generalizing the
performance.
The workload pattern for logging is mostly sequential writes with a block size of about 64 KB.
Reads are rare and can usually be disregarded. The write capability and the location of the online
transaction logs are most important because the entire performance of the database depends on the
writes to the online transaction logs. If the database is very write intensive, consider RAID-10.
If the configuration has several extent pool pairs in its logical layout, also consider physically
separating the log files from the flash modules or disks on which the data and index files are placed.
A database can benefit from using a large amount of server memory for the large buffer pool.
For example, the database large buffer pool, when managed correctly, can avoid a large
percentage of the accesses to flash or disk. Depending on the application and the size of the
buffer pool, this large buffer pool can convert poor cache hit ratios into synchronous reads in
Db2. You can spread data across several RAID arrays to increase the throughput even if all
accesses are read misses. Db2 administrators often require that table spaces and their
indexes are placed on separate volumes. This configuration improves both availability and
performance.
Digital video editing: 100/0, 0/100, or 50/50 read/write ratio; 128 KB, 256 KB - 1024 KB transfer sizes; sequential, good caching.
An example of a data warehouse is a design around a financial institution and its functions,
such as loans, savings, bank cards, and trusts for a financial institution. In this application,
there are three kinds of operations: initial loading of the data, access to the data, and
updating of the data. However, because of the fundamental characteristics of a warehouse,
these operations can occur simultaneously. At times, this application can perform 100% reads
when accessing the warehouse, 70% reads and 30% writes when accessing data while
record updating occurs simultaneously, or even 50% reads and 50% writes when the user
load is heavy. The data within the warehouse is a series of snapshots and after the snapshot
of data is made, the data in the warehouse does not change. Therefore, there is typically a
higher read ratio when using the data warehouse.
Object-Relational DBMSs (ORDBMSs) are being developed, and they offer traditional
relational DBMS features and support complex data types. Objects can be stored and
manipulated, and complex queries at the database level can be run. Object data is data about
real objects, including information about their location, geometry, and topology. Location
describes their position, geometry relates to their shape, and topology includes their
relationship to other objects. These applications essentially have an identical profile to that of
the data warehouse application.
Depending on the host and operating system that are used to perform this application,
transfers are typically medium to large and access is always sequential. Image processing
consists of moving huge image files for editing. In these applications, the user regularly
moves huge high-resolution images between the storage device and the host system. These
applications service many desktop publishing and workstation applications. Editing sessions
can include loading large files of up to 16 MB into host memory, where users edit, render,
modify, and store data onto the storage system. High interface transfer rates are needed for
these applications, or the users waste huge amounts of time by waiting to see results. If the
interface can move data to and from the storage device at over 32 MBps, an entire 16 MB
image can be stored and retrieved in less than 1 second. The need for throughput is all
important to these applications and along with the additional load of many users, I/O
operations per second (IOPS) are also a major requirement.
For general rules for application types, see Table 5-1 on page 111.
Transaction distribution
Table 5-4 breaks down the number of times that key application transactions are run by
the average user and how much I/O is generated per transaction. Detailed application and
database knowledge is required to identify the number of I/O’s and the type of I/O’s per
transaction. The following information is a sample.
Table 5-5 Logical I/O profile from user population and transaction profiles
Transaction Iterations I/O’s I/O type Average Peak users
per user user I/O’s
Transfer money 0.5 4 reads/4 RR, random 1000, 1000 3000 R/W
to checking writes write I/O’s
(RW)
Transfer money to checking 1000, 1000 100 RR, 1000 RW 300 RR, 3000 RW
Configure new bill payee 1000, 1000 100 RR, 1000 RW 300 RR, 3000 RW
As you can see in Table 5-6, to meet the peak workloads, you must design an I/O
subsystem that supports 6000 random reads/sec and 6000 random writes/sec, where:
Physical I/O's: The number of physical I/O's per second from the host perspective
RR: Random read I/O's
RW: Random write I/O's
These commands are standard tools that are available with most UNIX and UNIX-like (Linux)
systems. Use iostat to obtain the data that you need to evaluate your host I/O levels, as shown
in the example that follows. Specific monitoring tools are also available for AIX, Linux,
Hewlett-Packard UNIX (HP-UX), and Oracle Solaris.
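A minimal sketch of such a collection on a UNIX or Linux host might look as follows; the flags differ by platform (for example, -x for extended statistics comes from the Linux sysstat package, and -D is the AIX detailed disk report), and the interval and count values are only illustrative:
# Linux: extended per-device statistics every 5 seconds, 12 samples
iostat -x 5 12
# AIX: detailed per-disk statistics at the same interval
iostat -D 5 12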
On Microsoft Windows, you can use Performance Monitor. To activate it, complete the following steps:
1. Open the Windows Start menu.
2. Search for Performance Monitor and click the perfmon app.
The IBM i tools, such as Performance Explorer and iDoctor, are used to analyze the hot data
in IBM i and to size flash modules for this environment. Other tools, such as Job Watcher, are
used mostly in solving performance problems, together with the tools for monitoring the
DS8900 storage system such as Storage Insights.
For more information about the IBM i tools and their usage, see 9.4.1, “IBM i performance
tools” on page 217.
z Systems environment
The z/OS systems have proven performance monitoring and management tools that are
available to use for performance analysis. Resource Measurement Facility (RMF), a z/OS
performance tool, collects performance data and reports it for the wanted interval. It also
provides cache reports. The cache reports are similar to the disk-to-cache and cache-to-disk
reports that are available in IBM Spectrum Control, except that the RMF cache reports are in
text format. RMF collects the performance statistics of the DS8900 storage system that are
related to the link or port and also to the rank and extent pool. The REPORTS(ESS) parameter in
the RMF report generator produces the reports that are related to those resources.
The RMF Spreadsheet Reporter is an easy way to create Microsoft Excel charts that are based on
RMF postprocessor reports. It is used to convert your RMF data to spreadsheet format and
generate representative charts for all performance-relevant areas, and it is described here:
[Link]
For more information, see Chapter 10, “Performance considerations for IBM z Systems
servers” on page 225.
IBM and IBM Business Partner specialists use the IBM Storage Modeler (StorM) tool for
performance modeling of the workload and capacity planning on the systems.
IBM StorM can be used to help to plan the DS8900 hardware configuration. With IBM StorM,
you model the DS8900 performance when migrating from another disk system or when
making changes to an existing DS8000 configuration and the I/O workload. IBM StorM is for
use with both z Systems and Open Systems server workloads. In addition, IBM StorM also
models storage capacity requirements.
You can model the following major DS8000 components by using IBM StorM:
Supported DS8000 models: DS8884F, DS8884, DS8886F, DS8886, DS8888F, DS8882F,
DS8980F, DS8910F, and DS8950F
Capacity sizing in IBM i
Importing data for performance assessments
When working with IBM StorM, always ensure that you input accurate and representative
workload information because IBM StorM results depend on the input data that you provide.
Also, carefully estimate the future demand growth that you input to IBM StorM for modeling
projections. The hardware configuration decisions are based on these estimates.
For more information about using StorM, see Chapter 6.1, “IBM Storage Modeller” on
page 126.
Workload testing
There are various reasons for conducting I/O load tests. They all start with a hypothesis and
have defined performance requirements. The objective of the test is to determine whether the
hypothesis is true or false. For example, a hypothesis might be that a DS8900
storage system with 18 flash arrays and 256 GB of cache can support 100,000 IOPS with a
70/30/50 workload (70% reads, 30% writes, and a 50% read cache-hit ratio) and the following response time requirements:
Read response times: 95th percentile < 10 ms
Write response times: 95th percentile < 5 ms
With these configuration settings, you can simulate and test most types of workloads. Specify
the workload characteristics to reflect the workload in your environment.
Since 2019, the Subsystem Device Driver Specific Module (SDDDSM) and the Subsystem Device
Driver Path Control Module (SDDPCM) are no longer supported for the DS8000. Users must
use native operating system device drivers for multipath support. For more information, refer to:
[Link]
This section describes how to perform these tasks using Multipath I/O devices:
To test the sequential read speed of a rank, run the following command:
time dd if=/dev/rhdiskxx of=/dev/null bs=128k count=781
The rhdiskxx is the character or raw device file for the logical unit numbers (LUN) that is
presented to the operating system by Multipath I/O (MPIO). This command reads 100 MB off
rhdiskxx and reports how long it takes in seconds. Take 100 MB and divide by the number of
seconds that is reported to determine the MBps read speed.
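For example, if the time command reports an elapsed time of roughly 0.8 seconds, the sequential read speed is about 100 MB / 0.8 s = 125 MBps; these numbers are purely illustrative and depend on your configuration.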
Your nmon monitor (the e option) reports that this previous command imposed a sustained 100
MBps bandwidth with a blocksize=128 K on rhdiskxx. Notice the xfers/sec column; xfers/sec
is IOPS. Now, if your dd command did not error out because it reached the end of the disk,
press Ctrl+c to stop the process. Now, nmon reports an idle status. Next, run the following dd
command with a 4 KB blocksize and put it in the background:
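A command along these lines accomplishes that, with rhdiskxx again standing for your raw MPIO device:
dd if=/dev/rhdiskxx of=/dev/null bs=4k &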
For this command, nmon reports a lower MBps but a higher IOPS, which is the nature of I/O as
a function of block size. Run your dd sequential read command with bs=1024k, and you see a
high MBps but a reduced IOPS.
Try different block sizes, different raw hdisk devices and combinations of reads and writes.
Run the commands against the block device (/dev/hdiskxx) and notice that block size does
not affect performance.
Because the dd command generates a sequential workload, you still must generate the
random workload. You can use a no-charge open source tool, such as Vdbench.
Vdbench is a disk and tape I/O workload generator for verifying data integrity and measuring
the performance of direct-attached and network-connected storage on Windows, AIX, Linux,
Solaris, OS X, and HP-UX. It uses workload profiles as the inputs for the workload modeling
and has its own reporting system. All output is presented in HTML files as reports and can be
analyzed later. For more information, refer to:
[Link]
[Link]
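As a minimal sketch of a Vdbench workload profile, the following parameter file defines one raw device, a 70% read random workload with 4 KB transfers, and a 5-minute run at maximum I/O rate; the device path and all values are illustrative assumptions only:
sd=sd1,lun=/dev/rhdisk10
wd=wd1,sd=sd1,xfersize=4k,rdpct=70,seekpct=100
rd=run1,wd=wd1,iorate=max,elapsed=300,interval=5
Such a file is typically run with ./vdbench -f <parameter_file>, and the transfer size, read percentage, and seek percentage can then be varied to approximate the workload characteristics of your environment.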
We also consider special aspects of the sizing, such as planning for Multi-Target PPRC or working
with DS8000 storage systems that use Easy Tier.
Precise storage modeling, for both capacity and performance, is paramount, and the StorM
tool follows the latest mathematical modeling for it. The supported storage systems
include the DS8000 models starting with the DS8880 (POWER8 based) generation.
Some aspects of the outgoing Disk Magic tool are still mentioned here, because that tool can
also model DS8000 generations before the DS8880 and because of its still excellent means of importing
collected host performance data and quickly processing them. However, Disk Magic
no longer picks up the latest support for product features that were introduced in 2021
and later.
This chapter gives an idea about the basic usage of these tools. Clients who want to have
such a study run are best advised to contact their IBM or IBM Business Partner representative.
Performing an extensive and elaborate lab benchmark by using the correct hardware and
software provides a more accurate result because it is real-life testing. Unfortunately, this
approach requires much planning, time, and preparation, plus a significant amount of
resources, such as technical expertise and hardware/software in an equipped lab.
Doing a study with the sizing tool requires much less effort and resources. Retrieving the
performance data of the workload and getting the configuration data from the servers and the
storage systems is all that is required from the client. These tools are especially valuable when
you replace a storage system or consolidate several of them. In the past, installations often had
small amounts of flash, 5% or less, combined with many 10K-rpm HDDs in multi-frame
configurations. Now, the footprint shrinks sharply by switching to all-flash installations with a
major amount of high-capacity flash, and often several older and smaller DS8000 models are
consolidated into one bigger system of a newer generation; all of that can be simulated. You can
also simulate an upgrade for an existing storage system that you want to keep for longer, for
example, to determine whether a cache upgrade helps and what its effect on latency would be, or
whether the host bus adapters have a utilization problem and how many adapters must be added
to resolve it.
When you collect your performance data, especially on the host side, make sure that the time
frames are truly representative and that they include the busiest times of the year. Also,
different applications can peak at different times. For example, a processor-intensive online
application might drive processor utilization to a peak while users are actively using the
system. However, the bus or drive utilization might be at a peak when the files are backed up
during off-hours. So, you might need to model multiple intervals to get a complete picture of
your processing environment.
The capacity planning must also be clear: is today's capacity to be retained 1:1, or will it grow
further, and by what factor? And for the performance, decide whether to factor in some load
growth, and to what extent.
A part of this data can be obtained from the collected IBM Z host performance data, or
by using a tool like IBM Spectrum Control or IBM Storage Insights.
In both Storage Insights and IBM Spectrum Control, go to Block Storage Systems, right-click the
storage system, and select "Export Performance Data" from the menu,
as shown in Figure 6-1 on page 128. This performance data package gives valuable and detailed
performance and configuration information, for example, on drive types and quantity, RAID
type, extent pool structure, the number of HBAs, ports, and their speeds, whether PPRC is used, and
more.
It also shows the ports that are not used, and the RAID arrays that are outside of pools and, therefore, not
used.
Before exporting the package, extend the period from the default of a few hours to the desired
longer interval that incorporates many representative busy days, as shown in Figure 6-2.
Figure 6-2 SC/SI exporting performance data: Select the right time duration
You can consider many aspects of the configuration, including PPRC (synchronous or
asynchronous, the distance in km, Metro/Global Mirror, or multi-site).
The exported packages are named SC_PerfPkg_<DS8000_name>_<date>_<duration>.ZIP
and contain mainly CSV files, as shown in Figure 6-3 on page 129.
The StorageSystem CSV file is essential because it contains the view of the entire storage system
in terms of performance data to the SAN and to the hosts. But for setting up your model, the
other files also give you important information beforehand:
From the Nodes CSV, get information about the DS8000 cache that is available and used.
From the Pools CSV, get information about how many extent pools exist.
From the RAIDArrays CSV, get the drive types that are used and the type of RAID formatting for each
rank (for example, RAID-6). When you see arrays that consistently show zero load, do not take
them into your modeling.
From the StoragePorts CSV, see how many HBAs are installed, how many of their ports are
in use, and their speeds. You also see how many of these HBAs and ports carry PPRC
traffic.
Look at different points in time: if some ports (or even HBAs) consistently show zero load,
take them out of the modeling.
From the HostConnections CSV, you can check whether only a subset of the hosts
takes part in the PPRC.
The StorageSystem CSV finally tells you the exact DS8000 model (for example, 986) and
gives the primary peak data for the modeling. Here, you also can check the
amount of PPRC activity.
Also check whether there are any exceptionally high values anywhere, which can give additional
ideas about why a customer might ask to exchange the storage system.
All this information from the several CSV files can build up your configuration of an existing
DS8000. Competitive storage systems can also be monitored by SC/SI.
The CSV or ZIP file can then be imported into StorM in an automated way to let the tool
determine the peak load intervals and use them for your modeling.
Workload profiles
After importing the workload file into the StorM tool, we see several categories that
typically make up a full workload profile: the Read I/O Percentage (or read:write ratio), the
transfer sizes for reads and writes, the read-cache hit ratio, and the sequential
ratios for reads and for writes. The latter values are optional, in case you enter a
workload manually, and are therefore found under the Advanced Options, as shown in Figure 6-4 on
page 130.
The workload peak profiles that are obtained in this way can then be scaled further with a factor for
future growth. For this value, it is even possible to enter negative values, such as -50%, for example
if you want to cut the workload in half when splitting it into pools and into the StorM
Data Spaces.
The Cache Read Hit percentage is shown exactly as measured from the previous (legacy)
storage system. However, if you are already modeling a replacement system with a
bigger cache, you can switch from automated input to manual and then use the Estimator button
to calculate the new cache-hit ratio on the new storage system.
After the data is read in automatically, you have up to nine different kinds of peak workload
intervals to choose from when doing your modeling, for example, the Total I/O Rate (usually
the most important one overall), the Total Data Rate (also important and should always be
looked at), the response time peaks (interesting to consider because they often come with low
cache-hit ratios, deviating read:write ratios, or suddenly higher block sizes), or the peak of the
Write Data Rate, which should also be considered.
However, a significant number of IBM i clients are entirely focused on that platform, and they
prefer to collect their storage performance data on the host platform.
On the host side, one of the options is then the Performance Tools (PT1) reports. IBM i clients
sometimes have very heavy short-term I/O peaks, and if it is an IBM i-only platform, a
data collection with the PT1 reports can also give you a precise idea of what is happening on
the storage side.
From the PT1 reports, I/O peaks can be manually transferred into StorM.
Use a collection interval of 5 minutes or less with it, and find more background on this tool and
method at:
[Link]
Figure 6-6 shows which options exist in the QMGTOOLS QPERF menu to gather data for
modeling storage performance.
Figure 6-6 QMGTOOLS QPERF menu: Gathering data for storage performance modeling
The newer option is now option 15, which collects the needed metrics that can be used directly with the
StorM tool.
IBM i typically has a very low Easy Tier skew. Unless you have a precisely measured skew
factor, enter "Easy Tier Skew = Very Low" (factor 2.0) on the Performance tab of StorM.
The SMF data are packed by using RMFPACK. The installation package, when expanded,
shows a user guide and two XMIT files: [Link] and [Link].
To pack the SMF data set into a ZRF file, complete the following steps:
1. Install RMFPACK on your z/OS system.
2. Prepare the collection of SMF data.
3. Run the $1SORT job to sort the SMF records.
4. Run the $2PACK job to compress the SMF records and to create the ZRF file.
For most mainframe customers, sampling periods of 15 minutes are fine. The main thing is to
cover the days that are really busy, because they might occur only at certain times in a month, or
potentially even a year. So having performance data for one or two busy days can be sufficient.
Figure 6-7 shows the modeling of DS8000 component utilization for a given peak workload
interval. We can make the following observations:
For the selected (peak) workload of 600 KIOps, and with the selected number of drives,
host bus adapters, amount of cache, number of DS8000 processor cores, and so on, the model
shows how busy each internal component would be. The utilization numbers for the
FICON adapters and for the FICON ports are around 50%. This utilization rate could
easily be lowered by adding more HBAs, even though the utilization is still below the
critical "amber" threshold value.
The utilization numbers for the HPFE and for the Flash Tier 1 drives (consisting of 3.84 TB
drives) are extremely low. As a result, you could consider a cheaper configuration using
7.68 TB Flash Tier 2 drives.
The expected utilization levels for the internal DS8950 bus and for the processing capability
are about 69%, which is a bit too high. Assuming that the storage
system is currently a DS8950F with dual 10-core processors, you could upgrade it to a
dual 20-core configuration instead.
We also see that this is a storage system that actively uses zHyperLink, Metro Mirror, and z/OS
Global Mirror. Mirroring consumes a significant percentage of the internal storage system
processing resources and must be modeled specifically. Metro Mirror also drives considerable
FICON adapter usage, and a Metro Mirror modeling allows you to see the throughput at the ports that are used for replication.
The sizing must be done in a way that optimizes the utilization of all internal components
while making certain that all peak workload profiles stay below the critical amber or red
thresholds. When the load comes close to these thresholds, elongated response times occur, and
when a threshold is hit, the system latency starts to exhibit a non-linear behavior in which any
further workload increase deteriorates the response time. This situation must definitely be
avoided.
With an overall response time of 0.34 ms for the 600 KIOps workload, we can compare these
expected 0.34 ms for the new or upgraded system with what the customer has today and make sure
that there is an improvement. We can also make sure that we are still in a near-linear
range of workload increase, so that further I/O increases do not lead to sudden bigger spikes
in latency. The total latency is shown with the four components that make up the response
time.
Below 700 KIOps, the response time remains at a lower level. This indicates that the
configured system has enough room for growth against the customer's peak workload data,
which was collected by using RMF. Based on the modeled latency and response time
curve, the proposed storage system is suitable for a customer proposal.
But, again, we can go back to the utilization figures and make sure that these are all in the
lower "green" range first. If that is not the case, expand and upgrade the configuration further. Both
aspects, the expected utilization levels and the expected response time values, need to fit, or the
configuration must be upgraded.
If zHyperLink is not used yet, the zBNA tool can be used, for instance, to determine how
big the zHyperLink ratio is expected to be after zHyperLink is switched on. Find the tool and
more background information at:
[Link]
Lightly skewed workloads might already be handled with a 100% High-Capacity Flash design,
and in many cases this can even mean using Flash Tier 2 drives only. With a higher
skew and a higher load, a mix of High-Performance Flash (Flash Tier 0) and High-Capacity
Flash (then mostly Flash Tier 2) is usually chosen, and the question becomes which capacity ratio
between the various tiers is the best choice.
There are three different approaches on how to model Easy Tier on a DS8000. Here are the
three options:
Use one of the predefined skew levels.
Use an existing skew level based on the current workload on the current DS8000
(measuring on the host side).
Use heatmap data from a DS8000 (measuring on storage system side).
StorM uses this setting to predict the number of I/Os that are serviced by the higher
performance tier.
A skew level value of 1 means that the workload does not have any skew at all, meaning that
the I/Os are distributed evenly across all ranks.
The skew level settings affect the modeling tool predictions. A heavy skew level selection
results in a more aggressive sizing of the higher performance tier. A low skew level selection
provides a conservative prediction. It is important to understand which skew level best
matches the actual workload before you start the modeling.
Your IBM support or Business Partner personnel have access to additional utilities that allow
them to retrieve the exact skew directly from the DS8000, from the Easy Tier summary.
In the DS8000 graphical user interface, you can export the Easy Tier summary at the top of the window,
as shown in Figure 6-9:
The output ZIP that is created contains CSV file reports on the Easy Tier activity that can be
used for additional planning and more accurate sizing. The skew curve is available in a format
to help you differentiate the skew according to whether the skew is for small or large blocks,
reads or writes, and IOPS or MBps, and it is described in more detail in IBM DS8000 Easy
Tier, REDP-4667.
The comparable DSCLI command, which yields the same data, is shown in Example 6-1.
Table 7-1 IBM Spectrum Control and Storage Insights supported activities for performance processes
Process: Tactical | Activities: Performance analysis and tuning | Feature: The tool facilitates thorough data collection and reporting.
Additional performance management processes that complement IBM Spectrum Control are
shown in Table 7-2 on page 141.
7.2.1 IBM Spectrum Control and IBM Storage Insights Pro overview
IBM Spectrum Control and IBM Storage Insights Pro reduce the complexity of managing SAN
storage devices by allowing administrators to configure, manage, and monitor storage
devices and switches from a single console. Most of the examples in this chapter focus on
IBM Storage Insights Pro. Remember that you can see as much or more in IBM Spectrum
Control, and a subset for a shorter period of time in IBM Storage Insights.
For more information about the configuration and deployment of storage by using IBM
Spectrum Control, see these publications:
IBM Spectrum Family: IBM Spectrum Control Standard Edition, SG24-8321
Regain Control of your Environment with IBM Storage Insights, REDP-5231
The table in Figure 7-1 on page 142 provides a side-by-side comparison of these powerful
options.
(Figure 7-1 groups capabilities such as enhanced reporting, fully customizable alerting, business impact analysis, and capacity optimization with reclamation under the Analytics and Optimization category.)
Figure 7-1 Capabilities for IBM Storage Insights, Storage Insights Pro and Spectrum Control
On the IBM Spectrum Control and IBM Storage Insights Pro Overview window of a selected
DS8000 storage system, performance statistics for that device are displayed. Figure 7-2 on
page 143 is an example of IBM Spectrum Control. While IBM Storage Insights Pro has a
slightly different look and feel, the same information can be seen when selecting the storage
system, as shown in Figure 7-3.
To view similar items in IBM Storage Insights Pro, click the down arrow under the
category or one of the tabs, as shown in Figure 7-5 on page 144.
Figure 7-6 Physical view compared to IBM Spectrum Control Performance Report Items
Metrics: A metric is a numerical value that is derived from the information that is provided
by a device. It is not simply the raw data, but a value that is calculated from it. For example, the raw data is the
transferred bytes, but the metric uses this value and the interval to show the bytes/second.
For the DS8000 storage system, the native application programming interface (NAPI) is used
to collect performance data, in contrast to the SMI-S Standard that is used for third-party
devices.
The DS8000 storage system interacts with the NAPI in the following ways:
Access method used: Enterprise Storage Server® Network Interface (ESSNI)
Failover:
– For the communication with a DS8000 storage system, IBM Spectrum Control and
Storage Insights use the ESSNI client. This library is basically the same library that is
included in any DS8000 command-line interface (DSCLI). Because this component
has built-in capabilities to fail over from one Hardware Management Console (HMC) to
another HMC, a good approach is to specify the secondary HMC IP address of your
DS8000 storage system.
Subsystem
On the subsystem level, metrics are aggregated from multiple records to a single value per
metric to give the performance of a storage subsystem from a high-level view, based on the
metrics of other components. This aggregation is done by adding values, or calculating
average values, depending on the metric.
Cache
The cache in Figure 7-6 plays a crucial role in the performance of any storage subsystem.
Metrics, such as disk-to-cache operations, show the number of data transfer operations from
drives to cache. The number of data transfer operations from drives to cache is called staging
for a specific volume. Disk-to-cache operations are directly linked to read activity from hosts.
When data is not found in the DS8000 cache, the data is first staged from back-end drives
into the cache of the DS8000 storage system and then transferred to the host.
Read hits occur when all the data that is requested for a read data access is in cache. The
DS8000 storage system improves the performance of read caching by using Sequential
Prefetching in Adaptive Replacement Cache (SARC) staging algorithms. For more
information about the SARC algorithm, see 1.2.2, “Advanced caching algorithms” on page 9.
The SARC algorithm seeks to store those data tracks that have the greatest probability of
being accessed by a read operation in cache.
The cache-to-disk operation shows the number of data transfer operations from cache to
drives, which is called destaging for a specific volume. Cache-to-disk operations are directly
linked to write activity from hosts to this volume. Data that is written is first stored in the
persistent memory (also known as nonvolatile storage (NVS)) at the DS8000 storage system
and then destaged to the back-end drive. The DS8000 destaging is enhanced automatically
by striping the volume across all the drives in one or several ranks (depending on your
configuration). This striping, or volume management that is done by Easy Tier, provides
automatic load balancing across DDMs in ranks and an elimination of the hot spots.
The Write-cache Delay I/O Rate or Write-cache Delay Percentage because of persistent
memory allocation gives you information about the cache usage for write activities. The
DS8000 storage system stores data in the persistent memory before sending an
acknowledgment to the host. If the persistent memory is full of data (no space available), the
host receives a retry for its write request. In parallel, the subsystem must destage the data
that is stored in its persistent memory to the back-end drive before accepting new write
operations from any host.
If a volume experiences delays in the write operations because of a persistent memory constraint,
consider moving the volume to a less busy rank or spreading this volume over
multiple ranks (increase the number of DDMs used). If this solution does not fix the persistent
memory constraint problem, consider adding cache capacity to your DS8000 storage system.
As shown in Figure 7-7 on page 147 and Figure 7-8, you can use IBM Spectrum Control and
Storage Insights Pro to monitor the cache metrics easily.
Controller/Nodes
The DS8000 processor complexes are referred to as Nodes. A DS8000 storage system has
two processor complexes, and each processor complex independently provides major
functions for the storage system. Examples include directing host adapters (HAs) for data
transfer to and from host processors, managing cache resources, and directing lower device
interfaces for data transfer to and from the flash media. To analyze performance data, you
must know that most volumes can be assigned/used by only one controller at a time.
You can view the performance of the volumes (all or a subset) by selecting Volumes in the
Internal Resources section and then selecting the Performance tab, as shown in
Figure 7-10.
Ports
The Fibre Channel port information reflects the performance metrics for the front-end DS8000
ports that connect the DS8000 storage system to the SAN switches or hosts. Additionally,
port error rate metrics, such as Error Frame Rate, are also available. The DS8000 HA card
has four or eight ports. The WebUI does not reflect this aggregation, but if necessary, custom
reports can be created with native SQL statements to show port performance data that is
grouped by the HA to which they belong. Monitoring and analyzing the ports that belong to
the same card are beneficial because the aggregated throughput is less than the sum of the
stated bandwidth of the individual ports.
For more information about the DS8000 port cards, see Chapter 2.3, “Host adapters” on
page 21.
Port metrics: IBM Spectrum Control and Storage Insights Pro report on many port
metrics because the ports on the DS8000 storage system are the front-end part of the storage
device.
Array
The array name that is shown in the WebUI, as shown in Figure 7-12, directly refers to the
array on the DS8000 storage system as listed in the DS GUI or DSCLI.
When you click the Performance tab, the top five performing arrays are displayed with their
corresponding graphs, as shown in Figure 7-13 on page 150.
A DS8000 array is defined on an array site with a specific RAID type. A rank is a logical
construct to which an array is assigned. A rank provides a number of extents that are used to
create one or several volumes. A volume can use the DS8000 extents from one or several
ranks but all within the same pool. For more information, see Chapter 3.1, “Logical
Configuration” on page 32, and Chapter 3.2.4, “Easy Tier considerations” on page 55.
In the most common logical configurations, the IDs are in sequence, for
example, array site S1 = array A0 = rank R0. If they are not in order, you must
understand on which array the analysis is performed.
Example 7-1 on page 151 shows the relationships among a DS8000 rank, an array, and an
array site with a typical divergent numbering scheme by using DSCLI commands. Use the
showrank command to show which volumes have extents on the specified rank.
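A hedged sketch of such a check follows; the rank ID is a placeholder, and the exact output fields depend on the DSCLI release, but the output typically includes the list of volumes that have extents on the rank:
dscli> showrank R4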
In the Array performance chart, you can include both front-end and back-end metrics. The
back-end metrics can be selected on the Disk Metrics Tab. They provide metrics from the
perspective of the controller to the back-end array sites. The front-end metrics relate to the
activity between the server and the controller.
There is a relationship between array operations, cache hit ratio, and percentage of read
requests:
When the cache hit ratio is low, the DS8000 storage system has frequent transfers from
drives to cache (staging).
When the percentage of read requests is high and the cache hit ratio is also high, most of
the I/O requests can be satisfied without accessing the drives because of the cache
management prefetching algorithm.
When there is heavy write activity, it leads to frequent transfers from cache to drives
(destaging).
Comparing the performance of different arrays shows whether the global workload is equally
spread on the drives of your DS8000 storage system. Spreading data across multiple arrays
increases the number of drives that is used and optimizes the overall performance.
Volumes
The volumes, also called LUNs, are shown in Figure 7-14 on page 153. The host server sees the
volumes as physical drives and treats them accordingly.
Analysis of volume data facilitates the understanding of the I/O workload distribution among
volumes, and workload characteristics (random or sequential and cache hit ratios). A DS8000
volume can belong to one or several ranks, as shown in Figure 7-14 on page 153 (for more
information, see Chapter 3.2, “Data placement on ranks and extent pools” on page 50).
Especially in managed multi-rank extent pools with Easy Tier automatic data relocation
enabled, the distribution of a certain volume across the ranks in the extent pool can change
over time.
With IBM Spectrum Control and Storage Insights Pro, you can see the Easy Tier, Easy Tier
Status, and the capacity values for pools, shown in Figure 7-15 on page 153, and volumes,
shown in Figure 7-16.
Figure 7-15 Easy Tier pool information in IBM Storage Insights Pro
The analysis of volume metrics shows the activity of the volumes on your DS8000 storage
system and can help you perform these tasks:
Determine where the most accessed data is and what performance you get from the
volume.
Understand the type of workload that your application generates (sequential or random
and the read or write operation ratio).
Determine the cache benefits for the read operation (cache management prefetching
algorithm SARC).
Determine cache bottlenecks for write operations.
Compare the I/O response observed on the DS8000 storage system with the I/O response
time observed on the host.
The relationship of certain RAID arrays and ranks to the DS8000 pools can be derived from
the RAID array list window, which is shown in Figure 7-12.
From there, you can easily see the volumes that belong to a certain pool by right-clicking a
pool, selecting View Properties, and clicking the Volumes tab.
Figure 7-17 on page 154 shows the relationship of a RAID array, its pools, and its volumes.
In addition, to associate quickly the DS8000 arrays to array sites and ranks, you might use the
output of the DSCLI commands lsrank -l and lsarray -l, as shown in Example 7-1 on
page 151.
Random read: Attempt to find the data in cache. If it is not present in cache, read it from the back end.
Sequential write: Write the data to the NVS of the processor complex that owns the volume and send a copy of the data to the cache in the other processor complex. Upon back-end destaging, perform prefetching of read data and parity into cache to reduce the number of disk operations on the back end.
Random write: Write the data to the NVS of the processor complex that owns the volume and send a copy of the data to the cache in the other processor complex. Destage modified data from NVS to disk as determined by the Licensed Internal Code.
The read hit ratio depends on the characteristics of data on your DS8000 storage system and
applications that use the data. If you have a database and it has a high locality of reference, it
shows a high cache hit ratio because most of the data that is referenced can remain in the
cache. If your database has a low locality of reference, but it has the appropriate sets of
indexes, it might also have a high cache hit ratio because the entire index can remain in the
cache.
For a logical volume that has sequential files, you must understand the application types that
access those sequential files. Normally, these sequential files are used for either read only or
write only at the time of their use. The DS8000 cache management prefetching algorithm
(SARC) determines whether the data access pattern is sequential. If the access is sequential,
contiguous data is prefetched into cache in anticipation of the next read request.
IBM Storage Insights Pro reports the reads and writes through various metrics. For a more detailed
description of these metrics, see 7.3, "IBM Storage Insights Pro data
collection considerations" on page 156.
The data is collected at the indicated resource time stamp in the server time zone. The server receives
the time zone information from the devices (or the NAPIs) and uses this information to adjust
the time in the reports to the local time. Certain devices might convert the time into
Coordinated Universal Time (UTC) time stamps and not provide any time zone information.
This complexity is necessary to compare the information from two subsystems in different
time zones from a single administration point. This administration point is the GUI. If you open
the GUI in different time zones, a performance diagram might show a distinct peak at different
times, depending on its local time zone.
When using IBM Storage Insights Pro to compare data from a server (for example, iostat
data) with the data of the storage subsystem, it is important to know the time stamp of the
storage subsystem. The time zone of the device is shown in the DS8000 Properties window.
As IBM Storage Insights Pro can synchronize multiple performance charts that are opened in
the WebUI to display the metrics at the same time, use an NTP server for all components in
the SAN environment.
7.3.2 Duration
IBM Storage Insights Pro collects data continuously. From a performance management
perspective, collecting data continuously means that performance data exists to facilitate
reactive, proactive, and even predictive processes, as described in Chapter 7, “Practical
performance management” on page 139.
7.3.3 Intervals
In IBM Storage Insights Pro, the data collection interval is referred to as the sample interval.
The sample interval for the DS8000 performance data collection tasks is 5 - 60 minutes. A
shorter sample interval results in a more granular view of performance data at the expense of
requiring additional database space. The appropriate sample interval depends on the
objective of the data collection. Table 7-4 on page 157 displays example data collection
objectives and reasonable values for a sample interval.
For example, for performance analysis, a sample interval of 5 minutes is appropriate.
Changes in the logical configuration of a system will result in a non-scheduled probe in both
IBM Storage Insights Pro and IBM Spectrum Control.
For a list of available performance metrics for the DS8000 storage system, see the IBM
Storage Insights Pro in IBM Documentation Center:
[Link]
Note: IBM Storage Insights Pro and IBM Spectrum Control also have other metrics that
can be adjusted and configured for your specific environment to suit customer demands.
Colors are used to distinguish the components that are shown in Table 7-5.
Component | Metric group | Metric | Threshold | Meaning
Node | Volume Metrics | Cache Holding Time | < 200 | Indicates high cache track turnover and possibly cache constraint.
Node | Volume Metrics | Write Cache Delay Percentage | > 1% | Indicates writes delayed because of insufficient memory resources.
Array | Drive Metrics | Utilization Percentage | > 70% | Indicates drive saturation. For IBM Storage Insights Pro, the default value on this threshold is 50%.
Array | Drive Metrics | Overall Response Time | > 35 | Indicates busy drives.
Array | Drive Metrics | Write Response Time | > 35 | Indicates busy drives.
Array | Drive Metrics | Read Response Time | > 35 | Indicates busy drives.
Port | Port Metrics | Total Port I/O Rate | Depends | Indicates transaction-intensive load. The configuration depends on the HBA, switch, and other components.
Port | Port Metrics | Total Port Data Rate | Depends | If the port data rate is close to the bandwidth, this rate indicates saturation. The configuration depends on the HBA, switch, and other components.
Port | Port Metrics | Port Send Response Time | > 2 | Indicates contention on the I/O path from the DS8000 storage system to the host.
Port | Port Metrics | Port Receive Response Time | > 2 | Indicates a potential issue on the I/O path or the DS8000 storage system back end.
Port | Port Metrics | Total Port Response Time | > 2 | Indicates a potential issue on the I/O path or the DS8000 storage system back end.
You can use IBM Storage Insights Pro to define performance-related alerts that can trigger an
event when the defined thresholds are reached. Even though it works in a similar manner to a
monitor without user intervention, the actions are still performed at specified intervals of the
data collection job.
With Storage Insights Pro you can create alert policies to manage the alert conditions for
multiple resources. You can create a new policy or copy an existing policy to modify the alerts
definitions and to add resources to monitor. To create a new policy for DS8000 complete the
following steps:
1. From the dashboard select Configuration > Alert Policies as shown in Figure 7-18 on
page 159. You can copy and modify an existing policy or create a new policy.
2. To create a new policy, select Create Policy. Enter a name and select the type of
resource from the list, for example, block storage. Select the type of storage system that you
want to monitor, for example, DS8000. A list of the DS8000 storage systems that are managed by Storage Insights is displayed.
Next, define alerts for the categories of the attributes that you want to alert on:
3. General: Attributes for the key properties of a resource, such as status, data collection
status, and firmware.
4. Capacity: Attributes for the capacity statistics of a resource, such as available capacity, used
capacity, drive capacity, Safeguarded capacity, and more. A total of 26 alerts can be set, as
shown in Figure 7-20 on page 160.
5. Performance: Alerts that are triggered when the performance of a
resource falls outside a specified threshold.
Reference: For more information about setting Thresholds and Alert suppressions in IBM
Storage Insights Pro, see IBM Documentation Storage Insights at:
[Link]
Configure the thresholds that are most important and most relevant to the needs of your
environment to assist with good planning.
IBM Storage Insights Pro provides recommended values for threshold values that do not vary
much between environments. However, for metrics that measure throughput and response
times, thresholds can vary because of workload, model of hardware, amount of cache
memory, and other factors. In these cases, there are no recommended values. To help
determine threshold values for a resource, collect performance data over time to establish a
baseline of the normal and expected performance behavior for that resource.
After you determine a set of baseline values, define alerts to trigger if the measured
performance behavior falls outside of the normally expected range.
The alerts for a DS8000 storage system can be seen, filtered, removed, acknowledged, or
exported in the storage system Alert window, as shown in Figure 7-22.
False positive alerts: Configuring thresholds too conservatively can lead to an excessive
number of false positive alerts.
In Storage Insights Pro you can create a number of reports that you can schedule and send
by email:
Predefined capacity reports, for which you can configure and refine the information that is
included in the report. You can create a predefined report about storage systems, pools, or tiered
pools.
Custom reports that you can create to include asset, capacity, configuration, or health
status or performance information about your storage resources. You can specify a
relative time range for the capacity information in the report, such as the last hour, 6 hours,
12 hours, day, week, month, 6 months, or year. Depending on the time range that you
specify, the aggregated values for the performance information are shown in the report.
Consumer reports that you can create to help plan capacity purchases and to make your
organization aware of the cost and the amount of the storage that is used by storage
consumers, for example, chargeback and consumer reports.
Note: You must have an Administrator role to create, edit, or delete custom, predefined
capacity and inventory reports, and chargeback and storage consumption reports. Users
with a Monitor role can run and view the reports that are shown on the Reports page, but
they can't edit or delete reports.
Additional reporting
In IBM Storage Insights Pro, export options are available. To export the data that is used in
the performance chart, use the export function, which is found in the upper right of the chart
as shown in Figure 7-23 on page 163, or under the Actions menu in other views.
To export the summary table underneath the chart, click Action → More → Export, and
select the desired format.
Charts are automatically generated for most of the predefined reports. Depending on the type
of resource, the charts show statistics for space usage, workload activity, bandwidth
percentage, and other statistics. You can schedule reports and specify the report output as
HTML, PDF, and CSV formats. You can also configure reports to save the report output to
your local file system, and to send reports as mail attachments.
7.6 Insights
IBM Storage Insights Pro provides additional functions to gain insights into the performance
of the resources that are monitored in your storage environment. You can view the
recommendations that help you address issues in your storage environment and gain insights
into storage reclamation and performance.
These functions are not included in the entitled version of IBM Storage Insights.
Click the recommendation detail from the Action list to view the details of a recommended
action; an example of this is shown in Figure 7-25 on page 164. You can also export the table
that displays the recommended actions to a file.
7.6.2 Performance
You can use performance metrics for volumes, drives, or ports to help you measure, identify,
and troubleshoot performance issues and bottlenecks in storage systems.
To display the performance page, click Insights > Performance as shown in Figure 7-26. Up
to 10 storage systems with the highest overall total I/O rate over a 12-hour period are shown.
To view performance trends, select metrics for volumes, drives, ports, or node and specify a
time range.
For more information about performance insights, see the IBM Documentation for IBM
Storage Insights Pro at:
[Link]
7.6.3 Reclamation
Use the recommendations on the reclamation page to determine if you can reclaim capacity
before planning new capacity purchases. Click Insights > Reclamation to see how much
capacity can be reclaimed as shown in Figure 7-27. See the savings that can be made by
reclaiming capacity for tiered and non-tiered storage and a list of the reclaimable volumes. To
identify the volumes that are not being used, the storage resources that you add for
monitoring are regularly analyzed. A list of the volumes that are not being used is generated.
You can decide in accordance with the internal procedures of your organization which of the
volumes in the list can be decommissioned.
Before you delete or reclaim space that was identified by IBM Storage Insights Pro, keep in
mind that volumes on IBM Spectrum Accelerate and CKD volumes on DS8000 are identified
as reclaimable based on I/O activity only, because information about the assignment of
volumes to servers is not available. To exclude volumes from the reclamation
analysis, right-click the volumes and click Exclude from Analysis. Additional actions can be
performed on the volumes in the recommended reclamation list such as add to an application
or general group.
For more information about reclamation with IBM Storage Insights Pro, see the IBM
Documentation for IBM Storage Insights Pro at:
[Link]
For more information about monitoring performance through a SAN switch or director point
product, see the following websites:
[Link]
[Link]
Most SAN management software includes options to create SNMP alerts based on
performance criteria, and to create historical reports for trend analysis. Certain SAN vendors
offer advanced performance monitoring capabilities, such as measuring I/O traffic between
specific pairs of source and destination ports, and measuring I/O traffic for specific LUNs.
In addition to the vendor point products, IBM Storage Insights Pro can be used as a
monitoring and reporting tool for switch and fabric environments. It collects and reports on
fabric topology, switch configurations, and port performance and errors. In addition, you can
use IBM Storage Insights Pro to configure alerts or thresholds for port congestion, send and
receive bandwidth, and other metrics. Configuration options allow events to be triggered if
thresholds are exceeded.
IBM Storage Insights Pro also offers the following capabilities:
Seamless ticket management, from opening tickets and automatically uploading
diagnostic information to updating and tracking tickets
Ability to store performance data from multiple switch vendors in a common database
Advanced reporting and correlation between host data and switch data through custom
reports
Centralized management and reporting
Aggregation of port performance data for the entire switch
For more information about IBM Storage Insights Pro functions and how to work with them,
refer to:
[Link]
Perceived or actual I/O bottlenecks can result from hardware failures on the I/O path,
contention on the server, contention on the SAN Fabric, contention on the DS8000 front-end
ports, or contention on the back-end device adapters or arrays. This section provides a
process for diagnosing these scenarios by using IBM Storage Insights Pro. This process was
developed for identifying specific types of problems and is not a substitute for common sense,
knowledge of the environment, and experience. Figure 7-28 on page 168 shows the
high-level process flow.
I/O bottlenecks that are referenced in this section relate to one or more components on the
I/O path that reached a saturation point and can no longer achieve the I/O performance
requirements. I/O performance requirements are typically throughput-oriented or
transaction-oriented. Heavy sequential workloads, such as tape backups or data warehouse
environments, might require maximum bandwidth and use large sequential transfers.
However, they might not have stringent response time requirements. Transaction-oriented
workloads, such as online banking systems, might have stringent response time
requirements, but have no requirements for throughput.
To troubleshoot performance problems, IBM Storage Insights Pro data must be augmented
with host performance and configuration data. Figure 7-29 on page 169 shows a logical
end-to-end view from a measurement perspective.
Although IBM Storage Insights Pro does not provide host performance, configuration, or
error data, it does provide performance data from host connections, SAN switches, and the
DS8000 storage system, and configuration information and error logs from SAN switches and
the DS8000 storage system.
You can create and update support tickets and automatically upload logs directly in the IBM
Storage Insights interface. You can also give IBM Support permission to collect and upload
log packages for storage systems without contacting you every time.
Tip: Performance analysis and troubleshooting must always start top-down, starting with
the application (for example, database design and layout), then the operating system,
server hardware, SAN, and then storage. The tuning potential is greater at the higher
levels. The best I/O tuning is never carried out because server caching or a better
database design eliminated the need for it.
Process assumptions
This process assumes that the following conditions exist:
The server is connected to the DS8000 storage system natively.
Tools exist to collect the necessary performance and configuration data for each
component along the I/O path (server disk, SAN fabric, and the DS8000 arrays, ports, and
volumes).
Skills exist to use the tools, extract data, and analyze data.
Data is collected in a continuous fashion to facilitate performance management.
2. Consider checking the application level first. Has all potential tuning on the database level
been performed? Does the layout adhere to the vendor recommendations, and is the
server adequately sized (RAM, processor, and buses) and configured?
3. Correctly classify the problem by identifying hardware or configuration issues. Hardware
failures often manifest themselves as performance issues because I/O is degraded on one
or more paths. If a hardware issue is identified, all problem determination efforts must
focus on identifying the root cause of the hardware errors:
a. Gather any errors on any of the host paths.
Physical component: If you notice significant errors when querying the path and
the errors increase, there is most likely a problem with a physical component on the
I/O path.
b. Gather the host error report and look for Small Computer System Interface (SCSI) or
Fibre errors.
Hardware: Often a hardware error that relates to a component on the I/O path
shows as a TEMP error. A TEMP error does not exclude a hardware failure. You
must perform diagnostic tests on all hardware components in the I/O path, including
the host bus adapter (HBA), SAN switch ports, and the DS8000 HBA ports.
c. Gather the SAN switch configuration and errors. Every switch vendor provides different
management software. All of the SAN switch software provides error monitoring and a
way to identify whether there is a hardware failure with a port or application-specific
integrated circuit (ASIC). For more information about identifying hardware failures, see
your vendor-specific manuals or contact vendor support.
Patterns: As you move from the host to external resources, remember any patterns.
A common error pattern that you see involves errors that affect only those paths on
the same HBA. If both paths on the same HBA experience errors, the errors are a
result of a common component. The common component is likely to be the host
HBA, the cable from the host HBA to the SAN switch, or the SAN switch port.
Ensure that all of these components are thoroughly reviewed before proceeding.
d. If errors exist on one or more of the host paths, determine whether there are any
DS8000 hardware errors. Log on to the HMC as "customer" and look to ensure that
there are no hardware alerts. Figure 7-30 on page 171 provides a sample of a healthy
DS8000 storage system. If there are any errors, you might need to open a hardware
case with DS8000 hardware support.
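As a hedged illustration of steps 3a and 3b on an AIX host (hdisk2 is a placeholder device name; other operating systems and multipathing drivers provide equivalent commands):
# Step 3a: list any paths that are not in the Enabled state
lspath | grep -v Enabled
# Show the path status for one specific volume
lspath -l hdisk2
# Step 3b: scan the error report for SCSI and Fibre Channel related entries
errpt | egrep -i "scsi|fcs|fscsi"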
4. After validating that no hardware failures exist, analyze server performance data and
identify any disk bottlenecks. The fundamental premise of this methodology is that I/O
performance degradation that relates to SAN component contention can be observed at
the server through analysis of the key server-based I/O metrics.
Degraded end-to-end I/O response time is the strongest indication of I/O path contention.
Typically, server physical disk response times measure the time that a physical I/O request
takes from the moment that the request was initiated by the device driver until the device
driver receives an interrupt from the controller that the I/O completed. The measurements
are displayed as either service time or response time. They are averaged over the
measurement interval. Typically, server wait or queue metrics refer to time spent waiting at
the HBA, which is usually an indication of HBA saturation. In general, you need to interpret
the service times as response times because they include potential queuing at various
storage subsystem components, for example:
– Switch
– Storage HBA
– Storage cache
– Storage back-end drive controller
– Storage back-end paths
– Drives
I/O-intensive disks: The number of total I/Os per second indicates the relative
activity of the device. This relative activity provides a metric to prioritize the analysis.
Those devices with high response times and high activity are more important to
understand than devices with high response time and infrequent access. If
analyzing the data in a spreadsheet, consider creating a combined metric of
Average I/Os × Average Response Time to provide a method for identifying the most
I/O-intensive disks. You can obtain additional detail about OS-specific server
analysis in the OS-specific chapters.
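As a hedged one-line example (the file name and the column positions for average I/Os and average response time are assumptions about your own export format):
# Rank disks by (average I/Os per second x average response time), highest first;
# skip the header row, assume disk name in column 1, I/O rate in column 4, response time in column 5
awk -F, 'NR > 1 { print $1, $4 * $5 }' disk_perf.csv | sort -k2 -rn | head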
b. Gather configuration data from the default multipathing software for distributed
systems. In addition to the multipathing configuration data, you must collect
configuration information for the host and DS8000 HBAs, which includes the bandwidth
of each adapter.
c. Format the data and correlate the host LUNs with their associated DS8000 resources.
Formatting the data is not required for analysis, but it is easier to analyze formatted
data in a spreadsheet.
The following steps represent the logical steps that are required to format the data and
do not represent literal steps. You can codify these steps in scripts, as in the sketch that
follows the list:
i. Read the configuration file.
ii. Build an hdisk hash with key = hdisk and value = LUN SN.
iii. Read I/O response time data.
iv. Create hashes for each of the following values with hdisk as the key: Date, Start
time, Physical Volume, Reads, Avg Read Time, Avg Read Size, Writes, Avg Write
Time, and Avg Write Size.
v. Print the data to a file with headers and commas to separate the fields.
vi. Iterate through the hdisk hash and use the common hdisk key to index into the other
hashes and print those hashes that have values.
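A minimal sketch of such a script follows, assuming that the multipathing configuration was already dumped to mpio_config.txt (hdisk name and LUN serial number per line) and the response-time samples to iostat_data.txt (hdisk name in the first column); the file names and column positions are assumptions to adapt to your own data collection:
#!/bin/ksh
# Join each hdisk's LUN serial number with its performance samples and
# write the result as a comma-separated file for spreadsheet analysis.
awk '
    BEGIN { print "hdisk,LUN_SN,date,start,reads,avg_read_ms,avg_read_kb" }
    # First file: configuration dump -> remember the LUN serial number per hdisk
    NR == FNR { lun[$1] = $2; next }
    # Second file: performance samples keyed by the hdisk name in column 1
    ($1 in lun) { print $1 "," lun[$1] "," $2 "," $3 "," $4 "," $5 "," $6 }
' mpio_config.txt iostat_data.txt > hdisk_lun_perf.csv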
d. Analyze the host performance data:
i. Determine whether I/O bottlenecks exist by summarizing the data and analyzing
key performance metrics for values in excess of the thresholds. Identify those
vpaths/LUNs with poor response time. Hardware errors and multipathing
configuration issues must already be excluded. The hot LUNs must already be
identified. Proceed to step 5 on page 173 to determine the root cause of the
performance issue.
ii. If no degraded disk response times exist, the issue is likely not internal to the
server.
5. If drive constraints are identified, continue the identification of the root cause
by collecting and analyzing the DS8000 configuration and performance data:
a. Gather the configuration information. IBM Storage Insights Pro can also be used to
gather configuration data through the Properties window, as shown in Figure 7-31 on
page 174.
Analyze the DS8000 performance data first: Check for Alerts (2) and errors (3) in
the left navigation. Then, look at the performance data of the internal resources (4).
Analysis of the SAN fabric and the DS8000 performance data can be completed in
either order. However, SAN bottlenecks occur less frequently than drive bottlenecks,
so it can be more efficient to analyze the DS8000 performance data first.
b. Use IBM Storage Insights Pro to gather the DS8000 performance data for Fibre Channel
ports, pools, arrays, volumes, nodes, and host connections. Compare the key
performance indicators from Table 7-5 on page 158 with the performance data. To
analyze the performance, complete the following steps:
i. For those server LUNs that show poor response time, analyze the associated
volumes during the same period. If the problem is on the DS8000 storage system, a
correlation exists between the high response times observed on the host and the
volume response times observed on the DS8000 storage system.
Compare the same period: Meaningful correlation with the host performance
measurement and the previously identified hot LUNs requires analysis of the
DS8000 performance data for the same period that the host data was collected.
The synchronize time function of IBM Storage Insights Pro can help you with this
task (see Figure 7-32 on page 175 (1)). For more information, see IBM
Documentation: [Link] For more
information about time stamps, see 7.3.1, “Time stamps” on page 156.
ii. Correlate the hot LUNs with their associated arrays. When you use IBM Storage
Insights Pro, the relationships are provided automatically in the drill-down feature,
as shown in Figure 7-32 on page 175 (2).
If you use exported reports, shown in Figure 7-32 on page 175 (3), and want to
correlate the volume data to the rank data, you can do so manually or by using a
script. If multiple ranks per extent pool and
storage pool striping, or Easy Tier managed pools are used, one volume can exist
on multiple ranks. Easy Tier can help alleviate any hot spots with automatic
rebalancing.
Analyze storage subsystem ports for the ports associated with the server in
question.
6. Continue the identification of the root cause by collecting and analyzing SAN fabric
configuration and performance data:
a. Gather the connectivity information and establish a visual diagram of the environment.
Visualize the environment: Sophisticated tools are not necessary for creating this
type of view; however, the configuration, zoning, and connectivity information must
be available to create a logical visual representation of the environment.
b. Gather the SAN performance data. Each vendor provides SAN management
applications that provide the alerting capability and some level of performance
management. Often, the performance management software is limited to real-time
monitoring, and historical data collection features require additional licenses. In
addition to the vendor-provided solutions, IBM Storage Insights Pro can collect further
metrics, which are shown in Table 7-5 on page 158.
c. Consider graphing the Overall Port Response Time, Port Bandwidth Percentage, and
Total Port Data Rate metrics to determine whether any of the ports along the I/O path
are saturated.
Note: If you are eligible for Premium Support, it's recommended that you call IBM Support
or open a support case at [Link] to access that service for your
issue. Ensure that you have your Direct Access Code (DAC) number ready so IBM can
best assist you.
Figure 7-34 Windows Server perfmon - Average Physical Disk Read Response Time
At approximately 18:39 hours, the average read response time jumps from approximately
15 ms to 25 ms. Further investigation of the host reveals that the increase in response time
correlates with an increase in load, as shown in Figure 7-35 on page 178.
Figure 7-35 Windows Server perfmon - Disk Reads/sec per disk (1-minute intervals)
As described in 7.8, “End-to-end analysis of I/O performance problems” on page 167, there
are several possibilities for high average disk read response time:
DS8000 array contention
DS8000 port contention
SAN fabric contention
Host HBA saturation
Because the most probable reason for the elevated response times is the drive utilization on
the array, gather and analyze this metric first. Figure 7-36 on page 178 shows the drive
utilization on the DS8000 storage system.
Problem definition
The online transactions for a Windows Server SQL server appear to take longer than normal
and time out in certain cases.
Problem classification
After reviewing the hardware configuration and the error reports for all hardware components,
we determined that there are errors on the paths associated with one of the host HBAs, as
shown in Figure 7-37 on page 179. This output shows the errors on path 0 and path 1, which
are both on the same HBA (SCSI port 1). For a Windows Server that runs MPIO, additional
information about the HAs is available. The command that you use to identify errors depends
on the multipathing software installation.
Disabling a path: In cases where there is a path with significant errors, you can disable
the path with the multipathing software, which allows the non-working paths to be disabled
without causing performance degradation to the working paths.
Figure: Throughput in KB per second for the Dev Disk4 - Disk7 and Production Disk1, Disk2, and Disk5 volumes
The DS8000 port data reveals a peak throughput of around 300 MBps per 4-Gbps port.
Note: The HA port speed might differ between DS8000 models, so it is a value that depends
on the hardware configuration of the system and the SAN environment.
Figure 7-40 Total port data rate (total MBps for ports R1-I3-C4-T0 and R1-I3-C1-T0, 5-minute intervals)
Before beginning the diagnostic process, you must understand your workload and your
physical configuration. You must know how your system resources are allocated, and
understand your path and channel configuration for all attached servers.
Assume that you have an environment with a DS8000 storage system attached to a z/OS
host, an AIX on IBM Power Systems host, and several Windows Server hosts. You noticed
that your z/OS online users experience a performance degradation 07:30 - 08:00 hours each
morning.
You might notice that there are 3390 volumes that indicate high disconnect times, or high
device busy delay times for several volumes in the RMF device activity reports.
Device busy delay is an indication that another system has locked (reserved) the volume, or
that an extent conflict occurred among z/OS hosts, or among applications in the same host,
when using Parallel Access Volumes (PAVs). The DS8000 multiple allegiance and PAV
capabilities allow it to
process multiple I/Os against the same volume at the same time. However, if a read or write
request against an extent is pending while another I/O is writing to the extent, or if a write
request against an extent is pending while another I/O is reading or writing data from the
extent, the DS8000 storage system delays the I/O by queuing it. This condition is referred to as
extent conflict. Queuing time because of extent conflict is accumulated to device busy (DB)
delay time. An extent is a sphere of access; the unit of increment is a track. Usually, I/O
drivers or system routines decide and declare the sphere.
To determine the possible cause of high disconnect times, check the read cache hit ratios,
read-to-write ratios, and bypass I/Os for those volumes. If you see that the cache hit ratio is
lower than usual and you did not add other workloads to your IBM z environment, I/Os against
FB volumes might be the cause of the problem. It is possible that FB volumes that are
defined on the same DS8000 internal server have a cache-unfriendly workload, which
affects the hit ratio of your IBM Z volumes.
To get more information about cache usage, you can check the cache statistics of the FB
volumes that belong to the same server. You might be able to identify the FB volumes that
have a low read hit ratio and short cache holding time. Moving the workload of the FB logical
disks, or the CKD volumes, that you are concerned about to the other side of the cluster
improves the situation by concentrating cache-friendly I/O workload across both clusters. If
you cannot or if the condition does not improve after this move, consider balancing the I/O
distribution on more ranks. Balancing the I/O distribution on more ranks optimizes the staging
and destaging operation.
The scenarios that use IBM Storage Insights Pro as described in this chapter might not cover
all the possible situations that can be encountered. You might need to include more
information, such as application and host operating system-based performance statistics or
other data collections to analyze and solve a specific performance problem.
Part 3 Performance
considerations for host
systems and databases
This part provides performance considerations for various host systems or appliances that
are attached to the IBM System Storage DS8000 storage system, and for databases.
Detailed information about performance tuning considerations for specific operating
systems is provided in later chapters of this book.
The DS8900F supports up to 32 Fibre Channel (FC) host adapters, with four FC ports for
each adapter. Each port can be independently configured to support Fibre Channel
connection (IBM FICON) or Fibre Channel Protocol (FCP).
Each adapter type is available in both longwave (LW) and shortwave (SW) versions. The
DS8900F I/O bays support up to four host adapters for each bay, allowing up to 128 ports
maximum for each storage system. This configuration results in a theoretical aggregated host
I/O bandwidth of around 128 x 32 Gbps. Each port provides industry-leading throughput and I/O
rates for FICON and FCP.
The host adapters that are available in the DS8900F have the following characteristics:
32 Gbps FC HBAs (32 Gigabit Fibre Channel - GFC):
– Four FC ports
– FC Gen7 technology
– New IBM Custom ASIC with Gen3 PCIe interface
– Quad-core PowerPC processor
– Negotiation to 32, 16, or 8 Gbps; 4 Gbps or less is not possible
16 Gbps FC HBAs (16 GFC):
– Four FC ports
– Gen2 PCIe interface
– Quad-core PowerPC processor
– Negotiation to 16, 8, or 4 Gbps (2 Gbps or less is not possible)
The DS8900F supports a mixture of 32 Gbps and 16 Gbps FC adapters. Hosts with slower
FC speeds like 4 Gbps are still supported if their HBAs are connected through a switch.
The 32 Gbps FC adapter is encryption-capable. Encrypting the host bus adapter (HBA) traffic
usually does not cause any measurable performance degradation.
With FC adapters that are configured for FICON, the DS8900F series provides the following
configuration capabilities:
Fabric or point-to-point topologies.
A maximum of 128 host adapter ports, depending on the DS8900F system memory and
processor features.
A maximum of 509 logins for each FC port.
A maximum of 8192 logins for each storage unit.
A maximum of 1280 logical paths on each FC port.
An IBM Z server supports 32,768 devices per FICON host channel. To fully access 65,280
devices, it is necessary to connect multiple FICON host channels to the storage system. You
can access the devices through an FC switch or FICON director to a single storage system
FICON port.
The 32 GFC host adapter doubles the data throughput of 16 GFC links. When comparing the
two adapter types, the 32 GFC adapters provide I/O improvements in full adapter I/Os per
second (IOPS) and reduced latency.
The DS8000 storage system can support host and remote mirroring links by using
Peer-to-Peer Remote Copy (PPRC) on the same I/O port. However, it is preferable to use
dedicated I/O ports for remote mirroring links.
Planning and sizing the HAs for performance are not easy tasks, so use modeling tools, such
as IBM Storage Modeller (see 6.1, “IBM Storage Modeller” on page 126).
The factors that might affect the performance at the HA level are typically the aggregate
throughput and the workload mix that the adapter can handle. All connections on a HA share
bandwidth in a balanced manner. Therefore, host attachments that require maximum I/O port
performance should be connected to HAs that are not fully populated. You must allocate host
connections across I/O ports, HAs, and I/O enclosures in a balanced manner (workload
spreading).
Find further fine-tuning guidelines in 4.10.2, “Further optimization” on page 103.
Note: With DS8900F, Fibre Channel Arbitrated Loop (FC-AL) is no longer supported.
The next section describes best practices for implementing a switched fabric.
If an HA fails and starts logging in and out of the switched fabric, or a server must be restarted
several times, you do not want it to disturb the I/O to other hosts. Figure 8-1 shows zones that
include only a single HA and multiple DS8900F ports (single initiator zone). This approach is
the preferred way to create zones to prevent interaction between server HAs.
Tip: Each zone contains a single host system adapter with the wanted number of ports
attached to the DS8000 storage system.
By establishing zones, you reduce the possibility of interactions between system adapters in
switched configurations. You can establish the zones by using either of two zoning methods:
Port number
Worldwide port name (WWPN)
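For example, with WWPN-based zoning on a Brocade fabric (the syntax is vendor-specific, and the aliases, WWPNs, and configuration name are placeholders; other switch vendors provide equivalent commands), a single-initiator zone can be defined as follows:
alicreate "host1_fc0", "10:00:00:00:c9:aa:bb:cc"
alicreate "ds8k_I0000", "50:05:07:63:0a:00:01:23"
alicreate "ds8k_I0130", "50:05:07:63:0a:00:01:5c"
zonecreate "z_host1_fc0_ds8k", "host1_fc0; ds8k_I0000; ds8k_I0130"
cfgadd "prod_cfg", "z_host1_fc0_ds8k"
cfgenable "prod_cfg"
Each additional host system adapter gets its own zone, which keeps the single-initiator principle.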
Important: A DS8000 HA port configured to run with the FICON topology cannot be
shared in a zone with non-z/OS hosts. Ports with non-FICON topology cannot be shared in
a zone with z/OS hosts.
LUN masking
In FC attachment, LUN affinity is based on the WWPN of the adapter on the host, which is
independent of the DS8000 HA port to which the host is attached. This LUN masking function
on the DS8000 storage system is provided through the definition of DS8000 volume groups. A
volume group is defined by using the DS Storage Manager or DS8000 command-line
interface (DSCLI), and host WWPNs are connected to the volume group. The LUNs to be
accessed by the hosts that are connected to the volume group are defined to be in that
volume group.
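As a hedged illustration of this definition sequence in DSCLI script style (comments are marked with #; the volume group name, volume IDs, WWPN, host type, and the returned volume group ID V11 are placeholders, the -dev parameter might also be required, and the parameters should be verified against the DSCLI reference for your code level):
# Create a volume group that contains the LUNs that the host is allowed to access
mkvolgrp -type scsimask -volume 1000-1003 AIX_host1_vg
# Connect a host WWPN to that volume group (repeat for each host HBA WWPN)
mkhostconnect -wwname 10000000C9AABBCC -hosttype pSeries -volgrp V11 AIX_host1_fcs0
# Add more volumes to the group later if needed
chvolgrp -action add -volume 1004-1007 V11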
Although it is possible to limit through which DS8000 HA ports a certain WWPN connects to
volume groups, it is preferable to define the WWPNs to have access to all available DS8000
HA ports. Then, by using the preferred process of creating FC zones, as described in
“Importance of establishing zones” on page 190, you can limit the wanted HA ports through
the FC zones. In a switched fabric with multiple connections to the DS8000 storage system,
this concept of LUN affinity enables the host to see the same LUNs on different paths.
The number of times that a DS8000 logical disk is presented as a disk device to an open host
depends on the number of paths from each HA to the DS8000 storage system. The number
of paths from an open server to the DS8000 storage system is determined by these factors:
The number of HAs installed in the server
The number of connections between the SAN switches and the DS8000 storage system
The zone definitions created by the SAN switch software
Physical paths: Each physical path to a logical disk on the DS8000 storage system is
presented to the host operating system as a disk device.
(Diagram: a host with adapters FC 0 and FC 1 is connected through SAN switches A and B to DS8000 ports I0000 - I0330; Zone A contains FC 0 with DS8000_I0000 and DS8000_I0130, and Zone B contains FC 1 with DS8000_I0230 and DS8000_I0300.)
You can see how the number of logical devices that are presented to a host can increase
rapidly in a SAN environment if you are not careful about selecting the size of logical disks
and the number of paths from the host to the DS8000 storage system.
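As a hypothetical worked example of that growth (the numbers are placeholders only):
# Number of disk devices that the operating system sees for this host
luns=20          # DS8000 logical volumes assigned to the host
paths_per_lun=8  # host HBAs x zoned DS8000 ports per HBA
echo $(( luns * paths_per_lun ))   # 160 disk devices presented to the OS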
Typically, it is preferable to cable the switches and create zones in the SAN switch software for
dual-attached hosts so that each server HA has 2 - 4 paths from the switch to the DS8000
storage system. With hosts configured this way, you can allow the multipathing module to
balance the load across the four HAs in the DS8000 storage system.
Between 2 and 4 paths for each specific LUN in total are usually a good compromise for
smaller and mid-size servers. Consider eight paths if a high data rate is required. Zoning
more paths to a certain LUN, such as more than eight connections from the host to the
DS8000 storage system, does not improve SAN performance and can cause too many
devices to be presented to the operating system.
As illustrated in Figure 8-2 on page 193, attaching a host system by using a single-path
connection implements a solution that depends on several single points of failure. In this
example, a failure of a single link (either between the host system and the switch or between
the switch and the storage system), a failure of the HA on the host system or on the DS8000
storage system, or even a failure of the switch itself leads to a loss of access for the host
system. Additionally, the path performance of the whole system is limited by the slowest
component in the link.
Figure 8-2 SAN single-path connection
Adding more paths requires multipathing software (Figure 8-3). Otherwise, the same LUN
behind each path is handled as a separate disk by the operating system, which does not
allow failover support.
Multipathing provides the DS8000 attached Open Systems hosts that run Windows, AIX,
HP-UX, Oracle Solaris, VMware, KVM, or Linux with these capabilities:
Support for several paths per LUN.
Load balancing between multiple paths when there is more than one path from a host
server to the DS8000 storage system. This approach might eliminate I/O bottlenecks that
occur when many I/O operations are directed to common devices through the same I/O
path, thus improving the I/O performance.
Figure 8-3 Host system with a multipathing module and two paths (DS8000 ports I0001 and I0131) to one LUN
Important: Do not intermix several multipathing solutions within one host system. Usually,
the multipathing software solutions cannot coexist.
Multipath I/O
Multipath I/O (MPIO) summarizes native multipathing technologies that are available in
several operating systems, such as AIX, Linux, and Windows. Although the implementation
differs for each of the operating systems, the basic concept is almost the same:
The multipathing module is delivered with the operating system.
The multipathing module supports failover and load balancing for standard SCSI devices,
such as simple SCSI disks or SCSI arrays.
Example 8-1 shows a smaller AIX LPAR that has eight volumes belonging to two different
DS8000 storage systems. The lsmpio -q command lists the sizes and the DS8000 volume
serial numbers. lsmpio -a lists the WWPNs of the host adapters in use, which are the
WWPNs for which the volumes were made known to the DS8000. lsmpio -ar additionally
lists the WWPNs of the DS8000 host ports that are used and how many SAN paths are used
between a DS8000 port and the IBM Power server port.
On the DS8000, we can use lshostconnect -login to see that our server is logged in with its
adapter WWPNs and through which DS8000 ports.
With lsmpio -q -l hdiskX, we get additional information about one specific volume. As shown
in Example 8-2 on page 196, we see the capacity and which DS8000 storage system is
hosting that LUN. It is a 2107-998 model (DS8980F), and the WWPN of the DS8000 is also
encoded in the Volume Serial.
As Example 8-3 shows with the lsmpio -l hdiskX command, initially only one of the paths is
selected by the operating system for this LUN. lsattr -El shows that the algorithm for that
LUN is fail_over (and it also shows the serial number of the DS8000 that we use).
We can change this path algorithm with the chdev -l -a command, as in Example 8-4. For
this change to work, the hdisk must be taken offline.
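The following command sketch summarizes that sequence (hdisk2 and the shortest_queue algorithm are examples only; the available attribute values depend on the AIX level and path control module, and changing the algorithm might also require adjusting the reserve policy):
# Volume sizes and DS8000 volume serial numbers for all MPIO disks
lsmpio -q
# WWPNs of the host FC adapters, and additionally the DS8000 host port WWPNs
lsmpio -a
lsmpio -ar
# Details for one volume: capacity and the owning DS8000 storage system
lsmpio -q -l hdisk2
# Path states and the current path selection algorithm for that volume
lsmpio -l hdisk2
lsattr -El hdisk2 | grep -E "algorithm|reserve_policy"
# Change the path selection algorithm (the hdisk must not be in use)
chdev -l hdisk2 -a algorithm=shortest_queue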
[Link]
FCP: z/VM, z/VSE, and Linux for z Systems can also be attached to the DS8000 storage
system with FCP. Then, the same considerations as for Open Systems hosts apply.
8.3.1 FICON
FICON is a Fibre Connection used with z Systems servers. Each storage unit HA has either
four or eight ports, and each port has a unique WWPN. You can configure the port to operate
with the FICON upper-layer protocol. When configured for FICON, the storage unit provides
the following configurations:
Either fabric or point-to-point topology.
A maximum of 128 host ports for a DS8950F and DS8980F models and a maximum of 64
host ports for a DS8910F model.
A maximum of 1280 logical paths per DS8000 HA port.
Access to all 255 control unit images (65,280 count key data (CKD) devices) over each
FICON port. FICON HAs support 4, 8, 16, or 32 Gbps link speeds in DS8000 storage
systems.
Operating at 16 Gbps speeds, FICON Express16S channels achieve up to 2600 MBps for a
mix of large sequential read and write I/O operations, as shown in the following charts.
Figure 8-4 shows a comparison of the overall throughput capabilities of various generations
of channel technology.
Due to the implementation of a new application-specific integrated circuit (ASIC) and a new
internal design, the FICON Express16S+ and FICON Express16SA channels on IBM z15
represent a significant improvement in maximum bandwidth capability compared to FICON
Express16S channels and previous FICON offerings. The response time improvements are expected to be
noticeable for large data transfers. The FICON Express16SA channel operates at the same
16 Gbps line rate as the previous generation FICON Express16S+ channel. Therefore, the
FICON Express16SA channel provides similar throughput performance when compared to
the FICON Express16S+.
With the introduction of the FEx16SA channel, improvements can be seen in both response
times and maximum throughput for IO/sec and MB/sec for workloads using the zHPF protocol
on FEx16SA channels.
With Fibre Channel Endpoint Security encryption enabled, the FEx16SA channel has the
added benefit of Encryption of Data In Flight for FCP (EDIF) operations with less than 4%
impact on maximum channel throughput.
High Performance FICON for IBM Z (zHPF) optimizes throughput and latency by reducing
the number of information units (IUs) that are processed.
Enhancements to the z/Architecture and the FICON protocol provide optimizations for online
transaction processing (OLTP) workloads.
The size of most OLTP workload I/O operations is 4 KB. In laboratory measurements, a
FICON Express16SA channel, using the zHPF protocol and small data transfer I/O
operations, achieved a maximum of 310 KIO/sec. Also, a FICON Express16SA channel in a
z15, using the zHPF protocol and small data transfers with FC-ES encryption enabled,
achieved a maximum of 304 KIO/sec.
Figure 8-4 on page 199 shows the maximum single channel throughput capacities of each
generation of FICON Express channels supported on z15. For each of the generations of
FICON Express channels, it displays the maximum capability using the zHPF protocol
exclusively.
Figure 8-5 displays the maximum READ/WRITE (mix) MB/sec for each channel. Using
FEx16SA in a z15 with the zHPF protocol and a mix of large sequential read and update write
data transfer I/O operations, laboratory measurements achieved a maximum throughput of
3,200 READ/WRITE (mix) MB/sec. Also, using FEx16SA in a z15 with the zHPF protocol with
FC-ES Encryption and a mix of large sequential read and update write data transfer I/O
operations achieved a maximum throughput of 3,200 READ/WRITE (mix) MB/sec.
Figure 8-5 FICON Express Channels Maximum MB/sec over latest generations
The z14 and z15 CPCs offer FICON Express16S and FICON Express16S+ SX and LX
features with two independent channels. Each feature occupies a single I/O slot and uses one
CHPID per channel. Each channel supports 4 Gbps, 8 Gbps, and 16 Gbps link data rates with
auto-negotiation to support existing switches, directors, and storage devices.
For any generation of FICON channels, you can attach directly to a DS8000 storage system,
or you can attach through a FICON capable FC switch.
FICON topologies
As shown in Figure 8-6 on page 201, FICON channels in FICON native mode, which means
CHPID type FC in Input/Output Configuration Program (IOCP), can access the DS8000
storage system through the following topologies:
Point-to-Point (direct connection)
Switched Point-to-Point (through a single FC switch)
Cascaded FICON Directors (through two FC switches)
Figure 8-6 FICON topologies between z Systems and a DS8000 storage system
FICON connectivity
Usually, in IBM Z environments, a one-to-one connection between FICON channels and
storage HAs is preferred because the FICON channels are shared among multiple logical
partitions (LPARs) and heavily used. Carefully plan the oversubscription of HA ports to avoid
any bottlenecks.
Figure 8-7 on page 202 shows an example of FICON attachment that connects a z Systems
server through FICON switches. This example uses 16 FICON channel paths to eight HA
ports on the DS8000 storage system and addresses eight logical control units (LCUs). This
channel consolidation might be possible when your aggregate host workload does not exceed
the performance capabilities of the DS8000 HA.
A one-to-many configuration is also possible, as shown in Figure 8-8 on page 202, but careful
planning is needed to avoid performance issues.
For more information about DS8000 FICON support, see IBM System Storage DS8000 Host
Systems Attachment Guide, SC26-7917, and FICON Native Implementation and Reference
Guide, SG24-6266.
You can monitor the FICON channel utilization for each CHPID in the RMF Channel Path
Activity report. For more information about the Channel Path Activity report, see Chapter 10,
“Performance considerations for IBM z Systems servers” on page 225.
The following statements are some considerations and best practices for paths in z/OS
systems to optimize performance and redundancy:
Do not mix paths to one LCU with different link speeds in a path group on one z/OS.
It does not matter in the following cases, even if those paths are on the same CPC:
– The paths with different speeds, from one z/OS to multiple different LCUs
– The paths with different speeds, from each z/OS to one LCU
Place each path in a path group on different I/O bays.
Do not have two paths from the same path group sharing a card.
The FICON features support FC devices to z/VM, z/VSE, KVM, and Linux on z Systems,
which means that these features can access industry-standard SCSI devices. These FCP
storage devices use FB 512-byte sectors rather than Extended Count Key Data (IBM ECKD)
format for disk applications. All available FICON features can be defined in FCP mode.
The following IBM i specific features are important for the performance of external storage:
Single-level storage
Object-based architecture
Storage management
Types of storage pools
This section describes these features and explains how they relate to the performance of a
connected DS8900F storage system.
The IBM i system takes responsibility for managing the information in auxiliary disk pools.
When you create an object, for example, a file, the system places the file in the best location
that ensures the best performance. It normally spreads the data in the file across multiple
flash units. Advantages of such design are ease of use, self-management, automation of
using the added flash units, and so on. The IBM i object-based architecture is shown in
Figure 9-1 on page 207.
When the application performs an I/O operation, the portion of the program that contains read
or write instructions is first brought into LPAR main memory where the instructions are then
run.
With the read request, the virtual addresses of the needed record are resolved, and for each
needed page, storage management first looks to see whether it is in LPAR main memory. If
the page is there, it is used to resolve the read request. However, if the corresponding page is
not in LPAR main memory, a page fault is encountered and it must be retrieved from the
Auxiliary Storage Pool (ASP). When a page is retrieved, it replaces another page in LPAR
main memory that recently was not used; the replaced page is paged out to the ASP, which
resides inside the DS8900F storage server.
When resolving virtual addresses for I/O operations, storage management directories map
the flash and sector to a virtual address. For a read operation, a directory lookup is performed
to get the needed information for mapping. For a write operation, the information is retrieved
from the page tables.
System ASP
The system ASP is the basic flash pool for the IBM i system. This ASP contains the IBM i
system boot flash (load source device), system libraries, indexes, user profiles, and other
system objects. The system ASP is always present in the IBM i system and is needed for the
IBM i system to operate. The IBM i system does not start if the system ASP is inaccessible.
User ASP
A user ASP separates the storage for different objects for easier management. For example,
the libraries and database objects that belong to one application are in one user ASP, and the
objects of another application are in a different user ASP. If user ASPs are defined in the IBM i
system, they are needed for the IBM i system to start.
The DS8900F storage system can connect to the IBM i system in one of the following ways:
Native: FC adapters in the IBM i system are connected through a Storage Area Network
(SAN) to the Host Bus Adapters (HBAs) in the DS8900F storage system.
With Virtual I/O Server Node Port ID Virtualization (VIOS NPIV): FC adapters in the VIOS
are connected through a SAN to the HBAs in the DS8900F storage system. The IBM i
system is a client of the VIOS and uses virtual FC adapters; each virtual FC adapter is
mapped to a port in an FC adapter in the VIOS.
For more information about connecting the DS8900F storage system to the IBM i system
with VIOS_NPIV, see DS8000 Copy Services for IBM i with VIOS, REDP-4584, and IBM
System Storage DS8000: Host Attachment and Interoperability, SG24-8887.
With VIOS: FC adapters in the VIOS are connected through a SAN to the HBAs in the
DS8900F storage system. The IBM i system is a client of the VIOS, and virtual SCSI
adapters in VIOS are connected to the virtual SCSI adapters in the IBM i system.
For more information about connecting storage systems to the IBM i system with the
VIOS, see IBM i and Midrange External Storage, SG24-7668.
Most installations use the native connection of the DS8900F storage system to the IBM i
system or the connection with VIOS_NPIV.
IBM i I/O processors: The information that is provided in this section refers to connection
with IBM i I/O processor (IOP)-less adapters. For similar information about older
IOP-based adapters, see IBM i and IBM System Storage: A Guide to Implementing
External Disks on IBM i, SG24-7120.
Note: The supported FC adapters that are listed above require DS8900F R9.1 or higher
microcode and the IBM i 7.4 TR4 operating system installed on Power8 or Power9 servers.
For detailed specifications, see the IBM System Storage Interoperation Centre (SSIC) at:
[Link]
All listed adapters are IOP-less adapters. They do not require an I/O processor card to offload
the data management. Instead, the processor manages the I/O and communicates directly
with the adapter. Thus, the IOP-less FC technology takes full advantage of the performance
potential in the IBM i system.
Before the availability of IOP-less adapters, the DS8900F storage system connected to
IOP-based FC adapters that required the I/O processor card.
IOP-less FC architecture enables two technology functions that are important for the
performance of the DS8900F storage system with the IBM i system: Tag Command Queuing
and Header Strip Merge.
9.2.3 Multipath
The IBM i system allows multiple connections from different ports on a single IBM i partition to
the same LVs in the DS8900F storage system. This multiple connections support provides an
extra level of availability and error recovery between the IBM i system and the DS8900F
storage system. If one IBM i adapter fails, or one connection to the DS8900F storage system
is lost, you can continue using the other connections and continue communicating with the
disk unit. The IBM i system supports up to eight active connections (paths) to a single LUN in
the DS8900F storage system.
IBM i multi-pathing is built into the IBM i System Licensed Internal Code (SLIC) and does not
require a separate driver package to be installed.
In addition to high availability, multiple paths to the same LUN provide load balancing. A
Round-Robin algorithm is used to select the path for sending the I/O requests. This algorithm
enhances the performance of the IBM i system with DS8900F connected LUNs.
When the DS8900F storage system connects to the IBM i system through the VIOS,
Multipath in the IBM i system is implemented so that each path to a LUN uses a different
VIOS. Therefore, at least two VIOSs are required to implement Multipath for an IBM i client.
This way of multipathing provides additional resiliency if one VIOS fails. In addition to IBM i
Multipath with two or more VIOS, the FC adapters in each VIOS can multipath to the
connected DS8900F storage system to provide additional resiliency and enhance
performance.
The default and preferred RAID configuration for the DS8900F is now RAID 6. RAID 6 arrays
provide better resiliency than RAID 5 by using dual parity inside each array, which allows an
array that has two failed flash modules to be rebuilt.
Alternatively, RAID 10 provides better resiliency and in some cases enables better
performance than RAID 6. The difference in performance is because of the lower RAID
penalty that is experienced with RAID 10 compared to RAID 6. The workloads with a low
read/write ratio and with many random writes benefit the most from RAID 10.
Consider RAID 10 for IBM i systems especially for the following types of workloads:
Workloads with large I/O rates
Workloads with many write operations (low read/write ratio)
Workloads with many random writes
Workloads with low write-cache efficiency
When an IBM i page or a block of data is written to flash space, storage management spreads
it over multiple flash modules. By spreading data over multiple flash modules, multiple flash
arms work in parallel for any request to this piece of data, so writes and reads are faster.
When external storage is used with the IBM i system, storage management sees a logical
volume (LUN) in the DS8900F storage system as a “physical” flash module.
IBM i LUNs can be created with one of two Extent Allocation Methods (EAM) using the
mkfbvol command:
Rotate Volumes (rotatevols) EAM. This method occupies multiple stripes of a single rank.
Rotate Extents (rotateexts) EAM. This method is composed of multiple stripes of different
ranks.
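A hedged DSCLI sketch of both methods follows (script style, with # marking comments); the extent pool P1, volume IDs, capacity, names, and the A99 volume type are placeholders, and the exact parameters, in particular the valid -os400 values, should be verified against the DSCLI reference for your code level:
# Four IBM i LUNs striped across all ranks of extent pool P1 (rotate extents)
mkfbvol -extpool P1 -os400 A99 -cap 141 -eam rotateexts -name IBMi_#h 1000-1003
# The same size of LUNs placed rank by rank instead (rotate volumes)
mkfbvol -extpool P1 -os400 A99 -cap 141 -eam rotatevols -name IBMi_#h 1004-1007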
Figure 9-3 on page 213 shows the use of the DS8900F disk with IBM i LUNs created with the
rotateexts EAM.
Figure 9-3 Use of disk arms with LUNs created in the rotate extents method
Therefore, a LUN uses multiple DS8900F flash arms in parallel. The same DS8900F flash
arms are used by multiple LUNs that belong to the same IBM i workload, or even to different
IBM i workloads. To support this structure of I/O and data spreading across LUNs and flash
modules efficiently, it is important to provide enough disk arms to an IBM i workload.
Use the new StorM tool when planning the number of ranks in the DS8900F storage system
for an IBM i workload.
To provide a good starting point for the StorM modeler, consider the number of ranks that is
needed to keep disk utilization under 60% for your IBM i workload.
Table 9-1 shows the maximum number of IBM i I/Os per second for one rank to keep the disk utilization
under 60%, for the workloads with read/write ratios 70/30 and 50/50.
Use the following steps to calculate the necessary number of ranks for your workload by using
Table 9-1:
1. Decide which read/write ratio (70/30 or 50/50) is appropriate for your workload.
2. Decide which RAID level to use for the workload.
1 The calculations for the values in Table 9-1 are based on the measurements of how many I/O operations one rank
can handle in a certain RAID level, assuming 20% read cache hit and 30% write cache efficiency for the IBM i
workload. Assume that half of the used ranks have a spare and half are without a spare.
With IBM i, internal disks can be physical disk units or flash modules. With a connected
DS8900F storage system, usable capacity that is created from flash modules is presented as
a LUN. Therefore, it is important to provide many LUNs to an IBM i system.
Number of disk drives in the DS8900F storage system: In addition to the suggestion for
many LUNs, use a sufficient number of flash modules in the DS8900F storage system to
achieve good IBM i performance, as described in 9.3.2, “Number of ranks” on page 212.
Since the introduction of the IBM i 7.2 TR7 and 7.3 TR3 operating systems, up to 127 LUNs
are now supported per IBM i 16 Gb (or higher) physical or 8 Gb (or higher) virtual Fibre
Channel (FC) adapter.
For IBM i operating systems before IBM i 7.2 TR7 and 7.3 TR3, the limit on the number of
LUNs was 64 per FC adapter port.
Another reason why you should define smaller LUNs for an IBM i system is the queue depth
in Tagged Command Queuing. With a natively connected DS8900F storage system, an IBM i
system manages the queue depth of six concurrent I/O operations to a LUN. With the
DS8900F storage system connected through VIOS, the queue depth for a LUN is 32
concurrent I/O operations. Both of these queue depths are modest numbers compared to
other operating systems. Therefore, you must define sufficiently small LUNs for an IBM i
system to not exceed the queue depth with I/O operations.
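As a rough, hedged illustration of this consideration (the peak value is hypothetical and must come from your own measurements), the minimum number of LUNs that keeps the per-LUN queue depth from being exceeded can be estimated as follows:
# Peak number of concurrent I/O operations measured for the workload (example value)
peak_ios_in_flight=600
# Queue depth per LUN: 6 for native attachment, 32 when attached through VIOS
queue_depth=6
# Minimum number of LUNs so that the queue depth is not exceeded (rounded up)
echo $(( (peak_ios_in_flight + queue_depth - 1) / queue_depth ))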
Also, by considering the manageability and limitations of external storage and an IBM i
system, define LUN sizes of about 70.5 GB - 141 GB.
Note: For optimal performance, do not create LUNs of different sizes within the same ASP
or IASP; choose a suitable LUN capacity from the outset and keep to it.
Since the introduction of code bundle [Link], two new IBM i volume data types were
introduced to support variable volume sizes:
A50: an unprotected variable-size volume
A99: a protected variable-size volume
Using these volume data types provides greater flexibility in choosing an optimum LUN size
for your requirements.
The extent size depends on the extent type and on the type of operating system that uses
the extents, which in this case is IBM i with FB storage. The extents are the capacity building
blocks of the LUNs, and you can choose between large extents and small extents when
creating the ranks during the initial DS8900F configuration.
A Fixed Block (FB) rank can have an extent size of either:
Large extents of 1 GiB
Small extents of 16 MiB
To achieve more effective capacity utilization within the DS8900F, the SCSI Unmap
capability takes advantage of the small extents feature, and small extents are now the
recommended extent type for IBM i storage implementations.
You might think that the rotate volumes EAM for creating IBM i LUNs provides sufficient flash
modules for I/O operations and that the use of the rotate extents EAM is “over-virtualizing”.
However, based on the performance measurements and preferred practices, the rotate
extents EAM of defining LUNs for an IBM i system still provides the preferred performance, so
use it.
Sharing the ranks among the IBM i systems enables the efficient use of the DS8900F
resources. However, the performance of each LPAR is influenced by the workloads in the
other LPARs.
For example, two extent pools are shared among IBM i LPARs A, B, and C. LPAR A
experiences a long peak with large block sizes that causes a high I/O load on the DS8900F
storage system, which affects the performance of LPARs B and C during that time.
You cannot predict when the peaks in each LPAR happen, so you cannot predict how the
performance in the other LPARs is influenced.
Many IBM i data centers successfully share the ranks with little unpredictability in performance
because the flash modules and cache in the DS8900F storage system are used more
efficiently this way.
Other IBM i data centers prefer the stable and predictable performance of each system even
at the cost of more DS8900F resources. These data centers dedicate extent pools to each of
the IBM i LPARs.
Many IBM i installations have one or two LPARs with important workloads and several
smaller, less important LPARs. These data centers dedicate ranks to the large systems and
share the ranks among the smaller ones.
Take into account that a Fibre Channel port should not exceed 70% utilization when running
the peak workload. An IBM i client partition supports up to eight multipath connections to a
single flash module based LUN. Recommendations about how many Fibre Channel ports to
use can be found in the following publication:
Limitations and Restrictions for IBM i Client Logical Partitions:
[Link]
When implementing a multipath design for the IBM i and DS8900F, plan to zone one Fibre
Channel port in the IBM i system with one Fibre Channel port in the DS8900F storage system
when running the workload on flash modules.
Using the Service Tools Function within the IBM i partition, the number of paths to a LUN can
be seen:
1. On IBM i LPAR command Line type STRSST
2. Take Option 3 - Work With Disk Units
3. Take Option 2 - Work With Disk Configuration
4. Take Option 1 - Display Disk Configuration
5. Take Option 9 - Display Disk Path Status
Two disk units are shown (1 and 2) with two paths allocated to each. For a single-path disk,
the Resource Name column would show DDxxx. Instead, because each disk has more than
one path available, the Resource Name begins with DMP (Disk Multi Pathing), for example,
DMP001 and DMP003 for disk unit 1.
The IBM i multi-pathing driver supports up to 8 active paths (+8 standby paths for
HyperSwap) configurations to the DS8000 storage systems.
Since the introduction of operating system level IBM i 7.1 TR2, the multipathing driver uses
an advanced load balancing algorithm that accounts for path usage by the amount of
outstanding I/O per path.
From a performance perspective, the use of more than two or four active paths with DS8000
storage systems is typically not required.
To help you better understand the tool functions, they are divided into two groups:
performance data collectors (the tools that collect performance data) and performance data
investigators (the tools to analyze the collected data).
Collectors can be managed by IBM System Director Navigator for i, IBM System i Navigator,
or IBM i commands.
Most of these comprehensive planning tools address the entire spectrum of workload
performance on System i, including processor, system memory, disks, and adapters. To plan
or analyze performance for the DS8900F storage system with an IBM i system, use the parts
of the tools or their reports that show the disk performance.
Collection Services
Collection Services for IBM i can collect system and job level performance data. It can run all
the time and will sample system and job level performance data with collection intervals as
low as 15 seconds or alternatively up to an hour. It provides data for performance health
checks or analysis of a sudden performance problem. For detailed documentation, refer to:
[Link]
Collection Services can look at jobs, threads, processor, disk, and communications. It also
has a set of specific statistics for the DS8900F storage system. For example, it shows which
IBM i storage units are located within the DS8900F LUNs, whether they are connected in a
single path or multipath, the disk service time, and wait time.
The following tools can be used to manage the data collection and report creation of
Collection Services:
IBM System i Navigator
IBM Systems Director Navigator for i
IBM Performance Tools for i
iDoctor Collection Service Investigator can be used to create graphs and reports based on
Collection Services data. For more information about iDoctor, see the IBM i iDoctor online
documentation at:
[Link]
With IBM i level V7R4, the Collection Services tool offers additional data collection categories,
including a category for external storage. This category supports the collection of
nonstandard data that is associated with certain external storage subsystems that are
attached to an IBM i partition. This data can be viewed within iDoctor, which is described in
“iDoctor” on page 220.
Job Watcher
Job Watcher is an advanced tool for collecting and analyzing performance information to help
you effectively monitor your system or to analyze a performance issue. It is job-centric and
thread-centric and can collect data at intervals of seconds. Using the IBM i Job Watcher
functions and content requires the installation of IBM Performance Tools for i (5770-PT1)
Option 3 - Job Watcher.
Disk Watcher
Disk Watcher is a function of an IBM i system that provides disk data to help identify the
source of disk-related performance problems on the IBM i platform. Its functions require the
installation of IBM Performance Tools for i (5770-PT1) Option 1 Manager Feature.
Disk Watcher gathers detailed information that is associated with I/O operations to flash
modules and provides data beyond the data that is available in other IBM i integrated tools,
such as Work with Disk Status (WRKDSKSTS), Work with System Status (WRKSYSSTS), and Work
with System Activity (WRKSYSACT).
Performance Explorer
Performance Explorer is a data collection tool that helps the user identify the causes of
performance problems that cannot be identified by collecting data using Collection Services
or by doing general trend analysis.
[Link]
[Link]
Performance Tools helps you gain insight into IBM i performance features, such as dynamic
tuning, expert cache, job priorities, activity levels, and pool sizes. You can also identify ways
to use these services better. The tool also provides analysis of collected performance data
and produces conclusions and recommendations to improve system performance.
The Job Watcher part of Performance Tools analyzes the Job Watcher data through the IBM
Systems Director Navigator for i Performance Data Visualizer.
Collection Services reports about disk utilization and activity, which are created with IBM
Performance Tools for i, are used for sizing and Disk Magic modeling of the DS8900F storage
system for the IBM i system:
The Disk Utilization section of the System report
The Disk Utilization section of the Resource report
The Disk Activity section of the Component report
The Performance section of IBM Systems Director Navigator for i provides tasks to manage
the collection of performance data and view the collections to investigate potential
performance issues. Figure 9-4 on page 220 shows the menu of performance functions in the
IBM Systems Director Navigator for i.
iDoctor
iDoctor is a suite of tools that is used to manage the collection of data, investigate
performance data, and analyze performance data on the IBM i system. The goals of iDoctor
are to broaden the user base for performance investigation, simplify and automate processes
of collecting and investigating the performance data, provide immediate access to collected
data, and offer more analysis options.
The iDoctor tools are used to monitor the overall system health at a high level or to drill down
to the performance details within jobs, flash modules and programs. Use iDoctor to analyze
data that is collected during performance situations. iDoctor is frequently used by IBM, clients,
and consultants to help solve complex performance issues quickly. Further information about
these tools can be found at:
[Link]
IBM i Storage Manager spreads the IBM i data across the available flash modules (LUNs)
contained within an extent pool so that each flash module is about equally occupied. The data
is spread in extents that are 4 KB - 1 MB or even 16 MB. The extents of each object usually
span as many LUNs as possible so that many volumes serve the particular object. Therefore,
if an object experiences a high I/O rate, this rate is evenly split among the LUNs, and the
extents that belong to that object on each LUN become I/O-intense.
Many of the IBM i performance tools work on the object level; they show different types of
read and write rates on each object and disk service times on the objects. For more
information about the IBM i performance tools, see 9.4.1, “IBM i performance tools” on
page 217. You can relocate hot data by objects by using the Media preference method, which
is described in “IBM i Media preference” on page 222.
Also, the Easy Tier tool monitors and relocates data on the 1 GB extent level. IBM i ASP
balancing, which is used to relocate data to Flash, works on the 1 MB extent level. Monitoring
extents and relocating extents do not depend on the object to which the extents belong; they
occur on the subobject level.
ASP balancing
This IBM i method is similar to DS8900F Easy Tier because it is based on the data movement
within an ASP by IBM i ASP balancing. The ASP balancing function is designed to improve
system performance by balancing the utilization of the disk units within an ASP.
The HSM balancer function, which traditionally supports data migration between
high-performance and low-performance internal disk drives, is extended for the support of
data migration between Flash Modules and HDDs. The flash drives can be internal or on the
DS8900F storage system. The data movement is based on the weighted read I/O count
statistics for each 1 MB extent of an ASP. Data monitoring and relocation is achieved by the
following two steps:
1. Run the ASP balancer tracing function during the important period by using the TRCASPBAL
command. This function collects the relevant data statistics.
2. By using the STRASPBAL TYPE(*HSM) command, you move the data to Flash and HDD
based on the statistics that you collected in the previous step.
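As an illustration of these two steps, a command sequence like the following can be used.
The ASP number and the time limits are placeholders, so verify the parameter values against
the command documentation for your IBM i release:
   TRCASPBAL SET(*ON) ASP(1) TIMLMT(*NOMAX)   /* Start collecting statistics for ASP 1 */
   /* Run the important workload period, then end the trace                            */
   TRCASPBAL SET(*OFF) ASP(1)                 /* Stop the statistics collection        */
   STRASPBAL TYPE(*HSM) ASP(1) TIMLMT(*NOMAX) /* Migrate data based on the statistics  */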
The Media preference balancer function is the ASP balancing function that helps to correct
any issues with Media preference-flagged database objects or UDFS files not on their
preferred media type, which is either Flash or HDD, based on the specified subtype
parameter.
The function is started by the STRASPBAL TYPE(*MP) command with the SUBTYPE parameter
equal to either *CALC (for data migration to both Flash (SSD) and HDD), *SSD, or *HDD.
The ASP balancer migration priority is an option in the ASP balancer so that you can specify
the migration priority for certain balancing operations, including *HSM or *MP in levels of
either *LOW, *MEDIUM, or *HIGH, thus influencing the speed of data migration.
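For example, the following command (again with a placeholder ASP number) starts a Media
preference balance that lets the system migrate flagged data in both directions:
   STRASPBAL TYPE(*MP) SUBTYPE(*CALC) ASP(1) TIMLMT(*NOMAX)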
Location: For data relocation with Media preference or ASP balancing, the LUNs defined
on Flash and on HDD must be in the same IBM i ASP. It is not necessary that they are in
the same extent pool in the DS8900F storage system.
Additional information
For more information about the IBM i methods for hot-spot management, including the
information about IBM i prerequisites, refer to:
[Link]
A note about DS8000 sizing: z Systems I/O workload is complex. Use RMF data, StorM
and Disk Magic models for sizing, where appropriate.
For more information about StorM with IBM Z see “Modeling and sizing for IBM Z” on
page 132. For more information about the sizing tools, see Chapter 6.1, “IBM Storage
Modeller” on page 126.
Disk Magic for IBM is the legacy as-is performance modeling and hardware configuration
planning tool for legacy IBM data storage solutions and is not updated for product currency
effective January 1, 2020. Users will continue to have access to the Magic tools to support
legacy product sizings.
IBM z Systems® and the IBM System Storage DS8000 storage systems family have a long
common history. Numerous features were added to the whole stack of server and storage
hardware, firmware, operating systems, and applications to improve I/O performance. This
level of synergy is unique to the market and is possible only because IBM is the owner of the
complete stack. This chapter does not describe these features because they are explained in
other places. For an overview and description of IBM Z Systems synergy features, see IBM
DS8000 and IBM Z Synergy, REDP-5186.
Contact your IBM representative or IBM Business Partner if you have questions about the
expected performance capability of IBM products in your environment.
The data is then stored and processed in several ways. The ones that are described and used
in this chapter are:
Data that the three monitors collect can be stored as SMF records (SMF types 70 - 79) for
later reporting.
RMF Monitor III can write records to an in-storage buffer or into VSAM data sets.
The RMF Postprocessor is the function to extract historical reports for Monitor I data.
Other methods of working with RMF data, which are not described in this chapter, are:
The RMF Spread Sheet Reporter provides a graphical presentation of long-term
Postprocessor data. It helps you view and analyze performance at a glance or for system
health checks.
The RMF Distributed Data Server (DDS) supports HTTP requests to retrieve RMF
Postprocessor data. The data is returned as an XML document so that a web browser can
act as Data Portal to RMF data.
z/OS Management Facility provides the presentation for DDS data.
RMF XP enables cross-platform performance monitoring for installations running operating
systems other than z/OS, namely AIX on Power Systems, Linux on System x, and Linux on
z Systems.
The following sections describe how to gather and store RMF Monitor I data and then extract
it as reports using the RMF Postprocessor.
RMF Monitor I gatherer session options and write SMF record types
To specify which types of data RMF is collecting, you specify Monitor I session gatherer
options in the ERBRMFxx parmlib member.
Table 10-1 shows the Monitor I session options and associated SMF record types that are
related to monitoring I/O performance. The defaults are emphasized.
Table 10-1 Monitor I gatherer session options and write SMF record types
(Table columns: Activity; session option in the ERBRMFxx parmlib member; SMF record types
written for long-term Monitor I data.)
Note: The Enterprise Storage Server activity is not collected by default. Change this and
turn on the collection if you have DS8000 storage systems installed. It provides valuable
information about DS8000 internal resources.
Note: FCD performance data is collected from the FICON directors. You must have the
FICON Control Unit Port (CUP) feature licensed and installed on your directors.
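The following lines sketch the I/O-related Monitor I gatherer options that such an ERBRMFxx
member can contain. The list is illustrative and not a complete member; the comments
indicate the SMF record types that each option produces:
   CYCLE(1000)      /* Sample once per second (default)          */
   CHAN             /* Channel path activity     - SMF 73        */
   DEVICE(DASD)     /* DASD device activity      - SMF 74.1      */
   IOQ(DASD)        /* I/O queuing activity      - SMF 78.3      */
   CACHE            /* Cache activity            - SMF 74.5      */
   ESS              /* Enterprise Storage Server - SMF 74.8      */
   FCD              /* FICON Director statistics - SMF 74.7      */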
Certain measurements are performed by the storage hardware. The associated RMF record
types are 74.5, 74.7, and 74.8. They are marked with an (H) in Table 10-1 . They do not have
to be collected by each attached z/OS system separately; it is sufficient to get them from one
or, for redundancy reasons, two systems.
Note: Many customers who have several z/OS systems sharing disk systems typically
collect these records from two production systems that are always up and are not IPLed
(restarted) at the same time.
In the ERBRMFxx parmlib member, you also find a TIMING section, where you can set the
RMF sampling cycle. It defaults to 1 second, which should be good for most cases. The RMF
cycle does not determine the amount of data that is stored in SMF records.
To store the collected RMF data, you must make sure that the associated SMF record types
(70 - 78) are included in the SYS statement in the SMFPRMxx parmlib member.
You also must specify the interval at which RMF data is stored. You can either do this explicitly
for RMF in the ERBRMFxx parmlib member or use the system wide SMF interval. Depending
on the type of data, RMF samples are added up or averaged for each interval. The number of
SMF record types you store and the interval make up the amount of data that is stored.
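As a minimal, illustrative SMFPRMxx fragment (the SYS statement in a real member normally
contains more parameters), the following keeps the RMF record types and sets a 15-minute
global SMF interval:
   INTVAL(15)        /* Global SMF recording interval in minutes */
   SYNCVAL(00)       /* Synchronize intervals with the full hour */
   SYS(TYPE(70:78))  /* Keep the RMF record types 70 - 78        */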
For more information about setting up RMF and the ERBRMFxx and SMFPRMxx parmlib
members, see z/OS Resource Measurement Facility User's Guide, SC34-2664-40.
Important: The shorter your interval, the more accurate your data is. However, there is
always a trade-off between shorter intervals and the size of the SMF data sets.
IFASMFDP can also be used to extract and concatenate certain record types or time ranges
from existing sequential SMF dump data sets. For more information about the invocation and
control of IFASMFDP, see z/OS MVS System Management Facilities (SMF), SA38-0667-40.
To create meaningful RMF reports or analysis, the records in the dump data set must be
chronological. This is important if you plan to analyze RMF data from several LPARs. Use a
SORT program to combine the individual data sets and sort them by date and time.
Example 10-2 shows a sample job snippet with the required sort parameters by using the
DFSORT program.
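A minimal sketch of such a sort job follows. The data set names are placeholders, and the
SORT statement orders the records by the SMF date and time fields; verify the control
statements against the RMF user's guide for your release:
   //SMFSORT  EXEC PGM=SORT
   //SYSOUT   DD SYSOUT=*
   //SORTIN   DD DISP=SHR,DSN=SMF.SYSA.DUMPED
   //         DD DISP=SHR,DSN=SMF.SYSB.DUMPED
   //SORTOUT  DD DSN=RMF.SMF.SORTED,DISP=(NEW,CATLG),
   //            UNIT=SYSDA,SPACE=(CYL,(500,100))
   //SYSIN    DD *
     SORT FIELDS=(11,4,CH,A,7,4,CH,A),EQUALS
   /*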
RMF postprocessor
The RMF Postprocessor analyzes and summarizes RMF data into human-readable reports.
Example 10-3 shows a sample job to run the ERBRMFPP postprocessing program.
In the control statements, you specify the reports that you want to get by using the REPORTS
keyword. Other control statements define the time frame, intervals, and summary points for
the reports to create. For more information about the available control statements, see z/OS
Resource Measurement Facility User's Guide, SC34-2664-40.
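A minimal sketch of such a postprocessor step follows. The input data set name and the date
are placeholders; the REPORTS control statement requests the I/O-related reports that are
discussed in this chapter:
   //RMFPP    EXEC PGM=ERBRMFPP
   //MFPINPUT DD DISP=SHR,DSN=RMF.SMF.SORTED
   //MFPMSGDS DD SYSOUT=*
   //SYSIN    DD *
     DATE(12152021,12152021)
     REPORTS(CHAN,DEVICE(DASD),IOQ,CACHE,ESS,FCD)
     SUMMARY(INT)
   /*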
Note: You can also generate and start the postprocessor batch job from the ISPF menu of
RMF.
To get a first impression, you can sort volumes by their I/O intensity, which is the I/O rate
multiplied by Service Time (PEND + DISC + CONN component). Also, look for the largest
component of the response time. Try to identify the bottleneck that causes this problem. Do
not pay too much attention to volumes with low or no Device Activity Rate, even if they show
high I/O response time. The following sections provide more detailed explanations.
The device activity report accounts for all activity to a base and all of its associated alias
addresses. Activity on alias addresses is not reported separately, but accumulated into the
base address.
The Parallel Access Volume (PAV) value is the number of addresses assigned to a unit
control block (UCB), including the base address and the number of aliases assigned to that
base address.
RMF reports the number of PAV addresses (or in RMF terms, exposures) that are used by a
device. In a HyperPAV environment, the number of PAVs is shown in the format n.nH. The H
indicates that this volume is supported by HyperPAV. The n.n is a one-decimal number that
shows the average number of PAVs assigned to the address during the RMF report period.
Example 10-4 shows that address 7010 has an average of 1.5 PAVs assigned to it during this
RMF period. When a volume has no I/O activity, the PAV is always 1, which means that no
alias is assigned to this base address. In HyperPAV an alias is used or assigned to a base
address only during the period that is required to run an I/O. The alias is then released and
put back into the alias pool after the I/O is completed.
Important: The number of PAVs includes the base address plus the number of aliases
assigned to it. Thus, a PAV=1 means that the base address has no aliases assigned to it.
DEVICE AVG AVG AVG AVG AVG AVG AVG AVG % % % AVG %
STORAGE DEV DEVICE NUMBER VOLUME PAV LCU ACTIVITY RESP IOSQ CMR DB INT PEND DISC CONN DEV DEV DEV NUMBER ANY
GROUP NUM TYPE OF CYL SERIAL RATE TIME TIME DLY DLY DLY TIME TIME TIME CONN UTIL RESV ALLOC ALLOC
7010 33909 10017 ST7010 1.5H 0114 689.000 1.43 .048 .046 .000 .163 .474 .741 33.19 54.41 0.0 2.0 100.0
7011 33909 10017 ST7011 1.5H 0114 728.400 1.40 .092 .046 .000 .163 .521 .628 29.72 54.37 0.0 2.0 100.0
YGTST00 7100 33909 60102 YG7100 1.0H 003B 1.591 12.6 .000 .077 .000 .067 .163 12.0 .413 0.07 1.96 0.0 26.9 100.0
YGTST00 7101 33909 60102 YG7101 1.0H 003B 2.120 6.64 .000 .042 .000 .051 .135 6.27 .232 0.05 1.37 0.0 21.9 100.0
Figure 10-2 illustrates how these components relate to each other and to the common
response and service time definitions.
Before learning about the individual response time components, you should know about the
different service time definitions in simple terms:
I/O service time is the time that is required to fulfill an I/O request after it is dispatched to
the storage hardware. It includes locating and transferring the data and the required
handshaking.
I/O response time is the I/O service time plus the time the I/O request spends in the I/O
queue of the host.
System service time is the I/O response time plus the time it takes to notify the requesting
application of the completion.
Only the I/O service time is directly related to the capabilities of the storage hardware. The
additional components that make up I/O response or system service time are related to host
system capabilities or configuration.
The following sections describe all response time components in more detail. They also
provide possible causes for unusual values.
PEND time
PEND time represents the time that an I/O request waits in the hardware. It can become
increased by the following conditions:
High DS8000 HA utilization:
– An HA can be saturated even if the individual ports have not yet reached their limits.
HA utilization is not directly reported by RMF.
– The Command response (CMR) delay, which is part of PEND, can be an indicator for
high HA utilization. It represents the time that a Start- or Resume Subchannel function
needs until the device accepts the first command. It should not exceed a few hundred
microseconds.
– The Enterprise Storage Server report can help you further to find the reason for
increased PEND caused by a DS8000 HA. For more information, see “Enterprise
Storage Server” on page 261.
High FICON Director port utilization:
– High FICON Director port or DS8000 HA port utilization can occur due to a number of
factors:
• Multiple FICON channels connecting to the same outbound switch port, causing
oversubscription of the outbound port.
In this case, the FICON channel utilization as seen from the host might be low, but
the combination or sum of the utilization of these channels that share the outbound
port can be significant. A one-to-one mapping between one FICON channel and
one outbound switch port is recommended to avoid overutilization issues on the
outbound port.
• Faulty small form-factor pluggable (SFP) transceivers at the DS8000 or switch port
end where the DS8000 connects.
In this case, the SFP undergoes recovery to try to maintain a viable connection, which
increases the port utilization.
– The FICON Director report can help you isolate the ports that cause increased PEND.
For more information, see 10.1.6, “FICON Director Activity report” on page 237.
Device Busy (DB) delay: The time that an I/O request waits because the device is busy.
Today, mainly because of the Multiple Allegiance feature, DB delay is rare, but it can still
occur, for example, when another system holds a reserve on the device.
DISC time
DISC is the time that the storage system needs to process data internally. During this time, it
disconnects from the channel to free it for other operations:
The most frequent cause of high DISC time is waiting for data to be staged from the
storage back end into cache because of a read cache miss. This time can be elongated
because of the following conditions:
– Low read hit ratio. For more information, see 10.1.7, “Cache and NVS report” on
page 238. The lower the read hit ratio, the more read operations must wait for the data
to be staged from the DDMs to the cache. Adding/Increasing cache to the DS8000
storage system can increase the read hit ratio.
– High rank utilization. You can verify this condition with the Enterprise Storage Server
Rank Statistics report, as described in “Enterprise Storage Server” on page 261.
Heavy write workloads sometimes cause an NVS full condition. Persistent memory full
condition or NVS full condition can also elongate the DISC time. For more information, see
10.1.7, “Cache and NVS report” on page 238.
In a Metro Mirror environment, the time needed to send and store the data to the
secondary volume also adds to the DISCONNECT component.
CONN time
For each I/O operation, the channel subsystem measures the time that storage system,
channel, and CPC are connected for data transmission. CONN depends primarily on the
amount of data that is transferred per I/O. Large I/Os naturally have a higher CONN
component than small ones.
If there is a high level of utilization of resources, time can be spent in contention rather than
transferring data. Several reasons exist for higher than expected CONN time:
FICON channel saturation. CONN time increases if the channel or BUS utilization is high.
FICON data is transmitted in frames. When multiple I/Os share a channel, frames from an
I/O are interleaved with those from other I/Os. This elongates the time that it takes to
transfer all of the frames of that I/O. The total of this time, including the transmission time
of the interleaved frames, is counted as CONN time. For details and thresholds, see
“FICON channels” on page 247.
Contention in the FICON Director or at a DS8000 HA port can also affect CONN time,
although these resources primarily affect PEND time.
Tip: A CPENABLE=(5,15) setting for all z/OS LPARs running on z14 and later processors
is recommended.
[Link]
Note: I/O interrupt delay time is not counted as part of the I/O response time.
If the utilization (% IOP BUSY) is unbalanced and certain IOPs are saturated, it can help to
redistribute the channels assigned to the storage systems. An IOP is assigned to handle a
certain set of channel paths. Assigning all of the channels from one IOP to access a busy disk
system can cause a saturation on that particular IOP. For more information, see the hardware
manual of the CPC that you use.
Example 10-5 shows an I/O Queuing Activity report for the IOPs section.
In a SuperPAV environment, the AMG section of the I/O Queueing Activity report shows the
combined LCU performance attributed to each AMG. This report is to be used in conjunction
with the LCU section of the I/O Queuing Activity report. The LCU section shows a breakdown
of each LCU and the related AMG.
Example 10-6 on page 235 show an I/O Queuing Activity report with SuperPAV enabled,
AMGs formed and LCUs related to the AMGs.
---------------------------------------------------------------------------------------------------------------------------
LOGICAL CONTROL UNITS
---------------------------------------------------------------------------------------------------------------------------
AVG AVG DELAY AVG AVG DATA
LCU/ CU DCM GROUP CHAN CHPID % DP % CU CUB CMR CONTENTION Q CSS HPAV OPEN XFER
AMG MIN MAX DEF PATHS TAKEN BUSY BUSY DLY DLY RATE LNGTH DLY WAIT MAX EXCH CONC
015A 8201 82 0.030 0.00 0.00 0.0 0.0
00000007 85 0.020 0.00 0.00 0.0 0.0
83 0.030 0.00 0.00 0.0 0.0
86 0.023 0.00 0.00 0.0 0.0
* 0.103 0.00 0.00 0.0 0.0 0.000 0.00 0.1 0.000 0 0.00 0.00
015C 8401 82 0.010 0.00 0.00 0.0 0.0
00000007 85 0.010 0.00 0.00 0.0 0.0
83 0.013 0.00 0.00 0.0 0.0
86 0.013 0.00 0.00 0.0 0.0
* 0.047 0.00 0.00 0.0 0.0 0.000 0.00 0.1 0.000 0 0.00 0.00
Note: If the system is not running in SuperPAV mode or the conditions to enable SuperPAV
are not met, no AMGs can be formed and the AMG section is not shown in the I/O Queuing
Activity report.
In a HyperPAV environment, you can also check the usage of HyperPAV alias addresses.
Example 10-7 shows the LCU section of the I/O Queueing Activity report. It reports on
HyperPAV alias usage in the HPAV MAX column. Here, a maximum of 2 PAV alias addresses
were used for that LCU during the reporting interval. You can compare this value to the
number of aliases that are defined for that LCU. If all are used, you might experience delays
because of a lack of aliases.
The HPAV WAIT value also indicates this condition. It is calculated as the ratio of the number of
I/O requests that cannot start because no HyperPAV aliases are available to the total number
of I/O requests for that LCU. If it is nonzero in a significant number of intervals, you might
consider defining more aliases for this LCU.
Note: If your HPAV MAX value is constantly below the number of defined alias addresses,
you can consider unassigning some aliases and use these addresses for additional base
devices. Do this only if you are short of device addresses. Monitor HPAV MAX over an
extended period to ensure that you do not miss periods of higher demand for PAV.
Note: We explain this estimation with CHPID 30 in the example in Figure 10-3. The
SPEED value is 16 Gbps, which roughly converts to 1600 MBps. TOTAL READ is 50.76
MBps, which is higher than TOTAL WRITE. Therefore, the link utilization is
approximately 50.76 divided by 1600, which results in 0.032, or 3.2%.
Exceeding the thresholds significantly causes frame pacing, which eventually leads to higher
than necessary CONNECT times. If this happens only for a few intervals, it is most likely no
problem.
For small block transfers, the BUS utilization is less than the FICON channel utilization. For
large block transfers, the BUS utilization is greater than the FICON channel utilization.
Depending on the type of Fibre Channel host adapter installed in the DS8000 storage
system, the link between the director and the DS8000 storage system can run at 32 Gbps or
16 Gbps and negotiate down to 8 Gbps or 4 Gbps.
If the channel is point-to-point connected to the DS8000 HA port, the G field indicates the
negotiated speed between the FICON channel and the DS8000 port. The SPEED column
indicates the actual channel path speed at the end of the interval.
The RATE field in the FICON OPERATIONS or zHPF OPERATIONS columns means the
number of FICON or zHPF I/Os per second initiated at the physical channel level. It is not
broken down by LPAR.
Note: To get the data that is related to the FICON Director Activity report, the CUP device
must be online on the gathering z/OS system.
The measurements that are provided are on a director port level. It represents the total I/O
passing through this port and is not broken down by LPAR or device.
The important performance metric is AVG FRAME PACING. This metric shows the average
time (in microseconds) that a frame waited before it was transmitted. The higher the
contention on the director port, the higher the average frame pacing metric. High frame
pacing negatively influences the CONNECT time.
PORT ---------CONNECTION-------- AVG FRAME AVG FRAME SIZE PORT BANDWIDTH (MB/SEC) ERROR
ADDR UNIT ID SERIAL NUMBER PACING READ WRITE -- READ -- -- WRITE -- COUNT
05 CHP FA 0000000ABC11 0 808 285 50.04 10.50 0
07 CHP 4A 0000000ABC11 0 149 964 20.55 5.01 0
09 CHP FC 0000000ABC11 0 558 1424 50.07 10.53 0
0B CHP-H F4 0000000ABC12 0 872 896 50.00 10.56 0
12 CHP D5 0000000ABC11 0 73 574 20.51 5.07 0
13 CHP C8 0000000ABC11 0 868 1134 70.52 2.08 1
14 SWITCH ---- 0ABCDEFGHIJK 0 962 287 50.03 10.59 0
15 CU C800 0000000XYG11 0 1188 731 20.54 5.00 0
Note: Cache reports by LCU calculate the total activities of volumes that are online.
------------------------------------------------------------------------------------------------------------------------------------
CACHE SUBSYSTEM OVERVIEW
------------------------------------------------------------------------------------------------------------------------------------
TOTAL I/O 12180 CACHE I/O 12180
TOTAL H/R 1.000 CACHE H/R 1.000
CACHE I/O -------------READ I/O REQUESTS------------- ----------------------WRITE I/O REQUESTS---------------------- %
REQUESTS COUNT RATE HITS RATE H/R COUNT RATE FAST RATE HITS RATE H/R READ
NORMAL 6772 22.6 6770 22.6 1.000 3556 11.9 3556 11.9 3556 11.9 1.000 65.6
SEQUENTIAL 871 2.9 871 2.9 1.000 981 3.3 981 3.3 981 3.3 1.000 47.0
CFW DATA 0 0.0 0 0.0 N/A 0 0.0 0 0.0 0 0.0 N/A N/A
TOTAL 7643 25.5 7641 25.5 1.000 4537 15.1 4537 15.1 4537 15.1 1.000 62.8
-----------------------CACHE MISSES----------------------- ----------------MISC---------------
REQUESTS READ RATE WRITE RATE TRACKS RATE COUNT RATE
DELAYED DUE TO NVS 0 0.0
NORMAL 2 0.0 0 0.0 4 0.0 DELAYED DUE TO CACHE 0 0.0
SEQUENTIAL 0 0.0 0 0.0 0 0.0 DFW INHIBIT 0 0.0
CFW DATA 0 0.0 0 0.0 ASYNC (TRKS) 1093 3.6
TOTAL 2 0.0 0 0.0 4 0.0
---CKD STATISTICS--- ---RECORD CACHING-- -HOST ADAPTER ACTIVITY- --------DISK ACTIVITY-------
BYTES BYTES RESP BYTES BYTES
WRITE 3333 READ MISSES 0 /REQ /SEC TIME /REQ /SEC
WRITE HITS 3332 WRITE PROM 3449 READ 16.1K 410.7K READ 4.000 32.8K 437
WRITE 10.2K 154.2K WRITE 13.261 38.4K 104.0K
The report breaks down the LCU level I/O requests by read and by write. It shows the rate,
the hit rate, and the hit ratio of the read and the write activities. The read-to-write ratio is also
calculated.
Note: The total I/O requests here can be higher than the I/O rate shown in the DASD
report. In the DASD report, one channel program is counted as one I/O. However, in the
cache report, if there are multiple Locate Record commands in a channel program, each
Locate Record command is counted as one I/O request.
In this report, you can check the value of the read hit ratio. Low read hit ratios contribute to
higher DISC time. For a cache-friendly workload, you see a read hit ratio of better than 90%.
The write hit ratio is usually 100%.
Note: In cases with high DELAYED DUE TO NVS, you usually see high rank utilization in
the DS8000 rank statistics, which are described in 10.1.8, “Enterprise Disk Systems
report” on page 240.
The DISK ACTIVITY part of the report can give you a rough indication of the back-end
performance. This information complements the ESS report (see 10.1.8, “Enterprise Disk
Systems report” on page 240).
The report also shows sequential I/O (the SEQUENTIAL row) and random I/O (the NORMAL
row) for both read and write operations. These metrics can also help you analyze and identify
I/O bottlenecks.
Example 10-10 is the second part of the CACHE SUBSYSTEM ACTIVITY report, providing
measurements for each volume in the LCU. You can also see to which extent pool each
volume belongs.
------------------------------------------------------------------------------------------------------------------------------------
CACHE SUBSYSTEM DEVICE OVERVIEW
------------------------------------------------------------------------------------------------------------------------------------
VOLUME DEV XTNT % I/O ---CACHE HIT RATE-- ----DASD I/O RATE---- ASYNC TOTAL READ WRITE %
SERIAL NUM POOL I/O RATE READ DFW CFW STAGE DEL NVS OTHER RATE H/R H/R H/R READ
*ALL 100.0 7723 3869 3854 0.0 0.0 0.0 0.0 9323 1.000 1.000 1.000 50.1
AT8200 08200 0002 50.0 3858 3858 0.0 0.0 0.0 0.0 0.0 0.0 1.000 1.000 N/A 100.0
AT8201 08201 0002 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 N/A N/A N/A N/A
AT8202 08202 0002 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 N/A N/A N/A N/A
AT8203 08203 0002 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 N/A N/A N/A N/A
AT8230 08230 0002 0.7 55.1 0.1 55.1 0.0 0.0 0.0 0.0 55.1 1.000 1.000 1.000 0.1
AT8231 08231 0002 0.0 0.1 0.1 0.0 0.0 0.0 0.0 0.0 55.1 1.000 1.000 1.000 75.9
AT8232 08232 0002 0.7 55.1 0.1 55.1 0.0 0.0 0.0 0.0 55.1 1.000 1.000 1.000 0.1
AT8233 08233 0002 0.0 0.1 0.1 0.0 0.0 0.0 0.0 0.0 55.1 1.000 1.000 1.000 75.9
If you specify REPORTS(CACHE(DEVICE)) when running the cache report, you get the complete
report for each volume, as shown in Example 10-11. You get detailed cache statistics of each
volume. By specifying REPORTS(CACHE(SSID(nnnn))), you can limit this report to only certain
LCUs.
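For example, the following postprocessor control statements request the complete per-volume
cache report, or a cache report that is limited to a single LCU (the SSID value is a placeholder):
   REPORTS(CACHE(DEVICE))
   REPORTS(CACHE(SSID(8200)))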
Important: Enterprise Storage Server data is not gathered by default. Make sure that
Enterprise Storage Server data is collected and processed, as described in 10.1.1, “RMF
Overview” on page 226.
DS8000 rank
ESS RANK STATISTICS provides measurements of back-end activity on the extent pool and
RAID array (rank) levels, such as OPS/SEC, BYTES/OP, BYTES/SEC, and RTIME/OP, for
read and write operations.
Example 10-12 shows rank statistics for a system with multi-rank extent pools, which contain
High Performance and High Capacity Flash Enclosures (i.e., an All Flash storage system with
multiple tiers of flash).
Note: This report also shows the relationship of ranks to DA pairs (ADAPT ID). With all
Flash DS8000 storage systems, the MIN RPM value will be N/A.
If I/O response elongation and rank saturation are suspected, it is important to check IOPS
(OPS/SEC) and throughput (BYTES/SEC) for both read and write rank activities. Also,
determine what kind of workloads were running and what the ratio between them was. If your
workload is of a random type, the IOPS is the significant figure. If it is more sequential, the
throughput is more significant.
The rank response times (RTIME) can be indicators for saturation. In a balanced and sized
system with growth potential, the read and write response times are in the order of 1 - 2 ms
for HPFE ranks. Ranks with response times reaching the range of 3 - 5 ms for HPFE are
approaching saturation.
Note: Both Spectrum Control and Storage Insights can also provide relevant information.
For more information see: 6.2, “Data from IBM Spectrum Control and Storage Insights” on
page 127 and 7.2.1, “IBM Spectrum Control and IBM Storage Insights Pro overview” on
page 141.
For a definition of the HA port ID (SAID), see IBM DS8900F Architecture and Implementation,
SG24-8456.
The I/O INTENSITY is the result of multiplication of the operations per second and the
response time per operation. For FICON ports, it is calculated for both the read and write
operations, and for PPRC ports, it is calculated for both the send and receive operations. The
total I/O intensity is the sum of those two numbers on each port.
For FICON ports, the I/O intensity should be below 2000 for most of the time. With higher
values, the interface might start becoming ineffective. Much higher I/O intensities can affect
the response time, especially PEND and CONN components. With a value of 4000, it is
saturated. For more information, see the description of PEND and CONN times in “PEND
time” on page 232 and “CONN time” on page 233. This rule does not apply for PPRC ports,
especially if the distance between the primary and secondary sites is significant.
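As a purely illustrative calculation with hypothetical numbers (assuming that the response
time per operation is taken in milliseconds, as in the ESS link statistics):
   Read:  900 ops/sec x 1.2 ms/op = 1080
   Write: 400 ops/sec x 1.5 ms/op =  600
   Total I/O intensity            = 1680, which is still below the 2000 guideline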
Note: IBM Spectrum Control performance monitoring also provides a calculated port
utilization value. For more information, see 7.2.1, “IBM Spectrum Control and IBM Storage
Insights Pro overview” on page 141.
If you must monitor or analyze statistics for specific ports over time, you can generate an
overview report and filter for the subject port IDs. Provide post processor control statements
like the ones in Example 10-14 and you get an overview report, as shown in Example 10-15
on page 242.
[Link]
The DDS exploiters of Monitor III performance data include, among others, z/OS Capacity
Provisioning, z/OSMF, and RMF PM. If you want to monitor systems in a sysplex, you must set
up a Distributed Data Server (DDS) host session on the system in the sysplex with the
highest RMF release. If you want to monitor several sysplexes, each one needs to have an
active DDS. DDS supports the request of performance data in a z/OS environment via HTTP
requests.
With current z/OS versions, RMF Data Portal for z/OS is an additional exploiter of the RMF
DDS data. The RMF Data Portal for z/OS panel is shown in Figure 10-4 on page 243. The
web browser-based user interface allows access to cross-sysplex RMF Monitor III data and
RMF Postprocessor reports from a single point of control.
To access the RMF Data Portal for z/OS, after performing the proper customizations, point
your browser to:
[Link]
At the top of the RMF Data Portal for z/OS page, under the Explore link, you gain access to
the RMF Monitor III Data, Resources, Metrics, and to the RMF Postprocessor Reports.
Selecting the Metrics link, under the RMF Monitor III Data, you have the option to view the
Full RMF Reports and individual metrics with their description and explanation.
Under the RMF Post Processor Reports panel, you can run any of the reports available
(CACHE, CHAN, CPU, CRYPTO, DEVICE, EADM, ENQ, ESS, FCD, HFS, IOQ, OMVS,
PAGESP, PAGING, PCIE, SDELAY, XCF, CF, SDEVICE, WLMGL, and OVW) by providing the
filter options to tailor the display output. An explanation about the format of data and values to
be entered in the option fields is shown when you mouse over them. Figure 10-13 on
page 256 shows a PCIE report obtained from the RMF Data Portal for z/OS explore option.
Note: For systems in a z/OS environment, the DDS GPMSERVE gathers data from RMF
instances running on the Sysplex members. The HTTP API of the DDS can also serve
requests for AIX and Linux performance data, which are directed against an active
GPM4CIM instance. RMF XP supports the following operating systems:
AIX on System p.
Linux on System z®.
Linux on System x.
For more information about RMF Distributed Data Server (DDS) refer to:
Performance data in a z/OS environment can be retrieved by sending HTTP requests to the
DDS server on the monitored z/OS Sysplex. Here is an example request for a certain
performance metric for a specific resource; it requests the response time of volume SYSLIB
on system SYSA:
[Link]
Application programs can exploit the extended DDS HTTP API by sending standard URL
requests for historical RMF Postprocessor data. An example request for the Postprocessor
CPU and CRYPTO reports looks similar to the following:
[Link]
In most installations, access to historical data is needed for in-depth performance analysis.
This allows keeping track of whether a critical situation has been persistent or not. The
existing HTTP API of the RMF Distributed Data Server (DDS) already provides sysplex-wide
access to the data collected by RMF Monitor III. As mentioned, this API also grants instant
and easy access to RMF long-term historical data as reported by the RMF Postprocessor.
[Link]
In a Remote Mirror and Copy environment, you can use IBM Spectrum Control to monitor the
performance of the remote disk system if there is no z/OS system that accesses it to gather
RMF data.
In mixed (mainframe and distributed) environments, IBM Spectrum Control can also help to
gain a quick view of the complete loads which run on the DS8000.
For more information about IBM Spectrum Control, see 7.2.1, “IBM Spectrum Control and
IBM Storage Insights Pro overview” on page 141.
All of the above can be combined to develop a comprehensive overview of the replication
performance for a DS8000 storage system.
There are several products that are provided by other vendors that use RMF data to provide
performance analysis capabilities.
A z Systems workload is too complex to be estimated and described with only a few numbers
and general guidelines. Furthermore, the capabilities of the storage systems are not just the
sum of the capabilities of their components. Advanced technologies, such as zHyperLink and
zHyperWrite, can positively influence the complete solution's throughput and other functions,
such as point-in-time copies or remote replication, add additional workload.
Most of these factors can be accounted for by modeling the new solution with StorM or Disk
Magic. For z Systems, the modeling is based on RMF data and real workload characteristics.
You can compare the current to potential new configurations and consider growth (capacity
and workload) and the influence of Copy Services and other Z synergy features.
For more information about StorM and Disk Magic, see Chapter 6.1, “IBM Storage Modeller”
on page 126, which describes the following items:
Note: It is important, particularly for mainframe workloads, which often are I/O response
time sensitive, to observe the thresholds and limitations provided by the StorM and Disk
Magic model.
Easy Tier
Easy Tier is a performance enhancement to the DS8000 family of storage systems that helps
avoid issues that might occur if the back-end workload is not balanced across all the available
resources. For a description of Easy Tier, its capabilities, and how it works, see 1.2.4, “Easy
Tier” on page 11.
There are some IBM Redbooks publications that you can refer to if you need more information
about Easy Tier:
IBM DS8000 Easy Tier, REDP-4667-08
DS8870 Easy Tier Application, REDP-5014
IBM DS8870 Easy Tier Heat Map Transfer, REDP-5015
Easy Tier provides several capabilities that you can use to solve performance issues. Here
are some examples:
Manage skewed workload in multitier (hybrid) extent pools: If part of your back-end
storage is overutilized (hot), consider adding some more flash arrays to the affected pools
and let Easy Tier move the hot data to the faster resources.
Distribute workload across all resources within a storage tier: In both uniform or hybrid
extent pools, Easy Tier automatically and transparently moves data from heavily used
resources to less used ones within the same tier.
Give new workloads a good start: If you deploy a new application, you can manually
assign its storage space (volumes) to a specific tier by using the Easy Tier Application
feature. After the application starts and Easy Tier learns about its requirements, you can
switch it to automatic mode.
Add or remove resources: Using Easy Tier manual mode, you can transparently add new
resources (arrays) to an extent pool or remove them from it. Easy Tier automatically
redistributes the data for best performance so that you can optimize the use of the
back-end resources you have available.
The predominant feature of Easy Tier for a Hybrid storage system is to manage data
distribution in hybrid pools. For an all-flash system, the primary objective is to enable a
lower-cost configuration to perform equally to or better than a higher cost configuration. It is
important to know how much of each resource type is required to optimize performance while
keeping the cost as low as possible. Since the latency characteristics of different types of
Flash drives are very similar, the main purpose of Easy Tier is to avoid overloading the lower
tier especially with regards to large block and sequential writes.
Easy Tier is designed to provide a balanced and stable workload distribution. It aims at
minimizing the amount of data that must be moved as part of the optimization. To achieve this
task, it monitors data access permanently and establishes migration plans based on the
current workload.
FICON channels
The StorM model established for your configuration indicates the number of host channels,
DS8000 HAs, and HA FICON ports necessary to meet the performance requirements. The
modeling results are valid only if the workload is evenly distributed across all available
resources. It is your responsibility to make sure this really is the case.
In addition, there are certain conditions and constraints that you must consider:
IBM mainframe systems support a maximum of eight channel paths to each logical control
unit (LCU). A channel path is a connection from a host channel to a DS8000 HA port.
A FICON channel port can be shared between several z/OS images and can access
several DS8000 HA ports.
A DS8000 HA port can be accessed by several z/OS images or even multiple z Systems
servers.
A FICON host channel or DS8000 HA port has specific throughput capabilities that are not
necessarily equivalent to the link speed they support.
A z Systems FICON feature or DS8000 HA card has specific throughput capabilities that
might be less than the sum of all individual ports on this card. This is especially true for the
DS8000 8-port HA cards.
The following sections provide some examples that can help to select the best connection
topology.
The simplest case is that you must connect one host to one storage system. If the StorM
model indicates that you need eight or less host channels and DS8000 host ports, you can
use a one-to-one connection scheme, as shown in Figure 10-5.
Figure 10-5 One-to-one connection of z Systems FICON channel ports through a FICON fabric to DS8000 FICON HA ports
To simplify the figure, it shows only one fabric “cloud”, where you normally have at least two
for redundancy reasons. The orange lines directly connect one port to another through the
fabric, which can range from a direct connection without any switch components to a
cascaded FICON configuration.
Figure 10-6 Split LCUs into groups and assign them to connections
Therefore, you can have up to eight connections and host and storage ports in each group. It
is your responsibility to define the LCU split in a way that each group gets the amount of
workload that the assigned number of connections can sustain.
Note: StorM models can be created on several levels. To determine the best LCU split, you
might need LCU level modeling.
Therefore, in many environments, more than one host system shares the data and accesses
the same storage system. If eight or less storage ports are sufficient according to your StorM
model, you can implement a configuration as shown in Figure 10-7, where the storage ports
are shared between the host ports.
Figure 10-7 Several hosts accessing the same storage system - sharing all ports
Figure 10-8 Several hosts accessing the same storage system - distributed ports
As you split storage system resources between the host systems, you must ensure that you
assign a sufficient number of ports to each host to sustain its workload.
Figure 10-9 shows another variation, where more than one storage system is connected to a
host. In this example, all storage system ports share the host ports. This works well if the
StorM model does not indicate that more than eight host ports are required for the workload.
Figure 10-9 One z Systems host accessing several storage systems - sharing all host ports
Another way of splitting HA ports is by LCU, in a similar fashion as shown in Figure 10-6 on
page 248. As in the earlier examples with split resources, you must make sure that you
provide enough host ports for each storage system to sustain the workload to which it is
subject.
Mixed workload: A DS8000 HA has several ports. You can set each individual port to run
FICON or FCP topology. There is nothing wrong with using an HA for FICON, FCP, and
remote replication connectivity concurrently. However, if you need the highest possible
throughput or lowest possible response time for a given topology, consider isolating this
topology on separate HAs.
Remote replication: To optimize the HA throughput if you have remote replication active,
consider not sharing HA ports for the following items:
Synchronous and asynchronous remote replication
FCP host I/O and remote replication
zHPF Extended Distance II: With DS8000 and IBM z Systems current models, zHPF is
further improved to deliver better response times for long write operations, as used, for
example, by DB2 utilities, at greater distances. It reduces the required round trips on the
FICON link. This benefits environments that are IBM HyperSwap enabled and where the
auxiliary storage system is further away.
zHyperLink: Introduced with IBM z14, zHyperLink provides a short distance direct
connection of up to 150 meters (492 feet) to the DS8000. zHyperLink introduced a new
synchronous I/O paradigm that eliminates z/OS dispatcher delays, I/O interrupt
processing, and the time needed to reload the processor caches that occurs after
regaining control of the processor when I/O completes. See “IBM zHyperLink” on
page 254.
Logical configuration
This section provides z Systems specific considerations for the logical configuration of a
DS8000 storage system. For a detailed and host system independent description of this topic,
see Chapter 4, “Logical configuration performance considerations” on page 59.
Note: The DS8000 architecture is symmetrical, based on the two CPCs. Many resources,
like cache, device adapters (DAs), and RAID arrays, become associated to a CPC. When
defining a logical configuration, you assign each array to one of the CPCs. Make sure to
spread them evenly, not only by count, but also by their capabilities. The I/O workload must
also be distributed across the CPCs as evenly as possible.
The preferred way to achieve this situation is to create a symmetrical logical configuration.
A fundamental question that you must consider is whether there is any special workload that
must be isolated, either because it has high performance needs or because it is of low
importance and should never influence other workloads.
The major groups of resources in a DS8000 storage system are the storage resources, such
as RAID arrays and DAs, and the connectivity resources, such as HAs. The traditional
approach in mainframe environments to assign storage resources is to divide them into the
smallest possible entity (single rank extent pool) and distribute the workload either manually
or managed by the Workload Manager (WLM) and System Managed Storage (SMS) as
evenly as possible. This approach still has its advantages:
RMF data provides granular results, which can be linked directly to a resource.
If you detect a resource contention, you can use host system tools to fix it, for example, by
moving a data set to a different volume in a different pool.
It is easy to detect applications or workloads that cause contention on a resource.
Isolation of critical applications is easy.
On the contrary, this approach comes with some significant disadvantages, especially with
modern storage systems that support automated tiering and autonomous balancing of
resources:
Statically assigned resources might be over- or underutilized for various reasons:
– Monitoring is infrequent and only based on events or issues
– Too many or too few resources are allocated to certain workloads
– Workloads change without resources being adapted
All rebalancing actions can be performed only on a volume level
Modern automatic workload balancing methods cannot be used:
– Storage pool striping
– Easy Tier automatic tiering
– Easy Tier intra-tier rebalancing
For the host connectivity resources (HAs and FICON links), similar considerations apply. You
can share FICON connections by defining them equally for all accessed LCUs in the
z Systems I/O definitions. That way, the z Systems I/O subsystem takes care of balancing the
load over all available connections. If there is a need to isolate a certain workload, you can
define specific paths for their LCUs and volumes. This is a normal requirement if the storage
system is used in a multi-tenancy environment.
Volume sizes
The DS8000 storage system now supports CKD logical volumes of any size 1 - 1182006
cylinders, which is 1062 times the capacity of a 3390-1 (1113 cylinders).
Note: The DS8000 storage system provides options for allocating storage with a
granularity of:
Large extents, which are the equivalent of 1113 cylinders.
Small extents, which are 21 cylinders.
When planning the CKD volume configuration and sizes, a key factor to consider is the limited
number of devices a z/OS system can address within one Subchannel Set (65,535). You must
define volumes with enough capacity so that you satisfy your storage requirements within this
supported address range, including room for PAV aliases and future growth.
Apart from saving device addresses, using large volumes brings additional benefits:
Simplified storage administration
Reduced number of X37 abends and allocation failures because of larger pools of free
space
Reduced number of multivolume data sets to manage
One large volume performs the same as though you allocated the same capacity in several
smaller ones, if you use the DS8000 built-in features to distribute and balance the workload
across resources. There are two factors to consider to avoid potential I/O bottlenecks when
using large volumes:
Use HyperPAV/SuperPAV to reduce IOSQ.
With equal I/O density (I/Os per GB), the larger a volume, the more I/Os it gets. To avoid
excessive queuing, the use of PAV is of key importance. With HyperPAV, you can reduce
the total number of alias addresses because they are assigned automatically as needed.
SuperPAV goes one step further by using aliases among "like" (=same DS8000 cluster
affinity) control units. For more information about the performance implications of PAV, see
“Parallel Access Volumes” on page 253.
PAV is implemented by defining alias addresses to the conventional base address. The alias
address provides the mechanism for z/OS to initiate parallel I/O to a volume. An alias is
another address/UCB that can be used to access the volume that is defined on the base
address. An alias can be associated with a base address that is defined in the same LCU
only. The maximum number of addresses that you can define in an LCU is 256. Theoretically,
you can define one base address plus 255 aliases in an LCU.
Aliases are initially defined to be associated to a specific base address. In a traditional static
PAV environment, the alias is always associated with the same base address, which requires
many aliases and manual tuning.
In Dynamic PAV or HyperPAV environments, an alias can be reassigned to any base address
as your needs dictate. Therefore, you need less aliases and no manual tuning.
With dynamic PAV, the z/OS WLM takes care of the alias assignment. Therefore, it
determines the need for additional aliases at fixed time intervals and adapts to workload
changes rather slowly.
The more modern approach of HyperPAV assigns aliases in real-time, based on outstanding
I/O requests to a volume. The function is performed by the I/O subsystem with the storage
system. HyperPAV reacts immediately to changes. With HyperPAV, you achieve better
average response times and higher total throughput. Today, there is no technical reason
anymore to use either static or dynamic PAV.
With the introduction of SuperPAV (an extension of HyperPAV), the pool of aliases has
increased since aliases of “like” control units can be used to drastically reduce IOS queue
time (IOSQ). The word “like” means that the even LCUs go with other even LCUs, and the odd
LCUs go with other odd LCUs. Therefore all aliases in the even LCUs will be pooled together
and can be used by any even LCU when required. The same paradigm will apply to the odd
LCUs.
You can check the usage of alias addresses by using RMF data. Chapter 10.1.4, “I/O
Queuing Activity report” on page 234 provides an overview of SuperPAV and also provides
sample reports showing SuperPAV reporting. You can use such reports to determine whether
you assigned enough alias addresses for an LCU.
Number of aliases: With HyperPAV/SuperPAV, you need fewer aliases than with the older
PAV algorithms. Assigning 32 aliases per LCU is a good starting point for most workloads.
It is a preferred practice to leave a certain number of device addresses in an LCU initially
unassigned in case it turns out that a higher number of aliases is required.
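As an illustration, the base and alias split for one LCU can be defined through HCD or with
IOCP statements like the following sketch. The device numbers, the CUNUMBR value, and
the counts are placeholders, and additional keywords (such as UNITADD) are omitted:
    IODEVICE ADDRESS=(7000,224),CUNUMBR=(7000),UNIT=3390B    Base devices
    IODEVICE ADDRESS=(70E0,032),CUNUMBR=(7000),UNIT=3390A    32 HyperPAV aliases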
IBM zHyperWrite
IBM zHyperWrite is a technology that is provided by the DS8000 storage system, and used by
DFSMS to accelerate Db2 and IMS log writes in a HyperSwap enabled Metro Mirror
environment. The environment can be managed by GDPS or CSM. zHyperWrite is also
supported in Multi-Target Metro Mirror (MTMM) environments and can be used in each
relationship independently, depending on the state of the relationship.
When an application sends a write I/O request to a volume that is in synchronous data
replication, the response time is affected by the latency caused by the replication
management. The distance between the source and the target disk control units also adds to
the latency.
With zHyperWrite, database log writes for Db2 and IMS are not replicated by PPRC. Instead,
with the help of Media Manager (DFSMS), the writes are written to the primary and secondary
volumes simultaneously by the host itself. The application, DFSMS, the I/O subsystem, and
the DS8000 storage system are closely coordinating the process to maintain data
consistency.
Note: At the time of writing, only Db2 and IMS use zHyperWrite for log writes.
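To illustrate the host-driven parallel write idea described above, the following conceptual Python sketch compares a serialized write-then-replicate sequence with two writes issued in parallel. The write_volume function and its latencies are purely illustrative assumptions; no replication or DFSMS interfaces are involved.

```python
# Conceptual sketch only (no real replication involved): contrasts the latency
# of a serialized synchronous-mirroring write with a host-driven parallel write
# to the primary and secondary volumes, which is the idea behind zHyperWrite
# for log writes. The latencies are illustrative assumptions.

import time
from concurrent.futures import ThreadPoolExecutor

def write_volume(name: str, latency_s: float) -> str:
    time.sleep(latency_s)            # stand-in for the I/O service time
    return name

def serialized_mirror_write() -> float:
    start = time.perf_counter()
    write_volume("primary", 0.02)    # primary write ...
    write_volume("secondary", 0.02)  # ... plus the replication latency on top
    return time.perf_counter() - start

def parallel_log_write() -> float:
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(write_volume, "primary", 0.02),
                   pool.submit(write_volume, "secondary", 0.02)]
        for f in futures:
            f.result()               # wait for both copies, as the host does
    return time.perf_counter() - start

print(f"serialized: {serialized_mirror_write():.3f}s, parallel: {parallel_log_write():.3f}s")
```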
IBM zHyperLink
zHyperLink is an ultra-low-latency, synchronous I/O interface that is point-to-point connected
(up to a maximum cable length of 150 meters or 492 feet) to an IBM Z server (z14
onward). Although this feature represents a substantial leap over zHPF, it does not replace
FICON. Instead, zHyperLink works with FICON and zHPF to bring about significant
application latency reductions.
Low latencies are provided for read and write operations for storage systems by using a
point-to-point link from IBM Z to the storage system’s I/O bay. Low I/O latencies deliver value
through improved workload elapsed times and faster transactional response times. The
DS8000 implementation of zHyperLink I/O delivers service times fast enough to enable a
synchronous I/O model in high-performance IBM Z servers. zHyperLink speeds up Db2 for z/OS
transaction processing and improves log throughput.
Synchronous I/O, in a nutshell, means that the entire path handling an I/O request stays
within the process context that initiated the I/O. When synchronous I/O is performed, the CPU
waits or “spins” until the I/O is completed or the time-out value is reached. zHyperLink can
significantly reduce the time that is required to complete the I/O because the dispatching,
interrupt handling, CPU queue time, and CPU cache reload activities are no longer necessary.
zHyperLink is fast enough that the CPU can simply wait for the data:
No un-dispatch of the running task.
No CPU queueing delays to resume it.
No host CPU cache disruption.
Very small I/O service time.
The operating system and middleware (for example, Db2) are changed to keep running over an I/O.
For more information about zHyperLink, see Getting Started with IBM zHyperLink for z/OS,
REDP-5493.
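The following minimal Python sketch models the spin-until-complete-or-time-out behavior described above. The io_complete callback and the time-out value are illustrative assumptions; this is a conceptual model, not z/OS code.

```python
# Minimal sketch, not z/OS code: models the synchronous I/O idea described
# above, where the requesting context simply polls ("spins") until the I/O
# completes or a time-out is reached.

import time

def synchronous_read(io_complete, timeout_us: int = 100):
    """Spin until io_complete() returns True or the time-out expires."""
    deadline = time.perf_counter() + timeout_us / 1_000_000
    while time.perf_counter() < deadline:
        if io_complete():
            return "data"            # stayed in the caller's context the whole time
    # On a real system, the request would be redriven over the asynchronous
    # (FICON/zHPF) path after a time-out; here we just report it.
    return None

# Example: an I/O that "completes" on the third poll.
polls = iter([False, False, True])
print(synchronous_read(lambda: next(polls, True)))
```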
Figure 10-11 RMF Postprocessor Device Activity Report with I/O devices performing Synchronous I/O
The Synchronous I/O Device Activity report is available if at least one DASD device actively
performed synchronous I/O requests by using IBM zHyperLink. Detailed synchronous I/O
performance statistics for the marked DASD devices are displayed in the Synchronous I/O
Device Activity section of the report, as shown in Figure 10-12 on page 256. For easy
comparison, this report lists the devices’ activity rate and average response time for
asynchronous I/O requests adjacent to the appropriate measurements for synchronous I/O
read and write requests, and provides additional rates and percentages that relate to
synchronous I/O processing. A device with synchronous I/O activity can be mapped back to
the synchronous I/O link through which it is reached by looking up the serial number and node
descriptor information of the device’s storage controller in the RMF Cache Subsystem Device
Overview report.
RMF collects and reports the new zHyperLink synchronous I/O statistics in the following reports:
1. PCIE Activity report (Figure 10-13)
– The RMF Monitor III PCIE Activity report is generated by using SMF record type 74-9.
2. Synchronous I/O Link Activity (Figure 10-14 on page 257)
– The synchronous I/O statistics are stored in SMF 74-1 and SMF 74-9 records.
3. Synchronous I/O Response Time Distribution (Figure 10-15 on page 257)
Figure 10-13 shows the output of a PCIE report that was extracted by using the RMF Data Portal for z/OS.
For more information, see “RMF Data Portal for z/OS” on page 242.
Note: Values at the CPC level are reported only if Global Performance Reporting is
enabled in the LPAR image profile of the Hardware Management Console (HMC).
Other buckets: Percentage of I/Os with a response time less than n microseconds and greater
than or equal to the prior bucket's time limit.
Example: A value of 17.5 for % Read <30 means that 17.5 percent of the read I/Os had a response
time of greater than or equal to 20 microseconds but less than 30 microseconds. Refer to
Figure 10-15.
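To make the bucket semantics concrete, the following minimal Python sketch computes such a distribution from a list of sample response times. The bucket boundaries and sample values are illustrative assumptions, not taken from an RMF report.

```python
# Minimal sketch (bucket boundaries are illustrative, chosen to match the
# 20/30-microsecond example above): computes the percentage of I/Os whose
# response time is below each boundary and at or above the previous one,
# which is how the report buckets are to be read.

def bucket_percentages(response_times_us, boundaries_us=(10, 20, 30, 50, 100)):
    total = len(response_times_us)
    lower = 0
    result = {}
    for upper in boundaries_us:
        hits = sum(1 for t in response_times_us if lower <= t < upper)
        result[f"<{upper}"] = 100.0 * hits / total
        lower = upper
    result[f">={boundaries_us[-1]}"] = 100.0 * sum(
        1 for t in response_times_us if t >= boundaries_us[-1]) / total
    return result

# Example: a value of 17.5 for "<30" would mean 17.5% of I/Os took >=20 and <30 us.
print(bucket_percentages([12, 18, 22, 25, 28, 35, 60, 110]))
```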
The data for the RMF Postprocessor Enterprise Disk Systems (ESS) report is collected by RMF
Monitor I. The ESS Synchronous I/O Link Statistics section of this report provides new
performance metrics for the IBM zHyperLink connections to the DS8000. Figure 10-16 on
page 258 shows a hardware (DS8000) view of synchronous I/O activity over the zHyperLink.
The RMF Postprocessor Cache Subsystem Activity (CACHE) report is enhanced to provide
performance metrics for synchronous I/O over IBM zHyperLink. Synchronous I/O
performance statistics are displayed in the Cache Subsystem Overview and Cache Device
Activity sections of the report as shown in Figure 10-17. A set of new OVW conditions was
introduced which can be used to report the new synchronous I/O metrics in SMF 74-5 and
SMF 74-8 records.
Figure 10-17 Cache Subsystem Activity / Cache Subsystem Overview showing Synch I/O Activity
This is only an introduction. It cannot replace a thorough analysis by IBM Technical Support in
more complex situations or if there are product issues.
Being responsible for the I/O part of your infrastructure, you must translate such general
statements into I/O terms and discover whether the problem is I/O related. You need more
information from several sources. First, get as much detail as possible about how the issue
appears. Ask the client or user who reports the problem:
How is the issue perceived?
At which times does it occur?
Is it reproducible?
Does it show up under specific circumstances?
What kind of workload is running at the time?
Was there already any analysis that links the issue to I/O? If yes, get the details.
Was anything changed, either before or after the issue started to appear?
Next, gather performance data from the system. For I/O-related investigations, use RMF data.
For a description of how to collect, process, and interpret RMF data, see 10.1, “DS8000
performance monitoring with RMF” on page 226. There might be a large amount of data that you must
analyze. The faster you can isolate the issue up front, both from a time and a device point of
view, the more selective your RMF analysis can be.
For more information about alternative and supplementary tools, see 10.1.9, “Alternatives and supplements to
RMF” on page 242.
To match physical resources to logical devices, you also need the exact logical configuration
of the affected DS8000 storage systems, and the I/O definition of the host systems you are
analyzing.
The following sections point you to some key metrics in the RMF reports, which might help
you isolate the cause of a performance issue.
The RMF Summary Report gives an overview of the system load. From an I/O perspective,
it provides only the LPAR-wide I/O rate and average DASD response time. Use it to get an overall
impression and to discover periods of peak I/O load or response time. See whether these
periods match the times for which the performance problems are reported.
Attention: Because the summary report provides a high-level overview, there might be
issues with individual components that are not directly visible here.
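If the summary data is exported for further processing, a simple script can help to spot the peak periods. The following Python sketch uses made-up interval data; the data layout is an assumption for illustration and does not represent an RMF format.

```python
# Minimal sketch (the interval data layout is an assumption; RMF data would
# normally be post-processed or exported first): finds the intervals with the
# highest I/O rate or DASD response time so that they can be compared with the
# times at which the problem was reported.

intervals = [
    # (interval start, I/O rate in IO/s, avg DASD response time in ms) - sample data
    ("09:00", 42_000, 0.6),
    ("09:15", 55_000, 0.7),
    ("09:30", 61_000, 1.9),   # candidate peak
    ("09:45", 48_000, 0.8),
]

def peak_intervals(data, top=2):
    by_rate = sorted(data, key=lambda r: r[1], reverse=True)[:top]
    by_resp = sorted(data, key=lambda r: r[2], reverse=True)[:top]
    return {"highest I/O rate": by_rate, "highest response time": by_resp}

for label, rows in peak_intervals(intervals).items():
    print(label, [r[0] for r in rows])
```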
After you isolate a certain time and a set of volumes that are conspicuous, you can analyze
further. Discover which of the response time components are higher than usual.
If you also need this information on a Sysplex scope, create the Shared Direct Access Device
Activity report. It provides a similar set of measurements for each volume by LPAR and also
summarized for the complete Sysplex.
Important: Most of the time, devices with no or almost no activity should not be
considered. Their response time values are not relevant and might be inaccurate. This rule
can also be applied to all other reports and measurements.
The exception is when the issue is outside the storage system, such as on the
SAN/switching layer; then, the device utilization can show up as very low or no activity. In
this case, the SAN/switching layer must be interrogated to identify the cause of the low
utilization. Port statistics on the switch provide clues if error counters are incrementing
at a substantial rate over a short period of time.
You specifically might want to check the volumes that you identified in the previous section and
see whether they have the following characteristics (see the sketch after this list):
They have a high write ratio.
They show a DELAYED DUE TO NVS rate greater than 0, which indicates that you are
running into an NVS Full condition.
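As a simple illustration of this check, the following Python sketch filters made-up volume records. The field names, sample values, and thresholds are assumptions for illustration; they are not RMF output fields.

```python
# Minimal sketch (field names and sample values are assumptions, not RMF
# output): filters the volumes from a device-level report down to those with
# a high write ratio or a nonzero DELAYED DUE TO NVS rate.

volumes = [
    {"volser": "DB2P01", "write_ratio": 0.85, "nvs_delay_rate": 3.2},
    {"volser": "DB2P02", "write_ratio": 0.10, "nvs_delay_rate": 0.0},
    {"volser": "IMSP01", "write_ratio": 0.70, "nvs_delay_rate": 0.0},
]

suspects = [v for v in volumes
            if v["write_ratio"] > 0.5 or v["nvs_delay_rate"] > 0.0]

for v in suspects:
    print(f"check {v['volser']}: write ratio {v['write_ratio']:.0%}, "
          f"NVS delay rate {v['nvs_delay_rate']}")
```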
Use the Enterprise Storage Server Link Statistics to analyze the throughput of the DS8000
HA ports. Pay particular attention to those that have higher response time than others. Also,
use the I/O intensity value to determine whether a link might be close to its limitations. All HA
ports are listed here. You can also analyze remote replication and Open Systems workload.
The Enterprise Storage Server Rank Statistics show the back-end I/O load for each of the
installed RAID arrays. Again, look for those that stand out, either with a high load, or much
higher response time.
The second part of the report provides queuing details at an LCU and channel path level.
You can see whether I/Os are delayed on their way to the device. Check for LCUs or paths that
have the following features:
Higher average control unit busy delay (AVG CUB DLY), which can mean that devices are
in use or reserved by another system.
Higher average command response delay (AVG CMR DLY), which can indicate a
saturation of certain DS8000 resources, such as HA, internal bus, or processor.
Nonzero HPAV wait times and HPAV max values in the order of the number of defined
alias addresses, which can indicate that the number of alias addresses is not sufficient
(see the sketch after this list).
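The following minimal Python sketch applies that rule of thumb to made-up per-LCU values. The field names and thresholds are assumptions for illustration, not parsed RMF output.

```python
# Minimal sketch (the per-LCU values are assumptions, not parsed RMF output):
# applies the rule of thumb above - nonzero HPAV wait, or an HPAV max close to
# the number of defined aliases, suggests that more aliases may be needed.

lcus = [
    {"lcu": "0040", "defined_aliases": 32, "hpav_wait": 0.0, "hpav_max": 12},
    {"lcu": "0041", "defined_aliases": 32, "hpav_wait": 1.7, "hpav_max": 31},
]

for lcu in lcus:
    near_limit = lcu["hpav_max"] >= 0.9 * lcu["defined_aliases"]
    if lcu["hpav_wait"] > 0 or near_limit:
        print(f"LCU {lcu['lcu']}: consider more aliases "
              f"(HPAV wait {lcu['hpav_wait']}, max {lcu['hpav_max']}"
              f" of {lcu['defined_aliases']})")
```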
In many cases, your analysis shows that one or more resources either on the host system or
DS8000 storage system are saturated or overloaded. The first thing to check is whether your
storage system is configured to use all available features that improve performance or
automate resource balancing.
If these features do not solve the issue, consider the following actions:
Distribute the workload further over additional existing resources with less utilization.
Add more resources of the same type, if there is room.
Exchange the existing, saturated resources for different ones (other or newer technology)
with higher capabilities.
If you isolated applications (for example, by using their own set of HPFE ranks) but still
experience poor response times, check the following items:
Are the dedicated resources saturated? If yes, you can add more resources, or consider
switching to a shared resource model.
Is the application doing something that the dedicated resources are not suited for (for
example, mostly sequential read and write operations on HPFE ranks)? If yes, consider
changing the resource type, or again, switching to a shared model.
Does the contention come from other resources that are not dedicated, such as HA ports
in our example with dedicated HPFE ranks? Here, you can consider increasing the
isolation by dedicating host ports to the application as well.
If you are running in a resource sharing model and find that your overall I/O performance is
good, but there is one critical application that suffers from poor response times, you can
consider moving to an isolation model, and dedicate certain resources to this application. If
the issue is limited to the back end, another solution might be to use advanced functions:
Changes in DFSMS/DFSMShsm/WLM to adjust or meet performance goals.
Easy Tier Application to manually assign certain data to a specific storage tier.
If you cannot identify a saturated resource, but still have an application that experiences
insufficient throughput, it might not use the I/O stack optimally. For example, modern storage
systems can process many I/Os in parallel. If an application does not use this capability and
serializes all I/Os, it might not get the required throughput, although the response times of
individual I/Os are good.
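The following minimal Python sketch illustrates this effect with an assumed fixed per-I/O service time. It is a model of I/O concurrency only, not a benchmark of any real device or access method.

```python
# Minimal sketch (the 0.2 ms service time is an assumption): shows why an
# application that serializes its I/Os cannot reach the throughput that the
# same device delivers when several I/Os are kept in flight in parallel.

import time
from concurrent.futures import ThreadPoolExecutor

SERVICE_TIME_S = 0.0002   # assumed per-I/O service time (0.2 ms)
IO_COUNT = 200

def one_io(_):
    time.sleep(SERVICE_TIME_S)

start = time.perf_counter()
for i in range(IO_COUNT):
    one_io(i)                                     # one I/O at a time
serial_iops = IO_COUNT / (time.perf_counter() - start)

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=16) as pool:  # 16 I/Os in flight
    list(pool.map(one_io, range(IO_COUNT)))
parallel_iops = IO_COUNT / (time.perf_counter() - start)

print(f"serialized: {serial_iops:.0f} IOPS, parallel: {parallel_iops:.0f} IOPS")
```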
You can obtain additional information about IBM Db2 and IMS at these websites:
[Link]
[Link]
Db2 and IMS data sets can be allocated in the extended addressing space (EAS) of an
extended address volume (EAV) to significantly reduce the addressing footprint and ongoing
management effort.
Note: The maximum amount of data that you can store in a single Db2 table space or
index space is the same for extended and non-extended address volumes. The same Db2
data sets might use more space on extended address volumes than on non-extended
address volumes because space allocations in the extended area are multiples of 21
cylinders on extended address volumes. For more information, see:
[Link]
[Link]
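As a small worked example of the 21-cylinder allocation granularity in the extended area, the following Python sketch rounds a request up to the next multiple of 21 cylinders. The requested size is an illustrative assumption.

```python
# Minimal sketch: space allocations in the extended addressing space of an EAV
# are made in multiples of 21 cylinders, so a request is rounded up as below.
# The 140-cylinder request is an illustrative assumption.

import math

MULTIPLE_CYL = 21

def eas_allocation(requested_cyl: int) -> int:
    """Round a cylinder request up to the next multiple of 21 cylinders."""
    return math.ceil(requested_cyl / MULTIPLE_CYL) * MULTIPLE_CYL

requested = 140
allocated = eas_allocation(requested)
print(f"requested {requested} cyl -> allocated {allocated} cyl "
      f"({allocated - requested} cyl of extra space)")
```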
For more information about IOSQ, see: “IOSQ time” on page 232.
For more information about HyperPAV/SuperPAV and enablement, see: IBM DS8000 and IBM
Z Synergy, REDP-5186-05
The HPFEs in the DS8900F storage family deliver performance to such an extent that it puts
more stress on the channel subsystem of IBM Z. zHPF, in combination with flash,
provides an effective way to reduce this channel subsystem stress. zHPF should be enabled
in the environment.
The latest versions of Db2 (Db2 12), combined with zHPF and current FICON hardware (FICON
Express16S/16S+/16SA cards), deliver improvements to the following Db2 functions:
Db2 queries
Table scans
Index-to-data access, especially when the index cluster ratio is low
Index scans, especially when the index is disorganized
Reads of fragmented large objects (LOBs)
New extent allocation during inserts
Db2 REORG
Sequential reads
Writes to the shadow objects
Reads from a non-partitioned index
Log applies
Db2 LOAD and REBUILD
Db2 Incremental COPY
RECOVER and RESTORE
Db2 RUNSTATS table sampling
In addition, Db2 can benefit from the new caching algorithm at the DS8900F level, which is
called List Prefetch Optimizer (LPO).
For more information about list prefetch, see DB2 for z/OS and List Prefetch, REDP-4862.
Now the DS8900F can be notified through the Db2 Media Manager that the multiple I/Os in a
castout can be treated as a single logical I/O even though there are multiple embedded I/Os.
In other words, the data hardening requirement is for the entire I/O chain. This enhancement
brings significant response time reduction.
For more information, see IMS High Performance Image Copy User’s Guide, SC19-2756-04.
With zHyperWrite, Db2 log writes are performed to the primary and secondary volumes in
parallel, which reduces Db2 log write response times. The implementation of zHyperWrite
requires that HyperSwap is enabled through either IBM Geographically Dispersed Parallel
Sysplex® (IBM GDPS) or CSM. zHyperWrite combines Metro Mirror (PPRC) synchronous
replication and software mirroring through Media Manager (DFSMS) to provide substantial
improvements in Db2 log write latency.
For more information about zHyperWrite (z/OS and Db2 enablement and display), see:
[Link]
[Link]
[Link]
Furthermore, IMS 15.1 introduced the use of DFSMS Media Manager to write data to the
WADS and OLDS. This change enables IMS to use important I/O features, such as zHPF,
which increases I/O throughput, and zHyperWrite, which reduces latency for
synchronous replication.
The use of zHPF and zHyperWrite can be especially useful for data sets with high write rates,
such as the WADS and OLDS, which increases logging speed. Service times for the WADS and
OLDS can be reduced by up to 50%, depending on the environment setup.
For more information about zHyperWrite enablement and WADS and OLDS support for
zHyperWrite, including migration considerations, see:
[Link]
[Link]
[Link]
Note: At the time of writing, zHyperLink provides write support for Metro Mirror (MM),
Multi-Target Metro Mirror (MTMM), Metro Global Mirror (MGM), and Global Mirror (GM)
replication configurations.
For zHyperLink write support in MM, MTMM, and MGM configurations, zHyperWrite is also required
on the MM leg.
The publications listed in this section are considered particularly suitable for a more detailed
discussion of the topics covered in this book.
IBM Redbooks
The following IBM Redbooks publications provide additional information about the topic in this
document. Note that some publications referenced in this list might be available in softcopy
only.
IBM DS8900F Architecture and Implementation, SG24-8456
IBM DS8000 Copy Services, SG24-8367
IBM DS8000 Easy Tier, REDP-4667
IBM DS8000 and IBM Z Synergy, REDP-5186
IBM DS8000 High-Performance Flash Enclosure Gen2, REDP-5422
Getting Started with IBM zHyperLink for z/OS, REDP-5493
FICON Native Implementation and Reference Guide, SG24-6266
IBM Storage Modeller Guidance, ZG24-8401
You can search for, view, download or order these documents and other Redbooks,
Redpapers, Web Docs, draft and additional materials, at the following website:
[Link]/redbooks
Other publications
These publications are also relevant as further information sources:
H.A. Fetene, N. Clayton, T. Camacho, D.V. Valverde, S.E. Williams, Y. Xu: IBM System
Storage DS8900F Performance Whitepaper, DS8900F Release R9.0, WP102814
C. Gordon: DS8000 Host Adapter Configuration Guidelines,
[Link]
Online resources
These websites are also relevant as further information sources:
IBM Documentation Center
[Link]
IBM System Storage DS8000 Host Systems Attachment Guide, SC26-7917
[Link]
IBM Storage Modeller Help
[Link]
SG24-8501-00
ISBN 0738460192