Hitachi Design Doc High Availability Disaster Recovery Solution Using Oracle Data Guard
Feedback
Hitachi Data Systems welcomes your feedback. Please share your thoughts by sending an email message to SolutionLab@hds.com. To help route your message, include the paper number in the subject line and the title of this white paper in the text.
Table of Contents
Solution Overview ............................................................................................ 3
Solution Components ...................................................................................... 5
Hitachi Compute Blade 2000 .................................................................. 7
Hitachi Unified Storage 150 .................................................................... 7
Hitachi Dynamic Provisioning ................................................................. 8
Hitachi Dynamic Link Manager Advanced ............................................. 9
Hitachi Storage Navigator Modular 2 ..................................................... 9
Oracle Data Guard ............................................................................... 10
Fusion-io ioDrive PCIe Flash Card ....................................................... 10
Solution Design ............................................................................................... 11
High-level Infrastructure .................................................... 11
Storage Architecture .......................................................... 14
Server and Application Architecture ........................................... 24
SAN Architecture .............................................................. 28
Network Architecture .......................................................... 29
Note This testing was done in a lab environment. Many factors affect production environments beyond what can be predicted or duplicated in a lab. Follow recommended practice by conducting proof-of-concept testing for acceptable results before implementing this solution in your production environment. This means testing applications in a non-production, isolated test environment that otherwise matches your production environment.
Solution Overview
This reference architecture guide illustrates a zero-RPO, sub-minute-RTO solution for a disaster or an unplanned failure of the production server, database, or storage at the local site, using Oracle Data Guard 11g R2 on the Hitachi Unified Compute Platform Select for Oracle Database reference solution.
This reference architecture describes the following:
Using Oracle Data Guard 11g R2 on Hitachi Unified Storage 150 (local site)
and Hitachi Adaptable Modular Storage 2500 (remote site)
Replicating the data from a primary database instance at the local site to a
secondary instance at a remote site
Preparing the copy for access by a secondary database instance, also known as the standby database instance, at the remote site, in case the remote site is activated following a disaster or an unplanned failover
Note This reference architecture uses Hitachi Adaptable Modular Storage 2500 at
the remote site. You may choose to use Hitachi Unified Storage 150 at the remote site.
Figure 1 shows the infrastructure used to validate this solution in the Hitachi Data
Systems lab.
Figure 1
Solution Components
Table 1 and Table 2 have the hardware and software components used in this
reference architecture.
Table 1. Reference Architecture Hardware Components

Component                      Description                               Version                  Quantity   Role
Server chassis                 Hitachi Compute Blade 2000                Firmware A0195-C-6443    1          Chassis at the local site
Server chassis                 Hitachi Compute Blade 2000                Firmware A0195-C-6443    1          Chassis at the remote site
Fibre Channel switch module    -                                         V 642b                   -          SAN connectivity
Server blade                   X57-A1                                    EFI BIOS 4.6.3.7         -          Database server at the local site
Server blade                   X57-A1                                    EFI BIOS 4.6.3.7         -          Database server at the remote site
Storage system                 Hitachi Unified Storage 150               -                        -          Storage array at the local site
Storage system                 Hitachi Adaptable Modular Storage 2500    -                        -          Storage array at the remote site
SMP connector                  Symmetric multiprocessing (SMP)           08B7/G-Z                 -          SMP connector at the local site
SMP connector                  Symmetric multiprocessing (SMP)           -                        -          SMP connector at the remote site
Note This reference architecture uses X57-A1 (GVAX57A1) server blades. You may upgrade to X57-A2 server blades using the Intel Xeon Processor E7-8870. In addition, you may upgrade the 4 GB DIMMs to 8 GB DIMMs at the local site. The four Fusion-io ioDrive 320 GB MLC PCIe Flash Cards for the database server at the local site are placed in Hitachi Compute Blade PCIe Slots 0, 1, 2, and 3. The two dual-port PCIe Fibre Channel cards for the database server at the remote site are placed in Hitachi Compute Blade PCIe Slots 0 and 2.
Table 2. Reference Architecture Software Components

Component                                  Version
Operating system                           Oracle Linux version 5, Update 5
Database software                          Oracle Database 11g R2
Replication software                       Oracle Data Guard 11g R2
Database client communication software     -
-                                          7.1.1
-                                          21.10 Beta 2
-                                          6.6.2-01
-                                          Microcode dependent
Note This reference architecture uses Oracle Linux version 5, Update 5, and Oracle Database 11g R2. Instead of using Oracle Linux version 5, Update 5, you may use Red Hat Enterprise Linux version 5, Update 5, or later.
Hitachi Compute Blade 2000

Hitachi Compute Blade 2000 provides the following:

Configuration flexibility

Fast server failure recovery using an N+1 cold standby design that allows replacing failed servers within minutes
Hitachi Unified Storage 150

Hitachi Unified Storage provides reliable, scalable performance and availability for block and file data. Unified Storage is simple to manage, optimized for critical business applications, and efficient.
Using Unified Storage requires a smaller capital investment. Deploy this storage,
which grows to meet expanding requirements and service level agreements, for
critical business applications. Simplify your operations with integrated set-up and
management for a quicker time to value.
Unified Storage enables extensive cost savings through file and block
consolidation. Build a cloud infrastructure at your own pace to deliver your
services.
Hitachi Unified Storage 150 provides reliable, flexible, scalable, and cost-effective
modular storage. Its symmetric active-active controllers provide input-output load
balancing that is integrated, automated, and hardware-based.
Both controllers in Unified Storage 150 dynamically and automatically assign the
access paths from the controller to a logical unit (LU). All LUs are accessible,
regardless of the physical port or the server that requests access.
Use Oracle Enterprise Manager Grid Control System Monitoring Plug-in for
Hitachi Storage to monitor your storage infrastructure from within Oracle
Enterprise Manager Grid Control. This gives you real-time visibility to utilization,
availability, and performance metrics.
Optimize your hardware infrastructure for Oracle database applications using the
following on Unified Storage 150:
LUN configuration
Database performance
Hitachi Dynamic Provisioning

Using Dynamic Provisioning is like using a host-based logical volume manager (LVM), but without incurring host processing overhead. It provides one or more wide-striping pools across many RAID groups. Each pool has one or more dynamic provisioning virtual volumes (DP-VOLs) of a logical size you specify, up to 60 TB, created against it without initially allocating any physical space.
Deploying Dynamic Provisioning avoids the routine issue of hot spots that occur
on logical devices (LDEVs). These occur within individual RAID groups when the
host workload exceeds the IOPS or throughput capacity of that RAID group.
Dynamic provisioning distributes the host workload across many RAID groups,
which provides a smoothing effect that dramatically reduces hot spots.
When used with Hitachi Unified Storage, Hitachi Dynamic Provisioning has the
benefit of thin provisioning. Physical space assignment from the pool to the
dynamic provisioning volume happens as needed using 1 GB chunks, up to the
logical size specified for each dynamic provisioning volume. There can be a
dynamic expansion or reduction of pool capacity without disruption or downtime.
You can rebalance an expanded pool across the current and newly added RAID
groups for an even striping of the data and the workload.
Hitachi Dynamic Link Manager Advanced

Used for SAN multipathing, the Hitachi Dynamic Link Manager Advanced configuration for this solution uses its extended round-robin multipathing policy. This policy automatically selects a path by rotating through all available paths. This balances the load across all available paths and optimizes IOPS and response time.
This solution uses Hitachi Dynamic Link Manager Advanced on the local and
remote database servers.
Hitachi Storage Navigator Modular 2

You need Storage Navigator Modular 2 to take advantage of the full features of Hitachi Unified Storage 150, including the following:

RAID-level configurations

Performance metrics
Solution Design
This solution includes Oracle clients that access the database at the local site, which stores data in and retrieves data from a Hitachi Unified Storage 150 storage system.

If there is a system failure at the local site, these Oracle clients can access the Oracle physical standby database at the remote site, which stores data in and retrieves data from the Hitachi Adaptable Modular Storage 2500 storage system at the remote site.
The architecture uses the synchronous software replication model in an Oracle
Data Guard 11g R2 environment for zero data loss.
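A minimal sketch of the primary-side settings that put Oracle Data Guard into synchronous, zero-data-loss redo transport follows. It assumes the database unique names pr01db and stbydb used later in this guide; the exact service names and protection mode used in the validated configuration are not listed in this document.

# run on the primary database server (pr01db)
sqlplus / as sysdba <<'EOF'
-- ship redo to the standby synchronously and wait for acknowledgment (zero RPO)
ALTER SYSTEM SET log_archive_dest_2=
  'SERVICE=stbydb SYNC AFFIRM VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=stbydb'
  SCOPE=BOTH;
-- raise the protection mode so commits require the standby acknowledgment
ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY;
EOF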
High-level Infrastructure
The description of the hardware components used by Hitachi Data Systems for
testing is in Solution Components on page 5. Specific infrastructure
configuration includes the following:
Server: At the local and remote sites, there is one server node consisting of two X57-A1 server blades connected using an SMP connector for one Oracle Database server.

Storage System: At the local and remote sites, there are LDEVs mapped to each port presented to the server as LUNs.

SAN Fabric: At the local site, there are two zones created on each switch module to zone the two mezzanine Fibre Channel ports on each server blade and the four storage host ports. At the remote site, the PCIe Fibre Channel dual ports on each server blade connect to the four storage host ports directly.
Figure 2 illustrates the reference architecture configuration at the local site.
Figure 2
Figure 3 illustrates the reference architecture configuration at the remote site.
Figure 3
Storage Architecture
This describes the storage architecture used in this reference architecture. It takes into consideration Hitachi Data Systems and Oracle recommended practices for deploying database storage.
Storage Configuration
At the local site, this reference architecture uses RAID groups and Hitachi
Dynamic Provisioning storage pools on Hitachi Unified Storage 150. The same
configuration is on the remote site, but using Hitachi Adaptable Modular Storage
2500.
Figure 4 on page 15 shows the RAID groups, dynamic provisioning pools, and
host groups for the architecture at the local site.
Figure 5 on page 16 shows the RAID groups, dynamic provisioning pools, and
host groups for the architecture at the remote site.
Figure 4
Figure 5
Table 3 has the details of the RAID groups for Oracle online redo logs, archived
redo logs, standby redo logs, and flash recovery area at the local site.
Table 3. Local Site RAID Configuration

RAID Group   Purpose                       RAID Level        Number of Drives   Capacity (GB)
002          Oracle online redo logs       RAID-10 (2D+2D)   4                  1000
003          Oracle online redo logs       RAID-10 (2D+2D)   4                  1000
004          Oracle archived redo logs     RAID-10 (2D+2D)   4                  1000
100          Oracle standby redo logs      RAID-10 (2D+2D)   4                  1000
199          Oracle Flash Recovery Area    -                 -                  2600
Table 4 has the details for the LUNs created in these RAID groups at the local
site.
Table 4. Local Site LUNs

RAID Group   LUNs                       Capacity (GB) per LUN   Purpose
002          0003, 0004, 0005, 0006     50                      Oracle online redo logs
003          0007, 0008, 0009, 0010     50                      Oracle online redo logs
004          0011, 0012, 0013           300                     Oracle archived redo logs
100          0100                       50                      Oracle standby redo logs
199          0199                       2200                    Oracle Flash Recovery Area
Table 5 has the details of the RAID groups for Oracle online redo logs, archived
redo logs, and standby redo logs at the remote site.
Table 5. Remote Site RAID Configuration

RAID Group   Purpose                       RAID Level        Number of Drives   Capacity (GB)
002          Oracle online redo logs       RAID-10 (2D+2D)   4                  1000
003          Oracle online redo logs       RAID-10 (2D+2D)   4                  1000
004          Oracle archived redo logs     RAID-10 (2D+2D)   4                  1000
005          Oracle standby redo logs      RAID-10 (2D+2D)   4                  1000
050          Oracle Flash Recovery Area    -                 -                  2600
Table 6 has the details for the LUNs created in these RAID groups at the remote
site.
Table 6. Remote Site LUNs

RAID Group   LUNs                       Capacity (GB) per LUN
002          0003, 0004, 0005, 0006     50
003          0007, 0008, 0009, 0010     50
004          0011, 0012, 0013           300
005          0041                       50
050          0050                       2200
Table 7 has the details for the dynamic provisioning pool at the local site.
Table 7. Local Site Hitachi Dynamic Provisioning Pool

Dynamic Provisioning Pool ID   RAID Groups    RAID Level        Number of Drives   Pool Capacity
001                            005 to 014     RAID-10 (4D+4D)   80                 20.8 TB
Table 8 has the details for the LUNs created from the Hitachi Dynamic
Provisioning pool at the local site.
Table 8. Local Site Hitachi Dynamic Provisioning Pool LUN Information

Dynamic Provisioning Pool ID   LUNs           Capacity (GB) per LUN   Purpose                              Storage Ports
001                            0014 to 0020   200                     Oracle System, Sysaux, Undo, Temp    0A, 1A, 0B, 1B
001                            0021 to 0025   200                     Oracle System, Sysaux, Undo, Temp    0A, 1A, 0B, 1B
The Fusion-io ioDrives at the local site are presented as regular block devices, fioa through fiod. Each device is 320 GB. The ioDrives are used as Oracle ASM disks in the disk group layout described in Database Layout on page 21.
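For reference, the ioDrive devices can be verified from the operating system. This is a sketch using the Fusion-io utilities, which are assumed to be installed along with the ioDrive driver.

# list the ioDrive adapters, firmware, and attached block devices
fio-status -a
# confirm the four block devices used by this configuration
ls -l /dev/fio[a-d]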
Table 9 has the details for the Hitachi Dynamic Provisioning pool at the remote
site.
Table 9. Remote Site Hitachi Dynamic Provisioning Pool

Dynamic Provisioning Pool ID   RAID Groups    RAID Level        Number of Drives   Pool Capacity
001                            005 to 014     RAID-10 (4D+4D)   80                 20.8 TB
Table 10 has the details for the LUNs created from the dynamic provisioning pool
at the remote site.
Table 10. Remote Site Hitachi Dynamic Provisioning Pool LUN Information

Dynamic Provisioning Pool ID   LUNs           Capacity (GB) per LUN   Purpose                              Storage Ports
001                            0014 to 0025   200                     Oracle System, Sysaux, Undo, Temp    0A, 1A, 0B, 1B
Database Layout
The database layout design uses recommended practices from Hitachi Data
Systems for Hitachi Unified Storage 150 and Hitachi Adaptable Modular Storage
2500 for small random I/Os, such as the ones in OLTP transactions. It also takes
into account the Oracle ASM best practices when using Hitachi storage.
The storage design for database layout needs is based on the requirements of a
specific application implementation. The design can vary greatly from one
implementation to another. The components in this solution set have the flexibility
for use in various deployment scenarios to provide the right balance between
performance and ease of management for a given scenario.
The database layout at the remote site has the same configuration as the local
site, except that it does not use ASM Preferred Mirror Read.
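A sketch of how ASM Preferred Mirror Read might be enabled on the local ASM instance follows. The disk group name ASMPFMDG comes from Table 11; the failure group name FIOFG is only illustrative, because the failure group layout is not documented in this guide.

# run with the environment pointing at the local ASM instance
sqlplus / as sysasm <<'EOF'
-- prefer reads from the failure group placed on the Fusion-io ioDrives (name is illustrative)
ALTER SYSTEM SET asm_preferred_read_failure_groups = 'ASMPFMDG.FIOFG' SCOPE=BOTH;
-- confirm the setting
SHOW PARAMETER asm_preferred_read_failure_groups
EOF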
Table 11 lists the disk mappings from the LUNs to the following at the local site:
Table 11. Local Site LUNs, OS Disks, and Oracle ASM Disk Mappings

LUN            OS Device                                                    ASM Disk                                     ASM Disk Group   Purpose
0001 to 0002   /dev/sddlmaa to /dev/sddlmab                                 N/A                                          N/A              -
0003 to 0005   /dev/sddlmac to /dev/sddlmae                                 RGDISK01 to RGDISK03                         REDODG01         Oracle online redo logs
0006           /dev/sddlmaf                                                 -                                            REDODG01         Oracle online redo logs
0007 to 0009   /dev/sddlmag to /dev/sddlmai                                 RGDISK11 to RGDISK13                         REDODG11         Oracle online redo logs
0010           /dev/sddlmaj                                                 -                                            REDODG11         Oracle online redo logs
0011 to 0013   /dev/sddlmak to /dev/sddlmam                                 ARDISK01 to ARDISK03                         ARCHDG           Oracle archived redo logs
N/A            fioa to fiod (Fusion-io ioDrive)                             -                                            ASMPFMDG         -
0014 to 0020   /dev/sddlman to /dev/sddlmap, /dev/sddlmba to /dev/sddlmbd   DADISK01 to DADISK07, CTDISK01, CTDISK11     SANDATADG        Sys, Undo, Temp
0021 to 0025   /dev/sddlmbe to /dev/sddlmbi                                 DADISK08 to DADISK12                         SANDATADG        Sys, Undo, Temp
0100           /dev/sddlmbk                                                 STBYDISK01                                   STBYDG01         Oracle Standby Redo Log
0199           /dev/sddlmbj                                                 FRADISK01                                    FRADG            Oracle Flash Recovery Area
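This guide does not state how the HDLM devices were presented to Oracle ASM. A common approach on Oracle Linux 5 is ASMLib, sketched below using one of the OS devices and ASM disk names from Table 11; it assumes a single partition has already been created on the device.

# label a partition on an HDLM multipath device as an ASM disk (run as root)
oracleasm createdisk RGDISK01 /dev/sddlmac1
# verify that ASM can see the new disk label
oracleasm listdisks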
Table 12 lists the disk mappings from the LUNs to the following at the remote site:
Table 12. Remote Site LUNs, OS Disks, and Oracle ASM Disk Mappings

LUN            OS Device                          ASM Disk                                   ASM Disk Group   Purpose
0001 to 0002   /dev/sddlmaa to /dev/sddlmab       N/A                                        N/A              -
0003 to 0005   /dev/sddlmac to /dev/sddlmae       RGDISK01 to RGDISK03                       REDODG01         Oracle online redo logs
0006           /dev/sddlmaf                       -                                          REDODG01         Oracle online redo logs
0007 to 0009   /dev/sddlmag to /dev/sddlmai       RGDISK11 to RGDISK13                       REDODG11         Oracle online redo logs
0010           /dev/sddlmaj                       -                                          REDODG11         Oracle online redo logs
0011 to 0013   /dev/sddlmak to /dev/sddlmam       ARDISK01 to ARDISK03                       ARCHDG           Oracle archived redo logs
0014 to 0025   /dev/sddlman to /dev/sddlmbi       DADISK01 to DADISK12, CTDISK01, CTDISK11   SANDATADG        Sys, Undo, Temp
0100           /dev/sddlmbk                       STBYDISK01                                 STBYDG01         Oracle Standby Redo Log
0199           /dev/sddlmbm                       FRADISK01                                  FRADG            Oracle Flash Recovery Area
The storage disks on Hitachi Unified Storage 150 at the local site form the Oracle ASM disk groups shown in Table 11.
Server and Application Architecture
Oracle Data Guard uses the background processes Log Writer (LGWR)
or the Archiver (ARCH) to collect transaction redo data and ship this data
to the physical standby database.
Fetch Archive Log Process (FAL) provides a client-server mechanism
for shipping archived logs to the standby database following a
communication loss between the primary and the standby databases,
for automatic gap resolution and re-synchronization.
Oracle Data Guard uses the Remote File Server (RFS) process to receive
redo records from the primary database.
The Managed Recovery Process (MRP) applies redo information to the
physical standby database.
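On the standby instance, the RFS and MRP processes and the current apply position can be checked with queries such as the following sketch.

sqlplus / as sysdba <<'EOF'
-- which Data Guard processes are running on the standby and what they are working on
SELECT process, status, thread#, sequence# FROM v$managed_standby;
-- transport and apply lag as seen by the standby
SELECT name, value FROM v$dataguard_stats WHERE name IN ('transport lag', 'apply lag');
EOF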
Figure 6 on page 25 shows the process architecture for Oracle Data Guard
between the primary database instance at the local site and the physical standby
database instance at the remote site.
Figure 6
Figure 7 on page 26 shows the software stack for the reference architecture at the
local site.
Figure 8 on page 27 shows the software stack for the reference architecture at the
remote site.
Figure 7
Figure 8
SAN Architecture
At both the local and remote site, map the provisioned LDEVs to multiple ports on
the storage system. These LDEV port assignments provide multiple paths to the
storage system from the host for high availability.
The environment used two Fibre Channel switch modules installed in the Hitachi
Compute Blade 2000 chassis at the local site. This provides scalability and high
availability. Storage Configuration on page 14 has host configuration details.
The database server used four Fibre Channel ports, two ports from the mezzanine
card on each server blade at the local site. At the remote site, the database server
used four Fibre Channel ports, two ports from the PCIe HBA Emulex cards on
each server blade. At either site, this provides a four-path connection for all LUNs
mapped to the server.
Table 13 has the zoning details for the SAN at the local site for the database
server.
Table 13. Local Site Database Server SAN Switch Architecture

HBA Ports    Switch Zone                            Storage Port   Switch
B0-HBA1-1    BS2K_13_B0_HBA1_1_ASE46_142_0A         0A             5300-05
B0-HBA1-2    BS2K_13_B0_HBA1_2_ASE46_142_0B         0B             5300-06
B1-HBA1-1    BS2K_13_B1_HBA1_1_ASE46_142_1A         1A             5300-05
B1-HBA1-2    BS2K_13_B1_HBA1_2_ASE46_142_1B         1B             5300-06
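Assuming Brocade-based Fibre Channel switch modules, consistent with the 5300 switch names in Table 13, the first zone in the table could be built as in this sketch. The WWPN values and the configuration name UCP_CFG are placeholders.

# aliases for the blade HBA port and the storage host port (WWPNs are placeholders)
alicreate "B0_HBA1_1", "10:00:00:00:c9:00:00:01"
alicreate "ASE46_142_0A", "50:06:0e:80:00:00:00:10"
# single-initiator zone pairing that HBA port with storage port 0A
zonecreate "BS2K_13_B0_HBA1_1_ASE46_142_0A", "B0_HBA1_1; ASE46_142_0A"
# add the zone to a configuration and activate it
cfgcreate "UCP_CFG", "BS2K_13_B0_HBA1_1_ASE46_142_0A"
cfgenable "UCP_CFG"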
When designing your SAN architecture, follow these recommended practices to ensure a secure, high-performance, and scalable database deployment:

Use at least two HBAs and place them on different I/O buses within the server. This distributes the workload over the server's PCIe bus architecture.

Use dual SAN fabrics, multiple HBAs, and host-based multipathing software in a business-critical deployment. Connecting two or more paths from the database servers to two independent SAN fabrics provides the redundancy required for critical applications.

Zone your fabric to handle multiple, unique paths from HBAs to storage ports. Use single initiator zoning. Use at least two Fibre Channel switch fabrics to provide multiple, independent paths to Hitachi Unified Storage 150 to prevent configuration errors from disrupting the entire SAN infrastructure.

For large bandwidth requirements that exceed the port capability of a single HBA, use additional HBA ports and distribute the load across them with your multipathing software.
Network Architecture
This reference architecture has four onboard 1 Gb/sec NIC ports for different
types of data traffic at the local and remote site. At either site, the two NIC ports
on each blade connect to two 1 Gb/sec Ethernet switch modules in the chassis.
Figure 9 shows the network configuration for the reference architecture
environment at the local and remote site.
Figure 9
Engineering Validation
This describes the tests performed in the Hitachi Data Systems lab using Oracle
Data Guard on the Hitachi Unified Compute Platform Select for Oracle Database
reference solution.
Test Methodology
These were the steps followed to test the reference architecture:
1. Build the physical standby database instance using Oracle Data Guard.
2. Configure synchronous replication between the primary database instance at
the local site and the physical standby database instance at the remote site
for a zero RPO and sub-minute RTO solution for an unplanned site failure.
3. Create a fast-start failover configuration to automate the switch over and fail over operations using Oracle Data Guard Broker.
4. For a given network bandwidth, verify zero RPO and measure RTO for the
following functions:
Switch over from the primary database instance at the local site to the
physical standby database instance at the remote site.
Switch back from the primary database instance at the remote site to the
physical standby database instance at the local site.
Fail over to the physical standby database instance at the remote site
caused by the abrupt failure of the primary server at the local site.
Fail over to the physical standby database instance at the local site
caused by the abrupt failure of the primary server at the remote site.
5. Fail over to the standby database instance at the remote site caused by the abrupt failure of the network between the primary site and the remote site.
6. Use the Oracle database instance alert logs and Oracle Data Guard Broker
commands to verify each function worked.
A test passes if each function works as intended in the reference architecture.
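The broker operations in steps 3 and 4 can be driven from DGMGRL. The following is a sketch that assumes the broker configuration, protection mode, and observer are already in place; change_me is a placeholder for the SYS password.

# review the configuration, enable automatic failover, then switch over to the standby
dgmgrl sys/change_me@pr01db <<'EOF'
SHOW CONFIGURATION VERBOSE;
ENABLE FAST_START FAILOVER;
SWITCHOVER TO 'stbydb';
EOF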
Results Summary
This summarizes the key observations from the test results for Oracle Data Guard
on Hitachi Unified Compute Platform Select for Oracle Database reference
solution. All functions described in the Test Methodology on page 31worked as
intended. Using the Oracle database instance alert logs and Oracle Data Guard
Broker confirmed the replicated database instance had the correct status for each
function tested.
The following is the Oracle Data Guard configuration status between the primary
database instance at the local site and the physical standby database instance at
the remote site before performing the switch over from the local site to the remote
site.
Configuration - FSF
Databases:
  pr01db - Primary database
Threshold: 30 seconds
Target: stbydb
Auto-reinstate: True
The result of the switch over to stbydb was "switchover succeeded, new primary is 'stbydb.'" Figure 10 on page 33 is a screen shot showing the switch over results from the local site to the remote site.
Figure 10
The following is the Oracle Data Guard configuration status between the primary
database instance at the remote site and the physical standby database instance
at the local site before performing switch over from the remote site to the local
site:
Configuration - FSF
Databases:
  pr01db - Primary database
The result of the switch over to pr01db was "switchover succeeded, new primary
is 'pr01db.'" Figure 11 on page 34 shows Oracle Data Guard switch over results
from the remote site to the local site.
Figure 11
The following is the Oracle Data Guard configuration status between the primary
database instance at the local site and the physical standby database instance at
the remote site after an abrupt failure of the database server at the local site:
Configuration - FSF
Databases:
  stbydb - Primary database
    Warning: ORA-16817: unsynchronized fast-start failover configuration
Threshold: 30 seconds
Target: pr01db
Auto-reinstate: True
The following is the Oracle Data Guard configuration status between the primary
database instance at the remote site and the physical standby database instance
at the local site after an abrupt failure of the database server at the remote site:
Configuration - FSF
Databases:
  pr01db - Primary database
    Warning: ORA-16817: unsynchronized fast-start failover configuration
Threshold: 30 seconds
Target: pr01db
Auto-reinstate: True
Figure 12 on page 36 shows the RTO for the functions discussed in Test
Methodology on page 31 for Oracle Data Guard on the Hitachi Unified Compute
Platform Select for Oracle Database.
Figure 13 on page 37 shows the response time of the SQL SELECT test against a
124 GB table with and without fail over operation at the local and remote site.
Figure 12
Figure 13
Conclusions
This reference architecture guide documents a high availability and disaster
recovery solution using Oracle Data Guard on Hitachi Unified Compute Platform
Select for Oracle Database.
Oracle Data Guard provides multiple database replication capabilities for different business continuity goals. Unified Compute Platform Select for Oracle Database, in conjunction with Oracle Data Guard, provides different modes of database replication for use, depending on your requirements. This helps ensure that you have an environment protected against an unforeseen disaster.
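As a quick check of which protection mode is in effect on a given database, a sketch:

sqlplus / as sysdba <<'EOF'
-- shows the configured and the currently enforced Data Guard protection mode
SELECT protection_mode, protection_level FROM v$database;
EOF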
This integrated and coordinated effort provides IT administrators with multiple
options to match the desired recovery point objective (RPO) and recovery time
objective (RTO).
Figure 14
Figure 15 shows the Oracle Data Guard configuration for the physical standby
database instance at the remote site.
Figure 15
Confidential - Proprietary information for use only in Hitachi Data Systems distribution centers.
Corporate Headquarters
2845 Lafayette Street, Santa Clara, California 95050-2627 USA
www.HDS.com
Regional Contact Information
Americas: +1 408 970 1000 or info@HDS.com
Europe, Middle East and Africa: +44 (0) 1753 618000 or info.emea@HDS.com
Asia-Pacific: +852 3189 7900 or hds.marketing.apac@HDS.com
© Hitachi Data Systems Corporation 2013. All rights reserved. HITACHI is a trademark or registered trademark of Hitachi, Ltd. Innovate with Information and Hi-Track Remote Monitoring
are trademarks or registered trademarks of Hitachi Data Systems Corporation. All other trademarks, service marks, and company names are properties of their respective owners.
Notice: This document is for informational purposes only, and does not set forth any warranty, expressed or implied, concerning any equipment or service offered or to be offered by Hitachi
Data Systems Corporation.
AS-172-01, February 2013