
BC and DR

The document discusses the importance of business continuity and disaster recovery in data centers, highlighting the critical need for organizations to recover operations quickly after disruptions. It outlines various disaster recovery strategies, including cold, warm, and hot standby options, and emphasizes the necessity of planning and risk analysis to mitigate impacts from disasters. Additionally, it presents the evolution of data centers and the components necessary for effective disaster recovery, such as data replication and server high availability.

Uploaded by

ZABI

Data Center

Business Continuance
and Disaster Recovery

Maciej Bocian
mbocian@[Link]
Architecture Sales Manager
Data Center and Virtualization, Central Europe
CCIE#7785

Presentation_ID © 2009 Cisco Systems, Inc. All rights reserved. Cisco Confidential 1
Business Continuance Drivers

• Cost of application downtime, lost data and productivity
• Regulatory mandates (Homeland Defense, Basel II, HIPAA, GLB, SEC)
    Firms must recover business operations the same business day a disruption occurs
    “Out-of-region” data center, 200+ km away
    Mandates backup data centers on separate grids

[Images: Hurricanes; The Northeast Blackout; NYC Blizzard of 2003]

Business Continuance Is More Critical than Ever

• 75% of IT decision-makers have altered Disaster Recovery/Business Continuance programs as a result of September 11
• Following a disaster 43% of directly affected businesses do not reopen and 29% fail within 24 months as a result
• Only 15% of Global 2000 enterprises have a full-fledged business continuity plan.
• Disasters: fire, storm, floods, earthquakes, chemical accidents, nuclear accidents, wars

Sources: Disaster Recovery Journal, Gartner Group

Agenda

• Introduction to Data Center - The Evolution
• Data Center Disaster Recovery
    Objectives
    Failure Scenarios
    Design Options
• Components of Disaster Recovery
    Site Selection - Front End GSLB
    Server High Availability - Clustering
    Data Replication and Synchronization - SAN Extension
• Sample Design

The Evolution of
Data Centers

Data Center Evolution

[Figure: timeline from 1960 to 2010 with two tracks. Compute evolution: Mainframes, Client/Server, Internet Computing. Network evolution: Terminal, TCP/IP, Thin Client: HTTP, Content Networking, Data Center Networking. Networked data center phase: 1. Consolidation, 2. Integration, 3. Distributed, 4. High Availability. Other labels: Data Center Network Consolidation, Distributed Data Center, Data Center Continuous Availability, Optimization, Business Agility.]

What is involved in a Data Center

• Network infrastructure solution: Cisco GSRs, Cisco Catalyst 6500, Cisco Catalyst Cat4000
• Application solution: Linux/HP, Solaris/SunFire, WebLogic, J2EE custom app, etc.
• Layer 4–7 services solution: CSM, SSLM, CSS, CE, GSS
• Database solution: Linux/HP, Solaris/SunFire, Oracle 10G RAC, etc.
• Network security solution: PIX®, FWSM, IDSM, VPNSM, CSA
• Storage solution: MDS9000
• Management and instrumentation solution: Terminal servers, NAM, Cisco Works LMS/VMS, HSE

What is Distributed Data Center

[Figure: Primary Data Center (APP A, APP B, FC storage) and Secondary Data Center (APP A, APP C, FC storage), linked by Data Replication]

Why Distributed Data Centers

• Provide disaster recovery and business continuance
• Avoid single, concentrated data depositary
• High availability of applications and data access
• Load balancing together with performance scalability
• Better response and optimal content routing: proximity to clients

Front-end IP Access Layer

[Figure: Primary Data Center (APP A, APP B, FC storage) and Secondary Data Center (APP A, APP C, FC storage); “Content Routing” site selection at the front end]

Application and Database Layer

[Figure: same two data centers; “Content Switching” for Load Balancing and “Server Clustering” for High Availability]

Backend SAN Extension

[Figure: same two data centers; “Storage” & “Optical” transport for Data Mirroring and Replication]

Data Center Disaster
Recovery

Disaster Recovery

• Recovery of data and resumption of service - Ensuring business can recover and continue after failure or disaster
• Ability of a business to adapt, change and continue when confronted with various outside impacts
• Mitigating the impact of a disaster

What It Means For Business

• Business Resilience: Continued Operation of Business During a Failure
• Business Continuance: Restoration of Business After a Failure
• Disaster Recovery: Protecting Data Through Offsite Data Replication and Backup

Zero Down Time is the ultimate goal

Disaster Recovery Planning

• Business Impact Analysis (BIA)
    Determines the impacts of various disasters to specific business functions and company assets
• Risk Analysis
    Identifies important functions and assets that are critical to company’s operations
• Disaster Recovery Plan (DRP)
    Restores operability of the target systems, applications, or computing facility at the secondary Data Center after the disaster

Disaster Recovery Objectives

• Recovery Point Objective (RPO)
    The point in time (prior to the outage) to which systems and data must be restored
    Tolerable loss of data in the event of disaster or failure
    The impact of data loss and the cost associated with the loss
• Recovery Time Objective (RTO)
    The period of time after an outage in which the systems and data must be restored to the predetermined RPO
    The maximum tolerable outage time
• Recovery Access Objective (RAO)
    Time required to reconnect users to the recovered application, regardless of where it is recovered

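The RPO and RTO definitions above can be turned into a simple check. The following is a minimal sketch, not part of the original deck: the function names and the example timestamps are hypothetical, and it assumes the recovery point, the outage start and the time of service restoration are all known.

```python
from datetime import datetime, timedelta

def achieved_rpo(last_recovery_point: datetime, outage_start: datetime) -> timedelta:
    """Data-loss window: time between the last recoverable copy and the outage."""
    return outage_start - last_recovery_point

def achieved_rto(outage_start: datetime, service_restored: datetime) -> timedelta:
    """Outage duration: time between the outage and restoration of service."""
    return service_restored - outage_start

def meets_objectives(last_recovery_point, outage_start, service_restored, rpo, rto):
    """True when both the data-loss window and the outage fit the objectives."""
    return (achieved_rpo(last_recovery_point, outage_start) <= rpo
            and achieved_rto(outage_start, service_restored) <= rto)

# Hypothetical incident: nightly copy at 02:00, outage at 09:30, service back at 13:30.
t0 = datetime(2009, 1, 1, 2, 0)
t1 = datetime(2009, 1, 1, 9, 30)
t2 = datetime(2009, 1, 1, 13, 30)
```

With these example times the data-loss window is 7.5 hours and the outage lasts 4 hours, so a 24-hour RPO with an 8-hour RTO would be met, while a 1-hour RPO would not.
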
Recovery Point/Time vs. Cost

[Figure: timeline with three marks. Time t0: recovery point (critical data is recovered). Time t1: disaster strikes. Time t2: recovery time (systems recovered and operational).
Recovery point scale, moving toward t1: days, hours, mins, secs, covered by Tape backup, Periodic Replication, Asynchronous Replication, Synchronous Replication.
Recovery time scale, moving away from t1: secs, mins, hours, days, weeks, covered by Extended Cluster, Manual Migration, Tape Restore.
Cost increases toward t1 on both scales.]

• Smaller RPO/RTO
    Higher $$$, Replication, Hot standby
• Larger RPO/RTO
    Lower $$$, Tape backup/restore, Cold standby

Failure Scenarios

Disaster could mean many types of Failure

• Network Failure
• Device Failure
• Storage Failure
• Site Failure

Network Failures

[Figure: data center connected to the Internet through Service Provider A and Service Provider B]

• ISP failure
    ✓ Dual ISP connections
    ✓ Multiple ISP
• Connection failure within the network
    ✓ EtherChannel
    ✓ Multiple route paths

Device Failures

[Figure: data center connected to the Internet through Service Provider A and Service Provider B]

• Routers, Switches, FWs
    ✓ HSRP
    ✓ VRRP
• Hosts
    ✓ HA cluster

Storage Failures

[Figure: data center connected to the Internet through Service Provider A and Service Provider B]

• Disk arrays
    ✓ RAID
• Disk Controllers

Site Failures

[Figure: data center connected to the Internet through Service Provider A and Service Provider B]

• Partial Site Failure
    ✓ Application maintenance
    ✓ Application migration
    ✓ Application scheduled DR exercise
• Complete Site Failure
    ✓ Disaster

Cold Standby

• One or more data centers with appropriately configured space equipped with pre-qualified environmental, electrical, and communication conditioning
• Hardware and Software installation, Network access, and data restoration all need manual intervention
• Least expensive to implement and maintain
• Substantial delay from standby to full operation

Disaster Recovery – Active/Standby

[Figure: Primary Data Center (APP A, APP B, FC storage) and Secondary Data Center as Cold Standby (APP A, APP B, FC storage)]

Warm Standby

• A data center that is partially equipped with hardware and communications interfaces capable of providing backup operating support.
• Latest backups from the production data center must be delivered
• Network access needs to be activated
• Provides better RTO and RPO than Cold Standby backup

Disaster Recovery – Active/Standby

[Figure: Primary Data Center (APP A, APP B, FC storage) and Secondary Data Center as Warm Standby (APP A, APP B, FC storage), connected by an IP/Optical Network]

Hot Standby

• A data center that is environmentally ready and has sufficient hardware and software to provide data processing service with little or no downtime.
• Hot Backup offers Disaster Recovery with little or no human intervention
• Application data is replicated from the primary site
• A hot backup site provides very good RTO and RPO

Disaster Recovery – Active/Standby

[Figure: Primary Data Center (APP A, APP B, FC storage) and Secondary Data Center (APP A, APP C, FC storage), connected by an IP/Optical Network]

Disaster Recovery – Active/Active

What Does Active/Active Mean?

Multiple Tiers of Application

[Figure: data center connected to the Internet through Service Provider A and Service Provider B, with three tiers: Presentation Tier, Application Tier, Storage Tier]

Active/Active Data Centers

[Figure: two data centers, each attached to the Internal Network and to the Internet through Service Provider A and Service Provider B]

• Active/Active Web Hosting
• Active/Active Application Processing
• Active/Standby or Active/Active Database Processing

Disaster Recovery
Components

Site Selection Mechanisms

• Site selection mechanisms depend on the technology or mix of technologies adopted for request routing:
    1. HTTP Redirect
    2. DNS Based
    3. L3 Routing with Route Health Injection (RHI)
• Health of servers and/or applications needs to be taken into account
• Optionally, other metrics (like load) can be measured and utilized for a better selection

HTTP Redirection – The Idea

• Leveraging the HTTP redirect function: HTTP return code 302
• Proper site selection made after the initial DNS request has been resolved, via redirection
• Mainly as a method of providing site persistence while providing local server farm failure recovery
• Can be used with the “Location Cookie” feature of the CSS to provide redirection after wrong site selection

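The redirect idea can be sketched with Python's standard `http.server` module. This is an illustration under assumptions, not the CSS implementation: the site names (`www1.example.com`, `www2.example.com`), the `SITES` table and the "first healthy site wins" policy are all hypothetical.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical site table: one fully qualified name per site, plus a health flag
# that a real deployment would update from keepalives.
SITES = [
    {"host": "www1.example.com", "healthy": True},
    {"host": "www2.example.com", "healthy": True},
]

def pick_site(sites):
    """Return the host name of the first healthy site, or None if all are down."""
    for site in sites:
        if site["healthy"]:
            return site["host"]
    return None

def redirect_for(path, sites):
    """Build (status, headers): 302 to a healthy site, else 503."""
    host = pick_site(sites)
    if host is None:
        return 503, {}
    return 302, {"Location": f"http://{host}{path}"}

class RedirectHandler(BaseHTTPRequestHandler):
    """Answers every GET with a redirect to the selected site."""

    def do_GET(self):
        status, headers = redirect_for(self.path, SITES)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
```

To try it, the handler could be served with `HTTPServer(("", 8080), RedirectHandler).serve_forever()` from the same module. Because the client is sent to a site-specific name, it keeps using that site afterwards, which is the persistence property the slide describes.
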
HTTP Redirection – Traffic Flow

[Link]

[Link]

[Link]

Advantages of the HTTP Redirection Approach

• Can be implemented without any other GSLB devices or mechanisms
• Inherent persistence to the selected location
• Can be used in conjunction with other methods to provide more sophisticated site selection

Limitations of the HTTP Redirection Approach

• It is protocol specific – relies on HTTP
• Requires redirection to fully qualified additional names – additional DNS records
• Users may bookmark a specific location – losing automatic failover
• HTTPS redirect requires full SSL handshake to be completed first

DNS-Based Site Selection – The Idea

• The client D-proxy (local name server) performs iterative queries
• The device which acts as “site selector” is the authoritative name server for the domain(s) distributed in multiple locations
• The “site selector” sends keepalives to servers or server load balancers in the local and remote locations
• The “site selector” selects a site for the name resolution, according to the pre-defined answers and site load balance method
• The user traffic is sent to the selected location

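The site selector's job can be sketched as follows. This is a simplified model under stated assumptions: the `SiteSelector` class, the round-robin balance method, the 20-second TTL and the documentation-range addresses are all hypothetical, and no real DNS packets are handled.

```python
class SiteSelector:
    """Toy authoritative answer logic: round robin over sites that pass keepalives."""

    def __init__(self, sites, ttl=20):
        # sites: list of dicts such as {"vip": "192.0.2.10", "alive": True}
        self.sites = sites
        self.ttl = ttl      # short TTL limits how long D-proxies cache an answer
        self._next = 0      # round-robin counter

    def record_keepalive(self, vip, alive):
        """Store the latest keepalive result for a site's VIP."""
        for site in self.sites:
            if site["vip"] == vip:
                site["alive"] = alive

    def answer(self):
        """Return (vip, ttl) for the next healthy site, or None if none is healthy."""
        healthy = [s for s in self.sites if s["alive"]]
        if not healthy:
            return None
        site = healthy[self._next % len(healthy)]
        self._next += 1
        return (site["vip"], self.ttl)
```

A load-aware or proximity-aware method would replace the round-robin line; the health filter stays the same.
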
DNS-Based Site Selection – Traffic Flow

[Figure: numbered steps 1 to 10 between the Client, the DNS Proxy, the Root Name Server for /, the Authoritative Name Server for .com, the Authoritative Name Server for [Link], and the Authoritative Name Servers at Data Center 1 ([Link]) and Data Center 2 ([Link]). Name resolution uses UDP:53; application traffic uses TCP:80.]

Advantages of the DNS Approach

• Protocol independent: works with any application that uses name resolution
• Minimal configuration changes in the current IP and DNS infrastructure (DNS authoritative server)
• Implementation can be different for specific host names
• A-records can be changed on the fly
• Can take load or data center size into account
• Can provide proximity

Limitations of the DNS-Based Approach

• Visibility limited to the D-proxy (not the client)
• Cannot guarantee 100% session persistency
• DNS caching in the D-proxy
• DNS caching in the client application
• Order of multiple A-record answers can be altered by D-proxies

Route Health Injection – The Idea

• Server and application health monitoring provided by local Server Load Balancers
• SLB can advertise or withdraw the VIP address to upstream routing devices depending on the availability of the local server farm
• Same VIP addresses can be advertised from multiple data centers – IP Anycast
• Relying on L3 routing protocols for route propagation and content request routing
• Disaster Recovery provided by network convergence

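The advertise/withdraw behaviour can be modelled in a few lines. This is a conceptual sketch only: the data structure, the cost values and the function names are hypothetical, and a real deployment relies on the routing protocol rather than a lookup table.

```python
def advertised_routes(data_centers):
    """Each site injects a route to the shared VIP only while its server farm is healthy."""
    return {name: dc["cost"] for name, dc in data_centers.items() if dc["farm_healthy"]}

def best_path(routes):
    """Routers prefer the lowest-cost advertisement; None means the VIP is unreachable."""
    return min(routes, key=routes.get) if routes else None

# Hypothetical costs: site B is preferred (low cost), site A is the backup (high cost).
data_centers = {
    "A": {"cost": 1000, "farm_healthy": True},
    "B": {"cost": 10, "farm_healthy": True},
}
```

When site B's farm fails its route is withdrawn, and traffic for the same VIP converges on site A without any DNS change.
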
Route Health Injection – Implementation

[Figure: Client A and Client B reach two locations through Routers 10, 11, 12 and 13. Location A is the Backup Location for VIP x.y.w.z (advertised with Very High Cost); Location B is the Preferred Location for VIP x.y.w.z (advertised with Low Cost).]

Advantages of the RHI Approach

• Supports legacy applications and does not rely on a DNS infrastructure
• Very good re-convergence time, especially in Intranets where L3 protocols can be fine tuned appropriately
• Protocol-independent: works with any application
• Robust protocols and proven features

Limitations of the RHI Approach

• Relies on host routes (32 bits), which cannot be propagated all over the Internet (more on this later)
• Requires tight integration between the application-aware devices and the L3 routers
• Inability to intelligently load balance among the data centers

Cluster Overview

• A cluster is two or more servers configured to appear as one
• Two types of clustering: Load balancing (LB) and High Availability (HA)
• Clustering provides benefits for availability, reliability, scalability, and manageability
• LB clustering: multiple copies of the same application against the same data set, usually read only
• HA clustering: multiple copies of a long running application that requires access to a common data depository, usually read and write

[Figure: three server tiers: Web Servers, Application Servers, Database Servers]

HA Cluster Connections

• Public Network (typically Ethernet) for client/Application requests
• Servers with same hardware, OS, and application software
• Private Network (typically Ethernet) for interconnection between nodes. Could be direct connect, or optionally going through the public network
• Storage Disk (typically Fiber): shared storage array, NAS or SAN

Typical HA Cluster Components

• Application software that is clustered to provide High Availability. Example: Microsoft Exchange, SQL, Oracle database, File and Print Services
• Operating System that runs on the server hardware. Example: Microsoft Windows 2000 or 2003, Linux (and the other flavors of UNIX), IBM MVS or z/OS (for mainframe)
• Cluster Software that provides the HA clustering service for the application. Example: Microsoft MSCS, EMC AutoStart (Legato), Veritas Cluster Server, HP TruCluster and OpenVMS
• Optionally, Cluster Enabler, software that synchronizes the cluster software with the storage disk array software

Basic HA Cluster Design

• Active/Standby:
    – Active node takes client requests and writes to the data
    – Standby takes over when detecting failure on active
    – Two-node or multi-node
• Active/Active:
    – Database requests load balanced to both nodes
    – Lock mechanism ensures data integrity
    – Most scalable design

[Figure: node1 and node2]

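The active/standby takeover can be sketched as a heartbeat timer on the standby node. This is a simplified model, not any vendor's cluster software: the `StandbyNode` class and the dead interval are hypothetical, and time is passed in explicitly so the logic is easy to follow.

```python
class StandbyNode:
    """Standby side of an active/standby pair: takes over when heartbeats stop."""

    def __init__(self, dead_interval):
        self.dead_interval = dead_interval  # seconds of silence tolerated
        self.last_heartbeat = 0.0           # time of the last heartbeat received
        self.active = False

    def on_heartbeat(self, now):
        """Called whenever a heartbeat arrives from the active node."""
        self.last_heartbeat = now

    def tick(self, now):
        """Periodic check: go active once the peer has been silent for too long."""
        if not self.active and now - self.last_heartbeat > self.dead_interval:
            self.active = True
        return self.active
```

Taking over on missed heartbeats alone is what makes split-brain possible when only the heartbeat links fail, which is why a tie-breaker is needed in addition.
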
File System Approaches for HA Clusters

• Shared Everything
    – Equal access to all storage
    – Each node mounts all storage resources
    – Provides a single layout reference system for all nodes
    – Changes updated in the layout reference
• Shared Nothing
    – Traditional file system with peer-peer communication
    – Each node mounts only its “semi-private” storage
    – Data stored on the peer system’s storage is accessed via the peer-peer communication
    – Failed node’s storage needs to be mounted by the peer

Geo-clusters

Geo-cluster: a cluster that spans multiple data centers

[Figure: node1 in the Local Datacenter and node2 in the Remote Datacenter, connected over the WAN. Disk Replication: Synchronous or Asynchronous; 2 x RTT.]

Considerations for HA Clusters

• Split Brain: Cluster partitioning when nodes cannot communicate with each other but are equally capable of forming a cluster and mounting disks.
• Extended L2 required in most implementations for:
    – Public Network, since client only knows about the Virtual IP address
    – Private Network, used for Heart-beats
• Storage:
    – Directly Attached Disk (DAS) cannot be used
    – Shared Disk needs to be visible to both Nodes
    – Needs to interface with cluster software for disk failover, zoning, LUN masking when there is a node failure

Split-Brain

• Split-brain happens when all of the network communication links between two or more cluster nodes fail.
• Both nodes could potentially go active, and concurrently access the disk, thus corrupting data

[Figure: node1 and node2 both writing to the shared disk: Data Corruption]

Resolution for Split Brain: Quorum

• A quorum device serves as a tie breaker to arbitrate which system has access to resources.
• The quorum ensures that even if there is no communication between the nodes, only one node can continue to access the disk.
• Only the node that owns the quorum (or, majority quorum votes) can bring resources online.
• Any resource can be used as the arbitrator to break the tie.

[Figure: node1 and node2 attached to a quorum disk and to the application data]

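The majority rule can be written down directly. This sketch assumes one vote per node plus one vote for the quorum device; the function names and the vote model are illustrative assumptions and do not describe a specific cluster product.

```python
def has_majority(votes_held, total_votes):
    """Strict majority: more than half of all configured votes."""
    return 2 * votes_held > total_votes

def may_bring_online(reachable_nodes, owns_quorum_device, cluster_size):
    """Decide whether this partition may bring resources online.

    reachable_nodes counts the nodes in this partition, including the local one.
    The quorum device contributes one extra vote to whichever partition owns it.
    """
    votes = reachable_nodes + (1 if owns_quorum_device else 0)
    total = cluster_size + 1  # every node plus the quorum device
    return has_majority(votes, total)
```

In a two-node cluster whose heartbeat links have all failed, each node sees only itself: the node that owns the quorum device holds 2 of 3 votes and continues, while the other holds 1 of 3 and must stay offline.
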
Extended Layer 2 Network

• In most implementations, a common L2 network is needed for the heartbeat between the nodes, as well as public client access
• Extending VLAN on a geographical basis is not considered best practice because of the impact of broadcasts, multicast, flooding and Spanning-Tree integration issues

[Figure: node1 in the Local Datacenter and node2 in the Remote Datacenter, connected over the WAN by a Public Layer 2 network and a Private Layer 2 network. Disk Replication: Synchronous or Asynchronous.]

Resolution: L3 Routed Solution

• In certain cases a L3 routed solution is possible
• Microsoft MSCS
    – Requires that 2 nodes be on the same subnet.
    – The communication between the 2 nodes is UDP unicast
    – Local Area Mobility (LAM) allows the placement of the nodes on 2 different subnets
• Veritas VCS
    – Allows having nodes with IP addresses in different subnets
    – The Virtual Address needs to change when moving from node1 to node2
    – DNS can be used to provide name-to-multiple-IP mapping

[Figure: node1 and node2 on different subnets (172.28.210.x and 11.20.5.x), connected through an Extended SAN. Disk Replication: Synchronous or Asynchronous.]

Storage Disk Zoning

• What storage disk array should node 2 be zoned to before and after a failure on node 1
• To complete the failover you need to change the zoning configuration
• Software needed to synchronize the Cluster Software with the Disk Array’s software, i.e. Cluster Enabler

[Figure: node1 (active) and node2 (standby) on an Extended SAN with disk arrays sym1320 and sym1291; volumes marked RW and RD]

Resolution: Cluster Enabler

• The Cluster Enabler (CE) provides the interface between the Clustering Software and the Disk Array’s software
• When the Clustering Software detects a failure and wants to fail the node, the Cluster Enabler instructs the Disk Array to perform a failover
• Cluster Enabler also allows node1 to be zoned to sym1320 and node2 to be zoned to sym1291
• The Cluster Enabler running on each node typically communicates with the Cluster Enabler Software running on the remote node with Local Multicast messages

[Figure: node1 (active) and node2 (standby) on an Extended SAN with disk arrays sym1320 and sym1291; volumes marked RW and WD]

Terminology

• Storage subsystem
    Just a bunch of disks (JBOD)
    Redundant array of independent disks (RAID)
• Storage I/O devices
    Host Bus Adapter (HBA)
    Small Computer System Interface (SCSI)
• Storage protocols
    SCSI
    iSCSI
    FC (FCIP)

Terminology (Cont’d)

• Direct Attached Storage (DAS)
    Storage is “local” behind the server
    No storage sharing possible
    Costly to scale; complex to manage
• Network Attached Storage (NAS)
    Storage is accessed at a file level over an IP network
    Storage can be shared between servers
• Storage Area Networks (SAN)
    Storage is accessed at a block-level
    Separation of Storage from the Server
    High performance interconnect providing high I/O throughput

Storage for Applications

• Presentation Tier
    Unrelated small data files commonly stored on internal disks
    Manual distribution
• Application Processing Tier
    Transitional, unrelated data
    Small files residing on file systems
    May use RAID to spread data over multiple disks
• Storage Tier
    Large, permanent data files or raw data
    Large batch updates, most likely Real time
    Log and data on separate volumes

Backup and Replication

• Offsite tape vaulting
    Backup tapes stored at offsite location
• Electronic vaulting
    Transmission of backup data to offsite location
• Remote disk replication
    Continuous copying of data to offsite location
    Transparent to host
• Other methods of replication
    Host-based mirroring
    Network-based replication

Replication: Modes of Operation

• Synchronous
    All data written to cache of local and remote arrays before I/O is complete and acknowledged to host
• Asynchronous
    Write acknowledged after write to local array cache; changes (writes) are replicated to remote array asynchronously
• Semi-synchronous
    Write acknowledged with a single subsequent WRITE command pending from remote array

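The difference between the synchronous and asynchronous modes shows up in when a write is acknowledged and in what is lost if the primary fails. The toy model below is an assumption-laden sketch: the `ReplicatedVolume` class is hypothetical, arrays are plain lists, and network delay is not modelled.

```python
class ReplicatedVolume:
    """Toy model of a local array replicating writes to a remote array."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.local = []    # blocks in the local array cache
        self.remote = []   # blocks in the remote array cache
        self.pending = []  # blocks acknowledged to the host but not yet remote

    def write(self, block):
        """Accept a host write and acknowledge it according to the mode."""
        self.local.append(block)
        if self.synchronous:
            self.remote.append(block)   # remote cache is written before the ack
        else:
            self.pending.append(block)  # ack now, replicate later
        return "ack"

    def drain(self):
        """Background replication step for asynchronous mode."""
        self.remote.extend(self.pending)
        self.pending.clear()

    def lost_if_primary_fails(self):
        """Blocks the host believes are written but the remote site never received."""
        return list(self.pending)
```

In synchronous mode the pending list is always empty, which corresponds to zero data loss; in asynchronous mode the pending list is the exposure, and its size depends on how far replication lags.
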
Synchronous Vs. Asynchronous Trade-Off

                 Synchronous                              Asynchronous
Performance      Impact to Application Performance        No Application Performance Impact
Distance         Distance Limited (Are Both Sites         Unlimited Distance (Second Site
                 within the Same Threat Radius)           Outside Threat Radius)
Data loss        No Data Loss                             Exposure to Possible Data Loss

Enterprises Must Evaluate the Trade-Offs

• Maximum tolerable distance ascertained by assessing each application
• Cost of data loss

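The distance limit for synchronous replication comes from propagation delay added to every write. The estimate below is a rough sketch under stated assumptions: light travels through fiber at about 5 microseconds per kilometre, each write costs two round trips (the "2 x RTT" figure noted on the geo-cluster slide), and equipment and queuing delays are ignored.

```python
FIBER_US_PER_KM = 5.0  # assumed one-way propagation delay in fiber, microseconds per km

def sync_write_penalty_ms(distance_km, round_trips=2):
    """Added latency per synchronous write, in milliseconds, from propagation alone."""
    rtt_us = 2 * distance_km * FIBER_US_PER_KM  # out and back once
    return round_trips * rtt_us / 1000.0
```

Under these assumptions a 100 km separation adds about 2 ms to every write and 200 km adds about 4 ms, which is why the tolerable distance has to be assessed per application.
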
Data Replication with DB Example

[Figure: three kinds of database files. Control Files (DB name, creation date, backup performed, redo log time period, datafile state) identify the Datafiles and Redo Log Files. Datafiles (Tablespaces, Indexes, Data Dictionary) record changes to the Redo Log Files (Database changes).]

• Control Files identify other files making up the database and record content and state of the db.
• Datafile is only updated periodically
• Redo logs record db changes resulting from transactions
    Used to play back changes that may not have been written to datafile when failure occurred
    Typically archived as they fill to local and DR site destinations

Data Replication with DB Example (Cont’d)

[Figure: timeline from time t0 to time t1. At t0 a Hot Backup of Datafiles and Control Files is taken. Between t0 and t1 changes accumulate in the Archived Redo Logs and then the Online Redo Logs.]

Failure or disaster occurs at time t1
• Media Failure (e.g. disk)
• Human Error (datafile deletion)
• Database Corruption

• Database restored to state at time of failure (time t1) by:
    1. Restoring Control Files & Datafiles from last Hot Backup (time t0)
    2. Sequentially replaying changes from subsequent Redo Logs (archived and online) – changes made between time t0 and t1

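The two recovery steps above can be sketched as restore followed by replay. This is a conceptual illustration, not a database vendor's recovery procedure: the `recover` function, the dictionary-based "datafile" and the log entry format are hypothetical.

```python
def recover(backup, redo_logs, failure_time):
    """Rebuild database state at failure_time from a backup plus redo logs.

    backup: {"time": t0, "data": {...}}, the last hot backup
    redo_logs: list of {"time": t, "key": k, "value": v} change records
    """
    # Step 1: restore datafiles from the last hot backup (state at t0).
    db = dict(backup["data"])
    # Step 2: replay, in time order, every change made after t0 up to the failure.
    for entry in sorted(redo_logs, key=lambda e: e["time"]):
        if backup["time"] < entry["time"] <= failure_time:
            db[entry["key"]] = entry["value"]
    return db

# Hypothetical example: backup at t=0, failure at t=3.
backup = {"time": 0, "data": {"a": 1}}
redo_logs = [
    {"time": 2, "key": "b", "value": 5},
    {"time": 1, "key": "a", "value": 2},
    {"time": 5, "key": "a", "value": 9},  # after the failure time, never replayed
]
```

The result depends on having every redo record between t0 and t1, which is why the redo logs are the data most often replicated synchronously.
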
Data Replication with DB Example (Cont’d)

[Diagram: Primary Site and Secondary Site joined by a SAN Extension Transport.
 Redo Logs (Cyclic): synchronously replicated for zero loss, giving the secondary site a copy of every committed transaction.
 Database: point-in-time copy taken when the DB is quiescent, replicated/copied to give a database copy at time t0.
 Archive Logs: replicated/copied.
 Earlier DB backups are also shown.]

• Mixture of sync and async replication technologies commonly used
  Usually only redo logs sync replicated to remote site
  Archive logs created from redo log and copied when redo log switches
  Point in time (PiT) copies of datafiles and control files copied periodically (e.g. nightly)
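
The "cyclic" redo logs and the copy made on each log switch can be modeled as follows. This is a simplified illustration under assumed names (`CyclicRedoLog`, `dr_archive`); real databases size log groups in bytes rather than in number of changes and handle the archive copy asynchronously.

```python
from typing import List


class CyclicRedoLog:
    """Fixed set of redo log groups reused in a cycle; full groups are archived."""

    def __init__(self, groups: int = 3, capacity: int = 2) -> None:
        self.groups: List[List[str]] = [[] for _ in range(groups)]
        self.capacity = capacity
        self.current = 0
        self.local_archive: List[List[str]] = []
        self.dr_archive: List[List[str]] = []    # copy held at the DR site

    def write(self, change: str) -> None:
        """Record one change; switch logs when the current group is full."""
        self.groups[self.current].append(change)
        if len(self.groups[self.current]) == self.capacity:
            self._switch()

    def _switch(self) -> None:
        full = list(self.groups[self.current])
        self.local_archive.append(full)          # archive locally
        self.dr_archive.append(list(full))       # copy to the DR site on log switch
        self.current = (self.current + 1) % len(self.groups)
        self.groups[self.current] = []           # reuse the next group


if __name__ == "__main__":
    log = CyclicRedoLog(groups=2, capacity=2)
    for change in ["c1", "c2", "c3", "c4", "c5"]:
        log.write(change)
    print(log.dr_archive, log.groups[log.current])
```

In this model the change still sitting in the current group has not reached the DR archive, which is why the slide has the redo logs themselves replicated synchronously.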

Data Center Interconnection Options

[Diagram: two data centers with the same layout. Each has an Internet connection, stateful firewalls, content caching, server load balancing, intrusion detection, a high density multilayer LAN switch, front-end application servers, back-end application servers, a high density multilayer SAN director, and enterprise-class storage arrays. Interconnection options shown between the sites: SONET/SDH, DWDM/CWDM, and IP/Metro E.]

Data Center Transport Options

Increasing distance: Data Center, Campus, Metro, Regional, National

Optical
  Dark Fiber     Sync; limited by optics (power budget)
  CWDM           Sync (2Gbps); limited by optics (power budget)
  DWDM           Sync (2Gbps lambda); limited by BB_Credits
  SONET/SDH      Sync (1Gbps+ subrate); Async

IP
  MDS9000 FCIP   Sync (Metro Eth); Async (1Gbps+)
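
The "Limited by BB_Credits" entry can be illustrated with a back-of-the-envelope calculation: to keep a Fibre Channel link full, the sender needs enough buffer-to-buffer credits to cover all frames in flight during one round trip. The sketch below assumes about 5 µs/km propagation delay in fiber, full-size 2148-byte frames, and 8b/10b encoding; it ignores inter-frame idles and smaller frames, so treat the result as an approximation rather than a vendor sizing rule.

```python
import math

FIBER_DELAY_S_PER_KM = 5e-6      # assumed one-way propagation delay in fiber
FULL_FRAME_BYTES = 2148          # maximum-size Fibre Channel frame incl. headers
BITS_PER_BYTE_ON_WIRE = 10       # 8b/10b encoding used by 1 and 2 Gbps FC


def bb_credits_needed(distance_km: float, line_rate_gbaud: float,
                      frame_bytes: int = FULL_FRAME_BYTES) -> int:
    """Credits required to keep the link busy for one full round trip."""
    frame_time_s = frame_bytes * BITS_PER_BYTE_ON_WIRE / (line_rate_gbaud * 1e9)
    round_trip_s = 2 * distance_km * FIBER_DELAY_S_PER_KM
    return math.ceil(round_trip_s / frame_time_s)


if __name__ == "__main__":
    # 1 Gbps FC signals at 1.0625 Gbaud, 2 Gbps FC at 2.125 Gbaud.
    for rate in (1.0625, 2.125):
        print(f"{rate} Gbaud, 100 km: {bb_credits_needed(100, rate)} credits")
```

The result matches the usual rule of thumb of roughly one credit per kilometer at 2 Gbps and one credit per two kilometers at 1 Gbps for full-size frames.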

Data Center Replication with SAN Extension

• Extend the normal reach of a Fibre Channel fabric
  Replication
  Remote host to target array
  Shared data clusters

[Diagram: a shared data cluster or remote host accessing storage across two FC fabrics joined by a SAN extension network, which also carries replication between the arrays.]

SAN Design for Data Replication

• Servers with two Fibre Channel connections to storage arrays for high availability
  Use of multipath software is required in dual fabric host design
• SAN extension fabrics typically separate from host access fabrics
  Replication fabric requirements generally specified by array vendor

[Diagram: Site A and Site B, each with server access FC fabrics and separate replication FC fabrics; the replication fabrics are joined by a DC interconnect network.]

Data Center
Disaster Recovery
sample design

Disaster Impact Radius

[Diagram: concentric impact radii around the Primary Data Center: Local (1–2 km), Metro (< 50 km), Regional (< 400 km), Global. A Secondary Data Center and a DR Site are placed at increasing distance.]

• Disasters are characterized by their impact
  Local, metro, regional, global
  Fire, flood, earthquake, attack
• Is the backup site within the threat radius?
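
The threat-radius question can be turned into a simple check. The sketch below uses the radii from the slide (2 km as the upper bound of "local", 50 km for "metro", 400 km for "regional") and makes the simplifying assumption that a disaster is centered on the primary site, so the site separation can be compared directly with the radius. The function names are illustrative.

```python
# Radii taken from the slide; "global" has no upper bound.
IMPACT_RADII_KM = [("local", 2.0), ("metro", 50.0), ("regional", 400.0)]
SCOPE_ORDER = ["local", "metro", "regional", "global"]


def smallest_impact_scope(distance_km: float) -> str:
    """Smallest disaster scope that would hit both sites at this separation."""
    for scope, radius_km in IMPACT_RADII_KM:
        if distance_km < radius_km:
            return scope
    return "global"


def backup_survives(distance_km: float, disaster_scope: str) -> bool:
    """True if the backup site lies outside the given disaster's radius."""
    reach = SCOPE_ORDER.index(smallest_impact_scope(distance_km))
    return reach > SCOPE_ORDER.index(disaster_scope)


if __name__ == "__main__":
    for km in (1, 30, 200, 600):
        print(km, "km:", smallest_impact_scope(km),
              "| survives regional disaster:", backup_survives(km, "regional"))
```

A site 30 km away survives a local event but not a metro-wide one, which is why a design usually pairs a nearby synchronous site with a distant asynchronous DR site.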

Active/Standby Architecture - Today

[Diagram: three sites, each with hosts, MDS 9509 switches, a gateway, and storage.
 High Availability Site 1 (CA): Hosts 1, Storage 1.
 High Availability Site 2 (CA): Hosts 2, Storage 2 (Bunker).
 Disaster Recovery Site (NC): Hosts 3, Storage 3.
 Links shown: HA cluster(s), synchronous replication over CWDM, synchronous FCIP replication, electronic journaling, and asynchronous FCIP replication over dual OC12.]

Frame Based Replication

[Diagram: Data Center 1 (production cluster) and Data Center 2 (D/R) connected through MDS switches over dual OC12. EMC/DMX arrays hold PROD, Redo, and Arch volumes with TimeFinder point-in-time (PiT) copies (BCV, R1, R2); SRDF and SRDF/A replicate them to the D/R side. Label: Triple Threat.]

Active/Active Architecture - Tomorrow

[Diagram: Service Locator Group Data Centers, DC1 and DC2.
 GSS performs site (DC) selection according to pre-configured condition, using FQDN.
 ACE probes track application health.
 User request path: ACE decrypts request, ACNS content engine caches pages, ACE routes request.
 Requests are directed to the primary application in DC1 or to the backup application in DC2.
 Presentation layer is mirrored; asynchronous replication runs between the clustered backends.
 DC1 clustered backend: X active, Y standby (active Data X, standby Data Y).
 DC2 clustered backend: Y active, X standby (active Data Y, standby Data X).]
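
The site-selection behavior in this architecture can be sketched as a preference list per application combined with health information. This is not GSS or ACE configuration and uses no Cisco API: the `SITE_PREFERENCE` table and the `healthy` dictionary (standing in for probe results) are hypothetical names used only to illustrate the active/standby layout of applications X and Y.

```python
from typing import Dict, List, Optional

# Preference order per application, following the X/Y active/standby layout.
SITE_PREFERENCE: Dict[str, List[str]] = {
    "X": ["DC1", "DC2"],   # X is active in DC1, standby in DC2
    "Y": ["DC2", "DC1"],   # Y is active in DC2, standby in DC1
}


def select_site(app: str, healthy: Dict[str, bool]) -> Optional[str]:
    """Return the first healthy site for the application, or None if all are down."""
    for site in SITE_PREFERENCE[app]:
        if healthy.get(site, False):
            return site
    return None


if __name__ == "__main__":
    print(select_site("X", {"DC1": True, "DC2": True}))    # normal operation
    print(select_site("X", {"DC1": False, "DC2": True}))   # DC1 probe failing
```

With both sites healthy, each data center serves the application it is active for, so both carry production load; a failed probe only moves the affected application to its standby site.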

SANTap and Continuous Data Protection

• SANTap
  • Appliance based storage replication
  • Reliable copy of WRITE operations
  • SCSI-FCIP communication

• Continuous Data Protection
  • Automatic and Continuous Backups
  • Time Addressable Storage (TAS)
  • Any Point-in-Time Recovery
  • Application based or Network based

[Diagram: production servers attached to an MDS SAN with primary and secondary storage; SANTap on the MDS feeds a CDP appliance.]
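
The idea behind time-addressable storage and any-point-in-time recovery can be shown with a small write journal: every write is kept with its timestamp, so a block can be read back as of any earlier moment. This is a conceptual sketch only; `WriteJournal` and its methods are invented for illustration and do not represent SANTap or any CDP appliance interface.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Optional


class WriteJournal:
    """Keeps every write per block so any past point in time can be read back."""

    def __init__(self) -> None:
        self._times: Dict[int, List[float]] = defaultdict(list)
        self._data: Dict[int, List[bytes]] = defaultdict(list)

    def record_write(self, lba: int, timestamp: float, data: bytes) -> None:
        # Assumes writes for a given block arrive in timestamp order.
        self._times[lba].append(timestamp)
        self._data[lba].append(data)

    def read_as_of(self, lba: int, timestamp: float) -> Optional[bytes]:
        """Return the block's content as of `timestamp`, or None if never written."""
        times = self._times.get(lba, [])
        i = bisect.bisect_right(times, timestamp)
        return self._data[lba][i - 1] if i else None


if __name__ == "__main__":
    journal = WriteJournal()
    journal.record_write(lba=7, timestamp=1.0, data=b"good")
    journal.record_write(lba=7, timestamp=5.0, data=b"corrupted")
    print(journal.read_as_of(7, 4.9))   # state just before the bad write
```

Unlike a nightly point-in-time copy, the journal lets recovery stop immediately before a corrupting write, at the cost of storing every intermediate version.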

Fabric Based Replication with CDP

[Diagram: Data Center 1 (production cluster) and Data Center 2 (D/R) connected through MDS switches over dual OC12. SANTap on the MDS feeds a replication/CDP appliance at each site, backed by TAS/SATA storage holding any-point-in-time (APiT) copies. EMC/DMX arrays hold PROD, Redo, and Arch volumes, replicated with SRDF/A to the D/R array and its BCV.]

End-End Data Center Resilience

[Diagram: Corp. DNS with GSS-1 and GSS-2 in front of three data centers. DC-1 and DC-2 are in the Primary Location and DC-3 is in the Secondary Location; each has an ACE (ACE-1, ACE-2, ACE-3), a Web/App server farm, DB servers, and FC-attached storage. The sites are joined by an IP/optical network (CWDM/DWDM).]

Summary - Design Details
• Data centers 1 and 2 are in the primary location, close enough to each other to provide DC HA for active/active access
• Data center 3 (DR) is farther than the tolerable disaster radius from primary DCs 1 and 2
• Web/App server farms are load balanced geographically
• DB servers are within a geo-HA cluster and running in an L3 design
• Synchronous data replication runs between the data centers within the primary location
• Asynchronous data replication is done between the primary and secondary storage systems
