VPC Configuration Best Practices

Data Center Operations and Maintenance Best Practices

Arvind Durai, Director Solutions Integration


Anis Edavalath, Technical Leader Engineering

BRKDCN-2458
BRKDCN-2458 © 2020 Cisco and/or its affiliates. All rights reserved. Cisco Public 3
Arvind Durai
– 20 years with Cisco Advanced Services
– Has worked with 100+ customers in enterprise architecture, technology designs and operational simplification
– 11 years as an active Cisco Live presenter
– Co-authored five Cisco Press books: Cisco Firewall Services Module, Virtual Routing in the Cloud, TCL Scripting for Cisco IOS and IP Multicast vol 1 & 2
– CCIE R/S and Security #7016

Anis Edavalath
– 7 years with Cisco Advanced Services
– Enterprise Campus and Datacenter across different verticals
– Worked 10 years with BU engineering groups in security, switching, datacenter and network management products
– Design and deployment of next-gen data center architecture for enterprise and cloud customers
– AS team lead for ACI, VxLAN, Tetration, SDA (uniform policy)
– Worked with major telecom vendors and cloud providers prior to Cisco
– CCIE Datacenter #48152

Contributors: Satish Kondalam, Nick Garner, Junmei Zhang and many others from the Nexus TME team.
Course Objective & Goal

• To help Data Center operations and engineering staff understand the operational
best practices when maintaining a Cisco Nexus data center network deployment.

• Attendees should leave the session with a firm understanding of


• Operational Best Practices & next gen tools
• Nexus Graceful Insertion and Removal
• Change Window Best Practices

Agenda
• vPC and VxLAN Refresher

• Operational Best Practices: Software

• Operational Best Practices: Hardware

• Node Isolation

• NX-OS Graceful Insertion and Removal

• ACI Operational Best Practices

• Data Center operation tool framework & use case demo

• Data Center Behavioral Monitoring - Tetration

• Change Window Best Practices


DC Baseline Refresher
vPC Feature Overview (For Your Reference)
• vPC Terminology: vPC Peer, vPC Peer Keepalive Link, vPC Domain, Peer Link, CFS, vPC, vPC Member Port, Orphan Port, Orphan Device
[Figure: vPC peers S1 and S2 in one vPC domain, joined by the peer link and the peer keepalive link; downstream switch S3 attached through a vPC; an orphan device attached through an orphan port]

Failure Scenario
• If both peers are active, then Secondary vPC peer will disable all vPCs to avoid Dual-Active.
• Data will automatically forward down remaining active port channel ports.
• Loss of in-flight packets will depend on deployment of vPC best practice.
vPC best practice
vPC general deployment best practice

• vPC Domain IDs
✓ Use a unique vPC domain ID within a contiguous L2 domain to avoid MAC overlap.
• vPC Peer Link
✓ Should be point-to-point connection & dedicated links.
• vPC Peer Keepalive Link
✓ Dedicate a control plane in a dual-supervisor environment. Use a management switch.
• vPC peer-gateway
✓ Acts as active gateway for frames addressed to peer switch. Avoid Peer Link forwarding.
• Use vPC peer-switch
✓ Optimizes BPDU processing, single logical L2 entity
• Distribute port-channel member interfaces across line cards within the same chassis.
• Create a map for oversubscription aligned to current and future demand.
✓ Deployment practice – 20:1 at access and 2:1 at Core.
[Figure: impact vs. ease-of-implementation quadrant (high/low impact, hard/easy to implement); "QUICK WINS!" marks the high impact / easy to implement quadrant]
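The oversubscription guidance above (20:1 at access, 2:1 at core) can be checked with simple arithmetic. The sketch below is illustrative only: the port counts and speeds are hypothetical, and the helper names `oversubscription_ratio` and `within_target` are assumptions made for this example, not part of any Cisco tool.

```python
# Minimal sketch: compare a switch's oversubscription ratio to a target.
# Port counts and speeds below are hypothetical examples.

ACCESS_TARGET = 20.0  # 20:1 at access, per the deployment practice above
CORE_TARGET = 2.0     # 2:1 at core, per the deployment practice above


def oversubscription_ratio(downlink_gbps, uplink_gbps):
    """Return total downlink bandwidth divided by total uplink bandwidth."""
    total_down = sum(downlink_gbps)
    total_up = sum(uplink_gbps)
    if total_up <= 0:
        raise ValueError("at least one uplink with positive bandwidth is required")
    return total_down / total_up


def within_target(ratio, target):
    """True when the measured ratio does not exceed the target ratio."""
    return ratio <= target


# Hypothetical access switch: 48 x 10G host ports, 2 x 40G uplinks.
access_ratio = oversubscription_ratio([10] * 48, [40] * 2)
# Hypothetical core switch: 8 x 100G downlinks, 2 x 100G uplinks.
core_ratio = oversubscription_ratio([100] * 8, [100] * 2)

print(f"access {access_ratio}:1 ok={within_target(access_ratio, ACCESS_TARGET)}")
print(f"core   {core_ratio}:1 ok={within_target(core_ratio, CORE_TARGET)}")
```

Re-running the same calculation with projected port counts is one way to keep the oversubscription map aligned to future demand.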
VXLAN Overview
Layer 2 overlay on top of your Layer 3 underlay
▪ Each VXLAN Segment is identified by a unique 24-bit segment ID called a VXLAN Network Identifier (VNI)
▪ Only hosts on the same VNI are allowed to communicate with each other
▪ Original L2 packet is encapsulated with VXLAN header in a UDP->IP->Ethernet
Overcome 4094 VLAN Scale Limitation
▪ VLANs use a 12-bit VLAN ID
Better utilization of available network paths
▪ No need of Spanning Tree (blocks paths)
▪ Utilize L3 underlay network (ECMP, Link Agg,…)
Multi-Tenant with virtualization
▪ Isolation of network traffic by a tenant and reusability of networking taxonomy for tenancy
[Figure: spine/leaf fabric with an L3 underlay (unicast/multicast routing). HOST1 (MAC H1) sits behind leaf VTEP A and HOST2 (MAC H2) behind leaf VTEP B; both map VLAN 1 → VNI 1000. Original frame: DMAC H2 | SMAC H1 | Original L2 Data. Encapsulated frame: Outer MAC | Outer SIP/DIP | UDP Port | VxLAN VNI 1000 | DMAC H2 | SMAC H1 | Original L2 Data]

VTEP A or VTEP B in deployment will be a pair, and this pair will provide host redundancy for Layer 2 via VPC. VPC is still NEEDED and VTEP will represent the VPC pair!
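The scale difference between a 12-bit VLAN ID and a 24-bit VNI can be worked out directly. The sketch below also packs the 8-byte VXLAN header as laid out in RFC 7348 (8 flag bits, 24 reserved bits, 24-bit VNI, 8 reserved bits). The function names are assumptions made for this example.

```python
import struct

VLAN_ID_BITS = 12  # IEEE 802.1Q VLAN ID width
VNI_BITS = 24      # VXLAN Network Identifier width


def usable_vlans():
    """VLAN IDs 0 and 4095 are reserved, leaving 4094 usable IDs."""
    return 2 ** VLAN_ID_BITS - 2


def vni_space():
    """Number of distinct 24-bit VNI values."""
    return 2 ** VNI_BITS


def vxlan_header(vni):
    """Pack the 8-byte VXLAN header for a given VNI (layout per RFC 7348)."""
    if not 0 <= vni < 2 ** VNI_BITS:
        raise ValueError("VNI must fit in 24 bits")
    flags = 0x08  # "I" flag set: the VNI field is valid
    # Word 1: flags in the top byte, 24 reserved bits.
    # Word 2: VNI in the top 24 bits, 8 reserved bits.
    return struct.pack("!II", flags << 24, vni << 8)


print(usable_vlans())            # usable VLAN IDs
print(vni_space())               # possible VNIs
print(vxlan_header(1000).hex())  # header for VNI 1000, as in the figure
```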
VxLAN refresher with BGP EVPN Address Learning
Broadcast, Unicast and Multicast traffic can use either Multicast group or Ingress replication of traffic (not covered).

Hosts' Setup
Vlan 10: VNI 10000
VRF FOO: VNI 20000

[Figure: Host A (MAC-A, IP-A) behind VTEP-1, Host B (MAC-B, IP-B) behind VTEP-2, Host C (MAC-C, IP-C) behind VTEP-3. Each host sends a GARP (Target MAC and Target IP of the host). Its local VTEP then sends a BGP EVPN update carrying the host MAC and IP with their VNIs and Nexthop set to that VTEP's IP.]

Resulting remote entries
On VTEP-1
MAC-B  10000    VTEP-2-IP
IP-B   VRF FOO  VTEP-2-IP
MAC-C  10000    VTEP-3-IP
IP-C   VRF FOO  VTEP-3-IP
On VTEP-2
MAC-A  10000    VTEP-1-IP
IP-A   VRF FOO  VTEP-1-IP
MAC-C  10000    VTEP-3-IP
IP-C   VRF FOO  VTEP-3-IP
On VTEP-3
MAC-A  10000    VTEP-1-IP
IP-A   VRF FOO  VTEP-1-IP
MAC-B  10000    VTEP-2-IP
IP-B   VRF FOO  VTEP-2-IP
vPC Configuration Best Practices
vPC Auto-recovery
[Figure: three stages of the vPC pair S1/S2 with downstream switch S3. P = vPC Primary, S = vPC Secondary; in the last stage S2 is Operational Primary]
1. vPC peer-link down: S2 (secondary) shuts all its vPC member ports
2. S1 down: vPC peer-keepalive link down: S2 receives no keepalives
3. After 3 keepalive timeouts, S2 changes role and brings up its vPC

Auto-recovery addresses two cases of single switch behavior
• Peer-link fails and after a while primary switch (or keepalive link) fails
• Both VPC peers are reloaded and only one comes back up
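The three-step auto-recovery sequence can be modelled as a small state sketch. This is a simplified illustration of the behaviour described on the slide, not the NX-OS implementation: the class `VpcPeer`, its fields and its methods are hypothetical, and real timers and role names are more detailed.

```python
from dataclasses import dataclass

# From the slide: "After 3 keepalive timeouts, S2 changes role".
KEEPALIVE_TIMEOUTS_BEFORE_TAKEOVER = 3


@dataclass
class VpcPeer:
    """Hypothetical model of the secondary peer (S2) during auto-recovery."""
    role: str = "secondary"
    auto_recovery: bool = True
    peer_link_up: bool = True
    vpcs_up: bool = True
    missed_keepalives: int = 0

    def peer_link_down(self):
        # Step 1: the secondary shuts all of its vPC member ports.
        self.peer_link_up = False
        if self.role == "secondary":
            self.vpcs_up = False

    def keepalive_timeout(self):
        # Step 2: the primary is gone, so keepalives stop arriving.
        self.missed_keepalives += 1
        takeover = (
            self.auto_recovery
            and self.role == "secondary"
            and not self.peer_link_up
            and self.missed_keepalives >= KEEPALIVE_TIMEOUTS_BEFORE_TAKEOVER
        )
        if takeover:
            # Step 3: change role and bring the vPCs back up.
            self.role = "primary"
            self.vpcs_up = True


s2 = VpcPeer()
s2.peer_link_down()
for _ in range(KEEPALIVE_TIMEOUTS_BEFORE_TAKEOVER):
    s2.keepalive_timeout()
print(s2.role, s2.vpcs_up)
```

With `auto_recovery=False` the same sequence leaves the peer in the secondary role with its vPCs down, which is the single-switch outage the feature is meant to avoid.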
vPC Configuration Best Practices
Object-tracking
• vPC object tracking tracks both peer-link and uplinks in a list of Boolean OR
• Object Tracking triggered when the track object goes down
• Suspends the vPCs on the impaired device.
• Traffic forwarded over the remaining vPC peer.
[Figure: vPC peers S1 and S2 with uplinks to S4 and S5 and downstream switch S3]

! Track the vpc peer link
track 1 interface port-channel11 line-protocol
! Track the uplinks
track 2 interface Ethernet1/1 line-protocol
track 3 interface Ethernet1/2 line-protocol
! Combine all tracked objects into one.
! “OR” means if ALL objects are down, this object will go down
track 10 list boolean OR
  object 1
  object 2
  object 3
! If object 10 goes down on the primary vPC peer,
! system will switch over to other vPC peer and disable all local vPCs
vpc domain 1
  track 10
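The Boolean OR semantics of the track list are easy to misread, so here is a minimal sketch of the logic. It models only the truth table: the function name and the dictionary of object states are assumptions made for this example.

```python
def track_list_or(object_states):
    """A boolean OR track list is up while at least one member object is up.

    It goes down only when ALL member objects are down.
    """
    return any(object_states.values())


# Object numbers mirror the configuration: 1 = peer link, 2 and 3 = uplinks.
peer_link_lost = {1: False, 2: True, 3: True}
uplinks_lost = {1: True, 2: False, 3: False}
fully_isolated = {1: False, 2: False, 3: False}

for name, states in [
    ("peer link lost", peer_link_lost),
    ("uplinks lost", uplinks_lost),
    ("fully isolated", fully_isolated),
]:
    state = "up" if track_list_or(states) else "down (vPCs suspended)"
    print(f"{name}: track 10 is {state}")
```

Only the last case, where the peer link and every uplink are down together, takes track 10 down and triggers suspension of the local vPCs.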
VPC Shutdown Feature
This feature allows customer to manually “isolate” a switch from vPC domain. This is a vPC configuration option.

Pre-VPC Shutdown
• No “shutdown” command.
• Manual Shutdown Required
  • Down vPCs
  • Down Peer Link
  • vPC Members
  • Etc.

VPC Shutdown Behavior
• Local switch isolated from remote.
• Cannot exit shutdown without manual intervention.
• When exiting, PKA, PL, and vPCs will be re-initialized; vPC domain brought to normal state.

[Figure: Configure and De-configure views of the Primary and Secondary peers joined by the PKA, carrying Vlan 1-100]
Availability: 3k/5k/6k/7k/9k
VPC Self-isolation
▪ Automatically triggered isolation
▪ Example Presented: All Line Cards Fail

Current Impact
• When this failure happens on primary, peer-link is brought down.
• This causes the secondary to bring down all legs.
• Traffic is completely blocked.

Self-isolation feature Behavior
When this failure happens:
• Physically bring down peer-link
• Physically bring down all vPC legs
• Send self-isolation through peer-keep-alive
Peer switch:
• Receive self-isolation from the peer through peer-keep-alive
• Change role to Primary
• Bring up all down vPC legs

BU Testing Results: Sub-second Recovery (N>S) (S>N) (E>W)
[Figure: Current state (Primary, PKA, Secondary) and Self-isolation state (roles swapped), carrying Vlan 1-100]

NOTE: Available in NX-OS 7.2, 5k/6k/7k
Stateful Switchover Mode
SSO-Aware and SSO-Compliant Applications
[Figure: the Active Supervisor and the Standby Hot Supervisor run the same application stack and are linked by the Redundancy Facility and the Checkpointing Facility]

SSO-Compliant Applications
• Forwarding Information Base
• IEEE 802.1x
• PAgP / LACP
• …and more

SSO-Aware Applications
• Routing Protocols
• NetFlow
• Cisco Discovery Protocol
• …and more
Routing Protocol Redundancy With NSF (Graceful Restart)

Active Supervisor Engine Slot 1
EIGRP RIB
Prefix        Next Hop
10.0.0.0      10.1.1.1
10.1.0.0      10.1.1.1
10.20.0.0     10.1.1.1
OSPF RIB
Prefix        Next Hop
192.168.0.0   192.168.0.1
192.168.55.0  192.168.55.1
192.168.32.0  192.168.32.1
ARP Table
IP            MAC
10.1.1.1      aabbcc:ddee32
10.1.1.2      adbb32:d34e43
10.20.1.1     aa25cc:ddeee8
FIB Table
Prefix        Next HOP
10.1.1.1      aabbcc:ddee32
10.1.1.2      adbb32:d34e43
192.168.0.0   aa25cc:ddeee8

Standby Supervisor Engine Slot 2
EIGRP RIB, OSPF RIB and ARP Table: empty (-)
FIB Table: same entries as on the active supervisor, kept in sync by SSO (Redundancy Facility, Checkpoint Facility)
Routing Protocol Redundancy With NSF (Graceful Restart)
[Figure: Standby Supervisor Engine Slot 2 after switchover. The FIB Table keeps its entries (10.1.1.1 → aabbcc:ddee32, 10.1.1.2 → adbb32:d34e43, 192.168.0.0 → aa25cc:ddeee8), while the previously empty EIGRP RIB, OSPF RIB and ARP Table are repopulated.]
GR/NSF Signaling per protocol
Synchronization per protocol
Routing Protocol Redundancy With NSR (Stateful Restart)

Active Supervisor Engine Slot 1 and Standby Supervisor Engine Slot 2 hold identical tables:
BGP RIB
Prefix        Next Hop
10.0.0.0      10.1.1.1
10.1.0.0      10.1.1.1
10.20.0.0     10.1.1.1
OSPF RIB
Prefix        Next Hop
192.168.0.0   192.168.0.1
192.168.55.0  192.168.55.1
192.168.32.0  192.168.32.1
ARP Table
IP            MAC
10.1.1.1      aabbcc:ddee32
10.1.1.2      adbb32:d34e43
10.20.1.1     aa25cc:ddeee8
FIB Table
Prefix        Next HOP
10.1.1.1      aabbcc:ddee32
10.1.1.2      adbb32:d34e43
192.168.0.0   aa25cc:ddeee8

Synchronization: SSO (Redundancy Facility, Checkpoint Facility)
Routing Protocol Redundancy With NSR (Stateful Restart)
[Figure: Standby Supervisor Engine Slot 2 after switchover, still holding the full BGP RIB, OSPF RIB, ARP Table and FIB Table]
No additional signaling required to maintain topology
Standalone Chassis Redundant Core
Failure or Change at the Core
Best practices:
• Redundant topologies with equal cost paths provide sub-second convergence.
• NSF/SSO provides superior availability in environments with non-redundant paths.
• Enable BFD for all OSPF neighbor links
• Adjust OSPF spf-throttling timers with:
  timers throttle spf
  timers throttle lsa
  timers lsa arrival
[Chart: Seconds of Lost Voice* (0 to 6) for Link Failure, Node Failure, NSF SSO, SPF Throttle and OSPF Convergence; legend: Layer 3, Layer 2, Hardware. RP convergence is dependent on IGP and tuning.]
* Route scale dependent.
Operational Best Practices
Software

NX-OS High Availability
Process Modularity
• Independent memory-protected restart-able processes
• Service Restart-ability
• Stateful Restart with Persistent Storage Service (PSS)
  • Checkpoints states to PSS
  • Recover states from PSS upon restart.
• Stateful Restart with Graceful Restart
  • Recover states based on information from other services and/or network.
  • Mainly Routing Protocols
• Stateless Restart
  • Fresh start, no trace of former instantiation.
[Figure: NX-OS software architecture. Management (SNMP, XML, CLI); Layer-2, Layer-3 and Storage protocol processes plus Other Services (VLAN mgr, UDLD, OSPF, GLBP, VSANs, STP, CDP, BGP, HSRP, Zoning, IGMP snp, LACP, 802.1X, CTS, EIGRP, PIM, VRRP, SNMP, FCIP, FSPF, IVR, …); Future Services Possibilities; Sysmgr, PSS & MTS; Protocol Stack (IPv4 / IPv6 / L2); Interface Management; Chassis Management; Chip/Driver Infrastructure; Kernel]
In-Service Software Upgrade
Nexus# install all nxos bootflash:nxos.9.2.3.bin
1. Upgrade and reboot (standby supervisor)
2. Initiate stateful failover
3. Upgrade and reboot (former active supervisor)
4. Upgrade I/O modules
[Figure: Active and Standby supervisors, each running OSPF, BGP, PIM, etc. on top of the HA Manager and Linux Kernel, move from Release 7.0(3)I7(4) to Release 9.2(3); the I/O module images in the Nexus data plane move from Release 7.0(3)I7(4) to Release 9.2(3)]

Best Practice: VPCs should be distributed.
NX-OS High Availability
Supervisor Switchover
• Stateful Switchover (SSO)
  • Active-backup supervisors synchronized at all times
  • Routing Protocols: → PSS Stateful Restart
                       → NSF Graceful Restart failover
  • Other components: → PSS Stateful Restart
• Triggers:
  • HA Policy Initiated – e.g. 3 component crashes → SSO
  • User Initiated – system switchover
  • ISSU initiated SSO
[Figure: Active Sup, Backup Sup and LC - NSF]
NX-OS High Availability
ISSU
• Dual-supervisor failover only
• ISSU is user initiated:
  • Compatibility Check: show install all impact …
  • Through CLI
    For N7k: install all kickstart <kickstart image> system <system image> cmp <cmp image>
    For N9k: install all nxos <system image>
• Components upgraded:
  • Supervisor: BIOS, System image
  • Linecard: BIOS and Linecard image
• System wide upgrade
• Single-supervisor ISSU is not possible on the modular n9k. Service disruption might occur.*

ISSU flow
1. Upgrade BIOS on Active, Standby Sup and Linecards
2. Bring up Standby Sup with new image
3. Switchover (Standby takes over as Active)
4. Bring up old Active Sup with new image
5. Upgrade CMP
6. Perform HITLESS LC Upgrade (one at a time)
7. Upgrade Done
NX-OS High Availability - innovation
ISSU
ISSU on EoR

Enhanced ISSU or LXC ISSU on N9k ToR (Single Sups)
• By Creating Virtual instances on Sup and LC
• Separate standby sup is brought up inside LXC
• 6s Control plane down time
Defect Impact

TAC: You’ve encountered defect CSCxy12345. It’s operationally impacting and, I’m sorry to say, there’s no workaround. You’ll need to upgrade.

You: Fine. Let’s just get it fixed.
Bill, start up a war room.
John, get our AS NCE on the phone.
Sally, schedule testers in two hours.
Where’s my $#@! coffee?

Sally: You know how Richard gets when we call him at 2 AM...

Then:
“Belay my last. We have a SMU for that.”
“What?”
“Gesundheit.”
Software Patching in NX-OS
Who’s familiar with Software Maintenance Updates (SMU)?

Overview
• Software Patching is Platform Independent
  • Available on Nexus 9000 (6.1(2)I2)
  • FCS NX-OS 7.2 (5/6/7k)
• Fully supported with ISSU

Benefits
• Reduce time to resolution in your network.
• SMUs in NX-OS build upon years of experience in IOS XR.
• Simplify customer operations for defect resolution and code qualification.
• Better utilize the software HA capabilities of NX-OS.
• Provide a common cross-platform experience (N9K/N7K/N6K/N5K).
SMU Lifecycle – CLI
[Figure: SMU lifecycle on any Nexus chassis supporting patching; each state shows whether the SMU is present in memory and in the running process]
1. Copy to Device (from the SMU Repository): Switch# install add …
2. SMU Applied: Switch# install activate …
3. SMU Committed: Switch# install commit …
4. Deactivate: Switch# install deactivate …
5. SMU Committed: Switch# install commit …
6. SMU Removed: Switch# install remove …

Verification commands:
show install active
show install committed
show install inactive
show install packages
show install pkg-info …
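The lifecycle can be read as a small state machine in which each `install` command moves the SMU to the next state. The sketch below encodes only the transitions shown on the slide. The state names (`in_repository`, `inactive`, and so on) are hypothetical labels chosen for this example, not NX-OS terminology, and the real CLI accepts additional paths.

```python
# Hypothetical state names; the commands are the ones listed on the slide.
TRANSITIONS = {
    ("in_repository", "install add"): "inactive",
    ("inactive", "install activate"): "active",
    ("active", "install commit"): "active_committed",
    ("active_committed", "install deactivate"): "deactivated",
    ("deactivated", "install commit"): "inactive",
    ("inactive", "install remove"): "removed",
}


def apply_command(state, command):
    """Return the next state, or raise if the command is not valid here."""
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(f"'{command}' is not modelled from state '{state}'")


def run(commands, state="in_repository"):
    """Apply a sequence of install commands and return the final state."""
    for command in commands:
        state = apply_command(state, command)
    return state


patch = ["install add", "install activate", "install commit"]
unpatch = ["install deactivate", "install commit", "install remove"]

print(run(patch))            # state after applying and committing the SMU
print(run(patch + unpatch))  # state after backing the SMU out again
```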
Patching Highlights
• Patching is for operationally impacting bugs without a workaround.
• Cannot patch to next release.
• Patching is done in default/admin VDC and applies to all VDCs.
• Patching is not available per-VDC.
• ISSU will work with all, or a subset of patches applied.
• You don’t need to apply all patches.
• Some SMUs may only have a single fix, others may have multiple packaged.

SMU Types
• Restart: Restarts affected process
  • Process restarted in all VDCs where running.
• ISSU SMU:
  • Dual Sup -> ISSU
  • Single Sup -> Reload
Patching Highlights

• SMUs are TAC supported.


• SMUs are synched to standby supervisor.
• On Sup replacement, patch(es) will be synchronized.
• SMUs are not for feature implementation. A SMU cannot change
the configuration.

Operational Best Practices
Hardware Maintenance

Hardware Maintenance
Electronic Programmable Logic Device Upgrade Example
NX-OS >= 6.1: Parallel EPLD Upgrades!
The following example upgrades the EPLD image for module 1. The EPLD image should be local when the upgrade is performed.
This procedure is typically not required during an NX-OS upgrade.
EPLD upgrades are intrusive and may take up to 30 minutes per module!

n7000# install module 1 epld bootflash:n7000-s1-epld.4.0.1.img
EPLD image file , built on Mon Mar 31 10:31:48 2008
EPLD               Curr Ver   New Ver
-------------------------------------------------------
Power Manager      4.1        5.3
IO                 2.6        2.10
Forwarding Engine  1.4        1.6
WARNING: Upgrade process could take up to 30 minutes.
Module could be powered down and up.

Module 1 will be powered down now!!

Do you want to continue (y/n) ? [n] y
Module 1 EPLD upgrade is successful.

• The “install” command highlights the EPLD version differences.
• The user is prompted to continue.
Hardware Maintenance
• Scenario: Line Card Hardware Upgrade or Replacement
• Power down line card prior to removal.
  Nexus# out-of-service module <module-number>
• Hitless with VPC provided sufficient bandwidth and port-channel distribution.
• Mixed line card deployment between VPC peers is not supported. However:
  #conf t
  #vpc domain <id>
  #bypass module-check
  Not BP, only corner case, change window.
NOTE: Evaluate the VDC interface assignments to verify which VDCs will experience a service impact.
Line card support matrix:
http://www.cisco.com/c/dam/en/us/td/docs/switches/datacenter/nexus7000/sw/matrix/technical/reference/Module_Comparison_Matrix.pdf
Hardware Maintenance
Scenario: Chassis Hardware Upgrade
• Bring switch being replaced into Graceful Insertion and Removal mode or
manually isolate prior to power down.
• Gas up your fork lift.

Scenario: Fabric Module Hardware Upgrade


• Don’t oversubscribe the fabric when replacing fabric modules.
• n7000# show hardware fabric-utilization
Scenario: Power Supply Hardware Upgrade
• Online Insertion and Removal (OIR) is supported.
• Be mindful of power budget.
• n7000# show environment power
NX-OS Graceful Insertion and Removal

Protocol Isolation in Nexus
• IGPs

Option 1: Isolate (Recommended)
  OSPF: Advertise as Stub Router, LSInfinity
    max-metric router-lsa [ on-startup [ seconds | wait-for bgp tag ]]
  IS-IS: Advertise with LSP Database Overload Bit set
    set-overload-bit {always | on-startup {seconds | wait-for bgp as-number}} [suppress [interlevel | external]]
  EIGRP: Manipulate Metrics
    interface e1/1
      ip delay eigrp instance-tag seconds

Option 2: Shutdown Protocol
  OSPF:
    router ospf 1
      shutdown
  IS-IS:
    router isis 1
      shutdown
  EIGRP:
    router eigrp 1
      shutdown

Option 3: Interface Disable
  OSPF:
    interface e1/1
      ip ospf shutdown
  IS-IS:
    interface e1/1
      isis shutdown
  EIGRP:
    interface e1/1
      ip eigrp 1 shutdown
Protocol Isolation in Nexus
BGP

Option 1 (Recommended): Advertise prefixes with longer AS path / higher local-preference

switch(config)# route-map prepend
switch(config-route-map)# match as-path 1
switch(config-route-map)# set as-path prepend last-as 3
switch(config)# router bgp 65000
switch(config-router)# neighbor 192.168.10.2 remote-as 20
switch(config-router-neighbor)# address-family ipv4 unicast
switch(config-router-neighbor-af)# route-map prepend out

Option 2: Shutdown BGP (Process), Preserve Configuration

router bgp 65010
  shutdown
NOTE: This is not a graceful shutdown such as you would achieve with GSHUT / RFC 6198.
Graceful Removal
Nexus 9k/7k/6k
Availability: 3k/5k/6k/7k/9k

router bgp 33
  isolate          Discontinue advertisement of all prefixes.
router eigrp 1
  isolate          Advertises maximum metrics for all K-values.
router ospf 1
  isolate          max-metric router-lsa
router isis 1
  isolate          set-overload-bit
Graceful Insertion
Nexus 9k/7k/6k/5k
Availability: 3k/5k/6k/7k/9k
• Move the switch from Maintenance mode to Normal mode.
• Control plane maintained throughout isolation of the switch.
• Protocols advertise routes only after they are installed in hardware.

N9372(config)# no system mode maintenance
Following configuration will be applied:
router bgp 33
  no isolate
router eigrp 1
  no isolate
router ospf 1
  no isolate
router isis 1
  no isolate
Protocol Isolation in Nexus
• All Protocols

Option 4: System Interface Shutdown


system interface shutdown

For many, this is good enough.


And, easy!

Graceful Insertion and Removal
Isolate for Change Window
feature ospf
  OSPF: max-metric router-lsa
feature vpc
  VPC: shutdown
Scripting takes time.
It’d be nice to automate this…
Graceful Insertion and Removal

Change window begins.

vPC vPC

system mode maintenance

One command!
Pre-change System Snapshot
Graceful Insertion and Removal

Change window complete.

vPC vPC

no system mode maintenance

One command!
Post-change System Snapshot
Graceful Insertion and Removal

▪ Flexible framework providing a comprehensive, systemic method to isolate a node.

▪ Configuration profile foundation in NX-OS

▪ Initial support for:


– vPC/vPC+
– ISIS
– OSPF
– EIGRP
– BGP
– Interface

▪ Per VDC on Nexus 7x00

Configuration Profiles
• Maintenance-mode profile is applied when entering GIR mode.
• Normal-mode profile is applied when GIR mode is exited.

Automatic Profiles
• Generated by default
• Parses configuration to determine changes going into and out of GIR
• Changes based on base protocol configuration settings.
• Use: Maintenance Windows

Manual Profiles
• User created profile for maintenance-mode and normal-mode
• Flexible selection of protocols for isolation
• Use: maintenance windows and isolation during troubleshooting using preconfigured scripts.
Enabling Graceful Insertion and Removal
Automatic Profile Generation
N7K-1-Core# show system mode
System Mode : Normal
N7K-1-Core# config
Enter configuration commands, one per line. End with CNTL/Z.
N7K-1-Core(config)# system mode maintenance

BGP is not enabled, nothing to be done

EIGRP is not enabled, nothing to be done

OSPF is up..... will be shutdown
OSPF TAG = 100, VRF = default
config terminal
router ospf 100
shutdown
end

OSPFv3 is not enabled, nothing to be done

ISIS is not enabled, nothing to be done

vPC is not enabled, nothing to be done

Interfaces will be shutdown

Do you want to continue (y/n)? [n] y

Generating maintenance-mode profile
Progressing...................Done.

System mode operation completed successfully

N7K-1-Core# show system mode
System Mode : Maintenance
N7K-1-Core#

NOTE: Custom profile generation requires “dont-generate-profile”.
Enabling Graceful Insertion and Removal
Custom Profile Generation

config-profile maintenance-mode type admin
router bgp 65001
  isolate
sleep instance 1 10
router ospf 100
  isolate
sleep instance 3 20
vpc domain 20
  shutdown
system interface shutdown exclude fex-fabric

config-profile normal-mode type admin
router bgp 65001
  no isolate
sleep instance 1 10
router ospf 100
  no isolate
sleep instance 3 20
vpc domain 20
  no shutdown
no system interface shutdown

• By default, GIR Mode will automatically generate profiles.
• CLI to disable automatic profile generation: dont-generate-profile
• If you enter GIR mode with automatic profile, it will overwrite your custom profile.
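The two custom profiles mirror each other line by line, so the normal-mode profile can be derived from the maintenance-mode one. The sketch below shows that relationship under a simple assumption: only `isolate`, `shutdown` and `system interface shutdown` need negating, and every other line is carried over unchanged. The function name `to_normal_mode` is hypothetical, and this is plain text processing, not an NX-OS feature.

```python
# Maintenance-mode profile body, copied from the custom profile example.
MAINTENANCE_PROFILE = [
    "router bgp 65001",
    "  isolate",
    "sleep instance 1 10",
    "router ospf 100",
    "  isolate",
    "sleep instance 3 20",
    "vpc domain 20",
    "  shutdown",
    "system interface shutdown exclude fex-fabric",
]


def to_normal_mode(line):
    """Return the normal-mode counterpart of one maintenance-mode line."""
    indent = line[: len(line) - len(line.lstrip())]
    command = line.strip()
    if command in ("isolate", "shutdown"):
        return f"{indent}no {command}"
    if command.startswith("system interface shutdown"):
        # The example's normal-mode profile drops the "exclude" option.
        return "no system interface shutdown"
    # Context lines (router ..., vpc domain ..., sleep ...) stay as they are.
    return line


NORMAL_PROFILE = [to_normal_mode(line) for line in MAINTENANCE_PROFILE]

print("\n".join(NORMAL_PROFILE))
```

Generating one profile from the other is one way to keep the pair from drifting apart when protocols are added to the maintenance-mode profile.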
Graceful Insertion and Removal Mode for Unplanned Outages
system mode maintenance on-reload reset-reason reason

HW_ERROR-Hardware error,
SVC_FAILURE-Critical service failure,
KERN_FAILURE-Kernel panic,
WDOG_TIMEOUT-Watchdog timeout,
FATAL_ERROR-Fatal error,
MANUAL_RELOAD-Manual reload,
MATCH_ANY-Any of the above reasons,
ANY_OTHER-Any reload reason not specified above.
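The reset-reason keywords combine as a simple match rule. The sketch below illustrates how `MATCH_ANY` and `ANY_OTHER` relate to the specific reasons as described in the list above. The function name is hypothetical, the reason `POWER_LOSS` is an invented placeholder for "some reason not in the list", and the code is not derived from NX-OS internals.

```python
# The specific reset reasons listed above.
SPECIFIC_REASONS = {
    "HW_ERROR",
    "SVC_FAILURE",
    "KERN_FAILURE",
    "WDOG_TIMEOUT",
    "FATAL_ERROR",
    "MANUAL_RELOAD",
}


def enters_maintenance_on_reload(configured, actual):
    """Decide whether the switch should come up in maintenance mode.

    configured: set of reset-reason keywords configured with on-reload.
    actual: the reason for the reload that just happened.
    """
    if "MATCH_ANY" in configured and actual in SPECIFIC_REASONS:
        return True  # any of the listed reasons
    if "ANY_OTHER" in configured and actual not in SPECIFIC_REASONS:
        return True  # any reason not in the list
    return actual in configured


print(enters_maintenance_on_reload({"MATCH_ANY"}, "KERN_FAILURE"))
print(enters_maintenance_on_reload({"HW_ERROR"}, "MANUAL_RELOAD"))
print(enters_maintenance_on_reload({"ANY_OTHER"}, "POWER_LOSS"))  # placeholder reason
```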
Nexus GIR Snapshots

• Used before and after a GIR mode to compare pre/post change operation.

• Snapshots are automatically generated when entering GIR mode.

switch# snapshot create snap1 For testing


Executing show interface... Done
Executing show bgp sessions vrf all... Done
Executing show ip eigrp topology summary... Done
Executing show vpc... Done
Executing show ip ospf vrf all... Done
Feature 'ospfv3' not enabled, skipping...
Snapshot 'snap1' created
Switch#

Nexus GIR Snapshots Comparison

Nexus# sh snapshots compare before_maintenance after_maintenance
================================================================================
Feature Tag before_maintenance after_maintenance
================================================================================

[bgp]
--------------------------------------------------------------------------------
[neighbor-id:100.120.1.221]
connectionsdropped 2 **3**
lastflap P1DT21H5M12S **P1DT21H25M47S**
lastread P1DT21H25M12S **PT0S**
lastwrite P1DT21H25M14S **PT0S**
state Established **Idle**
localport 52737 **0**
remoteport 179 **0**
notificationssent 2 **3**
<...>

switch# show snapshots compare snapshot1 snapshot2 ipv4routes
metric snapshot1 snapshot2 changed
# of routes 33 3 *
# of adjacencies 10 4 *

Prefix Changed Attribute
------ -----------------
23.0.0.0/8 not in snapshot2
10.10.10.1/32 not in snapshot2
21.1.2.3/8 adjacency index has changed from 29 (snapshot1) to 38 (snapshot2)
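The comparison output is essentially a field-by-field diff of two captured states. The sketch below reproduces that idea for a few of the BGP neighbor fields shown above. It is a plain dictionary diff written for illustration; `compare_snapshots` is a hypothetical helper, not the NX-OS snapshot implementation.

```python
def compare_snapshots(before, after):
    """Return {field: (before_value, after_value)} for every changed field."""
    changed = {}
    for field in sorted(set(before) | set(after)):
        old = before.get(field, "<missing>")
        new = after.get(field, "<missing>")
        if old != new:
            changed[field] = (old, new)
    return changed


# Values taken from the BGP neighbor comparison shown above.
before_maintenance = {
    "connectionsdropped": "2",
    "state": "Established",
    "localport": "52737",
    "remoteport": "179",
}
after_maintenance = {
    "connectionsdropped": "3",
    "state": "Idle",
    "localport": "0",
    "remoteport": "0",
}

for field, (old, new) in compare_snapshots(before_maintenance, after_maintenance).items():
    print(f"{field} {old} **{new}**")
```

An empty result after the change window is the signal that the node returned to its pre-change state.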
Nexus 5k Scenario: Dual-homed FEX w/ VPC
Software Upgrade
Overview
• Highly Redundant Design
• Dual-attached FEX
• Dual-attached Hosts
How do we upgrade this environment with minimal disruption?
[Figure: vPC pair, Primary (V1) and Secondary (V1), with interfaces 1 to 4 in Po1 and Po2 toward FEX 101 (V1) and FEX 102 (V1); hosts attach through Po10 and Po20]
Nexus 5k Scenario: Dual-homed FEX w/ VPC
Software Upgrade
• Enter GIR Mode on N5k1
  Traffic flow through N5k2
• Upgrade N5k1
  Image Version Mismatch with both FEXs
• Exit GIR on N5k1
[Figure: N5k1 (Secondary) moves from V1 to V2; N5k2 (Primary) stays at V1; FEX 101 and FEX 102 remain at V1. Legend: IF Down; IF Up, No Forwarding]
Nexus 5k Scenario: Dual-homed FEX w/ VPC
Software Upgrade
• Manually shut down IF3 on N5k2
  FEX 101 goes offline.
  FEX 101 HIFs go down.
  FEX 101 starts pairing process with N5k1.
  FEX 101 upgrades to V2.

[Figure: same topology; switches at V2 and V1; FEX 101 moving from V1 to V2, FEX 102 at V1. Legend: IF Down; IF Up, No Forwarding.]

Nexus 5k Scenario: Dual-homed FEX w/ VPC
Software Upgrade
• Manually shut down IF4 on N5k2
  FEX 102 goes offline.
  FEX 102 HIFs go down.
  FEX 102 starts pairing process with N5k1.
  FEX 102 upgrades to V2.

[Figure: same topology; switches at V2 and V1; FEX 101 at V2, FEX 102 moving from V1 to V2. Legend: IF Down; IF Up, No Forwarding.]

Nexus 5k Scenario: Dual-homed FEX w/ VPC
Software Upgrade
• Enter GIR Mode on N5k2
  IF 3 & 4 Still Admin Down
• Upgrade N5k2
• Exit GIR on N5k2
• Manual Up of IF 3 & 4

Environment upgrade completed with minimal traffic disruption.

[Figure: same topology; both switches and both FEXs at V2. Legend: IF Down; IF Up, No Forwarding.]

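The ordering of the preceding upgrade steps matters because hosts must always keep one FEX and one parent switch in service. Below is a minimal Python sketch that models the sequence and checks that property. It is a planning aid under stated assumptions, not a device procedure: the step descriptions paraphrase the slides, and the names `STEPS` and `check_plan` are hypothetical.

```python
ALL_FEX = {"FEX101", "FEX102"}
ALL_N5K = {"N5k1", "N5k2"}

# Each step: (action, FEXs offline during the step, switch isolated by GIR or None)
STEPS = [
    ("Enter GIR on N5k1, upgrade N5k1, exit GIR", set(), "N5k1"),
    ("Shut IF3 on N5k2; FEX 101 re-pairs with N5k1 and upgrades", {"FEX101"}, None),
    ("Shut IF4 on N5k2; FEX 102 re-pairs with N5k1 and upgrades", {"FEX102"}, None),
    ("Enter GIR on N5k2, upgrade N5k2, exit GIR", set(), "N5k2"),
    ("Bring IF3 and IF4 back up on N5k2", set(), None),
]


def check_plan(steps):
    """Return the actions that would leave dual-attached hosts with no path."""
    problems = []
    for action, offline_fex, isolated in steps:
        fex_in_service = ALL_FEX - offline_fex
        n5k_in_service = ALL_N5K - ({isolated} if isolated else set())
        if not fex_in_service or not n5k_in_service:
            problems.append(action)
    return problems


if __name__ == "__main__":
    print(check_plan(STEPS) or "Every step keeps one FEX and one N5k in service")
```

Because each FEX is taken offline in its own step, the check passes. Collapsing the two FEX steps into one would be flagged.
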
ACI Operational
Practices

SDN ‘with’ FCAPS ‘and’ Automation

Application Centric Infrastructure
• Turnkey integrated solution with security, centralized management, compliance and scale
• Automated application centric-policy model with embedded security
• Broad and deep ecosystem

Programmable Network
• Modern NX-OS with enhanced NX-APIs
• DevOps toolset used for Network Management (Puppet, Chef, Ansible etc.)
• Custom Script based Operations and Workflows

[Figure: FCAPS (Fault, Configuration, Accounting, Performance, Security) covered by an Integrated Toolset or by External Tools.]

Application Centric Infrastructure (ACI)
Rapid Deployment of Applications onto Networks with Scale, Security and Full Visibility

[Figure: Three Tier Application (Web, App, DB) expressed as an Application Network Profile. ACI building blocks: Nexus 9500 and 9300 (Nexus 9k), Application Centric Policy (App Centric Policy), APIC Controller.]

Application Network Profiles (ANP) – what’s that?

Application Network Profiles are a group of EPGs and the policies that define the communication between them.

[Figure: Application Network Profile = EPG - WEB, EPG - APP and EPG - DB, with Inbound/Outbound Policies between adjacent EPGs (policy model).]

Application Network Profiles (ANP) & ACI: how it works?

[Figure: an App Profile spanning F/W, WEB, ADC, APP, ADC and DB across hypervisors. It combines Application Connectivity Policy, Security Policies, QoS, L4..7 Services, and Storage and Compute, with SLA, QoS, Security and Classification attributes.]

Abstracting / Mapping via ACI’s Application Network Profiles

[Figure: an Application Network Profile mapping External, DMZ and Trusted zones to WEB, APP and DB tiers through external, zone and tier policies (FW, ADC, security), with workloads running as Virtual Machines, Docker Containers and Bare-Metal Servers across hypervisors.]

Did you notice? There is no network device represented here ☺

ACI Fabric – Integrated Overlay
Decoupled Identity, Location & Policy

APIC

VTEP VXLAN IP Payload

VTEP VTEP VTEP VTEP VTEP VTEP

• ACI Fabric decouples the tenant end-point address, its “identifier”, from the location of
that end-point, which is defined by its “locator” or VTEP address
• Forwarding within the Fabric is between VTEPs (ACI VXLAN tunnel endpoints) and
leverages an extended VXLAN header format referred to as the ACI VXLAN policy header
• The mapping of the internal tenant MAC or IP address to location is performed by the
VTEP using a distributed mapping database
• The control plane is managed by COOP (Council of Oracle Protocol)

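The identifier/locator split described above can be pictured as a lookup table keyed by endpoint identity. This is a toy Python sketch only, not the COOP protocol or any APIC API. The class names, the VRF name and all addresses are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Tenant-side identity: VRF plus MAC or IP address (the "identifier")."""
    vrf: str
    address: str


class MappingDatabase:
    """Toy identifier -> locator (VTEP) table."""

    def __init__(self):
        self._table = {}

    def learn(self, endpoint, vtep):
        # A later learn for the same endpoint overwrites the old locator.
        self._table[endpoint] = vtep

    def lookup(self, endpoint):
        return self._table.get(endpoint)


db = MappingDatabase()
web = Endpoint("Corp", "192.168.10.11")   # hypothetical endpoint
db.learn(web, "10.0.64.95")               # hypothetical VTEP address
db.learn(web, "10.0.64.96")               # endpoint moved: same identity, new locator

if __name__ == "__main__":
    print(db.lookup(web))
```

The endpoint keeps its address when it moves. Only the locator stored against it changes, which is the decoupling the slide describes.
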
ACI Network Centric Deployment

Network configuration
• VRF CORP …. vrf configuration
• Interface VLAN 100 (192.168.10.0/24), VIP 192.168.10.1, VRF corp
• Trunk the switch ports with respective vlans
• VMware port Group Assignment
• Routing Configuration for subnets

Tenant: Example-Corp
VRF: Corp
Corp-L3out to WAN/Internet

BD                 GW             Advertise Ext
192.168.10.0/24    192.168.10.1   Yes
192.168.20.0/24    192.168.20.1   Yes
192.168.30.0/24    192.168.30.1   No

App Prof: App1
EPG    Attachment          Vlan      Connected to
Web    vDS: Corp-VDS01     dynamic   Portgroup: Corp:App1:Web (VM)
App    vDS: Corp-VDS01     dynamic   Portgroup: Corp:App1:App (VM)
DB     Path: 101/1/1-4     100       Physical Servers

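The tenant layout above could be expressed as an XML payload for the APIC. The following Python sketch builds one with the standard library. The class names (`fvTenant`, `fvCtx`, `fvBD`, `fvSubnet`, `fvAp`, `fvAEPg`) follow the ACI object model as commonly documented and should be verified against the APIC reference for your release. The BD names and the EPG-to-BD pairing are assumptions, and the L3Out, VMM domain and static path bindings are left out.

```python
import xml.etree.ElementTree as ET

# (hypothetical BD name, gateway/prefix from the slide, advertised externally?)
BRIDGE_DOMAINS = [
    ("BD-Web", "192.168.10.1/24", True),
    ("BD-App", "192.168.20.1/24", True),
    ("BD-DB", "192.168.30.1/24", False),
]
# EPG -> BD pairing is assumed from the column layout of the slide.
EPGS = [("Web", "BD-Web"), ("App", "BD-App"), ("DB", "BD-DB")]


def build_tenant():
    """Build the tenant tree: VRF, three bridge domains, one app profile."""
    tenant = ET.Element("fvTenant", name="Example-Corp")
    ET.SubElement(tenant, "fvCtx", name="Corp")
    for bd_name, gateway, advertise in BRIDGE_DOMAINS:
        bd = ET.SubElement(tenant, "fvBD", name=bd_name)
        ET.SubElement(bd, "fvRsCtx", tnFvCtxName="Corp")
        scope = "public" if advertise else "private"
        ET.SubElement(bd, "fvSubnet", ip=gateway, scope=scope)
    app = ET.SubElement(tenant, "fvAp", name="App1")
    for epg_name, bd_name in EPGS:
        epg = ET.SubElement(app, "fvAEPg", name=epg_name)
        ET.SubElement(epg, "fvRsBd", tnFvBDName=bd_name)
    return tenant


if __name__ == "__main__":
    print(ET.tostring(build_tenant(), encoding="unicode"))
```

Building the tree in code rather than by string concatenation keeps attribute values escaped and makes the payload easy to inspect before it is sent anywhere.
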
Use case for Complex deployment made simple
Cisco AS DAFE - Deploy ACI from Excel

[Figure: Excel Data Worksheets and “spicy” XML Templates feed a Python Script, which produces XML Files and sends REST API POSTs to the APIC using a Credentials file.]

1) Fill in the Excel sheet
2) Select the tasks (ACI Objects) you want to deploy
3) Edit the credentials sheet
4) Run the script

Automation-srvr$ python aci_deploy_fabric_from_excel.py >> output.xml

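The worksheet-plus-template idea above can be sketched in a few lines of Python. This is not DAFE's actual implementation. It uses CSV text as a stand-in for an Excel worksheet, and the column names, the template, the `render_posts` function and the `apic.example.com` host are all hypothetical. Nothing is sent to a controller.

```python
import csv
import io
from string import Template

# Stand-in for one worksheet exported as CSV (column names are made up).
SHEET = """tenant,bd,gateway
Example-Corp,BD-Web,192.168.10.1/24
Example-Corp,BD-App,192.168.20.1/24
"""

# Stand-in for one XML template; $placeholders are filled from each row.
BD_TEMPLATE = Template(
    '<fvTenant name="$tenant"><fvBD name="$bd">'
    '<fvSubnet ip="$gateway"/></fvBD></fvTenant>'
)


def render_posts(sheet_text, apic="https://apic.example.com"):
    """Return (url, xml_body) pairs, one per worksheet row. Nothing is sent."""
    url = f"{apic}/api/mo/uni.xml"
    rows = csv.DictReader(io.StringIO(sheet_text))
    return [(url, BD_TEMPLATE.substitute(row)) for row in rows]


if __name__ == "__main__":
    for url, body in render_posts(SHEET):
        print("POST", url)
        print(body)
```

A real tool would also authenticate, escape values before substitution, and check each response. Rendering first and posting second makes a dry run possible.
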
Operational Best Practices
• MO Naming Convention
  • Develop and plan the MO (Managed Objects) Naming Convention according to the Organization's best Practice
• Tags and Aliases
  • Workaround to Rename Objects
  • Objects can be grouped to make query easier
  • Tags/Aliases have no functional impact, whereas Labels have
• AAA Fallback to Local Auth
  • Fallback domain should be set to local to avoid lockout
• EP Loop Prevention
• BD Level Configuration
  • Limit IP Learning to Subnet
• Fabric Wide Configuration
  • IP Aging Policy
  • Disable Remote EP Learning – On Border Leaf
  • Enforce Subnet Check

Cisco ACI Fabric
[Screenshots: Fabric View; Controller Connectivity]

Health Score
[Screenshots: Aggregated View; Fabric Topology View]
Aggregation of system-wide health, including pod health scores, tenant health scores, system fault counts by domain and type, and the APIC cluster health state.

Troubleshoot a flow
Use ACI inbuilt Visibility engine
[Screenshot labels: Faults; Eligible Path; Drops]

Troubleshoot a flow
Use ACI inbuilt Visibility engine
[Screenshot labels: Fabric, Security Policies; Real Time, Traffic Capture]

Maintenance Upgrade #1
Download the release on the APIC

Maintenance Upgrade #2
Upgrade APIC

Maintenance Upgrade #3
Create Groups

Maintenance Upgrade #4
Upgrade the Maintenance Groups

Capacity Dashboard
View the capacity of Data center Fabric

Cisco ACI Deployment Lifecycle
Proactive, Preemptive, Reactive

• Monitor
  • Faults
  • Events
  • Health Score
  • Atomic Counter
  • Contract deny logs
  • Statistics
  • Capacity Dashboard View
• Manage
  • Image Management
  • Config Export / Import
  • Fabric Inventory
  • Show Usage
  • Configuration Rollback
• Troubleshoot
  • Audit Logs
  • iPing
  • iTraceroute
  • Endpoint Tracker
  • ERSPAN
  • Traffic Map
  • On Demand Counter
  • CLI option

Recommended Live sessions for ACI: BRKACI-2210, LTRACI-2143

DCNM
DCNM : Functionality

• Health Monitoring, Inventory & Diagnostics
• Configuration Automation
• Trend Analysis
• Visualization & Troubleshooting
• Alert/Notifications
• Storage Management

DCNM Infrastructure & LAN Fabric Updates

Multi-Fabric Device Packs


Turn-Key Virtual HTML5 GUI Multi-Site
Appliance Topology Driven
Enterprise DB,H/A &
Large Scale

VXLAN FP

Solution Templates Nexus & MDS


& Automation SAN Zoning, Alias, PMon Platforms
Image, Config, Patch, GIR
^POAP = “Power-On Auto Provisioning”
*PM = “Physical Machine” “Infrastructure ++” Updates
New Topology
Side-By-Side Views
• Dynamic Arrangement
• Multi-Fabric/Overlay
• Arrange by Tier
  • [Core, Ag, Access, Leaf, Spine etc.]
• Metadata Tags
• Show FEX links
• Device Pop-Over
• Side-By-Side View

Demo
DAFE & DCNM
Network Insights
Network Insights Applications

[Figure: Network Insights apps hosted on the DCNM and APIC platforms (App Hosting Framework, App Store): Proactive Software Recommendations/Notifications; Issue Vulnerability Detection & Remediation; Physical/Logical Network Capacity & Utilization; Data & Control Plane & Environmental Health.]

Data collection and ingestion, Data correlation and analysis, Data visualization and action

• Visibility: Learn from your network and recognize anomalies
• Insights: See problems before your end users do
• Proactive Troubleshooting: Find root cause faster with granular details

Using Network Insights to Deliver Outcomes

Ingest and Process Telemetry Data
Config File, Syslog, Tech-Support, RIB, FIB, Accounting Logs, Debug Logs, Encapsulation Tables, Streaming Telemetry, Environmental, Event History, Topology, Cores, Consistency Checkers, Mac Table, TCAM Tables

Derive Insights
Performance, Capacity, SW Integrity, System Health

Network Insights Advisor High Level Architecture

Customer Premise Cisco Cloud


Managed App Infra Cluster

NIA-UI TAC services

EoL/EoS
Correlation Engines
PSIRT / Field Notice

Recommendations

Image Repo

Data Collection Internet / WAN Statistics

NIA Web Service

How does NIA detect known issues?

[Figure: Tech Support and ‘show run’ collection feeds Data Sources (Storage), followed by Signature Matching and Hardening Check against the Insights DB, then Advisory Services and the NIA GUI for Bugs/PSIRTs detection.]

Insights DB: updated periodically with signatures from the cloud.

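Signature matching against collected configuration can be illustrated with a short Python sketch. It does not reflect how NIA is implemented. The advisory IDs, the patterns and the `match_signatures` function are invented for illustration, and a real signature set would come from the periodically updated database the slide mentions.

```python
import re

# Hypothetical signatures: (advisory id, pattern that flags the issue in a config)
SIGNATURES = [
    ("EXAMPLE-0001", re.compile(r"^feature telnet$", re.MULTILINE)),
    ("EXAMPLE-0002", re.compile(r"^snmp-server community public\b", re.MULTILINE)),
]

# Made-up 'show run' excerpt.
CONFIG = """\
hostname leaf101
feature telnet
feature bgp
"""


def match_signatures(show_run, signatures=SIGNATURES):
    """Return the advisory ids whose pattern occurs in the configuration text."""
    return [advisory for advisory, pattern in signatures if pattern.search(show_run)]


if __name__ == "__main__":
    print(match_signatures(CONFIG))
```

Keeping signatures as data rather than code is what allows them to be refreshed without changing the matching engine.
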
Data Center
Behavioral
Monitoring

Modern Data Centers Are Getting Increasingly Complex

Big and fast data
• Increase in east-west traffic
• Expanded attack surface
• Open source

Hybrid cloud
• Zero-trust model
• Multicloud orchestration
• Application portability

Rapid app deployment
• Continuous development
• Application mobility
• Micro services

Evolving landscape & monitoring

[Figure: Data Creation feeding Data Analysis through multiple collectors.]
Real time tools: TAPs, NETFLOW
Non Real time tools: SNMP, Syslog, CLI (scripts)

Need for Data Analytics

10 Second SW Process Push

Cisco Tetration
Profile and Context Driven Application Segmentation / behavioral assessment

1. Real-time Asset Tagging
   Cisco Tetration Sensors
2. Policy Workflows (Role Based and Hierarchical)
   Cisco Tetration Application Insights (ADM) + Tag and Label-Based Add-on Policy (For Example, Mail Filters), Customer Defined
3. Policy Enforcement
   No Need to Tie Policy to IP Address and Port
   Cisco Tetration Platform Performs the Translation

Compliance Monitoring, Enforcement
Public Cloud, Bare Metal, Virtual, Cisco ACI™*, Traditional Network*

Pervasive Sensors

Software Sensors (Available Now)
• Linux VM
• Windows Server VM
• Bare Metal (Linux and Windows Server)
• Universal* (Basic Sensor for other OS)
✓ Low CPU Overhead (SLA enforced)
✓ Low Network Overhead (SLA enforced)
✓ Enforcement Point (Software agents)
✓ Highly Secure (Code Signed, Authenticated)
✓ Every Flow (No sampling), NO PAYLOAD
*Note: No per-packet Telemetry, Not an enforcement point

Network Sensors (Next Generation 9K switches)
• Nexus 9200-X
• Nexus 9300-EX/FX

Third Party Sources (3rd party Data Sources)
• Asset Tagging
• Load Balancers
• IP Address Management
• CMDB
• …

Data Granularity Needs to Improve
Type of Problems Customers are Looking to Address

[Figure: problem types placed along a resolution axis (Resolution = Frequency of Data Collection; On-Change, <= 1 sec, ~10s sec, ~minutes-hours): Workload Placement, Service Level Monitoring, ADM, Security and Policy Enforcement, Microburst Detection, Traffic Engineering, Capacity Planning, Troubleshooting & Remediation (Self Driving).]

Data Granularity Needs to Improve
Sub Second HW/SW Push – Use case 1

Application Conversation View – Use Case 2

[Screenshots: Application clusters conversation views; Conversation details including process bindings]

Maintenance Windows – Golden Rules
• Change Review Board
• Schedule when environment will be least impacted.
• Software Staging
• Verify out of band.
• Test! Before and after.

Traditional vPC Environment
Change Best Practice & Window

[Figure: vPC pair (Primary, Secondary) with attached FEX.]

Core Isolation
1. Graceful L3 Protocol Isolation
2. Layer 2 Isolation
   • vPC
3. Interface Isolation
Using GIR Mode, Steps 1-3 could be achieved prescriptively.

Access Isolation
1. Layer 2 Isolation
   • vPC
2. Interface Isolation
   1. Fex-fabric (include/exclude)
   2. Dual-attached FEX Procedure * Recommended
Using GIR Mode, Steps 1-2 could be achieved prescriptively.
NOTE: Maintenance mode consideration should be based on Fex-fabric connectivity.

If change window is for software upgrade or spot fix, consider ISSU or SMU feasibility.

L3 Environment
Change Best Practice & Window

[Figure: Layer 3 topology.]

Core Isolation
1. Graceful L3 Protocol Isolation
2. Interface Isolation
Using GIR Mode, Steps 1-2 could be achieved prescriptively.

Access Isolation
1. L3 Protocol isolation
2. Layer 2 Isolation
   • vPC
3. Interface Isolation
   1. Fex-fabric (include/exclude)
   2. Dual-attached FEX Procedure * Recommended
Using GIR Mode, prescriptive isolation is possible.

If change window is for software upgrade or spot fix, consider ISSU or SMU feasibility.

FabricPath Environment
Change Best Practice & Window

[Figure: FabricPath topology.]

Spine Isolation
1. Use FabricPath IS-IS Overload Bit
Using GIR Mode with isolate configuration, Step 1 could be achieved prescriptively.

Leaf Isolation
1. Use FabricPath IS-IS Overload Bit
2. Shutdown the VPC+ domain.
Using GIR Mode with manual profile, Step 1 could be achieved prescriptively.

If change window is for software upgrade or spot fix, consider ISSU or SMU feasibility.

VxLAN EVPN Environment
Change Best Practice & Window

[Figure: VxLAN fabric with iBGP RR spines and leaf VTEPs.]

Spine Isolation
1. L3 Protocol isolation
   • If iBGP EVPN, consider IGP isolation
   • If eBGP EVPN, consider BGP isolation
2. Interface Isolation
Using GIR Mode, Steps 1-2 could be achieved prescriptively.

Leaf Isolation
1. L3 Protocol isolation
   • If iBGP EVPN, consider IGP isolation
   • If eBGP EVPN, consider BGP isolation
2. Layer 2 Isolation
   • vPC
3. Interface Isolation
   1. Fex-fabric (include/exclude)
   2. Dual-attached FEX Procedure * Recommended
Using GIR Mode, prescriptive isolation is possible.

If change window is for software upgrade or spot fix, consider ISSU or SMU feasibility.

NX-OS 6.x -> 7.x
Use Case

NX-OS 6.x -> 7.x Use Case - Secondary Manual Effort
7k Upgrade

[Figure: 7k / 5k / 2k topology.]

• Prerequisites
  • Code Staging
  • VPC Best Practices
    • Peer Switch
    • Peer Gateway
    • Auto-recovery
    • L3 Link between vPC pairs
    • BFD
    • Routing Protocol Convergence Tuning
• Manual Isolation of Secondary
  • Protocol Isolation
    • Max-metric LSA, etc. -> No service impact (0-20ms)
  • VPC Isolation
    • Down vPCs -> No service impact (0-20ms)
    • Down Peer Link -> No service impact
• Reload Upgrade -> No service impact
• Peer link is brought UP -> No Service impact
• South links UP -> No Service impact
• North protocol Max-metric LSA removal, UP -> No Service impact

NX-OS 6.x -> 7.x Use Case - Primary Manual Effort
7k Single Supervisor
7k Upgrade

[Figure: 7k / 5k / 2k topology.]

• Prerequisites
  ✓ Code Staging
  ✓ VPC Best Practices
• Manual Isolation of Secondary
  • Protocol Isolation
    • Max-metric LSA, etc. -> No service impact (0-20ms)
  • VPC Isolation
    • Down vPCs -> No service impact (0-20ms)
    • VPC peer priority changes. The secondary should have a lower priority to become the primary in case of flapping.
    • Down Peer Link -> No service impact
• Peer link & KPA is brought down & Reload initiated for Upgrade -> No service impact to 0-50ms impact in traffic, based on traffic pattern (this switch comes up as secondary)
• Peer link is brought UP -> No Service impact
• South links UP -> No Service impact
• North protocol UP -> No Service impact
Note: The System did not have firewall or LB connected directly to it.

Summary
Putting It All Together

▪ What to use? GIR Mode? Patching? ISSU? All of them?

Option / Situation            Critical Bug Fix & PSIRT   Hardware Upgrade   New Features
ISSU                          ✓                          X                  ✓
GIR + Cold Boot               ✓                          X                  ✓
GIR + Disruptive Installer    ✓                          X                  ✓
SMU Restart                   ✓                          X                  X
GIR + SMU ISSU                ✓                          X                  X
GIR                           X                          ✓                  X

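The decision table above can be encoded as data so that a change-planning script can look up the applicable options. This is a minimal Python sketch. The table values are copied from the slide, while the situation keys and the `options_for` function are names chosen for illustration.

```python
# Column order of the summary table.
SITUATIONS = ("bug_fix_psirt", "hardware_upgrade", "new_features")

# Option -> (critical bug fix & PSIRT, hardware upgrade, new features)
MATRIX = {
    "ISSU": (True, False, True),
    "GIR + Cold Boot": (True, False, True),
    "GIR + Disruptive Installer": (True, False, True),
    "SMU Restart": (True, False, False),
    "GIR + SMU ISSU": (True, False, False),
    "GIR": (False, True, False),
}


def options_for(situation):
    """Return the options ticked for the given situation, in table order."""
    column = SITUATIONS.index(situation)
    return [option for option, row in MATRIX.items() if row[column]]


if __name__ == "__main__":
    for situation in SITUATIONS:
        print(situation, "->", ", ".join(options_for(situation)))
```

An unknown situation raises `ValueError` from `SITUATIONS.index`, which is a reasonable failure mode for a planning script.
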
Summary

• Verify environment conforms to data center networking best practices.
• Follow your documented change management process.
• Isolate nodes during maintenance to minimize disruption.
  Use GIR Mode where possible to ease isolation configuration.

Continue Your Education

• Demos in the Cisco campus

• Walk-in Self-Paced Labs

• Lunch & Learn

• Meet the Engineer 1:1 meetings

• Related sessions

Thank you
