BGP, Where Are We Now?: John Scudder and David Ward May 2007

This document summarizes a talk on the current state and future directions of the Border Gateway Protocol (BGP). The talk focuses on BGP performance and stability under different conditions like load and failures. It discusses techniques like route reflection, dampening, and backup path propagation that can help with convergence times. The document outlines areas for potential near-term improvements as well as longer-term fundamental changes and calls for further analysis, definition of metrics, and alignment of costs and benefits.

BGP, where are we now?

John Scudder and David Ward May 2007

Agenda
Trivia
Dynamic behavior
Convergence properties and problems
Convergence/stability work items

Goals and Priorities


BGP Goal: Maximize connectivity of Internet
Convergence and stability are subsidiary to this
Implication: Priorities
  First: fastest service restoration
  Second: minimize peak load on control plane

Focus
This talk focuses on performance and stability
There are other very important aspects of BGP
  Services
  Operations
  Weird behaviors (wedgies, etc.)
  Security
  Policy modeling
But we don't have all day

Shalt Nots
BGP uses ASes for loop suppression and nothing else!
Speaking of overloading things
  ASes are not locators. No topological significance.
Auto-aggregation appears to be a nonstarter
  Even proxy aggregation is tricky, but that's an operational consideration
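
A minimal sketch of AS-path loop suppression, assuming the usual rule: a speaker rejects any route whose AS path already contains its own AS number, and prepends its AS when advertising to an external peer. The function names and AS numbers (private-use and documentation ranges) are illustrative assumptions, not from the talk.

```python
LOCAL_AS = 64512  # illustrative private-use AS number (assumption)

def accept_route(as_path, local_as=LOCAL_AS):
    """Loop suppression: reject a route that has already passed through us."""
    return local_as not in as_path

def path_for_ebgp_peer(as_path, local_as=LOCAL_AS):
    """Prepend our AS before advertising the route to an external peer."""
    return (local_as,) + tuple(as_path)

print(accept_route((64496, 64497)))         # our AS is absent from the path
print(accept_route((64496, 64512, 64497)))  # our AS is present: a loop
print(path_for_ebgp_peer((64496, 64497)))
```

Nothing in this check uses the AS number to say where a destination is, which is the point of the "not locators" remark.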

MP-BGP
BGP carries data for multiple address families (AFs)
  Plain old IP (v4, v6)
  VPNv4
  Other things
Not all AFs need to be present on all routers!

VPNs
Often observed that VPN tables larger than Internet table
  True, in aggregate
  But, not true of any single VPN table
No single PE or RR holds all VPN tables
  Inherently parallelizable
Operational challenges to managing
  Some tools to do this, e.g. rt-constrain
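
The idea behind rt-constrain can be sketched as a filter: a PE signals which route targets it imports, and the RR sends it only the VPN routes carrying at least one of them. The data layout, names, and route-target values below are hypothetical. The real mechanism carries this membership information inside BGP itself and does not use a function call.

```python
def routes_to_send(vpn_routes, wanted_rts):
    """Return only the VPN routes that carry a route target the PE imports."""
    return [r for r in vpn_routes if r["rts"] & wanted_rts]

# Hypothetical RR table: each route carries a set of route targets.
rr_table = [
    {"prefix": "10.1.0.0/16", "rts": {"64496:100"}},
    {"prefix": "10.2.0.0/16", "rts": {"64496:200"}},
    {"prefix": "10.3.0.0/16", "rts": {"64496:100", "64496:300"}},
]

pe_imports = {"64496:100"}  # this PE serves only the VPN using RT 64496:100
for route in routes_to_send(rr_table, pe_imports):
    print(route["prefix"])
```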

BGP dynamic behavior


Confusion even among routing experts
Of course, surprising emergent behaviors are possible, but important to understand bounding conditions

BGP and TCP


BGP runs over TCP

Flow control: important implications for dynamics
Intuition about TCP is usually wrong

BGP under load


When uncongested, BGP will pass updates as fast as they are received
  Modulo MRAI, dampening
Degradation mode under (CPU) congestion: state compression
  Adaptive low-pass filter behavior emerges
  Things slow down, they typically do not melt
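
State compression can be sketched as a per-peer queue keyed by prefix: while the sender is busy or the peer is slow, a newer change for a prefix overwrites the unsent older one, so only the latest state is ever transmitted. The class and method names are hypothetical, and real implementations differ in detail.

```python
class PeerQueue:
    """Route changes waiting to be sent to one peer, keyed by prefix."""

    def __init__(self):
        self.pending = {}  # prefix -> latest state (None means withdraw)

    def enqueue(self, prefix, state):
        # A newer change overwrites any unsent older change for the prefix.
        self.pending[prefix] = state

    def drain(self):
        """Called whenever the peer is ready to accept more updates."""
        batch, self.pending = self.pending, {}
        return batch

q = PeerQueue()
# The prefix flaps five times while the peer (or our CPU) is busy.
for state in ["path A", None, "path A", None, "path B"]:
    q.enqueue("192.0.2.0/24", state)
print(q.drain())  # only the final state is sent
```

Rapid oscillations are absorbed in the queue and never reach the wire, which is the low-pass filter behavior: the busier the system, the more intermediate states are dropped.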

BGP under load [2]

BGP adapts to speed of peer
  Slow peer gets routes as slow as it wants (with state compression)
  Fast peer gets routes as fast as it wants
  Implication: One slow peer does not hinder overall convergence
Update packing
  Low prefix/update ratios when not congested, but that's fine!
  High ratios emerge under congestion, which is when needed
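
A rough sketch of why packing ratios rise under congestion, assuming a sender that groups all pending prefixes sharing identical path attributes into one UPDATE. The function names and attribute strings are made up for illustration, and the message size limit is ignored.

```python
from collections import defaultdict

def pack_updates(pending):
    """Group pending prefixes by attribute set: one UPDATE per group."""
    groups = defaultdict(list)
    for prefix, attrs in pending.items():
        groups[attrs].append(prefix)
    return list(groups.items())

def prefixes_per_update(pending):
    return len(pending) / len(pack_updates(pending))

ATTRS_X = ("AS path 64496 64497", "next hop 198.51.100.1")
ATTRS_Y = ("AS path 64498", "next hop 198.51.100.2")

# Uncongested: changes are sent as they arrive, one prefix pending at a time.
idle = {"192.0.2.0/24": ATTRS_X}
# Congested: many changes have accumulated before the sender gets to run.
busy = {
    "192.0.2.0/24": ATTRS_X, "198.51.100.0/24": ATTRS_X,
    "203.0.113.0/24": ATTRS_X, "10.0.0.0/8": ATTRS_Y,
    "172.16.0.0/12": ATTRS_Y, "192.168.0.0/16": ATTRS_Y,
}
print(prefixes_per_update(idle), prefixes_per_update(busy))
```

The longer changes wait, the more prefixes share each attribute set, so per-prefix overhead falls when capacity is scarcest.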

BGP convergence

At least O(n) in the size of the DFZ table
  Fundamental to how BGP transports routes
But full convergences don't happen often!
  At startup (initial convergence)
  On rare occasions otherwise
Hard to fix completely, but is it broke?
  BGP's biggest, yet least important, problem.

BGP convergence [2]


Techniques to avoid full convergences
  Graceful Restart
  Nonstop Routing
or to cover them up
  Different flavors of fast reroute
or to pre-converge by advertising extra routes
  Best-external, multi-path and similar

Route Reflection
RRs hide backup paths
  Reduce RIB sizes (but less than you think)
  Bad for convergence
Convergence:
  State reduction/data hiding
  Faster convergence
  Pick one

Known Algorithmic Deficiencies


Path hunting
Nonconverging policies
At least O(n) in DFZ size

Path Hunting
Well-known amplification effect
Approaches to reduce
  Root cause notification
  Propagation of backup paths
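
The amplification effect can be illustrated with a toy model. This is a sketch under strong assumptions, not a model of real BGP timing: a full mesh of ASes 1..n, each directly attached to an origin AS 0, shortest-path selection only, lock-step rounds, and no MRAI or policy. The function name and topology are hypothetical. When the origin withdraws, each AS falls back to progressively longer paths learned from its neighbors before finally giving up.

```python
def simulate_origin_withdrawal(n):
    """Clique of ASes 1..n, each directly attached to origin AS 0.

    The origin withdraws its route from everyone at once. Returns the final
    best paths, the number of rounds, and the number of updates exchanged
    inside the clique before it settles.
    """
    nodes = range(1, n + 1)
    # rib_in[i][j]: path last advertised to i by neighbor j (None = withdrawn)
    rib_in = {i: {j: (j, 0) for j in nodes if j != i} for i in nodes}
    best = {i: (i, 0) for i in nodes}  # before the failure: the direct path

    def select(i):
        # Loop suppression: ignore any path that already contains AS i.
        usable = [p for p in rib_in[i].values() if p is not None and i not in p]
        if not usable:
            return None
        # Prefer the shortest path; break ties deterministically.
        return (i,) + min(usable, key=lambda p: (len(p), p))

    rounds = messages = 0
    while True:
        # The direct route is gone, so only neighbor-learned paths remain.
        new_best = {i: select(i) for i in nodes}
        changed = [i for i in nodes if new_best[i] != best[i]]
        if not changed:
            return best, rounds, messages
        rounds += 1
        for i in changed:
            best[i] = new_best[i]
            for j in nodes:
                if j != i:
                    rib_in[j][i] = best[i]  # update or withdraw sent to j
                    messages += 1

for n in (3, 5, 8):
    best, rounds, messages = simulate_origin_withdrawal(n)
    print(n, rounds, messages, all(p is None for p in best.values()))
```

Every update counted here advertises a path that no longer exists. If each AS knew the root cause (the origin itself is gone), none of these exploratory updates would be needed.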

Propagation of Backup Paths


Transit ASes seldom fully partition from each other
However, when a single AS-AS link goes down, border router temporarily loses routes
  Due to aggressive data hiding by less-preferred border routers and RRs

Propagation of Backup Paths [2]


Speculation: many path disturbance events caused by this effect
Intra-domain backup propagation feasible today
  Cost: some additional RIB state within AS
  Benefit: faster internal convergence and global stability
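
The cost/benefit trade can be sketched by comparing a router that receives only the best path for a prefix with one that also receives a backup. The function names, preference values, and exit-router names are hypothetical.

```python
def advertise(paths, max_paths):
    """Paths are (local_pref, exit_router); send the most preferred ones."""
    return sorted(paths, key=lambda p: p[0], reverse=True)[:max_paths]

def usable_after_failure(received, failed_exit):
    """What a router can still use, locally, once an exit goes down."""
    return [p for p in received if p[1] != failed_exit]

paths = [(200, "br1"), (100, "br2")]  # hypothetical exits for one prefix

best_only = advertise(paths, max_paths=1)
with_backup = advertise(paths, max_paths=2)

print(usable_after_failure(best_only, "br1"))    # nothing left: must wait
print(usable_after_failure(with_backup, "br1"))  # backup already in the RIB
print(len(best_only), len(with_backup))          # RIB entries held per prefix
```

With best-only advertisement the router has no route until a new one is propagated to it. With the backup present it can switch immediately, at the price of one extra RIB entry per prefix in this example.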

Some Possible Tools


**** = under discussion

As-pathlimit ****
Aggregate withdraw ****
Best-external ****
Better instrumentation reusing WRD infra
BGP free core (pick your encap) ****
Dampening (with better parameters) ****
Multi-path ****
Root cause notification
BGP - Fast Re-Route ****
Better UPDATE packing algorithms/techniques

Moving Forward

Narrow down (or expand!) possible tools list
Align costs and benefits
  Those who pay, must benefit, or solution will never be deployed
  Many examples of existing technically-excellent solutions to current problems but problems still exist. Example: BCP-38
  Deployment trumps all considerations!
Focus on behavior under load (or making load go away!)

Dampening
Misused in past (we were wrong about default parameters)
Heavy contribution of few sites to GH data suggests very generous parameters which only penalize egregious flappers
  Study needed to validate what constitutes "egregious"
Given parameters, can be turned on today
  Lower-than-low hanging fruit
  Aligns costs and benefits
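
A minimal sketch of the dampening mechanism, assuming the common scheme of a per-route penalty that grows with each flap and decays exponentially, with a suppress threshold and a lower reuse threshold. All numeric values are illustrative assumptions, not parameters recommended in the talk, and the "generous" threshold is invented to show the effect.

```python
class DampeningState:
    """Per-route flap penalty with exponential decay (times in seconds)."""

    def __init__(self, half_life=900.0, penalty_per_flap=1000.0,
                 suppress=2000.0, reuse=750.0):
        self.half_life = half_life
        self.penalty_per_flap = penalty_per_flap
        self.suppress = suppress
        self.reuse = reuse
        self.penalty = 0.0
        self.last = 0.0
        self.suppressed = False

    def _decay(self, now):
        # Halve the penalty for every half_life seconds that have elapsed.
        self.penalty *= 0.5 ** ((now - self.last) / self.half_life)
        self.last = now

    def flap(self, now):
        self._decay(now)
        self.penalty += self.penalty_per_flap
        if self.penalty >= self.suppress:
            self.suppressed = True

    def is_suppressed(self, now):
        self._decay(now)
        if self.suppressed and self.penalty < self.reuse:
            self.suppressed = False
        return self.suppressed

def suppressed_after(flap_times, check_at, **params):
    state = DampeningState(**params)
    for t in flap_times:
        state.flap(t)
    return state.is_suppressed(check_at)

occasional = [0, 60, 120]                  # three flaps, one minute apart
egregious = [60 * k for k in range(20)]    # a flap every minute for 20 minutes

print(suppressed_after(occasional, 121))                   # tight threshold
print(suppressed_after(occasional, 121, suppress=6000.0))  # generous threshold
print(suppressed_after(egregious, 1141, suppress=6000.0))  # generous threshold
```

In this sketch the tight threshold suppresses a route after only three flaps. The generous threshold leaves that route alone but still catches the route that flaps every minute.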

Punch Line
BGP not in danger of falling over

Lots of runway

IDR
Near-term improvements

Most cause increased use of router resources

RRG
Fundamental changes, e.g. new routing and addressing architectures

GROW (recharter)
Analysis of routing system

BMWG, IPPM
Define metrics
