ITRI CCL
Basic Requirements of a Switch Router
What does / can a router do?
What can be done in hardware / software?
How do protocols / standards influence the design?
CCL/N300; Paul Huang
3/21/99
Types of Inter-network Nodes
7. Application
6. Presentation
5. Session
4. Transport
3. Network
2. Data Link
1. Physical
Inter-network nodes: Hub (Layer 1), Bridge and Switch (Layer 2), Router (Layer 3), Gateway (Layers 4 to 7)
What's the difference? Switch vs. Router
Layers 7, 6, 5 (Application, Presentation, Session)
  Purpose of the layer: Defines user-oriented services such as file transfer, messaging, and transaction processing; provides for structuring applications, coding the data, and exchanging information
  Role of switching: Application switching (e.g. e-mail forwarding); gateways between different application types; support for management functions; selection of destination for messages
Layer 4 (Transport)
  Purpose of the layer: Delivery of data to applications, division of messages into packets
  Role of switching: Directs the messages to the specific destination application or protocol type
Layer 3 (Network)
  Purpose of the layer: End-to-end communications through one or more subnets; selects optimal routes; controls loops; manages addressing
  Role of switching: Forwards packets through an interconnected set of networks
Layer 2 (Data Link)
  Purpose of the layer: Transfer of frames across a single network link such as a LAN; manages contention
  Role of switching: Controls switched circuits, switched LANs, and recovers from link errors
Layer 1 (Physical)
  Purpose of the layer: Transmission over a physical circuit including physical connectors, bit encoding, etc.
  Role of switching: Circuit switching as is used for telephony and port switching for LAN physical media
Anatomy of a Node
Application Level API
Application Real Time OS
Kernel Level API Protocol API
Network Protocol
Kernel Level API Driver Specification
NIC Driver
Hardware Interface Transmit Receive
Fast Ethernet NIC
Repeater
Transmit Receive
Another Node
Fast Ethernet System
NODE
Device
Software
RTOS / Applications Protocol LLC MAC MII Reconciliation PCS PMA PMD AutoNeg Media
RTOS / Applications Protocol Fast Ethernet Standard (802.3u) LLC MAC PCS PMA PMD AutoNeg PCS PMA PMD AutoNeg Media Reconciliation PCS PMA PMD AutoNeg
Network Interface
Hardware
MDI
Fast Ethernet System
NODE
Device
Software
RTOS / Applications Protocol LLC MAC MII Reconciliation PCS PMA PMD AutoNeg Media Baseband Repeater Unit PCS PCS PMA PMA PMD PMD AutoNeg AutoNeg
RTOS / Applications Protocol Fast Ethernet Standard (802.3u) LLC MAC Reconciliation PCS PMA PMD AutoNeg Media
Network Interface
Hardware
MDI
Fast Ethernet System
NODE
Device
Software
RTOS / Applications Protocol LLC MAC MII Reconciliation PCS PMA PMD AutoNeg Media L2 Switch PCS PMA PMD AutoNeg PCS PMA PMD AutoNeg
RTOS / Applications Protocol Fast Ethernet Standard (802.3u) LLC MAC Reconciliation PCS PMA PMD AutoNeg Media
Network Interface
Hardware
MDI
Fast Ethernet System
NODE
Device
Software
RTOS / Applications Protocol LLC MAC MII Reconciliation PCS PMA PMD AutoNeg Media
RTOS / Applications Protocol Fast Ethernet Standard (802.3u) LLC MAC PCS PMA PMD AutoNeg PCS PMA PMD AutoNeg Media Reconciliation PCS PMA PMD AutoNeg
L3 Switch - Router
Network Interface
Hardware
MDI
Media Access Controller
Determine when a node can transmit a packet
Send frames to the PHY for conversion into packets and transmission on the media
Receive frames from the PHY and send them to the software that processes frames (protocols and applications)
Frame checking
  Valid frames
    - Frame size between 64 bytes and 1518 bytes
    - Valid frame check sequence (CRC)
    - Whole number of octets
  Non-valid frames
    - Runts: any frame shorter than 64 bytes (512 bits)
    - Jabber: data transmission longer than 400 µs (largest packet: 120.56 µs)
    - Dribble: invalid number of octets (frame does not end on an octet boundary)
Media independent
IEEE 802.3 CSMA / CD
Media Access Rules
Listen before sending
  CSMA: Carrier Sense Multiple Access
  Interpacket gap (IPG) = 96 bit times, or 0.96 µs for Fast Ethernet
CD: Collision Detection
  Collision domain
  Collision window
  Slot time = maximum allowable collision window (512 bit times)
    - sets the minimum frame size (512 bits / 64 bytes)
    - sets the maximum network diameter
Backoff
  Truncated binary exponential backoff
    - RAND(0, 2^min(N,10)) slot times, where N is the transmit attempt counter
    - Integer multiple of the 512-bit slot time (i.e. 512, 1024, 1536, 2048, ..., 4096, etc.)
    - Maximum backoff time is about 5.24 ms (1023 slot times at 100 Mbps)
CSMA / CD Flow Chart
1. Transmit frame: assemble the packet and set the attempt counter to 0.
2. Is the media busy? If yes, wait until the other transmission has finished.
3. Has the IPG passed? If not, delay for the rest of the IPG.
4. Transmit the first bit of the packet.
5. Collision? If no, transmit the next bit and repeat until the transmission is completed: frame Tx success.
6. If yes, send jam (32 bits) and increment the attempt counter.
7. Attempts > 16? If yes: frame Tx failed. If no, compute the back-off, delay for the back-off time, and return to step 2.
Packet Format
Packet: Preamble | SFD | Data Frame | EFD
Data Frame: DA (6 bytes) | SA (6 bytes) | L/T (2 bytes) | Data (46 ~ 1500 bytes) | FCS (4 bytes)
Data (IP packet): IP Info | Protocol | CHK | Dest NA | Src NA | IP Info | Data
DA / SA (48 bits): I/G | U/L | OUI | Address (OUA)  [bit positions marked on the slide: 48, 47, 45, 24, 23, 0]

Preamble: 10101010
SFD: Start Frame Delimiter (10101011)
EFD: End of Frame Delimiter
DA: Destination Address
SA: Source Address
L/T: Length / Type
FCS: Frame Check Sequence
I/G: Individual / Group
U/L: Universal / Local Administration
OUI: Organizationally Unique Identifier
OUA: Organizationally Unique Address
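As an illustration of the frame layout, the sketch below unpacks the DA, SA, and L/T fields from a raw frame (preamble and SFD already stripped). The function name and the returned dictionary keys are invented for this example. It assumes the usual octet representation, in which the I/G bit is the least significant bit of the first address octet and the U/L bit is the next one, and the common convention that L/T values up to 1500 are lengths and values from 0x0600 upward are types.

```python
import struct

def parse_ethernet_header(frame: bytes) -> dict:
    """Split the first 14 octets of a frame into DA, SA and Length/Type."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    dst, src, length_or_type = struct.unpack("!6s6sH", frame[:14])
    if length_or_type <= 1500:
        kind = "length"
    elif length_or_type >= 0x0600:
        kind = "type"
    else:
        kind = "undefined"
    return {
        "dst": dst.hex(":"),
        "src": src.hex(":"),
        "dst_is_group": bool(dst[0] & 0x01),   # I/G bit
        "dst_is_local": bool(dst[0] & 0x02),   # U/L bit
        "src_oui": src[:3].hex(":"),           # first 24 bits of the address
        "length_or_type": length_or_type,
        "kind": kind,
    }
```

For example, a broadcast frame carrying IPv4 would report `dst_is_group` as true and `kind` as `"type"`.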
Wire Speed Efficiency
[Chart: wire-speed efficiency (0% to 100%) against data and packet size]
Data (bytes)   Packet (bytes)   Efficiency
1500           1518             97.5%
1262           1280             97.1%
1006           1024             96.4%
494            512              92.9%
238            256              86.2%
110            128              74.3%
46             64               54.8%
32             64               38.1%
24             64               28.6%
16             64               19.0%
8              64               9.5%
3              64               3.6%
A second plotted series (label not recoverable) reads: 32.2%, 32.0%, 31.8%, 30.6%, 28.5%, 24.5%, 18.1%, 12.6%, 9.4%, 6.3%, 3.1%, 1.2%
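The efficiency figures on this slide appear to be consistent with dividing the payload by the total wire time of the packet, that is the frame plus 8 bytes of preamble/SFD and a 12-byte interpacket gap. The sketch below works under that assumption; the constant and function names are illustrative only.

```python
HEADER_AND_FCS = 18    # DA + SA + L/T + FCS, in bytes
MIN_FRAME = 64         # shorter payloads are padded up to this frame size
PREAMBLE_SFD = 8       # bytes sent before the frame
INTERPACKET_GAP = 12   # 96 bit times, expressed in bytes

def wire_speed_efficiency(payload_bytes: int) -> float:
    """Fraction of wire time that carries payload for one packet."""
    frame = max(payload_bytes + HEADER_AND_FCS, MIN_FRAME)
    return payload_bytes / (frame + PREAMBLE_SFD + INTERPACKET_GAP)

for payload in (1500, 494, 46, 3):
    print(payload, f"{wire_speed_efficiency(payload):.1%}")
```

Small payloads pay twice: once for the fixed per-packet overhead and once for padding up to the 64-byte minimum frame.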
Network Utilization
[Figure: utilization against offered load, with regions marked Best, OK, and Bad; annotations "Full Utilization 30%" and "Saturation"]
Basic & Worst Case Collision Detection w/ Cat-5 Cable
Repeater: 70 bt. Each node (A, B, C, D): 25 bt.
Cat-5 cable: 100 m = 57 bt; 10 m = 5.7 bt; 1 m = 0.57 bt
Node-to-node path delay value (rounded to the nearest whole bit time)
  AB: 126 = 25 + 5.7 + 70 + 0.57 + 25
  AC: 183 = 25 + 5.7 + 70 + 57 + 25
  BC: 178 = 25 + 0.57 + 70 + 57 + 25
Worst-case round trip
  CDC: 468 = 2 * (25 + 57 + 70 + 57 + 25)
  Safety margin: 4
  Bit-time margin: 40 = 512 - 468 - 4
[Timing diagram for nodes A, B, and C; marked times: 96, 126, 178, 183, 222, 279, 400, 457]
Worst-case Collision Window w/ Class-I Hub and Fiberoptic Cable
Repeater: 70 bt. Each node (A, B, C): 25 bt.
Fiberoptic cable: 134 m = 67 bt; 10 m = 5 bt
Node-to-node path delay value (rounded to the nearest whole bit time)
  AB: 254 = 25 + 67 + 70 + 67 + 25
  AC: 192 = 25 + 67 + 70 + 5 + 25
  BC: 192 = 25 + 5 + 70 + 67 + 25
Worst-case round trip
  ABA: 508 = 2 * (25 + 67 + 70 + 67 + 25)
  Safety margin: 4
  Bit-time margin: 0 = 512 - 508 - 4
[Timing diagram for nodes A, B, and C; marked times: 192, 253, 254, 384, 507, 699]
Missed Collision w/ oversized network
Repeater: 70 bt. Each node (A, B, C): 25 bt.
Cable: 134 m = 67 bt; 10 m = 5 bt; 150 m = 75 bt
Node-to-node path delay value (rounded to the nearest whole bit time)
  AB: 262 = 25 + 67 + 70 + 75 + 25
  AC: 192 = 25 + 67 + 70 + 5 + 25
  BC: 200 = 25 + 5 + 70 + 75 + 25
Worst-case round trip
  ABA: 524 = 2 * (25 + 67 + 70 + 75 + 25)
  Safety margin: 4
  Bit-time margin: -16 = 512 - 524 - 4
[Timing diagram for nodes A, B, and C; marked times: 192, 200, 261, 262, 392, 512, 523]
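The collision-window budgets on these slides all follow the same arithmetic: double the one-way delay of the longest path and compare it with the 512-bit slot time less a safety margin. A small sketch of that check, using the slides' delay values in bit times; the function name and argument names are made up for this example.

```python
SLOT_TIME = 512       # maximum allowable collision window, in bit times
SAFETY_MARGIN = 4     # bit times held in reserve, as on the slides

def collision_budget(node_bt, cable_a_bt, repeater_bt, cable_b_bt):
    """Return (round_trip, margin) in bit times for node-cable-repeater-cable-node."""
    one_way = node_bt + cable_a_bt + repeater_bt + cable_b_bt + node_bt
    round_trip = 2 * one_way
    margin = SLOT_TIME - round_trip - SAFETY_MARGIN
    return round_trip, margin

cases = {
    "Cat-5, 100 m + 100 m": (25, 57, 70, 57),
    "fiber, 134 m + 134 m": (25, 67, 70, 67),
    "oversized, 134 m + 150 m": (25, 67, 70, 75),
}
for name, delays in cases.items():
    round_trip, margin = collision_budget(*delays)
    status = "ok" if margin >= 0 else "collisions can be missed"
    print(f"{name}: round trip {round_trip} bt, margin {margin} bt ({status})")
```

A negative margin means a sender can finish a minimum-size frame before the collision signal returns, which is the missed-collision case.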
100Base-TX / FX Connection
[Figure: a node and a repeater unit, each built from MAC, REC (reconciliation), MII, PCS, and PMA blocks, connected either by twisted pair (pins 1, 2, 3, and 6, labelled Pair 1 and Pair 3) or by a fiberoptic pair]
100Base-T4 Connection
[Figure: a node and a repeater unit, each built from MAC, REC (reconciliation), MII, PCS, and PMA blocks, connected by twisted pair using all four pairs (Pair 1 to Pair 4, pins 1 to 8)]
Repeater Testing
Function
  Transmit / receive events
    - Data handling: forward packet
    - Receive event handling: carrier sense
  Error handling via partition
    - False carrier events (FCE): invalid start-of-stream delimiter
        Partition and set the LINK UNSTABLE state after two false carrier events
        Send jam signal to all other ports on the repeater for 5 µs or until the end of the FCE
        Unset the LINK UNSTABLE state after detecting no activity for more than 331 µs, or after detecting a valid incoming packet once the line has been idle for the interpacket gap time of 640 ns
    - Excessive collisions: more than 60 collisions in a row
        Partition after receiving more than 60 collisions in a row
        Clear after detecting activity without a collision for more than 5 µs
    - Receiver jabber: data transmission longer than 400 µs (largest packet: 120.56 µs)
        Clear after jabber stops
Basic Bridge Operation
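[Flow chart, with the MAC address table consulted on input and output]
1. Receive frame (input).
2. Look up the SA in the table. Address in table? If not, record the address and port # in the table.
3. Unicast DA? If not, forward the frame to all ports except the inbound port.
4. Look up the DA in the table and get the corresponding port #. Address in table? If not, forward the frame to all ports except the inbound port.
5. Ports the same (inbound port = DA port #)? If yes, filter the frame.
6. Otherwise, forward the frame to the DA port # (output).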
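The bridge flow chart translates almost directly into code. The sketch below is a minimal learning bridge written for illustration: the class and method names are made up, addresses are 6-byte values, there is no ageing of table entries, and the table entry is refreshed on every frame (a slight extension of the chart, which only records unknown source addresses).

```python
class LearningBridge:
    def __init__(self, ports):
        self.ports = list(ports)
        self.table = {}                      # MAC address -> port number

    def receive(self, in_port, src: bytes, dst: bytes):
        """Return the list of ports the frame is sent out on ([] = filtered)."""
        self.table[src] = in_port            # learn (or refresh) the source address
        flood = [p for p in self.ports if p != in_port]
        if dst[0] & 0x01:                    # group address: not unicast
            return flood
        out_port = self.table.get(dst)
        if out_port is None:                 # unknown unicast destination
            return flood
        if out_port == in_port:              # destination is on the inbound segment
            return []
        return [out_port]

bridge = LearningBridge([1, 2, 3])
A = bytes.fromhex("020000000001")            # hypothetical station addresses
M = bytes.fromhex("020000000002")
print(bridge.receive(1, A, M))               # M unknown: flooded to ports 2 and 3
print(bridge.receive(3, M, A))               # A was learned on port 1
```

Flooding unknown destinations is what makes loops dangerous once several bridges are connected in a ring.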
Multiple Bridges
[Figure: five segments, Alpha (stations A to F), Beta (M to R), Gamma (G to L), Delta (Y, Z), and Epsilon (S to X), joined by bridges B1 and B2, each with ports 1, 2, and 3]
MAC address table (port on which each bridge reaches the stations of a segment)
Segment   MAC addresses   Bridge #1 port   Bridge #2 port
Alpha     A B C D E F     1                1
Gamma     G H I J K L     2                1
Beta      M N O P Q R     3                1
Epsilon   S T U V W X     2                3
Delta     Y Z             2                2
Multiple Bridges
[Figure: the same five segments, now with a third bridge, B3, so that the bridged network contains a loop]
Problems caused by looping
  - Broadcast storms
  - Learning problems
  - Cloned unicast frames
Solution
  - Spanning Tree Protocol
Ethernet Switching
Basic techniques
  Cut-through
    Advantages
      - low latency
    Disadvantages
      - forwards runt and error frames
      - internal speedup not possible
      - mixed speeds difficult
  Interim cut-through
    Same as cut-through, but fewer runt frames
  Store & forward
    Advantages
      - reduces error frames
      - architecturally flexible
    Disadvantage
      - longer latency (not really bad!)
Router vs. Routing function
What does a router do?
  Routing function
    - IP packet forwarding
    - Route calculation / convergence
    - Route management
  Multicast
    - IP packet duplication
    - Multicast routing
  Traffic Mgt. (QoS)
    - Packet classification
    - Packet filtering
    - Queue management
  Network Mgt.
  Security
    - Firewall
    - Authentication
Router
  Conventional stand-alone router performs an IP routing function
    - Bus based
    - Central CPU
    - Cached forwarding tables
    - Centralized routing tables
    - SW table lookup
Calculations required
  10 Gbps throughput with 64-byte packets = about 50 ns per packet, so each routing decision must take < 50 ns
Network Processor
High Performance Low Cost Highly Flexible Fast time-to-market
Poor Flexibility, TTM Good
Evolution to network processors
Software programmable Optimized instruction set for networking Breakthrough performance Switching, routing, and features General Purpose Processor
Network Processor
Customer-specific differentiation
Base level instruction set Empowers the higher level software Addresses all networking markets
Enables high-level functions at the same speed as basic switching wire speed
Custom ASICs
Poor
Price / Performance
Good
So, what's so hard about switching & routing?!
Typical Router Architecture
Security Mgt. Traffic Mgt. Node Mgt. Route Mgt.
Memory CPU Packet Forwarding
Interface Interface Memory CPU Slow path Fast path
Bus / Switch Fabric
* Cache assisted IP routing
Memory CPU
The Easy Part: Basic IP Forwarding
To forward an IP unicast packet, you need to:
  - Parse the IP header
  - Look up the address in a large table of address prefixes (100,000+ entries)
  - Check the checksum
  - Decrement the TTL and adjust the checksum
This stuff is easy to do at high speed
  - This is straightforward for ASIC implementation
  - Clever implementers can do OC-192 (or even OC-768)
  - With today's technology, this is not even close to being the bottleneck
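The header steps listed above (parse, check the checksum, decrement the TTL, adjust the checksum) can be sketched as follows. This is an illustrative software version with invented function names, not a description of any particular router. The address lookup is left out, and the checksum is adjusted incrementally in the style of RFC 1624 rather than recomputed over the whole header.

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """One's-complement checksum over 16-bit words; 0 for a valid header."""
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def forward_header(packet: bytes):
    """Return the updated IPv4 header, or None if the packet must be dropped."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return None                              # not IPv4 or truncated
    ihl = (packet[0] & 0x0F) * 4
    if ihl < 20 or len(packet) < ihl:
        return None
    hdr = bytearray(packet[:ihl])
    if ipv4_checksum(bytes(hdr)) != 0:
        return None                              # corrupted header
    if hdr[8] <= 1:
        return None                              # TTL expired
    old_word = (hdr[8] << 8) | hdr[9]            # TTL and protocol share a word
    hdr[8] -= 1
    new_word = (hdr[8] << 8) | hdr[9]
    old_sum = (hdr[10] << 8) | hdr[11]
    # Incremental update: HC' = ~(~HC + ~m + m')
    total = (~old_sum & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    hdr[10:12] = struct.pack("!H", ~total & 0xFFFF)
    return bytes(hdr)
```

Because only one 16-bit word changes, the adjustment costs a handful of additions regardless of header length, which is part of why this step maps well onto hardware.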
So What's So Hard?
Protocol Application
Operating System
Packet Processing
Switch Fabric
System performance = function of ALL elements
A chain is only as strong as its weakest link
So What's So Hard?
There are things which can make high-speed forwarding hard:
  - Where data flows come together (backplane)
  - Where parallelism is difficult
      e.g. optics, software, protocol
  - Protocol standards
      Unstable, poorly designed, or under-defined standards
      Need mature implementations
      Multi-lingual
      Too many standards
  - Lots of options and alternative paths
  - Maintaining per-packet state that comes and goes
So What's So Hard?
Proliferation of standards makes system implementation hard:
  - Support for legacy protocols (i.e. multi-protocol & conformance)
  - Interoperability (i.e. multi-vendor)
  - Addressing
  - Routing
  - Multicasting
  - Traffic mgt. (QoS)
  - Network mgt.
  - Mobility
  - Security
  - Virtual Private Network
So What's So Hard?
Reliability, maintainability, redundancy
  - Hot swappable, hot standby router
  - Coherent network state
  - Online upgrade
  - Redundancy (power supply, link failure, etc.)
Scalability
Additional
  - Frame translation
  - Load balancing
  - Port mirroring
One Hard Part: The Backplane
Given 100 OC-xx ports, served by 100 line cards, somehow packets have to get between the line cards
The design of the switched backplane is non-trivial
  - If there are n line cards, you have an n*log(n) problem
  - You want very high switch utilization to push performance (this affects where packets are buffered)
  - Power and heat become important
  - Cost of hardware is meaningful
It is easy to lose your QoS guarantees across the switched backplane
ASIC Designer's Nightmare: Options
MPLS and IP forwarding
Filters (source or destination address, RSVP, ...)
Tunneling: encapsulation and decapsulation (particularly if reassembly is needed)
Multicast
IP options
Multipath (ECMP)
NAT (application addresses plus state)
IPv6 alternate headers
What does MPLS do to a router ?
Answer: Provide lots of alternative forwarding paths
[Table of incoming and outgoing label formats; the column layout is not recoverable from the extracted text]
<IP> <IP> <IP> <Shim> <Shim>+<IP> <ATM+shim> <ATM+shim>+<IP> <ATM+shim>+<Shim> IP <Shim>+<IP> <ATM+shim>+<IP> <Shim>; <Shim>+<Shim>; <ATM+shim> <IP> (with or without IP lookup) <ATM+shim>; <Shim>; <IP> (with or without IP lookup) <Shim>
etc.
This is not popular with hardware developers
A Hardware Developer's View
Generally: hardware engineers wish that folks who write standards paid attention to hardware issues
IP forwarding can be done very fast, no problem
CLNP forwarding can be done very fast
But: please don't give us so many options!
It is clear that IP standards (including IPv6) were designed by folks who don't pay any attention to what it takes to build a fast router
(On the other hand, things are still within a top hardware team's capabilities)
The Bottom Line on Speed
There are really three bottlenecks:
  - The switched backplane
  - The optics
  - How much extra complexity and flexibility you want (filtering, MPLS, and options all make it harder to go fast)
It really doesn't matter what the forwarding looks like, if it's straightforward and well defined
At very high speed, IP, MPLS, ATM, and Frame Relay are all constrained by the same issues
Can Routers go fast enough ?
There is some limit to how fast routers can go
Or, more correctly, there is some limit on how fast electronics can go
  - Given today's chip technology, and reasonable economics, the limit might be on the order of a few thousand * OC-192
  - In four years, possibly ditto but * OC-768
Past this point, we need optics
  - Core switches become WDM switches
  - Very fast, very branchy routers (and ATM switches) become feeders for WDM in the core
Issues affecting Reliability
Hardware robustness
  - Reliable hardware, redundancy at many levels
Software quality and robustness
  - e.g., how good is your routing software?
Protocol design
Response to congestion
Failover of links (SONET-like failover rates)
Network management
  - Avoid mistakes, diagnose failures
Testing, testing, testing
Key Design Considerations
Are your assumptions reasonable ?
Design Constraints
Target: Right product at the right time at the right price
Cost System Market Segment Market Timing (Market window) Specifications
Competition
Advantages & Weaknesses Targeted Market
Resources
Engineering team (Experience / Stability) Management team (Financing / Supportiveness) Standards / Customer / Industry tracking
What is Required ??? (Perceived vs. Real)
Optimizing performance:
  - Wire-speed switching at Layer 2
  - Wire-speed forwarding at Layer 3
Minimize latency:
  - Cut-through switching vs. store-and-forward
Increased scalability: SOHO, departmental, enterprise, backbone
Maximize integration: multi-chip vs. single-chip solution
Increased functionality:
  - VLAN (port, MAC, IP, IEEE 802.1Q tagging, etc.)
  - Port trunking / port snooping
  - Support Layer 3, Layer 4, ..., Layer 7
  - Support IP, IPX, SNA, ...
  - Support IEEE 802.3x flow control, jamming
  - CoS / QoS / RSVP / SBM / Differentiated Services
  - Multiple loss / delay queues per VC queueing
Are these assumptions reasonable ?
Maintain multicast / unicast packet sequence.
Multicast packets need to be switched at the same time.
Support 8 k / 16 k / 32 k MAC addresses.
Support 8 k / 16 k / 32 k IP addresses.
Support full SNMP / RMON statistics collection.
Key Design Considerations
Can you successfully overcome today's technology limitations?
Technology Assumptions
Memory speeds, size, types
  - DRAM, SRAM, SDRAM, SSRAM, Rambus, NetRAM
Semiconductor technology
  - Dimension: 0.8 µm, ..., 0.35 µm, 0.25 µm, 0.18 µm, etc.
  - Power: 5 V, 3.3 V, 2.5 V, etc.
  - Embedded memory
Design tools
  - Simulation: RTL level, behavioral, cycle-based
  - Layout capabilities
  - Emulation technology
Do these assumptions hold ?
Freebies
  - Memory speed / size
  - Silicon cost
  - Computational power
Do these assumptions still hold in a hyper-competitive environment?
  - No, because everybody has access to the same components and semiconductor foundries.
Improved Competitiveness
Creating advantages by speeding up the design cycle
  - Increase engineering experience
  - Invest in state-of-the-art engineering tools
      Latest simulation / CAD tools
      Advanced computers
      SOC emulation / FPGA hardware
  - to improve simulation time
  - to reduce potential errors, and thus debugging
Design Goals
Design specifications
Design Specifications
System features
  - Single chip with eight 10/100 Mbps Ethernet ports with RMII interface
  - Provides two 32-bit memory interfaces which support SSRAM
  - Supports a 16-bit CPU interface
  - Statistics collection to support SNMP, RMON-1
Layer 3 features
  - Supports wire-speed IP routing (1.2 Mpps) with line-rate address lookup
  - Supports 10K routes
  - Supports IP multicast
  - Supports two levels of user data priority (Class of Service support)
Layer 2 features
  - Supports IEEE 802.1d bridging and spanning tree algorithm
  - Supports port or IEEE 802.1Q compliant tag based VLANs
  - Supports 8K MAC address entries
  - IEEE 802.3x flow control for full duplex operation
  - Supports port snooping
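The 1.2 Mpps wire-speed figure can be checked with a short calculation. The sketch assumes minimum-size 64-byte frames, 20 bytes of per-packet overhead on the wire (8 bytes of preamble/SFD plus a 12-byte interpacket gap), and all eight ports running at 100 Mbps; the function name is made up for this example.

```python
def wire_speed_pps(link_bps, frame_bytes=64, overhead_bytes=20):
    """Packets per second on one link when every frame is frame_bytes long."""
    return link_bps / ((frame_bytes + overhead_bytes) * 8)

per_port = wire_speed_pps(100_000_000)     # about 148,810 packets/s per port
aggregate = 8 * per_port                   # eight 100 Mbps ports
print(f"{per_port:,.0f} pps per port, {aggregate / 1e6:.2f} Mpps aggregate")
```

Under these assumptions the forwarding engine has roughly 840 ns per packet, shared across all eight ports.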
Design Goals
Architecture
Key Common Architectural Components
Forwarding Table Management, Network Management, and System Management
Forwarding Decision
Backplane
Output Link Scheduling
Forwarding Decision
Output Link Scheduling
In high performance systems, the forwarding decision, backplane and output link scheduling must be performed in hardware, while the less timely management and maintenance functions are performed in software.
Architectural Evolution
CPU Memory
CPU Memory
Line Card #1 Line Card #2
Line Card #1 Line Card #2
CPU / Memory
CPU / Memory
Line Card #N
CPU / Memory
Line Card #N
Architectural Evolution
CPU Memory
CPU Memory
Line Card #1 Line Card #2
Forwarding Engine Forwarding Engine
Line Card #1 Line Card #2
Forwarding Engine Forwarding Engine
Crossbar
Line Card #N
Forwarding Engine
Line Card #N
Forwarding Engine
Architectural Renewal w/ Advanced Technology
Trade-off
  - Centralized vs. Distributed
  - Hardware vs. Software
  - Packet vs. Cell
  - Connection-less vs. Connection-oriented
  - Shared Bus vs. Crossbar
  - Transmission vs. Switching
  - Big Pipe vs. Managed BW
  - Dumb Network vs. Intelligent
  - Wired vs. Wireless
Technology Factors
  - Semiconductor advances
      Computing power (CPU)
      Memory size
      Analog / RF / optical technology
  - Material advances
      Optical transmission
Conceptual Model of L3 Switch
Descriptor Links
CPU
Buffer Mgt. Packet Memory
Routing Table
Forwarding Engine
Packet Control
I/O Scheduler
M A C M A C M A C M A C M A C M A C M A C M A C
Packet Flow (Tetris)
Data Hdr
Hdr
Packet Control
Forwarding Engine
Routing Decision Hdr
Buffer Mgt.
Output Scheduler
Priority / Normal / Multicast Pkt (Ptr & Hdr)
Data
Hdr
Packet Controller
Memory Controller
Memory Interface Module
Packet Memory
Packet FIFO
Scheduler
MAC
Output Scheduler
From Packet Control
Next Read Next Write
Priority
Normal
Multicast
From Buffer Mgt.
Scheduler
Packet FIFO
Header Process
MAC
Buffer Management
Routing Decision
I/O ports, Pkt Location, QoS
Modified IP Header
Tail Maintenance
Free Descriptor
To Output Scheduler
Head Maintenance
Temporary Descriptors
Forwarding Engine
L2 Table Lookup (DA) L2 Learning (SA) L2 Learning (SA)
Route
Port Snooping Map
Port Trunking Map
Header Verification
Multicast Lookup
Unicast Lookup (ARP)
Header Modification
Routing Decision
Design Trade-offs
Packet Memory Design
Variable Length Format
Advantage
  - Simple, no linked list required
  - No descriptors required
  - No gaps between packets
  - Easy to debug
Disadvantage
  - No sharing among ports
  - Fast route decision required
  - Large temporary FIFO required
  - Parity bit or packet length write-back required
  - Look-ahead forwarding not allowed for multicast packets
Variations
  - Parity bit vs. packet length
[Figure: packet memory layout for Port #1, Port #2, and Port #3]
Fixed Packet Size Format
Advantage
  - Sharing among ports
  - Routing decision relaxed
  - Look-ahead forwarding allowed
  - Small temporary FIFO
Disadvantage
  - Inefficient for small packets
  - Linked list required
  - Difficult to debug
Variations
  - 1536 bytes vs. 2048 bytes
[Figure: packet memory layout for Port #1, Port #2, and Port #3]
Cell Format
Advantage
  - Sharing among ports
  - Efficient for most packets
  - Routing decision relaxed
  - Look-ahead forwarding allowed
  - Small temporary FIFO
Disadvantage
  - Large descriptor memory required
  - Linked list required
  - Complex logic / longer design cycle
  - Prone to error
  - Very difficult to debug
Variations
  - 64 / 128 / 256 bytes
[Figure: packet memory layout for Port #1, Port #2, and Port #3]
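The efficiency trade-off between fixed packet-size buffers and cells can be made concrete with a small calculation. The sketch below simply counts how many allocation units a packet occupies and what fraction of that memory holds packet data; it ignores descriptor and linked-list overhead, and the function name is made up for this example.

```python
import math

def buffer_utilization(packet_bytes: int, unit_bytes: int) -> float:
    """Fraction of allocated packet memory actually holding packet data."""
    units = math.ceil(packet_bytes / unit_bytes)
    return packet_bytes / (units * unit_bytes)

for packet in (64, 65, 1518):
    fixed = buffer_utilization(packet, 2048)   # one fixed-size buffer per packet
    cells = buffer_utilization(packet, 64)     # chain of 64-byte cells
    print(f"{packet:5d} bytes: fixed 2048 -> {fixed:.1%}, 64-byte cells -> {cells:.1%}")
```

A 64-byte packet uses about 3% of a 2048-byte buffer but fills a 64-byte cell completely, while a 65-byte packet wastes nearly half of its second cell. The price of cells is the longer descriptor chain per packet.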
Design Trade-offs
Buffer Management
Queue Methodology #1
Link List for Port #2 Link List for Port #3 Link List for Port #4 Link List for Free Unicast Descriptor Link List for Free Multicast Descriptor
Link List for Port #1
A2
A3
A5
xx
A4
A8
A9
A1
A6
A7
A10
Queue Methodology #2
Link List for Port #2 Link List for Port #3 Link List for Port #4 Link List for Free Unicast Descriptor Link List for Free Multicast Descriptor
Link List for Port #1
A2
A3
A5
xx
A4
A8
A9
A1
A1
A6
A7
A10
A10
Queue Methodology #3
Link List for Port #2 Link List for Port #3 Link List for Port #4 Link List for Multicast Link List for Free Unicast Descriptor
Link List for Port #1
A2
A3
A5
xx
A1
A4
A8
A10
A9
A6
A7
Queue Methodology #4
Link List for Port #2 Link List for Port #3 Link List for Port #4 Link List for Free Packet Buffer
Link List for Port #1
A2 A4 A9
A3 A8 A1 A6 A10
A5 A1 A7 A10
Design Trade-offs
IP Forwarding
IP Routing
Routing algorithms calculate the routes
  - Unicast: RIP, OSPF
  - Multicast: DVMRP, PIM, MOSPF, CBT
Routes are converted to table format
Route tables are written into memory
  - Initialization
  - Route updates
Route search looks up the forwarding instruction for each packet
Route Search Operation
Longest prefix match
  Routing table
    R1 = 0101
    R2 = 0101101
    R3 = 010110101011
  Addresses to look up
    IP = 010101101011 (longest match: R1)
    IP = 010110101101 (longest match: R2)
Lookup criteria
  - Number of memory accesses required
  - Size of the data structure
  - Number of instructions required
Lookup methods
  - Hashing
  - Cache hit
  - CAM
  - Tree search
  - Table lookup
  - CPU search
  - Protocol based (tagging)
[Figure: binary tree of depth 32 with 2^32 leaves (IP addresses)]
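Of the lookup methods listed, tree search is the easiest to show in a few lines. The sketch below builds a plain binary trie from the slide's three routes and walks it bit by bit, remembering the last route seen. Prefixes and addresses are written as bit strings to match the slide; the class and function names are illustrative only.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # "0" or "1" -> TrieNode
        self.route = None     # route name if a prefix ends at this node

def build_trie(routes):
    """routes maps a prefix bit string to a route name."""
    root = TrieNode()
    for prefix, name in routes.items():
        node = root
        for bit in prefix:
            node = node.children.setdefault(bit, TrieNode())
        node.route = name
    return root

def longest_prefix_match(root, address):
    """Walk the address bits and return the last (longest) route passed."""
    node, best = root, root.route
    for bit in address:
        node = node.children.get(bit)
        if node is None:
            break
        if node.route is not None:
            best = node.route
    return best

routes = {"0101": "R1", "0101101": "R2", "010110101011": "R3"}
trie = build_trie(routes)
print(longest_prefix_match(trie, "010101101011"))   # R1
print(longest_prefix_match(trie, "010110101101"))   # R2
```

The cost is one memory access per address bit in the worst case (up to 32 for IPv4), which is what the faster schemes on the following slides try to avoid.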
Histogram of Prefix Length Distribution
[Figure: histogram of routing-table prefix lengths, from 8 to 32 bits, against the number of prefixes on a logarithmic scale from 10^0 to 10^6]
Mutated Binary Search on Hashing Table
Waldvogel et al., "Scalable High Speed IP Routing Lookups"
Hash table
  Length 5: 01010
  Length 7: 0101011, 0110110
  Length 12: 011011010101
[Figure: binary search tree over prefix lengths 8 to 32]
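The idea of binary search over prefix lengths, with one hash table per length, can be sketched as below. This is a simplified illustration rather than the scheme as published: each Python dict stands in for a hash table, prefixes are bit strings, markers are inserted along each prefix's search path and store their best matching real prefix, and the optimizations that give the "mutated" search its name are omitted. All function names are made up for this example.

```python
def build_tables(routes):
    """routes maps prefix bit strings to next hops. Returns (lengths, tables)."""
    lengths = sorted({len(p) for p in routes})
    tables = {n: {} for n in lengths}        # one hash table per prefix length

    def best_real_prefix(bits):
        for n in reversed(lengths):
            if n <= len(bits) and bits[:n] in routes:
                return bits[:n]
        return None

    for prefix in routes:
        lo, hi = 0, len(lengths) - 1
        while lo <= hi:                      # replay the search path to this prefix
            mid = (lo + hi) // 2
            n = lengths[mid]
            if n < len(prefix):              # leave a marker so searches go longer
                marker = prefix[:n]
                tables[n].setdefault(marker, best_real_prefix(marker))
                lo = mid + 1
            elif n == len(prefix):
                tables[n][prefix] = prefix   # the real prefix itself
                break
            else:
                hi = mid - 1
    return lengths, tables

def lookup(address, routes, lengths, tables):
    lo, hi, best = 0, len(lengths) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        n = lengths[mid]
        key = address[:n]
        if len(key) == n and key in tables[n]:
            if tables[n][key] is not None:
                best = tables[n][key]        # best match known so far
            lo = mid + 1                     # hit: try longer prefixes
        else:
            hi = mid - 1                     # miss: try shorter prefixes
    return routes.get(best)
```

Each lookup probes at most about log2 of the number of distinct prefix lengths, so for IPv4 the worst case is five hash probes, in line with the memory-reference count quoted for this method.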
Mutated Binary Search on Hashing Table
Criteria
  - Memory references: 2 (average); 5 (worst)
  - Memory usage: 1.2 Mbyte
  - Lookup time: 100 ns (average); 450 ns (worst); 2 ~ 10 Mpps
Advantage:
  - The speed of IP lookup is independent of forwarding table size
  - Relatively few memory accesses
  - Fast enough to support Gigabit rates
Disadvantage:
  - Routing update requires the tree to be rebuilt
  - Insertion and deletion of routes from the memory table is complex
Direct Table Lookup
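P. Gupta et al., "Routing Lookups in Hardware at Memory Access Speeds"
[Figure: the upper 24 bits of the destination address (bits 0 to 23) index a first table of 2^24 entries; for longer prefixes the lower 8 bits (bits 24 to 31) index a second table (2^8 entries) that yields the next hop]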
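A scaled-down sketch of the two-level direct table follows. To stay small it uses 12-bit toy addresses split 8 + 4 instead of 32-bit addresses split 24 + 8, writes prefixes as bit strings, and keeps each first-level entry as a (is_pointer, value) pair. These choices and all names are assumptions made for illustration, not details taken from the cited paper.

```python
def build_direct_tables(routes, addr_bits=12, first_bits=8):
    """routes maps prefix bit strings to next hops. Returns (first, second)."""
    rest_bits = addr_bits - first_bits
    first = [(False, None)] * (1 << first_bits)   # (is_pointer, next hop or block no.)
    second = []                                   # blocks of 2**rest_bits next hops
    # Shorter prefixes first, so longer ones overwrite them where they overlap.
    for prefix, hop in sorted(routes.items(), key=lambda item: len(item[0])):
        length = len(prefix)
        value = int(prefix, 2) if prefix else 0
        if length <= first_bits:
            base = value << (first_bits - length)
            for i in range(base, base + (1 << (first_bits - length))):
                first[i] = (False, hop)
        else:
            index = value >> (length - first_bits)
            is_pointer, current = first[index]
            if not is_pointer:                    # expand this entry into a block
                second.append([current] * (1 << rest_bits))
                first[index] = (True, len(second) - 1)
            block = second[first[index][1]]
            low = value & ((1 << (length - first_bits)) - 1)
            base = low << (addr_bits - length)
            for j in range(base, base + (1 << (addr_bits - length))):
                block[j] = hop
    return first, second

def direct_lookup(address, first, second, addr_bits=12, first_bits=8):
    """At most two table reads per lookup."""
    rest_bits = addr_bits - first_bits
    is_pointer, value = first[address >> rest_bits]
    if is_pointer:
        return second[value][address & ((1 << rest_bits) - 1)]
    return value

first, second = build_direct_tables({"0101": "R1", "0101101": "R2", "010110101011": "R3"})
print(direct_lookup(0b010110101101, first, second))   # R2
```

The lookup never needs more than two reads, but every short prefix is expanded into many first-level entries, which is where the memory goes at full scale.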
Direct Table Lookup
Criteria
  - Memory references: 2 (maximum)
  - Memory usage: 33 Mbyte
  - Lookup rate: 10 ~ 20 Mpps
Advantage:
  - Few memory references
  - Enables pipelined implementation
Disadvantage:
  - Inefficient memory usage
  - Insertion and deletion of routes from the memory table is complex
Conclusion
Understand and always keep the Big Picture in mind
  - Market
  - Technology
  - Brain power
Be aware of hype vs. reality
Remember the KISS principle
  - Keep it simple, stupid!
  - Successful technologies are not about perfection, but about compromise between complexity, performance, ease of deployment, and cost