RBUDP is a UDP-based transport protocol that uses "blasts" of data to maximize throughput while avoiding losses. It estimates available bandwidth and sends data just below this rate. If losses occur, TCP is used to exchange loss reports and lost packets are retransmitted in smaller blasts. SABUL/UDT is an application-level transport library that uses UDP for data and control packets. It employs congestion control techniques like rate control using AIMD, window-based flow control, and selective ACKs to be TCP-friendly in different network conditions. GTP is a transport protocol for data grids that aims to efficiently utilize high BDP wide area networks.


UDP-based schemes for High Speed Networks

Presented by: Sumitha Bhandarkar
Presented on: 03.24.04
Agenda
• RBUDP
– E. He, J. Leigh, O. Yu, T. A. DeFanti, “Reliable Blast UDP : Predictable High Performance Bulk Data
Transfer”, IEEE Cluster Computing 2002, Chicago, Illinois, Sept 2002.

• Tsunami (No technical resources available)


– http://www.ncne.org/training/techs/2002/0728/presentations/200207-wallace1_files/v3_document.htm

• SABUL/UDT
– H. Sivakumar, R. L. Grossman, M. Mazzucco, Y. Pan, Q. Zhang, “Simple Available Bandwidth Utilization
Library for High-Speed Wide Area Networks”, to appear in Journal of Supercomputing, 2004.
– Y. Gu and R. Grossman, “UDT: An Application Level Transport Protocol for Grid Computing”, Second
International Workshop on Protocols for Fast Long-Distance Networks, February 2004 (PFLDnet 2004).
– Y. Gu and R. Grossman, “UDT: A Transport Protocol for Data Intensive Applications”, IETF DRAFT.
http://bebas.vlsm.org/v08/org/rfc-editor/internet-drafts/draft-gg-udt-00.txt

• GTP
– R.X. Wu and A.A. Chien, “GTP: Group Transport Protocol for Lambda-Grids”, 4th IEEE/ACM
International Symposium on Cluster Computing and the Grid, April 2004. (CCGrid 2004)

2
TCP-based Schemes
The problems

• Slow Startup
• Slow loss recovery
• RTT bias
• Burstiness caused by window control
• Large amount of “control traffic” due to per-packet ack

3
RBUDP
• Intended to be aggressive.
• Intended for high-bandwidth dedicated or QoS-enabled networks - not
for deployment on the broader Internet.
• Uses UDP for data traffic and TCP for signaling traffic.
• Estimates available bandwidth on the network using Iperf/app_perf
(note: this requires user interaction, i.e., it is NOT automated)
• Tries to send just below this rate in “blasts” to avoid losses
(blast payload = RTT * estimated BW)
• If losses do occur within a “blast”, TCP is used to exchange loss reports
• Lost packets are recovered by retransmitting them in smaller “blasts”
(a simplified sender loop is sketched below)

4
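Below is a minimal, illustrative sketch (not the authors' implementation) of the sender-side loop described above: blast the payload over UDP just below the estimated rate, signal end of blast over TCP, read the loss report, and re-blast only the missing packets. The 0.95 pacing factor, packet framing, and function names are assumptions.

import socket, struct, time

MTU_PAYLOAD = 1400  # bytes of application payload per UDP datagram (assumed)

def send_blast(udp_sock, dest, data, seq_numbers, rate_bps):
    """Send the listed packets over UDP, paced just below rate_bps."""
    interval = (MTU_PAYLOAD * 8) / rate_bps            # seconds between datagrams
    for seq in seq_numbers:
        chunk = data[seq * MTU_PAYLOAD:(seq + 1) * MTU_PAYLOAD]
        udp_sock.sendto(struct.pack("!I", seq) + chunk, dest)
        time.sleep(interval)                           # crude software pacing

def rbudp_send(data, dest_ip, udp_port, tcp_sock, est_bw_bps):
    """One RBUDP transfer: blast everything, then re-blast whatever is reported lost."""
    udp_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    report_in = tcp_sock.makefile("r")                 # loss reports arrive over TCP
    total = (len(data) + MTU_PAYLOAD - 1) // MTU_PAYLOAD
    missing = list(range(total))
    while missing:
        # send just below the estimated bandwidth (0.95 is an assumed margin)
        send_blast(udp_sock, (dest_ip, udp_port), data, missing, 0.95 * est_bw_bps)
        tcp_sock.sendall(b"DONE\n")                    # end-of-blast signal over TCP
        report = report_in.readline()                  # e.g. "3 17 42\n"; empty = complete
        missing = [int(s) for s in report.split()]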
RBUDP

E. He, J. Leigh, O. Yu, T. A. DeFanti, “Reliable Blast UDP : Predictable High Performance Bulk Data Transfer”, IEEE Cluster Computing 2002, Chicago, Illinois, Sept 2002.
5
RBUDP
Sample Results (with network bottleneck)

E. He, J. Leigh, O. Yu, T. A. DeFanti, “Reliable Blast UDP : Predictable High Performance Bulk Data Transfer”, IEEE Cluster Computing 2002, Chicago, Illinois, Sept 2002.
6
RBUDP
Sample Results (with receiver bottleneck)

E. He, J. Leigh, O. Yu, T. A. DeFanti, “Reliable Blast UDP : Predictable High Performance Bulk Data Transfer”, IEEE Cluster Computing 2002, Chicago, Illinois, Sept 2002.
7
RBUDP
Conclusions

Advantages
• Keeps the pipe as full as possible
• Avoids TCP’s per-packet ack interaction
• Paper provides an analytical model, so performance is “predictable”

Disadvantages
• Sending rate must be adjusted by the user (no means of automatically
adapting the sending rate to dynamic network conditions) - thus the
solution is good ONLY in dedicated/QoS-supported networks.
• No flow control - a fast sender can flood a slow receiver. The offered
solution is to use app_perf (a modified Iperf developed by the authors to
take the receiver bottleneck into account) for bandwidth estimation.
8
Tsunami
• No technical papers available. This information is from a presentation at the
July 2002 NLANR/Internet2 Techs Workshop, available for download at
http://www.indiana.edu/~anml/anmlresearch.html. The latest version is dated
12/09/02.
• Very simple and primitive scheme - NOT TCP-FRIENDLY
• Application-level protocol - uses UDP for data and TCP for signaling
• Receiver keeps track of lost packets and requests retransmission
• So how is this different from RBUDP?

9
SABUL / UDT
• SABUL (Simple Available Bandwidth Utilization Library) uses UDP to
transfer data and TCP to transfer control information.
• UDT (UDP-based Data Transfer Protocol) uses UDP only for both data
and control information.
• UDT is the successor to SABUL.
• Both are application-level protocols, available as an open-source C++
library on Linux/BSD/Solaris and as NS-2 simulation modules.

10
SABUL / UDT
• Rate control: for handling dynamic congestion - uses a constant rate-control
interval (called SYN, set to 0.01 seconds) to avoid RTT bias.
• Window-based flow control: used in slow start, to ensure that a fast
sender does not swamp a slow receiver, and to limit unacknowledged packets.
• Selective positive acknowledgement (one per SYN) and immediate
negative acknowledgement.
• Uses both packet loss and packet delay for inferring congestion
• TCP friendly - less aggressive than TCP in low-BDP networks; better
than TCP in high-BDP networks.
• PFLDnet 2004 claim: Orthogonal design - the UDP-based framework
can be used with any congestion control algorithm, and the UDT
congestion control algorithm can be ported to any TCP implementation.

11
SABUL / UDT

Y. Gu and R. Grossman, “UDT: An Application Level Transport Protocol for Grid Computing”, PFLDnet2004.
12
SABUL / UDT
Rate Control (AIMD)
• Increase
– If the loss rate during the last SYN is less than a threshold (0.1%),
the sending rate is increased.
– Old version (SABUL) :
– New version (UDT) :
– Estimated BW is calculated using the packet-pair technique
– Every 16th data packet and its successor are sent back to back
to form a packet pair
– The receiver uses a median filter on the interval between arrival
times of each packet pair to estimate the link capacity (a sketch of
the increase step follows)
Y. Gu, X. Hong, M. Mazzucco and R. Grossman, “SABUL: A High Performance Data Transfer Protocol”, submitted for publication.
13
Y. Gu and R. Grossman, “UDT: An Application Level Transport Protocol for Grid Computing”, PFLDnet2004.
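The increase formulas referenced above did not survive extraction from the slide. The sketch below follows the UDT-style increase described in the cited PFLDnet 2004 paper and IETF draft: once per SYN, if the loss rate stayed under the threshold, the per-SYN packet budget grows in proportion to the estimated spare capacity. The packet size, the BETA constant, and the state layout are illustrative and should be checked against the draft.

import math
from dataclasses import dataclass

SYN = 0.01              # rate-control interval, seconds
PS = 1500               # packet size in bytes (assumed)
BETA = 0.0000015        # scaling constant as given in the UDT draft
LOSS_THRESHOLD = 0.001  # 0.1 % loss per SYN

@dataclass
class RateState:
    pkts_per_syn: float = 2.0      # current sending budget, packets per SYN
    est_bw: float = 0.0            # packet-pair link-capacity estimate, packets/s
    loss_rate_last_syn: float = 0.0

def increase_step(s: RateState) -> None:
    """Rate-control event fired once per SYN; increase only if loss stayed low."""
    if s.loss_rate_last_syn >= LOSS_THRESHOLD:
        return
    current_rate = s.pkts_per_syn / SYN                  # packets/s
    if s.est_bw <= current_rate:
        inc = 1.0 / PS                                   # minimal probe increase
    else:
        spare_bps = (s.est_bw - current_rate) * PS * 8   # unused capacity, bits/s
        inc = max(pow(10, math.ceil(math.log10(spare_bps))) * BETA / PS, 1.0 / PS)
    s.pkts_per_syn += inc                                # additional packets next SYN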
SABUL / UDT
Rate Control (AIMD)

Y. Gu and R. Grossman, “UDT: An Application Level Transport Protocol for Grid Computing”, PFLDnet2004.
14
SABUL / UDT
Rate Control (AIMD)
• Decrease
– Increase the inter-packet time by 1/8 (equivalently, decrease the sending
rate by 1/9) when one of these conditions holds:
– the largest lost seq. no. in the NAK is greater than the largest
sequence number sent when the last decrease occurred
– it is the 2^dec_count-th NAK since the last time the above condition
was satisfied; dec_count is reset to 4 each time the first condition is
satisfied and incremented by 1 each time the second condition is
satisfied
– a delay warning is received
– Loss information carried in NAKs is compressed for losses of
consecutive packets.
– No data is sent in the next SYN interval after a decrease
– The delay warning is generated by the receiver based on the observed
RTT trend (a sketch of this decrease logic follows)
15
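A minimal sketch of the decrease rules listed above, assuming the sender records the largest sequence number sent at the time of each decrease; the state object and names are illustrative.

from dataclasses import dataclass

SYN = 0.01   # rate-control interval, seconds

@dataclass
class DecState:
    inter_pkt_interval: float = 0.001   # seconds between packets
    largest_sent_seq: int = 0
    last_dec_max_seq: int = -1
    dec_count: int = 4
    nak_count: int = 0
    frozen_for_one_syn: bool = False

def decrease_rate(s: DecState) -> None:
    s.inter_pkt_interval *= 9.0 / 8.0    # sending rate drops by 1/9
    s.frozen_for_one_syn = True          # send no data in the next SYN interval

def on_delay_warning(s: DecState) -> None:
    decrease_rate(s)

def on_nak(s: DecState, largest_lost_seq: int) -> None:
    if largest_lost_seq > s.last_dec_max_seq:
        # condition 1: loss beyond the largest seq sent at the last decrease
        s.dec_count, s.nak_count = 4, 0
        s.last_dec_max_seq = s.largest_sent_seq
        decrease_rate(s)
    else:
        s.nak_count += 1
        if s.nak_count == 2 ** s.dec_count:   # condition 2: the 2^dec_count-th NAK
            s.dec_count += 1
            s.nak_count = 0
            decrease_rate(s)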
SABUL / UDT
Flow Control
• Flow Control
– The receiver calculates the packet arrival rate (AS) using a median
filter and sends it back with the ACK
– On the sender side, if the AS value in the ACK is greater than 0, the
flow window is updated from AS (see the sketch after this slide)
– During congestion, loss reports can be dropped or delayed; if the sender
keeps sending new packets, it worsens the congestion. Flow control helps
prevent this.
– Flow control is also used in the slow start phase
– starts with a flow window of 2
– similar to TCP slow start
– used only at the beginning of a new session
Y. Gu and R. Grossman, “UDT: An Application Level Transport Protocol for Grid Computing”, PFLDnet2004.
16
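The window-update formula on the slide is not reproduced here. The sketch below uses the exponentially weighted form commonly cited for UDT (weights 7/8 and 1/8 over AS * (RTT + SYN)); treat the exact weights as an assumption.

SYN = 0.01   # rate-control interval, seconds

def update_flow_window(window: float, arrival_rate: float, rtt: float) -> float:
    """EWMA update of the flow window from the receiver-reported arrival rate.

    arrival_rate (AS) is in packets/second; the window is in packets and bounds
    the number of unacknowledged packets in flight.
    """
    if arrival_rate <= 0:
        return window                    # no valid AS in this ACK: leave unchanged
    return window * 0.875 + arrival_rate * (rtt + SYN) * 0.125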
SABUL / UDT
Timers

• SYN timer - triggers the rate-control event (fixed at 0.01 s)
• SND timer - schedules data packet sending (its interval is updated by the
rate-control scheme)
• ACK timer - triggers an ACK (same interval as SYN)
• NAK timer - triggers a NAK; its interval is updated to the current RTT
value each time the SYN timer expires
• EXP timer - triggers data packet retransmission and maintains connection
status; somewhat similar to the TCP RTO (illustrative intervals are
sketched below)

17
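For orientation, the following purely illustrative helper shows how the five timer intervals relate to one another; the EXP value is an assumed heuristic, not taken from the slide.

def timer_intervals(rtt: float, inter_pkt_interval: float) -> dict:
    """Illustrative intervals (in seconds) for the five UDT timers listed above."""
    SYN = 0.01
    return {
        "SYN": SYN,                  # rate-control event, fixed period
        "SND": inter_pkt_interval,   # data packet pacing, set by rate control
        "ACK": SYN,                  # one selective ACK per SYN
        "NAK": rtt,                  # loss reports; refreshed when SYN fires
        "EXP": max(4 * rtt, 0.1),    # retransmission/keep-alive; assumed heuristic
    }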
SABUL / UDT
Simulation Results

100Mbps/1ms; 1Gbps/100ms

Y. Gu and R. Grossman, “Using UDP for Reliable Data Transfer over High Bandwidth-Delay Product
Networks”, submitted for publication. 18
SABUL / UDT
Simulation Results

7 concurrent flows; 100Mbps bottleneck link

Y. Gu and R. Grossman, “Using UDP for Reliable Data Transfer over High Bandwidth-Delay Product
Networks”, submitted for publication. 19
SABUL / UDT
Simulation Results

Y. Gu and R. Grossman, “Using UDP for Reliable Data Transfer over High Bandwidth-Delay Product
Networks”, submitted for publication. 20
SABUL / UDT
Simulation Results

Y. Gu and R. Grossman, “Using UDP for Reliable Data Transfer over High Bandwidth-Delay Product
Networks”, submitted for publication. 21
SABUL / UDT
Real Implementation Results

Y. Gu and R. Grossman, “Using UDP for Reliable Data Transfer over High Bandwidth-Delay Product
Networks”, submitted for publication. 22
SABUL / UDT
Real Implementation Results

1Gbps/40us

Y. Gu and R. Grossman, “Using UDP for Reliable Data Transfer over High Bandwidth-Delay Product
Networks”, submitted for publication. 23
SABUL / UDT
Real Implementation Results

1Gbps/ 110ms

I-TCP = TCP with concurrent UDT flows
S-TCP = TCP without concurrent UDT flows

Y. Gu and R. Grossman, “Using UDP for Reliable Data Transfer over High Bandwidth-Delay Product
Networks”, submitted for publication. 24
SABUL / UDT
Real Implementation Results

Y. Gu and R. Grossman, “Using UDP for Reliable Data Transfer over High Bandwidth-Delay Product
Networks”, submitted for publication. 25
SABUL / UDT
Conclusions

• From one of the SLAC talks [1]: “Looks good, BUT 4*CPU Utilization of TCP”
• Reordering robustness is worse than TCP - all out-of-order packets are treated
as losses. The suggested solution is to delay NAK reports by a short interval.
• All losses are treated as congestion - poor performance at high link error
rates (better than TCP, though, since it does not respond to each and every
loss event).
• Router queue occupancy stays smaller than with TCP due to less burstiness.
• The increase algorithm relies on bandwidth estimation - it may not be suitable
for links with a large number of concurrent flows.

[1] http://www.slac.stanford.edu/grp/scs/net/talk03/pfld-feb04.ppt
26
GTP
Group Transport Protocol

• Motivated by the following observations about lambda grids


– Very high speed (1 Gig, 10 Gig, etc.) dedicated links connecting a
small number of end points (e.g., 10^3, not 10^8) and possibly long
delays (e.g., 60 ms between experimental sites)
– Communication patterns are not necessarily just point-to-point;
multipoint-to-point and multipoint-to-multipoint are very likely.
– The aggregate capacity of multiple connections can be far greater
than the data handling speed of the end system, so end-point congestion
is far more likely than network congestion

27
GTP
Overview

• Receiver-driven (dumb sender, very smart receiver)


• Request-response data transfer model
• Rate-based explicit flow control
• Receiver-centric max-min fair allocation across multiple flows
(irrespective of individual RTTs)
• UDP for data, TCP for control connection.

28
GTP
Framework

R.X. Wu and A.A. Chien, “GTP: Group Transport Protocol for Lambda-Grids”, 4th IEEE/ACM International
Symposium on Cluster Computing and the Grid, April 2004. (CCGrid 2004) 29
GTP
Framework (cont.)

• Single Flow Controller (SFC): manages sending data packet requests,
chooses/requests the sending rate, manages receiver buffer requirements
• Single Flow Monitor (SFM): measures flow statistics such as allocated
rate, achieved rate, packet loss rate, RTT estimate, etc., which are used
by both the SFC and the CE
• Capacity Estimator (CE): estimates the flow capacity of each individual
flow based on statistics from the SFM
• Max-min Fairness Scheduler: estimates the max-min fair share for each
individual flow

30
GTP
Flow Control and Rate Allocation

• Single Flow Controller (SFC):
– flow rate adjusted once per RTT
– loss-proportional decrease and proportional increase for rate adaptation

• Capacity Estimator (CE):
– flow rate adjusted once per centralized control interval (default 3*RTTmax)
– exponential increase and loss-proportional decrease

R.X. Wu and A.A. Chien, “GTP: Group Transport Protocol for Lambda-Grids”, 4th IEEE/ACM International
Symposium on Cluster Computing and the Grid, April 2004. (CCGrid 2004) 31
GTP
Flow Control and Rate Allocation (cont.)

• Target rate for each flow is

• The Max-min Fairness Scheduler adjusts the target flow rates to ensure
max-min fairness (a generic max-min allocation is sketched below)

R.X. Wu and A.A. Chien, “GTP: Group Transport Protocol for Lambda-Grids”, 4th IEEE/ACM International
Symposium on Cluster Computing and the Grid, April 2004. (CCGrid 2004) 32
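GTP's actual scheduler is described in the CCGrid 2004 paper; the sketch below shows generic max-min fair allocation by progressive filling, with each flow capped by its own estimated capacity, which is the behavior the slide describes. Function and flow names are illustrative.

def max_min_allocate(total_capacity: float, flow_caps: dict) -> dict:
    """Progressive filling: give every unsatisfied flow an equal share, cap flows
    at their estimated capacity, and redistribute the leftover to the rest."""
    alloc = {f: 0.0 for f in flow_caps}
    remaining = total_capacity
    active = set(flow_caps)
    while active and remaining > 1e-9:
        share = remaining / len(active)
        capped = {f for f in active if flow_caps[f] - alloc[f] <= share}
        if not capped:
            for f in active:                    # nobody hits a cap: split evenly
                alloc[f] += share
            remaining = 0.0
        else:
            for f in capped:                    # satisfy capped flows, free the rest
                remaining -= flow_caps[f] - alloc[f]
                alloc[f] = flow_caps[f]
            active -= capped
    return alloc

# Example: a receiver that can absorb 1000 Mbps, shared by three flows whose
# estimated capacities differ.
print(max_min_allocate(1000.0, {"A": 200.0, "B": 700.0, "C": 900.0}))
# -> {'A': 200.0, 'B': 400.0, 'C': 400.0}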
GTP
Other Details

• The current implementation expects in-order delivery; it can be augmented
in the future to handle out-of-order packets.
• TCP-Friendliness is “tunable” by allocating a fixed share of the total
bandwidth for TCP in the CE
• Currently congestion detection is only loss based. Future work will
augment the algorithm to include delay-based congestion detection.
• Transition management ensures max-min fairness is maintained even
when flows join/leave dynamically.

R.X. Wu and A.A. Chien, “GTP: Group Transport Protocol for Lambda-Grids”, 4th IEEE/ACM International
Symposium on Cluster Computing and the Grid, April 2004. (CCGrid 2004) 33
GTP
Simulation Results

R.X. Wu and A.A. Chien, “GTP: Group Transport Protocol for Lambda-Grids”, 4th IEEE/ACM International
Symposium on Cluster Computing and the Grid, April 2004. (CCGrid 2004) 34
GTP
Simulation Results (Cont.)

R.X. Wu and A.A. Chien, “GTP: Group Transport Protocol for Lambda-Grids”, 4th IEEE/ACM International
Symposium on Cluster Computing and the Grid, April 2004. (CCGrid 2004) 35
GTP
Simulation Results (Cont.)

R.X. Wu and A.A. Chien, “GTP: Group Transport Protocol for Lambda-Grids”, 4th IEEE/ACM International
Symposium on Cluster Computing and the Grid, April 2004. (CCGrid 2004) 36
GTP
Emulation Results

R.X. Wu and A.A. Chien, “GTP: Group Transport Protocol for Lambda-Grids”, 4th IEEE/ACM International
Symposium on Cluster Computing and the Grid, April 2004. (CCGrid 2004) 37
GTP
Emulation Results (Cont.)

R.X. Wu and A.A. Chien, “GTP: Group Transport Protocol for Lambda-Grids”, 4th IEEE/ACM International
Symposium on Cluster Computing and the Grid, April 2004. (CCGrid 2004) 38
GTP
Real Implementation Results

R.X. Wu and A.A. Chien, “GTP: Group Transport Protocol for Lambda-Grids”, 4th IEEE/ACM International
Symposium on Cluster Computing and the Grid, April 2004. (CCGrid 2004) 39
GTP
Real Implementation Results (Cont.)

R.X. Wu and A.A. Chien, “GTP: Group Transport Protocol for Lambda-Grids”, 4th IEEE/ACM International
Symposium on Cluster Computing and the Grid, April 2004. (CCGrid 2004) 40
GTP
Real Implementation Results (Cont.)

R.X. Wu and A.A. Chien, “GTP: Group Transport Protocol for Lambda-Grids”, 4th IEEE/ACM International
Symposium on Cluster Computing and the Grid, April 2004. (CCGrid 2004) 41
Questions ???

42
Extra Slides

43
Scatter/Gather DMA
• Optimization for improving network stack processing
• Under normal circumstances, data is copied between kernel and application memory
• This is required because the network device drivers read/write contiguous
memory locations, whereas applications use mapped virtual memory
• When the NIC drivers are capable of scatter/gather DMA, a scatter/gather list
is maintained so that the NICs can read/write directly to the final memory
location where the data is intended to go. The scatter/gather data structure
makes the memory look contiguous to the NIC drivers
• All protocol processing is done by reference. Eliminating the memory copy
has been shown to improve performance dramatically
• In practice, the process is a little more complicated. At the send side, copy-
on-write must be enforced so that packets sent out but not yet acknowledged are
not overwritten. At the receive side, page borders should be enforced …
(a user-space illustration of gather I/O follows)
44
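Scatter/gather DMA itself happens in the NIC and kernel, but the same idea is visible from user space as gather I/O. The Unix-only sketch below is an analogy, not the kernel mechanism described above: it hands the kernel a list of non-contiguous buffers with a single os.writev call, avoiding an application-level copy into one contiguous buffer.

import os, socket

# Three non-contiguous application buffers: a header, a payload, and a trailer.
header = b"HDR:"
payload = bytes(1400)
trailer = b":END"

# A connected socket would normally come from the application; a local
# socketpair is used here only so the example is self-contained and runnable.
left, right = socket.socketpair()

# writev() passes the kernel a gather list (iovec); the kernel walks the list
# instead of requiring one contiguous buffer - the same idea a
# scatter/gather-capable NIC applies at the DMA level.
sent = os.writev(left.fileno(), [header, payload, trailer])
print(sent, "bytes written with a single gather call")

received = b""
while len(received) < sent:
    received += right.recv(4096)
assert received.startswith(b"HDR:") and received.endswith(b":END")
left.close(); right.close()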
Packet Pair BW Estimation
• Two packets of the same size (L) are transmitted back to back
• The bottleneck link capacity (C) is smaller than the capacity of all the
other links (by definition)
• The packets face “transmission delay” at the bottleneck link
• As a result, they arrive at the receiver with a larger inter-packet delay
than when they were sent
• This delay can be used to compute the bottleneck link capacity
(C = L / gap, with L in bits and gap the inter-arrival time)
• (Makes many assumptions; also works only with FIFO queuing. A sketch follows.)

45
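A small sketch of packet-pair capacity estimation as the UDT slides describe it (back-to-back probe packets, median filter over the receiver's inter-arrival gaps); the packet size and sample-window length are assumptions.

import statistics

PACKET_SIZE = 1500          # bytes per probe packet (assumed)
WINDOW = 16                 # number of recent pair samples kept (assumed)

class PacketPairEstimator:
    """Estimate bottleneck capacity from back-to-back packet pairs."""
    def __init__(self):
        self.gaps = []      # inter-arrival times of recent pairs, seconds

    def on_pair(self, t_first: float, t_second: float) -> None:
        gap = t_second - t_first
        if gap > 0:
            self.gaps.append(gap)
            self.gaps = self.gaps[-WINDOW:]

    def capacity_bps(self) -> float:
        # C = L / median(gap): the second packet of a pair is held back by one
        # transmission time of the bottleneck link, so the gap reveals its capacity.
        if not self.gaps:
            return 0.0
        return PACKET_SIZE * 8 / statistics.median(self.gaps)

# Example: pairs arriving ~12 microseconds apart imply a ~1 Gbps bottleneck.
est = PacketPairEstimator()
for base in (0.0, 0.5, 1.0):
    est.on_pair(base, base + 12e-6)
print(round(est.capacity_bps() / 1e9, 2), "Gbps")   # -> 1.0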
