
SIP Overload Control
Report from the IETF Design Team
Volker Hilt
volkerh@bell-labs.com
Bell-Labs/Alcatel-Lucent
All Rights Reserved © Alcatel-Lucent 2006
Overview
IETF SIP Overload Control Design Team
Simulation Results for SIP
SIP Overload Control
Conclusion
Slide 2 | Volker Hilt | March 2008
IETF SIP Overload Control Design Team
The team was founded in early 2007 by the SIPPING WG.
Members
 Eric Noel, Carolyn Johnson (AT&T Labs)
 Volker Hilt, Indra Widjaja (Bell-Labs/Alcatel Lucent)
 Charles Shen, Henning Schulzrinne (Columbia University)
 Ping Wu*, Tadeusz Drwiega*, Mary Barnes (Nortel)
 Jonathan Rosenberg (Cisco)
 Nick Stewart (British Telecom)
Developed four independent simulation tools
 AT&T, Bell-Labs, Columbia, Nortel*
 Simulation results for SIP
 Proposals and initial results for overload-controlled SIP.
* Left design team in fall 2007
Simulation Results
Setup and Assumptions
SIP server topology consisting of UAs, edge
proxies and core proxies.
UAs are connected to edge proxies.
 Each UA creates a single call (INVITE
followed by BYE transaction).
 Calls arrive according to a Poisson process.
 Load is equally distributed across edge
proxies.
Edge proxies forward requests to a core proxy.
 Edge proxies reject a request if one/both
core proxies are overloaded.
 To simplify the problem domain, we
assume edge proxies have infinite capacity.
Core proxies forward call to the edge proxy of
the destination.
 Capacity: 500 messages per second at a
constant rate.
[Figure: Server Topology]
All proxies are modeled as queuing systems.
 Queue size: 500 messages
Media path congestion is not considered.
[Figure: Proxy Model]
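The proxy model above can be sketched as a minimal single-server simulation (a sketch only: Poisson message arrivals, a constant service rate of 500 messages/s, a finite queue of 500 messages, drops when the buffer is full; function and parameter names are illustrative, and the real simulations model the full edge/core topology):

```python
import random

def simulate(offered_load_cps, capacity=500, queue_limit=500, duration=100.0):
    """Sketch of one proxy: Poisson message arrivals, constant service
    rate, finite input queue; messages beyond the queue limit are dropped."""
    random.seed(1)
    service_time = 1.0 / capacity        # e.g. 500 msg/s -> 2 ms per message
    next_arrival = 0.0
    server_free_at = 0.0                 # time the server is free to start
    queue = served = dropped = 0
    while next_arrival < duration:
        t = next_arrival
        # complete all services that could start before this arrival
        while queue > 0 and server_free_at <= t:
            queue -= 1
            served += 1
            server_free_at += service_time
        if queue < queue_limit:
            queue += 1
            if server_free_at < t:       # server was idle: start service now
                server_free_at = t
        else:
            dropped += 1                 # buffer full: message is lost
        next_arrival += random.expovariate(offered_load_cps)
    return served, dropped
```

At 100 cps offered load the queue stays short and nothing is dropped; at 1000 cps (twice the capacity) the buffer fills and messages are lost, which is the regime the following scenarios study.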
Simulation Results
Client-to-Server vs. Server-to-Server Communication
Server-to-Server Communication
A server sends a stream of SIP requests to other servers.
SIP request streams between servers are dynamic.
 Load between servers can be reduced gradually by
rejecting/retrying some of the requests.
Overload control can use feedback to ask an upstream server to reduce its traffic to a desired level.
Client-to-Server Communication
UAs typically only initiate a single request at a time.
 A UA can be told to wait a certain time before resending the request.
Feedback-based overload control does not prevent
overload in the server.
[Figure: Server-to-Server Communication]
Problem: a large number of UAs can cause overload even if all UAs are told to back off.
[Figure: UA-to-Server Communication]
Simulation Results
Scenario 1: No Overload Control
Assumptions
Proxies do not use any overload control.
When the input buffer of a proxy is filled up, messages are dropped.
Requests that failed at one of the core proxies are not retried.
Results
Congestion collapse! The number of INVITE transactions completed (i.e., calls set up) drops to zero.
[Figure: Sim1. Carried load (cps) vs. offered load (cps) up to 800 cps; AT&T, CU, and BL Sim1 goodput curves.]
Simulation Results
Scenario 2: 503 (Service Unavailable) – Reject Requests
Assumptions
Proxies use 503 (Service Unavailable) responses to reject requests during overload.
Watermark-based (Bang-Bang) overload control algorithm:
 Enter overload state when queue length reaches high watermark (400 messages)
 Enter normal state when queue length drops below low watermark (300 messages)
 When in overload state a proxy rejects all incoming requests with 503 responses.
Edge proxies do not retry requests that are rejected.
Results
Provides little or no improvement compared to no overload control: congestion collapse!
 Note: performance can be improved by using other control algorithms.
[Figure: Stateful 503. Carried load (cps) vs. offered load (cps) up to 800 cps; AT&T, CU, and BL Sim2 goodput curves.]
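The watermark algorithm above can be sketched as a small hysteresis state machine (a sketch; the class and method names are illustrative):

```python
class BangBangControl:
    """Hysteresis overload control: reject everything with 503 while the
    queue is above the high watermark, until it falls back below the low
    watermark."""
    def __init__(self, high=400, low=300):
        self.high, self.low = high, low
        self.overloaded = False

    def accept(self, queue_length):
        if self.overloaded and queue_length < self.low:
            self.overloaded = False      # back to normal state
        elif not self.overloaded and queue_length >= self.high:
            self.overloaded = True       # enter overload state
        return not self.overloaded       # False -> reject with 503
```

The gap between the two watermarks keeps the proxy from oscillating on every message: once overloaded, it keeps rejecting until the queue has drained well below the trigger point.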
Simulation Results
Scenario 3: 503 (Service Unavailable) – Retry Requests
Assumptions
Same assumptions as in Scenario 2.
But: edge proxies retry all requests that have been rejected by one core proxy
at the second core proxy.
 Requests are rejected only if they fail at both core proxies.
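The edge-proxy retry behavior can be sketched as follows (illustrative only; proxies are modeled as accept/reject callables):

```python
def route_with_retry(request, core_proxies):
    """Try the request at each core proxy in turn; reject it only if
    every core proxy refuses (the Scenario 3 edge-proxy behavior)."""
    for proxy in core_proxies:
        if proxy(request):    # this core proxy accepted the request
            return True
    return False              # 503 from every core proxy: reject
```

Note that every failed attempt still costs the rejecting core proxy processing work, which is why retries reduce goodput under overload.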
Results
Retrying requests decreases goodput: the retries add load to the already overloaded proxies. Congestion collapse!
[Figure: Stateful 503 with retries. Carried load (cps) vs. offered load (cps) up to 800 cps; AT&T, CU, and BL Sim3 goodput curves.]
SIP Overload Control
Mechanisms and Algorithms
The IETF SIP overload design team has started to investigate solutions for more
effective SIP overload control mechanisms.
Current simulations of SIP overload control mechanisms are focusing on:
 Server-to-server communication.
 Feedback channel between core and edge proxy (hop-by-hop).
 Four different overload control mechanisms:
 on/off control,
 rate-based control,
 loss-based control,
 window-based control.
Simulation results are available for these four mechanisms and for different control algorithm proposals.
 Contributed by AT&T Labs, Bell-Labs/Alcatel-Lucent, Columbia University
 Initial simulation results.
SIP Overload Control
Hop-by-hop vs. end-to-end
SIP requests for the same source/destination pair can
travel along different paths, depending on
 provider policies, services invoked, forwarding rules,
request forking, load balancing, etc.
A SIP proxy cannot make assumptions about which
downstream proxies will be on the path of a SIP request.
Hop-by-hop overload control
 Server provides overload control feedback to its
direct upstream neighbor.
 No knowledge about routing policies of neighbors needed.
 Neighbor processes feedback and rejects/retries
excess requests if needed.
End-to-end overload control
 Feedback from all servers on all paths between a
source and a destination needs to be considered.
 A server needs to track the load of all servers a request
may traverse on its way to the target.
 Complex and challenging since requests for the same
destination may travel along very different paths.
 May be applicable in limited, tightly controlled
environments.
SIP Overload Control
AT&T Labs Simulation Results
[Figure: Stateless 503. Carried load (cps) vs. offered load (cps) up to 1000 cps; theoretical goodput plus AT&T Labs Sim3 goodput with no control, the Retry-After algorithm, and the rate-based algorithm.]
 Every sampling interval, Core Proxies estimate optimal control parameters such that the queueing delay stays within a pre-defined target. Core Proxies rely solely on the measured offered load and the measured internal queueing delay (no Edge Proxy to Core Proxy signaling).
 On/off control builds on the existing SIP 503 Retry-After capability: each control interval, Core Proxies estimate the optimal Retry-After timer value and share it with Edge Proxies in the 503 responses.
 In rate-based control, each control interval Core Proxies estimate the optimal controlled load and the number of active sources, then share these with Edge Proxies either via dedicated signaling or piggybacked in response messages. Edge Proxies execute a percent-blocking throttling algorithm.
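The percent-blocking throttle run by the Edge Proxies can be sketched as follows (a sketch; the function name and the way the allowed rate is delivered are assumptions, not the AT&T implementation):

```python
import random

def percent_blocking(requests, allowed_rate, offered_rate, rng=None):
    """Forward each request with probability allowed/offered and reject
    the rest. allowed_rate stands in for the controlled load the Core
    Proxy signaled it can accept."""
    rng = rng or random.Random(7)    # fixed seed for reproducibility
    p_forward = min(1.0, allowed_rate / max(offered_rate, 1e-9))
    forwarded, rejected = [], []
    for req in requests:
        (forwarded if rng.random() < p_forward else rejected).append(req)
    return forwarded, rejected
```

With an offered load of 1000 cps and an allowed rate of 250 cps, roughly a quarter of the requests pass; when the allowed rate exceeds the offered load, nothing is blocked.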
SIP Overload Control
Bell-Labs/Alcatel-Lucent Simulation Results
Loss-based overload control.
Feedback loop between receiver
(core proxy) and sender (edge proxy).
 Feedback is carried in SIP responses.
Receiver-driven control algorithm:
 Estimates the current processor
utilization.
 Compares it to the target processor
utilization.
 Multiplicatively increases or decreases
the loss rate to reach the target utilization.
Sender adjusts the load it sends to the receiver based on the feedback received,
using percent blocking.
Overload control algorithms:
 Occupancy algorithm (OCC), Acceptance Rate/Occupancy algorithm (ARO)
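The multiplicative loss-rate adjustment can be sketched as follows (a sketch in the spirit of the OCC/ARO approach, not the published algorithms; the target and step constants are illustrative):

```python
def update_loss_rate(loss_rate, utilization, target=0.9, step=1.2):
    """Receiver-side update: shed more load (raise the loss rate) when
    measured processor utilization is above target, admit more when
    it is below."""
    accept = 1.0 - loss_rate                  # fraction currently accepted
    if utilization > target:
        accept /= step                        # multiplicative decrease
    else:
        accept = min(1.0, accept * step)      # multiplicative increase
    return min(1.0, max(0.0, 1.0 - accept))
```

The receiver runs this each measurement interval and piggybacks the resulting loss rate on SIP responses; the sender then applies it via percent blocking.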
SIP Overload Control
Columbia University Simulation Results
 Window-based overload control.
 The SIP session is the control unit; the window is dynamically
estimated from processed SIP messages.
 Receiver (Core Proxy in the scenario)
  decreases the window on session arrival,
  dynamically computes the available window,
  splits it and feeds it back to the active senders.
 Feedback is piggybacked in responses/requests.
 Sender (Edge Proxy in the scenario)
  forwards a session only if a window slot is available.
[Figure: CU Overload Control Results. Goodput (cps) vs. load (cps) up to 1200 cps; Sim1–Sim3, CU-WIN-I/II/III, and theoretical curves.]
 Three different window adaptation algorithms work equally well in steady state
 CU-WIN-I: keep current estimated sessions below total allowed sessions given target delay
 CU-WIN-II: open up the window after a new session is processed
 CU-WIN-III: discrete version of CU-WIN-I, divided into control intervals
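The sender side of the window scheme can be sketched as follows (an illustrative sketch; the slot accounting and method names are assumptions, and real feedback would arrive piggybacked in SIP responses):

```python
class WindowSender:
    """Edge-proxy sender: forward a new session only if a window slot
    is free; the window size comes from receiver feedback."""
    def __init__(self, window=10):
        self.window = window      # slots granted by the core proxy
        self.in_flight = 0        # sessions forwarded, not yet completed

    def try_forward(self):
        if self.in_flight < self.window:
            self.in_flight += 1
            return True           # forward the session
        return False              # no slot: reject or queue the session

    def on_feedback(self, new_window):
        self.window = new_window  # receiver recomputed this sender's share

    def on_session_done(self):
        self.in_flight -= 1       # slot freed
```

Unlike rate- or loss-based control, the window is self-clocking: if the core proxy slows down, completions (and thus free slots) arrive more slowly, automatically throttling the sender.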
Conclusion
Current Status
The SIP protocol is vulnerable to congestion collapse under overload.
 The 503 response mechanism is insufficient to avoid congestion collapse.
 Simulation results confirm problems.
The IETF design team is investigating different mechanisms and algorithms for
SIP overload control.
 Initial simulation results are available.
 Results show stable server-to-server goodput under overload.
Open Issues
 Overload control with uneven distribution of load and fluctuating load
conditions.
 Dynamic arrival and departure of servers.
 Fairness between upstream neighbors.
Comments and suggestions are very welcome!