Load balancing in IP protocols
Author: Sunesh Kumra
Supervisor: Prof Raimo Kantola
Instructor: Michael Zhidovinov
Work was carried out: Nokia Networks, Helsinki
Thesis number: 1023 – 2004
Presentation Date: Aug 31, 2004
Load Balancing in IP Protocols.PPT / 14-Aug-2004 / Sunesh Kumra

Table of Contents
• Introduction
• Research Problem
• Stateless Load Balancer
• Stateful Load Balancer
• Dynamic Addition and Removal of Nodes
• Capacity Based Load Balancing
• Overload Control
• Conclusion

Introduction – Context
• The diagram below shows a Network Element that is built with many loosely coupled server nodes. The load balancer is responsible for distributing traffic to these server nodes.
[Figure: Network Element built with loosely coupled CPUs. Labels: IP traffic, switches, load balancer, database, server nodes]

Research Problem – Requirements
• The most important functional requirement of the load balancer is to ensure that all the traffic pertaining to one call goes to the same CPS process.
• Performance: The load balancer (LB) is the single point of entry into the cluster (NE) and hence has to be fast enough not to become the bottleneck of the cluster.
• Scalability: It must be possible to add more nodes behind the LB at run time. A load balancer should be able to scale both statically and dynamically.
• Awareness of the load at the nodes to which the traffic is being routed. Ideally, the load balancer must be adaptive.
• The LB should be able to handle failures of internal nodes. The aim is not to make sure that the LB can handle all kinds of faults, but it should be able to handle basic fault situations, such as the case when an internal node crashes.
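The first requirement above (all traffic of one call reaches the same process) can be illustrated with a minimal sketch. It assumes that each message carries a call ID string and that a stable hash of that ID, taken modulo the number of nodes, selects the node. The function name `route`, the node names and the use of CRC32 are illustrative assumptions, not taken from the thesis.

```python
import zlib


def route(call_id: str, nodes: list[str]) -> str:
    """Pick a node for a message from a stable hash of its call ID."""
    # zlib.crc32 gives the same value on every run, unlike Python's
    # built-in hash() for strings, which is randomized per process.
    bucket = zlib.crc32(call_id.encode("utf-8")) % len(nodes)
    return nodes[bucket]


# Hypothetical node names and call ID, for illustration only.
nodes = ["node-1", "node-2", "node-3"]
first = route("call-42@example.invalid", nodes)
second = route("call-42@example.invalid", nodes)
print(first == second)  # both messages of the call reach the same node
```

Note that changing the length of `nodes` remaps most call IDs, which is the problem the Dynamic Addition and Removal of Nodes slides address.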
Introduction – types of load balancers
• Network-based load balancing: This type of load balancing is provided by IP routers and DNS (Domain Name System) servers that service a pool of host machines. For example, when a client resolves a hostname, the DNS can assign a different IP address to each request dynamically, based on current load conditions.
• Network-layer based load balancing: The load balancer may balance the traffic based on the source IP address and/or port of the incoming IP packet. This type of load balancing does not take into account the contents of the packet, so it is not very flexible.
• Transport-layer based load balancing: The load balancer may choose to route the entire connection to a particular server. This type of load balancing is very useful if the connections are short-lived and are established frequently.
• Application-layer/middleware based load balancing: This type of load balancing is performed in the application layer, often on a per-session or per-request basis.

Introduction – classes of load balancers
• Non-adaptive load balancer: A load balancer can use non-adaptive policies, such as a simple round-robin, hash-based or randomization algorithm.
• Adaptive load balancer: A load balancer can use adaptive policies that utilize run-time information, such as the amount of CPU load on a node, to determine the server to route the request to.
• Load balancers and load distributors are not the same thing. Strictly speaking, non-adaptive load balancers are load distributors.

Research Problem – categories from LB perspective
• UDP based protocols
• TCP based protocols where each session/call lasts for a very long time
• TCP based protocols where each session/call is short-lived, or a mix of short and medium duration

Research Problem – criteria of load balancing – stateful applications
• In case the applications are stateful, the load balancer has to make sure that all the messages pertaining to one call are routed to the same node (this is the most usual case). Notice in the figure below that all messages from the same call (denoted by the same color) end up at the same node.
[Figure: Ext 1, Ext 2, LB, Node 1, Node 2, ..., Node n]

Research Problem – criteria of load balancing – stateless applications
• In case the applications are stateless, the load balancer may route the incoming message to any node. It is the responsibility of the application to replicate the call state. We can see in the figure below that the messages from one call (denoted by the same color) end up at different nodes.
[Figure: Ext 1, Ext 2, LB, Node 1, Node 2, ..., Node n, Backend]

Stateless Load Balancer – LB via NAT
• The advantage of load balancing via NAT is that the nodes can run any operating system that supports the TCP/IP protocols, internal nodes can use private IP addresses, and only one externally visible IP address is needed for the load balancer.
[Figure: Ext 1, Ext 2, LB, Node 1, Node 2, ..., Node n]
• The disadvantage is that the scalability of the virtual server via NAT is limited, as all the traffic passes through the load balancer.

Stateless Load Balancer – LB using IP Tunneling
• In load balancing using IP tunneling, the load balancer schedules requests to the different nodes, and the nodes return replies directly to the external nodes.
[Figure: Ext 1, Ext 2, LB, Node 1, Node 2, ..., Node n. Packet labels: srcIP, VIP, payload; steps: encapsulation, internal network, decapsulation]
• The original IP packet is encapsulated in another IP packet and directed to a chosen internal node. At the internal node, the packet is decapsulated and the original packet is retrieved. The original packet carries the source IP address and port where the packet originated, which are used to establish a new connection back to the external node.

Stateless Load Balancer – LB using Direct Routing
• Compared to load balancing using IP tunneling, this approach has no tunneling overhead (in fact, that overhead is minimal in most situations), but it requires that one of the load balancer's interfaces and the internal nodes' interfaces be in the same physical segment.
[Figure: direct routing. Packet labels: src MAC, VIP, payload; steps: direct routing, internal network]

Stateful Load Balancer – properties 1/2
• For every call, instead of calculating a hash we use the Round-Robin algorithm, ensuring an even load distribution.
• For every message we have to read from or write to the Call State Machine. Reads from the Call State Machine would be at least twice as frequent as writes. The Call State Machine may soon become the bottleneck of the load balancer. Call State Machines also grow quickly and take up a lot of memory. For example, in the worst case, if the load balancer is serving 20 000 transactions/second and each transaction has a timeout of 3 minutes (180 seconds), then it has to maintain 180 x 20 000 = 3.6 million states at any time. If every state takes 20 bytes, then about 68 MB of memory (3.6 million x 20 bytes = 72 million bytes) is required just for maintaining call states.
• The graceful addition and removal of nodes is also very simple to implement in stateful load balancers.
This is because adding a few nodes to the cluster changes nothing in the Call State Machine for the ongoing calls.

Stateful Load Balancer – properties 2/2
• The stateful load balancer does not scale as well as the stateless load balancer, as it has to access a common repository, the Call State Machine, for reading and writing states.
• It is difficult to implement a redundancy model such as hot-active standby in stateful load balancers. The amount of data to be replicated to the standby node depends on the number of calls served by the load balancer. Without redundancy, the load balancer becomes the single and biggest point of failure for the cluster. To provide a fault-tolerant load balancer, the call states need to be replicated to a standby unit; the larger the Routing Table, the more data there is to replicate. In the example that we considered, where every state took 20 bytes to store, we would need to replicate a table of size 68 MB, which is an overhead. To replicate the 20 000 new states per second to the standby unit we need a good internal replication mechanism, because 20 000 x 20 bytes = 400 000 bytes (about 390 kilobytes) of data would need to be transferred every second.

Dynamic Addition and Removal of Nodes – problem
• Typically the stateless load balancer uses a hash algorithm to route a message. In the following cluster the hash for a certain call ID yields node 1.
[Figure: Ext 1, Ext 2, LB, Node 1, Node 2, Node 3, Node 4]
• Now if a node is removed, the hash for the same call returns node 3.
[Figure: Ext 1, Ext 2, LB, Node 1, Node 2, Node 3]

Dynamic Removal of a node – 1/3
• At startup:
Hash Number       0 1 2 3 4 5 6 7 8 9
Service Node ID   1 1 1 2 2 2 3 3 3 1
Standby Node ID   2 2 2 3 3 3 1 1 1 2
New Node ID       -
• Node 2 sends a notification to the LB to stop sending new requests to it.
It also sends a list of its ongoing calls. The LB thus maintains a list of the active calls on the node that has to be taken out of service gracefully. The LB marks node 2 as a gray node, a node to which no new calls should be sent, as shown in the table above.

Dynamic Removal of a node – 2/3
Hash Number       0 1 2 3 4 5 6 7 8 9
Service Node ID   1 1 1 2 2 2 3 3 3 1
Standby Node ID   2 2 2 3 3 3 1 1 1 2
New Node ID       -
(The Service Node ID entries for node 2, hash numbers 3 to 5, are the gray ones.)
• When a request comes to the LB from the outside world and the routing-function generates 3, which has a gray Service Node ID corresponding to it, the LB checks whether the Call ID of the incoming request exists in the pending calls for the node. If yes, it sends the request to node 2; otherwise it sends it to node 3, the standby node for that hash number.
• When a response comes to the LB from the outside world and the routing-function generates 3, which has a gray Service Node ID corresponding to it, the LB checks whether the Call ID of the incoming response exists in the pending calls for the node. If yes, it sends the response to node 2; otherwise it sends it to node 3.

Dynamic Removal of a node – 3/3
• When all the ongoing sessions on node 2 are finished, node 2 sends an event to the load balancer, and the load balancer updates the routing table as shown in the table below.
Hash Number       0 1 2 3 4 5 6 7 8 9
Service Node ID   1 1 1 1 3 1 3 3 3 1
Standby Node ID   3 3 3 3 1 3 1 1 1 3
New Node ID       -

Capacity Based Load Balancing – 1/2
• In all the discussion above we assumed that the internal nodes had equal processing capacity. In reality this may not be the case. For example, in a cluster running Diameter, SIP and COPS applications, it could easily happen that some nodes run all three protocols, some nodes run just one dedicated protocol, and others run yet different combinations.
The message is that the load balancer cannot distribute traffic to the internal entities on the assumption that they have equal traffic-handling capacity.
• Assume that today the standard CPU speed is 1600 MHz, and that two years later, when we want to add more nodes (new hardware) to the cluster, the commonly available CPU speed is 2400 MHz. The traffic then cannot be distributed evenly amongst the internal nodes, because different nodes have different processing capacity. Hence the need for a capacity-based load balancer.
• Peer Capacity is the parameter of interest for the capacity-based load balancer. For example, if every node in a cluster typically has two processors of 1600 MHz each, then Peer Capacity may take values from 1 to 4. A value of 1 means that the Peer is designed to consume half of one processor, and a value of 4 means that the Peer should consume both processors fully.

Capacity Based Load Balancing – 2/2
• If a capacity-based load balancer is used, and the capacities of Peer 1, Peer 2, Peer 3 and Peer 4 are 1, 2, 3 and 4, then the HashTable is initialized as shown in the following table.
Hash Number       0 1 2 3 4 5 6 7 8 9
Service Node ID   1 2 2 3 3 3 4 4 4 4
Standby Node ID   2 3 3 4 4 4 1 1 1 1
New Node ID       -
• So the capacity-based load balancer spreads the traffic in proportion to capacity merely by changing how the HashTable is populated; nothing else is changed.

Overload Control
• The arguments for and against doing overload control entirely at the load balancer are given below:
• Advantages:
  • The load balancer is the front door of the cluster. The point of entry is a logical place to make sure that excess traffic does not enter the cluster.
  • No proprietary interface is required between the Peers and the load balancer for receiving feedback from the nodes.
• Disadvantages:
  • The processing logic at the load balancer increases, which would lower its performance.
  • The load balancer would have to keep track of the load at the internal nodes, thereby bringing state into it.
  • It is not possible to configure the load balancer to use the overload metrics provided by the nodes.
  • It is not possible for the load balancer to detect the load at the internal nodes accurately. For example, suppose an internal node is shared such that 20% of it is dedicated to COPS, 30% to Diameter and 50% to SIP. If the load balancer is balancing traffic for, say, Diameter and measuring the response time from the Peer to find out how loaded it is, then the Diameter Peer might start consuming CPU allocated to the other protocols. There is no way for the load balancer to know this.

Results and Conclusion – 1/2
• As IP Telephony becomes more popular and Call Processing Servers become more distributed, the demand for greater scalability and dependability is increasing. Distributed system performance and dependability can degrade significantly when servers become overloaded by client requests. To alleviate such bottlenecks, the load balancer must implement a congestion control algorithm. It should also be possible for the operator or service provider to add extra hardware to the system without interrupting the ongoing traffic.
• This paper lists four classes of load balancers for IP traffic: Network-Based, Network-Layer, Transport-Layer and Application-Layer based load balancers. All load balancers should fall into one of these four categories.
• Performance and scalability are the most important requirements for any load balancer. However, providing congestion control and the ability to add or remove servers at run time are very important functionalities as well.
A load balancer that can adapt to changing load on the servers or to changing topology is called an adaptive load balancer. In the absence of the intelligence to adapt to changing conditions, a load balancer should rather be called a load distributor.

Results and Conclusion – 2/2
• While designing a load balancer, care should be taken to keep its functionality as simple as possible. It is very important to have clear requirements before designing a load balancer, because a few minor requirements can change the way you want to design it. For example, if a load balancer must serve multiple clients that have short-lived connections, then a transport-layer or network-layer load balancer may be a suitable choice. However, if a load balancer must serve some clients that have very long-lived connections, then an application-layer load balancer may be a suitable choice. So the approach to a load balancing solution can vary with every small requirement change.
• A stateless load balancer has been argued to be a better choice than a stateful load balancer. However, a stateful load balancer is easier to design and can provide more flexibility, such as ease of removing or adding a server and congestion control.
• The traffic of any protocol should be distributed without modifying or extending the protocol itself. Even if interoperability is not an aim for a protocol, a solution that involves no modification to the existing protocol should still be preferred.
• Before deciding on a load balancer policy, all the alternatives should be considered: a stateful or stateless load balancer at Layer 3, 4 or 7. The load balancer can further be adaptive or non-adaptive.
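The hash-table routing from the Capacity Based Load Balancing and Dynamic Removal slides can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the thesis implementation: the class and method names are hypothetical, CRC32 stands in for the unspecified routing-function, the standby node is assumed to be the next node in configuration order, and only one node is drained at a time. When draining finishes, the sketch simply promotes each bucket's standby node, which is a simplification; the table on the 3/3 slide redistributes the buckets differently.

```python
import zlib


class LoadBalancer:
    """Hash-table router with capacity weighting and graceful node removal."""

    def __init__(self, capacity: dict[int, int]):
        self.ring = list(capacity)  # node IDs in configuration order
        # A node with capacity k owns k buckets of the hash table.
        self.table = [
            {"service": node, "standby": self._next(node)}
            for node, weight in capacity.items()
            for _ in range(weight)
        ]
        self.gray = {}  # gray node ID -> call IDs still in progress on it

    def _next(self, node: int) -> int:
        """Return the node that follows `node` in the ring (assumed standby)."""
        return self.ring[(self.ring.index(node) + 1) % len(self.ring)]

    def bucket(self, call_id: str) -> int:
        """Stand-in routing-function: stable hash of the call ID."""
        return zlib.crc32(call_id.encode("utf-8")) % len(self.table)

    def start_removal(self, node: int, ongoing_calls) -> None:
        """Mark a node gray and remember the calls it is still serving."""
        self.gray[node] = set(ongoing_calls)

    def route(self, call_id: str) -> int:
        entry = self.table[self.bucket(call_id)]
        node = entry["service"]
        pending = self.gray.get(node)
        # New calls that hash to a gray node go to the standby node;
        # calls already pending on the gray node stay there.
        if pending is not None and call_id not in pending:
            return entry["standby"]
        return node

    def finish_removal(self, node: int) -> None:
        """Drop a drained node and promote the standby of each of its buckets."""
        self.gray.pop(node, None)
        self.ring.remove(node)
        for entry in self.table:
            if entry["service"] == node:
                entry["service"] = entry["standby"]
            entry["standby"] = self._next(entry["service"])


lb = LoadBalancer({1: 1, 2: 2, 3: 3, 4: 4})
print([e["service"] for e in lb.table])  # [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
```

With Peer capacities 1, 2, 3 and 4, the constructor reproduces the Service Node and Standby Node rows of the capacity-based table; calling `start_removal(2, calls)` and later `finish_removal(2)` walks through the gray-node procedure.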
Thank You