vCRIB: Virtual Cloud Rule Information Base
Masoud Moshref, Minlan Yu, Abhishek Sharma, Ramesh Govindan
HotCloud 2012


Introduction

Datacenters use rules to implement management policies:
• Access control
• Rate limiting
• Traffic measurement
• Traffic engineering

A rule is an action on a hypercube of flow space.
• Flow field examples: Src IP / Dst IP, Protocol, Src Port / Dst Port
• Action examples: Deny, Normal, Enqueue
• Example: R1 = Accept, SrcIP: 12.0.0.0/8, DstIP: 10.0.0.0/16
(Figure: rule R1 drawn as a rectangle in the Src IP x Dst IP plane)


Current Practice

Rules are saved on predefined, fixed machines.
• Machines have limited resources
• Multiple policies may compete for resources
• Datacenters have different resource constraints
(Figure: rules R0, R3, R4 pinned to fixed locations in an Agg1/Agg2, ToR1-ToR3 topology)


vCRIB Goal: Flexible Rule Placement

Find the best feasible rule placement based on resource constraints.
(Figure: the same topology, with rules R0, R3, R4 placed flexibly across switches and hypervisors hosting VMs A, B, C)


Future Datacenters will have many fine-grained rules

• Regulating VM-pair communication: access control (CloudPolice), bandwidth allocation (Seawall) (100K to 1M rules)
• Per-flow decisions: flow measurement for traffic engineering (MicroTE, Hedera) (10M to 100M rules)
• VLAN per server: traffic management (NetLord, SPAIN) (1M rules)


Where to place rules? Hypervisors vs. Switches

              Hypervisor            Switch
Performance   Software, slow        Hardware, fast
Flexibility   Complex rules         OpenFlow rules
Entry point   Close to VMs          External traffic, aggregate traffic
Resources     Limited CPU budget    Number of TCAM entries


Rule Location Trade-off (Resource vs. Bandwidth Usage)

Storing rules at the hypervisor incurs CPU processing overhead.
When saving the rules at the hypervisor would use up its whole CPU budget, move them to the ToR switch and forward traffic there instead.


Can we reduce Open vSwitch CPU usage?

A wildcard pattern is the set of ignored bits in a rule's mask.
• Example: R1 = Accept, SrcIP: 12.0.0.0/8, DstIP: 10.0.0.0/16 has the patterns 11111111************************ and 1111111111111111****************
• Many rules can share the same pattern, so reducing the number of distinct wildcard patterns lets Open vSwitch handle the same number of new flows with a lower CPU budget.
(Plot: number of rules vs. number of wildcard patterns and CPU usage, shown for CPU = 100% and CPU = 50%)
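To make the wildcard-pattern argument concrete, here is a minimal Python sketch (illustrative, not the authors' code): it derives a rule's per-field wildcard pattern from its prefix lengths and counts the distinct patterns in a rule set, the quantity the slide ties to Open vSwitch CPU usage. The rule encoding and the helper names (`field_pattern`, `rule_pattern`, `distinct_patterns`) are assumptions made for this example.

```python
# Minimal sketch (not from the paper): wildcard patterns of prefix-based rules.
# The slide's point: Open vSwitch CPU cost tracks the number of distinct
# wildcard patterns rather than the number of rules, so co-locating rules that
# share patterns keeps a hypervisor's CPU budget low.

def field_pattern(prefix_len, width=32):
    """Mask pattern of one IP field: fixed bits as '1', ignored bits as '*'."""
    return "1" * prefix_len + "*" * (width - prefix_len)

def rule_pattern(rule):
    """A rule's pattern, one entry per field (here only SrcIP and DstIP)."""
    return (field_pattern(rule["src_len"]), field_pattern(rule["dst_len"]))

# R1 = Accept, SrcIP 12.0.0.0/8, DstIP 10.0.0.0/16 (the slide's example)
r1 = {"action": "accept", "src_len": 8, "dst_len": 16}
print(rule_pattern(r1))
# ('11111111************************', '1111111111111111****************')

def distinct_patterns(rules):
    """Rules sharing prefix lengths share a pattern; the size of this set,
    not len(rules), is what drives the per-new-flow CPU cost."""
    return {rule_pattern(r) for r in rules}

more_rules = [r1,
              {"action": "deny", "src_len": 8, "dst_len": 16},
              {"action": "deny", "src_len": 24, "dst_len": 32}]
print(len(more_rules), len(distinct_patterns(more_rules)))  # 3 rules, 2 patterns
```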
Rule Location Trade-off (Resource vs. Bandwidth Usage)

If rule memory is limited in one switch, we can simply move the rule to another switch.
So vCRIB can trade off bandwidth within the switch fabric, in addition to trading off bandwidth between hypervisors and switches.


Our Approach: vCRIB, a Virtual Cloud Rule Information Base

A proactive rule placement abstraction layer:
• Flexible rule placement at both hypervisors and switches
• Allows operators to define fine-grained rules without worrying about placement
• Optimizes performance given the resource constraints
(Figure: operator rules R1-R4 and network state go into vCRIB, which places them across the Agg/ToR topology)


Challenges: Overlapping Rules

Rules overlap in flow space, which creates dependencies among them.
(Figure: overlapping rules R0-R4 in the Src IP x Dst IP plane, to be placed near ToR1)

vCRIB partitions the flow space to reduce the dependencies among overlapping rules.
Splitting rules that cover multiple partitions causes rule inflation.
(Figure: the same rules cut into partitions; rules spanning several partitions must be replicated)


vCRIB: Partitioning

• Recursively cut partitions to create a BSP tree
• Stop whenever a resource at a node is exhausted
• Smaller partitions are more flexible to place and match fewer communicating VMs
• Select a cut that balances the two partitions and creates the fewest new rules
(See the sketch after the placement slide below.)


Challenges: Placement Complexity

Placement is a Generalized Assignment Problem:
• Different partition sizes
• Different machine capacities
• Different traffic overhead for each partition location
Constraints: functionality and machine resources.
Goals: minimize traffic overhead, minimize delay, minimize the cost of bandwidth used vs. CPU saved.


vCRIB: Placement (Branch and Bound)

• Select the largest unassigned partition
• Place it on a switch or hypervisor that can handle its rules (functionality and resources) with the minimum traffic overhead
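A minimal sketch of the partitioning slide above, assuming a two-dimensional (SrcIP, DstIP) flow space with integer coordinates: recursively cut the space into a BSP tree, preferring cuts that balance the two halves and duplicate (split) the fewest rules, and stop once a partition fits a per-partition rule budget. The box encoding, the `budget` stop condition, and the function names are assumptions for illustration, not the paper's implementation.

```python
# Illustrative BSP-style partitioning over a 2-D flow space (src dim 0, dst dim 1).
# A rule and a box are both axis-aligned ranges: ((src_lo, src_hi), (dst_lo, dst_hi)).

def overlaps(rule, box):
    return all(rule[d][0] <= box[d][1] and box[d][0] <= rule[d][1] for d in (0, 1))

def clip(rule, box):
    return tuple((max(rule[d][0], box[d][0]), min(rule[d][1], box[d][1])) for d in (0, 1))

def split_box(box, dim, cut):
    """Cut a box along one dimension, between 'cut' and 'cut + 1'."""
    left, right = list(box), list(box)
    left[dim] = (box[dim][0], cut)
    right[dim] = (cut + 1, box[dim][1])
    return tuple(left), tuple(right)

def partition(rules, box, budget):
    """Return (box, rules-in-box) leaves of a BSP tree, each within 'budget' rules."""
    inside = [clip(r, box) for r in rules if overlaps(r, box)]
    if len(inside) <= budget:
        return [(box, inside)]
    best = None
    for dim in (0, 1):
        lo, hi = box[dim]
        # Candidate cuts at rule boundaries strictly inside the box.
        for cut in {r[dim][1] for r in inside if lo <= r[dim][1] < hi}:
            lbox, rbox = split_box(box, dim, cut)
            nl = sum(overlaps(r, lbox) for r in inside)
            nr = sum(overlaps(r, rbox) for r in inside)
            # Prefer balanced halves, then the fewest duplicated (split) rules.
            score = (abs(nl - nr), nl + nr - len(inside))
            if best is None or score < best[0]:
                best = (score, lbox, rbox)
    if best is None:  # no useful cut remains; keep the oversized partition
        return [(box, inside)]
    _, lbox, rbox = best
    return partition(inside, lbox, budget) + partition(inside, rbox, budget)

# Toy example: a 16x16 flow space, three overlapping rules, at most 2 rules per partition.
space = ((0, 15), (0, 15))
rules = [((0, 7), (0, 15)), ((4, 11), (2, 9)), ((8, 15), (0, 15))]
for leaf_box, leaf_rules in partition(rules, space, budget=2):
    print(leaf_box, len(leaf_rules))
```

The scoring here applies the slide's two criteria in order (balance first, fewest new rules second); the paper may weigh them differently.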
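The placement step can be sketched as a greedy loop matching the slide's description: take the largest unassigned partition and put it on the feasible machine that adds the least traffic overhead. The slide frames the full algorithm as branch and bound; the backtracking part is omitted here, and all parameter names (`size`, `capacity`, `feasible`, `overhead`) are placeholders for this illustration.

```python
# Greedy placement sketch (largest partition first, minimum-overhead feasible machine).

def place(partitions, machines, size, capacity, feasible, overhead):
    """
    partitions      iterable of partition ids
    machines        iterable of machine ids (hypervisors and switches)
    size(p)         rule count of partition p
    capacity[m]     remaining resource budget of machine m (mutated as we place)
    feasible(p, m)  True if machine m can implement p's rules (functionality)
    overhead(p, m)  extra traffic caused by putting p on m
    """
    assignment = {}
    for p in sorted(partitions, key=size, reverse=True):
        candidates = [m for m in machines
                      if feasible(p, m) and capacity[m] >= size(p)]
        if not candidates:
            raise RuntimeError(f"no feasible machine for partition {p}")
        best = min(candidates, key=lambda m: overhead(p, m))
        assignment[p] = best
        capacity[best] -= size(p)
    return assignment

# Toy usage with made-up numbers: the hypervisor adds no traffic but is small,
# so once it fills up the next partition spills to the ToR switch.
sizes = {"P1": 300, "P2": 120}
caps = {"hv1": 400, "tor1": 1000}
plan = place(["P1", "P2"], ["hv1", "tor1"],
             size=lambda p: sizes[p],
             capacity=caps,
             feasible=lambda p, m: True,
             overhead=lambda p, m: 0 if m == "hv1" else 50)
print(plan)  # {'P1': 'hv1', 'P2': 'tor1'}
```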
vCRIB Architecture

The vCRIB manager takes the operator's rules plus traffic and topology information, partitions the rules, and then places the partitions on hypervisors and switches.
(Figure: rules -> Partitioning -> partitions -> Placement -> rule placement across the Agg/ToR topology)


Evaluation: Goals

• Can the partitioning algorithm achieve small partitions?
• Can the placement algorithm leverage resource availability to decrease traffic overhead?

Configuration:
• 100 VMs per machine
• 10K flows (10 KB) per machine
• ClassBench rules
• 1K rule capacity per switch
(Figure: Agg1/Agg2, ToR1-ToR3 test topology)


Evaluation: Partitioning

• We change the hypervisor rule capacity to show the effect of different CPU budgets.
• The maximum partition size goes down as resources increase.
(Plot: maximum partition size vs. hypervisor rule capacity in K, with curves labeled 4K, 8K, 16K, and 32K)


Evaluation: Placement

• Rnd-4K vs. Agg-4K: in Agg, each machine's VM addresses are selected from a contiguous IP range.
• Traffic overhead decreases as resources increase.
• Aggregated addresses yield lower traffic overhead.
(Plot: network traffic in MB vs. hypervisor rule capacity in K for Rnd-4K and Agg-4K, with annotations "No traffic decrease" and "Replication")


Conclusion

vCRIB provides an abstraction layer for the placement of rules in datacenters.
It places the rules on both hypervisors and switches to achieve the best performance given the resource constraints.


Future Work

• Exploit performance models of hypervisors and switches
• An online algorithm that adjusts to traffic changes
• Replication in the partitioning and placement algorithms