DREAM: Dynamic Resource Allocation for Software-defined Measurement (SIGCOMM’14) 1 Masoud Moshref, Minlan Yu, Ramesh Govindan, Amin Vahdat Measurement is Crucial for Network Management Tenant: Netflix Expedia Reddit Anomaly Failure Detection Detection Traffic Traffic Engineering Engineering Management: Accounting Accounting Measurement: Heavy Hitter detection Heavy Hitter Heavy Hitterdetection detection Change detection Change Changedetection detection Network: Motivation Motivation System Algorithm Evaluation 2 High Level Contribution: Flexible Measurement Management: Users dynamically instantiate complex measurements on network state Measurement: DREAM supports the largest number of measurement tasks while maintaining measurement accuracy, by dynamically leveraging tradeoffs between switch resource consumption and measurement accuracy Network: We leverage unmodified hardware and existing switch interfaces Motivation Motivation System Algorithm Evaluation 3 Prior Work: Software Defined Measurement (SDM) Controller Heavy Hitter detection Change detection 3 Update 1 Install rules rules 2 Fetch counters #Bytes=1M #Bytes=5M Source IP: 10.0.1.128/30 10.0.1.130/31 Source IP: 55.3.4.34/31 55.3.4.32/30 Motivation Motivation System Algorithm Evaluation 4 Our Focus: Measurement Using TCAMs Existing OpenFlow switches use TCAMs which permit counting traffic for a prefix Focus on TCAMs enables immediate deployability Prior work has explored other primitives such as hash-based counters Motivation Motivation System Algorithm Evaluation 5 Challenge: Limited TCAM Memory 31 26 Find source IPs sending > 10Mbps 5 13 13 2 00 01 10 Controller 3 11 Heavy Hitter detection 1 Install rules 2 Fetch counters 00 13MB 01 13MB 10 3MB 11 2MB Problem: Requires too many TCAMs 64K IPs to monitor a /16 prefix >> ~4K TCAMs at switches Motivation Motivation System Algorithm Evaluation 6 Reducing TCAM Usage Monitor internal nodes to reduce TCAM usage 31 26 Monitoring 1* is enough because a node with size 5 cannot have leaves >10 5 13 13 2 3 00 01 10 11 31 26 5 13 13 2 3 00 01 10 11 Motivation Motivation System Algorithm Evaluation 7 Challenge: Loss of Accuracy Fixed configuration misses heavy hitters as traffic changes 31 26 13 5 13 2 3 Missed heavy hitters 39 9 4 Motivation Motivation 30 5 15 System 15 Algorithm Evaluation 8 Dynamic Configuration to Avoid Loss of Accuracy 39 9 4 Find leaves >10Mbps using 3 TCAMs 30 5 15 Merge 15 Divide Monitor children to detect HHs but using 2 TCAMs 39 9 30 Monitor parent to save a TCAM 4 5 Motivation Motivation 15 15 System Algorithm Evaluation 9 Reducing TCAM Usage: Temporal Multiplexing Required TCAM changes over time # TCAMs Required Task 1 Task 2 Time Motivation Motivation System Algorithm Evaluation 10 Reducing TCAM Usage: Spatial Multiplexing Required TCAMs varies across switches # TCAMs Required Switch A Switch B Time Only needs more TCAMs at switch A Motivation Motivation System Algorithm Evaluation 11 Reducing TCAM Usage: Diminishing Returns 1 7% Accuracy 0.8 Accuracy Bound 12% 0.6 0.4 0.2 0 256 512 1024 TCAMs 2048 Can accept an accuracy bound <100% to save TCAMs Motivation Motivation System Algorithm Evaluation 12 Key Insight Leverage spatial and temporal multiplexing and diminishing returns to dynamically adapt the configuration and allocation of TCAM entries per task to achieve sufficient accuracy Motivation Motivation System Algorithm Evaluation 13 DREAM Contributions System Supports concurrent instances of three task types: Heavy Hitter, Hierarchical HH and Change Detection Algorithm Dynamically adapts tasks TCAM allocations and configuration over time and across switches, while maintaining sufficient accuracy Evaluation Significantly outperforms fixed allocation and scales well to larger networks Motivation Motivation System Algorithm Evaluation 14 Anomaly detection Traffic engineering Network provisioning Accounting Network visualization DDoS detection Heavy Hitter detection Hierarchical HH detection DREAM Change detection Network Measurement Management DREAM Tasks Motivation Architecture System Algorithm Evaluation 15 DREAM Workflow • Task type • Task parameters • Task filter • Accuracy bound Instantiate task Report Task Instance 1 TCAM Allocation and Configuration DREAM SDN Controller Motivation Task Instance n Configure counters Architecture System Algorithm Fetch counters Evaluation 16 Algorithmic Challenges Dynamically adapts tasks TCAM allocations and configuration configuration configuration over time and across switches, switches while maintaining sufficient accuracy accuracy How to allocate TCAMs for sufficient accuracy? Diminishing Return Which switches to allocate? Temporal Multiplexing How to adapt TCAM configuration on multiple switches? Spatial Multiplexing Motivation System Algorithm Evaluation 17 Dynamic TCAM Allocation Allocate TCAM Why iterative approach? Estimate accuracy Measure Enough for High accuracy Satisfied We cannot know TCAMs the curve every trafficand task instance Thus we cannot formulate a one-shot optimization Not enough TCAMs Low accuracy Unsatisfied 1 Why estimating accuracy? Accuracy 0.8 We don’t have ground-truth Thus we must estimate accuracy 0.6 0.4 0.2 0 256 512 Motivation System 1024 TCAMs 2048 Algorithm Evaluation 18 Estimate Accuracy: Heavy Hitter Detection Precision = True detected HH Detected HHs Is 1 because any detected HH is a true HH Recall = True detected HH True detected + Missed HHs Estimate missed HHs Motivation System Algorithm Evaluation 19 Estimate Recall for Heavy Hitter Detection True detected HH True detected + Missed HHs Recall = Find an upper bound of missed HHs using size and level of internal nodes Threshold=10Mbps With size 26: missed <=2 HHs At level 2: missed <=2 HH 76 26 50 12 5 Motivation 14 7 12 System 15 2 0 Algorithm 35 15 20 15 Evaluation 20 Allocate TCAM Goal: maintain high task satisfaction Fraction of task’s lifetime with sufficient accuracy How many TCAMs to exchange? Large Oscillations Accuracy Accuracy Small Slow convergence Time Motivation System Algorithm Time Evaluation 21 Avoid Overloading Not enough TCAMs to satisfy all tasks Solutions Reject new tasks Drop existing tasks Motivation System Algorithm Evaluation 22 Algorithmic Challenges Dynamically adapts tasks TCAM allocations and configuration over time and across switches, while maintaining sufficient accuracy How to allocate TCAMs for sufficient accuracy? Diminishing Returns Which switches to allocate? Temporal Multiplexing How to adapt TCAM configuration on multiple switches? Spatial Multiplexing Motivation System Algorithm Evaluation 23 Allocate TCAM: Multiple Switches A task can have traffic from multiple switches Controller Heavy Hitter detection 20 HHs A B 30 HHs 10 HHs Local accuracy is important Global accuracy is important If a task is globally unsatisfied, increasing B’s TCAMs is expensive Use both local and global accuracy If a task is globally satisfied, no need to increase A’s TCAMs (diminishing returns) Motivation System Algorithm Evaluation 24 DREAM Modularity Task Independent TCAM Configuration: Divide & Merge Task Dependent Accuracy Estimation TCAM Allocation DREAM Motivation System Algorithm Evaluation 25 Evaluation: Accuracy and Overhead Accuracy Satisfaction of a task: Fraction of task’s lifetime with sufficient accuracy % of rejected/dropped tasks Overhead How fast is the DREAM control loop? Motivation System Algorithm Evaluation 26 Evaluation: Alternatives Equal: divide TCAMs equally at each switch, no reject Fixed: fixed fraction of TCAMs, reject extra tasks Motivation System Algorithm Evaluation 27 Evaluation Setting Prototype on 8 Open vSwitches • 256 tasks (HH, HHH, CD, combination) • 5 min tasks arriving in 20 mins • Accuracy bound=80% • 5 hours CAIDA trace • Validate simulator using prototype Large scale simulation (4096 tasks on 32 switches) • accuracy bounds • task loads (arrival rate, duration, switch size) • tasks (task types, task parameters e.g., threshold) • # switches per tasks Motivation System Algorithm Evaluation 28 Prototype Results: Average Satisfaction DREAM: High satisfaction of tasks at the expense of more rejection for small switches 100 0.8 0.6 0.4 Dream Equal Fixed 0.2 0 DREAM-reject Fixed-reject DREAM-drop 80 512 1024 2048 4096 Switchincapacity # TCAMs Switch % of tasks Satisfaction Satisfaction Average 1 60 40 20 0 512 1024 2048 Switch in capacity # TCAMs Switch 4096 Fixed: High rejection as over-provisions for small tasks Motivation System Algorithm Evaluation 29 Prototype Results: 95th Percentile Satisfaction Satisfaction Satisfaction 95th Percentile DREAM: High 95th percentile satisfaction 1 0.8 0.6 Dream Equal Fixed 0.4 0.2 0 512 1024 2048 4096 #Switch TCAMscapacity in Switch Equal and Fixed only keep small tasks satisfied Motivation System Algorithm Evaluation 30 Conclusion Measurement is crucial for SDN management in a resource-constrained environment Dynamic TCAM allocation across measurement tasks • Diminishing returns in accuracy • Spatial and temporal multiplexing Future work • More TCAM-based measurement tasks (quintiles for load balancing, entropy detection) • Hash-based measurements DREAM is available at github.com/USC-NSL/DREAM 31