SCREAM: Sketch Resource Allocation for Software-defined Measurement (CoNEXT’15)
Masoud Moshref, Minlan Yu, Ramesh Govindan, Amin Vahdat

Measurement is Crucial for Network Management
Network management for multiple tenants: traffic engineering, anomaly detection, DDoS detection, accounting.
Need fine-grained visibility of network traffic.
Measurement tasks:
• Heavy hitter detection (HH)
• Hierarchical heavy hitter detection (HHH)
• Super source detection (SSD)
• Change detection

Software Defined Measurement
[Figure: a controller running DREAM [SIGCOMM’14] / SCREAM [CoNEXT’15] hosts Task 1 and Task 2; it configures Switch A and Switch B, each holding Task 1 counters and Task 2 counters, and collects those counters.]

Our Focus: Sketch-based Measurement
Sketches are summaries of streaming data that approximately answer specific queries, e.g., a bitmap for counting unique items.
OpenFlow counters (DREAM [SIGCOMM’14]) vs. sketches (SCREAM [CoNEXT’15]):
• Memory: expensive, power-hungry TCAM vs. cheaper SRAM
• Counters: volume counters vs. volume and connection counters
• Flows: selected prefixes vs. all traffic, all the time
Sketches use cheaper memory and are more expressive.

Sketch Example: Count-Min Sketch
At packet arrival (IP, 1 KB): hash functions h1(IP), h2(IP), h3(IP) select one counter in each of the d rows, and each selected counter is incremented: 2+1=3, 4+1=5, 1+1=2.
At query: What is the traffic size of IP?
Answer: take the row with the fewest collisions, i.e., the smallest counter: min(3, 5, 2) = 2.
Resource-accuracy trade-off: provable error bound given traffic properties (e.g., skew).

Challenges: Limited Counters for Many Tasks
Limited shared resources:
• SRAM capacity (e.g., 128 MB)
• Shared with other functions (e.g., routing)
Guaranteeing accuracy takes too many resources: 1 MB–32 MB per task
• So fewer than 4–128 tasks fit in SRAM (128 MB / 32 MB = 4; 128 MB / 1 MB = 128)
Many task instances:
• 3 types (heavy hitter, hierarchical heavy hitter, super source)
• Different flow aggregates (rack, app, src/dst/port)
• 1000s of tenants

Goal: Many Accurate Sketch-based Measurements
Users dynamically instantiate a variety of measurement tasks.
SCREAM supports the largest number of measurement tasks while maintaining measurement accuracy.

Approach: Dynamic Resource Allocation
The resource-accuracy trade-off depends on traffic: Count-Min has a provable error bound given traffic properties, e.g., the skew of traffic from each IP.
[Figure: required memory versus skew]
Provisioning for the worst case uses more than 10x the counters needed on average.
→ Dynamic allocation for current traffic.

Opportunity: Temporal Multiplexing
Memory requirement varies over time.
[Figure: required memory of Task 1 and Task 2 over time]
→ Multiplex memory among tasks over time.

Opportunity: Spatial Multiplexing
Memory requirement varies across switches.
[Figure: required memory of Task 1 and Task 2 at Switch A and Switch B]
→ Multiplex memory among tasks across switches.

Key Insight
Leverage spatial and temporal multiplexing and dynamically allocate switch memory per task to achieve sufficient accuracy for many tasks.
• DREAM has the same insight
• SCREAM applies it to sketches

SCREAM Contributions
1- Supports 3 sketch-based task types, heavy hitter (HH), hierarchical heavy hitter (HHH), and super source (SSD) tasks, used for:
• Anomaly detection
• Traffic engineering
• DDoS detection
2- Allocates memory among sketch-based task instances across switches while maintaining sufficient accuracy (SCREAM dynamic resource allocator)

SCREAM Iterative Workflow
A loop of three steps: Collect & report sends counters from many switches to Estimate accuracy, which passes each task's accuracy to Allocate resources, which sets each task's memory size at the switches; the loop then repeats.

SCREAM Iterative Workflow
• Estimate accuracy: Task 1's accuracy drops below 80%
• Allocate resources: give more memory to Task 1
[Figure: accuracy (precision, %) and allocated memory (KB) of Task 1 and Task 2 over time (s)]

SCREAM Iterative Workflow
• Collect & report: merge counters from switches
• Estimate accuracy: the skew of traffic for Task 2 changes, and Task 2's accuracy drops below 80%
• Allocate resources: give more memory to Task 2
[Figure: accuracy (precision, %) and allocated memory (KB) of Task 1 and Task 2 over time (s)]

SCREAM Challenges
• Collect & report: network-wide task implementation using sketches
• Estimate accuracy: accuracy estimation without the ground truth
• Allocate resources: fast & stable allocation in DREAM [SIGCOMM’14]

Challenge: Merge Sketches of Different Sizes
Network-wide task: heavy hitter (HH) detection, i.e., source IPs sending > 10 Mbps.
[Figure: a source sends 25 in total, 10 through Switch A and 15 through Switch B; Switch A holds a sketch with d rows of width w1, Switch B a sketch with d rows of width w2]

SCREAM Solution to Merge Sketches for HH Detection
The source's counters in the d = 3 rows are 10, 30, 70 at Switch A and 40, 50, 20 at Switch B.
• Previous work, min of sums: min(10+40, 30+50, 70+20) = min(50, 80, 90) = 50
• SCREAM, sum of mins: min(10, 30, 70) + min(40, 50, 20) = 10 + 20 = 30
Both over-approximate the true traffic (25), so the smaller estimate is more accurate; with the same number of rows, the sum of mins is never larger than the min of sums.

SCREAM Solutions
• Collect & report: network-wide task implementation using sketches
  • Merge sketches of different sizes for HH, HHH, SSD
  • SSD algorithm with higher and more stable accuracy
• Estimate accuracy: accuracy estimation without the ground truth
• Allocate resources: fast & stable allocation in DREAM [SIGCOMM’14]

Precision Estimation for Heavy Hitter Detection
Precision = (true detected HHs) / (detected HHs), where the number of true detected HHs is estimated as Sum(P[Detected HH is true]) over all detected HHs.
Insight: relate this probability to the error on the counters of detected HHs.
[Figure: a detected HH's estimated counter exceeds the threshold by Estimate − Threshold; if its error is at least that gap, its real value is below the threshold (false HH), otherwise it is a true HH]
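A minimal Python sketch of this precision estimate, combining the precision formula with a Markov bound on the counter error (Idea 1 in the next slides). It assumes, hypothetically, that the caller supplies the average error of each Count-Min row and that rows hash independently, so the chance that the reported minimum's error reaches the gap is bounded by the product of per-row Markov bounds. The Idea 2 refinement is not modeled, and all names are illustrative rather than SCREAM's code.

```python
from typing import Sequence


def prob_detected_hh_is_true(estimate: float, threshold: float,
                             avg_row_errors: Sequence[float]) -> float:
    """Lower bound on P[detected HH is true] = 1 - P[error >= estimate - threshold].

    Hypothetical model: the estimate is the minimum over Count-Min rows and rows
    hash independently, so P[every row's error >= gap] is at most the product of
    per-row Markov bounds avg_row_error / gap.
    """
    gap = estimate - threshold
    if gap <= 0:
        # Not above the threshold: the bound gives no guarantee.
        return 0.0
    p_error_reaches_gap = 1.0
    for avg_error in avg_row_errors:
        # Markov's inequality: P[error >= gap] <= E[error] / gap, capped at 1.
        p_error_reaches_gap *= min(1.0, avg_error / gap)
    return 1.0 - p_error_reaches_gap


def estimate_precision(estimates: Sequence[float], threshold: float,
                       avg_row_errors: Sequence[float]) -> float:
    """Precision ~= Sum(P[detected HH is true]) / number of detected HHs."""
    detected = [e for e in estimates if e > threshold]
    if not detected:
        return 1.0  # assumed convention: no detections, no false positives
    total = sum(prob_detected_hh_is_true(e, threshold, avg_row_errors) for e in detected)
    return total / len(detected)


# Two rows with an average error of 2 each (e.g., total volume / width), threshold 10.
print(estimate_precision([20, 12, 8], threshold=10, avg_row_errors=[2, 2]))
```

An HH whose estimate barely clears the threshold contributes almost nothing to the estimated precision, which is what lets such an estimator flag a task that needs more memory.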
P[Detected HH is true] = 1 − P[Error ≥ Estimate − Threshold]

Precision Estimation Step 1: Find a Bound on the Error
Idea 1: use the average error in Markov’s inequality to bound it: P[Error ≥ Estimate − Threshold] ≤ Average error / (Estimate − Threshold), so P[Detected HH is true] ≥ 1 − Average error / (Estimate − Threshold).

Precision Estimation Step 2: Improve the Bound
In a row of Count-Min:
• Average error = collisions with heavy items + collisions with small items
• The counter indices of detected HHs show the heavy collisions
Idea 2: apply Markov’s inequality only to the small items.

SCREAM Solutions
• Collect & report: network-wide task implementation using sketches
  • Merge sketches of different sizes for HH, HHH, SSD
  • SSD algorithm with higher and more stable accuracy
• Estimate accuracy: accuracy estimation without the ground truth
  • Precision estimators for HH, HHH and SSD tasks
• Allocate resources: fast & stable allocation in DREAM [SIGCOMM’14]

Evaluation
Metrics:
• Satisfaction of a task: fraction of the task’s lifetime with sufficient accuracy
• % of rejected tasks
Alternatives:
• OpenSketch: allocates for a bounded error under worst-case traffic at task instantiation (tested with different bounds)
• Oracle: knows the resources a task requires in each switch in advance

Evaluation Setting
Simulation of 8 switches:
• 256 task instances (HH, HHH, SSD, and combinations)
• Accuracy bound = 80%
• 5-minute tasks arriving within 20 minutes
• 2-hour CAIDA trace

SCREAM Provides High Accuracy for More Tasks
SCREAM: high satisfaction and low rejection.
[Figure: average satisfaction and rejected tasks (%) versus switch capacity (128–512 KB) for OpenSketch with different bounds (OS_10, OS_50, OS_90) and SCREAM]
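The two metrics in this comparison, defined on the Evaluation slide, can be computed as in the minimal Python sketch below. It assumes equal-length epochs with one accuracy value per epoch, and that average satisfaction is taken over admitted tasks only (the slides do not say); the function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def task_satisfaction(epoch_accuracies: Sequence[float], bound: float = 0.8) -> float:
    """Fraction of a task's epochs (its lifetime) whose accuracy meets the bound."""
    if not epoch_accuracies:
        return 0.0
    met = sum(1 for acc in epoch_accuracies if acc >= bound)
    return met / len(epoch_accuracies)


def summarize(tasks: Sequence[Optional[Sequence[float]]],
              bound: float = 0.8) -> Tuple[float, float]:
    """Return (average satisfaction in %, rejected tasks in %).

    Each entry holds one task's per-epoch accuracy, or None if it was rejected.
    Assumption: average satisfaction is taken over admitted tasks only.
    """
    admitted = [t for t in tasks if t is not None]
    rejected_pct = 100.0 * (len(tasks) - len(admitted)) / len(tasks)
    if not admitted:
        return 0.0, rejected_pct
    avg_sat = 100.0 * sum(task_satisfaction(t, bound) for t in admitted) / len(admitted)
    return avg_sat, rejected_pct


# Three admitted tasks and one rejected task, with the 80% accuracy bound.
print(summarize([[0.9, 0.85, 0.7, 0.95], [0.6, 0.9], None, [0.8, 0.8]]))  # → (75.0, 25.0)
```

Under these definitions, over-provisioning trades rejection for satisfaction and under-provisioning does the reverse, which is the tension the OpenSketch bounds illustrate.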
OpenSketch: a loose bound under-provisions, giving low satisfaction; a tight bound over-provisions, giving a high rejection rate.

SCREAM’s Performance Is Close to an Oracle
[Figure: average satisfaction and rejected tasks (%) versus switch capacity (128–512 KB) for Oracle and SCREAM]
SCREAM's performance is close to an oracle's; its satisfaction is a bit lower because:
• Iterative allocation takes time
• Accuracy estimation has error

Other Evaluations
• Changing traffic skew: SCREAM supports more accurate tasks than OpenSketch
• Accuracy estimation error: SCREAM's accuracy estimation has 5% error on average
• Other accuracy metrics: tasks in SCREAM have high recall (few false negatives)

Conclusion
Measurement is crucial for SDN management in a resource-constrained environment.
Practical sketch-based SDM through dynamic memory allocation:
• Implementing network-wide tasks using sketches
• Estimating accuracy for 3 types of tasks
SCREAM is available at github.com/USC-NSL/SCREAM

Thanks! Questions?
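As a companion to the Count-Min example and the min-of-sums versus sum-of-mins comparison earlier in the deck, here is a minimal Python sketch of a Count-Min sketch and the two merge rules. It assumes every switch uses the same number of rows d (widths may differ) and hashes with SHA-256 purely for illustration; the class and function names are hypothetical and are not SCREAM's code, which is at github.com/USC-NSL/SCREAM.

```python
import hashlib
from typing import List, Sequence


class CountMinSketch:
    """Count-Min sketch with `depth` rows of `width` counters (illustrative only)."""

    def __init__(self, depth: int, width: int) -> None:
        self.depth = depth
        self.width = width
        self.counters = [[0] * width for _ in range(depth)]

    def _index(self, row: int, key: str) -> int:
        # Per-row hash: SHA-256 of "row|key", reduced modulo this sketch's width.
        digest = hashlib.sha256(f"{row}|{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def update(self, key: str, size: int) -> None:
        # At packet arrival, add the packet size to one counter in every row.
        for row in range(self.depth):
            self.counters[row][self._index(row, key)] += size

    def row_counts(self, key: str) -> List[int]:
        # The counters `key` maps to, one per row; each over-counts by its collisions.
        return [self.counters[row][self._index(row, key)] for row in range(self.depth)]

    def query(self, key: str) -> int:
        # The row with the fewest collisions gives the tightest estimate.
        return min(self.row_counts(key))


def min_of_sums(per_switch_rows: Sequence[Sequence[int]]) -> int:
    # Add the key's counters row by row across switches (same depth assumed), then take the min.
    return min(sum(rows) for rows in zip(*per_switch_rows))


def sum_of_mins(per_switch_rows: Sequence[Sequence[int]]) -> int:
    # Take each switch's own Count-Min estimate, then add them up.
    return sum(min(rows) for rows in per_switch_rows)


# Counters from the merge example: Switch A = 10, 30, 70; Switch B = 40, 50, 20.
example = [[10, 30, 70], [40, 50, 20]]
print(min_of_sums(example), sum_of_mins(example))  # → 50 30

# Two switches whose sketches have the same depth but different widths.
switch_a, switch_b = CountMinSketch(depth=3, width=16), CountMinSketch(depth=3, width=8)
switch_a.update("10.0.0.1", 10)
switch_b.update("10.0.0.1", 15)
for i in range(50):
    switch_a.update(f"10.1.0.{i}", 1)
    switch_b.update(f"10.2.0.{i}", 1)
rows = [switch_a.row_counts("10.0.0.1"), switch_b.row_counts("10.0.0.1")]
print(sum_of_mins(rows), min_of_sums(rows))
```

Because counters only grow, each switch's minimum is at least that switch's share of the traffic, so the sum of mins never falls below the true total (25 here) and, row by row, never exceeds the min of sums.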