SCREAM: Sketch Resource Allocation
for Software-defined Measurement
(CoNEXT’15)
Masoud Moshref, Minlan Yu,
Ramesh Govindan, Amin Vahdat
Measurement is Crucial for Network Management
Network Management on multiple tenants:
• Traffic Engineering
• Anomaly Detection
• DDoS detection
• Accounting
Need fine-grained visibility of network traffic
Measurement tasks:
• Heavy hitter detection (HH)
• Hierarchical heavy hitter detection (HHH)
• Super source detection (SSD)
• Change detection
2
Software Defined Measurement
DREAM [SIGCOMM’14] / SCREAM [CoNEXT’15]
[Figure: Task 1 and Task 2 run on a controller that configures the switches and collects their counters; Switch A and Switch B each hold Task 1 counters and Task 2 counters]
3
Our Focus: Sketch-based Measurement
Summaries of streaming data to approximately answer specific queries
E.g., Bitmap for counting unique items
            OpenFlow Counters               Sketches
            DREAM [SIGCOMM’14]              SCREAM [CoNEXT’15]
Memory      Expensive, power-hungry TCAM    Cheaper SRAM
Counters    Volume counters                 Volume and connection counters
Flows       Selected prefixes               All traffic all-the-time
Sketches use a cheaper memory and are more expressive
4
Sketch Example: Count-Min Sketch
At packet arrival: (IP, 1 Kbytes)
• h1(IP): 2+1=3
• h2(IP): 4+1=5
• h3(IP): 1+1=2
[Figure: a sketch with d rows, one hash function per row]
At query:
What is the traffic size of IP? = row with min collision = Min(3,5,2) = 2
Resource accuracy trade-off:
Provable error bound given traffic properties (e.g., skew)
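The update and query steps above can be sketched in a few lines of Python. This is a minimal illustration, not SCREAM's implementation: the class name `CountMinSketch`, the SHA-256 based row hashing, and the chosen depth and width are all assumptions made for the example.

```python
import hashlib


class CountMinSketch:
    """Toy Count-Min sketch: `depth` rows of `width` counters each."""

    def __init__(self, depth, width):
        self.depth = depth
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # Assumed hashing scheme: one hash per row, derived by salting
        # the key with the row number.
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def update(self, key, size=1):
        # At packet arrival: add the packet size to one counter per row.
        for r in range(self.depth):
            self.rows[r][self._index(r, key)] += size

    def query(self, key):
        # At query: collisions only inflate counters, so the smallest
        # counter across rows is the best estimate.
        return min(self.rows[r][self._index(r, key)] for r in range(self.depth))


sketch = CountMinSketch(depth=3, width=64)
sketch.update("10.0.0.1", 1)
sketch.update("10.0.0.1", 2)
sketch.update("10.0.0.2", 5)
estimate = sketch.query("10.0.0.1")  # at least 3, never less
```

The estimate can exceed the true size when items collide in every row, but it never falls below it; a wider sketch lowers the chance of collisions at the cost of memory.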
5
Challenges: Limited Counters for Many Tasks
Limited shared resources:
• SRAM capacity (e.g., 128 MB)
• Shared with other functions (e.g., routing)
Guaranteeing accuracy takes too much memory:
• 1 MB to 32 MB per task
• Fewer than 4 to 128 tasks fit in SRAM
Many task instances:
• 3 types (Heavy hitter, Hierarchical heavy hitter, Super source)
• Different flow aggregates (Rack, App, Src/Dst/Port)
• 1000s of tenants
6
Goal: Many Accurate Sketch-based Measurements
Users dynamically instantiate a variety of measurement tasks
SCREAM supports the largest number of measurement
tasks while maintaining measurement accuracy
7
Approach: Dynamic Resource Allocation
Resource accuracy trade-off depends on traffic
Count-Min: Provable error bound given traffic properties
Ex: Skew of traffic from each IP
Worst-case provisioning uses >10x more counters than the average case
[Figure: required memory vs. skew]
Dynamic allocation for current traffic
8
Opportunity: Temporal Multiplexing
Memory requirement varies over time
[Figure: required memory of Task 1 and Task 2 over time]
Multiplex memory among tasks over time
9
Opportunity: Spatial Multiplexing
Memory requirement varies across switches
[Figure: required memory of Task 1 and Task 2 on Switch A and Switch B]
Multiplex memory among tasks across switches
10
Key Insight
Leverage spatial and temporal multiplexing
and dynamically allocate switch memory per task
to achieve sufficient accuracy for many tasks
• DREAM has the same insight
• SCREAM applies it for sketches
11
SCREAM Contributions
1- Supports 3 sketch-based task types
• Anomaly detection
• Traffic engineering
• DDoS detection
2- Allocate memory among sketch-based task instances
across switches while maintaining sufficient accuracy
[Figure: Heavy hitter (HH) tasks, Hierarchical heavy hitter (HHH) tasks, and Super Source (SSD) tasks feed the SCREAM dynamic resource allocator, which produces the allocation]
12
SCREAM Iterative Workflow
Collect & report
Counters from many switches
Estimate accuracy
Accuracy
Allocate resources
Memory size
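One round of the allocate step can be illustrated with a toy heuristic. This is only a sketch under stated assumptions, not the DREAM/SCREAM allocation algorithm: the function name `reallocate`, the fixed `step` size, the `min_alloc` floor, and the rule of taking memory from the largest satisfied task are all hypothetical.

```python
def reallocate(allocations, accuracies, bound=0.8, step=8, min_alloc=8):
    """Move `step` KB from satisfied tasks to tasks below the accuracy bound.

    allocations: task name -> allocated memory in KB
    accuracies:  task name -> estimated accuracy in [0, 1]
    """
    new_alloc = dict(allocations)
    poor = [t for t in new_alloc if accuracies[t] < bound]
    for task in poor:
        # Donors are satisfied tasks that stay above the floor after giving.
        donors = [
            t for t in new_alloc
            if accuracies[t] >= bound and new_alloc[t] - step >= min_alloc
        ]
        if not donors:
            break  # nothing left to redistribute in this round
        donor = max(donors, key=lambda t: new_alloc[t])
        new_alloc[donor] -= step
        new_alloc[task] += step
    return new_alloc


# Task 1 is below the 80% bound, so it receives memory from Task 2.
result = reallocate({"task1": 20, "task2": 40}, {"task1": 0.6, "task2": 0.95})
# result == {"task1": 28, "task2": 32}
```

Running this once per iteration, after fresh accuracy estimates arrive, mirrors the collect, estimate, allocate cycle; total memory stays constant across rounds.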
13
SCREAM Iterative Workflow
Collect & report
Estimate accuracy
Task1 accuracy <80%
Allocate resources
Give more memory to task1
[Figure: precision and allocated memory (KB) of Task 1 and Task 2 over time (s)]
14
SCREAM Iterative Workflow
Collect & report
Merge counters from switches
Estimate accuracy
Skew of traffic for task2 changes
Task2 accuracy <80%
Allocate resources
Give more memory to task2
[Figure: precision and allocated memory (KB) of Task 1 and Task 2 over time (s)]
15
SCREAM Challenges
Collect & report
Network-wide task implementation using sketches
Estimate accuracy
Accuracy estimation without the ground-truth
Allocate resources
Fast & Stable allocation in DREAM [SIGCOMM’14]
Challenge: Merge Sketches of Different Sizes
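Network-wide Task
Heavy hitter (HH)
Source IPs sending > 10 Mbps
[Figure: a source's traffic of 25 is split into 10 at Switch A and 15 at Switch B; both sketches have depth d, with width w1 at Switch A and width w2 at Switch B]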
17
SCREAM Solution to Merge Sketches for HH Detection
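Previous work: Min of sums
Min(10 + 40, 30 + 50, 70 + 20) = Min(50, 80, 90) = 50
SCREAM: Sum of mins
Min(10, 30, 70) + Min(40, 50, 20) = 10 + 20 = 30
50 ≥ 30
Both over-approximate → smaller is more accurate
[Figure: the source's real traffic is 25, with 10 at Switch A and 15 at Switch B; its three counters read 10, 30, 70 at Switch A and 40, 50, 20 at Switch B]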
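The two merge strategies can be compared directly on the counter values from this slide. The sketch below is illustrative only: the function names `min_of_sums` and `sum_of_mins` are hypothetical, and each input list holds the counters that one source maps to in each row of one switch's sketch.

```python
def min_of_sums(per_switch_counters):
    """Previous work: add counters row by row across switches, then take the min."""
    depth = len(per_switch_counters[0])
    return min(
        sum(counters[row] for counters in per_switch_counters)
        for row in range(depth)
    )


def sum_of_mins(per_switch_counters):
    """SCREAM: take each switch's own min estimate, then add the estimates."""
    return sum(min(counters) for counters in per_switch_counters)


switch_a = [10, 30, 70]  # counters for the source in rows 1..3 at Switch A
switch_b = [40, 50, 20]  # counters for the same source at Switch B

previous = min_of_sums([switch_a, switch_b])  # min(50, 80, 90) = 50
scream = sum_of_mins([switch_a, switch_b])    # 10 + 20 = 30
```

Sum of mins never exceeds min of sums, since each per-switch minimum is at most that switch's counter in any given row. It also works when the sketches have different widths, because each switch is queried on its own.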
18
SCREAM Solutions
Collect & report
Network-wide task implementation using sketches
• Merge sketches of different sizes for HH, HHH, SSD
• SSD algorithm with higher and more stable accuracy
Estimate accuracy
Accuracy estimation without the ground-truth
Allocate resources
Fast & Stable allocation in DREAM [SIGCOMM’14]
Precision Estimation for Heavy Hitter Detection
Precision = True detected HHs / Detected HHs
          = Sum(P[Detected HH is true]) / Detected HHs
Insight: Relate probability to Error on counters of detected HHs
[Figure: estimated and real sizes of a true HH and a false HH relative to the threshold; a detected HH is false when its Error reaches Estimate-Threshold]
P[Detected HH is true] = 1 - P[Error ≥ Estimate-Threshold]
20
Precision Estimation Step 1: Find a Bound on The Error
Insight: Relate probability to Error on counters of detected HHs
Idea 1: Use average Error in Markov’s inequality to bound it
P[Detected HH is true] = 1 - P[Error ≥ Estimate-Threshold]
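Idea 1 can be written out as a short calculation. Markov's inequality gives P[Error ≥ a] ≤ E[Error] / a for a non-negative error, so each detected HH is true with probability at least 1 - AverageError / (Estimate - Threshold). The sketch below is an illustration under assumptions, not SCREAM's estimator: the function names are hypothetical, and the average error is taken as a given input rather than derived from the sketch.

```python
def prob_true_hh(estimate, threshold, avg_error):
    """Lower bound on P[Detected HH is true] from Markov's inequality."""
    margin = estimate - threshold
    if margin <= 0:
        return 0.0  # not above the threshold, so not a detected HH
    # P[Error >= margin] <= avg_error / margin, capped at 1.
    return 1.0 - min(1.0, avg_error / margin)


def estimate_precision(estimates, threshold, avg_error):
    """Average the per-item bounds over all detected HHs."""
    detected = [e for e in estimates if e > threshold]
    if not detected:
        return None  # precision is undefined with no detections
    total = sum(prob_true_hh(e, threshold, avg_error) for e in detected)
    return total / len(detected)


# Three detected HHs with margins 8, 4 and 2 over a threshold of 10.
precision = estimate_precision([18, 14, 12, 6], threshold=10, avg_error=2)
# Bounds are 0.75, 0.5 and 0.0, so the estimate is 1.25 / 3.
```

Because Markov's inequality is loose, this gives a conservative (low) precision estimate, which is what motivates tightening the bound in the next step.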
21
Precision Estimation Step 2: Improve The Bound
[Figure: a row in Count-Min, comparing the bounds from Idea 1 and Idea 2]
Insight:
• Average Error = heavy items collision + small items collision
• Counter indices of detected HHs show heavy collisions
Idea 2: Markov’s inequality only for small items
22
SCREAM Solutions
Collect & report
Network-wide task implementation using sketches
• Merge sketches of different sizes for HH, HHH, SSD
• SSD algorithm with higher and more stable accuracy
Estimate accuracy
Accuracy estimation without the ground-truth
Precision estimators for HH, HHH and SSD tasks
Allocate resources
Fast & Stable allocation in DREAM [SIGCOMM’14]
Evaluation
Metrics:
• Satisfaction of a task: Fraction of task’s lifetime with
sufficient accuracy
• % of rejected tasks
Alternatives:
• OpenSketch: Allocate for bounded error for worst-case
traffic at task instantiation (test with different bounds)
• Oracle: Knows required resource for a task in each
switch in advance
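The satisfaction metric above can be computed as follows. This is a minimal sketch that assumes a task's lifetime is divided into equal-length measurement epochs with one accuracy value per epoch; the function name and the sample values are hypothetical.

```python
def satisfaction(accuracy_per_epoch, bound=0.8):
    """Fraction of a task's epochs whose accuracy meets the bound."""
    if not accuracy_per_epoch:
        return 0.0
    ok = sum(1 for accuracy in accuracy_per_epoch if accuracy >= bound)
    return ok / len(accuracy_per_epoch)


# Two of the four epochs reach the 80% bound.
task_satisfaction = satisfaction([0.5, 0.85, 0.9, 0.7])  # 0.5
```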
25
Evaluation Setting
Simulation for 8 switches:
• 256 task instances (HH, HHH, SSD, combination)
• Accuracy bound = 80%
• 5-minute tasks arriving over 20 minutes
• 2-hour CAIDA trace
26
SCREAM Provides High Accuracy for More Tasks
SCREAM: High satisfaction and low reject
[Figure: two panels, average satisfaction and rejected tasks (%), each vs. switch capacity (KB) from 128 to 512, for OS_10, OS_50, OS_90, and SCREAM]
OpenSketch:
Loose bound → Under provision → low satisfaction
Tight bound → Over provision → high reject
27
SCREAM’s Performance Is Close to An Oracle
[Figure: two panels, average satisfaction and rejected tasks (%), each vs. switch capacity (KB) from 128 to 512, for Oracle and SCREAM]
SCREAM performance is close to an oracle;
its satisfaction is a bit lower because:
• Iterative allocation takes time
• Accuracy estimation has error
28
Other Evaluations
Changing traffic skew
SCREAM supports more accurate tasks than OpenSketch
Accuracy estimation error
SCREAM accuracy estimation has 5% error on average
Other accuracy metrics
Tasks in SCREAM have high recall (few false negatives)
29
Conclusion
Measurement is crucial for SDN management
in a resource-constrained environment
Practical sketch-based SDM by dynamic memory allocation
• Implementing network-wide tasks using sketches
• Estimating accuracy for 3 types of tasks
SCREAM is available at github.com/USC-NSL/SCREAM
30
Thanks!
Questions?
31