Virtual Network Diagnosis as a Service

advertisement
Virtual Network
Diagnosis as a Service
Wenfei Wu (UW-Madison)
Guohui Wang (Facebook)
Aditya Akella (UW-Madison)
Anees Shaikh (IBM System Networking)
Virtual Networks in the Cloud
•
•
•
•
•
The cloud provider maintains the data center infrastructure
The tenant request/configure virtual networks for their apps
The virtual network is mapped to the physical infrastructure
Network services are provided as virtual middleboxes
Multiple tenants share the infrastructure
Subnet 1
VM
GW
MB 2
Tenant
VM
R
VM R
Subnet 2
VM
GW
Application Traffic
VS
VS
VS
VM
VM
VM
Middlebox
Virtual Network Problems
• Multiple layers may have various problems
o Virtual network connectivity issues
o Tenants’ application performance may be degraded
• Isolation and abstraction prevent tenants from
diagnosing their virtual networks
Existing Solutions
• Infrastructure-layer tools may expose the physical
network to tenants
• sFlow, NetFlow
• OFRewind, NDB, etc.
• Tracing tools in the virtual network are difficult to
deploy in some virtual components (e.g.,
middleboxes)
• Tcpdump, Xtrace, SNAP
VND Proposal
• The cloud provider should offer a virtual network
diagnosis service (VND) to the tenants
1.
2.
3.
4.
Diagnostic request
Trace collection
Trace parse
Data query
Diagnosis Policy
Diagnosis
Request
Tenant
Path: n1… nk
Pattern: IP, port, …
n1: Table Server1
n2: Table Server2
…
Control
Server
Diagnosis
Policy
Path: n1… nk
Pattern: IP,
port
n1: collector1
n2: collector2
…
Tenant allocated resources
Table
Table
& Physical Topology
Server
Server
VND Challenges
• Preserve isolation and abstractions
• Low overhead
• Scalability
Contents
• Motivation
• VND Design & Implementation
• Evaluation
• Conclusions
VND Architecture
Collect Parse
Config Config
Table Server
Query Executor
Trace Table
ts
id
…
Policy
Manager
Tenant
Collect Config
& Topology
Analysis
Manager
Control Server
Table Server
Query Executor
Trace Table
ts
id
Collection Policy
…
Flow Trace
Trace Parser
Trace Parser
Trace Collector
Trace Collector
Raw Data
Raw Data
Parse Config
Cloud Controller
Query Execution
Data Collection (1)
• The tenant submits a Data Collection Configuration
Front
End
Virtual Appliance Link : l1
Capture Point node1
Flow Pattern field = value, ...
Capture Point node2
...
Virtual Appliance Node: n1
Capture Point input, [output]
...
Load
Balancer
Server
1
Cap: Input
srcIP=10.0.0.6
dstIP=10.0.0.8
tcp
dstPort=80
dualDirection
Server
2
Cap: output
…
Server
3
Data Collection (2)
• Policy Manager generates a Data Collection Policy
Trace ID
tr_id1
Physical Cap Point
vs1
Collector
c1 at h1
Pattern
field = value, ...
Path
vs1, ..., h1, c1
Load
Balancer
vSwitch
(vs1)
NIC
Collector (c1)
tr_id1
srcIP=10.0.0.6,
…, etc.
Hypervisor
Data Parse
• The tenant submits a Data Parse Configuration
Trace ID all
Filter: ip.proto = tcp
or ip.proto = udp
Fields: timestamp as ts,
ip.src as src_ip,
ip.dst as dst_ip,
ip.proto as proto,
tcp.src as src_port,
tcp.dst as dst_port,
udp.src as src_port,
udp.dst as dst_port
Table ID tab_id1
Filter exp
Fields field_list
exp = exp and exp |
exp or exp | not exp
| (exp) | prim
prim = field in value_set
field_list = field (as name)
(, field (as name))*
ts
src_IP
dst_IP
proto
src_port
dst_port
…
…
…
…
…
…
Data Analysis
• Trace Tables form a distributed database
Tenants
Operations
Diagnostic Applications
SQL Interface
Analysis
Manager
Diagnostic Schema
Collection View
Trace_ID
Vnet_App
Cap_Loc
Parse View
Pattern
Trace_ID
…
Trace Table
Trace Table
Trace Table
ts
ts
ts
id
…
Query Executor
id
…
Query Executor
id
…
Query Executor
Data Analysis Examples
ts
src_IP
dst_IP
seq
ack
Payload_length
…
…
…
…
…
…
• RTT
1. create view T1_f as select * from T1 where srcIP=IP1
and dstIP=IP2
2. create view T1_b as select * from T1 where dstIP=IP1
and srcIP=IP2
3. create view RTT as select f.ts as t1, b.ts as t2 from T1_f
as f, T1_b as b where f.seq + f.payload_length = b.ack
4. select avg(t2-t1) from RTT
• throughput, loss, delay, statistics, etc.
Optimizations
• Local Table Server placement
• Place the collector locally with the capture points
• Avoid the trace collection traffic traversing the network
• Move data only when queries need it
• Avoid interference with existing rules using the
multi-table feature on OVS
Contents
• Motivation
• VND Design & Implementation
• Evaluation
• Conclusions
Trace Collection Overhead (1)
• We transfer data between VMs in 2 physical servers
o 10Gbps NIC
o 8 pairs of VMs, each pair transfer 1Gbps traffic
• Each minute, we capture one VM pair’s traffic
• The duplicated traffic increases
as we capture more traffic
• The original application traffic
is not impacted
Trace Collection Overhead (2)
• VMs perform memory copy and data transfer
simultaneously
• We measure memory and network throughput
• Each 1Gbps network traffic
duplication causes 59 MB/s
memory overhead
Data Query Overhead
• We use RTT monitoring as an example
o 200 Mbps traffic
o Calculate average RTT periodically
• Network overhead is negligible
• Execution time scales linearly with the data size
• VND can process 2-3 Gbps traffic in real time
RTT
Scalability (1)
• Control Server is a simple web server can be scaled
up easily
• Data collection distributed locally to avoid being
the bottleneck
Scalability (2)
• Data Query simulation
• A data center with 10,000 servers, each has a 10Gbps
NIC
• Virtual network size [2, 20]
• Query executors can process 3 Gbps traffic in real time
• Total link utilization [0.1, 0.9]
• Results
• 30% of total link capacity can be queried in real time
Conclusions
• The cloud provider should offer a virtual network
diagnosis service to the tenants
• We design VND
• Architecture, interfaces and operations
• Address several important challenges
• Our experiments and simulation demonstrate the
feasibility of VND
END
Q&A
Please contact with WISDOM
http://wisdom.cs.wisc.edu
Functional Validation
• Middlebox Bottleneck Detection
• VND use throughput and RTT time to find abnormity
RE
IDS
Functional Validation(2)
• Middlebox scaling
RE
IDS
VND Architecture
Collect Parse
Config Config
Table Server
Query Executor
Trace Table
ts
id
…
Policy
Manager
Tenant
Collect Config
& Topology
Analysis
Manager
Control Server
Table Server
Query Executor
Trace Table
ts
id
Collection Policy
…
Flow Trace
Trace Parser
Trace Parser
Trace Collector
Trace Collector
Raw Data
Raw Data
Parse Config
Cloud Controller
Query Execution
Download