Document

advertisement
Grant Agreement
n. 318627
The
Plane distributed
measurement infrastructure
Overview, insights & hindsights
Dario Rossi
Professor
dario.rossi@telecom-paristech.fr
http://www. telecom-paristech.fr/~drossi
Journée du Conseil Scientifique de l’Afnic , #JCSA2015, July 9th 2015
Motivations
Measurement
Architecture
Ecosystem
Example of use
Summary
Today Internet
How to keep users
Where is
Which ISP is the best
happy & engaged?
in my area ?
traffic coming from?
Why is
How does it work?
How to optimize my
How to do better?
network for
not working?
users apps?
User viewpoint
Researcher viewpoint
Cloud viewpoint
ISP viewpoint
Motivations
Measurement
Architecture
Ecosystem
Example of use
Summary
Today Internet
“The Internet is the first thing that humanity has built
that humanity doesn't understand, the largest
experiment in anarchy that we have ever had.”
Eric Schmidt – ex Google Exec. Chairman
3
Motivations
Measurement
Architecture
Ecosystem
Example of use
Summary
Internet measurement
Shed light on the Internet operational obscurity
Lot of measurement tools…
….
issing orchestration
Plane: avoid to reinvent the wheel
& assist in building automated pilots!
4
Motivations
Measurement
Architecture
Ecosystem
Example of use
Summary
Two kinds of measurements
• Passive
– Observe network traffic
without interference
– Similar to Aristotle’s
observational method
• Active
– Perturb the network &
measure its reaction
– Similar to Newton’s
experimental method
Ἀριστοτέλης
Sir Isaac Newton
@aristotle
@newton
All science is either practical, poetical or theoretical
(Metaphysics)
Every body continues in its state of rest, or of uniform
motion in a right line, unless it is compelled to change
that state by forces impressed upon it (Principia)
1388
2411
372
353
5
5
Motivations
Measurement
Architecture
Ecosystem
Example of use
Summary
Passive m easurements
Deploy
passive
sensors
Data collection
Extract
meaningful
analytics
Passive sensor
User traffic
Raw (or halfcooked) data
Post Process
6
Motivations
Measurement
Architecture
Ecosystem
Example of use
Summary
Active m easurements
Perform active
measurements
A
Data collection
Actuator
Control
Active probes
Control
active
sensors
Extract
meaningful
analytics
Post Process
Raw (or halfcooked) data
7
Motivations
Measurement
Architecture
erging all together
Ecosystem
Example of use
Summary
Integration with existing
monitoring frameworks
Active and passive analysis
Sup
r visor
foreiterative
root-cause-analysis
Active
sensor
Passive
sensor
Re p osit or y
Data
Control
A
8
~~~Big data hype~~~
• Traffic measurements orders of m agnitude
– 40Gbps (157PB/yr) full-packet processing at each passive
sensor, with up to O(107) traffic classifications per second…
– O(109) active probes in an Internet anycast census…
(more on that later if time allows)
• To be compared with
– The Large Hadron Collider (LHC), generates ~25PB/yr
and O(108) collisions per second
– Sloan Digital Sky Survey (SDSS), generates ~73TB/yr
– Capacity of the Human genome with 2-bits bases ~ O(109)
[courtesy of Sony Classical]
[Image from http://irevolution.net/category/big-data/]
Motivations
Measurement
Architecture
Ecosystem
Example of use
Summary
Plane architecture walkthrough
Client/
Reasoner
S
Note:
simplified view, yet the
devil hides in the details
Supervisor
A
Probes
Repositories
10
Motivations
Measurement
Architecture
Ecosystem
Example of use
Summary
Plane architecture: entities
Client/
Reasoner
Access control;
delegation; boostrap
S
Sensor data;
Proxies for
legacy tools/
platforms
A
Orchestrate measurements;
expert system (automatic pilot)
Supervisor
Probes
Past sensor data;
legacy databases;
Big data processing
Repositories
11
Motivations
Measurement
Architecture
Ecosystem
Example of use
Summary
Plane architecture: interfaces
Native probe
Native repo
Client/
Reasoner
Proxy probe/repo
S
(avoid reinventing the wheel!)
Supervisor
A
Probes
Repositories
12
Motivations
Measurement
Architecture
Ecosystem
Example of use
Summary
Plane architecture: messages
C
Client/
Reasoner
S
R
Schema-centric definition,
with a type registry
Schema completely describes
the measurement, JSON format
S
Supervisor
A
Probe
Repository
13
Motivations
Measurement
Architecture
Ecosystem
Example of use
Summary
Plane architecture: messages
C
{
"capability": “measure”,
"version": 0,
"registry": http://ict-mplane.eu/reg,
"label": “ping-aggregate”
"when": “now .. future / 1s”
"parameters": {"source.ip4“: "137.3.1.1",
"destination.ip4": "*"},
“results”: ["delay.twoway.icmp.us.min",
"delay.twoway.icmp.us.mean",
"delay.twoway.icmp.us.max",
"delay.twoway.icmp.count"]
}
S
{
“specification": “measure”,
"version": 0,
"registry": http://ict-mplane.eu/reg,
"label": “ping-aggregate-experiment1”,
"when": “now + 30s / 1s”,
"token" : "0f31c9033f8fce0c9be41d4944"
"parameters": {"source.ip4“: "137.3.1.1",
"destination.ip4": “137.194.164.1"},
“results”: ["delay.twoway.icmp.us.min",
"delay.twoway.icmp.us.mean",
"delay.twoway.icmp.us.max",
"delay.twoway.icmp.count"]
}
R
{
“specification": “measure”,
"version": 0,
"registry": http://ict-mplane.eu/reg,
"label": “ping-aggregate-experiment1”,
"when": “2015-09-07 14:30:02.123 […] /1s“,
"token" : "0f31c9033f8fce0c9be41d4944“,
"parameters": {"source.ip4“: "137.3.1.1",
"destination.ip4": “137.194.164.1"},
“results”:
Note: ["delay.twoway.icmp.us.min",
"delay.twoway.icmp.us.mean",
simple ping
example,
"delay.twoway.icmp.us.max",
"delay.twoway.icmp.count"]
for illustrational
purposes
“resultsvalue” : [[ 2390,2983, 6600,30]]
}
*grin* If I'd known then that [ping] would be my most famous
accomplishment in life, I might have worked on it another day
or two and added some more options.
Mike Muuss - US Army Research Laboratory
14
Motivations
Measurement
Architecture
Ecosystem
Example of use
Summary
Plane architecture: transport
mPlane over:
• HTTPS (HTTP over TLS)
• WSS (WebSockets over TLS)
• SSH (Secure Shell)
Client/
Reasoner
S
Supervisor
Probe
Repository
Indirect export uses custom protocols:
IPFIX, FTP, BitTorrent, Apache Flume, RFC1149, RFC6214, etc.
15
Motivations
Measurement
Architecture
Ecosystem
Example of use
Summary
Plane architecture: workflow
10. Decide and iterate
1. Send high level
specification
Client/
Reasoner
9.Return high-level
results
S
2. Decompose
specification
Supervisor
8. Aggregate results
3. Send lowlevel
specification
7.Return results
Probes
4. Measure
Repositories
6.Analyze
5. Indirect export of Big data
16
Motivations
Measurement
Architecture
Ecosystem
Example of use
Summary
Plane architecture: Inter-domain
S
Client/
Reasoner
S
Supervisor
Probes
S
Repositories
17
Motivations
Measurement
The broader
•
Architecture
Ecosystem
Example of use
Summary
easurement ecosystem
“From global measurements to local management”
– 10 partners, 3.8 MEUR, FP7 STREP
– More focused use case: Build a framework out of
probes
– Knowledge sharing (e.g., joint work, Dagsthul seminars, etc.)
• IETF Large-Scale Measurement of Broadband Performance (LMAP)
– Defines the components, protocols, rules, etc., but does not specifically
target adding “a brain” to the system
– Common core set, Largely interoperable
• IETF IP Performance Metrics (IPPM)
– Registry related, we use its vocabulary as much as possible
• IETF IP Flow Information Export (IPFIX)
– Indirect export related, active contributors
• Others in scope
– IETF DOCTORS; tcpm; ConEx; NETCONF; IRTF NMRG; ETSI STQ; ITU SG12
18
Motivations
Measurement
Architecture
Ecosystem
Example of use
Summary
Plane consortium
• 16 partners • FP7 IP
– 3 ISPs
– 6 research
centers
– 5 universities
– 2 SMEs
– 11 MEUR
– 3 years long
– ends early 2016
Marco Mellia
POLITO
Saverio Nicolini
NEC
Dina Papagiannaki
Telefonica
Ernst Biersack
Eurecom
Brian Trammell
ETH
Tivadar Szemethy
NetVisor
Andrea Fregosi
Fastweb
Guy Leduc
Univ. Liege
Dario Rossi
ENST
Pietro Michiardi
Eurecom
Fabrizio Invernizzi
Telecom Italia
Pedro Casas
FTW
19
Motivations
Measurement
Architecture
Ecosystem
Example of use
Summary
Plane achievements so far
• Standardization &dissemination
Note:
hard to summarize!
– 3 RFCs, 5 drafts
– 80+ papers, including 4 Best paper awards and
ACM SIGCOMM IMC & CoNEXT, IEEE INFOCOM
• Software
https://www.ict-mplane.eu/public/software and https://github.com/fp7mplane/
(Ready-to-use Virtual Machines under preparation)
• Check the demos on
https://www.youtube.com/channel/UCHGS6UlUKvGZTyt5DemmPaw
“Biased sampling”:
Anycast geolocation as a showcase of mPlane achievements
20
Motivations
Measurement
Architecture
Ecosystem
Example of use
Summary
IP Anycast
• Anycast: set of equivalent replicated servers, traffic
routed to closest replica (in BGP sense)
• Question: where are Google DNS servers 8.8.8.8?
Commercial databases
Mountain View, CA (IP2Location)
New York, NY (Geobytes)
United States (Maxmind)
Distributed measurement
Tools using distributed
measurement aren’t better !
Time varying answers
Unknown accuracy
?????
1ms ≈
100Km
21
Motivations
Measurement
Architecture
Ecosystem
Example of use
Summary
Geolocation technique
Filter latency
noise
Scalability
Detect
Speed of light
violations
Measure
Latency
Planetlab/RIPE Atlas
Lightweight
Measure
Detect
Enumerate
Solve MIS
Geolocate
Classification
Optimum (brute force)
5-approx (greedy).
Maximum likelihood
Iterate
Feedback
Infrastructure ?
Enumerate
Geolocate
Iterate
D. Cicalese, et al. “A fistful of Pings: Accurate and lightweight Anycast enumeration and Geolocation”, IEEE INFOCOM 2015 22
Motivations
Measurement
Architecture
Ecosystem
Example of use
Summary
measurement infrastructure
• RIPE Atlas vs PlanetLab
Footprint
VPs
ASes
Countries
RIPE Atlas
7k
2k
150
PlanetLab
~300
180
30
Infrastructure ?
• Anecdotal understanding
– For some deployments,
(Microsoft IP/24 Example)
RIPE includes PlanetLab
– For others, the union is greater than the sum of the parts !
•
mPlane enables/simplifies systematic studies
23
Motivations
Measurement
Architecture
Anycast census
Ecosystem
Example of use
Lightweight
Summary
Scalable
http://www.telecom-paristech.fr/~drossi/anycast
• O(107) targets x O(102) active sensors
• O(103) targets /sensor /second
• O(103) targets are anycast – needle in the IPv4 haystack
24
Motivations
Measurement
Su
Architecture
Ecosystem
Example of use
Summary
ary
• Overview
– Measurements to shed light on Internet operational obscurity
• Insights
– Measurement plane to facilitate expression
of measurement capabilities and needs
– Allow users to concentrate effort on hard problems
– Avoiding time consuming tasks
• Hindsights
– Crucial to foster adoption (GitHub, ready-to-use VMs)
to reach a critical mass (incentive in proxying)
25
Motivations
Measurement
Architecture
Ecosystem
any thanks
?? || //
Example of use
Summary
26
Backup slides
27
Reasoner
Repository components
Plane architecture: simplistic view
Client
S
Client
Supervisor
Sup e r visor
Measurement components (aka Probes)
proxy
mPlaneAPI
A
mPlaneAPI
Re p osit or y
Plane architecture: gory details
S
Methodology overview
Detect
Speed of light
violations
Measure
Latency
Planetlab/Ripe
Enumerate
Solve MIS
Geolocate
Classification
Optimum (brute force)
5-approx (greedy).
Maximum likelihood
Iterate
Feedback
• PlanetLab
Measure
Detect
Speed of light
violations
Measure
Latency
Planetlab/Ripe
– 300 vantage points
– Geolocated with Spotter (ok for unicast)
– Freedom in type/rate of measurement
ICMP, DNS, TCP-3way delay, etc
• RIPE
– 6000 vantage points
– Geolocated with MaxMind (ok for unicast)
– More constrained (ICMP, traceroute)
In this talk: min over 10 ICMP samples
Enumerate
Solve MIS
Geolocate
Classification
Optimum (brute force)
5-approx (greedy).
Maximum likelihood
Iterate
Feedbac
k
Detect
Detect
Speed of light
violations
Measure
Latency
Planetlab/Ripe
The vantage points p and q are
referring to two different instances if:
Enumerate
Solve MIS
Geolocate
Classification
Optimum (brute force)
5-approx (greedy).
Maximum likelihood
Iterate
Feedbac
k
Packets cannot travel faster
than the speed of light
Detect
Speed of light
violations
Enumerate
•
Find a maximum independent set E
Measure
Latency
Planetlab/Ripe
– of discs such that:
– Brute force (optimum) vs Greedily from smallest (5-approximation)
Enumerate
Solve MIS
Geolocate
Classification
Optimum (brute force)
5-approx (greedy).
Maximum likelihood
Iterate
Feedbac
k
Geolocate
• Classification task
Detect
Speed of light
violations
Measure
Latency
Planetlab/Ripe
– Map each disk Dp to most likely city
– Compute likelihood (p) of each city in disk
based on:
• ci: Population of city i
• Ai: Location of ATA airport of city i
ci
di
• d(x,y): Geodesic
p = a distance
+ (1 - a )
cj
åweighting
å dj
• a: city vs distance
Enumerate
Solve MIS
Geolocate
Classification
Optimum (brute force)
5-approx (greedy).
Maximum likelihood
Iterate
Feedbac
k
Frankfurt
p=0.30
i
j
j
Munich
p=0.60
• Output policy
• Proportional: Return all cities in Dp with
respective likelihoods
• Argmax: Pick city with highest likelihood
Zurich
p=0.10
rationale: users lives in densely
populated area; to serve users,
servers are placed close to cities
airports: simplifies validation against
ground truth (DNS)
Iterate
Detect
Speed of light
violations
Measure
Latency
Planetlab/Ripe
Increase recall of
anycast replicas
enumeration
Enumerate
Solve MIS
Geolocate
Classification
Optimum (brute force)
5-approx (greedy).
Maximum likelihood
Iterate
Feedbac
k
• Collapse
– Geolocated
disks to city
area
• Rerun
– Enumeration
on modified
input set
Performance at a glance
– Protocol agnostic and lightweight
• Based on a handful of delay measurement O(100) VPs
• 1000x fewer VPs than state of the art
– Enumeration
• iGreedy use75% of the probes (25% discarded due to overlap)
• Overall 50% recall (depends on VPs; stratification is promising)
– Geolocation
• Correct geolocation for 78% of enumerated replicas
• 361 km mean geolocation error for all enumerated replicas
(271 km median for erroneous classification)
Download