Matti Siekkinen
University of Oslo
TKK / HIIT
April 9-10, 2008
Part 1: Root Cause Analysis of TCP Throughput
What limits the throughput of a given TCP transfer?
Main results from my Ph.D. work
Part 2: Monitoring as a First Class Citizen in an
Autonomic Network Architecture
Building monitoring support within autonomic network architecture
Work is part of the EU funded ANA project
13 April 2020
Guillaume Urvoy-Keller, Ernst W. Biersack
Institut Eurecom, France
Denis Collange
France Télécom R&D, France
Introduction and Motivation
Root cause analysis techniques
Taxonomy of TCP rate limitation causes
Our approach to infer limitation causes
Case study on Performance Analysis of ADSL Clients
Conclusions
ISPs would like to know how clients are doing
What are the performance limitations that Internet applications are facing?
Why does a client with 4 Mbit/s ADSL access obtain a total download rate of only a few KB/s with eDonkey?
Why do I see no improvement in throughput after upgrading my subscription?
The network provides few answers directly
The network elements are by design not intelligent
Need techniques for traffic measurement and analysis
What?
Analysis and inference of the reasons that prevent a given TCP connection from achieving a higher throughput.
Reasons are called limitation causes
Why TCP?
TCP typically carries over 90% of all traffic
TCP Rate Analysis Tool (T-RAT) by Zhang et al. (SIGCOMM 2002)
Pioneering research work
o Groundbreaking insights: it is not all congestion!
o Opened up many questions
We implemented and tested it
o Results are way off too often
o Fundamental assumptions do not hold
T-RAT analyzes unidirectional traffic
o Passively collected measurements
o Usable in more cases (asymmetric paths)
o The source of the problems
We analyze only passive traffic measurements
Capture and store all TCP/IP headers, analyze later off-line
Observe traffic at a single measurement point
Applicable in diverse situations
E.g. at the edge of an ISP’s network
o Know all about clients’ downloads and uploads
Bidirectional packet traces
Connection level analysis
Single measurement point anywhere along the path
Cannot/don’t want to control it
Complicates estimation of parameters (RTT and cwnd)
[Figure: data and ACK packets between sender and receiver, observed at measurement points A and B]
A: RTT ~ d1 (piece of cake…)
B: RTT ~ d3 + d4. How to get d4? (Did ack2 trigger data2?)
A lot of data to analyze
Potentially millions of connections per trace
Deep analysis
For each connection of each trace
o Compute a lot of metrics
o Divide connections into pieces
  • Analyse separately and compute more metrics
o Need to keep track of everything
Need solutions for data management
InTraBase
Find the right metrics to characterize all limitations
Need to gather a lot of experience
Get it right!
Several methods for computing a particular metric
o Choose the “best” for the situation
o Try to maximize correctness of results
o E.g. 5 ways to estimate RTTs
Careful validations
o Benchmark with a lot of reference traces
o Cross validate metrics
Introduction and Motivation
Root cause analysis techniques
Taxonomy of TCP rate limitation causes
Our approach to infer limitation causes
Case study on Performance Analysis of ADSL Clients
Conclusions
Study long lived TCP connections
Short connections are another topic
o Dominated by slow start?
Assume FIFO scheduling
Necessary for link capacity estimations with packet dispersion techniques
Does not hold for all networks
o E.g. cable modem and 802.11 access networks
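A minimal sketch of the packet-pair idea behind dispersion-based capacity estimation, assuming FIFO scheduling at the bottleneck. The function name, the use of the median, and the sample values are illustrative assumptions and are not taken from the toolkit.

```python
from statistics import median


def capacity_estimate_bps(packet_size_bytes, dispersions_s):
    """Estimate bottleneck capacity from packet-pair dispersions.

    Under FIFO scheduling, two back-to-back packets leave the bottleneck
    separated by the time needed to transmit one packet, so
    capacity ~= packet size / dispersion.
    """
    positive = [d for d in dispersions_s if d > 0]
    if not positive:
        raise ValueError("need at least one positive dispersion sample")
    # The median damps samples that were stretched by cross traffic.
    return packet_size_bytes * 8 / median(positive)


# Hypothetical inter-arrival times (seconds) of 1500-byte packet pairs.
samples = [0.0120, 0.0121, 0.0119, 0.0300, 0.0120]
estimate = capacity_estimate_bps(1500, samples)
print(f"{estimate / 1e6:.2f} Mbit/s")
```

If the scheduling is not FIFO (for example on cable modem or 802.11 links), the spacing between packets no longer reflects the transmission time, and an estimate of this kind cannot be trusted.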
Application
Transport layer
TCP receiver
o Receiver window limitation
TCP protocol
o Congestion avoidance mechanism…
Network layer
Bottleneck link
Application that sends small amounts of data at constant rate
40 bytes “pushed”
Application that sends larger bursts separated by idle periods
BitTorrent, HTTP/1.1 (persistent)
[Figure: transfer periods separated by periods with only keep-alive messages]
The application does not even attempt to use all network resources
TCP connections are partitioned into two periods:
Bulk Transfer Period (BTP): application constantly provides data to transfer
o Never run out of data in buffer B1
Application Limited Period (ALP): opposite of BTP
o TCP has to wait for data because B1 is empty
[Figure: sender application, buffer B1, TCP, network, TCP, receiver application]
Receiver advertised window limits the rate
max amount of outstanding bytes = min(cwnd,rwnd)
Sender is idle waiting for ACKs to arrive
Flow control
Sender application overflows receiving application
Buffer B2 is full
[Figure: sender application, TCP, network, TCP, buffer B2, receiver application]
Configuration problem (unintentional)
default receiver advertised window is set too low
window scaling is not enabled
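The effect of a small advertised window can be sketched with the bound stated above: at most min(cwnd, rwnd) bytes can be outstanding per round trip. The function names and the example RTT and link rate are assumptions chosen for illustration.

```python
def max_throughput_bps(cwnd_bytes, rwnd_bytes, rtt_s):
    """Upper bound on TCP throughput: one window per round-trip time."""
    outstanding = min(cwnd_bytes, rwnd_bytes)
    return outstanding * 8 / rtt_s


def is_receiver_limited(rwnd_bytes, rtt_s, bottleneck_bps):
    """True when rwnd / RTT stays below the bottleneck capacity."""
    return rwnd_bytes * 8 / rtt_s < bottleneck_bps


# Without window scaling the advertised window cannot exceed 65535 bytes.
RWND_NO_SCALING = 65535

bound = max_throughput_bps(cwnd_bytes=10**6, rwnd_bytes=RWND_NO_SCALING, rtt_s=0.2)
print(f"{bound / 1e6:.2f} Mbit/s")
print(is_receiver_limited(RWND_NO_SCALING, 0.2, 4e6))
```

With a 200 ms RTT the unscaled window caps the transfer below a 4 Mbit/s access link, regardless of how large the congestion window grows.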
Limitation is due to congestion at a bottleneck link
Shared bottleneck: obtain only a fraction of its capacity
Non-shared bottleneck: obtain all of its capacity
Divide & Conquer
1. Partition connections into BTPs and ALPs
o Filter out application impact
2. Analyze the bulk transfer periods for limitation by
o TCP receiver
o TCP protocol
o Network
Methods are based on metrics computed from packet headers
Phase 1: Isolate
Fact: TCP always tries to send MSS size packets
Consequence: small packets (size < MSS) and idle time indicate application limitation
o Buffer between application and TCP is empty
[Figure: packet timeline with MSS packets and packets smaller than MSS; ALPs are marked where idle time > RTT or where there is a large fraction of small packets]
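The Isolate idea can be sketched as follows. This is a deliberately simplified illustration, not the toolkit's algorithm: the function name, the input format, and the `small_fraction` threshold are assumptions, and the idle gaps themselves (which also count as application-limited time) are only used as split points.

```python
def isolate(packets, mss, rtt, small_fraction=0.5):
    """Split a connection into periods and label each 'BTP' or 'ALP'.

    packets: list of (timestamp_s, payload_bytes) sent in one direction.
    A new period starts after an idle gap longer than rtt. A period is
    labeled ALP when the share of packets smaller than mss reaches
    small_fraction, otherwise BTP.
    """
    if not packets:
        return []

    # Step 1: cut the packet sequence at idle gaps longer than one RTT.
    periods, current = [], [packets[0]]
    for prev, pkt in zip(packets, packets[1:]):
        if pkt[0] - prev[0] > rtt:
            periods.append(current)
            current = []
        current.append(pkt)
    periods.append(current)

    # Step 2: label each period by its fraction of small packets.
    labeled = []
    for period in periods:
        small = sum(1 for _, size in period if size < mss)
        label = "ALP" if small / len(period) >= small_fraction else "BTP"
        labeled.append((label, period[0][0], period[-1][0]))
    return labeled


# Hypothetical trace: a burst of full-size packets, a pause, then small packets.
trace = [(0.00, 1460), (0.01, 1460), (0.02, 1460), (0.03, 1460),
         (1.00, 200), (1.05, 300), (1.10, 1460)]
result = isolate(trace, mss=1460, rtt=0.1)
print(result)
```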
Phase 2: Merge
Why?
o After Isolate, BTPs may be separated by very short ALPs
o Analyze impact of the application
  • How much do ALPs decrease overall throughput?
How?
o Merge subsequent transfer periods separated by an ALP to create a new BTP
o Mergers controlled with the drop parameter
o Iterate until all possible mergers are performed
See:
M. Siekkinen, G. Urvoy-Keller, E. W. Biersack: On the Interaction
Between Internet Applications and TCP. ITC 2007.
1. Compute limitation scores for each BTP
o 4 quantitative scores in [0,1]
o We use retransmission rates, inter-arrival time patterns, path capacity, RTT, etc.
2. Perform classification of BTPs into limitation causes
o Map (combination of) limitation scores into a cause
o Threshold-based scheme
[Figure: threshold-based classification scheme using the dispersion score, retransmission score, receiver window limitation score, and b-score]
4 thresholds need to be calibrated
Difficult task: Diversity vs. Control
Reference data needs to be representative & diverse enough
o No simulations
Need to control experiments in some way to get what we want
Reference data with partially controlled experiments
Try to generate transfers limited by a certain cause
FTP downloads from Fedora Core mirror sites
o 232 sites covering all continents
Artificial bottleneck links with rshaper
o network limitation
Nistnet to add delay
o receiver limitation (W_r/RTT < bw)
Control the number of simultaneous downloads
o unshared vs. shared bottleneck
[Figure: Eurecom, behind Rshaper and Nistnet, downloading over the Internet from mirror sites in e.g. Australia, Finland, Japan, and USA]
[Figure: scores for a bottleneck set at 1 Mbit/s with 1 download at a time; annotation "set th1 here"]
Have a look at:
M. Siekkinen, G. Urvoy-Keller, E. W. Biersack: A Root Cause
Analysis Toolkit for TCP.
To appear in Computer Networks, 2008
Introduction and Motivation
Root cause analysis techniques
Taxonomy of TCP rate limitation causes
Our approach to infer limitation causes
Case study on Performance Analysis of ADSL Clients
Conclusions
Stress test for our techniques
Do we learn useful things?
Knowing throughput limitations (=performance) is useful
ISPs want satisfied clients
Need to know what’s going on before things can be improved
Applied the root cause analysis toolkit to customer traffic of France Telecom’s ADSL access network
[Figure: access network, collect network, and Internet, with the location of the two pcap probes marked]
24 hours of traffic on March 10, 2006
290 GB of TCP traffic
64% downstream, 36% upstream
Observed packets from ~3000 clients, analyze only 1335
Excluded clients did not generate enough traffic for RCA
Connections
Size distribution highly skewed
Use only 1% of them for RCA
o Represent > 85% of all traffic
Clients
Heavy-hitters: 15% of clients generate 85-90% of traffic (up & down)
Low access link utilization
o Why?
Main observation
Application limits performance of over 80% of clients
What’s going on?
Quite stable and symmetric volumes
Over 80% of all traffic
eDonkey and “other” dominate
P2P eDonkey
No recognized P2P
Asymmetric port 80/8080 downstream
Real Web traffic?
Most clients’ performance limited by applications
Very low link utilizations for application limited traffic
Most of application limited traffic seems to be P2P
Peers often have asymmetric uplink and downlink capacities
P2P applications/users enforce upload rate limits
Most clients’ download performance seems to suffer from P2P clients drastically limiting their upload rates
[Figure: downloading client (low utilization) connected over the Internet to uploading clients (low capacity + rate limiter)]
“Client size” distribution skewed
Heavy hitters dominate
Majority of clients mostly throughput limited by applications
Due to:
o P2P clients throttle upload rate (Too much?)
o Asymmetric link capacities
Consequences:
o Low utilization of the core access network
o Client would benefit little from subscription upgrade
See also:
M. Siekkinen, D. Collange, G. Urvoy-Keller, E.W. Biersack:
Performance Limitations of ADSL Users: A Case Study.
PAM 2007
We can infer root causes for TCP throughput using
bidirectional packet traces at
single measurement point located anywhere on the TCP/IP path.
Useful for:
Performance evaluation of applications
Evaluation of network utilization
Identification of TCP configuration problems
For future:
Wireless traffic
On-line analysis
Analysis of user behavior
Joint work with:
Vera Goebel, Thomas Plagemann, Karl-Andre Skevik
University of Oslo
Martin May, Theus Hossmann, Ariane Keller
ETH Zurich
Guy Leduc, Bamba Gueye
University of Liège
Ranganai Chaparadza, Lorenzo Peluso, Rudolf Roth
Fraunhofer FOKUS Institute
Overview of the ANA Project
Monitoring in ANA
Approach, requirements, goals
Monitoring architecture
Information sharing
Conclusions
ANA facts:
4 years: January 2006 to December 2009
10 European partners, 1 Canadian partner
Roughly 30-40 researchers involved
A Future and Emerging Technologies (FET) project
Forward looking and "risky" research
Proactive initiative on Situated and Autonomic Communications (SAC)
New paradigms for communication/networking systems
4 projects: ANA, BIONETS, Haggle, Cascadas
ETH Zurich
University of Basel
NEC
Lancaster University
Fokus
University of Liege
University Pierre et Marie Curie
NKUA
University of Oslo
Telekom Austria
University of Waterloo
The Internet suffers from architectural stress:
not ready to integrate and manage the envisaged huge numbers of dynamically attached devices (wireless revolution, mobility, personal area networks, etc)
Lacks integrated monitoring and security mechanisms
Consensus in the research community* that a next step beyond the Internet is needed.
* as seen by the number of recent related projects and initiatives (FIRE, GENI, FIND)
[Figure: the Internet protocol stack with IP as the narrow middle layer]
Applications: Voice, Video, P2P, Email, YouTube, …
Protocols: TCP, UDP, SCTP, ICMP, …
IP layer: homogeneous networking abstraction
Link: Ethernet, WiFi (802.11), ATM, SONET/SDH, Frame Relay, modem, ADSL, Cable, Bluetooth, …
Changing/updating the Internet core is difficult or impossible! (e.g. IPv6, Multicast, Mobile IP, QoS, …)
Goal: To demonstrate the feasibility of autonomic networking.
Identify fundamental autonomic networking principles.
Design and build an autonomic network architecture.
ANA in a blink:
Network must scale in size and in functionality.
Evolving network: variability at all levels of the architecture.
ANA = framework for function (re-)composition.
Dynamic adaptation and re-organization of network.
Networks have to work
do research through prototypes
Build an experimental network architecture early on
Prototype used as feedback to refine architectural models.
• all devices have to know IP
• IP address configuration through DHCP, zeroconf, ad hoc mode
• routing protocol has to be agreed on
Always require manual configuration
New ANA Compartment
[Figure: four nodes, each labeled "ANA core"]
• Self-organization
• Determine communication protocol (non-IP)
• Monitoring
• Intra-compartment routing
• Functional composition (suitable network stack)
• Beyond IP!!!
To enable this vision we need:
The ANA core
o Highly configurable network stack
Self-association
Service discovery
Self-organization
Functional composition
Self-optimization
ANA does not propose another "one-size-fits-all network waist".
ANA is a framework to host, interconnect, and federate multiple heterogeneous networks.
ANA introduces the core concept of "network compartments."
Multiple "network compartments" can co-exist
[Figure: stack with the application layer at the top, IP next to other compartments (…) in the middle, and the link layer at the bottom]
ANA framework
ANA does not impose how network compartments should work internally:
the ANA framework specifies how networks interact.
ANA specifies interfaces and interactions with network compartment
Internal operation is not imposed, leading to multiple and heterogeneous compartments but generic interaction
ANA framework
[Figure: traditional layered stack]
App Layer: per application port
Trans Layer: UDP/TCP handling
Net Layer: IP does defragmentation, checksum, …
MAC Layer: all packets from Ethernet with 0x0800 (IP) or 0x86DD (IPv6)
Phy Layer
At least same functionality as before, but decomposed
Allows for composition of functionality / services
Also: new functionality integrated in protocol stack
Not so novel, but we add
o Dynamic re-configuration
o Autonomic properties
[Figure: functional compartment composed of blocks such as Applications, Reliable Transport, Fragmentation, Checksum, Routing, Mobility Prediction, and Monitoring, above the Phy/MAC Layer]
ANA Blueprint offers a flexible and evolvable framework.
Allows variability at all levels of the architecture:
o multiple functionalities,
o variants to perform a given task,
o and compartments
co-exist and (can) compete, open for extensions (evolution).
Where does autonomic fit into the Blueprint?
Blueprint provides a well-defined structure on which to operate in an autonomic way
Easy to test/replace/upgrade parts of the system (except for minimal core)
Generic set of abstractions provides "common language" for algorithms and protocols
Compartment
Information Channel (IC)
Information Dispatch Point (IDP)
Functional Block (FB)
Compartment = wrapper for networks
Implements operational rules and administrative policies for a given communication context
Defines:
How to join and leave a compartment: member registration, trust model, authentication, etc.
How to reach (communicate with) another member: peer resolution, addressing, routing, etc.
The compartment-wide policies: interaction rules with "external world", the compartment boundaries (administrative or technical), peerings with other compartments, etc.
Compartments decompose communication systems and networks into smaller and easier manageable units.
Addressing and naming are left to compartments.
Each compartment is free to use any addressing and naming schemes
Can choose not to use addresses (e.g. in sensor networks)
Main advantages
No need to manage a unique global addressing scheme
No need to impose a unique way to resolve names
ANA is open to future addressing and naming schemes
Main drawback
Global routing becomes something similar to searching
(if communicating parties are not all members of a given compartment)
Startpoints instead of endpoints
In ANA communication is always towards a startpoint, or information dispatch point (IDP)
Bind to destinations in an address agnostic way
Support many flavors of compartments that can use different types of addresses and names
Useful decoupling between identifiers and means to address them
Data is sent to an IDP, which has the state needed to reach the destination
[Figure: IDP bound to an information channel (IC)]
Code and state that can process data packets
Protocols and algorithms are represented as FBs
Access to FBs is also via information dispatch points (IDPs)
FBs can have multiple input and output IDPs
FB internally selects output IDP(s) to which data is sent
Data is sent to an IDP, which has the state needed to call the correct function inside the FB
[Figure: functional block (FB) with its IDPs]
[Figure: two node compartments, one hosting FB1 and the other FB2, with IDPs a, b, and c, connected by an information channel (IC)]
Organize a node's functionalities as (compartment) members:
Member database: catalog of available functions
Resolution step to access a given function
o Also implements access control
Resolution instantiates functional blocks (FBs)
The node compartment hosts/executes FBs and IDPs
The node compartment is the "startpoint" of any communication
[Figure: client and node compartment with IDPs p, e, f, and a]
Network compartments are free to internally run whatever addressing/naming schemes, routing protocols, etc.
The "glue" for all interactions in ANA is the compartment API.
All network compartments must support the API in order to allow all possible interactions between compartments.
The API offers 5 fundamental primitives
IDP_p  publish(IDP_c, CONTEXT, SERVICE)
int    unpublish(IDP_c, IDP_p, SERVICE)
IDP_r  resolve(IDP_c, CONTEXT, SERVICE)
void*  lookup(IDP_c, CONTEXT, SERVICE)
int    send(IDP_r, DATA)
SERVICE = what is published or looked up
e.g., an address, a name, a file, a printing service, etc.
The CONTEXT defines some scope inside a compartment.
e.g. “global” scope = “*”, node local scope = “.”
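The five primitives can be illustrated with a toy in-memory compartment. This is only a sketch under stated assumptions, not ANA code: the `Compartment` class stands in for the compartment IDP (IDP_c), IDPs are plain integers, and the extra `handler` argument of `publish` is a hypothetical way to say where published data should be delivered.

```python
class Compartment:
    """Toy in-memory stand-in for a compartment (hypothetical, not ANA code)."""

    def __init__(self):
        self._next_idp = 1
        self._handlers = {}    # IDP -> callable that consumes data
        self._published = {}   # (context, service) -> IDP returned by publish

    def _new_idp(self, handler):
        idp = self._next_idp
        self._next_idp += 1
        self._handlers[idp] = handler
        return idp

    def publish(self, context, service, handler):
        """Register a service and return the IDP that identifies it."""
        idp = self._new_idp(handler)
        self._published[(context, service)] = idp
        return idp

    def unpublish(self, idp, service):
        """Withdraw a published service; 0 on success, -1 if unknown."""
        keys = [k for k, v in self._published.items()
                if v == idp and k[1] == service]
        for key in keys:
            del self._published[key]
        return 0 if keys else -1

    def resolve(self, context, service):
        """Return a new IDP holding the state needed to reach the service."""
        target = self._published[(context, service)]
        return self._new_idp(self._handlers[target])

    def lookup(self, context, service):
        """Return the published services matching context and service."""
        return [s for (c, s) in self._published if c == context and s == service]

    def send(self, idp, data):
        """Deliver data through an IDP; 0 on success, -1 if the IDP is unknown."""
        handler = self._handlers.get(idp)
        if handler is None:
            return -1
        handler(data)
        return 0


eth = Compartment()
inbox = []
z = eth.publish("*", "10.1.2.3", inbox.append)   # a block publishes its address
r = eth.resolve("*", "10.1.2.3")                 # a peer resolves the address
status = eth.send(r, b"hello")                   # data reaches the publisher
```

The point of the sketch is that the sender only ever holds an IDP; how the compartment names, addresses, and reaches the destination stays hidden behind the five calls.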
Publishing an IPv4 address in the Ethernet compartment.
[Figure: on node M, the IP-FB publishes "10.1.2.3" through IDP y of the ETH-FB in the Ethernet compartment and obtains IDP z]
z <- publish(y, “*”, “10.1.2.3”)
Overview of the ANA Project
Monitoring in ANA
Approach, requirements, goals
Monitoring architecture
Information sharing
Conclusions
Monitoring is essential for autonomic behaviour:
Need to know system state at all times
Adapt to the environment automatically
Monitoring gives awareness and therefore enables autonomic features, such as:
Functional composition
Service placement and selection
Advanced routing
Topology optimization
…
BUT the monitoring framework must exhibit some level of autonomy as well!
Monitoring
Managed Element
Examples of decisions:
- Compose functional blocks differently
- Move service or data elsewhere
- Change routing
Monitoring framework provides service to all ANA functional blocks that need some network state awareness
Goals:
Efficiency and accuracy
o Avoid duplication of monitoring tasks at many levels of the architecture (typically in many overlays)
o Provide resilient and flexible means to store and give access to monitored data
o Enable distributed monitoring
Self-adaptation
o To environment, system resources, and usage (non-functional requirements)
o Individual components as well as the whole framework
Extensibility and modularity
o Framework allows cooperation among tools
o New tools can be added
Overview of the ANA Project
Monitoring in ANA
Approach, requirements, goals
Monitoring architecture
Information sharing
Conclusions
[Figure: monitoring architecture. Functional blocks that use monitoring (anomaly detection, adaptive routing, peer selection, topology prediction, …), orchestration, context data management, context mapping, MCIS with monitoring data storage (RAM, DB, …), and bricks such as Vivaldi, aggregation, adaptive sampling, ping, system monitoring, available bandwidth measurement, packet capturing, and link quality prediction]
Orchestration: figure out how to fulfill requests, i.e. how and what to measure.
o Handle requests
o Manage measuring bricks
o Optimization
Peer selection: knows the network-related metrics it needs (e.g. latency)
Orchestration / Dispatcher
• Discovers "metric" bricks
• Decomposes & forwards requests
Metric bricks decide how to measure the metrics, i.e. which other metric bricks or measuring bricks to use depending on:
• context (e.g. wireless or wired network)
• non-functional requirements (e.g. max tolerated error)
[Figure: metric bricks (latency, connectivity, link quality, achievable throughput, available bandwidth) together with Vivaldi, ping, link quality prediction, passive available bandwidth measurement, packet capturing, system monitoring, MCIS, and monitoring data storage (RAM, DB, …)]
Latency brick adapts to environment and qualitative parameters
[Figure: depending on the non-functional requirements, the latency brick uses Vivaldi with error prediction (high error tolerance) or ping (low error tolerance)]
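The adaptation of the latency brick can be sketched as a simple selection rule. The class, the callables, and the numbers below are hypothetical stand-ins: the sketch only assumes that a Vivaldi-style estimate comes with a predicted error and that ping is the more accurate but more intrusive alternative.

```python
class LatencyBrick:
    """Toy metric brick that picks a measuring brick per request."""

    def __init__(self, vivaldi, ping):
        self.vivaldi = vivaldi   # callable: peer -> (latency_ms, predicted_error)
        self.ping = ping         # callable: peer -> latency_ms

    def measure(self, peer, max_error):
        """Return (method, latency_ms) honoring the tolerated relative error."""
        estimate, predicted_error = self.vivaldi(peer)
        if predicted_error <= max_error:
            return "vivaldi", estimate      # estimate is good enough
        return "ping", self.ping(peer)      # fall back to active measurement


# Stub measuring bricks with made-up numbers.
brick = LatencyBrick(vivaldi=lambda peer: (48.0, 0.15),
                     ping=lambda peer: 52.3)
relaxed = brick.measure("peer-a", max_error=0.30)
strict = brick.measure("peer-a", max_error=0.05)
print(relaxed, strict)
```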
Overview of the ANA Project
Monitoring in ANA
Approach, requirements, goals
Monitoring architecture
Information sharing
Conclusions
Efficient, robust access to data
Mechanisms for publishing and querying/finding data
Multi-attribute range queries
o E.g. SELECT srcip FROM flow_records WHERE bytes > 10^8 AND …
One-time queries and subscriptions
Information sharing functional block
Based on Mercury
What is Mercury?
A. Bharambe, M. Agrawal and S. Seshan.: Mercury: Supporting
Scalable Multi-Attribute Range Queries (SIGCOMM 2004)
One ring per attribute
Each ring behaves like DHT without hashing, i.e. contiguous value ranges
Explicit load balancing scheme to cope with popular value ranges
Send data to all rings
Send query to only one ring
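[Figure: two attribute rings. Ring R_x is split into the ranges [0, 80), [80, 160), [160, 240), [240, 320); ring R_y into [0, 105), [105, 210), [210, 320). The data item (x = 100, y = 200) is sent to both rings; the query (50 ≤ x ≤ 100, 150 ≤ y ≤ 250) is sent to one ring.]
From "Mercury: Scalable Routing for Range Queries" by Ashwin R. Bharambe, SIGCOMM 2004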
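The ring behaviour described above can be sketched with the value ranges from the figure. This is a toy model, not Mercury code: the `Ring` class is hypothetical, all nodes live in one process, and routing and load balancing are left out.

```python
import bisect


class Ring:
    """One attribute ring: each node owns a contiguous value range (no hashing)."""

    def __init__(self, attribute, boundaries):
        self.attribute = attribute
        self.boundaries = boundaries                 # e.g. [0, 80, 160, 240, 320]
        self.nodes = [[] for _ in range(len(boundaries) - 1)]

    def _node_index(self, value):
        idx = bisect.bisect_right(self.boundaries, value) - 1
        return min(max(idx, 0), len(self.nodes) - 1)

    def store(self, item):
        """Store the full item at the node owning its value for this attribute."""
        self.nodes[self._node_index(item[self.attribute])].append(item)

    def query(self, ranges):
        """Visit only the nodes covering this ring's range; filter on all attributes."""
        low, high = ranges[self.attribute]
        first, last = self._node_index(low), self._node_index(high)
        hits = []
        for node in self.nodes[first:last + 1]:
            hits += [item for item in node
                     if all(a <= item[k] <= b for k, (a, b) in ranges.items())]
        return hits, last - first + 1


rings = {"x": Ring("x", [0, 80, 160, 240, 320]),
         "y": Ring("y", [0, 105, 210, 320])}

item = {"x": 100, "y": 200}
for ring in rings.values():          # data is sent to all rings
    ring.store(item)

query = {"x": (50, 100), "y": (150, 250)}
hits, contacted = rings["x"].query(query)   # the query is sent to one ring only
print(hits, contacted)
```

Because every ring holds the complete item, whichever ring receives the query can check the remaining attributes locally.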
[Figure: a metadata compartment (cmpt) and the data compartments data1, data2, data3, and data4]
Metadata compartment enables discovery of data compartments
Kind of catalog of data stored in the whole system
One data compartment per data type
E.g. Cisco NetFlow records
Each data compartment represents a single Mercury system
Is distributed over several ANA nodes
Has an attribute hub per attribute of this data type
Organizes data independently from other data compartments
A data compartment is a usual ANA compartment
Uses the proposed primitives of ANA compartment API
Each node has an MCIS functional block
MCIS = Multi-compartment Information Sharing
Gives access to all data compartments (including meta-data compartment)
Entry point for accessing data and storing data
Metadata compartment
resolve(): get IDP to a data compartment
lookup(): get datatype tuples matching the query
publish(): store a new data type, i.e. “establish” a new data cmpt, get IDP to that cmpt
Data compartment
resolve(): not currently supported
lookup(): get data records of the data cmpt matching the query
publish(): store a new data record into that data cmpt
Two exercises:
Querying the IS system
Storing data into the IS
1. Search for MCIS service
resolve(n, ".", "MCIS", e): returns IDP i to the metadata cmpt
2. Search for data type
lookup(i, "*", "querystring", e): returns matching data types stored currently in the system
Query string example (MIB style): X.Y.* returns data1
3. Resolve the data1 compartment
resolve(i, "*", "X.Y.data1", e): returns IDP j
4. Make the query
lookup(j, "*", "a<x&b>y", e): returns matching data records
[Figure: client with IDPs e, n, i, and j, and MCIS functional blocks with IDPs k, l, and m]
1. Search for MCIS service
resolve(n, ".", "MCIS", e): returns IDP i to the metadata cmpt
2. Resolve the X.Y.data2 compartment
resolve(i, "*", "X.Y.data2", e): returns IDP r
3. Store data item
publish(r, dataitem, NULL)
[Figure: client with IDPs n, e, r, and i, and MCIS functional blocks on several nodes]
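The two exercises can be replayed against a toy model of the metadata and data compartments. Everything here is an assumption made for illustration: the classes are hypothetical, IDPs are replaced by object references, MIB-style patterns are matched with `fnmatch`, and the query string "a<x&b>y" is expressed as a Python predicate.

```python
from fnmatch import fnmatch


class DataCompartment:
    """Toy data compartment: stores records of a single data type."""

    def __init__(self):
        self.records = []

    def publish(self, record):
        self.records.append(record)

    def lookup(self, predicate):
        return [r for r in self.records if predicate(r)]


class MetadataCompartment:
    """Toy metadata compartment: catalog of the existing data compartments."""

    def __init__(self):
        self.data_cmpts = {}

    def publish(self, datatype):
        # Establish the data compartment for a new data type.
        return self.data_cmpts.setdefault(datatype, DataCompartment())

    def lookup(self, pattern):
        return sorted(name for name in self.data_cmpts if fnmatch(name, pattern))

    def resolve(self, datatype):
        return self.data_cmpts[datatype]


# Stands in for what resolving the local "MCIS" service would hand back.
meta = MetadataCompartment()

# Storing data: resolve (here: establish) the compartment, then publish items.
meta.publish("X.Y.data1").publish({"a": 1, "b": 9})
meta.resolve("X.Y.data1").publish({"a": 7, "b": 2})

# Querying: find the data type, resolve its compartment, run the range query.
types = meta.lookup("X.Y.*")
data1 = meta.resolve(types[0])
records = data1.lookup(lambda r: r["a"] < 5 and r["b"] > 3)
print(types, records)
```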
80 000+ lines of C++ code
No documentation
One major source of headache
Identifiers in Mercury are IP address + TCP/UDP port number
Needed to introduce generic identifiers
Original code quite modular
We programmed
o MCIS brick code
o ANA nw “layer”
[Figure: source tree with tools, wan-env, apps, mercury, sim-env, bricks (MCIS), util, and ana-env]
Adaptive index structures
o Adapt to environment (e.g. number of nodes, resources) and usage (e.g. query and data rates and patterns)
o E.g. shut down unused attribute hubs and use a DHT for attributes that don’t require range queries
Multi-compartment load balancing
o Now only within a single compartment
Other features
Multi-attribute indexes
Joins
Overview of the ANA Project
Monitoring in ANA
Approach, requirements, goals
Monitoring architecture
Information sharing
Conclusions
Monitoring as an integral part of the architecture
To enable autonomic behavior
Goals of monitoring framework
Efficiency and accuracy
Adaptability
Extensibility and modularity
Current status
Still immature
Some FBs are already there, some under development, some in design phase
Implementation and evaluation
Through use case scenarios
E.g. P2P VoD streaming (Advanced peer selection)
Some of the future research topics:
self-adaptive MCIS
self-organized coordinate system (University of Liege)
mobility monitoring and link quality prediction (ETHZ)