How Can You Have QoS When…
Jennifer Rexford
AT&T Labs--Research

How Can You Have QoS When…
• A typographical error by a network operator can bring down your service?
• Routing anomalies and slow convergence might (temporarily) discard your traffic?
• You don’t know how to set the QoS parameters and estimate your bill?
• You can’t tell who is to blame for the QoS violation you have experienced?

A single typo can bring down your network?
• Router configuration problems
  – Non-standard “assembly language” programming
  – Configuring individual routers not a network
  – Complexity in network protocols and mechanisms
• … lead to performance problems
  – Human error responsible for half of outages
  – Security holes, resource inefficiency
  – Delay and cost in configuring and troubleshooting
• … and some research challenges
  – Models of protocol configuration state
  – Codifying of best common practices
  – Tools for error checking and data mining
  – Systems for automated configuration

The routing system discards your packets?
• Routing problems
  – Transient instability during routing convergence
  – Blackholes, route hijacking, policy oscillation,…
  – Congestion due to sub-optimal routing configuration
• … lead to performance problems
  – Packets dropped, discarded, or out-of-order
  – Forwarding loops consuming extra bandwidth
• … and some research challenges
  – Faster data-plane convergence
  – Detection and diagnosis of anomalies
  – Checking of configuration errors and policy conflicts
  – Better traffic engineering and capacity planning

Customers don’t know what QoS params to select?
• QoS specification problems
  – Don’t know the traffic mix by 5-tuple
  – Can’t accurately map the 5-tuple to applications
  – Can’t predict how much their bill will be
• … lead to slow QoS adoption
  – Customers want QoS but can’t specify it
  – Customers want QoS but are wary of their bill
• … and some research challenges
  – Traffic measurement and characterization
  – Digging below the 5-tuple in the IP/TCP header
  – Mapping traffic classes into QoS classes

Customers don’t know who is to blame?
• Finger-pointing problem
  – Hard to detect QoS violations
  – Even harder to diagnose them
  – Even harder to ascribe blame
• … leads to low end-to-end QoS adoption
  – SLAs based just on basic availability
  – SLAs only for “on net” traffic in one domain
• … and some research challenges
  – Measurements for the finger-pointing problem
  – Techniques for outsourcing the finger-pointing to a third party or the provider

Conclusion
• We need to solve these problems!