Reliable Multicast for Time-Critical Systems

Mahesh Balakrishnan

Ken Birman

Cornell University

Mission-Critical Datacenters

 COTS Datacenters

 Online e-tailers, search engines, corporate applications

 Web-services

 Mission-Critical Apps

 Need: Scalability, Availability, Fault-Tolerance

… Timeliness!

The Time-Critical Datacenter

 Migrating time-critical applications to commodity datacenters…



… conversely, providing datacenter webservices with time-critical performance.

What’s a Time-Critical System?



Not ‘real time’, but ‘real fast’!

 Financial calculators, military command and control… air traffic control (ATC)



… foobooks.com!

 Technology Gap: Real-Time focuses on determinism, scale-up architectures

The French ATC System



Mid to Late 90’s

 Teams of 3-5 air traffic controllers on a cluster of desktop consoles

 50-200 of these console clusters in an air traffic control center

 Why study the French ATC?

ATC Subsystems















Radar Image

Weather Alert

Track Updates

Updates to Flight Plans

Console to Console State Updates

System Management and Monitoring

ATC center to center Updates



Multicast ubiquitous…

Two Kinds of Multicast

 Virtually Synchronous Multicast: very reliable, not particularly fast

 Unreliable Multicast: very fast, not particularly reliable

 Nothing in between!

Two Kinds of Subsystems

 Category 1: Complete reliability (virtual synchrony), e.g., routing decisions

 Category 2: Careful application design + natural hardware properties + management policies, e.g., radar

Multicast in the French ATC

 Engineering Lessons:

 Structure application to tolerate partial failures

 Exploit natural hardware properties

 Can we generalize to modern systems?

 Research Direction: Time-Critical Reliability

 Can we design communication primitives that encapsulate these lessons?

Anatomy of a Cloned Service

Updates multicast to whole group RACS

Queries unicast to single nodes
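The update/query split above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class name `ClonedService`, its methods, and the in-process list of replicas are hypothetical stand-ins for real nodes and real network multicast.

```python
import random


class ClonedService:
    """Toy model of a cloned service: every replica holds a full copy of the state."""

    def __init__(self, num_replicas):
        # One dict per replica; in a real deployment each would live on its own node.
        self.replicas = [dict() for _ in range(num_replicas)]

    def update(self, key, value):
        # Updates are multicast: every replica in the group applies them.
        for state in self.replicas:
            state[key] = value

    def query(self, key, rng=random):
        # Queries are unicast: a single, arbitrarily chosen replica answers.
        replica = rng.randrange(len(self.replicas))
        return self.replicas[replica].get(key)


service = ClonedService(num_replicas=4)
service.update("item-42", 7)
print(service.query("item-42"))
```

If an update fails to reach one replica, queries routed to that replica return stale data, which is why the reliability and speed of the update multicast matter.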

Services

 An Amazon web page is constructed by 100s of co-operating services*

 Multicast is used for:

 Updating Cloned Services

 Publish-Subscribe / Eventing

 Datacenter Management/Monitoring

* Werner Vogels, CTO of amazon.com, at SOSP 2005

Multicast in the Datacenter

 A node is in many multicast groups:

 One for each service it hosts

 One for each topic it subscribes to

 One or more administration groups

Large Numbers of Overlapping Groups!

Service Semantics

[Figure: example services: User History Service, Product Popularity Service, Store Inventory, Shipping Scheduler, User Profile Data, Product Recommendations]

Data Store Services: stale data can result in overselling / underselling, i.e. a loss of real-world dollars

Cache Services: updated periodically by back-end data-stores

The Challenge

 Datacenter Blades are failure-prone:

 Crash failures

 Byzantine behavior

 Bursty Packet Loss:

End-host kernels drop packets when subjected to traffic spikes.

A New Reliability Model

 Rapid delivery is more important than perfect reliability

 Probabilistic Timeliness

 Graceful Degradation
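One way to read "probabilistic timeliness" is as a statement about the empirical distribution of delivery latencies. The sketch below assumes a list of measured latencies is available; the function names and the sample values are hypothetical placeholders, not measurements from any system.

```python
def fraction_within(latencies_ms, deadline_ms):
    """Empirical probability that a delivery latency is at most `deadline_ms`."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    on_time = sum(1 for x in latencies_ms if x <= deadline_ms)
    return on_time / len(latencies_ms)


def deadline_for(latencies_ms, target):
    """Smallest observed latency d such that at least `target` of samples are <= d."""
    ordered = sorted(latencies_ms)
    for d in ordered:
        if fraction_within(ordered, d) >= target:
            return d
    return ordered[-1]


# Placeholder samples for illustration only (NOT real measurements).
samples = [2, 3, 3, 4, 5, 8, 12, 20, 35, 60]
print(fraction_within(samples, 10))
print(deadline_for(samples, 0.9))
```

An application designer could use such a curve to pick the deadline that matches the fraction of late or lost updates the application can tolerate.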

Wanted: a multicast primitive that

1. Scales to large numbers of arbitrarily overlapping multicast groups

2. Delivers multicasts quickly

3. Tolerates datacenter failure modes: bursty packet loss, node failures

4. Offers probabilistic properties

5. ‘Gives up’ on lost data after a threshold period
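The "gives up after a threshold period" requirement can be sketched as receiver-side bookkeeping. This is an illustrative assumption about how such a policy might look, not the protocol's actual mechanism; `GapTracker` and its parameters are hypothetical names.

```python
class GapTracker:
    """Tracks missing sequence numbers and abandons them after a threshold."""

    def __init__(self, give_up_after):
        self.give_up_after = give_up_after  # seconds to wait before giving up
        self.missing = {}                   # seq -> time the gap was first noticed
        self.next_expected = 0

    def on_receive(self, seq, now):
        """Record the arrival of packet `seq` at time `now`."""
        if seq >= self.next_expected:
            # Every sequence number skipped over becomes a known gap.
            for gap in range(self.next_expected, seq):
                self.missing.setdefault(gap, now)
            self.next_expected = seq + 1
        # A late or repaired packet closes its gap.
        self.missing.pop(seq, None)

    def expire(self, now):
        """Return the sequence numbers abandoned because the threshold elapsed."""
        abandoned = sorted(
            s for s, noticed in self.missing.items()
            if now - noticed >= self.give_up_after
        )
        for s in abandoned:
            del self.missing[s]
        return abandoned


tracker = GapTracker(give_up_after=0.1)
tracker.on_receive(0, now=0.0)
tracker.on_receive(3, now=0.01)   # packets 1 and 2 are now missing
tracker.on_receive(1, now=0.05)   # packet 1 arrives late
print(tracker.expire(now=0.2))    # packet 2 is abandoned
```

Abandoned sequence numbers would be reported to the application, which then decides how to degrade gracefully.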

Ricochet: Lateral Error Correction

 Receivers exchange error-correction XORs of multicast traffic

 Works very well with multiple groups: scales up to a thousand groups per node

 Probabilistic Timeliness: probability distribution of delivery latencies
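The XOR arithmetic behind this kind of error correction can be sketched as follows. The sketch shows only how one XOR repair recovers one missing packet; which packets a receiver combines, and which peers it sends repairs to, are not specified on these slides and are left out. The function names and equal-length packets are assumptions made for illustration.

```python
from functools import reduce


def xor_bytes(a, b):
    """XOR two byte strings, padding the shorter one with zero bytes."""
    n = max(len(a), len(b))
    a, b = a.ljust(n, b"\x00"), b.ljust(n, b"\x00")
    return bytes(x ^ y for x, y in zip(a, b))


def make_repair(packets):
    """Build a repair packet as the XOR of all the given packets."""
    return reduce(xor_bytes, packets)


def recover(repair, received):
    """Recover the single missing packet from a repair and the other packets."""
    return reduce(xor_bytes, received, repair)


# Hypothetical equal-length payloads; a real protocol would also carry lengths.
packets = [b"radar-01", b"track-07", b"plan-113"]
repair = make_repair(packets)

# A receiver that lost the second packet rebuilds it from the repair.
lost = recover(repair, [packets[0], packets[2]])
print(lost)
```

A single XOR can repair only one loss among the packets it covers, which is one reason bursty loss is a harder case than isolated loss.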

Predictive Total Ordering (Plato)

 Delivers messages to applications with no ordering delay in most cases

 Orders messages only if there is a high probability of out-of-order delivery across different nodes

 Probabilistic Timeliness: probability distribution of ordered delivery latency

Performance

 SRM takes seconds to recover lost packets

 Ricochet recovers almost all packets within ~70 milliseconds

Conclusion





Moving from real-time (R/T) to time-critical (T/C) yields huge benefits!

 Ricochet is faster… slashes latency… scalable…

 A clean delivery-delay curve is a powerful design tool, replacing traditional hard (but conservative) limits

We’re open for business:





Software and detailed paper available for download

Give it a try… tell us what you think!

www.cs.cornell.edu/projects/quicksilver/ricochet.html
