Matei Zaharia, Benjamin Hindman, Andy Konwinski, Ali Ghodsi, Anthony Joseph, Randy Katz, Scott Shenker, Ion Stoica
• Clusters of commodity servers have become a major computing platform in industry and academia
• Driven by data volumes outpacing the processing capabilities of single machines
• Democratized by cloud computing
• Some have declared that “the datacenter is the new computer”
• Claim: this new computer increasingly needs an operating system
• Not necessarily a new host OS, but a common software layer that manages resources and provides shared services for the whole datacenter, like an OS does for one host
• Growing number of applications
– Parallel processing systems: MapReduce, Dryad, Pregel, Percolator, Dremel, MR Online
– Storage systems: GFS, BigTable, Dynamo, SCADS
– Web apps and supporting services
• Growing number of users
– 200+ users of Facebook's Hadoop data warehouse, running near-interactive ad hoc queries
• Resource sharing across applications & users
• Data sharing between programs
• Programming abstractions (e.g. threads, IPC)
• Debugging facilities (e.g. ptrace, gdb)
Result: OSes enable a highly interoperable software ecosystem that we now take for granted
• Today, a scientist analyzing data on a single machine can pipe it through a variety of tools, write new tools that interface with these through standard APIs, and trace across the stack
• In the future, the scientist should be able to fire up a cloud on EC2 and do the same thing:
– Intermix a variety of apps & programming models
– Write new parallel programs that talk to these
– Get a unified interface for managing the cluster
– Debug and trace across all these components
• Hadoop MapReduce as common execution and resource sharing platform
• Hadoop InputFormat API for data sharing (see the sketch after this list)
• Abstractions for productivity programmers, but not for system builders
• Very challenging to debug across all the layers
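For illustration, a minimal Scala sketch of data sharing through the InputFormat API, assuming Spark as the second system and a hypothetical HDFS path: because Spark can read through Hadoop's InputFormat, it can consume the same files a MapReduce job wrote without either system knowing about the other.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object SharedDataScan {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("shared-data-scan"))

    // Read files written by a Hadoop MapReduce job through the same
    // InputFormat that MapReduce itself would use (hypothetical path).
    val records = sc.newAPIHadoopFile[LongWritable, Text, TextInputFormat](
      "hdfs://namenode/warehouse/events/part-*")

    // From here on it is an ordinary RDD of (byte offset, line) pairs.
    println(records.map(_._2.toString).filter(_.contains("ERROR")).count())

    sc.stop()
  }
}
```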
• Resource sharing:
– Lower-level interfaces for fine-grained sharing (Mesos is a first step in this direction; see the toy sketch after this list)
– Optimization for a variety of metrics (e.g. energy)
– Integration with network scheduling mechanisms (e.g. Seawall [NSDI '11], NOX, Orchestra)
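A toy sketch of the resource-offer idea behind fine-grained sharing. This is an illustration only, not Mesos's actual API; the Offer, Framework, and BatchJob names are invented for the example.

```scala
// Toy two-level scheduling via resource offers (illustration only, not Mesos's API).
case class Offer(host: String, cpus: Double, memGB: Double)
case class Task(framework: String, host: String, cpus: Double, memGB: Double)

trait Framework {
  def name: String
  // Given an offer of free resources, decide which tasks (possibly none) to launch.
  def accept(offer: Offer): Seq[Task]
}

// Example framework: launches 1-CPU / 2 GB tasks while it still has work queued.
class BatchJob(val name: String, var pending: Int) extends Framework {
  def accept(offer: Offer): Seq[Task] = {
    val fit = math.min(pending, math.min((offer.cpus / 1.0).toInt, (offer.memGB / 2.0).toInt))
    pending -= fit
    (1 to fit).map(_ => Task(name, offer.host, 1.0, 2.0))
  }
}

object OfferLoop {
  // The shared layer hands free resources to frameworks; each framework picks
  // what to run, so several frameworks share the cluster at task granularity.
  def allocate(free: Seq[Offer], frameworks: Seq[Framework]): Seq[Task] =
    free.zipWithIndex.flatMap { case (offer, i) =>
      frameworks(i % frameworks.size).accept(offer) // round-robin the offers
    }

  def main(args: Array[String]): Unit = {
    val cluster = Seq(Offer("node1", 4, 8), Offer("node2", 4, 8))
    val tasks   = allocate(cluster, Seq(new BatchJob("hadoop", 3), new BatchJob("mpi", 5)))
    tasks.foreach(println)
  }
}
```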
• Data sharing:
– Standard interfaces for cluster file systems, key-value stores, etc.
– In-memory data sharing (e.g. Spark, DFS cache), and a unified system to manage this memory
– Streaming data abstractions (analogous to pipes)
– Lineage instead of replication for reliability (RDDs); see the sketch below
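A minimal sketch of lineage-based fault tolerance with Spark RDDs (the HDFS path and tab-separated field layout are hypothetical): transformations are recorded as lineage, and a lost cached partition is recomputed from that lineage rather than restored from a replica.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LineageSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("lineage-sketch"))

    val logs   = sc.textFile("hdfs://namenode/logs/*") // base dataset on stable storage
    val errors = logs.filter(_.contains("ERROR"))      // recorded as lineage, not executed yet
    val fields = errors.map(_.split("\t")(1))          // more lineage (assumes tab-separated lines)
    fields.cache()                                     // keep in memory, without replication

    // Actions run the lineage; if a cached partition is later lost, only that
    // partition is recomputed from logs -> filter -> map.
    println(fields.count())
    println(fields.filter(_.contains("timeout")).count())

    sc.stop()
  }
}
```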
• Programming abstractions:
– Tools that can be used to build the next MapReduce / BigTable in a week (e.g. BOOM)
– Efficient implementations of communication primitives (e.g. shuffle, broadcast; see the sketch after this list)
– New distributed programming models
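A minimal sketch of two such primitives as they appear in Spark, broadcast and shuffle (the lookup table and input path are hypothetical): the runtime ships the table to each worker once and then shuffles by key, instead of every framework reimplementing these patterns itself.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("broadcast-sketch"))

    // Small lookup table needed by every task: broadcast it once per worker
    // instead of shipping it inside every task closure.
    val countryNames = Map("US" -> "United States", "SE" -> "Sweden")
    val table = sc.broadcast(countryNames)

    // Hypothetical input: lines like "US,1234".
    val visits = sc.textFile("hdfs://namenode/visits/*").map { line =>
      val Array(code, count) = line.split(",")
      (table.value.getOrElse(code, "unknown"), count.toLong) // read the broadcast copy
    }

    // reduceByKey uses the shuffle primitive to aggregate counts per country.
    visits.reduceByKey(_ + _).collect().foreach(println)
    sc.stop()
  }
}
```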
• Debugging facilities:
– Tracing and debugging tools that work across the cluster software stack (e.g. X-Trace, Dapper; see the toy sketch after this list)
– Replay debugging that takes advantage of limited languages / computational models
– Unified monitoring infrastructure and APIs
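A toy illustration of Dapper / X-Trace style cross-stack tracing; this is not either system's real API, and the Trace helper and layer names are invented for the example. The point is that every layer propagates the same trace id, so one request can be followed across the stack without logging into individual nodes.

```scala
import java.util.UUID

case class Span(traceId: String, layer: String, op: String, startMs: Long, endMs: Long)

object Trace {
  private val spans = scala.collection.mutable.ArrayBuffer[Span]()

  // Run `body` as a traced operation in some layer, recording a span for it.
  def span[T](traceId: String, layer: String, op: String)(body: => T): T = {
    val start = System.currentTimeMillis()
    try body
    finally spans.synchronized {
      spans += Span(traceId, layer, op, start, System.currentTimeMillis())
    }
  }

  def dump(traceId: String): Unit =
    spans.synchronized { spans.filter(_.traceId == traceId).foreach(println) }
}

object CrossStackDemo {
  // Each "layer" passes the trace id down instead of starting a new one.
  def storageRead(traceId: String, key: String): String =
    Trace.span(traceId, "storage", s"get($key)") { s"value-of-$key" }

  def executeTask(traceId: String, key: String): String =
    Trace.span(traceId, "execution", "task") { storageRead(traceId, key) }

  def runQuery(key: String): String = {
    val traceId = UUID.randomUUID().toString
    val result  = Trace.span(traceId, "query", "ad-hoc query") { executeTask(traceId, key) }
    Trace.dump(traceId)
    result
  }

  def main(args: Array[String]): Unit = println(runQuery("user:42"))
}
```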
• A successful datacenter OS might let users:
– Build a Hadoop-like software stack in a week using the OS’s abstractions, while gaining other benefits (e.g. cross-stack replay debugging)
– Share data efficiently between independently developed programming models and applications
– Understand cluster behavior without having to log into individual nodes
– Dynamically share the cluster with other users
• Datacenters need an OS-like software stack for the same reasons single computers did: manageability, efficiency & programmability
• An OS is already emerging in an ad-hoc way
• Researchers can help by taking a long-term approach towards these problems
• Focus on paradigms, not performance
– Industry is tackling performance but lacks the luxury of taking a long-term view on abstractions
• Explore clean-slate approaches
– Likelier to have impact here than in a “real” OS because datacenter software changes quickly!
• Bring cluster computing to non-experts
– Much harder, and more rewarding, than serving the biggest users