Cluster-on-Demand (COD), Justin Moore, Duke University

Slide 1
Cluster-on-Demand (COD)
Justin Moore
Duke University
Slide 2
How Big Is It?

500? 5,000? 25,000?

Clusters are growing

Clusters are expensive
– Power, A/C, Management …

How to manage {heat, power, failures}?

How to keep everything organized?

How to divide resources?
Slide 3

How Do You Use It?
We’ve got good middleware
– Batch queues, Internet Services, research apps …

But customers are very picky
– “Linux!” “FreeBSD!” “Windows!” “Minix!” “Minix??”
– “I only need it for 30 minutes!!”

Customers != administrators
– Contributing to the problem, not the solution

How to share and manage our clusters?
“Can’t we all just get along??”
Slide 4
COD: The More the Merrier

Automated framework for resource management

Owners define policies, customers define configs

COD creates, configures dynamic virtual clusters
– Isolated, secure collection of nodes
– Backed by network storage
– Automatic configuration: fast and OS-agnostic

Middleware negotiates allocations with COD
– Virtual Cluster Manager: COD-aware layer
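A minimal sketch of how a customer config and an owner policy might meet in a negotiation step. The slides do not give COD's actual interfaces, so every name here (`ClusterRequest`, `OwnerPolicy`, `negotiate`, the per-customer node cap) is a hypothetical illustration, not COD's API.

```python
from dataclasses import dataclass


@dataclass
class ClusterRequest:
    """What a customer asks for (hypothetical fields)."""
    customer: str
    os_image: str   # e.g. "linux" or "freebsd"
    min_nodes: int
    max_nodes: int
    minutes: int


@dataclass
class OwnerPolicy:
    """What the cluster owner allows (hypothetical fields)."""
    node_cap: dict      # customer name -> maximum nodes allowed
    max_minutes: int    # longest lease the owner will grant


def negotiate(req, policy, free_nodes):
    """Return the number of nodes granted, or 0 if the request cannot be met."""
    if req.minutes > policy.max_minutes:
        return 0
    cap = policy.node_cap.get(req.customer, 0)
    grant = min(req.max_nodes, cap, free_nodes)
    # Grant only if the customer's stated minimum can be satisfied.
    return grant if grant >= req.min_nodes else 0


policy = OwnerPolicy(node_cap={"sge": 8}, max_minutes=120)
request = ClusterRequest("sge", "linux", min_nodes=2, max_nodes=16, minutes=30)
granted = negotiate(request, policy, free_nodes=10)
```

Here the request is trimmed to the owner's cap rather than rejected, which is one possible reading of "negotiates"; a real manager could equally counter-offer or queue the request.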
Slide 5
Dynamic Virtual Clusters
[Diagram labels: COD Manager, DB, Ninja Virtual Cluster, SGE Virtual Cluster, Reserve pool (off-power), Node reallocation]
Example: CNN on 9/11
Slide 6

Those Wonderful Toys
Leverage open standards and open source
– DHCP, NFS, NIS, XML
– Only constraint is that Linux must support hardware
– PXELinux-based installer, Red Hat/Debian tools
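As an illustration of the PXELinux piece, the sketch below builds a per-node boot configuration. It relies only on standard PXELINUX conventions (a per-host file under `pxelinux.cfg/` named `01-` plus the MAC address in lowercase hex with dashes, and the `DEFAULT`, `LABEL`, `KERNEL`, `APPEND` directives). The kernel and initrd names and the helper functions are assumptions; the slides do not describe how COD's installer is actually configured.

```python
def pxelinux_config_name(mac):
    """File name PXELINUX looks up under pxelinux.cfg/ for an Ethernet MAC."""
    return "01-" + mac.lower().replace(":", "-")


def pxelinux_config(kernel, initrd, extra_args=""):
    """Render a one-entry PXELINUX config that boots an installer kernel."""
    lines = [
        "DEFAULT install",
        "LABEL install",
        f"  KERNEL {kernel}",
        f"  APPEND initrd={initrd} {extra_args}".rstrip(),
    ]
    return "\n".join(lines) + "\n"


# Hypothetical node MAC and installer image names.
name = pxelinux_config_name("00:1A:2B:3C:4D:5E")
config = pxelinux_config("vmlinuz", "initrd.img")
```

Writing a different file per MAC address is what would let one boot server hand each node the installer for whichever OS its virtual cluster asked for.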

Currently testing working COD prototype
– Core of policy-based scheduling engine: CSP-solver
– Framework of node requests + allocation negotiation
– OS- and filesystem-agnostic installer
– Testbed to examine policies and microbenchmarks
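To make the CSP idea concrete, here is a toy constraint-satisfaction search over node requests. It is a plain backtracking search with a simple look-ahead on minimums, not COD's solver, whose design the slides do not describe; the request format `(name, min_nodes, max_nodes)` and the function `solve` are assumptions.

```python
def solve(requests, capacity):
    """Assign node counts to requests without exceeding capacity.

    requests: list of (name, min_nodes, max_nodes) tuples.
    Returns a dict name -> nodes, or None if no assignment satisfies
    every request's minimum.
    """
    def search(i, remaining, assignment):
        if i == len(requests):
            return dict(assignment)
        name, lo, hi = requests[i]
        # Nodes that must stay free to cover the minimums of later requests.
        needed_later = sum(r[1] for r in requests[i + 1:])
        # Try the largest feasible grant first, then back off.
        for n in range(min(hi, remaining - needed_later), lo - 1, -1):
            assignment[name] = n
            result = search(i + 1, remaining - n, assignment)
            if result is not None:
                return result
            del assignment[name]
        return None

    return search(0, capacity, {})


allocation = solve([("sge", 2, 6), ("ninja", 3, 8)], capacity=8)
```

Earlier requests are favored because they are tried with their largest feasible grant first; an owner policy could change that ordering or add further constraints.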
Slide 7

COD: Size Doesn’t Matter
Enable management scalability for hosting centers
– Hierarchical policy-driven mechanisms
– Empower owners and customers
Details and paper at
http://www.cs.duke.edu/~justin/cod/
Slide 8
Questions?