Jeff Chase's Talk - Internet Systems Software Group

Slicing with SHARP
Jeff Chase
Duke University
Federated Resource Sharing
How do we safely share/exchange resources across domains?
– {Administrative, policy, security, trust} domains or sites
– Sharing arrangements form dynamic Virtual Organizations
– Physical and logical resources (data sets, services, applications)
Location-Independent Services
[Figure: clients reach a dynamic server set through request routing, under varying load.]
The last decade has yielded immense progress on building and
deploying large-scale adaptive Internet services.
• Dynamic replica placement and coordination
• Reliable distributed systems and clusters
• P2P, indirection, etc.
Example services: caching network or CDN, replicated Web service,
curated data, batch computation, wide-area storage, file sharing.
Managing Services
[Figure: a service manager observes the servers and clients and steers the service through sensor/actuator feedback control.]
• A service manager adapts the service to changes in
load and resource status.
– e.g., gain or lose a server, instantiate replicas, etc.
– The service manager itself may be decentralized.
• The service may have contractual targets for service
quality ─ Service Level Agreements or SLAs.
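As a rough illustration of such a control loop (a hypothetical sketch, not COD or SHARP code; the sensor/actuator interfaces and thresholds are invented), a service manager might periodically compare observed service quality against its SLA target and request or release servers:

import time

TARGET_LATENCY_MS = 200.0   # hypothetical SLA target
ADJUST_PERIOD_S = 60        # control interval

def control_loop(sensor, actuator):
    """Toy sensor/actuator feedback loop for a service manager."""
    while True:
        latency = sensor.observed_latency_ms()    # sense current service quality
        servers = sensor.active_server_count()
        if latency > 1.2 * TARGET_LATENCY_MS:
            actuator.request_servers(1)           # gain a server / instantiate a replica
        elif latency < 0.5 * TARGET_LATENCY_MS and servers > 1:
            actuator.release_servers(1)           # shed a server under light load
        time.sleep(ADJUST_PERIOD_S)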
An Infrastructure Utility
[Figure: services draw on a shared resource pool, gaining: resource efficiency, surge protection, robustness, “pay as you grow”, economy of scale, and geographic dispersion.]
In a utility/grid, the service obtains resources from a shared pool.
– Third-party hosting providers ─ a resource market.
– Instantiate service wherever resources are available and
demand exists.
– Consensus: we need predictable performance and SLAs.
The Slice Abstraction
• Resources are owned by sites (e.g., a node, cell, or cluster).
• Sites are pools of raw resources (e.g., CPU, memory, I/O, network).
• A slice is a partition, bundle, or subset of resources.
• The system hosts application services as “guests”, each running within a slice.
[Figure: services S1 and S2 each run in slices spanning sites A, B, and C.]
Slicing for SLAs
• Performance of an application depends on the
resources devoted to it [Muse01].
• Slices act as containers with dedicated bundles of
resources bound to the application.
– Distributed virtual machine / computer
• Service manager determines desired slice
configuration to meet performance goals.
• May instantiate multiple instances of a service, e.g.,
in slices sized for different user communities.
• Services may support some SLA management
internally, if necessary.
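To make the idea concrete, here is a deliberately simplified sketch of how a service manager might translate a performance goal into a slice's resource bundle. The sizing rule, names, and numbers are invented for illustration; this is not the Muse provisioning model.

import math
from dataclasses import dataclass

@dataclass
class ResourceBundle:
    servers: int
    cpu_shares: int
    memory_mb: int

def size_slice(target_req_per_s: float, per_server_capacity: float = 500.0) -> ResourceBundle:
    """Map an SLA throughput target onto a slice's resource bundle (toy rule)."""
    servers = max(1, math.ceil(target_req_per_s / per_server_capacity))
    return ResourceBundle(servers=servers,
                          cpu_shares=100 * servers,
                          memory_mb=2048 * servers)

# e.g., a 1800 req/s target yields a 4-server slice under this toy capacity model
print(size_slice(1800))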
Example: Cluster Batch Pools in
Cluster-on-Demand (COD)
• Partition a cluster into isolated virtual clusters.
– Virtual cluster owner has exclusive control over servers.
• Assign nodes to virtual clusters according to load, contracts, and
resource usage policies.
• Example service: a simple wrapper for SGE batch scheduler
middleware to assess load and obtain/release nodes.
[Figure: service managers (e.g., SGE) send node requests to the COD cluster, which issues grants.]
Dynamic Virtual Clusters in a Grid Site Manager [HPDC 2003]
with Laura Grit, David Irwin, Justin Moore, Sara Sprenkle
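In the spirit of that wrapper, a minimal sketch: poll the batch queue each epoch and grow or shrink the virtual cluster. The sge and cod interfaces below are invented stand-ins, not the real SGE middleware or COD API.

def adjust_virtual_cluster(sge, cod, min_nodes: int = 2) -> None:
    queued = sge.pending_jobs()        # assess load from the batch queue
    idle = sge.idle_nodes()
    if queued > 0 and idle == 0:
        cod.request_nodes(count=1)     # ask COD to grant another server
    elif queued == 0 and idle > 0 and sge.total_nodes() > min_nodes:
        node = sge.drain_one_idle_node()
        cod.release_nodes([node])      # return an unneeded server to the pool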
A Note on Virtualization
• Ideology for the future Grid: adapt the grid environment to the
service rather than the service to the grid.
– Enable user control over application/OS environment.
– Instantiate complete environment down to the metal.
– Don’t hide the OS: it’s just another replaceable component.
– Requires/leverages new underware for instantiation.
• Virtual machines (Xen, Collective, JVM, etc.)
• Net-booted physical machines (Oceano, UDC, COD)
– Innovate below the OS and alongside it (infrastructure
services for the control plane).
SHARP
• Secure Highly Available Resource Peering
• Interfaces and services for external control of federated
utility sites (e.g., clusters).
• A common substrate for extensible policies/structures for
resource management and distributed trust management.
– Flexible on-demand computing for a site, and flexible peering
for federations of sites
• From PlanetLab to the New Grid
– Use it to build a resource “dictatorship” or a barter economy,
or anything in between.
– Different policies/structures may coexist in different
partitions of a shared resource pool.
Goals
• The question addressed by SHARP is: how do the
service managers get their slices?
• How does the system implement and enforce policies
for allocating slices?
– Predictable performance under changing conditions.
– Establish priorities under local or global constraint.
– Preserve resource availability across failures.
– Enable and control resource usage across boundaries of
trust or ownership (peering).
– Balance global coordination with local control.
– Extensible, pluggable, dynamic, decentralized.
Non-goals
SHARP does NOT:
• define a policy or style of resource exchange;
– E.g., barter, purchase, or central control (e.g., PLC)
• care how services are named or instantiated;
• understand the SLA requirements or specific resource
needs of any application service;
• define an ontology to describe resources;
• specify mechanisms for resource control/policing.
Resource Leases
[Figure: service manager S1 sends a request to the Site A “authority”, which returns a signed lease.]
<lease>
<issuer> A’s public key </issuer>
<signed_part>
<holder> S1’s public key </holder>
<rset> resource description </rset>
<start_time> … </start_time>
<end_time> … </end_time>
<sn> unique ID at Site A </sn>
</signed_part>
<signature> A’s signature </signature>
</lease>
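A minimal sketch of what issuing such a lease might look like in code, assuming Ed25519 keys from Python's cryptography package and a JSON stand-in for the XML encoding (both are assumptions of the sketch, not SHARP's actual formats):

import json
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import ed25519

def issue_lease(site_key: ed25519.Ed25519PrivateKey, holder_pub_hex: str,
                rset: dict, start: int, end: int, sn: int) -> dict:
    # Field names mirror the lease XML above; serialization is simplified.
    signed_part = {"holder": holder_pub_hex, "rset": rset,
                   "start_time": start, "end_time": end, "sn": sn}
    payload = json.dumps(signed_part, sort_keys=True).encode()
    issuer_pub = site_key.public_key().public_bytes(
        serialization.Encoding.Raw, serialization.PublicFormat.Raw)
    return {
        "issuer": issuer_pub.hex(),
        "signed_part": signed_part,
        "signature": site_key.sign(payload).hex(),   # only the site authority holds this key
    }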
Agents (Brokers)
[Figure: service manager S1 requests resources from an agent, which requests them from the Site A authority; grants flow back through the agent to S1.]
• Introduce agent as intermediary/middleman.
• Factor policy out of the site authority.
• Site delegates control over its resources to the agent.
• Agent implements a provisioning/allocation policy for
the resources under its control.
Leases vs. Tickets
• The site authority retains ultimate control over its
resources: only the authority can issue leases.
– Leases are “hard” contracts for concrete resources.
• Agents deal in tickets.
– Tickets are “soft” contracts for abstract resources.
– E.g., “You have a claim on 42 units of resource type 7 at
3PM for two hours…maybe.” (also signed XML)
– Tickets may be oversubscribed as a policy to improve
resource availability and/or resource utilization.
– The subscription degree gives configurable assurance, spanning a continuum from a “hint” to a hard reservation (see the sketch below).
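The subscription degree can be pictured as a simple admission threshold at the agent. The class and policy below are hypothetical, not SHARP's agent implementation.

class TicketIssuer:
    """Issue tickets against holdings up to a configurable subscription degree.
    degree = 1.0 is a hard reservation; larger values trade assurance for
    availability and utilization."""
    def __init__(self, holdings_units: int, degree: float = 1.0):
        self.holdings = holdings_units
        self.degree = degree
        self.promised = 0

    def try_issue(self, units: int) -> bool:
        if self.promised + units <= self.holdings * self.degree:
            self.promised += units   # promises may exceed holdings when degree > 1
            return True
        return False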
Service Instantiation
[Figure: (1)–(2) the agent requests and receives a ticket from the site; (3)–(4) the service manager requests and receives a ticket from the agent; (5) the service manager redeems the ticket at the site; (6) the site authority grants a lease; (7) the site instantiates the service in a virtual machine.]
Like an airline ticket, a SHARP ticket must be redeemed for a
lease (boarding pass) before the holder can occupy a slice.
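Seen from the service manager, the flow reduces to three calls. The sketch below uses invented stub names (agent.request_ticket, authority.redeem_ticket, authority.instantiate), not the real XML-RPC interfaces.

def acquire_slice(agent, authority, rset: dict, start: int, end: int):
    ticket = agent.request_ticket(rset, start, end)   # steps 3-4: obtain a ticket from the agent
    lease = authority.redeem_ticket(ticket)           # steps 5-6: redeem the ticket for a lease
    return authority.instantiate(lease)               # step 7: the service runs in its slice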
Ticket Delegation
Transfer of resources occurs, e.g., as the result of a peering agreement or an economic transaction. The site has transitive trust in the delegate.
Agents may subdivide their tickets and delegate them to other entities in a cryptographically secure way.
Secure ticket delegation is the basis for a resource economy.
Delegation is accountable: an agent that promises the same resources to multiple receivers can be caught and held responsible.
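A sketch of what subdividing and delegating a ticket might look like: the current holder signs a new subticket for a subset of its resources and appends it to the chain. It reuses the Ed25519/JSON assumptions of the lease sketch above; the encodings are illustrative, not SHARP's.

import json
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import ed25519

def raw_pub_hex(key: ed25519.Ed25519PrivateKey) -> str:
    return key.public_key().public_bytes(
        serialization.Encoding.Raw, serialization.PublicFormat.Raw).hex()

def delegate(ticket: list, holder_key: ed25519.Ed25519PrivateKey,
             receiver_pub_hex: str, sub_rset: dict,
             start: int, end: int, sn: int) -> list:
    signed_part = {"principal": receiver_pub_hex, "rset": sub_rset,
                   "start_time": start, "end_time": end, "sn": sn}
    subticket = {
        "issuer": raw_pub_hex(holder_key),
        "signed_part": signed_part,
        "signature": holder_key.sign(
            json.dumps(signed_part, sort_keys=True).encode()).hex(),
    }
    return ticket + [subticket]   # delegation extends the chain; prior links are never edited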
Peering
Sites may delegate resource shares to multiple agents.
E.g., “Let my friends at UNC use 20% of my site this week and 80% next weekend. UNC can allocate their share to their local users according to their local policies. Allocate the rest to my local users according to my policies.”
Note: tickets issued at UNC self-certify their users.
A SHARP Ticket
<ticket>
<subticket>
<issuer> A’s public key </issuer>
<signed_part>
<principal> B’s public key </principal>
<agent_address> XML RPC redeem_ticket() </agent_address>
<rset> resource description </rset>
<start_time> … </start_time>
<end_time> … </end_time>
<sn> unique ID at Agent A </sn>
</signed_part>
<signature> A’s signature </signature>
</subticket>
<subticket>
<issuer> B’s public key </issuer>
<signed_part> … </signed_part>
<signature> B’s signature </signature>
</subticket>
</ticket>
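A receiving party (or the site authority) can check the delegation chain structurally: each subticket's signature must verify under its issuer's key, and each issuer must be the principal of the preceding subticket. A sketch under the same Ed25519/JSON assumptions as above; field names follow the XML, byte formats are assumptions of the sketch.

import json
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric import ed25519

def verify_chain(ticket: list) -> bool:
    prev_principal = None
    for sub in ticket:
        issuer = ed25519.Ed25519PublicKey.from_public_bytes(bytes.fromhex(sub["issuer"]))
        payload = json.dumps(sub["signed_part"], sort_keys=True).encode()
        try:
            issuer.verify(bytes.fromhex(sub["signature"]), payload)
        except InvalidSignature:
            return False
        if prev_principal is not None and sub["issuer"] != prev_principal:
            return False               # broken chain: issuer did not hold the parent claim
        prev_principal = sub["signed_part"]["principal"]
    return True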
Tickets are Chains of Claims
[Figure: a ticket is a chain of claim records. The anchor claim a (holder A) covers a.rset over a.term; its child claim b (holder B, issuer A, parent a) covers b.rset over b.term.]
A Claim Tree
The set of active claims for a site forms a claim tree. The site authority maintains the claim tree over the redeemed claims.
[Figure: a claim tree rooted at a 40-unit anchor, subdivided into claims of 25 and 8 units and further into final claims of 10, 9, 3, and 3 units; a ticket T corresponds to one chain of claims from the anchor down to a final claim.]
Ticket Distribution Example
[Figure: resource space over time, starting at t0. Agent A holds 40 units; B holds 8 and C holds 25; B grants 3 each to D and E; C grants 9 to F, 10 to G, and 7 to H, creating a conflict.]
• Site transfers 40 units
to Agent A
• A transfers to B and C
• B and C further
subdivide resources
• C oversubscribes its
holdings in granting to H,
creating potential conflict
Detecting Conflicts
[Figure: the same allocations plotted against the site’s 100-unit resource space. A’s 40-unit claim subdivides into 8 (B) and 25 (C); C’s children claim 9 + 10 + 7 = 26 units, exceeding C’s 25 units and exposing the conflict.]
Conflict and Accountability
• Oversubscription may be a deliberate strategy, an
accident, or a malicious act.
• Site authorities serve oversubscribed tickets FCFS;
conflicting tickets are rejected.
– Balance resource utilization and conflict rate.
• The authority can identify the accountable agent and
issue a cryptographically secure proof of its guilt.
– The proof is independently verifiable by a third party,
e.g., a reputation service or a court of law.
• The customer must obtain a new ticket for equivalent resources, using its proof of rejection.
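A toy model of the FCFS rule at the authority: commit tickets in arrival order against the capacity of their parent claim, and reject any ticket that would exceed it. The ticket fields here are invented for the sketch, and the rejection record merely stands in for the signed proof of guilt.

from collections import defaultdict

class SiteAuthority:
    def __init__(self):
        self.committed = defaultdict(int)   # units already committed per parent claim

    def redeem(self, ticket: dict) -> dict:
        parent, capacity = ticket["parent_claim"], ticket["parent_units"]
        if self.committed[parent] + ticket["units"] > capacity:
            # oversubscribed: reject and name the accountable issuer
            return {"rejected": True, "accountable": ticket["issuer"], "ticket": ticket}
        self.committed[parent] += ticket["units"]
        return {"rejected": False, "lease_units": ticket["units"]}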
Agent as Arbiter
Agents implement local policies for apportioning resources to
competing customers. Examples:
– Authenticated client identity determines priority.
– Sell tickets to the highest bidder.
– Meet long-term contractual obligations; sell the excess.
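For example, the second policy above (sell to the highest bidder) reduces to a sort over pending requests. A hypothetical sketch, not a SHARP-mandated policy:

def highest_bidder(requests, available_units):
    """Requests are (client, units, bid) tuples; grant scarce units to top bidders first."""
    granted = []
    for client, units, bid in sorted(requests, key=lambda r: r[2], reverse=True):
        if units <= available_units:
            granted.append((client, units))
            available_units -= units
    return granted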
Agent as Aggregator
Agents may aggregate resources
from multiple sites.
Example:
PlanetLab Central
– Index by location and resource attributes.
– Local policies match requests to resources.
– Services may obtain bundles of resources across a federation.
Division of Knowledge and Function
• Service Manager ─ knows the application:
– Instantiate the app, monitor its behavior, map SLA/QoS targets to resources, acquire contracts, renew/release.
• Agent/Broker ─ guesses global status:
– Availability of resources: what kind, how much, where (site grain).
– How much to expose about resource types? About proximity?
• Authority ─ knows local status:
– Resource status, configuration, placement, topology, instrumentation, thermals, etc.
Issues and Ongoing Work
• SHARP combines resource discovery and brokering in
a unified framework.
– Configurable overbooking degree allows a continuum.
• Many possibilities exist for SHARP agent/broker
representations, cooperation structures, allocation
policies, and discovery/negotiation protocols.
• Accountability is a fundamental property needed in
many other federated contexts.
– Generalize accountable claim/command exchange to
other infrastructure services and applications.
• Bidding strategies and feedback control.
Conclusion
• Think of PlanetLab as an evolving prototype for
planetary-scale on-demand utility computing.
– Focusing on app services that are light resource
consumers but inhabit many locations: network testbed.
• It’s growing organically like the early Internet.
– “Rough consensus and (almost) working code.”
• PlanetLab is a compelling incubator and testbed for
utility/grid computing technologies.
• SHARP is a flexible framework for utility resource
management.
• But it’s only a framework….
Performance Summary
• SHARP prototype complete and running across
PlanetLab
• Complete performance evaluation in paper
• 1.2s end-to-end time to:
– Request resource 3 peering hops away
– Obtain and validate tickets, hop-by-hop
– Redeem ticket for lease
– Instantiate virtual machine at remote site
• Oversubscription allows flexible control over
resource utilization versus rate of ticket rejection
Related Work
• Resource allocation/scheduling mechanisms:
– Resource containers, cluster reserves, Overbooking
• Cryptographic capabilities
– Taos, CRISIS, PolicyMaker, SDSI/SPKI
• Lottery ticket inflation
– Issuing more tickets decreases value of existing
tickets
• Computational economies
– Amoeba, Spawn, Muse, Millennium, eOS
• Self-certifying trust delegation
– PolicyMaker, PGP, SFS
Physical cluster
[Figure: COD servers backed by a configuration database (MySQL, confd) with a Web interface, DHCP, NIS, and MyDNS. Database-driven network install provides image upload, network boot, automatic configuration, and resource negotiation. The resource manager drives dynamic virtual clusters: a batch pool vCluster, a Web service vCluster, and an energy-managed reserve pool.]
SHARP
• Framework for distributed resource management,
resource control, and resource sharing across sites
and trust domains
Challenge: maintain local autonomy.
– SHARP approach: sites are the ultimate arbiters of their local resources; decentralized protocol.
Challenge: resource availability in the presence of agent failures.
– SHARP approach: tickets time out (leases).
Challenge: malicious actors.
– SHARP approach: signed tickets, controlled oversubscription, and auditing of the full chain of transfers.
[Figure: claim delegation among agents A, B, and C. The anchor claim a (holder A, issuer A, parent null) covers a.rset over a.term; claim b (holder B, issuer A, parent a) covers b.rset over b.term; claim c (holder C, issuer B, parent b) covers c.rset over c.term. The claim tree at time t0 spans the site’s resource space: a 40-unit anchor subdivides into claims of 25 and 8 units and further into future/final claims of 10, 9, 7, 3, and 3 units; ticket T corresponds to one chain of claims.]
ticket({c0, …, cn}) ≡ anchor(c0) ∧ ∀ i = 0..n−1: subclaim(ci+1, ci)
subclaim(claim c, claim p) ≡ c.issuer = p.holder ∧ c.parent = p.claimID ∧ contains(p.rset, c.rset) ∧ subinterval(c.term, p.term)
anchor(claim a) ≡ a.issuer = a.holder ∧ a.parent = null
contains(rset p, rset c) ≡ c.type = p.type ∧ c.count ≤ p.count
subinterval(term c, term p) ≡ p.start ≤ c.start ≤ p.end ∧ p.start ≤ c.end ≤ p.end
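These predicates transcribe directly into code. The sketch below represents claims as plain dictionaries and assumes the c.parent = p.claimID reading of the subclaim rule; it is an illustration only.

def anchor(a: dict) -> bool:
    return a["issuer"] == a["holder"] and a["parent"] is None

def contains(p_rset: dict, c_rset: dict) -> bool:
    return c_rset["type"] == p_rset["type"] and c_rset["count"] <= p_rset["count"]

def subinterval(c_term: dict, p_term: dict) -> bool:
    return (p_term["start"] <= c_term["start"] <= p_term["end"] and
            p_term["start"] <= c_term["end"] <= p_term["end"])

def subclaim(c: dict, p: dict) -> bool:
    return (c["issuer"] == p["holder"] and c["parent"] == p["claimID"] and
            contains(p["rset"], c["rset"]) and subinterval(c["term"], p["term"]))

def valid_ticket(claims: list) -> bool:
    # A ticket is valid if its first claim is an anchor and each claim is a
    # subclaim of its predecessor.
    return anchor(claims[0]) and all(
        subclaim(claims[i + 1], claims[i]) for i in range(len(claims) - 1))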
Mixed-Use Clustering
Virtual clusters (examples): BioGeometry batch pool, SIMICS/Arch batch pool, Internet Web/P2P emulations, a student semester project, somebody’s buggy hacked OS ─ all hosted on shared physical clusters.
Vision: issue “leases” on isolated partitions of the shared cluster for different uses, with push-button Web-based control over software environments, user access, file volumes, and DNS names.
Grids are federated utilities
• Grids should preserve the control and isolation
benefits of private environments.
• There’s a threshold of comfort that we must reach
before grids become truly practical.
– Users need service contracts.
– Protect users from the grid (security cuts both ways).
• Many dimensions:
– decouple Grid support from application environment
– decentralized trust and accountability
– data privacy
– dependability, survivability, etc.
COD and Related Systems
Other cluster managers based on database-driven PXE installs:
– Oceano hosts Web services under dynamic load.
– Netbed/Emulab configures static clusters (flexible configuration) for emulation experiments.
– NPACI Rocks is open source and configures Linux compute clusters.
COD is OS-agnostic and manages dynamic clusters: it addresses hierarchical dynamic resource management in mixed-use clusters with pluggable middleware (“multigrid”).
Dynamic Virtual Clusters
[Figure: the COD manager, backed by the COD database and a Web interface, allocates resources in units of raw servers via database-driven network install. Pluggable service managers ─ batch schedulers (SGE, PBS), Grid PoP, Web services, etc. ─ drive virtual clusters #1 and #2; idle servers wait in an off-power reserve pool.]
Enabling Technologies
• Linux driver modules, Red Hat Kudzu, partition handling
• DHCP host configuration
• DNS, NIS, etc. for user/group/mount configuration
• NFS and other network storage; automounter
• PXE network boot
• IP-enabled power units
• Ethernet VLANs
• Power management: APM, ACPI, Wake-on-LAN
Recent Papers on Utility/Grid Computing
• Managing Energy and Server Resources in Hosting Centers [ACM Symposium on Operating Systems Principles 2001]
• Dynamic Virtual Clusters in a Grid Site Manager [IEEE High-Performance Distributed Computing 2003]
• Model-Based Resource Provisioning for a Web Service Utility [USENIX Symposium on Internet Technologies and Systems 2003]
• An Architecture for Secure Resource Peering [ACM Symposium on Operating Systems Principles 2003]
• Balancing Risk and Reward in a Market-Based Task Manager [IEEE High-Performance Distributed Computing 2004]
• Designing for Disasters [USENIX Symposium on File and Storage Technologies 2004]
• Comparing PlanetLab and Globus Resource Management Solutions [IEEE High-Performance Distributed Computing 2004]
• Interposed Proportional Sharing for a Storage Service Utility [ACM Symposium on Measurement and Modeling of Computer Systems 2004]