Verifiable Resource Accounting for Cloud Computing Services
Vyas Sekar, Petros Maniatis
ISTC for Secure Computing

State of cloud computing today
• “It's that dreaded time of the month again, the time of the month that we, the 400,000+ Amazon Web Service consumers, await with great anticipation / horror. What I'm talking about is the Amazon Web Services Billing Statement sent at the beginning of each month.”
• “As it turns out, Microsoft doesn't disclose revenues related to its cloud services. And on that matter, it's not alone. Neither do Amazon, Google, or IBM.”
• Need stronger, verifiable resource accounting!

Divided opinions on “better accounting”
• One view: non-problem
  – Technically “easy”
  – Market forces will solve this!
• Other view: “obviously” critical problem
  – But, we don’t know how!!
• Little systematic research on this topic!

Goal of this work
• Stimulate active discussion
• Our own position: “obviously critical”
• Sketch a technical framework for how

Outline
• Motivation
• Problem definition
• Did-I verifiability
• Should-I verifiability
• Discussion
• Ongoing work

Problem Setup
[Diagram: Customer, Provider, Trusted Layer, and Verifier]
• Task (T): submitted by the Customer to the Provider
• Report (R): returned by the Provider
• Witness (W): produced by the Trusted Layer
• Attribution Model (A): e.g., SLA-like contract
• The Verifier takes (T, R, W, A) as input

What does verifiability mean?
[Diagram: the Verifier takes Task, Report, Witness, Attribution (T, R, W, A) and answers the Customer]
1. Did I use the resources billed?
  – T did physically consume X cycles, Y GB RAM, Z MB bandwidth
  – Is P double counting or overcharging?
2. Should I have used these resources?
  – e.g., Was it because of poor scheduling by P?
  – Did T consume more due to “contention” with T’ on same CPU?
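
To make the Did-I question concrete, here is a minimal Python sketch of what a verifier could do with (T, R, W, A). It is only an illustration under assumptions that are not in the slides: the report is modeled as per-resource billed totals, the witness as a list of per-epoch readings from the trusted layer, and the attribution model is reduced to a single relative tolerance. All names (`Report`, `Witness`, `did_i_check`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Report:
    """Provider's bill for one task (R): resource name -> units charged."""
    task: str
    billed: Dict[str, float]


@dataclass
class Witness:
    """Trusted-layer record (W): one dict per epoch, resource name -> units used."""
    task: str
    epochs: List[Dict[str, float]]

    def totals(self) -> Dict[str, float]:
        """Sum the per-epoch readings into one total per resource."""
        out: Dict[str, float] = {}
        for epoch in self.epochs:
            for resource, used in epoch.items():
                out[resource] = out.get(resource, 0.0) + used
        return out


def did_i_check(report: Report, witness: Witness,
                tolerance: float = 0.05) -> Dict[str, bool]:
    """Return resource -> True if the bill does not exceed witnessed use
    by more than `tolerance` (a relative margin).

    `tolerance` is an assumed stand-in for the attribution model (A);
    a real SLA-like contract would be richer than one number.
    """
    if report.task != witness.task:
        raise ValueError("report and witness refer to different tasks")
    observed = witness.totals()
    verdict: Dict[str, bool] = {}
    for resource, billed in report.billed.items():
        used = observed.get(resource, 0.0)  # unwitnessed resources count as 0
        verdict[resource] = billed <= used * (1.0 + tolerance)
    return verdict


if __name__ == "__main__":
    w = Witness("T1", [{"cpu": 5.0, "ram": 1.0}, {"cpu": 1.0, "ram": 0.0}])
    r = Report("T1", {"cpu": 6.0, "ram": 2.0})
    print(did_i_check(r, w))  # cpu matches the witness; ram is overbilled
```

The check is deliberately one-sided: it only flags overcharging, which is the concern the Did-I question raises. A billed resource that never appears in the witness is treated as zero use and therefore fails.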

Did-I Verifiability
[Diagram: customers C1 and C2 submit tasks T1 and T2 to Provider P and receive reports R1 and R2]
• T1, T2 did physically consume X1, X2 cycles
• i.e., P is not “double counting” or overcharging

A Clean-slate Solution
[Diagram: Task1 and Task2 run on Resource 1 and Resource 2; a “Trusted” hardware-root-of-trust emits the “Witness” table below]
• Visibility into low-level
• No spurious reports

Epoch   Resource1        Resource2
1       T1=5, T2=0       T1=1, T2=2
2       T1=1, T2=10      T1=0, T2=10
….

Challenges with Clean Slate
• Performance slowdown
• Bandwidth overhead
• Doesn’t exist yet!

Practical Approximations
• Bandwidth overhead
  – Aggregation
• Performance slowdown
  – Sampling or snapshots
• Relaxing hardware dependence
  – Small instruction stream recorder (not online)
  – Shim layer for monitoring

Should-I Verifiability
[Diagram: the Consumer's task T run on Provider P yields report R; the same T on an Ideal Provider P’ would yield R’]
• Is R very different from R’ in ideal case?
  – e.g., is P scheduling/allocating as it promised?
  – e.g., is R high because of contention?

Clean-slate Should-I
[Diagram: at the Provider, Customer requests and interrupts feed the Allocator, which emits decisions; at the Verifier, the log of requests and interrupts is fed to the Allocator and its decisions are compared with the log of decisions, the “Witness”]
• Allocator: e.g., this is the VMM or cluster scheduler implementing “weighted fair queuing”

Challenges with Clean-Slate
[Diagram: same as Clean-slate Should-I]
• Leak proprietary logic
• Log overhead
  – e.g., locate verifier or agent close to P

Balancing privacy vs accountability
[Diagram: same as Clean-slate Should-I, except that the Provider's allocator is an Allocator Template plus a hidden Private Policy, and the Verifier runs only the Allocator Template]
• e.g., Is the provider running a “fair queueing” scheduler?
• But “weights” are private policy

Alternative “Quantitative” Should-I
[Diagram: same as Clean-slate Should-I, with the callout “Leak proprietary logic” and a chart of Expected vs. Report values for the Task; series: CPU, Memory]
• Very different from SLA verification
• Not promising lower bound on “resources”
• Rather computing upper bound on “consumption”

Discussion
• Provider incentives
  – More adoption to avoid underutilization
  – Less conservative in accounting
  – Prevent customers from gaming the system
• Why markets may not suffice?
  – Infrastructure: few players
  – Cost of migrating is non-trivial
• Relaxing provider assistance
  – Resource prediction or collaborative inference

Summary
• Honeymoon phase for cloud is over
  – Need stronger verifiable accounting
• Benefits to consumers & providers
  – Side benefit: may encourage better practices
• Sketch a framework, potential solutions
  – Did-I and Should-I verifiability
• Working toward a practical realization
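
As a closing illustration of the “quantitative” Should-I idea (computing an upper bound on consumption instead of replaying the provider's allocator), a verifier might compare each reported figure with an expected ceiling for the task. The sketch below is hypothetical: the slides do not say how the expected bound is obtained, so `expected_upper_bound` is simply passed in, and the function name and sample numbers are made up.

```python
from typing import Dict, List


def should_i_flags(reported: Dict[str, float],
                   expected_upper_bound: Dict[str, float]) -> List[str]:
    """Return the resources whose reported consumption exceeds the expected bound.

    Resources with no expected bound are flagged too, because the
    verifier has nothing to vouch for them against. How the bound is
    derived (e.g., resource prediction) is outside this sketch.
    """
    flagged: List[str] = []
    for resource, amount in sorted(reported.items()):
        bound = expected_upper_bound.get(resource)
        if bound is None or amount > bound:
            flagged.append(resource)
    return flagged


if __name__ == "__main__":
    report = {"cpu": 60.0, "memory": 30.0}
    expected = {"cpu": 50.0, "memory": 40.0}
    print(should_i_flags(report, expected))  # cpu exceeds its bound
```

Unlike the replay-based design, this check needs no copy of the allocator, so it cannot leak the provider's scheduling logic. The trade-off is that it only says a report looks too high, not why.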