A Collaborative Monitoring Mechanism for Making a Multitenant Platform Accountable
HotCloud '10
Presented by Xuanran Zong

Background
• Applications are moving to the cloud
  – Pay-as-you-go basis
  – Resource multiplexing
  – Reduced over-provisioning cost
• Cloud service uncertainty
  – How do clients know whether the cloud provider handles their data and logic correctly?
    • Logic correctness
    • Consistency constraints
    • Performance

Service level agreement (SLA)
• To ensure data and logic are handled correctly, the service provider offers a service level agreement to clients
  – Performance
    • e.g., one EC2 compute unit provides the computation power of a 1.0-1.2 GHz CPU
  – Availability
    • e.g., the service will be up 99.9% of the time

SLA
• Problems
  – Few means are provided for clients to hold an SLA accountable when a problem occurs
    • Accountable means we know who is responsible when things go wrong
    • Monitoring is provided by the provider itself
  – Clients are often required to furnish all the evidence themselves to be eligible to claim credit for an SLA violation

EC2 SLA
Reference: http://usenix.org/events/hotcloud10/tech/slides/wangc.pdf

Accountability service
• Provided by a third party
• Responsibilities
  – Collect evidence based on the SLA
  – Runtime compliance checking and problem detection

Problem description
• Clients have a set of end-points {ep0, ep1, ..., ep(n-1)} that operate on data stored in a multitenant environment
• Many things can go wrong
  – Data is modified without the owner's permission
  – A consistency requirement is broken
• The accountability service should detect these issues and provide evidence

System architecture
• Wrapper provided by the third party
• The wrapper captures the input/output of each ep_i and sends it to the accountability service

Accountability service
• The accountability service maintains a view of the data state
  – Reflects what the data should be from the users' perspective
  – Aggregates users' data-update requests to calculate the data state
  – Authenticates query results against the calculated data state

Evidence collection and processing
• A logging service, w_ep, extracts operation information and sends a log message to the accountability service W
  – If it is an update operation, W updates the MB-tree
  – If it is a query operation, W authenticates the result against the MB-tree, checking correctness and completeness
  – The MB-tree maintains the data state

Data state calculation
• Use a Merkle B-tree (MB-tree) to maintain the data state
• By combining the items in the verification object (VO), we can recalculate the root of the MB-tree and compare it with the stored root, which reveals the correctness and completeness of the query result
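To make the VO check concrete, here is a minimal sketch of the root recomputation, assuming a plain binary Merkle hash tree rather than the paper's full MB-tree; the function names and the VO encoding (a list of sibling hashes with their sides along the leaf-to-root path) are illustrative assumptions, not the paper's API.

```python
import hashlib

def h(data: bytes) -> bytes:
    """SHA-256, standing in for whatever hash the MB-tree uses."""
    return hashlib.sha256(data).digest()

def leaf_hash(value: bytes) -> bytes:
    """Domain-separated hash for leaf records."""
    return h(b"leaf:" + value)

def verify_with_vo(result_value: bytes, vo, trusted_root: bytes) -> bool:
    """Recompute the root from a returned record plus its VO.

    `vo` is a list of (sibling_hash, side) pairs from leaf to root,
    where side is "L" if the sibling is the left child. Matching the
    trusted root shows the record is authentic (correctness).
    """
    node = leaf_hash(result_value)
    for sibling, side in vo:
        node = h(sibling + node) if side == "L" else h(node + sibling)
    return node == trusted_root

# Usage over a 4-leaf tree that the verifier builds itself.
leaves = [leaf_hash(v) for v in (b"a", b"b", b"c", b"d")]
l01, l23 = h(leaves[0] + leaves[1]), h(leaves[2] + leaves[3])
root = h(l01 + l23)
vo_for_c = [(leaves[3], "R"), (l01, "L")]  # siblings on c's path
assert verify_with_vo(b"c", vo_for_c, root)
```

The real MB-tree additionally carries key boundaries in the VO, which is what lets W prove completeness (no qualifying record was silently dropped from a range query) as well as correctness.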
Consistency issue
• What if the log messages arrive out of order?
  – Assume eventual consistency
  – Assume clocks are synchronized
  – Maintain a sliding window of log messages sorted by timestamp
  – The window size is determined by the maximum delay of passing a log message from a client to W

Collaborative monitoring mechanism
• Current approach
  – Centralized, which limits availability, scalability, and trustworthiness
• Let's make it distributed
  – The data state is maintained by a set of services
  – Each service maintains a view of the data state

Design choice I
• A log is sent to one data state service, which then propagates it to the other services synchronously
  – Pros
    • Strong consistency
    • A request can be answered by any service
  – Cons
    • Large overhead due to synchronous communication

Design choice II
• A log is sent to one service, which propagates it asynchronously
  – Pros
    • Better logging performance
  – Cons
    • Uncertainty in answering an authentication request

Their design
• Somewhere in between the two extremes
• Partition the key range into a few disjoint regions
• A log message is sent only to its designated region
• A log message is propagated synchronously within its region and asynchronously across regions
• An authentication request is directed to the service whose region overlaps most with the request range (see the routing sketch at the end of these notes)
  – Answer with certainty if the request range falls entirely inside the service's region
  – Wait for propagation otherwise

Evaluation
• Overhead
  – Centralized design
  – Where does the overhead come from?

Evaluation
• VO calculation overhead

Evaluation
• Performance improvement with multiple data state services

Discussion
• The paper articulates the problem clearly and shows one solution that employs a third party to make the data state accountable
• Which part is the main overhead?
  – Communication? VO calculation?
• The distributed design does not help much when the query range is large
• Do people want to sacrifice performance (at least doubling the time) to make the service accountable?
• Can we use a similar design to make other aspects accountable? For instance, performance?
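Routing sketch (referenced from "Their design" above): a minimal illustration of directing an authentication request to the service with the largest range overlap, assuming regions are contiguous, disjoint key intervals; the names `Region` and `route_request` are hypothetical, not from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Region:
    """A data state service responsible for a contiguous key interval."""
    service: str
    lo: int  # inclusive lower bound of the key range
    hi: int  # exclusive upper bound of the key range

def overlap(region: Region, lo: int, hi: int) -> int:
    """Length of the intersection of the region with the range [lo, hi)."""
    return max(0, min(region.hi, hi) - max(region.lo, lo))

def route_request(regions: List[Region], lo: int, hi: int) -> Tuple[str, bool]:
    """Pick the service whose region overlaps the request range most.

    Returns (service, certain): certain is True when the request range
    falls entirely inside that service's region, so it can answer
    immediately; otherwise it must wait for asynchronously propagated
    logs from the other regions before answering.
    """
    best = max(regions, key=lambda r: overlap(r, lo, hi))
    certain = best.lo <= lo and hi <= best.hi
    return best.service, certain

# Usage: three services partitioning the key space [0, 300).
regions = [Region("W0", 0, 100), Region("W1", 100, 200), Region("W2", 200, 300)]
print(route_request(regions, 120, 180))  # ('W1', True): fits in one region
print(route_request(regions, 150, 280))  # ('W2', False): spans two regions
```

This also makes the discussion point above concrete: the wider the query range, the more regions it spans, so the less often any single service can answer with certainty.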