Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA
Ioannis Konstantinou
CSLAB, National Technical University of Athens, Greece

Motivation – the story (1)
- "Big data" calls for highly scalable distributed solutions
  - (Web) analytics, science, business
  - Store + analyze everything, no matter the size
  - Traditional databases not up to the task
- Applications suffer from highly variable and unpredictable workloads
  - Social networks, internet gaming, web sites, etc.
  - Over-provisioning is costly, under-provisioning leads to outages
- Cloud computing can help!
  - Elastic, pay-as-you-go resource provisioning
  - Suitable for distributed scalable applications
- Can we take advantage of elasticity in an automated, fine-tuned, application-agnostic manner?

Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA - I. Konstantinou 14/05/2013

Motivation – the story (2)
- NoSQL
  - Non-relational
  - Horizontally scalable
  - Distributed
  - Open source
  - And often: schema-free, easily replicated, simple API, eventually consistent (not ACID), big-data-friendly, etc.
- Many, many implementations…
  - Currently around 150, see http://nosql-database.org/

NoSQLs and elasticity
- Column family: HBase, Cassandra, …
- Document store: CouchDB, MongoDB, …
- Key-value store: Riak, Dynamo, Voldemort, …
- Many offer elasticity + sharding:
  - Expand/contract resources according to demand
  - Shared-nothing architecture allows that
  - Pay-as-you-go, robustness, performance
    - Important! See the Apr 2011 Amazon outage (foursquare, reddit, …)
- Manual and simple threshold-based elastic actions are suboptimal

thus… (end of the story)
- PaaS and NoSQLs are (or should be) inherently elastic
- How efficiently do they implement elasticity?
  - NoSQLs over an IaaS platform (EC2, Eucalyptus, OpenStack, …)
  - We need a study that registers qualitative + quantitative results
- Can we automate the elasticity procedure?
  - For any NoSQL, any user-given policy, any metric of interest?
- Can we bundle everything together?
  - The Tiramola system

Contributions
- Tiramola, a generic modular system able to manage any NoSQL engine
  - User-defined policies
  - Automatic resource provisioning using IaaS clouds
- Module for NoSQL cluster monitoring
  - Real-time reporting of client, application and general-purpose metrics
- Decision-making module for automatic elasticity
  - Implementation as a Markov Decision Process
  - Continuously identifies the best scaling action according to the observed system state

Contribution side-effects
- Coding + infrastructure
  - Open-source Python code (GFOSS + Google Code): http://tiramola.googlecode.com
  - Using cloud-based client tools, platform-agnostic
  - Euca2ools guarantee execution in numerous cloud platforms
- YCSB clients
- Cassandra, HBase implementation
  - Almost: Voldemort, Riak

Related work
- Elastic Nimbus [13], AutoScale [14], SmartSLA [15], iBalloon [17], Kingfisher [18]
  - Do not handle NoSQL systems
  - Do not address dynamic cluster resizes
- [19] elastic HDFS
  - Specific HDFS implementation
  - No fine-grained policy support, no support for auto-learning
- Cloudy [20], Nefeli [21], Microeconomics [23]
  - [20] provides only support for scaling
  - [21] requires software installation at the cloud vendor
  - [22] classic microeconomics for profit maximization, no support for fine-grained policy definitions
- Systems like Autoscaling [24], RightScale [26], Scalr [27]
  - Commercial: vendor lock-in
  - Limited metric support, primitive decision-making policies

Tiramola architecture overview
(figure: architecture diagram)

Tiramola monitoring module
- Ganglia tool
  - Suitable for clusters
  - Easy to install/maintain
  - Low overhead (UDP)
- Metrics
  - General purpose, like CPU, RAM
  - App-specific and user metrics: gmetric spoofing

Tiramola cloud management module
- Translates resize actions
  - Contacts the IaaS provider
  - Acquires or releases cloud resources
- Euca2ools
  - EC2 compliant
  - Support for any cloud management
- Precooked VM images
  - Ready-to-launch AMIs
  - Contain all necessary NoSQL libraries

Tiramola cluster coordinator module
- Operates when cloud management finishes resource (de-)allocation
- Orchestrates NoSQL when resources change
- Implements specific NoSQL resizing actions
  - Remote execution of shell scripts
  - Injection of new config files
- Current support for: HBase, Cassandra, Riak

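As an illustration of the "gmetric spoofing" item on the monitoring module slide, the following is a minimal sketch of how a client-side metric could be published into Ganglia on behalf of another host. It only builds the `gmetric` command line and does not run it. The metric name `ycsb_read_latency`, the IP address and the host name are hypothetical and not taken from the Tiramola code; the flags (`--name`, `--value`, `--type`, `--units`, `--spoof`) are those of the standard Ganglia `gmetric` tool.

```python
import shlex


def gmetric_command(name, value, metric_type="float", units="", spoof_host=None):
    """Build the argv for one gmetric call (the command is not executed here)."""
    argv = ["gmetric", "--name", name, "--value", str(value), "--type", metric_type]
    if units:
        argv += ["--units", units]
    if spoof_host is not None:
        # gmetric expects the spoofed identity as "IP:hostname".
        ip, hostname = spoof_host
        argv += ["--spoof", f"{ip}:{hostname}"]
    return argv


# Hypothetical example: report a YCSB client latency as if it came from the client VM.
cmd = gmetric_command(
    "ycsb_read_latency", 12.5, units="ms", spoof_host=("10.0.0.21", "ycsb-client-1")
)
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run` on a machine with Ganglia installed would send the metric; building the argument list separately keeps the sketch testable without a Ganglia deployment.
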
Tiramola core: Decision-Making module
- Formulation as an MDP: $\{S, A, \{P_{i\alpha j}\}, \gamma, R_{i\alpha j}\}$
  - State = # running VMs
  - Actions = {add-n, rem-n, no-op}
  - P: transition probabilities going from state i to j
- Reward function R sets the policy for the resize
  - Immediate gain for going to state s
  - $R(s) = f(\text{gains}, \text{costs})$
- Identify $V^{\pi}(s) = E\left[\sum_{k} \gamma^{k} r_{k+t+1} \mid s_t = s\right]$ (expected sum of rewards)
- Thus the optimal value function is $V^{*}(s) = R(s) + \max_{a} \gamma \sum_{s'} P_{s a s'} V^{*}(s')$ (Bellman's equation)
- System of equations; the optimal policy is greedy w.r.t. $V^{*}$

Large degree of freedom
- MDP allows for an optimal solution without previous knowledge
  - The system learns from previous experience, by "exploring" permissible states
- Learning is real-time and continues during execution
  - Decision making is auto-tuned, reacting to environment and reward changes
- Generically applicable
  - Arbitrary reward functions allow for virtually any optimization policy

Estimate R(s) for possible transitions
- How to know the exact R(s) for all s, without making the transition?
  - Assume "deterministic", reliable cluster behavior
  - Similar input ⇨ similar performance metrics
- Idea: cluster measurements around current load
  - r(s) = f(latency, VMs)
  - Add 2 extra dimensions: #VMs, load
  - Find, for all permissible transitions, what the latency would be

Architectural considerations
- Robustness
  - Daemon process that checkpoints and can be restarted
  - State is provided from the IaaS cloud and the monitoring module
  - Applicable timeouts (not real-time systems!)
- Modularity
  - Different interchangeable components
- Expandability
  - APIs that utilize primitives (NoSQL and policies)
- Speed (irrelevant in most cases)
  - Written in Python

Platform Setup
- Cloud
  - Private OpenStack Cactus cluster
- Cluster
  - A total of 16 server VMs
  - 2-core processor, 4GB RAM, 50GB disk space
  - Similar to an Amazon EC2 large instance
  - 4-core processor, 8GB RAM, 50GB disk space
  - A total of 20 client VMs
- Storage
  - QCOW image: 1.6GB compressed, 4.3GB uncompressed
  - VM root fs instead of EBS (Reddit outage)

Clients, Data and Workloads
- HBase (v. 0.20.6), Cassandra (v. 0.7.0 beta)
  - Hadoop 0.20.2
  - 8 initial nodes (VMs)
  - Ganglia 3.1.2
- YCSB tool
  - Database: 20M objects, 20GB raw (Cassandra ~60GB, HBase ~90GB)
  - Loads: UNI_R, UNI_U, UNI_RMW, ZIPF_R
    - Default: uniform read, 10%-50% range
- Both client (YCSB) and cluster (Ganglia) metrics reported

1. Metrics affected
- Stress systems by increasing the client request rate λ
  - UNI_R workload
  - 8-node cluster, no resize
- Identify critical operation points
- Max throughput
  - HBase μmax = 80K reqs/sec @ λ = 80K reqs/sec
  - Cassandra μmax = 13K reqs/sec @ λ = 20K reqs/sec
- Max total cluster CPU usage
  - HBase CPUmax = 55%
  - Cassandra CPUmax = 76%
- Further λ increase has no effect
  - Systems are fully utilized and requests are queued

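The idea of a "critical operation point" (the offered load λ beyond which throughput μ stops growing) can be sketched as a small search over measured (λ, μ) pairs. This is an illustration only: the function name `critical_point`, the 2% gain threshold and the capacity value are assumptions, and the sample curve is synthetic and idealized, not the measurements reported on the slide.

```python
def critical_point(samples, min_gain=0.02):
    """Return the first (offered_load, throughput) pair after which throughput
    grows by less than `min_gain` (relative) when the offered load is increased.

    `samples` must be sorted by offered load and contain positive throughputs.
    """
    for (lam0, mu0), (lam1, mu1) in zip(samples, samples[1:]):
        if mu1 - mu0 < min_gain * mu0:
            return lam0, mu0
    # Throughput was still growing at the last sample: no saturation observed.
    return samples[-1]


# Synthetic curve: throughput follows the offered load until a hypothetical
# cluster capacity is reached, then stays flat while requests queue up.
HYPOTHETICAL_CAPACITY = 12_000  # reqs/sec, made up for this example
samples = [(lam, min(lam, HYPOTHETICAL_CAPACITY)) for lam in range(5_000, 30_001, 5_000)]

print(critical_point(samples))
```

With real measurements the curve is noisier, so a smoothing step or a larger threshold would probably be needed before applying this kind of test.
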
The setup
- λstart = 180K reqs/sec (way over critical point)
- At time t = 370 sec:
  - Double the cluster size (add extra 8 nodes)
  - Triple the cluster size (add extra 16 nodes)
- 4 different experiments
  - READ+8, READ+16, UPDATE+8 and UPDATE+16
- Measure client and cluster-side metrics
  - Query latency, throughput and total cluster usage vs. time

2. HBase cluster resize
- For READ, throughput is (roughly) doubled and tripled
  - More servers handle more requests
  - Data is not transferred, but it is cached
- Update is not affected
  - I/O bound operation
  - Updates will be handled by the initial 8 nodes
  - Only new regions due to compaction will be served by new nodes

Evaluating Tiramola – setup
- Initial state: S4; cluster sizes from 1 to 16 VMs allowed
- Sinusoid-like READ loads: vary peak, periodicity
  - Alter YCSB clients
- Reward functions (proof of concept)
  - $r_1(s) = -C \cdot |VMs|$
  - $r_2(s) = B \cdot thr$
  - $r_3(s) = B \cdot thr - C \cdot |VMs|^2$
  - $r_4(s) = B \cdot thr - C \cdot |VMs| - A \cdot lat$
- Monitor every min, decide every 10 min, 5 min backoff
- Training period for initial data points
  - Not necessary, but allows quicker "correct" decisions

Evaluating Tiramola – 1
(figure panels: r1, r2; r3, r4)

Evaluating Tiramola – 2
(figure panels: different amplitude; different periodicity)

Evaluating Tiramola – 3
(figure panels: initial training set close to applied load, min and max training; initial training set very different from applied load, min and max training)

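To make the MDP formulation and the proof-of-concept reward functions more concrete, here is a minimal value-iteration sketch over states 1 to 16 VMs. It is not the Tiramola implementation. It assumes deterministic transitions (an action always lands in the intended state), a fixed load, a linear per-VM capacity model, and made-up constants `B`, `C`, `LOAD` and `PER_VM_CAPACITY`; the reward has the form of r3 from the setup slide, and the action set is restricted to adding or removing one VM or doing nothing.

```python
def value_iteration(states, actions, reward, gamma=0.9, tol=1e-6):
    """Iterate V(s) = R(s) + gamma * max_a V(s + a) until the values settle.

    Transitions are assumed deterministic, so the sum over s' in Bellman's
    equation collapses to the single successor state s + a.
    """
    lo, hi = min(states), max(states)
    v = {s: 0.0 for s in states}
    while True:
        new_v = {
            s: reward(s) + gamma * max(v[s + a] for a in actions if lo <= s + a <= hi)
            for s in states
        }
        delta = max(abs(new_v[s] - v[s]) for s in states)
        v = new_v
        if delta < tol:
            return v


def greedy_policy(states, actions, v):
    """For each state pick the permissible action whose successor has the highest value."""
    lo, hi = min(states), max(states)
    return {
        s: max((a for a in actions if lo <= s + a <= hi), key=lambda a: v[s + a])
        for s in states
    }


# Hypothetical workload model and constants (illustration only).
LOAD = 40_000            # offered load, reqs/sec
PER_VM_CAPACITY = 5_000  # reqs/sec one VM is assumed to serve
B, C = 1.0, 100.0


def throughput(vms):
    return min(LOAD, PER_VM_CAPACITY * vms)


def r3(vms):
    # Same form as r3(s) = B * thr - C * |VMs|^2.
    return B * throughput(vms) - C * vms ** 2


states = list(range(1, 17))  # 1 to 16 VMs allowed
actions = [-1, 0, 1]         # remove one VM, no-op, add one VM
values = value_iteration(states, actions, r3)
policy = greedy_policy(states, actions, values)
print(policy[4], policy[8], policy[12])
```

Under these made-up numbers the greedy policy adds a VM when started at 4 VMs, holds at 8 VMs (where the assumed capacity matches the load) and removes a VM at 12. In the real system the reward of unvisited states has to be estimated from monitored measurements instead of a closed-form model.
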
Questions
- "TIRAMOLA: Elastic NoSQL Provisioning through a Cloud Management Platform", SIGMOD 2012 (Demo Track)
- "Automatic Scaling of Selective SPARQL Joins Using the TIRAMOLA System", SWIM 2012
- "On the Elasticity of NoSQL Databases over Cloud Management Platforms", CIKM 2011
- "Elastic NoSQL databases over the Cloud", ΕΛ/ΛΑΚ 2011
- CELAR: "Automatic, multi-grained elasticity-provisioning for the Cloud", FP7, http://www.celarcloud.eu/
- http://tiramola.googlecode.com