presentation - Computing Systems Laboratory

Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA
Ioannis Konstantinou
CSLAB, National Technical University of Athens, Greece
Motivation – the story (1)

'Big-data' opts for highly scalable distributed solutions
  (Web) analytics, science, business
  Store + analyze everything, no matter the size
  Traditional databases not up to the task

Applications suffer from highly variable and unpredictable workloads
  Social networks, internet gaming, web sites, etc.
  Over-provisioning is costly, under-provisioning leads to outages

Cloud computing can help!
  Elastic, pay-as-you-go resource provisioning
  Suitable for distributed scalable applications

Can we take advantage of elasticity in an automated, fine-tuned, application-agnostic manner?

Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA - I. Konstantinou
14/05/2013
Motivation – the story (2)

NoSQL
  Non-relational
  Horizontally scalable
  Distributed
  Open source

And often:
  schema-free, easily replicated, simple API, eventually consistent (not ACID), big-data-friendly, etc.

Many, many implementations…
  Currently around 150, see http://nosql-database.org/

NoSQLs and elasticity

Column family
  HBase, Cassandra, …
Document store
  CouchDB, MongoDB, …
Key-value store
  Riak, Dynamo, Voldemort, …

Many offer elasticity + sharding:
  Expand/contract resources according to demand
  Shared-nothing architecture allows that
  Pay-as-you-go, robustness, performance
    Important! See the Apr 2011 Amazon outage (foursquare, reddit, …)

Manual and simple threshold-based elastic actions are suboptimal

thus… (end of the story)

PaaS and NoSQLs are (or should be) inherently elastic

How efficiently do they implement elasticity?
  NoSQLs over an IaaS platform
    EC2, Eucalyptus, OpenStack, …
  We need a study that registers qualitative + quantitative results

Can we automate the elasticity procedure?
  For any NoSQL, any user-given policy, any metric of interest?
  Can we bundle everything together?
  The Tiramola system

Contributions

Tiramola, a generic modular system able to manage
  Any NoSQL engine
  User-defined policies
  Automatic resource provisioning using IaaS clouds

Module for NoSQL cluster monitoring
  Real-time reporting of client, application and general-purpose metrics

Decision-making module for automatic elasticity
  Implementation as a Markov Decision Process
  Continuously identifies the best scaling action according to observed system state

Contribution side-effects

Coding + infrastructure
  Open source Python code (GFOSS + Google Code)
  http://tiramola.googlecode.com

Using cloud-based client tools, platform-agnostic
  Euca2ools guarantee execution in numerous cloud platforms
  YCSB clients

Cassandra, HBase implementation
  Almost: Voldemort, Riak

Related work

Elastic Nimbus [13], AutoScale [14], SmartSla [15], iBalloon [17], Kingfisher [18]
  Do not handle NoSQL systems
  Do not address dynamic cluster resizes

[19] elastic HDFS
  Specific HDFS implementation
  No fine-grained policy support, no support for auto-learning

Cloudy [20], Nefeli [21], Microeconomics [23]
  [20] provides only support for scaling
  [21] requires software installation at the cloud vendor
  [22] classic microeconomics for profit maximization, no support for fine-grained policy definitions

Systems like Autoscaling [24], RightScale [26], Scalr [27]
  Commercial: vendor lock-in
  Limited metric support, primitive decision-making policies

Tiramola architecture overview
Tiramola monitoring module

Ganglia tool
  Suitable for clusters
  Easy to install/maintain
  Low overhead (UDP)

Metrics
  General purpose, like CPU, RAM
  App-specific and user metrics: gmetric spoofing

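A minimal sketch of how an application-specific metric might be reported to Ganglia on behalf of a cluster node, using gmetric's spoofing option. The helper name `build_gmetric_command`, the metric name, and the host details are hypothetical and not taken from Tiramola; the function only builds the command line and does not execute it.

```python
def build_gmetric_command(name, value, metric_type, units, spoof_ip, spoof_host):
    """Return the argv list for a spoofed gmetric call (nothing is executed)."""
    return [
        "gmetric",
        "--name", name,
        "--value", str(value),
        "--type", metric_type,   # e.g. "float", "uint32", "string"
        "--units", units,
        # --spoof expects "IP:hostname" of the node the metric is reported for
        "--spoof", f"{spoof_ip}:{spoof_host}",
    ]


# Hypothetical example: a client-side latency reported as if it came from a NoSQL node.
cmd = build_gmetric_command("ycsb_read_latency", 12.5, "float", "ms",
                            "10.0.0.7", "nosql-node-7")
print(" ".join(cmd))
```

Returning an argv list rather than a shell string keeps the sketch safe to pass to `subprocess.run` without shell quoting issues.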
Tiramola cloud management module

Translates resize actions
  Contacts IaaS provider
  Acquires or releases cloud resources

Euca2ools
  EC2 compliant
  Support for any cloud management platform

Precooked VM images
  Ready-to-launch AMIs
  Contain all necessary NoSQL libraries

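The translation of a resize action into IaaS calls could look roughly like the sketch below, which maps an abstract action to euca2ools command lines. The function name `resize_to_commands`, the image ID, key name, instance type, and instance IDs are all hypothetical placeholders; the commands are only built, not executed, and this is not Tiramola's actual code.

```python
def resize_to_commands(action, count, image_id=None, instance_type="m1.large",
                       key_name=None, instance_ids=()):
    """Map an abstract resize action to euca2ools argv lists (not executed)."""
    if action == "add":
        if image_id is None:
            raise ValueError("an image id is required to launch instances")
        cmd = ["euca-run-instances", "-n", str(count), "-t", instance_type]
        if key_name:
            cmd += ["-k", key_name]
        return [cmd + [image_id]]
    if action == "remove":
        # Release at most `count` of the currently running instances.
        victims = list(instance_ids)[:count]
        return [["euca-terminate-instances"] + victims] if victims else []
    if action == "no-op":
        return []
    raise ValueError(f"unknown action: {action}")


# Hypothetical usage: launch two VMs from a precooked image.
for argv in resize_to_commands("add", 2, image_id="emi-12345678",
                               key_name="tiramola-key"):
    print(" ".join(argv))
```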
Tiramola cluster coordinator module

Operates when cloud management finishes resource (de-)allocation

Orchestrates NoSQL
  When resources change

Implements specific NoSQL resizing actions
  Remote execution of shell scripts
  Injection of new config files

Current support for:
  HBase, Cassandra, Riak

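To illustrate the two mechanisms named above (config file injection and remote script execution), here is a hedged sketch. It renders an HBase `conf/regionservers` file (one hostname per line) and builds ssh command lines for a few engine-specific actions. The function names, host names, and the assumption that the launcher scripts are on the remote PATH are illustrative only; nothing is executed.

```python
# Engine-specific shell commands; assumes the scripts are on the remote PATH.
REMOTE_COMMANDS = {
    ("hbase", "add"): "hbase-daemon.sh start regionserver",
    ("hbase", "remove"): "hbase-daemon.sh stop regionserver",
    ("cassandra", "add"): "cassandra",
    ("cassandra", "remove"): "nodetool decommission",
}


def regionservers_file(hosts):
    """Render the contents of HBase's conf/regionservers, one host per line."""
    return "\n".join(sorted(hosts)) + "\n"


def resize_commands(engine, action, hosts):
    """Return one ssh argv list per affected host (not executed)."""
    remote = REMOTE_COMMANDS[(engine, action)]
    return [["ssh", host, remote] for host in hosts]


# Hypothetical usage: a new VM joins an HBase cluster.
print(regionservers_file(["node-b", "node-a", "node-c"]), end="")
print(resize_commands("hbase", "add", ["node-c"]))
```

Keeping the per-engine commands in a lookup table is one way to make the coordinator engine-agnostic: supporting another NoSQL store means adding table entries rather than new control flow.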
Tiramola core: Decision-Making module

Formulation as an MDP: $\{S, A, \{P_{i\alpha j}\}, \gamma, R_{i\alpha j}\}$
  State = # running VMs
  Actions = {add-n, rem-n, no-op}
  P: transition probabilities going from state i to j (under action α)

Identify the expected sum of rewards:
  $V(s) = E\left[\sum_{k} \gamma^{k} r_{k+t+1} \mid s_t = s\right]$

Reward function R – sets the policy for the resize
  Immediate gain for going to state s
  R(s) = f(gains, costs)

Thus the optimal value function (Bellman's equation):
  $V^{*}(s) = R(s) + \max_{\alpha} \gamma \sum_{s'} P_{s\alpha s'} V^{*}(s')$

System of equations; the optimal policy is greedy w.r.t. $V^{*}$

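As a rough illustration of how the Bellman system on this slide can be solved, the sketch below runs plain value iteration on a toy MDP whose states are cluster sizes. The 4-state model, the deterministic transitions, the reward values, and γ = 0.9 are assumptions chosen for the example; the slides do not say which solver Tiramola uses.

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-6):
    """Solve V*(s) = R(s) + max_a gamma * sum_s' P[s][a][s'] * V*(s')."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(sum(p * V[s2] for s2, p in P[s][a].items())
                       for a in actions)
            new_v = R[s] + gamma * best
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            break
    # The optimal policy is greedy with respect to the converged values.
    policy = {
        s: max(actions,
               key=lambda a: sum(p * V[s2] for s2, p in P[s][a].items()))
        for s in states
    }
    return V, policy


# Toy model (assumed): 1 to 4 VMs, actions change the size by one VM.
states = [1, 2, 3, 4]
actions = ["add", "rem", "no-op"]
step = {"add": 1, "rem": -1, "no-op": 0}
# Deterministic transitions, clipped to the permitted cluster sizes.
P = {s: {a: {min(max(s + step[a], 1), 4): 1.0} for a in actions}
     for s in states}
R = {1: 0.0, 2: 1.0, 3: 3.0, 4: 2.0}  # hypothetical rewards, best at 3 VMs

V, policy = value_iteration(states, actions, P, R)
print(policy)
```

With these assumed rewards the greedy policy steers every state toward the 3-VM cluster and stays there.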
Large degree of freedom

MDP allows for optimal solution without previous knowledge
  The system learns from previous experience, by “exploring” permissible states

Learning is real-time and continues during execution
  Decision making is auto-tuned, reacting to environment and reward changes

Generically applicable
  Arbitrary reward functions allow for virtually any optimization policy

Estimate R(s) for possible transitions

How to know exact R(s) for all s, without making the transition?
  Assume “deterministic”, reliable cluster behavior
  Similar input ⇨ similar performance metrics

Idea: cluster measurements around current load
  r(s) = f(latency, VMs)
  Add 2 extra dimensions: #VMs, load
  Find, for all permissible transitions, what latency would be

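One way to picture this idea: keep past measurements tagged with cluster size and load, and for each permissible next size average the latencies recorded near the current load. The sketch below uses a simple fixed load window in place of whatever clustering method the authors actually apply; the record layout, window size, and reward function are assumptions for illustration.

```python
def estimate_latency(measurements, vms, load, window):
    """Average latency of past measurements with this size and a similar load."""
    nearby = [m["latency"] for m in measurements
              if m["vms"] == vms and abs(m["load"] - load) <= window]
    if not nearby:
        return None  # no experience near this (vms, load) point yet
    return sum(nearby) / len(nearby)


def estimate_rewards(measurements, current_vms, load, reward, window, step=1):
    """Estimate r(s) for each permissible next cluster size without resizing."""
    estimates = {}
    for vms in (current_vms - step, current_vms, current_vms + step):
        if vms < 1:
            continue
        latency = estimate_latency(measurements, vms, load, window)
        if latency is not None:
            estimates[vms] = reward(latency, vms)
    return estimates


# Hypothetical measurement history: cluster size, load (reqs/sec), latency (ms).
history = [
    {"vms": 4, "load": 1000, "latency": 20.0},
    {"vms": 4, "load": 1100, "latency": 24.0},
    {"vms": 5, "load": 1000, "latency": 15.0},
    {"vms": 5, "load": 5000, "latency": 90.0},
]


def reward(latency, vms):
    """Assumed reward: penalize both latency and cluster size."""
    return -latency - 2 * vms


estimates = estimate_rewards(history, current_vms=4, load=1050,
                             reward=reward, window=200)
print(estimates)
```

Sizes with no nearby measurements are simply left out, which mirrors the need for an initial training period before the estimates cover all transitions.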
Architectural considerations

Robustness
  Daemon process that checkpoints and can be restarted
  State is provided from the IaaS cloud and the monitoring module
  Applicable timeouts (not real-time systems!)

Modularity
  Different interchangeable components
  APIs that utilize primitives (NoSQL and policies)
  Expandability

Speed (irrelevant in most cases)
  Written in Python

Platform Setup

Cloud cluster
  Private OpenStack Cactus cluster

A total of 16 server VMs
  Similar to an Amazon EC2 large instance
  4-core processor, 8GB RAM, 50GB disk space

A total of 20 client VMs
  2-core processor, 4GB RAM, 50GB disk space

Storage
  QCOW image: 1.6GB compressed, 4.3GB uncompressed
  VM root fs instead of EBS (Reddit outage)

Clients, Data and Workloads

HBase (v. 0.20.6), Cassandra (v. 0.7.0 beta)
  Hadoop 0.20.2
  8 initial nodes (VMs)
  Ganglia 3.1.2

YCSB tool
  Database: 20M objects – 20GB raw (Cassandra ~60GB, HBase ~90GB)
  Loads: UNI_R, UNI_U, UNI_RMW, ZIPF_R
    Default: uniform read, 10%-50% range
  Both client (YCSB) and cluster (Ganglia) metrics reported

1. Metrics affected

Stress systems by increasing the client request rate λ
  UNI_R workload

Identify critical operation points

8-node cluster, no resize

Max throughput
  HBase μmax = 80K reqs/sec @ λ = 80K reqs/sec
  Cassandra μmax = 13K reqs/sec @ λ = 20K reqs/sec

Max total cluster CPU usage
  HBase CPUmax = 55%
  Cassandra CPUmax = 76%

Further λ increase has no effect
  Systems are fully utilized and requests are queued

The setup

λstart = 180K reqs/sec (way over critical point)

At time t = 370 sec:
  Double the cluster size (add extra 8 nodes)
  Triple the cluster size (add extra 16 nodes)

4 different experiments
  READ+8, READ+16, UPDATE+8 and UPDATE+16

Measure client, cluster-side metrics
  Query latency, throughput and total cluster usage
  Vs. time

2. HBase cluster resize

For READ, throughput is (roughly) doubled and tripled
  More servers handle more requests
  Data is not transferred, but it is cached

Update is not affected
  I/O-bound operation
  Updates will be handled by the initial 8 nodes
  Only new regions due to compaction will be served by new nodes

Evaluating Tiramola – setup

Initial state: S4, 1 to 16 VMs cluster size allowed

Sinusoid-like READ loads – vary peak, periodicity
  Alter YCSB clients

Reward functions (proof of concept)
  r1(s) = -C ∙ |VMs|
  r2(s) = B ∙ thr
  r3(s) = B ∙ thr - C ∙ |VMs|^2
  r4(s) = B ∙ thr - C ∙ |VMs| - A ∙ lat

Monitor every min, decide every 10 min, 5 min backoff

Training period for initial data points
  Not necessary, but allows quicker “correct” decisions

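The four proof-of-concept reward functions translate directly into code. In the sketch below the weights A, B, C default to 1.0 purely as placeholders, since the slides do not give their values; `vms` is the cluster size, `thr` the throughput, and `lat` the latency.

```python
def r1(vms, thr, lat, A=1.0, B=1.0, C=1.0):
    """Cost only: fewer VMs is always better."""
    return -C * vms


def r2(vms, thr, lat, A=1.0, B=1.0, C=1.0):
    """Gain only: more throughput is always better."""
    return B * thr


def r3(vms, thr, lat, A=1.0, B=1.0, C=1.0):
    """Throughput gain against a quadratic VM cost."""
    return B * thr - C * vms ** 2


def r4(vms, thr, lat, A=1.0, B=1.0, C=1.0):
    """Throughput gain against linear VM cost and a latency penalty."""
    return B * thr - C * vms - A * lat


# Hypothetical observation: 4 VMs, throughput 100, latency 10.
for r in (r1, r2, r3, r4):
    print(r.__name__, r(vms=4, thr=100.0, lat=10.0, A=0.5, B=1.0, C=2.0))
```

Sharing one signature across all four functions lets a decision module swap policies without any other change, which is the point of treating the reward as a user-supplied parameter.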
Evaluating Tiramola – 1

r1, r2

r3, r4
Evaluating Tiramola – 2

Different amplitude

Different periodicity
Evaluating Tiramola – 3

Initial training set close to applied load
  [Figure panels: min training, max training]

Initial training set very different from applied load
  [Figure panels: min training, max training]

Questions

“TIRAMOLA: Elastic NoSQL Provisioning through A Cloud Management Platform” – SIGMOD 2012 (Demo Track)
“Automatic Scaling of Selective SPARQL Joins Using the TIRAMOLA System” – SWIM 2012
“On the Elasticity of NoSQL Databases over Cloud Management Platforms” – CIKM 2011
“Elastic NoSQL databases over the Cloud” – ΕΛ/ΛΑΚ 2011
CELAR: “Automatic, multi-grained elasticity-provisioning for the Cloud” – FP7
http://www.celarcloud.eu/
http://tiramola.googlecode.com