Delivering Performance Objectives in the Cloud without

Towards Multi-Tenant Performance SLOs
Willis Lang*, Srinath Shankar+, Jignesh M. Patel*, Ajay Kalhan^
*University of Wisconsin-Madison
+Microsoft Gray Systems Lab
^Microsoft Corp.
To appear in ICDE 2012
Overall Operating Costs of Providing
Cloud Services are High
Monthly Cost of 46,000 Server Data Center
[Hamilton, 2011]
• Servers: $1,852,778 (57%)
• Power: $1,007,651 (31%)
• Networking: $260,039 (8%)
• Infrastructure: $130,019 (4%)
• Server and power costs dominate: 57% and 31% respectively, 88% combined
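As a quick check of the percentages on this slide, the sketch below recomputes each category's share from the quoted dollar amounts. The dictionary and function names are illustrative, not from the original deck.

```python
# Monthly cost figures as quoted on the slide [Hamilton, 2011].
monthly_costs = {
    "Servers": 1_852_778,
    "Power": 1_007_651,
    "Networking": 260_039,
    "Infrastructure": 130_019,
}


def cost_shares(costs):
    """Return each category's share of the total as a whole-number percentage."""
    total = sum(costs.values())
    return {name: round(100 * amount / total) for name, amount in costs.items()}


shares = cost_shares(monthly_costs)
for name, pct in shares.items():
    print(f"{name}: {pct}%")
print(f"Server & Power: {shares['Servers'] + shares['Power']}%")
```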
Performance Service Level Objectives
and Managing Cloud Costs
• Tenants can get their own server and high performance
• Tenants have performance objectives
• Consolidate tenants onto the fewest number of servers (maximize the degree of multi-tenancy) while maintaining perf objectives
An Optimization Problem
Given: Groups of tenants with different performance objectives and a number
of server configurations
High Perf Low Perf
Find:
(1) Tenant Scheduling Policies and (2) Hardware Provisioning Policies
Such that costs are minimized and performance is delivered
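To make the shape of the problem concrete, here is a minimal brute-force sketch. It assumes each server option hosts a fixed mix of H and L tenants at a fixed monthly cost, which simplifies the framework described in the talk. The `Packing` type, the `cheapest_plan` function, and the small 40 H / 150 L scenario are hypothetical; the per-server mixes and prices are borrowed from figures quoted elsewhere in this deck purely for illustration.

```python
from itertools import product
from typing import NamedTuple


class Packing(NamedTuple):
    """One way to fill one server (hypothetical helper type)."""
    name: str
    monthly_cost: int  # dollars per server per month
    h_tenants: int     # high-perf tenants hosted per server
    l_tenants: int     # low-perf tenants hosted per server


# Illustrative packings; a real run would use benchmarked operating points.
PACKINGS = [
    Packing("ssdC mixed", 125, 20, 38),
    Packing("ssdC L-only", 125, 0, 100),
    Packing("diskC mixed", 111, 20, 25),
]


def cheapest_plan(num_h, num_l, packings, max_per_packing):
    """Exhaustively search server counts; return (cost, counts) or None."""
    best = None
    for counts in product(range(max_per_packing + 1), repeat=len(packings)):
        h_cap = sum(n * p.h_tenants for n, p in zip(counts, packings))
        l_cap = sum(n * p.l_tenants for n, p in zip(counts, packings))
        if h_cap < num_h or l_cap < num_l:
            continue  # this plan cannot host every tenant
        cost = sum(n * p.monthly_cost for n, p in zip(counts, packings))
        if best is None or cost < best[0]:
            best = (cost, counts)
    return best


best_cost, best_counts = cheapest_plan(40, 150, PACKINGS, max_per_packing=4)
print(best_cost, dict(zip([p.name for p in PACKINGS], best_counts)))
```

Exhaustive search only works for toy sizes; at the scale of thousands of tenants an integer programming solver would be the more natural tool.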
Multi-Tenant Scheduling
• Perf objective: TPC-C throughput
• H tenants: 100tps
• L tenants: 10tps
• Want to maximize the degree of multi-tenancy without breaking the SLO
• What if we also have different server types available?
[Figure: average per-tenant throughput as tenants are added to one server. H Tenants: #H of 1, 5, and 20 with Avg H Perf of 2000, 900, and 130 tps ea. L Tenants: #L of 1, 20, and 40 with Avg L Perf of 2000, 110, and 30 tps ea.]
Hardware Setup
• 2 x Intel Nehalem L5630
• 32GB DDR3
• RAID battery-backed cache
• 1 x 10k RPM SAS for OS/software
plus one of two storage configurations:
• “diskC”: $4000 ($111 per month)
  • Data: 2 x 10k RPM SAS 300GB
  • Log: 1 x 10k RPM SAS 300GB
• “ssdC”: $4500 ($125 per month)
  • Data: 2 x Crucial C300 256GB
  • Log: 1 x Crucial C300 256GB
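The slide does not say how the monthly prices were derived from the purchase prices. The sketch below divides one by the other to show the amortization period the quoted numbers imply, which comes out at roughly 36 months for both configurations. That period is an inference from the arithmetic, and the function name is illustrative.

```python
def implied_amortization_months(purchase_price, monthly_cost):
    """Number of months over which the purchase price appears to be spread."""
    return purchase_price / monthly_cost


# Purchase price and monthly cost as quoted on the slide.
configs = [("diskC", 4000, 111), ("ssdC", 4500, 125)]
for name, price, monthly in configs:
    months = implied_amortization_months(price, monthly)
    print(f"{name}: about {months:.1f} months")
```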
Software Setup
• SQL Server 2012
• All tenants of the ‘H’ performance class get an individual database within a SQL Server instance
• Databases in SQL Server have their own physical files for data and log
• All tenants of the ‘L’ performance class get an individual database within a different SQL Server instance
• SQL Server instance memory provisioning to control performance (not VM)
Heterogeneous SLO Characterization
• Benchmark server to find the max degree of multi-tenancy for the perf objectives
• Systematically reduce ‘H’ tenants and steadily increase ‘L’ tenant scheduling until a perf objective fails
• Server characterizing function:
[Figure: two panels, ssdC and diskC. x-axis: Number of L (10tps) Tenants, 0 to 100. y-axis: Num of H (100tps) Tenants, 0 to 25. Legend: both perf objectives met; some perf objective fails.]
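The sweep described on this slide can be sketched as a loop that, for each number of H tenants, adds L tenants until an objective fails. In the sketch below, `meets_slo` stands in for an actual benchmark run and is a made-up linear capacity rule, so the resulting frontier is illustrative only. The function name, the step size, and the capacity rule are all assumptions.

```python
def characterize(max_h, max_l, meets_slo, h_step=5):
    """Return [(num_h, largest num_l that still meets every objective)]."""
    frontier = []
    for num_h in range(max_h, -1, -h_step):  # systematically reduce H tenants
        if not meets_slo(num_h, 0):
            continue  # even with no L tenants this many H tenants fail
        num_l = 0
        # Steadily add L tenants until a perf objective would fail.
        while num_l < max_l and meets_slo(num_h, num_l + 1):
            num_l += 1
        frontier.append((num_h, num_l))
    return frontier


def toy_meets_slo(num_h, num_l):
    """Hypothetical stand-in for a benchmark: one H tenant costs 4 L tenants."""
    return 4 * num_h + num_l <= 100


frontier = characterize(max_h=25, max_l=100, meets_slo=toy_meets_slo)
print(frontier)
```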
Applying Our Optimization Framework
• Scenario: 10,000 tenants, 2,000 x 100tps and 8,000 x 10tps
Optimal Solution:
• 94 ssdC servers, each with 38 10tps tenants and 20 100tps tenants
• + 5 diskC servers, each with 25 10tps tenants and 20 100tps tenants
• + 43 ssdC servers, each with 100 10tps tenants
[Figure: Number H (100tps) Tenants (0 to 40) vs Number of L (10tps) Tenants (0 to 150) for ssdC and diskC, with the chosen operating points labeled by server count: 94, 5, and 43.]
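The monthly cost of the optimal solution listed above can be worked out from the server counts and the per-month prices quoted on the Hardware Setup slide ($125 for ssdC, $111 for diskC). This is a minimal sketch; the variable and function names are illustrative.

```python
# Per-server monthly prices from the Hardware Setup slide.
SERVER_MONTHLY_COST = {"ssdC": 125, "diskC": 111}

# (server type, number of servers) for each group in the optimal solution.
optimal_plan = [("ssdC", 94), ("diskC", 5), ("ssdC", 43)]


def monthly_server_cost(plan, prices):
    """Sum the monthly price over every server in the plan."""
    return sum(prices[server_type] * count for server_type, count in plan)


total = monthly_server_cost(optimal_plan, SERVER_MONTHLY_COST)
print(f"${total:,} per month")
```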
Applying Our Optimization Framework
[Figure: Monthly Server Costs, bar chart on a $0 to $30,000 axis comparing three policies: Optimal, Only diskC, and Tenant Segregated (ssdC for 100tps tenants, diskC for 10tps tenants).]
Summary
We have presented an optimization framework
that tells a Database-as-a-Service provider how
to provide performance Service Level Objectives
while minimizing cluster infrastructure costs
Thesis Research
• An optimization framework to determine the optimal tenant scheduling and server provisioning in light of tenant performance goals [ICDE 2012]
• Complex parallel analytic workloads cause non-linear speedup and force low-power server clusters to be much larger and more expensive than traditional clusters [DaMoN 2010 Best Paper]
• Computational complexity of MR jobs affects the ability to save energy by using smaller clusters [VLDB 2010]
• Parallel data processing bottlenecks such as network bandwidth and load balancing are a cause of energy inefficiency [Under Submission]
• By exploiting replication schemes, an elegant relationship between algorithmic choices and energy efficiency can be exploited [SIGMOD Record 2009]
• Demonstrated that it is possible to decrease energy and performance in a controlled way using existing hardware mechanisms (e.g., CPU frequency/voltage and memory parking) and algorithmic choices [CIDR 2009, IEEE DEB 2011]
[Diagram: Characterizing Performance vs Energy and Server Costs]
• Cluster Design, Performance in the Cloud: ICDE 12
• Low-Power Server Hardware: DaMoN 10, Under Submission
• Cluster-level Performance and Energy Consumption: VLDB 10, SIGMOD Rec 09
• Node-local Performance and Energy Consumption: CIDR 09, IEEE DEB 11
Acknowledgements
• Special thanks to David DeWitt, Jeff Naughton, Alan Halverson, Eric Robinson, Rimma Nehme, Dimitris Tsirogiannis, Nikhil Teletia, Chris Ré
• Funded by a grant from Microsoft Gray Systems Lab
Memory-based resource governor
• E.g., 2 performance goals, 100tps and 10tps
• 20 tenants pay for 100tps and 30 tenants pay for 10tps
• The aggregate memory for all 100tps tenants:
\[ \frac{\dfrac{20}{20 + 30} + \dfrac{20 \times 100}{20 \times 100 + 30 \times 10}}{2} \times MEM \]
• Similarly, for 10tps tenants:
\[ \frac{\dfrac{30}{20 + 30} + \dfrac{30 \times 10}{20 \times 100 + 30 \times 10}}{2} \times MEM \]
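Reading the slide's formula as the average of two shares, a class's share of the tenant count and its share of the total promised throughput, the split can be sketched as below. The function name and the 32 GB value passed as `mem` are illustrative assumptions; the slide does not state how much memory the instance receives.

```python
def class_memory(counts, tps, mem):
    """Split `mem` across classes by averaging count share and tps share."""
    total_tenants = sum(counts.values())
    total_tps = sum(counts[c] * tps[c] for c in counts)
    return {
        c: (counts[c] / total_tenants + counts[c] * tps[c] / total_tps) / 2 * mem
        for c in counts
    }


# The slide's example: 20 tenants at 100tps and 30 tenants at 10tps.
counts = {"100tps": 20, "10tps": 30}
tps = {"100tps": 100, "10tps": 10}
alloc = class_memory(counts, tps, mem=32.0)  # mem value is a placeholder
for name, amount in alloc.items():
    print(f"{name}: {amount:.2f} of 32.00")
```

With these inputs the 100tps class receives about 63% of the memory and the 10tps class about 37%, and the two allocations sum to `mem`.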
Simplicity vs Cost
None of these heuristic methods consistently provides solutions near the optimal method.

Methods      ssdC SKU     diskC SKU
Optimal      Hetero SLO   Hetero SLO
ssdC-only    Hetero SLO   NA
diskC-only   NA           Hetero SLO
ssdC-H       High-perf    Low-perf
ssdC-L       Low-perf     High-perf

[Figure: two bar charts of Rel. Cost (0.0 to 2.0) per method, one with diskC cost -10% vs ssdC and one with diskC cost -30% vs ssdC, each for three tenant mixes: 20% 100tps / 80% 10tps, 50% 100tps / 50% 10tps, and 80% 100tps / 20% 10tps.]
Log Disk Bottlenecks
[Figure: TPS achieved by one 100tps tenant and average log write wait time (ms), plotted against <# 1tps tnt>/<# 100tps tnt> at 75/1, 100/1, 125/1, 150/1, 175/1, and 200/0.]