Experiment-driven
System Management
Shivnath Babu
Duke University
Joint work with Songyun Duan, Herodotos
Herodotou, and Vamsidhar Thummala
Managing DBs in Small to Medium Business Enterprises (SMBs)
• Peter is a system admin in an SMB
  – Manages the database (DB)
  – The SMB cannot afford a DBA
• Suppose Peter has to tune a poorly-performing DB
  – A design advisor may not help
  – Maybe the problem is with DB configuration parameters
Tuning DB Configuration Parameters
• Parameters that control
  – Memory distribution
  – I/O optimization
  – Parallelism
  – The optimizer's cost model
• Number of parameters ~ 100
  – 15-25 critical parameters, depending on OLAP vs. OLTP
• Few holistic parameter-tuning tools are available
  – Peter may have to resort to 1000+ page tuning manuals or rules of thumb from experts
  – Can be a frustrating experience
Response Surfaces
• TPC-H, 4 GB DB size, 1 GB memory, Query 18
[Figure: 2-dim projection of an 11-dim response surface]
DBA’s Approach to Parameter Tuning
• DBAs run experiments
  – Here, an experiment is a run of the DB workload with a specific parameter configuration
  – Common strategy: vary one DB parameter at a time (a minimal sketch follows)
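A minimal sketch of that strategy, assuming a hypothetical run_workload(config) hook that runs the DB workload under a given configuration and returns its running time:

    # Sketch of the vary-one-parameter-at-a-time strategy.
    # run_workload(config) is a hypothetical hook, not a real API.
    def tune_one_at_a_time(base_config, candidate_values, run_workload):
        best_config = dict(base_config)
        best_time = run_workload(best_config)
        for param, values in candidate_values.items():
            for v in values:
                trial = dict(best_config)
                trial[param] = v                  # vary only this parameter
                t = run_workload(trial)           # one experiment
                if t < best_time:                 # keep any improvement
                    best_time, best_config = t, trial
        return best_config, best_time

Note that this strategy can miss interactions between parameters, which is part of what makes response surfaces like the one above hard to tune by hand.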
Experiment-driven Management
[Workflow: a management task triggers experiments. Conduct experiments on the workbench → process the output to extract information → decide whether more experiments are needed; if yes, plan the next set of experiments and repeat; if no, report the result]
• Goal: Automate this process
Roadmap
• Use cases of experiment-driven mgmt.
– Query tuning, benchmarking, Hadoop, testing, …
• iTuned: Tool for DB conf parameter tuning
– End-to-end application of experiment-driven mgmt.
• .eX: Language and run-time system that brings
experiment-driven mgmt. to users & tuning tools
What is an Experiment?
• Depends on the management task
  – Pay some extra cost, get new information in return
  – Even for a specific management task, there can be a spectrum of possible experiments
Uses of Experiment-driven Mgmt.
• Tuning "problem queries"
[Figure: query execution plan (Sort, Hash Aggregate, Nested Loop Join, and Hash Joins over scans of orders, lineitem, supplier, and nation), with each operator annotated with its <Estimated, Actual> cardinality; annotations such as <100, 436> and <380459, 229739> reveal large estimation errors]
Uses of Experiment-driven Mgmt.
• DB conf parameter tuning
• MapReduce job tuning in Hadoop
• Server benchmarking
  – Capacity planning
  – Cost/perf modeling
• Tuning "problem queries"
• Troubleshooting
• Testing
• Canary in the server farm (James Hamilton, Amazon)
• …
Roadmap
• Use cases of experiment-driven mgmt.
– Query tuning, benchmarking, Hadoop, testing, …
• iTuned: Tool for DB conf parameter tuning
– End-to-end application of experiment-driven mgmt.
• .eX: Language and run-time system that brings
experiment-driven mgmt. to users & tuning tools
Problem Abstraction
• Unknown response surface: y = F(X)
  – X = parameters x1, x2, …, xm
• Each experiment gives a <Xi, yi> sample (see the sketch below)
  – Set the DB to configuration Xi
  – Run the workload that needs tuning
  – Measure performance yi at Xi
• Goal: Find a high-performance setting with low total cost of running experiments
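In code terms, each experiment is one sample of the unknown surface y = F(X); set_config and run_workload below are hypothetical hooks, not a real API:

    def experiment(X, set_config, run_workload):
        set_config(X)        # set the DB to configuration X
        y = run_workload()   # run the workload that needs tuning
        return X, y          # the new <Xi, yi> sample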
Example
[Figure: samples of a 1-D response surface y over x1, with the resulting Utility(X); where should the next experiment go?]
• Goal: Compute the potential utility of candidate experiments
iTuned’s Adaptive Sampling
Algorithm for Experiment Planning
// Phase I: Bootstrapping
– Conduct some initial experiments
// Phase II: Sequential Sampling
– Loop: Until stopping condition is reached
1. Identify candidate experiments to do next
2. Based on current samples, estimate the utility
of each candidate experiment
3. Conduct the next experiment at the candidate
with highest utility
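A minimal Python sketch of this loop, under stated assumptions: run_experiment(X) conducts one experiment and returns the measured performance, estimate_EU(X, samples) scores a candidate from the samples so far (as derived on the next slides), and params_space is a list of candidate settings; all three names are hypothetical.

    import random

    def adaptive_sampling(params_space, run_experiment, estimate_EU,
                          n_bootstrap=10, budget=30):
        # Phase I: bootstrapping with a few initial experiments
        # (a space-filling design could replace the random choice)
        samples = [(X, run_experiment(X))
                   for X in random.sample(params_space, n_bootstrap)]

        # Phase II: sequential sampling until the stopping condition
        # (here simply an experiment budget) is reached
        while len(samples) < budget:
            k = min(100, len(params_space))
            candidates = random.sample(params_space, k)   # step 1
            best = max(candidates,                        # step 2
                       key=lambda X: estimate_EU(X, samples))
            samples.append((best, run_experiment(best)))  # step 3
        return min(samples, key=lambda s: s[1])           # best <X*, y*>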
Utility of an Experiment
• Let <X1,y1>, …, <Xn,yn> be the samples from the n experiments done so far
• Let <X*,y*> be the best setting so far (i.e., y* = min_i y_i)
  – w.l.o.g. assuming minimization
• U(X), the utility of an experiment at X, is:   // y = F(X)
  – y* - y if y* > y
  – 0 otherwise
• However, U(X) poses a chicken-and-egg problem
  – y will be known only after the experiment is run at X
• Goal: Compute the expected utility EU(X)
Expected Utility of an Experiment
• Suppose we have the probability density function of y (the performance at X)
  – Prob(y = v | <Xi,yi> for i=1,…,n)
• Then, EU(X) = ∫_{v=-∞}^{+∞} U(X) Prob(y = v) dv
  ⇒ EU(X) = ∫_{v=-∞}^{y*} (y* - v) Prob(y = v) dv
• Goal: Compute Prob(y = v | <Xi,yi> for i=1,…,n)
Model: Gaussian Process Representation (GRS) of a Response Surface
• GRS models the response surface as:
  y(X) = g(X) + Z(X) (+ ε(X) for measurement error)
  – E.g., g(X) = x1 - 2x2 + 0.1x1² (learned using common regression techniques)
  – Z: Gaussian process that captures the regression residual
Primer on Gaussian Processes
• Univariate Gaussian distribution
  – G = N(μ, σ²)
  – Described by mean μ and variance σ²
• Multivariate Gaussian distribution
  – [G1, G2, …, Gn]
  – Described by a mean vector and a covariance matrix
• Gaussian Process
  – Generalizes the multivariate Gaussian to an arbitrary number of dimensions
  – Described by mean and covariance functions
Model: Gaussian Process Representation (GRS) of a Response Surface
• GRS captures the response surface as:
  y(X) = g(X) + Z(X) (+ ε(X) for measurement error)
• If Z is a Gaussian process, then:
  ⇒ [Z(X1),…,Z(Xn),Z(X)] is multivariate Gaussian
  ⇒ Z(X) | Z(X1),…,Z(Xn) is a univariate Gaussian
  ⇒ y(X) is a univariate Gaussian
Parameters of the GRS Model
• [Z(X1),…,Z(Xn)] is multivariate Gaussian
  – Z(Xi) has zero mean
  – Cov(Z(Xi),Z(Xj)) ∝ exp(-Σ_k θ_k |x_ik - x_jk|^γ_k)
    • Residuals at nearby points have higher correlation
  – θ_k, γ_k are learned from <X1,y1>, …, <Xn,yn>
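A minimal NumPy sketch of what this model buys us, assuming the regression term g(·) has already been subtracted from the observations zs and the process variance is normalized to 1; theta and gamma play the role of θ_k and γ_k above, and the returned pair is the μ(X), σ²(X) used on the next slide:

    import numpy as np

    def cov(Xa, Xb, theta, gamma):
        # Cov(Z(Xi), Z(Xj)) = exp(-sum_k theta_k |x_ik - x_jk|^gamma_k),
        # with the proportionality constant normalized to 1
        d = np.abs(Xa[:, None, :] - Xb[None, :, :])
        return np.exp(-(theta * d ** gamma).sum(axis=-1))

    def gp_posterior(X, Xs, zs, theta, gamma, noise=1e-6):
        # mean and variance of Z(X) given the samples (Xs, zs)
        K = cov(Xs, Xs, theta, gamma) + noise * np.eye(len(Xs))
        k_star = cov(Xs, X[None, :], theta, gamma)[:, 0]
        mu = k_star @ np.linalg.solve(K, zs)
        var = 1.0 - k_star @ np.linalg.solve(K, k_star)
        return mu, max(var, 0.0)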
Use of the GRS Model
• Recall our goals to compute
  – EU(X) = ∫_{v=-∞}^{y*} (y* - v) Prob(y = v) dv
  – Prob(y = v | <Xi,yi> for i=1,…,n)
• Lemma: Using the GRS, we can compute the mean μ(X) and variance σ²(X) of the Gaussian y(X)
• Theorem: EU(X) has a closed form that is a product of:
  – A term that depends on (y* - μ(X))
  – A term that depends on σ(X)
• It follows that settings X with high EU are either:
  – Close to known good settings (for exploitation)
  – In highly uncertain regions (for exploration)
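For concreteness, here is the standard expected-improvement closed form for a Gaussian y(X) under minimization; treat it as an illustration of how (y* - μ(X)) and σ(X) enter the expression, not necessarily iTuned's exact formula:

    import math

    def expected_utility(y_star, mu, sigma):
        # EU(X) = (y* - mu) * Phi(u) + sigma * phi(u), u = (y* - mu) / sigma;
        # large when mu is well below y* (exploitation) or sigma is
        # large (exploration)
        if sigma <= 0.0:
            return max(y_star - mu, 0.0)
        u = (y_star - mu) / sigma
        Phi = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))          # normal CDF
        phi = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)   # normal PDF
        return (y_star - mu) * Phi + sigma * phi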
Example
• Settings X with high EU are either:
  – Close to known good settings (high y* - μ(X))
  – In highly uncertain regions (high σ(X))
[Figure: 1-D example over x1 showing the unknown actual surface, the model mean μ(X) with its σ(X) uncertainty band, the best observed value y*, and the resulting EU(X) curve]
Where to Conduct Experiments?
[Figure: candidate platforms. Clients reach the DBMS on the production platform through a middle tier; a test platform runs its own DBMS over test data; a standby platform runs a DBMS whose data is kept current via write-ahead log (WAL) shipping from production]
iTuned’s Solution
• Exploit underutilized resources with minimal impact on the production workload
• The DBA/user designates resources where experiments can be run
  – E.g., production/standby/test
• The DBA/user specifies policies that dictate when experiments can be run (a sketch follows)
  – Separate regular use (home) from experiments (garage)
  – Example: If CPU, memory, & disk utilization < 10% for the past 15 mins, then the resource can be used for experiments
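A toy sketch of such a policy check, assuming a hypothetical utilization history sampled once per minute:

    def can_run_experiments(history, threshold=0.10, window_mins=15):
        # history: list of (cpu, mem, disk) utilization samples, one per minute
        recent = history[-window_mins:]
        if len(recent) < window_mins:
            return False  # not enough history to decide safely
        # switch to garage mode only if every resource stayed under threshold
        return all(max(sample) < threshold for sample in recent)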
One Implementation of Home/Garage
[Figure: clients reach the production platform through the middle tier, and WAL shipping feeds a standby machine. On the standby, the home DBMS keeps applying the WAL, while iTuned (interface, engine, experiment planner & scheduler) runs garage DBMSs for experiments on copy-on-write snapshots of the data]
Overheads are Low

Operation in API             | Time (s) | Description
-----------------------------|----------|--------------------------------------------
Create Container             | 610      | Create a new garage (one-time process)
Clone Container              | 17       | Clone a garage from an already existing one
Boot Container               | 19       | Boot garage from halted state
Halt Container               | 2        | Stop garage and release resources
Reboot Container             | 2        | Reboot the garage
Snapshot-R DB (5 GB, 20 GB)  | 7, 11    | Create read-only snapshot of the database
Snapshot-RW DB (5 GB, 20 GB) | 29, 62   | Create read-write snapshot of the database
Empirical Evaluation (1)
• Cluster of machines with 2 GHz processors and 3 GB memory
• Two database systems: PostgreSQL & MySQL
• Various workloads
  – OLAP: mixes of heavy-weight TPC-H queries
    • Varying #queries, #query_types, and MPL
    • Scale factors 1 and 10
  – OLTP: TPC-W and RUBiS
• Tuning of up to 30 configuration parameters
Empirical Evaluation (2)
• Techniques compared
  – Default parameter settings shipped (D)
  – Manual rule-based tuning (M)
  – Smart Hill Climbing (S): state-of-the-art technique
  – Brute-Force search (B): run many experiments to find an approximation to the optimal setting
  – iTuned (I)
• Evaluation metrics
  – Quality: workload running time after tuning
  – Efficiency: time needed for tuning
Comparison of Tuning Quality
[Figure: workload running time after tuning with D, M, S, B, and I]
iTuned’s Scalability Features (1)
• Identify important parameters quickly
• Run experiments in parallel
• Stop low-utility experiments early
• Compress the workload
• Work in progress:
  – Apply database-specific knowledge
  – Incremental tuning
  – Interactive tuning
iTuned’s Scalability Features (2)
• Identify important parameters quickly
  – Using sensitivity analysis with a few experiments (one possible sketch follows)
[Figure: sensitivity analysis example; #Parameters = 9, #Experiments = 10]
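One simple way to realize this (a sketch, not necessarily iTuned's exact method): fit the surrogate model to the few experiments, then rank each parameter by how much the predicted performance moves as that parameter sweeps its range with the others held at their defaults. predict_mu, defaults, and ranges are hypothetical names.

    import numpy as np

    def rank_parameters(predict_mu, defaults, ranges, n_points=20):
        # predict_mu(X) -> predicted performance, e.g., the GP mean mu(X)
        effects = {}
        for param, (lo, hi) in ranges.items():
            preds = []
            for v in np.linspace(lo, hi, n_points):
                X = dict(defaults)
                X[param] = v                      # vary only this parameter
                preds.append(predict_mu(X))
            effects[param] = max(preds) - min(preds)  # spread = sensitivity
        return sorted(effects, key=effects.get, reverse=True)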
iTuned’s Scalability Features (3)
Roadmap
• Use cases of experiment-driven mgmt.
– Query tuning, benchmarking, Hadoop, testing, …
• iTuned: Tool for DB conf parameter tuning
– End-to-end application of experiment-driven mgmt.
• .eX: Language and run-time system that brings
experiment-driven mgmt. to users & tuning tools
Back-of-the-Envelope Calculation
• DBAs cost $300/day; consultants cost $100/hr
• 1 day of experiments gives a wealth of information
  – TPC-H, TPC-W, RUBiS workloads; 10-30 conf. params
• Cost of running these experiments for 1 day on Amazon Web Services
  – Server: $10/day
  – Storage: $0.4/day
  – I/O: $5/day
  – TOTAL: ~$15/day
.eX: Power of Experiments to the People
[Figure: eXL script → language processor → .eX run-time engine → resources]
• Users & tools express needs as scripts in eXL (eXperiment Language)
• The .eX engine plans and conducts experiments on designated resources
• Intuitive visualization of results
Current Focus of .eX
• Parts of an eXL script (a hypothetical illustration follows)
  1. Query: (approx.) response surface mapping, search
  2. Expt. setup & monitoring
  3. Constraints & optimization: resources, cost, time
• Result: automatically generate the experiment-driven workflow
[Workflow: conduct experiments on the workbench → process the output to extract information → decide whether more experiments are needed; if yes, plan the next set of experiments and repeat]
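Purely as illustration, the three parts of a script might fit together as below; the real eXL syntax is not shown in this talk, so every name here is invented:

    # Hypothetical sketch of an eXL-style specification (invented names)
    spec = {
        # 1. Query: search for a good setting / map the response surface
        "query": {"minimize": "workload_runtime",
                  "over": {"shared_buffers_mb": (8, 1024),
                           "work_mem_mb": (1, 512)}},
        # 2. Experiment setup & monitoring
        "setup": {"workbench": "standby_garage", "workload": "tpch_mix"},
        # 3. Constraints & optimization
        "constraints": {"max_experiments": 30, "max_hours": 24},
    }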
Summary
• Automated expt-driven mgmt: the time has come
  – Need, infrastructure, & promise are all there
• We have built many tools around this paradigm
  – http://www.cs.duke.edu/~shivnath/dotex.html
• Poses interesting questions and challenges
  – Make it easy for users/admins to do expts
  – Make experiments first-class citizens in systems