Performance Evaluation Course

Manijeh Keshtgary
Shiraz University of Technology
Spring 1393
Goals of This Course
 Comprehensive course on performance analysis
 Includes measurement, statistical modeling, experimental
design, simulation, and queuing theory
 How to avoid common mistakes
in performance analysis
 Graduate course: (Advanced Topics)
 Lot of independent reading
 Project/Survey paper
Text Books
 Required
 Raj Jain. The Art of Computer Systems Performance Analysis: Techniques for
Experimental Design, Measurement, Simulation, and Modeling, John Wiley
and Sons, Inc., New York, NY, 1991. ISBN:0471503363
HW & Paper
Objectives: What You Will Learn
 Specifying performance requirements
 Evaluating design alternatives
 Comparing two or more systems
 Determining the optimal value of a parameter (system tuning)
 Finding the performance bottleneck (bottleneck identification)
 Characterizing the load on the system
(workload characterization)
 Predicting the performance at future loads (forecasting)
 Performance evaluation is about quantifying the service delivered by
a computer or communication system
 For example, we might be interested in: comparing the power
consumption of several server farm configurations
 it is important to carefully define the load and the metric, and
to be aware of the performance evaluation goals
Computer system users, administrators, and designers
are all interested in performance evaluation since the goal
is to obtain or to provide the highest performance at the
lowest cost.
 A system could be any collection of HW, SW and firmware
components; e.g., CPU, DB system, Network.
 Computer performance evaluation is of vital importance
 in the selection of computer systems,
 the design of applications and equipment,
 and analysis of existing systems.
Basic Terms
 System: Any collection of hardware, software, and firmware
 Metrics: Criteria used to evaluate the performance of the
system components
 Workloads: The requests made by the users of the system
Main Parts of the Course
 Part I: An Overview of Performance Evaluation
 Part II: Measurement Techniques and Tools
 Part III: Probability Theory and Statistics
 Part IV: Experimental Design and Analysis
 Part V: Simulation
 Part VI: Queuing Theory
Part I: An Overview of Performance
 Introduction
 Common Mistakes and How To Avoid Them
 Selection of Techniques and Metrics
Part II: Measurement Techniques and Tools
 Types of Workloads
 Popular Benchmarks
 The Art of Workload Selection
 Workload Characterization Techniques
 The Art of Data Presentation
Part III: Probability Theory and Statistics
 Probability and Statistics Concepts
 Four Important Distributions
 Summarizing Measured Data By a Single Number
 Summarizing The Variability Of Measured Data
 Graphical Methods to Determine Distributions of Measured Data
 Sample Statistics
 Confidence Interval
 Comparing Two Alternatives
 Measures of Relationship
 Simple Linear Regression Models
Part IV: Experimental Design and Analysis
 Introduction to Experimental Design
 One Factor Experiments
Part V: Simulation
 Introduction to Simulation
 Types of Simulations
 Model Verification and Validation
 Analysis of Simulation Results
Part VI: Queuing Theory
 Introduction to Queuing Theory
 Analysis of A Single Queue
 Queuing Networks
 Operational Laws
 Mean Value Analysis and Related Techniques
Purpose of Evaluation
Three general purposes of performance evaluation:
 selection evaluation - system exists elsewhere
 performance projection - system does not yet exist
 performance monitoring - system in operation
Selection Evaluation
 Including performance as a major criterion in the decision
to obtain a particular system from a vendor is the most
frequent case
 To determine among the various alternatives which are
available and suitable for a given application
 To choose according to some specified selection criteria
 At least one prototype of the proposed system must
Performance Projection
 Orientated towards designing a new system to estimate
the performance of a system that does not yet exist
 Secondary goal - projection of a given system on a new
workload, i.e. modifying existing system in order to
increase its performance or decrease its costs or both
(tuning therapy)
 Upgrading of a system - replacement or addition of one
or more hardware components
Performance Monitoring
 usually performed for a substantial portion of the
lifetime of an existing running system
 performance monitoring is done:
 to detect bottlenecks
 to predict future capacity shortcomings
 to determine most cost-effective way to upgrade the system
 to overcome performance problems, and
 to cope with increasing workload demands
Evaluation Metrics
A computer system, like any other engineering
machine, can be measured and evaluated in terms of
how well it meets the needs and expectations of its
users.
It is desirable to evaluate the performance of a
computer system because we want to make sure that it
is suitable for its intended applications, and that it
satisfies the given efficiency and reliability
requirements.
We also want to operate the computer system near its
optimal level of processing power under the given
resource constraints.
Three Basic Issues
All performance measures deal with three basic issues:
How quickly a given task can be accomplished,
How well the system can deal with failures and other unusual
situations, and
How effectively the system uses the available resources.
Performance Measures
We can categorize the performance measures as follows.
 Responsiveness
 Usage Level
 Missionability
 Dependability
 Productivity
 Responsiveness: These measures are intended to evaluate how
quickly a given task can be accomplished by the system.
 Possible measures are: waiting time, processing time, queue length, etc.
Usage Level and Missionability:
 Usage Level: These measures are intended to evaluate how well the various
components of the system are being used.
 Possible measures are: Throughput and utilization of various resources.
 Missionability: These measures indicate if the system would remain
continuously operational for the duration of a mission. Possible measures :
interval availability (Probability that the system will keep performing
satisfactorily throughout the mission time) and life-time (time when the
probability of unacceptable behaviour increases beyond some threshold).
These measures are useful when repair/tuning is impractical or when
unacceptable behavior may be catastrophic
 Dependability: These measures indicate how reliable the system is over the long run.
Possible measures are:
 Number of failures/day.
 MTTF(mean time to failure).
 MTTR(mean time to repair).
 Long-term availability, and cost of a failure.
 These measures are useful when repairs are possible and failures
are tolerable.
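As a small illustration of how these measures relate, long-term (steady-state) availability is commonly computed as MTTF / (MTTF + MTTR). The sketch below is only an example: the function name and the numeric values are assumptions chosen for illustration, not figures from the course.

```python
def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time the system is up: MTTF / (MTTF + MTTR)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical system: fails on average every 1000 h and takes 2 h to repair.
availability = steady_state_availability(1000.0, 2.0)
downtime_hours_per_year = (1.0 - availability) * 365 * 24

print(f"availability = {availability:.5f}")
print(f"expected downtime = {downtime_hours_per_year:.1f} h/year")
```

Shortening the repair time raises availability just as lengthening the time to failure does, which is why these measures apply when repairs are possible.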
 Productivity: These measures indicate how effectively a user can get his
or her work accomplished.
 Possible measures are:
 User friendliness.
 Maintainability.
 Understandability.
Which measures for what system
 The relative importance of various measures depends on the
application involved.
 In the following, we provide a broad classification of computer systems
according to the application domains, indicating which measures are
most relevant:
1. General purpose computing: These systems are designed for general
purpose problem solving.
 Relevant measures are: responsiveness, usage level, and productivity.
2. High availability
 Such systems are designed for transaction processing
environments (bank, airline, or telephone databases, switching
systems, etc.).
 The most important measures are responsiveness and
dependability.
 Both of these requirements are more severe than for general
purpose computing systems; moreover, any data corruption or
destruction is unacceptable.
 Productivity is also an important factor.
3. Real-time control
 Such systems must respond to both periodic and
randomly occurring events within some (possibly
hard) timing constraints.
They require high levels of responsiveness and
dependability for most workloads and failure
types and are therefore significantly overdesigned.
Utilization and Throughput play little role in
such systems
4. Mission Oriented
These systems require extremely high levels of reliability over a short
period, called the mission time.
Little or no repair/tuning is possible during the mission.
Such systems include fly-by-wire airplanes, battlefield systems, and
Responsiveness is also important, but usually not difficult to achieve.
Such systems may try to achieve high reliability during the short
term at the expense of poor reliability beyond the mission
5. Long-life
Systems like the ones used for unmanned spaceships
need long life without provision for manual
diagnostics and repairs.
Thus, in addition to being highly dependable, they
should have considerable intelligence built in to
do diagnostics and repair either automatically or
by remote control from a ground station.
Responsiveness is important but not difficult to achieve.
1.2. Techniques of Performance Evaluation
 Measurement
 Simulation
 Analytic modeling
The latter two techniques can also be combined
to get what is usually known as hybrid modeling.
 Measurement is the most fundamental technique
and is needed even in analysis and simulation to
calibrate the models.
 Some measurements are best done in hardware,
some in software, and some in a hybrid manner.
Simulation Modeling
 Simulation involves constructing a model for the behavior of the system and
driving it with an appropriate abstraction of the workload.
 The major advantage of simulation is its generality and flexibility; almost any
behavior can be easily simulated.
However, there are many important issues that must be considered in
simulation:
1. It must be decided what not to simulate and at what level of detail. Simply
duplicating the detailed behavior of the system is usually unnecessary and
prohibitively expensive.
2. Simulation, like measurement, generates much raw data, which must be
analyzed using statistical techniques.
3. As with measurement, a careful experiment design is essential to keep the
simulation cost down.
Analytic Modeling
Analytic modeling involves constructing a mathematical model
of the system behavior (at the desired level of detail) and
solving it.
The main difficulty here is that the domain of tractable models
is rather limited.
Thus, analytic modeling will fail if the objective is to study the
behavior in great detail.
However, for an overall behavior characterization, analytic
modeling is an excellent tool.
Advantages of Analytical model over
other two
 It generates good insight into the workings of the system that is
valuable even if the model is too difficult to solve.
 Simple analytic models can usually be solved easily, yet provide
surprisingly accurate results, and
 Results from analysis have better predictive value than those
obtained from measurement or simulation
Hybrid Modeling
 A complex model may consist of several submodels, each
representing a certain aspect of the system.
 Only some of these submodels may be analytically tractable;
the others must be simulated.
 For example, the fraction of memory wasted due to
fragmentation may be difficult to estimate analytically, even
though other aspects of the system can be modeled analytically.
Hybrid Model (cont)
We can take the hybrid approach, which will proceed as follows:
1. Solve the analytic model assuming no fragmentation of memory
and determine the distribution of memory holding time.
2. Simulate only memory allocation, holding, and deallocation and
determine the average fraction of memory that could not be used
because of fragmentation.
3. Recalibrate the analytic model of step 1 with reduced memory
and solve it.
(It may be necessary to repeat these steps a few times to get
1.3 Applications of Performance Evaluation
 System design
 System Selection
 System Upgrade
 System Tuning
 System Analysis
System Design
In designing a new system, one typically starts out with certain
performance/reliability objectives and a basic system architecture,
and then decides how to choose various parameters to achieve the
objectives.
This involves constructing a model of the system behavior at
the appropriate level of detail, and evaluating it to choose the
An analytical model is adequate if we just want to eliminate bad
choices; simulation is preferable if we need more detail.
System Selection
Here the problem is to select the “best” system from
among a group of systems that are under consideration for
reasons of cost, availability, compatibility, etc.
Although direct measurement is the ideal technique to use
here, there might be practical difficulties in doing so (e.g.,
not being able to use them under realistic workloads, or
not having the system available locally).
 Therefore, it may be necessary to make projections based on
available data and some simple modeling.
System Upgrade
 This involves replacing either the entire system or parts
thereof with a newer but compatible unit.
 The compatibility and cost considerations may dictate the
 So the only remaining problem is to choose quantity, speed,
and the like.
 Often, analytic modeling is adequate here;
 however, in large systems involving complex interactions
between subsystems, simulation modeling may be essential.
System Tuning
 The purpose of tune up is to optimize the performance by appropriately changing
the various resource management policies.
 Some examples are process scheduling mechanism, context switching, buffer
allocation schemes, cluster size for paging, and contiguity in file space allocation.
 It is necessary to decide which parameters to consider changing and how to change
them to get maximum potential benefit.
 Direct experimentation is the simplest technique to use here, but may not be feasible
in a production environment.
 An analytical model usually cannot represent such policy changes, so simulation is the better choice
System Analysis
 Suppose that we find a system to be unacceptably sluggish.
 The reason could be either inadequate hardware resources (CPU, memory,
disk, etc.) or poor system management.
 In the former case, we need system upgrade, and in the latter, a system tune up.
 Nevertheless, the first task is to determine which of the two cases applies.
 This involves monitoring the system and examining the behavior of various
resource management policies under different loading conditions.
 Experimentation coupled with simple analytic reasoning is usually adequate to
identify the trouble spots; however, in some cases, complex interactions may
make a simulation study essential.
Performance Evaluation Metrics
 Performance metrics can be categorised into three
classes based on their utility function:
 Higher is Better or HB metric (e.g., throughput)
 Lower is Better or LB metric (e.g., response time)
 Nominal is Best or NB metric (e.g., utilisation)
 Objectives
What kind of problems will you be able to solve after taking this course?
 The Art
 Common Mistakes
 Systematic Approach
 Case Study
Objectives (1 of 6)
 Select appropriate evaluation techniques, performance metrics and
workloads for a system
 Techniques: measurement, simulation, analytic modeling
 Metrics: criteria to study performance (ex: response time)
 Workloads: requests by users/applications to the system
 Example: What performance metrics should you use for the following
 a) Two disk drives
 b) Two transactions processing systems
 c) Two packet retransmission algorithms
Objectives (2 of 6)
 Conduct performance measurements correctly
 Need two tools
 Load generator – a tool to load the system
 Monitor – a tool to measure the results
 Example: Which type of monitor (software or hardware)
would be more suitable for measuring each of the following quantities?
 a) Number of instructions executed by a processor
 b) Degree of multiprogramming on a timesharing system
 c) Response time of packets on a network
Objectives (3 of 6)
Use proper statistical techniques to compare several alternatives
Find the best among a number of alternatives
One run of a workload is often not sufficient
 Many non-deterministic computer events
that affect performance
Comparing average of several runs may also not lead
to correct results
 Especially if variance is high
Example: Packets lost on a link. Which link is better?
[Table: packets lost on Link A and Link B for each file size; values not recoverable]
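One way to approach the link comparison is a confidence interval on the paired differences, since each file is sent over both links. The sketch below is illustrative only: the loss counts are hypothetical (they are not the values from the slide's table), and the Student-t quantile is hard-coded from a standard t-table for 4 degrees of freedom.

```python
import math
import statistics

# Hypothetical packets-lost counts for the same five file transfers on each link.
lost_a = [5, 7, 3, 0, 6]
lost_b = [10, 3, 0, 1, 9]

# The observations are paired, so analyse the per-transfer differences.
diffs = [a - b for a, b in zip(lost_a, lost_b)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
std_err = statistics.stdev(diffs) / math.sqrt(n)

# Two-sided 95% Student-t quantile for n - 1 = 4 degrees of freedom (t-table value).
T_95_DF4 = 2.776
ci_low = mean_diff - T_95_DF4 * std_err
ci_high = mean_diff + T_95_DF4 * std_err

print(f"mean difference (A - B) = {mean_diff:.2f}")
print(f"95% CI = ({ci_low:.2f}, {ci_high:.2f})")
if ci_low <= 0.0 <= ci_high:
    print("Interval includes zero: the data do not show that either link is better.")
else:
    print("Interval excludes zero: the links differ at the 95% level.")
```

With high variance the interval is wide, so comparing the two averages alone could suggest a difference that the data do not support.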
Objectives (4 of 6)
 Design measurement and simulation experiments to provide
the most information with the least effort
 Often many factors that affect performance.
Separate out the effects of individual factors.
 Example: The performance of a system depends upon three factors:
 A) garbage collection technique: G1, G2, or none
 B) type of workload: editing, compiling, AI
 C) type of CPU: P2, P4, Sparc
 How many experiments are needed? How can the
performance of each factor be estimated?
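As a rough sketch of the first question, assuming a full factorial design with one run per combination, the number of experiments is the product of the number of levels of each factor. The dictionary keys below are illustrative names; the levels are those listed in the example.

```python
from itertools import product

# Factor levels taken from the example; the key names are illustrative.
factors = {
    "garbage_collection": ["G1", "G2", "none"],
    "workload": ["editing", "compiling", "AI"],
    "cpu": ["P2", "P4", "Sparc"],
}

# A full factorial design runs every combination of levels once.
full_factorial = list(product(*factors.values()))
print(f"{len(full_factorial)} experiments")  # 3 * 3 * 3 combinations

# Show the first few runs as factor -> level mappings.
for run in full_factorial[:3]:
    print(dict(zip(factors, run)))
```

Fractional factorial designs need fewer runs, at the cost of not separating some interactions between factors.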
Objectives (5 of 6)
 Perform simulations correctly
 Select correct language, seeds for random numbers, length of
simulation run, and analysis
 Before all of that, may need to validate simulator
 Example: To compare the performance of two cache replacement algorithms:
A) What type of simulation model should be used?
B) How long should the simulation be run?
C) What can be done to get the same accuracy with a shorter run?
D) How can one decide if the random-number generator in the
simulation is a good generator?
Objectives (6 of 6)
 Use simple queuing models to analyze the performance of systems
 Queuing models are commonly used for analytical modeling of
computer systems
 Often can model computer systems by service rate and arrival rate of load
 Multiple servers
 Multiple queues
 Example: The average response time of a database system is 3 seconds.
During a 1-minute observation interval, the idle time on the system was
10 seconds. Using a queuing model for the system, determine the following:
 System utilization, average service time per query, the number of queries
completed during observation, average number of jobs in the system, …
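A possible worked solution is sketched below. It assumes the database behaves as a single-server M/M/1 queue, which the example does not state, so the results hold only under that assumption. The variable names are illustrative.

```python
# Given in the example.
observation_s = 60.0      # 1-minute observation interval
idle_s = 10.0             # idle time during the interval
mean_response_s = 3.0     # average response time

busy_s = observation_s - idle_s
utilization = busy_s / observation_s                 # U = B / T

# ASSUMPTION: M/M/1 queue, where R = S / (1 - U), hence S = R * (1 - U).
mean_service_s = mean_response_s * (1.0 - utilization)

completed = busy_s / mean_service_s                  # C = B / S
throughput = completed / observation_s               # X = C / T
mean_jobs_in_system = throughput * mean_response_s   # Little's law: N = X * R

print(f"utilization          = {utilization:.3f}")
print(f"mean service time    = {mean_service_s:.2f} s")
print(f"queries completed    = {completed:.0f}")
print(f"mean jobs in system  = {mean_jobs_in_system:.1f}")
```

As a cross-check, the M/M/1 formula N = U / (1 - U) gives the same mean number of jobs as Little's law here.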
The Art of Performance Evaluation
 Evaluation cannot be produced mechanically
 Requires intimate knowledge of system
 Careful selection of methodology, workload, tools
 Not one correct answer as two performance analysts may
choose different metrics or workloads
 Like art, there are techniques to learn
 how to use them
 when to apply them
Example: Comparing Two Systems
 Two systems, two workloads, measure transactions per second
[Table: transactions per second of systems A and B under Workload 1 and Workload 2; values not recoverable]
 Which is better?
 They are equally good!
 … but is A better than B?
The Ratio Game
 Take system B as the base
[Table: throughput ratios for Workload 1 and Workload 2 with system B as the base; values not recoverable]
 A is better!
 … but is B better than A?
The Ratio Game
 Take system A as the base
[Table: throughput ratios for Workload 1 and Workload 2 with system A as the base; values not recoverable]
 B is better!?
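The ratio game can be reproduced with a few lines of code. The throughput figures below are hypothetical, chosen only so that the effect is visible; they are not the values from the slide's tables.

```python
from statistics import mean

# Hypothetical transactions-per-second figures for two systems and two workloads.
throughput = {
    "A": {"workload 1": 20.0, "workload 2": 10.0},
    "B": {"workload 1": 10.0, "workload 2": 20.0},
}


def mean_ratio(system: str, base: str) -> float:
    """Average, over workloads, of system throughput divided by base throughput."""
    return mean(throughput[system][w] / throughput[base][w] for w in throughput[system])


print("raw averages:", {s: mean(v.values()) for s, v in throughput.items()})
print("base B:", {s: mean_ratio(s, "B") for s in throughput})
print("base A:", {s: mean_ratio(s, "A") for s in throughput})
```

With these numbers the raw averages are equal, yet whichever system is not chosen as the base appears 25% better, which is why averaging normalized ratios can mislead.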
Common Mistakes (1-4)
 1. Undefined Goals (Don’t shoot and then draw target)
 There is no such thing as a general model
 Describe goals and then design experiments
 2. Biased Goals (Performance analysis is like a jury)
 Don’t show YOUR system better than HERS
 3. Unsystematic Approach
 Arbitrary selection of system parameters, factors, metrics, … will lead to
inaccurate conclusions
 4. Analysis without Understanding (“A problem well-stated is half solved”)
 Don’t rush to modeling before defining a problem
Common Mistakes (5-8)
 5. Incorrect Performance Metrics
 E.g., MIPS
 6. Incorrect Workload
 Wrong workload will lead to inaccurate conclusions
 7. Wrong Evaluation Technique (Don’t have a hammer and see
everything as a nail)
 Use most appropriate: model, simulation, measurement
 8. Overlooking Important Parameters
 Start from a complete list of system and workload parameters that
affect the performance
Common Mistakes (9-12)
 9. Ignoring Significant Factors
 Parameters that are varied are called factors; others are fixed
 Identify parameters that make significant impact on performance when varied
 10. Inappropriate Experimental Design
 Relates to the number of measurement or simulation experiments to be conducted
 11. Inappropriate Level of Detail
 Can have too much! Ex: modeling disk
 Can have too little! Ex: analytic model for congested router
 12. No Analysis
 Having a measurement expert is desirable but not enough
 Expertise in analyzing results is crucial
Common Mistakes (13-16)
 13. Erroneous Analysis
 E.g., take averages on too short simulations
 14. No Sensitivity Analysis
 Analysis is evidence and not fact
 Need to determine how sensitive results are to settings
 15. Ignoring Errors in Input
 Often parameters of interest cannot be measured;
Instead, they are estimated using other variables
 Adjust the level of confidence on the model output
 16. Improper Treatment of Outliers
 Outliers are values that are too high or too low
compared to a majority of values
 If possible in real systems or workloads, do not ignore them
Common Mistakes (17-20)
 17. Assuming No Change in the Future
 Workload may change in the future
 18. Ignoring Variability
 If variability is high, the mean performance alone
may be misleading
 19. Too Complex Analysis
 A simpler and easier to explain analysis
should be preferred
 20. Improper Presentation of Results
 It is not the number of graphs,
but the number of graphs that help make decisions
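To illustrate mistake 18 (ignoring variability), the sketch below compares two hypothetical sets of response times that share the same mean. The sample values and the helper name `summarize` are assumptions made for this example.

```python
import statistics


def summarize(samples):
    """Return (mean, sample standard deviation, coefficient of variation)."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    return mean, stdev, stdev / mean


# Hypothetical response times in seconds; both lists average 2.0 s.
system_x = [1.9, 2.0, 2.1, 2.0, 2.0]
system_y = [0.2, 0.3, 0.5, 0.4, 8.6]

for name, samples in (("X", system_x), ("Y", system_y)):
    mean, stdev, cov = summarize(samples)
    print(f"system {name}: mean={mean:.2f} s, stdev={stdev:.2f} s, C.O.V.={cov:.2f}")
```

Reporting only the mean would make the two systems look identical, although users of the second one occasionally wait much longer.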
Common Mistakes (21-22)
 21. Ignoring Social Aspects
 Writing and speaking are social skills
 22. Omitting Assumptions and Limitations
 E.g.: may assume most traffic TCP, whereas some links may have
significant UDP traffic
 May lead to applying results
where assumptions do not hold
Checklist for Avoiding Common Mistakes in
Performance Evaluation
Is the system correctly defined and are the goals clearly stated?
Are the goals stated in an unbiased manner?
Have all the steps of the analysis been followed systematically?
Is the problem clearly understood before analyzing it?
Are the performance metrics relevant for this problem?
Is the workload correct for this problem?
Is the evaluation technique appropriate?
Is the list of parameters that affect performance complete?
Have all parameters that affect performance been chosen as factors to be varied?
Is the experimental design efficient in terms of time and results?
Is the level of detail proper?
Is the measured data presented with analysis and interpretation?
Is the analysis statistically correct?
Has the sensitivity analysis been done?
Would errors in the input cause an insignificant change in the results?
Have the outliers in the input or the output been treated properly?
Have the future changes in the system and workload been modeled?
Has the variance of input been taken into account?
Has the variance of the results been analyzed?
Is the analysis easy to explain?
Is the presentation style suitable for its audience?
Have the results been presented graphically as much as possible?
Are the assumptions and limitations of the analysis clearly documented?
A Systematic Approach
State goals and define boundaries
List services and outcomes
Select performance metrics
List system and workload parameters
Select factors and values
Select evaluation techniques
Select workload
Design experiments
Analyze and interpret the data
Present the results. Repeat.
State Goals and Define Boundaries
 Just “measuring performance” or
“seeing how it works” is too broad
 E.g.: goal is to decide which ISP
provides better throughput
 Definition of system may depend upon goals
 E.g.: if measuring CPU instruction speed,
system may include CPU + cache
 E.g.: if measuring response time, system may include CPU +
memory + … + OS + user workload
List Services and Outcomes
 List services provided by the system
 E.g., a computer network allows users
to send packets to specified destinations
 E.g., a database system responds to queries
 E.g., a processor performs a number of tasks
 A user request for any of these services results in
a number of possible outcomes (desirable or not)
 E.g., a database system may answer correctly, incorrectly (due
to inconsistent updates), or
not at all (due to deadlocks)
Select Metrics
 Criteria to compare performance
 In general, related to speed, accuracy and/or availability of
system services
 E.g.: network performance
 Speed: throughput and delay
 Accuracy: error rate
 Availability: data packets sent do arrive
 E.g.: processor performance
 Speed: time to execute instructions
List Parameters
 List all parameters that affect performance
 System parameters (hardware and software)
 E.g.: CPU type, OS type, …
 Workload parameters
 E.g.: Number of users, type of requests
 List may not be initially complete,
so keep a working list and let it grow as you progress
Select Factors to Study
 Divide parameters into those that are
to be studied and those that are not
 E.g.: may vary CPU type but fix OS type
 E.g.: may fix packet size but vary
number of connections
 Select appropriate levels for each factor
 Want typical and ones with potentially high impact
 For workload often smaller (1/2 or 1/10th) and larger (2x or
10x) range
 Start small, or the number of experiments can quickly overcome available resources
Select Evaluation Technique
 Depends upon time, resources, and
desired level of accuracy
 Analytic modeling
 Quick, less accurate
 Simulation
 Medium effort, medium accuracy
 Measurement
 Typically most effort, most accurate
 Note, above are all typical
but can be reversed in some cases!
Select Workload
 Set of service requests to system
 Depends upon measurement technique
 Analytic model may have probability
of various requests
 Simulation may have trace of requests
from real system
 Measurement may have scripts that impose transactions
 Should be representative of real life
Design Experiments
 Want to maximize results with minimal effort
 Phase 1:
 Many factors, few levels
 See which factors matter
 Phase 2:
 Few factors, more levels
 See where the range of impact for the factors is
Analyze and Interpret Data
 Compare alternatives
 Take into account variability of results
 Statistical techniques
 Interpret results
 The analysis does not provide a conclusion
 Different analysts may come to different conclusions
Present Results
 Make it easily understood
 Graphs
 Disseminate (entire methodology!)
"The job of a scientist is not merely to see: it is to see,
understand, and communicate. Leave out any of these
phases, and you're not doing science. If you don't see,
but you do understand and communicate, you're a
prophet, not a scientist. If you don't understand, but
you do see and communicate, you're a reporter, not a
scientist. If you don't communicate, but you do see and
understand, you're a mystic, not a scientist."
Case Study
 Consider remote pipes (rpipe) versus remote procedure calls (rpc)
 rpc is like procedure call but procedure is handled on remote machine
 Client caller blocks until return
 rpipe is like pipe but server gets output on remote machine
 Client process can continue, non-blocking
 Results are returned asynchronously
 Goal: compare the performance of applications using rpipes with
that of similar applications using rpcs
System Definition
 Client and Server and Network
 Key component is “channel”, either a rpipe or an rpc
 Only the subset of the client and server that handle channel are
part of the system
- Try to minimize effect of components
outside system
 There are a variety of services that can happen over a rpipe
or rpc
 Choose data transfer as a common one,
with data being a typical result of
most client-server interactions
 Classify amount of data as either large or small
 Thus, two services:
 Small data transfer
 Large data transfer
 Limit metrics to correct operation only
(no failure or errors)
 Study service rate and resources consumed
 Performance metrics
 A) elapsed time per call
 B) maximum call rate per unit time
 C) Local CPU time per call
 D) Remote CPU time per call
 E) Number of bytes sent per call
 System parameters
 Speed of CPUs: local, remote
 Network: speed, reliability (retrans)
 Operating system overhead: for interfacing with channels, for interfacing with network
 Workload parameters
 Time between calls
 Number and sizes of parameters, of results
 Type of channel: rpc, rpipe
 Other loads: on CPUs, on network
Key Factors
 A) Type of channel
 rpipe or rpc
 B) Speed of network
 Choose short (LAN) and across country (WAN)
 C) Size of parameters
 Small or large
 D) Number of calls
 11 values: 1, 2, 4, 8, 16, 32, …, 1024
 E) All other parameters are fixed
 (Note, try to run during “light” network load)
Evaluation Technique
 Since there are prototypes, use measurement
 Use analytic modeling based on measured data
for values outside the scope of the experiments conducted
 Synthetic program generated
specified channel requests
 Will also monitor resources consumed and log results
 Use “null” channel requests to get baseline resources
consumed by logging
 Heisenberg uncertainty principle in physics:
“the measurement of position necessarily disturbs a particle's momentum,
and vice versa—i.e., that the uncertainty principle is a manifestation
of the observer effect”
Experimental Design
 Full factorial (all possible combinations of factors)
 2 channels, 2 network speeds, 2 sizes, 11 numbers of calls
 2 x 2 x 2 x 11 = 88 experiments
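The experiment count can be checked by enumerating the design. In the sketch below the level labels are illustrative, and the 11 call counts are assumed to be the powers of two from 1 to 1024.

```python
from itertools import product

channels = ["rpipe", "rpc"]
networks = ["LAN", "WAN"]
sizes = ["small", "large"]
# ASSUMPTION: the 11 numbers of calls are the powers of two 1, 2, 4, ..., 1024.
call_counts = [2 ** k for k in range(11)]

# Full factorial: every channel x network x size x call-count combination.
experiments = list(product(channels, networks, sizes, call_counts))
print(f"{len(experiments)} experiments")
print(experiments[0], "...", experiments[-1])
```

Each tuple is one run of the measurement, so the list can also serve as a simple run schedule.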
Data Analysis
 Analysis of variance will be used to quantify
the effects of the first three factors
 Are they different?
 Regression will be used to quantify the effects of
n consecutive calls
 Performance is linear? Exponential?
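A minimal sketch of the regression step is shown below, using ordinary least squares written out by hand. The elapsed times are invented for illustration and `fit_line` is a hypothetical helper, not part of the case study.

```python
def fit_line(xs, ys):
    """Least-squares fit y = intercept + slope * x; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical measurements: elapsed time (ms) for n consecutive calls.
calls = [1, 2, 4, 8, 16, 32]
elapsed_ms = [3.1, 5.0, 9.2, 16.9, 33.1, 65.0]

slope, intercept, r_squared = fit_line(calls, elapsed_ms)
print(f"elapsed_ms ~ {intercept:.2f} + {slope:.2f} * n   (R^2 = {r_squared:.4f})")
```

An R^2 close to 1 for this fit would suggest linear growth; fitting the logarithm of the elapsed time against n in the same way is one option for testing an exponential trend.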
Data Presentation
 The final results will be plotted as a function
of the block size n