Introduction to Analytical Modeling
Gregory V. Caliri
BMC Software, Inc.
Waltham MA USA
ABSTRACT
Analytical models are constructed and used by capacity planners to predict computing resource requirements
related to workload behavior, content, and volume changes, and to measure effects of hardware and software
changes. Developing the analytical model provides the capacity planner with an opportunity to study and
understand the various behavior patterns of work and hardware that currently exist. Certain factors must be
taken into consideration to avoid common errors in model construction, analysis, and predictions.
Definition of Analytical Modeling
What is an analytical model? By pure definition and
in terms of being applied to computer systems, it is a
set of equations describing the performance of a
computer system [1]. In practical terms, it describes a
collection of measured and calculated behaviors of
different elements over a finite period of time within
the computer system – workloads, hardware, software,
and the CPU itself, and can even include the actions
and behaviors of its users and support personnel.
In most instances, the capacity planner constructs the
model using activity measurement information
generated and collected during one or more time
intervals. It is critical that an interval or series of
intervals be used that contain significant volumes of
business-critical activity.
Units of work are then
characterized by type and grouped into workloads. The
capacity analyst can then translate future business
requirements into measurable units of computing
resource consumption, and calculate capacity and
performance projections for workloads.
Purposes for building Analytical models
Some users will construct analytical models merely to gain an understanding of the current activity on the system and to measure performance and analyze the behavior of the workloads and hardware within it. Others will use them as a basis for predicting the behavior of certain elements of work within a system by inputting changes to different components of the system; such changes might include faster or slower hardware, configuration changes, or increased, decreased, or otherwise altered workload arrival patterns. Some will even carry the use of an analytical model beyond entering changes to the current system or set of systems, and use it as input to a second model so as to measure the effects of combining two existing systems.
For most sites, the projection of capacity requirements and future performance are the objectives behind the capacity planning effort. In these "what-if" analysis situations, the capacity planner follows a process consisting of several steps (a small numerical sketch follows the list):
- Receives projections for future business computing requirements
- Translates those business requirements into data processing resource requirements based on the information contained in the model, and other sources, if the model does not contain sufficient workloads with characteristics meeting those requirements
- Calculates the status of the system after the new workload requirements have been input
- Reports results to management, listing any available options
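As a small numerical sketch of the "translates" and "calculates" steps, the arithmetic might look like the following in Python. The workload names, CPU costs per transaction, and the 25% growth figure are invented purely for illustration.

    # Hypothetical what-if sketch: translate projected business volumes into
    # CPU utilization for a single processor.  Workload names, CPU costs, and
    # volumes are invented for illustration only.
    workloads = {
        # name: (CPU seconds per transaction, current transactions per hour)
        "online_orders":    (0.040, 30000),
        "query":            (0.015, 12000),
        "production_batch": (2.500,   120),
    }
    CPU_SECONDS_PER_HOUR = 3600.0   # capacity of one processor for one hour

    def utilization(volumes):
        """Fraction of one CPU consumed by the given hourly transaction volumes."""
        used = sum(cpu_per_tx * volumes[name]
                   for name, (cpu_per_tx, _) in workloads.items())
        return used / CPU_SECONDS_PER_HOUR

    current = {name: vol for name, (_, vol) in workloads.items()}
    projected = {name: vol * 1.25 for name, vol in current.items()}   # 25% growth

    print("current CPU busy:   %.1f%%" % (100 * utilization(current)))    # ~46.7%
    print("projected CPU busy: %.1f%%" % (100 * utilization(projected)))  # ~58.3%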
Starting off
If you've never engaged in capacity planning, the following steps provide a simple framework for implementing the process in your enterprise. Very simply:
- Define and identify the purpose(s) for your modeling study
- Ensure that sufficient collection mechanisms and analytical tools are available to support model construction and analysis
- Characterize workloads according to a set of rules and definitions
- Identify the intervals of time and critical workloads for study
- Accept input and business requirements from your user community
- Establish a standard method to report results back to management
Let's review each of these steps.
Define the purpose for the modeling study
It is important to define exactly what the purpose is for
building models and what their specific uses will be.
Most will use the model to execute a series of "what-if" changes to the environment by making alterations
to the analytical model -- workload volume increases
or decreases, hardware changes, or addition of new
users and transactions. Performance results are then
measured.
As a parallel function, an analytical
model can be used to model changes to the existing
environment that will allow the analyst to tune the
system for improved performance.
Refining the objective for the use of the model can also
serve to streamline the process. For instance, are we
only concerned about CPU capacity? Must we control
response time of certain mission-critical workloads?
Is detailed modeling of database changes required? Will you be
analyzing and tuning for typically heavy use periods,
or only doing so for peak periods? Each scenario
listed would entail different levels of data collection
and varying complexities in workload characterization.
Of course, if an analytical model has reduced detail and very coarse granularity in its components, it will be less flexible and will not be able to answer very specific questions.
Establishing the purpose(s) for the modeling study will
affect the total approach that is taken to model
construction, characterization of workloads, and series
of analytical iterations to be performed with the model.
Definition of the modeling goal will also lead to
increased confidence in the results of the study.
Data collection and retention; model construction
A data collection routine must be designed and
implemented, and the data collected must be robust
enough so that appropriate records are available to
identify all components and all pertinent workload
activity. On an OS/390 [2] system, this would include all SMF [3] job- and task-related records (type 30s), all pertinent RMF [4] records for configuration, hardware activity, and workload activity (types 70-75, and type 78 records if collected), and all appropriate database and online activity monitor records (IMS DC Monitor [5], DB2-related SMF, IMF [6], etc.).
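As a rough illustration only (the record-type groupings below are taken from the list above, and the coverage check itself is a hypothetical helper, not part of any product), one can keep a simple inventory of required record types and verify that a given day's collection covers them:

    # Hypothetical sketch: check that a day's collection includes the SMF/RMF
    # record types the model construction requires (types from the text above).
    REQUIRED = {
        "SMF job/task activity": {30},
        "RMF configuration and activity": {70, 71, 72, 73, 74, 75},
        "RMF optional": {78},
    }

    def missing_types(collected):
        """Return required record types that are absent from the collected set."""
        needed = set().union(*REQUIRED.values())
        return sorted(needed - set(collected))

    collected_today = {30, 70, 71, 72, 73, 74, 75}     # example collection
    print("missing record types:", missing_types(collected_today))   # -> [78]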
Some sites will find it impossible to generate, collect,
and archive data with extreme granularity for extended
periods of time. In these instances, it is recommended that prime intervals for modeling be identified early in the process and that data from those periods be retained, even if additional monitoring instrumentation has to be deployed.
Similar, but less detailed data collection mechanisms
exist on UNIX systems.
Often the analyst must execute a series of UNIX commands, collect the output from those commands, and later generate reports and model input from that output.
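A minimal sketch of such a routine, assuming standard utilities like vmstat and iostat are installed (options and output formats vary by UNIX flavor), might simply capture their output at fixed intervals for later reduction:

    # Minimal UNIX data-collection sketch.  Assumes vmstat and iostat exist;
    # their options and output differ between platforms, so the captured text
    # would be parsed and reduced later to produce model input.
    import datetime
    import subprocess
    import time

    COMMANDS = ["vmstat", "iostat"]
    INTERVAL_SECONDS = 300          # one sample every five minutes
    SAMPLES = 12                    # one hour of samples

    with open("capacity_samples.txt", "a") as out:
        for _ in range(SAMPLES):
            stamp = datetime.datetime.now().isoformat()
            for cmd in COMMANDS:
                result = subprocess.run([cmd], capture_output=True, text=True)
                out.write("%s %s\n%s\n" % (stamp, cmd, result.stdout))
            out.flush()
            time.sleep(INTERVAL_SECONDS)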
Several commercially available measurement and capacity planning tools exist. These packages
provide their own collectors to generate measurement
data that will permit creation of an analytical model.
Organize and characterize workloads according to
a set of rules and definitions
This is probably the most difficult task because it is
highly subjective. As with other steps, errors made
here can be carried forward through the process and
cause improper results.
To begin workload characterization, you must study all
units of work in the enterprise at an extremely granular
and low level. This will give the capacity planner an
understanding of system activity and behavior patterns.
If data collection was set up properly, this should be
possible.
Classify work according to its type -- batch, online,
TSO, query transactions, long / short transactions,
utilities, long and short processes.
As part of the previous step, you should have already
made computing activity and resource consumption
trackable and identifiable.
The mission-critical
workload definitions should already be roughly
established.
From this point, begin to classify work and build
workloads by the type of work that it is, and do so
from a system activity standpoint.
In an OS/390
system, batch should be classified as short, long, and
"hot" and further grouped as to its service. For
instance, production batch serving the business might
be placed in one set of workloads, and internal work of
some type would be placed in others; online database
transactions should be identified and grouped not only by their production or test role but also by their function.
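To make the grouping concrete, a small hypothetical sketch follows; the field names, thresholds, and workload names are invented for illustration, and real rules would come from your own site standards:

    # Hypothetical workload classification sketch.  Field names, thresholds,
    # and workload names are invented; actual rules come from site standards.
    def classify(unit):
        """Assign one measured unit of work to a workload name."""
        if unit["type"] == "batch":
            if unit["cpu_seconds"] > 600:
                return "batch_long"
            return "batch_hot" if unit["priority"] == "hot" else "batch_short"
        if unit["type"] == "online":
            role = "prod" if unit["production"] else "test"
            return "online_%s_%s" % (role, unit["function"])
        return "other"

    sample = {"type": "online", "production": True, "function": "order_entry",
              "cpu_seconds": 0.04, "priority": "std"}
    print(classify(sample))        # -> online_prod_order_entry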
The most important rule to follow in this area is to ensure
that your model contains a robust workload mix and
the most critical workloads executing at a significant
and typical activity level. A baseline analytical model
should come reasonably close to representing a
realistic situation.
Some will attempt to perform the capacity planning
process by classifying work by user communities or
account codes. This approach is only valid if the work
within each user group or accounting code group is
also classified as to the type of work and placed into its
own workloads.
Erroneous projections are often
produced when user counts are employed.
This
approach assumes that additional users will exhibit the
exact behavior and execute work with the same
distributions and resource consumption compositions
as the existing user community.
Identify the intervals of time and critical workloads
for study
When selecting an appropriate interval of time to
measure and input into the construction of analytical
models, observe the following:
1) Attempt to select a period of high, but not
completely saturated system utilization.
2) Keep in mind your objectives for modeling, and
ensure that the model contains all of the critical
workloads to be measured and observed.
3) Do not use intervals of time that contain
anomalies of activity, such as looping processes,
crashed regions, application outages, and other
factors that are likely to cause unrealistic
measurements.
4) The mix of workloads and their activity will
change from one time of day to another.
In
OS/390 mainframe systems,
this is rather
common; often there will be a high volume of
online, real-time transactional processing during
the standard business day and a concentration of
batch work during the evening hours. Situations
containing the same variances can exist in other
platforms as well. In such instances, select two or more intervals for modeling that have different mixes of work and build separate models.
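As a small illustration of rules 1) and 3) above, one might scan interval-level utilization figures and pick the busiest interval that is neither saturated nor flagged as containing an anomaly. The interval data and the 95% threshold below are hypothetical.

    # Hypothetical interval-selection sketch: choose the busiest measurement
    # interval that is below a saturation threshold and free of known anomalies.
    intervals = [
        # (interval label, CPU busy %, anomaly observed?)
        ("09:00-10:00", 72.0, False),
        ("10:00-11:00", 88.0, False),
        ("11:00-12:00", 97.0, True),     # looping job during this hour - exclude
        ("14:00-15:00", 93.0, False),
        ("21:00-22:00", 99.5, False),    # effectively saturated - exclude
    ]
    SATURATION_THRESHOLD = 95.0

    candidates = [iv for iv in intervals
                  if not iv[2] and iv[1] < SATURATION_THRESHOLD]
    best = max(candidates, key=lambda iv: iv[1])
    print("selected modeling interval:", best[0])    # -> 14:00-15:00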
Accept input and business requirements from your
user community
Obviously, there are many methods of internal communication, and sometimes these may be dictated by corporate culture. One suggested method for receiving input from users is to hold a monthly meeting with a representative from each of your user communities.
This meeting can be used by the capacity planner to
receive input from, and deliver feedback to user groups
and explain the current state of the enterprise in plain
language. There is also a side benefit to this meeting;
different user groups can communicate with each other
on upcoming projects. Often duplication of effort is
eliminated because two or more groups determine that
they are doing the same work, and with a cooperative
effort, save system development time and use fewer
computing resources.
It is also an excellent opportunity to release, distribute
and explain the monthly or quarterly performance and
capacity plan to users.
Establish a standard method to report results back
to management
Capacity planners often issue a monthly or quarterly
report to management.
The report should be
straightforward, and offer brief explanations of
performance results of critical workloads.
There
should also be a report on the state of the enterprise’s
capacity, with capacity and performance expectations
based on growth projections.
Revisions to the
capacity plan and the reasons for them should also be
included.
One mistake often made is the inclusion of too much
irrelevant information in reports or presentations. In
most cases, upper management personnel have neither the time nor the interest to wade through technical jargon and attempt its translation.
Use of visuals
can cut through the technological language barrier.
Often the capacity planner gets into a quandary – he or
she has to provide a high level report for management
and executives, but may also be challenged by
technical personnel to explain the report in technical
terms. In such instances, you must keep the technical detail on hand and make it available to those who wish to see it. You will be asked for it at some point
in time, and it might be advisable to distribute the high
level report and extend an invitation to your audience
to read the extended technical report.
If actions must be taken, executives often wish to have
a variety of viable options and the benefits and
consequences of each put before them. Avoid listing
only one possible solution to management to solve a
problem and refrain from presenting options that are
not practically possible to implement.
Queuing theory and its role in Analytical Modeling
The mathematical basis for many analytical modeling
studies is the application of queuing theory. In plain English, it is a mechanism for reflecting the length of time that a task waits to receive service; queue lengths and waiting times are calculated from the speed at which a unit providing service (a "server", not to be confused with a "file server") can handle requests and from the number of requests to be processed.
If one thinks of a single device – for instance, a disk, a channel, or a CPU – as a "server", the following formula can be applied to determine the average response time for a transaction handled at that one service point, or server. The formula, which can be derived with the help of Little's Law, is:
    Rt = S / (1 - (Tx * S))
where:
    Rt = response time, or the time from when the transaction enters the queue until the request is satisfied
    S  = service time, or the amount of time that the server itself spends handling the request
    Tx = the transaction arrival rate, in transactions per second
If Tx * S is equal to or greater than one, then the server is considered to be saturated, as transactions are arriving at the server at a greater rate than the server can handle.
To demonstrate this formula, let's assume that a
serving CPU can service a request in 50 milliseconds,
or .05 second.
We can then input transactions per
hour, and divide by 3600 to obtain transactions per
second. Using the formula, we can input transaction
counts and determine where the response time will
degrade noticeably, and where the server will saturate.
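The calculation is easy to reproduce; the short Python sketch below simply evaluates the formula for a few of the hourly volumes used in the table (the saturation check mirrors the rule that Tx * S must remain below one):

    # Evaluate Rt = S / (1 - Tx*S) for a range of hourly transaction volumes.
    def response_time(trans_per_hour, service_time):
        tx = trans_per_hour / 3600.0           # arrival rate in transactions/second
        if tx * service_time >= 1.0:
            return None                        # the server is saturated
        return service_time / (1.0 - tx * service_time)

    for volume in (20000, 40000, 60000, 70000, 72000):
        rt = response_time(volume, 0.05)
        print(volume, "saturated" if rt is None else "%.3f s" % rt)
    # compare with the table: 60000/hr gives about 0.300 s, 72000/hr is saturated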
With lower arrival rates, the response time hovers very
close to the service time. There is very little queuing
taking place for the first 20000 transactions per hour.
However, when the total is doubled to 40000 per hour, the queue time grows to roughly 63 milliseconds, and transactions are spending more time queued for service than they are actually receiving service. The queue time rises at an increasingly rapid rate as more transactions are input to the server unit. The rightmost column lists the response times that would result if the service time were improved and reduced to 30 ms; with the shorter service time, the queues are much shorter, and the server is not saturated at 72000 transactions per hour as it is with a 50 ms service time.
Analysis of a single server's response time by arrival rate; service time is constant at 50 ms.

    Trans/Hr   Trans/Sec   Pct. Server Busy   Response Time (S = .05 s)   Queue Time   Response if S = .03 s
    10000      2.778       13.89              0.058                       0.008        0.033
    11000      3.056       15.28              0.059                       0.009        0.033
    12000      3.333       16.67              0.060                       0.010        0.033
    13000      3.611       18.06              0.061                       0.011        0.034
    14000      3.889       19.44              0.062                       0.012        0.034
    15000      4.167       20.83              0.063                       0.013        0.034
    16000      4.444       22.22              0.064                       0.014        0.035
    17000      4.722       23.61              0.065                       0.015        0.035
    18000      5           25.00              0.067                       0.017        0.035
    19000      5.278       26.39              0.068                       0.018        0.036
    20000      5.556       27.78              0.069                       0.019        0.036
    40000      11.11       55.56              0.113                       0.063        0.045
    60000      16.67       83.33              0.300                       0.250        0.060
    70000      19.44       97.22              1.800                       1.750        0.072
    71000      19.72       98.61              3.600                       3.550        0.073
    71500      19.86       99.31              7.200                       7.150        0.074
    72000      20          100.00             Saturated                   Saturated    0.075
What should be evident is that there is a definite point
where the response time begins to markedly curve
upward!
Now, one must consider that a process traveling
through various points of service in a computing
system will have to undergo some type of queuing
process at each point. The length of time spent in all
of these queues, plus the service time spent at each
point of service, comprises the response time for a
single process or transaction. Mathematical formulae
exist for explanation and calculation of a process in a
multi-point system, but they are beyond the scope of
an introductory paper. The extended explanation of
the above formula and its practical application with
multiple points of service can be found at its source;
this was extracted from a paper by Dr. Jeffrey Buzen,
“A Simple Model of Transaction Processing”,
contained in the 1984 CMG Proceedings.
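While the full treatment is beyond this paper, the idea that end-to-end response time is the sum of queue time plus service time at every point of service can be sketched very simply. In the sketch below each service point is treated as an independent single server using the same formula as above; that independence is a simplifying assumption, and the device names and figures are invented.

    # Simplified sketch: end-to-end response time as the sum of per-device
    # response times, each device treated as an independent single server.
    def device_response(arrival_per_sec, service_time):
        if arrival_per_sec * service_time >= 1.0:
            return float("inf")                # that device is saturated
        return service_time / (1.0 - arrival_per_sec * service_time)

    # (service point, service time in seconds, arrival rate in requests/second)
    service_points = [
        ("CPU",     0.020, 15.0),
        ("disk",    0.008, 20.0),
        ("channel", 0.002, 40.0),
    ]

    total = sum(device_response(rate, s) for _, s, rate in service_points)
    print("end-to-end response time: %.3f s" % total)    # ~0.040 s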
Modeling methodologies other than analytical
Two other modeling methods are often used to
determine current status of computing systems and to
model any changes to them. The first is the use of
experimental models: experimentation consists of measuring existing situations, or even creating new situations, and measuring the resulting performance and the percentage of capacity used. Benchmark workloads are run on new hardware and/or software environments, and true performance measurements are collected.
When commercial computing environments were
much smaller than they are today, it was relatively
easy to simulate an actual business environment.
Indeed, the "stress test" was a commonplace
occurrence within the MIS world. A number of
individuals were handed a scripted series of
instructions to follow at a certain time, and
performance results were measured at the conclusion
of the test. It is still the most accurate method of
computer performance prediction. However, several
problems arise in running such experiments.
In today's world of MIS, the volumes of transactions
processed are so high that it is impossible in many
cases to obtain a true experimental reading of what
might happen after changes are executed. How many
individuals would be needed to enter 50,000 online
terminal transactions in an hour, or generate a number
of hits from varied locations to a web server to
duplicate the effort?
Furthermore, with today's
24/7/365 expectations, an enterprise may not have the
machine, time, and personnel resources with which
practical experimentation can be performed. It is possible to simulate the operation of several hundred terminals using scripted keystroke files, but it may not be practical to perform such simulations with thousands of terminals.
One of the research documents that the author encountered discussed the prospect of experimentation and, with a touch of humor, conveyed that some experiments are unfeasible for safety reasons. It cited two prime examples: a jetliner carrying a full load of passengers attempting a landing with one engine shut off, and a nuclear reactor driven to the point of critical mass so that researchers could definitively prove where that point actually occurred [7]. It is conceded that stress testing or overloading a computer system to determine its points of performance degradation would not carry the possibility of disaster that these experiments would, but one would wish to avoid such tests nonetheless.
Another modeling technique in use today, and one gaining popularity in some areas of computing, is simulation.
The following quotation provides a down-to-earth
explanation as to how simulation models work:
"The simulation model describes the operation of the
system in terms of individual events of the individual
elements in the system. The interrelationships among
the elements are also built into the model. Then the
model allows the computing device to capture the
effect of the elements' action on each other as a
dynamic process." (Kobayashi, 1981) [8]
In short, this is stating that a simulation model
describes workloads, and the different components of
each, as well as the results of the continuous
interaction of the different components of the
computer system as time proceeds.
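To make the contrast with the analytical approach concrete, the sketch below simulates a single server event by event, using exponentially distributed arrivals and service times averaging 50 ms; both distributions and all figures are assumptions chosen for illustration, and the result can be compared with the single-server formula given earlier.

    # Tiny event-driven simulation of one server: random arrivals, random
    # service times.  Parameters are invented; compare the result with the
    # analytical formula Rt = S / (1 - Tx*S).
    import random

    def simulate(arrivals_per_sec, mean_service, n_transactions=200000, seed=1):
        random.seed(seed)
        clock = 0.0                  # arrival time of the current transaction
        server_free_at = 0.0         # time at which the server next becomes idle
        total_response = 0.0
        for _ in range(n_transactions):
            clock += random.expovariate(arrivals_per_sec)       # next arrival
            service = random.expovariate(1.0 / mean_service)    # this request's work
            start = max(clock, server_free_at)                  # wait if server busy
            server_free_at = start + service
            total_response += server_free_at - clock            # queue + service
        return total_response / n_transactions

    print("simulated mean response: %.3f s" % simulate(11.11, 0.05))
    # the analytical formula gives about 0.112 s at this arrival rate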
Several factors make it difficult to use simulation
models for larger systems that contain multiple
workloads and devices. The most notable reasons are that there are too many variances in the behavior of the different devices and units of work that compose the workloads over a period of time, and that the arrivals of transactions used as input to the simulation will probably be unevenly distributed.
This leads us
to begin consideration of the less-complex, but highly
effective analytical modeling technique. It must also
be noted that for several years, simulation modeling
has been a proven, effective methodology used in less
complex analyses such as network traffic modeling
and prediction.
Because network traffic generally
has fewer variances in its composition, and packets
generally do not interact with each other, simulation
techniques can be applied in a practical fashion.
There have also been some other applications and
hybrid techniques developed through the years; one is
called “Simalytic Modeling”, and it combines both
analytical and simulation techniques in the same
model. An excellent paper (Norton) on this hybrid
methodology is noted in the recommended reading
section below.
Future of Analytical Modeling
Analytical models, or products using analytical
analysis and queueing theory, and the tools to create
and analyze them and to report and predict
performance will continue to enjoy widespread use.
In large scale computer systems running applications
that contain a high degree of variation of activity, it
will remain a highly practical method of analysis
because of its relative simplicity and practicality.
New technologies have emerged that will force
changes in methods of model construction and "what-if" exercises.
Internal and operational architecture
changes in the mainframe arena will lead to a complete
revision of the modeling paradigm, yet analytical modeling should continue to serve the mainframe realm.
Actual stress test experiments will likely not be as prevalent as they were in the past for larger interactive applications, simply because of the large-scale efforts required to plan for them and the human and system resources required to execute them. However, stress testing will certainly continue to be used to benchmark hardware, vendor software, and even batch cycles.
Simulation has been used for many years to provide a more detailed analysis of systems with workload components that contain limited variability. Simulation has come into play, and will continue to do so, in the world of the Internet. With the rise of E-commerce, simulation modeling appears to be a viable method for modeling web-based, multi-platform, and network applications.
Conclusion
This paper has touched upon several high-level areas of capacity planning and, specifically, the use of an analytical model as a primary tool. While analytical modeling is but one method in use today, different platforms and applications may require other approaches for effective capacity planning. The availability of statistical data, the platforms used for processing, and the objectives and complexities of studies can dictate the methodology to be used for capacity planning.
References
(1) Buzen, Dr. Jeffrey P., "A Simple Model of Transaction Processing", CMG Proceedings, 1984.
(2), (3), (4), (5) OS/390, RMF, SMF, and IMS DC Monitor are trademarks of IBM Corporation, White Plains, NY.
(6) IMF is a trademark of BMC Software, Inc., Houston, TX.
(7) Extracted from an Internet WWW home page (http://staff.um.edu.mt/jskl/simul.html), which is an extract from the text of "Simulation", by J. Skelnar, University of Malta, 1995.
(8) Kobayashi, Hisashi, "Modeling and Analysis: An Introduction to System Performance Evaluation Methodology", The Systems Programming Series, Reading, MA: Addison-Wesley Publishing Company, 1984. (quote attributed by Norton, below)
Recommended reading – in addition to the above
"Using Analytical Modeling to Ensure Client/Server Application Performance", Leganza, Gene, Cayenne Systems, CMG Proceedings, 1996 (and other works by Leganza in CMG Proceedings that deal with stress testing).
"Simalytic Enterprise Modeling - The Best of Both Worlds", Norton, Tim R., Doctoral Candidate, Colorado Technical University, CMG Proceedings, 1996 (and many other works by Norton found in CMG Proceedings through the years).