Towards a Bell-Curve Calculus and its
Application to e-Science
Lin Yang^1, Alan Bundy^1, Dave Berry^2, and Conrad Hughes^1
l.yang@ed.ac.uk bundy@inf.ed.ac.uk daveb@nesc.ac.uk conrad@nesc.ac.uk
When we analyze quality of service (QoS) properties such as run time, accuracy and reliability, errors need to be taken into
account. For worst-case analysis, we could use interval arithmetic and propagate error bounds to obtain the largest accumulated
error. The idea of interval arithmetic is to extend a numeric value to an error bound, e.g. we use the interval [41, 43] to represent the
possible values around 42. Extended numeric analysis is used to propagate intervals through workflows. The simplest example is a
unary, monotonically increasing function f(x), whose extended function is f*([a, b]) = [f(a), f(b)] [1].
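As a minimal sketch of this propagation rule, the extended function can be written in Python as follows. The names `Interval` and `extend_monotone` are illustrative and do not come from [1], and the sketch assumes that f is unary and monotonically increasing on [a, b]; for other functions the endpoints alone would not bound the result.

```python
import math
from typing import Callable, NamedTuple


class Interval(NamedTuple):
    """A closed error bound [lo, hi] around a numeric value."""
    lo: float
    hi: float


def extend_monotone(f: Callable[[float], float]) -> Callable[[Interval], Interval]:
    """Lift a unary, monotonically increasing f to intervals: f*([a, b]) = [f(a), f(b)]."""
    def f_star(x: Interval) -> Interval:
        return Interval(f(x.lo), f(x.hi))
    return f_star


# Example: propagate the bound [41, 43] (representing 42) through sqrt,
# which is increasing on non-negative inputs.
sqrt_star = extend_monotone(math.sqrt)
bound = sqrt_star(Interval(41.0, 43.0))
print(bound)
```

Chaining several such extended functions propagates the bound through a sequence of services, and the width of the final interval gives the worst-case accumulated error.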
For average-case analysis, we propose to use the normal distribution (bell-curve) to add the concept of probability, differentiating the
likely from the unlikely values of the QoS properties. That is, we represent the probability density function (p.d.f.) of each property as a
bell-curve.
The biggest advantage of using a bell-curve is that the function has only two parameters, the mean value µ and the standard deviation σ,
so it is efficient to store and propagate in workflows.
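To illustrate how little state a bell-curve needs, the following sketch stores only µ and σ and evaluates the normal p.d.f. from them. The class name `BellCurve`, its `pdf` method and the example values are hypothetical names chosen for this illustration; they are not part of Agrajag.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BellCurve:
    """A normal distribution described by its two parameters."""
    mu: float     # mean value
    sigma: float  # standard deviation, must be positive

    def pdf(self, x: float) -> float:
        """Probability density of the normal distribution at x."""
        z = (x - self.mu) / self.sigma
        return math.exp(-0.5 * z * z) / (self.sigma * math.sqrt(2.0 * math.pi))


# Hypothetical run time with mean 42 and standard deviation 1.
run_time = BellCurve(mu=42.0, sigma=1.0)
likely = run_time.pdf(42.0)    # density at the mean
unlikely = run_time.pdf(46.0)  # density four standard deviations away
print(likely > unlikely)
```

Unlike the interval [41, 43], which treats every value inside the bound alike, the density distinguishes values near the mean from values far away from it.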
The Central Limit Theorem gives some theoretical support to this programme. Moreover, experiments on QoS properties in the DIGS^3
project have also shown that a bell-curve is a possible approximation to the probabilistic behaviour of run time, accuracy and reliability.
Reliability is interpreted here as mean time to failure, so that it can be represented as a bell-curve instead of a single value.
There are at least four ways to combine Grid services (we use S1 and S2 to represent two arbitrary services) [2].
Sequential: S2 is invoked after S1's invocation and the input of S2 is the output of S1.
Parallel_All: S1 and S2 are invoked simultaneously and the outputs are both passed to the next service.
Parallel_First: The output of whichever of S1 and S2 first succeeds is passed to the next service.
Conditional: S1 is invoked first. If it succeeds, its output is the output of the workflow; if it fails, S2 is invoked and the output
of S2 is the output of the whole workflow.
In terms of the three QoS properties and four basic structures, we have twelve fundamental combination functions (see Table 1).
For instance, the combination function for run time in the sequential structure is the sum of the run times of the component services.
Moreover, the parameters $\mu_0$ and $\sigma_0$ of the combined bell-curve need to be calculated from the parameters of the
inputs, $\mu_1$, $\mu_2$, $\sigma_1$ and $\sigma_2$. For example, for run time in the sequential structure, we use
$\mu_0 = \mu_1 + \mu_2$ and $\sigma_0 = \sqrt{\sigma_1^2 + \sigma_2^2}$ (this case is exact rather than approximate, since these
equations hold mathematically for the sum of two independent normal distributions). Using Agrajag^4, we get a near-perfect match
between the piecewise uniform approximation curve (blue curve) and the true sum curve (mauve curve); the small residual error comes
from the limited precision of the approximation method in Agrajag (see Figure 1).
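The sequential case can be checked numerically with a short sketch. This is not Agrajag's piecewise uniform method: it is a plain Monte Carlo simulation using Python's standard library, the function name `sequential_run_time` is hypothetical, the input parameters are illustrative, and the two run times are assumed to be independent.

```python
import math
import random
import statistics


def sequential_run_time(mu1, sigma1, mu2, sigma2):
    """Parameters (mu_0, sigma_0) of the sum of two independent bell-curves."""
    return mu1 + mu2, math.sqrt(sigma1 ** 2 + sigma2 ** 2)


# Illustrative inputs: S1 ~ N(10, 3) and S2 ~ N(20, 4).
mu0, sigma0 = sequential_run_time(10.0, 3.0, 20.0, 4.0)

# Monte Carlo check with a fixed seed: sample both run times and add them.
rng = random.Random(0)
totals = [rng.gauss(10.0, 3.0) + rng.gauss(20.0, 4.0) for _ in range(50_000)]

print("formula:  ", mu0, sigma0)
print("simulated:", statistics.fmean(totals), statistics.stdev(totals))
```

The simulated mean and standard deviation should agree with the formula up to sampling noise, which mirrors the kind of comparison between an approximation and the true sum curve described above.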
1 School of Informatics, University of Edinburgh
2 National e-Science Centre
3 DIGS (Dependability Infrastructure for Grid Services) is an EPSRC-funded project investigating fault-tolerant systems and other quality of service
issues in service-oriented architectures.
4 Agrajag is a framework written in Perl and C, developed by Conrad Hughes, that implements operations and measurements on some basic models
of stochastic distributions.
TABLE 1
THE TWELVE FUNDAMENTAL COMBINATION FUNCTIONS

              Seq     Para_All    Para_Fir    Cond
run time      sum     max         min         cond1
accuracy      mult    combine1    varies?     cond2
reliability   mult    combine2    varies?     cond3

The table shows the twelve fundamental combination functions in terms of the three QoS properties
and the four basic combination methods. Sum, max, min and mult denote, respectively, the sum,
maximum, minimum and product of the input bell-curves. Varies means that the function has not yet
been defined and may take different forms in different situations. Cond1-3 are three different
conditional functions, whose calculation depends on the outcome (success or failure) of the services.
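As a rough illustration of the run time row of Table 1, the sketch below applies sum, max and min to single observed run times rather than to whole bell-curves. The function names and the sample values are hypothetical, and the Parallel_First case assumes that both services succeed; the conditional function cond1 is left out because its definition is not given here.

```python
def run_time_seq(t1: float, t2: float) -> float:
    """Sequential: S2 starts after S1 finishes, so the run times add."""
    return t1 + t2


def run_time_para_all(t1: float, t2: float) -> float:
    """Parallel_All: both outputs are needed, so the slower service dominates."""
    return max(t1, t2)


def run_time_para_first(t1: float, t2: float) -> float:
    """Parallel_First: the first service to succeed wins (both assumed to succeed)."""
    return min(t1, t2)


# Hypothetical observed run times, in seconds, for S1 and S2.
t1, t2 = 3.0, 5.0
results = {
    "Seq": run_time_seq(t1, t2),
    "Para_All": run_time_para_all(t1, t2),
    "Para_Fir": run_time_para_first(t1, t2),
}
print(results)
```

The bell-curve calculus lifts these point-wise operations to distributions, which is where the choice of approximation for the output parameters becomes necessary.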
Most of our combination functions need to be defined by ourselves and tested in Agrajag. For example, for run time in the parallel_all
structure, we need the maximum of two bell-curves, which is not itself a bell-curve and so must be approximated. Through systematic
experimentation using Agrajag, we find that in most common situations, approximating the output curve
using $\sigma_0 = \max(\sigma_1, \sigma_2)$ works better than using $\sigma_0 = \sqrt{\sigma_1^2 + \sigma_2^2}$ or
$\sigma_0 = 1 / (1/\sigma_1 + 1/\sigma_2)$. Our main goals are to compare all such approximation methods and to find the best one
for each situation.
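The comparison of candidate formulae can be sketched as follows. This is a simplified stand-in for the Agrajag experiments, not a reproduction of them: it compares only the standard deviation (not the whole curve) against a Monte Carlo estimate, it assumes the two run times are independent, and the function names and input parameters are hypothetical.

```python
import math
import random
import statistics


def sigma_candidates(sigma1: float, sigma2: float) -> dict:
    """The three candidate formulae for sigma_0 mentioned in the text."""
    return {
        "max": max(sigma1, sigma2),
        "root_sum_squares": math.sqrt(sigma1 ** 2 + sigma2 ** 2),
        "harmonic": 1.0 / (1.0 / sigma1 + 1.0 / sigma2),
    }


def empirical_sigma_of_max(mu1, sigma1, mu2, sigma2, n=20_000, seed=0):
    """Sample standard deviation of max(X1, X2) for independent normal X1, X2."""
    rng = random.Random(seed)
    samples = [max(rng.gauss(mu1, sigma1), rng.gauss(mu2, sigma2)) for _ in range(n)]
    return statistics.stdev(samples)


# Illustrative parameters only; they are not taken from the experiments.
candidates = sigma_candidates(1.0, 2.0)
observed = empirical_sigma_of_max(10.0, 1.0, 12.0, 2.0)
for name, value in candidates.items():
    print(f"{name}: sigma_0 = {value:.3f}, |error| = {abs(value - observed):.3f}")
```

Which candidate has the smallest error depends on the chosen means and standard deviations, so a single run like this says nothing general; it only shows the shape of one comparison.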
FIGURE 1
THE SUM OF TWO BELL CURVES AND ITS APPROXIMATION
As shown above, we will test our candidate combination formulae in Agrajag to see whether the errors are acceptable.
REFERENCES
[1] A. Bundy, "The Estimate of Accuracy", CISA BlueNote 1416, School of Informatics, UoE, 2002.
[2] A. Bundy, "Towards a Bell-Curve Calculus and its Application to e-Science", CISA BlueNote 1509, School of Informatics, UoE, 2005.