Identification Method of Critical Risk Factors in Software Project Risk

Identification of Critical Risk Factors in Software Project Risk Management
Guo-ping Jiang, Ming-chi Lin
Department of Economics and Management, Naval University of Engineering,
Wuhan Hubei, China, 430033
(gpjiang1029@163.com)
Abstract - The most important function of software
metrics is to support quantitative decision-making in
software project management. In this paper, we focus on
identifying the most critical risk factors within a software
project risk management framework based on metrics and
Bayesian networks (BN). Sensitivity analysis can be
performed to study how sensitive the risk node's probability
is to small changes in the probability parameters of the
risk BN. For a risk BN of known structure and probability
parameters, we first estimate the most probable risk
scenario, and then perform sensitivity analysis for the risk
node. Once the critical risk factors are found, managers can
concentrate on these factors in the risk monitoring and
control process.
Keywords - Bayesian Network, Critical Risk Factors,
Most Probable Explanation, Software Metrics, Sensitivity
Analysis
I. INTRODUCTION
Software project risk management has received a lot of
attention in the IT industry, and many research institutions
have made considerable efforts in this domain. Software
metrics is another important subject in software project
management: anything that is to be controlled must first be
measured. The most important requirement of software metrics
is to support quantitative decision-making in software
project management, and this applies to software risk
management as well.
It is gradually being accepted in the software engineering
community that software metrics are essential aids to
managing software projects. Metrics can be used to
evaluate the development process and the quality of software
products. Metrics can also assist in risk management and
control by identifying areas of possible risk and then
prioritizing and tracking the risks.
A relatively new and promising approach to software
project risk management that utilizes software metrics is
the Bayesian network (BN, for short). A risk BN model can
handle [6]:
• Diverse process and product variables;
• Empirical evidence and expert judgment;
• Genuine cause and effect relationships;
• Uncertainty;
• Incomplete information.
The aim of this paper is to discuss how to identify
critical risk factors in a risk BN model based on software
metrics. In section 2, we introduce the software project
risk management framework, pose the problem of critical
risk factor identification, and outline the technical route
for solving it. In section 3, we discuss how to estimate the
most probable risk scenario of the risk BN model. An
identification method for critical risk factors based on BN
sensitivity analysis is put forward in section 4. In
section 5, an illustrative example is presented to show how
the method can be used. In the last section, we summarize
the paper.

Natural Science Fund by Naval University of Engineering:
HGDYDJJ11011
II. SOFTWARE PROJECT RISK MANAGEMENT FRAMEWORK
Bayesian networks have attracted attention in the artificial
intelligence domain over the last 20 years. A BN combines
graph theory and probability theory: it is a directed
acyclic graph whose arcs are weighted by conditional
probabilities. Nodes in a BN denote variables in the problem
domain, and arcs represent direct causal relations between
variables. In essence, a BN encodes the joint probability
distribution of the problem under study.
To model a certain risk in the software development
process as a BN, the first step is to analyze the causal
relationships between the risk and its risk factors and
thereby establish the model's topology; the next is to
derive a conditional probability table for every node by
integrating historical data and expertise[9]. The
characteristics of risks and risk factors can be described
by software metrics, so we model the relevant software
metrics into the risk BN. During the software development
process, managers observe and control these metrics at
different stages in order to mitigate risks before they
occur. Managers can also evaluate a risk's effect by
observing changes in the software metrics.
Every risk BN has a node that denotes the risk under
study. In this paper, let $X_0$ be the risk node, let
$(X_0, X_1, X_2, \ldots, X_n)$ be the set of nodes, and let
$(X_{m_1}, X_{m_2}, \ldots, X_{m_i}) \subseteq (X_1, X_2, \ldots, X_n)$
be the set of metric nodes in the risk BN[4]. We suppose
that the risk node is a leaf node. This is feasible because
we do not currently model the risk's influence in the risk
BN.
We list risk management functions that can be
performed by risk BN as follows[10]:
• Estimate risk probability;
• Estimate the most probable risk scenario;
• Identify critical risk factors;
• Appraise risk control plan, support project risk
management decision.
Risk factors are the causes that produce a risk, and
every risk has some critical risk factors. Critical risk
factor identification is an important activity in risk
management. How can the critical risk factors of a risk BN
be found? The general method is to perform sensitivity
analysis. For a Bayesian network, sensitivity analysis
yields insight into the relation between the probability
parameters of the network and its posterior marginals. In
particular, it gives the relationship between the posterior
marginal probability of the risk node and the probability
parameters of the BN. The most sensitive probability
parameters identified in this way are the critical risk
factors. To perform sensitivity analysis for a risk BN is
to calculate $\Pr(X_0 \mid O)$, where $O$ is the set of
observed evidence. We limit evidence to software metrics.
The result of $\Pr(X_0 \mid O)$ is a function of the BN's
probability parameters. The probability parameters with
larger coefficients are the sensitive parameters; for a
risk BN, they are the critical risk factors.
There are three cases we may confront when identifying
critical risk factors. In the first, some evidence $O$ has
been observed, $O \neq \emptyset$, so we calculate
$\Pr(X_0 \mid O)$. In the second, $O = \emptyset$ while the
BN's probability parameters are not all known; that is to
say, we calculate $\Pr(X_0 \mid \emptyset)$ with some
unknown probability parameters. In the third,
$O = \emptyset$ and the risk BN's probability parameters
are all known. In this case, we can estimate the most
probable risk scenario
$(X_0 = x_0, X_1 = x_1, \ldots, X_n = x_n)$ for the risk BN,
and then take the values of the metric nodes as evidence,
letting $O = (X_{m_1} = x_{m_1}, \ldots, X_{m_i} = x_{m_i})$.
In all three cases, the calculation of $\Pr(X_0 \mid O)$ is
the focus.
III. ESTIMATE THE MOST PROBABLE RISK SCENARIO
For each value $x_{0k}$ of the risk node, perform
probabilistic inference to compute the most probable
explanation $x_k = (x_{0k}, x_{1k}, \ldots, x_{nk})$ and the
corresponding joint probability $p_k$. Comparing these
$p_k$ values, the assignment $x_k$ that has the largest
$p_k$ is the most probable risk scenario. So the essence of
estimating the Most Probable Risk Scenario (MPRS, for
short) for a given risk BN is to calculate the Most
Probable Explanation (MPE, for short) of this BN given
the evidence $X_0 = x_{0k}$.
Definition 1[3] (Most Probable Explanation): for a
given Bayesian network, the Most Probable Explanation
is a complete assignment $(X_0 = x_0, \ldots, X_n = x_n)$
that agrees with the available evidence and has the highest
probability among all such assignments.
Here the evidence is $X_0 = x_{0k}$.
There are two solutions to this problem[7].
Solution 1: Chain rule
Obtain the joint probability distribution of the risk BN
according to the chain rule, and then compare the joint
probabilities of all complete assignments. The
assignment $x = (x_0, x_1, \ldots, x_n)$ that has the
highest joint probability is the most probable risk
scenario.
Chain rule: $\Pr(x) = \prod_{X_i \in V(G)} p(x_i \mid pa(X_i))$
For a given BN, this method is simple and clear, but
it requires a great deal of computation, especially when the
BN has many nodes and every node has several values, because
the number of complete assignments grows exponentially with
the number of nodes. So we consider another method, namely
the Bucket Elimination algorithm.
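As a minimal sketch of the chain-rule solution, the Python code below enumerates every complete assignment of a small three-node risk BN and keeps the one with the highest joint probability. The network structure (X1 -> X0 <- X2), the node names and all probability values are hypothetical and serve only as an illustration.

```python
from itertools import product

# Hypothetical three-node risk BN: X1 -> X0 <- X2, all variables binary.
# Each entry maps a node to (parents, CPT); the CPT maps a tuple of parent
# values to Pr(node = 1 | parents). All numbers are illustrative assumptions.
NETWORK = {
    "X1": ((), {(): 0.3}),
    "X2": ((), {(): 0.6}),
    "X0": (("X1", "X2"), {(0, 0): 0.05, (0, 1): 0.40,
                          (1, 0): 0.50, (1, 1): 0.90}),
}


def joint_probability(assignment):
    """Chain rule: multiply p(x_i | pa(X_i)) over all nodes."""
    prob = 1.0
    for node, (parents, cpt) in NETWORK.items():
        p_one = cpt[tuple(assignment[p] for p in parents)]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob


def most_probable_risk_scenario():
    """Enumerate every complete assignment and keep the most probable one."""
    nodes = list(NETWORK)
    best, best_p = None, -1.0
    for values in product((0, 1), repeat=len(nodes)):
        assignment = dict(zip(nodes, values))
        p = joint_probability(assignment)
        if p > best_p:
            best, best_p = assignment, p
    return best, best_p


scenario, probability = most_probable_risk_scenario()
print(scenario, round(probability, 4))
```

The loop visits 2^n assignments for n binary nodes, which is why this approach is only practical for very small networks.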
Solution 2: Bucket Elimination
Bucket Elimination (BE, for short) is a variable
elimination algorithm, one of many algorithms for
probabilistic inference in BNs. Its most notable
characteristics are simplicity and generality: unlike other
algorithms, it introduces no new terminology. Figure
1 shows the algorithm to compute the MPRS based on the BE
algorithm, where elim-mpe is a BE algorithm introduced
in reference [7].
Generally speaking, finding the optimal node elimination
order is an NP-hard problem. For any order, however, the BE
algorithm obtains the MPE result; the only difference is
the computation time. For the problem in this paper, we
obtain the node elimination order as follows: trace
backward from the risk node to the root nodes along the
arcs between nodes and, for every node, visit the left
branch first and then the right. The order obtained by
this rule may not be the best one, but it is adequate
for this problem.
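The following Python sketch illustrates bucket elimination for the MPE on a hypothetical three-node risk BN (X1 -> X0 <- X2). It is a simplified max-product variable elimination written for this illustration, not the elim-mpe implementation of reference [7]; the function names, the network and all probability values are assumptions. The MPE is computed once per value of the risk node, and the more probable result is taken as the MPRS.

```python
from itertools import product

# A factor is (variables, table): the table maps a tuple of binary values,
# ordered like `variables`, to a non-negative number.


def make_cpt(child, parents, p_one):
    """Build the factor p(child | parents) from Pr(child = 1 | parents)."""
    variables = (child,) + tuple(parents)
    table = {}
    for values in product((0, 1), repeat=len(variables)):
        p1 = p_one[values[1:]]
        table[values] = p1 if values[0] == 1 else 1.0 - p1
    return variables, table


def clamp(factor, evidence):
    """Zero out the rows of a factor that contradict the evidence."""
    variables, table = factor
    return variables, {
        values: p if all(evidence.get(v, x) == x
                         for v, x in zip(variables, values)) else 0.0
        for values, p in table.items()
    }


def max_out(var, bucket):
    """Multiply the factors of a bucket, then maximize over `var`.

    Returns the resulting factor and, for each assignment of the remaining
    variables, the value of `var` that attains the maximum.
    """
    scope = sorted({v for vs, _ in bucket for v in vs if v != var})
    table, argmax = {}, {}
    for values in product((0, 1), repeat=len(scope)):
        context = dict(zip(scope, values))
        best_p, best_x = -1.0, None
        for x in (0, 1):
            context[var] = x
            p = 1.0
            for vs, t in bucket:
                p *= t[tuple(context[v] for v in vs)]
            if p > best_p:
                best_p, best_x = p, x
        table[values], argmax[values] = best_p, best_x
    return (tuple(scope), table), argmax


def elim_mpe(factors, order, evidence):
    """Eliminate variables along `order`, then back-substitute the argmaxes."""
    factors = [clamp(f, evidence) for f in factors]
    trace = []
    for var in order:
        bucket = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        message, argmax = max_out(var, bucket)
        factors.append(message)
        trace.append((var, message[0], argmax))
    mpe_prob = 1.0
    for _, table in factors:  # only constant factors remain
        mpe_prob *= table[()]
    assignment = {}
    for var, scope, argmax in reversed(trace):
        assignment[var] = argmax[tuple(assignment[v] for v in scope)]
    return assignment, mpe_prob


factors = [
    make_cpt("X1", (), {(): 0.3}),
    make_cpt("X2", (), {(): 0.6}),
    make_cpt("X0", ("X1", "X2"), {(0, 0): 0.05, (0, 1): 0.40,
                                  (1, 0): 0.50, (1, 1): 0.90}),
]
order = ["X0", "X1", "X2"]  # from the risk node back to the root nodes
results = {x0: elim_mpe(factors, order, {"X0": x0}) for x0 in (0, 1)}
mprs, mprs_prob = max(results.values(), key=lambda r: r[1])
print(mprs, round(mprs_prob, 4))
```

A different elimination order would give the same MPE here; only the size of the intermediate factors, and hence the running time, would change.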
IV. IDENTIFY CRITICAL RISK FACTORS
Sensitivity analysis is a general technique for
studying the effects of inaccuracies in the parameters
of a mathematical model on the model's output.
It basically amounts to systematically varying the values
of the parameters of the model under study. For a BN,
sensitivity analysis amounts to varying the assessments of
the conditional probabilities of the network's nodes[8]. It
allows studying the effects of inaccuracy in the specified
assessments on a probability of interest. Sensitivity
analysis can thus be performed to study how sensitive the
risk node's probability is to small changes in the
probability parameters of the risk BN. The posterior
probability $\Pr(X_0 \mid O)$ of the risk node $X_0$ in a
risk BN given evidence $O$ is the occurrence probability of
the risk under study. If $\Pr(X_0 \mid O)$ is sensitive to a
specific parameter, then a small change in this parameter
leads to a big change in $\Pr(X_0 \mid O)$. That is to say,
this parameter is critical to $\Pr(X_0 \mid O)$, and it is a
critical risk factor of the risk under study.
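To make the idea of a sensitive parameter concrete, the Python sketch below perturbs each probability parameter of a tiny risk BN and measures how strongly the posterior risk probability reacts. It uses central finite differences as a numerical stand-in and is not the symbolic algorithm of reference [1]. The network (a risk factor F with children M, an observed metric, and X0, the risk node), the parameter names and all values are hypothetical.

```python
# Hypothetical risk BN: F -> M and F -> X0, all variables binary.
# Evidence is M = 1; the quantity of interest is Pr(X0 = 1 | M = 1).
BASE = {
    "f": 0.2,      # Pr(F = 1)
    "m_f0": 0.1,   # Pr(M = 1 | F = 0)
    "m_f1": 0.7,   # Pr(M = 1 | F = 1)
    "r_f0": 0.05,  # Pr(X0 = 1 | F = 0)
    "r_f1": 0.6,   # Pr(X0 = 1 | F = 1)
}


def risk_posterior(params):
    """Pr(X0 = 1 | M = 1), obtained by summing the joint over F."""
    w1 = params["f"] * params["m_f1"]          # Pr(F = 1, M = 1)
    w0 = (1.0 - params["f"]) * params["m_f0"]  # Pr(F = 0, M = 1)
    return (w1 * params["r_f1"] + w0 * params["r_f0"]) / (w1 + w0)


def sensitivities(params, step=1e-4):
    """Central finite-difference derivative of the posterior per parameter."""
    result = {}
    for name, value in params.items():
        up = risk_posterior({**params, name: value + step})
        down = risk_posterior({**params, name: value - step})
        result[name] = (up - down) / (2 * step)
    return result


# Rank parameters by the magnitude of their effect on the risk probability.
ranking = sorted(sensitivities(BASE).items(),
                 key=lambda item: abs(item[1]), reverse=True)
for name, derivative in ranking:
    print(f"{name}: {derivative:+.3f}")
```

Parameters at the top of the ranking are the candidates for critical risk factors in this toy setting; a negative derivative means that increasing the parameter lowers the risk probability.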
According to previous research[1],[10], the following
proposition holds:
Proposition: Let $B$ be a Bayesian network with
digraph $G = (V(G), A(G))$ and let $\Pr$ be the joint
probability distribution defined by $B$. Let
$O \subseteq V(G)$ be the observed nodes in $G$ and let $o$
denote the corresponding observations. Let $V_r$ be the
network's node of interest and let $Sen(V_r, O)$ be the
sensitivity set for $V_r$ given $O$. Then for any value
$v_r$ of $V_r$, we have that
$$\Pr(v_r \mid o) = \frac{a x + b}{c x + d}$$
for every conditional probability $x = \Pr(v_s \mid \pi_s)$
of every node $V_s \in Sen(V_r, O)$, where $a$, $b$, $c$ and
$d$ are constants that depend upon the values $v_s$ of $V_s$
and $\pi_s$.
In this paper, the node of interest is the risk node $X_0$
and the observed nodes are the metric nodes,
$O = (X_{m_1}, X_{m_2}, \ldots, X_{m_i})$.
The problem of the risk node's sensitivity to $O$ amounts
to calculating $\Pr(x_0 \mid o)$.
Figure 1 shows the sensitivity analysis algorithm
based on symbolic probabilistic inference (SPI, for short)
for computing critical risk factors ([1]).

Algorithm 1
Input: A risk Bayesian network $G = (V(G), A(G))$, a target node,
namely the risk node $X_0 = x_0$, and an evidential set $O = o$
(possibly empty).
Output: a set $S$ of the most sensitive probability parameters.
Initialize $S = \emptyset$;
Calculate the probabilities $\Pr(x_0 \mid o)$:
• Step 1: Identify the set of relevant nodes $Sen(X_0, O)$
• Step 2: Identify the set of sufficient parameters
Rule 1: Eliminate the parameters $\theta_{ij\pi}$ if $x_i \neq j$, for $X_i \in O$,
where $\theta_{ij\pi} = \Pr(x_i = j \mid pa(x_i) = \pi)$.
Rule 2: Eliminate the parameters if their parents' instantiations are
incompatible with the evidence.
• Step 3: Identify the feasible monomials
Rule 3: Parameters associated with contradicting conditioning
instantiations cannot appear in the same monomial.
• Step 4: Compute the polynomial coefficients
Normalize the coefficients of the polynomial function of $\Pr(x_0 \mid o)$;
Initialize the threshold value $r_0$;
For each monomial $m_i$ in the polynomial function of
$\Pr(x_0 \mid o)$ and its coefficient $c_i$:
If $c_i > r_0$, then add $m_i$ to $S$.
Fig. 1. Sensitivity Analyses Algorithm[1]

Step 1 identifies the relevant nodes $Sen(X_0, O)$,
because not every node in the risk BN is relevant to the
calculation of $\Pr(x_0 \mid o)$. We will not consider the
remaining nodes further.
Add a dummy node for every node in the BN and, at the
same time, add an arc from each dummy node to its
original node, thus changing the risk BN into a new DAG
$G'$. All the nodes that are not d-separated from $X_0$ by
$O$ are relevant to the calculation of $\Pr(x_0 \mid o)$;
they are denoted $Sen(X_0, O)$.

V. ILLUSTRATION EXAMPLE
We use the simplified defects BN introduced in
reference [5], shown as Figure 2, to show how our method
can be used.
Fig. 2. Defects BN ([5])
We name these eight discrete variables
$X_1, X_2, \ldots, X_8$ respectively, thus changing Figure 2
into Figure 3. The probability parameters of this BN are
known, and each variable takes the binary value 0 or 1.
Nodes $X_1$, $X_3$, $X_4$, $X_5$ and $X_7$ are metric
nodes. Taking the node elimination order
$\{X_8, X_7, X_6, X_5, X_4, X_2, X_1, X_3\}$, the Bucket
Elimination algorithm computes the most probable risk
scenario of the defects BN as (0,1,1,0,1,1,0,1).
Fig. 3. Defects BN
Let the evidence $O = (X_1, X_3, X_4, X_5, X_7)$ be the set
of observed metrics, with $o = (0,1,0,1,0)$. Finding the
critical risk factors requires calculating
$\Pr(x_8 \mid o)$; in essence, this is a sensitivity
analysis of $X_8$ given the evidence $O$. Applying the
algorithm presented in section 4,
$Sen(X_8, O) = \{X_2, X_5, X_6, X_8\}$. The feasible
monomials are listed in Table 1.

TABLE 1
Feasible monomials
Evidence: $X_1 = 0$, $X_3 = 1$, $X_4 = 0$, $X_5 = 1$, $X_7 = 0$
Monomials:
$\{\theta_{2001}\} * \{\theta_{5100}\} * \{\theta_{6010}\} * \{\theta_{8000}, \theta_{8100}\}$
$\{\theta_{2001}\} * \{\theta_{5100}\} * \{\theta_{6110}\} * \{\theta_{8001}, \theta_{8101}\}$
$\{\theta_{2101}\} * \{\theta_{5101}\} * \{\theta_{6011}\} * \{\theta_{8000}, \theta_{8100}\}$
$\{\theta_{2101}\} * \{\theta_{5101}\} * \{\theta_{6111}\} * \{\theta_{8001}, \theta_{8101}\}$

According to the values of $X_8$, we divide the set of
monomials into two subsets, $M_0 = \{X_8 = 0\}$ and
$M_1 = \{X_8 = 1\}$:
$$M_1 = \{\theta_{2001}\theta_{5100}\theta_{6010}\theta_{8100},\ \theta_{2001}\theta_{5100}\theta_{6110}\theta_{8101},\
\theta_{2101}\theta_{5101}\theta_{6011}\theta_{8100},\ \theta_{2101}\theta_{5101}\theta_{6111}\theta_{8101}\}$$
and
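$$\Pr(x_8 = 1 \mid o) = C_1\,\theta_{2001}\theta_{5100}\theta_{6010}\theta_{8100}
+ C_2\,\theta_{2001}\theta_{5100}\theta_{6110}\theta_{8101}
+ C_3\,\theta_{2101}\theta_{5101}\theta_{6011}\theta_{8100}
+ C_4\,\theta_{2101}\theta_{5101}\theta_{6111}\theta_{8101}$$
Computing the coefficients of this function, we get the
final expression:
$$\Pr(x_8 = 1 \mid o) = \theta_{2001}\theta_{5100}\theta_{6010}\theta_{8100}
+ \theta_{2001}\theta_{5100}\theta_{6110}\theta_{8101}
+ \theta_{2101}\theta_{5101}\theta_{6011}\theta_{8100}
+ \theta_{2101}\theta_{5101}\theta_{6111}\theta_{8101}$$
Since we regard all the nodes in this risk BN as choice
nodes, every coefficient[10] equals 1.
Similarly, we get
$$\Pr(x_8 = 0 \mid o) = \theta_{2001}\theta_{5100}\theta_{6010}\theta_{8000}
+ \theta_{2001}\theta_{5100}\theta_{6110}\theta_{8001}
+ \theta_{2101}\theta_{5101}\theta_{6011}\theta_{8000}
+ \theta_{2101}\theta_{5101}\theta_{6111}\theta_{8001}$$
Normalizing the above equations, we get:
$$\Pr(x_8 = 1 \mid o) = \frac{\theta_{2001}\theta_{5100}\theta_{6010}\theta_{8100}
+ \theta_{2001}\theta_{5100}\theta_{6110}\theta_{8101}
+ \theta_{2101}\theta_{5101}\theta_{6011}\theta_{8100}
+ \theta_{2101}\theta_{5101}\theta_{6111}\theta_{8101}}
{\theta_{2001}\theta_{5100} + \theta_{2101}\theta_{5101}}$$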
For this particular example, it is not easy to identify
from the above results which parameters are critical to
$\Pr(x_8 \mid o)$; we can only conclude that $X_2$ and
$X_6$ are critical to the risk's occurrence.
So in the software development process, managers should
pay more attention to the introduced defects and the
residual defects, and take corresponding control steps
according to the ranges of both.
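Once numerical parameter values are available, the normalized expression for Pr(x8 = 1 | o) can be evaluated and differentiated directly. The Python sketch below does this with central finite differences. The paper does not report the parameter values of the defects BN, so every number here is a hypothetical placeholder; the sketch also assumes that the two values of a binary node under the same parent instantiation are complementary (for example, theta_2001 = 1 - theta_2101).

```python
# Hypothetical parameter values, chosen only for illustration. Keys follow the
# theta_{i j pi} convention: node i, value j, parent instantiation pi.
THETA = {
    "2101": 0.4,  # theta_2101; theta_2001 = 1 - theta_2101
    "5100": 0.3,  # theta_5100
    "5101": 0.8,  # theta_5101
    "6110": 0.2,  # theta_6110; theta_6010 = 1 - theta_6110
    "6111": 0.7,  # theta_6111; theta_6011 = 1 - theta_6111
    "8100": 0.1,  # theta_8100
    "8101": 0.9,  # theta_8101
}


def risk_posterior(t):
    """Normalized Pr(x8 = 1 | o) built from the four feasible monomials."""
    t2001 = 1.0 - t["2101"]
    t6010 = 1.0 - t["6110"]
    t6011 = 1.0 - t["6111"]
    numerator = (
        t2001 * t["5100"] * t6010 * t["8100"]
        + t2001 * t["5100"] * t["6110"] * t["8101"]
        + t["2101"] * t["5101"] * t6011 * t["8100"]
        + t["2101"] * t["5101"] * t["6111"] * t["8101"]
    )
    denominator = t2001 * t["5100"] + t["2101"] * t["5101"]
    return numerator / denominator


def sensitivities(t, step=1e-5):
    """Central finite-difference derivative of the posterior per parameter."""
    result = {}
    for name, value in t.items():
        up = risk_posterior({**t, name: value + step})
        down = risk_posterior({**t, name: value - step})
        result[name] = (up - down) / (2 * step)
    return result


print(round(risk_posterior(THETA), 4))
for name, derivative in sorted(sensitivities(THETA).items(),
                               key=lambda item: abs(item[1]), reverse=True):
    print(f"theta_{name}: {derivative:+.3f}")
```

With real parameter values in place of the placeholders, the resulting ranking would indicate which conditional probabilities the risk probability depends on most strongly.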
VI. CONCLUSION
In this paper we discuss a method for identifying
critical risk factors in a software risk management
framework based on Bayesian networks. During software
project development, managers collect the relevant metrics
and establish a BN for the risk being studied by
integrating historical data and the causal relationships
between risk factors. They can then analyze, appraise and
dynamically control the risk using this risk BN.
Identifying critical risk factors is a very important risk
management activity. Sensitivity analysis reveals the
relationship between the marginal probability of the node
of interest and the probability parameters in the BN. We
therefore perform sensitivity analysis of the risk node to
investigate the dependency of the risk probability on the
probability parameters of the risk BN. Depending on the
application case, sensitivity analysis is performed on the
risk BN using a symbolic probabilistic inference algorithm.
For the case of known probability parameters and no
evidence, we first estimate the most probable risk
scenario, and then perform sensitivity analysis to obtain
the sensitive parameters of the risk node. We present a
rule for obtaining the node elimination order and a
critical risk factor identification method. We also apply
the method to a known defects BN, and the result shows that
the method is feasible.
REFERENCES
[1] Enrique Castillo, José Manuel Gutiérrez and Ali S. Hadi,
“Sensitivity analysis in discrete Bayesian Networks,” IEEE
Transactions on Systems, Man, and Cybernetics 27(4),
pp.412-423.
[2] Haipeng Guo, William Hsu, “A survey of algorithms for
real-time Bayesian Network inference,” AAAI/KDD/UAI-2002
Joint Workshop on Real-Time Decision Support
and Diagnosis Systems, Edmonton, July 2002.
[3] Kalev Kask, R. Dechter, “Stochastic local search for
Bayesian networks,” In Proceedings of Uncertainty in AI
(UAI-97).
[4] Linda H. Rosenberg, Lawrence E. Hyatt, “Software metrics
program for risk assessment,” the 47th International
Astronautical Congress & Exhibition, 29th Safety and
Rescue Symposium, Risk Management and Assessment
Session, Beijing, China, October 1996.
[5] Norman Fenton, Martin Neil, “Software metrics and risk,”
2nd European Software Measurement Conference, 8
October, 1999.
[6] Norman E. Fenton, Martin Neil, “Software metrics:
roadmap,” Proceedings of the Conference on the Future of
Software Engineering, 2000, pp.357-370.
[7] Rina Dechter, Irina Rish, “A scheme for approximating
probabilistic inference,” In Proceedings of Uncertainty in AI
1997 (UAI-97), pp.132-141.
[8] Veerle M.H. Coupé, Linda C. van der Gaag, “Properties of
sensitivity analysis of Bayesian Belief Networks,” Annals
of Mathematics and Artificial Intelligence 36, 2002,
pp.323-356.
[9] Jiang Guoping, Chen Yingwu, “Coordinate metrics and
process model to manage software project risk,”
International Engineering Management Conference 2004.
[10] Jiang Guoping, Chen Yingwu, “Software project risk
evaluation based on object oriented BN,” Systems
Engineering and Electronic Technology, 2005(1) (in
Chinese).