Identification of Critical Risk Factors in Software Project Risk Management
Guo-ping Jiang, Ming-chi Lin
Department of Economics and Management, Naval University of Engineering, Wuhan, Hubei, China, 430033 (gpjiang1029@163.com)
Abstract - The most important function of software metrics is to support quantitative decision-making in software project management. In this paper, we focus on identifying the most critical risk factors within a software project risk management framework based on metrics and Bayesian networks. Sensitivity analysis can be performed to study how sensitive the risk node's probability is to small changes in the probability parameters of the risk BN. For a risk BN of known structure and probability parameters, we first estimate the most probable risk scenario and then perform sensitivity analysis for the risk node. Once the critical risk factors are found, the risk monitoring and control process can concentrate on them.
Keywords - Bayesian Network, Critical Risk Factors, Most Probable Explanation, Software Metrics, Sensitivity Analysis
I. INTRODUCTION
Software project risk management has received considerable attention in the IT industry, and many research institutions have made substantial contributions in this domain. Software metrics are another important subject in software project management: anything that is to be controlled must first be measured. The most important requirement of software metrics is to support quantitative decisions in software project management, and this applies to software risk management as well. It is gradually being accepted in the software engineering community that software metrics are essential aids to managing software projects. Metrics can be used to evaluate the development process and the quality of software products. Metrics can also assist in risk management and control by identifying areas of possible risk and then prioritizing and tracking the risks.
A relatively new and promising approach to software project risk management that utilizes software metrics is the Bayesian network (BN, for short). A risk BN model can handle [6]:
• Diverse process and product variables;
• Empirical evidence and expert judgment;
• Genuine cause and effect relationships;
• Uncertainty;
• Incomplete information.
The aim of this paper is to discuss how to identify critical risk factors in a risk BN model based on software metrics. In Section II, we introduce the software project risk management framework, pose the problem of critical risk factor identification, and outline the technical route to solving it. In Section III, we discuss how to estimate the most probable risk scenario of the risk BN model. An identification method for critical risk factors based on BN sensitivity analysis is put forward in Section IV. In Section V, an illustrative example shows how the method can be used. In the last section, we summarize the paper.
Funding: Natural Science Fund of Naval University of Engineering, HGDYDJJ11011.
II. SOFTWARE PROJECT RISK MANAGEMENT FRAMEWORK
Bayesian networks have attracted attention in the artificial intelligence domain over the last 20 years. A BN is the product of graph theory and probability theory; it is a weighted causal relationship graph. Nodes in a BN denote variables in the problem domain, and arcs represent direct causal relations between variables. The essence of a BN is the joint probability distribution of the problem under study. To model a certain risk in the software development process as a BN, the first step is to analyze the causal relationships between the risk and its risk factors, and thus establish the model's topology; the next step is to derive a conditional probability table for every node by integrating historical data and expertise [9]. The characteristics of risks and risk factors can be described by software metrics, so we model the relevant software metrics into the risk BN.
During software development, managers observe and control these metrics at different stages so as to mitigate risks before they occur. Managers can also evaluate a risk's effect by observing changes in the software metrics. Every risk BN has a node that denotes the risk under study. In this paper, let $X_0$ be the risk node, let $\{X_0, X_1, X_2, \ldots, X_n\}$ be the set of nodes, and let $\{X_{m_1}, X_{m_2}, \ldots, X_{m_i}\} \subseteq \{X_1, X_2, \ldots, X_n\}$ be the set of metric nodes in the risk BN [4]. We suppose that the risk node is a leaf node. This is feasible because we do not currently model the risk's influence in the risk BN.
The risk management functions that can be performed with a risk BN are as follows [10]:
• Estimate risk probability;
• Estimate the most probable risk scenario;
• Identify critical risk factors;
• Appraise the risk control plan and support project risk management decisions.
Risk factors are the causes that produce the risk, and every risk has some critical risk factors. Critical risk factor identification is an important activity in risk management. How can the critical risk factors of a risk BN be found? Sensitivity analysis is the general method. For a Bayesian network, sensitivity analysis yields insight into the relation between the probability parameters of the network and its posterior marginals. In particular, it gives the relationship between the posterior marginal probability of the risk node and the probability parameters of the BN. The most sensitive probability parameters identify the critical risk factors. To perform sensitivity analysis for a risk BN is to calculate $\Pr(X_0 \mid O)$, where $O$ is the set of observed evidence. We limit evidence to software metrics. The result $\Pr(X_0 \mid O)$ is a function of the BN's probability parameters. The probability parameters that have larger coefficients are the sensitive parameters; for a risk BN, they are the critical risk factors.
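
To make the quantity $\Pr(X_0 \mid O)$ concrete, the following minimal sketch computes it by direct enumeration for a three-node network. The network, the node names F, M and R, and all probability values are hypothetical and chosen only for illustration; they are not taken from the paper's models.

```python
from itertools import product

# Hypothetical three-node risk BN (structure and numbers are illustrative only):
#   F = risk factor (root), M = metric node (child of F),
#   R = risk node X0 (leaf, child of F). All variables are binary.
P_F = {1: 0.3, 0: 0.7}                      # Pr(F)
P_M_GIVEN_F = {1: {1: 0.8, 0: 0.2},         # Pr(M | F=1)
               0: {1: 0.1, 0: 0.9}}         # Pr(M | F=0)
P_R_GIVEN_F = {1: {1: 0.6, 0: 0.4},         # Pr(R | F=1)
               0: {1: 0.05, 0: 0.95}}       # Pr(R | F=0)

def joint(f, m, r):
    """Joint probability of one complete assignment (product of the CPT entries)."""
    return P_F[f] * P_M_GIVEN_F[f][m] * P_R_GIVEN_F[f][r]

def risk_posterior(m_obs):
    """Pr(R = 1 | M = m_obs): sum the joint over the unobserved node F, then normalize."""
    numerator = sum(joint(f, m_obs, 1) for f in (0, 1))
    denominator = sum(joint(f, m_obs, r) for f, r in product((0, 1), repeat=2))
    return numerator / denominator

# Risk probability before any metric is observed (empty evidence set).
risk_prior = sum(joint(f, m, 1) for f, m in product((0, 1), repeat=2))

if __name__ == "__main__":
    print(f"Pr(R=1)       = {risk_prior:.4f}")
    print(f"Pr(R=1 | M=1) = {risk_posterior(1):.4f}")
```

Under these assumed numbers, observing the metric value M = 1 raises the risk probability above its prior value, which is the kind of update a manager would read from the risk BN. Enumeration is only practical for very small networks.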
There are three cases we may confront when identifying critical risk factors. In the first, some evidence has been observed, $O \neq \emptyset$, and we calculate $\Pr(X_0 \mid O)$. In the second, $O = \emptyset$ and the BN's probability parameters are not all known; that is, we calculate $\Pr(X_0 \mid \emptyset)$ with some unknown probability parameters. In the third, $O = \emptyset$ and the risk BN's probability parameters are all known. In this case, we can estimate the most probable risk scenario $(X_0 = x_0, X_1 = x_1, \ldots, X_n = x_n)$ for the risk BN, and then take the values of the metric nodes as evidence, letting $O = (X_{m_1} = x_{m_1}, \ldots, X_{m_i} = x_{m_i})$. In all three cases, the calculation of $\Pr(X_0 \mid O)$ is the focus.
III. ESTIMATE THE MOST PROBABLE RISK SCENARIO
For each value $x_0^k$ of the risk node, perform probabilistic inference to compute the most probable explanation $x^k = (x_0^k, x_1^k, \ldots, x_n^k)$ and the corresponding joint probability $p_k$. Comparing these $p_k$, the assignment $x^k$ that has the largest $p_k$ is the most probable risk scenario. So the essence of estimating the Most Probable Risk Scenario (MPRS, for short) for a given risk BN is to calculate the Most Probable Explanation (MPE, for short) of this BN given the evidence $X_0 = x_0^k$.
Definition 1 [3] (Most Probable Explanation): for a given Bayesian network, the Most Probable Explanation is a complete assignment $(X_0 = x_0, \ldots, X_n = x_n)$ which agrees with the available evidence and has the highest probability among all such assignments. Here the evidence is $X_0 = x_0^k$.
There are two solutions for this problem [7].
Solution 1: Chain rule
Obtain the joint probability distribution of the risk BN according to the chain rule, and then compare the joint probabilities of all complete assignments. The assignment $x = (x_0, x_1, \ldots, x_n)$ which has the highest joint probability is the most probable risk scenario.
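
As an illustration of Solution 1, the sketch below enumerates every complete assignment of a small binary risk BN, scores each one by the product of each node's conditional probability given its parents, and keeps the best assignment for each value of the risk node. The structure, the node names and the numbers are hypothetical, and the brute-force search is a stand-in for illustration only, not the elim-mpe algorithm of [7].

```python
from itertools import product

# Hypothetical binary risk BN; structure and numbers are illustrative only.
# X0 is the leaf risk node, X1 and X2 are risk factors.
PARENTS = {"X1": (), "X2": ("X1",), "X0": ("X1", "X2")}
# CPT[node][parent values] = Pr(node = 1 | parents)
CPT = {
    "X1": {(): 0.4},
    "X2": {(0,): 0.2, (1,): 0.7},
    "X0": {(0, 0): 0.05, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.9},
}
NODES = list(PARENTS)

def joint(assign):
    """Joint probability of a complete assignment: product of p(x_i | pa(X_i))."""
    p = 1.0
    for node in NODES:
        p_one = CPT[node][tuple(assign[q] for q in PARENTS[node])]
        p *= p_one if assign[node] == 1 else 1.0 - p_one
    return p

def mpe_given_risk(x0):
    """Most probable complete assignment that agrees with the evidence X0 = x0."""
    free = [n for n in NODES if n != "X0"]
    best = None
    for values in product((0, 1), repeat=len(free)):
        assign = dict(zip(free, values), X0=x0)
        p = joint(assign)
        if best is None or p > best[1]:
            best = (assign, p)
    return best

def most_probable_risk_scenario():
    """Compare the MPE found for each value of the risk node and keep the best."""
    return max((mpe_given_risk(x0) for x0 in (0, 1)), key=lambda pair: pair[1])

if __name__ == "__main__":
    scenario, probability = most_probable_risk_scenario()
    print(scenario, probability)
```

The search visits all $2^{n+1}$ assignments of a binary network, which is why this approach becomes impractical as the number of nodes grows.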
Chain rule:
$$\Pr(x) = \prod_{X_i \in V(G)} p(x_i \mid pa(X_i))$$
For a given BN, this method is simple and clear, but it requires a great deal of computation, especially when the BN has many nodes and every node has several values. We therefore consider another method, the Bucket Elimination algorithm.
Solution 2: Bucket Elimination
Bucket Elimination (BE, for short) is a variable elimination algorithm, one of many algorithms for probabilistic inference in BNs. Its most notable characteristics are simplicity and generality; unlike other algorithms, it does not introduce new terminology. Figure 1 shows the algorithm to compute the MPRS based on the BE algorithm, where elim-mpe is a BE algorithm introduced in reference [7]. Generally speaking, finding the best node elimination order is an NP-hard problem. However, the BE algorithm obtains the MPE result under any elimination order; only the computation time differs. For the problem studied in this paper, we obtain the node elimination order as follows: trace backward from the risk node to the root nodes along the arcs between nodes and, at every node, follow the left branch first and then the right. The order obtained by this rule may not be the best one, but it is sufficient for this problem.
IV. IDENTIFY CRITICAL RISK FACTORS
Sensitivity analysis is a general technique for studying the effects of inaccuracies in the parameters of a mathematical model on the model's output. It basically amounts to systematically varying the values of the parameters of the model under study. For a BN, sensitivity analysis amounts to varying the assessments of the conditional probabilities of the network's nodes [8]. It allows the effects of inaccuracy in the specified assessments on a probability of interest to be studied. Sensitivity analysis can be performed to study how sensitive the risk node's probability is to small changes in the probability parameters of the risk BN.
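
The idea of systematically varying a parameter can be sketched numerically. The example below perturbs one conditional probability at a time in a small hypothetical risk BN and estimates the resulting change in the risk posterior with a central finite difference. This numerical approach is a simple stand-in for illustration and is not the symbolic method used in this paper; the network, the parameter names and the values are all assumptions.

```python
# Hypothetical risk BN: F -> M (metric node) and F -> R (risk node X0).
# Each entry is the probability that a binary variable takes the value 1;
# the complementary entry changes implicitly when a parameter is varied.
BASE = {
    "pF": 0.3,      # Pr(F = 1)
    "pM_F1": 0.8,   # Pr(M = 1 | F = 1)
    "pM_F0": 0.1,   # Pr(M = 1 | F = 0)
    "pR_F1": 0.6,   # Pr(R = 1 | F = 1)
    "pR_F0": 0.05,  # Pr(R = 1 | F = 0)
}

def risk_posterior(params):
    """Pr(R = 1 | M = 1) under the given parameter values."""
    w1 = params["pF"] * params["pM_F1"]          # Pr(F = 1, M = 1)
    w0 = (1 - params["pF"]) * params["pM_F0"]    # Pr(F = 0, M = 1)
    return (w1 * params["pR_F1"] + w0 * params["pR_F0"]) / (w1 + w0)

def sensitivity(name, h=1e-6):
    """Central finite-difference derivative of the posterior w.r.t. one parameter."""
    up = dict(BASE, **{name: BASE[name] + h})
    down = dict(BASE, **{name: BASE[name] - h})
    return (risk_posterior(up) - risk_posterior(down)) / (2 * h)

# Parameters ordered from most to least influential on the risk posterior.
ranking = sorted(BASE, key=lambda name: abs(sensitivity(name)), reverse=True)

if __name__ == "__main__":
    for name in ranking:
        print(f"{name:6s} d Pr(R=1 | M=1) / d {name} = {sensitivity(name):+.4f}")
```

Under these assumed values, the posterior reacts most strongly to Pr(M = 1 | F = 0), so that parameter would be the first candidate for closer attention. A different network or different evidence can give a different ranking.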
The posterior probability $\Pr(X_0 \mid O)$ of the risk node $X_0$ in a risk BN given evidence $O$ is the occurrence probability of the risk under study. If $\Pr(X_0 \mid O)$ is sensitive to a specific parameter, then a small change in this parameter leads to a large change in $\Pr(X_0 \mid O)$. That is to say, this parameter is critical to $\Pr(X_0 \mid O)$, and it is a critical risk factor of the risk under study. According to previous research [1],[10], the following proposition holds.
Proposition: Let $B$ be a Bayesian network with digraph $G = (V(G), A(G))$ and let $\Pr$ be the joint probability distribution defined by $B$. Let $O \subseteq V(G)$ be the observed nodes in $G$ and let $o$ denote the corresponding observations. Let $V_r$ be the network's node of interest and let $Sen(V_r, O)$ be the sensitivity set for $V_r$ given $O$. Then for any value $v_r$ of $V_r$, we have that
$$\Pr(v_r \mid o) = \frac{a x + b}{c x + d}$$
for every conditional probability $x = \Pr(v_s \mid \pi_s)$ of every node $V_s \in Sen(V_r, O)$, where $a$, $b$, $c$ and $d$ are constants that depend upon the values $v_s$ of $V_s$ and $\pi_s$.
In this paper, the node of interest is the risk node $X_0$ and the observed nodes are the metric nodes, $O = (X_{m_1}, X_{m_2}, \ldots, X_{m_i})$. The problem of the risk node's sensitivity to $O$ amounts to calculating $\Pr(x_0 \mid o)$. Figure 1 shows the sensitivity analysis algorithm based on symbolic probabilistic inference (SPI, for short) for computing critical risk factors [1].
Algorithm 1
Input: A risk Bayesian network $G = (V(G), A(G))$, a target node, namely the risk node $X_0 = x_0$, and an evidential set $O = o$ (possibly empty).
Output: A set $S$ of the most sensitive probability parameters.
Initialize $S = \emptyset$;
Calculate the probabilities $\Pr(x_0 \mid o)$:
  Step 1: Identify the set of relevant nodes $Sen(X_0, O)$.
  Step 2: Identify the set of sufficient parameters.
    Rule 1: Eliminate the parameters $\theta_{ij\pi}$ if $x_i \neq j$, for $X_i \in O$.
    Rule 2: Eliminate the parameters if their parents' instantiations are incompatible with the evidence.
  Step 3: Identify the feasible monomials.
    Rule 3: Parameters associated with contradicting conditioning instantiations cannot appear in the same monomial.
  Step 4: Compute the polynomial coefficients.
Normalize the coefficients of the polynomial function of $\Pr(x_0 \mid o)$;
Initialize the threshold $r_0$;
For each monomial $m_i$ in the polynomial function of $\Pr(x_0 \mid o)$ and its coefficient $c_i$:
  If $c_i \geq r_0$, then $S = S \cup \{m_i\}$.
Fig. 1. Sensitivity Analysis Algorithm [1]
Step 1 identifies the relevant nodes $Sen(X_0, O)$, because not every node in the risk BN is relevant to the calculation of $\Pr(x_0 \mid o)$. The remaining nodes are not considered further. Add a dummy node for every node in the BN together with an arc pointing from the dummy node to that node, thus changing the risk BN into a new DAG $G'$. All nodes that are not d-separated from $X_0$ by $O$ are relevant to the calculation of $\Pr(x_0 \mid o)$; they are denoted $Sen(X_0, O)$.
V. ILLUSTRATION EXAMPLE
We use the simplified defects BN introduced in reference [5], shown in Figure 2, to show how our method can be used.
Fig. 2. Defects BN ([5])
We name the eight discrete variables $X_1, X_2, \ldots, X_8$ respectively, which changes Figure 2 into Figure 3. The probability parameters of this BN are known, and each variable is binary with value 0 or 1. Nodes $X_1$, $X_3$, $X_4$, $X_5$ and $X_7$ are metric nodes.
Fig. 3. Defects BN
Taking the node elimination order $\{X_8, X_7, X_6, X_5, X_4, X_2, X_1, X_3\}$, the Bucket Elimination algorithm computes the most probable risk scenario of the defects BN as $(0,1,1,0,1,1,0,1)$. Let the evidence $O = (X_1, X_3, X_4, X_5, X_7)$ be the set of observed metrics, with $o = (0,1,0,1,0)$. Finding the critical risk factors requires calculating $\Pr(x_8 \mid o)$; the essence of the problem is to perform sensitivity analysis of $X_8$ with the evidence $O$. Applying the algorithm presented in Section IV, $Sen(X_8, O) = \{X_2, X_5, X_6, X_8\}$. The feasible monomials are listed in Table 1, where $\theta_{ij\pi} = \Pr(x_i = j \mid pa(x_i) = \pi)$.
TABLE 1 Feasible monomials
Evidence: $X_1 = 0$, $X_3 = 1$, $X_4 = 0$, $X_5 = 1$, $X_7 = 0$
Monomials:
  $\{\theta_{2001}\} * \{\theta_{5100}\} * \{\theta_{6010}\} * \{\theta_{8000}, \theta_{8100}\}$
  $\{\theta_{2001}\} * \{\theta_{5100}\} * \{\theta_{6110}\} * \{\theta_{8001}, \theta_{8101}\}$
  $\{\theta_{2101}\} * \{\theta_{5101}\} * \{\theta_{6011}\} * \{\theta_{8000}, \theta_{8100}\}$
  $\{\theta_{2101}\} * \{\theta_{5101}\} * \{\theta_{6111}\} * \{\theta_{8001}, \theta_{8101}\}$
According to the value of $X_8$, we divide the set of monomials into two subsets, $M_0$ (for $X_8 = 0$) and $M_1$ (for $X_8 = 1$):
$$M_1 = \{\theta_{2001}\theta_{5100}\theta_{6010}\theta_{8100},\ \theta_{2001}\theta_{5100}\theta_{6110}\theta_{8101},\ \theta_{2101}\theta_{5101}\theta_{6011}\theta_{8100},\ \theta_{2101}\theta_{5101}\theta_{6111}\theta_{8101}\}$$
and
$$\Pr(x_8 = 1 \mid o) = C_1\,\theta_{2001}\theta_{5100}\theta_{6010}\theta_{8100} + C_2\,\theta_{2001}\theta_{5100}\theta_{6110}\theta_{8101} + C_3\,\theta_{2101}\theta_{5101}\theta_{6011}\theta_{8100} + C_4\,\theta_{2101}\theta_{5101}\theta_{6111}\theta_{8101}$$
Computing the coefficients of this function, we get the final expression:
$$\Pr(x_8 = 1 \mid o) = \theta_{2001}\theta_{5100}\theta_{6010}\theta_{8100} + \theta_{2001}\theta_{5100}\theta_{6110}\theta_{8101} + \theta_{2101}\theta_{5101}\theta_{6011}\theta_{8100} + \theta_{2101}\theta_{5101}\theta_{6111}\theta_{8101}$$
Because we regard all the nodes in this risk BN as choice nodes, every coefficient [10] equals 1. Similarly, we get
$$\Pr(x_8 = 0 \mid o) = \theta_{2001}\theta_{5100}\theta_{6010}\theta_{8000} + \theta_{2001}\theta_{5100}\theta_{6110}\theta_{8001} + \theta_{2101}\theta_{5101}\theta_{6011}\theta_{8000} + \theta_{2101}\theta_{5101}\theta_{6111}\theta_{8001}$$
Normalizing the above equations, we get:
$$\Pr(x_8 = 1 \mid o) = \frac{\theta_{2001}\theta_{5100}\theta_{6010}\theta_{8100} + \theta_{2001}\theta_{5100}\theta_{6110}\theta_{8101} + \theta_{2101}\theta_{5101}\theta_{6011}\theta_{8100} + \theta_{2101}\theta_{5101}\theta_{6111}\theta_{8101}}{\theta_{2001}\theta_{5100} + \theta_{2101}\theta_{5101}}$$
From the above results it is not easy to identify which parameter is most critical to $\Pr(x_8 \mid o)$; for this particular example, we can only conclude that $X_2$ and $X_6$ are critical to the risk's occurrence. Therefore, in the software development process, managers should pay more attention to the introduced defects and the residual defects, and take corresponding control steps according to the ranges of both.
VI. CONCLUSION
In this paper we discuss a method for identifying critical risk factors in a software risk management framework based on Bayesian networks. During software project development, managers collect the relevant metrics and establish a BN for the risk being studied by integrating historical data and the causal relationships between risk factors. They can then analyze, appraise and dynamically control the risk using this risk BN. Identifying critical risk factors is a very important risk management activity.
Sensitivity analysis yields the relationship between the marginal probability of the node of interest and the other probability parameters in the BN. We therefore use sensitivity analysis of the risk node to investigate the dependency of the risk probability on the probability parameters in the risk BN. Depending on the application case, sensitivity analysis is performed on the risk BN using the symbolic probabilistic inference algorithm. For the case of known probability parameters and no evidence, we first estimate the most probable risk scenario and then perform sensitivity analysis to obtain the sensitive parameters of the risk node. We present a rule for obtaining the node elimination order and a method for identifying critical risk factors. We also apply this method to a known defects BN, and the result shows that the method is feasible.
REFERENCES
[1] Enrique Castillo, José Manuel Gutiérrez and Ali S. Hadi, “Sensitivity analysis in discrete Bayesian networks,” IEEE Transactions on Systems, Man, and Cybernetics 27(4), pp.412-423.
[2] Haipeng Guo, William Hsu, “A survey of algorithms for real-time Bayesian network inference,” AAAI/KDD/UAI-2002 Joint Workshop on Real-Time Decision Support and Diagnosis Systems, Edmonton, July 2002.
[3] Kalev Kask, R. Dechter, “Stochastic local search for Bayesian networks,” in Proceedings of Uncertainty in AI (UAI-97).
[4] Linda H. Rosenberg, Lawrence E. Hyatt, “Software metrics program for risk assessment,” the 47th International Astronautical Congress & Exhibition, 29th Safety and Rescue Symposium, Risk Management and Assessment Session, Beijing, China, October 1996.
[5] Norman Fenton, Martin Neil, “Software metrics and risk,” 2nd European Software Measurement Conference, 8 October, 1999.
[6] Norman E. Fenton, Martin Neil, “Software metrics: roadmap,” Proceedings of the Conference on the Future of Software Engineering, 2000, pp.357-370.
[7] Rina Dechter, Irina Rish, “A scheme for approximating probabilistic inference,” in Proceedings of Uncertainty in AI 1997 (UAI-97), pp.132-141.
[8] Veerle M.H. Coupé, Linda C. van der Gaag, “Properties of sensitivity analysis of Bayesian belief networks,” Annals of Mathematics and Artificial Intelligence 36, 2002, pp.323-356.
[9] Jiang Guoping, Chen Yingwu, “Coordinate metrics and process model to manage software project risk,” International Engineering Management Conference 2004.
[10] Jiang Guoping, Chen Yingwu, “Software project risk evaluation based on object oriented BN,” Systems Engineering and Electronic Technology, 2005(1) (in Chinese).