International Journal of Software Engineering and Its Applications Vol. 7, No. 1, January, 2013 A Risk Classification Scheme for Software Projects Masood Uzzafer School of Computer Science, University of Nottingham, UK Keyx8muz@nottingham.edu.my Abstract This paper presents a risk classification scheme for software projects that utilizes the dependence of risk events to occur on other risk events and the statistical dependence of impacts of risk events. The proposed risk classification scheme helps to avoid the under and over estimations of the overall impact of dependent risk events. Further, the proposed risk classification scheme protects against under and over counting of the impacts of statistically dependent risk events of software projects. The proposed risk classification scheme utilizes the Software Engineering Institute’s risk taxonomy and shows that different classes of the risk taxonomy are independent having statistically independent impacts while different attributes of a taxonomy class are independent having statistically dependent risk impacts. Keywords: Software Risk Classification, Software Risk Assessment, Software Risk Management 1. Introduction The central notion of risk is that it is an event which may or may not take place hence its occurrence is uncertain and its occurrence brings out adverse monetary consequences to a software project. Monetary loss due to risk events is defined in terms of impact and probability of impact of risk events. The actual risk impact cannot be realized unless the risk event takes place; this inability to fully understand risk events and their impacts before they occur is the uncertainty around the risk events [10]. Software risk assessment is an integral part of risk management of software projects. Software risk assessment has two main components namely: qualitative risk assessment and quantitative risk assessment. Qualitative risk assessment focuses on the identification of risk events, while quantitative risk assessment studies the impacts and probabilities of impacts of the identified risk events. Researchers have proposed various risk assessment models for software projects. Using different techniques, software risk assessment models apply qualitative and quantitative assessments to model the impact of risk events of software projects [1, 2, 3]. While traditional risk assessment models provide a foundation for the assessment of risk of software development and contractual projects, they lack the capability to classify the risk events based on their dependence to occur on other risk events. Further, traditional risk assessment models do not rely on the statistical dependence of the impact of a risk event on other risk events. Hence, qualitatively, the risk events are not classified based on their dependence to occur on other risk events, while, quantitatively, impacts of risk events are not treated based on the statistical dependence of impacts on the impacts of other risk events. The dependent risk events are caused by other risk events, whereas; the independent risk events are independent of the occurrences of other risk events. Therefore, dependence is a situation in which a risk is physically dependent on other risk events for its existence. Whereas, the independent risk events can occur without the presence and occurrence of other risk events. The dependence does not imply that a risk event must occur if risk event(s) it is 57 International Journal of Software Engineering and Its Applications Vol. 7, No. 1, January, 2013 dependent upon has occurred. Further, the definition of dependence and independence does not imply the probabilistic dependence and independence, which deals with the probability of occurrence of events and requires a probabilistic condition to be satisfied [4]. Therefore, dependence of a risk event does not make the occurrence of the risk event more probable should the risk events it is dependent upon have occurred; rather it states that the risk event can occur only after risk events it is dependent upon have occurred. The statistically dependent risk events are defined as events that have correlated impacts; whereas, statistically independent risk events have uncorrelated impacts. In-other words, the statistical dependence define the correlation among impacts of risk events, which is the degree of linearity of the impacts and it measures the influence of the impact of a risk event on the impacts of other risk events. The statistical dependence or correlated impact is not the probabilistic dependence, which requires correlated events to fulfill a probabilistic condition [4]. Hence, the correlation among the impacts of risk events is not the likelihood of the impact of a risk event to occur based on the impact of other risk events [9]. Therefore, the correlation of impacts of risk events does not reflect the probabilistic dependence of impacts of risk events. Alternatively, probabilistic independence of impacts of risk events does not cause uncorrelation in the impacts of risk events. The proposed risk classification scheme qualitatively defines risk events as dependent or independent and quantitatively it defines the impacts of risk events as statistically dependent (correlated) or statistically independent (uncorrelated). The risk classification scheme classify the risk events using the risk taxonomy proposed by Software Engineering Institute (SEI) [5, 6]. The SEI’s risk taxonomy is a process for the identification of risk events of software projects. The risk taxonomy groups together different sets of risk and each set is called class where every class has different subsets of risks called attributes and each attribute has multiple elements that represents risk events of software projects. The proposed risk classification, when deployed with the SEI’s risk taxonomy, shows that the overall risk due to each class is independent of the risks of other SEI risk classes also the risk due to different elements of a class is independent of the elements of the same class . Further, it shows that the impacts of the risk events of different taxonomy classes are statistically independent and impacts of risk events within a taxonomy class are statistically dependent. The paper is organized as follows: Section 2 presents the risk classification scheme based on the dependence of risk events and statistical dependence of impacts of risk events. Section 3 adopts the proposed classification scheme using SEI’s risk taxonomy for the classification of risk events and their impacts, finally Section 4 draw some conclusions. 2. Risk Classification 2.1 Dependent and Indepdent Risk Events There are risk events that do not occur unless they are triggered by other risk events. Such risk events become a threat to the software project after risk events on which they are dependent upon have occurred. Therefore, dependent risk events are caused by other risk events; their occurrence is dependent on the occurrence of other risk events and they do not exist unless other risk events trigger them. Contrary to that, the independent risk events are independent of other risk events and their occurrences are independent of the existence and the occurrence of other risk events. Therefore, based on the dependence to occur on other risk events, the proposed scheme classifies risk events as dependent or independent. Figure 1 illustrates a scenario of dependent and independent risk events where each node represents risk events and branches connecting 58 International Journal of Software Engineering and Its Applications Vol. 7, No. 1, January, 2013 the risk events show the dependence such that the risk event at the end of the branch is dependent on the risk event at the head of the branch. Each branch also has respective probabilities of occurrence. The risk events B and C are dependent on the risk event A and the probability of occurrence of the risk events B and C, are conditional probabilities with the occurrence probability of risk event A, such that P(B|A)=0.6 and P(C|A)=0.4, respectively. Further, the risk events B and C are mutually exclusive; hence they cannot occur simultaneously. The risk event B further triggers the risk events B1, B2 and B3 with respective probabilities, where the risk events B1, B2 and B3 are mutually exclusive. The risk event D is independent of the risk event A and hence independent of the existence and occurrence of the risk events B, B1, B2, B3 and C. Due to the independence of the risk events A and D, the occurrence of the risk event A do not change the probability of occurrence of risk event D, therefore P(D|A)=P(D). Further, the risk events D1 and D2 are dependent on the risk event D but are independent of the risk events A, B, C, B1, B2 and B3. Figure 1. Dependent and Independent Risk Classifying risk events based on their dependence helps to avoid the under and excessive risk estimates. For example, if a risk event that is dependent on other risk event(s) is treated as independent it produces excessive risk estimate. The risk estimate can be measured with risk exposure ( RE ), which is a measure of expected loss due to a risk event and defined as follows, where Pr is the probability of loss and L is the monetary loss due to the risk event [1]: RE Pr* L (1) Consider a software project having risk events shown in Figure 1 with the respective probabilities and each branch of risk causes $10,000 in monetary losses. Therefore, the risk exposure of risk event B1 is estimated as $10,000*0.2=$2000; hence, the risk exposure due to risk event A that causes risk events B1, B2, B3, B and C is the sum of the risk exposures of these risk events, which is $21,500. In-addition, the risk event D occurs independently of risk event A and causes risk events D1 and D2 with a total risk exposure of $10,000. Therefore, the total risk exposure of the software project is $21,500+$10,000=$31,500. Table 1, shows the risk exposures due to different risk events for this risk scenario. Now consider an alternate scenario where the risk event C is not treated as dependent on risk event A, rather as independent, as shown in Figure 2. Therefore, the probability from risk A to risk event B becomes 1, which does not mean that the risk event B is almost always certain to occur should risk event A occur, rather it explains the risk event B is the only event that can take place as a consequence of occurrence of risk event A. Table 2, shows the risk exposures due to different risk events for this risk scenario. The risk exposures are compiled for this scenario which reveals that the exposure to risk due to risk event A is $21,500, while for risk events C and D has each $10,000 risk exposure. Therefore, the total risk exposure for this risk scenario is $21,500+$10,000+$10,000 =$41,500. This scenario presents when a dependent risk is modelled as an independent risk event that causes excessive risk estimate of 59 International Journal of Software Engineering and Its Applications Vol. 7, No. 1, January, 2013 $10,000. Alternatively, when an independent risk event is treated as dependent it would produce an under estimation of the overall risk. Table 1. Risk Exposures of Risk Events A D Pr Risk Exposure ($) =Pr*10,000 B1 0.2 2,000 B2 0.75 7,500 B3 0.05 2,000 B 0.6 6,000 C 0.4 4,000 D1 0.7 7,000 D2 0.3 3,000 Total 31,500 Figure 2. Dependent and Independent Risk Table 2. Risk Exposures of the Alternate Scenario Pr A C D Total Risk Exposure ($) =Pr*10,000 B1 0.2 B2 0.75 2,000 7,500 B3 0.05 2,000 B 1 10,000 C 1 10,000 D1 0.7 7,000 D2 0.3 3,000 41,500 2.2 Statistically Dependent and Statistically Independent Risk Events The statistically dependent risk events are said to have correlated impacts such that their impacts have positive or negative correlations. The impact of a risk event has direct consequences on the impacts of risk events that have statistically dependent impacts, whereas, the impacts of statistically independent risk events have no correlation with each other. Therefore, based on the statistical properties of impacts of risk events, the scheme classifies risk events as statistically dependent or statistically independent. Figure 3 shows that the 60 International Journal of Software Engineering and Its Applications Vol. 7, No. 1, January, 2013 impacts of the risk events A and B shows positive correlation, therefore, a large impact of risk event A causes a large impact of risk event B and vice versa. Whereas, the impacts of the risk events E and D shows no correlation; hence, the risk events E and D are statistically independent and have uncorrelated impacts. Figure 3. Impacts of Statistically Dependent and Independent Risk Events The correlation could be positive or negative, a positive correlation means that the risk impact moves in the same direction while negative correlation means that the risk impacts moves in the opposite direction i.e., one increases while other decreases. The impacts of statistically dependent risk events are influenced by the impacts of risk events they have correlations with. The statistically dependent risk events measure the same underline risk of a software project depending upon their correlation. Therefore, considering these risk events together could cause double counting or under counting the impact of risk events. Therefore, the impacts of such risk events should be considered according to their correlation with each other. Whereas, the statistically independent risk events measures different attributes of the overall software project risk. The overall impact of statistically dependent risk events should be estimated according to their correlation. For example, consider risk events A and B with risk impacts modelled as random variables X and Y , having probability distributions FX (x ) and FY ( y ) , respectively. The risk impacts X and Y has certain degree of correlation in their impacts as shown by shaded area in Figure 4(a), which represents the influence of risk events over each other. Therefore, to get the overall impact of X and Y their correlation must be considered. This is achieved by defining the conditional probability distribution of one random variable conditioned over the other random variable. For example, the conditional probability distribution of Y , FY ( y | x ) , allows the selection of Y samples given the value of X that preserves the correlation between X and Y . Figure 4. Correlated Impacts of Risk Events X and Y 61 International Journal of Software Engineering and Its Applications Vol. 7, No. 1, January, 2013 If the samples of Y are selected without considering the conditional distribution, it can cause uncorrelated sample selection, which leads to under counting or double counting of the overall risk. The under counting and double counting depends upon the sampled value of Y . For example, a random value of Y is sampled from the distribution FY ( y ) , instead of sampling it from the conditional distribution, FY ( y | x ) , may change the correlation with X . If the actual correlation is low, and a sample of Y is selected that causes high correlation with X , it produces a situation where Y is falsely appeared to have more influence over X , as shown with the shaded area in Figure 4(b), which would mean that the values of Y increases or decreases with the values of X . A reverse situation is shown on Figure 4(c), where the actual correlation is high but the sampled value of Y produces a low correlation with X as shown in the shaded area, which, would mean that the values of Y will not move as fast as the values of X due to low correlation. These situations cause an under counting or double counting of the overall risk impact depending upon the sampled value of Y . Conditional probability distributions help to avoid selection of un-correlated samples where random variables have correlations. Consider risk impacts of X and Y shown in Table3 with their respective distributions f X (x ) and f Y ( y ) on the side, along with their joint probability distribution f X ,Y (x , y ) shown in the middle. Table 3. X and Y with Respective Distributions 0.1 0.1 0.12 0.2 0.4 Y 0.5 0.7 0.35 0.25 0.1 0.1 0.1 0.4 0.1 0.1 0.5 0.1 0.1 0.6 0.1 0.1 0.7 0.1 0.1 0.8 0.1 0.1 0.9 0.1 0.1 1 0.1 0.1 0.1 f(x) 0.1 0.3 f(y) 0.18 0.1 0.2 X 0.3 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 Table 4 shows the conditional distribution f Y |X ( y | x ) estimated from the marginal and joint distributions f X (x ) , f Y ( y ) and f X ,Y (x , y ) , respectively. Such that f Y |X ( y | x ) f X ,Y (x , y ) f X (x ) (2) Each column represents the conditional distribution i.e., the first column is the conditional ) distribution when X is sampled as 0.1, i.e., f Y |X ( y |x 0 . .1 Therefore, the conational samples of Y that are correlated with x 0.1 will be selected. 62 International Journal of Software Engineering and Its Applications Vol. 7, No. 1, January, 2013 Table 4. Conditional Distribution FY ( y | x ) fY|X(y|x) 0.1 0.1 0.12 0.2 0.4 0.5 0.7 0.35 0.25 1 0.1 1 0.4 0.1 1 0.5 0.1 1 0.1 0.6 1 0.7 0.1 1 0.8 0.1 1 0.9 0.1 1 1 0.1 1 0.1 f(x) 0.1 0.3 f(y) 0.18 1 0.2 X 0.3 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 Consider, the random impact of a risk event modelled with random variable, X i , having a known probability distribution, f X (x i ) , where i is the index of the risk event and x i is the single realization of the random variable, X i , at some time instance. Similarly, the overall random impact of all the risk events is a random variable, X , having probability distribution, f X (x ) . The probability distribution, f X (x ) , of the overall random risk impact, X , is the product of the marginal probability distributions, f X (x i ) , of all the statistically independent risk events [4, 8], i i f X (x ) i f X i (x i ) (3) The probability distribution, f X (x ) , of the overall random risk impact due to all of the statistically dependent risk events is the joint probability distribution of the marginal distributions, f X (x i ) , of the impact of statistically dependent risk events of the software project, i f X (x ) f X 1 ,...X i (x 1,..., x i )i (4) When joint probability distribution of the impact of statistically dependent random risk events is fully specified, then equation (2) can be expanded as follows, where x 1 and x 2 can be generated independently: f X (x ) f X 1 (x 1 )f X 2|X 1 (x 2 | x 1 )...f Xi |X (i 1) (x i | x (i 1) ) (5) 63 International Journal of Software Engineering and Its Applications Vol. 7, No. 1, January, 2013 From equation (2) it follows: f X 2|X 1 (x 2 | x 1 ) f (x 1 , x 2 ) f X 1 (x 1 ) (6) 3. Risk Classifications Based on the above definitions, the risk events of a software project are classified in the following four categories: Independent and Statistically Independent (ISI): these risk events are not caused by other risk event(s) also their impacts have no correlation with other risk event(s). Dependent and Statistically Independent (DSI): such risk events are caused by other risk event(s) and their impacts are also dependent on other risk event(s). Their impacts may be correlated with the risk event(s) that causes them to occur or may be correlated with the risk event(s) that did not cause them to occur. Independent and Statistically Dependent (ISD): these risk events are not caused by other risk event(s) hence they are independent yet their impacts are correlated with the impact(s) of other risk event(s). Dependent and Statistically Dependent (DSD): such risk events depend on other risk event(s) to occur, once occurred their impacts are correlated with the impact(s) of other risk event(s). Their risk impacts may be correlated with the impact(s) of the risk event(s) that causes them to occur or on the risk event(s) that did not cause them to occur. 3.1. The Risk Classification Scheme Based on the risk classification, a scheme is presented here for the risk classification of software projects. The proposed scheme utilizes the Software Engineering Institute’s (SEI) risk identification procedure. SEI provides a questionnaire based approach, called Taxonomy Based Questionnaire (TBQ), for the systematic identification of risks of software projects [5, 6]. The TBQ consists of a list of questions to elicit risks in each taxonomy category. The TBQ ensures that all the software project risks are systematically addressed and identified. The SEI questionnaire consists of 194 questions addressing different attributes of risks of the software project. The SEI risk taxonomy TBQ consists of three main categories of the risk called classes. Each class is further divided into 13 elements, and each element is characterized by a set of attributes as shown in Table 5. All together there are three classes, 13 elements and 64 attributes. Three classes of the TBQ are, Product Engineering (PE), Development Environment (DE) and Program Constraints (PC). The PE class has 5 elements namely: Requirements, Design, Code and Unit Test, Integration and Test and Engineering Specialties, whereas the DE class has 5 elements and the PC class has 3 elements. Further, the Requirements element of PE class has following attributes: Stability, Completeness, Clarity, Validity, Feasibility, Precedence and Scale. Similarly, the other elements of other SEI classes are also characterized by different attributes. The risk events are identified through a set of TBQ questions associated with an attribute where each attribute of the SEI class is described as a potential risk event, therefore, each SEI class is a collection of risk events. 64 International Journal of Software Engineering and Its Applications Vol. 7, No. 1, January, 2013 Table 5. SEI Rsk Taxonomy Product Engineering 1. Requirements a. Stability b. Completeness c. Clarity d. Validity e. Feasibility f. Precedent g. Scale 2. Design a. Functionality b. Difficulty c. Interfaces d. Performance e. Testability f. Hardware Constraints g. Non-Development Software 3. Code and Unit Test a. Feasibility b. Testing c. Coding / Implementation 4. Integration and Test a. Environment b. Product c. System 5. Engineering Specialties a. Maintainability b. Reliability c. Safety d. Security e. Human Factors f. Specifications Development Environment 1. Development Process a. Formality b. Suitability c. Process Control d. Familiarity e. Product Control 2. Development System a. Capacity b. Suitability c. Usability d. Familiarity e. Reliability f. System Support g. Deliverability 3. Management Process a. Planning b. Project Organization c. Management Experience d. Program Interfaces 4. Management Methods a. Monitoring b. Personnel Management c. Quality Assurance d. Configuration Management 5. Work Environment a. Quality Attitude b. Cooperation c. Communication d. Morale Program Constraints 1. Resources a. Schedule b. Staff c. Budget d. Facilities 2. Contract a. Type of Contract b. Restrictions c. Dependencies 3. Program Interfaces a. Customer b. Associate Contractors c. Subcontractors d. Prime Contractor e. Corporate Management f. Vendors g. Politics The PE class deals with the software development activities to develop the software that satisfies the specified software requirements. The risk events associated with the PE class arise from unstable or changing requirements, poor software design and inadequate testing coverage, which are characterized by attributes of the PE class. The DE class is concerned with the practice and the processes used to develop the software. The risk events of the DE class are due to the lack of defined developmental process, poor process control, lack of communication and quality concerns, etc. Whereas, the PC class attributes are external events having an impact on the software project. These risk events may be beyond the control of the software project management but still contributes to the overall risk of the software project. The elements and the attributes of SEI’s risk taxonomy classes are related hence they have correlated impacts. For example, the Stability attribute of Requirement element has two questions: Q.1: Are requirements stable? Q.2: Are the external interfaces changing? These questions help to understand the risk event that is caused by the stability of the software requirements. The stability risk event is independent of the other attributes of the Requirement element. For example, the Completeness attribute of the Requirements element does not trigger the stability of the requirement while the Completeness and Stability of the 65 International Journal of Software Engineering and Its Applications Vol. 7, No. 1, January, 2013 software project requirement are statistically dependent as question 5 and 6 of the Completeness attribute explains: Q.5: Does the customer have unwritten requirements? Q.6: Are the external interfaces completely defined? Customer’s unwritten requirements, Q.5, have a consequence on the Stability of the requirement, Q.1. Similarly, if the external interfaces are not completely defined, Q.6, it causes the external interfaces to change, Q.2. Therefore, questions 5 and 6 of the Completeness element are correlated with Q. 1 and Q.2 of the Stability attribute. Hence, the Stability and Completeness attributes are statistically dependent as they measure the same underline risk of the software project. Hence, the risk events identified through the elements and the attributes of a SEI risk class are independent and statistically dependent (ISD). The description of SEI classes shows that the attributes of a SEI class are independent of the attributes of other SEI classes. Therefore, the overall risk impact of a SEI risk class due to all of its risk events is statistically independent of the overall risk impact of other SEI risk classes. Therefore, based on the risk classification discussed in Section II, the overall risk impact of the SEI classes is defined as independent and statistically independent (ISI) as they corresponds to unique set of risks of software projects. When the probability distribution of impacts of all the risk events are specified the joint distribution of the overall risk due to all the SEI classes is the product of all the marginal distributions of each SEI class, as represented by equation (2), which models the overall risk of the software project. f X (x ) f PE (x PE )f DE (x DE )f PC (x PC ) (7) The risk events identified through the attributes of a SEI class are classified as ISD’s because they are independent but their impacts are correlated with each as they capture the same underlying risk of the software project. Therefore, the probability distribution of their combine impacts is the joint distribution as presented by equation (3). This joint distribution models the overall risk impact of a SEI risk class that is due to all the attributes of that class. 4. Conclusions A scheme to classify the risk events based on the (in)dependence and statistical (in)dependence is presented. The proposed scheme classifies the risk events of software projects based on their (in)dependence and statistical (in)dependence on other risk events of the software project and helps to avoid double counting the impact of risk events. The scheme relies on the SEI’s TBQ for the risk identification and classifies the risk events of SEI classes and attributes based on their (in)dependence and statistical (in)dependence. The SEI classes are classified as independent and statistically independent (ISI), while the attributes of a SEI class are classified as independent and statistically dependent on the attributes of the same SEI class (ISD). References [1] B. W. Boehm, “Software Risk Management: Principles and Practices”, IEEE Software, vol. 8, Issue 1, (1991) January, pp. 32-41. [2] R. Fairley, “Risk Management for Software Projects”, IEEE Software, vol. 11, Issue 3, (1994) May, pp. 57-67. [3] D. Gupta and M. Sadiq, “Software Risk Assessment and Estimation Model”, Proceedings International Conference on Computer Science and Information Technology, ICCSIT (2008) 29th August- 2nd September, Singapore. 66 International Journal of Software Engineering and Its Applications Vol. 7, No. 1, January, 2013 [4] A. Papoulis, “Probability Random Variables and Stochastic Process”, 3rd Edition, McGraw Hill Companies, (1991). [5] R. Williams, G. J. Pandelios and S. G. Behrens, “Software Engineering Institute”, Software Risk Evaluation Method Description, version 2.0, CMU/SEI-99-TR-029, (1999). [6] M. Carr, S. Konda, I. Monarch, C. F. Walker and F. C. Ulrich, “Software Engineering Institute”, Taxonomy Based Risk Identification, Technical Report SEI-93-TR-006, (1993). [8] D. C. Montgomery and G. C. Runger, “Applied Statistics and Probability for Engineers”, 3rd Edition. John Wiley & Sons Inc., (2006). [9] C. Rodger and J. Petch, “Uncertainty and Risk Analysis”, Business Dynamics, PricewaterhouseCoopers, (1999) April. [10] B. Kitchenham and S. Linkman, “Estimates Uncertainty and Risk”, IEEE Software, vol. 14, Issue 3, (1997) May/June, pp. 69-74. Author Masood Uzzafer Masood Uzzafer has more than 14 years of software development experience in designing and development of different software applications including multimedia, DSP, embedded, mobile and web applications. Mr. Masood has worked for Philips Semiconductors, California, and AlliedSignal Aero-Space, Florida. His research interests are risk management and strategic planning. Mr. Masood received Bachelor of Commerce from University of Karachi in 1988, Bachelor of Electrical Engineering from N.E.D. University, Karachi in 1991 and M.S. in Electrical Engineering from Wayne State University, Detroit, U.S.A. in 1994. Currently he is a PhD candidate at University of Nottingham, UK. 67 International Journal of Software Engineering and Its Applications Vol. 7, No. 1, January, 2013 68