MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

Nguyen Ngoc Tuan

RISK MANAGEMENT IN SOFTWARE PROJECT SCHEDULING USING BAYESIAN NETWORKS

Major: Software Engineering
Code No.: 9480103

PhD DISSERTATION ON SOFTWARE ENGINEERING

SUPERVISORS:
1. Assoc. Prof. Dr. Huynh Quyet Thang
2. Dr. Vu Thi Huong Giang

Hanoi – 2021

DECLARATION

I certify that this thesis and the work presented in it are products of my own work, and that any ideas or quotations from other people's work, published or otherwise, are fully acknowledged in accordance with the standard referencing practices of the discipline. This thesis has not been submitted for any degree or other purposes.

Hanoi, April 29, 2021
PhD STUDENT: Nguyễn Ngọc Tuấn
ON BEHALF OF SUPERVISORS: Assoc. Prof. Dr. Huỳnh Quyết Thắng

Acknowledgements

First of all, I would like to express my sincere gratitude to my first supervisor, Assoc. Prof. Dr. Huynh Quyet Thang, for his invaluable guidance and support throughout my research. Professor Thang has supported me all the way, all the time. It is his patience that kept me committed to this research and to reaching the end of my PhD studies. I am also very grateful to my second supervisor, Dr. Vu Thi Huong Giang, whose insightful hints and expertise have always been helpful to me.

My special thanks go to Ms. Vo Thi Huong, Ms. Bui Thi Quynh Nga, Mr. Tran Trung Hieu, Mr. Tran The Anh, Mr. Tran Bao Ngoc and Mr. Cao Manh Quyen, who were master's and bachelor's students at the School of ICT, Hanoi University of Science and Technology, and helped me with building the tools as well as testing our models.

I am also indebted to Dr. Nguyen Thanh Nam (former CEO of FPT and former President of FSOFT), Mr. Luu Quoc Tuan (Tinh Van Outsourcing Jsc.), Mr. Ngo Quang Vinh (Evizi) and Mr. Nguyen Huy Binh (FIS), who provided real software project data and valuable expert judgments on the data.

Finally, my greatest appreciation goes to my family, especially to my wife Tran Thi Bich Ngoc and to my son Nguyen Minh Huy. Without their love, patience and sacrifice, this achievement would never have been possible.

Summary

Software project management is the art and science of planning and leading software projects. In the software industry, project managers mostly rely on their experience and skills to manage their projects and lack scientific tools to support them. Risk management is a crucial part of software project management that helps prevent software disasters. In this research, risks are defined as uncertain events or conditions that, if they occur, would have an adverse impact on one or more software project outcomes (cost, time, quality). Identifying and dealing with risks or uncertainty in the early phases of the software development life cycle lessens long-term cost and enhances the chance of project success. The most important part of risk management is risk analysis, which assesses the risks and their impact on the outputs of the software project. To overcome subjective assessment based on the development team's experience, the team needs a quantitative risk analysis method.

Software project scheduling is one part of software project planning. Since, in practice, most software projects are over budget and behind schedule, software project scheduling needs careful consideration. We pose the following questions: How can software projects be scheduled better? How can risks in software projects be managed better? How can risks be analysed quantitatively? Some researchers argue that Bayesian Networks can be used to quantify uncertain factors in (general) project scheduling and to improve project risk assessment and analysis.
Our research aims to bring those advantages of Bayesian Networks into software project scheduling by addressing common software project features. The research answers the above questions by providing probabilistic approaches and tools to assess the impacts of risk factors on software project scheduling; proposing a list of common risk factors and a Bayesian Network model of these risk factors; and proposing advanced scheduling methods that incorporate Bayesian Networks into popular scheduling techniques such as CPM, PERT and agile iteration scheduling. Bayesian Networks help quantify the risk factors, and hence help manage them better and improve the predictability of events in the project.

This research first reviews the literature on (general) project planning issues, project scheduling techniques, project scheduling tools, uncertainty and risk characteristics in software projects, risk management processes and project risk analysis, in order to apply state-of-the-art techniques to software projects (Chapter 1). After that, Bayesian Networks are applied in modelling and experimenting with risk factors in software project scheduling. The BRI (Bayes Risk-Impact) algorithm is proposed to assess the impact of risk factors on software scheduling (Section 2.1). A first set of five risk factors is examined using CKDY, a probabilistic tool built by the author, to analyse risks in software project scheduling (Section 2.2). The research proposes an advanced algorithm for agile iteration scheduling using Bayesian Networks. The advantage of this method is that it provides both a schedule and the probability of finishing the agile iteration on time (Section 3.1). In addition, the author goes further with a more refined list of 19 risk factors in software scheduling and uses them in software scheduling methods.
The research also incorporates Bayesian Networks into the CPM and PERT scheduling techniques for traditional software projects, together with the Bayesian Networks of common risk factors (Section 3.2 and Section 3.3). The list of 19 risk factors in agile software development is also examined in agile iteration scheduling (Section 3.4). The experimental results show that our models are reliable and that our approaches have practical implications, i.e. we can take advantage of Bayesian Networks in modelling and quantifying risks and uncertainty in software projects.

How to read this report?

The author highly recommends that you read this report from beginning to end. However, if at any point you want to look at specific pieces of information, the following guide may be helpful:

To get the motivation, the overview of related work, the objectives, the scope, the hypothesis and the methodology of this research, please go to the Introduction section.
To get an overview of software project scheduling and risk management in software project scheduling, please go to Sections 1.1, 1.2 and 1.3.
To get an overview of Bayesian Networks, please go to Section 1.4.
To get details on the main contributions and key findings of the research, please read Chapter 2 and Chapter 3.
To get information on common risk factors in software project scheduling, please look at Section 2.3.
Chapter 2 is about building tools and doing experiments on applying Bayesian Networks to risk management in software project planning (Section 2.1) and on some key risk factors (Section 2.2).
Chapter 3 is about incorporating Bayesian Networks and common risk factors into software project scheduling techniques such as CPM (Section 3.2), PERT (Section 3.3) and Agile software development scheduling (Section 3.4).
To learn about the conclusions, the limitations and the further research directions of this PhD thesis, please read the Conclusion section.
Contents

Acknowledgements
Summary
How to read this report?
List of symbols and abbreviations
List of tables
List of figures
Introduction
  Motivation
  Related work
  Research scope
  Research objectives
  Scientific and realistic meaning
  Research hypothesis and methodology
  Expected results
  Structure of the thesis
Chapter 1. Overview of software project scheduling and risk management
  1.1. Software project management and software project scheduling
    1.1.1. Software project management
    1.1.2. Software project scheduling
  1.2. Software project scheduling methods and techniques
    1.2.1. Overview
    1.2.2. Traditional scheduling methods and techniques
    1.2.3. Agile software project scheduling
  1.3. Risk management in software project scheduling
    1.3.1. Overview of project risk management
    1.3.2. Project risk analysis
    1.3.3. Unknown risks
    1.3.4. Risk aspects in software project scheduling
  1.4. Bayesian Networks
    1.4.1. Bayesian approach vs classical approach
    1.4.2. Probabilistic approach using Bayesian Networks
    1.4.3. Bayesian Inference
    1.4.4. Bayesian Networks and project risk management
  1.5. Chapter remarks
Chapter 2. Common risk factors and experiments on Bayesian Networks and software project scheduling
  2.1. Application of Bayesian Networks into schedule risk management in software project
    2.1.1. Common risk factors in software project management
    2.1.2. Bayesian Networks of risk factors
    2.1.3. Risk impact calculation
    2.1.4. Bayesian Risk Impact algorithm
    2.1.5. Tool and experiments
    2.1.6. Conclusion and contribution
  2.2. Experiments on common risk factors
    2.2.1. Discovering the top ranked risk factors
    2.2.2. Tool CKDY
    2.2.3. Experiments and analysis
    2.2.4. Conclusion and contribution
  2.3. Proposed common risk factors in software project scheduling
    2.3.1. The 19 common risk factors in traditional software project
    2.3.2. The 19 common risk factors in agile software project
    2.3.3. Conclusion and contribution
  2.4. Chapter remarks
Chapter 3. Incorporation of Bayesian Networks into software project scheduling techniques
  3.1. Applying Bayesian Networks into specific software project development
    3.1.1. Introduction
    3.1.2. Optimized Agile iteration scheduling
    3.1.3. Optimization model for Agile software iteration
    3.1.4. Tool and experimental results
    3.1.5. Conclusion and contribution
  3.2. Incorporation of Bayesian Networks into CPM
    3.2.1. The RBCPM Model
    3.2.2. The RBCPM Method
    3.2.3. Tool and experimental results
    3.2.4. Conclusion and contribution
  3.3. Incorporation of Bayesian Networks into PERT
    3.3.1. Proposed model
    3.3.2. Tool development and data collection
    3.3.3. Experimental results and analysis
    3.3.4. Conclusion and contribution
  3.4. Incorporation of Bayesian Networks into Agile software development scheduling
    3.4.1. Incorporation of risk model
    3.4.2. Tool and experimental results
    3.4.3. Conclusion and contribution
  3.5. Chapter remarks
Conclusion
  What has been done
  Main contributions
  Limitations
  Further research
List of scientific publications
References
Index
Appendix. Sub Bayesian Networks of the 24 risk factors

List of symbols and abbreviations

No.  Abbreviation  Description
1    AF            Assigned First
2    AISP          Agile Iteration Scheduling Problem
3    BAIS          Bayesian Agile Iteration Scheduling
4    BN            Bayesian Network
5    BRI           Bayes Risk-Impact
6    CMM           Capability Maturity Model
7    CMMi          Capability Maturity Model Integration
8    CPM           Critical Path Method
9    DAG           Directed Acyclic Graph
10   EVM           Earned Value Management
11   FDD           Feature-Driven Development
12   IDE           Integrated Development Environment
13   IGR           Internally Generated Risk
14   LPT           Longest Processing Time
15   MCS           Monte Carlo Simulation
16   NPT           Node Probability Table
17   PERT          Program Evaluation and Review Technique
18   PI            Probability-Impact
19   PMBOK         Project Management Body of Knowledge
20   PMI           Project Management Institute
21   PMP           Project Management Professional
22   PRAM          Project Risk Analysis and Management
23   PRM           Project Risk Management
24   PRMP          Project Risk Management Processes
25   PSPLIB        Project Scheduling Problem Library
26   RAMP          Risk Analysis and Management for Projects
27   RBCPM         Risk Bayesian Critical Path Method
28   RBPERT        Risk Bayesian PERT
29   RESCON        RESource CONstrained
30   RMP           Risk Management Processes
31   RUP           Rational Unified Process
32   SPT           Shortest Processing Time
33   XP            Extreme Programming

List of tables

Table 1.1. Basic mathematical notations used for CPM calculation
Table 1.2. The differences between waterfall and agile projects
Table 1.3. The differences between Bayesian and Frequentist approaches
Table 2.1. Hui and Liu’s common risk factors [9]
Table 2.2. Risk factors in the phases
Table 2.3. Risk factors, consequences and impact
Table 2.4. Examples of risk factors and probabilities
Table 2.5. Probability of risk factors in the whole project with data set 1
Table 2.6. Probability of risk factors in the whole project with data set 2
Table 2.7. Probability of the experimental risk factors to compare with MSBNx
Table 2.8. CKDY compared with MSBNx
Table 2.9. List of 19 common risk factors for software project scheduling
Table 2.10. List of 5 risk factors for software project scheduling in Section 2.2
Table 2.11. List of 19 risk factors in iteration scheduling
Table 3.1. The first data sample
Table 3.2. The probability table for tasks and resources
Table 3.3. Risk factors analysis
Table 3.4. Data sample 1
Table 3.5. Data sample 2
Table 3.6. Task attributes of the first data sample
Table 3.7. Task attributes of the second data sample
Table 3.8. Task attributes of the third data sample
Table 3.9. The result for the first data sample

List of figures

Figure 1.1. Activities of project management according to PMBOK Guide
Figure 1.2. CPM parameters in an activity
Figure 1.3. An example of BN which represents a simple case
Figure 2.1. A sub BN for the risk factor “Staff experience shortage”
Figure 2.2. A sub BN for the risk factor “Low productivity”
Figure 2.3. A sub BN for the risk factor “Lack of client support”
Figure 2.4. A sub BN for the risk factor “Inaccurate cost estimating”
Figure 2.5. A sub BN for the risk factor “Incapable project management”
Figure 2.6. A sub BN for the risk factor “Lack of senior management commitment”
Figure 2.7. A sub BN for the risk factor “Inadequate configuration control”
Figure 2.8. A sub BN for the risk factor “Inaccurate metrics”
Figure 2.9. A sub BN for risk factor “Excessive reliance on a single process improvement”
Figure 2.10. The overall BN for software risk factors
Figure 2.11. A simple example of Bayesian inference
Figure 2.12. The three nodes of a simple-chain BN
Figure 2.13. The graphical interface of the tool
Figure 2.14. Result of experiment 1
Figure 2.15. Results of the three experiments
Figure 2.16. Experimental results for Software Design phase
Figure 2.17. Sub BN 1
Figure 2.18. Sub BN 2
Figure 2.19. The overall BN model
Figure 2.20. Experiment with j30 with the early start schedule
Figure 2.21. Activity joint in the file j301_1.rcp
Figure 2.22. Diagram of probabilities of finishing phase by phase
Figure 3.1. Home GUI of tool BAIS
Figure 3.2. Gantt chart for SPT strategy
Figure 3.3. A part of a BN for 19 risk factors
Figure 3.4. Task’s parameters and connection to other tasks
Figure 3.5. A screenshot of RBCPM
Figure 3.6. A result for experiment with data sample 1
Figure 3.7. A result for experiment with data sample 2
Figure 3.8. Bayesian Network for each activity
Figure 3.9. Risk integration network model into PERT scheduling
Figure 3.10. Process in improved RBPERT Model
Figure 3.11. The input screen of the RBPERT tool
Figure 3.12. The input file type of the RBPERT tool
Figure 3.13. A result for the network provided by the RBPERT tool for the first data sample
Figure 3.14. A result for RBPERT network provided by the tool for the first data sample
Figure 3.15. A result for experiment with the third data sample (distribution of Total Duration of activity J)
Figure 3.16. A screenshot of tool BAIS
Figure 3.17. The result of the second experiment

Introduction

Motivation

Projects in general always involve risks, and risks are a constant concern for project managers. In October 2008, the Hanoi Urban Railway Project Line 2A (Cat Linh-Ha Dong) was approved with a total budget of more than 8,700 billion VND (552 million USD). The project's investment has since risen to 868 million USD. It was scheduled to be put into service in 2013, but the project remains incomplete¹. Software projects also have schedule risks and, as a consequence, budget or cost risks. For example, the project on the Vietnamese National Population Database² was approved in 2015 and was planned to be finished in two years (2016 and 2017). However, the system could only be put into operation in February 2021.
Another similar example is the project on the Vietnamese National Public Service Portal³, which was planned to go public in September 2016 but only opened in December 2019. As a matter of fact, the majority of software projects the author has experienced in Vietnam are behind schedule (some of these projects are examined in Chapter 2 and Chapter 3). Even in developed countries, software projects face ongoing problems. For example, the Universal Credit project, the welfare payment system owned by the central government of the United Kingdom, started in 2013. The project schedule has slipped, with the final delivery date now expected to be 2021, although the system is gradually being introduced. In 2013, only one of four planned pilot sites went live on the originally scheduled date, and the pilot was restricted to extremely simple cases⁴.

Many software projects have suffered from significant budget overruns together with a series of delays, which cause either temporary issues or permanent failures. For example, the Queensland Health Payroll System was launched in 2013 in what could be considered one of the most spectacularly over-budget projects in Australian history, coming in at over 200 times the original budget.
Besides, in spite of promises that the new system would be fully automated, the new system required a considerable amount of manual operation [1]. Another example, this time of permanent failure, is e-Borders, an advanced passenger information programme that aimed to collect and store information on passengers and crew entering and leaving the United Kingdom. Started in 2007, the project had a series of delays and had to be cancelled in 2014 [2].

¹ VnExpress (2019), “Ministry of Transport admits the mistakes on the Cat Linh-Ha Dong urban railway project”, available online (in Vietnamese) at: https://vnexpress.net/bo-giao-thong-van-tai-thua-nhansai-sot-trong-du-an-cat-linh-ha-dong-3988254.html
² Vietnamese Prime Minister (2015), “Decision regarding the approval of investment policy for the project on the National population database”, Government of Vietnam, 2083/QĐ-TTg (26 November 2015)
³ Vietnamese Prime Minister (2015), “Resolutions on e-Government”, Government of Vietnam, 36a/NQ-CP (14 October 2015)
⁴ Wikipedia.org, “List of failed and over-budget custom software projects”, retrieved 20 September 2019, available online at: https://en.wikipedia.org/wiki/List_of_failed_and_overbudget_custom_software_projects

Some studies have pointed out that most software projects (83.8%) are over budget or behind schedule, and that 52.7% of software development projects deliver software with fewer features than originally specified [3, 4]. Statistics also show that 31.1% of development projects end up being cancelled or terminated prematurely. Among the completed projects, only 61% satisfy the originally specified features and functions [5]. In the software industry, one of the greatest challenges that development teams constantly face is keeping projects under control in terms of budget and schedule (development time frame).
The activities of a software project are influenced by factors internal and external to the project organization, which make it uncertain whether the project will achieve its objectives. The effect that this uncertainty has on the project’s goals is called risk [6]. In other words, a risk is an uncertain event or condition that, if it occurs, will have a positive or negative effect on at least one of the project objectives [7]. In this thesis, risks are defined as uncertain events or conditions that, if they occur, would have an adverse impact on one or more software project outcomes (cost, time, quality).

The above situation raises an important question: how can project risks be managed better, so as to eliminate the temporary issues and prevent failure? The purpose of project management is to lead the project to success. A successful software project certainly relies on many factors (e.g. following appropriate processes and tasks, managing risks properly). Since risks are inevitable in projects, risk management has become an important part of project management. Although many researchers, experts and writers have proposed a variety of processes and techniques, project risk management (PRM) is still rapidly evolving, and handling risks in general projects as well as in software projects remains a challenge.

An important component of PRM is risk analysis, which is also known as, or considered the same as, risk quantification. Risk analysis attempts to measure risks and their impacts on different project outcomes (i.e. time, cost, quality). Many software projects fail because project managers mostly plan based on their experience and lack scientific methods to support them. To overcome subjective assessment based on the development team’s experience, the team needs a quantitative risk analysis method.
Although various studies have proposed and examined a range of processes and techniques, and software project risk management is continuously evolving, handling uncertainty in increasingly complex real-world projects remains a challenge. Aside from that, project scheduling (a part of project planning, which is an early phase of the software development life cycle) is concerned with the techniques that can be employed to manage the activities that need to be undertaken during the development of a project. There are various techniques for project scheduling, from simple and easily understandable ones such as Task List, Gantt Chart and Schedule Network Analysis, to more complicated ones like Critical Path Method (CPM), Program Evaluation and Review Technique (PERT), Monte-Carlo Simulation (MCS) or Fuzzy Logic etc. [6, 8, 9, 10]. Traditional project scheduling under risk/uncertainty has attracted growing research and attention in the project management community. In some of the project management literature of the 1990s, “risk analysis” was equivalent to “the analysis of risk on project plan” [11]. This thesis focuses on modelling risks in software project time management (which is, of course, indirectly related to the other project outcomes, cost and quality). In other words, this thesis concentrates on quantitative risk analysis in software project scheduling.

The earliest studies incorporating uncertainty/risk in project scheduling were conducted in the late 1950s by Malcolm et al. [12] and Miller [13]. Since then, a variety of techniques have been introduced, several tools have been developed, and many of them are widely used throughout different industries. However, they often fail to capture uncertainty properly and/or produce inaccurate, inconsistent and unreliable results, especially when applied to software projects, whose attributes differ from those of other traditional projects. Project uncertainty has several aspects, not all of which can be categorized and treated as risks.
Several authors such as Ward and Chapman [14] argued that project risk management should focus on managing uncertainty and its various sources rather than emphasizing a set of possible events that might have bad impacts on project performance (i.e., it should attend to uncertain aspects rather than to a fixed set of defined risks). However, since this thesis is about software projects, risks are considered and treated the same as uncertainty. Most quantitative techniques and methods in the current practice of project risk management are based on the “Probability Impact” concept, which has certain shortcomings in terms of risk analysis in project scheduling. More sophisticated methods and techniques are needed to address and manage important sources of uncertainty/risk.

In the software industry, project scheduling also has to deal with the fact that resources such as people, time, technology and money are not always predetermined [15]. There are always risks in software project scheduling as well. In most projects, the activity times (from now on, an “activity” is considered the same as a “task” in software projects) are not known for certain. Therefore, they may be treated as random variables. Furthermore, Bayesian Networks (BNs) have attracted a lot of attention in different fields (construction, R&D etc.) as a powerful approach for decision support under uncertainty. A BN is a graphical and mathematical model which offers a powerful, general and flexible approach for modelling risk and uncertainty. Its capability of modelling causality and conditional dependency between variables makes it well suited to capturing uncertainty in projects. Yet, BNs are rarely applied in project risk management in general, or in software project management and software project scheduling in particular.
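To make the idea of conditional dependency concrete, the following is a minimal sketch of a three-node BN in which two risk factors influence whether a task is delayed. It is not the model developed in this thesis: the node names, the probabilities and the helper functions are all hypothetical, and inference is done by plain enumeration of the joint distribution rather than with a BN tool.

```python
from itertools import product

# Hypothetical prior probabilities of two independent risk factors.
P_REQ_CHANGE = {True: 0.3, False: 0.7}   # requirements change during the task
P_TURNOVER = {True: 0.2, False: 0.8}     # a team member leaves during the task

# Hypothetical conditional probability table:
# P(task delayed | requirements change, staff turnover).
P_DELAY = {
    (True, True): 0.9,
    (True, False): 0.6,
    (False, True): 0.5,
    (False, False): 0.1,
}


def joint(req_change, turnover, delayed):
    """Joint probability of one full assignment, using the BN factorisation."""
    p_delay = P_DELAY[(req_change, turnover)]
    p_outcome = p_delay if delayed else 1.0 - p_delay
    return P_REQ_CHANGE[req_change] * P_TURNOVER[turnover] * p_outcome


def prob_delay():
    """Marginal probability that the task is delayed (prediction)."""
    return sum(joint(r, t, True) for r, t in product([True, False], repeat=2))


def prob_req_change_given_delay():
    """Posterior probability of a requirements change once a delay is observed (diagnosis)."""
    numerator = sum(joint(True, t, True) for t in [True, False])
    return numerator / prob_delay()


print(f"P(delay) = {prob_delay():.3f}")
print(f"P(requirements change | delay) = {prob_req_change_given_delay():.3f}")
```

With these illustrative numbers, observing a delay raises the belief in a requirements change from the prior 0.3 to about 0.61. The same network therefore supports both prediction (from causes to effects) and diagnosis (from effects back to causes), which is the property that makes BNs attractive for risk analysis.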
The author of this thesis strongly believes that if we can identify and control risks at the early stages of a software development project, we can significantly increase the project’s chance of success. Since it is not easy (or even possible) to control all of the problems or factors, this thesis focuses only on the time factors related to the software development schedule. Therefore, this thesis aims at introducing an advanced approach as well as finding a better model for incorporating and managing uncertainty/risks in software project scheduling. The idea is to use BNs to perform well-known scheduling techniques such as CPM, PERT etc. as well as to model risk factors in software project scheduling. The proposed approach enriches the benefits of scheduling techniques by incorporating uncertainty/risk factors and adding the strong analytical power of BNs.

Related work

There have been various studies on applying BNs to general projects. Khodakarami [15] applied BNs to general project scheduling with two case studies: aircraft design, and the design and construction of a health and fitness center. Erhan et al. [16] proposed a project control framework that integrates project uncertainty and the associated risk factors into project control. Their framework is based on earned value management (EVM), which is an effective and widely used quantitative project control technique in practice. The framework uses hybrid BNs to enhance EVM with the ability to compute the uncertainty associated with its parameters and risk factors, making it practical for construction projects. Ali et al. [17] combined Monte Carlo Simulation and Bayesian Network methods to present a structure for assessing the aggregated impact of risks on the completion time of a construction project. Lee and Shin [18] proposed an application of BNs to the risk management of a shipbuilding project and proposed a list of 26 risks.
Sharma and Chanda [19] developed a BN model for predicting R&D project success, in which the assessment is also based on R&D project risk factors. Khodakarami et al. [20] also examined an approach to generate project schedules that incorporates risk, uncertainty, and causality using BNs. Their model empowered the traditional CPM to handle uncertainty, and they also provided explanatory analysis to elicit, represent, and manage different sources of uncertainty in project planning. Fenton and Neil [21] introduced AgenaRisk as a probabilistic tool based on BNs; Chang, Yu, and Cheng [22] proposed a risk-based Critical Path Scheduling Method based on 2 risk categories and 7 risk levels, which was applied to construction projects.

Regarding risk factors in software projects, Hui and Liu [5] selected 24 risk factors that may have potential impacts on the software project as a whole and applied BN properties in the calculation of impact in their project risk model. Kumar and Yadav [23] considered quantitative features and causal relationships among risk factors in software projects. They introduced a probabilistic approach to assess risks in software projects and proposed a list of 27 risk factors. However, they analysed risks for whole software projects and did not focus on the scheduling and planning phases, which decide the success of projects. Adjusting Kumar and Yadav’s method, this thesis proposes a list of the 5 most crucial risk factors and builds the tool CKDY to examine risks in software scheduling (Section 2.2).

There has been other research on BNs and software risk analysis. Hu et al. [24] studied causality analysis among risk factors and project outcomes for software development projects. For this purpose, they proposed a modelling framework based on BNs to deal with causality constraints in risk analysis.
The developed framework can be used for discovering new causal relationships and validating existing relationships among risk factors and project outcomes. Anthony et al. [25] proposed a risk assessment model for decision-making in software management which consists of risk assessment processes and components in three groups: operational risks, technical risks and strategic risks. Rai et al. [26] argued that managing projects is managing risks and identified 43 risk indicators in Agile Software Development. One notable work is Szoke Akos’ PhD dissertation in 2014, which proposed an optimized algorithm for agile software project scheduling [27].

As can be seen from the literature review, much research on software risk analysis focuses on finding the relationship between risk factors and software outcomes, but lacks a quantitative approach and the causal relationships between risk factors [5, 23, 28, 29]. Other studies define a quantitative approach and the causal relationships between risk factors and assess risks for the whole software project [30, 31], but do not pay enough attention to modelling risk factors from the scheduling (planning) phase, the phase that decides the later failure or success of the project.

To quantify uncertainty, Jefferson et al. [32] applied Action Research to develop a model that takes into account the dependencies and interdependencies that exist between the sources of risks and uncertainties in software projects. As a result, their work contributes to the practice of risk and uncertainty management in software projects. J. Yong and Z. Zhigang [33] proposed a PERT Bayesian Network (PERTBN) model, with a modelling methodology and a conditional probability calculation method for different kinds of procedure arrangement (single-chain, centralized, distributed), and stated that the PERTBN model ensures the effectiveness of project schedule control and optimization.
However, that research did not examine in depth the risk factors or other specific software features that can affect the project schedule. In addition, there is always a need for proper schedule control in software projects to determine the current status of the schedule, to know whether the schedule has changed, and to embrace changes when they occur. In order to do that, the influential factors that cause schedule changes need to be carefully considered.

In summary, current research related to this thesis addresses either risk management or assessment for the whole software project, or scheduling for other kinds of project (construction, building, R&D etc.). There is a need for a probabilistic method for risk management in software project scheduling, as well as for a deeper examination of the risk attributes of software project scheduling.

Research scope

The research is about software projects (or software development projects), which have both common features and specific features in comparison to other types of projects (such as construction projects, R&D projects etc.). Unfortunately, there have been only a few good studies on applying probabilistic methods to software development projects. Therefore, this research first reviews the literature on general projects to look for the approaches applied to them, and then proposes an approach for software projects.

The scope of this research is risk management in software project scheduling. This is quantitative risk management concerned with risks affecting the project schedule (or project time frame). In terms of project scheduling techniques, this thesis focuses on the most popular techniques such as CPM and PERT for traditional software development projects, as well as Agile software project scheduling.

Research objectives

The main objectives of this research are:
1) To find a quantitative method to better assess and analyse risks in software project scheduling.
In order to achieve this objective, the research has to answer the following questions: what are the risk attributes of software project scheduling? How can risks in software project scheduling be managed better? In other words, the research aims at analyzing and modelling risks in software project scheduling.
2) To find a probabilistic method to improve well-known software project scheduling techniques, including techniques for both traditional and agile software scheduling.

The proposed methods and models would enhance the risk management process through a quantitative assessment of the impact of risks on software project scheduling. If this model and method are applied in practice, the author of this thesis expects that they would help to predict and monitor the project schedule better and to make appropriate decisions.

Scientific and realistic meaning

The proposed methods and model would enhance the risk management process through a quantitative assessment of the impact of risks on software project scheduling. If this model and method are applied in practice, they would help to predict and monitor the project schedule better and to make appropriate decisions.

Research hypothesis and methodology

The hypothesis of this thesis is that it is possible to use BNs to quantify uncertainty in software project scheduling and improve software project risk assessment.

Since there is very limited research on this topic, the research methodology comprises literature reviews of general project management to obtain relevant ideas for software project management. Firstly, a literature review investigates the current state of project scheduling under uncertainty, which determines the need, scope and objectives of the new approach. Secondly, a literature review follows on the background, theory and application of BNs. This provides the conceptual and fundamental background for the new approach.
The research also examines the features of software projects, in both the waterfall model and the agile software development model. In order to handle risks in software project scheduling, the common risk factors also need to be examined. Within the research, tools are built to validate the models and help software project managers in assessing risks and making appropriate decisions.

Expected results

Following the above methodology, the author expects to:
1) Apply Bayesian Networks to develop an algorithm and a tool to assess the impacts of risks, and hence propose common risk factors in software project scheduling.
2) Apply Bayesian Networks to develop a probabilistic approach to enhance the common scheduling techniques (for both traditional and agile software development) in terms of risk management and predictability.

Structure of the thesis

An overview of the main chapters is as follows:

Chapter 1 briefly reviews software project scheduling and the software project risk management process, and explores the currently popular techniques in project scheduling.

Chapter 2 consists of initial attempts at applying BNs to risk management in software project scheduling as well as experiments on common risk factors in software project scheduling. A total of 19 common risk factors for both traditional software development projects and agile software projects are proposed.

Chapter 3 incorporates BNs into popular software project scheduling techniques, namely CPM, PERT and agile software scheduling. BNs are also applied in examining the relationships among the risk factors proposed in Chapter 2.

The last section, Conclusion, concludes the thesis and points the way forward for future research.

The main contributions and results of the research:

The research has developed the algorithm BRI (Bayes Risk-Impact) and the tool CKDY to assess the impacts of risks, and hence proposes common risk factors in software project scheduling.
Based on the literature review and experiments, the research has come up with 19 common risk factors in software project scheduling (for both the agile and the traditional development style).

The research also proposes advanced scheduling methods in software project development. The methods are based on incorporating Bayesian Networks and common risk factor models into popular software scheduling techniques such as PERT, CPM, and Agile software development, with the examination of the model of 19 common risk factors. Tools have been built to experiment with the proposed scheduling methods and models. Experimental results show that the proposed methods and models are reliable and provide practical value to software development teams in analyzing, monitoring and predicting risks and the project’s chance of success.

Chapter 1. Overview of software project scheduling and risk management

1.1. Software project management and software project scheduling

1.1.1. Software project management

Software project management is the art and science of planning and monitoring software projects. It refers to the branch of project management dedicated to the planning, scheduling, resource allocation, implementation, tracking and delivery of software and web projects [34]. There are various types of projects (R&D projects, construction projects, information system projects, software projects etc.) which are associated with different styles of management. Software project management is quite distinct from traditional or other project management. Firstly, software is developed, not manufactured. Therefore, the product (working software) is intangible and uniquely flexible. Secondly, software engineering is not recognized as an engineering discipline with the same status as mechanical or electrical engineering etc. Moreover, software projects have a unique lifecycle process that requires multiple rounds of testing, updating, and customer feedback.
That software development process is not standardized. Lastly, most software projects are “one-off” projects. A software development team can only draw on similar experience, not the same experience or a repeated process. Therefore, software project management is about the methodology for organizing all activities related to the software. Project management is always needed since software projects always have budget and time constraints.

Nowadays, most IT-related projects are managed in the agile style and software is developed in groups, in order to keep up with the increasing pace of business and to iterate based on customer and stakeholder feedback. Besides being used in IT-related projects, the Agile style has also been increasingly used in other kinds of project management.

The project manager leads the project team and often plays the central role among the investors (or customers), the suppliers and the senior management of the organization. He or she makes sure the project complies with the constraints and delivers the product (software) on time. Software project managers may have to do any of the following tasks [34]:
- Planning and scheduling: This means putting together the blueprint for the entire project from ideation to fruition. It will define the scope, allocate necessary resources, propose the timeline, delineate the plan for execution, lay out a communication strategy, and indicate the steps necessary for testing and maintenance.
- Leading: A software project manager will need to assemble and lead the project team, which likely will consist of developers, analysts, testers, graphic designers, and technical writers. This requires excellent communication, people and leadership skills.
- Execution: The project manager will participate in and supervise the successful execution of each stage of the project. This includes monitoring progress, frequent team check-ins and creating status reports.
- Time management: Staying on schedule is crucial to the successful completion of any project, but it is particularly challenging when it comes to managing software projects because changes to the original plan are almost certain to occur as the project evolves. Software project managers must be experts in risk management and contingency planning to ensure forward progress when roadblocks or changes occur.
- Budget: Like traditional project managers, software project managers are tasked with creating a budget for a project, and then sticking to it as closely as possible, moderating spend and re-allocating funds when necessary.
- Maintenance: Software project management typically encourages constant product testing in order to discover and fix bugs early, adjust the end product to the customer’s needs, and keep the project on target. The software project manager is responsible for ensuring that proper and consistent testing, evaluation and fixes are carried out.

Therefore, managers have diverse roles. Since software project management is normally concerned with activities involved in ensuring that software is delivered on time, on schedule and in accordance with the requirements of the organizations developing and procuring the software, managers’ most significant activities are planning, estimating and scheduling.

According to the Project Management Institute (PMI) in the Project Management Body of Knowledge (PMBOK) guide [7], project management includes five stages or process groups: Initiating, Planning, Executing, Monitoring and Controlling, and Closing (Figure 1.1).

In modern software project planning, the two essential tasks are project risk management and project scheduling. They play crucial roles in making sure the project is effectively and efficiently organized, including resource (hardware, software, and network) allocation, task and personnel assignment, and monitoring [7, 10].
Software projects are quite different from other projects since software requirements change continuously (during the software development life cycle) and software projects are often behind schedule and over budget. Moreover, in reality, many software project managers either ignore risk management or do not carry it out appropriately. This leads to project failure or to customer complaints about the quality, the schedule or the budget overrun of the project. Some other project managers are aware of risk management but rely only on their own team’s skills or experience, even if they follow frameworks such as CMM/CMMI (Capability Maturity Model Integration) or PMP (Project Management Professional).

As can be seen in Figure 1.1, risk management affects all the processes in the Process Groups. In addition, project teams can adjust or update the planning process while they are executing, monitoring and controlling their projects.

Figure 1.1. Activities of project management according to PMBOK Guide.

1.1.2. Software project scheduling

Software project scheduling is one of the most demanding tasks for software project managers. It is all about resource allocation during the project life cycle. In simple words, software project scheduling splits the whole project into smaller tasks and estimates the time and resources required to complete each task. Software development teams normally try to organize tasks concurrently to make optimal use of the workforce and to minimize task dependencies, avoiding delays caused by one task waiting for another to complete. In reality, software project scheduling depends on project managers’ intuition and experience.

In a real-life software project, a schedule is represented as a set of activity diagrams (Work Breakdown Structure, Activity Charts) which clarify the dependencies between activities (tasks) and the personnel assignment.

1.2. Software project scheduling methods and techniques

1.2.1. Overview

There are many popular techniques for project scheduling, including:
- Graphical representations used to illustrate the project schedule, such as
  + Work Breakdown Structure: shows the project breakdown into tasks.
  + Activity Charts: show task dependencies and the critical path.
  + Gantt Charts: bar charts that show the schedule against calendar time.
- Critical Path Method (CPM) [10, 15, 20].
- Program Evaluation and Review Technique (PERT) [12, 13, 15, 35].

Project scheduling (especially under uncertainty) is the most widely studied area of risk quantification in project management. Producing a reasonable and reliable project schedule is one of the crucial tasks of project managers. Moreover, having a realistic schedule for the project is one of the most cited factors of project success [36]. Several techniques have been proposed for modelling risk and uncertainty in project scheduling [10, 35, 37]. This section reviews some notable techniques. CPM and PERT are the classical approaches for project scheduling. Simulation-based techniques are a more modern approach that has been adopted by many project management software tools and that some argue is the best practice available. Alternative approaches, the Critical Chain Method and Fuzzy logic, will be reviewed briefly. Last but not least, scheduling techniques and methods for agile software development will also be discussed.

1.2.2. Traditional scheduling methods and techniques

a) Critical Path Method (CPM)

Critical Path Method (CPM) is one of the most famous techniques in project scheduling. Developed in 1957 by DuPont, CPM has become the standard technique in project management and most project management tools support CPM calculation [15]. According to Pollack-Johnson and Liberatore [38], almost 70% of project managers or professionals use CPM. CPM calculation includes the following steps:
- Specify the individual activities using a work breakdown structure.
- Determine the sequence of those activities and the dependencies between them.
- Draw a network diagram (that models the activities and their dependencies).
- Estimate the completion time (duration) for each activity.
- Identify the critical path (the longest-duration path through the network).
- Update the CPM diagram as the project progresses.

The basic mathematical notation used for CPM calculation is shown in Table 1.1. In fact, the parameters D, ES, EF, LF and LS are commonly used in scheduling techniques.

Table 1.1 Basic mathematical notations used for CPM calculation

No. | Notation | Description | Formula | Note
1 | aj | Activity j | |
2 | Dj | Duration of aj | |
3 | ES | Earliest start of aj | ESj = Max[ESi + Di] | i is one of the predecessor activities
4 | EF | Earliest finish of aj | EFj = ESj + Dj |
5 | LF | Latest finish of aj | LFj = Min[LFk - Dk] | k is one of the successor activities
6 | LS | Latest start of aj | LSj = LFj - Dj |
7 | TF | Total float of aj: the time by which the activity’s duration can be increased without increasing the overall project completion time | TFj = LSj - ESj = LFj - EFj |

A critical activity is one with no float time (TF = 0) and should receive special attention, since a delay in a critical activity will delay the whole project. Informally, the critical path is determined by performing forward and backward passes through the project network. The forward pass computes the earliest start (ES) and the earliest finish (EF) time for each activity. The backward pass computes the latest start (LS) and the latest finish (LF) time for each activity. The total float for each activity is the difference between the latest and the earliest finish of that activity [15]. The connections among these parameters in an activity are described in Figure 1.2.

Therefore, CPM is a deterministic model which uses a fixed time estimate for each activity.
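The forward and backward passes can be sketched in a few lines of Python. This is a minimal illustration of the formulas in Table 1.1 on an activity-on-node network, not one of the tools built in this thesis; the function name `cpm`, the dictionary-based input format and the four-activity example network are assumptions made for the example.

```python
from collections import defaultdict


def cpm(durations, predecessors):
    """Compute ES, EF, LS, LF and TF for every activity.

    durations:    {activity: duration}
    predecessors: {activity: [activities that must finish first]}
    """
    # Derive successor lists and a topological order (Kahn's algorithm).
    successors = defaultdict(list)
    indegree = {a: 0 for a in durations}
    for activity, preds in predecessors.items():
        for p in preds:
            successors[p].append(activity)
            indegree[activity] += 1
    order = [a for a in durations if indegree[a] == 0]
    for a in order:  # the list grows while we iterate over it
        for s in successors[a]:
            indegree[s] -= 1
            if indegree[s] == 0:
                order.append(s)
    if len(order) != len(durations):
        raise ValueError("the activity network contains a cycle")

    # Forward pass: ESj = Max[ESi + Di] over predecessors i, EFj = ESj + Dj.
    es, ef = {}, {}
    for a in order:
        es[a] = max((ef[p] for p in predecessors.get(a, [])), default=0)
        ef[a] = es[a] + durations[a]
    project_end = max(ef.values())

    # Backward pass: LFj = Min[LFk - Dk] over successors k, LSj = LFj - Dj.
    ls, lf = {}, {}
    for a in reversed(order):
        lf[a] = min((ls[s] for s in successors[a]), default=project_end)
        ls[a] = lf[a] - durations[a]

    return {
        a: {"ES": es[a], "EF": ef[a], "LS": ls[a], "LF": lf[a], "TF": ls[a] - es[a]}
        for a in durations
    }


# Hypothetical network: A precedes B and C, which both precede D (durations in days).
schedule = cpm(
    {"A": 3, "B": 4, "C": 2, "D": 5},
    {"B": ["A"], "C": ["A"], "D": ["B", "C"]},
)
critical = [a for a, v in schedule.items() if v["TF"] == 0]
print(critical)              # ['A', 'B', 'D']
print(schedule["D"]["EF"])   # 12, the project duration
print(schedule["C"]["TF"])   # 2 days of float on the non-critical activity
```

In this example the path A, B, D is the longest (3 + 4 + 5 = 12 days) and therefore critical, while C can slip by up to 2 days without delaying the project.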
Although CPM (“pure deterministic in nature” [20]) was not developed to handle or quantify uncertainty, it does provide very useful information about the relations between activities, activity times and the overall project schedule (so that project scheduling can be controlled).

Figure 1.2. CPM parameters in an activity

b) Program Evaluation and Review Technique (PERT)

PERT was introduced in 1957 by the US Navy as one of the earliest pieces of research incorporating risk in project management [13, 15]. A special feature of PERT is its ability to handle uncertainty in activity duration. This means that if there is a variation in the time estimate of an activity, it may affect the whole project. The PERT methodology was developed to help complete the project successfully when the time estimates are not definitive. In order to do that, instead of the single estimate used in CPM, PERT assigns a beta probability distribution to each project activity. Three time estimates (optimistic, most likely, and pessimistic) are obtained and used to estimate the expected time and the standard deviation for an activity i.

Optimistic time estimate is the estimate determined considering all favorable conditions; i.e. in the best-case scenario or when everything goes right. In other words, this is the shortest time in which the activity may be completed.

Most likely time estimate is the duration within which there is a high probability of completing the activity. In other words, it is the estimate in the case of normal problems or opportunities.

Pessimistic time estimate is the estimate determined when we consider all unfavourable conditions; i.e. in the worst-case scenario or when everything goes completely wrong. In other words, this is the longest time the activity might require to complete.
- Expected time: μi = (Optimistic + 4 × Most likely + Pessimistic) / 6
- Standard deviation: σi = (Pessimistic - Optimistic) / 6

The critical path is the sequence of project activities that determines the earliest time by which the project can be completed, and its total duration determines the completion date of the project. PERT assumes that only one path is the critical path and that the path does not change. Therefore, managers using PERT are advised to focus on these critical activities to ensure the project completion date remains unchanged. The expected value of a critical path is calculated as the sum of the expected values of its activities, and the variance of the critical path is the sum of the variances of all activities on the path. Based on this calculation, the probability that the project will be completed by a certain date can be calculated.

Therefore, PERT is somewhat similar to CPM. The main difference is that each activity in a PERT network has a variance associated with its completion time. In other words, CPM is deterministic, while PERT is probabilistic to some extent.

c) Simulation-based techniques

Monte Carlo Simulation (MCS) was first proposed for project scheduling in the early 1960s [39]. However, it was not until the 1980s, when sufficient computer power became available, that simulation became the dominant technique for handling risk and uncertainty in projects [40, 41]. In its simplest approach, MCS uses the project activity diagram. The duration of each activity is estimated by the shortest, most likely and longest duration and also by the shape of the distribution (such as Normal, Beta etc.). Then the critical path calculation is performed many times, each time using random values drawn from the activities’ distribution functions.

More advanced tools like PertMaster (Oracle Primavera Risk Analysis [42]) use a simulation-based approach not only for handling uncertainty in duration and cost, but also for providing a whole risk analysis process.
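The simplest MCS approach described above can be sketched as follows. This is an illustration only: the four-activity network and its three-point estimates are hypothetical, a triangular distribution is assumed for every activity (the text leaves the choice of distribution open), and the 16-day deadline is arbitrary. Each iteration samples one duration per activity and recomputes the project finish time with a forward pass.

```python
import random
import statistics

# Hypothetical activities with (shortest, most likely, longest) durations in days,
# listed in an order where every activity appears after its predecessors.
ACTIVITIES = {
    "A": (2, 3, 6),
    "B": (3, 4, 8),
    "C": (1, 2, 9),
    "D": (4, 5, 7),
}
PREDECESSORS = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}


def simulate_once(rng):
    """Sample every activity duration once and return the project finish time."""
    finish = {}
    for name, (shortest, likely, longest) in ACTIVITIES.items():
        start = max((finish[p] for p in PREDECESSORS[name]), default=0.0)
        # random.triangular takes (low, high, mode); independence between
        # activities is assumed here, exactly as in traditional MCS.
        finish[name] = start + rng.triangular(shortest, longest, likely)
    return max(finish.values())


def run_mcs(iterations=10_000, seed=1):
    """Repeat the forward pass many times with a seeded generator."""
    rng = random.Random(seed)
    return [simulate_once(rng) for _ in range(iterations)]


durations = run_mcs()
deadline = 16  # hypothetical target date, in days
p_on_time = sum(d <= deadline for d in durations) / len(durations)

print(f"mean project duration: {statistics.mean(durations):.2f} days")
print(f"standard deviation:    {statistics.stdev(durations):.2f} days")
print(f"P(finish within {deadline} days): {p_on_time:.2f}")
```

Unlike PERT, the simulation does not fix a single critical path in advance: in some iterations C finishes after B and the path through C becomes critical, which is why simulated means tend to exceed the PERT estimate for the nominal critical path.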
Such tools can link the project schedule to the risk register and apply simulation-based techniques to carry out probability-impact analyses. A survey by the Project Management Institute [43] showed that nearly 20% of project management software packages support Monte Carlo Simulation. Another survey by Pollack-Johnson and Liberatore in 2003 [44] found that 17% of project managers used probabilistic analysis and/or simulation within project management software.

However, simulation has its own drawbacks. One serious methodological flaw in traditional MCS of project networks is the assumption of statistical independence for individual activities which share risk factors in common with other activities [38]. Most available simulation packages assume that the marginal distributions of uncertainty for individual activities in the project completely define the multivariate distribution for the project schedule. It is intuitively obvious that this assumption is highly suspect for many projects which involve multiple activities of a similar type and/or have different activity types which are influenced by common risk factors. Van Dorp and Duffey [45] demonstrated in 1999 that failure to model such types of risk dependence during MCS can result in the underestimation of total uncertainty in the project schedule. The most effective way to deal with statistical dependence is to use a causal structure to explain it. MCS is not capable of modelling causal structures.

Another weakness of MCS, explained by Williams [46], is the inability of simulation to capture the actions taken by managers to recover any slippage in activity/project duration. MCS simply runs through a network assigning values to random variables on each iteration. It ignores the fact that, in reality, if an activity were running late, management would take action to affect the activity duration.
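The effect of the independence assumption criticised above can be illustrated with a small experiment. The sketch below is hypothetical throughout: four sequential activities each take a base of 10 days plus 5 extra days if a risk occurs with probability 0.3. In the "shared" variant a single common risk factor hits all activities together; in the "independent" variant each activity draws its own risk with the same marginal probability, which is what a traditional MCS would assume.

```python
import random
import statistics

P_RISK = 0.3        # hypothetical probability that the risk occurs
BASE = 10.0         # hypothetical base duration of each activity, in days
EXTRA = 5.0         # hypothetical extra days when the risk hits an activity
N_ACTIVITIES = 4    # activities executed one after another


def total_duration(rng, shared):
    """Duration of the whole chain for one simulated project."""
    if shared:
        # One common risk factor affects every activity at once.
        hit = rng.random() < P_RISK
        hits = [hit] * N_ACTIVITIES
    else:
        # Each activity is treated as statistically independent.
        hits = [rng.random() < P_RISK for _ in range(N_ACTIVITIES)]
    return sum(BASE + (EXTRA if h else 0.0) for h in hits)


def simulate(shared, iterations=20_000, seed=7):
    """Return (mean, standard deviation) of the total duration."""
    rng = random.Random(seed)
    totals = [total_duration(rng, shared) for _ in range(iterations)]
    return statistics.mean(totals), statistics.pstdev(totals)


mean_ind, sd_ind = simulate(shared=False)
mean_dep, sd_dep = simulate(shared=True)
print(f"independent risks: mean {mean_ind:.1f}, std dev {sd_ind:.1f}")
print(f"shared risk:       mean {mean_dep:.1f}, std dev {sd_dep:.1f}")
```

Both variants have the same expected duration (46 days), but the theoretical standard deviation is about 4.6 days under independence and about 9.2 days with the shared risk factor. Treating the activities as independent therefore roughly halves the apparent schedule uncertainty in this example, which is the kind of underestimation reported by van Dorp and Duffey.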
Uncertainty in an activity is usually the result of a chain of causes (sources) and can be affected by a chain of actions (controls).

Furthermore, MCS is only as good as the information that is fed into it. If the duration distributions of the project activities are incorrect or inadequate, the simulation results are erroneous and invalid. In reality, the durations of most activities are estimated subjectively. In order to capture all aspects of uncertainty in activity (project) duration, various known and unknown sources of risk have to be addressed. For these reasons, MCS is not applied as a scheduling technique in this thesis.

d) Fuzzy logic

An alternative approach that has interested several researchers in the past two decades [47, 48] is fuzzy project scheduling. The fuzzy set scheduling literature recommends the use of imprecision rather than uncertainty, fuzzy numbers rather than stochastic variables and membership functions rather than probability distributions. The output of fuzzy scheduling is normally a fuzzy schedule, which indicates fuzzy starting and ending times for the activities. This may be as difficult to generate as probability distributions of activity duration, and there is no generally accepted computational approach available. Therefore, fuzzy project scheduling approaches have remained in the academic sphere. A summary of most of the published research on fuzzy project scheduling can be found in the work of Bonnal et al. in 2004 [49].

1.2.3. Agile software project scheduling

From the late 1990s, several methodologies such as RUP, XP, FDD and Scrum began to attract increasing public attention and have become mainstream software development methods, especially in Vietnam, where most software vendors are small and medium enterprises. These methods are representative of agile software development.
Agile software development methods (“agile” denoting “the quality of being agile; readiness for motion; nimbleness, activity, dexterity in motion” [50]) attempt to offer an answer to the eager business community asking for lighter-weight, faster and nimbler software development processes. This is especially the case with the rapidly growing and volatile Internet software industry as well as the emerging mobile application environment. Agile development is a way of organizing the development process that emphasizes direct and frequent communication (preferably face-to-face), frequent deliveries of working software increments, short iterations, active customer engagement throughout the whole development life cycle, and change responsiveness rather than change avoidance [51]. Thus, agile software development recognizes that software development is inherently a type of product development and therefore a learning process. It is iterative, explorative and designed to facilitate learning as quickly and efficiently as possible. Two of the most significant characteristics of agile approaches are: 1) they can handle unstable requirements throughout the development cycle; and 2) they deliver products in shorter time frames and under budget constraints when compared with traditional development methods.

An agile approach can be seen as a contrast to (traditional) waterfall-like processes [52, 53, 54], which emphasize thorough and detailed upfront planning and design followed by conformance to the plan. The waterfall model is the oldest and the most mature software development model [53]. In practice, the waterfall development model can be followed in a linear way, and an iteration in an agile method can also be treated as a miniature waterfall life cycle. Agile approaches have been widely employed in domains with a low cost of failure or a linear incremental cost of failure [55].
Examples within this domain include web-based applications, mobile applications [50], Internet commerce, social networking, games development, and even some areas in government, finance and banking software development. Table 1.2 summarizes some of the differences between waterfall and agile projects.

Table 1.2. The differences between waterfall and agile projects

| Criteria | Waterfall | Agile |
|---|---|---|
| Product/scope | An often bloated product that is still missing features (i.e., rejected change requests or de-scoped to meet deadlines). | The best possible product according to the customers’ own prioritization, incorporating learning from actual use (evolves with the increments). |
| Schedule/time | Deadlines are usually missed, and it is unlikely for a project to deliver early. | Very high probability of meeting fixed date commitments; can often deliver early with the highest value. |
| Quality | Defects must be tested extensively and expensively. | Quality is built in, and is the key to productivity (writing tests before writing code). |
| Return/value creation | Revenue earning and value creation are delayed until the lowest priority features are implemented and delivered. | Value is generated early, as soon as the minimum highest prioritized features are delivered. Greater return on investment. |
| Relationship to the customer | Contractual | Collaborative |

Since agile software development is organized iteratively and incrementally, agile software scheduling is in effect iteration scheduling. Iteration scheduling aims at determining a feasible and precise plan that schedules the implementation of selected features within an iteration (i.e. assigning tasks to developers). Technical tasks (or Sprint backlog items in Scrum) are the main concepts of iteration scheduling. Each task is a fundamental working unit accomplished by one developer and requires an implementation effort, in working hours, that is estimated by the team.
The aim of iteration scheduling is to break down selected requirements into technical tasks and to assign them to developers [56]. In that process, the development team also needs to consider task dependencies (sequencing) and time constraints. The problem of optimized agile iteration scheduling is discussed in detail in Section 3.1.

1.3. Risk management in software project scheduling

1.3.1. Overview of project risk management

Risk management has become an important part of project management and has attracted a wide range of research during the last two decades [11]. Since 1990, various Risk Management Processes (RMP) have been proposed. Probably the most popular Project Risk Management Processes (PRMP) are Chapter 11 of the PMBOK (Project Management Body of Knowledge) guide [7], the PRAM (Project Risk Analysis and Management) guide [57] and the RAMP (Risk Analysis and Management for Projects) guide [58]. Most organisations adopt one of these guides or use them to develop their own process. This thesis does not explore the detailed differences between the guides since, apart from fundamental differences in assumptions and methodologies [59], they all aim to capture risk and uncertainty in the following three stages:

- Risk Identification
- Risk Analysis
- Risk Response

The Risk Identification stage attempts to discover the main sources of risk. This stage is also known as qualitative risk management. Using various data gathering techniques (e.g. interviewing, brainstorming, the Delphi technique, checklists) with all parties involved in the project, the possible risks that might affect the project are identified. The usual output of the risk identification stage is a document called the Risk Register. Many authors have discussed risk registers in their works [60]. Williams [61] stated two main roles for a risk register:

- A repository of a corpus of knowledge.
- To initiate the analysis and plans that flow from it.
Chapman and Ward [14] consider a risk register as documentation of the sources of the risks, their responses and also risk classification. Ward [62] described the purpose of a risk register as “to help the project team review project risk on a regular basis throughout the project”. Patterson and Neailey [63] presented a risk register database system to aid in managing project risk. Risk registers can be a good management tool during the course of a project. However, it is not possible to identify all risks and capture all aspects of them. There are always unknown (i.e. undiscovered, unattended or immeasurable) risks that are often more important than the identified risks in the risk register.

The Risk Analysis stage attempts to measure the risk and its impacts on different project outputs (i.e. cost, time, and performance). This stage is also known as quantitative risk management. The likelihood that each identified risk will occur and its possible impact on the project are estimated. The combination of the risks, their probabilities and their impacts forms a ‘probability-impact’ (PI) matrix. This matrix can be used to assign ranks to risks and then prioritise them. Most of the available quantitative tools and techniques (simulation-based tools) use PI values to quantify uncertainty in projects. However, the use of PI matrices has some important shortcomings [11].

The Risk Response stage attempts to formulate management responses to the risk. Also known as “Risk Mitigation”, it uses the results of the analysis stage in order to improve the chance of achieving the project objectives. “Risk Response” is a decision making process. A number of alternatives are available when planning risk responses, each of which can be described under one of the following strategies [64]:

- Avoid: seeking to eliminate uncertainty by reducing either the probability or the impact to zero.
- Transfer: seeking to transfer ownership and/or liability to a third party (e.g. insurance).
- Mitigate: seeking to reduce the size of the risk exposure in order to make it more acceptable to the project or organization.
- Accept: recognizing residual risks and responding either actively, by allocating appropriate contingency, or passively, by doing nothing except monitoring the status of the risk.

There are several other publications with different perceptions of project risk management processes. For example, Al-Bahar and Crandall [65], the UK Ministry of Defence [66], del Caño and de la Cruz [67], Wideman [68], the British Standards Institution (BSI) [69], NASA (Rosenberg et al. 1999) [70], the U.S. Department of Defense [71], and the U.S. Department of Transportation [72] suggest the use of processes with different stages or phases. Whichever risk management process is adopted for managing risk and uncertainty, risk analysis always plays an important role in it.

1.3.2. Project risk analysis

In the scope of this research, the term risk analysis is synonymous with quantitative risk analysis and relates to risk measurement, as we focus on quantitative issues of project risks. Project risk analysis is one stage of project risk management. In some literature, risk analysis is even synonymous with risk management. In fact, risk analysis usually starts with a qualitative analysis, and its results support the decision making process in the Risk Response stage. It is a continuous process that can be started at almost any stage of a project. However, it is best to use risk analysis in the early stages of a project (e.g. the feasibility study and planning phases) and to update it continually during the implementation phase. This can be done iteratively at intervals, which also fits agile software development. Risk analysis is the most “formal” aspect of the project risk management process [64], often involving sophisticated techniques and usually requiring computer software (or tools).
Such techniques may be applied with various levels of effort, depending on the resources available for the analysis and on the level of detail required. Risk analysis can bring certain benefits to a software project:

- It supports decision making and enables more effective and efficient risk management.
- It helps to make more feasible (realistic) plans, in terms of both duration and costs.
- It helps to build statistical data on historical risks, which in turn benefits the planning and implementation of future projects.

1.3.3. Unknown risks

One important category of uncertainty in projects is “Unknown Risks”. These are important sources of uncertainty because their impact on a project may outweigh all other sources of risks. Although unknown risks are widely acknowledged (perhaps under different names) by several authors, none of the existing approaches for project scheduling is able to model and quantify this type of risk. The conventional “probability impact” approach is at best only capable of modelling “known risks”. Most of the current quantitative techniques for risk analysis are event-oriented and more concerned with the ‘risk of something happening’. They assume that a list of events (conditions) that may take place is known, that the impact of each risk on activity duration is also known, and even that the nature of the response to each risk is roughly known [15]. However, unknown risks are unpredictable and immeasurable (their impacts are unknown or hard to quantify). Such risks require much effort to clarify.

An example of unknown risks is Internally Generated Risk (IGR) [73]. As the name suggests, IGRs originate from within the project team or organization: from its rules, policies, regulations, structures, actions, behaviours or culture. IGRs have the following features:

- Common, since organizational issues such as policies, processes, culture etc. are widespread in most projects of the organization.
- Important, since they often have an impact on more than one activity.
- Not well managed in projects, as they are unpredictable (and rarely recorded in documents or risk registers) and hard to quantify.

1.3.4. Risk aspects in software project scheduling

In different project management processes there are different aspects of uncertainty/risk [20]. This thesis focuses on quantitative risk management, which is concerned with risks affecting the project schedule (or project time frame), including risks affecting project scheduling (a phase or a process in project planning). As can be deduced from the previous sections, these risks cannot be completely separated from the risks of other processes or phases.

In project scheduling, the most obvious risk lies in estimating the duration of a particular activity. Difficulty in this estimation can arise from a lack of knowledge of what is involved as well as from the uncertain consequences of potential threats or opportunities. Some sources of uncertainty are:

- Level of available and required resources (including inexperienced or insufficiently trained developers).
- Incomplete (or frequently changing) requirements.
- Trade-off between resources and time.
- Possible occurrence of uncertain events (especially those with a negative impact, i.e. risks).
- Challenges from technology (incompatible technology, built-in APIs without sufficient documentation, insufficient architecture etc.).
- Causal factors and interdependencies, including common causal factors that affect more than one activity (such as organizational issues).
- Lack of previous experience and use of subjective instead of objective data.
- Incomplete or imprecise data, or lack of data.
- Uncertainty about the basis of subjective estimation (i.e. bias in estimation).

1.4. Bayesian Networks

1.4.1. Bayesian approach vs classical approach

The fields of statistics and data analysis are concerned with inferring the probability of an uncertain event.
The differences between the classical (also called Frequentist) approach and the Bayesian approach are summarised in Table 1.3.

Table 1.3. The differences between Bayesian and Frequentist approaches

| Criteria | Bayesian | Frequentist |
|---|---|---|
| Parameters/Variables | Uncertain | Random |
| Probability | Degree of belief (subjective) | Physical property (objective) |
| Inference | Bayes’ theorem | Confidence interval |
| Judgement | Depends on the person’s (subjective) opinions or beliefs | A fact, independent of the analyst’s opinions or beliefs |
| Samples/Observations | Any number of samples or observations | Large enough number of samples or observations |

In practice, most uncertain events do not have much historical data associated with them. The analyst does not have much data, although he or she may hold a certain opinion or belief (prior probability). Moreover, even where relevant historical data does exist, it must usually still be informed by subjective judgement before it can be used for measuring uncertainty. The amount of real-life software project data collected (samples/observations) may also be limited. Therefore, we cannot rely on the classical approach to measure uncertainty, and the Bayesian approach is more suitable for risk analysis in software projects. The Bayesian approach also provides a rational way of revising our beliefs in the light of new information (i.e. evidence), which will be explained in the next section.

1.4.2. Probabilistic approach using Bayesian Networks

A Bayesian Network (BN, also known as Bayesian Belief Network, Causal Probabilistic Network, Probabilistic Cause-Effect Model, or Probabilistic Influence Diagram) is a special type of graph associated with a set of probability tables. A BN models the causal relationships of a system or dataset and provides a graphical representation of this causal structure through the use of directed acyclic graphs (DAGs) with nodes and edges.
The DAG representation provides a framework for inference and prediction. The nodes represent random variables with probability distributions, while the edges represent weighted causal relationships between the nodes. Each node takes one of a finite set of mutually exclusive states, each with an associated probability. A directed edge runs from a parent to a child. Each child node A has a conditional probability table P(A|B1,…,Bn) defined over the values of its parents B1,…,Bn. If the node has no parents, the table reduces to the unconditional probabilities P(A) (i.e. the prior probability).

BN is based on Bayes’ Theorem, which starts from the well-known formula relating conditional and joint probabilities:

$$P(R \mid S) = \frac{P(R, S)}{P(S)} \tag{1.1}$$

It follows that Bayes’ rule can be expressed in its basic form as [23]:

$$P(R \mid S) = \frac{P(S \mid R)\,P(R)}{P(S)} \tag{1.2}$$

Bayes’ rule is interpreted in terms of updating the belief about a hypothesis R in the light of new evidence S. The updated belief is the posterior probability, that is, the probability of each possible state of a variable after considering all the available evidence. The posterior belief P(R|S) is calculated by multiplying the prior belief P(R) by the likelihood P(S|R) that S will occur if R is true, and normalising by P(S) (see more about updating probability in Section 1.4.3).

We can rearrange the formula for conditional probability to obtain the product rule:

$$P(R, S) = P(R \mid S)\,P(S) \tag{1.3}$$

The product rule extends to three variables:

$$P(A, B, C) = P(A \mid B, C)\,P(B, C) = P(A \mid B, C)\,P(B \mid C)\,P(C) \tag{1.4}$$

and generalizes to n variables:

$$P(A_1, A_2, \ldots, A_n) = P(A_1 \mid A_2, \ldots, A_n)\,P(A_2 \mid A_3, \ldots, A_n) \cdots P(A_{n-1} \mid A_n)\,P(A_n) \tag{1.5}$$

Formulas 1.4 and 1.5 are often referred to as the “Chain Rule”, which says that in a BN the full joint probability distribution is the product of all conditional probabilities specified in the BN. These formulas are important because they provide a means of calculating the full joint probability distribution in BNs [5].
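The simplification that conditional independence brings to the chain rule can be written compactly. In the sketch below, the notation pa(A_i) for the set of parents of node A_i is introduced here only for illustration. The second line applies the factorization to a hypothetical three-node network in which A and B have no parents and are both parents of C.

```latex
P(A_1, A_2, \ldots, A_n) = \prod_{i=1}^{n} P\bigl(A_i \mid \mathrm{pa}(A_i)\bigr)
% Example: hypothetical network with edges A -> C and B -> C
P(A, B, C) = P(C \mid A, B)\, P(A)\, P(B)
```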
Many of the variables Ai will be conditionally independent, which means that the formula can be simplified: each node needs to be conditioned only on its parents.

A BN allows probability distributions to be supplied for individual nodes. The initial probability distributions can be based simply on “expert opinions”, surveys or other mathematical methods; that is, the BN approach combines expert opinion with mathematical calculation.

A BN consists of two parts: 1) the qualitative part represents the relationships among variables by a directed acyclic graph, and 2) the quantitative part specifies the probability distributions associated with every node of the model. Figure 1.3 shows a BN representing a simple case of the relationship between sub-contract, (team) staff quality and the possibility of delay in a task [20].

In the BN in Figure 1.3, the qualitative part consists of three nodes (representing uncertain variables) and two edges. Each node has a set of states. For example, the node Staff Quality has two states: “Good” and “Poor”. The other part of the directed graph, the edges, represents influential relationships between variables. For instance, an observed event on Sub-contract and/or Staff Quality may lead to Delay in Task.

For the quantitative part, there is a probability table associated with each node, providing the probabilities of each state of the variable. For nodes without parents (i.e., prior nodes), the associated tables are not conditioned on other variables and are called prior probabilities or prior distributions, which represent prior belief. For example, for the node Staff Quality, P(“Good”) = 0.7 and P(“Poor”) = 0.3. For a node with parents, the probability table has conditional probabilities for each combination of the parents’ states (for example, see the table for the node Delay in Task in Figure 1.3).

Figure 1.3. An example of BN which represents a simple case

1.4.3. Bayesian Inference

Bayesian inference is based on a conceptually simple collection of ideas.
We are uncertain about the value of a parameter. We can quantify our uncertainty as subjective probabilities for the parameter (the prior probability), together with conditional probabilities for the observations we might make given the true value of the parameter (the likelihood function). When data arrive, Bayes’ theorem tells us how to move from our prior probabilities to the new conditional probabilities for the parameter (the posterior distribution) [74].

For example, in Figure 1.3, a project manager is analyzing the cause of delay in a particular task in a project. A part of the task is done by a sub-contractor. Based on previous experience and the good reputation of the sub-contractor, the project manager believes that the chance of the sub-contract being delivered on time is 95 percent. There is an 80 percent chance of delay in the task if the sub-contractor fails to deliver on time. Even if the sub-contractor delivers on time, there is still a 10 percent chance that the task overruns its schedule (as a result of other internal reasons). If the task is actually late, what is the probability that the sub-contractor failed to deliver on time?

Before anything is known about this particular task, the subjective estimate (e.g. the sub-contractor’s reputation) gives the prior probability of having the sub-contract delivered on time (SC):

$$P(\mathrm{SC}) = 0.95, \quad \text{therefore} \quad P(\overline{\mathrm{SC}}) = 0.05.$$

The likelihood function is the conditional probability of delay in the task given the actual state of sub-contract delivery:

$$P(\mathrm{Delay} \mid \mathrm{SC}) = 0.1, \quad \text{hence} \quad P(\overline{\mathrm{Delay}} \mid \mathrm{SC}) = 0.9;$$

$$P(\mathrm{Delay} \mid \overline{\mathrm{SC}}) = 0.8, \quad \text{hence} \quad P(\overline{\mathrm{Delay}} \mid \overline{\mathrm{SC}}) = 0.2.$$

Using Bayes’ rule (Formula 1.2) to update the probability, the posterior probability, i.e. the chance that the sub-contract was delivered on time given that the task is late, is:

$$P(\mathrm{SC} \mid \mathrm{Delay}) = \frac{P(\mathrm{Delay} \mid \mathrm{SC})\,P(\mathrm{SC})}{P(\mathrm{Delay} \mid \mathrm{SC})\,P(\mathrm{SC}) + P(\mathrm{Delay} \mid \overline{\mathrm{SC}})\,P(\overline{\mathrm{SC}})} = \frac{0.1 \times 0.95}{0.1 \times 0.95 + 0.8 \times 0.05} \approx 0.70.$$
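The update above can be checked numerically. The following is a minimal sketch of the two-state calculation; the function name and parameter names are hypothetical and chosen only for this illustration.

```python
def posterior_on_time(p_on_time, p_delay_given_on_time, p_delay_given_late):
    """P(SC | Delay) by Bayes' rule, with P(Delay) expanded over SC and not-SC."""
    p_late = 1.0 - p_on_time
    p_delay = p_delay_given_on_time * p_on_time + p_delay_given_late * p_late
    return p_delay_given_on_time * p_on_time / p_delay


p_sc_given_delay = posterior_on_time(0.95, 0.10, 0.80)
print(round(p_sc_given_delay, 4))      # 0.7037
print(round(1 - p_sc_given_delay, 4))  # 0.2963
```

The second printed value is the complement of the posterior, i.e. the probability that the sub-contractor failed to deliver on time given that the task is late.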
So the prior probability of 95 percent is revised to 70 percent as a result of the evidence of a delay in the task. Equivalently, the probability that the sub-contractor failed to deliver on time rises from 5 percent to about 30 percent, which answers the question posed above.

Bayesian inference works well when there are only two variables involved. It becomes much more complex when several variables with several states are involved and a complex set of conditional dependencies exists between them. BNs are built to overcome this problem.

1.4.4. Bayesian Networks and project risk management

BNs are a rigorous, normative method for modelling uncertainty and causality, and they are already used for risk assessment in domains such as medicine and finance, as well as in critical systems generally [75]. BNs are therefore highly suitable for project risk analysis, with the following key benefits [15]:

- BNs provide a rigorous method to make formal use of subjective information. BNs provide a visual and formal mechanism for observing and testing subjective probabilities. This is a particularly attractive feature in project risk analysis, as in most cases the only practical choice is the use of subjective judgments.
- BNs explicitly quantify uncertainty. Their causal framework provides a useful and unambiguous approach for analyzing risk. This is in stark contrast with the probability impact approach (as discussed in Section 1.3.1), where none of the concepts has a clear unambiguous interpretation.
- Parameter learning. The probabilistic inference capability of BNs leads to updating the posterior probability distribution in the light of observed values (i.e. evidence). In particular, this offers a mechanism for updating the belief about unknown factors, which are very difficult to measure and were previously assessed subjectively (see Section 1.3.3).
- Complex sensitivity analysis. BNs are capable of reasoning from effect to cause as well as from cause to effect. This can answer a wide range of ‘what-if?’ questions and offers a complex sensitivity analysis when several variables change simultaneously.
- Make predictions with incomplete data.

BNs provide an ideal approach for modelling uncertainty in projects; however, they are rarely used in project risk analysis. The first efforts to apply BNs in project scheduling were made by McCabe [76] and Nasir et al. [77]. They developed a BN to model the relationship between the major risk variables that affect the duration of activities in a construction project. They identified ten risk categories specific to building construction schedules (e.g. environment, geotechnical, owner, labor, design, area, contractor, political, non-labor resources and material). Detailed risk variables (70 risks in total) were identified in each category. Eight activity groups were identified to represent all types of activities in a construction project (e.g. mobilization, demobilization, foundation/piling, labor intensive, equipment intensive, technical/electrical, roof/external, demolition, and commissioning). In the next step, by reviewing the literature and conducting a comprehensive expert survey, the relationships between different risks and different activity types were identified and subsequently quantified. For each activity group, the output of the model suggested a percent increase or decrease from the most likely duration to define the pessimistic and optimistic durations. The most likely duration of activities is assumed to be known and is used as a reference point. The result of the BN model (in the form of upper and lower limits of activity duration) was exported to an MCS model to incorporate the effect of risks on the project schedule.

The BN model provided a very flexible modelling environment. It was validated with historical data from 17 case studies with very good results. However, the model had the following limitations:

- The model was specific to building construction projects; therefore, it cannot be applied to other industries and different types of projects.
- The BN model predicted the upper and lower bounds of activity duration as percentages of the most likely duration. It assumes that the most likely duration is already known and takes it as an input to the model.
- The output of the model (the upper and lower limits of activity durations) needs another approach (i.e. MCS) to calculate decision making results such as the expected project duration, the probability of delay/completion etc.
- The upper and lower bounds of activity duration were restricted to a few pre-defined values. For example, on the pessimistic side the percent increase of activity duration is limited to 10, 25, 50 and 100%.
- All the risk variables were binary. Variables with more than two states could not be modelled properly.
- The final BN model was overly complex. The graphical structure was unorganized and difficult to follow and understand.
- Although it provided good predictive results, the most powerful feature of BNs, namely diagnostic analysis (e.g. reasoning from effect to cause, learning and “what if?” type analysis), was not used.

Since many techniques of engineering project management are equally applicable to software project management, and technically complex engineering systems tend to suffer from the same problems as software systems, in this thesis the author develops a BN approach to model and quantify risks and uncertainty in software project scheduling.

1.5. Chapter remarks

This chapter has given an overview of the fundamental background on software project management, software project scheduling, and risk management. The probabilistic approach using BNs was also introduced, including Bayesian inference and how BNs are built. The features of the Bayesian approach and the benefits of BNs make them the most suitable approach for managing risks in software project scheduling. As mentioned in the Introduction section, research on applying BNs to risk management in software project scheduling is still limited.
Therefore, together with reviewing the literature on project management, project scheduling and risk management to obtain relevant knowledge for risk management in software project scheduling, this thesis also takes into consideration risk factors in software project scheduling as attributes specific to software projects. This background will be used for the proposed approaches and experiments in Chapter 2 and Chapter 3.

Chapter 2 consists of initial attempts at applying BNs to risk management in software project scheduling, as well as experiments on common risk factors and their impacts on software project scheduling. Nineteen common risk factors for both traditional software development projects and agile software projects are proposed. Chapter 3, in turn, incorporates BNs into popular software project scheduling techniques, namely CPM, PERT and agile software scheduling, to enhance the predictability of schedules produced with those techniques. BNs are also applied in examining the relationships among the risk factors proposed in Chapter 2.

Chapter 2. Common risk factors and experiments on Bayesian Networks and software project scheduling

This chapter presents the author’s work on finding a quantitative method to better assess and analyse risks in software project scheduling (to achieve the first objective mentioned in the Introduction section). To achieve this objective, the chapter has to answer the following questions: What are the risk attributes of software project scheduling? How can risks in software project scheduling be managed better? The chapter covers building tools and identifying common risk factors in software project scheduling. Experiments are carried out to test the tools and the model that applies BNs and the common risk factors.

As mentioned above, risk management has become crucial in software project management since software development always involves uncertainty.
The first section of this chapter aims to provide an effective mathematical model and to show that software teams can rely on the model to predict and quantify uncertainties and their impacts on the success of the project, right from the early phases of the project. From the model, an algorithm and a tool can be developed to help software teams understand and evaluate possible risks. Based on the calculation, the team can make appropriate decisions and take actions to mitigate risks, and the project manager can better keep track of the project budget and schedule. The author proposes the BRI algorithm to calculate risks and impacts in a BN model. A software tool has also been built for experiments on the proposed model and algorithm. The second section of this chapter examines a model and a probabilistic tool, CKDY, that uses BNs to evaluate risk factors in software project scheduling.

2.1. Application of Bayesian Networks into schedule risk management in software project

This section presents the work published in publication 2 [PUB2]. The goal of this section is to introduce a mathematical model and an algorithm (BRI) to assess values that are critical to a project by calculating their associated risks and the probability of their occurrence, each with a weight factor from which their impact is derived. Experiments are carried out to show that software development teams can rely on the model and the algorithm to predict and calculate the risks and their impacts on the success of the project.

2.1.1. Common risk factors in software project management

In 2000, Arizona State University at Tempe conducted research to develop a model that can be used to assess the potential impacts of software risk factors on a software development project. The result was a model consisting of 24 common risk factors in software projects [78]. Hui and Liu [5] built software to calculate the impact of these 24 risk factors on the chance of success of software projects.
Based on the software and the model, they surveyed 29 IT specialists with 5 to 25 years of experience in the IT industry. Each specialist was interviewed and asked to refine the model by adjusting the associated probabilities and weights. After collecting the survey results, the research proposed the list of 24 risk factors together with their probabilities of occurrence, as shown in Table 2.1. It can be seen from Table 2.1 that although the 24 risk factors apply to software projects in general, they have a direct impact on project duration and schedule. Therefore, the list can be used as the starting point for assessing risk factors in software project scheduling.

Table 2.1. Hui and Liu’s common risk factors [9]

No. | Group of Issues | Risk factor | Probability
1 | Resources | Staff experience shortage | 0.30
2 | Resources | Reliance on few key person | 0.75
3 | Resources | Schedule pressure | 0.70
4 | Personnel | Low productivity | 0.22
5 | Personnel | Lack of staff commitment | 0.20
6 | Customer | Lack of client support | 0.35
7 | Customer | Lack of contact person competence | 0.15
8 | Research data | Lack of quantitative historical data | 0.50
9 | Research data | Inaccurate cost estimating | 0.50
10 | System | Large and complex external interface | 0.40
11 | System | Large and complex project | 0.45
12 | System | Unnecessary features | 0.30
13 | System | Creeping user requirement | 0.75
14 | System | Unreliable subproject delivery | 0.45
15 | Management | Incapable project management | 0.58
16 | Management | Lack of senior management commitment | 0.50
17 | Management | Lack of organization maturity | 0.25
18 | Technology | Immature technology | 0.46
19 | Technology | Inadequate configuration control | 0.45
20 | Technology | Excessive paperwork | 0.3
21 | Technology | Inaccurate metrics | 0.5
22 | Technology | Excessive reliance on a single process | 0.5
23 | Experience | Lack of experience with project environment | 0.625
24 | Experience | Lack of experience with project software | 0.42

2.1.2. Bayesian Networks of risk factors

From the list of 24 software risk factors in Section 2.1.1, we have built sub BNs for the risk factors (Figures 2.1 to 2.9 show a selection; the full set is given in the Appendix) and an overall BN (Figure 2.10) for risk modelling in software projects. The BNs also show each risk factor and its impacts and effects at three weight levels (+ means level ONE, ++ means level TWO, and +++ means level THREE; + is lighter than ++, and ++ is lighter than +++), which are used in the calculation described in Section 2.1.3.

For example, in Figure 2.1 the risk factor “Staff experience shortage” has one level of impact weight on staff_training and one level of impact weight on untrained_staff; staff_training has one level of impact weight on project_schedule. This project_schedule effect is also related to the risk factors “Low productivity” (Figure 2.2), “Lack of senior management commitment” (Figure 2.6) and “Inadequate configuration control” (Figure 2.7). It is, of course, also related to the risk factor “Schedule pressure”.

Figure 2.1. A sub BN for the risk factor “Staff experience shortage” (nodes: staff_experience_shortage; +staff_training; +untrained_staff; +project_schedule)

As can be seen from Figure 2.3, the risk factor “Lack of client support” is related to the risk factor “Creeping user requirements” and has a potential impact on the software project schedule.

Figure 2.2. A sub BN for the risk factor “Low productivity”

Figure 2.3. A sub BN for the risk factor “Lack of client support” (nodes: lack_of_client_support; +defect_rate; ++lack_of_client_input; +lack_of_staff_commitment; ++missed_requirement; +creeping_user_requirements)

Figure 2.4. A sub BN for the risk factor “Inaccurate cost estimating” (nodes: inaccurate_cost_estimating; +staff_experience_shortage; ++schedule_pressure)

Figure 2.4 is another example of a sub BN of risk factors affecting the software project schedule.
This sub BN relates to risk factor number 17, “Lack of organization maturity”, risk factor number 9, “Inaccurate cost estimating”, and eventually risk factor number 3, “Schedule pressure”, in Table 2.1. The risk factors “Inaccurate cost estimating” and “Schedule pressure” are also related to the risk factors “Inaccurate metrics” (Figure 2.8) and “Excessive reliance on a single process” (Figure 2.9).

Figure 2.5 shows the sub BN of the risk factor “Incapable project management”, which is stated in the literature to have a high level of impact on the project schedule. The experiments in this thesis also confirm that. This risk factor relates to “Lack of senior management commitment” (which is also a common risk factor in all the lists examined in this thesis) and “Creeping user requirement”. According to Hui and Liu [5], these three risk factors all have a high probability of affecting software projects.

Figure 2.5. A sub BN for the risk factor “Incapable project management”

Figure 2.6. A sub BN for the risk factor “Lack of senior management commitment” (nodes: lack_of_senior_management_commitment; +staff_experience_shortage; +low_moral; +schedule_pressure; ++project_schedule)

Figure 2.7. A sub BN for the risk factor “Inadequate configuration control” (nodes: inadequate_configuration_control; +rework; +productivity; ++defect_rate; +manual_efforts; +project_schedule)

Figure 2.8. A sub BN for the risk factor “Inaccurate metrics” (nodes: inaccurate_metrics; +schedule_pressure; ++inaccurate_reporting; ++inaccurate_cost_estimating)

Figure 2.9. A sub BN for risk factor “Excessive reliance on a single process improvement” (nodes: excessive_reliance_on_a_single_process_improvement; +inaccurate_cost_estimating; +schedule_pressure; +defect_rate)

For further details on the sub BNs, please see the Appendix, “Sub Bayesian Networks of the 24 risk factors”. Figure 2.10 shows the BN of the overall model in software projects.

Figure 2.10. The overall BN for software risk factors

2.1.3. Risk impact calculation

As discussed in Section 1.4.2, BNs allow us to associate a probability distribution with each individual node. The initial probability distributions can be based on expert opinions, surveys, or mathematical methods. The derived probabilities can be calculated by Bayes’ rule and the chain rule (as mentioned in Section 1.4.2) and by Bayesian inference (as described in Section 1.4.3). We apply the following characteristics of BNs to calculate the impacts of risks in software projects [37]:

- Expression of expert opinions, experiences or beliefs about the dependencies between different factors.
- Consistent propagation of the impact of uncertain evidence on the probabilities of outcomes.
- Calculation and revised calculation of probability when the evidence is known.

Figure 2.11 illustrates how the above characteristics are applied. The figure shows that the events x, y and z depend on one another, while x is conditionally independent of z given y.

Figure 2.11. A simple example of Bayesian inference

- Expert opinions, experiences, beliefs: z impacts y, and y impacts x.
- Propagation of the impact of evidence: suppose the probability that z happens is P(z) = 0.9, the conditional probability of y given that z happens is P(y|z) = 0.7, and the conditional probability of x given that both y and z happen is P(x|y,z) = 0.6. Then, by applying the chain rule, we can calculate the probability P(x).
- First we calculate P(y):

$$P(y) = \sum_i P(y, z_i) = \sum_i P(y \mid z_i)\,P(z_i)$$

Since:

$$\sum_i P(y, z_i) = P(y, z) + P(y, \bar{z})$$

Therefore:

$$P(y) = P(y \mid z)\,P(z) + P(y \mid \bar{z})\,P(\bar{z})$$

Assume $P(y \mid \bar{z}) = 0.5$; then: P(y) = 0.7 × 0.9 + 0.5 × 0.1 = 0.68

- Now we can calculate P(x). Because x is conditionally independent of z given y, P(x|y,z) = P(x|y) = 0.6, and:

$$P(x) = \sum_i P(x, y_i) = \sum_i P(x \mid y_i)\,P(y_i)$$

Since:

$$\sum_i P(x, y_i) = P(x, y) + P(x, \bar{y})$$

Therefore:

$$P(x) = P(x \mid y)\,P(y) + P(x \mid \bar{y})\,P(\bar{y})$$

Assume $P(x \mid \bar{y}) = 0.5$; then: P(x) = 0.6 × 0.68 + 0.5 × 0.32 = 0.568.
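The two marginalisation steps above can be checked with a few lines of Python. This is a minimal sketch and not part of the thesis tool: the helper name `marginal` is hypothetical, and the 0.5 values for the complement cases are the same assumptions made in the text.

```python
def marginal(p_given_true: float, p_given_false: float, p_parent: float) -> float:
    """Law of total probability for a binary parent:
    P(c) = P(c | parent) * P(parent) + P(c | not parent) * (1 - P(parent))."""
    return p_given_true * p_parent + p_given_false * (1.0 - p_parent)

# Values from the text; the 0.5 entries are the assumed complement cases.
p_z = 0.9
p_y = marginal(0.7, 0.5, p_z)  # P(y) from P(y|z) and P(y|not z)
p_x = marginal(0.6, 0.5, p_y)  # P(x) from P(x|y) and P(x|not y)

print(round(p_y, 4), round(p_x, 4))
```

Rounded to four decimals, the script reproduces the values derived above, P(y) = 0.68 and P(x) = 0.568.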
Given Figure 2.11, the formula to calculate P(x|z) is:

$$P(x \mid z) = \sum_{y_i} P(x \mid z, y_i)\,P(y_i \mid z) = \sum_{y_i} P(x \mid y_i)\,P(y_i \mid z) \qquad (2.1)$$

Bayes’ Theorem has a very important property: we can calculate the revised probability of a parent when we know that the child is true. Recall formula 1.2: P(x|y) = P(y|x) × P(x) / P(y).

- Revised probability of y being true: P(y|x) = (P(x|y) × P(y)) / P(x) = 0.6 × 0.68 / 0.568 = 0.7183.
- Revised probability of z being true: P(z|y) = (P(y|z) × P(z)) / P(y) = 0.7 × 0.9 / 0.68 = 0.9265.

Based on the above model, we devised an algorithm that calculates the impact of risk factors, allowing project managers to make estimates and appropriate decisions for the development team so that the software project is completed on time.

Figure 2.12. The three nodes of a simple-chain BN

In Figure 2.12:
x: the examined risk;
y: the risk directly generated from the examined risk;
z: the risk generated under the condition that the two previous risks occurred.
P(y|x), P(z|xy): probabilities of the risks when the conditions are true, at three weight levels: + (low, p = 0.3), ++ (medium, p = 0.6), +++ (high, p = 0.9).

2.1.4. Bayesian Risk Impact algorithm

We propose the algorithm BRI (Bayes Risk-Impact) to assess the impact of risk factors.

* Input: Risk factors and probabilities (Table 2.1)
* Output: ImpactWeight(examined_risk), the degree of impact of each risk factor on the fulfillment of a software project, in the form of a vector of numerical values. The higher the value, the greater the impact.

The algorithm BRI assesses the impact of risk factors as follows:

Step 1. Based on the known probabilities, calculate the probabilities of the child nodes in each sub BN.
Step 2. For each child node, recursively find ImpactWeight(child_node): find the Bayesian networks that start from the examined child node in the original BN, and calculate ImpactWeight(child_node) with the probability calculated in Step 1. If none is found, ImpactWeight(child_node) = P(child_node).
Step 3. ImpactWeight(examined_risk) = ∑ImpactWeight(child_node).
Step 4. Collect ImpactWeight(examined_risk) into the impact vector.
Step 5. Repeat to examine the next risk.

Each risk factor in the BN may have child nodes in sub BNs. In the beginning, we have known probabilities (prior probabilities). Based on Bayes’ Theorem and Bayesian inference, the probabilities of the child nodes can be calculated. Each child node, in turn, may belong to one or more sub BNs. Its ImpactWeight value is initialized to its probability and is summed up in each BN associated with it. The ImpactWeight value of the examined risk is set to the sum of the ImpactWeights of all its child nodes.

For example, assume that risk factor 15, “incapable project management”, is examined. As can be seen in Figure 2.5, the node “incapable project management” has three child nodes. These child nodes also appear in other sub BNs; in the figure they are related to risk factors 13 (“creeping user requirement”) and 16 (“lack of senior management commitment”, associated with another BN in Figure 2.6), which are themselves examined in some steps of the BRI algorithm. As a result, the ImpactWeight of “incapable project management” will be higher than the ImpactWeights of “creeping user requirement” and “lack of senior management commitment”.

2.1.5. Tool and experiments

a) Building tool

The purpose of the tool is to simulate the above model and algorithm, helping managers assess the level of impact of the risks on the ability to complete a software project when the probabilities of occurrence of the risks are known in advance. The software is built in the C# programming language with the Microsoft .NET Framework 4.5 library, using Visual Studio 2012 as the integrated development environment (IDE). The graphical interface of the software is shown in Figure 2.13.
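The recursive Steps 1 to 5 of the BRI algorithm in Section 2.1.4 can be sketched as follows. This is one possible reading of the steps, not the thesis's C# implementation. The function names are hypothetical. The rule P(child) = P(child | parent) × P(parent) is a simplifying assumption that ignores the case where the parent does not occur, and the cycle guard is also an assumption, since the source does not say how mutually referencing sub BNs are handled. The edges are those the text describes for Figure 2.1, and the prior 0.30 is taken from Table 2.1.

```python
from typing import Dict, FrozenSet, List, Tuple

# Weight levels from Section 2.1.3: + (low), ++ (medium), +++ (high).
WEIGHT_PROB: Dict[str, float] = {"+": 0.3, "++": 0.6, "+++": 0.9}

# Edges as described for Figure 2.1: parent -> [(child, weight level)].
SUB_BNS: Dict[str, List[Tuple[str, str]]] = {
    "staff_experience_shortage": [("staff_training", "+"), ("untrained_staff", "+")],
    "staff_training": [("project_schedule", "+")],
}

def impact_weight(node: str, p_node: float,
                  sub_bns: Dict[str, List[Tuple[str, str]]],
                  path: FrozenSet[str] = frozenset()) -> float:
    """Recursive ImpactWeight for one node (Steps 1 to 3)."""
    children = sub_bns.get(node, [])
    # No sub BN starts at this node (Step 2, "if not found"), or the node is
    # already on the current path (assumed cycle guard): return P(node).
    if not children or node in path:
        return p_node
    path = path | {node}
    total = 0.0
    for child, level in children:
        # Step 1 (simplifying assumption): P(child) = P(child | parent) * P(parent).
        p_child = WEIGHT_PROB[level] * p_node
        # Steps 2 and 3: recurse into the child, then sum over all children.
        total += impact_weight(child, p_child, sub_bns, path)
    return total

def impact_vector(priors: Dict[str, float],
                  sub_bns: Dict[str, List[Tuple[str, str]]]) -> Dict[str, float]:
    """Steps 4 and 5: one ImpactWeight entry per examined risk factor."""
    return {risk: impact_weight(risk, p, sub_bns) for risk, p in priors.items()}

vector = impact_vector({"staff_experience_shortage": 0.30}, SUB_BNS)
print(vector)
```

Under these assumptions the impact weight of “Staff experience shortage” is about 0.117: 0.09 from untrained_staff plus 0.027 propagated through staff_training to project_schedule.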
To use the tool, the user only needs to input the initial probabilities of the risk factors, or can simply accept the default probabilities built into the tool from research results. The tool then automatically calculates the impact weight of each risk factor as a numeric value. In the different phases of a software project, managers and project teams can assess the risks and their probabilities of occurrence and make planning and management decisions accordingly. By comparing the resulting metrics, the project team can make decisions about the software project.

Figure 2.13. The graphical interface of the tool

b) Experiments

The sample data consist of the data of two real software projects and the research results of Hui and Liu [9] shown in Table 2.1.

The test results (calculated impact levels of the risk factors) for the data of Hui and Liu [9] (risk factors and initial probabilities as shown in Table 2.1) are summarized in Figure 2.14. The results show that the two factors with the highest impacts are “incapable project management” and “lack of client support”. This is consistent with the fact that the sub BNs of these risk factors are among the most complex ones and are related to several other key risk factors (Figure 2.3 and Figure 2.5).

Figure 2.14. Result of experiment 1

For the two real-life projects, the project managers and secretaries drew on their practical experience in those projects to help the authors estimate the initial probabilities of the risk factors.

Project 1 is a social networking game project with a team of 8 people (1 admin, 1 tester, 5 developers and 1 designer). The project was expected to take 4.5 months but lasted 10 months. The main problems encountered by the project team were the large self-built framework, the many bugs generated, workloads, and slow response.
Project 2 is an outsourcing software project for a Japanese telecommunications company. The project was expected to be completed in 5 months with 15 stages, but, like Project 1, it actually lasted 10 months. The biggest problems encountered were identifying customer requirements (in phase 1), assessing the complexity of each module in order to allocate resources, and the long lead time for handover and guidance to the customer.

Table 2.2. Risk factors in the phases

Phase | Risk factor | Impact (High, Medium, Low) | Probability
Requirements Identification | Creeping user requirement | High | 0.5
Requirements Identification | Incapable project management | High | 0.1
Requirements Identification | Lack of client support | High | 0.2
Requirements Identification | Staff experience shortage | High | 0.4
Software Design | Staff experience shortage | High | 0.2
Software Design | Immature technology | Medium | 0.1
Software Design | Unnecessary features | Medium | 0.99

Experiment method: joint testing for all three data sets, and testing in each phase of each project.

The general experimental results are shown in the comparison chart in Figure 2.15. The results for all three data sets show similar (uniformly high) levels of impact for certain risk factors, e.g. incapable project management, lack of client support, and excessive reliance on a single process.

With the Project 2 data: since the project was well organized from the early phases, the author analyses the impact of the risk factors in each phase of the project. The project manager provided the authors with estimates and assessments of the possible risks in some phases, as shown in Table 2.2. The project team rated the influence of the factors incapable project management and lack of client support highly.
The experimental results of the software for the Requirements Identification phase show this more clearly: in this phase, the greatest risk impact comes from incapable project management. This calls for improving management skills and quality; there should be clear and specific commitments from project managers; errors in management requests should be avoided, and user requirements should be identified accurately. For the Software Design phase, the biggest risk impact comes from immature technology (Figure 2.16); this requires a strategy for avoiding risks in the chosen methods and for minimizing the risk of schedule pressure. According to the project manager, this result is consistent with what happened in Project 2.

Figure 2.15. Results of the three experiments

Based on the vector of risk influence levels for each project period and for the overall project, it can be seen that management-related risk affects the ability to complete the project on time as well as project success. The bigger the project, the more complex it is and the more risk factors it involves, which demands higher skills, experience and capability from the project manager. Other important factors are lack of commitment and support from clients, incapable project management, and excessive reliance on a single process. This result confirms the need for supporting tools that help the project team estimate, evaluate and adjust promptly. Project managers need to pay more attention to the risk factors that have the greatest impact on the project to ensure that the project is carried out smoothly.

Figure 2.16. Experimental results for Software Design phase

2.1.6. Conclusion and contribution

The tool and the experimental results show that the algorithm BRI accurately assesses the impact of risk factors on the project schedule. The algorithm helps the management team quickly foresee the impact weights caused by their risk factors.
Users of the experimental tool are only required to input the initial probabilities of the risk factors, or they can simply accept the default probabilities that were established in the tool based on our research results. Project managers need to pay more attention to the risk factors that cause the highest impacts in order to keep their software development projects out of trouble, especially with respect to time, cost and quality. The proposed models, algorithms and tools, in addition to quantifying risks and their consequences, can also help identify problems and potential risks in the first phase of the project, that is, project scheduling and project planning. The authors also assert that if issues can be identified and controlled from the early phases of the project, the likelihood of the project’s success can be increased significantly. Although the BN model generally provides an accurate picture of the risks of typical software projects at an early phase, it still needs further development, especially when it is used for other specific industries or at later phases of software development projects.

Therefore, further research on this issue can focus on applying the BN technique to modelling risks in project scheduling by incorporating BNs into different project scheduling techniques (CPM, PERT, simulation, etc.), and then evaluating the results and making better recommendations to the project team. The author would also collect more empirical data sets and run the algorithm on them, so that the algorithm can be better evaluated and analysed. An expert survey will also be carried out so that, together with the test results of the tool, a list of risk factors that best suits each type of software and each software development method can be compiled. The authors will also study more closely and integrate more sources of risk into the scheduling process, and investigate how to deal with other types of unforeseen factors (such as unknown unknowns).
The final development of this work is action management and decision-making support for project managers when the project has a scheduling problem (some phase is delayed or the whole project is delayed), by evaluating multiple schedule scenarios during the project scheduling process itself.

2.2. Experiments on common risk factors

This section presents the work whose results were published in publication 3 [PUB3].

In reality, all the phases of the software development life cycle (SDLC) are potential sources of uncertainty since they have to deal with hardware, software, technology, people, cost, and processes. Current state-of-the-art scheduling techniques are based on the assumption that every task, activity or phase of the project is carried out exactly as planned, which almost never happens in real-life projects. Recent research on risk management focuses on the relationships between uncertainty (risk factors) and the outcomes of a project. This section examines a model and a probabilistic tool, CKDY, which uses a Bayesian Belief Network to evaluate risk factors in software project scheduling.

2.2.1. Discovering the top ranked risk factors

We apply the method proposed by Kumar and Yadav [23], which consists of 4 steps: (1) Selecting the top ranked risk factors in software project scheduling; (2) Constructing causal relationships among the software risk factors; (3) Constructing the node probability table (NPT) for each node (factor) of the model; (4) Calculating the probability value of software risk factors for the project. In each step, we choose a suitable solution to build into the tool CKDY. The options are described in detail below.

a) Selecting the top ranked risk factors

The risk factors in software project scheduling depend on various aspects of the software project such as project size, budget, human resources, etc.
There are several useful sources of information for identifying risk factors, such as previous research and analysis, historical data and lessons learned, system safety and reliability analysis, and expert interviews. A number of software risk prediction and risk assessment models using software risk factors have been proposed [33, 79, 80]. Most of the existing models evaluate a large number of risk factors, although some of them are unsuitable for certain types of projects or are less important. However, assessing and estimating software risk with all the risk factors has drawbacks, such as higher computational complexity and processing cost. Selecting the most important software risk factors that affect the entire project or each phase of the project could increase the accuracy of risk prediction and risk assessment.

We synthesize a number of published sets of risk factors, such as the SEI risk classification [81] and the NASA NPD2820 risk classification [82], along with the research results (24 risk factors) of Hui and Liu [5] and the 27 risk factors selected by Kumar and Yadav [23], to select a set of risk factors in software project scheduling for testing the method proposed by Kumar and Yadav [23]. The set of 5 top ranked risk factors in software project scheduling is shown in Table 2.3. These risk factors have consequences for the software project and eventually lead to the project running over schedule.

Table 2.3. Risk factors, consequences and impact

Component | Sub component
Risk Factors | Poor management skills and experience
Risk Factors | Pressure on the schedule
Risk Factors | Frequent changes in customer requirements
Risk Factors | Inappropriate process
Risk Factors | Inappropriate technology
Consequences | Incomplete mission
Consequences | Wasted resources
Consequences | Reliability
Impact | Over scheduled

b) Constructing causal relationships among the software risk factors

As mentioned in Section 1.4, causal relationships among the nodes in a BN can be constructed from historical data, from experimental observation and with the help of domain experts (expert opinions). In this case, constructing causal relationships among risk factors means modelling the causal relationships among Risk Factors, Consequences and the eventual Impact with the help of domain experts. Sub BNs of risk factors and consequences are illustrated in Figure 2.17 and Figure 2.18, and the overall BN model is shown in Figure 2.19.

Figure 2.17. Sub BN 1

Figure 2.18. Sub BN 2

c) Constructing the node probability table (NPT) for each node of the model

Designing the NPT data is one of the fundamental issues associated with a BN. Constructing the NPT for each node of the model requires project data. An indispensable factor in applying BNs to project management is the evaluation and judgment of experts, which help the project manager build the model and construct the NPTs more easily. We chose to use the built-in PSPLIB (Project Scheduling Problem Library) data set (footnote 5) to construct the NPTs for the model. Table 2.4 below is an example of the probabilities of the risk factors.

Footnote 5. PSPLIB: http://www.om-db.wi.tum.de/psplib/

Figure 2.19. The overall BN model

Table 2.4. Examples of risk factors and probabilities

Risk factor | Probability (if it happens) | Probability (if it does not happen)
Poor management skills and experience | 0.575 | 0.425
Pressure on the schedule | 0.6179 | 0.3821
Frequent changes in customer requirements | 0.626 | 0.374
Inappropriate process | 0.611 | 0.389
Inappropriate technology | 0.55 | 0.45

d) Calculating the probability value of software risk factors for the project

Bayes’ formulas are applied to calculate the probability of each node and, finally, the probability of the success of the project. To make the calculations easier, a support library, HuginExpert (footnote 6), is used. In addition to predicting the probability of failure of risk management in project scheduling, we also recommend a risk management sequence, which helps managers see clearly where the problems in the project are, which issues to solve first and which later, and how to prioritize resources. The essence of the risk management sequence is to monitor the project schedule through each phase and to alert the project managers in time when risk factors or consequences exceed the permissible threshold and thereby affect the whole schedule.

2.2.2. Tool CKDY

a) Building tool

The tool CKDY (footnote 7) builds on the Hugin classes, functions and APIs that provide Bayesian analysis and prediction for the Java programming language (ParseListener, Domain, Compiler, etc.). The functions of the tool are shown in Figure 2.13 and include: calculating and predicting the probability of risks in the project phases; giving warnings to managers after each phase; ranking risk factors; and providing a visual graph of the probability variation for each period.
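The node-probability calculation described in step d) can be illustrated with a small enumeration over parent states. This is only a sketch in plain Python. It does not use the Hugin API, on which the thesis tool relies for this computation. The two priors are taken from Table 2.4, but the choice of these two factors as parents of a single consequence node, the NPT entries and the function name `child_probability` are all hypothetical and chosen purely for illustration.

```python
from itertools import product
from typing import Dict, Sequence, Tuple

def child_probability(parent_probs: Sequence[float],
                      npt: Dict[Tuple[bool, ...], float]) -> float:
    """P(child) = sum over parent states of P(child | states) * P(states),
    assuming the parents are independent root nodes."""
    total = 0.0
    for states in product([True, False], repeat=len(parent_probs)):
        weight = 1.0
        for occurred, p in zip(states, parent_probs):
            weight *= p if occurred else 1.0 - p
        total += npt[states] * weight
    return total

# Priors from Table 2.4.
p_management = 0.575  # Poor management skills and experience
p_pressure = 0.6179   # Pressure on the schedule

# HYPOTHETICAL NPT for one consequence node with these two parents:
# (management occurs, pressure occurs) -> P(consequence occurs).
npt = {
    (True, True): 0.9,
    (True, False): 0.6,
    (False, True): 0.5,
    (False, False): 0.1,
}

p_consequence = child_probability([p_management, p_pressure], npt)
print(round(p_consequence, 4))
```

With real NPTs elicited from experts or data, the same enumeration applies to each consequence node and then to the impact node; a BN library performs this propagation more efficiently for larger networks.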
According to IEEE Standard 1540 (2001) [83], the process of managing risks consists of the following activities: a) Plan and implement risk management; b) Manage the project risk profile; c) Perform risk analysis; d) Perform risk monitoring; e) Perform risk treatment; f) Evaluate the risk management process. Based on these guidelines, the author proposes a process for handling risks in software projects using BNs that consists of the following 4 steps. This process is implemented in the CKDY tool.

Step 1 - Initialize the BNs (for carrying out “Plan and implement risk management”): Based on the common BN model, the specific risk factors suitable for each project are added. In this step, we identify the nodes that need to be monitored and make assumptions about the status of each node.

Step 2 - Calculate and make predictions from the BNs (for “Manage the project risk profile”): When the project starts, a loop that monitors the status of the nodes starts as well. Whenever new data become available, they are added to the BNs to recalculate and update the probabilities and estimates. The data history of each week of the project schedule is archived for easy reference later.

Footnote 6. HuginExpert: http://www.hugin.com/
Footnote 7. CKDY tool and data samples: http://bit.ly/2r4MsWb

Step 3 - Monitor and analyse risks, adjust resources (for “Perform risk analysis”, “Perform risk monitoring”).

Step 3.1 - Monitoring and analyzing risks: In the general BN model, some nodes are related to and directly affect the success of project scheduling, such as “Incomplete mission”, “Wasted Resources” and “Reliability”. These nodes are monitored for a certain period of time (the project manager specifies this period depending on the project resources and the required accuracy). If the probability that these nodes or their child nodes occur rises above the threshold, the tool allows the Tracing function to be called to determine the cause.
Step 3.2 - Adjust project resources: Software activities and tasks always have a saturation point. Beyond it, the more resources we spend on a task, the higher the cost, while the quality does not improve significantly or the performance even decreases. Therefore, the Saturation function is called periodically (weekly or at longer intervals, as decided by the project manager) to check whether the monitored nodes have reached the saturation point, so that appropriate decisions can be made. The Ranking function is also called in cycles to sort the nodes by efficiency: if quantitative data on the efficiency of the resources are available, the nodes can easily be ranked (according to the operation cost of each node). If there are no quantitative data for those nodes, the BN model is used for the efficiency evaluation. When a node is ranked higher, resources are reallocated to it to increase the effectiveness of the project.

Step 4 - Perform risk treatment: Based on the analysis and the data on project risks, the project manager chooses appropriate measures to handle the risks.

b) The input structure

The tool’s input file is a ".net" file created by the tool HuginExpert. HuginExpert is based on Bayesian network and influence diagram technology. It is used to build the diagram of causal relationships among the nodes and to provide the NPT for each node of the model based on the available data sets and experts’ judgment.

c) Flow of processing

The tool first reads the input data set from files in the standard Hugin format. From the imported data, the library provided by Hugin is used to calculate the probabilities of the network nodes. The processing of the input data determines the calculation results of the tool.

Once the calculation results are available, they are printed on the screen so that managers can see the probability of each item.
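The weekly threshold check described in Step 3.1 can be sketched as below. This is a minimal illustration and not the CKDY implementation: the function name `nodes_over_threshold`, the weekly probabilities and the threshold value 0.5 are hypothetical, and only the three monitored node names come from the text.

```python
from typing import Dict, List

def nodes_over_threshold(weekly: Dict[str, List[float]],
                         threshold: float) -> Dict[str, int]:
    """Return, for each monitored node, the first week (1-based) in which its
    probability exceeds the threshold; nodes that never exceed it are omitted."""
    alerts: Dict[str, int] = {}
    for node, series in weekly.items():
        for week, probability in enumerate(series, start=1):
            if probability > threshold:
                alerts[node] = week
                break
    return alerts

# HYPOTHETICAL weekly probabilities for the monitored consequence nodes.
history = {
    "Incomplete mission": [0.20, 0.35, 0.55],
    "Wasted resources": [0.10, 0.15, 0.20],
    "Reliability": [0.30, 0.52, 0.60],
}

alerts = nodes_over_threshold(history, threshold=0.5)
print(alerts)
```

In a full tool, each alert would be the point at which a tracing step inspects the parent nodes of the flagged node to locate the cause.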
Based on the results, the project managers can predict, revise, add or replace resources to meet the requirements of the project schedule. In addition, the tool allows scheduling of up to seven project phases. This allows project managers to look in more detail at the planning and scheduling process and to find issues that need to be addressed across the entire process. The tool gives the project manager weekly warnings about network nodes that exceed the safe threshold within the allowed time. This helps project managers manage their work more easily. The basic Ranking and Saturation functions are aimed at raising warnings when thresholds are exceeded.

d) Data sample

The necessary data are the probabilities of the selected risk factors in Table 2.3 (Poor management skills and experience, Pressure on the schedule, Frequent changes in customer requirements, Inappropriate process, Inappropriate technology). The PSPLIB (Project Scheduling Problem Library) data set is used as a set of standard cases for evaluating solutions to single-mode or multi-mode resource-constrained project scheduling problems. The RESCON software (RESource CONstrained) (footnote 8) is used to display the files in an intuitive interface. RESCON, developed at Katholieke Universiteit Leuven (Belgium), is free and open source software for research on resource-constrained project scheduling problems.

2.2.3. Experiments and analysis

a) Testing the model

Two data sets from PSPLIB were used. Each set has 7 files corresponding to the 7 phases allowed by the design of the tool. For each data set there is one test scenario (with 7 phases). RESCON is used to model the files in the PSPLIB data sets. Figure 2.20 is an example with the file j301_1.rcp in PSPLIB. For example, for j30 in the first data set, the early start schedule often violates the resource constraints, since it only takes the earliest start times and the precedence of the activities into account.
For example, Resource 1 uses up to 21 units while only 12 resource units are available (vertical axis), and Resource 2 uses up to 25 units while only 13 resource units are available. In Figure 2.20, RESCON marks the resource limit with a red line.

Footnote 8: RESCON, http://feb.kuleuven.be/rescon/

Figure 2.20. Experiment with j30 with the early start schedule

Figure 2.21. Activity joint in the file j301_1.rcp

The tool calculates the probability of each risk factor in each phase, as well as the probability of being behind schedule in each phase (see Figure 2.22). Based on the probabilities and thresholds, the tool also reports the consequences at 6 alert levels for each phase and each risk factor. Based on these parameters, the project manager can consider a reasonable allocation of resources to meet the schedule requirements in each project phase. The probability of each risk factor for the whole project can be calculated as the average of its probabilities over the phases (Table 2.5 and Table 2.6).

Figure 2.22. Diagram of probabilities of finishing phase by phase

Tracking each phase of each test scenario, we find that if issues (risks) occur in the very first tasks, the probability of schedule failure increases gradually. If the project team (project manager) does not intervene, the probability rises beyond the allowed level and directly affects the whole project.

Table 2.5. Probability of risk factors in the whole project with data set 1

Risk factor | Probability (if it happens) | Probability (if it does not happen)
Poor management skills and experience | 0.505 | 0.495
Pressure on the schedule | 0.7536 | 0.2464
Frequent changes in customer requirements | 0.643 | 0.357
Inappropriate process | 0.6625 | 0.3375
Inappropriate technology | 0.666 | 0.334

Table 2.6. Probability of risk factors in the whole project with data set 2

Risk factor | Probability (if it happens) | Probability (if it does not happen)
Poor management skills and experience | 0.575 | 0.425
Pressure on the schedule | 0.6179 | 0.3821
Frequent changes in customer requirements | 0.626 | 0.374
Inappropriate process | 0.611 | 0.389
Inappropriate technology | 0.55 | 0.45

b) Remarks on the model and the tool

The model proposed by Kumar and Yadav [23] addresses risk management for the entire software project, but it is also effective when applied to project scheduling, since it takes advantage of BNs to predict the probability of project failure at each point in time, based on the highest-ranked risk factors during project scheduling. The tool and the experiments show that nodes can be monitored and flagged with warnings when they reach saturation status, and that nodes can be ranked by their effect. This provides supporting information for reallocating resources and helps project managers keep the probability of project failure within the allowed limit (in the experiments in this research, the limit is set at 0.5).

The CKDY tool has also been compared with Microsoft's MSBNx software (see footnote 9) for building and computing with BNs. MSBNx has been developed since 2001 and has been used in many experiments. CKDY is compared with MSBNx using the proposed BN model and the same input probabilities of the risk factors. Both tools calculate the probability of the consequences and eventually the impact (the probability of schedule failure). The results show that the two tools evaluate consequences and impacts similarly under the proposed research model. For example, with the risk factor probabilities shown in Table 2.7, Table 2.8 compares the results of the two tools.

Footnote 9: MSBNx, https://msbnx.azurewebsites.net/

Table 2.7. Probability of the experimental risk factors to compare with MSBNx

Risk factor | Probability (if it happens, %) | Probability (if it does not happen, %)
Poor management skills and experience | 18.39 | 81.61
Pressure on the schedule | 23.04 | 76.96
Frequent changes in customer requirements | 14.46 | 85.54
Inappropriate process | 13.93 | 86.07
Inappropriate technology | 12.32 | 87.68

Table 2.8. CKDY compared with MSBNx

Category | Item | MSBNx (%) | CKDY (%)
Consequences | Incomplete mission | 9.3 | 9.26
Consequences | Wasted resources | 12.2 | 12.25
Consequences | Reliability | 11.3 | 12.47
Impact | Over scheduled | 9.2 | 9.18

2.2.4. Conclusion and contribution

This section has applied BNs to a risk management model for software projects in the early phase of planning, namely software project scheduling. Through a literature review, the study selected a set of risk factors that affect the project scheduling process. The tool CKDY, which uses this set of risk factors, produced results close to those of MSBNx (Table 2.8), which supports its accuracy and reliability. As the first objective of the thesis, the model proposed in this section tries to give an accurate picture of risks in software projects at an early stage and to help project managers control risks early in the software project life cycle.

To further develop the model, the author will continue to analyse and review the set of risk factors for the software project as a whole as well as for each phase of the project. Modelling and quantifying risks at later phases of software projects will also be considered. Another relevant research direction is to integrate probabilistic models into common software project scheduling techniques (CPM, PERT, Monte Carlo simulation, etc.).

The tool CKDY is still at the experimental research stage, so it is still difficult for non-professionals to use and it has some limitations in functions and features. The tool needs a wider range of features and interfaces, and its input needs to be simplified, so that non-professionals can use it easily.
The author also needs more expert judgment to help build the input probabilities and more real-case data samples from the software industry.

2.3. Proposed common risk factors in software project scheduling

An implication of Section 2.1 and Section 2.2 is that selecting the most important software risk factors could improve the accuracy of software risk assessment and estimation. In this section, the author proposes lists of common risk factors that are related to, and have impacts on, software project scheduling. Section 2.3.1 covers common risk factors in traditional software projects, and Section 2.3.2 covers common risk factors in agile software projects.

2.3.1. The 19 common risk factors in traditional software project

Wallace et al. [28] summarized previous studies and defined 27 software risks, classified into 6 categories: Organizational Environment Risk (4 risks), User Risk (5 risks), Requirement Risk (4 risks), Project Complexity Risk (4 risks), Planning and Control Risk (7 risks), and Team Risk (3 risks). In Section 2.2, the author of this thesis also examined a simple model of schedule risks in software project development with the 5 risk factors listed in Table 2.2. These risk factors can be related to the user risks, requirement risks, team risks, and planning and control risks defined by Wallace et al. [28]. In addition, Rai et al. [26] presented a list of 43 risk factors in Agile software projects. The 43 risk factors cover 6 categories in Agile software development: Development Environment Risks, Process Issue Risks, Staff Size and Experience, Technical Issue Risks, Technology Risks and Schedule Risks. In our research, only the common risk factors (those that Agile software development and traditional software development have in common) that affect software scheduling and planning are examined.
These common risk factors are also regarded as specific software risks that are common to many activities or tasks in a project. They are mostly derived from development environment, process issue, staff size and experience, and schedule risks.

To identify them, the three lists of risk factors mentioned above are compared and combined (with the planning/scheduling phase of software development in mind). The comparison and combination were based on the descriptions of the risk factors, even when they are not worded identically. For example, the risk factor Continually changing system (in the 27 risks [28]) can be considered the same as Frequent changes in customer requirements (in the 5 risk factors in Section 2.2) and related to Customer not certain that the functionality requested is "do-able" (in the 43 risk factors [26]); likewise, Poor management skills and experience (in the 5 risk factors in Section 2.2) is considered the same as Lack of management experience (in the 43 risk factors [26]). The author also received advice on the risk factors from the experts who worked with the author in this research and provided real-life project data.

Table 2.9 lists the 19 common risk factors that directly or indirectly influence the possibility of schedule success, combined from the three lists mentioned above. Their relationships were formed based on the literature in which the risk factors were described and on experts' opinions. For example, risk factor (1) Large-scale, offshore and distributed would lead to risk factors (2) Insufficient training and (19) Lack of management experience, due to the size of a project (based on modules, function points, number of staff or duration) and its complexity (based on the information or the opinion of the project experts).

Table 2.9. List of 19 common risk factors for software project scheduling

No. | Risk factors
1 | Large-scale, offshore and distributed.
2 | Insufficient training.
3 | Excessive preparation/planning.
4 | Teams are not focused.
5 | Inappropriate process.
6 | The best people not available for self-organizing team.
7 | The skill level of people (team/developer).
8 | Staff is not committed for entire duration of the project.
9 | Ineffective communication.
10 | Staff does not receive necessary training.
11 | Lack of tools and methods.
12 | Software tools are not used to support software planning and tracking activities.
13 | Configuration management software tools are not used to control and track change activity throughout the software process.
14 | Incorrect scale.
15 | Inappropriate technology.
16 | Level of team/developer.
17 | Customer not certain that the functionality requested is "do-able".
18 | Lack of commitment of superior management.
19 | Lack of management experience.

2.3.2. The 19 common risk factors in agile software project

As mentioned in Section 2.3.1, Rai et al. [26] presented a list of 43 risk factors in Agile software projects. The 43 risk factors cover 6 categories in Agile software development: Development Environment Risks, Process Issue Risks, Staff Size and Experience, Technical Issue Risks, Technology Risks and Schedule Risks. In this section, only risk factors that affect iteration scheduling and planning are examined. Based on the author's experience and experts' opinions, these factors are mostly derived from development environment, process issue, staff size and experience, and schedule risks.

In addition, Section 2.2 also examines a simple model of schedule risks in general software project development with the five risk factors listed in Table 2.10.

Table 2.10. List of 5 risk factors for software project scheduling in Section 2.2

No. | Risk factors
1 | Poor management skills and experience.
2 | Pressure on the schedule.
3 | Frequent changes in customer requirements.
4 | Inappropriate process.
5 | Inappropriate technology.
In this section, the two lists of risk factors above are compared and combined with Agile software project features in mind (in a similar way to Section 2.3.1). Table 2.11 lists the 19 risk factors that directly or indirectly influence the possibility of iteration success, combined from the two studies above. These 19 risk factors are then modelled with BNs to examine their relationships. Each risk factor is represented by a node with a node probability table (NPT). In our research, each project team or project manager acted as an expert and provided the values in the NPTs (based on his/her previous experience of the project features and the team). To keep the risk factors as common as possible across software project development, the list of 19 risk factors in iteration scheduling (Table 2.11) differs from the list in Section 2.3.1 (Table 2.9) only in risk factor 9 (Staff doesn’t attend to daily meeting instead of Ineffective communication).

Table 2.11. List of 19 risk factors in iteration scheduling

No. | Risk factors
1 | Large-scale, offshore and distributed.
2 | Insufficient training.
3 | Excessive preparation/planning.
4 | Teams are not focused.
5 | Inappropriate process.
6 | The best people not available for self-organizing team.
7 | The skill level of people (team/developer).
8 | Staff is not committed for entire duration of the project.
9 | Staff doesn’t attend to daily meeting.
10 | Staff does not receive necessary training.
11 | Lack of tools and methods.
12 | Software tools are not used to support software planning and tracking activities.
13 | Configuration management software tools are not used to control and track change activity throughout the software process.
14 | Incorrect scale.
15 | Inappropriate technology.
16 | Level of team/developer.
17 | Customer not certain that the functionality requested is "do-able".
18 | Lack of commitment of superior management.
19 | Lack of management experience.

2.3.3. Conclusion and contribution

This section has proposed two lists of common risk factors, one for traditional software project scheduling and one for agile software project scheduling. In some real-life projects, software teams or practitioners may find other specific risk factors. However, the models and methods proposed in the rest of this thesis examine these two lists.

2.4. Chapter remarks

This chapter has developed an algorithm (BRI) and a tool (CKDY) to assess the impact of risk factors, and on that basis has proposed a set of risk factors in software project scheduling. The algorithm BRI is a step towards a BN model for analysing risks in software project scheduling, while the tool CKDY is built to assess the impacts of risks in software project scheduling using a probabilistic approach that includes BNs. Both indicate that Bayes' Theorem and BNs can be used for modelling risks as well as for quantitative analysis of risks in software projects. The development of BRI and CKDY also confirms that BNs always require experts' opinions or judgment together with mathematical calculation.

This chapter focuses on using BNs to model the relationships among risk factors and on using Bayes' Theorem to calculate the impacts of risks on the software project schedule. In doing so, the author found that it is also important to examine the common risk factors that affect software project scheduling in order to keep better track of software schedules. Common risk factors in software project scheduling are therefore proposed.

The chapter is thus a first effort towards a quantitative method to better assess and analyse risks in software project scheduling, both by identifying the risk attributes of software project scheduling and by finding probabilistic methods to manage those risks better.
Project managers now have scientific methods to keep track of risks in software project scheduling and no longer have to rely on their experience alone. The probabilistic approach could be further improved by applying BNs in scheduling techniques and by examining the relationships among risk factors. This is discussed in Chapter 3.

Chapter 3. Incorporation of Bayesian Networks into software project scheduling techniques

Chapter 2 presented initial attempts at applying BNs to risk management in software project scheduling, together with experiments on common risk factors and their impacts on software project scheduling. 19 common risk factors for both traditional software development projects and agile software projects were proposed. Chapter 3 presents the author's work on a probabilistic method to improve well-known software project scheduling techniques, for both traditional and agile software scheduling (the second objective mentioned in the Introduction section). This chapter focuses on incorporating Bayesian Networks into software project scheduling techniques to predict the chance of project schedule success. Section 3.1 incorporates BNs into agile software scheduling to enhance iteration scheduling. Sections 3.2 to 3.4 incorporate BNs into scheduling techniques and apply BNs to examine the 19 common risk factors in software scheduling (proposed in Section 2.3).

3.1. Applying Bayesian Networks into specific software project development

This section presents the work in publication 1 [PUB1]. In the software industry nowadays, Agile software development methods have been widely adopted. Agile software development methods can themselves be considered a means of reducing project risks to a certain degree.
However, optimizing software project scheduling has always been a big challenge in both practice and academia, since industrial software development is a highly complex and dynamic process. There is also a need for a probabilistic method that better models and predicts uncertainty in software projects. This section proposes an enhanced method and algorithm that combine optimized agile iteration scheduling with the ability of Bayesian Networks to predict and handle risks in resource-constrained contexts. Based on the method, a software tool was developed to support managers in controlling their project schedules. The tool also provides a reliable set of strategies for sequencing tasks in agile iteration scheduling.

3.1.1. Introduction

As introduced in Section 1.2, traditional software development methods are often characterized as predictive: they focus on envisioning and planning the future in full detail. A predictive development team announces exactly what features are planned for the entire duration of the development process. Agile methods, in contrast, are adaptive. An adaptive team would have difficulty describing what features are planned for the entire duration of the development process, but it focuses on adapting quickly to changing realities. When project changes occur, the team adapts to them as well [50, 84, 85].

Agile methods break deliverables into small iterations (this reduces the overall risk in realizing software features [56, 86]). Iterations are short time frames (time boxes) that typically last from one to four weeks. Each iteration is a full software development cycle that includes planning, requirements analysis, design, coding, unit testing, and acceptance testing, in which working software is demonstrated to users and/or customers. This minimizes overall risks and allows quick adaptation to changes [87, 88, 89, 90].
Adopting Agile practices and processes brings certain benefits to organizations, such as quicker return on investment, higher product quality, and better customer satisfaction [91]. However, Agile methods lack sound methodological support for planning (in contrast to the traditional plan-based approaches). VersionOne’s survey [92] identified 26 principal factors, the second of which was iteration planning. The survey also showed that three of the five most important concerns about adopting agile within companies (out of the 13 most commonly cited concerns) are 1) the loss of management control (36%), 2) the lack of upfront planning (33%) and 3) the lack of predictability (27%).

In addition, there are tools and techniques for project scheduling that project managers have used (Fox & Spence [93], Pollack-Johnson [94]). Szoke A. [27] presented a new approach for iteration scheduling in agile software projects. As mentioned in previous sections, Khodakarami et al. [20] also presented an improved approach to incorporating uncertainty in general project scheduling using BNs.

3.1.2. Optimized Agile iteration scheduling

Scheduling problems constitute an important part of combinatorial optimization problems. Scheduling concerns the allocation of limited resources to tasks over time. The goal is the optimization of one or more objectives in a decision-making process [27]. Software project scheduling has to deal with the fact that resources such as people, time, technology and money are not always predetermined. Moreover, there are always risks (uncertain events that cause bad impacts) in software projects.

Technical tasks are the main concept of iteration scheduling. These tasks are fundamental working units accomplished by developers. The aim of iteration scheduling is to break down selected requirements into technical tasks and to assign them to developers (each task usually requires some working hours of realization effort, estimated by the team).
In other words, iteration scheduling aims at determining a feasible fine-grained development plan that schedules the implementation of the selected features within an iteration [52]. The optimized (Agile) iteration scheduling problem can be derived by selecting the extreme-valued schedule from the potentially feasible alternatives. It can be considered an optimization problem in which resource allocation consists of assigning time intervals to the execution of the activities (realization tasks), taking into consideration temporal constraints (precedence between tasks), resource constraints (resource availability) and the objective of minimum execution time. Although Agile software development represents a major approach to software engineering, there is no well-established conceptual definition of, or sound methodological support for, Agile iteration scheduling.

3.1.3. Optimization model for Agile software iteration

Let $R$ be the set of resources, and let the following typical scheduling properties be defined on the technical tasks:

Effort: $w_j$ - a time estimate (in hours) associated with each task. It is obtained by simple expert estimation (e.g. 2, 4, or 8 working hours (Wh)).

Pre-assignment: $a_j$ - in some cases resources are pre-assigned before scheduling. The pre-assignment is used by the scheduling algorithm during resource allocation.

Let the vector $S = \{S_0, S_1, \ldots, S_{n+1}\}$ be the start times of the task realizations, where $S_j \ge 0$ for $j \in A$ and $S_0 = 0$. The vector $S$ is called a schedule of development. In this definition, the elements $0$ and $n+1$ are auxiliary elements that represent the beginning and the termination of the iteration, respectively. Dependencies between tasks $j$ and $j'$ can be defined as:

$$S_j - S_{j'} + d_{j'} \ge P_{j',j}, \quad j', j \in A \tag{3.1}$$

where $S_j$ is the start time of the realization of task $j$, $S_{j'}$ is the start time of task $j'$, $d_{j'}$ is the duration of task $j'$, $P_{j',j}$ represents the precedence between the tasks, and $A$ is the set of tasks that need to be done in the iteration.
Let $R_i \in \mathbb{N}$ be the capacity of resource $i$ assigned to the project in an Agile iteration. The effort estimation yields resource requirements $r_{j,i} \in \mathbb{Z}$ for each task $j$ and each resource $i$. Let $S$ be some schedule and let $t$ be some point in time. Then let $A^*(S, t) = \{ j \in A \mid S_j \le t \le S_j + w_j \}$ be the active set of tasks in progress at time $t$. The corresponding requirement for resource $i \in R$ at time $t$ is given by $r_i(S, t) = \sum_{j \in A^*(S,t)} r_{j,i}$. As a consequence, the resource constraints can be stated as follows:

$$r_i(S, t) \le R_i, \quad i \in R \tag{3.2}$$

Thus, the optimization problem for iteration scheduling can be formulated as follows:

$$\begin{aligned} \text{Minimize} \quad & z = S_{n+1} \\ \text{subject to} \quad & S_j - S_{j'} + d_{j'} \ge P_{j',j}, && j, j' \in A \\ & r_i(S, t) \le R_i, && i \in R \\ & S_{n+1} \le l_I \end{aligned} \tag{3.3}$$

where $l_I$ is the length of the iteration.

a) Solving the optimization problem for iteration scheduling

The vector $r$ indicates the available resources (developers) in the iteration. Each $w_j$ is the planned effort (duration) of technical task $j$, covering both development and defect correction. Each element $a_j$ contains a reference to a resource index ($a_j \in \{1, \ldots, |r|\}$), which indicates the resource pre-assigned to task $j$; $a_j = 0$ means that task $j$ is not pre-assigned, in which case the algorithm finds the best resource for its realization. Precedence between tasks is represented by a precedence matrix in which $P_{j,j'} = 1$ means that task $j$ precedes task $j'$, and otherwise $P_{j,j'} = 0$. The conditions $P_{j,j} = 0$ (no loop) and $P$ being a directed acyclic graph (DAG) together ensure that the temporal constraints are not trivially unsatisfiable. The iteration time-box is given by the variable $l_I$. It is used as an upper bound in resource allocation to prevent resource overloading. The result of the algorithm is a schedule matrix $S$ in which rows represent resources and columns give the order of task execution. Thus $S_{i,p} = j$ means that task $j$ is assigned to resource $i$ at position $p$.
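The resource constraint of Equation (3.2) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not part of the algorithm of Szoke A. [27]: the function names and the sample data are hypothetical, indices are 0-based, requirements are assumed to be non-negative, and a schedule is represented by the vector of start times $S_j$ (not by the schedule matrix). The active set uses the closed interval $S_j \le t \le S_j + w_j$ exactly as defined above, so a task that ends at time $t$ and a task that starts at time $t$ are both counted as active at $t$.

```python
from typing import List, Sequence


def active_tasks(start: Sequence[float], effort: Sequence[float], t: float) -> List[int]:
    """A*(S, t): indices j with S_j <= t <= S_j + w_j (closed interval, as in the text)."""
    return [j for j in range(len(start)) if start[j] <= t <= start[j] + effort[j]]


def resource_usage(start, effort, req, i: int, t: float) -> int:
    """r_i(S, t): total requirement for resource i over the tasks active at time t."""
    return sum(req[j][i] for j in active_tasks(start, effort, t))


def satisfies_resource_constraints(start, effort, req, capacity) -> bool:
    """Check r_i(S, t) <= R_i for every resource i (Equation 3.2).

    The active set only changes when a task starts or ends, and with
    non-negative requirements the usage peaks at one of those points,
    so it is enough to test the start and end times.
    """
    checkpoints = set(start) | {s + w for s, w in zip(start, effort)}
    return all(
        resource_usage(start, effort, req, i, t) <= capacity[i]
        for t in checkpoints
        for i in range(len(capacity))
    )


# Hypothetical data: three tasks and one resource type.
start = [0, 0, 4]        # S_j, start times
effort = [4, 3, 2]       # w_j, efforts in hours
req = [[1], [1], [1]]    # r_{j,i}, requirement of task j for resource i

print(satisfies_resource_constraints(start, effort, req, [2]))  # True: at most 2 units in use
print(satisfies_resource_constraints(start, effort, req, [1]))  # False: 2 units needed at t = 0
```

With a capacity of 2 the sample schedule is feasible, while a capacity of 1 is violated as soon as the first two tasks start together.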
The ensure section prescribes the post-condition on the return value ($S$): every task $j$ has to be assigned to exactly one resource $i$.

An algorithm for iteration scheduling was proposed by Szoke A. [27]. Based on this algorithm, an enhanced algorithm that incorporates BNs is discussed in the next section.

b) Incorporating the optimization problem with BNs

The inputs of the above algorithm are the resources, the planned duration of each task, the task precedence and the length of each iteration. It is assumed that, given the resources, each task will be done within the planned duration. However, there are always risks in real projects, such as those related to personnel or technology. These uncertainties can hardly be predicted, which leads to the need for a probabilistic model that can quantify the uncertain issues and address the most important concerns. BNs are regarded as a good probabilistic approach for modelling uncertainty in projects and for helping project managers make decisions [20].

c) Using BNs to enhance Agile iteration scheduling

The authors propose adding the following factors to the algorithm proposed by Szoke A. [27]:

a) A matrix that represents the relationship between each task duration and the assigned resources:

$$[B]_{n+2,|r|} = \{ b_{ij} \in [0,1] \mid i = 0, \ldots, n+1,\ j = 1, \ldots, |r| \} \tag{3.5}$$

i.e. $b_{ij}$ is the probability that task $i$ is done within the time $w_i$ when resource $j$ is allocated to it.

b) When a schedule is created, its probability of completion is examined. An array is introduced to represent the weights of the tasks in an Agile iteration:

$$[M]_{n+2} = \{ m_i \mid i = 0, \ldots, n+1 \} \tag{3.6}$$

Each task has an impact on the schedule of the iteration. Therefore, we can define an array $T$ of those impacts:

$$[T]_{n+2} = \{ t_i \mid i = 0, \ldots, n+1 \} \tag{3.7}$$

In this research, we propose the following formula for $t_i$. Let $D_i$ be the set of resources allocated to task $i$ in the iteration; then

$$t_i = \min\{ b_{ij} \mid j \in D_i \} \tag{3.8}$$

The probability that the schedule is completed successfully is:

$$p = \frac{\sum_{i=0}^{n+1} t_i \, m_i}{\sum_{i=0}^{n+1} m_i} \tag{3.9}$$

c) The algorithm with BNs

The algorithm for calculating T:

Input: S, B
for i = 0 to n + 1 do
    t_i = 1                                             /* initiate the default t */
    for j = 1 to |r| do
        if ∃p ∈ P′ : S_{i,p} ≠ 0 ∧ b_{i,j} < t_i then   /* if allocated */
            t_i = b_{i,j}                               /* update */
        endif
    endfor
endfor
return T

Thus, an enhanced version of the algorithm proposed by Szoke A. [27] that uses BNs, named Szoke-BN, is formulated as follows:

Input:
    r_i ∈ N, l_I ∈ N                                    /* resources and length of each iteration */
    a_j ∈ N : a_j ∈ {1..|r|}, w_j ∈ R                   /* pre-assignments and durations of tasks */
    P_{j,j′} ∈ {0,1} ∧ P_{j,j} = 0 ∧ P is DAG           /* precedences */
    [B]_{n+2,|r|} = {b_{i,j} ∈ [0,1], i = 0..n+1, j = 1..|r|}   /* matrix of completion probability */
    [M]_{n+2} = {m_i | i = 0..n+1}                      /* weights of tasks in iteration */
    [T]_{n+2} = {t_i | i = 0..n+1}                      /* impacts of tasks in iteration */
Ensure: S_{i,j} ∈ {0,1} ∧ ∀j ∃! i : S_{i,j} = 1

m ⇐ length(r), n ⇐ length(d)                            /* number of resources and tasks */
S ⇐ [0]_{m,n}                                           /* initial set of resources */
rlist ⇐ Ø, slist ⇐ Ø, P′ ⇐ Ø                            /* initiate ‘ready list’, ‘scheduled list’ and P′ list */
for j = 0 to n do
    pot ⇐ findNotPrecedentedTasks(P)                    /* find potential task */
    rlist ⇐ pot \ slist                                 /* create ready list */
    if rlist = Ø then
        return Ø                                        /* no schedulable task */
    endif
    j ⇐ max{a_j} : j ∈ rlist                            /* select a task */
    if a_j = 0 then
        i ⇐ selectMinLoadedResourceandMaxPro(S)         /* without assignment */
    else
        i ⇐ a_j                                         /* with assignment */
    endif
    l ⇐ sum(S_{i,{1..n}})                               /* calculate load of resource i */
    if (l + w_j) > l_I then                             /* overloaded iteration */
        return Ø
    endif
    p ⇐ findNextPos(S, i)                               /* the next task */
    P′ ⇐ P′ ∪ {p}                                       /* add the index of the next task into P′ */
    S_{i,p} ⇐ j                                         /* assigned task j with resource i at position p */
    slist ⇐ slist ∪ {j}                                 /* add task j into slist */
    P_{{1..n},j} = 0                                    /* delete precedence related to scheduled task */
endfor
T ⇐ computingfrom(S, B)                                 /* calculating matrix T from B */
x = computingProFrom(S, T, M)                           /* calculating x from S, T, M */
return S, x, P′

Where:

findNotPrecedentedTasks - finds the tasks without precedence constraints, based on the task precedence matrix P.

selectMinLoadedResourceandMaxPro(S) - selects the resource with the minimum load and the highest probability of success (the selection criteria are load and probability).

Output: the schedule S of the tasks, and the probability that S is completed successfully.

3.1.4. Tool and experimental results

a) Building tool

To elaborate the proposed model and algorithm, we built the tool BAIS (Bayesian Agile Iteration Scheduling) in the Java programming language. The tool allows users to enter the number of resources (developers), the number of tasks, the length of the iterations, the precedence of the tasks, and the pre-assignments and durations of the tasks. The tool also requires as input a table of the probabilities that each resource finishes its assigned task in time (Figure 3.1).
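As an illustration of Equations (3.8) and (3.9), the following minimal Python sketch computes the task impacts $t_i$ and the schedule success probability $p$. It is not the BAIS implementation (which is written in Java): the function names, the dictionary used to represent the sets $D_i$, the 0-based resource indices and the sample numbers are all hypothetical and chosen only for this example.

```python
from typing import Dict, List, Sequence


def task_impacts(allocation: Dict[int, List[int]],
                 b: Sequence[Sequence[float]]) -> List[float]:
    """Equation (3.8): t_i = min{ b[i][j] | j in D_i }.

    allocation maps a task index i to D_i, the resources allocated to it.
    A task without any allocated resource keeps the default impact 1.0,
    mirroring the initialisation step of the algorithm for T.
    """
    t = [1.0] * len(b)
    for i, resources in allocation.items():
        if resources:
            t[i] = min(b[i][j] for j in resources)
    return t


def success_probability(t: Sequence[float], m: Sequence[float]) -> float:
    """Equation (3.9): weighted mean of the impacts t_i with the weights m_i."""
    total_weight = sum(m)
    if total_weight == 0:
        raise ValueError("at least one task weight must be non-zero")
    return sum(ti * mi for ti, mi in zip(t, m)) / total_weight


# Hypothetical example: three tasks and two resources (0-based indices).
b = [[0.9, 0.8],     # b[i][j]: probability that task i finishes in time on resource j
     [0.7, 0.95],
     [0.6, 0.85]]
allocation = {0: [0], 1: [1], 2: [0, 1]}   # D_i for each task
m = [1, 2, 1]                              # task weights m_i

t = task_impacts(allocation, b)
p = success_probability(t, m)
print(t, round(p, 4))   # [0.9, 0.95, 0.6] 0.85
```

In this example the third task is shared by both resources, so its impact is the smaller of the two probabilities (0.6), and the second task counts twice as much as the others in the weighted mean.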
BAIS implements four strategies for selecting tasks in scheduling (scheduling rules):

- SPT (Shortest processing time first): sequences the tasks in increasing order of their processing time.
- LPT (Longest processing time first): sequences the tasks in decreasing order of their processing time.
- AF (Assigned First): sequences the tasks based on team pre-assignments.
- AF+LPT: the combination of AF and LPT.

Figure 3.1. Home GUI of tool BAIS

b) Experimental results and analysis

The authors use two data samples in two experiments.

The first sample is randomly generated. It describes an iteration with two resources and a length of 20 working hours (unbound). The iteration has 8 tasks and only one precedence constraint: the 5th task needs to be done before the 3rd one. In Table 3.1, T1 and T2 are the probabilities that resource 1 and resource 2 complete their tasks successfully (i.e., finish in time as scheduled) if the previous tasks were done in time, and D1 and D2 are their probabilities of finishing their tasks in time if the previous tasks were over scheduled.

Table 3.1. The first data sample

Task | Time | Pre-assignment | T1 | T2 | D1 | D2
1 | 4 | - | 0.92 | 0.83 | 0.38 | 0.34
2 | 3 | - | 0.74 | 0.72 | 0.27 | 0.22
3 | 5 | - | 0.93 | 0.85 | 0.22 | 0.34
4 | 4 | 2 | 0.94 | 0.84 | 0.43 | 0.36
5 | 2 | - | 0.83 | 0.73 | 0.26 | 0.42
6 | 5 | - | 0.82 | 0.96 | 0.17 | 0.23
7 | 4 | - | 0.73 | 0.94 | 0.32 | 0.52
8 | 3 | - | 0.68 | 0.73 | 0.27 | 0.13

The results of the four task selection strategies SPT, LPT, AF, and AF+LPT are as follows:

SPT: the makespan (the shortest time in which all resources finish the iteration) is 16 hours. Resource 1 is assigned tasks 5, 2, 7, 3 and Resource 2 is assigned tasks 8, 1, 4, 6, in that order. The probability of success is 68.83%.

LPT: the makespan is 17. Resource 1 is assigned tasks 6, 7, 8, 3 and Resource 2 is assigned tasks 1, 4, 2, 5, in that order. The probability of success is 60.42%.

AF: the makespan is 15. Resource 1 is assigned tasks 1, 2, 6, 8 and Resource 2 is assigned tasks 4, 5, 3, 7, in that order. The probability of success is 66.77%.

AF+LPT: the makespan is 17. Resource 1 is assigned tasks 6, 7, 8, 3 and Resource 2 is assigned tasks 4, 1, 2, 5, in that order. The probability of success is 60.37%.

According to the above results, if the team treats minimizing the makespan as the first optimization criterion, the AF strategy should be chosen. If the highest probability of success is wanted, the SPT strategy should be chosen. Figure 3.2 shows the Gantt chart produced by BAIS for the SPT strategy.

Figure 3.2. Gantt chart for SPT strategy

The second experiment was carried out with real-life project data from Company A. There are 3 developers who need to finish 15 tasks in the project. Each iteration lasts 40 hours. The authors asked the project manager and experts in the company to provide the pre-estimated probability of finishing the tasks based on each developer’s experience. The calculation in the tool BAIS gives the probability table shown in Table 3.2.

Table 3.2. The probability table for tasks and resources (probability per resource when the previous tasks were in time, and when the previous tasks were over scheduled)

Task | Time | In time, Re. 1 | In time, Re. 2 | In time, Re. 3 | Over scheduled, Re. 1 | Over scheduled, Re. 2 | Over scheduled, Re. 3
1 | 6 | 1.00 | 0.96 | 0.65 | 0.88 | 0.8 | 0.49
2 | 9 | 1.00 1.00 0.9 0.85 0.68 (only five values are given for this task; their column alignment is unclear)
3 | 5 | 1.00 | 1.00 | 0.87 | 0.7 | 0.81 | 0.65
4 | 6 | 1.00 | 1.00 | 0.83 | 0.88 | 0.74 | 0.75
5 | 9 | 1.00 | 1.00 | 0.89 | 0.9 | 0.77 | 0.66
6 | 6 | 1.00 | 1.00 | 0.96 | 0.98 | 0.9 | 0.82
7 | 2 | 1.00 | 1.00 | 0.8 | 0.85 | 0.75 | 0.75
8 | 6 | 1.00 | 1.00 | 0.77 | 0.95 | 0.75 | 0.68
9 | 6 | 1.00 | 1.00 | 0.75 | 0.96 | 0.9 | 0.75
10 | 4 | 1.00 | 1.00 | 0.89 | 0.85 | 0.72 | 0.67
11 | 8 | 1.00 | 1.00 | 0.89 | 0.8 | 0.9 | 0.67
12 | 7 | 1.00 | 0.94 | 0.75 | 0.9 | 0.74 | 0.68
13 | 4 | 1.00 | 1.00 | 0.82 | 0.84 | 0.75 | 0.62
14 | 3 | 1.00 | 0.86 | 0.75 | 1.00 | 0.84 | 0.67
15 | 4 | 1.00 | 1.00 | 0.76 | 0.83 | 0.9 | 0.67

An unattached value 0.96 follows the row of task 4 in the table and may be the value missing from the row of task 2.

The results for makespans and overall probability are:

SPT: the makespan is 30 and the probability of success is 97.98%;
LPT and AF+LPT: the makespan is 28 and the probability of success is 46.98%;
AF: the makespan is also 28 and the probability of success is 88.89%.

According to these results, the project manager should pay more attention to the task assignments and adjust the iteration times during the project. The results of both experiments show that the tool can support decisions in agile software development planning, tailoring the best plan to the specific project context and to users’ and/or customers’ feedback by altering constraints, capacities and priorities. In a single project, the manager can also use the tool to predict the schedule of the next phases or iterations and to better understand how the failure of a phase can impact the whole project.

3.1.5. Conclusion and contribution

This section has developed an algorithm for agile iteration scheduling that incorporates Bayesian Networks to support software teams in analysing schedules and predicting their chance of success. The method improves the quality of agile software development planning and lowers the level of risk by considering all major planning factors (e.g. dependencies, capacities) in a mathematical optimization model. The results of the experiments on the available data sets indicate that the approach can provide practical value as a decision support tool for agile iteration planning. To further affirm this, more representative real-life data sets are needed and case studies can be carried out.
This section can be considered as a step towards a conceptual model of agile iteration planning and scheduling. Since the research gives better insight into resource-constrained project scheduling problems, it may suggest a new optimization problem on agile iteration scheduling. The developed tool provides options for scheduling rules which enable us to compute an optimal active schedule for a single resource or for the overall project. An upgrade can be further developed which incorporates BNs for representing and analyzing causal models involving uncertainty. Such a version could even provide a set of tools for constructing probabilistic inference and decision support systems on BNs and could thus assist software project managers in making scheduling and planning decisions for all kinds of software projects.

3.2. Incorporation of Bayesian Networks into CPM

This section presents the work published in publication 5 [PUB5].

Although project managers nowadays can use a range of tools and techniques to develop, monitor and control project schedules, creating a project schedule is often very difficult since it means planning against uncertainty. Popular techniques for project scheduling are based on the assumption that projects are carried out as planned or scheduled, which hardly ever happens. This section takes advantage of BNs in modelling uncertainty and incorporates them into the Critical Path Method (CPM), one of the most popular means of monitoring project schedules. The section also examines common risk factors in project scheduling and proposes a model of 19 common risk factors. A tool was also built, and experiments were carried out to validate the model.

3.2.1. The RBCPM Model

In this section, the idea is to use BNs to perform the well-known CPM calculation. In other words, CPM is incorporated with BNs. As described in Section 1.2.2, the main components of CPM calculation are activities.
Since this research is on software projects, the term “task” is used for “activity”. Tasks are linked together to represent dependencies. In order to incorporate a CPM network into a BN, we first need to map a single task. Each of the activity parameters formulated in Section 1.2.2 is represented as a node in the BN. Figure 3.4 shows a schematic model of a partial BN associated with a task. The figure also shows the relationship between the parameters of a task as well as its connection with other tasks, based on the CPM algorithm and its incorporation with BNs.

Figure 3.3. A part of a BN for 19 risk factors

To form the overall CPM network (or the overall BN), in which a task is a node (and also a variable in the BN), we define the connection between dependent variables. Predecessor node i and successor node j are connected by:
- Connecting EF of i with ES of j;
- Connecting LS of j with LF of i.

In the directed CPM graph, each task is associated with the parameters D, LS, LF, ES, and EF. In our model, each task is also affected by a general risk which represents the set of 19 risks.

Another BN is formed for the 19 common risk factors and a general risk (Figure 3.3). As mentioned above, their relationship is also analysed based on literature review and project managers’ experience. Table 3.3 shows the relationship between the 19 risk factors and the general risk. Each risk factor is represented by a node which may have parent node(s) and/or children node(s). For example, risk factor (node) 3 has parent node 19 and children nodes 4, 8, 9.

Table 3.3. Risk factors analysis
No. | Risk factors | Parents | Children
1 | Large-scale, offshore and distributed. | |
2 | Insufficient training. | 1 | 7, 10
3 | Excessive preparation/planning. | 19 | 4, 8, 9
4 | Teams are not focused. | 3, 6, 9, 18 | 8
5 | Inappropriate process | 1, 6, 12 | 20
6 | The best people not available for self-organizing team. | 19 | 4, 5
7 | The skill level of people (team/developer) | 2, 11 | 20
8 | Staff is not committed for entire duration of the project. | 3, 12, 14 | 20
9 | Ineffective communication | 3, 12 | 4, 14
10 | Staff does not receive necessary training. | 2, 11 | 20
11 | Lack of tools and methods | 2, 19 | 7, 10, 12, 13
12 | Software tools are not used to support software planning and tracking activities. | 11 | 5, 8
13 | Configuration management software tools are not used to control and track change activity throughout the software process | 11 | 14, 17
14 | Incorrect scale | 9, 13, 16 | 8
15 | Inappropriate technology | 7, 19 | 20
16 | Level of team/developer | | 14, 17
17 | Customer not certain that the functionality requested is "do-able". | 12, 13, 16, 18 | 20
18 | Lack of commitment of superior management | | 4, 17
19 | Lack of management experience | 1 | 3, 15
20 | General Risk | 5, 7, 8, 10, 15, 17 |

In addition, each risk factor or node is associated with a node probability table (NPT). In our research, each project team or project manager acted as an expert to provide the values in the NPTs (based on his/her previous experience with the project features and the team).

Since risks might have an impact on each task, the estimated time for the task (ED, the estimated D) is no longer a single value but a range of values. Each task probability is calculated based on the NPTs of D and ES. From the NPTs of D and ES we obtain the NPT of EF as well as the NPT of ES of the successor task.

\[ ES + D \rightarrow EF \]

\[ EF(k) = \sum_{m} \sum_{n} D(m) \cdot ES(n), \quad \text{with } k = m + n. \qquad (3.10) \]

At the beginning (t = 0), P(ES) is initialized to 1 (100%). Let Pi(m) be the probability of finishing task i in m days. If m differs from ED - 1, ED and ED + 1, then Pi(m) = 0.

Figure 3.4. Task’s parameters and connection to other tasks.

3.2.2. The RBCPM Method

The CPM calculation can now be adapted to the RBCPM (Risk Bayesian CPM) procedure for software projects with the following steps:
Step 1. Specify the individual tasks using a work breakdown structure.
Step 2. Specify the common risk factors (in our research, there are 19 risk factors).
Step 3. Determine the sequence of those tasks and the dependencies between them.
Step 4. Determine the relationship between the common risk factors on a project basis (i.e., context based).
Step 5. Form the BN for each task (which models the task parameters D, LS, LF, ES, and EF).
Step 6. Form the CPM diagram in the form of a BN for all the tasks (which models the tasks and their dependencies).
Step 7. Calculate the general risk affecting each task and estimate the completion time (duration) for each task.
Step 8. Identify the critical path (the longest-duration path through the network).
Step 9. Update the CPM diagram as the project progresses.

3.2.3. Tool and experimental results

a) The tool RBCPM

The tool RBCPM (Risk Analysis based on Bayesian CPM) was built in the Java programming language to test the RBCPM model and the procedure described above as well as to support project managers. The tool has the following main functions:
- Model and calculate a project schedule (using the CPM algorithm described in Sections 1.2.2 and 3.2.1).
- Alert project managers about tasks that could run over schedule (and could therefore have an impact on the whole project).
- Calculate the possibility of the schedule (that is, the chance that the project will be finished on time).
- Calculate the possibility of each task (i.e., its probability of finishing on time).
- Model the network of risks (as a BN visualization) that have an impact on each task.

Figure 3.5 is a screenshot of the tool which shows task information and the possibility of the task finishing on time.

Figure 3.5. A screenshot of RBCPM

b) Experimental results and analysis

Data samples used in the experiments can be categorized as:
- Data samples used for the CPM algorithm: sets of tasks with start time, duration and precedence constraints.
- Data samples used for BNs: NPTs of nodes, which are often provided by the project manager or some project expert.

Data samples for CPM algorithm: two samples from real-life projects as shown in Table 3.4 and Table 3.5.
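A CPM data sample of this kind (tasks, durations, precedence constraints) can be passed through the deterministic forward pass that underlies Step 8 of the RBCPM procedure. The following is only a minimal sketch, restricted to tasks A, B, C, D and F of Table 3.4; the function name `forward_pass` and the dictionary layout are illustrative assumptions and do not describe the RBCPM tool's actual implementation, and no risk or probability is modelled here.

```python
from collections import deque


def forward_pass(durations, predecessors):
    """Return earliest start (ES) and earliest finish (EF) for each task.

    Hypothetical helper for illustration; not taken from the RBCPM tool.
    """
    # Count unfinished predecessors and build successor lists.
    remaining = {task: len(predecessors.get(task, [])) for task in durations}
    successors = {task: [] for task in durations}
    for task, preds in predecessors.items():
        for pred in preds:
            successors[pred].append(task)

    es, ef = {}, {}
    ready = deque(task for task, count in remaining.items() if count == 0)
    while ready:
        task = ready.popleft()
        # A task can start once all of its predecessors have finished.
        es[task] = max((ef[p] for p in predecessors.get(task, [])), default=0)
        ef[task] = es[task] + durations[task]
        for succ in successors[task]:
            remaining[succ] -= 1
            if remaining[succ] == 0:
                ready.append(succ)
    return es, ef


# Durations (days) and predecessors of tasks A, B, C, D and F from Table 3.4.
durations = {"A": 5, "B": 7, "C": 4, "D": 5, "F": 7}
predecessors = {"B": ["A"], "C": ["A"], "D": ["B"], "F": ["C", "D"]}

es, ef = forward_pass(durations, predecessors)
print(es["F"], ef["F"])  # 17 24
```

Task F starts only after both C and D are finished, so its earliest start is the larger of their earliest finish times. In the RBCPM model these single values are replaced by probability distributions.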
In the first data sample, there were 13 tasks and the project was planned to be finished in 80 days. The initial time allocations and task precedences are shown in Table 3.4. Similarly, the second project (data sample 2) had 24 tasks and was planned to be finished in 112 days.

Table 3.4. Data sample 1
No. | Task | Duration | Predecessor
1 | A | 5 | -
2 | B | 7 | A
3 | C | 4 | A
4 | D | 5 | B
5 | E | 10 | B
6 | F | 7 | C, D
7 | G | 6 | F
8 | H | 6 | E, G
9 | I | 3 | H
10 | K | 7 | F
11 | L | 8 | I
12 | M | 5 | K
13 | N | 9 | L, M

Data samples for BNs: Nodes in the BN are risks (or uncertainties) for each task of the project. The authors asked project managers and some other key people of the projects to judge the NPTs and the relationships of risks based on their experience in their projects. They provided as initial input a particular set of values for each node's parent variables, and the NPT of the variable represented by the node. There were 7 sets of NPTs provided by the project managers for the authors’ BNs of 19 common risks. Each task is impacted by one of these 7 sets of BNs.

For the first experiment: the probability of finishing on time (80 days as planned) is 67.56% (Figure 3.6). In fact, the project was completed in 95 days, for which the tool calculates a probability of 88.34%. Figure 3.6 and Figure 3.7 are graphical illustrations from the tool RBCPM that help users see the probability of finishing the software project at a certain point in time.

Table 3.5. Data sample 2
No. | Task | Duration | Predecessor
1 | A | 3 | -
2 | B | 4 | A
3 | C | 4 | B
4 | D | 6 | A
5 | E | 6 | C, D
6 | F | 7 | E
7 | G | 10 | E
8 | H | 3 | E
9 | I | 3 | H
10 | J | 4 | I
11 | K | 12 | F
12 | L | 3 | G, J
13 | M | 4 | K
14 | N | 2 | K, L
15 | O | 3 | M
16 | P | 10 | O
17 | Q | 4 | P
18 | R | 6 | P
19 | S | 18 | N
20 | T | 7 | S, R
21 | U | 6 | T
22 | V | 15 | S, R
23 | X | 3 | U, V
24 | Y | 3 | X, Q

For the second experiment: the probability of finishing on time (112 days as planned) is 69.11% (Figure 3.7). In fact, the project was completed in 132 days, for which the tool calculates a probability of over 90%.

Figure 3.6. A result for experiment with data sample 1

The results show the reliability of the model and the tool, since the calculation is consistent with the situation of the real-life projects. However, the reliability of the proposed model depends on the BN, i.e. the 19 common risk factors, their relations and their NPTs. Therefore, the feedback from experts and from project managers is crucial to the results. The same is true of any other system based on BNs.

Results from the RBCPM algorithm also confirm that tasks on the critical path have an important impact on the overall project. Therefore, project managers need to take care of these tasks.

Figure 3.7. A result for experiment with data sample 2

3.2.4. Conclusion and contribution

The research has improved CPM-based scheduling by incorporating BNs, helping software teams to analyse their schedules and to predict their chance of success in a way that fully characterizes uncertainty. The approach makes it possible to capture different sources of risk and use them to analyse software project scheduling. It also expresses the uncertainty about the duration of each task and of the whole project as a full probability distribution. The method provides a good way to deal with uncertainty that cannot be handled in traditional ways, such as the uncertainty caused by the co-relationship between activities and risk factors. This also improves the quality of software development scheduling and lowers the level of risk by considering all major planning factors (e.g. dependencies, capacities) in a mathematical model. Therefore, this is an effective approach for software project scheduling.

The authors will carry out further research on incorporating BNs into other scheduling algorithms (e.g. PERT). Other extensions could incorporate additional uncertainty sources into the model, or further handle common causal risks (which affect more than one task).
The results of experiments on the available data sets indicate that the approach can provide practical value as a decision support tool for software scheduling and planning. To further affirm this, more representative real-life data sets are needed, and some case studies can be carried out.

3.3. Incorporation of Bayesian Networks into PERT

This section presents the work published in publication 6 [PUB6].

This section takes advantage of BNs (including the related mathematical calculations) in modelling and assessing uncertainty and incorporates them into software project scheduling with the Program Evaluation and Review Technique (PERT) in cases of a high level of uncertainty. Common risk factors in project scheduling are also examined, and the model of 19 common risk factors and their causal relationships proposed in Section 2.3.1 is confirmed. The research also borrows categories and levels of risks from construction projects and applies them to software projects. A tool was also built by the authors to experiment with and validate the proposed model.

3.3.1. Proposed model

Since PERT is similar to CPM, this research proposes a model identical to the model proposed in Section 3.2 with two main differences: (i) the PERT scheduling technique is used instead of CPM, and (ii) risk factors are analysed in more depth using risk categories and levels adapted from construction projects.

a) Common risk factors

In this section, the list of 19 common risk factors (shown in Table 2.9) that directly or indirectly influence the possibility of schedule success is examined in the same way as in Section 3.2. In order to model the risk factors using BNs, each risk factor is represented by a network node (which is also an event that happens with a certain probability). Each node may have parent node(s) and/or children node(s). For example, risk factor (node) 3 has parent node 19 and children nodes 4, 8, 9.
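The parent/child structure in this example can be held as a simple mapping from each node to its parents, from which the children follow. The fragment below is a minimal sketch containing only the edges around node 3 that the text states; the names `parents` and `children_of` are illustrative assumptions, and the NPTs attached to the nodes are omitted.

```python
# Fragment of the risk network: only the edges around node 3 stated in the text.
# Keys are risk-factor numbers, values are the parents of that node.
parents = {
    3: [19],  # node 3 (excessive preparation/planning) has parent node 19
    4: [3],   # nodes 4, 8 and 9 each have node 3 among their parents
    8: [3],
    9: [3],
}


def children_of(node, parents):
    """Derive the children of a node from the parent lists (hypothetical helper)."""
    return sorted(child for child, ps in parents.items() if node in ps)


print(parents[3])               # [19]
print(children_of(3, parents))  # [4, 8, 9]
```

Storing only the parent lists keeps the structure aligned with how NPTs are defined, since each NPT is conditioned on the values of a node's parents.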
A general (total) risk factor is used to represent the impact of the common risk factors on an activity at a point in time. Table 3.3 analyses each risk factor (network node) and the relationship between the 20 nodes (19 risk factor nodes and one general risk node). In addition, each risk factor or node is associated with a node probability table (NPT). In our research, each project team or project manager acted as an expert to provide the values in the NPTs (based on his/her previous experience with the project features and the team).

b) Proposed Risk Bayesian PERT method

To form the overall PERT network (or the overall BN), in which an activity is a node (and also a variable in the BN), we define the connection between dependent variables. Predecessor node i and successor node j are connected by:
- Connecting tEF of i with tES of j;
- Connecting tLS of j with tLF of i.

In the Bayesian PERT network, each activity is associated with the parameters t (total duration), tLS, tLF, tES, and tEF as shown in Figure 3.8.

Figure 3.8. Bayesian Network for each activity

In the authors’ model, each activity is also affected by a general (total) risk which represents the set of 19 risks (Figure 3.9).

The above Bayesian PERT network represents the time nodes of each activity as well as the relationship between activity nodes. However, as mentioned in Section 3.1, risk factors always have an impact on each activity. Therefore, the impact of risk factors needs to be brought into the model. The Bayesian PERT model shows the relationship between t(i) and tEF of each activity, and between the tEF of the predecessor activity and the tES of the next activity, through the Bayesian Network. Thus, if the total duration node t(i) is affected by the risks, the risks will indirectly also affect the tES and tEF nodes of the activity. Thanks to this, the amount of calculation can be greatly reduced while still accounting for the impact of risks on all time nodes.
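This propagation can be sketched with discrete distributions, using the same additive rule as Equation (3.10) in Section 3.2.1: the distribution of tEF is the convolution of the distributions of tES and of the (risk-affected) total duration, and it is then passed on as the tES of the successor. The sketch below is illustrative only. It assumes independent durations, a single predecessor per activity, and made-up probabilities (0.2, 0.6, 0.2) on the three values ED - 1, ED and ED + 1; none of these numbers or function names come from the tools described here.

```python
def three_point(ed, p_early=0.2, p_on=0.6, p_late=0.2):
    """Duration distribution on ED - 1, ED, ED + 1 (probabilities are assumed)."""
    return {ed - 1: p_early, ed: p_on, ed + 1: p_late}


def add_distributions(start, duration):
    """Distribution of start + duration, assuming the two are independent."""
    finish = {}
    for n, p_start in start.items():
        for m, p_dur in duration.items():
            # k = m + n, and the probabilities of all pairs (m, n) add up.
            finish[n + m] = finish.get(n + m, 0.0) + p_start * p_dur
    return finish


# Activity i starts at t = 0 with probability 1 and has estimated duration 5.
es_i = {0: 1.0}
ef_i = add_distributions(es_i, three_point(5))

# Activity j has i as its only predecessor, so tES(j) takes over tEF(i).
es_j = ef_i
ef_j = add_distributions(es_j, three_point(7))

# Probability that j is finished by day 12 (the sum of the two estimates).
p_on_time = sum(p for day, p in ef_j.items() if day <= 12)
print(sorted(ef_j.items()))
print(round(p_on_time, 2))  # 0.72
```

Because only the duration node carries the risk, the effect on tES and tEF of every later activity follows from repeated application of this one operation. An activity with several predecessors would additionally need the distribution of the maximum of their finish times, which this sketch does not cover.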
Another BN is formed to model the 19 common risk factors and the general risk as described in Section 3.1. This BN captures the impact of all risks on each activity node and makes it easy and simple to integrate the impact of risk into the activity node. Details of the model are shown in Figure 3.9.

It can be seen from Figure 3.9 that the "total duration" node represents the execution time of the activity node after being affected by the risk. The risk model is represented by a "total risk" node that stands for the whole risk model. The estimated duration node t(i) is the estimated execution time of the activity in the scheduling process. The above model is the Risk Bayesian PERT (RBPERT) Model.

Figure 3.9. Risk integration network model into PERT scheduling

c) The improved RBPERT Model

Chang et al. [22] categorized construction project risks into 2 categories and 7 levels. Risks are divided into 2 categories:
- Risks due to the physical environment.
- Resource risks, which include 5 subcategories: people (the availability of different skilled or unskilled laborers to perform an activity), machine (the availability of required equipment to perform an activity), materials (the availability of required materials to complete an activity), methods (the availability of appropriate methods to perform an activity), and money (the availability of required financial arrangements to conduct an activity).

Adapted to software development projects, in this research the machine category is considered as technological risks, and the materials category is considered as support tools.
Considering all the above risk factors, seven risk levels are classified for estimating the duration of an activity:
Level 0: No risks
Level 1: There is a risk of physical environment
Level 2: There is a risk of physical environment + 1 of 5 resource risks
Level 3: There is a risk of physical environment + 2 of 5 resource risks
Level 4: There is a risk of physical environment + 3 of 5 resource risks
Level 5: There is a risk of physical environment + 4 of 5 resource risks
Level 6: There is a risk of physical environment + all 5 resource risks

The improved process of scheduling and risk management consists of the following 8 steps, which are also shown in Figure 3.10.

Figure 3.10. Process in improved RBPERT Model

Step 1: Project creation (initialize the basic information of the project).
Step 2: Activity definition (determine the activities that need to be done in the project and estimate the activity durations).
Step 3: Relationship connection (draw relationships between activities using BNs).
Step 4: Risk item breakdown (analyse possible risk factors in the project and estimate their impacts on the activities).
Step 5: Risk allocation (allocate risk factors to the activities).
Step 6: Information check (check the accuracy of the relationships between activities and of the risk factor information).
Step 7: Duration calculation (calculate the total duration of each activity node by applying the 7-risk-level method).
Step 8: PERT calculation (calculate the overall project duration and critical path for the 7 risk levels, with the PERT technique).

3.3.2. Tool development and data collection

a) The tool RBPERT

The tool RBPERT (see footnote 10) was built in the Java programming language and includes the following main functions:
- Calculates the start and end time of each task (activity) with the PERT scheduling technique and calculation.
- Calculates and provides a chart of the cumulative probability distribution of the project completion time.
- Provides the RBPERT model of the project.
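The PERT calculation in Step 8 and in the first tool function starts from three time estimates per activity (optimistic, most likely, pessimistic). The following minimal sketch shows the classical PERT expected-time and standard-deviation formulas. Whether the RBPERT tool uses exactly these formulas is not stated in this section, so this is an assumption; the function name `pert_estimate` and the input values are illustrative.

```python
def pert_estimate(optimistic, most_likely, pessimistic):
    """Classical PERT three-point estimate (textbook formulas, assumed here)."""
    # Weighted mean: the most likely value counts four times.
    expected = (optimistic + 4 * most_likely + pessimistic) / 6
    # Standard deviation: one sixth of the optimistic-to-pessimistic range.
    std_dev = (pessimistic - optimistic) / 6
    return expected, std_dev


# Illustrative activity with estimates of 2, 3 and 5 days.
expected, std_dev = pert_estimate(2, 3, 5)
print(round(expected, 2), std_dev)  # 3.17 0.5
```

In the RBPERT model the resulting estimated duration is further adjusted by the total risk node before the schedule is calculated, which this sketch does not attempt to reproduce.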
Figure 3.11 shows the input screen of the tool, which simply allows the user to choose an input file and yields the PERT schedule as well as the PERT Bayesian Network associated with the schedule.

Figure 3.11. The input screen of the RBPERT tool

The input file of the tool is an XLS file which contains the list of the tasks in the software project (Figure 3.12). The task attributes are the id, the PERT time estimates (optimistic, most likely, pessimistic), the ids of the predecessor tasks, and the name of the task (optional).

Footnote 10: RBPERT (2019). The RBPERT source code and sample data. Available at https://github.com/tuanmasu/RBPERT/

Figure 3.12. The input file type of the RBPERT tool

The tool processes the input data according to the following flow:
- Firstly, read the data from the input file to create the task sequence and to build the predecessor relationships between tasks.
- Calculate the Duration node and Total Duration node of each task.
- After initializing the tasks, calculate the parameters of each task: the earliest start time tES, the earliest finish time tEF, the latest start time tLS, and the latest finish time tLF.
- Initialize the BNs of each task and between tasks as described in Section 3.2.
- Calculate the cumulative probability of the earliest start time tES and the earliest finish time tEF of each task.

After the calculation, the tool provides the chart of the cumulative probability of the project completion time and the RBPERT network of the project (Figure 3.13).

b) Data collection

The authors collected data from 3 real-life projects of a big software company in Vietnam. The sample data were published on GitHub together with the source code of the RBPERT tool.

The first software project had 10 tasks (Table 3.6) and was planned to finish in 79 days. In fact, it was completed in 88 days.

Table 3.6. Task attributes of the first data sample
No | id | Optimistic | Most likely | Pessimistic | Predecessor
1 | A | 2 | 3 | 5 |
2 | B | 24 | 30 | 35 | A
3 | C | 15 | 21 | 27 | B
4 | D | 1.5 | 2 | 2 | B, C
5 | E | 1 | 2 | 2.5 | D
6 | F | 2 | 3 | 4 | E
7 | G | 1.5 | 2 | 3 | A, F
8 | H | 5 | 7 | 10 | G
9 | I | 1.5 | 2 | 5 | H
10 | J | 5 | 7 | 10 | I, J

The second software project consisted of 9 tasks (Table 3.7) and was expected to finish in 95 days. In reality, the project was completed in 104 days.

Table 3.7. Task attributes of the second data sample
No | id | Optimistic | Most likely | Pessimistic | Predecessor
1 | A | 2 | 3 | 5 |
2 | B | 10 | 14 | 18 | A
3 | C | 16 | 21 | 25 | B, C
4 | D | 4 | 5 | 7 | C
5 | E | 4 | 5 | 7 | D
6 | F | 25 | 30 | 34 | E
7 | G | 4 | 5 | 7 | D, E, F
8 | H | 5 | 7 | 10 | G
9 | I | 4 | 5 | 7 | H

The third data sample is from a project of 15 tasks (Table 3.8), which lasted about 900 days because the last two tasks took one year each, as they concerned the assessment and confirmation of the system security and the supervision of the whole project. The project was planned to be completed in 892 days but in fact lasted 907 days.

Figure 3.13. A result for the network provided by the RBPERT tool for the first data sample

Table 3.8. Task attributes of the third data sample
No | id | Optimistic | Most likely | Pessimistic | Predecessor
1 | A | 5 | 7 | 10 |
2 | B | 5 | 7 | 10 | A
3 | C | 5 | 7 | 10 | B, C
4 | D | 2 | 3 | 4 | A, B, C
5 | E | 5 | 7 | 10 | D
6 | F | 5 | 7 | 10 | E
7 | G | 2.5 | 3 | 5 | F
8 | H | 4 | 5 | 7 | G
9 | I | 10 | 14 | 15 | H
10 | J | 50 | 60 | 70 | I
11 | K | 6 | 7 | 9 | J
12 | L | 12 | 14 | 18 | K
13 | M | 16 | 21 | 25 | L
14 | N | 360 | 365 | 370 | M
15 | P | 360 | 365 | 370 | N

3.3.3. Experimental results and analysis

Results from the experiment with the RBPERT tool:
- For the first data sample: the probability for the project to finish on time (79 days) is 69%. The probability for the project to finish in 88 days (as it happened) is 86%.
- For the second data sample: the probability for the project to finish on time (95 days) is 68%. The probability for the project to finish in 104 days (as was the case in reality) is 89%.
- For the third data sample: the probability for the project to finish on time (892 days) is 71%. The probability for the project to finish in 907 days is 81%.

The tool shows quantitatively that the chance of the projects succeeding within the planned time is not very high. Therefore, the projects needed more time, and indeed they lasted longer than planned. However, the probabilities of the projects finishing within their actual durations are not 100% either (they are 86%, 89% and 81% in the experiments with the real-life data samples). This difference can be explained by the limitations of the PERT technique, which the tool also reflects. Firstly, PERT requires subjective time estimates of the tasks, so the accuracy of PERT relies on these subjective estimates; i.e., the schedule will be affected if the people who provide the estimates are not focused, lack experience or are biased. Secondly, in our research it is assumed that the critical path remains the critical path throughout the whole project, which is not always guaranteed when the PERT technique is used in real life. Besides, like CPM, the PERT technique implemented in this research also assumes that all the resources are available during the whole project (which might not be the case in a real-life project).

Figure 3.14. A result for RBPERT network provided by the tool for the first data sample

Figure 3.15. A result for experiment with the third data sample (distribution of Total Duration of activity J)

Another explanation for the differences between the results from the tool and the real-life projects is that in real life, project teams can work overtime (so that the real total duration can be longer than reported) in order to finish the project faster.

3.3.4. Conclusion and contribution

The section proposed a probabilistic approach for handling a high level of uncertainty/risk in software projects by applying BNs to both the scheduling technique (PERT) and the common risks in software scheduling.
The proposed approach also enriches the benefits of PERT in handling uncertainty by further incorporating risk factors together with the powerful analytical capability of BNs. This approach provides quantitative calculation and analysis of the factors that influence software schedules and their success, so that software teams can better analyse PERT-based schedules and predict their chance of success.

To better control project schedules, this approach also helps examine the factors that influence the software schedule, as well as the levels of risk in the project, so that project managers can capture different sources of risk and use them to analyse software project scheduling. The classification of risk factors into groups and levels helps to better assess the impact of each risk factor on the success of the project as well as its impact within each group and level.

To further confirm this approach and to refine the risk categorization and levels adapted from construction projects, more literature study and more representative real-life data sets are needed, and more case studies could be carried out. The risk categories could also be applied in a similar model that incorporates CPM and BNs (as an extension of the results of Section 3.2). Other extensions could incorporate additional uncertainty sources into the model and into the risk categorization and levels, or further handle common causal risks (which affect more than one task). All of these amount to considering more of the specific attributes of risks related to software scheduling.

3.4. Incorporation of Bayesian Networks into Agile software development scheduling

This section presents the work published in publication 4 [PUB4].

This section proposes Bayesian Networks to model risk factors in Agile software projects and to manage risks in Agile iteration scheduling. The section also addresses 19 common risk factors that affect iteration scheduling.
Based on the method, a software tool was developed to support managers in controlling their project schedules, as it can assess the possibility of each schedule.

3.4.1. Incorporation of risk model

In this section, the list of 19 common risk factors (shown in Table 2.11) that directly or indirectly influence the possibility of schedule success is examined in the same way as in Section 3.2.1. Each risk factor is represented by a node with an NPT. Each project team or project manager acted as an expert to provide the values in the NPTs (based on his/her previous experience with the project features and the team). Moreover, the relationships among the risk factors are also analysed based on literature review and project managers’ experience. Each task in an iteration at a certain point in time might be impacted by a general risk which represents the 19 risk factors.

3.4.2. Tool and experimental results

a) Building tool

To elaborate the proposed model and algorithm, we built the tool BAIS (Bayesian Agile Iteration Scheduling) using the Java programming language. The tool allows users to enter the number of resources (developers), the number of tasks, the length of iterations, and the tasks’ precedences, pre-assignments and durations. The tool also requires as input a table of the probability that each resource finishes its assigned task in time.

Figure 3.16. A screenshot of tool BAIS

The tool has the following main functions:
- Providing an iteration schedule based on the input data.
- Providing the possibilities for each task, each resource and the whole iteration.
- Displaying BNs for resources. There are also options for adjusting the nodes’ NPTs so that project managers can examine the projects in different scenarios.

Figure 3.16 shows a screenshot of the tool in which the pop-up message shows an example of the possibility (69.95%) of finishing a schedule.
b) Experimental results and analysis

The authors use two real data samples in two experiments:

a) The first sample is an e-commerce project using the Scrum method. A sprint (an iteration) is given with 7 resources and a length of 460 working hours. It has 43 tasks and five precedence constraints: P[1] = 3, P[2] = 3, P[3] = 4, P[5] = 6, P[6] = 7, where P[i] = j means that task j has to be done before task i.

Table 3.9. The result for the first data sample
R1 | T4: 85% | T8: 76% | T13: 70% | T17: 67% | T22: 65% | T27: 64% | T30: 63% | T40: 62%
R2 | T3: 85% | T9: 76% | T14: 70% | T31: 66% | T35: 65% | T41: 63%
R3 | T1: 81% | T18: 69% | T23: 62% | T28: 57% | T34: 55% | T37: 53% | T42: 52%
R4 | T2: 85% | T10: 76% | T29: 70%
R5 | T7: 81% | T11: 70% | T15: 63% | T19: 59% | T24: 56% | T38: 55%
R6 | T6: 81% | T20: 70% | T25: 63% | T32: 59% | T36: 56% | T39: 55% | T43: 54%
R7 | T5: 84% | T12: 75% | T16: 70% | T21: 66% | T26: 64% | T33: 63%

Table 3.9 shows the result for the first data sample. Each resource is listed with its assigned tasks and their completion possibilities. The overall completion possibility is 59.85%. In this case, if the sprint is adjusted to 450 hours, the overall completion possibility is calculated to be 60%. These numbers are reliable compared to the real situation of the project.

b) The second experiment was carried out with data from a more complex real-life software project in education. There are 17 developers who need to finish 85 tasks in the 9 Scrum sprints of the project. There are 54 tasks in the first sprint (10 working days) and 35 tasks in the last sprint (10 working days). The calculation from the tool BAIS gives the following (also shown in Figure 3.17):
- The overall probability for the first sprint is 67.12%. In fact, only 35 tasks were done as scheduled, which corresponds to 64.81% (35/54). There is a difference of about 2.3 percentage points between the tool calculation and the real situation.
- The overall probability for the last sprint is 73.31%. In fact, 9 tasks in the sprint ran over schedule, so 26 of the 35 tasks (74.29%) were done as scheduled, which is almost the same as calculated by the tool (a difference of 0.98 percentage points).

The results of the second case can be explained by the complexity of the project. There are many tasks together with many constraints (precedences), which leads to the possibility of over-scheduled sprints.

Figure 3.17. The result of the second experiment

The results of both experiments show that the tool can support decisions in Agile software development planning, tailoring the best plan to the specific project context and to users’ and/or customers’ feedback by altering constraints, capacities and priorities. In a single project, the manager can also use the tool to predict the schedule of the next phases or iterations and to better understand how the failure of a phase can impact the whole project.

3.4.3. Conclusion and contribution

The section has developed an algorithm for Agile iteration scheduling that incorporates BNs to help software teams analyse their schedules and predict their chance of success. The method improves the quality of Agile software development planning and lowers the level of risk by considering all major planning factors (e.g. dependencies, capacities) in a mathematical optimization model.

The results of experiments on the available data sets indicate that the approach can provide practical value as a decision support tool for Agile iteration planning. To further affirm this, more representative real-life data sets are needed, and some case studies can be carried out. The proposed 19 common risk factors in Agile iteration scheduling can also be further examined and refined by means of literature review and case studies.

Despite its limitations, this section can be considered as a step towards a conceptual model of Agile iteration planning and scheduling.
Since the research gives better insight into resource-constrained project scheduling problems, it may suggest a new optimization problem in Agile iteration scheduling. An upgraded version of the tool BAIS can be developed that incorporates BNs for representing and analyzing causal models involving uncertainty. Such a version could even provide a set of tools for constructing probabilistic inference and decision support systems on BNs, and thus assist software project managers in making scheduling and planning decisions for all kinds of software projects.

3.5. Chapter remarks

This chapter aims at finding a better model for handling uncertainty and risks in software project scheduling by considering both scheduling techniques and common risks in software scheduling. It proposes an improved scheduling method based on the integration of BNs and risk factors into the following software project scheduling techniques: PERT, CPM, and Agile software development scheduling. The experimental results confirm that the proposed model for managing common risks in software project scheduling, built on 19 common risk factors, works well with both CPM and PERT in cases of a high level of uncertainty. With these results, the thesis achieves the second objective (mentioned in the Introduction section).

Conclusion

What has been done

The research has done the following tasks:
- Carrying out a literature review on project scheduling techniques, software project scheduling techniques, and risk factors in software project scheduling.
- Proposing scientific models for risk management in software project scheduling that examine specific features of software project scheduling, such as common risk factors and their impacts on software project schedules, and that support software project managers in keeping track of their projects’ schedules.
  The models are based on a probabilistic approach using Bayesian Networks and are applied to both traditional waterfall and agile software development.
- Validating the proposed models by building tools and carrying out experiments with data from the real world.

The two objectives of the thesis (mentioned in the Introduction section) have been achieved. The results have been published in 6 conference and journal papers (see the List of scientific publications section for more details).

Main contributions

The research has developed the algorithm BRI (Bayes Risk-Impact) and the tool CKDY to assess the impacts of risks, and on that basis proposes common risk factors in software project scheduling. Based on the literature review and experiments, the research has come up with 19 common risk factors in software project scheduling (for both agile and traditional development styles).

The research also proposes advanced scheduling methods in software project development. The methods are based on incorporating Bayesian Networks and the common risk factor models into popular software scheduling techniques such as PERT, CPM, and Agile software development.

Tools have been built to experiment with the proposed scheduling methods and models. Experimental results show that the proposed methods and models are reliable and provide practical value to software development teams in analyzing, monitoring and predicting risks and the chance of success of the project.

Limitations

Since the thesis aims at solving different pieces of the puzzle of risk management in software project scheduling (in terms of the software development approach and the set of risk factors), a different data set is provided for each piece. As a consequence, there is no consistent data set for all the work presented in the two main chapters (Chapter 2 and Chapter 3) of the thesis.
Moreover, although the author tried to get real software project data from well-known software companies in Vietnam, those companies have not yet applied his approaches to real ongoing software projects. All the experiments have been done with finished projects and with the judgements of the projects’ people (especially the project managers).

The definition of empirical evaluation criteria is another limitation of the thesis, since there is not enough data and information from real software projects to act as the basis for evaluating according to such criteria. The evaluation of the experimental results in the thesis is currently based on the information provided by the project teams, who also act as experts in judging the validity of the experimental results.

In addition, although the research tries to find optimization algorithms for software project scheduling, it has not yet proposed a brand new algorithm. The research has only improved existing methods and algorithms using probabilistic approaches.

Further research

The results of experiments on the available data sets indicate that the approach proposed in the research can provide practical value as a decision support tool for software scheduling and planning. To further affirm this, more representative real-life data sets are needed, and some case studies can be carried out in real ongoing software projects. The author intends to build consistent data sets for both traditional and agile software project development, and these data sets could also be contributed to the research community.

Further research could incorporate additional uncertainty sources into the model, or further handle common causal risks (which affect more than one task). The list of 19 common risk factors in software scheduling could be further refined by case studies or surveys.
The research will also go further in seeking a software scheduling optimization algorithm using Bayesian Networks.

List of scientific publications

PUB1: Nguyễn Ngọc Tuấn, Huỳnh Quyết Thắng (2017), “Iteration scheduling using Bayesian networks in Agile Software Development”, Proceedings of the 10th National Conference on Fundamental and Applied Information Technology Research (FAIR’10), Đà Nẵng, 17-18 August 2017, pp. 300-308, ISBN: 978-604-913-614-6
PUB2: Nguyễn Ngọc Tuấn, Võ Thị Hường, Huỳnh Quyết Thắng (2017), “Hướng tới mô hình mạng Bayes để đánh giá rủi ro trong lập lịch dự án phần mềm” [Towards a Bayesian network model for risk assessment in software project scheduling], Proceedings of the 10th National Conference on Fundamental and Applied Information Technology Research (FAIR’10), Đà Nẵng, 17-18 August 2017, pp. 275-282, ISBN: 978-604-913-614-6
PUB3: Nguyễn Ngọc Tuấn, Trần Trung Hiếu, Huỳnh Quyết Thắng (2017), “Phương pháp xác suất cải tiến sử dụng mạng bayes đánh giá rủi ro trong lập lịch dự án phần mềm” [An improved probabilistic method using Bayesian networks for risk assessment in software project scheduling], Section on Information and Communication Technology, Journal of Science and Technique, Military Technical Academy (Học viện KTQS), No. 184 (06-2017), pp. 45-61, ISSN: 1859-0209
PUB4: Nguyen Ngoc Tuan, Huynh Quyet Thang (2018), “Risk management in Agile software project iteration scheduling using Bayesian Networks”, New Trends in Intelligent Software Methodologies, Tools and Techniques, Volume 303, 2018, pp. 596-606 (SOMET 2018), ISBN 978-1-61499-899-0, DOI: 10.3233/978-1-61499-900-3-596, SCOPUS Indexed.
PUB5: Ngoc-Tuan Nguyen, Quyet-Thang Huynh, Thi-Huong-Giang Vu (2018), “A Bayesian Critical Path Method for Managing Common Risks in Software Project Scheduling”, SoICT 2018 Proceedings of the 9th International Symposium on Information and Communication Technology, Danang City, Viet Nam, December 06-07, 2018, ISBN: 978-1-4503-6539-0, pp.
382-388, DOI: 10.1145/3287921.3287962
PUB6: Quyet-Thang Huynh, Ngoc-Tuan Nguyen (2020), “Probabilistic Method for Managing Common Risks in Software Project Scheduling Based on Program Evaluation Review Technique”, International Journal of Information Technology Project Management, Volume 11(3), pp. 77-94, ISSN: 1938-0232, DOI: 10.4018/IJITPM.2020070105.

Appendix. Sub Bayesian Networks of the 24 risk factors

This appendix shows in detail the sub BNs associated with the 24 risk factors which were examined in Section 2.1.2.

Node labels: staff_experience_shortage, +untrained_staff, +staff_training, +project_schedule
Figure 1. A sub BN for the risk factor “Staff experience shortage”

Node labels: +decision_make_delay, reliance_on_a_few_person, +productivity, +low_moral
Figure 2. A sub BN for the risk factor “Reliance on few key person”

Figure 3.
A sub BN for the risk factor “Schedule pressure”

Figure 4. A sub BN for the risk factor “Low productivity”

Node labels: lack_of_staff_commitment, ++productivity, +loss_of_staff, +staff_experience_shortage
Figure 5. A sub BN for the risk factor “Lack of staff commitment”

Node labels: +defect_rate, ++lack_of_client_input, +lack_of_staff_commitment, lack_of_client_support, ++missed_requirement, +creeping_user_requirements
Figure 6. A sub BN for the risk factor “Lack of client support”

Node labels: ++decision_making_delay, ++low_moral, +rework, lack_of_contact_person_competence, +++schedule_pressure, +communication_overhead, ++missed requirement, +creeping_user_requirements
Figure 7. A sub BN for the risk factor “Lack of contact person competence”

Node labels: lack_of_quantitative_historical_data, ++inaccuring_cost_estimating
Figure 8. A sub BN for the risk factor “Lack of quantitative historical data”

Node labels: inaccurate_cost_estimating, +staff_experience_shortage, ++schedule_pressure
Figure 9. A sub BN for the risk factor “Inaccurate cost estimating”

Node labels: ++large_and_complex_project, large_and_complex_external_interface
Figure 10. A sub BN for the risk factor “Large and complex external interface”

Node labels: +communication_overhead, +++defect_rate, large_and_complex_project
Figure 11. A sub BN for the risk factor “Large and complex project”

Node labels: ++project_size, unnecessary_features
Figure 12. A sub BN for the risk factor “Unnecessary features”

Node labels: ++rework, ++project_size, creeping_user_requirement
Figure 13. A sub BN for the risk factor “Creeping user requirement”

Node labels: +defect_rate, ++schedule_delay, unreliable_subproject_delivery
Figure 14. A sub BN for the risk factor “Unreliable subproject delivery”

Figure 15. A sub BN for the risk factor “Incapable project management”

Node labels: +staff_experience_shortage, +low_moral, lack_of_senior_management_commitment, ++project_schedule, +schedule_pressure
Figure 16. A sub BN for the risk factor “Lack of senior management commitment”

Node labels: lack_of_organization_maturity, ++incurate_cost_estimating, ++inadequate_process_method, ++schedule_pressure
Figure 17. A sub BN for the risk factor “Lack of organization maturity”

Node labels: +rework, +schedule_pressure, ++inadequate_process_method, +productivity, immature_technology, +defect_rate
Figure 18. A sub BN for the risk factor “Immature technology”

Node labels: +rework, +productivity, ++defect_rate, inadequate_configuration_control, +manual_efforts, +project_schedule
Figure 19. A sub BN for the risk factor “Inadequate configuration control”

Node labels: +defect_rate, +productivity, ++low_moral, excessive_paper_work
Figure 20. A sub BN for the risk factor “Excessive paperwork”

Node labels: +schedule_pressure, ++inaccurate_reporting, ++inaccurate_cost_estimating, inaccurate_metrics
Figure 21. A sub BN for the risk factor “Inaccurate metrics”

Node labels: +inaccurate_cost_estimating, +schedule_pressure, excessive_reliance_on_a_single_process_improvement, +defect_rate
Figure 22. A sub BN for the risk factor “Excessive reliance on a single process improvement”

Node labels: +communication_overhead, ++productivity, lack_of_experience_with_project_environment, ++staff_training
Figure 23. A sub BN for the risk factor “Lack of experience with project environment”

Node labels: +defect_rate, +communication_overhead, lack_of_experience_with_project_software, +productivity, +staff_training
Figure 24. A sub BN for the risk factor “Lack of experience with project software”