A Survey on Software Cost Estimation in the Chinese Software Industry

Da Yang, Qing Wang, Mingshu Li, Ye Yang, Kai Ye, and Jing Du
Institute of Software, Chinese Academy of Sciences
Lab for Internet Software Technologies

Outline
1. Background and Research Questions
2. Research Methods
3. Survey Results and Discussions
4. Threats to Validity
5. Future Work

Background
- Software cost estimation
  - Basis for project bidding, budgeting, planning, and cost control
  - Many software projects suffer from schedule and effort overruns
  - An active research field (10 papers at ESEM 2008)
- During the last 40 years, many cost estimation models have been proposed, but very few organizations use them
  - What should be improved, and how?
- It is important to know the current situation of software cost estimation in the industry
  - Performance, methods used, causes of the low use of cost estimation models, etc.

Background
- Factors that motivated this survey
  - There was still no survey on software cost estimation in the Chinese software industry
  - Most previous surveys were conducted in the 1980s or early 1990s, while the business environment, development technology, and process keep changing
  - Little research exists on the factors that influence the adoption of software cost estimation methods and on the factors that lead to the low use of model-based software cost estimation

Research Questions
- Current performance
  RQ1: What is the accuracy of effort and schedule estimation?
  RQ2: Does project size affect effort and schedule estimation accuracy?
- Current practice
  RQ3: To what extent are different estimation methods used in the industry?
  RQ4: For what purposes are cost estimations used?
  RQ7: When do organizations usually make cost estimations?

Research Questions
- Current stakeholders
  RQ5: How important do people think estimation is, in comparison with other aspects of development?
  RQ6: How satisfied are people with the current software cost estimation?
- Factors behind the current situation
  RQ8: What are the causes of inaccurate estimations?
  RQ9: What are the barriers and difficulties in the application of software cost estimation models?

Research Methods
- Investigating the performance of estimation
  - Needs a large amount of detailed project data
- The CSBSG data set
  - Collected by the Chinese Software Benchmarking Standards Group to reflect the status and best practices of the Chinese software industry
  - 112 projects have complete records of the planned and actual values of project development effort and schedule

[Figure: distribution of projects among business areas]

Summary of the 112 projects

                      Mean       Median    Min    Max
  Size (SLOC)         123788.1   46116.5   1480   2339728
  Effort (man-hours)  7883.6     4102      160    115816
  Schedule (days)     175.1      150       10     851

Research Methods
- Relative Error to the Estimate (REE)
  - The difference between the actual value and the estimated value, expressed as a fraction of the estimated value
  - This measure is more meaningful because most project managers calculate profit or loss relative to the expected cost

  $\mathrm{REE} = \frac{x - y}{y}$, where $x$ = actual value and $y$ = estimated value

Research Methods
- Conducting the questionnaire survey
  - Followed the SEI guideline for survey design
    1) Identify the research objectives
    2) Identify and characterize the target audience
    3) Design the sampling plan
    4) Design and write the questionnaire
    5) Pilot test the questionnaire
       Reduce ambiguities, remove difficult questions
    6) Distribute the questionnaire
       Distributed 400 questionnaires at the "2007 Chinese Systems and Software Process Improvement Conference"; received responses from 171 respondents in 116 organizations
    7) Analyze the results

Size of the 116 respondent organizations

  Organization size (persons)   # of organizations
  < 51                          9
  51-100                        11
  101-200                       26
  201-500                       32
  501-1000                      19
  > 1000                        19

Information about software process improvement (SPI)

  SPI standards           # of organizations
  CMM                     5
  CMMI                    43
  ISO-9000                15
  CMM & CMMI              1
  CMM & ISO-9000          8
  CMMI & ISO-9000         25
  CMM, CMMI, ISO-9000     4
  No SPI assessment       15
  Total                   116

Survey Results and Discussions
- To present the survey results, we need to
  - Organize our various observations: estimation performance, technology usage, barriers to technology transfer, potential improvements, etc.
  - Provide a holistic view of the current situation of software cost estimation
- Framework: the UTAUT (Unified Theory of Acceptance and Use of Technology) model

Survey Results and Discussions
Comparison of the estimation accuracy

  Source  Year  Cost overrun          Effort        Effort        Schedule overrun      Schedule      Schedule
                                      Act. > Est.   Act. < Est.                         Act. > Est.   Act. < Est.
  [6]     1984  34% median            61%           10%           22% mean              65%           4%
  [7]     1988  33% mean              -             -             -                     -             -
  [8]     1989  -                     70%           -             -                     80%           -
  [9]     1991  -                     63%           14%           -                     -             -
  [10]    1992  33% mean              -             -             -                     -             -
  [11]    2003  18% mean              59%           15%           23% mean              35%           3%
  [3]     2004  21% median, 41% mean  76%           19%           9% median, 25% mean   62%           2%
  New     2007  5% median, 12% mean   68%           29%           7% median, 17% mean   63%           21%

[Figure: distribution of the cost estimation REE, as two histograms with percent of projects on the vertical axis. Left panel: REE of effort estimation (22% of projects overran by more than 20%). Right panel: REE of duration estimation (29% of projects overran by more than 20%).]

OB1: More than half of the software projects suffered from effort or schedule overruns. 22% of the projects overran their effort estimate by more than 20%, and 29% overran their schedule estimate by more than 20%.

REE by project size group

  REE of      Group   Mean   Median   Std. Dev.   Levene's test for          t-test for
  estimation                                      equality of variances (p)  equality of means (p)
  Effort      SMALL   0.07   0.04     0.20        0.004 (**p < 0.01)         0.057 (+p < 0.1)
              LARGE   0.16   0.08     0.33
  Schedule    SMALL   0.06   0.00     0.27        0.033 (*p < 0.05)          0.003 (**p < 0.01)
              LARGE   0.15   0.49     0.28

OB2: Compared with small software projects, large ones
- had lower effort and schedule estimation accuracy
- were more prone to effort and schedule overruns
- had higher variance of estimation accuracy

Causes of inaccurate estimations
Respondents rated each cause's extent of responsibility from 1 (min) to 5 (max).

  #   Cause                                                                            Rating (1-5 scale)
  1   Requirements are volatile                                                        3.82
  2   Requirements are unclear                                                         3.70
  3   Pressure from senior manager and client to set or change the estimation results  3.22
  4   Not enough resources for estimation                                              3.17
  5   Insufficient historical project data                                             3.13
  6   Lack of appropriate software cost estimation methods and process                 3.10
  7   Lack of stakeholder collaboration                                                3.07
  8   Lack of risk assessment and management                                           3.06
  9   Lack of cost control according to plan                                           3.01
  10  Lack of estimation tools                                                         2.97
  11  Hard to assess the ability of developers                                         2.90
  12  Lack of product risk assessment                                                  2.87
  13  Developers are not involved in estimation                                        2.72

Other causes proposed by respondents
  14) The project bidding requirements predefined the project cost
  15) The survival pressure and business pattern of the company
  16) Lack of training and appropriate application of estimation methods

Survey Results and Discussions
Estimation methods used by organizations
(percentage of organizations using each method; more than one method possible)

  Sources (year): [14] (1987), [8] (1989), [4] (1995), [3] (2004), New (2007)
  Methods (rows): Expert consultation, Intuition and experience, Analogy, Software cost models, Price-to-win, Capacity related, Top-down, Bottom-up, Other

  Expert consultation:       26%, 86%, 100%, 70%
  Intuition and experience:  85%, 62%
  Price-to-win:              13%
  Remaining cell values, in source order (row and column positions are not recoverable from the extracted layout): 12%, 61%, 14%, 8%, 65%, 26%, 16%, 28%, 70%, 15%, 53%, 11%, 13%, 51%, 9%, 0%, 21%

Notes on the slide:
  1) Existing literature suggests that the capacity-related and price-to-win methods reinforce poor practices and generally produce large overruns.
  2) OB3 indicates that "set or change estimation results according to pressure like capacity or price" is one of the top three rated causes of inaccurate cost estimations.

Survey Results and Discussions
Barriers or difficulties of applying cost models

  #   Barrier                                                                                        Persons (n)   %
  1)  Software cost estimation models cost a lot of effort to collect data, configure parameters,
      calibrate models, etc.                                                                         89            58%
  2)  Organizations do not have sufficient investment for improving software cost estimation         69            45%
  3)  The software cost estimation models cannot bring significant benefit                           61            40%
  4)  Lack of corresponding tools that are easy to use                                               56            37%
  5)  Haven't found appropriate software cost estimation models or tools                             52            34%
  6)  Software cost estimation models are hard to learn and use                                      34            22%
      Total                                                                                          153           100%

Other causes proposed by respondents
  7)  The client didn't require using software cost estimation models
  8)  Estimation models were not accurate and effective
  9)  The culture didn't favor using software cost estimation models
  10) Senior managers only looked at the result and didn't care about the estimation process
  11) Schedule pressure was very high, and the estimation could not help
  12) Estimation models haven't well defined what kind of data needs to be collected

UTAUT callouts on the slide: Effort Expectancy (item 1), Facilitating Conditions (item 2), Performance Expectancy (item 3), Social Influence (near items 9 and 10)

Survey Results and Discussions
The UTAUT model and our findings suggest:
- Providing accurate cost estimation alone is not enough to guarantee the acceptance and usage of a cost estimation technology
- To introduce new cost estimation technology and change the current situation, we should also pay attention to the "performance expectancy", "effort expectancy", "social influence", and "facilitating conditions" of the technology
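As a worked illustration of the REE measure behind the accuracy figures in these results (actual value minus estimated value, divided by the estimated value), the following Python sketch computes REE per project and the share of projects that overran by more than 20%. It is a minimal sketch under stated assumptions: the function names `ree` and `overrun_share` and the project figures are hypothetical, and none of the numbers come from the CSBSG data set.

```python
from statistics import mean, median


def ree(actual: float, estimate: float) -> float:
    """Relative Error to the Estimate: (actual - estimate) / estimate."""
    if estimate <= 0:
        raise ValueError("estimate must be positive")
    return (actual - estimate) / estimate


def overrun_share(rees: list[float], threshold: float = 0.20) -> float:
    """Fraction of projects whose REE is strictly above the threshold."""
    return sum(r > threshold for r in rees) / len(rees)


# Hypothetical (estimated, actual) effort in man-hours; NOT CSBSG data.
projects = [(1000, 1000), (2000, 2500), (4000, 4200), (800, 760), (5000, 6500)]

effort_ree = [ree(actual, estimate) for estimate, actual in projects]

print("mean REE:", round(mean(effort_ree), 2))
print("median REE:", round(median(effort_ree), 2))
print("share overrunning by more than 20%:", overrun_share(effort_ree))
```

A positive REE means the project overran its estimate and a negative REE means it came in under. Dividing by the estimate rather than the actual value keeps the measure aligned with the planned cost that budgets are set against.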
Survey Results and Discussions
The importance of software cost estimation

  Importance         Persons (n)   Percentage
  Most unimportant   1             1%
  Very unimportant   2             1%
  Unimportant        3             2%
  Neutral            15            9%
  Important          63            38%
  Very important     77            46%
  Most important     5             3%
  Total              166           100%

The satisfaction level of respondents

  Satisfaction level    Persons (n)   Percentage
  Very unsatisfactory   2             1%
  Unsatisfactory        34            21%
  Neutral               100           62%
  Satisfactory          26            16%
  Very satisfactory     0             0%
  Total                 162           100%

Survey Results and Discussions
The usage purposes of software cost estimation

  Where are estimates used?                                                   Organizations (n)   Percentage
  Project proposal evaluation                                                 52                  45%
  Contract negotiation                                                        53                  46%
  Making budget                                                               85                  74%
  Project-level planning and control, e.g. effort or schedule distribution
  among development phases                                                    97                  84%
  Short-term planning and control, e.g. weekly or monthly team work plan      58                  50%
  Software process improvement, e.g. assess new process, improve productivity 43                  37%
  Total                                                                       115                 100%

When cost estimations are usually made

  Software development phase       Organizations (n)   Percentage
  Initial project proposal stage   64                  57%
  Feasibility study                76                  67%
  Requirement                      84                  74%
  Design                           41                  36%
  Implementation                   31                  27%
  Integration and testing          20                  18%
  Transition                       13                  12%
  Total                            113                 100%

OB8 and OB9 indicate two needs of software cost estimation:
- better support of various estimation goals
- better support of early lifecycle cost estimation (when uncertainty is large)

Threats to Validity
- Project data
  - May have been recorded by organizations with higher maturity levels
- Questionnaire samples
  - May be biased toward organizations above the industry's average size and process maturity level
- Generalizability of the results
  - Cultural issues reduce the generalizability of the results

Future Work
- Design measures to evaluate cost estimation methods based on 1) performance expectancy, 2) effort expectancy, 3) social influence, and 4) facilitating conditions
- Develop new estimation methods and tools

Future Work
- Conduct surveys and experiments in more detail
  - See the variance and changes in the performance of software cost estimations
  - Identify related factors and potential improvements

Q&A
Thanks!