Automation in Construction 19 (2010) 619–629

Journal homepage: www.elsevier.com/locate/autcon

Estimate at Completion for construction projects using Evolutionary Support Vector Machine Inference Model

Min-Yuan Cheng a, Hsien-Sheng Peng b,⁎, Yu-Wei Wu a, Te-Lin Chen a

a Department of Construction Engineering, National Taiwan University of Science and Technology, Taiwan
b Ecological and Hazard Mitigation Engineering Research Center, National Taiwan University of Science and Technology, Taiwan

Article info
Article history: Accepted 24 February 2010
Keywords: Estimate at Completion; Fast Messy Genetic Algorithms; Support Vector Machine

Abstract
Construction projects are influenced by a range of factors that impact upon final project cost. Estimate at Completion (EAC) is an important approach used to estimate final project cost, which takes into consideration probable project performance and risks. EAC helps project managers identify potential but still unknown problems and adopt response strategies. This study constructed an evolutionary EAC model to generate project cost estimates that proved significantly more reliable than estimates achievable using currently prevailing formulae. The developed learning model fused two artificial intelligence approaches, namely the fast messy genetic algorithm (fmGA) and Support Vector Machine (SVM), to create an Evolutionary Support Vector Machine Inference Model (ESIM). The ESIM was then applied to estimate final project costs for historical cases. Finally, using the EAC estimate, project cost influence indices, and project cost diagrams, the discrepancy between estimated and actual values was examined to determine potential problems in order to help project managers better control project costs. The learning results were validated in real applications that showed good performance for training models.
Providing project managers with reliable EAC trend estimates helps them control project costs effectively and take appropriate preemptive measures to handle potential problems.
© 2010 Elsevier B.V. All rights reserved.

1. Introduction

Engineering projects face myriad uncertainties attributable to the chosen construction method as well as environmental and process factors [1,2]. Construction firms typically focus only on budget planning during the initial project stage, which ignores engineering cost changes, information updates and cost management during construction and, in turn, prevents effective project cost control and the identification of potential problems. Cost overruns are frequently not discovered until later project stages, at which time it is typically too late to take effective remedial measures.

Effective project management requires that plans be constantly revised in accordance with actual project conditions. However, factors that influence project cost are numerous, and it is difficult to consider each factor individually at each stage. Moreover, data on construction cost are manifold and highly variable. In order to update cost information item by item in a timely manner, management must adopt an efficient approach to the issue and invest significant time.

⁎ Corresponding author. #43, Sec. 4, Keelung Rd., Taipei, 106, Taiwan. Tel.: +886 2 27301212; fax: +886 2 27301074. E-mail address: hspeng@mail.ntust.edu.tw (H.-S. Peng).
0926-5805/$ – see front matter. doi:10.1016/j.autcon.2010.02.008

The Estimate at Completion (EAC) is a quick and automatic formula used by managers to assess the cost of work to complete schedule activities [1]. Much research has already been done in this area using a variety of methodologies. Barraza et al.
[3] used the concept of stochastic S curves (SS curves) to determine forecasted project estimates as an alternative to using deterministic S curves and traditional forecasting methods. A simulation approach is used to generate the stochastic S curves, based on the defined variability in duration and cost of the individual activities within the process. Lee [4] introduced Stochastic Project Scheduling Simulation (SPSS), software developed to measure the probability of completing a project within a time specified by the user. SPSS finds the longest path in a network, runs the network a user-specified number of times, and calculates the stochastic probability of completing the project in the specified time. Lee and Arditi [5] described a stochastic simulation-based scheduling system (S3) that integrates the deterministic critical path method (CPM), the probabilistic program evaluation and review technique (PERT), and the stochastic discrete event simulation (DES) approaches into a single system. The system is based on an earlier version called Stochastic Project Scheduling Simulation and makes use of all the capabilities of that system. Kim and Reinschmidt [6] introduced a new probabilistic forecasting method for schedule performance control and risk management of on-going projects. The Bayesian betaS-curve method (BBM) is based on Bayesian inference and the beta distribution. The BBM provides confidence bounds on predictions, which can be used to determine the range of potential outcomes and the probability of success.

Fig. 1. ESIM framework.

Earned Value Management (EVM) is one of the theoretical methods for determining EAC; it was originally developed for cost management and later adopted to forecast project duration. Kim et al.
[7] indicated that EVM is gaining wider acceptance due to increasing recognition of its ability both to diminish EVM problems and improve utilities. A broader approach was developed that considers four factor groups (i.e., EVM users, EVM methodology, project environment, implementation process) together to significantly improve the acceptance and performance of EVM in different types of organizations and projects. Cioffi [8] presented a new formula and corresponding notation for earned value analysis that makes earned value calculations more transparent and flexible. Vandevoorde and Vanhoucke [9] compared the classic earned value performance indicators SV and SPI with the newly developed earned schedule performance indicators SV(t) and SPI(t), presented a generic schedule forecasting formula applicable in different project situations, and compared three methods from the literature for forecasting total project duration. The use of each method was illustrated on a simple one-activity example project and on real-life project data. Vitner et al. [10] investigated the possibility of using the data envelope analysis (DEA) approach to evaluate project performance in a multi-project environment, in which projects are evaluated using earned value management system (EVMS) and multidimensional control system (MPCS) methods. Vanhoucke and Vandevoorde [11] extensively reviewed and evaluated earned value (EV)-based methods to forecast the total project duration, and investigated the potential of a recently developed method, the earned schedule method, which improves the connection between EV metrics and the project duration forecasts. Lipke et al. [12] provided a method to improve the capability of project managers to make informed decisions by providing a reliable method to forecast final cost and duration.
Their method and its evaluation made use of a well-established project management method, a recent technique developed to analyze schedule performance, and statistical mathematics to develop EVM, earned schedule (ES) and statistical prediction and testing methods. Plaza and Turetken [13] proposed an extended version of EVM (EVM/LC) that addresses the effect of learning on the performance of project teams. A spreadsheet-based decision support tool that automates calculations and analyses was presented in EVM/LC. Leu and Lin [14] attempted to refine and improve the performance of traditional EVM by introducing statistical control chart techniques. Individual control charts are used as tools to monitor project performance data in order to detect adverse changes in a timely manner.

Table 1 Influencing factors of construction cost.

| Classification | Influencing factor | Index | Definition |
|---|---|---|---|
| Time now | Construction duration | Construction progress percentage | Duration to date / revised contract duration |
| | Actual cost | ACP | Actual cost / budget at completion |
| | Planned cost | EVP | Earned value / budget at completion |
| Construction management | Cost management | CPI | Earned value / actual cost |
| | Time management | SPI | Earned value / planned value |
| | Subcontractor management | Subcontractor billed index | Subcontractor billed amount / actual cost |
| Contract scope | Contract payment | Owner billed index | Owner billed amount / earned value |
| | Change order | Change order index | Revised contract amount / budget at completion |
| External environment | Construction price fluctuation | CCI | Construction material price index of that month / construction material price index of initial stage |
| | No. of rainy days | Climate effect index | (Revised project duration − no. of rainy days) / revised project duration |

Fig. 2. EAC prediction model.

Table 2 Historical cases.

| Project name | Total area (m2) | Underground floors | Ground floors | Buildings | Start date | Finish date | Duration (days) | Contract amount (NTD) | ESIM prediction periods | Note |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 12,622 | 2 | 9 | 1 | 2003/12/1 | 2005/8/22 | 630 | 289,992,000 | 29 | |
| B | 4919 | 3 | 11 | 1 | 2003/12/13 | 2005/11/10 | 698 | 149,300,000 | 24 | |
| C | 19,205 | 5 | 8 | 1 | 2000/5/20 | 2002/5/19 | 729 | 332,800,000 | 20 | |
| D | 5358 | 3 | 9 | 1 | 2000/11/15 | 2002/11/14 | 729 | 199,600,000 | 25 | |
| E | 27,468 | 2 | 11 | 3 | 1999/12/16 | 2001/12/3 | 718 | 1,142,148,388 | 26 | |
| F | 31,797 | 2 | 9 | 4 | 2001/7/4 | 2003/3/31 | 635 | 530,000,000 | 20 | |
| G | 7707 | 2 | 14 | 1 | 2001/11/24 | 2003/10/20 | 695 | 153,500,000 | 22 | |
| H | 10,087 | 3 | 14 | 1 | 2002/6/18 | 2004/7/6 | 749 | 216,000,000 | 27 | |
| I | 3479 | 1 | 10 | 1 | 2003/6/2 | 2004/9/30 | 486 | 85,714,286 | 18 | |
| J | 6352 | 4 | 11 | 1 | 2004/3/5 | 2006/2/18 | 715 | 202,241,810 | 31 | |
| K | 4774 | 2 | 11 | 1 | 2004/2/21 | 2006/2/20 | 730 | 145,377,589 | 27 | |
| L | 7289 | 2 | 8 | 1 | 2005/6/15 | 2006/9/15 | 457 | 190,844,707 | 20 | Testing case |
| M | 3094 | 2 | 7 | 1 | 2005/10/1 | 2007/2/28 | 515 | 102,500,000 | 17 | Testing case |
| Total | | | | | | | | | 306 | |
| Subtotal (training cases) | | | | | | | | | 269 | |
| Subtotal (testing cases) | | | | | | | | | 37 | |

At least eight common methods have already been put forward that use EVM to predict the EAC for construction projects [15,16]. Each has been applied to different types of projects with differing EAC error rates. When a single method is applied to different projects, its predictions are extremely accurate for some and show obvious errors for others. This has created confusion in the industry as to which kind of prediction method should be chosen for particular project types.
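For orientation, the index-based formulas that the paper later lists as EAC1 to EAC7 in Table 7 can be written out directly. The following is a minimal sketch under stated assumptions: the function name, argument names and the sample figures are illustrative and do not come from the paper, and EAC8 (the approved execution budget) is omitted because it is a budget figure rather than a formula.

```python
def evm_eac_estimates(ac, ev, pv, bac, w1=0.8, w2=0.2):
    """Return EAC1..EAC7 (Table 7) for one reporting period.

    ac, ev, pv: cumulative actual cost, earned value and planned value.
    bac: budget at completion.
    w1, w2: weights of CPI and SPI in EAC7 (0.8 and 0.2 in Table 7).
    """
    cpi = ev / ac            # cost performance index
    spi = ev / pv            # schedule performance index
    sci = cpi * spi          # SCI = CPI * SPI (Table 7 note)
    remaining = bac - ev     # budgeted cost of the work still to do
    return {
        "EAC1": ac + remaining,
        "EAC2": bac / cpi,
        "EAC3": bac / spi,
        "EAC4": ac + remaining / cpi,
        "EAC5": ac + remaining / spi,
        "EAC6": ac + remaining / sci,
        "EAC7": ac + remaining / (w1 * cpi + w2 * spi),
    }


# Hypothetical reporting period (values are not taken from the paper).
estimates = evm_eac_estimates(ac=60.0, ev=50.0, pv=40.0, bac=100.0)
for name, value in estimates.items():
    print(f"{name}: {value:.2f}")
```

Even on this single made-up period the seven formulas spread from 80 to 120, which illustrates why the choice of method matters for a given project.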
Another issue with EVM is that it must be applied to each distinct construction project process, with revisions conducted manually. This makes EVM both complicated and time consuming. Consequently, computerization of the engineering management process is critical if EVM is to be applied effectively to control construction costs. However, most construction firms in Taiwan use computer systems powerful enough only to analyze initial-stage budgets. These systems are not equipped to react to changes at each construction stage or to use the EVM method to predict construction project EAC.

Table 3 Input variables.

| Training set name and no | Construction progress percentage | ACP | EVP | CPI | SPI | Subcontractor billed index | Owner billed index | Change order index | CCI | Climate effect index |
|---|---|---|---|---|---|---|---|---|---|---|
| Project-A-1 | 3.8% | 0.0% | 0.0% | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Project-A-2 | 7.9% | 1.8% | 3.6% | 2.01 | 0.69 | 1.00 | 1.00 | 1.00 | 1.03 | 0.99 |
| Project-A-3 | 11.7% | 3.4% | 7.7% | 2.24 | 0.89 | 1.67 | 0.75 | 1.00 | 1.09 | 0.98 |
| Project-A-4 | 15.8% | 4.9% | 10.1% | 2.05 | 0.99 | 1.33 | 0.65 | 1.00 | 1.12 | 0.97 |
| Project-A-5 | 19.9% | 9.0% | 11.4% | 1.27 | 0.99 | 0.73 | 0.57 | 1.00 | 1.11 | 0.96 |
| Project-A-6 | 23.8% | 12.1% | 12.4% | 1.02 | 1.00 | 1.07 | 1.05 | 1.00 | 1.10 | 0.95 |
| Project-A-7 | 27.9% | 13.8% | 13.7% | 1.00 | 0.99 | 1.06 | 1.06 | 1.00 | 1.10 | 0.95 |
| Project-A-8 | 31.8% | 14.5% | 15.2% | 1.04 | 1.00 | 1.14 | 1.09 | 1.00 | 1.12 | 0.94 |
| Project-A-9 | 35.9% | 18.2% | 16.7% | 0.92 | 1.02 | 0.91 | 0.99 | 1.00 | 1.13 | 0.92 |
| Project-A-10 | 40.0% | 22.8% | 16.7% | 0.73 | 1.02 | 0.68 | 0.93 | 1.00 | 1.13 | 0.91 |
| Project-A-11 | 44.0% | 26.9% | 23.0% | 0.86 | 0.93 | 0.64 | 0.75 | 1.00 | 1.14 | 0.90 |
| Project-A-12 | 48.0% | 32.0% | 30.3% | 0.95 | 1.02 | 0.71 | 0.75 | 1.00 | 1.13 | 0.90 |
| Project-A-13 | 52.0% | 37.4% | 32.9% | 0.88 | 0.94 | 0.69 | 0.78 | 1.00 | 1.12 | 0.90 |
| Project-A-14 | 56.1% | 40.3% | 39.1% | 0.97 | 0.95 | 0.74 | 0.77 | 1.00 | 1.12 | 0.89 |
| Project-A-15 | 59.9% | 42.1% | 43.0% | 1.02 | 0.95 | 0.81 | 0.79 | 1.00 | 1.12 | 0.88 |
| Project-A-16 | 67.9% | 47.4% | 57.0% | 1.20 | 0.94 | 0.84 | 0.70 | 1.00 | 1.13 | 0.87 |
| Project-A-17 | 71.8% | 53.1% | 66.8% | 1.26 | 0.95 | 0.86 | 0.68 | 1.00 | 1.12 | 0.86 |
| Project-A-18 | 75.9% | 62.9% | 76.1% | 1.21 | 0.96 | 0.87 | 0.72 | 1.00 | 1.11 | 0.84 |
| Project-A-19 | 79.9% | 71.2% | 85.8% | 1.20 | 0.97 | 0.88 | 0.73 | 1.00 | 1.11 | 0.83 |
| Project-A-20 | 84.0% | 83.5% | 97.3% | 1.16 | 0.99 | 0.97 | 0.83 | 1.00 | 1.11 | 0.82 |
| Project-A-21 | 88.0% | 92.0% | 98.2% | 1.07 | 0.99 | 0.88 | 0.82 | 1.00 | 1.12 | 0.81 |
| Project-A-22 | 92.0% | 94.1% | 98.5% | 1.05 | 0.99 | 0.95 | 0.91 | 1.00 | 1.12 | 0.80 |
| Project-A-23 | 96.1% | 97.1% | 100.2% | 1.03 | 0.99 | 0.92 | 0.89 | 1.01 | 1.12 | 0.80 |
| Project-A-24 | 100.0% | 98.9% | 100.2% | 1.01 | 0.99 | 0.91 | 0.89 | 1.01 | 1.12 | 0.79 |
| Project-A-25 | 104.1% | 100.2% | 101.4% | 1.01 | 1.00 | 1.01 | 1.00 | 1.01 | 1.12 | 0.79 |
| Project-A-26 | 107.9% | 100.4% | 101.4% | 1.01 | 1.00 | 1.01 | 1.00 | 1.01 | 1.13 | 0.79 |
| Project-A-27 | 111.8% | 100.6% | 101.4% | 1.01 | 1.00 | 1.01 | 1.00 | 1.01 | 1.14 | 0.78 |
| Project-A-28 | 115.9% | 104.1% | 101.4% | 0.97 | 1.00 | 0.97 | 1.00 | 1.01 | 1.16 | 0.76 |
| Project-A-29 | 119.9% | 108.7% | 101.4% | 0.93 | 1.00 | 0.93 | 1.00 | 1.01 | 1.20 | 0.75 |
| Project-B-1 | 2.4% | 0.0% | 0.0% | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Project-B-2 | 6.9% | 2.5% | 0.0% | 1.00 | 1.00 | 1.16 | 1.00 | 1.00 | 1.03 | 0.99 |
| Project-B-3 | … | … | … | … | … | … | … | … | … | … |
| Project-M-15 | … | … | … | … | … | … | … | … | … | … |
| Project-M-16 | 105.8% | 84.4% | 99.9% | 1.18 | 1.00 | 1.08 | 0.92 | 1.00 | 1.13 | 0.76 |
| Project-M-17 | 111.8% | 92.5% | 95.9% | 1.04 | 1.00 | 1.04 | 1.00 | 0.96 | 1.13 | 0.75 |

Fig. 3. Predicted output and actual output of training data.
Fig. 4. Predicted output and actual output of estimate to completion for case L.

In light of the above, developing a fast and effective system that accounts for the uncertainty and myriad problems involved in cost control over the course of a project while predicting construction project EAC is an important issue to be resolved. This research aims to resolve questions encountered in project cost management by reviewing relevant papers in the literature and historical cases in order to identify the set of factors that significantly influence project cost. A project cost flow trend was then set up using historical case data. The relationship between monthly costs and project EAC was mapped based on past knowledge and experience.
An Evolutionary Support Vector Machine Inference Model (ESIM) was established using historical case experience to predict and control the EAC trend of the project during construction. Finally, indices of project cost and project cost diagrams were employed to identify the reasons for differences between actual and ESIM-predicted values, so that potential problems can be found and effective control and management actions implemented at the proper points in time.

2. Evolutionary Support Vector Machine Inference Model (ESIM)

2.1. Support Vector Machine (SVM)

Support Vector Machine (SVM) is a machine learning technique popularized in recent years. It is based on the statistical learning theory described by Vapnik [17]. Traditional training techniques usually focus on minimizing empirical risk, i.e., minimizing the classification error on training data. SVM instead aims to minimize structural risk by finding a probable upper bound of the classification error [18]. This training technique effectively minimizes the upper bound of theoretical error. Data classification and regression, two critical components of computer science, are being used in increasingly broad and general applications. Traditional classification methods include neural networks (NNs), decision trees and the nearest neighbour method, among others. SVM is a newer method that has already proved its value through good results in many applications, and it has firmer theoretical foundations than NNs.

Support Vector Classification (SVC) is founded on the principle of minimizing structural risk. An important advantage of SVC is its ability to handle linearly inseparable problems. SVC trains on existing data and then selects several support vectors by analyzing the training data to represent the whole data set. Some extreme values were eliminated in advance.
Finally, the selected support vectors were packed into a model and SVC was used to classify the testing data. The concept of Support Vector Regression (SVR) is similar to that of SVC. It maps regression problems from a low-dimensional to a high-dimensional vector space and identifies the support vectors from which an appropriate linear regression equation can be obtained.

2.2. Fast Messy Genetic Algorithms (fmGA)

The Simple Genetic Algorithm (sGA), an efficient and accurate algorithm, was first developed by Holland in 1975. Goldberg et al. subsequently developed the Messy Genetic Algorithm (mGA) in 1989 to address sGA shortcomings. Several experiments [17,19] have since shown the mGA to be much better than sGA at solving permutation problems. In 1993, Goldberg established the Fast Messy Genetic Algorithm (fmGA) to reduce the high memory consumption of operation processes [20]. The mGA resolved the problem that sGA does not consider logical limitations amongst gene bunches during the optimization process. There are four main differences in solving mechanisms between fmGA and sGA [21,22]. First, fmGA can adopt chromosomes of variable length. Second, simple cut and splice operators replace the sGA operator mechanism. Third, the optimization process includes a primordial stage and a juxtapositional stage. Lastly, competitive templates are adopted to retain the most outstanding gene building blocks in each generation [23].

2.3. ESIM framework

The Evolutionary Support Vector Machine Inference Model (ESIM) fuses SVM and fmGA [20,24,25]. In this model, SVM is used to capture the complicated relationship between input parameters and output parameters, while fmGA searches for the best parameters (C and γ) needed by SVM in order to improve SVM prediction accuracy. The framework of ESIM is shown in Fig. 1. Steps are explained as follows:

Default C, γ: The values of C and γ may be set differently to reflect case and problem characteristics.
C and γ can be set to 1 and 1/M respectively, where M is the number of parameters.

Training data set: Before executing the prediction model, patterns of influence must first be identified and entered into the system as training data, where they serve as prediction input parameters.

SVM training model: In this step, the user collects relevant historical cases for research. Case influence patterns serve as input parameters and the case decision serves as the output parameter. These input and output values become the training data set and are fed into the model as initial training data. SVM uses the selected C and γ values as defaults for the first training process.

Average accuracy: This step takes the reciprocal of the objective function as the fitness function. A larger value corresponds to a superior model.

Termination criteria: The procedure runs continuously until certain conditions are satisfied, e.g., an appropriate fitness is confirmed, or fitness shows no conspicuous improvement over several generations, indicating that convergence has been reached.

Search for fmGA parameters: In this step, fmGA searches for more appropriate C and γ values to serve as parameters for the next generation.

Optimized parameters: Following the above optimization calculations, the best gene set is retained. The optimum inference model is obtained after decoding the gene set into the C and γ values for the SVM.

Table 4 Validation of training cases.

| Project name | Number of periods | Average error of EACP | Qualified (<10%) |
|---|---|---|---|
| A | 29 | 8.7% | Yes |
| B | 24 | 5.6% | Yes |
| C | 20 | 4.1% | Yes |
| D | 25 | 5.6% | Yes |
| E | 26 | 6.2% | Yes |
| F | 20 | 7.6% | Yes |
| G | 22 | 3.6% | Yes |
| H | 27 | 14.0% | No |
| I | 18 | 5.0% | Yes |
| J | 31 | 4.2% | Yes |
| K | 27 | 3.7% | Yes |
| Total | 269 | | 91% |

Table 5 (a) Testing results of case L. (b) Testing results of case M.

(a)

| Training set name and no | Predicted output | Actual output | EACP (predicted) | EACP (actual) | e (error of EACP) |
|---|---|---|---|---|---|
| Project L-1 | 0.76985 | 0.72710 | 89.76% | 84.18% | 6.63% |
| Project L-2 | 0.75566 | 0.72230 | 88.53% | 84.18% | 5.17% |
| Project L-3 | 0.69682 | 0.70360 | 83.28% | 84.18% | 1.06% |
| Project L-4 | 0.66103 | 0.63210 | 87.95% | 84.18% | 4.48% |
| Project L-5 | 0.62353 | 0.59440 | 87.98% | 84.18% | 4.52% |
| Project L-6 | 0.57243 | 0.54420 | 87.86% | 84.18% | 4.38% |
| Project L-7 | 0.52164 | 0.50640 | 86.17% | 84.18% | 2.37% |
| Project L-8 | 0.47307 | 0.44560 | 87.75% | 84.18% | 4.25% |
| Project L-9 | 0.43308 | 0.40210 | 88.22% | 84.18% | 4.80% |
| Project L-10 | 0.40248 | 0.37220 | 88.12% | 84.18% | 4.68% |
| Project L-11 | 0.42472 | 0.33140 | 96.34% | 84.18% | 14.45% |
| Project L-12 | 0.39262 | 0.29480 | 96.93% | 84.18% | 15.16% |
| Project L-13 | 0.37184 | 0.25570 | 99.32% | 84.18% | 18.00% |
| Project L-14 | 0.33271 | 0.22340 | 98.42% | 84.18% | 16.92% |
| Project L-15 | 0.30792 | 0.17890 | 101.00% | 84.18% | 19.98% |
| Project L-16 | 0.13018 | 0.15060 | 81.52% | 84.18% | 3.16% |
| Project L-17 | 0.11847 | 0.12780 | 82.96% | 84.18% | 1.44% |
| Project L-18 | 0.12773 | 0.10770 | 86.79% | 84.18% | 3.11% |
| Project L-19 | 0.09538 | 0.08770 | 85.18% | 84.18% | 1.19% |
| Project L-20 | 0.05293 | 0.08330 | 80.21% | 84.18% | 4.71% |
| Average error | | | | | 7.02% |

(b)

| Training set name and no | Predicted output | Actual output | EACP (predicted) | EACP (actual) | e (error of EACP) |
|---|---|---|---|---|---|
| Project M-1 | 0.74012 | 0.72410 | 94.58% | 84.18% | 2.26% |
| Project M-2 | 0.65222 | 0.65710 | 91.85% | 84.18% | 0.69% |
| Project M-3 | 0.61894 | 0.62030 | 92.32% | 84.18% | 0.18% |
| Project M-4 | 0.57882 | 0.59120 | 90.88% | 84.18% | 1.74% |
| Project M-5 | 0.56265 | 0.61940 | 85.09% | 84.18% | 8.00% |
| Project M-6 | 0.51792 | 0.51410 | 92.99% | 84.18% | 0.55% |
| Project M-7 | 0.49309 | 0.48480 | 93.58% | 84.18% | 1.18% |
| Project M-8 | 0.45539 | 0.44820 | 93.42% | 84.18% | 1.01% |
| Project M-9 | 0.41551 | 0.41580 | 92.45% | 84.18% | 0.04% |
| Project M-10 | 0.39051 | 0.39570 | 91.81% | 84.18% | 0.73% |
| Project M-11 | 0.29915 | 0.31270 | 90.71% | 84.18% | 1.92% |
| Project M-12 | 0.26694 | 0.24910 | 94.82% | 84.18% | 2.53% |
| Project M-13 | 0.23630 | 0.22980 | 93.34% | 84.18% | 0.92% |
| Project M-14 | 0.18320 | 0.20250 | 89.97% | 84.18% | 2.72% |
| Project M-15 | 0.17857 | 0.19580 | 90.24% | 84.18% | 2.43% |
| Project M-16 | 0.15536 | 0.14510 | 93.83% | 84.18% | 1.46% |
| Project M-17 | 0.08462 | 0.08330 | 82.66% | 84.18% | 0.18% |
| Average error | | | | | 1.68% |

Fig. 5. ESIM-predicted and actual EAC values of case L.

3. Constructing EAC prediction model using ESIM (EAC–ESIM)
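As a rough illustration of the ESIM loop from Section 2.3 (default parameters, fitness as the reciprocal of the objective, termination, search for new C and γ, retention of the best gene set), the sketch below evolves the pair (C, γ) against a caller-supplied error function. It is a simplified stand-in and not the authors' implementation: a plain elitist genetic algorithm replaces fmGA, the SVM training step is abstracted into a hypothetical `rms_error(C, gamma)` callable, and the search ranges, population size and the toy error surface are all assumptions.

```python
import math
import random


def search_c_gamma(rms_error, generations=100, pop_size=20, patience=15, seed=0):
    """Evolve (C, gamma) to minimise rms_error(C, gamma).

    Plain elitist GA on log10-scaled genes; a simplified stand-in for fmGA.
    """
    rng = random.Random(seed)
    bounds = [(-1.0, 3.0), (-4.0, 1.0)]  # assumed log10 ranges for C and gamma

    def decode(gene):
        return 10.0 ** gene[0], 10.0 ** gene[1]

    def fitness(gene):
        # Reciprocal of the objective function: a larger value is better.
        return 1.0 / (rms_error(*decode(gene)) + 1e-12)

    population = [tuple(rng.uniform(lo, hi) for lo, hi in bounds)
                  for _ in range(pop_size)]
    best_gene, best_fit, stale = None, 0.0, 0

    for _ in range(generations):
        scored = sorted(((fitness(g), g) for g in population), reverse=True)
        top_fit, top_gene = scored[0]
        if top_fit > best_fit:
            best_gene, best_fit, stale = top_gene, top_fit, 0
        else:
            stale += 1
        if stale >= patience:  # termination: no improvement for several generations
            break
        parents = [g for _, g in scored[: pop_size // 2]]
        children = [best_gene]  # retain the best gene set found so far
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = []
            for i, (lo, hi) in enumerate(bounds):
                value = rng.choice((a[i], b[i])) + rng.gauss(0.0, 0.1)
                child.append(min(hi, max(lo, value)))  # keep gene inside its range
            children.append(tuple(child))
        population = children

    c, gamma = decode(best_gene)
    return c, gamma, rms_error(c, gamma)


def toy_rms(c, gamma):
    # Hypothetical error surface with its minimum (0.05) at C = 20, gamma = 0.1.
    # In practice this would train an SVR and return its RMS error.
    return 0.05 + math.log10(c / 20.0) ** 2 + math.log10(gamma / 0.1) ** 2


best_c, best_gamma, best_rms = search_c_gamma(toy_rms)
print(f"C = {best_c:.2f}, gamma = {best_gamma:.3f}, RMS = {best_rms:.4f}")
```

Searching in log10 space is a design choice of this sketch, made because useful values of C and γ typically span several orders of magnitude; the paper does not state how fmGA encodes the two parameters.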
Table 6 Qualification of testing cases.

| Project name | Number of periods | ESIM RMS | Average error of EACP | Qualified (<10%) |
|---|---|---|---|---|
| L | 20 | 0.0594 | 7.02% | Yes |
| M | 17 | 0.0173 | 1.68% | Yes |

During the construction phase of the project, planning objectives and achieved percentage are affected by actual work conditions, design changes and external environmental conditions. Adjustments made to reflect these can generate differences between planned and actual completion costs. The EAC prediction is generally based on the construction budget developed from initial project conditions (i.e., the construction scope specified in the contract and environmental factors). EAC factors of influence deduced from reference material are represented as input, and prospective project cost is the output. The cost database of project cases was established accordingly. Database records were used to obtain planned values and actual values for each month and to calculate the difference between the two. The mapping relationship between input and output was found via case learning and ESIM training. Finally, the prospective cost of a new project was predicted by inputting the monthly cost of the project into the developed system in accordance with training and testing results. The process is described in Fig. 2. EACP is defined by Eq. (1) in this research.

$$\mathrm{EACP} = \mathrm{ACP} + \mathrm{ETCP} \tag{1}$$

where
EACP (estimate at completion percentage): EAC predicted in advance / total cost of original budget
ACP (actual cost percentage): actual cost at some specific moment / total cost of original budget
ETCP (estimate to completion percentage): estimate to completion at some specific moment / total cost of original budget; also the output value of ESIM

The processes of developing the model include “Identify Significant Factors of Influence”, “Historical Data Collection”, “Model Training” and “Model Testing”. The details of each process are described below.

Table 7 General EAC prediction methods of EVM.

| Item | Equation | Note |
|---|---|---|
| EAC1 | = AC + BAC − EV | |
| EAC2 | = BAC / CPI | |
| EAC3 | = BAC / SPI | |
| EAC4 | = AC + (BAC − EV) / CPI | |
| EAC5 | = AC + (BAC − EV) / SPI | |
| EAC6 | = AC + (BAC − EV) / SCI | SCI = CPI * SPI |
| EAC7 | = AC + (BAC − EV) / (W1 * CPI + W2 * SPI) | W1 = 0.8, W2 = 0.2 |
| EAC8 | Execution budget amount | Approved budget |

Table 8 Error of EVM and ESIM prediction results.

| Project name | EAC1 | EAC2 | EAC3 | EAC4 | EAC5 | EAC6 | EAC7 | EAC8 | ESIM prediction | Note |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 10.0% | 17.1% | 6.9% | 16.8% | 9.7% | 15.9% | 15.7% | 5.2% | 8.7% | |
| B | 7.0% | 9.1% | 10.6% | 9.1% | 7.3% | 9.4% | 8.4% | 7.6% | 5.6% | |
| C | 27.0% | 3.2% | 15.9% | 3.2% | 7.3% | 3.2% | 3.3% | 15.9% | 4.1% | |
| D | 9.1% | 6.5% | 23.9% | 7.1% | 9.1% | 7.1% | 5.9% | 24.8% | 5.6% | |
| E | 9.7% | 35.0% | 5.6% | 13.7% | 9.7% | 13.7% | 13.0% | 4.3% | 6.2% | |
| F | 5.0% | 14.0% | 12.9% | 12.3% | 11.6% | 14.4% | 11.4% | 1.7% | 7.6% | |
| G | 3.5% | 14.1% | 17.5% | 13.7% | 14.8% | 40.0% | 13.3% | 6.1% | 3.6% | |
| H | 13.6% | 8.9% | 35.4% | 7.5% | 13.7% | 7.5% | 7.3% | 32.4% | 14.0% | |
| I | 8.7% | 23.1% | 19.4% | 23.1% | 7.6% | 21.7% | 20.7% | 14.2% | 5.0% | |
| J | 3.2% | 11.4% | 7.2% | 9.2% | 5.4% | 9.9% | 8.5% | 2.1% | 4.2% | |
| K | 3.3% | 6.7% | 8.6% | 7.4% | 3.9% | 7.0% | 6.1% | 6.8% | 3.7% | |
| Average | 9.1% | 13.6% | 14.9% | 11.2% | 9.1% | 13.6% | 10.3% | 11.0% | 6.2% | |
| Qualified percentage | 81.8% | 45.5% | 36.4% | 54.5% | 72.7% | 54.5% | 54.5% | 63.6% | 90.9% | |
| L | 7.6% | 21.0% | 23.5% | 19.4% | 12.1% | 19.3% | 18.7% | 18.0% | 7.0% | Testing case |
| M | 4.6% | 16.2% | 12.2% | 16.0% | 13.1% | 21.7% | 15.5% | 9.3% | 1.7% | Testing case |
| Average | 6.1% | 18.6% | 17.9% | 17.7% | 12.6% | 20.5% | 17.1% | 13.7% | 4.4% | |
| Qualified percentage | 100% | 0% | 0% | 0% | 0% | 0% | 0% | 50% | 100% | |

Fig. 6. Processes of applying EAC–ESIM for cost management.
Fig. 7. Cost exception management based on prediction results.

3.1. Identify Significant Factors of Influence

This research identified factors that significantly affect the EAC of construction projects by drawing on several relevant publications; the factors are listed in Table 1. Input parameters of the model were obtained after time-dependent cost factors and performance management concepts were also considered.
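The index definitions in Table 1 translate directly into a small calculation per monthly record. The sketch below is an illustration only: the `MonthlyRecord` class, its field names and the sample figures are hypothetical, and zero denominators (for example, a period in which no cost has yet been incurred) are not handled.

```python
from dataclasses import dataclass


@dataclass
class MonthlyRecord:
    """One monthly cost record; field names are illustrative, not from the paper."""
    days_to_date: float
    revised_duration_days: float
    rainy_days: float
    actual_cost: float
    earned_value: float
    planned_value: float
    budget_at_completion: float
    subcontractor_billed: float
    owner_billed: float
    revised_contract_amount: float
    material_price_index: float
    initial_material_price_index: float


def influence_indices(r):
    """Compute the ten input indices using the definitions in Table 1."""
    return {
        "progress": r.days_to_date / r.revised_duration_days,
        "ACP": r.actual_cost / r.budget_at_completion,
        "EVP": r.earned_value / r.budget_at_completion,
        "CPI": r.earned_value / r.actual_cost,
        "SPI": r.earned_value / r.planned_value,
        "subcontractor_billed_index": r.subcontractor_billed / r.actual_cost,
        "owner_billed_index": r.owner_billed / r.earned_value,
        "change_order_index": r.revised_contract_amount / r.budget_at_completion,
        "CCI": r.material_price_index / r.initial_material_price_index,
        "climate_effect_index": (r.revised_duration_days - r.rainy_days)
        / r.revised_duration_days,
    }


# Hypothetical record (values are not taken from the paper).
record = MonthlyRecord(
    days_to_date=150, revised_duration_days=600, rainy_days=30,
    actual_cost=20.0, earned_value=25.0, planned_value=25.0,
    budget_at_completion=100.0, subcontractor_billed=18.0, owner_billed=20.0,
    revised_contract_amount=110.0, material_price_index=105.0,
    initial_material_price_index=100.0,
)
indices = influence_indices(record)
for name, value in indices.items():
    print(f"{name}: {value:.2f}")
```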
Due to the variety of construction project categories and resultant differences in data inputs, this research focused only on reinforced concrete construction projects in order to control for potential variance in results attributable to construction type. Ten significant influence factors for EAC were calculated as input values based on monthly construction cost data. Prospective cost was designated as the output value.

3.2. Historical Data Collection

The 13 historical cases included in this research were all reinforced concrete projects executed between 2000 and 2007 by one engineering company located in Taipei City. Projects were located chiefly in northern Taiwan, with selection criteria considering data distribution and completeness. Building height ranged from 9 to 17 stories (inclusive of stories belowground). The value of contracts ranged from NT$80 million to NT$1.1 billion. Overall floor area of the cases studied ranged from 3094 m2 to 31,797 m2. Construction durations ranged from 457 to 749 days. Relevant data from the historical cases used are arranged in Table 2.

Monthly cost records for every case were collected and the ten identified factors of significant influence on EAC were calculated as input values in accordance with the definition equations (see Table 3). The 13 historical cases were divided into training and testing groups. The 11 cases in the training group comprised 269 periods and the 2 cases in the testing group comprised 37 periods. Output parameters were normalized by linear scaling. To keep the estimated cost within the range of 0 to 1, the normalization method was revised so that the maximum and minimum of the output parameter were each extended by 10% of its range. The detailed process is shown in Eqs. (2)–(4).

$$X_n = \frac{X_a - X_L}{X_U - X_L} \tag{2}$$

$$X_U = X_{\max} + X_{\mathrm{range}} \times 10\% \tag{3}$$

$$X_L = X_{\min} - X_{\mathrm{range}} \times 10\% \tag{4}$$

where
X_n: output parameter after normalization, in the range between 0 and 1
X_a: output parameter before normalization
X_U: upper bound of the output parameter
X_L: lower bound of the output parameter
X_max: maximum of the output parameter
X_min: minimum of the output parameter
X_range: difference between maximum and minimum

Table 9 Cost influencing indices.

| Index name | Definition | Criterion |
|---|---|---|
| Cost performance index | Earned value / actual cost | ≥ 1.05 |
| Schedule performance index | Earned value / planned value | ≥ 1 |
| Subcontractor billed index | Billed amount / actual cost | ≥ 1, ≤ 1.1 |
| Owner billed index | Billed amount / earned value | ≥ 0.9 |
| Change order index | Revised contract amount / budget at completion | ≤ 1.1, ≥ 0.9 |
| Construction cost index | Construction material price index of that month / construction material price index of initial stage | Progress < 30%: ≤ 1.02; progress < 60%: ≤ 1.03 |
| Climate effect index | (Revised project duration − no. of rainy days) / revised project duration | ≥ 1 − 0.2 × progress percentage |

Table 10 Monthly cost information.

| Item | Definition |
|---|---|
| Estimate at completion percentage (EACP) | Predicted EAC of ESIM / original budget amount |
| Actual cost percentage (ACP) | Actual paid of cost / original budget amount |
| Earned value percentage (EVP) | Budget cost at completion / original budget amount |
| Planned value percentage (PVP) | Budget cost of planning / original budget amount |
| Contract billed percentage (CBP) | Proprietor pricing amount / original contracted amount |

Table 11 Comparison of cost variance.

| Variance | Definition | Illustration |
|---|---|---|
| Budget | Revised budget % (RBP) − estimate at completion % (EACP) | > 0: cash balance of budget. < 0: not enough in budget, exception management |
| Schedule | Earned value % (EVP) − planned value % (PVP) | > 0: exceed in progress. < 0: delay in progress, overtime planning |
| Cost | Earned value % (EVP) − actual cost % (ACP) | > 0: cash balance of contract, not pricing by subcontractor. < 0: overspend of contract, missed list, not transacting budget increase |
| Contract billed | Contract billed % (CBP) − actual cost % (ACP) | > 0: cash balance of amount, large quantities of payment, subcontractor not valuating. < 0: contract increased or constructed without valuation, contract decreased and not revised, over amount |

3.3. Model Training

Training of the ESIM training module began after parameter selection. A total of 100 generations were searched over a period of 1.0167 min. The fault-tolerant parameter C was 20. The kernel function parameter γ equalled 0.1. The Root Mean Square (RMS) error of the model obtained from the optimum chromosome, which describes how well the model fits the data, equalled 0.0559.

3.4. Model Testing

The ESIM performance assessment module was employed after completion of training. ESIM decoded the optimum chromosome as the EAC prediction model so that Model Testing could reveal prediction error and learning accuracy. Model Testing included testing of training error and case verification. The model's rules were learned by training on historical case data. The model was established utilizing ESIM to search for training cases that possessed consistency between inference output and actual output. After training, model accuracy was tested by comparing output results with actual values. The evaluation criterion used in this research is shown in Eq. (5). The model was qualified when the error fulfilled selected requirements.

$$e_i = \frac{\left| \mathrm{EACP}_i^{\mathrm{ESIM}} - \mathrm{AEACP} \right|}{\mathrm{AEACP}} \times 100\% \tag{5}$$

where
e_i: error percentage of the ith period predicted by ESIM
EACP_i^ESIM: estimate at completion percentage of the ith period predicted by ESIM
AEACP: actual estimate at completion percentage of the case

3.4.1. Testing of training error

The 269 periods collected from 11 cases were input into the database. An estimated value of prospective cost percentage for each case could be obtained via the ESIM performance assessment module. The estimated value is termed ‘Predicted output’.
The actual value of the prospective cost percentage for each case is termed ‘Actual output’. Fig. 3 shows the relationship between Predicted output and Actual output produced by ESIM training. EACP (see Eq. (2)) could be obtained by de-normalizing the Predicted output, and the training error percentage could be calculated by comparing it with the corresponding percentage of actual cost at completion. During construction, managers are most concerned with the development trend and its influence on project decision making. To determine model accuracy after training and meet the practical needs of managers, this research selected a 10% average error as the qualification ceiling. The qualified rate was 91% for the 11 training cases in this research, as shown in Table 4.

Fig. 8. Time series of the cost influencing indices for case L.

Fig. 9. Cost management diagram of case L.

3.4.2. Verification of testing cases

A total of 269 periods collected from 11 cases were input into the database. An estimated value of prospective cost percentage for each case could be obtained using the ESIM performance assessment module. As above, the estimated value is termed ‘Predicted output’, while the actual value of prospective cost percentage for each case is termed ‘Actual output’. Fig. 4 shows the relationship between Predicted output and Actual output produced by ESIM training. After de-normalizing the Predicted output, EACP (see Eq. (2)) could be obtained and the percentage of the training error could be calculated through a comparison with the percentage of actual cost at completion. The main purpose of verification is to examine whether the model trained by ESIM can be employed to infer or predict cases beyond the training cases. Data for two testing cases (Cases L and M) were input into the ESIM performance assessment module, with results shown in Table 5.
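The de-normalization and error check described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes that Eq. (2) is the min-max form with the 10% margins of Eqs. (3) and (4), that Eq. (5) is the absolute error taken as a percentage of the actual value, and that a case qualifies when its average error is at or below the 10% ceiling. All function names and sample numbers are hypothetical.

```python
def bounds(values, margin=0.10):
    """Return (X_L, X_U) per Eqs. (3) and (4): the observed range widened by 10% on each side."""
    x_min, x_max = min(values), max(values)
    x_range = x_max - x_min
    return x_min - margin * x_range, x_max + margin * x_range


def normalize(x_a, x_l, x_u):
    """Eq. (2): map a raw output X_a into the interval between 0 and 1."""
    return (x_a - x_l) / (x_u - x_l)


def denormalize(x_n, x_l, x_u):
    """Inverse of Eq. (2): recover the raw value from a normalized prediction."""
    return x_n * (x_u - x_l) + x_l


def error_percent(predicted_eacp, actual_eacp):
    """Eq. (5): absolute error of one period's prediction, as a percentage of the actual value."""
    return abs(predicted_eacp - actual_eacp) / actual_eacp * 100.0


def is_qualified(errors, ceiling=10.0):
    """Treat a case as qualified when its average error does not exceed the ceiling (assumed <=)."""
    return sum(errors) / len(errors) <= ceiling


# Hypothetical numbers for illustration only; they are not taken from the paper's cases.
training_outputs = [0.90, 1.00, 1.10]        # raw EACP values seen in training
x_l, x_u = bounds(training_outputs)
normalized_predictions = [0.55, 0.45, 0.50]  # normalized model outputs per period
actual_eacp = 1.00                           # actual EACP of the case

predicted = [denormalize(p, x_l, x_u) for p in normalized_predictions]
errors = [error_percent(p, actual_eacp) for p in predicted]
print([round(e, 2) for e in errors], is_qualified(errors))
```

Keeping the bounds separate from the normalization step means the same `x_l` and `x_u` found on the training outputs can be reused to de-normalize predictions for new cases.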
Comparing the predicted values of EAC with actual values, average errors were 7.02% and 1.68%, respectively. Both results were considered ‘qualified’ under the definition set above (Table 6). Figs. 4 and 5 were drawn in order to perform further analysis for case L, which returned a comparatively large error. From the 11th period to the 15th period, the error values, between 14.45% and 19.98%, were noticeably large. A possible reason is a rapid revision of the project budget during this period, which would make the data unstable and cause larger error values. Revisions and updates of the project budget are not rare in practice for construction projects, with or without change orders. In this study, the historical cases used for Model Training implicitly included the uncertainty of change orders and replanning. This is why the study uses AI to handle the uncertainty. To identify the impacts of change orders on EAC, a real case with a significant change order, i.e. Project L, was selected for Model Testing. The results showed that the prediction error increased for a period of time after the change order (but the error was still tolerable). Overall, the prediction converged to the actual result in the end.

4. Comparison of EAC–ESIM prediction with 8 EVM methods

To demonstrate that the EAC prediction model proposed in this research is feasible and reliable, EAC values were also calculated using eight other EVM methods [16]. These values are compared with the predictions generated by the ESIM (Table 7) to assess comparative accuracy. This research selected an error of 10% as the qualification criterion, and calculated the average error and the qualified percentage for each prediction method. Results of the comparison are shown in Table 8 and discussed below:
1. No single EVM method predicts EAC at a consistently high level of accuracy. Prediction results varied with differences in project details.
For the cases studied in this research, EAC1 and EAC5 attained better average error and qualified rates.
2. ESIM predictions showed larger differences in accuracy during the initial 30% of the project work schedule. This may be attributable to data instability in the initial stages and the influence of design changes. However, the prediction results attained by ESIM are more precise than those of the EVM methods.
3. The predictive error of ESIM is comparatively steady. Qualified rates of training and testing cases were 91% and 100%, respectively. Both rates were larger than those of the EVM methods. This demonstrates the feasibility of EAC predictions using ESIM.

5. Applications of EAC–ESIM to construction management

5.1. Applying EAC–ESIM for cost management

After the EAC prediction model using ESIM was established, the feasibility of the model needed to be proven using actual projects. A prediction could be made once data had been collected and formatted to model requirements. As the prediction model must address real project aspects, the procedure used to apply the EAC prediction model was designed as shown in Fig. 6, in accordance with the situations of each construction stage.

Table 12 Cost influencing indices analysis of case L.

| Index name | Value | Illustration |
| --- | --- | --- |
| Cost performance index | 1.17 | ≧ 1.05, good cost management, but the ratio is slightly high. |
| Schedule performance index | 0.96 | ≦ 1, slight delay in progress, and the trend is increasing. |
| Subcontractor billed index | 1.05 | ≧ 1, ≦ 1.1, normal. |
| Owner billed index | 0.90 | ≧ 0.9, normal. |
| Change order index | 1.03 | ≦ 1.1, normal. |
| Construction cost index | 1.09 | ≧ 1.03, slightly high, but the main decoration engineering items are contracted. |
| Climate effect index | 0.83 | ≦ 0.86, normal. |

5.2. Cost exception management based on prediction results

An effective prediction model establishes a response system able to identify factors that significantly influence the EAC at different project stages.
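As a rough illustration of the exception check that this section builds on, the four variances of Table 11 can be computed directly from the monthly percentages. The sketch below is an example under stated assumptions rather than the authors' tool: the function and key names are hypothetical, the inputs are the case L percentages reported in Table 13, and a negative variance is treated as the unfavorable direction, following the illustrations in Table 11.

```python
def cost_variances(rbp, eacp, acp, evp, pvp, cbp):
    """Return the four Table 11 variances, in percentage points."""
    return {
        "budget": rbp - eacp,           # revised budget % minus estimate at completion %
        "schedule": evp - pvp,          # earned value % minus planned value %
        "cost": evp - acp,              # earned value % minus actual cost %
        "contract_billed": cbp - acp,   # contract billed % minus actual cost %
    }


def flag_exceptions(variances):
    """List the variances that are negative, i.e. the ones Table 11 describes as unfavorable."""
    return [name for name, value in variances.items() if value < 0]


# Case L percentages as reported in Table 13.
case_l = cost_variances(rbp=104, eacp=97, acp=57, evp=66, pvp=69, cbp=60)
print(case_l)                   # {'budget': 7, 'schedule': -3, 'cost': 9, 'contract_billed': 3}
print(flag_exceptions(case_l))  # ['schedule']
```

The computed differences match the +7%, −3%, +9% and +3% listed for case L, and only the schedule variance is flagged.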
The predicted trend acquired from the ESIM can then be compared with project cost influencing indices and project cost diagrams. Finally, revised proposals for deviations may be addressed and tracked continuously in order to control costs effectively. The EAC prediction provides a key reference to construction managers by analyzing the cost information over the course of the project. Furthermore, when an EAC prediction exceeds the approved budget, managers may assess, term by term, the factors that influence cost and consider the various potential reasons for a cost overrun, allowing them to identify potential problems and control costs.

Table 13 Cost diagrams analysis of case L.

| Item | RBP | EACP | ACP | EVP | PVP | CBP |
| --- | --- | --- | --- | --- | --- | --- |
| Value | 104% | 97% | 57% | 66% | 69% | 60% |

| Analysis | Difference | Illustration |
| --- | --- | --- |
| RBP > EACP | +7% | Possible to decrease budget. |
| EVP < PVP | −3% | Delay in progress, and the trend is increasing. |
| EVP > ACP | +9% | Cash balance of contract or not pricing by subcontractor. |
| CBP > ACP | +3% | Cash balance of amount, but the ratio reducing, indicates that the proprietor adds items without pricing. |

Reasons
1. Addendum of structural engineering revised in the eleventh month with an evaluated delay of three months.
2. Current contract cash balance: 5%, but the cash balance decrease was not transacted while the budget increased in the eleventh month.
3. Items not yet priced by the subcontractor: approximately 2%.
4. Fits with estimation results.

Training and convergence testing demonstrated model fitness for use. This research uses the Variance at Completion percentage (VACP, defined in Eq. (6)) as the criterion for exception management. The application procedure of EAC management is illustrated in Fig. 7.

$$VACP = RBP - EACP \qquad (6)$$

where
$RBP$: Revised Budget Percentage

5.2.1. Project cost influencing indices analysis

Indices can monitor and identify project deviations effectively and rapidly.
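To make the idea of index monitoring concrete, the following sketch checks a set of index values against fixed criteria. It is a hedged example only: the thresholds are one reading of the criteria listed for the cost influencing indices, the progress-dependent criteria (construction cost index and climate effect index) are deliberately left out, the values are those reported for case L in Table 12, and the dictionary keys and function name are hypothetical.

```python
import math

# (lower bound, upper bound) for each index; math.inf means no upper bound.
# These bounds are an assumed reading of the criteria table, not a definitive rule set.
CRITERIA = {
    "cost_performance": (1.05, math.inf),
    "schedule_performance": (1.0, math.inf),
    "subcontractor_billed": (1.0, 1.1),
    "owner_billed": (0.9, math.inf),
    "change_order": (0.9, 1.1),
}


def out_of_range(indices, criteria=CRITERIA):
    """Return the names of the indices whose values fall outside their criterion."""
    flagged = []
    for name, value in indices.items():
        low, high = criteria[name]
        if not (low <= value <= high):
            flagged.append(name)
    return flagged


# Index values reported for case L in Table 12.
case_l_indices = {
    "cost_performance": 1.17,
    "schedule_performance": 0.96,
    "subcontractor_billed": 1.05,
    "owner_billed": 0.90,
    "change_order": 1.03,
}
print(out_of_range(case_l_indices))  # ['schedule_performance']
```

Only the schedule performance index is flagged, which agrees with the slight delay in progress noted for case L.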
However, factors that influence project cost are complicated, and the definitions and criteria of indices are not settled in research done in relevant fields (see the list of literature reviewed in Table 9).

5.2.2. Project cost diagrams analysis

Indices that influence project cost can be provided to managers to help them control prospective costs and investigate potential cost management problems pre-emptively. Nevertheless, overall construction cost trends cannot be determined by analyzing project cost indices alone, as cost problems may display abnormalities in some situations. Thus, project cost diagram analysis is used as a supplementary method in this research. This research used original contract documents to define the scope of comparative data. Project cost diagrams were drawn on a monthly basis from EAC prediction results and the data items shown in Table 10. Project schedule tendencies and costs may be obtained by comparing and analyzing variations in cost information (Table 11). Appropriate early warnings may therefore be heeded and operations enforced in accordance with project characteristics. Furthermore, comprehensive inspection and strategy direction may be provided.

5.2.3. Application of an actual example case

Example Case L: Time series for cost influencing indices and cost management are shown in Figs. 8 and 9, respectively. Moreover, this research selects the time point at which project progress reaches 71.16% (Project-L-12) to analyze indices of project cost (see Table 12) and project cost diagrams (Table 13 and Fig. 10). Prediction results were verified by comparing analysis results with the project completion report, leading us to conclude that the results are feasible when applied to the effective management and control of project costs.

Fig. 10. Cost diagram of case L.

6. Conclusion

This research proposed an EAC prediction method using ESIM that employs fmGA and SVM. Results obtained in this paper are summarized as follows:
1. Research established an EAC prediction method utilizing ESIM that identifies significant factors of influence on project cost and performs predictions by collecting and arranging experience-based rules from historical cases. ESIM results represent a significant improvement over results obtainable using traditional EAC prediction methods.
2. The EAC prediction model established in this research is considerably accurate. Following training and testing, the only inputs that need to be entered into the model are the key factors influencing costs during the current month. A significant disadvantage in traditional construction projects, namely that cost tendencies cannot be predicted in real time, has been effectively remedied.
3. ESIM prediction results were compared with eight common EVM prediction methods. Values obtained by the EVM methods were relatively unstable due to the wide variance in data among projects. Conversely, the model developed in this research generated comparatively steady prediction values. This verified the feasibility of using ESIM to predict construction project EACs.
4. Prediction results were analyzed further using project cost influencing indices and project cost diagrams. This helped identify the causes underlying EAC differences and trends. Results help managers to control project costs in real time.

References

[1] D. Bolles, S. Fahrenkrog (Eds.), A Guide to the Project Management Body of Knowledge (PMBOK Guide), 3rd Ed, Project Management Institute, Newtown Square, 2004.
[2] G. Clifford, E. Larson, Project Management: the Complete Guide for Every Manager, McGraw-Hill, New York, 2002.
[3] G.A. Barraza, W.E. Back, F. Mata, Probabilistic forecasting of project performance using stochastic S curves, Journal of Construction Engineering and Management 130 (1) (2004) 25–32.
[4] D.E. Lee, Probability of project completion using stochastic project scheduling simulation, Journal of Construction Engineering and Management 131 (3) (2005) 310–318.
[5] D.E. Lee, D. Arditi, Automated statistical analysis in stochastic project scheduling simulation, Journal of Construction Engineering and Management 132 (3) (2006) 268–277.
[6] B.C. Kim, K.F. Reinschmidt, Probabilistic forecasting of project duration using Bayesian inference and the beta distribution, Journal of Construction Engineering and Management 135 (3) (2009) 178–186.
[7] E.H. Kim, W.G. Wells Jr., M.R. Duffey, A model for effective implementation of earned value management methodology, International Journal of Project Management 21 (5) (2003) 375–382.
[8] D.F. Cioffi, Designing project management: a scientific notation and an improved formalism for earned value calculations, International Journal of Project Management 24 (2) (2006) 136–144.
[9] S. Vandevoorde, M. Vanhoucke, A comparison of different project duration forecasting methods using earned value metrics, International Journal of Project Management 24 (4) (2006) 289–302.
[10] G. Vitner, S. Rozenes, S. Spraggett, Using data envelope analysis to compare project efficiency in a multi-project environment, International Journal of Project Management 24 (4) (2006) 323–329.
[11] M. Vanhoucke, S. Vandevoorde, A simulation and evaluation of earned value metrics to forecast the project duration, Journal of the Operational Research Society 58 (10) (2007) 1361–1374.
[12] W. Lipke, O. Zwikael, K. Henderson, F. Anbari, Prediction of project outcome: the application of statistical methods to earned value management and earned schedule performance indices, International Journal of Project Management 27 (4) (2009) 400–407.
[13] M. Plaza, O. Turetken, A model-based DSS for integrating the impact of learning in project control, Decision Support Systems 47 (4) (2009) 488–499.
[14] S.S. Leu, Y.C. Lin, Project performance evaluation based on statistical process control techniques, Journal of Construction Engineering and Management 134 (10) (2008) 813–819.
[15] H.L. Stephenson, Identifying risks and opportunities using EAC, Proc. 48th AACE International Annual Meeting '04, AACE International Transactions, 2004, pp. CSC.06.1–CSC.06.9.
[16] S. Alexander, Earned Value Management Systems (EVMS): Basic Concepts, Project Management Institute, Washington DC, 2002.
[17] V.N. Vapnik, The nature of statistical learning theory, Springer, New York, 1995.
[18] C.W. Hsu, C.J. Lin, A simple decomposition method for support vector machine, Machine Learning 46 (1–3) (2002) 219–314.
[19] C.F. Lin, Fuzzy support vector machines, Ph.D. Thesis, Department of Electrical Engineering, National Taiwan University, 2004.
[20] D.E. Goldberg, K. Deb, H. Kargupta, G. Harik, Rapid, accurate optimization of difficult problems using fast messy genetic algorithms, Proc. 5th International Conference on Genetic Algorithms '93, Morgan Kaufmann Pub. Inc, San Mateo, 1993, pp. 56–64.
[21] R. Day, J. Zydallis, G. Lamont, Competitive template analysis of the fast messy genetic algorithm when applied to the protein structure prediction problem, Proc. 2nd ICCN '02, Computational Publications, Cambridge, 2002, pp. 36–39.
[22] C.W. Feng, H.T. Wu, Integrating fmGA and CYCLONE to optimize the schedule of dispatching RMC trucks, Automation in Construction 15 (2) (2006) 186–199.
[23] H. Drucker, C.J.C. Burges, L. Kaufman, A. Smola, V. Vapnik, Support vector regression machines, Proc. 10th Annual Conference on NIPS '96, Advances in Neural Information Processing Systems, vol. 9, MIT Press, Cambridge, 1997, pp. 155–161.
[24] C.C. Chang, C.J. Lin, Training nu-support vector classifiers: theory and algorithms, Neural Computation 13 (9) (2001) 2119–2147.
[25] D. Knjazew, G.A. Ome, A Competent Genetic Algorithm for Solving Permutation and Scheduling Problems, Kluwer Academic Publishers, Boston, 2003.