TOWARDS AN EVOLUTIONARY MODEL GENERATION FOR ERP PERFORMANCE SIMULATION

Daniel Tertilt, Stefanie Leimeister
Fortiss – Research Institute at Technische Universitaet Muenchen, Guerickestrasse 25, 80805 Muenchen, Germany

Stephan Gradl, Manuel Mayer, Helmut Krcmar
Chair for Information Systems, Technische Universitaet Muenchen, Boltzmannstraße 3, 85748 Garching, Germany

ABSTRACT

The performance of ERP systems is a critical success factor for the reliable operation of a business. A promising approach to cope with the complexity of today's ERP systems and to predict their actual behavior is simulation. Commercial ERP systems, however, provide only limited insight into their internals, so several components have to be handled as black boxes and require a modeling approach. In this paper we present an approach to increase the accuracy of ERP system performance simulation by using an evolutionary algorithm to model the performance behavior of these black boxes. We show that evolutionary algorithms can generate performance models for ERP components from measured performance data, and that these models describe the performance behavior of the components accurately. Furthermore, we point out the characteristics of the algorithm as well as its advantages and disadvantages, and give an outlook on future research.

KEYWORDS

Performance modeling, performance simulation, ERP, evolutionary algorithm

1. INTRODUCTION

The performance of an enterprise resource planning (ERP) system is a business-critical non-functional requirement (Schneider, 2006) and depends strongly on the infrastructure the system is hosted on. Bögelsack et al. (2010) showed that changes in the infrastructure can significantly influence the performance and consequently the usability of an ERP system. Performance predictability for ERP systems is thus very desirable, as it allows performance problems to be handled before they occur (Balsamo et al., 2004), thereby significantly reducing the risk of infrastructure changes.
At the same time, performance prediction is hard to achieve due to the complexity of modern ERP systems (Anderson and Mißbach, 2005). Often this complexity is managed by analyzing the internal structure of the ERP system (white-box approach), identifying the correlations between the internal components, and predicting performance by simulation. Every simulation, however, contains "smallest elements": components whose internals cannot be inspected (e.g., to protect intellectual property) and that therefore have to be handled as black boxes. The performance behavior of these black boxes is often modeled using simple mathematical functions, such as the mean of the measured performance data (see e.g. Woodside, 2002), thereby ignoring the characteristics of the underlying infrastructure. As Noblet et al. (2004) state, simulation results can only be valid and trusted if the simulation is as close to reality as possible. Our aim is to improve simulation results by increasing the accuracy of the black-box performance models. To this end, we develop an approach that models the performance behavior of these black boxes using an evolutionary algorithm (Zitzler and Thiele, 1999) performing a multi-objective optimization (Zitzler and Thiele, 1998) on measured performance data. In contrast to exact mathematical or algorithmic modeling, the evolutionary approach promises usable approximations even of multidimensional models within an acceptable time span (Gwozdz and Szlachcic, 2009), making it possible to consider multiple factors for the response-time prediction of a black box.

2. RELATED WORK

The approach presented in this paper is intended to enhance the interface between ERP performance measurement and ERP performance simulation. The most closely related work is shown in Table 1, which also indicates which subjects each document deals with. The two documents covering both modeling and simulation describe a hybrid approach, as is developed in this paper.

Table 1.
Related work concerning performance measurement, modeling and simulation

Document                   Measurement   Modeling   Simulation
Bögelsack et al. (2008)    No            No         Yes
Pllana et al. (2008)       No            Yes        Yes
Jehle (2009)               Yes           No         No
Gradl et al. (2009)        No            No         Yes
Rolia et al. (2009)        No            Yes        No
Kraft et al. (2009)        No            Yes        Yes
Bögelsack et al. (2010)    Yes           No         No

3. ERP COMPONENT PERFORMANCE MODELING

3.1 Structure of the Algorithm

Modeled on natural evolution, the evolutionary algorithm sets up a population of competing threads, each trying to generate a model that best matches the given set of measured performance data of the ERP black-box component. Selection of the fittest is done by competition, and the population evolves by constantly passing better models to threads that lost competitions and by mutating these models. Competition in this context means that the fitness values of two threads (defined by the fitness function described later) are compared; the thread with the better fitness value wins the competition.

Fig. 1. Schematic representation of the functionality of the evolutionary algorithm: the central component creates the population, fetches opponents for competitions, and receives the reported results.

In order to create the performance model, several components have to be introduced. First, a central component is created at startup. It manages the population of threads that generate the performance model: it creates the threads at startup and selects an available opponent whenever a thread is ready to compete against another thread. Furthermore, the central component monitors the result of each competition to check whether the end criterion is reached. The population size is restricted by the resources of the system the algorithm is executed on; a larger population results in higher parallelization of the evolution. Figure 1 is a schematic sketch of the functionality of the algorithm.

3.2 Model Representation

Each thread stores a representation of its current model in memory.
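As an illustration of this life cycle, the following is a minimal, sequential sketch of the competition scheme. The actual algorithm runs concurrent threads and object-tree models; the names (`Individual`, `compete`, `evolve`), the single-coefficient stand-in model, and all numeric settings here are our own simplifying assumptions, not the authors' prototype.

```python
import random

class Individual:
    """Stand-in for one thread of the population; here the whole 'model'
    is a single coefficient instead of an object tree."""
    def __init__(self, coeff):
        self.coeff = coeff

    def error(self, data):
        # mean relative deviation between measured and modeled response times
        return sum(abs(r - self.coeff * x) / r for x, r in data) / len(data)

def compete(a, b, data):
    """Compare fitness; the loser receives a mutated copy of the winner's model."""
    winner, loser = (a, b) if a.error(data) <= b.error(data) else (b, a)
    loser.coeff = winner.coeff + random.gauss(0.0, 0.1)  # pass model, then mutate
    return winner

def evolve(data, population_size=8, error_limit=0.01, max_rounds=10_000, seed=1):
    """Central component: create the population, pair opponents, check the end criterion."""
    random.seed(seed)
    population = [Individual(random.uniform(0.0, 5.0)) for _ in range(population_size)]
    for _ in range(max_rounds):
        a, b = random.sample(population, 2)   # fetch an available opponent
        winner = compete(a, b, data)
        if winner.error(data) < error_limit:  # end criterion reached
            return winner
    return min(population, key=lambda ind: ind.error(data))
```

For measurements following r = 2x, `evolve` converges toward a coefficient near 2, illustrating how repeated competition plus mutation hill-climbs toward the measured data.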
This model is represented by an object tree with method objects as nodes and fixed-value or variable objects as leaves. A method object represents a mathematical operator. Mathematical operators can be binary operators like addition, subtraction, multiplication, division and power, but also unary operators like sine or cosine. Method objects are inner nodes of the tree, as they require parameters: either further method objects, or fixed-value or variable objects. Fixed-value objects are leaves that are set to a fixed numeric value. Variable objects are leaves that, at evaluation time, are set to the parameter values of the measured performance data used as the basis for modeling; they represent the variables in the model. Figure 2 shows the representation of an exemplary model p(X1, X2) = X1/a + X2 to illustrate the structure of the model as an object tree.

Fig. 2. Example of the representation of a mathematical model as an object tree. The nodes add and div are method objects, X1 and X2 are leaves and variable objects, and a is also a leaf and a fixed-value object (a represents a number in this example).

3.3 Fitness Function

During a competition, the models of the two opposing threads are evaluated. For every entry of the available performance data, the performance parameters (such as number of parallel users, size or type of request) are assigned to the variable objects, and the modeled response time is calculated. Based on the relative deviation for every given performance data entry, an error index sErr is calculated as the sum of the relative deviations between every available measured value and the corresponding value calculated by the mathematical model, divided by the number of available performance data entries. Formula 1 shows how sErr is computed:

sErr = (1/n) · Σ_{i=1}^{n} |r_measured,i − r_modeled,i| / r_measured,i
(1)

In this formula, n is the number of available performance data entries, r_measured,i the measured response time for data entry i, and r_modeled,i the modeled response time for that data entry. The thread with the lower error index sErr wins the competition and passes its model to the losing thread. We chose this fitness function because it accurately describes the distance between the model and the measured values and can be calculated efficiently. Its drawback is that it allows large deviations at some points as long as other points are modeled very close to the measured values. The evaluation of other fitness functions will be part of the algorithm optimization process.

3.4 Model Passing and Mutation

Besides competition, inheritance is the second important factor of evolution. After the fitter thread has been identified by competition, it has to pass its model to the losing thread in a way that preserves a high chance of keeping the positive characteristics of the model while still leaving a considerable chance of improving the model by mutation. Passing a model from the winning thread to the losing thread is done by deep-cloning the object tree representing the model and replacing the model stored by the loser. After the model has been passed, the losing thread mutates its new model. In our approach, with a given probability either a fixed-value object, a variable object or a method object is chosen for mutation. If a fixed-value object is selected, a random value is added to its value. The selection of a variable object results in the allocation of a random performance parameter to this object. If a method object is chosen, its mathematical operator is replaced by a randomly selected operator with the same number of parameters; the parameters of the method stay the same. The optimal probabilities for mutating a fixed-value, variable or method object will be analyzed in experiments and future case studies.
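To make the model representation and the fitness function concrete, here is a hypothetical sketch in Python. The class names (`Fixed`, `Var`, `Method`) and the environment-dictionary evaluation are our own assumptions, but the example tree mirrors Figure 2 and `s_err` implements Formula 1.

```python
import math

class Fixed:
    """Leaf: a fixed numeric value."""
    def __init__(self, value):
        self.value = value
    def evaluate(self, env):
        return self.value

class Var:
    """Leaf: bound to a performance parameter at evaluation time."""
    def __init__(self, name):
        self.name = name
    def evaluate(self, env):
        return env[self.name]

class Method:
    """Inner node: a mathematical operator applied to child subtrees."""
    OPS = {
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "mul": lambda a, b: a * b,
        "div": lambda a, b: a / b,
        "pow": lambda a, b: a ** b,
        "sin": math.sin,  # unary operators take a single child
        "cos": math.cos,
    }
    def __init__(self, op, *children):
        self.op = op
        self.children = children
    def evaluate(self, env):
        return self.OPS[self.op](*(c.evaluate(env) for c in self.children))

def s_err(model, entries):
    """Formula 1: mean relative deviation over all n performance data entries."""
    return sum(abs(r - model.evaluate(env)) / r for env, r in entries) / len(entries)

# The object tree of Figure 2, p(X1, X2) = X1/a + X2, here with a = 4:
model = Method("add", Method("div", Var("X1"), Fixed(4.0)), Var("X2"))
```

Evaluating this tree with X1 = 8 and X2 = 3 yields 8/4 + 3 = 5.0; feeding measured pairs of parameter assignments and response times into `s_err` yields the error index used in every competition.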
When performing continuous optimization, there is a risk of getting stuck in a local optimum (Rocha and Neves, 1999). We mitigate this problem by "re-bearing" every thread that has lost 10,000 competitions in a row, which re-initializes the model of that thread. The threads themselves keep track of the number of failed competitions; when the limit is reached, they trigger the re-initialization, drop the existing model, and create a new one from scratch. Even when the optimization is already advanced, this rebirth opens a way to leave local optima and find the global one.

3.5 End Criterion

As the end criterion, an error index limit has to be defined. The error index is a significant indicator of a model's distance to the measured data, and since it is calculated in every competition, the end criterion can be checked without additional effort. Choosing the error index as the end criterion, however, involves the disadvantage mentioned before: it allows large deviations for some data entries if the majority of the data entries are modeled very exactly. This might lead to unacceptable prediction errors if the measured performance data is not equally distributed, causing the algorithm to stop at a model that is not usable. This disadvantage can be addressed by improving the fitness function.

4. PRELIMINARY RESULTS

A prototype of the evolutionary algorithm has been implemented and applied to the measured data of an SAP benchmark as sample data (Jehle, 2009). A subset of the measured data was used for modeling, while the rest served to validate the model. First results show that the approach delivers usable results (average error of less than 3%) when the data used for modeling is equally distributed, while the error becomes large if the data is unbalanced. Future improvements such as weighting the input data will be necessary to remove the requirement of equally distributed input data.
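The model passing and mutation step of Section 3.4 can be sketched as follows. The nested-list encoding (method nodes as lists, leaves as tuples), the helper names, and the default `p_method=0.05` (reflecting the 95%/5% split our experiments in this section arrive at) are our own assumptions; for simplicity, all operators offered for swapping here are binary, matching the two-parameter nodes of the example.

```python
import copy
import random

def node_paths(tree, path=()):
    """Paths of all nodes; a list is a method node ["op", child, ...], a tuple a leaf."""
    paths = [path]
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            paths.extend(node_paths(child, path + (i,)))
    return paths

def get_node(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def pass_and_mutate(winner_tree, variables, operators, p_method=0.05):
    """Deep-clone the winner's model, then mutate one node of the clone.
    Assumes the root is a method node and all operators share the same arity."""
    clone = copy.deepcopy(winner_tree)  # model passing by deep cloning
    paths = node_paths(clone)
    methods = [p for p in paths if isinstance(get_node(clone, p), list)]
    leaves = [p for p in paths if not isinstance(get_node(clone, p), list)]
    if random.random() < p_method:
        node = get_node(clone, random.choice(methods))
        node[0] = random.choice(operators)  # swap the operator, keep the parameters
    else:
        path = random.choice(leaves)
        parent = get_node(clone, path[:-1])
        kind, value = get_node(clone, path)
        if kind == "fix":
            parent[path[-1]] = ("fix", value + random.gauss(0.0, 1.0))  # nudge the value
        else:
            parent[path[-1]] = ("var", random.choice(variables))  # reassign the variable
    return clone
```

Note that the winner's tree is never modified: the deep clone preserves the positive characteristics of the model, while the single random edit keeps the tree shape intact and gives the loser a chance to improve on it.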
Another important factor for the efficiency of the presented approach is the configuration of the evolutionary algorithm, especially of the mutation. First experiments we conducted on the SAP benchmark data showed that selecting a fixed-value or variable object in 95% of all mutations and a method object in the remaining 5% results in a fast evolution with reliable convergence to an optimum. For the relatively small set of input data of around 200 entries, the prototype returned a usable model after around two to three minutes when hosted on two Intel Core2 Duo machines (1.6 and 3 GHz, both with 4 GB RAM). The scalability of the evolutionary algorithm itself will have to be tested in future case studies.

5. CONCLUSION

The prototypical implementation of the evolutionary algorithm showed the feasibility of the presented approach. Furthermore, the first case study demonstrated the efficiency, but also the difficulties, of the evolutionary algorithm. As next steps, further, more complex case studies will be performed. For this, we will develop an interface for integrating the generated models into an LQN simulation of an SAP system (Gradl et al., 2009). Executing the simulation first with the traditional black-box modeling and afterwards with the generated models will deliver comparable results. In parallel, the prototype will be extended and optimized. A literature review on the vehicle routing problem, a field where evolutionary algorithms have been applied for many years, revealed the complexity of possible configurations and modifications of the algorithm. Using the LQN simulation, we will analyze different configurations and develop an optimal algorithm for the field of ERP performance simulation.

REFERENCES

Anderson, G. W. and Mißbach, M., 2005. Last-Testing und Performance-Tuning. SAP Press, Bonn, Germany.

Balsamo, S. et al., 2004. Model-based performance prediction in software development: A survey. IEEE Transactions on Software Engineering, Vol. 30, No. 5, pp.
295-310.

Bögelsack, A. et al., 2008. An Approach to Simulate Enterprise Resource Planning Systems. In: Ultes-Nitsche, U., Moldt, D. and Augusto, J. C. (eds.) 6th International Workshop on Modelling, Simulation, Verification and Validation of Enterprise Information Systems, MSVVEIS-2008, in conjunction with ICEIS 2008. Barcelona, Spain: INSTICC Press.

Bögelsack, A. et al., 2010. Performance Overhead of Paravirtualization on an Exemplary ERP System. 12th International Conference on Enterprise Information Systems. Funchal, Madeira, Portugal.

Gradl, S. et al., 2009. Layered Queuing Networks for Simulating Enterprise Resource Planning Systems. In: Moldt, D., Augusto, J. C. and Ultes-Nitsche, U. (eds.) 7th International Workshop on Modelling, Simulation, Verification and Validation of Enterprise Information Systems, MSVVEIS-2009, in conjunction with ICEIS 2009, May 2009, Milan, Italy. INSTICC Press, pp. 85-92.

Gwozdz, P. and Szlachcic, E., 2009. An Adaptive Selection Evolutionary Algorithm for the Capacitated Vehicle Routing Problem. In: Logistics and Industrial Informatics (LINDI 2009), 2nd International Conference, 10-12 Sept. 2009, pp. 1-6.

Jehle, H., 2009. Performance-Messung eines Portalsystems in virtualisierter Umgebung am Fallbeispiel SAP. In: Arndt, H.-K. and Krcmar, H. (eds.) CVLBA Workshop 2009, 3. Workshop des Centers for Very Large Business Applications (CVLBA). Magdeburg, Germany.

Kraft, S. et al., 2009. Estimating service resource consumption from response time measurements. Proceedings of the Fourth International ICST Conference on Performance Evaluation Methodologies and Tools. Pisa, Italy: ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering).

Noblet, C. M. H. et al., 2004. Enabling UMTS end-to-end performance analysis. In: 3G Mobile Communication Technologies (3G 2004), Fifth IEE International Conference, 2004, pp. 29-33.

Pllana, S. et al., 2008.
Hybrid Performance Modeling and Prediction of Large-Scale Computing Systems. In: Complex, Intelligent and Software Intensive Systems (CISIS 2008), International Conference, 4-7 March 2008, pp. 132-138.

Rocha, M. and Neves, J., 1999. Preventing premature convergence to local optima in genetic algorithms via random offspring generation. Proceedings of the 12th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems. Cairo, Egypt: Springer-Verlag New York, Inc.

Rolia, J. et al., 2009. Predictive modelling of SAP ERP applications: challenges and solutions. Proceedings of the Fourth International ICST Conference on Performance Evaluation Methodologies and Tools. Pisa, Italy: ICST.

Schneider, T., 2006. SAP Performance Optimization Guide. Galileo Press, Bonn, Boston.

Woodside, M., 2002. Tutorial Introduction to Layered Modeling of Software Performance. Available: http://www.sce.carleton.ca/rads/lqns/lqn-documentation/tutorialg.pdf.

Zitzler, E. and Thiele, L., 1998. Multiobjective Optimization Using Evolutionary Algorithms - A Comparative Case Study. In: Eiben, A. E., Bäck, T., Schoenauer, M. and Schwefel, H.-P. (eds.) Parallel Problem Solving from Nature - PPSN V, 5th International Conference, Amsterdam, The Netherlands, September 27-30, 1998, Proceedings. Springer.

Zitzler, E. and Thiele, L., 1999. Multiobjective Evolutionary Algorithms: A Comparative Case Study and the Strength Pareto Approach. IEEE Transactions on Evolutionary Computation, Vol. 3, No. 4, pp. 257-271.