QUANTITATIVE MODELLING AND ANALYSIS OF BUSINESS PROCESSES

Henk Jonkers and Henry M. Franken
Telematics Research Centre
P.O. Box 589, 7500 AN Enschede
The Netherlands
E-mail: {h.jonkers, h.franken}@trc.nl

KEYWORDS

Business re-engineering, model design, performance analysis, optimisation

ABSTRACT

Redesign of administrative business processes is currently attracting a lot of interest. Although many software tools exist to model such processes, hardly any attention is paid to the analysis of quantitative aspects to support or optimise the (re)design. In other application areas in which distributed systems play a role, such as computer and telecommunication systems, manufacturing systems and traffic, many techniques have been developed for the quantitative analysis of, among other things, temporal aspects. Because of the similarities between these kinds of systems and administrative business processes, it will often be possible to apply these techniques, possibly with small modifications, to business processes. In this paper, we describe the similarities and differences between business processes and other distributed systems, we distinguish several related temporal performance measures, and we give an overview of quantitative analysis methods.

INTRODUCTION

Recently, business process redesign (BPR) and related subjects, such as business process improvement (BPI), have been receiving a lot of attention. In addition to literature on the management aspects of BPR (Hammer 1990; Davenport and Short 1990), several modelling frameworks and software tools to support the redesign effort have been presented. An important advantage of such frameworks is that they allow for the off-line analysis of the effects of BPR, without disturbing actual business processes (and jeopardising their continuity) (Franken et al. 1996). Most of the currently available tools provide some kind of (informal) diagramming technique to document a process. Some of these also allow for model simulation, stressing the functional aspects, and in some cases for the analysis of a limited number of quantitative aspects. However, hardly any studies in the literature show the use of models to study the quantitative behaviour of business processes and to optimise them.

The cost of a process generally depends on its time behaviour, because of time-dependent cost factors (e.g., salaries and office rents). Also, factors which cannot easily be quantified, such as customer satisfaction, often depend on performance measures (e.g., the response time). For these reasons, among others, optimising the quantitative performance of a business process is important. This is closely related to the process of benchmarking (Camp 1989), the study of methods that lead to optimal business performance.

For the effective (re)design and optimisation of business processes, feedback to the designer, concerning the functional as well as the quantitative properties of alternative design options, is indispensable. Quantitative feedback is useful during all design stages. In the early stages, in which potentially many alternative design options need to be compared, fast estimates of the quantitative properties (measures) are required; these only provide a global indication of the expected performance. In the final stages, when we want to fine-tune a design, accurate (and as a result, time-consuming) predictions may be required. Therefore, we need techniques to model business processes at different abstraction levels.
This is also important because in modern administrative business processes, in particular when performing BPR, the supporting IT and telematics applications play a central role. Therefore, we are especially interested in modelling methods which are capable of describing not only the global business processes (in which the IT and telematics applications are integrated), but also a specific part of such a process, e.g. a telematics application, in more detail.

Many techniques have been developed for performance prediction of several classes of (concurrent) discrete-event systems, such as (parallel) computer systems (Ajmone Marsan et al. 1986; Gemund 1996; Jonkers 1995), telecommunication systems (Dowd and Gelenbe 1995), manufacturing systems (Yao 1994) and traffic. In contrast to, e.g., the manufacturing sector, these techniques are not yet widely applied in the relatively new area of redesign of administrative business processes. However, with the advent of the process-oriented approach in business process (re)design (which is central in BPR and which is characterised by the increasing application of workflow management systems), it is recognised that many logistics principles can also be applied in administrative business processes (“office logistics”). As a result, many techniques from the aforementioned fields are likely to be applicable to business processes as well, with little or no modification.

In this paper, we relate typical quantitative measures of business processes to those of concurrent discrete-event systems in other application areas. Based on this, we show how several quantitative performance modelling and analysis techniques which are traditionally applied in these areas can also be applied to administrative business processes. Different techniques, ranging from static approaches to simulation, are compared to each other, and we show how they can be used to support the (re)design and optimisation of business processes. Their use is illustrated by means of a case study.

QUANTITATIVE BUSINESS PROCESS ASPECTS

Several quantitative measures related to business processes can be discerned. Examples are temporal measures, reliability measures, and cost measures. Combinations are also possible, e.g. performability measures (Trivedi et al. 1993), which combine temporal measures with reliability measures, and derived measures such as the performance-to-cost ratio. In this paper we restrict ourselves to temporal (performance) measures. These include any quantity that is related to the time it takes to complete a certain process or subprocess. A business process can be viewed from different perspectives, resulting in different (but related) performance measures:

• User/customer perspective: response time, the time between issuing a request and receiving the result; the response time is the sum of the processing time and waiting times (synchronisation losses). Examples in business processes are the time between the moment that a customer arrives at a counter and the moment the service is completed, or the time between sending a letter and receiving an answer. The response time also plays an important role in the supporting IT applications; a well-known example is the (mean) time between a database query and the presentation of its results.

• Process perspective: completion time, the time required to complete one instance of a process (possibly involving multiple customers, orders, products etc., as opposed to the response time, which is defined as the time to complete one request).
In batch processing by means of an information system, the completion time can be defined as the time required to finish a batch.

• Product perspective: processing time, the amount of time that actual work is performed on the realisation of a certain product or result, i.e. the response time without waiting times. The processing time can be orders of magnitude lower than the response time. In a computer system, an example of the processing time is the actual time that the CPU is busy.

• System perspective: throughput, the number of transactions or requests that a system completes per time unit (for example, the average number of customers served per hour). Related to this is the maximum attainable throughput (also called the processing capacity, or, in a more technically oriented context such as communication networks, the bandwidth), which depends on the number of available resources and their capacity.

• Resource perspective: utilisation, the percentage of the operational time that a resource is busy. On the one hand, the utilisation is a measure of the effectiveness with which a resource is used. On the other hand, a high utilisation can be an indication that the resource is a potential bottleneck, and that increasing that resource’s capacity (or adding an extra resource) can lead to a relatively large performance improvement. In the case of humans, the utilisation can be used as a more or less objective measure of work stress. In telematics applications, a typical example of the utilisation is the network load.

Different perspectives can lead to conflicting interests. For example, a high throughput, which leads to a higher productivity and a better resource utilisation, generally results in a deterioration of the response time.

BUSINESS PROCESS MODELLING

When modelling dynamic discrete-event systems, such as business processes, we distinguish two modelling domains, the behaviour domain and the entity domain. The former describes the dynamic behaviour, i.e. the activities and their relationships; the latter describes the resources (entities) performing the activities and the way in which these are organised. These two domains are coupled by means of a mapping, which assigns activities to entities (a small sketch of this structure is given at the end of this section). Both domains can be distinguished for any model of a dynamic system, be it functional or quantitative, and can be recognised in a wide range of application areas. For example, in computer system models, the behaviour domain represents the program, and the entity domain the hardware components and their interconnections; in transportation, the behaviour domain represents the traffic flow, the entity domain the infrastructure. In our application area, business processes, the behaviour domain represents the actual processes, while the entity domain represents business resources such as people, offices, departments, technology, etc. (Franken et al. 1996).

An important issue when modelling a process is the choice of the appropriate level of abstraction. A model at a high abstraction level only describes the global aspects of a process, while a model at a low abstraction level includes all kinds of details. For example, at a high abstraction level, the relevant business resources might be departments, and the time scale of the performance measures in the order of days; at a low abstraction level, the relevant resources might be people, and the time scale in the order of minutes. Therefore, it is important for modelling techniques to be applicable at different levels of abstraction.
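To make this structure concrete, the following minimal Python sketch (our own illustration; the activity and resource names, durations and capacities are hypothetical) represents activities with precedence relations (behaviour domain), resources with a capacity (entity domain), and the mapping that assigns activities to resources.

from dataclasses import dataclass, field

@dataclass
class Activity:                        # behaviour domain: a unit of work
    name: str
    duration: float                    # e.g. in minutes; could also be a probability distribution
    predecessors: list = field(default_factory=list)

@dataclass
class Resource:                        # entity domain: who or what performs the work
    name: str
    capacity: int = 1                  # e.g. the number of employees in a department

# behaviour domain: a small two-step order process
enter = Activity("enter order", 2.0)
ship  = Activity("prepare shipment", 5.0, predecessors=[enter])

# entity domain: the resources carrying out the work
handling = Resource("order handling department", capacity=2)
shipping = Resource("shipping department", capacity=3)

# the mapping that couples the two domains: activity -> resource
assignment = {enter.name: handling, ship.name: shipping}

The same structure can be refined to a lower abstraction level, e.g. by replacing departments by individual employees and days by minutes, without changing its form.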
The fact that we focus on quantitative (temporal) aspects of business processes imposes a number of requirements on the choice of a modelling formalism. First, the modelling technique must have a formal basis, in order to be able to define the analysis results unambiguously. Of course, it is necessary to be able to express the evolution of time and the duration of activities. Further, nearly all realistic business processes contain sources of uncertainty; we should also be able to quantify these, e.g. by means of a probability distribution. Finally, synchronisations (e.g. as a result of communication between people: an employee cannot start a task before the necessary information has been received from a customer) and the shared use of resources (e.g., only one person at a time can make use of a copier, which can lead to waiting times) have a major influence on the temporal behaviour, and must therefore be expressible. Conversely, we can abstract from certain functional properties, e.g. the data that are used in a process, when we are only interested in the temporal behaviour.

QUANTITATIVE ANALYSIS TECHNIQUES

Basically, three classes of methods can be applied to derive estimates of performance measures: measurement, (quantitative) simulation and analytical methods.

Measurements can only be carried out if the system of interest is already operational. Therefore, they are of limited use for guiding the process design, because they cannot be used to obtain performance predictions. However, measurements are useful to obtain benchmark measures, and to calibrate model parameters. Measurements provide an on-line method for quantitative analysis, as opposed to the following two methods, which allow for off-line analysis.

Simulation is the direct execution of a model, which is expressed in either a special-purpose simulation language or a general-purpose programming language. It is a very powerful tool, because it enables the study of every system aspect to any desired level of detail. However, a drawback is that it is time-consuming. Moreover, it yields numeric, non-parameterised results, so that a separate simulation run is required for each parameter setting of interest. Also, because nearly all models contain sources of non-determinism, simulation yields a confidence interval rather than an exact prediction of the performance measures.

Analytical solution techniques derive performance measures (either exactly or approximately) in a mathematical way. They can be subdivided into symbolic techniques and numeric techniques. A drawback of (especially symbolic) analytical techniques compared to simulation is that the class of models which are analytically tractable is limited.

The boundary between simulation and (in particular numeric) analytical techniques is not sharp. Parts of certain numerical analysis methods, e.g. the construction of a reachability graph for a Petri net model, closely resemble the approach taken in simulation. Conversely, certain kinds of simulation, e.g. Monte Carlo methods, are only a small step removed from analytical methods. An important difference, however, is that analytical results are always reproducible, while this might not be the case for simulation (the sketch below illustrates the difference for a simple single-queue model). Another interesting approach is the combination of simulation and analytical solution: submodels which are analytically intractable are solved by means of simulation, after which the overall model is solved analytically, or vice versa.
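As a small illustration of the difference between the two off-line methods, the following Python sketch (our own example; the M/M/1 queue and its parameters are an assumption, not part of the material above) estimates the mean response time of a single-server queue with Poisson arrivals and exponential service by simulation, and compares the estimate with the exact analytical result W = 1/(mu - lambda).

import random

def simulate_mm1(lam, mu, n_jobs=200_000, seed=1):
    """Estimate the mean response time of an M/M/1 queue by direct simulation."""
    rng = random.Random(seed)
    t_arrival = 0.0                               # arrival time of the current job
    t_free = 0.0                                  # time at which the server becomes free
    total_response = 0.0
    for _ in range(n_jobs):
        t_arrival += rng.expovariate(lam)         # next Poisson arrival
        start = max(t_arrival, t_free)            # wait if the server is still busy
        t_free = start + rng.expovariate(mu)      # exponential service completion
        total_response += t_free - t_arrival
    # a point estimate; in practice one would also report a confidence interval
    return total_response / n_jobs

lam, mu = 0.8, 1.0
print("simulation :", round(simulate_mm1(lam, mu), 2))   # approximately 5, with statistical error
print("analytical :", round(1.0 / (mu - lam), 2))        # exactly 5, and parameterised in lam and mu

The simulation run takes noticeable time and returns a single numeric estimate, while the analytical formula is instantaneous, exact and valid for any lam < mu; this is precisely the trade-off discussed above.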
As a guideline for the choice of an appropriate analysis method in a given situation, we introduce a number of criteria for the evaluation and comparison of analysis techniques.

• The usefulness of performance predictions largely depends on their accuracy. Two main sources of prediction errors can be identified: analytical errors (in the case of approximate analysis techniques), and modelling inaccuracies, which are due to the fact that a model is a simplification of the real system. The latter can be caused by insufficient insight into the structure of the real process, insufficient expressive power of the modelling formalism, or errors in the model parameters.

• The robustness of the results is important in addition to the absolute accuracy. A technique which yields reasonably accurate results over the whole range of possible parameter settings will generally be preferable to a technique which yields very accurate results for many parameter settings, but high errors for certain extreme values. When performance predictions are used to compare design alternatives (or to optimise a design), we will generally require that at least the relative performance is correctly predicted.

• The efficiency (inversely related to the analytical complexity) of a performance prediction technique determines whether the results are obtained within acceptable computing time. Of course, what is still acceptable strongly depends on the application of the predictions: in early design optimisations, when a large number of alternative solutions must be compared, we need very quick results, while in the final stages, when fine-tuning the design, more time-consuming analysis techniques might be justified to obtain more accurate results.

A central issue in performance modelling is the trade-off between expressive power (which directly influences the accuracy) and analytical tractability. Therefore, the requirements of accuracy and efficiency are in conflict, so that in practice a compromise must be sought. Many different types of analytical performance prediction methods exist, generally associated with different modelling formalisms. These methods mainly differ in the position they occupy on the trade-off between prediction accuracy and (computational) efficiency. Static techniques are used to quickly obtain first-order performance estimates (Fahringer 1995; Gemund 1996); a simple illustration is sketched at the end of this section. Techniques based on (timed, stochastic) extensions of Petri nets (Ajmone Marsan et al. 1986) yield accurate results, but are time-consuming due to the state-space explosion (which results in a complexity that is exponential in the model size). Between these two extremes, several techniques exist to obtain reasonably accurate results at moderate complexity, e.g. hybrid methods based on a combination of task graphs and queueing networks (Mak and Lundstrom 1990; Jonkers 1995) or a combination of Petri nets and queueing networks (Balbo et al. 1988). A more elaborate explanation of these techniques is beyond the scope of this paper. For more details, we refer to (Ajmone Marsan et al. 1986; Gemund 1996; Jonkers 1995).
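As an illustration of the kind of result a static technique produces, the following Python sketch (our own example; the task set and capacities are hypothetical and not taken from the cited work) computes two quick lower bounds on the completion time of a set of tasks mapped onto resources: a bottleneck bound (total work per resource divided by its capacity) and a critical-path bound (the longest chain of dependent tasks, ignoring resource contention).

def static_lower_bound(tasks, capacity):
    """tasks: list of (resource, duration, set of indices of predecessor tasks)."""
    # bottleneck bound: a resource cannot perform more work per time unit than its capacity allows
    work = {}
    for res, dur, _ in tasks:
        work[res] = work.get(res, 0.0) + dur
    bottleneck = max(work[res] / capacity[res] for res in work)
    # critical-path bound: precedence constraints alone (tasks are assumed to be
    # listed in topological order, i.e. predecessors appear before successors)
    earliest_finish = []
    for _, dur, preds in tasks:
        start = max((earliest_finish[p] for p in preds), default=0.0)
        earliest_finish.append(start + dur)
    return max(bottleneck, max(earliest_finish))

# toy example: three 2-minute entry tasks handled by one clerk, each followed by
# a 5-minute shipment handled by a pool of two packers
tasks = [("clerk", 2.0, set()), ("clerk", 2.0, set()), ("clerk", 2.0, set()),
         ("packer", 5.0, {0}), ("packer", 5.0, {1}), ("packer", 5.0, {2})]
print(static_lower_bound(tasks, {"clerk": 1, "packer": 2}))   # 7.5: a first-order estimate only

Such a bound requires virtually no computation, but it ignores waiting times caused by contention; this illustrates the trade-off between efficiency and accuracy described above.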
CASE STUDY: ALLOCATION OF HUMAN RESOURCES

In this section we present a case study to illustrate the use of performance prediction techniques to optimise a business process. We will show two examples of how the predictions are used: to determine the optimal distribution of employees over departments, and to assess the effects of a business reorganisation.

Consider a company which sells compact discs by mail order. The CDs are divided into three music styles: classical, jazz and popular. For each style there is a department preparing the shipment of orders. A fourth department, the order handling department, takes care of entering the incoming orders into the computer system and sending them to the appropriate shipping department. Each day two mail batches containing orders arrive: one at 9 a.m. with approximately 200 orders (30% classical, 20% jazz and 50% popular), and one at 11 a.m. with approximately 100 orders (20% classical, 10% jazz and 70% popular). Each batch is processed by the order handling department, which enters the orders in a computer system one by one and sends them to the appropriate shipping department by electronic mail. Entering and sending an order takes 2 minutes; preparing a shipment takes 5 minutes.

The mail order company employs 7 people, who can be divided arbitrarily over the four departments: Ko in the order handling department, Kj in the jazz department, Kc in the classical department and Kp in the pop department. The management wants to decide how these people should be distributed over the departments in order to minimise the time required to process the two mail batches. There are 20 ways to distribute 7 people over 4 departments (with at least one person per department); however, by observing that the shipping departments are identical except for the workload, which is the lowest for the jazz department and the highest for the pop department, we only need to consider the situations where Kj ≤ Kc ≤ Kp, which reduces the number of distributions to 7.

A performance model is used to predict the completion times for the different possible distributions, using both simulation and a queueing model (see Figure 1). In this model, each department is modelled as a multiple-server queueing centre, with one server for each employee in the department. Two job classes (a and b) represent the two order batches. The delay centre models the two-hour delay between the arrival of the two batches. All the service times are expressed in minutes. Note that the structure of the queueing model reflects the structure of the organisation (one queueing centre for each department); this can be considered the entity domain. The quantitative aspects of the behaviour domain are represented by the workloads and the routing of jobs through the network.

For the analysis of the queueing model we use regular (approximate) mean-value analysis (MVA) as well as the hybrid approach described in (Jonkers 1995). A drawback of regular MVA is that it does not take into account the transient effects, which are important because the two batches do not completely overlap; the hybrid approach does take these effects into account. The simulations have been carried out on a Sun workstation, using the programming language C with the PAMELA discrete-event simulation library (Gemund 1996). This library provides routines to dynamically create (concurrent) threads, a virtual time axis, and semaphores for synchronisations between the threads (including mutual exclusion). A simplified, stand-alone sketch of such a simulation is given below.
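The following Python sketch gives the flavour of the simulation (it is our own simplification, not the original C/PAMELA implementation): each department is modelled as a multi-server first-come-first-served station, service times are taken to be deterministic, and the order in which the music styles occur within a batch is assumed to be random; the completion time is the moment the last shipment is ready.

import random

ENTRY_TIME, SHIP_TIME = 2.0, 5.0                  # minutes per order entry / per shipment

def serve_fcfs(arrivals, servers, service_time):
    """Serve jobs in the given (arrival) order on a pool of identical servers."""
    free_at = [0.0] * servers                     # time at which each server becomes free
    finish = []
    for t in arrivals:
        s = min(range(servers), key=lambda i: free_at[i])
        free_at[s] = max(t, free_at[s]) + service_time
        finish.append(free_at[s])
    return finish

def completion_time(Ko, Kj, Kc, Kp, seed=0):
    rng = random.Random(seed)
    batches = [(0.0, 200, {"j": 0.20, "c": 0.30, "p": 0.50}),    # 9 a.m. batch
               (120.0, 100, {"j": 0.10, "c": 0.20, "p": 0.70})]  # 11 a.m. batch
    orders = []                                   # (arrival time, style), in arrival order
    for t0, n, mix in batches:
        styles = [s for s, f in mix.items() for _ in range(round(f * n))]
        rng.shuffle(styles)                       # assumption: random style mix within a batch
        orders += [(t0, s) for s in styles]
    entry_done = serve_fcfs([t for t, _ in orders], Ko, ENTRY_TIME)   # order handling department
    per_style = {"j": [], "c": [], "p": []}
    for (_, s), t in zip(orders, entry_done):     # route entered orders to the shipping departments
        per_style[s].append(t)
    return max(max(serve_fcfs(per_style[s], k, SHIP_TIME))
               for s, k in (("j", Kj), ("c", Kc), ("p", Kp)))

# enumerate the 7 candidate distributions (Kj <= Kc <= Kp) and pick the best one
candidates = [(Ko, Kj, Kc, Kp)
              for Ko in range(1, 5) for Kj in range(1, 8)
              for Kc in range(Kj, 8) for Kp in range(Kc, 8)
              if Ko + Kj + Kc + Kp == 7]
best = min(candidates, key=lambda d: completion_time(*d))
print(best, round(completion_time(*best)))        # under these assumptions: (2, 1, 1, 3), roughly 400 min

Even this simplified sketch arrives at the same optimal distribution as the more detailed analyses reported below, although its absolute predictions differ because of the simplifying assumptions.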
Table 1 shows the completion times obtained with the different analysis techniques for the 7 possible distributions. Simulation and both kinds of queueing analysis yield the same optimal distribution (Ko = 2, Kj = Kc = 1, Kp = 3). The results of the hybrid approach are very close to the simulation values: the maximum deviation is 1.6%. Regular MVA gives relatively high errors (up to 40% compared to simulation); however, the relative performance is still correctly predicted, i.e., MVA yields the correct optimal distribution of people over the departments.

[Figure 1. Queueing model: one multiple-server queueing centre per department (Ko, Kj, Kc and Kp servers) and a delay centre modelling the batch arrivals. Parameters: So = 2 min.; Sj = Sc = Sp = 5 min.; Sda = 0 min., Sdb = 120 min.; Na = 200 jobs, Nb = 100 jobs; routing probabilities Pac = 30%, Paj = 20%, Pap = 50% (class a) and Pbc = 20%, Pbj = 10%, Pbp = 70% (class b).]

Ko  Kj  Kc  Kp   sim. (min.)   hybrid (min.)   MVA (min.)
 4   1   1   1       850          850.0          1130.2
 3   1   1   2       440          445.2           684.3
 2   1   2   2       431          425.1           537.2
 2   1   1   3       403          401.9           403.0
 1   2   2   2       606          600.1           685.1
 1   1   2   3       605          600.0           607.1
 1   1   1   4       606          600.1           685.8

Table 1. Analysis results for different distributions

There is a fairly large variation in the workloads of the three shipping departments. The management considers a reorganisation to improve the balance, which combines the jazz department and the classical department into a single shipping department. Before actually carrying out the reorganisation, the management uses a performance model to predict what the effect of the reorganisation will be on the completion time of the order batches. The same techniques as mentioned above are used. In this case, there are 12 possible distributions of the 7 employees over the 3 remaining departments; this number is reduced to 7 when we restrict ourselves to the cases where Kjc ≤ Kp (because the workload of the pop department is still slightly higher than the workload of the new jazz/classical department). The analysis results are shown in Table 2. Again, the results of the hybrid approach are very close to the simulation results; MVA still gives relatively high errors, but correctly predicts the relative performance. It appears that the reorganisation will decrease the completion time by about 16%. The optimal distribution of people in the new situation is Ko = Kjc = 2 and Kp = 3, i.e. the people in the former jazz department and classical department can both be assigned to the new combined department.

Ko  Kjc  Kp   sim. (min.)   hybrid (min.)   MVA (min.)
 5    1   1       847          850.7          1124.9
 4    1   2       652          650.0           692.6
 3    2   2       428          425.6           608.9
 2    2   3       340          334.0           325.4
 2    1   4       655          650.0           688.3
 1    2   4       605          600.0           693.1
 1    1   5       655          650.2           685.3

Table 2. Analysis results after reorganisation

CONCLUSIONS AND DISCUSSION

Although modelling of administrative business processes has been receiving a lot of attention recently, hardly any studies showing the use of quantitative analysis to support the optimisation of such processes have been presented. In this paper we have demonstrated the applicability of quantitative modelling and analysis techniques from other application areas to the effective (re)design of business processes. We gave an overview of such analysis techniques, which mainly differ from each other with respect to the accuracy of the performance predictions obtained and the efficiency of the analysis. We also discerned a number of perspectives yielding different performance measures, which can be used as criteria for the optimisation or comparison of business processes.

The presented case study illustrated the use of performance prediction techniques to optimise the distribution of people over departments in a mail order company. Similar predictions were used to assess the effects of a reorganisation before actually carrying it out. The results of two analytical techniques, based on a queueing model, were compared to simulation results.
Both analytical techniques correctly predicted the relative performance, and hence yielded the correct optimal distribution of people (and correctly predicted the effects of the reorganisation). However, their absolute accuracy differed considerably.

Although in many respects the dynamic aspects of business processes are very similar to those of technical systems (e.g. parallel computers and manufacturing systems), there are also some important differences. An example is the interpretation of a measure such as the resource utilisation. In the case of a machine, a high utilisation is usually favourable, because it indicates the effective use of the machine (although it can also be an indication that the machine is a potential bottleneck); in the case of an employee, however, a high utilisation may lead to a high work stress. Another difference could be that the completion times of tasks in an administrative business process show more variation than those in a technical system; this can have important consequences for the performance measures. Although it is unlikely that these aspects will make it necessary to use completely different modelling or analysis techniques, it is important to keep them in mind when modelling a business process and especially when interpreting the analysis results.

ACKNOWLEDGEMENT

Part of the work presented in this paper has been carried out in the Platinum project, in which Lucent Technologies, the Telematics Research Centre and the Centre for Telematics and Information Technology of the University of Twente participated. The project was partially funded by the Dutch Ministry of Economic Affairs. We would like to thank Jack Verhoosel and Wil Janssen for their useful comments on earlier versions of this paper.

REFERENCES

Ajmone Marsan, M.; G. Balbo; and G. Conte. 1986. Performance Modelling of Multiprocessor Systems. MIT Press, Cambridge, Mass.

Balbo, G.; S.C. Bruell; and S. Ghanta. 1988. “Combining queueing networks and Generalized Stochastic Petri Nets for the solution of complex models of system behavior.” IEEE Transactions on Computers 37: 1251-1268.

Camp, R.C. 1989. Benchmarking - The Search for Industry Best Practices that Lead to Superior Performance. ASQC Quality Press, Milwaukee, Wisc.

Davenport, T.H. and J.E. Short. 1990. “The new industrial engineering: information technology and business process redesign.” Sloan Management Review, summer, 11-27.

Dowd, P. and E. Gelenbe (eds.). 1995. MASCOTS ’95: Proceedings of the Third International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems. IEEE Computer Society.

Fahringer, T. 1995. “Estimating and optimizing performance for parallel programs.” IEEE Computer, Nov., 47-56.

Franken, H.M.; M.K. de Weger; D.A.C. Quartel; and L. Ferreira Pires. 1996. “On engineering support for business process modelling and redesign.” In Proceedings of the International IFIP Workshop on Modelling Techniques, Business Processes and Benchmarking (Bordeaux, France, Apr.), 81-98.

Gemund, A.J.C. van. 1996. Performance Modeling of Parallel Systems. Ph.D. Thesis, Delft University of Technology, The Netherlands (Apr.).

Hammer, M. 1990. “Reengineering work: don’t automate, obliterate.” Harvard Business Review, Jul.-Aug., 104-112.

Jonkers, H. 1995. Performance Analysis of Parallel Systems: A Hybrid Approach. Ph.D. Thesis, Delft University of Technology, The Netherlands (Oct.).
Mak, V.W. and S.F. Lundstrom. 1990. “Predicting performance of parallel computations.” IEEE Transactions on Parallel and Distributed Systems 1: 257-270.

Trivedi, K.S.; G. Ciardo; M. Malhotra; and R.P. Sahner. 1993. “Dependability and performability analysis.” In Performance Evaluation of Computer and Communication Systems (Lecture Notes in Computer Science 729), L. Donatiello and R. Nelson, eds. Springer-Verlag, Berlin.

Yao, D.D. (ed.). 1994. Stochastic Modeling and Analysis of Manufacturing Systems. Springer-Verlag, Berlin.