Workforce Management and Optimization using Stochastic Network Models

Yingdong Lu, Ana Radovanović and Mark S. Squillante
IBM Thomas J. Watson Research Center
Yorktown Heights, NY 10598

January 2006

Abstract

Mathematical models have played crucial roles in workforce management over the last several decades, from the use of linear programs in staff scheduling to the application of queueing techniques in call center management. We develop a model based on stochastic loss networks to characterize the dynamics and uncertainty in general workforce management and optimization. We formulate profit maximization problems with serviceability constraints under different assumptions on demand and supply. Though these optimization problems are in general nonlinear programming problems, we are able to observe some intrinsic properties of the functions that facilitate efficient computation of the optimal solution. Numerical results demonstrate that our model provides capacity planning decisions that yield better results than those available in current practice.

1 Introduction

The efficient management and planning of a large workforce is a challenge faced by many companies, especially those in the service industry, in which revenue is largely accounted for by the billable time that employees commit to business engagements. A typical service engagement consists of different tasks that are executed simultaneously by resources (workforce) with different attributes (skills). Any shortage of the required resources can result in the failure of the entire engagement. Meanwhile, there can be many types of engagements with different resource and time requirements. In addition, uncertainty appears naturally in engagement demand, process and resource supply. Ignoring the uncertainty creates a high risk of losing engagements and, with them, revenue. Therefore, we are dealing with a stochastic planning problem with multi-type demand and multi-attribute supply.

In this study, we are interested in the stochastic modelling of the workforce capacity planning process, and we provide performance analysis and analytic decision support for this process. As mentioned above, demand arrives as streams of engagements. Each engagement requires the service of several different classes of resources for a certain amount of time. After the completion of the engagement, a reward is collected, and the resources become available to be assigned to other engagements. For example, to fulfill an IT service contract, a team consisting of a project manager and several IT specialists is needed for a period of three months. If there are not enough resources to fulfill the demand of an engagement, the engagement is at risk of being lost, and financial penalties are incurred with such losses. This model is typical of business processes in consulting services, hospitals and government. In fact, our study is directly motivated by resource planning problems in a major IT consulting company. We model this problem by a stochastic loss network, then calculate the minimum capacity required for a high percentage of engagements to be fulfilled.

Stochastic loss networks have been extensively studied as models for telecommunication networks (see, e.g., Kelly [6], [7], [8], [9], [10]). However, there are also significant differences between our model and traditional loss network models. First, in comparison with circuit-switched networks, human resources exhibit a much higher degree of flexibility. For example, an engagement can require only 20% of a certain resource, and the same resource can be used to handle multiple engagements. These assumptions create new features in the model formulation, which require us to search for solutions over a much larger domain.
∗ Presented at the 2006 IEEE International Conference on Service Operations and Logistics, and Informatics, June 2006, Shanghai, China. Also presented at the International Workshop on Applied Probability, April 2006, Connecticut, USA.

The second major difference is time scale. In telecommunications networks, call durations are usually more homogeneous, and small in comparison with the planning horizon. However, in our workforce study, engagement durations can vary over a large range, from several hours to one or two years, while the planning horizon in the service industry is usually a month or a quarter. These differences need to be correctly captured in the model.

The third difference is the action lead time. The actions in the planning problem include additions (hiring), reductions (lay-offs) and reallocations (retraining) of resource capacities. Besides the costs associated with them, as in telecommunication applications, there are also significant lead times and uncertainties associated with these actions.

As in the case of telecommunication networks, the stochastic model has a closed-form expression for the stationary distribution of the number of engagements in the system. However, since the state space grows exponentially as the capacity increases, it becomes computationally expensive to use the exact formulas to obtain performance measures such as engagement loss probabilities. A fixed point approximation scheme is widely used to estimate the loss probability in communication network applications. Several recursive procedures for identifying the fixed point have been suggested and shown to be efficient; see, e.g., [11]. However, these procedures do not provide flexibility in incorporating the additional requirements in our model, such as serviceability and financial objectives.
We formulate an overall performance optimization problem, in which the fixed point is characterized by a nonlinear constraint. Along with serviceability constraints and financial objectives, the optimization problem produces a desired capacity plan. We are also able to formulate a stochastic dynamic program that incorporates hiring and firing actions over time and leads to overall optimal performance (e.g., profit maximization). We develop computational methods to efficiently solve this dynamic program.

The rest of the paper is organized as follows. In Section 2, we present the detailed mathematical model and some preliminary results. Then, in Section 3, we discuss the optimization problems that are addressed. Finally, the paper is concluded in Section 4.

2 A stochastic network model

We assume that our planning horizon is $[0, T]$ and is long enough for the system to reach stationarity. There are $I$ resource types, indexed from 1 to $I$. Let $C_i$ be the capacity of resource type $i$. Furthermore, there are $J$ engagement types, indexed from 1 to $J$, that arrive as independent Poisson processes with rates $\lambda_j$, $j = 1, 2, \ldots, J$. Hence, the overall arrival process is a Poisson process with rate $\lambda = \lambda_1 + \lambda_2 + \cdots + \lambda_J$. For each engagement type $j$, $j = 1, 2, \ldots, J$, let $A_{ij}$, $i = 1, 2, \ldots, I$, represent the amount of resource $i$ required by an engagement of type $j$. Upon arrival, an engagement is accepted only if all the resources it requires are available; otherwise, the engagement is lost. For each type $j$ engagement, let $F_j(\cdot)$ be the distribution function of its service time, and $\mu_j$ the service rate, i.e., $\mu_j = \left( \int_0^\infty (1 - F_j(y)) \, dy \right)^{-1}$. In the rest of the paper we use $\rho_j$ to denote $\lambda_j / \mu_j$, $1 \le j \le J$, which is well known in the queueing literature as the traffic intensity. After the service has been completed, the engagement leaves the system, and all of its resources are released. The performance metric of interest is the loss probability $L_j$, $1 \le j \le J$, i.e., the probability of losing an engagement of type $j$ due to insufficient resources.

Let $n(\infty)$ be a vector whose elements represent the number of engagements of each type in the system in steady state. From classical results on loss networks, see, e.g., [10], it is well known that

$$\pi(n) = G(C)^{-1} \prod_{j=1}^{J} \frac{\rho_j^{n_j}}{n_j!}, \qquad n \in S(C), \tag{1}$$

where $\pi(n) := P[n(\infty) = n]$, and

$$S(C) := \{ n \in \mathbb{Z}_+^J : An \le C \}, \tag{2}$$

$$G(C) = \sum_{n \in S(C)} \prod_{j=1}^{J} \frac{\rho_j^{n_j}}{n_j!}. \tag{3}$$

Although important for theoretical development, the Erlang formula (1) cannot practically be used to compute the stationary loss probability because of its high computational complexity. In many similar applications, the Erlang fixed point method, an approximation scheme that assumes independence between loss events for different resource types, has been shown to be effective. In this study, we take a similar approach. Moreover, since $A_{ij}$ can be fractional, implying possibly fractional capacity values, we relax the fixed point method to allow the variables to take on nonnegative real values. Thus, define

$$B(a, x) = \left( a^{-x} e^{a} \, \Gamma(x + 1, a) \right)^{-1}, \tag{4}$$

where $\Gamma(x, a)$ denotes the upper incomplete gamma function

$$\Gamma(x, a) = \int_a^\infty e^{-y} y^{x-1} \, dy.$$

$B(a, x)$ is obtained by continuous relaxation of the Erlang formula (1) without the normalization coefficient $G(C)^{-1}$; this can be easily verified using the definition of the Gamma function (see also Jagerman [5] for numerical properties of this relaxation). Next, let $B_i$ be the probability of insufficient resources of type $i$, $1 \le i \le I$. By the Erlang fixed point approximation we have $B_i = B(\eta_i, C_i)$, where

$$\eta_i \triangleq (1 - B_i)^{-1} \sum_{j} A_{ij} \rho_j \prod_{k} (1 - B_k)^{A_{kj}} \tag{5}$$

represents the effective demand rate for resource type $i$, $i = 1, 2, \ldots, I$, assuming the mutual independence of loss events for different resource types. Finally, $L_j$, $j = 1, 2, \ldots, J$, can be estimated as

$$L_j \approx 1 - \prod_{i} (1 - B_i)^{A_{ij}}. \tag{6}$$

3 Optimization formulations

The goal of the optimizations addressed in this paper is to maximize the profit rate while satisfying serviceability requirements. In our problem, cost is incurred for using resources and we assume that it is a linear function of the resource capacities. For each resource type $i$, $1 \le i \le I$, let $c_i$ be the cost rate paid for a unit of that resource. Revenue is collected for each accepted engagement and we again treat it as a linear function of the effective arrival rate, i.e., the arrival rate discounted by the loss effect. For each accepted engagement of type $j$, $1 \le j \le J$, let $R_j$ be the collected revenue; thus, the overall revenue rate for type $j$ engagements is $R_j \lambda_j (1 - L_j)$. Service requirements are represented as constraints on the engagement loss probabilities, where we assume that they cannot exceed some pre-specified value $\alpha_j$ for type $j$, $1 \le j \le J$.

3.1 Single-stage model

Under the assumption of stationarity, a single-stage model provides optimal resource capacities. More specifically, the optimization problem takes the following form:

$$\max \; \sum_{j=1}^{J} R_j \lambda_j (1 - L_j) - \sum_{i=1}^{I} c_i C_i \tag{7}$$

$$\text{s.t.} \quad B_i = B(\eta_i, C_i), \quad 1 \le i \le I;$$

$$1 - \prod_{i} (1 - B_i)^{A_{ij}} \le \alpha_j, \quad 1 \le j \le J.$$

3.2 Stochastic dynamic programming model

In order to cope with fluctuating engagement arrival rates, we use a stochastic dynamic program to identify the optimal policy $u^*$ that governs capacity planning actions, such as hiring and firing, and that maximizes the overall profit throughout the planning horizon. Assume that the planning horizon has $N$ stages of fixed length $T$, long enough for the system to reach stationarity within each stage. Then, for engagement type $j$, $1 \le j \le J$, assume that its arrival rate changes from stage to stage according to a Markov chain, $\Lambda_j$, with transition matrix $P_j$. Assume that the lead time of all planning actions is denoted by $L$, where $L \le T$. Planning actions are taken within a stage, and their costs are incurred within that stage, such that they become effective at the beginning of the next stage. We assume the hiring and firing costs to be linear functions of the number of people hired, $H^u_{i,n}$, and fired, $F^u_{i,n}$, under policy $u$ in period $n$, with rates $h_i$ and $f_i$, respectively. Furthermore, following the single-stage notation, $C^u_{i,n}$, $B^u_{i,n}$ and $L^u_{j,n}$ denote the resource $i$ capacity, the probability of insufficient resources of type $i$ and the loss probability of engagements of type $j$, respectively, for stage $n$ and under policy $u$. Under the above assumptions, the optimal control problem that maximizes the overall profit can be formulated as follows:

$$\max_{u} \; \sum_{n=1}^{N} e^{-\beta (n-1)} \, E\left[ \sum_{j=1}^{J} R_j \Lambda_j (1 - L^u_{j,n}) - \sum_{i=1}^{I} \left[ c_i C^u_{i,n} + h_i H^u_{i,n} + f_i F^u_{i,n} \right] \right] \tag{8}$$

$$\text{s.t.} \quad B^u_{i,n} = B(\eta^u_{i,n}, C^u_{i,n}), \quad 1 \le n \le N, \; 1 \le i \le I;$$

$$1 - \prod_{i} (1 - B^u_{i,n})^{A_{ij}} \le \alpha_j, \quad 1 \le n \le N, \; 1 \le j \le J;$$

$$C^u_{i,n} = C^u_{i,n-1} + H^u_{i,n-1} - F^u_{i,n-1}, \quad n = 2, 3, \ldots, N;$$

where $\beta \ge 0$ is the discount factor.

4 Conclusions

We have developed a set of optimization models based on stochastic loss network models for workforce management and planning. Our models capture the basic dynamics of the workforce evolution and provide optimal decisions for cost minimization while meeting service requirements. Numerical results show that the models provide better results than those available in current practice.

In both of our models, the basic optimization problem is a nonlinear programming problem. When $A_{ij}$ satisfies certain nonsingularity conditions, see, e.g., [10], the problem is a convex program with a strictly convex objective function (after some simple transformation to the single-stage problem); hence, the uniqueness of the solution can be guaranteed, and the problem can be solved very efficiently. In general, we cannot expect the uniqueness of the solution in terms of $B_i$, and, therefore, the optimization can be very difficult.
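As a rough illustration of the quantities that enter this nonlinear program, the following sketch evaluates the continuous Erlang function (4), iterates the fixed point $B_i = B(\eta_i, C_i)$ with the effective demand rate (5), and estimates the loss probabilities through (6). The damped successive substitution, the Simpson quadrature on the fractional part of the capacity, the function names and the example data are all assumptions made for illustration; this is not the solution procedure used in this study, and convergence of the simple iteration is not guaranteed in general.

```python
import math


def erlang_b(a, x, t_max=60.0, steps=2000):
    """Continuous Erlang function B(a, x) of (4), assuming a > 0 and x >= 0.

    Uses 1/B(a, f) = int_0^inf e^-t (1 + t/a)^f dt for the fractional part f
    of x (Simpson's rule on [0, t_max], a crude illustrative quadrature),
    then the recursion 1/B(a, y) = 1 + (y/a) / B(a, y - 1).
    """
    whole = math.floor(x)
    frac = x - whole
    if frac == 0.0:
        inv_b = 1.0  # B(a, 0) = 1 exactly
    else:
        h = t_max / steps  # steps must be even for Simpson's rule
        f = lambda t: math.exp(-t) * (1.0 + t / a) ** frac
        total = f(0.0) + f(t_max)
        for k in range(1, steps):
            total += (4.0 if k % 2 else 2.0) * f(k * h)
        inv_b = total * h / 3.0
    for m in range(1, whole + 1):
        inv_b = 1.0 + ((frac + m) / a) * inv_b
    return 1.0 / inv_b


def effective_load(i, B, A, rho):
    """eta_i of (5): (1 - B_i)^-1 * sum_j A_ij rho_j prod_k (1 - B_k)^A_kj."""
    total = 0.0
    for j, rho_j in enumerate(rho):
        accept = math.prod((1.0 - B[k]) ** A[k][j] for k in range(len(B)))
        total += A[i][j] * rho_j * accept
    return total / (1.0 - B[i])


def erlang_fixed_point(A, rho, C, damping=0.5, tol=1e-10, max_iter=1000):
    """Damped successive substitution for B_i = B(eta_i, C_i), starting at 0."""
    B = [0.0] * len(C)
    for _ in range(max_iter):
        target = [erlang_b(effective_load(i, B, A, rho), C[i])
                  for i in range(len(C))]
        new_B = [(1.0 - damping) * b + damping * t for b, t in zip(B, target)]
        if max(abs(n - b) for n, b in zip(new_B, B)) < tol:
            return new_B
        B = new_B
    raise RuntimeError("fixed-point iteration did not converge")


def loss_probabilities(B, A):
    """L_j of (6): 1 - prod_i (1 - B_i)^A_ij."""
    return [1.0 - math.prod((1.0 - B[i]) ** A[i][j] for i in range(len(B)))
            for j in range(len(A[0]))]


# Hypothetical data: 2 resource types (rows of A), 2 engagement types (columns).
# Engagement type 2 needs only 20% of a unit of resource 1.
A = [[1.0, 0.2], [0.0, 1.0]]
rho = [3.0, 2.0]   # traffic intensities rho_j = lambda_j / mu_j
C = [6.0, 4.5]     # capacities C_i, possibly fractional
B = erlang_fixed_point(A, rho, C)
L = loss_probabilities(B, A)
print("B =", B)
print("L =", L)
```

With a single resource type, a single engagement type and $A_{11} = 1$, the effective demand rate (5) reduces to $\rho_1$, so the sketch returns the classical Erlang loss probability; this gives a simple sanity check. The serviceability constraints of (7) then amount to checking `L[j] <= alpha[j]` for candidate capacities.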
However, we show that the solution is unique in terms of $L_j$, and then exploit this fact together with an open-source interior-point software package, IPOPT, to efficiently obtain the optimal solution. During this process, we also employ other numerical procedures that include functional transforms to smooth out the constraints, the introduction of proper intermediate variables and slack variables, and so on. Details can be found in our related works that focus on the computational aspects of the problem.

References

[1] Asmussen, S. Applied Probability and Queues, 2nd Edition, Springer, 2000.
[2] Berezner, S. A., Krzesinski, A. E., and Taylor, P. G. On the Inverse of Erlang's Function, J. Appl. Prob., Vol. 35, (1998) 246-252.
[3] Gibbens, R., Kelly, F. P., and Key, P. B. A Decision-Theoretic Approach to Call Admission Control in ATM Networks, IEEE Jour. on Selected Areas in Comm., Vol. 13, No. 6, (1995) 1101-1113.
[4] Kawajiri, K. and Laird, C. D. Introduction to IPOPT: A tutorial for downloading, installing and using IPOPT.
[5] Jagerman, D. L. Methods in Traffic Calculations, AT&T Bell Lab. Tech. Jour., Vol. 63, No. 7, (1984) 1283-1310.
[6] Kelly, F. P. Blocking Probabilities in Large Circuit-Switched Networks, Adv. Appl. Prob., Vol. 18, (1986) 473-505.
[7] Kelly, F. P. One-Dimensional Circuit-Switched Networks, Ann. Prob., Vol. 15, No. 3, (1987) 1166-1179.
[8] Kelly, F. P. Routing in Circuit-Switched Networks: Optimization, Shadow Prices and Decentralization, Adv. Appl. Prob., Vol. 20, (1988) 112-144.
[9] Kelly, F. P. Routing and Capacity Allocation in Networks with Trunk Reservation, Math. Oper. Res., Vol. 15, No. 4, (1990) 771-793.
[10] Kelly, F. P. Loss Networks, Ann. Appl. Prob., Vol. 1, No. 3, (1991) 319-378.
[11] Whitt, W. Blocking When Service is Required From Several Facilities Simultaneously, AT&T Technical Journal, Vol. 64, No. 8, (1985) 1807-1856.
[12] Whitt, W. Convergence of Stochastic Processes, 2002.