Fitting Phase-Type Distributions to Data from a Telephone Call Center

by Eva Ishay
Supervisors: Dr. Eitan Greenstein, Prof. Avishai Mandelbaum

Outline
- Motivation
- Objective
- The Research and Results
- The Data
- Flow chart
- Selecting the Right Model
- Conclusions
- Future Research

Motivation
The world of call centers is vast.
- “70% of all customer-business interactions occur in call centers”
- “$700 billion in goods and services were sold through call centers in 1997”
- “3% of the U.S. working population is currently employed in call centers”
- “anywhere from 200,000 to 350,000 call centers, which employ anywhere between 4 to 6.5 million people”

Motivation (continued)
Call center data:
- Prevalent call center data is only averaged over periods of fixed duration.
- Data at the individual call-transaction level was recently collected (Mandelbaum, Sakov, Zeltyn).
- Design, management and optimization of performance are possible only through system modeling and deep analysis of the data supporting the model.

Objective
- Analysis of Service Times and Customers’ Patience
- Fitting Phase-Type (PH) distributions
- Comparison and choosing a MODEL

Process of a Customer Interaction with a Call Center
[Diagram labels: VRU; customer joining the system; waiting time in queue; service agent; customer abandons the system; end of service]

Thesis Flow Chart
- DATA: 1. Service time; 2. Customers’ Patience
- Non-parametric estimation: 1. Kernel density estimator; 2. Kaplan-Meier technique. Output: Graphs
- Choosing Order & Structure
- EMpht* program: 1. EM-algorithm; 2. PH-distributions. Output: Graphs, Structure
- Comparison: 1. Visual; 2. Confidence Interval; 3. Goodness-of-fit tests
- Decision (NO / YES), leading to Choosing The Model

The Data
- Service Times – the positive time a customer spends with an agent, until departure from the service/system
- Customers’ Patience – the time a customer is willing to wait in queue before being served

Phase-Type Distributions (PH)
Definition: the absorption time of a finite-state continuous-time Markov chain with a single absorbing state $\Delta$.
- $X = \{X_t, t \ge 0\}$ – a Markov process on the states $\{1, 2, \dots, K, \Delta\}$
- $q = (q_i)_{i=1,\dots,K}$, where $q_i$ is the probability of starting in state $i$ ($q_\Delta = 0$)
- $R$ – the generator of the Markov chain, restricted to the transient states $1, \dots, K$
- $T = \inf\{t \ge 0 : X_t = \Delta\}$ has a PH distribution, with distribution function
  $F_T(t) = 1 - q e^{tR} \mathbf{1}$, $t \ge 0$
Note: the representation via $(q, R)$ is non-unique.

Representation of PH-distribution
General structure: initial probabilities $q_i = P(X_0 = i)$ and transition rates $R_{i,j}$.
Special cases:
- Erlang distribution
- Hyperexponential distribution
- Coxian distribution
- Erlang mixtures
[Diagrams: state-transition graphs of the general structure and of each special case, with sketches of the hazard rate h(t) against t]

Why do we use PH-distributions as Statistical Models?
- Dense: for every distribution $G$ on $[0, \infty)$ there exists a sequence of PH distributions $F_n$ that converges weakly to $G$: $F_n \xrightarrow{w} G$.
- Structurally informative: versatile for modeling and computationally tractable.
- Underlying processes: modeling the underlying stages of the service; understanding customer behavior by modeling patience.

EM-algorithm (Expectation-Maximization)
The EM algorithm is an iterative method for maximum likelihood estimation.
- Goal: estimate the parameter $\gamma$, where $X \sim f_\gamma$
- Problem: $X$ is unobservable
- Principle: augment the observed data $Y$ with latent ("missing") data $X$
  - $Y = u(X)$, with density $g_\gamma$ (observed data)
  - $X$, with density $f_\gamma$ (complete data)
  - $\gamma_n$ – the current estimate after $n$ steps
- Step $n+1$ consists of finding the $\gamma_{n+1}$ that maximizes
  $E_{\gamma_n}[\log f_\gamma(X) \mid u(X) = y]$
  - E-step: evaluation of the conditional expectation
  - M-step: maximization

Estimation of PH-distribution via EM-algorithm
- Observed: $y_1, \dots, y_n$, the times to absorption – an incomplete observation of the Markov process $X(t)$
- Unobserved: $X_t^{[1]}, \dots, X_t^{[n]}$ – $n$ independent replications of the underlying process
Then the complete-data likelihood is
$f(x; q, R) = \prod_{i=1}^{K} q_i^{B_i} \prod_{i=1}^{K} \exp\{R_{ii} Z_i\} \prod_{i=1}^{K} \prod_{j \ne i} R_{ij}^{N_{ij}}$,
where $j$ ranges over $\{\Delta, 1, \dots, K\}$ and $\Delta$ denotes the absorbing state. This is a multi-parameter exponential family with sufficient statistic
$S = ((B_i)_{i=1,\dots,K}, (Z_i)_{i=1,\dots,K}, (N_{ij})_{i=1,\dots,K;\ j=\Delta,1,\dots,K;\ i \ne j})$
- $B_i$ – number of Markov processes starting in state $i$, $i = 1, \dots, K$
- $Z_i$ – total time spent in state $i$, $i = 1, \dots, K$
- $N_{ij}$ – total number of jumps from state $i$ to state $j$, $i \ne j$, $i = 1, \dots, K$, $j = \Delta, 1, \dots, K$
EM-algorithm:
- E-step: calculation of the conditional expectation of $S$, given $y_1, \dots, y_n$ and the current estimates of $(q, R)$
- M-step: maximization of the likelihood $f(x; q, R)$
Note: estimation from censored data is performed in a similar way.

EMpht-program for fitting PH-distributions (Asmussen, Olsson)
- Fitting to a sample: non-censored, right-censored, or interval-censored
- Approximation of any continuous distribution on $[0, \infty)$ by minimizing the Kullback-Leibler information

Nonparametric Methods: Estimation of Service Time
Survival function $S(t) = 1 - F(t) = P(T > t)$:
$\hat{S}(t) = \dfrac{\text{number of calls still receiving service at time } t}{\text{total number of calls}} = \dfrac{1}{n} \sum_{i=1}^{n} 1(T_i > t)$
Density function $f(t)$:
$\hat{f}(t) = \dfrac{\text{number of calls ending service in the interval beginning at time } t}{(\text{total number of calls})(\text{interval width})}$
Kernel density estimator:
$\hat{f}(t) = \dfrac{1}{nh} \sum_{i=1}^{n} K\left(\dfrac{t - T_i}{h}\right)$, with $\int K(u)\,du = 1$
Hazard function:
$h(t) = \lim_{\Delta t \to 0} \dfrac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \dfrac{f(t)}{S(t)}$, $H(t) = -\log_e S(t)$
$\hat{h}(t) = \dfrac{\text{number of calls ending service in the interval beginning at time } t}{(\text{number of calls still receiving service at time } t)(\text{interval width})}$
Super-smoother: a nonparametric regression method based on a symmetric k-nearest-neighbor linear least squares procedure. Cross-validation is used to choose the value of k.

Service Times: The QUICK-HANG phenomenon!
[Figure: histograms of service time (density against time, 0 to 1000), panels labeled December and January; the annotated percentages are 5.7%, 5.5% and 5.2%]
Summary statistics shown in the panels:
- Mean = 184, SD = 230, CV = 1.25, Median = 113, N = 27091, Min = 1, Max = 5300
- Mean = 207, SD = 273, CV = 1.32, Median = 128, N = 34433, Min = 2, Max = 11868

Kernel Density Estimator of Service Time, December
[Figure: histogram and kernel density estimate of service time (density against time, 0 to 1000), annotated 5.2%; Mean = 207, SD = 273, CV = 1.32, Median = 128, N = 34433, Min = 2, Max = 11868]
Histogram with h = 10:
- easy to construct and interpret
- discontinuous estimator
- choice of bandwidth (h): a tradeoff between bias and variance
Kernel density estimator with a Gaussian kernel of width 30:
- continuous and smooth estimator
The shape is not exponential!
Density function:
- shows the proportion of customers departing from service in any time interval
- its peaks mark times with a high frequency of departure from service

Hazard Rate Estimation of Service Time
- Raw hazard rates: unstable as time increases
- Super-smoother: smooths the raw hazard rates up to time 1000; the rates at non-failure times are set to zero (to correct the behavior of the tail)
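The three service-time estimators above (empirical survival function, kernel density estimator, raw hazard rate) can be sketched in a few lines of standard-library Python. This is only an illustrative sketch: the function names and the small `sample` list are hypothetical and are not taken from the thesis or from the call-center data, and the super-smoother step is not included.

```python
import math

def empirical_survival(times, t):
    """S_hat(t) = (1/n) * #{T_i > t}: fraction of calls still in service at t."""
    return sum(1 for x in times if x > t) / len(times)

def gaussian_kde(times, t, h):
    """f_hat(t) = (1/(n*h)) * sum K((t - T_i)/h) with a Gaussian kernel K."""
    norm = 1.0 / math.sqrt(2.0 * math.pi)
    total = sum(norm * math.exp(-0.5 * ((t - x) / h) ** 2) for x in times)
    return total / (len(times) * h)

def raw_hazard(times, t, width):
    """Calls ending in [t, t + width) divided by
    (calls still in service at t) * (interval width)."""
    at_risk = sum(1 for x in times if x >= t)
    ending = sum(1 for x in times if t <= x < t + width)
    return ending / (at_risk * width) if at_risk else 0.0

# Hypothetical service times in seconds (illustration only, not real data).
sample = [12, 45, 60, 95, 113, 128, 150, 207, 310, 540]

print(empirical_survival(sample, 100))
print(gaussian_kde(sample, 120, h=30))
print(raw_hazard(sample, 100, width=50))
```

The bandwidth `h` plays the same role as on the slides: a small value follows the data closely (low bias, high variance), a large value smooths more.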

Design of Service Times
Service types:
- PS – regular activity
- IN – internet consulting
- NE – stock exchange activity
- NW – potential customer
Priority types:
- Low priority – regular customers
- High priority – stocks, V.I.P.
[Diagram labels: Welcome; Farewell]

Lognormal Distribution of Service Times (Mandelbaum, Sakov, Zeltyn, Wharton Business School)
[Figure: histogram of Service Times – December (density against time in sec., 0 to 1000) with a fitted Log-normal(μ=4.8, σ=1.03) density; histogram of ln(Service Times – December) with a fitted Normal(μ=4.8, σ=1.03) density]
Log-normal(μ=4.8, σ=1.03): E(LN) = 207, SD(LN) = 284, CV(LN) = 1.37

Nonparametric Methods: Estimation of Patience
Observed: T – the positive waiting time in queue until either
- SERVICE (the call ends with service – a censored observation), or
- ABANDON (the call ends with abandonment – a failure time)
Goal: estimate patience, which is censored whenever the call is served.
Survival analysis in the Kaplan-Meier (KM) setup: estimate the survival function $\hat{S}(t)$ and the hazard function $\hat{h}(t)$:
$\hat{S}(t) = \prod_{j:\, T_j \le t} \left(1 - \dfrac{A_j}{B_j}\right) = \prod_{j:\, T_j \le t} (1 - \hat{h}_j)$
Density function: $\hat{f}(t) = \hat{S}(t)\,\hat{h}(t)$
- $T_1 < T_2 < \dots < T_m$ – the ordered observed abandonment times
- $m$ – number of distinct abandonment times in the sample ($m \le n$)
- $B_j$ – number of customers still in queue at $T_j$
- $A_j$ – number of customers who abandon at $T_j$

Hazard rate for Patience – December, PS
[Figure: raw hazard rates of patience against time (0 to 300), with super-smoother curves fitted up to 200, 300 and 400; three regions I, II, III are marked along the time axis]
Heterogeneity of Customers: [diagram: branches 1, …, k entered with probabilities q1, q2, …, qk]
Phases of Patience: optimistic; accept reality; loss of patience
Reality – a message about the customer's place in queue and the time the first customer in line has been waiting.

Service Times – December: Fitted PH-distribution of order k=3
[Figure: four panels against time (0 to 1000): survival function, distribution function, density, hazard rate; fitted PH – dashed curve (blue), empirical – solid curve (green)]
Fitted mean = 207, Fitted SD = 253, CV = 1.22
$\lim_{t \to \infty} h(t) = \min(\lambda_1, \lambda_2, \lambda_3)$, where $\lambda_1, \lambda_2, \lambda_3$ denote the rates of the three phases.

Service Times: Internal structures of fitted PH-distribution
a) General PH-structures, starting with different initial values of the parameters
b) Coxian structure
[Diagrams: fitted phase structures]

Simultaneous CI around Empirical CDF
Resolution 1.36/√n, PH of order 3
[Figure: four zoomed panels of the distribution function over the time ranges 0–60, 60–110, 110–220 and 220–320; solid line – input, dashed line – fitted PH]

Service Times – December: Fitted PH of order k=2,3,4,5,6
[Figure: densities against time in sec. (0 to 1000); curves: empirical, k=2, k=3, k=4, k=5, k=6]

Service Times – December: Fitted PH of order k=2,3,5 (continued)
[Figure: survival functions against time in sec. (0 to 1000); curves: empirical, k=2, k=3, k=5]

Simultaneous CI around Empirical CDF
Resolution 1.36/√n, PH of order k=3,4,5
[Figure: distribution function against time in sec. (0 to 60); curves: k=3, k=4, k=5, Lower, Upper, empirical CDF]

Service Times – December: Summary of Goodness-of-fit tests
EDF tests measure the discrepancy between an empirical CDF and a hypothesized CDF F.
- H0: F(t) = F0(t), where F0(t) is a specific PH-distribution
- H0 is accepted if D* (A^2) ≤ cγ(a), where cγ(a) are the critical values of the K-S (A-D) tests

Sample size n = 34433
        k=2       k=3      k=4      k=5     k=6
D*      17.503    3.754    3.613    1.708   1.799
A^2     459.214   20.417   15.294   3.492   3.408
D       0.094     0.020    0.019    0.009   0.010

Sample size n = 3443
        k=2       k=3      k=4      k=5     k=6
D*      5.537     1.182    1.134    0.535   0.564
A^2     35.795    2.046    1.530    0.350   0.341
D       0.094     0.020    0.019    0.009   0.010

Service Times – December, by service types
Stochastic ordering: $X \le_{st} Y$ if $F^c(t) \le G^c(t)$ for all $t$, where $F^c$ and $G^c$ are the survival functions of $X$ and $Y$.
[Figure: survival functions and hazard functions against time in sec. (0 to 1000) for the service types NW, PS, NE, IN]
Mean (NW) = 128, Mean (PS) = 182, Mean (NE) = 285, Mean (IN) = 398

Patience – December – PS, PS for High and Low priorities
Empirical results
[Figure: survival functions (time in sec., 0 to 800) and hazard rates (time in sec., 0 to 180) for PS, High, Low]
Average wait (PS-High) = 91, Average wait (PS) = 99, Average wait (PS-Low) = 111

Patience – PS, December
Phase-type fits of a general Coxian structure
[Figure: survival functions (time in sec., 0 to 1100): Kaplan-Meier estimator and general Coxian fits of k=20, k=25, k=30; hazard rates (time in sec., 0 to 180): super smoother up to 200 and general Coxian fits of k=20, k=25, k=30]

Patience – PS, December, by priorities
Derived structures by fitting a Coxian structure of order 30
[Diagrams: derived Coxian structures for PS – Low Priority, PS, and PS – High Priority, annotated with phase values and branching probabilities]

Service Times: Approximation of Lognormal(μ=4.8, σ=1.03) by PH of order 3
[Figure: four panels against time (0 to 1000): survival function, distribution function, density function, hazard function; curves: fitted PH and Log-normal; plus a diagram of the fitted phase structure]
Fitted mean = 198, Fitted SD = 230, CV = 1.16
E(LN) = 207, SD(LN) = 284, CV(LN) = 1.37

Comparison between Log-normal and Phase-Type distributions
Objective:
$\min_{q,R} \int_0^\infty \left(f_{PH}(y) - f_{LN}(y)\right)^2 dy = \min_{q,R} \int_0^\infty \left( q \exp\{Ry\}\, r - \dfrac{1}{\sqrt{2\pi}\,\sigma y} \exp\left\{-\dfrac{(\ln y - \mu)^2}{2\sigma^2}\right\} \right)^2 dy$
for any specific order k of the PH-distribution, where $r = -R\mathbf{1}$ is the vector of exit rates.
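For a given pair (q, R), the objective above can be evaluated numerically. The following is a minimal standard-library sketch, not the Matlab or EMpht procedure used in the thesis: it computes f_PH(y) = q exp{Ry} r on a uniform grid and approximates the integral with the trapezoid rule on a truncated range. The function names, the truncation point `upper`, the grid size, and the Erlang-type example parameters are all assumptions made for illustration.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(a, terms=20):
    """exp(a) by scaling and squaring with a truncated Taylor series."""
    n = len(a)
    norm = max(sum(abs(v) for v in row) for row in a)
    squarings = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scale = 2.0 ** squarings
    scaled = [[v / scale for v in row] for row in a]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms + 1):
        term = [[v / m for v in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def ph_density_grid(q, R, step, n_points):
    """f_PH(y) = q exp(R y) r at y = 0, step, 2*step, ..., with r = -R 1."""
    k = len(q)
    r = [-sum(row) for row in R]
    step_matrix = mat_exp([[v * step for v in row] for row in R])
    v = list(q)
    values = []
    for _ in range(n_points):
        values.append(sum(v[i] * r[i] for i in range(k)))
        # Advance the row vector: q exp(R (y + step)) = (q exp(R y)) exp(R step)
        v = [sum(v[i] * step_matrix[i][j] for i in range(k)) for j in range(k)]
    return values

def lognormal_density(y, mu, sigma):
    """Lognormal density; zero for y <= 0."""
    if y <= 0:
        return 0.0
    z = (math.log(y) - mu) / sigma
    return math.exp(-0.5 * z * z) / (y * sigma * math.sqrt(2.0 * math.pi))

def squared_l2_distance(q, R, mu, sigma, upper=50.0, n_points=5001):
    """Trapezoid approximation of the integral of (f_PH - f_LN)^2 on [0, upper].
    Truncating at `upper` is an assumption; it must cover both tails."""
    step = upper / (n_points - 1)
    f_ph = ph_density_grid(q, R, step, n_points)
    diffs = [(f_ph[m] - lognormal_density(m * step, mu, sigma)) ** 2
             for m in range(n_points)]
    return step * (sum(diffs) - 0.5 * (diffs[0] + diffs[-1]))

# Hypothetical order-2 PH (Erlang type) compared with Lognormal(mu=1, sigma=0.5).
lam = 0.65
q_example = [1.0, 0.0]
R_example = [[-lam, lam], [0.0, -lam]]
print(squared_l2_distance(q_example, R_example, mu=1.0, sigma=0.5))
```

A minimizer over (q, R) would call `squared_l2_distance` inside a constrained optimizer; that search step is not shown here.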
Approaches:
- Method of Moments
- Optimization methods:
  - Constrained nonlinear minimization, using Matlab
  - Minimizing the information divergence, using EMpht

Comparison of the two optimization methods

Lognormal with μ = 1, σ = 0.5 (CV(LN) = 0.53, E(LN) = 3.08, SD(LN) = 1.63)

          Distance           CV                E                 SD
          Matlab   EMpht     Matlab   EMpht    Matlab   EMpht    Matlab   EMpht
k = 2     0.0321   0.0335    0.71     0.71     3.32     3.08     2.35     2.18
k = 3     0.0098   0.0099    0.58     0.58     3.04     3.08     1.76     1.78
k = 4     0.0023   0.0028    0.51     0.54     2.94     3.08     1.49     1.65
k = 5     0.0022   0.0006    0.51     0.53     2.95     3.08     1.51     1.63
k = 5*: Distance 0.0004, CV 0.53, E 3.02, SD 1.59

Lognormal with μ = 0, σ = 1 (CV(LN) = 1.31, E(LN) = 1.65, SD(LN) = 2.16)

          Distance           CV                E                 SD
          Matlab   EMpht     Matlab   EMpht    Matlab   EMpht    Matlab   EMpht
k = 2     0.0437   0.0469    1.00     1.31     1.63     1.65     1.63     2.16
k = 3     0.0437   0.0019    1.00     1.24     1.63     1.65     1.63     2.04
k = 4     0.0011   0.0013    1.07     1.29     1.53     1.65     1.64     2.12
k = 5     0.0011   0.0013    1.05     1.31     1.52     1.65     1.58     2.15
k = 5*: Distance 0.0011, CV 1.16, E 1.58, SD 1.83

Conclusions
Model for Service Times:
- A PH distribution of order 3 already provides a reasonable fit.
- Large samples require more phases for a perfect fit.
Model for Customers’ Patience:
- A PH distribution of order 30 with a Coxian structure provides a perfect fit.
- Only then can the model capture the peaks around 15 and 60 seconds.

Future research
- Ongoing: PH-models for patience, by different service types
- Advanced models for patience: a mixture of a PH-distribution with a small number of phases and two distributions with small variance that capture the peaks
- Analysis of data from other call centers
- Physical interpretation of the phases of service and patience