Fitting Phase-Type Distributions
to Data from a Telephone Call Center
by Eva Ishay
Supervisors: Dr. Eitan Greenstein
Prof. Avishai Mandelbaum
Outline
• Motivation
• Objective
• The Research and Results
  - The Data
  - Flow chart
  - Selecting the Right Model
• Conclusions
• Future Research
Motivation

The world of call centers is vast.
• “70% of all customer-business interactions occur in call centers”;
• “$700 billion in goods and services were sold through call centers in 1997”;
• “3% of the U.S. working population is currently employed in call centers”;
• “anywhere from 200,000 to 350,000 call centers, which employ anywhere between 4 to 6.5 million people”.
Motivation (continued…)
• Call center data
  - Prevalent call center data is averaged only over periods of fixed duration.
  - Data at the individual call-transaction level was recently collected (Mandelbaum, Sakov, Zeltyn).
  - Design, management and optimization of performance are possible only through system modeling and deep analysis of the data supporting the model.
Objective
• Analysis of Service Times and Customers’ Patience
• Fitting Phase-Type (PH) distributions
• Comparison & choosing a MODEL
Process of a Customer Interaction with a Call Center:
[Diagram: a customer joins the system through the VRU and waits in queue; the customer either reaches a service agent (End of Service) or abandons the system.]
Thesis Flow Chart:
[Flow chart: DATA (1. Service time, 2. Customers’ Patience) feeds two branches. Non-parametric estimation (1. Kernel Density estimator, 2. Kaplan-Meier technique) produces graphs. PH-distribution fitting (choosing order & structure, then the EMpht program, which runs the EM-algorithm) produces graphs and a structure. The two outputs go to Comparison (1. Visual, 2. Confidence Interval, 3. Goodness-of-fit tests). On YES the model is chosen; on NO the order & structure are chosen again.]
The Data
• Service Times – the positive time a customer spends with an agent, until departure from the service/system
• Customers’ Patience – the time a customer is willing to wait in queue before being served
Phase-Type Distributions (PH)
• Definition: the absorption time of a finite-state, continuous-time Markov chain with a single absorbing state $\Delta$.
$T = \inf\{ t \ge 0 : X_t = \Delta \}$ has a PH-distribution, with distribution function
$$F_T(t) = 1 - q\, e^{tR}\, \mathbf{1}, \quad t \ge 0$$
  - $X = \{X_t,\ t \ge 0\}$ is a Markov process on the states $\{1, 2, \dots, K, \Delta\}$
  - $q = (q_i)_{i=1,\dots,K}$, where $q_i$ is the probability of starting in state $i$ ($q_\Delta = 0$)
  - $R$ is the generator of the Markov chain
• Note: the representation via $(q, R)$ is non-unique.
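The distribution function above can be evaluated numerically. The following is a minimal sketch, assuming NumPy is available and that `R` is passed as the K x K block of the generator over the transient states; the function name `ph_cdf` and the use of uniformization (instead of a library matrix exponential) are choices made for this illustration, not something taken from the slides.

```python
import math
import numpy as np

def ph_cdf(t, q, R, tol=1e-12):
    """Evaluate F(t) = 1 - q exp(tR) 1 by uniformization.

    q : initial probabilities over the K transient states
    R : K x K generator block over the transient states (assumed input format)
    Intended for moderate values of rate * t; exp(-rate * t) underflows otherwise.
    """
    q = np.asarray(q, dtype=float)
    R = np.asarray(R, dtype=float)
    rate = max(-R.diagonal())          # uniformization rate, at least every exit rate
    P = np.eye(len(q)) + R / rate      # sub-stochastic jump matrix, R = rate * (P - I)
    v = np.ones(len(q))                # holds P^n 1, starting with n = 0
    weight = math.exp(-rate * t)       # Poisson(rate * t) probability of n = 0 jumps
    survival, mass, n = 0.0, 0.0, 0
    while mass < 1.0 - tol and n < 100_000:
        survival += weight * (q @ v)   # adds Poisson weight times q P^n 1
        mass += weight
        n += 1
        weight *= rate * t / n
        v = P @ v
    return 1.0 - survival
```

For a single phase with rate 2, `ph_cdf(1.0, [1.0], [[-2.0]])` reproduces the exponential distribution function 1 - exp(-2).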
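Representation of PH-distribution
General structure
[Diagram: phases i, j, h, k with initial probabilities q_i = P(X_0 = i), transition rates R_{i,j} between phases, and a plot of the hazard h(t) against t.]
Special cases
• Erlang distribution: [Diagram: phases 1, 2, 3, …, K in series; plot of h(t) against t.]
• Hyperexponential distribution: [Diagram: parallel phases 1, …, k entered with probabilities q_1, …, q_k; plot of h(t) against t.]
• Coxian distribution: [Diagram: phases 1, 2, …, k-1, k in series; from phase 1 the process continues to phase 2 with probability p_12 or is absorbed with probability 1-p_12, and so on up to p_{k-1,k}.]
• Erlang mixtures: [Diagram: parallel Erlang branches entered with probabilities q_1, q_2, …, q_k.]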
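The special cases listed above differ only in the shape of the pair (q, R). The sketch below builds that pair for three of them and computes the mean through the identity E[T] = -q R^{-1} 1. It assumes NumPy; the function names and argument conventions (for example `cont_probs` for the continuation probabilities of a Coxian chain) are hypothetical and chosen for this illustration.

```python
import numpy as np

def erlang(phases, rate):
    """Phases in series, all with the same rate; start in phase 1."""
    q = np.zeros(phases)
    q[0] = 1.0
    R = -rate * np.eye(phases) + rate * np.eye(phases, k=1)
    return q, R

def hyperexponential(probs, rates):
    """Parallel phases: phase i is entered with probs[i] and left at rates[i]."""
    q = np.asarray(probs, dtype=float)
    R = -np.diag(np.asarray(rates, dtype=float))
    return q, R

def coxian(rates, cont_probs):
    """Phases in series; after phase i continue with cont_probs[i], else absorb."""
    phases = len(rates)
    q = np.zeros(phases)
    q[0] = 1.0
    R = -np.diag(np.asarray(rates, dtype=float))
    for i, p in enumerate(cont_probs):
        R[i, i + 1] = p * rates[i]
    return q, R

def ph_mean(q, R):
    """Mean absorption time E[T] = -q R^{-1} 1."""
    return float(-q @ np.linalg.solve(R, np.ones(len(q))))
```

For instance, `ph_mean(*erlang(3, 2.0))` gives the Erlang mean of three phases divided by the rate, that is 1.5.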
Why do we use PH-distributions as Statistical Models?
• dense: for every non-negative distribution G, there exists a sequence of PH-distributions $F_n$ such that $F_n \xrightarrow{w} G$ (weak convergence)
• structurally informative: versatile for modeling and computationally tractable
• underlying processes:
  - modeling underlying stages of the service
  - understanding customer behavior by modeling patience
EM-algorithm (Expectation-Maximization)
• EM algorithm – an iterative method for maximum likelihood estimation
• Goal: estimate the parameter $\gamma$, where $X \sim f_\gamma$
• Problem: X is unobservable
• Principle: augment the observed data Y with latent data X (“missing” data)
  - $Y = u(X)$ with density $g_\gamma$
  - $X$ with density $f_\gamma$ (complete data)
  - $\gamma_n$ is the current estimate after n steps
  - Step n+1 consists of finding $\gamma_{n+1}$, which maximizes
    $$\gamma \mapsto E_{\gamma_n}\big[\log f_\gamma(X) \mid u(X) = y\big]$$
  - E-step: evaluation of the conditional expectation
  - M-step: maximization
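To make the E-step and M-step concrete, here is a minimal sketch of EM for a mixture of exponentials (the hyperexponential special case of a PH-distribution), where the latent data are the unobserved component labels. This is an illustration under that assumption only; it is not the EMpht program, and the function names are hypothetical.

```python
import math
import random

def em_hyperexponential(y, weights, rates, n_iter=200):
    """EM for a mixture of exponentials, starting from the given weights and rates."""
    w, lam = list(weights), list(rates)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each observation
        resp = []
        for yi in y:
            dens = [wk * lk * math.exp(-lk * yi) for wk, lk in zip(w, lam)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: closed-form maximizers of the expected complete-data log-likelihood
        for k in range(len(w)):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(y)
            lam[k] = nk / sum(r[k] * yi for r, yi in zip(resp, y))
    return w, lam

def log_likelihood(y, w, lam):
    """Observed-data log-likelihood of the mixture."""
    return sum(
        math.log(sum(wk * lk * math.exp(-lk * yi) for wk, lk in zip(w, lam)))
        for yi in y
    )

# Synthetic data (illustrative values): 30% short durations, 70% long durations
random.seed(0)
data = [random.expovariate(5.0) if random.random() < 0.3 else random.expovariate(0.2)
        for _ in range(500)]
fitted_w, fitted_rates = em_hyperexponential(data, [0.5, 0.5], [1.0, 0.1])
```

Each EM iteration cannot decrease the observed-data likelihood, so comparing `log_likelihood` before and after fitting is a cheap sanity check.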
Estimation of PH-distributions via the EM-algorithm
• observed: $y_1, \dots, y_n$, the times to absorption – an incomplete observation of the Markov process X(t)
• unobserved: $X_t^{[1]}, \dots, X_t^{[n]}$ – n independent replications of the underlying process
• then the complete-data likelihood is
$$f(x; q, R) = \prod_{i=1}^{K} q_i^{B_i} \; \prod_{i=1}^{K} \exp\{R_{ii} Z_i\} \; \prod_{i=1}^{K} \prod_{\substack{j \in \{\Delta, 1, \dots, K\} \\ j \ne i}} R_{ij}^{N_{ij}}$$
a multi-parameter exponential family, with sufficient statistic
$$S = \big( (B_i)_{i=1,\dots,K},\ (Z_i)_{i=1,\dots,K},\ (N_{ij})_{i=1,\dots,K,\ j=\Delta,1,\dots,K,\ i \ne j} \big)$$
  - $B_i$ – number of Markov processes starting in state i, i=1,…,K
  - $Z_i$ – total time spent in state i, i=1,…,K
  - $N_{ij}$ – total number of jumps from state i to state j, i≠j, i=1,…,K, j=Δ,1,…,K (Δ denotes the absorbing state)
• EM-algorithm
  - E-step: calculation of the conditional expectation of S, given $y_1, \dots, y_n$ and the current estimates of (q, R)
  - M-step: maximization of the likelihood f(x; q, R)
• Note: estimation from censored data is performed in a similar way.
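For an exponential family of this shape the M-step has a closed form. The display below is a sketch of the standard maximizers, stated here as background rather than taken from the slides: n is the number of observations, Δ denotes the absorbing state, and within the EM iteration the statistics B_i, Z_i and N_ij are replaced by their conditional expectations from the E-step.

```latex
\hat q_i = \frac{B_i}{n}, \qquad
\hat R_{ij} = \frac{N_{ij}}{Z_i} \quad (j \ne i,\ j \in \{\Delta, 1, \dots, K\}), \qquad
\hat R_{ii} = -\sum_{j \ne i} \hat R_{ij}.
```

The first expression follows from maximizing $\sum_i B_i \log q_i$ subject to $\sum_i q_i = 1$; the second from maximizing $-Z_i R_{ij} + N_{ij} \log R_{ij}$ separately for each pair (i, j).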
EMpht-program for fitting PH-distributions (Asmussen, Olsson)
• Sample
  - Non-censored
  - Right-censored
  - Interval-censored
• Approximation of any continuous distribution on [0,∞) by minimizing the Kullback-Leibler information
Nonparametric Methods: Estimation of Service Time
• Survival function S(t) = 1 - F(t) = P(T > t)
$$\hat S(t) = \frac{\text{number of calls still receiving service at time } t}{\text{total number of calls}} = \frac{1}{n} \sum_{i=1}^{n} 1(T_i > t)$$
• Density function f(t)
$$\hat f(t) = \frac{\text{number of calls ending service in the interval beginning at time } t}{(\text{total number of calls})(\text{interval width})}$$
Kernel density estimator:
$$\hat f(t) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{t - T_i}{h}\right), \qquad \int K(u)\,du = 1$$
• Hazard function
$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}, \qquad h(t) = \frac{f(t)}{S(t)}, \qquad H(t) = -\log_e S(t)$$
$$\hat h(t) = \frac{\text{number of calls ending service in the interval beginning at time } t}{(\text{number of calls still receiving service at time } t)(\text{interval width})}$$
• Super-smoother: a nonparametric regression method based on a symmetric k-nearest-neighbor linear least-squares procedure. Cross-validation is used to choose the value of k.
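The three estimators above translate almost directly into code. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, a Gaussian kernel is assumed for the kernel density estimator, and the super-smoother step is not included.

```python
import math

def empirical_survival(times, t):
    """Fraction of calls still receiving service at time t."""
    return sum(1 for x in times if x > t) / len(times)

def kernel_density(times, t, h):
    """Kernel density estimate at t with a Gaussian kernel and bandwidth h."""
    norm = 1.0 / (len(times) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((t - x) / h) ** 2) for x in times)

def raw_hazard(times, t, width):
    """Calls ending in [t, t + width) divided by (calls at risk at t) * width."""
    at_risk = sum(1 for x in times if x >= t)
    ending = sum(1 for x in times if t <= x < t + width)
    return ending / (at_risk * width) if at_risk else float("nan")
```

With the toy sample `[1, 2, 3, 4]`, `empirical_survival(sample, 2)` is 0.5 and `raw_hazard(sample, 2, 1)` is 1/3, since one of the three calls still in service at time 2 ends within the next unit of time.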
Service Times: The QUICK-HANG phenomenon!
[Figure: two histograms of service-time density against time (0–1000), one for January and one for December; percentages of 5.7%, 5.5% and 5.2% are marked on the plots.]
January: Mean = 184, SD = 230, CV = 1.25, Median = 113, N = 27091, Min = 1, Max = 5300
December: Mean = 207, SD = 273, CV = 1.32, Median = 128, N = 34433, Min = 2, Max = 11868
Kernel Density Estimator of Service Time
[Figure: December service-time density against time (0–1000), shown as a histogram and as a kernel density estimate; a 5.2% peak is marked. Mean = 207, SD = 273, CV = 1.32, Median = 128, N = 34433, Min = 2, Max = 11868.]
Histogram with h = 10
• easy to construct and interpret
• discontinuous estimator
• choice of bandwidth (h): tradeoff between bias and variance
Kernel density estimator with a Gaussian kernel of width = 30
• continuous and smooth estimator
Shape is not exponential!
Density function
• the proportion of customers that depart from service in any time interval
• the peaks of high frequency of departure from service
Hazard Rate Estimation of Service Time
Raw hazard rates
• unstable as time increases
Super-smoother
• smooths the raw hazard rates up to 1000
• non-failure times are zero (to correct the behavior of the tail).
Design of Service Times
Service types:
• PS – regular activity
• IN – internet consulting
• NE – stock exchange activity
• NW – potential customer
Priority types:
• Low priority – regular customers
• High priority – stocks, V.I.P.
[Diagram labels: Welcome, Farewell]
Lognormal Distribution of Service Times
(Mandelbaum, Sakov, Zeltyn, Wharton Business School)
[Figure, left panel: density of Service Times – December against time in sec. (0–1000), with the fitted Log-normal(μ=4.8, σ=1.03). Right panel: density of ln(Service Times – December), with the fitted Normal(μ=4.8, σ=1.03).]
E(LN) = 207
SD(LN) = 284
CV(LN) = 1.37
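The lognormal summary values quoted above follow from the standard moment formulas E = exp(μ + σ²/2) and CV = sqrt(exp(σ²) - 1). The short sketch below recomputes them from μ = 4.8 and σ = 1.03; the function name is hypothetical.

```python
import math

def lognormal_moments(mu, sigma):
    """Return (mean, standard deviation, coefficient of variation) of a lognormal."""
    mean = math.exp(mu + sigma ** 2 / 2.0)
    cv = math.sqrt(math.exp(sigma ** 2) - 1.0)
    return mean, mean * cv, cv

mean, sd, cv = lognormal_moments(4.8, 1.03)
print(round(mean), round(sd), round(cv, 2))
```

The results agree with the slide's E(LN), SD(LN) and CV(LN) after rounding.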
Nonparametric Methods: Estimation of Patience
• Observed T – positive waiting time in queue until
  - SERVICE (call ends with service – censored observation)
  - ABANDON (call ends with abandonment – failure time)
• Goal: estimate patience, which is censored when the call is served.
• Survival analysis in the Kaplan-Meier (KM) setup
• Estimate the survival function $\hat S(t)$ and the hazard function $\hat h(t)$:
$$\hat S(t) = \prod_{j:\, T_j \le t} \left(1 - \frac{A_j}{B_j}\right) = \prod_{j:\, T_j \le t} \big(1 - \hat h_j\big)$$
• Density function
$$\hat f(t) = \hat S(t) \cdot \hat h(t)$$
  - $T_1 < T_2 < \dots < T_m$ – ordered observed abandonment times
  - m – number of distinct abandonment times in the sample (m ≤ n)
  - $B_j$ – number of customers still in queue at $T_j$
  - $A_j$ – number of customers who abandon at $T_j$
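The product-limit formula above can be sketched in a few lines. The code below uses only the standard library and assumes the data arrive as a list of waiting times plus a parallel list of flags (True for abandonment, False for a served, hence censored, call); these names and the input layout are hypothetical.

```python
def kaplan_meier(times, abandoned):
    """Return a list of (T_j, hazard_j, survival_j) at the distinct abandonment times."""
    event_times = sorted({t for t, a in zip(times, abandoned) if a})
    survival, curve = 1.0, []
    for tj in event_times:
        at_risk = sum(1 for t in times if t >= tj)                            # B_j
        failures = sum(1 for t, a in zip(times, abandoned) if a and t == tj)  # A_j
        hazard = failures / at_risk
        survival *= 1.0 - hazard
        curve.append((tj, hazard, survival))
    return curve

# Toy data: waits of 5, 10, 10, 20, 30 seconds; the calls marked False were served
curve = kaplan_meier([5, 10, 10, 20, 30], [True, False, True, True, False])
```

Censored calls never count as failures but stay in the at-risk set until their own waiting time has passed, which is what lets served calls still contribute information about patience.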
Hazard rate for Patience – December, PS
[Figure: raw hazard rates of patience (points) against time (0–300), with super-smoother curves fitted up to 200, 300 and 400. Three regions are marked I, II and III; the plot is annotated with the labels “optimistic”, “accept reality” and “loss of patience”.]
• Heterogeneity of Customers [Diagram: parallel phases entered with probabilities q_1, q_2, …, q_k.]
• Phases of Patience [Diagram: phases 1, 2, 3, …, K in series.]
• Reality – a message about the customer’s place in queue and the time the first customer in line has been waiting.
Service Times - December
Fitted PH-distribution of order k=3
[Figure, four panels against time (0–1000): survival function, distribution function, density (×10^-3) and hazard rate (×10^-3). Fitted PH – dashed curve (blue); empirical – solid curve (green).]
• Fitted mean = 207
• Fitted SD = 253
• CV = 1.22
$$\lim_{t \to \infty} h(t) = \min(\lambda_1, \lambda_2, \lambda_3)$$
Service Times:
Internal structures of fitted PH-distribution
a) General PH-structures, starting with different initial values of parameters:
b) Coxian structure:
Simultaneous CI around Empirical CDF
Resolution 1.36/√n, PH of order 3
[Figure: four panels of the distribution function over the time ranges 0–60, 60–110, 110–220 and 220–320; solid line – input, dashed line – fitted PH.]
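A band of half-width 1.36/√n around the empirical CDF can be sketched as follows. The constant 1.36 is read here as the usual large-sample 5% Kolmogorov-Smirnov critical value, which is an assumption since the slide does not state the level; the function name and the clipping to [0, 1] are likewise choices made for this illustration.

```python
import bisect
import math

def ks_band(sample, points, z=1.36):
    """Return (t, lower, upper) with lower/upper = ECDF(t) -/+ z / sqrt(n), clipped to [0, 1]."""
    xs = sorted(sample)
    n = len(xs)
    half_width = z / math.sqrt(n)
    band = []
    for t in points:
        ecdf = bisect.bisect_right(xs, t) / n   # share of observations <= t
        band.append((t, max(0.0, ecdf - half_width), min(1.0, ecdf + half_width)))
    return band
```

For the December sample size n = 34433 the half-width is about 0.0073, which is why the plots have to zoom in on short time ranges to show the band at all.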
Service Times – December
Fitted PH of order k=2,3,4,5,6
[Figure: densities (×10^-3) against time in sec. (0–1000); empirical curve and fitted PH of order k=2, 3, 4, 5, 6.]
Service Times – December
Fitted PH of order k=2,3,5 (continued…)
[Figure: survival functions against time in sec. (0–1000); empirical curve and fitted PH of order k=2, 3, 5.]
Simultaneous CI around Empirical CDF
Resolution 1.36/√n, PH of order k=3,4,5
[Figure: distribution function (0–0.2) against time in sec. (0–60); empirical CDF with lower and upper bounds, and fitted PH of order k=3, 4, 5.]
Service Times - December
Summary of Goodness-of-fit tests
• EDF tests – measure the discrepancy between an empirical CDF and a hypothesized CDF F
• H0: F(t) = F0(t), where F0(t) is a specific PH-distribution
• H0 is accepted if D* (A²) ≤ cγ(a), where cγ(a) are the critical values of the K-S (A-D) tests

n = 34433
       k=2       k=3      k=4      k=5     k=6
D*     17.503    3.754    3.613    1.708   1.799
A²     459.214   20.417   15.294   3.492   3.408
D      0.094     0.020    0.019    0.009   0.010

n = 3443
       k=2       k=3      k=4      k=5     k=6
D*     5.537     1.182    1.134    0.535   0.564
A²     35.795    2.046    1.530    0.350   0.341
D      0.094     0.020    0.019    0.009   0.010
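The EDF statistics in the tables can be computed as sketched below, using only the standard library. D and A² follow their textbook definitions for a fully specified F0. The slides do not say how D* is obtained from D, so `modified_ks` uses Stephens' modification D(√n + 0.12 + 0.11/√n) as an assumption; all function names are hypothetical.

```python
import math

def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov D = sup |F_n(t) - F0(t)| for a fully specified cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)   # check both sides of each jump
    return d

def modified_ks(d, n):
    """Stephens' modified statistic (assumed form of D*)."""
    return d * (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n))

def anderson_darling(sample, cdf):
    """Anderson-Darling A^2; requires 0 < cdf(x) < 1 at every observation."""
    xs = sorted(sample)
    n = len(xs)
    s = sum(
        (2 * i + 1) * (math.log(cdf(xs[i])) + math.log(1.0 - cdf(xs[n - 1 - i])))
        for i in range(n)
    )
    return -n - s / n
```

Because D* grows like √n for a fixed discrepancy D, a fit with the same D is rejected far more easily at n = 34433 than at n = 3443, which matches the pattern in the two tables.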
Service Times - December, by service types
• Stochastic ordering: $X \le_{st} Y$ if $F^c(t) \le G^c(t)$ for all t
[Figure: survival function and hazard function against time in sec. (0–1000) for the service types NW, PS, NE and IN.]
• Mean (NW) = 128
• Mean (PS) = 182
• Mean (NE) = 285
• Mean (IN) = 398
Patience – December – PS, PS for High and Low priorities
Empirical results
[Figure: survival functions against time in sec. (0–800) and hazard rates (×10^-3) against time in sec. (0–180), each for PS, High and Low.]
• Average wait (PS-High) = 91
• Average wait (PS) = 99
• Average wait (PS-Low) = 111
Patience – PS, December
Phase-type fits of a general Coxian structure
[Figure, left panel: survival functions against time in sec. (0–1100); Kaplan-Meier estimator and general Coxian fits of order k=20, 25, 30. Right panel: hazard rates (×10^-3) against time in sec. (0–180); super smoother up to 200 and general Coxian fits of order k=20, 25, 30.]
Patience – PS, December, by priorities
Derived structures by fitting Coxian structure of order 30
[Diagrams: fitted Coxian structures for PS – Low Priority, PS, and PS – High Priority, annotated with phase durations and branching probabilities; the assignment of the individual values to phases is not recoverable from the extracted text.]
Service Times:
Approximation of Lognormal(μ=4.8, σ=1.03) by PH of order 3
[Figure, four panels against time (0–1000): survival function, distribution function, density function (×10^-3) and hazard function (×10^-3); fitted PH versus Log-normal. The accompanying diagrams of the fitted PH structures (phase durations in sec. and branching probabilities) are not recoverable from the extracted text.]
• Fitted mean = 198
• Fitted SD = 230
• CV = 1.16
• E(LN) = 207
• SD(LN) = 284
• CV(LN) = 1.37
Comparison between Log-normal and Phase-Type distributions
• Objective
$$\min_{q,R} \int_0^\infty \big(f_{PH}(y) - f_{LN}(y)\big)^2 dy = \min_{q,R} \int_0^\infty \left( q \exp\{Ry\}\, r - \frac{1}{y \sigma \sqrt{2\pi}} \exp\left\{ -\frac{(\ln y - \mu)^2}{2\sigma^2} \right\} \right)^2 dy$$
for any specific order k of the PH-distribution.
• Method of Moments
• Optimization methods
  - Constrained nonlinear minimization, using Matlab
  - Minimizing the information divergence, using EMpht
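The objective above can be evaluated numerically for a candidate PH-distribution. The sketch below does so for the hyperexponential special case, where the PH density reduces to a weighted sum of exponential densities, and integrates with the trapezoid rule. The truncation point `upper`, the step count and the function names are assumptions made for this illustration; the optimization over (q, R) itself is not shown.

```python
import math

def lognormal_pdf(y, mu, sigma):
    """Density of a lognormal(mu, sigma) distribution; zero for y <= 0."""
    if y <= 0.0:
        return 0.0
    z = (math.log(y) - mu) / sigma
    return math.exp(-0.5 * z * z) / (y * sigma * math.sqrt(2.0 * math.pi))

def hyperexp_pdf(y, q, rates):
    """Density of a hyperexponential PH: sum_i q_i * rate_i * exp(-rate_i * y)."""
    return sum(qi * li * math.exp(-li * y) for qi, li in zip(q, rates))

def squared_l2_distance(q, rates, mu, sigma, upper=50.0, steps=50_000):
    """Trapezoid-rule value of the integral of (f_PH - f_LN)^2 over [0, upper]."""
    h = upper / steps
    total = 0.0
    for k in range(steps + 1):
        diff = hyperexp_pdf(k * h, q, rates) - lognormal_pdf(k * h, mu, sigma)
        total += (0.5 if k in (0, steps) else 1.0) * diff * diff
    return total * h
```

As a usage example, a single exponential phase whose mean matches the lognormal mean exp(μ + σ²/2) lies much closer to Lognormal(μ=1, σ=0.5) than an exponential with rate 10, so `squared_l2_distance` can rank candidate parameter values before handing the problem to an optimizer.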
Comparison of the two optimization methods

μ = 1, σ = 0.5 (lognormal target: CV(LN) = 0.53, E(LN) = 3.08, SD(LN) = 1.63)
        Distance          CV                E                 SD
        Matlab   EMpht    Matlab   EMpht    Matlab   EMpht    Matlab   EMpht
k=2     0.0321   0.0335   0.71     0.71     3.32     3.08     2.35     2.18
k=3     0.0098   0.0099   0.58     0.58     3.04     3.08     1.76     1.78
k=4     0.0023   0.0028   0.51     0.54     2.94     3.08     1.49     1.65
k=5     0.0022   0.0006   0.51     0.53     2.95     3.08     1.51     1.63
k = 5*: Distance = 0.0004, CV = 0.53, E = 3.02, SD = 1.59

μ = 0, σ = 1 (lognormal target: CV(LN) = 1.31, E(LN) = 1.65, SD(LN) = 2.16)
        Distance          CV                E                 SD
        Matlab   EMpht    Matlab   EMpht    Matlab   EMpht    Matlab   EMpht
k=2     0.0437   0.0469   1.00     1.31     1.63     1.65     1.63     2.16
k=3     0.0437   0.0019   1.00     1.24     1.63     1.65     1.63     2.04
k=4     0.0011   0.0013   1.07     1.29     1.53     1.65     1.64     2.12
k=5     0.0011   0.0013   1.05     1.31     1.52     1.65     1.58     2.15
k = 5*: Distance = 0.0011, CV = 1.16, E = 1.58, SD = 1.83
Conclusions
• Model for Service Times:
  - PH of order 3 already provides a reasonable fit.
  - Large samples require more phases for a perfect fit.
• Model for Customers’ Patience:
  - PH of order 30 with a Coxian structure provides a perfect fit. Only then can it capture the peaks around 15 and 60 seconds.
Future research
• Ongoing: PH-models for patience, by different service types
• Advanced models for patience: a mixture of a PH-distribution with a small number of phases and two distributions with small variance that capture the peaks
• Analysis of data from other call centers
• Physical interpretation of the phases of service and patience.