Optimal Dynamic Pricing Policy for M/M/K Queue

733 PROJECT REPORT by Qi Zhao
Abstract
In this report, I analyze how to maximize the long-run total discounted reward in a
queuing system. The problem can be formulated as a Markov decision process. The
control is the price: raising or lowering it discourages or encourages customer
arrivals. Customers arrive according to a Poisson process whose rate is a decreasing
function of the current price. The service times are independent, exponentially
distributed random variables with a fixed service rate. The total profit consists of
the customers' payments minus a holding cost per customer per unit time, accumulated
over time. This is a continuous-time Markov chain problem. By the uniformization
method, it can be converted into an equivalent discrete-time Markov chain. Whenever
an event (a customer arrival or a service completion) occurs, the manager posts a
price that stays in effect until the next event. We can prove that there exists an
optimal stationary policy for the infinite horizon that maximizes the long-run reward.
In addition, I show that the optimal price is a non-decreasing function of the number
of customers in the system. Value iteration and policy iteration are efficient
solution approaches for this problem.
(Besides the analytical description and MATLAB simulation in this report, I also
wrote a Java applet to validate the idea proposed in this project. It is available at
http://people.bu.edu/zhaoqi/discrete%20project/disProject.html. If you cannot run it,
please let me know and I will find another way to show it to you.)
1. Introduction:
In daily life, pricing is an effective way to control queues. For example, higher
daytime parking prices downtown reduce traffic jams, surcharges discourage excessive
electricity use, and higher tuition limits the number of students applying to a
university. For an owner or manager, however, reducing the queue is not the only
objective. The main objective is to maximize the long-run reward or profit through a
proper pricing policy.
We should carefully balance the consequences of a price change. If the manager raises
the price at some point in time, the arrival rate falls and the holding cost of the
system tends to decrease. Conversely, if the manager lowers the price, the arrival
rate rises and the holding cost tends to increase as the queue grows. We therefore
seek an optimal policy that yields the maximum profit. David W. Low (1972) described
many different cases of this problem [1].
2. Problem formulation:
Let us first consider a reward system with an unlimited-capacity queue. A service
facility in an open market with infinite capacity offers service to customers at a
charged price 𝑝𝑖. There is one queue and one server. The service times are
exponentially distributed with fixed mean 1/μ. We assume that once the manager sets
the price, all customers can see it and it affects the arrival rate to the queue
immediately. Customers arrive at the system according to a Poisson process with rate
𝜆. A customer enters the system if his reservation price R (the maximum price the
customer is willing to pay) exceeds the price charged. We assume that R is a random
variable following a uniform distribution with distribution function $U_R(\cdot)$.
At each customer arrival or service completion, the decision maker must choose a price
to charge. The price is chosen from the range [𝑝𝑚𝑖𝑛 , 𝑝𝑚𝑎𝑥 ], which can be treated as
a discrete set. Accordingly, a customer does not enter the system with probability
$U_R(p)$ and enters with probability $1 - U_R(p)$. Moreover, the queue incurs a
waiting cost h per customer per unit time. The queue follows a first-come,
first-served discipline with no priorities [2].
Figure 1. One queue, one server model
We define the state, the number of customers in the system, on the set of
nonnegative integers S = {x(t) : x(t) = 0, 1, …, N}. We denote by $J_\pi(x(t))$ the
total expected discounted profit of the infinite-horizon Markov decision process
under the control policy π. The state and the control at any time t are denoted by
x(t) and p(t), respectively, and they stay constant between transitions. The
uniformization method I use follows [3,4].
We use the following notation:
$t_k$: the time of occurrence of the k-th transition. By convention, $t_0 = 0$.
$\tau_k = t_k - t_{k-1}$: the k-th transition time interval.
$x_k = x(t_k)$: we have $x(t) = x_k$ for $t_k \le t < t_{k+1}$.
$p_k = p(t_k)$: we have $p(t) = p_k$ for $t_k \le t < t_{k+1}$.
Then the expected total discounted profit of the infinite-horizon Markov decision
process under the control policy π can be written as
$$\lim_{N\to\infty} E\left\{ \int_0^{t_N} e^{-\beta t}\, g(x(t), p(t))\, dt \right\}$$
where g is the per-period cost function and β is a given positive discount parameter.
Since the state stays constant between transitions, the cost function of π, starting
from the initial state $x_0$, is given by
$$J_\pi(x_0) = \sum_{k=0}^{\infty} E\left\{ \int_{t_k}^{t_{k+1}} e^{-\beta t}\, g(x(t), p(t))\, dt \,\Big|\, x_0 \right\}$$
For a sequence $\{(x_k, p_k)\}$, the cost function can be expressed as:
$$\lim_{N\to\infty} E\left\{ \int_0^{t_N} e^{-\beta t}\, g(x(t), p(t))\, dt \right\} = \sum_{k=0}^{\infty} E\left\{ \int_{t_k}^{t_{k+1}} e^{-\beta t}\, dt \right\} E\{ g(x_k, p_k) \}$$
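If the arrival rate λ and the service rate μ are both fixed, we have that:
$$
\begin{aligned}
E\left\{ \int_{t_k}^{t_{k+1}} e^{-\beta t}\, dt \right\}
&= \frac{1}{\beta}\, E\{e^{-\beta t_k}\}\,\big(1 - E\{e^{-\beta \tau_{k+1}}\}\big) \\
&= \frac{1}{\beta}\, E\{e^{-\beta(\tau_1 + \dots + \tau_k)}\}\,\big(1 - E\{e^{-\beta \tau_{k+1}}\}\big) \\
&= \frac{1}{\beta}\, a^k (1 - a)
\end{aligned}
$$
where
$$a = E\{e^{-\beta\tau}\} = \frac{\lambda + \mu}{\beta + \lambda + \mu}.$$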
Then we get that
$$\sum_{k=0}^{\infty} E\left\{ \int_{t_k}^{t_{k+1}} e^{-\beta t}\, dt \right\} E\{ g(x_k, p_k) \} = \frac{1}{\beta + \lambda + \mu} \sum_{k=0}^{\infty} a^k\, E\{ g(x_k, p_k) \}$$
Thus, a continuous-time Markov chain problem with cost
$$\lim_{N\to\infty} E\left\{ \int_0^{t_N} e^{-\beta t}\, g(x(t), p(t))\, dt \right\}$$
and a transition rate 𝜆 + 𝜇 that is independent of the state is equivalent to a
discrete-time Markov chain problem with discount factor
$$a = \frac{\lambda + \mu}{\beta + \lambda + \mu}$$
and cost per stage given by
$$\tilde{g}(x_k, p_k) = \frac{1}{\beta + \lambda + \mu}\, g(x_k, p_k)$$
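The two uniformization constants above can be checked numerically. The following is a minimal sketch, not part of the original report: the function name `uniformized_constants` is hypothetical, and the sample values (β = 1, λ = 5, μ = 10) are the ones used in the numerical example of Section 4.

```python
def uniformized_constants(beta, lam, mu):
    """Return (discount factor a, per-stage cost scale) after uniformization.

    Assumes a single uniform transition rate nu = lam + mu, as in the text.
    """
    nu = lam + mu                     # uniform transition rate
    a = nu / (beta + nu)              # discount factor a = (lam+mu)/(beta+lam+mu)
    stage_scale = 1.0 / (beta + nu)   # g_tilde = stage_scale * g
    return a, stage_scale


# Values taken from the numerical example in Section 4.
a, stage_scale = uniformized_constants(beta=1.0, lam=5.0, mu=10.0)
```

Note that (1 - a)/β equals the per-stage scale 1/(β + λ + μ), which is the identity used in the derivation above.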
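In particular, Bellman's equation takes the form (stated here for cost minimization;
for the profit problem below, the minimum becomes a maximum)
$$J(x_k) = \frac{1}{\beta + \lambda + \mu} \min_{p_k \in [p_{min},\, p_{max}]} \Big[ g(x_k, p_k) + (\lambda + \mu) \sum_j p_{i,j}(p_k)\, J(j) \Big], \qquad i = x_k.$$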
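In our problem, the service rate μ is fixed, but the arrival rate can be controlled
through the price. Writing λ' for the controlled arrival rate, the relationship
between price and arrival rate can be formulated as:
$$\lambda' = \big(1 - U_R(p_k)\big)\,\lambda = \bar{U}_R(p_k)\,\lambda$$
where $\bar{U}_R = 1 - U_R$ is the probability that a customer enters the system.
Since λ is a fixed rate, λ' ∈ [0, λ].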
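As a small illustration of this price-to-arrival-rate relationship, the sketch below assumes the reservation price is uniform on [p_min, r_max]. The function names and the parameter `r_max` are introduced here for illustration and do not appear in the report.

```python
def entry_probability(p, p_min, r_max):
    """Probability that a customer's reservation price exceeds the price p.

    Assumption: the reservation price is uniform on [p_min, r_max].
    """
    if p <= p_min:
        return 1.0
    if p >= r_max:
        return 0.0
    return (r_max - p) / (r_max - p_min)


def controlled_arrival_rate(lam, p, p_min, r_max):
    """Effective arrival rate: the base rate lam thinned by the entry probability."""
    return lam * entry_probability(p, p_min, r_max)
```

With λ = 5 and a reservation price uniform on [100, 200], a price of 150 admits half of the arriving customers, so the controlled rate is 2.5 customers per day.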
Now we use the trick of allowing fictitious transitions from a state to itself to
obtain the transition probabilities. We choose the uniform transition rate λ + μ.
After uniformization, with λ' denoting the controlled arrival rate, the transition
probabilities of the system become:
$$
p_{i,j} =
\begin{cases}
\dfrac{\lambda'}{\lambda + \mu}, & j = i + 1 \\[2mm]
\dfrac{\lambda - \lambda'}{\lambda + \mu}, & i = j \\[2mm]
\dfrac{\mu}{\lambda + \mu}, & j = i - 1
\end{cases}
$$
where i and j are states of the system, that is, the numbers of customers in the
system before and after a transition. For i = 0 the service transition is also
fictitious, so $p_{0,0} = (\lambda - \lambda' + \mu)/(\lambda + \mu)$.
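The uniformized transition probabilities can be written as a short function. This is a sketch under the assumptions of this section (one server, uniform rate λ + μ); the function name and the `entry_prob` argument, which stands for the probability that an arriving customer accepts the posted price, are hypothetical.

```python
def transition_probabilities(x, lam, mu, entry_prob):
    """Return {next_state: probability} for the uniformized single-server chain.

    entry_prob is the probability that an arriving customer enters at the
    current price (an input here, not computed from the price).
    """
    nu = lam + mu                          # uniform transition rate
    lam_eff = lam * entry_prob             # controlled arrival rate
    probs = {x + 1: lam_eff / nu,          # real arrival
             x: (lam - lam_eff) / nu}      # fictitious self-transition
    if x > 0:
        probs[x - 1] = mu / nu             # service completion
    else:
        probs[x] += mu / nu                # empty system: service event is fictitious
    return probs
```

The probabilities sum to one in every state, which is the property that uniformization is designed to guarantee.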
Figure 2.1 Transition probability before uniformization
Figure 2.2 Transition probability after uniformization
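In summary of the problem:
State space: $x_0 = 0$; $x_k \in \{0, 1, 2, \dots\}$.
Dynamic function:
$$
x_{k+1} =
\begin{cases}
x_k + 1, & \text{with probability } \dfrac{\lambda'}{\lambda + \mu} \\[2mm]
x_k, & \text{with probability } \dfrac{\lambda - \lambda'}{\lambda + \mu} \\[2mm]
x_k - 1, & \text{with probability } \dfrac{\mu}{\lambda + \mu}
\end{cases}
$$
where $\lambda' = \lambda\, \bar{U}_R(p(x_k))$ and $\bar{U}_R = 1 - U_R$ is the
probability that a customer enters.
Period cost function: $g_k(x_k, p(x_k)) = 1_{\{x_{k+1} - x_k > 0\}}\; p(x_k) - h x_k$,
where the indicator equals 1 when the transition is an arrival.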
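Bellman's equation takes the form (with $\bar{U}_R = 1 - U_R$ the entry probability):
$$J(0) = \frac{1}{\beta + \lambda + \mu} \max_{p \in [p_{min},\, p_{max}]} \Big[ p(0)\,\lambda \bar{U}_R(p(0)) + \big(\lambda - \lambda \bar{U}_R(p(0)) + \mu\big) J(0) + \lambda \bar{U}_R(p(0))\, J(1) \Big]$$
$$
\begin{aligned}
J(x_k) = \frac{1}{\beta + \lambda + \mu} \max_{p \in [p_{min},\, p_{max}]} \Big[ & p(x_k)\,\lambda \bar{U}_R(p(x_k)) - h x_k + \big(\lambda - \lambda \bar{U}_R(p(x_k))\big) J(x_k) \\
& + \lambda \bar{U}_R(p(x_k))\, J(x_k + 1) + \mu J(x_k - 1) \Big], \qquad x_k > 0.
\end{aligned}
$$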
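We can define the operator T that acts on a function J(x_k) in this problem as:
$$
\begin{aligned}
T(J)(x_k) = \frac{1}{\beta + \lambda + \mu} \max_{p \in [p_{min},\, p_{max}]} \Big[ & p(x_k)\,\lambda \bar{U}_R(p(x_k)) - h x_k + \big(\lambda - \lambda \bar{U}_R(p(x_k))\big) J(x_k) \\
& + \lambda \bar{U}_R(p(x_k))\, J(x_k + 1) + \mu J(x_k - 1) \Big], \qquad x_k > 0
\end{aligned}
$$
where $\bar{U}_R = 1 - U_R$ is the entry probability.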
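The stationary policy is:
$$p^*(x_k) = \arg\max\; \lambda \Big\{ p(x_k)\, \bar{U}_R(p(x_k)) - \bar{U}_R(p(x_k))\, J(x_k) + \bar{U}_R(p(x_k))\, J(x_k + 1) \Big\}$$
where
$$\bar{U}_R(p(x_k)) = 1 - \frac{p(x_k) - p_{min}}{R - p_{min}} = \frac{R - p(x_k)}{R - p_{min}}$$
and R here denotes the upper end of the reservation-price range.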
So we can continue to compute the optimal solution:
$$
\begin{aligned}
p^* &= \arg\max\; \lambda \Big\{ \frac{R\, p(x_k) - p^2(x_k)}{R - p_{min}} - \frac{R - p(x_k)}{R - p_{min}}\, J(x_k) + \frac{R - p(x_k)}{R - p_{min}}\, J(x_k + 1) \Big\} \\
&= \arg\max\; \frac{\lambda}{R - p_{min}} \Big\{ -p^2(x_k) + \big(R - \Delta(x_k + 1)\big)\, p(x_k) + R\, \Delta(x_k + 1) \Big\}
\end{aligned}
$$
where $\Delta(x_k + 1)$ is the difference of the optimal cost:
$$\Delta(x_k + 1) = J(x_k + 1) - J(x_k), \qquad x_k + 1 = 1, 2, \dots$$
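If we take the derivative of the objective above with respect to the price and set it
to zero, we get the stationary point
$$p = \frac{1}{2}\big(R - \Delta(x_k + 1)\big).$$
So,
$$
p^*(x_k) =
\begin{cases}
p_{min}, & \text{when } \frac{1}{2}\big(R - \Delta(x_k + 1)\big) < p_{min} \\[1mm]
p_{max}, & \text{when } \frac{1}{2}\big(R - \Delta(x_k + 1)\big) > p_{max} \\[1mm]
\frac{1}{2}\big(R - \Delta(x_k + 1)\big), & \text{otherwise}
\end{cases}
$$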
We can show that $\Delta(x_k + 1)$ is a non-increasing function of $x_k$; thus the
optimal solution p* is a non-decreasing function of $x_k$.
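The closed-form price rule above amounts to clipping the stationary point to the allowable range. A minimal sketch follows; the function name and the argument names `delta_next` (for Δ(x_k + 1)) and `r_max` (for the upper end R of the reservation-price range) are hypothetical.

```python
def optimal_price(delta_next, r_max, p_min, p_max):
    """Clip the stationary point (R - Delta)/2 to the interval [p_min, p_max].

    delta_next stands for J(x+1) - J(x); it is an input here, not computed.
    """
    stationary = 0.5 * (r_max - delta_next)
    return min(max(stationary, p_min), p_max)
```

Because the rule is non-increasing in `delta_next`, a difference Δ that falls as the queue grows produces a price that rises with the queue, which is the monotonicity claim made in the text.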
For the sequence $J_0(x_k), J_1(x_k), \dots, J_k(x_k)$ generated by the value
iteration method, let
$$\Delta_k(0) = 0; \qquad \Delta_k(x_k) = J_k(x_k) - J_k(x_k - 1), \quad x_k = 1, 2, \dots$$
By Proposition 3.1.7 in the book, we have $J_k(x_k) \to J(x_k)$ as $k \to \infty$.
It follows that
$$\lim_{k\to\infty} \Delta_k(x_k) = \Delta(x_k), \qquad x_k = 1, 2, \dots$$
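We use induction to prove that $\Delta_k(x_k)$ is non-increasing. Assuming that
$\Delta_k(x_k)$ is monotonically non-increasing, we show that the same is true for
$\Delta_{k+1}(x_k)$. Below, $\bar{U}_R = 1 - U_R$ denotes the entry probability.
For all $x_k = 0, 1, 2, \dots$ we have
$$
\begin{aligned}
\Delta_{k+1}(x_k + 1) &= J_{k+1}(x_k + 1) - J_{k+1}(x_k) \\
&= \frac{1}{\beta + \lambda + \mu} \Big[ p_k(x_k)\,\lambda \bar{U}_R(p_k(x_k)) - h(x_k + 1) + \big(\lambda - \lambda \bar{U}_R(p_k(x_k))\big) J_k(x_k + 1) \\
&\qquad + \lambda \bar{U}_R(p_k(x_k))\, J_k(x_k + 2) + \mu J_k(x_k) \\
&\qquad - \Big( p_k(x_k)\,\lambda \bar{U}_R(p_k(x_k)) - h x_k + \big(\lambda - \lambda \bar{U}_R(p_k(x_k))\big) J_k(x_k) \\
&\qquad + \lambda \bar{U}_R(p_k(x_k))\, J_k(x_k + 1) + \mu J_k(x_k - 1) \Big) \Big] \\
&= \frac{1}{\beta + \lambda + \mu} \Big[ -h + \mu \Delta_k(x_k) + \lambda \Delta_k(x_k + 1)\big(1 - \bar{U}_R(p_k(x_k))\big) + \lambda \bar{U}_R(p_k(x_k))\, \Delta_k(x_k + 2) \Big]
\end{aligned}
$$
Similarly, for $x_k = 1, 2, 3, \dots$ we get
$$
\begin{aligned}
\Delta_{k+1}(x_k) &= J_{k+1}(x_k) - J_{k+1}(x_k - 1) \\
&= \frac{1}{\beta + \lambda + \mu} \Big[ p_k(x_k)\,\lambda \bar{U}_R(p_k(x_k)) - h x_k + \big(\lambda - \lambda \bar{U}_R(p_k(x_k))\big) J_k(x_k) \\
&\qquad + \lambda \bar{U}_R(p_k(x_k))\, J_k(x_k + 1) + \mu J_k(x_k - 1) \\
&\qquad - p_k(x_k)\,\lambda \bar{U}_R(p_k(x_k)) + h(x_k - 1) - \big(\lambda - \lambda \bar{U}_R(p_k(x_k))\big) J_k(x_k - 1) \\
&\qquad - \lambda \bar{U}_R(p_k(x_k))\, J_k(x_k) - \mu J_k(x_k - 2) \Big] \\
&= \frac{1}{\beta + \lambda + \mu} \Big[ -h + \mu \Delta_k(x_k - 1) + \lambda \Delta_k(x_k)\big(1 - \bar{U}_R(p_k(x_k))\big) + \lambda \bar{U}_R(p_k(x_k))\, \Delta_k(x_k + 1) \Big]
\end{aligned}
$$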
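Thus, by taking the difference of the two expressions, we get
$$
\begin{aligned}
\Delta_{k+1}(x_k + 1) - \Delta_{k+1}(x_k) = \frac{1}{\beta + \lambda + \mu} \Big[ & \mu \big(\Delta_k(x_k) - \Delta_k(x_k - 1)\big) \\
& + \lambda \big(\Delta_k(x_k + 1) - \Delta_k(x_k)\big)\big(1 - \bar{U}_R(p_k(x_k))\big) \\
& + \lambda \bar{U}_R(p_k(x_k)) \big(\Delta_k(x_k + 2) - \Delta_k(x_k + 1)\big) \Big] \le 0
\end{aligned}
$$
where $\bar{U}_R = 1 - U_R$ is the entry probability. Since we assume that
$\Delta_k(x_k)$ is monotonically non-increasing, we have
$\mu\big(\Delta_k(x_k) - \Delta_k(x_k - 1)\big) \le 0$;
$\lambda\big(\Delta_k(x_k + 1) - \Delta_k(x_k)\big) \le 0$;
$\Delta_k(x_k + 2) - \Delta_k(x_k + 1) \le 0$;
$1 - \bar{U}_R(p_k(x_k)) \ge 0$.
Thus, $\Delta_{k+1}(x_k)$ is non-increasing. The optimal price $p^*(x_k)$ is
non-decreasing, with lower and upper bounds $p_{min}$ and $p_{max}$.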
3. Numerical solution approach: Value iteration method or successive
approximation[3].
The method I use is introduced in reference [3].
Step 1: Guess a vector $J_0 = (J_0(0), \dots, J_0(n))' = (0, 0, \dots, 0)'$. Apply the
operator T to obtain $J_1(x) = T(J_0)(x)$ for every x in S = {0, 1, 2, …, n}, together
with the corresponding optimal price $p_1(x)$ for every x.
Step 2: Continue to compute $J_{k+1} = T(J_k)$ and $p_k(x)$ until
$|J_{k+1} - J_k| <$ tolerance. Then the optimal policy is $p_{k+1}(x)$.
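The two steps above can be sketched in Python with NumPy. This is an illustrative implementation, not the author's MATLAB code, and it is not claimed to reproduce the figures in this report. Several choices are assumptions made here: the function name, the default parameters (taken from the numerical example in Section 4), the reservation price being uniform on [p_min, p_max], and the treatment of the truncated last state, where arrivals are blocked and the price is reported as p_max by convention.

```python
import numpy as np


def value_iteration(lam=5.0, mu=10.0, beta=1.0, h=150.0,
                    p_min=100.0, p_max=200.0, step=1.0,
                    n_states=51, tol=1e-3, max_iter=100_000):
    """Value iteration for the uniformized single-server pricing problem."""
    prices = np.arange(p_min, p_max + step / 2, step)   # discrete price grid
    entry = (p_max - prices) / (p_max - p_min)          # entry probability per price
    x = np.arange(n_states, dtype=float)                # states 0 .. n_states-1
    scale = 1.0 / (beta + lam + mu)
    J = np.zeros(n_states)                              # Step 1: J_0 = 0
    policy = np.full(n_states, p_max)
    for _ in range(max_iter):
        J_up = np.append(J[1:], J[-1])                  # J(x+1), last state truncated
        J_down = np.insert(J[:-1], 0, J[0])             # J(x-1), fictitious at x = 0
        # Price-dependent part: entry * lam * (p + J(x+1) - J(x))
        gain = prices[None, :] + (J_up - J)[:, None]
        gain[-1, :] = 0.0                               # assumption: arrivals blocked at the boundary
        Q = scale * (lam * entry[None, :] * gain
                     + (lam * J + mu * J_down - h * x)[:, None])
        J_new = Q.max(axis=1)                           # apply the operator T
        policy = prices[Q.argmax(axis=1)]
        policy[-1] = p_max                              # price is irrelevant when arrivals are blocked
        converged = np.max(np.abs(J_new - J)) < tol     # Step 2: stopping rule
        J = J_new
        if converged:
            break
    return J, policy


J, policy = value_iteration()
```

The rows of `Q` hold the bracketed Bellman expression for every candidate price, so taking the row-wise maximum applies T once. Evaluating all 101 prices on a grid, rather than using the closed-form stationary point, keeps the sketch independent of the particular reservation-price distribution.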
4. A numerical example:
In this part, I examine a problem using the model and solution approach introduced
above. We consider a job shop manufacturing system whose objective is to maximize the
expected long-run profit. The arrival process is a Poisson process with a rate of 5
customers per day, and the service completion times are exponentially distributed
with a rate of 10 customers per day (usually the service rate should be greater than
the arrival rate so that the queue length can be controlled). The reservation price R
is uniformly distributed between 100 and 200. The price can be selected from the
allowable set [100,200] with step size 1; that is, there are 101 different control
actions in total. In addition, the holding cost of the system is 150 per customer per
day. There is one queue and one server, and the capacity of the queue is assumed to
be infinite. The discount parameter is β = 1. [Some parameters in this problem are
taken from reference 2.]
From the problem data above, the controlled arrival rate is
$\lambda' = \lambda\, \bar{U}_R(p) = \lambda\,(2 - p/100)$, where $\bar{U}_R(p)$ is
the probability that a customer enters at price p. After uniformization, the original
system becomes a discrete-time Markov chain with transition rate λ + μ = 15 per day,
discount factor a = 15/16, and per-stage cost scaling 1/(β + λ + μ) = 1/16.
The optimal policy is obtained numerically by the value iteration method with a
tolerance of 0.001. To complete the simulation, I truncate the capacity to 50.
The optimal policy of the system is shown in Figure 3. I use MATLAB for the
simulation. I also simulate it with a Java applet, which is introduced later.
Figure 3. The optimal policy for the problem (panel title: the optimal price of the system; x-axis: the number of customers in the system, 0 to 50; y-axis: the optimal price, 110 to 200)
Result analysis:
As shown before, the optimal price is a non-decreasing function of the number of
customers in the system. Therefore, it is not necessary to list the optimal price for
every state x > 16: for all of these states the price equals the upper bound of the
allowable price set.
5. Model extensions:
1.) One queue K servers system
We have discussed the one queue, one server system in the previous part. Now let us
consider a system that consists of one queue and multiple servers. Assume that there
are K servers in the system and that all servers have the same service rate μ.
The system can be described as in Figure 4 below:
Figure 4. One queue K servers model
The structure of this system is similar to the one queue, one server system discussed
before, with one difference: the transition probabilities become
$$
x_{k+1} =
\begin{cases}
x_k + 1, & \text{with probability } \dfrac{\lambda'}{\lambda + K\mu} \\[2mm]
x_k, & \text{with probability } 1 - \dfrac{\lambda' + \mu \min(x_k, K)}{\lambda + K\mu} \\[2mm]
x_k - 1, & \text{with probability } \dfrac{\mu \min(x_k, K)}{\lambda + K\mu}
\end{cases}
$$
where λ' is the controlled arrival rate at the posted price.
Then we can use the same modeling technique and solution method to solve this
problem.
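The K-server transition probabilities can be sketched in the same style as the single-server case. The function name and the `entry_prob` argument (the probability that an arriving customer accepts the posted price) are hypothetical; the uniform rate λ + Kμ follows the formula above.

```python
def transition_probabilities_k_servers(x, lam, mu, entry_prob, k_servers):
    """Uniformized transition probabilities with k_servers identical servers."""
    nu = lam + k_servers * mu                 # uniform transition rate
    up = lam * entry_prob / nu                # a customer arrives and enters
    down = mu * min(x, k_servers) / nu        # one of the busy servers finishes
    stay = 1.0 - up - down                    # fictitious self-transition
    return {"up": up, "stay": stay, "down": down}
```

Only min(x, K) servers are busy in state x, so the downward probability grows with x until all K servers are occupied and stays constant afterwards.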
A numerical example:
Consider the same queuing system as in the previous example, except that there are K
servers. All other assumptions remain in effect. I then solve for the optimal price
of this problem.
For K=2, the optimal policy for the system is shown as Figure 5 below:
Figure 5. Optimal policy for the problem (panel title: the optimal price policy when K=2; x-axis: the number of customers in the system, 0 to 50; y-axis: the optimal price, 100 to 200)
For K=3, the optimal policy for the system is shown as Figure 6 below:
Figure 6. Optimal policy for the problem (panel title: the optimal price policy when K=3; x-axis: the number of customers in the system, 0 to 50; y-axis: the optimal price, 100 to 200)
Results analysis:
As the two figures show, the optimal prices decrease when the number of servers K in
the system increases. Intuitively, as the number of servers increases, the system
operates under a relatively lower load, so the opportunity cost of admitting a new
customer decreases. The manager can therefore set a lower price to encourage
customers to enter the system.
2.) N queues K servers system
In realistic cases, it is common to have more than one queue and more than one
server. Here we assume that the number of servers K is greater than or equal to the
number of queues N, namely K ≥ N. We also assume that all servers have the same
service rate μ and that the arrival rates of the queues differ from each other and
are denoted by $\lambda_i$, $i = 1, 2, \dots, N$. The effects of a price change on
these arrival rates also differ and are denoted by $\bar{U}_i(p(x_k))$,
$i = 1, 2, \dots, N$. Thus, after uniformization, the transition probabilities of
this system become
$$
x_{k+1} =
\begin{cases}
x_k + 1, & \text{with probability } \dfrac{\sum_{i=1}^{N} \lambda_i \bar{U}_i(p(x_k))}{\sum_{i=1}^{N} \lambda_i + K\mu} \\[3mm]
x_k, & \text{with probability } 1 - \dfrac{\sum_{i=1}^{N} \lambda_i \bar{U}_i(p(x_k)) + \mu \min(x_k, K)}{\sum_{i=1}^{N} \lambda_i + K\mu} \\[3mm]
x_k - 1, & \text{with probability } \dfrac{\mu \min(x_k, K)}{\sum_{i=1}^{N} \lambda_i + K\mu}
\end{cases}
$$
The structure of the system and the solution approach are then similar to those of
the one queue, one server model discussed previously.
3.) Quadratic holding cost for one queue one server system
In realistic cases, different commercial models may have different holding cost
functions. In the previous cases, we defined the holding cost as proportional to the
number of customers in the system. However, we can also consider the case in which
keeping customers waiting in the queue is very expensive and the manager must take
this seriously into account. In this case, we formulate the holding cost as a
quadratic function, proportional to the square of the number of customers in the
system. Namely, $H(x_k) = h x_k^2$, $h > 0$, $x_k = 0, 1, \dots, N$.
In this case, the cost function has changed, but we can still show that the optimal
price p∗ is a non-decreasing function of the number of customers in the system.
$H(x_k) = h x_k^2$, $h > 0$, $x_k = 0, 1, \dots, N$ is a convex and strictly
increasing function of $x_k$.
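As a short worked step, the first and second differences of the quadratic holding cost can be computed directly. These are the quantities that replace the constant h of the linear case in the expressions for the cost differences that follow; the notation H is the one defined above.

```latex
\begin{aligned}
H(x_k + 1) - H(x_k) &= h(x_k + 1)^2 - h x_k^2 = h(2x_k + 1), \\
H(x_k) - H(x_k - 1) &= h x_k^2 - h(x_k - 1)^2 = h(2x_k - 1), \\
\big(H(x_k + 1) - H(x_k)\big) - \big(H(x_k) - H(x_k - 1)\big) &= 2h > 0 .
\end{aligned}
```

The positive second difference is what makes H convex on the integers, and it enters the induction step with a negative sign, which only strengthens the inequality.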
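For all $x_k = 0, 1, 2, \dots$ we have, with $\bar{U}_R = 1 - U_R$ the entry
probability,
$$
\begin{aligned}
\Delta_{k+1}(x_k + 1) &= J_{k+1}(x_k + 1) - J_{k+1}(x_k) \\
&= \frac{1}{\beta + \lambda + \mu} \Big[ -h(2x_k + 1) + \mu \Delta_k(x_k) + \lambda \Delta_k(x_k + 1)\big(1 - \bar{U}_R(p_k(x_k))\big) + \lambda \bar{U}_R(p_k(x_k))\, \Delta_k(x_k + 2) \Big]
\end{aligned}
$$
Similarly, we get
$$
\begin{aligned}
\Delta_{k+1}(x_k) &= J_{k+1}(x_k) - J_{k+1}(x_k - 1) \\
&= \frac{1}{\beta + \lambda + \mu} \Big[ -h(2x_k - 1) + \mu \Delta_k(x_k - 1) + \lambda \Delta_k(x_k)\big(1 - \bar{U}_R(p_k(x_k))\big) + \lambda \bar{U}_R(p_k(x_k))\, \Delta_k(x_k + 1) \Big]
\end{aligned}
$$
Thus, by taking the difference of the two expressions, we get
$$
\begin{aligned}
\Delta_{k+1}(x_k + 1) - \Delta_{k+1}(x_k) = \frac{1}{\beta + \lambda + \mu} \Big[ & -2h + \mu \big(\Delta_k(x_k) - \Delta_k(x_k - 1)\big) \\
& + \lambda \big(\Delta_k(x_k + 1) - \Delta_k(x_k)\big)\big(1 - \bar{U}_R(p_k(x_k))\big) \\
& + \lambda \bar{U}_R(p_k(x_k)) \big(\Delta_k(x_k + 2) - \Delta_k(x_k + 1)\big) \Big] \le 0
\end{aligned}
$$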
So we can conclude that in this case the optimal price p* is also a non-decreasing
function of $x_k$.
A numerical example:
Consider the one queue, one server queuing system of the previous example, except
that the holding cost is a quadratic function of the number of customers in the
system, $H(x_k) = 45\, x_k^2$, $x_k = 0, 1, \dots, N$. All other assumptions remain in
effect. The optimal policy is then shown in Figure 6.
The result in the figure confirms our conclusion that in the quadratic penalty case
the optimal price is also a non-decreasing function of $x_k$.
Figure 6. Optimal policy for the problem (panel title: optimal policy for the system; x-axis: the number of customers in the system, 0 to 50; y-axis: the optimal price, 110 to 200)
6. Java Applet simulation:
Besides the MATLAB simulation, I also wrote a Java applet to simulate the problem.
You can visit the website
http://people.bu.edu/zhaoqi/discrete%20project/disProject.html to see the introduction
of the problem, the model of the problem, and the simulation of the model. You can run
the program with your own parameters. Some screenshots of the results are shown in
Figure 7, Figure 8.1, and Figure 8.2.
Figure 7. Interface of the simulation program
7. Imperfection of The Simulation:
For this simulation, I could not obtain realistic data, so I ran the program with
parameters I defined myself. It turns out that the total expected discounted value
can be negative for some parameters, which means the company would lose money. This
is not a good result for a manager, but how to choose parameters that yield a profit
depends on the realistic situation. I need to do some investigation to collect such
data so that the simulation can be made more precise. Also, from a brief survey of
this idea, I find that dynamic pricing is forbidden in some commercial applications
for fairness reasons.
Figure 8.1 The graphic analysis screen shot
Figure 8.2 The graphic analysis screen shot
8. Conclusion:
In this project, we first describe a realistic problem from daily life: how to
maximize the long-run total expected reward in a queuing system. Dynamic pricing is
an efficient tool for a manager to maximize the reward while keeping the holding cost
of the system low. We then formulate the problem as an M/M/1 Markov decision process
based on a continuous-time Markov chain. By uniformization, we convert it to an
equivalent discrete-time Markov chain, which is more convenient to analyze. The
discrete-time problem can be solved by an infinite-horizon dynamic programming
algorithm with discount parameter β. We obtain the Bellman equation and prove that
the cost function converges and that the optimal price, within its allowable upper
and lower bounds, is a non-decreasing function of the number of customers in the
system. Using the value iteration method, we obtain the optimal price for a
particular numerical example, and the results are consistent with my conclusion.
Finally, I introduce some extended models, including the quadratic holding cost
model, the one queue K servers model, and the N queues K servers model.
9. References:
[1] David W. Low, Optimal Dynamic Policies for an M/M/s Queue, IBM Science Center,
Los Angeles, California, 1972.
[2] Tuba Aktaran-Kalayci, Hayriye Ayhan, Sensitivity of optimal policy to system
parameter in a steady-state service facility, European Journal of Operational
Research 193 (2009) 120-128.
[3] Dimitri P. Bertsekas, Dynamic Programming and Optimal Control, Vol. 2.
[4] Christos G. Cassandras, Stephane Lafortune, Introduction to Discrete Event
Systems (second edition), 2008.