733 PROJECT REPORT by Qi Zhao

Optimal Dynamic Pricing Policy for M/M/K Queue

Abstract

In this report, I analyze the problem of maximizing the long-run total discounted reward in a queueing system. The problem can be formulated as a Markov decision process in which the control action is to raise or lower the posted price, which discourages or encourages customer arrivals. The arrival process of customers is Poisson with a rate that is a decreasing function of the current price. The service times are independent, exponentially distributed random variables with a fixed mean service rate. The total profit consists of the customers' payments minus a holding cost per customer per unit time that accumulates over time. This is a continuous-time Markov chain problem; by the uniformization method we can transform it into an equivalent discrete-time Markov chain. Whenever an event (a customer arrival or a service completion) occurs, the manager posts a price that remains in effect until the next event. We can prove that there exists an optimal stationary policy for the infinite-horizon problem that maximizes the long-run reward. In addition, I show that the optimal price is a non-decreasing function of the number of customers in the system. Efficient solution approaches for this problem are the value iteration and policy iteration methods. (Besides the analytical description and MATLAB simulation in this report, I also programmed a Java applet to validate the ideas proposed in this project. It is available at http://people.bu.edu/zhaoqi/discrete%20project/disProject.html. If you cannot run it, please let me know and I will find another way to show it to you.)

1. Introduction:

Price control is an effective way to manage queues in daily life. For example, higher daytime parking prices downtown reduce traffic congestion, surcharges discourage excessive electricity consumption, and higher tuition limits the number of applicants to a university. For an owner or manager, however, reducing the queue is not the only objective; the main objective is to maximize the long-run reward or profit by choosing the pricing policy properly.

The consequences of a price change must be balanced carefully. If the manager raises the price at some point in time, the arrival rate falls and the holding cost of the system tends to decrease, but so does the revenue collected from entering customers. Conversely, if the manager lowers the price, the arrival rate rises and the holding cost tends to grow with the queue length. We therefore seek an optimal policy that attains the maximum profit. David W. Low (1972) analyzed many variants of this problem [1].

2. Problem formulation:

Let us first consider the reward system for a queue with unlimited capacity. A service facility in an open market with infinite waiting room offers service to customers at a posted price p. There is a single queue and a single server. Service times are exponentially distributed with fixed mean 1/μ. We assume that once the manager posts a price, all customers see it and it affects the arrival rate to the queue immediately. Potential customers arrive according to a Poisson process with rate λ̄ (the maximum, uncontrolled, arrival rate). A customer enters the system only if the reservation price R (the maximum price the customer is willing to pay) exceeds the posted price.
We assume that the reservation price R is a random variable with a uniform distribution and cumulative distribution function U_R(·). At each customer arrival or service completion, the decision maker must choose a price to charge. The price is selected from a set of prices [p_min, p_max], which may be taken to be a discrete set. Accordingly, an arriving customer declines to enter the system with probability U_R(p) and enters with probability Ū_R(p) = 1 − U_R(p). Moreover, the queue incurs a waiting (holding) cost of h per customer per unit time. The queue is served on a first-come, first-served basis with no priorities [2].

Figure 1. One-queue, one-server model.

The state is the number of customers in the system, so the state space is the set of non-negative integers S = {x(t) : x(t) = 0, 1, 2, ...} (truncated to a finite capacity N in the numerical computations). We let J_π(x(t)) denote the total expected discounted profit of the infinite-horizon Markov decision process under the control policy π. The state and the control at time t are denoted by x(t) and p(t), respectively; both stay constant between transitions. The uniformization method used here follows [3, 4]. We use the following notation:

t_k: the time of occurrence of the k-th transition; by convention t_0 = 0.
τ_k = t_k − t_{k−1}: the k-th inter-transition time.
x_k = x(t_k): we have x(t) = x_k for t_k ≤ t < t_{k+1}.
p_k = p(t_k): we have p(t) = p_k for t_k ≤ t < t_{k+1}.

The expected total discounted profit of the infinite-horizon Markov process under policy π can then be written as

    \lim_{N \to \infty} E\Big\{ \int_0^{t_N} e^{-\beta t} g(x(t), p(t)) \, dt \Big\},

where g is the one-stage reward rate and β is a given positive discount parameter. Since the state stays constant between transitions, the cost function of π is given by

    J_\pi(x_0) = E\Big\{ \sum_{k=0}^{\infty} \int_{t_k}^{t_{k+1}} e^{-\beta t} g(x_k, p_k) \, dt \,\Big|\, x_0 \Big\}.

For a sequence {(x_k, p_k)}, the cost function can be expressed as

    \lim_{N \to \infty} E\Big\{ \int_0^{t_N} e^{-\beta t} g(x(t), p(t)) \, dt \Big\} = \sum_{k=0}^{\infty} E\Big\{ \int_{t_k}^{t_{k+1}} e^{-\beta t} dt \Big\} \, E\{ g(x_k, p_k) \}.

If the arrival rate λ̄ and the service rate μ are both fixed, we have

    E\Big\{ \int_{t_k}^{t_{k+1}} e^{-\beta t} dt \Big\} = \frac{1}{\beta} E\{e^{-\beta t_k}\} \big(1 - E\{e^{-\beta \tau_{k+1}}\}\big) = \frac{1}{\beta} E\{e^{-\beta(\tau_1 + \cdots + \tau_k)}\} \big(1 - E\{e^{-\beta \tau_{k+1}}\}\big) = \frac{a^k (1-a)}{\beta},

where a = E\{e^{-\beta \tau}\} = (\bar{\lambda} + \mu)/(\beta + \bar{\lambda} + \mu). Then we get

    \sum_{k=0}^{\infty} E\Big\{ \int_{t_k}^{t_{k+1}} e^{-\beta t} dt \Big\} \, E\{ g(x_k, p_k) \} = \frac{1}{\beta + \bar{\lambda} + \mu} \sum_{k=0}^{\infty} a^k \, E\{ g(x_k, p_k) \}.

Thus the continuous-time Markov chain problem with cost

    \lim_{N \to \infty} E\Big\{ \int_0^{t_N} e^{-\beta t} g(x(t), p(t)) \, dt \Big\}

and state-independent transition rate λ̄ + μ is equivalent to a discrete-time Markov chain problem with discount factor

    a = \frac{\bar{\lambda} + \mu}{\beta + \bar{\lambda} + \mu}

and cost per stage

    \tilde{g}(x_k, p_k) = \frac{1}{\beta + \bar{\lambda} + \mu} \, g(x_k, p_k).

In particular, Bellman's equation takes the form

    J(i) = \frac{1}{\beta + \bar{\lambda} + \mu} \max_{p \in [p_{min}, p_{max}]} \Big[ g(i, p) + (\bar{\lambda} + \mu) \sum_j p_{i,j}(p) \, J(j) \Big].

In our problem the service rate μ is fixed, but the arrival rate can be controlled through the price; the relationship between price and arrival rate is

    \lambda(p_k) = \big(1 - U_R(p_k)\big) \bar{\lambda} = \bar{U}_R(p_k) \, \bar{\lambda},

where λ̄ is the fixed maximum arrival rate, so λ(p_k) ∈ [0, λ̄]. Now we use the trick of allowing fictitious transitions from a state to itself to obtain the transition probabilities. We choose the uniform transition rate λ̄ + μ; after uniformization, the transition probabilities of the system become

    p_{i,j}(p) = \lambda(p)/(\bar{\lambda} + \mu),              j = i + 1,
    p_{i,j}(p) = (\bar{\lambda} - \lambda(p))/(\bar{\lambda} + \mu),   j = i   (plus an extra \mu/(\bar{\lambda} + \mu) when i = 0),
    p_{i,j}(p) = \mu/(\bar{\lambda} + \mu),                    j = i - 1,  i \ge 1,

where i and j are states of the system, i.e., the numbers of customers in the system before and after a transition.

Figure 2.1. Transition rates before uniformization.
Figure 2.2. Transition probabilities after uniformization.
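To make the uniformization step concrete, the following is a minimal Python sketch (not the author's MATLAB or Java code) of the uniformized single-server transition kernel and the discrete-time discount factor. The function names and the default parameter values (taken from the numerical example in Section 4) are illustrative assumptions.

```python
# Minimal sketch, assuming R ~ Uniform[r_min, r_max] and uniformization rate
# lam_bar + mu; names and defaults are illustrative, not the author's code.

def join_prob(p, r_min=100.0, r_max=200.0):
    """P(R >= p), the probability an arriving customer joins at price p."""
    return min(1.0, max(0.0, (r_max - p) / (r_max - r_min)))

def uniformized_step(x, p, lam_bar=5.0, mu=10.0):
    """Return {next_state: probability} after uniformization with rate lam_bar + mu."""
    nu = lam_bar + mu
    lam_p = lam_bar * join_prob(p)              # controlled arrival rate lambda(p)
    probs = {x + 1: lam_p / nu}                 # arriving customer joins
    if x > 0:
        probs[x] = (lam_bar - lam_p) / nu       # fictitious transition: arrival balks
        probs[x - 1] = mu / nu                  # service completion
    else:
        probs[0] = (lam_bar - lam_p + mu) / nu  # balking arrival or fictitious service event
    return probs

# Discrete-time discount factor for continuous discount rate beta:
beta = 1.0
a = (5.0 + 10.0) / (beta + 5.0 + 10.0)          # = 15/16 in the numerical example of Section 4
```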
In summary, the problem is:

State space: x_0 = 0; x_k ∈ {0, 1, 2, ...}.

Dynamics:

    x_{k+1} = x_k + 1   with probability  \bar{\lambda}\,\bar{U}_R(p(x_k)) / (\bar{\lambda} + \mu),
    x_{k+1} = x_k       with probability  (\bar{\lambda} - \bar{\lambda}\,\bar{U}_R(p(x_k))) / (\bar{\lambda} + \mu)   (plus \mu/(\bar{\lambda}+\mu) when x_k = 0),
    x_{k+1} = x_k - 1   with probability  \mu / (\bar{\lambda} + \mu),   x_k \ge 1,

where the controlled arrival rate is λ(p(x_k)) = λ̄ Ū_R(p(x_k)).

Period reward function:

    g_k(x_k, p(x_k)) = p(x_k)\,1\{x_{k+1} = x_k + 1\} - h\,x_k .

Bellman's equation takes the form

    J(0) = \frac{1}{\beta + \bar{\lambda} + \mu} \max_{p \in [p_{min}, p_{max}]} \Big[ \bar{\lambda}\,\bar{U}_R(p)\,p + \big(\bar{\lambda} - \bar{\lambda}\,\bar{U}_R(p) + \mu\big) J(0) + \bar{\lambda}\,\bar{U}_R(p)\,J(1) \Big],

    J(x_k) = \frac{1}{\beta + \bar{\lambda} + \mu} \max_{p \in [p_{min}, p_{max}]} \Big[ \bar{\lambda}\,\bar{U}_R(p)\,p - h\,x_k + \big(\bar{\lambda} - \bar{\lambda}\,\bar{U}_R(p)\big) J(x_k) + \bar{\lambda}\,\bar{U}_R(p)\,J(x_k+1) + \mu\,J(x_k-1) \Big],   x_k > 0.

We can define the operator T acting on a function J as

    (T J)(x_k) = \frac{1}{\beta + \bar{\lambda} + \mu} \max_{p \in [p_{min}, p_{max}]} \Big[ \bar{\lambda}\,\bar{U}_R(p)\,p - h\,x_k + \big(\bar{\lambda} - \bar{\lambda}\,\bar{U}_R(p)\big) J(x_k) + \bar{\lambda}\,\bar{U}_R(p)\,J(x_k+1) + \mu\,J(x_k-1) \Big],   x_k > 0,

with the obvious modification at x_k = 0. The stationary policy is

    p^*(x_k) = \arg\max_{p \in [p_{min}, p_{max}]} \Big\{ \bar{\lambda}\,\bar{U}_R(p)\,p - \bar{\lambda}\,\bar{U}_R(p)\,J(x_k) + \bar{\lambda}\,\bar{U}_R(p)\,J(x_k+1) \Big\},

where, for the uniform reservation-price distribution,

    \bar{U}_R(p) = 1 - \frac{p - p_{min}}{\bar{R} - p_{min}} = \frac{\bar{R} - p}{\bar{R} - p_{min}},

with R̄ the upper limit of the reservation-price distribution and p_min its lower limit. So we can continue to compute the optimal solution:

    p^* = \arg\max_{p} \frac{\bar{\lambda}}{\bar{R} - p_{min}} \Big\{ (\bar{R} - p)\big(p + \Delta(x_k+1)\big) \Big\}
        = \arg\max_{p} \frac{\bar{\lambda}}{\bar{R} - p_{min}} \Big\{ -p^2 + \big(\bar{R} - \Delta(x_k+1)\big) p + \bar{R}\,\Delta(x_k+1) \Big\},

where Δ(x_k+1) is the differential of the optimal value function,

    \Delta(x_k+1) = J(x_k+1) - J(x_k),   x_k+1 = 1, 2, \ldots

Setting the derivative of the function above with respect to p equal to zero gives the stationary point p° = (R̄ − Δ(x_k+1))/2, so

    p^*(x_k) = p_{min},               if (R̄ − Δ(x_k+1))/2 ≤ p_{min};
    p^*(x_k) = p_{max},               if (R̄ − Δ(x_k+1))/2 ≥ p_{max};
    p^*(x_k) = (R̄ − Δ(x_k+1))/2,      otherwise.

We now show that Δ(x_k+1) is a non-increasing function of the state, from which it follows that the optimal price p^* is a non-decreasing function of the number of customers in the system.

For the sequence J_0, J_1, ..., J_k generated by the value iteration method, let

    Δ_k(0) = 0;   Δ_k(x_k) = J_k(x_k) - J_k(x_k - 1),   x_k = 1, 2, ...

By Proposition 3.1.7 of [3], J_k(x_k) → J(x_k) as k → ∞; it follows that lim_{k→∞} Δ_k(x_k) = Δ(x_k), x_k = 1, 2, .... We use induction to prove that each Δ_k is non-increasing. Assume Δ_k is monotonically non-increasing; we show the same holds for Δ_{k+1}. For any fixed price p, let T_p denote the operator obtained from T by fixing the price at p, so that (T J)(x_k) = max_p (T_p J)(x_k). Then for x_k = 1, 2, ...

    (T_p J_k)(x_k+1) - (T_p J_k)(x_k) = \frac{1}{\beta + \bar{\lambda} + \mu} \Big[ -h + \mu\,\Delta_k(x_k) + \bar{\lambda}\big(1 - \bar{U}_R(p)\big)\,\Delta_k(x_k+1) + \bar{\lambda}\,\bar{U}_R(p)\,\Delta_k(x_k+2) \Big],

and similarly

    (T_p J_k)(x_k) - (T_p J_k)(x_k-1) = \frac{1}{\beta + \bar{\lambda} + \mu} \Big[ -h + \mu\,\Delta_k(x_k-1) + \bar{\lambda}\big(1 - \bar{U}_R(p)\big)\,\Delta_k(x_k) + \bar{\lambda}\,\bar{U}_R(p)\,\Delta_k(x_k+1) \Big].

Subtracting the second expression from the first gives

    \frac{1}{\beta + \bar{\lambda} + \mu} \Big[ \mu\big(\Delta_k(x_k) - \Delta_k(x_k-1)\big) + \bar{\lambda}\big(1 - \bar{U}_R(p)\big)\big(\Delta_k(x_k+1) - \Delta_k(x_k)\big) + \bar{\lambda}\,\bar{U}_R(p)\big(\Delta_k(x_k+2) - \Delta_k(x_k+1)\big) \Big] \le 0,

since by the induction hypothesis each of the three differences is non-positive and the coefficients μ, λ̄(1 − Ū_R(p)) and λ̄ Ū_R(p) are non-negative. Applying this comparison with the maximizing prices of the neighboring states shows that the same inequality carries over to J_{k+1} = T(J_k); hence Δ_{k+1} is also non-increasing, which completes the induction. Therefore the optimal price p^*(x_k) is non-decreasing in x_k, bounded below by p_min and above by p_max.
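As a concrete illustration of the clamped price formula just derived, here is a minimal Python sketch; the function name and the default parameter values (matching the numerical example of Section 4) are illustrative assumptions, not part of the original report.

```python
# Minimal sketch of the clamped stationary-price formula, assuming R uniform on
# [p_min, r_max]; delta = J(x+1) - J(x) is the value-function difference.

def optimal_price(delta, r_max=200.0, p_min=100.0, p_max=200.0):
    """p*(x) = clamp((r_max - delta) / 2, p_min, p_max) with delta = J(x+1) - J(x)."""
    p_star = 0.5 * (r_max - delta)
    return min(p_max, max(p_min, p_star))

# Example: if J(x+1) - J(x) = -80, the unconstrained maximizer is (200 + 80)/2 = 140.
print(optimal_price(-80.0))   # -> 140.0
```

Since Δ(x+1) is non-increasing in x, the returned price is non-decreasing in x, consistent with the monotonicity result above.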
3. Numerical solution approach:

The method used here is value iteration (successive approximation), as introduced in [3].

Step 1: Guess a vector J_0 = (J_0(0), ..., J_0(n))' = (0, 0, ..., 0)'. Apply the operator T to compute (T J_0)(x) for every x in S = {0, 1, 2, ..., n}, obtaining J_1 = T(J_0) and the corresponding maximizing price p_1(x) for every x.

Step 2: Continue computing J_{k+1} = T(J_k) and p_{k+1}(x) until ||J_{k+1} − J_k|| < tolerance. The resulting policy p_{k+1}(x) is then (approximately) optimal.

4. A numerical example:

In this part I examine a problem using the model and solution approach introduced above. Consider a job-shop manufacturing system whose objective is to maximize the expected long-run profit. The arrival process is a Poisson process with a rate of 5 customers per day, and the service times are exponentially distributed with a rate of 10 customers per day (typically the service rate should be greater than the maximum arrival rate so that the queue length can be kept under control). The reservation price R is uniformly distributed between 100 and 200. The price is selected from the allowable set [100, 200] with a step size of 1, so there are 101 different control actions in total. In addition, the holding cost is 150 per customer per day. There is a single queue and a single server, and the capacity of the queue is assumed to be infinite. The discount rate is β = 1. [Some of the parameters are taken from reference [2].]

From the data above, the controlled arrival rate is λ(p) = λ̄ Ū_R(p) = λ̄ (2 − p/100). After uniformization, the original system is transformed into a discrete-time Markov chain with discount factor a = (λ̄ + μ)/(β + λ̄ + μ) = 15/16. The optimal policy is obtained numerically by the value iteration method with a tolerance of 0.001; to carry out the computation, I truncate the capacity at 50. The resulting optimal policy is shown in Figure 3. The computation was done in MATLAB; the Java applet introduced later simulates the same model.

Figure 3. The optimal pricing policy for the numerical example (optimal price versus the number of customers in the system).

Result analysis: As shown above, the optimal price is a non-decreasing function of the number of customers in the system. For all states x > 16 the optimal price is already at the upper bound of the allowable price set, so it is not necessary to display the optimal price for those states.

5. Model extensions:

1.) One queue, K servers

We have discussed the one-queue, one-server system in the previous part. Now consider a system consisting of one queue and multiple servers. Assume there are K servers and that all of them have the same service rate μ. The system is described in Figure 4 below.

Figure 4. One-queue, K-server model.

The structure of this system is similar to the one-queue, one-server system discussed before, with one small difference: the transition probabilities become

    x_{k+1} = x_k + 1   with probability  \bar{\lambda}\,\bar{U}_R(p(x_k)) / (\bar{\lambda} + K\mu),
    x_{k+1} = x_k       with probability  1 - \big(\bar{\lambda}\,\bar{U}_R(p(x_k)) + \min(x_k, K)\,\mu\big) / (\bar{\lambda} + K\mu),
    x_{k+1} = x_k - 1   with probability  \min(x_k, K)\,\mu / (\bar{\lambda} + K\mu).

We can then use the same modeling technique and solution method to solve this problem.

A numerical example: Consider the same queueing system as in the previous example, except that there are K servers; all other assumptions remain in effect. I then solve for the optimal price of this problem (a value-iteration sketch covering both the single-server example above and this K-server case is given below).
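The following is a minimal Python sketch of the value iteration just described; it is not the author's MATLAB or Java code. It uses the parameters of the numerical example in Section 4; the function name solve_pricing, the state-space truncation at N = 50, and the optional quadratic-cost flag (anticipating extension 3 below) are my own illustrative choices. With K = 1 it is intended to correspond to the single-server example (Figure 3); K = 2 and K = 3 correspond to the multi-server cases shown next.

```python
import numpy as np

# Hedged sketch of value iteration for the uniformized pricing MDP described above.
# Parameters follow the numerical example; all names are illustrative assumptions.

def solve_pricing(K=1, lam_bar=5.0, mu=10.0, beta=1.0, h=150.0,
                  r_min=100.0, r_max=200.0, N=50, tol=1e-3, quadratic=False):
    prices = np.arange(100.0, 201.0, 1.0)                # allowable price set (101 actions)
    join = np.clip((r_max - prices) / (r_max - r_min), 0.0, 1.0)
    lam_p = lam_bar * join                               # controlled arrival rate per price
    nu = beta + lam_bar + K * mu                         # uniformization rate plus discount
    J = np.zeros(N + 1)
    while True:
        J_new = np.empty_like(J)
        policy = np.empty(N + 1)
        for x in range(N + 1):
            hold = h * x**2 if quadratic else h * x      # holding cost per unit time
            up = J[min(x + 1, N)]                        # crude truncation of the state space at N
            down = J[max(x - 1, 0)]
            serv = min(x, K) * mu                        # effective service rate in state x
            # one-stage value for every candidate price, vectorized over prices
            vals = (lam_p * prices - hold
                    + lam_p * up
                    + (lam_bar - lam_p + K * mu - serv) * J[x]
                    + serv * down) / nu
            best = np.argmax(vals)
            J_new[x], policy[x] = vals[best], prices[best]
        if np.max(np.abs(J_new - J)) < tol:
            return J_new, policy
        J = J_new

# Usage: J, p_star = solve_pricing(K=1); p_star should be non-decreasing in the state,
# consistent with the monotonicity result of Section 2.
```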
For K = 2, the optimal policy for the system is shown in Figure 5 below.

Figure 5. Optimal pricing policy for K = 2 (optimal price versus the number of customers in the system).

For K = 3, the optimal policy for the system is shown in Figure 6 below.

Figure 6. Optimal pricing policy for K = 3 (optimal price versus the number of customers in the system).

Results analysis: As can be seen from the two figures, the optimal prices decrease as the number of servers K increases. Intuitively, as the number of servers grows, the system operates under a relatively lower load, so the opportunity cost of admitting a new customer decreases. The manager can therefore set a lower price to encourage customers to enter the system.

2.) N queues, K servers

In realistic settings it is common to have more than one queue and more than one server. In this case we assume that the number of servers is at least the number of queues, K ≥ N. We also assume that all servers have the same service rate μ, while the arrival rates of the different queues differ and are denoted by λ̄_i, i = 1, 2, ..., N. The effect of a price change on each queue's arrival rate is also different and is denoted by Ū_i(p(x_k)), i = 1, 2, ..., N. After uniformization, the transition probabilities of this system become

    x_{k+1} = x_k + 1   with probability  \sum_{i=1}^{N} \bar{\lambda}_i\,\bar{U}_i(p(x_k)) \Big/ \Big( \sum_{i=1}^{N} \bar{\lambda}_i + K\mu \Big),
    x_{k+1} = x_k       with probability  1 - \Big( \sum_{i=1}^{N} \bar{\lambda}_i\,\bar{U}_i(p(x_k)) + \min(x_k, K)\,\mu \Big) \Big/ \Big( \sum_{i=1}^{N} \bar{\lambda}_i + K\mu \Big),
    x_{k+1} = x_k - 1   with probability  \min(x_k, K)\,\mu \Big/ \Big( \sum_{i=1}^{N} \bar{\lambda}_i + K\mu \Big).

The structure of the system and the optimal solution approach are then entirely analogous to the one-queue, one-server model discussed previously.

3.) Quadratic holding cost for the one-queue, one-server system

In realistic cases, different business models may have different holding-cost functions. In the previous cases the holding cost was proportional to the number of customers in the system. We may instead consider the situation in which keeping customers waiting is very expensive and must be taken into account seriously. In this case we formulate the holding cost as a quadratic function, i.e., proportional to the square of the number of customers in the system:

    H(x_k) = h\,x_k^2,   h > 0,   x_k = 0, 1, ..., N.

The cost function has changed, but we can still show that the optimal price p^* is a non-decreasing function of the number of customers in the system. Note that H(x_k) = h x_k^2 is a convex, strictly increasing function of x_k. Repeating the fixed-price comparison of Section 2 with H(x_k) in place of h x_k, for all x_k = 1, 2, ... we have

    (T_p J_k)(x_k+1) - (T_p J_k)(x_k) = \frac{1}{\beta + \bar{\lambda} + \mu} \Big[ -h(2x_k+1) + \mu\,\Delta_k(x_k) + \bar{\lambda}\big(1 - \bar{U}_R(p)\big)\,\Delta_k(x_k+1) + \bar{\lambda}\,\bar{U}_R(p)\,\Delta_k(x_k+2) \Big],

and similarly

    (T_p J_k)(x_k) - (T_p J_k)(x_k-1) = \frac{1}{\beta + \bar{\lambda} + \mu} \Big[ -h(2x_k-1) + \mu\,\Delta_k(x_k-1) + \bar{\lambda}\big(1 - \bar{U}_R(p)\big)\,\Delta_k(x_k) + \bar{\lambda}\,\bar{U}_R(p)\,\Delta_k(x_k+1) \Big].

Subtracting the two expressions gives

    \frac{1}{\beta + \bar{\lambda} + \mu} \Big[ -2h + \mu\big(\Delta_k(x_k) - \Delta_k(x_k-1)\big) + \bar{\lambda}\big(1 - \bar{U}_R(p)\big)\big(\Delta_k(x_k+1) - \Delta_k(x_k)\big) + \bar{\lambda}\,\bar{U}_R(p)\big(\Delta_k(x_k+2) - \Delta_k(x_k+1)\big) \Big] \le 0,

so the same induction as before applies. We conclude that in this case the optimal price p^* is again a non-decreasing function of x_k.
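Under the same assumptions as before, the quadratic-cost example below could be reproduced from the earlier solve_pricing sketch simply by switching the holding-cost term; a hedged usage sketch:

```python
# Hedged usage sketch: reusing the hypothetical solve_pricing() helper from the
# earlier value-iteration sketch, the quadratic-cost case only changes the
# holding-cost term (h * x**2 with h = 45, as in the example below).
J_quad, p_quad = solve_pricing(K=1, h=45.0, quadratic=True)
# p_quad should again be non-decreasing in the number of customers in the system.
```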
A numerical example: Consider the one-queue, one-server system of the earlier example, except that the holding cost is a quadratic function of the number of customers in the system, H(x_k) = 45 x_k^2. All other assumptions remain in effect. The resulting optimal policy is shown in Figure 7. The figure confirms our conclusion that, in the quadratic-penalty case as well, the optimal price is a non-decreasing function of x_k.

Figure 7. Optimal pricing policy for the quadratic holding-cost example (optimal price versus the number of customers in the system).

6. Java applet simulation:

Besides the MATLAB computations, I also wrote a Java applet to simulate the problem. The website http://people.bu.edu/zhaoqi/discrete%20project/disProject.html contains an introduction to the problem, the model, and a simulation of this model. You can run the program with your own parameters. Some screenshots of the results are shown in Figure 8, Figure 9.1, and Figure 9.2.

Figure 8. Interface of the simulation program.

7. Limitations of the simulation:

I was not able to obtain realistic data for the simulation, so I used parameters of my own choosing. For some parameter settings the total expected discounted value becomes negative, which means the company would lose money. This is clearly not a desirable result for a manager; how to choose parameters that yield a profit depends on the realistic situation at hand, and I would need to investigate further and collect such data to make the simulation more precise. Also, according to a brief survey of this idea, dynamic pricing of this kind is prohibited in some commercial settings for fairness reasons.

Figure 9.1. Screenshot of the graphic analysis.
Figure 9.2. Screenshot of the graphic analysis.

8. Conclusion:

In this project we first described the realistic problem of maximizing the long-run total expected reward in a queueing system. Dynamic pricing is an efficient tool for a manager to maximize the reward while keeping the holding cost of the system under control. We formulated the problem as an M/M/1 Markov decision process involving a continuous-time Markov chain. Using uniformization, we transformed it into an equivalent discrete-time Markov chain that is more convenient to analyze. The discrete-time problem can be solved by infinite-horizon dynamic programming with a discount factor determined by β. We obtained the Bellman equation, showed that the value iteration converges, and proved that the optimal price, within its allowable upper and lower bounds, is a non-decreasing function of the number of customers in the system. Using the value iteration method we computed the optimal price for a particular numerical example, and the results are consistent with this conclusion. Finally, I introduced several extended models, including the quadratic holding-cost model, the one-queue K-server model, and the N-queue K-server model.

9. References:

[1] David W. Low, Optimal Dynamic Policies for an M/M/s Queue, IBM Science Center, Los Angeles, California, 1972.
[2] Tuba Aktaran-Kalayci and Hayriye Ayhan, Sensitivity of optimal policy to system parameters in a steady-state service facility, European Journal of Operational Research 193 (2009) 120-128.
[3] Dimitri P. Bertsekas, Dynamic Programming and Optimal Control, Vol. 2.
[4] Christos G. Cassandras and Stephane Lafortune, Introduction to Discrete Event Systems, 2nd edition, 2008.