Omega 36 (2008) 477–485

Using partially observed Markov processes to select optimal termination time of TV shows
Moshe Givon a, Abraham Grosfeld-Nir a,b,*
a Faculty of Management, Tel Aviv University, Tel Aviv 69978, Israel
b Academic College of Tel-Aviv-Yaffo, 29 Melchet, Tel-Aviv 61560, Israel
Received 14 September 2005; accepted 6 February 2006
Available online 30 March 2006
Abstract
This paper presents a method for optimal control of a running television show. The problem is formulated as a partially observed
Markov decision process (POMDP). A show can be in a “good” state, i.e., it should be continued, or it can be in a “bad” state and
therefore it should be changed. The ratings of a show are modeled as a stochastic process that depends on the show’s state. An
optimal rule for a continue/change decision, which maximizes the expected present value of profits from selling advertising time, is
expressed in terms of the prior probability of the show being in the good state. The optimal rule depends on the size of the investment
in changing a show, the difference in revenues between a “good” and a “bad” show and the number of time periods remaining until
the end of the planning horizon. The application of the method is illustrated with simulated ratings as well as real data.
© 2006 Elsevier Ltd. All rights reserved.
Keywords: Dynamic programming; Markov chain; POMDP; TV shows; Planning and control; Simulation
1. Introduction
Television advertising is a multi-billion dollar industry. In 2003 advertisers in the U.S. spent over 48 billion dollars on
television advertising [1]. Television shows live and die by their ratings (that is, the sampled proportion of households
tuned in to their channel at their time slots, e.g., Nielsen and Arbitron ratings). The rates advertisers pay for the time they
buy are tied directly to the ratings of the shows in which their commercials appear (See e.g., [2]). Since broadcasting
costs are independent of the ratings, every rating point can be directly translated into profits.
One of the biggest problems facing programming directors is to decide when a schedule is good and should be
continued and when it is bad and should be changed. Suppose that the true viewing proportions of a certain show were
known for all the remaining weeks of the season. In this case a price for advertising time could be set to reflect these
proportions. Knowing the operating costs and the expected revenues, a programming director could decide whether to
continue the show or change it. If with all this information it is worthwhile to continue the show, it is defined as being
in a “good” state. Otherwise it is defined as being in a “bad” state.
This manuscript was processed by Area Editor Prof. B. Lev.
* Corresponding author. Academic College of Tel-Aviv-Yaffo, 29 Melchet, Tel-Aviv 61560, Israel. Tel.: +972 3 5268111.
E-mail addresses: givon@post.tau.ac.il (M. Givon), agn@mta.ac.il (A. Grosfeld-Nir).
doi:10.1016/j.omega.2006.02.002
However, in reality the true viewing proportions and the true state of a show are never known with certainty.
Instead, the programming director must base his decision on a sample, that is TV ratings. It is well known that TV
ratings suffer from validity and reliability problems. Sample size, non-response problems and unattended working sets
all contribute to biased ratings with large sampling errors (see e.g., [3–5]). Advertisers and television managers are
obliged, nevertheless, to use these imperfect ratings in setting advertising rates and evaluating shows. Moreover, it is
possible for a once good show to turn bad due to, for example, audience fatigue and boredom, or the appearance of
new shows in competing channels at the same time. It is the job of programming directors to identify this switch as fast
as possible by looking at the show’s past ratings, imperfect as they may be.
Previous research in this area concentrated on prediction of a show’s measured ratings using the previous year’s
ratings, the show’s characteristics and characteristics of audience segments [6–9]. These predicted ratings were then
used to design an optimal schedule for the whole season [7,10] and derive competitive scheduling strategies [12]. In
these studies the ratings were considered to be the actual proportions of exposed audience.
Our work differs from the previous research in two ways. First, we consider the ratings of a show as only partial
information on the true proportions of tuned in audience; at the same time we recognize the fact that advertising revenues
are directly related to the ratings. Second, we focus on the problem of controlling an already running schedule. The
objective of this control process is to identify problems in real time and indicate the need to change the scheduling of
a faltering show.
We model the transition of a show between the two states, from one period to another, as a Markov chain. A show
(and its time slot) switches between a good state and a bad state. The true states are unobservable and can only be
inferred from the ratings. The transition probability matrix depends on the action taken at the beginning of the period,
i.e., continue or change. Ratings are assumed to be random, their probability density functions (p.d.f.) depending only
upon the true states. Revenue from selling advertising time during each period is assumed to be an increasing function
of the ratings. This formulation of the problem is known in the literature as partially observed Markov decision process
(POMDP). This area has many applications and has attracted the attention of many authors. Monahan [13] provides
an excellent survey of the theory, models and algorithms. As many authors have pointed out, e.g., Astrom [14], Aoki
[15], Bertsekas [16, Chapter 4], the problem of optimally selecting, periodically, between continue and change, can be
formulated to depend upon the posterior probability (given the past actions and ratings) that the show is in the good
state. One of the most important results (see, for example, [17–19]) is that the optimal policy has the following simple structure: continue if, and only if, the posterior probability exceeds a certain control limit (CLT).
Apart from a method developed by Grosfeld-Nir [20] for uniform observations, no analytical formula has been
proposed to resolve the practical issue of computing the control limit. Algorithms and solution techniques can be found
in [21–24,19]. The general formulation of the problem is for a finite number of states and a finite number of actions
(see, for example, [13,19]). Applications of POMDP have been proposed in many areas; many of them can be found
in [13,18,22]. An interesting application of POMDP for fishermen has been analyzed by Lane [25]. Finally, a recent paper by Treharne and Sox [26] discusses the application of POMDP to inventory management for a single stocking location when the demand distribution changes probabilistically according to an exogenous Markov process.
In Section 2 we give a precise formulation of the model. In Section 3 we illustrate the implementation of the model.
We derive optimal rules for some simulated as well as real shows. We demonstrate how the optimal rule changes as
a function of the number of periods remaining and the relative investment in changing a show. We use ratings data to
investigate the forms of the p.d.f.s of the ratings and the parameters of the problem. Finally, in Section 4 we suggest
possible extensions of our model for further research.
2. The model
A show is being observed weekly. The show can be in either a GOOD or a BAD state. The true state of the show is
unknown and can only be inferred from the ratings. We denote by f0 and f1 the p.d.f.s of the ratings in the good and bad
states, respectively, and assume that f0 stochastically dominates f1 . The revenue during each period is an increasing
function of the ratings. We denote by m0 and m1 the expected weekly revenue in the good and bad states, respectively
(therefore m0 > m1 ).
At the beginning of each week, management must choose one of the two actions: CONTINUE (do nothing) or
CHANGE (renew or replace the show for a fixed cost K > 0). After the action has been selected state transitions occur.
The states follow a Markov chain with transition probabilities:

action: continue
         good    bad
good     r       1 − r
bad      0       1

action: change
         good    bad
good     q       1 − q
bad      q       1 − q
We interpret r as the probability that a continued show which is in the good state during one week will remain in
the good state during the next week. Once the show enters the bad state it remains there until the action change is
employed. We interpret q as the prior probability that a new show is in the good state. The objective is to maximize the
total present value of the expected future profits.
We denote by x the probability that the show will be in the good state if, at the beginning of the period, the action
continue is selected. We refer to x as the “information state”. Note that x = rP , where P is the posterior probability
(given the past actions and observations) that the show is in the good state. Recall that the problem of optimally selecting
between continue and change, can be formulated to depend upon P , therefore, x too can be used for that purpose. We
use x rather than P , since it simplifies the mathematics. We define the following:
f(x, y) = x·f0(y) + (1 − x)·f1(y),    (1)

Λ(x, y) = r·x·f0(y)/f(x, y).    (2)

We interpret f(x, y) as the p.d.f. of the observation, y, given that at the beginning of the period the probability that the show is in the good state is x (i.e., the information state is x). We interpret Λ(x, y) as the next information state if the current information state is x, the action continue is selected, and the observation y is obtained; Λ(x, y) is obtained via Bayes' formula.
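For concreteness, the two quantities in (1) and (2) can be computed directly once f0 and f1 are specified. The following minimal Python sketch (not part of the original analysis) uses the beta densities estimated later in Section 3; the parameter values and the scipy-based implementation are illustrative assumptions.

# Sketch of the information-state update of Eqs. (1)-(2). The beta densities
# mirror those estimated in Section 3 but are illustrative assumptions here.
from scipy.stats import beta

R = 0.98                      # r: P(a continued good show stays good)
f0 = beta(8, 39).pdf          # ratings density in the GOOD state
f1 = beta(11, 93).pdf         # ratings density in the BAD state

def f(x, y):
    """Eq. (1): p.d.f. of a rating y when the information state is x."""
    return x * f0(y) + (1 - x) * f1(y)

def next_state(x, y):
    """Eq. (2): next information state Lambda(x, y) after observing rating y."""
    return R * x * f0(y) / f(x, y)

# Example: a new show (x = q = 0.5) that draws a 10% rating.
print(round(next_state(0.5, 0.10), 2))   # about 0.22, as in Case 1 of Section 3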
We denote by U_n^CON(x) the total expected discounted future profit when there are n periods remaining, if the current action is continue, the information state is x, and all future actions are optimal. We denote by U_n^CHANGE the total expected discounted future profit when there are n periods remaining, if the current action is change and all future actions are optimal (this quantity does not depend on the information state). Finally, we denote U_n^OPT(x) = max{U_n^CON(x), U_n^CHANGE} to obtain (with β denoting the discounting factor)

U_n^CON(x) = m1 + (m0 − m1)·x + β ∫ U_{n−1}^OPT(Λ(x, y))·f(x, y) dy,

U_n^CHANGE = −K + U_n^CON(q)    (3)

for n ≥ 1, with the initial condition U_0^OPT(x) = 0. Note that U_n^CON(x) equals the expected current income, x·m0 + (1 − x)·m1, plus the discounted expected future earnings. Disregarding the set of parameters for which the problem is trivial, i.e., U_n^CON(x) > U_n^CHANGE for all x (and hence x_n^* = 0), the finite horizon control limit (the CLT) x_n^* is the solution of

U_n^CON(x) = U_n^CHANGE.

The CLT is the minimal value of the probability that the show is in the good state for which continue is preferable (not strictly) over change. Thus, the optimal policy is continue if x ≥ x_n^*, and change otherwise.
Problem (3) depends upon three monetary parameters: m0 , m1 and K. Interestingly, it can be modified to depend
upon only one such parameter.
Proposition 1. Problem (3) can be transformed so that the CLTs depend upon a single monetary parameter, k =
K/(m0 − m1 ). Note that k is the ratio of the investment K to the “incremental revenue” of a good show over a bad
show (m0 − m1 ).
Proof. Let R_n^CON(x) ≡ U_n^CON(x) − m1/(1 − β); R_n^CHANGE ≡ U_n^CHANGE − m1/(1 − β); and define R_n^OPT(x) = max{R_n^CON(x), R_n^CHANGE}; therefore R_n^OPT(x) = U_n^OPT(x) − m1/(1 − β). With this notation (3) becomes

R_n^CON(x) = (m0 − m1)·x + β ∫ R_{n−1}^OPT(Λ(x, y))·f(x, y) dy,

R_n^CHANGE = −K + R_n^CON(q).    (4)

Let V_n^CON(x) ≡ R_n^CON(x)/(m0 − m1); V_n^CHANGE ≡ R_n^CHANGE/(m0 − m1); k ≡ K/(m0 − m1); and define V_n^OPT(x) = max{V_n^CON(x), V_n^CHANGE}; therefore V_n^OPT(x) = R_n^OPT(x)/(m0 − m1). With this notation (3) becomes

V_n^CON(x) = x + β ∫ V_{n−1}^OPT(Λ(x, y))·f(x, y) dy,

V_n^CHANGE = −k + V_n^CON(q),    (5)

where n ≥ 1 and

V_0^OPT(x) = −m1/[(1 − β)(m0 − m1)].

It can be verified that the CLTs calculated via (5) are independent of the initial condition, and it is simplest to use V_0^OPT(x) ≡ 0. We refer to (5) as the "optimality equations" and to V_n^CON(x) as the "value function".
2.1. Calculating the CLTs
Before proceeding to the next section, which addresses the practical issues of estimating parameters and applying the model to simulated as well as real shows, we wish to explain how we calculated the (finite horizon) control limits. As a season typically lasts 25 weeks, we were interested in calculating the CLTs x_n^*, 1 ≤ n ≤ 25. Since this is a rather short time we use the discounting factor β = 1.
It is easy to verify that V_1^CON(x) = x and V_1^CHANGE = −k + q, which implies x_1^* = q − k if q > k and x_1^* = 0 otherwise. Therefore, V_1^OPT(x) = q − k if x < x_1^* and V_1^OPT(x) = x if x ≥ x_1^*. Next, using (5), it is easy to calculate, numerically, V_2^CON(x), which is the key to continue and calculate V_2^CHANGE, and thus x_2^*, and V_2^OPT(x). After that we proceed to calculate, numerically, V_3^CON(x), and so on.
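One way to carry out this backward recursion is to discretize the information state x and the rating y and iterate (5) numerically. The sketch below (in Python; the grid sizes, the value k = 1.00 and the numpy/scipy implementation are our own illustrative choices, not part of the paper) computes one column of control limits.

# Sketch of the backward recursion (5) with discount factor beta = 1.
# Grid sizes, k = 1.00 and the numpy/scipy usage are illustrative choices.
import numpy as np
from scipy.stats import beta

Q, R, K_REL = 0.5, 0.98, 1.00            # q, r and k = K/(m0 - m1)
f0, f1 = beta(8, 39).pdf, beta(11, 93).pdf

xs = np.linspace(0.0, 1.0, 501)          # grid for the information state x
ys = np.linspace(0.001, 0.999, 999)      # grid for the rating y
dy = ys[1] - ys[0]

v_opt = np.zeros_like(xs)                # initial condition V_0^OPT(x) = 0
clts = []
for n in range(1, 26):
    v_con = np.empty_like(xs)            # V_n^CON on the x grid
    for i, x in enumerate(xs):
        fy = x * f0(ys) + (1 - x) * f1(ys)               # Eq. (1)
        lam = R * x * f0(ys) / np.maximum(fy, 1e-300)    # Eq. (2)
        v_con[i] = x + np.sum(np.interp(lam, xs, v_opt) * fy) * dy
    v_change = -K_REL + np.interp(Q, xs, v_con)          # V_n^CHANGE = -k + V_n^CON(q)
    better = v_con >= v_change                           # continue weakly preferred
    clts.append(xs[better][0] if better.any() else 1.0)  # CLT x_n^*
    v_opt = np.maximum(v_con, v_change)                  # V_n^OPT

print([round(100 * c) for c in clts])    # should resemble the k = 1.00 column of Table 1

The inner loop is written for clarity rather than speed; vectorizing over x or caching f0(ys) and f1(ys) makes the 25 iterations essentially instantaneous.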
2.2. Conditions for a non-degenerate policy
Intuitively, if the investment in renewing the show is prohibitively high, the degenerate policy “always continue” is
optimal. Next we specify the conditions for such a degenerate policy to be optimal.
Proposition 2. If k > q(1 + βr + (βr)^2 + · · ·) = q/(1 − βr), the degenerate policy "always continue" is optimal.

Proof. From (5) we have V_1^CON(x) = x, 0 ≤ x ≤ 1, and V_1^CHANGE = −k + q. Hence, if k > q, which is implied by the condition above, we have V_1^CON(x) > V_1^CHANGE, 0 ≤ x ≤ 1, and the optimal action is continue (in that case we set x_1^* = 0). Therefore, also, V_1^OPT(x) = V_1^CON(x).

Note that Λ(x, y)·f(x, y) = r·x·f0(y) and use (5), again, to obtain

V_2^CON(x) = x + β ∫ V_1^OPT(Λ(x, y))·f(x, y) dy = x + β ∫ Λ(x, y)·f(x, y) dy = x(1 + βr)

and V_2^CHANGE = −k + q(1 + βr). Hence, if k > q(1 + βr), which is implied by the condition above, we have V_2^CON(x) > V_2^CHANGE, 0 ≤ x ≤ 1, and the optimal action is continue (in that case we set x_2^* = 0). Therefore, also, V_2^OPT(x) = V_2^CON(x).

Proceed in this manner to obtain V_3^CON(x) = x(1 + βr + (βr)^2). The rest follows by induction.

Comment: The proof is for an infinite time horizon. Observing the details, it is clear that for 25 weeks a somewhat tighter bound, namely k > q(1 + βr + (βr)^2 + · · · + (βr)^24), could be used.
3. Application of the optimal rule
In this section we show how the model can be applied. We discuss the characteristics of the control limit and give
examples for its use with some simulated ratings as well as real shows.
In order to apply the optimal rule according to Eqs. (3) and (5), one should have estimates of r and q, the p.d.f.s f0 and f1, and the relative investment in a show change, k.
In order to make the illustration realistic, we used real data to estimate the parameters. For this purpose we obtained
the Nielsen ratings for prime time shows in 1986. We divided the shows into a “bad” and a “good” group according to
their ratings (in reality this exercise should be performed by programming directors who know much more about the
profitability of shows).
Looking only at the new shows, we were able to determine that about 50% of them were failures from the start, i.e., they entered the program schedule in the bad state. This implies that q should be about 0.5. In order to estimate r, which is the probability that a good show will remain in the good state in the next week, we used the geometric distribution. This distribution gives the probability of n good periods before a transition to the bad state, conditional on entering the first period in a good state. For example, r = 0.95 implies that 70% of the shows which entered the season in the good state will fail within the next 24 weeks of the season. The failure rate in the data we used was close to 40%, which implies r = 0.98.
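The calibration of r from an observed in-season failure rate amounts to inverting the geometric survival probability over the 24 remaining weeks. A short worked check (in Python, using the figures quoted above):

# Back out r from an observed failure rate using the geometric survival
# probability r**24 over the 24 remaining weeks of the season (a worked check).
failure_rate = 0.40                      # share of good starters failing in-season
r = (1.0 - failure_rate) ** (1.0 / 24)   # weekly survival probability
print(round(r, 3))                       # about 0.979, i.e. r close to 0.98

print(round(1.0 - 0.95 ** 24, 2))        # r = 0.95 gives a failure rate of about 0.71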
The beta distribution is a natural choice for the conditional distributions of the ratings, given the show’s state, i.e., f0
and f1 . The beta distribution ranges between zero and one and its density is flexible enough to assume various shapes
which may apply to the distribution of the ratings. In order to estimate the parameters of f0 and f1 we pooled together
all the weekly ratings of the good shows to get 1544 observations, and all the weekly ratings of the bad shows to get
533 observations. Using the method of moments we estimated f0 to be a beta density function with parameters (8; 39) and f1 as a beta density with parameters (11; 93). More formally,

f0(y) = y^7·(1 − y)^38/B(8; 39),    (6)

f1(y) = y^10·(1 − y)^92/B(11; 93).    (7)

(Note that indeed f0 stochastically dominates f1.) The mean rating of the good shows was 0.170 and its standard deviation was 0.055. For the bad shows the mean and standard deviation were 0.105 and 0.030, respectively.
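The method-of-moments step uses the standard formulas matching a beta density's mean and variance to the sample moments. A short sketch (the moments are those quoted above; the small discrepancies from (8; 39) and (11; 93) presumably reflect rounding of the reported moments):

# Method-of-moments fit of a beta density from a sample mean and standard deviation.
def beta_mom(mean, sd):
    """Return the (a, b) parameters of the beta density matching the moments."""
    common = mean * (1 - mean) / sd ** 2 - 1
    return mean * common, (1 - mean) * common

print(beta_mom(0.170, 0.055))   # roughly (7.8, 37.9), close to the (8; 39) of f0
print(beta_mom(0.105, 0.030))   # roughly (10.9, 92.5), close to the (11; 93) of f1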
In order to apply the optimal rule described in Section 2, one has to estimate k = K/(m0 − m1 ), the “relative
investment”. This enables the programming director to use a pre-prepared table of control limits corresponding to
different k’s and planning periods, n. Table 1 is an example of such a table. It gives the control limits for 1–25 periods
for each of 11 values of k, ranging from k = 0.1 to 5.0. A programming director evaluating a show should compute its
Information State, xn , given its ratings history. Then he has to select k’s which represent shows available in stock, or
shows which can be bought from external sources. Finally, he has to compare the show’s information state to control
limits for these relevant k’s. If he can find a control limit greater than the show’s information state, the show should be
changed, otherwise it should be continued. To illustrate this process we give a few examples of simulated and real shows.
3.1. Simulated shows
Before presenting the simulated examples we wish to comment that we believe POMDP to be an important and useful decision tool. Unfortunately, POMDP is not widely used, perhaps because it is perceived as complex by practitioners. We hope this article will prompt decision makers to use it more often. The simulated shows below illustrate some favorable properties inherent in the model: it identifies shows in the bad state very quickly (Case 1), and it is not fooled by the initial upward ratings of a bad show (Case 2).
Five simulated shows were selected from 200 shows that were simulated with the parameters mentioned above, i.e., q = 0.5, r = 0.98 and Eqs. (6)–(7). In the simulation we assumed a season of 25 weeks.
Table 1
CLTs for combinations of n and k (decimal points were omitted, so the entries are in percentage terms)

       Relative investment: k = K/(m0 − m1)
n      0.10  0.50  0.75  1.00  1.25  1.50  1.75  2.00  3.00  4.00  5.00
1        40     0     0     0     0     0     0     0     0     0     0
2        43    24    12     0     0     0     0     0     0     0     0
3        44    28    22    16     7     0     0     0     0     0     0
4        45    31    24    19    16    11     4     0     0     0     0
5        45    32    26    22    18    15    12     8     0     0     0
6        45    33    28    23    20    16    14    12     0     0     0
7        45    33    28    24    22    18    15    13     4     0     0
8        45    34    29    25    22    19    17    15     8     0     9
9        45    34    29    26    22    20    18    15     9     2     0
10       45    34    30    26    23    20    18    16    10     6     0
11       45    34    30    26    23    21    19    17    11     7     0
12       45    34    30    26    23    21    19    17    12     7     3
13       45    34    30    26    24    21    19    17    12     8     5
14       45    34    30    26    24    21    19    18    12     9     5
15       45    34    30    27    24    21    20    18    13     9     6
16       45    34    30    27    24    22    20    18    13    10     7
17       45    35    30    27    24    22    20    18    13    10     7
18       45    35    30    27    24    22    20    18    13    10     7
19       45    35    30    27    24    22    20    18    13    10     8
20       45    35    30    27    24    22    20    18    13    10     8
21       45    35    30    27    24    22    20    18    14    10     8
22       45    35    30    27    24    22    20    18    14    10     8
23       45    35    30    27    24    22    20    18    14    11     8
24       45    35    30    27    24    22    20    18    14    11     8
25       45    35    30    27    24    22    20    18    14    11     8
The five simulated shows were not selected to represent typical shows. Rather, we selected them to show how the optimal rule performs in different situations. All the shows were considered to be new shows, so the information state at the beginning of the first period, x25, is set equal to q = 0.5. For every simulated show we present the number of periods left until the end of the
season—n, its ratings—y, and the information state—x. Decimal points were omitted, so the ratings and Information
States are in percentage terms.
To proceed with the calculation we set, as mentioned above, x25 = q; after obtaining the first observation (rating), y25, it is necessary to calculate the next information state. As there are 24 weeks remaining, x24 = Λ(x25, y25). If, after comparing x24 to the CLT x24* (from Table 1), the optimal action is continue, we obtain the next observation, y24, and proceed to calculate x23 = Λ(x24, y24), and so on.
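This week-by-week procedure is easy to mechanize. The sketch below (Python; the update function is the one sketched in Section 2, the ratings are those of Case 1 below, and k = 3.00 with its CLT column from Table 1 is an illustrative choice):

# Sketch: track a show through the season, updating the information state each
# week (Eq. (2)) and comparing it with the CLT for the number of weeks remaining.
from scipy.stats import beta

R = 0.98
f0, f1 = beta(8, 39).pdf, beta(11, 93).pdf

def next_state(x, y):
    """Eq. (2): next information state after observing rating y."""
    return R * x * f0(y) / (x * f0(y) + (1 - x) * f1(y))

ratings = [0.10, 0.07, 0.08, 0.11, 0.07]                  # first five weeks of Case 1
clt = {24: 0.14, 23: 0.14, 22: 0.14, 21: 0.14, 20: 0.13}  # Table 1, k = 3.00 column

x, n = 0.5, 25                                            # x_25 = q for a new show
for y in ratings:
    x, n = next_state(x, y), n - 1                        # x_n = Lambda(x_{n+1}, y_{n+1})
    action = "continue" if x >= clt[n] else "change"
    print(f"n={n:2d}  rating={y:.2f}  x={x:.2f}  -> {action}")

With these numbers the rule continues after the first rating (x drops to about 0.22, still above the CLT of 0.14) and signals a change from the second week on, in line with the Case 1 trajectory reported below.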
Case 1: A bad show
n: 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
y: 10 7 8 11 7 11 7 8 8 15 9 7 8 13 7 18 5 13 14 5 13 5 16 11 8
x: 50 22 4 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
All the ratings of this show were drawn from f1 . This is a typical bad show, and is identified as such very quickly
by the information state.
Case 2: A bad show with 5 periods of increasing ratings
n: 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
y: 8 10 10 11 15 12 11 8 7 16 10 13 18 12 8 12 13 12 11 14 7 16 11 11 9
x: 50 14 4 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
This show was also in the bad state from the start. It reveals an interesting phenomenon. Although its ratings go up
for the first five periods, its Information State goes down rather quickly. A programming director observing this upward
trend may be tempted to hold on to this show and see how it develops. However, use of the optimal rule will cause him
to change it very quickly, depending on the relative cost of change, k.
Case 3: A good show that turned bad after 10 periods
n: 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
y: 14 22 15 17 22 19 10 25 14 13 7 10 12 13 8 7 9 12 10 13 8 14 12 10 5
x: 50 53 97 96 97 98 98 91 98 96 94 66 34 22 18 3 0 0 0 0 0 0 0 0 0
This show has 10 good periods, for which the ratings were simulated from f0 . Then it turned bad, i.e., the ratings
for the last 15 periods were drawn from f1 . We can see from its Information State together with the Control Limits
of Table 1, that it takes the optimal rule three to six periods to change the show, depending on the relevant k’s. If a
relatively large investment is required, i.e., k ≥ 5, the optimal rule indicates that the show should not be changed until the end of the season.
Case 4: A good show that turned bad after 20 periods
n: 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
y: 14 13 11 22 27 13 19 15 10 15 15 19 18 26 13 16 11 22 19 14 12 13 11 13 9
x: 50 53 47 25 93 98 96 98 97 87 91 93 97 98 98 96 96 89 98 98 96 91 88 72 66
This show has 20 good periods and only the last five periods were simulated from f1 . With its long history of good
performance, the optimal rule does not change it until the end of the season. This show will probably be one of the bad
starters of the next season.
Case 5: A good show
n: 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
y: 14 10 11 15 14 19 23 15 17 18 23 19 12 29 22 12 13 15 19 25 10 16 24 22 22
x: 50 53 24 11 17 20 73 98 97 97 98 98 98 94 98 98 94 91 93 97 98 91 95 98 98
All the ratings of this show were drawn from f0. It demonstrates the high sensitivity of the Information State to low ratings at the beginning of the season, when the prior is about 0.50. It highlights the possibility of a false alarm when
a good show gets off to a slow start. In contrast, low ratings after a long history of good performance do not affect the
Information State so drastically and are not likely to cause the optimal rule to produce a false alarm.
3.2. Real shows
Finally, we present in Table 2 four real new shows: two flops that were terminated before the end of the season and
two that were continued to the next season. The show “Mary” would have been changed by the optimal rule with 16
or 15 weeks to go in the season, for all k's less than or equal to 3.0, much as actually happened. Both the optimal rule and the actual decision were slow in bringing about the change. This was due to the high rating the show got in the first period (y25 = 0.21), which can be attributed to the good reputation and past performance of Mary Tyler
Moore. Compare “Mary’s” results with those of “Foley Square” which would have been changed by the optimal rule
three to eight periods earlier than it was, depending on k. Foley Square is an example of a show which was clearly
continued too long. One would have to be confronted with a very high cost of change (k) to justify continuing this show for that long.
“Valerie” and “You Again” represent typical “good” but not great shows which were continued despite their ups and
downs, as they would have been according to the optimal rule.
4. Discussion and summary
We have presented a ratings model and an optimal rule for a continue/change decision about television shows.
This optimal rule should be treated as a working tool for programming directors to be used in conjunction with their
professional judgment and knowledge. However, like any other model, it is a simplification of reality and should be
used with care.
Table 2
Ratings and the information state for real shows (decimal points were omitted, so the entries are in percentage terms)

n               25  24  23  22  21  20  19  18  17  16  15  14  13
Mary          y 21  15  14  16  14  12  12  10  11  10  (show was terminated)
              x 50  95  95  94  96  95  89  80  51  28  10
Foley Square  y 13  13  15  13  11  12  11   9  10   9   6   8   7
              x 50  44  37  51  45  23  14   6   1   0   0   0   0
Valerie       y 21  20  15  15  18  19  17  16  16  17  14  14  13
              x 50  95  98  97  96  97  98  98  97  97  97  96  95
You Again     y 20  20  15  15  17  18  14  15  14  14  13  12  14
              x 50  93  98  97  96  97  98  96  96  95  91  84  84

n               12  11  10   9   8   7   6   5   4   3   2   1
Foley Square  y  8   6  (show was terminated)
              x  0   0   0
Valerie       y 13  13  12  13  13  13  12  14  15  16  17  15
              x 91  88  83  72  66  59  53  37  41  54  76  92
You Again     y 14  13  15  14  13  14  15  14  16  14  17  16
              x 85  80  86  86  82  83  88  88  93  93  96  97
The dichotomy into “good” and “bad” states of a show can be justified by the dichotomy of the actions space to
“continue” and “change”. However, this may be an artificial description of reality. It is quite possible that shows do not
switch from one state to the other, but rather deteriorate gradually over time. In this case a method has to be devised
to map from the many states, or a continuum, to the two actions. We leave this refinement of the model for future
research. We hope that the application of our method will improve decision making enough to justify its mathematical
simplicity.
An interesting managerial implication of Proposition 2 is that it provides an economical estimate of the upper limit for an investment in a new show. Suppose that indeed q = 0.50 and r = 0.98. Then, for the optimal policy to be non-degenerate, it is required that k ≤ 25 (= q/(1 − r), with β = 1). This also implies that an investment with k ≥ 25 should never be made.
The information state, even if not used formally to compare with a control limit, provides the director with valuable
information: it provides an estimate of the chances that next week’s episode is still in the good state. Such an estimate
could be incorporated into subjective decision making, sometimes required in real life scenarios.
References
[1] TNS Media Intelligence/CMR. U.S. advertising market exhibits strong growth in 2003. Press release, March 8, 2004. http://www.tnsmi-cmr.com/news/2004/030804.html.
[2] deKluyver C, Givon M. Characteristics of optimal simulated spot TV advertising schedules. New Zealand Operational Research 1982;10(1):
9–28.
[3] Harvey B. Nonresponse in TV meter panels. Journal of Advertising Research 1968;8:24–7.
[4] Mayer M. How good are television ratings?. New York: Television Information Office; 1965.
[5] Buck S. TV audience research—present and future. Admap 1983;December:636–40.
[6] Gensch D, Shaman P. Predicting TV ratings. Journal of Advertising Research 1980;August:85–92.
[7] Horen J. Scheduling of network television programs. Management Science 1980;April:354–70.
[8] Rust R, Alpert M. An audience flow model of television viewing choice. Marketing Science 1984;Spring:113–24.
[9] Henry M, Rinne H. Predicting programs shares in new time slots. Journal of Advertising Research 1984;April/May:9–17.
[10] Rust R. Advertising media models. New York: Lexington Books; 1986.
[12] Henry M, Rinne H. Offensive versus defensive TV programming strategies. Journal of Advertising Research 1984;June/July:45–6.
[13] Monahan G. A survey of partially observable Markov decision processes: theory, models and algorithms. Management Science 1982;28(1):
1–16.
[14] Astrom K. Optimal control of Markov processes with incomplete state information. Journal of Mathematical Analysis and Applications 1965;10:174–205.
[15] Aoki M. Optimal control of partially observable Markovian systems. Journal of Franklin Institute 1965;280:367–86.
[16] Bertsekas D. Dynamic programming and stochastic control. New York: Academic Press; 1976.
[17] Albright SC. Structural results for partially observable Markov decision processes. Operations Research 1979;27:1041–53.
[18] White C. Optimal control-limit strategies for a partially observed replacement problem. International Journal of Systems Science 1979;10:321–31.
[19] Lovejoy W. Ordered solutions for dynamic programs. Mathematics of Operations Research 1987;12:269–76.
[20] Grosfeld-Nir A. A two state partially observable Markov decision process with uniformly distributed observations. Operations Research
1996;44:458–63.
[21] Sondik EJ. The optimal control of partially observable Markov processes. PhD dissertation, Department of Engineering-Economics Systems,
Stanford University, Stanford, CA; 1971.
[22] Smallwood R, Sondik E. The optimal control of partially observable Markov processes over a finite horizon. Operations Research 1973;21:
1071–88.
[23] Eagle JN. Optimal search for a moving target when the search path is constrained. Operations Research 1984;32:1107–15.
[24] White DJ. Real applications of Markov decision processes. Interfaces 1985;15:73–83.
[25] Lane DE. A partially observable model of decision making by fishermen. Operations Research 1989;37:2.
[26] Treharne JT, Sox CR. Adaptive inventory control for nonstationary demand and partial information. Management Science 2002;48(5):607–24.