MCL361: Manufacturing System Design
• A more detailed title of the course could be “An Operations Research Perspective on Manufacturing
Industry.”
• The major focus of this course is on how to handle uncertainty in manufacturing and industrial systems.
• Venue and lecture timings: Online; Tues, Wed, Fri: 9 AM – 9.50 AM.
• Tentative grade calculation policy: Minor: 30%; Major: 40%; Assignments: 20%; Term Paper: 10%
• Textbook: Operations Research: An Introduction, 10th edition. Author: Hamdy A. Taha.
• Supplementary textbook: Factory Physics, 3rd edition. Authors: W.J. Hopp and M.L. Spearman.
• Planned Topics in Module 1
❖Inventory systems
❖Materials requirement planning
❖Decision analysis for industrial scenarios
❖Introduction to bandit model
❖Queueing systems
Inventory Systems
• Inventory is the goods and materials that a business holds for the ultimate goal of resale.
• Inventory management is a discipline primarily concerned with specifying the shape and placement of
stocked goods.
• Purchasing cost is the price per unit of an inventory item.
• Setup cost represents the fixed charge incurred when an order is placed.
• Setup cost is fixed regardless of the size of the order requested.
• Setup cost can also include the cost associated with receiving a shipment.
• Setup cost is also fixed regardless of the size of the shipment received.
• Holding cost represents the cost of maintaining inventory in stock.
• Holding cost includes
• interest on capital
• cost of storage
• cost of maintenance
• cost of handling
• cost of obsolescence (meaning item passed its expiry date or a more recent version is available in the
market, and therefore cannot be sold)
• cost of shrinkage due to fraud or theft
• Shortage cost is the penalty incurred when stock is out.
• Shortage cost includes
• potential loss of income
• disruption in production due to rescheduling the nominal manufacturing plan
• additional cost of ordering emergency shipments (usually overnight)
• subjective cost of loss in customer goodwill which is hard to estimate
• These four costs are conflicting because an increase in one may result in the reduction of another.
• For example, more frequent ordering results in higher setup cost but lower inventory holding cost.
• Therefore we seek to minimize the total inventory cost in order to balance these conflicting costs.
• The inventory problem reduces to devising an inventory policy that answers two questions
1. How much to order?
2. When to order?
• How much to order means we need to determine the size of the order at replenishment time.
• When to order, i.e., deciding the replenishment time, is more complicated.
• An inventory system may be based on
• periodic reviews (e.g., ordering at the start of every week or month), or
• continuous reviews, placing a new order whenever the inventory level drops to a specific reorder point.
• The solution of the inventory problem also depends on the nature of demand faced by the corresponding
supply chain.
• Demand in inventory systems can be of four types
1. Deterministic and constant (static) with time.
2. Deterministic and variable (dynamic) with time, for example, seasonality effects.
3. Probabilistic and stationary over time: parameters of probability distribution are known and fixed
over time.
4. Probabilistic and nonstationary over time: you cannot pin down the underlying probability
distribution
• The coefficient of variation (CV) is defined as the ratio of the standard deviation to the mean.
• CV measures the relative variation or spread of the data around the mean.
• In general, higher values of CV indicate higher uncertainty if you use the mean as an approximation of
monthly consumption.
• For deterministic demand, CV = 0 because the associated standard deviation is zero.
• CV can be used to classify demand into one of the four categories by using the following guidelines.
• If the average monthly demand (taken over a number of years) is “approximately” constant and CV is
reasonably small (< 20%), then the standard convention is to assume the demand to be deterministic and
constant.
• If the average monthly demand varies appreciably among the different months but CV remains reasonably
small for all months, then the demand may be considered deterministic but variable.
• Thus, low CV means deterministic and high CV means probabilistic.
• If CV is high (> 20%) but the average monthly demand (taken over a number of years) is “approximately”
constant, then the demand is probabilistic and stationary.
• The remaining case is the probabilistic nonstationary demand, which occurs when the average monthly
demands and coefficients of variation vary appreciably month to month.
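As a sketch of these guidelines, the code below classifies a demand history arranged as years × months. The 20% CV threshold comes from the text; the 10% cutoff used to judge whether the monthly means are “approximately” constant is an illustrative assumption, not part of the source.

```python
import numpy as np

def classify_demand(monthly_demand, cv_threshold=0.20, mean_cutoff=0.10):
    """Classify demand into one of the four categories using the CV guidelines.

    monthly_demand: 2-D array, rows = years, columns = months.
    """
    data = np.asarray(monthly_demand, dtype=float)
    means = data.mean(axis=0)                 # average demand for each month
    stds = data.std(axis=0, ddof=1)           # month-by-month standard deviation
    cvs = np.divide(stds, means, out=np.zeros_like(stds), where=means > 0)

    # "Approximately constant" mean: spread of the monthly means is small
    # relative to the overall mean (the 10% cutoff is an assumption).
    mean_is_constant = means.std(ddof=1) / means.mean() < mean_cutoff
    cv_is_small = (cvs < cv_threshold).all()

    if cv_is_small:
        return "deterministic, constant" if mean_is_constant else "deterministic, variable"
    return "probabilistic, stationary" if mean_is_constant else "probabilistic, nonstationary"
```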
Static Economic-Order-Quantity (EOQ) Models
• Classical EOQ model is the simplest of the inventory models.
• Production is instantaneous.
• There is no capacity constraint, and the entire lot is produced simultaneously. Thus, we assume no
shortage so that shortage cost does not have to be accounted for in the total inventory cost formula.
• Delivery is immediate.
• There is no time lag between production and availability to satisfy demand.
• Demand is deterministic.
• There is no uncertainty about the quantity or timing of demand.
• Demand is constant over time, hence called static EOQ.
• Demand can be represented as a straight line, so that if annual demand is 365 units, this translates to a
daily demand of one unit.
• A production run incurs a fixed setup cost.
• Regardless of the size of the lot or the status of the factory, the setup cost is the same.
• Products can be analyzed individually.
• Either there is only a single product to be analyzed or there are no interactions (e.g., shared equipment)
between products and so their analysis can be decoupled into independent problems of single product.
• The inventory stock is depleted uniformly at a constant demand rate, D.
• The important characteristic of the classical EOQ model is that when the inventory reaches zero level, an
order of size 𝑦 units is delivered to the production facility instantaneously.
• There are two cost parameters associated with the classical EOQ model.
• 𝐾 = Setup cost associated with the placement of an order (INR per order)
• ℎ = Holding cost (INR per inventory unit per unit time)
• The total cost per unit time as a function of the order size 𝑦 is TCU(𝑦) = 𝐾𝐷/𝑦 + ℎ𝑦/2.
• Minimizing TCU gives the economic order quantity 𝑦∗ = √(2𝐾𝐷/ℎ) and the corresponding cycle length 𝑡0∗ = 𝑦∗/𝐷.
• We can violate the second assumption of the classical EOQ model because a new order need not be
received at the instant it is ordered.
• Introducing delivery delays is straightforward if delivery times are known and fixed.
• Let us say that a positive lead time L occurs between the placement and the receipt of an order.
• So if we assume that the lead time is less than the cycle length, then the reorder point is modified to
occur whenever the inventory level drops to LD units.
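A minimal sketch of these formulas, including the reorder point under a fixed lead time; the parameter values in the example are hypothetical.

```python
from math import sqrt

def eoq(K, h, D, L=0.0):
    """Classical EOQ with a fixed lead time L (assumed shorter than the cycle length)."""
    y_star = sqrt(2 * K * D / h)            # economic order quantity
    t0 = y_star / D                         # cycle length
    reorder_point = L * D                   # reorder when inventory drops to LD
    tcu = K * D / y_star + h * y_star / 2   # minimum total cost per unit time
    return y_star, t0, reorder_point, tcu

# Example: K = 100 INR/order, h = 0.02 INR/unit/day, D = 100 units/day, L = 2 days
print(eoq(K=100, h=0.02, D=100, L=2))
```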
EOQ with Price Breaks
• This is also a type of Static EOQ model.
• The inventory item can be purchased at a discount if the size of the order 𝑦 exceeds a given limit 𝑞.
• Let 𝑐 denote the unit purchasing price: 𝑐 = 𝑐1 for 𝑦 ≤ 𝑞 and 𝑐 = 𝑐2 for 𝑦 > 𝑞, where 𝑐2 < 𝑐1.
• Note that in the previous model, we did not consider the purchasing cost.
• But in the discounting scenario, it must be accounted for in the total inventory cost formula because it clearly depends on 𝑦.
• Verify that this dependence on 𝑦 does indeed disappear for the classical case: just like in the classical EOQ model, every 𝑡0 = 𝑦/𝐷 units of time we order 𝑦 units at a cost of 𝑐𝑦, so the purchasing cost per unit time is 𝑐𝑦/𝑡0 = 𝑐𝐷, a constant independent of 𝑦.
• With the discount, there are two cost curves: TCU1(𝑦) = 𝑐1𝐷 + 𝐾𝐷/𝑦 + ℎ𝑦/2 (valid for 𝑦 ≤ 𝑞) and TCU2(𝑦) = 𝑐2𝐷 + 𝐾𝐷/𝑦 + ℎ𝑦/2 (valid for 𝑦 > 𝑞).
• Because the two functions differ only by the constant amount (𝑐1 − 𝑐2)𝐷, their minima will still coincide at 𝑦𝑚 = √(2𝐾𝐷/ℎ), just like in the classical case.
• However, there is a clever trick to obtain the correct answer.
• To see this, we must first determine the value of 𝑄 > 𝑦𝑚 satisfying TCU2(𝑄) = TCU1(𝑦𝑚); its significance is that for order sizes beyond 𝑄, even the discounted curve costs more than the undiscounted minimum.
• There will be three possible cases depending on the actual value of 𝑞.
• If 𝑞 < 𝑦𝑚, then the dashed (invalid) portions of the curves are ignored; considering only the solid (valid) curves, the minimum point is obviously 𝑦∗ = 𝑦𝑚, purchased at the discounted price.
• All three possible cases are summarized below:
1. 𝑞 < 𝑦𝑚: order 𝑦∗ = 𝑦𝑚 (the discount applies automatically).
2. 𝑦𝑚 ≤ 𝑞 < 𝑄: order 𝑦∗ = 𝑞 (inflate the order just enough to earn the discount).
3. 𝑞 ≥ 𝑄: order 𝑦∗ = 𝑦𝑚 at the regular price (the discount is not worth chasing).
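A sketch of this three-case rule. Here 𝑄 is obtained by solving TCU2(𝑄) = TCU1(𝑦𝑚), which reduces to a quadratic in 𝑄; writing it in closed form is a convenience of this sketch, and any root finder would do.

```python
from math import sqrt

def eoq_with_price_break(K, h, D, c1, c2, q):
    """Optimal order size with one price break: unit price c1 below q,
    discounted price c2 < c1 at or above q."""
    y_m = sqrt(2 * K * D / h)               # common minimizer of both curves

    if q < y_m:                             # Case 1: discount applies at y_m anyway
        return y_m

    tcu1_min = c1 * D + K * D / y_m + h * y_m / 2   # TCU1 at its minimum y_m

    # Q > y_m solves TCU2(Q) = TCU1(y_m), i.e.
    # (h/2) Q^2 - (TCU1(y_m) - c2*D) Q + K*D = 0; take the larger root.
    b = tcu1_min - c2 * D
    Q = (b + sqrt(b * b - 2 * h * K * D)) / h
    return q if q < Q else y_m              # Case 2: order q; Case 3: order y_m
```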
Multi-Item EOQ with Storage Limitation
• This is also a type of Static EOQ model.
• This model deals with multiple items whose individual inventory fluctuations are exactly the same as the
classical EOQ model.
• The only difference is that the items compete for a limited storage space.
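The slides do not reproduce the formulation, so the sketch below follows the standard Lagrangian treatment (as in Taha): minimize the sum of the items' EOQ costs subject to Σ 𝑎𝑖𝑦𝑖 ≤ 𝐴, where 𝑎𝑖 is the storage area per unit of item 𝑖 and 𝐴 is the total available space; both symbols are assumptions of this sketch.

```python
from math import sqrt

def multi_item_eoq(K, D, h, a, A, tol=1e-9):
    """Order sizes for several items sharing a storage limit sum(a_i * y_i) <= A.
    Uses the Lagrangian solution y_i = sqrt(2 K_i D_i / (h_i + 2*lam*a_i)),
    finding the multiplier lam >= 0 by bisection."""
    n = len(K)

    def order_sizes(lam):
        return [sqrt(2 * K[i] * D[i] / (h[i] + 2 * lam * a[i])) for i in range(n)]

    def space_used(lam):
        return sum(a[i] * y for i, y in enumerate(order_sizes(lam)))

    if space_used(0.0) <= A:        # unconstrained EOQs already fit
        return order_sizes(0.0)
    lo, hi = 0.0, 1.0
    while space_used(hi) > A:       # grow hi until the constraint is met
        hi *= 2
    while hi - lo > tol:            # bisect on the multiplier
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if space_used(mid) > A else (lo, mid)
    return order_sizes(hi)
```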
Material Requirements Planning (MRP)
• Any demand that originates outside the system is called independent demand.
• This includes all demand for final products and possibly some demand for components (e.g., when they are sold
as replacement parts).
• Dependent demand is demand for components that make up the independent demand products.
• MRP operates at the interface between independent and dependent demand.
• MRP is therefore called a push system since it computes schedules of what should be started (or pushed) into
production based on demand.
• This is in contrast to pull systems, such as Toyota’s kanban system, which authorize production as and when inventory is consumed.
• Assume Part A has been assigned a fixed order period (FOP) of 2 weeks.
• FOP implies that the firm places an order with the supplier for the supply of Part A at fixed time intervals.
• FOP helps generate the planned order receipts by using net requirements data.
• Also assume Part A has a lead time of 2 weeks.
• Lead time is the amount of time that passes between the placement of an order and its delivery.
• Lead time helps generate the planned order releases by using planned order receipts data.
• We review the previous concepts by solving a more involved problem shown above.
• Above is the master production schedule for part B.
• The master production schedule is the source of demand for the MRP system.
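A simplified sketch of this netting logic for a part with FOP = 2 weeks and lead time = 2 weeks, as assumed for Part A above. Scheduled receipts and safety stock are ignored, and the gross requirements and on-hand inventory are hypothetical.

```python
def mrp(gross_req, on_hand, fop=2, lead_time=2):
    """Gross requirements -> net requirements -> planned order receipts
    (batched over the fixed order period) -> planned order releases
    (offset backward by the lead time)."""
    n = len(gross_req)
    net, inventory = [], on_hand
    for g in gross_req:                        # net requirements
        net.append(max(g - inventory, 0))
        inventory = max(inventory - g, 0)
    receipts, t = [0] * n, 0
    while t < n:
        if net[t] > 0:                         # one order covers the whole FOP
            receipts[t] = sum(net[t:t + fop])
            t += fop
        else:
            t += 1
    releases = [0] * n
    for t, r in enumerate(receipts):           # offset by the lead time; orders
        if r > 0 and t >= lead_time:           # needed before week 1 are dropped
            releases[t - lead_time] = r        # in this sketch
    return net, receipts, releases

# Hypothetical 8-week schedule
print(mrp(gross_req=[10, 40, 10, 40, 10, 40, 10, 40], on_hand=30))
```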
Dynamic Lot Sizing
• Dynamic Lot Sizing is different from the static EOQ models studied so far.
• The demand per period, though deterministic, is dynamic, in that it varies from one period to the next, thus we
violate the assumption of constant demand.
• The main historical approach to relaxing the constant-demand assumption is the Wagner-Whitin model.
• The Wagner-Whitin model considers the problem of determining production lot sizes when demand is
deterministic but time-varying.
• All the other assumptions for the EOQ model are valid for the Wagner-Whitin model.
• When demand varies over time, a continuous-time model like the EOQ, which treats time as a continuous real line, is no longer appropriate.
• So we clump demand into discrete periods, which could correspond to days, weeks, or months, depending on the
system.
• This gives rise to tabular data.
• A daily production schedule might make more sense for a high-volume system with rapidly changing demand.
• A monthly schedule may be more adequate for a low-volume system with demand that changes more slowly.
• For simplicity, assume that setup costs 𝑨𝒕 , production costs 𝒄𝒕 , and holding costs 𝒉𝒕 are all constant over
time, although this is not necessary for the Wagner-Whitin model.
• Problem is to satisfy all demands at minimal cost, i.e., production plus setup plus holding cost.
• The only controls available to solve this problem are the production quantities 𝑄𝑡 .
• However, since all demands must be filled, only the timing of production is open to choice, not the total
production quantity.
• Hence if unit production cost is constant (that is, 𝑐𝑡 does not vary with 𝑡), then production cost will be the same
for all possible timings of production and therefore production cost can be ignored.
• Wagner–Whitin Property: Under an optimal lot-sizing policy, either the inventory carried to week 𝑡 + 1 from a
previous week will be zero or the production quantity in week 𝑡 + 1 will be zero.
• Why? Because either it is cheaper to produce all of week 𝑡 + 1’s demand in week 𝑡, or all of it in 𝑡 + 1; it is never
cheaper to produce some in each.
• If we produce items in week 𝑡 (and incur a setup cost) to satisfy demand in week 𝑡 + 1, then it cannot possibly be
economical to produce in week 𝑡 + 1 (and incur another setup cost).
• The Wagner-Whitin property implies that either 𝑄𝑡 = 0 or 𝑄𝑡 will be exactly enough to satisfy demand in the
current week plus some integer number of future weeks.
• The Wagner–Whitin algorithm starts with week 1 and finishes with week N.
• By the Wagner–Whitin property, we know that we will produce in a week only if the inventory carried to that
week is zero.
• For instance, in a 6-week problem, there are six possibilities for the amount we can produce in week 1, namely,
𝐷1 , 𝐷1 + 𝐷2 , 𝐷1 + 𝐷2 + 𝐷3 , … , 𝐷1 + 𝐷2 + 𝐷3 + 𝐷4 + 𝐷5 + 𝐷6 .
• If we choose to produce 𝐷1 + 𝐷2 , then inventory will run out in week 3 and so we will have to produce again in
week 3.
Planning Horizon Property
• Until step 4, we had to consider producing for week 4 in all weeks 1 through 4. But this is not always necessary.
• In the 4-week problem we saw that it is optimal to produce in week 4 for week 4.
• So let us now ask: Is it cheaper to produce for week 5 in week 3 than in week 4?
• Whether we produce for week 5 in week 3 or in week 4, the produced items must be held in inventory up to week 5; in both cases, the carrying cost from week 4 to week 5 is the same.
• So we only need to ask: is it cheaper to set up in week 3 and carry inventory from week 3 to week 4 than it is to
set up in week 4?
• But we already know the answer to this question from step 4: It is cheaper to set up in week 4.
• Therefore, it is unnecessary to consider producing in weeks 1, 2, and 3 for the demand in week 5.
• We need to consider only weeks 4 and 5.
• The blank spaces in the upper right-hand corner of this table are due to the planning horizon property.
• Without using this property, the entire upper-triangular part of the table would be filled.
• How to interpret the above table to solve the original dynamic lot sizing problem?
• The minimum total setup plus inventory carrying cost is read from the last column of the penultimate row: 𝑍10 = 580.
• The optimal lot sizes are determined from the 𝑗𝑡∗ values.
• Since 𝑗𝑡∗ represents the last week of production in a 𝑡-week problem, it is optimal to produce enough to cover the
demand from week 𝑗𝑡∗ through week 𝑡.
• Therefore, since 𝑗10∗ = 8, it is optimal to produce for weeks 8, 9, and 10 in week 8.
• Doing this leaves us with a 7-week problem.
• Since 𝑗7∗ = 4, it is optimal to produce for weeks 4, 5, 6, and 7 in week 4.
• This leaves us with a 3-week problem.
• Since 𝑗3∗ = 1, we should produce for weeks 1, 2, and 3 in week 1.
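The full procedure can be sketched as a forward dynamic program. For simplicity this version evaluates every candidate production week 𝑗 instead of pruning with the planning horizon property, and the demand data are hypothetical.

```python
def wagner_whitin(demand, A, h):
    """Wagner-Whitin forward DP with constant setup cost A and holding cost h.
    Returns the minimum total cost and the optimal lot sizes {week: quantity}."""
    n = len(demand)
    Z = [0.0] * (n + 1)            # Z[t] = minimum cost of the t-week problem
    j_star = [0] * (n + 1)         # last production week in the t-week problem
    for t in range(1, n + 1):
        best, best_j = float("inf"), t
        for j in range(1, t + 1):  # produce in week j for weeks j..t
            holding = sum(h * (k - j) * demand[k - 1] for k in range(j, t + 1))
            cost = Z[j - 1] + A + holding
            if cost < best:
                best, best_j = cost, j
        Z[t], j_star[t] = best, best_j
    lots, t = {}, n                # backtrack the optimal lot sizes
    while t > 0:
        j = j_star[t]
        lots[j] = sum(demand[j - 1:t])
        t = j - 1
    return Z[n], lots

# Example with hypothetical weekly demands
print(wagner_whitin([20, 50, 10, 50, 50, 10, 20, 40, 20, 30], A=100, h=1))
```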
Continuous Review Probabilistic Inventory Models
“Probabilitized” EOQ Model
• The critical period during the inventory cycle occurs between placing and receiving an order, because a shortage can occur then.
• Probabilitized EOQ Model seeks to maintain a constant buffer stock that will put a cap on the probability of shortage.
• Larger buffer stock results in lower shortage probability and vice versa.
• Assume that the demand per unit time is normally distributed with mean D and standard deviation 𝜎.
• Let 𝑥𝐿 denote the demand during lead time L.
• Then 𝑥𝐿 is also normally distributed, with mean 𝜇𝐿 = 𝐷𝐿 and standard deviation 𝜎𝐿 = 𝜎√𝐿 (the sum of 𝐿 independent per-period demands).
• The size of the buffer B is determined by requiring that the probability of shortage during lead time L be at most 𝛼.
• We can define a new random variable Z = (𝑥𝐿 − 𝜇𝐿)/𝜎𝐿, which is clearly normally distributed with mean 0 and standard deviation 1.
• We ask a simple question based on the standard normal density: for what value of 𝑧 is the probability of 𝑍 being greater than that value exactly equal to 𝛼?
• The answer is 𝐾𝛼, and there are standard tables that tabulate 𝐾𝛼.
• For values of 𝑧 greater than 𝐾𝛼, the tail area under the curve only shrinks further.
• Requiring P(𝑥𝐿 ≥ 𝜇𝐿 + 𝐵) ≤ 𝛼 therefore gives the buffer size 𝐵 = 𝐾𝛼𝜎𝐿, and the reorder point becomes 𝑅 = 𝐿𝐷 + 𝐵.
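A minimal sketch of the buffer computation, with norm.ppf standing in for the 𝐾𝛼 tables; the numeric inputs are hypothetical.

```python
from math import sqrt
from scipy.stats import norm

def buffer_stock(D, sigma, L, alpha):
    """Buffer size B and reorder point for the probabilitized EOQ model.
    Lead-time demand ~ Normal(mu_L = D*L, sigma_L = sigma*sqrt(L))."""
    mu_L, sigma_L = D * L, sigma * sqrt(L)
    K_alpha = norm.ppf(1 - alpha)        # P(Z >= K_alpha) = alpha
    B = K_alpha * sigma_L                # buffer stock
    return B, mu_L + B                   # reorder point R = L*D + B

print(buffer_stock(D=100, sigma=10, L=2, alpha=0.05))
```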
Probabilistic EOQ Model
• The inventory policy is to order the quantity 𝑦 whenever the amount of inventory on hand drops to level R.
• Reorder level R is a function of the lead time between placing and receiving an order.
• Optimal values of 𝑦 and R are determined by minimizing the expected sum of setup, holding, and shortage costs per unit
time.
• The optimal values of 𝑦 ∗ and 𝑅∗ cannot be determined in closed form.
• We use an iterative algorithm developed by Hadley and Whitin which is given below.
We want 𝑦∗ to be as small as possible but incur no shortage. Setting 𝑆 = 0, where 𝑆 = ∫𝑅^∞ (𝑥 − 𝑅)𝑓(𝑥) d𝑥 denotes the expected shortage per cycle, we get the initial solution 𝑦1 = √(2𝐾𝐷/ℎ).
Each subsequent iteration solves the reorder-point equation ∫𝑅𝑖^∞ 𝑓(𝑥) d𝑥 = ℎ𝑦𝑖/(𝑝𝐷) for 𝑅𝑖, recomputes 𝑆, and updates 𝑦𝑖+1 = √(2𝐷(𝐾 + 𝑝𝑆)/ℎ), stopping when 𝑦 and 𝑅 converge.
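A minimal sketch of this iteration for normally distributed lead-time demand (mean mu_L, standard deviation sigma_L); the closed-form expected-shortage expression is the standard normal loss function, and all numeric inputs are hypothetical.

```python
from math import sqrt
from scipy.stats import norm

def hadley_whitin(K, h, p, D, mu_L, sigma_L, tol=1e-6, max_iter=100):
    """Iterate between y_i = sqrt(2 D (K + p S) / h) and the reorder
    equation 1 - F(R_i) = h y_i / (p D). Returns (y*, R*)."""
    def expected_shortage(R):            # S(R) = E[(x_L - R)+] for normal demand
        z = (R - mu_L) / sigma_L
        return sigma_L * (norm.pdf(z) - z * (1 - norm.cdf(z)))

    y = sqrt(2 * K * D / h)              # initial solution with S = 0
    R = norm.ppf(1 - h * y / (p * D), loc=mu_L, scale=sigma_L)
    for _ in range(max_iter):
        S = expected_shortage(R)
        y_new = sqrt(2 * D * (K + p * S) / h)
        R_new = norm.ppf(1 - h * y_new / (p * D), loc=mu_L, scale=sigma_L)
        if abs(y_new - y) < tol and abs(R_new - R) < tol:
            break
        y, R = y_new, R_new
    return y, R

print(hadley_whitin(K=100, h=2, p=10, D=1000, mu_L=100, sigma_L=20))
```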
NEWSVENDOR PROBLEM
• The classical newsvendor model concerns a single perishable item with stochastic demand and deterministic costs.
• Consider a planning horizon of a single time period.
• The goal is to minimize the sum of expected holding and shortage costs.
• The order of size 𝑦 must be placed before the demand is observed.
• Because the item is perishable, any units left over at the end of the period cannot be carried over to a later period.
• Let 𝑐 denote the production cost per unit, 𝑝 the penalty cost per unit of unsatisfied demand, and ℎ the holding cost per unit left over.
• Demand is a random variable 𝑋 with pdf 𝑓, so that P[𝑎 ≤ 𝑋 ≤ 𝑏] = ∫𝑎^𝑏 𝑓(𝑥) d𝑥.
• Let 𝐹 denote the cdf: 𝐹(𝑎) = P[𝑋 ≤ 𝑎].
Case 1: No setup cost
• The total expected cost is the sum of the expected production, holding, and shortage costs.
• If we order 𝑦 units, the number left over is max(𝑦 − 𝑋, 0) and the number short is max(𝑋 − 𝑦, 0); check that at most one of these can be positive.
• E.H.C. (expected holding cost) = E[ℎ · max(𝑦 − 𝑋, 0)].
• E.S.C. (expected shortage cost) = E[𝑝 · max(𝑋 − 𝑦, 0)].
• T.E.P.C. (total expected production cost) = E.P.C. + E.S.C. + E.H.C., where E.P.C. = 𝑐𝑦.
• We seek to minimize
G(𝑦) = 𝑐𝑦 + 𝑝 ∫𝑦^∞ (𝑥 − 𝑦)𝑓(𝑥) d𝑥 + ℎ ∫0^𝑦 (𝑦 − 𝑥)𝑓(𝑥) d𝑥,
using the fact from elementary probability theory that E[𝑔(𝑋)] = ∫ 𝑔(𝑥)𝑓(𝑥) d𝑥 for a continuous random variable 𝑋 with pdf 𝑓.
• To differentiate the objective function with respect to 𝑦, recall the Leibniz rule for differentiation under the integral sign from elementary calculus:
d/d𝑧 ∫𝑎(𝑧)^𝑏(𝑧) 𝑔(𝑥, 𝑧) d𝑥 = 𝑔(𝑏(𝑧), 𝑧) 𝑏′(𝑧) − 𝑔(𝑎(𝑧), 𝑧) 𝑎′(𝑧) + ∫𝑎(𝑧)^𝑏(𝑧) ∂𝑔(𝑥, 𝑧)/∂𝑧 d𝑥
• Differentiating term by term (the boundary terms vanish because both integrands are zero at 𝑥 = 𝑦):
G′(𝑦) = 𝑐 − 𝑝 ∫𝑦^∞ 𝑓(𝑥) d𝑥 + ℎ ∫0^𝑦 𝑓(𝑥) d𝑥 = 𝑐 − 𝑝(1 − 𝐹(𝑦)) + ℎ𝐹(𝑦)
• Setting G′(𝑦) = 0 gives the critical-fractile condition
𝐹(𝑦∗) = (𝑝 − 𝑐)/(𝑝 + ℎ)
• Since G″(𝑦) = (𝑝 + ℎ)𝑓(𝑦) ≥ 0, the objective is convex, so the stationary point is the global minimizer.
• If the demand distribution 𝐹 is known, then computing 𝑦∗ is easy: invert the cdf at the fractile (𝑝 − 𝑐)/(𝑝 + ℎ).
Case 2: With setup cost
• Let 𝐾 denote the fixed setup cost of a production run for the single perishable item.
• Now T.E.P.C. = 𝐾 + 𝑐𝑦 + E[𝑝 · max(𝑋 − 𝑦, 0)] + E[ℎ · max(𝑦 − 𝑋, 0)].
• Differentiating gives the same first-order condition 𝑐 + ℎ𝐹(𝑦) − 𝑝(1 − 𝐹(𝑦)) = 0, because the constant 𝐾 drops out; the optimal order quantity is unchanged.
• Instead of minimizing costs, we can maximize profits.
• Total expected profit is the total expected revenue from selling the perishable item minus the production, setup, and holding costs.
• Maximizing expected profit is equivalent to minimizing the expected cost objective above, since the revenue lost on unmet demand can be absorbed into the shortage penalty 𝑝.
Nonstationary demands
• Now suppose nonstationary demands 𝑋𝑖 are encountered over 𝑛 time steps, with an order of size 𝑦𝑖 placed in step 𝑖.
• Treating the demands 𝑋𝑖 as given data and letting δ𝑖 indicate whether an order is placed in step 𝑖, we obtain the following LP formulation:
min Σ𝑖 (𝐾δ𝑖 + 𝑐𝑦𝑖 + 𝑝𝑆𝑖 + ℎ𝐻𝑖)
s.t. 𝑆𝑖 ≥ 𝑋𝑖 − 𝑦𝑖, 𝑆𝑖 ≥ 0, 𝑖 = 1, 2, …, 𝑛
𝐻𝑖 ≥ 𝑦𝑖 − 𝑋𝑖, 𝐻𝑖 ≥ 0, 𝑖 = 1, 2, …, 𝑛
• Here the auxiliary variables 𝑆𝑖 and 𝐻𝑖 stand in for the shortage max(𝑋𝑖 − 𝑦𝑖, 0) and the leftover max(𝑦𝑖 − 𝑋𝑖, 0).
• This linearization achieves the goal because 𝑝 and ℎ are positive: at the optimum, minimization pushes 𝑆𝑖 down to the larger of 𝑋𝑖 − 𝑦𝑖 and 0, which is precisely max(𝑋𝑖 − 𝑦𝑖, 0), and similarly for 𝐻𝑖.
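A minimal sketch of the critical-fractile solution (assuming 𝑝 > 𝑐, so that ordering is worthwhile); the demand distribution and cost figures are hypothetical.

```python
from scipy.stats import norm

def newsvendor_order(c, p, h, demand_dist):
    """Optimal single-period order y* from F(y*) = (p - c) / (p + h)."""
    fractile = (p - c) / (p + h)
    return demand_dist.ppf(fractile)     # invert the cdf at the fractile

# Hypothetical data: c = 5, p = 15, h = 2, demand ~ Normal(100, 20)
print(newsvendor_order(c=5, p=15, h=2, demand_dist=norm(loc=100, scale=20)))
```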
Multi-Agent Decision-Making under Uncertainty
• Game theory is widely considered a misnomer for the field of “Multi-Agent Decision-Making under Uncertainty.”
• Scenarios involving multiple agents where each makes decisions under uncertainty are modeled using “games.”
• A game is a collection of the following: the set of players, the choices (strategies or actions) available to each player, the possible outcomes, and the players’ preferences over those outcomes.
• In non-cooperative games, each player is focused on their own interests.
• In non-cooperative games, players may be in competition with each other, or their interests may
align, but there are no binding agreements made between players regarding their actions or choices.
• They may make their choices simultaneously or make choices in sequence.
• In cooperative games, players arrive at a binding agreement regarding their actions.
• In these games, players are generally in full communication with each other and have mechanisms to
assure implementation of any agreements made.
• The common goal of a cooperative game is to find a socially optimal outcome: one that collectively
optimizes the outcomes for individual players.
• We will not be able to cover this topic despite its wide application in real-world scenarios.
• Utility functions provide a mechanism for modeling player choices among the different outcomes in a
game. The two main classes of utility functions are ordinal and von Neumann-Morgenstern utilities.
• We shall again not be able to study the theory behind these utility functions in detail. However, it must
be remembered that often constructing such utilities is the most crucial part of such decision-making
problems.
• A strategic game consists of the following: a set of players, a set of strategies for each player, and a utility (payoff) function for each player defined over strategy profiles.
• The combination of all players’ chosen strategies is called a strategy profile.
• A strategy profile determines the outcome, which in turn determines each player’s utility.
• Strategic games are non-cooperative since there are no binding agreements for how players must act.
Prisoner’s Dilemma
• Two suspects are taken into custody by the police for a crime. They are put into separate rooms for
interrogation.
• Evidence against them is slim, so the police need one or both to confess to the crime.
• Therefore, each suspect is informed that if they confess to the crime and implicate the other as the principal culprit, then the confessor will get the lightest possible sentence and the principal culprit will get the heaviest possible sentence.
• They both will be convicted of a misdemeanor crime if neither confesses.
• If both confess, they each get moderate sentences as the police cannot identify the principal culprit.
• We will call the two players in this strategic game Row and Column as per game-theoretic
conventions.
• They each have two strategies available to them, Quiet and Confess.
• Assume that each suspect is primarily concerned about their own sentence and wants to minimize it.
• Utilities of each player are generally referred to as payoffs in game-theoretic texts.
• We use the utility function “6 minus the number of years to be spent in prison,” which serves as a proxy for how many years a suspect avoids spending in prison.
• Table below lists each of the strategy profiles in the form (R, C) and the resulting outcome.
Strategy Profile        Outcome
(Quiet, Quiet)          Each suspect receives a three-year sentence.
(Quiet, Confess)        R receives a six-year sentence and C a one-year sentence.
(Confess, Quiet)        R receives a one-year sentence and C a six-year sentence.
(Confess, Confess)      Each suspect receives a five-year sentence.
• Table below provides payoffs for each player.
Strategy Profile        R’s Payoff    C’s Payoff
(Quiet, Quiet)          3             3
(Quiet, Confess)        0             5
(Confess, Quiet)        5             0
(Confess, Confess)      1             1
• Define 𝑠−𝑖 = (𝑠1 , … , 𝑠𝑖−1 , 𝑠𝑖+1 , … , 𝑠𝑛 ) to be the strategy profile 𝑠 with player 𝑖’s strategy removed.
• Define 𝑢𝑖 (𝑡𝑖 , 𝑠−𝑖 ) to be the value of the utility function to player 𝑖 of the strategy profile with
𝑠𝑖 removed and replaced by 𝑡𝑖 .
• Definition. Player 𝑖’s strategy 𝑠𝑖 is a best response to the profile 𝑠−𝑖 of other player strategies if
𝑢𝑖 (𝑠𝑖 , 𝑠−𝑖 ) ≥ 𝑢𝑖 (𝑡𝑖 , 𝑠−𝑖 ) for all other strategies 𝑡𝑖 ∈ 𝑆𝑖 where 𝑆𝑖 denotes all the strategies of player 𝑖.
• Confess is the best response strategy for R if C’s strategy is set to Quiet.
• Confess is also a best response for R if C’s strategy is set to Confess.
• Definition. Player 𝑖’s strategy 𝑠𝑖 strongly dominates player 𝑖’s strategy 𝑡𝑖 if 𝑢𝑖 (𝑠𝑖 , 𝑠−𝑖 ) > 𝑢𝑖 (𝑡𝑖 , 𝑠−𝑖 ) for all strategy profiles 𝑠−𝑖 ∈ 𝑆−𝑖 = 𝑆1 × 𝑆2 × ⋯ × 𝑆𝑖−1 × 𝑆𝑖+1 × ⋯ × 𝑆𝑛 available to the remaining players.
• Definition. Player 𝑖’s strategy 𝑠𝑖 dominates player 𝑖’s strategy 𝑡𝑖 if 𝑢𝑖 (𝑠𝑖 , 𝑠−𝑖 ) ≥ 𝑢𝑖 (𝑡𝑖 , 𝑠−𝑖 ) for all
strategy profiles 𝑠−𝑖 ∈ 𝑆−𝑖 , and strict inequality holds for at least one strategy profile 𝑠−𝑖 ∈ 𝑆−𝑖 .
• The strategy Confess strongly dominates the strategy Quiet for R because R’s best response is the
same regardless of C’s choice of strategy.
• Definition. A strategy is strongly dominant [resp., dominant] for player 𝑖 if it strongly dominates
[resp., dominates] all other strategies for player 𝑖.
• When strategy 𝑠𝑖 strongly dominates strategy 𝑡𝑖 , player 𝑖 should select strategy 𝑠𝑖 over strategy 𝑡𝑖
unless the strategy selections by the other players result in player 𝑖 obtaining the same utility for
either.
• Confess strongly dominates Quiet for R and by symmetry, Confess also strongly dominates Quiet for
C.
• Thus, both players select Confess resulting in payoffs of 1 for each player, which we denote with the
payoff pair (1,1).
• Knowing that this was the thinking of the other suspect, neither regrets nor second-guesses their own decision to confess.
• Such a regret-free strategy profile is known as a Nash equilibrium.
• Definition. A strategy profile 𝑠 is a Nash equilibrium if 𝑢𝑗 (𝑠) ≥ 𝑢𝑗 (𝑡𝑗 , 𝑠−𝑗 ) for all players j ∈ 𝑁 and
all strategies 𝑡𝑗 ∈ 𝑆𝑗 available to that player.
• That is, 𝑠 is a Nash equilibrium if, given what the other players have chosen to do, 𝑠−𝑗 , each player 𝑗
cannot unilaterally improve their payoff by replacing their current strategy, 𝑠𝑗 , with a new strategy,
𝑡𝑗 .
• Thus no player has regrets about their strategy selection in a Nash equilibrium.
• A solution concept is simply a formal rule for predicting how a game will be played.
• We introduced two solution concepts above, viz. dominance and Nash equilibria.
• Now we look at a third solution concept.
• A player might decide to think prudentially and so chooses a strategy by looking at the worst thing
that can happen with each strategy choice, and then choosing the strategy that makes the worst case
as “least bad” as possible.
• So the player would choose a strategy that maximizes their minimum payoff with respect to the
strategy choices of other players.
• For example, if R chooses Quiet, the worst that can happen is a payoff of 0 when C chooses Confess.
• On the other hand, R’s worst payoff if she chooses Confess is 1.
• This suggests that R should choose Confess if she is strategically risk averse (= prudential).
• Definition. Player 𝑖’s strategy 𝑠𝑖 is prudential if min_{𝑠−𝑖 ∈ 𝑆−𝑖} 𝑢𝑖 (𝑠𝑖 , 𝑠−𝑖 ) ≥ min_{𝑠−𝑖 ∈ 𝑆−𝑖} 𝑢𝑖 (𝑡𝑖 , 𝑠−𝑖 ) for all strategies 𝑡𝑖 ∈ 𝑆𝑖 .
• The value min_{𝑠−𝑖 ∈ 𝑆−𝑖} 𝑢𝑖 (𝑠𝑖 , 𝑠−𝑖 ) for a prudential strategy 𝑠𝑖 is called the security level for player 𝑖.
• Confess is the unique dominant and unique prudential strategy for each player. Verify!
• Although (Confess, Confess) is the unique Nash equilibrium, the two players in the Prisoner’s
Dilemma strategy game would be better off if they both chose Quiet, resulting in the payoff pair (3,3)
instead of the payoff pair (1,1).
• Thus, the three solution methods do not always yield the best overall payoff for each player.
• Therefore, we introduce the fourth solution concept.
• Definition. A strategy profile 𝑠, and its associated outcome 𝑜, are efficient if there does not exist a
strategy profile t ∈ 𝑆 such that 𝑢𝑗 (𝑡) ≥ 𝑢𝑗 (𝑠) for all players 𝑗, with at least one of the inequalities
being strict.
• So a strategy profile is efficient if we cannot find another strategy profile that at least maintains the
utility for all players, while strictly improving the utility for at least one player.
• For the Prisoner’s Dilemma strategic game, the strategy profile (Confess, Confess) is not efficient
because both players obtain larger utilities with (Quiet, Quiet).
• Each of the other three strategy profiles is efficient because it is impossible to make a change without reducing at least one player’s payoff.
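These checks are easy to automate. The sketch below enumerates all four strategy profiles of the Prisoner’s Dilemma, using the payoff table above, and flags the Nash equilibria and the efficient profiles.

```python
from itertools import product

Q, C = "Quiet", "Confess"
strategies = (Q, C)
# payoffs[(R's strategy, C's strategy)] = (R's payoff, C's payoff)
payoffs = {(Q, Q): (3, 3), (Q, C): (0, 5), (C, Q): (5, 0), (C, C): (1, 1)}

def is_nash(profile):
    """No player can unilaterally improve their own payoff."""
    for i in range(2):
        for t in strategies:
            deviation = list(profile); deviation[i] = t
            if payoffs[tuple(deviation)][i] > payoffs[profile][i]:
                return False
    return True

def is_efficient(profile):
    """No other profile is at least as good for all and better for one."""
    b = payoffs[profile]
    for other in product(strategies, repeat=2):
        a = payoffs[other]
        if all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b)):
            return False
    return True

for s in product(strategies, repeat=2):
    print(s, "Nash" if is_nash(s) else "-", "efficient" if is_efficient(s) else "-")
```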
• As in the Prisoner’s Dilemma strategic game, players may not have an incentive to choose a strategy
that is part of an efficient strategy profile.
• However, there is no such dilemma when Nash equilibria are also efficient.
• In the Prisoner’s Dilemma strategic game, there is a tension between
1. choosing the dominant strategy (Confess), which will always yield a higher payoff regardless
of the other player’s choice, and
2. knowing that a better outcome might be possible if both players choose their dominated
strategy (Quiet).
• This tension is what puts the dilemma into the Prisoner’s Dilemma: each player selecting the logical,
rational strategy does not lead to an efficient outcome!
• One way to resolve this dilemma would be for the players to enter into a binding agreement to stay
quiet.
• But if we introduce binding agreements, then we no longer have a strategic game and Nash
equilibrium solution concept does not apply.
Office Scenario
• Suppose there is a shared coffee pot in an office and the employees voluntarily contribute to a pool of
money to replenish the supplies.
• Each employee who drinks the coffee must decide whether or not to contribute to the pool.
• Player strategies are Contribute or Not Contribute.
• Not Contribute is the strongly dominant strategy: a player saves money by not contributing regardless of what the others do, so it yields a strictly higher payoff than Contribute against every combination of the other players’ choices.
• But if everyone selects Not Contribute then there are no funds to buy coffee.
• So this is another example of a prisoner’s dilemma scenario.
• A multi-agent decision-making scenario is said to be a prisoner’s dilemma scenario if
❖it can be modeled as a strategic game
❖there is a single strategy for each player that strongly dominates all of that player’s other strategies
❖but all players would receive a higher payoff if they choose a specific dominated, rather than the
dominant, strategy.
• Since the mutual benefit result requires all players to cooperate by choosing the dominated strategy, it
is often called the Cooperate strategy.
• Since there is always an incentive for any individual player to switch their choice to the dominant
strategy, the dominant strategy is often called the Defect strategy.
• In the original prisoner’s dilemma scenario, Quiet is the Cooperate strategy.
• In the Office Coffee scenario, Contribute is the Cooperate strategy.
• In the original prisoner’s dilemma scenario, Confess is the Defect strategy.
• In the Office Coffee scenario, Not Contribute is the Defect strategy.
• Assume a prisoner’s dilemma scenario involving exactly two players and each player has exactly two
strategies such that their payoffs are as described below
• The payoff to player 𝑖 for cooperating when the other player is defecting is 𝑆𝑖 .
• The payoff when both players defect is 𝑃𝑖 .
• The payoff when both cooperate is 𝑅𝑖 .
• The payoff to entice player 𝑖 to defect is 𝑇𝑖 .
• For each player 𝑖, the payoffs are assumed to satisfy 𝑇𝑖 > 𝑅𝑖 > 𝑃𝑖 > 𝑆𝑖 .
• However, there need not be any relationship between the two sequences of payoffs for the two players.
• Given this ordering of the payoffs, let us verify that Defect is the strongly dominant strategy for each
player.
• To do this it is easier to write the payoffs in terms of a matrix.
                        Player 2: Cooperate    Player 2: Defect
Player 1: Cooperate     (𝑅1 , 𝑅2 )             (𝑆1 , 𝑇2 )
Player 1: Defect        (𝑇1 , 𝑆2 )             (𝑃1 , 𝑃2 )
• Let us fix a player, say player 1. We have to show that Defect is better than Cooperate.
• So we have to show this for all possible strategy combinations for other players.
• Luckily, here we only have one more player to consider.
• If player 2 is assigned Cooperate, then 𝑇1 > 𝑅1 which implies that Defect is indeed better than
Cooperate for player 1.
• If player 2 is assigned Defect, then 𝑃1 > 𝑆1 which implies that Defect is indeed better than Cooperate
for player 1.
• We repeat this same exercise for player 2, and then establish that “Defect is the strongly dominant
strategy for each player.”
• We will also show that Defect is the prudential strategy for each player.
• Now let us fix a player, say 1.
• Pick a strategy for it, say Defect. Fixing these two choices what is the minimum utility that player 1 can
achieve? It is 𝑃1 .
• Pick another strategy for player 1, viz. Cooperate. What is the minimum utility that player 1 can now
achieve? It is 𝑆1 .
• Since 𝑃1 > 𝑆1 , Defect is the prudential strategy for player 1. Similar reasoning holds for player 2.
• The strategy profile (Defect, Defect) is the unique Nash equilibrium because no player would want to
switch as long as the other player keeps his strategy fixed.
• But observe that each player would be better off if the strategy profile (Cooperate, Cooperate) were
chosen instead of the strategy profile (Defect, Defect).
• Thus, Nash equilibrium is not the final word on the most intelligent strategy to adopt.
• (Defect, Defect) is the only strategy profile that is not efficient.
• All other strategy profiles can be easily verified to be efficient.
Economic Mergers
• Consider a continent made up of three independent nations Alpha, Beta and Gamma whose leaders are
making the decision to merge for economic benefit or remain independent.
• Assume here that a driving factor that leads to countries merging is the economic benefit.
• A compelling reason to remain separated is to maintain independent decision making.
• We model each nation as a single player.
• Note that real nations do not make decisions like a single unified entity, but through deliberation among several institutions and policymakers.
• For the sake of simplicity, we restrict ourselves to a simpler model to handle.
• This is called a fidelity problem. Should we model a system at coarse or fine fidelity?
• It depends on how much time you have for the analysis and how skilled your team of modelers and analysts is.
• Thus, the players in this game are the three nations Alpha, Beta and Gamma, each with two strategy
choices (Merge or Remain).
• Assume that the nations’ leaders meet to peacefully decide on the economic merger.
• Assume that the best outcome economically is for all three groups to merge.
• But if they merge, they will each have to give up some independence.
• The next best option economically is for all three to choose independence, since the mutually agreed upon
independence would allow for some level of economic cooperation among the independent nations.
• If two nations elect to remain independent and the third to merge, then the three nations continue to
operate separately without any economic agreements benefiting any of them.
• If two groups elect to merge and third to remain independent, assume that the merger cannot happen, but
there is now a price to pay by the group that chose independence in the form of some economic hardship.
• There are many ways to assign utilities and following is one way to do so.
• Alpha strongly prioritizes independence over economic benefits achieved through merging.
• Beta is indifferent between independence and merging for economic benefit.
• So given a choice between the two, Beta will do a coin toss or something purely random to decide.
• Gamma strongly prefers the economic gain over the benefits of independent decision making.
• All three would prefer to avoid the situation that involves economic hardship.
• There are total 8 strategy profiles which will be ranked in some sense by prescribing utilities for each
player.
• Gamma most prefers (Merge, Merge, Merge) because they value economic gain over independence, and
so we assign the highest payoff of 8.
• The next highest outcomes for Gamma are the tense situations in which Alpha and Beta lose the most
because Gamma then has some economic gain.
• But Gamma is indifferent between these two outcomes, so each is assigned the average of 7 and 6, that is, 6.5.
• Each least prefers a tense situation in which they lose the most so Gamma assigns the least payoff of 1 to
(Merge, Merge, Remain).
• Next best strategy profile for Gamma after (Merge, Remain, Merge) and (Remain, Merge, Merge) will be
(Remain, Remain, Remain) because at least there is some economic gain there and Gamma is strongly
inclined towards making economic gains.
• The remaining 3 strategy profiles are essentially the same outcome-wise but Gamma will prefer that
strategy profile where he chooses to Merge because that was his goal all throughout.
• So Gamma assigns a payoff of 4 to (Remain, Remain, Merge).
• Gamma remains indifferent between (Merge, Remain, Remain) and (Remain, Merge, Remain), so both get the average of 3 and 2, that is, 2.5, as their payoff.
• Alpha will most prefer (Remain, Remain, Remain) and so assigns it the highest payoff of 8.
• Alpha’s next best preferences are where one of the other two chooses to Merge and the other choose to
Remain. Then Alpha can vote Remain, thus retaining independence with no economic loss.
• So Alpha will be indifferent between (Remain, Merge, Remain) and (Remain, Remain, Merge), and
assigns average of 7 and 6 to both.
• The only other strategy profile which allows Alpha to retain independence is (Merge, Remain, Remain)
and so gets a payoff of 5.
• (Remain, Merge, Merge) is the least preferred strategy profile for Alpha for obvious reasons and so gets
the least payoff of 1.
• Among the 3 remaining strategy profiles, (Merge, Merge, Merge) is the best because it only results in
some loss of independence and avoids economic instability, so it gets a payoff of 4.
• Alpha is indifferent between the last 2 remaining strategy profiles and gives both (Merge, Merge, Remain)
and (Merge, Remain, Merge) average payoff of 3 and 2.
• Beta will least prefer (Merge, Remain, Merge) and so assigns it the least payoff of 1.
• The only strategy profiles that yield Beta a gain are (Merge, Merge, Merge) and (Remain, Remain,
Remain).
• But Beta is indifferent between the two based on the assumption about Beta’s behavior.
• So Beta assigns them average payoff of 8 and 7.
• Now we made an assumption that everyone dislikes economic instability.
• So Beta’s next favorite strategy profiles are clearly (Merge, Remain, Remain), (Remain, Merge, Remain),
and (Remain, Remain, Merge).
• However, Beta would slightly dislike (Remain, Merge, Remain) because everyone is going against Beta’s
choice.
• So Beta prefers (Merge, Remain, Remain) and (Remain, Remain, Merge).
• But it will be indifferent between these two and so assigns them the average payoff of 6 and 5.
• Then (Remain, Merge, Remain) gets a payoff of 4.
• Among the last two remaining strategy profiles, Beta again remains indifferent and assigns both of them
an average payoff of 3 and 2.
• This completes the assignment of payoffs of each player to different outcomes of the strategic game.
• Throughout the table, payoffs that correspond to best response strategies are boxed.
• Fix any player. Let us choose Alpha.
• Now what are the strategies that Beta and Gamma may adopt against Alpha? Simple combinatorics shows
4 possibilities: (Beta: Merge, Gamma: Remain), (Merge, Merge), (Remain, Remain), (Remain, Merge).
• Suppose both Beta and Gamma choose the strategy Merge.
• If Alpha chooses the strategy Merge he gets a payoff of 4.
• If Alpha chooses the strategy Remain he gets a payoff of 1.
• Thus, Merge is a best response by Alpha to Beta and Gamma both choosing Merge.
• This is shown by boxing 4 in the first row of Alpha’s column and not boxing 1 in the fourth row of
Alpha’s column.
• Using similar reasoning, it is straightforward to work out all the logic behind all the 12 boxed payoffs.
• A strategy profile in which all three strategies are best responses is a Nash equilibrium.
• This is because for a strategy profile to be a Nash equilibrium, all players have to prefer sticking to their
designated strategy as long as all other players are sticking to their designated strategies.
• Then that row which has all of its payoffs boxed represents a Nash equilibrium.
• Thus, there are two Nash equilibria: all three groups choose Merge or all three groups choose Remain.
• Merge is the unique best response for Alpha when both Beta and Gamma choose Merge.
• Remain is the unique best response for Alpha when both Beta and Gamma choose Remain.
• For a strategy to be dominant, it should dominate all other strategies for that player.
• But based on the above observation, it is obvious that Alpha does not have a dominant strategy.
• We can show by similar reasoning that neither Beta nor Gamma has a dominant strategy.
• Let us now turn to the prudential solution method.
• If Alpha chooses Merge, its worst payoff is 2.5, which occurs when exactly one of Beta or Gamma also
selects Merge.
• If Alpha chooses Remain, its worst payoff is 1, which occurs when both Beta and Gamma select Merge.
• Since 2.5 > 1, Alpha’s prudential strategy is Merge.
• If Beta chooses Merge, its worst payoff is 2.5.
• If Beta chooses Remain, its worst payoff is 1. Since 2.5 > 1, Beta’s prudential strategy is Merge.
• If Gamma chooses Merge, its worst payoff is 4 .
• If Gamma chooses Remain, its worst payoff is 1. Since 4 > 1, Gamma’s prudential strategy is Merge.
• So each player’s prudential strategy is Merge.
• Thus, (Merge, Merge, Merge) represents the strategy profile corresponding to prudential strategies.
• (Merge, Merge, Merge) is also efficient because any outcome change will result in Gamma receiving a
smaller payoff as it already has its maximum payoff (8) for this strategy profile and we know that for a
strategy profile to be efficient, there must be no other strategy profile that can provide a larger payoff to
Gamma or any other player.
• The Nash equilibrium (Remain, Remain, Remain) is also efficient because any outcome change will result
in Alpha receiving a smaller payoff.
• None of the remaining strategy profiles is efficient because either (Merge, Merge, Merge) or (Remain,
Remain, Remain) is at least as good for each player and strictly better for at least one player. This is easy
to verify.
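This verification is mechanical; see the sketch below, where the payoff entries are assembled from the assignments described above (ties receive the average of the two tied ranks, e.g., 6.5 for the average of 7 and 6).

```python
from itertools import product

M, R = "Merge", "Remain"
# payoffs[(Alpha, Beta, Gamma)] = (Alpha's, Beta's, Gamma's payoff)
payoffs = {
    (M, M, M): (4.0, 7.5, 8.0),
    (M, M, R): (2.5, 2.5, 1.0),
    (M, R, M): (2.5, 1.0, 6.5),
    (M, R, R): (5.0, 5.5, 2.5),
    (R, M, M): (1.0, 2.5, 6.5),
    (R, M, R): (6.5, 4.0, 2.5),
    (R, R, M): (6.5, 5.5, 4.0),
    (R, R, R): (8.0, 7.5, 5.0),
}

def is_nash(profile):
    for i in range(3):                       # each nation in turn
        for t in (M, R):                     # try a unilateral deviation
            deviation = list(profile); deviation[i] = t
            if payoffs[tuple(deviation)][i] > payoffs[profile][i]:
                return False
    return True

def is_efficient(profile):
    b = payoffs[profile]
    for other in product((M, R), repeat=3):
        a = payoffs[other]
        if all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b)):
            return False
    return True

for s in product((M, R), repeat=3):
    print(s, "Nash" if is_nash(s) else "-", "efficient" if is_efficient(s) else "-")
```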
• In summary
o no strategy is dominant
o no strategy is dominated
o (Merge, Merge, Merge) is the profile of prudential strategies
o (Merge, Merge, Merge) and (Remain, Remain, Remain) are both the Nash equilibria and the efficient
strategy profiles
• These 4 solution concepts predict the three nations will mutually choose to Merge or mutually choose to
Remain.
SEQUENTIAL GAMES
• Sequential games model scenarios in which the players act sequentially, usually taking turns according to
some rule.
• As with strategic games, each player has complete information about each other’s strategies and
preferences.
• A play of a sequential game consists of a sequence of actions taken by the players.
• The game begins with an empty sequence, known as the empty history and ends with a sequence called a
terminal history.
• Terminal histories determine the outcomes of the game, just like strategy profiles determine outcomes in
strategic games.
• Sub-sequences of play that are obtained along the way are known as non-terminal histories.
• At every non-terminal history, the assigned player must choose an action.
• To describe a strategy for a player in a sequential game, we must identify an action for every possible non-terminal history at which the player is assigned to act.
• This differs from strategies in strategic games which simply identified the single action each player would
take.
A sequential game consists of the following: a set of players, a set of terminal histories, a player function that assigns a player to act at each non-terminal history, and each player’s preferences (utilities) over the terminal histories.
Media Streaming Industry
• A company called PlumProducts is about to launch a media-streaming device for modern TVs and is
deciding how aggressive to be with its marketing options.
• Serious aggressive marketing costs more but is more likely to saturate the market quickly.
• A competing company called SeeTV is working on a clone of PlumProducts’ device which will require
one year to bring to market.
• SeeTV must decide whether to continue this research in order to enter the market.
• However, SeeTV can clone this product better and more cheaply if it can recruit some of PlumProducts’
engineers to work for SeeTV.
• Knowing this, PlumProducts can choose to include restraining clauses in its engineers’ contracts barring
them from working for competitors if they leave PlumProducts.
• However, these clauses then require PlumProducts to pay their engineers higher salaries in exchange.
• In this scenario, we consider PlumProducts and SeeTV to be the players.
• PlumProducts can choose to include or not include restraining clauses in its engineers’ contracts (Restrain
or Open).
• In either case, SeeTV will later choose to enter or not enter the market with its own product (In or Out).
• Finally, if SeeTV chooses to enter the market, PlumProducts will choose to be Aggressive or Passive with
its marketing.
• This chronology determines the possible terminal histories outlined below.
• The sequential nature of the game is nicely captured in the game tree below
• In the game tree, the shaded circles, called nodes, correspond to non-terminal histories.
• For example, the node labeled “D5” corresponds to the non-terminal history “Open, In.”
• The label above the D5 node indicates that it is PlumProducts’ turn to act.
• PlumProducts can choose to either be Aggressive or Passive, as indicated by the labels on the line
segments, called edges, emanating from the D5 node to the right.
• In the former case, the terminal history is “Open, In, Aggressive.”
• In the latter case, the terminal history is “Open, In, Passive.”
• The table below lists non-terminal histories and the player who acts at each subhistory.
• Assume that the market will support sales sufficient to generate 200 crores in net revenue.
• Restraining clauses will cost PlumProducts an additional 40 crores.
• Aggressive marketing costs PlumProducts an additional 25 crores.
• With the Aggressive option, PlumProducts obtains a 65% market share with the restraining clause
and a 55% share without it.
• When choosing the Passive option, PlumProducts only obtains a 50% market share.
• It costs SeeTV 40 crores to develop the product if it can hire PlumProducts’ engineers and 80 crores
if it cannot.
• The data in the above model is not to be taken as an accurate depiction of reality, but the model is flexible enough to handle a wide range of data.
• Assume that each company’s preference is to maximize its expected profit.
• The table below shows the payoffs to each company associated with each terminal history, and
these payoffs are also included as the terminal nodes in the game tree above.
• For PlumProducts, its payoff for the first terminal history is determined by finding 65% of 200
crores and then subtracting the additional 40 crores for the restraining clauses and 25 crores for the
aggressive marketing. SeeTV’s payoff for the first terminal history is found by subtracting its 80
crores development costs from 35% of 200 crores.
• For PlumProducts, its payoff for the second terminal history is determined by finding 50% of 200
crores and then subtracting the additional 40 crores for the restraining clauses. SeeTV’s payoff for
the second terminal history is found by subtracting its 80 crores development costs from 50% of
200 crores.
• For PlumProducts, its payoff for the third terminal history is determined by finding 100% of 200
crores and then subtracting the additional 40 crores for the restraining clauses. SeeTV’s payoff for
the third terminal history is obviously zero.
• For PlumProducts, its payoff for the fourth terminal history is determined by finding 55% of 200
crores and then subtracting 25 crores for the aggressive marketing. SeeTV’s payoff for the fourth
terminal history is found by subtracting its 40 crores development costs from 45% of 200 crores.
• For PlumProducts, its payoff for the fifth terminal history is determined by finding 50% of 200
crores. SeeTV’s payoff for the fifth terminal history is found by subtracting its 40 crores
development costs from 50% of 200 crores.
• For PlumProducts, its payoff for the last terminal history is determined by finding 100% of 200
crores. SeeTV’s payoff for the last terminal history is obviously zero.
• Now we describe each player’s strategies.
• To describe a strategy for SeeTV, select an action at both non-terminal histories viewed from top to bottom.
• One such strategy is (Out, In), which indicates that SeeTV will stay out of the market if PlumProducts
writes a restraining clause into its employee contracts but will enter into the market if PlumProducts does
not write a restraining clause into its employee contracts.
• SeeTV has three other strategies: (In, In), (In, Out), and (Out, Out).
• PlumProducts is assigned three non-terminal histories, so its strategies must describe what action to take at
each of them left to right and then top to bottom.
• Thus the strategy (Restrain, Aggressive, Passive) indicates that PlumProducts should include the
restraining clause and if SeeTV enters the market, employ an aggressive marketing plan.
• The Passive part of this strategy gives an action for decision point D5 even though choosing Restrain in the
game tree indicates PlumProducts will never arrive at D5.
• We still define the last part of the strategy for the sake of completeness because the definition of strategy
requires us to identify an action for each decision point.
• By assigning one of two actions to each of the 3 decision points, we see that PlumProducts has a total of
eight strategies.
• We can construct a strategic game model for this scenario based on the strategies that we have identified for each
player as shown in table below.
• The best responses by PlumProducts and SeeTV to each other’s strategy choices are boxed.
• We identify four Nash equilibria whose both payoffs are boxed as best response payoffs.
• Modeling it as a strategic game is unsatisfying because it does not capture the sequential nature of the scenario.
• Therefore, we present a sequential analysis for this scenario which has become a staple technique for all kinds of
decision-making problems in industry, robotics and various other domains.
• The first step of such a sequential analysis is always to begin at the end of the game tree.
• PlumProducts will select Aggressive at node D4 because the payoff of 65 with Aggressive is larger than the payoff of 60 with Passive.
• PlumProducts will select Passive at node D5 because payoff of 100 with Passive > payoff of 85 with Aggressive.
• This is indicated by making the edges thicker and labeling the nodes with the payoff vectors.
• Given this information, SeeTV should select Out at node D2 because the payoff of 0 with Out is larger
than the payoff of −10 with In.
• Similarly, SeeTV should select In at node D3 because the payoff of 60 with In is larger than the payoff
of 0 with Out.
• And now, anticipating SeeTV’s strategy, PlumProducts should select Restrain at node D1 because the
payoff of 160 with Restrain is larger than the payoff 100 with Open.
• Thus, the strategy pair ((Restrain, Aggressive, Passive), (Out, In)) is the strategy profile that should result
from players attempting to maximize their payoffs.
• Note that this is one of the Nash equilibria from the strategic game version.
Game tree & Backward Induction
Algorithm
• A game tree is a visual representation of a sequential game.
• It displays all ways that a game could be played.
• Each internal node in the tree represents a non-terminal history.
• Each internal node is labeled to indicate which player is uniquely able to move from that position.
• Each of the player’s legal moves at a given node is indicated by an edge leading from that node to
another node.
• There is an identified root node which is the initial game position.
• The terminal nodes are labeled with the outcomes and payoffs.
• BIA: Label all of the terminal nodes of the game tree with the payoff vectors.
• Select any unlabeled node which has all of its directed edges extending to nodes labeled with payoff
vectors.
• The unlabeled node is referred to as the parent and the labeled nodes as children.
• For each parent, select a child with the largest payoff for the player assigned to the parent node.
• Mark the edge from the parent to this child and label the parent with the payoff vector from the selected
child.
• Repeat this process until all nodes are labeled.
• The marked edges for each player describe their backward induction strategy.
• The path of marked edges starting at the root leads to a backward induction outcome.
• The backward induction strategy profile is a Nash equilibrium.
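A minimal sketch of the algorithm on the PlumProducts/SeeTV game tree, with payoffs in crores as computed earlier; the tree encoding is a convenience of this sketch.

```python
PLAYERS = {"PlumProducts": 0, "SeeTV": 1}

# Internal nodes are (name, player, {action: child}); leaves are payoff
# tuples (PlumProducts, SeeTV).
tree = ("D1", "PlumProducts", {
    "Restrain": ("D2", "SeeTV", {
        "In": ("D4", "PlumProducts", {"Aggressive": (65, -10), "Passive": (60, 20)}),
        "Out": (160, 0)}),
    "Open": ("D3", "SeeTV", {
        "In": ("D5", "PlumProducts", {"Aggressive": (85, 50), "Passive": (100, 60)}),
        "Out": (200, 0)}),
})

def backward_induction(node, plan):
    """Return the payoff vector backward induction assigns to node,
    recording the chosen action at every decision node in plan."""
    if len(node) == 2:                       # leaf: payoff vector
        return node
    name, player, children = node
    i = PLAYERS[player]
    best = max(children, key=lambda a: backward_induction(children[a], plan)[i])
    plan[name] = best
    return backward_induction(children[best], plan)

plan = {}
print(backward_induction(tree, plan))   # -> (160, 0)
print(plan)   # D4: Aggressive, D5: Passive, D2: Out, D3: In, D1: Restrain
```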
• Zermelo’s Theorem. Every sequential game with a finite number of nodes has a backward induction
solution, that is, a strategy profile and outcome.
• Proof: We prove by induction.
• Let 𝑛 denote the maximum number of steps needed to reach a terminal node from the root node.
• If 𝑛 = 1, then we know the backward induction solution: pick the action that yields the highest payoff.
• Induction hypothesis: Suppose that any game in which the maximum number of steps is at most (𝑛 − 1) has a backward induction solution.
• Now consider a game in which the maximum number of steps is 𝑛.
• At the parent of each terminal node in this game, select a child whose payoff vector maximizes the
payoff for the player who is assigned to the parent and assign this action to the player.
• Now consider the game in which this set of parent nodes becomes the new terminal nodes.
• This game tree has a maximum depth of (𝑛 − 1) and hence has a backward induction solution by
hypothesis.
Patent Race
• Consider a scenario in which companies R and S are competing to develop a new product for the market.
• The distance each is from completing the development can be measured in discrete steps, so at any point
we can quantify the number of steps 𝑛 left in the project.
• The maximum number of steps either company can take at once is 3.
• It costs 2 crores, 7 crores, and 15 crores to take 1, 2, or 3 steps, respectively, at one time.
• The first company to develop the product gets the patent, which is worth ₹20 crores.
• A company quits the competition when it is no longer profitable to play.
• The two companies alternate moves so that they can observe each other and manage their investment
funds.
• The empty history of this game corresponds to an ordered pair (𝑟, 𝑠) that indicates how many steps companies R and S, respectively, are away from applying for the patent.
• If company R has the first move in this game, then its first action is to select from moving to (𝑟, 𝑠), (𝑟 −
1, 𝑠), (𝑟 − 2, 𝑠), and (𝑟 − 3, 𝑠).
• Company S then has four similar actions at each of these new positions.
• Below is a partial game tree when R and S are 3 and 4 steps away from completion.
• Note that since an ordered pair may be reached in several different ways, several nodes may be labeled
with the same ordered pair.
• Nodes, and not ordered pairs, correspond to non-terminal histories.
• But it turns out that it suffices to perform backward induction based on the ordered pairs.
• Let the number of steps that R and S still need to take to reach completion be represented as the
coordinates of a point on the grid as shown below.
• Completion for company R means reaching the vertical axis and completion for company S means
reaching the horizontal axis.
• The goal of company R is to move left in the location space and that of company S is to move down.
• To begin the Backward Induction analysis, consider a position (𝑟, 𝑠) in which 𝑟 ≤ 3 and 𝑠 ≤ 3.
• This point can lie anywhere in the gray square.
• In this case, whichever company has the first move should take sufficient steps to end, and win, the game.
• Let us call this region a trigger zone, as the first mover has a winning strategy in this region.
• Now consider a position (𝑟, 𝑠) in which 𝑟 > 3 and 𝑠 ≤ 3.
• Company R cannot complete the research in one turn, and observes that company S can complete the
research on its next turn.
• Thus, R’s best option is to drop out of the patent race.
• This gives company S the freedom to complete the research in 𝑠 moves to minimize its cost
• If S makes two single-step moves, then it will cost less than one two-step move.
• Three single step moves will cost less than one three-step move.
• Such considerations are taken into account to find the optimal plan.
• Therefore the region described by 𝑟 > 3 and 𝑠 ≤ 3 is a safety zone for company S.
• Similar analysis shows that the region described by 𝑟 ≤ 3 and 𝑠 > 3 is a safety zone for company R.
• Now consider a position (𝑟, 𝑠) such that 3 < 𝑟 ≤ 5 and 3 < 𝑠 ≤ 5.
• Suppose that it is company R’s turn to move.
• R can move into its safety zone for a cost of at most 7 crores and then S will drop out because we assume
S does not have the means to know whether R has the budgetary flexibility to finish the game from its
safety zone.
• So R completes the research in three steps for an additional 6 crores, yielding a profit of at least 7 crores
since the patent is worth 20 crores.
• Thus, R will always win if it starts in this region. But the same analysis will not hold if R starts outside of
this region.
• To see this, let us consider the position (6, 5).
• Suppose it is company R’s turn to move, then R can make a three-step move costing it 15 crores and then
take three single step moves costing 6 crores to land the patent.
• But then R makes a loss of 1 crore which is undesirable. So R will not opt for this plan.
• Exact same analysis applies to company S.
• So we have identified this region (3 < 𝑟 ≤ 5 and 3 < 𝑠 ≤ 5) as a second trigger zone.
• The second trigger zone also creates a second pair of safety zones.
• We can continue in this fashion to identify more trigger zones and their corresponding safety zones.
• The first mover from a position in the 𝑛𝑡ℎ trigger zone spends what is needed to profitably move into its
(𝑛 − 1)𝑡ℎ safety zone. Then the other resigns and the first mover finishes the game optimally.
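One plausible formalization of this zone pattern, assuming the bands continue with width 2 beyond the first two trigger zones (an extrapolation of the analysis above):

```python
from math import ceil

def zone_index(x, max_step=3):
    """Band number along one axis: band 1 is x <= 3, then width-2 bands."""
    return 1 if x <= max_step else ceil((x - max_step) / 2) + 1

def classify(r, s):
    """Trigger zone if both coordinates fall in the same band (the first
    mover wins); otherwise the company in the lower band is in its safety
    zone and the other should drop out."""
    zr, zs = zone_index(r), zone_index(s)
    if zr == zs:
        return f"trigger zone {zr}: first mover wins"
    return "safety zone for R" if zr < zs else "safety zone for S"

for pos in [(2, 2), (3, 4), (4, 5), (6, 5), (7, 2)]:
    print(pos, "->", classify(*pos))
```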
Political Challenger
• Suppose a budding politician runs against a long-time incumbent.
• Before the election the challenger had to decide whether or not to enter the race.
• The long-term incumbent then has to decide whether to withdraw from the race to avoid embarrassment
from getting defeated by a newcomer or fight the challenger.
• The game tree below models this scenario as a sequential game.
• The utilities assigned to the challenger and the incumbent are simply their preferences among the three
terminal histories.
• Assume that the challenger would most prefer the terminal history (In, Withdraw).
• The other two terminal histories are ranked arbitrarily for the challenger.
• The incumbent would most prefer the terminal history (Out).
• The other two terminal histories are ranked arbitrarily for the incumbent.
• The backward induction strategy profile is (In, Withdraw) as shown below.
• The table below displays the strategic form of this game and also the best response payoffs in boxes.
• Thus, it follows that there are two Nash equilibria: (In, Withdraw) and (Out, Fight).
• The former is also the backward induction strategy profile.
• The latter involves the willingness of the incumbent to threaten to choose a less preferred outcome in
order to convince the challenger to stay out of the race.
• When this scenario is modeled as a strategic game, it is assumed that the challenger and the incumbent
always see the full strategy chosen by the other player.
• But when modeled as a sequential game, the challenger never sees the action of the incumbent when the
challenger chooses Out.
• Thus choosing Out is not necessarily logical for the challenger, as they do not know what the incumbent
would do if they had selected In.
• One way to give the players of a sequential game the same information as they have in the strategic game
version is to imagine many repetitions of the game with different strategy profiles.
• This allows all players to observe others’ behavior.
• Based on past observed behaviors, the challenger would know that the incumbent would choose Fight if
the challenger chooses In.
• However, in a one-time sequential game, the incumbent would not actually choose Fight when the
challenger selects In because it leads to a sub-optimal outcome.
• Thus, while (Out, Fight) is a Nash equilibrium in the strategic game, it does not meet a more robust
understanding of equilibrium in a sequential game.
• A subgame G(h) of the sequential game G, beginning at the non-terminal history h, consists of the players in the game, all terminal histories of which h is an initial part, and the player function and preferences inherited from the full game.
• The game 𝐺 is a subgame of itself.
• There is a proper subgame for each non-terminal history of the game.
• Informally, a subgame perfect equilibrium is a strategy profile s* = (s_1*, s_2*, ..., s_n*) such that in no subgame can any player i do better by choosing a strategy different from s_i*, given that every other player j is playing s_j*.
• This is a more refined version of Nash equilibrium applicable to sequential games.
• The following theorem proves the existence of subgame perfect equilibria and provides a method for
finding them.
• Theorem. In a sequential game, subgame perfect equilibria are exactly the same as backward
induction strategy profiles.
• Proof. A final decision node is a parent whose children are all terminal nodes.
• Pick any final decision node and a non-terminal history that ends at this node.
• The player assigned to this node will pick the action which yields the highest payoff.
• This selected action will be a best response by definition. Refer to the example above to verify.
• And this is what the step 1 of the backward induction process asks us to do.
• So step 1 of the backward induction process selects the Nash equilibria among all of the subgames
generated by such non-terminal histories.
• Now consider a penultimate decision node meaning you backtrack one time from your selected final
decision node.
• At this penultimate decision node, there are two decision makers, the penultimate one and the final one.
• We already found out the best response of the final decision maker.
• The penultimate decision maker can select a best response from the inferred final decision.
• This is precisely what happens in the backward induction process.
• Thus we can assert that backward induction yields a Nash equilibrium on each subgame generated by non-terminal histories ending in a penultimate decision node.
• Continuing in this manner, we see that backward induction yields a Nash equilibrium on each subgame of
the original game.
• This completes the proof.
• Also note that all the subgame perfect equilibrium concept does is eliminate certain unreasonable Nash equilibria.
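• A minimal backward-induction sketch is given below, run on the challenger/incumbent game from above. The numeric utilities are assumed ordinal values consistent with the stated preferences; payoffs are listed as (challenger, incumbent).

```python
# Minimal backward induction on a finite game tree. The tree encodes the
# challenger/incumbent game; the utilities are assumed ordinal values
# consistent with the preferences described above.

CHALLENGER, INCUMBENT = 0, 1

# Non-terminal node: (player, label, {action: child}); terminal: ("leaf", payoffs).
tree = (CHALLENGER, "start", {
    "Out": ("leaf", (1, 2)),                  # incumbent's favorite outcome
    "In":  (INCUMBENT, "after In", {
        "Withdraw": ("leaf", (2, 1)),         # challenger's favorite outcome
        "Fight":    ("leaf", (0, 0)),
    }),
})

def backward_induction(node, plan):
    """Fill `plan` with the action chosen at each decision node; return payoffs."""
    if node[0] == "leaf":
        return node[1]
    player, label, children = node
    best = max(children,
               key=lambda a: backward_induction(children[a], plan)[player])
    plan[label] = best
    return backward_induction(children[best], plan)

plan = {}
print(backward_induction(tree, plan), plan)
# (2, 1) {'after In': 'Withdraw', 'start': 'In'} -- the profile (In, Withdraw)
```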
Council Elections
• 3 local council members must vote on whether to give themselves a raise.
• A simple majority (at least 2 of the 3 votes) is required to pass the raise, which will apply to all three council members.
• The raise is of amount b, and voting yes costs a member an extra c in the next election campaign.
• We assume b > c; otherwise, all three members would vote no.
• Assume that council member 1 votes first, council member 2 votes second, and council member 3 votes last.
• The game tree for this scenario is very easy to construct and is shown below.
• Performing backward induction helps us arrive at the subgame perfect equilibrium which is given by the
strategy profile (No, (Yes, No), (No, Yes, Yes, No)).
• So council member 1 votes No.
• Council member 2 votes Yes at the upper node and votes No at the lower node.
• Council member 3 votes No at the top and bottom nodes and votes Yes at the middle two nodes.
• This results in the outcome where council member 1 votes No and the other two council members vote
Yes, with the corresponding payoff vector (𝑏, 𝑏 − 𝑐, 𝑏 − 𝑐).
• Thus, it is best to be the first voter.
• The same result can be discerned from the strategic game version of this scenario.
• Council member 1 has 2 strategies available (No and Yes).
• Council member 2 has 4 strategies available (e.g., NY indicates No at the upper node and Yes at the lower
node).
• Council member 3 has 16 strategies available (e.g., YNNN indicates Yes at the top node and No at the
three lower nodes).
• This results in a game with 2 × 4 × 16 = 128 strategy profiles.
• So, constructing the strategic form of this game is nontrivial.
• From the 2 tables below, we see that there are 36 Nash equilibria.
• The strategy profile (N, YN, NYYN) corresponds to the backward induction strategies and so is subgame
perfect.
• Since there were no ties for best responses during the backward induction process, it must be that for the
other 35 Nash equilibria, there are subgames in which the players’ strategies are not in a Nash equilibrium.
• For example, consider the Nash equilibrium (Y, YY, NNNN).
• Consider the subgame shown in the blue box and decide whether CM 3 will try to switch his precommitted strategy or not.
• Moreover, Council member 1 is not likely to vote Yes since her payoffs in the equilibria in which she votes Yes,
viz. (𝑏 − 𝑐) are smaller than many of those in which she votes No.
• It is easy to show that none of the equilibria in which council member 1 votes Yes are subgame perfect.
• It is also straightforward to show that most of the equilibria in which she votes No are also not subgame perfect.
For example, consider the equilibrium (N,YY,YYYY).
• A more difficult one to try will be the equilibrium (N, YY, NYYN).
• This time consider the subgame shown in the blue box. Assume CM1 and CM3 have fixed their strategies.
• Then CM2 will get a payoff of 𝑏 with No and payoff of 𝑏 − 𝑐 with Yes. So he will definitely make a
switch.
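• The sketch below verifies the backward induction outcome of the council vote by brute force. The values b = 3 and c = 1 are illustrative assumptions; any b > c > 0 produces the same strategies.

```python
# Brute-force backward induction for the council vote. The values b = 3, c = 1
# are illustrative; any b > c > 0 produces the same strategies.
from itertools import product

b, c = 3, 1

def payoffs(votes):
    """Payoff to each member for a tuple of 'Y'/'N' votes."""
    passed = votes.count("Y") >= 2
    return tuple((b if passed else 0) - (c if v == "Y" else 0) for v in votes)

# Member 3 best-responds at each history (v1, v2).
s3 = {h: max("NY", key=lambda v, h=h: payoffs((*h, v))[2])
      for h in product("NY", repeat=2)}
# Member 2 best-responds given member 3's strategy.
s2 = {v1: max("NY", key=lambda v, v1=v1: payoffs((v1, v, s3[(v1, v)]))[1])
      for v1 in "NY"}

def outcome(v1):
    v2 = s2[v1]
    return (v1, v2, s3[(v1, v2)])

# Member 1 best-responds given members 2 and 3.
s1 = max("NY", key=lambda v: payoffs(outcome(v))[0])

print(s1, s2, s3)                         # member 1 votes N; 2 and 3 as derived
print(outcome(s1), payoffs(outcome(s1)))  # ('N', 'Y', 'Y') -> (b, b-c, b-c)
```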
Cournot and Sequential Duopoly
• Duopolies occur in markets where only two major firms compete to provide an identical (or nearly identical) good or service.
• The two firms are the two players if we model their competition as a strategic game.
• Let 𝑄 denote the total quantity of a product available in the market at a price 𝑃 such that 𝑄 = 𝑎 − 𝑏𝑃.
• The parameter a > 0 denotes the amount that could be given away for free, and b > 0 denotes the rate at which demand falls as the price P increases, because dQ/dP = −b.
• Note that if you set 𝑃 = 𝑎/𝑏, then 𝑄 becomes zero, so 𝑎/𝑏 is the price at which consumers will demand
nothing.
• Let Q_1 denote the amount produced by the first firm and Q_2 the amount produced by the second firm.
• The total quantity of the product in market is 𝑄 = 𝑄1 + 𝑄2 .
• Each firm can only control the quantity it produces, so the strategies available to firm 1 correspond to the
possible amounts 𝑄1 it can produce, and likewise for firm 2.
• Assume 𝑄1 , 𝑄2 ≥ 0 and in a range that keeps P ≥ 0.
• Assume that the cost of production is constant and the same for each firm.
• Let 𝑐 be the cost per unit produced.
• The profit function for each firm is its revenue minus its cost: profit_i = P Q_i − c Q_i for firm i = 1, 2.
• Each firm’s profit depends not only on their own choices, but also on the choices of the other firm.
• We can rewrite these profit functions as the utility functions u_1(Q_1, Q_2) = ((a − Q_1 − Q_2)/b − c) Q_1 and u_2(Q_1, Q_2) = ((a − Q_1 − Q_2)/b − c) Q_2, which will be used for determining the Nash equilibrium for the two firms.
• Each firm’s goal is to maximize its profit.
• So we calculate the two partial derivatives to find the best response each firm has to the other's action. For Firm 1, we obtain ∂u_1/∂Q_1 = (a − 2Q_1 − Q_2)/b − c.
• Setting the partial derivative equal to zero, we find the critical value of Q_1, which will be a maximum point as long as it is positive, because the second derivative is negative.
• If the critical value is nonpositive, then firm 1 should produce nothing.
• Thus, firm 1's best response function is: given a production level Q_2 for firm 2, firm 1 maximizes its profit by producing Q_1 = max{0, (a − bc − Q_2)/2}.
• Firm 2’s best response function is also computed similarly.
• Summing the two best response functions gives 3(Q_1 + Q_2) = 2(a − bc), i.e., Q_1 + Q_2 = 2(a − bc)/3.
• Subtracting the two best response functions shows that Q_1 = Q_2, and combining this with the sum yields the optimal production values computed below.
• Thus, if 𝑎 − 𝑏𝑐 > 0, then each firm will produce (𝑎 − 𝑏𝑐)/3 units of the good. The profit of each firm
then will be (𝑎 − 𝑏𝑐)2 /9𝑏.
• If 𝑎 − 𝑏𝑐 ≤ 0, then each firm will produce nothing and obtain no profit.
• The condition for positive production can also be written as 𝑎/𝑏 > 𝑐, which is saying for any reasonable
market to sustain itself, the price at which consumers will demand nothing must be higher than the cost to
produce a single unit of the product.
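• The computation above can be checked symbolically. The sketch below follows the demand model Q = a − bP from the text; it is a verification aid, not part of the original derivation.

```python
# Symbolic check of the Cournot computation, using the demand model from the
# text (Q = a - bP, unit cost c); sympy is used purely as a verification aid.
import sympy as sp

a, b, c, Q1, Q2 = sp.symbols("a b c Q1 Q2", positive=True)
P = (a - Q1 - Q2) / b                  # inverse demand
u1 = (P - c) * Q1                      # firm 1's profit
u2 = (P - c) * Q2                      # firm 2's profit

br1 = sp.solve(sp.diff(u1, Q1), Q1)[0] # (a - b*c - Q2)/2
br2 = sp.solve(sp.diff(u2, Q2), Q2)[0] # (a - b*c - Q1)/2

# Intersection of the best-response curves = Nash equilibrium.
eq = sp.solve([Q1 - br1, Q2 - br2], [Q1, Q2])
print(eq[Q1], eq[Q2])                  # both equal (a - b*c)/3
print(sp.simplify(u1.subs(eq)))        # (a - b*c)**2/(9*b)
```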
• Sequential games can also be used to model scenarios in which the players have a continuum of strategy
options available to them. Cournot duopoly can be extended to allow continuous strategies.
• Suppose that Firm 1 moves first by selecting a production level and then Firm 2 moves second, with
knowledge of what Firm 1’s decision was.
• The shading between the “0” and “∞” branches for each firm indicates that there are an infinite number of
actions available to each player at each decision point: any nonnegative number indicating the production
level of the corresponding firm.
• A generic choice for Firm 1 is labeled 𝑄1 and the resulting choice for Firm 2 is labeled 𝑄2 .
• Firm 2 has a choice for each choice of Firm 1 and the resulting total production level is Q = 𝑄1 + 𝑄2 .
• Firm 2's best response function was computed previously: Q_2 = max{0, (a − bc − Q_1)/2}.
• Because Firm 2 is the last firm to move, this strategy is precisely the one indicated by the backward
induction method as it helps Firm 2 maximize its payoff.
• Substituting this best response rewrites Firm 1's profit function as u_1(Q_1) = (a − bc − Q_1) Q_1/(2b).
• Note that since Firm 1 knows what action Firm 2 will take, its utility function depends only on Q_1.
• Thus, Firm 1's best response is to maximize its profit by computing the optimal value of Q_1: taking the derivative of u_1 with respect to Q_1 and setting it equal to zero yields Q_1 = (a − bc)/2, and hence Q_2 = (a − bc)/4.
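• A corresponding symbolic check for the sequential version is sketched below, substituting Firm 2's best response before Firm 1 optimizes; the closed forms Q_1 = (a − bc)/2 and Q_2 = (a − bc)/4 are the standard leader-follower quantities under this demand model.

```python
# Symbolic check for the sequential (Stackelberg) version: Firm 2's best
# response is substituted before Firm 1 optimizes, as described above.
import sympy as sp

a, b, c, Q1 = sp.symbols("a b c Q1", positive=True)
br2 = (a - b*c - Q1) / 2               # follower's best response
P = (a - Q1 - br2) / b                 # price once Firm 2 has responded
u1 = (P - c) * Q1                      # leader's profit, a function of Q1 only

Q1_star = sp.solve(sp.diff(u1, Q1), Q1)[0]
print(Q1_star)                         # (a - b*c)/2
print(sp.simplify(br2.subs(Q1, Q1_star)))   # (a - b*c)/4
print(sp.simplify(u1.subs(Q1, Q1_star)))    # (a - b*c)**2/(8*b)
```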
SEQUENTIAL DECISION MAKING
• A game tree is one way to model sequential decision making (SDM) scenarios, but Markov Decision Processes (MDPs) form a more powerful modeling tool for more complex SDM scenarios.
• The backward induction algorithm (BIA) is used to solve game trees, whereas Reinforcement Learning (RL) algorithms are used to solve MDPs.
• The most general SDM scenarios are described as follows.
• There is a decision maker (DM) who makes successive observations of a process before making a final
decision, and this keeps repeating until the time horizon is reached or some termination condition is triggered.
• But there are costs associated with each of these observations.
• The DM must choose what to do and how much to do it at various points in time.
• The choices at one time influence the possibilities available at other points in time.
• These choices depend on the relative value the DM assigns to different payoffs at different points in time.
• These choices also require the DM to trade off costs and rewards at different points in time.
• The procedure to decide when to stop taking observations and when to continue is called the stopping rule.
• The objective is to find the optimal stopping rule and the optimal decision-making strategy with the goal of
minimizing some given loss function, and observation costs are included in this optimization process.
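• As a toy illustration of a stopping rule (an assumed example, not from the text), consider offers drawn uniformly from (0, 1), a cost c per observation, and a rule that accepts the first offer below a threshold t. For this model the expected total cost is c/t + t/2, minimized at t* = sqrt(2c), which the simulation below supports.

```python
# Toy stopping-rule illustration (an assumed example, not from the text).
# Offers are Uniform(0, 1); each observation costs c; accept the first offer
# below threshold t. Expected total cost is c/t + t/2, minimized at t* = sqrt(2c).
import random

def expected_total_cost(t, c, trials=100_000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        cost = 0.0
        while True:
            cost += c                 # pay the observation cost
            x = rng.random()          # observe an offer
            if x <= t:                # stopping rule: accept below threshold
                total += cost + x
                break
    return total / trials

c = 0.02
t_star = (2 * c) ** 0.5               # optimal threshold = 0.2
for t in (0.1, t_star, 0.4):
    print(round(t, 2), round(expected_total_cost(t, c), 3))
# Expect roughly 0.25, 0.20, 0.25: the threshold t* minimizes expected cost.
```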
Government Subsidy
• People who are monetarily risk-averse often purchase insurance policies to protect themselves or their
properties against large losses.
• Even though the expected value of paying INR 10k with certainty is the same as that of a 1% chance of paying INR 10 lacs, for many people a guaranteed monthly premium is preferable to even a small chance of paying a much larger sum.
• Can having security against paying a larger sum change people's behavior in any way?
• For example, insured drivers may not drive as cautiously or may leave their cars unlocked knowing they
won’t be responsible for large damages or losses.
• Insurance companies must take this into account when calculating premiums or must find ways to reduce
riskier behavior amongst their insured clients.
• The change in behavior produced by passing some of the risk or responsibility from an individual to
another player is known in game theory as moral hazard.
• The individual is called an agent and the other party assuming some of the risk is known as the principal.
• Insurance is one example with the insurance company acting as the principal for clients who are agents.
• Another common example studied is that of managers attempting to influence their employees' behavior through incentive programs related to sales or profits.
• Here we explore whether the scholarships given out by central governments can result in moral hazard: will free college education encourage students to give their best, or discourage them from doing so?
• A key characteristic of a situation that involves moral hazard is that the principal cannot influence the
agent’s action in any binding way.
• In many situations the principal is able to observe only the final results.
• Consider a large central government agency handing out scholarships to individual students spread out
across a country.
• Will the government, in the role of the principal, enable the student, in the role of the agent, to engage in
activities that do not support the intentions of the government?
• Specifically, rather than working hard to succeed, will a student choose to put minimal effort into their
studies and have a higher risk of failure since there is no or little cost to them for the classes?
• Instead of modeling the entire college education, let us simply consider a single course with two levels of
accomplishment: success (S) or failure (F).
• Cost of the course (tuition, books, and supplies) is a fixed amount K.
• Government (G) reimburses the student (A) for
• part of the cost of the course (an amount less than K)
• the entire cost of the course (the amount K)
• an amount greater than K as in a fellowship
• G may also make this amount depend on whether the student succeeds or fails.
• Let r_S and r_F be the amounts that the government will reimburse in the case of success and failure, respectively.
• Completely free college occurs when 𝑟𝑆 = 𝑟𝐹 = 𝐾.
• The student can choose how much effort they are willing to put into succeeding in this course categorized
as high effort (H), low effort (L), or no effort (N) by not taking the course.
• If student has chosen to take the course, chance (C) then determines success or failure with the probability
of success being 𝑝𝐻 or 𝑝𝐿 depending on whether the student chose to put in high or low effort.
• The game tree for this sequential game is shown below.
• We next explain the rationale for assigning the payoffs to the government and student for each of the five
possible outcomes.
• Payoffs at each terminal history are listed below as (G, A), i.e., (government, student).
• When the student chooses to not take the course, a payoff of zero is assigned.
• For the government, the monetary worth for student success and failure is 𝑣𝑆 and 𝑣𝐹 .
• For the student, the monetary worth for success and failure is 𝑤𝑆 and 𝑤𝐹 .
• The monetary cost for high and low effort is 𝑐𝐻 and 𝑐𝐿 .
• Thus, if the government chooses reimbursement (r_S, r_F), the student chooses a high level of effort, and the student is successful, then the government's payoff is v_S − r_S and the student's payoff is w_S − c_H − K + r_S.
• Assume the government prefers success (0 ≤ 𝑣𝐹 < 𝑣𝑆 ).
• Assume there may be intrinsic worth to the student to take and pass the course (0 ≤ 𝑤𝐹 ≤ 𝑤𝑆 ).
• Assume effort is costly for the student (0 < 𝑐𝐿 < 𝑐𝐻 ).
• Assume students can pass and hard work increases their chances of getting a passing grade (0 < 𝑝𝐿 < 𝑝𝐻 ).
• Given the government's action (r_S, r_F) and the student's action (H, L, or N), the government's expected payoff is u_G = p_H (v_S − r_S) + (1 − p_H)(v_F − r_F) for H, u_G = p_L (v_S − r_S) + (1 − p_L)(v_F − r_F) for L, and 0 for N.
• We can similarly compute the student's expected payoff: u_A = h + p_H r_S + (1 − p_H) r_F for H, u_A = l + p_L r_S + (1 − p_L) r_F for L, and 0 for N.
• Here h = p_H w_S + (1 − p_H) w_F − c_H − K and l = p_L w_S + (1 − p_L) w_F − c_L − K denote the student's intrinsic (without government reimbursement) net benefit for high and low effort.
• 𝑤𝑆 = 𝑤𝐹 means that there is no intrinsic worth to the student passing instead of failing the course.
• If there is no intrinsic worth to the student passing instead of failing the course (𝑤𝑆 = 𝑤𝐹 ) and no
government assistance (𝑟𝑆 = 𝑟𝐹 = 0), then ℎ < 𝑙 because 𝑐𝐿 < 𝑐𝐻 . So the student has no incentive to put
in a high amount of effort.
• If passing the course is sufficiently valuable (𝑤𝑆 > 𝑤𝐹 ) and this is also sufficiently effective in comparison
with the cost of effort (𝑤𝑆 −𝑤𝐹 ≫ 𝑐𝐻 − 𝑐𝐿 ), then this implies that ℎ > 0 and ℎ > 𝑙.
• So the student will take the course and put in a high level of effort even without government assistance.
• Let us determine the government’s best course of action by applying the subgame perfect equilibrium
solution concept.
• We begin by considering the student’s maximizing strategy.
• For each government action (𝑟𝑆 , 𝑟𝐹 ), student chooses an action among (𝐻, 𝐿, 𝑁) denoted by a(𝑟𝑆 , 𝑟𝐹 )
which maximizes 𝑢𝐴 ( 𝑟𝑆 , 𝑟𝐹 , a(𝑟𝑆 , 𝑟𝐹 )).
• Government’s strategy will be to choose that action (𝑟𝑆 , 𝑟𝐹 ) which maximizes 𝑢𝐺 ( 𝑟𝑆 , 𝑟𝐹 , a(𝑟𝑆 , 𝑟𝐹 )).
• Finding this subgame perfect equilibrium is equivalent to solving the two linear programs below.
LP 1 (student adopts H):   maximize u_G = p_H (v_S − r_S) + (1 − p_H)(v_F − r_F)
subject to (1) h + p_H r_S + (1 − p_H) r_F ≥ l + p_L r_S + (1 − p_L) r_F
           (2) h + p_H r_S + (1 − p_H) r_F ≥ 0
           (3) r_S ≥ 0, r_F ≥ 0

LP 2 (student adopts L):   maximize u_G = p_L (v_S − r_S) + (1 − p_L)(v_F − r_F)
subject to (1) l + p_L r_S + (1 − p_L) r_F ≥ h + p_H r_S + (1 − p_H) r_F
           (2) l + p_L r_S + (1 − p_L) r_F ≥ 0
           (3) r_S ≥ 0, r_F ≥ 0
• Constraint (3) in both LPs is simply the nonnegativity constraint.
• The only nontrivial constraint is (1): it decides which strategy the student will adopt, H or L.
• Constraint (2) ensures that the student will not consider N.
• For the 1st LP, constraint (1) ensures that H is adopted by the student.
• For the 2nd LP, constraint (1) ensures that L is adopted by the student.
• The objective function maximizes the government's payoff based on whether the student adopts H or L.
• If the value of the 1st LP is larger than the value of the 2nd LP and also larger than the value of the student not taking the course (N), which is zero, then the government announces the reimbursement given by the 1st LP.
• If the value of the 2nd LP is larger than the value of the 1st LP and also larger than the value of the student not taking the course (N), which is zero, then the government announces the reimbursement given by the 2nd LP.
• If the values of both LPs are ≤ 0, then the government sees no utility in the student putting effort (H or L) into their education, and so it announces its subsidy based on the student choosing N, which logically leads to r_S = r_F = 0.
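• The sketch below solves the two LPs numerically with scipy; all parameter values are assumptions chosen only to illustrate the recipe.

```python
# Sketch: solve the two LPs numerically for illustrative parameter values
# (the numbers below are assumptions for illustration, not from the text).
from scipy.optimize import linprog

pH, pL = 0.8, 0.4            # success probabilities under high / low effort
vS, vF = 100.0, 0.0          # government's worth of success / failure
wS = wF = 0.0                # no intrinsic worth to the student
cH, cL, K = 30.0, 10.0, 20.0 # effort costs and course cost

h = pH*wS + (1 - pH)*wF - cH - K   # intrinsic net benefit of high effort
l = pL*wS + (1 - pL)*wF - cL - K   # intrinsic net benefit of low effort

def solve_lp(p_top, p_other, top, other):
    """Maximize u_G while inducing the effort level with success prob p_top."""
    # linprog minimizes, and u_G = const - (p_top*rS + (1 - p_top)*rF).
    cost = [p_top, 1 - p_top]
    A_ub = [[-(p_top - p_other), (p_top - p_other)],   # incentive constraint
            [-p_top, -(1 - p_top)]]                    # participation (not N)
    b_ub = [top - other, top]
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2)
    rS, rF = res.x
    return p_top*(vS - rS) + (1 - p_top)*(vF - rF), (rS, rF)

uG_H, plan_H = solve_lp(pH, pL, h, l)
uG_L, plan_L = solve_lp(pL, pH, l, h)
print(round(uG_H, 2), round(uG_L, 2))   # 30.0 10.0 -> government induces H
# The optimal (rS, rF) need not be unique; any optimum of the winning LP works.
```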
EXTENSIVE GAMES
• Until now, we have assumed that players in a game know the rules, the possible outcomes, and each
other’s preferences over outcomes. Such games are said to be games with complete information.
• Sometimes we have also assumed that players choose their actions sequentially and know all actions taken
previously. Such games are said to be games with perfect information.
• Sequential games have both complete and perfect information and have no actions left to chance.
• Strategic games are imperfect information games with complete information.
• An extensive game is a way to model scenarios that have imperfect or incomplete information in some
way.
• We shall restrict ourselves to imperfect information variants of extensive games where players may not
always know where they are in a game.
• Thus we assume that the players do not know one or more actions taken by the other players.
• An extensive game consists of the following: the players; the actions available at each decision point; the terminal histories; the information sets, which partition the non-terminal histories; a player function assigning a player or chance to each information set; probability distributions for the moves made by chance; and the players' utilities over terminal histories.
Service Contract
• A local food processing plant, FoodPro, needs a small repair done.
• They normally call Ben’s contracting company.
• But the FoodPro manager decides to explore another contractor.
• So the FoodPro manager first asks Mark’s contracting company for a bid.
• Mark, who would like to have the work, can decide not to place a bid or to place a bid of some amount.
• After receiving a response from Mark, the FoodPro manager tells Ben whether Mark has submitted a bid.
• But the amount of the bid, if one was submitted, is not told to Ben.
• The FoodPro manager then asks Ben for a bid.
• Since the project is small, Ben does not really want the work.
• However, he wants to keep FoodPro as a customer over the long term.
• The FoodPro manager plans to accept the lower of the two bids, but if the bids are of similar amounts, the
FoodPro manager will choose Ben over Mark.
• Assume that Mark and Ben are the only players in this game.
• The game is sequential in nature.
• Ben does not have full information about Mark’s action, thus this is an imperfect information game.
• To create a simple model, we assume Mark can choose one of three actions: not bid (No), bid low (Lo), or
bid high (Hi).
• Ben only knows whether Mark bids or does not bid.
• If Mark does bid, Ben does not know whether he bid high or low.
• Since Ben is not interested in doing the work for a low price, assume that he chooses between two actions:
not bid (No) or bid high (Hi).
• The game tree below summarizes this scenario by assigning appropriate utilities.
• The same game tree can also be represented using the table below.
• Since Ben is preferred over Mark, Ben will obtain the work with histories (No, Hi) and (Hi, Hi).
• Mark will obtain the work with histories (Lo, No), (Lo, Hi), and (Hi, No).
• Neither will obtain the work with history (No, No).
• Mark most prefers (Hi, No) because he gets the work at the most profit and probably gains a new customer.
• Mark prefers (Lo, Hi) over (Lo, No) because with the former outcome he gets the highest possible profit given Ben's bid, while with the latter outcome he regrets losing out on a higher profit margin.
• Mark values the remaining histories in which he does not obtain the work in the order (Hi, Hi), (No, Hi),
and (No, No).
• Mark is happy to have at least tried in (Hi, Hi).
• We can also assume that Mark would rather want Ben to get the work than neither of them getting it in the first
round of bidding with (No, No).
• Next we discuss Ben’s payoffs.
• Ben most prefers (Hi, No) because he does not need to do a small job he’s not interested in.
• We can also assume that (Hi, No) does not take away future repair jobs of Ben at FoodPro.
• Ben next prefers histories (Hi, Hi), (No, Hi), and (Lo, Hi) because his concern about his long term relationship
with FoodPro is more important than his desire to not take on a small job.
• Also because (Hi, Hi) suggests to FoodPro that Ben’s charges are reasonable while (Lo, Hi) suggests that Ben
may be trying to cheat FoodPro.
• Ben’s lowest ranked option is (Lo, No) since it most likely leads to Ben losing future repair jobs to Mark.
• The history (No, No) is not ranked last because having FoodPro realize the difficulty of finding competent
repair contractors could be advantageous to Ben in the future.
• This completes utility assignment to players.
• The non-terminal histories are shown above to be partitioned into three groups which are referred to as the
information sets.
• Only Ben faces an information set, viz. Ben2, that contains more than one node.
• At Ben2, Ben does not know which sequence of actions has occurred.
• Ben has to make a decision whether to choose No or Hi without knowing whether Mark has bid Hi or Lo.
• In FoodPro, one pure strategy for Ben is to always choose No.
• Another pure strategy is to choose No when Mark Bids (at information set Ben2) and Hi if Mark does not
bid (at information set Ben1).
• Mark has three pure strategies, corresponding to the actions at his only non-terminal history: No, Lo, or
Hi.
• For strategic games, we know that players can adopt mixed strategies.
• A mixed strategy in strategic games is a probability distribution that assigns to each available action a
likelihood of being selected.
• We could also allow players in extensive games to adopt such mixed strategies.
• But it is more natural to allow players to randomize action choices at each information set based on
probability distributions.
• Such strategies are called behavior strategies.
• A pure strategy for player 𝑖 is a function 𝑠𝑖 which assigns to each of the player’s information sets a
possible action.
• A behavior strategy for player 𝑖 is a function 𝑠𝑖 which assigns to each of the player’s information sets a
probability distribution over possible actions.
• If 𝑠 is used to denote a pure or behavior strategy, let 𝑠𝑖 (𝐼) or simply 𝑠(𝐼) denote the action or probability
distribution over actions chosen by player 𝑖 at the assigned information set 𝐼.
• Let 𝑠𝑖 (𝑎|𝐼) or simply 𝑠(𝑎|𝐼) denote the probability that player 𝑖 will choose action 𝑎 at information set 𝐼.
• A belief system for an extensive game is a function, 𝛽, that assigns a probability distribution over histories in
each information set not assigned to chance.
• Thus, 𝛽(𝐼) denotes the probability distribution on the nodes in information set 𝐼.
• Let 𝛽(ℎ|𝐼) denote the probability of history ℎ in the information set 𝐼 with respect to the belief system 𝛽.
• A belief system models the players’ understanding of what has happened in the game up to 𝐼.
• When the player assigned to I makes a choice of actions, that player uses the probabilities β(h|I) for all h ∈ I.
• So these probabilities should reflect that player’s beliefs about how likely it is that each ℎ has occurred.
• The probability distribution 𝛽(𝐼) is also referred to as the “player 𝑖’s belief system at 𝐼.”
• For sequential games, we defined and used a stronger solution concept than Nash equilibrium, namely subgame perfect equilibrium, that required player strategies to be best responses starting from each non-terminal history of the game.
• In an extensive game, players may not know at which node they find themselves within an information
set, and so it would be impossible to determine the best response using the subgame perfect equilibrium
solution concept.
• However, if each player has a belief about how likely each history in the information set is to have
occurred, they can determine a best response based on these beliefs.
• An extensive game includes all aspects of a sequential game, but adds chance and information sets which
may contain more than one non-terminal history.
• Because of their sequential nature, we still use game trees as a visual representation for extensive games.
• Each node in the tree corresponds to a history: terminal nodes correspond to terminal histories and are
labeled with payoffs.
• The other nodes correspond to non-terminal histories and are grouped together within dashed boxes to
indicate information sets, which are labeled with chance or the player who chooses an action there.
• A player assigned to an information set only knows that they are somewhere within the dashed box, but
does not know specifically at which node, when choosing an action.
• When all of the information sets contain a single history and no information set is assigned to chance, that
extensive game becomes identical to a sequential game.
• An information set containing two or more histories incorporates imperfect information because the players
do not know some of the history of actions before making a choice and do not necessarily know the direct
effects of an action.
• A player in an extensive game has perfect recall if at every opportunity to act
i. he remembers his prior actions, and
ii. he remembers everything that he knew before.
• Intuitively, the perfect recall assumption is that players never forget information once it is acquired.
• An extensive game has perfect recall if every player in that game does.
• If the partners in a card game or the members of a soccer team are modeled as a single player, then the game
will not have perfect recall because each card player only knows their own cards and each team member can
only see certain parts of the soccer field.
• Even individual persons often forget certain choices they made in the past.
• There are two types of memory associated with perfect recall: memory of past knowledge and memory of
past actions.
• A player can forget what she knew and yet remember what she did in the past.
• A player can forget what actions she took in the past but remember what she knew at the time.
• In the FoodPro scenario, one pair of behavior strategies is for Mark to choose between No, Lo, and Hi with equal probability, and for Ben to choose No or Hi with equal probability at Ben1 but to choose Hi over No twice as often at Ben2.
• Thus, we get the strategy profile s = (s_Mark, s_Ben) shown above. For this strategy profile, computing Mark's and Ben's expected payoffs is straightforward, as shown below for the two players respectively.
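• The sketch below reproduces this computation. The 1-6 utilities are an assumed ordinal assignment consistent with the rankings stated earlier; each key is (Mark's action, Ben's action) and each value is (Mark, Ben).

```python
# Reconstruction of the expected-payoff computation. The 1-6 utilities are an
# assumed ordinal assignment consistent with the rankings stated earlier.
U = {("No", "No"): (1, 2), ("No", "Hi"): (2, 4),
     ("Lo", "No"): (4, 1), ("Lo", "Hi"): (5, 3),
     ("Hi", "No"): (6, 6), ("Hi", "Hi"): (3, 5)}

s_mark = {"No": 1/3, "Lo": 1/3, "Hi": 1/3}   # Mark's behavior strategy
s_ben1 = {"No": 1/2, "Hi": 1/2}              # Ben at Ben1 (Mark did not bid)
s_ben2 = {"No": 1/3, "Hi": 2/3}              # Ben at Ben2 (Mark bid Lo or Hi)

expected = [0.0, 0.0]
for m, pm in s_mark.items():
    s_ben = s_ben1 if m == "No" else s_ben2  # which information set Ben is at
    for a, pa in s_ben.items():
        for i in range(2):
            expected[i] += pm * pa * U[(m, a)][i]

print([round(x, 2) for x in expected])       # approximately [3.39, 3.56]
```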
• Ben knows when Mark bids.
• Assume that Ben believes Mark is equally likely to have bid Hi as Lo.
• Thus, one possible belief system is to assign uniform probability distribution to Ben2, denoted by (1/2)(Lo) +
(1/2)(Hi).
• In information sets with only one node, for example Ben1, Ben knows with certainty that No has occurred.
• Definition. An assessment is a pair (𝑠, 𝛽) consisting of a profile of behavior strategies, 𝑠 = (𝑠1 , … , 𝑠𝑛 ),
together with a belief system, 𝛽.
• One possible assessment for the FoodPro game is shown below.
• The previous belief system can be represented on the game tree as shown below with the probability
associated with each history in the Ben2 information set shown below the corresponding node.
• Definition. Suppose (𝑠, 𝛽) is an assessment for an extensive game. Player 𝑖’s strategy is a best response at the
information set 𝐼 if it maximizes the player’s utility at 𝐼 given the beliefs β(𝐼) and the other players’ strategies.
The assessment (𝑠, 𝛽) is sequentially rational if each player’s strategy is a best response at each information
set to which either the player or chance is assigned.
• The current assessment for the FoodPro game is not sequentially rational because the payoff to Ben beginning
at the singleton information set Ben1 is not a best response.
• The expected payoff of (1/2)(2)+(1/2)(4) = 3 is less than the expected payoff from strategy 𝑠 ′ (Ben1) = (1)Hi of
4.
• Ben’s payoff from his behavior strategy at Ben2 is not a best response either because the expected payoff is
(1/2)((1/3)(1) + (2/3)(3)) + (1/2)((1/3)(6) + (2/3)(5)) ≈ 3.83.
• This is less than his expected payoff of (1/2)(3) + (1/2)(5) = 4 had he instead selected 𝑠 ′ (Ben2) = (1)Hi.
• So let us change Ben’s behavior strategy to 𝑠 ′ (Ben1) = Hi and 𝑠 ′ (Ben2) = Hi so that his behavior strategy
becomes a best response at each information set.
• Mark’s strategy is also not a best response to Ben’s new behavior strategy because 𝑠 ′ (Mark1) = Lo will yield a
better payoff of 5 rather than the inferior value of (1/3)(2) + (1/3)(5) + (1/3)(3) ≈ 3.33 based on Ben’s new
behavior strategy.
• Against Ben’s new behavior strategy, Mark obtains payoffs of 2, 5, and 3 by choosing strategies No, Lo, and
Hi, respectively, so Mark’s revised behavior strategy is indeed Mark’s unique best response.
• Ben’s revised behavior strategies are best responses for Ben given his beliefs regardless of Mark’s strategy
choice.
• Based on our discussion, the above assessment is clearly sequentially rational.
• Definition. A subgame 𝐺 ′ of an extensive game 𝐺 consists of a non-terminal history ℎ, called the root, all
histories 𝐻 from 𝐺 that start with ℎ, and all other aspects of 𝐺 that relate to 𝐻 (i.e., players, actions,
terminal histories, information sets, player function, chance probability distributions, and utilities) for
which each information set 𝐼 of 𝐺 is either completely inherited by 𝐺 ′ (i.e., 𝐼 ∩ 𝐻 = 𝐼) or mutually
exclusive with 𝐺 ′ (i.e., 𝐼 ∩ 𝐻 = ∅).
• In the FoodPro game, there are two subgames: whole game rooted at node labeled Mark1, and subgame
rooted at node labeled Ben1.
• Definition. Let Pr_G′(h|s) denote the probability that h is reached in the subgame G′ given that the strategy profile s is used. An assessment (s, β) of a game G achieves consistency of beliefs if, for each history h in each information set I in each subgame G′ of G for which Pr_G′(k|s) > 0 for some history k ∈ I, the following holds:
β(h|I) = Pr_G′(h|s) / Σ_{k∈I} Pr_G′(k|s)
• Observe that the consistency equation always holds at a singleton information set I = {h}, because in such a situation β(h|I) = 1 and Pr_G′(h|s) / Σ_{k∈I} Pr_G′(k|s) = Pr_G′(h|s) / Pr_G′(h|s) = 1.
• Also, if the consistency equation holds for all but one history in an information set, then it clearly holds for the last remaining history, because both β(·|I) and the ratios Pr_G′(·|s) / Σ_{k∈I} Pr_G′(k|s) sum to one over all histories in I.
• So to verify that an assessment achieves consistency of beliefs, it is sufficient to check the consistency
equation at all except one history in each non-singleton information set where at least one history is
reached with positive probability (otherwise you will be dividing by zero).
• In the FoodPro game, since the subgame rooted at the node labeled Ben1 has no non-singleton information
sets, the consistency of beliefs condition holds trivially.
• For the subgame 𝐺 rooted at Mark1, there is one non-singleton information set that we have to consider,
namely Ben2 and it satisfies the consistency equation as shown below, therefore the original assessment
achieves consistency of beliefs.
• However, the assessment with the changed behavior strategy does not achieve consistency of beliefs, as shown above.
• But there is an easy recipe to define consistent beliefs.
• Trick 1: For any information set that is reached via player strategies with positive probability in some
subgame, you can construct the consistent beliefs by directly calculating the probability ratios from the
behavior strategies.
• Trick 2: If an information set is not reached via player strategies with a positive probability in any
subgame, then any probability distribution on that information set will be consistent.
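• As a one-line check, Trick 1 applied to Ben2 in FoodPro: with Mark mixing (1/3, 1/3, 1/3) over (No, Lo, Hi), the consistent belief conditions on a bid having been placed.

```python
# Trick 1 on Ben2 in FoodPro: with Mark mixing (1/3, 1/3, 1/3) over
# (No, Lo, Hi), condition on a bid having been placed.
s_mark = {"No": 1/3, "Lo": 1/3, "Hi": 1/3}
reached = {h: p for h, p in s_mark.items() if h != "No"}   # histories in Ben2
total = sum(reached.values())
beta_ben2 = {h: p / total for h, p in reached.items()}
print(beta_ben2)   # {'Lo': 0.5, 'Hi': 0.5}, i.e., (1/2)(Lo) + (1/2)(Hi)
```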
• Definition. An assessment (𝑠, 𝛽) is a weak sequential equilibrium if it is both sequentially rational and
achieves consistency of beliefs.
• The assessment consisting of the original behavior strategy in the FoodPro game is not a weak sequential equilibrium since it is not sequentially rational, even though it achieves consistency of beliefs.
• The assessment consisting of the changed behavior strategy in the FoodPro game is not a weak sequential equilibrium since, even though it is sequentially rational, it does not achieve consistency of beliefs.
• Using the tricks above, we can make the latter assessment achieve consistency of beliefs by simply
computing the beliefs according to the consistency equation.
• Consider the assessment as shown above. It is a weak sequential equilibrium.
• To show this, we must first prove it is sequentially rational which we have already done before.
• So we only need to show that it achieves consistency of beliefs by verifying the consistency equation.
• For the subgame 𝐺 rooted at Mark1, this is shown below.
• For strategic games, the primary solution concept has been Nash equilibrium.
• For sequential games, we have used subgame perfect equilibrium.
• For extensive games, we shall use weak sequential equilibrium which, similar to the subgame perfect
equilibrium solution concept, also eliminates certain unreasonable Nash equilibria.
• Strategic and sequential games are special cases of extensive games.
Existence Theorems
• The following theorems provide a sufficient condition for the existence of a weak sequential equilibrium and relate it to the solution concepts used earlier.
• Weak Sequential Equilibrium Existence Theorem. If an extensive game has perfect recall and a finite
number of histories, then a weak sequential equilibrium exists.
• Theorem. Given a sequential game, the corresponding extensive game puts each non-terminal history in its
own information set. Each subgame perfect equilibrium of the sequential game becomes a weak sequential
equilibrium for the corresponding extensive game by adding a belief system achieving consistency of
beliefs, and the strategy profile of each weak sequential equilibrium of the corresponding extensive game
is a subgame perfect equilibrium of the sequential game.
• Theorem. Given a strategic game with strategy sets S_1, S_2, ..., S_n, the corresponding extensive game consists of the set of terminal histories S_1 × S_2 × ⋯ × S_n and an information set I_i = S_1 × S_2 × ⋯ × S_{i−1} for each player i. Each Nash equilibrium of the strategic game becomes a weak sequential equilibrium for the corresponding extensive game by adding a belief system achieving consistency of beliefs, and the strategy profile of each weak sequential equilibrium of the corresponding extensive game is a Nash equilibrium of the strategic game.
• Nash’s Existence Theorem. Every game with a finite number of players in which each player can choose
from finitely many pure strategies has at least one Nash equilibrium, which might be a pure or mixed
strategy for each player.
Analysis of War Games
• Let RE and GT denote two parties at war with each other.
• RE prefers to conduct short campaigns raiding villages to capture or destroy tribal military supplies.
• RE can advance on the village either through a forest or across a small lake.
• The defending GT soldiers are sufficient to defend one of the approaches.
• If GT chooses correctly, GT will win the resulting battle.
• If GT chooses incorrectly, RE will capture the village and capture weapons and food.
• The GT soldiers are then faced with a second choice: attack RE in the village, or wait in ambush.
• The success of either option depends on a decision by RE on whether they will return to their base
immediately, or wait until nightfall.
• If GT is waiting in ambush and RE returns by day, then GT wins the resulting battle.
• If GT is waiting in ambush, but RE withdraws at night, then RE successfully escape the ambush.
• If GT attacks, but RE had withdrawn during the day, then RE are successful in their mission.
• But if GT attacks and RE has decided to wait to withdraw, then there is a vicious battle in which both
sides lose.
• RE can also decide not to attack at all, but doing so results in a loss of morale among the soldiers and the removal of the commander.
• In this model, both players lack information.
• Their payoffs are assigned so that positive numbers correspond to an overall “win” for that player and
negative numbers correspond to an overall “loss” for that player.
• The 2 tables below sum this up for RE and then GT.
• Based on the payoffs given above, we can construct the following extensive game tree. Probability values have been labeled below the edges and nodes.
• Just as boxing best-response payoffs helps you find all Nash equilibria and the backward induction algorithm helps you find all subgame perfect equilibria, we need a method for finding all possible weak sequential equilibria.
• This example scenario illustrates a straightforward way this can be done.
• The lack of information is indicated by the dashed line boxes around three pairs of the decision nodes.
• Within each of these boxes, the player who is making the decision does not know which history has
occurred.
• For example, within the GT1 box, when GT is making its decision whether to defend the forest or lake, it
does not know RE’s decision whether it advanced through the forest or over the lake.
• However, at the GT2 node, GT knows that they defended an advance across the lake while RE attacked
the village after advancing through the forest.
• This game has 3 subgames.
• First, the entire game has the empty history as its root.
• Second, the subgame with root (Forest, Lake) which corresponds to the node labeled GT2 and everything to the right of GT2.
• Third, the subgame with root (Lake, Forest) which corresponds to the node labeled GT3 and everything to the right of GT3.
• We first analyze the right-most action choices.
• At the information set RE3 by choosing Day, RE expects payoffs of (1 − r)(0) + r(2) = 2r.
• At the information set RE3 by choosing Night, RE expects payoffs of (1 − r)(2) + r(1) = 2 − r.
• Case 1. Let r > 2/3. Then Day is RE’s unique best response.
• So GT's unique best response at GT3 would be Ambush because a payoff of 1 is larger than a payoff of −2.
• So 𝑔 will be zero.
• Now to achieve consistency of beliefs, 𝑟 must be equal to 𝑔 as can be checked using the consistency equation.
• So r will also be zero.
• Since r > 2/3 and r = 0 are mutually contradictory, this case is rejected and we can conclude that there is no weak
sequential equilibrium for r > 2/3.
• Case 2. Let r < 2/3. Then Night is RE’s unique best response.
• So GT’s unique best response at GT3 would be Attack because a payoff of 0 is larger than a payoff of −2.
• So 𝑔 will be one.
• Again, to achieve consistency of beliefs, it follows that r = 1.
• Since r < 2/3 and r = 1 are mutually contradictory, this case is also rejected and we can conclude that there is no
weak sequential equilibrium for r < 2/3.
• Therefore, r = 2/3, and to achieve consistency of beliefs, 𝑔 = 2/3.
• The physical significance of r = 2/3 is that RE3 is indifferent between choosing Day and Night.
• Similarly, 𝑔 = 2/3 being GT3’s best response strategy implies that GT3 must be indifferent between choosing
Ambush and Attack because if expected payoff of either was more, then GT3 would select that with full
probability.
• GT’s expected payoff from choosing Ambush at GT3 is (1 − d)(1) + d(−2) = 1 − 3d.
• GT’s expected payoff from choosing Attack at GT3 is (1 − d)(−2) + d(0) = 2d − 2.
• But both these expected payoffs must be equal as GT3 is indifferent between choosing Ambush and Attack.
• 1 − 3d = 2d − 2 implies d = 3/5.
• So the expected payoffs at GT3 for the two players are (1 − 2/3)((1 − 3/5)(0, 1) + (3/5)(2, −2)) + (2/3)((1 − 3/5)(2, −2) + (3/5)(1, 0)) = (4/3, −4/5).
• Since the structure and payoffs in the subgames rooted at GT2 and GT3 are identical, in any weak sequential equilibrium we have f = g = 2/3, q = r = 2/3, c = d = 3/5, and the expected payoffs at GT2 are also (4/3, −4/5).
• This results in the truncated game tree shown below.
• Now analyze the GT1 information set.
• The expected payoff to GT when it chooses Lake (e = 0) is (1 − p)(−4/5) + p(2) = 2.8p − 0.8.
• The expected payoff to GT when it chooses Forest (e = 1) is (1 − p)(2) + p(−4/5) = 2 − 2.8p.
• Case 1. Let p > 1/2. Then Lake is GT's unique best response.
• Then RE’s best response at RE1 has to be Forest because the resulting payoff 4/3 is larger than the payoffs of −1
and −2 by choosing Don’t and Lake, respectively.
• So 𝑏 is equal to zero.
• To achieve consistency of beliefs, 𝑝 must be equal to 𝑏 which is zero.
• Since p > 1/2 and p = 0 are mutually contradictory, a weak sequential equilibrium cannot have p > 1/2.
• Case 2. Let p < ½.
• A similar argument shows that a weak sequential equilibrium cannot have p < 1/2.
• Thus, p = 1/2.
• To achieve consistency of beliefs, 𝑎 must be equal to 𝑏.
• If RE chooses 𝑎 = 𝑏 = 0 at weak sequential equilibrium, then RE’s expected payoff by choosing Don’t Attack
≥ RE’s expected payoffs by choosing Forest and Lake.
• RE’s expected payoff by choosing Don’t Attack is −1.
• RE’s expected payoffs by choosing Forest and Lake will be (1 − e)(4/3) + e(−2) and (1 − e)(−2) + e(4/3)
respectively.
• Summing the two inequalities −1 ≥ (1 − e)(4/3) + e(−2) and −1 ≥ (1 − e)(−2) + e(4/3) gives −2 ≥ −2/3.
• This is a contradiction, therefore a weak sequential equilibrium cannot have 𝑎 = 𝑏 = 0.
• Therefore, 𝑎 = 𝑏 > 0.
• So RE1 must be indifferent between choosing Forest and Lake.
• Now RE’s expected payoff from choosing Forest is (1−e)(4/3)+e(−2).
• Also RE’s expected payoff from choosing Lake is (1 − e)(−2) + e(4/3).
• So these two expected payoffs must be equal.
• Thus, e = ½.
• RE's expected payoff at RE1 from choosing Forest or Lake is −1/3.
• −1/3 is greater than −1, which is the expected payoff from choosing Don't Attack.
• So RE discards Don't Attack, and its unique best response at RE1 is a = b = 1/2.
• In conclusion, the assessment satisfying 𝑎 = 𝑏 = 1/2, 𝑐 = 𝑑 = 3/5, 𝑓 = 𝑔 = 2/3, 𝑝 = 1/2, 𝑞 = 𝑟 =
2/3 is the only possible weak sequential equilibrium for this game.
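• The indifference conditions behind this equilibrium can be checked symbolically, as sketched below using the expected payoffs computed above.

```python
# Symbolic check of the indifference conditions derived above.
from sympy import Eq, Rational, solve, symbols

r, d, p, e = symbols("r d p e")

# RE3: Day (2r) vs Night (2 - r).
print(solve(Eq(2*r, 2 - r), r))                    # [2/3]
# GT3: Ambush (1 - 3d) vs Attack (2d - 2).
print(solve(Eq(1 - 3*d, 2*d - 2), d))              # [3/5]
# GT1: Lake (2.8p - 0.8) vs Forest (2 - 2.8p).
print(solve(Eq(Rational(14, 5)*p - Rational(4, 5),
               2 - Rational(14, 5)*p), p))         # [1/2]
# RE1: Forest vs Lake, given GT defends Forest with probability e.
print(solve(Eq((1 - e)*Rational(4, 3) - 2*e,
               -2*(1 - e) + Rational(4, 3)*e), e)) # [1/2]
```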