Decision Analysis I - Martin L. Puterman

BAMS 517
Decision Analysis: A Dynamic Programming Perspective
Martin L. Puterman
UBC Sauder School of Business
Winter Term 2011

Introduction to Decision Analysis - outline

- Course info
- Dynamic decision problem introduction
- Decision problems and decision trees
  - Single decision problems
  - Multiple decision problems
- Probability
- Expected value decision making
- Value of Perfect Information
- Value of Imperfect Information
- Utility and Prospect Theory
- Finite Horizon Dynamic Programming

Some dynamic decision problems

- Assigning customers to tables in a restaurant
- Deciding when to release an auction on eBay
- Choosing the quantity to produce (inventory models)
- Deciding when to start a medical treatment or accept an organ transplant
- Playing Tetris
- Deciding when to add capacity to a system
- Advanced patient scheduling
- Managing a bank of elevators
- Deciding when to replace a car
- Managing a portfolio
- Deciding when to stop a clinical trial
- Guiding a robot to a target
- Playing golf

In each case there is a trade-off between immediate reward and uncertain long-term gain.

Common ingredients of these dynamic decision problems

- The problem persists over time.
- The problem structure remains the same every period.
- Current decisions impact future system behavior probabilistically.
- Current decisions may result in immediate costs or rewards.

These problems are all examples of Markov decision problems (MDPs), also known as stochastic dynamic programs.

- They were first formulated in the 1940s for problems in reservoir management (Masse) and sequential statistical estimation problems (Wald).
- They were formalized in the 1950s by Bellman and Howard.
- The theory was developed between 1960 and 1990.
- They were rediscovered in the 1990s by computer scientists:
  - Reinforcement learning
  - Approximate dynamic programming

Basic Decision Analysis

Decision Analysis

- Goal: to understand how to properly structure, and then solve, decision problems of nearly any type.
  - Structuring the decision problem and obtaining the inputs is usually the hard part.
  - Once the right structure has been found, solving for the best course of action is usually straightforward.
- We will be guided by mathematical and scientific principles. These principles will ensure that:
  - Our decision-making is rational and logically coherent.
  - We choose the best course of action based on our preferences for outcomes and the knowledge available at the time of the decision.
- We might not always be satisfied with the outcome, but we will be confident that the process we used was the best available.

Decision Analysis

- Our analysis tells us what decision a rational person ought to take, not what decision people actually tend to make in the same situation.
  - This is a normative (or prescriptive) analysis, rather than a descriptive analysis.
  - Many studies have shown that people do not always act rationally.
- The methods we introduce provide a framework that translates your preferences for outcomes and your assessments of the likelihood of each consequence into a recipe for action.
  - It places minimal requirements on your preferences and assessments.
  - It does not impose someone else's values in place of your own.
- We begin by exploring how to assess the likelihood of outcomes. We will discuss how to determine your preferences for outcomes in a few classes.

Simple decision problems

- The basic problem is to select an action from a finite set without knowing which outcome will occur.
- In order to decide on the proper action, we need to:
  - Quantify the uncertainty of future events
    - Assign probabilities to the events
  - Evaluate and compare the "goodness" of the possible outcomes
    - Assign utilities to the outcomes
- Once these are in place, we have fully specified the decision problem.

Assessing Probabilities Through Decision Trees

The election stock market problem

- Suppose we are faced with the following opportunity on September 8, 2008:
  - You can pay $.56; if Obama wins the election you receive $1, and if he loses you receive $0.
  - http://iemweb.biz.uiowa.edu/graphs/graph_Pres08_WTA.cfm
- Decision: invest $.56 or do not.
- Uncertain event: Obama wins. Suppose this has probability q.
- The election stock market problem is perhaps the simplest decision problem we will study. It contains, however, all the basic elements of many more complex problems.

The election problem on September 8

Payoffs, with gains in parentheses:

              | Obama wins | Obama loses
Buy 1 share   | 1 (+.44)   | 0 (-.56)
Do not        | 0          | 0

A decision tree for the election problem

[Decision tree: "Buy 1 share" leads to a chance node - Obama wins (probability q): gain $0.44; Obama loses (probability 1-q): gain -$0.56. "Do not invest": gain $0.]

Valuing gambles

- Under certain conditions (to be discussed in class 3) it is advantageous to evaluate gambles by their mathematical expectation.
- For the previous problem the expected value of the gamble would be
  .44 q - .56 (1 - q) = q - .56

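As a quick numerical check, here is a short Python sketch (my addition, not part of the original slides) that computes the expected gain of the $.56 share as a function of q, using the payoffs given above:

```python
def expected_gain(q, price=0.56, payoff=1.0):
    """Expected net gain of buying one share: q*(payoff - price) - (1 - q)*price."""
    return q * (payoff - price) - (1 - q) * price

for q in (0.40, 0.56, 0.70):
    print(f"q = {q:.2f}: expected gain = {expected_gain(q):+.3f}")
# The expected gain is zero exactly when q equals the price, .56.
```
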
Solving the election problem with a reduced problem

We replace the gamble by its expectation (later we will use the expected utility of the gamble):

- Buy 1 share: $(q - .56)
- Do not invest: $0

The election problem solution

- Assume you will choose the decision which maximizes your expected payoff.
- If you invest, your expected payoff is q - .56; if you do not, your expected payoff is 0. Thus if you thought (on September 8) that q, the probability Obama wins, exceeded .56, you would invest; if not, you would not invest.
- You would be indifferent when q = .56.
- On September 8, the consensus (among investors in the Iowa Electronic Stock Market) probability of Obama winning was 0.56. Why?
- Thus an electronic stock market provides an alternative to polls when predicting outcomes of random events, and a method for assessing probabilities.
  - Current markets
  - Wikipedia article (gives a comparison of accuracy relative to polls)

Odds - definitions

- If p is the probability an event occurs, then
  o = p / (1 - p)
  is called the odds of the event occurring.
- Often we consider l = ln(o) = ln(p / (1 - p)), which is called the log-odds or logit of p.
  - Aside: this is a key ingredient in a logistic regression model, ln(p / (1 - p)) = β0 + β1x.
- Thus the odds (on September 8) of Obama winning were
  o = .56 / .44 = 1.27 (to one)

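A small Python helper (my addition, for illustration only) converting between probabilities, odds, and log-odds as just defined:

```python
import math

def odds(p):
    """Odds of an event with probability p: o = p / (1 - p)."""
    return p / (1 - p)

def logit(p):
    """Log-odds (logit) of p: ln(p / (1 - p))."""
    return math.log(odds(p))

def prob_from_odds(o):
    """Invert o = p / (1 - p) to recover p = o / (1 + o)."""
    return o / (1 + o)

p = 0.56
print(odds(p))                   # 1.2727...  (about 1.27 to one)
print(logit(p))                  # 0.2412...
print(prob_from_odds(odds(p)))   # 0.56
```
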
Odds - Examples

- On December 30, 2008, The Globe and Mail gave the following odds for various teams winning the Super Bowl:
  - NY Giants 2 to 1
  - Tennessee Titans 4 to 1
  - Arizona Cardinals 40 to 1
- They have the following meaning in this context:
  - If you bet $1 on the Cardinals (on Dec 30) and they win the Super Bowl, you get back $41 for a net gain of $40.
- The relation of these odds to probabilities can be determined using decision analysis.
  - What is the implied oddsmaker's probability q that Arizona wins the Super Bowl?

A decision tree for the Super Bowl problem

[Decision tree: "Bet $1 on Arizona" leads to a chance node - Arizona wins (probability q): gain $40; Arizona loses (probability 1-q): gain -$1. "Do not bet": gain $0.]

Solving the Super Bowl problem

Replacing the gamble by its expectation:

- Bet on Arizona: $(41q - 1)
- Do not bet: $0

You would be indifferent between the two decisions if q = 1/41, or equivalently 1 - q = 40/41.

Odds and Bookmaker's Odds

- Based on the decision tree and expectations, the probability of winning is 1/41.
- So, using the above definition of odds, the odds of winning are
  oW = (1/41) / (1 - 1/41) = 1/40 (to 1)
- The odds of losing are
  oL = (40/41) / (1 - 40/41) = 40 (to 1)
- Thus quoted odds for sports events are the odds of losing.
- Hence the odds (on December 30) that the Giants don't win the Super Bowl are 2 to 1, and the odds that they win the Super Bowl are 1 to 2.
  - The implied probability the Giants win the Super Bowl is 1/3.
- Another interpretation (courtesy Wikipedia):
  - "Generally, 'odds' are not quoted to the general public in the format (p/1-p) because of the natural confusion with the chance of an event occurring being expressed fractionally as a probability."
  - Example: suppose that you are told to pick a digit from 0 to 9. Then the odds are 9 to 1 against you choosing a 7. One way to think about this interpretation is that there are 10 outcomes: in 1 of them you succeed in picking a 7 and in 9 you don't.
  - This interpretation doesn't work for one-time events like the Super Bowl.
- I'll refer to these as "bookmaker's odds".

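Quoted bookmaker's odds of "X to 1" are odds against winning, so the implied win probability is 1/(X + 1). A short sketch (my addition) applying this to the quotes above:

```python
def implied_prob(odds_against):
    """Implied win probability from bookmaker's odds of 'X to 1' against."""
    return 1.0 / (odds_against + 1.0)

for team, x in [("NY Giants", 2), ("Tennessee Titans", 4), ("Arizona Cardinals", 40)]:
    print(f"{team}: {x} to 1 against -> implied P(win) = {implied_prob(x):.3f}")
# Giants 1/3, Titans 1/5, Cardinals 1/41
```
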
Games of Chance and Odds

- The payout on a successful bet on a single number in roulette is 35 to 1, plus the amount bet.
- The true bookmaker's odds are 37 to 1 against on an American roulette wheel (with 0 and 00), assuming a fair wheel.

A decision tree for a single number bet in roulette

[Decision tree: "Bet $1 on 7" leads to a chance node - ball stops on 7 (probability 1/38): gain $35; ball stops elsewhere (probability 37/38): gain -$1. "Do not bet": gain $0.]

Solving the roulette problem

Replacing the gamble by its expectation:

- Bet $1 on 7: 35(1/38) - 1(37/38) = -$0.0526
- Do not bet: $0

American roulette bets (payout, odds against winning, and expected value):

Bet name | Winning spaces | Payout | Odds against winning | Expected value (on a $1 bet)
0 | 0 | 35 to 1 | 37 to 1 | −$0.053
00 | 00 | 35 to 1 | 37 to 1 | −$0.053
Straight up | Any single number | 35 to 1 | 37 to 1 | −$0.053
Row 00 | 0, 00 | 17 to 1 | 18 to 1 | −$0.053
Split | Any two adjoining numbers, vertical or horizontal | 17 to 1 | 18 to 1 | −$0.053
Trio | 0, 1, 2 or 00, 2, 3 | 11 to 1 | 11.667 to 1 | −$0.053
Street | Any three numbers horizontal (1, 2, 3 or 4, 5, 6 etc.) | 11 to 1 | 11.667 to 1 | −$0.053
Corner | Any four adjoining numbers in a block (1, 2, 4, 5 or 17, 18, 20, 21 etc.) | 8 to 1 | 8.5 to 1 | −$0.053
Five Number Bet | 0, 00, 1, 2, 3 | 6 to 1 | 6.6 to 1 | −$0.079
Six Line | Any six numbers from two horizontal rows (1, 2, 3, 4, 5, 6 or 28, 29, 30, 31, 32, 33 etc.) | 5 to 1 | 5.33 to 1 | −$0.053
1st Column | 1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31, 34 | 2 to 1 | 2.167 to 1 | −$0.053
2nd Column | 2, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35 | 2 to 1 | 2.167 to 1 | −$0.053
3rd Column | 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36 | 2 to 1 | 2.167 to 1 | −$0.053
1st Dozen | 1 through 12 | 2 to 1 | 2.167 to 1 | −$0.053
2nd Dozen | 13 through 24 | 2 to 1 | 2.167 to 1 | −$0.053
3rd Dozen | 25 through 36 | 2 to 1 | 2.167 to 1 | −$0.053
Odd | 1, 3, 5, ..., 35 | 1 to 1 | 1.111 to 1 | −$0.053
Even | 2, 4, 6, ..., 36 | 1 to 1 | 1.111 to 1 | −$0.053
Red | 1, 3, 5, 7, 9, 12, 14, 16, 18, 19, 21, 23, 25, 27, 30, 32, 34, 36 | 1 to 1 | 1.111 to 1 | −$0.053
Black | 2, 4, 6, 8, 10, 11, 13, 15, 17, 20, 22, 24, 26, 28, 29, 31, 33, 35 | 1 to 1 | 1.111 to 1 | −$0.053
1 to 18 | 1, 2, 3, ..., 18 | 1 to 1 | 1.111 to 1 | −$0.053
19 to 36 | 19, 20, 21, ..., 36 | 1 to 1 | 1.111 to 1 | −$0.053

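The expected values in the table can be verified directly: with 38 equally likely pockets, a $1 bet covering n numbers that pays X to 1 has expected value (n/38)·X - (1 - n/38). A quick Python check (my addition):

```python
def roulette_ev(numbers_covered, payout_to_one, pockets=38):
    """Expected net gain of a $1 bet on an American (double-zero) wheel."""
    p_win = numbers_covered / pockets
    return p_win * payout_to_one - (1 - p_win)

print(round(roulette_ev(1, 35), 4))    # straight up:     -0.0526
print(round(roulette_ev(2, 17), 4))    # split:           -0.0526
print(round(roulette_ev(5, 6), 4))     # five number bet: -0.0789
print(round(roulette_ev(18, 1), 4))    # red/black etc.:  -0.0526
```
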
Roulette

- Based on the previous table, every $1 bet in roulette has an expected value of about negative $0.053 (the Five Number Bet is even worse, at about negative $0.079).
- Thus roulette is an unfavorable game.
  - Note: there is research on how to play unfavorable games optimally, based on dynamic programming.
- But if you play many times, and the wheel is fair, you will lose money.
- Why do people play?

Money Lines or "odds sets"

- Another way of expressing odds, used frequently for hockey and baseball betting.
- The Globe and Mail (December 31, 2008):
  - In an NHL game the favorite, Calgary, has a line of -175 and the underdog, Edmonton, has a line of +155.
  - This means that if you want to bet on Calgary, you must bet $175 to win $100, and if you want to bet on Edmonton, you bet $100 to win $155.
  - This implies
    P(Calgary) = 175/275 = 7/11 = .636
    P(Edmonton) = 100/255 = 20/51 = .392
- What's happening? What about ties?
  - Suppose the Calgary probability is correct (and ties are not possible). Then the probability of Edmonton winning should be 1 - .636 = .364, and the money line on Edmonton should be 100 × (.636/.364) ≈ +175.
  - So "the House" is taking about $20 off the payout on a winning Edmonton bet.
  - The same argument for Calgary implies ?
- So again, as in roulette, "the House" takes a premium on every bet by reducing the payoff below the expected value of the gamble.

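A sketch (my addition) of the money-line arithmetic described above: converting a quoted line into an implied win probability and measuring the house's margin.

```python
def implied_prob_from_line(line):
    """Implied win probability from a money line.

    A favorite's line of -175 means bet $175 to win $100;
    an underdog's line of +155 means bet $100 to win $155.
    """
    if line < 0:
        return -line / (-line + 100.0)
    return 100.0 / (line + 100.0)

p_cgy = implied_prob_from_line(-175)   # 0.636
p_edm = implied_prob_from_line(+155)   # 0.392
print(p_cgy, p_edm, "house margin:", p_cgy + p_edm - 1)

# Fair Edmonton line if Calgary's implied probability were exact:
print("fair Edmonton line: about +%.0f" % (100 * p_cgy / (1 - p_cgy)))   # ~ +175
```
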
Assigning probabilities to events

- The uncertainty of an event will be measured according to its probability of occurrence.
- For events that have been repeated several times and regularly observed, it's easy to assign a probability:
  - Outcomes of gambling games:
    - Tossing coins, rolling dice, spinning roulette wheels, etc.
  - Actuarial and statistical events:
    - A 30-year-old female driver having an accident in the next year
    - The chance of rain tomorrow, given today's weather conditions
    - The number of cars driving over the Lion's Gate bridge tomorrow between 8 and 9 AM
    - The number of admits to the emergency room at VGH on January 7, 2009

Assigning probabilities to events

- However, not all events occur with statistical regularity:
  - The uncertainty of an event often derives from a lack of precise knowledge:
    - How many jellybeans are in a jar?
    - Was W.L. MacKenzie King Prime Minister of Canada in 1936?
  - Or there is not much data available:
    - General Motors will be bankrupt by July 1, 2010
    - A Democrat will win the 2012 US presidential election
    - Will a new medical treatment be effective in a specific patient?
- Since these events cannot be repeated in any meaningful way, how can we assign a probability to their occurrence?
  - We can rely on election stock markets or odds if they are available.
  - What if they're not?

Assigning probabilities to events

- It is important to recognize that two different people in the same situation might assign two different probabilities to the same event.
  - A probability assignment reflects your personal assessment of the likelihood of an event - the uncertainty being measured is your uncertainty.
  - Different people may have different knowledge about the event in question.
  - Even people with the same knowledge could still differ in their opinion of the likelihood of an event.
  - Someone could coherently assign a probability of ¼ to a coin coming up heads, if he/she had reason to believe the coin is not fair.
- They are often called subjective probabilities.
  - The assessment of subjective probabilities is a key topic in research on decision analysis (and forecasting).

Assigning probabilities to events

- Example: suppose we wished to assign a probability to the event "A thumbtack lands with its point up". How could we find this probability?
  - We could guess.
  - We can gauge our belief in the likelihood of an event by comparing it to a set of "standard" statistical probabilities through a reference lottery.
- We can compare the following two gambles to assess this probability:
  - Choice A: Toss the thumbtack. If it lands point up, you win $1; otherwise you receive $0.
  - Choice B: Spin the spinner. If it ends on blue, you win $1; otherwise you receive $0.
- We can adjust the portion of the spinner that is blue until we are indifferent between the two choices.
  - A probability spinner provides a way of varying the blue portion systematically.

The implied decision tree

[Decision tree: Choice A - thumbtack lands point up: +1; thumbtack lands tip down: 0. Choice B - spinner lands on blue: +1; spinner lands on red: 0.]

Implications of using a reference lottery

- If the spinner is set so that the probability of blue is .5 and you prefer A to B, then you believe the probability of "thumbtack up" is greater than .5.
- If the spinner is set so that the probability of blue is .9 and you prefer B to A, then you believe the probability of "thumbtack up" is less than .9.
- Repeating this can give a plausible range for the probability of "thumbtack up".
- This is hard to do!
  - There is a big literature on biases of such assignments.
  - Alternatively, we could construct a distribution of plausible values for this probability and the likelihood of each of these values, instead of assigning one number.
  - Or we could input our assessment into the decision problem and do sensitivity analysis.

Another option - acquire information

- Suppose you were faced with Choice A only. What would this gamble be worth?
- One approach: provide a prior distribution on the probability of the event, p.
  - Example: uniformly distributed on [0, 1].
  - Base the decision on the mean, median or mode of this distribution.
- Toss the thumbtack once, and use Bayes' theorem to update this probability.

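The slide leaves the updating mechanics implicit. One standard way to carry it out, sketched here under the assumption that the uniform prior is treated as a Beta(1, 1) distribution (my addition), is a conjugate Beta-Binomial update:

```python
# Beta-Binomial update for the unknown probability p of "point up".
# A Beta(a, b) prior with a = b = 1 is the uniform prior on [0, 1].
def update_beta(a, b, ups, downs):
    """Posterior Beta parameters after observing `ups` point-up tosses and `downs` others."""
    return a + ups, b + downs

a, b = 1, 1                       # uniform prior; prior mean = 1/2
a, b = update_beta(a, b, 1, 0)    # one toss, landed point up
print("posterior mean:", a / (a + b))   # 2/3
```
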
Assigning probabilities to events

- Let E be an event, and let H represent the knowledge and background information used to make a probability judgment. We denote the assigned probability as P(E | H):
  - "The probability of event E given information H"
- We do not consider probabilities as separate from the information used to assess them.
  - This reflects the fact that we consider all probabilities to be based on the judgment of an individual and the individual's knowledge at the time of the assessment.
- Even though we consider probabilities to be based on an individual's judgment, they cannot be arbitrarily assigned.
  - Certain rules must be obeyed for the assignments to be coherent.
  - Using the method outlined above to assign probabilities avoids incoherent assignments.

Axioms of Probability

- The probability assignments P(E | H) must obey the following basic axioms:
  - 0 ≤ P(E | H) ≤ 1
  - (Addition law) Suppose that E1 and E2 are two events that could not both occur together (they are mutually exclusive). Then
    P(E1 or E2 | H) = P(E1 | H) + P(E2 | H)
  - If E1 and E2 are mutually exclusive and collectively exhaustive, then
    P(E1 or E2 | H) = P(E1 | H) + P(E2 | H) = 1
  - (Multiplication law) For any two events E1 and E2,
    P(E1 and E2 | H) = P(E1 | H) P(E2 | E1 and H)
  - If E1 and E2 are independent (i.e., P(E2 | E1 and H) = P(E2 | H)), then
    P(E1 and E2 | H) = P(E1 | H) P(E2 | H)
- These rules can be used to compute probability assignments for complex events based on those for simpler events.

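As a concrete check of the multiplication law (my own illustration, not from the slides), enumerate two fair dice with E1 = "first die shows 6" and E2 = "total is at least 10"; then P(E1 and E2) should equal P(E1) P(E2 | E1):

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))      # 36 equally likely rolls
E1 = [o for o in outcomes if o[0] == 6]               # first die shows 6
E1_and_E2 = [o for o in E1 if sum(o) >= 10]           # ...and total is at least 10

p_E1 = Fraction(len(E1), len(outcomes))               # 6/36
p_E2_given_E1 = Fraction(len(E1_and_E2), len(E1))     # 3/6
p_joint = Fraction(len(E1_and_E2), len(outcomes))     # 3/36

print(p_joint == p_E1 * p_E2_given_E1)                # True
```
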
The law of total probability

- This law can be derived from the axioms and the definition of conditional probability. It says that for any two events A and E,
  P(A | H) = P(A and E | H) + P(A and Ec | H)
           = P(A | E and H) P(E | H) + P(A | Ec and H) P(Ec | H)
- This law is useful because it allows one to divide a complex event into subparts for which it may be easier to assess probabilities.
- Also, it generalizes to more than just conditioning on E and Ec. We can replace them by any set (or continuum) of events that partitions the sample space.
- It is used widely in probability theory to compute complex probabilities and is fundamental for evaluating Markov chains.

Bayes' rule

- This is a very important rule that we will use extensively.
  - It is a way to systematically include information in assessing probabilities.
  - To simplify notation, let's drop the conditioning on H and assume it is understood that all probabilities are conditional on that background information.
- Bayes' rule can be written as
  P(A | B) = P(A and B) / P(B)
           = P(B | A) P(A) / P(B)
           = P(B | A) P(A) / [P(B | A) P(A) + P(B | Ac) P(Ac)]
- It is derived using the definition of conditional probability and the law of total probability.
- It generalizes to any set of events that partitions the sample space.

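A direct transcription of the rule into Python (my addition); the numbers at the bottom are made up purely to exercise the function:

```python
def bayes_posterior(p_A, p_B_given_A, p_B_given_notA):
    """P(A | B) = P(B | A) P(A) / [P(B | A) P(A) + P(B | Ac) P(Ac)]."""
    numerator = p_B_given_A * p_A
    denominator = numerator + p_B_given_notA * (1 - p_A)
    return numerator / denominator

# Hypothetical inputs, only to illustrate the call:
print(bayes_posterior(p_A=0.3, p_B_given_A=0.9, p_B_given_notA=0.2))   # 0.6585...
```
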
Updating probability assessments

- Suppose that you can't see inside a jellybean jar containing only red and black beans, but I tell you that either 25% of the beans are red or 25% are black. You think these possibilities are equally likely.
- Suppose I pick a bean at random. What is the probability it is red?
- Now you draw 5 jellybeans from the jar with replacement, and find that 4 of them are red. How should you revise your belief in the probability that 25% of the beans are red, in light of this information?
- Let A be the event "25% of beans in the jar are red" and let E be the event of drawing 5 beans and obtaining 4 red and 1 black.
- We want to find P(A | E), and we know:
  - P(A) = P(Ac) = 0.5
  - P(E | A) = .75 (.25)^4 = 0.00293
  - P(E | Ac) = .25 (.75)^4 = 0.0791

Updating probability assessments

- Using Bayes' rule, we now compute
  P(A | E) = .00293(0.5) / [.00293(0.5) + .0791(0.5)] = .0357
- Thus, you should now believe that there is about a 3.5% chance that 25% of the jellybeans are red.
  - You also think there is about a 96.5% chance that 25% of the beans are black.
  - Obviously, we have received strong evidence regarding the contents of the jar, since our beliefs have gone from complete uncertainty (50%) to high probability (96.5%).
- Let's look at some of the terms involved in Bayes' rule:
  - The expression P(A | E) is known as the "posterior" probability of A, i.e., the assessed probability of A after we learn that E has occurred.
  - P(A) is known as the "prior" probability of A, i.e., the assessment of the probability of A before the new information was received.
  - P(E | A) is known as the "likelihood" of E occurring given that A is true.

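The arithmetic on this slide can be checked in a few lines (my addition), using the same ordered-sequence likelihoods as the slide:

```python
p_A = 0.5                      # prior probability that 25% of the beans are red
lik_A = 0.75 * 0.25**4         # P(E | A)  = 0.00293 (4 red, 1 black in a fixed order)
lik_Ac = 0.25 * 0.75**4        # P(E | Ac) = 0.0791

posterior_A = lik_A * p_A / (lik_A * p_A + lik_Ac * (1 - p_A))
print(round(posterior_A, 4))   # 0.0357
```
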
Probabilities for single events

- To follow on the previous example, let's now ask what we think the probability is that the next bean drawn from the jar will be red.
- We assign a probability of .965 to there being 75% red beans in the jar, and a probability of .035 to there being 25% red beans in the jar.
  - Let A = "75% of the beans in the jar are red"
  - Let Ac = "25% of the beans in the jar are red"
  - Let B = "The next bean drawn from the jar is red"
- We use the law of total probability to compute P(B):
  P(B) = P(B | A) P(A) + P(B | Ac) P(Ac) = .75(.965) + .25(.035) = .733
- P(B) is called a marginal probability.
- What was this probability before sampling?

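The same marginal probability, again just checking the slide's numbers (my addition):

```python
p_75_red = 0.965                                        # posterior belief that 75% of beans are red
p_next_red = 0.75 * p_75_red + 0.25 * (1 - p_75_red)    # law of total probability
print(round(p_next_red, 4))                             # 0.7325, i.e. about .733
# Before sampling, the same formula with a 0.5/0.5 prior gives 0.5.
```
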
The "Monty Hall" problem

- Monty Hall was the host of the once-popular game show "Let's Make a Deal".
- In the show, contestants were shown three doors, behind each of which was a prize. The contestant chose a door and received the prize behind that door.
- This setup was behind one of the most notorious problems in probability.
- Suppose you are the contestant, and Monty tells you that there is a car behind one of the doors, and a goat behind each of the other doors. (Of course, Monty knows where the car is.)
- Suppose you choose door #1.

The "Monty Hall" problem

- Before revealing what's behind door #1, Monty says "Now I'm going to reveal to you one of the other doors you didn't choose" and opens door #3 to show that there is a goat behind it.
- Monty now says: "Before I open door #1, I'm going to allow you to change your choice. Would you rather that I open door #2 instead, or do you want to stick with your original choice of door #1?"
- What do you do?

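The slides leave the answer for next time; if you want to check your intuition, here is a small simulation sketch (my addition) comparing the "stick" and "switch" strategies:

```python
import random

def play(switch, trials=100_000):
    """Fraction of games won when the contestant always sticks or always switches."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)           # door hiding the car
        pick = random.randrange(3)          # contestant's initial choice
        # Monty opens a goat door that is neither the pick nor the car.
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print("stick: ", play(switch=False))
print("switch:", play(switch=True))
```
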
Summary

- Sequential decision problems
- Decision trees
- Probability assessment, odds and gambling
- Probability updating
- Monty Hall and Hatton Realty for next time
