Belief-type probability

TWO CONCEPTS OF PROBABILITY
So far we have done probability calculations
without asking whether “probability” is a
well-understood concept.
 In fact, probability has been studied
systematically for only 300 years or so.
 Philosophers, mathematicians, logicians
and statisticians disagree about the nature
of probability.
Two fundamental probability ideas:
Belief: probability is a measure of how
certain your beliefs are or should be.
Frequency: probability is the relative
frequency of an outcome on repeated
trials of a chance set-up.
Let’s examine the differences between
these two.
1
(I) “This die is fair. The probability of rolling
a six is 1/6.”
This claim is true or false regardless of what
we know.
If it is true, it is because of some features of
the die.
 It could be explained by reference to the
physical properties of the die.
 We can test this claim by tossing the die
and seeing how often six comes up.
In short: this claim is about the die. It is
about something in the world.
When we speak this way, we are using
“probability” in the frequency sense.
2
There are different ways to think of the
frequency sense of “probability”:
 In the long run, the relative frequency of
sixes will be 1/6.
 The die has a tendency to come up six
1/6 of the time.
 The die has a propensity or disposition
to come up six 1/6 of the time.
 The die has a symmetrical geometric
structure.
We will label these ideas:
Frequency-type probability
3
“Unicorns probably never existed.”
Consider some reasons for this claim:
1. There are no fossil records of
unicorns.
2. Unicorns appear in fanciful myths.
3. No one has ever observed a unicorn.
On this basis one says:
(II) “Considering all the evidence, it is 80%
probable that unicorns never existed.”
This is a statement about the proposition:
Unicorns never existed.
It says that it is 80% likely that this
proposition is true.
4
What are the differences between (I) and
(II)?
1. (II) is not true or false regardless of what
we know. It is about evidence we have.
2. If it is true, it is true because of the
relationship between evidence and a
proposition, not the world.
3. The evidence may be true because of
the way the world is (the lack of fossil
records, for example) but these facts
can’t explain why (II) is true. That
depends on inductive logic.
4. It makes no sense to talk about
repeated trials to test (II).
5. In short: (II) is about the relation
between evidence and a proposition, not
the way the world is.
This is the belief sense of “probability”.
5
Two ways to understand the belief sense
of “probability”:
I. Interpersonal/Evidential:
 Any reasonable person who considered
the evidence would agree with the
statement.
 Probability is objective: the evidence is
evidence for everybody.
II. Personal degree of belief:
 I am personally quite confident that
unicorns never existed.
 If I had to bet, I’d take 4 to 1 odds that
unicorns never existed.
We call both ideas:
Belief-type probability
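The link between the 80% figure in (II) and the 4 to 1 betting odds can be made explicit. The sketch below is only an illustration: the function names are made up for this example, and it assumes the usual convention that the odds in favour of a proposition are p : (1 - p).

```python
from fractions import Fraction

def probability_to_odds(p):
    """Odds in favour of a proposition held with probability p (0 <= p < 1)."""
    p = Fraction(p)
    return p / (1 - p)

def odds_to_probability(odds):
    """Probability corresponding to odds of `odds` to 1 in favour."""
    odds = Fraction(odds)
    return odds / (1 + odds)

belief = Fraction(80, 100)          # the 80% from statement (II)
odds = probability_to_odds(belief)
print(f"{odds.numerator} to {odds.denominator}")  # 4 to 1
```

Exact fractions are used so that 80% converts to 4 to 1 without rounding error.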
6
In sum
The frequency sense of “probability” talks
about the number of times a certain kind of
event will happen in repeated trials.
The belief sense of “probability” talks about
the extent to which evidence supports a
proposition.
7
Take a moment to look back at some of the
examples in the book. Which ones are best
understood as frequency-type examples?
Which as belief-type?
Notice that the calculations remain the
same either way.
The problem is one of philosophical
understanding of the concept “probability”.
8
Important distinction:
Don’t confuse the following:
 What a person says (the proposition
he/she expresses).
 The reasons a person has for believing
a proposition.
Of course, you should have reasons for any
claim you make.
 However, what you say is different from
your reasons for saying it.
So it is false that every statement you make
is a statement about your own beliefs or
reasons.
9
Example: I say: “Unicorns never existed.”
I say this because:
1. I read about the fossil records.
2. I studied Greek mythology.
3. Etc.
These are my reasons.
But the statement I make does not refer to
my beliefs or reasons. It is simply the
proposition that unicorns never existed.
This is a claim about the world, not me.
 Belief-type theories say probability is
about the relation between reasons and
propositions.
 Frequency-type theories say it is about
the world.
10
Evidence
It makes no sense to talk about the
frequency of a single event. Hence, such
claims must be belief-type.
However, we often use frequencies to form
belief-type probability statements about
single cases.
Example: you’ve tossed a certain coin
hundreds of times.
80% of the tosses come up heads.
(Q): What is the probability that this (single)
toss will come up heads?
Answer: 80%.
We use the frequency as evidence to
support the belief that there is an 80%
probability that this toss will come up heads.
11
Frequency Principle
If all you know is the frequency-type
probability of a certain kind of event, then
the belief-type probability that the event
occurs in a single case should equal the
frequency-type probability.
[In most cases we are not completely
ignorant and have to make a judgment.]
12
Imagine you’ve tossed the biased coin 500
times and seen 399 heads.
 You conclude that heads comes up about
80% of the time.
 But you’re not 100% confident.
 You think more trials are needed to be
absolutely sure.
So: you say you are 99% sure that the coin
comes up heads about 80% of the time.
Does this make sense? Yes:
Your statement expresses your belief-type
probability in a frequency-type probability.
I.e., it expresses your confidence that
you’ve identified the right frequency.
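One way to put a number on "confidence that you've identified the right frequency" is a Bayesian calculation. The sketch below is an illustration under assumptions that are not in the slides: a uniform prior over the coin's bias, and "about 80%" read as "between 75% and 85%". The function names are hypothetical.

```python
import math

def log_beta_pdf(x, a, b):
    """Log density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x)

def credence_bias_between(lo, hi, heads, tosses, steps=20000):
    """Belief-type probability that the heads frequency lies in [lo, hi].

    Assumes a uniform prior on the bias, so the posterior after the
    observed tosses is Beta(heads + 1, tails + 1). The integral is
    approximated with the midpoint rule.
    """
    a = heads + 1
    b = (tosses - heads) + 1
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width
        total += math.exp(log_beta_pdf(x, a, b))
    return total * width

confidence = credence_bias_between(0.75, 0.85, heads=399, tosses=500)
print(f"Credence that the bias is between 75% and 85%: {confidence:.3f}")
```

With a different prior or a narrower reading of "about 80%" the resulting confidence would differ, which is the point: it is a belief-type probability about a frequency-type probability.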
13
THEORIES OF PROBABILITY
For most calculations, belief-type and
frequency-type theorists agree.
We shall take both ideas seriously (chapters
13-15 explore belief-type, 16-19 frequency-type).
But let us take a closer look at some of the
theories.
14
Belief-Type Probability
1. Logical Probability
Endorsed by: J. M. Keynes and R. Carnap.
Probability statements express a logical
relation between evidence and a
proposition.
On this view, probability is always relative to
evidence (whether or not this is explicit).
There is no categorical probability, only
conditional probability, i.e. Pr(H/E).
 In other words, no proposition has a
probability on its own. It gains probability
in relation to evidence.
Let’s take a moment to see how this might
be set up.
15
Logic and evidence
Suppose there are three objects, a, b and c.
We consider whether they have some
property, F.
There are eight logically possible
combinations. We’ll call these ‘states’:
1. Fa & Fb & Fc
2. ~Fa & Fb & Fc
3. Fa & ~Fb & Fc
4. Fa & Fb & ~Fc
5. ~Fa & ~Fb & Fc
6. ~Fa & Fb & ~Fc
7. Fa & ~Fb & ~Fc
8. ~Fa & ~Fb & ~Fc
So far so good, but note that there are
various ways we can organize this list.
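The eight states can also be generated mechanically. The following is a small sketch (the variable and function names are made up for illustration) that lists the states and shows one way of organizing them, namely by how many objects are F. The order in which the states are produced differs from the numbering on the slide.

```python
from itertools import product

objects = ("a", "b", "c")

# Each state assigns True (F) or False (~F) to every object.
states = list(product([True, False], repeat=len(objects)))

def describe(state):
    """Render a state in the slide's notation, e.g. 'Fa & ~Fb & Fc'."""
    return " & ".join(("F" if has_f else "~F") + name
                      for name, has_f in zip(objects, state))

# Group the states by the number of objects that are F.
by_count = {}
for state in states:
    by_count.setdefault(sum(state), []).append(describe(state))

for count in sorted(by_count, reverse=True):
    print(count, "objects are F:", by_count[count])
```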
16
Structure descriptions
We can focus on broad structure as follows:
 Each object is F: (1)
 Two objects are F, one is ~F: (2, 3, 4)
 One object is F, two are ~F: (5, 6, 7)
 Nothing is F: (8)
Carnap suggests we treat the situation as
follows:
1. Give each structure an equal weight
(1/4)
2. Partition the states equally within each
structure (e.g. each of the three states in a
three-state structure gets 1/4 x 1/3 = 1/12)
This leads to the following:
17
Carnap’s partition
State            Structure              Weight   p*
1. Fa.Fb.Fc      I. Each is F           1/4      1/4
2. ~Fa.Fb.Fc     II. Two Fs, one ~F     1/4      1/12
3. Fa.~Fb.Fc                                     1/12
4. Fa.Fb.~Fc                                     1/12
5. ~Fa.~Fb.Fc    III. One F, two ~Fs    1/4      1/12
6. ~Fa.Fb.~Fc                                    1/12
7. Fa.~Fb.~Fc                                    1/12
8. ~Fa.~Fb.~Fc   IV. Nothing is F       1/4      1/4
18
Logic and learning from experience
Let P = Fc (i.e. c has property F)
Let Q = Fa
Now we can ask, what is Pr(P/Q)?
Pr(P/Q) = Pr(P & Q)/Pr(Q)
 Pr(Q) = ¼ + 1/12 + 1/12 + 1/12 = ½
 Pr(P & Q) = ¼ + 1/12 = 4/12 = 1/3
So, Pr(P/Q) = (1/3)/½ = 2/3
Since Pr(P) = ½ we have added
confirmation to our belief that c is F, so this
models learning from experience.
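The arithmetic above can be checked by brute force over the eight states. This is a sketch with hypothetical helper names (`p_star`, `pr`); it assumes only the weighting described in the slides: 1/4 per structure, shared equally among the states in that structure.

```python
from fractions import Fraction
from itertools import product

# A state is a tuple (Fa, Fb, Fc) of booleans.
states = list(product([True, False], repeat=3))

def p_star(state):
    """Weight of a state: 1/4 per structure, split equally among its states."""
    same_structure = [s for s in states if sum(s) == sum(state)]
    return Fraction(1, 4) / len(same_structure)

def pr(event, weight):
    """Probability of an event (a predicate on states) under a weight function."""
    return sum(weight(s) for s in states if event(s))

def Q(s):
    return s[0]   # Fa

def P(s):
    return s[2]   # Fc

pr_q = pr(Q, p_star)
pr_p_and_q = pr(lambda s: P(s) and Q(s), p_star)
print(pr(P, p_star), pr_q, pr_p_and_q, pr_p_and_q / pr_q)  # 1/2 1/2 1/3 2/3
```

Learning that a is F raises the probability that c is F from 1/2 to 2/3, as in the calculation on the slide.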
Note:
 Even Pr(Q) is really conditional (i.e.
given this particular arrangement of
individuals and properties).
 This is an interpersonal, evidential
theory of probability.
19
Some concerns
Why focus on structure? Why not give each
state an equal probability of 1/8? Call this
partition p.
We can certainly do so. In that case:
 Pr(P & Q) = 1/8 + 1/8 = ¼
 Pr(Q) = ½
So, Pr(P/Q) = (1/4)/½ = ½.
In other words, the probability doesn’t
change given the new evidence, so there is
no learning here.
But isn’t this how it should be? There are
four states that will give rise to Fa, and of
these only two also have Fc.
So how do we assign probabilities?
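The equal-weight calculation can be checked in the same brute-force way. The sketch below uses made-up helper names, and the second check (conditioning on both Fa and Fb) is an addition that does not appear in the slides; it suggests that under p no amount of evidence about a and b changes the probability that c is F.

```python
from fractions import Fraction
from itertools import product

states = list(product([True, False], repeat=3))  # (Fa, Fb, Fc)

def p(state):
    """Equal weight for every state (1/8 each)."""
    return Fraction(1, len(states))

def conditional(hypothesis, evidence):
    """Pr(hypothesis / evidence) under the equal-weight partition p."""
    pr_e = sum(p(s) for s in states if evidence(s))
    pr_h_and_e = sum(p(s) for s in states if hypothesis(s) and evidence(s))
    return pr_h_and_e / pr_e

def fc(s):
    return s[2]

given_fa = conditional(fc, lambda s: s[0])
given_fa_and_fb = conditional(fc, lambda s: s[0] and s[1])
print(given_fa, given_fa_and_fb)  # 1/2 1/2
```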
20
Different distributions
The difference between p and p*
corresponds to the difference between
Maxwell-Boltzmann statistics and
Bose-Einstein statistics used in physics.
Of interest:
 No particles obey M-B statistics but
photons obey B-E statistics.
 We can partition states up yet differently
to get Fermi-Dirac statistics, which
electrons obey
So, is any particular partition the right one?
How can the logical theorist decide?
21
Principle of Insufficient Reason:
Here is an interesting question: what if there
is no relevant evidence? In that case, how
do we understand the logical theory?
Keynes proposes the following principle:
If there is no reason (evidence) to favour
one alternative over any other, they should
each be treated as equally probable.
If there are n outcomes and no evidence in
favour of any one of them, the probability of
each should be 1/n.
This is also called the Principle of
Indifference.
22
This principle makes sense in many cases.
E.g.: you don’t know which side of a coin
has come up (it’s underneath someone’s
hand). You assign a probability of ½ to
each outcome, heads/tails.
However, the principle can lead to
problems.
For example: there are two alternatives:
either the car behind the garage door is red
or it is not red. Would you assign a
probability of ½ to each?
No! Since there are so many colours for
cars, the chance of one being red is < ½.
23
Bertrand’s Paradox
Here is a more difficult problem associated
with the principle of indifference.
 Imagine a circle (of radius R) with an
inscribed equilateral triangle (XYZ).
 Mark the centre of the circle ‘O’.
 Drop a line from vertex Y, perpendicular
to side XZ, until it hits the far side of the
circle. It crosses XZ at point W.
 It follows that XW = ZW and angle OWZ = 90°.
This is the result:
24
The picture
Randomly select a chord of the circle
 (Reminder: chord = an interior line from
one point on the circumference to
another).
What is the probability that the randomly
selected chord is longer than the side of the
inscribed triangle? I.e. what is Pr(CLSE),
where CLSE = chord longer than side of
equilateral triangle?
25
The Challenge
Even if we accept the principle of
indifference, there seem to be three equally
valid ways of answering this question.
26
First solution
AB is the random chord.
 Let OW hit the circle at C (OC = R):
The chord AB is longer than the side if
OW<R/2, shorter if OW>R/2.
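The R/2 threshold follows from two standard geometric facts, sketched here for reference: the side of an equilateral triangle inscribed in a circle of radius R has length √3·R, and a chord whose mid-point lies at distance d from the centre has length 2√(R² − d²). The symbol d (for the distance OW) is introduced only for this derivation.

```latex
|AB| = 2\sqrt{R^2 - d^2} > \sqrt{3}\,R
\iff 4\,(R^2 - d^2) > 3R^2
\iff d^2 < \frac{R^2}{4}
\iff d < \frac{R}{2}
```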
27
50-50
Since the chord is selected randomly, there
is a 50% chance that it crosses OC on one side
of its mid-point and a 50% chance that it crosses
OC on the other side of its mid-point.
So, we should remain indifferent between
the two.
Answer: Pr(CLSE) = ½ = 0.5
28
Second solution
Draw a tangent to the circle at one of the
vertices of the inscribed triangle and let θ be
the angle between AB and the tangent:
29
Indifference again
There are three 60° regions in which θ
might be found.
Since we selected the chord at random,
there is an equal probability it will be in
each.
Therefore: Pr(CLSE) = 1/3
30
Third solution
Inscribe a circle with radius one half that of
the main circle:
AB will be longer than the side of the
triangle if and only if its centre (W) lies
within the inner circle.
31
Indifference yet again
AB is selected at random so we have no
reason to suppose it is more likely to occur
at one point in the main circle than any
other.
So, the probability that it is within the
smaller circle is equal to the area of that
circle compared to that of the larger circle.
So, Pr(CLSE) = (R2/4)/R2 = ¼.
32
Upshot
The principle of indifference, central to the
logical theory, gives three different yet
equally persuasive answers to the original
question.
But if the principle of indifference is flawed,
so is the logical theory.
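The three answers can be reproduced numerically. The sketch below is a Monte Carlo illustration, not part of the original argument: each of the three ways of picking a "random" chord is simulated on a unit circle (R = 1, so the triangle's side is √3). The function name, the seed and the number of trials are arbitrary choices made for this example.

```python
import math
import random

def simulate_bertrand(trials=200_000, seed=0):
    """Estimate Pr(CLSE) under three ways of choosing a 'random' chord."""
    rng = random.Random(seed)
    side = math.sqrt(3)  # side of the inscribed triangle when R = 1
    hits = {"radius": 0, "angle": 0, "area": 0}
    for _ in range(trials):
        # 1. Chord mid-point uniform along a radius: distance d in [0, 1).
        d = rng.random()
        if 2 * math.sqrt(1 - d * d) > side:
            hits["radius"] += 1
        # 2. Angle between chord and tangent uniform between 0 and 180 degrees.
        theta = rng.uniform(0, math.pi)
        if 2 * math.sin(theta) > side:
            hits["angle"] += 1
        # 3. Chord mid-point uniform over the disc (rejection sampling).
        while True:
            x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
            if x * x + y * y < 1:
                break
        if 2 * math.sqrt(1 - (x * x + y * y)) > side:
            hits["area"] += 1
    return {name: count / trials for name, count in hits.items()}

estimates = simulate_bertrand()
for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

The three estimates should land near 1/2, 1/3 and 1/4 respectively, so the disagreement comes from the choice of sampling procedure and not from a calculation error.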
33
Possible responses
Keynes: none of the alternative possibilities
should be ‘divisible’ into anything that has
the same form as another possibility.
Consider the car example. We used the
alternatives red and not-red.
 But not-red = blue, green, yellow…
 These are of the same form as red.
 So not-red is divisible and should be
ruled out.
Instead, every possibility should be a colour,
and now Pr(red) < ½, which seems right.
34
Problem
This solution can’t handle continuum cases.
Example: what is the probability that a
randomly selected point is found exactly
halfway between 0 and 1 on a number line?
Problem: there are infinitely many indivisible
alternatives (it is at 0.1, 0.01, 0.001…).
So, the principle tells us that Pr(0.5) = 1/∞.
What does that mean?
 Also: what exactly does indivisible
mean?
35
An objection ‘from authority’
Ramsey: there’s no such thing as this
logical relation!
36
Upshot
Other solutions to the paradoxes of
indifference have been suggested.
E.g.: the principle is a guideline or heuristic.
 However, by and large the logical theory
has been rejected (a heuristic isn’t a
logical principle).
Options:
 Those who favour belief-type probability
tend to argue for the personal theory
(Gillies calls this the subjective theory).
 Others prefer frequency approaches.
37
2. Personal Probability
Endorsed by: F. P. Ramsey, B. de Finetti, L.
J. Savage. It is the view that:
Probability claims state your personal
confidence or degree of belief in a
proposition (what you would bet on).
Objection: this is a “subjective”, and so useless, approach.
Response: If you want your beliefs to be
consistent then they must obey the basic
rules of probability.
 Hence, even personal probability should
be guided by the calculations, including
Bayes’ Theorem.
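The consistency response is often backed by a "sure-loss" (Dutch book) argument. The following is a minimal sketch with made-up numbers: someone whose betting rates for a proposition A and for not-A add up to more than 1 can be sold a pair of bets that loses money whatever happens, while rates that obey the basic rules cannot be exploited this way. The function name and the rates are hypothetical.

```python
from fractions import Fraction

def net_payoffs(rate_a, rate_not_a, stake=1):
    """Agent's net gain in each outcome after buying a bet on A and a bet on not-A.

    A bet bought at rate r costs r * stake and pays `stake` if it wins.
    The rates stand for the agent's personal probabilities.
    """
    cost = (rate_a + rate_not_a) * stake
    return {
        "A true": stake - cost,    # only the bet on A pays out
        "A false": stake - cost,   # only the bet on not-A pays out
    }

incoherent = net_payoffs(Fraction(6, 10), Fraction(6, 10))  # rates sum to 1.2
coherent = net_payoffs(Fraction(6, 10), Fraction(4, 10))    # rates sum to 1
print(incoherent)
print(coherent)
```

With the incoherent rates the agent loses 1/5 of the stake in both outcomes; with the coherent rates the net gain is zero in both.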
Objection: what if one doesn’t care about
consistency?
Response: This amounts to irrationality (?).
38
Overview of frequency-type theories
3. Limiting Frequency
The view of J. Venn, R. von Mises:
Probability statements give the relative
frequency of an outcome “in the long run”.
 I.e. Pr(E) is the limit to which the relative
frequency of E-occurrences converges as the
number of trials goes to infinity.
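A finite simulation cannot establish a limit, but it can illustrate the idea. The sketch below simulates rolls of a fair die with Python's pseudo-random generator and reports the relative frequency of sixes at a few checkpoints. The function name, seed and checkpoints are arbitrary choices for this example.

```python
import random

def running_frequency_of_six(checkpoints, seed=0):
    """Relative frequency of sixes after each checkpoint number of simulated rolls."""
    rng = random.Random(seed)
    results = {}
    sixes = 0
    rolls = 0
    for target in sorted(checkpoints):
        while rolls < target:
            sixes += rng.randint(1, 6) == 6   # True counts as 1
            rolls += 1
        results[target] = sixes / rolls
    return results

frequencies = running_frequency_of_six([10, 100, 1_000, 10_000, 100_000])
for n, freq in frequencies.items():
    print(f"{n:>7} rolls: {freq:.4f}")
```

The early frequencies can be far from 1/6; the later ones should settle close to it.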
Gambling systems are ruled out: the outcomes
are a matter of probability, not determined in advance.
 Random sequence = one for which any program
that can generate the sequence is at least as long
as the sequence itself.
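The "shortest program" idea cannot be computed directly, but compression gives a crude feel for it. In the sketch below, the compressed size under zlib is used as a rough stand-in for program length; this proxy is an assumption made for illustration. Note also that the "irregular" sequence comes from a pseudo-random generator, so strictly speaking a short program does generate it; zlib simply cannot find that program.

```python
import random
import zlib

def compressed_size(data):
    """Bytes zlib needs to encode the data (a rough stand-in for program length)."""
    return len(zlib.compress(data, 9))

rng = random.Random(0)
length = 10_000
patterned = b"HT" * (length // 2)                             # HTHTHT...
irregular = bytes(rng.choice(b"HT") for _ in range(length))   # pseudo-random tosses

print("patterned:", compressed_size(patterned))
print("irregular:", compressed_size(irregular))
```

The patterned sequence should compress to a tiny fraction of its length, while the irregular one should not.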
Objections:
 Ignores underlying causes of frequency.
 In reality, no such idealization exists: “in
the long run we’re all dead” (Keynes).
39
4. Propensity
Endorsed/invented by K. Popper:
 Emphasizes underlying cause of observed
frequency.
 Underlying structure of a system exists
even if no trials are ever made.
The system still has a tendency or
disposition to give a certain frequency in the
long run.
 This property is called the system’s
propensity.
 This interests us when we investigate
probabilistic systems in nature (e.g. quantum mechanics).
Objection: we must often make probability
judgments without knowing the underlying
structure (e.g. economics, psychology).
40
Expected Value
Coke machine:
 Only works 95% of the time.
 So, Coke charges $0.95 instead of $1.
Frequency-dogmatists:
 That’s a fair price
 Most of the time you save a nickel, only
occasionally do you lose 95 cents.
 So, the expected value is zero.
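The "expected value is zero" claim can be spelled out. The sketch below assumes, as the nickel and 95-cent figures suggest, that a delivered Coke is worth $1 to the buyer and that the price is lost when the machine fails. The function name is made up for this example.

```python
from fractions import Fraction

def expected_gain(price, p_works, value_of_coke=Fraction(1)):
    """Buyer's expected gain per purchase, in dollars.

    Assumes a delivered Coke is worth `value_of_coke` to the buyer and
    that the price is lost when the machine fails.
    """
    gain_if_works = value_of_coke - price
    gain_if_fails = -price
    return p_works * gain_if_works + (1 - p_works) * gain_if_fails

p_works = Fraction(95, 100)
print(expected_gain(Fraction(95, 100), p_works))  # 0
print(expected_gain(Fraction(1), p_works))        # -1/20
```

At $0.95 the expected gain is exactly zero; at the full $1 the buyer would lose five cents per purchase on average.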
Belief-dogmatists:
 Frequency-types get it wrong.
 Who cares about the long-run frequency
of getting a coke?
 We care about getting a coke now.
 Since it’s 95% probable that I get a coke
now, a price of $0.95 is fair:
 Expected value = zero.
 It would be unfair to be forced to use this
machine even if it is “fair” in the long run.
41
Why so many theories?
At first, people thought a lot about dice,
lotteries, urns, etc.
 In these simple models, the differences
don’t matter: all theories give the same
results.
Only as models get more sophisticated do
differences start to appear.
Therefore, as more and more people started
to think about probability, two families of
ideas emerged from the simple models:
frequency-type and belief-type.
To this day, experts continue to fight for
their preferred interpretations.
42
Homework
 Do the exercises at the end of chapters 11
and 12.
43