TWO CONCEPTS OF PROBABILITY

So far we have done probability calculations without asking whether “probability” is a well-understood concept. In fact, probability has been studied systematically for only 300 years or so. Philosophers, mathematicians, logicians and statisticians disagree about the nature of probability.
Two fundamental probability ideas:
Belief: probability is a measure of how certain your beliefs are or should be.
Frequency: probability is the relative frequency of an outcome on repeated trials of a chance set-up.
Let’s examine the differences between these two.

(I) “This die is fair. The probability of rolling a six is 1/6.”
This claim is true or false regardless of what we know. If it is true, it is because of some features of the die. It could be explained by reference to the physical properties of the die. We can test this claim by tossing the die and seeing how often six comes up.
In short: this claim is about the die. It is about something in the world. When we speak this way, we are using “probability” in the frequency sense.

There are different ways to think of the frequency sense of “probability”:
In the long run, the relative frequency of sixes will be 1/6.
The die has a tendency to come up six 1/6 of the time.
The die has a propensity or disposition to come up six 1/6 of the time.
The die has a symmetrical geometric structure.
We will label these ideas: Frequency-type probability

“Unicorns probably never existed.”
Consider some reasons for this claim:
1. There are no fossil records of unicorns.
2. Unicorns appear in fanciful myths.
3. No one has ever observed a unicorn.
On this basis one says:
(II) “Considering all the evidence, it is 80% probable that unicorns never existed.”
This is a statement about the proposition: Unicorns never existed. It says that it is 80% likely that this proposition is true.

What are the differences between (I) and (II)?
1. (II) is not true or false regardless of what we know.
It is about evidence we have.
2. If it is true, it is true because of the relationship between evidence and a proposition, not the world.
3. The evidence may be true because of the way the world is (the lack of fossil records, for example) but these facts can’t explain why (II) is true. That depends on inductive logic.
4. It makes no sense to talk about repeated trials to test (II).
5. In short: (II) is about the relation between evidence and a proposition, not the way the world is. This is the belief sense of “probability”.

Two ways to understand the belief sense of “probability”:
I. Interpersonal/Evidential: Any reasonable person who considered the evidence would agree with the statement. Probability is objective: the evidence is evidence for everybody.
II. Personal degree of belief: I am personally quite confident that unicorns never existed. If I had to bet, I’d take 4 to 1 odds that unicorns never existed.
We call both ideas: Belief-type probability

In sum
The frequency sense of “probability” talks about the number of times a certain kind of event will happen in repeated trials.
The belief sense of “probability” talks about the extent to which evidence supports a proposition.

Take a moment to look back at some of the examples in the book. Which ones are best understood as frequency-type examples? Which as belief-type? Notice that the calculations remain the same either way. The problem is one of philosophical understanding of the concept “probability”.

Important distinction
Don’t confuse the following:
What a person says (the proposition he/she expresses).
The reasons a person has for believing a proposition.
Of course, you should have reasons for any claim you make. However, what you say is different from your reasons for saying it. So it is false that every statement you make is a statement about your own beliefs or reasons.

Example
I say: “Unicorns never existed.”
I say this because:
1. I read about the fossil records.
2. I studied Greek mythology.
3. Etc.
These are my reasons. But the statement I make does not refer to my beliefs or reasons. It is simply the proposition that unicorns never existed. This is a claim about the world, not me.
Belief-type theories say probability is about the relation between reasons and propositions. Frequency-type theories say it is about the world.

Evidence
It makes no sense to talk about the frequency of a single event. Hence, probability claims about single events must be belief-type. However, we often use frequencies to form belief-type probability statements about single cases.
Example: you’ve tossed a certain coin hundreds of times. 80% of the tosses come up heads.
(Q): What is the probability that this (single) toss will come up heads?
Answer: 80%. We use the frequency as evidence to support the belief that there is an 80% probability that this toss will come up heads.

Frequency Principle
If all you know is the frequency-type probability of a certain kind of event, then the belief-type probability that the event occurs in a single case should equal the frequency-type probability.
[In most cases we are not completely ignorant and have to make a judgment.]

Imagine you’ve tossed the biased coin 500 times and seen 399 heads. You conclude that heads comes up about 80% of the time. But you’re not 100% confident. You think more trials are needed to be absolutely sure. So: you say you are 99% sure that the coin comes up heads about 80% of the time.
Does this make sense? Yes: Your statement expresses your belief-type probability in a frequency-type probability. I.e., it expresses your confidence that you’ve identified the right frequency.

THEORIES OF PROBABILITY

For most calculations, belief-type and frequency-type theorists agree. We shall take both ideas seriously (chapters 13-15 explore belief-type, 16-19 frequency-type). But let us take a closer look at some of the theories.

Belief-Type Probability
1. Logical Probability
Endorsed by: J. M. Keynes and R. Carnap.
Probability statements express a logical relation between evidence and a proposition. On this view, probability is always relative to evidence (whether or not this is explicit). There is no categorical probability, only conditional probability, i.e. Pr(H/E). In other words, no proposition has a probability on its own. It gains probability in relation to evidence. Let’s take a moment to see how this might be set up.

Logic and evidence
Suppose there are three objects, a, b and c. We consider whether they have some property, F. There are eight logically possible combinations. We’ll call these ‘states’:
1. Fa & Fb & Fc
2. ~Fa & Fb & Fc
3. Fa & ~Fb & Fc
4. Fa & Fb & ~Fc
5. ~Fa & ~Fb & Fc
6. ~Fa & Fb & ~Fc
7. Fa & ~Fb & ~Fc
8. ~Fa & ~Fb & ~Fc
So far so good, but note that there are various ways we can organize this list.

Structure descriptions
We can focus on broad structure as follows:
Each object is F: (1)
Two objects are F, one is ~F: (2, 3, 4)
One object is F, two are ~F: (5, 6, 7)
Nothing is F: (8)
Carnap suggests we treat the situation as follows:
1. Give each structure an equal weight (1/4).
2. Divide each structure’s weight equally among its states (e.g. 1/4 × 1/3 = 1/12 for each state in a three-state structure).
This leads to the following:

Carnap’s partition
State              Structure               Weight   p*
1. Fa.Fb.Fc        I. Each is F            1/4      1/4
2. ~Fa.Fb.Fc       II. Two Fs, one ~F      1/4      1/12
3. Fa.~Fb.Fc                                        1/12
4. Fa.Fb.~Fc                                        1/12
5. ~Fa.~Fb.Fc      III. One F, two ~Fs     1/4      1/12
6. ~Fa.Fb.~Fc                                       1/12
7. Fa.~Fb.~Fc                                       1/12
8. ~Fa.~Fb.~Fc     IV. Nothing is F        1/4      1/4

Logic and learning from experience
Let P = Fc (i.e. c has property F)
Let Q = Fa
Now we can ask, what is Pr(P/Q)?
Pr(P/Q) = Pr(P & Q)/Pr(Q)
Pr(Q) = 1/4 + 1/12 + 1/12 + 1/12 = 1/2
Pr(P & Q) = 1/4 + 1/12 = 4/12 = 1/3
So, Pr(P/Q) = (1/3)/(1/2) = 2/3
Since Pr(P) = 1/2, the evidence Q has added confirmation to our belief that c is F, so this models learning from experience.
Note: Even Pr(Q) is really conditional (i.e. given this particular arrangement of individuals and properties).
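The calculation above can be checked mechanically. What follows is a minimal Python sketch, not part of the original lecture: the names STATES, carnap_weights, pr and pr_given are hypothetical, and a state is assumed to be a triple of truth values for a, b and c in that order. Exact fractions are used so the results match the hand calculation.

```python
from fractions import Fraction
from itertools import product

# A state assigns True (F) or False (~F) to the objects a, b, c, in that order.
STATES = list(product([True, False], repeat=3))

def carnap_weights(states):
    """p*: equal weight per structure (number of Fs), shared equally among its states."""
    structures = {}
    for s in states:
        structures.setdefault(sum(s), []).append(s)
    share = Fraction(1, len(structures))  # 1/4 for each of the four structures
    return {s: share / len(group) for group in structures.values() for s in group}

def pr(event, weights):
    """Probability of an event = total weight of the states in which it holds."""
    return sum(w for s, w in weights.items() if event(s))

def pr_given(event, evidence, weights):
    """Conditional probability Pr(event / evidence)."""
    return pr(lambda s: event(s) and evidence(s), weights) / pr(evidence, weights)

P = lambda s: s[2]  # Fc: c has property F
Q = lambda s: s[0]  # Fa: a has property F

p_star = carnap_weights(STATES)
print(pr(P, p_star))           # 1/2
print(pr_given(P, Q, p_star))  # 2/3
```

Passing a different weights dictionary to pr_given gives the conditional probability under another weighting of the same eight states.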
This is an interpersonal, evidential theory of probability.

Some concerns
Why focus on structure? Why not give each state an equal probability of 1/8? Call this partition p. We can certainly do so. In that case:
Pr(P & Q) = 1/8 + 1/8 = 1/4
Pr(Q) = 1/2
So, Pr(P/Q) = (1/4)/(1/2) = 1/2.
In other words, the probability doesn’t change given the new evidence, so there is no learning here. But isn’t this how it should be? There are four states that will give rise to Fa, and of these only two also have Fc. So how do we assign probabilities?

Different distributions
The difference between p and p* corresponds to the difference between Maxwell-Boltzmann statistics and Bose-Einstein statistics used in physics.
Of interest: No particles obey M-B statistics but photons obey B-E statistics. We can partition the states differently again to get Fermi-Dirac statistics, which electrons obey.
So, is any particular partition the right one? How can the logical theorist decide?

Principle of Insufficient Reason
Here is an interesting question: what if there is no relevant evidence? In that case, how do we understand the logical theory? Keynes proposes the following principle:
If there is no reason (evidence) to favour one alternative over any other, they should each be treated as equally probable. If there are n outcomes and no evidence in favour of any one of them, the probability of each should be 1/n.
This is also called the Principle of Indifference.

This principle makes sense in many cases. E.g.: you don’t know which side of a coin has come up (it’s underneath someone’s hand). You assign a probability of ½ to each outcome, heads/tails.
However, the principle can lead to problems. For example: there are two alternatives: either the car behind the garage door is red or it is not red. Would you assign a probability of ½ to each? No! Since there are so many colours for cars, the chance of one being red is < ½.
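The car example shows that the principle’s answer depends on how the alternatives are carved up. A small Python sketch of this point follows; the function name indifference and the six-colour list are assumptions made purely for illustration and do not come from the lecture.

```python
from fractions import Fraction

def indifference(alternatives):
    """Principle of Indifference: give each of n alternatives probability 1/n."""
    n = len(alternatives)
    return {a: Fraction(1, n) for a in alternatives}

# Two ways of listing the alternatives for the same car.
coarse = indifference(["red", "not red"])
# Hypothetical colour list, chosen only for illustration.
fine = indifference(["red", "blue", "green", "yellow", "black", "white"])

print(coarse["red"])  # 1/2
print(fine["red"])    # 1/6
```

The same rule yields different values of Pr(red) for the two listings, so on its own it does not say which listing to use.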
Bertrand’s Paradox
Here is a more difficult problem associated with the principle of indifference.
Imagine a circle (of radius R) with an inscribed equilateral triangle (XYZ). Mark the centre of the circle ‘O’. Drop a line from vertex Y, perpendicular to side XZ, until it hits the far side of the circle. It crosses XZ at point W. It follows that XW = ZW and angle OWZ = 90°.
This is the result:

The picture
Randomly select a chord of the circle (Reminder: chord = an interior line from one point on the circumference to another).
What is the probability that the randomly selected chord is longer than the side of the inscribed triangle? I.e. what is Pr(CLSE)?

The Challenge
Even if we accept the principle of indifference, there seem to be three equally valid ways of answering this question.

First solution
AB is the random chord. Let OW hit the circle at C (OC = R):
The chord AB is longer than the side if OW < R/2, shorter if OW > R/2.

50-50
Since the chord is selected randomly, there is a 50% chance that it crosses OC on one side of its mid-point and a 50% chance that it crosses OC on the other side of its mid-point. So, we should remain indifferent between the two.
Answer: Pr(CLSE) = ½ = 0.5

Second solution
Draw a tangent to the circle at one of the vertices of the inscribed triangle and consider the angle between AB and the tangent:

Indifference again
There are three 60° regions in which this angle might be found. Since we selected the chord at random, there is an equal probability it will be in each.
Therefore: Pr(CLSE) = 1/3

Third solution
Inscribe a circle with radius one half that of the main circle:
AB will be longer than the side of the triangle if and only if its mid-point (W) lies within the inner circle.

Indifference yet again
AB is selected at random so we have no reason to suppose its mid-point is more likely to occur at one point in the main circle than any other. So, the probability that it is within the smaller circle is equal to the area of that circle compared to that of the larger circle.
So, Pr(CLSE) = (πR²/4)/(πR²) = ¼.

Upshot
The principle of indifference, central to the logical theory, gives three different yet equally persuasive answers to the original question. But if the principle of indifference is flawed, so is the logical theory.

Possible responses
Keynes: none of the alternative possibilities should be ‘divisible’ into anything that has the same form as another possibility.
Consider the car example. We used the alternatives red and not-red. But not-red = blue, green, yellow… These are of the same form as red. So not-red is divisible and should be ruled out. Instead, every possibility should be a colour, and now Pr(red) < ½, which seems right.

Problem
This solution can’t handle continuum cases.
Example: what is the probability that a randomly selected point is found exactly halfway between 0 and 1 on a number line?
Problem: there are infinitely many indivisible alternatives (it is at 0.1, 0.01, 0.001…). So, the principle tells us that Pr(0.5) = 1/∞. What does that mean?
Also: what exactly does indivisible mean?

An objection ‘from authority’
Ramsey: there’s no such thing as this logical relation!

Upshot
Other solutions to the paradoxes of indifference have been suggested. E.g.: the principle is a guideline or heuristic. However, by and large the logical theory has been rejected (a heuristic isn’t a logical principle).
Options: Those who favour belief-type probability tend to argue for the personal theory (Gillies calls this the subjective theory). Others prefer frequency approaches.

2. Personal Probability
Endorsed by: F. P. Ramsey, B. de Finetti, L. J. Savage.
It is the view that: Probability claims state your personal confidence or degree of belief in a proposition (what you would bet on).
Objection: “subjective”, useless approach.
Response: If you want your beliefs to be consistent then they must obey the basic rules of probability. Hence, even personal probability should be guided by the calculations, including Bayes’ Theorem.
Objection: what if one doesn’t care about consistency?
Response: This amounts to irrationality (?).

Overview of frequency-type theories
3. Limiting Frequency
The view of J. Venn, R. von Mises: Probability statements give the relative frequency of an outcome “in the long run”. I.e. Pr(E) is the limit to which the relative frequency of E-occurrences converges as the number of trials goes to infinity.
We reject gambling systems: we consider probability, not determined outcomes.
Random sequence = a sequence such that any program that can generate it is as long as the sequence itself.
Objections: Ignores underlying causes of frequency. In reality, no such idealization exists: “in the long run we’re all dead” (Keynes).

4. Propensity
Endorsed/invented by K. Popper: Emphasizes underlying cause of observed frequency.
Underlying structure of a system exists even if no trials are ever made. The system still has a tendency or disposition to give a certain frequency in the long run. This property is called the system’s propensity.
This interests us when we investigate probabilistic systems in nature (Q. M.).
Objection: we must often make probability judgments without knowing the underlying structure (e.g. economics, psychology).

Expected Value
Coke machine: Only works 95% of the time. So, Coke charges $0.95 instead of $1.
Frequency-dogmatists: That’s a fair price. Most of the time you save a nickel, only occasionally do you lose 95 cents. So, the expected value is zero (0.95 × $0.05 − 0.05 × $0.95 = 0).
Belief-dogmatists: Frequency-types get it wrong. Who cares about the long-run frequency of getting a coke? We care about getting a coke now. Since it’s 95% probable that I get a coke now, a price of $0.95 is fair: Expected value = zero. It would be unfair to be forced to use this machine even if it is “fair” in the long run.

Why so many theories?
At first, people thought a lot about dice, lotteries, urns, etc. In these simple models, the differences don’t matter: all theories give the same results. Only as models get more sophisticated do differences start to appear.
Therefore, as more and more people started to think about probability, two families of ideas emerged from the simple models: frequency-type and belief-type. To this day, experts continue to fight for their preferred interpretations.

Homework
Do the exercises at the end of chapters 11 and 12.