Bayesian Epistemology
PHIL 218/338

Welcome and thank you!

Outline
Part I: What is Bayesian epistemology?
  Probabilities as credences
  The axioms of probability
  Conditionalisation
Part II: Applications and problems: Theism
Bear with me! Ideally we would discuss these topics over several lectures.

What is Bayesian Epistemology?
Bayesianism is our “leading theory of uncertainty” (Alan Hájek and Stephan Hartmann)
It concerns credences, or degrees of belief, which are often uncertain
  I’m not going to be attacked by a duck tomorrow
Bayesianism ≈ a theory about when our credences are rational or justified (one which may complement other theories of justification)
There are many varieties of Bayesianism (Irving Good calculated that there are at least 46,656!)
Bayesian epistemology is the “application of Bayesian methods to epistemological problems.”

First component of Bayesianism: Probabilities as credences

Credences
Traditional epistemology deals primarily with qualitative concepts
  Belief/disbelief
  Knowledge/ignorance
In Bayesian epistemology, these binary concepts are arguably less central and therefore receive less attention
Bayesian epistemology deals largely with a quantitative concept of credences
Credences ≈ degrees of belief or disbelief
In the 17th century, mathematicians Blaise Pascal and Pierre de Fermat pioneered a representation of uncertainty as probabilities

Subjective interpretation of probability
Subjective interpretation: ‘Probability is degree of belief’
But whose degree of belief?
  Some actual person, or
  Some ideal person
This is the subjective or personal interpretation of probability because these probabilities concern the psychological state of a subject or person

Terminology
h = hypothesis/proposition
~h = negation of the hypothesis
P(h) = probability of the hypothesis
Example:
  h = It will rain tomorrow
  P(h) = probability that it will rain tomorrow
These terms are on your handout

Quantitative nature of credences
Credences (or subjective probabilities) are taken to be associated with a numerical value or an interval

  P(h) - decimal   P(h) in %     P(h) in normal language   P(~h) in normal language
  P(h) = 1         P(h) = 100%   h is certainly true       ~h is certainly false
  P(h) = 0         P(h) = 0%     h is certainly false      ~h is certainly true
  P(h) = .8        P(h) = 80%    h is probably true        ~h is probably not true
  P(h) = .2        P(h) = 20%    h is probably not true    ~h is probably true

Measuring credences
Consider your credence that h, the sun will rise tomorrow
Consider your credence that you will (after random selection) draw a red marble from an urn containing
  5 red marbles
  5 black marbles
Are you more confident that the sun will rise tomorrow?
If yes, then P(h) > .5

Measuring credences
Consider your credence that h, the sun will rise tomorrow
Consider your credence that you will (after random selection) draw a red marble from an urn containing
  90 red marbles
  10 black marbles
Are you more confident that the sun will rise tomorrow?
If yes, then P(h) > .9

Measuring credences
Consider your credence that h, the sun will rise tomorrow
Consider your credence that you will (after random selection) draw a red marble from an urn containing
  9,999 red marbles
  1 black marble
Are you more confident that the sun will rise tomorrow?
If yes, then P(h) > .9999

Measuring credences
What about your credence that:
  It will rain tomorrow
  You will be attacked by a duck tomorrow
Maybe an interval might represent your credences better
  If h = It will rain tomorrow, then P(h) = [.6, .7]
What do you think? Can all of our credences be represented with numerical values?

Objections to the subjective interpretation
The probability of h given some evidence e does not mean someone’s actual credence, since there may be no actual credence that is relevant
It’s not clear that the probability of h given some evidence e is the credence of some epistemically rational agent
When is an agent’s credence epistemically rational?
  When their credence for h given e equals the (inductive) probability of h given e?
  When their belief is not blameworthy from an epistemic point of view?
This is uninformative!
(Patrick Maher)
But someone might accidentally mistake the probability of h given e to be low and not be blameworthy, but still the probability of h given e might be high (Patrick Maher)
Isn’t this just like saying “A proposition is true if and only if an omniscient God were to believe it”? It’s uninformative

Alternatives
Inductive probabilities are conceptual primitives: they can be understood, but not expressed in terms of other simpler concepts (Patrick Maher)
Probabilities are relative frequencies, which we might loosely understand as the proportion of the time that something is true (the frequentist interpretation of probability)
  80% of the time when a student sits this course, it is true that they pass
  60% of the time when a patient undergoes chemotherapy, it is true that they will recover

Second component of Bayesianism: Credences should conform to the axioms (or rules) of probability
(A1) All probabilities are between 0 and 1, i.e. 0 ≤ P(h) ≤ 1 for any h.
(A2) Logical truths have a probability of 1, i.e. P(T) = 1 for any tautology T.
(A3) Where h1 and h2 are two mutually exclusive hypotheses, the probability of h1 or h2 (h1 ∨ h2) is the sum of their respective probabilities, i.e. P(h1 ∨ h2) = P(h1) + P(h2).
These are on your handout

The axioms in action
Suppose you draw a marble from an urn:
  r = the marble you have drawn is red
  ~r = the marble you have drawn is not red
Suppose the urn contains 3 red marbles and 7 black marbles
You set
  P(r) = .3 (30%)
  P(~r) = .7 (70%)
These assignments conform to axiom 1
By axioms 2 and 3, P(r ∨ ~r) = 1 (100%)

Arguments for conformity to the axioms
Argument from cases
  Lindley draws out rules of probability from the urn example
  We can prove other theorems using the axioms and see that they make sense using the example
  E.g.
  P(~r) = 1 − P(r)
Dutch book arguments
  Dutch book = a combination of bets which an individual might accept individually, but which collectively entail that they will lose money

A Dutch book
If one violates the probability axioms, then they are vulnerable to having a Dutch book made against them
E.g. suppose you violate A2 or A3 by setting
  1. P(r) = .7
  2. P(~r) = .5
If you conform to axiom 2, then you do not conform to axiom 3
  By axiom 2, P(r ∨ ~r) = 1
  But by the above assignments 1 and 2, P(r) + P(~r) = .7 + .5 = 1.2
  So, contrary to axiom 3, P(r ∨ ~r) ≠ P(r) + P(~r) because 1 ≠ 1.2
But if you conform to axiom 3, then you do not conform to axiom 2
  By axiom 3, P(r ∨ ~r) = P(r) + P(~r)
  So by assignments 1 and 2, P(r ∨ ~r) = P(r) + P(~r) = .7 + .5 = 1.2
  So, contrary to axiom 2, P(r ∨ ~r) ≠ 1 because 1 ≠ 1.2
So you cannot conform to the axioms

A Dutch book
Suppose you violate A2 or A3 by setting
  1. P(r) = .7
  2. P(~r) = .5

                            r      ~r
  Bet 1 for assignment 1   +$3    -$7
  Bet 2 for assignment 2   -$5    +$5

If r occurs, then you win $3 on the first bet and lose $5 on the second, so you lose $2
If r does not occur, then you lose $7 on the first bet and gain $5 on the second, so you lose $2
Either way, you lose $2.

Dutch book argument
1. If someone violates the probability axioms, then she is vulnerable to having a Dutch book made against her
2. One should avoid being vulnerable to having a Dutch book made against oneself (because this is a rational defect)
3. Therefore, one should avoid violating the axioms of probability

An objection to the second component
Conformity to the axioms requires logical omniscience, but no one is omniscient
“You’re right, but the component only sets an ideal standard, irrespective of whether anyone can meet it”

Questions?
Do you think that one’s credences should conform to the axioms of probability?

Third component of Bayesianism: Credences should be updated via conditionalisation

Terminology
Before examining this component, we need to introduce some terms
Conditional probability = P(p|q) = the probability of p on the condition that q obtains = the probability of p given q
RATIO formula as an analysis of conditional probability:
  P(p|q) = P(p & q) / P(q), where P(q) > 0.

Example of a conditional probability
m = Taylor is a mother
f = Taylor is a female
P(m|f) = the probability that Taylor is a mother given that Taylor is a female
  P(f) = .5
  P(m & f) = .2
So: P(m|f) = P(m & f) / P(f) = .2 / .5 = .4
Note the big difference between P(m|f) and P(f|m)
  P(m|f) = .4
  P(f|m) = 1

Likelihoods
A likelihood = P(e|h), where e represents some evidence and h a hypothesis.
P(e|h) is called the likelihood of h on e.

Prior probabilities
P_i(h) = Your prior probability = “your subjective probability for the hypothesis immediately before the evidence comes in” (Strevens, emphasis added)
Terms:
  e = A person, such as Taylor, smiles at you
  h = A person, such as Taylor, likes you
  ~h = A person, such as Taylor, does not like you
  P_i(h) = prior probability of a person, such as Taylor, liking you
  P(h|e) = probability of a person, such as Taylor, liking you given that s/he smiles at you

What is the probability that Taylor likes you given that he or she smiled at you? P(h|e)
What is the prior probability that Taylor likes you?
Suppose you surveyed 100 people and found the following:

                       Smiles at you   Does not smile   Total
  Likes you                  9               1            10
  Does not like you         36              54            90

What is the probability that Taylor likes you given the evidence?
P(h|e) = ?
P(h|e) = 9/(9+36) = 9/45 = 1/5 = 20% = .2

Posterior probabilities
What is the probability that Taylor likes you given the evidence?
P_i(h) = Your prior probability = “your subjective probability for the hypothesis immediately before the evidence comes in” (Michael Strevens, emphasis added)
P_f(h) = Your posterior probability = “your subjective probability immediately after the evidence (and nothing else) comes in” (emphasis added)
Conditionalisation: On acquiring some evidence e (which has a non-zero initial probability), one should adjust one’s probability for h from the prior probability P_i(h) to a posterior probability P_f(h) which equals P(h|e). This is called conditionalising h on e.
Conditionalisation should occur through Bayes’s theorem (where applicable).

Conditionalisation via Bayes’s theorem
Bayes’s theorem:
  P(h|e) = P(e|h) × P_i(h) / P_i(e)
  where P_i(e) = P(e|h) × P_i(h) + P(e|~h) × P_i(~h)
Application to the case:
  .2 = (.9 × .1) / .45
  where .45 = .9 × .1 + .4 × .9
Bayes’s theorem was expressed in a paper by Rev. Thomas Bayes that was published posthumously.

Arguments for the conditionalisation norm
Case-by-case evidence
Bayes’s theorem is used widely in statistics
Dutch-book arguments

Part II: Applications and problems

Does God exist?
P_i(h) = ? (where h = theism)
(One version of) The principle of indifference: In the absence of evidence favouring one possibility over another, assign each possibility an equal probability
The principle of indifference seems intuitively plausible in many cases
  E.g. all you know is that a prize is behind one of three doors
  Presumably the probability that it is behind a given door is 1/3, or approximately .33
Application to theism: Either h or ~h, so P_i(h) = .5
Sounds reasonable, right? WRONG!
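
The Taylor calculation from the Bayes’s theorem slide above can be checked numerically. What follows is only a minimal sketch: the function name posterior and its parameter names are hypothetical labels chosen for illustration, and the inputs are the slide’s own figures (prior .1, likelihoods .9 and .4).

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Return P(h|e) using Bayes's theorem.

    P_i(e) is expanded by total probability:
    P_i(e) = P(e|h) * P_i(h) + P(e|~h) * P_i(~h), with P_i(~h) = 1 - P_i(h).
    """
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
    if p_e == 0:
        # Conditionalisation is only defined for evidence with non-zero probability.
        raise ValueError("P(e) must be non-zero to conditionalise on e")
    return p_e_given_h * prior_h / p_e


# Taylor case: P_i(h) = .1, P(e|h) = .9, P(e|~h) = .4
taylor = posterior(prior_h=0.1, p_e_given_h=0.9, p_e_given_not_h=0.4)
print(round(taylor, 3))
```

The result agrees with the frequency count 9/(9+36), which is the same calculation expressed in numbers of people rather than probabilities.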
Multiple partitions problem
Suppose you’re cooking dinner for Jed, but you don’t know whether he eats meat
One partition of possibilities: Either
  1) Jed is a meat eater (h), or
  2) he is not a meat eater (~h),
  so P_i(h) = .5
Another partition of possibilities:
  1) Jed is a meat eater (h),
  2) Jed is a vegetarian (v1), or
  3) Jed is a vegan (v2),
  so P_i(h) = 1/3
The problem is that the space of possibilities can be partitioned in different ways, so it is unclear how, or whether, to apply the principle of indifference

Application to theism
Either h or ~h, so P_i(h) = .5
But what about another partition? Either:
  1. There is no ultimate cause of the universe
  2. Or there is an ultimate cause of the universe, but this cause is not a person (or conscious being)
  3. Or there is a personal and ultimate cause of the universe, but this cause is not omnibenevolent
  4. Or there is a personal, omnibenevolent and ultimate cause of the universe, but this cause is not omnipotent
  …
  Or theism is true
So already P_i(h) < 1/5 according to the principle of indifference!

The problem of the priors: Subjective and objective Bayesianism
We can partition the logical possibilities differently so as to yield conflicting results when the principle of indifference is applied
So which partition do we go with?
  Some think that there is no uniquely correct partition
So how do we determine P_i(h)?
  Subjectivists: Well, just pick any value you like; no value is incorrect, except perhaps 1 or 0
  Objectivists: There is a uniquely correct value for P_i(h), and it is…
Let’s move on and assume that P_i(h) = .5, just for illustration

What evidence is there that God exists?
  Theistic evidence:                   Atheistic evidence:
  Fine-tuning of laws and constants    Human suffering
  A universe                           Animal suffering
  Moral truths                         Non-resistant non-belief in God
  Miracle reports                      Scale of the universe
  Abiogenesis (Origins of life)        Contradictory theistic theories
  Consciousness                        Theism is less simple (Occam’s razor)

The fine-tuning argument
e1 = the laws of the universe are finely tuned to permit meaningful life:
  According to philosopher Robin Collins, if the strength of the gravitational force were to change by one part in 10^36, then any land-based or aquatic organisms the size of humans would be crushed.
Likelihoods:
  P(e1|h) = .5
  P(e1|~h) = 1/10^36
Note that I will assume that ~h is equivalent to Western philosophical atheism (rather than also including polytheism, pantheism, etc.)
What is the posterior probability of theism?
  P(h|e1) ≈ 1

The fine-tuning argument: Just kidding!
e1 and the assumption about ~h are as on the previous slide, but with a less extreme likelihood for ~h.
Likelihoods:
  P(e1|h) = .5
  P(e1|~h) = .01
What is the posterior probability of theism?
  P(h|e1) ≈ .98

The multiverse objection
If there were (infinitely) many universes with the values of their laws randomly generated by chance, then we wouldn’t be surprised to see that one of them happens to have life-permitting values
In Bayesian terms: Perhaps it is true that a, where a = there is an (infinitely) large number of other universes with values randomly generated by chance, and P(e1|~h & a) = 1 (or some relatively high figure)

The argument from suffering
e2 = humans suffer and this is a bad thing
  Genocide
  Oppression
  Missing buses
Now our prior probability relative to e2 is our posterior probability relative to e1, so P_i(h) ≈ .98
What are the likelihoods?
Logical argument from evil (J.L. Mackie):
  P(e2|h) = 0
  P(e2|~h) = .5
  So, P(h|e2) = 0
Evidential argument from evil (William Rowe):
  P(e2|h) = .01
  P(e2|~h) = .5
  So, P(h|e2) = .5

Sceptical theism
“God knows a lot more than us and would have reasons to justify his actions which we do not know of”
“So if God existed, there was suffering, and we did not see any reason that would justify God’s permission of the suffering, then we would not be surprised”
More sophisticated defences of versions of sceptical theism are given by Stephen Wykstra and Daniel Howard-Snyder

The problem of the priors
There is sometimes a lot of debate about the likelihoods, or at least about what the relevant likelihoods are
Suppose we agree that:
  P(e|h) = .9
  P(e|~h) = .1
So if we assume that P_i(h) = .5, then P(h|e) = .9
But if we assume that P_i(h) = .1, then P(h|e) = .5
And if we assume that P_i(h) = .00001, then P(h|e) ≈ .00009

The problem of the priors
The posterior probability is sensitive to the value of the prior probability
Subjective Bayesians often think that the subjectivity of the prior is not a major problem, since the subjectivity will be “washed out” as evidence accumulates
So two people starting off with different priors will converge on the probable truth given their conditioning on a growing body of evidence
However, as Alan Hájek notes:
  “Indeed, for any range of evidence, we can find in principle an agent whose prior is so pathological that conditionalizing on that evidence will not get him or her anywhere near the truth, or the rest of us.”
And there are other worries
So does the problem of the priors render Bayesianism practically useless?
Does it eliminate scepticism about the reliability of inductive inference?

Questions?
Thank you!
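
As a check on the arithmetic in Part II, the sketch below repeats the updates with the likelihoods and priors given on the slides. It is illustrative only: the helper posterior and the variable names are hypothetical, and the code assumes, as the slides do, that the posterior after e1 serves as the prior for e2.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Return P(h|e) from a prior and the two likelihoods (Bayes's theorem)."""
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
    if p_e == 0:
        raise ValueError("cannot conditionalise on evidence with zero probability")
    return p_e_given_h * prior_h / p_e


# Sequential updating: fine-tuning (e1) first, then suffering (e2).
after_fine_tuning = posterior(0.5, 0.5, 0.01)              # prior .5, modest likelihoods
after_suffering = posterior(after_fine_tuning, 0.01, 0.5)  # evidential argument from evil

# Sensitivity to the prior, holding P(e|h) = .9 and P(e|~h) = .1 fixed.
sensitivity = {prior: posterior(prior, 0.9, 0.1) for prior in (0.5, 0.1, 0.00001)}

print(f"after e1: {after_fine_tuning:.3f}")
print(f"after e2: {after_suffering:.3f}")
for prior, post in sensitivity.items():
    print(f"prior {prior}: posterior {post:.5f}")
```

Carrying the unrounded posterior from e1 forward, rather than the rounded .98, is what makes the second update come out at one half. The last loop shows how strongly the result depends on the starting prior when the likelihoods are held fixed.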