Formal methods and Bayesian epistemology: Two Hegelian Dialectics
Alan Hájek
[Nearly-finished draft]

0. Introduction

Our brief for this workshop is to discuss: In what ways do (or should, or can) Formal Methods shed light on Traditional Epistemological Concerns? In other words, how may formal epistemology fruitfully inform and engage traditional issues and approaches, and how should traditional approaches respond to the formalists? There are three key phrases there: formal methods, traditional epistemological concerns, and formal epistemology. They will be my main points of departure in this talk. In the spirit of the workshop’s goal of fostering discussion of these themes, I will present my talk in the form of two Hegelian dialectics:

1. Formal methods: thesis and antithesis
(i) I will give a brief characterization of formal methods and examine their use in philosophy – in particular, in epistemology.
(ii) I will consider first the thesis that formal methods are a boon to philosophy, and especially to epistemology.
(iii) I will then consider the antithesis that formal methods are a bane to philosophy, and especially to epistemology.
(iv) Then, like a good Hegelian, I will offer something of a synthesis, attempting some reconciliation of these views.

2. Traditional epistemology vs Bayesian epistemology: thesis and antithesis
(i) I will give a brief characterization of traditional epistemology and Bayesian epistemology, my favorite example of formal epistemology.
(ii) I will consider first the thesis that Bayesian epistemology is superior to, or should supplant, traditional epistemology.
(iii) I will then consider the antithesis that Bayesian epistemology is not superior to, and should not supplant, traditional epistemology, and indeed that Bayesianism fails to do justice to some of the deepest problems in epistemology.
(iv) Then, like a good Hegelian, I will offer something of a synthesis, suggesting that both approaches are alive and well: Bayesian epistemology directs us to various promising avenues of research for epistemology, while traditional epistemology still has things to offer the Bayesian.

[Time allowing, I may go on to:
3. Bayesian epistemology vs other formal approaches: thesis only
In passing, I want to briefly compare Bayesian epistemology with a couple of rival formal approaches to epistemology. There, I won’t be Hegelian at all. Thesis: Bayesian epistemology is just plain superior to them. End of the dialectic! (Well, actually I will confess my ignorance of the rival approaches, and invite the audience to continue the dialectic.)]

1. Formal methods: thesis, antithesis, and synthesis

1(i) What are formal methods?

It’s easy to come up with examples of formal methods: the use of various logical systems, computational algorithms, causal graphs, information theory, probability theory and mathematics more generally. What do they have in common? They are all abstract representational systems. Sometimes the systems are studied in their own right for their intrinsic interest, but often they are regarded as structurally similar to some target subject matter of interest to us, and they are studied to gain insights about that. They often, but not invariably, have an axiomatic basis; they sometimes have associated soundness and completeness results. There is something of a spectrum of ‘formality’ here. At the high end, we have, for example, the higher reaches of set theory. At the low end we have rather informal presentations of arguments in English in ‘premise, premise … conclusion’ form. Higher up we find more formal representations of these arguments, whittled down to schematic letters, quantifiers, connectives, and operator symbols. Near the top we find Euclid’s Elements; lower down, Spinoza’s Ethics.

1(ii) Thesis: Formal methods are a boon to philosophy, and especially to epistemology

Formal methods often force us to be explicit about our assumptions, keeping us on high alert when questionable assumptions might otherwise be smuggled in. Formal systems often provide a safeguard against error: by meticulously following a set of rules prescribed by a given system, we minimize the risk of making illicit inferences. And reducing a proof to symbol manipulation, when that is possible, can often make it easier. It is striking how one can start with a rather imprecise philosophical problem stated in English, precisify it, translate it into a formal system, use the inference rules of the system to prove some results about it, then translate back out to a conclusion stated in English. It is easy to admire the rigor, the sharpening of questions and their resolution, and to enjoy the feeling that one is really getting results.

Formal methods also provide something of a lingua franca. I was recently in China, and when English failed my interlocutors (and trust me, my Mandarin didn’t even get off the starting line), we could still communicate in the language of logic and mathematics—well, somewhat.

I am also impressed by how formal systems can stimulate creativity. Staring at the theorems of a particular system can make one aware of hitherto undiscovered possibilities, or of hitherto unrecognized constraints. And it can enable one to discern common structures across different subject matters. I’ll give a couple of examples from my work—not so much in a shameless act of self-promotion (although that may also be true), but because I am particularly authoritative about the creative processes involved, such as they were; it’s not for me to speculate as to what may have inspired someone else’s creative processes.

A former student of mine, Harris Nover, and I offered a new paradox for expected utility theory in our (2004).
Expected utilities are sums—sums of products of the form ‘utility times probability’. The St. Petersburg game exploits one well-known kind of anomalous sum: a divergent series. But we know from real analysis that another kind of anomalous infinite sum is one that is conditionally convergent—if we leave it alone, it converges, but if we replace all of its terms by their absolute values, the resulting series diverges. Riemann’s rearrangement theorem tells us that every conditionally convergent series can be reordered so as to sum to any real number; and it can be reordered so as to diverge to infinity and to negative infinity; and it can be reordered so as to simply diverge. Now let this piece of mathematics guide the creation of a new game, whose expectation series has exactly this property—the formal model thus inspires a new kind of anomaly for rational decision-making. Harris and I proposed a St. Petersburg-like game—the Pasadena game—in which the pay-offs alternate between rewards and punishments, in such a way that the resulting expectation is conditionally convergent. Decision theory apparently tells us that the desirability of the game is undefined, thus falling silent as to its desirability. Worse, the theory falls silent about the desirability of everything, as long as you give any credence whatsoever to your playing the Pasadena game—for any mixture of undefined and any other quantity is itself undefined. In that case, for example, you can’t rationally choose between pizza and Chinese for dinner, since both have undefined expectation (each being ‘poisoned’ by a positive probability, however tiny, of a subsequent Pasadena game). Thus, you are paralyzed—a sin against practical rationality. Yet assigning probability 0, as opposed to extremely tiny positive probability, to the Pasadena game seems excessively dogmatic—a sin against theoretical rationality.
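To make the role of conditional convergence concrete, here is a minimal numerical sketch. It assumes the payoff schedule from the published statement of the Pasadena game, which is not spelled out above: a fair coin is tossed until the first head, and if that happens on toss n the payoff is (-1)^(n-1) * 2^n / n dollars. On that assumption the n-th term of the expectation series is (-1)^(n-1) / n, the alternating harmonic series. The function names and the greedy reordering scheme are illustrative choices only.

```python
import math


def expectation_term(n):
    """n-th term of the assumed Pasadena expectation series:
    probability 2**-n times payoff (-1)**(n-1) * 2**n / n."""
    return (-1) ** (n - 1) / n


def natural_order_sum(num_terms):
    """Partial sum of the series taken in its natural order."""
    return sum(expectation_term(n) for n in range(1, num_terms + 1))


def rearranged_sum(target, num_terms):
    """Partial sum of a greedy, Riemann-style reordering: add the next
    unused positive term while at or below the target, otherwise add
    the next unused negative term."""
    next_odd = 1    # odd n gives the positive terms 1/n
    next_even = 2   # even n gives the negative terms -1/n
    total = 0.0
    for _ in range(num_terms):
        if total <= target:
            total += 1.0 / next_odd
            next_odd += 2
        else:
            total -= 1.0 / next_even
            next_even += 2
    return total


print("natural order:", natural_order_sum(100_000), "vs ln 2 =", math.log(2))
print("reordered toward 2.0:", rearranged_sum(2.0, 200_000))
print("reordered toward -1.0:", rearranged_sum(-1.0, 200_000))
```

In the natural order the partial sums approach ln 2, while the reordered partial sums approach whichever target is chosen. Every term of the series is eventually used in each ordering, so nothing but the order differs, and that is the sense in which the expectation fails to be well defined.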
The upshot was that the use of the formal mathematical model of decision theory facilitated the invention of a new kind of problem for decision theory. At the risk of more self-promotion (again, because it’s an example I know particularly well): my early philosophical work was on probabilities of conditionals. I liked Adams’s and (independently) Stalnaker’s idea of looking to probability theory, and in particular its familiar notion of conditional probability, for inspiration for understanding conditionals. In short, they looked to a formal structure for illumination on a philosophical problem. They both advanced versions of the thesis that probabilities of conditionals are conditional probabilities. Then along came Lewis’s triviality results, which began an industry of showing that various precisifications of the thesis entailed triviality of the probability functions. I liked this industry a lot, and I joined in, proving some further triviality results. One of them owed its existence to the easy visualization that can be given of probabilities in terms of a ‘muddy Venn diagram’ (in van Fraassen’s coinage). [EXPLAIN PERTURBATIONS. NOTICE THAT IF THE CONDITIONAL IS INDEXICAL, THE PROOF IS BLOCKED.] Without this heuristic I could not have ‘seen’—in both senses of the word—the trouble that probability dynamics could cause for Adams’/Stalnaker’s theses. Then along came the so-called desire-as-belief thesis. David Lewis canvassed a certain anti-Humean proposal for how desire-like states are reducible to belief-like states: roughly, the desirability of X is the probability that X is good. He proved certain triviality results that seemed to refute the proposal. This was another lovely example of how formal methods could serve philosophical ends: this time, a thesis that was born in an informal debate in moral psychology could apparently be expressed decision-theoretically. 
The decision-theoretic machinery could then be deployed to deliver a formal verdict, which could then be translated back to bear on the informal debate. I noticed that the probabilities-of-conditionals-are-conditional-probabilities thesis of Adams and Stalnaker looked suspiciously like the desire-as-belief thesis, and that Lewis’s triviality results against the former looked suspiciously like his triviality results against the latter. This gave me the idea that the subsequent moves and countermoves that were made in the probabilities-of-conditionals debate could be mimicked in the desire-as-belief debate. ‘Perturbations’ similarly cause trouble for the desire-as-belief thesis. And I showed that, much as Lewis’s original triviality results and the perturbations results could be blocked by making the conditional indexical in a certain sense, his later triviality results could be blocked by making ‘good’ indexical in the same sense. Philip Pettit and I (2003) then translated back out of the formalism, suggesting meta-ethical theories that accorded ‘goodness’ the necessary indexicality. So the trick was to notice a similarity between the formal structures that underpinned the probabilities-of-conditionals and the desire-as-belief debates, something that I could not have noticed about the original debates themselves. After that, it was easy to see how the next stages of the desire-as-belief debate should play out, paralleling the way they did in the probabilities-of-conditionals debate. Two seemingly disparate debates turned out to be closely related.

1(iii) Antithesis: Formal methods are a bane to philosophy, and especially to epistemology

In my paean to formal methods I have said a bit about how their use may fertilize the imagination. There is also a risk of the opposite phenomenon: that they may encourage one to think inside a box, constraining one’s imagination.
I have mentioned how formal methods often force one to be clear on the assumptions that one is making; but a risk is that the set of assumptions that underlie a given model may become presuppositions that are not questioned, and that perhaps should be. There is a danger of reading too much off a given formalism. It may resemble some target in certain important respects, but it must differ from the target in other important respects. (Compare how a map of a city differs from the city itself in all sorts of ways—it had better do so in order to be of any use!) One has to be vigilant in keeping track of which assumptions are specific to a formal model, and which carry over to the target. A solution to a problem in a formal model may not be a solution to its real-life counterpart; and one should be careful when reifying a problem in a formal model, reading it back into the world. For example, I think that some Bayesians overstate the lessons of the famous convergence theorems. Here is a real-world question: how can we explain the huge amount of agreement we find among different humans, and in particular, among scientists? A common Bayesian answer: the convergence theorems show us that in the long run such agreement is guaranteed. For example: If observations are precise… then the form and properties of the prior distribution have negligible influence on the posterior distribution. From a practical point of view, then, the untrammeled subjectivity of opinion… ceases to apply as soon as much data becomes available. More generally, two people with widely divergent prior opinions but reasonably open minds will be forced into arbitrarily close agreement about future observations by a sufficient amount of data. (Edwards, et al., p. 201) Call this convergence to intersubjective agreement; such agreement, moreover, is often thought to be the mark of objectivity (see Nozick 19xx). The “forcing” here is a result of conditionalizing the people’s priors on the data. 
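The ‘forcing’ described in the quotation can be illustrated with a toy model. The sketch below assumes a setup that is not in the text: two agents hold different Beta priors over the bias of a coin, both conditionalize on the same flips, and agreement is measured by their predictive probability that the next flip lands heads. The prior parameters, the data stream with 70% heads, and the function names are all hypothetical.

```python
def predictive_heads(alpha, beta, heads, tails):
    """Probability of heads on the next flip after conditionalizing a
    Beta(alpha, beta) prior on the observed counts (Beta-Bernoulli model)."""
    return (alpha + heads) / (alpha + beta + heads + tails)


def disagreement(prior_a, prior_b, num_flips):
    """Gap between two agents' predictive probabilities after num_flips
    shared observations, 70% of which are heads (an assumed data stream)."""
    heads = (7 * num_flips) // 10   # integer arithmetic keeps the counts exact
    tails = num_flips - heads
    p_a = predictive_heads(*prior_a, heads, tails)
    p_b = predictive_heads(*prior_b, heads, tails)
    return abs(p_a - p_b)


optimist = (20, 1)    # hypothetical prior strongly favoring heads
pessimist = (1, 20)   # hypothetical prior strongly favoring tails

# The gap shrinks as the shared data accumulate.
for n in (0, 10, 100, 1000, 10000):
    print(n, round(disagreement(optimist, pessimist, n), 4))
```

Both priors here spread positive density over every bias strictly between 0 and 1, which is one way of cashing out ‘reasonably open minds’. The sketch is silent about priors that lack that property.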
Gaifman and Snir (19xx) and Jim Hawthorne (200x) similarly show that for each suitably open-minded agent, there is a data set sufficiently rich to force her arbitrarily close to assigning probability 1 to the true member of a partition of hypotheses. Call this convergence to the truth. These are beautiful theorems, but one should not overstate their epistemological significance. They are ‘glass half-full’ theorems, but a simple alternation of the quantifiers turns them into ‘glass half-empty’ theorems. For each data set, there is a suitably open-minded agent whose prior is sufficiently perverse to thwart such convergence: after conditionalizing her prior on the data set, she is still nowhere near assigning probability 1 to the true hypothesis, and still nowhere near agreement with other people. And strong assumptions underlie the innocent-sounding phrases “suitably open-minded agent” and “sufficiently rich data set”. No data set, however rich, will drive a dogmatic agent anywhere at all. Worse, an agent with a wacky enough prior will be driven away from the truth. Consider someone who starts by giving low probability to being a brain in a vat, but whose prior regards all the evidence that she actually gets as confirming that she is. And we can always come up with rival hypotheses that no courses of evidence can discriminate between—think of the irresolvable conflict between an atheist and a creationist who sees God’s handiwork in everything. Related, there may be a residue of an original problem that remains even when it has been solved in a model. For example, there has been wonderful work on analyzing the preface paradox from a Bayesian point of view – Jim Hawthorne’s paper with Luc Bovens is an example. But the preface paradox was originally a paradox about belief. I have myself appealed in print to the “Lockean thesis” (endorsed by Jim and Luc) that belief is probability above a threshold (perhaps contextually determined). 
But I also admit to having some qualms about that thesis. Do you really believe that your lottery ticket will lose, as the Lockean thesis would apparently have it (for a big enough lottery)? After all, you don’t tear it up! Rather, you believe that it will very probably lose, but that is a different epistemic state. The Lockean thesis also has trouble explaining ‘cross-over’ cases. It seems that you do believe that your ticket lost when you read in the newspaper that the winning ticket was some other one. Yet you do not give probability 1 to the truth of the newspaper report, and indeed there exists some other lottery in which a given ticket’s probability of losing is higher, yet you do not believe that that ticket will lose: so we have a belief with lower probability than a non-belief. Finally, the Lockean thesis seems to make it too easy to rationally hold inconsistent beliefs—indeed, it seems to be forced in lottery cases, if we set the threshold for belief below 1. On the other hand, if we set it at 1, then belief seems all too hard to come by. Either way, something has gone wrong: arguably, something in the formal model of belief, rather than in belief itself. Be that as it may, my point is that while there is clearly no preface “paradox” for degrees of belief—and this is an important insight—still one might think that the preface paradox for binary belief remains. Similarly, Ned Hall gives a trenchant Bayesian analysis of the ‘surprise exam paradox’. But again, one might think that the paradox for binary belief remains. If you don’t like these examples, I’m sure you’ll find others that you do. My point ought to be a platitude: for the very reason that a model must differ from a target system in order to be a model of that system, one has to be careful that a result reached in the model can be faithfully applied to the system. The ‘antithesist’ in me here is now drawing attention to cases in which one may read too much into the model.
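The lottery worry about the Lockean thesis can be put in miniature as follows. The sketch assumes a fair lottery with exactly one winning ticket, and it reads ‘probability above a threshold’ as ‘at or above’ so that a threshold of 1 is not vacuous. Both are assumptions made for illustration, as are the function and key names.

```python
from fractions import Fraction


def lockean_lottery(num_tickets, threshold):
    """Apply a Lockean belief rule (believe X iff P(X) >= threshold)
    to a fair lottery that is assumed to have exactly one winner."""
    p_ticket_loses = 1 - Fraction(1, num_tickets)  # same for every ticket
    p_some_ticket_wins = Fraction(1)               # guaranteed by the setup
    believes_each_loses = p_ticket_loses >= threshold
    believes_some_wins = p_some_ticket_wins >= threshold
    return {
        "believes_each_ticket_loses": believes_each_loses,
        "believes_some_ticket_wins": believes_some_wins,
        # Believing of every ticket that it loses, and that some ticket
        # wins, is a belief set that cannot be jointly true.
        "inconsistent": believes_each_loses and believes_some_wins,
    }


print(lockean_lottery(1_000_000, Fraction(99, 100)))  # threshold below 1
print(lockean_lottery(1_000_000, Fraction(1)))        # threshold at 1
```

With the threshold below 1 and a big enough lottery, the rule delivers the inconsistent belief set. With the threshold at 1, consistency is restored only because none of the individual ‘this ticket loses’ propositions qualifies as believed.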
I am also reminded of a salutary caution that Paul Benacerraf has given in a few places: beware of philosophers claiming to derive philosophical conclusions from formal results—in particular, mathematical facts. “The point is that you need some further premises - and in [a case that he considers], clearly philosophical ones… I am making a sort of Duhemian point about philosophical implications. You need some to get some.” Here are some examples of violations of Benacerraf’s cautionary words:
- Penrose and Lucas: Gödel’s theorem shows that minds are not machines.
- Putnam: the Löwenheim-Skolem theorem shows that realism is false.
- Various people: the Dutch book theorem shows that rational credences must be probabilities.
- Forster and Sober: “A literal reading of Akaike’s Theorem is that we should use the best fitting curve from the family with the highest estimated predictive value.” (1994, p. 18, my emphases).
- “Arrow’s impossibility theorem shows that democracy is dead”, as I’ve heard a political scientist say.

This segues nicely into another concern about the use of formal methods, which I call formalism fetishism. We see it all too often: someone representing some philosophical problem with triple integrals, or tensors, or topologies, or topoi, just because they can. Why say it in English when you can say it with a self-adjoint operator instead?! And you can’t help suspecting that in some cases there’s at least an unconscious desire to impress or to intimidate one’s audience. As James Bond’s girlfriend (in Goldeneye) would say: “Boys with toys!” If nothing else, there is a risk that you will lose, or turn off, your audience. (On the other hand, if they’re formalism fetishists, you may instead turn them on!)

1(iv) Synthesis

[I plan to add to this section.]

As in all things, the answer, of course, is to exercise good judgment when using formal methods. I find Lewis’s use of formal methods to be exemplary.
I am especially drawn to his work on counterfactuals, on causation, and of course on probability and decision theory. He used such methods sparingly and judiciously, always to illuminate and to make insights easier to come by and to understand. His work serves as a model to me. The points I made in support of the ‘thesis’ live in perfect harmony with those in support of the ‘antithesis’. There’s no denying the heuristic and clarificatory value of formal methods. But like the proverbial ladder that one kicks away, once they have served their purpose, I want to see their translation back into plain English. And where some formalism is used to aid with the presentation of a philosophical argument, let us not mistake it for the argument itself.

2. Traditional epistemology vs Bayesian epistemology: thesis, antithesis, and synthesis

2(i) What are traditional epistemology and Bayesian epistemology?

Bayesianism is our leading theory of uncertainty. Epistemology is defined as the theory of knowledge. So “Bayesian Epistemology” may sound like an oxymoron. Bayesianism, after all, studies the properties and dynamics of degrees of belief, understood to be probabilities. Traditional epistemology, on the other hand, places the singularly non-probabilistic notion of knowledge at center stage, and to the extent that it traffics in belief, that notion does not come in degrees. So how can there be a Bayesian epistemology? According to one view, there cannot: Bayesianism fails to do justice to essential aspects of knowledge and belief, and as such it cannot provide a genuine epistemology at all. According to another view, Bayesianism should supersede traditional epistemology: where the latter has been mired in endless debates over skepticism and Gettierology, Bayesianism offers the epistemologist a thriving research program.
I will advocate a more moderate view: Bayesianism can illuminate various long-standing problems of epistemology, while not addressing all of them; and while Bayesianism opens up fascinating new areas of research, it by no means closes down the staple preoccupations of traditional epistemology. The contrast between the two epistemologies can be traced back to the mid-17th century. Descartes regarded belief as an all-or-nothing matter, and he sought justifications for his claims to knowledge in the face of powerful skeptical arguments. No more than four years after his death, Pascal and Fermat inaugurated the probabilistic revolution, writ large in the Port-Royal Logic, in which the many shades of uncertainty are represented with probabilities, and rational decision-making is a matter of maximizing expected utilities (as we now call them). Correspondingly, the Cartesian concern for knowledge fades into the background, and an alternative, more nuanced representation of epistemic states has the limelight. Theistic belief provides a vivid example of the contrasting orientations. Descartes sought certainty in the existence of God grounded in apodeictic demonstrations. Pascal, by contrast, explicitly shunned such alleged ‘proofs’, arguing instead that our situation with respect to God is like a gamble, and that belief in God is the best bet—thus turning the question of theistic belief into a decision problem which he, unlike Descartes, had the tools to solve. Bayesian epistemology owes to the eponymous Reverend Thomas Bayes, who wrote a century later, an important theorem that underwrites certain calculations of conditional probability central to confirmation theory. But really ‘Bayesian epistemology’ is something of a misnomer; ‘Kolmogorovian epistemology’ would be far more appropriate. Bayes’ theorem is just one of many theorems of the probability calculus.
It provides just one way to calculate a conditional probability, when various others are available, all ultimately deriving from the usual ratio formula, and often conditional probabilities can be ascertained directly, without any calculation at all. When I speak of ‘traditional epistemology’, I lump together a plethora of positions as if they form a monolithic whole. For my purposes, their common starting point is regarding the central concepts of epistemology to be knowledge and belief; they then go on to study the properties, grounds, limits, and so on of these binary notions. I also speak of ‘Bayesianism’ as if it is a unified school of thought, when in fact there are numerous intra-mural disputes. I. J. Good jokes that there are “46,656 ways to be a Bayesian”, while I will mostly pretend that there is just one. By and large, the various distinctions among Bayesians will not matter for my purposes. As a good (indeed, a Good) Bayesian might say, my conclusions will be robust under various precisifications of the position: Many traditional problems can be framed, and progress can be made on them, using the tools of probability theory. But Bayesian epistemology does not merely recreate traditional epistemology; thanks to its considerable expressive power, it also opens up new lines of enquiry. I can now bring out several points of contrast between traditional and Bayesian epistemology. I have noted that ‘knowledge’ and ‘belief’ are binary notions, to be contrasted with the potentially infinitely many degrees of ‘credence’ (corresponding to all the real numbers in the [0, 1] interval). ‘Knowledge’ is famously not merely ‘justified true belief’, but many epistemologists hope that some elusive ‘fourth condition’ will complete the analysis—some kind of condition that rules out cases in which one has a justified true belief by luck, or for some anomalous reason. 
Notice that three of these four conditions may be characterized as objective, with ‘belief’ providing the only subjective component. This is in sharp contrast to orthodox Bayesianism, which refines and analyzes this doxastic notion, but which has no clear analogue of the ‘objective’ conditions. Most importantly, Bayesianism apparently has nothing that corresponds to the factivity of knowledge, that one can only know truths—the convergence theorems hardly suffice. And even when our beliefs fall short of knowledge, still it is a desideratum that they be true; but the Bayesian seems to have no corresponding desideratum for intermediate credences, which are its stock-in-trade. When you assign, for example, probability 0.3 to it raining tomorrow, what sense can the Bayesian make of this assignment being true? (We will return to this point at the end.) It is also dubious whether Bayesianism can capture ‘justification’ or any ‘fourth condition’ on knowledge. The Bayesian might try to parlay the convergence theorems into providing surrogates for justification or for the elusive ‘fourth condition’ for knowledge, insisting that such convergences are based on evidence, and they do not happen by luck, or for some anomalous reason, but are probabilistically guaranteed. But again, I really don’t think they do the job. I currently have a justified, non-accidentally true belief that I am typing these words in Miami. How could a convergence theorem shed light on that? Given the striking differences between traditional and Bayesian epistemology, are there reasons to prefer one to the other?

2(ii) Thesis: Bayesian epistemology is superior to traditional epistemology

Jeffrey (1992), a famous Bayesian, writes:
knowledge is sure, and there seems to be little we can be sure of outside logic and mathematics and truths related immediately to experience.
It is as if there were some propositions – that this paper is white, that two and two are four – on which we have a firm grip, while the rest, including most of the theses of science, are slippery or insubstantial or somehow inaccessible to us … The obvious move is to deny that the notion of knowledge has the importance generally attributed to it, and to try to make the concept of belief do the work that philosophers have generally assigned the grander concept. I shall argue that this is the right move. It becomes immediately clear that Jeffrey has in mind here degrees of belief, understood as subjective probabilities. He goes on to suggest two main benefits accrued by the Bayesian framework: 1. Subjective probabilities figure in decision theory, an account of how our opinions and our desires conspire to dictate what we should do. The desirability of each of our possible actions is measured by its expected utility, a probability-weighted sum of the utilities associated with that action. To complete the argument, we should add that traditional epistemology offers no decision theory (recall Descartes vs Pascal). Rational action surely cannot be analyzed purely in terms of the binary terms of knowledge and belief. 2. Observations rarely deliver certainties – rather, their effect is typically to raise our probabilities for certain propositions (and to drop our probabilities for others), without any reaching the extremes of 1 or 0. Traditional epistemology apparently has no way of accommodating such less-than-conclusive experiential inputs, whereas Jeffrey conditionalization is tailor-made to do so. We may continue the list of putative advantages of Bayesianism over traditional epistemology at some length. Here I add my top ten: 3. Implicit in the quote from Jeffrey is the thought that knowledge is unforgiving. Its standards are so high that they can rarely be met, at least in certain contexts.
(This is related to the fact that knowledge does not come in degrees – near-knowledge is not knowledge at all.) This in turn plays into the hands of skeptics. But it is harder for skeptical arguments to get a toehold against the Bayesian. For example, the mere possibility of error regarding some proposition X undermines a claim of knowledge regarding X, but it is innocuous from a probabilistic point of view: an agent can simply assign X some suitable probability less than 1. Indeed, even an assignment of probability 1 is consistent with the possibility of error—a dart thrown at random at a representation of the [0, 1] interval has probability 1 of hitting an irrational number, even though it might fail to do so. 4. Moreover, it is a commonplace that doxastic states come in degrees, and the categories of ‘belief’ and ‘knowledge’ are too coarse-grained to do justice to this fact. You believe, among other things, that 2 + 2 = 4, that you have a hand, that London is in England, and (say) that Khartoum is in Sudan. But you do not have the same confidence in all these propositions, as we can easily reveal in your betting behavior and other decision-making that you might engage in. The impoverished nature of ‘belief’ attributions is only exacerbated when we consider the wide range of propositions for which you have less confidence – that this coin will land heads, that it will rain tomorrow in Novosibirsk, and so on. We may conflate your attitudes to them all as ‘suspensions of belief’ (as Descartes would), but that belies their underlying structure. Such attitudes are better understood as subjective probabilities. 5. Related, the conceptual apparatus of deductivism is impoverished, and comparatively little of our reasoning can be captured by it, either in science or in daily life (pace Popper and Hempel). After all, whether we like it or not, our epistemic practices constantly betray our commitment to relations of support that fall short of entailment.
We think that it would be irrational to deny that the sun will rise tomorrow, to project ‘grue’ rather than ‘green’ in our inductions, and to commit the gambler’s fallacy. It seems that probability theory is required to understand such relations. 6. Bayesianism has powerful mathematical underpinnings. It can help itself to a century of work in probability theory and statistics. Traditional epistemology may appeal to the occasional system of epistemic or doxastic logic, but nothing comparable to the formidable formal machinery that we find in the Bayesian’s tool kit. 7. Accordingly, probabilistic methods have much wider application than any formal systematization of ‘knowledge’ or ‘belief’. Look at the sciences, social sciences, and engineering if you need any convincing of this. 8. Bayesianism has provided yeoman service in the rational reconstruction of science. I’m thinking here of Dorling’s and Franklin’s Bayesian reconstructions of some key scientific episodes – and obviously scientific knowledge is an important part of epistemology. Traditional epistemology does not seem to do so well here. Imagine trying to illuminate some scientific episode purely with ‘K’s and ‘B’s! 9. Bayesianism has a natural way of integrating explicitly probabilistic theories into epistemic states, via the Principal Principle. Thus, a chance statement from quantum mechanics, say, can be seamlessly transformed into a corresponding subjective probability statement. Likewise, if an agent spreads credences over a number of such theories, she can arrive at her credence for a particular statement by an exercise of the law of total probability. How can traditional epistemology capture this? 10. Bayesianism has a symbiotic relationship with causation – witness the fruitfulness of research on Bayesian networks. Traditional epistemologists should pay heed. 11. There are many arguments for Bayesianism, which collectively provide a kind of triangulation to it. 
For example, Dutch Book arguments provide an important defence of the thesis that rational credences are probabilities. An agent’s credences are identified with her betting prices; it is then shown that she is susceptible to sure losses iff these prices do not conform to Kolmogorov’s axioms. There are also arguments from various decision-theoretic representation theorems (Ramsey 19xx, Savage 19xx, Joyce 19xx), from calibration (van Fraassen 19xx, Shimony 19xx), from ‘gradational accuracy’ or minimization of discrepancy from truth (Joyce 19xx), from qualitative constraints on reasonable opinion (Cox 19xx), and so on. Moreover, there are various arguments in support of conditionalization and Jeffrey conditionalization – e.g., Dutch book arguments (Lewis 200x, Armendt 1980) and arguments from minimal revision of one’s credences (Diaconis and Zabell). Again, there is nothing comparable in traditional epistemology.

12. Finally, a pragmatic argument for Bayesianism comes from an evaluation of its fruits. It illuminates various old epistemological chestnuts – in particular, various paradoxes in confirmation theory. The Bayesian begins with the idea that confirmation is a matter of probability-raising:

(*) E confirms H (relative to probability function P) iff P(H | E) > P(H).

We may also define various probabilistic notions of comparative confirmation, and various measures of evidential support (see Fitelson xx). The Bayesian then shows how important intuitions about confirmation can be vindicated – for example, that black ravens confirm ‘all ravens are black’ more than white shoes do, or that green emeralds confirm ‘all emeralds are green’ more than they confirm ‘all emeralds are grue’. Again, it seems that no analysis couched purely in terms of ‘knowledge’ and ‘belief’ could pay such dividends.

But the traditional epistemologist has plenty of ammunition with which to fight back.

2(iii) Antithesis: Bayesian epistemology is not superior to traditional epistemology

1.
Bayesians introduce a new technical term, ‘degree of belief’, but they struggle to explicate it. To be sure, the literature is full of nods to betting interpretations, but these meet a fate similar to that of behaviorism – indeed, a particularly localized behaviorism that focuses solely on the rather peculiar kind of behavior that is mostly found at racetracks and casinos. Other characterizations of ‘degree of belief’ that fall out of decision-theoretic representation theorems are also problematic. (See Eriksson and Hájek 2007.) ‘Belief’ and ‘knowledge’, by contrast, are so familiar to the folk that they need no explication.

2. Recall the absence of any notion of truth of an intermediate degree of belief. Yet truth is said to be the very aim of belief. It is usually thought to consist in correspondence to the way things are. Moreover, we want our methods for acquiring beliefs to be reliable, in the sense of being truth-conducive. What is the analogous aim, notion of correspondence, and notion of reliability for the Bayesian? The terms of her epistemology seem to lack the success-grammar of these italicized words. For example, one can assign very high probability to the period at the end of this sentence being the creator of the universe without incurring any Bayesian sanction: one can do so while assigning correspondingly low probability to the period not being the creator, and while dutifully conditionalizing on all the evidence that comes in. Traditional epistemology is not so tolerant, and rightly not.

3. Relatedly, the Bayesian does not answer the skeptic, but merely ignores him. Bayesianism doesn’t make skeptical positions go away; it merely makes them harder to state.

4. The Bayesian similarly lacks a notion of ‘justification’ – or to the extent that she has one, it is too permissive. Any prior is a suitable starting point for a Bayesian odyssey – yet mere conformity to the probability calculus is scant justification.
Now, the Bayesian will be quick to answer this and the previous objections in a single stroke. She will appeal to various convergence theorems. But see my discussion above of the limitations of such theorems. And I don’t see how they help one iota in addressing simple skeptical challenges, such as how do I know, right now, that I have a hand?

5. Bayesian epistemology conflates genuine epistemic possibilities with impossibilities, and genuine epistemic necessities with contingencies. Probabilities have what I call ‘blurry vision’: notoriously, probability zero events can happen, and probability one events can fail to happen.

6. The triumphs of Bayesian confirmation theory touted above are supposedly offset by the so-called problem of old evidence (Glymour 1980). Note that if P(E) = 1, then E cannot confirm anything by the lights of (*): in that case, P(H ∩ E) = P(H), so P(H | E) = P(H ∩ E)/P(E) = P(H). Yet we often think that such ‘old evidence’ can be confirmatory. Consider the evidence of the advance of the perihelion of Mercury, which was known to Einstein at the time that he formulated general relativity theory, and thus (we may assume) was assigned probability 1 by him. Nonetheless, he rightly regarded this evidence as strongly confirmatory of general relativity theory. The challenge for Bayesians is to account for this. (See Branden’s 200x for discussion. Jim Joyce and Jim Hawthorne also have much to say about the problem.)

7. Still on Bayesian confirmation theory: when I gave above the usual Bayesian analysis (*) of confirmation, I followed the common practice of downplaying the relativity to the probability function ‘P’ by secreting it away in parentheses, as if it’s a minor caveat or afterthought. But really ‘P’ should be a completely equal partner in a 3-place relation of the form <H, E, P> – quite unlike ‘confirmation’ in ordinary English, which is a 2-place relation. A more accurate but less rhetorically effective name for the relation between E and H would be “P-enhancement”.¹
But said that way, one wants to revisit the putative Bayesian solutions to the ravens paradox, the grue paradox, and so on. For example, it seems less satisfying to be told that the evidence of a black raven P-enhances ‘all ravens are black’ more than the evidence of a white shoe does for some suitable P – especially since there are other probability functions for which this inequality is reversed. And more precisely, confirmation relations are relativised to a probability space, <Ω, F, P>. But ‘E confirms H relative to <Ω, F, P>’ hardly glides off the tongue, and sounds even less like the 2-place relation that we were trying to analyze.

¹ Dave Chalmers suggested this name.

8. More generally, all Bayesian claims are model-relative. For example, claims of independence, of confirmation, and of exchangeability are relative to a probability function, and again that really means a probability space, so that they are at least 5-place relations. Knowledge is not like that. We don’t say: ‘I know that I have a hand relative to such-and-such model’ – there is no mediation via a model. The traditional epistemologist protests that Bayesians thus distance themselves from the world: rather than hooking up directly with the world, they are caught up in models of the world. “Cut out the middle man!”, says the traditional epistemologist – there is too much distance between the epistemic agent and the world. Moreover, the Bayesian does not have a good theory of what makes a model good. This is related to the concern that the Bayesian does not do justice to truth. It also comes down to the notorious problem of the prior for the Bayesian. Traditional epistemology has no analogue of that problem.

9. Bayesians have trouble avoiding the terms of traditional epistemology. They slip into plausible informal glosses using familiar words like ‘learning’, ‘knowledge’, ‘belief’, ‘evidence’, ‘forgetting’, etc. One also sees this in the classic arguments for probabilism – e.g.
we are to assume that ‘the Dutch Bookie doesn’t KNOW anything that the agent doesn’t KNOW’. But the official Bayesian story is about ‘ideally rationally held certainties’, and our intuitions about that rather obscure, rarefied notion are less secure. The Bayesian changes the topic from things that we have a good grip on (‘knowledge’, etc.): first, to ‘probabilistic certainty’ (which is not even genuine certainty, as we have seen); second, it’s RATIONALITY, which is a term of art, not all that familiar on the street; third, it’s IDEALLY rationally held certainty, and who knows what that is? By the time we’re done, we’re a long way from ‘learning’, ‘knowledge’, etc. – the core epistemological concepts that we understood well. Don’t get fooled, then, by the informal gloss when judging some Bayesian claim (“the agent comes to learn X …”). Keep in mind the strict interpretation of rational credence, and so on. Even Jeffrey, the author of “Probable Knowledge”, backslides and uses the terms of traditional epistemology, having claimed to eschew them. For example, he speaks of “learning”. But “learning” has a success grammar; merely pumping up or pumping down subjective probabilities across a partition, à la Jeffrey conditioning, does not guarantee any such success. Similarly, Bayesians sometimes acknowledge a source of contextualism by conditionalizing on background knowledge ‘K’. But this arguably mixes the traditional and Bayesian epistemologies in an awkward way – what is the Bayesian treatment of that ‘K’? If it is merely ‘rationally held probabilistic certainty’, will that meet all the epistemologist’s needs?

Synthesis

[Again, time allowing, I expect to add more.]

There should be more cross-fertilization between traditional and Bayesian epistemology. It’s not as if one must swear allegiance to one of the epistemologies and shun the other. Indeed, a single philosophical paper could fruitfully traffic in both.
But often it does seem that research programs on each side run in comparative isolation and ignorance of each other. Let me draw attention to some issues that play a comparatively minor role in traditional epistemology, but which are central in the Bayesian framework – more power to that framework for drawing our attention to them. Traditional epistemologists would do well to find ways of embracing them. (If that requires them to become Bayesians, more power to Bayesianism!) But as we will see, the traditional epistemologist has some lessons for the Bayesian as well.

1. Bayesians offer a new notion of consistency: probabilistic coherence. We could argue about the extent to which it merely generalizes deductive logic’s notion of consistency, or rather offers a genuinely new notion, but either way it is important. See Jim Hawthorne on related issues: while in deductive logic, consistency goes hand-in-hand with truth-preservation, the two notions cleave apart in inductive logic.

2. Some Bayesians codify a kind of diachronic consistency, most famously in van Fraassen’s Reflection Principle. He analogizes violations of Reflection to Moore’s paradoxical sentences, something the traditional epistemologist has long cared about. I am not aware of any analogue of Reflection in traditional epistemology, which is less interested in diachronic principles as far as I can tell.

3. Bayesianism gives us a new notion of independence that does not seem to reduce to other such notions – logical, causal, counterfactual, metaphysical, or what have you. Yet we do have firm intuitions about some cases of probabilistic independence – think of the way we are supposed to recoil at the gambler’s fallacy. There are obvious ramifications for induction here. Now there’s a traditional epistemological concern if ever there was one!

4. Exchangeability has proved to be a fecund notion in the Bayesian’s hands.
In particular, it has earned its keep in formulating and illuminating problems in induction – more grist for the traditional epistemologist’s mill.

Now, going in the other direction: how can traditional epistemology inform Bayesian epistemology? One way is in the very statement of Bayesianism. We all know that it portrays rational credences as obeying the probability calculus – but what is that exactly? We all know that one of its axioms is a normalization axiom – but what is that exactly? On one formulation, it says that all logical truths should receive probability 1. But why stop there? How about a priori truths more generally? So perhaps the normalization axiom could require all a priori truths to be assigned probability 1 by an agent. Here we enter traditional epistemology’s concern with knowledge of the a priori.

I think the most important way in which traditional epistemology should inform Bayesian epistemology is by emphasizing the importance of truth of epistemic states. I see three ways the Bayesian might try to respect the desideratum of alignment with truth:

- À la Joyce: gradational accuracy
- À la Carnap: agreement with logical probability
- À la me: agreement with objective chance. (Elsewhere I argue in favor of this.)

I’d be interested in our discussion to hear your thoughts on how traditional epistemology in turn can further inform and guide Bayesian epistemology. May the dialectic continue!²

² I thank Dave Chalmers for helpful discussion at some idyllic Caribbean locales, and Lina Eriksson for helpful discussion in the rather less exotic Canberra.
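
The probability-raising analysis (*) and two of the worries raised in the antithesis (old evidence, and the relativity of confirmation to P) can be checked on a toy example. What follows is only a minimal sketch under stated assumptions: the four-world space, the numerical weights, and the names `prob`, `conditional`, and `confirms` are hypothetical illustrations and are not drawn from any of the works cited in the talk.

```python
from fractions import Fraction

# Hypothetical toy space: a world is a pair (H holds?, E holds?).
# All weights below are made up purely for illustration.

def prob(P, event):
    """Probability of an event (a predicate on worlds) under distribution P."""
    return sum(p for world, p in P.items() if event(world))

def conditional(P, a, b):
    """P(a | b) via the ratio formula; undefined when P(b) = 0."""
    pb = prob(P, b)
    if pb == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(P, lambda w: a(w) and b(w)) / pb

def confirms(P, e, h):
    """(*): E confirms H relative to P iff P(H | E) > P(H)."""
    return conditional(P, h, e) > prob(P, h)

H = lambda w: w[0]
E = lambda w: w[1]

# A credence function on which E raises the probability of H.
P1 = {(True, True): Fraction(4, 10), (True, False): Fraction(1, 10),
      (False, True): Fraction(1, 10), (False, False): Fraction(4, 10)}

# Another coherent credence function on which the same E lowers it.
P2 = {(True, True): Fraction(1, 10), (True, False): Fraction(4, 10),
      (False, True): Fraction(4, 10), (False, False): Fraction(1, 10)}

# 'Old evidence': E already has probability 1.
P_old = {(True, True): Fraction(3, 10), (True, False): Fraction(0),
         (False, True): Fraction(7, 10), (False, False): Fraction(0)}

print(confirms(P1, E, H))     # E confirms H relative to P1
print(confirms(P2, E, H))     # but not relative to P2
print(confirms(P_old, E, H))  # and old evidence confirms nothing
```

On `P1` the evidence confirms H, on `P2` the very same evidence fails to, and on `P_old`, where P(E) = 1, P(H | E) equals P(H), so E confirms nothing by the lights of (*). Exact fractions are used so that the equality in the old-evidence case is not blurred by floating-point rounding.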