Truth-conduciveness Without Reliability: A Skeptical Derivation of Ockham’s Razor
Kevin T. Kelly
Department of Philosophy
Carnegie Mellon University
www.cmu.edu

Naivete
Lo! An apple.

Skeptical Hypothesis
Lo! An apple.
Maybe you are a brain in a vat. Everything would look the same.

Retrenchment
That’s not a serious possibility.
You have the burden of proof.
It’s remote.
It’s implausible.
It’s distant from the actual world.
You’re not in my community.
Who cares about the worst case?

Unsatisfying
Possibilities delimited a priori: circular account.
Possibilities delimited a posteriori: how do we seek knowledge?

Zen Approach
Don’t rush to defeat the demon.
Get to know him extremely well.
Justification may be located in the demon’s power rather than in his weakness.

The Zen of Computation
Algorithms are justified by efficiency.
Efficiency means you couldn’t do better.
You couldn’t do better due to a demonic argument (the halting problem, etc.).

Scientific Theory Choice
Which theory is true?

Ockham Says:
Choose the Simplest!

Skeptical Hypothesis
Maybe a complex theory is true but the data are simple.

Puzzle
An indicator must be sensitive to what it indicates.
But Ockham’s razor always points at simplicity.
[Figure: an indicator needle between "simple" and "complex"]

Meno
If we know that the truth is simple, we don’t need Ockham’s razor.
If we don’t know that the truth is simple, what good is Ockham’s razor?

Some Standard Responses

Simple Theories are Virtuous
Testable (Popper, Glymour)
Unified (Friedman, Kitcher)
Explanatory (Harman)
Symmetrical (Malament)
Compress data (Rissanen)
Interesting (Vitanyi)

But the Truth Might Not be Virtuous
To conclude that a theory is true because it is virtuous is wishful thinking (van Fraassen).

Overfitting (Akaike, Sober, Forster)
Empirical estimates based on complex models have greater mean squared distance from the truth.
[Figure: estimates scattered around the truth, with and without a clamp; labels: Truth, clamp]

Does Not Aim at True Theory
...even if the simple theory is known to be false…

Miracle Argument (Putnam, Rosenkrantz)
Simple data would be a miracle in a complex world.
Simple data would be expected in a simple world.
[Figure: planetary retrograde motion; labels: Earth, Sun, Mars. Complex theory: epicycle with parameter q or q’. Simple theory: lapping.]

However…
Simple data would not be a miracle if the complex theory’s parameter were set near q.

The Real Miracle
Ignorance about model: p(S) ≈ p(C);
+ Ignorance about parameter settings within theories: p(C(q) | C) ≈ p(C(q’) | C).
= Knowledge about parameter settings across theories: p(C(q)) << p(S).
Is it knognorance or ignoredge?

The Ellsberg Paradox
Frequencies: 1/3, ?, ?
An urn holds balls of 3 colors with these frequencies.

The Ellsberg Paradox
Bets p, q, r on the three colors, with frequencies 1/3, ?, ?.
Human betting preferences: p > q, but (p or r) < (q or r)!

Diagnosis
p (frequency 1/3): knowledge.
q, r (frequencies ?, ?): ignorance.

Robust Bayesianism (Levi, Kadane, Seidenfeld)
Credence is a range of probabilities.
Knowledge: p = 1/3 in every distribution.
Ignorance: (q, r) ranges over (0, 2/3), ..., (1/3, 1/3), ..., (2/3, 0).
Choose the act with the highest worst-case expected value.

Worst-case Expected Values
Bet on p vs. bet on q: 1/3 > 0.
Bet on p or r vs. bet on q or r: 1/3 < 2/3.

Whither Ockham?
Since you don’t really know that complex worlds won’t produce simple data, shouldn’t your ignorance include distributions concentrated on such possibilities?
I prefer ignoredge.

In Any Event
The coherentist foundations of Bayesianism have nothing to do with short-run truth-conduciveness.

Temptation
If only the probabilities p(C(q’) | C) were chances rather than opinions, then the alleged miracle would be a proper miracle.

Proof of God (R. Koons 1999)
1. Natural chance is determined by the fundamental theory of natural chance.
2. If Ockham’s razor reliably infers the theory of natural chance, the chance that a complex theory of natural chance would have its parameters set to produce simple data must be low.
3. But since natural chance is determined by the free parameters of the fundamental theory of natural chance, the parameter setting is not governed by natural chance.
4. Hence, it must be governed by non-natural chance.
5. Holy water is available at the exit.

Moral
The basic point is right.
Solution:
1. Keep naturalism.
2. Keep fundamental scientific knowledge.
3. Dump short-run reliability as explication of truth-conduciveness.

Externalist Magic
Simplicity informs via hidden causes or tracking mechanisms.
[Figure: Simple linked to B(Simple) in three panels: Leibniz, evolution (with node G); Kant; Ouija board]

With Friends Like Those…
Practice and data are the same.
Knowledge vs. non-knowledge depends on hidden causes.
By Ockham’s razor, better to explain Ockham’s razor without the hidden causes.

The Last Gasp: Convergence
Bayes (washing out of the prior)
BIC (Schwarz)
Structural Risk Minimization (Vapnik, Harman)
TETRAD (Spirtes, Glymour, Scheines)
[Figure: convergence to the truth; labels: truth, Complexity]

Logic is Backwards
Ockham methods are sufficient for convergence.
But every finite variant of a convergent method converges (Salmon).
So Ockham’s razor is not necessary for convergence.
[Figure labels: truth, Alternative ranking]

Truth Conduciveness
Reliability (indication or tracking): too strong. Circles or magic required.
Convergence: too weak. Doesn’t single out simplicity.
“Straightest” convergence: just right?
[Figure: paths between Simple and Complex for each notion]

Truth-conduciveness as Straightest Convergence
[Figure labels: Simple, Complex]

Ancient Roots
"Living in the midst of ignorance and considering themselves intelligent and enlightened, the senseless people go round and round, following crooked courses, just like the blind led by the blind."
Katha Upanishad, I. ii. 5, c. 600 BCE.

Retraction
New output does not entail previous output.
[Figure: retracted content between the outputs at times t and t+1]

Eliminate Needless Retractions
[Figure label: Truth]

Necessary Retractions are Virtuous
[Figure label: Truth]

Demon’s Role as Justifier
I can force every convergent method to retract this often, so your retractions are justified by my power.

Eliminate Needless Delays to Retractions
[Figure: a theory with its corollaries and applications accumulating over time; labels: theory, corollary, application]

Easy Comparisons
Retractions at least as bad = at least as many retractions, each at least as late.
[Figure: retractions marked along a time axis]

Worst-case Retraction Time Bounds
(1, 2, ∞) ...

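The Retraction and Easy Comparisons slides can be made concrete with a small Python sketch. It assumes a method's outputs are given as a finite list of strings, with "?" standing for "no answer yet"; a retraction is counted whenever an informative answer is replaced by anything different. This is a simplification of "new output does not entail previous output", and the function names are hypothetical, not taken from the talk.

```python
from typing import List, Sequence

NO_ANSWER = "?"  # assumed marker for an uninformative output


def retraction_times(outputs: Sequence[str]) -> List[int]:
    """Stages at which an informative answer is dropped or replaced."""
    times = []
    for stage in range(1, len(outputs)):
        previous, current = outputs[stage - 1], outputs[stage]
        # Moving from "?" to an answer adds content and retracts nothing.
        if previous != NO_ANSWER and current != previous:
            times.append(stage)
    return times


def at_least_as_bad(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a has at least as many retractions as b, each at least as late.

    Both arguments are ascending lists of retraction times. The latest
    len(b) entries of a are the best candidates for dominating b pointwise.
    """
    if len(a) < len(b):
        return False
    tail = a[len(a) - len(b):]
    return all(x >= y for x, y in zip(tail, b))


example_outputs = ["?", "T0", "T0", "T1", "?", "T2"]
example_times = retraction_times(example_outputs)
```

Because the comparison is only a partial order, two retraction records can be incomparable: [1, 5] is not at least as bad as [2, 3], and [2, 3] is not at least as bad as [1, 5].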
Empirical Complexity
Hopeless ideas:
Syntactic length
Computational incompressibility
By what miracle do notational conventions indicate truth?

Empirical Complexity
Close but no cigar:
Free parameters
Broken symmetries
Meno, I want simplicity itself, not parts of simplicity.

Empirical Complexity
Empirical complexity of T in G = the length of the maximum path (T1, …, Tn, T) of answers in G the demon can force from an arbitrary convergent method.
[Figure: the demon forcing a path T1, T2, T3, T]

Polynomial Order
Data = open intervals around Y at rational values of X.
Demon shows flat line until convergent method takes the bait (zero degree curve).
Then switches to tilted line until convergent method takes the bait (first degree curve).
Then switches to parabola until convergent method takes the bait … (second degree curve).

Complexity can be Complex
[Figure: complexity given e; theories T2, T3, T4, T5, T7, T8 arranged on complexity levels 0 to 3]

Complexity Relative to Data
[Figure: complexity given e + e’; the remaining theories T2, T4, T5, T7 rearranged on complexity levels 0 to 3]

Timed Retraction Bounds
r(M, e, n) = the least timed retraction bound for worlds satisfying theories of complexity n and producing finite input history e.
[Figure: bounds for M plotted against empirical complexity 0, 1, 2, 3, ...]

M is Efficient at e
For each convergent M’ that agrees with M along finite input history e, for each complexity n:
r(M, e, n) ≤ r(M’, e, n).
[Figure: bounds for M and M’ plotted against empirical complexity 0, 1, 2, 3, ...]

M is Strongly Beaten at e
There exists convergent M’ that agrees with M up to the end of e, such that for each complexity n:
r(M, e, n) > r(M’, e, n).
[Figure: bounds for M and M’ plotted against empirical complexity 0, 1, 2, 3, ...]

M is Weakly Beaten at e
There exists convergent M’ that agrees with M up to the end of e, such that:
for each n, r(M, e, n) ≥ r(M’, e, n);
for some n, r(M, e, n) > r(M’, e, n).
[Figure: bounds for M and M’ plotted against empirical complexity 0, 1, 2, 3, ...]

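The three definitions above can be compared side by side in a short Python sketch. It rests on several assumptions that are not in the slides: a bound r(M, e, n) is stored as an ascending tuple of retraction times (math.inf for "arbitrarily late"), a method is summarized by a dictionary from complexity n to its bound, and the quantifier over all convergent rivals M’ is replaced by a finite list of rival profiles. The three example profiles are toy values invented for illustration.

```python
import math
from typing import Dict, Iterable, Tuple

Bound = Tuple[float, ...]   # ascending retraction times, math.inf allowed
Profile = Dict[int, Bound]  # complexity n -> bound r(M, e, n)


def leq(a: Bound, b: Bound) -> bool:
    """a <= b: b has at least as many retractions, each at least as late."""
    if len(b) < len(a):
        return False
    tail = b[len(b) - len(a):]
    return all(y >= x for x, y in zip(a, tail))


def lt(a: Bound, b: Bound) -> bool:
    """Strict version of leq."""
    return leq(a, b) and not leq(b, a)


def is_efficient(m: Profile, rivals: Iterable[Profile]) -> bool:
    """r(M, e, n) <= r(M', e, n) for every listed rival and every n."""
    return all(leq(m[n], r[n]) for r in rivals for n in m)


def strongly_beaten(m: Profile, rivals: Iterable[Profile]) -> bool:
    """Some rival is strictly better in every complexity class."""
    return any(all(lt(r[n], m[n]) for n in m) for r in rivals)


def weakly_beaten(m: Profile, rivals: Iterable[Profile]) -> bool:
    """Some rival is never worse and strictly better in some class."""
    return any(
        all(leq(r[n], m[n]) for n in m) and any(lt(r[n], m[n]) for n in m)
        for r in rivals
    )


# Toy profiles (hypothetical values, all over the same complexity classes).
ockham = {0: (), 1: (math.inf,), 2: (math.inf, math.inf)}
violator = {0: (5.0,), 1: (5.0, math.inf), 2: (5.0, math.inf, math.inf)}
staller = {0: (), 1: (math.inf,), 2: (7.0, math.inf, math.inf)}
```

In this toy setup the violator carries one extra early retraction in every class, so it is strongly beaten; the staller matches the Ockham profile except in class 2, so it is only weakly beaten.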
Demons for Ockham

Ockham’s Razor
Don’t select a theory unless it is uniquely simplest in light of experience.
[Figure: with T2, T5, T7, T4 spread over complexity levels 0 to 3, the output is "?"; once T7 stands alone at level 0, the output is T7]

Stalwartness
Don’t retract your answer while it remains uniquely simplest.
[Figure: T7 remains alone at level 0 and the output stays T7, T7]

Argument Sketch
No matter what convergent M has done in the past, nature can force M to produce each answer down an arbitrary effect path, arbitrarily often.
Nature can also force violators of Ockham’s razor or stalwartness either into an extra retraction or a late retraction in each complexity class.

Ockham Efficiency Theorem
Let M converge to the true theory in problem P. The following are equivalent:
M is always Ockham and stalwart in P;
M is always efficient in P;
M is never weakly beaten in P.

Policy Retractions
"Many explanations have been offered to make sense of the here-today-gone-tomorrow nature of medical wisdom — what we are advised with confidence one year is reversed the next — but the simplest one is that it is the natural rhythm of science."
(Do We Really Know What Makes Us Healthy?, NY Times Magazine, Sept. 16, 2007)

Causal Inference
Causal graph theory: more correlations, more causes.
[Figure: partial correlations S mapped to a causal graph G(S)]
Idealized data = list of conditional dependencies discovered so far.
Anomaly = the addition of a conditional dependency to the list.

Causal Axioms (Pearl, Glymour)
1. Screening off: X is statistically independent of its non-descendants given its parents.
2. No invisible causes: The only true independence relations are those entailed by condition 1.
[Figure: node X with parents P1, P2, non-descendants N1, N2, and descendant D]

Forcible Sequence of Causal Theories
[Figure: a sequence of causal graphs over X1, X2, X3, W in which nodes Y1, Y2, then Y3, Y4, then Y5 are added step by step]

Moral
In counterfactual prediction, form of model matters and retractions are unavoidable.
Ockham efficiency agrees very closely with best contemporary practice.
Maybe that’s all there is to it.

Conclusions
Ockham’s razor is necessary for staying on the straightest path to the truth.
It does not reliably point at or indicate the truth.
It demonstrably works without circles, evasions, or magic.
Such a theory is motivated in counterfactual inference and estimation.
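As a closing illustration of the "straightest path" claim, here is a toy Python simulation of the demon's strategy from the Polynomial Order slides. It is a sketch under strong assumptions, not the construction used in the talk: the data are abstracted to a count of effects seen so far (standing in for polynomial degree), a method is any function from (effects seen, stage) to a conjectured complexity, and the demon reveals the next effect only after the method takes the bait. All names are hypothetical.

```python
from typing import Callable, List

Method = Callable[[int, int], int]  # (effects_seen, stage) -> conjecture


def ockham(effects_seen: int, stage: int) -> int:
    """Conjecture the simplest complexity compatible with the data."""
    return effects_seen


def make_violator(patience: int) -> Method:
    """A method that at first posits one effect it has not yet seen."""
    def violator(effects_seen: int, stage: int) -> int:
        if effects_seen == 0 and stage < patience:
            return 1  # leaps ahead of the data
        return effects_seen  # eventually falls back, as convergence requires
    return violator


def demon_run(method: Method, true_complexity: int,
              max_stages: int = 1000) -> List[int]:
    """Outputs of `method` when the demon withholds effects until baited."""
    effects = 0
    outputs = []
    for stage in range(max_stages):
        guess = method(effects, stage)
        outputs.append(guess)
        if guess == effects and effects < true_complexity:
            effects += 1  # method took the bait: reveal the next effect
        elif guess == effects == true_complexity:
            break  # method has reached the truth
    return outputs


def count_retractions(outputs: List[int]) -> int:
    """Every output here is informative, so each change is a retraction."""
    return sum(1 for a, b in zip(outputs, outputs[1:]) if a != b)


ockham_outputs = demon_run(ockham, true_complexity=2)
violator_outputs = demon_run(make_violator(patience=3), true_complexity=2)
```

In this toy run the Ockham method retracts once per effect, which the demon can force from any convergent method, while the violator pays one additional retraction when it is made to back down from its premature conjecture.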