Truth-conduciveness Without Reliability: A Skeptical Derivation of Ockham’s Razor

Kevin T. Kelly
Department of Philosophy
Carnegie Mellon University
www.cmu.edu
Naivete
Lo! An apple.
Skeptical Hypothesis
Lo! An apple.
Maybe you are a brain in a vat.
Everything would look the same.
Retrenchment
That’s not a serious possibility.
You have the burden of proof.
It’s remote.
It’s implausible.
It’s distant from the actual world.
You’re not in my community.
Who cares about the worst case?
Unsatisfying


Possibilities delimited a priori: circular account.
Possibilities delimited a posteriori: how do we seek knowledge?
So there!
Zen Approach



Don’t rush to defeat the demon.
Get to know him extremely well.
Justification may be located in the demon’s power rather than in his weakness.
The Zen of Computation



Algorithms are justified by efficiency.
Efficiency means you couldn’t do better.
You couldn’t do better due to a demonic argument (the halting problem, etc.).
Scientific Theory Choice
Which theory is true?
Ockham says: Choose the simplest!
Skeptical Hypothesis
Maybe a complex theory is true but the data are simple.
Puzzle

An indicator must be sensitive to what it indicates.
But Ockham’s razor always points at simplicity, whether the truth is simple or complex.
Meno

If we know that the truth is simple, we don’t need Ockham’s razor.
If we don’t know that the truth is simple, what good is Ockham’s razor?
Some Standard Responses
Simple Theories are Virtuous






Testable (Popper, Glymour)
Unified (Friedman, Kitcher)
Explanatory (Harman)
Symmetrical (Malament)
Compress data (Rissanen)
Interesting (Vitanyi)
But the Truth Might Not be Virtuous

To conclude that a theory is true because it is
virtuous is wishful thinking (van Fraassen).
Overfitting (Akaike, Sober, Forster)

Empirical estimates based on complex models have greater mean squared distance from the truth.
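The overfitting claim can be checked on a toy case. The sketch below is only an illustration under stated assumptions, not the analysis of Akaike, Sober, or Forster: the true curve is a hypothetical flat line at 3.0, the noise is Gaussian, the simple model fits a constant, and the complex model fits a straight line by ordinary least squares. All function names are made up for this example.

```python
import random

def fit_constant(xs, ys):
    """Simple model: estimate the curve by the sample mean of y."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Complex model: ordinary least squares straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: mean_y + slope * (x - mean_x)

def mean_squared_distance(fit, truth, xs):
    """Average squared gap between the fitted curve and the true curve."""
    return sum((fit(x) - truth(x)) ** 2 for x in xs) / len(xs)

def average_distances(trials=2000, n=20, noise=1.0, seed=0):
    rng = random.Random(seed)
    truth = lambda x: 3.0                      # assumed true curve (flat line)
    xs = [i / (n - 1) for i in range(n)]       # fixed design points in [0, 1]
    total_simple = total_complex = 0.0
    for _ in range(trials):
        ys = [truth(x) + rng.gauss(0.0, noise) for x in xs]
        total_simple += mean_squared_distance(fit_constant(xs, ys), truth, xs)
        total_complex += mean_squared_distance(fit_line(xs, ys), truth, xs)
    return total_simple / trials, total_complex / trials

simple_mse, complex_mse = average_distances()
print("constant fit:", simple_mse)
print("line fit:    ", complex_mse)
```

In this setup the extra slope parameter can only add estimation error, so the line fit lands farther from the truth on average even though it fits the sample at least as well.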
Does Not Aim at True Theory

...even if the simple theory is known to be false…
Four eyes!
clamp
Miracle Argument (Putnam, Rosenkrantz)
Simple data would be a miracle in a complex world.
Simple data would be expected in a simple world.
Miracle Argument
Planetary retrograde motion
[Figure: Earth, Sun, and Mars. The complex theory accounts for retrograde motion with an epicycle whose parameter is set to q or q’; the simple theory accounts for it by lapping.]
However…
Simple data would not be a miracle if the complex theory’s parameter were set near q.
The Real Miracle
Ignorance about model:
p(S) ≈ p(C);
+ Ignorance about parameter settings within theories:
p(C(q) | C) ≈ p(C(q’) | C).
= Knowledge about parameter settings across theories:
p(C(q)) << p(S).
Is it knognorance or ignoredge?
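The arithmetic behind this slide can be made concrete. The sketch below is an illustration under assumptions that are not in the slides: the two models get exactly equal prior weight, and the complex theory has k equally weighted parameter settings. The function name `prior_over_worlds` is hypothetical.

```python
from fractions import Fraction

def prior_over_worlds(k):
    """Combine two 'ignorance' priors and return p(S) and p(C(q))."""
    p_S = Fraction(1, 2)              # ignorance about model: p(S) = p(C)
    p_C = Fraction(1, 2)
    p_q_given_C = Fraction(1, k)      # ignorance within C: flat over k settings
    p_Cq = p_C * p_q_given_C          # prior on the single complex world C(q)
    return p_S, p_Cq

p_S, p_Cq = prior_over_worlds(k=1000)
print("p(S)    =", p_S)
print("p(C(q)) =", p_Cq)
print("ratio   =", p_S / p_Cq)
```

Two flat priors, multiplied together, already favor S over any particular C(q) by a factor of k. That bias is the "knowledge" the slide points at.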
The Ellsberg Paradox
Urn with 3 ball colors p, q, r with these frequencies: 1/3, ?, ?
Human betting preferences:
p > q
p or r < q or r (!)
Diagnosis
p has frequency 1/3: knowledge.
q and r have frequencies ?, ?: ignorance.
Robust Bayesianism (Levi, Kadane, Seidenfeld)
Credence is a range of probabilities:
p      q      r
1/3    0      2/3
...
1/3    1/3    1/3
...
1/3    2/3    0
Knowledge: p. Ignorance: q, r.
Choose the act with highest worst-case expected value.
Worst-case Expected Values
Frequencies: p = 1/3, q = ?, r = ?
Bet on p (1/3) > bet on q (0)
Bet on p or r (1/3) < bet on q or r (2/3)
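The worst-case values can be recomputed directly. This is a minimal sketch, assuming a unit payoff for a winning draw and a finite grid of candidate distributions standing in for the full range of probabilities. The names `CREDAL_SET` and `worst_case_value` are hypothetical.

```python
from fractions import Fraction

THIRD = Fraction(1, 3)

# Range of probabilities: p is known to be 1/3, q runs from 0 to 2/3,
# and r takes up the remainder. The grid step of 1/30 is an assumption.
CREDAL_SET = [
    {"p": THIRD, "q": Fraction(k, 30), "r": Fraction(2, 3) - Fraction(k, 30)}
    for k in range(0, 21)
]

def worst_case_value(winning_colors):
    """Lowest expected value of a unit bet over every distribution in the set."""
    return min(sum(dist[c] for c in winning_colors) for dist in CREDAL_SET)

for bet in (["p"], ["q"], ["p", "r"], ["q", "r"]):
    print(" or ".join(bet), "->", worst_case_value(bet))
```

Ranking the bets by these values gives p over q and q-or-r over p-or-r, which is the pattern of human preferences on the Ellsberg slides.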
Whither Ockham?
Since you don’t really know that complex worlds won’t produce simple data, shouldn’t your ignorance include distributions concentrated on such possibilities?
I prefer ignoredge.
In Any Event
The coherentist foundations of Bayesianism have
nothing to do with short-run truth-conduciveness.
Temptation
If only the probabilities p(C(q’) | C) were chances rather than opinions. Then the alleged miracle would be a proper miracle.
Proof of God (R. Koons 1999)
1. Natural chance is determined by the fundamental theory
of natural chance.
2. If Ockham’s razor reliably infers the theory of natural
chance, the chance that a complex theory of natural
chance would have its parameters set to produce simple
data must be low.
3. But since natural chance is determined by the free
parameters of the fundamental theory of natural chance,
the parameter setting is not governed by natural chance.
4. Hence, it must be governed by non-natural chance.
5. Holy water is available at the exit.
Moral
The basic point is right.
Solution:
1. Keep naturalism
2. Keep fundamental scientific knowledge
3. Dump short-run reliability as explication of
truth-conduciveness.
Externalist Magic

Simplicity informs via hidden causes or tracking mechanisms.
[Figure: three pictures linking Simple to B(Simple): Leibniz, evolution (through G); Kant; Ouija board]
With Friends Like Those…



Practice and data are the same.
Knowledge vs. non-knowledge depends on hidden
causes.
By Ockham’s razor, better to explain Ockham’s razor
without the hidden causes.
The Last Gasp: Convergence
Bayes (washing out of the prior)
BIC (Schwarz)
Structural Risk Minimization (Vapnik, Harman)
TETRAD (Spirtes, Glymour, Scheines)
truth
Complexity
Logic is Backwards



Ockham methods are sufficient for convergence.
But every finite variant of a convergent method converges (Salmon).
So Ockham’s razor is not necessary for convergence.
truth
Alternative ranking
Truth Conduciveness

Reliability (indication or tracking)
Too strong: circles or magic required.

Convergence
Too weak: doesn’t single out simplicity.

“Straightest” convergence
Just right?

Truth-conduciveness as Straightest Convergence
Ancient Roots
"Living in the midst of ignorance and
considering themselves intelligent and
enlightened, the senseless people go round
and round, following crooked courses, just
like the blind led by the blind." Katha
Upanishad, I. ii. 5, c. 600 BCE.
Retraction

New output does not entail previous output.
[Figure: content retracted between times t and t+1]
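A minimal sketch of the definition above, assuming each output is modeled as the set of theories it leaves open, so that one output entails another exactly when it is a subset of it. The theory names and the function `retraction_times` are hypothetical.

```python
def entails(new_output, old_output):
    """A set of live theories entails another when it is a subset of it."""
    return set(new_output) <= set(old_output)

def retraction_times(outputs):
    """Times t+1 at which the new output fails to entail the previous one."""
    return [
        t + 1
        for t in range(len(outputs) - 1)
        if not entails(outputs[t + 1], outputs[t])
    ]

# Narrowing {T1, T2} down to {T1} retracts nothing; switching to {T2} does.
outputs = [{"T1", "T2"}, {"T1"}, {"T2"}, {"T2"}]
print(retraction_times(outputs))
```

On this reading, merely strengthening an answer is free, and only taking content back counts against the method.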
Eliminate Needless Retractions
Truth
Necessary Retractions are Virtuous
Truth
Demon’s Role as Justifier
Truth
I can force every convergent method to retract this often, so your retractions are justified by my power.
Eliminate Needless Delays to Retractions
[Figure: a theory with its corollaries and applications]
Easy Comparisons
At least as bad = at least as many retractions, at least as late.
[Figure: retractions plotted against time]
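One way to read the comparison above is sketched below. It assumes a method's record is the sorted list of times at which it retracted, and that record A is at least as bad as record B when every retraction in B can be paired with a distinct retraction in A that is at least as late. The function name `at_least_as_bad` is hypothetical, and this reading is an assumption rather than the talk's official definition.

```python
def at_least_as_bad(times_a, times_b):
    """True if A has at least as many retractions as B, each at least as late."""
    a, b = sorted(times_a), sorted(times_b)
    if len(a) < len(b):
        return False
    # If any pairing works, pairing B with the latest len(b) entries of A works.
    tail = a[len(a) - len(b):]
    return all(late >= early for late, early in zip(tail, b))

print(at_least_as_bad([2, 5, 9], [3, 8]))   # more retractions, and later
print(at_least_as_bad([1, 2], [3]))         # more retractions, but earlier
print(at_least_as_bad([4], [1, 2]))         # fewer retractions
```

The second and third cases are not comparable in either direction, which is why only some comparisons are "easy".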
Worst-case Retraction Time Bounds
(1, 2, ∞)
...
...
Empirical Complexity
Hopeless ideas:
Syntactic length
Computational incompressibility
By what miracle do notational conventions indicate truth?
Empirical Complexity
Close but no cigar:
Free parameters
Broken symmetries
Meno, I want simplicity itself, not parts of simplicity.
Empirical Complexity
Empirical complexity of T in G = the length of the maximum path (T1, …, Tn, T) of answers in G the demon can force from an arbitrary convergent method.
Keep up!
[Figure: path T1, T2, T3, T]
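The definition above amounts to a longest-path computation once the forcible sequences are known. The sketch below takes that relation as given and acyclic, which is an assumption: `FORCIBLE_NEXT` is a hypothetical graph whose edges say which answer the demon can force right after which. Path length is counted here as the number of forced answers before T, so a theory with no forcible predecessors has complexity 0.

```python
from functools import lru_cache

# Hypothetical forcibility graph matching the path T1, T2, T3, T.
FORCIBLE_NEXT = {"T1": ["T2"], "T2": ["T3"], "T3": ["T"], "T": []}

def empirical_complexity(theory, graph):
    """Length of the longest forcible path of answers leading up to `theory`."""
    predecessors = {
        t: [a for a, following in graph.items() if t in following]
        for t in graph
    }

    @lru_cache(maxsize=None)
    def longest_path_into(t):
        # A theory with no forcible predecessor sits at the bottom, at level 0.
        return max((1 + longest_path_into(a) for a in predecessors[t]), default=0)

    return longest_path_into(theory)

for name in FORCIBLE_NEXT:
    print(name, empirical_complexity(name, FORCIBLE_NEXT))
```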
Polynomial Order

Data = open intervals around Y at rational values of X.
Demon shows flat line (zero degree curve) until convergent method takes the bait.
Then switches to tilted line (first degree curve) until convergent method takes the bait.
Then switches to parabola (second degree curve) until convergent method takes the bait …
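The demon's strategy can be imitated in a heavily abstracted form. The sketch below is an assumption-laden toy, not the talk's construction: the interval data are replaced by a single number per stage, the highest polynomial degree the data have revealed so far, and the demon raises that number only after the method has output it. The methods `ockham` and `leaper` and the `patience` parameter are hypothetical.

```python
def demon_run(method, max_degree, horizon=20):
    """Feed `method` a data stream chosen by the demon and return its outputs."""
    visible = 0        # highest degree the data have revealed so far
    history = []       # abstract data: the visible degree at each stage
    outputs = []
    for _ in range(horizon):
        history.append(visible)
        guess = method(history)
        outputs.append(guess)
        # The demon waits until the method takes the bait, then moves on.
        if guess == visible and visible < max_degree:
            visible += 1
    return outputs

def ockham(history):
    """Output the lowest degree compatible with the data."""
    return history[-1]

def leaper(history, patience=3):
    """Guess one degree too high until the data have stayed put for a while."""
    current = history[-1]
    streak = 0
    for seen in reversed(history):
        if seen != current:
            break
        streak += 1
    return current if streak > patience else current + 1

def count_retractions(outputs):
    """Number of times the method changes its answer."""
    return sum(1 for old, new in zip(outputs, outputs[1:]) if old != new)

print(count_retractions(demon_run(ockham, max_degree=3)))
print(count_retractions(demon_run(leaper, max_degree=3)))
```

Both toy methods end on degree 3, but the method that leaps ahead of the data pays for each leap with an extra retraction when the demon keeps showing the simpler curve.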
Complexity can be Complex
[Figure: complexity given e, with theories T2, T3, T4, T5, T7, T8 placed at complexity levels 0 to 3]
Complexity Relative to Data
[Figure: complexity given e + e’, with theories T2, T4, T5, T7 placed at complexity levels 0 to 3]
Timed Retraction Bounds

r(M, e, n) = the least timed retraction bound for worlds satisfying theories of complexity n and producing finite input history e.
[Figure: retraction bounds of M by empirical complexity 0, 1, 2, 3, ...]
M is Efficient at e

For each convergent M’ that agrees with M along finite input history e, for each complexity n:
r(M, e, n) ≤ r(M’, e, n).
[Figure: retraction bounds of M and M’ by empirical complexity 0, 1, 2, 3, ...]
M is Strongly Beaten at e

There exists convergent M’ that agrees with M up to the end of e, such that for each complexity n:
r(M, e, n) > r(M’, e, n).
[Figure: retraction bounds of M and M’ by empirical complexity 0, 1, 2, 3, ...]
M is Weakly Beaten at e

There exists convergent M’ that agrees with M up to the end of e, such that
for each n, r(M, e, n) ≥ r(M’, e, n);
there exists n such that r(M, e, n) > r(M’, e, n).
[Figure: retraction bounds of M and M’ by empirical complexity 0, 1, 2, 3, ...]
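The three comparisons can be written out as checks on tables of bounds. This sketch simplifies on purpose: each bound r(M, e, n) is treated as a plain number (for instance a retraction count) indexed by complexity n = 0, 1, 2, 3, whereas the talk's timed bounds are richer objects. The function names and the example numbers are hypothetical.

```python
def is_efficient(r_m, rivals):
    """M is efficient if no rival has a lower bound in any complexity class."""
    return all(
        all(own <= other for own, other in zip(r_m, r_rival))
        for r_rival in rivals
    )

def strongly_beaten(r_m, r_rival):
    """The rival does strictly better in every complexity class."""
    return all(own > other for own, other in zip(r_m, r_rival))

def weakly_beaten(r_m, r_rival):
    """The rival is never worse and is strictly better in some class."""
    never_worse = all(own >= other for own, other in zip(r_m, r_rival))
    sometimes_better = any(own > other for own, other in zip(r_m, r_rival))
    return never_worse and sometimes_better

# Made-up bounds for complexity classes 0..3.
r_ockham = [0, 1, 2, 3]
r_leaper = [1, 3, 5, 7]

print(strongly_beaten(r_leaper, r_ockham))
print(weakly_beaten(r_leaper, r_ockham))
print(is_efficient(r_ockham, [r_leaper]))
```

Strong beating implies weak beating, and a method that is weakly beaten by some rival cannot be efficient.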
Demons for Ockham
Ockham’s Razor

Don’t select a theory unless it is uniquely simplest in light of experience.
[Figure: theories T2, T4, T5, T7 at complexity levels 0 to 3; the method answers "?" until one theory, here T7, is uniquely simplest]
Stalwartness

Don’t retract your answer while it remains uniquely simplest.
[Figure: T7 remains uniquely simplest and the method keeps answering T7]
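Ockham’s razor, as stated on these slides, can be sketched as a rule on a complexity ranking. The sketch assumes the ranking of the theories still compatible with experience is handed in as a dictionary. The complexity values and the function name `ockham_answer` are hypothetical.

```python
def ockham_answer(complexity_given_data):
    """Return the uniquely simplest theory, or "?" if there is none."""
    lowest = min(complexity_given_data.values())
    simplest = [t for t, c in complexity_given_data.items() if c == lowest]
    return simplest[0] if len(simplest) == 1 else "?"

# Two theories tie for simplest, so the method suspends judgment.
print(ockham_answer({"T4": 0, "T7": 0, "T2": 1}))
# After more data only T7 remains at the lowest level, so it is selected.
print(ockham_answer({"T7": 0, "T2": 1}))
```

A method that always outputs the value of this rule is also stalwart in the slides' sense, since it drops an answer only when that answer stops being uniquely simplest.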
Argument Sketch


No matter what convergent M has done in the past, nature can force M to produce each answer down an arbitrary effect path, arbitrarily often.
Nature can also force violators of Ockham’s razor or stalwartness either into an extra retraction or a late retraction in each complexity class.
Ockham Efficiency Theorem

Let M converge to the true theory in problem P. The following are equivalent:
M is always Ockham and stalwart in P;
M is always efficient in P;
M is never weakly beaten in P.
Policy Retractions


Many explanations have been offered to make
sense of the here-today-gone-tomorrow nature
of medical wisdom — what we are advised with
confidence one year is reversed the next — but
the simplest one is that it is the natural rhythm
of science.
(Do We Really Know What Makes us Healthy, NY
Times Magazine, Sept. 16, 2007).
Causal Inference

Causal graph theory: more correlations, more causes.
[Figure: a set S of partial correlations and the graph G(S)]

Idealized data = list of conditional dependencies discovered so far.

Anomaly = the addition of a conditional dependency to the list.
Causal Axioms (Pearl, Glymour)
1. Screening off: X is statistically independent of its non-descendants given its parents.
2. No invisible causes: The only true independence relations are those entailed by condition 1.
[Figure: X with parents P1, P2, non-descendants N1, N2, and descendant D]
Forcible Sequence of Causal Theories
[Figure: a sequence of four causal graphs with labels W, X1 to X3, and Y1 to Y5]
Moral



In counterfactual prediction, form of model matters and retractions are unavoidable.
Ockham efficiency agrees very closely with best contemporary practice.
Maybe that’s all there is to it.
Conclusions




Ockham’s razor is necessary for staying on the straightest path to the truth.
It does not reliably point at or indicate the truth.
It demonstrably works without circles, evasions, or magic.
Such a theory is motivated in counterfactual inference and estimation.