Computing as an Experimental Science or Exaggerated Formalist Rhetoric

Computing as an
Experimental Science
or
Exaggerated Formalist Rhetoric
Considered Harmful
Raymond J. Mooney
Dept. of Computer Sciences
University of Texas at Austin
1
Philosophy and Methodology Matters
• One’s beliefs about the philosophy and
methodology of computer science greatly
impact:
– The problems on which one chooses to work.
– The approach one takes to these problems.
– One’s perception of the significance of results
and the quality of others’ work.
– One’s beliefs about the education and training
of students and CS curriculum issues.
2
Programs as Mathematical Objects
• A computer program is a formally defined
mathematical object, i.e. a Turing machine.
• Properties of such a mathematical object
can be formally proven:
– Correctness according to a formal specification.
– Termination.
– Time and space complexity:
• Worst case.
• Average case (assuming a formal specification of the
input distribution).
3
Exaggerated Formalist Rhetoric
• Since programs are formal mathematical objects, experiments and empirical
analysis have no place in computer science.
• Computer science is mathematics and consists of definitions, theorems, and
proofs.
• Without theorems, there is no rigorous science, just unprincipled hacking.
• Students primarily need to be taught appropriate mathematics and how to prove
theorems.
• Students do not need to be taught experimental methodology appropriate for
natural and social sciences.
4
Formal and Empirical Specifications
• Some problems have clear, mathematical,
formal specifications.
– These lend themselves to theoretical analysis.
• Some problems have “empirical”
specifications that depend on physical
(biological/psychological/social)
phenomena that, at least currently, have no
adequate mathematical formalization.
– These require experimental analysis.
5
A Tale of Two Bugs
• Formalists’ Poster Child: The Intel Pentium
division bug illustrates a problem (floating-point
division) that has a clear formal definition.
• Experimentalists’ Poster Child: The Apple
Newton’s insufficiently accurate handwriting
recognition illustrates a problem whose
specification relies on a psychological
phenomenon with no known formalization: human
visual perception of written language.
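A minimal sketch of what "a clear formal definition" buys for floating-point division: the specification can be written down and checked exactly. The operand pair is the one widely reported to expose the Pentium flaw; it does not come from these slides. The bound assumes IEEE 754 double precision with round-to-nearest, and `division_residual` is a hypothetical helper name.

```python
from fractions import Fraction

def division_residual(x: int, y: int) -> Fraction:
    """Exact distance between the machine quotient x / y and the true rational x / y."""
    q = x / y  # floating-point division under test
    return abs(Fraction(q) - Fraction(x, y))

# Operand pair widely reported for the Pentium flaw (an assumption, not from the slides).
x, y = 4195835, 3145727

# Specification (assuming IEEE 754 doubles, round-to-nearest):
# the relative error of the quotient is at most 2**-53.
bound = Fraction(x, y) * Fraction(1, 2**53)
meets_spec = division_residual(x, y) <= bound
print("meets specification:", meets_spec)
```

No comparable check can be written for handwriting recognition, because there is no exact reference value to compare against.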
6
[Figure: image lost in extraction; surviving fragments "9/3" and "4".]
7
[Figure: handwritten note "Final exam 9AM" shown with the recognized text "Tino1 exon qRH".]
8
Formalist $100K Challenge Problem!
• If you believe that handwriting recognition
can be given a formal specification suitable
for mathematical verification, then I
strongly encourage you to write it down!
• If, in my lifetime, you can formulate such a
specification and use it to develop and
verify handwriting recognition software
and demonstrate perfect accuracy on a
standard, realistic benchmark dataset…
– I will personally award you a $100,000 prize!
9
Other Problems with Empirical
Specifications
• Speech recognition.
• Natural-language question answering.
• Filtering spam from email.
• Retrieval of documents or images that a human user
finds relevant to a web-search query.
• Predicting the secondary or tertiary structure of
proteins from amino-acid sequences.
• Lossy compression of images or movies that are
still “acceptable” to human perception.
• Rendering images or visualizations that humans
perceive as natural or useful for solving problems.
10
Choosing the Right Methodology
• When the problem is easily formalized, one
should attempt to prove one’s algorithms
and programs correct.
• When the problem is empirical, one should
run well-designed, controlled experiments
on real data, using multiple trials, and
analyze the statistical significance of results
with respect to a well-defined hypothesis.
11
Formal and Empirical Input Distributions
• Some problems have clear formal input
distributions that lend themselves to
theoretical average-case analysis.
• Some problems have “empirical” input
distributions that depend on phenomena in
the physical (biological/psychological/social)
world that, at least currently, have no
adequate mathematical formalization.
12
Average-Case Analysis Examples
• Formal Distribution: Time to sort a list of randomly
ordered items.
• Empirical Distribution: Time to run a typical user
program, where program behavior can vary with
respect to:
– Locality of memory references
– Predictability of branch outcomes
– …
Human-written programs for solving typical human
problems exhibit regularities not present in programs
randomly generated by any known statistical distribution.
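The formal-distribution case can be sketched as follows. Insertion sort is used here only as a stand-in (the slides name no particular sorting algorithm). Under the uniform distribution over orderings of n distinct items, its number of element shifts equals the number of inversions, whose average is n(n-1)/4. The function names are hypothetical.

```python
import itertools
import random

def insertion_sort_shifts(items):
    """Sort a copy with insertion sort; return the number of element shifts performed."""
    a = list(items)
    shifts = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            shifts += 1
            j -= 1
        a[j + 1] = key
    return shifts

def exact_average_shifts(n):
    """Average over every ordering of n distinct items (feasible only for small n)."""
    perms = list(itertools.permutations(range(n)))
    return sum(insertion_sort_shifts(p) for p in perms) / len(perms)

def sampled_average_shifts(n, trials, seed=0):
    """Monte Carlo estimate of the same average from uniformly shuffled lists."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        p = list(range(n))
        rng.shuffle(p)
        total += insertion_sort_shifts(p)
    return total / trials

n = 5
print("exact average:", exact_average_shifts(n), "theory:", n * (n - 1) / 4)
print("sampled average, n=50:", sampled_average_shifts(50, 2000), "theory:", 50 * 49 / 4)
```

For the empirical case there is no formula to compare against: the only option is to measure on real user programs.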
13
Other Empirical Problem Distributions
• Typical traveling-salesman problems
encountered in applications and industry.
• Typical scheduling problems encountered in
applications and industry.
• Typical problems for automated theorem
proving.
– TPTP problem set
14
Experimental Methodology 101
• An appropriate, meaningful measure of performance:
– Character error rate.
• A clear hypothesis.
– Method A has lower character error rate than method B on
English non-cursive handwriting.
• A large set of realistic benchmark data.
– Millions of words of human-labeled handwritten text from a
diverse set of English writers.
• A clear separation of training (development) and test
data.
– Labeled handwritten text that the developers have never
seen.
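One common way to compute the performance measure above, sketched under assumptions: character error rate is taken to be the total Levenshtein edit distance between reference and recognized strings, divided by the total number of reference characters. The slides do not define the measure, and the function names are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

def character_error_rate(references, hypotheses):
    """Total edit distance divided by total reference length (assumed definition)."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    chars = sum(len(r) for r in references)
    return errors / chars

# Hypothetical test items: human-labeled text and a recognizer's output.
labels = ["meet at noon", "call home"]
output = ["meet at moon", "call hone"]
print(character_error_rate(labels, output))
```

The measure is only meaningful when `labels` come from held-out test data that played no part in developing the recognizer.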
15
Experimental Methodology 101 (cont.)
• A well-controlled study.
– The only difference between the two conditions is the
algorithm being tested (e.g. same training and test data).
• Multiple trials on different independent data sets
in order to measure variance.
• Statistical analysis demonstrating significant
difference.
– Significant t-test result (p < 0.05) on the difference
between the mean character error rates of A and B in
order to reject the “null hypothesis” that performance
difference is attributable to random variation.
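A minimal sketch of the statistical step, assuming a paired design: both methods are scored on the same independent test sets, so the t-test is run on the per-set differences. The error rates below are made-up illustration data, not results from any study, and `paired_t_statistic` is a hypothetical helper.

```python
import math
import statistics

def paired_t_statistic(errors_a, errors_b):
    """Return (t, degrees of freedom) for a paired t-test on per-dataset differences."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1

# Hypothetical character error rates of methods A and B on 10 independent test sets.
cer_a = [0.081, 0.077, 0.090, 0.085, 0.079, 0.088, 0.083, 0.080, 0.086, 0.082]
cer_b = [0.095, 0.089, 0.097, 0.099, 0.091, 0.096, 0.094, 0.092, 0.101, 0.093]

t, df = paired_t_statistic(cer_a, cer_b)
T_CRIT_DF9 = 2.262  # two-sided critical value of Student's t at p = 0.05, 9 degrees of freedom
significant = abs(t) > T_CRIT_DF9
print(f"t = {t:.2f}, df = {df}, reject null hypothesis: {significant}")
```

With a different number of test sets, the critical value must be looked up for the matching degrees of freedom.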
16
CS as Poor Experimental Science
• Generally, computer scientists’ experimental
methodology is severely lacking.
• “Experimental” computer science frequently
means hacking up a new system and illustrating
performance on a few demo problems.
– “Look Ma, no hands”
– “Dancing bears”
• Even when quantitative results are gathered and
presented, frequently there is no:
– Clearly stated hypothesis that is being tested by a well-controlled experiment.
– Measure of variance or statistical analysis of results.
17
The Poor Experimental Methodology
of a Turing-Award Winner
• Perhaps my own research area of machine
learning has become one of the most
experimentally rigorous areas in CS.
• An ICML-01 paper on classifying gene-expression
data co-authored by R. Karp was properly
criticized during Q&A after the presentation for
lacking statistical analysis of its experimental
results.
• This lapse by leading computer scientists was
quite surprising to my first-year graduate student.
18
CS Education in Experimental Methods
• In most natural and social sciences, experimental
methodology and statistical analysis of results are
specifically taught in laboratory or statistics
classes.
• Computer scientists receive virtually no formal
training in basic experimental methodology or
statistical analysis.
– I had to learn it from psychologists!
– I have to teach it in a CS graduate depth course!
• CS curricula assume theory is the only source of
rigor.
19
Misapplied Formalism
• Sometimes researchers misapply formal methods to
fundamentally empirical problems.
• A particular formal specification or input distribution
is assumed and analyzed.
• Without evidence, this formalism is motivated by, or
claimed to be relevant to, some important empirical
problem.
• The result is an insignificant theoretical result that has
little or no bearing on the problem of interest.
• For empirical problems, experimental evidence must
be presented to demonstrate that a particular
formalism truly characterizes the actual problem.
20
Beauty is NOT Our Primary Business
• Frequently, striving for elegant formalism leads
some computer scientists to study mathematical
problems that are mere caricatures of important
empirical problems.
• They focus on what can be proven and ignore the
complexity of the real problem.
• Proving theorems about caricatures of empirical
problems contributes little to either theoretical or
applied computer science.
• Science should focus on demonstrably solving
interesting, important problems, not on formulating
elegant formalisms that do not reflect reality.
21
Kepler vs. Keats
• J. Kepler wasted years of his life trying to model
planetary orbits with elegant, beautiful circles before
empirical data forced him to realize that astronomical
reality was more complex.
• J. Keats makes nice poetry but lousy science.
Beauty is truth,
truth beauty.
but
Beauty is in the eye of the beholder.
Beauty is only skin deep.
• In science, truth is a theory that accurately predicts
relevant empirical data.
22
Experimental Analysis of Formal Problems
• Although a problem may have a clear formal
definition, theoretical analysis may currently be
intractable.
– Chess.
– Nonlinear dynamic systems.
– Cellular automata.
– Random satisfiability problems.
• In this case, experimental analysis may also be the
best approach.
• Experimentation may result in conjectures that
may subsequently be proven.
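An experiment of this kind can be sketched for random satisfiability. The setup is an assumption chosen for illustration: random 3-SAT with a fixed number of variables, brute-force satisfiability checking (small instances only), and the clause-to-variable ratio as the controlled variable. All function names are hypothetical.

```python
import itertools
import random

def random_3sat(num_vars, num_clauses, rng):
    """Each clause picks 3 distinct variables; each literal is negated with probability 1/2."""
    formula = []
    for _ in range(num_clauses):
        variables = rng.sample(range(num_vars), 3)
        formula.append([(v, rng.random() < 0.5) for v in variables])
    return formula

def is_satisfiable(formula, num_vars):
    """Brute-force search over all 2**num_vars truth assignments."""
    for assignment in itertools.product([False, True], repeat=num_vars):
        # A literal (v, negated) is true when assignment[v] differs from negated.
        if all(any(assignment[v] != negated for v, negated in clause)
               for clause in formula):
            return True
    return False

def satisfiable_fraction(num_vars, ratio, trials, seed=0):
    """Fraction of random instances at a given clause/variable ratio that are satisfiable."""
    rng = random.Random(seed)
    num_clauses = round(ratio * num_vars)
    hits = sum(is_satisfiable(random_3sat(num_vars, num_clauses, rng), num_vars)
               for _ in range(trials))
    return hits / trials

for ratio in (2.0, 4.0, 6.0, 8.0):
    print(ratio, satisfiable_fraction(10, ratio, trials=30))
```

Plotting such fractions against the ratio for growing numbers of variables is the sort of observation that can suggest a conjecture worth trying to prove.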
23
Experimental Mathematics
• Many conjectures in mathematics originate from
empirical observations.
– Fermat’s last theorem
– Goldbach’s conjecture
– P ≠ NP
• The experimental aspects of mathematics have
generally not been publicized or appreciated.
• Partly due to influence from computer science,
mathematics has begun to embrace its
experimental side:
– Experimental Mathematics journal (started 1992)
(www.expmath.org)
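The empirical side of a conjecture such as Goldbach's can be sketched in a few lines: search a finite range for counterexamples. Finding none is evidence, not proof. The search limit and function names are arbitrary choices for illustration.

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes; returns a list where entry k is True when k is prime."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            for multiple in range(p * p, limit + 1, p):
                sieve[multiple] = False
    return sieve

def goldbach_counterexamples(limit):
    """Even numbers in [4, limit] that are NOT a sum of two primes."""
    is_prime = primes_up_to(limit)
    failures = []
    for n in range(4, limit + 1, 2):
        if not any(is_prime[p] and is_prime[n - p] for p in range(2, n // 2 + 1)):
            failures.append(n)
    return failures

print(goldbach_counterexamples(2000))
```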
24
Epistemology
• Many believe that mathematical proof is a
fundamentally more trustworthy source of
knowledge than experimentation.
– Mathematics as the “Queen of the sciences”
• I believe this erroneous belief is based on a long
tradition of rationalism that ignores the fact that
mathematics is a human enterprise, and therefore
equally based in the empirical world.
• Rationalism vs. empiricism is a 2,400-year-long
philosophical debate, which, apparently, continues
today to impact computer science methodology.
25
Empirical Basis of Mathematics
• All mathematical proofs rely on accepting a set of
fundamental axioms without proof.
• Gödel proved that even the consistency of the axioms
of arithmetic cannot be proven within arithmetic itself.
– Newsflash! (1931) “Gödel knocks Queen from throne”.
• Most humans are willing to accept these axioms based
on intuitions that are based on empirical experience
and/or innate pre-conceptions that have evolved to
increase survival and reproduction.
• These intuitions may be misleading.
– Non-Euclidean geometry and General Relativity
– Mathematics: The Loss of Certainty, M. Kline, 1982.
26
Philosophy of Mathematics
• Platonism is a mystical belief in a non-material
world of mathematical concepts to which humans
somehow have infallible access.
• I believe a much more scientifically defensible view
is that mathematics is based on human psychological
processing that is grounded in the material world.
• I recommend the following recent books:
– Number Sense: How the Mind Creates Mathematics, S.
Dehaene, 2000.
– Where Mathematics Comes From: How the Embodied
Mind Brings Mathematics into Being, G. Lakoff & R.
Núñez, 2001.
– The Math Gene: How Mathematical Thinking Evolved,
K.J. Devlin, 2001.
27
Conclusions
• In contradiction to exaggerated formalist rhetoric,
experimental computer science can be well-motivated and rigorous.
• Some computational problems are fundamentally
empirical and properly approached using
experimental methodology.
• Sometimes the right thing to do is to prove a
theorem, sometimes to run an experiment.
• Compared to theoretical CS, rigorous
experimental CS is relatively immature.
• Progress in experimental CS requires changes to
existing educational practice and curricula.
28