Computing as an Experimental Science
or
Exaggerated Formalist Rhetoric Considered Harmful

Raymond J. Mooney
Dept. of Computer Sciences
University of Texas at Austin

Philosophy and Methodology Matters
• One’s beliefs about the philosophy and methodology of computer science greatly impact:
  – The problems on which one chooses to work.
  – The approach one takes to these problems.
  – One’s perception of the significance of results and the quality of others’ work.
  – One’s beliefs about the education and training of students and CS curriculum issues.

Programs as Mathematical Objects
• A computer program is a formally defined mathematical object, i.e. a Turing machine.
• Properties of such a mathematical object can be formally proven:
  – Correctness according to a formal specification.
  – Termination.
  – Time and space complexity:
    • Worst case.
    • Average case (assuming a formal specification of the input distribution).

Exaggerated Formalist Rhetoric
• Since programs are formal mathematical objects, experiments and empirical analysis have no place in computer science.
• Computer science is mathematics and consists of definitions, theorems, and proofs.
• Without theorems, there is no rigorous science, just unprincipled hacking.
• Students primarily need to be taught appropriate mathematics and how to prove theorems.
• Students do not need to be taught experimental methodology appropriate for natural and social sciences.

Formal and Empirical Specifications
• Some problems have clear, mathematical, formal specifications.
  – These lend themselves to theoretical analysis.
• Some problems have “empirical” specifications that depend on physical (biological/psychological/social) phenomena that, at least currently, have no adequate mathematical formalization.
  – These require experimental analysis.

A Tale of Two Bugs
• Formalists’ Poster Child: The Intel Pentium division bug illustrates a problem (floating-point division) that has a clear formal definition.
• Experimentalists’ Poster Child: The Apple Newton’s insufficiently accurate handwriting recognition illustrates a problem whose specification relies on a psychological phenomenon with no known formalization: human visual perception of written language.

[Figures omitted. Text recovered from the slide images: “9/3 4”; “Final exam 9AM”; “Tino1 exon qRH”.]

Formalist $100K Challenge Problem!
• If you believe that handwriting recognition can be given a formal specification suitable for mathematical verification, then I strongly encourage you to write it down!
• If, in my lifetime, you can formulate such a specification, use it to develop and verify handwriting recognition software, and demonstrate perfect accuracy on a standard, realistic benchmark dataset…
  – I will personally award you a $100,000 prize!

Other Problems with Empirical Specifications
• Speech recognition.
• Natural-language question answering.
• Filtering spam from email.
• Retrieval of documents or images for a web-search query that a human user finds relevant.
• Predicting the secondary or tertiary structure of proteins from amino-acid sequences.
• Lossy compression of images or movies that are still “acceptable” to human perception.
• Rendering images or visualizations that humans perceive as natural or useful for solving problems.

Choosing the Right Methodology
• When the problem is easily formalized, one should attempt to prove one’s algorithms and programs correct.
• When the problem is empirical, one should run well-designed, controlled experiments on real data, using multiple trials, and analyze the statistical significance of results with respect to a well-defined hypothesis.

Formal and Empirical Input Distributions
• Some problems have clear formal input distributions that lend themselves to theoretical average-case analysis.
• Some problems have “empirical” input distributions that depend on phenomena in the physical (biological/psychological/social) world that, at least currently, have no adequate mathematical formalization.
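
The contrast between formal and empirical input distributions can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than material from the slides: insertion sort is chosen only because its average cost on uniformly random permutations has a simple closed form, n(n-1)/4 element shifts, and the hypothetical `nearly_sorted` generator is a crude stand-in for real-world inputs, whose true distribution would have to be sampled from actual data.

```python
import random


def insertion_sort_shifts(items):
    """Insertion-sort a copy of items and return the number of element shifts."""
    a = list(items)
    shifts = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]  # shift the larger element one slot to the right
            shifts += 1
            j -= 1
        a[j + 1] = key
    return shifts


def expected_shifts_uniform(n):
    """Closed-form mean shift count for a uniformly random permutation of n items."""
    return n * (n - 1) / 4


def uniform_permutation(n):
    """Formal distribution: every ordering of n distinct items is equally likely."""
    def make(rng):
        a = list(range(n))
        rng.shuffle(a)
        return a
    return make


def nearly_sorted(n, swaps):
    """Hypothetical 'empirical-like' inputs: a sorted list with a few adjacent swaps."""
    def make(rng):
        a = list(range(n))
        for _ in range(swaps):
            i = rng.randrange(n - 1)
            a[i], a[i + 1] = a[i + 1], a[i]
        return a
    return make


def mean_shifts(make_input, trials, rng):
    """Average the measured cost over independently generated inputs."""
    return sum(insertion_sort_shifts(make_input(rng)) for _ in range(trials)) / trials


if __name__ == "__main__":
    rng = random.Random(0)
    n = 200
    print("theory, uniform permutations:  ", expected_shifts_uniform(n))
    print("measured, uniform permutations:", mean_shifts(uniform_permutation(n), 200, rng))
    print("measured, nearly sorted inputs:", mean_shifts(nearly_sorted(n, 10), 200, rng))
```

Under the formal distribution the measurement should track the closed form closely. For the nearly sorted inputs the formula is simply the wrong model, and only measurement on representative data says what the cost actually is.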

Average-Case Analysis Examples
• Formal Distribution: Time to sort a list of randomly ordered items.
• Empirical Distribution: Time to run a typical user program, where program behavior can vary with respect to:
  – Locality of memory references
  – Predictability of branch outcomes
  – …
• Human-written programs for solving typical human problems exhibit regularities not present in programs randomly generated by any known statistical distribution.

Other Empirical Problem Distributions
• Typical traveling-salesman problems encountered in applications and industry.
• Typical scheduling problems encountered in applications and industry.
• Typical problems for automated theorem proving.
  – TPTP problem set

Experimental Methodology 101
• An appropriate, meaningful measure of performance:
  – Character error rate.
• A clear hypothesis.
  – Method A has lower character error rate than method B on English non-cursive handwriting.
• A large set of realistic benchmark data.
  – Millions of words of human-labeled handwritten text from a diverse set of English writers.
• A clear separation of training (development) and test data.
  – Labeled handwritten text that the developers have never seen.

Experimental Methodology 101 (cont.)
• A well-controlled study.
  – The only difference between the two conditions is the algorithm being tested (e.g. same training and test data).
• Multiple trials on different independent data sets in order to measure variance.
• Statistical analysis demonstrating significant difference.
  – Significant t-test result (p < 0.05) on the difference between the mean character error rates of A and B in order to reject the “null hypothesis” that the performance difference is attributable to random variation.

CS as Poor Experimental Science
• Generally, computer scientists’ experimental methodology is severely lacking.
• “Experimental” computer science frequently means hacking up a new system and illustrating performance on a few demo problems.
  – “Look Ma, no hands”
  – “Dancing bears”
• Even when quantitative results are gathered and presented, frequently there is no:
  – Clearly stated hypothesis that is being tested by a well-controlled experiment.
  – Measure of variance or statistical analysis of results.

The Poor Experimental Methodology of a Turing-Award Winner
• Perhaps my own research area of machine learning has become one of the most experimentally rigorous areas in CS.
• An ICML-01 paper on classifying gene-expression data co-authored by R. Karp was properly criticized during Q&A after the presentation for lacking statistical analysis of its experimental results.
• This lapse by leading computer scientists was quite surprising to my first-year graduate student.

CS Education in Experimental Methods
• In most natural and social sciences, experimental methodology and statistical analysis of results are specifically taught in laboratory or statistics classes.
• Computer scientists receive virtually no formal training in basic experimental methodology or statistical analysis.
  – I had to learn it from psychologists!
  – I have to teach it in a CS graduate depth course!
• CS curricula assume theory is the only source of rigor.

Misapplied Formalism
• Sometimes researchers misapply formal methods to fundamentally empirical problems.
• A particular formal specification or input distribution is assumed and analyzed.
• Without evidence, this formalism is motivated by, or claimed to be relevant to, some important empirical problem.
• The result is an insignificant theoretical result that has little or no bearing on the problem of interest.
• For empirical problems, experimental evidence must be presented to demonstrate that a particular formalism truly characterizes the actual problem.

Beauty is NOT Our Primary Business
• Frequently, striving for elegant formalism leads some computer scientists to study mathematical problems that are mere caricatures of important empirical problems.
• They focus on what can be proven and ignore the complexity of the real problem.
• Proving theorems about caricatures of empirical problems contributes little to either theoretical or applied computer science.
• Science should focus on demonstrably solving interesting, important problems, not on formulating elegant formalisms that do not reflect reality.

Kepler vs. Keats
• J. Kepler wasted years of his life trying to model planetary orbits with elegant, beautiful circles before empirical data forced him to realize that astronomical reality was more complex.
• J. Keats makes nice poetry but lousy science.
  – “Beauty is truth, truth beauty.”
  but
  – “Beauty is in the eye of the beholder.”
  – “Beauty is only skin deep.”
• In science, truth is a theory that accurately predicts relevant empirical data.

Experimental Analysis of Formal Problems
• Although a problem may have a clear formal definition, theoretical analysis may currently be intractable.
  – Chess.
  – Nonlinear dynamic systems.
  – Cellular automata.
  – Random satisfiability problems.
• In this case, experimental analysis may also be the best approach.
• Experimentation may result in conjectures that may subsequently be proven.

Experimental Mathematics
• Many conjectures in mathematics originate from empirical observations.
  – Fermat’s last theorem
  – Goldbach’s conjecture
  – P ≠ NP
• The experimental aspects of mathematics have generally not been publicized or appreciated.
• Partly due to influence from computer science, mathematics has begun to embrace its experimental side:
  – Experimental Mathematics journal (started 1992) (www.expmath.org)

Epistemology
• Many believe that mathematical proof is a fundamentally more trustworthy source of knowledge than experimentation.
  – Mathematics as the “Queen of the sciences”
• I believe this erroneous belief is based on a long tradition of rationalism that ignores the fact that mathematics is a human enterprise, and therefore equally based in the empirical world.
• Rationalism vs.
empiricism is a 2,400-year-long philosophical debate, which, apparently, continues today to impact computer science methodology.

Empirical Basis of Mathematics
• All mathematical proofs rely on accepting a set of fundamental axioms without proof.
• Gödel proved that even the consistency of the axioms of arithmetic cannot be formally proven within arithmetic itself.
  – Newsflash! (1931) “Gödel knocks Queen from throne”.
• Most humans are willing to accept these axioms based on intuitions that are based on empirical experience and/or innate pre-conceptions that have evolved to increase survival and reproduction.
• These intuitions may be misleading.
  – Non-Euclidean geometry and General Relativity
  – Mathematics: The Loss of Certainty, M. Kline, 1982.

Philosophy of Mathematics
• Platonism is a mystical belief in a non-material world of mathematical concepts to which humans somehow have infallible access.
• I believe a much more scientifically defensible view is that mathematics is based on human psychological processing that is grounded in the material world.
• I recommend the following recent books:
  – Number Sense: How the Mind Creates Mathematics, S. Dehaene, 2000.
  – Where Mathematics Comes From: How the Embodied Mind Brings Mathematics into Being, G. Lakoff & R. Nuñez, 2001.
  – The Math Gene: How Mathematical Thinking Evolved, K.J. Devlin, 2001.

Conclusions
• Contrary to exaggerated formalist rhetoric, experimental computer science can be well-motivated and rigorous.
• Some computational problems are fundamentally empirical and properly approached using experimental methodology.
• Sometimes the right thing to do is to prove a theorem, sometimes to run an experiment.
• Compared to theoretical CS, rigorous experimental CS is relatively immature.
• Progress in experimental CS requires changes to existing educational practice and curricula.
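
As a closing illustration, the comparison recipe from Experimental Methodology 101 (same data for both methods, multiple independent trials, a significance test against the null hypothesis) might be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure from any particular study. The error rates are made-up numbers for illustration only, and the function names are hypothetical. The slides name a t-test; to stay within the Python standard library, the sketch computes the paired t statistic but obtains its p-value from an exact sign-flip permutation test rather than a t-distribution lookup.

```python
import itertools
import math
import statistics


def paired_t_statistic(errors_a, errors_b):
    """t statistic for the mean per-dataset difference in error rate (A minus B)."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error


def sign_flip_p_value(errors_a, errors_b):
    """Exact two-sided p-value from a paired sign-flip permutation test.

    Under the null hypothesis the labels A and B are exchangeable within each
    trial, so every pattern of sign flips on the differences is equally likely.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    observed = abs(sum(diffs))
    at_least_as_extreme = 0
    total = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # small tolerance for float rounding
            at_least_as_extreme += 1
    return at_least_as_extreme / total


if __name__ == "__main__":
    # MADE-UP character error rates on ten independent test sets (illustration only).
    # Both methods are scored on the same ten sets, so the comparison is paired.
    errors_a = [0.081, 0.079, 0.090, 0.085, 0.077, 0.088, 0.083, 0.080, 0.086, 0.082]
    errors_b = [0.092, 0.085, 0.094, 0.091, 0.084, 0.090, 0.093, 0.087, 0.095, 0.089]

    print("paired t statistic:", paired_t_statistic(errors_a, errors_b))
    p = sign_flip_p_value(errors_a, errors_b)
    print("sign-flip p-value: ", p)
    print("reject null at 0.05:", p < 0.05)
```

Pairing matters here: because both methods see identical test sets, variation between data sets cancels out of the differences, which is the point of the “well-controlled study” bullet. The exhaustive enumeration is only practical for a small number of trials; with many trials one would sample sign patterns or use a t-distribution routine from a statistics library.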