30 Years
The History of Keysteps of Computational Statistics

Wilfried Grossmann, University of Vienna, Austria
Michael G. Schimek, Medical University of Graz, Austria
Peter Paul Sint, Austrian Academy of Sciences, Vienna

1974
Department of Statistics and Informatics, University of Vienna
Peter Paul, a "senior" Assistant Professor

1974
Department of Statistics and Informatics, University of Vienna
A few years after
Wilfried, a "junior" Assistant Professor

1974
University of Vienna
A few years after
Gerhart Bruckmann
Michael, a first-year student

Outline of Presentation
The Beginning of COMPSTAT
• Early statistical computing
• The institutional environment
• The first symposium and the Compstat Society
Developments in Computational Statistics (CS)
• CS and statistical theory
• CS and algorithms
• CS and computer science
• CS and applications
The COMPSTAT Symposia

The Beginning of COMPSTAT

Early Computational Statistics
• The Beginnings in Vienna
  – Institute of Statistics
    • Part of the Law Faculty - S. Sagoroff Leipzig/Sofia/USA/Berlin//Vienna - Energy Balances
    • First computer: first generation machine
      – Paid for by Rockefeller-Foundation 1960
      – Arrival of the 'Electronic Brain' 1st generation
        » Never again similar enthusiasm
    • Institute of Advanced Studies - Ford Institute
      – Statistical machines - card counting - >2nd generation
    • Replaced by IBM /360-44 - 3rd gen. SSP / SPSS
      – Computing Center

Statistics-Computational
One year Biostatistics department Oxford University
Still: Not strongly integrated in international statistical community
Main contacts ISI: Central Statistical Office, Sagoroff
1973 ISI-session in Vienna - emphasis on applications - computational methods rare
Bring statisticians with our interests to Vienna
Encouragement by publisher Arnulf Liebing /Physica/
What is specific to our department?
Concept of Computational Statistics - Johannes Gordesch (Math) - Peter Paul Sint (Physics)

First COMPSTAT
Call COMPSTAT 1974
- Gerhart Bruckmann - Local fame as analyst of voting results during election nights
- Leopold Schmetterer (successor of Sagoroff) - Internationally known Mathematical Statistician
(Franz Ferschl, incoming professor of statistics, new editor of Metrika - added as an editor by the publisher)

S. Sagoroff and M. Tantilov

First COMPSTAT Editors

Preface of the first Proceedings

Logic of the Logo

J. Gordesch at Compstat76 Berlin

Coming of Age
• International from the start
• Compstat Society since Berlin
• Leiden NL 1978 Integration into IASC
• Edinburgh GB 1980 - Toulouse F 1982
• Eastern Europe needed
• Politics ISI-IASC
• Local Projects redirected: Prague 1984
• Rome I 1986 - Copenhagen DK 1988
• Dubrovnik YU 1990 - Neuchâtel CH 1992

Prague 1984

Developments in Computational Statistics

Computational Statistics
• What is Computational Statistics?
  – A question raised many times inside the community at the end of the 1980s and the beginning of the 1990s

Computational Statistics
• Working definition (A. Westlake)
Computational Statistics is related to the advance of statistical theory and methods through the use of computational methods. This includes both the use of computation to explore the impact of theories and methods, and the development of algorithms to make these ideas available to users.

Computational Statistics
[Diagram labels: Statistical Theory, Modelling, Applications, Numerical Analysis, Algorithms, Computational Statistics, Statistical Software, Seminumerical Algorithms, Computer Science]

Computational Statistics and Statistical Theory
• The statistical journey in the 20th century
• The Theory Era
• The Methodology Era

Computational Statistics and Statistical Theory
• The statistical journey in the 20th century
  – B.
Efron: Statistics in the 20th century is a journey between three poles:
    • Applications
    • Mathematics
    • Computation

Computational Statistics and Statistical Theory
• The Theory Era (Pearson, Neyman, Fisher, Wald)
  – From models for solving practical problems towards a mathematical decision theoretic framework
  – Based on optimality principles
  – Application is based on computations feasible with paper and pencil or mechanical computing devices

Computational Statistics and Statistical Theory
• Modelling Era (1)
  – Tukey's paper about the future of data analysis (1962) as a turning point from mathematics towards computation
    • Confirmatory versus exploratory analysis
    • Dynamics of data analysis
    • "Robustness"
    • Importance of Graphics

Computational Statistics and Statistical Theory
• Modelling Era (2)
  – Important developments in the modelling era
    • Nonparametric and Robust Methods
    • Kaplan-Meier and Proportional Hazards
    • Logistic Regression and GLM
    • Jackknife and Bootstrap
    • EM and MCMC
    • Empirical Bayes and James-Stein Estimation

Computational Statistics and Statistical Theory
• Modelling Era (3)
  – The modelling era is characterized by a strong interplay between statistical theory and computational statistics
  – The computer as a workbench for statistical experiments (going back to v. Neumann and S.
Ulam)
    • Passive usage: Studying the feasibility of statistical theory by simulation
    • Active usage: Obtaining results which cannot be computed by conventional numerical algorithms

Computational Statistics and Statistical Theory
• COMPSTAT was probably not always at the frontier of these developments, but the programs and the proceedings reflect quite well the dynamics of the subject in the Modelling Era

Computational Statistics and Algorithms
• Numerical Algorithms
  – Matrix Computation, Optimization
• Random Numbers / Monte Carlo
• Semi-numerical Algorithms
  – Sorting, Searching, Combinatorial Methods, Graph Theoretic Algorithms,…
• Graphical Algorithms
• Symbolic Computation (?)
• Mathematical vs. Statistical Modelling

Computational Statistics and Algorithms
• Statistics and Numerical Algorithms (1)
  – Fast Fourier Transform (Tukey)
  – Recursive Algorithms and Filtering (Kalman Filter)
(Neither topic seems to be a core topic in computational statistics)

Computational Statistics and Algorithms
• Statistics and Numerical Algorithms (2)
  – Adaptation of optimization techniques (e.g. scoring methods)
  – Behaviour of optimization methods in statistical context (numerical convergence vs. stochastic convergence concepts)
Implicit Consideration at COMPSTAT

Computational Statistics and Algorithms
• Statistics and Random Numbers / Monte Carlo
  – Generation of random numbers was (and is) probably more a topic for mathematics (number theory) and computer science
    • In the beginning of COMPSTAT there was also some connection to simulation
  – Genuine application of Monte Carlo Methods in connection with new developments of statistical theory (e.g.
MCMC)

Computational Statistics and Algorithms
• Statistics and semi-numerical algorithms
  – Applications in the context of nonparametric statistics and analysis of tabular data
    • Feasibility of conditional inference for logistic models
  – New developments on the borderline between statistics and computer science
    • Data Mining as a new statistical modelling paradigm
COMPSTAT was open towards these developments and integrated them into the program

Computational Statistics and Algorithms
• Statistics and Graphical Algorithms
  – Development rather complementary to the developments of computer science
  – Important issues (L. Wilkinson):
    • Graphics are not only a tool for displaying results but rather a tool for perceiving relationships
    • Dynamic graphics as an important tool for data analysis
    • Graphics are a means of model formalization reflecting quantitative and qualitative traits of its variables
Represented quite well at COMPSTAT

Computational Statistics and Algorithms
• Mathematical vs. Statistical Modelling
  – Emphasis on different methods (e.g. Differential Equations)
  – Different modelling environments (J.
Nelder)
    • Data structures in statistics
    • Exploratory nature of statistical analysis (statistical analysis cycle)
    • Competence of users

Computational Statistics and Computer Science
• Developments in Statistical Software
• Development of Statistical Languages
• Developments in Statistical Database Management

Computational Statistics and Computer Science
• Developments in Statistical Software (1)
  – From numerical subroutines towards statistical packages
  – Main goals:
    • Taking into account the peculiarities of statistical data analysis
    • Use of current hardware developments

Computational Statistics and Computer Science
• Developments in Statistical Software (2)
  – From the beginning, COMPSTAT was an important forum for the development of statistical software
    • The proceedings from the early 1980s show numerous software developments for specific statistical models
    • There was always some tension between the presentation of commercial software developments and the scientific character of the conference

Computational Statistics and Computer Science
• Development of Statistical Languages (1)
  – GLIM was probably the first genuine statistical modelling language
    • Present at COMPSTAT from the very beginning

Computational Statistics and Computer Science
• Development of Statistical Languages (2)
  – The S language set up a new paradigm for computing which is of interest also outside statistical applications
    • Contribution in Computer Science honoured by the ACM Software System Award for J.
Chambers
Although it started as early as 1976, it took a long time to enter the COMPSTAT community

Computational Statistics and Computer Science
• Development of Statistical Languages (3)
  – R quickly became popular inside COMPSTAT due to its free availability and the effective organisation of CRAN
  – Omegahat: An umbrella for open source projects in computational statistics covering not only statistical computation but also other important aspects in distributed computing

Computational Statistics and Computer Science
• Development of Statistical Languages (4)
  – XLISP-Stat as proof of concept (in particular for animated graphics)
  – XploRe as Java based production system

Computational Statistics and Computer Science
• Statistical Data Base Management
  – The main challenge is the appropriate use of developments in database technology in a statistical context
    • Combination of statistical data structures and statistical processing activities with conceptual data models
    • Representation of tabular data
    • Metadata as a tool to capture the complexity of statistical data
A small but active group inside the COMPSTAT community from the very beginning

Computational Statistics and Applications
• Challenges for Computational Statistics
Rather independent of application area
  – Data
    • Data capture
    • Data structures
    • Data size
  – Analysis Process
    • Analysis strategies
    • The role of the statistician in the computer age

Computational Statistics and Applications
• Data challenges (1)
  – Contributions towards data challenges occur occasionally at COMPSTAT
    • Current problems
  – Data capture
    • Data capture tools are rather a side branch of computational statistics and more closely connected to official statistics
    • Data streams are a new challenge that has so far attracted little attention in the computational statistics community

Computational Statistics and Applications
• Data challenges (2)
  – Data structures
    • New problems (e.g.
in connection with data mining) raise questions with respect to the applicability of the basic statistical analysis paradigm (population, sample, measurement process)
  – Data size
    • Handling huge datasets
At the moment, none of these challenges seems to be a core topic of computational statistics

Computational Statistics and Applications
• Analysis process
  – Analysis strategies
    • The question of formalization of analysis strategies was a hot topic at the COMPSTAT conferences at the end of the 1980s, but there was limited success
  – The role of statisticians in the computer age
    • Is progress in computational statistics an enabler for statisticians, or does it lead to a de-skilling of the statistical profession?

The COMPSTAT Symposia

A full set of COMPSTAT proceedings (one statistical outlier removed)
Do you see the CSDA volumes in the background? Here they are!

The COMPSTAT Symposia I

Symposium  Year  Organizers                      # Submissions  # Papers I/C  # Participants
Vienna     1974  Sint                                           50            100
Berlin     1976  Gordesch, Naeve                                58            180
Leiden     1978  Corsten, Hermans                               68            310
Edinburgh  1980  Barrit, Wishart                 250            4/82          750
Toulouse   1982  Caussinus, Ettinger, Tomassone  250            15/60         500

The COMPSTAT Symposia II

Symposium   Year  Organizers               # Submissions  # Papers I/C  # Participants
Prague      1984  Havranek, Sidak, Novak   300            7/65          ???
Rome        1986  De Antoni, Lauro, Rizzi  300            14/60         900
Copenhagen  1988  Edwards, Raun            300            9/51          800
Dubrovnik   1990  Momirovic                115            6/43          180
Neuchâtel   1992  Dodge, Whittaker         115            11/115        200

COMPSTAT 1994 Vienna and Satellite Meeting on Smoothing Semmering (World Cultural Heritage)
Andrew Westlake, Allmut Hörmann, Wolfgang Härdle
Randy Eubank

On the track from Vienna to Semmering in the Austrian Alps (historical train)
The organizer

Satellite Meeting on Smoothing
We finally arrived at the mountain spa Semmering
Antoine de Falguerolles and the organizer at the opening

The COMPSTAT Symposia III

Symposium              Year  Organizers                  # Submissions  # Papers I/C  # Participants
Vienna                 1994  Dutter, Grossmann           200            11/60         380
Semmering (Satellite)  1994  Schimek                     30             7/26          50
Barcelona              1996  Prat                        250            13/56         300
Bristol                1998  Payne, Green                180            12/58         370
Utrecht                2000  Van der Heijden, Bethlehem  250            15/60         220
Berlin                 2002  Härdle                      220            9/90          260

The COMPSTAT proceedings from the Vienna and Semmering meetings
Model of Vienna University
Kastalia Fountain