The History of COMPSTAT

30 Years
The History of
Keysteps of
Computational Statistics
Wilfried Grossmann, University of Vienna, Austria
Michael G. Schimek, Medical University of Graz, Austria
Peter Paul Sint, Austrian Academy of Sciences, Vienna
1974
Department of Statistics and
Informatics, University of
Vienna
Peter Paul, a “senior”
Assistant Professor
1974
Department of Statistics
and Informatics,
University of Vienna
A few
years after
Wilfried, a “junior”
Assistant Professor
1974
University of Vienna
A few years
after
Gerhart
Bruckmann
Michael, a first year student
Outline of Presentation
The Beginning of COMPSTAT
• Early statistical computing
• The institutional environment
• The first symposium and the Compstat Society
Developments in Computational Statistics (CS)
• CS and statistical theory
• CS and algorithms
• CS and computer science
• CS and application
The COMPSTAT Symposia
The Beginning of COMPSTAT
Early Computational Statistics
• The Beginnings in Vienna
– Institute of Statistics
• Part of the Law Faculty - S. Sagoroff (Leipzig/Sofia/USA/Berlin/Vienna) - Energy Balances
• First computer: first-generation machine
– Paid for by the Rockefeller Foundation, 1960
– Arrival of the ‘Electronic Brain’, 1st generation
» Never again similar enthusiasm
• Institute of Advanced Studies - Ford Institute
– Statistical machines - card counting - >2nd generation
• Replaced by IBM /360-44 - 3rd gen. SSP / SPSS
– Computing Center
Statistics-Computational
One year Biostatistics department Oxford University
Still: not strongly integrated into the international statistical community
Main contacts to ISI: Central Statistical Office, Sagoroff
1973 ISI-session in Vienna - emphasis on applications
- computational methods rare
Bring statisticians with our interests to Vienna
Encouragement by publisher Arnulf Liebing (Physica)
What is specific to our department?
Concept of Computational Statistics
- Johannes Gordesch (Math)
- Peter Paul Sint (Physics)
First COMPSTAT Call
COMPSTAT 1974
- Gerhart Bruckmann: local fame as analyst of voting results during election nights
- Leopold Schmetterer (successor of Sagoroff): internationally known mathematical statistician
- Franz Ferschl: incoming professor of statistics and new editor of Metrika, added as an editor by the publisher
S. Sagoroff and M. Tantilov
First COMPSTAT Editors
Preface of the first Proceedings
Logic of the Logo
J. Gordesch at Compstat76 Berlin
Coming of Age
• International from the start
• Compstat Society since Berlin
• Leiden (NL) 1978: integration into IASC
• Edinburgh (GB) 1980 - Toulouse (F) 1982
• Eastern Europe needed politics: ISI-IASC
• Local projects redirected: Prague 1984
• Rome (I) 1986 - Copenhagen (DK) 1988
• Dubrovnik (YU) 1990 - Neuchâtel (CH) 1992
Prague 1984
Developments in
Computational Statistics
Computational Statistics
• What is Computational Statistics?
– A question raised many times within the community at the end of the 1980s and the beginning of the 1990s
Computational Statistics
• Working definition (A. Westlake)
Computational Statistics is related to the
advance of statistical theory and methods
through the use of computational methods.
This includes both the use of computation to
explore the impact of theories and methods,
and development of algorithms to make these
ideas available to users
Computational Statistics
[Diagram relating Computational Statistics to: Statistical Theory, Modelling, Applications, Numerical Analysis, Algorithms, Statistical Software, Seminumerical Algorithms, Computer Science]
Computational Statistics and
Statistical Theory
• The statistical journey in the 20th century
• The Theory Era
• The Methodology Era
Computational Statistics and
Statistical Theory
• The statistical journey in the 20th century
– B. Efron:
Statistics in the 20th century is a journey
between three poles:
• Applications
• Mathematics
• Computation
Computational Statistics and
Statistical Theory
• The Theory Era
(Pearson, Neyman, Fisher, Wald)
– From models for solving practical problems
towards a mathematical decision theoretic
framework
– Based on optimality principles
– Application is based on computations feasible
for paper and pencil or mechanical computing
devices
Computational Statistics and
Statistical Theory
• Modelling Era (1)
– Tukey’s paper about the future of data
analysis (1962) as a turning point from
mathematics towards computation
• Confirmatory versus exploratory analysis
• Dynamics of data analysis
• “Robustness”
• Importance of graphics
Computational Statistics and
Statistical Theory
• Modelling Era (2)
– Important developments in the modelling era
• Nonparametric and Robust Methods
• Kaplan-Meier and Proportional Hazards
• Logistic Regression and GLM
• Jackknife and Bootstrap
• EM and MCMC
• Empirical Bayes and James-Stein Estimation
Computational Statistics and
Statistical Theory
• Modelling Era (3)
– The modelling era is characterized by a strong interplay between statistical theory and computational statistics
– The computer as a workbench for statistical experiments (going back to J. von Neumann and S. Ulam)
• Passive usage: Studying feasibility of statistical
theory by simulation
• Active usage: Obtain results which cannot be
computed by conventional numerical algorithms
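As an illustration of the "active usage" above, here is a minimal sketch of the bootstrap (one of the modelling-era developments listed earlier) used to estimate the standard error of a sample median, a quantity with no simple closed-form formula. This is not taken from the presentation: the function name `bootstrap_se`, its parameters, and the sample values are hypothetical and chosen only for illustration.

```python
import random
import statistics


def bootstrap_se(data, stat=statistics.median, n_resamples=2000, seed=0):
    """Estimate the standard error of `stat` by resampling `data` with replacement.

    Hypothetical helper for illustration; a fixed seed keeps the result reproducible.
    """
    rng = random.Random(seed)
    n = len(data)
    # Each replicate is the statistic computed on one resample of size n.
    replicates = [stat(rng.choices(data, k=n)) for _ in range(n_resamples)]
    # The spread of the replicates approximates the sampling variability of `stat`.
    return statistics.stdev(replicates)


# Made-up sample values, used only to exercise the function.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]
se_median = bootstrap_se(sample)
print(f"bootstrap standard error of the median: {se_median:.3f}")
```

The same function could be run inside a simulation loop to study how well a theoretical result holds in finite samples, which would correspond to the "passive usage" above.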
Computational Statistics and
Statistical Theory
• COMPSTAT was probably not always at the frontier of these developments, but the programs and the proceedings reflect quite well the dynamics of the subject in the Modelling Era
Computational Statistics and
Algorithms
• Numerical Algorithms
– Matrix Computation, Optimization
• Random Numbers / Monte Carlo
• Semi-numerical Algorithms
– Sorting, Searching, Combinatorial Methods, Graph
Theoretic Algorithms,…
• Graphical Algorithms
• Symbolic Computation (?)
• Mathematical vs. Statistical Modelling
Computational Statistics and
Algorithms
• Statistics and Numerical Algorithms (1)
– Fast Fourier Transform (Tukey)
– Recursive Algorithms and Filtering (Kalman
Filter)
(Neither topic seems to be a core topic in computational statistics)
Computational Statistics and
Algorithms
• Statistics and Numerical Algorithms (2)
– Adaptation of optimization techniques (e.g.
scoring methods)
– Behaviour of optimization methods in
statistical context (numerical convergence vs.
stochastic convergence concepts)
Implicit Consideration at COMPSTAT
Computational Statistics and
Algorithms
• Statistics and Random Numbers / Monte
Carlo
– Generation of Random numbers was (and is)
probably more a topic of mathematics
(number theory) and computer science
• In the early days of COMPSTAT there was also some connection to simulation
– Genuine application of Monte Carlo Methods
in connection with new developments of
statistical theory (e.g. MCMC)
Computational Statistics and
Algorithms
• Statistics and semi-numerical algorithms
– Applications in context of nonparametric statistics and
analysis of tabular data
• Feasibility of conditional inference for logistic models
– New developments on the borderline between
statistics and computer science
• Data Mining as a new statistical modelling paradigm
COMPSTAT was open towards these developments and integrated them into the program
Computational Statistics and
Algorithms
• Statistics and Graphical Algorithms
– Development rather complementary to the developments in computer science
– Important issues (L. Wilkinson):
• Graphics are not only a tool for displaying results but rather a
tool for perceiving relationships
• Dynamic graphics as important tool for data analysis
• Graphics are a means of model formalization reflecting
quantitative and qualitative traits of its variables
Represented quite well at COMPSTAT
Computational Statistics and
Algorithms
• Mathematical vs. Statistical Modelling
– Emphasis on different methods (e.g.
Differential Equations)
– Different modelling environments (J. Nelder)
• Data structures in statistics
• Exploratory nature of statistical analysis (statistical
analysis cycle)
• Competence of users
Computational Statistics and
Computer Science
• Developments in Statistical Software
• Development of Statistical Languages
• Developments in Statistical Database
Management
Computational Statistics and
Computer Science
• Developments in Statistical Software (1)
– From numerical subroutines towards
statistical packages
– Main goals:
• Taking into account the peculiarities of statistical
data analysis
• Use of current hardware developments
Computational Statistics and
Computer Science
• Developments in Statistical Software (2)
– From the beginning, COMPSTAT was an important forum for the development of statistical software
• The proceedings from the early 1980s show numerous software developments for specific statistical models
• There was always some tension between the presentation of commercial software developments and the scientific character of the conference
Computational Statistics and
Computer Science
• Development of Statistical Languages (1)
– GLIM was probably the first genuine statistical
modelling language
• Present at COMPSTAT from the very beginning
Computational Statistics and
Computer Science
• Development of Statistical Languages (2)
– The S language set up a new paradigm for
computing which is of interest also outside
statistical applications
• Contribution to computer science honoured by the ACM Software System Award to J. Chambers
Although it started as early as 1976, it took a long time to enter the COMPSTAT community
Computational Statistics and
Computer Science
• Development of Statistical Languages (3)
– R became popular within COMPSTAT rather quickly, owing to its free availability and the effective organisation of CRAN
– Omegahat: An umbrella for open source
projects in computational statistics covering
not only statistical computation but also other
important aspects in distributed computing
Computational Statistics and
Computer Science
• Development of Statistical Languages (4)
– XLISP-Stat as proof of concept (in particular
for animated graphics)
– XploRe as a Java-based production system
Computational Statistics and
Computer Science
• Statistical Data Base Management
– The main challenge is the appropriate use of developments in database technology in a statistical context
• Combination of statistical data structures and statistical
processing activities with conceptual data models
• Representation of tabular data
• Metadata as a tool to capture the complexity of statistical
data
A small but active group inside the COMPSTAT
community from the very beginning
Computational Statistics and
Applications
• Challenges for Computational Statistics
Largely independent of the application area
– Data
• Data capture
• Data structures
• Data size
– Analysis Process
• Analysis strategies
• The role of the statistician in the computer age
Computational Statistics and
Applications
• Data challenges (1)
– Contributions on data challenges appear only occasionally at COMPSTAT
• Current problems
– Data capture
• Data capture tools are rather a side branch of computational statistics and more closely connected to official statistics
• Data streams are a new challenge that has so far attracted little attention in the computational statistics community
Computational Statistics and
Applications
• Data challenges (2)
– Data structures
• New problems (e.g. in connection with data mining) raise
questions with respect to the applicability of the basic
statistical analysis paradigm (population, sample,
measurement process)
– Data size
• Handling huge datasets
At the moment, none of these challenges seems to be a core topic of computational statistics
Computational Statistics and
Applications
• Analysis process
– Analysis strategies
• The formalization of analysis strategies was a hot topic at the COMPSTAT conferences at the end of the 1980s, but met with limited success
– The role of statisticians in the computer age
• Is progress in computational statistics an enabler for statisticians, or does it lead towards a de-skilling of the statistical profession?
The COMPSTAT Symposia
A full set of COMPSTAT proceedings
(one statistical outlier removed)
Do you see the CSDA volumes in the background?
Here they are!
The COMPSTAT Symposia I
Symposium   Year   Organizers                       # Submissions   # Papers I/C   # Participants
Vienna      1974   Sint                             -               50             100
Berlin      1976   Gordesch, Naeve                  -               58             180
Leiden      1978   Corsten, Hermans                 -               68             310
Edinburgh   1980   Barrit, Wishart                  250             4/82           750
Toulouse    1982   Caussinus, Ettinger, Tomassone   250             15/60          500
The COMPSTAT Symposia II
Symposium    Year   Organizers                # Submissions   # Papers I/C   # Participants
Prague       1984   Havranek, Sidak, Novak    300             7/65           ???
Rome         1986   De Antoni, Lauro, Rizzi   300             14/60          900
Copenhagen   1988   Edwards, Raun             300             9/51           800
Dubrovnik    1990   Momirovic                 115             6/43           180
Neuchâtel    1992   Dodge, Whittaker          115             11/115         200
COMPSTAT 1994 Vienna and Satellite Meeting on
Smoothing Semmering (World Cultural Heritage)
Andrew Westlake, Allmut
Hörmann, Wolfgang Härdle
Randy Eubank
On the track from Vienna to Semmering
in the Austrian Alps (historical train)
The organizer
Satellite Meeting on Smoothing
We finally arrived at the
mountain spa Semmering
Antoine de Falguerolles
and the organizer at the opening
The COMPSTAT Symposia III
Symposium                 Year   Organizers                   # Submissions   # Papers I/C   # Participants
Vienna                    1994   Dutter, Grossmann, Schimek   200             11/60          380
  Semmering (Satellite)                                       30              7/26           50
Barcelona                 1996   Prat                         250             13/56          300
Bristol                   1998   Payne, Green                 180             12/58          370
Utrecht                   2000   Van der Heijden, Bethlehem   250             15/60          220
Berlin                    2002   Härdle                       220             9/90           260
The COMPSTAT proceedings from
the Vienna and Semmering meetings
Model of Vienna
University
Kastalia Fountain