A BRIEF OVERVIEW OF SOME STATISTICAL
RESEARCH RELATED TO COMPUTER MODELS
Max D. Morris
Dept. of Statistics,
Dept. of Industrial and Manufacturing
Systems Engineering,
Iowa State University
OVERVIEW
1. (Very incomplete) summary of some stat’l approaches to
empirical study of computer models, organized by:
• Assessment of relative importance of inputs
• Quantitative understanding of computer models
• Relationships between models and their context
2. Some “twists” and current points of interest concerning GaSP
models
3. Potential links to more general problems and algorithms
INPUT ASSESSMENT
1. Input sampling plans
• e.g. Latin hypercube sampling – Variance reduction techniques to improve sampling properties of simple statistics (see the LHS sketch after this list)
• Uncertainty Analysis – Variance propagation via other restricted random samples – Where should input uncertainty be reduced so as to most reduce output uncertainty?
2. “Quasi-Modeling” based on ranks – Standard data analysis techniques after replacing data with ranks (rank-regression sketch after this list)
• “Linearize” monotonic relationships, so that simple modeling techniques work – No attempt to predict/approximate output
• Fairly ad hoc, but often effective
3. Dimension-Reduction methods – “Screening” to discover the (hopefully small) subset of important inputs (elementary-effects sketch after this list)
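
A minimal sketch of item 1, assuming SciPy's `qmc` module is available; the two-input `model` function is only a hypothetical stand-in for an expensive simulator.

```python
# Latin hypercube sampling to propagate input uncertainty through a model.
import numpy as np
from scipy.stats import qmc

def model(x):
    # hypothetical stand-in for an expensive simulator, y = M(x)
    return np.sin(x[:, 0]) + 0.5 * x[:, 1] ** 2

sampler = qmc.LatinHypercube(d=2, seed=0)
unit_sample = sampler.random(n=50)          # 50 runs, stratified in [0, 1)^2
x = qmc.scale(unit_sample, l_bounds=[0.0, -1.0], u_bounds=[np.pi, 1.0])

y = model(x)
# simple statistics of the propagated output distribution
print("output mean:", y.mean(), "output variance:", y.var(ddof=1))
```

The same sample can also be reused to ask where reducing input uncertainty would most reduce output uncertainty, e.g. by comparing conditional output variances.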
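For item 2, a rank-based sketch: inputs and output are replaced by their ranks so that monotone relationships look roughly linear, and an ordinary regression on the ranks gives crude importance measures. The toy output function below is an assumption, not from the talk.

```python
# Rank transformation followed by ordinary regression (SRRC-style analysis).
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(1)
x = rng.uniform(size=(200, 3))
y = np.exp(2.0 * x[:, 0]) + x[:, 1] ** 3            # toy monotone "model output"

rx = np.column_stack([rankdata(x[:, j]) for j in range(x.shape[1])])
ry = rankdata(y)

# standardize the ranks and fit a linear model; coefficients gauge relative importance
rx_s = (rx - rx.mean(axis=0)) / rx.std(axis=0)
ry_s = (ry - ry.mean()) / ry.std()
coef, *_ = np.linalg.lstsq(rx_s, ry_s, rcond=None)
print("standardized rank regression coefficients:", coef)
```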
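For item 3, one common screening device is a one-at-a-time "elementary effects" pass over the input space; the sketch uses a hypothetical three-input function in which the third input is inert.

```python
# One-at-a-time elementary-effects screening on [0, 1]^d.
import numpy as np

def model(x):
    # hypothetical simulator: inputs 1 and 2 matter, input 3 does not
    return x[0] ** 2 + 2.0 * x[0] * x[1]

d, r, delta = 3, 20, 0.25                  # dimension, replications, step size
rng = np.random.default_rng(2)
effects = np.zeros((r, d))
for i in range(r):
    base = rng.uniform(0.0, 1.0 - delta, size=d)
    y0 = model(base)
    for j in range(d):
        xp = base.copy()
        xp[j] += delta
        effects[i, j] = (model(xp) - y0) / delta

print("mean |EE| per input:", np.abs(effects).mean(axis=0))   # overall importance
print("sd of EE per input: ", effects.std(axis=0, ddof=1))    # nonlinearity / interaction
```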
QUANTITATIVE UNDERSTANDING OF MODELS
1. “Meta-models” based on (e.g.) classification/regression trees – Focus on partitioning input space into homogeneous sectors (tree sketch after this list)
2. Flexible (input-)“spatial” modeling – Better justified for mimicking unknown functional forms … especially:
3. Spatial stochastic process models – Most often Gaussian Stochastic Process models (GaSP); a minimal emulator sketch follows this list
• Multi-dimensional analogue of time-series modeling
• Akin to spatial analysis techniques, e.g. Kriging
• Most applicable when some qualitative assumptions can be made about the model (continuity, smoothness …)
4. Design for GaSP models (maximin-distance sketch after this list)
• Entropy – Information content of statistical model
• Asymptotic arguments – Distance between input vectors
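
A sketch of the tree-based meta-model in item 1, assuming scikit-learn is available; the step-plus-trend output is a hypothetical simulator chosen so the fitted partition is easy to read.

```python
# Regression-tree meta-model: partition the input space into homogeneous sectors.
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(3)
x = rng.uniform(size=(300, 2))
y = np.where(x[:, 0] > 0.5, 1.0, 0.0) + x[:, 1]       # toy simulator output

tree = DecisionTreeRegressor(max_depth=3).fit(x, y)
print(export_text(tree, feature_names=["x1", "x2"]))   # readable input-space partition
```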
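Item 3 in its simplest form: a one-input GaSP emulator with a fixed squared-exponential correlation. The correlation parameter, process variance, and nugget below are set by hand (a real analysis would estimate them, e.g. by maximum likelihood), the zero-mean assumption is a simplification, and the sine "simulator" is only a placeholder.

```python
# Minimal GaSP emulator: zero-mean Gaussian process, squared-exponential correlation.
import numpy as np

def corr(a, b, theta=10.0):
    # squared-exponential correlation between 1-d input sets a and b
    return np.exp(-theta * (a[:, None] - b[None, :]) ** 2)

def simulator(x):                           # placeholder computer model
    return np.sin(2.0 * np.pi * x)

x_train = np.linspace(0.0, 1.0, 8)
y_train = simulator(x_train)

sigma2 = y_train.var()                      # crude plug-in process variance
R = corr(x_train, x_train) + 1e-8 * np.eye(x_train.size)   # small nugget for stability
L = np.linalg.cholesky(R)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))

x_new = np.linspace(0.0, 1.0, 101)
r = corr(x_new, x_train)
pred_mean = r @ alpha                                       # emulator prediction
v = np.linalg.solve(L, r.T)
pred_var = sigma2 * (1.0 - (v ** 2).sum(axis=0))            # pointwise predictive variance
print("max predictive sd between runs:", np.sqrt(pred_var.max()))
```

The predictive variance collapses at the training runs and grows between them, which is what makes the GaSP useful both for interpolation and for deciding where the next run should go.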
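For item 4, a crude surrogate for the entropy/distance criteria is a maximin-distance search; the sketch (SciPy assumed) simply keeps the best of many random Latin hypercube designs.

```python
# Naive maximin-distance design: best of many random Latin hypercube designs.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import qmc

n_runs, d = 10, 2
best_design, best_score = None, -np.inf
for seed in range(200):
    cand = qmc.LatinHypercube(d=d, seed=seed).random(n_runs)
    score = pdist(cand).min()               # smallest inter-point distance
    if score > best_score:
        best_design, best_score = cand, score

print("selected design's minimum inter-point distance:", round(best_score, 3))
```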
MODELS AND CONTEXT
1. Issues concerning relationships between
• Model: y = M(x, θ)
• Reference: yR observed at xR
2. Calibration: What θ will make y match yR when x matches xR? (a least-squares sketch follows this list)
• Stat’l issues: “Honest” assessment of uncertainty
• e.g. groundwater flow, incomplete porosity information
3. Validation: (How well / Where) does y match yR when x = xR?
• Stat’l issues: Uncertainty about θ and/or xR
• Of particular concern with limited reference data, e.g. weapons stockpile
4. Inverse problems: What xR led to this yR?
• Stat’l issues: Uncertainty about fidelity of M(·, θ)
• e.g. subterranean void detection via acoustic pinging
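
A sketch of item 2 as a plain nonlinear least-squares problem, assuming SciPy; the exponential-decay model and the synthetic reference data are hypothetical, and an "honest" treatment would also carry the uncertainty in the estimated θ forward. The same machinery, turned around to search over xR for a given yR with θ held fixed, gives the flavor of the inverse problem in item 4.

```python
# Calibration: choose theta so that M(x_R, theta) matches reference data y_R.
import numpy as np
from scipy.optimize import least_squares

def M(x, theta):
    # hypothetical computer model y = M(x, theta)
    return theta[0] * np.exp(-theta[1] * x)

rng = np.random.default_rng(4)
x_R = np.linspace(0.0, 2.0, 15)
y_R = 2.0 * np.exp(-1.3 * x_R) + 0.05 * rng.normal(size=x_R.size)  # synthetic "reality"

fit = least_squares(lambda th: M(x_R, th) - y_R, x0=[1.0, 1.0])
print("calibrated theta:", fit.x)
```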
“TWISTS” CONCERNING GaSP MODELS
1. Use of derivative information
• Motivated by “augmented” codes (e.g. adjoint-equation form)
2. Modeling at interface of statistics and numerical analysis
• Statistically model an “intermediate” quantity that is:
  • more regular/well-behaved than output
  • highly reliable as a simple predictor of output
• e.g. truncation errors
3. Knowledge/Model/Model/Physical data-merging (e.g. LANL)
4. System-of-systems modeling
• “Patching” a meta-model when one component of a modular system is changed
• e.g. supply chain models
POSSIBLE LINKS TO MORE GENERAL ALGORITHMS
1. Physical model validation
• “Benchmark” comparisons?
2. Joint analysis of related models/algorithms
• “Sampling and experimental design for asymptotic analysis”?
• “Statistical methods for assessing convergence”?
3. “Species Discovery”, “x-partitioning” (rather than “y-prediction”)
• Success rate / “basin of attraction” questions about random starts
4. Experimental design for software reliability assessment; spatial “bump-hunting”
• “Modeling the relationship between input parameters and performance”?
ABSTRACT
I'll summarize statistical ideas and methods that have been
developed for empirical studies involving computer models. The
presentation will focus on three general areas: (1.) assessment of
the relative importance of inputs, (2.) quantitative understanding
of input-output relationships, and (3.) questions involving
relationships between models and their context. Most of the
specific work to which I'll refer is motivated by problems involving
computer models constructed to represent a “reality” of some
sort; I'll conclude with some thoughts concerning how these and
other statistical ideas might be useful in evaluations of more
general models and algorithms.
A FEW REFERENCES
• Dalal, S.R., and C.L. Mallows (1998) “Factor-Covering Designs for Testing Software,” Technometrics 40, 234-243. – Brief discussion of how software testing can be viewed as an experimental design problem.
• Dean, A. and S. Lewis (eds.) Screening: Methods for Experimentation in Industry, Drug Discovery, and Genetics, Springer. – Recent volume of statistical screening ideas applied in several experimental applications areas.
• Diaconis, P. (1988) “Bayesian Numerical Analysis,” Statistical Decision Theory and Related Topics IV, J. Berger, S. Gupta (eds.), Springer-Verlag. – Introduction to the idea of what the title says.
• Kennedy, M. and A. O’Hagan (2000) “Predicting the Output from a Complex Computer Code when Fast Approximations are Available,” Biometrika 87, 1-13. – Heavily referenced paper on joint analysis of similar models.
• Sacks, J., W. Welch, T. Mitchell, and H. Wynn (1989) “Design and Analysis of Computer Experiments,” Statistical Science 4, 409-423. – Early description of using stochastic processes to examine computer models.
• Saltelli, A., K. Chan, and E. Scott (2000) Sensitivity Analysis, John Wiley and Sons. – Summary of input sampling techniques to support relatively simple sensitivity analyses.
• Santner, T.J., B.J. Williams, and W.I. Notz (2003) The Design and Analysis of Computer Experiments, Springer, ISBN 0-387-95420-1. – General overview of the use of GaSP models in computer experiments.