Extending the Bootstrap Using a
Univariate Sampling Model to
Multivariate Settings
Joseph Lee Rodgers and William Beasley
University of Oklahoma
Focusing on the Alternative
(Researcher’s) Hypothesis: AHST
Joseph Lee Rodgers and William Beasley
University of Oklahoma
Goals
• To review history of NHST
• To review recent attention to AHST
• To show how the bootstrap is facile in AHST
settings using univariate correlations
• To extend this conceptualization into
multivariate correlation settings
– Multiple regression
– Factor analysis, etc.
History of NHST
• Not necessary, though entertaining
• NHST emerged in the 1920s and 1930s from
Fisher’s hypothesis testing paradigm, and from
the Neyman-Pearson decision-making paradigm
• In its modern version, it combines Fisher’s
attention to the null hypothesis and p-value with
Neyman-Pearson’s development of alpha, the
alternative hypothesis, and statistical power
• The result: “an incoherent mishmash of some of
Fisher’s ideas on the one hand, and some of the
ideas of Neyman and E. S. Pearson on the other
hand” (Gigerenzer, 1993)
Criticism of NHST
• NHST “never makes a positive contribution”
(Schmidt & Hunter, 1997)
• NHST “has not only failed to support and advance
psychology as a science but also has seriously
impeded it” (Cohen, 1994)
• NHST is “surely the most boneheadedly
misguided procedure ever institutionalized in the
rote training of science students” (Rozeboom,
1997).
• In a recent American Psychologist paper I
suggested that the field of quantitative
methodology has transitioned away from
NHST toward a modeling framework that
emphasizes the researcher’s hypothesis – an
AHST framework – with relatively little
discussion
Praise for AHST
• AHST = “Alternative Hypothesis Significance
Testing”
• “Most tests of null hypotheses are rather
feckless and potentially misleading. However,
an additional brand of sensible significance
tests arises in assessing the goodness-of-fit of
substantive models to data.” (Abelson, 1997)
• “After the introduction of … structural models …, it
soon became apparent that the structural modeler
has, in some sense, the opposite intention to the
experimentalist. The latter hopes to “reject” a
restrictive hypothesis of the absence of certain
causal effects in favor of their presence—rejection
permits publication. . . . The former wishes to
“accept” a restrictive model of the absence of
certain causal effects—acceptance permits
publication.” (McDonald, 1997)
• Our “approach allows for testing null
hypotheses of not-good fit, reversing the role
of the null hypothesis in conventional tests of
model fit, so that a significant result provides
strong support for good fit.” (MacCallum,
Browne, & Sugawara, 1996)
More background
• Both Gosset and Fisher wanted to test
hypotheses using resampling methods:
• “Before I had succeeded in solving my problem
analytically, I had endeavored to do so
empirically.” (Gosset, 1908)
• “[The] conclusions have no justification beyond
the fact they could have been arrived at by this
very elementary [re-randomization] method.”
(Fisher, 1936)
• Both actually did resampling (rerandomization)
– Gosset (1908) used N=3000 datapoints collected
from prisoners, written on slips of paper and drawn
from a hat
– Fisher (1920s) used some of Darwin’s data,
comparing cross-fertilized and self-fertilized corn
• So one reasonable view is that these statistical
pioneers developed parametric statistical
procedures because they lacked the
computational resources to use resampling
methods
Modern Resampling
• Randomization or permutation tests – Gosset
and Fisher
• The Jackknife – Quenouille, 1949; Tukey,
1953
• The Bootstrap – Efron, 1979
Bootstrapping Correlations
• Early work
– Diaconis and Efron, Scientific American, 1983,
provided conceptual motivation
– But straightforward percentile-based bootstraps
didn’t work very well
– So they were bias-corrected and accelerated, and
for a while the BCa bootstrap was the state-of-the-art
• Lee and Rodgers (1998) showed how to regain
conceptual simplicity using univariate
sampling, rather than bivariate sampling
– Especially effective in small samples and highly
skewed settings
– Was as good as or even better than parametric
methods in normal distribution settings
• Example using Diaconis & Efron (1983) data
• Beasley et al. (2007) applied the same
univariate sampling logic to test nonzero null
hypotheses about correlations
• The methodology used there is what we’re
currently extending to multivariate settings;
it will be described in detail
• Steps in the Beasley et al. approach
– Given observed bivariate data, and sample r
– Define a univariate sampling frame (rectangular)
that respects the two marginal distributions
– Diagonalize the sampling frame to have a given
correlation using a matrix square root procedure
(e.g., Kaiser & Dickman, 1962)
– Use the new sampling frame to generate
bootstrap samples and construct an empirical
sampling distribution
• Two methods:
– HI, hypothesis-imposed – generate an empirical
bootstrap distribution around the hypothesized
null non-zero correlation
• E.g., to test ρ = .5, diagonalize sampling frame to have
correlation of .5, bootstrap, then evaluate whether
observed r is contained in the 95% percentile interval
– OI, observed-imposed – generate an empirical
bootstrap distribution around the observed r
• E.g., to test ρ = .5, diagonalize sampling frame to have
correlation of observed r, bootstrap, then evaluate
whether hypothesized ρ = .5 is contained in the 95%
percentile interval
• Both OI and HI work effectively; OI seems to
work slightly better (a minimal HI sketch follows)
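To make the HI logic concrete, here is a minimal Python sketch (the function name and defaults are ours, and the sketch omits refinements such as bias correction): it resamples each margin independently to build the rectangular univariate sampling frame, imposes the hypothesized ρ with the bivariate Kaiser-Dickman transformation, and checks whether the observed r falls inside the 95% percentile interval.

```python
import numpy as np

def hi_bootstrap_test(x, y, rho0=0.5, n_boot=10_000, seed=0):
    """Hypothesis-imposed (HI) univariate-sampling bootstrap test.

    Resamples x and y independently (respecting both marginals while
    breaking their association), imposes the hypothesized correlation
    rho0 via the Kaiser-Dickman (1962) matrix square root, and asks
    whether the observed r lies inside the 95% percentile interval.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    r_obs = np.corrcoef(x, y)[0, 1]

    boot_r = np.empty(n_boot)
    for b in range(n_boot):
        # Univariate sampling: draw each margin separately.
        zx = rng.choice(x, size=n, replace=True)
        zy = rng.choice(y, size=n, replace=True)
        zx = (zx - zx.mean()) / zx.std()
        zy = (zy - zy.mean()) / zy.std()
        # Diagonalize to correlation rho0 (bivariate Kaiser-Dickman):
        y_star = rho0 * zx + np.sqrt(1 - rho0**2) * zy
        boot_r[b] = np.corrcoef(zx, y_star)[0, 1]

    lo, hi = np.percentile(boot_r, [2.5, 97.5])
    return r_obs, (lo, hi), not (lo <= r_obs <= hi)  # True = reject
```

The OI variant reverses the roles: diagonalize the frame to the observed r instead, then ask whether the hypothesized ρ = .5 falls inside the resulting percentile interval.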
• Beasley’s (2010) dissertation took a Bayesian
approach that is highly computationally
intensive – both the bootstrap and the
Bayesian method require lots of
computational resources – but it works quite
well
Review the AHST logic
• Define a hypothesis, model, theory, that makes a
prediction (e.g., of a certain correlation, or a
correlation structure)
• Define a sampling distribution in relation to that
hypothesis
• Directly test the hypothesis, using an AHST logic
• Why didn’t Gosset/Fisher or Neyman-Pearson do this?
They didn’t have a method to generate a sampling
distribution in relation to the alternative hypothesis –
they only had the computational/mathematical ability
to generate a sampling distribution around the null, so
that’s what they did, and the rest is history (we hope)
• But using resampling theory, and the bootstrap in
particular, we can generate a sampling
distribution (empirically) in settings with high
skewness, unknown distribution, small N, using
either HI or OI logic
• Note – be prepared for some computationally
intensive methods
• The programmers and computers will have lots of
work to do
• But to applied researchers, the consumers of
these methods, this computational intensity can
be transparent
Previous applications
• Bollen & Stine (1993) used a square root
transformation to adjust the bootstrap for SEM fit
indices, using similar logic to that defined above
• Parametric bootstraps are now popular – these
use a distributional model of the data, rather
than the observed data, to draw the bootstrap
samples (see the sketch below)
• Zhang & Browne (2010) used this method in a
dynamic factor analysis of time series data (in
which the model was partially imposed by using a
moving block bootstrap across the time series)
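As a quick illustration of the parametric-bootstrap idea (not the Bollen-Stine or Zhang-Browne procedures themselves; the normal model and the mean as target statistic are placeholder choices):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(size=50)              # stand-in observed sample

# Fit a distributional model to the data (here, simply a normal)...
mu, sigma = data.mean(), data.std(ddof=1)

# ...then draw the bootstrap samples from the model, not from the data.
boot_means = np.array([
    rng.normal(mu, sigma, size=data.size).mean()
    for _ in range(10_000)
])
ci = np.percentile(boot_means, [2.5, 97.5])  # percentile interval
```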
MV “diagonalization”
• The major requirement for extending this type
of AHST is a method to impose the hypothesized
MV model on a univariate sampling frame
• There have been recent advances in this regard
Cudeck & Browne (1992), Psychometrika
“Constructing a Covariance Matrix that Yields a Specified
Minimizer and a Specified Minimum Discrepancy Function Value”
• Cudeck and Browne (1992) showed how to construct a
covariance matrix according to a model with a
prescribed lack of fit, designed specifically for Monte
Carlo research
• In fact, such Monte Carlo methods – designed to
produce matrices to study – themselves become
hypothesis testing methods in the current paradigm
• We won’t use this method here, because in our
application we need to produce raw data with a
specified correlation structure, rather than the
covariance/correlation matrix
• But this method can help when extensions to
covariance structure analysis are considered
Headrick (2002) Computational
Statistics and Data Analysis
“Fast fifth-order polynomial transforms for generating univariate
and multivariate nonnormal distributions”
• Power method (using a high-order polynomial
transformation)
• Draw MV normal data and transform, using up
to a fifth-order polynomial, which reproduces
the first six moments of a specified nonnormal
distribution
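A minimal sketch of the power-method transform; the coefficients below are illustrative placeholders, whereas in Headrick’s method they are solved numerically so the first six moments of the result match the target distribution:

```python
import numpy as np

def power_method(z, c):
    """Fifth-order polynomial transform of standard-normal scores z.
    The coefficients c = (c0, ..., c5) must be solved so the moments
    of the result match the target nonnormal distribution; the values
    passed below are placeholders, not solved coefficients."""
    c0, c1, c2, c3, c4, c5 = c
    return c0 + c1*z + c2*z**2 + c3*z**3 + c4*z**4 + c5*z**5

rng = np.random.default_rng(0)
z = rng.standard_normal(10_000)      # start from (MV) normal draws
y = power_method(z, (0.0, 0.9, 0.1, 0.05, 0.0, 0.001))
```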
Ruscio & Kaczetow (2008) MBR
“Simulating Multivariate Nonnormal Data Using an Iterative
Algorithm”
• SI method (Sample and Iterate)
• “… implements the common factor model
with user-specified non-normal distributions
to reproduce a target correlation matrix”
• Construct relatively small datasets with
specified correlation structure
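The gist of the sample-and-iterate idea, in a simplified Python sketch (our variable names; the published algorithm works through the common factor model and includes safeguards, such as keeping the intermediate matrix positive definite, that are omitted here):

```python
import numpy as np

def si_sample(marginals, R_target, n, iters=10, seed=0):
    """Simplified sample-and-iterate (SI) sketch: draw MV normal data
    from an intermediate correlation matrix, substitute rank-matched
    values from the target marginals, and nudge the intermediate
    matrix toward the target correlations.  `marginals` is a list of
    1-D arrays of draws from each target distribution."""
    rng = np.random.default_rng(seed)
    k = len(marginals)
    R_int = R_target.copy()
    targets = [np.sort(rng.choice(m, size=n)) for m in marginals]
    for _ in range(iters):
        Z = rng.standard_normal((n, k)) @ np.linalg.cholesky(R_int).T
        X = np.empty_like(Z)
        for j in range(k):
            # Give the target values the rank order of column j of Z.
            X[np.argsort(Z[:, j]), j] = targets[j]
        R_emp = np.corrcoef(X, rowvar=False)
        R_int = R_int + (R_target - R_emp)  # nudge toward the target
        np.fill_diagonal(R_int, 1.0)
    return X
```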
Work-in-progress – Using AHST with
Multiple Regression, bootstrapping R²
• Design
– Four types of hypothesis tests
• MV Parametric procedure
• MV Sampling, regular bootstrap (sketched below)
• Ruscio sampling for MV diagonalization
• Ruscio sampling for MV diagonalization, sample N² points
• Note: All bootstrap procedures were bias-corrected
– Three 4 × 4 correlation matrices
• One completely uncorrelated, two correlated patterns
– Seven distributional patterns
– 10,000 bootstrap cycles
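For reference, the “MV Sampling, regular bootstrap” condition amounts to case resampling of whole rows; a minimal Python sketch (an illustrative helper of ours, shown without the bias correction used in the actual design):

```python
import numpy as np

def boot_r2_ci(X, y, n_boot=10_000, seed=0):
    """Percentile interval for R-squared via plain case resampling.
    X is an n x p predictor matrix and y a length-n criterion."""
    rng = np.random.default_rng(seed)
    n = len(y)
    r2 = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)           # resample whole rows
        Xb = np.column_stack([np.ones(n), X[idx]])
        yb = y[idx]
        beta, *_ = np.linalg.lstsq(Xb, yb, rcond=None)
        resid = yb - Xb @ beta
        r2[b] = 1.0 - (resid @ resid) / ((yb - yb.mean()) @ (yb - yb.mean()))
    return np.percentile(r2, [2.5, 97.5])
```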
Correlation matrices
(Population matrices below; MV data with this
correlation structure generated using
Headrick’s method)

R² = .27 (all .4):
     Y   X1  X2  X3
Y    1   .4  .4  .4
X1   .4  1   .4  .4
X2   .4  .4  1   .4
X3   .4  .4  .4  1

R² = .27 (two .4, one .2, three 0):
     Y   X1  X2  X3
Y    1   .4  .4  0
X1   .4  1   .2  0
X2   .4  .2  1   0
X3   0   0   0   1
Distributional patterns
• These combined normal, 1 df chi-square, and
3 df chi-square
– Normal Y, X1, X2, and X3
– Normal Y, 1 df chi-square X1, X2, and X3
– Normal Y, 3 df chi-square X1, X2, and X3
– 1 df chi-square Y, normal X1, X2, and X3
– 3 df chi-square Y, normal X1, X2, and X3
– 1 df chi-square Y, X1, X2, and X3
– 3 df chi-square Y, X1, X2, and X3
How these raw data look
[Scatterplot matrices of Y, X1, X2, and X3 for two
conditions: normal Y with 3 df chi-square Xs, and 3 df
chi-square Y with 3 df chi-square Xs]
• To evaluate this method, we put it within a
Monte Carlo design replicating this process
1,000 times per cell
• Results:
[Plots of rejection rate (.05 to .20) against sample size
(30, 60, 100, 200) for the Analytic, MV Sampling, OI
Ruscio, and OI Ruscio-squared procedures, across three
distributional panels (normal Y and Xs; 3 df chi-square
Y and Xs; normal Y with 3 df chi-square Xs) and the two
correlation patterns (all .4; two .4, one .2, three 0)]
Comments
• Based on these patterns, we are not yet
convinced that the implementation of the
Ruscio method works effectively for this
problem
• There are theoretical reasons to prefer
Headrick, because that method respects not
only the marginals, but also the original
moments – it recreates our specific population
distributions better
• What’s next for multiple regression/GLM?
– Evaluate Headrick’s procedure
– Move on to model comparisons – bootstrap the F
statistic to compare two nested linear models
– Expand the number of correlation structures
• What’s next more broadly?
– This approach appears to be generalizable – using
a MV data generation routine to produce
observations with a structure consistent with a
model, then bootstrapping some appropriate
statistic off of that alternative hypothesis to test
the model
– To CFA, for example
– To HLM, with hypothesized structure
Conclusion
• Rodgers (2010): “The … focal point is no
longer the null hypothesis; it is the current
model. This is exactly where the researcher—
the scientist—should be focusing his or her
concern.”