Extending the Bootstrap Using a Univariate Sampling Model to Multivariate Settings
Focusing on the Alternative (Researcher's) Hypothesis: AHST
Joseph Lee Rodgers and William Beasley, University of Oklahoma

Goals
• To review the history of NHST
• To review recent attention to AHST
• To show how the bootstrap is facile in AHST settings using univariate correlations
• To extend this conceptualization into multivariate correlation settings
– Multiple regression
– Factor analysis, etc.

History of NHST
• Not necessary, though entertaining
• NHST emerged in the 1920s and 1930s from Fisher's hypothesis testing paradigm and from the Neyman-Pearson decision-making paradigm
• In its modern version, it combines Fisher's attention to the null hypothesis and the p-value with Neyman-Pearson's development of alpha, the alternative hypothesis, and statistical power
• The result: "an incoherent mishmash of some of Fisher's ideas on the one hand, and some of the ideas of Neyman and E. S. Pearson on the other hand" (Gigerenzer, 1993)

Criticism of NHST
• NHST "never makes a positive contribution" (Schmidt & Hunter, 1997)
• NHST "has not only failed to support and advance psychology as a science but also has seriously impeded it" (Cohen, 1994)
• NHST is "surely the most boneheadedly misguided procedure ever institutionalized in the rote training of science students" (Rozeboom, 1997)
• In a recent American Psychologist paper, I suggested that the field of quantitative methodology has transitioned away from NHST toward a modeling framework that emphasizes the researcher's hypothesis – an AHST framework – with relatively little discussion

Praise for AHST
• AHST – "Alternative Hypothesis Significance Testing"
• "Most tests of null hypotheses are rather feckless and potentially misleading. However, an additional brand of sensible significance tests arises in assessing the goodness-of-fit of substantive models to data." (Abelson, 1997)
• "After the introduction of … structural models …, it soon became apparent that the structural modeler has, in some sense, the opposite intention to the experimentalist. The latter hopes to 'reject' a restrictive hypothesis of the absence of certain causal effects in favor of their presence—rejection permits publication. . . . The former wishes to 'accept' a restrictive model of the absence of certain causal effects—acceptance permits publication." (McDonald, 1997)
• Our "approach allows for testing null hypotheses of not-good fit, reversing the role of the null hypothesis in conventional tests of model fit, so that a significant result provides strong support for good fit." (MacCallum, Browne, & Sugawara, 1996)

More background
• Both Gosset and Fisher wanted to test hypotheses using resampling methods:
• "Before I had succeeded in solving my problem analytically, I had endeavored to do so empirically." (Gosset, 1908)
• "[The] conclusions have no justification beyond the fact they could have been arrived at by this very elementary [re-randomization] method." (Fisher, 1936)
• Both actually did resampling (rerandomization)
– Gosset (1908) used N = 3000 data points collected from prisoners, written on slips of paper and drawn from a hat
– Fisher (1920s) used some of Darwin's data comparing cross-fertilized and self-fertilized corn
• So one reasonable view is that these statistical pioneers developed parametric statistical procedures because they lacked the computational resources to use resampling methods

Modern Resampling
• Randomization or permutation tests – Gosset and Fisher
• The jackknife – Quenouille, 1949; Tukey, 1953
• The bootstrap – Efron, 1979

Bootstrapping Correlations
• Early work
– Diaconis and Efron, Scientific American, 1983, provided conceptual motivation
– But straightforward percentile-based bootstraps didn't work very well
– So they were bias-corrected and accelerated, and for a while the BCa bootstrap was the state of the art
• Lee and Rodgers (1998) showed how to regain conceptual simplicity using univariate sampling rather than bivariate sampling
– Especially effective in small samples and highly skewed settings
– As good as or better than parametric methods in normal-distribution settings
• Example using Diaconis & Efron (1983) data
• Beasley et al. (2007) applied the same univariate sampling logic to test nonzero null hypotheses about correlations
• The methodology used there is the one we are currently extending to multivariate settings, and it will be described in detail
• Steps in the Beasley et al. approach
– Given observed bivariate data and the sample r
– Define a univariate sampling frame (rectangular) that respects the two marginal distributions
– Diagonalize the sampling frame to have a given correlation using a matrix square root procedure (e.g., Kaiser & Dickman, 1962)
– Use the new sampling frame to generate bootstrap samples and construct an empirical sampling distribution
• Two methods (a minimal sketch of the HI version appears below):
– HI, hypothesis-imposed – generate an empirical bootstrap distribution around the hypothesized nonzero null correlation
• E.g., to test ρ = .5, diagonalize the sampling frame to have a correlation of .5, bootstrap, then evaluate whether the observed r is contained in the 95% percentile interval
– OI, observed-imposed – generate an empirical bootstrap distribution around the observed r
• E.g., to test ρ = .5, diagonalize the sampling frame to have a correlation equal to the observed r, bootstrap, then evaluate whether the hypothesized ρ = .5 is contained in the 95% percentile interval
• Both OI and HI work effectively; OI seems to work slightly better
• Beasley's (2010) dissertation was a Bayesian approach that is highly computationally intensive – both the bootstrap and the Bayesian method require lots of computational resources – but it works quite well
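To make these steps concrete, here is a minimal numpy sketch of the HI version for a single correlation. It is an illustration rather than the authors' implementation: the function names and the seed are ours, the Cholesky factor is used as one convenient matrix square root, and refinements such as bias correction are omitted.

    import numpy as np

    rng = np.random.default_rng(7)  # seed chosen only so the sketch is reproducible

    def diagonalized_frame(x, y, rho):
        # Build the rectangular N*N univariate sampling frame from the two marginals,
        # then impose correlation rho with a Kaiser-Dickman-style square-root step.
        xx, yy = np.meshgrid(x, y)                       # every (x, y) combination
        frame = np.column_stack([xx.ravel(), yy.ravel()]).astype(float)
        frame = (frame - frame.mean(axis=0)) / frame.std(axis=0, ddof=1)
        target = np.array([[1.0, rho], [rho, 1.0]])
        return frame @ np.linalg.cholesky(target).T      # Cholesky as the matrix square root

    def hi_test(x, y, rho0=0.5, n_boot=2000, alpha=0.05):
        # HI (hypothesis-imposed) percentile test of H: rho = rho0.
        n = len(x)
        r_obs = np.corrcoef(x, y)[0, 1]
        frame = diagonalized_frame(x, y, rho0)
        r_star = np.empty(n_boot)
        for b in range(n_boot):
            rows = rng.integers(0, len(frame), size=n)   # bootstrap n pairs from the frame
            r_star[b] = np.corrcoef(frame[rows, 0], frame[rows, 1])[0, 1]
        lo, hi = np.quantile(r_star, [alpha / 2, 1 - alpha / 2])
        return r_obs, (lo, hi), not (lo <= r_obs <= hi)  # reject when r_obs falls outside

The OI version simply swaps the roles: diagonalize the frame to the observed r, bootstrap, and ask whether the hypothesized ρ falls inside the percentile interval.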
Review the AHST logic
• Define a hypothesis, model, or theory that makes a prediction (e.g., of a certain correlation, or a correlation structure)
• Define a sampling distribution in relation to that hypothesis
• Directly test the hypothesis, using an AHST logic
• Why didn't Gosset/Fisher or Neyman-Pearson do this? They had no method to generate a sampling distribution in relation to the alternative hypothesis – they only had the computational/mathematical ability to generate a sampling distribution around the null, so that is what they did, and the rest is history (we hope)
• But using resampling theory, and the bootstrap in particular, we can generate a sampling distribution (empirically) in settings with high skewness, unknown distributions, and small N, using either HI or OI logic
• Note – be prepared for some computationally intensive methods
• The programmers and computers will have lots of work to do
• But to applied researchers, the consumers of these methods, this computational intensity can be transparent

Previous applications
• Bollen & Stine (1993) used a square root transformation to adjust the bootstrap for SEM fit indices, using logic similar to that defined above
• Parametric bootstraps are now popular – these use a distributional model of the data, rather than the observed data, to draw the bootstrap
• Zhang & Browne (2010) used this method in a dynamic factor analysis of time-series data (in which the model was partially imposed by using a moving-block bootstrap across the time series)

MV "diagonalization"
• The major requirement for extending this type of AHST is a method to impose the hypothesized MV model on a univariate sampling frame
• There have been recent advances in this regard

Cudeck & Browne (1992), Psychometrika, "Constructing a Covariance Matrix that Yields a Specified Minimizer and a Specified Minimum Discrepancy Function Value"
• Cudeck and Browne (1992) showed how to construct a covariance matrix according to a model with a prescribed lack of fit, designed specifically for Monte Carlo research
• In fact, such Monte Carlo methods – designed to produce matrices to study – themselves become hypothesis testing methods in the current paradigm
• We won't use this method here, because in our application we need to produce raw data with a specified correlation structure, rather than the covariance/correlation matrix itself
• But this method can help when extensions to covariance structure analysis are considered

Headrick (2002), Computational Statistics and Data Analysis, "Fast fifth-order polynomial transforms for generating univariate and multivariate nonnormal distributions"
• Power method (using a high-order polynomial transformation)
• Draw MV normal data and transform, using up to a fifth-order polynomial, which reproduces up to six moments of the specified nonnormal distribution

Ruscio & Kaczetow (2008), MBR, "Simulating Multivariate Nonnormal Data Using an Iterative Algorithm"
• SI method (Sample and Iterate)
• "… implements the common factor model with user-specified non-normal distributions to reproduce a target correlation matrix"
• Constructs relatively small datasets with a specified correlation structure

Work in progress – Using AHST with multiple regression, bootstrapping R²
• Design
– Four types of hypothesis tests (a minimal sketch of the bootstrapped-R² step appears below)
• MV parametric procedure
• MV sampling, regular bootstrap
• Ruscio sampling for MV diagonalization
• Ruscio sampling for MV diagonalization, sampling N² points
• Note: all bootstrap procedures were bias-corrected
– Three 4 × 4 correlation matrices
• One completely uncorrelated, two correlated patterns
– Seven distributional patterns
– 10,000 bootstrap cycles
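As a rough illustration of the bootstrapped-R² step referenced in the design above, the sketch below resamples each of the four columns independently (univariate sampling), imposes a hypothesized 4 × 4 correlation matrix with a simple Cholesky square-root step, and builds a percentile interval for R². The Cholesky step merely stands in for the Ruscio and Headrick generation routines actually compared in the study (those also control the marginal distributions); the names, the seed, and the HI-style decision rule are our own choices, and the bias correction used in the study is omitted.

    import numpy as np

    rng = np.random.default_rng(11)  # illustrative seed only

    def r_squared(data):
        # R^2 from regressing the first column (Y) on the remaining columns.
        y = data[:, 0]
        X = np.column_stack([np.ones(len(data)), data[:, 1:]])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return 1.0 - resid.var() / y.var()

    def hi_bootstrap_r2(data, target_corr, n_boot=2000, alpha=0.05):
        # HI-style percentile test of a hypothesized correlation structure, via R^2.
        n, k = data.shape
        f = np.linalg.cholesky(np.asarray(target_corr))      # square root of the target matrix
        r2_obs = r_squared(data)
        r2_star = np.empty(n_boot)
        for b in range(n_boot):
            # univariate sampling: each column resampled independently, breaking all pairings
            z = np.column_stack([rng.choice(data[:, j], size=n, replace=True) for j in range(k)])
            z = (z - z.mean(axis=0)) / z.std(axis=0, ddof=1)
            r2_star[b] = r_squared(z @ f.T)                  # impose the structure, then compute R^2
        lo, hi = np.quantile(r2_star, [alpha / 2, 1 - alpha / 2])
        return r2_obs, (lo, hi), not (lo <= r2_obs <= hi)

    # Example: the all-.4 population matrix shown on the next slide (implied R^2 = .27)
    target = np.full((4, 4), 0.4)
    np.fill_diagonal(target, 1.0)
    # r2_obs, interval, reject = hi_bootstrap_r2(data, target)  # data: an n x 4 array [Y, X1, X2, X3]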
Correlation matrices
(Population matrices below; MV data with this correlation structure were generated using Headrick's method. Both matrices imply R² = .27.)

Matrix 1 (R² = .27):
      Y    X1   X2   X3
Y    1.0   .4   .4   .4
X1    .4  1.0   .4   .4
X2    .4   .4  1.0   .4
X3    .4   .4   .4  1.0

Matrix 2 (R² = .27):
      Y    X1   X2   X3
Y    1.0   .4   .4    0
X1    .4  1.0   .2    0
X2    .4   .2  1.0    0
X3     0    0    0  1.0

Distributional patterns
• These combined normal, 1 df chi-square, and 3 df chi-square distributions
– Normal Y, X1, X2, and X3
– Normal Y; 1 df chi-square X1, X2, and X3
– Normal Y; 3 df chi-square X1, X2, and X3
– 1 df chi-square Y; normal X1, X2, and X3
– 3 df chi-square Y; normal X1, X2, and X3
– 1 df chi-square Y, X1, X2, and X3
– 3 df chi-square Y, X1, X2, and X3

How these raw data look
[Figure: scatterplot matrices of Y, X1, X2, and X3 for two conditions – normal Y with chi-square(3) X's, and chi-square(3) Y with chi-square(3) X's]

• To evaluate this method, we put it within a Monte Carlo design, replicating this process 1,000 times per cell
• Results:
[Figure: rejection rates (roughly .05 to .20) plotted against sample size (30, 60, 100, 200) for the two correlated structures (all .4; two .4, one .2, three 0) and three distributional conditions (normal Y and normal X's; chi-square(3) Y and chi-square(3) X's; normal Y and chi-square(3) X's), comparing the analytic, MV sampling, OI Ruscio, and OI Ruscio N² procedures]

Comments
• Based on these patterns, so far we are not convinced that the implementation of the Ruscio method works effectively for this problem
• There are theoretical reasons to prefer Headrick, because that method respects not only the marginals but also the original moments – it recreates our specific population distributions better
• What's next for multiple regression/GLM?
– Evaluate Headrick's procedure
– Move on to model comparisons – bootstrap the F statistic to compare two nested linear models
– Expand the number of correlation structures
• What's next more broadly?
– This approach appears to be generalizable – use an MV data generation routine to produce observations with a structure consistent with a model, then bootstrap some appropriate statistic off of that alternative hypothesis to test the model
– To CFA, for example
– To HLM, with hypothesized structure

Conclusion
• Rodgers (2010): "The … focal point is no longer the null hypothesis; it is the current model. This is exactly where the researcher—the scientist—should be focusing his or her concern."