Research Hypotheses and Multiple Regression
• Comparing model performance across populations
• Comparing model performance across criteria

Comparing model performance across groups

This involves the same basic idea as comparing a bivariate correlation across groups
• only now we're working with multiple predictors in a multivariate model

This sort of analysis has multiple important uses …
• theoretical – do different behavioral models hold for different groups?
• psychometric – an important part of evaluating whether "measures" are equivalent for different groups (such as gender, race, across cultures, or within cultures over time) is unraveling the multivariate relationships among measures & behaviors
• applied – prediction models must not be "biased"

Comparing model performance across groups

There are three different questions involved in this comparison …

Does the predictor set "work better" for one group than another?
• asked by comparing the R²s of the predictor set from the 2 groups
• we will build a separate model for each group (allowing different regression weights for each group)
• then use Fisher's Z-test to compare the resulting R²s

Are the models "substitutable"?
• use a cross-validation technique to compare the models
• use Hotelling's t-test or Steiger's Z-test to compare the R²s of the "direct" & "crossed" models

Are the regression weights of the 2 groups "different"?
• use Z-tests to compare the weights predictor-by-predictor
• or use interaction terms to test for group differences

Things to remember when doing these tests!!!
• the more collinear the variables in the two models, the more correlated the models will be -- for this reason there can be strong collinearity between two models that share no predictors
• the weaker the two models (lower R²), the less likely they are to be differentially correlated with the criterion
• non-nil H0: tests are possible -- and might be more informative!!
• these are not very powerful tests!!!
• compared to avoiding a Type II error when looking for a given r, you need nearly twice the sample size to avoid a Type II error when looking for an r – r difference of the same magnitude
• these tests are also less powerful than tests comparing nested models
So, be sure to consider sample size, power, and the magnitude of the R² difference between the non-nested models you compare!

Comparing multiple regression models across groups

Group #1 (larger n) -- "direct model":  y'1 = b1x + b1z + a1  →  R²D1
Group #2 (smaller n) -- "direct model": y'2 = b2x + b2z + a2  →  R²D2
(the b1s are Group 1's regression weights for predictors x & z; the b2s are Group 2's)

Does the predictor set "work better" for one group than another?
Compare R²D1 & R²D2 using Fisher's Z-test (a sketch follows this section)
• Retain H0: the predictor set "works equally well" for the 2 groups
• Reject H0: the predictor set "works better" for the higher-R² group
Remember!! We are comparing the R² "fit" of the models …
But be sure to use R, not R², in the computation!!!!

Are the multiple regression models "substitutable" across groups?

Group #1 (larger n) -- "G1 direct model":  y'1 = b1x + b1z + a1  →  R²D1
Apply the model (bs & a) from Group 2 to the data from Group 1
  "G1 crossed model":  y'1 = b2x + b2z + a2  →  R²X1
  Compare R²D1 & R²X1

Group #2 (smaller n) -- "G2 direct model": y'2 = b2x + b2z + a2  →  R²D2
Apply the model (bs & a) from Group 1 to the data from Group 2
  "G2 crossed model":  y'2 = b1x + b1z + a1  →  R²X2
  Compare R²D2 & R²X2

using Hotelling's t-test or Steiger's Z-test -- will need rDX (the correlation between the direct & crossed models) from each group (a sketch follows this section)
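To make the Fisher's Z comparison of the two groups' fits concrete, here is a minimal Python sketch. The R² and n values are hypothetical, and treating the multiple R like a bivariate r in the Fisher transform is the approximation the slides describe -- note again that it is R = √R², not R², that goes into the transformation.

```python
import numpy as np
from scipy import stats

def fisher_z_test(R1, n1, R2, n2):
    """Compare two independent (multiple) correlations -- R, not R^2 --
    with Fisher's r-to-Z transformation."""
    z1, z2 = np.arctanh(R1), np.arctanh(R2)         # Fisher's Z transform
    se_diff = np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # SE of the Z difference
    Z = (z1 - z2) / se_diff
    p = 2 * stats.norm.sf(abs(Z))                   # two-tailed p-value
    return Z, p

# Hypothetical values: R^2 = .36 (n = 120) vs R^2 = .25 (n = 85)
Z, p = fisher_z_test(np.sqrt(.36), 120, np.sqrt(.25), 85)
print(f"Z = {Z:.3f}, p = {p:.3f}")
```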
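And here is a sketch of the direct-vs-crossed comparison for one group, using Hotelling's t for two dependent correlations that share the criterion. The data and the "Group 2" weights are invented for illustration; in practice each set of bs & a comes from the model actually fitted in each group, and the same computation is then repeated for the other group.

```python
import numpy as np
from scipy import stats

def hotelling_t(r_yD, r_yX, r_DX, n):
    """Hotelling's t for two dependent correlations sharing the criterion y:
    r_yD = r(y, direct predictions), r_yX = r(y, crossed predictions),
    r_DX = r(direct, crossed) -- the correlation between the two models."""
    detR = 1 - r_yD**2 - r_yX**2 - r_DX**2 + 2 * r_yD * r_yX * r_DX
    t = (r_yD - r_yX) * np.sqrt((n - 3) * (1 + r_DX) / (2 * detR))
    return t, 2 * stats.t.sf(abs(t), df=n - 3)

# --- Group 1's data (made up for the sketch) ---
rng = np.random.default_rng(0)
n1 = 120
X = np.column_stack([np.ones(n1), rng.normal(size=(n1, 2))])  # [1, x, z]
y1 = X @ np.array([0.0, .5, .3]) + rng.normal(size=n1)

b_G1, *_ = np.linalg.lstsq(X, y1, rcond=None)  # Group 1's own (direct) weights
b_G2 = np.array([0.5, .2, .6])                 # Group 2's weights (hypothetical;
                                               # fitted from Group 2's data in practice)

yhat_D = X @ b_G1                              # "G1 direct model" predictions
yhat_X = X @ b_G2                              # "G1 crossed model" predictions
r_yD = np.corrcoef(y1, yhat_D)[0, 1]           # sqrt of R²D1
r_yX = np.corrcoef(y1, yhat_X)[0, 1]           # sqrt of R²X1
r_DX = np.corrcoef(yhat_D, yhat_X)[0, 1]       # rDX for this group

t, p = hotelling_t(r_yD, r_yX, r_DX, n1)
print(f"t({n1 - 3}) = {t:.3f}, p = {p:.3f}")
```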
Are the regression weights of the 2 groups "different"?
• test of an interaction of the predictor and the grouping variable
• Z-tests using pooled standard error terms

Asking if a single predictor has a different regression weight for two different groups is equivalent to asking if there is an interaction between that predictor and group membership. (Please note that asking about a regression slope difference and asking about a correlation difference are two different things – you already know how to use Fisher's test to compare correlations across groups.)

This approach uses a single model, applied to the full sample …

Criterion' = b1*predictor + b2*group + b3*predictor*group + a

If b3 is significant, then there is a difference between the predictor regression weights of the two groups. (A sketch of this single-model approach follows this section.)

However, this approach gets cumbersome when applied to models with multiple predictors. With 3 predictors we would look at the model …

y' = b1*G + b2*P1 + b3*G*P1 + b4*P2 + b5*G*P2 + b6*P3 + b7*G*P3 + a

Each interaction term is designed to tell us if a particular predictor has a regression slope difference across the groups. Because the collinearity among the interaction terms, and between each predictor's term and the other predictors' interaction terms, all influence the interaction b weights, there has been dissatisfaction with how well this approach works for multiple predictors.

Also, because this approach does not involve constructing different models for each group, it does not allow …
• the comparison of the "fit" of the two models
• an examination of the "substitutability" of the two models

Another approach is to apply a significance test to each predictor's pair of b weights from the two group models – to directly test for a significant difference. (Again, this is different from comparing the same correlation from 2 groups.) The test statistic is …

            bG1 - bG2
Z = ----------------------
      SE b-difference

However, there are competing formulas for "SE b-difference". The most common formula (e.g., Cohen & Cohen, 1983) is …

                       (dfbG1 · SEbG1²) + (dfbG2 · SEbG2²)
SE b-difference = √ ------------------------------------------
                                dfbG1 + dfbG2

However, work by two research groups has demonstrated that, for large-sample studies (both Ns > 30), this standard error estimator is negatively biased (it produces error estimates that are too small), so that the resulting Z-values are too large, promoting Type I & Type III errors.
• Paternoster, Brame, Mazerolle & Piquero (1998)
• Clogg, Petkova & Haritou (1995)

This work leads to the formulas …

SE b-difference = √( SEbG1² + SEbG2² )    and …

              bG1 - bG2
Z = ----------------------------
       √( SEbG1² + SEbG2² )

(Both versions of the standard error appear in the second sketch following this section.)
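Here is a minimal sketch of the single-model interaction approach described above, with hypothetical data and variable names (practice, group, perf); the statsmodels formula practice * group expands to the b1/b2/b3 model shown earlier, so the practice:group term tests the slope difference.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "practice": rng.normal(size=n),
    "group": rng.integers(0, 2, size=n),  # 0 = expert, 1 = novice (dummy code)
})
# hypothetical slopes: .2 for experts, .2 + .4 = .6 for novices
df["perf"] = (.2 + .4 * df["group"]) * df["practice"] + rng.normal(size=n)

# criterion' = b1*predictor + b2*group + b3*predictor*group + a
fit = smf.ols("perf ~ practice * group", data=df).fit()
print(fit.params)
print("slope-difference test p =", fit.pvalues["practice:group"])  # tests b3
```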
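And a sketch of the predictor-by-predictor Z-test, implementing both the pooled (Cohen & Cohen-style) standard error and the recommended √( SEbG1² + SEbG2² ) version; the b weights, SEs, and dfs plugged in are hypothetical stand-ins for values taken from the two group models.

```python
import numpy as np
from scipy import stats

def z_b_difference(b1, se1, b2, se2, df1=None, df2=None, pooled=False):
    """Z-test for the difference between the same predictor's b weight
    in two group-specific regression models."""
    if pooled:
        # Cohen & Cohen (1983)-style pooled SE -- shown by Paternoster et al.
        # (1998) and Clogg et al. (1995) to be negatively biased
        se_diff = np.sqrt((df1 * se1**2 + df2 * se2**2) / (df1 + df2))
    else:
        # recommended formula: sqrt of the summed squared SEs
        se_diff = np.sqrt(se1**2 + se2**2)
    Z = (b1 - b2) / se_diff
    return Z, 2 * stats.norm.sf(abs(Z))  # two-tailed p-value

# Hypothetical b weights & SEs for "practice" in the two group models
print(z_b_difference(.62, .11, .31, .14))                        # recommended
print(z_b_difference(.62, .11, .31, .14, df1=96, df2=71, pooled=True))
```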
Match the question with the most direct test …
• "Practice is correlated with performance for novices, but not for experts." → Testing r for each group
• "Practice is better correlated with performance for novices than for experts." → Comparing r across groups
• "Practice contributes to the regression model for novices, but not for experts." → Testing b for each group
• "Practice has a larger regression weight in the model for novices than for experts." → Comparing b across groups
• "A model involving practice, motivation & recent experience better predicts performance for novices than for experts." → Comparing R² across groups
• "The structure of a model involving practice, motivation & recent experience is different for novices than for experts." → Comparing R² of direct & crossed models

Comparing model performance across criteria
• same basic idea as comparing correlated correlations, but now the difference between the models is the criterion, not the predictor

There are two important uses of this type of comparison …
• theoretical/applied -- do we need separate models to predict related behaviors?
• psychometric -- do different measures of the same construct have equivalent models (i.e., measure the same thing)?

• the process is similar to testing for group differences, but what changes is the criterion that is used, rather than the group
• we'll apply Hotelling's t-test and/or Steiger's Z-test to compare the structure of the two models (a sketch follows this section)

Are multiple regression models "substitutable" across criteria?

Criterion #1 "A" -- "A direct model":  A' = bAx + bAz + aA  →  R²DA
Apply the model (bs & a) built for criterion B to predict criterion A
  "A crossed model":  A' = bBx + bBz + aB  →  R²XA
  Compare R²DA & R²XA

Criterion #2 "B" -- "B direct model":  B' = bBx + bBz + aB  →  R²DB
Apply the model (bs & a) built for criterion A to predict criterion B
  "B crossed model":  B' = bAx + bAz + aA  →  R²XB
  Compare R²DB & R²XB

using Hotelling's t-test or Steiger's Z-test (will need rDX -- the r between the two models)

Retaining the H0: for each comparison suggests comparability in terms of the "structure" of a single model for the two criterion variables -- note that there is no direct test of the differential "fit" of the two models to the two criteria.
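Finally, a sketch of the crossed-models comparison across criteria, with invented data for one sample measured on two criteria A & B with the same predictors; the resulting rs feed the same Hotelling's t (or Steiger's Z) used in the group-comparison sketch above.

```python
import numpy as np
from scipy import stats

# One sample, two criteria A & B, same predictors x and z (made-up data)
rng = np.random.default_rng(2)
n = 150
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # [1, x, z]
A = X @ np.array([0.0, .6, .2]) + rng.normal(size=n)
B = X @ np.array([0.0, .4, .4]) + rng.normal(size=n)

bA, *_ = np.linalg.lstsq(X, A, rcond=None)  # "A direct model" weights
bB, *_ = np.linalg.lstsq(X, B, rcond=None)  # "B direct model" weights

# Direct vs crossed predictions for criterion A (B's model crossed onto A)
r_DA = np.corrcoef(A, X @ bA)[0, 1]         # sqrt of R²DA
r_XA = np.corrcoef(A, X @ bB)[0, 1]         # sqrt of R²XA
r_DX = np.corrcoef(X @ bA, X @ bB)[0, 1]    # rDX -- r between the two models

# Same Hotelling's t as in the group-comparison sketch
detR = 1 - r_DA**2 - r_XA**2 - r_DX**2 + 2 * r_DA * r_XA * r_DX
t = (r_DA - r_XA) * np.sqrt((n - 3) * (1 + r_DX) / (2 * detR))
p = 2 * stats.t.sf(abs(t), df=n - 3)
print(f"criterion A: t({n - 3}) = {t:.3f}, p = {p:.3f}")  # repeat for criterion B
```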