Education 795 Class Notes
P-Values, Partial Correlation, Multi-Collinearity
Note Set 4

Today's Agenda
Announcements (ours and yours)
Q/A?
Leveraging what we already know
Partial correlation and multi-collinearity

P-Values
"p-value refers to the probability of the evidence having arisen as a result of sampling error given that the null hypothesis is true" (Pedhazur & Schmelkin, 1991)
What is inherently wrong with p-values? Why do we use them?

P-Values
"Even though I am very critical of statistical inference... I shall probably continue to pay homage to 'tests of significance' in the papers I submit to psychological journals. My rationale for this admitted hypocrisy is straightforward: until the rules of the science game are changed, one must abide by at least some of the old rules, or drop out of the game" (Mahoney, 1976, p. xiii)

What to do?
"Perhaps p values are like mosquitos. They have an evolutionary niche somewhere and no amount of scratching, swatting, or spraying will dislodge them" (Campbell, 1982, p. 698)

Statistical Significance vs. Practical Significance
We should refrain from what Tukey calls "statistical sanctification."
Concern with practical significance is addressed through effect sizes or relational magnitudes (betas in regression).
"A difference is a difference only if it makes a difference" (Huff, 1954, p. 58)

Introduction to Effect Size
Effect sizes convey the strength, meaningfulness, or importance of a result.
General rules of thumb for small, medium, and large effect sizes were set forth by Cohen (1988).
We will address how effect sizes are computed later in the course.

Transition Back to Multiple Regression
1. Multiple predictors typically yield better technical solutions (e.g., higher R²).
2. Multiple predictors provide opportunities to test more realistic models (e.g., why is nothing as simple as it should be?).
3. Multiple regression models allow for an examination of more complex research hypotheses than is possible with simple regression/correlation approaches.

Regression
Raw score depiction: Y' = a + b1X1 + b2X2 + ... + bkXk, where each b is the unique and independent contribution of that predictor to the model.
For quantitative IVs, b is the expected direction and amount of change in the DV for each unit change in the IV, holding all other IVs constant.
For dichotomous IVs, b is the direction and amount of the group mean difference on the DV, holding all other IVs constant.

Revisit b's
Example:
Dependent variable: Promote Racial Understanding
Independent variables: Sex, Race
b_sex = r_sex,promote if sex and race are not correlated.
These are population-based estimates, and they are "effect sizes" because we can compare the relative strength of predictors in the model.
In the Venn diagram on the following slide, note that X1 and X2 are not correlated but X2 and X3 are.

Venn Diagram Depiction
[Venn diagrams of the correlation and regression coefficients among X1, X2, and X3; figures not reproduced in these notes]

Warning
Pedhazur believes that the topics of partial correlations and semi-partial correlations can be confusing and lead to misinterpretations of regression coefficients.
Why talk about them? Awareness and enough knowledge to evaluate research where partials are used.

Partial Correlations
A variation on the idea of residualization (removal of the predictable part of y from y).
First-order partial correlation: the correlation of variables 1 and 2, partialling variable 3 from both 1 and 2:
r12.3 = (r12 - r13*r23) / sqrt((1 - r13^2)(1 - r23^2))

Plug and Chug
r       Quiz   Exam   Speed  Motiv
Quiz    1.00   .40    .35    .25
Exam           1.00   .45    .30
Speed                 1.00   .15
Motiv                        1.00

1. What is the correlation between quiz and exam score, controlling for test-taking speed?
2. What is the correlation between exam score and motivation, controlling for test-taking speed?
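A minimal worked sketch of the two "plug and chug" questions above, using the first-order partial correlation formula and the values from the correlation matrix. Python is used here purely for the arithmetic; the variable names are just labels for the quantities in the table.

```python
from math import sqrt

def partial_r(r12, r13, r23):
    """First-order partial correlation r12.3: the correlation of
    variables 1 and 2 with variable 3 partialled from both."""
    return (r12 - r13 * r23) / sqrt((1 - r13**2) * (1 - r23**2))

# 1. Quiz and exam, controlling for speed:
#    r(quiz, exam) = .40, r(quiz, speed) = .35, r(exam, speed) = .45
print(round(partial_r(.40, .35, .45), 3))   # ~0.290

# 2. Exam and motivation, controlling for speed:
#    r(exam, motiv) = .30, r(exam, speed) = .45, r(motiv, speed) = .15
print(round(partial_r(.30, .45, .15), 3))   # ~0.263
```

Both partial correlations are smaller than the corresponding zero-order r's, as expected when the control variable is positively related to both variables being correlated.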
Semi-Partial Correlations
r1(2.3) = the correlation of variables 1 and 2 after having partialled variable 3 only from variable 2 (semi-partial)
vs.
r12.3 = the correlation of variables 1 and 2 after having partialled variable 3 from both variable 1 and variable 2 (partial)

Before Jumping Into Regression
Examine the data using common sense (e.g., are the data appropriate for producing interpretable correlation coefficients?) as well as standard diagnostic procedures.
Review the r's among the predictors for collinearity problems.

Multicollinearity
Multicollinearity refers to correlations among the independent variables only.
Multicollinearity is measured by the tolerance statistic, defined as 1 - R² from predicting each predictor using all other predictors (values close to 1 are better; values close to 0 are bad). A sketch of this computation appears at the end of these notes.
Excessive collinearity (even singularity, i.e., perfect correlation between two or more IVs) suggests that predictors have extensive overlap, and we may need to be selective in picking predictors or combine them (through factor analytic techniques).

Dangers
Multicollinearity has adverse effects on regression analysis:
High multicollinearity leads to a reduction in the magnitude of the b's.
High multicollinearity leads to inflated se's, reducing the t-ratios for the coefficients.

Solutions
Be selective in choosing variables that are related.
Combine like variables into an index using scales or "factor analysis," which we will talk about soon.

Suppressors
When a partial correlation is larger than the original r, it is considered to be the result of a suppressor effect.
Suppressor variables effectively mask (suppress) the relationship between other variables.
This effect occurs when there is an unbalanced mix of positive and negative correlations between the DV and the IVs.

Project Activity
Dataset: choose a dataset and run a multiple regression.
Dependent variable: SATC = SATM + SATV
Independent variables: sex, family income, mother's education, and father's education
Use syntax to get the tolerance statistic.
Rerun the regression after summing mother's and father's education into one variable. Compare the tolerance statistics for mother's and father's education with that of the summed index.

For Next Week
Read Pedhazur Ch. 10, pp. 211-216
Read Pedhazur Ch. 14, pp. 304-310
Read Pedhazur Ch. 19, pp. 464-466
Read Pedhazur Ch. 21, pp. 545-558 and pp. 567-579
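A minimal sketch of the tolerance computation referenced in the multicollinearity and project activity sections above, written in Python rather than statistical-package syntax. The data and variable names (sex, income, mom_ed, dad_ed) below are hypothetical placeholders, not the course dataset; the point is only that tolerance for each predictor is 1 - R² from regressing that predictor on all of the others.

```python
import numpy as np

def tolerance(X, names):
    """Tolerance for each predictor: 1 - R^2 from regressing that predictor
    on all of the other predictors (values near 0 signal multicollinearity)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = {}
    for j in range(k):
        y = X[:, j]                                   # predictor treated as the outcome
        others = np.delete(X, j, axis=1)              # all remaining predictors
        Z = np.column_stack([np.ones(n), others])     # add an intercept column
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)  # OLS fit of the auxiliary regression
        resid = y - Z @ coef
        r2 = 1 - resid.var() / y.var()                # R^2 of this auxiliary regression
        out[names[j]] = 1 - r2                        # tolerance = 1 - R^2
    return out

# Hypothetical illustration: dad_ed is built to be nearly collinear with mom_ed,
# so its tolerance should be close to 0, while sex should stay near 1.
rng = np.random.default_rng(0)
sex = rng.integers(0, 2, 200)
income = rng.normal(50, 15, 200)
mom_ed = rng.normal(14, 2, 200)
dad_ed = mom_ed + rng.normal(0, 0.5, 200)
X = np.column_stack([sex, income, mom_ed, dad_ed])
print(tolerance(X, ["sex", "income", "mom_ed", "dad_ed"]))
```

Summing the two education variables into a single index (as in the project activity) removes the collinear pair, which is why its tolerance should be noticeably higher than the tolerances of the separate mother's and father's education predictors.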