Handout Linear Regression

Handout Linear Regression – Spring 2016
Consuelo Arbona Ph.D.
EPSY 8334
University of Houston
Definition: Multiple linear regression allows one to examine the individual and collective
contribution of more than one independent variable (or predictor variable) to a dependent variable
(also called the criterion variable). For example, regression could be used to examine the contribution
of gender, ethnic identity, and self-esteem to depression scores. Typically, one would say that depression
scores are regressed onto gender, ethnic identity, and self-esteem scores. Regression assumes that the
relations of the independent variables to the dependent variable are linear (positive or negative).
Regression line – the best line that describes the relation of one or more independent variables (X1, X2,
X3) to a dependent variable Y: Y = b1X1 + b2X2 + b3X3 + a (constant). The bs are the beta
coefficients, or the weight of each variable. The larger the b, the stronger the unique association of the
predictor variable with the criterion.
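As a minimal illustrative sketch (not part of the original handout), the equation above can be estimated in Python with the statsmodels package; the variable names and data below are made up for illustration only.

    # Sketch: estimating Y = b1*X1 + b2*X2 + b3*X3 + a with ordinary least squares.
    # The data and variable names are illustrative only.
    import pandas as pd
    import statsmodels.api as sm

    df = pd.DataFrame({
        "depression":      [12, 15, 9, 20, 11, 18, 14, 7],
        "gender":          [1, 2, 1, 2, 1, 2, 2, 1],               # 1 = F, 2 = M
        "ethnic_identity": [3.2, 2.8, 4.0, 2.1, 3.5, 2.6, 3.0, 4.2],
        "self_esteem":     [3.8, 3.1, 4.5, 2.0, 4.0, 2.5, 3.3, 4.6],
    })

    X = sm.add_constant(df[["gender", "ethnic_identity", "self_esteem"]])  # adds the constant a
    model = sm.OLS(df["depression"], X).fit()
    print(model.params)     # the constant (a) and the b weights
    print(model.rsquared)   # proportion of variance in Y explained by the predictors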
Characteristics: Both criterion and predictor variables in regression analyses should be continuous.
Categorical variables can be included as predictors only if they have two levels (e.g., gender: 1 = F;
2 = M).
Categorical variables with more than two levels (e.g., ethnicity: White, Black, and Hispanic) need to
be dummy-coded. One way to do it is to create two dummy variables that use one group (let's say
Whites) as the reference group: Var 1, Hisp, where the coding is Hisp = 1, White = 2, and Black = 2;
Var 2, Black, where Black = 1, White = 2, and Hispanic = 2. In other words, the three ethnic groups are
coded in terms of two variables: Hisp, where 1 = Hispanic and 2 = everyone else, and Black, where
1 = Black and 2 = everyone else. Both Hisp (coded 1, 2) and Black (coded 1, 2) are entered as
predictors to account for the three ethnic groups.
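As an illustrative sketch (not part of the handout), this dummy coding can be set up in Python with pandas. The common 0/1 coding is used here instead of the 1/2 coding above; for a two-level indicator the two codings give the same slopes, only the constant shifts.

    # Sketch: dummy coding a three-level ethnicity variable with White as the
    # reference group (1 = member of the group, 0 = everyone else).
    import pandas as pd

    df = pd.DataFrame(
        {"ethnicity": ["White", "Hispanic", "Black", "Hispanic", "White", "Black"]}
    )

    df["Hisp"]  = (df["ethnicity"] == "Hispanic").astype(int)  # Var 1
    df["Black"] = (df["ethnicity"] == "Black").astype(int)     # Var 2
    # White participants are 0 on both dummies, so White serves as the reference group.
    print(df)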
Predictor variables should be highly related to the criterion variable, and the correlations of the
predictor variables among themselves should be low (to avoid multicollinearity problems).
Three types of linear regression analyses: Simultaneous, Stepwise, and Hierarchical
Simultaneous – all predictors are entered at once in the equation
Stepwise – The computer uses an algorithm to decide which predictor variables to enter. For example, in
forward stepwise regression, the variable with the highest correlation with the dependent variable
is entered first, followed by the variable that has the highest correlation with the dependent variable
after controlling for the first variable entered, and so on. Stepwise methods (there is also backward
stepwise) are used for strictly prediction purposes (not to test conceptual models) and with large
sample sizes (about 40 cases per predictor). Results are typically very sample-specific and do not
generalize well to other samples.
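As an illustrative sketch of the forward-stepwise idea (not the exact SPSS procedure, which also applies significance criteria for entering and removing variables), the loop below adds, at each step, the predictor that most increases R²; the data and variable names are made up.

    # Sketch: forward stepwise selection. At each step, add the predictor that
    # most increases R-squared; real stepwise procedures also use p-value
    # entry/removal criteria and a stopping rule.
    import numpy as np
    import statsmodels.api as sm

    def forward_stepwise(y, predictors):
        """predictors: dict mapping names to 1-D arrays. Returns the entry order."""
        selected, remaining, order = [], list(predictors), []
        while remaining:
            best_name, best_r2 = None, -np.inf
            for name in remaining:
                X = np.column_stack([predictors[v] for v in selected + [name]])
                r2 = sm.OLS(y, sm.add_constant(X)).fit().rsquared
                if r2 > best_r2:
                    best_name, best_r2 = name, r2
            selected.append(best_name)
            remaining.remove(best_name)
            order.append((best_name, round(best_r2, 3)))
        return order

    rng = np.random.default_rng(0)
    n = 100
    predictors = {name: rng.normal(size=n) for name in ["x1", "x2", "x3"]}
    y = 0.6 * predictors["x1"] + 0.3 * predictors["x2"] + rng.normal(size=n)
    print(forward_stepwise(y, predictors))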
Hierarchical – The researcher chooses the order in which variables are entered in the equation, decided
according to (1) causal priority – variables presumed to cause other predictor variables are entered
first (e.g., parental SES before offspring’s academic attainment); (2) research relevance/theory – those
variables that have been studied before are entered first, and variables that theory predicts should
antecede other variables are entered first; (3) main-effect variables are always entered before
interaction-effect variables. Hierarchical regression typically is used to test psychological models.
Sample size – Depends on the level of power one wants, the expected value of R², and the number
of predictors. Some authors give the following formula to calculate the minimum sample size for linear
regression (not considering interaction effects): SS = 50 + 8k (k = number of predictors).
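For example, with k = 5 predictors the formula gives a minimum sample of SS = 50 + 8(5) = 90 participants.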
Interpretation of Regression Table:
R²
Equals the proportion of variance in the criterion variable accounted for or “explained” by
the linear combination of the predictor variables. It is considered a measure of effect size.
(Comparing the size of R² from different studies is tricky because determining the relative
magnitude of R² requires taking into consideration the range of scores in the independent
variables, the number of independent variables in the analysis, and the sample size of each study.)
ΔR²
In hierarchical regression, ΔR² equals the proportion of variance accounted for by a
predictor variable (or a collection of predictor variables) entered in one step over and above
the proportion of variance accounted for by all the predictor variables entered in
the previous steps of the equation (in other words, the incremental value of R²).
(B)
Non-standardized Beta: partial regression coefficients (the column labeled B in the SPSS
Regression table refers to the bs in the equation Y = b1X1 + b2X2 + b3X3 + a). The
non-standardized Betas indicate how much Y (the value of the criterion variable or DV)
will change for a one-unit change in the predictor variable (IV) when all the other variables
are controlled for. Each B is expressed in the original metric of the variables (e.g., scores on one
variable may range from 1 to 5 and on another from 1 to 10); therefore, the values of the
non-standardized Bs cannot be compared with each other.
β
Standardized Betas (or beta coefficients) are the B values standardized (expressed in
standard deviation units of the predictor and the criterion, or DV), so within one
regression analysis they can be compared to each other. (However, without a test of
significance one cannot determine whether the differences observed among the β coefficients in a
regression output are statistically significant or not.)
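As an illustrative sketch (not from the handout) of how ΔR², B, and β relate: ΔR² is the difference in R² between two nested models, and the standardized β can be obtained by fitting the model on z-scored variables (equivalently, β = B × SDx/SDy). The variable names and data below are made up.

    # Sketch: hierarchical steps, Delta R-squared, and B versus beta.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 200
    df = pd.DataFrame({
        "ses": rng.normal(size=n),
        "generation": rng.normal(size=n),
        "indep_self": rng.normal(size=n),
    })
    df["life_sat"] = 0.4 * df["indep_self"] + rng.normal(size=n)

    # Step 1: control variables only; Step 2: add the predictor of interest.
    step1 = sm.OLS(df["life_sat"], sm.add_constant(df[["ses", "generation"]])).fit()
    step2 = sm.OLS(df["life_sat"], sm.add_constant(df[["ses", "generation", "indep_self"]])).fit()
    delta_r2 = step2.rsquared - step1.rsquared   # incremental variance explained by indep_self

    # Standardized betas: refit the Step 2 model on z-scored variables.
    z = (df - df.mean()) / df.std(ddof=0)
    step2_std = sm.OLS(z["life_sat"], sm.add_constant(z[["ses", "generation", "indep_self"]])).fit()

    print(round(delta_r2, 3))   # Delta R-squared from Step 1 to Step 2
    print(step2.params)         # non-standardized Bs (plus the constant)
    print(step2_std.params)     # standardized betas (constant is ~0)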
Table 2 from the Pina-Watson et al. (2013) study.
Interpretation of Hierarchical Regression Table

DV – Life Satisfaction

                      β         R²        ΔR²
Step 1                          .02       .02
  SES                 .08
  Generation         -.09
Step 2                          .21***    .19***
  SES                 .01
  Generation         -.03
  Indep. Self-C       .45***
Step 3                          .31***    .10
  SES                -.01
  Generation         -.02
  Indep. Self-C       .26***
  Barriers           -.18***
  CDSE                .29***
Step 1: The R² indicates that the combination of SES and Generation shares 2%
of the variance in Life Satisfaction (LS). The R² is not statistically significant, which
means that the combination of variables is not associated with LS.
The βs for SES and Generation are not statistically significant either.
Step 2: The ΔR² from Step 1 to Step 2 = .19 and is statistically significant,
which indicates that the addition of Indep. Self-C increased by 19%
the amount of variance explained in LS above and beyond the
control variables.
The β for Indep. Self-C is statistically significant, which means that
when controlling for the other two variables, Indep. Self-C
contributes unique variance to LS.
The R² is stat sig. This means that controlling for all other
variables in the model (those entered in Step 1 and Step 2) Barriers
and CDSE, as a set, contributed additional variance to LS.
Examination of the coefficients in Step 3 shows that when
controlling for all variables the model, Indep. Self C., Barriers and
CDSE are stat. sig. which means that each one contributed unique
variance to LS, in the expected direction
Step 4: If the study had examined moderation effects, the interaction terms would have
been entered in this step. If the ΔR² is statistically significant in the step where the
interaction terms are entered, then the interaction effect is statistically
significant. But is it strong enough?
The β for the interaction term is examined next.
If the β for the interaction term is statistically significant, it means that the
relation of the predictor to the criterion is moderated by the
moderator.
Therefore, in order to claim an interaction effect, both the ΔR² in
the step where the interaction term is entered and the β for the
interaction term must be statistically significant.
On the next page this is explained further based on the Wei et al
(2012) article.
**p < .01, ***p < .001
Wei et al (2012) Moderation Analyses (Class 4)
Moderation analysis is used to examine to what extent the strength and direction of the correlation
of a predictor variable (perceived racial discrimination, PRD) with a criterion variable
(PTSD) differ for people who vary on a third variable, the moderator (Ethnic Social
Connectedness, ESC). This refers to an interaction effect, or the multiplicative effect of the predictor
and the moderator on the outcome of interest; in this case, PRD X ESC.
To test the interaction effect, a new variable, called the interaction term, is computed by multiplying
each participant’s scores on the predictor and the moderator (PRD X ESC, in this case). The
interaction term is entered in the last step of the hierarchical regression analysis. If the ΔR² for that
step is statistically significant (indicating that the interaction term explains an additional amount of
variance), that is the first indication that there may be an interaction effect.
The predictor and moderator variables must be entered in earlier steps than the interaction term (so
that the ΔR² can be examined).
In the table above, the R² = .02* from step 3 to 4 is statistically significant therefore, the interaction
effect may be statistically significant.
In step 4, the  for one of the two interaction terms is also stat. sig. (PRD X Ethnic SC;  = -.12*).
Therefore, the relations of perceived racial discrimination (PRD) to PTSD is moderated by
Ethnic Social Connectedness - But how?? Is the moderation effect consistent with the
researcher’s hypotheses??
The nature (strength and direction of the correlation) of the interaction is revealed by
plotting the relation of PRD to PTSD separately for those who scored either high
(1 SD above the mean) or low (1 SD below the mean) on Ethnic SC. What do the graphs show?? How
does one plot such a graph?
Interaction Effect: plotting the interaction effect (Class 5)
Plot the interaction for the Wei et al (2012) article using the Excel file provided at
http://www.jeremydawson.co.uk/slopes.htm
1. Read the Analyses section of Wei et al (2012) to determine if they standardized the variables
before conducting the regression.
2. Determine if the article reports results of a two-way or a three-way interaction.
3. Select the appropriate Excel file on the website and enter the names of the predictor and
moderator variables.
4. Consult Table 2 in the Wei et al (2012) article to obtain the unstandardized
(B) values for the interaction effect (the table does not include the intercept/constant, so do
not change that value).
5. Describe what the graph shows regarding the relation of discrimination to PTSD symptoms
for participants high versus low in social connectedness with their ethnic group.
If you have two standardized variables, you can plot your interaction effect by entering the
unstandardized Beta regression coefficients (including the intercept/constant) in the following
worksheet. If you have control variables in your regression, the values of the dependent variable
displayed on the plot will be inaccurate unless you also standardize (or center) all control
variables first (although the pattern, and therefore the interpretation, will be correct). Note that the
interaction term should not be standardized after calculation, but should be based on the
standardized values of the IV and moderator. 2-way_standardised.xls
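The same kind of simple-slopes graph can also be sketched directly in Python, assuming a standardized predictor and moderator; the coefficient values below are placeholders, not the values reported by Wei et al (2012):

    # Sketch: plotting simple slopes at +1 SD and -1 SD of the moderator for
    # standardized variables. The coefficients are placeholders; substitute the
    # B values from the article's table.
    import numpy as np
    import matplotlib.pyplot as plt

    b0, b_pred, b_mod, b_int = 0.0, 0.30, -0.10, -0.12   # constant, predictor, moderator, interaction

    x = np.linspace(-1, 1, 50)   # predictor from -1 SD to +1 SD
    for mod, label in [(1.0, "High moderator (+1 SD)"), (-1.0, "Low moderator (-1 SD)")]:
        y = b0 + b_pred * x + b_mod * mod + b_int * x * mod
        plt.plot(x, y, label=label)

    plt.xlabel("Predictor (e.g., PRD, standardized)")
    plt.ylabel("Predicted criterion (e.g., PTSD)")
    plt.legend()
    plt.show()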