Logistic Regression Adv. Experimental Methods & Statistics PSYC 4310 / COGS 6310 Michael J. Kalsher Department of Cognitive Science PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2012, Michael Kalsher 1 Outline • Logistic regression: When and why - Binary - Multinomial • Theory behind logistic regression – Assessing the model – Assessing predictors – Things that can go wrong • Interpreting logistic regression PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 2 Using Logistic Regression: When and Why? • To predict an outcome variable that is categorical from one or more categorical or continuous predictor variables. • Used because having a categorical outcome variable violates the assumption of linearity in normal regression. PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 3 Examples of its Use • Medical Research - Using a database of patient information to predict the malignancy of a tumor If the predicted probability of malignancy for a tumor detected in a patient is low, then the physician may decide not to carry out expensive and painful surgery. • Levels of Analysis - Predicting membership of only two categorical outcomes = binary logistic regression. Predicting membership of more than two categories = multinomial regression. PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 4 Principles Behind Logistic Regression • In simple linear regression we saw that the outcome variable is predicted from the equation: Yi = b0 + b1X1i + b2X2i + bnXni +εi • An assumption of the linear model is that the relationship between variables is linear. • When the outcome variable is categorical, this assumption is violated. • One way around this problem is to transform the data using a logarithmic transformation that expresses the non-linear relationship in a linear way. PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 5 Expressing the regression equation logarithmically: The Logit • Logistic regression expresses the multiple linear regression equation in logarithmic terms (the logit), thereby overcoming the problem of violating the linearity assumption. • Instead of predicting the value of the outcome variable (“Y”) from one or more predictor variables (“X’s”), we predict the probability of the outcome occurring, given known values of the predictors. PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 6 The Logistic Regression Equation: One Predictor • Outcome – We predict the probability of the outcome occurring P(Y ) = 1+e-( b0+b1X1i ) 1 • b0 and b1 • • • P(Y) = probability of Y e = the base of natural logarithms The other coefficients form a linear combination much the same as in simple regression – Can be thought of in much the same way as multiple regression – Note the normal regression equation forms part of the logistic regression equation PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher The Logistic Regression Equation: Several Predictors • Outcome – The equation expands to accommodate additional predictors – We still predict the probability of the outcome occurring – “0” (Y very unlikely to have occurred) to “1” (Y very likely to have occurred) P(Y ) = 1+e-(b0+b1X1i+b2X2 i+...+bnXni ) 1 PSYC 4310/6310 Slide 8 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher Estimating Parameters: Linear vs. Logistic Regression • In linear regression, coefficients (parameters) are estimated using the least squares method. • In logistic regression, maximum-likelihood estimation is used. – Selects coefficients that make the observed values most likely to have occurred. – The chosen estimates of the bs will be ones that, when values of the predictor variables are placed in it, result in values of Y closest to the observed values. PSYC 4310/6310 Slide 9 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher Assessing the Model: the Loglikelihood statistic Multiple regression: To assess how well our model fits the data, we compare the observed and predicted values to compute R2. Note: Remember that R2 is the squared Pearson correlation between observed values of the outcome and the values predicted by the regression model. Logistic regression: Here the assessment of model fit is based on summing the probabilities associated with the predicted and actual outcomes. The measure is termed the log-likelihood statistic. PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher Assessing the Model: the Loglikelihood statistic – Is analogous to the residual sum of squares in multiple regression, in that it is an indicator of how much unexplained information there is after the model has been fitted. – Large values indicate poorly fitting statistical models. log likelihood N Y lnPY 1 Y ln1 PY i i1 PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher i i i Assessing the Model: the deviance statistic – The deviance, referred to as “-2LL”, is closely related to log-likelihood: Deviance = -2 x log-likelihood – Has a chi-square distribution, so is easy to calculate the significance of the value. PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher The logic behind the deviance statistic – We’ve seen that it is useful to compare a model against some baseline state. • In multiple regression, the most basic model was the “mean”. – With a categorical outcome it makes no sense to use the mean. All we know is whether an event happened or not. • In logistic regression, the baseline model is the value of the outcome that occurs most often. • Note: This is the logistic regression model when only the constant is included. PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher The logic behind the deviance statistic – If we add one or more predictors to the model, we can compute the improvement of the model as follows: – This difference is termed a likelihood ratio. – We can build up models hierarchically and make comparisons at each step. PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher Assessing the model: The R-statistic – is the partial correlation between the outcome variable and each of the predictor variables. – Varies between −1 and 1. – If a variable has a small value of R it means that it contributes a small amount to the model. • Positive values – As the predictor variable increases, likelihood of the event occurring increases. • Negative values – As the predictor variable increases, the likelihood of the outcome occurring decreases. PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher Assessing the model: Calculating the R-statistic Where: -2LL = the deviance for the original model z = the Wald statistic Note: The Wald statistic can be inaccurate under certain conditions, so the value of R should be treated with caution! PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher Analogues to 2 R: Hosmer & Lemeshow, 1989 Hosmer and Lemeshow’s to R2L Represents the proportional reduction in the absolute value of the log-likelihood measure. Measures how much the “badness of fit” improves as a result of the inclusion of the predictor variables. 0 = predictors are useless; 1 = model predicts the outcome variable perfectly. PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher Analogues to 2 R: Cox & Snell, 1989 Cox and Snell’s R2C R2CS = 1 - exp ( (-2LL(new) – (-2LL (baseline) n Note: This is what SPSS reports. PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher ) The Wald Statistic: Assessing the contributions of the predictors Wald b SE b • Similar to t-statistic in Regression. • Tests the null hypothesis that b = 0. • Biased when b is large (because it inflates the standard error and increases the chances of a Type II error). • Alternative is to enter predictors hierarchically and look at Likelihood-ratio statistics. PSYC 4310/6310 Slide 19 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher The odds ratio: exp(B) Odds after a unit change in the predictor Odds ratio = Original odds • Crucial to interpretation of logistic regression. • Indicates the change in odds resulting from a unit change in the predictor. – Odds Ratio > 1 means that as the predictor increases, the odds of the outcome occurring increase. – Odds Ratio < 1 means that as the predictor increases, the odds of the outcome occurring decrease. PSYC 4310/6310 Slide 20 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher Methods of Entry • Forced Entry: All variables entered simultaneously. • Hierarchical: Variables entered in blocks. – Blocks should be based on past research, or theory being tested. Good Method. • Stepwise: Variables entered on the basis of statistical criteria (i.e. relative contribution to predicting outcome). – Should be used only for exploratory analysis. PSYC 4310/6310 Slide 21 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher Modeling Building: Parsimony is best • Build an initial model that includes all potential predictors, then systematically remove any that don’t seem to contribute to the model. – predictors should not be included unless they have explanatory benefit. PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher Sources of Bias: Assumptions – Linearity • assumes there is a linear relationship between any continuous predictors and the logit of the outcome variable. • Tested by looking at the interaction term between the predictor and its log transformation. – Independence of Errors • Violating this assumption produces overidispersion – Multicollinearity PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher Unique Problems • Incomplete Information • Complete Separation • Overdispersion PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher Unique Problems: Incomplete information regarding the predictors • Categorical Predictors: – Predicting cancer from smoking and eating tomatoes. – We don’t know what happens when nonsmokers eat tomatoes because we have no data in this cell of the design. • Continuous variables – Will your sample contain one, or a small number of subjects who possess unique characteristics? PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher Unique Problems: Complete Separation • Occurs when the outcome variable can be perfectly predicted. – Can occur, for example, when too many variables are fitted to too few cases. • Example: predicting whether a person is a burglar based on their weight. - Example 1: Is it a burglar? Or is it your teenage son/one of his friends? - Example 2: Is it a burglar or your cat? 1 = definitely a burglar; 0 = definitely not a burglar PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher Complete Separation 1.0 1.0 0.8 0.8 Probability of Outcome Probability of Outcome Unique Problems: 0.6 0.4 0.2 0.0 0.6 0.4 0.2 0.0 20 30 40 50 60 70 80 90 0 Weight (KG) Advanced Experimental Methods and Statistics 40 60 80 Weight (KG) Relationship between weight (x-axis) and a dichotomous outcome variable. PSYC 4310/6310 20 An example of complete separation. The weights of the two categories do not overlap. © 2011, Michael Kalsher Unique Problems: Overdispersion • Occurs when the variance is larger than expected from the model. • Can be caused by violating the independence assumption. • Tends to result in artificially small standard errors, creating the following problems: – Test statistic will be falsely deemed significant. Recall that statistical tests of regression coefficients are computed by dividing by the standard error. – Overconfidence in the relationship between predictors and the outcome in the population. Recall that confidence intervals are computed from the standard error. PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher A Sample Problem: • • • • Andy Field’s NOT Kalsher’s Predictors of a treatment intervention. Participants: 113 adults with a “medical” problem Outcome: Cured (1) or not cured (0). Predictors: – Intervention: intervention or no treatment. – Duration: the number of days before treatment that the patient had the problem. • In SPSS: – Outcome = Categorical variable (cured/not cured) – Intervention = Categorical variable (intervention/no treatment – Duration = Continuous variable (# days person had the problem) PSYC 4310/6310 Slide 29 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 30 Building the Model: Hierarchical Approach We’re most interested in whether the intervention has an effect. Ultimately, though, we want to know which model best fits the data. PSYC 4310/6310 Slide 31 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher Logistic Regression: The General Procedure PSYC 4310/6310 Slide 32 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher Specifying models using the Logistic Regression dialog box PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 34 PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 35 PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 36 PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 37 We have only one categorical predictor variable. “Indicator” means that standard dummy variable coding will be used. PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher Change “Last” to “First”, then click on “Change”, then click on “continue”. 38 Initial Output: Remember the coding scheme! PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher Output: Overall Model Summary Statistics Represents the difference between Model 1 and the intercept (no predictors). Here, we’re interested in the improvement of Model 2 over Model 1 (given by the chi-square for “Block”). Obtained by taking the difference between the model chi-square for the two models (9.928 – 9.926 = 0.002, which is non-significant (.964). Ditto for Model 3. Conclusion: Duration and Duration x Intervention interaction add nothing to the model. Proceed with Model 1. PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher Back to Model 1: Some Options Bootstrapping allows us to estimate the properties of the sampling distribution from the sample data. We can use this procedure to establish 95% confidence intervals for the parameter being estimated and also calculation the std.dev. Of the parameter estimates and use this as the standard error. PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 41 Back to Model 1: Some Options Predicted Values options are unique to logistic regression. Predicted Probabilities Probabilities of Y occurring, given the values of each predictor for a given case. Predicted group membership Tells us which of the two outcome categories a participant is most likely to belong to based on the model. PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 42 Back to Model 1: Some Options Histograms of the actual and predicted values of the outcome variable, useful for assessing the fit of the model to the observed data. Confidence interval for the odds ratio. Assesses how well the chosen model fits the data. Cases for which the standardized residuals are larger than the level specified (here, 2 standard deviations. Iteration history is the only way to get SPSS to display the initial -2LL and we need this value to compute R. PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 43 Interpreting Logistic Regression Summary Statistics for the Model Represents the difference between Model 1 (i.e., presence/absence of the intervention) and the intercept (no predictors). The model is highly significant. PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher Interpreting Logistic Regression Classification Table The Model uses whether a patient had an intervention, or not, to predict whether they were cured or not. PSYC 4310/6310 Predicts that the 57 people who received the intervention will be cured. Predicts that the 56 people who did not receive the intervention will not be cured. Advanced Experimental Methods and Statistics © 2011, Michael Kalsher Interpreting Logistic Regression Classification Table 56 57 The Model -Correctly classifies 32 patients who were not cured, but misclassifies 16 others (correctly classifies 66.7% ). -Correctly classifies 41 patients who were cured, but misclassifies 24 others (correctly classifies 63.1%). -The overall “weighted” accuracy is 64.6%. -With only the constant is included, accuracy is 57.5% (cured/total) PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher Interpreting Logistic Regression Model Summary • Wald Statistic tells us whether the b coefficient for the predictor is significantly different from zero (and significantly predicting the outcome). • Interpretation of the coefficients: “The change in the logit of the outcome associated with a one-unit change in the predictor.” PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher The odds ratio: Exp(B) Odds after a unit change in the predictor Odds ratio = Original odds • Indicates the change in odds resulting from a unit change in the predictor. – Odds Ratio greater than 1 means that as the predictor increases, the odds of the outcome occurring increase. – Odds Ratio less than 1 means that as the predictor increases, the odds of the outcome occurring decrease. PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 48 ncluded, and block 1 describes the model after Intervention is included. As such, block 1 s the main bit in which we’re interested. The bit of the block 0 output that does come in useful is in Output 19.3, and will be there only if you selected Iteration history in Figure 19.10. This table tells us the initial 2LL, which is 154.084. We’ll use this value later so Output: Block 0 don’t forget it. Interpreting Logistic Regression OU Note: We can use the -2LL value (154.084) to calculate R 19.6.2. Model summary ➁ PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher Interpreting Logistic Regression Effect Size Where: • -2LL = the deviance for the original model • z = the Wald statistic PSYC 4310/6310 Advanced Experimental Methods and Statistics Hosmer and Lemeshow’s Measure (1989) © 2011, Michael Kalsher Interpreting Logistic Regression Effect Size PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher Summary • The overall fit of the final model is shown by −2LL and its associated chi-square statistic. – If the significance of the chi-square statistic is less than .05, then the model is a significant fit of the data. • Check the table labelled Variables in the equation to see the regression parameters for any predictors in the model. • Look at the Wald statistic and its significance. • Use the odds ratio, Exp(B), for interpretation. – OR > 1, then as the predictor increases, the odds of the outcome occurring increase. – OR < 1, then as the predictor increases, the odds of the outcome occurring decrease. – The confidence interval of Exp(B) should not cross 1. PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher Reporting the Analysis PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher Example: Factors predicting successful soccer penalty kicks • Outcome: Perfect for logistic regression, since the outcome is a dichotomy: Whether the penalty kick was successful or not. • Predictors: – Previous research suggests two factors: • Whether the kicker is a worrier (measured by the Penn State Worry Questionnaire – PSWQ) • Past success at scoring penalty kicks – State Anxiety may also play a role. • SPSS: – – – – PSYC 4310/6310 Scored: 0 = penalty missed; 1 = penalty scored PSWQ: Measure of the degree to which a player worries. Anxious: Measure of state anxiety before a penalty kick. Previous: Percentage of previous penalty kicks made. Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 54 PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 55 Testing for linearity of the logit In this example, we have three continuous variables (PASQ, Anxious, Previous) We need to check that each one is linearly related to the log of the outcome variable (Scored).Run the logistic regression, but include predictors that are the interaction between each predictor and the log of itself. PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 56 1. For each predictor, create a new variable that is the log of the original variable. Example: For PSWQ, create a new variable called LnPSWQ by entering this name into the Target Variable box and then “click” on “Type and Label …” 2. When the “Type and Label” dialogue box appears, you may choose to give each predictor the same, or a different, name (e.g., Ln(PSWQ). Then click “Continue” PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 57 3. Replace the “?” with PSWQ by selecting it from the variable list at left and clicking the blue horizontal arrow. Click “OK” and then repeat for each of the other continuous variables (Anxiety, Previous) 1. In the “Function group” box, click on “Arithmetic”, 2. In the “Functions and Special Variables” box click on “Ln” and transfer it to the command area using the blue “up” arrow. PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 58 Rerun the analysis, as before, except … PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 59 … force all three variables (PSWQ, Anxiety, Previous) into the “Covariates” box, along with … PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 60 Use the >a*b> button to move the interaction terms: PSWQ & LnPSWQ; Anxiety & LnAnxiety; Previous & LnPrevious into the Covariates box. PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 61 Any interaction that is significant indicates the main effect has violated the assumption of linearity of the logit. The results show that the assumption has been met. Note: multicollinearity tests (tolerance, VIF) must be tested by running a linear regression analysis using the same outcome and predictor variables. PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 62 Multinomial logistic regression • Used to predict membership of more than two categories. • Analysis breaks the outcome variable down into a series of two-category comparisons. – Say you have three outcome categories: A, B and C. Analysis will consist of two comparisons of your choosing: • Compare everything against the 1st category (A vs. B & A vs. C) • Or your last category (A vs. C and B vs. C) • Or a custom category (B vs. A and B vs. C) • The important parts of the analysis and output are much the same as for binary logistic regression PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 63 Example: How successful are chat lines? • The chat lines used by 348 men and 672 women in a nightclub were recorded. • Outcome: Whether the chat line resulted in one of the following three events: – The person got no response or the recipient walked away. – The person got the recipient’s phone number. – The person left the night-club with the recipient. • Predictors: – The content of the chat lines were rated for: • Funniness (0 = not at all funny, 10 = funniest thing I have ever heard) • Sexuality (0 =no sexual content; 10 = very sexually direct) • Moral values (0 = reflects poor moral values; 10 = high moral values). – Gender of recipient (0 = female; 1 = male) PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 64 PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 65 Success is the outcome variable and is comprised of three categories, or levels. Click on “Reference Category” to select the baseline category. Here, the first category (complete rejection) seems like a good choice. Change the default (“Last Category”) to “First Category”. PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 66 The “Reference Category” has been changed to the First Category (complete rejection) PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 67 Next, click on “Model” Categorical Predictors go here Continuous Predictors go here PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 68 So we specify a custom model. By default, SPSS examines the main effects of the predictor variables. But the most interesting information often comes from the interactions. For example, funny chat lines may be more effective with women than with men (Gender x Funny Interaction). Similarly chat lines with a high sexual content may be more effective when used on men, but not women (Sex x Gender Interaction). PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 69 Highlight all of the predictors. Use the drop-down list to select “Main Effects” Then move the predictors to the “Forced Entry Terms” box. PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 70 Select each desired interaction term and move them to the “Stepwise Terms” box as shown here. Use the “Forward entry” Stepwise method PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 71 Produces Cox-Snell and Negelkerke R2 statistics (effect size estimates). Produces table of observed and expected frequencies. Produces table summarizing the predictors entered/removed at each step. Produces Pearson and likelihood ratio chisquare statistics for the model. Produces table comparing the model(s) to the baseline model. The model overall is tested using likelihood ratio statistics, but this option computes the same test for individual effects in the model. Produces the beta values, test statistics and confidence intervals for predictors in the model—very important! PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 72 Criteria options: Leave the defaults in place unless you receive an error message: “failing to converge”, then increase maximum interactions. But may not solve the problem. PSYC 4310/6310 Advanced Experimental Methods and Statistics Options: The Scale drop-down box can be useful if overdispersion is a problem since it reduces the standard errors used to test the significance and construct the confidence intervals of the parameter estimates. Use the deviance or Pearson options. © 2011, Michael Kalsher 73 Save: Can save predicted probabilities and predicted group membership (same as in binary logistic regression, except they are called Estimated response probabilities and Predicted category. PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 74 Indicates a significant decrease in unexplained variance from the baseline model to the final model. added. Because we requested a stepwise analysis for our interaction terms, we get the “Step Summary” summarizing the steps in the analysis. After the main effects were entered (Model 0), the Gender x Funny and the Gender x Sex interactions were entered stepwise and contributed significantly. Note that the AIC (Akaike’s information criterion) and BIC (Schwarz’s Bayesian information criterion) get smaller as the terms are added, evidence that the model is a better fit as these terms are added. PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 75 From the model summary, we know that the model is significantly better than no model, but is it a good fit to the data? The Pearson and Deviance statistics test whether the predicted values from the model differ significantly from the observed values. If these statistics are not significant then the model is a good fit. Here, we have conflicting results. One possibility is that the Pearson statistic is inflated by many empty cells. Another possibility is overdispersion. Overdispersion is present if the ratio of the Goodness-of-Fit statistic to its degrees of freedom is greater than 1 (the disperson parameter) and is problematic if greater than 2. Neither value is particularly high. Φpearson = Χ2Pearson = 886.62 = 1.44 df 614 ΦDeviance = Χ2Deviance = 617.48 = 1.01 df 614 PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 76 Likelihood Ratio Tests Tell us which predictors significantly predict the outcome category Effects on the success rates of chat lines: Main effects: Interaction effects Good-Mate: Χ2(2) = 6.32, p = .04 Gender x Funny: Χ2(2) = 35.81, p < .001 Gender: Χ2(2) = 18.54, p = .04 Gender x Sex: Χ2(2) = 13.45, p = .001 PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 77 Parameter Estimates More detailed information concerning the effects Compares “Got Phone Number” against “Rejection” Compares “Goes Home with Person” against “Rejection” PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 78 Individual Effects: A few examples Good-Mate: Whether the chat line showed good morals significantly predicted whether you get a phone number or rejected, b = 0.13, Wald Χ2(1) = 6.02, p = .014 The odds ratio tells us that as this variable increases by one unit, the change in the odds of getting a phone number is 1.14 (i.e., more likely to get a phone number if you use a chat line that demonstrates good morals). Funny x Gender: The success of funny chat lines depended on whether they were delivered to a man or a woman,, b = 0.49, Wald Χ2(1) = 12.37, p <.001 The odds ratio tells us that as gender changes from female (0) to male (1), in combination with “funniness” increasing, the change in odds of giving out a phone number compared to not is 1.64. As funniness increases, women become more likely to give out their phone number than men. Sex x Gender: The success of chat lines with sexual content depended on whether they delivered to a man or a woman, b = -0.35, Wald Χ2(1) = 10.82, p =.001 The odds ratio tells us that as gender changes from female (0) to male (1), in combination with “sexual content” increasing, the change in odds of giving out a phone number compared to not is 0.71. As the sexual content of a chat line increases, women become less likely than men to give out their phone number. PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 79 Reporting the Results PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2011, Michael Kalsher 80