Leona Aiken, Psychology, Spring 2004
Psychology 531
Multiple Regression in Psychological Research
B139, M W 8:30-10:30

What is Multiple Regression Analysis?

"Multiple regression analysis (MR) is a general system for examining the relationship of a collection of independent variables to a single dependent variable. It is among the most extensively used statistical analyses in the behavioral sciences. Multiple regression is highly flexible and lends itself to the investigation of a wide variety of questions. The independent variables may be quantitative measures such as personality traits, abilities, or family income; or they may be categorical measures such as gender, ethnic group, or treatment condition in an experiment. In the most common form of multiple regression analysis, which we will consider here, the dependent variable is continuous. The basic ideas of multiple regression can be extended to consider other types of dependent variables such as categories or counts or even multiple dependent variables. The relationship between an independent variable and the dependent variable may be linear, curvilinear, or may depend on the value of another independent variable." [1]

Instructor: Leona Aiken
Office: Psychology 249A
Phone: 965-3494
email: Leona.Aiken@asu.edu

Teaching Assistant: Nick Schweitzer
Office: Office hours in statistics lab, Psychology B153
email: njs@asu.edu

Office Hours

The following are my office hours; they are close to being certain and will be confirmed within the next week. Nick Schweitzer's office hours will be set when we create the schedule of hours in the Psychology Computing Laboratory within the next week. I will inform you as soon as the permanent schedule is set.
Leona Aiken, Office Hours
    Monday       10:45AM-12:00PM
    Monday       1:45PM-4:00PM
    Wednesday    10:45AM-12:15PM

Nick Schweitzer, Office Hours
    Office hours will be held in the Statistics Lab, PSYB153, to be arranged first week of class.

If your schedule conflicts with office hours, please contact me or Nick Schweitzer to make an appointment. I prefer that you make an appointment if you can, so that we ensure the time is set aside.

[1] Aiken, L. S., West, S. G., & Pitts, S. C. (2003). Multiple regression analysis. In J. A. Schinka & W. F. Velicer (Eds.), Comprehensive Handbook of Psychology, Volume 2, Research Methods in Psychology. New York: Wiley.

Text

The text for the course is the third edition of the now classic Cohen and Cohen regression analysis text. The book should serve you as a reference text for many topics in multiple regression beyond what we will cover this semester.

Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Mahwah, NJ: Lawrence Erlbaum.

Supplemental Source

Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Newbury Park, CA: Sage.

Tests

There will be three noncumulative tests. No tests will be dropped. The dates of the tests are as follows (Test 1 and 2 dates are approximate; Test 3, certain):

    Test 1    Wednesday, February 25
    Test 2    Wednesday, April 5
    Test 3    Monday, May 10, 7:40AM-9:30AM

Homework

There will be approximately 6 or 7 homework assignments. They will be graded on a four-point scale: Excellent = 4; Good = 3; Fair = 2; and Poor = 1. Please turn these in on time. Please note that all problem sets must be turned in for you to receive a grade in the course.

Final Grade

The final grade will be based on the three tests plus the homework. The three tests will count equally. Total points on the problem sets will be counted as half a test.
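As a rough illustration of the weighting described under Final Grade (not an official grading formula), the arithmetic can be sketched in Python. The function name final_score is hypothetical, and the sketch assumes that each test is scored on a 0-100 scale and that homework points are first converted to a percentage of the points possible; the syllabus does not specify either detail.

```python
def final_score(test1, test2, test3, homework_points, homework_max):
    """Weighted course percentage: each test has weight 1, homework weight 0.5.

    Assumption (not stated in the syllabus): tests are on a 0-100 scale and
    homework is rescaled to a 0-100 percentage before weighting.
    """
    homework_pct = 100.0 * homework_points / homework_max
    total_weight = 3 * 1.0 + 0.5  # three tests plus half a test for homework
    return (test1 + test2 + test3 + 0.5 * homework_pct) / total_weight


# Hypothetical student: 7 assignments at up to 4 points each gives 28 possible.
print(final_score(80, 90, 85, homework_points=24, homework_max=28))
```

Under these assumptions homework carries 0.5 / 3.5, or one seventh, of the final grade, and each test carries two sevenths.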
Computer Usage

We will be using both SPSS for Windows and SAS PC for class examples and homework, drawing on the statistical and graphical capabilities of these two computer packages. My goal is that you become familiar with both SAS and SPSS for regression analysis. All the software is available in the Psychology Computing Lab, Room 153. SPSS and SAS are available at other computing sites around campus as well, including the Goldwater Center, across from Noble Library.

I am assuming that most of you have used SPSS for Windows with syntax (not just "point and click"). I am assuming that most of you have no familiarity with SAS PC. You will receive fully documented examples of data analyses in both SPSS and SAS that will serve as models for your homework. We will also have training sessions in the statistics lab for those of you who are not familiar with a software package. Nick Schweitzer is preparing a series of handouts on SPSS 12 graphics. (As I prepare this syllabus, we are at the outset of transition from SPSS 11.5 to SPSS 12. The SPSS materials you will receive this semester cover both versions of SPSS. The differences between the two versions are in the graphics.)

Nick Schweitzer, the teaching assistant, is remarkable at statistical computing. He is also a saint of a teacher. Computing should go well for you all.

The information you need for problem sets (the data plus initial SPSS and SAS syntax) will be stored on my Website (http://www.public.asu.edu/~atlsa/PSY531) and, in addition, on the server in the Psychology Statistics Lab, Psy B. 153.

Schedule Issues

This semester we are short one class from the usual semester. The usual semester contains 29 class sessions. This semester contains 28 class sessions. We need 29 sessions to complete the course content. Thus I will schedule an additional class session at a time everyone can attend.
Handouts

I will be giving a full set of handouts covering lecture material throughout the course. In addition, we will be distributing computer output with documentation, which will run into many pages. We need to reimburse the Psychology Department for all this xeroxing. Later in the semester, I will collect funds for the Department in the amount of $30.00 per person. I will collect money from all those who are registered for the course and from those who are informally sitting in on the course as well.

Class Study Strategy

Please read the assigned material before coming to class; note that I will make more explicit reading assignments for daily lectures. I will be giving handouts as well. The strategy I used when I was taking graduate statistics courses was to copy over my notes after class. I would use this strategy to force myself to determine whether I could follow all the notes I had taken. If something was unclear, I would leave space for clarification and ask the instructor. This is an easy way to keep up. Please keep up with the class, because the material is cumulative.

Don't worry about the course. I'll do whatever I can to make everything clear, and Nick Schweitzer is a highly competent teacher of things to do with computing and statistics. I would be delighted if you would enjoy this course and want to continue to study quantitative methods.

Course Content:

The course is devoted to the study of multiple regression as a general system for assessing the relationship of a dependent variable Y to a set of independent variables (Xs), i.e., a "general data analytic system" (J. Cohen, 1968). During the course of the semester we will first examine the prediction of a dependent variable from a single independent variable (bivariate regression) and then quickly move to the case of multiple independent variables (multiple regression).
Our focus will be on linear regression. The general form of the multiple regression equation with which we work is as follows:

\[ \hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_p X_p \]

where $X_1 \dots X_p$ are a set of independent variables, $\hat{Y}$ is a predicted score on the dependent variable based on the set of independent variables, $b_0$ is the regression intercept, and $b_1 \dots b_p$ are a set of "regression coefficients" that indicate the relationship of the independent variables to the dependent variable.

Multiple regression analysis is a completely general approach to data analysis. Your work from last semester on the Analysis of Variance (ANOVA) can be subsumed as a special case of multiple regression. The independent variables in the above equation can be factors and their interactions from ANOVA, or a combination of factors and covariates from the Analysis of Covariance (ANCOVA). Thus the analytic approaches we consider this semester are applicable both to "experimental" research in which factors (Xs) are manipulated and outcomes observed and to "correlational" research in which cases are selected along the X variables and dependent variables (Ys) are observed.

Linear regression, it will turn out, is not confined to individual independent variables linearly related to some dependent variable. We will examine how curvilinear relationships and interactions among independent variables are included in multiple regression.

The outcomes of multiple regression analysis are affected by violations of assumptions of the analysis. We will explore how graphical methods help to surface the nature of data and violations of assumptions. In addition, statistical methods of detecting violations of assumptions and problematic data points that may grossly affect outcomes will be examined, and remedies will be explored.

If time permits, I will provide an introduction to missing data imputation.
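To make the regression equation in this section concrete, the following minimal Python sketch fits the coefficients by ordinary least squares (solving the normal equations) and shows how an interaction enters the equation as a product term. It is an illustration only, not course material: least-squares estimation is assumed, the data are made up, the function names fit_ols and predict are hypothetical, and the course itself uses SPSS and SAS rather than Python.

```python
def fit_ols(rows, y):
    """Return least-squares coefficients [b0, b1, ..., bp]."""
    # Design matrix with a leading column of 1s for the intercept b0.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    # The system is now diagonal, so each coefficient is a simple ratio.
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(b, row):
    """Predicted score: b0 + b1*X1 + ... + bp*Xp."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], row))


# Made-up, noise-free data generated from Y = 2 + 3*X1 - 1*X2 + 0.5*X1*X2.
true_b = [2.0, 3.0, -1.0, 0.5]
cases = [(x1, x2) for x1 in range(4) for x2 in range(3)]
rows = [[x1, x2, x1 * x2] for x1, x2 in cases]  # third predictor is the product
y = [predict(true_b, row) for row in rows]

b = fit_ols(rows, y)
print([round(v, 6) for v in b])  # should be close to the generating coefficients
```

The third predictor here is simply the product of the first two. The equation stays linear in the coefficients, which is why interactions (and, in the same way, squared terms for curvilinear relationships) fit into the same framework.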
Topics and Readings

Reading abbreviations: Cohen, Cohen, West, & Aiken (CCWA); Aiken & West (A&W)

Topic
    Reading

1. Overview of multiple regression analysis
    CCWA chap. 1
2. Bivariate Regression
    CCWA chap. 2
    CCWA chap. 4, pp. 110-116
3. Two-predictor regression, multiple regression
    CCWA chap. 3
4. Multiple prediction in matrix form
    CCWA chap. 3a
5. Sets of predictors and variable selection
    CCWA chap. 5
6. Curvilinear relationships
    CCWA chap. 6
7. Interactions among continuous variables
    CCWA chap. 7
    A&W chap. 2, 3
8. Regression Assumptions
    CCWA chap. 4
9. Regression Graphics Introduction
    CCWA chap. 4
10. Regression Diagnostics and Model Fixes (transformations)
    CCWA chap. 10
    CCWA chap. 6
11. Categorical Independent Variables and ANOVA (design matrices)
    CCWA chap. 8
12. Categorical by continuous variable interactions, ANCOVA
    CCWA chap. 9
    A&W chap. 7
    West, Aiken, Krull (1996) [b]
13. Measurement error, power, how many subjects?
    CCWA, pp. 51-53, 92-95, 176-182, 297
    Maxwell (2000) [c, e]
14. Missing data imputation
    Allison (2001) [a, d]
    CCWA chap. 11

[a] Time permitting.
[b] West, S. G., Aiken, L. S., & Krull, J. L. (1996). Experimental personality designs: Analyzing categorical by continuous variable interactions. Journal of Personality, 64(1), 1-48.
[c] Maxwell, S. E. (2000). Sample size and multiple regression analysis. Psychological Methods, 5, 434-458.
[d] Allison, P. D. (2001). Missing Data. Thousand Oaks, CA: Sage.
[e] Kelley, K., & Maxwell, S. E. (2003). Sample size for multiple regression: Obtaining regression coefficients that are accurate, not simply significant. Psychological Methods, 8(3), 305-321.