Statistics 501: Regression Methods
The Pennsylvania State University
Spring 2010

Course Details

General Description

Statistics (STAT) 501 is an applied linear regression course that emphasizes data analysis and interpretation. Generally, statistical regression is a collection of methods for determining and using models that explain how a response variable (dependent variable) relates to one or more explanatory variables (predictor variables). A list of specific topics usually covered is given later in this document.

Prerequisites

Students enrolling in this course should have taken at least one other statistics course and should be conversant with the fundamentals of statistical testing and estimation. They should also have a rudimentary knowledge of matrices.

Required Text

This course requires the textbook Applied Linear Regression Models (4th edition) by Kutner, Nachtsheim, and Neter. The larger version of this book, Applied Linear Statistical Models (5th edition), will also do, although we will only cover the first half of that book. The first half of the larger book is identical to the Applied Linear Regression Models book.

Statistical Software

Students must be able to use a computer to analyze data in this course. You can use any statistical software that you wish for assignments, although we recommend Minitab, either version 14 or 15 (I do not know whether there are any issues for those of you who use Windows 7). The student version of Minitab is sufficient. Most students will find it easiest to use Minitab, and examples for the course units will be demonstrated using Minitab.

Typical Requirements

There will be weekly homework assignments and 3 exams. The homework will count as 50% of the course grade and the exams will count as the remaining 50%.

Online Sections

The online sections of STAT 501 closely follow the in-class sections.
Students in the web section will have essentially the same course activities and requirements as students in the in-class sections. For web students, more detailed versions of “lecture” notes are posted and message boards will be maintained so that students may ask and answer questions (thus gaining a sense of participation in a learning community). I highly encourage you to answer questions posted on the message boards. If a fellow student correctly answers a question that you post, then I will not post a response unless I wish to add something else. Otherwise, I will respond to your posting as soon as possible (usually within 24 hours, but I will let you know if I will be out of reach for any longer period of time).

Topics Covered in STAT 501

1. Simple Linear Regression Model: One Predictor Variable
   • model for E(Y), model for distribution of errors
   • least squares estimation of model for E(Y)
   • estimation of variance
   • regression through the origin

2. Inferences for Simple Linear Model
   • inferences concerning the slope (confidence intervals and t-tests)
   • confidence interval estimate of the mean Y at a specific X
   • prediction interval for a new Y
   • analysis of variance (ANOVA) partitioning of variation in Y
   • calculation and interpretation of R²

3. Diagnostic Procedures for Aptness of Model
   • residual analyses
     – plots of residuals versus fits, residuals versus x, residuals versus new x
     – tests for normality of residuals
     – lack of fit test, pure error, lack of fit concepts
   • transformations as a solution to problems with the model
   • weighted least squares as a solution for variance problems

4. Matrix Notation and Literacy for Regression Models
   • X matrix, β vector, matrix formula for estimating coefficients
   • linear dependence issues
   • variance-covariance matrix of sample coefficients

5. Multiple Regression Models and Estimation: Multiple Predictor Variables
   • basic estimation and statistical inference within multiple regression
   • interaction terms and the interpretation of interaction

6. General Linear F-Test for Testing Hypotheses
   • reduced and full models associated with hypotheses about the model’s coefficients
   • F-test for general linear hypotheses

7. Assessing and Interpreting the Effect of a Single Predictor Variable Within a Multiple Regression
   • properly interpreting the t-test
   • sequential sums of squares
   • partial correlation between y and an x variable

8. Examining All Possible Regressions to Identify the Potential Models
   • R², MSE, Cp, AIC, BIC, and PRESS criteria
   • stepwise algorithms for identifying models

9. Problems Caused by Correlations (Confounding) Among Predictor Variables
   • inflation effects on standard deviations of coefficients
   • problems in interpreting effects of individual variables
   • apparent conflicts between overall F-test and individual variable t-tests
   • benefits of designed experiments

10. Incorporating Categorical Predictor Variables
   • indicator variables
   • interpretation of models containing indicator variables
   • piecewise regression

11. More Diagnostic Measures and Remedial Measures for Lack of Fit
   • variance inflation factors (VIFs)
   • deleted residuals
   • influence statistics - hat matrix, Cook’s D and related measures

12. Time Series Issues: Autocorrelation in Errors and Autoregressive Time Series Models

13. Polynomial Regression Models and Response Surface Regression

14. Logistic Regression Models for a Binary Response Variable