Statistics 501: Regression Methods
The Pennsylvania State University
Spring 2010
Course Details
General Description
Statistics (STAT) 501 is an applied linear regression course that emphasizes data analysis
and interpretation. Generally, statistical regression is a collection of methods for determining
and using models that explain how a response variable (dependent variable) relates to one
or more explanatory variables (predictor variables). A list of specific topics usually covered
is given later in this document.
Prerequisites
Students enrolling for this course should have taken at least one other statistics course and
should be conversant with the fundamentals of statistical testing and estimation. They
also should have a rudimentary knowledge of matrices.
Required Text
This course requires the textbook Applied Linear Regression Models (4th edition) by Kutner,
Nachtsheim, and Neter (the book has an official website). The larger version of this book,
Applied Linear Statistical Models (5th edition), will also do, although we will only cover the
first half of that book. The first half of this larger book is identical to the Applied Linear
Regression Models book.
Statistical Software
Students must be able to use a computer to analyze data in this course. You
can use any statistical software that you wish, although we recommend Minitab,
either version 14 or 15 (I do not yet know whether either version has problems
running under Windows 7). The student version of Minitab is sufficient. Students can use any software
they wish for assignments, but most will find it easiest to use Minitab. In addition, examples for
the course units will be demonstrated using Minitab.
Typical Requirements
There will be weekly homework assignments and 3 exams. The homework will count as 50%
of the course grade and the exams will count as the remaining 50%.
Online Sections
The online sections of STAT 501 closely follow the in-class sections. Students in the
web section will have essentially the same course activities and requirements as students in
the in-class sections. For web students, more detailed versions of “lecture” notes are posted
and message boards will be maintained so that students may ask and answer questions (thus
gaining a sense of participation in a learning community). I highly encourage you to answer
questions posted on the message boards. If a fellow student correctly answers a question that
you post, then I will not post a response unless I wish to add something else. Otherwise, I
will respond to your posting as soon as possible (usually within 24 hours, but I will let you
know if I will be out of reach for any longer period of time).
Topics Covered in STAT 501
1. Simple Linear Regression Model: One Predictor Variable
• model for E(Y), model for the distribution of errors
• least squares estimation of model for E(Y)
• estimation of variance
• regression through the origin
2. Inferences for Simple Linear Model
• inferences concerning the slope (confidence intervals and t-tests)
• confidence interval estimate of the mean Y at a specific X
• prediction interval for a new Y
• analysis of variance (ANOVA) partitioning of variation in Y
• calculation and interpretation of R²
3. Diagnostic Procedures for Aptness of Model
• residual analyses
– plots of residuals versus fitted values, residuals versus x, residuals versus new x
– tests for normality of residuals
– lack of fit test, pure error and lack of fit concepts
• transformations as a solution to problems with the model
• weighted least squares as a solution for variance problems
4. Matrix Notation and Literacy for Regression Models
• X matrix, β vector, matrix formula for estimating coefficients
• linear dependence issues
• variance-covariance matrix of sample coefficients
5. Multiple Regression Models and Estimation: Multiple Predictor Variables
• basic estimation and statistical inference within multiple regression
• interaction terms and the interpretation of interaction
6. General Linear F-Test for Testing Hypotheses
• reduced and full models associated with hypotheses about the model’s coefficients
• F-test for general linear hypotheses
7. Assessing and Interpreting the Effect of a Single Predictor Variable Within a Multiple
Regression
• properly interpreting the t-test
• sequential sums of squares
• partial correlation between y and an x variable
8. Examining All Possible Regressions to Identify the Potential Models
• R², MSE, Cp, AIC, BIC, and PRESS criteria
• stepwise algorithms for identifying models
9. Problems Caused by Correlations (Confounding) Among Predictor Variables
• inflation effects on standard deviations of coefficients
• problems in interpreting effects of individual variables
• apparent conflicts between overall F-test and individual variable t-tests
• benefits of designed experiments
10. Incorporating Categorical Predictor Variables
• indicator variables
• interpretation of models containing indicator variables
• piecewise regression
11. More Diagnostic Measures and Remedial Measures for Lack of Fit
• variance inflation factors (VIFs)
• deleted residuals
• influence statistics: hat matrix, Cook’s D and related measures
12. Time Series Issues: Autocorrelation in Errors and Autoregressive Time Series Models
13. Polynomial Regression Models and Response Surface Regression
14. Logistic Regression Models for a Binary Response Variable
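Course examples use Minitab, but since any software is allowed, here is a minimal Python sketch (not part of the course materials) of the simple linear regression quantities from topics 1 and 2. It computes the least squares estimates b1 = Sxy/Sxx and b0 = ȳ − b1·x̄, the ANOVA partition SSTO = SSR + SSE, the variance estimate MSE = SSE/(n − 2), R² = SSR/SSTO, and the t statistic b1/s(b1) for testing a zero slope. The names `SimpleFit` and `fit_simple_linear` and the small data set are hypothetical, chosen only for illustration; p-values are omitted because they need a t distribution, which the Python standard library does not provide.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class SimpleFit:
    b0: float         # estimated intercept
    b1: float         # estimated slope
    sse: float        # error (residual) sum of squares
    ssr: float        # regression sum of squares, SSTO - SSE
    ssto: float       # total sum of squares about the mean of y
    mse: float        # estimate of the error variance, SSE / (n - 2)
    r_squared: float  # SSR / SSTO
    se_b1: float      # estimated standard error of the slope
    t_b1: float       # t statistic for H0: slope = 0


def fit_simple_linear(x, y):
    """Least squares fit of E(Y) = b0 + b1*x (hypothetical helper)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    ssto = sum((yi - y_bar) ** 2 for yi in y)
    if sxx == 0 or ssto == 0:
        raise ValueError("x values and y values must each vary")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Least squares estimates of slope and intercept.
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    # ANOVA partition of the variation in y.
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    ssr = ssto - sse
    mse = sse / (n - 2)

    # Inference for the slope: s(b1) = sqrt(MSE / Sxx).
    se_b1 = sqrt(mse / sxx)
    t_b1 = b1 / se_b1 if se_b1 > 0 else float("inf")
    return SimpleFit(b0, b1, sse, ssr, ssto, mse, ssr / ssto, se_b1, t_b1)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 5, 4, 5]
    fit = fit_simple_linear(x, y)
    print(round(fit.b0, 3), round(fit.b1, 3), round(fit.r_squared, 3),
          round(fit.mse, 3), round(fit.t_b1, 3))
    # prints: 2.2 0.6 0.6 0.8 2.121
```

The sums-of-squares form is used here because it mirrors the hand formulas for one predictor; with several predictors, the matrix formula from topic 4 is the natural generalization.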