Statistics 501 Applied Regression Analysis (Section 1)

Fall 2005
Penn State University
Instructor
Bob Heckard, rho@stat.psu.edu, 308 Thomas Building, 5-3131
Office hours: M, W 2:15-3:30 pm or by appointment
Teaching Assistant
Michael Zhang, yuz115@psu.edu, 331A Thomas Building
Office hours: Tu, Th 10:00-11:00 am
Course Description
Statistics 501 is an applied linear regression course that involves hands-on data analysis. Most
students are graduate students from a wide variety of academic disciplines other than statistics.
A few students are in the Masters of Applied Statistics program. Students enrolling for this
course should have taken at least one other statistics course and should be conversant with the
fundamentals of statistical testing and estimation. Generally, statistical regression is a
collection of methods for determining and using models that explain how a response variable
(dependent variable) relates to one or more explanatory variables (predictor variables). A list of
topics usually covered is given later in this syllabus.
Text
Applied Linear Regression Models (4th edition) by Kutner, Nachtsheim, and Neter. The newest
edition of the larger version of the book, Applied Linear Statistical Models, will also do, although
we will only cover the first half of that book. Older editions of either book will not do.
Computer Usage
Data analysis is emphasized so students frequently use the computer during the course. One class
meeting per week will be held in a computer lab. We'll use Minitab (Version 14) for handouts
and lecture demonstrations. Students can use any software they wish for assignments, but most
will find it easiest to use Minitab.
Requirements and Evaluation
Exams: two in-class exams and a take-home final (to be given out during the last week of classes)
count for 50% of the grade.
Lab and homework assignments, together with one or two group data analysis assignments, count
for 50% of the grade. The tentative split is 30% for lab/homework assignments and 20% for
group data analysis assignments.
A course percentage over 90% guarantees some form of "A". A course percentage over 65%
guarantees some form of "B". Plus and minus grades will be determined by how close scores are
to these cutoffs and by the spacing among student scores.
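The arithmetic behind the tentative 50/30/20 split can be sketched as follows. This is only an illustration: the scores are made up, and the equal weighting of the three exams within the 50% is an assumption that the syllabus does not state.

```python
# Illustrative only: the weights and cutoffs are from the syllabus; the scores
# and the equal weighting of the three exams are assumptions for this sketch.
def course_percentage(exam_scores, homework_pct, group_pct):
    """Combine component percentages (0-100) using the tentative 50/30/20 split."""
    exam_pct = sum(exam_scores) / len(exam_scores)  # assumed: exams weighted equally
    return 0.50 * exam_pct + 0.30 * homework_pct + 0.20 * group_pct


def guaranteed_grade(pct):
    """Return the letter family guaranteed by the stated cutoffs, if any."""
    if pct > 90:
        return "A"
    if pct > 65:
        return "B"
    return None  # the syllabus states no guarantee below 65%


# Hypothetical student: three exam scores, a homework average, a group-project average.
total = course_percentage([88, 92, 90], homework_pct=95, group_pct=85)
print(round(total, 1), guaranteed_grade(total))
```

Because the split is tentative and the plus/minus borderlines depend on the spread of scores, a calculation like this gives at most a rough lower bound on the letter grade.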
Tentative Exam Dates
The in-class exams are tentatively scheduled for Oct. 10 and Nov. 11. The take-home final
questions will be distributed during the last week of classes.
Academic Integrity Policy
All Penn State policies regarding ethics, honorable behavior and academic integrity apply to this
course. All exam answers must be your own and you must not provide any assistance to other
students during exams. University and Eberly College of Science regulations and policies
concerning academic integrity can be viewed at
www.science.psu.edu/Academic/Integrity/Links.html.
Topics Usually Covered
1. Simple Linear Regression Model
• Model for E(Y), model for distribution of errors
• Least squares estimation of model for E(Y)
• Estimation of variance
2. Inferences for Simple Linear Model
• Inferences concerning the slope (confidence intervals and t-test)
• Confidence interval estimate of the mean Y at a specific X
• Prediction interval for a new Y
• Analysis of Variance partitioning of variation in Y
• R-squared calculation and interpretation
3. Diagnostic procedures for aptness of model
• Residual analyses
  o Plots of residuals versus fits, residuals versus x, residuals versus new x
  o Tests for normality of residuals
  o Lack of Fit test, Pure Error, Lack of Fit concepts
• Transformations as solution to problems with the model
4. Matrix Notation and Literacy
• X matrix, β vector, y vector, ε vector
• (X'X)^{-1} X'Y estimates the coefficient vector
• Variance-covariance matrix
5. Multiple Regression Models and Estimation
• Hyperplane extension to simple linear model
• Interaction models
• Basic estimation and inference for multiple regression
6. General Linear F test and Sequential SS
• Reduced and Full models
• F test for general linear hypotheses
• Effects of a variable controlled for other predictors
  o Sequential SS
  o Partial correlation
7. Multicollinearity between X variables
• Effect on standard deviations of coefficients
• Problems interpreting effects of individual variables
• Apparent conflicts between overall F test and individual variable t tests
• Benefits of designed experiments
8. Polynomial Regression Models
9. Categorical Predictor Variables
• Indicator Variables
• Interpretation of models containing indicator variables
• Piecewise regression
10. More Diagnostic Measures and Remedial Measures for Lack of Fit
• Variance Inflation Factors
• Ridge Regression
• Deleted Residuals
• Influence statistics: Hat matrix, Cook's D and related measures
11. Examining All Possible Regressions
• R^2, MSE, Cp, and PRESS criteria
• Stepwise algorithms
12. Miscellaneous Topics as time permits
• Estimating the regression model when residuals have autocorrelation
• Logistic and Poisson regression models
• Nonlinear Regression
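To give a flavor of the first two topics in the list above, here is a minimal sketch of least squares estimation for a simple linear regression, written in plain Python rather than Minitab. The data values are invented for illustration, and the variable names (b0, b1, sse, mse, r_squared) are just labels chosen for this sketch. It computes the slope and intercept estimates, the MSE as the estimate of error variance, and R-squared.

```python
# Hypothetical data, invented for illustration (not from the course).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares estimates of the slope (b1) and intercept (b0).
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Error sum of squares, and MSE = SSE / (n - 2) as the estimate of error variance.
fitted = [b0 + b1 * xi for xi in x]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
mse = sse / (n - 2)

# R-squared = 1 - SSE / SSTO, the share of variation in y explained by x.
ssto = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - sse / ssto

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, MSE = {mse:.4f}, R-squared = {r_squared:.4f}")
```

The same estimates come out of the matrix formula (X'X)^{-1} X'Y from topic 4 when X has a column of ones next to the x values. The sums above are simply that formula written out for the one-predictor case.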