STAT 462: Applied Regression Analysis

Instructor: Xizhen Cai
Email: xzc103@psu.edu
Office: 330B Thomas Building
Phone: (814) 863-0692
Office Hours: M & W 1:30-3:00 PM or by appointment
TA: Won Chul Song
Email: wxs5052@psu.edu
Office: 333 Thomas Building
Phone: (814) 863-3374
Office Hours: T & R 12:30-1:30 PM or by appointment
Time and Location:
M W F 11:10 AM - 12:25 PM 012 Life Sciences Bldg
T R 11:10 AM - 12:25 PM 071 Willard Bldg
Class and lab attendance are required.
Text: Applied Linear Regression Models, 4th edition, by Neter, Nachtsheim, Kutner
(recommended but NOT required)
Or the first half of Applied Linear Statistical Models, 5th edition, by Neter, Nachtsheim,
Kutner, and Li
Description: STAT 462 is an applied linear regression course that involves "hands on" data
analysis. Students enrolling in this course should have taken at least one other Statistics
course and should be familiar with the fundamentals of statistical testing and estimation.
Computer Usage and Data Sets: Data analysis is emphasized, so computers will be used
frequently during the course. Throughout the course, Minitab for Windows will be used to
analyze the data for lecture demonstrations in class and lab activities. Those wishing to install
Minitab on their own computers may go to www.minitab.com/education for details.
Data sets can be found at the PSU ANGEL website.
Grading:
Homework                           15%
Lab Activities                     15%
Exam 1 (scheduled on July 11th)    20%
Exam 2 (scheduled on July 30th)    20%
Final Project                      25%
Attendance                          5%
Homework and Labs:
Homework and lab activities together are worth 30% of the semester grade (15% each) and are
assigned weekly.
Lab activities must be submitted to the drop box on ANGEL during class time, and
students are required to attend lab class. The lowest lab grade will be dropped.
Homework is due every Friday at the beginning of class. Late homework is accepted only
with the instructor's permission and will incur a penalty of at least 10%. (No late
homework will be accepted once solutions have been posted.) All
homework assignments will be counted.
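As a rough illustration of how the grading weights and the dropped-lowest-lab rule combine, here is a minimal Python sketch. The function names, the use of a simple average within each component, and the sample scores are assumptions made for illustration; the syllabus does not specify how scores within a component are averaged.

```python
from statistics import mean

# Weights from the Grading section (fractions of the semester grade).
WEIGHTS = {
    "homework": 0.15,
    "lab_activities": 0.15,
    "exam_1": 0.20,
    "exam_2": 0.20,
    "final_project": 0.25,
    "attendance": 0.05,
}


def lab_percentage(lab_scores):
    """Average lab score (0-100) after dropping the single lowest lab grade.

    Assumption: labs are averaged with equal weight.
    """
    if len(lab_scores) < 2:
        raise ValueError("need at least two lab scores to drop one")
    kept = sorted(lab_scores)[1:]  # discard the lowest score
    return mean(kept)


def semester_percentage(scores):
    """Weighted semester percentage from per-component percentages (0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student, for illustration only.
example_scores = {
    "homework": 90,
    "lab_activities": lab_percentage([70, 90, 100, 95]),  # the 70 is dropped
    "exam_1": 80,
    "exam_2": 85,
    "final_project": 88,
    "attendance": 100,
}
example_total = semester_percentage(example_scores)
```

With these made-up scores the lab component averages 95 after the drop, and the weighted total comes to about 87.75%.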
Exams: Each of the two exams is worth 20% of the semester grade. Most questions will be short answer
and will include Minitab analysis output. A simple calculator and one sheet (both sides) of notes
may be used, and some tables will be provided. Tentative dates for these exams are July 11
(Thursday) and July 30 (Tuesday).
Conflicts on exam dates must be resolved in advance (at least one week before). An unexcused
absence on an exam date will incur at least a 10% penalty and may result in a score of 0.
Please allow 24 hours for email response.
Final Project: The final project is worth 25% of the semester grade. A dataset will be provided and
you will be asked to perform a regression analysis. A statistical analysis report (no more than
10 pages) needs to be submitted on Aug 9th. More specific descriptions and instructions on
the final project will be given later.
Exams:
All exams are closed-book, apart from the simple calculator and single sheet of notes
described above.
Letter Grades: Semester grades are assigned according to this scale.
93 – 99% A
90 – 92% A-
87 – 89% B+
83 – 86% B
80 – 82% B-
77 – 79% C+
70 – 76% C
60 – 69% D
0 – 59% F
Announcements: Lecture notes, lab activities, assignments, and all due dates will be posted
on ANGEL, available at www.angel.psu.edu or from the PSU homepage. Students are
expected to check this several times a week for updates.
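As a rough illustration of the Letter Grades scale above, the following Python sketch maps a semester percentage to a letter. It rests on several assumptions that the syllabus does not state: each band's lower bound is treated as a cutoff (so fractional scores such as 89.5 fall in the band below 90, and 100 counts as an A), and the 90 to 92% and 80 to 82% bands are read as A- and B-. The function name is hypothetical.

```python
# Lower bound of each band from the Letter Grades scale, highest first.
# Assumption: the 90-92% and 80-82% bands are A- and B-.
CUTOFFS = [
    (93, "A"),
    (90, "A-"),
    (87, "B+"),
    (83, "B"),
    (80, "B-"),
    (77, "C+"),
    (70, "C"),
    (60, "D"),
    (0, "F"),
]


def letter_grade(percentage):
    """Return the letter for a semester percentage between 0 and 100.

    Assumption: a score earns the first band whose lower bound it meets,
    with no rounding applied.
    """
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    for lower_bound, letter in CUTOFFS:
        if percentage >= lower_bound:
            return letter


# Hypothetical semester percentage, for illustration only.
example_letter = letter_grade(87.75)
```

Under these assumptions a semester percentage of 87.75 maps to B+. How borderline scores are actually rounded is the instructor's decision.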
Academic Integrity:
Academic integrity includes a commitment not to engage in or tolerate acts of falsification,
deception, or misrepresentation. Such acts of dishonesty violate the fundamental ethical
principles of the Penn State community and compromise the worth of work completed by others.
This course will follow the guidelines found in Section 49-20 of the University Faculty
Senate Policies for Students.
See http://www.science.psu.edu/academic/Integrity for details concerning academic integrity.
Disability Policy:
It is Penn State’s policy not to discriminate against qualified students with documented
disabilities in its educational programs. If you have a disability-related need for modifications in
the course, contact both the instructor and the Office for Disability Services (116 Boucke) at the
beginning of the semester.
Specific Topics Usually Covered
1. Simple Linear Regression Model
   Model for E(Y), model for distribution of errors
   Least squares estimation of model for E(Y)
   Estimation of variance
2. Inferences for Simple Linear Regression Model
   Inferences concerning the slope (confidence intervals and t-test)
   Confidence interval estimate of the mean Y at a specific X
   Prediction interval for a new Y
   Analysis of Variance partitioning of variation in Y
   R-squared calculation and interpretation
3. Diagnostic Procedures for Aptness of Model: assessing regression assumptions
   Residual analyses
      Plots of residuals versus fits, residuals versus x
      Tests for normality of residuals
      Lack of Fit test, Pure Error, Lack of Fit concepts
   Transformations as solution to problems with the model
4. Multiple Regression Models and Estimation
   Matrix Notations
   Hyperplane extension to simple linear model
   Basic estimation and inference for multiple regression
5. Additional Topics for Multiple Regression Analysis
   General Linear F test and Sequential SS
   Effects of a variable controlled for other predictors
      Sequential SS
      Partial Correlation
   Multicollinearity between X variables
      Effect on standard deviations of coefficients
      Problems interpreting effects of individual variables
      Apparent conflicts between overall F test and individual variable t tests
      Benefits of designed experiments
6. Categorical Predictor Variables
   Indicator Variables
   Interpretation of models containing indicator variables
7. Model Comparison and Selection Methods
   R-squared, MSE, Cp, and PRESS criteria
   Stepwise algorithms
8. Logistic Regression: categorical outcome variables
   Binary outcome: Bernoulli Distribution
   Interpretation of models: odds ratio
9. Miscellaneous Topics as Time Permits
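
For orientation, the simple linear regression model that topics 1 and 2 build on can be written as follows. This is a sketch in common textbook notation: the symbols \beta_0, \beta_1, \varepsilon_i, \sigma^2, b_0, and b_1 are conventional choices assumed here, not taken from this syllabus, and the lectures may use different notation.

```latex
\begin{align*}
Y_i &= \beta_0 + \beta_1 X_i + \varepsilon_i,
  \qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2),
  \quad i = 1, \dots, n \\
E(Y_i) &= \beta_0 + \beta_1 X_i \\
b_1 &= \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
            {\sum_{i=1}^{n} (X_i - \bar{X})^2},
  \qquad b_0 = \bar{Y} - b_1 \bar{X} \\
\text{MSE} &= \frac{\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2}{n - 2}
\end{align*}
```

The first line gives the model for Y and the error distribution, the second the model for E(Y), the third the least squares estimates of the slope and intercept, and the last the usual estimate of the error variance \sigma^2.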