regression - SAS Halifax Regional User Group

Linear Regression Analysis with a focus on
Influence Diagnostics
using proc reg
prepared by
Voytek Grus
for
SAS user group, Halifax
February 23, 2007
Introduction: What is Regression Analysis?
• A broad collection of statistical techniques used to
explore the relationship between measurable variables.
– Its primary purpose is to describe the relationship
between variables (the model) and to predict the response or study
its components (the coefficients).
• A central idea of RA is that the relationship is modeled as a
statistical (stochastic) process, not a deterministic equation.
• RA can be viewed as a subgroup of Generalized Linear Models and/or
Multivariate Analysis.
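To make the stochastic idea concrete: the simple linear model is y_i = β0 + β1·x_i + ε_i, where ε_i is random error. The Python sketch below is an illustration only (in SAS the same fit is a `model y = x;` statement inside proc reg). It simulates data from hypothetical true values β0 = 2 and β1 = 0.5 and shows that least squares recovers estimates close to, but not exactly equal to, those values, because every sample carries different noise.

```python
import random


def ols_simple(x, y):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1*x + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sums of squares and cross-products.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical stochastic process: y = 2 + 0.5*x + N(0, 1) noise.
rng = random.Random(42)
x = [float(i) for i in range(1, 51)]
y = [2.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]

b0, b1 = ols_simple(x, y)
print(f"intercept={b0:.3f}, slope={b1:.3f}")
```

Rerunning with a different seed gives slightly different estimates; that sampling variability is what the standard errors and t tests in proc reg output quantify.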
Introduction: Types of Regression Analysis
• Data types and statistical techniques
– Analysis of observational versus experimental data (proc rsreg)
– Discrete response variable: logistic regression (proc logistic,
transreg)
– Time series versus cross-sectional data (procs autoreg, pdlreg,
arima)
– Survival Analysis: lifetime or failure time (proc lifereg)
– Regression on random predictors
• Simultaneous Econometric equations (procs model, syslin)
• Structural Equation Modeling (proc calis)
• Estimation techniques
– Linear vs non-linear (proc nlin, nlinmix)
– Least squares vs non-least squares, such as MLE (proc robustreg)
– Least squares vs partial least squares (proc pls)
– Multivariate regression (multiple response regression)
SAS offers many diverse tools to do regression analysis
- SAS procedures, SAS Enterprise Guide, Matrix Programming language
- A good way to start is to read about RA in SAS help.
- Chapter 2 of “Introduction to Regression Procedures” gives a
good overview of RA and of the SAS procedures available for
various analyses.
Regression Analysis: Process
• State the purpose of the analysis: prediction, variable
screening, model specification, parameter estimation
(signs and significance), influence diagnostics.
• Identify the type of regression analysis to be conducted
and find the appropriate tools
• Assess the quality of your data
• Fit the regression model
• Examine compliance with statistical assumptions,
remedy violations where necessary, and assess the quality of
fit.
• Draw conclusions
Diagnostics: testing for violation of assumptions
• Analysis of residuals
– Normality assumption (QQ- and PP-plots, histograms); related
residual checks include added variable plots, partial residual plots,
F tests for lack of fit, and the Durbin-Watson test
– Heteroscedasticity (ACOV and SPEC options)
– Outlier detection (How large is too large?)
– Influence diagnostics (Cook’s distance, PRESS)
• Model specification (leverage plots, Mallows’ Cp)
– Non-linearity (scatter plots, partial residual plots)
– Over- and under-specification
• Multicollinearity tests (TOL, VIF, COLLIN options)
• Autocorrelation (Durbin-Watson test, DW option)
• Random predictors (X’s measured with errors)
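The influence measures named above can be computed by hand for a single predictor. This is a minimal Python sketch, not proc reg's implementation: it uses textbook closed forms for leverage h_ii, Cook's distance, PRESS residuals e_i/(1 − h_ii), and the Durbin-Watson statistic. In proc reg, comparable output comes from the R, INFLUENCE and DW options of the MODEL statement. The cutoffs 4/n for Cook's D and 2p/n for leverage are common rules of thumb rather than formal tests, and the data are hypothetical.

```python
def simple_regression_diagnostics(x, y):
    """Fit y = b0 + b1*x by least squares and return influence diagnostics."""
    n = len(x)
    p = 2  # estimated parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar

    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    mse = sse / (n - p)

    # Leverage: diagonal of the hat matrix (closed form for one predictor).
    lev = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # Cook's distance: combines residual size with leverage.
    cook = [e * e / (p * mse) * h / (1.0 - h) ** 2 for e, h in zip(resid, lev)]
    # PRESS residual: prediction error for obs i when the model is refit without it.
    press_resid = [e / (1.0 - h) for e, h in zip(resid, lev)]
    press = sum(r * r for r in press_resid)
    # Durbin-Watson: values near 2 suggest no first-order autocorrelation.
    dw = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, n)) / sse

    return {"b0": b0, "b1": b1, "resid": resid, "leverage": lev,
            "cook": cook, "press_resid": press_resid, "press": press, "dw": dw}


# Hypothetical data: a near-linear trend plus one far-out, off-line point at x = 20.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 25.0]

d = simple_regression_diagnostics(x, y)
n, p = len(x), 2
for i, (h, c) in enumerate(zip(d["leverage"], d["cook"]), start=1):
    flag = "  <-- check" if c > 4 / n or h > 2 * p / n else ""
    print(f"obs {i:2d}: leverage={h:.3f} cook={c:.3f}{flag}")
print(f"PRESS={d['press']:.3f}  Durbin-Watson={d['dw']:.3f}")
```

Here the last observation has both high leverage and a large Cook's distance: it pulls the fitted line toward itself, so its ordinary residual alone understates its influence.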
Remedies to violation of assumptions
• Variable selection process (stepwise, maxr, etc. in proc reg)
– Variable transformation
• Dummy variables
• Box-Tidwell Procedure
• Not all functions are linearizable; for those, non-linear regression
must be used.
• Polynomial regression (proc rsreg)
• Weighted least squares (WEIGHT statement in proc reg)
• Non-least-squares regression
– Failure of normality: Huber M-estimator (proc robustreg)
– Principal components regression (procs pls, princomp)
– Ridge regression (proc reg)
• Partial least squares: random predictors
– proc pls
• Non-linear regression
– procs nlmixed, nlin, model
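Weighted least squares replaces the ordinary sum of squares with Σ w_i·(y_i − b0 − b1·x_i)², which is what the WEIGHT statement in proc reg requests. Below is a minimal one-predictor sketch in Python; the data, and the assumption that the error variance grows as x² (so w_i = 1/x_i²), are hypothetical and chosen only for illustration.

```python
def wls_simple(x, y, w):
    """Weighted least squares for y = b0 + b1*x.

    Minimizes sum(w_i * (y_i - b0 - b1*x_i)**2); equal weights reduce to OLS.
    """
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxx = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    b0 = yw - b1 * xw
    return b0, b1


# Hypothetical heteroscedastic data: error SD assumed proportional to x,
# so each weight is 1 / variance = 1 / x**2.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.1, 4.8, 7.3, 8.5, 11.9, 12.4]
w = [1.0 / xi ** 2 for xi in x]

print("OLS:", wls_simple(x, y, [1.0] * len(x)))
print("WLS:", wls_simple(x, y, w))
```

Only the relative sizes of the weights matter: multiplying every weight by the same constant leaves the estimates unchanged.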
Functionality of Proc Reg in Linear Regression Analysis
• Data modeling: BY-group processing, WHERE statement,
multiple MODEL statements
• Interactive analysis: REWEIGHT, PAINT, PLOT statements, etc.
• Diagnostic tools: plots, tests (outliers, normality, etc.)
• Hypothesis testing: F and t tests, partitioning of variability
• Automated variable selection procedures: stepwise
regression, forward selection, backward elimination, maxr
• Model validation: Mallows’ Cp plots
• Prediction: prediction intervals, PRESS residuals, etc.
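For reference, the standard textbook definitions of three quantities listed above, with n observations, p estimated parameters (including the intercept), residuals e_i, leverages h_ii, SSE_p the error sum of squares of a candidate model, σ̂² the mean squared error of the full model, and s² the mean squared error of the fitted model. A subset model with C_p close to p suggests little bias.

```latex
C_p = \frac{SSE_p}{\hat\sigma^2} - n + 2p
\qquad
PRESS = \sum_{i=1}^{n} \left( \frac{e_i}{1 - h_{ii}} \right)^2
\qquad
\hat y_0 \pm t_{\alpha/2,\,n-p} \sqrt{s^2 \left( 1 + x_0^\top (X^\top X)^{-1} x_0 \right)}
```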
Literature
• Classical and Modern Regression with
Applications by Raymond H. Myers (1986)
• Applied Linear Regression by Sanford Weisberg
(1985)
• SAS Help Examples
Questions?