
Regression Analysis Model Building Presentation

Module 8
Regression Analysis Model Building
Regression Analysis Model Building
• General Linear Model
• Determining When to Add or Delete Variables
• Analysis of a Larger Problem
• Variable-Selection Procedures
• Residual Analysis
• Multiple Regression Approach to Analysis of Variance and Experimental Design
General Linear Model
• Models in which the parameters (β0, β1, . . . , βp) all
have exponents of one are called linear models.
• A general linear model involving p independent variables
is

y = β0 + β1z1 + β2z2 + . . . + βpzp + ε

• Each of the independent variables z is a function of x1,
x2, ..., xk (the variables for which data have been collected).
General Linear Model
• The simplest case is when we have collected data for just
one variable x1 and want to estimate y by using a
straight-line relationship. In this case z1 = x1.
• This model is called a simple first-order model with one
predictor variable.

y = β0 + β1x1 + ε
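A simple first-order fit can be sketched in a few lines of Python. The data below are hypothetical (not from the slides); the least-squares fit estimates b0 and b1 in y = b0 + b1*x1.

```python
import numpy as np

# Hypothetical data for a single predictor x1 (assumed for illustration)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least-squares fit of the first-order model y = b0 + b1*x1
# np.polyfit returns coefficients highest degree first: [b1, b0]
b1, b0 = np.polyfit(x1, y, 1)
```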
Modeling Curvilinear Relationships
• To account for a curvilinear relationship, we might
set z1 = x1 and z2 = x1².
• This model is called a second-order model with one
predictor variable (quadratic).

y = β0 + β1x1 + β2x1² + ε
Second Order or Quadratic
• Quadratic functional forms take on a U or inverted-U
shape depending on the values of the coefficients.

y = β0 + β1x1 + β2x1²

β1 < 0 and β2 > 0 gives a U shape.
Second Order or Quadratic
• For example, consider the relationship between earnings and age:
earnings rise, level off, and then fall as age increases.

y = β0 + β1x1 + β2x1²

β1 > 0 and β2 < 0 gives an inverted U.
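The quadratic model can be fit by ordinary least squares on the design matrix [1, x1, x1²]. The data below are made up to mimic the earnings-vs-age pattern (rise, level off, fall); they are not from the slides.

```python
import numpy as np

# Hypothetical data where y rises, levels off, then falls as x1 grows
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([3.1, 5.0, 6.2, 6.8, 6.9, 6.3, 5.2, 3.4])

# Fit y = b0 + b1*x1 + b2*x1^2 via least squares on columns [1, x1, x1^2]
X = np.column_stack([np.ones_like(x1), x1, x1**2])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
# An inverted-U relationship shows up as b1 > 0 and b2 < 0
```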
Interaction
• If the original data set consists of observations for y and
two independent variables x1 and x2, we might develop a
second-order model with two predictor variables.

y = β0 + β1x1 + β2x2 + β3x1² + β4x2² + β5x1x2 + ε

• In this model, the variable z5 = x1x2 is added to account for
the potential effects of the two variables acting together.
• This type of effect is called interaction.
Model Assumptions
• Assumptions About the Error Term ε
– The error ε is a random variable with a mean of zero.
– The variance of ε, denoted by σ², is the same for all
values of the independent variables.
– The values of ε are independent.
– The error ε is a normally distributed random variable
reflecting the deviation between the y value and the
expected value of y given by

β0 + β1x1 + β2x2 + . . . + βpxp
Autocorrelation or Serial Correlation
• Serial correlation or autocorrelation is the violation of the
assumption that different observations of the error term are
uncorrelated with each other. It occurs most frequently in
time-series data sets. In practice, serial correlation implies
that the error term from one time period depends in some
systematic way on error terms from other time periods.
Residual Analysis: Autocorrelation
• With positive autocorrelation, we expect a positive residual
in one period to be followed by a positive residual in the
next period.
• With positive autocorrelation, we expect a negative
residual in one period to be followed by a negative residual
in the next period.
• With negative autocorrelation, we expect a positive residual
in one period to be followed by a negative residual in the
next period, then a positive residual, and so on.
Residual Analysis: Autocorrelation
• When autocorrelation is present, one of the regression
assumptions is violated: the error terms are not
independent.
• When autocorrelation is present, serious errors can be
made in performing tests of significance based upon the
assumed regression model.
• The Durbin-Watson statistic can be used to detect
first-order autocorrelation.
Residual Analysis: Autocorrelation
Durbin-Watson Test for Autocorrelation
• Statistic:

d = Σ (et − et−1)² / Σ et²

(numerator summed over t = 2, ..., n; denominator over t = 1, ..., n)
• The statistic ranges in value from zero to four.
• If successive values of the residuals are close
together (positive autocorrelation), the statistic
will be small.
• If successive values are far apart (negative autocorrelation), the statistic will be large.
• A value of two indicates no autocorrelation.
Durbin-Watson Example

Observation   et    et − et−1   (et − et−1)²
1             −3        —            —
2              1        4           16
3             −2       −3            9
4              2        4           16
5              2        0            0

Sum of squared differences = 41
SSE = Σ et² = 9 + 1 + 4 + 4 + 4 = 22
d = 41/22 ≈ 1.86
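The worked example can be reproduced directly in Python; the residuals are the five values from the table above.

```python
import numpy as np

# Residuals from the worked example above
e = np.array([-3.0, 1.0, -2.0, 2.0, 2.0])

# Durbin-Watson statistic: squared successive differences over the SSE
d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
# 41 / 22, approximately 1.86 -- close to 2, little sign of autocorrelation
```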
General Linear Model
Often the problem of nonconstant variance can be
corrected by transforming the dependent variable to a
different scale.
• Logarithmic Transformations
Most statistical packages provide the ability to apply
logarithmic transformations using either the base-10
(common log) or the base e = 2.71828... (natural log).
• Reciprocal Transformation
Use 1/y as the dependent variable instead of y.
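The transformations above are one-liners in Python; the response values below are hypothetical and chosen only to show a spread that grows with the level of y.

```python
import numpy as np

# Hypothetical response values (assumed for illustration)
y = np.array([1.2, 3.5, 8.0, 22.0, 55.0, 148.0])

# Transformations of the dependent variable used to stabilize variance
log_y = np.log(y)      # natural log, base e
log10_y = np.log10(y)  # common log, base 10
recip_y = 1.0 / y      # reciprocal: regress 1/y instead of y
```

After transforming, the regression is run with the transformed variable as the response; predictions are back-transformed to the original scale.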
Transforming y
• Transforming y: if the plot of residuals versus ŷ is convex up,
lower the power on y.
• If the plot of residuals versus ŷ is convex down, increase the
power on y.
• Example ladder of powers: −1/y², −1/y, −1/y^0.5, log y, y, y², y³