Business Forecasting
Chapter 7
Forecasting with Simple Regression
Chapter Topics
 Types of Regression Models
 Determining the Simple Linear Regression Equation
 Measures of Variation
 Assumptions of Regression and Correlation
 Residual Analysis
 Measuring Autocorrelation
 Inferences about the Slope
Chapter Topics (continued)
 Correlation: Measuring the Strength of the Association
 Estimation of Mean Values and Prediction of Individual Values
 Pitfalls in Regression and Ethical Issues
Purpose of Regression Analysis
Regression analysis is used primarily to model causality and provide prediction:
 Predict the values of a dependent (response) variable based on the values of at least one independent (explanatory) variable.
 Explain the effect of the independent variables on the dependent variable.
Types of Regression Models
 Positive linear relationship
 Negative linear relationship
 Relationship not linear
 No relationship
Linear Regression Equation
The sample regression line provides an estimate of the population regression line as well as a predicted value of Y:

Y_t = a + bX_t + e_t

where a is the sample Y intercept, b is the sample slope coefficient, and e_t is the residual. The simple regression equation (fitted regression line, predicted value) is

Ŷ_t = a + bX_t
Simple Linear Regression: Example
You wish to examine the linear dependency of the annual sales of produce stores on their sizes in square footage. Sample data for seven stores were obtained. Find the equation of the straight line that best fits the data.

Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760
Scatter Diagram: Example
Excel output: scatter plot of annual sales ($000) against square feet.
Simple Linear Regression Equation: Example

Ŷ_i = a + bX_i = 1636.415 + 1.487 X_i

From the Excel printout:
Intercept        1636.415
Square Footage   1.486634
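The fitted coefficients can be reproduced outside Excel. Below is a minimal Python sketch (numpy only; variable names are illustrative) of the standard least-squares formulas b = Σ(X_i − X̄)(Y_i − Ȳ) / Σ(X_i − X̄)² and a = Ȳ − bX̄ applied to the seven-store data:

import numpy as np

# Seven-store sample: store size (square feet) and annual sales ($000)
x = np.array([1726, 1542, 2816, 5555, 1292, 2208, 1313], dtype=float)
y = np.array([3681, 3395, 6653, 9543, 3318, 5563, 3760], dtype=float)

# Least-squares slope and intercept
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

print(a, b)   # approximately 1636.415 and 1.4866, matching the Excel printout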
Graph of the Simple Linear Regression Equation: Example
Scatter plot of annual sales against square feet with the fitted regression line.
Interpretation of Results: Example

Ŷ_i = 1636.415 + 1.487 X_i

The slope of 1.487 means that, for each increase of one unit in X, the average of Y is predicted to increase by an estimated 1.487 units. Here, for each additional square foot of store size, expected annual sales are predicted to increase by 1.487 thousand dollars, or $1,487.
Simple Linear Regression in Excel
 In Excel, use the Regression tool (in the Data Analysis add-in) …
 Excel spreadsheet of the regression of sales on square footage.
Measures of Variation: The Sum of Squares

SST = SSR + SSE
(total sample variability) = (explained variability) + (unexplained variability)
The ANOVA Table in Excel

ANOVA        df          SS    MS                        F         Significance F
Regression   k           SSR   MSR = SSR / k             MSR/MSE   P-value of the F test
Residual     n − k − 1   SSE   MSE = SSE / (n − k − 1)
Total        n − 1       SST
Measures of Variation: The Sum of Squares: Example
Excel output for produce stores:

ANOVA        df   SS              MS              F          Significance F
Regression   1    30,380,456.12   30,380,456.12   81.17909   0.000281201
Residual     5    1,871,199.59    374,239.92
Total        6    32,251,655.71

The regression (explained) df is 1, the error (residual) df is 5, and the total df is 6; SSR = 30,380,456.12, SSE = 1,871,199.59, and SST = 32,251,655.71.
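As an illustration (not the book's Excel workflow), the sketch below continues the earlier numpy example and decomposes the variation in Y into the sums of squares shown in this ANOVA table:

# Continuing the sketch above: decompose the variation in y
y_hat = a + b * x                        # fitted values
sst = np.sum((y - y.mean()) ** 2)        # total variability
ssr = np.sum((y_hat - y.mean()) ** 2)    # explained variability
sse = np.sum((y - y_hat) ** 2)           # unexplained variability

n, k = len(y), 1                         # one explanatory variable
msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse

print(sst, ssr, sse)   # SST = SSR + SSE ≈ 32,251,656
print(f_stat)          # ≈ 81.18, matching the Excel F statistic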
The Coefficient of Determination

r² = SSR / SST = Regression Sum of Squares / Total Sum of Squares

r² measures the proportion of variation in Y that is explained by the independent variable X in the regression model.
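Continuing the sketch above, the coefficient of determination is a one-line ratio:

r_squared = ssr / sst   # proportion of variation in Y explained by X
print(r_squared)        # ≈ 0.94198 for the produce-store data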
Measures of Variation: Produce Store Example
Excel output for produce stores (r² = 0.94):

Regression Statistics
Multiple R          0.9705572
R Square            0.94198129
Adjusted R Square   0.93037754
Standard Error      611.751517   (S_YX, the standard error of the estimate)
Observations        7

94% of the variation in annual sales can be explained by the variability in the size of the store as measured by square footage.
Linear Regression Assumptions
1. Normality
    Y values are normally distributed for each X.
    The probability distribution of the errors is normal.
2. Homoscedasticity (constant variance)
3. Independence of errors
Residual Analysis
 Purposes
    Examine linearity
    Evaluate violations of assumptions
 Graphical analysis of residuals
    Plot residuals vs. X and vs. time (see the plotting sketch below)
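A minimal matplotlib sketch of the two plots described above, using the residuals from the earlier fit (the observation index stands in for a true time variable, since the produce-store data are cross-sectional):

import matplotlib.pyplot as plt

residuals = y - y_hat

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Residuals vs. X: curvature suggests non-linearity; a funnel shape suggests heteroscedasticity
ax1.scatter(x, residuals)
ax1.axhline(0, linestyle="--")
ax1.set_xlabel("Square Feet")
ax1.set_ylabel("Residual")

# Residuals vs. time (here, observation order): a cyclical pattern suggests autocorrelation
ax2.plot(range(1, len(residuals) + 1), residuals, marker="o")
ax2.axhline(0, linestyle="--")
ax2.set_xlabel("Observation")
ax2.set_ylabel("Residual")

plt.tight_layout()
plt.show()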
Residual Analysis for Linearity
Figure: Y vs. X and residual (e) vs. X plots for a relationship that is not linear (left) and one that is linear (right).
Residual Analysis for Homoscedasticity
Figure: Y vs. X and standardized residual (SR) vs. X plots illustrating heteroscedasticity (left) and homoscedasticity (right).
Residual Analysis: Excel Output for Produce Stores Example
Excel output (with the accompanying residual plot of residuals against square feet):

Observation   Predicted Y   Residuals
1             4,202.34442   −$521.34442
2             3,928.80382   −$533.80382
3             5,822.77510    $830.22490
4             9,894.66469   −$351.66469
5             3,557.14541   −$239.14541
6             4,918.90184    $644.09816
7             3,588.36472    $171.63528
Residual Analysis for Independence
 The Durbin–Watson statistic
    Used when data are collected over time to detect autocorrelation (residuals in one time period are related to residuals in another period).
    Measures the violation of the independence assumption.

D = Σ_{i=2..n} (e_i − e_{i−1})² / Σ_{i=1..n} e_i²

D should be close to 2. If it is not, examine the model for autocorrelation.
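The statistic can be computed directly from this formula. A minimal numpy sketch using the residuals from the earlier fit (purely illustrative, since the produce-store data are not a time series):

e = y - y_hat                                  # residuals, in time order
d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)   # Durbin–Watson statistic
print(d)                                       # values near 2 suggest no autocorrelation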
Durbin–Watson Statistic in Minitab
 Minitab | Regression | Simple Linear Regression …
 Check the box for the Durbin–Watson statistic.
Obtaining the Critical Values of the Durbin–Watson Statistic
Finding the critical values of the Durbin–Watson statistic (α = .05):

        k = 1           k = 2
n       dL      dU      dL      dU
15      1.08    1.36    0.95    1.54
16      1.10    1.37    0.98    1.54
Using the Durbin–Watson Statistic
H0: No autocorrelation (error terms are independent).
H1: There is autocorrelation (error terms are not independent).

Decision regions for D on the scale from 0 to 4:
 0 to dL: reject H0 (positive autocorrelation)
 dL to dU: inconclusive
 dU to 4 − dU: accept H0 (no autocorrelation)
 4 − dU to 4 − dL: inconclusive
 4 − dL to 4: reject H0 (negative autocorrelation)
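A small illustrative Python sketch of this decision rule, with d_lower and d_upper supplied from a Durbin–Watson table such as the one on the previous slide:

def dw_decision(d, d_lower, d_upper):
    # Classify a Durbin–Watson statistic using the lower/upper critical values
    if d < d_lower:
        return "reject H0: positive autocorrelation"
    if d <= d_upper or (4 - d_upper) <= d <= (4 - d_lower):
        return "inconclusive"
    if d > 4 - d_lower:
        return "reject H0: negative autocorrelation"
    return "accept H0: no autocorrelation"

print(dw_decision(1.20, d_lower=1.08, d_upper=1.36))   # inconclusive for n = 15, k = 1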
Residual Analysis for Independence
 Graphical approach: the residual is plotted against time to detect any autocorrelation.
    Not independent: a cyclical pattern in the residuals.
    Independent: no particular pattern in the residuals.
Inference about the Slope: t Test
 t test for a population slope
    Is there a linear dependency of Y on X?
 Null and alternative hypotheses
    H0: β = 0 (no linear dependency)
    H1: β ≠ 0 (linear dependency)
 Test statistic

t = b / S_b,  where S_b = S_YX / √( Σ_{i=1..n} (X_i − X̄)² )  and  df = n − 2
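A minimal sketch of this t test in Python (numpy plus scipy.stats), continuing the produce-store computations from the earlier sketches; it agrees with the Excel printout shown on the next slides:

from scipy import stats

s_yx = np.sqrt(sse / (n - 2))                       # standard error of the estimate
s_b = s_yx / np.sqrt(np.sum((x - x.mean()) ** 2))   # standard error of the slope
t_stat = b / s_b
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)     # two-tailed p-value

print(t_stat, p_value)   # ≈ 9.01 and 0.00028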
Example: Produce Store
Data for seven stores:

Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760

Estimated regression equation:

Ŷ_i = 1636.415 + 1.487 X_i

The slope of this model is 1.487. Does square footage affect annual sales?
Inferences about the Slope: t Test Example

H0: β = 0
H1: β ≠ 0
α = 0.05
df = 7 − 2 = 5
Critical values: ±2.5706 (area 0.025 in each tail)

Test statistic, from the Excel printout:

              Coefficients   Standard Error   t Stat   P-value
Intercept     1636.4147      451.4953         3.6244   0.01515
Footage       1.4866         0.1650           9.0099   0.00028

Here b = 1.4866 and S_b = 0.1650, so t = b / S_b = 9.0099, which falls in the rejection region.

Decision: Reject H0.
Conclusion: There is evidence that square footage affects annual sales.
Inferences about the Slope: F Test
 F test for a population slope
    Is there a linear dependency of Y on X?
 Null and alternative hypotheses
    H0: β = 0 (no linear dependency)
    H1: β ≠ 0 (linear dependency)
 Test statistic

F = (SSR / 1) / ( SSE / (n − 2) )

with numerator df = 1 and denominator df = n − 2.
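A minimal sketch of this F test, reusing the sums of squares computed in the earlier sketches:

f_stat = (ssr / 1) / (sse / (n - 2))
p_value = stats.f.sf(f_stat, dfn=1, dfd=n - 2)
print(f_stat, p_value)   # ≈ 81.18 and 0.00028 for the produce-store data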
Relationship between a t Test and an F Test
 Null and alternative hypotheses
    H0: β = 0 (no linear dependency)
    H1: β ≠ 0 (linear dependency)
 For simple linear regression, the two tests are equivalent: t²_{n−2} = F_{1,n−2}
Inferences about the Slope: F Test Example

H0: β = 0
H1: β ≠ 0
α = 0.05
Numerator df = 1, denominator df = 7 − 2 = 5
Critical value: F(1, 5) = 6.61

Test statistic, from the Excel printout:

ANOVA        df   SS              MS              F        Significance F
Regression   1    30,380,456.12   30,380,456.12   81.179   0.000281
Residual     5    1,871,199.59    374,239.92
Total        6    32,251,655.71

Decision: Reject H0 (81.179 > 6.61).
Conclusion: There is evidence that square footage affects annual sales.
Purpose of Correlation Analysis (continued)
 The population correlation coefficient ρ (rho) is used to measure the strength of the relationship between the variables.
 The sample correlation coefficient r is an estimate of ρ and is used to measure the strength of the linear relationship in the sample observations.
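A minimal sketch computing r for the produce-store data (np.corrcoef returns the 2 × 2 correlation matrix; for a simple regression, r also equals the signed square root of r²):

r = np.corrcoef(x, y)[0, 1]
print(r)                                 # ≈ 0.9706, the "Multiple R" in the Excel output
print(np.sign(b) * np.sqrt(r_squared))   # same value, recovered from the regression output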
Sample of Observations from Various r Values
Figure: five Y vs. X scatter plots illustrating r = −1, r = −0.5, r = 0, r = 0.5, and r = 1.
Features of ρ and r
 Unit Free
 Range between −1 and 1
 The Closer to −1, the Stronger the Negative Linear Relationship.
 The Closer to 1, the Stronger the Positive Linear Relationship.
 The Closer to 0, the Weaker the Linear Relationship.
Pitfalls of Regression Analysis
 Lacking an Awareness of the Assumptions Underlying Least-squares Regression.
 Not Knowing How to Evaluate the Assumptions.
 Not Knowing What the Alternatives to Least-squares Regression Are if a Particular Assumption Is Violated.
 Using a Regression Model Without Knowledge of the Subject Matter.
Strategy for Avoiding the Pitfalls of Regression
 Start with a scatter plot of Y against X to observe any possible relationship.
 Perform residual analysis to check the assumptions.
 Use a normal probability plot of the residuals to uncover possible non-normality.
Strategy for Avoiding the Pitfalls of Regression (continued)
 If any assumption is violated, use methods alternative to least-squares regression (e.g., least absolute deviation regression or least median of squares regression) or alternative least-squares models (e.g., curvilinear or multiple regression).
 If there is no evidence of assumption violation, then test for the significance of the regression coefficients and construct confidence intervals and prediction intervals.
Chapter Summary
 Introduced Types of Regression Models.
 Discussed Determining the Simple Linear Regression Equation.
 Described Measures of Variation.
 Addressed Assumptions of Regression and Correlation.
 Discussed Residual Analysis.
 Addressed Measuring Autocorrelation.
Chapter Summary (continued)
 Described Inference about the Slope.
 Discussed Correlation: Measuring the Strength of the Association.
 Discussed Pitfalls in Regression and Ethical Issues.