403: Quantitative Business Analysis for Decision Making

Quantitative Business Analysis for Decision Making
Simple Linear Regression
Lecture Outline
– Scatter Plots
– Correlation Analysis
– Simple Linear Regression Model
– Estimation and Significance Testing
– Coefficient of Determination
– Confidence and Prediction Intervals
– Analysis of Residuals

403.7
2
Regression Analysis?
Regression analysis is used for modeling the mean of a “response” variable Y as a function of “predictor” variables X1, X2, ..., Xk.
When k = 1, it is called simple regression analysis.
Random Sample
Y: Response Variable,
X: Predictor Variable
For each unit in a random sample of size n, the pair (X, Y) is observed, resulting in a random sample:
(x1, y1), (x2, y2), ..., (xn, yn)
Scatter Plot
A scatter plot is a graphical display of the sample (x1, y1), (x2, y2), ..., (xn, yn) as n points in two dimensions.
It suggests whether there is a relationship between X and Y.
A Scatter Plot Showing Linear Trend
[Figure: scatter plot of People's ratings (vertical axis, PeopleM) against Nielsen ratings (horizontal axis, Nielsen), showing a linear trend.]
A Scatter Plot Showing No Linear Trend
[Figure: scatter plot of today's DJIA (horizontal axis, Today) with yesterday's DJIA (vertical axis), showing no linear trend.]
Modeling Linear Trend
– A perfect linear relationship between Y and X exists if $Y = \alpha + \beta X$.
– The coefficient $\beta$ of X is the slope, quantifying the amount of change in Y corresponding to a one-unit change in X.
– There are no perfect linear relationships in practice.

Simple Linear Regression Model
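Model:
$$Y = \alpha + \beta X + \varepsilon$$
– $\alpha + \beta X$ is the linear function (nonrandom).
– $\varepsilon$ is the random error. It is assumed to be normally distributed with mean 0 and standard deviation $\sigma$. So $\mu_y = \alpha + \beta X$.
– $\alpha$, $\beta$ and $\sigma$ are the parameters of the model.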

Estimation
Simple linear regression analysis estimates the mean of Y (the linear trend) $\mu_y = \alpha + \beta X$ by $\hat{y} = a + bx$, where
$$a = \bar{y} - b\bar{x}$$
and
$$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$
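The estimates a and b can be computed directly from the sums above. The following is a minimal sketch in plain Python; the function name `fit_line` and the five-point sample are hypothetical and not taken from the lecture.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the slope formula
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical sample, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
print(f"y_hat = {a:.2f} + {b:.2f} x")  # prints: y_hat = 2.20 + 0.60 x
```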
Standard deviation
The standard deviation (s) of the sample of n points in the scatter plot around the estimated regression line $\hat{y} = a + bx$ is:
$$s = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}}$$
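As an illustration, s can be computed once a and b are known. This sketch assumes a hypothetical five-point sample and its least-squares estimates a = 2.2 and b = 0.6; the function name `residual_std` is likewise made up for the example.

```python
import math

def residual_std(xs, ys, a, b):
    """Standard deviation of the points around the line y_hat = a + b*x."""
    n = len(xs)
    # Sum of squared vertical distances from the fitted line
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2))

# Hypothetical sample and its least-squares estimates
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
s = residual_std(xs, ys, a=2.2, b=0.6)
print(f"s = {s:.4f}")
```

The divisor is n - 2 rather than n because two parameters (the intercept and the slope) were estimated from the same sample.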
Testing the Slope of Linear Trend
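For testing
$$H_0: \beta = 0 \quad \text{vs.} \quad H_a: \beta \neq 0$$
compute the t-statistic and its p-value:
$$t\text{-statistic} = \frac{b - 0}{s_b}$$
where $s_b$ is the standard error of the estimated slope b.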
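A sketch of the t-statistic computation follows. The slides do not spell out the standard error of the slope, so this example assumes the usual formula, s divided by the square root of the sum of squared deviations of x. The sample and the name `slope_t_statistic` are hypothetical.

```python
import math

def slope_t_statistic(xs, ys):
    """Return (t, df) for testing that the slope is zero."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    s_b = s / math.sqrt(sxx)  # assumed standard-error formula for b
    return (b - 0) / s_b, n - 2

# Hypothetical sample, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
t_stat, df = slope_t_statistic(xs, ys)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The p-value is then read from the t distribution with n - 2 degrees of freedom, for example from a t table or a statistics library.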
Coefficient of Determination: $R^2$
$R^2$ quantifies how well the estimated model $\hat{y} = a + bx$ fits the data: it is the proportion of the variation in Y that the model explains.
As a rule of thumb:
– $R^2$ > 85% = significant model
– $R^2$ < 85% = model is perceived as inadequate
– A low $R^2$ suggests a need for additional predictors for modeling the mean of Y
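The slides do not give a formula for the coefficient of determination. The sketch below assumes the usual definition, one minus the ratio of the residual sum of squares to the total sum of squares about the mean of y. The sample, the estimates a = 2.2 and b = 0.6, and the name `r_squared` are hypothetical.

```python
def r_squared(xs, ys, a, b):
    """Share of the variation in y explained by the line y_hat = a + b*x."""
    y_bar = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    sst = sum((y - y_bar) ** 2 for y in ys)                    # total
    return 1 - sse / sst

# Hypothetical sample and its least-squares estimates
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r2 = r_squared(xs, ys, a=2.2, b=0.6)
print(f"R^2 = {r2:.0%}")
```

For this sample the value falls below the 85% threshold, so by that rule the model would be perceived as inadequate.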

Correlation Coefficient: r
In simple linear regression, the correlation coefficient r is the square root of $R^2$, taken with the sign of the slope b. It is a number between -1 and 1.
– The closer r is to -1 or 1, the stronger the linear trend
– Its sign is positive for an increasing trend (slope b is positive)
– Its sign is negative for a decreasing trend (slope b is negative)
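A small sketch of the sign behavior, assuming the usual sample formula for r (the sum of cross-products divided by the square root of the product of the two sums of squares), which the slides do not state. Both samples and the name `correlation` are hypothetical.

```python
import math

def correlation(xs, ys):
    """Sample correlation coefficient r between xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical samples: one increasing trend, one decreasing trend
xs = [1, 2, 3, 4, 5]
r_up = correlation(xs, [2, 4, 5, 4, 5])
r_down = correlation(xs, [5, 4, 5, 4, 2])
print(f"increasing: r = {r_up:+.3f}, decreasing: r = {r_down:+.3f}")
print(f"r squared = {r_up ** 2:.2f}")
```

Squaring either value gives the same number, which is why the sign has to be taken from the slope.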
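Confidence and Prediction Intervals
To estimate $\mu_y = \alpha + \beta x$ by a confidence interval, or to predict the response Y corresponding to the predictor value $x = x_0$:
– 1. Compute: $\hat{y} = a + b x_0$
– 2. Compute: $\hat{y} \pm t \cdot \text{s.e.}(\hat{y})$, where t is the critical value of the t distribution with n - 2 degrees of freedom for the chosen confidence level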
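What is s.e.($\hat{y}$)?
i.e. the standard error of $\hat{y}$
For estimating $\mu_y$,
$$\text{s.e.}(\hat{y}) = s \sqrt{\frac{1}{n} + \frac{(\bar{x} - x_0)^2}{\sum (x - \bar{x})^2}}$$
For predicting Y,
$$\text{s.e.}(\hat{y}) = s \sqrt{1 + \frac{1}{n} + \frac{(\bar{x} - x_0)^2}{\sum (x - \bar{x})^2}}$$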
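The two standard errors can be turned into intervals as sketched below. Several things here are assumptions rather than lecture content: the sample, the name `intervals_at`, and the multiplier `t_crit`, taken as the critical value of the t distribution with n - 2 degrees of freedom (3.182 for 95% intervals with 3 degrees of freedom).

```python
import math

def intervals_at(xs, ys, x0, t_crit):
    """Return (y_hat, confidence interval, prediction interval) at x = x0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))

    y_hat = a + b * x0
    core = 1 / n + (x_bar - x0) ** 2 / sxx
    se_mean = s * math.sqrt(core)      # for estimating the mean of Y
    se_pred = s * math.sqrt(1 + core)  # for predicting a single Y
    conf_int = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
    pred_int = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
    return y_hat, conf_int, pred_int

# Hypothetical sample; t_crit assumed from a t table (3 df, 95%)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
y_hat, conf_int, pred_int = intervals_at(xs, ys, x0=4, t_crit=3.182)
print(f"y_hat = {y_hat:.2f}")
print(f"confidence interval: ({conf_int[0]:.2f}, {conf_int[1]:.2f})")
print(f"prediction interval: ({pred_int[0]:.2f}, {pred_int[1]:.2f})")
```

The prediction interval is always the wider of the two, because of the extra 1 under the square root.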

Analysis of Residuals
Residuals are defined as:
$$e_i = y_i - \hat{y}_i, \quad i = 1, 2, \ldots, n$$
Residual analysis is used to check the normality and homogeneity of variance assumptions of the random errors $\varepsilon$.
– A histogram or box plot of the residuals will help to ascertain whether the errors $\varepsilon$ are normally distributed.

Analysis of Residuals (cont'd)
A plot of the residuals $e_i$ against the observed predictor values $x_i$ will help ascertain the homogeneity assumption.
– random appearance = homogeneity of variance assumption is valid.
– non-random appearance = homogeneity assumption is not valid and the variance is dependent on predictor values.
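The residuals themselves are simple to compute, as this sketch shows. The sample and the name `residuals` are hypothetical; the histogram, box plot and residual plot would be drawn from the resulting list with any plotting tool.

```python
def residuals(xs, ys):
    """Return e_i = y_i - y_hat_i for the least-squares line through the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical sample, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
e = residuals(xs, ys)
for x, e_i in zip(xs, e):
    print(f"x = {x}  e = {e_i:+.2f}")
```

With an intercept in the model, least-squares residuals sum to zero, so it is their spread and pattern across x that carries the diagnostic information.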