What Is the Sum of Squared Errors?

The Simple Linear Regression Model

Simple Linear Regression Model
$y = \beta_0 + \beta_1 x + \varepsilon$

Simple Linear Regression Equation
$E(y) = \beta_0 + \beta_1 x$

Estimated Simple Linear Regression Equation
$\hat{y} = b_0 + b_1 x$
Slide 1
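Least Squares Line (Best Prediction Line)

Among all straight lines through the scatter plot of the data points, the one that minimizes the sum of squared prediction errors is called the least squares line, and the procedure for finding it is called the least squares method.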
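What Is the Sum of Squared Errors?

Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be $n$ data points, and let $\hat{y} = b_0 + b_1 x$ be the line used to predict Y from X. When $X = x_1$, the difference $y_1 - \hat{y}_1$ between the observed value $y_1$ and the predicted value $\hat{y}_1 = b_0 + b_1 x_1$ is called the prediction error. The sum of squared errors is defined as

$$f(b_0, b_1) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$$

The values of $b_0$ and $b_1$ that minimize $f$ are found with the calculus method for locating maxima and minima.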
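As a minimal sketch of this definition, the following Python function evaluates $f(b_0, b_1)$ for a candidate line. The function name `sum_squared_errors` and the three data points are illustrative assumptions, not part of the slides.

```python
def sum_squared_errors(xs, ys, b0, b1):
    """Return f(b0, b1): the sum of (y_i - b0 - b1*x_i)^2 over all data points."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [0, 1, 2]
ys = [1, 3, 5]

sse_exact = sum_squared_errors(xs, ys, b0=1, b1=2)  # line through every point
sse_other = sum_squared_errors(xs, ys, b0=0, b1=2)  # same slope, shifted down by 1
print(sse_exact, sse_other)
```

The line through every point has zero error, while the shifted line misses each point by 1, so a smaller value of $f$ corresponds to a line that tracks the data more closely.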
Slide 2
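Least Squares Line

Setting both partial derivatives of $f$ to zero gives the system of equations

$$\frac{\partial f(b_0, b_1)}{\partial b_0} = 0, \qquad \frac{\partial f(b_0, b_1)}{\partial b_1} = 0$$

Solving this system yields

$$b_1 = \frac{\sum x_i y_i - (\sum x_i \sum y_i)/n}{\sum x_i^2 - (\sum x_i)^2/n}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$

The least squares line is therefore $\hat{y} = b_0 + b_1 x = \bar{y} - b_1 \bar{x} + b_1 x = \bar{y} + b_1 (x - \bar{x})$.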
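The step from the two partial-derivative conditions to the formulas for $b_0$ and $b_1$ can be written out as follows. This is a sketch of the standard derivation, using the same symbols as the slide:

```latex
\begin{aligned}
\frac{\partial f}{\partial b_0} &= -2\sum_{i=1}^{n}(y_i - b_0 - b_1 x_i) = 0
  &&\Rightarrow\quad n\,b_0 + b_1\sum x_i = \sum y_i,\\
\frac{\partial f}{\partial b_1} &= -2\sum_{i=1}^{n}x_i\,(y_i - b_0 - b_1 x_i) = 0
  &&\Rightarrow\quad b_0\sum x_i + b_1\sum x_i^2 = \sum x_i y_i.
\end{aligned}
```

Dividing the first equation by $n$ gives $b_0 = \bar{y} - b_1 \bar{x}$. Substituting that into the second equation and solving for $b_1$ gives the ratio formula for the slope.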
Slide 3
Example: Reed Auto Sales

Simple Linear Regression
Reed Auto periodically has a special week-long sale.
As part of the advertising campaign Reed runs one or
more television commercials during the weekend
preceding the sale. Data from a sample of 6 previous
sales are shown below.
Number of TV Ads    Number of Cars Sold
1                   14
3                   24
2                   18
1                   17
3                   27
2                   22
Slide 4
Example: Reed Auto Sales



Slope for the Estimated Regression Equation
$b_1 = \dfrac{264 - (12)(122)/6}{28 - (12)^2/6} = \dfrac{20}{4} = 5$
y-Intercept for the Estimated Regression Equation
$b_0 = 20.333 - 5(2) = 10.333$
Estimated Regression Equation
$\hat{y} = 10.333 + 5x$
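The slope and intercept can be checked numerically. The sketch below applies the least squares formulas to the six Reed Auto observations (so n = 6). The function name `fit_line` is an assumption introduced for illustration.

```python
# Reed Auto sample: number of TV ads (x) and number of cars sold (y).
ads = [1, 3, 2, 1, 3, 2]
cars = [14, 24, 18, 17, 27, 22]


def fit_line(xs, ys):
    """Return (b0, b1) for the least squares line y_hat = b0 + b1*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


b0, b1 = fit_line(ads, cars)
print(f"y_hat = {b0:.3f} + {b1:.0f}x")
```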
Slide 5
Example: Reed Auto Sales
Scatter Diagram (figure): Cars Sold on the vertical axis (0 to 30) against TV Ads on the horizontal axis (0 to 4).
Slide 6
The Coefficient of Determination

Relationship Among SST, SSR, SSE
SST = SSR + SSE
$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$

Coefficient of Determination
$r^2$ = SSR/SST
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
Slide 7
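Coefficient of Determination

Definition: $r^2$ = SSR/SST
It expresses the proportion of the variation in Y that is explained by X.
• The larger $r^2$ is, the better the least squares line fits the data.
• $1 - r^2$ is the proportion of the total variation (SST) that X cannot explain.
Example: Reed Auto Sales
• $r^2$ = SSR/SST = 100/117.333 = .852273
• About 85.2% of the variation in car sales can be explained by the number of TV ads; the remaining 14.8% cannot.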
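The three sums of squares for the Reed Auto data can be reproduced with a few lines of Python. This is a sketch that takes the slope of 5 from the estimated regression equation as given; the variable names are illustrative.

```python
ads = [1, 3, 2, 1, 3, 2]
cars = [14, 24, 18, 17, 27, 22]

n = len(ads)
x_bar = sum(ads) / n
y_bar = sum(cars) / n
b1 = 5.0                  # slope from the estimated regression equation
b0 = y_bar - b1 * x_bar   # intercept, about 10.333

fitted = [b0 + b1 * x for x in ads]
sst = sum((y - y_bar) ** 2 for y in cars)               # total sum of squares
ssr = sum((f - y_bar) ** 2 for f in fitted)             # due to regression
sse = sum((y - f) ** 2 for y, f in zip(cars, fitted))   # due to error
r_squared = ssr / sst
print(round(sst, 3), round(ssr, 3), round(sse, 3), round(r_squared, 6))
```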
Slide 8
The Correlation Coefficient

Sample Correlation Coefficient
$r_{xy} = (\text{sign of } b_1)\sqrt{\text{Coefficient of Determination}}$
$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$
where:
$b_1$ = the slope of the estimated regression equation $\hat{y} = b_0 + b_1 x$
Slide 9
Example: Reed Auto Sales

Sample Correlation Coefficient
$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$
The sign of $b_1$ in the equation $\hat{y} = 10.333 + 5x$ is “+”.
$r_{xy} = +\sqrt{0.852273}$
$r_{xy} = +.923186$
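As a cross-check, the sketch below computes the sample correlation two ways for the Reed Auto data: from the sign of the slope and the square root of $r^2$, and from the usual product-moment formula. The variable names are assumptions made for this illustration.

```python
import math

ads = [1, 3, 2, 1, 3, 2]
cars = [14, 24, 18, 17, 27, 22]
n = len(ads)
x_bar, y_bar = sum(ads) / n, sum(cars) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, cars))
s_xx = sum((x - x_bar) ** 2 for x in ads)
s_yy = sum((y - y_bar) ** 2 for y in cars)

# Route 1: sign of the slope times the square root of r^2.
b1 = s_xy / s_xx
r_squared = b1 * s_xy / s_yy   # SSR / SST, because SSR = b1 * s_xy
r_from_r2 = math.copysign(math.sqrt(r_squared), b1)

# Route 2: the product-moment formula.
r_direct = s_xy / math.sqrt(s_xx * s_yy)
print(round(r_from_r2, 6), round(r_direct, 6))
```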
Slide 10
Model Assumptions

Assumptions About the Error Term $\varepsilon$
• The error $\varepsilon$ is a random variable with a mean of zero.
• The variance of $\varepsilon$, denoted by $\sigma^2$, is the same for all values of the independent variable.
• The values of $\varepsilon$ are independent.
• The error $\varepsilon$ is a normally distributed random variable.
Slide 11
Testing for Significance



To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of $\beta_1$ is zero.
Two tests are commonly used:
• t Test
• F Test
Both tests require an estimate of $\sigma^2$, the variance of $\varepsilon$ in the regression model.
Slide 12
Testing for Significance

An Estimate of $\sigma^2$
The mean square error (MSE) provides the estimate of $\sigma^2$; the notation $s^2$ is also used.
$s^2$ = MSE = SSE/(n - 2)
where:
$\text{SSE} = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - b_0 - b_1 x_i)^2$
Slide 13
Testing for Significance

An Estimate of $\sigma$
• To estimate $\sigma$ we take the square root of $s^2$, the estimate of $\sigma^2$.
• The resulting $s$ is called the standard error of the estimate.
$s = \sqrt{\text{MSE}} = \sqrt{\dfrac{\text{SSE}}{n - 2}}$
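For the Reed Auto data used in the earlier example, MSE and the standard error of the estimate can be computed as in this sketch. The slope of 5 is taken from the estimated regression equation; the variable names are illustrative.

```python
import math

ads = [1, 3, 2, 1, 3, 2]
cars = [14, 24, 18, 17, 27, 22]
n = len(ads)

b1 = 5.0                              # slope from the example
b0 = sum(cars) / n - b1 * sum(ads) / n   # intercept, about 10.333

residuals = [y - (b0 + b1 * x) for x, y in zip(ads, cars)]
sse = sum(e ** 2 for e in residuals)
mse = sse / (n - 2)    # s^2, using n - 2 degrees of freedom
s = math.sqrt(mse)     # standard error of the estimate
print(round(sse, 3), round(mse, 3), round(s, 4))
```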
Slide 14
Testing for Significance: t Test

Hypotheses
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

Test Statistic
$t = \dfrac{b_1}{s_{b_1}}$, where $s_{b_1} = \dfrac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$

Rejection Rule
Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$,
where $t_{\alpha/2}$ is based on a t distribution with n - 2 degrees of freedom.
Slide 15
Example: Reed Auto Sales

t Test
• Hypotheses
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
• Rejection Rule
For $\alpha$ = .05 and d.f. = 4, $t_{.025}$ = 2.776
Reject $H_0$ if t < -2.776 or t > 2.776
• Test Statistic
t = 5/1.0408 = 4.804
• Conclusion
Reject $H_0$
• P-value
2P{T > 4.804} = 0.0086 < 0.05
Reject $H_0$
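The test statistic and P-value can be reproduced with the standard library alone. In this sketch the two-sided P-value is obtained by integrating the t density numerically with Simpson's rule, which is a choice made here for self-containment rather than the method used on the slides. The helper names `t_pdf` and `two_sided_p` are hypothetical, and the critical value 2.776 is copied from the slide.

```python
import math

ads = [1, 3, 2, 1, 3, 2]
cars = [14, 24, 18, 17, 27, 22]
n = len(ads)
x_bar, y_bar = sum(ads) / n, sum(cars) / n
s_xx = sum((x - x_bar) ** 2 for x in ads)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, cars)) / s_xx
b0 = y_bar - b1 * x_bar

sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(ads, cars))
s = math.sqrt(sse / (n - 2))
s_b1 = s / math.sqrt(s_xx)   # standard error of the slope
t_stat = b1 / s_b1


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value using Simpson's rule on the density over [0, |t|]."""
    h = abs(t) / steps
    total = t_pdf(0, df) + t_pdf(abs(t), df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 1 - 2 * (total * h / 3)


p_value = two_sided_p(t_stat, n - 2)
T_CRIT = 2.776   # t_.025 with 4 d.f., taken from the slide
reject_h0 = abs(t_stat) > T_CRIT
print(round(t_stat, 3), round(p_value, 4), reject_h0)
```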
Slide 16
Confidence Interval for $\beta_1$

We can use a 95% confidence interval for $\beta_1$ to test the hypotheses just used in the t test.
$H_0$ is rejected if the hypothesized value of $\beta_1$ is not included in the confidence interval for $\beta_1$.
Slide 17
Confidence Interval for $\beta_1$

The form of a confidence interval for $\beta_1$ is:
$b_1 \pm t_{\alpha/2}\, s_{b_1}$
where
$b_1$ is the point estimate,
$t_{\alpha/2}\, s_{b_1}$ is the margin of error, and
$t_{\alpha/2}$ is the t value providing an area of $\alpha/2$ in the upper tail of a t distribution with n - 2 degrees of freedom.
Slide 18
Example: Reed Auto Sales



Rejection Rule
Reject $H_0$ if 0 is not included in the confidence interval for $\beta_1$.
95% Confidence Interval for $\beta_1$
$b_1 \pm t_{\alpha/2}\, s_{b_1} = 5 \pm 2.776(1.0408) = 5 \pm 2.89$,
or 2.11 to 7.89
Conclusion
Reject $H_0$
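The interval arithmetic is short enough to sketch directly. All three inputs are copied from the slides; the variable names are illustrative.

```python
b1 = 5.0          # point estimate of the slope
s_b1 = 1.0408     # standard error of b1
T_CRIT = 2.776    # t_.025 with n - 2 = 4 d.f.

margin = T_CRIT * s_b1
lower, upper = b1 - margin, b1 + margin
reject_h0 = not (lower <= 0 <= upper)   # 0 is the hypothesized value of beta_1
print(f"{lower:.2f} to {upper:.2f}, reject H0: {reject_h0}")
```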
Slide 19
Testing for Significance: F Test

Hypotheses
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

Test Statistic
F = MSR/MSE

Rejection Rule
Reject $H_0$ if $F > F_\alpha$,
where $F_\alpha$ is based on an F distribution with 1 d.f. in the numerator and n - 2 d.f. in the denominator.
Slide 20
Example: Reed Auto Sales

F Test
• Hypotheses
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
• Rejection Rule
For $\alpha$ = .05 and d.f. = 1, 4: $F_{.05}$ = 7.709
Reject $H_0$ if F > 7.709.
• Test Statistic
F = MSR/MSE = 100/4.333 = 23.077
• Conclusion
We can reject $H_0$.
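The F statistic follows directly from the sums of squares. The sketch below uses SST = 117.333 and SSR = 100 from the coefficient of determination example and the critical value 7.709 from this slide. It also checks the known identity that, in simple linear regression, F equals the square of the t statistic for the slope.

```python
sst, ssr = 117.333, 100.0   # from the coefficient of determination example
n = 6

sse = sst - ssr
msr = ssr / 1               # one independent variable, so 1 d.f.
mse = sse / (n - 2)
f_stat = msr / mse

F_CRIT = 7.709              # F_.05 with (1, 4) d.f., from the slide
reject_h0 = f_stat > F_CRIT

t_stat = 4.804              # t statistic from the t test example
print(round(f_stat, 3), round(t_stat ** 2, 3), reject_h0)
```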
Slide 21
Some Cautions about the
Interpretation of Significance Tests


• Rejecting $H_0: \beta_1 = 0$ and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.
• Just because we are able to reject $H_0: \beta_1 = 0$ and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.
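The second caution can be illustrated with a small sketch. The data below are hypothetical, generated from the curve y = x^2 rather than taken from the slides. A straight-line fit still produces a t statistic well above the 2.776 cutoff for 4 d.f., even though the true relationship is not linear.

```python
import math

# Hypothetical data generated from the curve y = x^2 (not a straight line).
xs = [0, 1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
b0 = y_bar - b1 * x_bar

residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
t_stat = b1 / (s / math.sqrt(s_xx))

print(f"t = {t_stat:.3f}")                # compare with t_.025 = 2.776 (4 d.f.)
print([round(e, 3) for e in residuals])   # the sign pattern reveals the curvature
```

The residuals are positive at both ends and negative in the middle, a systematic pattern that a significance test on the slope does not detect.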
Slide 22
Using the Estimated Regression Equation
for Estimation and Prediction

Confidence Interval Estimate of $E(y_p)$
$\hat{y}_p \pm t_{\alpha/2}\, s_{\hat{y}_p}$

Prediction Interval Estimate of $y_p$
$\hat{y}_p \pm t_{\alpha/2}\, s_{\text{ind}}$
where the confidence coefficient is $1 - \alpha$,
$t_{\alpha/2}$ is based on a t distribution with n - 2 d.f.,
$s_{\hat{y}_p}$ is the standard error of the estimate of $E(y_p)$, and
$s_{\text{ind}}$ is the standard error of the individual estimate of $y_p$.
Slide 23
Standard Errors of Estimate of
E(yp) and yp
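$s_{\hat{y}_p} = s\sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
$s_{\text{ind}} = s\sqrt{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
where $x_0$ is the value of x at which the estimate or prediction is made and $s$ is the standard error of the estimate.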
Slide 24
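Variances of the Estimators of $E(y_p)$ and $y_p$

Variance of $\bar{y}$: $\dfrac{\sigma^2}{n}$
Variance of $b_1 (x_0 - \bar{x})$: $(x_0 - \bar{x})^2 \sigma_{b_1}^2 = \dfrac{\sigma^2 (x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}$
Variance of $\varepsilon$: $\sigma^2$

Variance of the estimator of $E(y_p) = \beta_0 + \beta_1 x_0$:
$\mathrm{Var}(\hat{y}) = \mathrm{Var}[\bar{y} + b_1 (x_0 - \bar{x})] = \dfrac{\sigma^2}{n} + \dfrac{\sigma^2 (x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}$, where $\sigma_{b_1}^2 = \dfrac{\sigma^2}{\sum (x_i - \bar{x})^2}$

Variance of the estimator of $y_p = \beta_0 + \beta_1 x_0 + \varepsilon$:
$\mathrm{Var}(\hat{y}) + \sigma^2$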
Slide 25
Example: Reed Auto Sales



Point Estimation
If 3 TV ads are run prior to a sale, we expect the mean number of cars sold to be:
$\hat{y}$ = 10.333 + 5(3) = 25.333 cars
Confidence Interval for $E(y_p)$
The 95% confidence interval estimate of the mean number of cars sold when 3 TV ads are run is:
25.333 ± 3.730 = 21.603 to 29.063 cars
Prediction Interval for $y_p$
The 95% prediction interval estimate of the number of cars sold in one particular week when 3 TV ads are run is:
25.333 ± 6.878 = 18.455 to 32.211 cars
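Both margins can be reproduced from the standard error formulas for $E(y_p)$ and $y_p$. The sketch below uses values that appear in the example (SSE = 117.333 - 100 = 17.333, $\bar{x}$ = 2, $\sum (x_i - \bar{x})^2$ = 4, $t_{.025}$ = 2.776); the variable names are illustrative.

```python
import math

n = 6
x_bar = 2.0
s_xx = 4.0                    # sum of (x_i - x_bar)^2 for the Reed Auto data
s = math.sqrt(17.333 / 4)     # standard error of the estimate, sqrt(SSE / (n - 2))
T_CRIT = 2.776                # t_.025 with 4 d.f.

x0 = 3                        # number of TV ads
y_hat = 10.333 + 5 * x0       # point estimate, 25.333 cars

# Standard errors for the mean response and for an individual response.
s_mean = s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / s_xx)
s_ind = s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / s_xx)

ci = (y_hat - T_CRIT * s_mean, y_hat + T_CRIT * s_mean)
pi = (y_hat - T_CRIT * s_ind, y_hat + T_CRIT * s_ind)
print([round(v, 3) for v in ci], [round(v, 3) for v in pi])
```

The prediction interval is wider because it adds the variance of a single week's error term to the variance of the estimated mean.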
Slide 26