Why_Deisgn.ppt

Why Design?
(why not just observe and model?)
Q: Why Experimental Design
A: To avoid multicollinearity
Issues:
(1) Testing joint importance versus individual significance
A two-engine plane can still fly if engine #1 fails
A two-engine plane can still fly if engine #2 fails
Neither engine is critical individually
The engines are jointly critical (can’t omit both!!)
(2) Prediction versus modeling individual effects
(3) Collinearity (correlation among inputs)
Example: A hypothetical company’s sales Y depend on TV
advertising X1 and radio advertising X2.
Y = b0 + b1 X1 + b2 X2 + e
Data Sales;
input store TV radio sales;
(more code)
cards;
1 869 868 9089
2 836 820 8290
(more data)
40 969 961 10130
[Figure: 3-D scatter plot of sales (vertical axis) against TV and radio]
proc g3d data=sales;
scatter radio*TV=sales/shape=sval color=cval zmin=8000;
run;
Conclusion: Can predict well with just TV, just radio, or both!
SAS code:
proc reg data=next;
  model sales = TV radio;
run;
Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               2          32660996       16330498     358.84    <.0001 (Can’t omit both)
Error              37           1683844          45509
Corrected Total    39          34344840
Root MSE    213.32908    R-Square    0.9510  (Explaining 95% of variation in sales)
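The summary statistics in this table follow directly from the sums of squares and degrees of freedom. As a check, here is a minimal Python sketch (Python is used only for illustration; it is not part of the original SAS session, and the variable names are made up for this example) that recomputes the F value, R-Square, and Root MSE from the printed entries.

```python
import math

# Values copied from the Analysis of Variance table above.
ss_model, df_model = 32660996, 2
ss_error, df_error = 1683844, 37
ss_total = 34344840

ms_model = ss_model / df_model   # mean square for the model
ms_error = ss_error / df_error   # mean square error (estimated error variance)

f_value = ms_model / ms_error    # joint test of TV and radio together
r_square = ss_model / ss_total   # share of the variation in sales explained
root_mse = math.sqrt(ms_error)   # estimated error standard deviation

print(f"F = {f_value:.2f}, R-Square = {r_square:.4f}, Root MSE = {root_mse:.5f}")
```

The F value tests TV and radio jointly, which is why it can be large even when neither individual t test is significant.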
Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1             531.11390         359.90429       1.48      0.1485
TV            1               5.00435           5.01845       1.00      0.3251 (can omit TV)
radio         1               4.66752           4.94312       0.94      0.3512 (can omit radio)
Estimated Sales = 531 + 5.0 TV + 4.7 radio with error variance 45509 (standard deviation 213).
TV is approximately equal to radio in these data, so approximately
Estimated Sales = 531 + 9.7 TV
or
Estimated Sales = 531 + 9.7 radio
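To see how close the three fitted equations are in practice, the following Python sketch (illustrative only, not part of the original SAS session; the function names are hypothetical) evaluates each of them for store 1 from the data step, where TV = 869, radio = 868, and observed sales = 9089. The single-input versions use the rounded slope 9.7 = 5.0 + 4.7 from the slide; they are not separately fitted regressions.

```python
# Fitted coefficients copied from the Parameter Estimates table above.
b0, b_tv, b_radio = 531.11390, 5.00435, 4.66752

def full_model(tv, radio):
    """Prediction from the two-variable fit."""
    return b0 + b_tv * tv + b_radio * radio

def single_input(x):
    """Rounded one-variable approximation with slope 5.0 + 4.7 = 9.7."""
    return 531 + 9.7 * x

# Store 1 from the data step.
tv, radio, sales = 869, 868, 9089

predictions = [
    ("both", full_model(tv, radio)),
    ("TV only", single_input(tv)),
    ("radio only", single_input(radio)),
]
for label, pred in predictions:
    print(f"{label:>10}: predicted {pred:8.1f}, residual {sales - pred:7.1f}")
```

For this store all three residuals are smaller than the Root MSE of about 213, which is consistent with the point that prediction works with either input even though the individual effects cannot be separated.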
Regression
The REG Procedure
Model: MODEL1
Dependent Variable: sales
Number of Observations Read    40
Number of Observations Used    40
Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               2          32660996       16330498     358.84    <.0001
Error              37           1683844          45509
Corrected Total    39          34344840
Root MSE          213.32908    R-Square    0.9510
Dependent Mean         9955    Adj R-Sq    0.9483
Coeff Var           2.14291
Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1             531.11390         359.90429       1.48      0.1485
TV            1               5.00435           5.01845       1.00      0.3251
radio         1               4.66752           4.94312       0.94      0.3512
Design
The REG Procedure
Model: MODEL1
Dependent Variable: SALES
Number of Observations Read    40
Number of Observations Used    40
Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               2          32641505       16320753     358.66    <.0001
Error              37           1683699          45505
Corrected Total    39          34325204
Root MSE          213.31990    R-Square    0.9509
Dependent Mean        10300    Adj R-Sq    0.9483
Coeff Var           2.07111
Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1             530.72803         366.53079       1.45      0.1560
TV            1               5.00492           0.25552      19.59      <.0001
Radio         1               4.66742           0.25552      18.27      <.0001
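The contrast between the observational fit and the designed fit is easiest to see by placing the slope estimates and their standard errors side by side. The Python sketch below (illustrative only, not part of the original SAS session; the dictionary and function names are hypothetical) recomputes each t value as the estimate divided by its standard error and reports the ratio of the two standard errors.

```python
# (estimate, standard error) pairs copied from the two Parameter Estimates tables.
observational = {"TV": (5.00435, 5.01845), "radio": (4.66752, 4.94312)}
designed = {"TV": (5.00492, 0.25552), "radio": (4.66742, 0.25552)}

def t_value(estimate, std_error):
    """t statistic for testing that a coefficient is zero."""
    return estimate / std_error

for name in observational:
    est_o, se_o = observational[name]
    est_d, se_d = designed[name]
    print(f"{name:>5}: t = {t_value(est_o, se_o):5.2f} (observational), "
          f"t = {t_value(est_d, se_d):5.2f} (designed), "
          f"SE ratio = {se_o / se_d:4.1f}")
```

The slope estimates are nearly identical in the two fits. What changes is the standard error, which is roughly 19 to 20 times larger in the observational data, so only the designed experiment can establish each input's individual effect.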
Design matrix X
-1 for low level, +1 for high
12 obs.
Columns: 1, X1, X2, X1X2
Correlated inputs, rows of X:
(1, +1, +1, +1) five times
(1, +1, -1, -1) once
(1, -1, +1, -1) once
(1, -1, -1, +1) five times
Balanced design, rows of X:
each of the same four rows three times
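Correlated inputs (columns ordered 1, X1, X2, X1X2):
\[
X'X = \begin{pmatrix} 12 & 0 & 0 & 8 \\ 0 & 12 & 8 & 0 \\ 0 & 8 & 12 & 0 \\ 8 & 0 & 0 & 12 \end{pmatrix},
\qquad
(X'X)^{-1}\sigma^2 = \begin{pmatrix} 0.15 & 0 & 0 & -0.10 \\ 0 & 0.15 & -0.10 & 0 \\ 0 & -0.10 & 0.15 & 0 \\ -0.10 & 0 & 0 & 0.15 \end{pmatrix}\sigma^2
\]
\[
\mathrm{Var}\bigl(0.5\,(\text{estimated effect})\bigr) = 0.15\,\sigma^2
\]
Balanced design (same column order):
\[
X'X = \begin{pmatrix} 12 & 0 & 0 & 0 \\ 0 & 12 & 0 & 0 \\ 0 & 0 & 12 & 0 \\ 0 & 0 & 0 & 12 \end{pmatrix},
\qquad
(X'X)^{-1}\sigma^2 = \begin{pmatrix} 0.083 & 0 & 0 & 0 \\ 0 & 0.083 & 0 & 0 \\ 0 & 0 & 0.083 & 0 \\ 0 & 0 & 0 & 0.083 \end{pmatrix}\sigma^2
\]
\[
\mathrm{Var}\bigl(0.5\,(\text{estimated effect})\bigr) = 0.08333\,\sigma^2
\]
With -1/+1 coding, each regression coefficient is half the high-minus-low effect, which is why the variances are stated for 0.5(estimated effect).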
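Observations per cell, correlated inputs (levels of X1 by levels of X2):
         High    Low
High       5      1
Low        1      5
Observations per cell, balanced design (levels of X1 by levels of X2):
         High    Low
High       3      3
Low        3      3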
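The variance multipliers 0.15 and 0.08333 can be reproduced from the cell counts alone. The Python sketch below (illustrative only, not part of the original SAS session) builds each 12-row design matrix, forms X'X, and inverts it with exact fractions. The column layout (1, X1, X2, X1X2) is an assumption inferred from the 4 by 4 matrices on the slide, and all function names are hypothetical.

```python
from fractions import Fraction

def design_rows(counts):
    """Build rows (1, x1, x2, x1*x2) from cell counts keyed by (x1, x2)."""
    rows = []
    for (x1, x2), n in counts.items():
        rows.extend([(1, x1, x2, x1 * x2)] * n)
    return rows

def cross_product(rows):
    """Return X'X as a list of lists."""
    k = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]

def inverse(matrix):
    """Gauss-Jordan inversion with exact fractions (assumes an invertible matrix)."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Cell counts keyed by (X1, X2), with +1 = high and -1 = low.
collinear = {(1, 1): 5, (1, -1): 1, (-1, 1): 1, (-1, -1): 5}
balanced = {(1, 1): 3, (1, -1): 3, (-1, 1): 3, (-1, -1): 3}

for name, counts in [("collinear", collinear), ("balanced", balanced)]:
    inv = inverse(cross_product(design_rows(counts)))
    print(name, [float(inv[i][i]) for i in range(4)])
```

With the same 12 observations, the correlated layout gives each coefficient a variance of 3/20 = 0.15 times the error variance, against 1/12 = 0.0833 for the balanced layout, a factor of 1.8.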