Why Design? (Why not just observe and model?)

Q: Why experimental design?
A: To avoid multicollinearity.

Issues:
(1) Testing joint importance versus individual significance
    A two-engine plane can still fly if engine #1 fails.
    A two-engine plane can still fly if engine #2 fails.
    Neither engine is critical individually,
    but the two are jointly critical (can't omit both!!).
(2) Prediction versus modeling individual effects
(3) Collinearity (correlation among the inputs)

Example: A hypothetical company's sales Y depend on TV advertising X1 and radio advertising X2:

    Y = b0 + b1 X1 + b2 X2 + e

Data:

    data sales;
      input store TV radio sales;
      (more code)
    cards;
     1 869 868  9089
     2 836 820  8290
    (more data)
    40 969 961 10130
    ;

    proc g3d data=sales;
      scatter radio*TV=sales / shape=sval color=cval zmin=8000;
    run;

[Figure: 3-D scatter plot of sales against TV and radio]

Conclusion: We can predict sales well with just TV, just radio, or both!

SAS code:

    proc reg data=next;
      model sales = TV radio;
    run;

    Analysis of Variance
    Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model               2         32660996      16330498    358.84   <.0001   (can't omit both)
    Error              37          1683844         45509
    Corrected Total    39         34344840

    Root MSE   213.32908     R-Square   0.9510   (explains 95% of the variation in sales)

    Parameter Estimates
    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept    1            531.11390        359.90429      1.48     0.1485
    TV           1              5.00435          5.01845      1.00     0.3251   (can omit TV)
    radio        1              4.66752          4.94312      0.94     0.3512   (can omit radio)

Estimated Sales = 531 + 5.0 TV + 4.7 radio, with error variance 45509 (standard deviation 213).
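The joint-versus-individual contrast can be checked by redoing the arithmetic behind the output above. This is a minimal sketch in Python (used here only for illustration; the notes themselves use SAS), with every number copied from the Analysis of Variance and Parameter Estimates tables; the variable names are illustrative, not SAS names.

```python
# Figures copied from the PROC REG output above (observational sales data).
ss_model, ss_error = 32660996, 1683844
df_model, df_error = 2, 37

ms_model = ss_model / df_model          # Mean Square for Model
ms_error = ss_error / df_error          # Mean Square for Error = estimate of sigma^2
root_mse = ms_error ** 0.5              # Root MSE = estimated error standard deviation
f_value = ms_model / ms_error           # overall F test of H0: b1 = b2 = 0 (joint test)
r_square = ss_model / (ss_model + ss_error)

# (estimate, standard error) pairs from the Parameter Estimates table
estimates = {
    "Intercept": (531.11390, 359.90429),
    "TV": (5.00435, 5.01845),
    "radio": (4.66752, 4.94312),
}
# Each t value tests one coefficient while the other input stays in the model.
t_values = {name: est / se for name, (est, se) in estimates.items()}

print(f"MSE={ms_error:.0f}  Root MSE={root_mse:.2f}  F={f_value:.2f}  R-Square={r_square:.4f}")
for name, t in t_values.items():
    print(f"{name:9s} t = {t:.2f}")
```

Each slope's standard error is about as large as its estimate, so each t statistic is near 1, while the F statistic near 359 shows that TV and radio together are far from negligible: either input can be omitted, but not both.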
Because TV is approximately equal to radio, approximately

    Estimated Sales = 531 + 9.7 TV    or    Estimated Sales = 531 + 9.7 radio

(9.7 = 5.0 + 4.7).

Regression

    The REG Procedure
    Model: MODEL1
    Dependent Variable: sales
    Number of Observations Read    40
    Number of Observations Used    40

    Analysis of Variance
    Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model               2         32660996      16330498    358.84   <.0001
    Error              37          1683844         45509
    Corrected Total    39         34344840

    Root MSE          213.32908   R-Square   0.9510
    Dependent Mean         9955   Adj R-Sq   0.9483
    Coeff Var           2.14291

    Parameter Estimates
    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept    1            531.11390        359.90429      1.48     0.1485
    TV           1              5.00435          5.01845      1.00     0.3251
    radio        1              4.66752          4.94312      0.94     0.3512

Design

    The REG Procedure
    Model: MODEL1
    Dependent Variable: SALES
    Number of Observations Read    40
    Number of Observations Used    40

    Analysis of Variance
    Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model               2         32641505      16320753    358.66   <.0001
    Error              37          1683699         45505
    Corrected Total    39         34325204

    Root MSE          213.31990   R-Square   0.9509
    Dependent Mean        10300   Adj R-Sq   0.9483
    Coeff Var           2.07111

    Parameter Estimates
    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept    1            530.72803        366.53079      1.45     0.1560
    TV           1              5.00492          0.25552     19.59     <.0001
    Radio        1              4.66742          0.25552     18.27     <.0001

The estimates and the overall fit are nearly the same in both analyses, but with the designed data the standard errors of the TV and radio slopes drop from about 5 to 0.256, so each input is now individually significant.

Design matrix (12 obs.): a column of 1s for the intercept, plus X1 and X2, each coded -1 for the low level and +1 for the high level. Since moving X1 from -1 to +1 changes the mean response by 2 b1, b1 is half the estimated X1 effect. Compare two ways of allocating the 12 runs.

Unbalanced design (inputs correlated), number of runs:

               X2 High   X2 Low
    X1 High       5         1
    X1 Low        1         5

$$X'X = \begin{pmatrix} 12 & 0 & 0 \\ 0 & 12 & 8 \\ 0 & 8 & 12 \end{pmatrix}, \qquad (X'X)^{-1} = \begin{pmatrix} 0.083 & 0 & 0 \\ 0 & 0.15 & -0.10 \\ 0 & -0.10 & 0.15 \end{pmatrix}$$

$$\operatorname{Var}(\hat b_1) = \operatorname{Var}\bigl(0.5\,(\text{estimated effect})\bigr) = 0.15\,\sigma^2$$

Balanced design (inputs uncorrelated), number of runs:

               X2 High   X2 Low
    X1 High       3         3
    X1 Low        3         3

$$X'X = \begin{pmatrix} 12 & 0 & 0 \\ 0 & 12 & 0 \\ 0 & 0 & 12 \end{pmatrix}, \qquad (X'X)^{-1} = \begin{pmatrix} 0.083 & 0 & 0 \\ 0 & 0.083 & 0 \\ 0 & 0 & 0.083 \end{pmatrix}$$

$$\operatorname{Var}(\hat b_1) = \operatorname{Var}\bigl(0.5\,(\text{estimated effect})\bigr) = 0.08333\,\sigma^2$$
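The two variances come from Var(b-hat) = sigma^2 (X'X)^{-1}. The following minimal pure-Python sketch assumes the two 12-run layouts given by the count tables (5/1/1/5 runs and 3/3/3/3 runs at the four corners, levels coded -1/+1) and uses exact fractions, so 0.15 and 0.0833 appear as 3/20 and 1/12. The helper names `design_matrix`, `xtx`, `inverse`, and `var_b1` are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction

# Hypothetical helpers (not from the notes): build X, form X'X, invert it exactly.
def design_matrix(counts):
    """Rows [1, x1, x2] repeated counts[(x1, x2)] times; levels coded -1 (low) / +1 (high)."""
    rows = []
    for (x1, x2), n in counts.items():
        rows.extend([[1, x1, x2]] * n)
    return rows

def xtx(X):
    """Return X'X as a list of lists of Fractions."""
    p = len(X[0])
    return [[Fraction(sum(r[i] * r[j] for r in X)) for j in range(p)] for i in range(p)]

def inverse(A):
    """Invert a small nonsingular matrix by Gauss-Jordan elimination on [A | I]."""
    n = len(A)
    M = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def var_b1(counts):
    """Var(b1_hat) / sigma^2 = [(X'X)^-1]_{11}; b1 is half the estimated X1 effect."""
    return inverse(xtx(design_matrix(counts)))[1][1]

# Run counts at each (X1, X2) corner for the two 12-run layouts.
unbalanced = {(1, 1): 5, (1, -1): 1, (-1, 1): 1, (-1, -1): 5}
balanced = {(1, 1): 3, (1, -1): 3, (-1, 1): 3, (-1, -1): 3}

for name, counts in [("unbalanced", unbalanced), ("balanced", balanced)]:
    v = var_b1(counts)
    print(f"{name:10s} Var(b1_hat) = {v} sigma^2 = {float(v):.5f} sigma^2")
```

The ratio 0.15 / 0.0833 = 1.8 equals 1/(1 - r^2) with r = 8/12 = 2/3, the correlation between the X1 and X2 columns in the unbalanced layout: with the same 12 runs, correlated inputs inflate the variance of each slope estimate by 80%.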