Topic 8 - Pegasus @ UCF

Lecture & Examples

Topic 8: Models with Qualitative Independent Variable

Model with One Qualitative Independent Variable with k Levels:

Suppose we want to develop a model for the mean yield per acre, $E(y)$, of four different varieties of snow peas (A, B, C, and D). Notice that we cannot assign a quantitative measure to a given variety of snow pea. Although we can assign 1, 2, 3, and 4 to these four varieties, these numbers have no meaningful quantitative interpretation. To solve this problem, we introduce the concept of a dummy variable. Let

$$x_1 = \begin{cases} 1 & \text{if the snow pea is variety A} \\ 0 & \text{if the snow pea is another variety} \end{cases}$$

$$x_2 = \begin{cases} 1 & \text{if the snow pea is variety B} \\ 0 & \text{if the snow pea is another variety} \end{cases}$$

$$x_3 = \begin{cases} 1 & \text{if the snow pea is variety C} \\ 0 & \text{if the snow pea is another variety} \end{cases}$$

Then we can write the following model equation:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon.$$

Suppose that $\mu_A$, $\mu_B$, $\mu_C$, $\mu_D$ are the mean yields for varieties A, B, C, and D, respectively. We can represent the mean yield of variety B by checking the dummy variables $x_1$, $x_2$, and $x_3$. For variety B we should use $x_1 = 0$, $x_2 = 1$, and $x_3 = 0$ to get

$$\mu_B = E(y) = \beta_0 + \beta_1(0) + \beta_2(1) + \beta_3(0) = \beta_0 + \beta_2.$$

Similarly, we find that $\mu_A = \beta_0 + \beta_1$, $\mu_C = \beta_0 + \beta_3$, and $\mu_D = \beta_0$.

In general, we can write the model with one qualitative independent variable with k levels as follows:

Step 1: Use $k - 1$ dummy variables.

Step 2: Let $x_i$ be the dummy variable for level $i$, for $i = 1$ to $k - 1$.

Step 3: The model equation is

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{k-1} x_{k-1} + \varepsilon$$

where

$$x_i = \begin{cases} 1 & \text{if } y \text{ is observed at level } i \\ 0 & \text{otherwise} \end{cases}$$

Step 4: The unknown parameters and the mean effect of each level have the following relationship:

$$\mu_1 = \beta_0 + \beta_1, \quad \mu_2 = \beta_0 + \beta_2, \quad \mu_3 = \beta_0 + \beta_3, \quad \ldots, \quad \mu_{k-1} = \beta_0 + \beta_{k-1}, \quad \mu_k = \beta_0.$$

Also, we have the following relationship:

$$\beta_0 = \mu_k, \quad \beta_1 = \mu_1 - \mu_k, \quad \beta_2 = \mu_2 - \mu_k, \quad \beta_3 = \mu_3 - \mu_k, \quad \ldots, \quad \beta_{k-1} = \mu_{k-1} - \mu_k.$$
Step 5: The assumptions about the error terms for a model with qualitative independent variables are similar to the assumptions for a model with quantitative independent variables.

- $E(\varepsilon) = 0$;
- $\mathrm{Var}(\varepsilon) = \sigma^2$;
- The error for each observation comes from a normal population;
- Error terms are independent.

Example 12.15: The following model was used to relate $E(y)$ to a single qualitative variable with four levels:

$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$

where

$$x_1 = \begin{cases} 1 & \text{if the first level} \\ 0 & \text{if another level} \end{cases} \qquad x_2 = \begin{cases} 1 & \text{if the second level} \\ 0 & \text{if another level} \end{cases} \qquad x_3 = \begin{cases} 1 & \text{if the third level} \\ 0 & \text{if another level} \end{cases}$$

This model was fit to n = 40 observations, and the least squares prediction equation is

$$\hat{y} = 87 + 63 x_1 + 45 x_2 + 57 x_3.$$

(a) Use the least squares prediction equation to find the estimate of $E(y)$ for each level of the qualitative independent variable.

Solution:

$$\hat{\mu}_1 = \hat{\beta}_0 + \hat{\beta}_1 = 87 + 63 = 150$$
$$\hat{\mu}_2 = \hat{\beta}_0 + \hat{\beta}_2 = 87 + 45 = 132$$
$$\hat{\mu}_3 = \hat{\beta}_0 + \hat{\beta}_3 = 87 + 57 = 144$$
$$\hat{\mu}_4 = \hat{\beta}_0 = 87$$

(b) Specify the null and alternative hypotheses you would use to test whether $E(y)$ is the same for all four levels of the independent variable.

Solution:

$H_0: \beta_1 = \beta_2 = \beta_3 = 0$
$H_a$: at least one $\beta_i \neq 0$

Example 12.16: A large company in Iowa is currently investigating five varieties of snow peas. The yields produced from each plot are shown in Table 12.13.

Table 12.13 Data for Example 12.16

Variety A    26.2    24.3    21.8    28.1
Variety B    29.2    28.1    27.3    31.2
Variety C    29.1    30.8    33.9    32.8
Variety D    21.3    22.4    24.3    21.8
Variety E    20.1    19.3    19.9    22.1

We define the dummy variables as follows (each is 0 otherwise, so variety E is the base level):

x1 = 1 for variety A
x2 = 1 for variety B
x3 = 1 for variety C
x4 = 1 for variety D

SAS Printout analysis with Regression

Model: MODEL1
Dependent Variable: Y

Analysis of Variance

Source      DF    Sum of Squares    Mean Square    F Value    Prob>F
Model        4         342.04000       85.51000     23.966    0.0001
Error       15          53.52000        3.56800
C Total     19         395.56000

Root MSE     1.88892     R-square    0.8647
Dep Mean    25.70000     Adj R-sq    0.8286
C.V.         7.34986

Parameter Estimates

Variable    Parameter Estimate    Standard Error    T for H0: Parameter=0    Prob > |T|
INTERCEP             20.350000        0.94445752                   21.547        0.0001
X1                    4.750000        1.33566463                    3.556        0.0029
X2                    8.600000        1.33566463                    6.439        0.0001
X3                   11.300000        1.33566463                    8.460        0.0001
X4                    2.100000        1.33566463                    1.572        0.1367

(a) Find the estimates of $\mu_A$, $\mu_B$, $\mu_C$, $\mu_D$, and $\mu_E$.

Solution:

$$\hat{\mu}_A = \hat{\beta}_0 + \hat{\beta}_1 = 20.35 + 4.75 = 25.10$$
$$\hat{\mu}_B = \hat{\beta}_0 + \hat{\beta}_2 = 20.35 + 8.60 = 28.95$$
$$\hat{\mu}_C = \hat{\beta}_0 + \hat{\beta}_3 = 20.35 + 11.30 = 31.65$$
$$\hat{\mu}_D = \hat{\beta}_0 + \hat{\beta}_4 = 20.35 + 2.10 = 22.45$$
$$\hat{\mu}_E = \hat{\beta}_0 = 20.35$$

(b) Report the least-squares prediction model from the SAS printout with regression analysis.

Solution:

$$\hat{y} = 20.35 + 4.75 x_1 + 8.60 x_2 + 11.30 x_3 + 2.10 x_4$$

(c) What null and alternative hypotheses are tested by the global F-test for this model? Interpret the hypotheses both in terms of the $\beta$ coefficients and the mean yields for the five varieties of peas.

Solution:

$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$
$H_a$: at least one $\beta_i \neq 0$

or

$H_0: \mu_A = \mu_B = \mu_C = \mu_D = \mu_E$
$H_a$: at least one $\mu_i \neq \mu_j$

(d) Test the hypotheses in part (c) at $\alpha = 0.05$.

Solution:

Test Statistic: $F_c = 23.966$
Rejection Region: $F > F_{0.05,\,4,\,15} = 3.06$

Since $F_c = 23.966 > 3.06$, we reject the null hypothesis and conclude that at least two of the mean yields are not equal.

(e) Place a 95% confidence interval on the difference between the mean yields of varieties D and E.

Solution: Because $\mu_D - \mu_E = (\beta_0 + \beta_4) - \beta_0 = \beta_4$, the interval is

$$\hat{\beta}_4 \pm t_{0.025,\,15} \, s_{\hat{\beta}_4} = 2.10 \pm 2.131(1.33566463) = [-0.75,\ 4.95]$$

(f) Place a 95% confidence interval on the difference between the mean yields of varieties D and A.
Note:

(1) $\mu_D - \mu_A = (\beta_0 + \beta_4) - (\beta_0 + \beta_1) = \beta_4 - \beta_1$

(2) $s_{\bar{x}_D - \bar{x}_A} = s\sqrt{\dfrac{1}{n_D} + \dfrac{1}{n_A}} = 1.88892\sqrt{\dfrac{1}{4} + \dfrac{1}{4}} = 1.336$

95% confidence interval:

$$(\hat{\beta}_4 - \hat{\beta}_1) \pm t_{0.025,\,15} \, s_{\bar{x}_D - \bar{x}_A} = (2.10 - 4.75) \pm 2.131(1.336) = [-5.497,\ 0.197]$$

SAS Printout Analysis with Completely Randomized Design

Analysis of Variance Procedure

Dependent Variable: Y

Source             DF    Sum of Squares     Mean Square    F Value    Pr > F
Model               4      342.04000000     85.51000000      23.97    0.0001
Error              15       53.52000000      3.56800000
Corrected Total    19      395.56000000

R-Square        C.V.    Root MSE       Y Mean
0.864698    7.349864   1.8889150    25.700000

Source      DF        Anova SS     Mean Square    F Value    Pr > F
VARIETY      4    342.04000000     85.51000000      23.97    0.0001

Analysis of Variance Procedure

Level of           --------------Y--------------
VARIETY     N            Mean              SD
1           4      25.1000000      2.69196335
2           4      28.9500000      1.69016764
3           4      31.6500000      2.12994523
4           4      22.4500000      1.31275791
5           4      20.3500000      1.21518174

(g) What are the null and alternative hypotheses tested by the above SAS Printout?

Solution:

$H_0: \mu_A = \mu_B = \mu_C = \mu_D = \mu_E$
$H_a$: at least one $\mu_i \neq \mu_j$

Test Statistic: $F_c = 23.97$
Rejection Region: $F > 3.06$

Since $F_c = 23.97 > 3.06$, we reject the null hypothesis and conclude that at least two of the mean yields are not equal.
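The calculations in Example 12.16 can be checked with a short script. The following is a minimal sketch in plain Python rather than the SAS code behind the printouts, which these notes do not show. It builds the dummy variables with variety E as the base level, solves the normal equations, and recomputes the global F statistic and the two confidence intervals. The helper `solve` and all variable names are illustrative assumptions, and the critical value $t_{0.025,15} = 2.131$ is copied from the notes rather than computed.

```python
# Yields from Table 12.13, keyed by variety.
yields = {
    "A": [26.2, 24.3, 21.8, 28.1],
    "B": [29.2, 28.1, 27.3, 31.2],
    "C": [29.1, 30.8, 33.9, 32.8],
    "D": [21.3, 22.4, 24.3, 21.8],
    "E": [20.1, 19.3, 19.9, 22.1],
}
levels = ["A", "B", "C", "D"]  # dummy-coded levels x1..x4; E is the base level


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination (hypothetical helper)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


# Design matrix rows [1, x1, x2, x3, x4] and the response vector y.
X, y = [], []
for variety, obs in yields.items():
    for value in obs:
        X.append([1.0] + [1.0 if variety == lv else 0.0 for lv in levels])
        y.append(value)

# Normal equations (X'X) beta = X'y give the least squares estimates.
p = len(X[0])
xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
beta = solve(xtx, xty)

# Sums of squares and the global F statistic (Model MS / Error MS).
n = len(y)
y_bar = sum(y) / n
fitted = [sum(b * xi for b, xi in zip(beta, r)) for r in X]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
sst = sum((yi - y_bar) ** 2 for yi in y)
mse = sse / (n - p)
f_stat = ((sst - sse) / (p - 1)) / mse

# 95% confidence intervals; t_crit is taken from the notes, not computed here.
t_crit = 2.131
se_diff = (mse * (1 / 4 + 1 / 4)) ** 0.5  # SE of a difference of two variety means
ci_d_minus_e = (beta[4] - t_crit * se_diff, beta[4] + t_crit * se_diff)
d_minus_a = beta[4] - beta[1]
ci_d_minus_a = (d_minus_a - t_crit * se_diff, d_minus_a + t_crit * se_diff)

print("beta:", [round(b, 2) for b in beta])
print("F:", round(f_stat, 3))
print("CI D-E:", [round(v, 2) for v in ci_d_minus_e])
print("CI D-A:", [round(v, 3) for v in ci_d_minus_a])
```

With four observations per variety, the standard error of $\hat{\beta}_4$ equals the standard error of a difference between two variety means, so the same `se_diff` serves both intervals. The script keeps full precision for that standard error, so the D minus A interval can differ in the third decimal place from the hand calculation, which rounds it to 1.336.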
