The Use of Dummy Variables

Dummy variables are artificially defined variables designed to convert a model that includes categorical independent variables into the standard multiple regression model.

Comparison of Slopes of k Regression Lines with Common Intercept

Situation:
- k treatments or k populations are being compared.
- For each of the k treatments we have measured both Y (the response variable) and X (an independent variable).
- Y is assumed to be linearly related to X, with the slope dependent on the treatment (population), while the intercept is the same for each treatment.

The Model:

$$Y = \beta_0 + \beta_1^{(i)} X + \varepsilon \quad \text{for treatment } i \; (i = 1, 2, \ldots, k)$$

[Figure: Graphical illustration of the above model. Lines for Treat 1, Treat 2, Treat 3, ..., Treat k plotted as y against x, with different slopes and a common intercept.]

This model can be put into the form of the multiple regression model by using dummy variables to handle the categorical variable Treatments. Dummy variables are artificially defined variables; in this case we define a new variable for each category of the categorical variable. That is, we define $X_i$ for each category of treatment as follows:

$$X_i = \begin{cases} X & \text{if the subject receives treatment } i \\ 0 & \text{otherwise} \end{cases}$$

Then the model can be written as follows.

The Complete Model (in Multiple Regression Format):

$$Y = \beta_0 + \beta_1^{(1)} X_1 + \beta_1^{(2)} X_2 + \cdots + \beta_1^{(k)} X_k + \varepsilon$$

where $X_i$ is defined as above.

Dependent Variable: Y
Independent Variables: $X_1, X_2, \ldots, X_k$

In this situation we would likely be interested in testing the equality of the slopes, namely the null hypothesis

$$H_0: \beta_1^{(1)} = \beta_1^{(2)} = \cdots = \beta_1^{(k)} \; (= \beta_1 \text{ say}) \qquad (q = k - 1)$$

Under $H_0$ the model becomes the following.

The Reduced Model:

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

Dependent Variable: Y
Independent Variable: $X = X_1 + X_2 + \cdots + X_k$

The ANOVA table used to carry out this test takes the following form:

Source                                    df          Sum of Squares   Mean Square                  F
Regression (for the reduced model)        1           SSReg1           MSReg1 = SSReg1              MSReg1/s²
Departure from H0 (Equality of Slopes)    k - 1       SSH0             MSH0 = SSH0/(k - 1)          MSH0/s²
Residual (Error)                          N - k - 1   SSError          s² = SSError/(N - k - 1)
Total                                     N - 1       SSTotal

Here SSReg1 is the regression sum of squares of the reduced model, SSError is the residual sum of squares of the complete model, and SSH0 = SSError(reduced model) - SSError(complete model) is the increase in residual sum of squares when H0 is imposed. N = the total number of cases = n1 + n2 + ... + nk, and ni = the number of cases for treatment i.

Example

In the following example we are measuring yield Y as it depends on the amount of pesticide X. Again we will assume that the dependence is linear. (I should point out that the concepts used in this discussion can easily be adapted to the non-linear situation.) Suppose that the experiment is repeated for three brands of pesticide: A, B and C. The quantity X of pesticide in this experiment was set at 3 different levels: 2 units/hectare, 4 units/hectare and 8 units/hectare. Four test plots were randomly assigned to each of the nine combinations of brand and level of pesticide. Note that we would expect a common intercept for each brand of pesticide, since when the amount of pesticide X is zero the three brands of pesticide would be equivalent.

The data for this experiment are given in the following table (yield Y, four plots per cell):

X (units/hectare)   A                          B                          C
2                   29.63 31.87 28.02 35.24    32.95 24.74 23.38 32.08    28.68 28.70 22.67 30.02
4                   28.16 33.48 28.13 28.25    29.55 34.97 36.35 38.38    33.79 43.95 36.89 33.56
8                   28.45 37.21 35.06 33.99    44.38 38.78 34.92 27.45    46.26 50.77 50.21 44.14

[Figure: Scatter plot of yield Y (0 to 60) against amount of pesticide X (0 to 8) for brands A, B and C.]
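As a hedged check on the regression output reported for this example, the following plain-Python sketch builds the dummy columns X1, X2, X3 from the table above (X_i = X for brand i, 0 otherwise), fits the complete and reduced models by least squares, and forms F = [(SSError(reduced) - SSError(complete))/(k - 1)] / s². The helpers `solve` and `ols` are hypothetical names written for this illustration, not part of any library; solving the normal equations directly is an assumption that is adequate for a design this small.

```python
# Hedged sketch: equality-of-slopes test with dummy variables X_i = X * [brand i].
# `solve` and `ols` are illustrative helpers (normal equations, no NumPy).

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients and residual sum of squares (SSError)."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    b = solve(xtx, xty)
    sse = sum((yi - sum(bj * rj for bj, rj in zip(b, r))) ** 2
              for r, yi in zip(rows, y))
    return b, sse


# Yield Y by (brand, amount X), four plots per cell, copied from the data table.
data = {
    ("A", 2): [29.63, 31.87, 28.02, 35.24],
    ("B", 2): [32.95, 24.74, 23.38, 32.08],
    ("C", 2): [28.68, 28.70, 22.67, 30.02],
    ("A", 4): [28.16, 33.48, 28.13, 28.25],
    ("B", 4): [29.55, 34.97, 36.35, 38.38],
    ("C", 4): [33.79, 43.95, 36.89, 33.56],
    ("A", 8): [28.45, 37.21, 35.06, 33.99],
    ("B", 8): [44.38, 38.78, 34.92, 27.45],
    ("C", 8): [46.26, 50.77, 50.21, 44.14],
}
brands = ["A", "B", "C"]

complete_rows, reduced_rows, y = [], [], []
for (brand, x), ys in data.items():
    for yi in ys:
        # Dummy variables: X_i = X for the plot's own brand, 0 for the others.
        dummies = [float(x) if brand == b else 0.0 for b in brands]
        complete_rows.append([1.0] + dummies)  # intercept, X1, X2, X3
        reduced_rows.append([1.0, float(x)])   # intercept, X = X1 + X2 + X3
        y.append(yi)

b_full, sse_full = ols(complete_rows, y)
b_red, sse_red = ols(reduced_rows, y)

n, k = len(y), len(brands)
s2 = sse_full / (n - k - 1)   # residual mean square of the complete model
ss_h0 = sse_red - sse_full    # SS for departure from H0 (equal slopes)
f_slopes = (ss_h0 / (k - 1)) / s2

print("complete:", [round(v, 4) for v in b_full], "SSError =", round(sse_full, 4))
print("reduced: ", [round(v, 4) for v in b_red], "SSError =", round(sse_red, 4))
print("F (equality of slopes) =", round(f_slopes, 4), "on", k - 1, "and", n - k - 1, "df")
```

Because every brand uses the same X levels, the reduced-model slope here equals the average of the three brand slopes, and both models share the same intercept; the sketch makes that easy to confirm numerically. A p-value would additionally require the F distribution (for example from a statistics library), which is omitted to keep the block dependency-free.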
In the data file, the variables X1, X2 and X3 are the dummy variables:

Pesticide   X (Amount)   X1   X2   X3   Y
A           2            2    0    0    29.63
A           2            2    0    0    31.87
A           2            2    0    0    28.02
A           2            2    0    0    35.24
B           2            0    2    0    32.95
B           2            0    2    0    24.74
B           2            0    2    0    23.38
B           2            0    2    0    32.08
C           2            0    0    2    28.68
C           2            0    0    2    28.70
C           2            0    0    2    22.67
C           2            0    0    2    30.02
A           4            4    0    0    28.16
A           4            4    0    0    33.48
A           4            4    0    0    28.13
A           4            4    0    0    28.25
B           4            0    4    0    29.55
B           4            0    4    0    34.97
B           4            0    4    0    36.35
B           4            0    4    0    38.38
C           4            0    0    4    33.79
C           4            0    0    4    43.95
C           4            0    0    4    36.89
C           4            0    0    4    33.56
A           8            8    0    0    28.45
A           8            8    0    0    37.21
A           8            8    0    0    35.06
A           8            8    0    0    33.99
B           8            0    8    0    44.38
B           8            0    8    0    38.78
B           8            0    8    0    34.92
B           8            0    8    0    27.45
C           8            0    0    8    46.26
C           8            0    0    8    50.77
C           8            0    0    8    50.21
C           8            0    0    8    44.14

Fitting the complete model

ANOVA
Source       df   SS            MS            F             Significance F
Regression   3    1095.815813   365.2719378   18.33114788   4.19538E-07
Residual     32   637.6415754   19.92629923
Total        35   1733.457389

             Coefficients
Intercept    26.24166667
X1           0.981388889
X2           1.422638889
X3           2.602400794

Fitting the reduced model

ANOVA
Source       df   SS            MS            F             Significance F
Regression   1    623.8232508   623.8232508   19.11439978   0.000110172
Residual     34   1109.634138   32.63629818
Total        35   1733.457389

             Coefficients
Intercept    26.24166667
X            1.668809524

The ANOVA table for testing the equality of slopes

Source              df   SS            MS            F             Significance F
Common slope zero   1    623.8232508   623.8232508   31.3065283    3.51448E-06
Slope comparison    2    471.9925627   235.9962813   11.84345766   0.000141367
Residual            32   637.6415754   19.92629923
Total               35   1733.457389

Comparison of Intercepts of k Regression Lines with a Common Slope (One-way Analysis of Covariance)

Situation:
- k treatments or k populations are being compared.
- For each of the k treatments we have measured both Y (the response variable) and X (an independent variable).
- Y is assumed to be linearly related to X, with the intercept dependent on the treatment (population), while the slope is the same for each treatment.
- Y is called the response variable, while X is called the covariate.

The Model:

$$Y = \beta_0^{(i)} + \beta_1 X + \varepsilon \quad \text{for treatment } i \; (i = 1, 2, \ldots, k)$$

[Figure: Graphical illustration of the one-way analysis of covariance model. Lines for Treat 1, Treat 2, Treat 3, ..., Treat k plotted as y against x, with a common slope and different intercepts.]

Equivalent Forms of the Model:

1) $$Y = \mu_i + \beta_1 (X - \bar{X}) + \varepsilon \quad \text{(treatment } i\text{)}$$
   where $\mu_i$ = the adjusted mean for treatment i.

2) $$Y = \mu + \alpha_i + \beta_1 (X - \bar{X}) + \varepsilon \quad \text{(treatment } i\text{)}$$
   where $\mu$ = the overall adjusted mean response, $\alpha_i$ = the adjusted effect for treatment i, and $\mu_i = \mu + \alpha_i$.

The Complete Model (in Multiple Regression Format):

$$Y = \beta_0 + \delta_1 X_1 + \delta_2 X_2 + \cdots + \delta_{k-1} X_{k-1} + \beta_1 X + \varepsilon$$

where

$$X_i = \begin{cases} 1 & \text{if the subject receives treatment } i \\ 0 & \text{otherwise} \end{cases}$$

Comment: $\beta_0^{(i)} = \beta_0 + \delta_i$ for treatment i = 1, 2, ..., k-1, and $\beta_0^{(k)} = \beta_0$.

Dependent Variable: Y
Independent Variables: $X_1, X_2, \ldots, X_{k-1}, X$

Testing for the Equality of Intercepts (Treatments)

$$H_0: \beta_0^{(1)} = \beta_0^{(2)} = \cdots = \beta_0^{(k)} \; (= \beta_0 \text{ say}) \qquad (q = k - 1)$$

or, equivalently, $\delta_1 = \delta_2 = \cdots = \delta_{k-1} = 0$.

The Reduced Model:

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

Dependent Variable: Y
Independent Variable: X

The ANOVA Table (Analysis of Covariance Table):

Source                                                  df          Sum of Squares   Mean Square                  F
Regression (for the reduced model)                      1           SSReg1           MSReg1 = SSReg1              MSReg1/s²
Departure from H0 (Equality of Intercepts (Treatments)) k - 1       SSH0             MSH0 = SSH0/(k - 1)          MSH0/s²
Residual (Error)                                        N - k - 1   SSError          s² = SSError/(N - k - 1)
Total                                                   N - 1       SSTotal

where N = the total number of cases = n1 + n2 + ... + nk and ni = the number of cases for treatment i.

An Example

In this example we are comparing four treatments for reducing blood pressure in patients whose blood pressure is abnormally high. Ten patients are randomly assigned to each of the four treatment groups. In addition to the drop in blood pressure (Y) during the test period, the initial blood pressure (X) prior to the test period was also recorded. It was thought that this would be correlated with Y. The data for this experiment are given below.
Treatment   Variable   Case: 1    2    3    4    5    6    7    8    9    10
1           X                186  185  199  167  187  168  183  176  158  190
1           Y                34   36   41   34   36   38   39   34   37   35
2           X                183  202  149  187  182  139  167  192  160  185
2           Y                29   36   27   29   27   28   22   32   26   30
3           X                182  168  175  174  183  182  181  148  205  188
3           Y                27   30   28   31   28   25   27   25   32   25
4           X                176  202  159  164  176  173  159  167  174  175
4           Y                26   26   20   18   27   20   24   22   22   25

The data as it would appear in a data file:

X     Y    Treatment   X1   X2   X3
186   34   1           1    0    0
185   36   1           1    0    0
199   41   1           1    0    0
167   34   1           1    0    0
187   36   1           1    0    0
168   38   1           1    0    0
183   39   1           1    0    0
176   34   1           1    0    0
158   37   1           1    0    0
190   35   1           1    0    0
183   29   2           0    1    0
202   36   2           0    1    0
149   27   2           0    1    0
187   29   2           0    1    0
182   27   2           0    1    0
139   28   2           0    1    0
167   22   2           0    1    0
192   32   2           0    1    0
160   26   2           0    1    0
185   30   2           0    1    0
182   27   3           0    0    1
168   30   3           0    0    1
175   28   3           0    0    1
174   31   3           0    0    1
183   28   3           0    0    1
182   25   3           0    0    1
181   27   3           0    0    1
148   25   3           0    0    1
205   32   3           0    0    1
188   25   3           0    0    1
176   26   4           0    0    0
202   26   4           0    0    0
159   20   4           0    0    0
164   18   4           0    0    0
176   27   4           0    0    0
173   20   4           0    0    0
159   24   4           0    0    0
167   22   4           0    0    0
174   22   4           0    0    0
175   25   4           0    0    0

The Complete Model

ANOVA
Source       df   SS            MS            F             Significance F
Regression   4    1000.862103   250.2155258   36.6366318    4.66264E-12
Residual     35   239.0378966   6.829654189
Total        39   1239.9

             Coefficients
Intercept    6.360395468
X1           12.68618508
X2           5.397430901
X3           4.211584999
X            0.096461476

The Reduced Model

ANOVA
Source       df   SS            MS            F             Significance F
Regression   1    187.7440297   187.7440297   6.78062315    0.013076205
Residual     38   1052.15597    27.68831501
Total        39   1239.9

             Coefficients
Intercept    2.991349082
X            0.147157885

The ANOVA Table for comparing intercepts:

Source                     df   SS            MS            F             Significance F
Testing for slope          1    187.7440297   187.7440297   27.48953674   7.68771E-06
Comparison of intercepts   3    813.1180737   271.0393579   39.68566349   2.32981E-11
Residual                   35   239.0378966   6.829654189
Total                      39   1239.9
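A similar hedged sketch for the blood pressure example: X1, X2, X3 are 0/1 indicators with treatment 4 as the baseline, exactly as in the data file, and X is the covariate. The block fits the complete and reduced models, forms the F statistic for comparing intercepts, and then recovers the treatment intercepts $\beta_0^{(i)} = \beta_0 + \delta_i$ and the adjusted means $\mu_i = \beta_0^{(i)} + \beta_1 \bar{X}$ implied by the first equivalent form of the model. The adjusted-mean step and the helper names `solve` and `ols` are illustrative additions, not part of the output reported above.

```python
# Hedged sketch: one-way analysis of covariance via 0/1 dummy variables.
# `solve` and `ols` are illustrative helpers (normal equations, no NumPy).

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients and residual sum of squares (SSError)."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    b = solve(xtx, xty)
    sse = sum((yi - sum(bj * rj for bj, rj in zip(b, r))) ** 2
              for r, yi in zip(rows, y))
    return b, sse


# Initial blood pressure X and drop in blood pressure Y, by treatment.
x_by_trt = {
    1: [186, 185, 199, 167, 187, 168, 183, 176, 158, 190],
    2: [183, 202, 149, 187, 182, 139, 167, 192, 160, 185],
    3: [182, 168, 175, 174, 183, 182, 181, 148, 205, 188],
    4: [176, 202, 159, 164, 176, 173, 159, 167, 174, 175],
}
y_by_trt = {
    1: [34, 36, 41, 34, 36, 38, 39, 34, 37, 35],
    2: [29, 36, 27, 29, 27, 28, 22, 32, 26, 30],
    3: [27, 30, 28, 31, 28, 25, 27, 25, 32, 25],
    4: [26, 26, 20, 18, 27, 20, 24, 22, 22, 25],
}
k = 4

complete_rows, reduced_rows, y = [], [], []
for t in range(1, k + 1):
    for xi, yi in zip(x_by_trt[t], y_by_trt[t]):
        dummies = [1.0 if t == i else 0.0 for i in range(1, k)]  # X1..X(k-1)
        complete_rows.append([1.0] + dummies + [float(xi)])      # + covariate X
        reduced_rows.append([1.0, float(xi)])
        y.append(float(yi))

b_full, sse_full = ols(complete_rows, y)
b_red, sse_red = ols(reduced_rows, y)

n = len(y)
s2 = sse_full / (n - k - 1)  # residual mean square, df = N - k - 1
f_intercepts = ((sse_red - sse_full) / (k - 1)) / s2

# beta0^(i) = beta0 + delta_i for i < k; treatment k is the baseline beta0.
beta0, deltas, beta1 = b_full[0], b_full[1:k], b_full[k]
intercepts = [beta0 + d for d in deltas] + [beta0]
x_bar = sum(r[-1] for r in complete_rows) / n
adjusted = [a + beta1 * x_bar for a in intercepts]  # mu_i = beta0^(i) + beta1 * Xbar

print("F (comparison of intercepts) =", round(f_intercepts, 4))
print("adjusted means:", [round(m, 3) for m in adjusted])
```

Using the last treatment as the baseline (no dummy of its own) avoids the exact collinearity that k indicator columns plus an intercept would create; that is why only k - 1 dummies appear in the complete model.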
