Chapter 5-4. Linear Regression: Adjusted Means, ANOVA, ANCOVA, Dummy Variables, and Interaction

In Stata version 11, some changes were made to how categorical predictor variables are specified, which made them easier to work with. Both the current Stata 11 approach and the earlier approach, which will be called Stata 10, are given in this chapter for the benefit of those with an earlier version.

For this discussion, we will use an example published by Nawata et al (2004). The data were taken from the authors' Figure 1, a scatterplot, and so only approximate the actual values used by the authors.

File rmr.dta codebook:

   group    urinary albumin excretion (U-Alb) group
               a = U-Alb < 30 mg/d
               b = 30 mg/d <= U-Alb <= 300 mg/d
               c = 300 mg/d < U-Alb
   lbm      lean body mass (kg)
   rmr      resting metabolic rate (kJ/h/m2)

In Chapter 5, we saw that the t test is identical to a simple linear regression (univariable regression, without covariates). The linear regression predicted the mean outcome (unadjusted mean) for the two groups. A more convincing analysis, however, is to model the adjusted means, which are the group means adjusted for potential confounding variables.

Look at the Nawata (2004) article. Notice that the unadjusted means for RMR are reported in their Table 1, although significance between the means is tested both in an unadjusted fashion and in an adjusted fashion (last two columns of Table 1). Next, notice that the adjusted means are given in the legend to Figure 1.

In the authors' statistical methods section, they stated that they used analysis of variance (ANOVA) and analysis of covariance (ANCOVA). Although there are special routines in Stata to fit them more easily, analysis of variance (ANOVA) is just a linear regression with categorical predictor variables, and analysis of covariance (ANCOVA) is just a linear regression with both categorical variables and continuous variables. In ANOVA terminology, categorical variables are called factors and continuous variables are called covariates.

_________________
Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual [unpublished manuscript]. University of Utah School of Medicine, 2010.

Opening the rmr dataset in Stata,

   File
     Open
       Find the directory where you copied the course CD
       Change to the subdirectory datasets & do-files
       Single click on rmr.dta
       Open

   use "C:\Documents and Settings\u0032770.SRVR\Desktop\Biostats & Epi With Stata\datasets & do-files\rmr.dta", clear
   * which must be all on one line, or use:
   cd "C:\Documents and Settings\u0032770.SRVR\Desktop\"
   cd "Biostats & Epi With Stata\datasets & do-files"
   use rmr, clear

Click on the data browser icon, and we discover that the group variable is alphabetic (called a string variable, for "string of characters"). Stata displays string variables with a red font. Since arithmetic cannot be done on letters, we first need to convert the string variable group into a numeric variable,

   Data
     Create or change variables
       Other variable transformation commands
         Encode value labels from string variable
           Main tab: String variable: group
                     Create a numeric variable: groupx
           OK

   encode group, generate(groupx)

To see what this did,

   Statistics
     Summaries, tables & tests
       Tables
         Twoway tables with measures of association
           Row variable: group
           Group variables: groupx
           OK

   tabulate group groupx
           |              groupx
     group |         a          b          c |     Total
-----------+---------------------------------+----------
         a |        10          0          0 |        10
         b |         0         10          0 |        10
         c |         0          0         12 |        12
-----------+---------------------------------+----------
     Total |        10         10         12 |        32

It looks like nothing happened. Actually, the variable groupx has values 1, 2, and 3. It is just that value labels were assigned that match the original variable.

   Actual value    Label
        1            a
        2            b
        3            c

To see this crosstabulation without the value labels, we use,

   Statistics
     Summaries, tables & tests
       Tables
         Twoway tables with measures of association
           Row variable: group
           Group variables: groupx
           Check Suppress value labels
           OK

   tabulate group groupx, nolabel

           |              groupx
     group |         1          2          3 |     Total
-----------+---------------------------------+----------
         a |        10          0          0 |        10
         b |         0         10          0 |        10
         c |         0          0         12 |        12
-----------+---------------------------------+----------
     Total |        10         10         12 |        32

This is still a nominal scaled variable, but that is okay for the ANOVA-type commands in Stata, which create indicator variables "behind the scenes".

To test the hypothesis that the three unadjusted means of rmr are equal, similar to what Nawata did for his Table 1, we can compute a one-way ANOVA, where "one-way" implies one predictor variable, which is groupx.

   Statistics
     Linear models and related
       ANOVA/MANOVA
         One-way ANOVA
           Response variable: rmr
           Factor variable: group        <- string variable OK for oneway
           Output: produce summary table
           OK

   oneway rmr group, tabulate

            |         Summary of rmr
      group |      Mean   Std. Dev.       Freq.
------------+------------------------------------
          a |  136.08333   16.940851          12
          b |      132.7   19.032428          10
          c |      166.5   19.019129          12
------------+------------------------------------
      Total |  145.82353   23.604661          34

                        Analysis of Variance
    Source              SS         df      MS            F     Prob > F
------------------------------------------------------------------------
Between groups      7990.92451      2   3995.46225     11.91     0.0001
 Within groups      10396.0167     31   335.355376
------------------------------------------------------------------------
    Total           18386.9412     33   557.180036

Bartlett's test for equal variances:  chi2(2) =   0.1787    Prob>chi2 = 0.915

The p value from the Analysis of Variance table, "Prob > F = 0.0001", is what Nawata reported in Table 1 on the RMR row, P Value < 0.0001. (Nawata's data were slightly different.) This was a test of the hypothesis that the three group means are the same:

   H0: μ1 = μ2 = μ3

To get Nawata's Adjusted P Value (for LBM), the last column of Table 1, we use an ANCOVA.

Stata 10:

   Linear models and related
     ANOVA/MANOVA
       Analysis of variance and covariance
         Dependent variable: rmr
         Model: group lbm
         Model variables: Categorical except the following continuous variables: lbm
         OK

   anova rmr group lbm, continuous(lbm)     // Stata version 10

which gives an error message:

   . anova rmr group lbm, continuous(lbm) partial
   no observations
   r(2000);

Stata 11:

   Linear models and related
     ANOVA/MANOVA
       Analysis of variance and covariance
         Dependent variable: rmr
         Model: group lbm
         Model variables: Categorical except the following continuous variables: lbm
         OK

   anova rmr group lbm     // Stata version 11

which gives an error message:

   . anova rmr group lbm
   group:  may not use factor variable operators on string variables
   r(109);

When you see the error message "no observations", it usually means that you tried to use a string variable where a numeric variable was required. (We used "group" instead of "groupx", which the oneway command allows but the anova command does not.)
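A quick way to catch this mistake before fitting a model is to check the storage types of the variables involved. A minimal sketch (describe reports each variable's storage type, so the string variable stands out):

   describe group groupx
   * group should show a string storage type (str#),
   * while the encoded groupx is numeric (long) with a value label attached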
Stata 10:

Going back to change this,

   Linear models and related
     ANOVA/MANOVA
       Analysis of variance and covariance
         Dependent variable: rmr
         Model: groupx lbm
         Model variables: Categorical except the following continuous variables: lbm
         OK

   anova rmr groupx lbm, continuous(lbm)     // Stata version 10

                      Number of obs =      34     R-squared     =  0.4785
                      Root MSE      = 17.8777     Adj R-squared =  0.4264

              Source |  Partial SS    df       MS           F     Prob > F
          -----------+----------------------------------------------------
               Model |  8798.55426     3   2932.85142      9.18     0.0002
                     |
              groupx |   7734.4293     2   3867.21465     12.10     0.0001
                 lbm |  807.629752     1   807.629752      2.53     0.1224
                     |
            Residual |  9588.38691    30   319.612897
          -----------+----------------------------------------------------
               Total |  18386.9412    33   557.180036

Stata 11:

Going back to change this, we use our numeric variable for group, groupx, and we put a "c." in front of our continuous variable, where the "c." informs Stata that it is a continuous variable.

   Linear models and related
     ANOVA/MANOVA
       Analysis of variance and covariance
         Dependent variable: rmr
         Model: groupx c.lbm
         OK

   anova rmr groupx c.lbm     // Stata version 11

                      Number of obs =      34     R-squared     =  0.4785
                      Root MSE      = 17.8777     Adj R-squared =  0.4264

              Source |  Partial SS    df       MS           F     Prob > F
          -----------+----------------------------------------------------
               Model |  8798.55426     3   2932.85142      9.18     0.0002
                     |
              groupx |   7734.4293     2   3867.21465     12.10     0.0001
                 lbm |  807.629752     1   807.629752      2.53     0.1224
                     |
            Residual |  9588.38691    30   319.612897
          -----------+----------------------------------------------------
               Total |  18386.9412    33   557.180036

We use the p value for the groupx row, which is p=0.0001, in approximate agreement with Nawata's p=0.0008 (differing only because the datasets differ slightly). In this model, we tested the null hypothesis that the three group means are the same:

   H0: μ1 = μ2 = μ3   versus   HA: not all of μ1, μ2, μ3 are equal

after adjustment for LBM (after controlling for LBM). In other words, we actually tested

   H0: adjusted μ1 = adjusted μ2 = adjusted μ3
   versus
   HA: not all of adjusted μ1, adjusted μ2, adjusted μ3 are equal

Since p = 0.0001 < 0.05, we reject H0 and accept HA, concluding that at least two of the group means differ.

Aside: Review of Hypothesis Testing

Recall that the p value is the probability of obtaining the result that we did in our sample if the null hypothesis H0 is true. If H0 is true, then we observed at least two means that were different from each other just due to sampling variation. It can be attributed to sampling variation since, under H0, no differences in means existed in the population we sampled from, so the sample means differ simply because "they just happened to come out that way" by the process of taking a sample (by chance). H0, then, can be viewed as the mathematical expression consistent with sampling variation, which is what we want to eliminate as an explanation for the observed effect.

If this probability is small, p < 0.05, we conclude that sampling variation is too unlikely to be an explanation for the sample result. We then accept the opposite, the alternative hypothesis HA, that there is a difference somewhere among the means in the population we sampled from. We always have to keep in mind the disjunctive syllogism of research (Ch 5-2, p.1) when accepting HA, however, recognizing that the p value cannot rule out bias or confounding by other confounders as alternative explanations for the observed effect.

The same results could be derived using a linear regression model.
Stata 10:

We can get the anova procedure to show us the regression model that matches the ANOVA, using the regress option.

   Linear models and related
     ANOVA/MANOVA
       Analysis of variance and covariance
         Model tab: Dependent variable: rmr
                    Model: groupx lbm
                    Model variables: Categorical except the following continuous variables: lbm
         Reporting tab: Display anova and regression table
         OK

   anova rmr groupx lbm, continuous(lbm) regress     // Stata version 10

      Source |       SS       df       MS              Number of obs =      34
-------------+------------------------------           F(  3,    30) =    9.18
       Model |  8798.55426     3  2932.85142           Prob > F      =  0.0002
    Residual |  9588.38691    30  319.612897           R-squared     =  0.4785
-------------+------------------------------           Adj R-squared =  0.4264
       Total |  18386.9412    33  557.180036           Root MSE      =  17.878

------------------------------------------------------------------------------
         rmr        Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------------------------------------------------------------------
       _cons     142.1363   16.17229     8.79   0.000     109.1081    175.1645
      groupx
           1    -29.91667   7.305324    -4.10   0.000    -44.83613    -14.9972
           2    -33.32727   7.660557    -4.35   0.000    -48.97222   -17.68233
           3     (dropped)
         lbm     .5454562   .3431357     1.59   0.122    -.1553203    1.246233
------------------------------------------------------------------------------

                      Number of obs =      34     R-squared     =  0.4785
                      Root MSE      = 17.8777     Adj R-squared =  0.4264

              Source |  Partial SS    df       MS           F     Prob > F
          -----------+----------------------------------------------------
               Model |  8798.55426     3   2932.85142      9.18     0.0002
                     |
              groupx |   7734.4293     2   3867.21465     12.10     0.0001
                 lbm |  807.629752     1   807.629752      2.53     0.1224
                     |
            Residual |  9588.38691    30   319.612897
          -----------+----------------------------------------------------
               Total |  18386.9412    33   557.180036

Notice it reports that group 3 was dropped. What was actually dropped was the indicator variable (or dummy variable) for group 3. Group 3 became the referent and was absorbed into the intercept term (_cons). The anova command creates indicator variables for the categorical variables and uses them in the calculation, but does not add them to the variables in the data browser.

Stata 11:

We can follow the anova procedure with a regress command to show us the regression model that matches the ANOVA,

   anova rmr groupx c.lbm     // Stata version 11
   regress

   . anova rmr groupx c.lbm

                      Number of obs =      34     R-squared     =  0.4785
                      Root MSE      = 17.8777     Adj R-squared =  0.4264

              Source |  Partial SS    df       MS           F     Prob > F
          -----------+----------------------------------------------------
               Model |  8798.55426     3   2932.85142      9.18     0.0002
                     |
              groupx |   7734.4293     2   3867.21465     12.10     0.0001
                 lbm |  807.629752     1   807.629752      2.53     0.1224
                     |
            Residual |  9588.38691    30   319.612897
          -----------+----------------------------------------------------
               Total |  18386.9412    33   557.180036

   . regress

      Source |       SS       df       MS              Number of obs =      34
-------------+------------------------------           F(  3,    30) =    9.18
       Model |  8798.55426     3  2932.85142           Prob > F      =  0.0002
    Residual |  9588.38691    30  319.612897           R-squared     =  0.4785
-------------+------------------------------           Adj R-squared =  0.4264
       Total |  18386.9412    33  557.180036           Root MSE      =  17.878
------------------------------------------------------------------------------
         rmr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      groupx |
          2  |  -3.410606   7.654802    -0.45   0.659     -19.0438    12.22258
          3  |   29.91667   7.305324     4.10   0.000      14.9972    44.83613
             |
         lbm |   .5454562   .3431357     1.59   0.122    -.1553203    1.246233
       _cons |   112.2196   15.87451     7.07   0.000     79.79954    144.6397
------------------------------------------------------------------------------

Notice that this time it is group 1 that was dropped. What was actually dropped was the indicator variable (or dummy variable) for group 1. Group 1 became the referent and was absorbed into the intercept term (_cons). The anova command creates indicator variables for the categorical variables and uses them in the calculation, but does not add them to the variables in the data browser.

Notice also that the anova command provides a single p value to test the equality of the three groups, whereas the regression model uses two p values (one for each of the two group indicators). We will see below how to combine these into a single p value.

When modeling a categorical predictor variable, one less indicator than the number of categories is used in the model (see box).

______________________________________________________________________________
Why One Indicator Variable Must Be Left Out

Regression models use a data matrix which includes a column of 1's for the constant term.

   _cons   group1   group2   group3   group
     1       1        0        0        1
     1       0        1        0        2
     1       0        0        1        3

The combination of the group1, group2, and group3 indicator variables predicts the constant variable exactly. That is, we have perfect collinearity, where

   constant = group1 + group2 + group3

With all three indicator variables in the model, it is impossible for the constant term to have an "independent" contribution to the outcome when holding constant all of the indicator terms; the constant term is completely "dependent" upon the indicator terms. Leaving one indicator variable out resolves this problem. That indicator becomes part of the constant (the constant being the combined referent group for all predictor variables in the model).
______________________________________________________________________________

Let's create the indicator variables for group and verify this is what the anova command did.

   Data
     Create or change variables
       Other variable creation commands
         Create indicator variables
           Variable to tabulate: groupx
           New variables stub: group_
           OK

   tabulate groupx, generate(group_)

Notice in the Variables window that three new variables were added: group_1, group_2, and group_3. We can verify these new variables are indicator variables by looking at them in the data browser, or by listing them:

   Data
     Describe data
       List data
         Main tab: Variables: groupx group_1-group_3
         Options tab: Display numeric codes rather than label values
         OK

   list groupx group_1-group_3, nolabel

        +--------------------------------------+
        | groupx   group_1   group_2   group_3 |
        |--------------------------------------|
     1. |      1         1         0         0 |
     2. |      1         1         0         0 |
     3. |      1         1         0         0 |
            ...
    10. |      1         1         0         0 |
        |--------------------------------------|
    11. |      2         0         1         0 |
    12. |      2         0         1         0 |
    13. |      2         0         1         0 |
            ...
    20. |      2         0         1         0 |
        |--------------------------------------|
    21. |      3         0         0         1 |
    22. |      3         0         0         1 |
    23. |      3         0         0         1 |
            ...
    32. |      3         0         0         1 |
        +--------------------------------------+

You can use the space bar (display next page) or enter key (display next line) to scroll through the output.
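As a check on the box above, you can try giving the regression all three indicators at once. Stata detects the perfect collinearity with the constant and omits one of the indicators for you; the exact wording of the note varies by Stata version. A sketch:

   * all three indicators plus _cons are perfectly collinear,
   * so Stata will drop one of them with a note rather than fail
   regress rmr group_1 group_2 group_3 lbm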
Stata 10:

Now, to duplicate the above model, with the indicator for group 3 left out:

------------------------------------------------------------------------------
         rmr        Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------------------------------------------------------------------
       _cons     142.1363   16.17229     8.79   0.000     109.1081    175.1645
      groupx
           1    -29.91667   7.305324    -4.10   0.000    -44.83613    -14.9972
           2    -33.32727   7.660557    -4.35   0.000    -48.97222   -17.68233
           3     (dropped)
         lbm     .5454562   .3431357     1.59   0.122    -.1553203    1.246233
------------------------------------------------------------------------------

we use,

   Statistics
     Linear models and related
       Linear regression
         Dependent variable: rmr
         Independent variables: group_1 group_2 lbm
         OK

   regress rmr group_1 group_2 lbm     // Stata version 10

      Source |       SS       df       MS              Number of obs =      34
-------------+------------------------------           F(  3,    30) =    9.18
       Model |  8798.55426     3  2932.85142           Prob > F      =  0.0002
    Residual |  9588.38691    30  319.612897           R-squared     =  0.4785
-------------+------------------------------           Adj R-squared =  0.4264
       Total |  18386.9412    33  557.180036           Root MSE      =  17.878

------------------------------------------------------------------------------
         rmr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     group_1 |  -29.91667   7.305324    -4.10   0.000    -44.83613    -14.9972
     group_2 |  -33.32727   7.660557    -4.35   0.000    -48.97222   -17.68233
         lbm |   .5454562   .3431357     1.59   0.122    -.1553203    1.246233
       _cons |   142.1363   16.17229     8.79   0.000     109.1081    175.1645
------------------------------------------------------------------------------

Stata 11:

Now, to duplicate the above model, with the indicator for group 1 left out:

------------------------------------------------------------------------------
         rmr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      groupx |
          2  |  -3.410606   7.654802    -0.45   0.659     -19.0438    12.22258
          3  |   29.91667   7.305324     4.10   0.000      14.9972    44.83613
             |
         lbm |   .5454562   .3431357     1.59   0.122    -.1553203    1.246233
       _cons |   112.2196   15.87451     7.07   0.000     79.79954    144.6397
------------------------------------------------------------------------------

we use,

   Statistics
     Linear models and related
       Linear regression
         Dependent variable: rmr
         Independent variables: group_2 group_3 lbm
         OK

   regress rmr group_2 group_3 lbm     // Stata version 11

      Source |       SS       df       MS              Number of obs =      34
-------------+------------------------------           F(  3,    30) =    9.18
       Model |  8798.55426     3  2932.85142           Prob > F      =  0.0002
    Residual |  9588.38691    30  319.612897           R-squared     =  0.4785
-------------+------------------------------           Adj R-squared =  0.4264
       Total |  18386.9412    33  557.180036           Root MSE      =  17.878
------------------------------------------------------------------------------
         rmr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     group_2 |  -3.410606   7.654802    -0.45   0.659     -19.0438    12.22258
     group_3 |   29.91667   7.305324     4.10   0.000      14.9972    44.83613
         lbm |   .5454562   .3431357     1.59   0.122    -.1553203    1.246233
       _cons |   112.2196   15.87451     7.07   0.000     79.79954    144.6397
------------------------------------------------------------------------------

If we used this regression approach, what p value would we report in Table 1 (similar to Nawata's single p value)? The anova command provided such a single p value (p=0.0001).

                      Number of obs =      34     R-squared     =  0.4785
                      Root MSE      = 17.8777     Adj R-squared =  0.4264

              Source |  Partial SS    df       MS           F     Prob > F
          -----------+----------------------------------------------------
               Model |  8798.55426     3   2932.85142      9.18     0.0002
                     |
              groupx |   7734.4293     2   3867.21465     12.10     0.0001
                 lbm |  807.629752     1   807.629752      2.53     0.1224
                     |
            Residual |  9588.38691    30   319.612897
          -----------+----------------------------------------------------
               Total |  18386.9412    33   557.180036

We can get this from a post-estimation approach after running the linear regression command as well. To test the hypothesis,

   H0: adjusted μ1 = adjusted μ2 = adjusted μ3

we need to construct a command that looks like:

Stata 10:

   test (_cons+group_1) = (_cons+group_2) = _cons

   Statistics
     Postestimation
       Tests
         Test linear hypotheses
           Specification 1, tests these coefficients: _cons+group_1 = _cons+group_2 = _cons
           OK

   test _cons+group_1 = _cons+group_2 = _cons     // Stata version 10

    ( 1)  group_1 - group_2 = 0
    ( 2)  group_1 = 0

          F(  2,    30) =   12.10
               Prob > F =    0.0001

Stata 11:

   test (_cons+group_2) = (_cons+group_3) = _cons

   Statistics
     Postestimation
       Tests
         Test linear hypotheses
           Specification 1, tests these coefficients: _cons+group_2 = _cons+group_3 = _cons
           OK

   test _cons+group_2 = _cons+group_3 = _cons     // Stata version 11

   . test (_cons+group_2 = _cons+group_3 = _cons)

    ( 1)  group_2 - group_3 = 0
    ( 2)  group_2 = 0

          F(  2,    30) =   12.10
               Prob > F =    0.0001

We see that the F statistic and p value match those in the above ANOVA model output.
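A convenient shortcut is testparm, which jointly tests that all of the listed coefficients are zero. That is algebraically the same hypothesis as the test command above (both group coefficients equal to zero), so it should reproduce the same F statistic. A minimal sketch:

   testparm group_1 group_2     // after the Stata version 10 model
   testparm group_2 group_3     // after the Stata version 11 model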
Reporting Adjusted Group Means with Confidence Intervals

Reporting adjusted group means is rather clever, because you are able to show how the groups differ when controlling for potential confounders.

Exercise: Look at the article by Kalmijn et al (2002), which is another article that reports adjusted group means.

1) Notice in their Statistical Analysis section that they described the use of dummy variable coding in their linear regression model as:

   "…In addition, subjects who smoked >0-20 pack-years and >20 pack-years were compared to never smokers by including two dummy variables in the regression model. Confounders that were taken into account were age (continuous), sex, education (four dummy categories), body mass index, total cholesterol level, and systolic blood pressure…"

2) Notice in their Table 2 that they are reporting adjusted mean scores.

To get the adjusted group means following a regression, we might try:

Stata 10:

   Statistics
     Postestimation
       Adjusted means and proportions
         Main tab: Compute and display predictions for each level of variables: groupx
         Options tab: Prediction: linear prediction
                      confidence or prediction intervals
         OK

   adjust, by(groupx) xb ci

Stata 11:

The adjust command is no longer listed on the menu, but can be run as a command,

   adjust, by(groupx) xb ci

-------------------------------------------------
    Dependent variable: rmr    Command: regress
    Variables left as is: lbm, group_1, group_2
-------------------------------------------------

   ----------------------------------------------
       groupx |         xb          lb         ub
   -----------+----------------------------------
            a |    136.083   [125.543    146.623]
            b |      132.7   [121.154    144.246]
            c |      166.5    [155.96     177.04]
   ----------------------------------------------
        Key:  xb        = Linear Prediction
              [lb , ub] = [95% Confidence Interval]

Notice, however, that these means are identical to the means from the above one-way ANOVA, which did not adjust for lbm, since lbm was not in that one-way ANOVA model.

            |         Summary of rmr
      group |      Mean   Std. Dev.       Freq.
------------+------------------------------------
          a |  136.08333   16.940851          12
          b |      132.7   19.032428          10
          c |      166.5   19.019129          12
------------+------------------------------------
      Total |  145.82353   23.604661          34

                        Analysis of Variance
    Source              SS         df      MS            F     Prob > F
------------------------------------------------------------------------
Between groups      7990.92451      2   3995.46225     11.91     0.0001
 Within groups      10396.0167     31   335.355376
------------------------------------------------------------------------
    Total           18386.9412     33   557.180036

Bartlett's test for equal variances:  chi2(2) =   0.1787    Prob>chi2 = 0.915

To get the adjusted mean, we need to hold the covariates constant at some value. Usually, researchers choose the covariate's overall mean value.

Stata 10:

   Statistics
     Postestimation
       Adjusted means and proportions
         Main tab: Compute and display predictions for each level of variables: groupx
                   Variables to be set to their overall mean value: lbm
         Options tab: Prediction: linear prediction
                      confidence or prediction intervals
         OK

   adjust lbm, by(groupx) xb ci     // Stata version 10

Stata 11:

Just running the command (not available on the menu),

   adjust lbm, by(groupx) xb ci     // Stata version 11

----------------------------------------------------
    Dependent variable: rmr    Command: regress
    Variables left as is: group_1, group_2
    Covariate set to mean: lbm = 44.088235
----------------------------------------------------

   ----------------------------------------------
       groupx |         xb          lb         ub
   -----------+----------------------------------
            a |    136.268   [125.725     146.81]
            b |    132.857    [121.31    144.405]
            c |    166.184   [155.637    176.732]
   ----------------------------------------------
        Key:  xb        = Linear Prediction
              [lb , ub] = [95% Confidence Interval]

These adjusted means are slightly different from those computed above:

   ----------------------------------------------
       groupx |         xb          lb         ub
   -----------+----------------------------------
            a |    136.083   [125.543    146.623]
            b |      132.7   [121.154    144.246]
            c |      166.5    [155.96     177.04]
   ----------------------------------------------
        Key:  xb        = Linear Prediction
              [lb , ub] = [95% Confidence Interval]
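In Stata 11, the margins command supersedes adjust for this purpose. A minimal sketch, assuming the model is refit with the factor-variable syntax introduced at the end of this chapter (the atmeans option evaluates the predictions with the covariates held at their means, as adjust did here):

   regress rmr i.groupx lbm     // Stata version 11
   margins groupx, atmeans      // adjusted group means at the mean of lbm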
The adjusted mean is simply the predicted value from the regression equation. Copying the regression equation from above,

------------------------------------------------------------------------------
         rmr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     group_1 |  -29.91667   7.305324    -4.10   0.000    -44.83613    -14.9972
     group_2 |  -33.32727   7.660557    -4.35   0.000    -48.97222   -17.68233
         lbm |   .5454562   .3431357     1.59   0.122    -.1553203    1.246233
       _cons |   142.1363   16.17229     8.79   0.000     109.1081    175.1645
------------------------------------------------------------------------------

and computing the adjusted group mean for group 2,

   display 142.1363 - 29.91667*0 - 33.32727*1 + 0.5454562*44.088235
   132.85723

which agrees with the adjusted group mean computed above.

   ----------------------------------------------
       groupx |         xb          lb         ub
   -----------+----------------------------------
            a |    136.268   [125.725     146.81]
            b |    132.857    [121.31    144.405]
            c |    166.184   [155.637    176.732]
   ----------------------------------------------
        Key:  xb        = Linear Prediction
              [lb , ub] = [95% Confidence Interval]

What did this model look like? To see how well the model fitted the data, we can overlay the fitted lines onto the original data. First we need a variable containing the predicted values:

   Statistics
     Postestimation
       Predictions, residuals, etc.
         New variable name: rmr_hat
         Produce: Fitted values (xb)
         OK

   predict rmr_hat, xb

We now want to overlay three scatterplots (one for each group) and three prediction line graphs (one for each group). We could use the graphics menu, which would require setting up the overlay of six plots. Alternatively, we can copy the following into the do-file editor and run it (also in Ch 5-4.do):

   sort lbm   // always sort on x variable for line graphs
   #delimit ;
   twoway (scatter rmr lbm if groupx==1 , msymbol(square)
              mfcolor(green) mlcolor(green) msize(large))
          (scatter rmr lbm if groupx==2 , msymbol(circle)
              mfcolor(blue) mlcolor(blue) msize(large))
          (scatter rmr lbm if groupx==3 , msymbol(triangle)
              mfcolor(red) mlcolor(red) msize(large))
          (line rmr_hat lbm if groupx==1 , clpattern(solid)
              clwidth(thick) clcolor(green))
          (line rmr_hat lbm if groupx==2 , clpattern(solid)
              clwidth(thick) clcolor(blue))
          (line rmr_hat lbm if groupx==3 , clpattern(solid)
              clwidth(thick) clcolor(red))
          , legend(off) ytitle(RMR) xtitle(LBM)
          ;
   #delimit cr

[Figure: scatterplot of RMR against LBM by group, with the three parallel fitted lines overlaid]

Notice that the regression procedure assumes all lines are parallel, just shifted vertically for each group, unless interaction terms are added to the model.

Look at Figure 1 in the Nawata article. Our model did not fit the data well at all, compared to what Nawata noticed about these data. (The red line should be slanting downward.) These data are said to interact. That is, the best fitting lines for each group are not parallel. It would be better to let each group have its own slope, as well as its own intercept.

We first need to create some interaction variables.

   Data
     Create or change variables
       Create new variable
         (Version 10): New variable name: lbmXgroup1
         (Version 11): Variable name: lbmXgroup1
         (Version 10): Contents of new variable: lbm*group_1
         (Version 11): Contents of variable: lbm*group_1
         OK

   generate lbmXgroup1 = lbm*group_1

and modifying the command for group 2,

   generate lbmXgroup2 = lbm*group_2
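Equivalently, the two generate commands can be written as a loop, which scales better when the categorical variable has many levels. A sketch (assuming the lbmXgroup variables have not already been created):

   forvalues g = 1/2 {
       generate lbmXgroup`g' = lbm*group_`g'   // same as the two commands above
   }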
Fitting a new model with these interaction terms added,

   Statistics
     Linear models and related
       Linear regression
         Dependent variable: rmr
         Independent variables: group_1 group_2 lbm lbmXgroup1 lbmXgroup2
         OK

   regress rmr group_1 group_2 lbm lbmXgroup1 lbmXgroup2

      Source |       SS       df       MS              Number of obs =      34
-------------+------------------------------           F(  5,    28) =    8.01
       Model |  10820.5192     5  2164.10384           Prob > F      =  0.0001
    Residual |  7566.42198    28  270.229356           R-squared     =  0.5885
-------------+------------------------------           Adj R-squared =  0.5150
       Total |  18386.9412    33  557.180036           Root MSE      =  16.439

------------------------------------------------------------------------------
         rmr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     group_1 |  -122.4433     34.5581    -3.54   0.001    -193.2324   -51.65425
     group_2 |  -100.1493    39.63594    -2.53   0.017    -181.3399   -18.95877
         lbm |  -.8470919    .6166423    -1.37   0.180    -2.110226    .4160425
  lbmXgroup1 |   2.085718    .7644016     2.73   0.011     .5199118    3.651523
  lbmXgroup2 |   1.498063    .8819738     1.70   0.100    -.3085782    3.304704
       _cons |   204.3368    27.94916     7.31   0.000     147.0855     261.588
------------------------------------------------------------------------------

The interaction term for group 1 is significant, and so should be kept in the model. For group 2, it is approaching significance, so it could be kept in or not; keeping this second term in would not improve the fit by much. Keeping both interaction terms in, let's see what happens graphically. First saving the new predicted values,

   Statistics
     Postestimation
       Predictions, residuals, etc.
         New variable name: rmr_hat2
         Produce: Fitted values (xb)
         OK

   predict rmr_hat2, xb

and repeating the graph, using the new predicted values,

   sort lbm   // always sort on x variable for line graphs
   #delimit ;
   twoway (scatter rmr lbm if groupx==1 , msymbol(square)
              mfcolor(green) mlcolor(green) msize(large))
          (scatter rmr lbm if groupx==2 , msymbol(circle)
              mfcolor(blue) mlcolor(blue) msize(large))
          (scatter rmr lbm if groupx==3 , msymbol(triangle)
              mfcolor(red) mlcolor(red) msize(large))
          (line rmr_hat2 lbm if groupx==1 , clpattern(solid)
              clwidth(thick) clcolor(green))
          (line rmr_hat2 lbm if groupx==2 , clpattern(solid)
              clwidth(thick) clcolor(blue))
          (line rmr_hat2 lbm if groupx==3 , clpattern(solid)
              clwidth(thick) clcolor(red))
          , legend(off) ytitle(RMR) xtitle(LBM)
          ;
   #delimit cr

[Figure: scatterplot of RMR against LBM by group, now with a separately sloped fitted line for each group]

This looks much more like Nawata's model, except for the one outlying blue point added to the dataset for illustration.

Clearly, now, the adjusted mean difference depends very much on what value we hold lbm constant at, since the groups differ more at low values of LBM than at high values.
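The group-specific slopes implied by this model can be read off with lincom, which sums the coefficients and supplies a confidence interval for the sum. A minimal sketch (the slope for group 3, the referent, is just the lbm coefficient itself):

   lincom lbm + lbmXgroup1     // slope for group a: -0.85 + 2.09 = 1.24
   lincom lbm + lbmXgroup2     // slope for group b: -0.85 + 1.50 = 0.65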
First, holding lbm at its overall mean:

Stata 10:

   Statistics
     Postestimation
       Adjusted means and proportions
         Compute and display predictions for each level of variables: groupx
         Variables to be set to their overall mean value: lbm
         OK

   adjust lbm, by(groupx) xb ci

Stata 11:

   adjust lbm, by(groupx) xb ci

----------------------------------------------------------------
    Dependent variable: rmr    Command: regress
    Variables left as is: group_1, group_2, lbmXgroup1, lbmXgroup2
    Covariate set to mean: lbm = 44.088235
----------------------------------------------------------------

   ----------------------------------------------
       groupx |         xb          lb         ub
   -----------+----------------------------------
            a |    135.797   [126.067    145.527]
            b |    132.456   [121.801     143.11]
            c |     166.99   [157.242    176.738]
   ----------------------------------------------
        Key:  xb        = Linear Prediction
              [lb , ub] = [95% Confidence Interval]

Compared to the earlier model, without the interaction terms,

----------------------------------------------------
    Dependent variable: rmr    Command: regress
    Variables left as is: group_1, group_2
    Covariate set to mean: lbm = 44.088235
----------------------------------------------------

   ----------------------------------------------
       groupx |         xb          lb         ub
   -----------+----------------------------------
            a |    136.268   [125.725     146.81]
            b |    132.857    [121.31    144.405]
            c |    166.184   [155.637    176.732]
   ----------------------------------------------
        Key:  xb        = Linear Prediction
              [lb , ub] = [95% Confidence Interval]

it did not change much. This is because each fitted regression line always passes through its group's means (the X-variable mean and the Y-variable mean), so the slope pivots around that point. At the mean, then, the slope is of no consequence. If you wanted to know the adjusted mean values for a point far away from the mean, however, the slope would have a big effect.

Stata 10:

   Statistics
     Postestimation
       Adjusted means and proportions
         Compute and display predictions for each level of variables: groupx
         Variables to be set to their overall mean value: <leave blank this time>
         Variables to be set to a specified value: Variable: lbm   Value: 57
         OK

   adjust lbm=57, by(groupx) xb ci

Stata 11:

   adjust lbm=57, by(groupx) xb ci

----------------------------------------------------------------
    Dependent variable: rmr    Command: regress
    Variables left as is: group_1, group_2, lbmXgroup1, lbmXgroup2
    Covariate set to value: lbm = 57
----------------------------------------------------------------

   ----------------------------------------------
       groupx |         xb          lb         ub
   -----------+----------------------------------
            a |    124.859   [105.505    144.214]
            b |    121.518   [101.735    141.302]
            c |    156.053    [137.69    174.415]
   ----------------------------------------------
        Key:  xb        = Linear Prediction
              [lb , ub] = [95% Confidence Interval]

Compared to the earlier model, without the interaction terms,

----------------------------------------------------
    Dependent variable: rmr    Command: regress
    Variables left as is: group_1, group_2
    Covariate set to mean: lbm = 44.088235
----------------------------------------------------

   ----------------------------------------------
       groupx |         xb          lb         ub
   -----------+----------------------------------
            a |    136.268   [125.725     146.81]
            b |    132.857    [121.31    144.405]
            c |    166.184   [155.637    176.732]
   ----------------------------------------------
        Key:  xb        = Linear Prediction
              [lb , ub] = [95% Confidence Interval]

it is very different.

Stata's New Categorical Variable Facility

With Version 11, Stata has a much easier way to work with categorical variables. You simply put an "i." in front of each categorical variable, and Stata will create indicator variables behind the scenes to use in the regression model.
The command,

   regress rmr group_2 group_3 lbm

can now be specified as,

   regress rmr i.groupx lbm     // Stata version 11

      Source |       SS       df       MS              Number of obs =      34
-------------+------------------------------           F(  3,    30) =    9.18
       Model |  8798.55426     3  2932.85142           Prob > F      =  0.0002
    Residual |  9588.38691    30  319.612897           R-squared     =  0.4785
-------------+------------------------------           Adj R-squared =  0.4264
       Total |  18386.9412    33  557.180036           Root MSE      =  17.878

------------------------------------------------------------------------------
         rmr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      groupx |
          2  |  -3.410606   7.654802    -0.45   0.659     -19.0438    12.22258
          3  |   29.91667   7.305324     4.10   0.000      14.9972    44.83613
             |
         lbm |   .5454562   .3431357     1.59   0.122    -.1553203    1.246233
       _cons |   112.2196   15.87451     7.07   0.000     79.79954    144.6397
------------------------------------------------------------------------------

By default, it uses the first category of groupx as the referent group. To specify the second category as the referent group, we can use "ib2.", where the "b" stands for "base level", which is another name for the referent category,

   regress rmr ib2.groupx lbm     // Stata version 11

      Source |       SS       df       MS              Number of obs =      34
-------------+------------------------------           F(  3,    30) =    9.18
       Model |  8798.55426     3  2932.85142           Prob > F      =  0.0002
    Residual |  9588.38691    30  319.612897           R-squared     =  0.4785
-------------+------------------------------           Adj R-squared =  0.4264
       Total |  18386.9412    33  557.180036           Root MSE      =  17.878

------------------------------------------------------------------------------
         rmr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      groupx |
          1  |   3.410606   7.654802     0.45   0.659    -12.22258     19.0438
          3  |   33.32727   7.660557     4.35   0.000     17.68233    48.97222
             |
         lbm |   .5454562   .3431357     1.59   0.122    -.1553203    1.246233
       _cons |    108.809   16.05747     6.78   0.000     76.01528    141.6028
------------------------------------------------------------------------------

We see that category 2 of groupx was left out this time, which turns it into the referent category. To specify the last category as the referent group, we can use "ib(last).",

   regress rmr ib(last).groupx lbm     // Stata version 11

      Source |       SS       df       MS              Number of obs =      34
-------------+------------------------------           F(  3,    30) =    9.18
       Model |  8798.55426     3  2932.85142           Prob > F      =  0.0002
    Residual |  9588.38691    30  319.612897           R-squared     =  0.4785
-------------+------------------------------           Adj R-squared =  0.4264
       Total |  18386.9412    33  557.180036           Root MSE      =  17.878

------------------------------------------------------------------------------
         rmr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      groupx |
          1  |  -29.91667   7.305324    -4.10   0.000    -44.83613    -14.9972
          2  |  -33.32727   7.660557    -4.35   0.000    -48.97222   -17.68233
             |
         lbm |   .5454562   .3431357     1.59   0.122    -.1553203    1.246233
       _cons |   142.1363   16.17229     8.79   0.000     109.1081    175.1645
------------------------------------------------------------------------------

This left group 3, the last category, out of the model to serve as the referent category.

To get the interaction terms, we use the "#" operator. To get the model specified by this command,

   regress rmr group_1 group_2 lbm lbmXgroup1 lbmXgroup2

we use,

   regress rmr ib(last).groupx lbm i.groupx#c.lbm     // Stata version 11

      Source |       SS       df       MS              Number of obs =      34
-------------+------------------------------           F(  5,    28) =    8.01
       Model |  10820.5192     5  2164.10384           Prob > F      =  0.0001
    Residual |  7566.42198    28  270.229356           R-squared     =  0.5885
-------------+------------------------------           Adj R-squared =  0.5150
       Total |  18386.9412    33  557.180036           Root MSE      =  16.439

------------------------------------------------------------------------------
         rmr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      groupx |
          1  |  -122.4433     34.5581    -3.54   0.001    -193.2324   -51.65425
          2  |  -100.1493    39.63594    -2.53   0.017    -181.3399   -18.95877
             |
         lbm |  -.8470919    .6166423    -1.37   0.180    -2.110226    .4160425
             |
groupx#c.lbm |
          1  |   2.085718    .7644016     2.73   0.011     .5199118    3.651523
          2  |   1.498063    .8819738     1.70   0.100    -.3085782    3.304704
             |
       _cons |   204.3368    27.94916     7.31   0.000     147.0855     261.588
------------------------------------------------------------------------------
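Stata 11 also provides a "##" operator, which expands to the main effects plus the interaction in a single term, so the same model can be requested more compactly. A sketch:

   regress rmr ib(last).groupx##c.lbm     // Stata version 11
   * ## expands to: ib(last).groupx  c.lbm  ib(last).groupx#c.lbm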
How to Interpret the Interaction Term

The file Ch5-4.dta contains the following practice data. This is just for illustration, so we will ignore that the sample size is too small for three predictor variables.

        +---------------------------+
        | group    x     y   xgroup |
        |---------------------------|
     1. |     0   10   200        0 |
     2. |     0   12   201        0 |
     3. |     0   14   202        0 |
     4. |     0   17   203        0 |
        |---------------------------|
     5. |     1    9   320        9 |
     6. |     1   15   350       15 |
     7. |     1   20   400       20 |
     8. |     1   24   475       24 |
        +---------------------------+

where

   group  = group variable (0 or 1)
   x      = a continuous predictor variable
   y      = a continuous outcome variable
   xgroup = x by group interaction, created by multiplying x by group

We will see why this strange interaction variable produces something meaningful, and learn how to interpret this interaction term.

The regression model without the interaction term looks like:

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   8.356792   2.016724     4.14   0.009     3.172639    13.54095
       group |    153.412   19.17877     8.00   0.000     104.1114    202.7126
       _cons |    90.7725   29.48489     3.08   0.028     14.97919    166.5658
------------------------------------------------------------------------------

[Figure: scatterplot of Y (0 to 500) against X (10 to 25) by group, with parallel fitted lines]

With just these two "main effect" terms (main effects as opposed to interaction terms), the model is constrained to fit the data as best it can with the limitation that the slopes are equal (lines parallel); all the group variable can do is shift the line up or down (change the intercept).
Using the coefficients from the regression table,

   Y = a + b1(X) + b2(group) = 90.77 + 8.36(X) + 153.41(group)

For group 0:

   Y = 90.77 + 8.36(X) + 153.41(0) = 90.77 + 8.36(X)

For group 1:

   Y = 90.77 + 8.36(X) + 153.41(1) = 90.77 + 8.36(X) + 153.41   (shifted up by 153.41)

while for either group the slope for X remains 8.36.

With the interaction term, the model looks like:

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .4299065    3.16022     0.14   0.898     -8.34427    9.204083
       group |   19.77166    49.9967     0.40   0.713    -119.0414    158.5847
      xgroup |   9.609776   3.479546     2.76   0.051    -.0509926    19.27054
       _cons |   195.8037   42.66296     4.59   0.010     77.35236    314.2551
------------------------------------------------------------------------------

[Figure: scatterplot of Y against X by group, with separately sloped fitted lines]

With both the two "main effect" terms and the interaction term, the model is permitted to have different slopes. Now the distance between the two lines, or the group difference, depends on the covariate X. Using the coefficients from the table,

   Y = a + b1(X) + b2(group) + b3(X*group)
     = 195.8 + 0.4(X) + 19.8(group) + 9.6(X*group)

For group 0:

   Y = 195.8 + 0.4(X) + 19.8(0) + 9.6(X*0) = 195.8 + 0.4(X)

For group 1:

   Y = 195.8 + 0.4(X) + 19.8(1) + 9.6(X*1) = 195.8 + (0.4 + 9.6)(X) + 19.8

For the reference group, group 0, a simple regression line is fitted. For group 1, the line is shifted up or down by the correct amount at the point X=0 (extrapolated, in this case), and then an increment, or decrement, is added to the slope for the covariate X. Thus the interaction term is how much needs to be added to, or subtracted from, the reference group slope to modify it to be the correct slope for group 1.
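The group 1 slope can be verified with lincom. A minimal sketch, assuming the interaction model was fit with a command such as regress y x group xgroup:

   use Ch5-4, clear
   regress y x group xgroup
   lincom x + xgroup     // group 1 slope: 0.43 + 9.61 = 10.04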
Notice that in the main effects model the covariate X is significant (p = 0.009). However, in the model with the interaction term, the covariate X is no longer significant (p = 0.898). Should we drop the X variable from the model, then, because it is not significant? Let's see what happens if we do.

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       group |    14.0754   24.49134     0.57   0.590    -48.88159    77.03239
      xgroup |   10.03968   1.305393     7.69   0.001     6.684064     13.3953
       _cons |      201.5   7.326498    27.50   0.000     182.6666    220.3334
------------------------------------------------------------------------------

[Figure: scatterplot of Y against X by group, with the fitted lines from the model that omits X]

Using the coefficients from the table,

   Y = a + b1(group) + b2(X*group)
     = 201.5 + 14.1(group) + 10.0(X*group)

For group 0:

   Y = 201.5 + 14.1(0) + 10.0(X*0) = 201.5

For group 1:

   Y = 201.5 + 14.1(1) + 10.0(X*1) = 201.5 + 14.1 + 10.0(X)

The model seems to do something reasonable for group 1, providing a correct single slope value. However, it forces group 0 to be just a constant. That isn't too bad for this example, since the slope of X was so close to 0 for group 0 anyway. In a dataset where the slope of the referent group is not close to 0, however, this will provide a terrible fit.

This example illustrates the following rule:

______________________________________________________________________________
Rule About Interaction Terms

When an interaction term is included in the model, all of the main effects for the variables that are multiplied to produce the interaction term must be included in the model as well, whether significant or not. If a greater than 2-way interaction term is included, all of the lower-order interaction terms, as well as the main effects, must remain in the model. (If a three-way interaction is included, then, the model must contain: A, B, C, A*B, A*C, B*C, A*B*C.)

A formal description of why this rule is needed is found in Chapter 5-8, p.21.
______________________________________________________________________________

How to Report Models With Interaction Terms

Interaction terms are difficult to discuss and would confuse nearly any reader if shown as a line in a table that presents all of the coefficients for the multivariable model. What researchers do, then, is state in the text that a significant interaction was found, along with a p value, but never discuss the coefficient associated with the interaction term. Instead, the researcher goes on to show a stratified analysis, stratifying by one of the variables composing the interaction term. This approach is particularly useful, since an interaction means the effect is different for the different levels of the covariate in the group x covariate interaction. Showing these individual subgroups, or strata, is the most informative way to present an interaction.
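With the practice data, such a stratified analysis amounts to fitting the regression separately within each level of group, which the by prefix does in one command. A minimal sketch:

   bysort group: regress y x     // one regression of y on x per stratum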
Exercise. Look at the Kalmijn et al (2002) paper. Under the heading Alcohol Consumption in the Results section (p.939), they state:

   "There was a significant interaction between sex and alcohol consumption in relation to speed (p = 0.008), indicating that, for women, the association between alcohol consumption and psychomotor speed was positive and linear (p-trend < 0.001), whereas for men it was absent (table 4)."

Then, in Table 4, they show the results stratified by gender. They actually used alcohol as an ordinal scale variable in the model, rather than 5 indicator variables, to obtain the p-trend, so they did not use interaction terms as we have done in this chapter. The concept of reporting an interaction is the same, however. Linear trend tests are covered in Chapter 5-26, "Trend Tests."

References

Chatterjee S, Hadi AS, Price B. (2000). Regression Analysis by Example. 3rd ed. New York, John Wiley and Sons.

Hamilton LC. (2003). Statistics With Stata: Updated for Version 7. Belmont, CA, Wadsworth Group/Thomson Learning.

Kalmijn S, van Boxtel MPJ, Verschuren MWM, et al. (2002). Cigarette smoking and alcohol consumption in relation to cognitive performance in middle age. Am J Epidemiol 156(10):936-944.

Nawata K, Sohmiya M, Kawaguchi M, et al. (2004). Increased resting metabolic rate in patients with type 2 diabetes mellitus accompanied by advanced diabetic nephropathy. Metabolism 53(11):1395-1398.