SOME REMARKS ABOUT THE EXERCISE ON THE ANALYSIS OF COVARIANCE

To begin, the collective performance on this exercise was not good, which suggests to me that I didn’t cover the ideas and procedures sufficiently well in class or that I confused you with the assignment. When every member of a good class performs below expectation, any good instructor looks inward. So I’ll keep this problem in mind as I evaluate what the grades on the assignment are telling me.

To continue, let’s think about the method per se. In any problem of this sort, one is trying to dissect the effects of the covariate (the continuous, regression-style predictor) and the effects of the classification variable (the categorical predictor, population in this case) on the response variable. Now, the classic “analysis of covariance” was designed to test the null hypothesis that there is no effect of the categorical predictor after we adjust for the covariate’s effect on the response. This is the test of “adjusted means,” but it presumes that the slope of the response variable on the covariate is the same for all levels of the categorical predictor (i.e., each population shows the same slope of response on covariate) and that the strength of the regression within each category is roughly the same.

Remember the logic of these presumptions. If the regression is lousy or even nonexistent in some groups, the analysis of adjusted means is rather a silly thing to do. If the slopes are different in different groups, the test of the null hypothesis is ambiguous because the outcome can depend upon where along the covariate axis one performs the test. These are issues we discussed in class.

All of which combines to dictate the sequence in which you proceed with an analysis of covariance. First, you do regressions within each category. This verifies for you that the covariate works within each category and gives you the information needed to make the first test, which is, formally, a test of the null hypothesis that the residual variances within the groups are the same. We discussed using the F-max test to do this, and textbooks advocate various tests of the equality of variances that work as well but with more tedium. Second, presuming that you did not reject that null hypothesis, you then test the null hypothesis that the slopes of the regression of the response on the covariate are equal across groups. This is done with most software packages by running a general linear model with terms for the covariate, the categorical variable, and the interaction between the two. It is the interaction that tests the “slope” hypothesis. Third, if justified, you proceed to test the effects of the adjusted means. In other words, you test the null hypothesis that there is no effect of the categorical predictor after we adjust for the covariate’s effect on the response. This is done in most software by re-running your general linear model call, now deleting the interaction from the terms to be considered.

Now, I focused the class time on showing you where the sums of squares for all of these tests come from and how one can do some of these tests without running a general linear model program. This is where I don’t appear to have been completely clear in my discussion; at one level you can always stumble through by running the “GLM” modules, but it is nice to know where this stuff originates and what the software is really doing. A sketch of the whole sequence in code follows.
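For those who want to see the three steps end to end, here is a minimal sketch in Python with the statsmodels package (one of several ways to run these models). The data frame df and its column names (response, covariate, group) are placeholders standing in for whatever your variables are, and the F-max step reports only the statistic; you still take its critical value from Hartley’s table.

    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # df: a pandas DataFrame, assumed already in hand, with columns
    # "response", "covariate", and "group" (placeholder names).

    # Step 1: separate regressions within each group, saving the residual
    # variances needed for the F-max test.
    resid_var = {}
    for name, sub in df.groupby("group"):
        fit = smf.ols("response ~ covariate", data=sub).fit()
        print(name, "slope:", fit.params["covariate"], "r2:", fit.rsquared)
        resid_var[name] = fit.mse_resid   # residual mean square

    # Hartley's F-max statistic: largest residual variance over smallest,
    # to be compared with the tabled critical value for this many groups
    # and these residual degrees of freedom.
    print("F-max:", max(resid_var.values()) / min(resid_var.values()))

    # Step 2: test equality of slopes via the covariate-by-group
    # interaction in the full general linear model.  Only the interaction
    # row of this table matters here.
    full = smf.ols("response ~ covariate * C(group)", data=df).fit()
    print(sm.stats.anova_lm(full, typ=3))

    # Step 3: if the slopes pass muster, drop the interaction and test the
    # adjusted means - the group effect after adjusting for the covariate.
    reduced = smf.ols("response ~ covariate + C(group)", data=df).fit()
    print(sm.stats.anova_lm(reduced, typ=3))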
The exercise was focused in part on reinforcing the lessons in class and in part on helping you see that, when the groups differ in their distributions of the covariate, the sums of squares for the effects of the covariate and the group will change, sometimes dramatically, when you exclude or include the other factor. That is, the sum of squares for the effect of population changed substantially depending on whether we considered population by itself or in the presence of the covariate. This is the same idea as we discussed for the two-way analysis of variance with unequal sample sizes: the effects are confounded and we get different answers with Type I SS (calculated for one factor by ignoring the presence of the other, or summing over it) and Type III SS (calculated for one factor by acknowledging the presence of the other and calculating the effect of the factor in question effectively “within levels” of that other factor). The simulated example at the end of these remarks makes this contrast concrete.

To conclude, these data have some idiosyncratic challenges, and I think most folks, in their focus on the method, glossed over those challenges. For one, there was a scaling issue. Mass should scale with the cube of length, so any analysis of a mass variable would best be done on the same scale as the covariate, length, meaning the first thing to do is either take the cube root of mass or create the cube of length. If you do either one, you’ll see the pattern in the data looks much cleaner at the outset. Failure to transform in some such fashion creates a problem: the strength of the regression is different in the different populations because the populations have different distributions of length, and the scaling effect makes the regressions with the raw data look more heterogeneous than they really are.

There are some legitimate ambiguities in how one might use the data to address the question. If by “condition” we mean some measure of energy reserves, then the response might be the difference between the lean dry mass and total dry mass (which is presumably the mass of the extractable storage lipids). If we mean overall robustness, we might mean total mass. If we mean some more inchoate measure of muscle/bone density, we might mean the lean mass. The traditional measure of “condition” just places total mass over length and doesn’t usually address these nuances. There is similar nuance in “potential fecundity” - do we mean available offspring (fertilized ova, however far each has developed), do we mean capacity for producing offspring (perhaps total mass of reproductive tissue), or perhaps proportionate mass of tissue (mass of reproductive tissue divided by total dry mass)?

But why do analysis of covariance on such problems rather than just take ratios? For one reason, a ratio of two random variables can hide information. Here one recalls work by your graduate student colleague Brian Storz, who showed that the ratio of the orbitohyoideus muscle to total tadpole mass is NOT necessarily a good indication of the development of the cannibalistic morph. Yes, cannibals have high ratios, but you can also get high ratios in stunted omnivores in which the muscle is developing normally but the overall somatic growth is slow. For another reason, a ratio of two random variables has a variance and a distribution that is a complex function of the distribution of each individual random variable. If the two variables covary, the result can be a mess and information can be lost.
For a third reason, consider that a ratio of something like reproductive tissue mass to body length cubed is a good index only if reproductive tissue mass varies isometrically with length cubed - that is, only if the ratio in a big individual is the same as that in a small individual. If there is any allometry, then the ratio differs in individuals of different size just because of the allometry. Now if populations differ in body length distributions and the response variable is an allometric function of the covariate, then a simple analysis of variance on the ratio could lead one to think that there are differences among populations when, in fact, they have no difference other than the simple difference in body size and its concomitant allometric relationships. Bad scene, isn’t it? This is why analysis of covariance is so helpful in this sort of context.
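To see the last two points in action - the Type I versus Type III contrast and the allometric-ratio trap - here is a small simulated example. Everything in it (the sample sizes, the allometric exponent of 2.5, the population length distributions, the column names) is invented for illustration; it is not the exercise data.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf
    from scipy import stats

    rng = np.random.default_rng(1)

    # Two populations that differ ONLY in their length distributions;
    # mass follows the same allometric rule (exponent 2.5, not 3) in both.
    n = 50
    length = np.concatenate([rng.normal(22, 3, n),   # population A: smaller
                             rng.normal(28, 3, n)])  # population B: larger
    mass = 0.01 * length**2.5 * rng.lognormal(0.0, 0.05, 2 * n)
    df = pd.DataFrame({"pop": np.repeat(["A", "B"], n),
                       "length": length, "mass": mass})

    # The traditional "condition" ratio, mass over length cubed.  Because
    # the true exponent is 2.5, the ratio drifts downward with size, so a
    # t-test "finds" a population difference that is pure allometry.
    ratio = df["mass"] / df["length"]**3
    print(stats.ttest_ind(ratio[df["pop"] == "A"], ratio[df["pop"] == "B"]))

    # The covariance-style analysis on comparable scales (cube root of
    # mass against length).  Entered first in the Type I table, population
    # shows a large sum of squares because it is confounded with length;
    # in the Type III table, which adjusts population for length, that
    # effect shrinks dramatically.
    df["cbrt_mass"] = np.cbrt(df["mass"])
    fit = smf.ols("cbrt_mass ~ C(pop) + length", data=df).fit()
    print(sm.stats.anova_lm(fit, typ=1))   # population ignoring length
    print(sm.stats.anova_lm(fit, typ=3))   # population adjusted for length

Run it and you should see the ratio test and the Type I row both pointing to a population “effect” that the adjusted, Type III analysis properly deflates - the pattern we have been discussing all along.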