Stat 301: HW 8 answers

1. Model academic salaries (4 pts, 2 for the problem, 2 for the solution):

Short answer: The proposed model forces the difference between each pair of adjacent academic ranks to be β1, which is too restrictive. A better model allows each academic rank to have its own mean salary. One way to do this is to create 3 indicator variables.

Explanation: The proposed model treats the mean salary for each of the four academic ranks as a linear regression on the "level" (0, 1, 2, 3). According to this model, the mean salary for lecturers is β0, the mean salary for asst. profs is β0 + β1, the mean salary for assoc. profs is β0 + 2 β1, and the mean salary for profs is β0 + 3 β1. This forces the difference between each pair of adjacent academic ranks to be β1. The data may easily behave otherwise: the mean salary may increase with rank, but the difference between lecturers and asst. profs need not be the same as the difference between asst. and assoc. profs.

One way to specify a model that allows each rank to have its own mean salary is to create 3 indicator variables that capture the differences between the groups. The approach in lecture and the book is to choose one group as the reference group (e.g., lecturers) and define 3 indicator variables (e.g., one for asst. profs, one for assoc. profs, and one for profs). Each indicator takes the value 1 for observations in its group and 0 otherwise. (A code sketch of this kind of coding, applied to the problem 3 data, appears at the end of these answers.)

2. Pace of life – model comparison

a. 2 pts.
Full model:    E(Y) = β0 + β1 bank + β2 walk + β3 talk + β4 bank*walk
Reduced model: E(Y) = β0 + β2 walk + β3 talk

b. 4 pts. F statistic obtained by comparing the two fitted models in an ANOVA table:

Source           df    SS        MS      F
Difference        2    125.547   62.77   2.737
Full Error       31    710.849   22.93
Reduced Error    33    836.396

Notes: Fit the Full and Reduced models separately. Look at the JMP Analysis of Variance table (not the Lack of Fit!) and find the Error line. df and SS are the error df and sums of squares for each model. The Difference row is the difference within each column. MS is SS / df within each row. F is the MS for Difference divided by the MS for Full Error. (A code sketch of this calculation appears at the end of these answers.)

c. 1 pt. F = 2.737, p = 0.08
Notes: Estimates / Custom Test on the results from the Full model.

d. 1 pt. Yes, because the p-value is > 0.05. (Or: No, because the p-value is < 0.10.)
Notes: This p-value is definitely in a grey area. If it were much smaller (more significant), the result would be clear: at least one coefficient is not zero, so you shouldn't remove them. If it were much larger (less significant), the result would also be clear: no evidence that they differ from zero, so remove them.

3. Bat echolocation

a. 1 pt. When JMP creates the indicator variables, the regression coefficients are log mass: 0.815, Type[birds]: 0.042, and Type[e-bats]: 0.018; root MSE: 0.186.
Notes: If you constructed 0/1 indicator variables, you would get different estimates for the Type coefficients but the same log mass slope and root MSE. (A code sketch of this model and the tests in parts c and e appears at the end of these answers.)

b. 1 pt. β1 and β2, the coefficients for the two indicator variables, must both equal 0.

c. 1 pt. F = 0.43, p-value = 0.66

d. 1 pt. No evidence of a difference in mean log energy among the three types, when compared at the same log mass.

e. 1 pt. F = 0.67, p-value = 0.53.
Notes: This is the test of the interaction (type * log mass).

f. 1 pt. The estimated β1 = 0.079, p = 0.70.

g. 1 pt. This was a "think about it" question. Anything reasonable is acceptable.
The statistical reasons I can think of are:
- Adding more observations gives a more precise estimate of the error sd.
- Adding information on another species may give a more precise estimate of the slope for log mass.
- If you have JMP color observations by the value of Type (plot below), you see that echolocating bats are all quite small (green dots), non-echolocating bats are all quite large (blue dots), and birds span quite a size range.

h. 1 pt. The residual plot looks pretty good to me. The vertical spread is slightly larger at large predicted values, but this isn't a very large change, and the data set is small.
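
Code sketch for problem 2b (the extra-sum-of-squares F test). This is a minimal sketch of one way to reproduce the comparison outside JMP; it assumes the pace-of-life data sit in a CSV file with columns named y, bank, walk, and talk, and the file name and column names are placeholders rather than the ones in the actual data table.

    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Placeholder file and column names; substitute the actual ones.
    pace = pd.read_csv("pace_of_life.csv")

    # Fit the Full and Reduced models separately, as in the notes for part b.
    full = smf.ols("y ~ bank + walk + talk + bank:walk", data=pace).fit()
    reduced = smf.ols("y ~ walk + talk", data=pace).fit()

    # anova_lm on the two nested fits reproduces the table in part b:
    # the difference in error df and SS, and the F statistic.
    print(sm.stats.anova_lm(reduced, full))

    # The same F statistic, computed by hand from the two Error lines.
    ss_diff = reduced.ssr - full.ssr             # 836.396 - 710.849 = 125.547
    df_diff = reduced.df_resid - full.df_resid   # 33 - 31 = 2
    F = (ss_diff / df_diff) / (full.ssr / full.df_resid)
    print(F)                                     # about 2.737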
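
Code sketch for problem 3 (parts a, c, and e). Again a minimal sketch under assumed names: a CSV file with columns Type, logmass, and logenergy, none of which are taken from the actual JMP table. The C(Type) term builds 0/1 indicator variables with one level as the reference group, the same indicator-variable idea described in problem 1; as noted in part a of this problem, that coding gives different Type coefficients than JMP's, but the same log mass slope, root MSE, and F tests.

    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Placeholder file and column names; substitute the actual ones.
    bats = pd.read_csv("echolocation.csv")

    # Part a: additive model with Type coded as 0/1 indicators against a
    # reference group (patsy's default treatment coding).
    add = smf.ols("logenergy ~ logmass + C(Type)", data=bats).fit()
    print(add.params)            # log mass slope ~ 0.815; Type coefficients depend on the coding
    print(add.mse_resid ** 0.5)  # root MSE ~ 0.186

    # Parts b-c: test that both Type coefficients are zero by comparing the
    # additive model to the model with log mass alone.
    mass_only = smf.ols("logenergy ~ logmass", data=bats).fit()
    print(sm.stats.anova_lm(mass_only, add))    # F ~ 0.43, p ~ 0.66

    # Part e: test the Type * log mass interaction by comparing the additive
    # model to the model with a separate log mass slope for each Type.
    inter = smf.ols("logenergy ~ logmass * C(Type)", data=bats).fit()
    print(sm.stats.anova_lm(add, inter))        # F ~ 0.67, p ~ 0.53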