Review of “Selenium accumulation and red-winged blackbird productivity 2003-2005”
Carl James Schwarz, P.Stat.
Department of Statistics and Actuarial Science
Simon Fraser University
8888 University Drive
Burnaby, BC V5A 1S6
cschwarz@stat.sfu.ca
2010-05-05
1. Introduction
This is a review of the statistical methodology used in the report “Selenium accumulation
and red-winged blackbird productivity 2003-2005”, which was prepared by SciWrite
Environmental Services and dated 2007-04-13. A PDF file of the report was supplied to me by
the B.C. Ministry of Environment.
2. Sampling Protocol
The sampling protocol was held fairly constant over the three years of the study as
follows:
Sediment and Water Sampling
- Triplicate water samples were taken from all sites in 2005 and from selected sites in
2004. Each sample was measured for Se concentration. The raw data are
presented in Appendix 2 (Table 9) of the report.
- Sediment samples were taken from selected sites. The raw data are presented in
Appendix 2 (Table 10) of the report.
Egg Collection and Productivity
- Blackbird nests were identified at various sites in the study area.
- Either 0, 1 or 2 eggs were removed from selected nests. The removed eggs were
weighed and measured, and the Se concentration of the egg contents (excluding the shell)
was determined. The raw data are presented in Appendix 5 (Table 20 and Table 21) of the report.
- The remaining eggs in the nest were monitored over time, and the numbers of
failed eggs, hatched eggs, and fledged chicks were determined. The raw data are
presented in Appendix 5 (Table 24 and Table 25) of the report.
Prey Item Sampling
- In 2005, prey items were sampled from live chicks at selected sites and
analyzed for total Se. The raw data are available in Appendix 4 (Table 17) of the report.
Liver Sampling
- Dead nestlings had their livers excised, and the Se concentration in the liver was measured.
The raw data are presented in Appendix 9 (Table 31) of the report.
Blood Sampling
- Blood glutathione peroxidase activity was measured using blood samples from
juveniles at selected sites. The raw data are presented in Appendix 8 (Table 30) of the report.
The sites in the study area were classified by Se concentration in the water/sediment as
High Se Exposure, Low Se Exposure, or Reference, the latter presumably being
unaffected by mining operations in the study area. Not every site was monitored in all years
of the study.
3. Review of Statistical Analysis
The analyses done in this report are described only briefly on page 24, too briefly to
determine exactly how the data were analyzed. I extracted the raw data from the
appendices in the report to assess which models were fit and to verify the findings. Based
on what was reported and on a comparison with a more appropriate reanalysis of the data,
I have concluded that, unfortunately, most of the analyses in the report were done incorrectly
and that many of the conclusions are not supported by the data.
In the following sections, I briefly review my findings and present more appropriate
models. The more appropriate models are all available in standard statistical packages
and do not require specialized software to fit.
I have not reviewed all of the analyses in the report but have concentrated on the analyses
with large amounts of data.
3.1 Analysis of Prey Items (page 32):
On page 32, the report states that “Of the prey items that could be identified by source …
36 prey items (43%) were terrestrial and 48 (37%) were aquatic. This difference was
statistically significant (χ², p=.01)”. The hypothesis being tested was never clearly
specified (e.g. is the hypothesis that terrestrial and aquatic prey items occur in a 50:50
ratio?), so the p-value cannot be interpreted. The authors appear to have used some sort of
chi-square test, but appear to have engaged in sacrificial pseudoreplication (Hurlbert, 1984)
by ignoring the structure of the sampling protocol, pooling samples from different sites
prior to analysis. Perhaps the intent was to examine whether the prey distribution
(terrestrial vs aquatic) varied among exposure areas, or among sites?
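To illustrate why the hypothesis must be stated, the sketch below computes a Pearson chi-square goodness-of-fit test for one candidate hypothesis: an assumed 50:50 split between terrestrial and aquatic prey, applied to the pooled counts quoted above (36 and 48). The 50:50 null is an assumption, since the report does not state one. For 1 degree of freedom, the upper-tail probability equals erfc(sqrt(X²/2)). Like the authors' test, this sketch still pools across sites, so it does not address the pseudoreplication problem.

```python
import math

def chisq_gof_two_categories(observed, expected_props):
    """Pearson chi-square goodness-of-fit test for two categories (1 df)."""
    total = sum(observed)
    expected = [total * p for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Pooled counts quoted from the report: 36 terrestrial, 48 aquatic.
# The 50:50 null hypothesis is an assumption made for illustration only.
stat, p = chisq_gof_two_categories([36, 48], [0.5, 0.5])
print(f"X2 = {stat:.3f}, p = {p:.3f}")
```

Under this assumed 50:50 hypothesis the statistic is about 1.71 and p is about 0.19. A 50:50 hypothesis therefore does not appear to reproduce the reported p=.01, which again shows that the tested hypothesis needs to be stated.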
3.2 Analysis of Mean Se in egg vs. Se in water (Figure 13)
The authors fit a curve relating the Se concentration in the egg contents to the Se
concentration in the water at the various sites. There are a number of difficulties with their
analysis.
In order to avoid pseudoreplication (Hurlbert, 1984), the mean Se concentration of all eggs
measured at a site was used. This is appropriate. However, the authors failed
to account for the potential multiple measurements of each site over the 3 years (i.e. if a site
was measured in all 3 years, its 3 annual averages were used as if they were independent points).
A simple regression model is therefore not appropriate.
Also, Figure 7 shows that the aqueous Se concentration varies considerably among the 3
samples (large standard error of the mean), but only the mean value was used on the X
axis, and the uncertainty in the aqueous Se concentration was not accounted for. This
is a difficult problem (known as the errors-in-variables problem): when the X measurements
are subject to large uncertainties, ordinary regression methods are no longer
appropriate.
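A well-known consequence of ignoring measurement error in X is that the ordinary least-squares slope is biased toward zero, by the factor var(X) / (var(X) + var(error)). The sketch below shows this with simulated, hypothetical data (not the report's data) and a simple linear model. The curve in Figure 13 is nonlinear, so this only illustrates the general problem.

```python
import random

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_attenuation(n=20000, true_slope=2.0, sd_x=1.0, sd_meas=1.0, seed=1):
    """Compare the OLS slope using the true X with the slope using a noisy X.
    All parameter values are hypothetical."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, sd_x) for _ in range(n)]
    # Observed X = true X + measurement error (e.g. a mean of a few noisy water samples)
    x_obs = [x + rng.gauss(0.0, sd_meas) for x in x_true]
    y = [true_slope * x + rng.gauss(0.0, 0.5) for x in x_true]
    # Theoretical attenuation ("reliability") factor
    reliability = sd_x ** 2 / (sd_x ** 2 + sd_meas ** 2)
    return ols_slope(x_true, y), ols_slope(x_obs, y), true_slope * reliability

b_true, b_noisy, b_expected = simulate_attenuation()
print("slope with true X:", round(b_true, 3))
print("slope with noisy X:", round(b_noisy, 3), "theory:", b_expected)
```

With equal true-X and error variances, the fitted slope is roughly halved. Errors-in-variables methods (e.g. Deming regression or explicit measurement-error models) are designed to correct this.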
The authors never present the fitted equation. The legend of Figure 13 states that a
polynomial equation was fit, but the text (page 34) claims that “The asymptote of the
curve (the level that the curve approaches but does not quite reach) was about 24 mg/kg
dry weight. Above about 80 ug/L aqueous selenium, further increases in MES would be
neglible.” However, a polynomial model (e.g. a quadratic) does not have an asymptote;
it does reach a maximum (after which it declines), and here that maximum is much greater
than the apparent limit in the plot.
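The difference between a quadratic and a genuinely asymptotic curve can be seen in the short sketch below. All coefficients are hypothetical and chosen for illustration. The value 24 is borrowed only because the report quotes an asymptote of about 24 mg/kg, and the rate constant k is invented. A quadratic with a negative squared term peaks at x = -b1/(2·b2) and then declines. A monomolecular curve a·(1 - exp(-k·x)) rises toward a but never reaches it.

```python
import math

def quadratic(x, b0, b1, b2):
    return b0 + b1 * x + b2 * x ** 2

def quadratic_turning_point(b1, b2):
    """x at which a quadratic with b2 < 0 reaches its maximum; beyond it the curve declines."""
    return -b1 / (2 * b2)

def asymptotic(x, a, k):
    """Monomolecular curve a * (1 - exp(-k x)): rises toward, but never reaches, a."""
    return a * (1 - math.exp(-k * x))

# Hypothetical coefficients (not fitted to the report's data)
b0, b1, b2 = 2.0, 0.5, -0.0025
x_max = quadratic_turning_point(b1, b2)
print("quadratic peaks at x =", x_max, "with y =", quadratic(x_max, b0, b1, b2))
print("quadratic at 2 * x_max:", quadratic(2 * x_max, b0, b1, b2))
print("asymptotic curve:", [round(asymptotic(x, 24.0, 0.05), 2) for x in (20, 80, 200)])
```

If an asymptote is to be reported, a model that actually has one (such as the monomolecular form above, fit by nonlinear least squares) would need to be fit and its equation presented.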
3.3 Analysis of Mean log(Se) in egg vs exposure (Figure 15)
The authors claim to have done a multivariate comparison of the effect of year and site
on Se concentration using MANOVA, but MANOVA is not the appropriate tool in this
case. Not all sites are measured in all years, and the default handling of missing data in
MANOVA is casewise deletion, i.e. sites that are not measured in all years are deleted. More
modern ANOVA approaches (e.g. linear mixed models) can (and should) be used to allow for this missing data.
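The cost of casewise deletion is easy to see with a small, hypothetical site-by-year layout. The site labels and values below are invented, and `None` marks a year in which a site was not monitored. Casewise deletion keeps only sites observed in every year. A mixed model with site as a random effect can instead use every non-missing site-year value.

```python
# Hypothetical site-by-year mean log(Se) values; None = site not monitored that year.
site_year_means = {
    "Site A": {2003: 0.8, 2004: 0.9, 2005: 0.85},
    "Site B": {2003: None, 2004: 1.2, 2005: 1.1},
    "Site C": {2003: 0.4, 2004: None, 2005: 0.5},
    "Site D": {2003: None, 2004: 0.3, 2005: 0.35},
}

def casewise_complete(data):
    """Keep only sites with a value in every year (what casewise deletion does)."""
    return {site: yrs for site, yrs in data.items()
            if all(v is not None for v in yrs.values())}

def available_observations(data):
    """All non-missing (site, year, value) triples, as a mixed model would use them."""
    return [(site, year, v) for site, yrs in data.items()
            for year, v in yrs.items() if v is not None]

kept = casewise_complete(site_year_means)
print("sites kept by casewise deletion:", sorted(kept))
print("site-year values usable by a mixed model:", len(available_observations(site_year_means)))
```

In this toy layout, casewise deletion retains one site (3 values), while a mixed model could use all 9 observed site-year values.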
3.4 Analysis of Mean log(Se) in egg across years (Bottom of page 35)
The authors compare the mean log(Se) concentration across years, pooling over all sites.
I was able to reproduce their results using an incorrect model that did not account
for the same site being measured repeatedly over time, did not account for the different
mix of sites and exposure levels in the three years, and did not account for the multiple
eggs measured from the same nest. The authors' conclusion reflects a shift in monitoring
effort among sites (and exposure levels) across the years.
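The sketch below shows how a shift in sampling effort alone can produce an apparent year effect. It is a deliberately simple, hypothetical example: each site's mean log(Se) is held constant across years, so there is no real temporal change, and only the number of eggs sampled per site differs between years. Egg-level variation is ignored, and the site names, means, and counts are all invented.

```python
# Hypothetical mean log(Se) per site, held CONSTANT across years:
# no site actually changes over time.
site_mean = {"high_1": 1.2, "high_2": 1.0, "ref_1": 0.3}

# Hypothetical numbers of eggs sampled per site in each year (effort shifts).
eggs_sampled = {
    2003: {"high_1": 2, "high_2": 2, "ref_1": 16},
    2005: {"high_1": 10, "high_2": 8, "ref_1": 2},
}

def pooled_year_mean(year):
    """Mean over all eggs in a year, ignoring which site each egg came from."""
    counts = eggs_sampled[year]
    total = sum(counts.values())
    return sum(site_mean[s] * n for s, n in counts.items()) / total

for year in sorted(eggs_sampled):
    print(year, round(pooled_year_mean(year), 3))
```

The pooled means differ (0.46 vs 1.03) purely because of where the eggs were collected. A model with site as a (random) effect compares years within sites and is not fooled in this way.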
The proper model needs to account for the hierarchical structure and takes the form (in
standard model notation):
log(Se) = year site(r) nest(site*year)(r)
where (r) indicates a random effect (of site or of nest). Under this correct model, the
conclusions differ from those reported by the authors: statistically significant
differences in the mean log(Se) concentration are detected between 2004 and 2005. No
difference in the mean log(Se) concentration between 2005 and 2003 was detected
because of the much smaller sample size in 2003 compared to 2004 and 2005.
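Written out as an equation, one reasonable reading of this model is given below. The notation is ours, and normally distributed random effects are assumed. Here i indexes year (a fixed effect), j site, k the nest within the site-year combination, and l the egg within the nest:

$$
\log(\mathrm{Se})_{ijkl} = \mu + \tau_i + s_j + n_{k(ij)} + \varepsilon_{ijkl},
\qquad s_j \sim N(0,\sigma^2_s),\quad n_{k(ij)} \sim N(0,\sigma^2_n),\quad \varepsilon_{ijkl} \sim N(0,\sigma^2_\varepsilon)
$$

Because the site effect s_j is shared by all of a site's measurements in every year, year differences are estimated within sites, and the nest effect n_{k(ij)} prevents eggs from the same nest from being counted as independent replicates.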
3.5 Analysis of Mean log(Se) in egg across sites (Bottom of page 36)
The authors performed a comparison of mean log(Se) across sites (colonies), as reported
in Figure 16. Their analysis fails to account for the multiple eggs measured in each nest.
The standard error bars in Figure 16 are too small, and the comparison of the mean
log(Se) level among sites has an inflated Type I error (false positive) rate.
The proper model accounts for the multiple eggs measured in a single nest (i.e. some
ecological factor may make the Se levels of eggs within a single nest correlated rather
than independent). In standard model notation the proper model is:
log(Se) = site nest(site)(r)
where (r) indicates the random effect of nest upon the multiple eggs within the nest.
Under this model, fewer statistically significant effects are found; many of the reported
differences among sites in Appendix Table 15 are false positives, and the reported standard
errors are too small.
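Why the egg-level standard errors are too small can be shown with a short simulation of hypothetical log(Se) values at a single site. Each egg value is a shared nest effect plus egg-level noise. The parameter values (500 nests, 2 eggs per nest, nest SD 1.0, egg SD 0.5) are assumptions chosen for illustration. Treating every egg as independent understates the standard error compared with using the nest as the unit.

```python
import math
import random
import statistics

def simulate_nests(n_nests=500, eggs_per_nest=2, sd_nest=1.0, sd_egg=0.5, seed=7):
    """Hypothetical log(Se) values: a shared nest effect plus egg-level noise."""
    rng = random.Random(seed)
    nests = []
    for _ in range(n_nests):
        nest_effect = rng.gauss(0.0, sd_nest)  # shared by all eggs in the nest
        nests.append([nest_effect + rng.gauss(0.0, sd_egg) for _ in range(eggs_per_nest)])
    return nests

def naive_se(nests):
    """SE of the mean treating every egg as independent (pseudoreplication)."""
    eggs = [e for nest in nests for e in nest]
    return statistics.stdev(eggs) / math.sqrt(len(eggs))

def nest_level_se(nests):
    """SE of the mean using the nest mean as the unit of replication."""
    means = [statistics.fmean(nest) for nest in nests]
    return statistics.stdev(means) / math.sqrt(len(means))

nests = simulate_nests()
print("naive SE:", round(naive_se(nests), 4))
print("nest-level SE:", round(nest_level_se(nests), 4))
```

With these settings the naive SE is roughly 25% too small. The stronger the within-nest correlation, the larger the understatement.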
The authors should also replace their text on the bottom of page 36 with a suitable
graphic, such as a joined-line plot (refer to Section 6.6.7 of
http://www.stat.sfu.ca/~cschwarz/Stat-650/Notes/PDF/Chapter06.pdf).
3.6 Analysis of mean log(Se) in egg across exposure level (Bottom of page 37)
The authors performed a comparison of mean log(Se) concentration across exposure
levels. I was able to reproduce their results using an incorrect model that did not account
for having multiple sites within each exposure level and did not account for multiple eggs
from the same nest in each site. The authors' conclusions reflect both the effects of
exposure level and differences in sampling effort among the sites within exposure level.
For example, in Figure 16 it is quite apparent that the LCM site has much lower Se
concentrations than the CP and GM sites, yet all three are in the same High exposure
category.
The proper model needs to account for the hierarchical structure and takes the form (in
standard model notation):
log(Se) = exposure site(exposure)(r) nest(site*exposure)(r)
where the random effects of multiple sites within each exposure level and of nests within
each site are accounted for. Under this correct model, there is in fact NO evidence of a
difference in mean log(Se) levels across exposure groups, mainly because of the very high
variability in mean log(Se) levels among the sites in the High exposure group.
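As an equation (our notation, with normally distributed random effects assumed), with i indexing exposure level, j the site within exposure level, k the nest within site, and l the egg within nest:

$$
\log(\mathrm{Se})_{ijkl} = \mu + \alpha_i + s_{j(i)} + n_{k(ij)} + \varepsilon_{ijkl},
\qquad s_{j(i)} \sim N(0,\sigma^2_s),\quad n_{k(ij)} \sim N(0,\sigma^2_n),\quad \varepsilon_{ijkl} \sim N(0,\sigma^2_\varepsilon)
$$

In a balanced design, the exposure effect α_i is tested against the between-site variation within exposure levels (σ²_s), with (number of sites) − (number of exposure levels) denominator degrees of freedom. With only a few sites per exposure level, and highly variable sites within the High group, such a test has little power, which is consistent with the finding above.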
The same incorrect model was used in previous years (Appendix Table 13) and many of
the conclusions are actually not supported by the data. For example, there is no evidence
of a difference in the mean log(Se) concentration across exposure level in any year. The
revised model above needs to be fit for each year of the data.
3.7 Comparison of clutch sizes across years (Page 39)
As in the analysis of the mean Se levels, this analysis fails to account for the hierarchical
structure of the data with multiple nests in multiple colonies in each year, and not every
colony is measured in every year or with the same intensity. The reported standard errors
are too small.
The authors used a simple ANOVA, but a Poisson regression model that recognizes that
the data are discrete small counts may be more appropriate.
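As a minimal sketch of the fixed-effect core of such a model, the code below fits a Poisson regression log(E[clutch]) = b0 + b1·x by Newton-Raphson (equivalent to IRLS for the log link), using only the Python standard library. The clutch sizes and the year indicator are hypothetical. The sketch deliberately omits the random colony and nest effects; fitting those requires a GLMM routine, e.g. SAS PROC GLIMMIX or `glmer` in the R package lme4.

```python
import math

def poisson_regression(x, y, n_iter=25):
    """Fit log(E[y]) = b0 + b1*x by Newton-Raphson (IRLS for the log link)."""
    b0 = math.log(sum(y) / len(y))  # start at the overall mean with no slope
    b1 = 0.0
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the Poisson log-likelihood)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Fisher information X'WX with W = diag(mu)
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * step = score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical clutch sizes: x = 0 for nests in one year, x = 1 for another year.
year = [0] * 6 + [1] * 6
clutch = [3, 4, 4, 3, 5, 4, 2, 3, 3, 4, 2, 3]
b0, b1 = poisson_regression(year, clutch)
print("mean clutch, year 0:", round(math.exp(b0), 3))
print("ratio year 1 / year 0:", round(math.exp(b1), 3))
```

With a single 0/1 predictor, the fitted exp(b0) is the year-0 mean clutch size and exp(b1) is the ratio of the two year means. This provides a simple check that the fit has converged.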
3.8 Comparison of Hatchability (Page 40)
As in the analysis of the mean Se levels above, this analysis fails to account for the
hierarchical structure of the data with multiple colonies in each exposure category,
multiple nests in multiple colonies in each year, and not every colony is measured in
every year or with the same intensity. The reported standard errors are too small.
Additionally, the authors used ANOVA on the hatchability rate (ratio of eggs that hatch
to eggs monitored). This is approximately correct, but old-fashioned. A more
appropriate approach is to use a generalized linear mixed model (GLIMM; here, logistic
regression with random effects) that parallels the mixed models for the mean Se analyses.
Logistic regression is now the standard way to analyze data that arise as
proportions (e.g. the proportion of eggs that hatch).
A logistic regression approach would also avoid the obvious problems in Figure 21 where
the confidence intervals for the hatchability exceed 100%!
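The sketch below shows why working on the logit scale keeps intervals inside 0-100%. It compares a normal-approximation (Wald) interval on the raw proportion with an interval built on the logit scale (delta-method SE) and back-transformed. The data (19 of 20 eggs hatching) are hypothetical, and the sketch ignores the hierarchical structure that a full GLIMM would include.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96

def expit(t):
    """Inverse logit."""
    return 1 / (1 + math.exp(-t))

def wald_ci(successes, n):
    """Normal-approximation CI on the raw proportion scale (can leave [0, 1])."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p - Z * se, p + Z * se

def logit_ci(successes, n):
    """CI built on the logit scale and back-transformed; always inside (0, 1).
    Requires 0 < successes < n."""
    p = successes / n
    eta = math.log(p / (1 - p))
    se = 1 / math.sqrt(n * p * (1 - p))  # delta-method SE of logit(p)
    return expit(eta - Z * se), expit(eta + Z * se)

# Hypothetical: 19 of 20 monitored eggs hatched
print("Wald:", [round(v, 3) for v in wald_ci(19, 20)])
print("logit:", [round(v, 3) for v in logit_ci(19, 20)])
```

The Wald upper limit here exceeds 1 (about 1.045), while the logit-scale interval stays below 1 (upper limit about 0.993).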
3.9 Comparison of nestling mortality and survival (Page 41)
As in the comparison of hatchability, the authors used incorrect models for this analysis
and should use a more modern GLIMM approach.
The authors also reported the results of a chi-square test and a Fisher's Exact test comparing the
survival rates across the exposure classes. These analyses are incorrect and are an
example of sacrificial pseudoreplication (Hurlbert, 1984). The problem is that the
traditional chi-square and Fisher's Exact tests cannot be applied to data that are
collected in a hierarchical fashion; it is not valid to simply pool across the colonies
within the exposure levels. A GLIMM analysis must be done here.
The comparison of nestling mortality across sites (Figure 23) using ANOVA is also
inappropriate. Here the very small counts require the use of Poisson ANOVA, which
would also avoid confidence intervals that extend below zero.
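The following sketch contrasts the two kinds of interval for a mean count. It compares an ordinary (ANOVA-style) normal interval with a Wald interval computed on the log scale under a Poisson model and then back-transformed. The nestling-death counts are hypothetical, and colony-level random effects are again omitted.

```python
import math
import statistics
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # about 1.96

def normal_ci_mean(counts):
    """Ordinary (ANOVA-style) CI for a mean count; can extend below zero."""
    m = statistics.fmean(counts)
    se = statistics.stdev(counts) / math.sqrt(len(counts))
    return m - Z * se, m + Z * se

def poisson_log_ci_mean(counts):
    """Wald CI on the log scale for a Poisson mean, back-transformed; always positive.
    Requires at least one non-zero count."""
    total = sum(counts)
    m = total / len(counts)
    se_log = 1 / math.sqrt(total)  # SE of log(mean) under the Poisson model
    return m * math.exp(-Z * se_log), m * math.exp(Z * se_log)

# Hypothetical numbers of dead nestlings per nest at one site
deaths = [0, 0, 1, 0, 2, 0, 0, 1]
print("normal:", [round(v, 3) for v in normal_ci_mean(deaths)])
print("Poisson (log scale):", [round(v, 3) for v in poisson_log_ci_mean(deaths)])
```

For these counts, the normal interval has a lower limit slightly below zero, an impossible value for a mean count, while the log-scale interval stays positive.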
3.10 Comparison of Egg Health
The analysis of egg weights (Figure 26) is incorrect because a proper linear mixed model
was not used (see the comments earlier in this report). Similarly, the comparison of Se
levels between the hatched and failed eggs is also incorrect.
3.11 Comparison of Nesting survival by half of season.
The authors should use a GLIMM logistic regression method here rather than ANOVA, and
need to account for the paired nature of the data. These methods will again avoid
confidence intervals that exceed 100%.
3.12 Comparison of Productivity, Hatchability, and Nestling Survival vs Selenium Levels
The authors need to use a Poisson regression model for the number of failed eggs and a
logistic regression model for hatchability, rather than ordinary regression. For example,
ordinary regression methods would allow the predicted number of failed eggs to fall below 0
(which is impossible) and the predicted hatchability to exceed 100% (which is also impossible).
As noted above, the hierarchical structure of the data collection also needs to be
incorporated into the analysis.
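A minimal sketch of the logistic alternative for hatchability vs Se is given below. It fits logit(p) = b0 + b1·x to grouped binomial counts (eggs hatched out of eggs monitored) by Newton-Raphson, using only the standard library. The data are hypothetical, x stands in for a nest-level log(Se) value, and the random effects needed for the hierarchical design are omitted. Fitted hatchability always stays strictly between 0 and 1.

```python
import math

def logistic_regression(x, hatched, monitored, n_iter=25):
    """Fit logit(p) = b0 + b1*x to grouped binomial counts by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        p = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Score vector: observed minus expected numbers hatched
        r = [y - n * pi for y, n, pi in zip(hatched, monitored, p)]
        g0 = sum(r)
        g1 = sum(ri * xi for ri, xi in zip(r, x))
        # Information matrix X'WX with binomial weights n * p * (1 - p)
        w = [n * pi * (1 - pi) for n, pi in zip(monitored, p)]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * step = score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def predicted_hatchability(b0, b1, x):
    """Fitted proportion; always strictly between 0 and 1."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

# Hypothetical data: nest-level log(Se), eggs hatched, eggs monitored
log_se = [0.2, 0.5, 0.8, 1.1, 1.4]
hatched = [9, 8, 7, 5, 3]
monitored = [10, 10, 10, 10, 10]
b0, b1 = logistic_regression(log_se, hatched, monitored)
print("b0 =", round(b0, 3), "b1 =", round(b1, 3))
print([round(predicted_hatchability(b0, b1, v), 3) for v in (-2.0, 0.8, 4.0)])
```

A Poisson model for the number of failed eggs follows the same pattern with a log link. In both cases the predictions respect the natural bounds of the data, unlike ordinary regression.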
3.13 Use of log() transform and interpretation of analysis
The authors used a base-10 logarithm transformation of the wet Se concentration in the
analysis. Using base-10 rather than natural (base e) logarithms does not
affect the conclusions; the only difference is that the natural logarithm values are about
2.3 times larger than the base-10 values (ln x = ln(10) × log10 x ≈ 2.303 × log10 x).
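Because changing the base multiplies every log value by the same constant, any test statistic that is invariant to rescaling is unchanged. The sketch below illustrates this with a Welch two-sample t statistic computed on hypothetical Se concentrations.

```python
import math
import statistics

def t_statistic(a, b):
    """Welch two-sample t statistic."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va / len(a) + vb / len(b))

# Hypothetical wet Se concentrations in two groups of eggs
group1 = [2.1, 3.4, 5.0, 2.8, 4.2]
group2 = [6.3, 9.8, 7.1, 12.5, 8.0]

t_log10 = t_statistic([math.log10(v) for v in group1], [math.log10(v) for v in group2])
t_ln = t_statistic([math.log(v) for v in group1], [math.log(v) for v in group2])
print(t_log10, t_ln)   # equal up to floating-point rounding
print(math.log(10))    # the ratio ln(x) / log10(x), about 2.3026
```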
However, an ANOVA on the log() values tests the hypothesis that the mean log(Se)
concentration is the same among groups. When log(Se) is roughly symmetric (e.g. normally
distributed), this corresponds to a hypothesis about the MEDIAN Se value on the anti-log
scale, yet the authors state their conclusions in terms of the mean Se value on the anti-log
scale.
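The anti-log of a mean log value is the geometric mean, which estimates the median (not the mean) when log(Se) is roughly symmetric. It is never larger than the arithmetic mean. For a lognormal distribution with log-scale mean μ and variance σ², the median is exp(μ) and the mean is exp(μ + σ²/2). The short sketch below uses hypothetical, right-skewed Se values to show the gap.

```python
import math
import statistics

# Hypothetical right-skewed Se concentrations at one site
se_values = [1.5, 2.0, 2.4, 3.1, 4.0, 6.5, 15.0]

mean_log = statistics.fmean(math.log(v) for v in se_values)
back_transformed = math.exp(mean_log)  # anti-log of the mean log = geometric mean
print("anti-log of mean log:", round(back_transformed, 3))
print("geometric mean:", round(statistics.geometric_mean(se_values), 3))
print("median:", statistics.median(se_values))
print("arithmetic mean:", round(statistics.fmean(se_values), 3))
```

Here the back-transformed value (about 3.65) is well below the arithmetic mean (about 4.93). Statements about "mean Se" based on a log-scale analysis therefore need to be phrased as statements about the geometric mean or median.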
3.14 Use of Bonferroni Multiple Comparison Procedure
The authors used a Bonferroni multiple comparison procedure (MCP) to control the
experimentwise error rate. This MCP takes the simplistic approach that if every
comparison has a .05 chance of a Type I error (false positive), then k comparisons have
at most a k(.05) chance of at least one Type I error in the set. So if there are 3 comparisons
(e.g. among exposure classes), the overall error rate could be as high as 3(.05) = .15, which
is unacceptably high. The Bonferroni approach therefore declares each individual comparison
statistically significant only if its p-value is less than (.05)/k, i.e. it makes it harder to detect a
significant difference in each individual comparison so that the overall error rate is controlled at the
.05 level.
The Bonferroni procedure is too conservative because it ignores the dependence among the
comparisons, and the pairwise comparisons are not independent. For
example, when comparing effects across exposure levels, the High vs Low, High vs
Reference, and Low vs Reference comparisons use each level twice in the set of three
comparisons. In this case, a preferred multiple comparison procedure is the Tukey-Kramer
procedure, which is available in all standard statistical packages.
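The Monte Carlo sketch below illustrates both points for three groups under a true null hypothesis. It assumes known variance and equal group sizes, so each pairwise comparison is a z-test. Without adjustment, the familywise error rate for the three pairwise tests is well above .05, though below the .15 bound because the comparisons are correlated. With the Bonferroni adjustment, it falls somewhat below .05, i.e. the procedure is conservative. The Tukey-Kramer procedure uses the studentized range distribution instead. In a balanced design this gives a familywise rate of exactly .05 for all pairwise comparisons. SciPy offers it as `scipy.stats.tukey_hsd`.

```python
import math
import random
from itertools import combinations
from statistics import NormalDist

def familywise_error(k_groups=3, alpha=0.05, bonferroni=True, reps=40000, seed=11):
    """Monte Carlo familywise error rate for all pairwise z-tests among k group means
    when all true means are equal and the variance is known (assumptions)."""
    rng = random.Random(seed)
    pairs = list(combinations(range(k_groups), 2))
    level = alpha / len(pairs) if bonferroni else alpha
    crit = NormalDist().inv_cdf(1 - level / 2)  # two-sided critical value
    hits = 0
    for _ in range(reps):
        means = [rng.gauss(0.0, 1.0) for _ in range(k_groups)]  # standardized group means
        # A difference of two independent unit-variance means has SD sqrt(2)
        if any(abs(means[i] - means[j]) / math.sqrt(2) > crit for i, j in pairs):
            hits += 1
    return hits / reps

fw_unadjusted = familywise_error(bonferroni=False)
fw_bonferroni = familywise_error(bonferroni=True)
print("unadjusted pairwise tests:", fw_unadjusted)
print("Bonferroni-adjusted tests:", fw_bonferroni)
```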
4. Conclusions:
The major problems in the statistical analyses in this report are:
(a) Failure to explicitly state the statistical model that was fit to the data. For
example, in Figure 13, what equation was fit? How are the effects of multiple
measurements on the same site over 3 years taken into account? In the analysis of the Se
concentration in eggs vs exposure levels, the statistical model was never explicitly stated.
The computer code used to fit the model for each analysis needs to be presented in an
appendix so that readers can verify that the correct models have been fit.
(b) Failure to account for the hierarchical nature of the sampling protocol. The
statistical model needs to match the way the data are collected. For example, there are
multiple sites within each exposure level. There are multiple nests within each site. There
are multiple eggs measured from each nest. The models in this report fit to the individual
egg values do not account for this hierarchical sampling scheme.
This implies that the measurements at the lowest level in the hierarchy are pseudoreplicates and not the “experimental units”. For example, in comparison of Se
concentration among exposure levels, the site is the experimental unit, and not the egg.
The (incorrect) models used by the authors often confound differences in sampling effort
across year with exposure or year effects.
These (incorrect) models, which treat the egg as the experimental unit, lead to standard
errors that appear too precise (i.e. smaller than can be justified by the data) and to
inflated Type I error (false positive) rates (i.e. more conclusions about the existence
of effects than can be justified by the data).
Some of the problems can be resolved by taking successive averages, e.g. average the Se
from multiple eggs in the same nest; average the averages for multiple nests within the
same site. However, this analysis will only be approximate, and a linear mixed model
ANOVA should be used as outlined above.
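The successive-averaging shortcut can be sketched as follows. The exposure → site → nest → egg layout and all values are hypothetical. Eggs are averaged within each nest, then nest means within each site. The resulting site means (one per experimental unit) would then be compared across exposure levels, e.g. with a one-way ANOVA on site means. When nests and sites contribute unequal numbers of eggs, this is only approximate, which is why the linear mixed model is preferred.

```python
import statistics

# Hypothetical log(Se) values: exposure -> site -> nest -> eggs
data = {
    "High": {"site1": {"n1": [1.2, 1.3], "n2": [1.0]},
             "site2": {"n3": [0.6, 0.7], "n4": [0.5]}},
    "Reference": {"site3": {"n5": [0.3, 0.2]},
                  "site4": {"n6": [0.4], "n7": [0.35, 0.45]}},
}

def site_means(exposure_data):
    """Average eggs within each nest, then nest means within each site."""
    return {site: statistics.fmean(statistics.fmean(eggs) for eggs in nests.values())
            for site, nests in exposure_data.items()}

for exposure, sites in data.items():
    means = list(site_means(sites).values())
    # The site, not the egg, is the experimental unit: n = number of sites
    print(exposure, [round(m, 3) for m in means], "n sites =", len(means))
```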
(c) Use of ANOVA/Regression instead of logistic or Poisson ANOVA/regression
In cases where the response variable is a count of successes/failures, a better approach is to
use logistic ANOVA/regression rather than simple ANOVA. The logistic approach
must, of course, properly account for the hierarchical structure of the data collection, and
this can be done with Generalized Linear Mixed Models (GLIMM), the extension of
mixed models to generalized linear models. The logistic approach properly accounts
for the discrete nature of the data and for the fact that survival rates must lie between 0 and 100%.
Similarly, the analysis of data with very small counts should be done with a Poisson
ANOVA/regression approach. Again, this needs to account for the hierarchical structure
of the data. The Poisson approach properly accounts for the discrete nature of the data
and for the fact that counts must be non-negative.
References:
Hurlbert, S. H. (1984). “Pseudoreplication and the design of ecological field
experiments”. Ecological Monographs 54(2): 187–211. doi:10.2307/1942661.