Brandi Locklear
Junior Seminar: First Paper
2-14-11
A peer review of the cost effective prediction of the eutrophication status of lakes and
reservoirs
Abstract:
Eutrophication is a serious threat to both humans and wildlife. Identifying the factors that
influence this process is both costly and challenging. This study uses data from the
Île-de-France (IDF) region of France to identify these factors and, in turn, use them to predict
concentrations of chlorophyll-a. To predict the trophic status of the lakes and reservoirs, the
authors use three models: the Generalized Linear Model (GLM), the Generalized Additive Model
(GAM), and the Random Forest model (RF). Using various methods, they test which of these
models produces the greatest accuracy and robustness.
Introduction
Eutrophication refers to the nutrient enrichment of both inland and coastal waters.
Phosphates and nitrates are often the cause, leading to the eventual increase of
phytoplankton biomass (1). Eutrophication is considered a serious threat throughout most of
the world (2). The algal biomass increases as an influx of nutrients comes from surrounding
natural or anthropogenic sources; the resulting oxygen depletion leads to the production of
nuisance algal species (1). The nutrient loading of freshwater systems also imposes a new cost
on society. In the United States, taxpayers, government agencies, and water treatment facilities
have lost approximately 4.4 billion dollars due to eutrophication (3). The high fiscal cost is
caused by drinking water treatment, regulations that deal with the recovery of threatened and
endangered species, the revenue loss caused by decreased boating and fishing, and the decreased
value of waterfront property. Eutrophic water, overall, reduces the health, safety, and value of
recreational lakes and reservoirs to the public (3).
Studies done by various parties have helped to increase our awareness of the
environmental pressures affecting lakes and reservoirs (4). However, the tools used to predict
eutrophication at larger spatial scales are currently limited to mass balance modeling.
Although deterministic mass balance models can be useful for synthesizing data, they require
intensive field sampling to be accurate and are somewhat costly (4). Cost-effective tools are
needed to predict eutrophication (4). The study done by Catherine and colleagues evaluated the
ability of low-cost variables to predict the eutrophication status of lakes and reservoirs.
The correlation between algae and nutrients is widely recognized by scientists (4). This
correlation is highly dependent on the morphological parameters of the system (the
lake/reservoir volume and size) as well as drainage characteristics (the land-use patterns and
network size). The use of GIS (Geographical Information Systems) data further shows the
correlation by pinpointing the influx of nutrients based on spatial analysis (4). For instance,
J. R. Jones and his associates showed that the trophic status of Missouri reservoirs could be
accurately predicted with estimates gathered from land-use data and hydromorphological
characteristics (4). The authors state that they have not seen this method of prediction applied
on a large scale and that they might be the first to do so.
To predict chlorophyll-a concentrations, and with them eutrophication, the authors
used three modeling methods for statistical analysis (4). The first is the generalized linear
model (GLM); this is the most common approach and stems from the use of multivariate linear
regression (4). To understand this modeling method one must first understand a few basic
concepts: linear and bivariate models (5). The word “linear” refers to the use of a line. The
word “model” refers to the equation that describes that line, whose slope is the rise over run
(delta Y over delta X) (5). In a two-variable linear model, or bivariate model, the purpose of
the linear regression (y = mx + b) is to summarize the data, not to fit a line exactly through
every point (5). The points in a bivariate model, the regression line, and the error combine to
make a general linear model (5). The basic equation for a GLM is y = b0 + bx + e, where b0 is
the intercept and e is the error. Each of these terms can stand for a set of variables, which is
important when conducting a scientific experiment. “X” represents the treatment, or the thing
being tested; multiple “x”s represent multiple treatments (5). The “y” in the equation
represents the outcome, and “b” is an estimate of the relationship between “x” and “y” (5).
Estimating “b” statistically allows the experimenters to test their hypothesis, and it also
allows them to examine the relationship between the multiple groups they are testing (5). The
second method the authors refer to is the Generalized Additive Model (GAM), which uses multiple
regressions and algorithms (4). The final statistical analysis they use is a non-linear,
regression-tree-based method, the random forest (RF) (4). This method has been used in other
ecological disciplines (forestry), but had yet to be used to predict the eutrophication of
freshwater systems (4). The authors are the first to use it this way.
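As a minimal sketch of the bivariate case described above, the following Python code estimates b0 and b in y = b0 + bx + e and then computes the error e for each point. Ordinary least squares is assumed as the fitting method, and the data values and function names (fit_line, residuals) are hypothetical; none of them come from the study.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares estimates of b0 (intercept) and b (slope) in y = b0 + b*x + e."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    b0 = y_bar - b * x_bar
    return b0, b


def residuals(xs, ys, b0, b):
    """Error term e for each point: observed y minus the value on the fitted line."""
    return [y - (b0 + b * x) for x, y in zip(xs, ys)]


# Hypothetical data: x is a predictor, y is the measured outcome.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

b0, b = fit_line(xs, ys)
e = residuals(xs, ys, b0, b)
print(f"b0 = {b0:.2f}, b = {b:.2f}")
```

The fitted line summarizes the points without passing exactly through each one; the leftover differences are the errors stored in e.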
Methods
The authors used lakes in the IDF region of France to conduct their experiment. The IDF
region covers an area of 12,011 km² in northern France and houses more than 19% of France’s
population. The authors used the Carthage 3.0 database to identify 990 surface water bodies
in this region. They then restricted the study to water bodies with a surface area greater than
5 hectares. The authors felt that these lakes and reservoirs had the greatest value to the
populace and that, because of all of the recreational activities that take place in them, they
should be maintained and preserved. All of their chosen sites were man-made. Most of them are
sand and gravel quarries that had been used in the 1940s and the 1980s. Some came about from
peat extraction in the mid-1800s. The older water bodies, the sites that date back to the 17th
and 18th centuries, were used as water supplies by Versailles and Paris (4).
The authors used a stratified sampling strategy to reduce bias. Different sites have
different environmental conditions; the altitude, land use, and hydrology affecting each site
vary, hence the stratified sampling. For each shallow water body, three stations were set
equidistant from each other. This standard was set to capture horizontal heterogeneity
within each of the selected sites. To prevent further bias in the data, the authors took
individual seasons into account. They conducted four sampling campaigns, or SCs, on each of the
fifty lakes and reservoirs: in the summer of 2006, the winter of 2007, the spring of 2007, and
the autumn of 2007. Each campaign lasted no longer than two weeks; the authors wanted to keep
each campaign as short as possible. This also helped to reduce variability caused by weather
and by influxes of nutrients (4).
The authors removed two lakes from the data set. Both of these sites were affected by
point source pollution and were immediately thrown out. Triel pond, the first site, was used to
store sludge from the dredging of the Seine River, which caused its impairment. The second site,
the Gazeran pond, was used for sewage dumping. The data from both sites showed extremely high
levels of both dissolved phosphorus and total phosphorus. At Triel pond the average total
phosphorus was 23.4 ± 9.6 µM and the dissolved phosphorus was 20.4 ± 7.7 µM. At the Gazeran
pond the total phosphorus was 24.7 ± 8.3 µM and the dissolved phosphorus was 9.5 ± 3.7 µM (4).
The surrogate used for eutrophication in this experiment was chlorophyll-a (4). The
authors felt that chlorophyll-a and nutrient loading were directly correlated, and thus set it
as the standard. They measured chlorophyll-a concentrations fluorometrically, using a portable,
submersible FluoroProbe. Chlorophyll-a data were obtained by profiling the water column at
each of the sampling stations. The values obtained at each of the three stations were then
averaged to produce a single number. The authors then transformed the data using the natural
log or log10 to stabilize the variance and reduce statistical errors. To perform the catchment
delineation, the authors used a sampling rate of one point every 50 m, the BD Alti database, and
several mapping software packages. To correct any errors they used a map from the IGN (the
French national geographic institute) at a scale of 1:25,000 (4).
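The station averaging and log transformation described in this paragraph can be sketched as follows. This is only an illustration under stated assumptions: the readings are invented, the units are assumed to be micrograms per litre, the lake names and the function name lake_log_chla are hypothetical, and log10 is used here although the authors applied either the natural log or log10.

```python
import math
from statistics import mean


def lake_log_chla(station_values):
    """Average the station readings for one lake, then take log10 of the average."""
    average = mean(station_values)
    return math.log10(average)


# Hypothetical chlorophyll-a readings (assumed micrograms per litre), three stations per lake.
readings = {
    "lake_a": [10.0, 10.0, 10.0],
    "lake_b": [90.0, 100.0, 110.0],
}

log_chla = {name: lake_log_chla(values) for name, values in readings.items()}
for name, value in log_chla.items():
    print(name, round(value, 3))
```

On the log scale a tenfold difference in concentration becomes a difference of one unit, which keeps a few very productive lakes from dominating the variance.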
The authors then defined their predictor variables, which would be used in the GAMs,
GLMs, and regression trees previously mentioned. The ratio between the size of the catchment
and the size of the water body was used as one of their variables. They defined land use as
another predictor, with forest, agriculture, and impervious cover (pavement, concrete, etc.) as
the named variables. They also used the density of the drainage network, mean depth, and
landscape placement. Drainage density was used because of its ability to affect nutrient
transport. Mean depth modulates the nutrient buffering capacity of a water body and the ability
of algal colonies to photosynthesize, because primary production is influenced by sedimentation
and by the penetration of light through the water column. Mean depth also influences mixing.
Altitude was another predictor, as was the change of season (4).
The statistical analyses were performed using the R statistical environment, version
2.7.2. Prior to the analysis the authors checked the data for spatial dependency. If left
unchecked, spatial dependency would create bias and would inflate the apparent predictive
power. To create the three models they first defined a function: biomass was modeled as a
function of both landscape and hydromorphological features (the predictor variables
mentioned above). Once the function was determined they made their predictions. To assess
the predictive power of each of the models they used a cross-validation procedure. To further
assess accuracy, the authors calculated the R² between observed and predicted values and
also compared them using Cohen’s kappa (4).
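The evaluation steps named above can be sketched in a few lines of Python. This is a hedged illustration, not the authors' procedure: a simple k-fold split is assumed for the cross-validation, R² is computed as 1 minus the ratio of residual to total sum of squares (the study may instead have used the squared correlation), and all values, class labels, and function names are hypothetical.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k (train, test) pairs; each index is held out once."""
    folds = []
    for i in range(k):
        test = list(range(i, n, k))
        held_out = set(test)
        train = [j for j in range(n) if j not in held_out]
        folds.append((train, test))
    return folds


def r_squared(observed, predicted):
    """Coefficient of determination between observed and predicted values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def cohens_kappa(observed, predicted):
    """Agreement between two sets of class labels, corrected for chance agreement."""
    n = len(observed)
    p_observed = sum(o == p for o, p in zip(observed, predicted)) / n
    obs_counts, pred_counts = Counter(observed), Counter(predicted)
    p_chance = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical example values (not data from the study).
folds = k_fold_indices(6, 3)
r2 = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
kappa = cohens_kappa(
    ["eutrophic", "eutrophic", "mesotrophic", "mesotrophic"],
    ["eutrophic", "eutrophic", "mesotrophic", "eutrophic"],
)
print(folds[0], r2, kappa)
```

R² scores the continuous predictions, while kappa scores how well the predicted trophic classes match the observed classes beyond what chance alone would give.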
Results
The hypothesis was in fact supported by this experiment: the trophic status of lakes and
reservoirs can be predicted. The RF gave the best predictions, followed by the GAM. The GLM
was outperformed by both the RF and the GAM. The RF also showed the lowest variability, making
this method the most robust. These results are supported by the authors’ use of Cohen’s kappa
as well as the cross-validation. The authors supported their hypothesis while obtaining more
hydrological information about the IDF region of France (4).
Discussion
Seventy percent of the water bodies in the IDF region were classified as either eutrophic
or hypereutrophic, according to their corresponding chlorophyll-a concentrations. These
results, however, were not surprising according to the authors. The IDF region is one of the
most populated regions in all of Europe. The size and location of these lakes were also factors
that contributed to the high levels of nutrients found in the water bodies. Many of these
waters were small, with a mean surface area of 0.22 km². Also, most of these lakes and
reservoirs were isolated from the river network, making the waters more sensitive to
nutrient loading (4).
Predicting water quality on a large scale remains a challenge for managers. The GLM
can be used to predict eutrophication; however, it is not the best model for complex systems
involving multiple predictors. The experimental findings suggest that non-linear methods are
the best. The GAM outperformed the GLM, but did not do better than the RF model. The GLM and
GAM have a limited ability in complex situations, whereas the power of the non-linear methods
comes from their ability to account for interactions among predictors. An example of this can
be seen in heterogeneous landscapes, where the effects of some predictors depend on the values
of others. It is commonly known that water quality is affected by land use. Different land uses
result in varying influxes of nutrients. When nutrients come from two landscapes, the diversity
within them can lead to different relationships between the response variables and the
predictors (4).
The RF model is the best method to predict eutrophication. This method is also the most
robust. Unlike the other models, the RF model is not restricted to monotonic relationships,
that is, relationships in which the response only increases or only decreases as “x” (the
predictor variable) increases; a linear relationship is one example. The RF model can describe
multimodal relationships, meaning that the response can have more than one peak across the
range of a predictor, and it is not limited to a fixed mathematical form. This allows for the
statistical comparison of multiple predictors (see figure) (4).
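The difference between a straight-line fit and a tree-style fit on a non-monotonic relationship can be illustrated with a small Python sketch. It is not a random forest and not the authors' code: it compares a least-squares line with a piecewise-constant predictor (the kind of step function a single regression tree produces), using cut points chosen by hand rather than learned, on invented hump-shaped data. All names and values are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares intercept and slope for a straight line."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - slope * x_bar, slope


def piecewise_constant(xs, ys, cuts):
    """Predict the mean response within each interval defined by the cut points."""
    def interval(x):
        return sum(x > c for c in cuts)

    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(interval(x), []).append(y)
    means = {key: mean(values) for key, values in groups.items()}
    return [means[interval(x)] for x in xs]


def sse(ys, predictions):
    """Sum of squared errors between observed and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(ys, predictions))


# Hypothetical hump-shaped response: it rises, peaks, then falls (not monotonic).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [1.0, 4.0, 6.0, 7.0, 6.0, 4.0, 1.0]

intercept, slope = fit_line(xs, ys)
line_sse = sse(ys, [intercept + slope * x for x in xs])
tree_sse = sse(ys, piecewise_constant(xs, ys, cuts=[2.5, 5.5]))
print(f"slope = {slope:.3f}, line SSE = {line_sse:.2f}, step SSE = {tree_sse:.2f}")
```

On data like these the best straight line is flat and explains none of the variation, while the step function follows the rise and fall. A random forest averages many such trees, each with cut points learned from the data.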
Today very few tools exist to assess anthropogenic impacts on water quality. Lakes and
reservoirs provide important features for both humans and biota, and should be properly
managed and conserved. The main approaches used are based on concentrations of nitrogen
and phosphorus and come from large data sets. These approaches on a large scale are often
costly and inaccurate. The methods offered in this study are low in cost and offer a regional
model based on environmental characteristics. Once the regional models have been calibrated
to their corresponding regions, predictive models can be made to help managers maintain or
improve water quality standards when the landscape is changed (4).
Works Cited
1. Miller, T. G. (2007). Living in the Environment. Belmont: Thomson Learning, Inc.
2. Walter, D. (2008). Study examines economic impact of fresh water pollution. Clean Water
Report.
3. Hein, L. (2005). Cost-efficient eutrophication control in a shallow lake ecosystem subject to two
steady states. Ecological Economics, 429.
4. Catherine, A., Mouillot, D., Escoffier, N., Bernard, C., & Troussellier, M. (2010). Cost effective
prediction of the eutrophication status of lakes and reservoirs. Freshwater Biology, 2425-2435.
5. Trochim, W. M. (2006). Research Methods: Knowledge Base. Retrieved February 9, 2011, from
General Linear Model: http://www.socialresearchmethods.net/kb/genlin.php