
The Variability and Forecasting of
Treatment Plant Influent Water Quality
Erin Towler
Class Project - CVEN 6833
December 19, 2005
INTRODUCTION
As new drinking water regulations come into effect, utilities are faced with the
challenge of meeting new regulations without jeopardizing their compliance with existing
regulations. This is a formidable task, considering financial constraints coupled with the
fact that water sources are becoming scarcer and often more polluted. To this end, an
electronic decision tool has been proposed that will help utilities to strategize
simultaneous compliance methods. The decision tool will need to consider a utility’s
characteristics, including its influent water quality and treatment processes. This paper is
concerned with characterizing the variability in treatment plant influent water quality and
forecasting future values. This is important in deciding how to simulate influent water
quality parameters that will be sent through the decision tool.
This paper compiles relevant diagnostic information on influent water quality on a
national level. First, nationally averaged data were examined to determine the
relationships between variables. Next, the spatial variability of key influent water
parameters was examined. Finally, the perspective shifts to nine utilities in the
Colorado Front Range, where a principal component analysis was conducted to try to
resolve the spatial-temporal variability and to forecast future pH and alkalinity values.
DATASET
This paper utilizes the data from the United States Environmental Protection
Agency’s (EPA) monitoring program, called the Information Collection Rule (ICR). The
ICR’s Auxiliary 1 database contains eighteen months of historical data (July 1997 –
December 1998) from over 400 utilities from around the United States. The spatial
distribution of the continental water utilities that participated in the ICR is shown in
Figure 1. The database contains a large amount of data regarding influent, intermediate,
and finished water quality parameters as well as information about the processes of each
treatment plant.
Figure 1 Spatial distribution of water utilities that participated in the Information Collection Rule
(ICR).
METHODOLOGY
The analyses in this paper were completed using Microsoft Access and
the statistical program R. For all analyses that involved averages, only the
available data were averaged; any missing values were skipped. In all of the contour
plots, annual averages were computed only for utilities that had at least 8 of 12 months of
complete monthly data, unless noted. In analyses where all 18 months were needed, any
missing values were replaced with the average over the remaining months.
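The missing-data rules above can be sketched as follows. This is a minimal illustration in Python rather than the Access/R workflow used for the paper; the function names and the sample values are hypothetical, and missing months are assumed to be coded as None.

```python
from statistics import mean

def mean_skip_missing(values):
    """Average the available values, skipping missing ones (None)."""
    present = [v for v in values if v is not None]
    return mean(present) if present else None

def annual_mean(monthly, min_months=8):
    """Annual average, returned only if at least `min_months` months are present."""
    present = [v for v in monthly if v is not None]
    if len(present) < min_months:
        return None
    return mean(present)

def fill_with_series_mean(series):
    """Replace missing months with the average of the remaining months."""
    fill = mean_skip_missing(series)
    return [fill if v is None else v for v in series]

# Hypothetical monthly alkalinity values with three missing months.
monthly = [100.0, None, 110.0, 90.0, None, 100.0,
           105.0, 95.0, 100.0, None, 100.0, 100.0]
print(annual_mean(monthly))                    # 9 of 12 months are present
print(annual_mean(monthly[:6] + [None] * 6))   # too few months, so None
print(fill_with_series_mean([1.0, None, 3.0]))
```

Filling gaps with the series mean keeps every month in the matrix needed for the PCA, at the cost of slightly damping the variability of any utility with many gaps.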
To examine parameter relationships, both the parametric linear correlation and the
nonparametric mutual information were calculated. To look at the spatial
variability of parameters, a local polynomial was fit to the data and contoured. A further
discussion of the local polynomial method can be found in Loader (1999). The principal
component analysis (PCA) was completed following von Storch and Zwiers (1999). The
forecasting aspect of the PCA proceeded by fitting a local polynomial (locfit) relationship
between the first PC of alkalinity and the first PC of pH. The first PC of alkalinity was
simulated using the k-nearest neighbor method. Only one simulation of 18 months was
completed, and only the last 12 months were used (so as to simulate from where the original
data left off). Then, based on the previously fit local polynomial relationship, the
corresponding values of the first PC of pH were calculated. The other PCs were bootstrapped,
and the whole PC matrix was multiplied by the corresponding eigenvectors to return to the
original space. See Regonda et al. (2005) for more details.
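The k-nearest neighbor simulation step can be illustrated with a short sketch. This is a generic lag-1 neighbor resampler written for illustration, not the code used for the paper: the function name, the choice of k = 3, the 1/rank selection weights, and the sample PC values are all assumptions.

```python
import random

def knn_resample(series, n_steps, k=3, seed=0):
    """Simulate n_steps values by repeatedly resampling the successor
    of one of the k historical values closest to the current value."""
    rng = random.Random(seed)
    # Assumed weighting: closer neighbours are more likely to be picked.
    weights = [1.0 / rank for rank in range(1, k + 1)]
    current = series[-1]          # start from where the record ends
    simulated = []
    for _ in range(n_steps):
        # Only values that have a successor can serve as neighbours.
        candidates = sorted(range(len(series) - 1),
                            key=lambda i: abs(series[i] - current))[:k]
        chosen = rng.choices(candidates, weights=weights[:len(candidates)])[0]
        current = series[chosen + 1]
        simulated.append(current)
    return simulated

# Hypothetical first-PC values for 18 months (not taken from the ICR data).
pc1 = [2.1, 1.4, 0.3, -0.8, -1.9, -2.4, -1.7, -0.6, 0.5, 1.6, 2.3, 2.0,
       1.2, 0.1, -1.0, -2.0, -2.5, -1.5]
forecast = knn_resample(pc1, n_steps=12)
print(forecast)
```

Because every simulated value is drawn from the historical record, a resampler of this kind cannot produce values outside the observed range, which is one reason a short record limits the forecast.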
RESULTS
Variable Relationships
Linear correlations and mutual information (MI) were computed for all of the
variables in order to determine whether there were relationships between influent
parameters. If the linear correlation or the MI indicated a relationship, scatterplots for
that pair of variables were examined. Figure 2 shows the scatterplots for the relationships
between alkalinity, calcium hardness, total hardness, and pH. Strong correlations were
found between total hardness and calcium hardness, total hardness and alkalinity, and
calcium hardness and alkalinity. This is what one would expect, since calcium hardness
is a major component of total hardness, and both are commonly associated with the
carbonate species that make up alkalinity. A weaker, but significant, relationship was also
found between pH and alkalinity, as well as between pH and calcium hardness and between pH
and total hardness. In all of the plots that include pH, a strong nonlinear relationship is
observed. Of note are the very low values of alkalinity that correspond to low pH values;
when providing ensembles for the decision tool, these values may need to be generated
together. Figure 3 shows the correlations between TOC, UV254, and temperature. TOC and UV254
are strongly correlated, as one would expect since UV254 absorbance is commonly used as a
surrogate for organic carbon. Temperature also showed some correlation with TOC and with
UV254, but only in the linear correlation (not in the MI).
This indicates that the relationship with temperature needs to be further examined,
perhaps by looking at the relationship between monthly TOC and monthly temperature (i.e.
not averaged values). As a point of confirmation, the results found in this section are
consistent with other findings (Zachman 2005), and these relationships shape the way
the rest of the analyses are approached. Complete correlation tables can be seen in
Appendix A.
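To illustrate why both measures were used, the sketch below computes a linear correlation and a simple histogram-based mutual information estimate for a symmetric, nonlinear relationship. The binned estimator, the number of bins, and the synthetic data are assumptions made for this example; the estimator used for the tables in Appendix A may differ (a plug-in histogram estimate, for instance, is never negative).

```python
import math
from collections import Counter

def pearson(x, y):
    """Parametric linear (Pearson) correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def _bin_index(v, lo, hi, bins):
    """Index of the equal-width bin that contains v."""
    if hi == lo:
        return 0
    return min(int((v - lo) / (hi - lo) * bins), bins - 1)

def mutual_information(x, y, bins=4):
    """Histogram (plug-in) estimate of mutual information, in nats."""
    n = len(x)
    bx = [_bin_index(v, min(x), max(x), bins) for v in x]
    by = [_bin_index(v, min(y), max(y), bins) for v in y]
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())

# Synthetic data: y depends strongly on x, but not linearly.
x = [i / 10 for i in range(-20, 21)]
y = [v * v for v in x]
print(round(pearson(x, y), 3), round(mutual_information(x, y), 3))
```

Here the linear correlation is essentially zero while the mutual information is clearly positive, which is the kind of dependence the nonparametric measure is meant to catch.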
Figure 2 Scatterplots of related variables: alkalinity, calcium hardness, total hardness, and pH. Uses
last 12 months of averaged data from continental US.
Figure 3 Scatterplots of related variables. UV_254 and TOC are highly related, while temperature
and TOC and temperature and UV_254 are less related. Uses last 12 months of averaged data from
continental US.
Spatial Variability
By creating contour plots of the average influent water quality parameters using
the local polynomial method, geographic trends can be seen.
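The idea behind the local polynomial method can be shown with a one-dimensional local linear fit. This is only a sketch: the paper's fits were made in two dimensions (longitude and latitude) with locfit in R, whereas the function below, its nearest-neighbor fraction alpha, the tricube weights, and the sample transect are assumptions chosen for illustration.

```python
def tricube(u):
    """Tricube weight: largest at u = 0, zero for |u| >= 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def local_linear(x0, xs, ys, alpha=0.5):
    """Weighted straight-line fit around x0, evaluated at x0."""
    k = min(len(xs), max(3, int(round(alpha * len(xs)))))
    # Bandwidth: distance to the k-th nearest point, slightly inflated
    # so that this point keeps a small positive weight.
    h = 1.0001 * sorted(abs(x - x0) for x in xs)[k - 1]
    if h == 0.0:
        h = 1.0
    w = [tricube((x - x0) / h) for x in xs]
    s0 = sum(w)
    s1 = sum(wi * (x - x0) for wi, x in zip(w, xs))
    s2 = sum(wi * (x - x0) ** 2 for wi, x in zip(w, xs))
    t0 = sum(wi * y for wi, y in zip(w, ys))
    t1 = sum(wi * (x - x0) * y for wi, x, y in zip(w, xs, ys))
    det = s0 * s2 - s1 * s1
    if abs(det) < 1e-12:
        return t0 / s0            # fall back to a weighted mean
    return (s2 * t0 - s1 * t1) / det

# Hypothetical transect: position along a line vs. average alkalinity.
position = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
alkalinity = [20.0, 25.0, 22.0, 40.0, 60.0, 90.0, 110.0, 105.0, 120.0, 80.0, 60.0]
print([round(local_linear(p, position, alkalinity), 1) for p in (2.5, 5.0, 7.5)])
```

A smaller alpha follows local values more closely, while a larger alpha gives the smoother surface that is better for seeing general trends.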
Alkalinity and pH
Figure 4 shows a contour plot of the fitted average alkalinity over the last 12
months of the study. Alkalinity often comes from the leaching of calcium carbonate from
rocks and soil. High levels of calcium carbonate are found in limestone, and in general,
the eastern part of the United States has higher alkalinity than the western US because
there is more limestone. The upper northeast has lower alkalinity because much of the
limestone in that area has been scoured away by glaciers. This is consistent with what we
see in Figure 4. Contour plots of calcium hardness and total
hardness would be expected to show the same pattern, based on the high correlations seen
in the previous section. The fitted alkalinity surface can also be compared with a map of
the interpolated raw data in Figure 5. The raw-data map is less smooth, but allows one to
focus on local values, while the fitted surface is better for seeing general trends.
Figure 6 shows the standard errors from the fit; notice that the errors are higher at the
edges of the original data. In addition, there are some higher errors in the interior.
Figure 4 Average alkalinity in ppm CaCO3 over the last 12 months of the study.
Figure 5 Contour plot of the interpolated raw data for average alkalinity.
Figure 6 Plot of the standard errors from locfit for average alkalinity.
A plot of the standard deviation of annual alkalinity can be seen in Figure 7. The
highest standard deviation can be seen below the Great Lakes, in Illinois, Iowa, and
Missouri. This plot shows that, although there is some seasonality, alkalinity generally
does not vary much from season to season.
Figure 7 Standard deviation of annual alkalinity.
The first eigenvector of alkalinity was also examined in order to see if the
space/time decomposition was consistent with the previous findings. This can be seen in
Figure 8. One can see negative weights along much of the east coast and positive
weights in the central United States. The western United States is mostly neutral, but
with some positive weights.
Figure 8 Plot of the first eigenvector of alkalinity.
Finally, the local polynomial model was checked to see how good the fit was.
The observed alkalinity is compared to the cross-validated estimate of the raw data in
Figure 9. There is a fair amount of scatter, especially in the higher observed alkalinities.
Figure 10 shows a 3D plot of the observed versus the cross-validated estimates, again
showing that the model does not capture all of the observed alkalinities, especially in the
higher range.
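The cross-validation check can be sketched as a leave-one-out loop: each utility is dropped in turn and estimated from the remaining ones. In the sketch below, a simple mean of the three nearest sites stands in for the local polynomial fit; that stand-in smoother, the function names, and the site coordinates and values are assumptions for illustration only.

```python
def loo_cross_validate(fit_predict, points, values):
    """Leave-one-out: estimate each site from all of the other sites."""
    estimates = []
    for i in range(len(points)):
        train_p = points[:i] + points[i + 1:]
        train_v = values[:i] + values[i + 1:]
        estimates.append(fit_predict(train_p, train_v, points[i]))
    return estimates

def knn_mean(train_p, train_v, target, k=3):
    """Stand-in spatial smoother: mean of the k nearest sites."""
    def sq_dist(j):
        return ((train_p[j][0] - target[0]) ** 2
                + (train_p[j][1] - target[1]) ** 2)
    nearest = sorted(range(len(train_p)), key=sq_dist)[:k]
    return sum(train_v[j] for j in nearest) / len(nearest)

def rmse(observed, estimated):
    """Root mean squared difference between observations and estimates."""
    n = len(observed)
    return (sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n) ** 0.5

# Hypothetical (longitude, latitude) sites and average alkalinity values.
sites = [(-105.0, 40.0), (-104.8, 39.7), (-104.6, 38.3),
         (-105.1, 40.6), (-104.8, 38.8)]
alk = [30.0, 45.0, 120.0, 28.0, 35.0]
cv = loo_cross_validate(knn_mean, sites, alk)
print(cv, rmse(alk, cv))
```

Plotting the cross-validated estimates against the observations, as in Figure 9, shows where the smoother misses; isolated high values are typically pulled toward their neighbors.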
Figure 9 Cross-validated estimates of the raw data with a one-to-one line overlaid.
Figure 10 3D scatterplot of the observed alkalinity (filled red circles) and the cross-validated estimate
of the alkalinity (empty black circles).
The contour plot of the fitted average pH over the last 12 months of the study can
be seen in Figure 11. Based on the relationship that was identified in the previous
section, one would expect that the trends in standard errors would be similar.
Figure 11 Average pH over the last 12 months of the study.
Total Organic Carbon (TOC)
TOC averages over the last 12 months of the study were also examined for their
spatial variability. TOC concentrations differ greatly between surface water and
groundwater sources, warranting a separate examination. Figure 12 shows the surface
water contour plot of TOC concentrations. One can notice the higher contour values
corresponding to the center of the US, which is largely agricultural and probably prone to
erosion. Also, Florida has much higher TOC values than the rest of the country. This
can loosely be attributed to the “swampy” nature of Florida. Figure 13 shows analyses
that had been done previously on the same ICR data. This shows average TOC by state.
In general, the plots show the same trends, but the local polynomial fit is not bound by
arbitrary state lines and interpolates between utilities. The state-averaged map does do a
nice job of showing where there are no data, whereas the local polynomial map extrapolates
concentrations into the north central part of the country, where there are no data. Figure 14
shows the groundwater TOC concentrations. Here, the values are generally lower than in
the surface waters. However, Florida again shows very high TOC values. Similar
information is gained from the state-averaged map in Figure 15, but it lacks interpolation
capability.
Figure 12 Local polynomial contour plot of the surface water sources: Last 12 months of study
average TOC (ppm). Utilities contributing data are overlaid.
Figure 13 Surface water plot, last 12 months of study average TOC (ppm) from Cadmus online
http://www.cadmusonline.net/twg/epaweb3/sect3/Sect3Q1/Quest1.asp?Anal=TOC&Out=Map
Figure 14 Local polynomial contour plot of the groundwater sources: last 12 months of study average
TOC (ppmC). Utilities contributing data are overlaid. Utilizes all locations, regardless of the
number of complete months of data. See Appendix for map of locations that use at least 8 months of
data.
Figure 15 Groundwater last 12 months of study average TOC (ppm) from Cadmus online
http://www.cadmusonline.net/twg/epaweb3/sect3/Sect3Q1/Quest1.asp?Anal=TOC&Out=Map
Contour plots of standard deviations for surface water and groundwater are shown
in Figure 16 and Figure 17, respectively. For surface water, the central US has higher
standard deviations. In addition, one can see that Florida has the highest standard
deviations (see footnote 1). Less information is gained from the groundwater contour plot,
since the locations are sparse. Higher standard deviations are found in Florida, but
careful examination shows that the standard deviations are lower in nearly all the places
where there are data.
Figure 16 Surface water standard deviation contour map.
Figure 17 Groundwater standard deviation contour map.
Footnote 1: To show the higher standard deviations, some unequally spaced contour lines were added.
Bromide
Bromide was examined in the context of spatial variation. In general, bromide
concentrations (in ppm of bromide) are relatively low, with increases in the southwest and
Texas, as well as in the northeast.
Figure 18 Average bromide concentration in ppm bromide over the last 12 months of the study.
Turbidity
Turbidity was also examined in the context of spatial variation. However, this
analysis did not give results that were as conclusive. The range in turbidity was great,
with a number of outliers that were not geographically related. Rather, the water sources
with the highest turbidity were typically large, widely used rivers such as the Missouri,
the Mississippi, the Rio Grande, and the Ohio, among others. Thus, the spatial
analysis was not all that useful, and turbidity will likely need to be examined on a
case-by-case basis. Nevertheless, Figure 19 shows the average turbidity for all
utilities whose average turbidity value was less than 25 NTU. Forty-six utilities were
excluded by this criterion. To further illuminate this dataset, Figure 20 shows a
histogram of the data used to generate Figure 19. Again, it should be noted that the
majority of turbidities are extremely low, and that one may need to take a closer look at
turbidity before making any generalizations.
Figure 19 Average turbidity in NTU over the last 12 months of the study (only utilities with average
turbidity values less than 25 NTU were included).
Figure 20 Histogram of values used in Figure 19.
PCA Analysis
Nine of the Front Range utilities in Colorado were examined in a PCA analysis
that looked at alkalinity and pH. The nine utilities were broken into two groups based on
differences in pH and alkalinity, shown in Figure 21 and Figure 22. Group 1 (the left
group) consists of two utilities in Aurora, one Denver utility, and one Pueblo utility.
Group 2 (the right group) consists of Boulder, one Denver utility, two Colorado Springs
utilities, and a Fort Collins utility. This was just done to aid in viewing the PCA and
forecast results.
Figure 21 Left shows “Group 1” and right shows “Group 2” alkalinity time series.
Figure 22 Left shows “Group 1” and right shows “Group 2” pH time series.
Figure 23 shows the 9 utilities that were used, broken into two groups based on
differences in pH and alkalinity shown in Figure 21 and Figure 22. The red triangles are
Group 1, and the blue circles are Group 2.
Figure 23 Blue circles (Group 2) include Fort Collins, Boulder, 1 Denver plant, and 2 Colorado
Springs plants (which are on top of one another). Red triangles symbolize 2 Aurora plants, Pueblo,
and 1 Denver plant (Group 1).
A PCA was completed that used all of the data for the nine utilities (Groups 1 and
2 were combined). The eigen spectra were computed and can be seen in Appendix B.
The first three PCs of pH and of alkalinity were plotted against one another. This
can be seen in Figure 24. Across all of the scatterplots, a nonlinear relationship similar
to the one noted earlier between pH and alkalinity is seen in the scatterplot of the first
PCs of pH and alkalinity. Therefore, the rest of the PC analysis is limited to the leading
PCs.
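The PCA steps (centering, covariance, leading eigenvector, PC time series, and the return to the original space) can be sketched with a small example. This is a simplified illustration, not the R analysis used for the paper: it extracts only the leading component by power iteration, and the function name and the months-by-utilities matrix are hypothetical, constructed so that a single shared seasonal pattern explains all of the variance.

```python
import math

def pca_leading(data, iters=200):
    """Leading principal component of a (months x utilities) matrix."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Power iteration; the starting vector is arbitrary, and the sign
    # of the resulting eigenvector is therefore arbitrary as well.
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    explained = eigenvalue / sum(cov[i][i] for i in range(p))
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centered]
    return means, v, scores, explained

# Hypothetical alkalinity for 3 utilities sharing one seasonal pattern.
season = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 2, 1]
data = [[100.0 + 10 * s, 50.0 + 5 * s, 80.0 + 8 * s] for s in season]
means, vec, scores, explained = pca_leading(data)
# Return to the original space: mean + PC value x eigenvector weight.
recon = [[means[j] + s * vec[j] for j in range(3)] for s in scores]
print([round(x, 3) for x in vec], round(explained, 3))
```

With real data the leading component explains only part of the variance, which is why the remaining PCs also have to be carried (here, bootstrapped) before transforming back.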
Figure 24 Scatterplots of the first three leading components of pH and alkalinity with locfit lines
through the points.
The first eigenvector of alkalinity is plotted spatially in Figure 25 and zoomed in
Figure 26. This shows that most of the locations contribute negative weights to the first
PC, with contours changing in a northwest to northeast direction. The first eigenvector of
pH is plotted spatially in Figure 27 and zoomed in Figure 28. Again, most of the
locations contribute a negative weight to the first PC, with contours changing in a west
to east direction.
Figure 25 First eigenvector of alkalinity.
Figure 26 First eigenvector of alkalinity, zoomed in (note the “uneven” scale).
Figure 27 First eigenvector of pH.
Figure 28 First eigenvector of pH, zoomed in (note the “uneven” scale).
Figure 29 and Figure 30 show the time series of the PC values and their corresponding
spectrum. As we would expect with such limited data, the only pattern being picked up is
the annual cycle. This can be seen in the spectrum peak around 1 cycle per year.
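A raw periodogram illustrates why only the annual cycle can be resolved from 18 monthly values. The sketch below is not the smoothed spectrum shown in the figures; the function name and the synthetic annual-cycle series are assumptions for illustration.

```python
import cmath
import math

def periodogram(series, samples_per_year=12):
    """Raw periodogram as (frequency in cycles per year, power) pairs."""
    n = len(series)
    mean = sum(series) / n
    x = [v - mean for v in series]
    result = []
    for k in range(1, n // 2 + 1):
        coef = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                   for t in range(n))
        result.append((k * samples_per_year / n, abs(coef) ** 2 / n))
    return result

# Synthetic 18-month series containing only an annual cycle.
series = [math.cos(2 * math.pi * t / 12) for t in range(18)]
spectrum = periodogram(series)
peak_freq = max(spectrum, key=lambda fp: fp[1])[0]
print(peak_freq)
```

With 18 monthly values the frequencies are spaced 12/18, or about 0.67, cycles per year apart, so an annual signal can only be located near 1 cycle per year, and anything slower than the record length cannot be seen at all.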
Figure 29 First PC plot and smooth spectrum for alkalinity.
Figure 30 First PC plot and smooth spectrum for pH
Using the forecast method described in the methodology section, the 12 months
following the historical data were forecast for all nine utilities. Next, the two groups
were separated (again, just for visual purposes), and the resulting forecasts are shown to
the right of the vertical line in Figure 31 and Figure 32. The top plot in each figure is
Group 1 and the bottom plot is Group 2. In each figure, the 18 months of original
data are shown first, followed by the 12 forecast months. One can see that each
forecast time series (coded by color) seems to follow the same subtle seasonal trend as
the original data.
Figure 31 Group 1 (top) and Group 2 (bottom) historical and forecast pH. The original data are to
the left of the vertical line and the forecast is to the right of the vertical line. Different point colors
correspond to different utilities within the group.
Figure 32 Group 1 (top) and Group 2 (bottom) historical and forecast alkalinity. The original data are
to the left of the vertical line and the forecast is to the right of the vertical line. Different point colors
correspond to different utilities within the group.
CONCLUSIONS
The field of drinking water quality has not utilized many of the advanced
statistical methods that have been used in other branches of water resources. The aim of
this paper was to experiment with some of these methods, such as the local
polynomial method and principal component analysis, in the field of water quality. The
local polynomial method allowed spatial trends to be viewed across the United States. This
will be useful in pinpointing areas that might be susceptible to certain regulation
violations. The ability to forecast using the PCA was of more limited value. The benefit was
being able to forecast water quality data over multiple locations at the same time.
However, because the time series are short (18 months), the analysis was only able to
capture the annual cycle, and was not useful in showing long-term trends or any
hydroclimatic connections, such as with ENSO.
This paper has shown that there is potential for the use of advanced
statistical methods in the field of drinking water. However, a limiting factor is the short
length of the time series. One recommendation for future study would be to utilize
datasets characterizing water quality in reservoirs. In places like Colorado where
snowmelt is the dominant source for reservoirs, concurrent snowmelt records could also
be obtained. Looking at longer records would be more likely to yield meaningful
conclusions in a time series analysis framework.
Appendix A
Table 1 Linear correlations of 18 months of averaged data for the continental US (upper triangle shown; the matrix is symmetric).
                 Alkalinity  Calcium   pH     Temp   TOC    Total     UV_254  Turbidity  Bromide  NH3_N
                             Hardness                       Hardness
Alkalinity       1           0.81      0.42   0.23   0.08   0.85      0.03    0.12       -0.02    0.30
Calcium Hardness -           1         0.35   0.23   0.15   0.89      0.08    0.15       0.03     0.24
pH               -           -         1      0.13   0.02   0.37      -0.05   0.21       -0.06    0.05
Temp             -           -         -      1      0.32   0.17      0.37    0.04       0.02     0.16
TOC              -           -         -      -      1      0.08      0.92    0.14       0.04     0.21
Total Hardness   -           -         -      -      -      1         0.00    0.14       0.03     0.20
UV_254           -           -         -      -      -      -         1       0.11       0.02     0.16
Turbidity        -           -         -      -      -      -         -       1          -0.01    0.11
Bromide          -           -         -      -      -      -         -       -          1        0.03
NH3_N            -           -         -      -      -      -         -       -          -        1
Table 2 Mutual information of 18 months of averaged data for the continental US (upper triangle shown).
                 Alkalinity  Calcium   pH     Temp   TOC    Total     UV_254  Turbidity  Bromide  NH3_N
                             Hardness                       Hardness
Alkalinity       -           0.63      0.37   0.12   0.07   0.69      0.04    -0.04      -0.10    0.025
Calcium Hardness -           -         0.27   0.08   0.05   0.76      0.03    -0.03      -0.08    0.01
pH               -           -         -      0.07   0.06   0.27      0.04    -0.02      -0.10    -0.02
Temp             -           -         -      -      0.10   0.05      0.10    -0.02      -0.09    0.01
TOC              -           -         -      -      -      0.04      0.50    0.00       -0.11    0.01
Total Hardness   -           -         -      -      -      -         0.02    -0.03      -0.07    -0.01
UV_254           -           -         -      -      -      -         -       0.00       -0.12    -0.01
Turbidity        -           -         -      -      -      -         -       -          -0.18    -0.09
Bromide          -           -         -      -      -      -         -       -          -        -0.16
NH3_N            -           -         -      -      -      -         -       -          -        -
Appendix B
Figure 33 Eigen spectrum for alkalinity PCs.
Figure 34 Eigen spectrum for pH PCs.
Figure 35 Local polynomial contour plot of the groundwater sources: last 12 months of study average
TOC (ppmC). Utilities contributing data are overlaid. Utilizes locations where there were 8 or more
complete months of data.
References
Loader, C. 1999. Local Regression and Likelihood. New York: Springer.
Regonda, S., B. Rajagopalan, M. Clark, and E. Zagona. 2005. Multi-model Ensemble Forecast
of Spring Seasonal Flows in the Gunnison River Basin. Water Resources Research (in
review).
von Storch, H. and F.W. Zwiers. 1999. Statistical Analysis in Climate Research. New York:
Cambridge University Press.
Zachman, B. 2005. Understanding and Predicting Natural Organic Matter Adsorption by
Granular Activated Carbon Adsorbers. Master of Science thesis, University of
Colorado, Boulder, CO.