6/15/2006 Linsey DeBell Cooperative Institute for Research in the Atmospheres

Interagency Monitoring of Protected Visual Environments
Colorado State University
Data Validation Historical Report Aerosol Sulfate
Data Start: March 1988
Data End: December 2003
Retrieved from the VIEWS Database on Various Dates: 11/2004-3/2005
Background
The standard IMPROVE sampler has four independent sampling modules: A, B, and C
collect PM2.5 particles (0-2.5 um), and D collects PM10 particles (0-10 um). Module A
utilizes a Teflon filter which is analyzed for gravimetric mass and elemental
concentrations by XRF and PESA and prior to 12/2001 by PIXE. Module B utilizes a
nylon filter and is analyzed primarily for anions by IC. Module C utilizes a quartz filter
and is analyzed for organic and elemental carbon by TOR carbon analysis.
IMPROVE analyzes the nylon filter for sulfate ([SO4] ug/m3) and the Teflon filter for
sulfur ([S] ug/m3) concentrations by IC and PIXE/XRF respectively. In our data
validation process it is assumed that all aerosol sulfur is in the form of sulfate and thus
3*[S] should equal [SO4] within measurement uncertainty. The most basic checks on
these measurements include scatter plots of [SO4] versus [S] and time series plots of
[SO4], [S] and [SO4]/[S].
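The basic check described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the IMPROVE validation code: the sample values and the function name `sulfate_to_sulfur_ratio` are hypothetical. The ideal value of 3 reflects the molar mass ratio of sulfate to sulfur (about 96/32).

```python
# Minimal sketch of the [SO4]/[S] consistency check.
# All concentrations below are made-up values in ug/m3, for illustration only.

IDEAL_RATIO = 3.0  # expected [SO4]/[S] if all aerosol sulfur is present as sulfate


def sulfate_to_sulfur_ratio(so4, s):
    """Return [SO4]/[S], or None when either value is missing or [S] is zero."""
    if so4 is None or s is None or s == 0:
        return None
    return so4 / s


# Hypothetical (sulfate, sulfur) sample pairs.
pairs = [(1.50, 0.50), (0.90, 0.33), (2.40, 0.75)]
ratios = [sulfate_to_sulfur_ratio(so4, s) for so4, s in pairs]

for (so4, s), ratio in zip(pairs, ratios):
    if ratio > IDEAL_RATIO:
        direction = "above 3"
    elif ratio < IDEAL_RATIO:
        direction = "below 3"
    else:
        direction = "at 3"
    print(f"[SO4]={so4:.2f} [S]={s:.2f} ratio={ratio:.2f} ({direction})")
```

In practice the comparison would be made within measurement uncertainty rather than against exactly 3.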
General Observations Based on Visual Inspection of Time Series
Site specific time series plots of [SO4], [S] and [SO4]/[S] indicate a substantial number of
sample pairs with [SO4]/[S] ratios far away from the ideal value of 3 and persistent time
periods where the central tendency of the ratio is either above or below 3. Most time
periods exhibit a dominant bias direction that is visible at most operational sites.
However, not all sites follow this network level pattern, possibly indicating that local
conditions are overriding some network level phenomena.
The time series can be broken into 9 time periods based on the dominant bias direction
across the network:
Table 1. Major trends in dominant bias direction observed in time series of [SO4]/[S]

Period  Affected Time Period    Dominant Bias Direction  Data Indicator
1       1988-1989               [SO4]>3*[S]              Central tendency of the [SO4]/[S] ratio appeared to be greater than 3 at all operational sites
2       1990-1994               [SO4]>3*[S]              Central tendency of the [SO4]/[S] ratio appeared to be greater than 3 at many sites
3       1995-early 1997         [SO4]<3*[S]              Central tendency of the [SO4]/[S] ratio appeared to be less than 3 at many sites
4       Early 1997-late 1997    No dominant direction    No clear network wide pattern to the typical central tendency of [SO4]/[S]
5       1998-mid 2000           [SO4]<3*[S]              Central tendency of the [SO4]/[S] ratio appeared to be less than 3 at many sites
6       Mid 2000-mid 2001       [SO4]>3*[S]              Central tendency of the [SO4]/[S] ratio appeared to be greater than 3 at most sites
7       Mid 2001-2002           [SO4]<3*[S]              Central tendency of the [SO4]/[S] ratio appeared to be less than 3 at many sites
8       Early 2003-mid 2003     [SO4]>3*[S]              Central tendency of the [SO4]/[S] ratio appeared to be greater than 3 at many sites
9       Mid 2003-late 2003      [SO4]<3*[S]              Central tendency of the [SO4]/[S] ratio appeared to be less than 3 at many sites

Monthly averages of the [SO4]/[S] ratio for the whole network confirm that the
generalized patterns observed in the site specific data validation charts and described in
Table 1 are indeed dominant across the network (Figure 1).
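A monthly network average of the ratio could be computed along the following lines. This is only a sketch: the record layout, the sample values, and the function name `monthly_network_average` are hypothetical, and it assumes the monthly value is the mean of per-sample ratios across all sites (the report does not state whether the mean of ratios or a ratio of means was used).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (site code, sample date "YYYY-MM-DD", [SO4], [S]) in ug/m3.
records = [
    ("SAGO1", "1989-01-04", 1.60, 0.50),
    ("SAGO1", "1989-01-07", 1.70, 0.50),
    ("UPBU1", "1996-03-02", 2.70, 1.00),
    ("UPBU1", "1996-03-05", 2.90, 1.00),
]


def monthly_network_average(records):
    """Average the per-sample [SO4]/[S] ratio over all sites, by calendar month."""
    by_month = defaultdict(list)
    for _site, date, so4, s in records:
        if s > 0:  # skip pairs where the ratio is undefined
            by_month[date[:7]].append(so4 / s)  # key is "YYYY-MM"
    return {month: mean(values) for month, values in sorted(by_month.items())}


averages = monthly_network_average(records)
for month, value in averages.items():
    print(month, round(value, 2))
```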
Site specific examples of the 8 time periods with a dominant bias direction are given
below including maps depicting the spatial distribution of sites which followed the
network level pattern. The associated PowerPoint file has slides documenting all bias
problems at all sites, both those that conform to the network level pattern and those that
appear to be local in nature.
There are 8 sites along the west coast which have had persistently low [SO4]/[S] ratios
from 1990-2003. The high [SO4]/[S] excursions during mid 2000-mid 2001 and early
2003-mid 2003 are visible at some of these sites. The sites involved are documented
below under Anomaly 8.
An additional observation made during this analysis was that of a cyclical pattern to the
[SO4]/[S] ratio on roughly a 1 year cycle. This pattern is present at all sites to varying
degrees; in its most extreme expression it is the dominant pattern visible at the site. An
example is given below under Anomaly 9.
Figure 1. Monthly averages of the [SO4]/[S] ratio for the whole network confirm the generalized patterns observed in the site specific
data validation charts. The nine time periods described in Table 1 are indicated by the blue lines.
[Chart: Monthly Network Average of the SO4/S Ratio. Vertical axis: Average SO4/S, 2 to 4. Horizontal axis: month, Jan-88 to Jan-03.]
Anomaly 1: Significant bias between [SO4] and [S] indicated by high [SO4]/[S] ratios
from 1988-1994. Every site in operation in 1988-1989 showed very high [SO4]/[S] ratios
for most sample pairs. During 1990-late 1994, the network norm was high [SO4]/[S]
ratios, but not all sites had high [SO4]/[S] in all years and some sites had short to
extended periods of the reverse situation. For details at each site the reader can refer to
the associated PowerPoint file.
Example:
Figure 2. SAGO1 provides a good example of the high [SO4]/[S] ratios observed at
many sites for the period 1988-1994.
Legend:
Sulfate concentration: SO4fVal = [SO4]
Sulfur concentration: SfVal = [S]
Sulfate to Sulfur Ratio: SO4_S = [SO4]/[S]
Figure 3. All sites identified with a blue circle visually had high [SO4]/[S] ratios for all
or part of the period 1990-1994. All sites in operation from 1988-1989 had high
[SO4]/[S] ratios. Some sites identified with black circles were not operational during this
time period.
Anomaly 2: Significant bias between SO4 and S indicated by low SO4/S ratios from late
1995-early 1997 at the majority of the sites. For details at each site the reader can refer to
the associated PowerPoint file.
Example:
Figure 4. UPBU1 provides a good example of the low [SO4]/[S] ratios observed at many
sites for the period 1995-early 1997.
Legend:
Sulfate concentration: SO4fVal = [SO4]
Sulfur concentration: SfVal = [S]
Sulfate to Sulfur Ratio: SO4_S = [SO4]/[S]
Figure 5. All sites identified with a blue circle visually had low [SO4]/[S] ratios for all or
part of the period 1995-early 1997. Some sites identified with black circles were not
operational during this time period.
Anomaly 3: Significant bias between SO4 and S indicated by low SO4/S ratios from
early 1998-mid 2000 at the majority of the sites. For details at each site the reader can
refer to the associated PowerPoint file.
Example:
Figure 6. UPBU1 also provides a good example of the low [SO4]/[S] ratios observed at
many sites for the period 1998-mid 2000.
Legend:
Sulfate concentration: SO4fVal = [SO4]
Sulfur concentration: SfVal = [S]
Sulfate to Sulfur Ratio: SO4_S = [SO4]/[S]
Figure 7. All sites identified with a blue circle visually had low [SO4]/[S] ratios for all or
part of the period 1998-mid 2000. Some sites identified with black circles were not
operational during this time period.
Anomaly 4: Significant bias between SO4 and S indicated by high SO4/S ratios during
mid-2000 through mid-2001 at the majority of operating sites across most of the network.
The sites identified as possibly affected had strong seasonal cycles that made this period
look similar to surrounding years. For details at each site the reader can refer to the
associated PowerPoint file.
Example:
Figure 8. REDW1 provides a good example of the high [SO4]/[S] ratios observed at most
sites for the period mid 2000-mid 2001.
Legend:
Sulfate concentration: SO4fVal = [SO4]
Sulfur concentration: SfVal = [S]
Sulfate to Sulfur Ratio: SO4_S = [SO4]/[S]
Figure 9. All sites identified with a blue circle visually had high [SO4]/[S] ratios for the
period mid 2000-mid 2001. Some sites identified with black circles were not operational
during this time period.
Anomaly 5: Significant bias between SO4 and S indicated by low SO4/S ratios from mid
2001-2002 at the majority of the sites. For details at each site the reader can refer to the
associated PowerPoint files.
Example:
Figure 10. KALM1 provides a good example of the low [SO4]/[S] ratios observed at
many sites for the period mid 2001-2002.
Legend:
Sulfate concentration: SO4fVal = [SO4]
Sulfur concentration: SfVal = [S]
Sulfate to Sulfur Ratio: SO4_S = [SO4]/[S]
Figure 11. All sites identified with a blue circle visually had low [SO4]/[S] ratios for all
or part of the period mid 2001-2002. Some sites identified with black circles were not
operational during this time period.
Anomaly 6: Significant bias between SO4 and S indicated by high SO4/S ratios from
early-mid 2003 at the majority of the sites. For details at each site the reader can refer to
the associated PowerPoint file.
Example:
Figure 12. YELL2 provides a good example of the high [SO4]/[S] ratios observed at
many sites for the period early 2003-mid 2003.
Legend:
Sulfate concentration: SO4fVal = [SO4]
Sulfur concentration: SfVal = [S]
Sulfate to Sulfur Ratio: SO4_S = [SO4]/[S]
Figure 13. All sites identified with a blue circle visually had high [SO4]/[S] ratios for the
period early 2003-mid 2003. Some sites identified with black circles were not
operational during this time period.
Anomaly 7: Significant bias between SO4 and S indicated by low SO4/S ratios from mid
2003-late 2003 at the majority of the sites. For details at each site the reader can refer to
the associated PowerPoint files.
Example:
Figure 14. KALM1 provides a good example of the low [SO4]/[S] ratios observed at
many sites for the period mid 2003-late 2003.
Legend:
Sulfate concentration: SO4fVal = [SO4]
Sulfur concentration: SfVal = [S]
Sulfate to Sulfur Ratio: SO4_S = [SO4]/[S]
Figure 15. All sites identified with a blue circle visually had low [SO4]/[S] ratios for the
period mid 2003-late 2003. Some sites identified with black circles were not operational
during this time period.
Anomaly 8: Significant persistent bias between SO4 and S indicated by low SO4/S ratios
from 1990-2003. This anomaly was limited to west coast sites in Alaska, California,
Oregon, and Washington. For details at each site the reader can refer to the associated
PowerPoint files.
Affected Sites

Site Code  Site Name                 State  IMPROVE Region
DENA1      Denali NP                 AK     Alaska
CORI1      Columbia River Gorge      WA     Columbia River Gorge
MORA1      Mount Rainier NP          WA     Northwest
SNPA1      Snoqualmie Pass           WA     Northwest
CRLA1      Crater Lake NP            OR     Oregon and Northern California
LAVO1      Lassen Volcanic NP        CA     Oregon and Northern California
THSI1      Three Sisters Wilderness  OR     Oregon and Northern California
SOLA1      South Lake Tahoe          CA     Sierra Nevadas

Figure 16. DENA1 provides a good example of the persistent low [SO4]/[S] ratios
observed at a handful of sites along the west coast from 1990-2003.
Legend:
Sulfate concentration: SO4fVal = [SO4]
Sulfur concentration: SfVal = [S]
Sulfate to Sulfur Ratio: SO4_S = [SO4]/[S]
Figure 17.
Anomaly 9: Cyclical pattern to the SO4/S ratio on roughly a 1 year cycle. In its most
extreme form the ratio switches from being consistently above 3 for ~1/2 the time to
consistently below 3 the other ~1/2. This pattern is present to varying degrees at all sites;
however the severity of the cyclicity is not constant through time at all sites. For details
at each site the reader can refer to the associated PowerPoint files.
Examples
Figure 18 a-b. VOYA2 and DENA1 provide good examples of the more extreme
expressions of cyclicity in the [SO4]/[S] ratio observed at all sites.
Legend:
Sulfate concentration: SO4fVal = [SO4]
Sulfur concentration: SfVal = [S]
Sulfate to Sulfur Ratio: SO4_S = [SO4]/[S]
General Observations Based on Visual Inspection of Scatter Plots
Additional analysis was done to better define at what sulfate concentrations the sample
pairs with poor agreement between 3*[S] and [SO4] were occurring. The purpose of this
analysis was to determine if the bulk of sample pairs with extreme [SO4]/[S] ratios were
at very low concentrations. Very low concentrations near the mdl have much higher
uncertainty and therefore a higher number of sample pairs with poor agreement might be
expected. Symmetrical distribution of the [SO4]/[S] ratios above and below 3 was
expected at all concentrations. Ideally a third independent measure of sulfate would be
used to explore the relationship between the [SO4]/[S] ratios and aerosol sulfate
concentration.
Scatter plots of [SO4]/[S] versus [SO4] were examined to see at what sulfate
concentrations the high and low ratios were occurring. Looking at network data for
1998-2003 aggregated by IMPROVE region (Figure 19), scatter plots of [SO4]/[S] versus
[SO4] show a distinct bias of low ratios at low concentrations and high ratios at high
concentrations for most regions. This relationship persists even if one restricts the dataset
to those samples whose concentrations are at least 10 times the reported minimum
detection limit (mdl) and therefore should be well quantified. This relationship was
unexpected and was initially suspected of being an artifact of using a dependent variable
as the measure of aerosol sulfate concentration. For details for each region the reader is
referred to the associated PowerPoint file.
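The restriction to well quantified samples can be expressed as a simple filter. The sketch below is illustrative only: the mdl values, sample values, and the function name `is_well_quantified` are hypothetical, and it assumes that both [SO4] and [S] must exceed 10 times their own mdl, as described for Figure 19.

```python
def is_well_quantified(so4, s, so4_mdl, s_mdl, factor=10.0):
    """True when both [SO4] and [S] exceed `factor` times their reported mdl."""
    return so4 > factor * so4_mdl and s > factor * s_mdl


# Hypothetical sample pairs: ([SO4], [S], mdl for SO4, mdl for S), all in ug/m3.
samples = [
    (0.80, 0.30, 0.02, 0.005),  # both values well above 10*mdl
    (0.15, 0.06, 0.02, 0.005),  # [SO4] is below 10*mdl (0.20), so the pair is dropped
    (2.50, 0.80, 0.02, 0.005),  # both values well above 10*mdl
]

# Keep only the pairs that pass the 10*mdl screen, then form the ratio.
quantified = [row for row in samples if is_well_quantified(*row)]
ratios = [so4 / s for so4, s, _so4_mdl, _s_mdl in quantified]
print(len(quantified), [round(r, 2) for r in ratios])
```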
To test if the relationship was an artifact of dependent variables, scatter plots of [SO4]/[S]
versus total PM10 Mass [PM10] were also examined. While this achieves independence
between the “true” and the “predicted” variables in the analysis, it introduces a new set of
possible complications since PM10 aerosol and sulfate aerosol may or may not be
correlated for a given site or time period. Scatter plots of [SO4]/[S] versus [PM10] for the
period 1998-2003 with the data grouped by IMPROVE region showed to a lesser degree
the same pattern of low ratios at low concentrations and high ratios at high concentrations
for most regions (Figure 20). For details for each region the reader is referred to the
associated PowerPoint file.
Figure 19. Regional scatter plots of [SO4]/[S] versus [SO4] for 1998-2003 show a
distinct bias of low ratios at low concentrations and high ratios at high concentrations.
This pattern persists even if the dataset is restricted to those sample pairs where both
[SO4] and [S] are greater than 10*mdl and therefore should be well quantified.
Legend:
Sulfate concentration: SO4fVal = [SO4]
Sulfur concentration: SfVal = [S]
Sulfate to Sulfur Ratio: SO4_S = [SO4]/[S]
Figure 20. Regional scatter plots of [SO4]/[S] versus [PM10] for 1998-2003 also show a
distinct bias of low ratios at low concentrations and high ratios at high concentrations.
The pattern is not as distinct as for [SO4]/[S] versus [SO4], but this could be because
[PM10] is often not a direct analog for aerosol sulfate which is confined primarily to the
fine aerosol mode.
Legend:
Sulfate concentration: SO4fVal = [SO4]
Sulfur concentration: SfVal = [S]
PM10 concentration: MTVAl = [PM10]
Sulfate to Sulfur Ratio: SO4_S = [SO4]/[S]
Collocated data collection began in 2003 at a handful of sites by adding a fifth module to
the standard IMPROVE sampler, referred to within IMPROVE as X modules. As of the
writing of this report collocated A & B module data was available for 8 sites from the site
specific installation dates through May 2004 (Table 2). The collocated data was analyzed
to determine if the pattern of low ratios at low concentrations and high ratios at high
concentrations was present for comparisons involving just [S] data or just [SO4] data.
Scatter plots of [Sx]/[S] versus [S] and [SO4x]/[SO4] versus [S], where the x indicates the
X module, were examined for this pattern.
Table 2. IMPROVE Collocated QA Modules

Site Name               Site   Module A  Module B  Start Date
Mesa Verde NP           MEVE1  X         -         8/13/2003
Proctor Maple R. F      PMRF1  X         -         9/3/2003
Olympic NP              OLYM1  X         -         11/8/2003
Sac and Fox             SAFO1  X         -         11/20/2003
Lassen Volcanic NP      LAVO1  -         X         4/21/2003
Mammoth Cave NP         MACA1  -         X         5/15/2003
Big Bend NP             BIBE1  -         X         9/3/2003
Gates of the Mountains  GAMO1  -         X         9/24/2003

The same general pattern is not readily apparent in scatter plots of [Sx]/[S] versus [S]
(Figure 21). There is a slight indication of a bias towards high ratios at low
concentrations but no indication of the reverse at high concentrations. So even if one
reversed the ratio, the pattern of low ratios at low concentrations and high ratios at high
concentrations is not present.
In contrast, in scatter plots of [SO4x]/[SO4] versus [S] the pattern of low ratios at low
concentrations and high ratios at high concentrations is clearly present at 2 of the 4 sites
(Figure 22). The affected sites are Gates of the Mountains in MT (GAMO1) and Lassen
Volcanic National Park (LAVO1) in CA, both of which typically have low sulfate
concentrations that very rarely exceed 1.5 ug/m3. Furthermore, scatter plots of Volx/Vol
versus [S], where Vol is the regular network B module's average air volume and Volx is
the average air volume for the collocated B module, show the inverse pattern of high
ratios at low concentrations and low ratios at high concentrations for the same 2 sites
(Figure 23). The inverse relationship between [SO4x]/[SO4] and Volx/Vol could indicate
that there are data quality problems related to our sampling procedures or blank
correction process which are affecting our [SO4] measurements and causing the observed
bias between [SO4] and [S], particularly at low concentrations.
[Chart: Collocated A Module S comparison. Vertical axis: Sx/S, 0 to 1.6. Horizontal axis: S, 0 to 3000. Series: OLYM1, PMRF1, SAFO1.]
Figure 21. Scatter plots of [Sx]/[S] versus [S] for sites with collocated data for 2003-2004
do not show the same pattern as in [SO4]/[S] versus [SO4]. There is a slight
indication of a bias towards high ratios at low concentrations but no indication of the
reverse at high concentrations.
Legend:
Regular network sulfur and collocated sulfur concentrations: S = [S] and Sx = [Sx]
Collocated sulfur to sulfur ratio: Sx/S = [Sx]/[S]
[Chart: Collocated B module SO4 comparison. Vertical axis: SO4x/SO4 (quantifiable), 0.5 to 1.5. Horizontal axis: S, ug/m3, 0 to 7. Series: MACA1, BIBE1, GAMO1, LAVO1.]
Figure 22. A scatter plot of [SO4x]/[SO4] versus [S] for sites with collocated data for
2003-2004 shows the same general pattern as in [SO4]/[S] versus [SO4].
Legend:
Regular network sulfate and collocated sulfate concentrations: SO4 = [SO4] and SO4x = [SO4x]
Collocated sulfate to sulfate ratio: SO4x/SO4 = [SO4x]/[SO4]
[Chart: Collocated B Module Flow Rate Comparison. Vertical axis: VOLx/VOL, 0.5 to 1.5. Horizontal axis: S, ug/m3, 0 to 7. Series: MACA, BIBE, GAMO, LAVO.]
Figure 23. A scatter plot of [Volx]/[Vol] versus [S] for sites with collocated data for
2003-2004 shows the reverse of the pattern seen in [SO4]/[S] versus [SO4].
Legend:
Regular network sulfur concentration: S = [S]
Regular network B module flow rate and collocated B module flow rate: network flow =
Vol and collocated flow = Volx
Collocated flow to network flow ratio: VOLx/VOL
Summary of Time Series and Scatter Plots
The combined analyses of time series and scatter charts above indicate that the degree of
agreement and the direction of bias in the [SO4] to [S] relationship for a sample pair
depend on when the samples were collected and/or analyzed, where the samples were
collected and the sample sulfate concentrations. To quantify these general observations,
the spatial, temporal and concentration distributions of sample pairs where [SO4] and [S]
do not agree within 3σ uncertainty were investigated and are described below. These
results were also compared to statistical expectations for a population of samples with
accurate measurements and well quantified measurement uncertainties.
Statistical Background for Quantitative Checks on Measurement
Comparability
The Z test and the T test both follow the same general formula and can be used to test if
two numbers are equal within their uncertainty. The T test differs from the Z test in that
it allows for the populations to have unknown variances. If the underlying populations
from which the two numbers are drawn are normal then the test scores will follow a
standard normal distribution in the case of known variability and a t distribution with
properly calculated degrees of freedom in the case of estimated variability. The formulas
for calculating the Z score and T score are:
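$$\text{Z score} = \frac{[\mathrm{SO_4}] - 3\,[\mathrm{S}]}{\sqrt{\sigma_{\mathrm{SO_4}}^{2} + (3\,\sigma_{\mathrm{S}})^{2}}}$$
where $\sigma$ represents a known measurement uncertainty
$$\text{T score} = \frac{[\mathrm{SO_4}] - 3\,[\mathrm{S}]}{\sqrt{\hat{\sigma}_{\mathrm{SO_4}}^{2} + (3\,\hat{\sigma}_{\mathrm{S}})^{2}}}$$
where $\hat{\sigma}$ represents a statistically estimated measurement uncertainty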
In our case we are comparing two independent measurements, [SO4] and 3*[S], with the
assumption that their difference should be 0 within measurement uncertainty. We are
treating each measurement as an estimate of the true atmospheric sulfate value and the
uncertainty that is uniquely reported for each sample as an estimate of the true variance
for that measurement reported in terms of the standard deviation. Since our measurement
uncertainties are based on a theoretical understanding we are treating the samples as
having known variability and therefore are loosely referring to them as Z scores. The Z
scores indicate how many standard deviations of the difference apart [SO4] is from 3*[S].
This interpretation of the Z score is independent of any assumptions about distributions
of the underlying populations or the test scores. Z scores can be positive or negative. A
positive Z score indicates that the [SO4] value is greater than 3*[S]. A negative Z score
indicates that [SO4] is less than 3*[S].
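The Z score described in this section can be sketched as a small function. The function name `z_score`, the argument names, and the numerical values are hypothetical; the sketch assumes the reported uncertainties are one standard deviation and that the two measurements are independent, as stated above.

```python
import math


def z_score(so4, so4_unc, s, s_unc):
    """Standardized difference between [SO4] and 3*[S].

    so4_unc and s_unc are the reported one-sigma uncertainties of [SO4] and [S].
    The uncertainty of 3*[S] is 3*s_unc, and the two terms add in quadrature.
    """
    sigma_diff = math.sqrt(so4_unc ** 2 + (3.0 * s_unc) ** 2)
    if sigma_diff == 0:
        raise ValueError("combined uncertainty is zero; Z score is undefined")
    return (so4 - 3.0 * s) / sigma_diff


# Hypothetical sample pairs (all values in ug/m3).
z_high = z_score(1.30, 0.04, 0.40, 0.01)  # [SO4] somewhat greater than 3*[S]
z_low = z_score(1.00, 0.04, 0.40, 0.01)   # [SO4] well below 3*[S]
print(round(z_high, 2), round(z_low, 2))
```

With these made-up inputs the combined uncertainty is 0.05 ug/m3, so a difference of 0.10 ug/m3 gives a Z score near 2 and a difference of -0.20 ug/m3 gives a Z score near -4.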
If the measurement errors are symmetrical, then the Z scores will also be symmetrical.
Furthermore, if the measurement errors are distributed normally then the Z scores will
follow a standard normal distribution. If neither is the case, then the Z scores will still
represent a standardized score which measures the distance, in standard deviations of the
difference, between the paired samples. However no assumptions about the distribution
of the Z scores can be made independent of assumptions about the sample populations.
In this case the population of interest is not our time series of [SO4] and [S], which follow
an approximately log normal distribution, but the theoretical population of all potential
[SO4] and [S] samples that could have been collected at the same point in time and space
as our sample date of interest. On a theoretical level, assuming all S is in the form of
sulfate and a well mixed air mass, we would expect these potential measurements to both
pull from a single population. Additionally, we would expect our measurements to only
have unbiased random errors associated with sampling and analysis. So under ideal
sampling and analytical conditions, we would expect the calculated Z scores to minimally
follow a symmetrical distribution and possibly a standard normal distribution.
Additionally, according to Chebychev's rule, in any distribution the proportion of scores
within k standard deviations of the mean is at least 1-1/k^2. So even if our test
scores do not follow a normal distribution, if all of our parameters are reasonably
accurate then we can minimally expect that at least 89% of the scores would reside
between the mean, 0, and ±3. If the test scores do follow a normal
distribution then about 99.7% of the scores would reside between ±3. For the purposes of this
report, sample pairs with calculated Z scores outside of the range [-3, 3], which is
comparable to pairs that are not equivalent within 3σ uncertainty, are being referred to as
“outlier” pairs.
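The two bounds quoted above can be checked numerically. This is a small sketch using only the Python standard library; the function names are hypothetical.

```python
import math


def chebyshev_min_fraction(k):
    """Lower bound on the fraction of any distribution within k standard deviations."""
    return 1.0 - 1.0 / k ** 2


def normal_fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))


k = 3
print(f"Chebyshev lower bound for k={k}: {chebyshev_min_fraction(k):.3f}")
print(f"Normal distribution for k={k}:   {normal_fraction_within(k):.4f}")
```

For k = 3 the Chebyshev bound is 8/9, or roughly 89%, while a normal distribution places about 99.7% of scores within the same range.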
Expectations
The dataset was explored using calculated Z scores for all samples with reported [S], σS,
[SO4] and σSO4 to see if certain assumptions were met. It was assumed that, given
accurate measurements and well estimated measurement uncertainty, the following
would be true:
- At most 10% of the sample pairs should be outliers
- The Z scores should be symmetrically distributed above and below 0
- This symmetry should persist through time, space and all quantifiable
  (concentration>10*mdl) aerosol concentrations
Methods
To test if the outlier sample pairs were distributed evenly in bias direction, time, space,
and aerosol sulfate concentration the samples were sorted and aggregated in the following
ways:
Bias Direction--The dataset was aggregated into two groups, those with Z<-3 indicating
that 3*[S]>> [SO4] and those with Z>3 indicating the reverse. The sample pairs with Z in
[-3,3] were excluded from this analysis.
Time--The data was aggregated by year and then year and month to look at how the total
number of outlier pairs, outlier pairs with Z<-3 and outlier pairs with Z>3 relative to the
number of valid sample pairs changed through time.
Space--The data was aggregated by year and site and then month and site for 2003 to
look at how the total number of outlier pairs, outlier pairs with Z<-3 and outlier pairs
with Z>3 relative to the number of valid sample pairs changed through space.
Aerosol Concentration--Based on visual inspection of the [SO4]/[S] versus [SO4] and
[PM10] scatter plots, arbitrary concentration cut points of 1 ug/m3 and 10 ug/m3 were
selected for [SO4] and [PM10] respectively. These values seemed to roughly coincide with the
inflection points in the curves where the bias shifted from [SO4]/[S]<3 to [SO4]/[S]>3. In
each case for [SO4] and [PM10], the data was aggregated into two groups based on these
cut points and the total number of outlier pairs, outlier pairs with Z<-3 and outlier pairs
with Z>3 relative to the number of valid sample pairs were compared between the
groups.
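The aggregation by bias direction and concentration group could look roughly like the sketch below. The data values and all function names are hypothetical, and assigning a sample exactly at the cut point to the high concentration group is an assumption made here for illustration.

```python
from collections import Counter

Z_LIMIT = 3.0   # pairs with |Z| > 3 are treated as outliers
SO4_CUT = 1.0   # ug/m3, the arbitrary [SO4] cut point described above


def bias_direction(z, limit=Z_LIMIT):
    """Classify a Z score as a low outlier, a high outlier, or in range."""
    if z < -limit:
        return "low"       # 3*[S] >> [SO4]
    if z > limit:
        return "high"      # [SO4] >> 3*[S]
    return "in_range"


def concentration_group(so4, cut=SO4_CUT):
    """Assign a sample to the low or high concentration group (assumed: cut goes to high)."""
    return "low_conc" if so4 < cut else "high_conc"


def tally(pairs):
    """Count sample pairs by (bias direction, concentration group)."""
    return Counter((bias_direction(z), concentration_group(so4)) for z, so4 in pairs)


# Hypothetical (Z score, [SO4] in ug/m3) values.
pairs = [(-4.2, 0.4), (-3.5, 0.7), (3.8, 2.1), (0.5, 0.9), (1.2, 3.0), (4.1, 0.6)]
counts = tally(pairs)

low_conc_outliers = counts[("low", "low_conc")] + counts[("high", "low_conc")]
share_low = counts[("low", "low_conc")] / low_conc_outliers
print(dict(counts))
print(f"Share of low-concentration outliers with Z<-3: {share_low:.2f}")
```

The same tally can be repeated with [PM10] and its 10 ug/m3 cut point, or keyed by year, month, or site to look at how the outliers are distributed in time and space.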
Results
When the dataset was taken as a whole, that is all sites for the period 1988-2003, the
expectations of less than 10% of the valid samples being outliers and even distribution of
the outliers in bias direction and across [SO4] and [PM10] concentrations were met.
However, when the dataset was further broken down by site, year and/or month or
concentration then the expectations for even distribution and percent outliers were no
longer met for the data subsets.
Consistent with our expectations of good measurements:
Less than 10% of all valid samples are outliers for whole dataset
Across the network for the period 1988-2003, 8.8% of the valid sample pairs (those with
all 4 necessary parameters reported) were outliers, indicating that on the whole we do not
have an unexpected number of outlier sample pairs (Tables 3 and 4).
Symmetrical distribution of low and high Z scores for whole dataset
Across the network for the period 1988-2003, the outlier sample pairs were fairly evenly
split between those with Z scores above and those below 0 (Tables 3 and 4). Of the
outlier sample pairs, 53% of the pairs had Z<-3 and 47% had Z>3.
Symmetrical distribution of outlier samples between low and high [SO4] concentrations
Total counts of outliers were fairly evenly split when the outlier population is split into
two populations based on [SO4] being below or above 1 μg/m3 with some indication of
bias towards higher numbers of outlier pairs at lower [SO4] concentrations (Tables 3 and
4). This distribution could easily be shifted by picking a lower [SO4] concentration as
the split point. Of the outlier sample pairs, 59% had [SO4]<1 μg/m3 and 41% had [SO4]>1
μg/m3. The percentages of samples classified as high or low concentration were similar
in the non-outlier population (Tables 3 and 4).
Symmetrical distribution of outlier samples between low and high [PM10] concentrations
Total counts of outliers were also fairly evenly split when the outlier population is split
into two populations based on [PM10] being below or above 10 ug/m3 (Tables 3 and 4). Of
the outlier sample pairs, 52% of the pairs had [PM10]<10, 43% had [PM10]>=10 and the
remainder had null [PM10] values. The percentages of samples classified as high or low
concentration were similar in the non-outlier population (Tables 3 and 4).
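The percentages quoted in these results follow directly from the network counts reported for 1988-2003. The short sketch below reproduces the arithmetic; the variable names and the helper `pct` are hypothetical, while the counts are those given in this report.

```python
# Network counts for 1988-2003 as reported in this document.
low_outliers = 5090            # sample pairs with Z < -3
high_outliers = 4499           # sample pairs with Z > 3
valid_pairs = 109257           # pairs with [SO4], [S] and both uncertainties reported
outliers_low_sulfate = 5631    # outlier pairs with [SO4] < 1 ug/m3
outliers_high_sulfate = 3957   # outlier pairs with [SO4] > 1 ug/m3

total_outliers = low_outliers + high_outliers


def pct(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


print(f"Outliers among valid pairs: {pct(total_outliers, valid_pairs):.1f}%")
print(f"Outliers with Z<-3:         {pct(low_outliers, total_outliers):.1f}%")
print(f"Outliers with Z>3:          {pct(high_outliers, total_outliers):.1f}%")
print(f"Outliers with low sulfate:  {pct(outliers_low_sulfate, total_outliers):.1f}%")
print(f"Outliers with high sulfate: {pct(outliers_high_sulfate, total_outliers):.1f}%")
```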
Summary Statistics for the period 1988-2003
Table 3. Counts of samples meeting certain criteria including tests on existence,
concentration, and Z score values.

Dataset Subset                                    Conditions                                          Network Count for 1988-2003
Low Outliers                                      Samples with Z<-3                                   5090
High Outliers                                     Samples with Z>3                                    4499
Total Outliers                                    Samples with Z not in [-3,3]                        9589
Low Outliers with Low PM10 mass                   Samples with Z<-3 and PM10<10 ug/m3                 3348
High Outliers with Low PM10 mass                  Samples with Z>3 and PM10<10 ug/m3                  1641
Total Outliers with Low PM10 mass                 Samples with Z not in [-3,3] and PM10<10 ug/m3      4989
Low Outliers with High PM10 mass                  Samples with Z<-3 and PM10>10 ug/m3                 1514
High Outliers with High PM10 mass                 Samples with Z>3 and PM10>10 ug/m3                  2649
Total Outliers with High PM10 mass                Samples with Z not in [-3,3] and PM10>10 ug/m3      4163
Low Outliers with Low sulfate mass                Samples with Z<-3 and SO4<1 ug/m3                   3997
High Outliers with Low sulfate mass               Samples with Z>3 and SO4<1 ug/m3                    1634
Total Outliers with Low sulfate mass              Samples with Z not in [-3,3] and SO4<1 ug/m3        5631
Low Outliers with High sulfate mass               Samples with Z<-3 and SO4>1 ug/m3                   1093
High Outliers with High sulfate mass              Samples with Z>3 and SO4>1 ug/m3                    2864
Total Outliers with High sulfate mass             Samples with Z not in [-3,3] and SO4>1 ug/m3        3957
Total Valid Samples                               Samples with SO4, SO4UNC, S, and SUNC all not null  109257
Total Potential Samples                           Total Records (all records)                         144336
Total Non-Outlier Samples                         Samples with Z in [-3,3]                            99668
Total Non-Outlier Samples with Low PM10 mass      Samples with Z in [-3,3] and PM10<10 ug/m3          55222
Total Non-Outlier Samples with High PM10 mass     Samples with Z in [-3,3] and PM10>10 ug/m3          40348
Total Non-Outlier Samples with Low sulfate mass   Samples with Z in [-3,3] and SO4<1 ug/m3            56632
Total Non-Outlier Samples with High Sulfate mass  Samples with Z in [-3,3] and SO4>1 ug/m3            43032
Valid Samples with Low PM10 mass                  Samples with PM10<10 ug/m3                          60211
Valid Samples with High PM10 mass                 Samples with PM10>10 ug/m3                          44511
Valid Samples with Low sulfate mass               Samples with SO4<1 ug/m3                            62263
Valid Samples with High Sulfate mass              Samples with SO4>1 ug/m3                            46989

Table 4. Relative counts of samples, expressed as percentages, meeting certain criteria
including tests on existence, concentration, and Z score values.
Network percentages for 1988-2003 | Formula | Values
Percentage of Total Outliers that are Low Outliers | (# Samples where Z<-3)/(# Samples where Z not in [-3,3])*100% | 53.1%
Percentage of Total Outliers that are High Outliers | (# Samples where Z>3)/(# Samples where Z not in [-3,3])*100% | 46.9%
Percentage of Valid Samples that are Low or High Outliers | (# Samples where Z not in [-3,3])/(# Samples where Z can be calculated)*100% | 8.8%
Percentage of Total Outliers with Low PM10 Mass that are Low Outliers | (# Samples where Z<-3 and MT<10)/(# Samples where Z not in [-3,3] and MT<10)*100% | 67.1%
Percentage of Total Outliers with Low PM10 Mass that are High Outliers | (# Samples where Z>3 and MT<10)/(# Samples where Z not in [-3,3] and MT<10)*100% | 32.9%
Percentage of Total Outliers that have Low PM10 Mass | (# Samples where Z not in [-3,3] and MT<10)/(# Samples where Z not in [-3,3])*100% | 52.0%
Percentage of Total Outliers with High PM10 Mass that are Low Outliers | (# Samples where Z<-3 and MT>10)/(# Samples where Z not in [-3,3] and MT>10)*100% | 36.4%
Percentage of Total Outliers with High PM10 Mass that are High Outliers | (# Samples where Z>3 and MT>10)/(# Samples where Z not in [-3,3] and MT>10)*100% | 63.6%
Percentage of Total Outliers that have High PM10 Mass | (# Samples where Z not in [-3,3] and MT>10)/(# Samples where Z not in [-3,3])*100% | 43.4%
Percentage of Total Outliers with Low Sulfate Mass that are Low Outliers | (# Samples where Z<-3 and SO4<1)/(# Samples where Z not in [-3,3] and SO4<1)*100% | 71.0%
Percentage of Total Outliers with Low Sulfate Mass that are High Outliers | (# Samples where Z>3 and SO4<1)/(# Samples where Z not in [-3,3] and SO4<1)*100% | 29.0%
Percentage of Total Outliers that have Low Sulfate Mass | (# Samples where Z not in [-3,3] and SO4<1)/(# Samples where Z not in [-3,3])*100% | 58.7%
Percentage of Total Outliers with High Sulfate Mass that are Low Outliers | (# Samples where Z<-3 and SO4>1)/(# Samples where Z not in [-3,3] and SO4>1)*100% | 27.6%
Percentage of Total Outliers with High Sulfate Mass that are High Outliers | (# Samples where Z>3 and SO4>1)/(# Samples where Z not in [-3,3] and SO4>1)*100% | 72.4%
Percentage of Total Outliers that have High Sulfate Mass | (# Samples where Z not in [-3,3] and SO4>1)/(# Samples where Z not in [-3,3])*100% | 41.3%
Percentage of Potential Samples that are Valid Samples | (# Samples where Z not null)/(# of potential samples)*100% | 75.7%
Percentage of Non-Outlier Samples that have Low PM10 Mass | (# Samples where Z in [-3,3] and MT<10)/(# Samples where Z in [-3,3])*100% | 55.4%
Percentage of Non-Outlier Samples that have High PM10 Mass | (# Samples where Z in [-3,3] and MT>10)/(# Samples where Z in [-3,3])*100% | 40.5%
Percentage of Non-Outlier Samples that have Low Sulfate Mass | (# Samples where Z in [-3,3] and SO4<1)/(# Samples where Z in [-3,3])*100% | 56.8%
Percentage of Non-Outlier Samples that have High Sulfate Mass | (# Samples where Z in [-3,3] and SO4>1)/(# Samples where Z in [-3,3])*100% | 43.2%
Percentage of Valid Samples that have Low PM10 Mass | (# Samples where MT<10)/(# Samples where Z not null)*100% | 55.1%
Percentage of Valid Samples that have High PM10 Mass | (# Samples where MT>10)/(# Samples where Z not null)*100% | 40.7%
Percentage of Valid Samples that have Low Sulfate Mass | (# Samples where SO4<1)/(# Samples where Z not null)*100% | 57.0%
Percentage of Valid Samples that have High Sulfate Mass | (# Samples where SO4>1)/(# Samples where Z not null)*100% | 43.0%
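The formulas in Table 4 are all ratios of sample counts under different filters. As a minimal sketch of how a few of them could be computed, the following assumes that a Z score has already been calculated for each potential sample (with None where it could not be), and that the PM10 mass MT is available in the same order. The function and key names are hypothetical and are not taken from the IMPROVE processing code.

```python
from typing import Dict, Optional, Sequence


def pct(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator/denominator as a percentage, or None when undefined."""
    return 100.0 * numerator / denominator if denominator else None


def outlier_summary(
    z: Sequence[Optional[float]],
    mt: Sequence[float],
    z_limit: float = 3.0,
    mt_cut: float = 10.0,
) -> Dict[str, Optional[float]]:
    """Compute a few Table 4 style percentages.

    z  : Z score per potential sample, None where it could not be calculated
    mt : PM10 mass per potential sample, in the same order as z
    """
    # Valid samples are those for which a Z score exists.
    valid = [(zi, mi) for zi, mi in zip(z, mt) if zi is not None]
    low = [(zi, mi) for zi, mi in valid if zi < -z_limit]    # Z < -3
    high = [(zi, mi) for zi, mi in valid if zi > z_limit]    # Z > 3
    outliers = low + high                                    # Z not in [-3,3]
    low_mass_outliers = [(zi, mi) for zi, mi in outliers if mi < mt_cut]
    low_mass_low = [(zi, mi) for zi, mi in low_mass_outliers if zi < -z_limit]
    return {
        "pct_potential_that_are_valid": pct(len(valid), len(z)),
        "pct_valid_that_are_outliers": pct(len(outliers), len(valid)),
        "pct_outliers_that_are_low": pct(len(low), len(outliers)),
        "pct_outliers_that_are_high": pct(len(high), len(outliers)),
        "pct_outliers_with_low_mt": pct(len(low_mass_outliers), len(outliers)),
        "pct_low_mt_outliers_that_are_low": pct(
            len(low_mass_low), len(low_mass_outliers)
        ),
    }
```

The same pattern extends to the sulfate-based rows by swapping the MT<10 filter for SO4<1.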
Inconsistent with our expectations of good measurements:
Non-symmetrical distribution of low and high Z scores within low and high
concentration groups
While the total number of outliers was fairly evenly distributed between the low and high
concentration groups, the bias direction of the outliers within the groups is unevenly
distributed regardless of whether [SO4] or [PM10] is used to define the concentration
groups. At low concentrations, ~70% of the outlier sample pairs had Z<-3 whereas at high
concentrations ~30% of the outlier sample pairs had Z<-3 (Tables 3 and 4). In other words,
outlier pairs at low concentrations typically had 3*[S]>>[SO4] and the reverse is true at
higher concentrations. This is in line with the general observations made looking at
scatter plots of [SO4]/[S] versus [SO4] or [PM10].
Relative number of outliers more than 10% for a quarter of the months from 1988-2003
The total number of outliers, expressed as a percentage of the valid samples, has not been
consistent over time (Figure 24). Aggregating the data by year and month, the total
number of outliers has ranged from 1.4% to 100% of the valid samples for a given month
with a median value of 6.1%. Of the 190 months of data collection, 25% of them have
had outlier sample pairs make up over 10% of the valid sample pairs. That is, 25% of the time
more than 10% of the valid sample pairs consist of [SO4] and 3*[S] measurements which
are not equivalent within 3σ uncertainty.
Additionally, the clustering of outliers in particular months does not appear to be random.
There is a roughly seasonal cycle to the percentage of total outliers, with the peak
percentages typically occurring in summer or fall.
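The monthly aggregation described above can be sketched as follows. This is an illustration only: it assumes each valid sample pair is available as a (year, month, Z) record, and the function names and the 10% limit parameter are hypothetical choices, not the code used to produce Figure 24.

```python
from collections import defaultdict
from statistics import median


def monthly_outlier_pct(records, z_limit=3.0):
    """records: iterable of (year, month, z) for valid sample pairs.

    Returns {(year, month): percent of valid pairs with Z<-3 or Z>3}.
    """
    counts = defaultdict(lambda: [0, 0])  # (year, month) -> [n_outliers, n_valid]
    for year, month, z in records:
        tally = counts[(year, month)]
        tally[1] += 1
        if abs(z) > z_limit:
            tally[0] += 1
    return {
        key: 100.0 * n_out / n_valid
        for key, (n_out, n_valid) in sorted(counts.items())
    }


def summarize(monthly_pct, limit=10.0):
    """Range, median, and share of months whose outlier percentage exceeds limit."""
    values = list(monthly_pct.values())
    months_over = sum(1 for v in values if v > limit)
    return {
        "min": min(values),
        "max": max(values),
        "median": median(values),
        "pct_months_over_limit": 100.0 * months_over / len(values),
    }
```

Applied to the full set of valid sample pairs, `summarize` would return the kind of figures quoted above (range, median, and the share of months above 10%).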
Non-symmetrical distribution of low and high Z scores for most months
Aggregating the data by year, month and by bias direction results in even more drastic
temporal patterns (Figure 24). During a given month the outlier sample pairs are rarely
evenly distributed in terms of bias direction. The period 1988-2003 can be broken into
about 10 distinct periods lasting from months to years based on which direction of bias
was dominant as indicated by the percentage of outlier sample pairs with Z<-3 or those
with Z>3. These are the same time periods, give or take a couple of months, as those
identified through visual inspection of time series of 1) [SO4]/[S] for each site (Table 1)
and 2) monthly network averages of [SO4]/[S] (Figure 1). The only exception is time
period #3, which was not identified through visual inspection and is only a couple of months
long.
Table 5. Major trends in the dominant bias direction observed in the time series of the
percentage of outlier samples with Z<-3 or with Z>3 calculated each month for the whole
network. These time periods are marked with blue lines in Figure 24.
Time Period # | Affected Time Period | Dominant Bias Direction | Data Indicator
1 | 1988-late 1994 | [SO4]>>3*[S] | Majority of outliers have Z>3
2 | Late 1994-late 1995 | [SO4]<<3*[S] | Majority of outliers have Z<-3
3 | Late 1995 | [SO4]>>3*[S] | Majority of outliers have Z>3
4 | 1996-early 1997 | [SO4]<<3*[S] | Majority of outliers have Z<-3
5 | Early 1997-early 1998 | No dominant bias direction | Outliers evenly split between those that have Z>3 and those that have Z<-3
6 | Early 1998-mid 2000 | [SO4]<<3*[S] | Majority of outliers have Z<-3
7 | Mid 2000-mid 2001 | [SO4]>>3*[S] | Majority of outliers have Z>3
8 | Mid 2001-2002 | [SO4]<<3*[S] | Majority of outliers have Z<-3
9 | Early 2003-mid 2003 | [SO4]>>3*[S] | Majority of outliers have Z>3
10 | Mid 2003-late 2003 | [SO4]<<3*[S] | Majority of outliers have Z<-3
This network wide look at the dominant direction of bias in the outlier sample pairs
captures all the major trends identifiable in the site specific time series plots of [SO4]/[S].
Additionally, relative percentages of outlier pairs may provide an objective metric for
testing overall performance in terms of [SO4] and [S] comparability.
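One way such an objective metric might be automated is to label each month by its dominant bias direction and merge consecutive months that share a label. The sketch below is hypothetical: the 60% majority cutoff is an assumption borrowed from the Appendix A legend (61-100% of outliers on one side), and the periods in Table 5 were identified by inspection, not by this procedure.

```python
def dominant_direction(n_low, n_high, majority=0.6):
    """Label a month from its counts of Z<-3 (n_low) and Z>3 (n_high) outliers."""
    total = n_low + n_high
    if total == 0:
        return "none"
    if n_low / total > majority:
        return "SO4<<3S"   # majority of outliers have Z<-3
    if n_high / total > majority:
        return "SO4>>3S"   # majority of outliers have Z>3
    return "mixed"         # no dominant bias direction


def bias_periods(monthly_counts):
    """Merge consecutive months with the same dominant direction.

    monthly_counts: chronologically ordered list of (label, n_low, n_high).
    Returns a list of (first_label, last_label, direction) tuples.
    """
    periods = []
    for label, n_low, n_high in monthly_counts:
        direction = dominant_direction(n_low, n_high)
        if periods and periods[-1][2] == direction:
            # Extend the current period to include this month.
            periods[-1] = (periods[-1][0], label, direction)
        else:
            periods.append((label, label, direction))
    return periods
```

In practice some smoothing would likely be needed so that a single atypical month does not split a multi-year period.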
The relative number of outliers for a year was more than 10% for a changing subset of
sites even when the relative number of outliers for the network as a whole was less
than 10%. Additionally, there was non-symmetrical distribution of low and high Z
scores for most sites with more than 10% outliers.
Aggregating the data by year, site, % total outliers, and by dominant bias direction
indicates that within a given time period the outlier pairs are clustered at particular sites
(see Appendix A for details). In addition, at the sites where outliers make up more than
10% of the valid samples, a dominant bias direction is usually apparent, with
60-100% of the outlier pairs being on one side of the distribution. The dominant bias
direction of particular sites is not always in line with the network wide dominant bias
direction. For example, in 1991 the network wide dominant bias direction was
SO4>>3*S, but at Washington DC, which had over 10% relative outliers, over 60% of
the outliers had SO4<<3*S (see Appendix A). Local deviations from the network wide
pattern like this one suggest that local conditions can override whatever is causing the
network wide pattern in terms of dominant bias direction.
There are some sites that have significant numbers of outlier pairs most years and there
are also sites which have never exceeded 10% (see Appendices A and B for details). The
regional patterns shift from year-to-year in terms of which regions have sites with high
outlier counts and which sites have a particular bias direction. There is no obvious
pattern between the type of [SO4] to 3*[S] disagreement and site location. Maps
displaying which sites had over 10% outlier sample pairs and the dominant bias direction
for that site are shown for each year in Appendix A at the end of this report. A table
listing all sites with no years with % total outliers≥10% is included in Appendix B at the
end of this report.
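The site-year screening described in this section can be sketched as below. It assumes valid sample pairs are available as (site, year, Z) records. The bin boundaries follow the Appendix A legend (0-40%, 41-60%, 61-100% of outlier pairs with Z<-3), but how fractional percentages between bins are assigned is an assumption, and all names are hypothetical.

```python
from collections import defaultdict


def direction_bin(pct_low):
    """Map the share of outliers with Z<-3 onto the Appendix A legend bins."""
    if pct_low <= 40.0:
        return "0-40% of outliers with Z<-3"
    if pct_low <= 60.0:
        return "41-60% of outliers with Z<-3"
    return "61-100% of outliers with Z<-3"


def flag_site_years(records, z_limit=3.0, min_outlier_pct=10.0):
    """Return site-years where outliers are at least min_outlier_pct of valid pairs.

    records: iterable of (site, year, z) for valid sample pairs.
    Result: {(site, year): (pct_outliers, pct_of_outliers_low, legend_bin)}.
    """
    tallies = defaultdict(lambda: [0, 0, 0])  # [n_valid, n_low, n_high]
    for site, year, z in records:
        tally = tallies[(site, year)]
        tally[0] += 1
        if z < -z_limit:
            tally[1] += 1
        elif z > z_limit:
            tally[2] += 1

    flagged = {}
    for key, (n_valid, n_low, n_high) in tallies.items():
        n_out = n_low + n_high
        if n_out == 0:
            continue  # no outliers, so no bias direction to report
        pct_out = 100.0 * n_out / n_valid
        if pct_out < min_outlier_pct:
            continue
        pct_low = 100.0 * n_low / n_out
        flagged[key] = (pct_out, pct_low, direction_bin(pct_low))
    return flagged
```

Sites that never appear in the result for any year would correspond to the kind of list given in Appendix B.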
[Figure 24 plot: monthly series from Jan-88 to Jan-03. Axes: Date; % Outliers; SO4 artifact (ug/filter). Series: % of valid samples that are outliers; % of valid samples that are outliers with Z<-3; % of valid samples that are outliers with Z>3; SO4 artifact.]
Figure 24. Looking network wide, the percentage of valid sample pairs with Z<-3 or
Z>3 (navy), with Z<-3 (red), and with Z>3 (pink) for a given month has varied significantly
over time. Rarely are the outlier pairs evenly distributed between those with SO4>>3*S
(pink) and those with SO4<<3*S (red). This network wide look at the outliers captures
all the major bias trends observable in site specific time series. From 6/2002 forward
when the blank corrections began being applied to monthly rather than quarterly data
batches, there is a rough correlation (R2=0.5) between the SO4 artifact concentration and
the % valid samples with Z<-3.
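For reference, an R2 value like the one quoted in the caption can be obtained as the squared Pearson correlation coefficient between two paired monthly series. The sketch below shows that calculation in general form. Whether the reported R2=0.5 was computed in exactly this way is an assumption, and no actual artifact or outlier data are reproduced here.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return (sxy * sxy) / (sxx * syy)
```

Here `x` would be the monthly SO4 artifact (ug/filter) and `y` the monthly percentage of valid samples with Z<-3, restricted to months from 6/2002 forward.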
Other Interesting Observations

A major shift in dominant bias direction from [SO4]>>3*[S] (Z>3) to
[SO4]<<3*[S] (Z<-3) occurred in November 1994. It has been proposed that
many of the sample pairs with [SO4]>>3*[S] from east coast sites with high
aerosol sulfate concentrations were due to S loss from the Teflon filters due to the
mask which reduced the exposed surface area of the filter. However, this shift
precedes the removal of masks from the east coast sites, which did not begin until
spring 1995.

Starting in 6/2002, when the switch was made from quarterly to monthly blank
subtraction, there is a rough (R2=0.5) correlation between the % valid samples
with Z<-3 and the blank correction concentration (ug/filter) (Figure 24).

Looking at the collocated [SO4] data for 2003-2004, there appears to be a negative
correlation between the sulfate offsets ([SO4x]/[SO4]) and the flow offsets
(Volx/Vol), particularly at low values. This suggests there may be a relationship
between flow offsets, blank corrections and how accurately our reported [SO4]
concentrations represent the true atmospheric aerosol SO4 concentrations at low
concentrations.

Using [SO4] and [S] comparability as a metric of data quality, our performance is
degrading rather than improving. Looking at the percentage of samples which
qualify as outliers, 2003 at 13% is the worst year since 1988 at 24%, when there
were known [SO4] problems. Looking only at the post-validation data, it cannot
be determined if this is the result of changes in data reporting or of changes in
measurement quality.
Table 6. Percentage of valid samples which qualify as outliers for each calendar year
Year | # Samples with Z<-3 | # Samples with Z>3 | # Samples with Z<-3 or Z>3 | # Valid Samples | % Outlier Sample Pairs
1988 | 11 | 543 | 554 | 2286 | 24.2%
1989 | 6 | 213 | 219 | 2928 | 7.48%
1990 | 18 | 129 | 147 | 3515 | 4.2%
1991 | 40 | 126 | 166 | 3949 | 4.2%
1992 | 26 | 161 | 187 | 4442 | 4.2%
1993 | 46 | 246 | 292 | 4583 | 6.4%
1994 | 69 | 227 | 296 | 4969 | 6.0%
1995 | 140 | 125 | 265 | 5174 | 5.1%
1996 | 225 | 32 | 257 | 5199 | 4.9%
1997 | 115 | 109 | 224 | 5434 | 4.1%
1998 | 330 | 65 | 395 | 4992 | 7.9%
1999 | 211 | 74 | 285 | 4905 | 5.8%
2000 | 434 | 415 | 849 | 8104 | 10.5%
2001 | 646 | 590 | 1236 | 13607 | 9.1%
2002 | 1229 | 649 | 1878 | 16873 | 11.1%
2003 | 1544 | 795 | 2339 | 18297 | 12.8%

Qualitatively checking the relative number of outliers with Z<-3, the relative
number of outliers with Z>3 and the relative number of outliers with Z<-3 or Z>3
seems to be a valid way of quickly detecting all significant bias problems.
Various data aggregates can be used to look at problems on a site specific,
regional or network wide basis. They also provide a reasonably defensible basis
for setting standard expectations or acceptance criteria for valid data.
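As a small worked example of using these aggregates as an acceptance check, the yearly percentages in Table 6 can be recomputed from the counts and compared against a 10% criterion. The counts below are copied from Table 6. Treating 10% as the acceptance limit and the variable names used are illustrative assumptions.

```python
# (n_low, n_high, n_valid) per year, copied from Table 6:
# n_low = # samples with Z<-3, n_high = # samples with Z>3.
TABLE_6 = {
    1988: (11, 543, 2286),
    1989: (6, 213, 2928),
    1990: (18, 129, 3515),
    1991: (40, 126, 3949),
    1992: (26, 161, 4442),
    1993: (46, 246, 4583),
    1994: (69, 227, 4969),
    1995: (140, 125, 5174),
    1996: (225, 32, 5199),
    1997: (115, 109, 5434),
    1998: (330, 65, 4992),
    1999: (211, 74, 4905),
    2000: (434, 415, 8104),
    2001: (646, 590, 13607),
    2002: (1229, 649, 16873),
    2003: (1544, 795, 18297),
}


def outlier_pct(n_low, n_high, n_valid):
    """Percent of valid sample pairs with Z<-3 or Z>3."""
    return 100.0 * (n_low + n_high) / n_valid


# Years in which outliers exceed 10% of the valid sample pairs.
years_over_10 = [
    year for year, counts in TABLE_6.items() if outlier_pct(*counts) > 10.0
]
print(years_over_10)  # [1988, 2000, 2002, 2003]
```

The same check can be run on site-level or regional counts to look at problems on a finer scale.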
Conclusions
Taking the IMPROVE dataset as a whole, the sulfur measurements do not have an
unexpected number of pairs that are more than 3σ apart. However, once the temporal,
spatial, and concentration dimensions of the population are taken into account the outlier
sample pairs are clustered in certain spatial or temporal subsets of the data. The clusters
are typically not symmetrical in terms of bias direction. The bottom line is that the
expectation of random distribution of the outliers is violated and the expectation of at
most 10% of the population being in extreme disagreement is also violated for all dataset
fractions that take the key dimensions of the population into account. Therefore, I would
suggest that the poor agreement in many of these sample pairs is not due to random
chance but the reflection of real analytical and/or sampling problems specific to certain
conditions. More generally, these results indicate that the SO4 and/or the S
measurements are not accurate under some conditions and/or one or both of the estimated
measurement uncertainties underestimate the true uncertainty under some
conditions. However, they also indicate that the measurements are consistently accurate
under some conditions and that the estimated uncertainties are accurate or even
overestimate the actual uncertainty under some conditions.
The network wide patterns in terms of dominant bias direction are likely due to problems
in our analytical process or to sampling media. It appears that local conditions can
override the network wide patterns suggesting that sampling conditions are a key
component to producing comparable [SO4] and [S] measurements. Sampling conditions
could alternatively play the role of enhancing the network level signal at particular sites or
damping it, thereby either reducing or increasing the comparability of those particular [SO4] and
[S] measurements. Additional investigation is required to understand what factors in
terms of analytical and sampling equipment and procedure might be negatively or
positively impacting the comparability of our measurements. The fact that low Z scores
dominate at low concentrations and high Z scores dominate at high concentrations
suggests there may be different underlying problems causing sample pairs with
[SO4]<<3*[S] (low Z scores) and those with [SO4]>>3*[S] (high Z scores). Furthermore,
the correlation between sulfate offsets and flow offsets in the collocated data and the
correlation between the percentage of valid samples with Z<-3 and blank concentration
hint at a connection between flow problems, blank corrections and the poor agreement
between [SO4] and [S] at low concentrations, even those well over 10*mdl, for at least
the recent past.
Appendix A. Spatial Distribution of Sites with High Outlier Counts
All sites identified with a pink, red or blue circle had at least 10% of their valid sample
pairs for the calendar year with Z scores outside of the range [-3,3].
Legend:
● 0-40% of the outlier pairs had Z<-3 (SO4<<3*S)
● 41-60% of the outlier pairs had Z<-3 (SO4<<3*S)
● 61-100% of the outlier pairs had Z<-3 (SO4<<3*S)
● IMPROVE site, not necessarily operational during specified time period
Appendix B. Sites with no site-year combinations with over 10% outliers (excluding
1988-1989)
Site Code | Site Name | State | IMPROVE Region
ARCH1 | Arches NP | UT | Colorado Plateau
BADL1 | Badlands NP | SD | Northern Great Plains
BALD1 | Mount Baldy | AZ | Mogollon Plateau
BAND1 | Bandelier NM | NM | Colorado Plateau
BOAP1 | Bosque del Apache | NM | Mogollon Plateau
CHER1 | Cherokee Nation | OK | Mid South
CRES1 | Crescent Lake | NE | Central Great Plains
CRMO1 | Craters of the Moon NM | ID | Hells Canyon
ELDO1 | El Dorado Springs | MO | Central Great Plains
GICL1 | Gila Wilderness | NM | Mogollon Plateau
GRCA2 | Hance Camp at Grand Canyon NP | AZ | Colorado Plateau
HOOV1 | Hoover | CA | Sierra Nevadas
IKBA1 | Ike's Backbone | AZ | Mogollon Plateau
ISRO1 | Isle Royale NP | MI | Boundary Waters
JEFF1 | Jefferson NF | VA | Appalachia
MAVI1 | Martha's Vineyard | MA | Northeast
PEFO1 | Petrified Forest NP | AZ | Mogollon Plateau
PINN1 | Pinnacles NM | CA | California Coast
QUCI1 | Quaker City | OH | Ohio River Valley
RAFA1 | San Rafael | CA | California Coast
RMHQ1 | Rocky Mountain NP HQ | CO | Central Rockies
STAR1 | Starkey | OR | Hells Canyon
WHPE1 | Wheeler Peak | NM | Central Rockies
WIMO1 | Wichita Mountains | OK | Mid South
YELL1 | Yellowstone NP 1 | WY | Northern Rockies
ZICA1 | Zion Canyon | UT | Colorado Plateau
Appendix D
Associated files for additional detail:
Year-Site_Zscore.xls
Network_Zscore.xls
2003_Monthly_BiasMaps.ppt
Yearly_BiasMaps.ppt
ZScore_TimeSeries_88-03.ppt
SO4_SvsSO4_Scatter_Regional_98-03.ppt
SO4_SvsMT_Scatter_Regional_98-03.ppt
SO4_S_TimeSeries_88-03.ppt
SO4_S_TimeSeries_98-03.ppt