ENVIRONMENTAL STATISTICS: IT’S ALL ABOUT THE
COMPARABLE QUALITY
Barry D. Nussbaum, nussbaum.barry@epa.gov, US Environmental Protection Agency
ABSTRACT
The collection and compilation of environmental statistics in a meaningful and useful manner is
no easy matter, especially where quality and comparability of data are concerned. Taken in a
vacuum, the data and the information they convey are of minimal utility. When compared with
other data from varying locations or from different time periods, these data do take on the
important function of indicating changing status and trends. However, that means that the
quality and comparability of the data become of paramount importance. This concern frequently
becomes the dominant factor in producing environmental statistics.
This paper will present the steps used by the U.S. Environmental Protection Agency in producing
its Report on the Environment, and will highlight the concerns for measurable, high-quality,
comparable data. Preparing the Report on the Environment included major efforts to assure
quality information. Yet, it also resulted in some apparent internal inconsistencies in the
information presented. The discussion in this paper indicates that these are not true
inconsistencies, but rather demonstrate how careful adherence to definitions may at times defy
the more familiar meanings of certain terms.
Keywords: Environmental Statistics, Report on the Environment, Indicators, Quality, Trends,
Toxic Release Inventory
The Report on the Environment: A measure of progress
In 2003, the US Environmental Protection Agency issued a draft Report on the
Environment to describe the status of many important environmental variables to give the public
a glimpse of the state of their surroundings. The major sections revolved around the basic
structure of air, land, water, health, and ecology. Each section provided a narrative of
explanation to accompany the facts, figures, tables, and maps. The significance of the report is
that not only does it give the present state of important indicators of the environment, but it also
serves as a point of departure to observe and understand changes when compared with future or
past observations. But in providing this opportunity to observe changes, subtle and not-so-subtle
changes to definitions and categories can result in inconsistencies that camouflage the
true changes in the environment. This paper reviews some of those subtleties and discusses the
conundrum frequently encountered in this area.
Toxic Releases
One major section in the land use chapter of the Draft Report on the Environment
discusses chemicals in the landscape. The major source of information regarding these toxic
substances in the United States is the US Environmental Protection Agency’s Toxic Release
Inventory (TRI). This widely used database was developed in response to the 1984 disaster in
Bhopal, India, in which deadly methyl isocyanate gas was released over the city. The US Congress
decided that the public should have the right to know what toxic activity might be occurring in
their area. Thus as part of the EPCRA legislation of 1986, the EPA was required to develop a
list of toxic releases that could be evaluated by location. At the time of the Draft Report on the
Environment, the TRI listed the releases and transfers of 650 toxic chemicals from 20,000
facilities. Its companion tools, Envirofacts and TRI Explorer, permit users to slice and dice the
data as well as to download information so it may be loaded into databases and spreadsheets for
further analysis.
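
To illustrate that workflow, a minimal sketch in Python appears below; it loads a downloaded extract into a tabular structure and aggregates releases by state. The file name and column names are hypothetical stand-ins, since the actual layout of a TRI Explorer export may differ.

    # A minimal sketch of analyzing a downloaded TRI extract.
    # "tri_export.csv", "STATE", and "TOTAL_RELEASES_LB" are hypothetical names.
    import pandas as pd

    tri = pd.read_csv("tri_export.csv")

    # Total on- and off-site releases by state, largest first.
    by_state = (tri.groupby("STATE")["TOTAL_RELEASES_LB"]
                   .sum()
                   .sort_values(ascending=False))
    print(by_state.head(10))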
The TRI data are widely used outside of the EPA. For instance, labor unions put requests
for reduction of toxic chemicals into their labor demands, facilities use the data to uncover
cost-effective ways to use less toxic material, the IRS uses it to identify companies using CFCs,
and some mutual fund companies use it to analyze “green” firms. Further, TRI has been widely
acknowledged as one of the examples of displaying data for open and transparent government.
While these examples demonstrate the wide use of TRI data, several important aspects should be
noted. First, TRI does not cover everything. For instance, toxics from mobile sources such as cars
and trucks are not counted. Releases from relatively smaller facilities (fewer than ten
employees) are not counted. In fact, the Toxic Substances Control Act inventory identifies
76,000 chemicals currently in use in the US, whereas TRI tracks 650 of them. While these
exclusions may be formidable, the TRI still paints a good picture of the changes in releases.
Second, the releases are reported in pounds. Although not all toxics are equally toxic, they
are not weighted by any sort of toxicity factor. Thus when the results are summed, an undue
amount may be based on relatively less toxic substances. In fact, even the total poundage is not
quite right, since dioxins, which are quite toxic, are measured in grams. Third, and
perhaps most interesting, the very fact of publishing the data has made the level of toxics
decrease. That is, when facilities were mandated to make their toxic releases available for the
public (as well as for competitors) to see, they were forced by public or competitive pressure to
face the issue of toxic releases and find ways to lower them.
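
To make the weighting issue concrete, the following sketch shows one way a user might normalize units and apply toxicity weights before summing. The chemicals, amounts, and weights are illustrative assumptions only; TRI itself publishes unweighted pounds (and grams for dioxins).

    # Illustrative only: hypothetical amounts and toxicity weights.
    GRAMS_PER_POUND = 453.592

    releases = [
        # (chemical, amount, unit, hypothetical toxicity weight)
        ("toluene", 1_200_000, "lb", 1.0),
        ("lead",       50_000, "lb", 40.0),
        ("dioxin",        310, "g",  100_000.0),  # dioxins are reported in grams
    ]

    raw_total_lb = 0.0
    weighted_total = 0.0
    for chemical, amount, unit, weight in releases:
        pounds = amount / GRAMS_PER_POUND if unit == "g" else amount
        raw_total_lb += pounds
        weighted_total += pounds * weight

    print(f"Unweighted total:        {raw_total_lb:,.0f} lb")
    print(f"Toxicity-weighted total: {weighted_total:,.0f} (arbitrary units)")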
Further nuances are vital to understanding the temporal use of this database. The TRI
list is not static; it has been subject to many changes
since its initial version in 1988. One major change was the addition of the metal mining sector in
1998. This is an industry with very large releases, and in one fell swoop, the total US toxics
released increased from 2.9 billion pounds in 1997 to 7.1 billion pounds in 1998. Needless to
say, without this knowledge, simple charts of the trend in total releases would show a marked
upturn, suggesting the totally misleading conclusion that hazards in the country had increased.
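
A small numerical sketch illustrates the point. Only the 1997 and 1998 totals below come from the text; the other figures, and the mining share, are hypothetical. Comparing the all-sector series with a series restricted to consistently reporting sectors shows how the 1998 addition manufactures an apparent upturn.

    # Hypothetical series (billions of pounds); 1997 and 1998 totals match the text.
    total_releases = {1995: 3.3, 1996: 3.1, 1997: 2.9, 1998: 7.1, 1999: 6.8}
    mining_only    = {1995: 0.0, 1996: 0.0, 1997: 0.0, 1998: 4.3, 1999: 4.1}

    for year in sorted(total_releases):
        consistent = total_releases[year] - mining_only[year]
        print(f"{year}: all sectors {total_releases[year]:.1f}B lb, "
              f"consistent sectors {consistent:.1f}B lb")
    # The all-sector series jumps in 1998; the consistent-sector series keeps falling.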
To explain the differences and concerns when comparing data, the TRI program is careful
to publish all of the metadata in TRI Explorer. In fact, the space occupied by the metadata
usually exceeds the space used for the actual data. Further, the program has prepared a 29-page
guide entitled, “The Toxics Release Inventory (TRI) and Factors to Consider When Using TRI
Data”. Most usefully, TRI comparisons are done using “core chemicals”, those that have been
reported consistently since the program’s origin. So in the Report on the Environment, it was
stated that “the original set of chemicals (332 of the 650 TRI chemicals) from industries that have
reported consistently since 1988 shows that total on- and off-site releases decreased 48 percent
between 1988 and 2000, a reduction of 1.55 billion pounds.” This is an accurate way of
portraying these data, but it notably includes only half of all the TRI chemicals in this full-term
consistent analysis. Since the chemicals added later are also of concern but are simply not
available for the full comparison, only less powerful statements can be made about them. For instance, the
ROE adds that between 1998 and 2000, toxic releases of all 650 chemicals decreased by
approximately 409 million pounds. The ROE and the TRI Program are very careful to fully
explain and document the reasons for inconsistent comparisons, yet with so many users of the
data, it is of concern how many carefully read and consider these metadata explanations.
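
As a quick consistency check, the implied 1988 baseline can be recovered from the two quoted figures alone; the arithmetic below is a sketch based only on the 48 percent and 1.55 billion pound numbers above.

    # Consistency check on the quoted core-chemical figures.
    reduction_lb = 1.55e9   # reported reduction, 1988-2000
    pct_drop = 0.48         # reported 48 percent decrease

    baseline_1988 = reduction_lb / pct_drop      # implied 1988 releases
    level_2000 = baseline_1988 - reduction_lb    # implied 2000 releases

    print(f"Implied 1988 baseline: {baseline_1988 / 1e9:.2f} billion lb")
    print(f"Implied 2000 level:    {level_2000 / 1e9:.2f} billion lb")
    # About 3.23 billion lb in 1988 falling to about 1.68 billion lb in 2000.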
Data That Should Not Change
The Report on the Environment has an executive summary. In it, one datum exists which
should not change, but apparently does. The map of the United States in the executive summary
has considerable information associated with it, presumably important given its prominent placement.
Note in particular, in the lower left-hand corner, the statement that the coastline of the
US is 66,645 miles. This number is suspiciously precise and remarkably big: it is
about 2½ times the circumference of the earth. Investigating this number proved to be more
difficult, and more interesting, than initially envisioned.
First, firstgov.gov listed the US as having 95,000 miles of shoreline, certainly much
larger than the EPA number. Second, the World Fact Book
gives 19,924 km of coastline. Interestingly, the World Fact Book is published by the Central
Intelligence Agency, and gives the number in kilometers. This converts to 12,380 miles,
considerably shorter than EPA’s estimate. Third, the Information Please Almanac gives two
numbers: 12,383 miles of coastline, and 88,633 miles of shoreline. The first number agrees with
the World Fact Book and is, in fact, based on that reference. Notably, neither number agrees
with EPA’s. At this point, one may also encounter the Richardson Effect, which here
means that the measured length of a coastline depends on the resolution of the measuring device: the finer
and more precise the device, the longer the coastline. Taken to its limit, that suggests the coastline might
be infinite, and leads directly into the study of fractals. So where did EPA get its number?
In the 1998 Water Quality Report to Congress, EPA noted that the continental US (and Hawaii)
had a coastline of 22,419 miles and Alaska had 44,000 miles of coastline. Adding the two
yields 66,419, roughly EPA’s number in the ROE. This sum is also an example of merging two
numbers, each with a different number of significant figures. That, of course, violates a
cardinal rule taught in middle school.
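
The arithmetic behind these comparisons is easy to reproduce. The sketch below checks the kilometer-to-mile conversion and the questionable sum using only the figures quoted above, and adds a schematic illustration of the Richardson Effect; the fractal dimension D = 1.25 is an assumed value, chosen only because it is often cited for rugged coastlines.

    KM_PER_MILE = 1.609344

    # World Fact Book coastline, converted to miles.
    cia_km = 19_924
    print(f"CIA coastline: {cia_km / KM_PER_MILE:,.0f} miles")   # ~12,380

    # EPA's apparent construction: lower 48 plus Hawaii, plus Alaska.
    lower48_hawaii = 22_419   # five significant figures
    alaska = 44_000           # two significant figures
    print(f"Sum: {lower48_hawaii + alaska:,} miles")             # 66,419
    # Quoting the sum to five figures overstates its precision: the Alaska
    # term is only good to roughly the nearest thousand miles.

    # Richardson Effect, schematically: measured length ~ F * ruler**(1 - D), D > 1.
    D, F = 1.25, 12_380       # F calibrated so a 1-mile ruler gives ~12,380 miles
    for ruler in (100.0, 10.0, 1.0, 0.1):
        print(f"ruler {ruler:>5} mi -> {F * ruler ** (1 - D):,.0f} miles")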
The point here is to demonstrate the difficulty of analyzing both data that are used
in trends and even numerical quantities that should remain constant. With the plethora of
available data on the web, the problems of comparability of data, and data quality in general,
become even more acute.
Reference: US Environmental Protection Agency, Draft Report on the Environment 2003, EPA-260-R-02-006, June 2003.