ENVIRONMENTAL STATISTICS: IT'S ALL ABOUT THE COMPARABLE QUALITY

Barry D. Nussbaum, nussbaum.barry@epa.gov, US Environmental Protection Agency

ABSTRACT

The collection and compilation of environmental statistics in a meaningful and useful manner is no easy matter, especially where quality and comparability of data are concerned. Taken in a vacuum, the data and the information they convey are of minimal utility. When compared with other data from varying locations or from different time periods, these data take on the important function of indicating changing status and trends. However, that means that the quality and comparability of the data become of paramount importance. This concern frequently becomes the dominant factor in producing environmental statistics. This paper presents the steps used by the US Environmental Protection Agency in producing its Report on the Environment, and highlights the concerns for measurable, high-quality, comparable data. Preparing the Report on the Environment included major efforts to assure quality information. Yet it also resulted in some apparent internal inconsistencies in the information presented. The discussion in this paper indicates that these are not true inconsistencies; rather, they demonstrate how careful adherence to definitions may at times defy the more familiar meanings of certain terms.

Keywords: Environmental Statistics, Report on the Environment, Indicators, Quality, Trends, Toxic Release Inventory

The Report on the Environment: A measure of progress

In 2003, the US Environmental Protection Agency issued a draft Report on the Environment to describe the status of many important environmental variables and to give the public a glimpse of the state of their surroundings. The major sections revolved around the basic structure of air, land, water, health, and ecology. Each section provided a narrative explanation to accompany the facts, figures, tables, and maps. The significance of the report is that it not only gives the present state of important indicators of the environment, but also serves as a point of departure for observing and understanding changes when compared with future or past observations. But in providing this opportunity to observe changes, subtle and not-so-subtle changes to definitions and categories can result in inconsistencies that camouflage the true changes in the environment. This paper reviews some of those subtleties and discusses the conundrum frequently encountered in this area.

Toxic Releases

One major section in the land use chapter of the Draft Report on the Environment discusses chemicals in the landscape. The major source of information regarding these toxic substances in the United States is the US Environmental Protection Agency's Toxic Release Inventory (TRI). This widely used database was developed in response to the 1984 disaster in Bhopal, India, in which deadly methyl isocyanate was released into the surrounding area. The US Congress decided that the public should have the right to know what toxic activity might be occurring in their area. Thus, as part of the Emergency Planning and Community Right-to-Know Act (EPCRA) of 1986, EPA was required to develop a list of toxic releases that could be evaluated by location. At the time of the Draft Report on the Environment, the TRI listed the releases and transfers of 650 toxic chemicals from 20,000 facilities.
Its companion tools, Envirofacts and TRI Explorer, permit users to slice and dice the data as well as to download information so it may be loaded into databases and spreadsheets for further analysis. The TRI data are widely used outside of EPA. For instance, labor unions put requests for reduction of toxic chemicals into their labor demands; facilities use the data to uncover cost-effective ways to use less toxic material; the IRS uses it to identify companies using CFCs; and some mutual fund companies use it to analyze "green" firms. Further, TRI has been widely acknowledged as a prime example of displaying data for open and transparent government.

While these examples demonstrate the wide use of TRI data, several important caveats should be noted. First, TRI does not cover everything. Toxics from mobile sources such as cars and trucks are not counted, and releases from relatively small facilities (fewer than ten employees) are not counted. In fact, the Toxic Substances Control Act inventory identifies 76,000 chemicals currently in use in the US, whereas TRI tracks 650 of them. While these exclusions may be formidable, the TRI still paints a good picture of the changes in releases. Second, the releases are reported in pounds. Although all toxics are not similarly toxic, the amounts are not weighted by any sort of toxicity factor, so when the results are summed, an undue share of the total may come from relatively less toxic substances. In fact, the total poundage is not strictly accurate either, since dioxins, which are quite toxic, are reported in grams. Third, and perhaps most interesting, the very fact of publishing the data has made the level of toxics decrease. That is, when facilities were mandated to make their toxic releases available for the public (as well as for competitors) to see, they were forced by public or competitive pressure to face the issue of toxic releases and find ways to lower them.

Further nuances are vital to mention in order to enhance the understanding of the temporal use of this database. The TRI list is not static; it has been subject to many changes since its initial version in 1988. One major change was the addition of the metal mining sector in 1998. This is an industry with very large releases, and in one fell swoop the total US toxics released increased from 2.9 billion pounds in 1997 to 7.1 billion pounds in 1998. Needless to say, without this knowledge a simple chart of the trend in total releases would show a marked upturn, suggesting a totally misleading conclusion of increased hazards in the country.

To explain the differences and concerns when comparing data, the TRI program is careful to publish all of the metadata in TRI Explorer in full. In fact, the space occupied by the metadata usually exceeds the space used for the actual data. Further, the program has prepared a 29-page guide entitled "The Toxics Release Inventory (TRI) and Factors to Consider When Using TRI Data". Most usefully, TRI comparisons are done using "core chemicals", those that have been reported consistently since the program's origin.
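The effect of such a list change on a naive trend chart can be made concrete with a small sketch. The Python fragment below uses entirely hypothetical facility records and chemical names; it illustrates the general core-chemicals idea described above, not the TRI program's actual computation.

    # Why totals over a changing chemical list mislead: hypothetical records.
    reports = [
        # (year, chemical, pounds released)
        (1997, "toluene", 1_000_000),
        (1997, "lead", 500_000),
        (1998, "toluene", 900_000),
        (1998, "lead", 450_000),
        (1998, "mining_metal_waste", 3_000_000),  # sector added to the list in 1998
    ]

    core = {"toluene", "lead"}  # chemicals reported consistently in both years

    def total(year, chemicals=None):
        """Sum releases for a year, optionally restricted to a chemical set."""
        return sum(lbs for (y, chem, lbs) in reports
                   if y == year and (chemicals is None or chem in chemicals))

    # Naive comparison across the list change: an apparent sharp increase...
    print(total(1997), "->", total(1998))              # 1500000 -> 4350000
    # ...while the consistently reported core set actually declined.
    print(total(1997, core), "->", total(1998, core))  # 1500000 -> 1350000

The same pattern on a national scale is why the jump from 2.9 to 7.1 billion pounds between 1997 and 1998 says more about the reporting list than about the environment.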
So in the Report on the Environment, it was stated that "the original set of chemicals (332 of the 650 TRI chemicals) from industries that have reported consistently since 1988 shows that total on- and off-site releases decreased 48 percent between 1988 and 2000, a reduction of 1.55 billion pounds." This is an accurate way of portraying these data, but it notably includes only half of all the TRI chemicals in this full-term consistent analysis. The other chemicals that were added later are also of concern but are simply not available for long-term comparison, so only less powerful statements can be made about them. For instance, the ROE adds that between 1998 and 2000, toxic releases of all 650 chemicals decreased by approximately 409 million pounds. The ROE and the TRI program are very careful to fully explain and document the reasons for inconsistent comparisons, yet with so many users of the data, it is of concern how many of them carefully read and consider these metadata explanations.

Data That Should Not Change

The Report on the Environment has an executive summary. In it, one datum exists which should not change, but apparently does. The map of the United States, reproduced in this paper, has considerable information associated with it, presumably important since it is in the executive summary. Note in particular, in the lower left-hand corner, the statement that the coastline of the US is 66,645 miles. This number is suspiciously precise and remarkably big: it would be about 2 ½ times the circumference of the earth. Investigation of this number proved to be more difficult, and more interesting, than initially envisioned. First, firstgov.gov listed the US as having 95,000 miles of shoreline, certainly much larger than the EPA number. Second, the World Factbook gives 19,924 km of coastline. Interestingly, the World Factbook is published by the Central Intelligence Agency, and gives the number in kilometers; this converts to 12,380 miles, considerably shorter than EPA's estimate. Third, the Information Please Almanac gives two numbers: 12,383 miles of coastline and 88,633 miles of shoreline. The first number agrees with the World Factbook and is, in fact, based on that reference. Notably, neither number agrees with EPA's. About now, one may also encounter the Richardson effect, which in this case asserts that the measured length of a coastline depends on the accuracy of the measuring device: the finer and more precise the device, the longer the coastline. That would suggest the coastline might be infinite, and it leads directly into the study of fractals.

So where did EPA get its number? In the 1998 Water Quality Report to Congress, EPA noted that the continental US (and Hawaii) had a coastline of 22,419 miles and Alaska had 44,000 miles of coastline. Adding the two yields 66,419, roughly EPA's number in the ROE. It is also an example of merging two numbers, each having a different number of significant figures, which of course violates a cardinal rule taught in middle school. (The conversion and the significant-figure arithmetic are checked in the short sketch following the reference below.)

The point here is to demonstrate the difficulty of analyzing both data that are to be used in trends and even numerical quantities that should remain constant. With the plethora of available data on the web, the problems of comparability of data, and data quality in general, become even more acute.

Reference: US Environmental Protection Agency, Draft Report on the Environment 2003, EPA-260-R-02-006, June 2003.
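As a brief numerical addendum, the coastline arithmetic above can be checked directly. The Python sketch below uses only the figures quoted in the text; the kilometer-to-mile conversion factor is the standard one.

    # Checking the coastline figures quoted in the text.
    KM_PER_MILE = 1.609344

    # CIA World Factbook coastline, converted from kilometers to miles:
    cia_km = 19_924
    print(round(cia_km / KM_PER_MILE))  # 12380 -- close to the almanac's 12,383

    # EPA's 1998 Water Quality Report figures:
    lower48_and_hawaii = 22_419  # five significant figures
    alaska = 44_000              # two significant figures
    print(lower48_and_hawaii + alaska)  # 66419 -- roughly the ROE's 66,645

    # Adding a five-significant-figure number to a two-significant-figure
    # estimate produces a sum that looks far more precise than it is; at the
    # precision of the weaker term, it is simply "about 66,000 miles".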