- Pennsylvania GAP Analysis Project

2.1 Introduction
Mapping natural land cover requires a higher level of effort than the development of data
for animal species, agency ownership, or land management, yet it is no more important
for gap analysis than any other data layer. Generally, the mapping of land cover is done
by adopting or developing a land cover classification system, delineating areas of relative
homogeneity (basic cartographic “objects”), then labeling these areas using categories
defined by the classification system. More detailed attributes of the individual areas are
added as more information becomes available, and a process of validating both polygon
pattern and labels is applied for editing and revising the map. This is done in an iterative
fashion, with the results from one step causing re-evaluation of results from another step.
Finally, an assessment of the overall accuracy of the data is conducted. The final
assessment of accuracy will show where improvements should be made in the next
update (Stoms et al. 1994).
In its “coarse filter” approach to conservation biology (e.g., Jenkins 1985, Noss 1987),
gap analysis relies on maps of dominant natural land cover types as the most fundamental
spatial component of the analysis (Scott et al. 1993) for terrestrial environments. For the
purposes of GAP, most of the land surface of interest (natural) can be characterized by its
dominant vegetation.
Vegetation patterns are an integrated reflection of the physical and chemical factors that
shape the environment of a given land area (Whittaker 1965). They also are determinants
for overall biological diversity patterns (Franklin 1993, Levin 1981, Noss 1990), and they
can be used as a currency for habitat types in conservation evaluations (Specht 1975,
Austin 1991). As such, dominant vegetation types need to be recognized over their entire
ranges of distribution (Bourgeron et al. 1994) for beta-scale analysis (sensu Whittaker
1960, 1977). These patterns cannot be acceptably mapped from any single source of
remotely sensed imagery; therefore, ancillary data, previous maps, and field surveys are
used. The central concept is that the physiognomic and floristic characteristics of
vegetation (and, in the absence of vegetation, other physical structures) across the land
surface can be used to define biologically meaningful biogeographic patterns. There may
be considerable variation in the floristics of subcanopy vegetation layers (community
association) that is not resolved when mapping at the level of dominant canopy
vegetation types (alliance), and there is a need to address this part of the diversity of
nature. As information accumulates from field studies on patterns of variation in
understory layers, it can be attributed to the mapped units of alliances.
2.2 Land Cover Classification
Land cover classifications must rely on specified attributes, such as the structural features
of plants, their floristic composition, or environmental conditions, to consistently
differentiate categories (Kuchler and Zonneveld 1988). The criteria for a land cover
classification system for GAP are: (a) an ability to distinguish areas of different actual
dominant vegetation; (b) a utility for modeling animal species habitats; (c) a suitability
for use within and among biogeographic regions; (d) an applicability to Landsat Thematic
Mapper (TM) imagery for both rendering a base map and from which to extract basic
patterns (GAP relies on a wide array of information sources; TM offers a convenient
meso-scale base map in addition to being one source of actual land cover information);
(e) a framework that can interface with classification systems used by other organizations
and nations to the greatest extent possible, and (f) a capability to fit, both categorically
and spatially, with classifications of other themes such as agricultural and built
environments.
For GAP, the system that fits best is referred to as the National Vegetation Classification
System (NVCS) (FGDC 1997). The origin of this system was referred to as the
UNESCO/TNC system (Lins and Kleckner in press) because it is based on the structural
characteristics of vegetation derived by Mueller-Dombois and Ellenberg (1974), adopted
by the United Nations Educational, Scientific, and Cultural Organization (UNESCO
1973) and later modified for application to the United States by Driscoll et al. (1983,
1984). The Nature Conservancy and the Natural Heritage Network (Grossman et al.
1994) have been improving upon this system in recent years with partial funding supplied
by GAP. The basic assumptions and definitions for this system have been described by
Jennings (1993).
As noted in the introduction, Pennsylvania’s contemporary landscapes are largely a
legacy of historic human disturbance including marginal agriculture, strip mining, and
extensive deforestation, often followed by fire. Whereas the initial GAP view relegated
disturbed lands to minor interest, they are integral to Pennsylvania’s habitat and
landscape integrity issues that are crucial to conservation. Physiography has substantially
determined degradation under human influence and, therefore, whether land-use pressure
has intensified or been alleviated in favor of return to more naturalistic conditions.
Whether as a consequence of restoration or progressive fragmentation, structure and
composition of vegetation often varies in a complex manner at relatively fine scales with
a limited suite of species in different mixes comprising the overstory. Floristic
distinctions could not be made consistently in this vegetative complex on the basis of
Landsat satellite data that varied widely with respect to season of collection. Since the
primary goal of the Pennsylvania GAP Project has been to lay a foundation of landscape-level conservation perspectives upon which to build in future work, it has been viewed as
important that data sources not be over-extended in any manner that would be cause for
future investigators to essentially discard these first efforts and start over completely.
Instead of vegetation alliances, therefore, the gap analysis effort for Pennsylvania has
focused on land cover in terms of physiognomy, disturbance, and landscape integrity.
2.3 Mapping Standards
Land cover mapping for Pennsylvania has been conducted at two scales. Eight types of
physiognomy have been combined with 3 intensities of disturbance at a resolution of 2
hectares or better giving a 24-class mapping of generalized land cover to serve both
habitat modeling and a broad range of other uses. A binary mapping of landscape matrix
as naturalistic (forest and water) versus humanistic (herbaceous and barren) has been
conducted at 100-hectare resolution for landscape ecological analysis. The 8 types of
physiognomy are: water, evergreen forest, mixed forest, deciduous forest, (woody)
transitional, perennial herbaceous, annual herbaceous, and barren/hard-surface/rubble/gravel. The 3 intensities of disturbance are rural, suburban, and urban.
Comprehensive mapping of more detailed floristics and/or vegetation structure has not
been undertaken for this initial conservation analysis, although it was investigated and
found to be infeasible using the remote sensing imagery available.
2.4 Methods
Generalized land cover and disturbance were mapped in several modes from Landsat
Thematic Mapper (TM) digital image data collected during a period from 1991 through
1994. The Landsat image data were obtained from USGS EROS Data Center through the
Multi-Resolution Land Characterization (MRLC) consortium. Image data were
compressed through a hyperclustering protocol configured at Penn State Univ. for display
and classification using commercial software. The compressed images have been made
available to the public and have received considerable use in Pennsylvania as backdrops
for GIS applications. An initial binary classification into naturalistic and humanistic
types of landscape matrix at 100-ha resolution was done by interactively digitizing with a
mouse on a computer display. An interpretive classification of disturbance was done
similarly, but with no specific minimum resolution. By reference to digital orthophoto
quarter-quads (DOQQs), clusters were interpretively assigned to 8 general physiognomic
land cover classes. Combining land cover and disturbance yielded 24 map classes for
habitat modeling and analysis. Aerial videography was acquired along transects and used
for validation. Details in these regards are given in the ensuing paragraphs.
2.4.1 The Land Cover Classification Scheme:
The Pennsylvania 100-ha binary landscape matrix layer has codes as follows:
10 = naturalistic (forest, water);
20 = humanistic (transitional, perennial herbaceous, annual herbaceous, barren).
The Pennsylvania 2-ha land-cover/disturbance layer has a two-digit coding scheme, for
which the first digit is coded as:
Rural (wild land or agriculture);
Suburban (primarily low-density residential);
Urban (primarily high-density residential and/or commercial/industrial);
with the second digit being a code for physiognomy as follows:
Open water or wetlands with standing water;
Evergreen forest (not more than 30% of tree canopy cover deciduous);
Mixed forest (deciduous and evergreen both > 30% of tree canopy cover);
Deciduous forest (not more than 30% of tree canopy cover evergreen);
Woody transitional (5% < cover of woody plant foliage < 40%), also shrubland
or forest regeneration;
Perennial herbaceous (grasslands, pasture, forage, old fields <5% shrubs);
Annual herbaceous (row crops, grain crops, exposed mineral soil);
Barren, hard-surface, rubble, gravel.
These latter classes form a natural ordination not only for physiognomy, but also for
near-infrared spectral brightness. Spectral confusion is more likely for classes that are
adjacent in the ordination than for classes that are further apart. Additional levels of
classification were considered in cooperation with other northeastern states, but could not
be implemented consistently using available image data.
2.4.2 Imagery Used:
The primary source of remotely sensed image data used in land-cover/disturbance
mapping was from the Landsat Thematic Mapper (TM) sensor in paths 14-18 and rows
31 & 32, with coverage as shown in Figure 2.1. The data for these images were acquired
from USGS EROS Data Center through GAP participation in the MRLC (Multi-Resolution Land Characterization) consortium. Each frame consisted of six bands, not
including the thermal infrared. The image dates obtained for these path/row positions are
listed in Table 2.1. Delivery of the image tapes was considerably delayed, which became
a major cause of protraction for the Pennsylvania GAP Project. Several of the image
dates were also considerably less than ideal for land classification, being acquired in
phenological circumstances when trees had only partial foliage or were devoid of foliage.
Clouds in portions of several images also required substantial remedial effort.
Figure 2.1. Landsat TM coverage by path/row position.
Table 2.1. Dates of Landsat TM imagery.
Path/Row    Date 1      Date 2
            10/20/92    6/17/93
Digital orthophoto quarter-quads (DOQQs) derived from 1:40,000 scale black and white
aerial photographs were used as a supplement to the Landsat data. The DOQQs are made
publicly available by the Pennsylvania DCNR Topographic and Geologic Survey via the
PASDA website http://www.pasda.psu.edu for downloading at no cost.
For purposes of validation, dual-resolution aerial videography was collected along
transects using a light plane and camcorder equipment maintained by national GAP to
support state projects. General location of aerial videography transects is shown in
Figure 2.2.
Figure 2.2. Aerial videography transects for use in validation.
2.4.3 Land Cover Map Development:
The 100-ha binary landscape matrix mapping was the first land-cover product developed
by the Pennsylvania GAP Project. This was accomplished by displaying a 3-band color-infrared composite of Landsat TM on a computer workstation, and interactively
performing interpretive digitizing with a mouse. The digitizing work was then cleaned,
edited, and assembled into polygons. When the interpretive mapping was topologically
consistent, it was then generalized to 100-ha resolution by dissolving the boundaries of
smaller polygons.
The second major image analysis undertaking was inspired by the Spectrum initiatives
of the Khoros Group at Los Alamos National Laboratory. The hypercluster concept was
appealing, but not its implementation. An alternative scenario for hyperclustering was
configured by customizing ERDAS image analysis facilities. This was approached as a
form of image data compression that produces a hybrid image-map layer to be displayed
and analyzed using ESRI GIS software facilities. The available Landsat TM images
were compressed in this manner, and the compressions made publicly available on both
CD-ROM and the PASDA website (http://pasda.psu.edu). Since the customized method
of hyperclustering with ERDAS was still somewhat awkward and restrictive, a suite of
independent software facilities carrying the acronym PHASES (Myers 1999) was
developed in generic C language and made publicly available on the World Wide Web.
This latter software development work was conducted under an NSF/EPA project.
Studies on extensions of hyperclustering concepts for a variety of image analysis
purposes are continuing under different sources of support.
Hypercluster compressions of the 10 scenes for spring dates provided the primary basis
for unsupervised classification of physiognomy. Each scene had 255 clusters, making a
total of more than 2,500 clusters to be labeled. Cluster labeling was accomplished by
interactive image interpretation using ArcView by ESRI. Two computers were used
simultaneously for the interpretation, with one displaying the clustered image-map, and
the other displaying a panel of DOQQs. A sample of DOQQs was used in the manner of
training sets, with additional samples being used to check consistency of interpretation
across the image.
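The cluster-labeling step amounts to applying a lookup table from cluster number to physiognomic class. The following is a minimal sketch of that idea, not the ArcView procedure the project used; the class names, the function `label_clusters`, the cluster numbers, and the use of -1 for unlabeled clusters are all illustrative assumptions.

```python
import numpy as np

# Class names in the ordination order given in Section 2.4.1 (names are illustrative).
CLASSES = ["water", "evergreen", "mixed", "deciduous",
           "transitional", "perennial_herb", "annual_herb", "barren"]

def label_clusters(cluster_image, cluster_to_class, n_clusters=255, unlabeled=-1):
    """Recode a raster of cluster numbers into class indices via a lookup table."""
    # One slot per possible cluster number; unlabeled clusters keep the sentinel.
    lut = np.full(n_clusters + 1, unlabeled, dtype=np.int16)
    for cluster_id, class_name in cluster_to_class.items():
        lut[cluster_id] = CLASSES.index(class_name)
    # Fancy indexing applies the table to every pixel at once.
    return lut[cluster_image]

# Hypothetical 2 x 3 clustered scene and interpreter-assigned labels.
clusters = np.array([[1, 1, 7],
                     [3, 7, 7]])
labels = label_clusters(clusters, {1: "water", 3: "deciduous", 7: "annual_herb"})
```

Because each scene is labeled independently, a separate table of this kind would be needed for each of the 10 scenes.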
Clouds were an obstacle to image analysis, as they usually are in Pennsylvania. Most of
the clouds could be eliminated in mosaicking the classified scenes by carefully choosing
which scene took precedence in the area of overlap. Some clouds, however, were too
centrally located to allow this expedient. Fortunately, dual dates were available for these
instances. Accordingly, the alternate date was also classified and the map from the
clouded image was patched from the second date.
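Patching a clouded classification from a second date can be sketched as a masked replacement. This is an illustration only: the cloud mask, the array values, and the function name `patch_clouds` are assumptions, and the text does not say how the clouded areas were delineated.

```python
import numpy as np

def patch_clouds(primary, alternate, cloud_mask):
    """Use class codes from the alternate date wherever the primary date is clouded."""
    return np.where(cloud_mask, alternate, primary)

# Hypothetical classified maps for two dates and a mask of clouded pixels.
primary = np.array([[4, 4], [1, 6]])
alternate = np.array([[4, 3], [1, 7]])
cloud_mask = np.array([[False, True], [False, True]])
patched = patch_clouds(primary, alternate, cloud_mask)
```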
Unsupervised classification was conducted at Landsat TM 30-meter pixel resolution. The
classification of physiognomy was then generalized to a 2-ha level of resolution in two
steps using the ANXPHASE facility of the PHASES image analysis system (Myers
1999). The first step performed generalization from pixel level to 1-ha level. The second
step went from the 1-ha level to the 2-ha level. ANXPHASE is computationally
intensive, but its strategy is parallel in several respects to the way a human interpreter
would generalize. As it works, it always looks for the next smallest patch in the vicinity
and blends it with the neighboring type having the greatest border.
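The generalization strategy described above (take the smallest patch, blend it into the neighboring type with the greatest shared border, repeat) can be sketched as follows. This is not the ANXPHASE code; it is a simplified illustration that assumes 4-connected patches, a pixel-count threshold `min_cells`, and hypothetical class codes. At 30-meter resolution a pixel covers 0.09 ha, so a 1-ha threshold corresponds to roughly 11 pixels.

```python
from collections import Counter

NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))

def find_patches(grid):
    """Return all 4-connected patches as lists of (row, col) cells."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    patches = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            seen[r][c] = True
            stack, cells = [(r, c)], []
            while stack:
                y, x = stack.pop()
                cells.append((y, x))
                for dy, dx in NEIGHBORS:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == grid[r][c]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            patches.append(cells)
    return patches

def generalize(grid, min_cells):
    """Merge patches below min_cells into the neighbor class with the longest border."""
    grid = [row[:] for row in grid]  # work on a copy
    rows, cols = len(grid), len(grid[0])
    while True:
        small = [p for p in find_patches(grid) if len(p) < min_cells]
        if not small:
            return grid
        patch = min(small, key=len)  # smallest remaining patch first
        own = grid[patch[0][0]][patch[0][1]]
        border = Counter()  # shared edge count per neighboring class
        for y, x in patch:
            for dy, dx in NEIGHBORS:
                ny, nx = y + dy, x + dx
                if 0 <= ny < rows and 0 <= nx < cols and grid[ny][nx] != own:
                    border[grid[ny][nx]] += 1
        if not border:
            return grid  # the patch fills the whole grid; nothing to merge into
        target = border.most_common(1)[0][0]
        for y, x in patch:
            grid[y][x] = target

# Hypothetical map: a single pixel of class 6 surrounded by class 4.
example = [[4, 4, 4, 4],
           [4, 6, 4, 7],
           [4, 4, 4, 7],
           [7, 7, 7, 7]]
smoothed = generalize(example, min_cells=2)
```

Recomputing all patches on every pass keeps the sketch short but is slow, which is consistent with the remark that this style of generalization is computationally intensive.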
The mapping of urbanized disturbance was developed in a manner analogous to the
mapping of landscape matrix, but with a few modifications in technique. The first
modification was to speed up and simplify the process by displaying the compressed
image instead of a 3-band composite, which allowed for faster reloading of graphics
when panning and zooming. The second was to overlay the compressed image with a
digital file of roads in order to lend emphasis to urbanized areas. Digitizing was then
done interactively via mouse. There was no specific minimum mapping unit for
digitizing urbanized areas.
In combining the raster theme of physiognomy with the vector theme of urbanized areas,
the first operation was to rasterize the urbanized layer. The urbanized raster was then
reclassified to a second digit coding, after which the two layers could be combined by
directly adding their codes.
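Combining the two rasters by adding codes can be sketched as below. The numeric digit values are not given in the text, so the sketch assumes disturbance values 1 to 3 (rural, suburban, urban) in the tens place and physiognomy values 1 to 8 in the ones place; the function name `combine_codes` is also hypothetical.

```python
import numpy as np

# Assumed digit values (not stated in the text):
# disturbance 1 = rural, 2 = suburban, 3 = urban;
# physiognomy 1..8 in the order listed in Section 2.4.1.
def combine_codes(disturbance, physiognomy):
    """Build two-digit codes: tens digit = disturbance, ones digit = physiognomy."""
    return disturbance * 10 + physiognomy

disturbance = np.array([[1, 1], [2, 3]])
physiognomy = np.array([[4, 6], [4, 8]])
combined = combine_codes(disturbance, physiognomy)

# Three disturbance levels times eight physiognomic types give 24 possible codes.
possible_codes = {d * 10 + p for d in range(1, 4) for p in range(1, 9)}
```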
2.4.4 Special Feature Mapping:
Wetlands data were extracted from a land-use/land-cover classification performed by
MRLC. Although the MRLC mapping was deemed to have insufficient landscape
fidelity for habitat modeling in most respects, MRLC analysts had used National
Wetlands Inventory (NWI) as an ancillary data source. Along with open water, two
wetland types were transferred from NWI by MRLC: palustrine herbaceous wetlands
and palustrine woody (shrub and forested) wetlands. For use in habitat modeling, each of
these two classes was isolated from the MRLC digital map as a separate layer. Digitized
NWI quads are only available for 2/3 of Pennsylvania, and they are not merged into a
single layer; thus, we elected not to use them in partial manner. In addition, NWI
coverage usually detects only about half of Pennsylvania’s wetlands. We used variable
width buffers along streams, rivers, lakes, and wetlands to capture a considerable portion
of aquatic habitats.
2.5 Results
The landscape ecological insight that arises from mapping at both broad and fine scales is
evident from the contrast between the broad-scale landscape matrix map in Figure 2.3
and the fine-scale landscape matrix map in Figure 2.4. The broad-scale landscape matrix
depiction comes directly from mapping naturalistic versus humanistic cover at 100-ha
resolution, whereas the fine-scale landscape version comes from combining water with
forest classes and herbaceous with barren classes in the mapping of physiognomy at 2-ha
resolution. In the broad-scale mapping, the Pittsburgh Low Plateau in southwestern
Pennsylvania appears to be nearly as deforested as the Great Valley and Piedmont
Lowland in southeastern Pennsylvania. In the fine-scale mapping, the Great Valley and
Piedmont Lowland continue to appear as essentially devoid of forest cover, whereas the
Pittsburgh Low Plateau shows a partial but considerably fragmented forest cover. The
Pittsburgh Low Plateau may also lose much of its remaining naturalistic cover if
fragmentation continues. According to the broad-scale mapping, 69% of Pennsylvania
has a naturalistic landscape matrix with 65% of the state being in one expansive unit that
encompasses much of the northern third of the Commonwealth and extends through the
mountains to the southern border. In this sense, landscapes are relatively intact over
much of Pennsylvania.
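The fine-scale matrix map and the percentage figures can be derived from the physiognomy raster by a simple recode and count. The sketch below assumes physiognomy codes 1 to 8 in the order listed in Section 2.4.1 (the actual digit values are not given in the text) and uses a small hypothetical raster; transitional cover is treated as humanistic, following the code definitions for the matrix layer.

```python
import numpy as np

# Assumed codes: 1 = water, 2 = evergreen, 3 = mixed, 4 = deciduous forest.
NATURALISTIC = [1, 2, 3, 4]

def landscape_matrix(physiognomy):
    """Recode physiognomy to 10 (naturalistic) or 20 (humanistic)."""
    return np.where(np.isin(physiognomy, NATURALISTIC), 10, 20)

def percent_naturalistic(matrix):
    """Percentage of cells coded as naturalistic."""
    return 100.0 * np.count_nonzero(matrix == 10) / matrix.size

# Hypothetical 4 x 3 physiognomy raster.
physiognomy = np.array([[1, 4, 6],
                        [2, 5, 8],
                        [4, 4, 7],
                        [3, 7, 7]])
matrix = landscape_matrix(physiognomy)
share = percent_naturalistic(matrix)
```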
The separable mapping of urbanization as shown in Figure 2.5 is also revealing from a
landscape perspective. Pennsylvania is predominantly rural, with 1.5% of its area being
intensively urbanized and another 4.1% being suburban. Much of the urbanization is due
to a few large metropolitan areas such as Philadelphia, Pittsburgh, Harrisburg, Erie, and
Wilkes-Barre/Scranton. Suburban sprawl is, however, a contemporary issue of concern
that has been emphasized by the Governor’s 21st Century Environment Commission (Seif
and Glotfelty 1998).
Figure 2.3. Broad-scale mapping of naturalistic versus humanistic landscape matrix.
Figure 2.4. Fine-scale mapping of naturalistic versus humanistic landscape matrix.
Figure 2.5. Urbanized and suburban areas of Pennsylvania.
Table 2.2 provides a percentage breakdown of the 11,618,719 hectares mapped in
Pennsylvania according to land cover and urbanization. Figure 2.6 is a color plate
showing the major components of the land cover and urbanization mapping.
Table 2.2. Percentage breakdown of Pennsylvania by land cover and urbanization.
Rows: Evergreen forest; Mixed forest; Deciduous forest; P. herbaceous; A. herbaceous.
2.6 Accuracy Assessment
2.6.1 Introduction:
GAP land cover maps are primarily compiled to answer the fundamental question in gap
analysis: what is the current distribution and management status of the nation’s major
natural land cover types and wildlife habitats? Besides giving a measure of overall
reliability of the land cover map for Gap Analysis, the assessment also identifies which
general classes or which regions of the map do not meet the accuracy objectives for the
Gap Analysis Program. Thus, the assessment identifies where additional effort will be
required when the map is updated. We report the results of the accuracy assessment,
believing that the map is the best map currently available for the project area.
The purpose of accuracy assessment is to allow a potential user to determine the map’s
“fitness for use” for their application. It is impossible for the original cartographer to
anticipate all future applications of a land cover map, so the assessment should provide
enough information for the user to evaluate fitness for their unique purpose. This can be
described as the degree to which the data quality characteristics collectively suit an
intended application. The information reported includes details on the database’s spatial,
thematic, and temporal characteristics and their accuracy.
Assessment data are valuable for purposes beyond their immediate application to
estimating accuracy of a land cover map. The reference data are, therefore, made available
to other agencies and organizations for use in their own land cover characterization and
map accuracy assessments (see Data Availability for access information). The data set
will also serve as an important training data source for later updates.
Even though we have reached an endpoint in the mapping process where products are
made available to others, the gap analysis process should be considered dynamic. We
envision that maps will be refined and updated on a regular schedule. The assessment
data will be used to refine GAP maps iteratively by identifying where the land cover map
is inaccurate and where more effort is required to bring the maps up to accuracy
standards. In addition, the field sampling may identify new classes that were not
identified at all during the initial mapping process.
2.6.2 Methods:
Aerial videography transects were flown as shown in Figure 2.2 with a light plane using a
high resolution 8mm camcorder system loaned to the Pennsylvania GAP Project by the
national GAP coordinators. The system featured dual wide-angle and zoom video
cameras with linkage to a Global Positioning System (GPS) unit through a notebook
computer. There were numerous logistical difficulties in obtaining aerial videography.
Cloud conditions and otherwise inclement weather were also persistent problems. As a
consequence of these difficulties, the available videography came from different years
and different seasons. Some represented full foliage phenology, some were from fall
foliage transition, and others were from leafless periods.
The strategy for obtaining reference data from the video was to randomly select frames
from the respective flight lines on the basis of time code. The selected frames were then
located on the basis of time code and the center of the zoom image was classified
photointerpretively and recorded. The time code was translated to spatial coordinates on
the basis of the GPS data. A circle having 40-meter radius was then located on the map,
and pixels having their center in the circle were examined. If the circle captured a pixel
having the reference class, then the map was considered to be correct. Otherwise, it was
recorded as confusion for the map class of the pixel nearest the center of the circle. The
accuracy assessment was performed on the individual pixel map before generalization to
a 2-ha level.
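The 40-meter circle rule can be expressed compactly. The sketch below is an illustration under stated assumptions: the map is a small grid of class codes with a known upper-left corner and square pixels, coordinates are in meters, and the function name `assess_frame` and all example values are hypothetical.

```python
import math

def assess_frame(map_grid, origin_x, origin_y, cell, x, y, ref_class, radius=40.0):
    """Apply the circle rule to one reference point.

    map_grid[row][col] holds class codes, (origin_x, origin_y) is the upper-left
    corner of the grid, and cell is the pixel size in meters. Returns
    (is_correct, map_class_recorded), or None if no pixel center is in the circle.
    """
    in_circle = []
    for row, line in enumerate(map_grid):
        for col, cls in enumerate(line):
            cx = origin_x + (col + 0.5) * cell   # pixel center, easting
            cy = origin_y - (row + 0.5) * cell   # pixel center, northing
            dist = math.hypot(cx - x, cy - y)
            if dist <= radius:
                in_circle.append((dist, cls))
    if not in_circle:
        return None
    # Correct if any captured pixel carries the reference class.
    if any(cls == ref_class for _, cls in in_circle):
        return True, ref_class
    # Otherwise record confusion against the pixel nearest the circle center.
    return False, min(in_circle)[1]

# Hypothetical 3 x 3 map of 30-meter pixels; the frame center falls on the middle pixel.
grid = [[4, 4, 6],
        [4, 6, 6],
        [7, 7, 7]]
hit = assess_frame(grid, 0.0, 90.0, 30.0, 45.0, 45.0, ref_class=7)
miss = assess_frame(grid, 0.0, 90.0, 30.0, 45.0, 45.0, ref_class=1)
```

With 30-meter pixels, a circle centered on a pixel captures that pixel and its four edge neighbors, since the diagonal neighbors lie about 42 meters away.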
The physiognomic land cover categories form a general ordination with respect to
infrared brightness on Landsat TM imagery. Confusion of categories adjacent in the
ordination was expected to occur more frequently, and was not further investigated. If
there was confusion of classes farther apart in the ordination, then the video interpretation
was rechecked to make sure there had not been a problem in this regard.
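The recheck rule for confusion between classes that are not adjacent in the ordination can be sketched as a comparison of positions in the ordered class list. The class names and the function `needs_recheck` are illustrative, not part of the project's software.

```python
# Classes in the ordination order given in Section 2.4.1 (names are illustrative).
ORDINATION = ["water", "evergreen", "mixed", "deciduous",
              "transitional", "perennial_herb", "annual_herb", "barren"]

def needs_recheck(map_class, ref_class):
    """True when the classes differ and are not neighbors in the ordination."""
    gap = abs(ORDINATION.index(map_class) - ORDINATION.index(ref_class))
    return gap > 1

adjacent = needs_recheck("evergreen", "mixed")    # neighbors: expected confusion
distant = needs_recheck("water", "deciduous")     # far apart: recheck the video
```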
An original sample of 498 frames was drawn randomly by flight line and interpreted; the
sample proved to be primarily deciduous forest. Predominance of this class in the map
made it unlikely that other classes would be reasonably represented through additional
unrestricted random sampling. Therefore, a supplemental random sample was taken after
excluding areas where flight lines passed through deciduous and transitional map classes.
Although pooling precludes formal calculation of standard errors for accuracy, it was
judged most reasonable to pool data from the unrestricted and supplemental samples.
Table 2.3 shows the results of pooled accuracy assessment, with video reference
categories as columns and map land-cover codes as rows. The last column shows
percentage user’s accuracy. The last row shows percentage producer’s accuracy. The
lower corner shows overall accuracy as percent of correct classification.
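For readers unfamiliar with the layout, the three accuracy measures follow directly from the error matrix: user's accuracy divides each diagonal count by its row (map) total, producer's accuracy divides it by its column (reference) total, and overall accuracy divides the diagonal sum by the grand total. The sketch below uses hypothetical counts for three classes, not the values of Table 2.3.

```python
import numpy as np

def accuracy_summary(matrix):
    """matrix[i, j] = number of samples mapped as class i with reference class j."""
    matrix = np.asarray(matrix, dtype=float)
    diag = np.diag(matrix)
    users = 100.0 * diag / matrix.sum(axis=1)      # per map class (rows)
    producers = 100.0 * diag / matrix.sum(axis=0)  # per reference class (columns)
    overall = 100.0 * diag.sum() / matrix.sum()
    return users, producers, overall

# Hypothetical counts: rows are map classes, columns are reference classes.
counts = [[45, 5, 0],
          [10, 30, 10],
          [5, 5, 40]]
users, producers, overall = accuracy_summary(counts)
```

A class with no samples in its row or column would cause a division by zero here and would need separate handling.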
Table 2.3. Accuracy assessment for physiognomic land cover categories. R=reference;
M=map. Water (watr), evergreen forest (evrgrn), mixed forest (mixd),
deciduous forest (decid), transitional (trans), perennial herbaceous (pherb),
annual herbaceous (aherb), barren (bare). Producer’s accuracy (%pac);
user's accuracy (%uac).
Rwatr Revrgrn Rmixd Rdecid Rtrans Rpherb Raherb Rbare %uac
2.7 Limitations and Discussion
In spite of having broad physiognomic land-cover classes and more than 2,500 clusters,
the accuracy target of 80% was achieved only for the deciduous forest class. Water in the
landscape was also classified with 85% producer’s accuracy, but there was commission
error recorded which could arise in different ways. Cloud shadows were not consistently
distinguishable from water in the clustered image data, and some cases of very deep
terrain shading could also appear in water clusters. The majority of the confusion for
water, however, was with deciduous forest, which is usually spectrally distinct from water.
Thus, it seems likely that this apparent confusion arises from forested wetlands that are
more evident in the spectral infrared than on video taken with visible wavelengths.
Likewise, seasonally wet areas may well have dried in the fall video timeframe. It must
also be kept in mind that the videography was taken as much as 6 years later than the
satellite imagery.
It is evident that evergreen forest is not consistently distinguished from mixed forest, so
the evergreen and mixed forest categories would be better combined as “evergreen
component” forests. This would give approximately 70% producer accuracy, but still
less than 60% user accuracy. Thus, it appears that separate sets of leaf-on and leaf-off
imagery are needed for accurate recognition of forests having an evergreen component.
Otherwise, deciduous forests on heavily shaded aspects tend to be clustered with those
having an evergreen component.
It is likewise evident that annual herbaceous vegetation is not consistently separated from
perennial herbaceous vegetation. Pooling these two classes would give close to 80%
producer accuracy, and 70% user accuracy. A major contributor to remaining confusion
lies in the fact that northern Pennsylvania forests in several of the spring Landsat scenes
had only partially foliated. In these cases, herbaceous vegetation in the understory was
contributing strongly to spectral signatures.
Since large paved areas, quarries, etc. were quite evident in the clustered images, it can
be concluded that low accuracy in this class is due largely to mixed-pixel contamination
in the clustering.
The clustered images have a good deal more fidelity to landscape pattern than the error
matrix might suggest. When display scales between 1:50,000 and 1:100,000 are used for
the clustered images, they portray the landscape pattern well enough that they have been
popular in Pennsylvania as backdrops for GIS work. Major roads and paved urban
surfaces are evident, but more minor roads are lost to mixed-pixel effects. Overlaying
GIS layers of roads and streams serves to verify consistency of the landscape pattern in
these clustered images.
This verification exercise does, however, underscore the importance of phenology for
acquisition and processing of remotely sensed image data in Pennsylvania. The best
situation is to have a combination of two phenological conditions. The primary set of
imagery would be late spring or early summer when trees and shrubs have a full
complement of leaves, but agricultural crops have not yet developed complete coverage
of the soil surface. The secondary set would be from late fall after the leaves have been
shed, but before snowfall.