The Prediction of Place:
Calculating the Presence of Archaeological Sites in Pisgah National Forest
A Thesis Submitted to the
Department of Anthropology/Sociology of Warren Wilson College
In Partial Fulfillment of the Requirements for the Degree of
Bachelor of the Arts
In the Department of Anthropology/Sociology
May 8, 2012
By Maureen Vaughan
Supervisor-Christey Carwile and David Moore
Table of Contents
Mathematics Behind Predictive Modeling
Common uses of GIS in Predictive Modeling
Predictive Modeling in North Carolina and Surrounding States
Predictive Modeling within the National Forest Service
Predicting the Model
Statistical Analysis
Utilizing Geographic Information Systems (GIS) in archaeology presents
many opportunities for Cultural Resource Management (CRM) and academic
research. Because locating and documenting archaeological sites can be
challenging, GIS databases have previously been used to systematically predict
these sites based on a variety of factors. In the past, the National Forest Service
(NFS) has relied on the instincts and advice of archaeologists to search out sites
of potential importance. There is no systematized, statistical approach to
predictive modeling in use at the NFS. ‘Fuzzy set estimation’ is a Bayesian
method of statistical analysis that was used to create two predictive models of
Pisgah National Forest through GIS. These models are both analyzed with fuzzy
set estimation, but the variables are based on the experience and advice of
differing archaeologists. One model is based on the methods presented by Mink
et al. (2009), while the other revolves around the current survey rules created by
the NFS for Pisgah. This thesis then compares these models’ results with surveys
conducted by the NFS.
I would like to thank Philip B Mink II for presenting at the 2010
Southeastern Archaeology Conference. I also thank Rob Snedeker and Holly
Hixon, who have provided me with invaluable data from the National Forest
Service of North Carolina. I would also like to thank the many professors, friends,
and family that have listened to me worry and provided many pearls of wisdom
that were immeasurably helpful in making this thesis.
We are weaving a story. With every point placed on a map, every potsherd
pulled from the ground, every successive layer of soil, we weave the largest, most
fascinating tapestry of all time. It is our collective human past, and we, as
archaeologists, are constantly trying to fight for the right to protect these
remnants of our material past. As budgets tighten around the necks of the
government officials in charge of allocating funds and resources, they are faced
with picking out what places are extensively explored and which are quickly
surveyed before being completely overturned by a construction crew and lost.
What are these men and women to do, but use every item at their disposal to
streamline the process and shave every last extraneous second? That is why we
require models that predict archaeological sites, because we are at war with the
resources we possess.
For over thirty years, Geographic Information Systems (GIS) has proved to
be an incredibly useful tool for archaeological inquiry. This program brings
mapping to a new level, providing professionals with the tools to learn more
about the geography and human activity in the area before even leaving the lab.
GIS databases are designed specifically to aid in the collection, storage, and
analysis of whatever spatial data the program is given (Renfrew and Bahn
2000:87). With this tool at our disposal, archaeologists can derive statistical
analyses from inputted data, from analyzing artifact distribution to calculating
the prospective cost of an archaeological survey. Those involved with Cultural
Resource Management (CRM) often take advantage of this and readily fund
predictive models for archaeological sites. These maps allow them to find those
areas of greatest probable importance when presented with a proposed road or
building complex and focus their attention on those places most likely to yield
artifacts (Whitley 2003).
Because GIS is used to organize complex and vast arrays of data,
information is separated into visual, differentiable layers and then combined and
edited to find the relationships between them. When mapping individual sites,
environmental and artifact data can be inputted into a new map, overlaid with
others, and explored more effectively than with pen and paper alone. GIS can also
be used to record new sites and combine geographic data with site information. It
allows us to document and permanently store sites at the push of a button. With
this system, data are easily stored and queried from a database instead of
physically separated from other data in text-based databases. Data can be viewed
in a much larger context than ever before.
GIS can also model site locations, trade routes, and migration patterns,
where information is inputted into a database, then analyzed mathematically to
find relationships that could tell us more about the period and place we are
exploring (Renfrew and Bahn 2000:260; 551). These models, which draw on various
mathematical theories, can be used to estimate the likelihood that valuable
material culture lies hidden where the human eye cannot discern the difference
between a field and an archaic village.
Fuzzy set estimation, or fuzzy logic, is a type of model more commonly
used in modeling biological systems like coral reef growth and projecting their
future development with non-linear relationships between variables. Non-linear
relations are those that do not follow an easily discerned path of logic from the
initial state to the conclusion, while linear models require continuous dependent
variables (Adèr 2008:271-304). Because effectively predicting archaeological
sites in a given place can be quite difficult with traditional statistical modeling,
non-linear models are a more effective way to work out these progressions. With
too many variables and variable interactions to be effectively studied with linear
models, changing the way we approach the mathematics of archaeological
modeling seems the best way to get results.
This research focuses on prehistoric archaeological sites in Pisgah National
Forest in western North Carolina, using fuzzy logic to predict new sites for future
exploration and protection (Mink et al. 2009). By applying fuzzy logic to
topographical models made with GIS, this study examines whether this modeling
system and the parameters within it can be used to effectively predict the chances
of archaeological sites in a given area using certain environmental factors (like
proximity to water, where that water flows, and elevation). This predictive model
type has proved more effective than other statistical models in other research, so
I expect it to prove effective here as well. In fact, I am basing one of these
models completely on the work of Mink et al. from their 2009 paper, Predictive
Archaeological Modeling using GIS Based Fuzzy Set Estimation, which I hope to
expand upon. However, taking into account the differing topography between the
original Mink study area and Pisgah National Forest, I have also created a second
model from the parameters presently used to predict site locations within the
National Forest Service. This was done in an effort not only to test the efficacy of
fuzzy logic with regard to modeling, but also to find how different parameters can
change the results of this model type.
Because many survey methods are expensive and time consuming, many
researchers have sought a more cost effective approach to predicting site
locations utilizing GIS as a tool. This research examines the previous work of
Mink et al. (which uses fuzzy logic) and asks how reliable it is for use in the
Appalachians. Their model was adapted and used in this instance to predict the
probability of prehistoric archaeological sites in Pisgah National Forest. Another
model was also created utilizing the parameters currently used by the National
Forest Service. I investigated the veracity of these models with Pisgah National
Forest survey data. The methods involved gathering National Forest Service data on
prehistoric archaeological sites and comparing actual sites discovered with the
projected probability of sites in the National Forest. The models were created
using fuzzy logic, each model was compared with the archaeological sites
already documented, and a gain statistic was calculated to test accuracy. This project
compares the productivity of two models, set in Pisgah National Forest, analyzed
with fuzzy logic, and utilizing different variables, that seek to answer these
questions:
1. How accurate is fuzzy logic in identifying archaeological sites
in Pisgah National Forest?
2. If variables are changed to coincide with the experience of local
archaeologists, will this make the model more or less efficacious?
Combining archaeology with GIS has proved to be an effective way to
categorize and organize data uncovered from excavations, shovel tests, and
surveys. GIS can also model site locations, trade patterns, and migration
patterns, where information is inputted into a database, then analyzed
mathematically to find relationships that could tell us more about the period we
are exploring (Renfrew and Bahn 2000:260; 551). Those involved with Cultural
Resource Management (CRM) often take advantage of this and readily fund
predictive models for archaeological sites. These maps allow them to find those
areas of greatest probable importance when presented with a proposed road or
building complex, and focus their attention on those places most likely to yield
artifacts (Whitley 2003).
Surveying is paramount to archaeological inquiry. Reconnaissance
surveying is the practice of searching out archaeological sites by looking at the
landscape. It demands a careful search for something as prominent as brick
walls jutting out of the earth or as subtle as a scattering of projectile points or
pottery pieces on the surface. An area is cordoned off, sampled, and examined
with shovels, trowels, and human cleverness for signs of occupation. More
advanced technologies are also used today to find and document archaeological
sites. There have been great strides made in using these emergent technologies to
ensure that everything is done to marry efficiency with thoroughness.
Archaeologists use tools like remote sensing, ground penetrating radar,
magnetometry, aerial photography, electrical resistivity, and advanced mapping
programs to better do their jobs as archaeologists and learn as much as possible
while disturbing as little as possible (Renfrew and Bahn 2000:71-105).
Using technology with archaeology is certainly an effective tool, but it lacks
human reflection and is often considered too environmentally deterministic.
Statistical analysis, though a valid and important method of archaeological
exploration, cannot be impartial by the very nature of the information we gather.
Work with new technology oftentimes becomes mired in innovation rather than in
understanding the functionality of the newest “toys.” Without the proper
application of mathematical and archaeological theory, these new technologies
often prove useless in the pursuit of answers to our archaeological questions.
GIS has also been critiqued for its lack of ‘wiggle room,’ so to speak. This
computer program cannot ‘see’ the world in the same way that a human
archaeologist can and, thus, leaves no room for error or hesitation unless
specifically programmed to do so. For this reason, the mathematical theory of
fuzzy logic has proved to be effective in mapping complex data relationships and
uncertainty in GIS (Niccolucci et al. 2001).
Mathematics behind Predictive Modeling
The basis of this research relies on statistics and how fuzzy logic can be
applied to archaeology. Statistical analysis uses data to answer questions, and it
has been used by social scientists since Emile Durkheim used it to find patterns
in society (Gardner 2007:25-26; 72-77). The goal of statistics is to answer our
questions without becoming too lost in uncertainty (Bolstad 2007). Bayesian
models have been successfully used to trace chronologies in archaeology in the
past, and I hope to add to this body of work (Buck and Sahu 2000).
I will be utilizing Fuzzy Set Estimation, or fuzzy logic, to create each model
for this project. Using fuzzy logic in the stead of traditional statistics, we are
better able to marry theory with technology. Fuzzy logic requires that we view
everything, from what cereal is in your cabinet to the election of the next
president, as a constant array of choices that end up creating the final product.
With such a viewpoint, we are better prepared to take a step back and understand
how we organize and affect the data we acquire in archaeological research.
We choose to place our data in computers, in databases and Excel
documents that will count our potsherds and debitage, and these choices affect
how our data are compiled and analyzed. Every choice made in the long line of
judgment calls made by archaeologists, specialists and laymen alike, influences
how that information is going to turn out. While this certainly makes exchanging
and computing that information very easy, it is not immune to mistakes. All
computers are subject to compiling data incorrectly, and no software is ever
perfect. This margin of error must be accounted for, or, sometimes, those errors
will render your interpretations completely wrong (Niccolucci et al. 2001).
Subjectivity in identification is also especially important because there is no
perfect way to identify actual ground cover or land use in the real world. Most of
these definitions are merely idealized generalizations of what is actually going on
at the ground level. People make mistakes or act inconsistently, and our
mathematical approach should reflect that (Benz et al. 2004). Data recorded in
databases often restrict archaeologists to making concrete value judgments,
instead of reflecting their hesitation or uncertainty over their decisions. The
subjective opinion becomes objective fact (Niccolucci et al. 2001).
Fuzzy-set theory can be used to characterize non-statistical uncertainty in
mathematical models. It allows uncertainty to be estimated under certain
conditions. Computer simulations have been used time and again to demonstrate the
validity of fuzzy logic (Xia et al. 2000). The uncertainty accounted for with fuzzy
logic better reflects real life situations in urban planning, medical consultation,
coral reef growth, and fire risk estimation. All of these instances require accepting
that many things are simply impossible to predict completely and perfectly,
just as in archaeology, where a perfect understanding of the past is far
beyond our reach (Puente et al. 2007, Boegi et al. 2001, Meesters et al. 2008,
Iliadis 2005, Perry et al. 1999). Predicting archaeological sites with GIS has been
particularly advantageous for various State Department of Transportation
agencies, but it can be a difficult endeavor. Models using a purely statistical
approach need too many variables to be useful. This research combines Spatial
Analyst and fuzzy logic modeling to bring anecdotal evidence together with empirical
data.
Anecdotal evidence comes from professional archaeologists who worked
with Mink et al. and advised them using their experience with prehistoric
archaeology in Kentucky. These conversations focused mainly on measuring river
systems. Such variables as distance from rivers and streams, size and order of the
waterways, and elevation in relation to waterways are important measures of the
habitability of a given place. As water is a necessary component of life to people
regardless of creed or culture, it is a good place to begin. Topographical gradient,
as well, is important to long-term settlement. Especially in a
mountainous environment, knowing whether a given place is too steep to settle is
just as important as availability of water. With these variables in mind, the Mink
model, when completed and tested, proved to be more accurate than other
statistical models like it (Mink et al. 2009).
Traditional statistical analysis is used primarily in finding a gain statistic
after a predictive model has been generated. The gain statistic measures the
accuracy and precision of a model’s findings. Having accurate information
without precise areas, or vice versa, makes for a useless model. Accuracy
without precision could simply result in blanketing the entire map with
a high probability label, while precision without accuracy leaves the researcher
scratching their head when no known archaeological sites even touch the areas
indicated by the model. Finding a gain statistic, while not a definitive indication
of the model’s success or failure, is a good beginning measure when looking for
model efficacy (Kvamme 1988, Whitley 2003).
Common uses of GIS in Predictive Modeling
GIS is the vehicle with which this research statistically analyzes the
probability of prehistoric site locations in Pisgah National Forest. GIS is often
used to organize complex and vast data. Data are separated into visual,
differentiable layers and then used to find the relationships between data forms.
When mapping individual sites, environmental and artifact data can be inputted
into a new map, overlaid with one another, and explored more effectively than
with pen and paper alone. GIS can also be used to record new sites, aiding in
Cultural Resource Management. With this system, data are easily stored and
queried from a database instead of physically separated from other data in
text-based databases. Data can be viewed in a much larger context than ever before
(Renfrew and Bahn 2000:260; 551).
Using GIS in an archaeological context, however, prompts researchers to
ask many questions. Is GIS an environmentally deterministic survey tool or not?
Is such an approach helpful to archaeologists or just misleading? Do computers
make mistakes? What if the archaeologist keying in the data is unsure of his or
her answer? Is there room for error? These questions must be explored before the
research can go forward. Some researchers have no problem using environmental
data to predict archaeology sites, following the processual argument that when
studied empirically, people of the past tend to settle in distinct patterns that are
dictated by both the physical and social environment. Because the social
environment is much more challenging to engineer on a physical map, especially
when dealing with prehistoric hunter-gatherers with more ephemeral social
environments that are poorly documented in the archaeological record, it is more
efficient to rely mostly on the physical environment when making a predictive
model (Brandt et al. 1992:269-270).
Many authors have made compelling arguments about the use of GIS as a
tool, a crutch, or a shiny new toy amongst professional circles. Some articles,
while providing specific examples of other people’s modeling techniques, focus
on broader implications in archaeology and whether or not any such models
are truly as helpful as we think (Westcott and Brandon 2000, Harris and Lock
1995). For example, some are adamant that most of the hopes archaeologists
place in GIS are unfounded, and that archaeologists who eagerly experiment with GIS
are “uncritical” and trying to “understand the ‘past’ using contemporary data”
(Ebert 2000:137-140). Such critics of the potential of computer technology state
that more research is necessary before predictive modeling can accurately take
the motivations of prehistoric peoples into account. The argument is based on the
idea that sites are more complicated and varied than such models can take into
account.
Predictive Modeling in North Carolina and Surrounding States
The first statewide predictive model was produced by the
Minnesota Department of Transportation in 1995. It was also the first model to
explicitly work with survey bias and identify site potential at both the surface
level and at depth (Mn/Model). In Delaware, the State Department of
Transportation uses GIS and weighted model parameters to predict archaeological
sites. This method takes differing environmental data, such as slope, water
proximity, and soil permeability, and assigns values to each part of the variable
depending on its importance to settlement patterns in the area (Revised
Archaeological Predictive Model 2005). In Pitt County, a master’s thesis on
predictive modeling tested the applicability, in the Piedmont region, of one model
type based in the North Carolina coastal region. Time-specific
models were also compared to the more general Coastal Plain Model,
which indicated that the more specific the variables and parameters of the model,
the more accurate that model should be (Schleier 2010).
In North Carolina, there is an ongoing project dedicated to predictive
modeling created by the North Carolina Department of Transportation. Though
its website appears to have been abandoned in the beginning of the last decade, it
provides a good base with which I can study predictive modeling that actually
uses the same geography as this project. This model depends on using
environmental data (such as soil types, elevation, vegetation, water location, and
slope) and using that to find patterns among the sites already inputted into the
system. Since its creation, numerous research papers have built on and
expanded the NCDOT predictive model (North Carolina Department of
Transportation 2001).
In the Great Smoky Mountains National Park, there have been attempts
at understanding settlement patterns based on ceramic and lithic analysis,
focusing on the procurement of lithic raw material instead of the topography
of the park (Bass 1977). In South Carolina, however, several different methods
were documented in 2006, explaining the pros and cons of each model in the
South Carolina Piedmont (Benson 2006:223-232).
Predictive Modeling within the National Forest Service
On the next page is a map of my testing area (Figure 1), which takes up
much of the western portion of North Carolina. Pisgah National Forest began in
1911 under the Weeks Act and now encompasses over half a million acres of
hardwood forest in North Carolina. This forest covers twelve counties and is
famous for its river system and wooded mountain slopes (United States
Department of Agriculture Forest Service 2011).
Figure 1-Map of Pisgah National Forest
At the moment, the National Forest Service does not have a systematized
method for identifying prospective site locations using GIS or statistical analysis.
Appendix 4 of the National Forest Service of North Carolina Cultural Resources
Survey Strategies/Methods is used to highlight potential sites of significance
within the National Forest. Each National Forest in North Carolina has differing
parameters for survey depending on the forest’s geological and geographical
location, as land in the Piedmont is very different from that in the mountains of
Pisgah or the coastal plains of Croatan National Forest (2008). If a site is found,
it is assigned a site number and noted on state site forms, topographic maps, and
a GIS data layer of cultural sites. This data layer was used to test my models'
efficacy as a predictive model (2008).
The two models were created using a three-step method, wherein the
variables were first created, then the fuzzy analysis was conducted, and the
results were finally tested using Kvamme’s gain statistic (Kvamme 1983).
Building the Model
To begin with, I looked to Mink et al. (2009) and the National Forest
Service to create the variables for these two models. These factors were those
considered most important to archaeologists when surveying in Kentucky (Table
1) and the North Carolina National Forests (Table 2), respectively. Here are the
two sets of variables I used for this experiment. The Mink model is based solely
on that literature’s variables and parameters, which focus mainly on water
availability and elevation. Those same basic variables were used in the Forest
Service model, but in simplified form, taking into account only proximity to water in
feet and slope in degrees.
Table 1-Mink 2009 Variables
Degrees Slope
Minutes Walk to
Minutes Walk to
Distance Above
Stream Rank in Strahler
Water in Feet
1=(VL, <=5)
VL, <=10
2=(L, 5-10)
3=(H, 11-20)
4=(VH, =>20)
Table 2-Forest Service 2012 Variables
Forest Service 2012 Variables
Degrees Slope
Distance to Water in Feet
For these models I gathered data from the National Forest Service by
requesting data via email (see Appendix A) and face to face conversations. There
are multiple sources of this sort of data, but I chose to contact the National
Forest Service because the service has the most extensive GIS data sets available
for this part of North Carolina. In fact, the Western North Carolina Archaeology
Office does not actually have these data inputted into a GIS database.
The base map, which is a vector map of all National Forest boundaries
clipped to the shape of my survey area, was overlaid with North Carolina
Department of Transportation LiDAR data. LiDAR data are some of the most
accurate elevation data available. These data were separated into county parcels,
which I combined into a mosaic to create a new raster of the twelve counties
that encompass Pisgah National Forest (Transylvania, McDowell, Haywood,
Madison, Caldwell, Burke, Yancey, Buncombe, Avery, Mitchell, Henderson, and
Watauga counties). With this new raster at hand, I clipped these counties to the
shape of my survey area with the vector map of National Forest. With this base in
place, I created a layer file for each parameter indicated by Mink et al (2009).
Here I will enumerate the tools I used to create each layer.
I began by creating stream order (using the Strahler order system, a
method used by geologists and researchers to rank streams according to how
many tributaries flow into a given stretch of water) from the elevation data.
I first took the new raster layer I had created and ran Fill on it, which corrects the
elevation data for any sinks that do not exist in the real world. The next step in
creating stream order is Flow Direction, which ascertains the direction in which each
cell of the filled raster drains. This is done by calculating the slope from a given cell
to each adjacent cell to determine in which direction water would flow on the surface.
This is followed by Flow Accumulation, a tool that uses the flow direction output to
identify where the highest and lowest levels of accumulated flow lie.
Where the most accumulation lies is where the stream system is located in the
region. A conditional was placed at an appropriate level (in this case 36,000,000
pixels) to separate what is identified as ‘water’ from what is ‘land’ in this model. I
used this to identify the stream order. The stream order was then reclassified to
fit within the Mink paper’s parameters.
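The ranking rule behind Strahler order can be sketched outside of ArcMap. The example below is only an illustration under stated assumptions: the network, its segment names, and the function `strahler_order` are hypothetical and are not drawn from the Pisgah data or from the Stream Order tool itself. Headwater segments receive rank 1, and a rank increases by one only where two tributaries of equal rank meet.

```python
def strahler_order(upstream, segment):
    """Return the Strahler order of `segment`.

    `upstream` maps a segment name to the list of segments flowing into it.
    Segments that do not appear as keys are treated as headwaters.
    """
    tributaries = upstream.get(segment, [])
    if not tributaries:
        return 1  # headwater stream with no tributaries
    orders = sorted(
        (strahler_order(upstream, t) for t in tributaries), reverse=True
    )
    # The order rises by one only when the two highest-ranked tributaries tie.
    if len(orders) >= 2 and orders[0] == orders[1]:
        return orders[0] + 1
    return orders[0]


# Hypothetical network: a and b join to form c; c and d form e;
# h and i form f; e and f form g.
network = {"c": ["a", "b"], "e": ["c", "d"], "f": ["h", "i"], "g": ["e", "f"]}
orders = {name: strahler_order(network, name) for name in ["a", "c", "e", "f", "g"]}
```

In this sketch segment g is the only stream of rank three, so it is the only one that would count as a confluence under a "rank three or above" rule.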
The next parameter I created was the slope of the Pisgah region.
The final elevation raster was simply run through the Slope tool and then
reclassified to fit within the parameters of each model.
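As a rough illustration of what reclassifying a slope layer involves, the sketch below converts a rise and run into degrees and then bins the result into numbered classes. It is not the ArcMap Slope or Reclassify tool; the function names and the class limits of 5, 10, and 20 degrees are assumptions chosen for the example, not the exact limits of either model.

```python
import math
from bisect import bisect_left


def slope_degrees(rise, run):
    """Slope angle in degrees for a given vertical rise over a horizontal run."""
    return math.degrees(math.atan2(rise, run))


def reclassify(value, upper_limits):
    """Return a 1-based class number; each upper limit is inclusive."""
    return bisect_left(upper_limits, value) + 1


# Assumed class limits in degrees (hypothetical, for illustration only).
limits = [5, 10, 20]
steep = slope_degrees(rise=10.0, run=10.0)   # a 45 degree slope
steep_class = reclassify(steep, limits)      # falls in the steepest class
gentle_class = reclassify(3.0, limits)       # falls in the gentlest class
```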
Minutes’ walk to the nearest water sources and to confluences of waterways
were also analyzed. The walking speed and horizontal distance from water were
measured using Tobler’s method for estimating walking velocity based on terrain
(Mink et al. 2009). This method applies a formula meant to measure
hiking velocity on hilly terrain to the topography on the map
(Figure 2). The virtue of the equation is that it neatly takes into account the slope
of the terrain (Tobler 1993).
Figure 2-Hiking on Hilly Terrain Equation
(Waldo Tobler, 1993, Three Presentations on Geographical Analysis and Modeling.
University of California-Santa Barbara: National Center for Geographic Information
and Analysis Technical Report.)
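Tobler’s hiking function is commonly stated as W = 6 exp(-3.5 |S + 0.05|), where W is walking speed in kilometers per hour and S is the slope expressed as rise over run. The sketch below assumes that this is the form shown in Figure 2. The function names are hypothetical, and the conversion to minutes mirrors the unit change described in the text rather than the Path Distance tool itself.

```python
import math


def tobler_speed_kmh(slope):
    """Walking speed in km/h; `slope` is rise over run (negative when downhill)."""
    return 6.0 * math.exp(-3.5 * abs(slope + 0.05))


def walk_minutes(distance_km, slope):
    """Minutes needed to walk `distance_km` at a constant slope."""
    return distance_km / tobler_speed_kmh(slope) * 60.0


flat_speed = tobler_speed_kmh(0.0)     # roughly 5 km/h on level ground
peak_speed = tobler_speed_kmh(-0.05)   # the maximum, on a gentle downhill
uphill_minutes = walk_minutes(1.0, 0.2)
```

Because the exponent depends on the absolute value of S + 0.05, walking speed falls off on both steep climbs and steep descents, which is why the function suits mountainous terrain.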
Walking distance to water in minutes and walking distance to water
confluences (stream systems ranked as 3 or above) required more work than a
simple slope layer. The first was created by feeding the Fill and stream data into
the Path Distance tool. Tobler’s formula was then applied as the vertical factor of
this new layer. The result was then recalculated (as
Tobler’s formula works in kilometers per hour and I required kilometers per
minute) and reclassified to conform to the Mink parameters for walking distance.
The Forest Service model also required a distance to water variable, but it was
considerably simpler to produce. The Conditional layer that showed all
waterways in Pisgah National Forest was converted into a vector layer, and then
the river system was buffered to within 150 feet of water.
The walking distance to confluences of water was more complex to begin
with, but was otherwise identical to walking distance to water. The Stream Order
layer was reclassified to identify only those streams with a rank of three or above,
then converted from a raster to a vector layer, and finally run through the Path
Distance tool in the same way as walking distance to water. The last parameter
from the Mink model was the layer that identified the elevation difference with
regard to water in feet. This was also calculated using Path Distance. The stream
data were again used as the basis for this tool, using the fill as a surface raster and
the slope layer as the vertical factor, and the result was then reclassified.
Statistical Analysis
Mink et al. use FuzzyKnowledgeBuilder to analyze their data, but that software
is very expensive, so I looked for more cost-effective knowledge building software.
Though I did find one free piece of software and another that was within my
budget, the former failed due to technical difficulties and the latter’s
manufacturers never responded to my requests for more information. Thus I
used the Fuzzy Class tools in ArcMap instead. Each variable was run through the
Fuzzy Membership tool separately, and the results were then joined together with
Fuzzy Overlay. This single raster was separated into five and two classes for the
Mink model and Forest Service model, respectively.
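The two-stage logic of fuzzy membership followed by fuzzy overlay can be sketched in a few lines. This is a simplified illustration and not the ArcMap implementation: the linear membership shape, the "and" and "or" combinations, the function names, and the cell values are all assumptions, since the membership and overlay types chosen for each layer are not recorded here.

```python
def linear_membership(value, low, high):
    """Membership between 0 and 1, rising linearly from `low` to `high`.

    Passing low > high gives a membership that falls as the value rises.
    """
    if low == high:
        raise ValueError("low and high must differ")
    degree = (value - low) / (high - low)
    return max(0.0, min(1.0, degree))


def fuzzy_and(*memberships):
    """Overlay that keeps the weakest membership among the layers."""
    return min(memberships)


def fuzzy_or(*memberships):
    """Overlay that keeps the strongest membership among the layers."""
    return max(memberships)


# Hypothetical cell: 10 degree slope, 50 feet from water.
slope_fit = linear_membership(10.0, low=20.0, high=0.0)    # flatter is better
water_fit = linear_membership(50.0, low=200.0, high=0.0)   # closer is better
cell_and = fuzzy_and(slope_fit, water_fit)
cell_or = fuzzy_or(slope_fit, water_fit)
```

An "and" overlay is the more conservative choice, because a cell scores well only if every variable favors settlement.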
Figure 3-CRM Model Example
These maps are presented above (Figure 3), both in close magnification to
show how they would be used in CRM work. As these models would be used to
gauge whether parcels of land are locations of prospective archaeological sites or
not, I have created a random thirty-acre land parcel, which you can see
highlighted in each map. The Mink model suggests that there is only a small
chance of a site being located around the river system, but the Forest Service
Model suggests that the area is of high potential.
After completing the analysis, I then inputted the NFS data into a GIS map
and then intersected each of the data samples with the Fuzzy Overlay. The NFS
data are separated into three different categories: polygons, points, and lines.
They represent all of the archaeological sites recorded in all the North Carolina
National Forests, so I clipped them to Pisgah National Forest. I also chose to
count only the polygon and point layers. The intersection required converting
the raster data (the Fuzzy Overlay layers) into vector data and intersecting each
cultural site layer with each Fuzzy Overlay layer. With these new layers available,
I queried their attributes to calculate the area and the number of sites located
within each layer.
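The tabulation that follows from this intersection can be sketched as below. The zone names, areas, and site assignments are invented for the example and are not the Pisgah results; `zone_profile` is a hypothetical helper that turns zone areas and a list of site locations into the percentages reported in the zone profile tables.

```python
from collections import Counter


def zone_profile(zone_areas, site_zones):
    """Percent of total area and percent of sites for each probability zone.

    `zone_areas` maps a zone name to its area; `site_zones` lists the zone
    in which each known site falls.
    """
    total_area = sum(zone_areas.values())
    counts = Counter(site_zones)
    total_sites = len(site_zones)
    profile = {}
    for zone, area in zone_areas.items():
        profile[zone] = {
            "pct_area": 100.0 * area / total_area,
            "sites": counts[zone],
            "pct_sites": 100.0 * counts[zone] / total_sites,
        }
    return profile


# Invented two-zone example: areas in square feet, four known sites.
areas = {"Low": 750.0, "High": 250.0}
sites = ["High", "High", "High", "Low"]
profile = zone_profile(areas, sites)
```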
Table 3-Mink Profiles and Site Distribution
Very Low
Very High
Very Low
Very High
Zone Profiles-Five Classes
Area (square feet)
Total Percent of Area
Site Distribution by Sites
Percent of Total Number of Sites
Table 4-Forest Service Profiles and Site Distribution
Zone Profiles-2 Classes: Area (Square Feet), Total Percent of Area
Site Distribution by Sites: Percent of Total Number of Sites
I then calculated the gain statistic of each model using this information.
The gain statistic measures the accuracy and precision of a model’s findings. It
is specifically one minus the percent of area covered by each part of the map divided
by the percent of sites (in polygons and points) in each part of the map (Kvamme
Table 5-Mink Gain Statistics
Gain Statistic-Mink Model (Very Low to Very High)
Table 6-Forest Service Gain Statistics
Gain Statistic-Forest Service Model
The gain statistic, again, is an indicator of a model’s effectiveness. The
statistic ranges from 1 down to large negative values, with a high positive
result indicating that the area is restrictive while still ‘catching’ plenty of
known site locations (Kvamme 1983).
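For readers who want to check the arithmetic, the calculation can be written in a few lines. The sketch below uses the usual form of Kvamme's gain statistic, one minus the percent of area divided by the percent of sites. The function name and the example percentages are hypothetical and are not values from the tables in this thesis.

```python
def gain_statistic(percent_area, percent_sites):
    """Kvamme's gain: 1 - (percent of area / percent of sites).

    Near 1: the zone is small but holds many of the known sites.
    Near 0: the zone does no better than chance.
    Negative: the zone holds fewer sites than its area alone would suggest.
    """
    if percent_sites <= 0:
        raise ValueError("gain is undefined when a zone contains no sites")
    return 1.0 - (percent_area / percent_sites)


# Hypothetical zones, for illustration only.
print(gain_statistic(percent_area=20.0, percent_sites=80.0))  # 0.75
print(gain_statistic(percent_area=50.0, percent_sites=50.0))  # 0.0
print(gain_statistic(percent_area=60.0, percent_sites=20.0))  # -2.0
```

The third call shows why the statistic has no lower bound: a large zone that captures few sites drives the ratio, and therefore the negative value, as high as the data allow.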
These map layers use the same map projection, meaning that the inaccuracies
inherent in any map projection are consistent throughout the model. This model is
derived from the work of Mink et al. (2009), who rely on professional
consultations with archaeologists and on archaeological literature to create and
measure their five variables. I began the model by identifying the variables (i.e.
slope, proximity to water, proximity to water confluences, stream order, and
elevation in relation to water) based on the available literature and Mink et al.
(2009). I then took their ranges and put them in classes (between two and four
for each variable), which were then used to establish the probability of a
prehistoric archaeological site in a given area.
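The step of sorting a continuous variable into a few ordered classes can be sketched as follows. The break points and class scores here are hypothetical placeholders, not the ranges used by Mink et al. (2009) or in this thesis; the sketch only shows the mechanics of looking up a class for a measured value.

```python
from bisect import bisect_right

# Hypothetical break points (percent slope) separating three classes, and a
# hypothetical suitability score for each class. A value equal to a break
# point falls into the higher (steeper) class.
SLOPE_BREAKS = [5.0, 15.0]
SLOPE_SCORES = [1.0, 0.5, 0.1]  # gentle, moderate, steep


def slope_score(slope_percent):
    """Return the score of the class that a slope value falls into."""
    return SLOPE_SCORES[bisect_right(SLOPE_BREAKS, slope_percent)]


for slope in (2.0, 5.0, 12.0, 40.0):
    print(slope, slope_score(slope))
```

The same lookup would be repeated with its own break points for each of the other variables before the layers are combined.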
There is always the possibility that the data are subject to a non-sampling
error. The points I am using to check the veracity of the model have been
inputted by human hands and are thus prone to human error. Another limitation
to the research has been touched upon in the verification section. Though I am
not able to produce the model myself and have someone else apply the model to
the existing data, I believe that the solution is an adequate compromise. The data
did not cost me, the researcher, anything financially, but it took considerable
time to organize the site information with the Head Archaeologist at the National
Forest Service in North Carolina. Also, because I only categorized the data as
‘prehistoric,’ the sites I study do not focus on a specific time period, but on
indigenous sites predating European contact, which encompasses several
thousand years of habitation. However, this model is not meant to find specific
time periods or map habitation patterns at a given time but to predict the
probability that a given location will be an archaeology site in the present.
I am also an admitted newcomer to GIS who is mostly self-taught for the
purposes of this project.
The accepted gain statistic of an effective model is 0.75. The Mink model seems
to have hit a home run in the Very High category with a statistic near 1, while
the Forest Service model sits far behind at 0.360. In this instance, however, I
believe that there is a fault in the way that the Mink statistic was calculated.
The site polygons counted in the Medium and Very High categories of the Mink map
were the same, even though the categories cover entirely different areas. The
Very High category also made up less than a tenth of a percent of the total area
covered. Had just one or two sites been found in that category, the gain
statistic would have been similar to the one I obtained. Thus, either the model
is almost impossibly accurate within that one category, or there is an error on
the part of the researcher. In light of this reasoning, that model’s results
cannot be considered significant. The other model, however, is not flawed in the
same way.
The National Forest model provides much less of a range of possibility,
since its parameters are only two yes-or-no variables, unlike the Mink model,
which focuses on a range of possible outcomes. However, the model as a whole
caught more sites in total than the Mink model. This means that this version
covered a larger area than the other model but did not have a result extremely
weighted toward the high probability areas. The area of the High probability
parameter was also much larger than the Very High Mink parameter (over 19 billion
square feet compared to nearly 800,000 square feet).
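The concern about very small zones, raised for the Very High Mink category, can be made concrete with a small calculation. The sketch below assumes the gain statistic takes the form one minus percent of area over percent of sites, and it uses hypothetical figures (a zone covering 0.05 percent of the study area and 500 known sites in total) rather than the thesis data. Under these assumed figures, one or two sites in the zone already yield a gain of roughly 0.75 to 0.88.

```python
def gain_statistic(percent_area, percent_sites):
    """Gain = 1 - (percent of area / percent of sites)."""
    return 1.0 - (percent_area / percent_sites)


# Hypothetical figures, chosen only to illustrate the effect.
ZONE_PERCENT_AREA = 0.05  # a zone covering well under a tenth of a percent
TOTAL_SITES = 500         # assumed number of known sites in the study area

gains = {}
for sites_in_zone in (1, 2, 10):
    percent_sites = 100.0 * sites_in_zone / TOTAL_SITES
    gains[sites_in_zone] = gain_statistic(ZONE_PERCENT_AREA, percent_sites)
    print(f"{sites_in_zone} site(s) in zone: gain = {gains[sites_in_zone]:.3f}")
```

Because the area term is so small, the statistic for such a zone says little about how well the model performs across the rest of the forest.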
It seems as though the Mink model is much more restrictive or ‘stingy’ in
its allocation of ‘Very High’ probability sites, while still catching four site points
and over 100 site polygons. The Mink model’s outcomes are also interesting
because they are nearly completely restricted to the very edges of river systems.
This is most likely because the topography is so steep in the mountains and this
model was originally designed to the specifications of a more gently rolling
Kentucky landscape.
Even though neither model proved effective in determining the likelihood
of archaeology sites in Pisgah National Forest, there are many possibilities for
further research. The Mink model’s basic variables (proximity to water, slope,
stream order, etc.) can be recalibrated for a different topography altogether.
The model could possibly be applied more broadly, geographically, if the
parameters are edited to match the settlement patterns already known in the
area. In the future, that model could also be tested further in other locations
and compared with other geophysical methods (e.g. magnetometry, LiDAR data, and
other remote sensing data) to evaluate and expand its usefulness.
Even though the Forest Service model did not succeed either, its use of
archaeologically grounded geographic parameters unique to the locale is sound. In
fact, this model found over eighty more site points than the Mink model, which
lends credence to the tailoring of variables to match the environment of the
model. These tenets could then be used in other places to test their
effectiveness. An effective site model can aid in the identification and
preservation of previously undiscovered archaeology sites. Unlike in the
Southwestern United States, where most sites are much more clearly visible from
aerial photography and simple ground surveys (Brandt et al. 1992), such sites
can be very subtle in the Southeastern United States. A model that identifies
contextually rich sites without over-representing them could help surveyors
make more informed decisions when exploring new potential sites.
In general, however, I propose that we spend more time training
archaeologists in GIS and encourage people to experiment with GIS, model
building, and fuzzy logic. The more people we put to these problems, and the
more experience we gain in trying to fix them, the likelier it is that we can create a
viable, general method (or even a system of detailed variable tailoring based on
geography that is generalizable) for predicting sites and protecting the sites that
we have yet to discover.
Appendix A
I am a senior at Warren Wilson College working with Dr. David Moore, and I am
writing my senior thesis on predictive modeling in archaeology. This study will
mirror the work of Mink, Ripy, Bailey, and Grossardt’s 2009 paper, Predictive
Archaeological Modeling using GIS-Based Fuzzy Set Estimation. This paper,
which I heard at the Southeastern Archaeology Conference in 2010, experiments
with non-linear statistical modeling techniques using National Forest Service
data. I hope to see if their model is as effective in Pisgah National Forest as it was
in Woodford County, Kentucky. Would it be possible for me to have access to the
prehistoric archaeology sites location data in the Pisgah National Forest? The
data I need are the UTM coordinates for each site.
Sincerely, Maureen Vaughan
Academic Advisor: Dr. David Moore
WWC CPO 6076
PO Box 9000
Asheville, NC 28815-9000
Phone: 828.771.2013
Adèr, H.J.
2008 Chapter 12: Modelling. In H.J. Adèr & G.J. Mellenbergh (Eds.) Advising on
Research Methods: A consultant's companion, pp. 271-304. Huizen, The
Netherlands: Johannes van Kessel Publishing.
Benz, Ursula C., Peter Hofmann, Gregor Willhauck, Iris Lingenfelder, and
Markus Heynen
2004 Multi-resolution, object-oriented fuzzy analysis of remote sensing data for
GIS-ready information. ISPRS Journal of Photogrammetry and Remote
Sensing 58 (3-4):239-258.
Brandt, Roel, Bert J. Groenewoudt, and Kenneth L. Kvamme
1992 An Experiment in Archaeological Site Location: Modeling in the
Netherlands using GIS Techniques. World Archaeology 24 (2): 268-282.
Garner, Roberta
2007 Social Theory: Continuity and confrontation: a reader. Ontario: Broadview
Harris, T. M. and Lock, G. R.
1995 Toward an evaluation of GIS in European archaeology. The past, present
and future of the theory and applications. In G. Lock and Z. Stancic (eds.),
Archaeological and Geographical Information Systems: a European
Perspective, 349-365. London: Taylor & Francis.
Iliadis, L.S.
2005 A decision support system applying an integrated fuzzy model for long-term
forest fire risk estimation. Environmental Modelling and Software, 20
(5), 613-621.
Mink, Philip B., John Ripy, Keiron Bailey, and Ted Grossardt
2009 Predictive Archaeological Modeling using GIS-Based Fuzzy Set Estimation.
Paper presented at the Transportation Research Board Annual Meeting,
Washington, DC, January 11-15.
Minnesota Department of Transportation, accessed November 21, 2011.
Niccolucci, Franco, D’Andrea, Andrea, and Crescioli, Marco
2001 Archaeological applications of fuzzy databases. Computing Archaeology for
Understanding the Past. Oxford: Archaeopress, 107-116.
North Carolina Department of Transportation
North Carolina GIS Archaeological Predictive Model Project, accessed November 21, 2011.
Perry, George L. W., Sparrow, Ashley D., and Owens, Ian F.
1999 A GIS-Supported Model for the Simulation of the Spatial Structure of
Wildland Fire, Cass Basin, New Zealand. Journal of Applied Ecology.
Renfrew, Colin and Bahn, Paul
2000 Archaeology: Theories, Methods, and Practice. London: Thames and
Revised Archaeological Predictive Model
2005 Delaware Department of Transportation.
l/301_pred_model_arch_pred_model.pdf/, accessed November 21, 2011.
Schleier, Jonathan
COUNTY, NORTH CAROLINA. The ScholarShip:East Carolina
University’s Institutional Repository, accessed November
21, 2011.
Tobler, Waldo
1993 Three Presentations of Geographical Analysis and Modeling. University of
California-Santa Barbara: National Center for Geographic Information
and Analysis Technical Report.
United States Department of Agriculture Forest Service
2011 National Forests in North Carolina, accessed
November 21, 2011.
Westcott, Konnie L and Brandon, R. Joe
2000 Practical Applications of GIS for Archaeologists. London: Taylor and
Francis Inc.
Whitley, Thomas G.
2003 Causality and Cross-Purposes in Archaeological Predictive Modeling. Paper
prepared for Computer Applications in Archaeology Conference Vienna,
Austria, April 8-12.
Wiseman, James and El-Baz, Farouk
2007 Remote sensing in archaeology. Boston: Boston University Press.
Xia, Xintao, Wang, Zhongyu, and Gao, Yongsheng
2000 Estimation of non-statistical uncertainty using fuzzy-set theory.
Measurement Science and Technology 11(4):430-435.