The Prediction of Place: Calculating the Presence of Archaeological Sites in Pisgah National Forest

A Thesis Submitted to the Department of Anthropology/Sociology of Warren Wilson College In Partial Fulfillment of the Requirements for the Degree of Bachelor of Arts In the Department of Anthropology/Sociology

May 8, 2012
By Maureen Vaughan
Supervisors: Christey Carwile and David Moore

Table of Contents
Abstract
Acknowledgments
Introduction
Purpose
Background
Mathematics Behind Predictive Modeling
Common uses of GIS in Predictive Modeling
Predictive Modeling in North Carolina and Surrounding States
Predictive Modeling within the National Forest Service
Models
Predicting the Model
Statistical Analysis
Evaluation
Limitations
Results
Significance
Appendix
Bibliography

Abstract
Utilizing Geographic Information Systems (GIS) in archaeology presents many opportunities for Cultural Resource Management (CRM) and academic research. Because locating and documenting archaeological sites can be challenging, GIS databases have previously been used to systematically predict these sites based on a variety of factors. In the past, the National Forest Service (NFS) has relied on the instincts and advice of archaeologists to search out sites of potential importance. There is no systematized, statistical approach to predictive modeling in use at the NFS. ‘Fuzzy set estimation’ is a Bayesian method of statistical analysis that was used to create two predictive models of Pisgah National Forest through GIS. These models are both analyzed with fuzzy set estimation, but their variables are based on the experience and advice of different archaeologists. One model is based on the methods presented by Mink et al. (2009), while the other revolves around the current survey rules created by the NFS for Pisgah. This thesis then compares these models’ results with surveys conducted by the NFS.
Acknowledgments
I would like to thank Philip B. Mink II for presenting at the 2010 Southeastern Archaeology Conference. I also thank Rob Snedeker and Holly Hixon, who have provided me with invaluable data from the National Forest Service of North Carolina. I would also like to thank the many professors, friends, and family who have listened to me worry and provided many pearls of wisdom that were immeasurably helpful in making this thesis.

Introduction
We are weaving a story. With every point placed on a map, every potsherd pulled from the ground, every successive layer of soil, we weave the largest, most fascinating tapestry of all time. It is our collective human past, and we, as archaeologists, are constantly fighting for the right to protect these remnants of our material past. As budgets tighten around the necks of the government officials in charge of allocating funds and resources, they are faced with choosing which places are extensively explored and which are quickly surveyed before being completely overturned by a construction crew and lost. What are these men and women to do, but use every item at their disposal to streamline the process and shave off every last extraneous second? That is why we require models that predict archaeological sites: we are at war with the resources we possess. For over thirty years, Geographic Information Systems (GIS) have proved to be an incredibly useful tool for archaeological inquiry. GIS brings mapping to a new level, providing professionals with the tools to learn more about the geography and human activity in an area before even leaving the lab. GIS databases are designed specifically to aid in the collection, safeguarding, and analysis of whatever spatial data the program is given (Renfrew and Bahn 2000:87).
With this tool at our disposal, archaeologists can derive statistical analyses from inputted data, from analyzing artifact distribution to calculating the prospective cost of an archaeological survey. Those involved with Cultural Resource Management (CRM) often take advantage of this and readily fund predictive models for archaeological sites. These maps allow them to find the areas of greatest probable importance when presented with a proposed road or building complex and to focus their attention on the places most likely to yield artifacts (Whitley 2003). Because GIS is used to organize complex and vast arrays of data, information is separated into visual, differentiable layers and then combined and edited to find the relationships between them. When mapping individual sites, environmental and artifact data can be inputted into a new map, overlaid with other layers, and explored more effectively than with pen and paper alone. GIS can also be used to record new sites and combine geographic data with site information. It allows us to document and permanently store sites at the push of a button. With this system, data are easily stored in and queried from a database instead of being physically separated from other data in text-based databases. Data can be viewed in a much larger context than ever before. Predictive models can be used to estimate the likelihood that valuable material culture lies hidden where the human eye cannot discern the difference between a field and an archaic village. GIS can also model site locations, trade routes, and migration patterns: information is inputted into a database, then analyzed mathematically to find relationships that could tell us more about the period and place we are exploring (Renfrew and Bahn 2000:260; 551).
These models, drawing on various mathematical and theoretical backgrounds, can be used to estimate the likelihood that valuable material culture lies hidden where the human eye cannot discern the difference between a field and an archaic village. Fuzzy set estimation, or fuzzy logic, is a type of model more commonly used for biological systems like coral reef growth, projecting their future development from non-linear relationships between variables. Non-linear relations are those that do not follow an easily discerned path of logic from the initial state to the conclusion, while linear models require continuous dependent variables (Adèr 2008:271-304). Because effectively predicting archaeological sites in a given place can be quite difficult with traditional statistical modeling, non-linear models are a more effective way to work out these progressions. With too many variables and variable interactions to be effectively studied with linear models, changing the way we approach the mathematics of archaeological modeling seems the best way to get results. This research focuses on prehistoric archaeological sites in Pisgah National Forest in western North Carolina, using fuzzy logic to predict new sites for future exploration and protection (Mink et al. 2009). By applying fuzzy logic to topographical models made with GIS, this study examines whether this modeling system and the parameters within it can effectively predict the chances of archaeological sites in a given area using certain environmental factors (like proximity to water, where that water flows, and elevation). This predictive model type has proved more effective than other statistical models in other research, so this project should likewise prove effective. In fact, I am basing one of these models completely on the work of Mink et al. from their 2009 paper, Predictive Archaeological Modeling using GIS Based Fuzzy Set Estimation, which I hope to expand upon.
However, taking into account the differing topography between the original Mink model and Pisgah National Forest, I have also created a set of parameters presently used to predict site locations within the National Forest Service. This was done in an effort to compare not only the efficacy of fuzzy logic with regard to modeling, but also how different parameters can change the results of this model type.

Purpose
Because many survey methods are expensive and time consuming, many researchers have sought a more cost-effective approach to predicting site locations using GIS as a tool. This research examines the previous research of Mink et al. (which uses fuzzy logic) and asks how reliable it is for use in the Appalachians. Their model was adapted and used in this instance to predict the probability of prehistoric archaeological sites in Pisgah National Forest. Another model was also created using the parameters currently used by the National Forest Service. I investigated the veracity of these models with Pisgah National Forest survey data. The methods involved gathering National Forest Service data on prehistoric archaeological sites and comparing the actual sites discovered with the projected probability of sites in the National Forest. The models were created using fuzzy logic, each model was compared with the archaeological sites already documented, and a gain statistic was calculated to test its accuracy. This project compares the productivity of two models, set in Pisgah National Forest, analyzed with fuzzy logic, and utilizing different variables, that seek to answer these questions:
1. How accurate is fuzzy logic in identifying archaeological sites in Pisgah National Forest?
2. If variables are changed to coincide with the experience of local archaeologists, will this make the model more or less efficacious?
Background
Combining archaeology with GIS has proved to be an effective way to categorize and organize data uncovered from excavations, shovel tests, and surveys. GIS can also model site locations, trade patterns, and migration patterns: information is inputted into a database, then analyzed mathematically to find relationships that could tell us more about the period we are exploring (Renfrew and Bahn 2000:260; 551). Those involved with Cultural Resource Management (CRM) often take advantage of this and readily fund predictive models for archaeological sites. These maps allow them to find the areas of greatest probable importance when presented with a proposed road or building complex, and to focus their attention on the places most likely to yield artifacts (Whitley 2003). Surveying is paramount to archaeological inquiry. Reconnaissance surveying is the practice of searching out archaeological sites by looking at the landscape. Such surveys demand a careful search for something as prominent as brick walls jutting out of the earth or as subtle as a scattering of projectile points or pottery pieces on the surface. An area is cordoned off, sampled, and examined with shovels, trowels, and human cleverness for signs of occupation. More advanced technologies are also used today to find and document archaeological sites. Great strides have been made in using these emergent technologies to marry efficiency with thoroughness. Archaeologists use tools like remote sensing, ground penetrating radar, magnetometry, aerial photography, electrical resistivity, and advanced mapping programs to do their jobs better and to learn as much as possible while disturbing as little as possible (Renfrew and Bahn 2000:71-105). Technology is certainly an effective tool in archaeology, but it lacks human reflection and is often considered too environmentally deterministic.
Statistical analysis, though a valid and important method of archaeological exploration, cannot be impartial by the very nature of the information we gather. Work with new technology often becomes mired in innovation rather than in understanding the functionality of the newest “toys.” Without the proper application of mathematical and archaeological theory, these new technologies often prove useless in the pursuit of answers to our archaeological questions. GIS has also been critiqued for its lack of ‘wiggle room,’ so to speak. This computer program cannot ‘see’ the world in the same way that a human archaeologist can and, thus, leaves no room for error or hesitation unless specifically programmed to do so. For this reason, fuzzy logic has proved to be effective in mapping complex data relationships and uncertainty in GIS (Niccolucci et al. 2001).

Mathematics behind Predictive Modeling
The basis of this research relies on statistics and how fuzzy logic can be applied to archaeology. Statistical analysis uses data to answer questions, and it has been used by social scientists since Emile Durkheim used it to find patterns in society (Gardner 2007:25-26; 72-77). The goal of statistics is to answer our questions without becoming too lost in uncertainty (Bolstad 2007). Bayesian models have been successfully used to trace chronologies in archaeology in the past, and I hope to add to this body of work (Buck and Sahu 2000). I will be utilizing Fuzzy Set Estimation, or fuzzy logic, to create each model for this project. Using fuzzy logic in place of traditional statistics, we are better able to marry theory with technology. Fuzzy logic requires that we view everything, from what cereal is in your cabinet to the election of the next president, as a constant array of choices that end up creating the final product.
With such a viewpoint, we are better prepared to take a step back and understand how we organize and affect the data we acquire in archaeological research. We choose to place our data in computers, in databases and Excel documents that will count our potsherds and debitage, and these choices affect how our data are compiled and analyzed. Every choice in the long line of judgment calls made by archaeologists, specialists and laymen alike, influences how that information is going to turn out. While storing data on computers certainly makes exchanging and computing that information very easy, it is not immune to mistakes. All computers are subject to compiling data incorrectly, and no software is ever perfect. This margin of error must be accounted for, or, sometimes, those errors will render your interpretations completely wrong (Niccolucci et al. 2001). Subjectivity in identification is also especially important because there is no perfect way to identify actual ground cover or land use in the real world. Most of these definitions are merely idealized generalizations of what is actually going on at ground level. People make mistakes or act inconsistently, and our mathematical approach should reflect that (Benz et al. 2004). Data recorded in databases often force archaeologists into making concrete value judgments instead of reflecting their hesitation or uncertainty over their decisions. The subjective opinion becomes objective fact (Niccolucci et al. 2001). Fuzzy-set theory can be used to characterize non-statistical uncertainty in mathematical models. It allows uncertainty to be estimated under certain conditions. Computer simulations have been used time and again to test the validity of fuzzy logic (Xia et al. 2000). The uncertainty accounted for with fuzzy logic better reflects real-life situations in urban planning, medical consultation, coral reef growth, and fire risk estimation.
All of these instances require accepting that many things are simply impossible to predict completely and perfectly, just as in archaeology, where a perfect understanding of the past is far beyond our reach (Puente et al. 2007; Boegi et al. 2001; Meesters et al. 2008; Iliadis 2005; Perry et al. 1999). Predicting archaeological sites with GIS has been particularly advantageous for various state Department of Transportation agencies, but it can be a difficult endeavor. Models using a purely statistical approach need too many variables to be useful. This research combines Spatial Analyst and fuzzy logic modeling to combine anecdotal evidence with empirical fact. The anecdotal evidence comes from professional archaeologists who worked with Mink et al. and advised them using their experience with prehistoric archaeology in Kentucky. These conversations focused mainly on measuring river systems. Variables such as distance from rivers and streams, size and order of the waterways, and elevation in relation to waterways are important measures of the inhabitability of a given place. As water is a necessary component of life for people regardless of creed or culture, it is a good place to begin. Topographical gradient, as well, is important to long-term settlement. Especially in a mountainous environment, knowing whether a given place is too steep to settle is just as important as knowing the availability of water. With these variables in mind, the Mink model, when completed and tested, proved to be more accurate than other statistical models like it (Mink et al. 2009). Traditional statistical analysis is used primarily in finding a gain statistic after a predictive model has been generated. The gain statistic measures the accuracy and precision of a model’s findings.
A model that is accurate without being precise, or vice versa, is useless: accuracy without precision could simply result in blanketing the entire map with a high probability label, while precision without accuracy leaves the researcher scratching their head when no known archaeological sites even touch the areas indicated by the model. Finding a gain statistic, while not a definitive indication of the model’s success or failure, is a good beginning measure when looking for model efficacy (Kvamme 1988, Whitley 2003).

Common uses of GIS in Predictive Modeling
GIS is the vehicle with which this research statistically analyzes the probability of prehistoric site locations in Pisgah National Forest. GIS is often used to organize complex and vast data. Data are separated into visual, differentiable layers and then used to find the relationships between data forms. When mapping individual sites, environmental and artifact data can be inputted into a new map, overlaid with one another, and explored more effectively than with pen and paper alone. GIS can also be used to record new sites, aiding in Cultural Resource Management. With this system, data are easily stored in and queried from a database instead of being physically separated from other data in text-based databases. Data can be viewed in a much larger context than ever before (Renfrew and Bahn 2000:260; 551). Using GIS in an archaeological context, however, prompts researchers to ask many questions. Is GIS an environmentally deterministic survey tool or not? Is such an approach helpful to archaeologists or just misleading? Do computers make mistakes? What if the archaeologist keying in the data is unsure of his or her answer? Is there room for error? These questions must be explored before the research can go forward.
Some researchers have no problem using environmental data to predict archaeological sites, following the processual argument that, when studied empirically, people of the past tend to settle in distinct patterns that are dictated by both the physical and social environment. Because the social environment is much more challenging to represent on a physical map, especially when dealing with prehistoric hunter-gatherers whose more ephemeral social environments are poorly documented in the archaeological record, it is more efficient to rely mostly on the physical environment when making a predictive model (Brandt et al. 1992:269-270). Many authors have made compelling arguments about the use of GIS as a tool, a crutch, or a shiny new toy in professional circles. While some provide specific examples of other people’s modeling techniques, other articles focus on broader implications in archaeology and whether or not any similar models are truly as helpful as we think (Westcott and Brandon 2000, Harris and Lock 1995). For example, some are adamant that most of the hopes archaeologists place in GIS are unfounded, and that archaeologists who eagerly experiment with GIS are “uncritical” and trying to “understand the ‘past’ using contemporary data” (Ebert 2000:137-140). Such critics of the potential of computer technology state that more research is necessary before predictive modeling can accurately take the motivations of prehistoric peoples into account. The argument is based on the idea that sites are more complicated and varied than such models can capture.

Predictive Modeling in North Carolina and Surrounding States
The first statewide predictive model was produced by the Minnesota Department of Transportation in 1995. It was also the first model to explicitly work with survey bias and identify site potential both at the surface level and at depth (Mn/Model).
In Delaware, the State Department of Transportation uses GIS and weighted model parameters to predict archaeological sites. This method takes differing environmental data, such as slope, water proximity, and soil permeability, and assigns values to each part of a variable depending on its importance to settlement patterns in the area (Revised Archaeological Predictive Model 2005). In Pitt County, a master's thesis on predictive modeling tested the applicability of one model type, based in the North Carolina coastal region, to the Piedmont region. Time-specific models were also compared to the more general Coastal Plain Model, which suggested that the more specific the variables and parameters of the model, the more accurate that model should be (Schleier 2010). In North Carolina, there is an ongoing project dedicated to predictive modeling created by the North Carolina Department of Transportation. Though its website appears to have been abandoned in the beginning of the last decade, it provides a good base from which I can study predictive modeling that actually uses the same geography as this project. This model depends on using environmental data (such as soil types, elevation, vegetation, water location, and slope) to find patterns among the sites already inputted into the system. Since its creation, numerous research papers have been based on and have expanded the NCDOT predictive model (North Carolina Department of Transportation 2001). In the Great Smoky Mountains National Park, there have been attempts at understanding settlement patterns based on ceramic and lithic analysis, focusing on the procurement of lithic raw material instead of the topography of the park (Bass 1977). In South Carolina, however, several different methods were documented in 2006, explaining the pros and cons of each model in the South Carolina Piedmont (Benson 223-232).
Predictive Modeling within the National Forest Service
Figure 1 is a map of my testing area, which takes up much of the western portion of North Carolina. Pisgah National Forest began in 1911 under the Weeks Act and now encompasses over half a million acres of hardwood forest in North Carolina. This forest covers twelve counties and is famous for its river system and wooded mountain slopes (United States Department of Agriculture Forest Service 2011).

Figure 1-Map of Pisgah National Forest

At the moment, the National Forest Service does not have a systematized method for identifying prospective site locations using GIS or statistical analysis. Appendix 4 of the National Forest Service of North Carolina Cultural Resources Survey Strategies/Methods is used to highlight potential sites of significance within the National Forest. Each National Forest in North Carolina has differing parameters for survey depending on the forest's geological and geographical location, as land in the Piedmont is very different from that in the mountains of Pisgah or the coastal plains of Croatan National Forest (2008). If a site is found, it is assigned a site number and noted on state site forms, topographic maps, and a GIS data layer of cultural sites. This data layer was used to test my models' efficacy as predictive models (2008).

Methods
The two models were created using a three-step method: the variables were first created, then the fuzzy analysis was conducted, and finally the results were tested using Kvamme’s gain statistic (Kvamme 1983).

Building the Model
To begin with, I looked to Mink et al. (2009) and the National Forest Service to create the variables for these two models. These factors were those considered most important to archaeologists when surveying in Kentucky (Table 1) and the North Carolina National Forest (Table 2), respectively. Here are the two sets of variables I used for this experiment.
The Mink model is based solely on that literature’s variables and parameters, which focus mainly on water availability and elevation. The same basic variables were used in the Forest Service model, but in simplified form, only taking into account proximity to water in feet and slope in degrees.

Table 1-Mink 2009 Variables
Degrees Slope    Minutes Walk to Water    Minutes Walk to Confluence    Distance Above Water in Feet    Stream Rank in Strahler Order
1=(VL, <=5)      L<=2                     L<=10                         VL, <=10                        L=1
2=(L, 5-10)      M=2-4                    H=>10                         L=10-25                         M=2-3
3=(H, 11-20)     H=>5                                                   H=25-60                         H=>3
4=(VH, =>20)                                                            VH=>60

Table 2-Forest Service 2012 Variables
Degrees Slope    Distance to Water in Feet
L=>30            H<=150
H<=30            L=>150

For these models I gathered data from the National Forest Service by requesting data via email (see Appendix A) and in face-to-face conversations. There are multiple sources of this sort of data, but I chose to contact the National Forest Service because it has the most extensive GIS data sets available for this part of North Carolina. In fact, the Western North Carolina Archaeology Office does not actually have these data inputted into a GIS database. The base map, which is a vector map of all National Forest boundaries clipped to the shape of my survey area, was overlaid with North Carolina Department of Transportation LiDAR data. LiDAR data are some of the most accurate elevation data available. These data were separated into county parcels, which I combined into a mosaic to create a new raster of the twelve counties that encompass Pisgah National Forest (Transylvania, McDowell, Haywood, Madison, Caldwell, Burke, Yancey, Buncombe, Avery, Mitchell, Henderson, and Watauga counties). With this new raster at hand, I clipped these counties to the shape of my survey area using the vector map of the National Forest. With this base in place, I created a layer file for each parameter indicated by Mink et al. (2009).
Here I will enumerate the tools I used to create each layer. I began by creating stream order from the elevation data, using the Strahler order system, a method used by geologists and researchers to rank streams according to the hierarchy of tributaries that flow into them. I first took the new raster layer I created and ran the Fill tool on it, which corrects the elevation data for any sinks that do not exist in the real world. The next step in creating stream order is Flow Direction, which ascertains the direction in which each cell of the filled surface drains. This is done by calculating the slope to each cell adjacent to a given cell to determine in which direction water would flow on the surface. This is then augmented by Flow Accumulation, a tool that uses the output of the previous tool to identify where the highest and lowest levels of pixel accumulation lie. Where the most accumulation lies is where the stream system is located in the region. A conditional was placed at an appropriate level (in this case 36,000,000 pixels) to separate what is identified as ‘water’ from what is ‘land’ in this model. I used this to identify the stream order. The stream order was then reclassified to fit within the Mink paper’s parameters. The next parameter I attempted was the slope of the Pisgah region. The final elevation layer was simply run through the Slope tool, and then reclassified to fit within the parameters of each model. Minutes’ walk to the nearest water sources and to confluences of waterways were also analyzed. The walking speed and horizontal distance from water were measured using Tobler’s method for identifying walking velocity based on terrain (Mink et al. 2009). This method is derived from a formula meant to measure hiking velocity on hilly terrain and applies it to the topography on the map (Figure 2). The virtue of the equation is that it neatly takes into account the slope of the terrain (Tobler 1993).
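Tobler's hiking function is commonly written as W = 6 exp(-3.5 |S + 0.05|), where W is walking speed in kilometers per hour and S is the slope expressed as rise over run. The following is a minimal Python sketch of that standard form, offered as an illustration only: it is not the Path Distance configuration used in ArcMap for this project, and the function names and the sample distances are hypothetical.

```python
import math


def tobler_speed_kmh(slope):
    """Walking speed in km/h for a slope given as rise over run (dh/dx).

    Standard form of Tobler's hiking function; speed peaks at 6 km/h
    on a gentle downhill slope of -0.05.
    """
    return 6.0 * math.exp(-3.5 * abs(slope + 0.05))


def minutes_to_walk(distance_m, rise_m):
    """Minutes needed to cover a horizontal distance with a given rise.

    Both arguments are in meters (hypothetical inputs for illustration).
    """
    slope = rise_m / distance_m
    # Convert km/h to meters per minute before dividing.
    speed_m_per_min = tobler_speed_kmh(slope) * 1000.0 / 60.0
    return distance_m / speed_m_per_min


if __name__ == "__main__":
    # Flat ground: roughly 5 km/h.
    print(round(tobler_speed_kmh(0.0), 3))
    # 500 m to water with a 50 m climb (slope 0.10).
    print(round(minutes_to_walk(500.0, 50.0), 1))
```

Dividing the speed by 60 is the same hour-to-minute conversion needed before the travel times can be sorted into classes expressed in minutes.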
Figure 2-Hiking on Hilly Terrain Equation (Waldo Tobler, 1993. Three Presentations on Geographical Analysis and Modeling. University of California-Santa Barbara: National Center for Geographic Information and Analysis Technical Report.)

Walking distance to water in hours and walking distance to water confluences (stream systems ranked 3 or above) required more work than a simple slope layer. The first is created by feeding the Fill and stream data into the Path Distance tool. Tobler’s formula is then applied to the vertical factor of this new layer. The result was then recalculated (as Tobler’s formula works in kilometers per hour and I required kilometers per minute) and reclassified to conform to the Mink parameters for walking distance. The Forest Service model also required a distance to water variable, but it was considerably simpler to produce. The Conditional layer that showed all waterways in Pisgah National Forest was converted into a vector layer, and the river system was then buffered to within 150 feet of water. The walking distance to confluences of water was more complex to begin with, but was otherwise identical to walking distance to water. The Stream Order layer was reclassified to identify only those streams with a rank of three or above, then converted from a raster to a vector layer, and finally run through the Path Distance tool in the same way as walking distance to water. The last parameter from the Mink model was the layer that identified the elevation difference with regard to water in feet. This was also identified using Path Distance. The stream data were again used as the basis for this tool, with the fill as the surface raster and the slope layer as the vertical factor, and the result was then reclassified.

Statistical Analysis
Mink et al. use FuzzyKnowledgeBuilder to analyze their data, but that software is very expensive, so I looked for more cost-effective knowledge building software.
Though I did find one free piece of software and another that was within my budget, the former failed due to technical difficulties and the latter’s manufacturers never responded to my requests for more information. Thus I used the Fuzzy Class tools in ArcMap instead. Each variable was run through the Fuzzy Membership tool separately, and the results were then joined together with Fuzzy Overlay. This single raster was separated into five and two classes for the Mink model and Forest Service model, respectively.

Figure 3-CRM Model Example

These maps are presented in Figure 3, both in close magnification to show how they would be used in CRM work. As these models would be used to gauge whether parcels of land are locations of prospective archaeological sites or not, I have randomly created a thirty-acre land parcel, which is highlighted in each map. The Mink model suggests that there is only a small chance of a site being located around the river system, but the Forest Service model suggests that the area is of high potential.

Evaluation
After completing the analysis, I inputted the NFS data into a GIS map and then intersected each of the data samples with the Fuzzy Overlay. The NFS data are separated into three different categories: polygons, points, and polygons. They represent all of the archaeological sites recorded in all the North Carolina National Forests, so I clipped them to Pisgah National Forest. I also chose to count only the polygon and point layers. The intersection required converting the raster data (the Fuzzy Overlay layers) into vector data and intersecting each cultural site layer with each Fuzzy Overlay layer. With these new layers available, I queried their attributes to calculate the area and the number of sites located within each layer.
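The area and site tallies queried from the intersected layers can be turned into zone profiles with simple arithmetic. The following is a minimal Python sketch rather than the ArcMap workflow used here; the function name zone_profile is hypothetical, and the inputs are the Mink model areas and site counts reported in Table 3.

```python
def zone_profile(totals_by_zone):
    """Return each zone's share of the overall total, as a percentage.

    totals_by_zone maps a zone label to an area or a site count.
    """
    total = sum(totals_by_zone.values())
    if total == 0:
        raise ValueError("zone totals sum to zero; no profile can be computed")
    return {zone: 100.0 * value / total for zone, value in totals_by_zone.items()}


# Mink model figures as reported in Table 3.
mink_area_sqft = {
    "Very Low": 417_211_600,
    "Low": 0,
    "Moderate": 228_456_000,
    "High": 180_031_600,
    "Very High": 796_400,
}
mink_site_counts = {
    "Very Low": 5,
    "Low": 0,
    "Moderate": 115,
    "High": 0,
    "Very High": 115,
}

area_pct = zone_profile(mink_area_sqft)
site_pct = zone_profile(mink_site_counts)

if __name__ == "__main__":
    for zone in mink_area_sqft:
        print(f"{zone:10s} area {area_pct[zone]:7.3f}%  sites {site_pct[zone]:7.3f}%")
```

Keeping areas and counts in separate dictionaries keyed by zone makes it easy to confirm that each set of percentages sums to 100 before moving on to the gain statistic.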

Table 3-Mink Profiles and Site Distribution

Zone Profiles-Five Classes
Zone         Total Percent of Area    Area (square feet)
Very Low     50.479                   417,211,600
Low          0                        0
Moderate     27.641                   228,456,000
High         21.782                   180,031,600
Very High    0.096                    796,400

Site Distribution by Sites
Zone         Count    Percent of Total Number of Sites
Very Low     5        2.128
Low          0        0
Moderate     115      48.936
High         0        0
Very High    115      48.936

Table 4-Forest Service Profiles and Site Distribution

Zone Profiles-2 Classes
Zone    Area (square feet)    Total Percent of Area
Low     33,117,981,575        36.07
High    19,141,131,330        63.93

Site Distribution by Sites
Zone    Count    Percent of Total Number of Sites
Low     85       31.25
High    187      68.75

I then calculated the gain statistic of each model using this information. The gain statistic measures the accuracy and precision of a model's findings. Specifically, it is one minus the ratio of the percent of area covered by each part of the map to the percent of sites (in polygons and points) in that part of the map (Kvamme 1983).

Table 5-Mink Gain Statistics

Gain Statistic-Mink Model
Value        Statistic
Very Low     -22.235
Low          N/A
Medium       0.435
High         N/A
Very High    0.998

Table 6-Forest Service Gain Statistics

Gain Statistic-Forest Service Model
Value    Statistic
Low      -0.482
High     0.36

The gain statistic, again, is an indicator of a model's effectiveness. The statistic has a maximum of 1 and no lower bound, with a positive result indicating that the area is restrictive while still 'catching' plenty of known site locations (Kvamme 1983).

Limitations

These map layers use the same map projection, meaning the inaccuracies inherent in any map projection are consistent throughout the model. This model is derived from the work of Mink et al. (2009), which relies on professional consultations with archaeologists and on archaeological literature to create and measure their five variables. I began the model by identifying the variables (i.e.
slope, proximity to water, proximity to water confluences, stream order, and elevation in relation to water) based on the available literature and Mink et al. (2009). I then took their ranges and put them in classes (between two and four for each element), which were then used to establish the probability of a prehistoric archaeological site in a given area.

There is always the possibility that the data are subject to non-sampling error. The points I am using to check the veracity of the model were entered by human hands and are thus prone to human error.

Another limitation of the research has been touched upon in the verification section. Though I am not able to produce the model myself and then have someone else apply it to the existing data, I believe that the solution is an adequate compromise. The data did not cost me, the researcher, anything financially, but it took considerable time to organize the site information with the Head Archaeologist at the National Forest Service in North Carolina.

Also, because I only categorized the data as 'prehistoric,' the sites I study do not focus on a specific time period, but on indigenous sites predating European contact, which encompasses several thousand years of habitation. However, this model is not meant to find specific time periods or map habitation patterns at a given time, but to predict the probability that a given location will be an archaeology site in the present. I am also an admitted newcomer to GIS who is mostly self-taught for the purposes of this project.

The accepted gain statistic of an effective model is 0.75. The Mink model seems to have hit a home run in the Very High category, with a statistic near 1, while the Forest Service model sits far behind at 0.360. In this instance, I believe that there is a fault in the way that this was calculated.
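
To make the sensitivity of this statistic concrete, here is a small sketch of the gain calculation. It assumes the usual statement of Kvamme's gain, 1 - (percent of area / percent of sites), which reproduces the Medium and Very High values reported in Table 5 from the percentages in Table 3. The function name and the two-site scenario are hypothetical and are included only to show how a very small zone behaves. The total of 235 is simply the sum of the site counts listed in Table 3.

```python
def gain(percent_area, percent_sites):
    """Gain = 1 - (percent of area / percent of sites).

    Returns None when a zone holds no sites, because the ratio is then
    undefined (shown as N/A in the gain tables).
    """
    if percent_sites == 0:
        return None
    return 1.0 - percent_area / percent_sites


# Percentages taken from Table 3 (Mink model).
medium = gain(27.641, 48.936)
very_high = gain(0.096, 48.936)
high = gain(21.782, 0)

# Hypothetical sensitivity check: suppose only 2 of the 235 counted sites
# fell inside the Very High zone, which covers 0.096 percent of the area.
two_sites = gain(0.096, 100.0 * 2 / 235)

print(round(medium, 3), round(very_high, 3), high, round(two_sites, 3))
# prints: 0.435 0.998 None 0.887
```

Because the Very High zone covers so little area, even a handful of sites pushes its gain above the 0.75 threshold, so a high value in that zone says little on its own.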
The site polygons that were accounted for in the Medium and Very High categories of the Mink map were the same, even though the categories cover entirely different areas. The Very High category was also less than a tenth of a percent of the total area covered. Had just one or two sites been found in that section, it would have given a gain statistic similar to the one I had here. Thus, either the model is almost impossibly accurate within that one category, or there is an error on the part of the researcher. In light of this, that model's results cannot be considered significant. The Forest Service model, however, is not flawed in the same respect.

Results

The Forest Service model provides a much narrower range of possibility, since its parameters are only two yes-or-no variables, unlike the Mink model, which allows a range of possible outcomes. However, the model as a whole caught more sites than the Mink model. This means that it covered a larger area than the other model, but did not produce a result heavily weighted toward the high probability areas. The area of the High probability class was also much larger than the Very High Mink class (over 19 billion square feet compared to nearly 800,000 square feet). It seems as though the Mink model is much more restrictive or 'stingy' in its allocation of 'Very High' probability areas, while still catching four site points and over 100 site polygons.

The Mink model's outcomes are also interesting because they are almost completely restricted to the very edges of river systems. This is most likely because the topography is so steep in the mountains and the model was originally designed to the specifications of a more gently rolling Kentucky landscape.

Significance

Even though neither model proved effective in determining the likelihood of archaeology sites in Pisgah National Forest, there are many possibilities for further research.
The Mink model's basic variables (proximity to water, slope, stream order, etc.) can be recalibrated for a different topography altogether. The model could possibly be applied more broadly, geographically, if the parameters are edited to match the settlement patterns already known in the area. In the future, that model could also be tested further in other locations and compared with geophysical and remote sensing methods (e.g. magnetometry and LiDAR data) to expand its usefulness.

Even though the Forest Service model did not succeed either, its use of geographic parameters unique to the locale is archaeologically sound. In fact, this model found over eighty more site points than the Mink model, which lends credence to the tailoring of variables to match the environment of the model. These tenets could then be used in other places to test their effectiveness.

An effective site model can aid in the identification and preservation of previously undiscovered archaeology sites. Unlike in the Southwestern United States, where most sites are much more clearly visible from aerial photography and simple ground surveys (Brandt et al. 1992), such sites can be very subtle in the Southeastern United States. A model that identifies contextually rich sites without over-representing them could help surveyors make more informed decisions when exploring new potential sites.

In general, however, I propose that we spend more time training archaeologists in GIS, and encourage people to experiment with it, with model building, and with fuzzy logic. The more people we put to these problems, and the more experience we gain in trying to fix them, the likelier it is that we can create a viable, general method (or even a generalizable system of detailed variable tailoring based on geography) for predicting sites and protecting the sites that we have yet to discover.

Appendix A

Hello,

I am a senior at Warren Wilson College working with Dr.
David Moore, and I am writing my senior thesis on predictive modeling in archaeology. This study will mirror the work of Mink, Ripy, Bailey, and Grossardt's 2009 paper, Predictive Archaeological Modeling using GIS-Based Fuzzy Set Estimation. This paper, which I heard at the Southeastern Archaeology Conference in 2010, experiments with non-linear statistical modeling techniques using National Forest Service data. I hope to see if their model is as effective in Pisgah National Forest as it was in Woodford County, Kentucky. Would it be possible for me to have access to the prehistoric archaeology site location data for Pisgah National Forest? The data I need are the UTM coordinates for each site.

Sincerely,
Maureen Vaughan

Academic Advisor: Dr. David Moore
Address: WWC CPO 6076 PO Box 9000 Asheville, NC 28815-9000
Phone: 828.771.2013
Email: dmoore@warren-wilson.edu

Bibliography

Adèr, H.J.
2008 Chapter 12: Modelling. In H.J. Adèr & G.J. Mellenbergh (Eds.) Advising on Research Methods: A consultant's companion, pp. 271-304. Huizen, The Netherlands: Johannes van Kessel Publishing.

Benz, Ursula C., Peter Hofmann, Gregor Willhauck, Iris Lingenfelder, and Markus Heynen
2004 Multi-resolution, object-oriented fuzzy analysis of remote sensing data for GIS-ready information. ISPRS Journal of Photogrammetry and Remote Sensing 58(3-4):239-258.

Brandt, Roel, Bert J. Groenewoudt, and Kenneth L. Kvamme
1992 An Experiment in Archaeological Site Location: Modeling in the Netherlands using GIS Techniques. World Archaeology 24(2):268-282.

Gardner, Roberta
2007 Social Theory: Continuity and confrontation: a reader. Ontario: Broadview Press.

Harris, T. M. and Lock, G. R.
1995 Toward an evaluation of GIS in European archaeology. The past, present and future of the theory and applications. In G. Lock and Z. Stancic (eds.), Archaeological and Geographical Information Systems: a European Perspective, 349-365. London: Taylor & Francis.

Iliadis, L.S.
2005 A decision support system applying an integrated fuzzy model for long-term forest fire risk estimation. Environmental Modelling and Software 20(5):613-621.

Mink, Philip B., John Ripy, Keiron Bailey, and Ted Grossardt
2009 Predictive Archaeological Modeling using GIS-Based Fuzzy Set Estimation. Paper presented at the Transportation Research Board Annual Meeting, Washington, DC, January 11-15.

Mn/Model
Minnesota Department of Transportation. http://www.dot.state.mn.us/mnmodel/, accessed November 21, 2011.

Niccolucci, Franco, D'Andrea, Andrea, and Crescioli, Marco
2001 Archaeological applications of fuzzy databases. Computing Archaeology for Understanding the Past. Oxford: Archaeopress, 107-116.

North Carolina Department of Transportation
North Carolina GIS Archaeological Predictive Model Project. http://www.informatics.org/ncdot/, accessed November 21, 2011.

Perry, George L. W., Sparrow, Ashley D., and Owens, Ian F.
1999 A GIS-Supported Model for the Simulation of the Spatial Structure of Wildland Fire, Cass Basin, New Zealand. Journal of Applied Ecology 36(4):502-518.

Renfrew, Colin and Bahn, Paul
2000 Archaeology: Theories, Methods, and Practice. London: Thames and Hudson.

Revised Archaeological Predictive Model
2005 Delaware Department of Transportation. http://deldot.gov/archaeology/historic_pres/us301/pdf/predictive_model/301_pred_model_arch_pred_model.pdf/, accessed November 21, 2011.

Schleier, Jonathan
2010 GIS Based Archaeological Site Location Modeling in Pitt County, North Carolina. The ScholarShip: East Carolina University's Institutional Repository. http://thescholarship.ecu.edu//handle/10342/3650, accessed November 21, 2011.

Tobler, Waldo
1993 Three Presentations of Geographical Analysis and Modeling. University of California-Santa Barbara: National Center for Geographic Information and Analysis Technical Report.

United States Department of Agriculture Forest Service
2011 National Forests in North Carolina.
http://www.fs.usda.gov/nfsnc, accessed November 21, 2011.

Westcott, Konnie L. and Brandon, R. Joe
2000 Practical Applications of GIS for Archaeologists. London: Taylor and Francis Inc.

Whitley, Thomas G.
2003 Causality and Cross-Purposes in Archaeological Predictive Modeling. Paper prepared for Computer Applications in Archaeology Conference, Vienna, Austria, April 8-12.

Wiseman, James and El-Baz, Farouk
2007 Remote sensing in archaeology. Boston: Boston University Press.

Xia, Xintao, Wang, Zhongyu, and Gao, Yongsheng
2000 Estimation of non-statistical uncertainty using fuzzy-set theory. Measurement Science and Technology 11(4):430-435.