Structured Annotation for Land Grant Research: Interdisciplinary Collaboration and Database Modeling for Historical GIS

By: Mary B. Ruvane
University of North Carolina
School of Information & Library Science
Date: October 22, 2004
Revised: November 2004

TABLE OF CONTENTS

INTRODUCTION
BACKGROUND
    PROJECT SCOPE: LOCATING THE INDIAN TRADING PATH
    LAND GRANT INFORMATION: SOURCES, CHARACTERISTICS & DATABASE CHALLENGES
        North Carolina Land Grant System
        Organization & Condition of Land Grant Records
        18th Century British-American Handwriting
        Surveying Techniques & Measurement Systems
        Additional Information Sources for Locating Parcels
    GEOGRAPHER’S INITIAL DATABASE DESIGN
DATABASE COLLABORATION
    STEP 1: EVALUATION OF THE GEOGRAPHER’S ORIGINAL DATABASE MODEL
        Limitations
        Structure Evaluation
            Flat-File Model
            Data Fields: Entity Duplication
            Data Fields: Compound Attributes
            Data Fields: Commingled Information
    STEP 2: UNDERSTANDING THE GEOGRAPHER’S INFORMATION NEEDS
        Mapping a Tract of Land: Primary Clues
            General Clues - Area or Connecting: Land Office, Basin, County, Features
            Specific Clues - Vicinity or Adjacent: Features, People Names
            General & Specific Clues: Comparing Characteristics
            Parcel Shape Clues: Angles, Distances
        Overlooked Clues: Structure related, bibliographic, annotations
            Structural: Multiple Variables, Commingled Data
            Additional: bibliographic reference
        New Clues: Entries, Warrants, Deeds and More
    STEP 3: CHANGES IMPLEMENTED - THE NEW DATABASE MODEL
        The Relational Model
            Limit redundancy
        The Data Entry Form
        Parse and Import Geographer’s Data
REMAINING OBJECTIVES
COLLABORATION SUMMARY
CITED REFERENCES
APPENDIX A: DATA ENTRY FORM & ORIGINAL DATABASE FIELDS
APPENDIX B: ENTITY RELATIONSHIP DIAGRAM

INTRODUCTION

Last year a PhD candidate in the Department of Geography at UNC-CH, G. Rebecca Dobbs, began researching the Indian Trading Path and the role it may have played in the evolution of today’s North Carolina Piedmont urban centers. As a historical geographer, she has made it one objective of her ongoing project to demonstrate the Path’s influence on the settlement patterns of Europeans who migrated into North Carolina during the latter half of the 18th century. Using a Geographic Information System (GIS), Dobbs intends to build a multimedia time-based map in support of her findings, illustrating the physical location of land settled in relation to its date of occupation and proximity to the Trading Path. Her ultimate purpose is to relate the Path’s influence on the early settlers’ site selection to the consequent emergence of present day city hubs in central North Carolina.

The majority of information facilitating the task of building the GIS map is being culled from original 18th century land grant documents. Initially the geographer took it upon herself to design a simple Microsoft Access database to store and organize the evidence discovered in each land grant record, but after working with a sample of the records it became clear to Dobbs that modifications were in order.
After she discussed the database’s shortcomings with me, a PhD student in information science, we agreed to collaborate on reengineering her original data model. As of this writing some major improvements have been implemented while minor updates are pending. As the project progresses additional enhancements will be incorporated as needed.

The first half of this paper provides an overview of the geographer’s historical research project, her information needs, the information characteristics, and the shortcomings of her initial database. Some of the original deficiencies centered on the ambiguous nature of historical material, while others were more directly related to the geographer’s limited experience with relational databases and the modeling of complex data. The remaining sections discuss our collaboration process, the database modifications I implemented as the information scientist, the results, and implications for future research.

BACKGROUND

Project Scope: Locating the Indian Trading Path

The geographer’s study area is in North Carolina, limited to counties lying in the Piedmont region of the state, situated between the mountain region to the west and the coastal plain to the east. It is through this land that a section of the Indian Trading Path ran, continuing along its route in the NE into Virginia and along its southerly track in the SW into South Carolina (see Figure 1). Land grant paperwork processed between 1748 and 1763, coinciding with the most active years of the Granville District Land Office, was chosen by the geographer as the most reliable source for evidence of the Path’s position in relation to the tracts settled. As the project progresses the resources examined and the time period under study will necessarily expand or contract as the study’s objectives dictate.
Several historic maps of regional or state scale, along with anecdotal materials, exist to substantiate the Indian Trading Path’s importance during this Colonial settlement period [1-4]. Unfortunately there is no known map at the local scale needed to illustrate the progression of parcel occupancy as settlers moved in and began making claims to the land. Therefore, effectively observing the influence the Indian Trading Path may have had on a settler’s site selection requires that the geographer document each tract of land to determine its physical size and location in conjunction with its relationship to neighboring properties, geographic features and distance from the Trading Path; these findings will be presented in a digital GIS format.

Figure 1. Route of the Indian Trading Path through NC Piedmont. Map source: based on original by G. R. Dobbs; minor legend modifications by M.B. Ruvane.

Land Grant Information: Sources, Characteristics & Database Challenges

During the early settlement period of the NC Piedmont area the distribution of land was recorded, albeit inconsistently, by the administrative entities of the day. Assorted documents exist, collectively referred to as land grant records, which provide information ideally suited for geographically positioning individual land parcels within a GIS map. Especially useful has been parcel measurement information contained in survey documents. Originals of these land grant records are housed in the NC State Archives in Raleigh and are viewable in microfilm format onsite, or reels may be purchased for viewing elsewhere. After struggling with the logistics of travel to the State Archives, the inadequate equipment, the limited hours and the high price for poor quality copies, Dobbs ultimately adopted the latter option.

North Carolina Land Grant System

There were two land grant systems in use in the NC Piedmont area during the latter part of the 18th century.
If land was in the Granville District, which consisted of the northern half of the State, the process involved Lord Granville’s agents (see footnote 1) and was recorded in his books. If land was south of Lord Granville’s Line, the process involved Colonial, and later State, agents and their records. In either instance the documents typically fall into one of four categories representing individual stages of the land grant process, which was initiated by the recording of an entry document and culminated with the issuance of a deed.

Figure 2. 18th Century land administered by two systems. Map source: G.R. Dobbs.

Footnote 1: Lord Granville was also referred to as John Carteret, Earl of Granville.

The first stage in the land grant process involved an application for a certain tract of land by a settler, resulting in an entry record of the request. The second stage was the issuance of a warrant by the Land Official, a document authorizing a surveyor to go out and survey the land applied for. Stage three was the survey, performed by an assigned surveyor who would physically measure the land and prepare a document containing his findings. The final stage was the grant of land (or deed), a document authorizing the grantee’s rights to the property. In the Granville District processing of an individual request for land ideally should have transpired over an 18-month period, with the survey completed within 6 months of a recorded entry and the deed issued within 12 months of the survey, but this goal was frequently not realized. According to Mitchell, a ten-year lapse between the date of the survey and the date of the deed was common [5].

The information contained in these land grant records provides essential clues needed by the geographer to construct her GIS map depicting the locations and sequence of land occupation.
For instance, the transaction dates, the county a tract resides in, parcel dimensions, people relationships (e.g., grantee, assignee, neighbors, surveyor, witnesses, chain carriers), and feature names and characteristics (e.g., rivers, paths, fields, a ridge) all play a role in piecing the puzzle together. Collecting this evidence requires that each document be manually reviewed by the geographer so that pertinent data can be recorded in her database.

Typically an entry record provides a brief description of the vacant land, the county it resided in (at the time), the estimated acreage, and the name of the person hoping to purchase the land. A warrant supposedly repeats the parcel’s description “exactly” as written in the entry record and is signed by the agent authorizing a survey. The ‘plat of survey’ documents contain a small map of the land and a written description of the property including boundary measurements with directional indicators. In addition the name(s) of the grantee(s) and surveyor are specified along with the total acreage, which often varied considerably from the estimated acreage listed in the entry or warrant. Frequently included in the narrative, or drawn on the plat, are references to geographic and cultural features, neighbors, and chain carriers. The deed documents contain the signature of the grantee and the indenture agreement (see footnote 2) between the grantee and Land Official, and in records from the Granville District are normally accompanied by a copy of the survey. In most cases the date of each document’s transaction is clearly indicated. One other record, less frequently encountered by the geographer, is a paper assigning the rights to the land to another person; sometimes these transfers are simply noted on the back of a survey or warrant.

Figure 3. Example: Plat of survey document.
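The four document stages and the ideal Granville District schedule described above (survey within 6 months of the entry, deed within 12 months of the survey) can be sketched in a few lines of Python. This is only an illustration: the names `Stage`, `add_months`, and `met_ideal_schedule` are hypothetical, the dates are invented, and nothing here reflects the structure of the geographer’s actual database.

```python
import calendar
from datetime import date
from enum import Enum
from typing import Dict, Optional


class Stage(Enum):
    """The four document stages of the land grant process."""
    ENTRY = 1
    WARRANT = 2
    SURVEY = 3
    DEED = 4


def add_months(day: date, months: int) -> date:
    """Return the date `months` calendar months after `day`."""
    index = day.year * 12 + (day.month - 1) + months
    year, month_index = divmod(index, 12)
    month = month_index + 1
    # Clamp the day so that, e.g., 31 August + 6 months lands on the
    # last day of February instead of raising an error.
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def met_ideal_schedule(dates: Dict[Stage, date]) -> Optional[bool]:
    """Check the 6-month and 12-month targets for one parcel.

    Returns None when the entry, survey, or deed date is unknown,
    which the records suggest will often be the case.
    """
    try:
        entry, survey, deed = (
            dates[stage] for stage in (Stage.ENTRY, Stage.SURVEY, Stage.DEED)
        )
    except KeyError:
        return None
    return survey <= add_months(entry, 6) and deed <= add_months(survey, 12)


# Invented dates: a prompt survey followed by a ten-year wait for the deed.
example = {
    Stage.ENTRY: date(1753, 3, 10),
    Stage.SURVEY: date(1753, 8, 2),
    Stage.DEED: date(1763, 5, 20),
}
print(met_ideal_schedule(example))
```

Returning `None` for incomplete records, instead of guessing, keeps missing documents visibly distinct from late ones.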
Footnote 2: A fee simple agreement that included an annual quit rent clause authorizing the grantee legal rights to the property.

Organization & Condition of Land Grant Records

The majority of original land grant documents housed in the NC State Archives have been permanently removed from circulation for preservation purposes. Instead, to view these materials visitors are provided access to well-worn microfilmed copies of the paperwork. The reels are filed by record series based on the issuing Land Office, either the State’s or Lord Granville’s, arranged alphabetically by county, then by the surname of the grantee. Warrant and plat [6, 7] documents are collated together within one record series while the deeds [8, 9] are grouped separately in another. In the Granville records entry documents are also included in the former series. In some instances the alphabetical order by grantee is not adhered to, especially in the deed records as opposed to the warrants and plats.

In the Granville District survey documents can be found in both record series: the warrants and plats or the grants of deeds. In the deed series a survey frequently precedes its related deed on the microfilm; this does not appear to be the case for State records. In the warrant and plat series the documents are intermingled, and may or may not be related. While an ‘exact’ duplicate of a survey found in a deed series could be included in the warrant and plat series, the same is not always true in reverse. If a tract was surveyed but ultimately did not convey to the person it was surveyed for, the completed survey would only be found in the warrants and plats series.
In the Granville District a surveyor was expected to prepare three copies of each survey: one for the Land Grant Office, the second for attachment to the grantee’s indented deed, and a third for attachment to a duplicate of the deed (signed by the grantee) to be sent to Lord Granville in London (although this did not always happen). Finding three surveys together is a good indicator that the grant of deed transaction never occurred. It has not been determined at this stage of the geographer’s research whether three copies of each survey were required by the Colony’s or State’s Land Office.

The NC State Archives cautions researchers that although the entry, warrant, and survey [6] records of the Granville District are filed together, there is no guarantee that contiguous documents refer to the same parcel of land. They further warn that multiple entries ‘of the same date for the same person for land in the same county’ often exist, preventing a precise match between these vague documents and other entry papers, warrants, plats or grants. Additional pitfalls to be mindful of include land interests that may have been assigned to another party and the frequent shifts in county boundaries. It was not uncommon for a parcel to reside in one county at the beginning of the land grant process and to be incorporated into another midway through.

The condition of land grant records varies. While many are intact, others have faded or been damaged over the years and through physical use, making interpretation difficult. For some, only remnants of the original document remain, rendering them useless for this project. In a number of cases only portions of the content can be deciphered due to smears and blemishes that have obscured the writing. Adding to these readability challenges are the inherently poor quality of microfilm images and limitations with the equipment available for reading them.

Figure 4. Example of damaged land grant document.
To date, the Land Entries, Warrants, and Plats of Survey [6] and the Grants of Deeds [8] series related to the Granville District have provided the majority of evidence entered into the geographer’s new database (see footnote 3). These consist of 14 reels and 19 reels of material respectively, representing thousands of land parcels. The State land records are pending data entry. There are other record series that may be incorporated as the project progresses, but their suitability has not been thoroughly explored.

Footnote 3: The geographer’s original database was used to enter approximately 800 initial records before her data were migrated into the new database designed by the author.

18th Century British-American Handwriting

Deciphering hand-written land grant documents from the colonial days can be quite a challenge, yet according to Dobbs it usually can be done with a little practice. In the 18th century spelling was not standardized [10]. Words were often spelled phonetically, abbreviated, or simply shortened with either superscript notation or no indication of the missing letters at all. Within the same document different spellings of the same word can often be found. The lower case s is frequently written in a style that today would be interpreted as an f. Additionally, depending on the penman, certain upper case letters can look similar, such as K, P, and R or J and T.

An added complication to interpreting these documents, and especially in designing an effective database for comparing similar terms, is that proper names were just as likely to be distorted in this cryptic prose. A person’s first name was commonly abbreviated; for example, Jno could represent John, Jonathan, or even Jonas. Last names encountered the same imprecision; for instance, there are multiple spellings recorded in the Dobbs database for Sherrill, such as Sherrill, Sherill, Sherrel, and Shirill.
The most common spelling found in today’s NC phone directory is the first of these, Sherrill. Another case in point is the set of phonetically equivalent spellings of a creek named Lyle, which according to the Getty Thesaurus of Geographic Names [11] is the preferred spelling over the vernacular versions Lyles or Liles. Currently the database includes potential matches such as Lyles, Liles, Lylles, Lyleses, Lillis, Lilses, and Lileses. But whether any of these fuzzy similarities actually refer to the same proper name is yet to be resolved.

This ambiguous writing style creates a dilemma when transcribing information for use within a database. While it was tempting from the geographer’s standpoint to ‘substitute on the fly’ a modern translation for an apparently archaic or misspelled word found in a document in order to standardize terms for searching, I recommended she enter information as it is written, leaving uncertain language for comparison with the larger body of data as it grows. She agreed that any conscious revisions to original spellings presented in the document would be noted in the future. Regardless of the approach taken, a method should be determined in advance for indicating illegible words, uncertain letters, and translations inserted by the data entry person. Where possible the method chosen should be employed consistently and the process clearly documented.

Surveying Techniques & Measurement Systems

Prior to the 18th century a property line might have been described by ‘the sweep of an arm from a rock by a river to a distant tree’. By the 18th century the common practice in use by surveyors in the 13 colonies and other parts of the eastern states was the more ‘precise’ metes and bounds surveying system [12, 13]. This system incorporated measurements taken between landmarks to more accurately distinguish a property’s boundaries, such as ‘…beginning at a red oak, running East 70 [chains] to a pine…’ [14].
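A call such as the one quoted above amounts to a direction plus a distance, so a sequence of calls can be traced into corner coordinates. The sketch below is a simplified illustration, not the geographer’s GIS procedure. It assumes the standard conversions of 66 feet per chain and 16.5 feet per pole, handles only the four cardinal directions (real surveys also use compass bearings), and invents three calls after ‘East 70’ to close a square. The function names are hypothetical.

```python
import math

FEET_PER_CHAIN = 66.0   # one chain of 100 links
FEET_PER_POLE = 16.5    # a pole, also called a perch or rod

# Azimuths in degrees, measured clockwise from north.
CARDINAL_AZIMUTH = {"North": 0.0, "East": 90.0, "South": 180.0, "West": 270.0}


def to_feet(distance: float, unit: str) -> float:
    """Convert a surveyed distance in chains or poles to feet."""
    factors = {"chains": FEET_PER_CHAIN, "poles": FEET_PER_POLE}
    return distance * factors[unit]


def trace_boundary(calls, unit="chains"):
    """Walk the calls from a starting corner at (0, 0).

    Each call is a (direction, distance) pair. Returns the corner
    coordinates in feet, with x pointing east and y pointing north.
    """
    x, y = 0.0, 0.0
    corners = [(x, y)]
    for direction, distance in calls:
        azimuth = math.radians(CARDINAL_AZIMUTH[direction])
        feet = to_feet(distance, unit)
        x += feet * math.sin(azimuth)
        y += feet * math.cos(azimuth)
        # Round away floating-point noise from the trigonometry.
        corners.append((round(x, 6), round(y, 6)))
    return corners


# Only the first call comes from the quoted survey; the rest are invented.
calls = [("East", 70), ("South", 70), ("West", 70), ("North", 70)]
corners = trace_boundary(calls)
print(to_feet(70, "chains"))      # 4620.0
print(corners[-1] == corners[0])  # True when the boundary closes
```

A boundary that fails to return to its starting corner is a useful signal in its own right: it may point to a misread distance, or to a side formed by a stream instead of a measured line.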
The units of measure in the metes and bounds system were based on chains (100 iron or steel links totaling 66 feet) or poles (a unit of length equal to 16.5 feet). Poles were also referred to interchangeably as perches or rods.

Interpreting and recording these parcel measurements is the crux of the geographer’s project, for without the size, shape, and geographic orientation of each tract it would be impossible for her to map them. Chains were typically the unit of measurement employed. Many of the plats in the study area consist of four measured sides forming a rectangle, although numerous exceptions can be found where boundaries have eight or more measured sides or a meandering stream forms one or more sides of the property line. This distinction becomes important when working with GIS technology. Parcels constructed of enclosed linear boundaries need to be interpreted and drawn as polygons in the system, whereas those with borders defined by waterways initially must be recognized and drawn as lines. More about this later.

Additional Information Sources for Locating Parcels

Numerous additional resources have been employed by the geographer to facilitate interpretation of, or expand upon, the information contained in the land grant records. These include but are not limited to cartographic material, books, and manuscripts. Most useful have been maps depicting selected themes in the 18th century drawn at various scales. For example, Collet’s 1770 map [1] is a survey of the entire state of North Carolina representing geographic features (e.g., rivers, trees, counties, cultural features, settlements) of the period, while Markham’s maps [2], prepared in 1973, illustrate the location and ownership of land parcels in old Orange County between 1743 and 1810. An Atlas of Historical County Boundaries [15] details the shifts in NC administrative borders from the time of colonial settlement up to 1998.
Ramsey [3] published a book describing the settlement of Rowan County between 1747 and 1762, which also contains map illustrations. A prime reference is Powell’s North Carolina gazetteer [16], a dictionary of NC names and places including variant historical spellings and aliases. Additional material appears to be available at the Durham County Library in Durham, NC, and in two special collections maintained by the Wilson Library at the University of North Carolina, but as of this writing they have not been fully explored by Dobbs. These are but a few of the more authoritative sources that add credence to, and assist with the analysis of, the evidence found in the land grant records.

Geographer’s Initial Database Design

At the outset it seemed clear to Dobbs that the survey records held the primary content needed for constructing the overall GIS land parcel map. With this in mind the first task she undertook was to become familiar with the survey documents, the evidence they provided, and what data needed to be collected. The geographer then considered her study’s objectives, sketched out the process, and proceeded to build on her own a database tool for storing parcel measurements and relevant ‘incidental’ clues, such as the names of people, features, and selected characteristics mentioned in each survey.

Dobbs’s initial design worked well, handily facilitating a method for storing and organizing essential information contained in the documents. As expected, she made a few modifications along the way to improve the data entry process and to incorporate new facts. Unfortunately the first test of her database’s usefulness, after entering records from a sample of the study area, offered insight into its limitations and problems: the database’s information retrieval capabilities were ineffective for her project’s needs.
DATABASE COLLABORATION

Having previously collaborated with me, and encouraged by my enthusiasm to jump right in and help, the geographer agreed to discuss the issues she was having and to explore possible database modifications. Based on our first conversation, it sounded like the clues she needed for identifying likely parcel adjacencies had not been thoroughly considered in terms of her database’s design. The first step was to evaluate her database’s strengths and weaknesses. The second was to learn about her information needs and understand the characteristics of the land grant documents she was working with (described previously). The third was for me, as the information scientist, to implement changes based on my area of expertise and the geographer’s needs.

Step 1: Evaluation of the Geographer’s Original Database Model

Dobbs’s original database consisted of 44 fields (see Appendix A). It was evident that the data being collected from the survey documents had been well thought out, capturing the essential evidence required to differentiate each tract, although certain limitations and structure concerns needed to be addressed.

Limitations

As mentioned, the search capabilities of the geographer’s database had proven to be ineffective during a trial run with sample data she had entered. The retrieval problems were due to the variety of methods being employed for storing multiple values, as explained in the next section.

The second limitation was the inability to easily incorporate new types of data. This was rooted in the original design’s singular focus on capturing information found in the survey records, the documents initially deemed most significant to the study by the geographer. The need to incorporate content from additional sources had become evident during the process of entering sample survey data from one county.
Dobbs discovered that many of the surveys she was recording in her database provided little detail and other parcels known to exist seemed to be missing entirely. Upon further review it appeared that the entry, warrant, and deed documents held promising information to fill in these gaps, and potentially other resources would be useful. Collecting content from different records had not been planned for in her original database design. To do so would require new fields and a way to differentiate each source. Therefore, the overall goal of our collaboration was to determine a method to incorporate new document types and improve the database’s search capabilities.

Structure Evaluation

Flat-File Model

The geographer’s initial database design was a flat-file, a simple database management model for storing data in one table (see Figure 5). Dobbs had chosen to work with MS Access, a relational database application intended for use with multiple tables. The allure of Access was its feature for building customized forms to expedite data collection, a characteristic not found in spreadsheet applications typically employed for single table modeling. Although a flat-file provides a good method to store information, it offers less flexibility for posing queries, dealing with large disparate quantities of data, and customizing. As the content collected in each database field began to exceed its original intent and new fields were added, the efficiency of the geographer’s single table design greatly diminished.

Data Fields: Entity Duplication

Most of the 44 fields in Dobbs’s database were redundant: over half were devoted to collecting paired survey measurements while other multiple fields were being used to store people names and feature names (see Figures 5, 6, 7 and 8). Creating separate fields for entities that share common attributes adds unnecessary complexity to a database’s search criteria.
For instance, when looking for people with the same name an advanced union query would be necessary to join the seven individual ‘person name’ fields into one list for comparison. To make matters more complicated, content entered into these fields was inefficiently formatted, as described in the next two subsections.

Figure 6. Redundant fields used for two entity types: angles and lengths. Survey pairs identified in 24 separate fields instead of two: angles and lengths (note: pairs 6 - 12 not shown).

Data Fields: Compound Attributes

Figure 5. One table with 44 fields.

Many of the geographer’s fields held compound attributes in a single field. For example the seven ‘person name’ fields contained full names such as “Robert Samuel Barshear Jr.” or “Reverend John Thompson” (see Figure 7) instead of being divided into separate fields (e.g., prefix, first, middle, last, and suffix). This method allowed sorting only on a person’s first name. For searching by a person’s last name in a one-field entry such as this, the better method would have been to enter the last name first, separated by a comma, such as “Barshear, Robert Samuel, Jr.”.

Figure 7. Redundant fields used for one entity type: people names. People names identified in eight separate fields instead of one (note: surveyor field not shown).

For the geographer’s project individual fields are likely to work better in facilitating searches on imprecise data such as people’s names. Being able to compare first name, last name, or any combination may assist in finding phonetic matches or possible spelling alternatives. The same single field format was an issue in her feature name entries, but these fields had additional troubles (see Figure 8). They not only included entity duplication (e.g., two columns) but also were seriously compromised by commingled data (discussed below). Fields for people’s names also contained commingled data, but less often.
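To illustrate the compound-attribute problem, the following sketch splits a full name into the five parts suggested above. It is a rough approximation and not the parsing routine used in the project: the `PREFIXES` and `SUFFIXES` lists, the `PersonName` class, and `split_name` are all hypothetical, and 18th century abbreviations and spelling variants would need far more care.

```python
from dataclasses import dataclass

# Hypothetical lookup lists; a real project would build these from the records.
PREFIXES = {"Reverend", "Rev.", "Capt.", "Col."}
SUFFIXES = {"Jr.", "Sr."}


@dataclass
class PersonName:
    prefix: str = ""
    first: str = ""
    middle: str = ""
    last: str = ""
    suffix: str = ""


def split_name(full_name: str) -> PersonName:
    """Split a single-field name into prefix, first, middle, last, suffix."""
    parts = full_name.split()
    name = PersonName()
    if parts and parts[0] in PREFIXES:
        name.prefix = parts.pop(0)
    if parts and parts[-1] in SUFFIXES:
        name.suffix = parts.pop()
    if parts:
        name.last = parts.pop()       # last remaining word is the surname
    if parts:
        name.first = parts.pop(0)
    name.middle = " ".join(parts)     # whatever is left in between
    return name


names = [split_name(text) for text in
         ("Robert Samuel Barshear Jr.", "Reverend John Thompson")]

# With separate parts, sorting by surname becomes a one-line operation.
for person in sorted(names, key=lambda n: (n.last, n.first)):
    print(person.last, "|", person.first, "|", person.suffix)
```

Once the parts are stored separately, comparing surnames alone (for example, every variant spelling of one family name) no longer depends on how the rest of the name was written.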
Because of these formatting issues, searching for parcels sharing similar features or with related people was essentially futile in the geographer’s original database model.

Data Fields: Commingled Information

Commingling of unique data within individual fields was causing the most conspicuous impediment to the geographer’s ability to effectively search her database. Multiple people, of the same type, were being entered together into the single field she had allocated for capturing this information. For example a grantee’s name (field titled: surveyed for) occasionally held two or more names formatted as follows: “George Tate [and] John Chew”. Similarly a field for documenting neighbors (field titled: Adjacent to) might contain three or more names such as: “John Beavard, Alexander Osborn [and] John McConil.”

Clarification notes were also being added to the mix, especially in fields designed to identify features, causing another snag (see Figure 8). For example one feature field (field titled: Location Keywords) combined the name of a creek, location information, and the geographer’s clarification notes to herself: “head branch of Coddel Creek (now Coddle, Codle on Collet map)” [17]. Fields designed for recording people’s names were not immune to this unstructured annotation practice either; for instance one neighbor’s field contained “Moses Andrew (or near); George Davison must be somewhere near.” These inconsistencies, of entering several entity types along with observation notes into one common field, were prevalent throughout the original database, making their resolution a high priority.

Figure 8. Commingled information in two feature fields: water and transportation.

Step 2: Understanding the Geographer’s Information Needs

The second step was to understand Dobbs’s information needs and what clues the new resources she was interested in including might contain.
How did the current evidence collected assist with placing a parcel in the correct location on a GIS map? Were there clues in the survey documents that had been overlooked in the first database? What additional clues would the entries, warrants, and deeds provide? What other materials might contain useful information for recording? What types of database searches did she envision employing? These questions sparked a great deal of dialog that ultimately provided the blueprint I used for creating the current rendition of Dobbs’s new database tool.

Mapping a Tract of Land: Primary Clues

The root of this collaboration is to facilitate the geographer’s task of positioning individual land parcels in real time and space. Although a comprehensive survey document provides the necessary measurements to reconstruct a parcel’s size and shape, the tract’s physical location often remains uncertain without further investigation. Locating a parcel in real space is contingent upon comparing a variety of details across multiple land grant records. Indications of how parcels relate geographically are found by identifying tracts that share common characteristics, offering clues ranging from the more general to the specific. The greater the number of shared characteristics, the greater the likelihood that those parcels are in the same general area, and ideally adjacent.

General Clues - Area or Connecting: Land Office, Basin, County, Features

General clues help to divide parcels into broad geographic areas. For example, the land office involved tells you whether the property was in the northern or southern half of the State. Knowing which water basin a tract resided in provides another clue, although the associated basin is not always clear. The county further reduces a tract’s possible position by limiting its location to within an administrative boundary, keeping in mind that border shifts require careful interpretation.
Features provide the remaining general clues and in some instances fit the description of specific clues described in the next section. Within the confines of a parcel’s designated land office, basin, and county, the features identified become a crucial aid in further narrowing down a tract’s general position. At times a feature’s location is even illustrated on a survey’s plat. Unfortunately, not every document identifies a parcel’s features, and many that do lack sufficient context for a definitive placement in real space.

Features predominantly cited include waterways and transportation routes, followed by less frequently mentioned cultural features such as ‘a mill’, ‘Indian old fields’, or ‘a courthouse’. By extracting the names of features such as rivers, streams, creeks, paths, fords, and roads, the geographer can compare them not only with present day map locations and historical records but also with other parcels’ characteristics.

An especially valuable clue comes from connecting features, such as waterways, transportation routes, and land office or county lines. These types of features traverse multiple properties along a continuous route, an inextricable link that positions a tract along a common reference. To illustrate, one survey document [18] pinpoints a tract’s location ‘…on the N side of the Catawba River, straddling Buffalo Creek, bordering the Granville Line…’. Unfortunately, most records are not as precise, offering only vague positions such as ‘on the south side of the Yadkin River,’ which could be anywhere along a 203 mile route [19]. As with a broken strand of beads, restringing these loose pieces into their original order can be virtually impossible without further indicators. Nonetheless, features can offer valuable clues, especially in combination with other evidence.
Specific Clues - Vicinity or Adjacent: Features, People Names

Specific clues help to determine a tract’s position in relation to other parcels, either by implying that they are ‘in the vicinity of’ one another or by providing a clearly stated adjacency. The best way to establish nearby or neighboring properties is to compare the names of people associated with each parcel, although at times adjacencies can be surmised from adequately described or unique feature clues.

The grantee and the surveyor are the two ‘types’ of people most consistently recorded within the land grant documents. Less frequently mentioned are the names of bordering neighbor(s), near-neighbor(s), chain carrier(s), assignee(s) and other minor relationships of less value for deciphering positions. As an example of how people’s names aid in determining adjacent parcels, one survey document points to two neighboring tracts as follows: ‘…to a black [oak on] William Grant’s line then [straight] along said line 48 [poles] to a [red] oak on Hugh Dixon’s line then [east] along said line 16 [poles] to Dixon’s corner…’ [18]. Another combines feature relationships with a nearby neighbor’s name to suggest an approximate location: ‘…On the N side of the Catawba River, Straddles Third Creek, about 3 miles above Thomas Gilespy's property…’ [20].

Properties not necessarily adjacent, but perhaps in the vicinity of each other, may also be uncovered by comparing the names of chain carriers who assisted a surveyor with a parcel’s measurements. Because long distance travel was not practical during this era, it is assumed that volunteer chain carriers lived somewhere near the tract being surveyed. The exception might be a chain carrier with the same last name as the grantee, for he presumably was a resident member of the family for whom the land was being surveyed.
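The name comparison described in this section can be sketched in a few lines of Python. This is an illustration only, not the project's actual method: the parcel numbers, the rows, the name John Smith, and the function name shared_people are all hypothetical, and although William Grant, Hugh Dixon and Thomas Gilespy appear in the quoted surveys, their assignment to parcels here is invented.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (parcel_id, person_name, role) rows; invented for illustration.
ASSOCIATIONS = [
    (1, "William Grant", "grantee"),
    (1, "Hugh Dixon", "neighbor-adjacent"),
    (2, "Hugh Dixon", "grantee"),
    (2, "John Smith", "chain carrier"),
    (3, "John Smith", "chain carrier"),
    (4, "Thomas Gilespy", "neighbor-near"),
]


def shared_people(rows):
    """Map each pair of parcels to the set of names the two parcels share."""
    parcels_by_name = defaultdict(set)
    for parcel_id, name, _role in rows:
        parcels_by_name[name].add(parcel_id)

    shared = defaultdict(set)
    for name, parcels in parcels_by_name.items():
        # Every pair of parcels mentioning the same name is a candidate link.
        for first, second in combinations(sorted(parcels), 2):
            shared[(first, second)].add(name)
    return dict(shared)


links = shared_people(ASSOCIATIONS)
for pair, names in sorted(links.items()):
    print(pair, sorted(names))
```

A shared name only suggests that two parcels may be near each other. An exact string match will also miss the variant spellings common in these records, so any real comparison would need a looser matching rule.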
General & Specific Clues: Comparing Characteristics

Aside from the rare document providing precise location information for a particular parcel, pinpointing the actual position of the majority of tracts requires some detective work. Starting with records containing the most productive evidence, whether general or specific, features or people, the process entails frequent back and forth comparison in an attempt first to cluster likely groups of properties and then to arrange them into their original configuration. Although the results may be fuzzy at first, by analyzing those parcels that share general and continuous features in conjunction with those likely to be adjacent or near-neighbors, the tedious job of placing each tract onto a GIS map usually begins to work.

A relational database is ideally suited for generating the comparisons needed to evaluate the evidence collected. For example, with my assistance, Dobbs could develop a two-part query based on general clues. First, find all records issued by the Granville Land Office that lie in the same water basin within a designated county and mention the Yadkin River. Second, select from these records only those listing neighbors, containing one or more additional common features, or meeting any other criteria deemed relevant to achieve the results sought. Alternatively, a query could be written using specific clues to find all documents that include neighbors and/or chain carriers, followed by a sub-query to compare any general features they may have in common that indicate a possible adjacency.

Parcel Shape Clues: Angles, Distances

The shape of a parcel offers a visual location clue. The angles and lengths provided in each survey document are extracted from the database into an application tool that generates GIS compatible shapefiles (see footnote 4) [21], resulting in a ‘puzzle piece’ yet to be fitted into the picture.
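As a rough illustration of how angle and length pairs become a shape, the sketch below traces a parcel outline from survey calls. It is not the application tool used in the project. It assumes each call is written as a quadrant bearing (for example N 45 E) plus a length in poles, and the function names and the square example parcel are hypothetical. The conversion of 160 square poles to one acre is standard.

```python
import math


def bearing_to_azimuth(ns, degrees, ew):
    """Convert a quadrant bearing such as ('N', 45, 'E') to degrees clockwise from north."""
    if ns == "N":
        return degrees if ew == "E" else 360 - degrees
    return 180 - degrees if ew == "E" else 180 + degrees


def trace_parcel(calls, start=(0.0, 0.0)):
    """Walk the survey calls and return corner coordinates (x east, y north) in poles."""
    x, y = start
    corners = [(x, y)]
    for ns, degrees, ew, length in calls:
        azimuth = math.radians(bearing_to_azimuth(ns, degrees, ew))
        x += length * math.sin(azimuth)
        y += length * math.cos(azimuth)
        corners.append((x, y))
    return corners


def area_acres(corners):
    """Shoelace area of the traced outline, converted from square poles to acres."""
    twice_area = sum(
        x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(corners, corners[1:])
    )
    return abs(twice_area) / 2 / 160


# Hypothetical square parcel, 40 poles on a side (assumed data, not from a real survey).
calls = [("N", 0, "E", 40), ("N", 90, "E", 40), ("S", 0, "E", 40), ("S", 90, "W", 40)]
corners = trace_parcel(calls)
closure_gap = math.dist(corners[0], corners[-1])  # near zero when the outline closes
acres = area_acres(corners)
```

The closure gap offers a quick check on transcription: when the last corner falls far from the first, an angle or length was probably misread, and the computed area is then only approximate.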
Although a parcel’s outline is not as distinctive as a finely carved jigsaw piece, one can be certain that a meandering river boundary does not adjoin a property whose border edge is linear. Parcels with unusual angles or irregular boundaries provide similar incompatibility clues.

Footnote 4: A shapefile stores nontopological geometry and attribute information for the spatial features in a data set. The geometry for a feature is stored as a shape comprising a set of vector coordinates.

Overlooked Clues: Structure related, bibliographic, annotations

In discussions surrounding the original database’s recorded evidence, Dobbs indicated a few items that had been overlooked or were causing problems. Some of these issues related to the structural inconsistencies discussed earlier; others were additional clues identified for incorporation into the new database.

Structural: Multiple Variables, Commingled Data

Multiple variables require special handling in the design of a database. In this project, evidence such as the county a parcel resided in or the date of a survey at times falls into more than one category. For example, some parcels were associated with multiple counties, either because of boundary uncertainty or because of an official shift in administrative borders during the land grant process. In other cases a document may have conflicting dates, where the front of a record indicates one month, day and year, and the back another. Structural errors such as these complicate access to anomalous types of information.

Commingled data within one field essentially nullified the value of the primary evidence recorded. As discussed in the previous structure section, several forms of this dilemma existed in Dobbs’s original database. The primary names of features, such as waterways or transportation routes, were typically intermingled with directional information.
Other fields included primary names intermixed with various types of annotations, including personal reminders, citations to additional information, or notations concerning transcription uncertainty such as spelling or legibility. It was clear these pieces of evidence needed to be separated into new fields to improve their usefulness for building a GIS map.

Additional: bibliographic reference

As the author, I wanted to expand upon the one field Dobbs was using to capture bibliographic information to ensure each source and its location was properly identified. The existing institution field needed to be capable of storing multiple organization names, and possibly instances where the same document might be owned by more than one, even though the State Archives in Raleigh would likely provide most of the material. The format of the material also needed to be documented, such as whether it was a microfilm record, book, manuscript, or map. Equally vital was a place for storing call numbers and related descriptions to validate the source and allow Dobbs to return to the evidence at a later date.

New Clues: Entries, Warrants, Deeds and More

Our discussions included many conversations on what new information the entries, warrants, deeds and potentially other documents might provide. The content they contained seemed to overlap with most of the ‘entity types’ already being recorded in Dobbs’s original database, with the exception of the parcel measurements. For instance, she wanted to record the type of document, the associated date(s), an extract of selected content, the features noted and the people mentioned. This request could easily be handled by adding a ‘type’ field (e.g., document type, date type, person type, etc.) to differentiate the evidence pulled from the anticipated new mix of documents.
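The ‘type’ field idea can be sketched with Python's built-in sqlite3 module; the project itself used MS Access. The table names person_type and person_mention, the column names, and the document numbers are hypothetical. The type labels follow those listed in the Appendix B diagram, and the people's names come from the examples quoted earlier, but their pairing with documents here is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Lookup table: one row per allowed person type (hypothetical schema).
    CREATE TABLE person_type (
        type_id INTEGER PRIMARY KEY,
        label   TEXT UNIQUE NOT NULL
    );
    -- One row per person mentioned in a document, tagged with a type.
    CREATE TABLE person_mention (
        doc_id  INTEGER NOT NULL,
        name    TEXT NOT NULL,
        type_id INTEGER NOT NULL REFERENCES person_type(type_id)
    );
    INSERT INTO person_type (type_id, label) VALUES
        (1, 'Grantee'), (2, 'Chain Carrier'), (3, 'Neighbor-adjacent');
    INSERT INTO person_mention (doc_id, name, type_id) VALUES
        (101, 'George Tate', 1),
        (101, 'John Chew', 1),
        (101, 'John Beavard', 3),
        (102, 'John Beavard', 1);
    """
)


def mentions_of_type(label):
    """Return (doc_id, name) rows for every person recorded with the given type."""
    cursor = conn.execute(
        """
        SELECT m.doc_id, m.name
        FROM person_mention AS m
        JOIN person_type AS t ON t.type_id = m.type_id
        WHERE t.label = ?
        ORDER BY m.doc_id, m.name
        """,
        (label,),
    )
    return cursor.fetchall()


grantees = mentions_of_type("Grantee")
neighbors = mentions_of_type("Neighbor-adjacent")
```

Because the type is stored as data, a new document or person category needs only a new lookup row, not a new column. The same approach would extend to document types and date types.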
Step 3: Changes Implemented - The New Database Model

The third step was to remodel Dobbs’s database to take advantage of the relational database application, better structure the field content, and address her expanded information needs. Based on our discussions, I initially created an entity relationship diagram to illustrate the conceptual changes I proposed (see: Appendix B). After some back and forth discussions, which prompted several modifications to the diagram, the final rendition was used as a blueprint for designing the new database model described below.

The Relational Model

A relational database application, such as MS Access, typically employs multiple related tables. This approach cuts down on data redundancy, is well suited for handling multiple values that cause anomalies, and provides a method to maintain relational integrity. To remove as much redundancy as possible and accommodate the variety of multiple values, five primary tables (document, parcel, people, features, survey) were created with links to 13 related subcategory tables (see: figure 9). This clearly differs from Dobbs’s original one-table model illustrated previously (see: figure 5). Although some tables purposely still contain redundancy, to simplify the query building process from a geographer’s standpoint, the fields involved are coupled with underlying integrity rules to ensure reliability.

Figure 9. New database’s multiple table model

Limit redundancy

Multiple values were causing the majority of redundancy problems in Dobbs’s original single table model. In a flat-file these values can only be handled in one of three ways: by adding columns (like: figure 10a), by commingling all values in one cell (like: figure 10b), or by using multiple rows (which Dobbs had not done). Each approach causes problems or creates anomalies when updating, adding, or deleting subsequent data [22].

Figure 10.
Multiple value redundancy inherent in Dobbs’s flat-file model. (a) Multiple columns in single row. (b) All names in one column, in single row.

Limiting these types of redundancy in the new database was a priority, especially for fields containing critical clues such as people and feature names. In the geographer’s original database, duplicated data was unlinked and appeared across and down columns, as illustrated in the fields for people names and feature names (see: figures 7, 8, and 11a). In the new database several linked tables were employed for connecting people and feature elements. For example, seven new tables were created for organizing information related to people: people, people associated with, prefix, first, middle, last and suffix (see: figure 9). For features six new tables were created: feature, feature locators, locator terms, primary name, suffix, and feature type (see: figure 9).

The new primary entity table for people, PEOPLE, stores one unique name per row and separates the compound name elements into their smallest units (see: figure 11b). The PEO_ASSOC_WITH table links the unique person (PerID) from the PEOPLE table with all documents containing the same name, and also incorporates new fields for relation type and related comments. The relation type identifies what role a person played in the processing of a particular parcel, and the related comments field eliminates the commingling of annotation data (see: figure 11c). The five remaining tables associated with people names (prefix, first, middle, last, suffix) store individual name elements and are called upon to establish integrity each time a new unique full name is required in the PEOPLE table.

Figure 11. Original single table vs. New People & People Associated with Tables a. Portion of geographer’s original database b. New database: PEOPLE table c.
New database: PEO_ASSOC_WITH table (a) Redundancies in “surveyed for”: Mordecai Mendenhall (IDs 436, 435), William Morrison (IDs 446, 447); and between columns, Alexander McCulloch is ID 619 in “surveyed for” and ID 426 in “adjacent to”. (b) The name William Morrison is now associated with the unique ID 189. (c) Note William Morrison’s relationship with multiple properties; these are but a few.

The new primary entity table for features, FEATURE, stores one unique feature per row, separates compound feature elements into their smallest units, and uses a category field to identify the type of feature (e.g., water, transportation, cultural, place name, etc.) (see: figure 12b). The FEA_LOCATORS table links the unique feature (FeaID) from the FEATURE table with all parcels containing the same feature and incorporates new fields for description [location] terms and related comments. The description terms identify a parcel’s position in relation to the feature identified, and the comment field eliminates the need to commingle annotation data (see: figure 12c). The four remaining tables associated with feature names (locator terms, primary name, suffix, and feature type) store individual name elements and are called upon to establish integrity each time a new unique term is required in the FEATURE or FEA_LOCATORS tables.

To illustrate the benefit of the new database's multiple-table model, take a look at figure 12. Start by looking at ID159 in figure 12a, Dobbs’s original single-table model, and compare it to the new multi-table output shown in figure 12d. In the new database ID159 has taken on a new role as the document ID (.docID) and in the query results displays on three separate lines, which serves to associate a document with each unique feature it contains, commonly referred to as a one-to-many relationship. This type of output would have been impossible to produce using the original data structure.

Figure 12. Original single table vs.
New Features & Fea_Locator Tables a. Dobbs’s original feature fields; 2 separate columns with commingled data b. New primary feature table c. New linking feature table d. Query joining several tables to display related feature characteristics

The other three new primary entity tables (DOCUMENT, PARCEL, and SURVEY), designed for storing categories of related content, were handled in a similar fashion (see: figure 9). Each was connected to tables that provided integrity checks specific to it, and was linked back to the other primary tables based on the relationships formed between matching unique IDs.

The Data Entry Form

A new form was designed to facilitate Dobbs’s data entry process (see: figure 13). It is similar to the one employed in her initial database (Appendix A), although the new form addresses the variety of multiple values and is designed to handle additional source material. Currently, evidence extracted from each land grant document is being entered using this form, which automatically populates the underlying new tables.

Figure 13. New data entry form

Parse and Import Geographer’s Data

Finally, once I had created the form and Dobbs had tested it, the last and most difficult step was to parse and import the original survey data she had entered into her old database. There were over 300 records and 44 fields containing commingled data stored across and down multiple columns. This was no easy task: while some of the process could be automated, the majority required manual intervention. But with patience and perseverance the process was slowly completed. Each field holding compound attributes was parsed into its simple elements, commingled data was separated, and data spread across multiple columns were joined into one field. From there the data could be imported into the new tables and fields created just for them.
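The portion of the parsing that lends itself to automation might look something like the sketch below. It is an illustration under stated assumptions, not the procedure actually used: the function names, the prefix and suffix lists, and the splitting rules are hypothetical, and the sample strings are the examples quoted earlier in this paper.

```python
import re

# Assumed vocabularies; a real import would need fuller lists.
PREFIXES = {"Reverend", "Rev", "Capt", "Col"}
SUFFIXES = {"Jr", "Sr"}


def split_people(field):
    """Separate a commingled field into individual names and parenthetical notes."""
    people = []
    # Names are assumed to be separated by '[and]', a comma, or a semicolon.
    for piece in re.split(r"\s*(?:\[and\]|[;,])\s*", field.strip()):
        note = " ".join(re.findall(r"\(([^)]*)\)", piece))
        name = re.sub(r"\([^)]*\)", "", piece).strip()
        if name:
            people.append({"name": name, "note": note})
    return people


def parse_name(full_name):
    """Split a full name into prefix, first, middle, last and suffix elements."""
    tokens = full_name.split()
    parts = {"prefix": "", "first": "", "middle": "", "last": "", "suffix": ""}
    if tokens and tokens[0].rstrip(".") in PREFIXES:
        parts["prefix"] = tokens.pop(0)
    if tokens and tokens[-1].rstrip(".") in SUFFIXES:
        parts["suffix"] = tokens.pop()
    if tokens:
        parts["first"] = tokens.pop(0)
    if tokens:
        parts["last"] = tokens.pop()
    parts["middle"] = " ".join(tokens)  # whatever remains between first and last
    return parts


grantees = split_people("George Tate [and] John Chew")
neighbor = split_people("Moses Andrew (or near)")
barshear = parse_name("Robert Samuel Barshear Jr.")
```

Entries such as “George Davison must be somewhere near”, where a note follows a name with no delimiter, would defeat rules this simple. That is consistent with the observation above that most of the work required manual intervention.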
REMAINING OBJECTIVES

There are a few data entry modifications Dobbs would like to see implemented, mostly features identified as “nice to have.” Where feasible these will be incorporated in the near future. Additionally, due to the author’s time constraints, the new form temporarily requires extra steps to complete certain entries, such as looking up previously entered unique IDs for people and features. Future adjustments are in the works to address this inefficiency, as well as other tasks that could be automated with the incorporation of additional programming code.

At this time queries for identifying parcels with shared characteristics have not been automated for easy use by Dobbs. This is partly because the data entry process is still proceeding and, as she becomes more familiar with the evidence, her requirements continue to change. Additionally, several fields were added as placeholders in the database for linkage to related fields once the majority of data has been entered, such as a parcel identifier to bind multiple documents (e.g., entry, warrant, survey, deed) to an individual parcel and alias fields for connecting the variant spellings of people and feature names.

A few other goals, not part of the original collaboration objective, include: determining a method for automatically exporting parcel measurements from the database into the application tool that generates the GIS shapefiles, linking images of each parcel’s shape to the database, and adding fields to hold derived information based on content stored in the database (e.g., Julian date conversions, measurement conversions, and fields for holding concatenated data).

COLLABORATION SUMMARY

Since implementing the new multi-table model, Dobbs has successfully entered over 1500 new records. A key component to the successful implementation of this new database model was the open and frequent communication between the author and Dobbs.
Even though at times our field-specific jargon, related to geography or information science, created barriers to understanding each other’s processes and objectives, we both agree that the results met the purpose of our collaboration as outlined at the outset. In fact, we’re looking forward to continuing our partnership to tackle the remaining objectives.

CITED REFERENCES

1. Collet, C.J., A Compleat Map of North Carolina From an Actual Survey. 1770, S. Hooper: London, England.
2. Markham, A.B., Land grants to early settlers in old Orange County, North Carolina: parts of present Orange, Chatham, and Durham counties, period 1743-1810. 1973, A.B. Markham: Durham, NC.
3. Ramsey, R.W., Carolina Cradle: Settlement of the Northwest Carolina Frontier, 1747-1762. 1964, Chapel Hill, NC: University of North Carolina Press.
4. Humber, J.L., Transportation & Settlement [map], 1660-1775, in Atlas of North Carolina. 1967, University of North Carolina Press: Chapel Hill. p. 39-40.
5. Mitchell, T.W., The Granville District and Its Land Records. North Carolina Historical Review, 1993. 70: p. 103-129.
6. Granville Proprietary Land Office: Land Entries, Warrants, and Plats of Survey, in State Agency Records. Secretary of State Record Group. NC State Archives. 1748-1763: Raleigh, NC.
7. Land Office: Land Warrants, Plats of Survey, and Related Records, in State Agency Records. Secretary of State Record Group. NC State Archives. 1679-1959: Raleigh, NC.
8. Granville Proprietary Land Office: Granville Grants of Deed, in State Agency Records. Secretary of State Record Group. NC State Archives. 1748-1763: Raleigh, NC.
9. Land Office: Patent Books (Land Grant Record Books), in State Agency Records. Secretary of State Record Group. NC State Archives. 1693-1959: Raleigh, NC.
10. How to Read 18th Century British-American Writing. 2004, Film Study Center at Harvard University (developer) and the Center for History and New Media at George Mason University (host & maintainer).
11. [TGN] Getty Thesaurus of Geographic Names On Line. 2003, The J. Paul Getty Trust.
12. unknown, Changing Chains, The Virtual Museum of Surveying.
13. Broyles, S., Metes and Bounds Surveys, Direct Line Software.
14. Anson County. Feagly, Peter: Parcel survey. Granville Proprietary Land Office: Land Entries, Warrants, and Plats of Survey, in State Agency Records. Secretary of State Record Group. NC State Archives. 1748-1763: Raleigh, NC.
15. Long, J.H., Atlas of Historical County Boundaries: North Carolina, ed. J.H. Long and G.C. DenBoer. 1998, New York: Charles Scribner's Sons.
16. Powell, W.S., The North Carolina Gazetteer: A Dictionary of Tar Heel Places. 1968, Chapel Hill: University of North Carolina Press.
17. Anson County. Berrey, Thomas: Parcel survey. Granville Proprietary Land Office: Land Entries, Warrants, and Plats of Survey, in State Agency Records. Secretary of State Record Group. NC State Archives. 1748-1763: Raleigh, NC.
18. Anson County. Graham, Richard: Parcel survey. Granville Proprietary Land Office: Land Entries, Warrants, and Plats of Survey, in State Agency Records. Secretary of State Record Group. NC State Archives. 1748-1763: Raleigh, NC.
19. Yadkin-Pee Dee River Basin, Office of Environmental Education, Department of Environment and Natural Resources.
20. Anson County. Blain, George: Parcel survey. Granville Proprietary Land Office: Land Entries, Warrants, and Plats of Survey, in State Agency Records. Secretary of State Record Group. NC State Archives. 1748-1763: Raleigh, NC.
21. ESRI Shapefile Technical Description: An ESRI White Paper. 1998, Environmental Systems Research Institute, Inc.
22. Roman, S., Access Database Design & Programming. 2nd ed. 1999, Sebastopol, CA: O'Reilly. xx, 409.

APPENDIX A: Data Entry Form & Original Database Fields

Figure 14.
Geographer’s Original Data Entry Form. Note: Several fields added by researcher at a later date (e.g., prior occupant, etc.)

Figure 15. Geographer’s Original Database Fields

APPENDIX B: Entity Relationship Diagram

[Entity relationship diagram; only its text labels survive. Entities shown: FEATURES, LAND PARCEL, SURVEY, DOCUMENT/RECORD, and PEOPLE. Relationships shown: a parcel MAY have 0:M features and a feature MUST have 1:X parcels; a survey MUST have 1 parcel and a parcel MAY have 1:M surveys; PEOPLE are associated with DOCUMENT/RECORD through a relationship type (Assignee, Attestor, Grantee, Chain Carrier, Neighbor-adjacent, Neighbor-near, plus additional user specified types). Document types listed: Entry, Warrant, Shuck, Survey, Grant/deed, plus additional user specified types. Formats listed: microfilm, manuscript, book, etc. Institutions listed: State Archives, Library, etc. Measurement units listed: Poles, Chains, or Rods. Feature locator example: feature is in the N, NE, NW, etc. quadrant of parcel; or feature is N, NE, NW of parcel; or parcel straddles a feature (such as a stream).]