Department of Computer Science & Engineering
2009-11

Automated Archeological Survey of Ancient Irrigation Canals

Authors: Joseph Izraelevitz

Abstract: In Ancient Southern Mesopotamia (now Iraq), rainfall was never high enough to sustain agriculture, so for thousands of years people used a system of canals to irrigate their fields. These canals, built up over generations, have long formed the basis of settlement patterns and state relations in the region, making their location and mapping of key interest to archaeologists. However, since 1991, western archaeologists have not been allowed into Iraq, and consequently have been unable to conduct ground surveys. This work focuses on automating the detection of these critical irrigation canals. Owing to a process of silt deposition, the canals gradually raise their banks above the surrounding plain. Using free elevation data collected by NASA in 2000, we apply a combination of image filters and tracing algorithms to detect and highlight the ancient canals, quickly scanning an area for archaeologically relevant details. Our method was generally successful in detecting very large and old canals, but data noise and resolution prevented the detection of smaller features.

Type of Report: MS Thesis

Department of Computer Science & Engineering - Washington University in St. Louis
Campus Box 1045 - St. Louis, MO - 63130 - ph: (314) 935-6160

WASHINGTON UNIVERSITY IN ST. LOUIS
School of Engineering and Applied Science
Department of Computer Science and Engineering

Thesis Examination Committee:
Robert Pless, Chair
Ron Cytron
Tao Ju

Automated Archeological Survey of Ancient Irrigation Canals
by
Joseph Herman Izraelevitz

A thesis presented to the School of Engineering of Washington University in partial fulfillment of the requirements for the degree of
MASTER OF SCIENCE

May 2009
Saint Louis, Missouri

ABSTRACT OF THE THESIS

Automated Archaeological Survey of Ancient Irrigation Canals
by
Joseph Herman Izraelevitz
Master of Science in Computer Science
Washington University in St. Louis, 2009
Research Advisor: Professor Robert Pless

Acknowledgments

I would like to thank the many people who helped not only in the writing of this thesis but also throughout my entire college journey. Consequently, I would like to thank the people who made it possible for me to come to Washington University in St.
Louis, especially Andrea Heugatter and Dean J. Chris Kroeger. Another thanks goes to the cadre and staff at Gateway Battalion, in particular MAJ Dave Owens and LTC Warren Griggs, who taught me all about leadership and confidence. For their help on this project, I would like to thank Professor Carrie Hritz, who inspired this project, Troy Ruths, who helped in the early stages, and my mentor, Professor Robert Pless. I would also like to thank my good friends who kept me happy through the rigors of school, my rabbi, Tzvi Schwartz, and my grandparents, Shep and Faith Ellis. Finally, I would like to thank my two brothers, Jacob and Adam, and my parents, for their love, care, and concern.

Joseph Herman Izraelevitz
Washington University in St. Louis
May 2009

Dedicated to my parents.

Contents

Abstract
Acknowledgments
List of Figures
1 Introduction
    1.1 Problem
    1.2 Approach
    1.3 Results
2 Archaeological Background
    2.1 Mesopotamia
    2.2 Geography
    2.3 Canals
3 Computerized Survey
    3.1 Advantages of Computerized Survey
    3.2 Disadvantages of Computerized Survey
    3.3 Implementation
4 Data
    4.1 SRTM Data
    4.2 Other Sources of Data
    4.3 Ground Truth
        4.3.1 Ground Surveys
        4.3.2 Aerial Surveys
    4.4 Test Sets
5 Failed Approaches
    5.1 Threshold
    5.2 Threshold with Fitted Plane
    5.3 Hough Transform
    5.4 Watershed
6 Final Algorithm
    6.1 Preprocessing
        6.1.1 Failed Approaches
        6.1.2 Fourier Transform
        6.1.3 Tiling
    6.2 Normalized Cross Correlation
    6.3 Tracing Walk
        6.3.1 Tracing Step
        6.3.2 Wrapping the Step
7 Results and Analysis
    7.1 Results
    7.2 Comparison to Ground Truth
    7.3 Analysis
8 Further Application
    8.1 Data
    8.2 Results and Analysis
9 Conclusion
References
Vita

List of Figures

Figure 2.1 Topographic Map of Iraq
Figure 4.1 Ground and Satellite Surveys in Area around Nippur
Figure 4.2 Shatt al-Gharraf Dataset
Figure 4.3 Nippur Dataset
Figure 5.1 Thresholded values on Nippur dataset
Figure 5.2 Blurred and thresholded values on Nippur dataset
Figure 5.3 Thresholded values above a linearly fitted plane in area around Nippur
Figure 5.4 Nippur dataset. Data was blurred, fitted to plane, then thresholded.
Figure 5.5 Hough transform on blurred, plane fitted, and thresholded Nippur dataset.
Figure 5.6 Watershed algorithm on Nippur dataset at blur filters of size 20 and 100 pixels.
Figure 6.1 Fourier transform of Shatt al-Gharraf dataset.
Figure 6.2 Comparison of original and Fourier adjusted data.
Figure 6.3 Normalized cross correlation filter.
Figure 6.4 Filter response at zero and ninety degrees on Nippur dataset
Figure 6.5 Aggregate filter response on Nippur dataset.
Figure 6.6 Oriented filter response.
Figure 6.7 Scaled filter response on small section of the Nippur data set
Figure 6.8 Normalized cross correlation filter response at intersection.
Figure 6.9 Traced Canals of Nippur Dataset.
Figure 7.1 Traced Canals with Filter Response on Shatt al-Gharraf Dataset
Figure 7.2 Comparison of Final Result to Adams Ground Survey
Figure 7.3 Comparison of Final Result to Adams Satellite Survey
Figure 7.4 Comparison of Final Result to SRTM Data
Figure 7.5 Comparison of Final Result subsection to SRTM Data
Figure 8.1 Comparison of Visual Spectrum and Elevation Data near Zaozhuang
Figure 8.2 Canal Grid In Visual Spectrum
Figure 8.3 Comparison of Visual Spectrum and Filter Response near Zaozhuang
Figure 8.4 Final Result near Zaozhuang
Figure 8.5 Comparison of Visual Spectrum and Traced Canals near Zaozhuang

Chapter 1
Introduction

1.1 Problem

Mesopotamia, a region that now stretches across Iraq, Syria, and parts of Jordan, is the home of the earliest civilizations. The region is bordered by two rivers, the Euphrates to the south and the Tigris to the north, and forms a crescent from the Persian Gulf to the Mediterranean. In ancient times Mesopotamia was blessed with extremely good soil and the ancestor grains of both wheat and barley. However, only in the northern sections of Mesopotamia can agriculture be sustained with rain. In the south, where the most fertile soil is, there is not enough rain to grow crops.

The Euphrates is a slow-moving, snow-fed, muddy river. The silt it carries is a major reason for the good soil on its banks. The silt also has another effect. As the river flows, it deposits silt along its bed, gradually raising itself. Periodically, when the river floods, it also deposits silt along its sides, or levees. Together, these two kinds of silt deposition mean that the Euphrates, over time, has raised itself above the surrounding plain. The Tigris, being a much faster river, experiences the opposite effect, and has cut a bed for itself below the plain.

The silt deposition phenomenon of the Euphrates made it ideal for early agriculturalists. By cutting small channels in the levees, they could bring water into their fields without needing any sort of pump. Indeed, until relatively modern times the Tigris was not used for agriculture.
Consequently, Southern Mesopotamia, now the southeastern corner of Iraq, contained some of the most elaborate irrigation works of the ancient world. Canals, dikes, and ditches crisscrossed the open plain, providing water for crops and draining the flat fields.

The social importance of these waterworks went beyond simple irrigation. The canals influenced settlement patterns, state relations, and governmental control. The earliest recorded war in human history was over water rights in Southern Mesopotamia.

Since the location of these canals is important in knowing and understanding the history of this region, several archaeological surveys have been conducted, identifying sites and waterways. Ground survey is especially useful in the area just south of Baghdad, which is a desert and easily traversable in vehicles. Unfortunately, since 1991, Iraq has been closed to western archaeologists. Furthermore, much of the region of historical interest is now covered by agricultural fields, which cannot be easily surveyed by ground.

1.2 Approach

The idea behind this research was to automate the detection of Mesopotamian canals, not only to speed up detection but also to allow the algorithm to be applied to other regions. If successful, such an algorithm would be able to map out these critical archaeological features and provide useful information to historians and archaeologists.

This goal is achievable because the same silt deposition that occurs along the Euphrates also happens to the canals that feed off of it. Consequently, the canals also raise themselves up above the fields. This phenomenon allows the canals to be seen in terrain elevation data collected by satellite (Hritz and Wilkinson). Although its resolution is lower than that of ordinary imagery, elevation data has the advantage of eliminating much of the noise that would otherwise obscure the canals.

To extract the canals from the elevation data, we tried a number of image processing techniques. In the end, a combination of techniques was required to eliminate noise, filter the canals, and then identify the features.

1.3 Results

The final algorithm had some success in identifying canals in Mesopotamia. In particular, it was able to find the large, wide, and old canals fairly easily. These were the canals most easily visible in the elevation data. They also tended to be the canals that had large numbers of archaeological sites next to them, implying they were occupied for long periods and supplied a large number of fields. Smaller canals were lost in the noise of the data or were too small to be identified at the resolution of the data.

Chapter 2
Archaeological Background

2.1 Mesopotamia

Mesopotamia is the oldest civilized region in the world. The first domesticated grains, first cities, and first waterworks are all located in this region. It is referred to both as the “Fertile Crescent” and “the cradle of civilization.” Since 6000 BC, the region has been inhabited by complex civilizations. Consequently, there are layers and layers of archaeological material built up in the alluvial plain. Various societies have come and gone, each leaving behind a rich archaeological deposit.

Evidence of permanent human settlement begins around 6000 BC during the Ubaid period (Adams, 54). From this period until the end of the Uruk period in 3000 BC, Mesopotamia experienced a gradual transformation from pastoral and subsistence techniques to widespread agriculture.

Around 3000 BC the Early Dynastic period began, marking the beginning of true city-states. During this period Mesopotamia experienced intense urbanization, the proliferation of writing, and the rise of kings. The waterworks developed in the late Uruk period became increasingly important for producing the food supply; the earliest recorded war was fought over control of a canal.
The earliest Mesopotamian state was formed when Sargon of Akkad took control of most of southern Mesopotamia around 2300 BC, creating the Akkadian Empire and achieving the first centralized control over the waterworks. After the fall of Akkad, political centralization persisted under the powerful kings of Ur until 2004 BC (Van De Mieroop, 59).

After the fall of Ur and throughout the early second millennium BC, competing city dynasties vied for the Mesopotamian plain. Isin and Larsa, then Babylon, controlled swaths of the region. The territorial states were suddenly ended by Hittite raids from Anatolia, leaving behind a power vacuum and a dark age that lasted until 1500 BC (Van De Mieroop, 80-81, 112-117).

Around 1500 BC Mesopotamia was integrated into a much larger political system incorporating much of the Middle East. The southern Kassite state and northern Assyrian state were much larger than their predecessors, and interacted with polities in the Mediterranean, Egypt, Anatolia, and Persia (Adams, 172-174). These were large nations, yet still regional powers. They would fall apart, along with their contemporaries in Egypt, Anatolia, and Greece, during the sweeping raids of the Sea Peoples around 1100 BC and the instability that followed (Van De Mieroop, 180-182).

Following the collapse of the regional powers, Mesopotamia was progressively acquired by various world powers that used its now extensive irrigation system and good land as a bread basket to feed their empires. The Assyrian, Neo-Babylonian, and Persian Empires would conquer the area and incorporate it into their states. These empires would greatly improve the canal system, creating a grid of small canals that connected the old major waterworks and brought much more land under cultivation (Adams, 188). This period of Mesopotamian-based empires lasted from Assyria’s rise in the 900s BC until Alexander the Great’s defeat of Persia in 331 BC (Van De Mieroop, 279).

The imperial pattern continued as the region was controlled by the Greek Seleucids, Persian Sassanians, and Islamic Umayyads and Abbasids. Under the Sassanians in particular the irrigation works were expanded, creating a dense population that was very susceptible to plague (Adams, 214). However, during Abbasid rule, the canal system went into decline as its maintenance was disregarded, resulting in a gradual but steady decline in arable land and population (Adams, 215).

2.2 Geography

Mesopotamia, meaning “the land between two rivers,” stretches across much of the Middle East. From north to south it ranges from the base of the Taurus Mountains to the Persian Gulf, and from west to east from the Arabian Desert to the Zagros Mountains. The land slopes gently downward from the northwest, between the high ground of the mountains and desert. The two main rivers, the Tigris and Euphrates, flow along the slope into the Persian Gulf.

In ancient times, these rivers and their floods provided rich soil and water for the world’s earliest farmers. In modern times, however, much of this region is manmade desert, the result of intense agriculture and the resulting salinization of the soil. Some areas are still under cultivation, especially the lower regions near the Shatt al-Arab.

Figure 2.1 Topographic Map of Iraq (Sadalmelik)

2.3 Canals

The canals of southern Iraq were necessary to sustain agriculture, since the region does not get enough rainfall. Fortunately, due to silt deposition, the Euphrates runs above the surrounding plain, making irrigation simple. Silt deposited during floods built up the levees on either side of the river, while silt deposition on the river bottom raised the water level. To build a canal, a small cut was made in the levee, allowing water to spill out into the surrounding land.

The same process of levee creation that occurred along the Euphrates happened on the banks of these canals.
Canal levees rise, on average, three to five meters above the plain. They gradually slope away from the watercourse, and provide rich alluvial soil for farming. This slope extends for a minimum of half a kilometer, often more. They are the dominant features in an otherwise flat plain (Adams, 10).

Chapter 3
Computerized Survey

This project was an attempt to automate the time-consuming survey of large areas of land for archaeological features. The canals of Mesopotamia were chosen because they can be identified in elevation data and are relatively important in the study of the region. While aerial survey is an accepted archaeological tool, automating the search has not yet been done.

3.1 Advantages of Computerized Survey

Automating the survey of land has several advantages. Most obviously, it enables an appropriate algorithm to survey very large areas without requiring a trained person. While the algorithm may be rather slow, as the one presented in this paper is, it still eliminates the tedious work. Additionally, once the algorithm is written, it can be tweaked to find similar features in other areas, thus increasing the amount of land that a survey program can be applied to.

Less obviously, automating the survey adjusts for certain biases present in a human search. In particular, the range of data values is often large, much larger than the number of colors available on a monitor. Mapping this range onto a monitor can both blur important boundaries and introduce artifacts through the use of artificially chosen colors.

3.2 Disadvantages of Computerized Survey

There are, however, some disadvantages to automated survey. Writing the algorithm is difficult and may not yield good results. Even if it does, a computerized survey is not authoritative and would have to be validated by a trained person. Finally, the algorithm itself may yield artifacts.
In order to overcome these difficulties, the results of a computerized search for features should be used in conjunction with results obtained through all other means, including aerial surveillance and ground survey if possible.

3.3 Implementation

To program the automated survey, I used MathWorks’ MATLAB. MATLAB was chosen based on its ability to handle large amounts of data and its rich library of image processing tools. Furthermore, its use of matrices as base data types allowed certain operations to be understood more intuitively. While the algorithms described here could have been written in a faster language, such as C++, they could not have been prototyped as easily in such a language.

Chapter 4
Data

Since the canal silt deposition process raises the levees of the watercourses above the surrounding plain, ancient canals can be seen in terrain elevation data. This data can then be used in an automated survey to capture the features.

4.1 SRTM Data

A widely available and free elevation dataset was collected by the Shuttle Radar Topography Mission (SRTM). This mission, undertaken by NASA in February of 2000, used sophisticated radar equipment to measure elevation across a wide swath of the earth. The collection method is roughly analogous to taking a depth measurement over a large grid. The resolution of the grid varies over the surface of the Earth, with certain areas of interest, such as the United States, being covered more completely than others (Farr et al., 3).

The data for Mesopotamia is laid out in a grid, with measurements taken every ninety meters. According to the official paper published with the data, the error in the elevation measurements is at most six meters at a 90% confidence level (Farr et al., 3). Since canal levees rise only three to five meters above the plain, this noise is comparable in size to the features being sought.

SRTM data is available from a number of sources. I used two. The Global Land Cover Facility provided the data for Mesopotamia, while the CGIAR Consortium for Spatial Information provided the data for China.
The sites were chosen based on accessibility and ease of use, not on any innate differences in the data.

4.2 Other Sources of Data

While the SRTM elevation data was the most useful for this project, other sources of data were examined for their possible utility in identifying ancient watercourses.

Visible Spectrum

The most obvious data source would be overhead visible spectrum images. These are available in several free forms, both aerial and satellite, at a range of resolutions. The resolution of these images is fairly high, and identifying long features should not be difficult. However, deciding which features are canals rather than roads or hedges would be nearly impossible. It is difficult even for a human to pick out a canal from these images.

False Spectrum

Less obvious are images in false color that measure the non-visible spectrum. These wavelengths generally correspond to properties of the soil and vegetation. While in theory canals have a different soil than the surrounding plain, in practice the canals are no more visible in these images than in the visible spectrum, and the images tend to be just as noisy.

4.3 Ground Truth

The results obtained from the automated survey need to be validated in some way. In other words, we need some sort of ground truth map of where some canals are in order to test the algorithm. Fortunately, a series of archaeological surveys, both ground and air, were done in the decades prior to the First Gulf War by Robert McCormick Adams, a professor at the University of Chicago. While they only cover a limited area, his maps provide a good way to test and validate the performance of the algorithms developed.

4.3.1 Ground Surveys

Adams did two major ground surveys of Southern Mesopotamia. His first, published in 1965, was titled Land Behind Baghdad and concentrated on a major tributary of the Tigris, the Diyala River, and its surrounding area.
The second, titled Heartland of Cities, was published in 1981 and surveyed the area just south of Baghdad between the Tigris and Euphrates Rivers.

Both provide immensely detailed maps of the archaeological sites and waterworks in the region covered. However, for all their detail, the maps do not indicate how old, wide, or large a canal is, since all watercourses are drawn as uniform lines. As such, it is difficult to tell which canals it would be possible to find in the satellite images, and which are tiny ditches that the SRTM data would miss.

4.3.2 Aerial Surveys

To guide his search on the ground, Adams did detailed aerial surveys of the areas before arriving in Iraq, using data obtained from United States military spy planes. He also did a quick search using primitive visual spectrum satellite images.

The surveys done using the satellite images provide a rough sketch of canals in Iraq. In particular, they show the very large and wide canals. Combining this map with the more detailed ground surveys, we can get a sense of where the major canals are and which ones are less likely to be found.

Figure 4.1 shows the area around the ancient city of Nippur. The top image shows the detailed ground survey, while the bottom image shows the far less detailed satellite survey. As mentioned, the canals found in the satellite image are larger and easier to detect.

Figure 4.1 Ground and Satellite Surveys in Area around Nippur (Adams, 34, insert)

4.4 Test Sets

Memory and speed issues quickly became a problem when dealing with such a large data set. Consequently, I built some smaller test sets that were used to develop and test algorithm prototypes. These sets had the added advantage of being within the Adams survey region, so results could be compared to ground truth. Since these sets are used to display various algorithms and results, they are introduced here.
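
Cutting a test set out of a larger elevation grid amounts to cropping a rectangular block of samples. The Python sketch below shows this on a toy grid. It is only an illustration, not the MATLAB code used in this project: it assumes a row-major list of lists with north at the top and west at the left, and the function name `crop` and the elevation values are hypothetical.

```python
def crop(grid, row_start, row_stop, col_start, col_stop):
    """Return the sub-grid covering rows [row_start, row_stop) and
    columns [col_start, col_stop) as a new list of lists."""
    return [row[col_start:col_stop] for row in grid[row_start:row_stop]]


# Toy 4 x 6 elevation grid in meters (made-up values, higher in the northwest).
elevation = [
    [12, 11, 11, 10, 10, 9],
    [11, 11, 10, 10, 9, 9],
    [11, 10, 10, 9, 9, 8],
    [10, 10, 9, 9, 8, 8],
]

rows, cols = len(elevation), len(elevation[0])

# Northwest quarter: top half of the rows, left half of the columns.
northwest = crop(elevation, 0, rows // 2, 0, cols // 2)
print(northwest)  # [[12, 11, 11], [11, 11, 10]]
```

A quarter-sized block holds one fourth of the samples, which is what makes slower prototype algorithms practical to run repeatedly.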
13 Shatt al-Gharraf Dataset The Shatt al-Gharraf dataset covers the entire Heartland of Cities survey region and the river of Shatt al-Gharraf. This large dataset was used to test some faster algorithms. It ranges from the Tigris River to the north to the Euphrates to the south, and contains several ancient cities. The ancient cities, which show up as sharp hills or tells in the elevation data, are useful for georeferencing and homographies. Figure 4.2 shows the dataset with the area covered by the Adams ground survey outlined. Figure 4.2 Shatt al-Gharraf Dataset Nippur Dataset The Nippur dataset is the northwest quarter of the Shatt al- Gharraf dataset and covers a completely surveyed region. It has three dominant canals, 14 two of which intersect along the western edge. The ancient city of Nippur is the southern boundary, while the Tigris bounds it on the north. Figure 4.3 Nippur Dataset 15 Chapter 5 Failed Approaches Before trying any costly and complicated algorithms, it was useful to try simple and built-in algorithms that had a decent chance of success. While none of the algorithms described here worked particularly well, they did illuminate some of the issues with the data set. Furthermore, had they worked, they would have provided computationally quick solutions to the problem. 5.1 Threshold The most obvious algorithm to try to find height differentiated features is to use a simple threshold above the mean value of the image. This algorithm can be quickly tested, but it has rather discouraging results. 16 Figure 5.1 Thresholded values on Nippur dataset There are two issues of note in this image. Firstly, the data is very noisy, with many small sharp peaks. Also, the top left of the image is much higher than the bottom right, due to the general slope of the plain. In order to identify the canals in this image, we need to fix at least some of these problems. Trying to fix the noise problem, we can blur the image. 
The blur used in Figure 5.2 was a Gaussian blur about twenty pixels wide, but most blurs have the same general effect. Furthermore, the blur does not fix the slope issue.

Figure 5.2 Blurred and thresholded values on Nippur dataset

5.2 Threshold with Fitted Plane

To try to fix the slope problem, we fitted a plane to the data and subtracted it from the image. We then used a threshold to identify canals.

Figure 5.3 Thresholded values above a linearly fitted plane in area around Nippur

We now find good data points across the entire image, but we have neither eliminated the noise nor identified the canals. Adding a preprocess blur removes some noise, but the canals are still not well connected.

Figure 5.4 Nippur dataset. Data was blurred, fitted to plane, then thresholded.

5.3 Hough Transform

To connect the canals, we tried using the Hough Transform, a standard line finding algorithm. Since MATLAB has a built-in implementation, this algorithm can be quickly tested.

Figure 5.5 Hough transform on blurred, plane fitted, and thresholded Nippur dataset.

Unfortunately, we are still limited by the simplicity of our feature algorithm. The simple thresholding creates large blocks of features that confuse the line finding algorithm, as seen in the bottom left and top right. Furthermore, because no single threshold is correct for the whole image, some large features are broken, such as the long east-west canal that bisects the Nippur dataset.

5.4 Watershed

The final simple algorithm to try is the watershed algorithm. This algorithm gradually fills low intensity regions of the image, then marks edges where “puddles” merge. It is often used for ridge finding, and seems especially applicable to elevation data. Once again, MATLAB has a built-in implementation.

Since the watershed algorithm is especially sensitive to noise, we blur the image as a preprocessing step. However, regardless of how much the image is blurred, noise is still a problem.
With a moderate blur (a twenty pixel wide Gaussian filter), the watershed algorithm over-detects severely, while a large blur (a one hundred pixel wide Gaussian filter) blurs out major features without eliminating the over-detection. The watershed algorithm also cannot display how certain a feature is. Figure 5.6 shows the watershed algorithm run at two blur levels. Neither eliminates all noise, and over-blurring eliminates the large horizontal canal in the center of the dataset and the one at the top. Note especially the long, straight, vertical and horizontal features. These are all errors.

Figure 5.6 Watershed algorithm on Nippur dataset at blur filters of size 20 and 100 pixels.

Chapter 6 Final Algorithm

The algorithm described in this chapter is the best working algorithm we could write. It works relatively well, but certainly does not achieve the level of detail seen in the Adams ground surveys. Furthermore, it is computationally intensive, and requires a good bit of time to run. However, it does give a good overview of the watercourse landscape and should be useful in assisting archaeologists of the region.

The algorithm consists of three steps. A preprocess step eliminates some noise and tiles the image. The cross correlation step identifies features, and the tracing step traces the canals.

6.1 Preprocessing

The first step in our final algorithm is a series of preprocessing filters. These filters generally deal with noise issues. Several filters used early in the project were later eliminated as unnecessary, and these are enumerated here.

6.1.1 Failed Approaches

Gaussian Blur

The Gaussian blur filter was the standard blurring algorithm used in this project. It was done by convolving a small Gaussian matrix across the image, with a standard deviation of one third the width of the filter. In general, the blur eliminated some of the small noisy peaks in the data set, but also blurred out thin canals.
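As a rough illustration of the blur just described, the following Python sketch builds a square Gaussian matrix whose standard deviation is one third of its width and convolves it across an image. It is only a sketch under stated assumptions: the image is held as a list of rows, edges are handled by clamping to the nearest pixel, and the names gaussian_kernel and blur are illustrative rather than taken from the original MATLAB code.

```python
import math

def gaussian_kernel(width):
    """Square Gaussian matrix with sigma = width / 3, normalised to sum to 1."""
    sigma = width / 3.0
    center = (width - 1) / 2.0
    kernel = [
        [math.exp(-((r - center) ** 2 + (c - center) ** 2) / (2.0 * sigma ** 2))
         for c in range(width)]
        for r in range(width)
    ]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]

def blur(image, kernel):
    """Convolve the kernel across the image (odd kernel width assumed).

    Border handling is an assumption: out-of-range pixels are clamped
    to the nearest edge pixel.
    """
    height, width = len(image), len(image[0])
    k = len(kernel)
    half = k // 2
    out = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    rr = min(max(r + i - half, 0), height - 1)
                    cc = min(max(c + j - half, 0), width - 1)
                    acc += kernel[i][j] * image[rr][cc]
            out[r][c] = acc
    return out
```

Because the kernel sums to one, a flat plain is left unchanged, while an isolated one-pixel peak is spread over its neighbors. A thin canal is flattened in the same way, which is the side effect noted above.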
The preprocess blur was rendered unnecessary by the cross correlation step of the algorithm, and was thus eliminated.

Resize

Resizing was a useful preprocess step in finding various sizes of canals. The resize also acted as a type of blur, since small peaks were averaged out as the image got smaller. However, the scaled filters step of the algorithm made this preprocess step unnecessary.

Value Correction

Due to the instruments used to collect the elevation data, large areas of water, like lakes or rivers, give extremely low values. To fix these values, we tried a two step process. Since the highest water value was approximately ten percent of the lowest land value, thresholding was the easiest method to identify water regions. The water pixels were then filled in using the mean of their non-water neighbors. Unfortunately, this method introduced additional artifacts that, while less noticeable in the weight image, still affected the end result significantly. As a result, we left the noisy data in to indicate where error values may have occurred.

6.1.2 Fourier Transform

Due to the collection method, the SRTM data has periodic crisscrossing noise. The noise was mostly eliminated using a Fourier transform. First, we took the Fourier transform of the image. In Figure 6.1, the peaks of this periodic noise can be identified radiating from the center. These peaks were set to zero and the inverse Fourier transform was taken. Figure 6.2 shows the results.

Figure 6.1 Fourier transform of Shatt al-Gharraf dataset.

Figure 6.2 Comparison of original and Fourier adjusted data. (USGS)

6.1.3 Tiling

The entire data set to be scanned is approximately 53 MB in a compressed TIFF format. Consequently, it needs to be tiled before any memory intensive computation is run. The data was tiled into windows of about 500 by 500 pixels, and the processed tiles were later concatenated back together. Most of the testing and verification of the algorithm was done on a single tile.
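The tiling step can be sketched in a few lines of Python. This is a minimal illustration, not the original implementation: it assumes non-overlapping tiles and an image held as a list of rows, and the function names split_into_tiles and concatenate_tiles are hypothetical.

```python
def split_into_tiles(image, tile_size=500):
    """Cut the image into tiles of at most tile_size by tile_size pixels.

    Returns a dict mapping each tile's top-left (row, col) offset to the tile.
    Non-overlapping tiles are an assumption of this sketch.
    """
    rows, cols = len(image), len(image[0])
    tiles = {}
    for r0 in range(0, rows, tile_size):
        for c0 in range(0, cols, tile_size):
            tiles[(r0, c0)] = [row[c0:c0 + tile_size]
                               for row in image[r0:r0 + tile_size]]
    return tiles

def concatenate_tiles(tiles, rows, cols):
    """Reassemble processed tiles into one image of the original size."""
    out = [[None] * cols for _ in range(rows)]
    for (r0, c0), tile in tiles.items():
        for dr, tile_row in enumerate(tile):
            out[r0 + dr][c0:c0 + len(tile_row)] = tile_row
    return out
```

Each tile can then be processed independently, so only one window of roughly 500 by 500 pixels needs to be held in memory during the expensive steps.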
6.2 Normalized Cross Correlation

After preprocessing to prepare the data, we used a normalized cross correlation filter to identify the canals. The filter was square, with its height values shaped like a prism, as shown in Figure 6.3. The height of the filter was based on examination of canal heights.

Figure 6.3 Normalized cross correlation filter.

When conducting the pass, at each pixel, the filter was rotated through 180 degrees, with the normalized cross correlation computed at increments of ten degrees. The output of this step was an array of images, each containing the filter response for a given filter orientation. Figure 6.4 shows the filter response for orientations of zero and ninety degrees. Notice how each filter generally finds canals oriented along it.

Figure 6.4 Filter response at zero and ninety degrees on Nippur dataset.

To combine the array of filter responses, we take the maximum value across all orientations at each pixel, recording the orientation at which it occurs. This results in an aggregate weight image and a corresponding vector field of the orientation along the canals.

Figure 6.5 Aggregate filter response on Nippur dataset.

Figure 6.6 Oriented filter response.

The final part of the normalized cross correlation step is to scale the filter. This allows the program to identify many sizes of canals. For every filter size, we computed the same rotated and oriented normalized cross correlation described above. Figure 6.7 shows the scaled responses, along with a comparison to the original SRTM data and the Adams ground survey.

Figure 6.7 Scaled filter response on small section of the Nippur data set. (USGS) (Adams, insert)

6.3 Tracing Walk

While the filter response gives a good representation of the canals, it has some problems. First of all, it is not a binary representation, but rather a display of approximate certainty. It is also poorly connected.
This problem is the result of the filter failing at intersections, where it correlates poorly with the data. This is visible in Figure 6.8, which shows the filter response at several intersections.

Figure 6.8 Normalized cross correlation filter response at intersection.

To connect the canals, we use a tracing algorithm, which follows the orientations obtained during the correlation step. Using these orientations, we can walk along the canals, similar to a classic gradient descent.

6.3.1 Tracing Step

During the tracing walk, the state is stored as a location (l) and a momentum (m) vector. After each step, these variables are updated. The tracing walk is conducted using a small window seven pixels wide, which is the approximate size of a small canal.

The step begins by determining the direction of the trace (D_trace). The direction is found by summing all directions (D) within the window, weighted by the filter response (W). Since the orientations in the cross correlation step are computed in the range of zero to 170 degrees, continuous canals could have discontinuous orientations (e.g. a generally horizontal canal will have orientations varying between values of zero, 170, and 10). Consequently, before summing, we compute the dot product of the momentum and each orientation. If the dot product is negative, the orientation is flipped before being summed.

(6.1)  D_{trace} = \frac{1}{49} \sum_{i,j} W_{ij} \, D_{ij} \, \operatorname{sign}(m \cdot D_{ij})

Here the sum runs over the pixels (i, j) of the window, and 49 is the number of pixels in the seven-by-seven window.

The total direction has a weight, which is then linearly mapped to a step size in pixels. The constants of this mapping are chosen to satisfy a set of conditions. The maximum step size (s_max) is chosen so that the walk is always continuous. This means the maximum step is the width of the window, and is the step size that should result at the maximum filter response (W_max). The minimum step is one pixel, and corresponds to a filter response of 0.05, the value below which we can be very sure no canal exists.
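The direction sum of Equation 6.1 can be illustrated with a short Python sketch. It is a hedged illustration rather than the original code: the window is passed as two lists of rows (filter responses and orientations in degrees), orientations are converted to unit vectors, the sum is divided by the number of pixels in the window (49 for the seven pixel window), and the name trace_direction is hypothetical. Following the text, an orientation is flipped only when its dot product with the momentum is negative.

```python
import math

def trace_direction(weights, orientations_deg, momentum):
    """Weighted mean direction inside the window, as in Equation 6.1.

    weights          -- filter responses W, as a list of rows
    orientations_deg -- orientations D in degrees (0 to 170), same shape
    momentum         -- (x, y) vector of the previous step
    """
    n_pixels = sum(len(row) for row in weights)
    dx = dy = 0.0
    for w_row, o_row in zip(weights, orientations_deg):
        for w, theta in zip(w_row, o_row):
            ux = math.cos(math.radians(theta))
            uy = math.sin(math.radians(theta))
            # Flip orientations that point against the momentum.
            if momentum[0] * ux + momentum[1] * uy < 0:
                ux, uy = -ux, -uy
            dx += w * ux
            dy += w * uy
    return dx / n_pixels, dy / n_pixels

# A generally horizontal canal with orientations 0, 170 and 10 degrees:
# after flipping, the three vectors agree and the sum points along +x.
direction = trace_direction([[1.0, 1.0, 1.0]], [[0, 170, 10]], (1.0, 0.0))
```

Without the flip, the 170 degree orientation would nearly cancel one of its neighbors, and the summed direction would be far weaker than the individual filter responses suggest.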
The step size in pixels (s_trace) varies linearly between the minimum and maximum step:

(6.2)  slope = \frac{s_{max} - 0.5}{W_{max} - 0.05}

(6.3)  intercept = s_{max} - W_{max} \cdot slope

(6.4)  s_{trace} = slope \cdot D_{trace} + intercept

As in a classic gradient walk, we introduced both a momentum term and a random force to the trace. Momentum (m) is the last step vector, while the random force (r) is a small vector in a random direction.

(6.5)  s_{final} = c_1 s_{trace} + c_2 m + c_3 r

However, these additional terms rarely improved the performance of the algorithm, and were eliminated from the final program. The tracing walk continues to move until it hits the edge of the tile or stops moving.

6.3.2 Wrapping the Step

The tracing walk is seeded at the pixel of maximum weight that is still untraced. The walk is started from the seed with two opposing momenta and allowed to run to completion. The tracing walk is run on all scales of the correlation filter images. The sum image of all traces on all scales is the final result.

Figure 6.9 Traced Canals of Nippur Dataset.

Chapter 7 Results and Analysis

The final algorithm gives an approximate map of canals in the region scanned. While the canals generally line up with ground truth, there are still artifacts and incorrect values. In general, the algorithm is much better at finding large, wide canals, and is less consistent when looking for smaller features.

7.1 Results

Figure 7.1 shows the final result for the Shatt al-Gharraf data set. The image took about two hours to create. Note how the image is generally well connected and clearly delineates the canals.

Figure 7.1 Traced Canals with Filter Response on Shatt al-Gharraf Dataset

7.2 Comparison to Ground Truth

Figure 7.2 shows a small section of the Shatt al-Gharraf solution that corresponds to the Adams Heartland of Cities survey. In general, large canals dense with sites are found by the proposed algorithm.
However, the density of the canals on the ground survey is far greater than that of the canals found using the algorithm, so many small canals are missing.

Figure 7.2 Comparison of Final Result to Adams Ground Survey (Adams, insert)

When the result is compared to the satellite surveys done by Adams, as in Figure 7.3, one can again see that this algorithm tends to find large canals. The large canals are generally found, even if not in their entirety.

Figure 7.3 Comparison of Final Result to Adams Satellite Survey (Adams, 34)

Comparing the results to the actual elevation data as in Figure 7.4 shows the same trend of finding large canals. However, even small linear bumps hardly visible in the elevation data are usually found by the tracing, as shown in Figure 7.5. This fact implies that some linear mounds are not canals, that the ground survey missed some canals, or that noisy values in the elevation data created these features.

Figure 7.4 Comparison of Final Result to SRTM Data (USGS)

Figure 7.5 Comparison of Final Result subsection to SRTM Data (USGS)

7.3 Analysis

As the various figures show, the program was moderately successful at finding linear mounds in the elevation data. In particular, the program found large and long linear features, which usually corresponded to wide canals with a high density of archaeological sites.

However, in general, correlation with the ground truth surveys of Adams was not very good. While many of the canals identified by satellite were found, many were not. Furthermore, the algorithm could not come close to the detail of the ground survey, since the resolution of the data did not allow it.

An interesting correlation emerges on a more careful study of the data. Areas where the proposed algorithm fails to find a high density of features, like the center of the western edge of the region, tend to have far fewer sites.
This correlation suggests that the canals in the western region are newer, since they did not experience the soil buildup necessary to significantly raise the levees.

Finally, the algorithm corresponds very well to the elevation data in the area along the Shatt al-Gharraf, an area that, due to increased cultivation, is nearly impossible to survey on the ground.

Chapter 8 Further Application

Automated survey becomes far more useful when it can be applied to additional regions outside of the original dataset. Fortunately, the same process of silt deposition that forms the Mesopotamian canals is also active in China along the Yangtze River. Consequently, we can apply our algorithm there and attempt to find features in this region.

8.1 Data

The data for this area is processed SRTM elevation data found at the CGIAR Consortium for Spatial Information. I chose a small section from a tributary of the Yangtze River just west of Zaozhuang. For ground truth I used a visual spectrum satellite image from Google Maps, which shows the canals fairly clearly since they are still in use.

Figure 8.1 Comparison of Visual Spectrum and Elevation Data near Zaozhuang (TeleAtlas et al.) (Jarvis)

8.2 Results and Analysis

Since the same silt deposition process occurs in China as in Mesopotamia, the canal detection algorithm should work there as well. However, unlike the meandering canals of Mesopotamia, the canals of China tend to fall along a grid pattern, as shown in Figure 8.2. As shown in Figure 8.3, the filter response also follows this general pattern.

Figure 8.2 Canal Grid in Visual Spectrum (TeleAtlas et al.)

Figure 8.3 Comparison of Visual Spectrum and Filter Response near Zaozhuang (TeleAtlas et al.)

While the filter response seems fairly reliable, the tracing algorithm mostly fails. This problem seems to arise from the canal pattern. As noted above, the filter response fails at intersections, especially those with right angles.
The density and regularity of these intersections mean that the filter response tends to be lower on average than in Mesopotamia. Consequently, the tracing algorithm is likely to fall off the canals or stop prematurely. Figure 8.4 shows the final result, while Figure 8.5 shows the result in comparison to the visual spectrum image.

Figure 8.4 Final Result near Zaozhuang

Figure 8.5 Comparison of Visual Spectrum and Traced Canals near Zaozhuang (TeleAtlas et al.)

Chapter 9 Conclusion

The algorithm presented in this thesis performed reasonably well on the original dataset, but needed to be tweaked significantly when moved to another dataset. This adjustment is likely a matter of changing certain constants, not of reworking the entire process. In general, this algorithm would be best suited to assisting archaeologists in finding hard to see features in the dataset. However, other archaeological features might be easier for computers to discern and, in that case, it is certainly conceivable that other algorithms could be written to perform fully automated surveys of regions.

The particular algorithm described here is good at finding long quasi-linear features, a problem generally known as ridge detection. These types of features are present not only in elevation data, but also in certain medical images. Finding ridges in data has been the subject of several recent papers (Lopez et al., Cañero and Radeva), yet this thesis presents a novel approach to the problem.

References

[1] Robert McC. Adams. Heartland of Cities: Surveys of Ancient Settlement and Land Use on the Central Floodplain of the Euphrates. University of Chicago Press, Chicago, 1981.
[2] C. Cañero and P. Radeva. “Vesselness enhancement diffusion,” Pattern Recognition Letters, 24 (2003), pp. 3141–3151.
[3] Tom G. Farr et al. “The Shuttle Radar Topography Mission,” Reviews of Geophysics, 45 (2004). http://www2.jpl.nasa.gov/srtm/SRTM_paper.pdf.
[4] Carrie Hritz and T.J. Wilkinson.
“Using Shuttle Radar Topography to map ancient water channels in Mesopotamia,” Antiquity, 80:308 (2006): 415–424.
[5] A. Lopez, D. Lloret, J. Serrat, and J. Villanueva. “Multilocal creaseness based on the level-set extrinsic curvature,” Computer Vision and Image Understanding, 77(2):111–144, 2000.
[6] A. Jarvis et al. Hole-filled SRTM for the Globe Version 4. 34.480015, 116.544342. CSI-CGIAR, 2008. srtm.csi.cgiar.org/SRTMdataProcessingMethodology.asp.
[7] Sadalmelik. “Iraq Topography,” Wikipedia, 2008. http://en.wikipedia.org/wiki/File:Iraq_Topography.png
[8] TeleAtlas, Digital Globe, TerreMetrics. Google Maps. 34.480015, 116.544342. Google, 2008. maps.google.com.
[9] USGS. Shuttle Radar Topography Mission. 32.558389, 45.335083. Global Land Cover Facility, University of Maryland, College Park, Maryland, 2004. www.landcover.org.
[10] Marc Van De Mieroop. A History of the Ancient Near East ca. 3000-323 BC. Blackwell Publishers Ltd, Oxford, 2004.

Vita

Joseph Herman Izraelevitz

Date of Birth: July 23, 1986
Place of Birth: Boston, Massachusetts
Degrees: B.S. Computer Science with Second Major in History, May 2009
         M.S. Computer Science, May 2009
Professional Societies: Tau Beta Pi
                        Phi Alpha Theta

May 2009

Automated Archaeological Survey, Izraelevitz, M.S. 2009