Modeling Landslide Occurrence and Impacts in a Changing Climate

by

Erin Leidy

B.S., Fordham University (2012)

Submitted to the Engineering Systems Division in partial fulfillment of the requirements for the degree of Master of Science in Technology and Policy at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

September 2014

© Erin Leidy. All rights reserved. The author hereby grants to MIT and the Charles Stark Draper Laboratory permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.

Author: [signature redacted] Engineering Systems Division, August 23, 2014

Certified by: [signature redacted] Cathy Slesnick, Senior Member of the Technical Staff, The Charles Stark Draper Laboratory, Thesis Supervisor

Certified by: [signature redacted] C. Adam Schlosser, Senior Research Scientist, Center for Global Change Science, Thesis Supervisor

Accepted by: [signature redacted] Dava J. Newman, Professor of Aeronautics and Astronautics and Engineering Systems, Director, Technology and Policy Program

Modeling Landslide Occurrence and Impacts in a Changing Climate

by Erin Leidy

Submitted to the Engineering Systems Division on August 23, 2014, in partial fulfillment of the requirements for the degree of Master of Science in Technology and Policy

Abstract

In the coming years and decades, shifts in weather, population, land use, and other human factors are expected to affect the occurrence and severity of landslides. A landslide inventory database from Switzerland is used to perform two types of analysis. The first presents a proof of concept for an analogue method of detecting changes in the frequency of landslide activity under future climate conditions.
Instead of relying on modeled precipitation, it uses composites of atmospheric variables to identify the conditions that are associated with days on which a landslide occurred. The analogues are compared to relevant meteorological variables in MERRA reanalysis data to achieve a success rate of over 50% in matching observed landslide days within 7 days. The second analysis explores the effectiveness of machine learning as a technique to evaluate the likelihood of a slide to create high damage. The algorithm is tuned to accommodate unbalanced data, extraneous variables, and variance in voting to achieve the best predictive success. This method provides an efficient way of calculating vulnerability and identifying the spatial and temporal factors which influence it. The results identify high damage landslides with a success rate upwards of 70%. A machine-learning based model has the potential for use as a policy tool to identify areas of high risk.

Thesis Supervisor: Cathy Slesnick
Title: Senior Member of the Technical Staff, The Charles Stark Draper Laboratory

Thesis Supervisor: C. Adam Schlosser
Title: Senior Research Scientist, Center for Global Change Science

Acknowledgments

Numerous thanks to the many who have provided support, encouragement, and advice throughout the writing of this thesis and my time at MIT. First, to my advisors, Dr. Cathy Slesnick and Dr. Adam Schlosser, and also to Dr. Natalya Markuzon, for their consistent guidance and support throughout this project. Their expertise and advice have been vital to the learning experience that this research has been. I would like to thank my friends and classmates in the Technology and Policy Program at MIT, for consistently being a source of inspiration and support. You are all amazing and have been a highlight of my time in Boston. I will treasure the memories. Final thanks to my family, without whose encouragement I would certainly not be where I am now.
My appreciation of their unwavering support and love is immeasurable.

This thesis was prepared at the Charles Stark Draper Laboratory, Inc., under Project 24254-001, IDS.

Contents

0.1 Introduction 16

1 An Analogue Method to Detecting Landslide Response to Climate Change: Proof of Concept 19
1.1 Background 19
1.2 Methodology 21
1.3 Data 22
1.3.1 Observed Landslides 22
1.3.2 NASA-MERRA 22
1.3.3 Data Processing 24
1.4 Application 24
1.4.1 Creating the Composites 24
1.4.2 Analogue Determination 33
1.4.3 Success Rate 39
1.5 Discussion and Future Work 40

2 Modeling the Damage Incurred by Landslides 43
2.1 Background 43
2.1.1 Vulnerability Analysis 43
2.1.2 Machine Learning Approach 45
2.1.3 Random Forest Algorithm 47
2.2 Methodology 48
2.3 Data 49
2.3.1 The Swiss Flood and Landslide Database 50
2.3.2 NASA SEDAC 51
2.3.3 GDP 52
2.3.4 Weather Data 53
2.3.5 Transportation Data 53
2.3.6 Buildings 55
2.3.7 Land Cover 55
2.4 Application 55
2.4.1 Determining Important Variables 56
2.4.2 Unbalanced Data 66
2.4.3 Voting 76
2.4.4 Separation into Seasons 81
2.5 Results 82
2.6 Discussion and Future Work 84

3 Using Landslide Risk Models in a Policy Context: Best Practice and Recommendations 87
3.1 Rationale for Using a Model of Vulnerability 87
3.2 Potential Uses 88
3.3 Model Application 90

A Damage Modeling in Oregon 93
A.0.1 Data 93
A.0.2 Modeling 103
A.0.3 Results 103

B Variable Importance Results 105
B.0.4 KS-Test results 105
B.0.5 Sensitivity analysis results, removing one variable at a time 110
B.0.6 Sensitivity analysis results, removing all of one type of variable, and adding in one variable individually 116

List of Figures

1-1 Number of landslides in Switzerland per month, period 1979-2012. 23
1-2 Swiss landslides by date, 1979-2012. 23
1-3 Composites of all Swiss DJF slide dates. The colors of a) show the standardized anomaly of 500-hpa geopotential height (Z500) and the arrows show vertical integral atmospheric vapor flux. The colors of b) show total precipitable water (TPW) and the contour lines are 500-hpa vertical pressure velocity (w500). 25
1-4 Composites of all Swiss JJA slide dates. Variables shown are as in Figure 1-3. 26
1-5 Composites of all Swiss DJF slide dates with 2-day period. Variables shown are as in Figure 1-3. 27
1-6 Composites of all Swiss DJF slide dates with 5-day period. Variables shown are as in Figure 1-3. 27
1-7 Composites of all Swiss DJF slide dates with 7-day period. Variables shown are as in Figure 1-3. 28
1-8 Composites of all Swiss DJF slide dates with 10-day period. Variables shown are as in Figure 1-3. 28
1-9 Composites of all Swiss DJF slide dates with 14-day period. Variables shown are as in Figure 1-3. 29
1-10 Composites of all Swiss DJF slide dates with 30-day period. Variables shown are as in Figure 1-3. 29
1-11 Peak Standardized Anomaly of Composite Z500 for Various Time Spans. The trough (negative anomaly, here labeled as min) and ridge (positive anomaly, here labeled as max) are both variables of interest and so are included. 30
1-12 Peak Standardized Anomaly of Composite w500 for Various Time Spans. 30
1-13 Peak Standardized Anomaly of Composite TPW for Various Time Spans. 31
1-14 Composites of all Swiss DJF slides that incurred high damage. Variables shown are as in Figure 1-3. 32
1-15 Composites of all Swiss DJF slides that incurred low damage. Variables shown are as in Figure 1-3. 32
1-16 Spatial Correlation of Z500. 34
1-17 Density Plot of Z500 spatial correlation. 34
1-18 Spatial Correlation of w500. 35
1-19 Density Plot of w500 spatial correlation. 35
1-20 Spatial Correlation of TPW. 36
1-21 Density Plot of TPW spatial correlation. 36
2-1 Depiction of the steps to create a supervised classification model. b) Prediction is the same as testing. (Bird et al. 2009) 46
2-2 Structure of a decision tree. (Safavian 1991) 48
2-3 All Swiss landslides used in modeling. Red dots indicate high damage, yellow are medium damage, and green are low damage. 51
2-4 Count of all slides that were included and excluded, separated by the amount of damage. 52
2-5 Modeled slides, separated by canton/GDP and damage level. 53
2-6 Distribution of land cover for all Swiss slides used in modeling. 56
2-7 Features of the 30 Day Rain Distribution. Red is the high damage landslides, blue is the low damage landslides. 58
2-8 Features of the 4 Day Rain Distribution. Red is the high damage landslides, blue is the low damage landslides. 58
2-9 Features of the Population Density Distribution. Red is the high damage landslides, blue is the low damage landslides. 59
2-10 Features of the Length of Road in 2 km Distribution. Red is the high damage landslides, blue is the low damage landslides. 59
2-11 Correlation of all Anthropogenic Variables. The larger the circle, the higher the absolute value of the correlation. Blue indicates positive correlation, red indicates negative. An X indicates an insignificant correlation. Key is located to the right. 62
2-12 Correlation plot of all Rain Variables; interpretation as in Figure 2-11. 63
2-13 Correlation plot of all Pressure Variables; interpretation as in Figure 2-11. 64
2-14 Correlation plot of all Temperature Variables; interpretation as in Figure 2-11. 65
2-15 Combined rank of variables, from results of sensitivity analysis (red) and KS-test (blue). Rank determined by the test results, lowest number being the most significant variable. 66
2-16 Model accuracy by number of variables, fit with a LOESS (locally weighted scatterplot smoothing) curve. 67
2-17 ROC curve, Undersampled data 70
2-18 ROC curve, Oversampled data 70
2-19 ROC curve, SMOTE 71
2-20 ROC curve, CNN 72
2-21 ROC curve, ENN 73
2-22 ROC curve, Tomek Links 74
2-23 ROC curve, NCR 74
2-24 ROC curve, OSS 75
2-25 Results of balancing training data 76
2-26 Density plot of the voting results, with SMOTE data. The red line shows high damage slides, the green line shows low damage instances. 77
2-27 Density plot of the voting results, with undersampled data. The red line shows high damage slides, the green line shows low damage instances. 78
2-28 Variance of voting results for data that has been balanced with SMOTE 79
2-29 Variance of voting results for data that has been undersampled 80
2-30 Summary of voting techniques 80
2-31 Swiss landslides separated by season. 81
2-32 Summary of season-separated results 82
2-33 Summary of best results on JJA slides. Tuning methods include separating the seasons, reducing the variables, using the mean vote, and balancing the training data. 83
A-1 Location of slides in Oregon, mapped with the highway system. Green points do not contain records of damage, red points do. 94
A-2 Type of damage caused by landslides in the SLIDO records 95
A-3 Length of detours caused by Oregon landslides. 96
A-4 Distribution of values of direct damage for Oregon slides, measured on an intensity scale. 97
A-5 Distribution of total damage for Oregon slides, measured on an intensity scale. 98
A-6 Distribution of dollar amount of Oregon landslide damage, plotted on a logarithmic scale. 99
A-7 Location of slides in Oregon, separated by amount of damage, as determined in Figure A-6. 100
A-8 Distribution of land cover for Oregon slides. 101
A-9 Distribution of population density for Oregon slides. The red line is high damage slides, the blue line is low damage slides. 102
A-10 Distribution of distance to nearest highway for Oregon slides. The red line is high damage slides, the blue line is low damage slides. 102

List of Tables

2.1 Confusion Matrix, Undersampled Data 70
2.2 Confusion Matrix, Oversampled Data 70
2.3 Confusion Matrix, SMOTE 71
2.4 Confusion Matrix, CNN 72
2.5 Confusion Matrix, ENN 73
2.6 Confusion Matrix, Tomek Links 74
2.7 Confusion Matrix, NCR 74
2.8 Confusion Matrix, OSS 75
2.9 Votes produced when data is balanced with SMOTE. 77
2.10 Votes produced when training data is undersampled. 78
B.1 KS-test results for all continuous variables. Includes the test variable and a p-value of significance. 105
B.2 KS-test results for all continuous variables. 111
B.3 Sensitivity Analysis with only one rain variable included. 117
B.4 Sensitivity Analysis with only one min temperature variable included. 118
B.5 Sensitivity Analysis with only one max temperature variable included. 119
B.6 Sensitivity Analysis with only one mean temperature variable included. 120
B.7 Sensitivity Analysis with only one pressure variable included. 121

0.1 Introduction

Shifts in weather extremes are one of the most dangerous expected impacts of climate change, due to their tendency to cause natural disasters. When averaged across the globe, extreme precipitation events have been found to be increasing (Alexander et al. 2006).
As precipitation is the most common trigger of landslides, an increase in extreme precipitation is expected to result in increased landslide frequency (Dale et al. 2001) (Crozier 2010). Measurements of this frequency increase are at an experimental phase because of high degrees of uncertainty in landslide data, slope models, and precipitation estimates from general circulation models (GCMs) (Crozier 2010). A method is used here that bypasses the modeled rain that comes from GCMs, lessening one type of uncertainty. Large-scale atmospheric conditions associated with landslide activity are determined from composites of atmospheric conditions over all days with observed landslides. These atmospheric variables can be more confidently predicted using climate models than modeled precipitation can be (Gao et al. 2014). The method is presented here as a proof of concept; further work will create a numeric estimate of the expected change in landslide frequency under future climate change scenarios.

Many high damage landslides come as a surprise. The land movement occurs at high velocity, leaving little chance for the impacted area to be evacuated. Knowing where landslides are going to occur, and which areas are particularly vulnerable to high damage slides, would help reduce the risk that many people consistently live under. The implications of this research are underscored by the recent landslide in Washington that resulted in 41 casualties, one of the deadliest landslides ever to occur in the United States (Berman 2014). Each year, landslides cause hundreds of deaths globally, a number that could potentially be reduced with a better method of predicting high damage slides (Petley 2008). Climate change, along with other predicted global shifts, such as in population and land use, that will influence landslide frequency and severity, increases the importance of understanding landslide risk.
Quantifying landslide risk has to overcome challenges such as the lack of high quality data, incomplete and uncertain models of landslide processes, and complex formulas for risk measurement (van Westen et al. 2006). What robust risk quantification does occur largely takes place at a site-specific level, for example by governments interested in evaluating their transportation networks or geological engineers looking to evaluate the danger to specific buildings. This costly and inefficient method of measurement is unusable in places that do not have the resources to perform it, or at larger scales where a map of all risk over an area is desired. The accepted formula for landslide risk, initially proposed by Varnes (1984), is:

R = E × H × V

E is the elements at risk, H represents hazard, and V is a measurement of vulnerability. H is expressed as the probability of landslide occurrence. V is the expected degree of loss, on a scale of 0 to 1 (no damage to full destruction). E, the elements at risk, comprises the population, transportation networks, buildings, economic activity, and other features of an environment that could be impacted, measured by the cost of those features. The formula appears simple, but calculating these individual measurements is complicated, which makes the formula difficult to apply over large areas (van Westen et al. 2006).

Machine learning techniques may be able to bypass this complex method of risk quantification. Several studies have successfully used machine learning to measure the susceptibility of slopes to movement, but it has been little used for complete landslide risk detection or for measuring vulnerability (Yao et al. 2008) (Brenning 2005). If refined effectively, machine learning algorithms could be used to determine risk for large regions in an efficient way.
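As a toy illustration of how the terms of the Varnes formula combine, risk can be evaluated cell by cell over a gridded study area. The following sketch uses entirely hypothetical numbers; it shows only the arithmetic of R = E × H × V, not a calibrated model:

```python
# Illustrative sketch of R = E * H * V over a few grid cells.
# All values below are hypothetical.
# H: probability of landslide occurrence in each cell,
# V: expected degree of loss (0 = no damage, 1 = full destruction),
# E: value of the elements at risk in each cell (monetary units).

def landslide_risk(hazard, vulnerability, elements):
    """Expected loss per grid cell, following Varnes (1984)."""
    return [h * v * e for h, v, e in zip(hazard, vulnerability, elements)]

H = [0.02, 0.10, 0.001]              # probability of occurrence
V = [0.5, 0.9, 0.2]                  # expected degree of loss
E = [1_000_000, 250_000, 5_000_000]  # value of exposed elements

risk = landslide_risk(H, V, E)
# Note that a cell with moderate hazard but high exposure can still
# dominate the total risk over the area.
```

The difficulty in practice is not this multiplication but estimating H, V, and E credibly for every cell, which is what motivates the machine learning shortcut discussed above.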
Historical data about the damage caused by many landslides, along with mapped data about anthropogenic and weather features at the time of each slide, can be used to identify patterns in the intensity of landslide damage. This method could help a policy maker or planning official better decide land use and zoning patterns and identify which areas are worth preemptive action to avoid damage.

Historical databases are one of the most valuable resources for studying landslides and estimating their future occurrence. Switzerland is used as a study area because of the availability of a comprehensive database that contains records of landslides spanning 40 years, including a measurement of damage along with other necessary information about the time and place of each event (Hilker et al. 2009). This data is used for two types of analysis: 1) an analogue approach to detecting the occurrence of landslide days, determined by compiling the composite atmospheric conditions of days on which landslides were observed, and 2) a machine learning approach to estimating the damage a landslide will cause, based on anthropogenic and weather conditions at the time of the slide. Following this is a discussion of best practice in the real-world application of these models.

Chapter 1

An Analogue Method to Detecting Landslide Response to Climate Change: Proof of Concept

1.1 Background

The Intergovernmental Panel on Climate Change (IPCC) has reported that the frequency of extreme precipitation events has increased due to anthropogenic climate change (Solomon et al. 2007). Because the primary trigger of landslide activity is precipitation, in many areas landslides are expected to become more frequent as a result. Because of the mechanisms that cause water to accumulate in slopes and affect their stability, an increase in total precipitation and extreme precipitation events will cause higher rates of failure (Crozier 2010).
Several studies have described the mechanisms that will influence slope stability in response to climate change, and though they have come to the same theoretical conclusion about landslide response, high uncertainty plagues attempts at quantifying an increase (Cruden and Varnes 1996) (Crozier 2010) (Collison et al. 2000). A common method of estimation involves applying downscaled GCMs to slope stability models, but doing this requires a high spatial resolution that many climate models cannot reach. Other studies have tried to statistically establish a relationship between landslide occurrence and an increase in local rainfall, though this method can be complicated by natural variability in climate and environmental factors that is difficult to attribute to climate change or to account for in this sort of analysis. High magnitude events, which are likely to cause more damage, may also increase with climate change, because heavier rain events may cause more land to be displaced (Matsuura et al. 2008). Crozier (2010) observes that results from studies linking climate change and landslide frequency are subject to very high margins of error, largely due to factors related to climate models, such as uncertainty in modeled weather predictions and the inability of projections to reach high spatial resolution. Previous studies have found that, in general, GCMs do not reliably reproduce precipitation frequency (Dai 2006). Many models tend to underestimate heavy precipitation while overestimating less severe events. When these general circulation models are used to predict changes in landslide activity, any uncertainty in precipitation is carried through. Coupled with the other biases and uncertainties that landslide measurement is subject to, this makes quantifying the increase in landslides subject to extremely high uncertainty that reduces the robustness and usability of predictions.

Gao et al. (2014) developed an analogue method to reduce the errors produced by climate models in estimating extreme precipitation. Precipitation generally results from the interaction of large-scale atmospheric features, many of which can be simulated more realistically by GCMs than precipitation currently can be. It was found that composites of these atmospheric conditions more faithfully reproduced extreme precipitation than the measurements of precipitation given directly by models (Gao et al. 2014). This technique is applied here to the prediction of landslides in a proof of concept whose end goal is to determine the effect that an increase in precipitation, and climate change more generally, will have on landslide occurrence. The results show that landslide-inducing atmospheric conditions can be identified and reproduce the occurrence of landslide days with reasonable accuracy.

1.2 Methodology

Because landslides are frequently caused by extreme precipitation, many of the same atmospheric conditions are associated with both landslides and extreme precipitation when the analogue approach is applied to landslide prediction. This study aims to improve, over the current best available estimates, predictions of the number of landslide days under an atmosphere altered by climate change. A proof of concept is presented here, later to be applied to climate models to quantify the general increase or decrease in landslide activity with climate change. The approach builds off the work of Gao et al. (2014), which used large-scale atmospheric features to identify days of extreme precipitation. Instead of targeting extreme precipitation to identify days of interest, this study targets landslide days. Dates of observed landslides were gathered from the Swiss Flood and Landslide Damage database, a source with 3366 records of landslides from 1972 to the present. Any days that had more than one slide were only considered once.
The atmospheric variables that were considered as predictors of landslides are 500-hpa geopotential height, 500-hpa vertical pressure velocity, and total precipitable water. The atmospheric conditions of each landslide day were determined, and a composite was created which averages all of those days into one pattern. One way in which landslide events differ from extreme precipitation is that longer-term atmospheric conditions may be highly influential in producing the correct conditions for landslides. Landslides depend on conditioning factors such as soil moisture, which are determined by precipitation patterns over long timescales (Iverson 2000). The use of composites that average conditions over extended time periods was therefore explored. As shown later, the one-day composites proved strongest and created the most distinct patterns, so they were used. From the composites, cutoffs for identifying the atmospheric conditions present on landslide days were established, in the form of the spatial correlation between a day and the composites, and the presence of "hotspots": localized areas of agreement between the composite and an individual day.

1.3 Data

1.3.1 Observed Landslides

The Swiss Flood and Landslide Database was used to gather observed dates of landslide occurrence (Hilker et al. 2009). This database contains 3366 records of landslides that have occurred from 1972 to the present. The records largely originate from news sources. The dates were narrowed from a list of all slide dates based on criteria about certainty and the availability of atmospheric data. Any dates that were indicated as uncertain were removed, and any date which had more than one slide was counted once. Because of the variation in weather patterns among seasons, the data was also separated by season for analysis purposes.
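The date-narrowing steps described above amount to a simple filter and grouping. A minimal standard-library Python sketch, with a hypothetical record layout (the actual database schema differs):

```python
from datetime import date

# Hypothetical records: (event date, date-certainty flag).
records = [
    (date(1999, 2, 21), True),
    (date(1999, 2, 21), True),   # second slide on the same day: counted once
    (date(2005, 8, 22), True),
    (date(2005, 8, 23), False),  # date flagged as uncertain: removed
]

# Keep only certain dates, and collapse multi-slide days to a single entry.
slide_days = sorted({d for d, certain in records if certain})

# Meteorological seasons: DJF, MAM, JJA, SON.
SEASON = {12: "DJF", 1: "DJF", 2: "DJF", 3: "MAM", 4: "MAM", 5: "MAM",
          6: "JJA", 7: "JJA", 8: "JJA", 9: "SON", 10: "SON", 11: "SON"}

by_season = {}
for d in slide_days:
    by_season.setdefault(SEASON[d.month], []).append(d)
# by_season now groups the deduplicated slide days by meteorological season.
```

Using a set keyed on the date implements the "counted once" rule directly, and grouping by meteorological season mirrors the DJF/JJA split used in the analysis.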
The summer months (June, July, and August) have the highest landslide frequency but, as will be shown, the winter months (December, January, and February) have the strongest composite patterns. For this reason, the focus season is winter. Because of the availability of MERRA reanalysis data, only the years 1979 through 2012 were used, even though the dataset begins in 1972. Figures 1-1 and 1-2 show the distribution of landslides in the Swiss Flood and Landslide Database during the period 1979-2012 by the dates and months on which they occurred.

[Figure 1-1: Number of landslides in Switzerland per month, period 1979-2012.]

[Figure 1-2: Swiss landslides by date, 1979-2012.]

1.3.2 NASA-MERRA

The composites of atmospheric conditions were generated from NASA's Modern Era Retrospective-analysis for Research and Applications (MERRA) (Rienecker 2001). MERRA is based on the GEOS-5 atmospheric data assimilation system, which is reanalyzed with NASA's Earth Observing System (EOS) satellite observations using an Incremental Analysis Updates (IAU) procedure to gradually adjust the model to observational data. Its special focus is on conditions relating to the hydrological cycle. The MAI3CPASM and MAI1NXINT products were used; these are MDISC modeling and assimilation history files. MAI3CPASM contains pressure variables on 42 levels in 3-hour increments. Geopotential height, sea level pressure, and vertical pressure velocity were drawn from this file. MAI1NXINT has a single-level resolution and is measured in daily increments. Total precipitable water was sourced from this file. The data is all at a 1.25° resolution.
MERRA data was drawn from those dates for composite generation. The variables drawn from MERRA are those that have been confirmed to be associated with heavy precipitation: 500-hpa geopotential height, 500-hpa vertical pressure velocity, sea level pressure, and total precipitable water. To standardize the resolution with climate models, the data was linearly interpolated to 2.5°x2°. All variables have been converted to a standardized anomaly, defined as the anomaly from the seasonal climatological mean over the 34-year period under consideration, divided by its standard deviation.

1.4 Application

1.4.1 Creating the Composites

Composites are an average of the atmospheric conditions (presented as standardized anomalies) over all the days on which there were one or more landslides. The atmospheric conditions are determined for each day which has landslides, and all of those days are then averaged together to obtain the composite conditions. Several variables were used, all of which have an association with precipitation events: 500-hpa geopotential height (Z500), 500-hpa vertical pressure velocity (w500), sea level pressure (SLP), and total precipitable water (TPW). Sea level pressure is not shown, as it was found to be redundant when also using geopotential height.

Figure 1-3: Composites of all Swiss DJF slide dates. The colors of a) show the standardized anomaly of 500-hpa geopotential height (Z500) and the arrows show vertical integral atmospheric vapor flux. The colors of b) show total precipitable water (TPW) and the contour lines are 500-hpa vertical pressure velocity (w500).

Figure 1-3 shows the composites for DJF slide days in Switzerland. The shading in 1-3.a) is the standardized anomaly of 500-hpa geopotential height (Z500) and features a dipole pattern with a trough above Northern Europe and Scandinavia and a ridge over Algeria in Northern Africa and the Western Mediterranean. This panel also shows vertical integral atmospheric vapor flux, indicated by the arrows. The shading of 1-3.b) shows total precipitable water (TPW); the peak anomaly for this variable is over Switzerland and Southern France. The contour lines in this panel are 500-hpa vertical pressure velocity (w500). The composites that were used for analysis were based on 1-day atmospheric conditions over Switzerland, for all levels of damage and only for the DJF months. The composites for other time periods and seasons are shown in Figures 1-5 through 1-10, and they visually explain why this composite was chosen. Figure 1-4 shows the composite 1-day conditions for landslides during the summer months (June, July, and August); compared to the DJF composites, the pattern is not as strong and the peak standardized anomaly is lower. The variables shown in each figure are the same as in 1-3.a) and b).

Figure 1-4: Composites of all Swiss JJA slide dates. The colors of a) show the standardized anomaly of 500-hpa geopotential height (Z500) and the arrows show vertical integral atmospheric vapor flux. The colors of b) show total precipitable water (TPW) and the contour lines are 500-hpa vertical pressure velocity (w500).

Figures 1-5 through 1-10 show composites over longer time periods. Multi-day composites were considered because landslides can be caused by atmospheric conditions and precipitation that occurred several days beforehand, or can be influenced by average conditions over a longer time period instead of a sudden triggering event.
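The composite construction, both the one-day version and the multi-day variant that averages the conditions preceding each slide, might be sketched as follows (a minimal illustration in which flat lists of grid cells stand in for the MERRA fields):

```python
from datetime import date, timedelta

def standardized_anomaly(field, clim_mean, clim_std):
    """Standardized anomaly per grid cell: (value - seasonal mean) / std dev."""
    return [(v - m) / s for v, m, s in zip(field, clim_mean, clim_std)]

def composite(member_fields):
    """Average a list of standardized-anomaly fields (one per slide day)."""
    n = len(member_fields)
    return [sum(cells) / n for cells in zip(*member_fields)]

def multiday_composite(fields_by_date, slide_dates, n_days):
    """Average the n_days up to and including each slide date, then composite
    those per-event averages over all slide dates."""
    members = []
    for d in slide_dates:
        window = [fields_by_date[d - timedelta(days=k)] for k in range(n_days)]
        members.append(composite(window))
    return composite(members)
```

In the study itself these operations run over two-dimensional latitude-longitude grids; the flat-list form above is just the same arithmetic with the grid unrolled.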
In these cases, the conditions over the several days preceding a landslide are averaged, and then all those averages are combined to create one composite of the average preceding conditions over all landslide target days.

Figure 1-5: Composites of all Swiss DJF slide dates with 2-day period. The colors of a) show the standardized anomaly of 500-hpa geopotential height (Z500) and the arrows show vertical integral atmospheric vapor flux. The colors of b) show total precipitable water (TPW) and the contour lines are 500-hpa vertical pressure velocity (w500).

Figure 1-6: Composites of all Swiss DJF slide dates with 5-day period. The variables shown are the same as in Figure 1-5.

Figure 1-7: Composites of all Swiss DJF slide dates with 7-day period. The variables shown are the same as in Figure 1-5.

Figure 1-8: Composites of all Swiss DJF slide dates with 10-day period. The variables shown are the same as in Figure 1-5.

Figure 1-9: Composites of all Swiss DJF slide dates with 14-day period. The variables shown are the same as in Figure 1-5.

Figure 1-10: Composites of all Swiss DJF slide dates with 30-day period. The variables shown are the same as in Figure 1-5.

Figure 1-11 reinforces the visual determination that the one-day composites are the strongest. It plots the peak maximum and minimum values of the standardized anomaly of the composite Z500, as well as the difference between the ridge (positive values) and trough (negative values), for each of the multi-day composites. Figure 1-12 does the same for w500, and Figure 1-13 for TPW. In all cases except TPW, the strongest peak anomalies are seen in the one-day composite. For this reason, the one-day composite was used instead of any of the multi-day options.

Figure 1-11: Peak standardized anomaly of composite Z500 for various time spans. The trough (negative anomaly, here labeled min) and ridge (positive anomaly, here labeled max) are both variables of interest and so are included.

Figure 1-12: Peak standardized anomaly of composite w500 for various time spans.

Figure 1-13: Peak standardized anomaly of composite TPW for various time spans.

The Swiss Flood and Landslide Damage Database also contains information about how much damage a slide produced. The final consideration in deciding the target days was whether or not to separate the high damage and low damage slides. The analysis that will be shown in Chapter 2, where the features of a landslide that make it likely to produce high damage were studied, showed that precipitation was relevant to the amount of damage a landslide would incur. Separating severe slides from less damaging ones could help create an individual estimate of the future increase in specifically high damage slides. Evidence of a difference in atmospheric conditions between high and low damage slides could be used to infer a change in the frequency of particularly high damage slides as a result of climate change. The composites for these two sets of target dates, Figures 1-14 and 1-15, show a difference in the peak anomalies of atmospheric conditions; however, the high damage landslide dates did not show a consolidated atmospheric pattern. The high damage composite for the Z500 variable shows a dispersed pattern which spans a large area, from Greenland to Scandinavia, over much of the North Atlantic. For comparison, the low damage composite for the same variable shows a pattern focused over Scandinavia and Northwestern Africa. Identifying hotspots, a localized pattern of atmospheric consistency among the days which will be discussed later on, is difficult for such a broad pattern.
There were only 17 high damage landslides in this period, which is likely the cause of the dispersed atmospheric conditions.

Figure 1-14: Composites of all Swiss DJF slides that incurred high damage. The colors of a) are the standardized anomaly of 500-hpa geopotential height (Z500) and the arrows show vertical integral atmospheric vapor flux. The colors of b) show total precipitable water (TPW) and the contour lines are 500-hpa vertical pressure velocity (w500).

Figure 1-15: Composites of all Swiss DJF slides that incurred low damage. The variables shown are the same as in Figure 1-14.

1.4.2 Analogue Determination

Hotspots

Following the method of Gao et al. (2014), the composites shown in Figure 1-3 are used to identify a pattern which can detect the occurrence of landslide events. The first part of detection is the identification of hotspots. Each grid cell from the atmospheric conditions of each day has either a positive or negative value for the standardized anomaly. The composite, as an average of the anomalies, has its own sign for each grid cell. A map is produced which measures the number of the individual days making up the composite that have the same sign as the composite at the same grid cell. From this map, a "hotspot" is identified, defined as a cluster of grid cells which show strong evidence of consistency among many members. The grid cell(s) with the maximum consistent sign count serve as a lower threshold for the smallest "hotspot" that must be matched.
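The sign-consistency map described above might be computed as follows (a minimal sketch over flat lists of grid cells; zero anomalies are counted with the negative sign):

```python
def sign_consistency_map(member_fields, composite_field):
    """For each grid cell, count the composite members whose standardized
    anomaly has the same sign as the composite at that cell."""
    counts = []
    for j, comp_val in enumerate(composite_field):
        same = sum(1 for member in member_fields if (member[j] > 0) == (comp_val > 0))
        counts.append(same)
    return counts
```

Cells where the count approaches the number of members are candidates for hotspot cells; a cluster of such cells forms the hotspot.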
One criterion for identification as a landslide event is that the atmospheric variables consistently match the signs of the hotspot grid cells. This is one feature that will be used to identify the atmospheric conditions for a landslide day.

Spatial Correlation

The second part of detection is a cutoff in the spatial correlation between the composite and an individual day. Spatial anomaly correlation coefficients (SACCs) are calculated between the composite members and the composite, and between the daily MERRA values and the composites. Figures 1-16 through 1-21 show histograms and density plots of spatial correlation, normalized as a percentage of days in the set. In general, as spatial correlation increases, a day has a higher likelihood of being an observed slide day. The statistical difference in spatial correlation between observed landslide days (members of the composite) and the remaining days is analyzed to determine a cutoff in SACC value that makes a day more likely to have a landslide. The cutoff is the spatial correlation value above which a day has a statistically determined higher-than-random chance of being a landslide day. Of all the days in the 34-year time span, 125 are landslide days and 2944 are not.

Figure 1-16: Spatial correlation of Z500.

Figure 1-17: Density plot of Z500 spatial correlation.

Figure 1-18: Spatial correlation of w500.
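A spatial anomaly correlation coefficient between a day's field and the composite can be computed as a Pearson correlation over the grid cells; a minimal sketch (here with mean removal, i.e. a centered anomaly correlation, which is an implementation choice not specified in the text):

```python
import math

def sacc(day_field, composite_field):
    """Pearson correlation between two anomaly fields over their grid cells."""
    n = len(day_field)
    mean_d = sum(day_field) / n
    mean_c = sum(composite_field) / n
    cov = sum((d - mean_d) * (c - mean_c) for d, c in zip(day_field, composite_field))
    var_d = sum((d - mean_d) ** 2 for d in day_field)
    var_c = sum((c - mean_c) ** 2 for c in composite_field)
    return cov / math.sqrt(var_d * var_c)
```

A value near 1 means the day's spatial pattern closely matches the composite; a value near 0 means no resemblance.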
Figure 1-19: Density plot of w500 spatial correlation.

Figure 1-20: Spatial correlation of TPW.

Figure 1-21: Density plot of TPW spatial correlation.

Drawing from this population at random, the probability of drawing a landslide day is 0.0407. If the landslide and non-landslide days had the same distribution across spatial correlation, the proportion of slide to non-slide days would stay consistent with this 0.0407 value for each bin in the histogram of spatial correlation. However, because slide days are more likely to have a high correlation measurement, the proportion of slide days over the total population in each bin increases. The number of landslide days can be treated as a binomial distribution, where p = 0.0407 is the expected probability that a random day is a landslide day. An example of a binomial distribution is a coin toss, where there are two possible outcomes; in this case, the two outcomes are whether or not a day has landslide activity. There is some uncertainty in the stated probability because of the likelihood that landslides go unrecorded in datasets, for instance because they occur in remote areas or have no human or urban impact. The standard deviation of a binomial distribution is sqrt(np(1-p)), where n is the population count and p is the probability of success (that a day is a landslide day):

sqrt(np(1-p)) = sqrt(3069 * 0.0407 * (1 - 0.0407)) = 10.9464

So the number of landslide days has a standard deviation of 10.9464.
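The standard-deviation calculation, together with the cutoff probability that follows from it, can be checked numerically:

```python
import math

n_total = 3069   # all DJF days in the 34-year span
n_slide = 125    # observed landslide days

p = n_slide / n_total                       # ~0.0407
sigma = math.sqrt(n_total * p * (1 - p))    # ~10.95
cutoff = (n_slide + 3 * sigma) / n_total    # ~0.0514
```

Using the exact p = 125/3069 rather than the rounded 0.0407 shifts sigma slightly (10.95 vs. 10.9464) but leaves the cutoff at 0.0514 to three significant figures.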
Using that in the formula for the cutoff probability over a binomial distribution:

P = (N_L + 3*sigma) / (N_L + N_N) = (125 + 3 * 10.9464) / (125 + 2944) = 0.0514

In this formula, N_L is the number of landslide days and N_N is the number of non-landslide days, so N_L + N_N is the total number of days in the subset. This value can be used as a statistically significant minimum from which to begin setting the cutoffs. Above this cutoff probability, the likelihood that a day is a landslide day is higher, to a statistically significant degree, than for a random day in the dataset. The increase in landslide likelihood is due to the increase in spatial correlation: days with a higher spatial correlation have a higher chance of being a landslide day than could be attributed to randomness or chance. The spatial correlation that this probability corresponds to is 0.4 for Z500 and TPW, and 0.3 for w500.

Criterion of Detection

Further following the method of Gao et al. (2014), the hotspots and spatial correlation are combined to create a criterion of detection for the occurrence of landslides. It was determined that many of the observed landslide days share these common features. In order to be identified as a landslide event, a day must meet the following criteria:

1. Three or more variables (ridge (positive anomaly) of Z500, trough (negative anomaly) of Z500, w500, and TPW) must have signs consistent with the hotspot grid cells.

2. At least one of three variables (Z500, w500, TPW) meets the cutoff for spatial correlation.

3. The spatial correlations of the same three variables must all be positive.

If these three conditions are met, the day is tagged as a landslide day. The cutoffs and hotspots previously identified serve as a minimum threshold, and can be refined to stricter values if the criterion identifies too many days in calibration. The criterion of detection begins with the values defined above: the statistically significant minimum for spatial correlation and the smallest number of grid cells that can define a hotspot.
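The three conditions might be expressed as a boolean check (the variable names and data layout here are hypothetical):

```python
def meets_criterion(hotspot_sign_matches, saccs, cutoffs):
    """hotspot_sign_matches: per-variable booleans for the Z500 ridge, Z500
    trough, w500, and TPW sign consistency with the hotspot cells.
    saccs: spatial correlations for Z500, w500, TPW.
    cutoffs: per-variable SACC cutoffs (e.g. 0.4, 0.3, 0.4)."""
    cond1 = sum(hotspot_sign_matches.values()) >= 3      # 3+ of 4 fields match
    cond2 = any(saccs[v] >= cutoffs[v] for v in saccs)   # at least one cutoff met
    cond3 = all(saccs[v] > 0 for v in saccs)             # all correlations positive
    return cond1 and cond2 and cond3
```

During calibration the cutoffs would be tightened until the number of days for which this function returns True roughly equals the number of observed slide days.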
These cutoffs are further refined into stricter values until the number of days that fit the criterion in 34 years of MERRA observations approximately matches the number of observed landslide days.

1.4.3 Success Rate

The criterion of detection was evaluated on 34 years of MERRA data, including all the DJF days between 1979 and 2012. All days are compared against the composite constructed from all observed DJF landslide days to determine hotspot sign consistency and spatial correlation. If a day meets the criterion, it is considered a landslide day. Performance is measured by the success rate in identifying observed landslide dates and by the number of false positives that the criterion detects. The success rate is the fraction of observed landslide days that are matched by criterion-detected days. The false positive rate is the fraction of falsely identified days over total identified days. The success rate reaches 20-25% when exact matches are considered. It improves to 25-29% if the window for event matching is widened so that a date generated by the analogue method may fall within 1 day before an observed day. Widening this window to 2 days brings the success rate to 29-30%, 3 days to 35-36%, and 7 days to 45-50%. The false positive rate for an exact match is 79-81%; for a 1-day window it drops to 70-73%, for 2 days to 67-71%, for 3 days to 63-70%, and for 7 days to 52-56%. The matching criteria were allowed to be relaxed because the triggering weather event for a landslide often occurs several days beforehand. Relaxing the criteria is another way, besides using multi-day composites, to accommodate the longer-timescale conditioning factors which influence landslide occurrence. There is naturally high uncertainty in predicting landslides, particularly when estimating their occurrence from only one aspect of their mechanisms, as is done here with the weather pattern.
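The success and false positive rates with a widening match window might be computed like this (a sketch; the matching convention, a detected day counting if it falls up to `window` days before an observed day, is an assumption based on the description above):

```python
from datetime import date, timedelta

def evaluate(detected, observed, window=0):
    """Success rate: fraction of observed slide days matched by a detected day
    falling 0..window days before them. False positive rate: fraction of
    detected days that match no observed day under the same rule."""
    def matches(d, o):
        return timedelta(0) <= (o - d) <= timedelta(days=window)
    hit_obs = sum(1 for o in observed if any(matches(d, o) for d in detected))
    hit_det = sum(1 for d in detected if any(matches(d, o) for o in observed))
    success = hit_obs / len(observed)
    false_positive = 1 - hit_det / len(detected)
    return success, false_positive
```

With window=0 only exact matches count; increasing the window reproduces the relaxation described in the text.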
The high false positive rate may be partially due to the incompleteness of the data: there may have been landslides that were not observed or recorded. Some of the dates identified during calibration may have had a landslide that occurred in a remote area or did not cause damage.

1.5 Discussion and Future Work

What has been presented is a proof of concept for analogue detection of landslides. Calibration has produced robust rates of success, high enough to trust that the method is valid. The criterion of detection can replicate the number of observed landslide days within a time period of MERRA data, and can reach a better than 50% success rate at identifying the correct day to within a small window. The next step is to use these results with climate models in order to quantify a change in landslide activity. The results will be applied to CMIP5 climate models, which simulate all the atmospheric conditions of interest. The frequency with which the criterion of detection is met in CMIP5 will serve as an estimate of the frequency of landslide activity in the future, and can be measured under various climate change scenarios. This study has shown that the atmospheric conditions present at the time of a landslide are a good means of detecting occurrence and can be used in the future to estimate the change in landslide activity under predicted climate change patterns. This method presents a novel means of correlating landslides with climate change and expected alterations in precipitation occurrence, offering improvements over alternatives by removing the reliance on modeled precipitation. Most methods of correlating landslides with climate change extract a direct measurement of precipitation change from climate models.
Since landslides generally occur under the most severe precipitation events, and since climate models have been shown to estimate extreme precipitation unreliably, this method improves upon climate change-landslide impact measurements driven by modeled rain estimates. One potential source of bias is the non-reporting of slides, a common shortcoming in many landslide datasets. When the conditions appear right for a landslide, a slide may have taken place in an area too remote to be recorded, or the data may be complete and no slide occurred. This analysis also occurs on a large scale - the weather patterns are widespread and the study area is all of Switzerland - which necessarily ignores smaller-scale conditions that influence landslide occurrence, such as slope stability or local weather patterns. The results should be viewed as an estimate of slide frequency rather than a prediction of exactly when a slide will occur. In future work, this method could also be coupled with other changes that are expected to accompany climate change, such as changes in vegetation, temperature, and soil moisture, to more fully estimate the impact that climate change will have on landslide activity. Despite these shortcomings, the analogue method offers an opportunity to improve large-scale predictions of landslide frequency. It provides a more robust way to analyze landslide response to climate change than calculating a percentage increase in slide activity as a direct function of a percentage increase in rain, and it can be applied to any area with reliable weather data and a list of dates on which slides have occurred. The ultimate measurement that can be derived from this method, a quantification of change in landslide frequency, can help direct priorities for landslide mitigation.
An increase in landslide activity would indicate that mitigation should be undertaken, that a monitoring system should perhaps be put in place, and that further emphasis should be placed on directing construction away from unstable slopes. If a nation or other jurisdiction spends a given amount of money on landslide damage and resultant repairs yearly, this could provide a measurement of how much it can expect to spend in the future, so it can budget accordingly. With better data, this method could provide an estimate of the amount of damage landslides will create, by correlating increases in activity with the amount of money spent, or by correlating atmospheric conditions with the damage produced by a slide.

Chapter 2

Modeling the Damage Incurred by Landslides

2.1 Background

2.1.1 Vulnerability Analysis

The equation for landslide risk, as explained in the introduction, is R = E * H * V (Varnes 1984). This part of the study aims to quantitatively evaluate one part of this equation: vulnerability. Switzerland is used as a case study, as it was in Chapter 1 of this thesis. Measuring vulnerability provides a basis for decision-making in response to a potential landslide threat. In fields such as disaster response, urban planning, and transportation planning, knowing the vulnerability of an area is important because, when combined with a hazard measurement (which shows where and when landslides are likely to occur), vulnerability produces a picture of total risk (Alexander 2005). Hazard maps and analyses are common, and the field of measuring hazard is well studied (van Westen et al. 2006). It largely relies on understanding the causal factors - geophysical and weather-related - that go into landslide occurrence and recognizing them over a wide area. Spatial analysis tools such as GIS are used to derive relevant data features of landslide-prone areas (Carrara et al. 1999). Once the causal factors are known, and those features have been identified over the area of interest, a hazard map can be made.
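The risk equation can be illustrated with a toy calculation (the numbers are purely hypothetical):

```python
def landslide_risk(elements_at_risk, hazard, vulnerability):
    """R = E * H * V (Varnes 1984): exposed value times the probability of a
    slide times the fraction of value lost if one occurs."""
    return elements_at_risk * hazard * vulnerability

# Two hypothetical zones with equal exposure and hazard but different vulnerability.
risk_a = landslide_risk(1_000_000, 0.02, 0.5)
risk_b = landslide_risk(1_000_000, 0.02, 0.1)
```

Because the three factors multiply, halving vulnerability halves risk even where hazard and exposure are fixed, which is why quantifying vulnerability matters for the overall risk picture.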
Vulnerability can be evaluated in largely the same way. With an understanding of causal factors and an identification of where the influential features are present, vulnerable areas can be recognized. Literature reviews have noted a decided lack of vulnerability studies within research on landslide risk assessment (Glade 2003), and assert that a lack of information about vulnerability inhibits the determination of risk (Galli and Guzzetti 2007). Techniques for landslide vulnerability studies are largely drawn from those developed for other hazards such as earthquakes or floods. Two general types of approaches can be taken toward evaluating landslide vulnerability. One is a qualitative approach, which creates a list of exposed elements (all buildings or other urban features over an area) and assigns them an empirically evaluated index for vulnerability, generally on a 0-1 scale (none to total destruction likely) (Maquaire et al. 2004). This method is difficult to apply efficiently over large areas because each element is evaluated individually, but it is potentially useful for local or urban governments (van Westen et al. 2006). A quantitative approach to evaluating vulnerability uses models and damage functions to determine the impact of a landslide (Maquaire et al. 2004). This method depends on the ability to acquire detailed data about past occurrences. The approach taken here is quantitative. We aim to identify the causes of landslide vulnerability from an analysis of historical landslide occurrence, which includes consideration of both spatial and temporal factors - anthropogenic features and weather conditions, respectively. This is done by applying data mining and machine learning procedures to the data, techniques which have previously been utilized in studies on hazard analysis (Yao et al. 2004; Brenning 2005). A benefit of this approach is that it combines detailed mapping on a small scale with a wide area of study.
By using part of the data for training and part for testing, the model's success can be validated, a process which is frequently missing from hazard analysis (Chung and Fabbri 2003). Machine learning is efficient, requiring relatively little computation time once models have been properly refined and given all the necessary data (Kotsiantis 2007). It can also easily accommodate changing future conditions, such as shifts in population density or expected climate change.

2.1.2 Machine Learning Approach

Data mining is a field within computer science that is used for "the extraction of implicit, previously unknown, and potentially useful information from data" (Witten and Frank 2005). Machine learning uses algorithms to infer the underlying structure of data and extracts information from data into a usable format. In the application here, it is used to learn about landslide vulnerability, taking data about each historic occurrence and drawing conclusions and predictions about the resultant damage. Machine learning has been successfully applied to a variety of real-world problems, from cancer detection to text identification to business decision-making. It can be broadly separated into two categories: supervised and unsupervised learning (Hastie et al. 2009). Supervised learning works on data where the output labeling is already known. It uses a group of inputs, which may have some level of influence over the output, as predictive variables (Hastie et al. 2009). Unsupervised learning uses data that comes with no output labeling, and its primary purpose is clustering the data into groups with some level of homogeneity. Because the data from the study area here has an output in the form of level of damage, supervised learning is used. A problem can be one of either classification or regression (Hastie et al. 2009). In regression, the output is continuous. In classification, the output is a class labeling.
The problem here is one of supervised classification, because the data has a labeling which consists of two classes, high or low damage. Landslide damage will be classified using a machine learning algorithm based on available information about conditions at the time of each slide. The dataset is made up of N vector samples, which combine to form a matrix X, where x_i represents the ith sample in the set. Each sample x_i is an individual feature vector, containing all the information, or features, of that sample (Hastie et al. 2009). Each feature vector is associated with a class label y_i, within {0, 1} in the case of a binary classification. In this case the options are low or high damage, which can be represented as binary.

Figure 2-1: Depiction of the steps to create a supervised classification model. Prediction is the same as testing. (Bird et al. 2009)

The combination (x_i, y_i) of feature vector and labeling is one sample or instance. The classification algorithm aims to find a function f(X) that accurately reproduces Y. There are two general steps in machine learning: training and testing. The algorithm determines f(X) on a subset of all the data, in a process called training. The function produced by the algorithm is meant to perform well on the training data, though it rarely reaches perfect accuracy. Perfect accuracy on the training data is also undesirable, because the resulting classifier has likely overfit the data (Hastie et al. 2009). For the function to be usable, it must perform well on a separate set of testing data. Test data is a set of feature vectors which the classifier has not seen before, but which are presumably drawn from the same distribution as the training set. In practice, a certain percentage of the entire source data is removed and reserved for testing before any modeling has taken place. Testing error is one measurement of model success. It is a measure of how well the model generalizes.
A successful model will minimize test error (Hastie et al. 2009). In summary, the steps taken in creating a classification model are shown in Figure 2-1. If the test success is unsatisfactory, the model parameters can be tuned at each step of the building process. Preliminary tests to determine which classification algorithm to use were done in Weka (Waikato Environment for Knowledge Analysis), a software package which applies many standard machine learning techniques to data (Witten and Frank 2005). Preliminary results indicated promising success with the random forest algorithm. Before tuning, accuracy on all the data was recorded at close to 90%, though with much higher error in classifying high damage slides. This will be discussed in more detail later on.

2.1.3 Random Forest Algorithm

Random forest, first developed by Breiman (2001), is an algorithm which consists of an ensemble of decision trees that vote amongst themselves to decide upon a classification. It uses bagging (bootstrap aggregating) to average individual models. Bagging is a technique in which each model in the ensemble votes with equal weight - in this case, each decision tree in the forest. Random forests have a reputation for strong performance, even before tuning (Hastie et al. 2009). The general structure of a decision tree classifier is shown in Figure 2-2. It is the basis of the random forest algorithm, and on its own is another supervised classification algorithm. A decision tree sorts data over several levels. Each level has internal nodes, each node representing a feature of the data, and each branch from a node corresponding to a value of that feature (Mitchell 1997). A random forest consists of n trees, each grown using m randomly chosen variables. n is generally a very large number, limited by the fact that very high values may lengthen computation time. The default value for m is the square root of the number of variables in the feature vectors.
Each tree is grown in a subspace of m randomly chosen variables, which means it is trained on only those variables. This introduces variability into the trees (Breiman 2001).

Figure 2-2: Structure of a decision tree. (Safavian 1991)

2.2 Methodology

Machine learning techniques will be used to build a robust model of landslide damage, based on the hypothesis that information about past events can help us predict the damage of future events. A historical database of landslide occurrence and damage, geospatial data, and weather data are used to prepare the feature vectors. The feature vectors contain elements reflecting both spatial risk, in an anthropogenic sense, and temporal risk, in the weather patterns surrounding each slide date taken from the historical inventory. Each slide has been classified as a high or low damage landslide, and the output of the algorithm is a classification of landslides, based on the information in the feature vector, into low damage or high damage. Two classes were decided on because of the organization of the data, as discussed in a later section. The algorithm trains on a subset of 80% of the data and tests on the remaining 20% in order to assess the success that it achieves. The training and test data sets are randomly drawn without replacement from the full set. To validate a model's success, the data is resampled five times, and the algorithm is trained on each subsample ten times. Any measure of accuracy is therefore an average over the fifty runs of the algorithm. Based on the results, the algorithm is refined and tuned.
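The 80/20 split and the five-resample, ten-run validation scheme described above might look as follows in scikit-learn. This is a sketch on placeholder data, not the Swiss feature vectors, and the random forest configuration is an assumption rather than the thesis's exact settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 8))            # placeholder feature vectors
y = (X[:, 0] - X[:, 2] > 0).astype(int)  # placeholder damage labels

scores = []
for resample in range(5):        # draw five different 80/20 train/test splits
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=resample)
    for run in range(10):        # train ten times on each subsample
        clf = RandomForestClassifier(n_estimators=100, random_state=run)
        clf.fit(X_tr, y_tr)
        scores.append(clf.score(X_te, y_te))

# Any reported accuracy is an average over the fifty runs
print(f"mean accuracy over {len(scores)} runs: {np.mean(scores):.3f}")
```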
Some features of the data required refinement of the process for better success, such as the feature vectors having a large number of features which may not be significant, severely unbalanced classes, and unbalanced success between the classes. Total success is presented based on the best refinements of the model.

2.3 Data

Inferences will be drawn about the future of landslide activity based on an examination of past occurrences. Each instance is therefore drawn from a historic dataset that contains thousands of individual slide records. To create the feature vectors which inform the algorithm, each instance is taken from the historic record, and the conditions at the time and place of the slide are taken from other datasets. Each dataset used to create the feature variables is described in detail below. The data come from a variety of collection techniques, from historical record-gathering, to GPS road maps, to remotely sensed observations. Software packages including Quantum GIS, ArcGIS, and Matlab were used to calculate the variables for the feature vectors. Each feature represents either an element that has historically incurred damage from landslides or a potential cause of a landslide's severity. The elements of damage are population density, transportation networks, land cover, GDP, and buildings. The variables reflecting landslide severity mechanisms include gradient, rain, atmospheric pressure, and temperature. There may be some interaction between these two types of features: building a road, for instance, may destabilize a hillside, and the road itself may then be damaged; agricultural activity may likewise be both a cause and an impacted feature of a landslide. Other features, such as the power system and the water supply (which have been impacted in past events) or the amount of deforestation (which can be a cause of activity), are relevant but have not been included here because of data availability. The results presented here are from a case study in Switzerland.
Previous work was done with a dataset from Oregon, but due to issues of data quality, those results are not included here; they are discussed in Appendix A.

2.3.1 The Swiss Flood and Landslide Database

The dataset used to identify landslide days for the analogue analysis of the atmospheric patterns surrounding landslides was also used for modeling the amount of damage. Substantial historic datasets of landslides are difficult to come by and are subject to many issues and biases, such as incompleteness, short time periods, and uncertainty surrounding measurements like dates. The Swiss Flood and Landslide Damage Database contains records of landslides and floods that occurred in Switzerland from 1972-2012 (Hilker et al. 2009). The original source of the data is news reports, and all of the recorded events incurred some type of damage. The records are contained in a large spreadsheet, and each landslide in the dataset has been documented in a standardized way, which includes the amount of damage caused (as a category based on the cost in Swiss Francs), the date, the exact location, any fatalities, the triggering event, and any further information that was available. A level of uncertainty is included with each measurement, and records with uncertain dates or locations were disregarded in this analysis, because exact dates and locations are needed to retrieve weather patterns and local conditions. Figure 2-3 shows the location of all of the slides that were modeled, separated by level of damage. Because of data availability, all the landslides from 1979 on that had a date indicated to be certain were included in this analysis. Each landslide contains a record of how much damage was incurred, measured in Swiss Francs. The damage was binned into three categories: high (over 2 million francs), medium (0.4-2 million francs), and low (0.01-0.4 million francs). The damage is only a record of direct damage (e.g.
damage to a building or a road), not indirect damage or death/injury. Before removing any of the data, there were 3085 low damage slides, 210 medium damage, and 71 high damage. Because of the small number of high damage landslides, high and medium damage slides were combined into one class for evaluation purposes. The final dataset in use contained 218 high damage cases and 2255 low damage instances. Figure 2-4 shows the distribution of slides, separated by damage level and by those which were excluded and included.

Figure 2-3: All Swiss landslides used in modeling. Red dots indicate high damage, yellow are medium damage, and green are low damage.

2.3.2 NASA SEDAC

Information about population density came from NASA's Socioeconomic Data and Applications Center (CIESIN et al. 2005). Gridded maps of population density are available at 5-year intervals for the years 1990-2015 from a dataset called the Gridded Population of the World. The data is on a 2.5 arc-minute grid and is sourced from satellite data. The years 2005-2015 are projected estimates, because the data was gathered before 2005. For each slide, the location was used to identify the population density at that point, computed as a weighted average over the grid cell it lies within and the eight surrounding cells. Each landslide is dated, so the population density used for each slide was also a weighted average from the closest years for which data was available.

Figure 2-4: The count of all slides that were included and excluded, separated by the amount of damage.

2.3.3 GDP

Switzerland's Federal Statistical Office makes available statistics on its GDP by canton (FSO 2013). Cantons are the states of Switzerland; there are 26 of them. The data from 2008 was used, the earliest year for which this information was available digitally.
This information could provide a reason why some landslides cost more than others, or indicate which cantons are best positioned economically to recover from disasters. Some economists have argued that disaster risk increases with GDP per capita (Kellenberg and Mobarak 2007). More developed areas, as indicated by higher GDP, may have higher-value infrastructure and exposed elements, which could be a reason why a particular slide incurs more damage. Others have argued the contrary, that more developed areas are better equipped to mitigate risk, and thus would incur less damage. The exact nature of the relationship between GDP and amount of damage is still contested, but using a machine learning method means that the relationship between each variable and the outcome does not have to be known in advance. Figure 2-5 shows the number of high and low damage slides, separated by canton. The GDP for each canton is also given in Swiss Francs.

Figure 2-5: Modeled slides, separated by canton/GDP and damage level.

2.3.4 Weather Data

The European Climate Assessment and Dataset project maintains daily observational data from meteorological stations around Europe. E-OBS is a daily gridded observational dataset that includes mean temperature, minimum temperature, maximum temperature, sea level pressure, and precipitation (Haylock et al.
2008) (van den Besselaar et al. 2011). The data come from a number of meteorological observation sites across Europe, on a 0.25 degree grid. Each of these variables is included in the feature vector for each slide over a number of date ranges, from the day of the slide back over the previous 365 days.

2.3.5 Transportation Data

A number of datasets were used from ESRI, as provided in ArcGIS base layer data. The original sources were a number of companies that provide mapped transportation data, largely for GIS or GPS uses. The measurements were done with Quantum GIS and ArcGIS. The transportation variables in the feature vectors are:

1. Number of Roads in 0.12 km
2. Number of Roads in 2 km
3. Number of Highways in 0.12 km
4. Number of Highways in 2 km
5. Number of Railroads in 0.12 km
6. Number of Railroads in 2 km
7. Length of Road in 0.12 km
8. Length of Road in 2 km
9. Length of Highway in 0.12 km
10. Length of Highway in 2 km
11. Length of Railroad in 0.12 km
12. Length of Railroad in 2 km
13. Distance to Nearest Road
14. Distance to Nearest Highway
15. Distance to Nearest Railroad

These radii were chosen because they were the mean and maximum landslide runout distances identified from another landslide database in Oregon. Modeling was started on that dataset, but data quality issues meant the results were not as robust as they could have been. However, for consistency in modeling, and since the Swiss data provided no information on runout distance, the radius measurements at 0.12 and 2 km were maintained. Railroads, roads, and highways have all incurred damage in several historic instances, and these measures provide an indication of the amount of human activity occurring in the region.

2.3.6 Buildings

Information about buildings came from OpenStreetMap, an open-source mapping dataset. The data comes from a number of original sources, including surveys, public-access government data, and aerial photography.
The data are available for public download, and building data were obtained from them for all of Switzerland. The number of buildings within 0.12 km and 2 km was calculated, the same radii used for the transportation data. This variable provides a more nuanced look at potential damage to structures than a simple land-use categorization.

2.3.7 Land Cover

The Corine Land Cover database covers all of Europe with a 100-meter land cover grid (Bossard et al. 2000). It is maintained by the European Environment Agency and is available for the years 2000 and 2006 for Switzerland. Each grid cell is identified as one of 51 possible types of land cover. For the feature vectors, each landslide was matched to a grid cell in the closest year's data. Figure 2-6 shows the number of landslides per land cover category.

2.4 Application

Creating the feature vectors was the first step in modeling; once each slide had the same information stored in its vector, a random forest algorithm could be applied to the data. The specifics of the data presented several challenges to address when applying the algorithm, such as unbalanced class populations, variables which may be irrelevant, and unbalanced success between the two classes. Each of these issues has many possible solutions, so this section discusses algorithm application and refinement, examining each issue, the methods used to confront it, and the varying levels of success each achieved.
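As an illustration of the radius-based counts used in the transportation and building features (Sections 2.3.5-2.3.6), the sketch below counts mapped points within 0.12 km and 2 km of a slide location. A haversine great-circle distance stands in for the actual GIS measurements, which were made in Quantum GIS and ArcGIS, and all coordinates here are invented.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def count_within(slide, points, radius_km):
    """Count mapped features (e.g. buildings) within radius_km of a slide location."""
    return sum(1 for p in points
               if haversine_km(slide[0], slide[1], p[0], p[1]) <= radius_km)

# Hypothetical slide location and building coordinates (not real data)
slide = (46.80, 8.23)
buildings = [(46.8005, 8.2305), (46.805, 8.24), (46.95, 8.40)]
print(count_within(slide, buildings, 0.12), count_within(slide, buildings, 2.0))
```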
Figure 2-6: Distribution of land cover for all Swiss slides used in modeling.

2.4.1 Determining Important Variables

There are a total of 165 possibly influential variables, but each tree in a random forest is generated using only a randomly chosen subset of all the features. Because not all of the variables are guaranteed to influence the class outcome, there is a chance that the randomly chosen variables will not actually assist the classifier in producing the correct outputs, and that they will be just noise. If the algorithm uses many irrelevant variables, its performance may be poor. A number of statistical tests were run with the aim of identifying which features differ between classes, and therefore could be determined to be influential to the classification. A ranking of the variables by combined significance from the tests is located after a description of the tests, in a summary.

KS-test

A Kolmogorov-Smirnov test is a statistical test which compares the probability distributions of two samples. It makes no assumptions about the underlying distribution of the data. The null hypothesis is that the two samples are drawn from the same distribution; refuting the null hypothesis concludes that the samples are drawn from two different distributions.
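A two-sample KS comparison of this kind can be run with SciPy's ks_2samp; the snippet below uses synthetic stand-ins for the high and low damage samples of a single feature rather than the thesis data.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(2)
# Synthetic stand-ins for one feature (e.g. 30-day rain) in each damage class
low_damage = rng.normal(loc=50.0, scale=15.0, size=2255)
high_damage = rng.normal(loc=65.0, scale=15.0, size=218)

# ks_2samp measures the maximum distance between the two empirical cdfs
stat, p_value = ks_2samp(low_damage, high_damage)
if p_value < 0.05:
    print(f"distributions differ (ks = {stat:.3f}, p = {p_value:.2g})")
```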
The test computes a distance between the empirical cumulative distribution functions (ecdf) of each sample and quantifies the result as the ks-statistic. The p-value is a measure of significance that allows acceptance or rejection of the null hypothesis. The traditionally accepted cutoff for significance is a p-value below 0.05, or the stricter value of 0.01, at which point there is a strong chance that the null hypothesis can be rejected. In this application, the distribution of the low damage slides was compared with the distribution of the high damage slides for each of the non-categorical variables. The results of the ks-test are shown in a table in Appendix B. Figures 2-7 through 2-10 show the distributions of selected variables which the ks-test deemed to be significantly different between the high and low damage slides.

Sensitivity Analysis

In addition to the ks-test, sensitivity analysis was used to determine how the model responds to the removal of each variable. This method consists of removing features from the training set one at a time and recording the success (Guyon and Elisseeff 2003). If removing one variable causes a significant decrease in success, the variable can be deemed significant: the model needs it to successfully classify the data. If a variable is removed and the model success increases, the variable may be noise or may be confusing to the model, and not helpful for classification. No change also implies that the variable is not helpful. For this analysis, the algorithm is run ten times for validation while removing each variable one at a time, and the success on the training data set is recorded. Over each data removal, the training dataset and the test set remain the same, and the majority class has been undersampled to balance the data.

Figure 2-7: Features of the 30 Day Rain Distribution. Red is the high damage landslides, blue is the low damage landslides.

Figure 2-8: Features of the 4 Day Rain Distribution. Red is the high damage landslides, blue is the low damage landslides.

Figure 2-9: Features of the Population Density Distribution. Red is the high damage landslides, blue is the low damage landslides.

Figure 2-10: Features of the Length of Road in 2 km Distribution. Red is the high damage landslides, blue is the low damage landslides.

Appendix B includes the results of the sensitivity analysis. It lists the mean success rate for the runs with each variable removed, and the results of a t-test that compares the distribution of success between the model with all variables and the model with each variable removed. The p-value of that test is included, indicating whether each variable makes a significant difference in success. Commonly accepted significance levels are 0.05 or 0.01. Again, as with the ks-test, we are most interested in ranking the variables by significance. The table is sorted according to increasing success rate. For reference, the first row is the success with no variables removed. Because of high correlation amongst variables, in several cases the data is also removed in related segments; for example, all the rain variables are removed at once, or all the temperature variables. When these sets of variables are removed, each variable is added back in one at a time to test which gives the largest improvement in success. This is also shown in Appendix B. Sensitivity analysis performed by removing all the rain variables and then including just one rain variable at a time is shown in Table B.3. Minimum temperature is in Table B.4, maximum temperature in Table B.5, mean temperature in Table B.6, and pressure in Table B.7.
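The leave-one-variable-out procedure can be sketched as follows on synthetic data. Note that the thesis records success on the training set; this sketch scores a held-out split purely so that the effect of removing an informative feature is visible.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 6))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # features 0 and 1 carry signal
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

def mean_success(cols, runs=10):
    """Average held-out success over repeated runs, using only the given columns."""
    scores = []
    for r in range(runs):
        clf = RandomForestClassifier(n_estimators=100, random_state=r)
        clf.fit(X_tr[:, cols], y_tr)
        scores.append(clf.score(X_te[:, cols], y_te))
    return float(np.mean(scores))

baseline = mean_success([0, 1, 2, 3, 4, 5])
for drop in range(6):                     # remove each variable one at a time
    kept = [c for c in range(6) if c != drop]
    print(f"without feature {drop}: {mean_success(kept):.3f} "
          f"(baseline {baseline:.3f})")
```

A large drop in success when a feature is removed marks it as influential; no change, or an increase, suggests noise.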
The results from this analysis show that many of the variables are highly influential to the model's success. The weather variables in particular, when all were removed, dropped the model success the most. Weather is a somewhat surprising influence on the amount of damage, but there are several possible explanations. It could be correlated with season; seasons will be separated later on for individual modeling. High amounts of rain may also cause flooding, which increases the damage. Rain could also be a proxy for the amount of land that is displaced. High temperatures could cause snow to melt, increasing the water present in the ground. The most influential anthropogenic variable was the number of roads in 2 km. Roads are one of the most common elements damaged by landslides and, as Switzerland has many areas with mountainous terrain, many roads are built into slopes. More roads in an area increases the likelihood that one or many of them may be hit and damaged.

Pearson Correlation

Many of the variables are highly correlated with each other, particularly the weather variables. Two highly correlated variables may both seem important to classifying the data, but if the correlation is high, one of them may be redundant. Correlation was explored using the Pearson correlation, a measurement of the linear dependence of two variables. The Pearson statistic is calculated by:

ρ(X, Y) = covariance(X, Y) / (σ_X σ_Y)

where σ is the standard deviation of a distribution, X and Y are the two distributions being compared, and ρ is the symbol commonly used to represent correlation. Figure 2-11 displays the correlation for all the anthropogenic variables. Figures 2-12, 2-13, and 2-14 correlate the atmospheric conditions. The axis labels refer to the variables; the numbers represent the days included in the measurement of each variable.
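The statistic above can be computed directly with SciPy. The two series below are synthetic stand-ins constructed so that their windows overlap, as with the real rain variables, and are not the thesis data.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(4)
# Hypothetical weather features: 7-day and 14-day rain totals with a shared window
rain_7 = rng.gamma(shape=2.0, scale=10.0, size=300)
rain_14 = rain_7 + rng.gamma(shape=2.0, scale=10.0, size=300)

# pearsonr returns rho = cov(X, Y) / (sigma_X * sigma_Y) and its p-value
rho, p_value = pearsonr(rain_7, rain_14)
print(f"rho = {rho:.2f}, p = {p_value:.2g}")
```

Overlapping accumulation windows like these are exactly the kind of pair that both tests can flag as significant while one of the two is redundant.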
Any correlation with an X through it signifies that the 95% confidence level was not reached. The larger the circle, the higher the absolute value of the correlation. The highest value that can be reached is one, which occurs when one variable is an exact match to the one it is being compared with. Blue indicates positive correlation; red indicates negative correlation, an inverse relationship where one variable increases while the other decreases.

Variable Significance Summary

In summary, the Pearson correlation, sensitivity analysis, and KS-test were used to narrow the number of variables used in modeling. Each variable was ranked by how it had done on each test, based on the p-value of significance; the lower the ranking, the less significant. Figure 2-15 shows a plot of the combined ranks from the KS-test and sensitivity analysis. The lower the number, the more significant the variable is likely to be. Also considered was the correlation: if two variables were marked as significant in both tests but had high correlation, one was likely to

Figure 2-11: Correlation of all Anthropogenic Variables. The larger the circle, the higher the absolute value of the correlation. Blue indicates positive correlation, red indicates negative. An X indicates an insignificant correlation. Key: (1) Length Hwy in 0.12 km; (2) Length Hwy in 2 km; (3) Length Rd in 0.12 km; (4) Length Rd in 2 km; (5) Length RR in 0.12 km; (6) Length RR in 2 km; (7) No. Hwys in 0.12 km; (8) No. Hwys in 2 km; (9) No. Rds in 0.12 km; (10) No. Rds in 2 km; (11) No. RRs in 0.12 km; (12) No. RRs in 2 km; (13) Distance to Hwy; (14) Distance to Rd; (15) Distance to RR; (16) No. Buildings in 2 km; (17) No. Buildings in 0.12 km; (18) GDP; (19) Population Density.
Figure 2-12: Correlation plot of all Rain Variables. The larger the circle, the higher the absolute value of the correlation. Blue indicates positive correlation, red indicates negative. An X indicates an insignificant correlation.

Figure 2-13: Correlation plot of all Pressure Variables. The larger the circle, the higher the absolute value of the correlation. Blue indicates positive correlation, red indicates negative. An X indicates an insignificant correlation.

Figure 2-14: Correlation plot of all Temperature Variables. The larger the circle, the higher the absolute value of the correlation. Blue indicates positive correlation, red indicates negative. An X indicates an insignificant correlation.
be removed.

Figure 2-15: Combined rank of variables, from the results of the sensitivity analysis (red) and the KS-test (blue). Rank is determined by the test results, with the lowest number being the most significant variable.

Figure 2-16 is a plot of total model success by the number of variables, fit with a regression curve. The success rate peaks near 75 variables, so the top 75 most important variables, as determined from the significance tests, were included in future modeling attempts.

Figure 2-16: Model accuracy by number of variables, fit with a LOESS (locally weighted scatterplot smoothing) curve.

2.4.2 Unbalanced Data

When data is unbalanced, the model tends to favor the more populated class. Machine learning applications such as cancer detection, image analysis, and speech recognition regularly confront this problem, and it has been a well-discussed issue in the data mining literature (Chawla 2005). High damage slides are much rarer than low damage landslides; the data in use contains 2473 instances of landslides, 218 of which are classified as high damage. Several methods were considered to re-sample the training data, including oversampling, random undersampling, SMOTE, Tomek links, Condensed Nearest Neighbor, Edited Nearest Neighbor, Neighborhood Cleaning Rule, and One-Side Selection. Each of these will be explained in detail, and their success considered, in subsequent sections. All of these methods are available in, and were utilized from, the R package "unbalanced" (Del Pozzolo 2014). Unbalanced data also requires careful choice of performance measures, as a measurement of total accuracy may not appropriately represent the cost of different errors.
In this machine learning application, high damage landslides are more important to detect, so a reduction in majority class detection would be accepted in return for a higher success rate in identifying the minority class. Though steps are taken to balance the training data, the test data remains unbalanced, as it would be in a real-life application. For this reason, several performance measures are presented for each modeling technique and refinement: a confusion matrix; measurements of precision, recall, and F-value; and a Receiver Operating Characteristic (ROC) curve with its corresponding measurement of the area under the curve (AUC). An ROC curve measures the trade-off between true positive and false positive rates. The x-axis is the percent false positive, %FP = FP/(TN + FP), and the y-axis is the percent true positive, %TP = TP/(TP + FN). Ideally, a model would classify all positive examples positively, with no negative examples misclassified as positive; on an ROC curve, this is represented by the point (0, 100). The line Y = X, where %TP = %FP, means that the results are no better than random guessing. The AUC can be used to compare relative success between classifiers and data balancing methods. Precision, recall, and F-measure are other measurements used to compare success among methods. Precision is the rate of correctly identified positives over all the samples classified as positive by the model. Recall is the proportion of true positives identified over all the positive slides. Accuracy is the rate of correct classification over both classes.

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

F-measure = ((1 + β²) · precision · recall) / (β² · recall + precision)

Recall and precision are frequently a tradeoff: improving one may hurt the other, since raising the number of true positives may also raise the rate of false positives. Improving recall while not hurting precision is one of the primary goals of balancing the training data.
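Using the counts later reported for the undersampled model (27 true positives, 133 false positives, 16 false negatives; Table 2.1), the three measures can be checked numerically with β = 1:

```python
def precision_recall_f(tp, fp, fn, beta=1.0):
    """Precision, recall, and F-measure from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F-measure in the form used in the text: (1+b^2)*P*R / (b^2*R + P)
    f = (1 + beta**2) * precision * recall / (beta**2 * recall + precision)
    return precision, recall, f

# Counts from the undersampled confusion matrix (Table 2.1)
p, r, f = precision_recall_f(tp=27, fp=133, fn=16)
print(f"precision={p:.3f} recall={r:.3f} f={f:.3f}")
# precision 27/160 = 0.169 and recall 27/43 = 0.628 match Table 2.1
```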
The F-measure combines precision and recall into one measurement of accuracy. Prior to balancing, 20% of the data is set aside as a test set; it remains unbalanced. The remaining 80% of the data is the training set, and the various methods, each described in more detail below, are used to balance it. To reduce uncertainty and random variation in each method, the data is re-balanced five times, and on each of those five rebalanced samples the algorithm is applied ten times. The results are cumulative over the 50 applications of the algorithm. A summary of the success of all the methods follows the individual discussions of each.

Random Undersampling

Undersampling is one of the most straightforward and commonly used methods for accommodating unbalanced data. It takes a random subset of the majority class equal in size to the minority class. This method risks discarding potentially important samples from the majority class, particularly when the minority class is significantly smaller (Liu et al. 2009). The results of this method are shown in Figure 2-17 and Table 2.1.

Random Oversampling

Oversampling replicates instances from the minority class at random until the two classes are equal in size. The potential issue with this method is that it may overfit, because each replicated sample becomes unfairly weighted (Batista et al. 2004).
The results of this method are shown in Figure 2-18 and Table 2.2.

Figure 2-17: ROC curve, undersampled data.

Table 2.1: Confusion Matrix, Undersampled Data
                 Predicted High   Predicted Low   Accuracy
Actual High            27               16          0.628
Actual Low            133              318          0.705
Accuracy            0.169            0.952          0.698

Figure 2-18: ROC curve, oversampled data.

Table 2.2: Confusion Matrix, Oversampled Data
                 Predicted High   Predicted Low   Accuracy
Actual High             7               36          0.163
Actual Low             17              434          0.962
Accuracy            0.292            0.923          0.893

Figure 2-19: ROC curve, SMOTE.

Table 2.3: Confusion Matrix, SMOTE
                 Predicted High   Predicted Low   Accuracy
Actual High            14               29          0.326
Actual Low             43              408          0.905
Accuracy            0.246            0.934          0.854

SMOTE

SMOTE (Synthetic Minority Oversampling Technique) is a method which combines random undersampling with an oversampling technique that creates synthetic minority class instances. In this application, the method doubles the number of minority class samples with synthetic instances, and the majority class is undersampled to double the number of synthetic samples. The creators of this method found that it offered improvements over oversampling with replacement because of a lower likelihood for an algorithm to overfit the minority class data. New samples are created by identifying a sample's nearest neighbor, taking the difference between the two feature vectors, multiplying that difference by a random number between 0 and 1, and adding it to the sample (Chawla 2002). The decision region is then forced to be less specific than under random undersampling. The results of this method are shown in Figure 2-19 and Table 2.3.

Condensed Nearest Neighbor

The following several methods differ from the first three in that they aim to preprocess the dataset by removing samples which may interfere with the creation of a clear boundary, using some application of a sample's nearest neighbor.
This is a more focused way of balancing a dataset, but in this application it has generally fallen short of making a significant difference in the data or in the success; the data largely remains very unbalanced. The first of these methods, the Condensed Nearest Neighbor rule, identifies the samples that cannot be classified using a one-nearest-neighbor rule (their nearest neighbor is from a different class) and removes them (Batista et al. 2004). The results of this method are shown in Figure 2-20 and Table 2.4.

[Figure 2-20: ROC curve, CNN]

Table 2.4: Confusion Matrix, CNN
               Predicted High   Predicted Low   Accuracy
Actual High    8                35              0.186
Actual Low     8                443             0.982
Accuracy       0.500            0.927           0.913

Edited Nearest Neighbor

Edited Nearest Neighbor is similar to Condensed Nearest Neighbor in that it looks for majority points which differ from their neighbors, but it changes the criterion from one nearest neighbor to removing all those points which differ from at least two of their three nearest neighbors (Batista et al. 2004). The results of this method are shown in Figure 2-21 and Table 2.5.

[Figure 2-21: ROC curve, ENN]

Table 2.5: Confusion Matrix, ENN
               Predicted High   Predicted Low   Accuracy
Actual High    8                35              0.186
Actual Low     3                448             0.993
Accuracy       0.727            0.928           0.923

Tomek Links

A pair of points is considered a Tomek link when the pair is of different classes, but no other point in the dataset is closer to either one. When used as an undersampling method, only the majority class samples are removed (Kotsiantis 2006). The results of this method are shown in Figure 2-22 and Table 2.6.

Neighborhood Cleaning Rule

The Neighborhood Cleaning Rule acts similarly to Edited Nearest Neighbor in that it removes majority samples whose three nearest neighbors are from the minority class.
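Tomek links are simple to compute directly from pairwise distances; a small sketch follows (the convention that the low damage majority class is labeled 0 is an assumption of this illustration):

```python
import numpy as np

def tomek_majority_mask(X, y, majority_label=0):
    """Return a boolean mask of majority points to drop: members of a
    mutual-nearest-neighbor pair whose partner is from the other class."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)
    drop = np.zeros(len(X), dtype=bool)
    for i, j in enumerate(nn):
        # Tomek link: i and j are each other's nearest neighbors and classes differ.
        if nn[j] == i and y[i] != y[j] and y[i] == majority_label:
            drop[i] = True
    return drop
```

The dense distance matrix makes this O(n^2) in memory, which is acceptable for a dataset of a few thousand slides but would need a tree-based neighbor search at larger scale.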
It also removes the nearest neighbors of minority class samples which fall into the majority class (Batista et al. 2004). The results of this method are shown in Figure 2-23 and Table 2.7.

One-Sided Selection

One-sided selection combines two methods that were previously applied individually: it first processes the data with the Tomek link method, then follows this with the Neighborhood Cleaning Rule (Batista et al. 2004). The results of this method are shown in Figure 2-24 and Table 2.8.

[Figure 2-22: ROC curve, Tomek Links]

Table 2.6: Confusion Matrix, Tomek Links
               Predicted High   Predicted Low   Accuracy
Actual High    9                34              0.209
Actual Low     6                445             0.987
Accuracy       0.600            0.929           0.919

[Figure 2-23: ROC curve, NCR]

Table 2.7: Confusion Matrix, NCR
               Predicted High   Predicted Low   Accuracy
Actual High    11               32              0.256
Actual Low     6                445             0.987
Accuracy       0.647            0.933           0.923

[Figure 2-24: ROC curve, OSS]

Table 2.8: Confusion Matrix, OSS
               Predicted High   Predicted Low   Accuracy
Actual High    9                34              0.209
Actual Low     10               441             0.978
Accuracy       0.474            0.928           0.911

Summary

The summary values may not match the values presented in the confusion matrices above, because they are an average over many runs of the algorithm instead of one sample of the model. Each measurement is made on the same test set. Figure 2-25 is a summary of the results. Accuracy is high for many of the methods, but this number is misleading because of how poorly most of the methods perform in identifying high damage slides, as seen in the measurement of recall. The purpose of these balancing methods was to raise the success of classifying high damage slides, so precision, recall, and AUC should be considered before total accuracy. The AUC measurement tells a different story than the accuracy: some of the highest accuracy rates have the lowest AUC measurements.
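For reference, the per-class and overall figures in the confusion matrices above follow directly from the four cell counts; a small sketch (the function name is assumed):

```python
def scores(tp, fn, fp, tn):
    """Accuracy, precision, recall, and F-measure for the high damage class,
    given the four cells of a 2x2 confusion matrix."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0   # column accuracy for 'High'
    recall = tp / (tp + fn) if tp + fn else 0.0      # row accuracy for 'High'
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f
```

Applied to the Tomek Links matrix in Table 2.6 (tp=9, fn=34, fp=6, tn=445), this reproduces the reported 0.600 precision, 0.209 recall, and 0.919 accuracy.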
Moving forward, Undersampling and SMOTE are both retained as methods of balancing the training data. Undersampling produces the highest recall, meaning it classifies the highest proportion of high damage slides as high. It also has the third highest AUC score. SMOTE is also retained because of its high AUC and high recall. Any further refinement and runs of the algorithm are performed with both SMOTE and undersampling of the training data.

Figure 2-25: Results of balancing training data
Method         High Damage       Low Damage        Accuracy  AUC     Precision  Recall  F-measure
               Training Samples  Training Samples
No Balancing   174               1804              0.911     0.601   0.447      0.095   0.157
Oversampling   1804              1804              0.898     0.681   0.324      0.155   0.210
CNN            174               1800              0.911     0.600   0.438      0.093   0.154
ENN            174               1800              0.911     0.600   0.439      0.094   0.155
NCR            174               1507              0.908     0.601   0.385      0.100   0.159
OSS            174               1675              0.909     0.604   0.414      0.119   0.184
SMOTE          522               696               0.833     0.635   0.180      0.261   0.213
Tomek Links    174               1673              0.909     0.603   0.417      0.120   0.186
Undersampling  174               174               0.686     0.620   0.133      0.473   0.208

2.4.3 Voting

Mean Votes and Number of Votes

When a random forest function votes on each sample, it produces a vote between 0 and 1. This is a measurement of the percentage of trees which vote for classifying the sample in the high damage category. By default, each vote above 0.5 is classified as high damage, and each vote below that value is classified as low damage. The closer the vote is to the extremes of 0 and 1, the more certain the model is of its decision. The model's success is best for those samples it votes on at either end of the spectrum, as can be seen in Tables 2.9 and 2.10. Figure 2-26 is the density plot of mean votes produced by balancing the data using SMOTE, and Figure 2-27 is a density plot of the mean votes produced by undersampling the training data.
Under different runs of the algorithm, the same sample can receive multiple different votes, which explains why the total number of samples voted on is larger than the size of the test set; it also offers another way to potentially improve classifier performance. The results were gathered over many runs of the same algorithm. In an attempt to improve performance, the final classification was considered using both the mean vote over all runs and the number of votes for the high versus the low classification. Figures 2-28 and 2-29 show the variance of votes for all the samples.

Table 2.9: Votes produced when data is balanced with SMOTE
Vote     High Damage Samples  Low Damage Samples  Accuracy
1.0-0.9  176                  65                  0.7302905
0.9-0.8  127                  149                 0.4601449
0.8-0.7  107                  324                 0.2482599
0.7-0.6  131                  441                 0.2290210
0.6-0.5  164                  767                 0.1761547
0.5-0.4  323                  1873                0.8529144
0.4-0.3  348                  3918                0.9184248
0.3-0.2  513                  6743                0.9292999
0.2-0.1  230                  6044                0.9633408
0.1-0.0  31                   2226                0.9862650

[Figure 2-26: Density plot of the voting results, with SMOTE data. The red line shows high damage slides, the green line shows low damage instances.]

Table 2.10: Votes produced when training data is undersampled
Vote     High Damage Samples  Low Damage Samples  Accuracy
1.0-0.9  388                  378                 0.4934726
0.9-0.8  491                  154                 0.2387597
0.8-0.7  803                  142                 0.1502646
0.7-0.6  1480                 273                 0.1557330
0.6-0.5  3582                 365                 0.0924753
0.5-0.4  409                  5908                0.9352541
0.4-0.3  258                  5492                0.9551304
0.3-0.2  126                  2868                0.9579158
0.2-0.1  25                   1303                0.9811747
0.1-0.0  20                   235                 0.9215686

[Figure 2-27: Density plot of the voting results, with undersampled data. The red line shows high damage slides, the green line shows low damage instances.]

[Figure 2-28: Variance of voting results for data that has been balanced with SMOTE.]
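The aggregation of per-run vote fractions into a final classification can be sketched as below. This is a simplified illustration, not the thesis's code: NaN marks runs in which a sample did not land in the test set, and the names are assumptions.

```python
import numpy as np

def classify_by_mean_vote(vote_runs, threshold=0.5):
    """vote_runs: (n_runs, n_samples) array of per-run vote fractions,
    NaN where a sample was absent from that run's test set.
    Returns the mean vote per sample and the resulting 0/1 label."""
    mean_vote = np.nanmean(vote_runs, axis=0)
    return mean_vote, (mean_vote > threshold).astype(int)
```

The "number of votes" variant would instead compare `(vote_runs > threshold).sum(axis=0)` against the number of runs in which each sample appeared.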
The results were relatively stable, because the variance was very low in the majority of cases. A classification was determined based on the results of many runs of random forest: the same algorithm is run enough times that all of the instances make it into the test set, which creates a distribution of votes for each instance. The results below summarize each method. In both cases, recall was improved by using the mean vote or the number of votes; however, the more recall improved, the lower precision became. The F-measure was improved in both cases, but more so when using the mean vote.

[Figure 2-29: Variance of voting results for data that has been undersampled.]

Figure 2-30: Summary of voting techniques
Method                         Accuracy  Precision  Recall  F-measure
Undersampled, Vote             0.686     0.133      0.473   0.208
Undersampled, Mean Vote        0.719     0.176      0.596   0.272
Undersampled, Number of Votes  0.685     0.161      0.610   0.254
SMOTE, Vote                    0.833     0.180      0.261   0.213
SMOTE, Mean Vote               0.866     0.288      0.353   0.318
SMOTE, Number of Votes         0.853     0.266      0.381   0.313

[Figure 2-31: Swiss landslides separated by season. The legend distinguishes small and large winter and summer slide locations.]

2.4.4 Separation into Seasons

Many of the rain variables proved influential to the success of the model, as shown in the sensitivity testing. Rain patterns vary significantly between seasons, as also shown in Chapter 1, where using the analogue method to examine atmospheric conditions revealed a marked difference in the extremity and extent of the atmospheric variables. A KS-test between the winter and summer slides confirmed the difference in weather conditions between slides from each season. For these reasons, modeling was attempted on each of the seasons separately.
The most effective data refinements, as discussed above, were also followed: the variables were narrowed to only those deemed significant, SMOTE and random undersampling were both considered for balancing the data, and the mean vote over many models was used. Figure 2-31 shows a map of the winter and summer landslide locations. Figure 2-32 shows the results when the seasons are modeled individually. Modeling the summer landslides in particular had a high success rate. Some of the other seasons may not have had enough high damage slides to produce a successful model; the summer has almost double the number of slides of any other season. The primary variable that made this season different from the rest was the amount of rain.

Figure 2-32: Summary of season-separated results
Season  Instances  Method        Accuracy  AUC     Precision  Recall  F-measure
DJF     345        SMOTE         0.8126    0.5474  0.1039     0.2080  0.1386
DJF     345        Undersampled  0.5699    0.5557  0.0918     0.5550  0.1576
MAM     661        SMOTE         0.8097    0.5804  0.1284     0.2186  0.1618
MAM     661        Undersampled  0.5718    0.5705  0.0997     0.5105  0.1668
JJA     1141       SMOTE         0.8958    0.7276  0.3143     0.4683  0.3762
JJA     1141       Undersampled  0.7403    0.7912  0.2108     0.6591  0.3195
SON     326        SMOTE         0.8522    0.6974  0.2323     0.3870  0.2903
SON     326        Undersampled  0.6910    0.7074  0.1456     0.6070  0.2349

2.5 Results

Without tuning, the model had a base accuracy of 91.1%, but an AUC of 0.601, precision of 0.447, recall of 0.095, and F-measure of 0.157. Without any refinement it performed very well at identifying low damage slides, but this is primarily because most of the samples were low damage samples. The high value for precision shows that, of the slides it classified as high damage, a high fraction were actually high damage, but the very low value for recall shows how few high damage slides were identified correctly by the algorithm. Tuning was able to significantly improve results, particularly in the model's ability to recognize high damage slides.
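The KS-test behind the seasonal split reduces to the maximum gap between two empirical CDFs; a minimal sketch follows (in practice `scipy.stats.ks_2samp` would also supply the p-value):

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical
    distance between the empirical CDFs of samples a and b."""
    a, b = np.sort(np.asarray(a)), np.sort(np.asarray(b))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return np.abs(cdf_a - cdf_b).max()
```

A statistic near 0 means the two seasonal samples (for example, daily rain on winter versus summer slide days) are drawn from similar distributions; a statistic near 1 means they barely overlap.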
See Figure 2-33 for these results. The best results were reached when the number of variables was reduced, based on a combination of KS-test results, sensitivity analysis, and Pearson correlation, to remove variables which were mostly noise. The data underwent a number of tests to accommodate the class imbalance, which included balancing the training data using SMOTE, undersampling, oversampling, condensed nearest neighbors, edited nearest neighbors, the neighborhood cleaning rule, one-sided selection, and Tomek links.

Figure 2-33: Summary of best results on JJA slides. Tuning methods include separating the seasons, reducing the variables, using the mean vote, and balancing the training data.
Method        Accuracy  AUC     Precision  Recall  F-measure
SMOTE         0.8958    0.7276  0.3143     0.4683  0.3762
Undersampled  0.7403    0.7912  0.2108     0.6591  0.3195

The most successful approaches to balancing the training data were SMOTE and undersampling, because they were able to properly emphasize the minority class without causing the model to overfit, as determined from measurements of recall and the area under the ROC curve (AUC). For stability over many random forest classifications, the mean vote and the number of votes for each class were considered as final classification techniques. The mean vote improved classification success more significantly than did the number of votes. Separating the seasons improved success for the most populated season, but not for the others. The best success is achieved with just the JJA slides, reducing the number of variables to 75, using the mean vote, and either undersampling the majority class or populating the minority class with SMOTE. With all of this, the success reaches the levels shown in Figure 2-33. The two methods have different strengths. Recall is higher when using undersampled data, which means that it correctly identifies more high damage slides, though at the cost of more false positives.
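One piece of the variable reduction, dropping near-duplicate features by Pearson correlation, can be sketched as a greedy filter. This is an illustration with an assumed threshold, not the thesis's exact procedure:

```python
import numpy as np

def drop_correlated(X, names, threshold=0.95):
    """Greedily keep a feature only if its absolute Pearson correlation
    with every already-kept feature is below the threshold."""
    corr = np.corrcoef(X, rowvar=False)
    kept = []
    for j in range(X.shape[1]):
        if all(abs(corr[j, k]) < threshold for k in kept):
            kept.append(j)
    return X[:, kept], [names[j] for j in kept]
```

In practice this would be combined with the KS-test and sensitivity results, keeping only features that both carry signal and are not redundant.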
SMOTE classifies fewer slides correctly as high damage, but its precision is higher, meaning there are fewer false positives. Which approach to choose depends on the application. Under a precautionary approach, high recall with low precision is preferable: one would rather regard more slides as potentially high damage to protect an area from damage, at the cost of unnecessarily protecting against some low damage slides. False positives and false negatives are a persistent problem in any modeling method and in any real world application of models, but the success that was achieved means that this model is a tool with potential for real world use. This method provides a convenient, computationally efficient means of recognizing high damage landslides, and it can be applied to landslide activity in the future. Vulnerability is a complicated measurement, and this method relieves the pressure of advanced calculations and site-specific studies.

2.6 Discussion and Future Work

The goal of this study was to demonstrate that machine learning is a technique which can improve upon vulnerability measurements for landslides. A random forest algorithm was trained, tested, and refined on a large collection of slide instances. Validation of the model, a process which is frequently neglected in landslide prediction studies, provided valuable measures of success (Chung and Fabbri 2002). Historical databases of landslide occurrence and damage have thus proven useful not only for predicting the occurrence of a landslide, but also the impact of a slide once it happens. High damage landslides have been shown to be caused by a combination of anthropogenic and temporal weather variables. The data used is all publicly accessible and widely available in many areas around the world, so the method offers practical applications in a variety of circumstances and places.
To improve upon the study, and as recommendations for future work, several other types of information would have been helpful but were not included, largely because of data availability. The greater the amount of detail available about each slide, the more robust a modeling attempt can be (van Westen et al. 2008). Information about previous landslides on the same sites, whether any construction has been done to stabilize the slope, or information about indirect damage are just a few things that could have enhanced this modeling. Other characteristics of a slide would also have been helpful, such as volume or speed, as slides that travel more quickly and move more volume are predicted to have a higher damage potential, given where they are situated (van Westen et al. 2006). The most difficult data to obtain is also the most essential: historic databases of slide occurrence. Data quality is one of the most significant problems facing this sort of analysis, or any analysis of landslides (van Westen et al. 2008). Substantial historic datasets with reliable and consistent information are difficult to find. Modeling was first attempted on a dataset from Oregon, but due to data quality issues many of the results are incomplete. The dates recorded for the landslides were found to be unreliable, which makes many of the characteristics included in the feature vectors difficult to compute, such as weather patterns, population, or other date-dependent information. Dates can be unreliable if there is uncertainty about when a landslide happened, and this is likely for many methods of data gathering. To link temporal features and triggering factors to landslides, the landslides need to be dated either individually or through an inventory of specific event-based occurrences (van Westen et al. 2008). This is mostly compatible with methods such as surveys or news reports.
In Oregon, the inventory is a compilation of many original sources with no standardization between individual instances. The certainty of the dates is unclear, and there is high variability in their precision. Without temporal information, a complete analysis of the sort done for Switzerland cannot be performed. See Appendix A for a further discussion of damage modeling in Oregon, and more specifics regarding data quality. Additionally, this analysis has the potential to be expanded to other areas. Most of the available data is from high-income countries, but these countries rarely incur extremely serious damage from landslides. The developing world is at great risk for severe damage and large losses of life because of settlement and construction patterns, deforestation, and various other human-influenced features. Because we were unable to find sufficient data from the developing world with which to test the results, the model cannot be said to be applicable to significantly different regions. For example, the highest damage landslides in the developed world are frequently those along roads, as construction codes prohibit the building of residential areas on extreme slopes. There are therefore few examples of landslides in highly urbanized areas, and for those that do occur, the damage is generally low. In contrast, many of the most severe landslides in the developing world have occurred in densely populated city regions: a 2008 landslide in Cairo killed 119 people, a 2010 slide in Rio de Janeiro killed over 200, and over 125,000 people live in landslide-prone slums in Mumbai (Kamath 2013). Including expected climate change is another future application of this work, which would analytically tie the two studies done in this thesis together.
If a model can estimate high damage landslides with weather as one of the predictive features, it could use the expected intensity of future precipitation events as an input for predicting landslide damage.

Chapter 3

Using Landslide Risk Models in a Policy Context: Best Practice and Recommendations

3.1 Rationale for Using a Model of Vulnerability

Thirty-six states in the United States have areas of moderate to high severity landslide hazard (Spiker and Gori 2003). With climate change, and with population increases pushing development and infrastructure onto unstable hillsides, landslide loss is expected to increase in the future; the United States is hardly alone in expecting this trend. The USGS has recommended that this trend can be curbed through increased hazard mapping and response (Spiker and Gori 2003). Predicting disasters allows for proper preparation, evacuation when possible, best-practice urban and transportation planning, and preservation of life and property. Small scale disasters such as landslides pose challenges and uncertainty in modeling, and also often receive less attention than larger disasters because of the relative scale of damage for a severe event (Rice 1985). However, several areas around the world do have warning systems in place, based on hazard models and historic data. Hong Kong, for example, has a landslide inventory spanning over 40 years (van Westen et al. 2008), and since 1984 has had a warning system which monitors rainfall data and issues a warning when conditions are such that numerous landslides are expected (Chau et al. 2004). This section discusses the challenges that remain in implementing the modeling technique presented here and the benefits it would provide to both the private and public sectors. It also offers recommendations and an analysis of the models discussed in this work, should they be used in a real-world application.
3.2 Potential Uses

Currently, landslides are under-recognized as a serious hazard. They can be concurrent with other, larger-scale disasters such as flooding, extreme rain, or earthquakes. Small landslides are frequent and larger landslides rarer, but the cost of frequent landslides adds up to the point where millions of dollars of damage can be incurred annually. Landslides cause an estimated $1-2 billion of economic damage annually in the United States (Dai et al. 2001). Landslides also frequently strike the same area twice, so the same road or other urban feature can have recurring problems that may seem a minimal threat initially but become serious very quickly. When building in areas prone to landslide activity, other priorities may take precedence over the threat of landslides. For transportation networks, the most direct way to plan a route may be to cut a road into the side of a hill. Cost factors also enter into building on a hillside: it may be cheaper to build in a particular area while the costs of future damage go unconsidered. Vacation homes are built on coastal slopes that are prone to failing, but the view is excellent while the house lasts. Entire neighborhoods have been washed away after one severe event. Countering these and many other challenges is important to public safety and to reducing the costs of landslides, both private and public. In the private sphere, landslides are generally not covered by standard homeowner's insurance, and specialty insurance is very expensive where it is available (Schuster and Highland 2007). For this reason, many who are at risk for landslides have no financial recourse if one does happen. The modeling technique presented here is designed to recognize high risk areas on a small scale, which is useful for homeowners trying to measure their risk and for insurance companies attempting to evaluate risk accurately.
The model could have an interface where one could look up a home address and identify its risk. For example, the interface for the Statewide Landslide Information Database for Oregon (SLIDO) already has a feature where an address can be searched and nearby historical landslides found. As it currently stands, the interface only contains information about historic occurrences, but adding the likelihood of landslide reactivation, or a risk evaluation based on a combination of vulnerability and hazard models, would allow homeowners to evaluate their personal level of risk and make an informed decision about managing financial loss and recognizing when an insurance policy is advisable. Public responsibility for hazard management affects many policy spheres, such as setting building codes, transportation planning, and hazard mitigation. Many approaches can be considered for reducing landslide impact, including restricting development in hazardous areas, construction codes, geo-engineering prevention and control measures (such as drainage and supportive structures), and warning systems (Dai et al. 2001). This model, and the indicators of vulnerability that it identifies, can be used as one tool to recognize priority areas for policy intervention (Birkmann 2007). Not all areas prone to landslide activity are a priority: many landslides cause relatively low damage, and the costs of pre-event mitigation may outweigh any benefit gained. Risk thresholds are determined by what individuals and society are willing to accept (Dai et al. 2001). A detailed cost-benefit analysis of mitigation versus post-event response is outside the scope of this work, but the locations expected to be high damage are likely to have the highest return (in terms of loss avoided) on mitigation investment. Increased frequency and severity of events with climate change may make mitigation only more important in the coming years.
Investment in engineering solutions may be necessary, particularly to protect areas that may not have been at as great a risk in the past. The modeling technique presented here is easily adaptable to future scenarios, once it has determined the function that relates causal factors to damage incurred.

3.3 Model Application

Using machine learning to create a model of landslide damage is a method that is broadly applicable over many test sites. In creating a model for a new area, the same process that was followed in Part II would be used, from creating the feature vectors to refining the algorithm. A user would need information about all of the important variables as determined by this model, and preferably more if available. All the data for feature variables that was accessed for this research is publicly available and straightforward to analyze using software such as GIS. Even if certain areas do not maintain good data on features such as population density or land cover, this information is largely available from global satellite data. Transportation network data is frequently available from private sources, such as companies which provide maps for GPS systems. In order to maintain accuracy, data should be available at short intervals: the more years that separate the feature data and the landslide date, the higher the likelihood that a feature vector is inaccurate and will yield incorrect results. The one uncompromisable dataset is a database of historical landslides, which must include the date, location, and damage caused by each landslide entry. The most difficult piece of data to obtain is the most essential part, and this is the most substantial challenge to overcome before vulnerability modeling of the sort developed here can be implemented accurately and in many areas.
Many studies have called for a standardized practice in recording landslide occurrence, and for increased frequency of this recording (van Westen et al. 2008). The need for a centralized agency or group responsible for collecting this data is apparent, though many places do not have one, or recording is only prioritized during high landslide frequency periods, such as after a particularly bad rainstorm. Additionally, in public records, slides that impact private property largely go unreported because of the lack of insurance coverage, or because of the relatively small scale and high frequency of these events. Several methods are available for creating landslide inventories, such as aerial photography, field mapping, and newspaper archives. One of the most useful sources for analyzing landslide vulnerability is an inventory of news reports, such as the Swiss Flood and Landslide Database used here, which is very likely to record damage, date, and location. However, such inventories are less likely to contain technical information such as volume, failure mechanism, and landslide type. Inventories made from remote sensing, image interpretation, or field mapping are more likely to contain these characteristics, but at the cost of exact dating and records of damage. This modeling technique would be most effective when combined with a measurement of hazard to make a complete estimate of risk. Risk maps could be produced, though with the recognition that they are only valid for present conditions. Conditions affecting causal factors (such as weather patterns, which may be altered by climate change) and elements at risk (such as population density or the locations of buildings) are frequently changing, meaning that risk maps need to be updated and adapted as necessary. Landslide analysis is dominated by uncertainty, which can make proper application of model results difficult (Dai et al. 2001).
The level of success this model was able to achieve means that the results are still subject to uncertainty, particularly in the prediction of high damage slides. Further refinements and better data could reduce uncertainty and increase confidence, but a certain degree of uncertainty is inevitable. A policymaker could choose to recognize the uncertainty and be proactive and precautionary in mitigation, which will likely have some financial cost, but will also likely have later benefits in terms of structures and lives saved.

Appendix A

Damage Modeling in Oregon

As mentioned in Chapter II, Section 6, the first area studied was Oregon, a state which has a large centralized landslide database. Because of data quality issues the results were not strong, but the work still merits discussion. The goal of this analysis was the same as that for Switzerland: to evaluate areas for vulnerability to landslides based on probabilistic modeling of past damage records.

A.0.1 Data

The Oregon Department of Geology and Mineral Industries maintains a database of landslides called SLIDO (Statewide Landslide Information Database for Oregon). The records are gathered from a variety of published sources, including Department of Transportation records, published papers, and surveys taken after storms. Quality varies among these records in the information they include. A subset of the slides include exact dates, and a further subset contains information about what damage was caused. Some record damage as an amount of money; others have a description of the event and its impact. Figure A-1 shows the location of all the slides in the dataset; red indicates that there is information about the damage. Most of the damage records had an original source in the Oregon Department of Transportation (ODOT), and many of the slides can be seen to follow along the roads.

[Figure A-1: Location of slides in Oregon, mapped with the highway system. Green points do not contain records of damage; red points do.]
[Figure A-2: Types of damage caused by landslides in the SLIDO records.]

Approximately 3,400 landslides had an estimate of damage; about 600 of those were from sources other than ODOT. Figure A-2 shows some of the most common types of damage. Because of the inconsistency of damage records, several options were considered for measuring damage. A subset of slides containing damage information had a monetary estimate, but a larger number had descriptions. The monetary estimates were also a biased set: they were largely from ODOT sources, meaning that all of the damage impacted road features and none of it affected other features like buildings or agriculture. Direct and indirect damage were also included in many cases. Figure A-3 shows the miles of detours caused by landslides. An estimate of indirect loss can be calculated from this measurement, because it gives an estimate of time lost in travel, which impacts economic activity.

[Figure A-3: Length of detours caused by Oregon landslides, binned by miles of detour.]

Intensity scales are a common way of measuring impact from natural disasters. The most recognizable are the Mercalli scale of earthquake intensity and the Saffir-Simpson hurricane scale (Wood et al. 1931) (Simpson and Saffir 1974). Intensity scales provide a simple, easy to communicate estimate of damage. Scales have been proposed for use with landslides, yet none is widely used. Petrucci developed a scale, called the Support Analysis Framework, which was applied to areas in Italy (2009). The scale determines a normalized score for damage by separating a description of the damage into categories of elements and scoring how much damage was sustained by each element on a scale from 0 to 1 (generally none to complete destruction).
Damage is summed over all categories and normalized by the maximum amount of damage that could be sustained across all categories. This scale was applied to descriptions of damage in Oregon. Some sample descriptions are "3 homes isolated," "Creek bed filled in with slide material, washed out part of roadway and shoulder, flooding out field," and "Roaring River Bridge - One lane traffic due to temp one lane bridge." Code was written to process the text descriptions by identifying key words, but it proved difficult to apply and required manual sorting because of low accuracy.

Figure A-4: Distribution of values of direct damage for Oregon slides, measured on an intensity scale.

This method is not a quick way to sort slides, making it infeasible for large datasets unless an intensity score is assigned when a slide is first recorded or no monetary estimate of damage is available. Figure A-4 shows the distribution of direct damage when classifying all the slides. Figure A-5 shows the total amount of damage, with both direct and indirect damage included. The peaks correspond to the levels of road damage, which is still where most of the damage occurs, because the majority of records still come from ODOT. Instead of using the intensity scale for the modeling application, the set of modeled slides was narrowed to only those with a monetary record of damage. The slides with a monetary amount of damage showed a distinct pattern when plotted on a log scale; Figure A-6 shows the distribution. No explanation for this distribution could be found in the data, such as all slides with one type of damage being sorted into one peak. It did, however, provide a convenient way to separate the data by peak into low and high damage categories.
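As a rough sketch of the keyword-based processing of damage descriptions mentioned above, the snippet below computes a SAF-style normalized score. The categories, keywords, and equal weights are illustrative assumptions, and the per-element 0 to 1 rating is simplified to a binary match per category, which is part of why such a matcher needed manual checking.

```python
# Hypothetical sketch of a SAF-style scorer over free-text damage
# descriptions.  Categories, keywords, and weights are illustrative
# assumptions, not the actual SAF element categories.
CATEGORIES = {
    "road":     {"keywords": ("road", "roadway", "shoulder", "highway", "lane"), "weight": 1.0},
    "building": {"keywords": ("home", "house", "building"), "weight": 1.0},
    "bridge":   {"keywords": ("bridge",), "weight": 1.0},
    "field":    {"keywords": ("field", "crop", "agricult"), "weight": 1.0},
}

def saf_score(description: str) -> float:
    """Return a damage score in [0, 1]: damage summed over matched
    categories, normalized by the maximum possible over all categories."""
    text = description.lower()
    hit = sum(cat["weight"]
              for cat in CATEGORIES.values()
              if any(k in text for k in cat["keywords"]))
    max_possible = sum(cat["weight"] for cat in CATEGORIES.values())
    return hit / max_possible
```

A description such as "3 homes isolated" matches only the building category, while road descriptions can match several keywords at once; simple substring matching over- and under-counts easily, consistent with the low accuracy reported above.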
Figure A-5: Distribution of total damage for Oregon slides, measured on an intensity scale.

Figure A-6: Distribution of dollar amount of Oregon landslide damage, plotted on a logarithmic scale.

Based on this distribution, Figure A-7 shows the landslides mapped by damage: red points are high damage landslides and green points are low damage. The final decision was to use the slides with a dollar estimate of damage, because the intensity scale is subject to manual sorting error while the dollar estimate offered a clear division for classification. 1166 slides were included in this set.

Figure A-7: Location of slides in Oregon, separated by amount of damage, as determined in Figure A-6.

Feature vectors were created for all landslides initially considered to have reliable dates and a dollar estimate of damage. The variables included were largely the same as those considered for Switzerland, including anthropogenic and weather variables. Figure A-9 shows the distribution of population density; as for Switzerland, the source was NASA's Socioeconomic Data and Applications Center, a dataset discussed in a previous section. The red line is the high damage slides, the blue line the low damage. Figure A-8 shows the distribution of land cover over all the slides. Land cover came from the National Land Cover Database, which classifies land over the entire United States into 16 different land cover categories at a 30 m resolution (Homer et al. 2007).

Figure A-8: Distribution of land cover for Oregon slides.

Transportation network data was drawn from the Oregon Department of Transportation, accessed through the Oregon Spatial Data Library. The distribution of distance to the nearest highway for all slides is shown in Figure A-10, confirming that most damage took place very near a highway.

Rain was initially considered as part of the feature vectors, but the models performed better without it: over a 10% improvement in success was recorded when all the rain variables were removed. Looking at the details of the rain, over 500 slides had no rain in the preceding week. As rain is the most common trigger of landslides, this was unusual enough that the dates in the dataset could not be treated as reliable. Because the slides were largely from original ODOT records, one possible explanation is that the reported dates are the dates on which a repair occurred, not the dates on which the landslides happened. Earthquakes are another trigger of landslides, but no correlation was found between earthquake activity and the recorded slides.

Figure A-9: Distribution of population density for Oregon slides. The red line is high damage slides, the blue line is low damage slides.

Figure A-10: Distribution of distance to nearest highway for Oregon slides. The red line is high damage slides, the blue line is low damage slides.
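As a rough sketch of how a proximity feature such as distance to the nearest highway can be computed, the great-circle formula below is applied over a set of highway vertices. The point-sampled network is an assumption for illustration; the thesis extracted these features from the actual road geometries using GIS software.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def distance_to_nearest(slide, highway_points):
    """Minimum distance (km) from a slide location to any sampled point
    on the highway network."""
    return min(haversine_km(slide[0], slide[1], p[0], p[1])
               for p in highway_points)
```

Counting features and summed lengths within the 0.12 km and 2 km buffers used in the model would follow the same pattern, filtering the network points by this distance.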
A.0.2 Modeling

Many of the same techniques used in modeling the data from Switzerland were also applied to the Oregon data. Random forest was used as the modeling algorithm, and the training data was similarly balanced by undersampling the majority class. Weather variables were removed from the modeling because the landslide dates could not be trusted. The final list of variables considered is:

1. Population Density
2. Type of Land Movement
3. Land Cover
4. Distance to Nearest [Road, Highway, Railroad]
5. Length of [Road, Highway, Railroad] in [0.12, 2] km
6. Number of [Roads, Highways, Railroads] in [0.12, 2] km
7. Elevation [at point, minimum, maximum, mean]
8. Top [5, 10, 25, 50]% Gradient [minimum, maximum, mean, standard deviation]

All algorithms were run on data sorted into 80% training and 20% testing; all results presented are results on the testing set.

A.0.3 Results

The best success achieved was 75-80%, with balanced success between high and low damage slides. The top 10 most significant variables were, in decreasing order of significance:

1. Number of Highways in 2 km
2. Length of Highway in 2 km
3. Distance to Nearest Railroad
4. Length of Highway in 0.12 km
5. Closest Highway
6. Number of Railroads in 2 km
7. Mean Top 50% Gradient
8. Length of Railroad in 2 km
9. Minimum Top 25% Gradient
10. Number of Roads in 0.12 km

These results are largely expected, as there is great homogeneity in the type of data in the dataset: the damage is mostly highway damage, and the more highways there are in an area, the more likely the damage is to be high. This model has no temporal aspect, as rain was a poor indicator of damage, unlike in the Switzerland results. Though the results may be expected, they present an efficient method of applying risk analysis over a large area. The features that influence the amount of landslide damage are generally readily available, frequently updated, and easy to extract using GIS software.
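The class-balancing step described in the Modeling section above, undersampling the majority class before training, can be sketched as below. The list-of-dicts record format and the `high_damage` label field are illustrative assumptions, not the thesis's actual data structures.

```python
import random

def undersample(records, label_key="high_damage", seed=0):
    """Balance a binary dataset by randomly undersampling the majority
    class so that both classes appear in equal numbers, as done before
    training the random forest."""
    pos = [r for r in records if r[label_key]]
    neg = [r for r in records if not r[label_key]]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    rng = random.Random(seed)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced
```

Because the discarded majority examples are chosen at random, repeated runs with different seeds give slightly different training sets, which is one source of the variance in accuracy seen in the sensitivity tables of Appendix B.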
When combined with a hazard map, this measurement of vulnerability could produce a map of risk over a wide area. Risk maps at small scale could in this way be computed with great efficiency from just a few inputs, making this a promising way for a policymaker or planner, who may have no particular expertise in landslides, to identify and avoid high risk areas. This case also serves as an example of the need for high quality, consistent data with clear records of its shortcomings and uncertainty. A temporal aspect to this analysis would have been preferable, but could not be included because of data quality.

Appendix B

Variable Importance Results

This appendix presents the results of variable testing, as discussed in Chapter II, Section 4.1. It includes results from a KS-test comparing the values of each variable between high and low damage slides. It also includes the results of sensitivity analysis, in which each variable was removed from the model individually and the resulting increase or decrease in success was recorded.

B.0.4 KS-test results

Table B.1: KS-test results for all continuous variables. Includes the KS statistic for each variable and a p-value of significance.
Variable    KS-Statistic    P-Value
30 Day Rain    0.312    0.00E+00
4 Day Rain    0.294    2.44E-15
Days 1-7 Rain    0.292    4.00E-15
45 Day Rain    0.284    2.58E-14
Days 22-28 Pressure    0.283    2.93E-14
3 Day Rain    0.271    3.80E-13
10 Day Rain    0.268    7.18E-13
7 Day Rain    0.264    1.97E-12
14 Day Rain    0.261    3.56E-12
120 Day Rain    0.246    7.36E-11
150 Day Rain    0.245    8.48E-11
60 Day Rain    0.237    4.40E-10
45 Day Pressure    0.234    6.72E-10
30 Day Pressure    0.225    3.69E-09
75 Day Rain    0.224    4.44E-09
2 Day Rain    0.213    2.85E-08
90 Day Rain    0.213    3.01E-08
105 Day Rain    0.210    4.71E-08
180 Day Rain    0.203    1.44E-07
365 Day Min Temperature    0.186    2.29E-06
75 Day Pressure    0.184    3.00E-06
365 Day Mean Temperature    0.179    5.53E-06
14 Day Pressure    0.177    7.90E-06
Days 22-28 Rain    0.177    8.04E-06
60 Day Pressure    0.172    1.46E-05
365 Day Max Temperature    0.169    2.24E-05
Days 15-21 Rain    0.166    3.34E-05
4 Day Max Temperature    0.166    3.39E-05
3 Day Max Temperature    0.158    9.67E-05
90 Day Pressure    0.154    1.55E-04
10 Day Pressure    0.152    2.08E-04
Days 22-28 Mean Temperature    0.151    2.29E-04
Days 8-14 Rain    0.148    3.23E-04
365 Day Pressure    0.147    3.91E-04
105 Day Pressure    0.146    3.96E-04
Days 8-14 Pressure    0.145    4.88E-04
Days 29-58 Pressure    0.140    8.60E-04
2 Day Max Temperature    0.139    9.67E-04
Days 1-7 Pressure    0.138    1.03E-03
3 Day Min Temperature    0.137    1.12E-03
Day of Max Temperature    0.137    1.15E-03
Population    0.136    1.30E-03
Day of Rain    0.133    1.71E-03
365 Day Rain    0.132    1.91E-03
Days 1-7 Max Temperature    0.131    2.08E-03
10 Day Max Temperature    0.129    2.71E-03
2 Day Min Temperature    0.128    2.93E-03
Number of Highways in 2 km    0.127    3.16E-03
14 Day Max Temperature    0.127    3.39E-03
Days 1-7 Min Temperature    0.126    3.60E-03
4 Day Min Temperature    0.126    3.61E-03
7 Day Max Temperature    0.124    4.40E-03
10 Day Min Temperature    0.121    6.06E-03
Number of Buildings in 2 km    0.120    6.47E-03
Days 22-28 Min Temperature    0.120    6.59E-03
Days 8-14 Mean Temperature    0.118    7.87E-03
Days 29-58 Rain    0.117    8.44E-03
7 Day Pressure    0.116    9.25E-03
14 Day Min Temperature    0.115    0.011
4 Day Pressure    0.114    0.011
Days 15-21 Max Temperature    0.113    0.012
120 Day Pressure    0.113    0.013
7 Day Min Temperature    0.112    0.014
Day of Min Temperature    0.109    0.017
30 Day Max Temperature    0.107    0.021
Days 8-14 Min Temperature    0.106    0.022
180 Day Mean Temperature    0.106    0.023
Days 1-7 Mean Temperature    0.103    0.028
180 Day Pressure    0.102    0.033
14 Day Mean Temperature    0.101    0.036
2 Day Mean Temperature    0.100    0.037
Length of Road in 2 km    0.097    0.047
30 Day Mean Temperature    0.096    0.050
7 Day Mean Temperature    0.096    0.051
150 Day Pressure    0.096    0.051
Days 15-21 Pressure    0.096    0.051
Days 22-28 Max Temperature    0.096    0.053
30 Day Min Temperature    0.095    0.054
Days 15-21 Min Temperature    0.095    0.056
10 Day Mean Temperature    0.095    0.057
Length of Highway in 2 km    0.094    0.058
180 Day Min Temperature    0.094    0.061
Days 8-14 Max Temperature    0.093    0.066
Distance to Nearest Road    0.092    0.067
3 Day Pressure    0.092    0.069
180 Day Max Temperature    0.089    0.088
2 Day Pressure    0.086    0.105
45 Day Min Temperature    0.085    0.117
Days 15-21 Mean Temperature    0.083    0.126
Day of Mean Temperature    0.083    0.128
3 Day Mean Temperature    0.083    0.131
45 Day Max Temperature    0.081    0.145
4 Day Mean Temperature    0.080    0.158
Number of Roads in 0.12 km    0.080    0.160
Number of Roads in 2 km    0.080    0.160
45 Day Mean Temperature    0.079    0.165
Mean Top 5% Gradient    0.079    0.172
120 Day Mean Temperature    0.078    0.174
105 Day Max Temperature    0.078    0.182
Days 29-58 Mean Temperature    0.077    0.184
90 Day Min Temperature    0.076    0.201
90 Day Mean Temperature    0.075    0.219
Days 29-58 Min Temperature    0.072    0.256
GDP    0.072    0.257
120 Day Max Temperature    0.071    0.264
105 Day Mean Temperature    0.071    0.267
150 Day Max Temperature    0.070    0.291
120 Day Min Temperature    0.069    0.303
Length of Road in 0.12 km    0.069    0.306
150 Day Mean Temperature    0.067    0.334
Length of Railroad in 2 km    0.067    0.336
90 Day Max Temperature    0.067    0.336
60 Day Max Temperature    0.067    0.339
105 Day Min Temperature    0.066    0.360
Days 29-58 Max Temperature    0.065    0.368
75 Day Max Temperature    0.063    0.415
60 Day Min Temperature    0.062    0.427
60 Day Mean Temperature    0.062    0.431
150 Day Min Temperature    0.061    0.440
75 Day Mean Temperature    0.061    0.454
Distance to Nearest Highway    0.060    0.461
Distance to Nearest Railroad    0.059    0.496
75 Day Min Temperature    0.058    0.522
Number of Buildings in 0.12 km    0.056    0.563
Day of Pressure    0.053    0.630
Number of Railroads in 2 km    0.029    0.996
Length of Railroad in 0.12 km    0.024    1.000
Length of Highway in 0.12 km    0.024    1.000
Number of Highways in 0.12 km    0.013    1.000
Number of Railroads in 0.12 km    0.007    1.000

B.0.5 Sensitivity analysis results, removing one variable at a time

Table B.2: Sensitivity analysis results, removing one variable (or group of variables) at a time. Includes the mean accuracy of the resulting model and the p-value of a t-test against the full model.

Variable    Mean Accuracy    P-Value of T-Test
No Variables Removed    0.6697    0
No Weather Variables    0.5523    1.04E-09
90 Day Pressure    0.6477    0.0354
2 Day Max Temperature    0.6488    0.0414
No Pressure Variables    0.6512    0.0462
Number of Roads in 2 km    0.6523    0.1276
No Rain Variables    0.6535    0.1145
150 Day Rain    0.6547    0.1085
105 Day Rain    0.6547    0.0748
Day of Pressure    0.6558    0.1559
120 Day Rain    0.6558    0.1433
120 Day Pressure    0.6558    0.2326
Day of Mean Temperature    0.6570    0.1246
180 Day Max Temperature    0.6570    0.1916
365 Day Pressure    0.6582    0.1952
2 Day Mean Temperature    0.6581    0.1800
Length of Railroad in 2 km    0.6593    0.3020
Days 29-58 Pressure    0.6593    0.2874
Days 15-21 Rain    0.6593    0.2874
2 Week Mean Temperature    0.6593    0.3020
2 Day Pressure    0.6593    0.2874
10 Day Rain    0.6593    0.3091
10 Day Max Temperature    0.6593    0.2948
Distance to Nearest Railroad    0.6605    0.3696
7 Day Rain    0.6605    0.3170
105 Day Pressure    0.6605    0.3556
Days 8-14 Min Temperature    0.6616    0.3507
Days 8-14 Max Temperature    0.6616    0.4599
2 Day Min Temperature    0.6616    0.4151
No Highway Variables    0.6628    0.4874
Mean Top 5% Gradient    0.6628    0.4514
Days 29-58 Min Temperature    0.6628    0.4266
Days 22-28 Min Temperature    0.6628    0.4434
Days 15-21 Pressure    0.6628    0.5232
Days 1-7 Max Temperature    0.6628    0.4807
90 Day Rain    0.6628    0.4514
75 Day Pressure    0.6628    0.5062
75 Day Min Temperature    0.6628    0.4266
60 Day Rain    0.6628    0.4434
365 Day Rain    0.6628    0.5336
365 Day Mean Temperature    0.6628    0.4352
Population Density    0.6640    0.5703
Days 29-58 Rain    0.6640    0.5324
Days 22-28 Mean Temperature    0.6640    0.5524
Days 1-7 Min Temperature    0.6640    0.5759
90 Day Min Temperature    0.6640    0.5524
4 Day Pressure    0.6640    0.5586
3 Day Pressure    0.6640    0.4845
150 Day Mean Temperature    0.6640    0.5393
Distance to Nearest Road    0.6651    0.6461
Days 15-21 Mean Temperature    0.6651    0.6509
45 Day Mean Temperature    0.6651    0.6247
180 Day Pressure    0.6651    0.6461
150 Day Min Temperature    0.6651    0.6358
105 Day Min Temperature    0.6651    0.6411
Number of Roads in 0.12 km    0.6663    0.6998
Days 22-28 Min Temperature    0.6663    0.7548
90 Day Max Temperature    0.6663    0.6767
75 Day Mean Temperature    0.6663    0.7189
45 Day Max Temperature    0.6663    0.7548
4 Day Max Temperature    0.6663    0.7231
120 Day Max Temperature    0.6663    0.6944
Number of Railroads in 0.12 km    0.6674    0.7974
Land Cover    0.6674    0.8075
Length of Road in 0.12 km    0.6674    0.8241
Length of Highway in 2 km    0.6674    0.8043
Days 8-14 Pressure    0.6674    0.8009
Days 8-14 Mean Temperature    0.6674    0.7856
Days 29-58 Mean Temperature    0.6674    0.7974
Days 1-7 Mean Temperature    0.6674    0.8106
30 Day Min Temperature    0.6674    0.7898
3 Day Rain    0.6674    0.786
2 Week Min Temperature    0.6674    0.8164
10 Day Min Temperature    0.6674    0.8164
Day of Min Temperature    0.6686    0.9090
Day of Max Temperature    0.6686    0.9103
7 Day Pressure    0.6686    0.8997
45 Day Rain    0.6686    0.9115
45 Day Pressure    0.6686    0.8997
4 Day Mean Temperature    0.6686    0.9161
30 Day Rain    0.6686    0.8960
2 Day Rain    0.6686    0.9345
150 Day Max Temperature    0.6686    0.9115
10 Day Mean Temperature    0.6686    0.8918
Number of Railroads in 2 km    0.6698    1
No Mean Temperature Variables    0.6698    1
Days 22-28 Pressure    0.6698    1
Days 1-7 Pressure    0.6698    1
90 Day Mean Temperature    0.6698    1
75 Day Rain    0.6698    1
60 Day Min Temperature    0.6698    1
3 Day Min Temperature    0.6698    1
10 Day Pressure    0.6698    1
Number of Highways in 0.12 km    0.6709    0.8993
Number of Buildings in 2 km    0.6709    0.8975
Number of Buildings in 0.12 km    0.6709    0.8914
No Min Temperature Variables    0.6709    0.8914
75 Day Max Temperature    0.6709    0.9100
7 Day Mean Temperature    0.6709    0.9190
60 Day Pressure    0.6709    0.8956
30 Day Pressure    0.6709    0.9125
30 Day Max Temperature    0.6709    0.8891
3 Day Mean Temperature    0.6709    0.9073
150 Day Pressure    0.6709    0.8975
No Road Variables    0.6721    0.8124
Distance to Nearest Highway    0.6721    0.8030
Days 22-28 Rain    0.6721    0.821
4 Day Min Temperature    0.6721    0.8094
180 Day Rain    0.6721    0.7922
No Railroad Variables    0.6733    0.7499
No Max Temperature Variables    0.6733    0.7162
Days 29-58 Max Temperature    0.6733    0.6662
Days 15-21 Max Temperature    0.6733    0.6966
365 Day Min Temperature    0.6733    0.7116
180 Day Min Temperature    0.6733    0.6966
105 Day Mean Temperature    0.6733    0.7018
60 Day Mean Temperature    0.6744    0.6259
105 Day Max Temperature    0.6744    0.6008
7 Day Min Temperature    0.6756    0.5460
14 Day Max Temperature    0.6756    0.5324
120 Day Min Temperature    0.6756    0.5324
14 Day Rain    0.6767    0.4069
Length of Highway in 0.12 km    0.6779    0.4328
45 Day Min Temperature    0.6779    0.3733
30 Day Mean Temperature    0.6779    0.4049
14 Day Pressure    0.6779    0.4688
Number of Highways in 2 km    0.6791    0.2950
No Buildings    0.6791    0.4039
Length of Railroad in 0.12 km    0.6791    0.3283
Days 15-21 Min Temperature    0.6791    0.2950
60 Day Max Temperature    0.6791    0.2863
4 Day Rain    0.6791    0.3855
180 Day Mean Temperature    0.6802    0.2658
120 Day Mean Temperature    0.6814    0.2472
GDP    0.6826    0.2714
Days 8-14 Rain    0.6826    0.1625
Day of Rain    0.6826    0.2423
7 Day Max Temperature    0.6826    0.1695
Length of Road in 2 km    0.6837    0.1660
3 Day Max Temperature    0.6837    0.1051
Days 1-7 Rain    0.6907    0.0346
365 Day Max Temperature    0.6919    0.0158
No Anthropogenic Variables    0.6942    0.0157
No Temperature Variables    0.7081    0.0006

B.0.6 Sensitivity analysis results, removing all of one type of variable and adding in one variable individually
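The remove-one-variable procedure behind these sensitivity tables can be sketched as follows. Here `evaluate` is a hypothetical stand-in for retraining the random forest on a feature subset and scoring it on the test set; it is not part of the thesis code.

```python
def sensitivity_analysis(variables, evaluate):
    """Leave-one-out sensitivity: drop each variable in turn, re-evaluate
    the model on the reduced feature set, and record the change in mean
    accuracy relative to the full model."""
    baseline = evaluate(variables)  # accuracy with no variables removed
    deltas = {}
    for v in variables:
        subset = [x for x in variables if x != v]
        deltas[v] = evaluate(subset) - baseline
    return baseline, deltas
```

The add-one-in variant of this section is the mirror image: all variables of one type are removed, then each is evaluated individually on top of the reduced base set.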
Table B.3: Sensitivity analysis with only one rain variable included.

Variable    Mean Accuracy    P-Value of T-Test
Days 8-14 Rain    0.6256    0.0004
14 Day Rain    0.6419    0.0109
2 Day Rain    0.6419    0.0075
150 Day Rain    0.6465    0.0176
90 Day Rain    0.6477    0.0245
7 Day Rain    0.6488    0.0252
4 Day Rain    0.6500    0.0301
Days 1-7 Rain    0.6512    0.0872
60 Day Rain    0.6512    0.0791
Days 15-21 Rain    0.6523    0.0651
75 Day Rain    0.6523    0.0576
120 Day Rain    0.6523    0.0954
10 Day Rain    0.6535    0.1044
105 Day Rain    0.6558    0.1371
3 Day Rain    0.6570    0.2435
Days 29-58 Rain    0.6581    0.2400
45 Day Rain    0.6593    0.3419
180 Day Rain    0.6593    0.3655
Days 22-28 Rain    0.6616    0.3229
Day of Rain    0.6628    0.4352
365 Day Rain    0.6628    0.4737
30 Day Rain    0.6733    0.7434

Table B.4: Sensitivity analysis with only one min temperature variable included.

Variable    Mean Accuracy    P-Value of T-Test
30 Day Min Temperature    0.6663    0.7272
10 Day Min Temperature    0.6663    0.6888
Days 1-7 Min Temperature    0.6674    0.8043
180 Day Min Temperature    0.6674    0.8428
90 Day Min Temperature    0.6686    0.8939
3 Day Min Temperature    0.6686    0.8979
75 Day Min Temperature    0.6709    0.9027
105 Day Min Temperature    0.6721    0.8030
2 Day Min Temperature    0.6733    0.6911
150 Day Min Temperature    0.6733    0.7116
60 Day Min Temperature    0.6744    0.6471
120 Day Min Temperature    0.6744    0.6200
Days 22-28 Min Temperature    0.6756    0.5098
7 Day Min Temperature    0.6756    0.4933
Days 8-14 Min Temperature    0.6767    0.4335
Days 29-58 Min Temperature    0.6767    0.4249
Day of Min Temperature    0.6779    0.4122
4 Day Min Temperature    0.6779    0.4393
365 Day Min Temperature    0.6791    0.3120
14 Day Min Temperature    0.6837    0.1163
Days 15-21 Min Temperature    0.6849    0.1332
45 Day Min Temperature    0.6884    0.0627

Table B.5: Sensitivity analysis with only one max temperature variable included.

Variable    Mean Accuracy    P-Value of T-Test
Days 15-21 Max Temperature    0.6628    0.4514
105 Day Max Temperature    0.6651    0.5924
Days 8-14 Max Temperature    0.6674    0.7937
7 Day Max Temperature    0.6674    0.7719
60 Day Max Temperature    0.6674    0.7898
4 Day Max Temperature    0.6698    1
180 Day Max Temperature    0.6698    1
10 Day Max Temperature    0.6698    1
75 Day Max Temperature    0.6709    0.8815
365 Day Max Temperature    0.6709    0.8914
3 Day Max Temperature    0.6733    0.7643
30 Day Max Temperature    0.6744    0.6075
2 Day Max Temperature    0.6744    0.6315
Days 22-28 Max Temperature    0.6767    0.4861
150 Day Max Temperature    0.6767    0.4160
Day of Max Temperature    0.6791    0.3438
90 Day Max Temperature    0.6791    0.3655
45 Day Max Temperature    0.6791    0.3655
120 Day Max Temperature    0.6791    0.3120
14 Day Max Temperature    0.6802    0.2736
Days 29-58 Max Temperature    0.6826    0.1488
Days 1-7 Max Temperature    0.6826    0.2105

Table B.6: Sensitivity analysis with only one mean temperature variable included.

Variable    Mean Accuracy    P-Value of T-Test
30 Day Min Temperature    0.6663    0.7272
10 Day Min Temperature    0.6663    0.6888
Days 1-7 Min Temperature    0.6674    0.8043
180 Day Min Temperature    0.6674    0.8428
90 Day Min Temperature    0.6686    0.8939
3 Day Min Temperature    0.6686    0.8979
75 Day Min Temperature    0.6709    0.9027
105 Day Min Temperature    0.6721    0.8030
2 Day Min Temperature    0.6733    0.6911
150 Day Min Temperature    0.6733    0.7116
60 Day Min Temperature    0.6744    0.6471
120 Day Min Temperature    0.6744    0.6200
Days 22-28 Min Temperature    0.6756    0.5098
7 Day Min Temperature    0.6756    0.4933
Days 8-14 Min Temperature    0.6767    0.4335
Days 29-58 Min Temperature    0.6767    0.4249
Day of Min Temperature    0.6779    0.4122
4 Day Min Temperature    0.6779    0.4393
365 Day Min Temperature    0.6791    0.3120
14 Day Min Temperature    0.6837    0.1163
Days 15-21 Min Temperature    0.6849    0.1332
45 Day Min Temperature    0.6884    0.0627
Table B.7: Sensitivity analysis with only one pressure variable included.

Variable    Mean Accuracy    P-Value of T-Test
2 Day Pressure    0.6326    0.0072
Days 8-14 Pressure    0.6337    0.0011
Day of Pressure    0.6372    0.0023
14 Day Pressure    0.6384    0.0065
10 Day Pressure    0.6395    0.0034
Days 29-58 Pressure    0.6407    0.0116
7 Day Pressure    0.6407    0.0137
180 Day Pressure    0.6419    0.0069
3 Day Pressure    0.6430    0.0258
Days 22-28 Pressure    0.6442    0.0267
4 Day Pressure    0.6477    0.0265
30 Day Pressure    0.6477    0.0265
150 Day Pressure    0.6500    0.0722
105 Day Pressure    0.6500    0.0832
75 Day Pressure    0.6512    0.0597
Days 15-21 Pressure    0.6523    0.0651
Days 1-7 Pressure    0.6523    0.1138
365 Day Pressure    0.6535    0.1838
45 Day Pressure    0.6547    0.0978
60 Day Pressure    0.6570    0.1639
90 Day Pressure    0.6640    0.5586
120 Day Pressure    0.6802    0.2332

Bibliography

[1] Alexander, D. "Urban landslides." Progress in Physical Geography 13.2 (1989): 157-189.
[2] Alexander, D. "Vulnerability to Landslides." Landslide Hazard and Risk. Wiley, Chichester (2005): 175-198.
[3] Alexander, L.V., et al. "Global observed changes in daily climate extremes of temperature and precipitation." Journal of Geophysical Research: Atmospheres 111.D5 (2006).
[4] Batista, G.E., Prati, R.C., and Monard, M.C. "A study of the behavior of several methods for balancing machine learning training data." ACM SIGKDD Explorations 6.1 (2004): 20-29.
[5] Berman, M. "Active Search Through Washington Landslide Debris Ends." Washington Post, 29 April 2014: n. pag.
[6] Bird, S., Klein, E., and Loper, E. Natural Language Processing with Python. O'Reilly Media, Inc., 2009.
[7] Birkmann, J. "Risk and vulnerability indicators at different scales: applicability, usefulness, and policy implications." Environmental Hazards 7.1 (2007): 20-31.
[8] Bossard, M., Feranec, J., and Otahel, J. European Environment Agency (EEA). "CORINE land cover technical guide: Addendum 2000." (2000).
[9] Breiman, L. "Random forests." Machine Learning 45.1 (2001): 5-32.
[10] Brenning, A. "Spatial Prediction Models for Landslide Hazards: Review, Comparison and Evaluation." Natural Hazards and Earth System Sciences 5 (2005): 853-862.
[11] Carrara, A., et al. "Use of GIS technology in the prediction and monitoring of landslide hazard." Natural Hazards 20.2-3 (1999): 117-135.
[12] Chau, K.T., et al. "Landslide hazard analysis for Hong Kong using landslide inventory and GIS." Computers and Geosciences 30.4 (2004): 429-443.
[13] Chawla, N.V., et al. "SMOTE: Synthetic Minority Over-sampling Technique." Journal of Artificial Intelligence Research 16 (2002): 321-357.
[14] Chawla, N.V. "Data mining for imbalanced datasets: An overview." Data Mining and Knowledge Discovery Handbook. Springer US (2005): 853-867.
[15] Chung, C.F., and Fabbri, A.G. "Validation of spatial prediction models for landslide hazard mapping." Natural Hazards 30.3 (2003): 451-472.
[16] Center for International Earth Science Information Network (CIESIN), Columbia University, United Nations Food and Agricultural Program (FAO), and Centro Internacional de Agricultura Tropical (CIAT). Gridded Population of the World, Version 3 (GPWv3): Population Density Grid. Palisades, NY: NASA Socioeconomic Data and Applications Center (SEDAC), 2005.
[17] Collison, A., et al. "Modelling the impact of predicted climate change on landslide frequency and magnitude in SE England." Engineering Geology 55.3 (2000): 205-218.
[18] Crozier, M.J. "Deciphering the Effect of Climate Change on Landslide Activity: A Review." Geomorphology 124 (2010): 260-267.
[19] Cruden, D.M., and Varnes, D.J. "Landslides: Investigation and Mitigation. Chapter 3 - Landslide types and processes." Transportation Research Board Special Report 247 (1996).
[20] Dai, A. "Precipitation Characteristics in Eighteen Coupled Climate Models." Journal of Climate 19.18 (2006): 4605-4630.
[21] Dai, F.C., Lee, C.F., and Ngai, Y.Y. "Landslide risk assessment and management: an overview." Engineering Geology 64.1 (2001): 65-87.
[22] Dal Pozzolo, A., Caelen, O., and Bontempi, G. unbalanced: The package implements different data-driven methods for unbalanced datasets. R package version 1.1, 2014. http://CRAN.R-project.org/package=unbalanced.
[23] Dale, V.H., et al. "Climate Change and Forest Disturbances: Climate change can affect forests by altering the frequency, intensity, duration, and timing of fire, drought, introduced species, insect and pathogen outbreaks, hurricanes, windstorms, ice storms, or landslides." BioScience 51.9 (2001): 723-734.
[24] Swiss Federal Statistics (FSO). Gross domestic product by major region and canton. OFS 1328-1300. Neuchatel, Switzerland: FSO, 2013.
[25] Gao, X., et al. "An Analogue Approach to Identify Heavy Precipitation Events: Evaluation and Application to CMIP5 Climate Models in the United States." Journal of Climate 27 (2014): 5941-5963.
[26] Guyon, I., and Elisseeff, A. "An introduction to variable and feature selection." The Journal of Machine Learning Research 3 (2003): 1157-1182.
[27] Hastie, T., et al. The Elements of Statistical Learning. Vol. 2. New York: Springer, 2009.
[28] Haylock, M.R., Klein Tank, A.M.G., Klok, E.J., Jones, P.D., and New, M. "A European daily high-resolution gridded dataset of surface temperature and precipitation." J. Geophys. Res. (Atmospheres) 113 (2008).
[29] Hilker, N., Badoux, A., and Hegg, C. "The Swiss flood and landslide damage database 1972-2007." Nat. Hazards Earth Syst. Sci. 9 (2009): 913-925.
[30] Homer, C., et al. "Completion of the 2001 National Land Cover Database for the Conterminous United States." Photogrammetric Engineering and Remote Sensing 73.4 (2007): 337.
[31] Iverson, R.M. "Landslide triggering by rain infiltration." Water Resources Research 36.7 (2000): 1897-1910.
[32] Kamath, N. "1.25L slumdwellers living in landslide-prone areas in Mumbai." Hindustan Times, 2 July 2013: n. pag.
[33] Kellenberg, D.K., and Mobarak, A.M. "Does rising income increase or decrease damage risk from natural disasters?" Journal of Urban Economics 63.3 (2008): 788-802.
[34] Kotsiantis, S., Kanellopoulos, D., and Pintelas, P. "Handling imbalanced datasets: A review." GESTS International Transactions on Computer Science and Engineering 30.1 (2006): 25-36.
[35] Kotsiantis, S.B., Zaharakis, I.D., and Pintelas, P.E. "Supervised machine learning: A review of classification techniques." Informatica 31 (2007): 249-268.
[36] Liu, X., Wu, J., and Zhou, Z. "Exploratory undersampling for class-imbalance learning." IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 39.2 (2009): 539-550.
[37] Matsuura, S., Asano, S., and Okamoto, T. "Relationship between rain and/or meltwater, pore-water pressure and displacement of a reactivated landslide." Engineering Geology 101.1 (2008): 49-59.
[38] Mitchell, T.M. Machine Learning. Burr Ridge, IL: McGraw Hill, 1997.
[39] Petley, D.N. "The Global Occurrence of Fatal Landslides in 2007." European Geosciences Union General Assembly, Vienna, 13-18 April 2008. Geophysical Research Abstracts, 2008: n. pag.
[40] Petrucci, O., Pasqua, A.A., and Gullà, G. "Landslide damage assessment using the Support Analysis Framework (SAF): the 2009 landsliding event in Calabria (Italy)." Advances in Geosciences 26 (2010): 13-17.
[41] Rice, R.M. "Social, technological, and research responses to potential erosion and sediments disasters in the western United States, with examples from California." Proceedings of the International Symposium on Erosion, Debris Flow and Disaster Prevention, Tsukuba, Japan, 3-5 September 1985: 1-10.
[42] Rienecker, M.M., et al. "MERRA: NASA's modern-era retrospective analysis for research and applications." Journal of Climate 24.14 (2011): 3624-3648.
[43] Safavian, S.R., and Landgrebe, D. "A survey of decision tree classifier methodology." IEEE Transactions on Systems, Man, and Cybernetics 21.3 (1991): 660-674.
[44] Schuster, R.L., and Highland, L.M. "The Third Hans Cloos Lecture: Urban landslides: socioeconomic impacts and overview of mitigative strategies." Bulletin of Engineering Geology and the Environment 66.1 (2007): 1-27.
[45] Simpson, R.H., and Saffir, H. "The hurricane disaster potential scale." Weatherwise 27.8 (1974): 169.
[46] Spiker, E.C., and Gori, P. National Landslide Hazards Mitigation Strategy, a Framework for Loss Reduction. No. 1244. US Geological Survey, 2003.
[47] Solomon, S., et al., eds. Climate Change 2007 - The Physical Science Basis: Contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change. Vol. 4. Cambridge, UK: Cambridge University Press, 2007. 966 p.
[48] van den Besselaar, E.J.M., Haylock, M.R., van der Schrier, G., and Klein Tank, A.M.G. "A European daily high-resolution observational gridded data set of sea level pressure." J. Geophys. Res. 116 (2011).
[49] van Westen, C.J., van Asch, T.W.J., and Soeters, R. "Landslide Hazard and Risk Zonation - Why is it Still so Difficult?" Bulletin of Engineering Geology and the Environment 65.2 (2006): 167-184.
[50] van Westen, C.J., Castellanos, E., and Kuriakose, S.L. "Spatial Data for Landslide Susceptibility, Hazard, and Vulnerability Assessment: An Overview." Engineering Geology 102.3 (2008): 112-131.
[51] Varnes, D.J. "Landslide Hazard Zonation: A Review of Principles and Practice." Natural Hazards 3 (1984).
[52] Witten, I.H., and Frank, E. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 2005.
[53] Wood, H.O., and Neumann, F. "Modified Mercalli intensity scale of 1931." Bulletin of the Seismological Society of America 21.4 (1931): 277-283.
[54] Yao, X., Tham, L.G., and Dai, F.C. "Landslide Susceptibility Mapping Based on Support Vector Machine: A Case Study on Natural Slopes of Hong Kong, China." Geomorphology 101.4 (2008): 572-582.