Predicting Extreme Events: The Role of Big Data in Quantifying Risk in Structural Development

by Oliver Edward Newth

Submitted to the Department of Civil and Environmental Engineering in partial fulfillment of the requirements for the degree of Master of Engineering in Civil and Environmental Engineering at the Massachusetts Institute of Technology, June 2014.

© Massachusetts Institute of Technology 2014. All rights reserved.

Author: Department of Civil and Environmental Engineering, May 9, 2014 (signature redacted)

Certified by: Pierre Ghisbain, Lecturer of Civil and Environmental Engineering, Thesis Supervisor (signature redacted)

Certified by: Jerome J. Connor, Professor of Civil and Environmental Engineering, Thesis Supervisor (signature redacted)

Accepted by: Heidi M. Nepf, Chair, Departmental Committee for Graduate Students (signature redacted)

Abstract

Engineers are well placed to calculate the required resistance to natural and non-natural hazards. However, there are two main problems with the current approach. First, while hazards are one of the primary causes of catastrophic damage, and design against risk contributes greatly to the cost of design and construction, risk is only considered late in the development process. Second, current design approaches tend to provide guidelines that do not explain the rationale behind the presented values, leaving the engineer without any true understanding of the actual risk of a hazard occurring. Data is a key aspect of accurate prediction, though its sources are often sparsely distributed, and engineers rarely have the background in statistics to process it into meaningful and useful results. This thesis explores the existing approaches to designing against hazards, focussing on natural hazards such as earthquakes, and the types of geographic information systems (GIS) that exist to assist in this process. A conceptual design for a hazard-related GIS is then proposed, looking at the key requirements for a system that could communicate key hazard-related data, and how it could be designed and implemented. Sources for hazard-related data are then discussed. Finally, models and methodologies for interpreting hazard-related data are examined, with a schematic for how a hazard-focussed system could be structured. These look at how risk can be predicted in a transparent way which ensures that the user of such a system is able to understand the hazard-related risks for a given location.

Thesis Supervisor: Pierre Ghisbain
Title: Lecturer of Civil and Environmental Engineering

Thesis Supervisor: Jerome J. Connor
Title: Professor of Civil and Environmental Engineering

Acknowledgments

I would like to express my sincere gratitude to my supervisor, Pierre Ghisbain, for his support and assistance throughout my research and study. His knowledge and advice have been invaluable during the writing of my thesis.
I also wish to thank Professor Jerome Connor, my advisor, for his advice throughout the academic program, and my parents, who have advised me on many points of detail and made many valuable suggestions. Funding support was provided through a Kennedy scholarship granted by the Kennedy Memorial Trust, the British memorial to President Kennedy. I would like to thank the trustees, who have provided me with much advice and assistance over the past year.

Contents

1 Introduction
  1.1 Overview
  1.2 Types of Extreme Events
    1.2.1 Naturally Occurring Events
    1.2.2 Non-natural Events
  1.3 Literature Review
    1.3.1 Designing for Risk
    1.3.2 Approaches to Geographic Information System Design
    1.3.3 Existing Hazard Geographic Information Systems
  1.4 Thesis Outline

2 Software Design
  2.1 Introduction
  2.2 Functionality
    2.2.1 Required Functionality
    2.2.2 Optional Functionality
  2.3 Information Architecture
  2.4 User Interface Design
    2.4.1 User Experience Requirements
    2.4.2 Overall Interface Design
    2.4.3 Risk Views
  2.5 Implementation of the Proposed System
  2.6 Conclusions

3 Data
  3.1 Introduction and Overview
  3.2 Data Sources and Display of Data
    3.2.1 International Organizations
    3.2.2 Governmental Data
    3.2.3 Other Sources
  3.3 Variation in Data Sets
  3.4 Copyright Issues
  3.5 Conclusions

4 Interpretation
  4.1 Introduction
  4.2 Types of Risk
  4.3 Hazard Analysis
  4.4 Systems Design
  4.5 Resolution in Risk Calculations
  4.6 Conclusions

5 Conclusions and Recommendations
  5.1 Conclusions
  5.2 Recommendations for Future Development
    5.2.1 Collaboration
    5.2.2 Funding Opportunities

List of Figures

1-1 Total damages (USD billion) caused by reported natural disasters between 1990 and 2012. [11]
1-2 Estimated damage (USD billion) caused by reported natural disasters between 1975 and 2012. [11]
1-3 A choropleth map (top-left), an isoline map (top-right), and a network map (bottom). [32]
1-4 Screenshot of a map output from the HAZUS software package showing associated structural economic loss due to an earthquake. [1]
1-5 A map displaying the likelihood of seismic activity occurring around the world, where the most likely areas are shown in dark red. [14]
1-6 The NATHAN World Map of Natural Hazards. Earthquakes are shown in orange with varying intensities and tropical cyclones are in green. [26]
1-7 Screenshots of the beta version of the Global Earthquake Model.
1-8 Screenshot of Raven with live data from Tacloban, Philippines. [13]
2-1 Proposed design for the interactive risk-focussed GIS. Key navigation is kept to the left of the screen, with primary search and information available in the top left corner of the main area.
2-2 USGS map of United States with spectral accelerations for a return period of 50 years. [30]
2-3 The user interface with a view of earthquake risk for a location in Washington DC. A scale with varying brightness is used to distinguish risk levels, in this case the predicted peak ground acceleration for a return period of 100 years. Risk levels are for illustration purposes only. Mapbox's TileMill (https://github.com/mapbox/tilemill) is used to generate the base map.
2-4 A comparative view showing the difference in risk levels between two proposed locations in Washington DC. Note that risk levels are for illustration purposes only.
2-5 A display of the data breakdown that has contributed to predictions for the comparative view.
2-6 A potential structure for a hazard GIS. Components run by the system owner have a red outline, and external components are outlined in green. Dotted connecting lines signify read-write access, and solid lines represent read-only access.
3-1 Average coordinates of countries from the ISO database, plotted using matplotlib's Basemap toolkit. Size of circle indicates 2012 population, taken from the World Bank database and matched using the country's Alpha-3 code.
3-2 Location of earthquakes between 2004 and 2014.
3-3 Location of earthquakes with deaths due to earthquakes in each country overlaid in red. Size of red circles represents number of deaths.
3-4 Location of earthquakes with deaths due to earthquakes in each country overlaid in red. Size of red circles represents number of deaths divided by population.
3-5 A comparative map showing deaths due to flooding.
4-1 Variations in quantity of earthquakes over a long time period in Northern China. [37]
4-2 Recorded spectral acceleration values correlated with the distance from epicenter for the 1999 Chi-Chi, Taiwan earthquake. It should be noted that there is large variability in ground motion. Predicted distribution is taken from the model by Campbell and Bozorgnia (2008). [2]
4-3 OpenQuake workflow for classical PSHA. [8]
4-4 A suggested approach for structuring an earthquake risk model. [4]
4-5 The framework for GEM, identifying the three primary modules. [25]
4-6 A possible systems design for n hazards. Modules are outlined in red, data storage systems are outlined in blue, and the flow of data is indicated using arrows. Input and output for calculations completed in real time are outlined in green. Annotations at the base indicate the type of data transferred.

List of Tables

1.1 Major Categories of Environmental Hazard [27]

Chapter 1
Introduction

1.1 Overview

When designing a structure, the fundamental role of the engineer is to ensure that it will remain safe to use for the entirety of its design life. There is a large range of extreme events that could lead to structural failure, such as earthquakes, tropical cyclones and flooding. While tools and guidance exist to aid the engineer in designing to mitigate these risks, damage still occurs. Between 1989 and 2009, there were at least 1,029 failure events in the United States due to snow load alone. [21]

Risk is defined as hazard multiplied by vulnerability (Risk = Hazard × Vulnerability), where a hazard is a natural or man-made event that could cause harm to society, and vulnerability is how susceptible something is when exposed to a hazard. [23] The main way that risk can be better mitigated is through ensuring that the engineer and other parties are fully informed about the associated risks at a given location.

Designing against hazards is a primary driver of the cost of construction, as a structure must be designed to be strong enough to withstand any event it is likely to be exposed to in its design life. As design against extreme events plays an important part in the overall cost, the likelihood of such events occurring at a given location should logically be considered from the very beginning of the structural development process. Currently, extreme events are generally considered when an engineer is completing the detailed design of a structure, or by an insurance company when calculating premiums. If risks were considered before any design had begun, even as early as when a developer was deciding where to build a structure, costs could be dramatically reduced by choosing a location where risks were lower.

Through improved access to information about the likelihood of extreme events taking place, engineers, real estate developers and other parties will be able to make more informed decisions that reduce the likelihood of catastrophic damage occurring, reducing the cost and the associated risks at an earlier stage in the decision-making process. Other benefits could also result from a better-designed tool, such as a reduction in the time spent quantifying hazard-related risks. There are a number of geographic information systems (GIS) that aim to provide this type of data visually, where data is typically superimposed upon a digital map. This thesis will consider what systems currently exist for calculating and displaying the probability of extreme events and associated risks, and how these could be improved upon. The focus will be on what an ideal system would be, what data could be used by the system, and how this data could be used to predict the risk from hazards.

1.2 Types of Extreme Events

Hazards can be categorized into two groups: naturally occurring and technological hazards.
When designing a structure, both of these groups must be considered and designed for to ensure risk is adequately mitigated. Smith (2013) defined the categories of hazard as shown in Table 1.1. This thesis will primarily focus on natural hazards, using earthquakes in its examples. Natural hazards are typically geographically influenced, and a GIS is therefore most useful for understanding this type of hazard.

1.2.1 Naturally Occurring Events

Naturally occurring events can lead to extreme loads being applied to a structure. Depending upon the location, these may include earthquakes, storms and volcanic eruptions. Such events may cause lasting damage to a structure, or lead to issues such as flooding which change the surrounding environment. Figure 1-1 shows how storms have led to the greatest financial damages from natural disasters in the Americas, though earthquakes are the largest cause in Asia. There is some overlap between these categories, such as where storms may have led to flooding.

Table 1.1: Major Categories of Environmental Hazard [27]

Natural Hazards:
  Geologic: earthquakes; volcanic eruptions; landslides; avalanches
  Atmospheric: tropical cyclones; tornadoes; hail; ice; snow
  Hydrologic: river floods; coastal floods; drought
  Biologic: epidemic diseases; wildfires

Technological Hazards:
  Transport accidents: air accidents; train crashes; ship wrecks
  Industrial failures: explosions and fires; releases of toxic or radioactive materials
  Unsafe public buildings and facilities: structural collapse; fire
  Hazardous materials: storage; transport; misuse of materials

This thesis will focus on earthquakes, one of the primary causes of economic disruption and a source of a great number of fatalities around the world. The Centre for Research on the Epidemiology of Disasters has reported that the average damage caused by natural disasters has grown significantly over the past 20 years, well above inflation, as shown in Figure 1-2. The data in its database was compiled from various sources, including international not-for-profit agencies and organizations, insurance firms and research institutes. [10] While the figure suggests a significant increase in the amount of damage in recent years, data has become more readily available since the program began in 1988, so part of the apparent rise may simply reflect improved reporting. The view that more damage has occurred in recent years is therefore questionable.

Figure 1-1: Total damages (USD billion) caused by reported natural disasters between 1990 and 2012, broken down by region (Africa, the Americas, Asia, Europe and Oceania). [11]

Figure 1-2: Estimated damage (USD billion) caused by reported natural disasters between 1975 and 2012. [11]

Earthquakes

Earthquakes are one of the main sources of structural damage in the world, particularly in Asia, where earthquakes were associated with over 500 billion US dollars of damage between 1990 and 2012. [11] Risk from earthquakes is currently mitigated primarily through the use of design codes and guidelines, which provide instructions on how to design a structure that will withstand extreme load scenarios (see Section 1.3.1).
Modeling software such as CSI's SAP, Autodesk's Revit and Oasys Software's GSA Suite can be used to check a structure programmatically. Where static analysis is used to consider how a structure will respond to unchanging load scenarios, dynamic analysis is generally used to consider how a structure will respond to changing load cases such as earthquakes. The structural elements and the overall design of a structure can then be checked for suitability.

The type of subsoil that the structure is built upon is one of the significant influences on the risk of an earthquake damaging a structure: hard rock, for example, presents different load scenarios to a structure than soft clay soil. This is one of the factors that needs to be taken into account when considering the exposure of a structure to an earthquake. The Unified Soil Classification System (USCS) is the standard for classifying soils in North America.

Other natural events

Tropical cyclones are a major cause of damage across the Americas. These include hurricanes, tropical storms and cyclonic storms, and they typically form over bodies of relatively warm water. A hurricane is the same phenomenon as a typhoon; 'hurricane' is the regional term used in the northern Atlantic and northeast Pacific. Hurricanes pose the greatest threat to life and property: the US coastline is typically struck by three hurricanes every two years, one of which is classified as a major hurricane. Floods from heavy rain can also cause excessive damage and deaths. In 2001, Tropical Storm Allison killed 41 people and caused around 5 billion US dollars of structural damage [20], and it should be noted that Allison was only a tropical storm, not even a hurricane.

1.2.2 Non-natural Events

The predominant source of non-natural extreme events is fire. Other events such as those caused by terrorism may also be considered in the design of a structure, though terrorism causes significantly less catastrophic damage than other types of events. While it is important to consider how a structure could be damaged by collisions from vehicles, this tends to be a localized issue rather than one that is geographically influenced; a GIS is therefore likely to be of less benefit when designing against this kind of risk.

1.3 Literature Review

Various resources play an important role in the risk assessment process. Hazard-related data may be collected from global and national databases to gain an understanding of the history of events at a given site. Socio-economic data may also be used to gain a meaningful understanding of the probability of loss of life and damage occurring, and physical data may be used to understand the vulnerability of a specific structure. Industry standard documents are often used by engineers to simplify a problem in the design process, giving typical safe values that can be used in calculations. There are also a number of geographic information systems that aim to combine hazard-related data with methodologies to provide an overall view of the associated risks. This section will go through each of these, identifying what currently exists and any associated issues.

1.3.1 Designing for Risk

Designing for risk in North America

The American Society of Civil Engineers (ASCE) has been the long-standing provider of standards and guidance for the American structural engineering community. ASCE 7 is a standard that defines the minimum design loads that should be applied to structures and specifies the dead, live and extreme load cases that need to be considered.
Section 1.5 in ASCE 7-10 defines different risk classifications and how important the different load cases are depending upon the structure's intended use. For structures that represent a low risk to human life in the event of a failure, the lowest risk category is used. At the other extreme, a structure may be designated as an essential facility (for example, a facility that processes hazardous waste). [21]

For calculating design parameters for earthquakes, the US Geological Survey (USGS) provides tools that allow engineers to generate design values based upon the location of a site and the ASCE (and other) guidelines. All tools allow latitudinal and longitudinal coordinates to be used, with some tools also permitting postal addresses. However, these tools provide values for a single site or a selection of locations (through batch requests), rather than allowing comparison between different locations.

Designing for risk in other countries

The Eurocodes are technical rules produced by the European Committee for Standardisation; they were made mandatory for public structures in 2010, replacing the British Standards that were previously used in many countries. The International Organization for Standardization (ISO) provides a number of guidelines that aid in the design and analysis of structures. These have been adopted by countries such as Ethiopia as national standards [22], though it is unclear how widely the standards are used globally. Some of the other major standards used worldwide are AS/NZS 2002 (Australia and New Zealand), NBCC 2005 (Canada) and AIJ (Japan). A study conducted at the University of Notre Dame in 2009 aimed to find the differences and similarities between the different standards with respect to their approaches to calculating wind actions on structures. The study concluded that while the general approaches are moderately similar, there is significant variability between the values produced by the different equations. [17]

1.3.2 Approaches to Geographic Information System Design

The functionality provided by a GIS should contain, at a minimum, the following elements: data input, preprocessing, data management, basic spatial queries, geographic analysis and output generation. [5] In the design of such a system, all of these areas must be considered to produce an adequate solution. Three approaches to showing geographic data visually are the choropleth map, the isoline map and the network map (see Figure 1-3). A choropleth map is a thematic map designed to show statistical data related to geographic areas: each area is shaded according to the value of the variable it represents, with darker shades representing greater values. An isoline map uses continuous lines to join points where a variable has the same value; a typical example is a map whose lines represent altitude in the form of contours. Network maps show the connectivity in networks, where points are connected to show the sharing of a specific variable.

1.3.3 Existing Hazard Geographic Information Systems

A number of proprietary and open-source systems have been developed for analyzing risk due to hazards. A subset of these considers catastrophe modeling, also known as cat modeling, which looks to estimate the losses that would occur due to an extreme event.
This is particularly relevant for the insurance industry, which typically uses such software to calculate the risk to the properties in its portfolio, and therefore the associated premiums. This section considers some of the primary GIS that currently exist, along with their benefits and drawbacks.

Figure 1-3: A choropleth map (top-left), an isoline map (top-right), and a network map (bottom). [32]

HAZUS - Federal Emergency Management Agency

HAZUS is a national tool developed by the Federal Emergency Management Agency (FEMA) that allows individuals to calculate the losses that would occur due to natural disasters (see Figure 1-4). It is useful for local and state governments in calculating risk for their areas. At the time of writing, the software is freely available for the public to download and use within the United States, and available to order in other countries.

Figure 1-4: Screenshot of a map output from the HAZUS software package showing associated structural economic loss due to an earthquake, with building losses ranging from under $200,000 to more than $50 million. [1]

HAZUS combines hazard models with geographic information systems (GIS) to provide estimates of the damage that would occur under different extreme load scenarios. While the software package provides valuable information for engineers and other parties about the associated risk and the potential economic loss and social impact, it is highly complex and only provides information for the United States. For example, the guidance on how to use the flood model alone is 499 pages long. [1] The software is therefore useful in providing detailed technical calculations for risk profiles at a late stage in the design process for a structure within the United States, but provides limited assistance in the decision-making process for choosing a location for a structure.

FEMA has also sponsored the production of a software package named ROVER (Rapid Observation of Vulnerability and Estimation of Risk) that allows individuals to conduct safety assessments of structures before and after an earthquake via a mobile device.
Data is also available in the form of a near real-time feed (every few minutes for California and within 30 minutes for the rest of the world), and have data and models available for particularly active zones such as around San Francisco. NATHAN Risk Suite - Munich Re NATHAN is a commercial product produced by Munich Re that aims to provide analysis for risk, primarily targeting insurance companies. They produced their own models for modeling catastrophe risk and provide online risk analysis tools to their customers through a digital portal. [26] One of the products from this suite is a world map of natural hazards, shown in Figure 1-6. While the research was undertaken separately to the GSHAP, their findings for earthquake risk were similar. Predictions from these programs should be treated with caution. In one study, low correlation was found between the earthquakes with the greatest body count and the predicted high risk areas from models such as that produced by the GSHAP. [16] 25 GLOBAL SEISMIC HAZARD MAP Figure 1-5: A map displaying the likelihood of seismic activity occurring around the world, where the most likely areas are shown in dark red. [14] Figure 1-6: The NATHAN World Map of Natural Hazards. Earthquakes are shown in orange with varying intensities and tropical cyclones are in green. [26] 26 The Global Earthquake Model (GEM) GEM is a public-private partnership that began in 2006, created by the Global Science Forum that is part of the Organisation for Economic Co-operation and Development (OECD). Its purpose is to create an open-source risk assessment solution specifically for earthquakes. They are developing a web-based platform that will enable users to calculate, visualize and investigate earthquake risk anywhere in the world. They released the first version of the OpenQuake engine in 2014, which is part of a wider suite of tools, including a platform and tools for modeling. Currently the engine does not have a graphical user interface, so calculations must be completed using the command line. The engine is largely based upon OpenSHA (http://www.opensha.org), an open source, Java Based platform which was designed to process seismic hazard analysis (SHA). [8] Figure 1-7 shows two screenshots of the beta software, which aims to make the large data-sets easier to access. They have also created a cloud-based solution called the OpenQuake Alpha Testing Service (OATS), which a user can request access to through their website. The project is open-source, and one of the particularly interesting elements is around how they work with local and regional experts to not only feed in local data, but also to create models specific for localities. For example, China-specific earthquake models can be built into the software. One of the main issues, however, is that the views are potentially still hard to interpret, as there is often a large amount of information shown at any given time. It is also only for earthquake risk, so does not provide any comparison between the level that earthquakes should be considered compared to other types of risk. While GEM does require a technical background to setup and use correctly, it has many advantages over existing systems. As this project is open-source, data and source could be potentially integrated into a wider solution. The scientific framework behind this system is discussed in Section 4.4. 27 OPENOUAKE )PENOUAKE (b) Hybrid view of hazard data. (a) Granular population counts in China. 
Figure 1-7: Screenshots of the beta version of the Global Earthquake Model: (a) granular population counts in China; (b) a hybrid view of hazard data.

Other hazard-related geographic information systems

There are a number of companies in addition to Munich Re that provide catastrophe models to industry. One of the notable companies is Risk Management Solutions (RMS), which provides catastrophe modeling for nine major types of event, including natural disasters such as earthquakes, tropical cyclones and windstorms, as well as man-made disasters such as terrorism and infectious pandemics.

Other non-hazard geographic information systems

There are a number of commercial platforms that have been created to allow individuals to explore and interpret data. One example is Raven, a software package created by Palantir and described as a "high-performance, web-based geospatial analysis application." The platform focusses on integrating real-time data and has primarily been used in natural disaster recovery operations; it was recently used to assist in the response to Typhoon Haiyan, a catastrophic tropical cyclone that struck in 2013. [13] A software engineer at Palantir said that when creating the tool, the focus was on making the data easily accessible, leaving any interpretation of the data to the end user (W. Macrae, personal communication, February 3, 2014). The system primarily uses open-source data sets, namely changes in OpenStreetMap, to produce its filterable output. A screenshot of the Raven system is shown in Figure 1-8.

Figure 1-8: Screenshot of Raven with live data from Tacloban, Philippines. [13]

Dangers related to an easy-to-use hazard-based geographic information system

While such a system informs a larger body of users about the risks that various hazards present at a given geographic location, there is a danger that the results could be misunderstood, misinterpreted or incorrect if the system is not properly configured for a specific use case. It is therefore important that users of such a system are made aware of the dangers of using this information incorrectly. There have been a number of programs undertaken to train users in how to use GIS correctly, with professional organizations pushing to train their workforces in utilizing these systems. Resources and tutorials on how to use and understand the system should be provided to end users, though these should be easy to understand without oversimplifying the contents.

1.4 Thesis Outline

The following chapters will focus on three main areas: the design of a tool that could make hazard-related information easily accessible to a global audience; a study considering which data sources could be used to provide appropriate data and how this could be visualized; and an examination of existing risk models, with a proposed systems design for a multi-hazard GIS.

Chapter 2 considers the primary factors behind the design of a hazard-related GIS. This chapter concentrates on the requirements for the system, what factors should be considered to ensure a user-centered design process is followed, and which issues a developer may come across when seeking to implement this solution. Proposed user interface designs are presented, and an overall structure for the system is proposed.

Chapter 3 focusses on what national and international data sets could be used to quantify the probability of different hazards occurring.
This chapter predominantly considers data directly related to earthquakes and how this data can be visualized, in addition to considering how variation between data sets can be accounted for.

Chapter 4 looks at the models and methodologies that exist for interpreting risk-focussed data, and how these could be used with the aforementioned data sets to create an integrated, accurate and transparent solution. The chapter concludes with a proposed systems design for quantifying risk.

The final chapter reflects upon the key conclusions from this research, and includes recommendations for future study and development.

Chapter 2
Software Design

2.1 Introduction

While there has been a large amount of research into predicting different natural disasters and the associated risks, the tools available to make use of this data are complex and generally require expertise and a large amount of time to utilize. The aim of this chapter is to explore what type of tool would be useful to the engineering and planning communities at the early stages of development, to aid in the decision-making process.

Transparency will be key to such a tool's success. Current solutions usually take a 'black-box' approach, where outputs are presented without any explanation of how they were derived. This system should be designed so that it is clear how any values are calculated. The overall user interface should also be easy to learn and use, with core functionality easy to find. The ability of the developer to fulfill these requirements is likely to have a significant impact on whether such a tool is adopted.

One of the movements in the software development industry has been to develop "cloud-based" solutions, where digital resources are stored on a network of servers. The software application of this is referred to as software as a service (SaaS), where the software and associated data are stored in the cloud. As this tool will require a large amount of computational power, large data sets and location-specific information, a SaaS solution is likely to be more relevant than a desktop package, and we will focus on this throughout the chapter. This is because SaaS solutions are able to use distributed computing networks to perform calculations, large amounts of data can be stored affordably, and results can be cached so that fewer computing resources are required.

User Groups

There are a number of parties who would benefit from such a system. The system is likely to be based upon macro-scale data covering large geographic areas; site-specific data, such as data from geotechnical investigations, would still be collected later in the design process. The system is therefore expected to be primarily of benefit early in the design and decision-making stages, such as when deciding where a structure should be located. External parties such as governmental organizations may also use such a system when assessing how to mitigate large-scale risk.

Engineers
This is likely to be the primary user group for the platform, as engineers are usually expected to analyze risk at the various stages of design in a structural project. This group typically uses a wide range of complex software and would be expected to use a hazard GIS primarily to assist in predicting the levels of risk expected in a structural project.

Real estate developers
Developers generally take on the greatest risk in the creation of real estate.
As such, minimizing the associated risk and costs in a construction project would be of great benefit to them. This system would help developers decide upon a location for a structure, ensuring the risk of extreme events is considered at an early stage. Choosing a lower-risk site could therefore benefit developers by reducing the cost of construction and the associated risk.

Governmental organizations
This software could be used to help inform local and national governments about the likely risks in their areas and could help them design against such hazards. These users will be able to approach the data from a social vulnerability viewpoint, and consider how government funds can best be utilized in their area of interest.

Not-for-profit organizations
National and international organizations may be interested in this software as it may provide information to allow them to focus aid and support on particular areas. For example, international aid organizations may wish to use such data to focus financial support on locations that are likely to experience a large natural disaster in the coming years.

Insurance industry
Insurers could use the system to help evaluate the likelihood of loss for their portfolio of properties under given scenarios, and to evaluate the sensitivity of their locations of interest.

2.2 Functionality

2.2.1 Required Functionality

The design requirements have been broken down into the main components as defined by Withall (2007) in the book 'Software Requirement Patterns'. [36] Each of these required areas is discussed below.

Data entity
This refers to the individual elements of data that will be stored. One of the main data entities will be location information, most likely accessed through a mapping application programming interface (API) such as the Google Maps API. Data on different levels of risk (previous earthquakes, historical weather information, etc.) will also be required. The way that this data is stored and processed will be important so that the breakdown of any final figures of risk can be clearly shown. For example, rather than simply showing that the likelihood of a severe earthquake is high at a particular location, the historical data on earthquakes and the other factors that affect the prediction should also be accessible.

Information
The data entities need to be efficiently stored and processed to calculate the risk levels. Data should also be stored in a way that allows it to be efficiently compared, allowing the user to query, filter and order this data (for example, the user may wish to order a set of locations by their level of risk).

User function
This primarily refers to how the data will be accessed by the user. To ensure the system can be extended for access via a range of devices (such as mobiles or tablets), the core functionality should be separated from the user interface by designing it such that data is accessed through an API. This will ensure that other versions, such as mobile or tablet versions of the application, can be produced relatively easily later. It will also help to ensure that the system follows a core programming principle called 'DRY' (don't repeat yourself), a concept that states that code should not be repeated where possible. This aims to improve the serviceability and reliability of software, as changes to code only need to be made once, rather than multiple times as in code that does not follow the 'DRY' principle. There are a few core user functions that need to be built into the system. These include views of different types of hazard, location querying (where a user can search for a specific place by street address, post code, etc.), location comparison and risk breakdown (showing a breakdown of the contributing factors for each type of hazard); a sketch of how such functions could sit behind an API follows.
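The sketch below illustrates this separation of the core functionality from the interface. It is a minimal example only, assuming the Flask micro-framework and a hypothetical risk_breakdown helper; the thesis does not prescribe a particular framework, and a real implementation would query the hazard database and models described later.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def risk_breakdown(lat, lon):
    # Hypothetical core function: a real system would query hazard
    # data and models. Fixed values are returned here for illustration.
    return {"earthquake": 0.72, "typhoon": 0.31, "flood": 0.15}

@app.route("/api/v1/risk")
def risk():
    # Web, mobile and tablet interfaces all call this one endpoint,
    # so the core logic exists in exactly one place (the DRY principle).
    lat = float(request.args["lat"])
    lon = float(request.args["lon"])
    return jsonify(location={"lat": lat, "lon": lon},
                   hazards=risk_breakdown(lat, lon))

if __name__ == "__main__":
    app.run()

# Example request: GET /api/v1/risk?lat=38.90&lon=-77.04
```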
Performance
The system should be designed so that it is capable of supporting the required peak capacity. Designing the system on a cloud computing platform is likely to ensure this requirement is satisfied. The system should also be designed to ensure high availability and low response times; tools such as New Relic (http://newrelic.com) could help in designing for this requirement.

Flexibility
The system should be designed modularly to ensure that it can be extended at a later stage. This will make it easier to add additional features later, or to allow the system to be customized for different types of users. Designing the software using object-oriented principles could help to fulfill this requirement, as this will increase the modularity of the code and improve readability.

Access control
User accounts will be required, primarily to store user data such as the locations of project sites and other user-specific data such as preferences. Access control would also be necessary for a private management portal, where only authorized users should have access to monitor and maintain the system.

Commercial
If premium features are added at a later stage, such as data and guidance from design standards, it may be necessary to add commercial features such as payment options and corporate/multi-user accounts that allow enterprises to purchase a license for multiple users. Access control would be required for commercial features to be implemented.

2.2.2 Optional Functionality

A few proposed optional features are listed below. It is likely that these will grow over time as users request additional features.

Real-time view of data
The system will primarily focus on predicting long-term risk, rather than looking at immediate risk. However, the system could be scaled to provide a view of current issues affecting people around the world if real-time access to hazard data is available. Early warning systems such as ShakeAlert for earthquake warnings have a highly practical application in terms of saving lives, though their applicability to long-term planning and construction is limited. It has also been proposed that individuals be able to contribute to these data sets by hosting seismic stations, which can provide detailed real-time data on ongoing natural disasters.
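Real-time hazard data is already exposed by some providers: for instance, the USGS earthquake catalog mentioned in Section 1.3.3 can be queried over HTTP and returns GeoJSON. The snippet below is an illustrative sketch of consuming such a feed; the query parameters shown are a small subset of those the service accepts.

```python
import requests

# Query the USGS earthquake catalog for one month of magnitude 5+ events.
resp = requests.get(
    "https://earthquake.usgs.gov/fdsnws/event/1/query",
    params={"format": "geojson", "starttime": "2014-01-01",
            "endtime": "2014-02-01", "minmagnitude": 5.0},
)
resp.raise_for_status()

# Each GeoJSON feature is one earthquake, with a magnitude, a place
# description and coordinates (longitude, latitude, depth in km).
for feature in resp.json()["features"]:
    props = feature["properties"]
    lon, lat, depth_km = feature["geometry"]["coordinates"]
    print("M%.1f %s (%.2f, %.2f)" % (props["mag"], props["place"], lat, lon))
```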
For example, requirements are considerably different for a non-critical structure such as a house, compared to a critical structure like a hospital. 2.3 Information Architecture It is important to consider how the information will be collected, processed and stored within the system. This will also ensure that the system is able to operate efficiently. Not all data will need to be stored in the system's own database. Instead, public APIs can provide this data on-demand. An example is the 'Yahoo BOSS PlaceFinder', which converts street addresses into geographic coordinates. The basic types of data that a hazard GIS would require outside of data available on-demand through public APIs is as follows: 36 Hazard types An overall collection of the types of hazards stored in the system. This will include the properties and descriptions about each one. For an earthquake, the properties would include the location of the event, magnitude and magnitude scale type. Hazard data This will contain the data relating to specific events. This data should correlate with a particular hazard type, and should include the properties this hazard type defines as required. As data from different sources is likely to be collected and matched, the source of hazard data should also be included. Model data This would include the data output from programmatic models, which would be cached to reduce the load on servers and time required to load pages. Geographic data Data on countries, continents and other geographic areas would be stored in the system. This would comprise of information such as border co-ordinates, the name of the geographic area, and other area-specific information. User data Information such as personal preferences and sites of interest may be stored in the database to allow users to return without repeating previously completed personalization. 2.4 2.4.1 User Interface Design User Experience Requirements One model that outlines how a product should be designed is the Technology Acceptance Model (TAM) that was developed by Davis in 1989. [9] According to the model, software will be adopted if it fulfills two basic criteria: it must be perceived 37 to be easy to use, and perceived to be useful. The model has been modified more recently to include intrinsic motivations for using technology. Wang (2012) concluded that primary intrinsic contributors to acceptance include emotional response, imaginal response and flow experience, where each of these significantly influences how likely a user is to adopt using a technology. 2.4.2 [34] Overall Interface Design A proposed user-interface is shown in Figure 2-1. The design features an interactive map with layers of risk overlaid. Users are able to search for a location and modify the selected view to show different types of risk. When a location is selected, a pin shows the point visually on the map and the hazards for this location are shown in the top left of the screen. Here, the relative probability of severe events occurring for the given site are shown in a bar chart format. These aim to give the end user a quick understanding of what the primary risks are that they need to consider for the given site. In the case of the site in Figure 2-1, the charts indicate that typhoons are the most severe risk for that specific location. As the user changes the location, they will be able to see the risk levels adjust accordingly. The interface design aims to achieve the targets specified in the TAM. 
2.4 User Interface Design

2.4.1 User Experience Requirements

One model that outlines how a product should be designed is the Technology Acceptance Model (TAM), developed by Davis in 1989. [9] According to the model, software will be adopted if it fulfills two basic criteria: it must be perceived to be easy to use, and perceived to be useful. The model has more recently been modified to include intrinsic motivations for using technology: Wang (2012) concluded that the primary intrinsic contributors to acceptance include emotional response, imaginal response and flow experience, each of which significantly influences how likely a user is to adopt a technology. [34]

2.4.2 Overall Interface Design

A proposed user interface is shown in Figure 2-1. The design features an interactive map with layers of risk overlaid. Users are able to search for a location and modify the selected view to show different types of risk. When a location is selected, a pin marks the point visually on the map, and the hazards for this location are shown in the top left of the screen. Here, the relative probabilities of severe events occurring at the given site are shown in bar chart format. These aim to give the end user a quick understanding of the primary risks they need to consider for the given site. In the case of the site in Figure 2-1, the charts indicate that typhoons are the most severe risk for that specific location. As the user changes the location, they will be able to see the risk levels adjust accordingly.

The interface design aims to achieve the targets specified in the TAM. The view is purposefully simplistic, with only the most important information displayed. In order to show the usefulness of the data, hazard information is shown prominently on the screen at all times once a location has been specified.

2.4.3 Risk Views

The different types of risk are overlaid on the map. This proposal shows three risk views: earthquakes, typhoons and flooding. These could be expanded as other types of risk are considered, and further searching and filtering options could be added in later versions of the application. It is important that the system is designed so that it is easy to use.

Figure 2-1: Proposed design for the interactive risk-focussed GIS. Key navigation is kept to the left of the screen, with primary search and information available in the top left corner of the main area.

It is also important that views of risk are interpreted correctly. Previous views of earthquake likelihood are typically shown using variation in hue, which can be hard to interpret correctly. Figure 2-2a shows a map produced by USGS where a color gradient is used: it may mislead some viewers that the green areas signify medium risk, or that dark brown signifies higher risk than red. Saturation is easier to interpret correctly; an equivalent figure is shown in Figure 2-2b. It should also be noted that a non-uniform scale is used, which could lead to further misinterpretation of the displayed values.

The type of risk shown on the map can be selected using the navigation bar on the left of the screen. A proposed design showing the associated risk from earthquakes is presented in Figure 2-3. The risk level for earthquakes is indicated through varying brightnesses of red, with the scale displayed at the bottom of the screen. This allows the user to understand how risk varies around a given site. Views of other risk types can then be selected, with icons for typhoons and flooding shown on the navigation bar. While typhoons may lead to flooding, there can be alternative causes of flooding; both are shown so the user can consider whichever is of greatest importance for their project.

Figure 2-2: USGS map of the United States with spectral accelerations for a return period of 50 years: (a) the original map; (b) a map where the colors have been replaced with a greyscale gradient of varying brightness. [30]

Multiple sites can be directly compared, as shown in Figure 2-4. Here, both sites are indicated using pins on the map with corresponding colors, and the comparative risk levels are shown in the 'Hazards' box. These two sites are fairly close, so risk levels do not vary hugely; by showing the risk levels alongside each other, however, the user is able to quickly see that the risk of earthquakes is higher at the first site. Risk levels are for illustration purposes only.

A breakdown of the different contributors to a risk calculation is shown when a type of risk is selected (see Figure 2-5). This allows the end user to gain an understanding of the reasoning behind a prediction, and the clarity that this brings should ensure the end user is fully informed about the associated risk and why this is the case. An engineer is also able to retrieve key variables required in technical calculations. Default settings are shown below the values, though they can be manually changed by selecting them and choosing an alternative value from a dropdown menu.

A number of other views could also need to be developed. These would cater for the different user groups and their requirements. For example, if engineers were to use this system as a replacement for all of the hazard analysis systems they previously used, they would require significant control over all variables affecting each type of risk, and all the relevant output parameters needed to analyze a structure.
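Return periods like those in Figures 2-2 and 2-3 are often misread as "once every N years". A more transparent presentation, which a tool like this could show alongside the map, is the probability that the mapped ground motion is exceeded at least once during a structure's design life. The snippet below is a sketch of the standard conversion, assuming event occurrences follow a Poisson process.

```python
import math

def exceedance_probability(return_period_years, design_life_years):
    """Probability of at least one exceedance during the design life,
    assuming occurrences follow a Poisson process."""
    annual_rate = 1.0 / return_period_years
    return 1.0 - math.exp(-annual_rate * design_life_years)

# A 100-year return period (as in Figure 2-3) over a 50-year design life:
print(exceedance_probability(100, 50))   # ~0.39, i.e. a 39% chance
# A 2475-year return period over 50 years gives the familiar ~2% in 50 years:
print(exceedance_probability(2475, 50))  # ~0.02
```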
Figure 2-3: The user interface with a view of earthquake risk for a location in Washington DC. A scale with varying brightness is used to distinguish risk levels, in this case the predicted peak ground acceleration for a return period of 100 years. Risk levels are for illustration purposes only. Mapbox's TileMill (https://github.com/mapbox/tilemill) is used to generate the base map.

2.5 Implementation of the Proposed System

Having considered the required features of such a system and how it could be designed, we shall now consider how it could be implemented. This section will discuss how the system could be structured, what developer tools exist for creating a GIS, which technologies may be appropriate for this system, whether native versions should be created for desktop and mobile, and finally how the public could be made aware of its existence.

Overall system structure

A proposal for an implementation of such a system is shown in Figure 2-6. In this diagram, the system is split into three main sections: data storage, data processing and external access.

Data storage includes the storage of any core data. Static files would most likely be stored in a low-cost, redundant system such as Amazon Web Services' Simple Storage Service (AWS S3). This would serve static public-facing files such as the HTML, JavaScript and image files, which are stored on servers distributed around the world and served to a client device from the nearest data center. This reduces page load times and improves the overall visitor experience.
An example of such a file might be the HTML page for the home screen, which would possibly need to be served from a server depending upon the configuration of the system. External access covers how people access the data. The public-facing portal is where the public see the system, and this would interact with the static files and the real-time calculation server. This would be similar for the private management portal (or administration area), where the managers of the system could control the data and how the system is operating. An external access API may also be available for developers to create their own solutions using the data from the system, in a similar manner to how this system interacts with other companies' APIs for data such as map images. The proposed structure is simplistic: it is likely that an API would be created to control the flow of data between any public-facing areas and the server and storage areas. However, it should provide an idea for how such a system could be created. 43 Data storage Static file storage System database External data sets Maps API Location query API Data processing Real-time calculation server Periodic Cloud static server Private management portal (developers; other integrations) calculation server External access Public-facing portal External access API Figure 2-6: A potential structure for a hazard GIS. Components run by the system owner have a red outline, and external components are outlined in green. Dotted connecting lines signify read-write access, and solid lines represent read-only access. Mapping tools available for developers An existing mapping provider could be used to provide the underlying base map. Assuming the system is likely to be a server-based solution (in contrast to desktop or mobile software which is often more time-consuming and therefore costly to create), a JavaScript-based package would best suit this project. Google Maps for Business (http://google.com/enterprise/mapsearth), Mapbox (http://mapbox.com), CartoDB (http://cartodb.com) and ArcGIS (http://esri.com/software/arcgis) are four of the leading providers of server-based mapping developer tools, and generally charge based upon number of requests sent to their servers. Some of these providers also provide analytical tools that can provide interpretation and visualization to a developer. Technologies for back-end system The two languages most popular in the statistical community are R and Python, where Python is becoming increasingly widely used. Python is a high-level language that is built on-top of another language. The most commonly used version is CPython which is based upon C. Python has a number of benefits over R and other languages. First, it is easy to learn. The language syntax is purposefully readable and the language popular in the scientific community, meaning there is 44 a lot of support and documentation available. Second, Python has an extensive number of scientific packages available that extend Python's core offering to include sophisticated functions and other capabilities. Third, Python is efficient. Built-in generators allow tasks to be completed using less memory, and there have also been a number of projects that have aim to improve the performance of Python such as PyPy (http://pypy.org), which further reduces Python's speed and memory footprint. Technologies for front-end interface The language selected for programming the user-facing interface should be carefully considered. 
Technologies for the front-end interface

The technology selected for programming the user-facing interface should also be carefully considered. Ruby on Rails (http://rubyonrails.org), also known simply as Rails, is currently a popular framework for developing front-end systems, as it removes repetitiveness from the development process. Web applications can therefore often be built in significantly less time than with other web-based programming languages such as PHP (though frameworks with the same aim have been developed for other languages). Building the system with a framework such as Rails may therefore reduce development time, which in turn helps to reduce the cost of creating the system.

While traditional websites dynamically generate user-facing content on the server, developers are starting to use JavaScript front-end model-view-controller (MVC) frameworks that allow pages to be generated dynamically in the end user's browser. This can reduce the load on servers (as the server generates less content) and improve load times for visitors. Examples of such frameworks include Backbone.js (http://backbonejs.org), AngularJS (https://angularjs.org) and Ember.js (http://emberjs.com).

Open source GIS frameworks

A number of GIS frameworks have been created to make the process of producing a GIS easier, again removing repetitiveness from the development process. One example is GeoNode (http://geonode.org/), which describes itself as an "open source geospatial content management system". It aims to provide an underlying system that makes it easier to share interactive maps and geospatial data, and it can be extended by a third party to produce the desired final system. QGIS (http://www.qgis.org/) is another free and open source system, available for desktop, server or online applications.

Native desktop and mobile versions

Generally, SaaS solutions are able to bring the benefits that come from scale to the end user, whereas a standalone version would lead to issues such as a large amount of disk space being required on the end user's computer. Two possible solutions are to produce a native application that sends requests to and receives responses from the server remotely, or to produce a native application with an online interface embedded within it. Creating a fully native application may provide a higher-quality user experience, as it would be written in the operating system's native language and could therefore utilize native functionality. However, it is typically more time-consuming to create, as the interface effectively has to be recreated. Embedding an online interface in a native application requires significantly less time and fewer resources, and also means that updates can be rolled out immediately, as with SaaS solutions, because an update changes code stored on a server rather than code stored on the end user's device. A similar approach can be applied when developing applications for mobile devices. Two examples of open-source projects that allow developers to embed online interfaces are PhoneGap (http://phonegap.com), a solution for mobile devices, and MacGap (https://github.com/maccman/macgap), a solution for the Mac.

Public awareness

If funding is invested in developing such a system, it will be important for the public to be aware of its existence. One way to raise awareness of the project would be to encourage organizations that currently publish hazard-related information to publicize the system through their own sites.
An example would be the USGS, a primary provider of earthquake-related data to engineers. If organizations such as the USGS show support for the system on their websites, more individuals are likely to use it. Introductory seminars could also be given to inform the public about the system. These could be tailored to different audiences; a seminar for engineers, for example, could discuss the more technical aspects of the system.

2.6 Conclusions

For a suitable system to be developed, the user interface, the type of data that will be displayed and the approach to system development must each be carefully considered from the outset. The user interface must be designed such that it fulfills the core requirements, and it should aim to follow the guidelines specified by the TAM to encourage users to adopt the system. The type of data shown should be sufficient for each of the core user groups.

The system should be designed so that it is scalable in the future. Approaches such as object-oriented programming will increase the modularity of the code, making future developments easier to implement. Open source projects could be utilized to build the system in less time and with fewer resources.

In Chapter 3, we will consider the type of data that such a system would require, the sources that could provide this data and how the data can be presented. The chapter will also consider some of the main issues encountered when using different data sources and the approaches that exist for overcoming them.

Chapter 3

Data

3.1 Introduction and Overview

There are a number of data sets available that document events due to natural hazards, although they are often hard to find and interpret. This chapter looks at the types of data sets that currently exist, how these data sets can be shown graphically, and the issues that arise when displaying this data. The data discussed in this chapter could then be made available through a tool such as the one proposed in Chapter 2.

This study was conducted using the Python programming language and a number of libraries, including Pandas (http://pandas.pydata.org), Matplotlib (http://matplotlib.org), NumPy (http://numpy.org) and SciPy (http://scipy.org). The Matplotlib Basemap Toolkit (http://matplotlib.org/basemap/) was also used for creating the maps.

3.2 Data Sources and Display of Data

Levels of risk vary widely depending upon the geographic location of the site in question. This study will primarily consider seismic risk as an example, and then compare the approach to how data could be collected for other types of natural hazard. For earthquakes, risk is mainly dictated by how close a site is to an active fault, as shaking is strongest near the fault and can therefore result in substantial damage to structures in these areas.

In order to compare levels of risk, data needs to be gathered from a number of different sources. International not-for-profit and governmental organizations provide a large amount of information that aids in this analysis. Specialist organizations (such as the ASCE) and university research also provide valuable data.

3.2.1 International Organizations

International organizations provide a large amount of freely available data on countries worldwide. For this study, geographical data was required for all of the countries around the world. A data set was used that relates the Alpha-3 code¹ to the mean latitude and longitude values for the majority of countries around the world. [28]

¹ Alpha-3 codes are three-letter country codes defined within the ISO 3166 standard which provide a standardized representation for countries, dependent territories and areas of geographic interest.
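As a minimal sketch of how such a data set could be displayed, the listing below plots the country centroids on a world map using the Basemap toolkit mentioned above. The file path and column names are assumptions; the real data set uses its own headers.

import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

# Hypothetical extract of the ISO 3166 data set: alpha3, lat, lon columns
coords = pd.read_csv("iso3166_coordinates.csv")

m = Basemap(projection="robin", lon_0=0)  # Robinson projection of the globe
m.drawcoastlines(linewidth=0.5)
x, y = m(coords["lon"].values, coords["lat"].values)  # project to map coordinates
m.scatter(x, y, s=10, zorder=5)
plt.show()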
It is important to go further than basic hazard information and consider socio-economic factors, so as to take into account the social and economic vulnerability at a location. Indicators of such factors may include population-related, economic and governmental measures. To investigate the relationship between natural disasters and socio-economic factors, population data from the World Bank was used for countries around the world. [29] This data is visualized in Figure 3-1.

Figure 3-1: Average coordinates of countries from the ISO data set, plotted using Matplotlib's Basemap toolkit. The size of each circle indicates the 2012 population, which was taken from the World Bank database and matched using the country's Alpha-3 code.

World Health Organization

The World Health Organization's (WHO) Centre for Research on the Epidemiology of Disasters (CRED) created an international database called EM-DAT. This provides a tool that allows individuals to create customized data sets filterable by time, location and type. The output includes the number of individuals killed, injured, affected and made homeless, as well as the total financial damage caused by natural and non-natural disasters. [12]

3.2.2 Governmental Data

Typically, a hazard-related system will incorporate dozens of different data sets, and many of these will come from governmental organizations. In this section, we will focus on data related to earthquakes, looking at three sources in particular: the United States Geological Survey, the International Seismological Centre and the National Climatic Data Center.

United States Geological Survey

Event sets from the United States Geological Survey (USGS) Earthquake Archive Search were used to show the locations of earthquakes that occurred around the world between 2004 and 2013. [31] This data is visualized in Figure 3-2a, where the epicenter of each earthquake is plotted as an opaque point, and in Figure 3-2b, where the size of each circle is defined by the magnitude of the earthquake and each point is translucent.

Figure 3-2: Locations of earthquakes between 2004 and 2013. (a) Basic plot of earthquake epicenters. (b) Points sized by magnitude, with an alpha value of 0.05.

International Seismological Centre

The International Seismological Centre in the UK has produced a catalogue covering 110 years of seismic events, including over 20,000 events with a magnitude of 5.5 or greater. The Centre claims the catalogue is homogeneous to a high degree, and it includes estimates of uncertainty. [7]

The relationship between earthquake locations and the number of deaths was then studied. Figure 3-3 shows the number of deaths from the aforementioned CRED database superimposed on a map of earthquake epicenters; this showed a moderate correlation. Figure 3-4 instead considers deaths as a percentage of each country's population. This showed a stronger correlation, which is likely due to the number of people exposed to an earthquake: as an extreme example, no fatalities will occur when an earthquake strikes an uninhabited area.
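A minimal sketch of the normalization used for Figure 3-4 is given below: deaths reported per country are joined to population figures by Alpha-3 code and expressed as a percentage. The file paths and column names are hypothetical; the real EM-DAT and World Bank extracts use their own layouts.

import pandas as pd

deaths = pd.read_csv("emdat_earthquake_deaths.csv")   # assumed columns: alpha3, deaths
population = pd.read_csv("worldbank_population.csv")  # assumed columns: alpha3, pop_2012

# Join on the standardized country key, keeping countries present in both sets
merged = deaths.merge(population, on="alpha3", how="inner")
merged["deaths_pct"] = 100.0 * merged["deaths"] / merged["pop_2012"]
print(merged.sort_values("deaths_pct", ascending=False).head())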
National Climatic Data Center

The National Climatic Data Center (NCDC) is part of the National Oceanic and Atmospheric Administration (NOAA) and is responsible for collecting and storing the world's largest archive of weather data. [19] It provides data on climate variability, with three key types of data available:

1. Analyses of weather and climate events, including monthly reports on changes in climate and summaries of economically significant events since 1980.

2. Data on extreme events at a national and state level.

3. Statistical weather and climate information, particularly focussing on changes in temperature, precipitation and drought levels.

Figure 3-3: Locations of earthquakes with deaths due to earthquakes in each country overlaid in red. The size of each red circle represents the number of deaths.

Figure 3-4: Locations of earthquakes with deaths due to earthquakes in each country overlaid in red. The size of each red circle represents the number of deaths divided by the population.

Figure 3-5: A comparative map showing deaths due to flooding.

3.2.3 Other Sources

While government departments and international organizations typically provide the majority of international and national data sets, there are other sources of useful data, including university research and industry associations.

University research has contributed significantly to the data available. A number of papers have been written on analyzing the risk associated with earthquakes, and this research has often led to the creation of large data sets that have been used in industry. A number of universities have also created portals for accessing geospatial data. One example of such a system is GeoWeb, which was created by the Massachusetts Institute of Technology (MIT) and includes over 23,000 geospatial data sets from multiple repositories. The system is based upon OpenGeoPortal, which was developed by Tufts University Information Technology in partnership with MIT and Harvard, and the data is collected from contributions made available through the Open Geoportal Consortium. [18]

Industry associations such as the ASCE tend to focus on creating guidelines and specifications that can be used to create risk estimates, rather than on providing core data sets.

3.3 Variation in Data Sets

Data was found to vary significantly between data sets, so data cleansing is likely to be one of the more time-consuming processes. When analyzing the different data sets, a number of issues arose relating to the format of the stored data and to missing information in a few of the databases.

One issue encountered was discrepancies between the data stored in different databases. When collating multiple sources into one large set, there will generally be differences in titles and formats between databases. For example, one database listed 'Cape Verde', whereas another referred to the country as 'Cabo Verde', the Portuguese spelling. To avoid such issues, the Alpha-3 code was used when available, which generally led to a higher percentage of matches. The usual approach for mitigating this issue is to create an attribute mapping schema that relates the previous and current keys, and any changes in format should be recorded for future reference.

Another difficulty was that some databases did not include information on all of the countries in the world. The World Bank database lacked information on 35 countries (14.4% of all countries in the database), mostly islands with small populations such as the Cook Islands and Niue, an island country in the South Pacific Ocean. This led to issues when plotting, as some data sets were incomplete; rows with missing data were therefore removed prior to plotting.
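The listing below is a minimal sketch of the two mitigations just described: an attribute mapping schema that reconciles country-name variants before matching, and the removal of rows with missing values before plotting. The file path, names and columns are illustrative only.

import pandas as pd

# Attribute mapping schema: one entry per known variant, recorded for reference
NAME_TO_ALPHA3 = {
    "Cape Verde": "CPV",
    "Cabo Verde": "CPV",  # Portuguese spelling used by some databases
}

df = pd.read_csv("mixed_source_data.csv")  # assumed columns: country, value
df["alpha3"] = df["country"].map(NAME_TO_ALPHA3)

# Rows that could not be matched, or that lack data, are dropped before plotting
df = df.dropna(subset=["alpha3", "value"])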
Uncertainty should also be quantified in order to give a measure of confidence. This can be achieved by comparing the values measured by independent sources: by analyzing how much these values vary, it is possible to estimate the accuracy and precision of each reported event.
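As a sketch of this idea, the listing below combines the magnitudes reported by three independent sources for the same event; the source names and figures are invented for illustration.

import numpy as np

reported = {"source_A": 7.1, "source_B": 6.9, "source_C": 7.3}

values = np.array(list(reported.values()))
best_estimate = values.mean()  # combined estimate for the event
spread = values.std(ddof=1)    # sample standard deviation as a confidence measure
print("magnitude = %.2f +/- %.2f" % (best_estimate, spread))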
3.4 Copyright Issues

There may be issues in implementing copyrighted approaches in such a system. For example, guidance provided by associations such as the ASCE and the ICE is generally chargeable, so it may not be possible to integrate information such as risk categories into software without paying a fee. One option may be to use governmental funding to pay for these licenses, on the grounds that doing so will lead to more informed decision-making and the construction of safer infrastructure and structures in the long term.

3.5 Conclusions

There are a number of data sources available from governmental and not-for-profit organizations. In order to create a reliable, valid database of multi-hazard data, research should be undertaken to collate and validate this information. Attention must be given to accounting for variations between the data sets, and any uncertainty between these sets should be quantified so that users of the data have a measure of confidence in the results obtained.

The next chapter considers how this data can be used to create predictions of risk, and looks at the models and methodologies that currently exist for quantifying the likelihood of extreme events occurring.

Chapter 4

Interpretation

4.1 Introduction

One of the important aspects to consider when designing a hazard GIS is how risk levels are predicted from existing data sources. A system needs to be designed that can take various data sources as parametric inputs and output reliable, easily interpretable predictions for the various types of risk. This chapter focusses on the types of risk that should be considered in calculations, the general approaches taken to hazard analysis, and how an overall system could be integrated to produce probabilities of hazards occurring. It also considers the resolution of the overall system and how this is affected by the input data and models used. Calculations from these models would be computed on a cloud-based network of servers, with output cached in the system's database in order to reduce server load (see Section 2.5 for further details on the implementation of the system).

4.2 Types of Risk

The type of risk calculated depends upon the intended purpose of the output. Insurance companies tend to use catastrophe models to calculate the likelihood of financial loss, whereas a public body is more likely to be concerned with reducing the likelihood of loss of life.

A number of factors influence the level of risk. For typical natural hazards, these include location-oriented, time-oriented and structure-oriented factors. All of these need to be taken into account to produce a valid estimate of how at risk a particular structure is in a given environment.

Location-oriented

All types of risk vary depending upon the location of a site, and this will be the governing parameter in any risk calculation. Two such factors are urban density and soil conditions. Areas with higher urban density (such as cities) often have higher levels of risk due to changes in soil conditions; for example, building a number of tall structures in an area will generally lead to compaction and therefore to different soil properties in the surrounding area. These areas can also suffer higher numbers of deaths, as catastrophic events affect more people; one way to take this into account is to calculate deaths as a percentage of the population of an area. The type of strata is also likely to influence how susceptible a structure is to risk.

Time-oriented

Some types of risk are dependent upon time. An example is earthquakes, where the probability of an event is largely dependent upon stress release levels. Figure 4-1 shows a theory applied by Zheng and Vere-Jones (1991), in which the number of earthquakes changes over time depending upon stress levels. [37]

Structure-oriented

Property-specific information will also affect how exposed a structure is deemed to be. This may include data such as the structure's properties and characteristics (number of stories, material types, etc.), its use case, the value of the property, and insurance and reinsurance information.

Figure 4-1: Variations in the quantity of earthquakes over a long time period in Northern China. [37]

4.3 Hazard Analysis

Models must be implemented in such a system in order to calculate risk probabilities, and these models need to be based upon tested methodologies. As an example, there are three primary methodologies involved in predicting risk from earthquakes: probabilistic seismic hazard analysis, ground motion prediction and magnitude scaling. Each of these is discussed in turn in this section.

Probabilistic Seismic Hazard Analysis (PSHA)

PSHA is one of the standard methodologies for determining how likely it is that ground motion will exceed a given value at a location within a specific time period. It was originally based upon research conducted by Cornell and Esteva in 1968 [6], which related spectral acceleration to distance from the epicenter, and the accuracy of the process has been refined in subsequent decades. Figure 4-2 shows this relationship for spectral acceleration values recorded during the 1999 Chi-Chi, Taiwan earthquake.

Figure 4-2: Recorded spectral acceleration values correlated with distance from the epicenter for the 1999 Chi-Chi, Taiwan earthquake. It should be noted that there is large variability in the ground motion. The predicted distribution is taken from the model by Campbell and Bozorgnia (2008). [2]

PSHA is one of the primary methods for calculating risk from earthquakes, and classical PSHA is the primary way of calculating ground motion in the OpenQuake engine. Figure 4-3 shows how the OpenQuake system is configured to calculate risk using this methodology. First, the logic tree processor takes the input data and creates a seismic source model and a ground motion model. The seismic source model is used to calculate the earthquake rupture forecast, which lists all earthquake ruptures in the source model together with their probabilities of occurrence. The earthquake rupture forecast and the ground motion model are then used by the classical PSHA calculator to create the hazard curves for the desired site. [8]

Figure 4-3: OpenQuake workflow for classical PSHA. [8]
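The essence of the classical calculation can be shown in a few lines: each rupture's annual rate is weighted by the probability, taken from a ground motion model, that it produces shaking above a given level, and the weighted rates are summed to form the hazard curve. The rupture list and GMPE coefficients below are invented for illustration; a production implementation such as OpenQuake handles full source models and logic trees.

import numpy as np
from scipy.stats import norm

# Hypothetical earthquake rupture forecast: (magnitude, distance in km, annual rate)
ruptures = [(5.5, 10.0, 0.05), (6.5, 15.0, 0.01), (7.5, 20.0, 0.002)]

SIGMA_LN = 0.6  # invented lognormal standard deviation of the GMPE

def median_pga(magnitude, distance_km):
    """Toy ground motion prediction equation returning a median PGA in g."""
    return np.exp(-1.0 + 0.5 * magnitude - np.log(distance_km + 10.0))

def annual_exceedance_rate(pga):
    """Sum rupture rates weighted by P(PGA > pga | magnitude, distance)."""
    rate = 0.0
    for magnitude, distance_km, nu in ruptures:
        median = median_pga(magnitude, distance_km)
        p_exceed = 1.0 - norm.cdf(np.log(pga), np.log(median), SIGMA_LN)
        rate += nu * p_exceed
    return rate

pga_levels = np.logspace(-2, 0, 20)  # 0.01 g to 1 g
hazard_curve = [annual_exceedance_rate(x) for x in pga_levels]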
Ground motion prediction equations

In calculating the hazards at a particular location, the ground motion must be quantified for the design and evaluation of sensitive structures. Several new ground motion prediction equations (GMPEs) have been developed recently, making use of advances in empirical ground motion modeling techniques. However, a study conducted by Perus and Fajfar (2009) found significant differences between models, even when the same data was used to create them. The study concluded that the subjective decisions made in the design of a model significantly influence its results; that including aftershocks in the training data generally had a negative effect on median values and increased scatter; and that the adopted functional form had a significant effect on the predictions. [24]

Magnitude-scaling relationships

A number of relationships need to be calculated in hazard risk modeling. The magnitude-scaling relationship is one of these: it relates the magnitude of an earthquake to the rupture area (km²) or rupture length (km). While there have been various studies of these relationships, the relationship currently implemented in many systems is based upon research by Wells and Coppersmith (1994). For all rupture types combined, the magnitude is defined as M = 5.08 + 1.16 log10(L), where L represents the rupture length; the coefficients differ when individual rupture types are considered. The associated uncertainty on magnitude for all rupture types is 0.22. [35]
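The relationship quoted above translates directly into code; the short sketch below implements the all-rupture-types form as a worked example.

import math

def magnitude_from_rupture_length(length_km):
    """Wells and Coppersmith (1994), all rupture types; sigma = 0.22."""
    return 5.08 + 1.16 * math.log10(length_km)

# Example: a 50 km rupture corresponds to a magnitude of roughly 7.05
print(magnitude_from_rupture_length(50.0))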
4.4 Systems Design

A number of companies have produced models, though these are generally proprietary, and the companies therefore do not disclose the details behind them. This section provides a brief overview of the general architecture behind such models, looking at the approaches taken to create them. It primarily considers earthquake risk, though similar models exist for other types of hazard.

Figure 4-4 shows a typical structure for an insurance-industry-oriented earthquake risk model. It breaks the model into four procedural modules covering exposure, hazard, vulnerability and financial aspects. Results are passed from one module into the next, eventually leading to a computation of risk for a location. The system will typically iterate over multiple locations until risk calculations are completed for the entirety of a company's portfolio.

Figure 4-4: A suggested approach for structuring an earthquake risk model. [4]

The GEM scientific framework (discussed in Section 1.3.3) organizes the risk calculation into three modules: seismic hazard, seismic risk and socio-economic impact. Figure 4-5 shows a schematic of this framework. The organization has produced an open source library called OpenQuake Risklib, which includes a number of modules that calculate the loss and damage distributions for scenario earthquakes, in addition to probabilistic risk calculations for events that may occur in a region within a given time period. The source code (written in Python) for the calculation of risk levels is publicly available at https://github.com/gem/oq-risklib.

Figure 4-5: The framework for GEM, identifying the three primary modules. [25]

Figure 4-6 shows a proposal for how such a system could be structured. In the first stage, information about hazard events is passed into the hazard module, which calculates the likelihood of each hazard for a given location; these calculations include finding the probability of a hazard occurring at a given site. The results are then passed into the exposure module, along with any previous data. The exposure module assesses the historical damage that has occurred at a given location, using this to calculate the vulnerability of different structure types. Finally, the output from these stages is passed into an economic module, which takes socio-economic factors into account and calculates probabilities of financial loss. Calculations would be completed simultaneously for n hazards (e.g., earthquakes).

Figure 4-6: A possible systems design for n hazards. Modules are outlined in red, data storage systems are outlined in blue, and input and output for calculations completed in real time are outlined in green. The flow of data is indicated using arrows, and the annotations at the base indicate the type of data transferred.

Calculations would be completed periodically when the output is used to create hazard maps, so as to take recently input data into account; this could be as frequent as once per hour. The system would iterate through a moderately granular matrix of locations in coordinate format covering the entire globe. The output from these computations would then be cached in a results database and used to produce visual plots on a map, together with some of the calculated values displayed in the GIS. Calculations could also be performed on demand for site-specific results, completing a more comprehensive set of computations in which site-specific parameters are taken into account and the output is shown on the user interface.
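A minimal sketch of this periodic loop is shown below: the system walks a coarse global grid, evaluates each hazard, and caches the results for the map display. The grid step, hazard list and compute_risk placeholder are all assumptions made for illustration.

import itertools

GRID_STEP = 1.0  # degrees; a moderately granular global matrix
HAZARDS = ["earthquake", "flood", "wind"]

def compute_risk(hazard, lat, lon):
    """Placeholder for the hazard, exposure and economic module chain."""
    return 0.0

results_cache = {}
lats = [step * GRID_STEP for step in range(-90, 91)]
lons = [step * GRID_STEP for step in range(-180, 181)]
for lat, lon in itertools.product(lats, lons):
    for hazard in HAZARDS:
        results_cache[(hazard, lat, lon)] = compute_risk(hazard, lat, lon)

In practice the cache would live in the results database described above rather than in memory.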
4.5 Resolution in Risk Calculations

According to Bendimerad (2001), there are two primary dimensions of resolution: the resolution of the exposure data and the resolution of the analysis. [4] The resolution of the exposure data refers to the granularity and accuracy of the data input to a model; the validity of any model output is heavily dependent upon how detailed this data is. A highly sophisticated model will take into account data for a specific location (e.g., at street level), and, while less relevant to this application, advanced insurance models may also take into account the exposure risks for each property in a portfolio. The resolution of the analysis, in contrast, refers to the ability of a model to statistically model risk levels for a specific location.

The resolution of exposure data varies depending upon the type of data being reported. Many data sets, such as those covering soil information, tend to be at a much lower resolution, which leads to lower-quality risk calculations. More granular data tends to be available for specific locations: states may produce more granular data for their own areas, though this data is often not available in a standardized format. If such data were to be integrated into the platform, standardization of the data would most likely be necessary.

4.6 Conclusions

For a system to produce valuable results that are representative of the true levels of risk at a given location, it should provide a view of the risk associated with a number of different hazards. The likelihood of these hazards occurring is influenced by factors oriented around location, time and structure, and these need to be accounted for in prediction calculations. Existing methodologies and models can be integrated into a multi-hazard GIS to provide unified, easily interpretable predictions that individuals can use to better understand the risks at a specific location.

Attention must be given to the resolution of the input to ensure that predictions are representative of the true levels of risk. Both the data used and the assumptions made by the implemented models and methodologies are important to ensuring that a system produces valid predictions. There can be significant variability in the results achieved from similar methodologies, so it will be important to select ones that are supported by thorough research and are generally accepted by industry.

Chapter 5

Conclusions and Recommendations

5.1 Conclusions

Digital modeling of risk and its associated data, together with geographic information systems, has greatly enhanced the construction industry's ability to understand risk, though many of these tools are currently hard to access and interpret. It is evident that the current tools do not provide a simple way to interpret the likelihood of extreme events occurring at a given site. Developing a system that makes this process easier would deliver substantial value to the structural community and surrounding industries, reducing the time taken to understand hazards and introducing the subject of risk at an earlier stage in the construction process.

One of the major hurdles in realizing such a tool is the availability of data. While there are freely available sources for a number of the required data sets, many are protected by copyright, which prevents their integration in their current form. Partnerships would need to be agreed to make parts of this tool viable, namely with industry associations such as the ASCE. Care must also be taken to ensure that the data will be interpreted and used correctly.

5.2 Recommendations for Future Development

A number of complex issues have been discussed that play an important role in the design of a hazard GIS. Further research needs to be undertaken in the three core areas discussed: creating the system and interface, collating high-quality data sets, and creating models based upon appropriate methodologies.

The external access portals need to be taken from theoretical systems through to fully developed and tested systems available for public use. This is likely to take significant time and resources (potential funding opportunities are discussed in Section 5.2.2). Collating appropriate data sets for multiple types of hazard will be time-consuming, and it will be important to work with organizations specializing in each type of hazard to ensure that the aggregated data sets provide a representative view of the actual situation.

Current multi-hazard prediction models are proprietary. For such a system to exist, either one of these models must be licensed for use in the system, or an open-source model must be developed.
Licensing an existing commercial model is unlikely to be feasible, as making its output freely available online through this system would remove the need for other parties to pay for the model; developing an open-source model is therefore likely to be the only available option. This would also bring the significant benefit that individuals would be able to contribute freely to the system, which could lead to a more advanced and accurate model, though it would be important to create a verification process to ensure that the quality of user-contributed input remains high. Previously developed open source code, such as that created for the GEM project (see Section 1.3.3), could be adapted for specific components of the system, reducing how much original code needs to be written.

5.2.1 Collaboration

The collaboration of different parties would be instrumental to the success of such a project. These parties would include governmental organizations, research institutions and companies in related industries. For such a system to work, access to the required data will be a necessity. Some organizations already provide open access to data and guidelines (see Chapter 3), though a number of data sets are commercial; these would need to be licensed if they are to be included in the system.

5.2.2 Funding Opportunities

North America

Organizations already exist for funding research into hazards, namely the National Science Foundation (NSF). Congress has also formed programs such as the National Earthquake Hazards Reduction Program (NEHRP), which aim to research and find opportunities to reduce the damage caused by natural disasters. [7]

Europe

The European Commission funds a number of programs aiming to improve prevention of and mitigation against natural disasters, many of which sit under the Civil Protection unit. Many member states also have policies in place, such as seismic protection policies [33], which show a commitment to reducing the risk associated with natural disasters. European Union or national governmental funding may therefore be an option for financing such a project.

The GEM project received funding from a number of European public bodies, including the Italian Department of Civil Protection and the Swiss Federal Institute of Technology in Zurich. There were also a number of private funders, such as the Munich Re Group and Zurich Insurance Group. The initial five-year plan included 35 million euros of funding to develop the first version of the system. As much of this research, such as the earthquake models GEM created, could be reused in the implementation of the system proposed here, the resources required for its development could be reduced.

Other Countries

There are a number of other country- and region-specific programs. However, there has been a deficiency of research in developing countries, such as across much of Africa, where less funding is available than in developed countries. The high cost of research into flooding and other risks has acted as a barrier preventing significant amounts of research from occurring in these locations.

Bibliography

[1] FEMA 366. HAZUS-MH MR5 Flood User Manual. 2013.

[2] Jack W. Baker. An introduction to probabilistic seismic hazard analysis (PSHA). October 2008.

[3] World Bank. Disaster risk management in Malawi - country note. 2010.

[4] Fouad Bendimerad. Modeling and quantification of earthquake risk: Application to emerging economies. In Paul R. Kleindorfer and Murat R. Sertel, editors, Mitigation and Financing of Seismic Risks: Turkish and International Perspectives, volume 3 of NATO Science Series, pages 13-39. Springer Netherlands, 2001.
[5] B. P. Buttenfield and R. B. McMaster. Map generalization: Making rules for knowledge representation. 1991.

[6] C. A. Cornell. Engineering seismic risk analysis. Bulletin of the Seismological Society of America, 58(5):1583-1606, 1968.

[7] International Seismological Centre. ISC Bulletin: event catalogue search. http://www.isc.ac.uk/iscbulletin/search/catalogue. Accessed: 21 April 2014.

[8] H. Crowley, D. Monelli, M. Pagani, V. Silva, and G. Weatherill. OpenQuake Book. GEM Foundation, Pavia, Italy, 2011.

[9] Fred D. Davis. Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quarterly, pages 319-340, 1989.

[10] United Nations Statistics Division. Statistical note for the issue brief on climate change and disaster risk reduction. 2014.

[11] EM-DAT. Disaster trends: Trends and relationships period 1900-2012. http://www.emdat.be/disaster-trends, November 2008.

[12] EM-DAT. EM-DAT Advanced Search. http://cred01.epid.ucl.ac.be:5317/. Accessed: 13 April 2014.

[13] Kyle Erickson. How we're building an information infrastructure for Typhoon Haiyan response operations. http://www.palantir.com/?p=6941, November 2013.

[14] D. Giardini. The global seismic hazard assessment program (GSHAP) - closing report to the IDNDR/STC (1998). http://www.seismo.ethz.ch, February 1999.

[15] Google. Google Earth Gallery - Global Seismic Hazard Map. http://www.google.com/gadgets/directory?synd=earth&id=743582358266, November 2012. Accessed: 19 February 2014.

[16] V. G. Kossobokov and A. K. Nekrasova. Global seismic hazard assessment program maps are erroneous. Seismic Instruments, 48(2):162-170, 2012.

[17] Dae Kun Kwon and Ahsan Kareem. Comparative study of major international wind codes and standards for wind effects on tall buildings. Engineering Structures, 51:23-35, 2013.

[18] MIT GIS Services. GeoWeb. http://arrowsmith.mit.edu/mitogp/. Accessed: 19 April 2014.

[19] National Climatic Data Center. About NCDC. http://www.ncdc.noaa.gov/about-ncdc. Accessed: 2 April 2014.

[20] National Oceanic and Atmospheric Administration. Tropical cyclones - a preparedness guide. 2013.

[21] American Society of Civil Engineers. Minimum Design Loads for Buildings and Other Structures. Reston, VA, ASCE/SEI 7-10 edition, 2013.

[22] Federal Democratic Republic of Ethiopia. ET ISO 4354 (2009) (English): Wind actions on structures. page 73.

[23] G. F. Panza, K. Irikura, M. Kouteva, A. Peresan, Z. Wang, and R. Saragoni. Advanced seismic hazard assessment. Pure and Applied Geophysics, 168(1-2):19, 2011.

[24] I. Perus and P. Fajfar. How reliable are the ground motion prediction equations? In Proc., 20th International Conference on Structural Mechanics in Reactor Technology (SMiRT 20), 9 pp., 2009.

[25] Rui Pinho. Global Earthquake Model: Calculating and Communicating Earthquake Risk. GEM Foundation, 2009.

[26] Münchener Rückversicherungs-Gesellschaft. NATHAN - World Map of Natural Hazards. 2011.

[27] Keith Smith. Environmental Hazards: Assessing Risk and Reducing Disaster. Routledge, 2013.

[28] Socrata. Country List ISO 3166 Codes Latitude Longitude. http://opendata.socrata.com/d/mnkm-8ram, August 2011. Accessed: 13 April 2014.

[29] The World Bank Group. Data - Population (Total). http://data.worldbank.org/indicator/SP.POP.TOTL. Accessed: 13 April 2014.

[30] U.S. Department of the Interior. 2008 NSHM Figures. http://earthquake.usgs.gov/hazards/products/conterminous/2008/maps/.
[31] U.S. Department of the Interior. Earthquake Archive Search & URL Builder. http://earthquake.usgs.gov/earthquakes/search/. Accessed: 2 February 2014.

[32] Rene Willibrordus van Oostrum. Geometric algorithms for geographic information systems. 1999.

[33] Fereniki Vatavali. Earthquakes in Europe - national, international and European policy for the prevention and mitigation of seismic disaster. 2003.

[34] Zhihuan Wang. Exploring the intrinsic motivation of hedonic information systems acceptance: Integrating hedonic theory and flow with TAM. In Ran Chen, editor, Intelligent Computing and Information Science, volume 134 of Communications in Computer and Information Science, pages 722-730. Springer Berlin Heidelberg, 2011.

[35] Donald L. Wells and Kevin J. Coppersmith. New empirical relationships among magnitude, rupture length, rupture width, rupture area, and surface displacement. Bulletin of the Seismological Society of America, 84(4):974-1002, 1994.

[36] Stephen Withall. Software Requirement Patterns. Microsoft Press, Redmond, WA, USA, first edition, 2007.

[37] X.-G. Zheng and D. Vere-Jones. Application of stress release models to historical earthquakes from North China. Pure and Applied Geophysics, 135:559-576, April 1991.