SOFTWARE—PRACTICE AND EXPERIENCE
Softw. Pract. Exper. 2004; 34:1–23 (DOI: 10.1002/spe.580)

A Web-based distributed system for hurricane occurrence projection

Shu-Ching Chen1,∗,†, Sneh Gulati2, Shahid Hamid3, Xin Huang1, Lin Luo1, Nirva Morisseau-Leroy4, Mark D. Powell5, Chengjun Zhan1 and Chengcui Zhang1

1 School of Computer Science, Florida International University, Miami, FL 33199, U.S.A.
2 Department of Statistics, Florida International University, Miami, FL 33199, U.S.A.
3 Department of Finance, Florida International University, Miami, FL 33199, U.S.A.
4 Cooperative Institute for Marine and Atmospheric Science, University of Miami, Coral Gables, FL 33124, U.S.A.
5 Hurricane Research Division, NOAA, Miami, FL 33149, U.S.A.

SUMMARY

As an environmental phenomenon, hurricanes cause significant property damage and loss of life in coastal areas almost every year. Research concerning hurricanes and their aftermath is therefore attracting more and more attention. This paper presents our work in designing and building a Web-based distributed software system for the statistical analysis and projection of hurricane occurrences. Firstly, our system is a large-scale system that can handle the huge amount of hurricane data and the intensive computations involved in hurricane data analysis and projection. Secondly, it is a distributed system, which allows multiple users at different locations to access the system simultaneously and to share and exchange the data and data models. Thirdly, it is a database-centered system in which an Oracle database stores and manages the large amount of hurricane data, the hurricane models and the projection results. Finally, a three-tier architecture has been adopted to make the system robust and resistant to potential change over its lifetime. This paper focuses on the three-tier system architecture, describing the design and implementation of the components at each layer. Copyright © 2004 John Wiley & Sons, Ltd.

KEY WORDS: distributed system; hurricane statistical analysis; database

∗ Correspondence to: Professor Shu-Ching Chen, Florida International University, School of Computer Science, 11200 SW 8th Street, ECS 354, Miami, FL 33199, U.S.A.
† E-mail: chens@cs.fiu.edu

Contract/grant sponsor: Florida Department of Insurance under the 'Hurricane Risk and Insured Loss Projection Model' project

Received 21 March 2003; Revised 4 September 2003; Accepted 4 September 2003

INTRODUCTION

Because hurricanes pose a significant threat to life and property, it is very important to predict their possible occurrences in order to prevent damage and loss; however, predicting their future impact across decades is a challenging task. A hurricane is a type of tropical cyclone, the generic term for a low-pressure system that generally forms over warm, tropical oceans. A hurricane usually measures several hundred miles in diameter and is accompanied by violent winds, incredible waves, heavy rains and floods. Normally a hurricane starts as a tropical depression, becomes a tropical storm when the maximum sustained wind speed exceeds 38 mph, and finally turns into a hurricane when the winds reach 74 mph or higher.
Hurricanes have an eye and an eye wall. The eye is the calm area near the rotational axis of the hurricane. Surrounding the eye are the thick clouds, called the eye wall, which form the most violent area of a hurricane [1]. Hurricanes are categorized according to their severity using the Saffir–Simpson hurricane scale, ranging from 1 to 5 [2], as shown in Table I. A category 1 storm has the lowest wind speeds while a category 5 hurricane has the strongest. These are relative terms, because lower category storms can sometimes inflict greater damage than higher category storms, depending on where they strike and the particular hazards they bring. In fact, tropical storms can also produce tremendous damage, mainly due to flooding. It is reported that every year approximately ten tropical storms develop over the Atlantic Ocean. Although many of these remain over the ocean, some become hurricanes and strike the United States coastline, and at least two of them are greater than category 3, posing enormous threats to life and property. For example, storm tides preceding hurricane Camille in 1969 were in excess of 20 ft, and the flooding accompanying hurricane Agnes in 1972 caused 122 deaths and US$6.4 billion in damage in the northeast.

Table I. Saffir–Simpson hurricane scale.

Category   Wind speed (mph)   Storm surge (ft)   Damage         Examples
1          74–95              4–5                minimal        Charley (1998)
2          96–110             6–8                moderate       Bob (1991)
3          111–130            9–12               extensive      Alicia (1983)
4          131–155            13–18              extreme        Andrew (1992)
5          >155               >18                catastrophic   Camille (1969)

Sophisticated three-dimensional numerical weather prediction models (e.g. [3]) are too computationally expensive for hurricane loss projection simulation studies. In order to project losses associated with landfalling hurricanes, statistical Monte-Carlo simulations [4] are conducted instead; these attempt to model thousands of years of hurricane activity based on the statistical character of the historical storms in the vicinity of the location of interest. Another hurricane damage and loss projection model is HAZUS [5,6]. HAZUS, or Hazards U.S., was developed by the Federal Emergency Management Agency (FEMA) as a standardized, national methodology for natural hazard loss assessment. HAZUS can estimate the damage and losses caused by various natural disasters such as earthquakes, wind and floods. Some useful databases, such as a national-level basic exposure database, are built into the HAZUS system, which allows users to run a preliminary analysis without having to collect additional local data. It also provides the functionality to allow users to plug their own data into the databases. Although HAZUS is powerful and useful, the necessary software packages, such as the commercial GIS software, need to be installed on every machine on which the HAZUS system runs, which in turn increases both expense and manual labor.

This paper presents a distributed system for hurricane statistical analysis and projection. First of all, our system is built upon the object-relational database management system Oracle9i [7], which is one of the core system components, storing and managing the large amount of hurricane data, the hurricane data model and the projection results. The source data sets, such as the HURDAT database [8], are imported into the database and are modeled by applying object-relational concepts. The user may also import customized data into the database. In addition, the models and projection results produced by the system are stored in and managed by the database for future use. Secondly, in contrast to existing hurricane projection applications, an important feature of the proposed system is that it aims to support both professional and general-purpose users in a very convenient way. For that purpose, a Web-based distributed system architecture following the client–server model is adopted to provide easy and parallel access to multiple users at different locations. Specifically, a Web-based system based on Java Server Pages (JSPs) [9] and J2EE is implemented. All the specific software and hardware are installed only on the server side: anyone who can surf the Internet using a standard Web browser is able to take advantage of the system without any additional cost, while the underlying principles are seamlessly concealed from the Web site visitors. Prototyping the system online also offers great flexibility in content presentation and visualization. Finally, since the hurricane data are constantly being updated and the mathematical models for the hurricane data are also potentially changeable, a three-tier architecture is adopted as the system's fundamental architecture to provide transparency among the database layer (hurricane data), the application logic layer (the hurricane data model) and the user interface layer. This architecture makes the system more robust and resistant to potential change over its lifetime.
SYSTEM ARCHITECTURE

To achieve robustness, flexibility and resistance to potential change, the popular three-tier architecture is deployed in the intended system. The architecture consists of three layers: the user interface layer, the application logic layer and the database layer. The three-tier architecture aims to solve a number of recurring design and development problems, and hence to make application development easier and more efficient. The interface layer offers the user a friendly and convenient entry point for communicating with the system; the application logic layer performs the controlling functionality and manipulates the underlying logical connections of the information flows; finally, the database layer conducts the data modeling job, storing, indexing, managing and modeling the information needed for this application.

Web applications are well suited to the three-tier architecture because the presentation layer is necessarily separated, and the logic and data components can be divided up much as in a client–server application. A detailed illustration of the system's architecture is given in Figure 1. The components contained in each tier and the relations among the tiers are described in the following sections.

[Figure 1. Detailed architecture of the system. (A Web browser communicates over HTTP/SSL with the Web server embedded in the OC4J container, where the JSPs and JavaBeans run; the JavaBeans access the Oracle database via JDBC and call the C++ mathematical models, built on the IMSL library, via JNI.)]
User interface tier

The first tier is the user interface tier, which manages the input/output data and their display. To offer the greatest convenience to users, the system is deployed on the Internet, so users can access it with any existing Web browser. The user interface tier contains the HTML components needed to collect incoming information and to display information received from the application logic tier. Web visitors communicate with the Web server via application protocols such as HTTP and SSL, sending requests and receiving replies. In our system, the major Web-scripting technology used to build the presentation layer is JSP [9].

Application logic tier

The application logic tier is the middle tier; it bridges the gap between the user interface and the underlying database and hides technical details from the users. An Oracle9i Application Server is deployed. Its OC4J container embeds a Web server, which responds to events and handles the data receiving, translating, dispatching and feedback jobs [10,11]. Components in this tier receive requests from the interface tier and translate them into the appropriate actions controlled by the defined workflow, in accordance with pre-defined rules. JavaBeans perform the communication and calculation activities, such as getting/pushing information from/to the database and carrying out the necessary computations with the proper statistical and mathematical models. JDBC [12] is used by the JavaBeans to access the physical database. In the interest of quick system response, the computing modules are written in C/C++ and integrated into the Java code via JNI [13].

Database tier

The database tier is responsible for modeling and storing the information needed by the system and for optimizing data access. Data needed by the application logic layer are retrieved from the database, and the computation results produced by the application logic layer are stored back into the database. Since data constitute one of the most complex aspects of many existing information systems, data modeling is essential in structuring systems; both the facts and the rules captured during data modeling and processing are important to ensure data integrity. An Oracle9i database is deployed in our system, and an object-relational model is applied to facilitate data reuse and standards adherence.

USER INTERFACE

Since the intended system is deployed on the Internet, the design and implementation of the system's user interface mainly becomes a job of designing and implementing Web pages. Users can gain access to the system through any commonly used browser, such as Internet Explorer or Netscape.

Due to its 'unlimited' expressive power and natural coherence with the J2EE architecture, the JSP Web-scripting technology is adopted to implement the Web pages [9,14]. JSPs, which sit on top of the Java servlet model, can easily and flexibly generate the dynamic content of a Web page. The basic idea of JSPs is to allow Java code to be mixed with static HTML or XML templates: the Java logic handles the dynamic content generation, while the markup language controls the structuring and presentation of the data, as the short sketch below illustrates.
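The following minimal JSP sketch shows this mixing of a Java scriptlet with static HTML, rendering a data set drop-down list of the kind described later; the bean class hurricane.web.DataSetBean, its getNames() method and the page names are hypothetical, not the system's actual API.

    <%-- dataset.jsp: a sketch only; the bean class and its properties are hypothetical --%>
    <%@ page language="java" %>
    <jsp:useBean id="datasets" class="hurricane.web.DataSetBean" scope="request" />
    <html>
      <body>
        <form action="evaluate.jsp" method="post">
          <select name="dataset">
            <% // Java scriptlet mixed with the static HTML template
               String[] names = datasets.getNames();
               for (int i = 0; i < names.length; i++) { %>
              <option value="<%= names[i] %>"><%= names[i] %></option>
            <% } %>
          </select>
          <input type="submit" value="Submit" />
        </form>
      </body>
    </html>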
Since putting all the Java code into the JSP itself leads to unmanageable content, especially when the tasks performed by the Java code are not simple, JavaBeans are imported to perform most of the actual work. For the sake of performance, complex computational tasks are implemented in C/C++; the C/C++ code is seamlessly integrated into the corresponding Java code via the Java Native Interface (JNI) mechanism [13], as sketched below. Java Applets are employed where necessary to liven up the Web pages.
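A minimal sketch of the JNI pattern just described follows; the class name, method name and library name are hypothetical rather than the system's actual interface, and the C++ side is indicated only by the signature its generated header would carry.

    // MathModel.java: declares a native method whose body lives in a C++ library.
    public class MathModel {
        static {
            // Loads the native library (e.g. libmathmodel.so or mathmodel.dll)
            // containing the C++ computing module.
            System.loadLibrary("mathmodel");
        }

        // Implemented in C++; the header generated by javah would declare
        //   JNIEXPORT jdoubleArray JNICALL
        //   Java_MathModel_fitDistribution(JNIEnv*, jobject, jdoubleArray);
        public native double[] fitDistribution(double[] samples);
    }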
Annual Hurricane Occurrence projection

Rationale

The first step in studying hurricane phenomena and their impact is to estimate the frequency of hurricanes in the future. Annual Hurricane Occurrence (AHO) projection is proposed to address this problem. AHO projection estimates the frequency of hurricanes occurring in a series of years from an associated hurricane occurrence probability distribution, which is obtained through statistical analysis and calculation on the basis of historical hurricane records.

For the estimation of the hurricane occurrence distribution, a suitable data set needs to be selected, since different data set choices significantly influence the final estimate of the probability distribution. Hurricane records in the database are categorized into five data sets according to climate cycles or qualifications: (1) 1851–2000, (2) 1900–2000, (3) 1944–2000, (4) ENSO and (5) Multi-Decadal. The first three groups contain hurricanes occurring in different year ranges. The ENSO data set is for the El Niño and La Niña years; Table II lists all El Niño and La Niña years to date. The last group, Multi-Decadal, includes records of hurricanes that occurred in years when the climate phase was either warm or cold; the years contained in this category are detailed in Table III.

Table II. El Niño and La Niña years.

El Niño years: 1925, 1929, 1930, 1940, 1941, 1951, 1953, 1957, 1963, 1965, 1969, 1972, 1976, 1977, 1982, 1986, 1987, 1990, 1991, 1993, 1994, 1997
La Niña years: 1933, 1938, 1942, 1944, 1945, 1948, 1949, 1950, 1954, 1955, 1956, 1961, 1964, 1967, 1970, 1971, 1973, 1974, 1975, 1978, 1988, 1995, 1998, 1999, 2000

Table III. Multi-Decadal year ranges and climate phase.

Climate phase (warm): 1870–1902, 1926–1970, 1995–2001
Climate phase (cold): 1903–1925, 1971–1994

The statistical models are generated from the selected historical data set. Based on the generated probability distribution models, the number of hurricane occurrences per year in the future is produced for any number of years the user desires. A detailed description of these models is presented later in the 'Statistical and mathematical modeling' section. Figure 2 illustrates the overall workflow for AHO estimation.

[Figure 2. Flow chart for AHO. (Begin; the system gives out the data set selection; the user selects; the system gets the data from the Oracle database; basic statistical features are calculated and the distribution is generated using the IMSL statistics and mathematics library.)]

Implementation

Several JSPs and JavaBeans are constructed to implement the functionality of AHO projection. JSPs offer interfaces for the user to specify a data set and for displaying results to the user. JavaBeans are responsible for handling the communication and computation tasks and for hiding the technical details from external users. Data are retrieved from and stored back into the database by calling the JDBC API. Simple calculations are performed in Java itself, while more complicated computing tasks are carried out by C/C++ programs integrated into the Java code through JNI in order to improve computing performance.

Data set selection

First, the Web visitor needs to select a data set and tell the system to use it as the basis of the statistical projection. A JSP is built for this task. To avoid typos and illegal data sets, all currently available data sets are offered to the user via a drop-down list, and the user's choice is collected by a form. The user chooses the desired data set and submits the selection to the system by clicking the 'Submit' button. The actual Web page is portrayed in Figure 3.

[Figure 3. Data set selection Web page (AHO).]

Statistical models evaluation

Another JSP file handles the submitted selection from the user. This JSP imports two JavaBeans. The first is the database-querying Bean, which communicates with the database: it connects to the database, queries it with respect to the selected data set, retrieves the corresponding data and stores them (a sketch of this retrieval step is given after this subsection). The second, the distribution-evaluating JavaBean, has been devised to evaluate various statistical distribution models using the retrieved data, which are passed to it from the database-querying Bean. The statistical distribution models and the evaluation criteria are elaborated in the 'Statistical and mathematical modeling' section. At the end of the processing, the related information is returned and displayed to the user; in our case, basic statistical characteristics of the data in the selected data set, such as the mean and variance, are returned, and the distribution models are provided to the user as well.

For the statistical projection in the next step, the user needs to specify N, the number of years for which the projection process generates estimated numbers of hurricane occurrences. This information is captured by a text field within a form. After the user inputs the desired number of years and clicks the 'Submit' button to send the request, the statistical projection is conducted based on the best probability distribution generated from the user-selected data set. Figure 4 is a snapshot of the corresponding JSP Web page: the upper part displays the information returned to the user, for example the data set selection information and the statistical values of the selected data set, while the lower part uses a text area to obtain the user's input.

[Figure 4. Distribution models evaluation Web page.]
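A minimal sketch of what the database-querying Bean's retrieval step might look like is shown below. The connection string and the table ANNUAL_COUNTS(DATASET, YEAR, NUM_STORMS) are hypothetical simplifications; the real data live in the object-relational schema presented in the 'Database component' section.

    import java.sql.*;
    import java.util.*;

    public class DataSetQueryBean {
        // Returns the annual storm counts for the selected data set.
        public int[] getAnnualCounts(String dataSet) throws Exception {
            Class.forName("oracle.jdbc.driver.OracleDriver"); // register the driver
            Connection con = DriverManager.getConnection(
                    "jdbc:oracle:thin:@dbhost:1521:orcl", "user", "password");
            try {
                PreparedStatement ps = con.prepareStatement(
                        "SELECT num_storms FROM annual_counts WHERE dataset = ? ORDER BY year");
                ps.setString(1, dataSet);
                ResultSet rs = ps.executeQuery();
                List counts = new ArrayList();
                while (rs.next()) {
                    counts.add(new Integer(rs.getInt(1)));
                }
                int[] result = new int[counts.size()];
                for (int i = 0; i < result.length; i++) {
                    result[i] = ((Integer) counts.get(i)).intValue();
                }
                return result;
            } finally {
                con.close();
            }
        }
    }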
Statistical projection

Once it obtains the statistical projection request and the necessary information from the user, the system starts the projection process. The calculation part of the projection is performed by another JavaBean, which generates the N values of the number of hurricane occurrences based on the indicated distribution (a sketch of such a sampling step is given below). The statistical projection results, a collection of numbers of hurricane occurrences, are sent back to the user. In the meantime, these results are stored in the database for future computation. To offer live visualization, the Java Applet mechanism is introduced; our Java Applets are implemented based on the Ptolemy Java Applet package from Berkeley [15]. The statistical projection result can be plotted as a line chart or a bar chart, as shown in Figures 5 and 6. The maximum number of years displayed is 100 per screen, and 'Previous 100' and 'Next 100' buttons allow a very large number of years to be browsed screen by screen. In the example illustrated in Figures 5 and 6, the user specified a large number of years for hurricane occurrence projection, and the graphs present the third screen of data, covering years 201 to 300.

[Figure 5. AHO projection result Web page (line).]

[Figure 6. AHO projection result Web page (bar).]
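As a sketch of how the projection Bean might generate the N annual counts, the following class draws Poisson variates with Knuth's multiplication method; for the negative binomial case a count can be drawn as a Poisson whose mean is itself Gamma-distributed, the mixture view noted in the 'Statistical and mathematical modeling' section. The class name is hypothetical, and the real system performs such computations in C/C++ for speed.

    import java.util.Random;

    public class OccurrenceSampler {
        private final Random rng = new Random();

        // Draws one Poisson(gamma) variate: multiply uniforms until the
        // running product drops below exp(-gamma) (Knuth's method).
        public int nextPoisson(double gamma) {
            double limit = Math.exp(-gamma);
            double product = rng.nextDouble();
            int count = 0;
            while (product > limit) {
                product *= rng.nextDouble();
                count++;
            }
            return count;
        }

        // Generates the N projected annual occurrence counts.
        public int[] project(double gamma, int n) {
            int[] counts = new int[n];
            for (int i = 0; i < n; i++) {
                counts[i] = nextPoisson(gamma);
            }
            return counts;
        }
    }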
Storm Genesis Time projection

Rationale

One essential trait of a hurricane is its genesis time. The Storm Genesis Time (SGT) is the date and time at which an organized closed cyclonic circulation is first identified in the surface wind field surrounding a low-pressure area, such that a regional forecast center would classify the system as an incipient tropical cyclone. For each numerically simulated hurricane resulting from the AHO, an associated SGT needs to be produced; SGT projection aims to achieve this target.

The prediction of genesis time is grounded in the investigation and analysis of historical hurricane genesis time data. As in AHO, one of five data sets must be chosen to serve as the basis of the statistical projection: (1) 1851–2000, (2) 1900–2000, (3) 1944–2000, (4) ENSO and (5) Multi-Decadal. The meaning of each data set is described in Tables II and III.

Genesis time is represented by the first fix data of the selected data set. Recording the precise genesis time of a storm as it forms is still beyond the capability of currently available observational instrumentation and hurricane modeling techniques. The first fix data are a collection of data related to the characteristics of a hurricane the first time it is observed and recorded, including storm name, date, time, position (longitude and latitude), maximum wind speed, pressure, etc. Hence, technically, it is a suitable approximation of the actual SGT. The first fix data are stored in the database and are retrieved at run time. Table IV depicts some example first fix data records; each record shows when a particular tropical storm originated.

Table IV. Example of first fix data records.

StormId   StormName   GenesisDate    JulianDate   GenesisTime
310       NME         5-Jul-1851     2 397 309    120 000
311       NME         16-Aug-1851    2 397 351    000 000
1114      NME         19-Aug-1852    2 397 720    000 000
1153      NME         5-Sep-1852     2 397 737    000 000

Of all these data fields, those of utmost concern are the fields representing time information: GenesisDate, JulianDate and GenesisTime. The GenesisDate field records the calendar date on which the storm began. The corresponding Julian date of that calendar date is stored in the JulianDate field; the Julian date is simply a continuous count of days and fractions since noon Universal Time on 1 January 4713 BCE and is widely used as a time variable within astronomical software. GenesisTime indicates the time at which the storm originated. Since the actual observations are made every six hours, the value in this field represents not an exact time point but a time interval. The 24 h day is divided into four intervals, I1 = [0AM, 6AM), I2 = [6AM, 12Noon), I3 = [12Noon, 6PM) and I4 = [6PM, Midnight), which are denoted respectively by the values 000000, 060000, 120000 and 180000. For instance, the first record shows that the tropical storm with StormId 310 began during the time interval (12:00, 18:00) on 07/05/1851.

Since the actual estimation is based on the time intervals between successive hurricanes, measured in hours, the first fix data are processed to produce interval data for the calculation. The conversion is conducted as follows:

SGT = 24 × (Julian date of the storm − Julian date of 05/01/1851) + starting hour of the GenesisTime interval

For example, the storm with StormId 311 occurred in the time interval I1 on 08/16/1851, so its SGT value is

24 × (2 397 351 − 2 397 243) + 0 = 2592

where 2 397 351 is the Julian date of 16 August 1851 and 2 397 243 is the Julian date of 1 May 1851. The resulting SGT data after processing are shown in Table V.

Table V. First fix data records after processing.

StormId   StormName   GenesisDate    GenesisTime   SGT
310       NME         5-Jul-1851     120 000       1572
311       NME         16-Aug-1851    000 000       2592
1114      NME         19-Aug-1852    000 000       2640

After this preprocessing, the probability distribution of the SGT values is analyzed using the estimation algorithms elaborated in the 'Statistical and mathematical modeling' section. Then, according to the estimated distribution, an associated genesis time is produced for each hurricane predicted by AHO. The overall information flow for SGT prediction is shown in Figure 7.

[Figure 7. Flow chart for SGT. (Begin; the system provides the data set selection; the user selects; the system gets the data from the Oracle database; the system estimates the CDF; the system generates the SGT values; the SGT data are saved to the database; the result is displayed.)]

Implementation

The flow chart for SGT indicates that the user first appoints a data set, after which the system automatically estimates the distribution and generates new SGT values. During the whole prediction process, JSP Web pages allow the user to select the desired data set; JavaBeans deal with the calculation and data retrieval/storage work (the conversion calculation is sketched below); and the distribution estimation, which involves many statistical and mathematical functions, is accomplished by C/C++ code.
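As an illustration, the following minimal sketch performs the SGT conversion described in the Rationale, one of the calculation tasks a JavaBean might carry out; the class and method names are hypothetical.

    public class SgtConverter {
        // Julian date of 1 May 1851, the season origin used in the worked example.
        private static final long SEASON_ORIGIN = 2397243L;

        // genesisTime is the raw interval code: 0, 60000, 120000 or 180000.
        public static long toSgt(long julianDate, int genesisTime) {
            int startHour = genesisTime / 10000;           // 0, 6, 12 or 18
            return 24L * (julianDate - SEASON_ORIGIN) + startHour;
        }

        public static void main(String[] args) {
            // Storm 311 of Table IV: Julian date 2 397 351, interval I1 (000000).
            System.out.println(toSgt(2397351L, 0));        // prints 2592
        }
    }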
Data set selection

Similar to AHO, a total of five data sets are available, offered to the user via a drop-down list. The user selects one of them and sends the projection request by submitting the selection to the system. A snapshot of the data set selection Web page is shown in Figure 8.

[Figure 8. Data set selection Web page (SGT).]

Distribution estimation and SGT projection

Based on the data set choice, the system first retrieves the related first fix data from the database and processes them to generate SGT data according to the conversion described above. The system then estimates the distribution of the SGT values and produces an SGT value for each numerically simulated hurricane from AHO. The SGT data are stored into the database at the same time, and example SGT values are dynamically displayed to the user in the form of a table. Figure 9 shows the resulting Web page.

[Figure 9. SGT projection result Web page.]

STATISTICAL AND MATHEMATICAL MODELING

The modeling approach utilized in our system complies with the popular hurricane projection strategy detailed in [16], i.e. to model the entire track of a hurricane from its initiation over the open ocean to its dissipation. The characteristics of the storm are modeled at each 6 h point in the storm history. The first step in modeling the complete track of a hurricane is to model the number of hurricanes occurring per year and the genesis time of each individual storm, which are the purposes of the AHO projection and the SGT projection, respectively. Specifically, AHO projection aims to model and predict the number of storms occurring per year, and SGT projection attempts to predict the genesis time of each specific storm. A statistical approach is adopted, and the statistical models of AHO and SGT are built from the historical storm data via statistical analysis.

One meteorological fact is that the statistical properties of AHO vary across different year ranges; for example, the statistical properties of storms in El Niño years are quite different from those in non-El Niño years. Therefore, different statistical models are necessary for different year ranges. In our system, all historical storm records in the database are categorized into five data sets according to meteorological criteria: (1) 1851–2000, (2) 1900–2000, (3) 1944–2000, (4) ENSO and (5) Multi-Decadal. The meaning of each category has been discussed in the previous section. Different statistical models are built for the individual data sets.
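The categorization itself reduces to simple year tests; purely as an illustration, the sketch below encodes the ENSO years of Table II and the Multi-Decadal phases of Table III (the class and method names are hypothetical).

    import java.util.Arrays;

    public class DataSetCategories {
        // El Nino years from Table II (sorted, so binary search applies).
        private static final int[] EL_NINO = {
            1925, 1929, 1930, 1940, 1941, 1951, 1953, 1957, 1963, 1965, 1969,
            1972, 1976, 1977, 1982, 1986, 1987, 1990, 1991, 1993, 1994, 1997 };

        public static boolean isElNino(int year) {
            return Arrays.binarySearch(EL_NINO, year) >= 0;
        }

        // Warm phases from Table III; 1903-1925 and 1971-1994 are cold phases.
        public static boolean isWarmPhase(int year) {
            return (year >= 1870 && year <= 1902)
                || (year >= 1926 && year <= 1970)
                || (year >= 1995 && year <= 2001);
        }
    }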
AHO projection

AHO projection aims to model and predict the number of storms occurring per year. According to domain knowledge in meteorology, the best statistical distribution for the number of storms occurring per year is either the Poisson distribution or the negative binomial distribution. The Poisson distribution is the classic distribution describing the occurrences of a stochastic process; however, it assumes that the mean number of hurricanes in any two non-overlapping time intervals of equal length is the same. Allowing these means to differ leads to the AHO being modeled by a mixture of Poisson distributions, which in effect is the negative binomial distribution. First, the parameters of both the Poisson distribution and the negative binomial distribution are estimated from the historical data. Then the goodness of fit of the two distributions is evaluated with the chi-square statistic, and the distribution with the better fit is picked as the final statistical model of AHO.

Data samples

Since different statistical models are built for different data sets, the user first needs to select one data set from the five categories through the user interface, as described in the previous section. The historical data of the selected data set are then retrieved from the database. The retrieved M data samples are denoted by X = {x_i} (i = 1, 2, ..., M), where M is the number of years in the data set and x_i denotes the number of storms that occurred in the ith year. The statistical model of AHO is built from these M data samples.

Estimation of the Poisson distribution

The probability distribution of a Poisson random variable x with mean \gamma is P(x) = \gamma^x e^{-\gamma} / x!. Given the data samples X = {x_i} (i = 1, 2, ..., M) from the historical storm data, the maximum likelihood estimator of the parameter \gamma is

\hat{\gamma} = \frac{1}{M} \sum_{i=1}^{M} x_i    (1)

Estimation of the negative binomial distribution

The single-variable negative binomial distribution can be represented as

P(x) = \frac{\Gamma(x + k)}{\Gamma(x + 1)\,\Gamma(k)} \left(\frac{k}{m + k}\right)^{k} \left(\frac{m}{m + k}\right)^{x}    (2)

where \Gamma(\cdot) is the gamma function, namely \Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\,dt. Given the M data samples X = {x_i} (i = 1, 2, ..., M) from the historical storm data, the estimates of the parameters m and k are

\hat{m} = \frac{1}{M} \sum_{i=1}^{M} x_i    (3)

\hat{k} = \frac{\hat{m}^2}{s^2 - \hat{m}}    (4)

where s^2 is the variance of the data samples X.

Model selection

After the estimation of both the Poisson and negative binomial parameters, the chi-square statistic is calculated to select the final model. Assume the data are divided into k bins. The test statistic of the chi-square goodness of fit is defined as

\chi^2 = \sum_{i=1}^{k} (O_i - E_i)^2 / E_i    (5)

where O_i is the observed frequency for bin i and E_i is the expected frequency for bin i. Let K = max{x_i} (i = 1, 2, ..., M), i.e. K is the maximum number of hurricanes occurring per year in the historical data; it is safe to assume the number of hurricanes occurring per year ranges from 0 to K. The data are then divided into (K + 1) bins of width 1, and the chi-square test statistic can be rewritten as

\chi^2 = \sum_{i=0}^{K} (O_i - E_i)^2 / E_i    (6)

where O_i is the observed frequency of i hurricanes occurring per year and E_i is the expected frequency of i hurricanes occurring per year according to the candidate model, either the Poisson or the negative binomial distribution. The distribution with the higher p-value derived from this statistic (equivalently, with the smaller statistic for the same binning) is selected as the final statistical model of AHO.
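A minimal sketch of equations (1)–(6) as code is given below. The real system computes these quantities in C/C++ with the IMSL library, and converting the statistic to a p-value additionally requires the chi-square CDF, which is omitted here; the class name is hypothetical.

    public class AhoModelFit {
        // Equations (1) and (3): the sample mean of the annual counts.
        public static double mean(int[] x) {
            double sum = 0;
            for (int i = 0; i < x.length; i++) sum += x[i];
            return sum / x.length;
        }

        // Equation (4): k = m^2 / (s^2 - m), with s^2 the sample variance.
        public static double negativeBinomialK(int[] x) {
            double m = mean(x);
            double ss = 0;
            for (int i = 0; i < x.length; i++) ss += (x[i] - m) * (x[i] - m);
            double variance = ss / (x.length - 1);
            return m * m / (variance - m);
        }

        // Equations (5)/(6): chi-square statistic over the bins i = 0..K, where
        // expected[i] = M * P(i) under the candidate distribution.
        public static double chiSquare(double[] observed, double[] expected) {
            double stat = 0;
            for (int i = 0; i < observed.length; i++) {
                double d = observed[i] - expected[i];
                stat += d * d / expected[i];
            }
            return stat;
        }
    }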
AHO projection validation

To validate the projection performance of the models explored for AHO, a subset of the hurricane occurrence data is used for statistical distribution estimation, and the derived model is then used to forecast the number of hurricanes for a number of years. Of the historical hurricane data stored in the database, the subset used to estimate the distribution contains the data from the years 1900 to 1990, and the actual data used for comparison cover the years 1991 to 2001. On the basis of the historical data from 1900 to 1990, the 95% confidence interval for the mean number of hurricanes per year is (7.95, 9.15) using the Poisson distribution and (7.80, 9.29) using the negative binomial distribution. Figure 10 presents side by side the projected frequencies of hurricane occurrences for the years 1991–2001 and the associated actual occurrence frequencies, with the projections based on the negative binomial distribution model. Since 11 years' worth of data is too small a sample to give accurate predictions, some projected data are not very close to the actual data, as illustrated in the figure.

[Figure 10. Frequency histogram of historical/projected AHOs. (Frequency versus number of annual hurricane occurrences, 0–19, for the historical and projected series.)]

SGT projection

The genesis time of a storm is taken from the first fix data of that storm. SGT projection aims to predict the genesis time of each specific storm. This goal is achieved by modeling the number of hours, at 6 h resolution, between the genesis of a storm and the start of its hurricane season, rather than by modeling the SGT directly. A storm season starts on 1 May of one year and ends on 30 April of the next year. After the number of storms has been modeled by AHO from the historical data, the SGT projection model can be used to predict the time intervals among storms, and thus the SGT of each storm can be predicted as well.

Data samples

The user first selects one data set from the five categories through the user interface. Since the data set in the database originally contains no time interval values, the data conversion described in the 'Storm Genesis Time projection' section is applied first to generate this information; the time intervals can then be retrieved from the database. The retrieved N data samples are denoted by S = {s_i} (i = 1, 2, ..., N), where N is the number of storms in the data set and s_i denotes the time interval associated with the ith storm in the selected data set. The statistical model of SGT is built from the data samples S.

Distribution estimation of time intervals

A nonparametric approach is applied to estimate the cumulative distribution function (CDF) of the time intervals. Let T denote the random variable time (number of hours). All the time interval values s_i are sorted in ascending order; assume the sorted distinct values are 0 ≤ T_1 ≤ T_2 ≤ ... ≤ T_W, where W ≤ N, and let f_i denote the frequency of the storms at time T_i. The empirical CDF for T, as an estimate of the true CDF F(t) = P(T ≤ t), is calculated as

F_N(t) = \begin{cases} 0 & t < T_1 \\ (f_1 + f_2 + \cdots + f_i)/N & T_i \le t < T_{i+1},\ i = 1, 2, \ldots, W - 1 \\ 1 & t \ge T_W \end{cases}    (7)
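A minimal sketch of equation (7) as code: given the sorted distinct interval values T_i, their frequencies f_i and the sample size N, the empirical CDF is the following step function (the class name is hypothetical).

    public class EmpiricalCdf {
        private final double[] t;    // sorted distinct interval values T_1..T_W
        private final double[] cum;  // cumulative (f_1 + ... + f_i) / N

        public EmpiricalCdf(double[] t, int[] freq, int n) {
            this.t = t;
            this.cum = new double[t.length];
            int running = 0;
            for (int i = 0; i < t.length; i++) {
                running += freq[i];
                cum[i] = (double) running / n;
            }
        }

        // F_N(x): 0 below T_1, the cumulative frequency between knots,
        // and 1 at or above T_W (since the frequencies sum to N).
        public double value(double x) {
            if (x < t[0]) return 0.0;
            int i = t.length - 1;
            while (x < t[i]) i--;
            return cum[i];
        }
    }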
The empirical CDF is then smoothed using standard kernel smoothing techniques. The kernel used is the Epanechnikov kernel K(x) = 0.75(1 - 0.2x^2)/\sqrt{5}, which is zero outside |x| \le \sqrt{5}, and the local bandwidth is h_N(t) = (S/2)(1/N)^{1/3}. The smooth estimator of F(t) is then calculated as

\hat{F}_N(t) = \frac{1}{h_N(t)} \int_0^{\infty} K\!\left(\frac{t - x}{h_N(t)}\right) F_N(x)\,dx = \sum_{j=1}^{W} S_j\,K^*\!\left(\frac{t - T_j}{h_N(t)}\right)    (8)

where S_j is the jump of F_N at T_j, that is, S_j = F_N(T_j) - F_N(T_{j-1}) for j = 2, 3, ..., W and S_1 = F_N(T_1), and K^*(u) is the integral of K, that is, K^*(u) = \int_{-\infty}^{u} K(x)\,dx.
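A minimal sketch of equation (8): the smoothed CDF evaluated as a sum of the jumps S_j weighted by K*, the integrated Epanechnikov kernel. The class name is hypothetical, and the bandwidth is passed in rather than computed.

    public class SmoothedCdf {
        private static final double R = Math.sqrt(5.0);

        // K*(u): integral of K(x) = 0.75(1 - 0.2x^2)/sqrt(5) from -sqrt(5) to u.
        // The antiderivative of K is (0.75x - 0.05x^3)/sqrt(5).
        static double kStar(double u) {
            if (u <= -R) return 0.0;
            if (u >= R) return 1.0;
            return (0.75 * u - 0.05 * u * u * u) / R + 0.5;
        }

        // Equation (8): Fhat(t) = sum_j S_j * K*((t - T_j) / h).
        static double fHat(double t, double[] knots, double[] jumps, double h) {
            double sum = 0.0;
            for (int j = 0; j < knots.length; j++) {
                sum += jumps[j] * kStar((t - knots[j]) / h);
            }
            return sum;
        }
    }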
SGT projection validation

We do not validate the approach used for SGT modeling in the same manner as for AHO, because SGT is modeled with a nonparametric approach: although confidence intervals for the smooth estimates exist, they are highly technical, rely on difficult statistical theory, and are not appropriate to present here. However, as a demonstration of the accuracy of the SGT projection, a comparison histogram is shown in Figure 11. The historical hurricane data from 1900 to 1990 are again used to derive the distribution, and the actual data of the years 1991–2001 are used for comparison. The possible SGT values are divided into a number of bins with an interval of 600, the corresponding frequency histograms of both the actual and the projected data are plotted, and the result is promising.

[Figure 11. Frequency histogram of historical/projected SGT. (Frequency versus SGT range, in bins of 600 from 1200–1799 to 6000–6599, for the historical and generated SGT values.)]

DATABASE COMPONENT

The Oracle9i database is incorporated into the system as the information storehouse; it stores data records for all storms occurring in the Atlantic basin since the year 1851. An object-relational database schema is designed to facilitate data reusability and manageability. The major advantage brought by object-relational concepts is the ability to incorporate higher levels of abstraction into our data models, whereas purely relational databases are usually highly normalized models with little abstraction.

The original data set, in the form of text files, is processed and extracted to fit the object-relational schema. Several programs in a variety of programming languages have been developed to automate the processing and populating tasks.

Hurricane data modeling

Data analysis and modeling is a vital aspect of the database component. In our system, an object-relational design pattern is applied to model the hurricane data; object-relational models assist the reuse of database objects. The overall view of the hurricane data schema is depicted in Figure 12. The database schema for the HURDAT data set consists of six major object types and five major tables. The table Atmosevent_list holds the tracking data for all atmosevents, namely the storms and hurricanes, dating from 1851 to the present day. For each atmosevent, an Atmosevent object models its structured information. The table Storm_category stores the information about each atmosevent's category and description; the relationship between Atmosevent_list and Storm_category is built by adding a foreign key to Atmosevent_list. The table Landfall stores a storm id and a nested table of Landfall_type_arr objects; the foreign key storm_id of Landfall corresponds to the primary key key_id of the Atmosevent_list table. The table Stormfix_list stores the fixes of all the atmosevents, each stormfix being represented by a Stormfix object. This table is related to Atmosevent_list by a foreign key event_id. Furthermore, the for_event field of the Stormfix object refers to an Atmosevent object, its produced_id and produced_by fields refer to Platform_type objects, and its fixobj field is based on the Fix object. The table Platform_type_list is an object table of Platform_type objects; its primary key key_id corresponds to the foreign key produced_by of the table Stormfix_list.

[Figure 12. Database schema.]

Original data and data processing

The historical hurricane data stored in this database are directly imported from the North Atlantic 'best track' HURDAT database, which is maintained by the National Hurricane Center in Miami, Florida and the National Climatic Data Center in Asheville, North Carolina. Currently, the 'best track' database extends from 1851 to 2001. One problem with the original representation of the storm tracks of the Atlantic basin is that they are recorded in text files with no unified format for the data entries. Hence the original data need to be processed and converted before they can be populated into the database schema.

The first step in processing the original data is to extract the useful data and remove the unwanted data, such as formatting symbols. Take the database table Atmosevent_list as an example. This table stores the high-level information for all storms, and the following data fields need to be extracted from the original data file: (i) the storm number, (ii) the begin date of the storm and (iii) the storm type. Some of the required data can be obtained directly from the original data set, while others, such as the storm type, need further conversion. The storm type cannot be read directly from the original data file; instead, it has to be derived by converting the maximum wind speed of each storm to its corresponding storm category according to fixed criteria. A C++ program retrieves the data and automatically assigns the correct storm type to each storm (a sketch of this mapping is given at the end of this section).

As another example, the table Stormfix_list stores the detailed information about each storm or hurricane, including the storm's life line, the exact latitude and longitude, and the wind speed and central pressure at each fix point for each day. This information, too, must be derived from the original data file, but the non-unified data entries make it difficult to import directly. A Java program was therefore developed to handle the various formats of the data entries and to output a text file with a unified format, which can then be loaded into the database. To ensure consistency between the extracted data and the original data, data checking is done either manually or automatically through programs.
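As an illustration of the category-assignment step described above (performed in C++ in the actual system, but sketched here in Java for consistency with the other examples), the wind speed thresholds follow Table I and the tropical storm/depression boundary mentioned in the introduction; the label strings are hypothetical.

    public class StormTypeAssigner {
        // Maps a maximum sustained wind speed (mph) to a storm type label
        // using the Saffir-Simpson thresholds of Table I.
        public static String stormType(int maxWindMph) {
            if (maxWindMph > 155) return "H5";
            if (maxWindMph >= 131) return "H4";
            if (maxWindMph >= 111) return "H3";
            if (maxWindMph >= 96)  return "H2";
            if (maxWindMph >= 74)  return "H1";
            if (maxWindMph >= 39)  return "TS"; // tropical storm
            return "TD";                        // tropical depression
        }
    }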
CONCLUSION

In this paper, a Web-based distributed system for the projection of hurricane occurrences has been presented. It integrates a group of individual applications, combining hurricane data acquisition, storage, retrieval and analysis functions. The system exhibits a modular, extensible and scalable architecture, which makes it possible to adapt it to more complex tasks such as storm track simulation and wind field generation. The well-established three-tier architecture is exploited to build the system, and a variety of advanced techniques such as JSP, JNI and JDBC are used in its design and development. Both the Oracle database and the Oracle application server are deployed to make the system a coherent integration. In addition, the system is accessible to any user who can connect to the Internet and has an interest in hurricane prediction information.

ACKNOWLEDGEMENT

This work was partially supported by the Florida Department of Insurance (DOI) under the 'Hurricane Risk and Insured Loss Projection Model' project. Although the project is funded by the Florida DOI, the DOI is not responsible for the content of this paper.

REFERENCES

1. National Hurricane Center. http://www.nhc.noaa.gov/.
2. Smith E. Atlantic and east coast hurricanes 1900–98: A frequency and intensity study for the twenty-first century. Bulletin of the American Meteorological Society 1999; 18(12):2717–2720.
3. Kurihara Y, Bender MA, Tuleya RE, Ross RJ. Improvements in the GFDL hurricane prediction system. Monthly Weather Review 1995; 123:2791–2801.
4. Russell LR. Probability distributions for hurricane effects. Journal of Waterways, Harbors, and Coastal Engineering Division, ASCE 1971; 97:139–154.
5. HAZUS home page. http://www.fema.gov/hazus/.
6. HAZUS overview. http://www.nibs.org/hazusweb/verview/overview.php.
7. Oracle9i Database. http://www.oracle.com/ip/deploy/database/oracle9i/.
8. HURDAT data. http://www.aoml.noaa.gov/hrd/hurdat/Data_Storm.html.
9. JavaServer Pages (TM) technology. http://java.sun.com/products/jsp/.
10. Oracle9iAS Containers for J2EE. http://technet.oracle.com/tech/java/oc4j/content.html.
11. Panda D. Oracle Container for J2EE (OC4J). http://www.onjava.com/pub/a/onjava/2002/01/16/oracle.html.
12. The JDBC API: universal data access for the enterprise. http://java.sun.com/products/jdbc/overview.html.
13. Java Native Interface. http://java.sun.com/docs/books/tutorial/native1.1/.
14. Morisseau-Leroy N, Solomon MK, Basu J. Oracle8i: Java Component Programming with EJB, CORBA, and JSP. Oracle Press (McGraw-Hill/Osborne), 2000.
15. The Ptolemy Java Applet package. http://ptolemy.eecs.berkeley.edu/papers/99/HMAD/html/plotb.html.
16. Vickery PJ, Skerlj PF, Twisdale LA. Simulation of hurricane risk in the United States using an empirical storm track modeling technique. Journal of Structural Engineering 2000; 126:1222–1237.