A Web-based distributed system for hurricane occurrence projection

SOFTWARE—PRACTICE AND EXPERIENCE
Softw. Pract. Exper. 2004; 34:1–23 (DOI: 10.1002/spe.580)
Shu-Ching Chen1,∗,†, Sneh Gulati2, Shahid Hamid3, Xin Huang1, Lin Luo1,
Nirva Morisseau-Leroy4, Mark D. Powell5, Chengjun Zhan1 and Chengcui Zhang1
1 School of Computer Science, Florida International University, Miami, FL 33199, U.S.A.
2 Department of Statistics, Florida International University, Miami, FL 33199, U.S.A.
3 Department of Finance, Florida International University, Miami, FL 33199, U.S.A.
4 Cooperative Institute for Marine and Atmospheric Science, University of Miami, Coral Gables, FL 33124, U.S.A.
5 Hurricane Research Division, NOAA, Miami, FL 33149, U.S.A.
SUMMARY

As an environmental phenomenon, hurricanes cause significant property damage and loss of life in coastal
areas almost every year. Research concerning hurricanes and their aftermath is gaining more and more
attention nowadays. This paper presents our work in designing and building a Web-based distributed
software system that can be used for the statistical analysis and projection of hurricane occurrences.
Firstly, our system is a large-scale system that can handle the huge amount of hurricane data and the
intensive computations in hurricane data analysis and projection. Secondly, it is a distributed system, which
allows multiple users at different locations to access the system simultaneously and to share and exchange
the data and the data model. Thirdly, our system is a database-centered system in which an Oracle database
is employed to store and manage the large amount of hurricane data, the hurricane model and the projection
results. Finally, a three-tier architecture has been adopted to make our system robust and resistant to
potential change during the lifetime of the system. This paper focuses on the three-tier system architecture,
describing the design and implementation of the components at each layer. Copyright © 2004 John Wiley & Sons, Ltd.

KEY WORDS: distributed system; hurricane statistical analysis; database

∗ Correspondence to: Professor Shu-Ching Chen, Florida International University, School of Computer Science, 11200 SW 8th
Street, ECS 354, Miami, FL 33199, U.S.A.
† E-mail: chens@cs.fiu.edu
Contract/grant sponsor: Florida Department of Insurance under the 'Hurricane Risk and Insured Loss Projection Model' project

Received 21 March 2003
Revised 4 September 2003
Accepted 4 September 2003
INTRODUCTION
Because hurricanes pose a significant threat to life and property, it is very important to predict their
possible occurrences in order to prevent damage and loss. However, projecting future hurricane activity
from decades of historical records is a challenging task.
A hurricane is a type of tropical cyclone, which is a generic term for a low-pressure system that
generally forms over warm, tropical oceans. Usually a hurricane measures several hundred miles in
diameter and is accompanied by violent winds, incredible waves, heavy rains and floods. Normally a
hurricane starts as a tropical depression, becomes a tropical storm when the maximum sustained wind
speed exceeds 38 mph and finally turns into a hurricane when the winds have a speed higher than
74 mph. Hurricanes have an eye and eye wall. The eye is the calm area near the rotational axis of the
hurricane. Surrounding the eye are the thick clouds, called the eye wall, which is the most violent area of a
hurricane [1].
Hurricanes are categorized according to their severity using the Saffir-Simpson hurricane scale,
ranging from 1 to 5 [2] as shown in Table I. A category 1 storm has the lowest wind speeds while
a category 5 hurricane has the strongest. These are relative terms, because lower category storms can
sometimes inflict greater damage than higher category storms, depending on where they strike and the
particular hazards they bring. In fact, tropical storms can also produce tremendous damage, mainly due
to flooding.
It is reported that approximately ten tropical storms develop over the Atlantic Ocean every year.
Although many of these remain over the ocean, some become hurricanes and strike the United States
coastline, and at least two of them are greater than category 3, posing enormous threats to life and
property. For example, storm tides preceding Hurricane Camille in 1969 were in excess of 20 ft, and
the flooding accompanying Hurricane Agnes in 1972 caused 122 deaths and US$6.4 billion in damage
in the northeast.
Sophisticated three-dimensional numerical weather prediction models (e.g. [3]) are too computationally expensive to conduct hurricane loss projection simulation studies. In order to project losses
associated with landfalling hurricanes, statistical Monte-Carlo simulations [4] are conducted, which
attempt to model thousands of years of hurricane activity based on the statistical character of the
historical storms in the vicinity of the location of interest.
Another hurricane damage and loss projection model is HAZUS [5,6]. HAZUS, or Hazards U.S.,
was developed by the Federal Emergency Management Agency (FEMA) as a standardized, national
methodology for natural hazard loss assessment. HAZUS can estimate the damage and losses that
are caused by various natural disasters such as earthquakes, wind and floods. Some useful databases,
such as a national-level basic exposure database, are built into the HAZUS system, which allow the
users to run a preliminary analysis without having to collect additional local data. It also provides the
functionality to allow the users to plug their own data into the databases.
Although HAZUS is powerful and useful, the necessary software packages, such as the commercial
GIS software, need to be installed in every machine on which the HAZUS system runs, which in turn
increases both expenses and manual labor.
This paper presents a distributed system for hurricane statistical analysis and projection. First of
all, our system is built upon an object-relational database management system Oracle9i [7], which
is one of the core system components to store and manage the large amount of hurricane data, the
hurricane data model and the projection results as well. The source data sets, such as the HURDAT
database [8], are imported into the database and are modeled by applying object-relational concepts.
The user may also import customized data into the database. In addition, the models and projection
results produced by the system are stored into and managed by the database for future use. Secondly,
in contrast to the existing hurricane projection applications, an important feature of the proposed
system is that it aims to support both professional and general-purpose users in a very convenient way.
For that purpose, a Web-based distributed system following the client–server architecture
is adopted to provide easy and parallel access to multiple users at different locations. Specifically, a
Web-based system based on Java Server Pages (JSPs) [9] and J2EE is implemented. All the specific
software and hardware are installed only on the server side. Anyone who can surf the Internet using
a standard Web browser is able to take advantage of the system without any additional cost, while
the underlying principles are seamlessly concealed from the Website visitors. Prototyping the system
online also offers great flexibility in content presentation and visualization. Since the hurricane data
are constantly being updated and the mathematical models for the hurricane data are also potentially
changeable, a three-tier architecture is adopted as our system's fundamental architecture to provide
transparency among the data layer (hurricane data), the application logic layer (the hurricane data model)
and the user interface layer. This architecture makes our system more robust and resistant to potential
change during the lifetime of the system.
To achieve system robustness, flexibility and resistance to potential change, the popular three-tier
architecture is deployed in the intended system. The architecture consists of three layers: the user
interface layer, the application logic layer and the database layer. The three-tier architecture aims
to solve a number of recurring design and development problems and hence to make application
development easier and more efficient. The interface layer offers the user a friendly and convenient
entry point to communicate with the system, while the application logic layer performs the controlling
functionality and manipulates the underlying logical connection of information flows; finally, the
database layer conducts the data modeling job, storing, indexing, managing and modeling the
information needed by this application.

Table I. Saffir–Simpson hurricane scale.

Category   Wind speed (mph)   Storm surge (ft)   Damage         Examples
1          74–95              4–5                minimal        Charley (1998)
2          96–110             6–8                moderate       Bob (1991)
3          111–130            9–12               extensive      Alicia (1983)
4          131–155            13–18              extreme        Andrew (1992)
5          >155               >18                catastrophic   Camille (1969)

SYSTEM ARCHITECTURE
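The wind-speed thresholds of the Saffir–Simpson scale in Table I translate directly into code. A minimal sketch follows; the class and method names are ours, and the boundaries are taken from the table.

```java
public class SaffirSimpson {
    /** Category 1-5 for a hurricane-strength wind speed (mph), per Table I. */
    public static int category(int windMph) {
        if (windMph < 74) throw new IllegalArgumentException("below hurricane strength");
        if (windMph <= 95) return 1;
        if (windMph <= 110) return 2;
        if (windMph <= 130) return 3;
        if (windMph <= 155) return 4;
        return 5;
    }

    public static void main(String[] args) {
        // Hurricane Andrew (1992) is listed in Table I as a category 4 example.
        System.out.println(category(145)); // prints 4
    }
}
```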
[Figure 1 shows three tiers: the User Interface (Web Browser), the Application Logic (Web Server
inside the OC4J Container, with Java Beans linked via JNI to the Math Model in C++ and the IMSL
Library) and the Database (Oracle DB). The browser and server communicate over HTTP/SSL, and
the beans access the database through JDBC.]
Web applications are well suited to the three-tier architecture because the presentation layer
is necessarily separate, and the logic and data components can be divided up much like in a
client–server application. A detailed illustration of the system’s architecture is given in Figure 1.
Components contained in each tier and the relations among different tiers are described in the following
sections.
User interface tier
The first tier is the user interface tier. This tier manages the input/output data and their display. With the
intention of offering great convenience for the users, the system is prototyped on the Internet. The users
are allowed to access the system by using any existing Web browser software. The user interface
tier contains HTML components needed to collect incoming information and to display information
received from the application logic tier. The Web visitors communicate with the Web server via
application protocols such as HTTP and SSL, sending requests and receiving replies. In our system, the
major Web-scripting language exploited in designing the presentation layer is the JSP technique [9].
Figure 1. Detailed architecture of the system.
Application logic tier
The application logic tier is the middle tier, which bridges the gap between the user interface and
the underlying database and hides technical details from the users. An Oracle9i Application Server is
deployed. Its OC4J container embeds a Web server, which handles events such as receiving, translating,
dispatching and responding to requests [10,11]. Components in this tier receive requests coming
from the interface tier and translate them into appropriate actions controlled by the defined workflow
in accordance with certain pre-defined rules. JavaBeans perform the appropriate communication
and calculation activities, such as getting/pushing information from/to the database and carrying out the
necessary computing work with respect to the proper statistical and mathematical models. JDBC [12] is
utilized by the JavaBeans to access the physical database. In the interest of quick system response, the
C/C++ language is used to program the computing modules, which are integrated into the Java code via JNI [13].
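As a concrete sketch of this tier, the JavaBean below receives a year range, fetches the per-year storm counts through a small data-access interface, and performs a simple calculation. The class and method names are hypothetical; in the real system the interface would be backed by JDBC and Oracle rather than the in-memory stand-in used here.

```java
import java.util.Arrays;

public class AhoBean {
    /** Abstraction over the data access; the real bean would run a JDBC query
     *  against the Oracle database here. */
    public interface StormDao {
        int[] stormsPerYear(int fromYear, int toYear);
    }

    private final StormDao dao;

    public AhoBean(StormDao dao) { this.dao = dao; }

    /** Mean annual number of storms over the selected year range. */
    public double meanAnnualOccurrence(int fromYear, int toYear) {
        int[] counts = dao.stormsPerYear(fromYear, toYear);
        return Arrays.stream(counts).average().orElse(0.0);
    }

    public static void main(String[] args) {
        // In-memory stand-in for the Oracle-backed DAO (illustrative counts only).
        AhoBean bean = new AhoBean((from, to) -> new int[]{6, 10, 8});
        System.out.println(bean.meanAnnualOccurrence(1998, 2000)); // prints 8.0
    }
}
```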
Database tier
The database tier is responsible for modeling and storing the information needed by the system and for
optimizing data access. Data needed by the application logic layer are retrieved from the database,
and the computation results produced by the application logic layer are stored back into the database.
Since data constitute one of the most complex aspects of many existing information systems, data
modeling is essential in structuring such systems. Both the facts and rules captured during data modeling
and processing are important to ensure data integrity. An Oracle9i database is deployed in our system,
and an object-relational model is applied to facilitate data reuse and adherence to standards.
USER INTERFACE
The intended system is prototyped on the Internet; therefore, the design and implementation of the
system's user interface mainly becomes a job of designing and implementing Web pages. The users can
gain access to the system through any commonly used browser, such as Internet Explorer,
Netscape, etc.
Due to its 'unlimited' expressive power and natural coherence with the J2EE architecture, JSP
Web-scripting technology is adopted to implement the Web pages [9,14]. JSPs, sitting on top of the Java
servlet model, can easily and flexibly generate the dynamic content of a Web page. The basic idea
of JSPs is to allow Java code to be mixed with static HTML or XML templates. The Java
logic handles the dynamic content generation, while the markup language controls the structuring and
presentation of the data.
Since putting all the Java code into a JSP itself leads to unmanageable content, especially when
the tasks performed by the Java code are not simple, JavaBeans are imported to perform most of the
actual work. For the sake of performance, complex computational tasks are actually carried out by
C/C++ code. The C/C++ code is seamlessly integrated into the corresponding Java code via the Java
Native Interface (JNI) mechanism [13]. Java Applet techniques are exploited when necessary to liven
up the Web pages.
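The JNI integration pattern just mentioned can be sketched as follows. The class and library names are hypothetical; the native declaration and `System.loadLibrary` call are the standard JNI mechanism, and a pure-Java fallback keeps the sketch runnable when the C++ library is absent.

```java
public class MathModelBridge {
    private static boolean nativeAvailable;

    static {
        try {
            // Hypothetical shared library built from the C++ computing modules.
            System.loadLibrary("mathmodel");
            nativeAvailable = true;
        } catch (UnsatisfiedLinkError e) {
            nativeAvailable = false; // library not present; use the Java fallback
        }
    }

    /** Native implementation compiled from the C++ code (never resolved here). */
    private static native double meanNative(double[] samples);

    /** Pure-Java fallback with the same contract. */
    private static double meanJava(double[] samples) {
        double sum = 0.0;
        for (double s : samples) sum += s;
        return sum / samples.length;
    }

    public static double mean(double[] samples) {
        return nativeAvailable ? meanNative(samples) : meanJava(samples);
    }

    public static void main(String[] args) {
        System.out.println(mean(new double[]{6, 10, 8})); // 8.0 via the Java fallback
    }
}
```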
Annual Hurricane Occurrence projection

Rationale

The first step in studying the hurricane phenomena and their impact is to estimate the frequency of
hurricanes in the future. Annual Hurricane Occurrence (AHO) projection is proposed to address this
problem. AHO estimates the frequency of hurricanes occurring in a series of years based on an
associated hurricane occurrence probability distribution, which is obtained through statistical analysis
and calculation on the basis of historical hurricane records.
For the estimation of the hurricane occurrence distribution to be conducted, a suitable data set needs
to be selected. Different data set choices significantly influence the final estimate of the probability
distribution.
Hurricane records in the database are categorized into five data sets according to climate cycles
or qualifications. The categories are: (1) 1851–2000, (2) 1900–2000, (3) 1944–2000, (4) ENSO
and (5) Multi-Decadal. The first three groups contain hurricanes occurring in different year ranges.
The ENSO data set is for the El Niño and La Niña years. Table II lists all El Niño and La Niña years
to date. The last group, Multi-Decadal, includes records of hurricanes that occurred in years
when the climate phase was either warm or cold. The years contained in this category are detailed in
Table III.

Table II. El Niño and La Niña years.

El Niño year   La Niña year
1925           1933
1929           1938
1930           1942
1940           1944
1941           1945
1951           1948
1953           1949
1957           1950
1963           1954
1965           1955
1969           1956
1972           1961
1976           1964
1977           1967
1982           1970
1986           1971
1987           1973
1990           1974
1991           1975
1993           1978
1994           1988
1997           1995
               1998
               1999
               2000

Table III. Multi-Decadal year ranges and climate phase.

Climate phase (warm)   Climate phase (cold)
1870–1902              1903–1925
1926–1970              1971–1994
1995–2001

The statistical models are generated from the historical data set. Based on the generated probability
distribution models, the numbers of hurricane occurrences per year in the future are produced for any
number of years the user desires. The detailed description of these models is presented later in the
'Statistical and mathematical modeling' section. Figure 2 illustrates the overall workflow for AHO
estimation.

Figure 2. Flow chart for AHO. [The chart shows: the system offers the data set selection; the user
selects; the system gets the data from the Oracle DB; basic statistical features are calculated with the
IMSL Statistic & Math Library; and the distribution is generated.]

Implementation

Several JSPs and JavaBeans are constructed to implement the functionalities of AHO projection.
JSPs offer interfaces for the user to specify a data set and for displaying results to the user. JavaBeans
are responsible for handling communication and computation tasks and for hiding the technical details
from the external users. The data are retrieved from and stored back to the database via calls to the JDBC
API. Simple calculations are performed by the Java code itself, while more complicated computing tasks
are carried out by C/C++ programs that are integrated into the Java code through JNI in order to improve
computing performance.

Data set selection

First, the Web visitor needs to select a data set and to tell the system to use the selected data set as the
basis of the statistical projection. A JSP is built for that task. To avoid typos and illegal data sets, all
data sets that are currently available are offered to the user via a drop-down list. The user's choice is
collected by a form. The user chooses a data set he/she wants and submits the selection to the system
by clicking the 'Submit' button. The actual Web page is portrayed in Figure 3.

Figure 3. Data set selection Web page (AHO).
Statistical models evaluation
Another JSP file handles the submitted selection from the user. In this JSP file, there are two
imported JavaBeans. The first JavaBean is the database-querying Bean that communicates with the
database. It connects to the database, queries the database with respect to the selected data set,
retrieves the corresponding data and stores the data. The second, distribution-evaluating, JavaBean
has been devised particularly to evaluate various statistical distribution models using the retrieved
data. The data are passed to the distribution-evaluating JavaBean from the database-querying Bean.
The statistical distribution models and evaluating standards exploited are elaborated in the ‘Statistical
and mathematical modeling’ section.
At the end of the processing, the related information is returned and displayed to the user. In our
case, basic statistical characteristics of data in the selected data set, such as mean and variance, are
returned to the user. The distribution models are provided to the user as well.
For the purpose of the statistical projection in the next step, the user needs to specify N, the number
of years for which the projection process generates the estimated numbers of hurricane occurrences.
This information is captured by a text field within a form. After the user inputs the desired number of
years and clicks the 'Submit' button to send the request, the statistical projection is conducted based
on the best probability distribution generated from the user-selected data set.
Figure 4 is a snapshot of the corresponding JSP Web page. The upper part displays information
returned to the user; for example, the data set selection information and the statistical values of the
selected data set. The lower part uses the text area to obtain the user's input data.

Figure 4. Distribution models evaluation Web page.

Statistical projection

Once it obtains the statistical projection request and the necessary information from the user, the system
starts the projection process. The calculation part of the projection work is performed by another
JavaBean, which generates the N values of the number of hurricane occurrences based on the indicated
distribution.
The statistical projection results, a collection of numbers of hurricane occurrences, are sent back
to the user. In the meantime, these results are stored in the database for future computation. To offer
live visualization, the Java Applet mechanism is introduced. Our Java Applets are implemented based
on the Ptolemy Java Applet package from Berkeley [15]. The statistical projection result can be plotted
as a line chart or a bar chart, as shown in Figures 5 and 6. The maximum number of years displayed
is 100 per screen. There are both 'Previous 100' and 'Next 100' buttons to allow for the browsing of
a very large number of years, screen by screen. In the example illustrated in Figures 5 and 6, the user
specified a large number of years for the hurricane occurrence projection, and the graphs present the
third screen of data, covering years 201 to 300.

Figure 5. AHO projection result Web page (line).
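The paper does not give the generation algorithm used by the projection JavaBean. Assuming the fitted model is a Poisson distribution (see the 'Statistical and mathematical modeling' section), one standard way to draw the N annual counts is Knuth's multiplication method; the class and method names below are ours.

```java
import java.util.Random;

public class OccurrenceSampler {
    /** Draw one Poisson(gamma) variate using Knuth's multiplication method. */
    static int poisson(double gamma, Random rng) {
        double limit = Math.exp(-gamma);
        double p = 1.0;
        int k = 0;
        do {
            k++;
            p *= rng.nextDouble();
        } while (p > limit);
        return k - 1;
    }

    /** N projected annual occurrence counts from the fitted distribution. */
    static int[] project(double gamma, int n, long seed) {
        Random rng = new Random(seed);
        int[] out = new int[n];
        for (int i = 0; i < n; i++) out[i] = poisson(gamma, rng);
        return out;
    }

    public static void main(String[] args) {
        // Illustrative run: 5 projected years with a fitted mean of 8 storms/year.
        System.out.println(java.util.Arrays.toString(project(8.0, 5, 42L)));
    }
}
```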
Figure 6. AHO projection result Web page (bar).

Storm Genesis Time projection

Rationale

One essential trait of a hurricane is its genesis time. The Storm Genesis Time (SGT) is the date and time
at which an organized closed cyclonic circulation is first identified in the surface wind field surrounding a
low-pressure area, such that a regional forecast center would classify the system as an incipient tropical
cyclone.
For each numerically simulated hurricane resulting from the AHO, the associated SGT needs to be
produced. SGT projection aims to achieve this target. The prediction of genesis time is grounded in the
investigation and analysis of the historical hurricane genesis time data.
One data set needs to be determined to serve as the basis of the statistical projection. As in AHO,
five data sets are available for the projection of SGT: (1) 1851–2000, (2) 1900–2000, (3) 1944–2000,
(4) ENSO and (5) Multi-Decadal. The meaning of each data set is described in Tables II and III.
Genesis time is represented by the first fix data of the selected data set. Recording the precise genesis
time of a storm once it forms is still beyond the capability of the currently available observational
instrumentation and hurricane modeling techniques. The first fix data are a collection of data related to
the characteristics of a hurricane the first time it is observed and recorded, including storm name, date,
time, position (longitude and latitude), maximum wind speed and pressure, etc. Hence, technically, it is
a suitable approximation of the actual SGT. The first fix data are stored in the database and are retrieved
at run time. Table IV depicts some example data records for the first fix data. Each record
shows when and where a particular tropical storm originated.

Table IV. Example of first fix data records.

StormId   StormName   GenesisDate   JulianDate   GenesisTime
310       NME         5-Jul-1851    2 397 309    120000
311       NME         16-Aug-1851   2 397 351    000000
1114      NME         19-Aug-1852   2 397 720    000000
1153      NME         5-Sep-1852    2 397 737    000000

Of all these data fields, those of utmost concern are the fields representing time information:
GenesisDate, JulianDate and GenesisTime. The GenesisDate field records the calendar date on which
the storm began. The corresponding Julian date of that calendar date is stored in the JulianDate
field. The Julian date is simply a continuous count of days and fractions since noon Universal Time on
1 January 4713 BCE and is widely used as a time variable within astronomical software. GenesisTime
indicates the time at which the storm originated. Since the actual observation is conducted every
six hours, the value in that field represents not an exact time point but a time interval. The 24 h day
is divided into four intervals: I1 = [0AM, 6AM), I2 = [6AM, 12 Noon), I3 = [12 Noon, 6PM)
and I4 = [6PM, Midnight), which are denoted respectively by the values 000000, 060000, 120000 and
180000. For instance, the first record shows that the tropical storm with StormId 310 began during the
time interval (12:00, 18:00) on 07/05/1851.
Since the actual estimation is based on the time intervals between consecutive hurricanes, which
are estimated in units of hours, the first fix data are processed to produce the interval data for the
calculation. The conversion is conducted as follows:

    SGT = 24 × (Julian date of a storm − Julian date of 05/01/1851) + GenesisTime

For example, for the storm with StormId 311, which happened in the time interval I1 on 08/16/1851,
the SGT value is:

    24 × (2 397 351 − 2 397 243) + 0 = 2592

where 2 397 351 is the Julian date of 16 August 1851, and 2 397 243 is the Julian date of 1 May 1851.
The resulting SGT data after processing are shown in Table V.

Table V. First fix data records after processing.

StormId   StormName   GenesisDate   GenesisTime   SGT
310       NME         5-Jul-1851    120000        1572
311       NME         16-Aug-1851   000000        2592
1114      NME         19-Aug-1852   000000        2640
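The conversion above can be sketched in code. We assume here that the GenesisTime code (000000/060000/120000/180000) contributes the starting hour of its six-hour interval; the class and method names are ours.

```java
public class SgtConverter {
    /** SGT = 24 * (Julian date - Julian date of the reference 1 May) + starting
     *  hour of the genesis interval (assumption: 120000 contributes 12 hours). */
    static int toSgt(int julianDate, int julianDateOfMay1, int genesisTimeCode) {
        int hour = genesisTimeCode / 10000; // e.g. 120000 -> 12
        return 24 * (julianDate - julianDateOfMay1) + hour;
    }

    public static void main(String[] args) {
        // Worked example from the text: StormId 311, 16-Aug-1851, interval I1.
        System.out.println(toSgt(2397351, 2397243, 0)); // prints 2592, as in the text
    }
}
```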
After this preprocessing, the probability distribution of the SGT values is analyzed based on certain
estimating algorithms, which are elaborated in the 'Statistical and mathematical modeling' section.
Then, according to the estimated distribution, an associated genesis time is produced for each hurricane
that is predicted by AHO.
The overall information flow for SGT prediction is shown in Figure 7.

Figure 7. Flow chart for SGT. [The chart shows: the system provides the data set selection; the user
selects; the system gets the data from the Oracle DB, estimates the CDF of HBG, generates the SGT
values, saves the SGT data to the database and displays the result.]

Implementation

The flow chart of SGT indicates that the users need to first appoint a data set; then the system
automatically begins to estimate the distribution and to generate new SGT values. During the whole
prediction process, JSP Web pages allow the users to select the desired data set. The JavaBeans deal
with the calculation and data retrieval/storage work. The distribution estimation job involves many
statistical and mathematical functions, which are accomplished by C/C++ code.
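The paper does not spell out the generation step in code. One simple illustration, assuming the estimated CDF is approximated by the empirical distribution of the historical SGT values, is inverse-transform sampling; the names below are ours.

```java
import java.util.Arrays;
import java.util.Random;

public class SgtSampler {
    /** Draw one value from the empirical distribution of historical SGT values
     *  by inverting the empirical CDF at a uniform random point. */
    public static int sample(int[] historicalSgt, Random rng) {
        int[] sorted = historicalSgt.clone();
        Arrays.sort(sorted);
        double u = rng.nextDouble();         // uniform in [0, 1)
        int idx = (int) (u * sorted.length); // index of the inverted empirical CDF
        return sorted[idx];
    }

    public static void main(String[] args) {
        int[] history = {1572, 2592, 2640};  // SGT values from Table V
        System.out.println(sample(history, new Random(7)));
    }
}
```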
Figure 8. Data set selection Web page (SGT).

Data set selection

As with AHO, there are a total of five data sets available, and they are provided to the users via a
drop-down list. The users select one of them and send the projection request to the system by submitting
the selection. A snapshot of the data set selection Web page is illustrated in Figure 8.

Distribution estimation and SGT projection

Based on the data set choice, the system first retrieves the related first fix data from the database
and processes them to generate the SGT data in conformity with the above-mentioned conversion
approach. Then the system estimates the distribution of the SGT values and produces an SGT value
for each numerically simulated hurricane from AHO. The SGT data are stored into the database at the
same time. Example SGT values are dynamically displayed to the user in the format of a table. Figure 9
shows the resulting Web page.

Figure 9. SGT projection result Web page.

STATISTICAL AND MATHEMATICAL MODELING

The modeling approach utilized in our system complies with the popular hurricane projection strategy
detailed in [16], i.e. to model the entire track of a hurricane beginning with its initiation over the
open ocean to its dissipation. The characteristics of the storm are modeled at each 6 h point in the storm
history.
The first step in modeling the complete track of a hurricane is to model the number of hurricanes
occurring per year and the genesis time of each individual storm, which are the purposes of the AHO
projection and SGT projection, respectively. Specifically, AHO projection aims to model and predict
the number of storms occurring per year, and SGT projection attempts to predict the genesis time of
each specific storm. A statistical approach is adopted, and the statistical models of the AHO and SGT
are built from the historical storm data via statistical analysis.
One meteorological fact is that the statistical properties of AHO vary with different year ranges.
For example, the statistical properties of storms in El Niño years are quite different from those in
non-El Niño years. Therefore, different statistical models are necessary for different year ranges. In our
system, all the historical storm records in the database are categorized into five data sets according to
meteorological criteria: (1) 1851–2000, (2) 1900–2000, (3) 1944–2000, (4) ENSO and (5) Multi-Decadal.
The meaning of each category has been discussed in the last section. Different statistical models are
built for individual data sets.
c 2004 John Wiley & Sons, Ltd.
Copyright Softw. Pract. Exper. 2004; 34:1–23
16
pro
ofs
October 22, 2003 Marked proof Ref: SPE580/26122e Sheet number 16
S.-C. CHEN ET AL.
AHO projection
5
10
AHO projection aims to model and predict the number of storms occurring per year. According to
domain knowledge in meteorology, the best statistical distribution of the number of storms occurring
per year is either Poisson distribution or negative binomial distribution. The Poisson distribution has
been the classic distribution describing the occurrence of a stochastic process. However, the Poisson
distribution assumes that the mean number of hurricanes in any two nonoverlapping time intervals of
equal length is the same. Allowing these means to be different leads to the ANO being modeled by a
mixture of Poisson distributions, which in effect is the negative binomial distribution.
First, the parameters of both the Poisson distribution and the negative binomial distribution are
estimated from the historical data. Then the goodness of fit for the two distributions is evaluated based
on the chi-squared statistic, and the distribution with the better fit is picked as the final statistical model
of AHO.
Data samples

Since different statistical models are built for different data sets, the user first needs to select one
data set from the five categories through the user interface, as mentioned in the last section. The
historical data of the selected data set are then retrieved from the database. The retrieved M data
samples are denoted by X = {x_i} (i = 1, 2, ..., M), where M is the number of years in the data set
and x_i denotes the number of storms that occurred in the ith year of the data set. The statistical model
of the AHO is built from these M data samples.

Estimation of the Poisson distribution

The probability distribution of a Poisson random variable x with mean \gamma is
P(x) = \gamma^x e^{-\gamma} / x!. Given the data samples X = {x_i} (i = 1, 2, ..., M) from the
historical storm data, the maximum likelihood estimator of the parameter \gamma is

    \hat{\gamma} = \frac{\sum_{i=1}^{M} x_i}{M}    (1)

Estimation of the negative binomial distribution

The single-variable negative binomial distribution can be represented as

    P(x) = \frac{\Gamma(x + k)}{\Gamma(x + 1)\,\Gamma(k)} \left(\frac{k}{m + k}\right)^{k} \left(\frac{m}{m + k}\right)^{x}    (2)

where \Gamma(\cdot) is the gamma function, namely \Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\,dt.
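The maximum likelihood estimator in Equation (1) is simply the sample mean of the annual counts. As an illustrative sketch (the storm counts below are made up for illustration, not HURDAT values):

```python
def poisson_mle(counts):
    """Maximum likelihood estimate of the Poisson mean gamma (Equation (1)):
    the sample mean of the annual storm counts."""
    if not counts:
        raise ValueError("need at least one yearly count")
    return sum(counts) / len(counts)

# Hypothetical annual storm counts x_1, ..., x_M
annual_counts = [8, 11, 7, 9, 10]
gamma_hat = poisson_mle(annual_counts)  # (8+11+7+9+10)/5 = 9.0
```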
Given the M data samples X = {x_i} (i = 1, 2, ..., M) from the historical storm data, the estimates
of the parameters m and k are

    \hat{m} = \frac{\sum_{i=1}^{M} x_i}{M}    (3)

    \hat{k} = \frac{\hat{m}^2}{s^2 - \hat{m}}    (4)

where s^2 is the variance of the data samples X.
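Equations (3) and (4) are moment-based estimates. A minimal sketch, assuming the unbiased sample variance for s^2 (the paper does not say which variance estimator it uses); note that \hat{k} is only defined when the data are overdispersed (s^2 > \hat{m}):

```python
def negbin_moment_estimates(counts):
    """Estimate the negative binomial parameters m and k via Equations (3)
    and (4): m_hat is the sample mean, k_hat = m_hat^2 / (s^2 - m_hat)."""
    M = len(counts)
    m_hat = sum(counts) / M
    # Unbiased sample variance s^2 (an assumption; the paper is not explicit)
    s2 = sum((x - m_hat) ** 2 for x in counts) / (M - 1)
    if s2 <= m_hat:
        raise ValueError("data not overdispersed: s^2 <= m_hat, k_hat undefined")
    return m_hat, m_hat ** 2 / (s2 - m_hat)
```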
Model selection
After the parameters of both the Poisson distribution and the negative binomial distribution have been
estimated, the chi-square goodness-of-fit statistic is calculated for each, and the distribution with the
higher p-value is selected as the final statistical model of the AHO.
Assume the data are divided into k bins. The chi-square goodness-of-fit test statistic is defined as

    \chi^2 = \sum_{i=1}^{k} (O_i - E_i)^2 / E_i    (5)

where O_i is the observed frequency for bin i and E_i is the expected frequency for bin i.
Let K = max{x_i} (i = 1, 2, ..., M); that is, K is the maximum number of hurricanes occurring
per year in the historical data. It is safe to assume that the number of hurricanes occurring per year
ranges from 0 to K. The data are therefore divided into (K + 1) bins of width 1, and the chi-square test
statistic can be rewritten as

    \chi^2 = \sum_{i=0}^{K} (O_i - E_i)^2 / E_i    (6)

where O_i is the observed frequency of i hurricanes occurring per year and E_i is the expected
frequency of i hurricanes per year under the candidate model, which is either the Poisson or the
negative binomial distribution. The distribution with the higher p-value is selected as the final
statistical model of the AHO.
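To make the binning of Equation (6) concrete, the sketch below computes the chi-square statistic for a candidate model using only the standard library. The pmfs follow the formulas above; converting each statistic to a p-value (for example with a chi-square survival function such as SciPy's `scipy.stats.chi2.sf`) is left out to keep the sketch dependency-free.

```python
import math
from collections import Counter

def poisson_pmf(i, lam):
    # P(x) = lam^x e^(-lam) / x!, computed in log space for stability
    return math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))

def negbin_pmf(i, m, k):
    # Equation (2): Gamma(x+k)/(Gamma(x+1)Gamma(k)) * (k/(m+k))^k * (m/(m+k))^x
    return math.exp(math.lgamma(i + k) - math.lgamma(i + 1) - math.lgamma(k)
                    + k * math.log(k / (m + k)) + i * math.log(m / (m + k)))

def chi_square_statistic(counts, pmf):
    """Equation (6): (K + 1) bins of width 1, where K is the maximum
    yearly count observed in the data."""
    M = len(counts)
    observed = Counter(counts)
    K = max(counts)
    return sum((observed.get(i, 0) - M * pmf(i)) ** 2 / (M * pmf(i))
               for i in range(K + 1))
```

The statistic would be computed once with `poisson_pmf` (using \hat{\gamma}) and once with `negbin_pmf` (using \hat{m}, \hat{k}), and the model with the higher resulting p-value kept.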
AHO projection validation

To validate the projection performance of the models explored for the AHO, a subset of the hurricane
occurrence data is used to estimate the statistical distribution, and the derived model is then used to
forecast the number of hurricanes over a number of years. Of the historical hurricane data stored in
the database, the subset used to estimate the distribution contains the 91 years of data from 1900 to
1990, and the actual data used for comparison cover the years 1991 to 2001.
On the basis of the historical data from 1900 to 1990, the 95% confidence interval for the mean
number of hurricanes per year is (7.95, 9.15) using the Poisson distribution and (7.80, 9.29) using the
negative binomial distribution. Figure 10 presents side by side the projected frequencies of hurricane
occurrences for the years 1991–2001 and the associated actual occurrence frequencies, which are based
on the negative binomial distribution model. Since 11 years' worth of data is too small a sample to give
accurate predictions, some of the projected values are not very close to the actual data, as illustrated
in this figure.

Figure 10. Frequencies histogram of historical/projected AHOs (frequency versus the number of
annual hurricane occurrences, with 'Historical' and 'Projected' series).

SGT projection

The genesis time of a storm is given by the first fix data of that storm. SGT projection aims to predict
the genesis time of each specific storm. This goal is achieved by modeling the number of hours, at 6 h
resolution, between the genesis of a storm and the start of its hurricane season, rather than by modeling
the SGT directly. A storm season starts on 1 May of one year and ends on 30 April of the next year.
After the number of storms has been modeled from the historical data using the AHO model, the SGT
projection model can be used to predict the time intervals between storms, and thus the SGT of each
storm can be predicted as well.

Data samples

The user first selects one data set from the five categories through the user interface. Since the data
set in the database originally contains no time-interval values, the data conversion described previously
in the 'SGT projection' section is applied first to generate that information; the time intervals can then
be retrieved from the database. The retrieved N data samples are denoted by S = {s_i}
(i = 1, 2, ..., N), where N is the number of storms in the data set and s_i denotes the time interval
associated with the ith storm in the selected data set. The statistical model of the SGT is built from the
data samples S.
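The conversion from genesis times to season-relative intervals is not spelled out in the text; a minimal sketch of one way to compute it (the function name and the use of Python's `datetime` are our own, and real HURDAT genesis times fall on the 6 h synoptic fixes):

```python
from datetime import datetime

def hours_since_season_start(genesis):
    """Hours between a storm's genesis time and the start of its hurricane
    season, where a season runs from 1 May of one year to 30 April of the
    next year."""
    # Genesis before 1 May belongs to the season that started the year before
    season_year = genesis.year if (genesis.month, genesis.day) >= (5, 1) else genesis.year - 1
    season_start = datetime(season_year, 5, 1)
    return (genesis - season_start).total_seconds() / 3600.0
```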
Distribution estimation of time intervals
A nonparametric approach is applied to estimate the cumulative distribution function (CDF) of the time
intervals. Let T denote the random variable time (number of hours). The nonparametric approach is
described in detail as follows.
All the time intervals s_i are sorted in ascending order. Assume the sorted result is 0 ≤ T_1 ≤ T_2 ≤
· · · ≤ T_W, where W ≤ N. Let f_i denote the frequency of the storms at time T_i. The empirical CDF
for T, as an estimate of the true CDF F(t) = P(T ≤ t), is calculated using the following equation:

    F_N(t) = \begin{cases} 0 & \text{if } t < T_1 \\ (f_1 + f_2 + \cdots + f_i)/N & \text{if } T_i \le t < T_{i+1},\ i = 1, 2, \ldots, W - 1 \\ 1 & \text{if } t \ge T_W \end{cases}    (7)
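The empirical CDF of Equation (7) can be sketched as a step function over the distinct sorted times (the sample intervals below are invented for illustration):

```python
from collections import Counter

def empirical_cdf(intervals):
    """Build the empirical CDF of Equation (7). Returns the distinct sorted
    times T_1..T_W and the step function F_N."""
    N = len(intervals)
    freq = Counter(intervals)          # f_i: frequency at each distinct time
    times = sorted(freq)               # T_1 <= T_2 <= ... <= T_W
    cum, total = [], 0                 # cumulative (f_1 + ... + f_i) / N
    for t in times:
        total += freq[t]
        cum.append(total / N)

    def F_N(t):
        if t < times[0]:
            return 0.0                 # t < T_1
        # Find the largest T_i with T_i <= t; its cumulative value applies
        for T_i, c in zip(reversed(times), reversed(cum)):
            if t >= T_i:
                return c               # covers t >= T_W as well (returns 1.0)

    return times, F_N
```

For example, with intervals [6, 12, 12, 30] this gives F_N(5) = 0, F_N(12) = 0.75 and F_N(30) = 1.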
The empirical CDF is then smoothed using standard kernel smoothing techniques. The kernel used
is the Epanechnikov kernel, K(x) = 0.75(1 - 0.2x^2)/\sqrt{5} for |x| \le \sqrt{5}, and the local
bandwidth is h_N(t) = (S/2)(1/N)^{1/3}. The smooth estimator of F(t) is then calculated as

    \hat{F}_N(t) = \int_0^{\infty} \frac{1}{h_N(t)} K\left(\frac{t - x}{h_N(t)}\right) F_N(x)\,dx = \sum_{j=1}^{W} S_j K^{*}\left(\frac{t - T_j}{h_N(t)}\right)    (8)

where S_j is the jump of F_N at T_j, that is, S_j = F_N(T_j) - F_N(T_{j-1}) for j = 2, 3, ..., W and
S_1 = F_N(T_1). Also, K^{*}(u) is the integral of K(x), that is, K^{*}(u) = \int_{-\infty}^{u} K(x)\,dx.
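Because the Epanechnikov kernel is a polynomial on [-\sqrt{5}, \sqrt{5}], its integral K^{*} in Equation (8) has a closed form, so the smoothed CDF can be evaluated directly. A sketch under the assumption that S in the bandwidth formula is the sample standard deviation of the intervals (the paper does not define S explicitly; note also that with this formula the bandwidth does not actually vary with t):

```python
import math
from collections import Counter

SQRT5 = math.sqrt(5.0)

def K_star(u):
    """Closed-form integral of the Epanechnikov kernel
    K(x) = 0.75(1 - 0.2 x^2)/sqrt(5) on [-sqrt(5), sqrt(5)]."""
    if u <= -SQRT5:
        return 0.0
    if u >= SQRT5:
        return 1.0
    return 0.5 + (0.75 / SQRT5) * (u - u ** 3 / 15.0)

def smooth_cdf(intervals):
    """Equation (8): F_hat(t) = sum_j S_j K*((t - T_j)/h), where S_j are the
    jumps of the empirical CDF at the distinct sorted times T_j."""
    N = len(intervals)
    mean = sum(intervals) / N
    S = math.sqrt(sum((x - mean) ** 2 for x in intervals) / (N - 1))
    h = (S / 2.0) * (1.0 / N) ** (1.0 / 3.0)   # bandwidth (S/2)(1/N)^(1/3)

    freq = Counter(intervals)
    times = sorted(freq)
    jumps = [freq[t] / N for t in times]       # S_j

    def F_hat(t):
        return sum(s_j * K_star((t - T_j) / h) for T_j, s_j in zip(times, jumps))

    return F_hat
```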
SGT projection validation

We do not intend to validate the approach used for SGT modeling in the same manner as for the AHO,
because the SGT is modeled using a nonparametric approach. Although confidence intervals for the
smooth estimates exist, they are highly technical, rely on difficult statistical theory and may not be
appropriate to present here. However, as a demonstration of the accuracy of the SGT projection, a
comparison histogram is illustrated in Figure 11. The historical hurricane data from 1900 to 1990 are
again used to derive the distribution, and the actual data for the years 1991–2001 are used for
comparison. The possible SGT values are divided into a number of bins with an interval of 600.
The corresponding frequency histograms of both the actual and the projected data are plotted, and the
result is promising.

DATABASE COMPONENT
The Oracle9i database is incorporated into the system as the information storehouse, holding data
records for all storms occurring in the Atlantic basin since the year 1851. An object-relational
database schema is designed to facilitate data reusability and manageability. The major advantage
brought by object-relational concepts is the ability to incorporate higher levels of abstraction into our
data models, whereas conventional relational schemas are usually highly normalized but offer little
abstraction.
Figure 11. Frequencies histogram of historical/projected SGT (frequency versus SGT range in hours,
with 'Number of Historical SGT' and 'Number of Generated SGT' series).

Hurricane data modeling
Data analysis and modeling is a vital aspect of the database component. In our system, an
object-relational design pattern is applied to model the hurricane data. Object-relational models assist
the reuse of database objects. The overall view of the hurricane data schema is depicted in Figure 12.
The database schema for the HURDAT data set consists of six major object types and five major
tables. The table Atmosevent list holds the tracking data for all atmosevents, namely the storms and
hurricanes, dated from 1851 to the present day. For each atmosevent, an Atmosevent object is used to
model its structured information. The table Storm category stores the information about each
atmosevent's category and description. The relationship between the table Atmosevent list and the
table Storm category is established by adding a foreign key to the table Atmosevent list. The table
Landfall stores a storm id and a nested table of Landfall type arr objects; the foreign key storm id of
the table Landfall corresponds to the primary key key id of the Atmosevent list table. The table
Stormfix list is used to store the fixes of all the atmosevents, with each storm fix represented by a
Stormfix object. This table is related to the table Atmosevent list by a foreign key event id.
Furthermore, the for event field of the Stormfix object refers to an Atmosevent object, its produced id
and produced by fields refer to a Platform type object, and its fixobj field is based on a Fix object.
The table Platform type list is an object table of Platform type objects; the primary key key id of
Platform type list corresponds to the foreign key produced by of the table Stormfix list.
The original data set, in the form of text files, is processed and extracted to fit the object-relational
schema. Several programs in a variety of programming languages were developed to automate the
processing and population tasks.
Figure 12. Database schema.
Original data and data processing

The historical hurricane data stored in this database are imported directly from the North Atlantic
'best track' HURDAT database, which is maintained by the National Hurricane Center in Miami,
Florida, and the National Climatic Data Center in Asheville, North Carolina. Currently, the 'best track'
database covers the years 1851 to 2001.
One problem with the original representation of the Atlantic basin storm tracks is that they are
recorded in text files with no unified format for the data entries. Hence the original data need to be
processed and converted properly before they can be populated into the database schema.
The first step in processing the original data is to extract the useful data and to remove the unwanted
data, such as format symbols. Take the database table Atmosevent list as an example. This table
stores the high-level information for all storms, and the following corresponding data fields need to be
extracted from the original data file: (i) the storm number, (ii) the begin date of the storm and (iii) the
storm type. Some of the required data can be obtained directly from the original data set, while others,
such as the 'storm type', need further conversion. The 'storm type' field cannot be obtained directly
from the original data file; instead, it has to be derived by converting the maximum wind speed of each
storm to its corresponding storm category according to a set of criteria. A C++ program was developed
to retrieve the data and automatically assign the correct storm type to each storm.
As another example, the table Stormfix list stores the detailed information about each storm or
hurricane, including a storm's life line, the exact latitude and longitude, and the wind speed and
central pressure at the different fix points of each day. This information therefore needs to be derived
from the original data file. However, the non-uniform data entries make it difficult to import the
needed data directly. A Java program was therefore developed to handle the various formats of the
data entries and to output a text file with a unified format that can be loaded into the database later on.
To ensure consistency between the extracted data and the original data, data checking is performed
either manually or automatically through programs.
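The wind-speed-to-category conversion above is described only as following a set of criteria that the text does not enumerate. A plausible sketch using the standard Saffir-Simpson thresholds in knots (an assumption on our part; the actual cut-offs used by the C++ program may differ):

```python
def storm_type(max_wind_kt):
    """Map a storm's maximum sustained wind (knots) to a category label,
    using assumed Saffir-Simpson thresholds."""
    if max_wind_kt < 34:
        return "Tropical Depression"
    if max_wind_kt < 64:
        return "Tropical Storm"
    # Hurricane categories, checked from the strongest threshold down
    for category, threshold in [(5, 137), (4, 113), (3, 96), (2, 83), (1, 64)]:
        if max_wind_kt >= threshold:
            return f"Hurricane Category {category}"
```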
CONCLUSION

In this paper, a Web-based distributed system for the projection of hurricane occurrences is presented.
It integrates a group of individual applications by combining hurricane data acquisition, storage,
retrieval and analysis functions. The system exhibits a modular, extensible and scalable architecture
that makes it possible to adapt it to more complex tasks such as storm track simulation and wind field
generation. The well-established three-tier architecture is exploited to build the system, and a variety
of advanced techniques such as JSP, JNI and JDBC are used in its design and development. Both the
Oracle database and the Oracle application server are deployed to make the system a coherent
integration. In addition, the system is accessible to any user who can connect to the Internet and is
interested in hurricane prediction information.

ACKNOWLEDGEMENT

This work was partially supported by the Florida Department of Insurance (DOI) under the 'Hurricane Risk and
Insured Loss Projection Model' project. While the project is funded by the Florida DOI, the DOI is not responsible
for the content of this paper.