HURRICANES AND DISASTER DECLARATIONS:
A STATISTICAL ANALYSIS
By
VERONICA REOTT
A SENIOR RESEARCH PAPER PRESENTED TO THE DEPARTMENT OF MATHEMATICS AND
COMPUTER SCIENCE OF STETSON UNIVERSITY IN PARTIAL FULFILLMENT OF THE
REQUIREMENTS FOR THE DEGREE OF BACHELOR OF SCIENCE
STETSON UNIVERSITY
2005
ACKNOWLEDGEMENTS
I would like to acknowledge all of my mathematics professors at Stetson University,
especially Dr. Will Miles who has advised, encouraged, and believed in me.
I would also like to acknowledge Joanne Spano, who has helped me so very much
throughout this whole endeavor and without whose acquaintance I may never have come
up with this idea.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTERS
1. INTRODUCTION
2. BACKGROUND
3. SIMPLE LINEAR REGRESSIONS
    3.1. Distance from coast
    3.2. Total Rainfall
    3.3. Maximum sustained wind speed
    3.4. Distance from eye
    3.5. Distance from site of landfall
    3.6. Overall speed
    3.7. Discussion
4. MULTIPLE REGRESSIONS
    4.1 Techniques
        4.1.1 Use of Matrices
        4.1.2 Data collection and missing data
    4.2 Multiple regression models
5. RESULTS
    5.1 Choosing the correct model
    5.2 Testing the model
        5.2.1 Multicollinearity
    5.3 Finalizing
SUMMARY
APPENDICES
    A: Probability Tables
    B: FEMA Figures
REFERENCES
LIST OF TABLES
1) Table A.1: Distance from coast stats
2) Table 3.1.1: Distance from coast ANOVA
3) Table A.2: Total rainfall stats
4) Table 3.2.1: Total rainfall ANOVA
5) Table A.3: Max wind speed stats
6) Table 3.3.1: Max wind speed ANOVA
7) Table A.4: Distance from eye stats
8) Table 3.4.1: Distance from eye ANOVA
9) Table A.5: Distance from site of landfall stats
10) Table 3.5.1: Distance from site of landfall ANOVA
11) Table A.6: Overall speed stats
12) Table 3.6.1: Overall speed ANOVA
13) Table 3.7.1: Distance from coast linear model ANOVA
14) Table 4.2.1: Multiple regression ANOVA information and F test scores
15) Table 5.2.2: Coefficient comparison test for multicollinearity
LIST OF FIGURES
1) Figure 3.1.1: Distance from coast observed values and exponential model
2) Figure 3.1.2: Distance from coast observed values vs predicted values
3) Figure 3.2.1: Total Rainfall observed values and logarithmic model
4) Figure 3.2.2: Total Rainfall predicted versus observed values
5) Figure 3.3.1: Max sustained wind speed observed values and logarithmic model
6) Figure 3.3.2: Max sustained wind speed predicted versus observed values
7) Figure 3.4.1: Distance from eye observed values and exponential model
8) Figure 3.4.2: Distance from eye observed values vs predicted values
9) Figure 3.5.1: Distance from site of landfall observed values and exponential model
10) Figure 3.5.2: Distance from site of landfall observed values vs predicted values
11) Figure 3.6.1: Overall speed observed values and linear model
12) Figure 3.7.1: Residuals example
13) Figure B.1: Saffir-Simpson scale
14) Figure B.2: FEMA Regions
15) Figures B.3-B.5: Declared counties examples
16) Figure 5.2.1: Variance covariance matrix for multiple regression model
ABSTRACT
HURRICANES AND DISASTER DECLARATIONS: A STATISTICAL ANALYSIS
By
Veronica Reott
May 2005
Advisor: Dr. Will Miles
Department: Mathematics and Computer Science
This study focuses on selected hurricanes from the 1998 through 2004 hurricane
seasons. Statistical analysis of the overall speed of each hurricane, together with
county-by-county values for maximum sustained wind speed, total rainfall, proximity to the
eye, proximity to the coast, and proximity to the site of landfall, is used to determine the
statistical correlation between these factors and which counties in the states of FEMA’s Region IV
are likely to be declared “disaster areas” by the governor. Each factor is
regressed individually, and all are regressed simultaneously, to produce a model
whose input is a known or projected value for each of the aforementioned factors and
whose output is the probability that a county will be declared a disaster area.
CHAPTER 1
INTRODUCTION
Charley, Frances, Ivan and Jeanne are names that have recently become commonplace
in most homes across Florida and the rest of the nation. These are not characters on
the newest reality television show. They are not the names of politicians who have
battled it out in this year’s elections. These are the names of deadly storms, hurricanes,
which ravaged Florida from both coasts this hurricane season.
For as far back as this researcher can remember, Florida has seemed virtually
immune to hurricanes. Save Hurricane Andrew in 1992, the count of hurricanes with
landfall and damaging effects in Florida has been quite low in the past few decades. Each
time there is a tropical depression or simply an area of low pressure out in the Atlantic
somewhere, the meteorologists of the world put their heads (and computers) together.
Four-day and seven-day possible path predictions are calculated and the local news
weather forecasters do the best they can to prepare the viewers for the worst. Until this
past hurricane season, their advice may or may not have been completely heeded.
Floridians have gone through many hurricane scares when right at the last moment, the
hurricane is pushed northward or back out into the Atlantic and Florida sees mild wind
and rains and no landfall. It is no surprise that some may have come to view hurricane
path predictions as the forecaster “crying wolf.” This attitude diminished sharply
during the 2004 hurricane season. Floridians were glued to their television
sets (or radios for lack of power) for days on end, several weeks in a row, as new and
ever more daunting weather systems arose that threatened their lives and their
livelihoods.
The Federal Emergency Management Agency or FEMA is a governmental
organization which “is tasked with responding to, planning for, recovering from and
mitigating against disasters” (2). FEMA’s involvement in the response to and recovery
from a catastrophic event officially begins when the damage from such an event reaches a
level that is beyond the capabilities of the state to handle on its own. When the damage
reaches this level, the governor of the state will request a “disaster declaration” from the
president. This declaration, given county by county, sets into motion FEMA’s processes
of assessment of the amount of federal aid that will be needed for each county and
delivery of this aid to those in need. The aid given by FEMA comes in many forms.
The process by which FEMA provides assistance to victims of hurricanes in the
state of Florida is the focus of this study. Aside from loss of life, the most devastating
damage that can occur is the loss (complete or partial) of one’s home. Therefore, the
specific type of aid that will be focused on in this study is Individual Assistance which is
allocated by the Individuals and Households Program (IHP) department of FEMA.
The Individuals and Households Program is the department that an individual can
contact in the case of a hurricane (or other disaster) to appeal for aid such as temporary
housing, repairs to one’s home, replacement of lost articles and other types of individual
assistance. The IHP uses a standard model for determining the amount of aid that will be
needed in each county. This model, known as the Preliminary Damage Assessment
(PDA), is based on the number of structures (homes, apartments, mobile homes) which
were fully or partially damaged by the hurricane, the percentage of those likely to be
insured, and the expected number of aid applications in each county. The dollar amount
of federal and state aid that will be needed to assist individuals in each county is given by
this PDA.
Hurricane data from the 1998-2004 hurricane seasons such as the site of landfall
and overall speed of each hurricane in conjunction with county by county averages of
maximum sustained wind speed and quantity of rainfall, county by county values for
proximity to eye and proximity to coast will be used in the statistical analyses. The
probability that a county is declared given that it has certain values for the factors given
above will be found. An example of the type of calculation that will be performed is to
find the probability that a county will be declared a disaster area given that it is within
100 miles of the eye of the storm, or the probability that a county will be declared given
that it experienced maximum wind speeds less than 74 miles per hour. These conditional
probabilities will be found for different configurations of each of the factors listed above.
They will be found for each affected county in each declared state of FEMA’s Region IV
(see Figure B.2). Hypothesis tests and other tests of
statistical correlation will be performed on this data to determine how crucial each of the
factors is in determining the disaster declaration of the counties. This analysis lends itself
to a model whose input would be the overall speed of the hurricane, the projected path,
and rain band and wind speed band data. The surrounding counties that are statistically
likely to be declared as disaster areas are the output.
CHAPTER 2
BACKGROUND
Hurricanes are some of the most devastating disasters and therefore the most
costly. Hurricane Andrew alone cost FEMA $1.8 billion. This figure takes into account
personal loss of people’s houses, cars, businesses and other personal damages they may
have incurred as well as damage to public works such as roads, public and governmental
facilities and the costs of restoration of Wildlife Management Areas (1). Damage from
hurricanes is so severe because of the variety of destructive forces a hurricane entails.
High-speed winds blow off roofs and blow trees into houses, and intense
rainfall causes flooding during and after the storm. It is difficult for people to see that a
hurricane is an important and necessary part of the earth on which they live, especially
when it brings its destructive forces to their front door. But, hurricanes do have a very
important role to play in the highly intricate workings of the atmosphere.
The movement of the atmosphere and the interaction between the air masses
therein is a very detailed and specialized subject. There are two essential processes of the
“global weather machine,” a radiative balance between the earth and the sun, and the
transport of energy within the atmosphere around the surface of the earth (5). The sun
radiates energy constantly towards the earth at differing, visible and invisible
wavelengths. At the same time, the earth, including land, ocean and air, is radiating
energy back into space. Over time this influx and outflux of energy balance out. The
poles of the earth are tilted away from the sun for much of the year. Therefore, very little
energy is radiating into the earth in these areas while the land, ocean and air at the poles
are still radiating energy out into space. The balance of energy coming in and energy
leaving the earth, therefore, must be kept by the transport of energy within the earth’s
atmosphere from the equator, where the sun’s radiation of energy is the most intense,
towards the poles where much energy is lost but very little gained. This transportation of
energy “drives the global atmospheric and oceanic circulation during the year” (5).
Hurricanes, also known as typhoons or cyclones in different parts of the earth, are
very important in the transfer of large amounts of energy from the center latitudes
towards the poles. They begin as depressions along the intertropical convergence zone,
the area around 20 degrees south to 20 degrees north latitude of the equator, where the
sun beats down upon the earth with greatest intensity. The depressions that become
hurricanes show a sharp decrease in pressure at the center accompanying an increase in
wind speed and circular cloud formation about 30-60 kilometers from the center. These
storms pick up energy from the heat in the water around them. Once the wind speeds
have reached 120 kilometers per hour, the storm is labeled a hurricane. Hurricanes grow
in size and intensity quickly. The more organized the center, the more intense the
hurricane.
“They may speed up, slow down, or even stop for a while to build up strength. As
it travels across the ocean, a hurricane can pick up as much as two billion tons of
water a day through evaporation and sea sprays.” (6)
The path of an Atlantic hurricane is essentially a northward movement and also a
movement towards warmer waters whenever possible.
A hurricane is an intensely
destructive atmospheric creation that occurs because of the necessity of the atmosphere to
constantly transport energy from the equator towards the poles. Hurricanes, thus, have an
actual physical responsibility in the functioning of the earth’s atmosphere.
While they help to keep things running smoothly on a global scale, hurricanes
certainly cause serious problems on the local scale. Fortunately, the US government has
developed agencies and programs which can help minimize the devastation caused by
these monsters. FEMA was developed during the presidency of Jimmy Carter as a
centralized unit to control the handling of emergencies and disasters including, of course,
hurricanes. The Agency united the operations of the Federal Insurance Administration,
the National Fire Prevention and Control Administration, the National Weather Service
Community Preparedness Program, the Federal Preparedness Agency of the General
Services Administration and the Federal Disaster Assistance Administration among
others. In 2003 FEMA became a segment of the Department of Homeland Security.
When a hurricane is out in the Atlantic or in the Gulf and it looks as though it may
strike Florida, law enforcement and emergency teams in the threatened cities and
counties are on call, awaiting commands from higher authorities. They and the general
populace are constantly informed about the decisions being made in the governor’s office
and about what will be done in the case that the hurricane makes landfall on Florida’s
approximately 1350 miles of general coastline. Shelters open up all over the state where
people can seek refuge if they are concerned or if they need special assistance. The
governor, when the situation seems unavoidable, will sign an executive order to place the
state in a “state of emergency,” which directs each county to activate its Emergency
Operations Center and its County Emergency Management Plan. At this point
evacuations and curfews and other such procedures are put into effect according to the
governor’s executive order. Emergency personnel are centralized in the areas of most risk
to “protect the lives and property of persons in the threatened communities” (7).
Actions are being taken on a federal scale at this time as well. Since hurricanes
can be tracked and their intensity is known in advance, FEMA is usually aware before
landfall whether federal response is going to be necessary. No federal aid is specifically
allowed before the presidential declaration but the
“DHS [Department of Homeland Security] can use limited predeclaration
authorities to move Initial Response Resources (IRR) (critical goods typically
needed in the immediate aftermath of a disaster)…and emergency teams closer to
potentially affected areas. DHS also can activate essential command and control
structures to lessen or avert the effects of a disaster and to improve the timeliness
of disaster operations.”(8)
When the hurricane makes landfall, the local and state authorities do the best they can to
keep everyone safe and informed. Once the damage levels are more than the state can
handle, the governor requests a declaration of disaster from the president. After this
declaration is given, FEMA’s emergency response takes full effect. Emergency
Response Teams including members of the IHP division are set up in the affected areas
and response and recovery procedures begin. This is the stage in which the Preliminary
Damage Assessments are made. Data on the number of households (single-family homes,
apartments, or mobile home units) that are fully or partially
damaged are collected by assessors who fly over the affected areas in helicopters and
tabulate the damage as they see it. An assessor, a member of the IHP division, then
takes this data and applies the appropriate calculations to determine the amount of federal
and state assistance needed in each area. Assistance is then given to those in need
according to federal regulations.
The process of assisting those in need of aid following a hurricane is an intricate
and interesting one. The beauty is in the logistics. There is an amazing amount of
coordination that goes on between local, state, and federal governments in the event of a
disaster such as a hurricane. Hurricanes not only help to coordinate the movements of
gases in the system of the atmosphere, but they also force the coordination of movements
of hundreds of people and entire agencies in the system of our government.
CHAPTER 3
CURRENT RESEARCH METHODS/DATA COLLECTION:
The Hurricanes that were used for this study are Hurricanes Charley, Frances,
Ivan, and Jeanne from 2004, Isabel (2003), Isidore (2002), Allison (2001), Irene (1999),
Floyd (1999), Dennis (1999), Bonnie (1998), Earl (1998), Mitch (1998), and Georges
(1998). Integral to the analyses in this study is the designation data for each county
in each hurricane (examples: Figures B.3-B.5). The data for each of the factors (rainfall,
wind speed, etc.) were collected from sources such as the NWS,
NOAA, FEMA, and private mapping and analysis centers. The
probability of declaration data used in this project was found by physically measuring
and counting. Maps of declared counties (examples: figures B.3-B.5) were used and, for
example, a dot was placed directly on the site of landfall, concentric circles with radii
increasing by 20 miles were drawn and the number of counties that were completely or
partially inside each area were counted and separately tabulated. The number of counties
that were declared disaster areas out of these tabulated values gives an upper and lower
bound on the probability of declaration. The upper and lower bounds were averaged for
each storm at each level (20 miles from landfall, 40 miles from eye, 6 inches of rain, etc.)
to bring about the probability of declaration shown in the tables. The probabilities at each
level were then averaged over all of the storms to give the mean. These means were used
as the observed values, Yi, in the calculations to follow.
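The averaging described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `mean_probability` and the bound values are hypothetical, and the sketch assumes each storm contributes one (lower, upper) pair of bounds at a given level.

```python
def mean_probability(bounds_by_storm):
    """Average the (lower, upper) bounds within each storm,
    then average those per-storm values across all storms."""
    midpoints = [(lower + upper) / 2 for lower, upper in bounds_by_storm]
    return sum(midpoints) / len(midpoints)


# Hypothetical bounds for three storms at one level (not data from this study)
example_bounds = [(0.50, 0.70), (0.40, 0.60), (0.80, 1.00)]
level_mean = mean_probability(example_bounds)  # mean of 0.6, 0.5 and 0.9
```

The value returned for each level plays the role of one observed value Yi in the regressions.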
Least squares regression has been conducted on each data set, and the technique of
transformation of variables has also been used in some cases. The simple linear
regression model used for these data sets is
(3.0.0)  $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
The regression function for this model is
(3.0.0a)  $E\{Y\} = \beta_0 + \beta_1 X$
where, from the normal equations, we obtain
(3.0.1)  $\beta_1 = \dfrac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$
and
(3.0.2)  $\beta_0 = \dfrac{\sum y_i}{n} - \beta_1 \dfrac{\sum x_i}{n}$
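As a minimal sketch of equations (3.0.1) and (3.0.2), the slope and intercept can be computed directly from the sums. The function name `fit_simple_linear` and the sample data are illustrative assumptions, not part of the original study.

```python
def fit_simple_linear(xs, ys):
    """Return (b0, b1) from the normal equations (3.0.1) and (3.0.2)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Equation (3.0.1): slope
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Equation (3.0.2): intercept
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


# Illustrative data lying exactly on the line y = 1 + 2x
b0, b1 = fit_simple_linear([1, 2, 3, 4], [3, 5, 7, 9])
```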
The Sum of Squared Errors and the R squared values have been found for each of
the regressions to determine the goodness of fit of the regression equations found. An
ANOVA (Analysis of Variance) table is given for each individual regression. The first of
the two basic components of the ANOVA tables used here is the SSR, the regression sum
of squares, which is the sum of the squared deviations between the output values of the
regression equation and the mean of the observed values or
$SSR = \sum \left(\hat{Y}_i - \dfrac{\sum Y_i}{n}\right)^2$
The second component is the SSE or Sum of Squared Errors which is the sum of the
squared deviations of the observed values from the fitted regression line or
$SSE = \sum \left(Y_i - \hat{Y}_i\right)^2$
The deviation of the observed values from their mean is given by the SSTO or the total
sum of squares.
$SSTO = \sum \left(Y_i - \dfrac{\sum Y_i}{n}\right)^2$
It is easily proven from the preceding equations that
(3.0.3)  $SSTO = SSE + SSR$
The coefficient of determination, or R squared, value is determined from the SSR or SSE
and the SSTO
(3.0.4)  $r^2 = SSR/SSTO = 1 - SSE/SSTO$
Other information given in the ANOVA table is the degrees of freedom associated with
each sum. The degrees of freedom associated with the SSR is the number of independent
variables under consideration, p, which, for simple linear regression is always 1. The
degrees of freedom associated with the SSE is the number of valid cases, n, minus the
number of independent variables, p, minus one.
The MSR, or regression mean square, and the MSE, error mean square, are found by
dividing their associated sum of squares by its degrees of freedom.
MSR = SSR/p
MSE = SSE/(n-p-1)
The values given by the ANOVA table provide a measure of the goodness of fit of the
model to the actual data. The MSR and MSE are used to find the test statistic F*.
(3.0.5)
F* = MSR/MSE
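The ANOVA quantities defined above can be assembled in a few lines. The following is a sketch under stated assumptions: the function name `anova_simple_regression`, the dictionary keys and the sample data are hypothetical and chosen only for illustration.

```python
def anova_simple_regression(xs, ys):
    """Return the ANOVA quantities for a simple linear regression of ys on xs."""
    n = len(xs)
    p = 1  # one independent variable in simple linear regression
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Slope and intercept from the normal equations (3.0.1) and (3.0.2)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n

    y_bar = sum_y / n
    fitted = [b0 + b1 * x for x in xs]

    ssr = sum((f - y_bar) ** 2 for f in fitted)          # regression sum of squares
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # sum of squared errors
    ssto = sum((y - y_bar) ** 2 for y in ys)             # total sum of squares

    msr = ssr / p            # regression mean square
    mse = sse / (n - p - 1)  # error mean square
    return {
        "SSR": ssr, "SSE": sse, "SSTO": ssto,
        "MSR": msr, "MSE": mse,
        "r2": ssr / ssto,      # equation (3.0.4)
        "F_star": msr / mse,   # equation (3.0.5)
    }


# Illustrative data (not from this study)
table = anova_simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

For these sample values the sums satisfy equation (3.0.3), since the fit is an untransformed least squares line.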
The “F test” for simple linear regression models tests whether β1 = 0 or
β1 ≠ 0, which, given the general linear model
$E\{Y\} = \beta_0 + \beta_1 X$
determines whether there is a linear relationship between X and Y. The hypotheses are as
follows
(3.0.6)  $H_0: \beta_1 = 0$
         $H_a: \beta_1 \neq 0$
The F distribution is that of the ratio of two independent Chi squared random variables,
each divided by its degrees of freedom; under Ho, SSR and SSE (each scaled by the error
variance) play these roles. The
F distribution is written F(d1, d2), where d1 is the numerator degrees of freedom and d2 is that of
the denominator. Where F(1-α,1,n-2) is the 100*(1-α) percentile of the appropriate F
distribution, the decision rule is
(3.0.7)  If F* ≤ F(1-α,1,n-2), conclude Ho
         If F* > F(1-α,1,n-2), conclude Ha
In essence, the F ratio statistic, MSR/MSE, is used to judge how confident we are that
the slope found in the regression equation did not occur by chance.
3.1 DISTANCE FROM COAST
Storm surge is one of the most devastating and damage-causing effects of a hurricane.
Coastal areas are at high risk for the damaging effects of the hurricane. The probability of
declaration given that a county is 20, 40, 60…200 miles from the coast for each storm is
given in table A.1. Logically, as one moves further and further inland the damaging
effects of this type diminish. Consequently, the most applicable type of function for this
data is an exponential function.
(3.1.1)  $Y = \beta_0 e^{\beta_1 X}$
which is equivalent to
(3.1.1a)  $\ln(Y) = \ln(\beta_0) + \beta_1 X$
Transforming the variables gives a linear function in terms of X and a new response
variable, Y´
(3.1.2)  $Y' = \beta_0' + \beta_1 X$
where
$Y' = \ln(Y)$ and $\beta_0' = \ln(\beta_0)$
Using equations (3.0.1) and (3.0.2) above with Yi´ and βo´ yields
$\beta_1 = -.002366$ and $\beta_0' = -.57711$
Thus, the transformed linear regression equation is
$Y' = -.57711 - .002366X$
and the regression function is given by
(3.1.3)  $Y = .56152e^{-.002366X}$
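The transformation of variables can be sketched in code: regress ln(Y) on X with the normal equations, then exponentiate the intercept to recover βo. The function name `fit_exponential` and the synthetic data are assumptions made for illustration; only the final line uses a value reported in the text.

```python
import math


def fit_exponential(xs, ys):
    """Fit Y = b0 * exp(b1 * X) by least squares on ln(Y) versus X."""
    log_ys = [math.log(y) for y in ys]  # Y' = ln(Y)
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(log_ys)
    sum_xy = sum(x * y for x, y in zip(xs, log_ys))
    sum_x2 = sum(x * x for x in xs)

    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0_prime = sum_y / n - b1 * sum_x / n  # intercept of the transformed line
    return math.exp(b0_prime), b1          # b0 = exp(b0')


# Synthetic data generated from Y = 0.5 * exp(-0.002 X), for illustration only
xs = [20, 40, 60, 80, 100]
ys = [0.5 * math.exp(-0.002 * x) for x in xs]
b0, b1 = fit_exponential(xs, ys)

# Back-transforming the reported intercept -.57711 gives about .56152
reported_b0 = math.exp(-0.57711)
```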
Figure 3.1.1 displays the observed values for Y and the regression function 3.1.3.
[Figure 3.1.1: Distance from coast observed values and exponential model. Vertical axis: probability of declaration; horizontal axis: distance from coast (miles).]
When transforming variables, the traditional R squared value may not provide the same
information as it does for a non-transformed model. This is due to the fact that the SSR
and SSE values do not necessarily sum to equal SSTO and equation (3.0.3) fails to be
true. Thus, a separate plot of the data must be analyzed to determine the goodness of fit
of the model. This separate plot (Figure 3.1.2) is a plot of the predicted values to the
observed values at each X level. If the model (3.1.3) predicts the observed data with
100% accuracy, the data points should lie on the line y = x. The regression line fitted to
this data is
(3.1.4)
Y = -.0146 + 1.934X
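The point about the sums of squares can be illustrated with a short sketch. The observed and predicted values below are hypothetical, as are the helper names `sums_of_squares` and `regress`; the sketch only shows that the sums need not add up when computed directly from a back-transformed model, but do add up once the observed values are regressed on the predicted values.

```python
def sums_of_squares(observed, fitted):
    """Return (SSR, SSE, SSTO) for observed values and a set of fitted values."""
    mean_obs = sum(observed) / len(observed)
    ssr = sum((f - mean_obs) ** 2 for f in fitted)
    sse = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    ssto = sum((y - mean_obs) ** 2 for y in observed)
    return ssr, sse, ssto


def regress(xs, ys):
    """Return (b0, b1) from the normal equations (3.0.1) and (3.0.2)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


# Hypothetical values, standing in for observed data and model output
observed = [0.52, 0.47, 0.44, 0.38, 0.36]
predicted = [0.50, 0.48, 0.43, 0.40, 0.35]

# Computed directly from the model's predictions, SSR + SSE need not equal SSTO
ssr_d, sse_d, ssto_d = sums_of_squares(observed, predicted)
gap_direct = abs(ssr_d + sse_d - ssto_d)

# Regressing observed on predicted restores SSTO = SSE + SSR
b0, b1 = regress(predicted, observed)
refit = [b0 + b1 * p for p in predicted]
ssr, sse, ssto = sums_of_squares(observed, refit)
r_squared = ssr / ssto
```

A slope near 1 and an intercept near 0 in this second regression indicate that the points lie close to the line y = x.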
[Figure 3.1.2: Predicted vs observed, distance from coast exponential model. Vertical axis: observed probability of declaration; horizontal axis: predicted probability of declaration (equation 3.1.3); the line y = x is shown for reference.]
The interpretation of the R squared value for this regression equation is still the same; it
gives the amount of variance in the data that is explained by the model. Table 3.1.1 gives
the analysis of variance information for this plot.
Table 3.1.1
Source of Variation    SS               df    MS
Regression             SSR = .03755     1     MSR = .03755
Error                  SSE = .00179     8     MSE = .00022
Total                  SSTO = .03934    9
From equation (3.0.5),
F* = .03755/.00022 = 170.6818 > F(.999,1,8) = 25.4
So, according to the decision rule (3.0.7) we reject the null hypothesis in (3.0.6) and
conclude, with a confidence level of .999, that the slope in this data found with the
regression equation did not occur by chance.
From Table 3.1.1, we obtain an R squared value of .95450. This tells us that the function
(3.1.3)  $Y = .56152e^{-.002366X}$
explains over 95% of the variance in the data and is a good fit to the observed probability
values.
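The arithmetic behind Table 3.1.1 can be rechecked with a short script. The sums of squares and the critical value 25.4 are taken from the text; the variable names are illustrative, and n = 10 is inferred from the ten distance levels (20, 40, ..., 200 miles) and the total degrees of freedom of 9.

```python
ssr = 0.03755  # regression sum of squares, Table 3.1.1
sse = 0.00179  # sum of squared errors, Table 3.1.1
n = 10         # inferred: ten distance levels, total df = 9
p = 1          # one independent variable

msr = ssr / p
mse = sse / (n - p - 1)        # unrounded error mean square
f_star = msr / mse             # equation (3.0.5)
r_squared = ssr / (ssr + sse)  # equation (3.0.4), using SSTO = SSR + SSE

f_critical = 25.4              # F(.999,1,8) as quoted in the text
reject_null = f_star > f_critical
```

With the unrounded MSE the ratio comes out near 168 rather than 170.68, which is the value obtained when MSE is first rounded to .00022. Either way F* is far above the critical value, so the conclusion is unchanged.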
3.2 TOTAL RAINFALL
Torrential rain is another very destructive force that hurricanes bring on to land. The
probability that a county is declared given the number of inches of total rainfall is given
in Table A.2. Logically, the more rain, the more likely the disaster declaration. The shape
of the data suggests a logarithmic model. When regressing with a
logarithmic model, the normal equations 3.0.1 and 3.0.2 are used with the
natural logarithm of x in place of the given x value. Figure 3.2.1 shows the observed
values for the probabilities of declaration along with the logarithmic model of the data,
(3.2.1)
Y = .13821ln(x) + .51331
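A logarithmic fit of this form can be sketched by applying the normal equations to ln(x). The function name `fit_logarithmic` and the synthetic data are illustrative assumptions; the rainfall data themselves are in Table A.2 and are not reproduced here.

```python
import math


def fit_logarithmic(xs, ys):
    """Fit Y = b1 * ln(X) + b0 by applying equations (3.0.1) and (3.0.2) to ln(X)."""
    log_xs = [math.log(x) for x in xs]  # requires every x > 0
    n = len(xs)
    sum_x, sum_y = sum(log_xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(log_xs, ys))
    sum_x2 = sum(x * x for x in log_xs)

    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


# Synthetic data generated from Y = 0.1 ln(X) + 0.5, for illustration only
xs = [1, 2, 4, 8, 16]
ys = [0.5 + 0.1 * math.log(x) for x in xs]
b0, b1 = fit_logarithmic(xs, ys)
```

Because the response variable is not transformed here, a regression of observed on predicted values is expected to return a slope of essentially 1 and an intercept of essentially 0.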
[Figure 3.2.1: Observed values and logarithmic model, total rainfall. Vertical axis: probability of declaration; horizontal axis: rainfall (inches).]
Figure 3.2.2 shows the predicted values of the probabilities given by equation 3.2.1
against the observed values of the probabilities.
[Figure 3.2.2: Predicted values vs observed values, total rainfall. Vertical axis: observed probability of declaration; horizontal axis: predicted probability of declaration; the line y = x is shown for reference.]
The linear model running through this data is
(3.2.2)
Y = 1.00003X-.000023501
Table 3.2.1 gives the analysis of variance information for this regression equation.
Table 3.2.1
Source of Variation    SS               df    MS
Regression             SSR = .52545     1     MSR = .52545
Error                  SSE = .08545     6     MSE = .01424
Total                  SSTO = .61090    7
From equation (3.0.5),
F* = .52545/.01424 = 36.89958 > F(.999,1,6) = 35.5
So, according to the decision rule (3.0.7), we reject the null hypothesis in (3.0.6) and
conclude, with a confidence level of .999, that the slope in this data found with the
regression equation did not occur by chance and there is a statistical association between
total rainfall and probability of declaration.
From Table 3.2.1 we obtain an R squared value of .86012. This means that the regression
equation,
(3.2.2)
Y = 1.00003X-.000023501
explains about 86% of the variation in the data and thus the equation
(3.2.1)
Y = .13821ln(x) + .51331
is a more than adequate fit to the observed values.
3.3 MAXIMUM SUSTAINED WIND SPEED
The “category” of a hurricane on the Saffir-Simpson scale (Figure B.1) is determined by
the maximum sustained wind speed of the storm. The higher the sustained winds, the
more destructive the storm. The probability that a county will be declared a disaster area
given that it experienced tropical storm or hurricane force winds is displayed in table A.3.
Figure 3.3.1 shows the observed probabilities and their logarithmic model,
(3.3.1)
Y = .1167ln(x) + .3944
[Figure 3.3.1: Max sustained wind speed observed values and logarithmic model. Vertical axis: probability of declaration; horizontal axis: wind speed.]
Once again it is necessary to look at a plot of the predicted values versus the observed
values to get a real picture of the analysis of variance for this plot. Figure 3.3.2 shows
these values and the line y = x.
[Figure 3.3.2: Predicted versus observed values, max wind logarithmic model. Vertical axis: observed probabilities of declaration; horizontal axis: predicted probabilities of declaration (equation 3.3.1); the line y = x is shown for reference.]
The linear equation going through these points is
(3.3.2)
Y = .9999X + .00006
The analysis of variance information for this model is given in table 3.3.1.
Table 3.3.1
Source of Variation    SS                      df    MS
Regression             SSR = 0.6330644633      1     MSR = 0.6330644633
Error                  SSE = 0.0066575602      3     MSE = .00221919
Total                  SSTO = 0.63971807932    4
From equation (3.0.5),
F* = 0.6330644633/.00221919 = 285.26868 > F(.999,1,3) = 167.0
So, according to the decision rule (3.0.7), we reject the null hypothesis in (3.0.6) and
conclude, with a confidence level of .999, that the slope in this data found with the
regression equation did not occur by chance and there is a linear association between
maximum sustained wind speed and probability of declaration.
From Table 3.3.1 we obtain an R squared value of .98959. This means that the regression
equation,
(3.3.2)
Y = .9999X + .00006
explains almost 99% of the variation in the data and
(3.3.1)
Y = .1167ln(x) + .3944
is a superb fit to the observed values.
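The arithmetic behind Table 3.3.1 can be checked with a short Python sketch. This is an illustration only, not the procedure used in the study: the function name simple_anova is hypothetical, and SSTO is taken here to be SSR + SSE.

```python
def simple_anova(ssr, sse, n):
    """ANOVA summary for a simple linear regression with n observations."""
    df_regression = 1      # one predictor variable
    df_error = n - 2       # slope and intercept are both estimated
    msr = ssr / df_regression
    mse = sse / df_error
    return {
        "MSR": msr,
        "MSE": mse,
        "F": msr / mse,            # test statistic F* = MSR/MSE, equation (3.0.5)
        "R2": ssr / (ssr + sse),   # share of the variation explained
    }


# Values from Table 3.3.1 (the error df of 3 implies n = 5 wind speed levels).
result = simple_anova(ssr=0.6330644633, sse=0.0066575602, n=5)
print(round(result["F"], 2), round(result["R2"], 4))
```

The same function applies to the other simple regressions in this chapter once their SSR, SSE and n are substituted.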
3.4 DISTANCE FROM EYE
The eye of a hurricane is the area directly in the middle of the storm where there
is no rain or wind. The better formed the eye, the stronger and longer lasting the storm.
The most damaging effects that a hurricane produces occur right around the eye. The
further you get from the eye the less damage you receive. The observed probabilities that
a county will be declared a disaster area given its distance from the eye are given in table
A.4. As with the distance from the coast, the best model for this data is an exponential
model. Using the transformation of variables technique described in section 3.1, the
regression equation is found to be
(3.4.1)
Y = .65671e^{-.00234X}
Figure 3.4.1 shows the observed values for the probability of declaration and the
exponential model, 3.4.1.
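The transformation of variables can be sketched as follows: take the natural logarithm of the observed probabilities, fit a straight line to ln(Y) against X by least squares, and transform the intercept back. The sketch assumes that the fit was made to the Mean column of table A.4; the function name fit_exponential is hypothetical.

```python
import math


def fit_exponential(xs, ys):
    """Fit Y = a * exp(b * X) by regressing ln(Y) on X with least squares."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    x_bar = sum(xs) / n
    log_bar = sum(logs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (v - log_bar) for x, v in zip(xs, logs))
    b = sxy / sxx                        # slope of the line fitted to ln(Y)
    a = math.exp(log_bar - b * x_bar)    # back-transformed intercept
    return a, b


# Distance from eye (miles) and the Mean column of table A.4.
distance = [20, 40, 60, 80, 100, 120, 140, 160, 180, 200]
mean_probability = [0.61691, 0.60677, 0.58375, 0.53397, 0.52404,
                    0.48819, 0.47301, 0.45124, 0.43130, 0.41261]

a, b = fit_exponential(distance, mean_probability)
print(f"Y = {a:.5f} * exp({b:.5f} * X)")
```

Under that assumption the fitted constants agree with equation 3.4.1 to the precision reported there.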
[Figure 3.4.1: Distance from eye observed values and exponential model. Horizontal axis: distance (miles). Vertical axis: probability of declaration.]
Figure 3.4.2 shows the predicted values given by equation 3.4.1 plotted against the
observed values at each x level. As before, the line y = x is also displayed in figure 3.4.2.
The linear regression equation for the data points shown in figure 3.4.2 is
(3.4.2)
Y = 1.00177X - .001003
Table 3.4.1 gives the analysis of variance information for this plot.
[Figure 3.4.2: Predicted vs observed values: distance from eye exponential model. Horizontal axis: predicted probabilities of declaration (equation 3.4.1). Vertical axis: observed probabilities. The line y = x is shown.]
Table 3.4.1
Source of Variation   SS              df   MS
Regression            SSR = .04740    1    MSR = .04740
Error                 SSE = .000536   8    MSE = .000067
Total                 SSTO = .04793   9
From equation (3.0.5),
F* = .04740/.000067 = 707.46269 > F(.999,1,8) = 25.4
So, according to the decision rule (3.0.7) we reject the null hypothesis in (3.0.6) and
conclude, with a confidence level of .999, that the slope in this data found with the
regression equation did not occur by chance and there is a linear association between
distance from the eye and probability of declaration.
From Table 3.4.1, we obtain an R squared value of .98882. This tells us that the function
(3.4.1)
Y = .65671e^{-.00234X}
explains over 98% of the variance in the data and is an excellent fit to the observed
probability values. The superb fit of this regression function is also easily verified by an
examination of figure 3.4.2. As was previously mentioned, if the regression equation is
100% accurate then all of the data points fall on the line y = x. The data points shown in
figure 3.4.2 all lie on or very close to this line.
3.5 DISTANCE FROM SITE OF LANDFALL
As a storm moves across the ocean it picks up energy from the warm water around it.
When the storm makes landfall it experiences a decrease in potency, the magnitude of
which depends on the shape and temperature of the land as well as several other factors.
The decrease in the potency of the storm decreases the amount of damage incurred by
counties farther away from the site of landfall. The observed probabilities that a county
will be declared a disaster area given certain distances from the site of landfall are given
in table A.5. The relationship between the distance from the site of landfall and the
probability of declaration is best described by the exponential model
(3.5.1)
Y = 1.05888e^{-.00357X}
which was arrived at using the technique of transformation of variables described in
detail in section 3.1.
Figure 3.5.1 shows the observed probabilities of declaration and exponential model 3.5.1.
[Figure 3.5.1: Distance from site of landfall: observed values and exponential model. Horizontal axis: distance (miles). Vertical axis: probability of declaration.]
Figure 3.5.2 shows the predicted values of the probabilities at each x-level given by
equation 3.5.1 plotted against the observed values at each x-level.
[Figure 3.5.2: Predicted values vs observed values: distance from site of landfall exponential model. Horizontal axis: predicted probability of declaration (equation 3.5.1). Vertical axis: observed probability of declaration. The line y = x is shown.]
The linear regression equation obtained from these data points is
(3.5.2)
Y = .97865X + .01578
Table 3.5.1 gives the analysis of variance information for this model.
Table 3.5.1
Source of Variation   SS              df   MS
Regression            SSR = .21282    1    MSR = .21282
Error                 SSE = .00519    8    MSE = .0006488
Total                 SSTO = .21801   9
From equation (3.0.5),
F* = .21282/.0006488 = 328.046 > F(.999,1,8) = 25.4
So, according to the decision rule (3.0.7) we reject the null hypothesis in (3.0.6) and
conclude, with a confidence level of .999, that the slope in this data found with the
regression equation did not occur by chance and there is a linear association between
distance from the site of landfall and probability of declaration.
From Table 3.5.1, we obtain an R squared value of .97619. This tells us that the function
(3.5.1)
Y = 1.05888e^{-.00357X}
explains over 97% of the variance in the data and is an excellent fit to the observed
probability values.
3.6 OVERALL SPEED
The overall speed of a hurricane can have a huge effect on the areas it hits. If a hurricane
is traveling very quickly, the damaging winds and rains are fleeting and less damage is
incurred. However, if a storm is moving very slowly, one area might feel the effects for a
much longer period of time and therefore have more damage. The probabilities of
declaration given different overall speeds are given in table A.6. The basic linear
regression model used for this data is as follows
(3.6.1)
Y = -.00754X + .86153
Figure 3.6.1 shows the observed probabilities along with this regression equation.
[Figure 3.6.1: Overall speed observed values and linear model. Horizontal axis: speed (mph). Vertical axis: probability of declaration.]
The analysis of variance information for this model is given in table 3.6.1.
Table 3.6.1
Source of Variation   SS              df   MS
Regression            SSR = .01421    1    MSR = .01421
Error                 SSE = .05161    3    MSE = .01720
Total                 SSTO = .06582   4
From equation (3.0.5),
F* = .01421/.01720 = .82600 < F(.90,1,3) = 5.54
So, according to the decision rule (3.0.7), we do not reject the null hypothesis in (3.0.6).
We cannot rule out that the slope found with the regression equation occurred by chance,
and we conclude that there is no apparent linear association between overall speed and
probability of declaration.
From Table 3.6.1, we obtain an R squared value of .21589. This tells us that the function
(3.6.1)
Y = -.00754X + .86153
explains only 21.589% of the variance in the data. This R squared value shows, along
with the F test, that equation 3.6.1 is a terrible fit to the observed probability values and is
essentially meaningless.
3.7 DISCUSSION
The F test score and R squared value for a simple linear regression are important values
which aid tremendously in the assessment of a regression model, but they can be
misleading. Residual analysis is a technique of regression analysis which examines the
observed error, or the difference between the observed value Yi and the fitted value given
by the regression function, Ŷi,
(3.7.1)
ei = Yi - Ŷi
Least squares regression analysis is conducted by minimizing the sum of the squares of
these error terms. Inherent in the technique of least squares regression are assumptions
about the unknown true error terms εi of the regression equation 3.0.0. It is assumed that
the εi are independent normal random variables with mean 0 and constant variance σ^2.
Residual analysis is based on the idea that the appropriateness of a model can be
measured by the reflection of the assumed properties for the unknown true error terms εi
in the observed error terms or residuals. A function is appropriate for the model if the
residuals are randomly dispersed around the x axis, with no obvious trend or outlier. If
any of these fails to be true then a variety of changes to the model are warranted, such as
the change to a different type of function or the addition of more or better predictor
variables. This analysis can be conducted using any one of the following: a residual plot
against the predictor variable, a sequence plot of the residuals, a box plot of the residuals
or a normal probability plot of the residuals.
In section 3.1, the probability of declaration given distance from the coast is regressed
using the technique of transformation of variables. Logically, as the distance from the
coast increases, the probability of declaration decreases. This suggests a linear
relationship between the variables. The linear regression function found for this data set
is
(3.7.1)
Y = -.00105X + .55302
Table 3.7.1 gives the analysis of variance information for this data set.
Table 3.7.1
Source of Variation   SS               df   MS
Regression            SSR = .03666     1    MSR = .03666
Error                 SSE = .0027      8    MSE = .0003375
Total                 SSTO = .039337   9
The F test score is 108.6222 > F(.999,1,8) = 25.4.
We would reject the null hypothesis and determine that there is a linear association
between distance from coast and probability of declaration. Also, the R squared value
found here is .93137, which would lead us to determine that the linear association exists
and this function is quite appropriate to model the behavior of our variables. But, when
we take a look at the plot of the residuals against the predictor variable (Figure 3.7.1),
it is obvious that the model is not as good as we originally thought. There is systematic
variation in the residuals between negative and positive values. This type of trend in the
residual plot demonstrates a need for a nonlinear regression function.
[Figure 3.7.1: Residuals: distance from coast. Horizontal axis: distance from coast (miles). Vertical axis: residual.]
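The residual pattern described here can be reproduced with a small Python sketch. It assumes that the linear model was fitted to the Mean column of table A.1; the helper names fit_line and residuals are hypothetical.

```python
def fit_line(xs, ys):
    """Least squares slope and intercept for Y = slope * X + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def residuals(xs, ys, slope, intercept):
    """Observed minus fitted values, e_i = Y_i - Yhat_i."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Distance from coast (miles) and the Mean column of table A.1.
distance = [20, 40, 60, 80, 100, 120, 140, 160, 180, 200]
mean_probability = [0.57081, 0.50782, 0.47120, 0.45569, 0.43284,
                    0.41704, 0.40141, 0.38402, 0.37163, 0.35867]

slope, intercept = fit_line(distance, mean_probability)
res = residuals(distance, mean_probability, slope, intercept)

# A run of same-signed residuals, rather than a random scatter of signs,
# is the kind of trend that points to a nonlinear relationship.
signs = "".join("+" if e > 0 else "-" for e in res)
print(round(slope, 5), round(intercept, 5), signs)
```

The residuals start positive, turn negative through the middle distances and end positive again, which is the systematic variation visible in Figure 3.7.1.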
This analysis leads us to the use of an exponential function and transformation of
variables. The exponential regression equation
(3.1.3)
Y = .56152e^{-.002366X}
has an associated R squared value of over .95 and an F test score of over 170.
Residual analysis demonstrates that the high R squared and F test values for the linear
model of this data were, in fact, misleading. By analyzing the residuals, we have
discarded the incorrect type of function and bettered the fit of the observed values to a
regression equation.
Residual analysis is an important tool in regression analysis and has been used in this
study, along with the F test score and the R squared value, to determine the goodness of
fit of each model.
With the exception of 3.6.1, all of the simple regression functions found in this study have
been quite well fitted to the data. The residuals show little cause for alarm and the R
squared and F test values are high. Part of the reason that 3.6.1 turned out so poorly is
that the number of x levels for the predictor variable was so low. This is apparent in the
F ratio. The number of degrees of freedom of the SSE was n – 2 = 3. The SSE was
more than three and a half times the SSR, and with so few degrees of freedom the
MSE (.01720) was greater than the MSR (.01421). The F test value, therefore, was less than
one, which tells us that we are only slightly more than 50% confident that there is a linear
association between the variables. The model could be bettered by the use of more information in
the observational quantities. The low R squared value and F test score lead us to
determine that the observed quantities for overall speed provide no important information
to this regression analysis and thus this variable will be left out of the multiple
regressions in Chapter 4.
CHAPTER 4
MULTIPLE LINEAR REGRESSION ANALYSIS OF DATA
4.1 MULTIPLE LINEAR REGRESSION TECHNIQUES
In chapter three, regression equations were found for each of the individual variables.
Most of the regression equations found for each of the variables in sections 3.1-3.6 were
good fits to the data. The general linear regression equation, 3.0.0, can be extended to
include any number of predictor variables. The regression model used here will be of the
form
(4.1.1)
Y_i = β_0 + β_1 X_{i1} + β_2 X_{i2} + β_3 X_{i3} + … + β_{p-1} X_{i,p-1} + ε_i
The addition of more predictor variables to the model is important for many reasons. In
this study in particular, the probability that a county will be declared a disaster area is
affected by all of the factors regressed in sections 3.1-3.5. A hurricane brings with it rain,
wind and storm surge and a model determining the probability of declaration must
include all of these variables to be complete.
4.1.1 USE OF MATRICES IN REGRESSION ANALYSIS
The normal regression model for multiple linear regression,
(4.1.1)
Y_i = β_0 + β_1 X_{i1} + β_2 X_{i2} + β_3 X_{i3} + … + β_{p-1} X_{i,p-1} + ε_i
in matrix terms is
(4.1.2)
Y = Xβ + ε
where Y is the n x 1 vector of observed values
(4.1.3)
Y = {{Y_1}, {Y_2}, …, {Y_n}}
X is the n x p matrix of constants
(4.1.4)
X = {{1, X_{11}, X_{12}, …, X_{1,p-1}}, {1, X_{21}, X_{22}, …, X_{2,p-1}}, …, {1, X_{n1}, X_{n2}, …, X_{n,p-1}}}
and β is the p x 1 vector of parameters
(4.1.5)
β = {{β_0}, {β_1}, …, {β_{p-1}}}
The least squares estimated regression coefficients are those values of β that minimize the
sum of the squared error in the model. The vector of estimated regression coefficients is
(4.1.6)
b = {{b_0}, {b_1}, …, {b_{p-1}}}
which are found using the least squares normal equations for model 4.1.2
(4.1.7)
X′Xb = X′Y
Accordingly,
(4.1.8)
b = (X′X)^{-1} X′Y
The ANOVA values SSE, SSR and SSTO in matrix terms are as follows
(4.1.9)
SSE = Y′Y - b′X′Y
(4.1.10)
SSR = b′X′Y - (1/n)Y′JY
(4.1.11)
SSTO = Y′Y - (1/n)Y′JY
where J is an n x n matrix of 1’s.
Matrices are so helpful because they provide a simple view of the complex mathematics
involved in multiple regression analysis.
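As an illustration of equations (4.1.7) through (4.1.9), the sketch below solves the normal equations X′Xb = X′Y with plain Python lists. It is not the Mathematica code used in the study; the helper names and the small data set are hypothetical, chosen so that the exact answer is known in advance.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def least_squares(X, y):
    """Return b from X'Xb = X'Y and SSE = Y'Y - b'X'Y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(xi * yi for xi, yi in zip(row, y)) for row in Xt]
    b = solve(XtX, Xty)
    sse = sum(v * v for v in y) - sum(bk * t for bk, t in zip(b, Xty))
    return b, sse


# Hypothetical data generated exactly from Y = 1 + 2*X1 + 3*X2.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 3]]
y = [1, 3, 4, 6, 14]
b, sse = least_squares(X, y)
print([round(v, 6) for v in b], round(sse, 6))
```

Because the hypothetical responses lie exactly on a plane, the recovered coefficients are the generating ones and SSE is zero up to rounding.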
4.1.2 DATA COLLECTION AND MISSING DATA
In this study, as was mentioned previously, the observed values for the probabilities of
declaration were found by physically drawing areas on a map and counting the number of
declared counties in the area for each variable. This method of collection of data has
posed an interesting problem in the multiple regression analysis. Normally, for each
output value, there are values for each of the input variables that correspond with it. This
is not the case in this research. The probabilities of declaration are specifically found for
each predictor variable and do not correspond with those found for the other predictor
variables. Therefore, the X matrix is incomplete. The missing values must be filled in
for the regression to be performed. Two different methods of filling in the missing data
have been used. The first method is to fill the rest of the matrix with zeros. This is
reasonable in that the output values really only were dictated by the input variable to
which they correspond. Regression equations found by filling the rest of the matrix with
zeros are a representation of only the real observed values. Another method of filling in
the missing data is to use the mean of each x variable. This is reasonable because the
mean is the expected value of the variable. The amount of rain that occurred in the
declared counties with wind speeds of 74 miles per hour was not tabulated but there is no
doubt that there was some rain. How much? We don’t know. But the mean value of
rainfall amounts is a worthy estimate.
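One way to picture the two fill methods is the sketch below. It is an interpretation of the description above rather than the code used in the study: the observations for each predictor are stacked into one design matrix, and every cell that was not observed is filled either with zero or with the mean of that predictor's own observed levels. The function name build_design and the x levels are hypothetical.

```python
def build_design(blocks, fill="mean"):
    """Stack per-predictor observations into one design matrix.

    blocks[j] holds the x levels observed for predictor j. Each row carries
    a leading 1 for the intercept, the observed level in its own column and
    the fill value in every other predictor column.
    """
    if fill == "mean":
        fills = [sum(levels) / len(levels) for levels in blocks]
    elif fill == "zero":
        fills = [0.0] * len(blocks)
    else:
        raise ValueError("fill must be 'mean' or 'zero'")
    rows = []
    for j, levels in enumerate(blocks):
        for x in levels:
            row = [1.0] + list(fills)
            row[j + 1] = float(x)
            rows.append(row)
    return rows


# Hypothetical levels: three distances (miles) and two rainfall amounts (inches).
blocks = [[20, 40, 60], [3, 6]]
zeros_design = build_design(blocks, fill="zero")
means_design = build_design(blocks, fill="mean")
for row in means_design:
    print(row)
```

Each row then pairs with the probability of declaration observed for its own predictor level, so the number of rows is the total number of x levels across all predictors.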
4.2 MULTIPLE REGRESSION MODELS
The following multiple regression models were found using matrix techniques and
Mathematica. For the following models: x1=distance from coast, x2=total rainfall,
x3=maximum sustained wind speed, x4=distance from eye, and x5=distance from site of
landfall. Starting with x1, each new variable is added to bring about the following models.
The first equation of each set was found using zeros to fill in the missing data. The
second of each set was found using the means of the x values to fill in the missing data.
(4.2.1a)
0.43025 - 0.000176767x1 + 0.0346808x2
(4.2.1b)
0.266553 - 0.00105368x1 + 0.0440763x2
(4.2.2a)
0.444471 - 0.000278347x1 + 0.0244849x2 + 0.00316138x3
(4.2.2b)
-0.0905743 - 0.00105368x1 + 0.031088x2 + 0.00653304x3
(4.2.3a)
0.489208 - 0.000597892x1 + 0.021941x2 + 0.0028497x3 - 0.0000922999x4
(4.2.3b)
0.0242913 - 0.00105368x1 + 0.0310848x2 + 0.00653461x3 - 0.00119645x4
(4.2.4a) 0.58467 - 0.0012798x1 + 0.016473x2 + 0.0021984x3 - 0.0007742x4 + .000494x5
(4.2.4b) 0.35064 - 0.00105368x1 + 0.03117x2 + 0.00649x3 - 0.00119645x4 - 0.0025493x5
Table 4.2.1 gives the ANOVA information for each equation.
Table 4.2.1
Equation   SSR        SSE        SSTO       R squared   MSE         MSR         F
4.2.1a     0.756819   0.112213   0.869032   0.8708759   0.0080152   0.3784095   47.211402
4.2.1b     0.526202   0.34283    0.869032   0.6055036   0.0244879   0.263101    10.744141
4.2.2a     1.001795   0.543275   1.54507    0.6483816   0.0301819   0.3339317   11.063955
4.2.2b     0.977658   0.567412   1.54507    0.6327597   0.0315229   0.325886    10.338075
4.2.3a     1.002759   0.609991   1.61275    0.6217696   0.0225923   0.2506898   11.096267
4.2.3b     1.025121   0.587629   1.61275    0.6356354   0.021764    0.2562803   11.7754
4.2.4a     0.99398    1.08718    2.08116    0.4776086   0.0301994   0.198796    6.5827701
4.2.4b     1.23974    0.84142    2.08116    0.5956966   0.0233728   0.247948    10.60841
Just as in simple regression analysis, in multiple regression there is an F test for a
regression relation. As before, the MSR and MSE are used to find the test statistic F*.
(3.0.5)
F* = MSR/MSE
The “F test” for multiple linear regression models is a test to determine whether β1= β2=
…= βp-1 = 0 or not all βk equal zero. The hypotheses are as follows
(4.2.5)
Ho: β1= β2= …= βp-1 = 0
Ha: not all βk (k=1,…,p-1) equal zero
Where F(1-α, p-1, n-p) is the 100*(1- α) percentile of the appropriate F distribution, the
decision rule is
(4.2.6)
If F* ≤ F(1-α,p-1,n-p), conclude Ho
If F* > F(1-α,p-1,n-p), conclude Ha
Table 4.2.1 gives the F test scores for each of the regressions.
The F test scores here give us valuable information when it comes to choosing the correct
model.
For equation 4.2.1a
F* = 47.211402 > F(.975,2,14) = 39.4
For 4.2.1b
F* = 10.744141 > F(.90,2,14) = 9.4175
For 4.2.2a
F* = 11.06395 > F(.95,3,18) = 8.68
For 4.2.2b
F* = 10.338075 > F(.95,3,18) = 8.68
For 4.2.3a
F* = 11.096267 > F(.975,4,27) = 8.48
For 4.2.3b
F* = 11.7754 > F(.975,4,27) = 8.48
For 4.2.4a
F* = 6.5827701 > F(.975,5,36) = 6.20
For 4.2.4b
F* = 10.60841 > F(.99,5,36) = 9.344
CHAPTER 5
RESULTS
5.1 CHOOSING THE CORRECT MODEL
For the models constructed using the means to fill in the data, the confidence level of the
F test increases as the number of variables increases. The more variables we add, the
more confident we are that we can reject the null hypothesis in (4.2.5) and conclude that
not all βk = 0 and that there is a linear association between the predictor and response
variables that did not happen by chance. A simple comparison of the R squared values
indicates that, for the models including more variables, the “means” models explain more
of the variation from the data. Another indication that the “means” equations are a better
choice is that, for the “zeros” equations, the R squared values decrease as the number of
variables increases. The goodness of fit of each model to the actual data decreases. The
final reason for our choice of the “means” models is that each of the coefficients always
has the logically correct sign no matter which variables are in the model. The
probability of declaration would logically increase as wind speed and total rainfall
increase. Also, the probability would decrease as the distances from the eye, site of
landfall and coast increase. Therefore, x2=total rainfall, and x3=maximum sustained wind
speed should have positive coefficients while x4=distance from eye, x5=distance from site
of landfall and x1=distance from coast should have negative coefficients. The best model
including all necessary variables is, therefore,
(4.2.4b) 0.35064 - 0.00105368x1 + 0.03117x2 + 0.00649x3 - 0.00119645x4 - 0.0025493x5
As previously mentioned, we can, with a confidence level of .99, reject the null
hypothesis in (4.2.5) and conclude that not all coefficients are zero and there is thus a
non-random linear statistical association between these five predictor variables and the
probability of disaster declaration.
Also, this model has a coefficient of multiple determination,
R^2 = 0.5956966
which gives a coefficient of multiple correlation of
R = .77181.
5.2 TESTING THE MODEL
Now that we have chosen the model we wish to test whether any of the variables should
be removed from the model. The test is set up similarly to the F test with hypotheses,
(5.2.1)
Ho: βk=0
Ha: βk≠ 0
The test statistic here uses the t distribution
(5.2.2)
t* = bk/s{bk}
The decision rule with this test statistic at a level of significance α, is
(5.2.3)
If |t*| ≤ t(1-α/2,n-p), conclude Ho
Otherwise, conclude Ha.
We will use the estimated variance-covariance matrix given by
(5.2.4)
s^2{b} = MSE (X′X)^{-1}
s^2{b} = {{s^2{b_0}, s^2{b_0,b_1}, …, s^2{b_0,b_{p-1}}},
{s^2{b_1,b_0}, s^2{b_1}, …, s^2{b_1,b_{p-1}}}, …,
{s^2{b_{p-1},b_0}, s^2{b_{p-1},b_1}, …, s^2{b_{p-1}}}}
which, for the chosen model, is the matrix shown in Figure 5.2.1.
Figure 5.2.1
      b0            b1              b2              b3               b4              b5
b0    0.040968044   0.000077909     0.000585928     0.00012515       0.000077909     0.000077909
b1    0.000077909   7.08266*10^-7   0               0                0               0
b2    0.000585928   0               0.00010136      4.33169*10^-6    0               0
b3    0.00012515    0               4.33169*10^-6   2.178814*10^-6   0               0
b4    0.000077909   0               0               0                7.08266*10^-7   0
b5    0.000077909   0               0               0                0               7.08266*10^-7
s^2{b1} = 7.08266*10^-7
s{b1} = .000841585
|t*| = |-.0010537/.000841585| = |-1.25204| = 1.25204 > t(.85,36) =1.052
We conclude, with level of significance α = .3 and a confidence level of .85 that we reject
the null hypothesis and the b1 term should stay in the model.
s2{b2}=.00010136
s{b2} = .01006777
|t*| = |.03117/.01006777| = |3.09602| = 3.09602 > t(.9975,36) = 3.0236
We conclude, with a significance level of .005, and a confidence level of .9975 that we
reject the null hypothesis and the b2 term should stay in the model.
s^2{b3} = 2.178814*10^-6
s{b3} = .00147608
|t*| = |.0064901/.00147608| = |4.39685| = 4.39685 > t(.9995,36) = 3.589
Therefore, we conclude, with a significance level of .001, and a confidence level of .9995
that we reject the null hypothesis and the b3 term should stay in the model.
s^2{b4} = 7.08266*10^-7
s{b4} = .000841585
|t*| = |-.0011965/.000841585| = |-1.42172| = 1.42172 > t(.90,36) = 1.30958
So, we conclude, with a significance level of .2, and a confidence level of .9 that we
reject the null hypothesis and the b4 term should stay in the model.
s^2{b5} = 7.08266*10^-7
s{b5} = .000841585
|t*| = |-.0025493/.000841585| = |-3.029165| = 3.029165 > t(.9975,36) = 3.0236
Thus, we conclude, with a significance level of .005, and a confidence level of .9975 that
we reject the null hypothesis and the b5 term should stay in the model.
The t test scores found above show that each of the variables is significant at least to a
level of .85. This is an acceptable significance level for us to determine that none of the
variables needs to be dropped from the model.
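The t test arithmetic above can be checked with a few lines of Python. The sketch takes the estimated variance of a coefficient from the diagonal of s^2{b} and the critical value as quoted in the text; the function names t_statistic and keep_term are hypothetical.

```python
import math


def t_statistic(b_k, variance_b_k):
    """Test statistic t* = b_k / s{b_k}, equation (5.2.2)."""
    return b_k / math.sqrt(variance_b_k)


def keep_term(t_star, t_critical):
    """Decision rule (5.2.3): conclude Ha, and keep the term, when |t*| > t."""
    return abs(t_star) > t_critical


# Coefficient b3 (maximum sustained wind speed) and its estimated variance.
t_b3 = t_statistic(0.0064901, 2.178814e-6)
print(round(t_b3, 3), keep_term(t_b3, t_critical=3.589))
```

The critical value 3.589 is the t(.9995,36) figure quoted above, not one computed by the sketch.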
5.2.1 MULTICOLLINEARITY
The appropriateness of the model can be determined by examining the bk’s after
the addition of each variable. The addition of more variables to the model should not
drastically change the values of the coefficients. If it does, then there is concern that
there may be either interaction effects or multicollinearity between some of the variables.
When two or more predictor variables are correlated, the values of the coefficients
depend on which variables are already in the model and which ones are not. The reason
for this is that each coefficient is supposed to measure the effect on the response of a one-unit
increase in the variable to which it corresponds when the rest of the
variables are held constant. If there is correlation among the variables then the
coefficient on any one variable can reflect a partial effect of more than one variable
depending on which variables are already in the model. Table 5.2.2 shows the values for
the coefficients for different configurations of variables in the “means” models.
Table 5.2.2
Variables in model   b1           b2          b3          b4           b5
x1,x2                -0.0010537   0.0440763   n/a         n/a          n/a
x1,x2,x3             -0.0010537   0.031088    0.006533    n/a          n/a
x1,x2,x3,x4          -0.0010537   0.0310848   0.0065346   -0.0011965   n/a
x1,x2,x3,x4,x5       -0.0010537   0.03117     0.00649     -0.0011965   -0.0025493
x2,x3,x4,x5          n/a          0.0311733   0.0064901   -0.0011965   -0.0025493
x1,x3,x4,x5          -0.0010537   n/a         0.0078223   -0.0011965   -0.0025493
x1,x2,x4,x5          -0.0010537   0.0440761   n/a         -0.0011965   -0.0025493
x1,x2,x3,x5          -0.0010537   0.0311733   0.0064901   n/a          -0.0025493
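As you can see, there is very little evidence of multicollinearity shown by this test. The
values of b1, b4 and b5 are the same in every model in which they appear. The values of b2
and b3 are very close from model to model except when the other is left out: b2 rises from
about .031 to about .044 when x3 is absent, and b3 rises from about .0065 to about .0078
when x2 is absent.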
5.3 FINALIZING
The essential idea behind this multiple regression is to determine the probability of
declaration of a county given values for the predictor variables. This means that the
output value of the model is a probability and as such must be a value between zero and
one. We must, hence, place restrictions on the model by making it a piecewise function
in the following fashion. The final model, then is
Probability of declaration =
0 when Y < 0
Y when 0 ≤ Y ≤ 1
1 when Y > 1
where Y = 0.35064 - 0.00105368x1 + 0.03117x2 + 0.00649x3 - 0.00119645x4 - 0.0025493x5
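The piecewise restriction amounts to clamping the linear prediction to the interval from zero to one. A minimal Python sketch follows, using the coefficients of equation 4.2.4b; the function name and the input values in the example are hypothetical.

```python
# Coefficients of equation 4.2.4b: intercept, then x1 through x5.
COEFFICIENTS = (0.35064, -0.00105368, 0.03117, 0.00649, -0.00119645, -0.0025493)


def probability_of_declaration(x1, x2, x3, x4, x5):
    """Linear prediction from 4.2.4b, restricted to the interval [0, 1].

    x1: distance from coast (miles), x2: total rainfall (inches),
    x3: maximum sustained wind speed (mph), x4: distance from eye (miles),
    x5: distance from site of landfall (miles).
    """
    b0, b1, b2, b3, b4, b5 = COEFFICIENTS
    y = b0 + b1 * x1 + b2 * x2 + b3 * x3 + b4 * x4 + b5 * x5
    return min(1.0, max(0.0, y))


# Hypothetical county: 100 miles inland, 5 inches of rain, 75 mph winds,
# 100 miles from the eye and 100 miles from the site of landfall.
print(round(probability_of_declaration(100, 5, 75, 100, 100), 4))
```

Inputs that drive the linear expression below zero or above one return exactly 0 or 1, as the piecewise definition requires.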
SUMMARY
The research done in this study is very new in that part of the data used is
Hurricane data from the 2004 hurricane season. Also, according to the Florida State Hazard
Mitigation Plan, “At this time [June 2004], the Risk Assessment only includes
information from the overall statewide risk assessment and does not include risk
information from local jurisdictions” and “no local plans were approved by FEMA and
the integration of local risk assessment data into the state plan was premature [at the time
of the draft of the FSHMP, April 2004]” (10, page 89). This research also departs from a
solely local and state analysis and includes a federal aspect. In this research, as it should
be in politics, meeting the needs of the individual comes first. The focus is on the
Individuals and Households Program division of FEMA in hopes to better serve the
uninsured person with a tree through their roof or the mother with a month’s worth of
food that they need to feed their children rotting in the refrigerator. These are the people
who should get federal aid first. This is the reason for this study.
REFERENCES
1) FEMA News release, October 20, 2004,
http://www.fema.gov/news/newsrelease.fema?id=14919
2) “FEMA History”, October 22, 2004
http://www.fema.gov/about/history.shtm
3) “The disaster process and disaster aid programs,” October 22, 2004,
http://www.fema.gov/library/dproc.shtm
4) FEMA Mapping and Analysis Center, September 22, 2004,
http://www.gismaps.fema.gov/gis04.shtm
5) Burroughs, WJ, Watching the World’s Weather, Cambridge UP, 1991
6) “Wind speed in a hurricane,” 1999,
http://hypertextbook.com/facts/StephanieStern.shtml
7) State of Florida Executive Order number 04-217, September 24, 2004,
http://floridadisaster.org/eoc/eoc_Activations/Jeanne04/Executive%20Orders/ExecutiveOrder04-217.pdf
8) FEMA’s Federal Response Plan (II-A-3)
http://www.fema.gov/txt/rrr/frp/frp_a_basicplan.txt
9) State of Florida Hazard Mitigation Plan, Effective Date-August 25, 2004,
http://www.floridadisaster.org/eoc/haz_mit/State%20Plan%20Revised%2008%2027%2004.pdf
11) FEMA National Situation Update, November 9, 2004,
http://www.fema.gov/emanagers/2004/nat110904.shtm
12) “Region IV Disaster History” November 23, 2004,
http://www.fema.gov/regions/iv/disasters_region4.fema
APPENDIX A
TABLE A.1
Distance from
coast (miles)   Bonnie    Earl       Georges   Mitch     Allison   Dennis    Floyd     Irene
20              1         0.13961    0.27551   0.16234   0.06818   0.41667   0.95652   0.6
40              0.97826   0.14688    0.25316   0.16927   0.04792   0.2668    0.93889   0.48822
60              0.94052   0.090353   0.28422   0.17289   0.06024   0.22845   0.88778   0.47962
80              0.85128   0.086956   0.32117   0.17391   0.07246   0.17692   0.8375    0.49819
100             0.72807   0.086956   0.35708   0.17391   0.07246   0.14561   0.77857   0.52968
120             0.64884   0.086956   0.36231   0.17391   0.07246   0.12977   0.77087   0.55665
140             0.55274   0.086956   0.36121   0.17391   0.07246   0.11055   0.75623   0.58607
160             0.493     0.086956   0.34257   0.17391   0.07246   0.09856   0.73228   0.60126
180             0.45238   0.086956   0.32264   0.17391   0.07246   0.09048   0.70399   0.60576
200             0.42329   0.086956   0.30293   0.17391   0.07246   0.08466   0.67259   0.59884

Distance from
coast (miles)   Isidore   Isabel    Charley   Frances   Ivan      Jeanne    Mean
20              1         1         0.72727   0.57612   0.24286   0.8263    0.57081
40              0.91667   1         0.4375    0.49556   0.17966   0.79063   0.50782
60              0.625     1         0.39153   0.48763   0.1973    0.75125   0.47120
80              0.56548   0.98649   0.37681   0.45547   0.20897   0.76812   0.45569
100             0.39254   0.94913   0.37681   0.49431   0.20646   0.76812   0.43284
120             0.30128   0.90293   0.37681   0.48619   0.20142   0.76812   0.41704
140             0.27957   0.83091   0.37681   0.46771   0.19649   0.76812   0.40141
160             0.22069   0.75752   0.37681   0.45699   0.19509   0.76812   0.38402
180             0.1875    0.71276   0.37681   0.45377   0.19526   0.76812   0.37163
200             0.16197   0.64758   0.37681   0.45573   0.19551   0.76812   0.35867
TABLE A.2
Total Rainfall Bonnie
Earl
(inches)
<3
0.12603
Georges
Mitch
Allison
Dennis
Floyd
Irene
0
0.00855
0
0
0.03636
0.17856
0.29935
3-6
0.94737
0.275
0.17679
0
0.03571
0.12121
0.8875
1
6-9
1
0.27273
0.2449
0.225
0.15385
0.19643
1
1
9-12
1
0.4
0.43919
0.91667
0.8
1
1
12-15
1
1
0.56
1
1
1
15-18
0.90476
1
1
1
18-21
1
21+
1
1
Isidore
Isabel
Charley
Frances
1
Ivan
Jeanne
Mean
0
0.58232
0.42767
0.22012
0.18699
0.39649
0.175888571
0.03125
1
0.93103
0.44717
0.61967
0.97222
0.53178
0.04594
1
0.66667
0.7318
0.93182
0.85281
0.9872
1
0.425
0.594425
0.724369091
0.92308
0.926154286
1
0.980952
1
1
1
TABLE A.3
Max
sustained
wind speed
(mph)
<39
Bonnie
Earl
Georges
Mitch
Allison
Dennis
Floyd
Irene
0
0
0
0
0
0
0.05263
0.44186
40-73
0.04
0.24038
0.51397
0.14583
0.3125
0.15476
0.53778
0.52268
74-95
0.83774
0.25397
0.75639
0.79634
96-110
1
111-130
1
1
0.33333
1
131-155
Isidore
Isabel
Charley
Frances
Ivan
Jeanne
Mean
0
0
0.06369
0
0.13071
0
0.04921
0.16405
0.38333
0.52602
0.51515
0.4709
0.60165
0.36636
0.97619
1
0.97826
0.88194
1
0.83120
1
1
1
0.95
0.92037
1
1
0.97500
1
0.9
1
1
TABLE A.4
Minimum distance
from eye (miles)   Bonnie    Earl      Georges   Mitch     Allison   Dennis    Floyd     Irene
20                 0.825     0.5       0.88095   0         0         0.16333   1         1
40                 0.85027   0.625     0.53914   0         0         0.1446    1         1
60                 0.82848   0.35      0.5       0.06667   0         0.11543   1         1
80                 0.73455   0.25397   0.51042   0.05263   0         0.10317   1         0.88
100                0.66616   0.25833   0.50972   0.05833   0         0.08098   0.99074   0.85714
120                0.59552   0.28039   0.46163   0.04805   0         0.07448   0.95522   0.74235
140                0.53726   0.26389   0.43269   0.04352   0.18182   0.06863   0.90921   0.73072
160                0.49672   0.27525   0.37583   0.03807   0.21667   0.06597   0.8761    0.69062
180                0.46649   0.28043   0.35381   0.03486   0.25427   0.06419   0.82266   0.67643
200                0.42892   0.24582   0.3067    0.03083   0.26611   0.06316   0.78038   0.68465

Minimum distance
from eye (miles)   Isidore   Isabel    Charley   Frances   Ivan      Jeanne    Mean
20                 0.04762   1         0.92857   0.385     0.90625   1         0.61691
40                 0.12821   1         0.93182   0.54729   0.75974   0.96875   0.60677
60                 0.12791   1         0.84806   0.60373   0.7799    0.95238   0.58375
80                 0.12519   1         0.72857   0.66253   0.51513   0.90936   0.53397
100                0.11397   0.9881    0.61043   0.68987   0.64589   0.86683   0.52404
120                0.0943    0.92981   0.60354   0.67425   0.57113   0.80398   0.48819
140                0.08926   0.84728   0.56845   0.64883   0.50251   0.79807   0.47301
160                0.08643   0.77821   0.54791   0.62728   0.47063   0.7717    0.45124
180                0.08537   0.67911   0.52574   0.60722   0.4379    0.74974   0.43130
200                0.08537   0.637     0.5098    0.57656   0.40737   0.75385   0.41261
TABLE A.5
Distance
Bonnie
from site of
landfall
(miles)
20
Earl
Georges
Mitch
1
0.5
1 Na
40
1
0.625
1 Na
60
0.95833
0.31538
1
Dennis
Floyd
Irene
1 na
1
0.9375
0.91667
1
0.16667
0.60714
0.94444
1
80
0.92105
0.1709
1
0.07692
0.46402
1
1
100
0.79762
0.14884
1
0.09375
0.27604
1
0.92857
120
0.67936
0.13645
0.91667
0.05833
0.20203
0.925
0.9375
140
0.53472
0.09514
0.91667
0.05214
0.17237
0.90891
0.80128
160
0.45989
0.09327
0.51528
0.04423
0.14205
0.86585
0.76667
180
0.38492
0.07984
0.46408
0.04058
0.11785
0.85325
0.74286
200
0.34789
0.06184
0.39198
0.03465
0.10694
0.79105
0.69385
Isidore
Isabel
Charley
Frances
Ivan
Jeanne
Mean
na
1
1.00000
1.00000
1.00000
1.00000
0.95
na
1
1.00000
1.00000
1.00000
1.00000
0.9526518
na
1
0.95455
1.00000
1.00000
0.95000
0.8247088
na
1
0.92308
1.00000
0.97059
0.95833
0.7904074
1
1
0.82500
0.97222
0.95238
0.82639
0.7554471
1
1
0.77473
0.95000
0.89299
0.81944
0.7148079
0.94444
1
0.68473
0.92582
0.83879
0.81010
0.6680857
0.65
0.91964
0.78344
0.92006
0.82647
0.80016
0.5990011
0.41506
0.77441
0.66092
0.92970
0.76087
0.82424
0.5421983
0.27604
0.70972
0.65245
0.93725
0.72359
0.84793
0.5057832
TABLE A.6
Overall
Bonnie
Earl
speed (mph)
<5
0.75676
5-10
Georges
Allison
Dennis
Floyd
0.77143
0.66667
1.00000
0.30769
1.00000
15-20
20-25
Isidore
0.06250
Isabel
Charley
Frances
Ivan
1.00000
Jeanne
Mean
0.87838
1.00000
0.25658
0.88762
0.28571
0.80000
1.00000
1.00000
Irene
1.00000
1.00000
10-15
Mitch
0.72222
0.75000
0.95238
0.60039
0.86111
0.70313
APPENDIX B
FIGURE B.1
FIGURE B.2
FIGURE B.3
FIGURE B.4
FIGURE B.5