HURRICANES AND DISASTER DECLARATIONS: A STATISTICAL ANALYSIS

By VERONICA REOTT

A SENIOR RESEARCH PAPER PRESENTED TO THE DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE OF STETSON UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF BACHELOR OF SCIENCE

STETSON UNIVERSITY 2005

ACKNOWLEDGEMENTS

I would like to acknowledge all of my mathematics professors at Stetson University, especially Dr. Will Miles who has advised, encouraged, and believed in me. I would also like to acknowledge Joanne Spano, who has helped me so very much throughout this whole endeavor and without whose acquaintance I may never have come up with this idea.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTERS
1. INTRODUCTION
2. BACKGROUND
3. SIMPLE LINEAR REGRESSIONS
   3.1. Distance from coast
   3.2. Total Rainfall
   3.3. Maximum sustained wind speed
   3.4. Distance from eye
   3.5. Distance from site of landfall
   3.6. Overall speed
   3.7. Discussion
4. MULTIPLE REGRESSIONS
   4.1 Techniques
      4.1.1 Use of Matrices
      4.1.2 Data collection and missing data
   4.2 Multiple regression models
5. RESULTS
   5.1 Choosing the correct model
   5.2 Testing the model
      5.2.1 Multicollinearity
   5.3 Finalizing
SUMMARY
APPENDICES
   A: Probability Tables
   B: FEMA Figures
REFERENCES

LIST OF TABLES

1) Table A.1: Distance from coast stats
2) Table 3.1.1: Distance from coast ANOVA
3) Table A.2: Total rainfall stats
4) Table 3.2.1: Total rainfall ANOVA
5) Table A.3: Max wind speed stats
6) Table 3.3.1: Max wind speed ANOVA
7) Table A.4: Distance from eye stats
8) Table 3.4.1: Distance from eye ANOVA
9) Table A.5: Distance from site of landfall stats
10) Table 3.5.1: Distance from site of landfall ANOVA
11) Table A.6: Overall speed stats
12) Table 3.6.1: Overall speed ANOVA
13) Table 3.7.1: Distance from coast linear model ANOVA
14) Table 4.2.1: Multiple regression ANOVA information and F test scores
15) Table 5.2.2: Coefficient comparison test for multicollinearity

LIST OF FIGURES

1) Figure 3.1.1: Distance from coast observed values and exponential model
2) Figure 3.1.2: Distance from coast observed values vs predicted values
3) Figure 3.2.1: Total Rainfall observed values and logarithmic model
4) Figure 3.2.2: Total Rainfall predicted versus observed values
5) Figure 3.3.1: Max sustained wind speed observed values and logarithmic model
6) Figure 3.3.2: Max sustained wind speed predicted versus observed values
7) Figure 3.4.1: Distance from eye observed values and exponential model
8) Figure 3.4.2: Distance from eye observed values vs predicted values
9) Figure 3.5.1: Distance from site of landfall observed values and exponential model
10) Figure 3.5.2: Distance from site of landfall observed values vs predicted values
11) Figure 3.6.1: Overall speed observed values and linear model
12) Figure 3.7.1: Residuals example
13) Figure B.1: Saffir-Simpson scale
14) Figure B.2: FEMA Regions
15) Figures B.3-B.5: Declared counties examples
16) Figure 5.2.1: Variance covariance matrix for multiple regression model

ABSTRACT

HURRICANES AND DISASTER DECLARATIONS: A STATISTICAL ANALYSIS
By Veronica Reott
May 2005
Advisor: Dr. Will Miles
Department: Mathematics and Computer Science

This study will focus on selected hurricanes from the 1998 through 2004 hurricane seasons. Statistical analysis of the overall speed of the hurricane and of county by county quantities for maximum sustained wind speed, total quantity of rainfall, proximity to the eye, proximity to the coast and proximity to site of landfall determines a statistical correlation between these factors and which counties in the states of FEMA’s region IV are likely to be declared “disaster areas” by the governor. Each of the factors will be regressed individually and all will be regressed simultaneously to bring about a model whose input is a known or projected value for each of the aforementioned factors and whose output is the probability that a county will be declared a disaster area.

CHAPTER 1 INTRODUCTION

Charley, Frances, Ivan and Jeanne are names that have recently become commonplace in most homes across Florida and the rest of the nation. These are not characters on the newest reality television show. They are not the names of politicians who have battled it out in this year’s elections. These are the names of deadly storms, hurricanes, which ravaged Florida from both coasts this hurricane season. For as far back as this researcher can remember, Florida has been virtually invincible to hurricanes.
Save Hurricane Andrew in 1992, the count of hurricanes with landfall and damaging effects in Florida has been quite low in the past few decades. Each time there is a tropical depression or simply an area of low pressure out in the Atlantic somewhere, the meteorologists of the world put their heads (and computers) together. Four-day and seven-day possible path predictions are calculated and the local news weather forecasters do the best they can to prepare the viewers for the worst. Until this past hurricane season, their advice may or may not have been completely heeded. Floridians have gone through many hurricane scares when right at the last moment, the hurricane is pushed northward or back out into the Atlantic and Florida sees mild wind and rains and no landfall. It is no surprise that some may have come to view hurricane path predictions as the forecaster “crying wolf.” This kind of attitude was highly diminished during the 2004 hurricane season. Floridians were glued to their television sets (or radios for lack of power) for days on end, several weeks in a row, as new and ever more daunting weather systems arose that threatened their lives and their livelihoods.

The Federal Emergency Management Agency or FEMA is a governmental organization which “is tasked with responding to, planning for, recovering from and mitigating against disasters” (2). FEMA’s involvement in the response to and recovery from a catastrophic event officially begins when the damage from such an event reaches a level that is beyond the capabilities of the state to handle on its own. When the damage reaches this level, the governor of the state will request a “disaster declaration” from the president. This declaration, given county by county, sets into motion FEMA’s processes of assessment of the amount of federal aid that will be needed for each county and delivery of this aid to those in need. The aid given by FEMA comes in many forms.
The process by which FEMA provides assistance to victims of hurricanes in the state of Florida is the focus of this study. Aside from loss of life, the most devastating damage that can occur is the loss (complete or partial) of one’s home. Therefore, the specific type of aid that will be focused on in this study is Individual Assistance, which is allocated by the Individuals and Households Program (IHP) department of FEMA. The Individuals and Households Program is the department that an individual can contact in the case of a hurricane (or other disaster) to appeal for aid such as temporary housing, repairs to one’s home, replacement of lost articles and other types of individual assistance. The IHP uses a standard model for determining the amount of aid that will be needed in each county. This model, known as the Preliminary Damage Assessment (PDA), is based on the number of structures (homes, apartments, mobile homes) which were fully or partially damaged by the hurricane, the percentage of those likely to be insured, and the expected number of aid applications in each county. The dollar amount of federal and state aid that will be needed to assist individuals in each county is given by this PDA.

Hurricane data from the 1998-2004 hurricane seasons, such as the site of landfall and overall speed of each hurricane, will be used in the statistical analyses in conjunction with county by county averages of maximum sustained wind speed and quantity of rainfall and county by county values for proximity to eye and proximity to coast. The probability that a county is declared given that it has certain values for the factors given above will be found. An example of the type of calculation that will be performed is to find the probability that a county will be declared a disaster area given that it is within 100 miles of the eye of the storm, or the probability that a county will be declared given that it experienced maximum wind speeds less than 74 miles per hour.
These conditional probabilities will be found for different configurations of each of the factors listed above. They will be found for each of the affected counties in each declared state of FEMA’s Region IV (see Figure B.2). Hypothesis tests and other tests of statistical correlation will be performed on this data to determine how crucial each of the factors is in determining the disaster declaration of the counties. This analysis lends itself to a model whose input would be the overall speed of the hurricane, the projected path, and rain band and wind speed band data. The surrounding counties that are statistically likely to be declared as disaster areas are the output.

CHAPTER 2 BACKGROUND

Hurricanes are some of the most devastating disasters and therefore the most costly. Hurricane Andrew alone cost FEMA $1.8 billion. This figure takes into account personal loss of people’s houses, cars, businesses and other personal damages they may have incurred as well as damage to public works such as roads, public and governmental facilities and the costs of restoration of Wildlife Management Areas (1). Damage from hurricanes is so severe because of the variety of destructive forces a hurricane entails. The high speed winds blow off roofs and blow trees into houses, and the intense quantity of rainfall causes flooding during and after the storm.

It is difficult for people to see that a hurricane is an important and necessary part of the earth on which they live, especially when it brings its destructive forces to their front door. But hurricanes do have a very important role to play in the highly intricate workings of the atmosphere. The movement of the atmosphere and the interaction between the air masses therein is a very detailed and specialized subject.
There are two essential processes of the “global weather machine”: a radiative balance between the earth and the sun, and the transport of energy within the atmosphere around the surface of the earth (5). The sun radiates energy constantly towards the earth at differing visible and invisible wavelengths. At the same time, the earth, including land, ocean and air, is radiating energy back into space. Over time this influx and outflux of energy balance out. The poles of the earth are tilted away from the sun for much of the year. Therefore, very little energy is radiating into the earth in these areas while the land, ocean and air at the poles are still radiating energy out into space. The balance of energy coming in and energy leaving the earth, therefore, must be kept by the transport of energy within the earth’s atmosphere from the equator, where the sun’s radiation of energy is the most intense, towards the poles where much energy is lost but very little gained. This transportation of energy “drives the global atmospheric and oceanic circulation during the year” (5).

Hurricanes, also known as typhoons or cyclones in different parts of the earth, are very important in the transfer of large amounts of energy from the center latitudes towards the poles. They begin as depressions along the intertropical convergence zone, the area from around 20 degrees south to 20 degrees north latitude, where the sun beats down upon the earth with greatest intensity. The depressions that become hurricanes show a sharp decrease in pressure at the center accompanying an increase in wind speed and circular cloud formation about 30-60 kilometers from the center. These storms pick up energy from the heat in the water around them. Once the wind speeds have reached 120 kilometers per hour, the storm is labeled a hurricane. Hurricanes grow in size and intensity quickly. The more organized the center, the more intense the hurricane.
“They may speed up, slow down, or even stop for a while to build up strength. As it travels across the ocean, a hurricane can pick up as much as two billion tons of water a day through evaporation and sea sprays.” (6) The path of an Atlantic hurricane is essentially a northward movement and also a movement towards warmer waters whenever possible.

A hurricane is an intensely destructive atmospheric creation that occurs because of the necessity of the atmosphere to constantly transport energy from the equator towards the poles. Hurricanes, thus, have an actual physical responsibility in the functioning of the earth’s atmosphere. While they help to keep things running smoothly on a global scale, hurricanes do definitely cause quite a few problems on the local scale. Luckily, the US government has developed agencies and programs which can help minimize the devastation caused by these monsters.

FEMA was developed during the presidency of Jimmy Carter as a centralized unit to control the handling of emergencies and disasters including, of course, hurricanes. The Agency united the operations of the Federal Insurance Administration, the National Fire Prevention and Control Administration, the National Weather Service Community Preparedness Program, the Federal Preparedness Agency of the General Services Administration and the Federal Disaster Assistance Administration among others. In 2003 FEMA became a segment of the Department of Homeland Security.

When a hurricane is out in the Atlantic or in the Gulf and it looks as though it may strike Florida, law enforcement and emergency teams in the threatened cities and counties are on call, awaiting commands from higher authorities. They and the general populace are constantly informed about the decisions being made in the governor’s office and about what will be done in the case that the hurricane makes landfall on Florida’s approximately 1350 miles of general coastline.
Shelters open up all over the state where people can seek refuge if they are concerned or if they need special assistance. The governor, when the situation seems unavoidable, will sign an executive order to place the state in a “state of emergency,” which directs each county to activate its Emergency Operations Center and its County Emergency Management Plan. At this point evacuations and curfews and other such procedures are put into effect according to the governor’s executive order. Emergency personnel are centralized in the areas of most risk to “protect the lives and property of persons in the threatened communities” (7).

Actions are being taken on a federal scale at this time as well. Since hurricanes can be tracked and their intensity is known in advance, FEMA is usually aware before landfall whether federal response is going to be necessary. No federal aid is specifically allowed before the presidential declaration but the “DHS [Department of Homeland Security] can use limited predeclaration authorities to move Initial Response Resources (IRR) (critical goods typically needed in the immediate aftermath of a disaster)…and emergency teams closer to potentially affected areas. DHS also can activate essential command and control structures to lessen or avert the effects of a disaster and to improve the timeliness of disaster operations.” (8)

When the hurricane makes landfall, the local and state authorities do the best they can to keep everyone safe and informed. Once the damage levels are more than the state can handle, the governor requests a declaration of disaster from the president. After this declaration is given, FEMA’s emergency response takes full effect. Emergency Response Teams including members of the IHP division are set up in the affected areas and response and recovery procedures begin. This is the stage in which the Preliminary Damage Assessments are made.
Data concerning the number of households, be they single-family homes, apartments or mobile home units, which are fully or partially damaged is collected by assessors who fly over the affected areas in helicopters and tabulate the damage as they see it. The assessors, members of the IHP division, then take this data and apply the appropriate calculations to determine the amount of federal and state assistance needed in each area. Assistance is then given to those in need according to federal regulations.

The process of assisting those in need of aid following a hurricane is an intricate and interesting one. The beauty is in the logistics. There is an amazing amount of coordination that goes on between local, state, and federal governments in the event of a disaster such as a hurricane. Hurricanes not only help to coordinate the movements of gases in the system of the atmosphere, but they also force the coordination of movements of hundreds of people and entire agencies in the system of our government.

CHAPTER 3 CURRENT RESEARCH

METHODS/DATA COLLECTION:

The hurricanes that were used for this study are Hurricanes Charley, Frances, Ivan, and Jeanne from 2004, Isabel (2003), Isidore (2002), Allison (2001), Irene (1999), Floyd (1999), Dennis (1999), Bonnie (1998), Earl (1998), Mitch (1998), and Georges (1998). Integral to the analyses in this study is the designation data for each county in each hurricane (examples: Figures B.3-B.5). The data for each of the factors (rainfall, wind speed, etc.) has been collected from different sources such as the NWS, NOAA, FEMA, and private mapping and analysis centers. The probability of declaration data used in this project was found by physically measuring and counting.
Maps of declared counties (examples: Figures B.3-B.5) were used and, for example, a dot was placed directly on the site of landfall, concentric circles with radii increasing by 20 miles were drawn and the number of counties that were completely or partially inside each area were counted and separately tabulated. The number of counties that were declared disaster areas out of these tabulated values gives an upper and lower bound on the probability of declaration. The upper and lower bounds were averaged for each storm at each level (20 miles from landfall, 40 miles from eye, 6 inches of rain, etc.) to bring about the probability of declaration shown in the tables. The probabilities at each level were then averaged over all of the storms to give the mean. These means were used as the observed values, Yi, in the calculations to follow.

Least squares regression has been conducted on each data set and the technique of transformation of variables has also been used in some cases. The simple linear regression model used for these data sets is

(3.0.0)  $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

The regression function for this model is

(3.0.0a)  $E\{Y\} = \beta_0 + \beta_1 X$

where, from the normal equations, we obtain

(3.0.1)  $\beta_1 = \dfrac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$

and

(3.0.2)  $\beta_0 = \dfrac{\sum y_i}{n} - \beta_1 \dfrac{\sum x_i}{n}$

The Sum of Squared Errors and the R squared values have been found for each of the regressions to determine the goodness of fit of the regression equations found. An ANOVA (Analysis of Variance) table is given for each individual regression.
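The normal equations (3.0.1) and (3.0.2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `fit_simple_linear` and the sample data are hypothetical and are not taken from the tables in Appendix A.

```python
def fit_simple_linear(xs, ys):
    """Return (b0, b1) from the normal equations (3.0.1) and (3.0.2)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; slope is undefined")
    b1 = (n * sum_xy - sum_x * sum_y) / denom  # equation (3.0.1)
    b0 = sum_y / n - b1 * sum_x / n            # equation (3.0.2)
    return b0, b1


# Hypothetical illustration data (not from the paper): exactly y = 0.54 - 0.002x
xs = [20, 40, 60, 80, 100]
ys = [0.50, 0.46, 0.42, 0.38, 0.34]
b0, b1 = fit_simple_linear(xs, ys)
print(round(b0, 4), round(b1, 4))
```

Because the sample points lie exactly on a line, the fitted intercept and slope should reproduce that line up to floating point error.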
The first of the two basic components of the ANOVA tables used here is the SSR, the regression sum of squares, which is the sum of the squared deviations between the output values of the regression equation and the mean of the observed values, or

$SSR = \sum \left(\hat{Y}_i - \tfrac{1}{n}\sum Y_i\right)^2$

The second component is the SSE or Sum of Squared Errors, which is the sum of the squared deviations of the observed values from the fitted regression line, or

$SSE = \sum \left(Y_i - \hat{Y}_i\right)^2$

The deviation of the observed values from their mean is given by the SSTO or the total sum of squares.

$SSTO = \sum \left(Y_i - \tfrac{1}{n}\sum Y_i\right)^2$

It is easily proven from the preceding equations that

(3.0.3)  $SSTO = SSE + SSR$

The coefficient of determination, or R squared, value is determined from the SSR or SSE and the SSTO

(3.0.4)  $r^2 = SSR/SSTO = 1 - SSE/SSTO$

Other information given in the ANOVA table is the degrees of freedom associated with each sum. The degrees of freedom associated with the SSR is the number of independent variables under consideration, p, which for simple linear regression is always 1. The degrees of freedom associated with the SSE is the number of valid cases, n, minus the number of independent variables, p, minus one. The MSR, or regression mean square, and the MSE, or error mean square, are found by dividing their associated sum of squares by its degrees of freedom.

$MSR = SSR/p$
$MSE = SSE/(n-p-1)$

The values given by the ANOVA table provide a measure of the goodness of fit of the model to the actual data. The MSR and MSE are used to find the test statistic F*.

(3.0.5)  $F^* = MSR/MSE$

The “F test” for simple linear regression models is a test to determine whether β1 = 0 or β1 ≠ 0. Given the general linear model E{Y} = βo + β1X, this determines whether there is a linear relationship between X and Y. The hypotheses are as follows

(3.0.6)  Ho: β1 = 0
         Ha: β1 ≠ 0

The F distribution is found by taking two independent Chi squared random variables (here SSR and SSE, each scaled by the error variance), dividing each by its degrees of freedom and dividing the first quotient by the second.
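The ANOVA quantities defined above can be computed directly from their definitions. The following is a minimal sketch, assuming a hypothetical five-point data set and a hypothetical helper named `anova_simple`; it is meant to show the bookkeeping, not to reproduce any table in this paper.

```python
def anova_simple(xs, ys):
    """Sums of squares, mean squares, r^2 and F* for simple linear regression."""
    n = len(ys)
    p = 1  # one independent variable
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)  # (3.0.1)
    b0 = sum_y / n - b1 * sum_x / n                                # (3.0.2)
    y_mean = sum_y / n
    fitted = [b0 + b1 * x for x in xs]
    ssr = sum((f - y_mean) ** 2 for f in fitted)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ssto = sum((y - y_mean) ** 2 for y in ys)
    msr = ssr / p
    mse = sse / (n - p - 1)
    # F* is undefined when the fit is perfect (MSE = 0); report infinity.
    f_star = msr / mse if mse > 0 else float("inf")
    return {"SSR": ssr, "SSE": sse, "SSTO": ssto,
            "MSR": msr, "MSE": mse, "r2": ssr / ssto, "F": f_star}


# Hypothetical illustration data (not from the paper)
table = anova_simple([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for name, value in table.items():
    print(name, round(value, 4))
```

For an untransformed least squares fit such as this one, SSR and SSE add up to SSTO as stated in (3.0.3), which gives a quick sanity check on the arithmetic.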
The F distribution is written F(d1, d2), where d1 is the numerator degrees of freedom and d2 is that of the denominator. Where F(1-α,1,n-2) is the 100*(1-α) percentile of the appropriate F distribution, the decision rule is

(3.0.7)  If F* ≤ F(1-α,1,n-2), conclude Ho
         If F* > F(1-α,1,n-2), conclude Ha

In essence, the F ratio statistic, MSR/MSE, is used to conclude how confident we are that the slope found in the regression equation did not occur by chance.

3.1 DISTANCE FROM COAST

Storm surge is one of the most devastating and damage-causing effects of a hurricane. Coastal areas are at high risk for the damaging effects of the hurricane. The probability of declaration given that a county is 20, 40, 60…200 miles from the coast for each storm is given in table A.1. Logically, as one moves further and further inland the damaging effects of this type diminish. Consequently, the most applicable type of function for this data is an exponential function.

(3.1.1)  $Y = \beta_0 e^{\beta_1 X}$

which is equivalent to

(3.1.1a)  $\ln(Y) = \ln(\beta_0) + \beta_1 X$

Transforming the variables gives a linear function in terms of X and a new response variable, Y´

(3.1.2)  $Y' = \beta_0' + \beta_1 X$

where Y´ = ln(Y) and βo´ = ln(βo). Using equations (3.0.1) and (3.0.2) above with Yi´ and βo´ yields

β1 = -.002366 and βo´ = -.57711

Thus, the transformed linear regression equation is

$Y' = -.57711 - .002366X$

and the regression function is given by

(3.1.3)  $Y = .56152\,e^{-.002366X}$

Figure 3.1.1 displays the observed values for Y and the regression function 3.1.3.

[Figure 3.1.1: Distance from coast observed values and exponential model. Vertical axis: probability of declaration; horizontal axis: distance from coast (miles).]

When transforming variables, the traditional R squared value may not provide the same information as it does for a non-transformed model.
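The transformation of variables behind equation (3.1.3) can be sketched as follows. This is an illustration only: the function name `fit_exponential` and the sample data are hypothetical, and the last lines simply back-transform the intercept βo´ = -.57711 reported above.

```python
import math


def fit_exponential(xs, ys):
    """Fit Y = b0 * exp(b1 * X) by regressing ln(Y) on X, as in (3.1.1a)-(3.1.2)."""
    if any(y <= 0 for y in ys):
        raise ValueError("ln(Y) requires strictly positive Y values")
    log_ys = [math.log(y) for y in ys]          # Y' = ln(Y)
    n = len(xs)
    sum_x, sum_ly = sum(xs), sum(log_ys)
    sum_xly = sum(x * ly for x, ly in zip(xs, log_ys))
    sum_xx = sum(x * x for x in xs)
    b1 = (n * sum_xly - sum_x * sum_ly) / (n * sum_xx - sum_x ** 2)  # (3.0.1)
    b0_prime = sum_ly / n - b1 * sum_x / n                           # (3.0.2)
    return math.exp(b0_prime), b1               # b0 = exp(b0')


# Hypothetical illustration data generated from Y = 0.5 * exp(-0.002 X)
xs = [20, 40, 60, 80, 100]
ys = [0.5 * math.exp(-0.002 * x) for x in xs]
b0, b1 = fit_exponential(xs, ys)
print(round(b0, 4), round(b1, 4))

# Back-transforming the intercept reported in section 3.1
b0_from_paper = math.exp(-0.57711)
print(round(b0_from_paper, 4))
```

The back-transformed intercept agrees with the coefficient .56152 in equation (3.1.3) to about four decimal places.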
The R squared value loses its usual interpretation under a transformation because the SSR and SSE values, computed on the original scale, do not necessarily sum to SSTO, so equation (3.0.3) need not hold. Thus, a separate plot of the data must be analyzed to determine the goodness of fit of the model. This separate plot (Figure 3.1.2) is a plot of the predicted values against the observed values at each X level. If the model (3.1.3) predicts the observed data with 100% accuracy, the data points should lie on the line y = x. The regression line fitted to this data is

(3.1.4)  $Y = -.0146 + 1.934X$

[Figure 3.1.2: Predicted vs observed: distance from coast exponential model, with the line y = x. Vertical axis: observed probability of declaration; horizontal axis: predicted probability of declaration (equation 3.1.3).]

The interpretation of the R squared value for this regression equation is still the same; it gives the amount of variance in the data that is explained by the model. Table 3.1.1 gives the analysis of variance information for this plot.

Table 3.1.1
Source of Variation    SS               df    MS
Regression             SSR = .03755     1     MSR = .03755
Error                  SSE = .00179     8     MSE = .00022
Total                  SSTO = .03934    9

From equation (3.0.5),

F* = .03755/.00022 = 170.6818 > F(.999,1,8) = 25.4

So, according to the decision rule (3.0.7), we reject the null hypothesis in (3.0.6) and conclude, with a confidence level of .999, that the slope in this data found with the regression equation did not occur by chance. From Table 3.1.1, we obtain an R squared value of .95450. This tells us that the function

(3.1.3)  $Y = .56152\,e^{-.002366X}$

explains over 95% of the variance in the data and is a good fit to the observed probability values.

3.2 TOTAL RAINFALL

Torrential rain is another very destructive force that hurricanes bring on to land. The probability that a county is declared given the number of inches of total rainfall is given in table A.2. Logically, the more rain, the more likely the disaster declaration.
The look of the data corresponds to a logarithmic model. When regressing using a logarithmic model, the normal equations 3.0.1 and 3.0.2 are used with the natural logarithm of x instead of the given x value. Figure 3.2.1 shows the observed values for the probabilities of declaration along with the logarithmic model of the data,

(3.2.1)  $Y = .13821\ln(x) + .51331$

[Figure 3.2.1: Observed values and logarithmic model: total rainfall. Vertical axis: probability of declaration; horizontal axis: rainfall (inches).]

Figure 3.2.2 shows the predicted values of the probabilities given by equation 3.2.1 against the observed values of the probabilities.

[Figure 3.2.2: Predicted values vs observed values: total rainfall, with the line y = x. Vertical axis: observed probability of declaration; horizontal axis: predicted probability of declaration.]

The linear model running through this data is

(3.2.2)  $Y = 1.00003X - .000023501$

Table 3.2.1 gives the analysis of variance information for this regression equation.

Table 3.2.1
Source of Variation    SS               df    MS
Regression             SSR = .52545     1     MSR = .52545
Error                  SSE = .08545     6     MSE = .01424
Total                  SSTO = .61090    7

From equation (3.0.5),

F* = .52545/.01424 = 36.89958 > F(.999,1,6) = 35.5

So, according to the decision rule (3.0.7), we reject the null hypothesis in (3.0.6) and conclude, with a confidence level of .999, that the slope in this data found with the regression equation did not occur by chance and there is a statistical association between total rainfall and probability of declaration.

From Table 3.2.1 we obtain an R squared value of .86012. This means that the regression equation,

(3.2.2)  $Y = 1.00003X - .000023501$

explains about 86% of the variation in the data and thus the equation

(3.2.1)  $Y = .13821\ln(x) + .51331$

is a more than adequate fit to the observed values.
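The logarithmic fit and the arithmetic of Table 3.2.1 can be checked with a short sketch. The function name `fit_log_model` and the sample data are hypothetical; the sums of squares and the critical value 35.5 are the ones reported in this section, and small differences in F* come from rounding the MSE.

```python
import math


def fit_log_model(xs, ys):
    """Fit Y = a*ln(x) + b by applying (3.0.1)-(3.0.2) to ln(x) in place of x."""
    if any(x <= 0 for x in xs):
        raise ValueError("ln(x) requires strictly positive x values")
    lx = [math.log(x) for x in xs]
    n = len(lx)
    sum_lx, sum_y = sum(lx), sum(ys)
    sum_lxy = sum(u * y for u, y in zip(lx, ys))
    sum_lxlx = sum(u * u for u in lx)
    a = (n * sum_lxy - sum_lx * sum_y) / (n * sum_lxlx - sum_lx ** 2)
    b = sum_y / n - a * sum_lx / n
    return a, b


# Hypothetical illustration data generated from Y = 0.1*ln(x) + 0.5
xs = [1, 2, 5, 10, 20]
ys = [0.1 * math.log(x) + 0.5 for x in xs]
a, b = fit_log_model(xs, ys)
print(round(a, 4), round(b, 4))

# Values reported in Table 3.2.1
ssr, sse, ssto = 0.52545, 0.08545, 0.61090
df_regression, df_error = 1, 6
f_star = (ssr / df_regression) / (sse / df_error)  # equation (3.0.5)
r_squared = ssr / ssto                             # equation (3.0.4)
F_CRITICAL = 35.5  # F(.999,1,6) as quoted in the text
print(round(f_star, 2), round(r_squared, 5), f_star > F_CRITICAL)
```

With the unrounded MSE the statistic comes out slightly below the 36.89958 quoted above, but it still exceeds the critical value, so the decision under rule (3.0.7) is unchanged.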
3.3 MAXIMUM SUSTAINED WIND SPEED

The “category” of a hurricane on the Saffir-Simpson scale (Figure B.1) is determined by the maximum sustained wind speed of the storm. The higher the sustained winds, the more destructive the storm. The probability that a county will be declared a disaster area given that it experienced tropical storm or hurricane force winds is displayed in table A.3. Figure 3.3.1 shows the observed probabilities and their logarithmic model,

(3.3.1)  $Y = .1167\ln(x) + .3944$

[Figure 3.3.1: Max sustained wind speed observed values and logarithmic model. Vertical axis: probability of declaration; horizontal axis: wind speed.]

Once again it is necessary to look at a plot of the predicted values versus the observed values to get a real picture of the analysis of variance for this plot. Figure 3.3.2 shows these values and the line y = x.

[Figure 3.3.2: Predicted versus observed values: max wind logarithmic model, with the line y = x. Vertical axis: observed probabilities of declaration; horizontal axis: predicted probabilities of declaration (equation 3.3.1).]

The linear equation going through these points is

(3.3.2)  $Y = .9999X + .00006$

The analysis of variance information for this model is given in table 3.3.1.

Table 3.3.1
Source of Variation    SS                       df    MS
Regression             SSR = 0.6330644633       1     MSR = 0.6330644633
Error                  SSE = 0.0066575602       3     MSE = .00221919
Total                  SSTO = 0.63971807932     4

From equation (3.0.5),

F* = 0.6330644633/.00221919 = 285.26868 > F(.999,1,3) = 167.0

where the critical value is taken at the 3 error degrees of freedom shown in Table 3.3.1. So, according to the decision rule (3.0.7), we reject the null hypothesis in (3.0.6) and conclude, with a confidence level of .999, that the slope in this data found with the regression equation did not occur by chance and there is a linear association between maximum sustained wind speed and probability of declaration. From Table 3.3.1 we obtain an R squared value of .98959.
This means that the regression equation (3.3.2) $Y = .9999X + .00006$ explains almost 99% of the variation in the data and that (3.3.1) $Y = .1167\ln(x) + .3944$ is a superb fit to the observed values.

3.4 DISTANCE FROM EYE

The eye of a hurricane is the area directly in the middle of the storm where there is no rain or wind. The better formed the eye, the stronger and longer lasting the storm. The most damaging effects that a hurricane produces occur right around the eye. The farther a county is from the eye, the less damage it receives. The observed probabilities that a county will be declared a disaster area given its distance from the eye are given in Table A.4. As with the distance from the coast, the best model for this data is an exponential model. Using the transformation of variables technique described in section 3.1, the regression equation is found to be

(3.4.1) $Y = .65671e^{-.00234X}$

Figure 3.4.1 shows the observed values for the probability of declaration and the exponential model 3.4.1.

[Figure 3.4.1: Distance from eye observed values and exponential model. Horizontal axis: Distance (miles); vertical axis: Probability of declaration.]

Figure 3.4.2 shows the predicted values given by equation 3.4.1 plotted against the observed values at each x level. As before, the line y = x is also displayed in Figure 3.4.2. The linear regression equation for the data points shown in Figure 3.4.2 is

(3.4.2) $Y = 1.00177X - .001003$

Table 3.4.1 gives the analysis of variance information for this plot.
[Figure 3.4.2: Predicted vs Observed values: Distance from eye exponential model, with the line y = x. Horizontal axis: Predicted Probabilities of declaration (equation 3.4.1); vertical axis: Observed Probabilities.]

Table 3.4.1
Source of Variation   SS               df   MS
Regression            SSR = .04740     1    MSR = .04740
Error                 SSE = .000536    8    MSE = .000067
Total                 SSTO = .04793    9

From equation (3.0.5),

F* = .04740/.000067 = 707.46269 > F(.999,1,8) = 25.4

So, according to the decision rule (3.0.7), we reject the null hypothesis in (3.0.6) and conclude, with a confidence level of .999, that the slope found by the regression did not occur by chance and that there is a linear association between distance from the eye and probability of declaration.

From Table 3.4.1, we obtain an R squared value of .98882. This tells us that the function (3.4.1) $Y = .65671e^{-.00234X}$ explains over 98% of the variance in the data and is an excellent fit to the observed probability values. The superb fit of this regression function is also easily verified by an examination of Figure 3.4.2. As was previously mentioned, if the regression equation is 100% accurate then all of the data points fall on the line y = x. The data points shown in Figure 3.4.2 all fall on or very close to this line.

3.5 DISTANCE FROM SITE OF LANDFALL

As a storm moves across the ocean it picks up energy from the warm water around it. When the storm makes landfall it loses potency; the size of the loss depends on the shape and temperature of the land as well as several other factors. This weakening reduces the damage incurred by counties farther away from the site of landfall. The observed probabilities that a county will be declared a disaster area given certain distances from the site of landfall are given in Table A.5.
The relationship between the distance from the site of landfall and the probability of declaration is best described by the exponential model

(3.5.1) $Y = 1.05888e^{-.00357X}$

which was arrived at using the technique of transformation of variables described in detail in section 3.1. Figure 3.5.1 shows the observed probabilities of declaration and exponential model 3.5.1.

[Figure 3.5.1: Distance from site of landfall: observed values and exponential model. Horizontal axis: Distance (miles); vertical axis: Probability of declaration.]

Figure 3.5.2 shows the predicted values of the probabilities at each x-level given by equation 3.5.1 plotted against the observed values at each x-level.

[Figure 3.5.2: Predicted values vs Observed values: Distance from site of landfall exponential model, with the line y = x. Horizontal axis: Predicted probability of declaration (equation 3.5.1); vertical axis: Observed probability of declaration.]

The linear regression equation obtained from these data points is

(3.5.2) $Y = .97865X + .01578$

Table 3.5.1 gives the analysis of variance information for this model.

Table 3.5.1
Source of Variation   SS              df   MS
Regression            SSR = .21282    1    MSR = .21282
Error                 SSE = .00519    8    MSE = .0006488
Total                 SSTO = .21801   9

From equation (3.0.5),

F* = .21282/.0006488 = 328.046 > F(.999,1,8) = 25.4

So, according to the decision rule (3.0.7) we reject the null hypothesis in (3.0.6) and conclude, with a confidence level of .999, that the slope in this data found with the regression equation did not occur by chance and there is a linear association between distance from the site of landfall and probability of declaration.

From Table 3.5.1, we obtain an R squared value of .97619. This tells us that the function (3.5.1) $Y = 1.05888e^{-.00357X}$ explains over 97% of the variance in the data and is an excellent fit to the observed probability values.
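The exponential fits in sections 3.4 and 3.5 rely on the transformation of variables from section 3.1, which is not reproduced in this part of the paper. The sketch below therefore assumes the usual form of that technique: take the natural logarithm of Y, fit a straight line in X by least squares, and exponentiate the intercept. The function names and the data are hypothetical; the data are generated from a known curve purely so the fit can be checked, and are not the study's observations.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by regressing ln(y) on x (every y must be > 0)."""
    n = len(xs)
    log_y = [math.log(y) for y in ys]  # transformed response ln(y)
    x_bar = sum(xs) / n
    ly_bar = sum(log_y) / n
    sxy = sum((x - x_bar) * (ly - ly_bar) for x, ly in zip(xs, log_y))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx                        # slope of the line in ln(y)
    a = math.exp(ly_bar - b * x_bar)     # back-transformed intercept
    return a, b


def predict_exponential(a, b, x):
    """Predicted probability from the fitted model a * exp(b * x)."""
    return a * math.exp(b * x)


# Illustrative data only: points lying exactly on 0.5 * exp(-0.01 * x)
distances = [20, 60, 100, 140, 180]
probabilities = [0.5 * math.exp(-0.01 * d) for d in distances]
a, b = fit_exponential(distances, probabilities)
print(a, b)
```

Plotting `predict_exponential(a, b, x)` against the observed values is the predicted-versus-observed comparison that Figures 3.4.2 and 3.5.2 display.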
3.6 OVERALL SPEED

The overall speed of a hurricane can have a huge effect on the areas it hits. If a hurricane is traveling very quickly, the damaging winds and rains are fleeting and less damage is incurred. However, if a storm is moving very slowly, one area might feel the effects for a much longer period of time and therefore suffer more damage. The probabilities of declaration given different overall speeds are given in Table A.6. The basic linear regression model used for this data is as follows

(3.6.1) $Y = -.00754X + .86153$

Figure 3.6.1 shows the observed probabilities along with this regression equation.

[Figure 3.6.1: Overall speed observed values and linear model. Horizontal axis: Speed (mph); vertical axis: Probability of declaration.]

The analysis of variance information for this model is given in Table 3.6.1.

Table 3.6.1
Source of Variation   SS              df   MS
Regression            SSR = .01421    1    MSR = .01421
Error                 SSE = .05161    3    MSE = .01720
Total                 SSTO = .06582   4

From equation (3.0.5),

F* = .01421/.01720 = .82600 < F(.90,1,3) = 5.54

So, according to the decision rule (3.0.7), we do not reject the null hypothesis in (3.0.6). We cannot conclude that the slope found by the regression did not occur by chance, and there is no apparent linear association between overall speed and probability of declaration.

From Table 3.6.1, we obtain an R squared value of .21589. This tells us that the function (3.6.1) $Y = -.00754X + .86153$ explains only 21.589% of the variance in the data. This R squared value shows, along with the F test, that equation 3.6.1 is a very poor fit to the observed probability values and is essentially meaningless.

3.7 DISCUSSION

The F test score and R squared value for a simple linear regression are important values which aid tremendously in the assessment of a regression model, but they can be misleading.
Residual analysis is a technique of regression analysis which examines the observed error, that is, the difference between the observed value $Y_i$ and the fitted value $\hat{Y}_i$ given by the regression function,

(3.7.1) $e_i = Y_i - \hat{Y}_i$

Least squares regression analysis is conducted by minimizing the sum of the squares of these error terms. Inherent in the technique of least squares regression are assumptions about the unknown true error terms $\varepsilon_i$ of the regression equation 3.0.0. It is assumed that the $\varepsilon_i$ are independent normal random variables with mean 0 and constant variance $\sigma^2$. Residual analysis is based on the idea that the appropriateness of a model can be measured by how well the residuals, the observed error terms, reflect the properties assumed for the unknown true error terms $\varepsilon_i$. A function is appropriate for the model if the residuals are randomly dispersed around the x axis, with no obvious trend or outlier. If either condition fails, then a change to the model is warranted, such as a different type of function or the addition of more or better predictor variables. This analysis can be conducted using any one of the following: a residual plot against the predictor variable, a sequence plot of the residuals, a box plot of the residuals, or a normal probability plot of the residuals.

In section 3.1, the probability of declaration given distance from the coast is regressed using the technique of transformation of variables. Logically, as the distance from the coast increases, the probability of declaration decreases. At first glance, this suggests a linear relationship between the variables. The linear regression function found for this data set is

(3.7.1) $Y = -.00105X + .55302$

Table 3.7.1 gives the analysis of variance information for this data set.

Table 3.7.1
Source of Variation   SS               df   MS
Regression            SSR = .03666     1    MSR = .03666
Error                 SSE = .0027      8    MSE = .0003375
Total                 SSTO = .039337   9

The F test score is 108.6222 > F(.999,1,8) = 25.4.
We would reject the null hypothesis and determine that there is a linear association between distance from coast and probability of declaration. Also, the R squared value found here is .93137, which would lead us to conclude that the linear association exists and that this function is quite appropriate to model the behavior of our variables. But when we look at the plot of the residuals against the predictor variable (Figure 3.7.1), it is obvious that the model is not as good as we originally thought.

[Figure 3.7.1: Residuals: Distance from coast. Horizontal axis: Distance from coast (miles); vertical axis: Residual.]

There is systematic variation in the residuals between negative and positive values. This type of trend in the residual plot demonstrates a need for a nonlinear regression function. This analysis leads us to the use of an exponential function and transformation of variables. The exponential regression equation

(3.1.3) $Y = .56152e^{-.002366X}$

has an associated R squared value of over .95 and an F test score of over 170. Residual analysis demonstrates that the high R squared and F test values for the linear model of this data were, in fact, misleading. By analyzing the residuals, we have discarded the incorrect type of function and improved the fit of the regression equation to the observed values.

Residual analysis is an important tool in regression analysis and has been used in this study, along with the F test score and the R squared value, to determine the goodness of fit of each model. With the exception of 3.6.1, all of the simple regression functions found in this study fit the data quite well. The residuals show little cause for alarm and the R squared and F test values are high. Part of the reason that 3.6.1 turned out so poorly is that the number of x levels for the predictor variable was so low. This is apparent in the F ratio. The number of degrees of freedom of the SSE was n - 2 = 3.
The SSE was more than three and a half times the SSR, and the small number of degrees of freedom allowed the MSE to be greater than the MSR. The F test value, therefore, was less than one, which tells us that we are only slightly more than 50% confident that there is a linear association between the variables. The model could be improved by collecting observations at more x levels. The low R squared value and F test score lead us to determine that the observed quantities for overall speed provide no important information to this regression analysis, and thus this variable will be left out of the multiple regressions in Chapter 4.

CHAPTER 4
MULTIPLE LINEAR REGRESSION ANALYSIS OF DATA

4.1 MULTIPLE LINEAR REGRESSION TECHNIQUES

In Chapter 3, regression equations were found for each of the individual variables. Most of the regression equations found in sections 3.1-3.6 were good fits to the data. The general linear regression equation, 3.0.0, can be extended to include any number of predictor variables. The regression model used here will be of the form

(4.1.1) $Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \beta_3X_{i3} + \dots + \beta_{p-1}X_{i,p-1} + \varepsilon_i$

The addition of more predictor variables to the model is important for many reasons. In this study in particular, the probability that a county will be declared a disaster area is affected by all of the factors regressed in sections 3.1-3.5. A hurricane brings with it rain, wind and storm surge, and a model determining the probability of declaration must include all of these variables to be complete.
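As a small illustration of model (4.1.1), the systematic part of the equation, everything except the error term, is an intercept plus a weighted sum of the predictor values. The sketch below evaluates it for one observation. The function name `linear_predictor`, the coefficients, and the predictor values are all hypothetical and chosen only to make the arithmetic easy to follow; they are not results from this study.

```python
def linear_predictor(beta, x):
    """Evaluate beta[0] + beta[1]*x[0] + ... + beta[p-1]*x[p-2].

    beta holds the p coefficients (intercept first) and x holds the
    p - 1 predictor values for a single observation.
    """
    if len(beta) != len(x) + 1:
        raise ValueError("beta needs exactly one more entry than x")
    return beta[0] + sum(bk * xk for bk, xk in zip(beta[1:], x))


# Hypothetical coefficients and predictor values, for illustration only
beta = [0.5, -0.001, 0.03]
x = [100.0, 6.0]
y_hat = linear_predictor(beta, x)  # 0.5 - 0.001*100 + 0.03*6
print(y_hat)
```

Keeping the intercept as the first entry of `beta` mirrors the leading column of ones in the matrix form of the model.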
4.1.1 USE OF MATRICES IN REGRESSION ANALYSIS

The normal regression model for multiple linear regression

(4.1.1) $Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \beta_3X_{i3} + \dots + \beta_{p-1}X_{i,p-1} + \varepsilon_i$

in matrix terms is

(4.1.2) $Y = X\beta + \varepsilon$

where Y is the n x 1 vector of observed values

(4.1.3)
$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}$$

X is the n x p matrix of constants

(4.1.4)
$$X = \begin{bmatrix} 1 & X_{11} & X_{12} & \cdots & X_{1,p-1} \\ 1 & X_{21} & X_{22} & \cdots & X_{2,p-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n1} & X_{n2} & \cdots & X_{n,p-1} \end{bmatrix}$$

and β is the p x 1 vector of parameters

(4.1.5)
$$\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{bmatrix}$$

The least squares estimated regression coefficients are those values of β that minimize the sum of the squared error in the model. The vector of estimated regression coefficients is

(4.1.6)
$$b = \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_{p-1} \end{bmatrix}$$

which is found using the least squares normal equations for model 4.1.2,

(4.1.7) $X'Xb = X'Y$

Accordingly,

(4.1.8) $b = (X'X)^{-1}(X'Y)$

The ANOVA values SSE, SSR and SSTO in matrix terms are as follows

(4.1.9) $SSE = Y'Y - b'X'Y$
(4.1.10) $SSR = b'X'Y - (1/n)Y'JY$
(4.1.11) $SSTO = Y'Y - (1/n)Y'JY$

where J is an n x n matrix of 1's.

Matrices are helpful because they give a compact view of the complex mathematics involved in multiple regression analysis.

4.1.2 DATA COLLECTION AND MISSING DATA

In this study, as was mentioned previously, the observed values for the probabilities of declaration were found by physically drawing areas on a map and counting the number of declared counties in the area for each variable. This method of data collection poses an interesting problem in the multiple regression analysis. Normally, each output value has a corresponding value for every one of the input variables. This is not the case in this research. The probabilities of declaration are found separately for each predictor variable and do not correspond with those found for the other predictor variables. Therefore, the X matrix is incomplete. The missing values must be filled in for the regression to be performed. Two different methods of filling in the missing data have been used.
The first method is to fill the rest of the matrix with zeros. This is reasonable in that each output value really was dictated only by the input variable to which it corresponds. Regression equations found by filling the rest of the matrix with zeros represent only the real observed values.

The second method of filling in the missing data is to use the mean of each x variable. This is reasonable because the mean is the expected value of the variable. The amount of rain that occurred in the declared counties with wind speeds of 74 miles per hour was not tabulated, but there is no doubt that there was some rain. How much? We don't know. But the mean value of rainfall amounts is a worthy estimate.

4.2 MULTIPLE REGRESSION MODELS

The following multiple regression models were found using matrix techniques and Mathematica. For the following models: x1 = distance from coast, x2 = total rainfall, x3 = maximum sustained wind speed, x4 = distance from eye, and x5 = distance from site of landfall. Starting with x1, each new variable is added in turn to produce the following models. The first equation of each set was found using zeros to fill in the missing data. The second of each set was found using the means of the x values to fill in the missing data.

(4.2.1a) Y = 0.43025 - 0.000176767x1 + 0.0346808x2
(4.2.1b) Y = 0.266553 - 0.00105368x1 + 0.0440763x2
(4.2.2a) Y = 0.444471 - 0.000278347x1 + 0.0244849x2 + 0.00316138x3
(4.2.2b) Y = -0.0905743 - 0.00105368x1 + 0.031088x2 + 0.00653304x3
(4.2.3a) Y = 0.489208 - 0.000597892x1 + 0.021941x2 + 0.0028497x3 - 0.0000922999x4
(4.2.3b) Y = 0.0242913 - 0.00105368x1 + 0.0310848x2 + 0.00653461x3 - 0.00119645x4
(4.2.4a) Y = 0.58467 - 0.0012798x1 + 0.016473x2 + 0.0021984x3 - 0.0007742x4 + 0.000494x5
(4.2.4b) Y = 0.35064 - 0.00105368x1 + 0.03117x2 + 0.00649x3 - 0.00119645x4 - 0.0025493x5

Table 4.2.1 gives the ANOVA information for each equation.
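The two fill methods and the normal equations (4.1.7) can be sketched as follows. The study itself used Mathematica; this pure-Python version is only a stand-in under stated assumptions. The function names `fill_missing`, `solve` and `least_squares` are hypothetical, missing entries are assumed to be marked with `None`, every column is assumed to have at least one known value, and the small data set is invented so that the exact answer is known in advance.

```python
def fill_missing(rows, method="mean"):
    """Replace None entries, column by column, with 0.0 or the column mean."""
    filled = [list(r) for r in rows]
    for j in range(len(rows[0])):
        known = [r[j] for r in rows if r[j] is not None]
        fill = 0.0 if method == "zeros" else sum(known) / len(known)
        for r in filled:
            if r[j] is None:
                r[j] = fill
    return filled


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes a is square and nonsingular.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares(x_rows, y):
    """Solve X'X b = X'Y after adding the leading column of ones to X."""
    X = [[1.0] + list(r) for r in x_rows]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Invented complete data lying exactly on y = 1 + 2*x1 + 3*x2
x_rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
b = least_squares(x_rows, y)

# Invented incomplete rows showing the two fill methods
partial = [[10.0, None], [30.0, None], [None, 4.0]]
by_mean = fill_missing(partial, "mean")
by_zero = fill_missing(partial, "zeros")
print(b, by_mean, by_zero)
```

With real data, the filled rows would be passed to `least_squares` once per fill method, giving one "zeros" equation and one "means" equation for each set of variables.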
Table 4.2.1
            4.2.1a      4.2.1b      4.2.2a      4.2.2b      4.2.3a      4.2.3b      4.2.4a      4.2.4b
SSR         0.756819    0.526202    1.001795    0.977658    1.002759    1.025121    0.99398     1.23974
SSE         0.112213    0.34283     0.543275    0.567412    0.609991    0.587629    1.08718     0.84142
SSTO        0.869032    0.869032    1.54507     1.54507     1.61275     1.61275     2.08116     2.08116
R squared   0.8708759   0.6055036   0.6483816   0.6327597   0.6217696   0.6356354   0.4776086   0.5956966
MSE         0.0080152   0.0244879   0.0301819   0.0315229   0.0225923   0.021764    0.0301994   0.0233728
MSR         0.3784095   0.263101    0.3339317   0.325886    0.2506898   0.2562803   0.198796    0.247948
F           47.211402   10.744141   11.063955   10.338075   11.096267   11.7754     6.5827701   10.60841

Just as in simple regression analysis, in multiple regression there is an F test for a regression relation. As before, the MSR and MSE are used to find the test statistic F*.

(3.0.5) F* = MSR/MSE

The “F test” for multiple linear regression models tests whether all of the slope coefficients are zero or at least one is not. The hypotheses are as follows

(4.2.5) Ho: $\beta_1 = \beta_2 = \dots = \beta_{p-1} = 0$
        Ha: not all $\beta_k$ (k = 1,…,p-1) equal zero

Where F(1-α, p-1, n-p) is the 100*(1-α) percentile of the appropriate F distribution, the decision rule is

(4.2.6) If F* ≤ F(1-α,p-1,n-p), conclude Ho
        If F* > F(1-α,p-1,n-p), conclude Ha

Table 4.2.1 gives the F test scores for each of the regressions. The F test scores here give us valuable information when it comes to choosing the correct model.

For 4.2.1a, F* = 47.211402 > F(.975,2,14) = 39.4
For 4.2.1b, F* = 10.744141 > F(.90,2,14) = 9.4175
For 4.2.2a, F* = 11.06395 > F(.95,3,18) = 8.68
For 4.2.2b, F* = 10.338075 > F(.95,3,18) = 8.68
For 4.2.3a, F* = 11.096267 > F(.975,4,27) = 8.48
For 4.2.3b, F* = 11.7754 > F(.975,4,27) = 8.48
For 4.2.4a, F* = 6.5827701 > F(.975,5,36) = 6.20
For 4.2.4b, F* = 10.60841 > F(.99,5,36) = 9.344

CHAPTER 5
RESULTS

5.1 CHOOSING THE CORRECT MODEL

For the models constructed using the means to fill in the data, the confidence level of the F test increases as the number of variables increases.
The more variables we add, the more confident we are that we can reject the null hypothesis in (4.2.5) and conclude that not all $\beta_k$ = 0 and that there is a linear association between the predictor and response variables that did not happen by chance. A simple comparison of the R squared values indicates that, for the models including more variables, the “means” models explain more of the variation in the data.

Another indication that the “means” equations are a better choice is that, for the “zeros” equations, the R squared values decrease as the number of variables increases. The goodness of fit of each model to the actual data decreases.

The final reason for our choice of the “means” models is that each of the coefficients always has the logically correct sign no matter which variables are in the model. The probability of declaration would logically increase as wind speed and total rainfall increase. Also, the probability would decrease as the distances from the eye, site of landfall and coast increase. Therefore, x2 = total rainfall and x3 = maximum sustained wind speed should have positive coefficients, while x4 = distance from eye, x5 = distance from site of landfall and x1 = distance from coast should have negative coefficients.

The best model including all necessary variables is, therefore,

(4.2.4b) Y = 0.35064 - 0.00105368x1 + 0.03117x2 + 0.00649x3 - 0.00119645x4 - 0.0025493x5

As previously mentioned, we can, with a confidence level of .99, reject the null hypothesis in (4.2.5) and conclude that not all coefficients are zero and that there is thus a non-random linear statistical association between these five predictor variables and the probability of disaster declaration. Also, this model has a coefficient of multiple determination $R^2 = 0.5956966$, which gives a coefficient of multiple correlation of R = .77181.

5.2 TESTING THE MODEL

Now that we have chosen the model, we wish to test whether any of the variables should be removed from the model.
The test is set up similarly to the F test, with hypotheses

(5.2.1) Ho: $\beta_k = 0$
        Ha: $\beta_k \neq 0$

The test statistic here uses the t distribution

(5.2.2) $t^* = b_k / s\{b_k\}$

The decision rule with this test statistic at a level of significance α is

(5.2.3) If |t*| ≤ t(1-α/2,n-p), conclude Ho
        Otherwise, conclude Ha.

We will use the estimated variance-covariance matrix given by

(5.2.4) $s^2\{b\} = MSE\,(X'X)^{-1}$

$$s^2\{b\} = \begin{bmatrix} s^2\{b_0\} & s^2\{b_0,b_1\} & \cdots & s^2\{b_0,b_{p-1}\} \\ s^2\{b_1,b_0\} & s^2\{b_1\} & \cdots & s^2\{b_1,b_{p-1}\} \\ \vdots & \vdots & & \vdots \\ s^2\{b_{p-1},b_0\} & s^2\{b_{p-1},b_1\} & \cdots & s^2\{b_{p-1}\} \end{bmatrix}$$

Figure 5.2.1
$$s^2\{b\} = \begin{bmatrix}
0.040968044 & 0.000077909 & 0.000585928 & 0.00012515 & 0.000077909 & 0.000077909 \\
0.000077909 & 7.08266 \times 10^{-7} & 0 & 0 & 0 & 0 \\
0.000585928 & 0 & 0.00010136 & 4.33169 \times 10^{-6} & 0 & 0 \\
0.00012515 & 0 & 4.33169 \times 10^{-6} & 2.178814 \times 10^{-6} & 0 & 0 \\
0.000077909 & 0 & 0 & 0 & 7.08266 \times 10^{-7} & 0 \\
0.000077909 & 0 & 0 & 0 & 0 & 7.08266 \times 10^{-7}
\end{bmatrix}$$

For b1: $s^2\{b_1\} = 7.08266 \times 10^{-7}$, $s\{b_1\} = .000841585$
|t*| = |-.0010537/.000841585| = |-1.25204| = 1.25204 > t(.85,36) = 1.052
We conclude, at a level of significance α = .3 (using the percentile t(.85,36) from decision rule 5.2.3), that we reject the null hypothesis and the b1 term should stay in the model.

For b2: $s^2\{b_2\} = .00010136$, $s\{b_2\} = .01006777$
|t*| = |.03117/.01006777| = |3.09602| = 3.09602 > t(.9975,36) = 3.0236
We conclude, at a significance level of .005, that we reject the null hypothesis and the b2 term should stay in the model.

For b3: $s^2\{b_3\} = 2.178814 \times 10^{-6}$, $s\{b_3\} = .00147608$
|t*| = |.0064901/.00147608| = |4.39685| = 4.39685 > t(.9995,36) = 3.589
Therefore, we conclude, at a significance level of .001, that we reject the null hypothesis and the b3 term should stay in the model.

For b4: $s^2\{b_4\} = 7.08266 \times 10^{-7}$, $s\{b_4\} = .000841585$
|t*| = |-.0011965/.000841585| = |-1.42172| = 1.42172 > t(.90,36) = 1.30958
So, we conclude, at a significance level of .2, that we reject the null hypothesis and the b4 term should stay in the model.
For b5: $s^2\{b_5\} = 7.08266 \times 10^{-7}$, $s\{b_5\} = .000841585$
|t*| = |-.0025493/.000841585| = |-3.029165| = 3.029165 > t(.9975,36) = 3.0236
Thus, we conclude, at a significance level of .005, that we reject the null hypothesis and the b5 term should stay in the model.

The t test scores found above show that each of the variables is significant at a level of α = .3 or smaller. This is an acceptable significance level for us to determine that none of the variables needs to be dropped from the model.

5.2.1 MULTICOLLINEARITY

The appropriateness of the model can be assessed by examining the coefficients $b_k$ after the addition of each variable. The addition of more variables to the model should not drastically change the values of the coefficients. If it does, then there is concern that there may be either interaction effects or multicollinearity between some of the variables. When two or more predictor variables are correlated, the values of the coefficients depend on which variables are already in the model and which ones are not. The reason for this is that each coefficient is supposed to measure the effect on the response of a one-unit increase in its own variable when the rest of the variables are held constant. If there is correlation among the variables, then the coefficient on any one variable can reflect a partial effect of more than one variable, depending on which variables are already in the model. Table 5.2.2 shows the values of the coefficients for different configurations of variables in the “means” models.
Table 5.2.2
Variables in model   b1           b2          b3          b4           b5
x1,x2                -0.0010537   0.0440763   --          --           --
x1,x2,x3             -0.0010537   0.031088    0.006533    --           --
x1,x2,x3,x4          -0.0010537   0.0310848   0.0065346   -0.0011965   --
x1,x2,x3,x4,x5       -0.0010537   0.03117     0.00649     -0.0011965   -0.0025493
x2,x3,x4,x5          --           0.0311733   0.0064901   -0.0011965   -0.0025493
x1,x3,x4,x5          -0.0010537   --          0.0078223   -0.0011965   -0.0025493
x1,x2,x4,x5          -0.0010537   0.0440761   --          -0.0011965   -0.0025493
x1,x2,x3,x5          -0.0010537   0.0311733   0.0064901   --           -0.0025493

As you can see, there is very little evidence of multicollinearity shown by this test. The values for each of the coefficients are very close, if not exactly equal, from model to model no matter what other variables are in the model.

5.3 FINALIZING

The essential idea behind this multiple regression is to determine the probability of declaration of a county given values for the predictor variables. This means that the output value of the model is a probability and as such must be a value between zero and one. We must, hence, place restrictions on the model by making it a piecewise function in the following fashion. With

$Y = 0.35064 - 0.00105368x_1 + 0.03117x_2 + 0.00649x_3 - 0.00119645x_4 - 0.0025493x_5$

the final model is

$$\text{Probability of declaration} = \begin{cases} 0 & \text{when } Y < 0 \\ Y & \text{when } 0 \le Y \le 1 \\ 1 & \text{when } Y > 1 \end{cases}$$

SUMMARY

The research done in this study is very new in that part of the data used is hurricane data from the 2004 hurricane season. Also, according to the Florida State Hazard Mitigation Plan, “At this time [June 2004], the Risk Assessment only includes information from the overall statewide risk assessment and does not include risk information from local jurisdictions” and “no local plans were approved by FEMA and the integration of local risk assessment data into the state plan was premature [at the time of the draft of the FSHMP, April 2004]” (10, page 89). This research also departs from a solely local and state analysis and includes a federal aspect. In this research, as it should be in politics, meeting the needs of the individual comes first.
The focus is on the Individuals and Households Program division of FEMA, in the hope of better serving the uninsured person with a tree through the roof or the mother whose month's worth of food, needed to feed her children, is rotting in the refrigerator. These are the people who should get federal aid first. This is the reason for this study.

REFERENCES

1) FEMA News release, October 20, 2004, http://www.fema.gov/news/newsrelease.fema?id=14919
2) “FEMA History”, October 22, 2004, http://www.fema.gov/about/history.shtm
3) “The disaster process and disaster aid programs,” October 22, 2004, http://www.fema.gov/library/dproc.shtm
4) FEMA Mapping and Analysis Center, September 22, 2004, http://www.gismaps.fema.gov/gis04.shtm
5) Burroughs, WJ, Watching the World’s Weather, Cambridge UP, 1991
6) “Wind speed in a hurricane,” 1999, http://hypertextbook.com/facts/StephanieStern.shtml
7) State of Florida Executive Order number 04-217, September 24, 2004, http://floridadisaster.org/eoc/eoc_Activations/Jeanne04/Executive%20Orders/ExecutiveOrder04-217.pdf
8) FEMA’s Federal Response Plan (II-A-3), http://www.fema.gov/txt/rrr/frp/frp_a_basicplan.txt
9) State of Florida Hazard Mitigation Plan, Effective Date-August 25, 2004, http://www.floridadisaster.org/eoc/haz_mit/State%20Plan%20Revised%2008%2027%2004.pdf
11) FEMA National Situation Update, November 9, 2004, http://www.fema.gov/emanagers/2004/nat110904.shtm
12) “Region IV Disaster History,” November 23, 2004, http://www.fema.gov/regions/iv/disasters_region4.fema

APPENDIX A

TABLE A.1 Distance from Bonnie coast (miles) 20 Earl Georges Mitch Allison Dennis Floyd Irene 1 0.13961 0.27551 0.16234 0.06818 0.41667 40 0.97826 0.14688 0.25316 0.16927 0.04792 0.2668 0.93889 0.48822 60 0.94052 0.090353 0.28422 0.17289 0.06024 0.22845 0.88778 0.47962 80 0.85128 0.086956 0.32117 0.17391 0.07246 0.17692 0.8375 0.49819 100 0.72807 0.086956 0.35708 0.17391 0.07246 0.14561 0.77857 0.52968 120 0.64884 0.086956 0.36231 0.17391 0.07246 0.12977
0.77087 0.55665 140 0.55274 0.086956 0.36121 0.17391 0.07246 0.11055 0.75623 0.58607 160 0.493 0.086956 0.34257 0.17391 0.07246 0.09856 0.73228 0.60126 180 0.45238 0.086956 0.32264 0.17391 0.07246 0.09048 0.70399 0.60576 200 0.42329 0.08466 0.67259 0.59884 Isidore 0.086956 Isabel Charley 0.30293 Frances 0.17391 Ivan 0.07246 Jeanne 0.95652 0.6 Mean 1 1 0.72727 0.57612 0.24286 0.8263 0.57081 0.91667 1 0.4375 0.49556 0.17966 0.79063 0.50782 0.625 1 0.39153 0.48763 0.1973 0.75125 0.47120 0.56548 0.98649 0.37681 0.45547 0.20897 0.76812 0.45569 0.39254 0.94913 0.37681 0.49431 0.20646 0.76812 0.43284 0.30128 0.90293 0.37681 0.48619 0.20142 0.76812 0.41704 0.27957 0.83091 0.37681 0.46771 0.19649 0.76812 0.40141 0.22069 0.75752 0.37681 0.45699 0.19509 0.76812 0.38402 0.1875 0.71276 0.37681 0.45377 0.19526 0.76812 0.37163 0.16197 0.64758 0.37681 0.45573 0.19551 0.76812 0.35867 TABLE A.2 Total Rainfall Bonnie Earl (inches) <3 0.12603 Georges Mitch Allison Dennis Floyd Irene 0 0.00855 0 0 0.03636 0.17856 0.29935 3-6 0.94737 0.275 0.17679 0 0.03571 0.12121 0.8875 1 6-9 1 0.27273 0.2449 0.225 0.15385 0.19643 1 1 9-12 1 0.4 0.43919 0.91667 0.8 1 1 12-15 1 1 0.56 1 1 1 15-18 0.90476 1 1 1 18-21 1 21+ 1 1 Isidore Isabel Charley Frances 1 Ivan Jeanne Mean 0 0.58232 0.42767 0.22012 0.18699 0.39649 0.175888571 0.03125 1 0.93103 0.44717 0.61967 0.97222 0.53178 0.04594 1 0.66667 0.7318 0.93182 0.85281 0.9872 1 0.425 0.594425 0.724369091 0.92308 0.926154286 1 0.980952 1 1 1 49 TABLE A.3 Max sustained wind speed (mph) <39 Bonnie Earl Georges Mitch Allison Dennis Floyd Irene 0 0 0 0 0 0 0.05263 0.44186 40-73 0.04 0.24038 0.51397 0.14583 0.3125 0.15476 0.53778 0.52268 74-95 0.83774 0.25397 0.75639 0.79634 96-110 1 111-130 1 1 0.33333 1 131-155 Isidore Isabel Charley Frances Ivan Jeanne Mean 0 0 0.06369 0 0.13071 0 0.04921 0.16405 0.38333 0.52602 0.51515 0.4709 0.60165 0.36636 0.97619 1 0.97826 0.88194 1 0.83120 1 1 1 0.95 0.92037 1 1 0.97500 1 0.9 1 1 TABLE A.4 Minimum distance from eye 
(miles) Bonnie Earl Georges Mitch Allison Dennis Floyd Irene 20 0.825 0.5 0.88095 0 0 0.16333 1 1 40 0.85027 0.625 0.53914 0 0 0.1446 1 1 60 0.82848 0.35 0.5 0.06667 0 0.11543 1 1 80 0.73455 0.25397 0.51042 0.05263 0 0.10317 1 0.88 100 0.66616 0.25833 0.50972 0.05833 0 0.08098 0.99074 0.85714 120 0.59552 0.28039 0.46163 0.04805 0 0.07448 0.95522 0.74235 140 0.53726 0.26389 0.43269 0.04352 0.18182 0.06863 0.90921 0.73072 160 0.49672 0.27525 0.37583 0.03807 0.21667 0.06597 0.8761 0.69062 180 0.46649 0.28043 0.35381 0.03486 0.25427 0.06419 0.82266 0.67643 200 0.42892 0.24582 0.3067 0.03083 0.26611 0.06316 0.78038 0.68465 Isidore Isabel Charley Frances Ivan Jeanne Mean 0.04762 1 0.92857 0.385 0.90625 1 0.61691 0.12821 1 0.93182 0.54729 0.75974 0.96875 0.60677 0.12791 1 0.84806 0.60373 0.7799 0.95238 0.58375 0.12519 1 0.72857 0.66253 0.51513 0.90936 0.53397 0.11397 0.9881 0.61043 0.68987 0.64589 0.86683 0.52404 0.0943 0.92981 0.60354 0.67425 0.57113 0.80398 0.48819 0.08926 0.84728 0.56845 0.64883 0.50251 0.79807 0.47301 0.08643 0.77821 0.54791 0.62728 0.47063 0.7717 0.45124 0.08537 0.67911 0.52574 0.60722 0.4379 0.74974 0.43130 0.08537 0.637 0.5098 0.57656 0.40737 0.75385 0.41261 50 TABLE A.5 Distance Bonnie from site of landfall (miles) 20 Earl Georges Mitch 1 0.5 1 Na 40 1 0.625 1 Na 60 0.95833 0.31538 1 Dennis Floyd Irene 1 na 1 0.9375 0.91667 1 0.16667 0.60714 0.94444 1 80 0.92105 0.1709 1 0.07692 0.46402 1 1 100 0.79762 0.14884 1 0.09375 0.27604 1 0.92857 120 0.67936 0.13645 0.91667 0.05833 0.20203 0.925 0.9375 140 0.53472 0.09514 0.91667 0.05214 0.17237 0.90891 0.80128 160 0.45989 0.09327 0.51528 0.04423 0.14205 0.86585 0.76667 180 0.38492 0.07984 0.46408 0.04058 0.11785 0.85325 0.74286 200 0.34789 0.06184 0.39198 0.03465 0.10694 0.79105 0.69385 Isidore Isabel Charley Frances Ivan Jeanne Mean na 1 1.00000 1.00000 1.00000 1.00000 0.95 na 1 1.00000 1.00000 1.00000 1.00000 0.9526518 na 1 0.95455 1.00000 1.00000 0.95000 0.8247088 na 1 0.92308 1.00000 0.97059 0.95833 
0.7904074 1 1 0.82500 0.97222 0.95238 0.82639 0.7554471 1 1 0.77473 0.95000 0.89299 0.81944 0.7148079 0.94444 1 0.68473 0.92582 0.83879 0.81010 0.6680857 0.65 0.91964 0.78344 0.92006 0.82647 0.80016 0.5990011 0.41506 0.77441 0.66092 0.92970 0.76087 0.82424 0.5421983 0.27604 0.70972 0.65245 0.93725 0.72359 0.84793 0.5057832 TABLE A.6 Overall Bonnie Earl speed (mph) <5 0.75676 5-10 Georges Allison Dennis Floyd 0.77143 0.66667 1.00000 0.30769 1.00000 15-20 20-25 Isidore 0.06250 Isabel Charley Frances Ivan 1.00000 Jeanne Mean 0.87838 1.00000 0.25658 0.88762 0.28571 0.80000 1.00000 1.00000 Irene 1.00000 1.00000 10-15 Mitch 0.72222 0.75000 0.95238 0.60039 0.86111 0.70313 51 APPENDIX B FIGURE B.1 52 FIGURE B.2 53 FIGURE B.3 54 FIGURE B.4 55 FIGURE B.5 56