Statistical and Neural Network Modeling and Predictions of Tides in the Shallow Waters of the Gulf of Mexico

ALEXEY L. SADOVSKI, PHILIPPE TISSOT, PATRICK MICHAUD, CARL STEIDLEY
Department of Computing and Mathematical Sciences
Department of Physical and Life Sciences
Conrad Blucher Institute for Surveying and Science
Texas A&M University - Corpus Christi
6300 Ocean Dr., Corpus Christi, Texas 78412, USA
Sadovski@falcon.tamucc.edu http://www.cbi.tamucc.edu

Abstract: - This paper presents a preliminary report of statistical modeling for filling gaps in the data collected to make predictions of tides in the shallow waters of the Gulf of Mexico. The approach discussed is based on data gathered by the Texas Coastal Ocean Observation Network (TCOON). By applying multiple regression and factor analysis to different kinds of data (water level, wind speed and direction, water temperature) we were able to make quite reliable predictions for horizons of 6 to 72 hours. The results of this investigation were compared with predictions based on Neural Networks, and the integration of these two approaches looks very promising.

Key-Words: - Predictions, Statistical Modeling, Regression, Factor Analysis, Neural Networks.

1 Introduction

The goal of this ongoing research is to develop effective and reliable tools for predicting water levels in the shallow waters of the Gulf of Mexico. Methodologies for the prediction of water levels include statistical models [1], harmonic analysis, numerical methods based on finite elements or finite differences, neural networks [2], etc. Here we discuss a statistically based model of prediction (SMP) of tides and compare it with neural network predictions (NNP). Both approaches are under development at the Conrad Blucher Institute in cooperation with the Department of Computing and Mathematical Sciences, both of Texas A&M University-Corpus Christi (A&M-CC).
Many stations of the Texas Coastal Ocean Observation Network (TCOON) located in the coastal waters of the Gulf of Mexico provide data for such predictions [3]. TCOON consists of approximately 50 data gathering stations located along the Texas Gulf coast from the Louisiana border to the Mexican border. Data sampled at these stations include precise water levels, wind speed and direction, atmospheric and water temperatures, barometric pressure, and water currents. The measurements collected at these stations are often used in legal proceedings such as littoral boundary determinations; therefore data are collected according to National Ocean Service standards. Some TCOON stations also collect turbidity, salinity, and other water quality parameters. All data are transmitted back to A&M-CC at multiples of six minutes via line-of-sight packet radio, cellular phone, or GOES satellite, where they are then processed and stored in a real-time, web-enabled database. TCOON has been in operation since 1988.

2 Statistical Model

The general idea is to predict water levels for the next two hours by using a multi-regression model. Then, step by step, using these predicted levels as the given levels, we predict water levels for 4, 6, ..., 48 hours. We have considered three different models for two-hour predictions, and two of these produced quite reliable predictions.

The first of these models is a multi-regression model in which the two-hour prediction is based on the water levels and the wind speeds and directions for the previous 48 hours, with a step of 2 hours. This model did not produce the expected results: R-squared for such a prediction was less than 0.5.

The second approach is another multi-regression model in which two-hour predictions of water level are based on the water levels during the previous 48 hours, using 2-hour steps. We now believe that information about the weather (pressure, wind, temperature, etc.) is hidden in the previous water levels. This model worked remarkably well: R-squared for all stations was greater than 0.95. To make further predictions we used the previously determined water levels. Such a step-by-step approach produced quite good predictions. The table below presents statistical data for the differences between predicted and real water levels for 6, 12, 18, 24, 30, 36, 42, and 48 hours:

Table 1. Statistical characteristics of prediction errors (in meters)

              Mean      Median    Std. Deviation   Min. range   Max. range
Error 6hr     0.0124    0.0121    0.310            -0.858       0.796
Error 12hr    0.0129    0.0117    0.105            -0.421       0.442
Error 18hr    0.0155    0.0108    0.313            -0.951       0.866
Error 24hr    0.00924   0.0023    0.177            -0.580       0.622
Error 30hr    0.0176    0.0062    0.297            -0.748       0.803
Error 36hr    0.0140    0.0198    0.184            -0.653       0.641
Error 42hr    0.0156    -0.0034   0.293            -0.746       0.828
Error 48hr    0.0265    0.0289    0.193            -0.568       0.593

The third approach is based on linear multi-regression of the water levels and their first and second differences for the previous 48 hours, with a step of two hours. This approach produces the same quality of water level prediction as the second approach. These results are quite understandable, because in both cases we deal with linear combinations of previous water levels. The difference between the two models is the following: the third approach has between four (4) and eight (8) significant variables in the linear regression, while the second model uses all twenty-four (24) variables.

To fill gaps in water level data, we can use the following procedure. First, find backward and forward linear regressions for predicted water levels, and then evaluate the lost data as a linear combination of the forward and backward predictions with weights proportional to the distances from the edges of the gap.
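The gap-filling procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`fit_ar`, `iterate_ar`, `fill_gap`) are hypothetical, the regression is an ordinary least-squares fit on lagged water levels with an intercept, the backward model is obtained by fitting the same regression to the time-reversed record after the gap, and "weights proportional to the distances from the edges" is read as giving each prediction a weight that decreases linearly with the distance from the edge it started from.

```python
import numpy as np

def fit_ar(series, n_lags):
    """Least-squares fit of x[t] ~ c + sum_k a[k] * x[t-1-k]."""
    rows = [series[t - n_lags:t][::-1] for t in range(n_lags, len(series))]
    X = np.column_stack([np.ones(len(rows)), np.array(rows)])
    y = series[n_lags:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # coef[0] is the intercept, coef[1:] are the lag weights

def iterate_ar(coef, history, n_steps):
    """Step-by-step forecast: each prediction is fed back as an input."""
    n_lags = len(coef) - 1
    window = list(history[-n_lags:])
    out = []
    for _ in range(n_steps):
        lags = np.array(window[::-1])  # most recent value first
        nxt = coef[0] + coef[1:] @ lags
        out.append(nxt)
        window = window[1:] + [nxt]
    return np.array(out)

def fill_gap(before, after, gap_len, n_lags=24):
    """Blend forward and backward predictions across a gap of gap_len samples.

    n_lags=24 mirrors 48 hours of history at 2-hour steps (an assumption
    about how the sketch is used, not a requirement of the method).
    """
    fwd = iterate_ar(fit_ar(before, n_lags), before, gap_len)
    # Backward model: fit and predict on the time-reversed record after the gap,
    # then flip the result back into chronological order.
    rev = after[::-1]
    bwd = iterate_ar(fit_ar(rev, n_lags), rev, gap_len)[::-1]
    # Forward weight is proportional to the distance from the right edge,
    # so the forward prediction dominates near the left edge of the gap.
    d = np.arange(1, gap_len + 1)
    w_fwd = (gap_len + 1 - d) / (gap_len + 1)
    return w_fwd * fwd + (1.0 - w_fwd) * bwd

if __name__ == "__main__":
    # Synthetic record sampled every 2 hours (illustrative data only).
    t = np.arange(400)
    level = 0.3 * np.sin(2 * np.pi * t / 6.21) + 0.1 * np.sin(2 * np.pi * t / 12.0)
    filled = fill_gap(level[:200], level[212:], gap_len=12)
    print("max abs error:", np.max(np.abs(filled - level[200:212])))
```

On real records the two predictions will disagree inside the gap, and the linear blend keeps the filled segment continuous with the data on both sides.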
3 Factor Analysis

To determine why our regression models that do not include wind and atmospheric pressure data provide better predictions than the models that include such data, we performed a Factor Analysis. The analysis of the major components has shown that 5 factors explain 95% of the variance of the water levels. In deep waters the first three components are periodic, whereas in shallow waters the major component is not periodic and the other components are. Our conclusion is that the prime factor is weather. It is well known that the weather affects tides much more in shallow waters than in deep waters [1], [2].

Linear regression models for different locations have different coefficients for the same variables. This difference may be explained by the geography of the location where the data are collected.

4 ANN Modeling and Predictions

The application of Artificial Neural Networks (ANN) to a number of fields, including environmental modeling, started shortly after the development of the backpropagation algorithm by Rumelhart et al. [4]. During the past five to ten years ANNs have been successfully applied to a growing number of coastal and riverine problems, including the forecasting of physical or water quality parameters [5], [6], [7], [8], [9], the forecasting of flooding along rivers [10], [11], and the forecasting of water levels along the coasts of the Gulf of Mexico [2], [12].

Backpropagation neural networks use the repeated comparison between the output of an ANN and an associated set of target vectors to optimize the weights and biases of the neurons of the model. More specifically, the learning process consists of backpropagating a function of the error through the network.
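To make the training loop concrete, the sketch below trains a network with one small hidden layer and a single linear output neuron by repeatedly comparing its output with the targets and propagating the error gradient back through the layers. It is an illustration only: it uses plain gradient descent on the mean squared error rather than the Levenberg-Marquardt variant used for the models in this paper, and the function names, layer size, learning rate, and synthetic data are all assumptions.

```python
import numpy as np

def train_network(X, y, n_hidden=2, lr=0.05, n_epochs=2000, seed=0):
    """Train a one-hidden-layer network by gradient-descent backpropagation."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=n_hidden)
    b2 = 0.0
    losses = []
    for _ in range(n_epochs):
        # Forward pass: tanh hidden layer, linear output neuron.
        h = np.tanh(X @ W1 + b1)
        pred = h @ W2 + b2
        err = pred - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: gradient of the mean squared error, layer by layer.
        g_pred = 2.0 * err / len(y)
        g_W2 = h.T @ g_pred
        g_b2 = g_pred.sum()
        g_h = np.outer(g_pred, W2) * (1.0 - h ** 2)  # derivative of tanh
        g_W1 = X.T @ g_h
        g_b1 = g_h.sum(axis=0)
        # Gradient-descent update of weights and biases.
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return (W1, b1, W2, b2), losses

def predict(params, X):
    """Evaluate the trained network on new inputs."""
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

if __name__ == "__main__":
    # Synthetic inputs and targets (illustrative only, not TCOON data).
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))
    y = 0.5 * X[:, 0] - 0.2 * X[:, 1]
    params, losses = train_network(X, y)
    print("first and last training loss:", losses[0], losses[-1])
```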
The main advantages and key characteristics of ANNs for water level forecasting are their non-linear modeling capability, their generic modeling capacity, their robustness to noisy data, and their ability to deal with high dimensional data [13]. Forecasting water levels with an ANN consists of finding weights and biases by training the model using historical measurements. Our model's inputs consist of time series of previous water level and wind measurements as well as tidal data. All measurements and tidal forecasts for this work were extracted from the TCOON [3] database.

The typical structure of the neural networks used in this work is illustrated in Figure 1 and consists of one hidden layer with one to a few neurons and one output layer consisting of one neuron when each water level is predicted individually. The tidal forcing is included in the model by using water level differences between the measured and hindcast water levels and the water levels predicted by the tide tables published by NOAA. The water level differences are then a direct function of the meteorological forcing. Finally, the model predicts changes in water level differences rather than absolute water level differences. This methodology allows for a more direct relationship between short-term forcing and changes in water levels and also allows for the inclusion of long-term effects such as steric effects as part of the input to each short-term forecast. The models were tested with and without wind hindcasts.

All the ANNs discussed in this work were trained using the Levenberg-Marquardt backpropagation algorithm and implemented within version 4.0 of the Matlab Neural Network Toolbox and the MATLAB 6.0 Release 12 computational environment [14] running on a Pentium PC.

To test the performance of ANNs for the prediction of water levels at Bob Hall Pier, Texas, the model was trained and tested using three data sets composed of 3600 hourly measurements of water levels, wind speeds and wind directions.
The data sets covered the spring seasons of 1998, 2000, and 2001 from Julian day 21 to Julian day 182. This procedure provided a set of six time series of predicted water levels to be used for validation. For each time series the average absolute error between predicted and measured water levels was computed. Averages and standard deviations were then computed for the results of the six validation time series for these two parameters. The standard deviation gives an overall measure of the variability due to the differences between training sets as well as the differences resulting from the training process.

The inputs to the model were selected as the previous 12 hourly water level and wind measurements, based on experience gathered during the modeling for other locations [12]. One model was trained without wind predictions, while for the second case wind measurements were used to simulate wind forecasts. These wind forecasts consisted of future wind measurements at 3-hour intervals up to 36 hours. A database of wind forecasts is presently being constructed, and models based on wind forecasts are expected to be more representative of future model performance [15].

Figure 2 displays a comparison between a 36-hour water level hindcast, the tide tables, and TCOON measurements. As can be observed in the figure, the ANN model captures a large fraction of the water anomaly and improves significantly on the tide tables. The performance of the models with and without wind forecasts is compared with the performance of the tide tables in Figure 3 for forecasting times ranging from 6 to 36 hours. Both ANN models improve significantly on the tide tables for forecasting times up to 24 hours. Improvements for 30-hour and 36-hour predictions are still measurable. The addition of wind hindcasts improves the model performance, although not significantly compared to the improvement over the tide tables.
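The validation metric described above can be sketched in a few lines. This is an illustration, not the authors' code: the function names are hypothetical, the numbers in the example are made up, and the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which form was used.

```python
import statistics

def average_absolute_error(predicted, measured):
    """Mean of |predicted - measured| over one validation time series."""
    if len(predicted) != len(measured):
        raise ValueError("series must have the same length")
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(measured)

def summarize(errors):
    """Average and sample standard deviation of per-series errors."""
    return statistics.mean(errors), statistics.stdev(errors)

if __name__ == "__main__":
    # Made-up water levels in meters, for illustration only.
    measured = [0.40, 0.45, 0.52, 0.48]
    ann_forecast = [0.42, 0.44, 0.50, 0.49]
    tide_table = [0.30, 0.33, 0.41, 0.40]
    print("ANN error:", average_absolute_error(ann_forecast, measured))
    print("tide table error:", average_absolute_error(tide_table, measured))
```

Applying `average_absolute_error` to each validation series and passing the resulting list to `summarize` gives the kind of average and spread reported per forecasting time.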
Based on these results an ANN model will be implemented in the near future as a real-time water level forecasting tool integrated within the Texas Coastal Ocean Observation Network.

[Figure 1: network diagram. Inputs: Water Level Time Series, Tidal Data, East-West Wind Stress, North-South Wind Stress, East-West Wind Stress Forecasts, North-South Wind Stress Forecasts. Output: Forecasted water level variation.]
Fig. 1. Schematic of the type of neural network applied to the problem of water level forecasting including outputs, inputs, and neural network topology.

Fig. 2. Comparison of a 36-hour ANN hindcast, tide table readings and TCOON measurements of water levels for the Bob Hall Pier Station located near Corpus Christi, Texas during the spring of 2001.

[Figure 3: Absolute Average Error [m] versus Forecasting Time [hrs] for the Tide Tables, the ANN without Wind Hindcasts, and the ANN with Wind Hindcasts.]
Fig. 3. Comparison of the performance of the ANN model and the tide tables for the forecasting of water levels at the Bob Hall Pier Station.

5 Conclusions

Comparing these two approaches, we have found that neural networks are more flexible and give better predictions of the water levels. On the other hand, statistical methods are simpler to implement and can be applied with only the knowledge of the water levels. We are presently pursuing both approaches with the goal of combining them and improving the overall quality of the forecasts.

6 Acknowledgements

The work presented in this paper is funded in part by the following federal and state agencies of the USA:
- National Aeronautics and Space Administration (NASA Grant #NCC5-517)
- Texas General Land Office
- National Oceanic and Atmospheric Administration (NOAA)
- Coastal Management Program (CMP).
The views expressed herein are those of the authors and do not necessarily reflect the views of NASA, TGLO, NOAA, CMP or any of their sub-agencies.
References:

[1] Thomson Bosley, K., and Hess, K.W. 2001. Comparison of Statistical and Model-Based Hindcasts of Subtidal Water Levels in Chesapeake Bay. Journal of Geophysical Research, 106 (C8), 16,869-16,885.
[2] Cox, D., Tissot, P., and Michaud, P. 2002. Water Level Observations and Short Term Predictions Including Meteorological Events for the Entrance of Galveston Bay, Texas. Journal of Waterways, Port, Coastal and Ocean Engineering, 128-1, 21-29.
[3] Michaud, P., Jeffress, G., Dannelly, R., and Steidley, C. 2001. Real Time Data Collection and the Texas Coastal Ocean Observation Network. Proc. International Measurement and Control (InterMAC), Tokyo, Japan, in press.
[4] Rumelhart, D. E., Hinton, G. E., and Williams, R.J. 1986. Learning Representations by Back-Propagating Errors. Nature, 323, 533-534.
[5] Mase, H., and Kitano, T. 1999. Prediction Model for Occurrence of Impact Wave Force. Ocean Engineering, 26 (10), 949-961.
[6] Mase, H., Sakamoto, M., and Sakai, T. 1995. Neural Network for Stability Analysis of Rubble-Mound Breakwaters. Journal of Waterway, Port, Coastal, and Ocean Engineering, 121 (6), ASCE, 294-299.
[7] Moatar, F., Fessant, F., and Poirel, A. 1999. pH Modelling by Neural Networks. Application of Control and Validation Data Series in the Middle Loire River. Ecological Modeling, 120, 141-156.
[8] Recknagel, F., French, M., Harkonen, P., and Yabunaka, K-I. 1997. Artificial Neural Network Approach for Modeling and Prediction of Algal Blooms. Ecological Modeling, 96, 11-28.
[9] Tsai, C-P., and Lee, T-L. 1999. Back-Propagation Neural Network in Tidal-Level Forecasting. Journal of Waterway, Port, Coastal, and Ocean Engineering, 125 (4), ASCE, 195-202.
[10] Campolo, M., Andreussi, P., and Soldati, A. 1997. River Flood Forecasting with a Neural Network Model. Water Resources Research, 35 (4), 1191-1197.
[11] Kim, G., and Barros, A. 2001. Quantitative Flood Forecasting Using Multisensor Data and Neural Networks. Journal of Hydrology, 246, 45-62.
[12] Tissot, P.E., Cox, D.T., and Michaud, P. 2002. Neural Network Forecasting of Storm Surges along the Gulf of Mexico. Proceedings of the Fourth International Symposium on Ocean Wave Measurement and Analysis (Waves '01), ASCE, 1535-1544.
[13] Rumelhart, D. E., Durbin, R., Golden, R., and Chauvin, Y. 1995. Backpropagation: The Basic Theory. In Backpropagation: Theory, Architectures, and Applications, Rumelhart, D. E., and Chauvin, Y., eds., Lawrence Erlbaum Associates, Publishers, Hillsdale, 1-34.
[14] The MathWorks, Inc. 1998. Neural Network Toolbox for use with Matlab 5.3/version 3. The MathWorks, Natick, MA.
[15] Stearns, J., Tissot, P.E., Michaud, P., Collins, W.G., and Patrick, A.R. 2002. Comparison of MesoEta Wind Forecasts with TCOON Measurements along the Coast of Texas. Proceedings of the 19th AMS Conference on Weather Analysis and Forecasting/15th AMS Conference on Numerical Weather Prediction, 12-16 August 2002, San Antonio, Texas, accepted.