Forecasting Temperature and Solar Radiation: An Application of Neural Networks and Neuro-Fuzzy Techniques J. Zisos (1) , A. Dounis (2) , G. Nikolaou (2) ,G. Stavrakakis (1) , D.I. Tseles(2) (1) Technical University of Crete, Chania, Greece (2) T.E.I. of Piraeus, Athens, Greece _____________________________________________________________________ Abstract Neural and neuro-fuzzy systems are used, in order to forecast temperature and solar radiation. The main advantage of these systems is that they don’t require any prior knowledge of the characteristics of the input time-series in order to predict their future values. These systems with different architectures have been trained using as inputdata measurements of the above meteorological parameters obtained from the National Observatory of Athens. After having simulated many different structures of neural networks and trained using measurements as training data, the best structures are selected in order to evaluate their performance in relation with the performance of a neuro-fuzzy system. As the alternative system, ANFIS neurofuzzy system is considered, because it combines fuzzy logic and neural network techniques that are used in order to gain more efficiency. ANFIS is also trained with the same data. The comparison and the evaluation of both of the systems are done according to their predictions, using several error metrics. Keywords: Forecast, Neural Network, Neuro-Fuzzy system, ANFIS, error metrics 1. Introduction There have been many different scientific efforts in order to realize better results in the domain of forecasting meteorological parameters. Temperature and solar radiation forecasting constitutes a very crucial issue for different scientific areas as well as for many different aspects of everyday life. In the present paper, many different structures of multilayer feedforward neural networks have been developed and simulated, using as training and test data twenty year measurements, obtained from the National Observatory of Athens. The aim is, after a large number of simulations, using different parameters, to result at the best possible structure of neural network forecaster of daily temperature and solar radiation. Additionally, in order to observe the results of the combination of the linguistic terms of fuzzy logic with the training algorithm of neural networks, in the domain of forecasting, the neuro-fuzzy system ANFIS has been used. It has been trained and tested with the same data as the neural networks, and its forecasting results have been compared with these of the best neural network structures. It has to be mentioned that the developed structures have been used for short-term prediction. The paper is structured as following: initially, there are presented the main aspects of timeseries analysis and more specifically timeseries prediction as well as its advantages. Next neural networks are described briefly. Besides there are reported several structural training issues of neural network predictors. Afterwards neuro-fuzzy systems and more specifically ANFIS are described briefly. Lastly the are presented and compared the main results-predictions of temperature and solar radiation, after simulating the above systems , using several error metrics. Theoritical Part 2. Timeseries analysis - prediction Timeseries is a stochastic process where the time index takes on a finite or countably infinite set of values. These values usually constitute measurements of a physical system obtained on specific time delays which might be hours,days,months or years. Timeseries analysis includes three basic problems: Prediction Modeling Characterization. The one that concerns this paper is the prediction problem. Timeseries prediction is defined as a method depicting past timeseries values to future ones. The method is based on the hypothesis that the change of the values follows a specific model which can be called latent model. The timeseries model can be considered as a black box which makes no effort to retrieve the coefficients which affect its behaviour. Figure 1. Timeseries model block diagram Input of the model are the past values X i until the time instance x=t and output Y is the prediction of the model at the time instance x=t+p. The system could be a neural network, a neurofuzzy system e.t.c The main advantages of the timeseries prediction model are: In a lot of cases we need to identify what happens, not why it happens. The cost of the model is least in comparison with other categories of models like the explanatory model. Timeseries normalization It constitutes one of the most frequently used preprocessing methods. Normalizing data results in smoothing timeseries, as the values are limited in a specific range. It is also very suitable for the neural and neuro-fuzzy systems which use activation functions. 3. Neural networks A neural network is a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use. Neural networks are characterized by several parameters. These are, the activation functions used in every layer nodes, network different architectures, the learning processes that are used. The most commonly used activation functions are linear, sigmoid and hyperbolic tangent activation functions. Network architectures are depended to the learning algorithms used. Network architectures can be separated in three catecories: Single-layer feedforward Networks, Multilayer Feedforward Networks and Recurrent Networks. Learning processes are separated in two main categories, supervised and unsupervised learning. Their main difference is that supervised learning algorithms use input-output patterns in order to be trained in contrast to unsupervised learning algorithms whose response is based on the networks ability to self-organize. Multilayer networks have been applied successfully to solve diverse problems by training them in supervised manner with a highly popular algorithm known as error back-propagation algorithm. Two kinds of signals are identified in this network: function signals and error signals. A function signal propagates forward through the network, and emerges at the output end of the network as an output signal. An error signal originates at an output neuron of the network, and propagates backward through the network. It is called an error signal because its computation by every neuron of the network involves an error-dependent function. Target of the process is to train network weights with input-output patterns, by reducing the error signal of the output node, for every training epoch. The error signal at the output of neuron j at iteration n is defined by: e j (n)=t j (n)-y j (n). The value of the error energy over all neurons in the output layer is defined by: E(n)= 1 e 2j ( n ) 2 jC The average squared error energy or cost function is defined by: Ε av = 1 N E(n ) N n1 Training process object is to adapt the parameters in order to minimize the cost function. A very efficient minimization method is Levenberg-Marquardt. It is very quick and has quite small computational complexity. 4. ANFIS (Adaptive Neuro-Fuzzy Inference System) Neuro-fuzzy systems constitute an intelligent systems hybrid technique that combines fuzzy logic with neural networks in order to have better results. ANFIS can be described as a fuzzy system equipped with a training algorithm. It is quite quick and has very good training results that can be compared to the best neural networks. Experimental Part 5. Solar Radiation and Temperature Timeseries It has to be mentioned that the meteorological data used, come from the Greek National Observatory, of Athens. Solar Radiation The data obtained were hourly measurements for the period 1981-2000 and were measured in Watt 2 . After observing the data it’s obvious that apart from the logical m values, there are some data that are equal to -99.9. They constitute measurement errors and have to be replaced. The object is to create the mean daily solar radiation timeseries without error measurements in order to use it to the prediction systems. The process used in order to create the timeseries described is the following: 1. Replacement -99.9 values with zeros 2. Computation of the average of non-zero hourly values for every 24 hours for all the years 3. The mean daily values of the timeseries that are equal to zero are replaced from the average of the previous and next values if they are non-zeros or else from first previous non-zero value. Temperature The temperature data obtained, are for the period 1981-2003 and are measured in C o . They also contain error measurements equal to -99.9 that have to be eliminated. The object is to create the mean, maximum and minimum daily temperature timeseries without the error measurements. The process used is the same as before differing in the fact that in the temperature timeseries we are concerned to zero and negative values in contrast with the solar timeseries that we care for the positive nonzero values of solar radiation. In the following curves appear the daily timeseries created: solar radiation temperature 700 40 solar radiation daily measurements temperature mean daily measurements 35 600 30 500 25 400 20 300 15 200 10 5 100 0 0 0 1000 2000 3000 4000 5000 6000 7000 8000 -5 0 1000 Mean daily solar radiation 2000 3000 4000 5000 6000 7000 8000 9000 Mean daily temperature temperature temperature 35 45 temperature minimum daily measurements temperature maximum daily measurements 40 30 35 25 30 20 25 15 20 10 15 5 10 5 0 0 -5 -5 0 1000 2000 3000 4000 5000 6000 7000 Max daily temperature 8000 0 1000 2000 3000 4000 5000 6000 7000 9000 Min daily temperature 8000 9000 Normalization In everyone of the above timeseries there are used the next two normalizing transformations: X mean 1. Yk k , normalization with mean=0 , standard deviation=1 std 0 .8 0.1 * max 0.9 * min * Xk + 2. Yk , normalization with maximum max min max min value equal to 0.9 and minimum equal to 0.1. 6. Neural Network Predictors The main object is to create many different structures of neural network predictors, train and test them with the meteorological data available, in order to conclude to the best and more efficient topology for forecasting solar radiation and temperature. Initially it is appropriate to split the data to training and test data. There is not any fixed theoretical model clarifying what percentage of the whole data should be the training or the test data. Usually test data constitute a 20 to 30 percent of the overall data. So, concerning the solar radiation data, they were split in four parts its one consisting of 5 years and the temperature data were split in three parts of 8 years of data each. The question is which part of the data could be used as test data in order to constitute a characteristic sample of the measurements. That’s because there may be a part of data that during the training process can cause local minima, which essentially is a destruction of the predictions. In order to solve this problem, multifold cross-validation has been utilized. After implementing this method it was obvious that any part of data can be used as test data as the test error was almost the same for every part. So, it was decided that for the solar radiation the training data are the data of the years 1981 to 1995 and the test data are the data of the years 1996 to 2000, and for the temperature the training data are the data of the years 1981 to 1995 and the test data are the 1996 to 2003 data. Next step is to decide the main structure of the neural network predictors. The main question to this is the number of hidden layers used firstly, and the number of nodes secondly, in order to avoid huge complexity and the overfitting problem. For the meteorological timeseries used in this project, one hidden layer is appropriate and enough. That’s because the first hidden layer is used in order to find the local characteristics of the variable examined and that’s what the project is occupied with. So there have been created and simulated the following cases of structures of neural networks: Inputs: 2,3,5,7 previous daily measurements for the 12 different timeseries created (real and normalized data) Number of hidden layers: 1(local characteristics) Number of nodes of hidden layer: 2,3,5,10,15 (trial and error) Output: 1 (One day prediction) Activation Functions used in neurons: Hidden layer: sigmoid, linear Output layer: linear What is following is the training method. Training methods object is to minimize the mean square error between the predictions and the real values. In order to find the most appropriate training algorithm, there was created a small neural network 3-10-1,with sigmoid activation function in the nodes of the hidden layer and it was simulated for normalized data in the region 0.1-0.9 for 13 different algorithms that appear in the neural toolbox of Matlab. Five error metrics were used in order to choose the most efficient algorithm: MSE, RMSE, AME, NDEI, ρ. The algorithms used are the following: 1. Quasi Newton Backpropagation(trainbfg) 2. Bayesian regularization backpropagation(trainbr) 3. Conjugate gradient backpropagation with Powell-Beale restarts(traincgb) 4. Conjugate gradient backpropagation with Fletcher-Reeves updates(traincgf) 5. Conjugate gradient backpropagation with Polak-Ribiere updates(traincgp) 6. Gradient descent backpropagation(traingd) 7. Gradient descent with adaptive learning rate backpropagation(traingda) 8. Gradient descent with momentum backpropagation(traingdm) 9. Gradient descent with momentum and adaptive learning rate backpropagation 10. Levenberg-Marquardt backpropagation(trainlm) 11. One step secant backpropagation(trainoss) 12. Resilient backpropagation(trainrp) 13. Scaled conjugate gradient backpropagation(trainscg) (traingdx) 0.14 0.16 0.025 X= 6 Y= 0.024803 X= 6 Y= 0.15749 X= 8 Y= 0.021149 X= 8 Y= 0.14543 0.14 0.12 X= 6 Y= 0.12143 0.02 X= 8 Y= 0.11173 0.12 0.1 X= 7 Y= 0.11516 0.1 X= 1 Y= 0.10636 X= 2 Y= 0.10589 X= 4 Y= 0.107 X= 3 Y= 0.10625 X= 9 Y= 0.11261 X= 5 Y= 0.10694 X= 10 Y= 0.10581 X= 11 Y= 0.10702 X= 12 Y= 0.10779 X= 13 Y= 0.10651 0.08 X= 9 Y= 0.01268 X= 12 X= 11 X= 10 Y= 0.011454 Y= 0.011618 Y= 0.011196 X= 5 Y= 0.011436 X= 9 Y= 0.083727 X= 11 X= 13 X= 12 X= 10 Y= 0.077858 Y= 0.078352 Y= 0.078275 Y= 0.078438 0.06 X= 13 Y= 0.011345 X= 2 Y= 0.011213 0.01 0.08 X= 7 Y= 0.083261 X= 5 X= 4 X= 1 X= 2 X= 3 Y= 0.077941 Y= 0.077805 Y= 0.077763 Y= 0.078125 Y= 0.078672 A.M.E. M.S.E. X= 7 Y= 0.013262 X= 4 X= 3 Y= 0.011289 Y= 0.01145 X= 1 Y= 0.011312 R.M.S.E. 0.015 0.06 0.04 0.04 0.02 0.005 0.02 0 0 0 1 2 3 4 5 6 7 8 Training Algorithms 9 10 11 12 1 2 3 4 5 6 13 Mean Square Error 7 8 Training Algorithms 9 10 11 12 1 2 13 Root Mean Square Error 3 4 5 6 7 8 Training Algorithms 9 10 0.8 0.3 X= 6 Y= 0.28548 X= 1 Y= 0.80972 X= 2 Y= 0.81151 X= 3 Y= 0.81023 X= 4 Y= 0.80722 X= 5 Y= 0.80731 X= 9 Y= 0.78415 X= 7 Y= 0.78334 X= 10 Y= 0.81192 X= 11 X= 12X= 13 Y= 0.80693 Y= Y=0.80899 0.80388 X= 13 Y= 0.80899 0.7 X= 6 Y= 0.63006 X= 8 Y= 0.26362 0.25 X= 8 Y= 0.68168 0.6 X= 7 Y= 0.20876 N.D.E.I. X= 3 Y= 0.19261 X= 4 Y= 0.19397 X= 9 Y= 0.20413 X= 5 Y= 0.19385 X= 2 Y= 0.19195 X= 11 Y= 0.19401 X= 12 Y= 0.19539 X= 13 Y= 0.19308 0.5 X= 10 Y= 0.19181 p X= 1 Y= 0.1928 0.2 0.4 0.15 0.3 0.1 0.2 0.05 0.1 0 0 1 2 3 4 5 6 7 Training Algorithms 8 9 10 11 12 13 Normalized Root Mean Square Error Index 1 2 3 4 5 6 7 8 Training Algorithms 9 10 11 12 13 Correlation Coefficient From the above curves it’s obvious that the most efficient algorithm is the LevenbergMarquardt Backpropagation algorithm. Even in comparison with algorithms whose metrics have close values with Levenbergs, Levenberg is better as it’s converging more quickly. Error Metrics MSE ( x(k ) xˆ (k )) n 1 n 2 RMSE k 1 ( x(k ) xˆ(k )) n 1 n AME ( x(k ) xˆ(k )) k 1 x (k ) n 2 k 1 x(k ) xˆ(k )) n 1 n k 1 k 1 ( x(k ) x ) ( xˆ(k ) xˆ ) n RMSE NDEI 2 n 2 k 1 ( x(k ) x ) ( xˆ(k ) xˆ ) n n 2 k 1 12 13 Absolute Mean Error 0.9 0.35 11 2 k 1 Where x(k) is the real value in time instance k, x (k) the prediction of the model, n the number of test data used for the prediction. It has to be mentioned that the most characteristic error criterion showing the quality of prediction was proved be the correlation coefficient criterion (ρ). As the prediction improves, ρ is getting close to 1. Neural Networks training and prediction results Test Measurement-Prediction with Neural Network Mse=11.1818 Rmse=3.3439 Ame=2.882 p=0.97689 Ndei=0.16569 Test Measurement-Prediction with Neural Network Mse=7394.6479 Rmse=85.9921 Ame=64.4325 p=0.81751 Ndei=0.22886 Measurements Predictions 550 500 450 400 350 Temperature Daily Measurements - Predictions Mean Solar Radiation Daily Measurements - Predictions After having trained and tested all the different cases and structures of neural networks with the different in normalization and type meteorological time-series, they have been compared according to the above error criteria in order to result in the most suitable neural predictors structure for every different time-series. In the beginning there were chosen the best four neural networks for every different type of normalization Next there were chosen the best four neural network predictors for every different type of meteorological time-series in order to use them to more complex systems like neural network committee machines. Finally there was made choice of the best neural network predictor for every different time-series in order to be able to compare its results with ANFIS or any other system created. The best neural network predictors for every different time-series are introduced: 1. Mean daily Solar radiation: 5-15-1, normalized in the range 0.1-0.9, using sigmoid activation function 2. Mean daily Temperature: 5-10-1, normalized in the range 0.1-0.9, using sigmoid activation function 3. Maximum daily Temperature: 7-5-1, normalized in the range 0.1-0.9, using sigmoid activation function 4. Minimum daily Temperature: 2-15-1, normalized in the range 0.1-0.9, using sigmoid activation function 18 14 12 10 8 6 4 1070 750 800 850 900 Number of samples 950 Mean daily solar radiation predictions vs measurements with 5-15-1 N.N. for a small sample of data Measurements Predictions 18 16 14 12 10 8 1090 1100 Number of samples 1110 1120 1130 Test Measurement-Prediction with Neural Network Mse=27.7587 Rmse=5.2687 Ame=4.6675 p=0.94254 Ndei=0.31335 Min Temperature Daily Measurements - Predictions 20 1080 Mean daily temperature predictions vs measurements with 5-10-1 N.N. for a small sample of data Test Measurement-Prediction with Neural Network Mse=6.4888 Rmse=2.5473 Ame=1.9196 p=0.96458 Ndei=0.10388 Max Temperature Daily Measurements - Predictions Measurements Predictions 16 15 Measurements Predictions 10 5 0 1050 1055 1060 1065 1070 1075 1080 1085 1090 1095 1100 Number of samples Maximum daily temperature predictions vs measurements with 7-5-1 N.N. for small a sample of data 1050 1055 1060 1065 1070 1075 1080 1085 1090 1095 Number of samples Minimum daily temperature predictions vs measurements with 2-15-1 N.N. for a small sample of data 7. ANFIS 600 Measurements Predictions 550 500 450 400 350 300 250 200 800 820 840 860 880 900 920 Number of samples 940 960 980 Max Temperature Daily Measurements - Anfis Predictions Mean daily solar radiation predictions vs measurements with ANFIS for a small sample of data Test Measurement-Prediction with Anfis Mse=6.4536 Rmse=2.5404 Ame=1.9207 p=0.96413 Ndei=0.10359 20 Measurements Predictions 18 16 14 12 10 8 6 1060 1065 1070 1075 1080 1085 Number of samples 1090 1095 1100 Test Measurement-Prediction with ANFIS Mse=9.183 Rmse=3.0304 Ame=2.6213 p=0.97699 Ndei=0.15011 Measurements Predictions 14 12 10 8 6 4 1080 1085 1090 1095 1100 1105 1110 1115 1120 1125 Number of samples Mean daily temperature predictions vs measurements with ANFIS for a small sample of data Min Daily Temperature Measurements - ANFIS Predictions Solar Daily Measurements - ANFIS Predictions Test Measurement-Prediction with ANFIS Mse=7543.572 Rmse=86.8537 Ame=64.3385 p=0.81305 Ndei=0.22346 Mean Daily Temperature Measurements - ANFIS Predictions For the training and testing of data there was created a first Sugeno type system consisting of 7 inputs. The combination of fuzzy logic with neural networks proved to have very good results in the daily solar radiation and temperature forecasting. Test Measurement-Prediction with ANFIS Mse=25.5036 Rmse=5.0501 Ame=4.4378 p=0.93309 Ndei=0.30015 18 Measurements Predictions 16 14 12 10 8 6 4 2 0 -2 1040 Maximum daily temperature predictions vs measurements with ANFIS for small a sample of data 1050 1060 1070 Number of samples 1080 1090 1100 Minimum daily temperature predictions vs measurements with ANFIS for a small sample of data 8. ‘Best’ Neural Network predictors vs ANFIS Below there is presented a comparison between the ‘best’ N.N. predictors and ANFIS, for the four different timeseries, using as criterion the metric that proved to be the most accurate, which is the correlation coefficient (ρ). Mean Daily Solar radiation Ν.N. 5-15-1 ρ=0.81751 ΑΝFIS ρ=0.81305 Mean Daily Temperature Ν.N. 5-10-1 ρ=0.97689 ΑΝFIS ρ=0.97699 Maximum Daily Temperature Ν.N. 7-5-1 ρ=0.96458 ΑΝFIS ρ=0.96413 Minimum Daily Temperature Ν.N. 2-15-1 ρ=0.94254 ΑΝFIS ρ=0.93309 9.Conclussions Concerning neural networks that were created and simulated there have been excluded the below conclusions: The most efficient training algorithm proved to be the Levenberg-Marquardt Backpropagation, as it surpassed the others not only in quality of results but also in speed. The best type of normalization proved to be the one in region 0.1-0.9 in combination with sigmoid activation function in the hidden layer nodes. Most suitable choice of inputs is 5 or 7 previous days data. The number of nodes used in the hidden layer depends from the type and the complexity of the timeseries, and from the number of inputs. Concerning neuro-fuzzy predictors and more specifically ANFIS it is proved that the combination of linguistic rules of fuzzy logic with the training algorithm used in neural networks, contribute in very qualitative prediction results, which approach the ‘best’ neural predictors results.