LSTM-based Models for Earthquake Prediction

Asmae BERHICH, Fatima-Zahra BELOUADHA, Mohammed Issam KABBAJ
AMIPS research team, E3S research center
Mohammadia School of Engineers, Mohammed V University in Rabat, Morocco
berhich.asmae@gmail.com, belouadha@emi.ac.ma, kabbaj@emi.ac.ma

ABSTRACT
Over the last few years, many studies have addressed earthquake prediction using different techniques and precursors, in order to warn of earthquake damage and save human lives. Many of them have failed to predict earthquakes reliably because of the complexity and unpredictable nature of the task. In this work, we therefore use long short-term memory (LSTM), a deep learning technique that captures complex relationships in time series data. We apply it in two case studies: the first trains a single model on the whole dataset, and the second learns the correlations in two separate groups divided by magnitude range. The results show that learning from decomposed datasets gives better predictions, since it exploits the nature of each type of seismic event.

CCS Concepts
• Computing methodologies → Machine learning → Machine learning approaches → Neural networks.

Keywords
Prediction; earthquakes; LSTM; time series data; deep learning.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
NISS2020, March 31-April 2, 2020, Marrakech, Morocco
© 2020 Association for Computing Machinery. ACM ISBN 978-1-4503-7634-1/20/03...$15.00
https://doi.org/10.1145/3386723.3387865

1. INTRODUCTION
Earthquakes can suddenly strike any region in the world, and the damage they cause depends on their magnitude. Large earthquakes can be fatal and cause serious economic and material losses. Medium earthquakes are also dangerous, especially for countries that do not take the necessary precautions. Earthquake warning was, and still is, a challenging problem that needs more in-depth research and suitable solutions.

Work on earthquake prediction has been carried out for many years. Although it was long considered an impossible task, advances in machine learning and deep learning have made it more approachable. Since 1994, seismologists and scientists have been trying to solve earthquake prediction using neural networks and deep learning [19]. However, many works do not cover the full meaning of earthquake prediction. Seismologists define an earthquake prediction as providing [1]: a specific time range, a specific location, a specific magnitude range, and the probability associated with the prediction.

ANN, RNN, and DNN models have been applied to earthquake prediction in many works, using different techniques, models, and datasets from different areas. However, many of them could not make reliable predictions, especially for large earthquakes: they could not capture the correlations in the datasets because such events are few in the studied areas.

In this paper, we present our work on earthquake prediction using datasets from Morocco, which is not immune to the risk of earthquakes and their disastrous consequences. Earthquakes cause significant human and material damage, with costs running into billions of dirhams. Al Hoceima, which experienced a tragedy in 2004, is one example. The dataset of Moroccan seismicity from 1900 to 2019 was provided by the National Geophysical Institute of the National Centre for Scientific and Technical Research (CNRST).

This work is based on long short-term memory (LSTM), a deep learning technique widely used to classify, process, and predict time series data. LSTM is a variant of recurrent neural networks, which are known for their ability to model sequence data and to keep past data in memory. With this algorithm, we model our seismic datasets in two case studies. The first predicts the magnitude, location, and year of incoming earthquakes using the whole dataset. The second decomposes the dataset by magnitude range into two groups for prediction: the first group contains large earthquakes, and the second contains medium and small earthquakes. The models of both cases are evaluated and compared in the following sections.

Section 2 classifies previous works on earthquake prediction. Section 3 explains the LSTM model architecture. Section 4 describes our datasets and introduces our proposed methodology and model. Section 5 evaluates and discusses the performance of our models. Finally, the last section briefly concludes the paper.

2. RELATED WORKS
In this section, we present the previous works classified into four categories:

Works based on precursor signals. This technique uses natural effects and their abnormal behavior. For example, Fan et al. [6] present an earthquake prediction approach that extracts the texture and emergence frequency of clouds to estimate the possible location of earthquakes. Florido et al. [7] analyze large seismic events in Chilean zones to detect precursory patterns for large earthquakes using clustering algorithms. Hayakawa [9] proposes using electromagnetic phenomena as a short-term precursor and presents the history of, and the reasons for, using these precursors. Hayakawa and Yamauchi et al. [10] monitor the milk yield of cows at Kagawa and find an abnormal depletion about 10 days before an earthquake; they discuss and compare this behavior with electromagnetic precursors.

Works based on statistical and mathematical approaches. Kannan [14] identifies patterns within the random occurrences of multiple seismic zones, such as California, central USA, northeast USA, Hawaii, Turkey, and Japan, using the Poisson distribution and a spatial connection model. Pasari [25] employs probability distributions such as Weibull and gamma models to estimate the cumulative probability of earthquakes of magnitude 6.0 or higher in the northern Himalaya. Sitharam et al. [26] use three models (lognormal, Weibull, and gamma) to estimate the probability of occurrences in six seismic regions with different tectonic features; using the logarithm of the likelihood function, they find that a higher value of ln L indicates a better-performing model. Boucouvalas et al. [5] produce a modified version of the FDL mathematical technique, based on Fibonacci, Dual, and Lucas numbers, to predict the location of an earthquake's epicenter.

Works based on machine learning and artificial neural networks (ANN). Su and Zhu [29] build an ANN model to study the correlations between the maximum earthquake affecting coefficient and influencing factors such as basement rock and site condition. Ni and Zhou [22] apply a model for damage detection in which a data reduction technique based on principal component analysis (PCA) is applied to the measured frequency response functions (FRFs), and the reduced data, instead of the raw FRF data, serve as the input variables of the ANN. Asim et al. [2] develop a model based on machine learning classifiers that uses eight features calculated from the geophysical parameters of the Gutenberg-Richter inverse law. Four classifiers are applied and compared (neural networks, recurrent neural networks, random forest, and a linear programming boost ensemble) to predict earthquake magnitude for the Hindukush region. Moreover, Asim et al. [3] compute seismic indicators to capture as much information as possible about seismic activity in different regions, then apply genetic programming and AdaBoost (GP-AdaBoost) as an ensemble method to predict earthquakes of magnitude 5.0 and above. In [20], the same authors calculate sixty seismic features using geophysical and seismological concepts, then employ a support vector machine (SVM) regressor combined with a hybrid neural network (merging three different ANNs) to predict earthquakes in three different regions. Furthermore, some ANN approaches are applied to precursor signals to predict earthquakes. For instance, Moustra et al. [20] predict the magnitude of impending seismic events using artificial neural networks with seismic electric signals as input features, because these signals are believed to occur before an earthquake and are considered earthquake precursors. Itai et al. [13] introduce a multilayer neural network that uses compressed data to detect precursor signals in electromagnetic waves.
These waves radiate from the earth's crust and are useful for earthquake prediction. Külahci et al. [16] build a three-layer feedforward network trained with the Levenberg-Marquardt learning algorithm, using eight different parameters including radon gas changes, for earthquake prediction. Ozerdem et al. [23] develop a neural network model to extract correlations in spatio-temporal electric field data in order to detect anomalous signal patterns that are precursors of hazards.

Works based on deep learning. Li and Liu [17] suggest combining backpropagation neural networks with an improved variant of particle swarm optimization (PSO) for magnitude prediction; to improve PSO, they use a non-linear decreasing inertia weight strategy. Mahmoudi et al. [18] develop 128 different MLP networks to find the best architecture for an earthquake magnitude prediction model. Narayanakumar and Raja [21] use seismic indicators and historical data to evaluate the performance of back-propagation (BP) neural networks in predicting earthquakes in the Himalayan region; the proposed model is a three-layer feed-forward BP ANN. Finally, Panakkat and Adeli [24] propose an RNN model for predicting the location and time of moderate to large earthquakes using seismicity indicators, considering two case studies: location decomposition and time decomposition.

We notice that most of the abovementioned neural network models use various kinds of features as input to predict the time and magnitude of earthquakes, but none of them considers the decomposition of magnitude ranges, and the correlations are not well studied either.

3. MODEL ARCHITECTURE
3.1. Recurrent neural networks architecture
Recurrent neural networks (RNNs) are a class of artificial neural networks (ANNs) characterized by their memory state. An RNN is similar to an ANN in that the model calculates the output by multiplying the inputs by weights and applying an activation function, which adds non-linearity to the network (see Figure 1).
In contrast to an ANN, an RNN takes the memorized output of the previous time step t-1 and adds it to the inputs of the current time step t; this is the role of its memory state (equations 1 and 2).

$$h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t) \tag{1}$$
$$y_t = W_{hy} h_t \tag{2}$$

where $W_{hh}$ is the weight of the previous hidden state $h_{t-1}$, $x_t$ is the current input, $W_{xh}$ is the weight of the current input, and tanh is the activation function. $y_t$ is the output and $W_{hy}$ is the weight at the output.

Memory is the key to RNNs: it allows them to learn the correlations in sequence data by examining the whole context and the elements of each time step to make predictions. RNNs are applicable to time series data because the values in a series depend on each other and show behavior and trends that change over time. However, RNNs sometimes become untrainable because they suffer from the vanishing and exploding gradient problem. When information passes through long sequences of time steps and deep layers, the gradients either shrink towards zero or grow without bound, so training no longer converges and the model stops learning.

3.2. Long short-term memory architecture
LSTM is an RNN that replaces the standard neural network layer with LSTM cells; its origins go back to the work of Hochreiter [11] in 1991. The LSTM cells are enhanced by three components called gates: the input gate, the forget gate, and the output gate. The architecture is illustrated in Figure 2. LSTM processes the features differently from a plain RNN: it starts by applying the tanh activation function to squash the input data into the range between -1 and 1 in a non-linear manner. The features are then passed to the input gate, which selects the relevant information from the squashed input by multiplying it by the output of a sigmoid function. This function filters out the elements that are not required: its values lie between 0 (remove from the network) and 1 (pass through the network). Another important element, the internal state, is the memory of the current state.
It takes into account the previous state $s_{t-1}$ and adds it to the input data (as in an RNN). It uses an addition operation instead of a multiplication to avoid the vanishing gradient problem. The recurrence of states is controlled by a forget gate, which uses a sigmoid function to decide which state elements should be memorized or forgotten. Finally, the tanh function squashes the outputs. These outputs are controlled by an output gate that specifies which values are allowed to be the outputs of the current cell state. These operations are described in the following equations:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \tag{3}$$
$$c_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c) \tag{4}$$
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \tag{5}$$
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \tag{6}$$
$$h_t = o_t * \tanh(c_t) \tag{7}$$

where $i_t$, $c_t$, $f_t$, $o_t$, and $h_t$ are the input gate, cell state, forget gate, output gate, and hidden state, respectively. $W_i$, $W_c$, $W_f$, and $W_o$ are their respective weight matrices, and $b_i$, $b_c$, $b_f$, and $b_o$ are the biases. $x_t$ is the input, $h_{t-1}$ is the previous hidden state, $[h_{t-1}, x_t]$ is their concatenation, and $\sigma$ is the sigmoid function.

4. LSTM MODEL FOR EARTHQUAKE PREDICTION
In this section, we present the dataset studied in our work and the preprocessing techniques used, then explain our methodology and describe the architecture of our LSTM models.

4.1. Dataset
In this work, we use the dataset of seismic activity in the regions of Morocco recorded from 1900 to 2019. The data were gathered from the National Geophysical Institute of CNRST.

4.1.1. Data preprocessing
After cleaning the data, removing duplicates, and merging the data files into one file, we keep 10 features:
• Latitude and Longitude: the geographic characteristics of the seismic event
• Depth: the depth of the event in kilometers
• Day, Month, Year, Hour, Minute and Second: we keep all the elements of the time when the events occurred, to conserve the exact information
• Mag: the magnitude of the seismic event

Some negative magnitudes, which appear as outliers in our dataset, are removed, since such events cause no damage and are not felt. We do not generate additional features, since ANN and deep learning models, especially LSTM, can learn from original and complex data without any other tool to generate their characteristics. For instance, the authors in [4] compared using features with generated characteristics such as the b-value against using the original features; they found the same results, which demonstrates the capability of deep learning to learn insights by itself.

4.1.2. Data description
After data preparation, 29689 of the 32396 seismic events remain, and we apply our model to them directly. Table 1 presents the descriptive statistics of our data; the largest magnitude is 7.3 and the smallest is 0.02.

4.2. Methodology
Earthquake events form time series data that linear and classical methods do not capture well, whereas LSTM architectures can be trained on them with multiple input variables. Our research is therefore based on this model, with the aim of obtaining efficient and accurate earthquake predictions. In this section, we present the main steps of our work to predict earthquakes in Moroccan regions. We start with the first case, which applies the LSTM model to the whole dataset presented above. In the second case, we apply the LSTM model to data decomposed by magnitude range. In both case studies, we predict the magnitude, location, and year of incoming earthquakes. The flow chart in Figure 3 presents our proposed LSTM model for earthquake prediction.
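
The cleaning steps of Section 4.1.1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the record layout (one dictionary per event), the key names, and the function name clean_catalog are assumptions made here; only the ten features, the removal of duplicates, and the removal of negative magnitudes come from the text.

```python
from typing import Dict, List

# The ten features named in Section 4.1.1 (key names are assumed here).
FEATURES = ["Latitude", "Longitude", "Depth", "Day", "Month",
            "Year", "Hour", "Minute", "Second", "Mag"]


def clean_catalog(events: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Keep the ten features, drop exact duplicates and negative magnitudes."""
    seen = set()
    cleaned = []
    for event in events:
        key = tuple(event[name] for name in FEATURES)
        if key in seen:          # duplicated record
            continue
        seen.add(key)
        if event["Mag"] < 0:     # negative magnitudes are treated as outliers
            continue
        cleaned.append(dict(zip(FEATURES, key)))
    return cleaned


# Hypothetical records, for illustration only (not taken from the catalog).
base = {"Latitude": 34.0, "Longitude": -5.0, "Depth": 10.0, "Day": 1,
        "Month": 6, "Year": 2010, "Hour": 12, "Minute": 30, "Second": 15,
        "Mag": 3.1}
sample = [base, dict(base), dict(base, Mag=-0.4, Day=2)]
print(len(clean_catalog(sample)))  # the duplicate and the negative event are dropped
```

Deduplicating on the full ten-feature tuple is one possible reading of "removing duplicated data"; a real catalog might instead need a tolerance on time and location.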

In the first step, the datasets are normalized using the Min-Max scaler, a transformation calculated by formula 8 that brings every feature to the same scale, in a range between 0 and 1. Such transformations are used so that no feature dominates the others. The second step divides the data into a training set (80%) and a testing set (20%). The model is then trained with the mean squared error (MSE) as the loss function and the Adam optimizer to converge towards the minimum error. Adam is widely used in deep learning because it helps the model achieve good results quickly; the empirical results in [15] show that Adam works well and compares favorably with other stochastic optimizers. Finally, once the model is trained, predictions are computed on the testing set, and the gap between predicted and real values is measured using the mean squared error and the mean absolute error (MAE). MAE and MSE are used to evaluate regression models and were used in previous works on earthquake prediction. Their calculations are given in formulas 9 and 10, respectively.

$$z = \frac{x - \min(x)}{\max(x) - \min(x)} \tag{8}$$
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \tag{9}$$
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{10}$$

where $x$ is a feature value, $y_i$ is the real value, $\hat{y}_i$ is the predicted value, and $n$ is the number of samples.

4.2.1. Study case 1: Earthquake prediction using LSTM
In this case, we use the 29689 seismic events presented in Section 4.1.2 and apply our proposed LSTM architecture, illustrated in Figure 4. The proposed architectures were found through a tuning process that searched for the most adequate architecture. The tuning approach follows [27], which recommends finding the balance between underfitting and overfitting by examining the training and testing loss. As shown in Figure 4, our proposed architecture contains an LSTM layer, a dropout function, a dense network, and a ReLU activation function.
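
To make the data flow through this architecture concrete, the sketch below runs one forward pass of an LSTM layer followed by a dense ReLU layer in plain Python. It is illustrative only and not the authors' implementation: the weights are random and untrained, the window of three time steps and the ten input features per step are assumptions (the text does not state the input window), and the additive state update c_t = f_t * c_{t-1} + i_t * g_t is the standard LSTM formulation rather than something written out in equations (3) to (7). The 15 hidden units and the four outputs follow the description of study case 1. Dropout is left out because it is only active during training.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def affine(W: Matrix, v: Vector, b: Vector) -> Vector:
    """Compute W v + b for a row-major matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias
            for row, bias in zip(W, b)]


def lstm_step(x: Vector, h_prev: Vector, c_prev: Vector,
              p: Dict[str, list]) -> Tuple[Vector, Vector]:
    """One time step of an LSTM cell, following equations (3)-(7)."""
    z = h_prev + x                                            # [h_{t-1}, x_t]
    i = [sigmoid(a) for a in affine(p["Wi"], z, p["bi"])]     # input gate (3)
    g = [math.tanh(a) for a in affine(p["Wc"], z, p["bc"])]   # squashed input (4)
    f = [sigmoid(a) for a in affine(p["Wf"], z, p["bf"])]     # forget gate (5)
    o = [sigmoid(a) for a in affine(p["Wo"], z, p["bo"])]     # output gate (6)
    # Standard additive state update (an assumption, see the lead-in).
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]          # hidden state (7)
    return h, c


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def init_lstm(n_in: int, n_hidden: int, rng: random.Random) -> Dict[str, list]:
    """Random, untrained parameters for the four weight matrices and biases."""
    p: Dict[str, list] = {}
    for gate in "icfo":
        p["W" + gate] = random_matrix(n_hidden, n_hidden + n_in, rng)
        p["b" + gate] = [0.0] * n_hidden
    return p


def predict(window: List[Vector], p: Dict[str, list],
            W_out: Matrix, b_out: Vector) -> Vector:
    """Run the LSTM layer over the window, then a dense layer with ReLU."""
    n_hidden = len(p["bi"])
    h, c = [0.0] * n_hidden, [0.0] * n_hidden
    for x in window:
        h, c = lstm_step(x, h, c, p)
    return [max(0.0, a) for a in affine(W_out, h, b_out)]


rng = random.Random(0)
params = init_lstm(n_in=10, n_hidden=15, rng=rng)
W_out, b_out = random_matrix(4, 15, rng), [0.0] * 4
# Hypothetical window: 3 time steps of 10 features already scaled to [0, 1].
window = [[rng.random() for _ in range(10)] for _ in range(3)]
outputs = predict(window, params, W_out, b_out)
print(len(outputs))  # one value each for magnitude, longitude, latitude, year
```

In practice such a model would be built and trained with a deep learning library; the plain-Python version only shows which quantities the gates compute at each step.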
The LSTM layer contains 15 neurons, which gave the best result among the sizes we tried for this case. The dropout function is applied to the LSTM layer. It drops some neurons out of the network, so that they are not considered during the training process. This helps the model avoid overfitting, since it breaks the co-dependency between neurons [28]: each neuron is pushed to compute useful features on its own rather than relying on particular other neurons. The dense network is a fully connected neural network connected to the output neurons of the LSTM layer. It produces the desired targets from the patterns output by the LSTM layer. The dense network applies an activation function to calculate the outputs. In our case, we use the rectified linear unit (ReLU), as it is simple and performs well. It returns the input value if it is positive and 0 if it is zero or negative. This function is recommended by Goodfellow and Bengio [12], who state: "major algorithmic change that has greatly improved the performance of feedforward networks was the replacement of sigmoid hidden units with piecewise linear hidden units, such as rectified linear units." The model has four outputs: the magnitude, longitude, latitude, and year of the coming earthquakes.

4.2.2. Study case 2: Earthquake prediction using LSTM and magnitude decomposition
In this case, we consider two LSTM-based models applied to divided datasets. The decomposition of our dataset is based on magnitude range, since the characteristics of large earthquakes cannot be related to those of the smallest ones. There is a big gap between the two seismic magnitude ranges. Large, medium, and small earthquakes do not present the same problem; each has its own dangers, patterns, and typical features. By analogy, one would not study the shopping activities of rich and poor people in a single model, and we think the same applies to our seismic datasets. Hence, we decompose the datasets into two magnitude ranges, as follows:
• Small and medium earthquakes: magnitudes from 0.2 to under 5.0;
• Large earthquakes: magnitude 5.0 and above.

The flow charts in Figure 4 present the architectures used for each model, which we found after testing multiple alternatives. The LSTM layer contains 15 neurons for small and medium earthquakes and 10 neurons for large earthquakes. Both models use the ReLU activation function and a dropout function. Each model has four outputs: the magnitude, longitude, latitude, and year of the coming earthquakes.

5. PERFORMANCE EVALUATION AND DISCUSSION
In this section, we evaluate the performance of each model in both case studies using the MAE and MSE metrics. First, we evaluate the performance of earthquake prediction when using the whole dataset in one model. After that, we evaluate the performance of the prediction when decomposing the datasets into two magnitude ranges. Finally, we discuss and compare the results of the two cases, and we evaluate their performance against ANN models with the same architectures.

The simulation results of our model in the first case are good: the MAE is 0.075 and the MSE is 0.014, as shown in Table 2. In the second case, when the data are decomposed into two groups, the overall results improve: for large earthquakes, the MAE is 0.11 and the MSE is 0.027; for small and medium earthquakes, the MAE is 0.041 and the MSE is 0.0058; and the overall error gives an MAE of 0.042 and an MSE of 0.0059 (Table 2). Training in the second case is also faster: according to Table 2, it finishes about 9.27 seconds earlier than the first case's model (77.36 seconds against 86.63 seconds).
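
The magnitude decomposition and the overall error over the two ranges can be sketched as follows. The text does not say how the overall MAE and MSE are obtained from the two models; the sketch assumes an average weighted by the number of test samples in each range, which equals the error computed over the pooled test set. The function names and all example values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

LARGE_THRESHOLD = 5.0  # boundary between the two ranges in Section 4.2.2

Event = Dict[str, float]


def split_by_magnitude(events: List[Event],
                       threshold: float = LARGE_THRESHOLD
                       ) -> Tuple[List[Event], List[Event]]:
    """Return (small_and_medium, large) events."""
    small = [e for e in events if e["Mag"] < threshold]
    large = [e for e in events if e["Mag"] >= threshold]
    return small, large


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error (formula 9)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error (formula 10)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def combine_errors(error_a: float, n_a: int, error_b: float, n_b: int) -> float:
    """Sample-weighted average of two per-range mean errors (assumed rule)."""
    return (error_a * n_a + error_b * n_b) / (n_a + n_b)


# Hypothetical events and scaled predictions, for illustration only.
events = [{"Mag": 2.4}, {"Mag": 4.9}, {"Mag": 5.0}, {"Mag": 6.1}]
small, large = split_by_magnitude(events)
print(len(small), len(large))  # 2 2

small_true, small_pred = [0.2, 0.4, 0.6], [0.3, 0.4, 0.5]
large_true, large_pred = [0.9], [0.7]
overall_mae = combine_errors(mae(small_true, small_pred), len(small_true),
                             mae(large_true, large_pred), len(large_true))
overall_mse = combine_errors(mse(small_true, small_pred), len(small_true),
                             mse(large_true, large_pred), len(large_true))
```

Because large events are rare, a sample-weighted overall error stays close to the error of the small and medium range, which is worth keeping in mind when reading a single overall figure.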
Hence, decomposing the datasets by magnitude gives better results. The scarcity of data for large earthquakes makes them hard to learn, but the ability of LSTM to learn complicated data makes it possible to extract patterns from the few records of large earthquakes, especially when they are learned separately. The fitting curves of our models are illustrated in Figure 5; all the models are well fitted and not overfitted, since we use the dropout function.

Comparison with previous works is very hard in the field of earthquake prediction because they use different performance metrics and different datasets, catalogs, and regions [8]. Consequently, to evaluate the performance of our LSTM models, we build ANN models with the same architectures for both cases; Table 3 shows the results of each case. We choose ANN since it is widely used in the literature, as presented in Section 2. The LSTM models outperform the ANN models when the datasets are decomposed, especially when predicting large earthquakes, for which ANN gives 0.30 for MAE and 0.13 for MSE; on the whole dataset, the errors of the two models are close (Tables 2 and 3). ANN shows no improvement when the datasets are decomposed, since it could not predict large earthquakes as well as LSTM did. Moreover, its training on the decomposed data is slower by 37.62 seconds.

In brief, the results of our experiments support the performance and effectiveness of our LSTM models in earthquake prediction. In addition, our models are not complicated by seismic indicators and generated features: the LSTM can learn patterns and features from the datasets by itself. Furthermore, our work covers the full meaning of earthquake prediction, as it provides the important elements mentioned in Section 1: the magnitude, location, and time. We therefore recommend using and evaluating our models for earthquake prediction with similar datasets and seismic activity.

6. CONCLUSION
In this paper, we have suggested a new model for earthquake prediction using historical datasets.
We have built two prediction architectures. The first is an LSTM model that we apply to the dataset of Moroccan regions; it predicts the year, location, and magnitude of earthquakes. The second decomposes the dataset by magnitude range and applies two LSTM models to the divided data. The decomposition we propose is effective and efficient, especially when predicting large earthquakes. The experimental results of our LSTM models are described, evaluated, and compared with ANN models. The results show that our proposed model achieves favorable performance compared with the others.

Figure 1. Typical architecture of the recurrent neural network algorithm
Figure 2. Typical architecture of the long short-term memory algorithm
Figure 3. Flow chart of the proposed LSTM model
Figure 4. Flow charts of the LSTM models used in earthquake prediction. The model on the left is applied when using the whole dataset, and the other two models are applied when using the decomposed datasets.
Figure 5. Plotting results of prediction models with and without dataset decomposition

Table 1. Descriptive statistics of the seismic dataset of Morocco from 1900 to 2019 after data preparation

Characteristics  count  mean         std        min          25%        50%        75%        max
Depth            29689  24.159381    25.23118   0.100000     10.00000   18.071853  30.00000   675.0000
Latitude         29689  34.619723    1.777117   20.02000     33.52400   35.25000   35.66600   43.32000
Longitude        29689  -5.384475    25.008664  -4251.00000  -6.550000  -4.100000  -3.676000  7.648000
Year             29689  2000.954865  20.547024  1901.00      1991.00    2009.00    2016.00    2019.00
Month            29689  5.884166     3.505600   1.000000     3.000000   6.000000   9.000000   12.000000
Day              29689  15.900805    9.048841   1.000000     8.000000   16.000000  24.000000  31.000000
Hour             29689  11.471993    6.932439   0.00000      5.00000    12.0000    17.0000    23.0000
Minute           29689  29.421368    17.397553  0.000000     14.000000  29.000000  44.000000  59.000000
Second           29689  29.106403    17.608949  0.000000     14.000000  29.000000  44.000000  59.000000
Mag              29689  2.410389     0.879518   0.020000     1.700000   2.400000   2.940540   7.300000

Table 2. Experiment results of LSTM models, when using the whole dataset and when using decomposed magnitude ranges, with the performance metrics MAE and MSE and the elapsed time during the training process.

Metrics            All dataset        Range [0.2, 5.0[  Range >= 5.0        Total for the decomposed datasets
MAE                0.07561226         0.041774616       0.11138759          0.042091154
MSE                0.014995616        0.0058211996      0.027285077         0.005918795
Training time (s)  86.63059592247009  61.6693141        15.693119200000012  77.36243330000002

Table 3. Experiment results of ANN models, when using the whole dataset and when using decomposed magnitude ranges, with the performance metrics MAE and MSE and the elapsed time during the training process.

Metrics            All dataset           Range [0.2, 5.0[      Range >= 5.0        Total for the decomposed datasets
MAE                0.07253681569956802   0.08149821554646755   0.3055427831276546  0.0825543178852341
MSE                0.012023647445131583  0.015362804159184586  0.1367976476793874  0.015935224296990257
Training time (s)  43.7802888            75.60912970000001     5.793798300000006   81.40292800000002

7. REFERENCES
[1] Allen, C.R. 1976. Responsibilities in earthquake prediction. Bulletin of the Seismological Society of America. 66, 6 (1976), 2069–2074.
[2] Asim, K.M. et al. 2017. Earthquake magnitude prediction in Hindukush region using machine learning techniques. Natural Hazards. 85, 1 (Jan. 2017), 471–486. DOI:https://doi.org/10.1007/s11069-016-2579-3.
[3] Asim, K.M. et al. 2018. Seismic indicators based earthquake predictor system using Genetic Programming and AdaBoost classification. Soil Dynamics and Earthquake Engineering. 111, (Aug. 2018), 1–7. DOI:https://doi.org/10.1016/j.soildyn.2018.04.020.
[4] Bhatia, A. et al. 2018. Earthquake forecasting using artificial neural networks. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences - ISPRS Archives. 42, 5 (2018), 823–827. DOI:https://doi.org/10.5194/isprs-archives-XLII-5823-2018.
[5] Boucouvalas, A.C. et al. 2015. Modified-Fibonacci-Dual-Lucas method for earthquake prediction. Third International Conference on Remote Sensing and Geoinformation of the Environment (RSCy2015) (Jun. 2015), 95351A.
[6] Fan, J. et al. 2015. Research on earthquake prediction from infrared cloud images. (Dec. 2015), 98150E.
[7] Florido, E. et al. 2015. Detecting precursory patterns to enhance earthquake prediction in Chile. Computers and Geosciences. 76, (Mar. 2015), 112–120. DOI:https://doi.org/10.1016/j.cageo.2014.12.002.
[8] Galkina, A. and Grafeeva, N. 2019. Machine learning methods for earthquake prediction: A survey. CEUR Workshop Proceedings. 2372, (2019), 25–32.
[9] Hayakawa, M. 2016. Earthquake prediction with electromagnetic phenomena. AIP Conference Proceedings (Feb. 2016).
[10] Hayakawa, M. et al. 2016. On the Precursory Abnormal Animal Behavior and Electromagnetic Effects for the Kobe Earthquake (M~6) on April 12, 2013. Open Journal of Earthquake Research. 05, 03 (2016), 165–171. DOI:https://doi.org/10.4236/ojer.2016.53013.
[11] Hochreiter, J. 1991. DIPLOMARBEIT IM FACH INFORMATIK Untersuchungen zu dynamischen neuronalen Netzen.
[12] Ian Goodfellow, Yoshua Bengio, A.C. 2017. The Deep Learning Book. MIT Press. 521, 7553 (2017), 785. DOI:https://doi.org/10.1016/B978-0-12-391420-0.09987X.
[13] Itai, A. et al. 2005. Multi-layer neural network for precursor signal detection in electromagnetic wave observation applied to great earthquake prediction. (Sep. 2005), 31–31.
[14] Kannan, S. 2014. Innovative Mathematical Model for Earthquake Prediction. Engineering Failure Analysis. 41, (2014), 890–895. DOI:https://doi.org/10.1016/j.engfailanal.2013.10.016.
[15] Kingma, D.P. and Ba, J.L. 2015. Adam: A method for stochastic optimization. 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings (2015).
[16] Külahcı, F. et al. 2009. Artificial neural network model for earthquake prediction with radon monitoring. Applied Radiation and Isotopes. 67, 1 (Jan. 2009), 212–219. DOI:https://doi.org/10.1016/J.APRADISO.2008.08.003.
[17] Li, C. and Liu, X. 2016. An improved PSO-BP neural network and its application to earthquake prediction. Proceedings of the 28th Chinese Control and Decision Conference, CCDC 2016 (Aug. 2016), 3434–3438.
[18] Mahmoudi, J. et al. 2016. Predicting the Earthquake Magnitude Using the Multilayer Perceptron Neural Network with Two Hidden Layers. Civil Engineering Journal. 2, 1 (Jan. 2016), 1–12. DOI:https://doi.org/10.28991/cej-2016-00000008.
[19] Mignan, A. and Broccardo, M. 2019. Neural Network Applications in Earthquake Prediction (1994-2019): Meta-Analytic Insight on their Limitations. September (2019), 1–25.
[20] Moustra, M. et al. 2011. Artificial neural networks for earthquake prediction using time series magnitude data or Seismic Electric Signals. Expert Systems with Applications. 38, 12 (Nov. 2011), 15032–15039. DOI:https://doi.org/10.1016/j.eswa.2011.05.043.
[21] Narayanakumar, S. and Raja, K. 2016. A BP Artificial Neural Network Model for Earthquake Magnitude Prediction in Himalayas, India. Circuits and Systems. 07, 11 (2016), 3456–3468. DOI:https://doi.org/10.4236/cs.2016.711294.
[22] Ni, Y.Q. et al. 2006. Experimental investigation of seismic damage identification using PCA-compressed frequency response functions and neural networks. Journal of Sound and Vibration. 290, 1–2 (Feb. 2006), 242–263. DOI:https://doi.org/10.1016/j.jsv.2005.03.016.
[23] Ozerdem, M.S. et al. 2006. Self-organized maps based neural networks for detection of possible earthquake precursory electric field patterns. Advances in Engineering Software. 37, 4 (2006), 207–217. DOI:https://doi.org/10.1016/j.advengsoft.2005.07.004.
[24] Panakkat, A. and Adeli, H. 2009. Recurrent neural network for approximate earthquake time and location prediction using multiple seismicity indicators. Computer-Aided Civil and Infrastructure Engineering. 24, 4 (2009), 280–292. DOI:https://doi.org/10.1111/j.14678667.2009.00595.x.
[25] Pasari, S. 2018. Stochastic modelling of earthquake interoccurrence times in Northwest Himalaya and adjoining regions. Geomatics, Natural Hazards and Risk. 9, 1 (Jan. 2018), 568–588. DOI:https://doi.org/10.1080/19475705.2018.1466730.
[26] Sitharam, A.S.T.G. and Haider, S.T. 2015. Probabilistic models for forecasting earthquakes in the northeast region of India. Bulletin of the Seismological Society of America. 105, 6 (Dec. 2015), 2910–2927. DOI:https://doi.org/10.1785/0120140361.
[27] Smith, L.N. 2016. a Disciplined Approach To Neural Network Hyper-Parameters: Part 1. (2016), 1–21.
[28] Srivastava, N. et al. 2014. Dropout: A Simple Way to Prevent Neural Networks from Overfitting.
[29] Su, Y.P. and Zhu, Q.J. 2009. Application of ANN to prediction of earthquake influence. 2009 2nd International Conference on Information and Computing Science, ICIC 2009 (2009), 234–237.