-ECE539 Project Report (Professor Yu Hen Hu)-

Application of Multilayer Perceptron (MLP) Neural Network in Identification and Picking P-wave arrival

Haijiang Zhang
Department of Geology and Geophysics
University of Wisconsin-Madison

Abstract

Quickly detecting and accurately picking the first arrival of a P wave is of great importance in locating earthquakes and characterizing velocity structure, especially in the era of large volumes of digital and real-time seismic data. The detector should be capable of finding the onset of the P-wave arrival against the background of microseismic and cultural noise. Normally, the P-wave onset is characterized by a rapid change in amplitude and/or the arrival of high-frequency energy. The Akaike information criterion (AIC) picker has been used to detect and pick the P-wave arrival (Maeda 1986; Maeda 1989). However, the AIC picker requires an appropriate time window; otherwise it picks the wrong P-wave arrival. In this report, a Multilayer Perceptron (MLP) neural network is used to detect the P-wave arrival, from which a time window is chosen for the AIC picker. This method has been applied to our PASO array data set. About 90% of P first arrivals are detected correctly. Compared with manual picks, this picker provides onset times and uncertainties with high confidence: 91% of autopicks are within 0.15 seconds of analyst picks for this data set.

1. Introduction

Quickly detecting and picking the arrival times of P and S waves in recordings of earthquake events is of great importance in event location, event identification, source mechanism analysis, and spectral analysis. Traditionally, this work is done by an analyst who inspects the seismograms and picks the P and S arrivals based on individual experience. This task is time consuming and subjective, especially in the era of large volumes of digital and real-time seismic data.
There is a need for a more reliable and robust alternative that is less time consuming and perhaps more objective.

Several techniques for detecting and picking seismic wave arrivals have been described in the literature. The traditional approach to automatic phase detection is to apply a series of narrow bandpass filters and then use the absolute value as the characteristic function (CF). When the ratio between the short-term average (STA) and the long-term average (LTA) of the CF exceeds a predefined threshold, a detection is declared. Absolute values and the envelope function of the seismogram are usually used as the CF (Allen, 1982).

Artificial neural networks (ANNs) have also been used to construct the characteristic function to detect and pick seismic phases (Dai et al., 1995, 1997; Zhao et al., 1999; Wang et al., 1997). The ANN method is reported to be successful and promising in detecting and picking seismic phases. Two types of input vector have been fed to the neural network: attributes derived from the seismograms, such as mean amplitude, spectral properties, and planarity, and the absolute values of the seismograms themselves. The former may lose information and involve too much computing time, so using the full waveforms as the network input might be a better choice.

ANNs are very successful in detecting seismic phases. However, it is difficult to pick the arrival time from the characteristic function: because the characteristic function exceeds the predefined threshold over a region rather than at a single point, it is not easy to determine which point should be chosen as the arrival time. A multi-term method has been tried to shrink this region, but it still requires an empirical value to determine the phase arrival (Zhao et al., 1999). In contrast to these methods, the Akaike Information Criterion (AIC) picker is used to pick the P-wave arrival in this report.
When the time window is chosen properly, the AIC picker can pick the phase arrival very accurately. The MLP neural network is used to choose a time window for the AIC picker.

This report first reviews the AIC picker and the Multilayer Perceptron (MLP) neural network. I then discuss how the MLP neural network is constructed to detect the P-wave arrival and how the AIC picker is used to pick it. Finally, the application of this method to the PASO array data is presented.

2. AIC Picker

Suppose that the seismogram can be divided into locally stationary segments, each modeled as an autoregressive (AR) process, and that the intervals before and after the onset time are two different stationary processes (Sleeman et al., 1999). The order and the values of the AR coefficients change when the character of the current segment of the seismogram differs from that of the preceding one. For example, typical seismic noise is well represented by a relatively low-order AR process, whereas seismic signals usually require a higher-order AR process (Leonard et al., 1999). The Akaike Information Criterion (AIC), which indicates the badness of the model fit as well as its unreliability (Akaike, 1974), is commonly used to determine the order of the AR process when fitting a time series. This method has been used in onset estimation by analyzing the variation in the AR coefficients representing both multi-component and single-component traces of broadband and short-period seismograms (Leonard et al., 1999). When the order of the AR process is fixed, the AIC function is a measure of the model fit, and the point where the AIC is minimized determines the optimal separation of the two stationary time series in the least-squares sense; this point is interpreted as the phase onset (Sleeman et al., 1999). This picker is known as the AR-AIC picker (Leonard, 2000).
Unlike the AR-AIC picker, Maeda calculates the AIC function directly from the seismogram, without using the AR coefficients (Maeda, 1985 and Maeda, 1986). The onset is the point where the AIC has a minimum value. For a seismogram x of length n, the AIC value is defined as

$$\mathrm{AIC}(k) = k \log\bigl(\mathrm{var}(x[1,k])\bigr) + (n-k-1) \log\bigl(\mathrm{var}(x[k+1,n])\bigr)$$

where k ranges over all samples of the seismogram.

Note that the AIC picker finds the onset point as the global minimum. For this reason, it is necessary to choose a time window that includes only the segment of the seismogram of interest. If the time window is chosen properly, the AIC picker can find the P-wave arrival accurately. For a seismogram with a very clear onset, the AIC values have a very clear global minimum, which corresponds to the P-wave arrival (Figure 1a). For a seismogram with a relatively low S/N ratio, there are a few local minima in the AIC values, but the global minimum still accurately indicates the P-wave onset (Figure 1b). When the seismogram is noisier still, the global minimum is not guaranteed to indicate the P-wave arrival (Figure 1c). That is, the signal-to-noise ratio of the seismogram affects the accuracy of the AIC picker to some extent, although the effect is not significant. For this reason, we do not filter the seismogram in advance, because a bandpass filter can reduce the first motion and distort the true P-wave arrival (Douglas et al., 1997).

Figure 1. Seismograms and their corresponding AIC values. (a) For a seismogram with a clear P-wave arrival, the AIC function has a very clear minimum. (b) For a seismogram with a clear P-wave arrival but a relatively lower S/N ratio, the AIC function has many local minima, whereas the global minimum still corresponds to the P-wave onset. (c) For a very low S/N seismogram, there are a few local minima close to each other. In this case, the global minimum cannot be guaranteed to be the P-wave arrival.
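As an illustration of the AIC formula in this section, the following Python sketch evaluates AIC(k) directly from the samples and returns the position of its global minimum. It is a minimal sketch and not the code used for this report: the function names `aic_curve` and `aic_pick`, the small variance floor `eps` (to avoid taking the logarithm of zero), and the choice to skip split points that leave fewer than two samples on either side are assumptions made for the example.

```python
import math

def aic_curve(x, eps=1e-12):
    """Return (ks, aic) with AIC(k) = k*log(var(x[1..k])) + (n-k-1)*log(var(x[k+1..n]))."""
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 samples")
    # Prefix sums of x and x^2 give every segment variance in O(1).
    s1, s2 = [0.0], [0.0]
    for v in x:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def var(i, j):
        """Population variance of x[i:j] (0-based, half-open)."""
        m = j - i
        mean = (s1[j] - s1[i]) / m
        return max((s2[j] - s2[i]) / m - mean * mean, 0.0)

    # Assumption: keep at least 2 samples on each side of the split.
    ks = list(range(2, n - 1))
    aic = [k * math.log(var(0, k) + eps)
           + (n - k - 1) * math.log(var(k, n) + eps) for k in ks]
    return ks, aic

def aic_pick(x):
    """0-based index of the first sample after the split that minimizes AIC."""
    ks, aic = aic_curve(x)
    return ks[aic.index(min(aic))]

# Synthetic check: weak noise followed by a stronger signal starting at sample 100.
trace = ([0.1 * math.sin(0.9 * i) for i in range(100)]
         + [math.sin(0.5 * i) for i in range(100, 200)])
print(aic_pick(trace))
```

Because the pick is the global minimum over whatever samples are passed in, the function should be called on a window that contains only the phase of interest, as discussed above.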
If there is more than one seismic phase in a time window, the AIC picker will choose the stronger phase (Figure 2). On the other hand, the AIC picker is not "smart": it will pick an "onset" for any segment of data, whether or not there is a true phase arrival in the time window (Figure 3). For this reason, we need to guide the AIC picker by choosing an appropriate window for it.

Figure 2. Seismogram with two phases and the corresponding AIC values. There are clear local minima for each phase arrival, but the global minimum indicates the arrival of the stronger phase.

Figure 3. Seismic noise data and its AIC values. The minimum value does not indicate any phase arrival, although it divides the data into two different stationary segments.

3. Artificial Neural Network: Multilayer Perceptrons (MLP)

Multilayer perceptrons have been successfully applied to solve many difficult and diverse problems. The mathematical perceptron was proposed by McCulloch and Pitts (1943) to mimic the behavior of a biological neuron (Haykin, 1999). The biological neuron is mainly composed of three parts: the dendrites, the soma, and the axon. The dendrites receive information from other neurons through synapses. These input signals are attenuated with increasing distance from the synapses to the soma. The soma integrates the received signals and then activates an output depending on the total input. The axon transmits the output signal to other neurons through the synapses located on the tree structure at the end of the axon (Ban, 2000). The mathematical neuron proceeds in a similar but simpler way, as integration takes place only over space.

Typically, the network is made up of sets of nodes arranged in layers: an input layer, one or more hidden layers, and an output layer. The input signal propagates through the network in a forward direction, on a layer-by-layer basis. Each node is a basic processing unit with a nonlinear activation function.
The outputs of the nodes in one layer are transmitted to nodes in the next layer through links called weights, which effectively amplify or attenuate the signals. Except for the input layer, the net input to each node is the sum of the weighted outputs of the nodes in the previous layer.

MLPs have successfully solved difficult problems when trained in a supervised manner with the highly popular error back-propagation algorithm, which is based on the error-correction learning rule. Error-correction learning consists of two phases: a forward phase and a backward phase. In the forward phase, the input vector is fed into the nodes of the input layer and propagates through the network layer by layer, and the output vector is produced as the actual response of the network. During the forward phase, the weights connecting the network nodes are fixed. During the backward phase, the synaptic weights are all adjusted according to an error-correction rule. This method attempts to find a global minimum of the mismatch between the desired output pattern and the actual output over all of the training samples. The degree of mismatch for each input-output pair is quantified, used to solve for the synaptic weights between the hidden and output layers, and then propagated backwards through the network to adjust the remaining synaptic weights, so that the actual response of the network moves closer to the desired response in a statistical sense.

A multilayer perceptron has three distinctive characteristics (Haykin, 1999):

(1) The model of each neuron in the network includes a nonlinear activation function. Two types of nonlinear activation function are usually used: the sigmoid function and the hyperbolic tangent function.

(2) The network includes one or more hidden layers, which enable the network to learn complex tasks by extracting progressively more meaningful features from the input patterns.
(3) The network exhibits a high degree of connectivity, which is determined by the synapses of the network.

4. MLP neural network: Detection of the P-wave arrival

Several characteristic functions of the seismogram can be used as the input of the neural network, such as the absolute value function, the square function, Allen’s function, the envelope function, and the modified differential function. Following Dai’s method (Dai et al., 1995, 1997), the absolute values of the seismogram are chosen as the input of the MLP neural network, since they have the highest fidelity and processing speed and are the most objective among these functions. The seismogram itself is not used because the first motion of an arrival can have either direction (up or down) and is source dependent.

Thirty samples of absolute values of the seismogram are fed into the neural network. The input samples are normalized because the amplitude of the seismogram depends strongly on the magnitude and epicentral distance of the earthquake. With this normalization, a small set of training data can cover recordings with different amplitudes. For a P-wave segment, the arrival is located at the 20th sample. The noise segment is extracted from the part of the seismogram preceding the P-wave arrival. The part before the onset is made longer than the part after it in order to better distinguish signal patterns from noise patterns. Figure 4 shows a P-wave segment and a noise segment. The two output nodes of the neural network flag the input segment with (1, 0) for P arrivals and (0, 1) for background noise.

It is very important to select appropriate training sets. The training sets should represent the typical features of signals with different frequency characteristics. A rule of thumb is to begin with a very small training set and add new patterns until performance is satisfactory.
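The preparation of a 30-sample input segment described above can be sketched in Python as follows. This is a hedged illustration rather than the code used for the report: the function name `make_input` is hypothetical, and because the text does not state how the samples are normalized, dividing by the largest absolute value in the window is an assumption.

```python
def make_input(trace, onset, n_in=30, onset_pos=20):
    """Cut an n_in-sample window with `onset` at the onset_pos-th sample (1-based),
    take absolute values, and scale them to a maximum of 1."""
    start = onset - (onset_pos - 1)
    if start < 0 or start + n_in > len(trace):
        raise ValueError("window does not fit inside the trace")
    segment = [abs(v) for v in trace[start:start + n_in]]
    peak = max(segment)
    # Assumption: max-normalization; an all-zero window is returned unchanged.
    return [v / peak for v in segment] if peak > 0 else segment

# Toy trace: low-amplitude noise, then an arrival at sample 50 with downward first motion.
trace = [0.01] * 50 + [-2.0, 1.0] + [0.5] * 48
p_segment = make_input(trace, onset=50)      # arrival sits at the 20th sample
noise_segment = make_input(trace, onset=20)  # window taken entirely before the arrival
print(len(p_segment), len(noise_segment))
```

Taking absolute values before scaling removes the dependence on the polarity of the first motion, and the scaling removes the dependence on event magnitude and distance, which is why a small training set can cover recordings of very different amplitude.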
For the PASO array data, 9 pairs of P-wave arrival and noise segments are chosen to train the MLP network (Figure 5).

Given the input vectors, the MLP neural network creates decision boundaries in the input space, making it possible to recognize patterns. Any given decision boundary can be closely approximated by a two-layer network (one hidden layer and one output layer) with a sigmoid activation function. For this reason, only one hidden layer is used in the MLP neural network.

Figure 4. P-wave arrival and noise segments.

Figure 5. The 9 pairs of P-wave arrival and noise segments used to train the MLP neural network.

Currently, there is no good rule for determining the number of hidden nodes, which is highly problem dependent (Hu, 2001). With too few hidden nodes, the network may not be powerful enough for a given learning task. With too many hidden nodes, the computation becomes too expensive and the network may overfit the training set and fail to generalize to other data sets. For our PASO data set, 5 hidden nodes perform best, with a classification rate of 94.5% for the training set and 82% for a separate testing set. Professor Hu’s popular program bp.m (Hu, 2001) is used to train the network. Figure 6 shows the learning curve for the 18 P-wave arrival and noise segments.

Figure 6. Learning curve for the MLP network with 5 hidden nodes. To train the network, the learning rate is 0.1, the momentum is 0.8, the epoch size is 18, and the hyperbolic tangent function and the sigmoid function are used for the hidden layer and the output layer, respectively.

After the MLP neural network is trained, it is applied to the entire seismogram by moving a time window of the same size as the input layer. The resulting outputs o1(t) and o2(t) are converted into a time series N(t) (Dai et al., 1995):

$$N(t) = \frac{1}{2}\left\{[o_1(t)]^2 + [1 - o_2(t)]^2\right\}$$

which is used to detect the seismic arrivals. This function exaggerates the difference between the desired output and the background noise.
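The sliding-window application of the trained network can be sketched in Python as below. The sketch makes several assumptions that should be read as such: the 30-5-2 layout with a hyperbolic tangent hidden layer and sigmoid outputs follows the text, but the weights are placeholders (the trained values are not given in this report); N(t) is taken to be (1/2){[o1(t)]^2 + [1 - o2(t)]^2}, the form consistent with the (1, 0) and (0, 1) output coding; each window is normalized by its own maximum; and the names `mlp_forward`, `n_value`, and `detection_series` are hypothetical.

```python
import math

def mlp_forward(x, w_hid, b_hid, w_out, b_out):
    """One tanh hidden layer followed by a sigmoid output layer."""
    hidden = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(w_hid, b_hid)]
    return [1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(row, hidden)) + b)))
            for row, b in zip(w_out, b_out)]

def n_value(o1, o2):
    """Assumed form of N(t): 0 for ideal noise output (0, 1), 1 for ideal P output (1, 0)."""
    return 0.5 * (o1 ** 2 + (1.0 - o2) ** 2)

def detection_series(trace, weights, n_in=30):
    """Slide an n_in-sample window along the trace and return one N value per position."""
    series = []
    for start in range(len(trace) - n_in + 1):
        segment = [abs(v) for v in trace[start:start + n_in]]
        peak = max(segment)
        if peak > 0:  # assumption: per-window max-normalization
            segment = [v / peak for v in segment]
        o1, o2 = mlp_forward(segment, *weights)
        series.append(n_value(o1, o2))
    return series

# Placeholder weights (all zeros) only to show the expected shapes: 30 -> 5 -> 2.
weights = ([[0.0] * 30 for _ in range(5)], [0.0] * 5,
           [[0.0] * 5 for _ in range(2)], [0.0] * 2)
series = detection_series([0.1 * i for i in range(40)], weights)
print(len(series))
```

The index of each N value here is the start of its window, so a detection is offset from the arrival by the position of the onset inside the window.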
Figure 7 shows a seismogram and its corresponding N(t) values. The point at which N(t) exceeds a predefined threshold can be used to detect the P-wave arrival. For the PASO array data, 0.3 is chosen as the threshold. With this method, 90% of P-wave arrivals are detected.

Figure 7. (a) Seismogram and (b) its N(t) values constructed from the outputs of the neural network. The N(t) function has a left shift of about 20 samples, because the P-wave arrival corresponds to the 20th sample in the time window.

5. MLP neural network: picking the P-wave arrival

In Dai’s method, N(t) is also used to pick the arrival onset, using the local maximum of N(t). The local maximum is sought in a window that begins when N(t) exceeds the threshold and has the length of the input segment (Dai, 1995). Zhao (1999), however, noticed that the detector is activated when a signal enters the window and is inhibited when it leaves the window. In his opinion, it is more reasonable to determine the arrival time from the rising edge of the peak rather than from its maximum or the midpoint between the rising and falling edges. The arrival time is then taken as T shifted by a fixed offset, where T is the time when N(t) exceeds the threshold and the offset is chosen empirically, say 0.2 seconds.

Unlike the above methods, the AIC picker is used to pick the P-wave arrival for our data set. As noted in Section 2, the AIC picker alone is not smart enough to pick the P-wave arrival correctly, but when a time window is chosen properly, it can pick the arrival time accurately. Based on this fact, the MLP neural network is used only to detect the P-wave arrival and then to choose a time window for the AIC picker (Figure 8). For the PASO array data, we take the time when N(t) exceeds the threshold as the estimated arrival, and choose a 3-second time window centered on it. Compared with manual picks, this picker provides onset times and uncertainties with high confidence.
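The window selection described in this section can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the code used for the report: the names `first_crossing` and `guided_pick` are hypothetical; the sampling rate `fs` is not given in the text and must be supplied by the caller; `picker` stands for any function that returns a pick index within a segment (for example an AIC minimum); and `shift` is an optional correction for the offset between the N(t) index and the trace sample, which defaults to zero because the text takes the threshold crossing itself as the estimate.

```python
def first_crossing(n_series, threshold=0.3):
    """Index of the first N(t) value above the threshold, or None if there is none."""
    for i, value in enumerate(n_series):
        if value > threshold:
            return i
    return None

def guided_pick(trace, n_series, fs, picker, threshold=0.3, window_s=3.0, shift=0):
    """Detect with N(t), cut a window centered on the estimate, and pick inside it."""
    i = first_crossing(n_series, threshold)
    if i is None:
        return None  # no detection, so the picker is never run on pure noise
    center = i + shift
    half = int(round(0.5 * window_s * fs))
    lo = max(0, center - half)          # clip the window to the trace
    hi = min(len(trace), center + half)
    return lo + picker(trace[lo:hi])    # convert back to a trace index

# Stand-in picker for demonstration only: returns the middle of the segment.
middle = lambda segment: len(segment) // 2
trace = [0.0] * 1000
n_series = [0.0] * 400 + [1.0] * 10 + [0.0] * 590
print(guided_pick(trace, n_series, fs=100.0, picker=middle))
```

Returning None when N(t) never exceeds the threshold reflects the division of labor in the method: the network decides whether an arrival is present, and the picker only refines its time.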
91% of autopicks are within 0.15 seconds of analyst picks for this data set (Figure 9).

Conclusions and further work

An MLP-AIC picker is proposed to detect and pick the P-wave arrival. An MLP neural network can be trained well with a small set of P-wave arrival and noise segments. The network is used to detect the P-wave arrival and to provide a time window for the AIC picker, which can then pick the P-wave arrival accurately within that window. Tested with real data, the MLP-AIC picker appears not to be very sensitive to the signal-to-noise ratio and detects 90% of P-wave arrivals. Among those picks, 91% of autopicks are within 0.15 seconds of analyst picks. Compared with the conventional STA/LTA method, the MLP neural network is more adaptive with regard to phase frequency, since the network can be trained with patterns covering a variety of frequency characteristics. Further work should focus on applying this method to pick the S-wave arrival and on improving the detection rate of the picker.

Acknowledgements

The author acknowledges Professor Yu Hen Hu for his insightful instruction on Artificial Neural Networks and Fuzzy Systems, and for his permission to use his MLP programs.

Figure 8. (a) Seismogram, the same as that shown in Figure 7a, and (b) its corresponding AIC values. The minimum AIC value indicates the P-wave arrival.

Figure 9. The MLP-AIC picker is used to pick the P-wave arrival for 3 seismograms from the PASO array data.

References

Akaike, H., Markovian representation of stochastic processes and its application to the analysis of autoregressive moving average processes, Ann. Inst. Stat. Math., 26, 363-387, 1974.

Allen, R.V., Automatic earthquake recognition and timing from single traces, Bull. Seism. Soc. Am., 68, 1521-1532, 1978.

Ban, M., C. Jutten, Neural networks in geophysical applications, Geophysics, 65, 4, 1032-1047, 2000.

Dai, H., C. MacBeth, Automatic picking of seismic arrivals in local earthquake data using an artificial neural network, Geophysical Journal International, 120, 758-774, 1995.

Dai, H., C. MacBeth, The application of back-propagation neural network to automatic picking seismic arrivals from single-component recordings, Journal of Geophysical Research, 102, B7, 15,105-15,114, 1997.

Haykin, S., Neural Networks: A Comprehensive Foundation, second edition, Prentice Hall, New Jersey, 1999.

Hu, Y.H., Course notes on Introduction to Artificial Neural Network and Fuzzy Systems, 2001.

Leonard, M., B.L.N. Kennett, Multi-component autoregressive techniques for the analysis of seismograms, Physics of the Earth and Planetary Interiors, 113, 247-263, 1999.

Maeda, N., A method for reading and checking phase times in auto-processing system of seismic wave data, Zisin=Jishin, 38, 3, 365-379, 1985.

Sleeman, R., T. v. Eck, Robust automatic P-phase picking: an on-line implementation in the analysis of broadband seismogram recordings, Physics of the Earth and Planetary Interiors, 113, 265-275, 1999.

Wang, J., T. Teng, Identification and picking of S-phase using an artificial neural network, Bulletin of the Seismological Society of America, 87, 5, 1140-1149, 1997.

Zhao, Y., K. Takano, An artificial neural network approach for broadband seismic phase picking, Bulletin of the Seismological Society of America, 89, 3, 670-680, 1999.