
DISCRIMINATION BETWEEN WEAK EARTHQUAKES AND CHEMICAL EXPLOSIONS
BASED ON DATA FROM LOCAL SEISMIC NETWORK
L.M. Haikin, A.F. Kushnir, V.F. Pisarenko, E.V. Troitsky
SYNAPSE Science Center, 119526 Moscow Vernadskogo Ave 101/1 suite 303, phone (495) 434-3638
International Institute for Earthquake Prediction Theory and Mathematical Geophysics,
Russia, 113556, Moscow, Warshavskoye sh. 79, bld.2.
phone (095) 110 77 95, fax (095) 310 70 32, e-mail mitpan@mitp.rssi.tu
Description of the processing technique.
In this study we implemented a feature-extraction and feature-discrimination approach to the seismic
discrimination problem. As discrimination features we used various power or spectral
characteristics of seismograms that typically differ between earthquakes and explosions. The
discrimination problem was solved by applying statistical pattern recognition techniques.
Numerous investigations in discriminant analysis [1-4] have demonstrated that selecting a small
number of the most informative features is extremely useful in this approach. A few carefully selected
features may provide a smaller misclassification probability than the full set of features. This
is the so-called “peak effect” or “multivariate effect”.
Procedure for feature selection. Feature selection is based on processing learning sets of
vectors $x_l(j)$, where $l = 1, 2$ is the class index, $j = 1, \dots, n_l$, and $n_l$ is the number of vectors in the learning set of class $l$.
Initially each vector consists of p features heuristically regarded as relevant for the given discrimination
problem. The feature selection procedure consists of p steps. At an intermediate step some k < p features
are involved, and the $x_l^k(j)$ are k-dimensional vectors of the features selected up to this step. As the basis for
feature selection we used the stochastic Kullback-Mahalanobis distance D(k) between the two k-dimensional
probability distributions of the vectors $x_l^k(j)$, estimated from the learning sets. At
the first step of the selection procedure the values of the D(1) functional are calculated for each of the p
features. The feature for which the maximum of these p values is attained is selected. At the
second step, p-1 values of the D(2) functional are calculated for feature pairs. The first member of
every pair is the previously selected feature; the second member is one of the remaining features. The second
feature is selected as the one providing the maximum of these D(2) values. At the k-th step of the selection
procedure, the values of the D(k) functional are calculated for learning sets of vectors consisting of k
features. The first k-1 components of these vectors are the features selected at the
previous steps, and the k-th component is one of the remaining features; the k-th feature is selected as the one
providing the maximum of the D(k) functional. The procedure described rearranges the initial order of
features in the vectors of the learning sets to provide the most rapid increase of the Mahalanobis
distance as the number of features increases.
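The stepwise ordering described above can be sketched as follows. This is a minimal illustration, not the authors' SNDA code: it assumes that D(k) is estimated as the squared Mahalanobis distance between the class means with a pooled sample covariance, and the names `mahalanobis_sq` and `forward_select` are hypothetical.

```python
import numpy as np

def mahalanobis_sq(x1, x2):
    """Squared Mahalanobis distance between two learning sets (rows = vectors).

    Assumption: a pooled sample covariance is used for both classes.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    d = x1.mean(axis=0) - x2.mean(axis=0)
    s1 = np.atleast_2d(np.cov(x1, rowvar=False))
    s2 = np.atleast_2d(np.cov(x2, rowvar=False))
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    return float(d @ np.linalg.solve(pooled, d))

def forward_select(x1, x2):
    """Order features so that the distance grows as fast as possible.

    Returns the feature order and the distance reached after each step.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    remaining = list(range(x1.shape[1]))
    selected, distances = [], []
    while remaining:
        # Try each remaining feature together with those already selected.
        scores = [mahalanobis_sq(x1[:, selected + [f]], x2[:, selected + [f]])
                  for f in remaining]
        best = int(np.argmax(scores))
        selected.append(remaining.pop(best))
        distances.append(scores[best])
    return selected, distances
```

Calling `forward_select(x1, x2)` on two learning sets with p columns returns the rearranged feature order and the corresponding sequence of distance values, one per step.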
To select the most informative subset of features, at each step
k = 1, 2, ..., p of the selection procedure we calculated the estimate of the misclassification probability P(k) given by the
Kolmogorov-Deev formula [1]. In practice the Mahalanobis distance D(k) is, as a rule, a monotonically
increasing function of k whose growth levels off as $k \to p$. Because the error due to estimating the
discrimination rule from a finite learning set keeps growing with k, the function P(k) has a
minimum at some step $k_0$ between 1 and p. Thus the most informative set of features is the set
selected at steps 1, ..., $k_0$ of the procedure described. These features provide the minimum total
misclassification probability for the given set of learning observations.
Cross validation procedure for estimation of misclassification probability. Estimating the
misclassification probability by the Kolmogorov-Deev formula provides a fast and effective procedure
for feature selection [2,3]. However, it is an asymptotic formula, and it often gives higher
misclassification probabilities than those actually achieved in practical experiments with a
moderate number (several dozen) of learning vectors. The most realistic estimate of the misclassification probability is
provided by examining the learning vector sets with the “cross validation” procedure [5]. We
made classification decisions using the conventional linear discrimination function (LDF) and quadratic
discrimination function (QDF). It is, of course, possible to make decisions with more
sophisticated statistical discrimination rules or artificial neural network algorithms. However, in all
cases the cross validation algorithm remains the same as described below.
In the cross-validation method, at each step one of the learning vectors $x_l(j)$, $j = 1, \dots, n_l$, $l = 1, 2$, is
eliminated from the learning data set. The remaining vectors are used as the data for LDF or QDF
adaptation (learning). The eliminated vector is then classified by the LDF (or QDF) thus learned. If this vector is
classified incorrectly, i.e. attributed to class 2 instead of class 1 or vice versa, the corresponding count $\nu_{12}$ or $\nu_{21}$
is increased by one. The eliminated feature vector is then returned to the learning data set and the next
vector $x_l(j)$ is extracted. This procedure is repeated for all $n_1 + n_2$ learning vectors. The ratio
$p_0 = (\nu_{12} + \nu_{21})/(n_1 + n_2)$ is an asymptotically unbiased estimate of the total classification error probability. The LDF
(or QDF) values produced by the cross validation procedure for both classes can be ranked by
magnitude. The two ranked LDF (or QDF) sequences make it possible to investigate the “physical” reasons for
misclassifications.
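The leave-one-out loop described above can be sketched as follows for the LDF case. This is a minimal illustration under stated assumptions: a pooled covariance, equal prior probabilities, a zero threshold, and the sign convention that a positive LDF value means class 1. The function names are hypothetical, and a QDF or any other rule could be substituted inside the same loop.

```python
import numpy as np

def ldf_fit(x1, x2):
    """Fit a linear discrimination function LDF(x) = w @ x + b.

    Assumed convention: LDF(x) > 0 attributes x to class 1.
    """
    n1, n2 = len(x1), len(x2)
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    s1 = np.atleast_2d(np.cov(x1, rowvar=False))
    s2 = np.atleast_2d(np.cov(x2, rowvar=False))
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    w = np.linalg.solve(pooled, m1 - m2)
    b = -0.5 * float(w @ (m1 + m2))
    return w, b

def loo_error(x1, x2):
    """Leave-one-out counts of errors and the total error estimate."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    nu12 = nu21 = 0
    for j in range(len(x1)):
        # Remove one class-1 vector, relearn, classify the removed vector.
        w, b = ldf_fit(np.delete(x1, j, axis=0), x2)
        if w @ x1[j] + b <= 0:
            nu12 += 1
    for j in range(len(x2)):
        # Same for each class-2 vector.
        w, b = ldf_fit(x1, np.delete(x2, j, axis=0))
        if w @ x2[j] + b > 0:
            nu21 += 1
    return nu12, nu21, (nu12 + nu21) / (len(x1) + len(x2))
```

Collecting the values `w @ x + b` for the removed vectors, instead of only counting errors, gives the ranked sequences used to inspect individual misclassifications.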
Description of experimental results.
The statistical classification approach described above was applied to discrimination of earthquakes
and explosions in the Israel region. We are grateful to Dr. V. Pinsky of the Israel Geophysical Institute
for a set of seismograms of weak events recorded by stations of the local Israel seismic network [6].
The set consisted of recordings of 28 earthquakes with magnitudes 1.1–2.6 and 25 chemical explosions
with magnitudes 1.3–2.6; every event was recorded by several stations of the network. Fig. 1 shows
sets of earthquake and explosion waveforms; the onsets of the P-waves are aligned, and the seismograms are ordered
by epicentral distance and scaled to the waveform maximum. Comparison of the waveforms
shows that the earthquake and explosion seismograms differ visibly: the
earthquakes have more powerful S-waves relative to P-waves than the explosions. This is
evident for the majority of event wavetrains in spite of the rather poor signal-to-noise ratio of some
seismograms.
We designed an identification procedure in the framework of the Seismic Network Data
Analysis (SNDA) System, a problem-oriented programming shell developed at the Moscow IRIS Data
Analysis Center/SYNAPSE Science Center. The procedure comprises routines for extracting
discrimination features from event waveforms and processing feature vector sets by feature selection,
classification and cross-validation algorithms described above. As discrimination features we used
different ratios of the average and peak power of P- and S-waves in the following frequency bands (in Hz):
band 0 = (1-15), band 1 = (1-3), band 2 = (3-6), band 3 = (6-10), band 4 = (10-15). Besides the power ratios, we measured the
ratio of the peak values of the power spectral densities of the P and S phases in the total frequency band 0 and
the frequencies $f_{mp}$ and $f_{ms}$ at which the peak values are attained. In total, 21 physically meaningful
features were measured for every event wavetrain; the features reflect the phase power distributions
in the total frequency band 0 and the spectral ratios of the S and P phases in the different frequency bands.
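One of the feature types listed above, the S/P ratio of average power per frequency band, can be sketched as follows. This is an illustration only: the periodogram estimate, the half-open band edges, the segment arguments and the names `band_power` and `sp_ratios` are assumptions, and the source does not specify how the P and S windows were chosen.

```python
import numpy as np

# Frequency bands in Hz as listed in the text (band 0 is the total band).
BANDS_HZ = [(1, 15), (1, 3), (3, 6), (6, 10), (10, 15)]

def band_power(segment, fs, band):
    """Average periodogram power of a waveform segment inside one band."""
    seg = np.asarray(segment, dtype=float)
    spec = np.abs(np.fft.rfft(seg)) ** 2 / len(seg)
    freqs = np.fft.rfftfreq(len(seg), d=1.0 / fs)
    lo, hi = band
    mask = (freqs >= lo) & (freqs < hi)
    return float(spec[mask].mean())

def sp_ratios(p_segment, s_segment, fs):
    """S/P average power ratio in each band, for sampling rate fs in Hz."""
    return [band_power(s_segment, fs, b) / band_power(p_segment, fs, b)
            for b in BANDS_HZ]
```

The segments must be long enough that every band contains at least one frequency bin; with a 10 s window the resolution is 0.1 Hz, which is ample for the bands above.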
To make the feature measurement procedure more robust, the average seismic noise power was
measured in each of the same frequency bands inside a noise window preceding the P-wave onset. The
noise power values were subtracted from the corresponding signal phase power values. This ensured
more precise feature measurement even when the signal-to-noise ratio of the event recording was poor. The
features measured from the seismograms were transformed using two successive nonlinear functions:
$y = \ln(x)$ and $z = \tfrac{1}{7}\,(y^{1/7} - 1)$. The transformation provides denser clustering of the earthquake
and explosion points in the multidimensional feature space and thus contributes to more reliable event
discrimination.
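The noise correction and the two-stage transformation can be sketched as below. The transformation is coded as it reads in the text, z = (1/7)(y^(1/7) - 1) with y = ln(x). Two details are assumptions not stated in the source: the real seventh root is taken when y is negative, and the corrected power is clipped at a small positive floor so that the logarithm stays defined. Both function names are hypothetical.

```python
import numpy as np

def corrected_power(signal_power, noise_power, floor=1e-12):
    """Signal power with the noise power subtracted.

    Assumption: the result is clipped at a small positive floor.
    """
    return max(signal_power - noise_power, floor)

def transform(x):
    """Apply y = ln(x) followed by z = (1/7) * (y**(1/7) - 1)."""
    y = np.log(x)
    # Real seventh root, also valid for negative y (assumption).
    root = np.sign(y) * np.abs(y) ** (1.0 / 7.0)
    return (root - 1.0) / 7.0
```

For example, a feature value x = e gives y = 1 and therefore z = 0.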
The processing of the sets of feature vectors started with identifying pairs of strongly statistically
dependent features whose correlation coefficient exceeded 0.75. For every such pair, one of the features
(the one with the smaller one-dimensional Mahalanobis value) was eliminated from the feature vectors.
Thus the number of features was reduced from 21 to 16. The second processing step was the
implementation of the feature selection and cross validation procedures described above. Fig. 2 and
Fig. 3 illustrate the results of these procedures applied to the seismograms recorded at distances of
100-200 km (Fig. 1). Fig. 2 shows the Mahalanobis distance (a) and the theoretical estimates of the
misclassification probability (b). Points on these curves correspond to feature subsets consisting of 1, 2,
..., 16 features rearranged to provide the most rapid increase of the Mahalanobis distance. We see that the
Mahalanobis curve becomes flat after the 5th step and the misclassification probability has a
minimum at this step. The list of features ranked according to their discrimination power allows us to
choose the 5 optimal features providing the minimum misclassification probability. Examining the
physical meaning of the selected features, we see that the conventional discrimination features, the S/P
power ratios typically employed for seismic source discrimination, were automatically selected by our
statistical procedure for the high (above 6 Hz) and low (below 3 Hz) frequency bands. The portions of total
S power at high and low frequencies also proved important for discrimination in this study.
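The first processing step, pruning strongly correlated feature pairs, can be sketched as follows. The source does not say how the correlation was estimated; this sketch assumes a within-class correlation (each class centred on its own mean before pooling) and compares its absolute value with the 0.75 threshold. The name `prune_correlated` is hypothetical.

```python
import numpy as np

def prune_correlated(x1, x2, threshold=0.75):
    """Drop one feature of every strongly correlated pair.

    The feature with the smaller one-dimensional Mahalanobis value is dropped.
    Returns the indices of the features that are kept.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    # Within-class correlation: remove each class mean before pooling (assumption).
    centred = np.vstack([x1 - x1.mean(axis=0), x2 - x2.mean(axis=0)])
    corr = np.corrcoef(centred, rowvar=False)
    # One-dimensional Mahalanobis value of every feature.
    pooled_var = ((n1 - 1) * x1.var(axis=0, ddof=1)
                  + (n2 - 1) * x2.var(axis=0, ddof=1)) / (n1 + n2 - 2)
    d1 = (x1.mean(axis=0) - x2.mean(axis=0)) ** 2 / pooled_var
    p = x1.shape[1]
    keep = list(range(p))
    for i in range(p):
        for j in range(i + 1, p):
            if i in keep and j in keep and abs(corr[i, j]) > threshold:
                keep.remove(i if d1[i] < d1[j] else j)
    return keep
```

The surviving columns, `x1[:, keep]` and `x2[:, keep]`, are then passed to the stepwise feature selection.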
Fig. 3 displays the results of cross validation processing of the earthquake and explosion learning
vectors with the features selected at the previous steps. Fig. 3a shows the output diagram of the cross
validation procedure applied to the LDF decision rule. The LDF values for all examined 5-feature
vectors are ranked by magnitude, separately for earthquakes (class 1) and explosions (class 2).
Comparing these values with the zero threshold, we see that only 2 earthquake LDF values are above the
threshold, so these events are classified as explosions, and only 2 explosions have LDF values
below the threshold and are classified as earthquakes. The total error probability estimate is
7.5% in this case. The analogous cross validation diagram for the quadratic discrimination function
is presented in Fig. 3b. We see that in this case all earthquakes were classified correctly and
the QDF values are above the threshold for only two explosions, so these events are mistakenly
attributed to the earthquake class. Note that these explosions are the same events as in the LDF case.
The total error probability estimate for the QDF is 3.8%.
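The two percentages follow directly from the error counts, assuming that all 28 earthquakes and 25 explosions enter the cross validation count (which is consistent with the quoted values). Writing $\nu_{12}$ and $\nu_{21}$ for the numbers of misclassified earthquakes and explosions (a notation assumed here), the arithmetic is:

```latex
\begin{align*}
\text{LDF:}\quad p_0 &= \frac{\nu_{12} + \nu_{21}}{n_1 + n_2}
   = \frac{2 + 2}{28 + 25} = \frac{4}{53} \approx 0.075 \quad (7.5\%),\\
\text{QDF:}\quad p_0 &= \frac{0 + 2}{28 + 25} = \frac{2}{53} \approx 0.038 \quad (3.8\%).
\end{align*}
```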
Fig. 4 presents a so-called three-dimensional scatter diagram for the 5 selected features. Such a
diagram comprises points with coordinates equal to the values of three chosen features for all the events
under study. In Fig. 4 the points for the three features with the highest ranks are displayed. One can see
that there are two distinct clusters: one for earthquakes (triangles) and one for explosions (circles). The
earthquake cluster is much denser than the explosion one. It is clearly seen from this three-dimensional
scatter diagram that separating the
earthquake and explosion clusters by a plane in the three-dimensional feature space yields no more than 4 mistakes. As
the results presented in Fig. 4b show, separating the earthquake and explosion clusters by a
hypersphere in the 5-dimensional space of the selected features yields only two mistakes.
Conclusions and recommendations
1) A statistical approach seems to be helpful for precise measurement and selection of
seismogram features optimal for discrimination of earthquakes from explosions.
2) Cross validation procedures should be used for consistent estimation of the misclassification
probability intrinsic to a given region, especially when the number of learning observations of
earthquakes and explosions is not large.
3) Discrimination efficiency of local events tends to improve with increasing distance of the
recording station from the event source. In our experiments the misclassification probability of events
recorded at distances of about 60 km is 9.5% while for the same events recorded at distances of about
140 km it is 3.5%.
References
1. A.D. Deev (1970) Representation of statistics of discrimination analysis and their asymptotic
expansion for space dimension comparable with sample size. Dokl. Akad. Nauk SSSR, v. 195, 759-762
(in Russian).
2. B.R. Levin, E.V. Troitsky (1970) Total probability error in classification of normal populations
differing in vectors of means. Automatics and Telemechanic, n. 1, 54-56 (in Russian).
3. Sh.Yu. Raudis (1976) Limitations of sample size in classification problem. Statistical Problems
of Control, Publ. of Institute of Physics and Mathematics AN Lit. SSR, Vilnius, n. 18 (in Russian).
4. S.L. Tsvang, V.I. Pinsky, E.S. Husebye (1993) Enhanced seismic discrimination using NORESS
recordings from European events. Geophys. J. Int., v. 112, 1-14.
5. K. Fukunaga, D.L. Kessel (1971) Estimation of classification errors. IEEE Trans. on Comp., v. C-20, n. 12, 136-143.
6. Y. Gitterman, T. van Eck (1993) Spectra of quarry blasts and microearthquakes recorded at local
distances in Israel. Bull. Seism. Soc. Am., v. 83, 1799-1812.
Figure captions
Fig. 1. One of the sets of event seismograms used in the discrimination study.
a. Earthquake seismograms. b. Explosion seismograms.
Fig. 2. Selection of the most informative features by the stepwise procedure.
a. Kullback-Mahalanobis distance between the learning distributions as a
function of the number of features used for classification.
b. Probability of classification errors as a function of the number of features used
for classification.
Fig. 3. Values of the linear (a) and quadratic (b) discrimination functions calculated by the
cross validation procedure and ranked by magnitude for each class. Class 1 is
earthquakes, class 2 is explosions. Wrongly classified events are marked by the numbers
of these events.
Fig. 4. Three-dimensional scatter diagrams for a triplet of the selected optimal features.