Preliminary Results:
This document is divided into two parts:
1. Discussion of EM simulation.
2. Discussion of SCEC data.
The EM Simulation Part:
I. Motivation:
One assumption commonly made when fitting ETAS (epidemic-type aftershock sequence) models is that the unobserved data, i.e. events below the magnitude cutoff, do not influence the model. However, parameter estimates may be biased under this assumption. According to Sornette and Werner (2005), the bias can be considerable: for example, the apparent branching ratio and the apparent background source would be off substantially. Sornette and Werner investigated this issue analytically, deriving formulas from seismological theory without simulation or analysis of real data, and used fixed parameter values throughout their study. Our research, in contrast, aims to demonstrate the bias of the parameter estimates through computer simulation and to seek reasonable theoretical support.
II. Description of Simulation:
We currently have computational tools for maximum likelihood estimation (MLE) based on the EM algorithm (Veen 2006). Our tools can simulate ETAS data, given the "true" seismic parameter values of interest (denoted a, c, d, K0, p and q) and a conventional model (Ogata 1998). From the simulated data, our tools compute the estimate of each parameter by implementing the EM algorithm; the differences between the "true" and estimated values are then computed. The "true" parameter values we use are those suggested by Veen (2006).
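The simulate-then-estimate loop can be sketched in miniature. Below is a toy, purely temporal branching simulation (no spatial component and no EM fitting step), with illustrative parameter values rather than those of Veen (2006):

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's method for a Poisson draw (adequate for small means)."""
    threshold, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= threshold:
            return k
        k += 1

def simulate_etas(mu=0.1, K0=0.02, a=1.0, c=0.01, p=1.3,
                  beta=math.log(10), M0=2.0, T=500.0, seed=1):
    """Toy temporal ETAS simulation: background events arrive as a
    homogeneous Poisson process with rate mu; an event of magnitude m
    triggers Poisson(K0 * exp(a*(m - M0)) * c**(1-p)/(p-1)) offspring
    with Omori-law delays; magnitudes are Gutenberg-Richter, i.e.
    exponential with rate beta above the cutoff M0."""
    rng = random.Random(seed)
    events, t = [], 0.0
    while True:                      # background events on [0, T]
        t += rng.expovariate(mu)
        if t > T:
            break
        events.append((t, M0 + rng.expovariate(beta)))
    queue = list(events)
    while queue:                     # cascade of triggered generations
        t0, m0 = queue.pop()
        mean_offspring = K0 * math.exp(a * (m0 - M0)) * c**(1 - p) / (p - 1)
        for _ in range(sample_poisson(mean_offspring, rng)):
            u = rng.random()         # inverse-CDF draw from the Omori law (t + c)**-p
            t1 = t0 + c * (u ** (-1.0 / (p - 1)) - 1.0)
            if t1 <= T:
                child = (t1, M0 + rng.expovariate(beta))
                events.append(child)
                queue.append(child)
    return sorted(events)

catalog = simulate_etas()
```

In the real study, the EM fit would then be run on each such catalog and the estimates compared with the "true" inputs.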
Our simulation will use the homogeneous ETAS model, which assumes the background rate is constant over the region considered. This is a simpler way to approach the problem, and if our assertion is supported by the computational results, it should be reasonable to extend it to the inhomogeneous case.
We modified our computational programs to simulate the ETAS data with minimum magnitude cutoff M0 = 2. Next, we pretend not to have "observed" the data below a new cutoff, set greater than or equal to M0 = 2. The new cutoffs range from 2 to 3.5 in increments of 0.1, so 16 different cutoff values are used in the EM computation.
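The cutoff grid and the "did not observe" step amount to simple thinning of the catalog; a minimal sketch, with a made-up toy catalog:

```python
# The 16 cutoff values: 2.0, 2.1, ..., 3.5.
cutoffs = [round(2.0 + 0.1 * i, 1) for i in range(16)]

# "Not observing" the data below a new cutoff is just filtering the catalog.
catalog = [(1.2, 2.4), (3.5, 2.05), (7.1, 3.8)]  # toy (time, magnitude) pairs
new_M0 = 2.1
observed = [(t, m) for (t, m) in catalog if m >= new_M0]
print(len(cutoffs), observed)
```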
For each single M0, simulated data are generated 50 times. This enables us to see the distribution of the estimates for each fixed M0, and how spread out or clustered their averages are. For clear visualization, plots are generated in two forms:
1. All 50 simulation estimates vs. the cutoff magnitude;
2. The means (averages) of every 10 simulations vs. the cutoff magnitude.
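The second plot form, block means over the 50 replicates, can be sketched as follows (the estimates here are placeholder random numbers, not real EM output):

```python
import random

random.seed(0)
# Placeholder for 50 EM estimates of one parameter at a fixed cutoff M0.
estimates = [1.0 + random.gauss(0, 0.1) for _ in range(50)]

# Plot form 2: the mean of every block of 10 replicates.
block_means = [sum(estimates[i:i + 10]) / 10 for i in range(0, 50, 10)]
print(len(block_means))  # 5 block means per cutoff
```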
III. “Work Plan” -- Observations and Problems to be Answered:
1. Comparison of different parameters' bias: Based on the plots produced, we have observed that some parameter estimates (a and K0, and especially K0) show a somewhat fan-shaped pattern. We have also observed some extremely high estimates of the parameter a; this happens rarely and only at high cutoffs (e.g. 3.5), and it raises an issue for study. As simulations with cutoff magnitudes above 3.5 (M0 > 3.5) have yet to be produced, we will aim for higher values of M0 for a complete investigation. We will seek to answer: What is the effect of the minimum cutoff, and why do certain parameters show this phenomenon but not others? If there is bias in the estimation, what is it, how large is it, and how does it scale with M0? Are we as confident in estimating a parameter when M0 is large?
2. How sensitive is the bias to changes in the simulation parameter values and the minimum cutoff: Take the parameter a as an example, and let D = a's estimate − a's true value; our simulation shows that D increases with M0. The same happens for K0. With the standard deviation of the estimators growing as M0 increases, we are less confident in the parameter estimates. If we can find out why this happens, we can better explain the bias of the parameter estimation. Sornette and Werner commented that when (b − a) is negative, M0 dominates. We will include different values of the parameters a and b in our simulation to see how the parameter estimates are affected, if at all. (Here a is the productivity law constant and b is the Gutenberg-Richter constant.) By varying the simulation parameters, we will seek to answer: when using different parameters to simulate ETAS data, will we see the same kind of bias (if there is any)? In what situations will the differences occur?
3. Application and future contribution: Our simulation so far uses algorithms written in R. R is a high-level, console-based statistical package, and it requires long simulation times. We will also seek to implement the computation in a lower-level language, hoping to reduce the computing time and gain more control over it for larger numbers of loops. We will also seek to enhance our computing tools into a statistical package for the scientific community.
The SCEC Data Fitting Part:
I. Description of the Data, Variables and Model:
The SCEC (Southern California Earthquake Center) data consist of 6796 cases, with variables for the time of earthquake occurrence, depth, magnitude, and location. Location was measured in longitude and latitude, which were transformed into the variables x and y, respectively. Longitude ranges from −121.9 to −114 and latitude from 32 to 37; the transformed x ranges from −349 to 375 and the transformed y from −279 to 278.
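The exact map projection used for the transformation is not stated here; as an assumption, an equirectangular projection about the catalog's center reproduces the reported y range (about ±278 km) and roughly the x range:

```python
import math

LON_MIN, LON_MAX = -121.9, -114.0
LAT_MIN, LAT_MAX = 32.0, 37.0
LON0 = (LON_MIN + LON_MAX) / 2   # hypothetical reference point at the center
LAT0 = (LAT_MIN + LAT_MAX) / 2
KM_PER_DEG = 111.32              # kilometres per degree of latitude

def to_xy(lon, lat):
    """Assumed equirectangular degrees-to-kilometres transform."""
    x = (lon - LON0) * KM_PER_DEG * math.cos(math.radians(LAT0))
    y = (lat - LAT0) * KM_PER_DEG
    return x, y

print(to_xy(-114.0, 37.0))   # roughly (362, 278)
```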
II. Model Fitting:
We have analyzed the data using the EM algorithm with magnitude cutoff M0 = 3, which is in fact the minimum of the observed magnitudes, and with the Gutenberg-Richter minimum and maximum set to 2 and 8, respectively. The model assumes a homogeneous background rate. The parameters are estimated; summary statistics of the variables and histograms are included. Magnitude follows an exponential distribution (Ogata 1988), and the histogram visually confirms this.
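A quick numerical check of the exponential (Gutenberg-Richter) magnitude law, using b = 1 as an assumed typical value rather than a fitted SCEC result:

```python
import math
import random

# Above a cutoff M0, Gutenberg-Richter magnitudes are exponential with
# rate beta = b * ln(10); b = 1 is an assumed, typical value here.
random.seed(42)
b, M0 = 1.0, 3.0
beta = b * math.log(10)
mags = [M0 + random.expovariate(beta) for _ in range(100_000)]

# The exponential mean M0 + 1/beta is one quick check to pair with a histogram.
mean_mag = sum(mags) / len(mags)
print(round(mean_mag, 2))  # close to 3.0 + 1/2.303, i.e. about 3.43
```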
III. “Work Plan” -- Observations and Problems to be Answered:
1. Unreasonable branching ratio: One issue we have observed in the SCEC data fitting is that the estimated parameters produce an expected branching ratio above 1, using form (2.3) in Ogata (1998). This is troubling, since in practice it corresponds to an explosive (supercritical) process. Why this happens, and whether there is a better way to interpret the branching ratio's estimate, will also be part of our study.
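For reference, one common closed form of the branching ratio, assuming the triggering kernel K0·exp(a(m − M0))·(t + c)^−p·(r² + d)^−q and a Gutenberg-Richter magnitude density beta·exp(−beta(m − M0)); whether this matches form (2.3) in Ogata (1998) exactly should be checked against that paper:

```python
import math

def branching_ratio(K0, a, c, p, d, q, beta):
    """Expected offspring per event for the kernel
    K0 * exp(a*(m - M0)) * (t + c)**-p * (r**2 + d)**-q,
    averaged over a Gutenberg-Richter density beta * exp(-beta*(m - M0)).
    Requires p > 1, q > 1 and beta > a; this parametrization is an
    assumption here, to be checked against Ogata (1998)."""
    magnitude_term = beta / (beta - a)             # average of exp(a*(m - M0))
    time_term = c ** (1 - p) / (p - 1)             # integral of (t + c)**-p over t > 0
    space_term = math.pi * d ** (1 - q) / (q - 1)  # integral of (r**2 + d)**-q over the plane
    return K0 * magnitude_term * time_term * space_term
```

A value above 1 means each event triggers more than one offspring on average, which is exactly the supercritical situation described above.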
2. Pattern of differences: For the SCEC data, we have also estimated the parameters with different magnitude cutoffs, from 3.1 to 4.2 in increments of 0.1, giving 12 tests. After obtaining the parameter estimates, we take their differences from the results at the original M0 = 3; for example, D = K0 (estimated with M0 = 3.1) − K0 (estimated with M0 = 3). Plots are generated for each of the 6 parameters. Some interesting observations:
(1) K0's plot has an increasing pattern;
(2) c and p have the same pattern;
(3) a, d and q have the same pattern.
K0's pattern seems to echo the simulation described previously, and we suspect there may be a relationship between this parameter and the choice of magnitude cutoff in the EM computation. It is of interest to find the relationship among the parameters just described, and the effect of M0 on them, for the SCEC data.
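The difference computation for one parameter can be sketched as follows (the K0 values are made up for illustration; they are not the actual SCEC fits):

```python
# Cutoffs 3.1, 3.2, ..., 4.2 used in the refits.
cutoffs = [round(3.1 + 0.1 * i, 1) for i in range(12)]

K0_at_3 = 0.05                            # baseline fit at M0 = 3 (toy value)
K0_fits = {m0: K0_at_3 + 0.01 * i         # toy refit values, one per cutoff
           for i, m0 in enumerate(cutoffs)}

# D = K0(estimated with new M0) - K0(estimated with M0 = 3), plotted vs. M0.
D = {m0: K0_fits[m0] - K0_at_3 for m0 in cutoffs}
print(len(D))  # 12
```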
3. Application in simulation: Application to data simulation will be done. In addition to the data analysis, we will use the resulting parameter estimates for simulation, as described in the simulation part. We will observe whether there is a difference between using the parameters estimated by fitting the Southern California data and those provided by Veen (2006). As we implement simulations based on the Southern California parameters, we may gain a better understanding of this area's seismic nature.