Preliminary Results:
This document is divided into two parts:
1. Discussion of EM simulation.
2. Discussion of SCEC data.
The EM Simulation Part:
I. Motivation:
One assumption commonly made when fitting ETAS (epidemic-type aftershock sequence) models is that the unobserved data, i.e. events below the magnitude cutoff, do not influence the model. However, parameter estimates may be biased under this assumption. According to Sornette and Werner (2005), the bias can be considerable: for example, the apparent branching ratio and the apparent background source would be off substantially. Sornette and Werner investigated this issue analytically, deriving formulas from seismological theory without simulation or analysis of real data, and used fixed parameter values throughout their study. Our research, in contrast, aims to demonstrate the bias of the parameter estimates through computer simulation and to seek reasonable theoretical support.
II. Description of Simulation:
We currently have computational tools for maximum likelihood estimation (MLE) based on the EM algorithm (Veen 2006). Our tools can simulate ETAS data, given the "true" seismic parameter values of interest (denoted a, c, d, K0, p and q) and a conventional model (Ogata 1998). From the simulated data, our tools compute the estimate of each parameter by implementing the EM algorithm; the differences between the "true" and estimated values are then computed. The "true" parameter values we use are those suggested by Veen (2006).
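The simulate-then-estimate loop can be sketched in miniature. Below is a toy, purely temporal branching simulation (no spatial component and no EM fitting step), with illustrative parameter values rather than those of Veen (2006):

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's method for a Poisson draw (adequate for small means)."""
    threshold, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= threshold:
            return k
        k += 1

def simulate_etas(mu=0.1, K0=0.02, a=1.0, c=0.01, p=1.3,
                  beta=math.log(10), M0=2.0, T=500.0, seed=1):
    """Toy temporal ETAS simulation: background events arrive as a
    homogeneous Poisson process with rate mu; an event of magnitude m
    triggers Poisson(K0 * exp(a*(m - M0)) * c**(1-p)/(p-1)) offspring
    with Omori-law delays; magnitudes are Gutenberg-Richter, i.e.
    exponential with rate beta above the cutoff M0."""
    rng = random.Random(seed)
    events, t = [], 0.0
    while True:                      # background events on [0, T]
        t += rng.expovariate(mu)
        if t > T:
            break
        events.append((t, M0 + rng.expovariate(beta)))
    queue = list(events)
    while queue:                     # cascade of triggered generations
        t0, m0 = queue.pop()
        mean_offspring = K0 * math.exp(a * (m0 - M0)) * c**(1 - p) / (p - 1)
        for _ in range(sample_poisson(mean_offspring, rng)):
            u = rng.random()         # inverse-CDF draw from the Omori law (t + c)**-p
            t1 = t0 + c * (u ** (-1.0 / (p - 1)) - 1.0)
            if t1 <= T:
                child = (t1, M0 + rng.expovariate(beta))
                events.append(child)
                queue.append(child)
    return sorted(events)

catalog = simulate_etas()
```

In the real study, the EM fit would then be run on each such catalog and the estimates compared with the "true" inputs.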
Our simulation will use the homogeneous ETAS model, which assumes the background rate is constant over the region considered. This is a simpler way to approach the problem, and if our assertion is supported by the computational results, it should be reasonable to extend it to the inhomogeneous case.
We modified our computational programs to simulate the ETAS data with minimum magnitude cutoff M0 = 2. Next, we pretend not to have "observed" the data below a new cutoff, set greater than or equal to M0 = 2. The new cutoffs range from 2 to 3.5 in increments of 0.1, so 16 different cutoff values are used in the EM computation.
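The cutoff grid and the "did not observe" step amount to simple thinning of the catalog; a minimal sketch, with a made-up toy catalog:

```python
# The 16 cutoff values: 2.0, 2.1, ..., 3.5.
cutoffs = [round(2.0 + 0.1 * i, 1) for i in range(16)]

# "Not observing" the data below a new cutoff is just filtering the catalog.
catalog = [(1.2, 2.4), (3.5, 2.05), (7.1, 3.8)]  # toy (time, magnitude) pairs
new_M0 = 2.1
observed = [(t, m) for (t, m) in catalog if m >= new_M0]
print(len(cutoffs), observed)
```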
For each single M0, simulated data are generated 50 times. This enables us to see the distribution of the estimates for each fixed M0, and how spread out or clustered their averages are. For clear visualization, plots are generated in two forms:
1. All 50 simulation estimates vs. the cutoff magnitude;
2. The means (averages) of every 10 simulations vs. the cutoff magnitude.
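The second plot form, block means over the 50 replicates, can be sketched as follows (the estimates here are placeholder random numbers, not real EM output):

```python
import random

random.seed(0)
# Placeholder for 50 EM estimates of one parameter at a fixed cutoff M0.
estimates = [1.0 + random.gauss(0, 0.1) for _ in range(50)]

# Plot form 2: the mean of every block of 10 replicates.
block_means = [sum(estimates[i:i + 10]) / 10 for i in range(0, 50, 10)]
print(len(block_means))  # 5 block means per cutoff
```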
III. “Work Plan” -- Observations and Problems to be Answered:
1. Comparison of different parameters' bias: Based on the plots produced, we have observed that some parameter estimates (a and K0, and especially K0) show a somewhat fan-shaped pattern. We have also observed some extremely high estimates of the parameter a; this happens rarely and only at high cutoffs (e.g. 3.5), and it raises an issue for study. As simulations with cutoff magnitudes above 3.5 (M0 > 3.5) have yet to be produced, we will aim for higher values of M0 for a complete investigation. We will seek to answer: What is the effect of the minimum cutoff, and why do certain parameters show this phenomenon but not others? If there is bias in the estimation, what is it, how large is it, and how does it scale with M0? Are we as confident in estimating a parameter when M0 is large?
2. How sensitive is the bias to changes in the simulation parameter values and the minimum cutoff: Take the parameter a as an example, and let D = a's estimate − a's true value; our simulation shows that D increases with M0. The same happens for K0. With the standard deviation of the estimators growing as M0 increases, we are less confident in the parameter estimates. If we can find out why this happens, we can better explain the bias of the parameter estimation. Sornette and Werner commented that when (b − a) is negative, M0 dominates. We will include different values of the parameters a and b in our simulation to see how the parameter estimates are affected, if at all. (Here a is the productivity law constant and b is the Gutenberg-Richter constant.) By varying the simulation parameters, we will seek to answer: when using different parameters to simulate ETAS data, will we see the same kind of bias (if there is any)? In what situations will the differences occur?
3. Application and future contribution: Our simulation so far uses algorithms written in R. R is a high-level, console-based statistical package, and it requires long simulation times. We will also seek to implement the computation in a lower-level language, hoping to reduce the computing time and gain more control over it for larger numbers of loops. We will also seek to enhance our computing tools into a statistical package for the scientific community.
The SCEC Data Fitting Part:
I. Description of the Data, Variables and Model:
The SCEC (Southern California Earthquake Center) data consist of 6796 cases, with variables for the time of earthquake occurrence, depth, magnitude, and location. Location was measured in longitude and latitude, which were transformed into the variables x and y, respectively. Longitude ranges from −121.9 to −114 and latitude from 32 to 37; the transformed x ranges from −349 to 375 and the transformed y from −279 to 278.
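The exact map projection used for the transformation is not stated here; as an assumption, an equirectangular projection about the catalog's center reproduces the reported y range (about ±278 km) and roughly the x range:

```python
import math

LON_MIN, LON_MAX = -121.9, -114.0
LAT_MIN, LAT_MAX = 32.0, 37.0
LON0 = (LON_MIN + LON_MAX) / 2   # hypothetical reference point at the center
LAT0 = (LAT_MIN + LAT_MAX) / 2
KM_PER_DEG = 111.32              # kilometres per degree of latitude

def to_xy(lon, lat):
    """Assumed equirectangular degrees-to-kilometres transform."""
    x = (lon - LON0) * KM_PER_DEG * math.cos(math.radians(LAT0))
    y = (lat - LAT0) * KM_PER_DEG
    return x, y

print(to_xy(-114.0, 37.0))   # roughly (362, 278)
```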
II. Model Fitting:
We have analyzed the data using the EM algorithm with magnitude cutoff M0 = 3, which is in fact the minimum of the observed magnitudes, and with the Gutenberg-Richter minimum and maximum set to 2 and 8, respectively. The model assumes a homogeneous background rate. The parameters are estimated; summary statistics of the variables and histograms are included. Magnitude follows an exponential distribution (Ogata 1988), and the histogram visually confirms this.
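A quick numerical check of the exponential (Gutenberg-Richter) magnitude law, using b = 1 as an assumed typical value rather than a fitted SCEC result:

```python
import math
import random

# Above a cutoff M0, Gutenberg-Richter magnitudes are exponential with
# rate beta = b * ln(10); b = 1 is an assumed, typical value here.
random.seed(42)
b, M0 = 1.0, 3.0
beta = b * math.log(10)
mags = [M0 + random.expovariate(beta) for _ in range(100_000)]

# The exponential mean M0 + 1/beta is one quick check to pair with a histogram.
mean_mag = sum(mags) / len(mags)
print(round(mean_mag, 2))  # close to 3.0 + 1/2.303, i.e. about 3.43
```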
III. “Work Plan” -- Observations and Problems to be Answered:
1. Unreasonable branching ratio: One issue we have observed in the SCEC data fitting is that the estimated parameters produce an expected branching ratio above 1, using form (2.3) in Ogata (1998). This is troubling, since in practice it corresponds to an explosive (supercritical) process. Why this happens, and whether there is a better way to interpret the branching ratio's estimate, will also be part of our study.
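For reference, one common closed form of the branching ratio, assuming the triggering kernel K0·exp(a(m − M0))·(t + c)^−p·(r² + d)^−q and a Gutenberg-Richter magnitude density beta·exp(−beta(m − M0)); whether this matches form (2.3) in Ogata (1998) exactly should be checked against that paper:

```python
import math

def branching_ratio(K0, a, c, p, d, q, beta):
    """Expected offspring per event for the kernel
    K0 * exp(a*(m - M0)) * (t + c)**-p * (r**2 + d)**-q,
    averaged over a Gutenberg-Richter density beta * exp(-beta*(m - M0)).
    Requires p > 1, q > 1 and beta > a; this parametrization is an
    assumption here, to be checked against Ogata (1998)."""
    magnitude_term = beta / (beta - a)             # average of exp(a*(m - M0))
    time_term = c ** (1 - p) / (p - 1)             # integral of (t + c)**-p over t > 0
    space_term = math.pi * d ** (1 - q) / (q - 1)  # integral of (r**2 + d)**-q over the plane
    return K0 * magnitude_term * time_term * space_term
```

A value above 1 means each event triggers more than one offspring on average, which is exactly the supercritical situation described above.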
2. Pattern of differences: For the SCEC data, we have also estimated the parameters with different magnitude cutoffs, from 3.1 to 4.2 in increments of 0.1, giving 12 tests. After obtaining the parameter estimates, we take their differences from the results at the original M0 = 3; for example, D = K0 (estimated with M0 = 3.1) − K0 (estimated with M0 = 3). Plots are generated for each of the 6 parameters. Some interesting observations:
(1) K0's plot has an increasing pattern;
(2) c and p have the same pattern;
(3) a, d and q have the same pattern.
K0's pattern seems to echo the simulation described previously, and we suspect there may be a relationship between this parameter and the choice of magnitude cutoff in the EM computation. It is of interest to find the relationship among the parameters just described, and the effect of M0 on them, for the SCEC data.
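The difference computation for one parameter can be sketched as follows (the K0 values are made up for illustration; they are not the actual SCEC fits):

```python
# Cutoffs 3.1, 3.2, ..., 4.2 used in the refits.
cutoffs = [round(3.1 + 0.1 * i, 1) for i in range(12)]

K0_at_3 = 0.05                            # baseline fit at M0 = 3 (toy value)
K0_fits = {m0: K0_at_3 + 0.01 * i         # toy refit values, one per cutoff
           for i, m0 in enumerate(cutoffs)}

# D = K0(estimated with new M0) - K0(estimated with M0 = 3), plotted vs. M0.
D = {m0: K0_fits[m0] - K0_at_3 for m0 in cutoffs}
print(len(D))  # 12
```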
3. Application in simulation: Application to data simulation will be done. In addition to the data analysis, we will use the resulting parameter estimates for simulation, as described in the simulation part. We will observe whether there is a difference between using the parameters estimated by fitting the Southern California data and those provided by Veen (2006). As we implement simulations based on the Southern California parameters, we may gain a better understanding of this area's seismic nature.