Task 6 Statistical Approaches
Bob Youngs
NGA Workshop #6
July 19, 2004
Peer-NGA Project

Truncated Data
• Unknown number of recordings where the value of $y_i < Z_{trunc}$; the value of $x_i$ is unknown
[Figure: PGA versus Distance, with the truncation level $Z_{trunc}$ marked (Toro, 1981)]

Truncated Data Statistical Model
Likelihood of observed data, where $f_N$ and $F_N$ denote the normal density and cumulative distribution function:
$$L = \prod_{i \in \text{recorded}} \frac{f_N(y_i \mid x_i, \boldsymbol{\beta})}{1 - F_N(Z_{trunc} \mid x_i, \boldsymbol{\beta})}$$
Solved by maximizing the log-likelihood:
$$\ln(L) = \sum_{i \in \text{recorded}} \left\{ -\frac{\ln(2\pi\sigma^2)}{2} - \frac{\left[\ln(y_i) - \mu(x_i, \boldsymbol{\beta})\right]^2}{2\sigma^2} \right\} - \sum_{i \in \text{recorded}} \ln\left[1 - F_N(Z_{trunc} \mid x_i, \boldsymbol{\beta})\right]$$

Fit to Truncated Data Ignoring Effect
[Figure: Acceleration versus Distance. Legend: data > 0.03g, Generating function, Fit to all data, Fit to data > 0.03]

Fit Using Truncated Data Model
[Figure: Acceleration versus Distance. Legend: data > 0.03g, Generating function, Fit to all data, Truncated fit]

Fit to Simulated Data
[Figure: PGA versus R. Legend: rock model, soil model, fit to rock, fit to soil, simulated data]

Fit to Truncated Simulated Data
[Figure]

Uncertain and Missing Predictor Variables
• Uncertain predictors
  – Magnitude
  – Distance/Rupture Geometry
  – Site parameters (discrete and continuous)
• Missing predictors
  – Site parameters (discrete and continuous)
  – Rupture Geometry (for smaller events)

Predictor Variable Uncertainty
• General model Y = f(X) + ε
  – Observe W, which is imprecisely related to X
• Two types of error processes
  – Error Model W = f(X) + U: applies when one wants X but cannot measure it precisely (“classical” measurement error)
  – Regression Calibration Model X = f(W) + U: one can measure W precisely, but the quantity of interest X is variable (often applies to laboratory studies)

Magnitude Uncertainty (Rhoades, BSSA, 1997)
• Start with random (mixed) effects model
  $$y_{ij} = \beta M_i + f(r_{ij}, \theta) + \eta_i + \varepsilon_{ij}$$
• Reported magnitude, $\hat{M}_i$, contains error $\delta_i \sim N(0, s_i^2)$
  $$\hat{M}_i = M_i + \delta_i$$
• Revised mixed effects model
  $$y_{ij} = \beta \hat{M}_i + f(r_{ij}, \theta) + \eta'_i + \varepsilon_{ij}, \qquad \eta'_i = \eta_i - \beta \delta_i$$
• Solution obtained using “standard” approaches, including analytical inversion of the variance matrix

Magnitude Uncertainty for NGA
• Models likely to be non-linear in magnitude
  $$y_{ij} = f(M_i, r_{ij}, \theta) + \eta_i + \varepsilon_{ij}$$
• Reported magnitude, $\hat{M}_i$, contains error $\delta_i \sim N(0, s_i^2)$
  $$\hat{M}_i = M_i + \delta_i$$
• Revised mixed effects model
  $$y_{ij} = f(\hat{M}_i, r_{ij}, \theta) + \eta'_{ij} + \varepsilon_{ij}, \qquad \eta'_{ij} = \eta_i - \frac{\partial f(M_i, r_{ij}, \theta)}{\partial M_i}\, \delta_i$$
• Variance matrix terms due to the error in magnitude, $\delta_i$, now vary over j; as a result, the matrix is not analytically invertible

Simulation Extrapolation Approach
• Applied in cases where W = X + U with $U \sim N(0, s^2)$
• Simulate a series of data sets with increasingly large measurement error
  $$W_{b,i}(\lambda) = W_i + \lambda^{1/2} U_{b,i}$$
  where $U_{b,i}$ are simulated error terms with mean 0 and variance $s^2$
• For each value of λ, average the parameters of the model Θ over many simulations to obtain an average value $\hat{\Theta}(\lambda)$

Simulation Extrapolation (continued)
• Define a functional relationship for $\hat{\Theta}(\lambda)$
• Extrapolate back to λ = -1
[Figure: Coefficients plotted against λ, axis from -1 to 2]

Example Application of Simulation Extrapolation Approach

Assess the Effect of Magnitude Uncertainty
• Start with a “True” model
• Simulate PGA values from the “True” model using the NGA M-R distribution
  1. Calculate the mean of the model parameters from the simulated data sets (parametric bootstrap)
  2. Obtain the simulated data set whose fitted parameters are closest to the “True” model
• Using the data set from step 2, increase sigma in M using NGA M values. Obtain the mean parameters from 500 simulations of uncertain M

Simulated Data
$$\ln(pga_{rock}) = C_1 + C_2 M + C_3 \ln\left(R + e^{C_4 + C_5 M}\right) + C_6 R$$
$$\ln(pga_{soil}) = \ln(pga_{rock}) + S_C \left[ C_7 + C_8 \ln(pga_{rock} + 0.05) \right]$$
$$\sigma_{pga} = V_1 + V_2 M$$

Missing Predictor Variables
• Site classification variables
  – VS30, NEHRP Categories, Other Site Categories
  – Depth to VS of 1.5 km/sec
• Rupture geometry variables
  – Directivity variables
  – Hanging wall/footwall determinations
  – Confined to smaller events/distant recordings where effect is believed to be minimal?
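
Looking back at the Simulated Data model, a minimal Python sketch of how one record might be drawn from it is given here. It assumes the lost operators in the equations are plus signs, that SC is a 0/1 soil indicator, and that the scatter is normal in ln(pga). The coefficient values and all function names are hypothetical, chosen only so the curves look plausible; they are not values from the presentation.

```python
import math
import random

# Hypothetical coefficients for illustration only (NOT from the slides).
COEF = {
    "C1": -0.6, "C2": 1.0, "C3": -2.1, "C4": 1.3, "C5": 0.25, "C6": -0.002,
    "C7": 0.3, "C8": -0.15, "V1": 1.0, "V2": -0.08,
}

def ln_pga_rock(m, r, c=COEF):
    """ln(pga_rock) = C1 + C2*M + C3*ln(R + exp(C4 + C5*M)) + C6*R."""
    return (c["C1"] + c["C2"] * m
            + c["C3"] * math.log(r + math.exp(c["C4"] + c["C5"] * m))
            + c["C6"] * r)

def ln_pga_soil(m, r, c=COEF):
    """Rock value plus the site term C7 + C8*ln(pga_rock + 0.05) (SC = 1)."""
    rock = ln_pga_rock(m, r, c)
    return rock + c["C7"] + c["C8"] * math.log(math.exp(rock) + 0.05)

def sigma_pga(m, c=COEF):
    """Magnitude-dependent standard deviation, sigma = V1 + V2*M."""
    return c["V1"] + c["V2"] * m

def simulate_ln_pga(m, r, soil, rng, c=COEF):
    """Draw one ln(pga) value: median from the model plus normal scatter."""
    median = ln_pga_soil(m, r, c) if soil else ln_pga_rock(m, r, c)
    return median + rng.gauss(0.0, sigma_pga(m, c))

rng = random.Random(1)  # fixed seed so the sketch is repeatable
records = [(6.5, r, soil, simulate_ln_pga(6.5, r, soil, rng))
           for r in (1.0, 10.0, 100.0) for soil in (False, True)]
for m, r, soil, ln_pga in records:
    site = "soil" if soil else "rock"
    print(f"M={m} R={r:6.1f} km {site}: pga = {math.exp(ln_pga):.4f} g")
```

Repeating the draw over a magnitude-distance sample and refitting the coefficients each time would give the parametric bootstrap used to assess the effect of magnitude uncertainty.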

Reason for Missing Predictors
• Independent of all data
• Dependent on value of the missing predictor
• Dependent on the values of other predictors

Pattern of Missing Predictors
[Figure: missing-data patterns labeled Univariate, Monotone, Special, Random]

Missing Data Methods
• Complete-case analysis
  – Easily implemented
  – Valid inferences when missing predictors depend upon data
  – May lead to elimination of a lot of useful information
  – Useful starting result
• Imputation
  – Missing X’s estimated from correlations with other X’s or X’s and Y’s
  – Typically down-weight imputed observations
• Multiple Imputation
  – Simulate multiple data sets incorporating uncertainty in estimated missing X’s
  – Provides a method for incorporating the effect of imputation uncertainty on estimation
• Maximum Likelihood
  – Need a model for the joint distribution of Y and X, including missing X’s
  – Random missing patterns will need iterative approaches
• Bayesian Simulation Methods
  – e.g. Gibbs sampler
  – Computer intensive (multiple thousands of simulations)

Missing/Uncertain Data
• If missing X’s are estimated from an external model (e.g. VS30), the problem becomes an uncertain predictor problem
• Simulation methods appear to be useful for both problems
• Implement these methods at a later stage of model development to obtain final coefficients and their uncertainty
• Develop an implementation of each developer’s final model to quantify the effects of missing/uncertain data and provide parameter uncertainty
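
As an illustration of the kind of simulation method referred to above, the simulation extrapolation steps (add extra error scaled by λ½, average the fitted parameter, extrapolate to λ = -1) can be sketched in Python. This is a minimal sketch under stated assumptions, not the implementation used for NGA: the response is taken as linear in magnitude, only the slope is tracked, the extrapolant is the exact quadratic through λ = 0, 1, 2, and every name and numerical value is hypothetical.

```python
import random

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def lagrange_extrapolate(xs, ys, x0):
    """Evaluate at x0 the polynomial passing exactly through (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x0 - xj) / (xi - xj)
        total += weight * yi
    return total

def simex_slope(w, y, s, lambdas=(0.0, 1.0, 2.0), n_sim=50, rng=None):
    """SIMEX estimate of the slope when w = x + u with u ~ N(0, s^2)."""
    rng = rng or random.Random(0)
    averaged = []
    for lam in lambdas:
        if lam == 0.0:
            averaged.append(ols_slope(w, y))  # no extra error to simulate
            continue
        slopes = []
        for _ in range(n_sim):
            # W_b(lambda) = W + sqrt(lambda) * U_b, with U_b ~ N(0, s^2)
            w_b = [wi + (lam ** 0.5) * rng.gauss(0.0, s) for wi in w]
            slopes.append(ols_slope(w_b, y))
        averaged.append(sum(slopes) / len(slopes))
    # Extrapolate the averaged slopes back to lambda = -1 (no measurement error)
    return lagrange_extrapolate(lambdas, averaged, -1.0), averaged

# Hypothetical demonstration data: true slope 1.0, magnitude error s = 0.5
rng = random.Random(42)
TRUE_SLOPE, S = 1.0, 0.5
m_true = [rng.gauss(6.0, 1.0) for _ in range(2000)]
m_obs = [m + rng.gauss(0.0, S) for m in m_true]
y = [-4.0 + TRUE_SLOPE * m + rng.gauss(0.0, 0.3) for m in m_true]

naive = ols_slope(m_obs, y)
corrected, path = simex_slope(m_obs, y, S, rng=rng)
print(f"naive slope {naive:.3f}, SIMEX slope {corrected:.3f}, true {TRUE_SLOPE}")
```

In this setup the naive slope is attenuated toward zero and the extrapolated value recovers most, though not all, of the bias; the residual depends on how well the chosen extrapolant matches the true dependence of the parameter on λ.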