THE PRINCIPLES OF MONTE CARLO SIMULATION
Lecture Three: Statistical Models and Stationarity

• Statistical Inference
  – Parameter Estimation
  – Confidence Limits on Parameters
  – Hypothesis Testing
• Stationarity
• Representative Statistics
• Implication of Scale and Space/Time Coordinates
• Regression

Statistical Inference in MCS

Need for Statistical Inference
• Decision making requires a model of the populations involved.
• Statistics allow us to infer those models.
• The formalism is:
  – Interpret the value z(x) at location x (spatial, temporal or spatiotemporal) as an outcome of a random variable Z(x), which has some probability distribution F.
  – Interpret the set of random variables {Z(x), for all x} as the underlying random function.
• We need to assume stationarity of the data to be able to make inferences from the sample set.
• The sample set must be representative of the population.

The Decision of Stationarity
• It is impossible to infer the random function Z(x) from only one realization z(x) (the data). What does F(z) mean if we have only one datum per location (in space or time)?
• Assuming that the same random function applies to all coordinates x (locations or times), we can use the data z(x) to infer the underlying random function Z(x).

Stationarity
• Stationarity works as an export license to use a set of data to infer the population parameters: mean, covariance, …

The Decision of Stationarity
• Pooling data in a histogram assumes they come from the same population (we assume stationarity, even if we don’t know what that means!).
• Evident example: Would you estimate population parameters using this histogram? Clearly, there are two populations: we must infer the population parameters separately!

The Decision of Stationarity
• Stationarity can be under translations (homogeneity) and/or rotations (isotropy).
• Stationarity is a property of the random function (RF) model. It is not a characteristic of the phenomenon under study.
  Stationarity is a decision made by the user in order to make reliable inferences.
• Exploratory data analysis may indicate the existence of several populations with significantly different statistics.
• Consider the possibility of subdividing the area into more homogeneous subzones, conditioned by:
  – The availability of enough data to infer the parameters of each separate RF.
  – The ability to delineate the different populations both in the data and at unsampled locations (may need qualitative or secondary data).

Definition of Stationarity
• Stationarity of order 2: An RF is said to be stationary of order 2 when:
\[ E\{Z(x)\} = m \quad \forall x \]
\[ C(h) = E\{Z(x+h) \cdot Z(x)\} - m^2 \quad \forall x \]
  The stationarity of the covariance implies the stationarity of the variance, C(0), and of the variogram, γ(h) = C(0) − C(h).
• Intrinsic stationarity: An RF is said to be intrinsically stationary when:
\[ E\{Z(x)\} = m \quad \forall x \]
\[ \mathrm{Var}\{Z(x+h) - Z(x)\} = E\{[Z(x+h) - Z(x)]^2\} = 2\gamma(h) \quad \forall x \]
  That is, the increments are stationary, but the covariance need not be.

Ergodic Fluctuations
• Because the model statistics are inferred from sample statistics that are uncertain owing to limited sample size, the model statistics cannot be specified exactly from limited data.
• A stationary RF is said to be “ergodic” in the parameter µ if the corresponding realization statistic tends toward µ as the size of the field increases.
• Ergodic fluctuations allow one to account indirectly for the uncertainty about sample statistics.
• Removal of ergodic fluctuations may lead to a false sense of certainty.

Ergodic Fluctuations
[Figure: φ1 = 12%, φ2 = 9%, φ3 = 18%; ergodic average = 15%]

Scale Effect
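
Looking back at the ergodic fluctuations described above, a small numerical sketch may help show how realization statistics scatter around the model parameter µ and tighten as the field grows. Everything specific here is an assumption made for illustration and is not taken from the lecture: the RF is modelled as a one-dimensional stationary Gaussian AR(1) sequence, and the function names, the model mean of 0.15, the standard deviation of 0.04, and the correlation 0.9 are all hypothetical choices.

```python
import random
import statistics


def ar1_realization(n, mean, sigma, rho, rng):
    """Return one realization of a stationary Gaussian AR(1) sequence of length n.

    Hypothetical stand-in for a stationary RF: every location has the same
    mean and variance, and neighbouring values are correlated through rho.
    """
    # Draw the first residual from the stationary distribution N(0, sigma^2).
    z = rng.gauss(0.0, sigma)
    # Innovation spread chosen so the residual variance stays equal to sigma^2.
    innovation_sd = sigma * (1.0 - rho ** 2) ** 0.5
    values = []
    for _ in range(n):
        values.append(mean + z)
        z = rho * z + rng.gauss(0.0, innovation_sd)
    return values


def spread_of_realization_means(n, n_realizations=200, mean=0.15,
                                sigma=0.04, rho=0.9, seed=0):
    """Average and standard deviation of the realization means for field size n."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(ar1_realization(n, mean, sigma, rho, rng))
        for _ in range(n_realizations)
    ]
    # The standard deviation of the means measures the ergodic fluctuations.
    return statistics.fmean(means), statistics.pstdev(means)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        avg, sd = spread_of_realization_means(n)
        print(f"field size {n:5d}: average of means = {avg:.4f}, spread = {sd:.4f}")
```

Under these assumptions, each realization mean differs from the model mean, and the spread of those means shrinks as the field size increases, which is what being ergodic in µ amounts to. Forcing every realization to reproduce the model mean exactly would discard that spread, and with it the indirect account of sample uncertainty that the fluctuations provide.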