Chapter 11 Simulation studies in statistics

Introduction
• What is a simulation study, and why do one?
• Simulations for properties of estimators
• Simulations for properties of hypothesis tests
What is a simulation study?
Simulation
• A numerical technique for conducting experiments on the computer.
• It involves random sampling from probability distributions.
Rationale (in statistics)
• Properties of statistical methods must be established so that the
methods may be used with confidence.
• Exact analytical derivations of properties are rarely possible.
• Large sample approximations to properties are often possible.
• But what happens in the finite sample size case?
• And what happens when assumptions are violated?
Usual issues
• Is an estimator biased in finite samples? Is it still consistent under
departures from assumptions? What is its sampling variance?
• How does it compare to competing estimators on the basis of bias,
precision, etc.?
• Does a procedure for constructing a confidence interval for a
parameter achieve the advertised nominal level of coverage?
Usual issues
• Does a hypothesis testing procedure attain the advertised level of
significance or size?
• If it does, what power is possible against different alternatives to
the null hypothesis? Do different test procedures deliver different
power?
How to answer these questions in the absence of analytical results?
Monte Carlo simulation approximation
• An estimator or test statistic has a true sampling distribution
under a particular set of conditions (finite sample size, true
distribution of the data, etc.)
• Ideally, we would want to know this true sampling distribution in
order to address the issues on the previous slide.
• But deriving the true sampling distribution is usually not tractable.
• Hence we use simulation to approximate the sampling distribution of an
estimator or a test statistic under a particular set of conditions.
How to approximate
A typical Monte Carlo simulation involves the following:
• Generate S independent data sets under the conditions of interest
• Compute the numerical value of the estimator/test statistic T from
each data set. Hence we have T_1, T_2, ..., T_s, ..., T_S
How to approximate
• If S is large enough, summary statistics across T_1, T_2, ..., T_s, ..., T_S
should be good approximations to the true properties of the
estimator/test statistic under the conditions of interest.
An example:
Consider an estimator for a parameter θ:
– T_s is the value of T from the s-th data set, s = 1, 2, ..., S.
– The mean of all the T_s's is an estimate of the true mean of the
sampling distribution of the estimator.
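A minimal sketch of this recipe in R is given below; gen_data() and est() are hypothetical placeholders (they are not defined in these notes) standing for the data-generating mechanism and the estimator or test statistic under study.

# Generic Monte Carlo approximation of a sampling distribution
# (gen_data() and est() are hypothetical placeholders)
S <- 1000                 # number of simulated data sets
Tvals <- numeric(S)       # storage for T_1, ..., T_S
for (s in 1:S) {
  y <- gen_data()         # generate the s-th data set under the conditions of interest
  Tvals[s] <- est(y)      # compute the estimator/test statistic from that data set
}
mean(Tvals)               # approximates the mean of the sampling distribution of T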
Simulation for properties of estimators
A simple example:
• Compare three estimators for the mean µ of a distribution based on
a random sample Y_1, Y_2, ..., Y_n
• The three estimators are:
– Sample mean, T^(1)
– Sample median, T^(2)
– Sample 10% trimmed mean, T^(3)
Simulation procedures
For a particular choice of µ, n, and true underlying distribution:
• Generate independent draws Y_1, Y_2, ..., Y_n from the distribution
• Compute T^(1), T^(2), T^(3)
• Repeat S times. Hence we have
T_1^(1), T_2^(1), ..., T_S^(1)
T_1^(2), T_2^(2), ..., T_S^(2)
T_1^(3), T_2^(3), ..., T_S^(3)
Simulation procedures
Compute, for k = 1, 2, 3:
$$\widehat{\mathrm{mean}} = \frac{1}{S}\sum_{s=1}^{S} T_s^{(k)} = \bar{T}^{(k)}$$
$$\widehat{\mathrm{bias}} = \bar{T}^{(k)} - \mu$$
$$\widehat{\mathrm{SD}} = \sqrt{\frac{1}{S-1}\sum_{s=1}^{S}\left(T_s^{(k)} - \bar{T}^{(k)}\right)^2}$$
$$\widehat{\mathrm{MSE}} = \frac{1}{S}\sum_{s=1}^{S}\left(T_s^{(k)} - \mu\right)^2 \approx \widehat{\mathrm{SD}}^{\,2} + \widehat{\mathrm{bias}}^{\,2}$$
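These summaries are easy to compute in R; a minimal sketch, assuming Ts is the vector of S simulated values of T^(k) and mu is the true parameter value:

# Monte Carlo summaries for one estimator, given Ts (a length-S vector) and mu
mc.mean <- mean(Ts)           # estimate of the mean of the sampling distribution
mc.bias <- mc.mean - mu       # estimated bias
mc.sd   <- sd(Ts)             # estimated sampling standard deviation (divisor S - 1)
mc.mse  <- mean((Ts - mu)^2)  # estimated MSE; approximately mc.sd^2 + mc.bias^2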
A simulation study
• To compare 3 estimators for location µ through a simulation study
• The 3 estimators are
– Sample Mean,
– Sample Median,
– Sample 10% trimmed mean
• Underlying distribution: N(1, 1), i.e. µ = 1 and σ = 1 as in the R code below
• Sample size: 15
• Simulation size: 1000
A simulation study: R code
> # To compare 3 estimators for location, mu, through a simulation study
> # The 3 estimators are the sample mean, median, and 10% trimmed mean.
> S=1000 # Simulation size (no. of samples)
> n=15 # Sample size
> mu=1 # Mean of the underlying normal distribution
> sd=1 # Standard deviation of the underlying normal distribution
A simulation study: R code
> meax=numeric(S) # A vector containing all the sample means
> medx=numeric(S) # A vector containing all the sample medians
> trmx=numeric(S) # A vector containing all the sample 10% trimmed means
> stdx=numeric(S) # A vector containing all the sample standard deviations
> set.seed(1234) # Set the random seed
A simulation study: R code
> for (i in 1:S){
+ x=rnorm(n,mu,sd) # Generate a random sample of size n
+ meax[i]=mean(x) # Compute the mean of the i-th sample, i.e. T_i^(1)
+ medx[i]=median(x) # Compute the median of the i-th sample, i.e. T_i^(2)
+ trmx[i]=mean(x,trim=0.1) # Compute the 10% trimmed mean of the i-th sample, i.e. T_i^(3)
+ stdx[i]=sd(x) # Compute the standard deviation of the i-th sample
+ }
A simulation study: R code
> # Compute the mean of the "S" sample means, medians, and trimmed means
> simumean=apply(cbind(meax,medx,trmx),2,mean)
> # The "2" in the second argument asks apply() to find the "means" for all the columns of the matrix given in argument 1.
> # Hence "simumean" is a 3x1 vector consisting of the averages of the "S" means, medians, and 10% trimmed means.
A simulation study: R code
> # Compute the standard deviation of the "S" sample means, medians, and 10% trimmed means
> simustd=apply(cbind(meax,medx,trmx),2,sd)
> # Compute the bias
> simubias=simumean-rep(mu,3)
> # Compute the MSE (mean squared error)
> simumse=simubias^2+simustd^2
A simulation study: R code
> ests=c("Sample mean","Sample median","Sample 10% trimmed mean") # Column headings in the output
> names=c("True value","No. of simu","MC Mean","MC Std Deviation","MC Bias","MC MSE") # Row headings in the output
> sumdat=rbind(rep(mu,3),rep(S,3),simumean,simustd,simubias,simumse)
> dimnames(sumdat)=list(names,ests)
> round(sumdat,4)
A simulation study: R code
[Output of round(sumdat,4): Monte Carlo mean, standard deviation, bias, and MSE for the three estimators]
Checking coverage probability of confidence interval
• Usual 95% confidence interval for µ based on sample mean is given
by
$$\left(\bar{Y} - t_{0.025,\,n-1}\,\frac{s}{\sqrt{n}},\;\; \bar{Y} + t_{0.025,\,n-1}\,\frac{s}{\sqrt{n}}\right)$$
• Does the interval achieve the nominal level of coverage 95%?
Checking coverage probability of confidence interval
> # Check the coverage probability of the confidence interval
> # We make use of the data obtained from the above simulation study
> t05=qt(0.975,n-1) # Get t_{0.025, n-1} (= t_{0.025, 14})
> coverage=sum((meax-t05*stdx/sqrt(n)<=mu)&(meax+t05*stdx/sqrt(n)>=mu))/S
> # The above statement is equivalent to
> # d=0
> # for(i in 1:S){
> #   d=d+((meax[i]-t05*stdx[i]/sqrt(n)<=mu)&(meax[i]+t05*stdx[i]/sqrt(n)>=mu))}
> # coverage=d/S
> coverage
[1] 0.945
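The Monte Carlo estimate of the coverage has its own simulation error: treating each interval's coverage as a Bernoulli trial with success probability 0.95, the standard error based on S = 1000 replications is

> sqrt(0.95*0.05/S) # Monte Carlo standard error of the coverage estimate, about 0.007

so an observed coverage of 0.945 is well within simulation error of the nominal 95% level.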
Simulation for properties of hypothesis tests
Example: Consider the size and power of the usual t-test for the mean
• Test H0 : µ = µ0 against H1 : µ ≠ µ0
• To evaluate whether the size/level of the test achieves the advertised α:
– Generate data under H0 : µ = µ0 and calculate the proportion of
rejections of H0
– This approximates the true probability of rejecting H0 when it is true
– The proportion should be approximately equal to α
Simulation for properties of hypothesis tests
• To evaluate power
– Generate data under some alternative µ ≠ µ0 (say, µ = µ1 )
and calculate the proportion of rejections of H0
• This approximates the true probability of rejecting H0 when the
alternative is true, i.e. the power against µ1
Checking the size of t-test: R code
> # Check the size of the t-test
> mu0=mu # For the size check the data were generated under H0, so mu0 equals mu
> ttests=(meax-mu0)/(stdx/sqrt(n)) # t statistics for all S samples
> size=sum(abs(ttests)>t05)/S # Proportion of rejections at the 5% level
> size
[1] 0.045
Checking the power of t-test: R code
> # Check the power of the t-test when the shift is d = 1*sd
> # Assume the samples were generated from N(mu1, sigma) with mu1 = 1 and sigma = 1
> # Test H0: mu = mu0 vs H1: mu = mu1 = mu0 + sigma, where mu0 = 0
> mu0=0 # Null value; the generated data have mean mu1 = mu0 + sigma
> ttests=(meax-mu0)/(stdx/sqrt(n)) # t statistics for all S samples
> power=sum(abs(ttests)>t05)/S # Proportion of rejections at the 5% level
> power
[1] 0.954
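As a check on this Monte Carlo estimate, the exact power of the one-sample t-test is available from base R's power.t.test(); with n = 15, a one-standard-deviation shift, and a two-sided 5% level it gives a power of roughly 0.95, consistent with the simulated value of 0.954.

> power.t.test(n=15, delta=1, sd=1, sig.level=0.05, type="one.sample") # exact power, about 0.95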
Simulation study in SAS
Simulation to study the size of t-test
* Generate "ns" normal random samples;
* "mcrep" (M.C. replication) is the index of the s-th random sample;
data simu;
  seed=123;
  ns=1000; n=25; mu=0; sigma=1;
  do mcrep=1 to ns;
    do i=1 to n;
      x=mu+sigma*rannor(seed); * Draw from N(mu, sigma);
      output;
    end;
  end;
  keep mcrep x;
run;
Simulation study in SAS
proc sort data=simu;
  by mcrep;
run;
* Perform the t-test of H0: mu=0 for each sample;
* Output the p-value "probt" to the SAS data set "outtest";
proc univariate data=simu noprint mu0=0;
  by mcrep;
  var x;
  output out=outtest probt=p;
run;
Simulation study in SAS
* Flag the samples with "probt" < 0.05;
data outtest;
  set outtest;
  reject=(p<0.05);
run;
* "reject" indicates whether a sample rejects H0 at the 5% level. Hence the
  mean of "reject" over the samples is the rejection rate, "rejrate";
Simulation study in SAS
proc means data=outtest noprint;
  var reject;
  output out=results mean=rejrate;
run;
* "_FREQ_" in the output data set gives the number of Monte Carlo samples;
proc print data=results;
  var _freq_ rejrate;
run;