Stat 401B Fall 2015 Lab #4 (Due September 24)

1. Start RStudio and type the attached Code Set #1 into the upper left pane and run it. This is a simulation version of the "propagation of error"/"error analysis" problem of Section 5 Exercise 2, Section 5.5 page 321 of V&J.

   a) Approximately what mean and standard deviation should be used to describe what is known about the coefficient of linear expansion of brass based on the values given in this exercise?

   b) One simulation-based way of trying to assess the importance of uncertainty in the various measured values in producing an overall uncertainty in $\alpha$ is to set all but one value at exactly their means and consider only variability in that one input. Do this in turn for each of $L_1$, $L_2$, $T_1$, and $T_2$. Uncertainty in which of these seems to be the biggest contributor to overall uncertainty in $\alpha$? (Compare 4 standard deviations.)

   c) Compare the root of the sum of the squares of your 4 standard deviations in b) to your standard deviation in a). Does it seem that non-linearity of $\alpha$ in the inputs is important here?

2. Using Code Set #1 and 1) above as guides, use simulation to do Section 5 Exercise 5, Section 5.5, page 322 of V&J.

3. Type Code Set #2 into the upper left pane and run it. This is a simulation and some summaries for the distributions of $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ and $\hat{Z} = \frac{\bar{X} - \mu}{s/\sqrt{n}}$ for $n = 5$ and $X_1, X_2, X_3, X_4, X_5$ independent and identically distributed (iid) $U(0,1)$ (so that $\mu = .5$ and $\sigma^2 = 1/12$).

   a) How do the $n = 5$ distributions of $Z$ and $\hat{Z}$ compare to each other and to the standard normal distribution?

   b) Appropriately modify the code and rerun it for the case of $n = 100$. How do the $n = 100$ distributions of $Z$ and $\hat{Z}$ compare to each other, the $n = 5$ distributions, and the standard normal distribution?

4. Type Code Set #3 into the upper left pane and run it. This is a simulation intended to study the effectiveness of the confidence interval formulas $\bar{X} \pm z\,\sigma/\sqrt{n}$ and $\bar{X} \pm z\,s/\sqrt{n}$ in the case that the distribution being sampled is $U(0,1)$ and $n = 5$.
In its initial form it uses the value $z = 1.96$.

   a) What is the target confidence level for two-sided intervals with the end-points above and $z = 1.96$? Explain.

   b) What are the actual approximate confidence levels achieved in this context? Which is closer to your answer for a)?

   c) Which ones (if any) of the first 10 samples of size 5 fail to produce intervals covering $\mu$? (Answer this for both the "known sigma" and "unknown sigma" interval formulas.)

   d) Why is your answer to b) consistent with the results of part 3a above?

   e) Appropriately modify the code and rerun it for the case of $n = 100$. Now how do the actual approximate confidence levels achieved compare to your answer for a)?

Code Sets for Stat 401B Laboratory #4

#Code Set 1
#Here is some code for Exercise 2 Section 5.5
alpha<-function(l1,l2,t1,t2){
  (l2-l1)/(l1*(t2-t1))
}
L1<-rnorm(10000,mean=1,sd=.00005)
L2<-rnorm(10000,mean=1.00095,sd=.00005)
T1<-rnorm(10000,mean=50,sd=.1)
T2<-rnorm(10000,mean=100,sd=.1)
a<-rep(0,10000)
for(i in 1:10000) {a[i]<-alpha(L1[i],L2[i],T1[i],T2[i])}
summary(a)
sd(a)
hist(a)

#Code Set 2
#Here is some code for studying the actual distribution of
#some approximately standard normal variables built in "iid"
#(random sampling from a fixed distribution) models
M<-matrix(runif(50000,min=0,max=1),nrow=10000,byrow=TRUE)
av<-1:10000
for (i in 1:10000){
  av[i]<-mean(M[i,])
}
z<-1:10000
for (i in 1:10000){
  z[i]<-(av[i]-.5)*sqrt(60)
}
hist(z,freq=FALSE)
curve(dnorm(x),add=TRUE)
plot(ecdf(z))
curve(pnorm(x),add=TRUE)
#Now use the sample standard deviation rather than the model sigma=1/sqrt(12)
s<-1:10000
for (i in 1:10000){
  s[i]<-sd(M[i,])
}
z<-1:10000
for (i in 1:10000){
  z[i]<-(av[i]-.5)*sqrt(5)/s[i]
}
hist(z,freq=FALSE)
curve(dnorm(x),add=TRUE)
plot(ecdf(z))
curve(pnorm(x),add=TRUE)

#Code Set 3
#Here is some code for making and checking the performance
#of CIs for mu (for U(0,1) observations)
#First Use the model Standard deviation
M<-matrix(runif(50000,min=0,max=1),nrow=10000,byrow=TRUE)
#av is initialized here so that Code Set 3 also runs on its own
#(it was previously created only in Code Set 2)
av<-rep(0,10000)
Low<-rep(0,10000)
Up<-rep(0,10000)
chk<-rep(0,10000)
for (i in 1:10000){
  av[i]<-mean(M[i,])
}
for(i in 1:10000) {Low[i]<-av[i]-1.96*sqrt(1/60)}
for(i in 1:10000) {Up[i]<-av[i]+1.96*sqrt(1/60)}
for(i in 1:10000) {if((Low[i]<.5)&(.5<Up[i])) chk[i]<-1}
cbind(Low[1:10],Up[1:10],chk[1:10])
mean(chk)
#Now use the sample standard deviation
Low<-rep(0,10000)
Up<-rep(0,10000)
chk<-rep(0,10000)
s<-rep(0,10000)
for (i in 1:10000){
  s[i]<-sd(M[i,])
}
for(i in 1:10000) {Low[i]<-av[i]-1.96*s[i]*sqrt(1/5)}
for(i in 1:10000) {Up[i]<-av[i]+1.96*s[i]*sqrt(1/5)}
for(i in 1:10000) {if((Low[i]<.5)&(.5<Up[i])) chk[i]<-1}
cbind(Low[1:10],Up[1:10],chk[1:10])
mean(chk)
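
The coverage check in Code Set 3 can also be written without explicit loops. The following is only a minimal sketch, not part of the original lab code: the function name coverage and its arguments n, z, and reps are hypothetical, and it assumes the same U(0,1) model with mu = .5 and sigma = 1/sqrt(12). It relies on the fact that an interval av +/- h covers .5 exactly when |av - .5| < h.

```r
# Hypothetical helper (an assumption, not from the lab handout): estimates
# the fraction of intervals covering mu = .5 for U(0,1) samples of size n.
coverage <- function(n, z = 1.96, reps = 10000) {
  # one simulated sample of size n per row
  M <- matrix(runif(reps * n, min = 0, max = 1), nrow = reps, byrow = TRUE)
  av <- rowMeans(M)          # sample mean of each row
  s <- apply(M, 1, sd)       # sample standard deviation of each row
  sigma <- sqrt(1 / 12)      # model standard deviation of U(0,1)
  half_known <- z * sigma / sqrt(n)    # half-width, known sigma
  half_unknown <- z * s / sqrt(n)      # half-widths, sigma replaced by s
  c(known_sigma = mean(abs(av - .5) < half_known),
    unknown_sigma = mean(abs(av - .5) < half_unknown))
}

set.seed(401)   # arbitrary seed, used only to make a run reproducible
coverage(n = 5)
```

Because the sample size is an argument here, the same call with a different n could serve as a cross-check on a hand-modified copy of Code Set 3.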