Quiz 9 & 10 (20 points)
Due: Friday, April 17, 2008, beginning of class
Name: Solution

Exploring the Distribution of Sample Means by Computer Simulation

This is an individual assignment. You are allowed to seek help from persons other than me for programming questions only. I reserve the right to verbally question you about your responses and assign a grade of zero if it becomes apparent that the work was not your own. R has been installed in BRH-205, if you need to use campus computing facilities.

Purpose: to investigate the distribution of sample means under different conditions using computer simulations instead of theory.

1. In this exercise, you will explore the distribution of sample means when the samples are drawn from an Exp(2) distribution.

a. Using R, draw one random sample of size 3 from the Exp(2) distribution. You will use the command rexp(n=3,rate=2). In R, “rate” is what we call λ, n is the sample size, and “rexp” stands for random generation from the exponential distribution. You can store your sample in an object called “data” using the command data <- rexp(n=3,rate=2). Type data to view your sample.

Write your sample here: 0.9867846, 0.1419717, 0.4805372 (answers will vary)
The sample mean is: 0.5364312 (In R, use the command mean(data).)

b. Repeat part (a). Write the resulting sample and sample mean here: 0.7672375, 0.2571384, 0.2458986; sample mean = 0.4234248

c. To understand the behavior of all possible sample means from samples of size 3, we need to repeat part (a) many, many times and record the resulting sample means. This is tedious to do by hand, so use the following lines of code to generate 1000 samples of size 3 from an Exp(2) distribution. Note that # is the comment symbol in R. R will ignore everything on a line after #.
    simdata <- rexp(n=3000,rate=2)                   #generate 3000 random values from Exp(2)
    matrixdata <- matrix(simdata,nrow=1000,ncol=3)   #format simdata as matrix

Now type matrixdata to see the random samples you just generated. Note each row is one sample of size 3. Since there are 1000 rows, we have 1000 samples of size 3. Now get the sample mean of each row of data:

    means.exp <- apply(matrixdata,1,mean)   #takes the mean of each row
    means.exp                               #print the 1000 sample means to the screen
    hist(means.exp)                         #histogram of the 1000 means from samples of size 3
    mean(means.exp)                         #mean of the 1000 sample means
    sd(means.exp)                           #standard deviation of the means

Estimate $\mu_{\bar{X}}$ by the mean of the 1000 sample means: 0.4999
How does the estimate compare to the true value of $\mu_{\bar{X}} = \mu = 1/\lambda$ = 0.5? Very close.

Estimate $\sigma_{\bar{X}}$ using the 1000 sample means: 0.2848
How does it compare to the true value of $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 1/(\lambda\sqrt{n})$ = 1/(2*sqrt(3)) = 0.289?

Attach either a printout or a sketch of the histogram of the 1000 sample means. Do the sample means appear to be normally distributed? (Histogram attached.) No, the sample means do not appear to be normally distributed. The distribution is right skewed.

2. Now repeat exercise 1 for samples of size 15. Start by generating 1000 random samples of size 15 from Exp(2).

    simdata <- rexp(n=15000,rate=2)
    matrixdata <- matrix(simdata,nrow=1000,ncol=15)

Type matrixdata[1:2,] to view the first two rows of the matrix. Use the same R commands as before to obtain the histogram, mean and standard deviation of the 1000 sample means for samples of size 15.

Estimate $\mu_{\bar{X}}$ by the mean of the 1000 sample means: 0.4972355
How does the estimate compare to the true value of $\mu_{\bar{X}} = \mu = 1/\lambda$ = 0.5?

Estimate $\sigma_{\bar{X}}$ using the 1000 sample means: 0.1264290
How does it compare to the true value of $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 1/(\lambda\sqrt{n})$ = 1/(2*3.87) = 0.129? The estimate is close to the true value.
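For reference, the "same R commands as before" applied to samples of size 15 might look like the sketch below. The seed, the object name means.exp15, and the variables lambda and n are assumptions added for illustration; they do not appear in the quiz, and your numbers will differ from the answers recorded here.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible

lambda <- 2  # rate of the Exp(2) distribution
n <- 15      # sample size

# 1000 samples of size n, one sample per row
simdata <- rexp(n = 1000 * n, rate = lambda)
matrixdata <- matrix(simdata, nrow = 1000, ncol = n)

# means.exp15 is a hypothetical name for the 1000 sample means
means.exp15 <- apply(matrixdata, 1, mean)

hist(means.exp15)             # histogram of the 1000 sample means
est.mean <- mean(means.exp15) # estimate of the mean of the sample means
est.sd <- sd(means.exp15)     # estimate of the sd of the sample means

# Theoretical values: mu = 1/lambda, sigma/sqrt(n) = 1/(lambda*sqrt(n))
true.mean <- 1 / lambda
true.sd <- 1 / (lambda * sqrt(n))

print(c(est.mean = est.mean, true.mean = true.mean,
        est.sd = est.sd, true.sd = true.sd))
```

Changing n to 3 reproduces the setup of problem 1, which makes it easy to compare the two histograms side by side.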
Attach either a printout or a sketch of the histogram of the 1000 sample means. Does it look normal? (Histogram attached.)

3. Now draw 1000 samples of size 3 from a N(0,1) distribution. Use the R commands:

    simdata <- rnorm(n=3000,mean=0,sd=1)
    matrixdata <- matrix(simdata,nrow=1000,ncol=3)

Estimate $\mu_{\bar{X}}$ by the mean of the 1000 sample means: -0.0329809
How does the estimate compare to the true value of $\mu_{\bar{X}} = \mu$ = 0? The estimate is close to the true value.

Estimate $\sigma_{\bar{X}}$ using the 1000 sample means: 0.5747191
How does it compare to the true value of $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 1/\sqrt{n}$ = 0.5774?

Attach either a printout or a sketch of the histogram of the 1000 sample means. Does it look normal? Yes.

4. Lastly, draw 1000 samples of size 15 from a N(0,1) distribution. Use the R commands:

    simdata <- rnorm(n=15000,mean=0,sd=1)
    matrixdata <- matrix(simdata,nrow=1000,ncol=15)

Estimate $\mu_{\bar{X}}$ by the mean of the 1000 sample means: -0.01284968
How does the estimate compare to the true value of $\mu_{\bar{X}} = \mu$ = 0? The estimate is close to the true value.

Estimate $\sigma_{\bar{X}}$ using the 1000 sample means: 0.2559351
How does it compare to the true value of $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 1/\sqrt{n}$ = 0.2582?

Attach either a printout or a sketch of the histogram of the 1000 sample means. Does it look normal? Yes.

5. Suppose $\bar{X}$ is the mean of a random sample of size 15 drawn from a population that has the N(0,1) distribution.

a. Calculate $P(\bar{X} < 0.25)$ using theory to obtain the exact probability. (To use R, look up the command pnorm, i.e. type ?pnorm)

$P(\bar{X} < 0.25) = P\left(Z < \frac{0.25 - 0}{1/\sqrt{15}}\right) = P(Z < 0.97) = 0.8340$

b. Approximate $P(\bar{X} < 0.25)$ using the 1000 sample means simulated in problem 4. (Hint: sorting the sample means in ascending order might help. Use the R command sort(x), where x is the name of the vector containing the sample means.)
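Parts (a) and (b) can be carried out in R roughly as follows. This is a sketch under assumptions: the seed and the names means.norm15, exact.prob, approx.prob and count.below are illustrative and not part of the quiz, and the sample means are regenerated here rather than reused from problem 4.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible

# Part (a): the mean of 15 draws from N(0,1) is N(0, sd = 1/sqrt(15))
exact.prob <- pnorm(0.25, mean = 0, sd = 1 / sqrt(15))

# Part (b): regenerate 1000 sample means as in problem 4
simdata <- rnorm(n = 15000, mean = 0, sd = 1)
matrixdata <- matrix(simdata, nrow = 1000, ncol = 15)
means.norm15 <- apply(matrixdata, 1, mean)  # hypothetical name

# Proportion of simulated sample means below 0.25
approx.prob <- mean(means.norm15 < 0.25)

# Same count obtained after sorting, as the hint suggests
count.below <- sum(sort(means.norm15) < 0.25)

print(c(exact = exact.prob, approx = approx.prob))
```

Because pnorm works with the unrounded z-value, exact.prob differs slightly in the fourth decimal place from the table value obtained with z rounded to 0.97.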
$P(\bar{X} < 0.25) \approx \frac{\text{number of sample means less than 0.25}}{\text{number of simulated sample means}} = \frac{844}{1000} = 0.844$ (answers will vary)

6. a. Redo problem #21a in section 4.5 of the Navidi text using a simulation. Compare your approximation to the exact theoretical answer.

Bulb A life = X ~ N(800, sd = 100)
Bulb B life = Y ~ N(900, sd = 150)

Assuming X and Y are independent, Y - X ~ N(100, sd = sqrt(100^2 + 150^2) = 180.3), so

$P(Y > X) = P(Y - X > 0) = P\left(Z > \frac{0 - 100}{180.3}\right) = P(Z > -0.55) = 0.7088$

Simulation: simulate 1000 values of X and 1000 values of Y. Calculate Y - X. Then

$P(Y - X > 0) \approx \frac{\text{number of simulated } Y - X \text{ greater than 0}}{\text{total number of } Y - X \text{ simulated}} = \frac{726}{1000} = 0.726$ (answers will vary but should be close to 0.7088)

b. Let X = life of Bulb A and Y = life of Bulb B. Use a simulation to determine whether the following random variables are approximately normal.

i. Y/X: Close to normal, but a little skewed. Graphs attached.
ii. sin(X): Not normal at all. Graph attached.

c. Use a simulation to approximate the probability that Bulb B lasts over 10% longer than Bulb A.

$P(Y > 1.1X) = P(Y/X > 1.1) \approx \frac{\text{number of simulated } Y/X \text{ over 1.1}}{\text{total number of } Y/X \text{ simulated}} = \frac{547}{1000} = 0.547$ (answers will vary)

d. In general, when using computer simulation methods to approximate probabilities, how can you improve the accuracy of your approximation? For example, in part (c) what can you do to increase the accuracy of your answer?

Simulate more repetitions of the experiment. So in part (c), simulate more values of Y/X.

e. Are sample means always normally distributed?

No. The sample means in problem 1 are not normally distributed. Sample means will be approximately normally distributed if the sample size is large. If the sample size is small, we only know sample means are normally distributed if the sample is drawn from a normally distributed population.

Attached graphs (not reproduced here): 1), 2), 3), 4), 6b)
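The simulations in problem 6 can be sketched in R as follows. This is one possible approach, not necessarily the one used to produce the answers above. The seed and the names nsim, x, y, sim.prob, exact.prob, ratio.prob and se are assumptions introduced for illustration, and X and Y are simulated as independent.

```r
set.seed(1)   # assumed seed, only so the sketch is reproducible
nsim <- 1000  # number of simulated bulbs of each type

x <- rnorm(nsim, mean = 800, sd = 100)  # Bulb A lifetimes
y <- rnorm(nsim, mean = 900, sd = 150)  # Bulb B lifetimes

# Part (a): simulated and exact P(Y - X > 0)
sim.prob <- mean(y - x > 0)
exact.prob <- pnorm(0, mean = 100, sd = sqrt(100^2 + 150^2),
                    lower.tail = FALSE)

# Part (b): histograms to judge approximate normality
hist(y / x)
hist(sin(x))

# Part (c): simulated P(Y/X > 1.1)
ratio.prob <- mean(y / x > 1.1)

# Part (d): approximate standard error of the simulated proportion;
# it shrinks roughly like 1/sqrt(nsim) as more values are simulated
se <- sqrt(ratio.prob * (1 - ratio.prob) / nsim)

print(c(sim = sim.prob, exact = exact.prob, ratio = ratio.prob, se = se))
```

Increasing nsim, for example to 100000, should bring sim.prob closer to exact.prob, which illustrates the answer to part (d). The value of exact.prob differs slightly from 0.7088 because pnorm does not round the z-value to -0.55.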