Stat 410/510
Lab Week 4
Due: Friday, February 21.
(1) Create a short R function that calculates a confidence interval for a population mean. The inputs are a vector of data and the desired confidence level, given as a proportion (for example, 0.95 for a 95% interval). Below is the start of the function and some test code. Comment your code.
CI <- function(x,pct=0.95)
{
  # YOUR CODE HERE
  return(
    # A VECTOR OF LENGTH 2
  )
}
## Test the function
# Make practice data
set.seed(410510)
x <- rnorm(50,mean=20,sd=4)
## First test
CI(x)
t.test(x)$conf.int
## Second test
CI(x,pct=.90)
t.test(x,conf.level=.90)$conf.int
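For reference, the tests above compare your output with the one-sample interval reported by t.test, which is the usual t-based interval. The notation here is not from the handout and is only one common way to write it: n is the number of observations in x, \bar{x} is the sample mean, s is the sample standard deviation, and \alpha is one minus the confidence level pct.

```latex
\[
  \bar{x} \;\pm\; t_{1-\alpha/2,\; n-1}\,\frac{s}{\sqrt{n}},
  \qquad \alpha = 1 - \mathrm{pct}
\]
```

Here t_{1-\alpha/2, n-1} denotes the 1-\alpha/2 quantile of the t distribution with n-1 degrees of freedom.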
(2) For this problem you will create a function that demonstrates the Central Limit Theorem. The input will be a large vector which represents the population. The output will be four graphs on one page. The first graph will be a histogram of the original population data. The second graph will be a qq-plot of the population data to demonstrate how close it is to a normal distribution.
The third graph will be a histogram of the N sample means, each calculated from a sample of size n from the population. The fourth graph will be a qq-plot of the sample means. For the histograms, use breaks=30. Below is some code to start with.
a. Show your commented code.
b. Demonstrate that your code works by showing the output from the following code:
CLT(population=runif(1000),n=100,N=500)
c. Use the provided test code to test your function. What happens to the standard deviation of the sample means as n increases? What happens to the shape of the distribution of sample means?
d. What does the distribution of the sample means look like when n=1?
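As a rough illustration of the resampling idea in this problem, the sketch below draws N sample means of size n from a toy population and prints their standard deviation for two values of n. It is not the required CLT function and makes no plots. The names toy_pop and sample_means are hypothetical, the exponential population is made up for the example, and sampling with replacement is an assumption that the handout does not specify.

```r
# Hypothetical toy population (not the lab's z): right-skewed exponential draws
set.seed(1)
toy_pop <- rexp(10000, rate = 1)

# Illustrative helper: returns a vector of N sample means,
# each computed from a sample of size n drawn from `population`
sample_means <- function(population, n, N) {
  # Sampling with replacement is an assumption made for this sketch
  replicate(N, mean(sample(population, size = n, replace = TRUE)))
}

xbar_small <- sample_means(toy_pop, n = 4, N = 500)
xbar_large <- sample_means(toy_pop, n = 64, N = 500)

# Print the spread of the sample means for each n so they can be compared
sd(xbar_small)
sd(xbar_large)
```

A pre-allocated vector filled in a for loop, as the starter code's xbar <- rep(NA,N) suggests, would work equally well; replicate is used here only to keep the sketch short.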
# Make practice population and see it's not normal
set.seed(410510)
x1 <- rchisq(5000,df=6)+rnorm(5000,mean=60,sd=1)
x2 <- rchisq(10000,df=6)+rnorm(10000,mean=30,sd=2)
y1 <- rnorm(10000,mean=45,sd=12)
z <- sample(x=c(x1,x2,y1),size=90000,replace=TRUE)
hist(z, breaks=30)
qqnorm(z)
qqline(z)
# Beginning of function
CLT <- function(population,n=30,N=1000)
{
  dev.new()
  par(mfrow=c(2,2))
  hist(population, freq=FALSE, breaks=30, main="Population",
       sub=paste("mean=",signif(mean(population),4),
                 " sd=",signif(sd(population),4)) )
  qqnorm(population)
  qqline(population)
  xbar <- rep(NA,N)
  # Your code goes here
}
CLT(population=z,n=1,N=500)
CLT(population=z,n=4,N=500)
CLT(population=z,n=16,N=500)
CLT(population=z,n=64,N=500)
CLT(population=z,n=256,N=500)
CLT(population=z,n=1024,N=500)