Chapter 7 Sampling Distributions

We want to estimate population parameters, such as μ
(population mean), σ (population standard deviation), and p
(population proportion).
We use the sample statistics x-bar, s, and p-hat (p^).
Estimates are numbers we get from specific samples.
Estimators are rules (functions) we use to get the estimates.
For example, the estimator X-bar says to sum the values and divide by n, the sample size.
Why the distinction?
Estimates are numbers and will change from sample to sample.
Estimators can be looked at as random variables.
Therefore we may know something about the distribution.
Let us look at a small example to illustrate how the distribution of X,
the original RV, differs from the distribution of X-bar, the sample
mean.
See file clt4.xlsx
Suppose that we have a DRV X with five values {0, 1, 2, 3, 4} and
each value is equally likely. Then the probability distribution is
given in the first two columns. Then suppose that we take a
random sample of size 2 with replacement and we take the
sample mean (x-bar) of each of these samples. These are listed in
columns D – F. The probability distribution of X-bar for these
samples is given in columns H and I. The probability histograms
are given on sheet 2. Note for X each bar is the same height. For
X-bar there are more bars and the histogram looks mound
shaped!
What if we sample without replacement?
See columns K – M and O and P and blue bars on sheet 2.
What if the probabilities for X were not all equal?
See sheet 3
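The spreadsheet is not reproduced here, but the two sampling distributions for the equally likely case can be rebuilt by listing every sample. The following is a minimal Python sketch, assuming that ordered samples of size 2 are each equally likely; the function name xbar_distribution is made up for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product

values = [0, 1, 2, 3, 4]  # the five equally likely values of X


def xbar_distribution(samples):
    """Return {x-bar: probability} when every listed sample is equally likely."""
    samples = list(samples)
    counts = Counter(Fraction(sum(s), len(s)) for s in samples)
    return {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}


# Ordered samples of size 2 with replacement: 5 * 5 = 25 samples.
with_repl = xbar_distribution(product(values, repeat=2))

# Ordered samples of size 2 without replacement: 5 * 4 = 20 samples.
without_repl = xbar_distribution(permutations(values, 2))

for label, dist in [("with replacement", with_repl),
                    ("without replacement", without_repl)]:
    print(label)
    for m, p in dist.items():
        print(f"  x-bar = {float(m):.1f}  P = {p}")
```

With replacement, X-bar takes nine values (0, 0.5, ..., 4) and the probabilities rise to a peak at 2, which is the mound shape seen in the histogram. Without replacement the extreme values 0 and 4 cannot occur, so only seven values remain.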
Recall the definition of the sample standard deviation:
$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$
What you should know.
X-bar is a good estimator of μ because:
E(X-bar) = μ (unbiased)
The average of all the sample averages is μ.
The standard deviation of X-bar is small, compared to other
estimators.
σ(x-bar) = σ / √n is called the standard error of the sample mean.
Also, as n gets bigger, X-bar gets closer to μ.
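Both facts can be checked exactly on the small example with values {0, 1, 2, 3, 4} and samples of size 2 taken with replacement. This is only a sketch for that one case, not a proof; the variable names are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

values = [0, 1, 2, 3, 4]
n = 2

# Population mean and variance of X (each value has probability 1/5).
mu = Fraction(sum(values), len(values))
var_x = sum((v - mu) ** 2 for v in values) / len(values)

# All 25 equally likely ordered samples of size 2 and their means.
means = [Fraction(sum(s), n) for s in product(values, repeat=n)]
mean_of_means = sum(means) / len(means)
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)

print("mu =", mu, " E(X-bar) =", mean_of_means)
print("sigma / sqrt(n) =", sqrt(var_x) / sqrt(n))
print("SD(X-bar)       =", sqrt(var_of_means))
```

Here μ = 2 and σ² = 2, so the average of all 25 sample means is 2 and the variance of X-bar is σ²/n = 1.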
Most important: Central Limit Theorem (CLT). If Q is an estimator
that has mean μ and standard deviation σ(Q) then:
(Q – μ) / σ(Q) → N(0,1) as n gets big.
For X-bar this becomes:
$$ \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = Z $$
What is a big n? For most cases n ≥ 30 will work very nicely.
This is really nice because it holds for almost any distribution of X, and working
with the normal distribution is really easy: use a table or a calculator.
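A small simulation can make the n ≥ 30 rule of thumb concrete. The sketch below assumes a skewed population (an exponential RV with mean 1 and standard deviation 1, chosen here only for illustration) and checks how often the standardized sample mean lands between -1.96 and 1.96, which a standard normal does 95% of the time.

```python
import random
from math import sqrt

rng = random.Random(0)      # fixed seed so the run is repeatable
n = 30                      # sample size
trials = 20_000             # number of simulated samples
mu = sigma = 1.0            # mean and SD of an exponential RV with rate 1

inside = 0
for _ in range(trials):
    xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
    z = (xbar - mu) / (sigma / sqrt(n))   # standardize with the standard error
    if -1.96 < z < 1.96:
        inside += 1

coverage = inside / trials
print(f"share of Z values in (-1.96, 1.96): {coverage:.3f}  (N(0,1) gives 0.950)")
```

Even though the population is far from normal, the share should come out close to 0.95.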
Ex. Let μ = 200 and σ = 20.
Take a random sample of 100, n = 100
Then σ x-bar = σ / √n = 20 / 10 = 2.
What is the probability that the sample mean is less than or equal
to 205?
P[X-bar ≤ 205] = normalcdf(-1000, 205, 200, 2)= .9938
P[Z ≤ (205 - 200)/2] =
P[Z ≤ 2.5] = .9938 = normalcdf(-10, 2.5)
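For readers without a TI calculator, the same probability can be sketched in Python. NormalDist from the standard library (Python 3.8 or later) is used here as a stand-in for normalcdf; it is not the calculator function itself.

```python
from statistics import NormalDist

mu, sigma, n = 200, 20, 100
se = sigma / n ** 0.5             # standard error of X-bar: 20 / 10 = 2
xbar = NormalDist(mu, se)         # approximate distribution of X-bar by the CLT

p_direct = xbar.cdf(205)                      # P[X-bar <= 205]
p_via_z = NormalDist().cdf((205 - mu) / se)   # P[Z <= 2.5]
print(round(p_direct, 4), round(p_via_z, 4))  # both round to 0.9938
```

Because cdf integrates from minus infinity, no artificial lower bound such as -1000 is needed.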
Ex. Let μ = 200 and σ = 20
n = 100 and σ / √n = 2.
What is the probability that the sample mean is greater than 203?
P[X-bar > 203] = normalcdf(203, 1000, 200, 2) = .0668
P[Z > (203 - 200)/2] =
P[Z > 1.5] = .0668
What is the probability that the sample mean is between 197 and
202?
P(197 < X-bar < 202) = P(-1.50 < Z < 1.00)
= normalcdf(197, 202, 200, 2)
= .8413 - .0668 = .7745
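The "greater than" and "between" probabilities are both differences of cumulative probabilities. A minimal Python sketch of the two calculations, again using the standard library's NormalDist in place of normalcdf:

```python
from statistics import NormalDist

xbar = NormalDist(200, 2)   # mean 200, standard error 20 / sqrt(100) = 2

p_above_203 = 1 - xbar.cdf(203)             # P[X-bar > 203]
p_between = xbar.cdf(202) - xbar.cdf(197)   # P(197 < X-bar < 202)
print(round(p_above_203, 4), round(p_between, 4))  # 0.0668 0.7745
```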
Ex. Let μ = 200 and σ = 20
n = 100 and σ / √n = 2.
Find a such that 97.5% of sample means fall below a.
(Find a such that 2.5% of sample means fall above a.)
P(X-bar < a) = .9750
P(Z < 1.96 ) = .9750
a = 1.96 * 2 + 200 = 203.92
invNorm(.975,200,2) = 203.92
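The invNorm step corresponds to the inverse of the cumulative distribution. A short Python sketch, assuming NormalDist.inv_cdf from the standard library as the counterpart of invNorm:

```python
from statistics import NormalDist

xbar = NormalDist(200, 2)          # mean 200, standard error 2

z = NormalDist().inv_cdf(0.975)    # z value with 97.5% of the area below it
a = xbar.inv_cdf(0.975)            # the same cutoff on the X-bar scale
print(round(z, 2), round(a, 2))    # 1.96 203.92
```

Both routes agree: a = z * 2 + 200.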
Calculator steps are the same as before; you just need to use
σ / √n, not σ, because we are interested in X-bar, not X.
You can use the CLT to calculate probabilities for the sample mean
if you have a sample bigger than or equal to 30 or you know that
the underlying population (RV) is normally distributed.
The CLT works for other estimators also, such as the sample total:
T = ∑ X ~ N(n * μ, √n * σ)
Ex. Let μ = 200 and σ = 20.
Take a random sample of 100, n = 100
T ~ N(μT = 20000, σT = 200), since μT = n * μ = 100 * 200 and σT = √n * σ = 10 * 20.
Find the probability that the sum is greater than 20350.
P(T > 20350) = P(Z > 1.75) =
normalcdf(20350, 10000000, 20000, 200) = 0.040
normalcdf(1.75, 10) = 0.040
Find the probability that the sum is less than 20222.
P(T < 20222) = P(Z < 1.11) = normalcdf(0, 20222, 20000, 200) =
0.867
normalcdf(-10, 1.11) = 0.867
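The sum works the same way as the mean once its own mean and standard deviation are plugged in. A minimal Python sketch of both probabilities, using the standard library's NormalDist in place of normalcdf:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 200, 20, 100
total = NormalDist(n * mu, sqrt(n) * sigma)   # T ~ N(20000, 200)

p_above = 1 - total.cdf(20350)   # P(T > 20350), z = 1.75
p_below = total.cdf(20222)       # P(T < 20222), z = 1.11
print(round(p_above, 4), round(p_below, 4))   # 0.0401 0.8665
```

These agree with the rounded values 0.040 and 0.867 above.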
Say we want to estimate p = the population proportion of
defective widgets. Since p is a parameter it might be hard to
obtain. We could take a random sample of size n and find the
number of defective widgets in the sample, call this number X,
then we let:
p^ = X / n
We read this as p-hat.
p-hat is then an estimator of p.
Using the Binomial RV one can show that E(p^) = p and the
standard error of p^ = √(p * q/n), where q = 1 - p.
Using the CLT we can see that
$$ \frac{\hat{p} - p}{\sqrt{p q / n}} $$
will have an approximate standard normal distribution as n gets big.
Note that this is more an academic exercise now, since we can
calculate exact probabilities for p^ using the binomial RV directly.
More on this later.
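To see how the normal approximation compares with the exact binomial calculation, here is a minimal Python sketch. The numbers p = 0.10, n = 100 and the cutoff 0.13 are assumed for illustration only; they do not come from these notes.

```python
from math import comb, sqrt
from statistics import NormalDist

p, n = 0.10, 100          # assumed values, not from the notes
q = 1 - p
cutoff = 0.13             # find P(p-hat <= 0.13), i.e. P(X <= 13)

# Normal approximation from the CLT: p-hat ~ N(p, sqrt(p*q/n)).
approx = NormalDist(p, sqrt(p * q / n)).cdf(cutoff)

# Exact probability from the binomial distribution of X = n * p-hat.
k_max = round(cutoff * n)
exact = sum(comb(n, k) * p**k * q**(n - k) for k in range(k_max + 1))

print(f"normal approximation: {approx:.4f}")
print(f"exact binomial:       {exact:.4f}")
```

The two answers are close but not identical, which is why the exact binomial route is preferred when it is available.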
Chapter 7 Illustrated.
The scores on the Math SAT are normally distributed with a mean
of 420 and a standard deviation of 65.
1. A random person is selected. What is the probability that
he/she scored above 500?
2. A random sample of 25 people is selected. What is the
probability that the mean of the sample is greater than
500?
You should know how to answer question 1.
X = score on the Math SAT. X ~ N(420, 65)
P (X > 500) = normalcdf(500, 10000, 420, 65) = .109
For question 2 you need to understand that X-bar is a random
variable with mean 420 and standard deviation 65/5 = 13. The
square root of 25 is 5.
P(X-bar > 500) = normalcdf(500, 10000, 420, 13) = 3.796 * 10^-10 ≈ 0
The sample mean varies much less than the original variable (in
this case Math SAT score)
You can use the normal distribution for X-bar here because, although n = 25 < 30,
the population itself is normally distributed.
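The contrast between one person and a sample of 25 can be sketched in Python, using the standard library's NormalDist in place of normalcdf:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 420, 65, 25

# Question 1: one person, so use sigma itself.
p_one_person = 1 - NormalDist(mu, sigma).cdf(500)

# Question 2: a sample mean, so use the standard error sigma / sqrt(n) = 13.
p_sample_mean = 1 - NormalDist(mu, sigma / sqrt(n)).cdf(500)

print(round(p_one_person, 3))   # 0.109
print(p_sample_mean)            # a tiny number, on the order of 1e-10
```

The only change between the two questions is the standard deviation that is passed in.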
Page 297: 7.7
In Class: 1, 2, 6, 8, 12
Homework: 3, 5, 7, 11