Sampling
W&W, Chapter 6
Rules for Expectation
Examples
Mean: E(X) = Σ x p(x)
Variance: E(X − μ)² = Σ (x − μ)² p(x)
Covariance: E[(X − μX)(Y − μY)] = ΣΣ (x − μX)(y − μY) p(x,y)
Rules for Expectation
E(X + Y) = E(X) + E(Y)
E(aX + bY) = aE(X) + bE(Y)
Example: if R = 10 + X + Y, then E(R) = E(10 + X + Y) = 10 + E(X) + E(Y)
E[g(X,Y)] = ΣΣ g(x,y) p(x,y)
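A minimal sketch in Python, assuming a made-up joint distribution p(x,y) (not from the text), to check the linearity rule E(aX + bY) = aE(X) + bE(Y) numerically:

    # Hypothetical joint pmf for a small discrete pair (X, Y); probabilities sum to 1.
    joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

    def expect(g):
        # E[g(X, Y)] = sum of g(x, y) * p(x, y) over all (x, y)
        return sum(g(x, y) * p for (x, y), p in joint.items())

    a, b = 2.0, 3.0
    direct = expect(lambda x, y: a * x + b * y)                        # E(aX + bY)
    by_rule = a * expect(lambda x, y: x) + b * expect(lambda x, y: y)  # aE(X) + bE(Y)
    print(direct, by_rule)   # both print 3.2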
Sampling
What can we expect of a random sample
drawn from a known population?
Can we generalize findings from our
random sample to the population?
This is the heart of inferential statistics.
Definitions
Population: The total collection of objects
to be studied.
Each individual observation in a random
sample has the population probability
distribution p(x). See Table 6-1, p.190
Random Sample: A sample in which each
individual has an equal chance of being
selected.
Definitions (continued)
The sample mean is not as extreme (doesn’t
vary so widely) as the individual values in the
sample because it represents an average.
In other words, extreme observations are
diluted by more typical observations. See
Figure 6-2.
A sample is representative if it has the same
characteristics as the population; random
samples are much more likely to be
representative.
Sampling with or without replacement
In large samples, these are practically equivalent.
A very simple random sample (VSRS) is a sample whose n observations X1, X2, …, Xn are independent. The distribution of each X is the population distribution p(x), that is:
p(x1) = p(x2) = … = p(xn)
Small Samples
The exception to this equivalence occurs in small samples, where sampling without replacement significantly changes the probability of the remaining X values (see page 216).
Example: calculating the probability of various poker hands
How Reliable is the Sample?
Suppose we calculate the sample mean (M), and we want to know how close it comes to μ, the population mean.
Imagine collecting many different samples and getting a sample mean for each. We could build the sampling distribution of M, denoted p(M).
Example: everyone flips a coin 10 times and reports how many heads they flipped.
How Reliable is the Sample?
Rather than actually sampling, we can simulate this sampling on a computer, which is called Monte Carlo sampling (or simulation).
We can also derive mathematical formulas for the sampling distribution of M.
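A minimal Monte Carlo sketch of the coin-flip example above (sample sizes and trial count chosen arbitrarily): simulate many samples of 10 fair coin flips and tabulate the sampling distribution of the number of heads.

    import random
    from collections import Counter

    random.seed(1)                    # reproducible runs
    n_flips, n_samples = 10, 100_000

    # Each trial: flip a fair coin 10 times and count the heads.
    counts = Counter(sum(random.random() < 0.5 for _ in range(n_flips))
                     for _ in range(n_samples))

    # Empirical sampling distribution: the counts cluster around 5 heads.
    for heads in sorted(counts):
        print(heads, counts[heads] / n_samples)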
Moments of the Sample Mean
Recall our objective is to estimate a population mean, μ. If we take a random sample of observations from the population and calculate the sample mean, how good will M be as an estimator of its target, μ?
We start with the definition of the sample mean:
M = (1/n)(X1 + X2 + … + Xn)
Moments of the Sample Mean
We start by calculating the expectation of the sample mean:
E(M) = (1/n)[E(X1) + E(X2) + … + E(Xn)]
Remember that each observation X has the population distribution p(x) with mean μ. Thus E(X1) = E(X2) = … = μ, so
E(M) = (1/n)[μ + μ + … + μ] = (1/n)(nμ) = μ
Moments of the Sample Mean
We can see that E(M) = μ.
On average, the sample mean will be on target, that is, equal to μ.
Of course, an individual sample mean is likely to be a little above or below its target (think of the coin flips we did).
The key question is how much above or below? We must find the variance of M.
Moments of the Sample Mean
Because the observations in a VSRS are independent, their variances add:
Var(M) = (1/n²)[Var(X1) + Var(X2) + … + Var(Xn)]
Each observation X has the population distribution p(x) with variance σ², so:
Var(M) = (1/n²)[σ² + σ² + … + σ²] = (1/n²)(nσ²) = σ²/n
Standard deviation: SD(M) = σ/√n
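A quick simulation check of Var(M) = σ²/n, borrowing the population values from the height example later in these slides: the standard deviation of many simulated sample means should match σ/√n.

    import math, random, statistics

    random.seed(2)
    mu, sigma, n = 69.0, 3.22, 10     # values from the height example below

    means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
             for _ in range(50_000)]

    print(statistics.stdev(means))    # empirical SE of M, close to...
    print(sigma / math.sqrt(n))       # ...the formula sigma/sqrt(n) = 1.018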
Standard error
This typical deviation of M from its target, μ, represents the estimation error, and is commonly called the standard error: SE = σ/√n.
What happens as n increases? The standard error decreases; thus the larger the sample, the more accurately M estimates μ!
The shape of the sampling distribution
Figure 6-3 shows 3 different parent population distributions. We see that as n increases, the sampling distribution takes on an approximately normal shape.
Central Limit Theorem: In random samples of size n, M fluctuates around μ with a standard error of σ/√n. Thus as n increases, the sampling distribution of M concentrates more and more around μ and becomes normal (bell-shaped).
Normal approximation rule
If we know the normal approximation rule, or Central Limit Theorem, we can find the probability of particular values (or ranges) of M using the standard normal table.
Example: Suppose a population of men on a large southern campus has a mean height of μ = 69 inches with a standard deviation of σ = 3.22 inches.
Normal approximation rule
If a random sample of n = 10 men is drawn, what is the chance that the sample mean M will be within 2 inches of the population mean?
E(M) = μ = 69
SE = σ/√n = 3.22/√10 = 1.02
We want to find the probability that M is within 2 inches, that is, between 67 and 71.
Normal approximation rule
Z = (M − μ)/SE = (M − μ)/(σ/√n)
Z = (71 − 69)/1.02 = 1.96
Thus a sample mean of 71 is nearly 2 standard errors above its expected value of 69.
P(Z > 1.96) = .025; likewise, P(Z < −1.96) = .025
Normal approximation rule
P(67 < M < 71) = 1 − .025 − .025 = .95
We can conclude that there is a 95% chance that the sample mean will be within 2 inches of the population mean.
Note that there are two formulas for Z-scores: one for individual values of X, and one for sample means, M.
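A minimal sketch reproducing the height calculation with Python's statistics.NormalDist: by the CLT, M is approximately Normal(69, 3.22/√10).

    import math
    from statistics import NormalDist

    mu, sigma, n = 69.0, 3.22, 10
    se = sigma / math.sqrt(n)              # standard error of M, about 1.02
    sampling_dist = NormalDist(mu, se)     # CLT: M is approximately normal

    p = sampling_dist.cdf(71) - sampling_dist.cdf(67)
    print(p)   # about .95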
Another Example
Suppose a large statistics class has marks normally distributed with μ = 72 and σ = 9.
What is the probability that an individual student drawn at random will have a mark over 80?
Here we are comparing a single student's score to the distribution of scores.
Another Example
Z = (X − μ)/σ = (80 − 72)/9 = .89
Pr(Z > .89) = .187
What is the probability that a random sample of 10 students will have a sample mean over 80?
In this case, we are comparing the sample mean to all possible sample means, the sampling distribution.
Another Example
Z = (M − μ)/(σ/√n) = (80 − 72)/(9/√10) = 2.81
Pr(Z > 2.81) = .002
This sample mean is very unlikely. This shows that taking averages tends to reduce the extremes.
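A short check of both calculations (values from the example above), again using statistics.NormalDist:

    import math
    from statistics import NormalDist

    mu, sigma, n = 72.0, 9.0, 10
    std_normal = NormalDist()

    z_x = (80 - mu) / sigma                      # individual student: Z = .89
    print(1 - std_normal.cdf(z_x))               # Pr(Z > .89) = .187

    z_m = (80 - mu) / (sigma / math.sqrt(n))     # sample mean: Z = 2.81
    print(1 - std_normal.cdf(z_m))               # Pr(Z > 2.81) = .002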
Proportions
We often express our data as proportions, such as the proportion of heads in a sample of 10 coin flips.
Normal Approximation Rule for Proportions: In random samples of size n, the sample proportion P fluctuates around the population proportion π with a standard error of √[π(1 − π)/n].
Proportions
We can see again that as n increases, our
sample proportion gets closer to the
population proportion.
Example: A population of voters has 60%
Republicans and 40% Democrats.
What is the chance that a sample of
100 will produce a minority of
Republicans (less than 50%)?
Proportion Example
Z = (P − π)/SE = (P − π)/√[π(1 − π)/n]
Z = (.5 − .6)/√[.6(1 − .6)/100] = −2.00
Pr(Z < −2.00) = .023, or about 2%
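A minimal sketch of the voter calculation; the small difference from the slide's answer comes from not rounding the standard error before dividing.

    import math
    from statistics import NormalDist

    pi, n = 0.60, 100
    se = math.sqrt(pi * (1 - pi) / n)   # standard error of P, about .049
    z = (0.50 - pi) / se                # -2.04 unrounded; the slide rounds to -2.00
    print(NormalDist().cdf(z))          # Pr(P < .50), about .02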
Normal Approximation to the Binomial
Of your first 10 grandchildren, what is the chance there will be more than 7 boys?
This is the same as asking whether the proportion of boys is more than 7/10.
We could use the binomial distribution to solve this problem.
Assume p(boy) = .5
Normal Approximation to the Binomial
P(S > 7) = P(S = 8) + P(S = 9) + P(S = 10)
You could calculate this or just use the cumulative binomial table on pages 670-671.
P(S > 7) = .044 + .010 + .001 = .055
We can also use the fact that proportions approximately follow the normal distribution to solve this problem.
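Before moving to the normal approximation, a quick exact check of the binomial sum above, hand-rolled with math.comb:

    import math

    n, p = 10, 0.5
    # P(S > 7) = P(8) + P(9) + P(10); each term is C(n, k) p^k (1-p)^(n-k)
    prob = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in (8, 9, 10))
    print(prob)   # 0.0546875, which rounds to .055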
Normal Approximation to the Binomial
We want to know the probability of getting more than 7 boys. We calculate this as P = 7/10 because we are dealing with a continuous distribution (normal), so everything between 7 and 8 must be included.
Z = (P − π)/√[π(1 − π)/n] = (.7 − .5)/√[.5(1 − .5)/10] = 1.26
Pr(Z > 1.26) = .104
Normal Approximation to the Binomial
Obviously, this involves some error. We can correct for it with the continuity correction, where we take the halfway point between 7 and 8.
Z = (P − π)/√[π(1 − π)/n] = (.75 − .5)/√[.5(1 − .5)/10] = 1.58
Pr(Z > 1.58) = .057
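A sketch comparing the plain and continuity-corrected approximations against the exact binomial answer (.055):

    import math
    from statistics import NormalDist

    pi, n = 0.5, 10
    se = math.sqrt(pi * (1 - pi) / n)             # about .158
    std_normal = NormalDist()

    print(1 - std_normal.cdf((0.70 - pi) / se))   # no correction: about .103
    print(1 - std_normal.cdf((0.75 - pi) / se))   # continuity corrected: about .057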
Normal Approximation to the Binomial
Note that this is very close to our estimate calculated from the binomial distribution, .055!
Monte Carlo Simulations
A computer program that repeats sampling and constructs a sampling distribution.
This approach is particularly useful for producing sampling distributions that cannot easily be derived theoretically.
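A minimal sketch of that idea (population and statistic chosen arbitrarily, not from the text): Monte Carlo estimation of the sampling distribution of the sample median, a statistic whose distribution is awkward to derive by formula.

    import random, statistics

    random.seed(3)
    n, trials = 10, 20_000

    # Draw many samples from a skewed population (exponential, chosen arbitrarily)
    # and record each sample's median; the medians approximate p(median).
    medians = [statistics.median(random.expovariate(1.0) for _ in range(n))
               for _ in range(trials)]

    print(statistics.fmean(medians))   # center of the simulated sampling distribution
    print(statistics.stdev(medians))   # its standard error, obtained by simulation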