Chapter 7: The Distribution of Sample Means

Samples and Populations
• Samples provide an incomplete picture of the
population.
• There are aspects of the population that may not
be included within a sample.
• The sampling error is the error between a
sample statistic and the corresponding
population parameter.
– The sampling error is the measure of the
discrepancy between the sample and the
population.
Sampling Error
By definition, sampling is used to calculate sample
statistics which are estimates of population parameters.
So there will always be a difference (usually an
unknown difference) between the sample statistic and
the population parameter. This difference is called
sampling error.
Examples:
X̄ − μ
s − σ
s² − σ²
p − π
Sampling distribution
• Sampling distribution is a distribution of statistics
(e.g. M or s) obtained by selecting all of the possible
samples of a specific size (n) from the population.
• General characteristics of the sampling distributions:
1. P(|M−μ| ≤ ɛ) is quite large, i.e. M ≈ μ
2. M is approximately normally distributed
3. n↑ ⇒ σM↓, i.e. M → μ and P(|M−μ| ≤ ɛ)↑
Example 7.1: N=4, X = 2, 4, 6, 8
• Fig 7.1: population (p. 203): evenly distributed
• Table 7.1: all 16 possible samples with n=2, drawn with
replacement (p. 204); the 16 values of M form the distribution of M
• Fig 7.2: sampling distribution of those 16 sample
means with n=2 (p. 204): symmetric, approximately normal in shape
• P(M>7) = ? Answer: 1/16 (only the sample (8, 8) has M > 7)
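The counting in Example 7.1 can be checked by brute force. The sketch below assumes the 16 samples are ordered pairs drawn with replacement from the four scores; the variable names are illustrative and not from the textbook.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]
n = 2

# All ordered samples of size n drawn with replacement: 4 * 4 = 16 samples.
samples = list(product(population, repeat=n))
means = [Fraction(sum(s), n) for s in samples]

# Frequency of each sample mean: this is the distribution of sample means.
distribution = Counter(means)
for m in sorted(distribution):
    print(f"M = {float(m):.0f}: {distribution[m]} of {len(samples)} samples")

# Proportion of samples whose mean is greater than 7.
p_greater_7 = Fraction(sum(1 for m in means if m > 7), len(means))
print("P(M > 7) =", p_greater_7)

# The mean of all the sample means equals the population mean.
mean_of_means = sum(means) / len(means)
population_mean = Fraction(sum(population), len(population))
print("mean of sample means =", mean_of_means, "and mu =", population_mean)
```

The frequencies pile up in the middle (M = 5) and thin out toward M = 2 and M = 8, which is the shape described for Fig 7.2.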
The Distribution of Sample Means
• The distribution of sample means is defined
as the set of means from all the possible random
samples of a specific size (n) selected from a
specific population.
• This distribution has well-defined (and
predictable) characteristics that are specified in
the Central Limit Theorem
Sampling Distribution of the Sample
Mean
The sampling distribution of the sample mean is a
probability distribution consisting of all possible sample
means of a given sample size selected from a population.
The sampling distribution of the sample mean summarizes
the probabilities of sampling error: X̄ − μ
Sampling Distribution of the
Sample Mean
• The mean of the distribution of sample means will be exactly
equal to the population mean if we are able to select all possible
samples of the same size from a given population.
• The Standard Error of the Mean: σM = σ/√n
• There will be less dispersion in the sampling distribution of
the sample mean than in the population. As the sample size
n increases, the standard error of the mean decreases.
The Central Limit Theorem
1. The mean of the distribution of sample means is
called the Expected Value of M and is always
equal to the population mean μ. E(M) = μ
2. The standard deviation of the distribution of sample
means is called the Standard Error of M and is
computed by
σM = σ/√n
or
σ²M = σ²/n
3. The shape of the distribution of sample means
tends to be normal. It is almost perfectly normal if
either a) the population from which the samples are
obtained is normal, or b) the sample size is n = 30
or more.
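The three properties can be illustrated with a small simulation. This is only a sketch under stated assumptions: the skewed population (exponential, with μ = 1 and σ = 1), the sample size, the number of samples, and the seed are all hypothetical choices, not values from the textbook.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical skewed population: exponential with mean 1 and std. dev. 1.
mu, sigma = 1.0, 1.0
n = 30               # size of each sample
num_samples = 20000  # number of samples drawn

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

observed_mean = statistics.fmean(sample_means)  # should be close to mu
observed_se = statistics.pstdev(sample_means)   # should be close to sigma / sqrt(n)
theoretical_se = sigma / n ** 0.5

print(f"mean of sample means: {observed_mean:.3f} (mu = {mu})")
print(f"std. dev. of sample means: {observed_se:.3f} "
      f"(sigma/sqrt(n) = {theoretical_se:.3f})")
```

A simulation draws only some of the possible samples, so the observed values approximate E(M) and σM instead of matching them exactly.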
Central Limit Theorem
CENTRAL LIMIT THEOREM If all samples of a particular size are
selected from any population, the sampling distribution of the sample
mean is approximately a normal distribution. This approximation
improves with larger samples.
• If the population follows a normal probability distribution, then for
any sample size the sampling distribution of the sample mean will
also be normal.
• If the population distribution is symmetrical (but not normal), the
normal shape of the distribution of the sample mean emerges with
samples as small as 10.
• If a distribution is skewed or has thick tails, it may require samples
of 30 or more to observe the normality feature.
• The mean of the sampling distribution is equal to μ. The variance
is equal to σ²/n and the standard deviation is equal to σ/√n.
The Expected Value of M: E(M)= μ
• If two (or more) samples are selected from the
same population, the two samples probably will
have different means.
• Although the samples will have different means,
you should expect the sample means to be close
to the population mean.
• The mean of the distribution of sample
means is equal to the mean of the population of
scores (μ): that is the expected value of M.
Standard Error: σM
• The standard error (also known as the standard
deviation of the distribution of sample means,
σM) provides a measure of the average distance
between M (sample mean) and μ (population
mean).
• Standard error describes the distribution of
sample means (variability).
• Law of large numbers: The larger the sample
size (n), the more probable it is that M is close to μ.
– Inverse relationship: the larger the sample size,
the smaller the standard error.
The Standard Error of M
• The standard error of M is defined as the
standard deviation of the distribution of sample
means and measures the standard distance
between a sample mean and the population
mean.
• Thus, the Standard Error of M provides a
measure of how accurately, on average, a
sample mean (M) represents its corresponding
population mean (μ).
p. 208-210
• if σ = 10
• Fig 7.3: n↑ ⇒ σM↓
• Table 7.2: n↑ ⇒ σM↓
Note: around n=30, σM is already small (10/√30 ≈ 1.83) and changes little as n grows further
Fig. 7.4: population → a sample → sampling
distribution (M)
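The inverse relationship between n and σM for σ = 10 can be tabulated directly. A minimal sketch: the helper name `standard_error` and the particular sample sizes are illustrative and are not the rows of Table 7.2.

```python
import math

def standard_error(sigma, n):
    """Standard error of M for population std. dev. sigma and sample size n."""
    return sigma / math.sqrt(n)

sigma = 10
sizes = (1, 4, 9, 16, 25, 30, 100)  # illustrative sample sizes
errors = [standard_error(sigma, n) for n in sizes]

# Print one row per sample size; the standard error shrinks as n grows.
for n, se in zip(sizes, errors):
    print(f"n = {n:3d}   sigma_M = {se:.2f}")
```

Going from n = 1 to n = 4 halves σM, but going from n = 30 to n = 100 lowers it by less than one point, which is why the curve in a plot of σM against n flattens out.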
p. 211
1. μ=50, σ=12
a. n=4, E(M)=? σM =?
b. If the population is not normal, n=4, what is the
shape of M distribution?
c. n=36 , E(M)=? σM =?
d. If the population is not normal, n=36, what is the
shape of M distribution?
2. As n increases, E(M) also increases. (true or
false?)
3. As n increases, σM also increases. (true or
false?)
Probability and Sample Means
• Because the distribution of sample means tends
to be normal, the z-score value obtained for a
sample mean can be used with the unit normal
table to obtain probabilities.
• The procedures for computing z-scores and
finding probabilities for sample means are
essentially the same as we used for individual
scores
Probability and Sample Means (cont'd.)
• However, when you are using sample means,
you must remember to consider the sample size
(n) and compute the standard error (σM) before
you start any other computations.
• Also, you must be sure that the distribution of
sample means satisfies at least one of the
criteria for normal shape before you can use the
unit normal table, i.e.
1. the population is normally distributed
2. n ≥ 30
Using the Sampling Distribution of the
Sample Mean
• If a population follows the normal distribution, the sampling
distribution of the sample mean will also follow the normal
distribution.
• If the shape is known to be non-normal but the sample contains
at least 30 observations, the central limit theorem indicates that the
sampling distribution of the mean is approximately normal.
• When the population standard deviation is known, a z-statistic for
the sampling distribution of the sample mean is calculated as:
z = (X̄ − μ) / (σ/√n)
z-Scores and Location within the
Distribution of Sample Means
• Within the distribution of sample means, the
location of each sample mean can be specified
by a z-score:
z = (M − μ) / σM
z-Scores and Location within the
Distribution of Sample Means (cont'd.)
• As always, a positive z-score indicates a sample
mean that is greater than μ and a negative z-score
corresponds to a sample mean that is
smaller than μ.
• The numerical value of the z-score indicates the
distance between M and μ measured in terms of
the standard error.
Example 7.2 (p.211)
Whenever you have a probability question about a
sample mean, you must use the distribution of
sample means.
• SAT: Population ~ Normal distribution
μ=500, σ=100, n = 25, P(M>540)=?
E(M) = 500, σM =100/5=20,
z0=(540-500)/20=2
P(M>540)= P(z>2) = 0.5 – P(0<z<2)
=0.5 - 0.4772 = 0.0228
Example 7.2 (p. 211) (cont'd.)
• The population of SAT scores ~ Normal(μ=500, σ=100)
• Take a random sample of n=25, P(M>540)=?
M is a random variable with its own probability distribution.
M’s distribution is normal, because the population
is normally distributed.
E(M) = μ = 500, σM = 100/5 = 20
P(M>540) = P(z > (540−500)/20) = P(z>2) = 0.5 − 0.4772
= 0.0228
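The same probability can be computed without the unit normal table. This is a sketch that uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) in place of the table lookup; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 500, 100, 25

# Step 1: the standard error comes first, before any z-score.
standard_error = sigma / math.sqrt(n)   # 100 / 5 = 20

# Step 2: locate M = 540 in the distribution of sample means.
z = (540 - mu) / standard_error         # (540 - 500) / 20 = 2

# Step 3: proportion of the standard normal distribution above z.
p = 1 - NormalDist().cdf(z)

print(f"sigma_M = {standard_error}, z = {z}, P(M > 540) = {p:.4f}")
```

The computed value agrees with the table answer to four decimal places.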
Example 7.3 (p. 213)
• Computing z for a single score: use σ
• Computing z for sample mean: use σM
• SAT: Population ~ Normal distribution
μ=500, σ=100, n = 25, P(|z|<z0) = 0.8, z0 = ?
⇒ find P(0 < z < z0) = 0.4, z0 = ?
from the z table: P(0 < z < 1.28) = 0.3997
⇒ z0 = 1.28
⇒ (M0 − 500)/20 = ±1.28 ⇒ M0 = 500 ± 1.28 * 20
⇒ M0 = 474.4, 525.6
Example 7.3 (p. 213) (cont'd.)
• The population of SAT scores ~ Normal(μ=500, σ=100)
• a random sample: n=25, find the range that contains the middle 80% of sample means
E(M) = μ = 500, σM = 100/5 = 20
P(|z|<z0)=0.8 ⇒ z0 = 1.28
(M0 − 500)/20 = ±1.28 ⇒ M0 − 500 = ±25.6
⇒ M0 = (474.4, 525.6)
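Working backward from a proportion to a range of sample means can be sketched the same way. The code assumes `NormalDist.inv_cdf` from the Python standard library as a stand-in for the z table; because it does not round z0 to 1.28, the boundaries differ slightly from the table-based ones.

```python
import math
from statistics import NormalDist

mu, sigma, n = 500, 100, 25
standard_error = sigma / math.sqrt(n)  # 20

# The middle 80% leaves 10% in each tail, so z0 cuts off the top 10%.
z0 = NormalDist().inv_cdf(0.90)        # about 1.28

lower = mu - z0 * standard_error
upper = mu + z0 * standard_error

print(f"z0 = {z0:.4f}")
print(f"middle 80% of sample means: {lower:.1f} to {upper:.1f}")
```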
Standard Deviation vs. Standard Error
Box 7.2 (p. 214)
• Standard deviation measures the standard
distance between X and μ.
• Standard error measures the standard distance
between M and μ.
p. 214
1. μ=40, σ=8, M=44, z=?
a. n=4: σM = 8/2 = 4, z = (44−40)/4 = 1
b. n=16: σM = 8/4 = 2, z = (44−40)/2 = 2
2. normal distribution: μ=65, σ=20, n=16, P(M>60)=?
σM = 20/4 = 5, z0 = (60−65)/5 = −1
⇒ P(z>−1) = 0.5 + 0.3413 = 0.8413
3. positively skewed: μ=60, σ=8,
a. n=4, P(M>62)=? Cannot be determined: the population is skewed and n < 30, so the distribution of M is not normal
b. n=64, P(M>62)=?
σM = 8/8 = 1, z = (62−60)/1 = 2, P(M>62) = P(z>2) = 0.5 − 0.4772 = 0.0228
Example 7.4 (p. 216-217)
• population: students in a local college
• survey question: # of minutes watching video per day
• μ=80, σ=20
• n=1, n=4, n=100 (Figure 7.8, p. 217)
∵ population: normal ∴ all three distributions of M are normal
all three have E(M) = μ = 80
but they have different σM: 20, 10, and 2
p. 219
1. σ=10. On average, how much difference .....
a. between μ and a single score? σ = 10
b. between μ and M (n=4)? σM = 10/2 = 5
c. between μ and M (n=25)? σM = 10/5 = 2
2. Can σM > σ? No: σM = σ/√n ≤ σ, with equality only when n = 1
3. σ=12, random sampling
a. if σM ≤ 6, n = ? ⇒ 12/√n ≤ 6 ⇒ √n ≥ 2 ⇒ n ≥ 4
b. if σM ≤ 4, n = ? ⇒ 12/√n ≤ 4 ⇒ √n ≥ 3 ⇒ n ≥ 9
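Question 3 amounts to solving σ/√n ≤ target for n, which gives n ≥ (σ/target)². A minimal sketch of that rearrangement follows; the helper name `min_sample_size` is hypothetical.

```python
import math

def min_sample_size(sigma, max_standard_error):
    """Smallest n for which sigma / sqrt(n) is at most max_standard_error."""
    # sigma / sqrt(n) <= max_se  is equivalent to  n >= (sigma / max_se) ** 2
    return math.ceil((sigma / max_standard_error) ** 2)

n_for_6 = min_sample_size(12, 6)
n_for_4 = min_sample_size(12, 4)
print("sigma_M <= 6 needs n >=", n_for_6)
print("sigma_M <= 4 needs n >=", n_for_4)
```

Rounding up with `math.ceil` matters when the ratio is not a whole number: for σ = 12 and a target of 5, the ratio squared is 5.76, so n must be at least 6.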
Example 7.5
Evaluate the effect of a new growth hormone:
Normal: μ=400, σ=20, n=25 ⇒ σM = 20/5 = 4
compare the weight of treated rats (treatment group) with that of
untreated rats (control group)
If the treated sample is noticeably different from untreated samples
⇒ treatment is effective
middle 95% of sample means is the acceptable difference (i.e. random error)
⇒ If z > 1.96 or z < −1.96 ⇒ noticeable difference
⇒ treatment is effective
boundary of middle 95% = (400 − 1.96*4, 400 + 1.96*4)
= (392.16, 407.84)
• outside this boundary ⇒ significant (effective)
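The decision rule in Example 7.5 can be written as a small check. This is a sketch under assumptions: the function name is hypothetical, and the two sample means tested at the end (410 and 403) are invented for illustration and are not data from the example.

```python
import math
from statistics import NormalDist

mu, sigma, n = 400, 20, 25
standard_error = sigma / math.sqrt(n)   # 20 / 5 = 4

# z value that bounds the middle 95% (2.5% in each tail), about 1.96.
z_crit = NormalDist().inv_cdf(0.975)

lower = mu - z_crit * standard_error
upper = mu + z_crit * standard_error
print(f"middle 95% of sample means: {lower:.2f} to {upper:.2f}")

def is_noticeably_different(sample_mean):
    """True when the sample mean falls outside the middle 95% boundaries."""
    return sample_mean < lower or sample_mean > upper

# Hypothetical treated-sample means, for illustration only.
print(is_noticeably_different(410))  # outside the boundaries
print(is_noticeably_different(403))  # inside the boundaries
```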
σM as a measure of reliability
degree of confidence about the accuracy of M:
Is it OK to use M as a representative of μ?
• large σM ⇒ less confident about M
• small σM ⇒ more confident about M
• the size of σM can be controlled: select a
large sample
• n↑ ⇒ σM↓
p. 223
1. Normal, μ=80, σ=20,
a. select 1 score, how much distance would you
expect, on average, between X and μ? σ = 20
b. select 100 scores? σM = 20/10 = 2
2. Normal, μ=40, σ=8,
a. n=16, M=36, relatively typical or extreme?
σM = 8/4 = 2, z = (36−40)/2 = −2 ⇒ extreme (beyond ±1.96)
b. n=4, typical or extreme?
σM = 8/2 = 4, z = (36−40)/4 = −1 ⇒ typical
p. 224
3. Normal, μ=530, σ=80,
a. n=16, 95% range?
σM = 80/4 = 20, 530 ± 1.96*20 = (490.8, 569.2)
b. n=100, 95% range?
σM = 80/10 = 8, 530 ± 1.96*8 = (514.32, 545.68)
4. claimed: (μ=45, σ=4), Sample: n=16, M=43
Is this sample mean likely to occur if the claim is
true? σM = 4/4 = 1, z = (43−45)/1 = −2
Is the sample mean within the range of values that
would be expected 95% of the time? (assume:
normal population) No: z = −2 is outside ±1.96
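Both kinds of question on this page reduce to the standard error plus the 1.96 cutoff. The sketch below uses the table value 1.96 directly so the numbers match hand calculation; the helper names are hypothetical.

```python
import math

Z_95 = 1.96  # table value bounding the middle 95% of a normal distribution

def middle_95_range(mu, sigma, n):
    """Range expected to contain 95% of sample means of size n."""
    se = sigma / math.sqrt(n)
    return mu - Z_95 * se, mu + Z_95 * se

def z_for_sample_mean(m, mu, sigma, n):
    """z-score locating sample mean m within the distribution of sample means."""
    return (m - mu) / (sigma / math.sqrt(n))

# Question 3: the range narrows as n grows from 16 to 100.
range_16 = middle_95_range(530, 80, 16)
range_100 = middle_95_range(530, 80, 100)
print("n = 16: ", range_16)
print("n = 100:", range_100)

# Question 4: is M = 43 within the 95% range if mu = 45 and sigma = 4?
z_claim = z_for_sample_mean(43, 45, 4, 16)
within_95 = abs(z_claim) <= Z_95
print("z =", z_claim, "within 95% range:", within_95)
```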