MM207 Statistics
Welcome to the Unit 8 Seminar
Dr. Ami Gates
Confidence Intervals
Sampling Distributions
Margin of Error
Sample Sizes
95% Confidence Interval
Confidence Intervals to Estimate the
Population Mean Using the Sample
Mean
Suppose 5100 Statistics students show a mean time required to
graduate with a combined BA/MS Degree of 5.12 years with a
standard deviation of 1.71 years.
• Estimate the mean time required to graduate with a BA/MS
for all statistics students.
• In general, across all possible samples of a given size from a population,
the mean of all the sample means equals the population
mean. Here, we have only one sample, but its mean is still the best
estimate of the population mean that we have. Therefore, our
estimate of the mean time for students to graduate is 5.12 years.
Confidence Intervals and Margin of Error to Estimate the
Population Mean Using the Sample Mean
Suppose 5100 Statistics students show a mean time required to graduate with a
combined BA/MS Degree of 5.12 years with a standard deviation of 1.71 years.
Find the 95% Confidence Interval for the mean time required to get a BA/MS.
First, we will need to find the margin of error using this estimate formula:
Margin of error = E ~= 2s/sqrt(n) where “s” is the sample standard deviation.
Our margin of error E = 2(1.71)/sqrt(5100) = 3.42/71.4 = .048 which we can round to
.05.
The 95% Confidence Interval is created by adding and subtracting the margin of error
from the sample mean
Our sample mean is 5.12
5.12 + .05 = 5.17
5.12 - .05 = 5.07
So the 95% CI is:
5.07 < population mean (μ) < 5.17
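The arithmetic above can be checked with a short Python sketch. The variable names are illustrative only; the numbers come from the example.

```python
import math

n = 5100       # sample size
x_bar = 5.12   # sample mean (years)
s = 1.71       # sample standard deviation (years)

# Quick 95% margin of error: E ~= 2s / sqrt(n)
E = 2 * s / math.sqrt(n)

# The interval is the sample mean minus and plus the margin of error
lower = x_bar - E
upper = x_bar + E

print(f"E = {E:.3f}")                              # E = 0.048
print(f"95% CI: {lower:.2f} < mu < {upper:.2f}")   # 95% CI: 5.07 < mu < 5.17
```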
Our “Quick” formula for Margin of
Error for 95% Confidence Intervals
Important Note:
This “quick” estimation method of using E = 2s/sqrt(n) can be used
because, with a very large sample size (> 200) and a 95%
interval (leaving 2.5% in each tail), the t-critical value for each tail is
between about 1.96 and 1.97, which here is rounded up to 2.
The technical formula for the margin of error E for a 95% CI with an
unknown population standard deviation and mean is
E = (t-critical value)*(sample standard deviation) / sqrt(n)
Again, because our sample sizes are so large, this “technical” formula is
essentially the same as our “quick” formula. For our class, we will use the
simplified “quick” formula of E = 2s/sqrt(n) to calculate the margin of error
for very large samples. Recall that s is the sample standard deviation and n
is the sample size. This is for a 95% CI.
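To see how little the rounding to 2 matters for a large sample, here is a rough sketch that compares the two margins of error. It assumes the standard normal critical value (about 1.96) as a stand-in for the t-critical value, which is only reasonable when n is large. The BA/MS numbers from the earlier example are reused.

```python
import math
from statistics import NormalDist

n = 5100   # sample size from the BA/MS example
s = 1.71   # sample standard deviation from the BA/MS example

# Assumption: for very large n the t-critical value is close to the
# standard normal critical value that leaves 2.5% in each tail.
critical = NormalDist().inv_cdf(0.975)

E_quick = 2 * s / math.sqrt(n)             # "quick" formula
E_technical = critical * s / math.sqrt(n)  # critical-value formula

print(f"critical value = {critical:.3f}")
print(f"quick E = {E_quick:.4f}, critical-value E = {E_technical:.4f}")
```

The two margins differ only in the third decimal place here, which is why the quick formula is acceptable for large samples.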
Confidence Intervals for
Proportions
A sample of 10000 people showed that 2100 (or 21%) of people prefer vanilla to chocolate. What
is the 95% confidence interval for the proportion of all people that prefer vanilla to chocolate?
The sample proportion p̂ is 2100/10000 = .21
To find the 95% confidence interval, we first need to calculate the margin of error “E”. This
formula will approximate E for us.
E = 2 * sqrt[ p̂ (1 – p̂) / n ]
Here, p̂ is the sample proportion and “n” is the sample size.
E = 2 * sqrt( .21 (.79) / 10000) = 2 * sqrt(.00001659) = .0081
So the 95% confidence interval is between
.21 - .0081 = .2019
.21 + .0081 = .2181
.202 < p < .218
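The same calculation as a minimal Python sketch, using the vanilla/chocolate numbers from this slide (variable names are illustrative):

```python
import math

n = 10000          # sample size
successes = 2100   # people who prefer vanilla

p_hat = successes / n   # sample proportion

# Quick 95% margin of error for a proportion: E = 2 * sqrt(p_hat * (1 - p_hat) / n)
E = 2 * math.sqrt(p_hat * (1 - p_hat) / n)

lower = p_hat - E
upper = p_hat + E

print(f"p-hat = {p_hat:.2f}, E = {E:.4f}")
print(f"95% CI: {lower:.3f} < p < {upper:.3f}")
```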
Sampling Distributions
A tutoring service ran for 3 days. Here is the number of calls they received on each of those three days:
12, 10, 5
Assume that samples of size 2 are randomly selected with replacement from these
three values.
List all the possible samples and find the mean of each sample:
Sample   Values    Mean of each sample
1)       12, 12    12
2)       12, 10    11
3)       12, 5     8.5
4)       10, 12    11
5)       10, 10    10
6)       10, 5     7.5
7)       5, 12     8.5
8)       5, 10     7.5
9)       5, 5      5
Sampling Distributions
Identify the probability of each sample and describe the sampling distribution of the
sample means.
To find the probability of each sample, notice first that there are 9 samples.
Each is equally likely. So, each has a probability of 1/9.
To describe the sampling distribution of the sample
means, we must first group together all the
samples that have the same mean and then look at the
probability of getting that mean:
The mean of 12 occurs once P(12) = 1/9
The mean of 11 occurs twice P(11) = 2/9
The mean of 10 occurs once P(10) = 1/9
The mean of 8.5 occurs twice P(8.5) = 2/9
The mean of 7.5 occurs twice P(7.5) = 2/9
The mean of 5 occurs once P(5) = 1/9
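As a cross-check, the nine samples and the resulting probabilities can be generated in Python. This is a sketch using only the standard library; the names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

calls = [12, 10, 5]

# All ordered samples of size 2, drawn with replacement (3 * 3 = 9 samples)
samples = list(product(calls, repeat=2))
means = [sum(sample) / 2 for sample in samples]

# Group samples by their mean; each sample has probability 1/9
counts = Counter(means)
distribution = {m: Fraction(c, len(samples)) for m, c in counts.items()}

for m in sorted(distribution, reverse=True):
    print(f"P({m:g}) = {distribution[m]}")
```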
To Find the Mean of the Sampling
Distribution
• The mean of 12 occurs once P(12) = 1/9
• The mean of 11 occurs twice P(11) = 2/9
• The mean of 10 occurs once P(10) = 1/9
• The mean of 8.5 occurs twice P(8.5) = 2/9
• The mean of 7.5 occurs twice P(7.5) = 2/9
• The mean of 5 occurs once P(5) = 1/9
The mean of the sampling distribution = sum (x * p(x) )
12 *1/9 + 11*2/9 + 10*1/9 + 8.5*2/9 + 7.5*2/9 + 5*1/9 = 9
The mean from our original population is (12 + 10 + 5) / 3 = 9
Therefore, the mean of the sampling distribution and the mean of the
original population are the same.
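The weighted sum can be verified with exact fractions. This is a small sketch that recomputes sum(x * p(x)) over the nine equally likely samples and compares it with the population mean.

```python
from fractions import Fraction
from itertools import product

calls = [12, 10, 5]
samples = list(product(calls, repeat=2))   # 9 equally likely samples
p = Fraction(1, len(samples))              # probability of each sample

# Mean of the sampling distribution: sum of (sample mean * probability)
mean_of_means = sum(Fraction(a + b, 2) * p for a, b in samples)

population_mean = Fraction(sum(calls), len(calls))

print(mean_of_means, population_mean)   # 9 9
```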
Finding the Smallest Sample Size Needed for a
Given Margin of Error
Suppose you want to estimate the mean distance between two
molecules in an elephant. The margin of error that you want is .01
micrometers. Past studies suggest that a population standard
deviation of .16 micrometers is reasonable.
Estimate the minimum sample size required to estimate the
population mean with the given accuracy.
Finding the Smallest Sample Size Needed
for a Given Margin of Error
Here, we want to calculate the smallest sample size we will
need to create a 95% confidence interval with a margin of error
of .01.
The formula is:
n = [(2*sigma)/E]^2
The sigma is the population standard deviation. The “E” is the
desired margin of error.
The “n” is the smallest sample size that will give us this error.
We use a “2” because we want this sample size to work for a
95% CI and we know that 2 is a good estimate for the critical
values at the tails.
Answer:
Sample size “n” = [(2*.16)/.01]^2 = 32^2 = 1024
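A small helper makes the formula reusable. The function name `min_sample_size` is made up for this sketch. It uses exact fractions so that values such as .16 and .01 do not pick up floating-point rounding before the result is rounded up to a whole number.

```python
import math
from fractions import Fraction

def min_sample_size(sigma, E):
    """Smallest whole n with 2*sigma/sqrt(n) <= E (quick 95% formula).

    Hypothetical helper for illustration; sigma is the population
    standard deviation and E is the desired margin of error.
    """
    sigma = Fraction(str(sigma))   # exact decimal values, no float rounding
    E = Fraction(str(E))
    return math.ceil((2 * sigma / E) ** 2)

print(min_sample_size(0.16, 0.01))   # 1024
```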
Finding the Sample Mean and
Sample Standard Deviation
• Suppose you collect the following sample data:
92, 356, 428, 360, 178, 232, 274, 372, 216, 156, 344, 46, 152, 192
What is the sample size?
Here, the sample size n = 14
What is the mean for the sample?
To get the mean, add all the numbers together and divide by
the sample size.
The answer is 242.7.
Finding the Sample Mean and
Sample Standard Deviation
What is the std dev for the sample?
You can use this formula:
s = sqrt[sum(x – sample mean)^2 / (n-1)] = 115.6
Note: In Excel, the formula for this is =STDEV.S(A1:A14),
assuming your data values are in cells A1 through A14. You
can also do this in StatCrunch.
Calculating Sample Standard
Deviation by Hand
s = sqrt[sum(x – sample mean)^2 / (n-1)]
STEPS:
• The first step is to find the sample mean.
• Then, subtract the sample mean from each data value.
• Then, square each difference. (x – mean)^2
• Next, sum up all the squared values together.
• Next, divide that sum by n – 1
• Finally, take the square root of the result.
The next slide shows an Excel spread sheet of these steps – color coded.
Calculating the Sample Standard
Deviation by Hand – color coded
The data set   sample mean   x – mean   (x – mean)^2
92             242.7         -150.7     22710.49
356            242.7         113.3      12836.89
428            242.7         185.3      34336.09
360            242.7         117.3      13759.29
178            242.7         -64.7      4186.09
232            242.7         -10.7      114.49
274            242.7         31.3       979.69
372            242.7         129.3      16718.49
216            242.7         -26.7      712.89
156            242.7         -86.7      7516.89
344            242.7         101.3      10261.69
46             242.7         -196.7     38690.89
152            242.7         -90.7      8226.49
192            242.7         -50.7      2570.49

Sum of (x – mean)^2: 173620.9
Divide by n – 1: 173620.9 / 13 = 13355.45
Take the sqrt: 115.5658
Solution: s = 115.5658
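The by-hand steps can be mirrored line by line in Python and compared against the standard library's `statistics.stdev`. This is a sketch. It uses the unrounded mean, so intermediate values may differ slightly in the last digits from the spreadsheet, which used 242.7.

```python
import math
import statistics

data = [92, 356, 428, 360, 178, 232, 274, 372, 216, 156, 344, 46, 152, 192]

n = len(data)
mean = sum(data) / n                       # step 1: sample mean
deviations = [x - mean for x in data]      # step 2: x - mean
squared = [d ** 2 for d in deviations]     # step 3: (x - mean)^2
total = sum(squared)                       # step 4: sum of squares
variance = total / (n - 1)                 # step 5: divide by n - 1
s = math.sqrt(variance)                    # step 6: square root

print(f"n = {n}, mean = {mean:.1f}, s = {s:.1f}")
print(f"statistics.stdev gives {statistics.stdev(data):.1f}")
```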
Using your Calculated Sample Mean
and Sample Standard Deviation
What is the best estimate for the population mean?
The sample mean is our best estimate for the population
mean. The sample mean is 242.7.
What is the margin of error?
The margin of error E can be estimated by the equation:
E = 2s/sqrt(n) = (2*115.6)/sqrt(14) = 61.8
Using your Calculated Sample Mean
and Sample Standard Deviation
What is the 95% CI for the population mean?
The mean + E is 242.7 + 61.8 = 304.5
The mean – E is 242.7 – 61.8 = 180.9
So, the 95% CI for the population mean is
181 < mean (μ) < 305
Notice that I rounded to whole numbers, as my data have no decimals in them.
Rounding will depend on what the problem requests.
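Putting the pieces together, here is a sketch that goes from the raw data to the interval using the quick formula from this seminar. Results are printed to one decimal place so they can be compared with the hand calculation.

```python
import math
import statistics

data = [92, 356, 428, 360, 178, 232, 274, 372, 216, 156, 344, 46, 152, 192]

n = len(data)
x_bar = statistics.mean(data)   # best estimate of the population mean
s = statistics.stdev(data)      # sample standard deviation (divides by n - 1)

E = 2 * s / math.sqrt(n)        # quick 95% margin of error
lower, upper = x_bar - E, x_bar + E

print(f"E = {E:.1f}")
print(f"95% CI: {lower:.1f} < mu < {upper:.1f}")
```

With only 14 values, the multiplier 2 is a rough approximation: the t-critical value for 13 degrees of freedom is closer to 2.16, so an interval built with the technical formula would be somewhat wider.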
Sample Proportions and Sample
Statistics
You select a random sample of 140 people at a chocolate conference
that is attended by 1691 people. Within your sample, you find that 67
people secretly prefer vanilla.
Based on your sample statistic, estimate how many people at the
conference secretly prefer vanilla?
67/140 = .479 is our sample proportion p̂. This is our sample statistic.
Of the 1691 people at the conference, we can estimate using our sample
proportion that: .479 * 1691 = 810
This suggests that about 810 people at our conference secretly prefer vanilla.
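A quick sketch of this estimate in Python. It also shows that the answer shifts by about one person depending on whether the sample proportion is rounded to three decimals first, which is why the result is best read as "about 810".

```python
sample_size = 140
prefer_vanilla = 67
attendees = 1691

p_hat = prefer_vanilla / sample_size   # sample proportion

# Estimate using p-hat rounded to three decimals, as on the slide
estimate_rounded_p = round(p_hat, 3) * attendees

# Estimate using the unrounded proportion
estimate_unrounded = p_hat * attendees

print(f"p-hat = {p_hat:.3f}")
print(f"estimate with rounded p-hat:   {estimate_rounded_p:.0f}")
print(f"estimate with unrounded p-hat: {estimate_unrounded:.0f}")
```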
Sample Proportions and
Sample Statistics
Would you be more confident of your estimate if you sampled
300 people?
Yes – a larger sample is more likely to provide a more reliable
estimate.
Suppose you found out that 400 people at the conference
actually secretly prefer vanilla. What is the population
proportion for our conference?
400/1691 = .237
Sampling Distributions
• Sampling Distributions: A sampling distribution is a
distribution of statistics obtained by selecting all the possible
samples of a specific size from a population.
• Distribution of Sample Means: A sampling distribution of the
mean gives all the values the mean can take, along with the
probability of getting each value if sampling is random from the
null-hypothesis population.
• Distribution of Sample Proportions: The distribution that
results when we find the proportions (p̂) in all possible samples
of a given size.
Sampling Error
• Sampling Error: The discrepancy between the statistic
obtained from the sample and the parameter for the population
from which the sample was obtained.
• For example, the mean (x̄) calculated from a sample will not
always equal the population mean (μ).
Central Limit Theorem*
• Central Limit Theorem: For any population with mean μ
and standard deviation σ, the distribution of sample means x̄
for sample size n will have a mean of μ and a standard
deviation of σ/√n, and will approach a normal distribution as n
approaches infinity (n > 30 is the general rule).
* See Page 217
Distribution of Sample Means
Example
• Consider the following data as a Population
2, 4, 6, 8
• The population mean is 5
• The population standard deviation is 2.236
• Now we are going to take ALL possible samples of n = 2 from this
population.
• We will calculate the mean for each sample
Sampling Distribution of Means for
Samples of n = 2
Pick 1   Pick 2   Mean   Mean^2   Variance   Standard Deviation
2        2        2      4        0          0.000
2        4        3      9        2          1.414
2        6        4      16       8          2.828
2        8        5      25       18         4.243
4        2        3      9        2          1.414
4        4        4      16       0          0.000
4        6        5      25       2          1.414
4        8        6      36       8          2.828
6        2        4      16       8          2.828
6        4        5      25       2          1.414
6        6        6      36       0          0.000
6        8        7      49       2          1.414
8        2        5      25       18         4.243
8        4        6      36       8          2.828
8        6        7      49       2          1.414
8        8        8      64       0          0.000
Sum               80     440
Central Limit Theorem Applied
• Mean of the sample means = 80/16 = 5, which equals the population mean. So we have shown that the
mean of the means is equal to μ, the population mean.
• Standard deviation of the sample means (here X is each sample mean and N = 16 is the number of samples):
σx̄ = √[ (ΣX^2 – (ΣX)^2/N) / N ]
= √[ (440 – (80)^2/16) / 16 ] (notice we divide by N since this is a population).
= √(40/16)
= √2.5
= 1.58
• Now, we will calculate what the Central Limit Theorem tells us the standard
deviation will be. It is
σx̄ = σ/√n
= 2.236 / √2
= 2.236 / 1.4142
= 1.58
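Both routes to 1.58 can be checked by enumerating all 16 samples. This is a sketch that uses the population formulas from the `statistics` module (`pstdev` divides by N).

```python
import math
import statistics
from itertools import product

population = [2, 4, 6, 8]
n = 2   # size of each sample

# Means of all 16 ordered samples drawn with replacement
sample_means = [statistics.mean(sample) for sample in product(population, repeat=n)]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.pstdev(sample_means)   # direct calculation, divide by N

sigma = statistics.pstdev(population)           # population standard deviation
predicted_sd = sigma / math.sqrt(n)             # Central Limit Theorem: sigma / sqrt(n)

print(f"mean of sample means = {mean_of_means}")
print(f"direct SD = {sd_of_means:.2f}, sigma/sqrt(n) = {predicted_sd:.2f}")
```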
Distribution of Sample Proportions
The distribution of sample proportions is the distribution that
results when we find the proportions ( p̂ ) in all possible samples of
a given size.
The larger the sample size, the more closely this distribution
approximates a normal distribution.
In all cases, the mean of the distribution of sample proportions
equals the population proportion.
If only one sample is available, its sample proportion, p̂ , is the best
estimate for the population proportion, p.
Margin of Error
The margin of error for the 95% confidence interval is
margin of error = E ≈ 2s/√n
where s is the standard deviation of the sample.
We find the 95% confidence interval by adding and subtracting the margin of
error from the sample mean. That is, the 95% confidence interval ranges
from (x̄ – margin of error) to (x̄ + margin of error)
We can write this confidence interval more formally as
x̄ – E < μ < x̄ + E
or more briefly as
x̄ ± E
Constructing a Confidence
Interval
• A study finds that the average time spent by eighth-graders
watching television is 6.7 hours per week, with a margin of
error of 0.4 hour (for 95% confidence). Construct and interpret
the 95% confidence interval
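• The best estimate of the population mean is the sample mean,
x̄ = 6.7 hours.
• We find the confidence interval by adding and subtracting the
margin of error from the sample mean, so the interval extends
from 6.7 – 0.4 = 6.3 hours to 6.7 + 0.4 = 7.1 hours.
• Interpretation: we can be 95% confident that the interval from 6.3 to
7.1 hours contains the mean weekly television time for all eighth-graders.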
Using StatCrunch -Confidence
Intervals
• In the data set, select:
• STAT
• Z Statistics
• One-Sample
• With Data
• Select Variable
• Click next
• Select confidence interval and percent
• Calculate
Interpreting the Confidence Interval
Figure 8.10 This figure illustrates
the idea behind confidence
intervals. The central vertical line
represents the true population
mean, μ. Each of the 20
horizontal lines represents the
95% confidence interval for a
particular sample, with the sample mean marked by the dot in the center of the
confidence interval. With a 95% confidence interval, we expect that 95% of all samples
will give a confidence interval that contains the population mean, as is the case in this
figure, for 19 of the 20 confidence intervals do indeed contain the population mean.
We expect that the population mean will not be within the confidence interval in 5%
of the cases; here, 1 of the 20 confidence intervals (the sixth from the top) does not
contain the population mean.
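The idea in the figure can be imitated with a small simulation. Everything in this sketch is an assumption chosen for illustration: a normal population with mean 50 and standard deviation 10, samples of size 100, 2000 repetitions, and a fixed random seed. It counts how often the quick 95% interval contains the true mean.

```python
import math
import random
import statistics

random.seed(8)   # fixed seed so the run is repeatable

mu, sigma = 50.0, 10.0   # hypothetical population mean and standard deviation
n = 100                  # size of each sample
trials = 2000            # number of samples (and intervals) to draw

hits = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = statistics.mean(sample)
    E = 2 * statistics.stdev(sample) / math.sqrt(n)   # quick margin of error
    if x_bar - E < mu < x_bar + E:
        hits += 1   # this interval contains the true mean

coverage = hits / trials
print(f"{coverage:.1%} of the intervals contain the population mean")
```

The share should land close to 95%, with a few percent of intervals missing the true mean, just like the one interval in the figure.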
Determine Minimum Sample Size
• Solve the margin of error formula [E = 2s/√n] for n:
n = (2s/E)^2
• You want to study housing costs in the country by sampling recent house
sales in various (representative) regions. Your goal is to provide a 95%
confidence interval estimate of the housing cost. Previous studies suggest
that the population standard deviation is about $7,200. What sample size
(at a minimum) should be used to ensure that the sample mean is within
• a. $500 of the true population mean?
n = (2σ/E)^2 = (2 × 7,200 / 500)^2 = 28.8^2 = 829.4
Because the sample size must be a whole number, we round up: at least 830 house sales are needed.
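Here is a short sketch of the housing calculation. It also checks why the result has to be rounded up: one sale fewer than the rounded-up size leaves the margin of error just above $500.

```python
import math

sigma = 7200     # assumed population standard deviation, in dollars
E_target = 500   # desired margin of error, in dollars

n_exact = (2 * sigma / E_target) ** 2   # about 829.44
n_min = math.ceil(n_exact)              # round up to a whole number of sales

# Margin of error actually achieved at n_min and at one sale fewer
E_at_n_min = 2 * sigma / math.sqrt(n_min)
E_one_fewer = 2 * sigma / math.sqrt(n_min - 1)

print(f"n = {n_exact:.2f}, so use at least {n_min} sales")
print(f"E at {n_min}: {E_at_n_min:.2f}; E at {n_min - 1}: {E_one_fewer:.2f}")
```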
Core Logic of Hypothesis Testing
• Considers the probability that the result of a
study could have come about if the experimental
procedure had no effect
• If this probability is low, scenario of no effect is
rejected and the theory behind the experimental
procedure is supported
Hypothesis Testing using Confidence Intervals
• State the claim about the population mean
• Determine desired confidence level
• Select a random sample from the population
• Calculate the confidence interval for the desired level of
confidence.
• If the claim is contained within the interval, the claim is
reasonable; if it is not within the interval, the claim is not
reasonable, at the given level of confidence.
• See Testing a Claim document in Doc Sharing
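The procedure above can be sketched as a small function. The name `claim_is_reasonable` and the two claimed values are hypothetical. The sample statistics reuse the BA/MS example from earlier in the seminar, and the interval uses the quick 95% formula.

```python
import math

def claim_is_reasonable(claimed_mean, x_bar, s, n):
    """Return True if the claimed mean lies inside the quick 95% CI.

    Hypothetical helper for illustration: the interval is
    x_bar - E to x_bar + E with E = 2s / sqrt(n).
    """
    E = 2 * s / math.sqrt(n)
    return x_bar - E <= claimed_mean <= x_bar + E

# Sample statistics from the BA/MS example: mean 5.12, s 1.71, n 5100
print(claim_is_reasonable(5.00, 5.12, 1.71, 5100))   # False: 5.00 is below the interval
print(claim_is_reasonable(5.10, 5.12, 1.71, 5100))   # True: 5.10 is inside the interval
```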
Questions?