
Confidence Intervals: Inferential Statistics Chapter

Chapter 8: Confidence Intervals
Inferential Statistics:
Population data are usually difficult to gather. For instance, suppose that we want to know the average height of all
Americans. It is unrealistic for a researcher to measure the height of each and every American; it’s virtually impossible.
In situations such as this, it’s prudent to estimate the height of all Americans using a more manageable (smaller) sample.
The assignment of value(s) to a population parameter based on a value of the corresponding sample statistic is called
estimation.
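Point Estimate: the value of a sample statistic that is used to estimate a population parameter.
Interval Estimate: an interval constructed around the point estimate that is likely to contain the population parameter.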
Example 0.1. Suppose a marine biologist would like to estimate the mean birthweight of the Loggerhead sea turtles
on the Treasure Coast. However, it’s practically impossible to find and weigh every Loggerhead sea turtle hatchling on
the Treasure Coast, therefore the biologist must gather a random sample of Loggerhead turtle birth weights and use that
sample data to estimate the birthweight of all Loggerhead hatchlings. Imagine the average of one such sample of 100
Loggerhead hatchlings yielded a mean birthweight of 20 grams. The sample mean x̄ = 20 is referred to as the point
estimate. That being said, it’s unlikely that the true average birthweight of all Loggerhead sea turtles on the Treasure
Coast is precisely 20 grams, so the researcher may instead construct an interval estimate and say that the average
birthweight is likely somewhere between 18 grams and 22 grams.
A confidence interval is an interval that is constructed around a point estimate, and it is stated that we are XX%
confident that the interval contains the true population parameter. Thus, a confidence interval provides a range of reasonable
values in which we expect the population parameter (µ or p) to fall. There is no guarantee that any single confidence
interval will contain the unknown population parameter; we can only say that about XX% of intervals constructed this way
will contain it. We will consider confidence intervals of the form
Point Estimate ± Margin of Error
That is, to calculate a confidence interval, we simply need to calculate the point estimate (x̄ or p̂), the margin of error,
and then add/subtract these values to determine the lower and upper bound of the confidence interval. Note: since a
confidence interval is obtained by adding/subtracting the margin of error from the point estimate, the overall width of a
confidence interval will always (in this course) be double the size of the margin of error.
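The form Point Estimate ± Margin of Error can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the margin of error of 2 grams is inferred from the interval of 18 to 22 grams quoted in Example 0.1.

```python
def interval_from_margin(point_estimate, margin_of_error):
    """Return (lower, upper) for point estimate ± margin of error."""
    return point_estimate - margin_of_error, point_estimate + margin_of_error

# Example 0.1: point estimate 20 g; a margin of 2 g is assumed here
# because the text quotes the interval 18 g to 22 g.
lower, upper = interval_from_margin(20, 2)
width = upper - lower  # the width is twice the margin of error

print(lower, upper, width)
```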
Three Basic Scenarios for Confidence Intervals
1. Confidence Interval for a population mean when σ is known
2. Confidence Interval for a population mean when σ is unknown
3. Confidence Interval for a population proportion
We’ll soon introduce a new statistical table to use for scenario 2. How do we determine what statistical table to use? Use
the following flow chart. By the way, we’ll see three similar scenarios in Chapter 9.
CI for µ or p?
    µ: Is σ known?
        Yes: Use z-table
        No: Use t-table
    p: Use z-table
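The flow chart amounts to a two-step decision, which might be written as a small Python function. The function and argument names below are hypothetical and are not part of the course materials.

```python
def choose_table(parameter, sigma_known=None):
    """Return 'z' or 't' following the chapter's flow chart.

    parameter:   'mean' (a CI for µ) or 'proportion' (a CI for p)
    sigma_known: True/False; only consulted when parameter is 'mean'
    """
    if parameter == "proportion":
        return "z"
    if parameter == "mean":
        if sigma_known is None:
            raise ValueError("sigma_known must be True or False for a mean")
        return "z" if sigma_known else "t"
    raise ValueError("parameter must be 'mean' or 'proportion'")

print(choose_table("mean", sigma_known=True))   # scenario 1
print(choose_table("mean", sigma_known=False))  # scenario 2
print(choose_table("proportion"))               # scenario 3
```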
Important Note: A confidence interval is constructed with regard to a specific ‘confidence level.’ The confidence level
indicates the percentage of intervals that should contain the true, unknown population mean, after numerous samples are
drawn from the same population and their confidence intervals are each considered. It is not guaranteed that any one
confidence interval will contain the true unknown population mean. For instance, if the confidence level is 95%, then we
would expect about 95 out of every 100 confidence intervals to contain the true population mean. In other words, there
is no guarantee that any of our individual solutions in this chapter are ‘correct’ and contain the true population mean.
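The meaning of a confidence level can be checked with a small simulation. The sketch below draws many samples from a normal population and counts how often the 95% interval captures the true mean. The population values (µ = 100, σ = 15), the sample size, the number of trials and the random seed are all arbitrary assumptions chosen for illustration.

```python
import random
from statistics import NormalDist, mean

def coverage_rate(mu=100.0, sigma=15.0, n=36, level=0.95, trials=2000, seed=1):
    """Fraction of simulated intervals x̄ ± z·σ/√n that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # area to the left of z
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials

rate = coverage_rate()
print(f"Share of intervals containing the true mean: {rate:.3f}")
```

The printed share should land near 0.95 without equalling it exactly, and any individual interval in the simulation may still miss the true mean.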
1
Estimating a Population Mean When σ is Known
The first scenario for which we would like to calculate a confidence interval is when we wish to estimate a population
mean when σ is known. That is, we want to estimate the population mean using sample data, when we happen to know
the population standard deviation already (perhaps from previous research). Of course, for the Central Limit Theorem
to apply to the sampling distribution of x̄ and guarantee normality, the sample size must be 30 or larger.
Alternatively, the sampling distribution of x̄ will be normal if the population distribution is known to be normal.
See Chapter 7 for details.
The confidence interval for a population mean (µ) when σ is known is given by
x̄ ± E
where E = z · σx̄.
The value of z corresponds to the desired level of confidence and σx̄ = σ / √n.
How exactly does one determine what value of z to use in the expression above? To determine the z-value that corresponds
to a XX% confidence level, sketch a standard normal curve with XX% of the area under the curve centered about the
origin, then find the value of z that separates the shaded middle region from the unshaded right tail.
Example 1.1. Determine the z-value that corresponds to a 90% confidence level.
To determine the z-value that corresponds to a 90% confidence level, we first sketch a
standard normal curve and shade the middle 90%, centered about the mean. Recall that
the total area under the standard normal curve is 1, thus the area of the middle 90% is 0.90.
If the middle shaded portion has area 0.90, it follows that the unshaded portion (the tails)
of the curve is 1 − 0.90 = 0.10 or 10%. Furthermore, due to the symmetric nature of the
standard normal distribution, each unshaded tail must have an area of 0.05 (half of 0.10).
Lastly, the desired z-value is the z-value that separates the shaded middle 90% from the
unshaded 5% in the right tail. Sound familiar? We did this exact exercise in Chapter 6.
[Figure: standard normal curve with the middle area 0.90 shaded, an area of 0.05 in each tail, and the unknown boundary z = ? marked on the right.]
Before we can solve, we need to note the total area to the left of the desired z-value, which
is 0.05 + 0.90 = 0.95. Once the area to the left is known, the z-value can finally be found by
using the z-table (find 0.95 in the interior of the z-table) or by using the ‘invnorm’ feature
of a graphing calculator. Either method will yield the solution z = 1.64 when rounded to two
decimal places. Thus, the z-value that is used for a 90% confidence interval is 1.64.
The z-value that corresponds to any other confidence level can be found using the exact method outlined in Example 1.1
above.
Example 1.2. Determine the z-value that corresponds to each of the common confidence levels: 95%, 96%, 97%, 98%,
and 99%.
Confidence Level    z-value
90%                 z = 1.64
95%                 z = 1.96
96%                 z = 2.05
97%                 z = 2.17
98%                 z = 2.33
99%                 z = 2.58
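The table values can be reproduced in software. The sketch below uses Python's standard-library NormalDist in place of the calculator's ‘invnorm’ feature; the helper name z_for_confidence is hypothetical.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """z-value that leaves `level` of the area in the middle of N(0, 1)."""
    area_to_left = level + (1 - level) / 2  # middle area plus the left tail
    return NormalDist().inv_cdf(area_to_left)

for level in (0.90, 0.95, 0.96, 0.97, 0.98, 0.99):
    print(f"{level:.0%}: z = {z_for_confidence(level):.4f}")
```

Rounding each result to two decimal places gives the values in the table. The unrounded 90% value is about 1.645, which is why some references list 1.645 rather than 1.64.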
Example 1.3. The nursing department needs to create an informational brochure for students interested in a nursing career.
The average salary of a nurse on the Treasure Coast needs to be included in this brochure. Of course, it’s not practical
to contact every nurse on the Treasure Coast, ask for his or her salary, and then compute the true population average
salary. Instead, the nursing department contacts a random sample of 36 nurses, asks for the salary information of each,
and then determines that the sample average of those 36 nurses is $56,000. Since the sample data did not include every
nurse on the Treasure Coast, the population mean is most likely not $56,000 exactly, but close. Thus, it’s appropriate to
construct a confidence interval around x̄ = 56,000, which will yield a range of likely values of the average salary of all
nurses on the Treasure Coast. Suppose the standard deviation of all nursing salaries on the Treasure Coast is $6,000 and
the nursing department would like to use a 99% confidence level for their estimate.
A. Determine the point estimate for the average salary of all nurses on the Treasure Coast.
B. Determine the margin of error for a 99% confidence interval.
C. Construct a 99% confidence interval for the average salary of all nurses on the Treasure Coast.
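One way to check a by-hand solution to Example 1.3 is the short Python sketch below. It applies E = z · σ/√n with the table value z = 2.58 for 99% confidence. The function name is hypothetical, and this is a sketch rather than an official solution.

```python
import math

def mean_ci_known_sigma(x_bar, sigma, n, z):
    """Return (margin, (lower, upper)) for x̄ ± z·σ/√n."""
    standard_error = sigma / math.sqrt(n)  # σx̄ = σ / √n
    margin = z * standard_error            # E = z · σx̄
    return margin, (x_bar - margin, x_bar + margin)

# Example 1.3: x̄ = 56,000, σ = 6,000, n = 36, z = 2.58 (99% table value)
margin, (lower, upper) = mean_ci_known_sigma(x_bar=56_000, sigma=6_000, n=36, z=2.58)
print(f"E = {margin:,.2f}; interval = ({lower:,.2f}, {upper:,.2f})")
```

With these inputs the margin of error is about $2,580, so the interval runs from roughly $53,420 to $58,580.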
Example 1.4. Reference Example 1.3 above and suppose the nursing department decided to instead construct a 95%
confidence interval for the mean salary of nurses on the Treasure Coast, in lieu of the original 99% confidence interval. If
all other aspects of Example 1.3 remain the same, explore how the confidence interval is affected by the lowering of the
confidence level.
A. Determine the point estimate for the average salary of all nurses on the Treasure Coast.
B. Determine the margin of error for a 95% confidence interval.
C. Construct a 95% confidence interval for the average salary of all nurses on the Treasure Coast.
Example 1.4 demonstrates that the level of confidence is directly related to the margin of error. If two confidence
intervals with different levels of confidence are constructed from the same sample data, the confidence interval with the
lower level of confidence will have the smaller margin of error. In general, a lower margin of error is desired, but
arbitrarily lowering the level of confidence is not recommended. There is a better way to decrease the margin of error.
Example 1.5. Once again, reference Example 1.3 above and suppose the nursing department decided that the original
margin of error was too large, but they do not want to decrease the level of confidence (as in Example 1.4). Instead, they
decide to increase their sample size from the original 36 to 100. Presume the average of this new sample of 100 nurses on
the Treasure Coast yielded the same average salary of $56,000. If all aspects of Example 1.3 remain unchanged except for
sample size, explore how the confidence interval is affected by the increased sample size.
A. Determine the point estimate for the average salary of all nurses on the Treasure Coast.
B. Determine the margin of error for a 99% confidence interval.
C. Construct a 99% confidence interval for the average salary of all nurses on the Treasure Coast.
Therefore, the best way to decrease the margin of error of a confidence interval is to increase the
sample size. Decreasing the level of confidence will also decrease the margin of error, but in ‘real
life’ one would never change the confidence level to achieve a smaller margin of error. Confidence levels
are determined by discipline (e.g. medical research may use 99% but chemists may use 95%).
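The three nursing-salary examples can be compared side by side. The following Python sketch assumes the table values z = 2.58 (99%) and z = 1.96 (95%) and reuses σ = 6,000 from Example 1.3; the helper name is hypothetical.

```python
import math

def margin_of_error(z, sigma, n):
    """E = z · σ / √n for a mean with known σ."""
    return z * sigma / math.sqrt(n)

sigma = 6_000
scenarios = {
    "Example 1.3 (99%, n = 36)": margin_of_error(2.58, sigma, 36),
    "Example 1.4 (95%, n = 36)": margin_of_error(1.96, sigma, 36),
    "Example 1.5 (99%, n = 100)": margin_of_error(2.58, sigma, 100),
}
for label, margin in scenarios.items():
    print(f"{label}: E = {margin:,.0f}")
```

Under these assumptions the larger sample gives the smallest margin of the three (about $1,548, against $2,580 and $1,960) while keeping the 99% confidence level.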
In Examples 1.3, 1.4, and 1.5, the interaction between sample size, confidence level, and the margin of error of a confidence
interval was investigated. We noted that using a larger sample size or smaller confidence level resulted in a relatively lower
margin of error for a confidence interval. Taking this relationship one step further, we can actually find the sample size
required to produce a desired margin of error, given a predetermined level of confidence, maximum margin of error, and
population standard deviation.
The sample size required to produce a confidence interval for µ with a maximum margin
of error E is given by
n = z² σ² / E²
where z corresponds to the desired level of confidence and σ is the population standard
deviation.
Note: If the initial calculation of n does not yield a whole number, then the value must be
rounded up to the nearest whole number.
Example 1.6. Reference Example 1.3. Suppose the nursing department has yet to collect any data for their confidence
interval of average nursing salaries on the Treasure Coast. Determine the sample size necessary to produce a confidence
interval with a maximum margin of error of $1,000 if the confidence level desired is 99% and the population standard
deviation is known to be $6,000.
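The sample-size formula n = z²σ²/E², including the round-up rule, might be sketched in Python as follows. The function name is hypothetical, and z = 2.58 is the table value for 99% confidence.

```python
import math

def sample_size_for_mean(z, sigma, max_margin):
    """Smallest whole n with z·σ/√n <= max_margin, i.e. n = z²σ²/E² rounded up."""
    return math.ceil((z * sigma / max_margin) ** 2)

# Example 1.6: z = 2.58 (99%), σ = 6,000, maximum margin of error E = 1,000
n_required = sample_size_for_mean(z=2.58, sigma=6_000, max_margin=1_000)
print(n_required)
```

The unrounded value here is about 239.63, so rounding up gives 240 nurses. Rounding down to 239 would leave the margin of error slightly above $1,000.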
2
Estimating a Population Mean When σ is Unknown
In the previous section, we calculated confidence intervals using the expression
x̄ ± E
where the margin of error was calculated using the formula E = z · σx̄. However, this raises the question, ‘How does one
calculate a confidence interval if the population standard deviation (σ) is not known?’ This very question was answered by
William Gosset in 1908. Using the pseudonym Student, Gosset published the t-distribution (often referred to as ‘Student’s
t-distribution’), which is actually a family of distributions that takes into account the sampling error of the standard
deviation for relatively small sample sizes. Why is this necessary? For small sample
sizes, the difference between the population and sample standard deviations is more
pronounced, leading to greater variability in the sampling distributions for small n.
[Figure: density curves of N(0, 1), t(2), and t(8), plotted from −4 to 4.]
As shown in the graphic, the t-distribution
• is symmetric about its mean t = 0,
• is bell shaped, and
• has ‘heavier’ tails compared to N(0, 1).
Furthermore, the t-distribution has a single parameter: degrees of freedom. For the purposes of this course, degrees of
freedom (df) can be calculated as df = n − 1.
As illustrated by the graphic above, the t-distribution is similar, but noticeably different, compared to the z-distribution
when n is small. However, as n → ∞, the t-distribution becomes practically indistinguishable from the z-distribution. In
fact, the t-distribution table we will use in this class only includes calculations up to df = 75 (or n = 76), since the t- and
z-distributions are basically the same at that point.
To construct a confidence interval for a population mean (µ) when the population standard deviation (σ) is not known,
the t-distribution is used instead of the standard normal (z) distribution. Otherwise, the method is virtually the same.
The confidence interval for a population mean (µ) when σ is not known is given by
x̄ ± E
where E = t · sx̄.
The value of t corresponds to the desired level of confidence and sx̄ = s / √n.
Notice that the confidence interval is calculated using the same basic expression x̄ ± E. The only difference lies in the
way E is calculated. If σ is known, E = z · σx̄ (see previous section). However, if σ is not known, E = t · sx̄.
Important Note: In practice (outside of this class), the population standard deviation will be unknown just like
the population mean is unknown. Therefore, the z-distribution is rarely used for confidence intervals apart from their
introduction in a class like this. The t-distribution is usually required when calculating confidence intervals out in the
‘real world.’
So how does one determine the proper t-value to use when calculating the margin of error? Use the following method.
Method to determine the t-value to use for a confidence interval when σ is not known
1. Calculate the degrees of freedom. Recall, df = n − 1.
2. Locate the row equal to the result from Step 1.
3. Locate the column equal to the desired confidence level (90%, 95%, etc.).
4. The desired t-value is the intersection of the row from Step 2 and the column from Step 3.
Notice that the only confidence levels included on the t-table are 80%, 90%, 95%, 98%, 99%, and 99.9%. This is due to
the fact that the t-distribution is actually a family of distributions, and each value of df defines a separate distribution.
Imagine that each df value defines its own table, similar to the z-table. To construct a manageable table of t-values for
students to use, only select confidence levels were included.
Example 2.1. The American Automobile Association, better known as AAA or ‘Triple A,’ publishes the average daily
gas price for the entire nation as well as individual states. To estimate the average gas price in Florida, AAA contacted a
random sample of 50 gas stations from around the state and found that the average gas price was $2.46 per gallon with
a standard deviation of $0.15.
A. Determine the point estimate for the average gas price per gallon in all of Florida.
B. Determine the margin of error for a 99% confidence interval.
C. Construct a 99% confidence interval for the average gas price per gallon in Florida.
Example 2.2. The U.S. Bureau of Labor Statistics (BLS) releases a monthly report on the average hourly wage of
American employees. A recent sample of 500 American employees produced a mean hourly wage of $28.18 with a
standard deviation of $1.35.
A. Determine the point estimate for the average hourly wage of all American employees.
B. Determine the margin of error for a 98% confidence interval.
C. Construct a 98% confidence interval for the average hourly wage of all American employees.
Example 2.3. The college would like to estimate the average number of hours students work per week. Suppose 15
students are randomly selected and asked how many hours they work per week. The results are given below.
10, 8, 0, 12, 15, 30, 25, 20, 0, 40, 10, 12, 35, 20, 15
Assume the hours students work per week are known to be normally distributed.
A. Determine the point estimate for the average number of hours worked per week by all students at the college.
B. Determine the margin of error for a 90% confidence interval.
C. Construct a 90% confidence interval for the average number of hours worked per week by all students at the college.
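For raw data such as Example 2.3, the sample mean and sample standard deviation have to be computed before E = t · s/√n can be applied. The Python sketch below does this with the standard library. The t-value 1.761 is an assumption: it is the entry a standard t-table gives for df = 14 at 90% confidence, entered by hand rather than computed.

```python
import math
from statistics import mean, stdev

hours = [10, 8, 0, 12, 15, 30, 25, 20, 0, 40, 10, 12, 35, 20, 15]

n = len(hours)
df = n - 1                 # degrees of freedom, df = n − 1
t_value = 1.761            # assumed t-table entry for df = 14, 90% confidence

x_bar = mean(hours)        # point estimate
s = stdev(hours)           # sample standard deviation (divides by n − 1)
margin = t_value * s / math.sqrt(n)   # E = t · s/√n
lower, upper = x_bar - margin, x_bar + margin

print(f"df = {df}, x̄ = {x_bar:.2f}, s = {s:.3f}")
print(f"E = {margin:.2f}; interval = ({lower:.2f}, {upper:.2f})")
```

Because the sample is small (n = 15), the t-distribution applies only under the stated assumption that weekly hours are normally distributed.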
3
Estimating a Population Proportion
The final type of estimation we will consider is that of proportions. Recall from Chapter 7 that a proportion is simply a
percent in decimal form.
Example 3.1. Suppose 80 out of 100 students in a large class pass a test. The percentage of students who passed the
test is 80% while the proportion of students who passed the test is 0.80. Since this proportion involves the entire class,
we refer to 0.80 as the population proportion and write p = 0.80.
Example 3.2. Fifty students at the college were asked if they plan to transfer to a four-year degree program after earning
their associate’s degree. Thirty-five of the students responded affirmatively. Thus, the percentage of students that plan to
transfer to a four-year degree program is (35 / 50) · 100 = 70% while the corresponding proportion is 0.70. Since this proportion
was calculated from sample data, it is referred to as a sample proportion and we write p̂ = 0.70.
Even though we are now working with proportions, we are still doing the same basic task that we have been focused on
all chapter — using sample data to estimate a population parameter using the expression
Point Estimate ± Margin of Error
Previously, we used a sample mean (x̄) to estimate a population mean (µ). In this last section, we will use a sample
proportion (p̂) to estimate a population proportion (p).
The confidence interval for a population proportion (p) is given by
p̂ ± E
where E = z · sp̂.
The value of z corresponds to the desired level of confidence and sp̂ = √(p̂q̂ / n), where q̂ = 1 − p̂.
As discussed in Chapter 7, the prerequisites for the application of the Central Limit Theorem to sampling distributions
of p̂ are that np̂ and nq̂ are both sufficiently large (see Chapter 7 for the exact thresholds). Thus, one should always
ensure that both statements are true before constructing a confidence interval for a population proportion.
Example 3.3. Each month, the Bureau of Labor Statistics (BLS) estimates the national unemployment rate. The
unemployment rate is defined as the percentage of the labor force that is not currently employed but could be. Suppose
a recent BLS survey found that 54 out of a random sample of 1500 Americans were jobless.
A. Determine the point estimate for the proportion of all unemployed Americans.
B. Determine the margin of error for a 97% confidence interval.
C. Construct a 97% confidence interval for the proportion of unemployed Americans.
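The proportion formulas can be sketched in Python using the numbers from Example 3.3. The function name is hypothetical, and z = 2.17 is the table value for 97% confidence.

```python
import math

def proportion_ci(successes, n, z):
    """Return (p_hat, margin, (lower, upper)) for p̂ ± z·√(p̂q̂/n)."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    standard_error = math.sqrt(p_hat * q_hat / n)  # sp̂
    margin = z * standard_error                    # E = z · sp̂
    return p_hat, margin, (p_hat - margin, p_hat + margin)

# Example 3.3: 54 jobless out of 1500 sampled, z = 2.17 (97% table value)
p_hat, margin, (lower, upper) = proportion_ci(successes=54, n=1500, z=2.17)
print(f"p̂ = {p_hat:.3f}, E = {margin:.4f}, interval = ({lower:.4f}, {upper:.4f})")
```

The counts of successes and failures (54 and 1446) would still need to be checked against the Chapter 7 prerequisites before relying on the interval.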
Example 3.4. Each autumn, the consulting firm PWC surveys 2000 Americans regarding their holiday spending plans.
This year, they found that 54% of respondents planned to do the majority of their holiday shopping online.
A. Determine the point estimate for the proportion of all Americans that plan to do the majority of their holiday
spending online.
B. Determine the margin of error for a 90% confidence interval.
C. Construct a 90% confidence interval for the proportion of all Americans that plan to do the majority of their holiday
spending online.
Example 3.5. A political campaign would like to estimate the proportion of likely voters that plan to support their
candidate. A campaign staffer selected a random sample of 30 likely voters and asked them if they planned to vote for
this candidate. The responses of these likely voters are given below.
Yes  Yes  No   No   Yes  Yes  Yes  Yes  No   Yes
No   No   No   Yes  Yes  Yes  Yes  No   No   No
Yes  Yes  Yes  Yes  No   Yes  Yes  Yes  Yes  No
A. Determine the point estimate for the proportion of all ‘Yes’ voters.
B. Determine the margin of error for a 96% confidence interval.
C. Construct a 96% confidence interval for the proportion of all ‘Yes’ voters.
D. Do you think the campaign manager will be satisfied with the confidence interval? If not, how might the confidence
interval be improved?
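When the data arrive as raw responses, the first step is to count the ‘Yes’ answers to obtain p̂. The Python sketch below tallies the 30 responses from Example 3.5 and applies E = z · √(p̂q̂/n) with the table value z = 2.05 for 96% confidence. It is an illustrative check rather than an official solution.

```python
import math

responses = (
    "Yes Yes No No Yes Yes Yes Yes No Yes "
    "No No No Yes Yes Yes Yes No No No "
    "Yes Yes Yes Yes No Yes Yes Yes Yes No"
).split()

n = len(responses)
yes_count = responses.count("Yes")
p_hat = yes_count / n                        # point estimate
q_hat = 1 - p_hat
margin = 2.05 * math.sqrt(p_hat * q_hat / n)  # z = 2.05 for 96% confidence
lower, upper = p_hat - margin, p_hat + margin

print(f"{yes_count} of {n} said Yes: p̂ = {p_hat:.3f}")
print(f"E = {margin:.3f}; interval = ({lower:.3f}, {upper:.3f})")
```

With only 30 respondents the margin of error is roughly 0.18, so the interval extends below 0.50. This is relevant to part D.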
The sample size required to produce a confidence interval for the population proportion
with a maximum margin of error E is given by
n = z² p̂ q̂ / E²
where z corresponds to the desired level of confdence and p̂ is the sample proportion.
Note: If the initial calculation of n does not yield a whole number, then the value must be
rounded up to the nearest whole number.
The value of p̂ is obtained by collecting preliminary sample data. On the other hand, if no preliminary sample data are
available, the most conservative (worst case) estimate of the sample size required can be obtained by setting p̂ = 0.50,
since the product p̂q̂ is largest when p̂ = q̂ = 0.50.
Example 3.6. Suppose the campaign staffer in Example 3.5 knew about the above result before collecting any data for
the confidence interval.
A. Determine the most conservative estimate for the sample size required for the 96% confidence interval in Example
3.5 if the maximum margin of error allowed is 2%.
B. Suppose the data collected in Example 3.5 serves as a preliminary estimate of p̂. Determine the sample size required
for the 96% confidence interval in Example 3.5 if the maximum margin of error allowed is 2%.
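Both parts of Example 3.6 use n = z²p̂q̂/E² and differ only in the value of p̂. A Python sketch follows. The function name is hypothetical, z = 2.05 is the table value for 96% confidence, and the preliminary estimate p̂ = 19/30 comes from counting the ‘Yes’ responses listed in Example 3.5.

```python
import math

def sample_size_for_proportion(z, p_hat, max_margin):
    """n = z²·p̂·q̂ / E², rounded up to the next whole number."""
    q_hat = 1 - p_hat
    return math.ceil(z ** 2 * p_hat * q_hat / max_margin ** 2)

# Part A: no preliminary data, so use the conservative p̂ = 0.50
n_conservative = sample_size_for_proportion(z=2.05, p_hat=0.50, max_margin=0.02)

# Part B: preliminary estimate p̂ = 19/30 from the Example 3.5 responses
n_preliminary = sample_size_for_proportion(z=2.05, p_hat=19 / 30, max_margin=0.02)

print(n_conservative, n_preliminary)
```

The conservative choice always gives the larger of the two sample sizes.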
As was found earlier in the chapter, the best way to decrease the margin of error and thus shrink the overall width of the
confidence interval is to increase the sample size. Although decreasing the level of confidence will also
shrink the margin of error, this practice is not advisable since confidence levels are set by discipline and should always be
adhered to.
Very Important Note about the Precision of Solutions
Since Chapter 6, we have seen that rounding errors can greatly affect the final solution of a probability
calculation. For example, when calculating probability ‘by hand,’ we round z-scores to two decimal
places and t-scores to four decimal places. Anytime one rounds at an intermediate step of a multistep
calculation, the final solution is affected. We will experience this same issue with confidence
intervals in Chapter 8 (and again with hypothesis tests in Chapter 9).
If you have a graphing calculator, it’s best to use its built-in functionality to calculate a confidence
interval. This mitigates the error that occurs from rounding.
On the other hand, a graphing calculator is not required for this course. Thus, if you are doing
calculations ‘by hand’ using the z and t-charts, be sure to realize that your solutions will be a little
different than solutions found via a graphing calculator (or other software). If you’re working with large
data values, this difference could possibly be quite pronounced. However, this does not mean a solution
reached ‘by hand’ is incorrect.
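The size of the rounding effect can be seen by computing one margin of error twice, once with the rounded table value of z and once with the unrounded value that software would use. The Python sketch below uses the numbers from Example 1.3 (σ = 6,000, n = 36, 99% confidence). The variable names are illustrative only.

```python
import math
from statistics import NormalDist

sigma, n = 6_000, 36

z_table = 2.58                         # z-table value, rounded to two decimals
z_exact = NormalDist().inv_cdf(0.995)  # unrounded value, as software would use

margin_by_hand = z_table * sigma / math.sqrt(n)
margin_software = z_exact * sigma / math.sqrt(n)
difference = margin_by_hand - margin_software

print(f"by hand: {margin_by_hand:.2f}")
print(f"software: {margin_software:.2f}")
print(f"difference: {difference:.2f}")
```

With salaries in the tens of thousands of dollars, the two margins differ by a few dollars even though both calculations follow the same method.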
Due to the variation of ‘correct’ solutions to these types of questions, homework and quiz exercises
are coded to accept a range of possible solutions (referred to as solution tolerance). This ensures that
solutions reached via a graphing calculator or ‘by hand’ are counted as correct.
As far as multiple-choice tests are concerned, students must understand that the correct solution will
often be generated by software, which means it will be equal to a graphing calculator solution (where
little rounding took place). Solutions reached ‘by hand’ may not be exactly the same — but will be
close. It should be obvious which choice to select on a multiple-choice question.
Lastly, since homework and quiz exercises will often accept a range of correct solutions, this sometimes
allows solutions that are actually incorrect to be scored as correct. This can potentially be confusing.
For instance, it’s possible that a margin of error calculation is not correct, but is scored as correct by
the homework software because it’s ‘close enough’ to the correct answer. However, when that same
(incorrect) margin of error is used to calculate a confidence interval, the confidence interval solution
might not be close enough to be scored as the correct answer. In reality, both solutions are incorrect
and should receive no credit, but the first solution is graded as correct because it’s ‘close enough’
while the second solution is not. This does not happen often, but it happens occasionally and students
should be aware of the possibility. There’s really no way around the issue, as long as some students are
doing calculations ‘by hand’ and others are using a graphing calculator.
If you are ever working on a homework or quiz question and you feel that your solution is correct, but
not being graded as such, please let me know and I will investigate the issue. Occasionally, a homework
or quiz question’s coding will need to be tweaked.