Using Confidence Intervals

Chapter 9 – Estimating the Value of a Parameter
Using Confidence Intervals
There are two branches of statistical inference, 1) estimation of parameters and 2) testing hypotheses
about the values of parameters. We will consider estimation first.
Defn: A point estimate of a parameter (a numerical characteristic of a population) is a specific
numerical value based on the data obtained from a sample. To obtain a point estimate of a parameter,
we summarize the information contained in the sample data by using a statistic. A particular statistic
used to provide a point estimate of a parameter is called an estimator.
Characteristics of an Estimator
To provide good estimation, an estimator should have the following characteristics:
1) The estimator should be unbiased. In other words, the expectation, or mean, of the sampling
distribution of the estimator for samples of a given size should be equal to the parameter which the
statistic is estimating.
2) The estimator should be consistent. In other words, if we increase the size of the sample which we
select from the population, the estimator should yield a value which gets closer to the true value of the
parameter being estimated.
3) The estimator should be relatively efficient. In other words, of all possible statistics that could be
used to estimate a particular parameter, we want to choose the statistic whose sampling distribution has
the smallest variance.
It turns out that, for a population mean, $\mu$, the sample mean, $\bar{X}$, is an estimator which is unbiased,
consistent, and relatively efficient.
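The first two properties can be checked informally by simulation. The sketch below is only an illustration, not part of the course material: the population (normal, mean 50, standard deviation 10), the sample sizes, and the function name `simulate_sample_means` are all assumptions chosen for the example. If the sample mean is unbiased, the average of many sample means should sit near 50; if it is consistent, their spread should shrink as n grows.

```python
import random
import statistics


def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` samples of size n from Normal(mu, sigma); return the sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]


# Hypothetical population parameters (assumed for illustration only).
MU, SIGMA = 50.0, 10.0

for n in (5, 50, 500):
    means = simulate_sample_means(MU, SIGMA, n, reps=1000)
    # Unbiasedness: the average of the sample means should be close to MU.
    # Consistency: their standard deviation should shrink as n increases.
    print(n, round(statistics.fmean(means), 2), round(statistics.stdev(means), 2))
```

The exact numbers printed depend on the random seed, so they are not listed here; the pattern to look for is a first column of averages near 50 and a second column that decreases with n.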
In any given situation, we have no way of knowing precisely how close the estimated value of a
parameter is to the true value of the parameter. However, if the estimator satisfies the three properties
listed above, we can be highly confident that the estimated parameter value is unlikely to differ from
the true parameter value by much.
On the other hand, it is nearly certain that the estimated value will not be exactly equal to the true
parameter value. We want to have some idea of the precision of the estimate. Hence, rather than
simply calculating a point estimate of the parameter value, we find a confidence interval estimate.
Defn: A confidence interval estimate of a parameter consists of an interval of numbers obtained from
a point estimate of the parameter, together with a percentage that specifies how confident we are that
the true parameter value lies in the interval. This percentage is called the confidence level, or
confidence coefficient.
Thus, there are three quantities that must be specified: 1) the point estimate of the parameter, 2) the
width of the interval (which is usually centered at the value of the point estimate), and 3) the
confidence level.
Estimation of Population Means, When $\sigma$ Is Unknown
Now the Central Limit Theorem says that for large samples ($n \ge 30$), the following random variable
has an approximate standard normal distribution:
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}.$$
This formula involves two unknown
parameters, the population mean and the population standard deviation. We want to be able to
estimate the unknown population mean, but we need to do something about the unknown standard
deviation. We will use the sample standard deviation, S, as an estimate of the population standard
deviation. Hence, we are trying to estimate the population mean using the random variable
$$\frac{\bar{X} - \mu}{S / \sqrt{n}}.$$
However, we cannot say that this random variable has an approximate standard
normal distribution. It has a different distribution, called a t-distribution.
Instead of using the standard normal distribution for constructing the confidence interval, we use a
related distribution, called the t distribution.
Characteristics of the t Distribution:
1) It is bell-shaped.
2) The distribution is unimodal (one peak) and symmetric, and the mean, median and mode are all
equal to 0.
3) The curve is continuous; i.e., there are no gaps.
4) The total area under the curve is 1.
5) The curve extends indefinitely in both directions, approaching, but never touching, the horizontal
axis.
6) The variance is greater than 1 (for d.f. > 2, it equals d.f./(d.f. - 2), which approaches 1 as d.f. increases).
7) There is actually a family of t-distributions, each one characterized by a parameter called the
degrees of freedom.
8) The t-curve is wider than the standard normal curve, but for larger sample sizes, the t-curve is
closer to the standard normal curve.
(See the graph on page 425 of your textbook for examples of t-curves, compared to the standard
normal distribution.)
Note: The symbol d.f. will be used for degrees of freedom. The concept denotes the number of values
in the data set which are free to vary. When we first select a random sample of size n, we have n
degrees of freedom. After we compute the sample mean, we have used up one degree of freedom, and
now d.f. = n – 1. If we compute a second statistic from the data, then we use up another degree of
freedom, and now d.f. = n – 2.
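One way to see why computing the sample mean uses up a degree of freedom: once the mean is fixed, the deviations from it must sum to zero, so the last deviation is determined by the other n - 1. The short sketch below illustrates this with a made-up data set of five values (the data and variable names are assumptions for the example).

```python
import statistics

# Hypothetical sample of n = 5 values (assumed for illustration).
data = [4.0, 7.0, 1.0, 9.0, 4.0]
n = len(data)
xbar = statistics.fmean(data)

# Deviations from the sample mean always sum to zero ...
deviations = [x - xbar for x in data]

# ... so the last deviation can be recovered from the first n - 1 of them.
last_from_others = -sum(deviations[:-1])

# The sample variance divides the sum of squared deviations by d.f. = n - 1.
s_squared = sum(d * d for d in deviations) / (n - 1)

print(sum(deviations), last_from_others, deviations[-1], s_squared)
```

Here `s_squared` agrees with `statistics.variance(data)`, which also divides by n - 1.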
Formula for a Confidence Interval for $\mu$:
(The development of the formula is presented below, for anyone who wants to go through it and
understand how confidence intervals are constructed. You will be responsible only for understanding
what a confidence interval is, and being able to find and interpret confidence intervals.)
If we are sampling from a population which has a normal distribution, or if our sample size is large
enough ($n \ge 30$), then we may find the confidence interval as follows. We know that, in either of the
situations (normality or large sample size), the random variable
$$\frac{\bar{X} - \mu}{S / \sqrt{n}}$$
has either a t-distribution (in
the first case) or an approximate t-distribution (in the second case).
Hence, for a given percentage $\alpha$, we may make the following probability statement:
$$P\left( -t_{\alpha/2,\,n-1} < \frac{\bar{X} - \mu}{S / \sqrt{n}} < t_{\alpha/2,\,n-1} \right) = 1 - \alpha.$$
We can rearrange the quantities in parentheses to obtain an equivalent probability statement by
multiplying throughout the inequality by the quantity $S / \sqrt{n}$. We obtain
$$P\left( -t_{\alpha/2,\,n-1} \frac{S}{\sqrt{n}} < \bar{X} - \mu < t_{\alpha/2,\,n-1} \frac{S}{\sqrt{n}} \right) = 1 - \alpha.$$
We want to isolate the parameter, $\mu$, in the middle of the inequality, so we subtract $\bar{X}$ throughout,
obtaining
$$P\left( -\bar{X} - t_{\alpha/2,\,n-1} \frac{S}{\sqrt{n}} < -\mu < -\bar{X} + t_{\alpha/2,\,n-1} \frac{S}{\sqrt{n}} \right) = 1 - \alpha.$$
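We are almost done. We do not want $-\mu$ in the center, but rather $\mu$, so we multiply throughout by $-1$
(which reverses the direction of the inequalities), obtaining
$$P\left( \bar{X} - t_{\alpha/2,\,n-1} \frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2,\,n-1} \frac{S}{\sqrt{n}} \right) = 1 - \alpha.$$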
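Then the endpoints of our confidence interval are $\bar{X} - t_{\alpha/2,\,n-1} \frac{S}{\sqrt{n}}$ and $\bar{X} + t_{\alpha/2,\,n-1} \frac{S}{\sqrt{n}}$. The confidence
level is $1 - \alpha$.
Thus, a $(1 - \alpha)100\%$ confidence interval for the mean is given by:
$$\left( \bar{X} - t_{\alpha/2,\,n-1} \frac{S}{\sqrt{n}},\ \bar{X} + t_{\alpha/2,\,n-1} \frac{S}{\sqrt{n}} \right).$$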
Here S is the sample standard deviation, and $t_{\alpha/2,\,n-1}$ is called a critical value for the t distribution
with d.f. = n - 1. We can find t critical values using Table F in Appendix C if we know the degrees of
freedom of the t-distribution and the confidence level, $1 - \alpha$. For this purpose we would use the two-tail
values for $\alpha$. For example, if our confidence level is $(1 - \alpha)100\% = 95\%$, then $\alpha = 0.05$. If our sample
size is 25 then d.f. = 24, and the t-critical value is 2.064.
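As a small worked illustration of the interval formula, the sketch below plugs summary statistics into $\bar{X} \pm t_{\alpha/2,\,n-1} S/\sqrt{n}$. The sample mean (80) and sample standard deviation (10) are made-up values, and the function name `t_interval` is hypothetical; only the critical value 2.064 (95% confidence, d.f. = 24) comes from the text. The critical value is passed in by hand, as if read from the t table.

```python
import math


def t_interval(xbar, s, n, t_crit):
    """Return (lower, upper) = xbar -/+ t_crit * s / sqrt(n)."""
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin


# Hypothetical summary statistics; t_crit = 2.064 is the table value
# for a 95% confidence level with d.f. = 24 (n = 25).
lower, upper = t_interval(xbar=80.0, s=10.0, n=25, t_crit=2.064)
print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
```

With these numbers the margin of error is 2.064 × 10 / 5 = 4.128, so the interval is (75.872, 84.128).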
Intervals constructed in the above manner are called confidence intervals for the corresponding
parameter (in this case, a population mean, $\mu$). We are usually interested in one of several specific
values of the confidence level, either 90%, or 95% or 99%.
We interpret a confidence interval as follows. A confidence interval with confidence level
$(1 - \alpha)100\%$ is an interval obtained from sample data by a method such that $(1 - \alpha)100\%$ of all intervals
obtained by this method would, in fact, contain the true value of the parameter, and $\alpha \cdot 100\%$ of all
intervals so obtained would not contain the true value of the parameter.
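This interpretation can be illustrated by repeating the sampling many times and counting how often the interval captures the true mean. The sketch below is a simulation under assumed conditions: a normal population with mean 100 and standard deviation 15 (both made up), samples of size 25, and the 95% critical value 2.064 for d.f. = 24. The function name `coverage` is hypothetical.

```python
import math
import random
import statistics


def coverage(mu, sigma, n, t_crit, reps, seed=1):
    """Fraction of simulated t intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        margin = t_crit * statistics.stdev(sample) / math.sqrt(n)
        if xbar - margin < mu < xbar + margin:
            hits += 1
    return hits / reps


# Hypothetical population; 2.064 is the 95% t critical value for d.f. = 24.
rate = coverage(mu=100.0, sigma=15.0, n=25, t_crit=2.064, reps=4000)
print(f"fraction of intervals containing mu: {rate:.3f}")
```

The printed fraction should land close to 0.95. Any single interval either contains $\mu$ or it does not; the 95% describes the long-run behavior of the method.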
Finding a Confidence Interval for $\mu$ Using the TI-83 Calculator:
Given a data set of n sample values X1, X2, … , Xn, where these represent a random sample from a
population with unknown mean $\mu$, we can find a $(1 - \alpha)100\%$ confidence interval for $\mu$ as follows:
1) Enter the data in the calculator, using STAT and 1:Edit.
2) Choose STAT, then TESTS, then 8:T Interval.
3) Choose Data by hitting ENTER; or if the sample mean and sample standard deviation are given to
you, choose Stats.
4) If we don’t know the sample mean and sample standard deviation, then we would choose Data,
enter the variable name for List:, and enter the appropriate confidence level.
If we know the sample mean and the sample standard deviation, then we would choose Stats, rather
than Data, and enter the value of the sample mean, the value of the sample standard deviation, the
sample size, and the appropriate confidence level.
5) Hit ENTER.
Example: p. 432, Exercise 17.
Example: p. 432, Exercise 21.
Estimation of Population Proportions
Sometimes we want to use sample data to estimate the proportion, p, of a population that have a certain
characteristic. The point estimate of p would be the proportion, $\hat{p}$, of the sample members that have
the characteristic of interest.
In this case, we are talking about using a binomial experiment to do the estimation. We would select a
random sample of n members of the population; the randomness ensures that the n trials are
independent of each other. We are seeking the same information about each sample member, whether
the member has the characteristic of interest. Thus the trials are identical. Each trial results in one of
two possible outcomes; either the member has the characteristic, or the member does not have the
characteristic. Assuming that the sample is random, the probability that a member has the
characteristic is p, the proportion of the population who have the characteristic.
Hence the three conditions of a binomial experiment are satisfied. If we define X = number of
members of the sample who have the characteristic, then X has a binomial distribution with parameters
n and p.
The proportion of the sample who have the characteristic of interest is then $\hat{p} = X/n$. If the sample
size is large enough, then this random variable has an approximate normal distribution, and the random
variable
$$\frac{\hat{p} - p}{\sqrt{\hat{p}(1 - \hat{p})/n}}$$
has an approximate standard normal distribution.
We can construct a $(1 - \alpha)100\%$ confidence interval for p using this fact. The form of the confidence
interval is
$$\left( \hat{p} - z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},\ \hat{p} + z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \right).$$
To find $z_{\alpha/2}$, we look in the table of the standard
normal distribution, Table E in Appendix C. If our confidence level is $(1 - \alpha)100\% = 95\%$, then $\alpha = 0.05$,
and we divide this by two and subtract from 0.5, to get 0.4750. Looking in the table, we find
that the critical value corresponding to the probability 0.4750 is 1.96.
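The same calculation can be sketched in Python using only the standard library. This is an illustration, not the course's prescribed method: the counts (120 successes in 400 trials) are made up, and the function name `proportion_interval` is hypothetical. Instead of a table lookup, the critical value is obtained from `statistics.NormalDist().inv_cdf`; the cumulative probability 0.975 used below corresponds to the table area 0.4750 plus the 0.5 to the left of the mean.

```python
import math
from statistics import NormalDist


def proportion_interval(x, n, confidence=0.95):
    """Large-sample z interval for a proportion: p_hat -/+ z * sqrt(p_hat(1-p_hat)/n)."""
    p_hat = x / n
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    margin = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Critical value for 95% confidence; rounds to the table value 1.96.
z_95 = NormalDist().inv_cdf(0.975)

# Hypothetical sample: 120 of 400 members have the characteristic.
lower, upper = proportion_interval(x=120, n=400)
print(f"z = {z_95:.2f}, 95% CI for p: ({lower:.4f}, {upper:.4f})")
```

For this made-up sample, $\hat{p} = 0.30$ and the interval is roughly (0.255, 0.345).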
Finding a Confidence Interval for p Using the TI-83 Calculator:
Given the number n of trials in our binomial experiment (size of sample) and the number of successes
(number of members of the sample with the characteristic of interest), where the sample was taken
from a population in which the unknown proportion of members with the characteristic is p, we can
find a $(1 - \alpha)100\%$ confidence interval for p as follows:
1) Enter the data in the calculator, using STAT and 1:Edit.
2) Choose STAT, then TESTS, then A:1-PropZInt.
3) Enter the value of X, the sample size, and the appropriate confidence level.
4) Hit ENTER.
Example: p. 442, Exercise 15.
Example: p. 442, Exercise 19.