Chapter 3 ESTIMATION: SINGLE POPULATION

Two Methods of Inference About a Parameter
(page 437)

Estimation – inference about a parameter is made
by finding a single value or a range of values
computed from the sample data that may be used to
make a statement about the unknown value of the
parameter.

Hypothesis testing – inference about a parameter
is made by assessing whether or not the sample data
support an assertion made about the true value of
the parameter.
Two Major Types of Estimates
• Definition 13.1 (page 437)
A point estimator is a single statistic whose realized value is used to estimate the
true but unknown value of the population parameter. The realized value of an
estimator is called the point estimate.
• Definition 13.2 (page 439)
An interval estimator of the unknown value of the population parameter is a rule
that tells us how to calculate two numbers based on sample data that will form an
interval within which we expect the population parameter to lie with a specified
degree of confidence. The realized pair of numbers computed from this estimator
is called the interval estimate or confidence interval estimate.
Example 13.1
(page 438)
Suppose we have a population with mean, µ. A random sample of n
observations, (X1, X2, …, Xn), was taken from this population. Then the sample
mean, $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, is a point estimator of the population mean, µ.
The particular numerical value computed from a given sample, $\bar{x}$, is a point
estimate.
The point estimator is a random variable while the point estimate is its
realized value. The estimate can only be computed once the sample data
have been collected. For example, if (X1, X2, X3) is a random sample and the
sample data are X1=2, X2=3, X3=4, then a point estimate for µ is
$\bar{x} = \frac{2+3+4}{3} = 3$.
Example: Confidence interval estimator for µ
(page 456)
Suppose a random sample of size n is taken from a normal distribution with unknown
mean, µ, but known variance, σ². In this case, a (1-α)100% confidence
interval estimator for the population mean, µ, is given by:
$\left( \bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right)$
Note:
This confidence interval estimator has a lower confidence limit and an upper
confidence limit. Both are random variables. They will only have realized
values once the sample data have been obtained. For example, if after collecting
the data the lower and upper confidence limits with α=0.1 are computed
as 10 and 20, respectively, then a 90% confidence interval
estimate for µ is (10,20). We then say that we are 90% confident that the
value of µ is in the interval (10,20).
Common Point Estimators for Parameters under Random Sampling from an Infinite Population
(Table 13.3, page 444)
Parameter | Estimator
Mean: µ | Sample Mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Variance: σ² | Sample Variance: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$
Standard Deviation: σ | Sample Standard Deviation: $S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2}$
Standard Error of the Sample Mean: $SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}$ | Estimator of Standard Error of the Sample Mean: $se(\bar{X}) = \frac{S}{\sqrt{n}}$
Proportion: p | Sample Proportion: $\hat{P}$
Standard Error of the Sample Proportion: $SE(\hat{P}) = \sqrt{\frac{p(1-p)}{n}}$ | Estimator of Standard Error of the Sample Proportion: $se(\hat{P}) = \sqrt{\frac{\hat{P}(1-\hat{P})}{n}}$
Note
(page 443)
A random sample of size n from an infinite population includes samples
selected using simple random sampling with replacement (SRSWR) because, as
explained in Chapter 11, SRSWR satisfies the definition of a random sample
from an infinite population as stated in Definition 11.2. We can also include
samples selected using SRSWOR so long as the sample size n is small relative
to the population size N, so that the finite population correction
$\frac{N-n}{N-1}$ will
be approximately equal to 1, or equivalently, the sampling fraction n/N is
very close to 0.
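To see how quickly the finite population correction approaches 1, here is a minimal Python sketch. The function name `fpc` is made up for illustration, and the population sizes 1,000 and 1,000,000 are hypothetical; N=19 with n=10 mirrors the sizes used in Assignment 7.

```python
def fpc(N, n):
    """Finite population correction (N - n) / (N - 1)."""
    return (N - n) / (N - 1)

# (population size N, sample size n); only the first pair comes from the slides.
cases = [(19, 10), (1_000, 10), (1_000_000, 10)]

for N, n in cases:
    # Print the correction next to the sampling fraction n/N.
    print(N, n, round(fpc(N, n), 6), n / N)
```

With N=19 and n=10 the correction is far from 1, so treating an SRSWOR sample as a sample from an infinite population would not be reasonable there; for the larger hypothetical populations it is very close to 1.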
Examples
Example 13.5 (page 445), Examples 13.6-13.8 (pages 447-449)
Suppose a random sample of men was selected and data on their waistlines are as presented
in Exercise 1 (page 235):
30 35 40 28 34 33 31 32 29 37
32 34 34 35 36 30 31 32 32 31
a) Estimate the mean waistline, µ.
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{656}{20} = 32.8$
b) Estimate p = proportion of men with waistline greater than 32.
$\hat{p} = \frac{9}{20} = 0.45$
c) Estimate how varied the values of $\bar{X}$ are from one sample to the other.
$se(\bar{X}) = \frac{S}{\sqrt{n}} = \frac{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 / (n-1)}}{\sqrt{n}} = \frac{2.894641}{\sqrt{20}} = 0.647$
d) Estimate how varied the values of $\hat{P}$ are from one sample to the other.
$se(\hat{P}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{(0.45)(0.55)}{20}} = 0.1112$
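The four estimates in this example can be checked with a short Python sketch that uses only the standard library. The variable names are illustrative and not part of the textbook; the data are the 20 waistline values listed in the example.

```python
import math
import statistics

# Waistline data as listed in the example (n = 20).
waist = [30, 35, 40, 28, 34, 33, 31, 32, 29, 37,
         32, 34, 34, 35, 36, 30, 31, 32, 32, 31]

n = len(waist)
x_bar = statistics.mean(waist)                 # a) point estimate of the mean
p_hat = sum(x > 32 for x in waist) / n         # b) proportion with waistline > 32
s = statistics.stdev(waist)                    # sample standard deviation (divisor n - 1)
se_mean = s / math.sqrt(n)                     # c) estimated standard error of the mean
se_prop = math.sqrt(p_hat * (1 - p_hat) / n)   # d) estimated standard error of the proportion

print(x_bar, p_hat, round(se_mean, 3), round(se_prop, 4))
```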
Assignment 7
Assume that the dataset presented in Table 13.4 (page 449) on yield of mango is our
population data. Define µ=population mean yield of mango and p=population
proportion of countries with yield below 100,000. Note that µ=89,627.63 and
p=14/19.
1. Select a sample of size n=10 using SRSWR with the seed number assigned to your
group. Identify the 10 elements in your sample. (Note: Since this is SRSWR, the 10
countries in your sample need not be distinct.)
2. Using the sample data, estimate the following: (Show your solution.)
a. µ
b. how varied the values of the sample mean are from one sample to the other
c. p
d. how varied the values of the sample proportion are from one sample to the other
Some Desirable Properties
of a Point Estimator
• Unbiasedness (page 449)
• Reliability (page 452)
• Efficiency (page 453)
Definition of Unbiasedness
(page 449)
Definition 13.3
An estimator is said to be unbiased for the parameter
being estimated if the average of the estimates it
produces under repeated sampling from the same
population is equal to the true value of the parameter
being estimated. In other words, the expected value of
the estimator is equal to the parameter it is estimating.
Examples
Example 13.9 (pages 449-450)
Example 13.10 (pages 451 -452).
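Population data={21, 23, 30, 34}
Select a sample of size n=2 from this population using SRSWR.
Show that S² is unbiased for σ² but S is biased for σ.
TABLE 13.6a. List of All Possible Samples of Size 2
and Computed Values of S² and S
Sample | x1 | x2 | S² | S
1 | 21 | 21 | 0 | 0
2 | 23 | 23 | 0 | 0
3 | 30 | 30 | 0 | 0
4 | 34 | 34 | 0 | 0
5 | 21 | 23 | 2 | √2
6 | 23 | 21 | 2 | √2
7 | 30 | 34 | 8 | √8
8 | 34 | 30 | 8 | √8
9 | 23 | 30 | 24.5 | √24.5
10 | 30 | 23 | 24.5 | √24.5
11 | 21 | 30 | 40.5 | √40.5
12 | 30 | 21 | 40.5 | √40.5
13 | 23 | 34 | 60.5 | √60.5
14 | 34 | 23 | 60.5 | √60.5
15 | 21 | 34 | 84.5 | √84.5
16 | 34 | 21 | 84.5 | √84.5
$\text{mean of } S^2 = \frac{(0)(4) + (2)(2) + (8)(2) + (24.5)(2) + (40.5)(2) + (60.5)(2) + (84.5)(2)}{16} = 27.5$
$\text{mean of } S = \frac{(\sqrt{0})(4) + (\sqrt{2})(2) + (\sqrt{8})(2) + (\sqrt{24.5})(2) + (\sqrt{40.5})(2) + (\sqrt{60.5})(2) + (\sqrt{84.5})(2)}{16} = 4.066$
From the population data, we have
$\sigma^2 = \frac{\sum_{i=1}^{4} (X_i - 27)^2}{4} = 27.5$ and $\sigma = \sqrt{27.5} = 5.244$.
The mean of S² over all possible samples equals σ²=27.5, so S² is unbiased for σ². The mean of S is 4.066, which is not equal to σ=5.244, so S is biased for σ.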
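The enumeration in Table 13.6a can be reproduced with a short Python sketch. It lists all 16 equally likely ordered samples under SRSWR and averages S² and S over them; the variable names are illustrative only.

```python
import itertools
import math
import statistics

population = [21, 23, 30, 34]

# All 16 ordered samples of size 2 under SRSWR (each equally likely).
samples = list(itertools.product(population, repeat=2))

s2_values = [statistics.variance(s) for s in samples]   # sample variance, divisor n - 1
s_values = [math.sqrt(v) for v in s2_values]             # sample standard deviation

mean_s2 = statistics.mean(s2_values)   # average of S^2 over all possible samples
mean_s = statistics.mean(s_values)     # average of S over all possible samples

sigma2 = statistics.pvariance(population)   # population variance, divisor N
sigma = math.sqrt(sigma2)

print(mean_s2, sigma2)                      # equal: S^2 is unbiased
print(round(mean_s, 3), round(sigma, 3))    # different: S is biased
```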
Remarks on Unbiasedness
(pages 451-453)
• In general, the sample mean, $\bar{X}$, is an unbiased estimator of µ under random sampling from an
infinite population, which includes simple random sampling with replacement. It is also unbiased
under simple random sampling without replacement.
• Likewise, the sample proportion is an unbiased estimator of the population proportion under
random sampling from an infinite population, which includes simple random sampling with
replacement. It is also unbiased under simple random sampling without replacement.
• Under simple random sampling with replacement, the sample variance, S², is an unbiased
estimator for the population variance, σ². On the other hand, the sample standard deviation, S, is
a biased estimator of the population standard deviation, σ, with the bias becoming smaller or
insignificant for large sample sizes.
• An unbiased estimator is not necessarily the “best” estimator of a parameter.
• A parameter may have more than one unbiased estimator.
Measure of Reliability
• In statistical parlance, a statistic whose value does not
vary much from one sample to another is a reliable
estimator. (page 72)
• Our measure of reliability is the standard error of the
statistic. (page 452)
Example: Three Different Estimators that Behave Differently in Terms of Bias and Reliability
(Figure 13.1, page 452)
[Figure 13.1: three panels showing estimates plotted on an axis from 0 to 1000, with the true value of the parameter at 500. Panel titles: Estimator A: unbiased and reliable; Estimator B: unbiased but not reliable; Estimator C: reliable but biased.]
Most Efficient Estimator
(page 453)
Definition 13.4
An unbiased estimator of a parameter with the smallest variance among all
the other unbiased estimators is called the most efficient estimator.
• When sampling from the normal distribution, both the sample mean and the
sample median are unbiased estimators of µ. However, the sample mean has a
smaller standard error than the sample median. In fact, the sample mean is
the most efficient estimator when sampling from the normal distribution.
Confidence Interval Estimation
(page 455)

Recall: An interval estimator of the unknown value of the population parameter is a rule that tells us how to calculate two
numbers based on sample data that will form an interval within which we expect the population parameter to lie with a specified
degree of confidence.
The (1-α)100% Confidence Interval Estimate
Definition 13.5.
The fraction (1-α) in a (1-α)100% confidence interval estimate is called the confidence coefficient, and the endpoints are
called the lower and upper confidence limits. The length of the interval is defined as the difference between the upper and
lower confidence limits.

Note:
The researcher chooses the value of α. Naturally, we choose a value for α that is close to 0 so that we can be more
confident about our inference. Common choices for α are 0.10, 0.05, and 0.01. A smaller α means a higher confidence
coefficient.
Example:
α = 0.1 so that the confidence coefficient is (1-α) = 1 – 0.1 = 0.9.
If the 90% confidence interval estimate for θ is (2.2, 5.5) then the lower confidence limit is 2.2, the upper confidence limit is
5.5, and the length of the interval is: 5.5 – 2.2 = 3.3.
We say that we are 90% confident that θ will lie between 2.2 and 5.5. This statement does not mean P(2.2 < θ < 5.5)=0.9. The
probability that θ is between 2.2 and 5.5 is either 0 or 1, depending on the true value of θ. We cannot determine the
probability of this event unless we know the value of θ.
Interpretation of (1-α)100% Confidence Interval for θ
(page 455)
Let θ be the parameter of interest and let (T1, T2) be a (1-α)100% confidence interval estimator for θ. The
confidence coefficient (1-α) satisfies the condition that P(T1 < θ < T2) = 1-α.
This probability statement can be interpreted as follows:
“(1-α) is the probability of selecting a random sample whose computed interval estimate using the
estimator (T1, T2) contains the value of the population parameter, θ.”
If the sampling scheme used assigns the same chances of selection to all possible samples of size n then this
probability statement can also be interpreted using the classical or a priori definition as follows:
“If we consider all possible samples of the same size and a (1-α)100% confidence interval estimate is computed
from each sample using the interval estimator (T1, T2), then (1-α)100% of these intervals would include the true
population parameter, θ, somewhere within their interval, while 100α% of them would not.”
This probability statement can be interpreted using the relative frequency or a posteriori definition of
probability as follows:
“If we repeatedly take samples of the same size from the same population under the same conditions,
and a (1-α)100% confidence interval estimate is computed using the estimator (T1, T2) each time we
take a sample, then (1-α)100% of all the generated intervals would include the true population
parameter, θ, somewhere within their interval, while the remaining 100α% of them would not.”
Example: Repeated Sampling Using Excel
• Generate 500 samples of size n=15 from a normal distribution with mean µ=20 and
standard deviation σ=1 using the Random Number Generation tool of the Data Analysis
ToolPak of Excel.
  - Number of Variables=15
  - Number of Random Numbers=500
  - Distribution: Normal
  - Mean=20, Standard deviation=1
• Compute a 95% confidence interval estimate for µ for each sample using the formula
$\left( \bar{X} - 1.96\frac{1}{\sqrt{15}},\ \bar{X} + 1.96\frac{1}{\sqrt{15}} \right)$
• Determine if each interval estimate contains the true value of µ=20 using the IF function,
=IF(OR(lower limit>20, upper limit<20), 0, 1).
• Count how many among the 500 intervals contain the true value of µ=20 using the
SUM function.
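The same repeated-sampling experiment can be sketched in Python instead of Excel. This is only an illustration under assumptions: the seed value 2024 is arbitrary (it is not a seed from the course), and Python's `random.gauss` stands in for Excel's Random Number Generation tool, so the count will not match an Excel run.

```python
import math
import random

rng = random.Random(2024)   # hypothetical seed, chosen only for reproducibility
mu, sigma, n, reps = 20.0, 1.0, 15, 500
half_width = 1.96 * sigma / math.sqrt(n)   # 1.96 * sigma / sqrt(n)

covered = 0
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    lower, upper = x_bar - half_width, x_bar + half_width
    # Same logic as the IF formula: 0 if the interval misses mu, 1 otherwise.
    covered += 0 if (lower > mu or upper < mu) else 1

coverage = covered / reps
print(covered, coverage)
```

The proportion of intervals that contain µ=20 should be close to 0.95, though it will vary from one seed to another.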
Assignment 8
1. Generate 500 samples of size n=15 from a normal distribution with mean µ=10 and
standard deviation σ=2 using the seed number assigned to your group.
2. Compute the 95% confidence interval estimate for µ for each sample using the
formula:
$\left( \bar{X} - 1.96\frac{2}{\sqrt{15}},\ \bar{X} + 1.96\frac{2}{\sqrt{15}} \right)$
For your assignment, present only the first 5 confidence interval estimates.
3. Among the 500 computed interval estimates, what percentage contain the true
value of µ=10?
(1-α)100% Confidence Interval Estimators for µ
of a Normal Distribution
(Table 13.7, page 458)
Assume that we have a random sample (X1, X2, …, Xn) of size n taken from a normal population with mean, µ, and variance, σ².
This means that all the Xi's are independent random variables and are all normally distributed with the same mean and variance.
Cases | Confidence Interval Estimators
Case 1: σ² is known | $\left( \bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right)$
Case 2: σ² is unknown (even if the sample size n ≤ 30) | $\left( \bar{X} - t_{\alpha/2}(v=n-1)\frac{S}{\sqrt{n}},\ \bar{X} + t_{\alpha/2}(v=n-1)\frac{S}{\sqrt{n}} \right)$
Case 3: σ² is unknown (and the sample size n > 30) | $\left( \bar{X} - z_{\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{S}{\sqrt{n}} \right)$
The first two formulas were derived using the sampling distributions discussed in Chapter 11. (See the derivation of the first
formula on pages 479-480.)
Formula 3 is based on the fact that as the degrees of freedom approach infinity, the t-distribution approaches the
standard normal distribution. This formula is useful only when the values of t are not available for large n. In fact, most
statistical software will use formula 2 whenever σ² is unknown, since such software can determine the value
of t for any degrees of freedom.
Remarks on the Assumption of Normality
(page 483)

• We actually do not require that the Xi's in the random sample come from an exactly normal
distribution. These formulas will still provide good approximate (1-α)100% confidence intervals
for µ even if there are slight deviations from normality. Studies show that light-tailedness or
heavy-tailedness of the parent population will have little effect on the sampling distribution of the
Z and T statistics. We only have to be careful in using these formulas when we suspect that the
parent population is badly skewed and the sample size is small. In this case, the actual confidence
coefficient may be lower than what we have set and, as a result, we have a false sense of
confidence in our inference.
• All 3 formulas will still provide us with good approximate (1-α)100% confidence intervals for µ
even when the parent population is not normal (including those that are badly skewed and those
that are discrete), provided that the sample size is large, that is, n > 30. This result is attributed to
the Central Limit Theorem.
Examples
α | .10 | .05 | .025 | .01 | .005 | .001 | .0005 | .00005
$z_{\alpha}$ | 1.282 | 1.645 | 1.960 | 2.326 | 2.576 | 3.090 | 3.291 | 3.891
Example 13.11 to 13.15 (pages 459 – 462)
Exercise 2a (page 464). Laboratory tests of bacterial counts are often used for declaring a water source
“polluted”. Suppose that the distribution of bacterial counts in a sample taken from a certain lake is
normally distributed with a variance of 9,000,000. Suppose 25 water samples were taken over the
course of July 2004 and yielded a mean count of 12,000. Construct an 80% confidence interval
estimate of the unknown mean bacterial count in this lake at this time.
Parameter of interest: µ = mean bacterial count
Problem: Find an 80% confidence interval estimate for µ.
Given: (X1, X2, …, X25) is a random sample from a normal distribution with known σ²=9,000,000.
Its sample mean, $\bar{X}$, is equal to 12,000.
We’ll use the first formula,
$\left( \bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right)$
because σ² is known. Since the desired confidence coefficient is 80%, 1-α=0.8 so that α=0.2 and
α/2=0.1. Using Table B.1, page 604, z0.1=1.282.
Lower confidence limit: $\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 12{,}000 - 1.282\,\frac{\sqrt{9{,}000{,}000}}{\sqrt{25}} = 11{,}230.8$
Upper confidence limit: $\bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 12{,}000 + 1.282\,\frac{\sqrt{9{,}000{,}000}}{\sqrt{25}} = 12{,}769.2$
Example (cont’d)
Our 80% confidence interval estimate is (11230.8, 12769.2).
The length of this interval is 12769.2-11230.8=1538.4.
Using the same data, let us compute a 90% confidence interval estimate for µ. This time 1-α=0.9 so that α=0.1 and α/2=0.05. Using Table B.1, page 604, z0.05=1.645.
Lower confidence limit: $\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 12{,}000 - 1.645\,\frac{\sqrt{9{,}000{,}000}}{\sqrt{25}} = 11{,}013$
Upper confidence limit: $\bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 12{,}000 + 1.645\,\frac{\sqrt{9{,}000{,}000}}{\sqrt{25}} = 12{,}987$
Our 90% confidence interval estimate is (11013, 12987).
The length of this interval is 12987-11013=1974.
We are more confident of our inference but at the expense of a longer interval.
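The bacterial-count intervals can be checked with a short Python sketch. The helper name `z_interval` is made up for illustration. It takes z from `statistics.NormalDist` rather than from Table B.1, so the limits differ slightly from the hand computation, which uses z rounded to three decimals.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma2, n, conf):
    """(lower, upper) limits for the mean of a normal population with known variance."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    e = z * math.sqrt(sigma2 / n)             # z_{alpha/2} * sigma / sqrt(n)
    return x_bar - e, x_bar + e

ci80 = z_interval(12_000, 9_000_000, 25, 0.80)
ci90 = z_interval(12_000, 9_000_000, 25, 0.90)

print([round(v, 1) for v in ci80], round(ci80[1] - ci80[0], 1))
print([round(v, 1) for v in ci90], round(ci90[1] - ci90[0], 1))
```

Raising the confidence coefficient from 80% to 90% with the same data widens the interval, as in the worked example.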
General Remarks on the Length of the Interval Estimate
(page 463)
• We assess the “goodness” of our interval estimate by checking the confidence
coefficient, together with the length of the interval. A good confidence interval
estimate is one that is as narrow as possible and has a large confidence coefficient. The
narrower the interval we have created, the more exactly we have located the
parameter. Meanwhile, the larger the confidence coefficient is, possibly near 1, the more
confidence we have that a particular interval encloses the true value of the parameter.
• For a fixed sample size n, as the confidence coefficient increases, the length of the
interval also increases. Thus, for a fixed sample size, the trade-off of having high
confidence in our interval estimate is a longer or wider interval. Similarly, for a fixed
sample size, the trade-off of having a narrow interval is having lower confidence in
our interval estimate.
Length of the Interval Estimate for µ
(page 463)
Length of interval estimate = $2 z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ when σ is known, and $2 t_{\alpha/2}(v=n-1)\frac{s}{\sqrt{n}}$ when σ is unknown.
• For a fixed n, as the confidence coefficient (1-α) increases, the length of the interval increases.
The reason for the increase in the length of the interval estimate for µ when the confidence
coefficient, 1-α, increases is the corresponding increase in the values of $z_{\alpha/2}$ and $t_{\alpha/2}(v=n-1)$ as α
decreases.
• For a fixed confidence coefficient (1-α), as n increases, the length of the interval decreases. In
fact, as n goes to infinity, the length of the interval goes to 0. This is because the values of both
the standard error and its estimator, $\sigma/\sqrt{n}$ and $S/\sqrt{n}$, respectively, approach 0 as n goes to infinity.
• If we are not satisfied with the resultant length of the interval at the desired confidence
coefficient, then we can improve on our estimates in the future by using a larger sample size.
Increasing the sample size will reduce the standard error, and consequently, reduce the length of
the interval.
• When the elements are homogeneous with respect to the characteristic under study, that is, σ is
small, then we do not need a very large sample size in order to attain a short interval.
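The two effects on the interval length can be illustrated numerically for the known-σ case. In this sketch, σ=3000 is borrowed from the bacterial-count example, while the sample sizes 100 and 400, the 99% level, and the function name `z_length` are assumptions chosen only for illustration.

```python
import math
from statistics import NormalDist

def z_length(sigma, n, conf):
    """Length 2 * z_{alpha/2} * sigma / sqrt(n) of the known-sigma interval."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return 2 * z * sigma / math.sqrt(n)

# Fixed confidence coefficient (90%), increasing n: the length shrinks.
lengths_by_n = {n: z_length(3000, n, 0.90) for n in (25, 100, 400)}

# Fixed n = 25, increasing confidence coefficient: the length grows.
lengths_by_conf = {c: z_length(3000, 25, c) for c in (0.80, 0.90, 0.99)}

print({n: round(v, 1) for n, v in lengths_by_n.items()})
print({c: round(v, 1) for c, v in lengths_by_conf.items()})
```

Because the length is proportional to 1/√n, quadrupling the sample size halves the interval length.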
Exercise 1
(page 463)
TRUE or FALSE.
a.) For a given sample variance and sample mean, a 90% confidence interval for an unknown mean is
narrower than a 99% confidence interval.
b.) Consider the construction of a 95% confidence interval. Suppose one repeats the same sampling
process indefinitely. Suppose further that, for each sample drawn, a new 95% confidence interval
calculation is performed. If for each sample, the investigator claims that the parameter is
contained in the interval, about 95% of his statements will be correct.
More examples
Let us use the following data in Exercise 2, page 204 on the price of brown sugar per kilo based on a sample of 8
grocery stores:
20.50 21.25 19.95 22.50 20.00 22.75 23.50 21.75
Assuming that the price of brown sugar is normally distributed and that we have a random sample, compute a
95% confidence interval estimate for µ=mean price of brown sugar.
Because σ² is unknown, we will use formula 2,
$\left( \bar{X} - t_{\alpha/2}(v=n-1)\frac{S}{\sqrt{n}},\ \bar{X} + t_{\alpha/2}(v=n-1)\frac{S}{\sqrt{n}} \right)$
Using the sample data, we get $\bar{X} = 21.525$ and S=1.32745729.
Since the confidence coefficient is 95%, 1-α=0.95 so that α=0.05 and α/2=0.025. The degrees of freedom is
v=n-1=7.
Referring to Table B.2, page 605, t0.025(v=7)=2.365.
Lower confidence limit:
$\bar{X} - t_{\alpha/2}(v=n-1)\frac{S}{\sqrt{n}} = 21.525 - 2.365\,\frac{1.32745729}{\sqrt{8}} = 20.415.$
Upper confidence limit:
$\bar{X} + t_{\alpha/2}(v=n-1)\frac{S}{\sqrt{n}} = 21.525 + 2.365\,\frac{1.32745729}{\sqrt{8}} = 22.635.$
We are 95% confident that µ lies in the interval (20.415, 22.635).
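As a check on the arithmetic, the brown sugar interval can be recomputed in Python. To stay within the standard library, this sketch hard-codes the tabulated value t0.025(v=7)=2.365 quoted in the example instead of computing it; the variable names are illustrative.

```python
import math
import statistics

prices = [20.50, 21.25, 19.95, 22.50, 20.00, 22.75, 23.50, 21.75]
t_crit = 2.365   # t_{0.025}(v = 7), the table value quoted in the example

n = len(prices)
x_bar = statistics.mean(prices)       # sample mean
s = statistics.stdev(prices)          # sample standard deviation (divisor n - 1)
e = t_crit * s / math.sqrt(n)         # amount added to / subtracted from the mean
lower, upper = x_bar - e, x_bar + e

print(round(x_bar, 3), round(s, 8), round(lower, 3), round(upper, 3))
```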
Assignment 9
Compute for the requested confidence interval estimates. Always present the formula used
to compute for the interval estimates with the appropriate values plugged-in. No
immediate rounding. Round-off final answer only to 2 decimal places.
1. A coin-operated soda machine was designed to discharge, on the average, 7 ounces of
beverage per cup. To test the machine, a random sample of 15 cupfuls of soda was
drawn from the machine and measured. The results were as follows:
6.95 7.00 6.99 6.92 6.88
6.98 7.02 7.00 6.99 7.10
7.01 6.96 7.04 7.00 7.05
Assuming that the amount of beverage dispensed is normally distributed, compute a
99% confidence interval estimate for µ=mean amount of soda dispensed (in ounces).
2. A random sample of 500 elementary school children was selected. Each student in the
sample was given a reading comprehension test. The sample mean and sample variance
were computed to be 64 and 52, respectively. Compute a 90% confidence interval
estimate for µ=mean score in the reading comprehension test of all elementary school
children in the population.
Confidence Interval for the Proportion
(page 465)
If the population proportion is not expected to be too close to 0 or 1 and the sample size n is
large, then an approximate (1-α)100% confidence interval estimator for the population
proportion, p, is given by:
$\left( \hat{P} - z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}},\ \hat{P} + z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}} \right)$
where $z_{\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the standard normal distribution.
Notes:
(i) The sample size n must be large because this formula is based on the Central Limit Theorem,
where the population proportion is viewed as the mean of the Xi's, where
Xi = 1 if the ith element possesses the characteristic of interest
Xi = 0 if the ith element does not possess the characteristic of interest
Clearly, Xi is discrete and cannot come from a normal distribution.
(ii) As the sample size n goes to infinity, the length of the interval goes to 0.
Examples
Example 13.18 (page 468)
Exercise 1 (page 468) According to a 1984 American study, about one in three individuals feels shopping is an
unpleasant experience (Journal of Marketing Research, February/March 1984). Suppose we take a national sample of
4,100 Filipino male and female adults, and we determine each respondent’s opinion on the pleasantness of shopping.
The survey produced the following results:
 | Males | Females
Sample Size | 2,015 | 2,085
Number who think shopping is an unpleasant experience | 850 | 570
a) Compute a 95% confidence interval for the proportion of males in the sample who think shopping is an unpleasant
experience.
We use the formula
$\left( \hat{P} - z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}},\ \hat{P} + z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}} \right)$
$\hat{P} = \frac{850}{2015}$ = point estimate for p
$\sqrt{\frac{\hat{P}(1-\hat{P})}{n}} = \sqrt{\frac{\left(\frac{850}{2015}\right)\left(\frac{1165}{2015}\right)}{2015}} = .011$ = estimate for the standard error
Confidence coefficient=95% means that (1-α)=0.95 so that α=0.05 and α/2=0.025 and z.025=1.96.
Lower confidence limit: $\hat{P} - z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}} = \frac{850}{2015} - (1.96)(.011) = .400$
Upper confidence limit: $\hat{P} + z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}} = \frac{850}{2015} + (1.96)(.011) = .443$
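The interval for the males can be recomputed with a few lines of Python. This sketch uses the counts from the table and z.025=1.96 as given in the example; the variable names are illustrative.

```python
import math

successes, n = 850, 2015   # males who find shopping unpleasant, sample size
z = 1.96                   # z_{0.025}, as used in the example

p_hat = successes / n                       # point estimate for p
se = math.sqrt(p_hat * (1 - p_hat) / n)     # estimated standard error
lower, upper = p_hat - z * se, p_hat + z * se

print(round(p_hat, 4), round(se, 3), round(lower, 3), round(upper, 3))
```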
Margin of Error
(page 458)
Definition 13.6
The margin of error, denoted by e, is the upper bound on the absolute difference between the estimator
and the parameter, called the error of estimation, though there is an associated risk α of selecting a
sample that yields an estimate whose error of estimation is greater than this upper bound, e.
By Definition 13.6, when we use $\bar{X}$ to estimate µ, the margin of error, e, satisfies the condition:
$P(|\bar{X} - \mu| > e) = \alpha$ or, equivalently, $P(|\bar{X} - \mu| \le e) = 1 - \alpha$.
$P\left( \bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right) = 1 - \alpha$ means that $e = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$.
Likewise, when we use $\hat{P}$ to estimate p, the margin of error, e, satisfies the condition: $P(|\hat{P} - p| \le e) = 1 - \alpha$.
$P\left( \hat{P} - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \le p \le \hat{P} + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \right) = 1 - \alpha$ means that $e = z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$.
Remarks About the Margin of Error
• We can use the margin of error to describe the “goodness” of the point estimate. The
smaller the margin of error, the better.
• The margin of error usually involves unknown parameters. In this case, we use sample
data to estimate these unknown parameters.
• In estimating the mean or proportion, the margin of error is the term that we
add/subtract to the point estimate to compute the (1-α)100% confidence interval
estimate. It is a function of 2 factors: a) the standard error and b) the risk α.
• Example: The researchers’ estimate for the percentage of voters who will elect Person
A is 60% with a margin of error of 3 percentage points. (In most studies, when the
risk α is not reported then it is understood that α=0.05.) This means that the chance
is as small as 0.05 that they have selected a sample where the estimated percentage is
beyond 3 percentage points away from the true value. That is,
$P(|\hat{P} - p| > 0.03) = 0.05$ or $P(|\hat{P} - p| \le 0.03) = P(\hat{P} - 0.03 \le p \le \hat{P} + 0.03) = 0.95$.
Again, this is NOT the same as saying P(0.57 ≤ p ≤ 0.63) = 0.95.
Confidence Interval Estimate for σ²
(pages 470 and 472)
A (1-α)100% confidence interval estimator for the population variance σ² is given by:
$\left( \frac{(n-1)S^2}{\chi^2_{\alpha/2}(v=n-1)},\ \frac{(n-1)S^2}{\chi^2_{1-\alpha/2}(v=n-1)} \right)$
where $\chi^2_{\alpha/2}(v=n-1)$ and $\chi^2_{1-\alpha/2}(v=n-1)$ are the $100(1-\alpha/2)$th and $100(\alpha/2)$th percentiles of the chi-square
distribution with v=n-1 degrees of freedom, respectively.
Taking the square roots of the two limits gives the corresponding interval for the population standard deviation σ.
Notes:
(i) The chi-square based inference about σ² depends on the assumption that the random sample is selected from a population that
has a normal distribution. If the distribution of the population is distinctly nonnormal, then the formula for the confidence
interval estimate of σ² (and of σ) is not appropriate even if the sample size is large. Nonnormality, in the form of skewness or
heavy tails, can have serious effects on the level of confidence in estimating σ².
(ii) As the sample size n approaches infinity, the length of the interval goes to 0.
Examples
Examples 13.19 to 13.21. (pages 470-472)
Exercise 2 (page 472). A mortgage is a type of loan that is secured by a designated piece of
property. If the borrower defaults on the loan, the lender can sell the property to recover the
outstanding debt. A federal bank examiner is interested in estimating the mean and standard
deviation of outstanding principal balance of all home mortgages foreclosed by the bank due to
default by the borrower during the last 3 years. A random sample of 12 foreclosed mortgages
yielded the following data (in dollars):
95,982 59,200 81,422 62,331 39,888 105,812
46,836 55,545 66,899 56,635 69,110 72,123
Find a 90% confidence interval for the standard deviation of interest. Are there any distributional
assumptions that we have to make to compute the confidence interval estimate for the standard
deviation?
$\chi^2_{.05}(v=11) = 19.675$, $\chi^2_{0.95}(v=11) = 4.575$
$\left( \sqrt{\frac{(n-1)S^2}{\chi^2_{\alpha/2}(v=n-1)}},\ \sqrt{\frac{(n-1)S^2}{\chi^2_{1-\alpha/2}(v=n-1)}} \right) = (14{,}334.75,\ 29{,}727.06)$
Note: Computing simultaneous confidence regions for µ and σ² using the same sample data is not as simple as
computing the separate CI estimates using the formulas we have presented, because the resulting
confidence coefficient is not (1-α) or (1-α)² since the two events are not independent.
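The mortgage interval can be verified with a short Python sketch. The chi-square percentiles are hard-coded from the values quoted in the example (19.675 and 4.575 for v=11), since the standard library has no chi-square quantile function; the variable names are illustrative.

```python
import math
import statistics

balances = [95_982, 59_200, 81_422, 62_331, 39_888, 105_812,
            46_836, 55_545, 66_899, 56_635, 69_110, 72_123]
chi2_upper = 19.675   # chi-square_{0.05}(v = 11), quoted in the example
chi2_lower = 4.575    # chi-square_{0.95}(v = 11), quoted in the example

n = len(balances)
s2 = statistics.variance(balances)   # sample variance S^2 (divisor n - 1)

# 90% interval for the variance, then square roots for the standard deviation.
var_ci = ((n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower)
sd_ci = tuple(math.sqrt(v) for v in var_ci)

print([round(v, 2) for v in sd_ci])
```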
Assignment 10
1. (MGB, page 400) Suppose that 200 heads and 300 tails resulted
from 500 tosses of a coin. Compute an approximate 99%
confidence interval estimate for the probability of a head. Based
on your computed estimate, does it appear that the coin is not
fair? (Notes: (i) the probability of observing a head is the same as
the true proportion of tosses where a head comes up; and (ii) for
a fair coin, the probability of observing a head is 1/2.)
2. Carlton Sign Company wanted to know the variance of the life of
the light fixtures it uses in its signs. It selected a random sample
of 25 signs and learned that the fixtures in the sample lasted an
average of 9,500 hours with a standard deviation of 81 hours.
Assuming that the life of a light fixture is
normally distributed, compute a 90% confidence interval
estimate for the population variance.
Using PhStat
Choose Add-in then select PhStat.
For point and interval estimation: Choose Confidence Intervals then click appropriate action.
To estimate proportion:
• Encode data as follows:
1 – element possesses characteristic of interest
0 – element does not possess characteristic of interest
• Compute the number of successes using =SUM(cells containing dataset)
Examples:
1. Point estimate for mean and proportion
2. CI estimate for the mean with known variance
3. CI estimate for the mean with unknown variance
4. CI estimate for proportion
5. CI estimate for the variance