Chapter 4 ESTIMATION: TWO POPULATIONS

Parameters of Interest
(pages 482, 487, 489)
1. Objective: compare the means of two populations
   Population 1: mean $\mu_X$
   Population 2: mean $\mu_Y$
   Parameter of interest: $\mu_X - \mu_Y$
   Interpretation:
   $\mu_X - \mu_Y = 0$: means of the two populations are equal
   $\mu_X - \mu_Y > 0$: mean of measurements in Population 1 is larger than mean of Population 2
   $\mu_X - \mu_Y < 0$: mean of measurements in Population 1 is smaller than mean of Population 2
2. Objective: compare the proportions of two populations
   Population 1: proportion $p_1$
   Population 2: proportion $p_2$
   Parameter of interest: $p_1 - p_2$
   Interpretation:
   $p_1 - p_2 = 0$: proportions of the two populations are equal
   $p_1 - p_2 > 0$: proportion of elements possessing the characteristic of interest is larger in Population 1 than in Population 2
   $p_1 - p_2 < 0$: proportion of elements possessing the characteristic of interest is smaller in Population 1 than in Population 2
3. Objective: compare the variances of two populations
   Population 1: variance $\sigma_X^2$
   Population 2: variance $\sigma_Y^2$
   Parameter of interest: $\sigma_X^2 / \sigma_Y^2$
   Interpretation:
   $\sigma_X^2 / \sigma_Y^2 = 1$: variances of the two populations are equal
   $\sigma_X^2 / \sigma_Y^2 > 1$: measures in Population 1 are more varied than measures in Population 2
   $\sigma_X^2 / \sigma_Y^2 < 1$: measures in Population 1 are less varied than measures in Population 2
Note: Variances are comparable when the means of the two populations are not too different from each other.
Two Approaches to Sampling
1. Select two independent samples
2. Select two related samples (matched samples)
Independent Sampling from Two Populations
(pages 482-483)
In independent sampling, the selection of the random sample from one population does not affect
the selection of the random sample from the other population.
Example: (Exercise 3d, page 485) The principal of a school wishes to determine if the Grade 6
boys are better in mathematics than the Grade 6 girls. A random sample of boys was selected.
Then a random sample of girls was selected. All of the students in the two random samples
were asked to take a standardized test in mathematics and their scores were determined.
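Population 1 (mean $\mu_X$, variance $\sigma_X^2$) -> Sample 1 of size n1: (X1, X2, …, Xn1) -> $\bar{X}$, $S_X^2$
Population 2 (mean $\mu_Y$, variance $\sigma_Y^2$) -> Sample 2 of size n2: (Y1, Y2, …, Yn2) -> $\bar{Y}$, $S_Y^2$
Use these samples to infer on $\mu_X - \mu_Y$.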
Matched Sampling/Paired Sampling
(pages 483-484)
Recall:
An experiment is a data collection method where the researcher
intervenes by controlling the conditions that may affect the response
variable by: (i) using a randomization mechanism in assigning the
treatments and (ii) controlling the identified extraneous variables. By
doing so, the researcher can isolate the effects of the explanatory variable
on the response variable and clarify the direction and strength of their
relationship. In many experiments, the available experimental units may
differ considerably with respect to extraneous variables. Failure to
control these differences may mask the true difference between the
population means of the response variable for the two treatments or,
even worse, create an illusion of a difference between means when there
is actually none.

In matched sampling, the elements of the samples drawn from the two
populations are carefully matched in pairs so that the two elements in
each pair are as similar as possible with respect to identified extraneous
variables. The observations in each pair being compared are therefore
related or associated by design.
Methods of Generating Paired Data
(page 484)
Paired data: {(X1,Y1), (X2,Y2), …, (Xn,Yn)}
Forming paired data is beneficial when the two measures in the ith pair, Xi and Yi, exhibit a strong
direct relationship, so that when Xi is high then so is Yi, as a result of sharing the same values of the
extraneous variable/s.
Method 1: all experimental units in the sample receive both treatments
Example: (Exercise 3c, page 485) Two formulations of a new whitening soap are to be compared as to
their whitening effect. A random sample of 40 potential users of the soap is selected. Each
person uses a randomization mechanism to determine which formulation is applied on the
left arm, so that the other formulation is applied on the right arm. After two weeks, they
measured the effect of each formulation.
Experimental unit: person
Response variable: degree of fairness of a person
2 Treatments: Formulation A and Formulation B of whitening soap
Xi = degree of fairness of arm of ith person where Formulation A was applied
Yi = degree of fairness of arm of ith person where Formulation B was applied
Parameter of interest: $\mu_X - \mu_Y$
Extraneous variables: original degree of fairness of person, biological characteristics that affect
person’s reaction to any treatment
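The randomization step in this example can be sketched in a few lines of Python. This is only an illustration under assumed names (`assign_arms` and the dictionary layout are hypothetical, not part of the exercise): for each person, one arm is chosen at random for Formulation A and the other arm receives Formulation B.

```python
import random

def assign_arms(n_people, seed=None):
    """Randomly decide, per person, which arm gets Formulation A.

    The other arm automatically gets Formulation B, so every person
    receives both treatments (Method 1).
    """
    rng = random.Random(seed)
    assignments = []
    for person in range(1, n_people + 1):
        left = rng.choice(["A", "B"])
        right = "B" if left == "A" else "A"
        assignments.append({"person": person, "left": left, "right": right})
    return assignments

# 40 potential users, as in the example; the seed only makes the sketch reproducible.
plan = assign_arms(40, seed=1)
print(plan[:3])
```

Because the choice is made independently for each person, any systematic left-arm versus right-arm difference is not tied to one formulation.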
Method 2: taking measurements before and after the treatment is applied to the
experimental units (can be viewed as a special case of Method 1)
Example: (Exercise 3a, page 485) A police department wants to assess the
effects of an obvious radar trap on the speeds of cars. Ten cars are
randomly selected on a highway, and their speeds are measured just
before a radar trap comes into view and right after they pass the
obvious radar trap.
Experimental unit: car
Response variable: speed of car
Treatment: visible radar trap
Xi = speed of ith car before seeing radar trap
Yi = speed of ith car after seeing radar trap
Parameter of interest: $\mu_X - \mu_Y$
Extraneous variables: driver, type of car, age of car, etc.
Method 3: use naturally occurring pairs such as twins, or husbands and wives, or siblings, etc.
Method 4: form pairs of experimental units that have the same values or levels of the extraneous variable
Example: (Exercise 3e, page 485) A science teacher has developed new teaching materials and wants
to evaluate the effectiveness of these materials in improving the students’
comprehension. Prior to sampling, the teacher formed pairs of students so that
students belonging to the same pair received about the same final grade in science the
previous term. The teacher then selected a sample of pairs of students. The teacher
randomly selected which student in each pair would be taught using the new materials, so
that the other one would be taught using the old materials. At the end of the term, all the
students in the sample were given a standardized test.
Experimental unit: student
Response variable: score in standardized test
2 Treatments: old teaching materials, new teaching materials
Xi = score of ith student taught using the old teaching materials
Yi = score of ith student taught using the new teaching materials
Parameter of interest: $\mu_X - \mu_Y$
Extraneous variables: aptitude in science
Point Estimation
(pages 486-490)
Parameter                     Point Estimator
$\mu_X - \mu_Y$               $\bar{X} - \bar{Y}$
$p_1 - p_2$                   $\hat{P}_1 - \hat{P}_2$
$\sigma_X^2 / \sigma_Y^2$     $S_X^2 / S_Y^2$
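As a small illustration of the three point estimators, the sketch below computes each one in Python. All data values and counts are hypothetical and chosen only for the example; `statistics.variance` uses the n - 1 divisor, matching $S_X^2$ and $S_Y^2$.

```python
from statistics import mean, variance

# Hypothetical measurements from two independent samples (illustration only).
x = [12.1, 11.4, 13.0, 12.6, 11.9]
y = [10.8, 11.1, 10.2, 11.5]

diff_means = mean(x) - mean(y)         # point estimate of mu_X - mu_Y
var_ratio = variance(x) / variance(y)  # point estimate of sigma_X^2 / sigma_Y^2

# Hypothetical counts of elements possessing the characteristic of interest.
p1_hat = 45 / 150
p2_hat = 30 / 120
diff_props = p1_hat - p2_hat           # point estimate of p1 - p2

print(diff_means, diff_props, var_ratio)
```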
Confidence Interval Estimators for µX - µY Based on 2 Independent Samples
(page 495)
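Cases and Confidence Interval Estimators

Case 1: $\sigma_X^2$ and $\sigma_Y^2$ are known
$$\left( (\bar{X}-\bar{Y}) - z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n_1}+\frac{\sigma_Y^2}{n_2}},\ \ (\bar{X}-\bar{Y}) + z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n_1}+\frac{\sigma_Y^2}{n_2}} \right)$$

Case 2: $\sigma_X^2$ and $\sigma_Y^2$ are unknown but $\sigma_X^2 = \sigma_Y^2$
$$\left( (\bar{X}-\bar{Y}) - t_{\alpha/2}(v=n_1+n_2-2)\sqrt{S_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)},\ \ (\bar{X}-\bar{Y}) + t_{\alpha/2}(v=n_1+n_2-2)\sqrt{S_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)} \right)$$
where
$$S_p^2 = \frac{(n_1-1)S_X^2 + (n_2-1)S_Y^2}{n_1+n_2-2}$$

Case 3: $\sigma_X^2$ and $\sigma_Y^2$ are unknown but $\sigma_X^2 \neq \sigma_Y^2$
$$\left( (\bar{X}-\bar{Y}) - t_{\alpha/2}(v)\sqrt{\frac{S_X^2}{n_1}+\frac{S_Y^2}{n_2}},\ \ (\bar{X}-\bar{Y}) + t_{\alpha/2}(v)\sqrt{\frac{S_X^2}{n_1}+\frac{S_Y^2}{n_2}} \right)$$
where
$$v = \frac{\left(\dfrac{s_X^2}{n_1}+\dfrac{s_Y^2}{n_2}\right)^2}{\dfrac{\left(s_X^2/n_1\right)^2}{n_1-1}+\dfrac{\left(s_Y^2/n_2\right)^2}{n_2-1}}$$

Case 4: $\sigma_X^2$ and $\sigma_Y^2$ are unknown, but $n_1 > 30$ and $n_2 > 30$
$$\left( (\bar{X}-\bar{Y}) - z_{\alpha/2}\sqrt{\frac{S_X^2}{n_1}+\frac{S_Y^2}{n_2}},\ \ (\bar{X}-\bar{Y}) + z_{\alpha/2}\sqrt{\frac{S_X^2}{n_1}+\frac{S_Y^2}{n_2}} \right)$$

Assumptions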
Assumptions
(pages 491-495)
- All formulas were derived under the assumption that the two independent random samples come from normal
  distributions. These procedures are robust in the sense that they will still provide good approximate
  $(1-\alpha)100\%$ confidence interval estimates even if there are slight deviations from the assumption of normality.
- Because of the Central Limit Theorem, the assumption of normality can be dropped as long as both sample
  sizes are greater than 30.
- Formula 2 was derived under the additional assumption that the two unknown variances are equal to each
  other. However, the procedure is also robust in the sense that it will still provide good approximate
  $(1-\alpha)100\%$ confidence interval estimates even if the two population variances are not equal, so long
  as the sample sizes are equal to each other. This is one of the reasons why we consider using equal sample
  sizes when we design our experiment.
- Formula 3 adjusts the degrees of freedom (downwards). The result of this is a longer interval estimate.
  Formula 3 also does not pool the information from the two samples to estimate a common variance, since
  the variances of the two populations are not assumed to be equal. However, these two adjustments become negligible
  when both sample sizes are large.
- The degrees of freedom in Formula 3 is a computed value based on the sample sizes and sample variances, so
  the resulting value will not always be an integer. Since our table presents the values for integral degrees of
  freedom only, we have to round off the computed value. We will take the more conservative
  approach of always rounding down instead of using the standard rules of rounding.
- Formula 4 is relevant only when we cannot get the t-value from the t-table because the degrees of freedom is
  very large. Again, we just replace t by z because as the degrees of freedom approaches infinity, the
  t-distribution approaches the standard normal distribution.
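Two of the remarks above can be checked numerically. The sketch below is an illustration with hypothetical sample variances and hypothetical function names, not a routine from the textbook. It shows that with equal sample sizes the pooled standard error (Formula 2) and the unpooled standard error (Formula 3) coincide, so only the degrees of freedom differ, and that the computed degrees of freedom for Formula 3 is rounded down.

```python
import math

def pooled_se(s2x, s2y, n1, n2):
    """Standard error used in Formula 2 (pooled variance)."""
    sp2 = ((n1 - 1) * s2x + (n2 - 1) * s2y) / (n1 + n2 - 2)
    return math.sqrt(sp2 * (1 / n1 + 1 / n2))

def unpooled_se(s2x, s2y, n1, n2):
    """Standard error used in Formulas 3 and 4 (no pooling)."""
    return math.sqrt(s2x / n1 + s2y / n2)

def formula3_df(s2x, s2y, n1, n2):
    """Computed degrees of freedom for Formula 3, always rounded down."""
    a, b = s2x / n1, s2y / n2
    v = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return math.floor(v)

# Hypothetical sample variances with equal sample sizes n1 = n2 = 12.
s2x, s2y, n = 4.0, 9.0, 12
print(pooled_se(s2x, s2y, n, n), unpooled_se(s2x, s2y, n, n))  # the two agree
print(formula3_df(s2x, s2y, n, n), "versus pooled df", 2 * n - 2)
```

Here the adjusted degrees of freedom is smaller than n1 + n2 - 2, which gives a larger tabular t-value and hence the longer interval described above.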
Flowchart
(page 496)
Note: We still need to satisfy the assumption of normality for the two populations (or at least
approximate normality) when at least one of the sample sizes is less than 30.
Interpretation
- If the computed interval estimate contains 0, then we do not have sufficient evidence to conclude
  that the two means are different from each other.
- If the computed interval estimate does not contain 0, then we can conclude with a $(1-\alpha)100\%$
  degree of confidence that the two means are different from each other.
- If the computed interval estimate contains positive values only, then we are highly confident that
  $\mu_X$ is greater than $\mu_Y$.
- If the computed interval estimate contains negative values only, then we are highly confident that
  $\mu_X$ is less than $\mu_Y$.
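These interpretation rules amount to a simple check on the endpoints of the interval. A minimal sketch in Python (the function name and the returned wording are hypothetical):

```python
def interpret_interval(lower, upper):
    """Interpret a confidence interval estimate (lower, upper) for mu_X - mu_Y."""
    if lower > 0:
        # Interval contains positive values only.
        return "mu_X is greater than mu_Y"
    if upper < 0:
        # Interval contains negative values only.
        return "mu_X is less than mu_Y"
    # Interval contains 0.
    return "insufficient evidence that the means differ"

print(interpret_interval(0.37, 0.71))
print(interpret_interval(-1.2, 0.8))
```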
Examples
Examples 14.7 and 14.8. (pages 496 – 498)
Exercise 1 for Section 14.4 (page 500). Suppose that company officials were concerned about the length of
time a particular drug retained its potency. A random sample of n1 = 20 bottles of the drug was drawn from
the production line and analyzed for potency. A second sample of n2 = 25 bottles was drawn and stored in
a regulated environment for a period of one year. The readings obtained are summarized below.
Sample 1: $\bar{X}$ = 10.37, $S_X$ = 0.3234
Sample 2: $\bar{Y}$ = 9.83, $S_Y$ = 0.2406
Estimate the difference between the mean potency of all bottles coming off the production line and the mean
potency of all bottles retained for a period of one year using a 95% confidence interval, assuming (i) the
population variances are equal and (ii) the population variances are unequal.
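Assuming normality and equal variances:
$$\bar{X} - \bar{Y} = 10.37 - 9.83 = 0.54$$
$$S_p^2 = \frac{(n_1-1)S_X^2 + (n_2-1)S_Y^2}{n_1+n_2-2} = \frac{(20-1)(0.3234)^2 + (25-1)(0.2406)^2}{20+25-2} = 0.078523$$
$$t_{\alpha/2}(v=n_1+n_2-2) = t_{0.05/2}(v=20+25-2) = t_{0.025}(v=43) = 2.017$$
$$\sqrt{S_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)} = \sqrt{0.078523\left(\frac{1}{20}+\frac{1}{25}\right)} = 0.084066$$
$$(\bar{X}-\bar{Y}) \pm t_{\alpha/2}(v=n_1+n_2-2)\sqrt{S_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}:\quad 0.54 \pm (2.017)(0.084066) = (0.37,\ 0.71)$$

Assuming normality but unequal variances:
$$v = \frac{\left(\dfrac{s_X^2}{n_1}+\dfrac{s_Y^2}{n_2}\right)^2}{\dfrac{\left(s_X^2/n_1\right)^2}{n_1-1}+\dfrac{\left(s_Y^2/n_2\right)^2}{n_2-1}} = \frac{\left(\dfrac{(0.3234)^2}{20}+\dfrac{(0.2406)^2}{25}\right)^2}{\dfrac{\left((0.3234)^2/20\right)^2}{20-1}+\dfrac{\left((0.2406)^2/25\right)^2}{25-1}} = 34.237$$
Rounding down, v = 34, so $t_{0.025}(v=34) = 2.032$.
$$\sqrt{\frac{S_X^2}{n_1}+\frac{S_Y^2}{n_2}} = \sqrt{\frac{0.3234^2}{20}+\frac{0.2406^2}{25}} = 0.086861455$$
$$(\bar{X}-\bar{Y}) \pm t_{\alpha/2}(v)\sqrt{\frac{S_X^2}{n_1}+\frac{S_Y^2}{n_2}}:\quad 0.54 \pm (2.032)(0.08686) = (0.36,\ 0.72)$$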
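The arithmetic of this exercise can be reproduced with a short Python sketch. The helper name `two_sample_ci` is hypothetical; the tabular values 2.017 and 2.032 are the t-values quoted in the solution, taken from the t-table rather than computed.

```python
import math

def two_sample_ci(xbar, ybar, se, crit):
    """Return (lower, upper) = (xbar - ybar) -/+ crit * se."""
    d = xbar - ybar
    return d - crit * se, d + crit * se

# Summary statistics from the exercise.
xbar, sx, n1 = 10.37, 0.3234, 20
ybar, sy, n2 = 9.83, 0.2406, 25

# (i) Equal variances: pooled variance and v = n1 + n2 - 2 = 43.
sp2 = ((n1 - 1) * sx**2 + (n2 - 1) * sy**2) / (n1 + n2 - 2)
se_pooled = math.sqrt(sp2 * (1 / n1 + 1 / n2))
ci_pooled = two_sample_ci(xbar, ybar, se_pooled, crit=2.017)  # t_0.025(43)

# (ii) Unequal variances: unpooled standard error, computed v rounded down.
a, b = sx**2 / n1, sy**2 / n2
se_unpooled = math.sqrt(a + b)
v = math.floor((a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1)))
ci_unpooled = two_sample_ci(xbar, ybar, se_unpooled, crit=2.032)  # t_0.025(34)

print(round(sp2, 6), [round(e, 2) for e in ci_pooled])
print(v, [round(e, 2) for e in ci_unpooled])
```

Both intervals lie entirely above 0, so under either assumption the mean potency off the production line is estimated to be higher than the mean potency after one year of storage.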
Preliminaries on Inference on µX - µY Based on 2 Related Samples
(page 498)
Sample data: {(X1,Y1), (X2,Y2), …, (Xn,Yn)}
Define: $D_i = X_i - Y_i$, i = 1, 2, …, n (Note: the $D_i$'s are all random variables.)
Assumptions: (D1, D2, …, Dn) is a random sample, with $D_i \sim \text{Normal}(\mu_D, \sigma_D^2)$
Following the same procedure to estimate the population mean based on a random sample
from a normal distribution:
1. the point estimator for the mean $\mu_D$ is $\bar{D} = \dfrac{\sum_{i=1}^{n} D_i}{n}$, the sample mean of the $D_i$'s
2. the standard error of $\bar{D}$ is $\sigma_D / \sqrt{n}$
3. the estimator for the standard error is $S_D / \sqrt{n}$,
   where $S_D = \sqrt{\dfrac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n-1}}$ is the standard deviation of the $D_i$'s
Remarks on $\mu_D$ and $\sigma_D^2$
We defined $D_i = X_i - Y_i$, i = 1, 2, …, n.
We assumed (D1, D2, …, Dn) is a random sample from a normal
distribution with parameters $\mu_D$ and $\sigma_D^2$.
Since $D_i = X_i - Y_i$, then $\mu_D = \mu_X - \mu_Y$ and $\sigma_D^2 = \sigma_X^2 + \sigma_Y^2 - 2\,\text{Cov}(X,Y)$
where $\mu_X$ = common mean of the $X_i$'s
$\mu_Y$ = common mean of the $Y_i$'s
$\sigma_X^2$ = common variance of the $X_i$'s
$\sigma_Y^2$ = common variance of the $Y_i$'s
Cov(X,Y) = common covariance of the $(X_i, Y_i)$'s
Cov(X,Y) is a measure of the linear relationship between X and Y. If X and
Y are independent, then Cov(X,Y) = 0. (The converse, though, is not always
true.) If Y tends to increase as X increases, then Cov(X,Y) > 0; but if
Y tends to decrease as X increases, then Cov(X,Y) < 0. Hence, when pairing makes
Cov(X,Y) positive, the variance of the differences is smaller than the sum of the two variances.
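The same identity holds exactly for the sample variances and the sample covariance, which makes it easy to check numerically. The sketch below uses hypothetical paired values and a hypothetical helper `sample_cov`; it is an illustration only.

```python
from statistics import mean, variance

def sample_cov(x, y):
    """Sample covariance with the n - 1 divisor."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

# Hypothetical paired measurements that move together (illustration only).
x = [52.0, 60.0, 47.0, 71.0, 65.0]
y = [50.0, 57.0, 46.0, 66.0, 63.0]
d = [a - b for a, b in zip(x, y)]

lhs = variance(d)                                        # variance of the differences
rhs = variance(x) + variance(y) - 2 * sample_cov(x, y)   # right side of the identity

print(lhs, rhs)
print(variance(x) + variance(y))  # much larger than lhs when the covariance is positive
```

When the pairs move together, the covariance term removes most of the variability, which is why pairing can give a much shorter interval than two independent samples of the same size.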
Confidence Interval Estimator for µD = µX - µY Based on 2 Related Samples
(page 499)
A $(1-\alpha)100\%$ confidence interval estimator for the mean of the
differences, $\mu_D = \mu_X - \mu_Y$, based on matched or paired samples is given by:
$$\left( \bar{D} - t_{\alpha/2}(v=n-1)\frac{S_D}{\sqrt{n}},\ \ \bar{D} + t_{\alpha/2}(v=n-1)\frac{S_D}{\sqrt{n}} \right)$$
where $t_{\alpha/2}(v=n-1)$ is the $100(1-\alpha/2)$th percentile of the t-distribution with
v = n - 1 degrees of freedom.
Procedure:
Step 1: Compute $D_i = X_i - Y_i$, i = 1, 2, …, n.
Step 2: Compute the mean and standard deviation of the $D_i$'s.
Step 3: Use the t-table to determine $t_{\alpha/2}(v=n-1)$, where n = number of pairs.
Step 4: Plug the values computed in Steps 2 and 3 into the formula.
Examples
Example 14.10 (pages 499-500)
Exercise 18, page 399 MGB. To test two promising new lines of hybrid corn under normal farming conditions, a
seed company selected eight farms at random in Iowa and planted both lines in experimental plots on each farm.
The yields (converted to bushels per acre) for the eight locations were:
Line A:  86  87  56  93  84  93  75  79
Line B:  80  79  58  91  77  82  74  66
Assuming that the two yields are jointly normally distributed, estimate the difference between the mean yields by a
95% confidence interval.
The parameter of interest is $\mu_D = \mu_X - \mu_Y$, where $\mu_X$ = mean yield using line A of hybrid corn and
$\mu_Y$ = mean yield using line B of hybrid corn.
Step 1: Compute $D_i = X_i - Y_i$, i = 1, 2, …, 8
D1 = 86 – 80 = 6    D2 = 87 – 79 = 8    D3 = 56 – 58 = -2   D4 = 93 – 91 = 2
D5 = 84 – 77 = 7    D6 = 93 – 82 = 11   D7 = 75 – 74 = 1    D8 = 79 – 66 = 13
Step 2: Compute $\bar{D}$ and $S_D$. (Use the standard deviation mode of your calculator by entering the values of Di, i = 1, 2, …, 8.)
$\bar{D}$ = 5.75 and $S_D$ = 5.1199888
Step 3: Use the t-table to determine the value of $t_{0.05/2}(v=8-1)$. It is $t_{0.025}(v=7)$ = 2.365.
Step 4: Plug the computed values into the following formula:
$$\left( \bar{D} - t_{\alpha/2}(v=n-1)\frac{S_D}{\sqrt{n}},\ \ \bar{D} + t_{\alpha/2}(v=n-1)\frac{S_D}{\sqrt{n}} \right)$$
The 95% confidence interval estimate for the mean difference is (1.47, 10.03).
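The four steps can be traced in Python as follows. This is a sketch: `paired_ci` is a hypothetical helper, and the tabular value 2.365 is passed in from the t-table as in Step 3 rather than computed.

```python
import math
from statistics import mean, stdev

def paired_ci(x, y, t_crit):
    """Confidence interval for mu_D from paired data, given the tabular t-value."""
    d = [a - b for a, b in zip(x, y)]   # Step 1: differences
    n = len(d)
    d_bar = mean(d)                     # Step 2: mean of the differences
    se = stdev(d) / math.sqrt(n)        # Step 2: S_D / sqrt(n), n - 1 divisor
    return d_bar - t_crit * se, d_bar + t_crit * se  # Step 4

line_a = [86, 87, 56, 93, 84, 93, 75, 79]
line_b = [80, 79, 58, 91, 77, 82, 74, 66]

lower, upper = paired_ci(line_a, line_b, t_crit=2.365)  # Step 3: t_0.025(7)
print(round(lower, 2), round(upper, 2))
```

Since the interval contains positive values only, we are highly confident that the mean yield of line A is greater than that of line B.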
Assignment 11
Always show your solution. Present the confidence interval estimator used and show the plugged-in values. No
immediate rounding. Whenever necessary, round off the final answer only, to 3 decimal places.
1. A manufacturer of office machines is considering the production of a new word processor. The decision to start
large-scale production of the new machines will be based on the comparison of the mean operating speed using the
standard machines ($\mu_X$) and the mean operating speed using the new machines ($\mu_Y$). Since operators of the
machine have varying abilities, a random sample of 20 typists was selected and the speed of each typist in the
sample was observed once using the new word processor and once using the standard word processor. The
collected data on the speed (in minutes) are as follows:

Typist (i)                  1     2     3     4     5     6     7     8     9    10
Standard Processor (Xi)  60.2  58.7  59.4  60.3  61.7  60.2  64.1  63.2  62.4  57.8
New Processor (Yi)       57.2  57.4  56.4  58.5  60.1  61.4  61.9  60.4  60.0  56.8

Typist (i)                 11    12    13    14    15    16    17    18    19    20
Standard Processor (Xi)  55.4  61.2  64.7  64.1  62.9  65.8  69.3  56.4  58.5  63.7
New Processor (Yi)       50.2  58.4  63.5  60.5  62.2  66.3  68.5  56.6  58.3  60.2

Assuming normality, compute a 95% confidence interval estimate for $\mu_X - \mu_Y$.
2. A study is conducted between high school students and college students to compare their proficiency at writing
computer programs for microcomputers. For this study, the researchers wish to compare the mean time (in minutes)
of high school students to write an error-free program ($\mu_X$) with the mean time (in minutes) of college students to
write an error-free program ($\mu_Y$). Data taken from two independent samples were summarized as follows:

Statistics     Mean time   Standard deviation   Sample size
High school       70              10                10
College           84              12                10

Assuming normality of both populations, compute a 90% confidence interval estimate for $\mu_X - \mu_Y$.
Confidence Interval Estimator for p1 – p2 Based on 2 Independent Samples
(page 501)
An approximate $(1-\alpha)100\%$ confidence interval estimator for $p_1 - p_2$ when the sample sizes are large is
given by:
$$\left( (\hat{P}_1-\hat{P}_2) - z_{\alpha/2}\sqrt{\frac{\hat{P}_1(1-\hat{P}_1)}{n_1}+\frac{\hat{P}_2(1-\hat{P}_2)}{n_2}},\ \ (\hat{P}_1-\hat{P}_2) + z_{\alpha/2}\sqrt{\frac{\hat{P}_1(1-\hat{P}_1)}{n_1}+\frac{\hat{P}_2(1-\hat{P}_2)}{n_2}} \right)$$
where $z_{\alpha/2}$ is the $(1-\alpha/2)100$th percentile of the standard normal distribution.
Note:
This confidence interval estimator will provide a good approximate $(1-\alpha)100\%$
confidence interval estimate for $p_1 - p_2$ when both sample sizes are large. Thus, we require that both sample
sizes are at least 30. Furthermore, we have the condition that both $p_1$ and $p_2$ are not expected to be too
close to 0 or 1.
Interpretation
Exercise 1 for Section 14.5 (page 503)
Suppose that a 95% confidence interval estimate for the
difference is constructed.
a) For what range of values is it not possible to conclude that the
   population proportions are different from one another?
b) For what range of values can you conclude, with 95% confidence,
   that the proportion in population 1 is statistically higher than the
   proportion in population 2?
c) For what range of values can you conclude, with 95% confidence,
   that the proportion in population 1 is statistically lower than the
   proportion in population 2?
Examples
Example 14.11 and 14.12 (pages 501 to 503)
A company is considering the introduction of a new formulation of its Zippi Cola soft drink. It first
conducts a series of taste tests comparing Zippi to the leading brand of cola. In the first test, based on
the original formula of Zippi, 120 of 500 people who tried it preferred Zippi. The test was repeated with
a new group of 1000 tasters to compare the new formulation of Zippi Cola to the leading brand. This
time, 300 of the 1000 tasters preferred the new Zippi to the leading brand. Compute an
approximate 90% confidence interval estimate for the difference of population proportions who prefer
Zippi over the leading brand of cola.
Parameter of interest: p1 – p2
where p1=proportion who prefer the original formula of Zippi over the leading brand
p2=proportion who prefer the new formulation of Zippi over the leading brand
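Point estimates: $\hat{P}_1 = 120/500 = 0.24$ and $\hat{P}_2 = 300/1000 = 0.3$
Point estimate for $p_1 - p_2$: $\hat{P}_1 - \hat{P}_2 = 0.24 - 0.3 = -0.06$
$z_{0.1/2} = z_{0.05} = 1.645$
Interval estimator:
$$\left( (\hat{P}_1-\hat{P}_2) - z_{\alpha/2}\sqrt{\frac{\hat{P}_1(1-\hat{P}_1)}{n_1}+\frac{\hat{P}_2(1-\hat{P}_2)}{n_2}},\ \ (\hat{P}_1-\hat{P}_2) + z_{\alpha/2}\sqrt{\frac{\hat{P}_1(1-\hat{P}_1)}{n_1}+\frac{\hat{P}_2(1-\hat{P}_2)}{n_2}} \right)$$
Interval estimate:
$$\left( -0.06 - (1.645)\sqrt{\frac{(0.24)(0.76)}{500}+\frac{(0.3)(0.7)}{1000}},\ \ -0.06 + (1.645)\sqrt{\frac{(0.24)(0.76)}{500}+\frac{(0.3)(0.7)}{1000}} \right) = (-0.099,\ -0.021)$$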
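The interval estimate can be reproduced with the Python standard library. This is a sketch with a hypothetical function name; it obtains $z_{\alpha/2}$ from `statistics.NormalDist` instead of the table, which gives about 1.6449 rather than the rounded 1.645 and does not change the result to 3 decimal places.

```python
import math
from statistics import NormalDist

def two_proportion_ci(x1, n1, x2, n2, conf=0.90):
    """Approximate large-sample confidence interval for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - z * se, d + z * se

# Original formula: 120 of 500 preferred Zippi; new formulation: 300 of 1000.
lower, upper = two_proportion_ci(120, 500, 300, 1000, conf=0.90)
print(round(lower, 3), round(upper, 3))
```

The interval contains negative values only, so the proportion preferring the original formula is estimated to be lower than the proportion preferring the new formulation.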