252solngr3-072 10/23/07 Name: Class days and time:

252solngr3-072
10/23/07 (Open this document in 'Page Layout' view!)
Name:
Class days and time:
Please include this on what you hand in!
Graded Assignment 3
In your outline there are 6 methods to compare means or medians, methods D1, D2, D3, D4, D5a and D5b. Method D6 compares
proportions and method D7 compares variances or standard deviations. In the following cases, identify $H_0$ and $H_1$ and identify
which method to use. If the hypotheses involve a mean, state the hypotheses in terms of both $\mu$ and $D = \mu_1 - \mu_2$. If the
hypotheses involve a proportion, state them in terms of both $p$ and $\Delta p = p_1 - p_2$. If the hypotheses involve standard deviations
or variances, state them in terms of both $\sigma^2$ and $\frac{\sigma_1^2}{\sigma_2^2}$ or $\frac{\sigma_2^2}{\sigma_1^2}$. All the questions involve means, medians, proportions or
variances. One of these problems is a chi-squared test.
Note: Look at 252thngs on the syllabus supplement part of the website before you start (and before you take exams), especially the new rules.
----------------------------------------------------------------------------------------------------------------------------
Example: This may seem long but it appears on last year’s graded assignment 3.
A group of supervisors is given exams on management skills before and after taking a course in
management. Scores are as follows.
Supervisor   Before   After
     1         63       78
     2         93       92
     3         84       91
     4         72       80
     5         65       69
     6         72       85
     7         91       99
     8         84       82
     9         71       81
    10         80       87
    11         68       93
If we assume that the distribution of results is Normal, what method should we use to answer the question
“Has the course improved the scores of the managers?”
Solution: You are comparing means before and after the course. You can get away with using means
because the parent distributions are Normal. If $\mu_2$ is the mean of the second sample, you are hoping that
$\mu_2 > \mu_1$, which, because it contains no equality, is an alternative hypothesis. So your hypotheses are
$\begin{cases} H_0: \mu_1 \ge \mu_2 \\ H_1: \mu_1 < \mu_2 \end{cases}$ or $\begin{cases} H_0: \mu_1 - \mu_2 \ge 0 \\ H_1: \mu_1 - \mu_2 < 0 \end{cases}$. If $D = \mu_1 - \mu_2$, then $\begin{cases} H_0: D \ge 0 \\ H_1: D < 0 \end{cases}$. The important thing to notice
here is that the data are in before and after pairs, so you use Method D4.
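The example only asks for the method, but the paired-sample computation behind Method D4 can be sketched as follows. This is a minimal illustration using the scores in the table; the significance level of .05 and the tabled value 1.812 (one tail, 10 degrees of freedom) are assumptions, since the example does not state them.

```python
import math
import statistics

# Scores from the example table (supervisors 1-11).
before = [63, 93, 84, 72, 65, 72, 91, 84, 71, 80, 68]
after = [78, 92, 91, 80, 69, 85, 99, 82, 81, 87, 93]

# Differences d = x1 - x2 (before minus after), matching D = mu1 - mu2.
d = [b - a for b, a in zip(before, after)]
n = len(d)
d_bar = statistics.mean(d)
s_d = statistics.stdev(d)  # sample standard deviation (n - 1 divisor)
t_ratio = d_bar / (s_d / math.sqrt(n))

# Assumption: alpha = .05, one-tailed, n - 1 = 10 degrees of freedom.
T_CRIT = 1.812
# H1: D < 0, so the 'reject' region is the lower tail.
reject_h0 = t_ratio < -T_CRIT

print(f"d_bar = {d_bar:.4f}, s_d = {s_d:.4f}, t = {t_ratio:.3f}, reject H0: {reject_h0}")
```

Under these assumptions the test ratio comes out near -3.78, which lies in the lower-tail 'reject' region.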
-------------------------------------------------------------------------------------------------------------------
General considerations.
1) All methods in section D are methods that can only be used for comparison of 2 samples. This is
because, if $\theta$ (theta) is a parameter like $\mu$ or $p$, $\Delta\theta = \theta_1 - \theta_2$ is easy to define and will be zero if $\theta_1$ and
$\theta_2$ are equal. If we go to more than two samples, say 3, we need something like
$(\theta_1 - \theta_0)^2 + (\theta_2 - \theta_0)^2 + (\theta_3 - \theta_0)^2$, where $\theta_0$ is some sort of average of the parameters of the samples.
This will equal zero if all the parameters are equal and will not allow positive discrepancies in one sample
to cancel out negative discrepancies in another. This is what takes us to chi-squared and ANOVA methods.
Saying 1   0 2   2   0 2   3   0 2  0 is not the same as saying D 3  1   2   3  0 , because
D 3 would be negative if      , but saying    2     2  0 is the same as saying
1
2
3
1
0
2
0
D 2  1   2  0 . (Try proving this – it’s simple algebra.)
2) You can always substitute a method for the median for a method for the mean, but not vice versa.
However, if a Normal distribution applies, a method involving means will be more efficient and powerful.
3) The computer will use Method D3 when it is not told what method to use. This is quite general because
if the sample variances are similar, it gives results like D2 and if the sample sizes are large, it gives results
like D1. However, if variances are equal D2 is easier to use and if the samples are large D1 is easier to use.
4) The K-S and Lilliefors methods only exist because chi-square performs so poorly for small samples. K-S
needs $\mu$, $\sigma$ or other parameters. Lilliefors uses $\bar{x}$ or $s$ and only works to test for a Normal distribution.
5) ‘Significant’ in statistics means that we have rejected a hypothesis like $H_0: \mu = 0$ and ‘significantly
different’ means that we have rejected a hypothesis like $H_0: \mu_1 = \mu_2$. Of course, if two parameters are
significantly different, their difference is significant.
6) Be careful of inequalities. If $\mu_1 < \mu_2$ or $\mu_2 > \mu_1$ and $D = \mu_1 - \mu_2$, then $D < 0$. Please remember: a hypothesis containing $<$, $>$ or $\ne$ is an alternative hypothesis.
7) In most problems you are better off trying to figure out what the alternative hypothesis is before
you try to state the null hypothesis.
8) Do not lose sight of the fact that the purpose of samples is to compare populations. We may look at
numbers in methods D6b and chi-squared tests, but our purpose is to deal with proportions of a population.
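The algebra that consideration 1 leaves to the reader can be sketched as follows. This derivation assumes that $\mu_0$ is the simple average of the two means; the text above only calls it "some sort of average", so treat that choice as an illustration.

```latex
% Assumption: \mu_0 is the simple average of the two means.
\[
\mu_0 = \tfrac{1}{2}(\mu_1 + \mu_2)
\quad\Longrightarrow\quad
\mu_1 - \mu_0 = \tfrac{1}{2}(\mu_1 - \mu_2),
\qquad
\mu_2 - \mu_0 = -\tfrac{1}{2}(\mu_1 - \mu_2)
\]
\[
(\mu_1 - \mu_0)^2 + (\mu_2 - \mu_0)^2
  = \tfrac{1}{4}(\mu_1 - \mu_2)^2 + \tfrac{1}{4}(\mu_1 - \mu_2)^2
  = \tfrac{1}{2} D_2^2
\]
```

Since $\tfrac{1}{2} D_2^2$ is zero exactly when $D_2 = \mu_1 - \mu_2 = 0$, the two statements are equivalent for two samples.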
Part 1.
1. You have data on income in two villages ($x_1$ in village 1, $x_2$ in village 2). You want to test the
hypothesis that village 2 has higher earnings than village 1. You know that income has an extremely skewed
distribution and you have to decide whether to use the mean or the median income.
Solution: If $\eta$ is the median, $\begin{cases} H_0: \eta_1 \ge \eta_2 \\ H_1: \eta_1 < \eta_2 \end{cases}$. Since we are comparing medians and the data are not paired,
use Method D5a.
Question for a Later Exam: What if we want to compare three or more villages? Solution: Kruskal-Wallis.
2. You have a sample of earned incomes for 25 couples, both of whom are teachers. ($x_1$ is the women's
incomes in a column, $x_2$ is the men's. Each line represents one couple.) Test to see if the men make more
than the women.
Solution: If $\eta$ is the median, $\begin{cases} H_0: \eta_1 \ge \eta_2 \\ H_1: \eta_1 < \eta_2 \end{cases}$. Since we are comparing medians and the data are paired, use
Method D5b.
Question for a Later Exam: What if we want to compare the incomes of 25 members each of three
different ethnic groups? Each of the 25 lines of our table has three incomes, one for an individual of each
group, but the individuals on each line have been matched for education, experience and personality.
Solution: Friedman
3. You have interviewed a sample of 80 small businesses in the Northeast and 75 small businesses in the
Southeast. Each business has indicated whether they sell in foreign markets. 60 firms in the Northeast and
50 in the Southeast export. You want to show that businesses in the Northeast are more likely to export. ($x_1$
is the total number of firms that export in the Northeast sample, $x_2$ in the Southeast).
Solution: If $\bar{p}_1 = \frac{x_1}{n_1}$ and $\bar{p}_2 = \frac{x_2}{n_2}$, $\begin{cases} H_0: p_1 \le p_2 \\ H_1: p_1 > p_2 \end{cases}$ or $\begin{cases} H_0: p_1 - p_2 \le 0 \\ H_1: p_1 - p_2 > 0 \end{cases}$. If $\Delta p = p_1 - p_2$, then
$\begin{cases} H_0: \Delta p \le 0 \\ H_1: \Delta p > 0 \end{cases}$. Since we are comparing proportions, use Method D6a.
4. You interview a sample of 57 Pennsylvania businesses in 2002 and reinterview the same sample in
2007. You ask them whether they export. Your data consists of two items for each firm: whether they
exported in 2002 and whether they exported in 2007. You want to show that the proportion exporting has
increased. Of the 57 firms 30 exported in both years and 10 did not export in the first year, but did so in the
second. 4 firms discontinued exports after 2002.
Solution: This can be called a paired comparison of proportions and the method is D6b. Let $p_1$ represent the
proportion of the population that exported in 2002 and $p_2$ represent the proportion of the population that
exported in 2007. We want to test for $p_2 > p_1$ or $p_1 < p_2$. Whichever way we write it, it’s an alternative
hypothesis because it contains no equality. Let $x_{11}$ be the number that exported in 2002 and 2007, $x_{12}$ the number
that exported in 2002 but not in 2007, $x_{21}$ the number that exported in 2007 but not 2002 and $x_{22}$
the number that never exported. Our hypotheses are $\begin{cases} H_0: p_1 \ge p_2 \\ H_1: p_1 < p_2 \end{cases}$ or,
if $\Delta p = p_1 - p_2$, $\begin{cases} H_0: \Delta p \ge 0 \\ H_1: \Delta p < 0 \end{cases}$. The table to be analyzed is:

                        question 2
                        yes        no
question 1    yes       x11        x12
              no        x21        x22
5. You expand the sample in 3 by adding 60 small businesses in the Midwest ($x_3$ is the number of these
that export). You test the hypothesis that the same fraction of businesses export in each region.
Solution: If $\bar{p}_1 = \frac{x_1}{n_1}$, $\bar{p}_2 = \frac{x_2}{n_2}$ and $\bar{p}_3 = \frac{x_3}{n_3}$, $\begin{cases} H_0: p_1 = p_2 = p_3 \\ H_1: \text{not all } p\text{s equal} \end{cases}$. This is a chi-squared test of homogeneity. Since we are
comparing multiple proportions, use a chi-squared test.
6. You have profit rates, $x_1$, for a sample of 20 pharmaceutical firms in Europe and profit rates, $x_2$, for a
sample of 17 pharmaceutical firms in the US. You believe that they are normally distributed and you wish
to see whether the European firms were more profitable than the American firms.
Solution: $\begin{cases} H_0: \mu_1 \le \mu_2 \\ H_1: \mu_1 > \mu_2 \end{cases}$ or $\begin{cases} H_0: \mu_1 - \mu_2 \le 0 \\ H_1: \mu_1 - \mu_2 > 0 \end{cases}$. If $D = \mu_1 - \mu_2$, then $\begin{cases} H_0: D \le 0 \\ H_1: D > 0 \end{cases}$. Because you believe
that the Normal distribution applies, you use a method that compares means. The total sample size is too
small to use Method D1, which means that D2 or D3 should work. You could test the variances for equality
and use D2, or not bother and use D3.
Question for a Later Exam: What if we want to compare three or more groups of firms? Solution: One-way ANOVA.
7. In order to see which garage to use under contract for automobile repairs, 35 cars are towed first to
garage 1 and then to garage 2. You end up with two data sets, the first data column, $x_1$, is estimates from
the first garage and the second data column, $x_2$, is estimates for the second garage. Each of the 35 lines of
data refers to one car. You believe that the estimates are approximately normally distributed. Compare the
estimates in garage 1 and 2.
Solution: There is no reason to assume that one garage is cheaper than the other, so $\begin{cases} H_0: \mu_1 = \mu_2 \\ H_1: \mu_1 \ne \mu_2 \end{cases}$
or $\begin{cases} H_0: \mu_1 - \mu_2 = 0 \\ H_1: \mu_1 - \mu_2 \ne 0 \end{cases}$. If $D = \mu_1 - \mu_2$, then $\begin{cases} H_0: D = 0 \\ H_1: D \ne 0 \end{cases}$. Again, you compare means because you are,
presumably, interested in the total amount that you will pay for the repairs, which means that you want the
lowest average cost. The important thing to notice here is that the data are in pairs, so you use Method D4.
Question for a Later Exam: What if we want to check 3 garages? Solution: 2-way ANOVA, with one
measurement per cell.
8. You are having a part produced in two different machines. $x_1$ is 200 randomly selected data points that
represent the length of parts from machine one, $x_2$ is 200 randomly selected data points that represent the
length of parts from machine two. You want to test your suspicion that parts from machine 2 are longer than
parts from machine 1. In a problem of this type you would assume that the lengths are normally distributed.
Solution: You could use Method D2 (if you tested the variances for equality) or D3 here, but, since you
have two large samples, it would be far easier to use Method D1.
$\begin{cases} H_0: \mu_1 \ge \mu_2 \\ H_1: \mu_1 < \mu_2 \end{cases}$ or $\begin{cases} H_0: \mu_1 - \mu_2 \ge 0 \\ H_1: \mu_1 - \mu_2 < 0 \end{cases}$. If $D = \mu_1 - \mu_2$, then $\begin{cases} H_0: D \ge 0 \\ H_1: D < 0 \end{cases}$.
9. You also suspect that parts from machine two are more variable in length than parts from machine one.
(This is the same as saying that machine 2 is less reliable than machine 1). Test this suspicion.
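Solution: $\begin{cases} H_0: \sigma_1 \ge \sigma_2 \\ H_1: \sigma_1 < \sigma_2 \end{cases}$ or $\begin{cases} H_0: \sigma_1^2 \ge \sigma_2^2 \\ H_1: \sigma_1^2 < \sigma_2^2 \end{cases}$. In terms of the variance ratio $\frac{\sigma_1^2}{\sigma_2^2}$ or $\frac{\sigma_2^2}{\sigma_1^2}$, the alternate
hypothesis rules (the variance that $H_1$ says is larger goes on top), so $H_0: \frac{\sigma_2^2}{\sigma_1^2} \le 1$ and $H_1: \frac{\sigma_2^2}{\sigma_1^2} > 1$. Since you are comparing variances, use Method D7.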
Question for a Later Exam: What if you doubt that the Normal Distribution applies? Solution: Levene
Test.
Question for a Later Exam: What if there are 3 machines? Solution: Levene or Bartlett Test.
10. You are going to do the exercise in 8) again, but this time you have done a test like that in Exercise 9)
and not rejected your null hypothesis. However, you have only 30 lengths from each machine.
Solution: Hypotheses are the same as for 8. Because these are small samples and we have not rejected the
hypothesis that the variances are equal, we can use Method D2.
                       Location - Normal distribution.          Location - Distribution not Normal.
                       Compare means.                           Compare medians.
Paired Samples         Method D4                                Method D5b
Independent Samples    Methods D1-D3 (D1 Large samples,         Method D5a
                       D2 Equal variances, D3 More general)

Proportions: Method D6
Variability - Normal distribution. Compare variances: Method D7
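The summary table above amounts to a small decision procedure, which can be sketched in code. The helper below is hypothetical: the name `choose_method`, its argument names, and the idea of passing "large samples" and "equal variances" as yes/no flags are assumptions made for illustration and are not part of the outline. The D6a/D6b split follows Problems 3 and 4.

```python
def choose_method(parameter, paired=False, normal=True,
                  large_samples=False, equal_variances=False):
    """Return the section D method suggested by the summary table.

    parameter: "location" (means or medians), "proportion" or "variance".
    The remaining flags are the analyst's own judgments about the data.
    """
    if parameter == "variance":
        # Method D7 assumes a Normal distribution (see Problem 9).
        return "D7"
    if parameter == "proportion":
        return "D6b" if paired else "D6a"
    if parameter != "location":
        raise ValueError(f"unknown parameter: {parameter!r}")
    if not normal:
        # Distribution not Normal: compare medians.
        return "D5b" if paired else "D5a"
    if paired:
        return "D4"
    if large_samples:
        return "D1"
    # Small independent Normal samples: D2 if variances are equal, else D3.
    return "D2" if equal_variances else "D3"


# Problem 7: paired garage estimates, assumed Normal.
print(choose_method("location", paired=True))
# Problem 1: independent samples of skewed village incomes.
print(choose_method("location", normal=False))
```

Keeping the judgments as explicit flags mirrors the solutions above, where the choice between D1, D2 and D3 depends on what you know about sample size and variances rather than on a fixed cutoff.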
Part 2: Do problems 3 and 4, using a 95% confidence level. Find p-values.
3. You have interviewed a sample of 80 small businesses in the Northeast and 75 small businesses in the
Southeast. Each business has indicated whether they sell in foreign markets. 60 firms in the Northeast and
50 in the Southeast export. You want to show that businesses in the Northeast are more likely to export. ($x_1$
is the total number of firms that export in the Northeast sample, $x_2$ in the Southeast).
Solution: If $\bar{p}_1 = \frac{x_1}{n_1}$ and $\bar{p}_2 = \frac{x_2}{n_2}$, $\begin{cases} H_0: p_1 \le p_2 \\ H_1: p_1 > p_2 \end{cases}$ or $\begin{cases} H_0: p_1 - p_2 \le 0 \\ H_1: p_1 - p_2 > 0 \end{cases}$. If $\Delta p = p_1 - p_2$, then $\begin{cases} H_0: \Delta p \le 0 \\ H_1: \Delta p > 0 \end{cases}$.
Since we are comparing proportions, use Method D6.
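Interval for: Difference between proportions, $\Delta p = p_1 - p_2$, with $q = 1 - p$.
Confidence Interval: $\Delta p = \Delta\bar{p} \pm z_{\alpha/2}\, s_{\Delta p}$, where $s_{\Delta p} = \sqrt{\frac{\bar{p}_1 \bar{q}_1}{n_1} + \frac{\bar{p}_2 \bar{q}_2}{n_2}}$.
Hypotheses: $H_0: \Delta p = \Delta p_0$, $H_1: \Delta p \ne \Delta p_0$, where $\Delta p_0 = p_{01} - p_{02}$ or $\Delta p_0 = 0$.
Test Ratio: $z = \frac{\Delta\bar{p} - \Delta p_0}{\sigma_{\Delta p}}$.
    If $\Delta p_0 = 0$: $\sigma_{\Delta p} = \sqrt{\bar{p}_0 \bar{q}_0 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$, where $\bar{p}_0 = \frac{n_1 \bar{p}_1 + n_2 \bar{p}_2}{n_1 + n_2}$.
    If $\Delta p_0 \ne 0$: $\sigma_{\Delta p} = \sqrt{\frac{p_{01} q_{01}}{n_1} + \frac{p_{02} q_{02}}{n_2}}$. Or use $s_{\Delta p}$.
Critical Value: $\Delta\bar{p}_{cv} = \Delta p_0 \pm z_{\alpha/2}\, \sigma_{\Delta p}$.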
So $\alpha = .05$, $z_\alpha = z_{.05} = 1.645$, $x_1 = 60$, $n_1 = 80$, $x_2 = 50$ and $n_2 = 75$. This means
$\bar{p}_1 = \frac{60}{80} = .7500$ ($\bar{q}_1 = .2500$), $\bar{p}_2 = \frac{50}{75} = .6667$ ($\bar{q}_2 = .3333$) and $\Delta\bar{p} = .7500 - .6667 = .0833$.
$\bar{p}_0 = \frac{60 + 50}{80 + 75} = \frac{110}{155} = .7097$ ($\bar{q}_0 = .2903$).
$\sigma_{\Delta p} = \sqrt{.7097(.2903)\left(\frac{1}{80} + \frac{1}{75}\right)} = \sqrt{.7097(.2903)(0.02583)} = \sqrt{0.005322} = .072954$
$s_{\Delta p} = \sqrt{\frac{.7500(.2500)}{80} + \frac{.6667(.3333)}{75}} = \sqrt{.002344 + .002963} = \sqrt{.005307} = .07285$
Test ratio: $z = \frac{\Delta\bar{p} - \Delta p_0}{\sigma_{\Delta p}} = \frac{.0833 - 0}{.072954} = 1.142$. Make a diagram of a Normal curve with 0 in the middle and a
‘reject’ region above 1.645. Since 1.142 is not in the ‘reject’ region, do not reject the null
hypothesis. Since the alternative hypothesis is $H_1: \Delta p > 0$, the p-value is the probability that $\Delta\bar{p}$
is greater than or equal to .0833, $p\text{-value} = P(z \ge 1.142) = .5 - .3729 = .1271$. Because this is
above .05, we cannot reject the null hypothesis.
Critical value: Since the alternative hypothesis is $H_1: \Delta p > 0$, the critical value must be above
zero. $\Delta\bar{p}_{cv} = 0 + 1.645(.072954) = .1200$. Make a diagram of a Normal curve with 0 in the middle
and a ‘reject’ region above .1200. Since $\Delta\bar{p} = .0833$ is not in the ‘reject’ region, do not reject the
null hypothesis.
Confidence interval: $\Delta p \ge \Delta\bar{p} - z_\alpha s_{\Delta p} = .0833 - 1.645(.07285) = -.0365$. Make a diagram of a
Normal curve with .0833 in the middle and represent the confidence interval by shading the entire
region above -.0365. Since $\Delta p_0 = 0$ is in the confidence interval, do not reject the null hypothesis.
Even better, represent the null hypothesis $H_0: \Delta p \le 0$ by shading the area below zero and note
that this overlaps the confidence interval.
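The three approaches above (test ratio, critical value, confidence interval) can be cross-checked with a short script. This is a sketch using only the standard library; the variable names are illustrative, and the Normal probability is computed from `math.erf` rather than read from a table, so the p-value differs slightly from the tabled .1271.

```python
import math


def normal_cdf(z):
    """Standard Normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Data from Problem 3.
x1, n1 = 60, 80   # Northeast exporters, sample size
x2, n2 = 50, 75   # Southeast exporters, sample size
Z_ALPHA = 1.645   # one-tailed z for alpha = .05, as used in the solution

p1, p2 = x1 / n1, x2 / n2
dp = p1 - p2                      # observed difference in sample proportions
p0 = (x1 + x2) / (n1 + n2)        # pooled proportion under H0: delta p = 0

# Standard error under H0 (pooled) and for the confidence interval (unpooled).
sigma_dp = math.sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
s_dp = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

z = (dp - 0) / sigma_dp           # test ratio
p_value = 1 - normal_cdf(z)       # upper tail, because H1: delta p > 0
dp_cv = 0 + Z_ALPHA * sigma_dp    # critical value for the sample difference
ci_lower = dp - Z_ALPHA * s_dp    # one-sided confidence interval: delta p >= ci_lower

reject_h0 = z > Z_ALPHA
print(f"z = {z:.3f}, p-value = {p_value:.4f}, cv = {dp_cv:.4f}, "
      f"lower limit = {ci_lower:.4f}, reject H0: {reject_h0}")
```

All three approaches must agree on the decision, since they are rearrangements of the same comparison under the pooled or unpooled standard error.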
4. You interview a sample of 57 Pennsylvania businesses in 2002 and reinterview the same sample in
2007. You ask them whether they export. Your data consists of two items for each firm: whether they
exported in 2002 and whether they exported in 2007. You want to show that the proportion exporting has
increased. Of the 57 firms 30 exported in both years and 10 did not export in the first year, but did so in the
second. 4 firms discontinued exports after 2002.
Solution: This can be called a paired comparison of proportions and the method is D6b, the McNemar Test.
Let $p_1$ represent the proportion of the population that exported in 2002 and $p_2$ represent the proportion of the
population that exported in 2007. We want to test for $p_2 > p_1$ or $p_1 < p_2$. Whichever way we write it, it’s
an alternative hypothesis because it contains no equality. Let $x_{11} = 30$ be those who exported in 2002 and
2007, $x_{12} = 4$ those who exported in 2002 but not in 2007, $x_{21} = 10$ the number that exported in 2007
but not 2002 and $x_{22} = 57 - 30 - 10 - 4 = 13$ those who never exported. Our hypotheses are
$\begin{cases} H_0: p_1 \ge p_2 \\ H_1: p_1 < p_2 \end{cases}$ or, if $\Delta p = p_1 - p_2$, $\begin{cases} H_0: \Delta p \ge 0 \\ H_1: \Delta p < 0 \end{cases}$. The general design of the table is

                        question 2
                        yes        no
question 1    yes       x11        x12
              no        x21        x22

and in this case it is

                        question 2
                        yes        no
question 1    yes       30          4
              no        10         13

We will compare $z = \frac{x_{12} - x_{21}}{\sqrt{x_{12} + x_{21}}} = \frac{4 - 10}{\sqrt{10 + 4}} = -\sqrt{\frac{6^2}{14}} = -\sqrt{2.5714} = -1.604$ against $-z_\alpha = -1.645$. Make a diagram of a Normal curve with 0 in the
middle and a ‘reject’ region below -1.645. Since -1.604 is not in the ‘reject’ region, do not reject the null
hypothesis. Since the alternative hypothesis is $H_1: \Delta p < 0$, the p-value is the probability that $\Delta\bar{p}$ is less
than or equal to the value it actually takes. $p\text{-value} = P(z \le -1.604) = .5 - .4452 = .0548$. Because this is
above .05, we cannot reject the null hypothesis.
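The McNemar test ratio depends only on the two off-diagonal counts, the firms that changed status. A minimal sketch of the computation follows; the variable names are illustrative, the counts are taken from the problem statement (4 firms stopped exporting, 10 started), and the Normal probability comes from `math.erf` rather than a table, so the p-value differs slightly from the tabled .0548.

```python
import math


def normal_cdf(z):
    """Standard Normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Off-diagonal counts from the problem statement.
x12 = 4    # exported in 2002 but not in 2007
x21 = 10   # exported in 2007 but not in 2002
Z_ALPHA = 1.645  # one-tailed z for alpha = .05, as used in the solution

# McNemar test ratio: firms whose status did not change do not enter.
z = (x12 - x21) / math.sqrt(x12 + x21)

# H1: delta p < 0, so the p-value is the lower-tail probability.
p_value = normal_cdf(z)
reject_h0 = z < -Z_ALPHA

print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The result is close to the boundary: the test ratio falls just short of -1.645, so the p-value is only slightly above .05.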