252grass3-052 10/25/05 Name: Class days and time:

Please include this on what you hand in!
Graded Assignment 3
A. In your outline there are 6 methods to compare means or medians: methods D1, D2, D3, D4, D5a and
D5b. Methods D6a and D6b compare proportions and method D7 compares variances or standard
deviations. In the following cases, identify $H_0$ and $H_1$ and identify which method to use. If the hypotheses
involve a mean, state the hypotheses in terms of both $\mu$ and $D = \mu_1 - \mu_2$. If the hypotheses involve a
proportion, state them in terms of both $p$ and $\Delta p = p_1 - p_2$. If the hypotheses involve standard deviations
or variances, state them in terms of both $\sigma^2$ and $\frac{\sigma_1^2}{\sigma_2^2}$ or $\frac{\sigma_2^2}{\sigma_1^2}$. All the questions involve means, medians,
proportions or variances. One of these problems is a chi-squared test. Remember that a yes answer is not
acceptable without an explanation.
Note: Look at 252thngs on the syllabus supplement part of the website before you start (and
before you take exams). Remember that I use $\mu$, $\sigma$, $\eta$ and $p$ as parameters and $\bar{x}$, $s$, $x_{.50}$ and $\bar{p} = \frac{x}{n}$
as sample statistics.
----------------------------------------------------------------------------------------------------------------------------
Example: This may seem long but it appears on an old Graded Assignment 3.
A group of supervisors is given exams on management skills before and after taking a course in
management. Scores are as follows.
Supervisor   Before   After
     1         63       78
     2         93       92
     3         84       91
     4         72       80
     5         65       69
     6         72       85
     7         91       99
     8         84       82
     9         71       81
    10         80       87
    11         68       93
If we assume that the distribution of results is Normal, what method should we use to answer the question
“Has the course improved the scores of the managers?”
Solution: You are comparing means before and after the course. You can get away with using means
because the parent distributions are Normal. If $\mu_2$ is the mean of the second sample, you are hoping that
$\mu_2 > \mu_1$, which, because it contains no equality, is an alternate hypothesis. So your hypotheses are
$$\begin{cases} H_0: \mu_1 \ge \mu_2 \\ H_1: \mu_1 < \mu_2 \end{cases} \quad \text{or} \quad \begin{cases} H_0: \mu_1 - \mu_2 \ge 0 \\ H_1: \mu_1 - \mu_2 < 0 \end{cases}$$
If $D = \mu_1 - \mu_2$, then
$$\begin{cases} H_0: D \ge 0 \\ H_1: D < 0 \end{cases}$$
The important thing to notice here is that the data are in before and after pairs, so you use Method D4.
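The paired comparison in this example can be sketched in a few lines of Python. This is only an illustration, and it assumes that Method D4 in the outline is the usual paired-difference t test; the outline's own formula is not reproduced here, and the variable names (before, after, d, t_stat) are made up for this sketch. The differences are taken as before minus after so that they match D = mu1 - mu2.

```python
import math
import statistics

# Scores from the example table (supervisors 1 through 11).
before = [63, 93, 84, 72, 65, 72, 91, 84, 71, 80, 68]
after = [78, 92, 91, 80, 69, 85, 99, 82, 81, 87, 93]

# Paired differences d = x1 - x2 (before minus after), matching D = mu1 - mu2.
d = [b - a for b, a in zip(before, after)]

n = len(d)
d_bar = statistics.mean(d)      # sample mean of the differences
s_d = statistics.stdev(d)       # sample standard deviation (n - 1 divisor)
std_err = s_d / math.sqrt(n)    # standard error of d_bar

# Assumed test statistic for H0: D >= 0 against H1: D < 0 (paired t test).
t_stat = d_bar / std_err
df = n - 1                      # degrees of freedom

print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t_stat:.3f}, df = {df}")
```

Because the alternate hypothesis is D < 0, the computed t would be compared with the lower-tail critical value from a t table with n - 1 degrees of freedom. Treating the two columns as independent samples would ignore the pairing, which is the point of the example.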
----------------------------------------------------------------------------------------------------------------------------
1. You have data on income in two villages ($x_1$ in village 1, $x_2$ in village 2). You want to test the
hypothesis that village 2 has higher earnings than village 1. You know that income has an extremely skewed
distribution, and you have to decide whether to use the mean or the median income.
2. The data in the file CONCRETE 1 on your CD represent the strength (measured in thousands of
pounds per square inch that they can take without buckling) of 40 concrete samples on the
second and seventh days after pouring. ($x_1$ is the strength on the second day and $x_2$ is the strength on the
seventh day; each line refers to a single sample.) Assume that the underlying distribution is Normal and test
the hypothesis that the concrete is stronger on the seventh day.
3. You have interviewed a sample of 80 small businesses in the Northeast and 75 small businesses in the
Southeast. Each business has indicated whether it sells in foreign markets. You want to show that
businesses in the Northeast are more likely to export. ($x_1$ is the total number of firms that export in the
Northeast sample, $x_2$ in the Southeast sample.)
4. You expand the sample in 3 by adding 60 small businesses in the Midwest ($x_3$ is the number of these
that export). You test the hypothesis that the same fraction of businesses export in each region.
6. In order to see which garage to use under contract for automobile repairs, 10 cars are towed first to
garage 1 and then to garage 2. You end up with two data sets: the first data column, $x_1$, is estimates from
the first garage and the second data column, $x_2$, is estimates from the second garage. Each of the 10 lines of
data refers to one car. You believe that the estimates are approximately normally distributed. Compare the
estimates from garages 1 and 2. Would you change your method if there were 200 cars?
7. You have processing times in seconds, $x_1$, for a sample of 5 computer jobs from the accounting
department and for 6 jobs from the research department, $x_2$. You believe that the underlying distributions
are Normal and want to show that research jobs take longer than accounting jobs. Would you change your
method if $n_1 = n_2 = 205$?
8. You are having a part produced on two different machines. $x_1$ is 200 randomly selected data points that
represent the length of parts from machine one; $x_2$ is 200 randomly selected data points that represent the
length of parts from machine two. You want to test your suspicion that parts from machine 2 are longer than
parts from machine 1. In a problem of this type you would assume that the lengths are normally distributed.
9. You also suspect that parts from machine two are more variable in length than parts from machine one.
(This is the same as saying that machine 2 is less reliable than machine 1.) Test this suspicion.
10. A panel is exposed to an ad for Smelly-Welly Dirt Devourer. Before seeing the ad, 5 out of the 40
members had a favorable impression of Smelly-Welly. After seeing the ad, 2 more members of the panel
plus the original 5 had a favorable impression. Has the proportion with favorable impressions risen
significantly?
B. You have 3 methods that can be used for goodness of fit tests: Chi-squared, Kolmogorov-Smirnov and
Lilliefors. Which would you use in the following cases?
1. You want to know if the Normal distribution applies to a data set.
a. The data set consists of 15 numbers – you do not know the population mean and variance and
will have to compute sample means and variances from the data.
b. The data set consists of 15 numbers – you think that you know the population mean and
variance.
c. The data set consists of 5000 numbers and you have observed frequencies for the following
intervals: below 1000, 1000-1199.99, 1200-1399.99, 1400-1599.99, …, 2600-2799.99, 2800 and
above. You think you know the population mean and variance.
d. The data set consists of 5000 numbers and you have observed frequencies for the following
intervals: below 1000, 1000-1199.99, 1200-1399.99, 1400-1599.99, …, 2600-2799.99, 2800 and
above. You have computed a sample mean and variance from the data.