Homework_02

Homework Problems (submit to the Dropbox for Homework 2 by its due
date)
1. (18 pts; 3 pts each) The ratio of DDE to PCB concentrations in bird eggs has been
shown to have a number of biological implications. The ratio is used as an indication of the
movement of contamination through the food chain. The paper “The ratio of DDE to
PCB concentrations in Great Lakes herring gull eggs and its use in interpreting
contaminants data” [Journal of Great Lakes Research (1998) 24(1): 12-31] reports the
following ratios for eggs collected at 13 study sites from the five Great Lakes. The eggs
were collected from both terrestrial- and aquatic-feeding birds.
DDE to PCB Ratio
Terrestrial Feeders
76.50 6.03 3.51 9.96 4.24 7.74 9.54 41.70 1.84 2.50 1.64
Aquatic Feeders
0.27 0.61 0.54 0.14 0.63 0.23 0.56 0.48 0.16 0.18
a. Compute the mean and median for the 21 data points, ignoring the type of feeder.
b. Compute the mean and median separately for each type of feeder.
c. Using your results from parts (a) and (b), comment on the relative sensitivity of
the mean and median to extreme values in a data set.
d. Comment on the appropriateness of using the mean and median as measures of
central location for the DDE to PCB ratio for the two types of feeders separately.
Explain your answer.
e. Compute a measure of dispersion for the two types of feeder separately that is
resistant.
f. Compute a measure of dispersion for the two types of feeder separately that is not
resistant.
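As a cross-check for hand or Minitab calculations, the summary statistics asked for in problem 1 could be computed along the following lines. This is only a sketch: the helper name `summarize` is hypothetical, and the quartiles use Python's "exclusive" rule, which is an assumption. Quartile conventions vary between packages, so the IQR may differ slightly from other software.

```python
import statistics

# Ratios copied from the problem statement.
terrestrial = [76.50, 6.03, 3.51, 9.96, 4.24, 7.74, 9.54, 41.70, 1.84, 2.50, 1.64]
aquatic = [0.27, 0.61, 0.54, 0.14, 0.63, 0.23, 0.56, 0.48, 0.16, 0.18]


def summarize(values):
    """Return mean, median, IQR, and sample standard deviation (hypothetical helper)."""
    # Assumption: 'exclusive' quartiles, i.e. the (n + 1)p position rule.
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,                       # resistant measure of spread
        "stdev": statistics.stdev(values),    # nonresistant measure of spread
    }


groups = {
    "all": terrestrial + aquatic,
    "terrestrial": terrestrial,
    "aquatic": aquatic,
}
summaries = {name: summarize(vals) for name, vals in groups.items()}

for name, stats in summaries.items():
    print(name, {key: round(value, 4) for key, value in stats.items()})
```

Comparing the "all" row with the two group rows shows how differently the mean and the median react when the large terrestrial values are pooled with the small aquatic ones.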
2. (6 pts; 3 pts each) When there are extreme outliers, we know that the trimmed mean is
a better measure of central location than the mean since it is more resistant, as mentioned
in Lesson 2. However, can we push this one step further to do the following? Explain
briefly.
a. When the data set has some extreme outliers, can we trim some of the data and
use the trimmed data set to carry out analysis to make inference about the central
location? (Hint: when making inferences, the standard deviation of the estimate
is important in addition to the point estimate.)
b. When the data set has some extreme outliers, can we trim some of the data and
use the trimmed data set to carry out analysis to make inference about the
dispersion of the population?
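For reference, the trimmed mean mentioned in problem 2 can be sketched as below. The function name `trimmed_mean` and the toy sample are illustrative assumptions, not part of the course materials, and the sketch takes "trimming" to mean dropping the same proportion of observations from each end of the sorted data.

```python
import statistics


def trimmed_mean(values, proportion):
    """Mean after dropping a proportion of the observations from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(proportion * len(ordered))  # observations cut from each tail
    kept = ordered[k:len(ordered) - k]
    return statistics.mean(kept)


# Small made-up sample with one extreme value (not from the homework data).
sample = [1, 2, 3, 4, 100]
print(statistics.mean(sample))    # ordinary mean: 22
print(trimmed_mean(sample, 0.2))  # drops 1 and 100, leaving 2, 3, 4: 3
```

Note that the function only returns a point estimate. It says nothing about how variable that estimate is, which is the issue the two questions ask about.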
3. (12 pts; 4 pts each) In the manufacturing of soft contact lenses, the power (the strength)
of the lens needs to be very close to the target value. A comparison of several suppliers
is made relative to the consistency of the power of the lens. The following table contains
the distances from the target power value for lenses from three different suppliers. (Note
that the measurements are distances from the target power value. Measurements consistently
close to zero are deemed good.)
Supplier   Distance from target power value
1          189.9 191.9 190.9 183.8 185.5 190.9 192.8 188.4 187.0
2          158.6 156.4 157.7 154.1 152.3 159.5 158.1 150.9 156.9
3          218.6 208.4 187.1 199.5 202.0 211.1 197.6 204.4 206.8
a. Compute the mean and standard deviation for the distances of each supplier.
b. Plot the sample distance data. (Note that a boxplot is not the best choice for
part b, since it shows the median and IQR rather than the mean and standard
deviation asked for in part a. Use another plot that gives a more
complete picture of the data.)
c. Which supplier appears to provide material that produces lenses having power
closest to the target value? You should use a numerical measure to justify your
answer.
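The per-supplier mean and standard deviation in problem 3 can be checked with a short script such as the one below. It is a sketch only: the dictionary layout is an assumption made for illustration, and `statistics.stdev` is the sample standard deviation (divisor n - 1).

```python
import statistics

# Distances from the target power value, copied from the table in problem 3.
distances = {
    1: [189.9, 191.9, 190.9, 183.8, 185.5, 190.9, 192.8, 188.4, 187.0],
    2: [158.6, 156.4, 157.7, 154.1, 152.3, 159.5, 158.1, 150.9, 156.9],
    3: [218.6, 208.4, 187.1, 199.5, 202.0, 211.1, 197.6, 204.4, 206.8],
}

# Map each supplier to (mean, sample standard deviation).
results = {
    supplier: (statistics.mean(values), statistics.stdev(values))
    for supplier, values in distances.items()
}

for supplier, (mean, sd) in results.items():
    print(f"Supplier {supplier}: mean = {mean:.2f}, s = {sd:.2f}")
```

Since the measurements are distances from the target, both how far the values sit from zero and how tightly they cluster are relevant when comparing suppliers.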
4. (24 pts; 4 pts each) To control the risk of severe core damage during a commercial
nuclear power station blackout accident, the reliability of the emergency diesel generators
to start on demand must be maintained at a high level. The following table contains data
on the failure history of seven nuclear power plants. The following data are the number
of successful demands between failures for the diesel generators at one of these plants.
28 50 193 65 4 7 147 76 10 0 10 84 0 9 1 0 62
26 15 226 54 46 108 4 105 40 4 273 184 7 55 41 26 6
a. Calculate the mean and median for the data.
b. Which measure appears best to represent the center of the data?
c. Calculate the range and the standard deviation, s.
d. Use the range approximation to estimate s. How close is the approximation to the
true value?
e. Construct the intervals ȳ ± s, ȳ ± 2s, ȳ ± 3s. Count the number of data values
falling in each of the three intervals. Convert these numbers to percentages and
compare your results to the Empirical Rule.
f. Why do you think the Empirical Rule and your percentages do not match well?
Provide a relevant graph for the data to support your explanation.
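The calculations in problem 4 can be organized as in the sketch below. Two things here are assumptions rather than statements from the assignment: the range approximation is taken to be the common rule of thumb s ≈ range / 4, and the helper `percent_within` is a hypothetical name. If Lesson 2 defines the approximation differently, the divisor should be changed accordingly.

```python
import statistics

# Successful demands between failures, copied from problem 4.
demands = [
    28, 50, 193, 65, 4, 7, 147, 76, 10, 0, 10, 84, 0, 9, 1, 0, 62,
    26, 15, 226, 54, 46, 108, 4, 105, 40, 4, 273, 184, 7, 55, 41, 26, 6,
]

n = len(demands)
mean = statistics.mean(demands)
median = statistics.median(demands)
s = statistics.stdev(demands)             # sample standard deviation
data_range = max(demands) - min(demands)
s_approx = data_range / 4                 # assumption: range / 4 rule of thumb


def percent_within(k):
    """Percentage of observations inside mean ± k * s (hypothetical helper)."""
    low, high = mean - k * s, mean + k * s
    count = sum(low <= y <= high for y in demands)
    return 100 * count / n


coverage = {k: percent_within(k) for k in (1, 2, 3)}

print(f"n = {n}, mean = {mean:.2f}, median = {median}")
print(f"range = {data_range}, s = {s:.2f}, range/4 = {s_approx:.2f}")
for k, pct in coverage.items():
    print(f"within {k}s: {pct:.1f}%")
```

The Empirical Rule benchmarks to compare against are roughly 68%, 95%, and 99.7% for one, two, and three standard deviations.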
5. (15 pts; 3 pts each) For the data given in problem 4 (the previous problem), use
Minitab to draw the Boxplot and answer the following questions:
a. Is the data left skewed, symmetric or right skewed?
b. What are the outliers?
c. Is the median closer to the lower quartile or the upper quartile? Does that indicate
that the density of data between first quartile and the median is higher than the
density of data between the median and the third quartile?
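A boxplot's quartiles and outlier flags can be reproduced numerically as a check on the Minitab output. The sketch below assumes the usual 1.5 × IQR fence rule for flagging outliers and Python's "exclusive" quartile method. Both are assumptions, and other software may use slightly different quartile definitions.

```python
import statistics

# Same data as problem 4.
demands = [
    28, 50, 193, 65, 4, 7, 147, 76, 10, 0, 10, 84, 0, 9, 1, 0, 62,
    26, 15, 226, 54, 46, 108, 4, 105, 40, 4, 273, 184, 7, 55, 41, 26, 6,
]

# Assumption: 'exclusive' quartiles, i.e. the (n + 1)p position rule.
q1, q2, q3 = statistics.quantiles(demands, n=4, method="exclusive")
iqr = q3 - q1

# Assumption: points beyond 1.5 * IQR from the quartiles are flagged.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
flagged = sorted(y for y in demands if y < lower_fence or y > upper_fence)

print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}, IQR = {iqr}")
print(f"fences: [{lower_fence}, {upper_fence}]")
print("flagged values:", flagged)
```

Comparing `q2 - q1` with `q3 - q2` gives a numerical view of where the median sits inside the box.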
6. The following table provides data on the price of 24 brands of paper towels. The
prices are given in both cost per roll and cost per sheet because the brands had varying
numbers of sheets per roll.
Brand   Price per Roll   Number of Sheets per Roll   Cost per Sheet
1       1.59             50                          0.0318
2       0.89             55                          0.0162
3       0.97             62                          0.0156
4       1.49             96                          0.0155
5       1.56             90                          0.0173
6       0.84             60                          0.0140
7       0.79             52                          0.0152
8       0.75             72                          0.0104
9       0.72             80                          0.0090
10      0.53             52                          0.0102
11      0.59             85                          0.0069
12      0.89             80                          0.0111
13      0.67             85                          0.0079
14      0.66             80                          0.0083
15      0.59             80                          0.0074
16      0.76             80                          0.0095
17      0.85             85                          0.0100
18      0.59             85                          0.0069
19      0.57             78                          0.0073
20      1.78             180                         0.0099
21      1.98             180                         0.0110
22      0.67             100                         0.0067
23      0.79             100                         0.0079
24      0.55             90                          0.0061
a. (4 pts) Compute the standard deviation for both the price per roll and the price per
sheet.
b. (4 pts) Which is more variable, price per roll or price per sheet? Should you use s
or CV? Justify your answer.
c. (2 pts) Do you think that IQR is a good measure to compare the variability of
these two variables?
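Because price per roll and cost per sheet are on very different scales, it can help to see the standard deviation and the coefficient of variation (CV = s / mean) side by side. The following is a minimal sketch. The function name `coefficient_of_variation` is a hypothetical helper, and the CV is left as a proportion rather than a percentage.

```python
import statistics

# Values copied from the paper towel table in problem 6.
price_per_roll = [
    1.59, 0.89, 0.97, 1.49, 1.56, 0.84, 0.79, 0.75, 0.72, 0.53, 0.59, 0.89,
    0.67, 0.66, 0.59, 0.76, 0.85, 0.59, 0.57, 1.78, 1.98, 0.67, 0.79, 0.55,
]
cost_per_sheet = [
    0.0318, 0.0162, 0.0156, 0.0155, 0.0173, 0.0140, 0.0152, 0.0104,
    0.0090, 0.0102, 0.0069, 0.0111, 0.0079, 0.0083, 0.0074, 0.0095,
    0.0100, 0.0069, 0.0073, 0.0099, 0.0110, 0.0067, 0.0079, 0.0061,
]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (unitless)."""
    return statistics.stdev(values) / statistics.mean(values)


for label, values in [("price per roll", price_per_roll),
                      ("cost per sheet", cost_per_sheet)]:
    s = statistics.stdev(values)
    cv = coefficient_of_variation(values)
    print(f"{label}: s = {s:.4f}, CV = {cv:.3f}")
```

The standard deviation carries the units of each variable, while the CV is unitless, which is the distinction part b asks you to weigh.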
7. Using Singer's data file, answer the following:
a. (8 pts) Use Minitab to find the descriptive statistics for each type of singer.
For each case, does the range approximation give a good estimate of s?
b. (7 pts) Use Minitab to draw the Boxplots for these cases side by side.
Comment on the central tendency and the dispersion for the four types of
singers.