Homework Problems (submit to Dropbox for Homework 2 by its due date)

1. (18 pts; 3 pts each) The ratio of DDE to PCB concentrations in bird eggs has been shown to have a number of biological implications. The ratio is used as an indication of the movement of contamination through the food chain. The paper "The ratio of DDE to PCB concentrations in Great Lakes herring gull eggs and its use in interpreting contaminants data" [Journal of Great Lakes Research (1998) 24(1): 12-31] reports the following ratios for eggs collected at 13 study sites from the five Great Lakes. The eggs were collected from both terrestrial- and aquatic-feeding birds.

   DDE to PCB Ratio
   Terrestrial Feeders:  76.50  6.03  3.51  9.96  4.24  7.74  9.54  41.70  1.84  2.50  1.64
   Aquatic Feeders:       0.27  0.61  0.54  0.14  0.63  0.23  0.56   0.48  0.16  0.18

   a. Compute the mean and median for the 21 data points, ignoring the type of feeder.
   b. Compute the mean and median separately for each type of feeder.
   c. Using your results from parts (a) and (b), comment on the relative sensitivity of the mean and median to extreme values in a data set.
   d. Comment on the appropriateness of using the mean and the median as measures of central location for the DDE to PCB ratio for the two types of feeders separately. Explain your answer.
   e. Compute a resistant measure of dispersion for each of the two types of feeders separately.
   f. Compute a nonresistant measure of dispersion for each of the two types of feeders separately.

2. (6 pts; 3 pts each) When there are extreme outliers, we know that the trimmed mean is a better measure of central location than the mean because it is more resistant, as mentioned in Lesson 2. However, can we push this one step further and do the following? Explain briefly.

   a. When the data set has some extreme outliers, can we trim some of the data and use the trimmed data set to carry out an analysis to make inferences about the central location?
      (Hint: when making inferences, the standard deviation of the estimate is important in addition to the point estimate.)
   b. When the data set has some extreme outliers, can we trim some of the data and use the trimmed data set to carry out an analysis to make inferences about the dispersion of the population?

3. (12 pts; 4 pts each) In the manufacturing of soft contact lenses, the power (the strength) of the lens needs to be very close to the target value. A comparison of several suppliers is made relative to the consistency of the power of the lens. The following table contains the distance from the target power value for lenses from three different suppliers. (Note that the measurements are distances from the target power value. Measurements consistently close to zero will be deemed good.)

   Supplier   Distance from target power value
   1          189.9  191.9  190.9  183.8  185.5  190.9  192.8  188.4  187.0
   2          158.6  156.4  157.7  154.1  152.3  159.5  158.1  150.9  156.9
   3          218.6  208.4  187.1  199.5  202.0  211.1  197.6  204.4  206.8

   a. Compute the mean and standard deviation of the distances for each supplier.
   b. Plot the sample distance data. (Note that a boxplot is not the best choice here because it shows the median and IQR, which do not correspond to the mean and standard deviation asked for in part a. You should use another plot that gives a more complete illustration of the data.)
   c. Which supplier appears to provide material that produces lenses having power closest to the target value? Use a numerical measure to justify your answer.

4. (24 pts; 4 pts each) To control the risk of severe core damage during a commercial nuclear power station blackout accident, the reliability of the emergency diesel generators to start on demand must be maintained at a high level. The following table contains data on the failure history of seven nuclear power plants. The following data are the number of successful demands between failures for the diesel generators at one of these plants.

    28   50  193   65    4    7  147   76   10    0   10   84    0    9    1    0   62
    26   15  226   54   46  108    4  105   40    4  273  184    7   55   41   26    6

   a. Calculate the mean and median for the data. Which measure appears best to represent the center of the data?
   b. Calculate the range and the standard deviation, s.
   c. Use the range approximation to estimate s. How close is the approximation to the true value?
   d. Construct the intervals ȳ ± s, ȳ ± 2s, ȳ ± 3s. Count the number of data values falling in each of the three intervals.
   e. Convert these numbers to percentages and compare your results to the Empirical Rule.
   f. Why do you think the Empirical Rule and your percentages do not match well? Provide a relevant graph for the data to support your explanation.

5. (15 pts; 3 pts each) For the data given in problem 4 (the previous problem), use Minitab to draw the boxplot and answer the following questions:

   a. Are the data left skewed, symmetric, or right skewed?
   b. What are the outliers?
   c. Is the median closer to the lower quartile or the upper quartile? Does that indicate that the density of data between the first quartile and the median is higher than the density of data between the median and the third quartile?

6. The following table provides data on the price of 24 brands of paper towels. The prices are given in both cost per roll and cost per sheet because the brands had varying numbers of sheets per roll.

   Brand   Price per Roll   Number of Sheets per Roll   Cost per Sheet
    1      1.59              50                         0.0318
    2      0.89              55                         0.0162
    3      0.97              62                         0.0156
    4      1.49              96                         0.0155
    5      1.56              90                         0.0173
    6      0.84              60                         0.0140
    7      0.79              52                         0.0152
    8      0.75              72                         0.0104
    9      0.72              80                         0.0090
   10      0.53              52                         0.0102
   11      0.59              85                         0.0069
   12      0.89              80                         0.0111
   13      0.67              85                         0.0079
   14      0.66              80                         0.0083
   15      0.59              80                         0.0074
   16      0.76              80                         0.0095
   17      0.85              85                         0.0100
   18      0.59              85                         0.0069
   19      0.57              78                         0.0073
   20      1.78             180                         0.0099
   21      1.98             180                         0.0110
   22      0.67             100                         0.0067
   23      0.79             100                         0.0079
   24      0.55              90                         0.0061

   a. (4 pts) Compute the standard deviation for both the price per roll and the price per sheet.
   b. (4 pts) Which is more variable, price per roll or price per sheet? Should you use s or CV? Justify your answer.
   c. (2 pts) Do you think that the IQR is a good measure for comparing the variability of these two variables?

7. Using the Singer data file, answer the following:

   a. (8 pts) Use Minitab to find the descriptive statistics for each type of singer. For each case, does the approximate value of s give a good estimate of s?
   b. (7 pts) Use Minitab to draw the boxplots for these cases side by side. Comment on the central tendency and the dispersion for the four types of singers.
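
For students who want to check hand or Minitab calculations, the quantities named in these problems (mean, median, s, range approximation of s, IQR, CV, and the Empirical Rule percentages) can be sketched in Python as below. This is only an illustration under stated assumptions: the function names describe and empirical_rule_percentages are made up for this sketch, the eight-value sample is a toy data set that does not come from any problem above, the range approximation is taken to be range/4, and the quartiles use the (n + 1)p position rule. Software packages differ in their quartile conventions, so the IQR may not match Minitab exactly.

```python
import statistics


def describe(values):
    """Return common measures of center and spread for a list of numbers."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    # Quartiles by the (n + 1)p position rule; other conventions exist.
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    data_range = max(values) - min(values)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "s": s,
        "range": data_range,
        "range_approx_s": data_range / 4,  # assumed rule of thumb: s ~ range/4
        "iqr": q3 - q1,                    # resistant measure of spread
        "cv": s / mean,                    # unit-free; only sensible if mean > 0
    }


def empirical_rule_percentages(values):
    """Percentage of values within mean +/- k*s for k = 1, 2, 3."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    n = len(values)
    return {
        k: 100 * sum(mean - k * s <= v <= mean + k * s for v in values) / n
        for k in (1, 2, 3)
    }


if __name__ == "__main__":
    toy_sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, not from the homework
    for name, value in describe(toy_sample).items():
        print(f"{name}: {value:.4f}")
    print(empirical_rule_percentages(toy_sample))
```

The Empirical Rule figures to compare against are roughly 68%, 95%, and 99.7% for the three intervals. The CV is returned as a proportion; multiply by 100 to report it as a percentage.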