MTCS3063/MT1613 - Probability and Statistics
Exercise 3

Q 1. The following are figures on a well’s daily production of oil in barrels:

214, 203, 226, 198, 243, 225, 207, 203, 208, 200, 217, 202, 208, 212, 205 and 220

Calculate the variance \(s^2\) and the standard deviation of the data. Display the well’s daily production on a boxplot.

Q 2. Consider the following data on type of health complaint (J = joint swelling, F = fatigue, B = back pain, M = muscle weakness, C = coughing, N = nose running/irritation, O = other) made by tree planters. Obtain frequencies and relative frequencies for the various categories, and draw a bar graph.

Q 3. In five tests, one student averaged 63.2 with a standard deviation of 3.3, whereas another student averaged 78.8 with a standard deviation of 5.3. Which student is relatively more consistent?

Q 4. Every score in the following batch of exam scores is in the 60s, 70s, 80s, or 90s as given in the table below:

74 89 81 85 89 81 95 98 80 81
84 84 93 71 81 68 64 74 80 90
67 82 70 82 72 85 69 69 70 63
66 72 66 72 60 87 85 81 83 88

Compute the mean, median and mode of the exam scores.

Q 5. If k sets of data consist, respectively, of \(n_1, n_2, \ldots, n_k\) observations and have the means \(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k\), then the overall mean of all the data is given by the formula

\[ \bar{x} = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{\sum_{i=1}^{k} n_i} \]

i) The average annual salaries paid to top-level management in three companies are $94,000, $102,000, and $99,000. If the respective numbers of top-level executives in these companies are 4, 15, and 11, find the average salary paid to these 30 executives.

ii) In a nuclear engineering class there are 22 juniors, 18 seniors, and 10 graduate students. If the juniors averaged 71 in the midterm examination, the seniors averaged 78, and the graduate students averaged 89, what is the mean for the entire class?

Q 6.
The formula of the preceding exercise is a special case of the following formula for the weighted mean

\[ \bar{x}_w = \frac{\sum_{i=1}^{k} w_i x_i}{\sum_{i=1}^{k} w_i} \]

where \(w_i\) is a weight indicating the relative importance of the i-th observation.

i) If an instructor counts the final examination in a course four times as much as each 1-hour examination, what is the weighted average grade of a student who received grades of 69, 75, 56, and 72 in four 1-hour examinations and a final examination grade of 78?

ii) From 1999 to 2004 the cost of food increased by 53% in a certain city, the cost of housing increased by 40% and the cost of transportation increased by 34%. If the average salaried worker spent 28% of his or her income on food, 35% on housing, and 14% on transportation, what is the combined percentage increase in the cost of these items?

Q 7. The National Highway Traffic Safety Administration reported the relative speed (rounded to the nearest 5 mph) of automobiles involved in accidents one year. The percentages at different speeds were as recorded in the given table.

Speed             Percentage
20 mph or less     2.0%
25 or 30 mph      29.7%
35 or 40 mph      30.4%
45 or 50 mph      16.5%
55 mph            19.2%
60 or 65 mph       2.2%

i) From these data can we conclude that it is quite safe to drive at high speeds? Why or why not?

ii) Why do most accidents occur in the 35 or 40 mph and in the 25 or 30 mph ranges?

iii) Construct a density histogram using the endpoints 0, 22.5, 32.5, 42.5, 52.5, 57.5, 67.5 for the intervals.

Q 8. According to Chebyshev’s theorem, what can we assert about the percentages of any set of data that must lie within k standard deviations of the mean when a) k = 2; b) k = 2.5; c) k = 3.1; d) k = 9; e) k = 12?

Q 9. Twenty power failures last

18 125 45 33 44 96 89 12 31 103
26 80 49 75 40 80 125 63 61 28

minutes. Find the mean, median, and standard deviation. What proportion of the data lies in the intervals \((\bar{x} - s, \bar{x} + s)\) and \((\bar{x} - 2s, \bar{x} + 2s)\)?

Q 10.
Measurements made with one micrometer of the diameter of a ball bearing have a mean of 3.92 mm and a standard deviation of 0.0152 mm, whereas measurements made with another micrometer of the unstretched length of a spring have a mean of 1.54 inches and a standard deviation of 0.0086 inch. Which of these two measuring instruments is relatively more precise?

Q 11. In a factory or office, the time during working hours in which a machine is not operating as a result of breakage or failure is called a downtime. The table displays the distribution of a sample of the length of the downtimes of a certain machine.

Downtime (minutes)   Frequency
0–9                   2
10–19                15
20–29                17
30–39                13
40–49                 3
Total                50

Find
i) the mean and the median,
ii) lower and upper quartiles,
iii) the standard deviation,
iv) the percentiles \(\pi_{10}, \pi_{30}, \pi_{65}, \pi_{95}\).
v) Calculate Pearson’s coefficient of skewness for the distribution and discuss the symmetry or skewness of the data.

Q 12. The following are measurements of the breaking strength (in ounces) of a sample of 60 linen threads:

i) Find the mean breaking strength of this sample data. Calculate the standard deviation and describe the population using Chebyshev’s Rule.

ii) Compute again the mean and s from the grouped data and then compare with the results of Part i) above.

iii) Find the interquartile range.

iv) What are the 20th and 97th percentiles of the given data?

v) Draw the boxplot of the given data set.

Q 13. The compressive strength of high-performance concrete had previously been investigated, but not much was known about flexural strength (a measure of ability to resist failure in bending). The sample data on flexural strength (in megapascals (MPa), where 1 Pa (pascal) = \(1.45 \times 10^{-4}\) psi) are given below:

5.9 7.9 9.7 7.2 7.3 9.0 7.0
7.8 7.7 6.3 8.1 6.8 6.5 8.2
8.7 7.8 9.7 11.6 11.3 11.8 10.7
7.0 6.3 7.4 7.7 7.6 6.8

i) Compute the mean flexural strength.

ii) Analyse the data by constructing its boxplot.

Q 14.
In a 2-week study of the productivity of workers, the following data were obtained on the total number of acceptable pieces which 100 workers produced:

Use the frequency distribution and cumulative frequency distribution of this data set from Exercise 1 and estimate the mean, median and mode of this grouped data.

Q 15. Derive the following equivalent computing formula for the variance:

\[ s^2 = \frac{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}{n(n-1)} \]

Q 16. Find and interpret the z-scores of the following exam scores: i) 67, ii) 95, iii) 45, iv) 100.

Q 17. Answer the following questions briefly:

i) Identify the two most commonly used measures of center for quantitative data. Explain the relative advantages and disadvantages of each.

ii) Among the measures of central tendency discussed, which is the only one appropriate for qualitative data?

iii) Data Set A has more variation than Data Set B. Decide which of the following statements are necessarily true.
a) Data Set A has a larger mean than Data Set B.
b) Data Set A has a larger standard deviation than Data Set B.

iv) What is meant by the z-score of an observed value of a variable?

v) Identify the statistic that is used to estimate a) a population mean, and b) a population standard deviation.

vi) What are a) the Empirical Rule and b) Chebyshev’s Rule for the distribution of statistical data?
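
The computing formula in Q 15 can be checked numerically before attempting the derivation. The following is a minimal Python sketch under stated assumptions: the function names `variance_definition` and `variance_shortcut` and the five-value sample are hypothetical and chosen only for illustration; they are not data from this sheet. It compares the definitional sample variance (sum of squared deviations divided by n − 1) with the computing formula and with the standard library's `statistics.variance`.

```python
import math
import statistics


def variance_definition(xs):
    """Sample variance from the definition: sum of squared deviations / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def variance_shortcut(xs):
    """Sample variance from the computing formula of Q 15."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))


# Hypothetical sample, chosen only for illustration (not data from the sheet).
sample = [2, 4, 4, 5, 7]

by_definition = variance_definition(sample)
by_shortcut = variance_shortcut(sample)

# Both routes should agree up to floating-point rounding,
# and with the standard library's sample variance.
assert math.isclose(by_definition, by_shortcut)
assert math.isclose(by_shortcut, statistics.variance(sample))
print(by_definition, by_shortcut)
```

A numerical match on one sample is only a sanity check, not a proof; the derivation itself still has to expand the sum of squared deviations about the sample mean and simplify.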