Assignment #1 – STAT 450

1.) For each of the following situations, identify the population of interest, the inferential objective, and how you might go about collecting a sample. (3 pts. each)
a) A city engineer wants to estimate the average weekly water consumption for single-family dwelling units in the city.
b) The National Highway Safety Council wants to estimate the proportion of automobile tires with unsafe tread among all tires manufactured by a specific company during the current production year.
c) A medical scientist wants to estimate the average length of time until the recurrence of a certain disease.

2.) The file Stock Trade.JMP, linked on the assignment portion of my website, contains the top 40 stocks on the over-the-counter (OTC) market, ranked by percentage of outstanding shares traded on one particular day last year. (10 pts.)
a) Construct a histogram, boxplot, etc. to describe these data.
b) What proportion of these top 40 stocks traded more than 4% of the outstanding shares?
c) If one of the stocks is selected at random from the 40 stocks, what would you estimate is the probability that it will have traded fewer than 5% of its outstanding shares?
d) Calculate $\bar{y}$ and $s^2$ (and $s$).
e) Calculate the interval $\bar{y} \pm ks$ for $k = 1, 2,$ and $3$. Count the number (and percentage) of stocks that fall within each interval. How do these compare to what you would expect applying the Empirical Rule?
f) Notice there is one stock with a very high percentage of outstanding shares traded (11.88%). Eliminate this value and redo parts (d) and (e). Compare your results to those from parts (d) and (e) above.

3.) The following results on summations will be useful in simplifying formulae such as that for the sample variance ($s^2$). For any constant $c$ show: (2 pts. each)
a) $\sum_{i=1}^{n} c = nc$
b) $\sum_{i=1}^{n} c y_i = c \sum_{i=1}^{n} y_i$
c) $\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$

4.) Prove that the sum of the deviations of a set of measurements about their mean is equal to zero, i.e. (3 pts.)

$$\sum_{i=1}^{n} (y_i - \bar{y}) = 0$$

5.) CHALLENGING PROBLEM, BUT HINT SHOULD HELP
Let $k \geq 1$. Show that, for any set of $n$ measurements, the fraction included in the interval $(\bar{y} - ks, \bar{y} + ks)$ is at least $1 - \frac{1}{k^2}$. (6 pts.)

Hint:

$$n s^2 \geq (n - 1) s^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i:\,|y_i - \bar{y}| \geq ks} (y_i - \bar{y})^2 + \sum_{i:\,|y_i - \bar{y}| < ks} (y_i - \bar{y})^2$$

6.) A random sample of 100 foxes was examined by a team of veterinarians to determine the prevalence of a specific parasite. Counting the number of parasites of this specific type, the veterinarians found that 69 foxes had no parasites of the type of interest, 17 had one parasite of the type under study, and so on. A summary of their results is given in the following table: (6 pts.)

# of Parasites    0   1   2   3   4   5   6   7   8
# of Foxes       69  17   6   3   1   2   1   0   1

a) Draw an appropriate graphical display to see the distribution of the number of parasites per fox.
b) Calculate $\bar{y}$ and $s$ for the data given.
c) What fraction of the parasite counts falls within 2 standard deviations of the mean? Within 3 standard deviations of the mean? Do your results agree with Chebyshev's/Tchebysheff's Theorem and/or the Empirical Rule?

7.) Studies indicate that drinking water supplied by some old lead-lined city piping systems may contain harmful levels of lead. Based on data presented by researchers, it appears that the distribution of lead content readings for individual water specimens has mean 0.033 mg/L and standard deviation 0.10 mg/L. Explain why it is obvious that the lead content readings are NOT normally distributed. (3 pts.)

8.) The file Failure Times.JMP contains times until failure for a sample of n = 88 radio transmitter-receivers. Using these data, answer the following questions.
a) Use the range to approximate the sample standard deviation ($s$) and compare it to the actual sample SD calculated from these data. Discuss. (3 pts.)
b) Construct a histogram, outlier boxplot, and CDF plot for these data. Discuss the distributional shape. (2 pts.)
c) Find the sample mean ($\bar{y}$) and SD ($s$), and find the range of failure times given by the intervals $\bar{y} \pm ks$ for $k = 1, 2,$ and $3$. Find the percentage of failure times in these intervals for the sample data and compare these percentages to those given by Chebyshev's (or Tchebysheff's) Theorem and the Empirical Rule. Discuss. (6 pts.)
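
Computational note: several problems above (2e, 6c, 8a, 8c) repeat the same steps: compute $\bar{y}$ and $s$, form the intervals $\bar{y} \pm ks$, count the fraction of observations inside each, and compare with Chebyshev's bound $1 - 1/k^2$ and the Empirical Rule. For anyone checking their JMP output by hand, the following is a minimal Python sketch of those steps, not a required method. The function names, the `sample` values (made up for illustration, not taken from either JMP file), and the range/4 divisor (a common rule of thumb; the course text may specify a different one) are all assumptions.

```python
import statistics

# Usual statement of the Empirical Rule for mound-shaped data (approximate).
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}


def chebyshev_bound(k):
    """Lower bound on the fraction within k SDs of the mean (k >= 1)."""
    return 1 - 1 / k**2


def interval_coverage(data, ks=(1, 2, 3)):
    """Return {k: (lower, upper, fraction)} for the open intervals ybar +/- k*s."""
    ybar = statistics.mean(data)
    s = statistics.stdev(data)  # sample SD, divisor n - 1
    results = {}
    for k in ks:
        lower, upper = ybar - k * s, ybar + k * s
        inside = sum(1 for y in data if lower < y < upper)
        results[k] = (lower, upper, inside / len(data))
    return results


def range_sd_estimate(data, divisor=4):
    """Rough SD approximation from the range; divisor=4 is an assumed rule of thumb."""
    return (max(data) - min(data)) / divisor


# Hypothetical sample, for illustration only (not from the assignment files).
sample = [1.2, 0.8, 2.5, 1.9, 0.4, 3.1, 1.1, 0.9, 7.6, 1.5]

coverage = interval_coverage(sample)
for k, (lower, upper, fraction) in coverage.items():
    print(
        f"k={k}: ({lower:.3f}, {upper:.3f}) "
        f"observed={fraction:.2f} "
        f"Chebyshev>={chebyshev_bound(k):.2f} "
        f"Empirical~{EMPIRICAL_RULE[k]}"
    )

print(f"s = {statistics.stdev(sample):.3f}, range/4 = {range_sd_estimate(sample):.3f}")
```

For a frequency table such as the one in problem 6, the counts can be expanded into a flat list first (for example, `[0] * 69 + [1] * 17 + ...`) and passed to the same function.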