Assignment 1

Assignment #1 – STAT 450
1.) For each of the following situations, identify the population of interest, the inferential
objective, and how you might go about collecting a sample. (3 pts. each)
a) A city engineer wants to estimate the average weekly water consumption for
single-family dwelling units in the city.
b) The National Highway Safety Council wants to estimate the proportion of
automobile tires with unsafe tread among all tires manufactured by a specific
company during the current production year.
c) A medical scientist wants to estimate the average length of time until the recurrence
of a certain disease.
2.) The file Stock Trade.JMP, linked from the assignment portion of my website, contains
data on the top 40 stocks on the over-the-counter (OTC) market, ranked by percentage
of outstanding shares traded on one particular day last year. (10 pts.)
a) Construct a histogram, boxplot, etc. to describe these data.
b) What proportion of these top 40 stocks traded more than 4% of the outstanding
shares?
c) If one of the stocks is selected at random from the 40 stocks, what would you
estimate is the probability that it will have traded fewer than 5% of its outstanding
shares?
d) Calculate $\bar{y}$ and $s^2$ (and $s$).
e) Calculate the interval $\bar{y} \pm ks$ for $k = 1, 2,$ and $3$. Count the number (and
percentage) of stocks that fall within each interval. How do these compare to what you
would expect from applying the Empirical Rule?
f) Notice there is one stock with a very high percentage of outstanding shares traded
(11.88%). Remove this value and redo parts (d) and (e). Compare your results to those
from parts (d) and (e) above.
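Parts (d) through (f) can be cross-checked outside JMP with a short script. The following is a minimal Python sketch, not the prescribed method for this course. The function name `coverage_table` and the placeholder values are hypothetical; the real percentages have to be exported from Stock Trade.JMP.

```python
import statistics

# Empirical Rule benchmarks for roughly mound-shaped data.
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

def coverage_table(values, ks=(1, 2, 3)):
    """Return {k: (lower, upper, count, fraction)} for the intervals ybar +/- k*s."""
    ybar = statistics.mean(values)
    s = statistics.stdev(values)  # sample SD, divisor n - 1
    table = {}
    for k in ks:
        lower, upper = ybar - k * s, ybar + k * s
        count = sum(lower <= y <= upper for y in values)
        table[k] = (lower, upper, count, count / len(values))
    return table

# Hypothetical placeholder values, NOT the Stock Trade.JMP data.
placeholder = [2.0, 3.0, 3.5, 4.0, 4.5, 5.0, 12.0]

for k, (lower, upper, count, frac) in coverage_table(placeholder).items():
    print(f"k={k}: ({lower:.2f}, {upper:.2f}) holds {count} values "
          f"({frac:.0%}); Empirical Rule suggests about {EMPIRICAL_RULE[k]:.1%}")
```

For part (f), the same function can be called again on the list with the 11.88 entry removed, so the two sets of intervals and percentages can be compared side by side.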
3.) The following results on summations will be useful in simplifying formulae such as that
for the sample variance ($s^2$). For any constant $c$, show: (2 pts. each)
a) $\sum_{i=1}^{n} c = nc$
b) $\sum_{i=1}^{n} c y_i = c \sum_{i=1}^{n} y_i$
c) $\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$
4.) Prove that the sum of the deviations of a set of measurements about their mean is equal
to zero, i.e. (3 pts.)
$$\sum_{i=1}^{n} (y_i - \bar{y}) = 0$$
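A numerical check is not a proof, but it can confirm the claim before the algebra is written out. The sketch below uses exact rational arithmetic so that rounding error cannot hide a nonzero sum. The function name `sum_of_deviations` and the sample values are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def sum_of_deviations(values):
    """Sum of (y_i - ybar), computed with exact rational arithmetic."""
    ys = [Fraction(v) for v in values]
    ybar = sum(ys) / len(ys)
    return sum(y - ybar for y in ys)

# Arbitrary illustrative measurements (hypothetical, not from the assignment).
print(sum_of_deviations([3, 7, 8, 15]))
print(sum_of_deviations([1, 2, 2, 10, 41]))
```

Both calls return zero for these inputs; the proof asked for above should explain why this happens for every data set.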
5.) CHALLENGING PROBLEM, BUT HINT SHOULD HELP
Let $k \ge 1$. Show that, for any set of $n$ measurements, the fraction included in the
interval $(\bar{y} - ks, \bar{y} + ks)$ is at least $1 - \frac{1}{k^2}$. (6 pts.)
Hint: $n s^2 \ge (n - 1) s^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i:|y_i - \bar{y}| \ge ks} (y_i - \bar{y})^2 + \sum_{i:|y_i - \bar{y}| < ks} (y_i - \bar{y})^2$
6.) A random sample of 100 foxes was examined by a team of veterinarians to determine the
prevalence of a specific parasite. Counting the number of parasites of this specific type,
the veterinarians found that 69 foxes had no parasites of the type of interest, 17 had one
parasite of the type under study, and so on. A summary of their results is given in the
following table: (6 pts.)

# of Parasites    0    1    2    3    4    5    6    7    8
# of Foxes       69   17    6    3    1    2    1    0    1

a) Draw an appropriate graphical display to see the distribution of the number of
parasites per fox.
b) Calculate $\bar{y}$ and $s$ for the data given.
c) What fraction of the parasite counts falls within 2 standard deviations of the mean?
Within 3 standard deviations of the mean? Do your results agree with
Chebyshev’s/Tchebysheff’s Theorem and/or the Empirical Rule?
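Because the data arrive as a frequency table, the mean and standard deviation have to weight each count by the number of foxes. The following is a minimal Python sketch of that calculation, using the frequencies from the table above. The names `foxes_by_count`, `grouped_mean_sd`, and `fraction_within` are hypothetical helpers, and the open interval follows the convention used in Problem 5.

```python
import math

# Frequency table from the problem: number of parasites -> number of foxes.
foxes_by_count = {0: 69, 1: 17, 2: 6, 3: 3, 4: 1, 5: 2, 6: 1, 7: 0, 8: 1}

def grouped_mean_sd(freq):
    """Sample mean and sample SD (divisor n - 1) from a value -> frequency map."""
    n = sum(freq.values())
    ybar = sum(y * f for y, f in freq.items()) / n
    ss = sum(f * (y - ybar) ** 2 for y, f in freq.items())
    return ybar, math.sqrt(ss / (n - 1))

def fraction_within(freq, k):
    """Fraction of observations strictly inside (ybar - k*s, ybar + k*s)."""
    n = sum(freq.values())
    ybar, s = grouped_mean_sd(freq)
    inside = sum(f for y, f in freq.items() if abs(y - ybar) < k * s)
    return inside / n

ybar, s = grouped_mean_sd(foxes_by_count)
print(f"ybar = {ybar:.2f}, s = {s:.3f}")
for k in (2, 3):
    observed = fraction_within(foxes_by_count, k)
    chebyshev = 1 - 1 / k**2  # lower bound from Tchebysheff's Theorem
    print(f"k={k}: observed {observed:.2f}, Tchebysheff bound {chebyshev:.3f}")
```

The printed fractions can then be set against the Empirical Rule values of roughly 95% and 99.7%, bearing in mind how skewed the counts are.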
7.) Studies indicate that drinking water supplied by some old lead-lined city piping systems
may contain harmful levels of lead. Based on data presented by researchers it appears
that the distribution of lead content readings for individual water specimens has mean
0.033 mg/L and standard deviation 0.10 mg/L. Explain why it is obvious that the lead
content readings are NOT normally distributed. (3 pts.)
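One way to make the argument concrete is to ask what a normal model with this mean and standard deviation would imply about negative readings. The sketch below is an illustration only; `normal_cdf` is a hypothetical helper built on the standard library's error function.

```python
import math

def normal_cdf(x, mean, sd):
    """P(Y <= x) for Y ~ Normal(mean, sd), via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

mean, sd = 0.033, 0.10  # mg/L, as stated in the problem

# Lead content cannot be negative, so a plausible model should put
# essentially no probability below zero.
p_negative = normal_cdf(0.0, mean, sd)
print(f"P(Y < 0) under a normal model: {p_negative:.3f}")
```

A large probability of an impossible negative reading is the kind of evidence the explanation can rest on.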
8.) The file Failure Times.JMP contains times until failure for a sample of n = 88 radio
transmitter-receivers. Using these data, answer the following questions.
a) Use the range to approximate the sample standard deviation (s) and compare it to
the actual sample SD calculated from these data. Discuss. (3 pts.)
b) Construct a histogram, outlier boxplot, and CDF plot for these data. Discuss the
distributional shape. (2 pts.)
c) Find the sample mean ($\bar{y}$) and SD (s), and find the range of failure times given by
the intervals $\bar{y} \pm ks$ for $k = 1, 2,$ and $3$. Find the percentage of failure times in
these intervals for the sample data and compare these percentages to those given by
Chebyshev's (or Tchebysheff's) Theorem and the Empirical Rule. Discuss. (6 pts.)
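The range approximation in part (a) can be sketched in a few lines of Python. The divisor of 4 is an assumption based on the common rule of thumb that the range spans about four standard deviations, so use whatever divisor the course notes specify. The function name `range_sd_estimate` and the placeholder times are hypothetical and are not the Failure Times.JMP data.

```python
import statistics

def range_sd_estimate(values, divisor=4):
    """Approximate the SD as range / divisor (rule of thumb; divisor is an assumption)."""
    return (max(values) - min(values)) / divisor

# Hypothetical placeholder times, NOT the Failure Times.JMP data.
placeholder_times = [16, 40, 72, 136, 224, 232, 400, 552]

approx = range_sd_estimate(placeholder_times)
actual = statistics.stdev(placeholder_times)  # sample SD, divisor n - 1
print(f"range/4 = {approx:.1f}, sample SD = {actual:.1f}")
```

How closely the two numbers agree depends on the sample size and on the shape of the distribution, which is what the discussion in part (a) should address.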