BIOL 933
Fall 2015
Problem Set 1
Topics 1 & 2
Due Thursday, September 17, at the beginning of lecture. Answer all parts of the questions
completely, and clearly document the procedures used in each exercise. Please refer to the "Homework
Tips" on the course website. R can generate prolific output; it is your job to wade through it and submit
only that which is relevant to the question being asked.
Quantity ≠ Quality
Question 1 (now)
Fundamental statistical concepts
The following six samples of 10 observations each were randomly drawn from a population with mean =
40 and variance = 100.
Sample 1  Sample 2  Sample 3  Sample 4  Sample 5  Sample 6
      38        37        42        39        49        39
      20        24        46        45        36        40
      22        27        45        31        33        50
      50        25        40        42        50        30
      46        38        50        21        42        47
      25        58        43        60        42        72
      45        50        45        48        46        60
      40        32        29        26        33        34
      39        57        38        51        44        47
      43        38        51        38        25        34
1.1 What is the mean, standard deviation, coefficient of variation (CV), and standard error of the mean
(SE) for each sample? [You can do this with a calculator or with Excel, but I encourage/challenge you to
write and use an R script for this!] Show one sample calculation for each of the four requested statistics.
1.2 Add 15 to each value in the data set and determine the effect on the mean, standard deviation, CV,
and SE for each sample.
1.3 Now consider the six means as a sample of six values, and calculate the overall mean and standard
deviation of this new dataset. Comment on the relationship of the mean of the six means with both the
overall mean of the 60 observations and the theoretical (i.e. given) mean for the population. Comment on
the relationship of the standard deviation of the six means with the theoretical SE for the population (i.e.
the SE based on the given population parameters and the given sample sizes).
1.4 Explain in your own words why the variation among the six sample means is smaller than the
average sample variation.
1.5 The following table was derived by subtracting Sample 1 from Sample 2 (S2-S1), Sample 3 from
Sample 4 (S4-S3), and Sample 5 from Sample 6 (S6-S5):
S2-S1  S4-S3  S6-S5
   -1     -3    -10
    4     -1      4
    5    -14     17
  -25      2    -20
   -8    -29      5
   33     17     30
    5      3     14
   -8     -3      1
   18     13      3
   -5    -13      9
Calculate the mean and variance for each of these three derived samples, and then find the averages of
these statistics. Comment on how these averages compare with what you would expect. [Phrased another
way: What mean and variance do you expect when you subtract one random variable from another?]
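The calculations in Question 1 can be scripted along the following lines. This is a minimal sketch, not a model answer: the helper name `describe` is made up for illustration, only the Sample 1 column is entered, and the two simulations use freshly generated random numbers (arbitrary seed, 10,000 draws) rather than the homework data. The simulations are there only to let you check your expectations for 1.3 and 1.5 against a large number of samples.

```r
# Sample 1, copied from the first column of the data table.
sample1 <- c(38, 20, 22, 50, 46, 25, 45, 40, 39, 43)

# Hypothetical helper: the four statistics requested in 1.1 for one sample.
describe <- function(y) {
  n <- length(y)
  m <- mean(y)
  s <- sd(y)                          # sample SD (n - 1 in the denominator)
  c(mean = m, sd = s, cv = 100 * s / m, se = s / sqrt(n))
}

describe(sample1)        # statistics for the original values
describe(sample1 + 15)   # same sample after adding a constant (1.2)

# Simulation 1 (assumed setup, not the homework data): many samples of
# n = 10 drawn from a normal population with mean 40 and variance 100.
set.seed(933)            # arbitrary seed, for reproducibility only
n_sims <- 10000
sim_means <- replicate(n_sims, mean(rnorm(10, mean = 40, sd = 10)))
sd(sim_means)            # empirical SD of the sample means
10 / sqrt(10)            # theoretical SE, sigma / sqrt(n)

# Simulation 2: difference between two independent draws from that population.
x <- rnorm(n_sims, mean = 40, sd = 10)
y <- rnorm(n_sims, mean = 40, sd = 10)
d <- y - x
c(mean = mean(d), var = var(d))       # compare with your expectation in 1.5
```

Note that `rnorm` takes a standard deviation, not a variance, so a variance of 100 is entered as `sd = 10`.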
Question 2 (Tues, 9/8)
Distributions and probabilities
For these questions, use R to find the exact probabilities associated with critical values and vice-versa.
[Hint: With questions like these, it is very helpful to draw a figure first...see the reading from Lecture 1
for examples.]
2.1 Find Z0 such that P(Z0 ≤ Z ≤ 1.66) = 0.200.
2.2 Given a normal distribution Y with mean = -12 and variance = 16, find Y0 such that P(Y ≤ Y0) =
0.02.
2.3 Y is normally distributed with mean = 15 and variance = 9. For a random sample of 10 observations,
find Y0 such that P(Y  Y0 )  0.72 .
2.4 The mean weight of a Conserviola olive is 1.9 g, with a sample variance of 0.4 g² (population
variance unknown). What is the probability that a bag of 18 randomly-picked Conserviola olives has an
average per-olive weight of more than 2.0 g?
2.5 Given a t distribution, find t0 such that 60% of the values are within the (-t0,t0) interval for df = 24.
2.6 Describe in words the relationship between the Z and t distributions. Provide a numerical example
to illustrate this relationship.
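The R functions most relevant to Question 2 are `pnorm`/`qnorm` for the normal distribution and `pt`/`qt` for the t distribution. The sketch below shows how they are called. Every number in it (1.96, a mean of 50, an SD of 5, n = 25, df = 10, and so on) is an illustrative value chosen for this example, not one of the values asked for above.

```r
# Illustrative values only; none of these are the answers to 2.1 - 2.6.

# Probability from a critical value (standard normal).
pnorm(1.96)                          # P(Z <= 1.96)
pnorm(1.96) - pnorm(-1.96)           # P(-1.96 <= Z <= 1.96)

# Critical value from a probability.
z_crit <- qnorm(0.975)               # z with 97.5% of the area below it
qnorm(0.10, mean = 50, sd = 5)       # Y0 with P(Y <= Y0) = 0.10; sd, not variance

# For the mean of a sample of size n, the SD becomes sigma / sqrt(n).
qnorm(0.10, mean = 50, sd = 5 / sqrt(25))

# The same two directions for a t distribution.
t_10 <- qt(0.975, df = 10)                # two-tailed 5% critical value, df = 10
pt(2.0, df = 10, lower.tail = FALSE)      # P(t > 2.0), upper tail

# Critical t values for increasing df, next to the normal critical value.
t_crit <- sapply(c(5, 30, 1000), function(df) qt(0.975, df))
t_crit
z_crit
```

When a probability refers to an interval, as in 2.1 and 2.5, it usually helps to sketch the curve first and decide which tail areas `pnorm` or `pt` must return before writing any code.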
Question 3 (Thurs, 9/10)
t-test; independent observations
To test the lateral effect of a new sprayed herbicide on insect diversity, a researcher selected 32 fields and
randomly assigned them to two treatments. One week after application, she examined five random 1 m²
sections in each field and calculated the average number of insect species present. Her data:
Field  Control  Sprayed
    1      4.6      6.4
    2      2.8      4.3
    3      6.4      2.9
    4      4.7      4.3
    5      4.8      2.4
    6      6.8      4.4
    7      1.1      2.8
    8      7.1      1.6
    9      9.5      0.6
   10     11.0      4.9
   11      9.1      4.0
   12      3.8      5.0
   13      6.6      5.7
   14      3.9      1.5
   15      5.8      3.3
   16      4.8      0.2
Answer the following questions using R.
3.1 For each sample (i.e. treatment), test normality using the Shapiro-Wilk test and present a
Quantile-Quantile plot. Comment.
3.2 What is the probability that these two samples are different just by chance?
3.3 Calculate the power of the test with R. Confirm the R result with a hand calculation of the power;
show formulas and intermediate steps.
3.4 How many replications would be required to detect a significant difference between these two
groups with a power of 95% and α = 0.01? Refer to lecture notes section 2.4.4; you can solve either by
hand or using R.
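One possible R workflow for Question 3 is sketched below. It rests on several assumptions that you should check against the lecture notes: the t-test is run with equal variances (`var.equal = TRUE`), the power calculation uses the observed difference between means and a pooled standard deviation as if they were the true values, and `power.t.test` is used as a stand-in for the formula in section 2.4.4, so its answer may differ slightly from a hand calculation.

```r
# Data from the table above (16 fields per treatment).
control <- c(4.6, 2.8, 6.4, 4.7, 4.8, 6.8, 1.1, 7.1,
             9.5, 11.0, 9.1, 3.8, 6.6, 3.9, 5.8, 4.8)
sprayed <- c(6.4, 4.3, 2.9, 4.3, 2.4, 4.4, 2.8, 1.6,
             0.6, 4.9, 4.0, 5.0, 5.7, 1.5, 3.3, 0.2)

# 3.1: Shapiro-Wilk test for each treatment.
sw_control <- shapiro.test(control)
sw_sprayed <- shapiro.test(sprayed)

# Q-Q plots; drawn only in an interactive session.
if (interactive()) {
  qqnorm(control, main = "Control"); qqline(control)
  qqnorm(sprayed, main = "Sprayed"); qqline(sprayed)
}

# 3.2: two-sample t-test for independent observations.
# var.equal = TRUE is an assumption made for this sketch.
tt <- t.test(control, sprayed, var.equal = TRUE)
tt$p.value

# 3.3: power, treating the observed difference and the pooled SD
# (equal n per group) as the true values.
delta_obs <- abs(mean(control) - mean(sprayed))
sd_pooled <- sqrt((var(control) + var(sprayed)) / 2)
pw <- power.t.test(n = 16, delta = delta_obs, sd = sd_pooled,
                   sig.level = 0.05, type = "two.sample")
pw$power

# 3.4: replications per treatment for power = 0.95 at alpha = 0.01.
ss <- power.t.test(delta = delta_obs, sd = sd_pooled,
                   sig.level = 0.01, power = 0.95, type = "two.sample")
ceiling(ss$n)
```

The script prints a good deal of output; only the statistics that answer each part need to be reported.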
Question 4 (Tues, 9/8)
Power and sample size
4.1 How many baby chipmunks must you weigh to achieve 90% confidence that their average weight
deviates no more than 1% from the true mean weight of the entire newborn chipmunk population?
Assume a CV of less than 5%.
4.2 Prepare a graph showing the number of replicates required to detect significant differences between
means that are ¼, ½, ¾, 1, 1¼, 1½, 1¾, and 2 standard deviations apart, with α = 0.05 and Power = 0.80.
[You can do this by hand, with Excel, or with R (if you're feeling brave)...whatever works for you].
4.3 Researchers want to determine whether over-irrigation decreases antioxidant levels in kiwis. They
conclude that an irrigation rate of 15% over the ET rate does not significantly decrease antioxidant levels
at α = 0.05. Assume the true mean of the over-irrigated treatment is 1.4 standard deviations less than the
untreated mean and that there were 5 replications per treatment. Assuming a one-tailed test, does the
experiment have adequate power (i.e. > 80%) to support the researchers' claim?
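If you choose R for 4.2 and 4.3, a sketch such as the following may be a useful starting point. It assumes a two-sample, two-tailed t-test for 4.2 and expresses the differences in units of one standard deviation (hence `sd = 1`). It relies on `power.t.test`, which may not match the lecture formula exactly because it iterates on the t distribution, so treat its results as a cross-check on your own calculation.

```r
# 4.2: replicates per treatment for differences of 1/4 to 2 SD,
# alpha = 0.05, power = 0.80 (two-sample, two-tailed test assumed).
deltas <- seq(0.25, 2, by = 0.25)
n_exact <- sapply(deltas, function(d) {
  power.t.test(delta = d, sd = 1, sig.level = 0.05, power = 0.80)$n
})
n_required <- ceiling(n_exact)        # round up to whole replicates
data.frame(difference_in_sd = deltas, n_per_group = n_required)

# Plot; drawn only in an interactive session.
if (interactive()) {
  plot(deltas, n_required, type = "b", log = "y",
       xlab = "Difference between means (SD units)",
       ylab = "Replicates per treatment (log scale)")
}

# 4.3: power of a one-tailed test with 5 replications per treatment
# and a true difference of 1.4 SD.
p43 <- power.t.test(n = 5, delta = 1.4, sd = 1, sig.level = 0.05,
                    type = "two.sample", alternative = "one.sided")
p43$power                             # compare with the 80% threshold
```

A logarithmic y-axis is used in the plot because the required number of replicates rises steeply as the difference to be detected shrinks.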
Question 5
t-test; paired observations
A researcher sets out to study the effect of a vegetarian diet on the level of triglycerides in the blood. To
do this, he measures cholesterol levels in 15 volunteers before and after two months on a strict vegetarian
diet. The data:
Volunteer  Before  After
        1   252.7  209.6
        2   215.6  256.4
        3   214.5  169.9
        4   249.9  228.1
        5   219.6  223.9
        6   232.6  234.3
        7   267.3  217.3
        8   223.7  201.4
        9   258.0  214.5
       10   228.0  200.0
       11   226.9  218.2
       12   232.3  216.2
       13   242.6  207.9
       14   271.7  220.8
       15   232.4  224.2
Since both measures were taken from the same individual, they are not independent (i.e. these are paired
comparisons).
5.1 (Tues, 9/8) Use R to calculate the mean and standard deviation of each treatment and of their
differences (Before - After); comment on the relative sizes of these statistics.
5.2 (Thurs, 9/10) Use R to determine the power of this two-tailed t-test (use α = 5%). Now create a
new variable EFFECT = BEFORE - AFTER, and calculate the power of the test for that new variable (H0:
μ = 0; α = 5%). Compare the powers of the two analyses and comment. Finally, did the treatment affect
the level of saturated lipids?
5.3 (Tues, 9/8) Prepare a graphical depiction of the power of the test for variable EFFECT by hand,
using Figure 3 in lecture topic 2 as a template. Use the numerical values from the previous analysis of
variable EFFECT (H0: μ = 0; H1: μ = 21.673), and assume normal distributions about these means.
Indicate the areas corresponding to α, β, and power.
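For Question 5, the two analyses can be set up in R roughly as follows. This is a sketch under stated assumptions: the "independent" analysis treats Before and After as two unrelated samples with a pooled standard deviation, the EFFECT analysis is a one-sample test of H0: μ = 0, and in both cases the observed statistics are plugged into `power.t.test` as if they were the true values.

```r
# Data from the table above (15 volunteers).
before <- c(252.7, 215.6, 214.5, 249.9, 219.6, 232.6, 267.3, 223.7,
            258.0, 228.0, 226.9, 232.3, 242.6, 271.7, 232.4)
after  <- c(209.6, 256.4, 169.9, 228.1, 223.9, 234.3, 217.3, 201.4,
            214.5, 200.0, 218.2, 216.2, 207.9, 220.8, 224.2)

# 5.1: means and SDs of each treatment and of the differences.
effect <- before - after
rbind(before = c(mean = mean(before), sd = sd(before)),
      after  = c(mean = mean(after),  sd = sd(after)),
      effect = c(mean = mean(effect), sd = sd(effect)))

# 5.2a: power when the two columns are treated as independent samples
# (pooled SD for equal n; an assumption of this sketch).
sd_pooled <- sqrt((var(before) + var(after)) / 2)
power_indep <- power.t.test(n = 15, delta = mean(before) - mean(after),
                            sd = sd_pooled, sig.level = 0.05,
                            type = "two.sample")

# 5.2b: power for EFFECT as a single sample tested against mu = 0.
power_effect <- power.t.test(n = 15, delta = mean(effect),
                             sd = sd(effect), sig.level = 0.05,
                             type = "one.sample")

c(independent = power_indep$power, effect = power_effect$power)

# Paired t-test on the original columns (equivalent to testing EFFECT).
paired_test <- t.test(before, after, paired = TRUE)
paired_test$p.value
```

The mean of `effect` computed here should agree with the H1 value quoted in 5.3, which is a quick way to confirm that the data were entered correctly.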