biol.283.s2013.lab4.exercise

BIOL 283 Lab 4: Sampling Distributions
Lab Objectives:
1. Understand the process for determining whether data are normally distributed.
2. Produce normal probability (normal quantile) plots.
3. Understand how to generate sampling distributions, both empirically and theoretically.
4. Understand how sample size affects sampling distributions.
5. Get a feel for the R language.
6. Develop a resourceful attitude.
This lab will require a little more teamwork. At the beginning of the lab, team members will
collect data on a population of C. ellipticus, as found on page 25 in your textbook, and
described in problems 1.3.5 and 5.2.1. There are 100 values to collect. This task will be easiest
if responsibilities are divided. For example, four people can each collect 25 values and share
them with all members of the group. Alternatively, a group of four can split into two subgroups
of two, and within the subgroups, one person can make measurements and the other can serve
as a scribe as values are called out. It is up to you how to divide the responsibilities. Keep in
mind that doing this lab alone might increase the amount of time needed.
Part I. Defining a population.
For every ellipse (i.e., the body size and shape of individual C. ellipticus), measure the greatest
length to the nearest millimeter (N = 100). Exclude bristles from measurement. Record below.
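00 ____    20 ____    40 ____    60 ____    80 ____
01 ____    21 ____    41 ____    61 ____    81 ____
02 ____    22 ____    42 ____    62 ____    82 ____
03 ____    23 ____    43 ____    63 ____    83 ____
04 ____    24 ____    44 ____    64 ____    84 ____
05 ____    25 ____    45 ____    65 ____    85 ____
06 ____    26 ____    46 ____    66 ____    86 ____
07 ____    27 ____    47 ____    67 ____    87 ____
08 ____    28 ____    48 ____    68 ____    88 ____
09 ____    29 ____    49 ____    69 ____    89 ____
10 ____    30 ____    50 ____    70 ____    90 ____
11 ____    31 ____    51 ____    71 ____    91 ____
12 ____    32 ____    52 ____    72 ____    92 ____
13 ____    33 ____    53 ____    73 ____    93 ____
14 ____    34 ____    54 ____    74 ____    94 ____
15 ____    35 ____    55 ____    75 ____    95 ____
16 ____    36 ____    56 ____    76 ____    96 ____
17 ____    37 ____    57 ____    77 ____    97 ____
18 ____    38 ____    58 ____    78 ____    98 ____
19 ____    39 ____    59 ____    79 ____    99 ____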
Now create a variable in R for the whole population. This can be done as, for example,
Population = c(y1, y2, y3, …, yN)
where the values, yi, are data for the variable, Y, which is ellipse length in mm. Make sure you
enter them in the exact order of values presented in the table! Note that in R, one command can
span multiple lines, which might make it easier to input data. For example, one can do the
following (values are hypothetical):
Population = c(
5, 6, 5, 12, 4, 3, 9, 10, 11, 8,
4, 9, 8, 6, 7, 12, 11, 4, 7, 9,
….
….
….)
Thus, one can do 10 lines of 10 values, or 5 lines of 20 values, or any combination that makes it
easier to know if you have input all 100 values.
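One way to confirm that nothing was skipped or typed twice is to ask R how many values it stored. The sketch below uses only the two hypothetical lines of values shown above, so it reports 20 rather than 100; with your real data, the count should be 100.

```r
# Hypothetical values only; your real data come from your own measurements.
Population = c(
  5, 6, 5, 12, 4, 3, 9, 10, 11, 8,
  4, 9, 8, 6, 7, 12, 11, 4, 7, 9
)

# Number of values stored: 20 here, and it should be 100 for the full population.
length(Population)

# Spot-check the first value on each line of 10 against the table: 5 and 4 here.
Population[c(1, 11)]
```

If length( ) does not return the number you expect, compare each line of the command against the matching values in the table.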
Part II. Collecting a sample from the population.
To sample from the population in R, one can use the sample( ) function. Within the
parentheses, add the variable name first, then a comma, then the sample size. For example,
sample(Population, 20)
draws a random sample of 20 subjects from the population (Note that Population means the
name you gave the population). Note that failure to add sample size means that a random
sample of N will be drawn, which effectively just mixes up the values of the population. It is
wise to give the sample a name, so that you can refer to the data later.
For example,
s.20 = sample(Population, 20)
s.15 = sample(Population, 15)
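The behavior described above can be checked with a short sketch. The population here is a hypothetical stand-in of 100 values, not the C. ellipticus data, and the call to set.seed( ) is an optional addition (not required by the lab) that makes the random draw repeatable.

```r
# Hypothetical stand-in for the 100 measured lengths; use your own Population.
Population = rep(3:12, times = 10)

set.seed(42)                      # arbitrary seed, only so the draw can be repeated

s.20 = sample(Population, 20)     # random sample of 20, drawn without replacement
length(s.20)                      # 20

shuffled = sample(Population)     # no sample size given: all N values, reordered
length(shuffled)                  # 100
identical(sort(shuffled), sort(Population))   # TRUE: same values, different order
```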
Randomly sample 10 subjects from the population, and give the sample a name.
What did R provide as output? Do you know which subjects were
chosen? Do you know their lengths?
Now write down the 10 values in increasing order.
i:                      1     2     3     4     5     6     7     8     9     10
Value:

Now calculate the percentiles for each value (refer to text or notes).
i:                      1     2     3     4     5     6     7     8     9     10
Percentile:

Now calculate the adjusted percentiles (refer to text or notes).
i:                      1     2     3     4     5     6     7     8     9     10
Adjusted Percentile:

Now find the standard deviates for each score, assuming a standard normal distribution.
i:                      1     2     3     4     5     6     7     8     9     10
z:
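If you want to check your hand calculations, the four rows above can be sketched in R. This is only a sketch under stated assumptions: the sample values are hypothetical, and the formulas 100(i/n) for the percentile and 100(i - 1/2)/n for the adjusted percentile are assumed here. Confirm them against your text or notes before relying on the output.

```r
# Hypothetical sample of 10 lengths (mm); replace with your own named sample.
s.10 = c(7, 4, 9, 12, 5, 8, 6, 11, 10, 5)

n = length(s.10)
i = 1:n

value = sort(s.10)                        # the 10 values in increasing order
percentile = 100 * i / n                  # ASSUMED definition: 100 * (i / n)
adj.percentile = 100 * (i - 0.5) / n      # ASSUMED adjustment: 100 * (i - 1/2) / n
z = qnorm(adj.percentile / 100)           # standard normal deviate for each adjusted percentile

# One row per ordered value, mirroring the rows filled in by hand.
data.frame(i, value, percentile, adj.percentile, z)
```

Under the assumed definitions, the unadjusted percentile of the largest value is 100, for which the standard normal deviate is infinite. The adjustment avoids that problem.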
Finally, make a normal quantile plot by plotting the observed values (y-axis) versus the
quantiles, a. k. a. standard (normal) deviates (x-axis). You can do this by hand, or if you are
savvy, you might try it in R. If you choose the latter, just delete the box below and add a graph
that you made in R. (Make sure to label axes!)
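For those who try it in R, one possible approach uses the plot( ) function with the deviates on the x-axis and the ordered values on the y-axis. The sample values below are hypothetical, and the adjusted percentile formula (i - 1/2)/n is an assumption to be checked against your text or notes.

```r
# Hypothetical sample of 10 lengths (mm); replace with your own named sample.
s.10 = c(7, 4, 9, 12, 5, 8, 6, 11, 10, 5)

n = length(s.10)
value = sort(s.10)                   # observed values, smallest to largest
z = qnorm(((1:n) - 0.5) / n)         # ASSUMED adjusted percentiles, converted to z

# Observed values (y-axis) versus standard normal deviates (x-axis).
plot(z, value,
     xlab = "Standard normal deviate (z)",
     ylab = "Observed length (mm)",
     main = "Normal quantile plot, sample of 10")
```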
Do the data from the sample pass the “fat pencil” test? I.e., are they normally distributed? If
not, what can you say about the distribution? (Feel free to use the hist( ) or boxplot( ) functions
in R to gain a better understanding of distributional shape. This is why it was a good idea to
give the sample a name.)
Provide a comment about your assessment of “normality” from the data.
R has a short-cut for the analysis you just did. Simply use the function, qqnorm(sample), where
sample means your sample name.
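A minimal sketch of the short-cut follows, again with hypothetical sample values. The qqline( ) call is an optional extra that adds a reference line. qqnorm( ) also returns the plotted coordinates invisibly, so saving its result lets you compare R's deviates with the z values you found by hand.

```r
# Hypothetical sample of 10 lengths (mm); replace with your own named sample.
s.10 = c(7, 4, 9, 12, 5, 8, 6, 11, 10, 5)

# Draw the normal quantile plot and keep the coordinates that were plotted.
qq = qqnorm(s.10,
            xlab = "Standard normal deviate (z)",
            ylab = "Observed length (mm)")
qqline(s.10)          # optional reference line through the quartiles

# qq$x holds the deviates R used and qq$y the data, in the original sample order.
# Sorting both puts them in the same order as a hand-made table.
data.frame(value = sort(qq$y), z = sort(qq$x))
```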
Did you get the same result using the built-in qqnorm( ) function? Explain.
Part III. Creating a sampling distribution.
Before creating a sampling distribution, let’s take a look at the population. Using skills you have
learned in this and prior labs, find the population mean and standard deviation, and comment
on the shape of the distribution (using any plotting options you choose). Note: if you use the
function sd( ), the standard deviation will be wrong, as sd( ) calculates the sample standard
deviation. There are several ways to figure out the population standard deviation. Try to find
your own and check with the instructor to make sure you did it right.
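The claim about sd( ) can be verified directly. The sketch below, using hypothetical values, reproduces the result of sd( ) by hand with a divisor of n - 1, which is the sample formula. It deliberately stops short of the population calculation, which is left for you to work out.

```r
# Hypothetical values; any numeric vector works for this check.
x = c(5, 6, 5, 12, 4, 3, 9, 10, 11, 8)
n = length(x)

# Sample standard deviation by hand: sum of squared deviations divided by n - 1.
by.hand = sqrt(sum((x - mean(x))^2) / (n - 1))

all.equal(sd(x), by.hand)    # TRUE: sd( ) uses the n - 1 divisor
```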
Population parameters:
μ:
σ:
Comment on distribution of lengths for the population. Use graphs if you like. Also comment on
how you found the population parameters above.
Using the sample of 10 you drew before, record the sample mean and standard deviation in
the table below. Then repeat the original sampling procedure 9 more times to produce a total
of 10 sample means and standard deviations, and add those to the table. In each iteration,
calculate the mean and standard deviation of the sample means from all iterations up to that
point. For example, in iteration 6, calculate the mean and standard deviation of the 6 sample
means from the 6 iterations; in iteration 7, do the same for 7 values; etc.
Iteration              ȳ        s        Cumulative μ_Ȳ            Cumulative σ_Ȳ
                                         (Calculate each time)     (Calculate each time)
1 (original sample)
2
3
4
5
6
7
8
9
10
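The bookkeeping for this table can be sketched in R as follows. The population is a hypothetical stand-in, the seed is arbitrary, and the variable names (means, sds, cum.mean, cum.sd) are illustrative choices. In the lab, iteration 1 should be the sample you already drew rather than a fresh one.

```r
# Hypothetical stand-in for the 100 measured lengths; use your own Population.
Population = rep(3:12, times = 10)

set.seed(1)                         # arbitrary seed, only so the run can be repeated
n.iter = 10
means = numeric(n.iter)             # sample mean from each iteration
sds = numeric(n.iter)               # sample standard deviation from each iteration

for (k in 1:n.iter) {
  s = sample(Population, 10)        # a new random sample of 10 each iteration
  means[k] = mean(s)
  sds[k] = sd(s)
}

# Mean and standard deviation of the sample means up to each iteration.
cum.mean = cumsum(means) / seq_along(means)
cum.sd = sapply(seq_along(means), function(k) sd(means[1:k]))   # NA when k = 1

data.frame(iteration = 1:n.iter, ybar = means, s = sds, cum.mean, cum.sd)
```

The cumulative standard deviation is NA in iteration 1 because a standard deviation cannot be computed from a single sample mean.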
Now compare μ_Ȳ and σ_Ȳ to μ and σ, respectively. What appears to be the relationship
between the population and sampling distribution parameters? How might sample size
contribute to your interpretation?
Compare population and sampling distribution parameters. Are your results consistent with
what you expect?
Part IV. Determining the effect of sample size
Instead of repeating the previous procedure many times with many sample sizes, download the
companion R script and use it to determine how sample size affects the sampling distribution of
Ȳ. Use the script to fill in the table below, along with your own calculations. Then comment
on how the “Law of Large Numbers” and the “Central Limit Theorem” apply to this exercise, as
well as what the resampling experiment demonstrates.
        10 permutations    50 permutations    100 permutations   500 permutations   1000 permutations
n       μ_Ȳ      σ_Ȳ       μ_Ȳ      σ_Ȳ       μ_Ȳ      σ_Ȳ       μ_Ȳ      σ_Ȳ       μ_Ȳ      σ_Ȳ        σ/√n
5
10
20
40

You can copy and paste plots at the end of this lab exercise, if it helps you remember the
output for the future.
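The companion R script is not reproduced in this handout. The following is only a hypothetical sketch of how such a resampling experiment might be set up, not the script itself. The population is a stand-in, the seed is arbitrary, and sampling.dist( ) is an invented helper name.

```r
# Hypothetical stand-in for the 100 measured lengths; use your own Population.
Population = rep(3:12, times = 10)

# Invented helper: draw `reps` random samples of size n and return their means.
sampling.dist = function(pop, n, reps) {
  replicate(reps, mean(sample(pop, n)))
}

set.seed(283)                                   # arbitrary seed for repeatability

# Every combination of sample size and number of permutations in the table.
results = expand.grid(n = c(5, 10, 20, 40),
                      reps = c(10, 50, 100, 500, 1000))
results$mean.ybar = NA
results$sd.ybar = NA

for (j in seq_len(nrow(results))) {
  ybar = sampling.dist(Population, results$n[j], results$reps[j])
  results$mean.ybar[j] = mean(ybar)             # estimate of the mean of the sample means
  results$sd.ybar[j] = sd(ybar)                 # estimate of their standard deviation
}

results
```

A call such as hist(sampling.dist(Population, 20, 1000)) would show the shape of one of these simulated sampling distributions.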
CHALLENGE:
At which points in your simulation exercise did you find accurate results with respect to theory?
What does this tell you about the “Law of Large Numbers” and the “Central Limit Theorem”?