Describing Samples Based on Chapter 3 of Gotelli & Ellison (2004)

Describing Samples
Based on Chapter 3 of Gotelli & Ellison (2004)
and Chapter 4 of D. Heath (1995). An Introduction
to Experimental Design and Statistics for Biology.
CRC Press.
• The basic output of any scientific investigation is
a collection of observations, or data. (Ex. If Y is a
random variable, then we use Y_i to denote the
ith observation in our sample.)
• Often, we will use our sample data to estimate
unknown population parameters. (Ex. We can
use the sample mean, Ȳ, to estimate the
population mean, μ.)
• The construction of frequency distributions is
usually the first step in summarizing data
Hypericum cumulicola:
• Small, short-lived perennial herb
• Narrowly endemic and endangered
• Flowers are small and bisexual
Histogram of plant height (1995)
Measures of location
• It is useful to identify a “typical value” to
summarize our observations (i.e., an
“average”)
• Examples include:
1. Mean
2. Median
3. Mode
The Arithmetic Mean
The arithmetic mean (or simply the
mean) of a list of numbers is the sum of
all the observations (Y_i) in the list divided
by the number of observations (n):
\[ \bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n} \]
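The definition above can be sketched in a few lines of Python. This is only an illustration: the function name `arithmetic_mean` and the height values are made up for the example and are not the Hypericum data from the slides.

```python
def arithmetic_mean(observations):
    """Return the sum of the observations divided by their count n."""
    n = len(observations)
    if n == 0:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(observations) / n


# Hypothetical plant heights in cm (illustrative values only).
heights_cm = [12.0, 15.5, 9.0, 20.5, 18.0]
print(arithmetic_mean(heights_cm))  # 75.0 / 5 = 15.0
```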
The Arithmetic Mean
• Remember the formula for the expected
value of a discrete random variable?
\[ E(Y) = \sum_{i=1}^{n} Y_i p_i \]
• Since we assume, for our sample, that the
Y_i are the values of a random variable and
that p_i = 1/n for all Y_i, we get:
\[ E(Y) = \sum_{i=1}^{n} Y_i (1/n) = \frac{1}{n} \sum_{i=1}^{n} Y_i = \bar{Y} \]
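As a quick numerical check of this identity, the sketch below computes the expected value with equal weights p_i = 1/n and compares it with the ordinary sample mean. The helper `expected_value` and the sample values are hypothetical, chosen only for illustration.

```python
import math


def expected_value(values, probabilities):
    """Expected value of a discrete random variable: sum of Y_i * p_i."""
    if len(values) != len(probabilities):
        raise ValueError("each value needs exactly one probability")
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(y * p for y, p in zip(values, probabilities))


# Hypothetical sample (illustrative values only).
sample = [2.0, 4.0, 4.0, 10.0]
n = len(sample)

# Every observation gets the same weight, p_i = 1/n.
equal_weights = [1 / n] * n

ev = expected_value(sample, equal_weights)
sample_mean = sum(sample) / n
print(ev, sample_mean)  # both are 5.0
```

With unequal weights the two quantities would generally differ, which is why the equal-probability assumption matters here.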
The Arithmetic Mean
• The arithmetic mean of the observations in
our sample (Ȳ) is an unbiased estimator
of the population mean (μ) if 3 conditions
are met:
1. Observations are made on randomly selected
individuals
2. Observations in the sample are independent
3. Observations are drawn from a larger
population that is distributed as a normal
random variable
The Law of Large Numbers
• As the sample size n increases, the
arithmetic mean of Y_i approaches the
expected value of Y
\[ \lim_{n \to \infty} \bar{Y}_n = \lim_{n \to \infty} \frac{\sum_{i=1}^{n} Y_i}{n} = E(Y) \]
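A small simulation can make this convergence visible. The sketch below assumes a fair six-sided die as the random variable (so E(Y) = 3.5); the die, the seed, and the helper name `running_means` are choices made for this example, not part of the original slides.

```python
import random


def running_means(draws):
    """Return the mean of the first 1, 2, ..., n draws."""
    total = 0.0
    means = []
    for i, y in enumerate(draws, start=1):
        total += y
        means.append(total / i)
    return means


# Assumption: Y is the outcome of a fair die, so E(Y) = 3.5.
rng = random.Random(42)  # fixed seed so the run is repeatable
draws = [rng.randint(1, 6) for _ in range(100_000)]
means = running_means(draws)

# The running mean should drift toward 3.5 as n grows.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, means[n - 1])
```

Small samples can land noticeably away from 3.5; the agreement tightens only as n becomes large.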






The Median
• The value of a set of ordered observations
that has an equal number of observations
above and below it.
The Median
• Estimation:
– For an odd number of observations, the
median is the middle observation of the set.
– Ex. Median of {1, 2, 3, 4, 5} = 3
– For an even number of observations, the
median is the average of the two middle
observations of the set.
– Ex. Median of {1, 2, 3, 4, 5, 6} = (3+4)/2 = 3.5
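The odd and even cases above translate directly into code. This is a minimal sketch with a hypothetical function name; Python's standard library offers `statistics.median` for the same purpose.

```python
def median(observations):
    """Middle value of the ordered observations (mean of the two middle values if n is even)."""
    ordered = sorted(observations)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: a single middle observation.
        return ordered[mid]
    # Even n: average the two middle observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1, 2, 3, 4, 5]))     # 3
print(median([1, 2, 3, 4, 5, 6]))  # 3.5
```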
The sample mean and the median height of
Hypericum cumulicola (ADULTS ONLY)
The normal distribution with the observed sample mean and variance
The Mode
• The value of the observations that
occurs most frequently in the sample.
• This will be the peak of the frequency
distribution in a histogram
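One way to find the most frequent value is to count occurrences, as in the sketch below. The function name `modes` and the data are hypothetical. Note that this returns exact ties for the highest count, which is a narrower idea than two peaks in a histogram of continuous measurements; for those, the observations would first need to be binned.

```python
from collections import Counter


def modes(observations):
    """Return every value that shares the highest frequency, in sorted order."""
    counts = Counter(observations)
    if not counts:
        return []
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


# Hypothetical heights rounded to whole cm (illustrative values only).
heights_cm = [5, 5, 5, 6, 7, 20, 21, 21, 21, 22]
print(modes(heights_cm))  # [5, 21]
```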
The distribution of height of
Hypericum cumulicola is bimodal.
Could you suggest why?
Plotting seedlings and adults separately
Final Comments on Measures
of Location
• When the underlying distribution is
symmetrical (or nearly so), the mean,
median, and mode are all similar in value,
BUT…
• …when there are extreme observations,
the median or mode may better describe
the location of the data
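The effect of an extreme observation can be seen with two small made-up samples that differ in a single value. The data below are purely illustrative; `statistics.mean` and `statistics.median` come from Python's standard library.

```python
import statistics

# Two hypothetical samples that differ only in their largest value.
symmetric = [8, 9, 10, 11, 12]
with_extreme = [8, 9, 10, 11, 120]

for label, data in (("symmetric", symmetric), ("with extreme", with_extreme)):
    print(label, statistics.mean(data), statistics.median(data))
```

In the symmetric sample the mean and median coincide at 10. The single extreme value pulls the mean to 31.6 while the median stays at 10.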
Measures of variability
• It is never sufficient to just state the mean
or other measure of location of our data!
• Because there is variability in nature,
variability due to our sampling, etc., we
also need to estimate the spread of our
observations around the average value
• Examples include:
The range, the variance, and the standard
deviation
The sample variance
An individual value (Y_i − Ȳ) is called a
deviation from the mean. The sum of the
squared deviations is called the sum of
squares (SS). We divide SS by one less
than the sample size to get the sample
variance (s²), which is an unbiased
estimator of the population variance (σ²).
\[ s^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1} = \frac{\text{Sum of squares}}{n - 1} = \frac{SS}{n - 1} \]
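The two steps, sum of squares and then division by n − 1, can be sketched as follows. The function names and the height values are hypothetical; the result is compared against `statistics.variance` from the standard library, which also uses the n − 1 denominator.

```python
import math
import statistics


def sum_of_squares(observations):
    """SS: sum of squared deviations of each observation from the sample mean."""
    mean = sum(observations) / len(observations)
    return sum((y - mean) ** 2 for y in observations)


def sample_variance(observations):
    """s^2 = SS / (n - 1)."""
    n = len(observations)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    return sum_of_squares(observations) / (n - 1)


# Hypothetical plant heights in cm (illustrative values only).
heights_cm = [12.0, 15.5, 9.0, 20.5, 18.0]
ss = sum_of_squares(heights_cm)
s2 = sample_variance(heights_cm)
print(ss, s2)  # SS = 84.5, s^2 = 84.5 / 4 = 21.125

# Cross-check against the standard library.
print(math.isclose(s2, statistics.variance(heights_cm)))
```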
The sample standard deviation
The units in which the variance is
expressed are (original units)², which is
conceptually awkward. To get around this,
the sample variance is converted to the
sample standard deviation (s) by simply
taking the square root:
\[ s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}} \]
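A short sketch of the square-root step, again with a hypothetical function name and made-up data. If the observations are in cm, the variance is in cm² and the standard deviation is back in cm.

```python
import math
import statistics


def sample_standard_deviation(observations):
    """s = square root of the sample variance (n - 1 denominator)."""
    n = len(observations)
    if n < 2:
        raise ValueError("the standard deviation needs at least two observations")
    mean = sum(observations) / n
    variance = sum((y - mean) ** 2 for y in observations) / (n - 1)
    return math.sqrt(variance)


# Hypothetical measurements in cm (illustrative values only).
lengths_cm = [2, 4, 4, 4, 5, 5, 7, 9]
s = sample_standard_deviation(lengths_cm)
print(s)  # sqrt(32 / 7), roughly 2.14 cm

# Cross-check against the standard library.
print(math.isclose(s, statistics.stdev(lengths_cm)))
```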
[Figure: normal distribution. 68.26 % of the area lies within one standard deviation of the mean, with 15.87 % in each tail.]
The Standard Error of the Mean
• Remember the Central Limit Theorem: if the Y_i
are independent random observations and the
sample size is “reasonably large”, the sample
mean (Ȳ) is approximately normally distributed
with mean E[Y] and variance σ²(Y)/n
• Thus, we can calculate the standard error of the
mean as follows:
\[ s_{\bar{Y}} = \sqrt{\frac{\sigma^2(Y)}{n}} \approx \sqrt{\frac{s^2}{n}} = \frac{s}{\sqrt{n}} \]
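In practice the population variance is unknown, so the sample standard deviation stands in for it. The sketch below follows that substitution; the function name and the data are hypothetical.

```python
import math
import statistics


def standard_error_of_mean(observations):
    """Estimate the standard error of the mean as s / sqrt(n)."""
    n = len(observations)
    if n < 2:
        raise ValueError("the standard error needs at least two observations")
    s = statistics.stdev(observations)  # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(n)


# Hypothetical plant heights in cm (illustrative values only).
heights_cm = [12.0, 15.5, 9.0, 20.5, 18.0]
sem = standard_error_of_mean(heights_cm)
print(sem)

# Same quantity written as sqrt(s^2 / n).
n = len(heights_cm)
print(math.isclose(sem, math.sqrt(statistics.variance(heights_cm) / n)))
```

Because n sits under a square root, halving the standard error requires roughly four times as many observations, assuming s stays about the same.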