UBC - BIOLOGY 300

Biology 300
Lab Exercise # 5
7. ONE-SAMPLE INFERENCE FOR A NORMAL POPULATION
t Tests
In a previous exercise, we demonstrated the central limit theorem. One of the consequences
of this theorem, and in fact of the normal distribution itself, is that a large number of samples
taken at random from a normal population will produce a distribution of sample means, $\bar{Y}$,
that is also normally distributed. We can convert values (means) from this distribution of
sample means to the standard normal distribution, Z, by subtracting the parametric mean of
the population, $\mu$, from each sample mean and dividing the result by the standard error of
the mean, $\sigma_{\bar{Y}} = \sigma/\sqrt{n}$:

$$Z = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$$
All values from the standard normal distribution are in units of standard deviation regardless
of the original units of measurement (g, cm, ml, etc.). As with all continuous distributions,
the area under the standard normal curve equals 1.0.
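As an illustration only, the Z transformation can be sketched in Python. The function name `z_score` and the values passed to it are hypothetical and were chosen so the arithmetic is easy to follow. `NormalDist` from Python's standard `statistics` module is then used to confirm that about 95% of the area under the standard normal curve lies between −1.96 and +1.96.

```python
import math
from statistics import NormalDist

def z_score(sample_mean, mu, sigma, n):
    """Standardize a sample mean: Z = (Ybar - mu) / (sigma / sqrt(n))."""
    standard_error = sigma / math.sqrt(n)  # standard error of the mean
    return (sample_mean - mu) / standard_error

# Hypothetical values, for illustration only.
z = z_score(sample_mean=52.0, mu=50.0, sigma=6.0, n=9)
print(z)  # 1.0: this mean lies one standard error above mu

# Area under the standard normal curve between -1.96 and +1.96
std_normal = NormalDist(mu=0.0, sigma=1.0)
print(round(std_normal.cdf(1.96) - std_normal.cdf(-1.96), 4))  # 0.95
```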
We must know both the population mean and standard deviation to convert data to the
Z-distribution. If we don't know the population standard deviation, however, we may estimate it
from the sample standard deviation. As we also showed previously, the sample standard
deviation will provide a good estimate of the population standard deviation if n is sufficiently
large. Consequently, the sample standard error, $s_{\bar{Y}} = s/\sqrt{n}$, will be a good estimate of
the standard error of the mean, $\sigma_{\bar{Y}} = \sigma/\sqrt{n}$, if n is sufficiently large. Transforming each
sample mean by subtracting the parametric mean and dividing by the sample standard error
produces the equation:

$$t = \frac{\bar{Y} - \mu}{s_{\bar{Y}}} = \frac{\bar{Y} - \mu}{s/\sqrt{n}}$$
This equation will result in a distribution of values that is very similar to the Z-distribution
when n is large. On the other hand, if n is small, the distribution of transformed values will
be wider and flatter than the Z-distribution. A distribution of this sort is referred to as a t
distribution. It may be convenient to think of the t distribution as a normal distribution that
is corrected for sample size.
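The following Python sketch is an illustration under stated assumptions, and the function names are hypothetical. It computes the t statistic and uses a small simulation to show why the t distribution is wider than Z when n is small. Samples are drawn from a standard normal population (assumed μ = 0, σ = 1), and we count how often |t| exceeds 1.96. A Z statistic would do so only 5% of the time.

```python
import math
import random

def t_statistic(sample, mu):
    """t = (Ybar - mu) / (s / sqrt(n)), using the sample standard deviation s."""
    n = len(sample)
    ybar = sum(sample) / n
    # sample standard deviation, with n - 1 in the denominator
    s = math.sqrt(sum((y - ybar) ** 2 for y in sample) / (n - 1))
    return (ybar - mu) / (s / math.sqrt(n))

def tail_fraction(n, reps=10000, cutoff=1.96, seed=1):
    """Fraction of simulated |t| values above cutoff for samples of size n
    drawn from a standard normal population (mu = 0, sigma = 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(t_statistic(sample, 0.0)) > cutoff:
            hits += 1
    return hits / reps

# A Z statistic would exceed 1.96 in absolute value 5% of the time.
for n in (3, 5, 30):
    print(f"n = {n:2d}: fraction of |t| > 1.96 = {tail_fraction(n):.3f}")
```

For n = 3 the fraction should be well above 0.05. By n = 30 it should be close to 0.05, which is the pattern described above.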
The t-distribution has different shapes for different degrees of freedom (DF). Since one
parameter estimate (the sample mean) is required to calculate s, the t statistic has n-1 degrees
of freedom. As n gets smaller, the t distribution becomes wider and flatter, and as n
approaches infinity, the t distribution becomes more similar to the Z-distribution.
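Critical values of t can also be computed instead of read from a table. This sketch uses hypothetical function names and the standard library only. It integrates the t density numerically with Simpson's rule and uses bisection to find the two-tailed critical value for α = 0.05. This numerical approach is an assumption of the example, not necessarily how published tables were produced. The printed values should agree with a t table to about three decimals (for example, 2.228 for 10 DF and 12.706 for 1 DF) and shrink toward the Z value of 1.960 as DF increases.

```python
import math
from statistics import NormalDist

def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t with df degrees of freedom.

    Uses symmetry (half of the area lies below 0) plus Simpson's rule on
    [0, x]; steps must be even.
    """
    if x < 0:
        return 1.0 - t_cdf(-x, df, steps)
    # normalizing constant of the t density, computed on the log scale
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))

    def density(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps
    area = density(0.0) + density(x)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * density(i * h)
    return 0.5 + area * h / 3

def t_critical(df, alpha=0.05):
    """Two-tailed critical value: the t for which P(|T| > t) = alpha."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 1000.0  # bracket wide enough for df = 1 at alpha = 0.05
    for _ in range(50):  # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z_crit = NormalDist().inv_cdf(0.975)  # two-tailed Z critical value, about 1.960
for df in (1, 2, 5, 10, 30, 120, 1000):
    print(f"DF = {df:4d}: t = {t_critical(df):.3f}   (Z = {z_crit:.3f})")
```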
Although the t distribution has a different shape than the Z distribution, it is still symmetrical
and extends from negative to positive infinity. The units along the x-axis are measured in
standard errors of the mean and the y-axis indicates the probability density of a particular t
statistic. Consequently, the t distribution can be used to test the same types of hypotheses as
the Z distribution and, in fact, must be used if sample sizes are small and the population
standard deviation is unknown.
Confidence Intervals
Sample statistics such as the mean or standard deviation are estimates of population
parameters. These estimates are required because it is often impossible to measure all of the
individuals in a population and the true values of the parameters will remain unknown. This
raises the question: How good are these estimates of the parameters? A commonly used
measure of the reliability of a sample statistic is the confidence interval. One of the most
frequently used is the confidence interval for the population mean, $\mu$, calculated from the
sample mean, $\bar{Y}$.
We know that the means of random samples taken from a normal distribution are themselves
normally distributed. Thus, 95% of values of $\bar{Y}$ will fall between $\mu - 1.96\sigma/\sqrt{n}$ and
$\mu + 1.96\sigma/\sqrt{n}$ ($\sigma$ is the standard deviation of $Y$, and $n$ is the sample size). The true
standard deviation $\sigma$ is usually unknown, but the estimate $s$ can be used in place of $\sigma$: 95% of
values of $\bar{Y}$ will fall between $\mu - t_{0.05}s/\sqrt{n}$ and $\mu + t_{0.05}s/\sqrt{n}$, where $t_{0.05}$ is the
two-tailed critical value of t for α = 0.05 with n − 1 degrees of freedom. This statement can be
rearranged to show that in 95% of samples, $\mu$ will be bracketed by $\bar{Y} - t_{0.05}s/\sqrt{n}$ and
$\bar{Y} + t_{0.05}s/\sqrt{n}$. This interval is referred to as the 95% confidence interval.
In general, confidence intervals provide a measure of the reliability of our parameter
estimates. They are not, however, a statistical test. Although they use the same information,
they should not be used in place of a test when an appropriate statistical test is available.
Using the Program
In order to carry out t and Z tests or calculate confidence intervals we simply choose a
continuous variable and analyze the distribution of Y. The 95% confidence intervals for
the mean are displayed on both the quantile and outlier boxplots as a diamond shape, with
the mean being the midpoint of the diamond. The values for the upper and lower limits to the
interval are shown in the moments table. While confidence intervals can be calculated by hand
for any α level, the JMPin program will only calculate 95% intervals.
To carry out t and Z tests once the histograms and boxplots are displayed, click on the button
to the right of the variable name and choose test mean = value. This opens a sub-menu
where you can:
a) carry out a t test by entering a hypothetical mean value
b) carry out a Z test if you also include the parametric standard deviation
c) carry out a Wilcoxon test if you believe that your data are from a non-normal
population and samples are too small to invoke the central limit theorem. We will learn
more about non-parametric or distribution-free testing in future lab exercises.
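For comparison with the JMPin output, the three procedures can be sketched by hand in Python. These functions are hypothetical illustrations, not JMPin's implementation:

- The t test returns t and its degrees of freedom. Compare |t| with a tabled critical value.
- The Z test returns Z and a two-tailed probability from the standard normal distribution.
- The Wilcoxon signed-rank sketch returns only the rank sums T+ and T−. For a two-tailed test, the smaller of the two is compared with a table of critical values.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t statistic for H0: mu = mu0, returned with its n - 1 degrees of freedom."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / se, n - 1

def one_sample_z(sample, mu0, sigma):
    """Z statistic and two-tailed probability for H0: mu = mu0, sigma known."""
    n = len(sample)
    z = (statistics.mean(sample) - mu0) / (sigma / math.sqrt(n))
    p_two_tailed = 2 * (1 - statistics.NormalDist().cdf(abs(z)))
    return z, p_two_tailed

def wilcoxon_signed_rank(sample, mu0):
    """Signed-rank sums (T+, T-) for H0: median = mu0.

    Differences of zero are dropped; tied |differences| receive the
    average of the ranks they occupy.
    """
    diffs = [y - mu0 for y in sample if y != mu0]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # extend j over a run of tied absolute differences
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    t_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    t_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return t_plus, t_minus
```

JMPin may report the Wilcoxon statistic in a different form. This sketch makes no attempt to reproduce its output.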
Problems
1. In a moose population in northern Ontario, the average weight is 423 kg. A random sample
of 9 moose was taken in western Ontario and the following weights were recorded to
determine if these western moose showed any difference in average weight: 401, 380, 393,
450, 420, 435, 426, 397 and 415 kg.
a) What would our null hypothesis be in this case? What are the two main assumptions
made in testing the null hypothesis of this example? How can we test these assumptions?
b) Test the assumption that you can check.
c) Was the sample drawn from a population with the same mean? Show all steps taken in
testing the null hypothesis.
d) Can we invoke the central limit theorem based on our sample size? Why or why not?
Do we need to invoke the central limit theorem for this data set?
e) Is this a one or a two tailed test? Why?
f) Carry out the Wilcoxon test, which makes no assumptions about normality. Does your
answer agree with the one provided by the t test? How do the probabilities for the two
tests compare?
2. Milk production data (litres/day) from a small herd of Jersey cows is stored in a data file
called jersey. This file is stored on the shared directory of the server. To access these data,
choose the shared directory from the Look In dialog box when you open a file, then choose
jersey from the list of files.
a) Examine the data. Do they appear to be normally distributed? If not, how would this
affect analyses such as confidence intervals or hypothesis tests that assume normality?
b) What is the 95% confidence interval of the population mean?
c) How does the spread of the confidence interval compare to the spread of the
interquartile range? Is the interquartile range affected by non-normality? Does it provide
as much information as the confidence interval?
d) Mean level of milk production in Jersey cows is known to be 3.72 litres/day. Do the
data provide any grounds for thinking that mean milk production in this herd is atypical?
Show all steps taken in testing the null hypothesis.
e) What was the probability of committing a Type I error in your analysis?
f) Suggest one way of reducing the chance of a Type I error. What effect does this have
on Type II errors? Suggest two ways of reducing Type II errors.
3. The mean specific activity (at 37°C) of Na+-K+-ATPase in gills of most freshwater teleost
fishes is known to be 3.33 micromoles of phosphate per milligram of protein per hour. The
specific activity of this enzyme in the gills of marine fishes is postulated to be higher than in
freshwater fishes due to the greater salinity of their environment. To test this, the specific
activity of Na+-K+-ATPase in gills of a sample of marine-dwelling hagfish (Eptatretus
stouti) was measured in micromoles of phosphate / mg of protein / hour. The data are stored in a
file called hagfish, also on the S drive. (Unless otherwise mentioned, all data files for future
labs will be on the S drive.)
a) Examine the data. Do they appear normally distributed?
b) Is the specific activity of Na+-K+-ATPase in gills of the hagfish greater than in gills of
most freshwater teleost fishes? Show all steps taken in testing the null hypothesis.
c) By examining the values in your textbook's statistical tables, convince yourself that the
t-distribution is identical to the Z-distribution if sample size is infinite. You can do this by
finding a t value for infinite degrees of freedom and any alpha and comparing it to the Z
value for the equivalent probability.