Biology 300
Lab Exercise # 5

7. ONE-SAMPLE INFERENCE FOR A NORMAL POPULATION

t Tests

In a previous exercise, we demonstrated the central limit theorem. One of the consequences of this theorem, and in fact of the normal distribution itself, is that a large number of samples taken at random from a normal population will produce a distribution of sample means, $\bar{Y}$, that is also normally distributed. We can convert the means from this distribution of sample means to the standard normal distribution, Z: we subtract the parametric mean of the population, $\mu$, from each sample mean and divide the result by the standard error of the mean, $\sigma/\sqrt{n}$. All values from the standard normal distribution are in units of standard deviation regardless of the original units of measurement (g, cm, ml, etc.). As with all continuous distributions, the area under the standard normal curve equals 1.0.

We must know both the population mean and the population standard deviation to convert data to the Z distribution. If we don't know the population standard deviation, however, we may estimate it from the sample standard deviation, s. As we also showed previously, the sample standard deviation will provide a good estimate of the population standard deviation if n is sufficiently large. Consequently, the sample standard error (i.e. $s/\sqrt{n}$) will be a good estimate of the standard error of the mean if n is sufficiently large. Transforming each sample mean by subtracting the parametric mean and dividing by the sample standard error produces the equation:

$$t = \frac{\bar{Y} - \mu}{s/\sqrt{n}}$$

This equation will result in a distribution of values that is very similar to the Z distribution when n is large. On the other hand, if n is small, the distribution of transformed values will be wider and flatter than the Z distribution. A distribution of this sort is referred to as a t distribution. It may be convenient to think of the t distribution as a normal distribution that is corrected for sample size.
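The claim that the transformed values are wider and flatter than Z when n is small can be checked with a short simulation. The sketch below is illustrative only: the population (mean 0, standard deviation 1), the sample sizes 5 and 100, the number of repetitions, and the function names are all assumptions chosen for the demonstration, not part of the lab. It counts how often the transformed value falls outside ±1.96, which would happen about 5% of the time under the Z distribution.

```python
import math
import random


def t_statistic(sample, mu):
    """Return (sample mean - mu) / (s / sqrt(n)) for one sample."""
    n = len(sample)
    mean = sum(sample) / n
    # The sample variance divides by n - 1, not n.
    variance = sum((y - mean) ** 2 for y in sample) / (n - 1)
    return (mean - mu) / math.sqrt(variance / n)


def tail_fraction(n, reps=10000, mu=0.0, sigma=1.0, cutoff=1.96, seed=1):
    """Fraction of simulated samples of size n with |t| beyond the cutoff."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    extreme = 0
    for _ in range(reps):
        # Draw one random sample from the assumed normal population.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if abs(t_statistic(sample, mu)) > cutoff:
            extreme += 1
    return extreme / reps


frac_small = tail_fraction(n=5)    # small samples: heavier tails expected
frac_large = tail_fraction(n=100)  # large samples: close to the Z value of 0.05
print(f"n = 5:   fraction beyond +/-1.96 = {frac_small:.3f}")
print(f"n = 100: fraction beyond +/-1.96 = {frac_large:.3f}")
```

With small samples the fraction should come out noticeably above 5%, reflecting the heavier tails of the t distribution; with large samples it should sit close to 5%.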
The t distribution has different shapes for different degrees of freedom (DF). Since one parameter estimate (the sample mean) is required to calculate s, the t statistic has n - 1 degrees of freedom. As n gets smaller, the t distribution becomes wider and flatter, and as n approaches infinity, the t distribution converges on the Z distribution.

Although the t distribution has a different shape than the Z distribution, it is still symmetrical and extends from negative to positive infinity. The units along the x-axis are measured in standard errors of the mean and the y-axis indicates the probability density of a particular t statistic. Consequently, the t distribution can be used to test the same types of hypotheses as the Z distribution and, in fact, must be used if sample sizes are small and the population standard deviation is unknown.

Confidence Intervals

Sample statistics such as the mean or standard deviation are estimates of population parameters. These estimates are required because it is often impossible to measure all of the individuals in a population, so the true values of the parameters remain unknown. This raises the question: How good are these estimates of the parameters? A commonly used measure of the reliability of a sample statistic is the confidence interval. One of the most commonly used is the confidence interval of the sample mean, $\bar{Y}$.

We know that the means of random samples taken from a normal distribution are themselves normally distributed. Thus, 95% of values of $\bar{Y}$ will fall between $\mu - 1.96\sigma/\sqrt{n}$ and $\mu + 1.96\sigma/\sqrt{n}$ ($\sigma$ is the standard deviation of Y, and n is the sample size). The true standard deviation of Y is usually unknown, but the estimate s can be used in place of $\sigma$: 95% of values of $\bar{Y}$ will fall between $\mu - t_{0.05}s/\sqrt{n}$ and $\mu + t_{0.05}s/\sqrt{n}$, where $t_{0.05}$ is the two-tailed critical value of t for n - 1 degrees of freedom. This statement can be rearranged to show that in 95% of samples $\mu$ will be bracketed by $\bar{Y} - t_{0.05}s/\sqrt{n}$ and $\bar{Y} + t_{0.05}s/\sqrt{n}$. This interval is referred to as the 95% confidence interval.
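As a minimal sketch of the hand calculation described above, the following computes a 95% confidence interval from a sample. The nine measurements are hypothetical values made up for illustration, the function name is an assumption, and the critical value 2.306 is the standard two-tailed table value of t for alpha = 0.05 with 8 degrees of freedom; for a different sample size, look up the value for n - 1 degrees of freedom in your statistical tables.

```python
import math
import statistics


def confidence_interval(sample, t_crit):
    """Return (lower, upper) limits: mean -/+ t_crit * s / sqrt(n)."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)   # sample standard deviation (n - 1 denominator)
    se = s / math.sqrt(n)          # sample standard error of the mean
    half_width = t_crit * se
    return mean - half_width, mean + half_width


# Hypothetical measurements, for illustration only (n = 9, so DF = 8).
hypothetical_sample = [4.1, 3.8, 4.4, 4.0, 3.9, 4.3, 4.2, 3.7, 4.5]

# Two-tailed critical t for alpha = 0.05 and 8 DF, taken from a t table.
T_CRIT_DF8 = 2.306

lower, upper = confidence_interval(hypothetical_sample, T_CRIT_DF8)
print(f"95% confidence interval: {lower:.3f} to {upper:.3f}")
```

Passing the critical value as an argument keeps the sketch free of any statistics library beyond the Python standard library, and mirrors the table lookup you would do by hand.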
In general, confidence intervals provide us with a measure of the reliability of our parameter estimates. They are not, however, a statistical test. They work with the same information as a test but should not be used in place of one when a suitable statistical test is available.

Using the Program

In order to carry out t and Z tests or calculate confidence intervals, we simply choose a continuous variable and analyze the distribution of Y. The 95% confidence interval for the mean is displayed on both the quantile and outlier boxplots as a diamond shape, with the mean at the midpoint of the diamond. The values for the upper and lower limits of the interval are shown in the moments table. While confidence intervals can be calculated by hand for any level, the JMPin program will only calculate 95% intervals.

To carry out t and Z tests once the histograms and boxplots are displayed, click on the button to the right of the variable name and choose test mean = value. This opens a sub-menu where you can:

a) carry out a t test by entering a hypothetical mean value
b) carry out a Z test if you also include the parametric standard deviation
c) carry out a Wilcoxon test if you believe that your data are from a non-normal population and samples are too small to invoke the central limit theorem. We will learn more about non-parametric or distribution-free testing in future lab exercises.

Problems

1. In a moose population in northern Ontario, the average weight is 423 kg. A random sample of 9 moose was taken in western Ontario and the following weights were recorded to determine if these western moose showed any difference in average weight: 401, 380, 393, 450, 420, 435, 426, 397 and 415 kg.

a) What would our null hypothesis be in this case? What are the two main assumptions made in testing the null hypothesis of this example? How can we test these assumptions?

b) Test the assumption that you can check.

c) Was the sample drawn from a population with the same mean? Show all steps taken in testing the null hypothesis.

d) Can we invoke the central limit theorem based on our sample size? Why or why not? Do we need to invoke the central limit theorem for this data set?

e) Is this a one- or a two-tailed test? Why?

f) Carry out the Wilcoxon test, which makes no assumptions about normality. Does your answer agree with the one provided by the t test? How do the probabilities for the two tests compare?

2. Milk production data (litres/day) from a small herd of Jersey cows are stored in a data file called jersey. This file is stored on the shared directory of the server. To access these data, choose the shared directory from the look in dialog box when you open a file, then choose jersey from the list of files.

a) Examine the data. Do they appear to be normally distributed? If not, how would this affect analyses such as confidence intervals or hypothesis tests that assume normality?

b) What is the 95% confidence interval of the population mean?

c) How does the spread of the confidence interval compare to the spread of the interquartile range? Is the interquartile range affected by non-normality? Does it provide as much information as the confidence interval?

d) Mean level of milk production in Jersey cows is known to be 3.72 litres/day. Do the data provide any grounds for thinking that mean milk production in this herd is atypical? Show all steps taken in testing the null hypothesis.

e) What was the probability of committing a Type I error in your analysis?

f) Suggest one way of reducing the chance of a Type I error. What effect does this have on Type II errors? Suggest two ways of reducing Type II errors.

3. The mean specific activity (at 37°C) of Na+-K+-ATPase in gills of most freshwater teleost fishes is known to be 3.33 micromoles of phosphate per milligram of protein per hour.
The specific activity of this enzyme in the gills of marine fishes is postulated to be higher than in freshwater fishes due to the greater salinity of their environment. To test this, the specific activity of Na+-K+-ATPase in gills of a sample of marine-dwelling hagfish (Eptatretus stouti) was measured in µmoles of phosphate / mg of protein / hour. The data are stored in a file called hagfish, also on the S drive. (Unless otherwise mentioned, all data files for future labs will be on the S drive.)

a) Examine the data. Do they appear normally distributed?

b) Is the specific activity of Na+-K+-ATPase in gills of the hagfish greater than in gills of most freshwater teleost fishes? Show all steps taken in testing the null hypothesis.

c) By examining the values in your textbook's statistical tables, convince yourself that the t distribution is identical to the Z distribution if sample size is infinite. You can do this by finding a t value for infinite degrees of freedom and any alpha and comparing it to the Z value for the equivalent probability.