Simple Statistical Analyses There are very few simple Yes or No questions in biology. Most biological questions are answered by gathering quantitative data (rainfall rates, Potassium influx rates, area of an animal’s home range, etc.) and making INFERENCES from this information. When you begin working with quantitative data you come in contact with at least three sources of potential errors: 1) BIOLOGICAL VARIABILITY (no two individual animals are exactly alike) and 2) RANDOM EVENTS (or chance occurrences). 3) ERRORS IN MEASUREMENT (misreading instruments, mistakes in note taking etc.) How do you determine if your quantitative measurements are accurately describing the “real” biological situation you are studying? The basic theme of this laboratory is the use of several simple statistical tests to first, extract worthwhile information about a POPULATION when you only have data from a small portion (SAMPLE) of the entire population; and secondly, to determine whether you can say with some degree of confidence that two samples come from different (or similar) populations. This question is very frequently encountered in many branches of science. A POPULATION is considered here to be some set of items, which are all similar and which reside within a definable boundary. It’s up to the investigator to decide what “population” she is going to work with. Examples of defined populations include, all residents of Kāne‘ohe between the ages of 30 and 60; all pineapple plants in a 5 acre plots the heights of all trees on a particular site; the monthly rainfall values at UH over the last 10 years, etc. Note that individuals or measurements on individuals may compose a population. Since it is usually impractical, or even impossible, to measure all of the items under study in a population, (i.e. a CENSUS) you generally try to describe the characteristics of a population by taking measurements of a portion of the population. The portion is our SAMPLE. The methods used to obtain unbiased samples will be extensively dealt with below, but for the moment you need not worry about how you get your sample, but as you are sampling in this exercise think of the possible sources of bias. Once the sample is obtained, you will estimate the value of PARAMETERS, which describe the population and assign some CONFIDENCE LEVEL to these descriptions. Then you will compare your population estimates to determine if they are different. II. Making the Measurements In the lab you will be provided with several bags containing Koa Haole (Leucaena leucocephala (Lam. de Wit) seedpods. (See: http://www.hear.org/starr/hiplants/images/thumbnails/html/leucaena_leucocephala.htm for images of this plant) Two of the bags of seedpods were collected from one tree, while the third bag of seedpods came from a different tree. The goal of today’s exercise is to determine which 1 two bags came from the same tree and which bag came from a different tree. To do this you will make measurements and observations of the seedpods and analyze the data you collect. BUT there are far too many seedpods in each bag for you to measure them all (i.e. do a census) so you will have to work with only some seedpods from each bag – which ones?? Remember, for the time being you are ignoring the manner in which the seedpods were collected from the trees; but this is certainly another potential source of bias. In other words the bags of seedpods are samples from the two trees – one bag of seed pods from one tree and two bags from the other tree. A second goal of today’s exercise will be to determine how can you obtain an unbiased sample of seedpods to measure from the population of seedpods in each bag? Each bag will be coded with a letter and you will collect and analyze samples from all three bags. Your first chore will be to determine how to get an unbiased sample from the bag. This is not an easy task. Your group should discuss this among yourselves and try to 1) think about possible sources of bias in getting the sample 2) devise ways to avoid or reduce each of these sources of bias. Your next chore will be to determine what the sample size should be. Obviously the total number of seedpods in the bag is too large for you to be able to measure them all (also remember the contents of each bag are themselves a sample [of the tree]). On the other hand too small a sample may not give you a good estimate of the population parameters. Can you devise an objective mechanism or procedure to tell you what an appropriate sample size should be? What information would you need to know before you can answer that question? Before you begin, check with the TA or teacher to go over your sampling protocol. There are many traits you could determine for the seedpods including chemical and physical factors. For the purposes of this exercise you will limit the traits measured to two, one continuous (seedpod length) and one discrete (number of seeds per pod). Continuous variables are those where any number of values are possible along a spectrum i.e. lengths, weights, hemoglobin levels etc. Discrete variables are for traits than can be counted i.e. number of bristles on a fly’s foreleg, number of eggs laid by a frog etc. [However think about this. If you get a mean density, say the number of coral heads per square meter is that continuous or discrete?] Measure the length of each pod using metric values (millimeters or centimeters) and also record the number of seeds in that pod. Record all pertinent data in your lab notebook. III. Sample Population Statistics It is always important to get a “feel” for your data before you spend much time (and money) in more complex analyses. Simply looking at the data (as in a list of measurements) isn’t likely to be much help, so graphic visualizations are used. A very basic type of graphic visualization is the FREQUENCY HISTOGRAM. To make a frequency histogram you need 2 to group the measurement data into discrete intervals. This is more straightforward for discrete variables (such as number of seeds per pod) but may require some thinking for continuous variables (such as pod length). The choice of interval is up to you but there is usually some optimum number of intervals that maximizes the visual information available. This, of course, may be different in different circumstances. But remember if you want to make comparisons between data sets you may want to use the same X axis for all of them. Tabulate your data values in the interval categories to get that the numbers of measurements (pods) in each category. These are then plotted as histogram with the interval along the horizontal (X) axis and the frequency along the vertical (Y) axis. For many biological samples, the resulting frequency distribution is often “bell-shaped”. This bell-shaped curve is informally referred to as a NORMAL DISTRIBUTION (The real definition of a normal distribution is given by the relationship between two parameters [which we will discuss below]). Why should most measurements on biological material be distributed in such a manner, with most of the measurements clustering about an “average” value, but with a few extreme values on either side? The answer lies partly in the genetic variability, which is inherent in all biological material. No two individuals are genetically exactly alike. While closely related individuals have many of the same genes in common, slight genetic differences do exist. Thus, if the fur color of rabbits has a genetic basis, then a good camouflage color like brown will be the most common color in the population. However, a few rabbits will carry genes for both lighter and darker coat color, especially if fur color is under the control of several genes. But you also see normal distributions in situations where genetics – or even biology – play no part so there is more (lots more) to this tendency for values to be distributed this way. One of the nice things about the normal distribution is that it is symmetrical, so that any particular normal distribution can be completely specified (or described) if you know just two parameters, the central value - the MEAN (or average); and the distance from the mean to the point of inflection - the STANDARD DEVIATION. You are very familiar with the concept of a mean or an average. The standard deviation is basically a measure of how broadly the values are scattered around the mean. Statistical analyses dealing with normal (also called parametric) distributions rely heavily on these two parameters - mean and standard deviation (or its square - the variance). The majority of today’s exercise will be devoted to the methods of calculating these statistics and seeing what can be done with them. But you must always remember that parametric statistics should only be used when the distribution of data is known to be normal. (See Ch.6 in Sokal & Rolf or http://mathworld.wolfram.com/NormalDistribution.html and http://www-stat.stanford.edu/~naras/jsm/NormalDensity/NormalDensity.htmlfor a more extensive discussion of the normal distribution. 3 IV. Calculation of sample Mean and Standard Deviation: If the distribution of sample values is normal, the sample mean and sample standard deviation may be calculated from the sample measurements by following the procedures given below. 1. The mean X̄ = Xi /n where X̄ Xi n is the symbol for the mean is the ith individual measurement is the total number of measurements (the sample size) is a symbol, which means “the sum of all measurements from i = 1 to i = n” 2. The standard deviation S= { (Xi - X̄)2 / (n-1)} or a similar formula obtained through algebra, which is more convenient when using calculators, S= { Xi2 –[(X1)2/n] / (n-1) where s is the symbol for the standard deviation and the rest of the notation is the same as given for the mean. Many pocket calculators are programmed to calculate the mean and standard deviation directly from your data but you should study these formulae to get an understanding of the meaning of these statistics. These basic sample descriptive parameters are also easily determined directly on spreadsheet applications such as MS Excel®. Other statistics (e.g. mode, median, range, etc.) also have their uses but you will use them less in this lab. Because many applications make calculating statistical parameters so quick and easy it is important to really know what you think you are doing before hitting the Ғx (function) key. V. Comparing Two Populations In almost all cases where you compare data obtained from two samples they will be different, even if they are from the same population. Imagine you flip a coin 10 times and keep a record of the results (e.g. H T T H T etc.) its unlikely (and more and more unlikely as you flip 4 the coin more and more times) that you will get the same sequence of heads and tails in two sets of trials. BUT what about the mean number of heads? If it is an honest coin you expect that on average you will get about 50% heads in a sample of appropriate size (and exactly 50% heads in a sample of infinite size). But in two sets of 10 coin tosses are you going to always get 5 heads and 5 tails even if it is an honest coin with no bias in the tossing? There are statistical tests to help you determine whether two samples that are somewhat different may in fact have come from the same population (in this example the population of heads and tails available from this honest coin; in this lab two samples of seed pods). An assumption of the tests you will use in this exercise is that the underlying distributions are normal. These parametric tests are based on asking the question “What are the chances that two unbiased samples taken from the same population will differ by the amount that my two samples differ?” Note there are actually two populations under consideration here; the real world population that you took your samples from, and an ideal population with certain known features. The tables you will use (or can find in any statistics book) are based (at least theoretically) on drawing many samples of a given size (n) from this ideal population and determining how much difference is found. That is (in theory), the table-maker takes thousands of samples of size 2, thousands of size three, thousands of size four, and so on and so on, and gets the distribution of the parameters (say mean and standard deviation) of each set of 1000 samples. Since the table-maker knows that the samples were drawn from the same population, the differences she finds are those expected when two (or more) samples are drawn from what is really the same source population. As you would expect most of the pairwise differences are small (e.g. for most pairs of means the differences are not very big) but out of 1000 samples you would be sure to find a couple of means that were pretty different by chance alone. The tables you will use are constructed such that for two samples (so far you are dealing with pairwise tests, but all this stuff can be generalized to more than two samples) of a specific size (n), drawn from the same population, you can look up the chances (probability) that the differences between your two samples from the real world are bigger than you would expect if they had been drawn from the same population. So to enter the table you need to have some statistic (a single number) that compares your two populations (in this exercise you will use one ratio and one difference) the sample size [or actually the degree of freedom (in our case df = n-1)] and the level of probability that will make you happy. What does this happiness depend on? You’ll think back to the ideal population and the thousands of samples (or pairs of samples with their comparative statistic). As you saw most sample parameters will be similar (so the ratio will be close to 1 if you are looking at ratios, or the difference will be close to zero if you are looking at differences), but in a few cases the numbers will be pretty big. In 1000 pairs of samples from the same population there will be a few with a really big difference just by chance. With a 1000 sample pairs you can count how many are greater than some value (what you will come to call the significance level – though significant to who or what is never really clear). You will find that in 1000 samples only100 have a difference (or ratio) greater than some value. So then you can say: “In only 10% (100 out of 1000) of cases of pairs of samples drawn from the same population will the differences (or ratios) be bigger than this”. Of course you can do this for 5% 1% .01% or whatever. {Don’t worry about the poor table maker all this is done by algorithms on computers today} 5 So the end result of all this is that you can decide on some chance of being wrong (That is saying the difference (or ratio) you found is due to chance when it actually isn’t. For your edification this acceptance of a false null hypothesis is called a type II error) that you are willing to accept then set that value as your level of rejection of the null hypothesis (that there is really no difference). To get this measure of the difference between the estimates of the population measurements for two samples in this exercise, you will perform two statistical tests. This will allow you to objectively state (with a certain degree of confidence, say 95% sure) that you think that your two samples are from one population or two. The first test indicates whether the differences in variability among measurements in your two samples (as measured by the standard deviation – or actually the variance) are similar. If they are not (i.e., there is a different amount of variability in the two samples (the shapes of the two distributions differ) then you are reasonably sure that you have samples from two populations (Why?) and need not carry out the second part of the test. The sample standard deviations are compared arbitrarily by assigning the larger standard deviation to s1 and the smaller to s2. The F value (which is always equal to or greater than 1.0) is calculated as [Note here that you are actually using the variance which is defined as the standard deviation squared] F = s21 / s22 The next thing you need to know is the degrees of freedom for each sample. These are calculated as df1 = n1 – 1 df2 = n2 – 1 You use these values (which are really correction factors which compensate for the fact that you are using samples rather than the whole population) in Table 2 to obtain the listed F value. Remember if the two variances are equal the ratio is 1. If they are very similar, the ratio is not very far from 1. If the F value (the ratio of the two variances) which you calculated, is greater than the value listed in Table 2, you are 95% confidant that the two samples are from different populations (if you use the 95% table of course). If this is found to be the case with your comparison you need not make any further tests and can state you are 95% confident your samples came from two different populations. Think about this means – If your samples have a normal distribution and their standard deviations are so different that you get a large F value (for your sample sizes). You may state with a one in twenty chance of being wrong that the two samples come from different populations. If the 95% confidence level is not high enough for you, there is F table available for 99% and even 99.9% confidence! A calculated F value, which is less, than that taken from the F table indicates there is no evidence that the samples are from different populations based on the shapes of the distributions. Note – a small F value does NOT mean the samples ARE from the same population, only that the shapes of the frequency distributions of the samples do not permit you to feel confident that they are different. Look carefully at the F table noting how the values change with sample size and with the selected confidence level. Tables with this statistic can be found in statistics books or online. The site:http://www.stat.lsu.edu/EXSTWeb/statlab/Tables/TABLES98-F-pg2.html 6 gives an easy to use version. Sometimes available tables do not have the degrees of freedom listed that you want. If you look at the values in the table you can tell right away if using a df that is close to yours will make a big difference of not. If you do this choose the more conservative value. If you want to get a better approximation you can interpolate (http://www.acted.co.uk/forums/showthread.php?t=657) When the F test does not suggest the two samples are from different populations you will then test the difference between the means. To test whether two sample means are estimators of the same population, you perform a t-test. The calculation is t = where Y1 is the mean of sample 1 (ditto for sample 2) n1 and sp where s1 2 Y1 – Y2 sp * ( 1/n1 + 1/n2)1/2 is the size of sample 1 (ditto for sample 2) = ({(df1 * s12) + (df2 * s22)} / (df1 + df2) )1/2 is the square of the standard deviation and is referred to as the variance of the values. Note that all the t value is the difference between your two sample means, corrected for variability in the samples and the sample size. Look in the t-table under the degrees of freedom given by df1 + df2. If the t value which you calculated is greater (i.e. there is a big difference between the two means) than the value listed in Table 1, you can be 95% confident (assuming you are looking in the 95% section) that the means of the two samples are significantly different. If your calculated t value is smaller than the value from the t-table, you have no evidence that the samples are significantly different. If your calculated t values are smaller than the value from the t-table, you have no evidence that the samples come from different populations. Tables for critical values of the t-statistic can also be easily found : http://davidmlane.com/hyperstat/t_table.html You could still be wrong (in fact you will be wrong 1 time out of every 20 tries on average if you use the 95% confidence level). This is called a type II error - you accept the null hypothesis (of no difference) when in fact they samples come from different populations. A type I error is the opposite – you incorrectly reject the null hypothesis when in fact the two samples did come from the same population. How will selection of a different confidence level affect the chances of a type I or a type II error? THINK about this. 7 VI. The chi-square test If you are working with counts (discrete measures) rather than continuous measurements (e.g., length, weight, etc.), you can use the Chi-square test to test for deviations from some EXPECTED result. For example, modern genetic theory tells us that the sex ratio of most animal populations should be 50% males and 50% females. In a sample of finite size (such as the students in this class) a departure from this 50:50 ratio is not surprising. If the sample is big enough and the departure from 50:50 is great enough then you might suspect something is skewing the sex ratio in the population you sampled. A significant departure from this 50:50 ratio (your expected result) when you sample a population indicates that you need to look into the reasons behind such a large deviation from the expected result. Possible explanations for differing sex ratio include high mortality upon one sex, sex reversal, or an anomalous genetic system. VII. Assignments There will be three bags of seedpods, labeled A, B, C. Two bags will contain seedpods from the same population (a single tree in this case), and the other will have pods from a different population (tree). 1. Divide up into groups of two – three persons each. 2. Take one sample from each of the three boxes. There are two questions you must ask before you begin (These two questions will be central to the rest of your semester’s work in this course!) a) How big a sample will you need to take to be able to have some assurance that your data will be sufficient? You have seen how the sample size (actually the degree of freedom) determines the outcome of your statistical tests so how big a sample will you need to be able to test your data? b) How do you obtain your sample? That is, assuming you will not use all the seedpods in the bag, how will you choose which seed pods you will measure? Do not begin sampling, or measuring until you have worked out your sample size and your sampling technique!! 3. Measure the lengths of each pod and also record the number of seeds per pod, for each of the three samples. 4. Plot the frequency histogram for each sample. 5. Compare all three samples using the F-test. 6. Where appropriate do the t-test. Which of the three samples is from a different population? How do you know? Are there any biological clues, which also aid in your decision? 7. You may be given a data set for which you can try out the Chi square test. If so, calculate 2 values for that data set 8 8. From the current ecological literature obtain data suitable for 2 analysis where the author hasn’t done so. Perform the analysis and discuss your results. 9. Hirth (Ecological Monographs 33:83-112, 1963) studied two species of lizards inhabiting a beach in Costa Rica. Each month for four months he captured and sexed all the juveniles and adults of each species. The data for one species is listed below. The procedure for using the Chi-square test for the June adult sample is as follows: Chi-square (x2) = (observed – expected)2 expected The expected values for both males and females is 38 + 58 = 48 2 2 2 Therefore x = (38 – 48) + (58 – 48)2 = 4.17 48 48 Compare this value with that given in a chi square table (ex. http://people.richland.edu/james/lecture/m170/tbl-chi.html) for 1 degree of freedom. The value 4.17 is larger than 3.84, so you can be 95% confident that the June samples deviates significantly from the expected 50:50 sex ratio. (The degrees of freedom for the X2 test are calculated as the number of groups minus 1. You had two groups, male and female, so our df is 2 – 1 = 1). Calculate the X2 value for each of the months and also for the total, for both juveniles and adult data separately. Which are significantly different? Does the statistical test tell you why they are different? Another example of the use of the Chi-square test is shown in Krebs on page 375-379. 9