Comparing Populations

Objectives:
  To understand how to test a hypothesis
  To be introduced to the concept of chance events and probability
  To learn how to calculate a Student’s t-test
  To understand how to interpret the results of a Student’s t-test

Introduction:
Ecologists often want to compare two populations to see if they are different. For example, you might be curious about whether the number of stomata (pores on the surface of a leaf) differs between trees that grow on north-facing slopes and trees that grow on south-facing slopes. You might also want to compare populations that have been exposed to different treatments in a manipulative study. For example, do two groups of bluegill fed different diets (soft-bodied insects like midges versus hard-shelled snails) grow at different rates? A number of statistical tests can be used to analyze your data and answer these types of questions. Today we will look at how to use a two-sample Student’s t-test.

Testing hypotheses: Two-Sample Student’s t-test
Measurements taken from subjects in sample populations are variable, and chance events can lead to false conclusions. How, then, can we be confident that two populations are different? In this example, you will see how the Student’s t-test can be used to judge whether two populations really differ.

Observation: Your neighbor uses fertilizer on his lawn, but you do not.
Question: How does fertilizer affect lawns?
Hypothesis: Fertilizers affect the growth rate of grass.
Prediction: The mean grass height in your neighbor’s fertilized lawn (x̄2) will be higher than the mean grass height in your unfertilized lawn (x̄1) one week after mowing. In other words, x̄1 ≠ x̄2.

Table 1. Grass heights resulting from different fertilizer treatments.
  Grass Height (cm)          Grass Height (cm)
  No Fertilizer Treatment    Fertilizer Treatment
  3.86                       4.47
  4.76                       5.95
  3.30                       4.98
  3.52                       7.22
  3.88                       3.95
  4.32                       5.24
  5.86                       6.25
  5.08                       4.58
  4.76                       6.05
  4.38                       5.88
  Average = 4.37             Average = 5.46

Just from looking at the data, you would probably have difficulty saying with any certainty whether your neighbor’s grass grows at a different rate than yours. Based on the means of each sample set (x̄1 = 4.37 cm and x̄2 = 5.46 cm), you might conclude that the mean grass heights in the two lawns are different, but the grass heights are relatively variable. What if you had sampled a different set of 10 blades from each lawn? Would the means still be different? Student’s t-test accounts for this variability, reducing the likelihood of falsely concluding that the populations are different and providing a more objective way of determining whether a difference exists.

The two-sample Student’s t-test is most appropriate for continuous data (data that can be measured on a scale and broken into smaller parts while remaining meaningful, e.g. time, money, temperature, size). If you were comparing count data (discrete, non-continuous data, e.g. number of birds at each site), you would use the Chi-Square goodness-of-fit test.

To run the t-test, we need to calculate a t-value for our data, referred to as tcalc. Only the size of the difference between the means matters here, so its sign is dropped. The tcalc for the lawn data is calculated below:

\[
t_{calc} = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
         = \frac{|4.37 - 5.46|}{\sqrt{\dfrac{0.59984}{10} + \dfrac{0.977734}{10}}}
         = \frac{1.09}{0.397186} = 2.744
\]

\[
s_1^2 = \frac{\sum (x_1 - \bar{x}_1)^2}{n_1 - 1}
      = \frac{(3.86 - 4.37)^2 + (4.76 - 4.37)^2 + \dots + (4.38 - 4.37)^2}{10 - 1} = 0.59984
\]

\[
s_2^2 = \frac{\sum (x_2 - \bar{x}_2)^2}{n_2 - 1}
      = \frac{(4.47 - 5.46)^2 + (5.95 - 5.46)^2 + \dots + (5.88 - 5.46)^2}{10 - 1} = 0.977734
\]

\[
df = n_1 + n_2 - 2 = 10 + 10 - 2 = 18
\]

Now that tcalc has been calculated, how is it used?

To reach a conclusion, we need to look up our tcalc value in a table of t values. The table is organized with one row per number of degrees of freedom (df). To use the correct row, you need to calculate your own df. The degrees of freedom for the Student’s t-test are calculated from your sample sizes (n1 and n2) as

  df = n1 + n2 - 2

Once you have located the row corresponding to your df, find on this row the value closest to your tcalc. Then look at the value written at the head of that column. This value is a probability (p), and is therefore referred to as a p-value.

  df      0.1     0.05    0.025   0.01
  1       3.08    6.31    12.71   31.82
  2       1.89    2.92    4.31    6.97
  3       1.64    2.35    3.18    4.54
  4       1.53    2.13    2.78    3.75
  5       1.48    2.02    2.57    3.37
  6       1.44    1.94    2.45    3.14
  7       1.42    1.90    2.36    3.00
  8       1.40    1.86    2.31    2.90
  9       1.38    1.83    2.26    2.82
  10      1.37    1.81    2.23    2.76
  15      1.34    1.75    2.13    2.60
  18      1.33    1.73    2.10    2.55
  20      1.33    1.73    2.09    2.53
  30      1.31    1.70    2.04    2.46
  40      1.30    1.68    2.02    2.42
  60      1.30    1.67    2.00    2.39
  120     1.29    1.66    1.98    2.36

This p-value is the probability of obtaining a difference at least as large as the one observed purely by chance, when the two populations are in fact the same (i.e., the probability of a false-positive situation or a “fluke”). The p-value is what is most often reported in scientific studies and what is used to evaluate how reliable conclusions are. Having a p-value = 0.05 means that there is a 5% chance of seeing a difference this large when the two populations do not actually differ. This 0.05 is the threshold conventionally considered an acceptable margin of error for most scientific studies. The smaller the p-value, the more confident you can be that you do not have a false-positive conclusion. The p-value therefore lets you decide whether or not to reject your hypothesis.
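The calculation of tcalc and df for the lawn data can also be sketched in Python. This is only an illustration of the formulas above using the values in Table 1; the names two_sample_t, no_fertilizer and fertilizer are made up for this sketch. Because the code keeps the means unrounded, it gives a tcalc of about 2.73, slightly below the 2.744 obtained by hand with the means rounded to two decimals.

```python
from math import sqrt
from statistics import mean, variance

# Grass heights (cm) from Table 1.
no_fertilizer = [3.86, 4.76, 3.30, 3.52, 3.88, 4.32, 5.86, 5.08, 4.76, 4.38]
fertilizer = [4.47, 5.95, 4.98, 7.22, 3.95, 5.24, 6.25, 4.58, 6.05, 5.88]


def two_sample_t(sample1, sample2):
    """Return (t_calc, df, var1, var2) using the handout's formulas."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = mean(sample1), mean(sample2)
    # statistics.variance divides by n - 1 (sample variance).
    var1, var2 = variance(sample1), variance(sample2)
    standard_error = sqrt(var1 / n1 + var2 / n2)
    # Only the size of the difference matters, so take the absolute value.
    t_calc = abs(mean1 - mean2) / standard_error
    df = n1 + n2 - 2
    return t_calc, df, var1, var2


t_calc, df, var1, var2 = two_sample_t(no_fertilizer, fertilizer)
print(f"s1^2 = {var1:.5f}, s2^2 = {var2:.6f}")
print(f"t_calc = {t_calc:.3f}, df = {df}")
```

The same function could be reused on any two lists of continuous measurements, such as the data collected in the activity.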
If p ≤ 0.05, then we do not reject our biological hypothesis (in this case we do not reject the hypothesis that fertilizer affects the growth rate of grass, because we see a statistically significant effect of fertilizer on grass height). This means that there is at most a 5% chance that a difference as large as the one we observed between the two samples arose by chance alone. For this t-test, this means that the two populations sampled are statistically different from one another.

If p > 0.05, then we reject our biological hypothesis (in this case we reject the hypothesis that fertilizer affects the growth rate of grass, because we do not see a statistically significant effect of fertilizer on grass height). This means that there is so much variation in our samples that the differences observed between the two populations could easily be caused by chance. For this t-test, this means that the two populations sampled are NOT statistically different from one another.

Another way to determine whether your data differ significantly from the statistical null hypothesis (no significant difference between the two sample populations) is to look at the t value that you calculated. The value given by the Student’s t distribution for the chosen significance level and your degrees of freedom is called the critical value. If the calculated t value is greater than the critical value, then the statistical null hypothesis can be rejected and we can conclude that the means of the two populations are significantly different. If the calculated t value is less than the critical value, then the data fail to reject the null hypothesis and you can conclude that there is no statistical difference between the means of the two populations.

Let’s go through our example. We found previously that tcalc = 2.744 with df = 18. In the row for df = 18, the value closest to our tcalc (2.55) is in the column headed 0.01. This means that our p-value is approximately 0.01.
Because this number is below 0.05, we can rely on our conclusion quite well. The probability of seeing a difference this large by chance alone is very small (about 1%). Therefore, we can conclude that the two lawns are different, meaning that our hypothesis is supported by our study and that fertilizer does have a positive effect on the height of grass.

Conclusions:
  Statistics: tcalc = 2.744, p = 0.01
  Decision: Do not reject the hypothesis
  Statistical interpretation: The two lawns are statistically different from one another.
  Biological interpretation: The use of fertilizer causes grass in the two lawns to grow at different rates.

Activity
Together as a group, come up with a testable question that would allow you to compare two populations (based on sex, hair color, diet, year in college, etc.) and to collect data on a measurable trait (height, time to complete a task, finger length, etc.). Clearly outline your question, hypothesis, and prediction, and make sure that everyone will measure the trait in the same way. Then go and collect as much data as you can in one hour. When you return, we will analyze the data, graph the data, and draw a conclusion based on what was found. At the end of lab, hand in your Question, Hypothesis, Prediction, Results, Graph, and Conclusion.
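When you analyze your own data, the table look-up described earlier could be sketched in Python as follows. This is a minimal illustration, not a replacement for the full table: only a few rows of the t table are copied in, and the names P_COLUMNS, T_TABLE, closest_p_value and is_significant are made up for this sketch.

```python
# Probabilities written at the head of each column of the t table.
P_COLUMNS = (0.1, 0.05, 0.025, 0.01)

# A few rows copied from the t table: df -> values in each column.
T_TABLE = {
    10: (1.37, 1.81, 2.23, 2.76),
    15: (1.34, 1.75, 2.13, 2.60),
    18: (1.33, 1.73, 2.10, 2.55),
    20: (1.33, 1.73, 2.09, 2.53),
    30: (1.31, 1.70, 2.04, 2.46),
}


def closest_p_value(t_calc, df):
    """Return the column heading whose table value is closest to t_calc."""
    row = T_TABLE[df]  # raises KeyError if that df row was not copied in
    closest = min(range(len(row)), key=lambda i: abs(row[i] - t_calc))
    return P_COLUMNS[closest]


def is_significant(t_calc, df, alpha=0.05):
    """Compare t_calc with the critical value in the alpha column."""
    critical_value = T_TABLE[df][P_COLUMNS.index(alpha)]
    return t_calc > critical_value


# Worked lawn example: t_calc = 2.744 with df = 18.
print(closest_p_value(2.744, 18))
print(is_significant(2.744, 18))
```

For the lawn example, the closest table value in the df = 18 row is 2.55, so the function reports the 0.01 column, and tcalc exceeds the 0.05 critical value of 1.73.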