Fieldston Biology: Introduction to Statistics
Name: _____________________________ Date:___________

FIELDSTON BIOLOGY
Introduction to Statistics

In science experiments we often have to compare measurements from two different treatments and decide whether the independent variable has a real effect on what we are measuring (the dependent variable). In other words, is the difference in the results significant? How can we decide that?

In the scientific community we are not allowed to just say, “Well, clearly A is bigger than B!” Even though the results of an experiment are sometimes obvious, we need a separate measure that shows anyone who looks at our results that there really is a difference. How do we get a handle on the difference between measurements in an experiment? We use STATISTICS.

There are many statistical tools we can use when comparing data. In this worksheet we will look at:

I. Mean, Median, Mode (and Rounding)
II. Absolute average deviation of a sample population
III. T Test / p value (comparing two means to see if they are significantly different)

ACORN STUDY

Let’s say we are ecologists and we are studying oak trees. You have a hypothesis (a testable guess) that the acorns from Red Oak trees (Quercus rubra) are heavier than the acorns from White Oak trees (Quercus alba). How will you test your hypothesis? You know you are going to mass acorns. Will one acorn of each species suffice?

It’s pretty clear that massing one acorn of each species is not going to be a strong enough test of our hypothesis. OK, so let’s say you collect 21 acorns of each species, mass them and compare the data.
They might look like this. (Note: the first thing we did was order the measurements from lowest to highest mass for each species.)

Acorn #    White Oak    Red Oak
1          2.6          2.8
2          3.26         2.86
3          3.40         2.87
4          3.5          3.0
5          3.54         3.17
6          3.59         3.56
7          3.6          3.68
8          3.67         3.71
9          3.7          4.0
10         3.71         4.03
11         3.81         4.05
12         3.9          4.11
13         3.94         4.59
14         4.03         4.98
15         4.1          4.9
16         4.10         4.90
17         4.13         4.97
18         4.29         5.06
19         4.31         5.3
20         5.2          5.42
21         6.01         5.5

Figure 1. Data Set 1: Acorn Masses (grams)

I. MEAN, MEDIAN, MODE

Mean, median and mode are three of the most basic concepts in statistics. Each one tells us something about the data we collected, in this case the comparison between White and Red Oak acorns.

Exercise 1. Find the Mean, Median and Mode for the White and Red Oak acorns. For both Mean and Median, use the Acorn Data Excel File – USE “ACORN DATA SET 1” sheet. See Appendix for instructions on how to calculate Mean and Median on Excel.

ROUNDING: To how many decimal places do you think you need to round the calculated values? (See Appendix for instructions on how to ROUND using Excel.)

a. Based on the Mean, what conclusions, if any, can you make about the difference in mass between White and Red Oak acorns?

b. When you factor in (include) the Median, what more does this piece of information tell you? (Hint: distribution of the data.)

c. What is helpful about each of the three statistical tools: mean, median, mode?

d. What are the limitations of each tool?

The median is a valuable quick check against the mean because two sets of numbers can have the same mean but very different spread, or range, in value.

II. ABSOLUTE AVERAGE DEVIATION

When there’s quite a range of values in a sample it’s helpful to look at the absolute average deviation (to be defined later).
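The mean, median and mode asked for in Exercise 1, and the "quick check" of median against mean, can also be tried outside Excel. The following is only a minimal Python sketch, assuming the Figure 1 masses are typed in by hand; the helper name `summarize` is made up for this illustration and is not part of the worksheet or its Excel file.

```python
import statistics

# Acorn masses in grams, typed in by hand from Figure 1 (Data Set 1).
white_oak = [2.6, 3.26, 3.40, 3.5, 3.54, 3.59, 3.6, 3.67, 3.7, 3.71, 3.81,
             3.9, 3.94, 4.03, 4.1, 4.10, 4.13, 4.29, 4.31, 5.2, 6.01]
red_oak = [2.8, 2.86, 2.87, 3.0, 3.17, 3.56, 3.68, 3.71, 4.0, 4.03, 4.05,
           4.11, 4.59, 4.98, 4.9, 4.90, 4.97, 5.06, 5.3, 5.42, 5.5]


def summarize(masses):
    """Return (mean, median, list of modes) for one list of masses.

    Hypothetical helper for this sketch only.
    """
    return (statistics.mean(masses),
            statistics.median(masses),
            statistics.multimode(masses))


for name, masses in [("White Oak", white_oak), ("Red Oak", red_oak)]:
    mean, median, modes = summarize(masses)
    # The gap between mean and median hints at how lopsided the data are.
    print(f"{name}: mean={mean:.1f} g, median={median:.2f} g, "
          f"mode(s)={modes}, mean-median={mean - median:.2f} g")
```

In both species the mean sits a little above the median, which suggests that a few heavy acorns are pulling the mean upward.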
Exercise 2. Let’s take a break from the Acorn Study and look at another example, Student Test Scores:

STUDENT TEST SCORES
             Scores                  mean     median
Student A    86  86  85  84  84      ____     ____
Student B    96  94  92  73  70      ____     ____

What is the mean? Student A: _____________ Student B: _______________

What is the median? Student A: _____________ Student B: _______________

a. Compare mean and median for both students. What do you notice? What can you conclude?

The median for the first student, 85, is right on the mean and confirms that the spread around the mean is not very great. The median for the second student, 92, is seven points off from the mean of 85. If nothing else, the difference between the mean and median indicates that the test scores for the second student are far less consistent than those of the first student.

A more direct measure of the “spread” of the values in a sample is the absolute average deviation (similar to standard deviation). Absolute Average Deviation tells you the spread (range) of the numbers around the mean of a sample. In order to calculate the absolute average deviation, you:

a. Take the difference between each score and the mean.
b. Take the absolute value of each difference, then average those absolute values.

The value you get represents the spread, or range, of your data.

Exercise 3. USING EXCEL – “STUDENT GRADES” sheet: Calculate the Absolute Average Deviation for each student’s test scores (see Appendix for Excel instructions).

Answer: The Absolute Average Deviation of Student A’s scores is 0.8 (the deviations 1, 1, 0, 1, 1 average to 0.8). The Absolute Average Deviation of Student B’s scores is 10.8 (the deviations 11, 9, 7, 12, 15 average to 10.8).

a. What do these two different Absolute Average Deviation values tell you about the range of Student A’s test scores and the range of Student B’s test scores?

b. Calculate what % the ABS AVE DEV is relative to the mean (ABS AVE DEV / MEAN * 100):
Student A: _____________ Student B: _______________

c. In your own words, define Absolute Average Deviation. Include how you calculate the value, and what the value tells us about the numbers within the data set:

Exercise 4. Let’s go back to the acorn data. USING EXCEL – “ACORN DATA SET 1” sheet: Find Mean, Median and ABS AVE DEV.

               White Oak    Red Oak
Mean           3.9          4.2
Median         3.81         4.05
ABS AVE DEV    0.5          0.8

a. Which has a greater spread around the mean? Calculate both the absolute value of the spread and what percentage the spread is of the mean.

In this case the means, medians and ABS AVE DEVs are comparable to each other for both kinds of acorns. There is a little more spread in the Red Oak data: the ABS AVE DEV is 0.8, or about 19% of the mean (0.8/4.2), as opposed to the White Oak data, in which the ABS AVE DEV of 0.5 is about 13% of the mean (0.5/3.9, or about 12% if you use the unrounded values). But the means are different; can we say that the average mass of Red Oak acorns in this case is greater than the mass of White Oak acorns?

III. T TESTS: DETERMINING SIGNIFICANCE
o T test
o p value

The T test: As seen earlier, the means of the two kinds of acorns are different; can we say that the average mass of Red Oak acorns in this case is greater than the mass of White Oak acorns? I know you want to say “yes” but the statisticians say “hold on there!” There is a specific test that tells you how likely it is that a difference between two means like this one could arise by chance alone. Read the next paragraph carefully and make sure you get it.

If two sets of numbers have the same mean, you could imagine that they were actually measured, or “pulled”, from only one big set of numbers. The T test starts from that idea and gives you a probability (the p-value): if both samples really were pulled from the same big set of numbers, how likely is it that their means would differ by at least as much as ours do?
Another way to look at it is that the p-value tells you how likely it is that chance alone, with no real difference between the two groups, would produce a difference between the two means at least as large as the one you measured. Rewrite your understanding of p value in your own words, especially your understanding of what it means for a difference to be due to chance:

Exercise 5. USE EXCEL “ACORN DATA SET 1” sheet: run a t test between the White and Red Oak Acorn Data. Calculate the p value. (See Appendix for instructions on how to run T Tests on Excel.)

a. What is your p value for DATA SET #1? What does this mean? What does this suggest about our two data sets?

The p value is approximately 0.34. It means that if there were no real difference between the two species, chance alone would produce a difference between the two means at least this large about 34% of the time. For a scientist that is far too likely to accept that the two means are really different. As ecologists we have not supported the hypothesis that Red Oak acorns weigh more than White Oak acorns.

Exercise 6. Now let’s look at a different set of data for these acorns: DATA #2. USE EXCEL “ACORN DATA SET 2” sheet to find Mean, Median, ABS AVE DEV, ABS AVE DEV as a % of the Mean, and run a t Test.

a. When you calculate the p-value for the acorn data, what do you get?

In this case the mean mass of Red Oak acorns, 4.5 grams, is again greater than the mean mass of White Oak acorns, 3.9 grams. The medians and ABS AVE DEVs are again comparable. So what can we conclude?

Comparing two sets of Acorn Data (Acorn Data Set 1 vs Acorn Data Set 2):

Data Set 1.
                         White Oak    Red Oak
Mean                     3.9          4.2
Median                   3.81         4.05
ABS AVE DEV              0.5          0.8
ABS AVE DEV % of mean    12%          19%
t Test                   0.341613

Data Set 2.
                         White Oak    Red Oak
Mean                     3.9          4.5
Median                   3.81         4.59
ABS AVE DEV              0.4          0.8
ABS AVE DEV % of Mean    11%          18%
T Test                   0.041437791

Before running the T test, you might think that these two data sets look very similar. However, we can’t conclude anything until we perform a T test. When we do, for Data Set 2 we get a p-value of 0.04. What does this mean? It means that if there were no real difference between the two species, chance alone would produce a difference between the two means at least this large only about 4% of the time. This is very different from the first case. What do you think is the difference between Data Set 1 and Data Set 2 that resulted in the different p values?

In general, when scientists see a p-value of less than 0.05, they accept that the difference between the two samples is real. Because in this case p < 0.05, we ecologists would say that the Red Oak acorns in Data Set 2 really are heavier than the White Oak acorns.

Exercise 7. Using Excel “ACORN DATA SET 3” sheet, find the following for the 4 sets of White and Red Oak Acorn Data (White/Red Oak A; White/Red Oak B; White/Red Oak C; White/Red Oak D):

Mean
Median
ABS AVE DEV
ABS AVE DEV % around mean
T Test

a. For each set within Data Set #3, please fill in the following and write out your observations:

                 White Oak A    White Oak B    White Oak C    White Oak D
Mean
Median
ABS AVE DEV
% around Mean
T Test

                 Red Oak A      Red Oak B      Red Oak C      Red Oak D
Mean
Median
ABS AVE DEV
% around Mean
T Test

b. Graph the mean mass of White Oak A, B, C, D. Include ERROR BARS for each mean. What relationship, if any, do you notice between the absolute average deviation and the t tests? Explain.

Be sure you can define the following terms and explain in your own words how each can be calculated:

I. Mean / Median / Mode
II. Absolute Average Deviation
III. T Test
IV. P value

APPENDIX FOR EXCEL

EXERCISE 1: CALCULATING MEAN and MEDIAN and ROUNDING on EXCEL

CALCULATING MEAN ON EXCEL
1. Put your data in a column.
2. Click in the box at the bottom of the column.
3. Click Insert and then Function: fx.
4. From the list, click on AVERAGE.
5. Select the column of data for which you want to calculate the mean by highlighting the desired data or typing in the “coordinates” (ex. B2:B22), and then click Enter.
6. The mean will appear in the box you selected.
7. To apply the formula to the next column (Red Oak in this case):
   a. Bring your cursor to the bottom right corner of the box that holds the mean and drag it over to the desired box(es).
   b. You will notice that the formula is carried over and applied to the appropriate column. The White Oak mean uses B2:B22; the Red Oak mean is automatically applied to C2:C22.

CALCULATING MEDIAN ON EXCEL
To calculate in Excel:
1. Click Insert and then Function (fx).
2. Click on MEDIAN from the list.
3. Select all the values in the column by highlighting them or typing the range in the appropriate box.
**Be careful to include ONLY the boxes that hold data. Ex. In this case, make sure you highlight B2:B22 instead of B2:B23.

ROUNDING
To ROUND in Excel:
1. Click Format and Cells.
2. Click on NUMBERS from the list.
3. Choose the number of decimal places (ex. 1, 2, 3).
4. Click Enter and your value will be displayed rounded.

EXERCISE 3: CALCULATING ABSOLUTE AVERAGE DEVIATION with STUDENT GRADES

CALCULATING ABSOLUTE AVERAGE DEVIATION
Absolute Average Deviation tells you the spread (range) of the numbers around the mean of a sample.
In order to calculate the absolute average deviation, you:
1. Take the difference between each score and the mean.
2. Take the absolute value of each difference, then average those absolute values.
The value you get represents the spread, or range, of your data.

Take a look at the test scores again. In order to calculate the Abs Ave Dev using Excel:

        Student A    Mean A    Abs Value (A – Mean A)
        86
        86
        85
        84
        84
Mean    85           85

        Student B    Mean B    Abs Value (B – Mean B)
        96
        94
        92
        73
        70
Mean    85           85

Calculate ABS AVE DEV in Excel:
1. Find the MEAN of Student A’s scores, and put that value in the “Mean A” column (85).
2. Find the Absolute Value difference between the Student Score and the Mean: ABS(Student Score – Mean). The formula would be =ABS(B2-C2).
3. You then use the drag function in Excel to calculate the deviation for each score relative to the mean.
4. Find the MEAN of Student B’s scores, and put that value in the “Mean B” column (85).
5. Find the Absolute Value difference between Student Score B and Mean B: ABS(Student Score – Mean). The formula would be =ABS(F2-G2).
6. You then use the drag function in Excel to calculate the deviation for each score relative to the mean.
7. Then find the Average of all the deviations relative to the Mean. Include only the rows that hold scores (five rows for each student), not the Mean row. The result is the ABS AVE DEV value.

EXERCISE 5: CALCULATING T TEST for ACORN DATA SET #1

Calculate p Value / RUN T TEST in Excel:
1. Click on a box at the bottom of one of your columns of numbers.
2. Click on Insert Function (fx) and select T test (TTEST). You will see a dialogue box with four different rows.
3. For the first row, Array1, select the White Oak RAW data (the first column of numbers, B2:B22).
4. For the second row, Array2, select the Red Oak RAW data (the second column of numbers, C2:C22).
5. In the third row, Tails, type “2” for a two-tailed test.
6. In the fourth row, Type, type “3” (a two-sample test that does not assume the two samples have equal variance).
7. The result is your p VALUE generated from running the t test that compares the White Oak Raw Data Set and the Red Oak Raw Data Set.

EXERCISE 6: SCREEN SHOTS FOR ACORN DATA SET #2
[Screen shots not reproduced.]
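As an optional cross-check on the Excel steps for the absolute average deviation and the t test, the same calculations can be sketched in Python. This is only an illustration under stated assumptions: the Data Set 1 masses are typed in by hand from Figure 1, the function names `abs_ave_dev`, `t_pdf` and `welch_t_test` are invented for this sketch, and the p value is found by a simple numerical integration of the t distribution rather than by whatever method Excel uses internally. The test mirrors the settings in the steps above (Tails = 2, Type = 3, that is, two-tailed with unequal variances).

```python
import math
import statistics

# Acorn masses in grams, typed in by hand from Figure 1 (Data Set 1).
white_oak = [2.6, 3.26, 3.40, 3.5, 3.54, 3.59, 3.6, 3.67, 3.7, 3.71, 3.81,
             3.9, 3.94, 4.03, 4.1, 4.10, 4.13, 4.29, 4.31, 5.2, 6.01]
red_oak = [2.8, 2.86, 2.87, 3.0, 3.17, 3.56, 3.68, 3.71, 4.0, 4.03, 4.05,
           4.11, 4.59, 4.98, 4.9, 4.90, 4.97, 5.06, 5.3, 5.42, 5.5]


def abs_ave_dev(values):
    """Average of the absolute differences between each value and the mean."""
    mean = statistics.mean(values)
    return sum(abs(v - mean) for v in values) / len(values)


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def welch_t_test(a, b, steps=2000):
    """Two-tailed, unequal-variance t test. Returns (t, df, p)."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean a
    vb = statistics.variance(b) / len(b)  # squared standard error of mean b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    # Simpson's rule (steps must be even) for the area between 0 and |t|.
    upper = abs(t)
    h = upper / steps
    area = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    p = 1 - 2 * area  # probability in both tails beyond |t|
    return t, df, p


print(f"White Oak ABS AVE DEV: {abs_ave_dev(white_oak):.1f}")
print(f"Red Oak ABS AVE DEV:   {abs_ave_dev(red_oak):.1f}")
t, df, p = welch_t_test(white_oak, red_oak)
print(f"t = {t:.2f}, df = {df:.1f}, p = {p:.2f}")
```

For Data Set 1 the ABS AVE DEV values should round to the 0.5 and 0.8 shown in the worksheet, and the p value should land near the 0.34 that Excel reports. Small differences in later decimal places are possible because the integration here is approximate.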