Introduction to Statistics 2012 (Pink)

Fieldston Biology: Introduction to Statistics
Name: _____________________________
Date:___________
FIELDSTON BIOLOGY
Introduction to Statistics
In science experiments we often have to compare measurements from two different treatments and decide if
the independent variable has a real effect on what we are measuring (the dependent variable). In other
words, are the results/difference significant?
How can we do that? In the scientific community we are not allowed to just say, “Well clearly A is bigger than
B!” Even though the results of an experiment are sometimes obvious, we have to have some kind of separate
measure to indicate to anyone who looks at our results that there really is a difference.
How do we get a handle on the difference between measurements in an experiment? We use STATISTICS.
There are many statistical tools we can use when comparing data. In this worksheet we will look at:
I. Mean, Median, Mode (and Rounding)
II. Absolute average deviations of a sample population
III. T Test / p value (comparing two means to see if they are significantly different)
ACORN STUDY
Let’s say we are ecologists studying oak trees, and you have a hypothesis (a testable guess) that the acorns
from Red Oak trees (Quercus rubra) are heavier than the acorns that come from the White Oak species
(Quercus alba). How will you test your hypothesis? You know you are going to mass acorns. Will one acorn of
each species suffice?
It’s pretty clear that weighing one acorn of each species is not going to be a strong enough test of our
hypothesis. So let’s say you collect 21 acorns of each species, mass them, and compare the data. They
might look like this: (Note: the first thing we did was order the measurements from lowest to highest mass for
each species.)
Acorn #    White Oak    Red Oak
1          2.6          2.8
2          3.26         2.86
3          3.40         2.87
4          3.5          3.0
5          3.54         3.17
6          3.59         3.56
7          3.6          3.68
8          3.67         3.71
9          3.7          4.0
10         3.71         4.03
11         3.81         4.05
12         3.9          4.11
13         3.94         4.59
14         4.03         4.98
15         4.1          4.9
16         4.10         4.90
17         4.13         4.97
18         4.29         5.06
19         4.31         5.3
20         5.2          5.42
21         6.01         5.5
Figure 1. Data Set 1: Acorn Masses (grams)
I. MEAN, MEDIAN, MODE
Mean, median and mode are three of the most basic concepts in statistics. Each piece of statistical analysis
tells us something about the data we collected, in this case the comparison between White and Red Oak
Acorns.
Exercise 1. Find the Mean, Median and Mode for the White and Red Oak acorns. For both Mean and
Median, use the Acorn Data Excel file (“ACORN DATA SET 1” sheet). See the Appendix for
instructions on how to calculate Mean and Median in Excel.
ROUNDING: To how many decimal places do you think you need to round the calculated data? (See
Appendix for instructions on how to ROUND using Excel.)
a. Based on the Mean, what conclusions, if any, can you make about the difference in mass
between White and Red Oak acorns?
b. When you factor in (include) the Median, what more does this piece of information tell you? (Hint:
distribution of the data)
c. What is helpful about each of the three statistical tools: mean, median, mode?
d. What are the limitations of each function?
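As a cross-check on the Excel results, the same three statistics can be computed with a short Python sketch. It assumes the Data Set 1 masses from Figure 1 have been typed in by hand; the variable names are only illustrative and are not part of the worksheet.

```python
import statistics

# Data Set 1 acorn masses in grams, copied from Figure 1
white_oak = [2.6, 3.26, 3.40, 3.5, 3.54, 3.59, 3.6, 3.67, 3.7, 3.71, 3.81,
             3.9, 3.94, 4.03, 4.1, 4.10, 4.13, 4.29, 4.31, 5.2, 6.01]
red_oak = [2.8, 2.86, 2.87, 3.0, 3.17, 3.56, 3.68, 3.71, 4.0, 4.03, 4.05,
           4.11, 4.59, 4.98, 4.9, 4.90, 4.97, 5.06, 5.3, 5.42, 5.5]

for name, masses in (("White Oak", white_oak), ("Red Oak", red_oak)):
    mean = statistics.mean(masses)           # sum of the values / number of values
    median = statistics.median(masses)       # middle value once the data are sorted
    modes = statistics.multimode(masses)     # most frequent value(s)
    print(f"{name}: mean={mean:.1f} g, median={median:.2f} g, mode(s)={modes}")
```

Note that Python treats 4.1 and 4.10 as the same number, so they count as a repeated value when the mode is found.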
The median is a valuable quick check against the mean because two sets of numbers can have the same mean
but very different spread, or range, in value.
II. ABSOLUTE AVERAGE DEVIATION
When there’s quite a range of values in a sample it’s helpful to look at the absolute average deviation (to be
defined later).
Exercise 2. Let’s take a break from the Acorn Study and look at another example - Student Test Scores:
STUDENT TEST SCORES
            Student A    Student B
            86           96
            86           94
            85           92
            84           73
            84           70
mean        ______       ______
median      ______       ______
What is the mean? Student A: _____________ Student B: _______________
What is the median? Student A: _____________ Student B: _______________
a. Compare mean and median for both students. What do you notice? What can you conclude?
The median for the first student, 85, is right on the mean and confirms that the spread around the mean is not very
great. The median for the second student, 92, is seven points off from the mean of 85. If nothing else, the difference
between the mean and median indicates that the test scores for the second student are far less consistent than those of
the first student.
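The comparison can be checked with a few lines of Python. This is only a sketch using the ten scores listed in the table; the variable names are illustrative.

```python
import statistics

# Test scores from the STUDENT TEST SCORES table
student_a = [86, 86, 85, 84, 84]
student_b = [96, 94, 92, 73, 70]

for name, scores in (("Student A", student_a), ("Student B", student_b)):
    mean = statistics.mean(scores)
    median = statistics.median(scores)
    gap = abs(median - mean)   # how far the median sits from the mean
    print(f"{name}: mean={mean}, median={median}, gap={gap}")
```

Both students have the same mean, so the gap between median and mean is the only thing in this output that hints at how differently their scores are spread.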
A more direct measure of the “spread” of the values in a sample is the absolute average deviation (similar to
standard deviation). Absolute Average Deviation tells you the spread (range) of the numbers around the
mean of a sample. In order to calculate the absolute average deviation, you:
a. Take the difference between each score and the mean
b. Take the absolute value of each of those differences (deviations), and then average them
The value you get represents the spread, or range, of your data.
Exercise 3. USING EXCEL – “STUDENT GRADES” sheet: Calculate the Absolute Average Deviation for each
student’s test score (see Appendix for Excel instructions).
Answer:
The Absolute Average Deviation of Student A’s grades is 0.8. The Absolute Average Deviation of Student
B’s grades is 10.8.
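A minimal Python sketch of the calculation, using the five scores listed for each student in Exercise 2 (the function name `abs_ave_dev` is made up for this example). With five scores per student, the averages come to 0.8 for Student A and 10.8 for Student B. A common slip is to include the mean's own cell in the averaged range: that adds a sixth deviation of zero and lowers the results to about 0.67 and 9.

```python
def abs_ave_dev(values):
    """Average of the absolute differences between each value and the mean."""
    mean = sum(values) / len(values)
    deviations = [abs(v - mean) for v in values]   # distance of each score from the mean
    return sum(deviations) / len(deviations)       # average of those distances

student_a = [86, 86, 85, 84, 84]
student_b = [96, 94, 92, 73, 70]

for name, scores in (("Student A", student_a), ("Student B", student_b)):
    mean = sum(scores) / len(scores)
    aad = abs_ave_dev(scores)
    percent = aad / mean * 100                     # ABS AVE DEV / MEAN * 100
    print(f"{name}: ABS AVE DEV = {aad:.1f}, which is {percent:.1f}% of the mean")
```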
a. What do these two different Absolute Average Deviation values tell you about the range of Student
A’s test scores and the range of Student B’s test scores?
b. Calculate what % the ABS AVE DEV is relative to the mean (ABS AVE DEV / MEAN * 100):
Student A:
Student B:
c. In your own words, define Absolute Average Deviation. Include how you calculate the value, and what
the value tells us about the numbers within the data set:
Exercise 4. Let’s go back to the acorn data. USING EXCEL – “ACORN DATA SET 1” sheet: Find Mean, Median
and ABS AVE DEV
               White Oak    Red Oak
Mean           3.9          4.2
Median         3.81         4.05
ABS AVE DEV    0.5          0.8
a. Which has a greater spread around the mean? Calculate both the absolute value of the spread and
what percentage the spread is of the mean.
In this case the means, medians and ABS AVE DEVs are comparable to each other for both kinds of acorns.
There is a little more spread in the Red Oak data: the ABS AVE DEV is 0.8, or about 19% of the mean
(0.8/4.2), as opposed to the White Oak data, in which the ABS AVE DEV, 0.5, is about 12% of the mean
(13% if you divide the rounded values, 0.5/3.9). But the means are different; can we say that the average
mass of Red Oak acorns in this case is greater than the mass of White Oak acorns?
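The spread figures quoted here can be reproduced from the raw Data Set 1 masses with a short Python sketch (the helper name `abs_ave_dev` is an assumption made for this example). Working from unrounded values gives about 12% for White Oak; dividing the already rounded 0.5 by 3.9 gives about 13%, which is why both figures can turn up.

```python
def abs_ave_dev(values):
    """Average absolute distance of the values from their mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)

# Data Set 1 acorn masses in grams, copied from Figure 1
white_oak = [2.6, 3.26, 3.40, 3.5, 3.54, 3.59, 3.6, 3.67, 3.7, 3.71, 3.81,
             3.9, 3.94, 4.03, 4.1, 4.10, 4.13, 4.29, 4.31, 5.2, 6.01]
red_oak = [2.8, 2.86, 2.87, 3.0, 3.17, 3.56, 3.68, 3.71, 4.0, 4.03, 4.05,
           4.11, 4.59, 4.98, 4.9, 4.90, 4.97, 5.06, 5.3, 5.42, 5.5]

for name, masses in (("White Oak", white_oak), ("Red Oak", red_oak)):
    mean = sum(masses) / len(masses)
    aad = abs_ave_dev(masses)
    percent = aad / mean * 100        # spread as a percentage of the mean
    print(f"{name}: mean={mean:.1f} g, ABS AVE DEV={aad:.1f} g ({percent:.0f}% of mean)")
```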
III. T TESTS: DETERMINING SIGNIFICANCE
o T test
o p value
The T test:
As seen earlier, the means of the two acorns are different; can we say that the average mass of Red Oak
acorns in this case is greater than the mass of White Oak acorns? I know you want to say “yes” but the
statisticians say “hold on there!” There is a specific test that tells you the probability that two means are
different from each other due to chance. Read the next paragraph carefully and make sure you get it.
If two sets of numbers have the same mean, you could imagine that they were actually measured, or “pulled”,
from only one big set of numbers. The T test is a statistical test that gives you a probability (the p-value) for
that scenario: if both samples really were pulled from the same big set of numbers, how likely is it that chance
alone would produce a difference between their means at least as large as the one you measured? A simpler
way to look at it is that the T test tells you the probability that the difference between two means is due to chance.
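To make the idea concrete, here is a minimal Python sketch of a two-tailed, unequal-variance t test (Welch's test), which is the kind of test the Appendix sets up in Excel by choosing Tails = 2 and Type = 3. The function names are made up for this example, and the p-value is found by numerically integrating the t distribution, which is an approximation chosen only to keep the sketch self-contained. In practice a statistics library such as SciPy (`scipy.stats.ttest_ind` with `equal_var=False`) would normally do this. For the Data Set 1 masses the result should come out close to the 0.34 discussed in Exercise 5.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=2000):
    """Area under the t density beyond +|t| and -|t| (Simpson's rule; steps must be even)."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps
    area = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3                       # integral of the density from 0 to |t|
    return max(0.0, 1 - 2 * area)       # what is left in the two tails

def welch_t_test(a, b):
    """Return (t, degrees of freedom, two-tailed p) without assuming equal variances."""
    va = statistics.variance(a) / len(a)    # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)    # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, two_tailed_p(t, df)

# Data Set 1 acorn masses in grams, copied from Figure 1
white_oak = [2.6, 3.26, 3.40, 3.5, 3.54, 3.59, 3.6, 3.67, 3.7, 3.71, 3.81,
             3.9, 3.94, 4.03, 4.1, 4.10, 4.13, 4.29, 4.31, 5.2, 6.01]
red_oak = [2.8, 2.86, 2.87, 3.0, 3.17, 3.56, 3.68, 3.71, 4.0, 4.03, 4.05,
           4.11, 4.59, 4.98, 4.9, 4.90, 4.97, 5.06, 5.3, 5.42, 5.5]

t, df, p = welch_t_test(white_oak, red_oak)
print(f"t = {t:.2f}, df = {df:.1f}, p = {p:.2f}")
```

The sign of t only shows which sample has the larger mean; the two-tailed p-value ignores the sign and asks how unusual a gap of that size would be in either direction.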
Rewrite your understanding of p value in your own words, especially your understanding of what it means for
the difference being due to chance:
Exercise 5. USE EXCEL “ACORN DATA SET 1” sheet to run a t test between the White and Red Oak Acorn Data.
Calculate the p value. (See Appendix for instructions on how to run T Tests in Excel.)
a. What is your p value for DATA SET #1? What does this mean? What does this suggest about
our two data sets?
It means that there is an approximately 0.34 or 34% chance that any differences between the two means are
due to chance. For a scientist that is way too much to accept that the two means are different. As ecologists
we have not supported the hypothesis that Red Oak acorns weigh more than White Oak acorns.
Exercise 6. Now let’s look at a different set of data for these acorns: DATA SET #2. USE EXCEL “ACORN DATA SET
2” sheet to find Mean, Median, ABS AVE DEV, ABS AVE DEV as a % of the Mean, and run a t Test.
a. When you calculate the p-value for the acorn data, what do you get?
In this case the mean mass of Red Oak acorns, 4.5 grams is again greater than the mean mass of White Oak
acorns, 3.9 grams. The medians and ABS AVE DEV’s are again comparable. So what can we conclude?
Comparing two sets of Acorn Data (Acorn Data Set 1 vs Acorn Data Set 2):
Data Set 1.
                         White Oak    Red Oak
Mean                     3.9          4.2
Median                   3.81         4.05
ABS AVE DEV              0.5          0.8
ABS AVE DEV % of mean    12%          19%
T Test (p value)         0.341613
Data Set 2.
                         White Oak    Red Oak
Mean                     3.9          4.5
Median                   3.81         4.59
ABS AVE DEV              0.4          0.8
ABS AVE DEV % of mean    11%          18%
T Test (p value)         0.041437791
Before running the T test, you might think that these two data sets are very similar in number.
However, we can’t conclude anything until we perform a T test. When we do so for Data Set 2, we get a p-value
of 0.04. What does this mean?
It means that there is a 0.04 or 4% chance that the difference between the two means is due to chance. This
is very different from the first case. What do you think is the difference between Data Set 1 and Data Set 2
that resulted in the different p value results?
In general, when scientists see that there is a p-value of less than 0.05, they accept that the differences
between the two samples are real.
Because in this case p < 0.05, we ecologists would say that the Red Oak acorns in Data Set 2 really are
heavier than the White Oak acorns.
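The decision rule can be written out as a tiny Python sketch. The p-values are the two T Test results reported for Data Set 1 and Data Set 2; the name `ALPHA` for the 0.05 cutoff is just a label chosen for this example.

```python
ALPHA = 0.05  # conventional cutoff: p-values below this are treated as significant

# T Test results reported in the comparison of the two acorn data sets
p_values = {"Data Set 1": 0.341613, "Data Set 2": 0.041437791}

for name, p in p_values.items():
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"{name}: p = {p:.2f} (about {p:.0%}) -> difference is {verdict}")
```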
Exercise 7. Using Excel “ACORN DATA SET 3” sheet, find the following for the 4 sets of White and Red
Oak Acorn Data (White/Red Oak A; White/Red Oak B; White/Red Oak C; White/Red Oak D).
o Mean
o Median
o ABS AVE DEV
o ABS AVE DEV % around mean
o T Test
a. With each set within Data Set #3, please fill in the following and write out your observations:
                 White Oak A    White Oak B    White Oak C    White Oak D
Mean
Median
ABS AVE DEV
% around Mean
T Test
                 Red Oak A      Red Oak B      Red Oak C      Red Oak D
Mean
Median
ABS AVE DEV
% around Mean
T Test
b. Graph the mean mass of White Oak A, B, C, D. Include ERROR BARS for each mean. What relationship,
if any, do you notice between the absolute average deviation and t tests? Explain.
Be sure you can define the following terms and explain, in your own words, how each can be calculated:
I. Mean /Median / Mode
II. Absolute Average Deviation
III. T Test
IV. P value
APPENDIX FOR EXCEL
EXERCISE 1: CALCULATING MEAN and MEDIAN and ROUNDING on EXCEL
CALCULATING MEAN ON EXCEL
1. Put your data in a column.
2. Click in the box at the bottom of the column.
3. Click Insert and then Function: fx.
4. From the list, click on AVERAGE.
5. Select the column of data for which you want to calculate the mean by highlighting the desired data or
typing in the “coordinates” (ex. B2:B22), and then click Enter.
6. The mean will appear in the box you selected.
7. To apply to the next column (Red Oak in this case):
a. Bring your cursor to the bottom right corner of the box and drag it over to the desired box(es).
b. You will notice that the formula will be carried over and applied to the appropriate column.
The White Oak mean column is B2:B22; the Red Oak mean column is automatically applied as
C2:C22.
CALCULATING MEDIAN ON EXCEL
To calculate in Excel:
1. Click Insert and then Function (fx)
2. Click on MEDIAN from the list
3. Select all the values in the column by highlighting or typing in the appropriate box. **Be careful to
include ONLY the boxes that hold data. Ex. In this case, make sure you highlight B2:B22 instead of
B2:B23
ROUNDING
To ROUND in Excel:
1. Click Format and Cells
2. Click on NUMBER from the list
3. Choose the number of decimal places (ex., 1, 2, 3)
4. Click Enter and your value will be displayed rounded to that many places
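Formatting a cell in this way changes how many digits are shown; as far as we know, Excel still keeps the full unrounded number underneath. The short Python sketch below illustrates the same distinction between rounding a value and only changing how it is displayed. The variable names are illustrative, and 82.39 is the sum of the 21 White Oak masses in Figure 1.

```python
mean_mass = 82.39 / 21                  # White Oak mean from Data Set 1, unrounded

rounded_value = round(mean_mass, 1)     # a new number, rounded to 1 decimal place
display_only = f"{mean_mass:.2f}"       # text with 2 decimal places; mean_mass itself is unchanged

print(mean_mass)        # full unrounded value
print(rounded_value)    # value rounded to one decimal place
print(display_only)     # value shown with two decimal places
```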
EXERCISE 3: CALCULATING ABSOLUTE AVERAGE DEVIATION with STUDENT GRADES
CALCULATING ABSOLUTE AVERAGE DEVIATION
Absolute Average Deviation tells you the spread (range) of the numbers around the mean of a sample. In
order to calculate the absolute average deviation, you:
1. Take the difference between each score and the mean
2. Take the absolute value of each of those differences (deviations), and then average them
The value you get represents the spread, or range, of your data.
Take a look at the test scores again. In order to calculate the Abs Ave Dev using Excel:
             Student A    Mean A    Abs Value (A - Mean A)
             86
             86
             85
             84
             84
Mean         85           85

             Student B    Mean B    Abs Value (B - Mean B)
             96
             94
             92
             73
             70
Mean         85           85
Calculate ABS AVE DEV in Excel:
1. Find the MEAN of Student A’s scores, and put that value in the “Mean A” column (85).
2. Find the Absolute Value difference between the Student Score and the Mean: ABS(Student Score –
Mean). The formula would be =ABS(B2-C2)
3. You then use the drag function in Excel to calculate the deviation for each score relative to the mean.
4. Find the MEAN of Student B’s scores, and put that value in the “Mean B” column (85).
5. Find the Absolute Value difference between Student Score B and the Mean B: ABS(Student Score –
Mean). The formula would be =ABS(F2-G2)
6. You then use the drag function in Excel to calculate the deviation for each score relative to the mean.
7. Then find the Average of all the deviations relative to the Mean. **Be careful to include ONLY the boxes
that hold a deviation for an actual score. The result is the ABS AVE DEV value.
EXERCISE 5: CALCULATING T TEST for ACORN DATA SET #1
Calculate p Value / RUN T TEST in Excel:
1. Click on a box at the bottom of one of your columns of numbers.
2. Click on Insert Function (fx) and select T test (TTEST). You will see a dialogue box with four different
rows.
3. For the first row, Array1, select the White Oak RAW data (the first column of numbers, B2:B22).
4. For the second row, Array2, select the Red Oak RAW data (the second column of numbers, C2:C22).
5. In the third row, Tails, type “2” for a two-tailed test.
6. In the fourth row, Type, type “3” (a two-sample test that does not assume the two groups have equal
variance).
7. The result is your p VALUE generated from running the t test that compares the White Oak Raw Data
Set and the Red Oak Raw Data Set.
EXERCISE 6: SCREEN SHOTS FOR ACORN DATA SET #2