Biology 404 – Ecology
Lab Exercise 2
Dr. Fontenot
Student Learning Objectives:
1) Learn how regression analysis is used to determine the relationship
between an independent and a dependent variable.
2) Learn to interpret the results of a regression analysis based on
correlation coefficients, R² values, and p-values.
3) Learn to use regression analysis to predict a response.
Introduction:
We know that if we add fertilizer to a pond, we will get an increase in
primary production (measured as chlorophyll a). Also, we expect that the more
fertilizer we add the greater the increase in primary production. In other words,
the level of primary production depends on the amount of fertilizer we add. The
amount of fertilizer we add does not depend on the level of primary production;
it is independent of it. In this sense, primary production is the dependent
variable (y) and fertilizer amount is the independent variable (x). The general
equation for a line is:
y = mx + b
where y = the dependent variable, x = the independent variable, m = the slope of
the line, and b = the y-intercept. In the pond fertilization experiment, m would
indicate how much primary production (y) increases for every one-unit increase in
fertilizer (x). The y-intercept (b) estimates primary production without any
fertilizer. This relationship could be determined by adding different levels of
fertilizer to several different ponds and measuring the resulting primary
production. The data may look like:
Fertilizer    Primary Production
0             23
5             39
10            44
15            65
20            70
It appears that primary production increases with fertilization, but by how much?
As a first step toward understanding the relationship between fertilization and
primary production, make a graph.
[Figure: scatter plot of GPP (y-axis, 20 to 80) against Fertilizer (x-axis, 0 to 20)]
This graph demonstrates that as fertilizer increases, so does GPP (gross primary
production). It does not tell us how much, on average, GPP increases if we
increase fertilizer by one unit. Regression analysis analyzes the data and
determines the equation for the line that best explains the relationship between
the two variables:
[Figure: the same scatter plot of GPP against Fertilizer with the fitted regression line; y = 2.4x + 24.2, R² = 0.9633, p = 0.003]
The line equation that best explains this data set is:
y = 2.4x + 24.2
This means that for every one-unit increase in fertilizer, I would expect primary
production to increase by 2.4 units. What would I predict primary production to be
if I added 7.5 units of fertilizer?
y = 2.4(7.5) + 24.2
y = 42.2
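The fit and the prediction above can be checked by hand or with a short script. The following is a minimal sketch using ordinary least squares on the pond data from the introduction; the function names fit_line and predict are illustrative and not part of the lab materials. A spreadsheet trend line should give the same line.

```python
# Pond data from the table in the introduction.
fertilizer = [0, 5, 10, 15, 20]      # x, the independent variable
production = [23, 39, 44, 65, 70]    # y, the dependent variable


def fit_line(x, y):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x_new):
    """Predicted y for a new x, using y = mx + b."""
    return slope * x_new + intercept


slope, intercept = fit_line(fertilizer, production)
predicted = predict(slope, intercept, 7.5)
print(f"y = {slope:.1f}x + {intercept:.1f}")
print(f"Predicted production at 7.5 units of fertilizer: {predicted:.1f}")
```

Run on the pond data, this reproduces the slope of 2.4, the intercept of 24.2, and the prediction of 42.2.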
The R² value can range from zero to one. This value describes how closely the
data points fall around the estimated regression line; it is the proportion of
the variation in y that the line explains. For example, if all of the data points
fell exactly on the line, then R² would be 1.0. The farther away the points are
from the estimated regression line, the smaller the R². Because we would like to
use regression to make predictions, we like a high R² value. The higher the R²
value, the more comfortable I am in making a prediction about the response. The
p-value tells us if the estimated regression line has a slope significantly
different from zero. If the p-value is greater than 0.05, then the slope is not
significantly different from zero. That means that the average value of y is as
good a predictor as the line equation. It also means that we have detected no
response in y due to x (in this example, no effect of fertilization on primary
production). If the regression line is not significant, we should not use it to
make predictions. If the p-value is less than or equal to 0.05, we can make
predictions.
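For readers who want to see where R² and the p-value come from, here is a minimal sketch for the pond data using only the Python standard library. It assumes the usual t test of the slope against zero with n - 2 degrees of freedom, and it finds the p-value by numerically integrating the t distribution. The function names are illustrative, and statistical software would normally report these numbers directly.

```python
import math

fertilizer = [0, 5, 10, 15, 20]
production = [23, 39, 44, 65, 70]


def r_squared_and_t(x, y):
    """Return (R^2, size of the t statistic for the slope, degrees of freedom)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r2 = sxy ** 2 / (sxx * syy)
    df = n - 2
    # Undefined when every point lies exactly on the line (r2 == 1).
    t = math.sqrt(r2 * df / (1 - r2))
    return r2, t, df


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the area under the t density between
    -|t| and |t|, computed with Simpson's rule (steps must be even)."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 1 - 2 * area_0_to_t


r2, t_stat, df = r_squared_and_t(fertilizer, production)
p_value = two_sided_p(t_stat, df)
print(f"R squared = {r2:.4f}")
print(f"p = {p_value:.3f}")
print("Slope differs from zero at the 0.05 level:", p_value <= 0.05)
```

With only five ponds there are three degrees of freedom, so a slope has to be very consistent across the data to reach a p-value this small.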
Correlation coefficients tell us the strength and nature of the relationship.
A positive correlation tells us that as x increases, y increases. A negative
correlation tells us that as x increases, y decreases. The strength of the
correlation between two variables increases as the correlation coefficient moves
away from zero toward -1 or +1. The nature of the relationship (+ or -) can also
be determined by looking at the sign of the slope coefficient from the regression
analysis.
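As an illustration of the sign and strength of a correlation, the sketch below computes the Pearson correlation coefficient r for the pond data. To show a negative relationship, it also runs the same production values in reverse order; that reversed series is only an illustration and is not real data. The function name pearson_r is illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


fertilizer = [0, 5, 10, 15, 20]
production = [23, 39, 44, 65, 70]

r = pearson_r(fertilizer, production)
# Same y values in reverse order, used only to show a negative sign.
r_reversed = pearson_r(fertilizer, production[::-1])

print(f"r = {r:.3f}, r squared = {r * r:.4f}")
print(f"r with y reversed = {r_reversed:.3f}")
```

For a regression with a single x variable, squaring r gives the R² value, and the sign of r matches the sign of the slope.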
Lab Exercise 2 (20 points)
Part 1. The data set (Otolithsize.xls) used for this exercise shows the
relationship between otolith size and fish size. Otoliths are ear bones in fish
heads that grow in proportion to fish length. This relationship can be used to
predict fish size from otolith size and to recreate an individual growth history by
measuring the distance between each daily increment. For this exercise, otolith
size is the independent variable and fish length is the dependent variable. These
data were collected from larval tilapia (Oreochromis niloticus) reared in a
laboratory. For this lab, you are to complete the following assignments:
1) Graph the data with otolith size on the x-axis and fish length on the
y-axis. Add a trend line to this graph.
2) What is the equation that describes the linear relationship between fish
size and otolith size?
3) In a summary paragraph please answer the following questions:
a. What is the nature (positive, neutral, negative) of the
relationship?
b. How strong (R²) is the relationship? Do you feel comfortable
predicting fish size from otolith size? Why?
c. How long is a fish with an otolith radius of 451.2 µm?
d. If an otolith grows 18 µm in a day, how much has the fish
grown?
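Questions 3c and 3d use a fitted line in two different ways: 3c asks for a predicted value of y at a given x, while 3d asks for the change in y that goes with a change in x. The sketch below shows the difference using the pond line from the introduction (y = 2.4x + 24.2), because the otolith coefficients must come from your own regression. The function names and the 3-unit change are illustrative assumptions.

```python
def predicted_value(slope, intercept, x):
    """Value of y on the line at a given x (uses slope and intercept)."""
    return slope * x + intercept


def predicted_change(slope, delta_x):
    """Change in y for a change in x. The intercept cancels out,
    so only the slope matters."""
    return slope * delta_x


# Pond line from the introduction; replace with your own fitted values.
slope, intercept = 2.4, 24.2

value_at_7_5 = predicted_value(slope, intercept, 7.5)
change_for_3_units = predicted_change(slope, 3)

print(f"Production at 7.5 units of fertilizer: {value_at_7_5:.1f}")
print(f"Extra production from 3 more units of fertilizer: {change_for_3_units:.1f}")
```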
Part 2. The second part of this exercise requires you to collect and analyze your
own data. Measure the length and width (in millimeters) of at least 20 leaves
from the same tree (no pine needles). Graph the data and describe the
relationship between leaf length and width, as you did in Part 1. You should
identify the tree so we can compare relationships among different trees.