Biology 404 – Ecology Lab Exercise 2
Dr. Fontenot

Student Learning Objectives:
1) Learn how regression analysis is used to determine the relationship between an independent and a dependent variable.
2) Learn to interpret the results of a regression analysis based on correlation coefficients, R² values, and p-values.
3) Learn to use regression analysis to predict a response.

Introduction:
We know that if we add fertilizer to a pond, we will get an increase in primary production (measured as chlorophyll a). Also, we expect that the more fertilizer we add, the greater the increase in primary production. In other words, the level of primary production depends on the amount of fertilizer we add. The amount of fertilizer we add does not depend on (is independent of) the primary production level. In this sense, primary production is the dependent variable (y) and fertilizer amount is the independent variable (x).

The general equation for a line is:

y = mx + b

where y = the dependent variable, x = the independent variable, m = the slope of the line, and b = the y-intercept. In the pond fertilization experiment, m would indicate how much primary production (y) increased for every unit of increase in fertilizer (x). The y-intercept (b) estimates what the primary production rate was without fertilizer.

This relationship could be determined by adding different levels of fertilizer to several different ponds and measuring the resulting primary production. The data may look like:

Fertilizer            0    5   10   15   20
Primary Production   23   39   44   65   70

It appears that primary production increases with fertilization, but by how much? To first understand the relationship between fertilization and primary production, make a graph.

[Figure: scatter plot of GPP (y-axis) against Fertilizer (x-axis)]

This graph demonstrates that as fertilizer increases, so does primary production (GPP). It does not tell us how much, on average, GPP increases if we increase fertilizer by one unit.
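To see why the graph alone cannot answer "how much", it can help to look at the change in primary production between neighboring fertilizer levels. The following is a minimal Python sketch using the values from the table above; the variable names are illustrative and not part of the lab materials.

```python
# Data from the pond fertilization table above.
fertilizer = [0, 5, 10, 15, 20]
production = [23, 39, 44, 65, 70]

# Change in primary production per unit of fertilizer
# between each pair of neighboring fertilizer levels.
per_unit_change = []
for i in range(1, len(fertilizer)):
    rise = production[i] - production[i - 1]
    run = fertilizer[i] - fertilizer[i - 1]
    per_unit_change.append(rise / run)

print(per_unit_change)  # [3.2, 1.0, 4.2, 1.0]
```

The step-to-step changes range from 1.0 to 4.2 per unit of fertilizer, so no single pair of ponds gives the average response. A line fitted to all five points is needed to estimate it.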
Regression analysis analyzes the data and determines the equation for the line that best explains the relationship between the two variables:

[Figure: scatter plot of GPP (y-axis) against Fertilizer (x-axis) with the fitted regression line, annotated y = 2.4x + 24.2, R² = 0.9633, p = 0.003]

The line equation that best explains this data set is:

y = 2.4x + 24.2

This means that for every one-unit increase in fertilizer, I would expect primary production to increase by 2.4. What would I predict my primary production to be if I added 7.5 units of fertilizer?

y = 2.4(7.5) + 24.2
y = 42.2

The R² value can range from zero to one. It describes how close the data points are to the estimated regression line, that is, the proportion of the variation in y that the line explains. For example, if all of the data points fell exactly on the line, then R² would be 1.0. The farther away the points are from the estimated regression line, the smaller the R². Because we would like to use regression to make predictions, we like a high R² value. The higher the R² value, the more comfortable I am in making a prediction about the response.

The p-value tells us whether the estimated regression line has a slope significantly different from zero. If the p-value is greater than 0.05, then the slope is not significantly different from zero. That means that the average value of y is as good a predictor as the line equation. It also means that we have no evidence of a response in y due to x (for this example, no evidence that fertilization affects primary production). If the regression line is not significant, we cannot make predictions. If the p-value is less than or equal to 0.05, we can make predictions.

Correlation coefficients tell us the strength and nature of the relationship. A positive correlation tells us that as x increases, y increases. A negative correlation tells us that as x increases, y decreases. The strength of the correlation between two variables increases as the correlation coefficient moves away from zero, toward either -1 or +1. The nature of the relationship (+ or -) can also be determined by looking at the sign of the slope coefficient from the regression analysis.
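The quantities discussed above (slope, intercept, correlation coefficient, R², p-value, and a prediction) can be computed directly from the pond data. The following is a minimal Python sketch using only the standard library. The function names are illustrative, and the p-value is obtained by numerically integrating the Student's t density with n - 2 degrees of freedom, which is an assumption about how the value on the graph was produced, not a description of the software used in this lab.

```python
import math

# Data from the pond fertilization example.
fertilizer = [0, 5, 10, 15, 20]
production = [23, 39, 44, 65, 70]

def fit_line(x, y):
    """Return the least-squares slope, intercept, and correlation coefficient r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

def t_density(t, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def slope_p_value(r, n, steps=2000):
    """Two-sided p-value for the hypothesis that the slope is zero.

    Uses t = |r| * sqrt((n - 2) / (1 - r^2)) and Simpson's rule
    (steps must be even). Not defined when r is exactly +1 or -1.
    """
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    h = t / steps
    area = t_density(0, df) + t_density(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_density(i * h, df)
    area *= h / 3          # area under the density between 0 and t
    return 1 - 2 * area    # probability in both tails beyond +/- t

slope, intercept, r = fit_line(fertilizer, production)
r_squared = r ** 2
p_value = slope_p_value(r, len(fertilizer))
predicted = slope * 7.5 + intercept

print(f"y = {slope:.1f}x + {intercept:.1f}")
print(f"r = {r:.4f}, R^2 = {r_squared:.4f}, p = {p_value:.3f}")
print(f"Predicted production at 7.5 units of fertilizer: {predicted:.1f}")
```

Run as written, the sketch should reproduce the values shown on the graph (y = 2.4x + 24.2, R² of about 0.9633, p of about 0.003) and the prediction of 42.2. The positive sign of r matches the positive slope, as described above.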
Lab Exercise 2 (20 points)

Part 1.
The data set (Otolithsize.xls) used for this exercise shows the relationship between otolith size and fish size. Otoliths are earbones in fish heads that grow in proportion to fish length. This relationship can be used to predict fish size from otolith size and to recreate an individual growth history by measuring the distance between successive daily increments. For this exercise, otolith size is the independent variable and fish length is the dependent variable. These data were collected from larval tilapia (Oreochromis niloticus) reared in a laboratory.

For this lab, you are to do the following assignments:
1) Graph the data with otolith size on the x-axis and fish length on the y-axis. Add a trend line to this graph.
2) What is the equation that describes the linear relationship between fish size and otolith size?
3) In a summary paragraph, please answer the following questions:
   a. What is the nature (positive, neutral, negative) of the relationship?
   b. How strong (R²) is the relationship? Do you feel comfortable predicting fish size from otolith size? Why?
   c. How long is a fish with an otolith radius of 451.2 µm?
   d. If an otolith grows 18 µm in a day, how much has the fish grown?

Part 2.
The second part of this exercise involves collecting and analyzing your own data. Measure the length and width (in millimeters) of at least 20 leaves from the same tree (no pine needles). Graph the data and describe the relationship between leaf length and width, as in Part 1. You should identify the tree so we can compare relationships among different trees.