TOPIC: Do rare males have a mating advantage? Using mathematical modeling to explore sexual selection - Evolutionary Ecology Application TUTOR GUIDE MODULE CONTENT: This module contains simple exercises for biology majors taking an introductory course in ecology and evolution to begin applying and interpreting mathematical models of biological problems. TABLE OF CONTENTS Alignment to HHMI Competencies for Entering Medical Students………………...1 Outline of concepts covered, module activities, and implementation……..……....2 Module: Worksheet for completion in class......................................................3 - 7 Pre-laboratory Review Questions (optional) Preparation Questions..….……..8 - 9 Pre-laboratory Exercises (mandatory)..........................................................10 - 12 Suggested Questions for Formative Assessment................................................13 Guidelines for Implementation……………………………...............…............13 - 14 Contact Information for Module Developers........................................................15 Alignment to HHMI Competencies for Entering Medical Students: Competency E1. Apply quantitative reasoning and appropriate mathematics to describe or explain phenomena in the natural world. Learning Objective E1.1 Demonstrate quantitative numeracy and facility with the language of mathematics Activity 1,5 E1.2. Interpret data sets and communicate those interpretations using visual and other appropriate tools. E1.3. Make statistical inferences from data sets (evaluating best fit linear relationships based on calculating error sums of squares) E1.5. Make inferences about natural phenomena using mathematical models 1,2,3 1 2,3,5 7 Mathematical Concepts covered: - mathematical modeling in a biological context - linear models - regression models In class activities: - group discussion - graphing and interpreting data - construction of linear models - using regression approach calculations for determining “best fit” relationships between two variables. Components of module: - preparatory assignment to complete and turn in as homework before class - in class worksheet: - discussion questions - plotting and interpreting data - calculations of sums of squared error values to quantitatively assess the goodness of fit of lines to observed data. - suggested assessment questions - guidelines for implementation Estimated time to complete in class worksheet - 60 minutes Targeted students: - first year-biology majors in introductory biology course covering ecology and evolution Quantitative Skills Required: - Basic arithmetic - Logical reasoning - Interpreting data from tables - Graph/Data Interpretation 2 Worksheet: Introduction to Mathematical Modeling in Biology Discussion Questions To begin, form groups of 3 students. Discuss and write down the answers to the following questions (individual answers should be turned in with your lab write up): 1. What is a mathematical model? What is a mathematical model of a biological system? What value would models have in these contexts? 2. How does a scientist know if a model is “good” at modelling a biological phenomenon? 3 Lab Exercises – Part I Students: To receive credit for this exercise, provide short answers and supporting graphs and calculations for all questions below (except 6 instructors will do this). In a species of cichlid fish, most males have yellow fins that they display when attracting mates or defending their territories. However, in certain populations of this species a small number of males have red fins instead. An evolutionary biologist hypothesized that the red fins may be under frequency-dependent sexual selection. According to this hypothesis, the fitness of fish with red fins is greater when fewer males have this trait (i.e., when more males have yellow fins). To test this hypothesis, the biologist set up several experimental pools with cichlids in which the frequency of males with red fins was controlled. The biologist then determined the number of offspring that each red male had. Observation (i) 1 2 3 4 5 6 7 8 9 10 Number of Red Males per 100 Individual Fish 8 11 13 15 19 23 27 33 35 44 Average Number of Offspring per Red Male 325 380 337 248 147 103 152 175 153 79 4 Instructions: Open the Excel program and get a piece of graph paper. 1) Individually: graph the data in the table on the graph paper provided. a. Draw a straight line that gets as close as you can to as many of the data points as possible. This is your linear model of the data. b. Determine the equation for your model and write it here*: *Remember that y = mx + b, where m is the slope and b is the y-intercept. 2) As a group: a. Choose the “best” model from those within your group. b. Open the Excel spreadsheet and edit the formula in cell C5 to reflect your equation (there is already an equation in that cell: =m*A5+b, referenced to the value in cell A5 - change m and b to your calculated values of m and b). c. Copy cell C5 into the remaining cells in column C. 3) As a group: Copy cell D5 into the remaining cells in column D. Examine and discuss the Yi - Ŷi values, answering the following questions: a. Is the line generally too high or low at one end of the data range? ______________________________________________________________ ______________________________________________________________ b. What would happen to the values of Yi - Ŷi if you increased the slope of the line? ______________________________________________________________ ______________________________________________________________ c. What if you increased the y-intercept? 5 ________________________________________________________________ __ ______________________________________________________________ d. Decide together what would improve your linear model. 4) Now obtain data from another group. Write the equation for this model: ___________________________ 5) Whose group had the better fitting line? How do you know? Can you tell easily by looking at lines or do you need error sum of squares calculations to evaluate your two models? (to calculate error sum of squares, copy cell E5 into the remaining cells in column E) ______________________________________________________________ ______________________________________________________________ ______________________________________________________________ ______________________________________________________________ 6) Instructors: take example from one group & edit equations on their computer, then graph data (Yi) and Ŷi for each model (scatter chart with line but no markers). This will display data & the Student Group’s two models, which can be projected for entire class to see. Can also show the SS for each of their models. 7) Explain your own results (from your own model) in the context of the original hypothesis. Do the data support the idea that male color might be a sexually selected trait (and under frequency dependent selection)? If so, how would you expect the frequency of red and yellow color morphs to change across generations? MODULE FEEDBACK - Each year we work to improve the modules in the active learning "discussion" sections. Please answer the following question with regard to this module on this sheet and turn in your answer to the TA. You can do this anonymously if you like by turning in this sheet separately from your module answers. How helpful was this module in helping you understand a fundamental concept in mathematical modeling of biological data? A = Extremely helpful B= Very helpful 6 C= Moderately helpful D= A little bit helpful E = Not helpful at all Module Rating ____________ Thank you! 7 Pre-laboratory Review Questions (OPTIONAL) This homework is designed to review lines and linear models and to prepare you for the upcoming module on mathematical modeling and subsequent models that you will have to work with. If you encounter an unfamiliar term, please refer to a textbook for high school algebra or pre-calculus. On her way to her volleyball game, Joanne stops by the grocery store for a healthy snack that will give her energy for the game. She decides to get a jar of peanut butter and some bananas. She notices that the peanut butter costs $3.99 for a jar and that the bananas cost $0.49 per pound of bananas. 1. How much will one pound of bananas and a jar of peanut butter cost? 2. Joanne decides to bring a snack for each of the 6 girls on the volleyball team. How much will it cost for six pounds of bananas and a jar of peanut butter? 3. On the grid provided below, draw a Cartesian coordinate system with number of pounds of bananas on the x-axis and the total cost of the peanut butter and bananas on the y-axis. Plot the two points that you found for the previous two questions. 8 7 6 5 4 3 2 1 0 4. Plot the line on the grid above through the two points that you found. Now compute the slope of the line that goes through these two points (remember, “rise over run”: (y2-y1) = m (x2-x1) where m is the slope of the line). 8 Now write the equation of the line in slope-intercept form, y= mx + b, where b is the y-intercept (the place where the line crosses the y axis). Congratulations! You have just made a mathematical model of peanut butter and bananas! 5. What does the slope correspond to in terms of the scenario above? What is the y-intercept of the line? What does it correspond to in the scenario above? 6. What is the independent variable? What is the dependent variable? Respond in symbols as well as in words. 7. How would the line change if the price of bananas increased or decreased? How would the line change if the cost of the jar of peanut butter increased or decreased? 9 Pre-laboratory Exercise: Evolutionary Ecology (excel file that goes with this exercise: math modeling pre-laboratory ecology data.xlsx) An ecologist was studying a transition zone from grassland to conifer forest in central Asia. The ecologist wanted to see if a linear model could describe the relationship between latitude and the density of conifer trees. Using a small amount of data from a study in North America (NA), the ecologist developed the following model: Ŷ = 36x -1767 where Ŷ is the predicted conifer density and x is the latitude. The ecologist then collected data from the first five study sites in Asia, shown in Table 1. In the following exercises you will evaluate how well the model developed from data collected in North America fits the data collected from Asia. A common method for evaluating a model involves comparing the actual data with the predictions of the model. Table 1. (KEY) Study Site (i) 1 2 3 4 5 Latitude (degrees) 48.97 49.35 49.55 49.70 50.24 Conifer density (number of trees per hectare; Yi) 2.2 25.3 17.5 47.8 46.9 Ŷi (estimated from the NA model) Yi – Ŷi 1. Begin by using a calculator to compute the conifer density that the North American model predicts at each of the latitudes of the five study sites. This will give you a series of Ŷi for each site, where i = 1, 2, 3, 4, or 5 corresponding to the study sites at each latitude. Record Ŷ in Table 1 of your worksheet and in your excel spreadsheet. 2. The next column in the table will show the difference between the measured conifer density in Asia (Yi) and the predicted density (from the model; Ŷi) at each latitude. Use a calculator to fill in this column. 3. Plot the actual observed data (Asian). To do this, place a mark at each 10 value of Yi at each value of i (1 through 5) on a piece of graph paper. 4. Next, graph the model Ŷ = 36x -1767 (Hint: To graph the model, you will find it easier to plot two or three values of Ŷi and draw a line directly through them, rather than drawing a line through the y-intercept). 5. At what latitude does this model predict that conifer density equals zero trees per hectare (ha)? __________________________________________ 6. Look at where the model (the line you’ve just drawn) is relative to the data points. How could the model be changed to improve the fit of the line to the actual data? In one or two sentences, support your recommendation for how to improve the model, referring to the values that you calculated. ______________________________________________________________ ______________________________________________________________ ______________________________________________________________ 11 Notice that the data points don’t actually all fall on the line. The distance that an observation (Yi) at each value of i falls from the line (Yi – Ŷi) is a form of "error". In a linear model, the line through the data that best explains, or fits, the data is the one that the observed data points are closest to. We can evaluate how well a model fits the data by adding up (summing) the errors for all the data points. However, some errors may be negative while others are positive (some points will be above the line while others will be below it). So to evaluate how well a model fits the data we first we square the value of each error (Yi – Ŷi) and then sum these values for all data points. This procedure is formally described by the following: or another way of putting it is = (Y1 - Ŷ)2 + (Y1 - Ŷ) 2 +.... (Yn - Ŷ) 2 where the "error sum of squares" is the summation of all of the (Yi – Ŷi)2 values for every observation, where n is the total number of observations. So now we can say the model that best explains the data is the one with the least sum of squares error. 7. Next you will use the Excel spreadsheet provided (math modeling prelaboratory ecology data.xlsx) to compute the sum of squares for the model. This will give you a chance to practice the Excel skills you will need to complete the exercises in class. a. Double click on cell F3 and enter the formula ‘=E3^2’ (which tells excel to square what’s in cell E3). b. Hit enter to save this edit. c. Copy cell F3 and paste into cells F4 through F7. i. Excel will automatically adjust the formula to square the values in cells E4 through E7. d. Record the Error sum of squares (SS) value displayed in cell F8 here __________. 8. Model 2 in the spreadsheet is based on the equation computed by Excel for this data set (y = 36.051x - 1758.8). Is Model 2 a better fit of the data? How do you know? __________________________ ___________________________ ______________________________________________________________ ______________________________________________________________ ______________________________________________________________ 12 Suggested Questions for Formative Assessment Learning Objective E1.2. Interpret data sets and communicate those interpretations using visual and other appropriate tools. E1.3. Make statistical inferences from data sets (evaluating best fit linear relationships based on calculating error sums of squares) E1.5. Make inferences about natural phenomena using mathematical models Activity 1 3,5 7 Guide for implementation: Discussion Have students break up into groups of 3 to discuss and come up with answers to the questions. Groups get together to talk about each question – work pauses after 10 minutes or as soon as you think each group has come up with something for both questions (no longer than 15 minutes). The TA should then pick a person from 3 groups chosen at random to share with the class their group’s answer to one of the questions. Tell them that everyone in the group should be prepared to share the answer with the class, as you will choose who speaks randomly among the group. Suggested ideas that should emerge from each question are listed below. An alternative way to run this discussion section of the class would be to run it “Question Time” style where you let them discuss each question in turn for three minutes, then ring a bell or use another method to cut off discussion, then have the whole group report their answer. Then move on to the next question. 13 Guide for Implementation: Lab Activity Students work in groups. Each group has a computer. Materials needed: Graph paper for each student 1) TA hands out graph paper to each member of a group and instructs students to open excel spread sheet (Introduction to Mathematical Modeling.xlsx). Each individual is instructed to plot the data points and construct a line through the data points based on the two model equations. Each individual will then decide which model (line) more accurately “fits” the data and provide rationale. One or two groups share their decision with the class. a) The graph paper should have x and y axes drawn in lower left when held in landscape orientation. b) If possible, project the data plot & regression lines for discussion –maybe via instructor’s computer & projector or overhead projector. c) Discussion of possible data outliers may come up – explain why the data may not fall exactly along the best fit line. 2) Instructor provides brief explanation of least squares method of evaluating regression lines, writing the equation for this calculation on the board. Students then perform these calculations in their groups (can divide up who does computation for which model). a) Sample explanation: By plotting the data it looks like y varies with x. The models describe a linear relationship where y varies with x. But the data points don’t actually all fall on the line. The distance a point falls from the line is a form of error. The model that best explains, or fits, the data is the one that the data points are closest to, or has the least error (the smallest error sum of squares). n ErrorSumOfSquares = å (Yi - Yˆi ) 2 i=1 Yi is a data point at Xi, and Yhat is the value of Y that falls on the line at Xi as defined by the model. 3) Instructor-led discussion (as a class) about which group’s model is the “best fit”. 14 Module Developers: Please contact us if you have comments/suggestions/corrections Kathleen Hoffman Department of Mathematics and Statistics University of Maryland Baltimore County khoffman@math.umbc.edu Jeff Leips Department of Biological Sciences University of Maryland Baltimore County leips@umbc.edu Sarah Leupen Department of Biological Sciences University of Maryland Baltimore County leupen@umbc.edu Acknowledgments: This module was developed as part of the National Experiment in Undergraduate Science Education (NEXUS) through Grant No. 52007126 to the University of Maryland, Baltimore County (UMBC) from the Howard Hughes Medical Institute. 15