Biol 131 Intro to Evolution Fall 2001

Quantitative Variation in Helianthus annuus (Sunflower) Seed Stripes

Quantitative variation within and among groups is at the base of all studies of evolutionary change. In order to discuss changes in populations we need to develop a language and a set of concepts that allow us to describe and compare characteristics of populations. These skills involve understanding the distribution of characters in two broad categories: measures of central tendency, and ways of describing the distribution of values around these measures of centrality. In this lab we will measure the extent of variation in a natural population, within and among groups, using the seed of the common sunflower (Helianthus annuus) as the subject of our investigation. The language and concepts that we begin to explore today are the basis for making comparisons between different populations that exist at the same time and for examining changes in a population at different times.

We will use histograms to represent the distribution of a population with respect to a particular character. In a histogram the character being discussed is represented on one axis, while the number of individuals in the population with each value of the character is represented on the other axis.

In this graph we see a representation of the following hypothetical data:

#Stripes   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14
#Seeds     5   3  25  65  68  24  74  32  25  12   8  14  10   5   2

Part 1 – Collecting data

Each group will get a pile of sunflower seeds. First, take a few minutes to spread them out and look at the ways they vary. In this first part of the activity you will divide up your sunflower seeds based on the number of stripes that they have. We will probably have some class discussion of how to score stripes. Keep the seeds in separate piles based on the number of stripes and score as many as you can in 30 minutes.
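If you want to preview what a histogram of stripe counts looks like before building one with test tubes, the hypothetical data above can be drawn as a rough text histogram. This is only a sketch in Python: the function name text_histogram and the choice of one "#" per two seeds are assumptions made for illustration, not part of the lab procedure.

```python
# Hypothetical data from the handout: number of stripes -> number of seeds.
stripes = list(range(15))
seeds = [5, 3, 25, 65, 68, 24, 74, 32, 25, 12, 8, 14, 10, 5, 2]


def text_histogram(values, counts, scale=2):
    """Return one line per value, drawing one '#' per `scale` individuals.

    The bar length is rounded up so that small counts still show a bar.
    """
    lines = []
    for value, count in zip(values, counts):
        bar = "#" * -(-count // scale)  # ceiling division
        lines.append(f"{value:>2} | {bar} {count}")
    return lines


for line in text_histogram(stripes, seeds):
    print(line)

print("Total seeds scored:", sum(seeds))
```

Each printed row plays the role of one test tube: the longer the bar, the more seeds had that number of stripes.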
Part 2 – Representing your data

Record the number of seeds in each category (number of stripes). Then place them in test tubes arranged in a rack to create a histogram-like representation of your data. Compare your population to several other groups' populations. Then write 2-3 sentences describing your population with respect to the number of stripes.

Part 3 – Measures of central tendency

Calculate the mean, median and mode for your population. Show where each of these occurs by laying a piece of paper at the base of your test tubes and marking the position of each measure. Compare your results with a couple of other groups and write a sentence or two describing your population.

$\text{mean} = \bar{x} = \frac{\sum x}{n}$, where x is a value and n is the number of values

This statistic is also commonly referred to as the average. It is computed by adding each of the values in the distribution and dividing by the number of values in the distribution.

Median: the middle value

The median of a distribution can be found by putting the numbers in ascending order and taking the number in the middle, e.g. if the distribution has five values (1, 2, 4, 5, 5), then the third value is the median (4). If the distribution has an even number of values, there are two middle values, and the mean of those two values is used.

Mode: the most frequent value

The mode is simply the value which occurs most often, e.g. the mode of this distribution, 8, 7, 5, 5, 5, 3, 1, is 5.

[Figure: Rat Weight histogram with the mode, median, and mean marked; x-axis Weight (g), y-axis Number of rats.]

Part 4 – Measures of distribution

Enter your data into the JMP statistics program and have it calculate the standard deviation, skewness, and kurtosis values for your population distribution.

$\text{Standard Deviation} = s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$, where x is a value, $\bar{x}$ is the mean, and n is the number of values

The standard deviation is a measure of the spread of a distribution. It is calculated by first calculating the mean of the distribution.
Then, the difference between each value and the mean is squared, and all of those squared differences are added up. That sum is divided by the number of values in the distribution minus one, and the square root of the result is the standard deviation.

The standard deviation of a distribution tells you where most of the values are found. If you look at the values between the mean and one standard deviation above the mean, about 34% of the population values are represented (assuming that the population is normally distributed). We can extend this understanding of how the standard deviation describes a distribution to arrive at the following conclusions:

68% of the population falls within ±1 SD of the mean
95% of the population falls within ±2 SD of the mean
99.7% of the population falls within ±3 SD of the mean

This is shown graphically on the figure below.

[Figure: Shell Length histogram with the mean marked; x-axis Shell Length, y-axis Number; ±1 SD covers 68%, ±2 SD covers 95%, ±3 SD covers 99.7%.]

$\text{Skewness} = \frac{1}{n s^3} \sum (x - \bar{x})^3$, where s is the standard deviation, $\bar{x}$ is the mean, and x and n are as before

Skewness is a statistic that describes the relative sizes of the tails of the distribution. A negative skewness value implies that the left tail of the distribution is longer. A positive skewness value implies that the right tail of the distribution is longer. The calculation is similar to the calculation of the standard deviation. After finding the mean, the cubes of the differences between each of the values and the mean are summed. Then, that sum is divided by the product of the number of values in the distribution and the cube of the standard deviation.

[Figure: Fruit Weight histogram; x-axis Fruit Weight, y-axis Number.] A positively skewed distribution. The red line represents a symmetrical distribution for reference. Note that in a positively skewed distribution the mean will be greater than the median population value.
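The calculations described so far can be checked by hand or with a short script. The following is a minimal sketch in Python, applied to the hypothetical stripe data from the start of this handout. JMP is the program used in this lab, so this is only an illustration; the function names std_dev and skewness are made up here and follow the formulas as written in this handout (n - 1 in the standard deviation, n in the skewness).

```python
import math
from statistics import mean, median, mode

# Hypothetical data from the handout: number of stripes -> number of seeds.
stripes = list(range(15))
seeds = [5, 3, 25, 65, 68, 24, 74, 32, 25, 12, 8, 14, 10, 5, 2]

# Expand the frequency table into one value per seed.
values = [s for s, count in zip(stripes, seeds) for _ in range(count)]


def std_dev(xs):
    """Square root of the summed squared deviations divided by (n - 1)."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def skewness(xs):
    """Summed cubed deviations divided by n times the cube of the SD."""
    n = len(xs)
    m = sum(xs) / n
    s = std_dev(xs)
    return sum((x - m) ** 3 for x in xs) / (n * s ** 3)


print("mean     =", round(mean(values), 2))
print("median   =", median(values))
print("mode     =", mode(values))
print("SD       =", round(std_dev(values), 2))
print("skewness =", round(skewness(values), 2))
```

For the hypothetical data the mean is about 5.55, the median is 5, and the mode is 6. The mean is greater than the median, and the skewness comes out positive, which matches the longer right tail of that distribution.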
$\text{Kurtosis} = \frac{1}{n s^4} \sum (x - \bar{x})^4 - 3$, where s is the standard deviation, $\bar{x}$ is the mean, and x and n are as before

Kurtosis is a statistic that describes how sharp the peak of the distribution is. A negative score indicates platykurtosis (a relatively flat peak), while a positive score indicates leptokurtosis (a relatively sharp peak).

[Figure: mesokurtic ("normal"), platykurtic, and leptokurtic curves; x-axis X, y-axis Number.]

Part 5 – Multidimensional variation

Using the seeds from within one category of stripe number, take one additional set of measurements on your seeds. For example, if you looked at all 19 seeds that had 11 stripes (hypothetical data), you could measure seed weight, buoyancy, seed length, etc. to examine intraclass variation.

Lab Write up

Your write up should include:

Your definition for the character stripe number.
A clearly labeled graph of the distribution of your population on which you have indicated the mean, median, mode (with reference to the histogram) and the calculated values for standard deviation, skew and kurtosis.
A clearly labeled graph of the distribution of another group's population.
A short description of your data (not simply listing the statistics but describing it as you might to a friend who was not familiar with statistics) and a comparison of your data to the other group's data. Do you think they are both samples from the same population? Why or why not?
A definition of the second character you measured.
A description of your findings on intraclass variation, including a histogram of your results.
A brief discussion of how this detailed information about variation is relevant to evolution.
Your raw data for variation with respect to stripe number in your population.

This is a modified version of a lab developed by John Jungck at Beloit College. Several of the figures were taken from the Biometrics module by Daniel Hornbach published in the BioQUEST Library. See http://bioquest.org/biostat for additional information about these statistics.
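The kurtosis formula from Part 4 can be sketched the same way. This is an illustration in Python rather than what JMP does internally (JMP may apply small-sample corrections that this handout's formula does not), and the names excess_kurtosis and describe_peak are invented for the example.

```python
import math


def excess_kurtosis(xs):
    """Summed fourth-power deviations over n times s^4, minus 3.

    s is the standard deviation computed with n - 1, as in this handout.
    """
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return sum((x - m) ** 4 for x in xs) / (n * s ** 4) - 3


def describe_peak(k):
    """Translate the sign of the kurtosis score into the handout's terms."""
    if k < 0:
        return "platykurtic (relatively flat peak)"
    if k > 0:
        return "leptokurtic (relatively sharp peak)"
    return "mesokurtic"


# Two made-up distributions, chosen only to show the two signs.
flat = [1, 2, 3, 4, 5]
peaked = [-5] + [0] * 20 + [5]

for name, data in [("flat", flat), ("peaked", peaked)]:
    k = excess_kurtosis(data)
    print(name, round(k, 2), describe_peak(k))
```

The evenly spread values give a negative score and the tightly clustered values with two distant outliers give a positive score, which is the pattern described in Part 4.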
A more detailed explanation of how to calculate these statistics can be found in Biometry by Sokal and Rohlf, which is the text for the Beloit College Biometrics course. Chapter 4, Descriptive Statistics, gives a good explanation of the first four statistics. Chapter 2 is on frequency distributions and may be of some help.

Range
Standard deviation: mean ±1 SD is 68% of population; ±2 SD is 95% of population; ±3 SD is 99.7% of population.
Skewness: +skew is shifted left or pulled right (mean is greater than the median); –skew is shifted right or pulled left (mean is less than the median).
Kurtosis: – is platykurtic (flat); + is leptokurtic (pointy); and 0 is mesokurtic (normal).

[Intro to JMP: columns, variables, distributions, changing axes, looking at moments.]
Enter your data into JMP and generate a graph of the distribution of your population. Print it. [print preview]
putting it on paper in front of test tubes
10-15 minute intro to measures of distribution
10-15 minute intro to JMP
They enter data, do calculations, print graphs
[Put up on board some of the means and see if that is a useful way to describe populations for comparisons. Discuss why central tendency may not be such an important factor for a natural historian with a Darwinian perspective.]