Artificial Selection Lab Big Idea 1 – Lab 1

Grow Wisconsin Fast Plants (Brassica rapa) and now we wait…and observe variations in the plants…like…? Hmmm….height? OK.

Finally, day 7…. Trichomes on Cannabis: epidermal outgrowths of various kinds.

Now we need to measure the heights of all our plants and do some appropriate descriptive statistics for the class data. Where is the class data? On the next slide…

Plant Height Data for a sample size of 41 plants or N=40 at day 7. What type of descriptive stats should we do with this data to study the population as a whole in terms of height? (Watch Anderson video on standard deviation)
1. Histogram
2. Mean
3. Median
4. Range
5. Standard deviation
Let’s do this…

What is a histogram? These are all histograms. What is the commonality?

Histograms are graphs that reveal the distribution/frequency of your data (how often particular values appear).

What goes on the x-axis? Your range of data. This can be individual values (left) or ranges/bins of values (right).

What goes on the y-axis? Frequency, or the number of times a given value appears.
Ex1) How many times did a value of 16 appear in a multiple choice test given to a class of students according to the histogram on the left? 3
Ex2) How many people were paid between 77 and 87 thousand dollars according to the above histogram? ~330
If enough data is collected, histograms can reveal a normal distribution in the data around a central mean. What is the approx. mean of the birth weight data shown on the right? 3.5 kg (the apex of the normal distribution curve)

Now let’s build a histogram for our plant height data… What should we first do? Sort the data!!

Sorting the Data
Mac: Highlight the two columns, then use Data > Sort to sort according to height.
PC: Figure it out…lol

What next? Estimate an appropriate bin size (or no bins). Write the bins in one column of the Excel sheet and determine the frequency next door…see right. Bin size is critical, as you will see on the next slide, so “choose wisely”.
[Figure: data sorted by height]

Binning and Graphing the Data
[Figure: two histograms of the plant height data; x-axis: Height bins (cm), y-axis: # of plants. Scenario 1 (large bin size): bins 1 to 5, 6 to 10, 11 to 15, 16 to 20, 21 to 25. Scenario 2 (small bin size): 1 cm bins from 1 to 16, with one bar marked "Outlier?".]
Which histogram provides more information about the distribution of plant heights in our population? Scenario 2, as the data’s resolution is superior and tells a more complete story.

Calculate the mean (=average()), median, and range.
[Figure: histogram of plant heights in 1 cm bins from 1 to 16; x-axis: Height bins (cm), y-axis: # of plants.]
The mean and median: measures of central tendency.
The range: a measure of spread.

Descriptive Statistics: Histograms and Distributions

The MEDIAN: This is simply the data value that falls in the middle after sorting the data from low to high. For example, in the sample to the right, the value that separates the higher and lower halves of data is 291 ms, which is the median.
Reaction Time (ms): 265, 273, 286, 291, 293, 300, 330
Just arrange the data from highest to lowest or vice versa and find the middle number…

The MEDIAN: This is simply the value in a data set that separates the higher half of a sample from the lower half. What if there is an even number of data points, as shown on the right? Again, sort the data from low to high and now just average the two middle numbers. In this case you average 286 and 292 to get a median of 289.
Reaction Time (ms): 265, 273, 286, 292, 293, 300

Stats can be misleading…be very wary…
For example, a college boasts that the average starting salary of last year’s graduating class was $362,000 per year. This sounds quite impressive… However, what they did not tell you was that the class size was 30 students, of which 29 started at $30,000 a year and one student was a first-round draft pick in the NFL making approximately $10,000,000 per year.
Such a data point ($10,000,000 per year) can be considered an outlier, which is a data point much higher or lower than the rest of the data points. An outlier can also be seen in the histogram to the right of our athlete data…perhaps the person blinked while the reaction time was being measured.
[Figure: histogram of reaction times; x-axis: Time (ms) in 10 ms bins from 201-210 to 501-510, y-axis: frequency.]

Stats can be misleading…be very wary…
For example, a college boasts that the average starting salary of last year’s graduating class was $362,000 per year.
This sounds quite impressive… However, what they did not tell you was that the class size was 30 students, of which 29 started at $30,000 a year and one student was a first-round draft pick in the NFL making approximately $10,000,000 per year. What is the median of this data set? $30,000. The median is far less sensitive to outliers than the mean.

Stats can be misleading…be very wary… That said, the median can hide extremes...
Ex) Let us consider the wages of ‘The Widget Company’ below: we will increase the earnings of the CEO from $100,000 to $500,000. How does the median reported to the public change? It doesn’t; it stays at $30,000. You can change it to a trillion and the median will not budge…

Stats can be misleading…be very wary… Look at our data… We have what appears to be an outlier…a single plant with a height of 1 cm, way off the beaten path… What does the histogram inform us about the mean then? It may not be so accurate because of this potential outlier, and therefore the median may be the better value to use as the center of the data. What would be the mean without the outlier? 10.6…closer to the median…
[Figure: histogram of plant heights in 1 cm bins from 1 to 16; x-axis: Height bins (cm), y-axis: # of plants.]

Stats can be misleading…be very wary… The range can also be misleading…
Ex) The range of 1 to 16 makes it seem that our plant heights might be evenly spread across the entire range. However, what does the histogram show us? The vast majority falls between 6 and 16. A very different picture indeed.
[Figure: the same histogram of plant heights.]
The range is a measure of spread, but it should never be used as the only measure of spread, as it tells you nothing about what is going on in the middle.
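A quick numerical check of the salary example and the range caveat, written as a minimal Python sketch. The variable names are illustrative only; the numbers are the ones quoted on the slides (29 graduates at $30,000 and one at $10,000,000).

```python
from statistics import mean, median

# Salaries from the slide example: 29 graduates at $30,000, one at $10,000,000.
salaries = [30_000] * 29 + [10_000_000]

avg = mean(salaries)                      # pulled far upward by the single outlier
mid = median(salaries)                    # middle value, unaffected by the outlier
spread = max(salaries) - min(salaries)    # range: depends only on the two extremes

print(f"mean   = ${avg:,.0f}")
print(f"median = ${mid:,.0f}")
print(f"range  = ${spread:,.0f}")
```

The mean lands near the advertised $362,000 while the median stays at $30,000, and the range is set entirely by the one extreme salary, which is why it says nothing about the middle of the data.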
Then what other measure of spread will help us talk about the middle and not just the edges of the data? Standard Deviation (s or σ).
σ = the lowercase Greek letter sigma
[Figure: histogram of plant heights in 1 cm bins from 1 to 16; x-axis: Height bins (cm), y-axis: # of plants.]

What is Standard Deviation (s or σ)?
The standard deviation is a number that you calculate based on your data. This number will tell you more precisely than the range where your data is located relative to the mean…not just between 1 and 16 like before. How does it do that? What does this number tell me?
Quite simply: our data has a mean of 10.37 cm. Let’s say we calculate the standard deviation to be σ = 1.1. Therefore we would write 10.37 ± 1.1 cm. This tells you to add 1.1 to the mean, getting 11.47 cm, and subtract it from the mean, getting 9.27 cm. Great, so what? This tells you that about 68% of your data lies between 9.27 cm and 11.47 cm (assuming the heights are roughly normally distributed)!! Or that the next plant you grow will have about a 68% chance of being between 9.27 and 11.47 cm.
[Figure: histogram of plant heights in 1 cm bins from 1 to 16; x-axis: Height bins (cm), y-axis: # of plants.]

What is Standard Deviation (s or σ)?
[Figure: histogram of two sets of data, blue and red, of any data you want it to be….]
Which data set, red or blue, has the greater mean? They have the same mean, and the peaks of both normal distributions align.
Which data set has the greater standard deviation? The blue one. The red data is tighter, closer to the mean, so its standard deviation is smaller (68% of the data will be closer to the mean than in the blue data set).
Conclusion: The smaller the standard deviation, the closer the data is to the mean and the narrower the peak!!

What is Standard Deviation (s or σ)?
What do researchers hope for their standard deviation values to be? As small as possible, making the data peaks as narrow as possible. Why? Because we typically compare two or more data sets to each other, as we will do later… Look to the right. We are comparing the blue data, say blood pressure of standard people, to the green data, blood pressure of people on medication to lower blood pressure. Now can you figure out why they want the peaks to be as narrow as possible? To tell if there is a difference between the groups!!!

What is Standard Deviation (s or σ)?
Why do these peaks have spread associated with them? Why can’t all the data just fall on one point, giving us a line? Why can’t all the plants just have one height??
1. Natural variation in a population…and there is nothing you can do about this.
2. Variables not being controlled tightly enough, like temperature, water, sunlight, etc…or variables that you are not considering, but should be.
3. Error in one’s instruments of measurement (not making a mistake)…a ruler can only measure so well…significant digits…cough, cough!
4. Small sample size
CONCLUSION: Nature has enough variation. Researchers need to control important variables tightly, develop and utilize instruments of measure appropriate to the study, and do their best to have a large sample size.

What’s up with this kid?
What is Standard Deviation (s or σ)?
Now that you understand standard deviation (SD, s, σ), what is the meaning of the figure to the left?
68% of data falls within 1 SD of the mean
95% of data falls within 2 SD of the mean
99.7% of data falls within 3 SD of the mean
(s = standard deviation, x̄ = mean)

What is Standard Deviation (s or σ)?
I love SD. Please can you show me how to calculate it from my data? It’s so simple!
You really just want to know how far away all of your data points are from the mean!!...and a little more:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
s = standard deviation, $\bar{x}$ = mean, n = sample size, x = data value

What is Standard Deviation (s or σ)?
I love SD. Please can you show me how to calculate it from my data?
1. Determine the average (mean).
2. Subtract the mean from every one of your data values in the population.
3. Square each of the differences.
4. Sum up the squares…called the Sum of Squares (SOS).
5. Divide the SOS by the sample size minus one, (n – 1); this number is called the variance.
Why divide by n – 1 and not just n? You are almost averaging the squares of the differences. If n = 1000 and you subtract 1, it makes really no difference to the SD. However, if the sample size is 2 and you subtract 1, the SD is much larger…penalized for a small sample size you are!!!!!!

What is Standard Deviation (s or σ)?
I love SD. Please can you show me how to calculate it from my data?
1. Determine the average (mean).
2. Subtract the mean from every one of your data values in the population.
3. Square each of the differences.
4. Sum up the squares…called the Sum of Squares (SOS).
5. Divide the SOS by the sample size minus one, (n – 1); this number is called the variance.
6. Now just take the square root of the variance to get back to the original units, and there you go…the SD!

Average height of your population of Wisconsin Fast Plants (Brassica rapa): 10.4 ± 2.94 cm
[Figure: histogram of plant heights in 1 cm bins from 1 to 16; x-axis: Height bins (cm), y-axis: # of plants.]

We now need to do some artificial selection…. What should we do? Directional? Disruptive? Stabilizing? Type of graph? Histogram! Directional? Me too…let’s do it. But how?
Let’s kill the tallest 25% of the plants before formation of flowers (you should know why) and push the population towards being shorter (select for allele combinations that give shorter plants)…
[Figure: histogram of plant heights with the tallest plants marked "Remove these".]

We now need to do some artificial selection…. Now recalculate the descriptive stats for height of your new parental population before you breed them.

                            Original Population    Selected Population (P generation)
Average (cm)                10.4                   9.26
Standard Deviation (σ)      2.94                   2.45

Now breed the selected P generation and look at the phenotype (height in this case) of the F1 generation.

F1 generation data that you collected. Excel file is on website. Guess what you do now with this data? Descriptive stats of course…histogram, average, sigma (SD),…

                            Original Population    Selected Population (P generation)    F1 Generation
Average (cm)                10.4                   9.26                                  9.61
Standard Deviation (σ)      2.94                   2.45                                  2.53

The big question now…Is the original population significantly different from the F1 generation in terms of height due to the artificial selection?

How can we determine this? Is the difference in average enough to make a conclusion? Try this…make a bar chart and a histogram of both populations in the same chart.
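The selection step from the earlier slide (remove the tallest 25%, then recalculate the descriptive stats) could look something like this in Python. This is only a sketch: the heights below are made-up numbers, not the class data, and `select_shortest` is a hypothetical helper name. `statistics.stdev` divides by n – 1, matching the SD steps on the slides.

```python
from statistics import mean, stdev

def select_shortest(heights, remove_fraction=0.25):
    """Drop the tallest `remove_fraction` of plants and return the rest, sorted."""
    ranked = sorted(heights)
    n_remove = int(len(ranked) * remove_fraction)  # rounds down
    return ranked[: len(ranked) - n_remove]

# Hypothetical heights in cm, NOT the class data.
heights = [1, 6, 7, 8, 8, 9, 9, 10, 10, 10, 11, 11, 12, 12, 13, 14]

parents = select_shortest(heights)

print(f"original: n={len(heights)}, mean={mean(heights):.2f}, SD={stdev(heights):.2f}")
print(f"selected: n={len(parents)}, mean={mean(parents):.2f}, SD={stdev(parents):.2f}")
```

Removing the tall end of the distribution lowers the mean of the parental population, which is the direction of change the slides report for the real data (10.4 down to 9.26).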
Standard Error and Error Bars

                            Original Population    Selected Population (P generation)    F1 Generation
Average (cm)                10.4                   9.26                                  9.61
Standard Deviation (σ)      2.94                   2.45                                  2.53

[Figure: bar graph of average height for two groups, 1 = Original population and 2 = F1 Generation, labelled n = 38 and n = 41; y-axis from 0 to 12; no error bars.]
A bar graph showing averages of a group without error bars is meaningless… Error bars typically indicate either standard deviation or standard error. We will use standard error. How does one calculate standard error, you ask?
$$SE_{\bar{x}} = \frac{s}{\sqrt{n}}$$
$SE_{\bar{x}}$ = standard error of the mean, s = standard deviation, n = sample size

Standard Error and Error Bars
Standard Error ($SE_{\bar{x}}$) values shown on the slide: 0.376 and 0.410.
[Figure: the same bar graph with the y-axis from 9 to 12 and error bars; 1 = Original population, 2 = F1 Generation, labelled n = 38 and n = 41.]
Error bars indicate the standard error of each group.

Histogram of both groups:
[Figure: overlaid histograms of both groups; x-axis: Height bins (cm) from 1 to 16, y-axis: # of plants. Gold = original generation, Red = F1 generation.]
Even though the averages are different, the histogram shows that the data overlap dramatically, which you would expect if you looked at the standard deviations of each group. How does a researcher deal with this? We would need to use a statistical test known as a t-test to give us a p-value, which of course would tell us… The probability of seeing a difference at least this large if the null hypothesis (no difference between groups) were true!!

Histograms are graphs that reveal the distribution/frequency of your data (how often particular values appear).
If enough data is collected, histograms can reveal a normal distribution in the data around a central mean. What is the approx. range of the birth weight data shown on the right? ~0.9 to 5.0 kg

So should we be focusing on the median more than the mean???? No. Generally speaking, the mean is TYPICALLY a far more accurate measure of central tendency than the median once outliers have been dealt with. To convince yourself, try this exercise from Seeing Statistics (www.seeingstatistics.com):

The median is more resistant to extreme, misleading data values so it would seem to be the clear choice. However, we also need to consider accuracy. Is the median or the mean more likely to be close to the true value? To evaluate the relative accuracy of the median and the mean, let's consider how they do when we know the true center of the data. Suppose that the only possible scores are the whole numbers between 0 and 100.

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100

The center of these 101 numbers, whether we use the median or the mean, is 50. What if we were to select five numbers randomly from this set of 101 and calculate the median and mean of those five numbers? Would the median or the mean be closer to what we know is the true value of 50?
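One way to try this exercise yourself is to let a computer repeat it many times. The sketch below is a rough simulation, not part of the Seeing Statistics material: the function name, the number of trials, and the random seed are all arbitrary choices made for illustration. It draws five distinct scores from 0 to 100 over and over and tracks how far, on average, the sample mean and the sample median land from 50.

```python
import random
from statistics import mean, median

def compare_mean_and_median(trials=10_000, sample_size=5, seed=1):
    """Average distance from the true center (50) for the mean and the median."""
    rng = random.Random(seed)      # fixed seed so the run is repeatable
    scores = range(101)            # the whole numbers 0 through 100
    true_center = 50
    mean_error = 0.0
    median_error = 0.0
    for _ in range(trials):
        sample = rng.sample(scores, sample_size)  # five distinct scores
        mean_error += abs(mean(sample) - true_center)
        median_error += abs(median(sample) - true_center)
    return mean_error / trials, median_error / trials

mean_err, median_err = compare_mean_and_median()
print(f"average miss of the mean:   {mean_err:.2f}")
print(f"average miss of the median: {median_err:.2f}")
```

With no outliers in play, the mean should miss the true center by less on average than the median does, which is the point of the exercise.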