AP Lab Skills Guide

Data will fall into three categories:

1. Parametric (normal) data
- Normal distribution around the mean
- Mean and SD can predict future observations
Ex) Heart rate, plant height, body temperature

2. Nonparametric data
- Does not fit a normal distribution
- May include large “outliers”

3. Frequency or count data
- Counting how many of an item fit into a category
Ex) Doing a genetic cross (Aa x Aa) and counting how many offspring are AA, Aa and aa.
Ex2) Data collected as percentages, like the percentage of cells in interphase in a root tip…you are just counting.

Let’s make up a problem and build a histogram, then calculate the SD (standard deviation) and SEM (standard error of the mean).

Let’s say I measured the heart rate of 10 people:

Heart rate (bpm): 76, 82, 90, 73, 58, 65, 63, 74, 71, 68
Mean = 72

(Mean - data point)²
(72-76)² = 16
(72-82)² = 100
(72-90)² = 324
(72-73)² = 1
(72-58)² = 196
(72-65)² = 49
(72-63)² = 81
(72-74)² = 4
(72-71)² = 1
(72-68)² = 16

Add them up: 788
Now divide by the sample size minus one: 788/9 = 87.6
Lastly, take the square root: SQRT(87.6) = 9.36, or about 9.4 (this is the SD)

72 +/- 9.4 bpm

[Figure: Histogram of Resting Heart Rates. x-axis: heart rate bins (bpm), 51 to 55 through 91 to 95; y-axis: number of individuals]

What does the SD tell you?
SD describes the predicted spread or variation of the measured variable in the ENTIRE population. Remember, you only measured a small sample…what if you measured everyone?
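The SD hand calculation earlier in this section is easy to get wrong by one squared term, so it can help to double-check it with a short script. The following is a minimal sketch, not part of the original guide; the function name `sample_sd` is made up for illustration, and it uses the same n - 1 divisor as the hand calculation.

```python
import math

# Heart rates (bpm) of the ten people in the guide's example.
heart_rates = [76, 82, 90, 73, 58, 65, 63, 74, 71, 68]


def sample_sd(values):
    """Return (mean, sum of squared deviations, sample SD).

    Follows the same steps as the hand calculation:
    square each (mean - data point), add them up,
    divide by (sample size - 1), then take the square root.
    """
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(mean - x) ** 2 for x in values]
    sum_of_squares = sum(squared_deviations)
    sd = math.sqrt(sum_of_squares / (n - 1))
    return mean, sum_of_squares, sd


mean, sum_of_squares, sd = sample_sd(heart_rates)
print(f"mean = {mean:.1f} bpm")
print(f"sum of squared deviations = {sum_of_squares:.0f}")
print(f"SD = {sd:.2f} bpm")
```

Printing the sum of squared deviations as well as the SD makes it easier to find which step of a hand calculation went wrong.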
Ex) If the mean is 50 and the standard deviation is 10 (50 +/- 10), then 68% of the population is predicted to be between 40 (50-10) and 60 (50+10). And 95% of the population is predicted to be within 2 standard deviations, between 30 (50 minus 20) and 70 (50 plus 20).

For the heart rate sample (mean = 72, sum of squared deviations = 788, n = 10): SD = SQRT(788/9) = 9.36, or about 9.4, so the result is 72 +/- 9.4 bpm.

Now explain this data in words:
This says that the mean is 72 and that 68% of the entire population is predicted to have heart rates between 62.6 and 81.4 bpm, and that 95% of the population is predicted to be between 53.2 and 90.8 bpm.

Now let’s determine the SEM (SE):

Standard Error of the Mean (SEM or SE) = Standard Deviation (S, SD, σ) / SQRT(sample size)
SEM = 9.36/SQRT(10) = 2.96 bpm

72 +/- 2.96 bpm

What does the SEM tell you?
Exactly what it says…it is the predicted error in the mean itself. It gives you a range over which the actual mean of the entire population is predicted to be. Again, remember that you only measured a small sample…the mean you calculate is not likely the actual mean…what if you measured everyone?

Ex) If the mean is 40 and the SEM is 4, then there is a 68% chance (CI) that the actual mean of the entire population is between 36 and 44 (40 +/- 4). There is a 95% chance (CI) that the mean is within 2 SEMs in either direction, between 32 and 48.
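The SEM calculation above can be sketched in a few lines of Python. This is an illustrative sketch rather than anything from the original guide: `standard_error` is a made-up helper name, and the 1-SEM and 2-SEM ranges simply apply the guide's 68% and 95% rule of thumb.

```python
import math
import statistics

# Heart rates (bpm) of the ten people in the guide's example.
heart_rates = [76, 82, 90, 73, 58, 65, 63, 74, 71, 68]


def standard_error(values):
    """SEM = sample SD divided by the square root of the sample size."""
    # statistics.stdev uses the n - 1 divisor, like the hand calculation.
    return statistics.stdev(values) / math.sqrt(len(values))


mean = statistics.mean(heart_rates)
sem = standard_error(heart_rates)

# Rule-of-thumb ranges for the true population mean:
# mean +/- 1 SEM (about 68%) and mean +/- 2 SEM (about 95%).
low68, high68 = mean - sem, mean + sem
low95, high95 = mean - 2 * sem, mean + 2 * sem

print(f"mean = {mean} bpm, SEM = {sem:.2f} bpm")
print(f"~68% range for the true mean: {low68:.2f} to {high68:.2f} bpm")
print(f"~95% range for the true mean: {low95:.2f} to {high95:.2f} bpm")
```

Because the SEM divides by the square root of the sample size, measuring more people shrinks the SEM even when the SD stays about the same.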
CI = confidence interval

For the heart rate sample: 72 +/- 2.96 bpm (mean +/- SEM)

Now explain this data in words:
This says that the mean is 72 and that there is a 68% chance that the true mean of the population is between 69.04 and 74.96 bpm, and a 95% chance that it is between 66.08 and 77.92 bpm.

Let’s sum up the heart rate data…

MEAN   SD    SEM
72     9.4   2.96

This tells us that the mean of the OBSERVED SAMPLE is 72 bpm. The descriptive stats tell us that if the entire population were measured, 68% is predicted to fall between 62.6 and 81.4 bpm, and 95% between 53.2 and 90.8 bpm. In addition, the true mean of the entire population has a 68% chance of being between 69.04 and 74.96 bpm, and a 95% chance of being between 66.08 and 77.92 bpm.

REVIEW
Standard Deviation (SD, S, σ) describes the range of a particular variable that is predicted to include 68% of the population (for normally distributed data).
Example: 70 +/- 7 bpm would imply that 68% of the total population would have heart rates between 63 and 77 bpm.
Standard Error of the Mean (SEM, SE) describes the range where the actual mean of the total population is predicted to be with 68% confidence.
Example: 70 +/- 3 bpm would imply that there is a 68% chance that the actual mean of the total population is between 67 and 73 bpm.

Explain in words the data:
The shady leaf width has a mean of ~7.2 and there is a 68% chance that the true population mean is between 7.0 and 7.6 (it looks like the SEM is about 0.3).

Statistical significance
Are these two groups (shady and sunny) significantly different statistically? Justify.
If the means are different and the error bars do not overlap, then you would predict them to be significantly different.
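The error-bar comparison described above can be written as a small check. This is only a sketch of the rule of thumb and not a formal statistical test. The shady values are the approximate readings quoted above; the sunny values are HYPOTHETICAL numbers invented for illustration, and both helper names are made up.

```python
def sem_interval(mean, sem, n_sems=1):
    """Return the (low, high) ends of an error bar: mean +/- n_sems * SEM."""
    return (mean - n_sems * sem, mean + n_sems * sem)


def error_bars_overlap(a, b):
    """True if two (low, high) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


# Shady: approximate readings from the guide (mean ~7.2, SEM ~0.3).
shady = sem_interval(7.2, 0.3)
# Sunny: HYPOTHETICAL values, not taken from the guide.
sunny = sem_interval(5.9, 0.3)

if error_bars_overlap(shady, sunny):
    print("Error bars overlap: the difference may not be significant.")
else:
    print("Error bars do not overlap: predict a significant difference.")
```

Passing `n_sems=2` widens each bar to the roughly 95% range, which makes the check more conservative.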
Scatterplots
- Compare two MEASURED VARIABLES
- If a linear relationship is predicted, a linear regression can be performed (best fit line; Figure 3)
- R² (R-squared or coefficient of determination)
• Typically ranges from 0 to 1
• Describes “goodness of fit”, or how well the line drawn fits the points
• R² = 0 implies no linear relationship
• R² = 1 implies a perfect relationship (all points on the line)

Box-and-Whisker Plots (Boxplot)
- Used with nonparametric data (data that is not assumed to follow a normal distribution)
- Vertical lines indicate the highest and lowest points in the dataset
- Top of box shows the upper quartile and bottom shows the lower quartile
- Horizontal line represents the median

[Figure: example boxplot with the upper quartile, median and lower quartile labeled]

Determining Lower (Q1) and Upper (Q3) Quartiles:
[Figure: ordered data split at the median (Q2) into a lower half (Q1) and an upper half (Q3)]
You are simply dividing the data into quarters by medians…the upper quartile is the median of the upper half of the data and the lower quartile is the median of the lower half.

Determine the upper and lower quartile of the sycamore and beech leaf data:

Sycamore: 33 35 40 40 44 48 52 63
Median (Q2) equals 42
Lower Quartile (Q1) equals 37.5
Upper Quartile (Q3) equals 50

Notice how the range between the lower and upper quartiles gives you a sense of the center of the data without the influence of outliers that might exist in nonparametric data!

Beech: 11 15 19 21 26 32 34
Median (Q2) equals 21
Lower Quartile (Q1) equals 15
Upper Quartile (Q3) equals 32
(With an odd number of values, the median itself is left out of both halves.)

http://www.brainingcamp.com/resources/math/box-plots/questions.php

Histograms
Used to determine if a given set of measurements, like plant height from the artificial selection lab, approximates a normal distribution (parametric) or if the data is nonparametric.
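To see how a histogram is built, the ten heart rates from the earlier example can be sorted into the same 5-bpm bins used on the heart rate histogram (51 to 55 through 91 to 95). This is a minimal sketch with a made-up helper name, `bin_counts`; it assumes whole-number measurements and prints a simple text histogram instead of drawing a chart.

```python
# Heart rates (bpm) of the ten people in the guide's example.
heart_rates = [76, 82, 90, 73, 58, 65, 63, 74, 71, 68]


def bin_counts(values, start=51, width=5, n_bins=9):
    """Count how many whole-number values fall into each bin.

    Bins are start..start+width-1, then the next `width` values, and so on
    (51 to 55, 56 to 60, ... with the defaults). Values outside all bins
    are ignored.
    """
    counts = [0] * n_bins
    for value in values:
        index = (value - start) // width
        if 0 <= index < n_bins:
            counts[index] += 1
    labels = [
        f"{start + i * width} to {start + i * width + width - 1}"
        for i in range(n_bins)
    ]
    return list(zip(labels, counts))


# One row per bin, one '#' per individual.
for label, count in bin_counts(heart_rates):
    print(f"{label:>8} | {'#' * count}")
```

A roughly symmetric pile of marks around the middle bins suggests the data may be parametric; with only ten measurements the shape is a rough hint at best.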
[Figures: histogram showing parametric data; histogram showing NONparametric data]
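The median-of-halves rule from the box-and-whisker section can be sketched as follows, using the sycamore and beech leaf data. The helper names are made up for illustration. One assumption is labeled in the code: when the number of values is odd, the overall median is left out of both halves. Spreadsheet and statistics software may use other quartile conventions and return slightly different values.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule.

    ASSUMPTION: for an odd number of values, the overall median is
    excluded from both the lower and the upper half.
    Expects at least two values.
    """
    data = sorted(values)
    half = len(data) // 2
    lower_half = data[:half]
    upper_half = data[-half:]
    return median(lower_half), median(data), median(upper_half)


sycamore = [33, 35, 40, 40, 44, 48, 52, 63]
beech = [11, 15, 19, 21, 26, 32, 34]

for name, leaves in (("Sycamore", sycamore), ("Beech", beech)):
    q1, q2, q3 = quartiles(leaves)
    print(f"{name}: Q1 = {q1}, median (Q2) = {q2}, Q3 = {q3}")
```

The box of a boxplot is drawn from Q1 to Q3, with the whiskers running out to the lowest and highest values in the dataset.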