Significant Science: Statistics for Planet Earth

A lesson developed through the Environmental Science Institute’s NSF GK-12 Program at the University of Texas

Beth Dawson, Fellow, University of Texas at Austin, bethdawson@mail.utexas.edu
Kirstin Busch, Teacher, LBJ High School, kbusch@austinisd.org

The Problem
Students were misinterpreting the significance of their Biodiversity Project field data.

[Figure: Average Number of Individuals by Proximity to Water at Mayfield Preserve. Bar chart; x-axis: Proximity to Water (m), with bars at 5 m and 270 m; y-axis: Number of Individuals, 0 to 40.]

The Solution
Students were led through a 2-day exercise:
1. To apply the terms population and sample, and to learn to calculate mean and standard deviation
2. To learn how to perform and interpret a two-sample t-test using Excel

The Results
Students successfully reported the results from their t-test and communicated these results through a written conclusion.

# Individuals      Close to Water    Far from Water
                   27                34
                   33                36
                   23                40
mean # indiv.      27.67             36.67
STDEV # indiv.     5.03              3.06
P-value            0.05713 (5.71%)

Result: The number of individuals living near the water is not significantly different from the number of individuals living far from the water (p = 0.05713).
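For readers who want to check the summary numbers in the Results table without Excel, here is a minimal sketch in Python. It assumes only the six counts shown in the table; the variable names and the describe helper are our own and are not part of the lesson materials.

```python
from statistics import mean, stdev

# Counts of individuals, copied from the Results table (assumed to be one count per plot).
close_to_water = [27, 33, 23]
far_from_water = [34, 36, 40]

def describe(values):
    """Return (mean, sample standard deviation), each rounded to 2 decimal places."""
    return round(mean(values), 2), round(stdev(values), 2)

close_summary = describe(close_to_water)
far_summary = describe(far_from_water)

print("Close to water: mean = %.2f, SD = %.2f" % close_summary)
print("Far from water: mean = %.2f, SD = %.2f" % far_summary)
```

The stdev function divides by n - 1, which is the same sample standard deviation that Excel's STDEV function reports.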
Lesson Plans
Included on the Environmental Science Institute’s Lesson Plan Website (http://www.esi.utexas.edu/gk12/lessons.php):
• Student handouts
• Teacher notes
• Spreadsheet of examples and solutions
• Math concepts and t-test formula

Helpful References and Links
• Choosing and Using Statistics, Calvin Dytham (Blackwell Science)
• http://www.ruf.rice.edu/~lane/rvls.html
• http://mathworld.wolfram.com
• http://www.StatisticalPractice.com
• http://www.ccnmtl.columbia.edu/projects/qmss/t_about.html
• http://www-micro.msb.le.ac.uk/1010/1011-17.html
• http://www.esi.utexas.edu/gk12/index.html

Statistics: Teacher notes for the two-sample t-test

Goals
• Students learn to use statistics to test hypotheses
• Students discuss concepts such as population, sample, mean, and standard deviation
• Students learn how to perform and interpret a two-sample t-test

Materials
• Teacher notes
• Student handouts with practice problems
• Excel spreadsheets with examples and solutions
• Math background

Overview
This lesson helps high school science students learn how statistics are used to interpret results of scientific experiments. It helps resolve misunderstandings about data and shows students how mathematics can help test hypotheses. This lesson plan was developed to help students test their own hypotheses with data collected during field trips. Any data appropriate for a two-sample t-test can be used for the final project. We encourage you to read through these notes, practice with the examples, and use the statistics with your students’ own data. Math teachers can use these notes and the accompanying math background document to build lesson plans to support this curriculum. We’ve created a student handout to guide the students through discussion questions and give them structured space to take notes. There is also a Microsoft Excel spreadsheet of examples and practice problems.

Contact us!
This lesson was developed by Beth Dawson, a graduate student in Integrative Biology at the University of Texas, Austin, and Kirstin Busch, a life science teacher at the Liberal Arts and Science Academy at LBJ High School, Austin Independent School District, Texas. We worked together as part of a National Science Foundation GK-12 grant to the Environmental Science Institute at the University of Texas in Austin. We would like to thank Dr. Susan Empson and Dr. Jay Banner for their help and encouragement. We’d love to get your feedback on this lesson and hear how you’ve incorporated statistics into your classroom. Please email:
bethdawson@mail.utexas.edu
kbusch@austinisd.org

Introduction – Are boys the same height as girls?

Let’s start by asking a question: are boys the same height as girls? Or are they usually taller? We could probably find an example in our class of a girl who is taller than a boy. Can we take that one example and use it to conclude that all girls are taller than all boys? Let’s take our discussion and use it to formulate a hypothesis:

Hypothesis #1: boys are not the same height as girls

We could write an even more specific hypothesis:

Hypothesis #2: boys are taller than girls

Today, we’re going to learn how scientists use math to test hypotheses, and we’re going to test these two hypotheses. First, let’s collect data. Let’s separate the class into boys and girls and have each group write down their heights, in inches. We can use a spreadsheet to chart the heights.

(NOTE: See the data table in Student Handouts. You can use the Microsoft Excel spreadsheet of examples to enter student data and generate a histogram similar to the figure below. The example uses 14 girls and 14 boys. Note that this works best with large class sizes; if you have a small class, you might want to combine data from other classes. You may enter the data from your own class into the spreadsheet or use the data supplied in the example.
If you use your own data, you will need to regenerate the histogram and create new charts. To generate a histogram in Excel, select “Data Analysis” under the Tools menu, then select “Histogram” and follow the directions. Put the output for boys in cell D5 and the output for girls in F5.)

[Figure: Histogram of Heights, boys and girls. x-axis: Height (inches), 57 to 77; y-axis: 0 to 7.]

Here’s our chart. It looks like most boys are taller than most girls, but there is some overlap. If we were studying this, what conclusion might we draw if we happened to select very short boys and very tall girls for our study? Is it possible that we might conclude that boys and girls are generally the same height? Just looking at this chart, can you prove or disprove our hypothesis? Now let’s define some key terms and look at how scientists use math to answer these types of questions.

Population and sample

When scientists design their studies, they need to be very careful about how they word their hypothesis and how they collect their data. Each scientist defines two key terms: sample and population.

Population: all of the organisms that can be included in the study

The term “population” is used to refer to all of the plants, animals, humans, etc. that could be included in your study. Another way to look at this: it’s the group to which you want to apply your results. Let’s think about an example. Let’s say we want to study musical abilities in U.S. high school students. Our population is defined as “all high school students in the United States.” Because we have selected only high school students, we can’t apply our results to elementary school students. If we wanted to study musical abilities in everyone younger than 18, then we would need to redefine our population to include younger students. In our study of student heights, our population might be all of the high school students in our town, or our state.
Often, populations are so large that scientists can’t study every single member of the population. How many high school students are there in the United States? Thousands? Millions? Instead, scientists use samples from the population.

Sample: a subset of the population

Samples let scientists study fewer people (or plants or animals) but still apply their results to the population at large. In our study of musical abilities in high school students, we could select 1500 high school students from across the U.S. Or, we could just take all of the students enrolled in band and use them. Could we? Would this be a good sample? Would we get different results if our sample included only students enrolled in art class? It is very important that the sample meet the same definition as the population. Generally, scientists select their sample at random, with no bias. Can you think of a way to randomly select 100 students from your high school?

Mean and Standard Deviation

Every population and every sample can be described mathematically. Most of you probably know or have heard the word “average.” But can you define it? Can you calculate the average for your group’s height?

Average: the middle point of a dataset, defined as the total of all values divided by the number of values

Average is also called “mean” and is a very good way to describe your data. If you wanted to tell someone how tall everyone in your class is, you could give them a list of all of the heights, or you could just tell them the average height. Let’s calculate the average height for the boys and then for the girls.

Another way to describe your data is to look at the “spread” of data. We could look at our data of heights like this:

[Figure: Scatterplot of Height Data, boys and girls. y-axis: Height (inches), 55 to 75.]

(If you enter the height data for your students in the Excel spreadsheet, this chart should automatically update with your students’ data.
If you have more than 14 boys or girls, you will need to update the source data for this chart to include the additional data. Select the chart, then choose Source Data, Data Series and update the Y values.)

Describe what you see in this chart. We can talk about whether the boys are generally taller than the girls. But what can you say about the “spread” of the heights? Are girls more or less the same height? Do the heights of boys differ more? We can mathematically calculate the “spread” of the data using standard deviation.

Standard deviation: a measure of the spread, or dispersion, of data relative to the mean

Standard deviation can be calculated with a spreadsheet or a calculator. Let’s look at the standard deviation of our height data.

(In the example spreadsheet supplied, the girls have a much smaller standard deviation than do the boys. If you are using your class’s own data, you will have different results.)

(In the StatsPractice.xls spreadsheet, you will find sample datasets that your students can use to practice calculating mean and standard deviation.)

Writing and testing hypotheses

Statistics are used to test hypotheses, so it is very important to carefully write your hypothesis. Let’s go back to our example of heights. We can write two hypotheses:

H1: boys are not the same height as girls
H2: boys are taller than girls

Notice that the first hypothesis is very general. It simply predicts that the two samples are essentially different. The second hypothesis makes a more specific prediction. It says that boys are taller than girls. But can we test either hypothesis by looking at our data? Or at the chart? What is the population for these hypotheses? How have we sampled that population for the data in our chart?

(Charts can be used to visualize data but that visualization can be imprecise and inaccurate. The populations are “boys” and “girls”.
Our sample isn’t really random, since all of our boys and girls are about the same age and attend the same school.)

One way to think about testing this hypothesis is to look at your sample of boys and your sample of girls and ask if they come from the same population. In terms of height, are boys and girls essentially the same or different?

Statistics

More than a century ago, scientists realized that they needed a more definite way to test hypotheses. One scientist in particular was a man named William Gosset. He was a chemist employed by the Guinness Brewery in Ireland to make sure that each batch of beer met a specific standard. Every day, he took a sample of beer and performed various chemistry tests. Then, he compared the tests from that batch of beer to the results they wanted and asked the question “are these two batches of beer the same?” He devised a mathematical formula for comparing two samples to test the hypothesis that the two samples are from the same population. In 1908, he published his work under the pen name “Student” and described a test known as Student’s t-test. Gosset also designed other mathematical tests that are widely used in statistics. He was an important contributor to the field of statistics.

Statistics: a field of mathematics used to interpret scientific data

There are many statistical tests, each designed to be used for a different type of data and to test different relationships. Scientists design their experiments with statistics in mind, planning how they will analyze their data even before they collect it. Gosset’s t-test is a widely used statistical test.

Two-sample t-test: a mathematical test that compares two groups to test the hypothesis that the two samples come from populations with the same mean

A t-test produces a result known as a “p value,” or probability value. The p value is the probability of seeing a difference between your two samples at least as large as the one you measured if the two populations really had the same mean. For example, a p value of 0.5 means that a difference this large would show up about 50% of the time by chance alone, even if the two populations were the same.
In this case, you could not reject the hypothesis that the two samples come from two populations with the same mean. A p value of 0.1 means that a difference this large would still occur by chance about 10% of the time if the two populations had the same mean. If you are looking to find differences between your two samples, then a lower p value is “better” in terms of testing your hypotheses.

Let’s do a two-sample t-test with our height data. Remember our first hypothesis:

H1: boys are not the same height as girls

This is the simplest way to say “these two samples are not from the same population.”

Performing the t-test

Many spreadsheet software programs can perform a t-test for us. You need to have data for the first group and data for the second group, and then you need to make some decisions about how to perform the t-test.

The t-test has some assumptions that must be met. One of these is that your data is normally distributed, that is to say that it follows a bell curve. Our histogram shows us that our data is roughly normally distributed, so we’re fine there. Another assumption is that the “spread” of the two groups is equal. We calculated the standard deviation for each group, so we can check that, too. If your standard deviations are fairly close, then you can assume the spread of your data is equal. If your standard deviations are very different, just to be safe, don’t assume that your spread is equal.

Here are the steps to perform a t-test in Microsoft Excel. Under the Insert menu, select Function. This brings up a window that lists all of the functions. Select statistical, then select “ttest” to insert a two-sample t-test. This brings up a dialog window with fields for Array1, Array2, tails, and type.

For Array1, select the cells that contain the boys’ heights. For Array2, select the cells that contain the girls’ heights. Where it says “tails”, type a 2. We’ll explain this in a minute. For type, you need to decide whether your standard deviations are the same (in other words, do you have equal variance?).
If you have equal variance, type 2. If not, type 3. (For now, we won’t do a paired t-test, which is type 1.)

This produces a p value for your two samples: the probability of seeing a difference at least this large if the two samples really came from the same population. Now, the question is how to interpret this p value. How different is different enough? Scientists commonly use a cutoff p value of 0.05 to conclude that two samples are not from the same population. That is to say, if the two samples really were from the same population, a difference this large would show up less than 5% of the time. This is a pretty stringent requirement and ensures that before we say two samples are different we are pretty confident of that result.

What if your p value is 0.06? You’re almost to that magic 0.05 but not quite. What can you do? The best way to respond is to collect more data. In general, the larger the sample size, the more powerful your t-test.

Tails and more tails

In the example above, we selected “2” for tails, to perform a two-tailed t-test. What’s all this about tails? Let’s look at our histogram of heights and our original hypothesis:

[Figure: Histogram of Heights, boys and girls. x-axis: Height (inches), 57 to 77; y-axis: 0 to 7.]

H1: boys are not the same height as girls

Our hypothesis doesn’t predict whether boys are taller or shorter than girls. That is to say, we don’t know if the blue line is going to be on one “tail” or the other “tail” of the girls’ distribution. A two-tailed t-test tests for differences in either direction: taller or shorter.

If we have a pre-existing reason to predict that boys are going to be taller than girls, we can use our second hypothesis:

H2: boys are taller than girls

This hypothesis makes a more specific prediction. To test this, we can ignore the case where boys are shorter than girls and just test whether boys are taller. In t-test language, this means we can perform a “one-tailed” t-test. Try re-doing your t-test in Excel.
This time, when the dialog box asks for “tails,” type 1.

Interpreting the p value

Now, back to that p value. We’ve decided that a p value of less than 0.05 supports the hypothesis that our two samples are from different populations. Let’s write that in a sentence as if we were presenting our results in a scientific publication:

“We determined that boys are significantly taller than girls (p=0.0000165).”

Notice that we used the word “significantly.” If you want to use the word “significant,” you need to perform some type of statistical test to establish that significance. In this example, our two groups are very significantly different, with a very small p value.

The most important step in a t-test is writing your results. Just reporting a p value isn’t enough; you need to know how to state your results and when to use the word “significant.” As you practice performing a t-test with practice data and with your own data, remember that the final step is writing a sentence like the one above.

Congratulations! You are well on your way to statistical significance! Now you can not only perform scientific experiments and record your data but you also know how to use one of the most important statistical tests, the t-test, to interpret the significance of your results!

Challenging students

Students who are strong in math may want to go beyond this lesson. There are several steps you can take with these students. The first is to present the formula for a t-test and have the students build a spreadsheet that calculates their t-test. In this case, you will need a statistics textbook (or any one of a number of sites on the Internet) to look up the results of their t-test and find the associated p value. The calculation for a t-test is included in the math background.

Students who quickly grasp this lesson can also be valuable mentors for other students. You could pair them with a student who’s just learning the t-test and have the two share their experiences.
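For students who want to see what happens behind the spreadsheet function, here is a rough Python sketch that recomputes the one-tailed, unequal-variance test on the example height data. It is an illustration under stated assumptions, not the method Excel uses internally: we assume the unequal-variance option corresponds to Welch's t-test, and we approximate the tail area of the t distribution with Simpson's rule rather than a statistics library. The function names welch_t, t_pdf, and upper_tail are our own.

```python
import math
from statistics import mean, variance

# Example height data (inches) from the lesson spreadsheet.
boys = [74, 67, 66, 74, 68, 64, 70, 68, 71, 68, 72, 66, 64, 67]
girls = [62, 62, 63, 65, 64, 65, 66, 61, 65, 60, 63, 63, 64, 62]

def welch_t(a, b):
    """Return the t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def t_pdf(x, df):
    """Density of Student's t distribution at x."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """One-tailed p value P(T > t) for t >= 0, via Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

t_stat, df = welch_t(boys, girls)
p_one_tailed = upper_tail(t_stat, df)
print(f"t = {t_stat:.2f}, df = {df:.1f}, one-tailed p = {p_one_tailed:.7f}")
```

The p value printed here should be of the same order as the one reported in the results sentence above; small differences from Excel are possible because the numerical method is only an approximation.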
Another commonly used t-test is the paired t-test. This is a valuable tool in “before/after” experiments. For example, we might measure the height of students at the beginning of the year and at the end. For each student we will have a pair of data: beginning and end. Many scientific experiments produce data that is best studied using a paired t-test. In Excel this is easy to perform; simply enter “1” for type in the t-test dialog box.

Have students write their hypothesis, results and conclusions for the experiments performed in class. This is an important exercise to let them learn how to use a t-test to test hypotheses and how to interpret the results of a t-test.

Another exercise would be to build more sampling distributions, as we did at the beginning of this lesson with the histogram of student heights. You might encourage students to build a data log of the number of species of birds they see on the way to school every morning throughout the school year. You could use a paired t-test to test if the number of species seen in the fall is the same as the number of species seen in the spring. This would be a good way to connect the statistical tests with an ecology lesson on native species of birds.

For more information

See the Links section on our Stats.ppt file for links to valuable teaching sites on the Internet. You can find more in-depth explanations of the t-test on these sites, practice problems, and more advanced statistical tests.

Statistics: Math background for the two-sample t-test

Goals
• Students learn to use statistics to test hypotheses
• Students discuss concepts such as population, sample, mean, and standard deviation
• Students learn how to perform and interpret a two-sample t-test

Materials
• Teacher notes
• Student handouts with practice problems
• Excel spreadsheets with examples and solutions
• Math background

Overview
These notes are designed to support the Statistics lesson plan for biology teachers.
Math teachers can use the Statistics teacher notes and this math background document to build lesson plans to support this curriculum. The goal is for students to understand the mathematics behind mean, standard deviation (variance) and the two-sample t-test. This eliminates the “black box” mystery behind these statistical tests and helps students understand how to interpret statistical tests. The Microsoft Excel spreadsheet of examples and practice problems also has a section for the math examples used here.

Math Concepts for the t-test
• Mean
• Standard Deviation
• Variance

Mean

Also called average, the arithmetic mean of a dataset is a way to describe the middle point of the data. Along with standard deviation and variance, it is a common descriptive statistic that can be easily calculated by hand or with computer spreadsheet software. Mean is calculated as the sum of all data values divided by the number of values. For example, here are the test scores from a History class:

Test scores: 95, 77, 82, 85, 91, 76, 88, 87
Number of scores (n) = 8

First, we need to add all of the test scores. Then we divide that sum by n (the number of scores).
Test scores: 95, 77, 82, 85, 91, 76, 88, 87
Sum = 681
Number of scores (n) = 8
Mean (Sum / n) = 85.125

If students are not comfortable with calculating the mean, encourage them to work through several examples. Ideally, students should be able to calculate the mean themselves before they use the “AVERAGE” function in Microsoft Excel to perform this calculation for them.

Standard Deviation and Variance

Both of these are also descriptive statistics. They describe the “spread” of your data, or how far from the mean each datum is. They can be used to evaluate the dispersion of the data. In other words, are the values in your data pretty similar or are they all very different?

Standard deviation can be defined as the positive square root of the variance. This definition is valuable because it is always a positive number and it will always have the same units as the original data. To calculate standard deviation, we first need to define variance. The definition of variance is more of a tongue twister: it is the mean of the squared deviations of data from their mean. Variance is not in the same units as the original data but it is used in many statistical tests, so we need to think about how it is calculated.

For any given sample, here is a mathematical definition for the sample variance:

Sample variance (s²) = average squared deviation of values from the mean

It sounds a bit intimidating but it’s really easy if you break it down into steps.

1. Calculate the mean.
2. Subtract the mean from each observation (obs-mean).
3. Square each value from step 2 (squares).
4. Add all the values from step 3 (sum of squares).
5. Divide by the total number of observations minus one (variance).

From here, standard deviation is easy! It’s just the square root of the variance.

T-Test

When students are confident with mean and standard deviation, then the formula for the two-sample t-test is fairly straightforward.
For this test, we always have two sets of data (set 1 and set 2) and we are testing the hypothesis that they come from populations with the same mean. A t-test lets us either reject the null hypothesis that the two samples are from populations with the same mean, or it tells us that we should fail to reject this hypothesis.

Hypothesis: the population mean for sample 1 = the population mean for sample 2

1. Calculate the mean and the variance for each of your two samples. Do this as above so you have the sum of squares for each data set (i.e. SS1 and SS2).
2. Divide (SS1 + SS2) by the sum of (number of data - 1) in set 1 and (number of data - 1) in set 2.
3. Divide the value from Step 2 by the n of set 1, and separately divide it by the n of set 2. Add these two results, then take the square root.
4. Subtract the mean of set 1 from the mean of set 2, then divide by the value from Step 3. This is your t-test statistic.

Now, you need a table of critical values for the t distribution to look up your results. Use an alpha (cutoff p value) of 0.05, two-tailed, and degrees of freedom equal to your combined (n-1), that is, (n-1) for set 1 plus (n-1) for set 2.

Rejecting or failing to reject our hypothesis comes from comparing our calculated t statistic to the critical value from the t distribution. If the absolute value of your calculated t is greater than or equal to the critical t value, then you should reject the hypothesis that the population means are equal. Using an alpha value of 0.05 means that, if the population means really were equal, you would wrongly reject the hypothesis only about 5% of the time.

Assumptions

The t-test makes assumptions about your data. The first assumption is that your data are continuous, not discrete. Also, the t-test assumes your data come from a normal distribution and that the variances of the two samples are the same. Some statistical software programs test these assumptions before performing a t-test.
While the t-test is fairly robust to minor deviations from these assumptions, it is best to use data that meet these assumptions for teaching the t-test.

Conclusion

This is a very brief introduction to one of the simplest statistical tests, the two-sample t-test. For more information, there are a number of valuable textbooks that present basic statistics. We encourage you to read more about sampling distributions, the t-test, analysis of variance and related topics to gain more familiarity with statistics.

Statistics: Student notes

Goals
• Learn to use statistics to test hypotheses
• Discuss population, sample, mean, and standard deviation
• Learn how to perform and interpret a two-sample t-test

Sections
1. Introduction
2. Mean and standard deviation
3. Statistics and the t-test

Section 1. Introduction – Are boys the same height as girls?

Hypothesis 1 =
Hypothesis 2 =

Your height data
BOYS        GIRLS

Discussion Questions
If we were studying the heights of students, what conclusion might we draw if we happened to select only very short boys and very tall girls for our study?
If we selected students from the basketball teams for our height data, how would this data compare to the data from your class?

Key Definitions
Population =
Sample =

Discussion Questions
How would you randomly select 100 students from your school?
Is it possible for a randomly selected sample of 100 students to include only boys?

Section 2. Mean and Standard Deviation

Key Definitions
Average (mean) =
Standard deviation =

Practice problems for Mean and Standard Deviation
In Microsoft Excel, "mean" is called "average." Enter the data below. Click on the cell below your data. From the Insert menu, select Function. Then click on Statistical and select AVERAGE. Where the dialog box says "number 1" select all of the test scores. Press enter to calculate the mean.

Practice problem – English grades
A group of students took an English exam and made these grades.
Calculate the mean and standard deviation for these test scores.

Test scores: 92, 78, 77, 67, 52, 84, 86
Mean (average) =
Standard Deviation =

Practice problem – height data
Now, calculate the mean and standard deviation for your height data.

                      Boys      Girls
Mean (average)
Standard Deviation

Describe your student height data. Using the mean and standard deviation, write a short paragraph that describes the data for boys and the data for girls.

Practice Problem – Birds
A biologist wants to know if birds that live in cities are smaller or larger than birds that live in the country. He measured the weight of birds in town and in the country. Calculate the mean and standard deviation for both samples.

Weight of birds in town (grams): 13, 22, 11, 18, 26, 15, 19, 20, 16
Weight of birds in country (grams): 27, 12, 11, 23, 29, 18, 27, 28, 21

                      Town      Country
Mean (average)
Standard Deviation

Write a short paragraph that compares these two samples: the weight of birds in town versus the weight of birds in the country. Use the mean and standard deviation you calculated in your paragraph. What conclusions do you think the biologist might make from this data?

Section 3. Statistics and the t-test

Key Definitions
Statistics =
Two-sample t-test =

Performing the t-test
Here are the steps to perform a t-test in Microsoft Excel. First, enter two sets of data. To start, we’ll use the height data for boys and girls that you have already entered. Under the Insert menu, select Function. This brings up a window that lists all of the functions. Select statistical, then select “ttest” to insert a two-sample t-test. This brings up a dialog window with fields for Array1, Array2, tails, and type.

For Array1, select the cells that contain the boys’ heights. For Array2, select the cells that contain the girls’ heights. Where it says “tails”, type a 2. We’ll explain this in a minute. For type, enter a 3.

p value =

Interpreting the p value
Write your interpretation of the p value:

Practice problem – doing a two-sample t-test on frog data.
A scientist is studying the call of male frogs that live in a creek. She performs an experiment to measure the duration of the calls of frogs that live upstream and the calls of frogs that live downstream. The water in the upstream part of the creek is from a fresh spring and tends to be very cold. The water downstream is warmer. She wants to test this hypothesis:

H: Frogs call differently in the warm water than they do in the cold water.

Here is her data on the duration of frog calls at each location, measured in seconds:

Upstream, cold: 37.1, 41.0, 38.4, 40.9, 38.5, 38.8, 36.5, 40.3, 41.3, 39.8, 37.6
Downstream, warm: 44.3, 46.6, 40.3, 50.5, 42.7, 41.9, 50.8, 44.5, 42.1, 48.3, 41.9

            Upstream, cold    Downstream, warm
Mean =
St Dev =

P value =

Write 2-3 sentences interpreting the result of your t-test:

Stats example: are boys taller than girls?

Sample data for height in inches
boys: 74, 67, 66, 74, 68, 64, 70, 68, 71, 68, 72, 66, 64, 67
girls: 62, 62, 63, 65, 64, 65, 66, 61, 65, 60, 63, 63, 64, 62

            boys      girls
MEAN        68.50     63.21
ST DEV      3.28      1.72

Histogram output
Bin       Frequency (boys)    Frequency (girls)
57        0                   0
59        0                   0
61        0                   2
63        0                   6
65        2                   5
67        4                   1
69        3                   0
71        2                   0
73        1                   0
75        2                   0
77        0                   0
More      0                   0

[Figure: Histogram of Heights, boys and girls. x-axis: Height (inches), 57 to 77; y-axis: 0 to 7.]

Stats example: are boys taller than girls?
Summary of the sample data for height in inches

            boys      girls
MEAN        68.50     63.21
ST DEV      3.28      1.72

Two-sample t-test (one-tailed test, not assuming equal variance)
p value: 0.0000165
as a percentage: 0.00165%

[Figure: Scatterplot of Height Data, boys and girls. y-axis: Height (inches), 55 to 75.]

Practice - English grades
Test scores: 92, 78, 77, 67, 52, 84, 86
Mean (average): 76.57
Standard Deviation: 13.41

Practice - Birds
town: 13, 22, 11, 18, 26, 15, 19, 20, 16
country: 27, 12, 11, 23, 29, 18, 27, 28, 21

                      town      country
Mean (average)        17.78     21.78
Standard Deviation    4.63      6.83

Practice - Frogs
cold: 37.1, 41.0, 38.4, 40.9, 38.5, 38.8, 36.5, 40.3, 41.3, 39.8, 37.6
warm: 44.3, 46.6, 40.3, 50.5, 42.7, 41.9, 50.8, 44.5, 42.1, 48.3, 41.9

                      cold      warm
Mean (average)        39.11     44.90
Standard Deviation    1.66      3.65
p value               0.0001107

Examples from Math lesson
Test Scores from History Class: calculating mean (average), variance, and standard deviation

Observations    obs-mean    squares
95              9.875       97.516
77              -8.125      66.016
82              -3.125      9.766
85              -0.125      0.016
91              5.875       34.516
76              -9.125      83.266
88              2.875       8.266
87              1.875       3.516

Sum = 681
Number of scores (n) = 8
Mean (sum/n) = 85.125
Sum of squares (SS) = 302.8750
n-1 = 7
Variance (SS divided by n-1) = 43.2679
Standard Deviation (sqrt of var) = 6.5778

Calculated by Excel:
Variance = 43.2679
Standard Deviation = 6.5778

Examples from Math lesson
Test Scores: Comparing History vs. English, calculating a t-test

Step 1
History (1)    (obs - mean) squared    English (2)    (obs - mean) squared
92             112.89                  88             10.56
71             107.64                  72             162.56
82             0.39                    98             175.56
75             40.64                   94             85.56
91             92.64                   71             189.06
70             129.39                  70             217.56
88             43.89                   96             126.56
82             0.39                    89             18.06

                  History (1)    English (2)
Mean              81.375         84.750
Variance          75.411         140.786
n                 8              8
n-1               7              7
Sum of squares    527.8750       985.5000

Step 2
(SS1 + SS2) / ((n-1) for set 1 + (n-1) for set 2) = 108.098

Step 3
Divide each by its n: 13.512 and 13.512
Add them together: 27.025
Take the square root: 5.199

Step 4
Difference of the means (mean 1 minus mean 2): -3.375
Divide by: 5.199
Your t statistic (absolute value) is: 0.649

Calculated t vs. critical t
Alpha = 0.05 (this is an established value)
Degrees of Freedom = 14 (combined n-1)
t (0.05, 14) = 2.145 (look this up in a table of critical values of the t distribution)

Our calculated t is less than the critical t, so we cannot reject the hypothesis that our two samples are from populations with the same mean.

Excel's calculated p value: 0.527
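
The four steps in the History vs. English example can also be traced in a few lines of Python. This is a minimal sketch that follows the equal-variance calculation described in the math background; the variable names are our own, and the critical value 2.145 is simply copied from the worked example rather than computed.

```python
import math
from statistics import mean

# Test scores from the worked example.
history = [92, 71, 82, 75, 91, 70, 88, 82]
english = [88, 72, 98, 94, 71, 70, 96, 89]

def sum_of_squares(values):
    """Sum of squared deviations of each value from the sample mean."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values)

# Step 1: mean and sum of squares for each sample.
mean1, mean2 = mean(history), mean(english)
ss1, ss2 = sum_of_squares(history), sum_of_squares(english)

# Step 2: divide the combined sum of squares by the combined (n - 1).
df = (len(history) - 1) + (len(english) - 1)
pooled_variance = (ss1 + ss2) / df

# Step 3: divide by each n, add the two results, take the square root.
std_error = math.sqrt(pooled_variance / len(history) + pooled_variance / len(english))

# Step 4: difference of the means divided by the Step 3 value.
t_stat = (mean1 - mean2) / std_error

CRITICAL_T = 2.145  # t(0.05, 14), two-tailed, copied from the worked example
reject = abs(t_stat) >= CRITICAL_T

print(f"pooled variance = {pooled_variance:.3f}")
print(f"standard error  = {std_error:.3f}")
print(f"|t| = {abs(t_stat):.3f}, df = {df}, reject equal means: {reject}")
```

Because the sign of t depends only on which mean is subtracted from which, the comparison with the critical value uses the absolute value, as in the spreadsheet.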