Chapter 5
Comparing Two Means or Two Medians

Objectives
Students will be able to:
1) Test for a difference in means
2) Test for a difference in medians

• In Chapter 4 we learned how to graph and calculate summary statistics for distributions of numerical data. We also learned how to compare PERFORMANCES in two different contexts using these graphs and summary statistics.
• Our question in Chapter 4 was “Does the DH increase offense in Major League Baseball?”
• Using our newly acquired skillset, we were able to come up with a preliminary answer to our question. There is evidence that teams in the AL have a greater ABILITY to score runs than teams in the NL. However, we are unable to say we have convincing evidence.
• In Chapter 5, we will conduct hypothesis tests for a difference in center. We will then be able to state whether or not we have convincing evidence.
• The processes in Chapter 5 are going to be similar to the processes in Chapter 2. The major difference is that in Chapter 2 we used categorical variables and measured an athlete’s PERFORMANCE with a percentage of successes.
• In Chapter 5, we use numerical variables and measure an athlete’s PERFORMANCE with a mean or median.
• As in Chapter 2, we are going to state hypotheses, simulate test statistics, and draw conclusions.

Testing a Difference in Means
• Based on our comparison of the distribution of runs scored for the AL and the NL in 2008, it is clear that the average offensive PERFORMANCE of teams in the AL is higher than the average offensive PERFORMANCE of teams in the NL.
• Remember that PERFORMANCE = ABILITY + RANDOM CHANCE
• We must test to see if we can essentially rule out RANDOM CHANCE.
• We’ll run a hypothesis test using the difference in means as our test statistic. Later in the chapter we’ll use the difference in medians as our test statistic.

• The mean of the AL distribution is 774.6 runs.
• The mean of the NL distribution is 733.8 runs.
• Our test statistic (AL – NL) is 40.8 runs.
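Written out as a single calculation, the test statistic is the AL mean minus the NL mean. The bar notation for a league's mean runs scored is an assumption added here for illustration; the slides do not use these symbols.

```latex
\bar{x}_{\text{AL}} - \bar{x}_{\text{NL}} = 774.6 - 733.8 = 40.8 \text{ runs}
```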
• What would be our hypotheses?

• Now we can set up the simulation to test for the possible differences in means that could occur by RANDOM CHANCE, assuming that the two leagues have the same ABILITY to score runs.
• We will want to see how likely it is to get a difference in means of 40.8 runs or larger, simply due to RANDOM CHANCE.

• Let’s do this using note cards.
• We will start with 30 cards (for 30 MLB teams).
• We will write each team’s run total on its own note card.
• Pg 120

• Now that the cards are set up, shuffle them.
• Next, deal them into two piles. One pile should have 14 cards to represent the AL teams and one pile should have 16 cards to represent the NL teams.
• Calculate the mean of each pile and take the difference (AL – NL). Note: The difference will be negative if the NL pile has a higher mean.

Here are the results of 100 trials of the simulation.

On the previous slide, we saw that 4 of the 100 simulated seasons produced a difference in means of at least 40.8. Therefore, what would be our p-value? With that p-value, what would be our conclusion if we use a 5% level of significance?
• p-value: 4%

• Because the p-value is so close to 5%, we can repeat the simulation using more trials. Instead of 100 trials, let’s use 10,000 trials.

521 of 10,000 simulated seasons produced a difference in means of at least 40.8. What is our new p-value? As a result, would our conclusion change?

Something to remember…
• Since this was not an experiment, we cannot claim causation. Even if we found convincing evidence that AL teams had a greater ABILITY to score runs, we could not say that the cause of the increase is the DH. There are other variables that could have caused an increase in offense, and these variables were not controlled for.

Experiments: Heating a Football?
Let’s take some time to review the concepts of experiments introduced in Chapter 2. We’ll then apply these concepts to a new experiment, and introduce a few new ideas.
• Think about kicking a football in different weather conditions. Do you think a kicker might be able to kick the ball farther in certain weather conditions as opposed to others?
• Suppose a kicker notices he can kick a ball farther when the weather is warm compared to when it is cold.
• What might be some reasons for this?
– His leg muscles might be looser when it is warm
– The warm air outside provides less resistance for the ball as it moves through the air
– The air inside the ball is warmer, increasing the pressure inside and making it better to kick

• Remember, we say the variables are confounded because we do not know which variable is causing the footballs to travel farther.
• What we can do is perform an experiment to test for one of these variables. We would then need to make sure we control all other variables.

• Let’s design an experiment to test whether a kicker can kick a football farther after it has been heated compared to when it is cold.
• Reminder: the response variable measures the outcome of interest and the explanatory variable is what is deliberately changed. What would these variables be for this experiment?
– Response variable: distance the kicker kicks the footballs
– Explanatory variable: temperature of the football
• Note: A difference between Chapter 5 and Chapter 2 is that our response variable in our experiment is now numerical.

• For this experiment it would be impossible to use the same football, since we need the footballs to be at two different temperatures.
• What we can do is use 10 similar footballs. We will randomly choose 5 to be put in a refrigerator for 1 hour and the other 5 to be put in the direct sun for 1 hour.
• It is important that the assignment of the footballs is random so that any differences in the footballs themselves are roughly balanced out and do not favor one temperature over the other.
(Something you would not want to do is take 5 older footballs and refrigerate them and 5 newer footballs and put them in the sun).

• A new concept is keeping the subject blind. This means the subject does not know which treatment they are receiving. We do not want our kicker knowing if they are kicking a heated or a cooled football because it may consciously or subconsciously cause the kicker to alter his response. For example, if the kicker knows he is about to kick a cooled football, maybe he won’t kick it as hard for fear of hurting his foot.
• Ideally, another participant would be there randomly putting a football on the tee for the kicker, and a third person would be there measuring the distance the footballs travel.
• If the person placing the ball on the tee does not know if they are selecting a heated or cooled ball, and the person measuring the distance the ball travels does not know either, then these people are blind as well. This type of experiment is called double-blind.
• Remember to control everything. Keep all other variables the same except the temperature of the football.

• What hypotheses would we be testing?

• Let’s say we ran this experiment and received the following results.
• The mean distance of the warm footballs is 59.4 yards and of the cold footballs is 56.2 yards. What is our test statistic?
• (warm – cold) = 59.4 – 56.2 = 3.2 yards

• Here is a dotplot showing the results of 100 trials of a simulation.
• From the previous slide we see our p-value is 10%. Therefore, what is our conclusion?

• One of the reasons for our result could be the small sample size, which means it is possible we may have committed a Type II error.
• What we can do is increase the number of trials for each treatment. This is called replication.
• In an experiment, replication means making sure that each treatment has an adequate number of trials so that any difference in the effect of the treatments can be identified.
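The shuffle-and-deal simulations in this chapter (the note cards for AL versus NL, and the warm versus cold footballs) can be sketched in Python. This is a minimal sketch under stated assumptions, not the procedure used to produce the dotplots on the slides: the function name `randomization_test`, its parameters, and the distances listed are hypothetical and made up for illustration. They are not the results of the experiment.

```python
import random
from statistics import mean


def randomization_test(group_a, group_b, statistic=mean, trials=10_000, seed=None):
    """Estimate a one-sided p-value for statistic(group_a) - statistic(group_b).

    Hypothetical helper: it mimics shuffling note cards and dealing two piles.
    """
    rng = random.Random(seed)
    observed = statistic(group_a) - statistic(group_b)
    pooled = list(group_a) + list(group_b)  # every value goes on one "card"
    n_a = len(group_a)

    at_least_as_large = 0
    for _ in range(trials):
        rng.shuffle(pooled)  # shuffle the cards
        # Deal the first n_a cards to pile A and the rest to pile B.
        diff = statistic(pooled[:n_a]) - statistic(pooled[n_a:])
        if diff >= observed:
            at_least_as_large += 1

    # p-value: share of shuffles with a difference at least as large as observed
    return observed, at_least_as_large / trials


# Hypothetical kick distances in yards (illustration only, not real data).
warm = [61, 57, 63, 58, 60]
cold = [56, 59, 55, 58, 54]

observed, p_value = randomization_test(warm, cold, trials=10_000, seed=1)
print(f"observed difference: {observed:.1f} yards, estimated p-value: {p_value:.3f}")
```

Passing `statistic=statistics.median` instead of the default `mean` gives the difference-in-medians version of the same test. The `seed` argument only makes a run repeatable; different seeds give slightly different estimated p-values, which is why more trials help when the p-value is close to the significance level.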
Testing for a Difference in Medians
• If a distribution contains outliers or is skewed, the value of the mean may no longer be a good indication of what is typical.
• Remember that medians are resistant to unusually large or small values. Therefore, when comparing distributions that are skewed, we should consider comparing their medians rather than their means.
• The process for testing a difference in medians is almost the same as the process for testing a difference in means. The only difference is that we will use a median when calculating the test statistic and simulating the distribution of the test statistic.
• Let’s look at a baseball example.

Decline of the Triple
• It has recently been suggested that the number of triples hit by baseball players has decreased for a few reasons:
– Teams prefer power hitters over speed
– Teams are more risk-averse, and prefer a sure double rather than risking an out with a hitter going for a triple
• Has the ABILITY of MLB players to hit triples gone down in the 25 years from 1979 to 2004?

• We want to test these hypotheses using the difference in medians as our test statistic:
• Why medians and not means?
• Both distributions are skewed right with several outliers, making the mean a poor choice.

• Calculate the test statistic (difference in medians).
• (1979 – 2004) = (4 – 2) = 2

• Here are 1000 trials of the simulation.
• p-value: 0.1%
Conclusion: