Chapter 5 - Lyndhurst School

Chapter 5
Comparing Two Means or Two Medians
Students will be able to:
1) Test for a difference in means
2) Test for a difference in medians
• In Chapter 4 we learned how to graph and calculate
summary statistics for distributions of numerical data. We
also learned how to compare PERFORMANCES in two
different contexts using these graphs and summary statistics.
• Our question in Chapter 4 was “Does the DH increase
offense in Major League Baseball?”
• Using our newly acquired skillset, we were able to come
up with a preliminary answer to our question. There is
evidence that teams in the AL have a greater ABILITY to
score runs than teams in the NL. However, we are unable
to say we have convincing evidence.
• In Chapter 5, we will conduct hypothesis tests to test for a
difference in center. We will then be able to state whether
or not we have convincing evidence.
• The processes in Chapter 5 are going to be similar
to the processes in Chapter 2. The major
difference is that in Chapter 2 we used categorical
variables and measured an athlete’s
PERFORMANCE with a percentage of success.
• In Chapter 5, we use numerical variables and
measure an athlete’s PERFORMANCE with a mean
or median.
• As in Chapter 2, we are going to state hypotheses,
simulate test statistics, and draw conclusions.
Testing a Difference in Means
• Based on our comparison of the distribution of
runs scored for the AL and the NL in 2008, it is
clear that the average offensive PERFORMANCE
of teams in the AL is higher than the average
offensive PERFORMANCE of teams in the NL.
• Remember that
• We must test to see if we can essentially rule out RANDOM CHANCE as the explanation.
• We’ll run a hypothesis test using the
difference in means as our test statistic. Later
in the chapter we’ll use difference in medians
as our test statistic.
The mean of the AL distribution is 774.6 runs.
The mean of the NL distribution is 733.8 runs.
Our test statistic (AL – NL) is 40.8 runs.
What would be our hypotheses?
• Now we can set up the simulation to test for
the possible differences in means that could
occur by RANDOM CHANCE, assuming that
the two leagues have the same ABILITY to
score runs.
• We will want to see how likely it is to get a
difference in means of 40.8 runs or larger,
simply due to RANDOM CHANCE.
• Let’s do this using note cards.
• We will start with 30 cards (for 30 MLB teams).
• We will write each of the 30 teams' run totals on a
note card.
• Pg 120
• Now that the cards are set up, shuffle them.
• Next, deal them into two piles. One pile
should have 14 cards to represent the AL
teams and one pile should have 16 cards to
represent the NL teams.
• Calculate the mean of each pile and take the
difference (AL – NL). Note: The difference will
be negative if the NL pile has a higher mean.
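The note-card procedure above can also be sketched in code. This is a minimal sketch, not the method used to produce the slides' results: the function names `simulate_mean_differences` and `p_value` are made up for illustration, and the run totals below are randomly generated placeholders, NOT the real 2008 totals from page 120.

```python
import random
from statistics import mean


def simulate_mean_differences(values, n_first, trials, seed=None):
    """Shuffle the 'cards', deal n_first into the first pile, and record
    (first pile mean - second pile mean) for each trial."""
    rng = random.Random(seed)
    cards = list(values)
    diffs = []
    for _ in range(trials):
        rng.shuffle(cards)                      # shuffle the note cards
        first, second = cards[:n_first], cards[n_first:]
        diffs.append(mean(first) - mean(second))
    return diffs


def p_value(diffs, observed):
    """Proportion of simulated differences at least as large as the observed one."""
    return sum(d >= observed for d in diffs) / len(diffs)


# Placeholder run totals (assumption): 30 made-up values, not the real data.
placeholder_rng = random.Random(0)
run_totals = [placeholder_rng.randint(630, 900) for _ in range(30)]

# Deal 14 cards to the "AL" pile and the remaining 16 to the "NL" pile.
diffs = simulate_mean_differences(run_totals, n_first=14, trials=1000, seed=1)
print(p_value(diffs, observed=40.8))
```

Replacing `run_totals` with the 30 actual run totals would give a simulation comparable to the one on the slides, though the exact counts will vary from run to run.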
Here are the results of 100 trials of the simulation.
On the previous slide, we saw that 4 of the 100
simulated seasons produced a difference in means of
at least 40.8. Therefore, what would be our p-value?
With that p-value, what would be our conclusion if
we use a 5% level of significance?
• p-value: 4%
• Because the p-value is so close to 5%, we can
repeat the simulation using more trials.
Instead of 100 trials, let’s use 10,000 trials.
521 of 10,000 simulated seasons produced a
difference in means of at least 40.8. What is our
new p-value? As a result, would our conclusion change?
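The arithmetic behind both p-values can be checked with a short sketch. The helper name `simulated_p_value` is hypothetical; the counts (4 of 100 and 521 of 10,000) and the 5% significance level come from the slides.

```python
def simulated_p_value(count_at_least_as_extreme, trials):
    """p-value = proportion of simulated trials at least as extreme as the observed statistic."""
    return count_at_least_as_extreme / trials


alpha = 0.05  # 5% level of significance

for count, trials in [(4, 100), (521, 10_000)]:
    p = simulated_p_value(count, trials)
    decision = "convincing evidence" if p < alpha else "not convincing evidence"
    print(f"{count} of {trials}: p-value = {p:.2%} -> {decision}")
```

With 100 trials the p-value is 4%, which is below 5%. With 10,000 trials it is 5.21%, which is above 5%.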
Something to remember…
• Since this was not an experiment, we cannot
claim causation. Even if we found convincing
evidence that AL teams had a greater ABILITY
to score runs, we cannot say that the cause of
the increase is the DH. There are other
variables that could have caused an increase in
offense, and these variables were not
controlled for.
Experiments: Heating a Football?
Let’s take some time to review the concepts of
experiments introduced in Chapter 2. We’ll then
apply these concepts to a new experiment, and
introduce a few new ideas.
• Think about kicking a football in different weather
conditions. Do you think a kicker might be able to
kick the ball farther in certain weather conditions
as opposed to others?
• Suppose a kicker notices he can kick a ball farther
when the weather is warm compared to when it is cold.
• What might be some reasons for this?
– His leg muscles might be looser when it is warm
– The warm air outside provides less resistance for the
ball as it moves through the air
– The air inside the ball is warmer, increasing the
pressure inside and making it better to kick
• Remember, we say the variables are confounded
because we do not know which variable is causing
the footballs to travel farther.
• What we can do is perform an experiment to test
for one of these variables. We would then need
to make sure we control all other variables.
• Let’s design an experiment to test to see if a
kicker can kick a football farther after it has been
heated compared to when it is cold.
• Reminder: the response variable measures the
outcome of interest and the explanatory variable
is what is deliberately changed. What would
these variables be for this experiment?
– Response variable: distance the kicker kicks the football
– Explanatory variable: temperature of the football
• Note: A difference between Chapter 5 and
Chapter 2 is that our response variable in our
experiment is now numerical.
• For this experiment it would be impossible to use the
same football, since we need the footballs to be at two
different temperatures.
• What we can do is use 10 similar footballs. We will
randomly choose 5 to be put in a refrigerator for 1 hour
and the other 5 to be put in the direct sun for 1 hour.
• It is important that the assignment of the footballs is
random so that any differences in the footballs themselves
are roughly balanced out and do not favor one
temperature over the other. (Something you would not
want to do is take 5 older footballs and refrigerate them
and 5 newer footballs and put them in the sun).
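The random assignment described above can be sketched as follows. This is one possible way to do it, assuming the footballs are simply labeled 1 through 10; the function name `assign_treatments` and the labels are hypothetical.

```python
import random


def assign_treatments(units, group_size, seed=None):
    """Randomly split the units into a first group of group_size and a second group with the rest."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)          # every ordering is equally likely
    return shuffled[:group_size], shuffled[group_size:]


# Hypothetical labels for the 10 similar footballs.
footballs = [f"ball_{i}" for i in range(1, 11)]

refrigerated, in_sun = assign_treatments(footballs, group_size=5, seed=42)
print("Refrigerator:", refrigerated)
print("Direct sun:  ", in_sun)
```

Because the shuffle decides the groups, the age or wear of a football has no influence on which treatment it receives.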
• A new concept is keeping the subject blind. This means
the subject does not know which treatment he or she is
receiving. We do not want our kicker to know whether he is
kicking a heated or a cooled football, because that knowledge
may consciously or subconsciously cause him to alter his
response. For example, if the kicker knows he is about to
kick a cooled football, maybe he won’t kick it as hard for
fear of hurting his foot.
• Ideally, another participant would be there randomly
putting a football on the tee for the kicker, and a third
person would be there measuring the distance the
footballs travel.
• If the person placing the ball on the tee does
not know if they are selecting a heated or
cooled ball, and the person measuring the
distance the ball travels does not know either,
then these people are blind as well. This type
of experiment is called double-blind.
• Remember to control everything. Keep all
other variables the same except the
temperature of the football.
• What hypotheses would we be testing?
• Let’s say we ran this experiment and received
the following results.
• The mean distance of the warm footballs is 59.4
yards and of the cold footballs is 56.2 yards. What
is our test statistic?
• (warm – cold) = 59.4 – 56.2 = 3.2 yards
• Here is a dotplot showing the results of 100 trials
of a simulation.
• From the previous slide we see our p-value is
10%. Therefore, what is our conclusion?
• One reason for our result could be the small
sample size, so it is possible that we have
committed a Type II error.
• What we can do is increase the number of trials
for each treatment. This is called replication.
• In an experiment, replication means making sure
that each treatment has an adequate number of
trials so that any difference in the effect of the
treatments can be identified.
Testing for a Difference in Medians
• If a distribution contains outliers or is skewed,
the value of the mean may no longer be a
good indication of what is typical.
• Remember that medians are resistant to
unusually large or small values. Therefore,
when comparing distributions that are
skewed, we should consider comparing their
medians rather than their means.
• The process for testing a difference in medians is
almost the same as that of testing a difference in
means. The only difference is that we will use a
median when calculating the test statistic and
simulating the distribution of the test statistic.
• Let’s look at a baseball example.
Decline of the Triple
• It has recently been suggested that the number of
triples hit by baseball players has decreased for a
few reasons:
– Teams prefer power hitters over speed
– Teams are more risk-averse and prefer a sure double
rather than risking an out with a hitter going for a triple
• Has the ABILITY of MLB players to hit triples gone
down in the 25 years from 1979 to 2004?
• We want to test these hypotheses using the
difference in medians as our test statistic:
• Why medians and not means?
• Both distributions are skewed right with several
outliers, making the mean a poor choice.
• Calculate the test statistic (difference in medians).
• (1979 median – 2004 median) = (4 – 2) = 2 triples
• Here are 1000 trials of the simulation.
• p-value: 0.1%
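The median version of the simulation differs from the mean version only in the statistic computed for each pile. Here is a minimal sketch under stated assumptions: the function names are hypothetical, and the two lists of triples are invented, right-skewed placeholder values, NOT the actual 1979 or 2004 player data.

```python
import random
from statistics import median


def simulate_median_differences(group_a, group_b, trials, seed=None):
    """Pool both groups, reshuffle, and record (median of pile A - median of pile B) per trial."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    diffs = []
    for _ in range(trials):
        rng.shuffle(pooled)
        diffs.append(median(pooled[:n_a]) - median(pooled[n_a:]))
    return diffs


def p_value(diffs, observed):
    """Proportion of simulated differences at least as large as the observed one."""
    return sum(d >= observed for d in diffs) / len(diffs)


# Invented placeholder data (assumption): small right-skewed samples of triples per player.
earlier_season = [0, 1, 2, 3, 4, 4, 5, 6, 8, 13]
later_season = [0, 0, 1, 1, 2, 2, 3, 3, 5, 10]

observed = median(earlier_season) - median(later_season)
diffs = simulate_median_differences(earlier_season, later_season, trials=1000, seed=7)
print("observed difference in medians:", observed)
print("simulated p-value:", p_value(diffs, observed))
```

The p-value printed here reflects only the placeholder samples, so it should not be compared with the 0.1% reported on the slide.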