Ch 15 PP - Lyndhurst School

Chapter 15
Conditional Probability, Expected
Value, and Strategy in Sports
Objectives
Students will be able to:
1) Use the general addition rule to calculate probability
(union and intersection of events)
2) Calculate conditional probability for dependent events
3) Use tree diagrams to organize events and calculate
probability using the general multiplication rule
4) Find expected value of random variables
5) Use expected value to make strategy decisions in sports
• On November 15, 2009, the New England Patriots
were playing the Indianapolis Colts. New England
had the football at their own 28 yard line with 2:08
left on the clock, and they led 34-28. It was 4th
down and 2. New England had no time outs left,
and the Colts had 1 time out left.
• The conventional move would be to punt and play
defense. However, if they go for it and pick up 2
yards they will essentially win the game. If they go
for it and don’t pick up the 2 yards, Manning will
have a good chance to throw for the game winning
touchdown, as he was on a roll in the second half.
• If you were Bill Belichick, what would you do?
Two-Way Tables and the General
Addition Rule
• In Chapter 2, we introduced two-way tables as
a way to organize information about the
distribution of a categorical variable in two
different contexts.
• Example: the outcomes of regular season
games for the 2008 Arizona Cardinals.
• Two-way tables can also be used to summarize
the relationship between two categorical
variables.
• Example: Let’s say the Tampa Bay Rays had a
promotion for home games in 2010. If the
team scored 7 or more runs, each fan will get
a free taco (they scored 7 or more runs 15
times that season). The only thing better than
getting a free taco would be getting a free
taco and watching the Rays win at the same
time.
• Here is a two-way table to show the relationship
between taco status and the outcome of the
game for the Rays’ 81 regular season home games
in 2010.
• 13 games yielded the ideal combination of free
tacos and a win. If we randomly select a game,
the probability that a fan got a free taco and saw
a win is P(taco and win) = 13/81 ≈ 0.16, or 16%.
The General Addition Rule
• What if we want to know the probability that
a fan saw a win or got a free taco?
• For this to occur, just one or the other event
needs to take place (or if both events took
place that would work as well).
• Keep in mind there is some overlap between
the two events, as 13 games produced both a
free taco and a win.
• We cannot just add the probability of getting a
taco and the probability of getting a win, due
to the overlap of the events (taco and win).
We have to account for that overlap.
• Looking at the two-way table, we should see
that we could add three separate mutually
exclusive events (events that can’t happen at
the same time) to get the probability of a taco
or a win.
• It would be incorrect to simply add the
probability of a taco and the probability of a win:
P(taco) + P(win) = 15/81 + 49/81 = 64/81 ≈ 0.79
This counts the 13 taco-and-win games twice.
• The calculation can also be done by adding the
probability of a taco and the probability of a win,
and then subtracting the overlap (taco and win):
P(taco or win) = 15/81 + 49/81 − 13/81 = 51/81 ≈ 0.63
• This new rule is called the general addition
rule. For any two events A and B:
P(A or B) = P(A) + P(B) − P(A and B)
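A minimal Python sketch of the taco-or-win calculation, assuming only the counts quoted in these slides (81 home games, 15 taco games, 49 wins, 13 games with both). The other cells are derived by subtraction, and the variable names are illustrative.

```python
from fractions import Fraction

# Counts quoted in the slides.
TOTAL_GAMES = 81
TACO_GAMES = 15
WINS = 49
TACO_AND_WIN = 13

# Remaining cells follow by subtraction (derived, not quoted in the slides).
taco_and_loss = TACO_GAMES - TACO_AND_WIN   # 2
no_taco_and_win = WINS - TACO_AND_WIN       # 36

p_taco = Fraction(TACO_GAMES, TOTAL_GAMES)
p_win = Fraction(WINS, TOTAL_GAMES)
p_both = Fraction(TACO_AND_WIN, TOTAL_GAMES)

# Method 1: add the three mutually exclusive cells.
p_or_cells = Fraction(TACO_AND_WIN + taco_and_loss + no_taco_and_win, TOTAL_GAMES)

# Method 2: general addition rule, subtracting the overlap once.
p_or_rule = p_taco + p_win - p_both

# Incorrect shortcut: the plain sum counts the overlap twice.
p_naive = p_taco + p_win

print("three cells:", float(p_or_cells))
print("addition rule:", float(p_or_rule))
print("naive sum:", float(p_naive))
```

Both correct methods agree, and the naive sum is too large by exactly P(taco and win).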
STAT 101
• Instead of using the words “and” and “or” to
describe probability situations, some more
traditional statistics books use set theory notation.
• The word “or” is replaced by the union symbol (∪) and
the word “and” is replaced by the intersection
symbol (∩), so P(A or B) is written P(A ∪ B) and
P(A and B) is written P(A ∩ B).
• Let’s try another example.
Mr. Falcicchio is a big fan of the NJ Jackals for a
variety of reasons, one of which is due to the
Jackals having firework promotions. Every time
the Jackals score 6 runs in a game, they shoot
off fireworks at the completion of the game. On
the next slide is a two-way table summarizing
the 2013 results for the NJ Jackals, including
wins and losses, and number of times fireworks
were shot and not shot.
Find the following.
a) P(fireworks and win)
b) P(fireworks or win)
Conditional Probability and Independence
• Let’s revisit our taco example.
• If we randomly select one of the Rays’
victories from 2010, what is the probability
that a fan at that game received a free taco?
• Probabilities like the “probability that free
tacos were distributed, given that the Rays
won the game” are called conditional
probabilities.
• Conditional probability describes the
probability that an event occurs, given that we
know that a different event has already
occurred.
• Just looking at the win column makes this
probability easy to see.
• Conditional probability has the following
formula:
P(A | B) = P(A and B) / P(B)
• For our example, the probability of both
events occurring was 13/81 and the
probability of a win occurring was 49/81, so:
P(taco | win) = (13/81) / (49/81) = 13/49 ≈ 0.27
• Let’s try another example. Find the
probability that the Rays won the game, given
that free tacos were given away.
• This is quite easy to see if we limit our
attention to the “taco” row of the two-way
table.
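The two conditional probabilities in the taco example can be checked with a short Python sketch. The helper name `conditional` is illustrative, and the inputs are the figures quoted in these slides (13 taco-and-win games, 49 wins, 15 taco games, 81 games in all).

```python
from fractions import Fraction


def conditional(p_a_and_b, p_b):
    """Return P(A | B) = P(A and B) / P(B)."""
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return p_a_and_b / p_b


p_taco_and_win = Fraction(13, 81)
p_win = Fraction(49, 81)
p_taco = Fraction(15, 81)

# Restrict attention to the wins: 13 of the 49 wins had free tacos.
p_taco_given_win = conditional(p_taco_and_win, p_win)

# Restrict attention to the taco games: 13 of the 15 taco games were wins.
p_win_given_taco = conditional(p_taco_and_win, p_taco)

print(p_taco_given_win, float(p_taco_given_win))
print(p_win_given_taco, float(p_win_given_taco))
```

The two answers differ, which is a reminder that P(A | B) and P(B | A) are generally not the same.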
Independence
• Scenario: Kobe steps to the free-throw line for 2 shots.
• On his first shot, he has an 85% chance of making the
free-throw.
• Make or miss, on his second shot he still has an 85%
chance of making the free-throw.
• If this is true, the outcomes of his free-throw attempts are
independent, meaning Kobe’s ABILITY to make a free-throw
is the same following a make as it is following a
miss. In other words, knowing that he makes the first
shot doesn’t help us predict the outcome of his second
shot.
• Using conditional probability notation:
P(make 2nd shot | make 1st shot) = P(make 2nd shot | miss 1st shot) = 0.85
• In general, two events are independent if
knowing the outcome of one event does not
affect the probability of the other event.
• Events A and B are independent if:
P(A | B) = P(A | not B) = P(A)
– This means event A has the same probability of
happening whether or not event B happens.
• Let’s go back to our taco example.
• Are the events “taco” and “win” independent? If
so, then knowing the outcome of the game would
not provide any additional information about the
probability of getting a free taco.
• However, if knowing the outcome of the game
changes the probability of getting a free taco,
then the events “taco” and “win” are not
independent.
• If the events are independent, then the following
relationship should exist:
P(taco | win) = P(taco | loss) = P(taco)
• Let’s investigate.
Clearly knowing the outcome of the game changes
the probability of getting a taco. Therefore, the
events “taco” and “win” are not independent.
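The independence check for "taco" and "win" can be sketched in Python. This assumes only the counts quoted in these slides. The loss counts are derived by subtraction and the variable names are illustrative.

```python
from fractions import Fraction

# Counts quoted in the slides.
TOTAL_GAMES = 81
WINS = 49
TACO_GAMES = 15
TACO_AND_WIN = 13

# Derived by subtraction (not quoted in the slides).
losses = TOTAL_GAMES - WINS                 # 32
taco_and_loss = TACO_GAMES - TACO_AND_WIN   # 2

p_taco = Fraction(TACO_GAMES, TOTAL_GAMES)
p_taco_given_win = Fraction(TACO_AND_WIN, WINS)
p_taco_given_loss = Fraction(taco_and_loss, losses)

# Independence would require all three probabilities to match.
independent = p_taco_given_win == p_taco_given_loss == p_taco

for label, p in [("P(taco)", p_taco),
                 ("P(taco | win)", p_taco_given_win),
                 ("P(taco | loss)", p_taco_given_loss)]:
    print(f"{label}: {float(p):.3f}")
print("independent:", independent)
```

With real data the three values will rarely match exactly, so in practice the question is whether they are close. Here they are far apart.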
Tree Diagrams and the General
Multiplication Rule
• In tennis, the player serving has two chances to get
a serve into play.
• Generally, the player is more aggressive on the
first-serve.
• If the first serve is a fault, the player will be more
conservative on the second-serve.
• Since the player is more conservative, they tend to
win a smaller percentage of points on second-serves
than on successful first-serves.
• On the 2011 Association of Tennis Professionals (ATP)
tour, Roger Federer made 63% of his first-serves. When
he made his first-serve, he won 78% of points. When he
missed his first-serve, he only won 57% of points.
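• Using probability notation:
P(make first-serve) = 0.63
P(win point | make first-serve) = 0.78
P(win point | miss first-serve) = 0.57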
• Because the probability of winning a point changes based
on the outcome of the first-serve, the outcome of the
point is not independent of the outcome of the
first-serve.
• This information can also be expressed in a
tree diagram.
• To do this:
– Show the outcome of the first-serve as one set of
“branches” and the outcome of the point with a
second set of “branches”.
– Include the probability of each branch.
– Label the outcomes at the end of the branches.
– Note: The probabilities that go on the second set
of branches are conditional probabilities because
the outcome of the point depends on the
outcome of the first-serve.
• Let’s make a tree diagram.
• What is the probability Federer makes the first-serve
and wins the point?
P(make first-serve and win point) = (0.63)(0.78) = 0.4914
• The previous calculation was an example of
the general multiplication rule, which is used
to find the probability that two events both
occur.
• The general multiplication rule says that for
any two events A and B:
P(A and B) = P(A) · P(B | A)
• Find the remaining probabilities.
• Now we can replace the “outcome” section of
the tree diagram with the probabilities.
• When Federer is serving, what is the probability
that he wins the point?
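The Federer tree diagram can be sketched in Python as a table of branch products. Each leaf applies the general multiplication rule, and the two "win" leaves are added because they are mutually exclusive. The three input probabilities come from the slides. The dictionary layout and names are illustrative.

```python
# Branch probabilities quoted in the slides.
P_MAKE_FIRST = 0.63
P_WIN_GIVEN_MAKE = 0.78
P_WIN_GIVEN_MISS = 0.57

p_miss_first = 1 - P_MAKE_FIRST

# Each leaf is P(first-serve outcome) * P(point outcome | first-serve outcome).
leaves = {
    ("make", "win"): P_MAKE_FIRST * P_WIN_GIVEN_MAKE,
    ("make", "lose"): P_MAKE_FIRST * (1 - P_WIN_GIVEN_MAKE),
    ("miss", "win"): p_miss_first * P_WIN_GIVEN_MISS,
    ("miss", "lose"): p_miss_first * (1 - P_WIN_GIVEN_MISS),
}

# Federer wins the point along either of the two "win" leaves.
p_win_point = leaves[("make", "win")] + leaves[("miss", "win")]

for outcome, p in leaves.items():
    print(outcome, round(p, 4))
print("P(win point) =", round(p_win_point, 4))
```

A quick sanity check is that the four leaf probabilities add up to 1.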
Reversing the Conditioning
• Let’s say you are watching Federer serve.
Brennan Huff sends you a text message and you
get distracted. You look back in time to see that
Federer won a point. How likely is it that he
made his first-serve? In other words, what is the
probability that he made the first-serve, given
that he wins a point?
• To find the probability, we have to work in
reverse.
• Use our conditional probability formula:
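A hedged Python sketch of reversing the conditioning for Federer's serve, using only the three probabilities given in the slides. The variable names are illustrative.

```python
# Probabilities quoted in the slides.
P_MAKE_FIRST = 0.63
P_WIN_GIVEN_MAKE = 0.78
P_WIN_GIVEN_MISS = 0.57

# Numerator: P(make first-serve and win point), from the multiplication rule.
p_make_and_win = P_MAKE_FIRST * P_WIN_GIVEN_MAKE

# Denominator: P(win point), adding both ways of winning the point.
p_win_point = p_make_and_win + (1 - P_MAKE_FIRST) * P_WIN_GIVEN_MISS

# Conditional probability formula applied "in reverse".
p_make_given_win = p_make_and_win / p_win_point

print(round(p_make_given_win, 3))
```

The result is about 0.70, higher than the unconditional 0.63, because winning the point is more likely after a made first-serve.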
Random Variables and Expected Value
• One of the most exciting times for sports fans
is a game 7 in a playoff series.
• Unfortunately, not all best-of-seven series
make it to a 7th game. Instead, one team
might win the series in 4, 5, or 6 games.
• In 2003, a New York Times article suggested
that in baseball, a 7-game World Series is
unusually common. Is this true?
• A random variable takes on numerical values
that describe the outcomes of a chance
process.
• Let’s define the random variable X as the
number of games played in a randomly
selected World Series.
• A probability distribution lists the possible values of a
random variable and how likely they are to occur.
• The table below uses the results of the World Series from
1945 to 2010 to estimate the probability distribution of X.
This probability distribution lists the possible number of
games and how often those values occurred.
• It is also possible to display the probability
distribution using a graph, such as a histogram.
The Mean (Expected Value) of a Random Variable
• On average, how many games does a World Series
last? In other words, what is the mean of the random
variable X?
• One way to estimate the mean value of X is to locate
the balancing point of the histogram displaying the
probability distribution of X.
• Finding the balancing point can be done a few
ways.
• We can find the average as we did in Chapter 4.
• Needless to say this could be a bit tedious. There
is a more efficient way this can be done.
• We know how many times each value occurs, so we
can rewrite the numerator.
• Now, rewrite the fraction as four separate fractions.
• Finally, rearrange each fraction to reveal a helpful
pattern.
• Each term of the sum has two factors:
– The numbers in front of the parentheses are the
possible values of the random variable X.
– The numbers in the parentheses are the
corresponding probabilities.
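The table of World Series counts did not survive in this copy, so the rewriting steps above are sketched symbolically. The symbols n_4, n_5, n_6, n_7 (the number of Series lasting 4, 5, 6, and 7 games) and N (the total number of Series) are assumed names, not the slide's own notation.

```latex
\begin{align*}
\bar{x}
  &= \frac{4 n_4 + 5 n_5 + 6 n_6 + 7 n_7}{N} \\
  &= \frac{4 n_4}{N} + \frac{5 n_5}{N} + \frac{6 n_6}{N} + \frac{7 n_7}{N} \\
  &= 4\left(\frac{n_4}{N}\right) + 5\left(\frac{n_5}{N}\right)
   + 6\left(\frac{n_6}{N}\right) + 7\left(\frac{n_7}{N}\right)
\end{align*}
```

Each fraction in parentheses is the estimated probability of that number of games.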
• In general, for a random variable X, the mean value
of X (also called the expected value of X) can be
found by multiplying each value of X by its
probability and then adding together the products:
μ_X = E(X) = Σ x_i · p_i
– The sigma symbol means “add them up”.
– E(X) represents the expected value of X.
– This is saying that the mean value of X is equal to the
expected value of X, which is equal to the sum of the X
values times their probabilities.
• The expected value of X is 5.86 games. How
do we interpret this value?
– If we were to randomly select World Series
over and over, the average number of
games in the selected Series would be
about 5.86.
Ex. 2: Hole #13 at the Augusta National golf course is
one of the most famous holes in golf. Lined with the
course’s signature azaleas, this hole is also a favorite
of players for its relative ease. The hole is a par 5,
meaning that professional golfers would be expected
to complete the hole in 5 strokes. Let X = the score
on hole #13 for a randomly selected golfer on day 1
of the 2011 Masters. The probability distribution of
X is shown in the table on the next slide.
1) Calculate the expected value of X.
2) Interpret the expected value of X.
If we randomly select golfers over and over on day 1
of the 2011 Masters, their average score on hole
#13 would be about 4.627.
Expected Values and Strategy in Sports
• On April 15, 1947, Jackie Robinson, of the
Brooklyn Dodgers, became the first black
player in MLB since the 1880s (Moses
Fleetwood Walker played for the Toledo Blue
Stockings of the American Association).
• He had many career accomplishments,
including Rookie of the Year in 1947, NL MVP
in 1949, and he played in six World Series.
• Robinson was extremely aggressive on the bases.
He stole home 19 times in his career (an MLB
record).
• However, he was caught attempting to steal home
11 times.
• While sometimes he provided an additional run,
other times he cost his team potential runs.
• The question becomes, overall, was Robinson’s
aggressive base running a good strategy?
• One way to evaluate the value of stealing
home is by examining run expectancy for
various combinations of base runners and
outs.
• In baseball, a team’s run expectancy (expected
number of runs scored) in a particular
situation is the average number of additional
runs that the team would score if they could
keep playing in that context over and over.
• Based on data from Robinson’s playing years, when there
was a runner on third with 2 outs, teams could expect to
score an additional 0.36 runs that inning.
• If a runner on third could steal home, his team would
score 1 run. This represents a “gain” of 0.64 runs, because
1 actual run is 0.64 more than 0.36 potential runs.
• Additionally, if the steal was successful, the inning would
still be alive, still with 2 outs, but now no runners on
base.
• With 2 outs and no one on base, teams could expect to
score an additional 0.10 runs.
• To recap: A successful steal of home with a
runner on third and 2 outs gives a team 1.10
expected runs compared to the 0.36 expected
runs if the runner did not try to steal home.
• With 2 outs and a runner on third, an
unsuccessful steal reduces run expectancy
from 0.36 to 0.
• Let’s now look at expected value of this
situation.
• Suppose that a base runner in this context has an
80% chance of successfully stealing home.
• Let X = run expectancy when attempting to steal
home.
• There are then two possible values for X:
– x = 1.10 and x = 0, with corresponding probabilities of
0.80 and 0.20, so E(X) = 1.10(0.80) + 0(0.20) = 0.88.
• This means that if a team has a runner on
third with two outs and the runner has an
80% chance of successfully stealing home, the
team would score 0.88 runs, on average, if
they followed this strategy in many, many
innings.
• Because the expected number of runs is
greater than 0.36, attempting to steal home in
this circumstance is a good strategy.
• What if the base runner only had a 50%
chance of successfully stealing home? Then
E(X) = 1.10(0.50) + 0(0.50) = 0.55.
• Because the expected number of runs is still
greater than 0.36, attempting to steal home in
this circumstance is still a good strategy.
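The expected-value calculations for stealing home can be sketched in Python. The run values (1.10 after a successful steal, 0 after a failed one, 0.36 for staying put) come from the slides. The function names are illustrative.

```python
def expected_value(distribution):
    """Return the sum of value * probability over (value, probability) pairs."""
    total_probability = sum(p for _, p in distribution)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in distribution)


def steal_distribution(p_success):
    """Run expectancy when attempting to steal home with 2 outs."""
    return [(1.10, p_success), (0.0, 1 - p_success)]


RUNS_IF_STAY = 0.36  # run expectancy with a runner on third and 2 outs

for p_success in (0.80, 0.50):
    ev = expected_value(steal_distribution(p_success))
    verdict = "good" if ev > RUNS_IF_STAY else "bad"
    print(f"p = {p_success:.2f}: E(X) = {ev:.2f} runs, {verdict} strategy")
```

The same `expected_value` helper works for any probability distribution given as value-probability pairs.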
• When would attempting to steal home become a
bad strategy?
• In other words, for what probabilities of success
will the expected number of runs be less than
0.36?
• Here is the probability distribution of X, with p
representing the probability of success and (1-p)
representing the probability of failure.
Value of X:    1.10    0
Probability:   p       1-p
• To find out if stealing is a good strategy, we
want to know what value of p results in an
expected value greater than 0.36:
1.10p + 0(1-p) > 0.36, so p > 0.36/1.10 ≈ 0.327.
• If the base runner has more than a 32.7% chance
of successfully stealing home with a runner on
third and 2 outs, then the expected number of
runs is greater than 0.36.
• Thus, if a base runner has a greater than
32.7% chance of stealing home, then
attempting to steal home is a good strategy.
• If the base runner has a less than 32.7%
chance, then attempting to steal home is a
bad strategy.
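The break-even success probability can be sketched in Python by solving 1.10p + 0(1-p) = 0.36 for p. The run values come from the slides. The function and constant names are illustrative.

```python
RUNS_IF_SAFE = 1.10   # expected runs after a successful steal of home
RUNS_IF_OUT = 0.0     # expected runs after being caught (inning over)
RUNS_IF_STAY = 0.36   # expected runs if the runner stays on third


def break_even_probability(runs_if_safe, runs_if_out, runs_if_stay):
    """Solve runs_if_safe*p + runs_if_out*(1 - p) = runs_if_stay for p."""
    return (runs_if_stay - runs_if_out) / (runs_if_safe - runs_if_out)


p_star = break_even_probability(RUNS_IF_SAFE, RUNS_IF_OUT, RUNS_IF_STAY)


def steal_is_good(p_success):
    """A steal attempt helps when its success chance exceeds the break-even."""
    return p_success > p_star


print(f"break-even probability: {p_star:.3f}")
```

Writing the break-even point as a function makes it easy to redo the analysis for other base-out situations with different run expectancies.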
• So how did Robinson PERFORM with 2 outs
and a runner at third?
• He was successful in 7 of his 14 attempts
(50%).
• Because 50% is greater than 32.7%,
attempting to steal home was a good strategy
for Robinson.
End of Game Strategy: Win Probability
• Another useful concept in evaluating strategy in
sports is win probability.
• A team’s win probability measures the proportion
of games a team would win if they could replay
the game over and over again in the same
context.
• Using historical data, it is possible to estimate the
probability that a team will win a game based on
the context of the game at the time.
• Example: A baseball team playing at home, down
by 1 run, with runners at second and third with 1
out in the bottom of the 9th has a 54.0% chance of
winning the game. However, if the next hitter
strikes out, leaving the runners in the same
position with 2 outs, the win probability goes
down to 24.7%.
• The crucial strikeout reduced the win probability
by 29.3 percentage points.
• Many websites show up-to-the-minute win
probabilities.
• Here is an example from
www.live.advancednflstats.com.
• Let’s now return to the Patriots-Colts example
from the beginning of the chapter.
• To recap:
– New England had the football at their own 28 yard line
with 2:08 left on the clock, and they led 34-28. It was
4th down and 2. New England had no time outs left,
and the Colts had 1 time out left.
– The conventional move would be to punt and play
defense. However, if they go for it and pick up 2 yards
they will essentially win the game. If they go for it and
don’t pick up the 2 yards, Manning will have a good
chance to throw for the game winning touchdown, as
he was on a roll in the second half.
– If you were Bill Belichick, what would you do?
• Historically, when teams go for it on 4th down with 2
yards to go, they successfully gain the 2 yards 60% of
the time.
– If the Patriots get the 2 yards, their win probability is 100%.
– If the Patriots don’t get the 2 yards, their win probability is
47%.
• The other option would be to punt. This
would have given the Patriots a win
probability of about 70%.
• If the Patriots go for it, there are two ways they
can win the game:
– Get the necessary 2 yards.
– Fail to get the 2 yards but prevent the Colts from
scoring a TD.
• If the Patriots punt, they can win the game by
preventing the Colts from scoring.
• Going for it on 4th and 2 results in a win
probability of 0.60(1.00) + 0.40(0.47) = 0.788,
as opposed to punting, which results in a win
probability of 0.70.
• Therefore, going for it on 4th down would be a
better strategy, statistically speaking.
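The 4th-and-2 comparison can be sketched in Python as an expected win probability. The four inputs are the figures quoted in the slides. The variable names are illustrative.

```python
# Figures quoted in the slides.
P_CONVERT = 0.60            # chance of gaining the 2 yards
WIN_PROB_IF_CONVERT = 1.00  # win probability after a first down
WIN_PROB_IF_FAIL = 0.47     # win probability after failing on 4th down
WIN_PROB_IF_PUNT = 0.70     # win probability after punting

# Weight each outcome's win probability by how likely that outcome is.
win_prob_go = (P_CONVERT * WIN_PROB_IF_CONVERT
               + (1 - P_CONVERT) * WIN_PROB_IF_FAIL)

best_choice = "go for it" if win_prob_go > WIN_PROB_IF_PUNT else "punt"

print(f"go for it: {win_prob_go:.3f}, punt: {WIN_PROB_IF_PUNT:.3f}")
print("better choice:", best_choice)
```

The conclusion depends on the inputs: with a noticeably lower conversion chance or a lower win probability after failing, punting could come out ahead.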
• Unfortunately for Belichick, he went for it and
the Patriots did not get the first down (it sure
was close though!). They consequently lost
the game.
• The play
• Sports Nation debate
• Just because the Patriots did not get the 1st
down doesn’t mean Belichick’s decision was
wrong.
• Win probability tells us that if they were
able to replay this context 1000 times (for
example), the Patriots would win about 788
times and the Colts would win about 212
times.
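The "replay it 1000 times" idea can be illustrated with a small Python simulation. This is only a sketch: it assumes the 60% conversion rate and 47% win probability after a failed attempt quoted earlier, and the function name and seed are arbitrary choices.

```python
import random


def simulate_go_for_it(n_replays, p_convert=0.60, p_win_if_fail=0.47, seed=0):
    """Count Patriots wins over n_replays simulated 4th-and-2 attempts."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wins = 0
    for _ in range(n_replays):
        if rng.random() < p_convert:
            wins += 1          # first down: treated as a certain win
        elif rng.random() < p_win_if_fail:
            wins += 1          # failed, but the defense holds
    return wins


wins = simulate_go_for_it(1000)
print(f"Patriots wins in 1000 replays: {wins}")
print(f"Colts wins in 1000 replays: {1000 - wins}")
```

The count will not be exactly 788 in any single batch of 1000 replays. It varies around that number from run to run, which is why a single loss says little about whether the decision was sound.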