Section 6.1
Discrete and Continuous Random Variables
The Practice of Statistics, 4th edition – For AP*
STARNES, YATES, MOORE
Learning Objectives
After this section, you should be able to…
APPLY the concept of discrete random variables to a variety of statistical settings
CALCULATE and INTERPRET the mean (expected value) of a discrete random variable
CALCULATE and INTERPRET the standard deviation (and variance) of a discrete random variable
DESCRIBE continuous random variables
Greedy Pig
How can we use probability to make and justify decisions?
The Game: The match consists of 5 rounds. Each round consists of a number of games. Before each game, each person decides whether to stop (and retain the points earned in that round) or continue to play (and win more points, or lose all points gained in the round, depending on the outcome).
•All players get the first two throws for free. A die is tossed twice. The points are added together.
•All players stand up.
•Each player has two options: quit, sit down, and keep the score that is the total of the two dice for Round 1; or continue to play and remain standing.
Round:  1  2  3  4  5  TOTAL
Points:
•If players continue to play, they remain standing. The die is tossed. If the number 1, 3, 4, 5, or 6 is rolled, the player adds this number to his or her total for that round. But if the number is 2, the students that are still standing lose all points for that round, and record a score of 0 for that round.
•This continues until all players have sat down, or a 2 is rolled.
Then that round is over.
The object of the game is to determine a strategy that in the long term will maximize the total points.
Greedy Pig
•What strategies did you use to decide when to stop?
•Choose a strategy to use to play again. Choose this strategy and stick to it. Play the game again. How did this strategy work for you?
•Record your results in a back-to-back stem-and-leaf plot.
Compare the results.
Bottled Water vs. Tap Water
1) Look at your index card with your station written on it.
2) Go to the corresponding station. Pick up three cups (one each A, B, and C) and take them back to your seat.
3) Your task is to determine which one of the three cups contains the bottled water. Drink all the water in Cup A first, then the water in Cup B, and finally the water in Cup C. Write down the letter of the cup that you think held the bottled water. Do not discuss your results with any of your classmates.
4) When Ms. Raskin tells you to do so, go to the board and record your station number and the letter of the cup you identified as containing bottled water.
Station | Bottled Water Cup? | Truth
Bottled Water vs. Tap Water
5) Let’s assume that no one in the class can distinguish tap water from bottled water. In that case, students would just be guessing which cup of water tastes different. If so, what’s the probability that an individual student would guess correctly?
6) How many correct identifications would you need to see to be convinced that the students in your class aren’t just guessing?
Choose a partner and design a simulation to answer this question.
What do you conclude about your class’s ability to distinguish tap water from bottled water?
When my Nerd Camp class did this activity, 13 out of the 21 teachers made the correct identification. If you assume that the teachers can’t tell bottled water from tap water, you would expect about 7 teachers (1/3 of the total) to guess correctly. Here is a dotplot showing 100 trials of a simulation to see how often there are 13 or more correct guesses. The simulation used RandInt, letting 1 be a correct guess and 2 and 3 be incorrect, and looked at the results in groups of 21. In the simulation below, there were 4 trials in which three teachers guessed correctly.
[Dotplot: number of correct guesses (out of 21) in each of 100 simulated trials; horizontal axis runs from 2 to 14.]
How likely is it that 13 of the 21 teachers would guess correctly? To answer this, we will need a different kind of probability model than what we have been using.
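The simulation described above can also be sketched in Python. This is only an illustration under stated assumptions: `random.randint(1, 3)` stands in for the calculator's RandInt, a 1 counts as a correct guess, and the function names, seed, and number of trials are hypothetical choices rather than part of the original activity.

```python
import random

def count_correct(rng, n_guessers=21):
    """Simulate one class: each guesser draws 1, 2, or 3; a 1 counts as correct."""
    return sum(1 for _ in range(n_guessers) if rng.randint(1, 3) == 1)

def estimate_tail(n_trials=10_000, threshold=13, seed=1):
    """Proportion of simulated classes with at least `threshold` correct guesses."""
    rng = random.Random(seed)  # arbitrary seed, only so the run is repeatable
    hits = sum(1 for _ in range(n_trials) if count_correct(rng) >= threshold)
    return hits / n_trials

proportion = estimate_tail()
print(f"Estimated P(13 or more correct out of 21) = {proportion:.4f}")
```

A small estimated proportion would suggest that 13 correct identifications is hard to explain by guessing alone.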
A probability model describes the possible outcomes of a chance process and the likelihood that those outcomes will occur.
A numerical variable that describes the outcomes of a chance process is called a random variable. The probability model for a random variable is its probability distribution.
Definition:
A random variable takes numerical values that describe the outcomes of some chance process. The probability distribution of a random variable gives its possible values and their probabilities.
Example: Consider tossing a fair coin 3 times.
Define X = the number of heads obtained
X = 0: TTT
X = 1: HTT THT TTH
X = 2: HHT HTH THH
X = 3: HHH
Value 0 1 2 3
Probability 1/8 3/8 3/8 1/8
What’s the probability that you will get at least one “heads” in three tosses?
How would you interpret that probability?
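As a check on the coin-toss table, the eight equally likely outcomes can be listed and counted in a few lines of Python. This is a minimal sketch; the variable names are illustrative and not from the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2^3 = 8 equally likely sequences of three tosses, e.g. ('H', 'T', 'T')
outcomes = list(product("HT", repeat=3))

# X = number of heads in each sequence
counts = Counter(seq.count("H") for seq in outcomes)
distribution = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

# P(X >= 1): add the probabilities of the values 1, 2, and 3
p_at_least_one = sum(p for x, p in distribution.items() if x >= 1)

for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")
print(f"P(X >= 1) = {p_at_least_one}")
```

The result, 7/8, can be read as: in the long run, about 87.5% of sets of three tosses include at least one head.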
There are two main types of random variables: discrete and continuous. If we can find a way to list all possible outcomes for a random variable and assign probabilities to each one, we have a discrete random variable.
Example: shoe size (discrete) vs. foot length (continuous)
Discrete Random Variables and Their Probability Distributions
A discrete random variable X takes a fixed set of possible values with gaps between. The probability distribution of a discrete random variable X lists the values $x_i$ and their probabilities $p_i$:
Value:       $x_1$  $x_2$  $x_3$  …
Probability: $p_1$  $p_2$  $p_3$  …
The probabilities $p_i$ must satisfy two requirements:
1. Every probability $p_i$ is a number between 0 and 1.
2. The sum of the probabilities is 1.
To find the probability of any event, add the probabilities $p_i$ of the particular values $x_i$ that make up the event.
In 1952, Dr. Virginia Apgar suggested five criteria for measuring a baby’s health at birth: skin color, heart rate, muscle tone, breathing, and response to stimuli.
She developed a 0 – 1 – 2 scale to rate a newborn on each of the five criteria.
A baby’s Apgar score is the sum of the ratings on the five scales, which gives a whole-number value from 0 to 10.
What Apgar scores are typical? To find out, researchers recorded the Apgar scores of over 2 million newborn babies in a single year. Imagine selecting one of these newborns at random. (That’s our chance process.)
Define X as the Apgar score of a randomly-selected baby one minute after birth.
The table on the next slide gives the probability distribution for X.
(a) Show that the probability distribution for X is legitimate.
(b) Make a histogram of the probability distribution. Describe what you see.
(c) Apgar scores of 7 or higher indicate a healthy baby. What is P(X ≥ 7)?
Value:       0     1     2     3     4     5     6     7     8     9     10
Probability: 0.001 0.006 0.007 0.008 0.012 0.020 0.038 0.099 0.319 0.437 0.053
(a) All probabilities are between 0 and 1 and they add up to 1. This is a legitimate probability distribution.
(b) The left-skewed shape of the distribution suggests a randomly selected newborn will have an Apgar score at the high end of the scale. There is a small chance of getting a baby with a score of 5 or lower.
(c) P(X ≥ 7) = 0.908. We’d have about a 91% chance of randomly choosing a healthy baby.
Notice the difference between > and ≥. With discrete variables, these are different, but not with continuous variables.
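The two legitimacy requirements and the event probability for the Apgar example can be checked with a short Python sketch. The helper names `is_legitimate` and `prob` are hypothetical, introduced only for illustration; the probabilities are copied from the Apgar table.

```python
import math

# Apgar score distribution from the table (value -> probability)
apgar = {0: 0.001, 1: 0.006, 2: 0.007, 3: 0.008, 4: 0.012, 5: 0.020,
         6: 0.038, 7: 0.099, 8: 0.319, 9: 0.437, 10: 0.053}

def is_legitimate(dist):
    """Every probability is between 0 and 1, and the probabilities sum to 1."""
    each_ok = all(0 <= p <= 1 for p in dist.values())
    return each_ok and math.isclose(sum(dist.values()), 1.0)

def prob(dist, event):
    """Add the probabilities of the values that make up the event."""
    return sum(p for x, p in dist.items() if event(x))

p_at_least_7 = prob(apgar, lambda x: x >= 7)   # includes the value 7
p_more_than_7 = prob(apgar, lambda x: x > 7)   # excludes the value 7

print(is_legitimate(apgar))
print(f"P(X >= 7) = {p_at_least_7:.3f}")
print(f"P(X > 7)  = {p_more_than_7:.3f}")
```

Comparing the last two results shows why > and ≥ matter for a discrete variable: the two probabilities differ by P(X = 7) = 0.099.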
(a) Show that the probability distribution for X is legitimate.
(b) Make a histogram of the probability distribution. Describe what you see.
(c) What is the probability that the number of goals scored by a randomly selected team in a randomly selected game is at least 6?
Goals:       0     1     2     3     4     5     6     7     8     9
Probability: 0.061 0.154 0.228 0.229 0.173 0.094 0.041 0.015 0.004 0.001
(a) All probabilities are between 0 and 1 and they add up to 1. This is a legitimate probability distribution.
(b) The histogram is right-skewed, which means that the majority of games are relatively low scoring. It is pretty unusual for a team to score 6 or more goals.
(c) P(X ≥ 6) = 0.041 + 0.015 + 0.004 + 0.001 = 0.061. We’d have a 6.1% chance that the number of goals scored is at least 6.
CHECK YOUR UNDERSTANDING
North Carolina State University posts the grade distribution for its courses online. Students in Statistics 101 in a recent semester received 26% As, 42% Bs, 20% Cs, 10% Ds, and 2% Fs. Choose a Statistics student at random. The student’s grade on a four-point scale is a discrete random variable X with this probability distribution:
Value of X:  0    1    2    3    4
Probability: 0.02 0.10 0.20 0.42 0.26
1) Say in words what the meaning of P(X ≥ 3) is. What is this probability?
2) Write the event “the student got a grade worse than C” in terms of values of the random variable X. What is the probability of this event?
3) Sketch a graph of the probability distribution. Describe what you see.
CHECK YOUR UNDERSTANDING
1) Say in words what the meaning of P(X ≥ 3) is. What is this probability?
The probability that the student gets either an A or a B is 0.68.
2) Write the event “the student got a grade worse than C” in terms of values of the random variable X. What is the probability of this event?
P(X < 2) = 0.12
3) Sketch a graph of the probability distribution. Describe what you see.
The histogram is left-skewed. Higher grades are more likely, but there are a few lower grades.
When analyzing discrete random variables, we’ll follow the same strategy we used with quantitative data – describe the shape, center, and spread, and identify any outliers.
The mean of any discrete random variable is an average of the possible outcomes, with each outcome weighted by its probability.
Definition:
Suppose that X is a discrete random variable whose probability distribution is
Value:       $x_1$  $x_2$  $x_3$  …
Probability: $p_1$  $p_2$  $p_3$  …
To find the mean (expected value) of X, multiply each possible value by its probability, then add all the products:
$$\mu_X = E(X) = x_1 p_1 + x_2 p_2 + x_3 p_3 + \dots = \sum x_i p_i$$
Consider the random variable X = Apgar Score
Compute the mean of the random variable X and interpret it in context.
Value:       0     1     2     3     4     5     6     7     8     9     10
Probability: 0.001 0.006 0.007 0.008 0.012 0.020 0.038 0.099 0.319 0.437 0.053
$$\mu_X = E(X) = \sum x_i p_i = (0)(0.001) + (1)(0.006) + (2)(0.007) + \dots + (10)(0.053) = 8.128$$
The mean Apgar score of a randomly selected newborn is 8.128. This is the long-term average Apgar score of many, many randomly chosen babies.
Note: The expected value does not need to be a possible value of X or an integer!
It is a long-term average over many repetitions.
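The weighted-sum definition of the mean translates directly into code. The sketch below is illustrative only; `expected_value` is a hypothetical helper name, and the probabilities are those from the Apgar table.

```python
# Apgar score distribution from the table (value -> probability)
apgar = {0: 0.001, 1: 0.006, 2: 0.007, 3: 0.008, 4: 0.012, 5: 0.020,
         6: 0.038, 7: 0.099, 8: 0.319, 9: 0.437, 10: 0.053}

def expected_value(dist):
    """Multiply each value by its probability, then add the products."""
    return sum(x * p for x, p in dist.items())

mean_apgar = expected_value(apgar)
print(f"E(X) = {mean_apgar:.3f}")

# The mean need not be a possible value of X
print(mean_apgar in apgar)
```

The second print statement confirms the note above: 8.128 is not one of the scores 0 through 10.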
Consider the random variable X = net gain from a single $1 bet on red.
Compute the mean of the random variable X and interpret it in context.
Value:       -1     1
Probability: 20/38  18/38
$$\mu_X = E(X) = \sum x_i p_i = (-\$1)(20/38) + (\$1)(18/38) \approx -\$0.05$$
In the long run, the player loses (and the casino gains) about five cents per bet.
The ordinary average of -1 and 1 is 0, but $0 isn’t the average winnings because the player is less likely to win $1 than to lose $1.
Another wager players can make in roulette is called a “corner bet.” To make this bet, a player places his chips on the intersection of four numbered squares on the roulette table. If one of these numbers comes up on the wheel and the player bet $1, the player gets his $1 back plus $8 more. Otherwise, the casino keeps the original $1 bet.
Consider the random variable X = net gain from a single $1 corner bet
Compute the mean of the random variable X and interpret it in context.
Value:       -1     8
Probability: 34/38  4/38
$$\mu_X = E(X) = \sum x_i p_i = (-\$1)(34/38) + (\$8)(4/38) \approx -\$0.05$$
If a player were to make $1 corner bets many, many times, the average gain would be about -$0.05 per bet.
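Both roulette bets can be compared exactly by using fractions instead of decimals. This is a small sketch under the probabilities given in the two tables; the dictionary and function names are illustrative.

```python
from fractions import Fraction

# Net gain (in dollars) -> probability, from the two roulette tables
red_bet = {-1: Fraction(20, 38), 1: Fraction(18, 38)}
corner_bet = {-1: Fraction(34, 38), 8: Fraction(4, 38)}

def expected_value(dist):
    """Multiply each value by its probability, then add the products."""
    return sum(x * p for x, p in dist.items())

ev_red = expected_value(red_bet)
ev_corner = expected_value(corner_bet)

print(f"Red bet:    E(X) = {ev_red} = {float(ev_red):.4f} dollars")
print(f"Corner bet: E(X) = {ev_corner} = {float(ev_corner):.4f} dollars")
```

Working in exact fractions shows that the two bets have the same expected value, -2/38 = -1/19 of a dollar, which rounds to the -$0.05 reported above.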
Consider the random variable X = number of goals scored by a randomly selected team in a randomly selected game.
Goals:       0     1     2     3     4     5     6     7     8     9
Probability: 0.061 0.154 0.228 0.229 0.173 0.094 0.041 0.015 0.004 0.001
$$\mu_X = E(X) = \sum x_i p_i = (0)(0.061) + (1)(0.154) + \dots + (9)(0.001) = 2.851$$
The mean number of goals scored by a randomly selected team in a randomly selected game is 2.851. Over many, many random selections, the average number of goals scored would be about 2.851 in the long run.
AP ERROR ALERT!
Many students incorrectly believe that the expected value of a random variable must be equal to one of the possible values of the variable. This is not the case. In the roulette examples, the expected value was -$0.05, even though this was not a possible gain from a single $1 bet.
In the Apgar example, the average score, 8.128, is not a possible value of the random variable. If you think of the mean as a long-run average over many repetitions, these facts make sense.
Since we use the mean as the measure of center for a discrete random variable, we’ll use the standard deviation as our measure of spread. The definition of the variance of a random variable is similar to the definition of the variance for a set of quantitative data.
Definition:
Suppose that X is a discrete random variable whose probability distribution is
Value:       $x_1$  $x_2$  $x_3$  …
Probability: $p_1$  $p_2$  $p_3$  …
and that $\mu_X$ is the mean of X. The variance of X is
$$\mathrm{Var}(X) = \sigma_X^2 = (x_1 - \mu_X)^2 p_1 + (x_2 - \mu_X)^2 p_2 + (x_3 - \mu_X)^2 p_3 + \dots = \sum (x_i - \mu_X)^2 p_i$$
This formula is given to you on the formula sheet of the AP exam.
To get the standard deviation of a random variable, take the square root of the variance.
Consider the random variable X = Apgar Score
Compute the standard deviation of the random variable X and interpret it in context. Recall that the mean Apgar score was 8.128.
Value:       0     1     2     3     4     5     6     7     8     9     10
Probability: 0.001 0.006 0.007 0.008 0.012 0.020 0.038 0.099 0.319 0.437 0.053
$$\sigma_X^2 = \sum (x_i - \mu_X)^2 p_i = (0 - 8.128)^2(0.001) + (1 - 8.128)^2(0.006) + \dots + (10 - 8.128)^2(0.053) = 2.066$$
$$\sigma_X = \sqrt{2.066} = 1.437$$
The variance of X is 2.066 and the standard deviation of X is 1.437. On average, a randomly selected baby’s Apgar score will differ from the mean 8.128 by about 1.4 units.
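The variance and standard deviation calculation for the Apgar scores can be reproduced with a short Python sketch. The helper names are hypothetical; the code simply applies the definition of the variance as a probability-weighted average of squared deviations from the mean.

```python
import math

# Apgar score distribution from the table (value -> probability)
apgar = {0: 0.001, 1: 0.006, 2: 0.007, 3: 0.008, 4: 0.012, 5: 0.020,
         6: 0.038, 7: 0.099, 8: 0.319, 9: 0.437, 10: 0.053}

def mean(dist):
    """Sum of value times probability."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Sum of squared deviation from the mean times probability."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

var_apgar = variance(apgar)
sd_apgar = math.sqrt(var_apgar)  # standard deviation = square root of variance

print(f"Variance           = {var_apgar:.3f}")
print(f"Standard deviation = {sd_apgar:.3f}")
```

Swapping in a different dictionary of values and probabilities gives the same summary for any other discrete distribution in this section.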
Analyzing Random Variables on the Calculator
1) Enter the values of the random variable in L1 and the corresponding probabilities in L2. (Practice by using the Apgar scores.)
2) To graph a histogram of the probability distribution:
   1) Set up a statistics plot with Xlist: L1 and Freq: L2.
   2) Adjust your window settings to:
      Xmin = -1, Xmax = 11, Xscl = 1
      Ymin = -0.1, Ymax = 0.5, Yscl = 0.1
   3) Press GRAPH (F3 on TI-89).
   When you are done with the histogram, remember to reset Freq back to 1.
3) To calculate the mean and standard deviation of the random variable, use one-variable statistics with the values in L1 and the probabilities in L2:
   1) TI-83/84: Execute the command 1-Var Stats L1, L2.
   2) TI-89: In the Statistics/List Editor, press F4 (Calc) and choose 1: 1-Var Stats. Use the inputs List: list1 and Freq: list2.
Consider the random variable X = goals scored
Compute the standard deviation of the random variable X and interpret it in context. Recall that the mean number of goals scored was 2.851.
Goals:       0     1     2     3     4     5     6     7     8     9
Probability: 0.061 0.154 0.228 0.229 0.173 0.094 0.041 0.015 0.004 0.001
$$\sigma_X^2 = \sum (x_i - \mu_X)^2 p_i = (0 - 2.851)^2(0.061) + (1 - 2.851)^2(0.154) + \dots + (9 - 2.851)^2(0.001) = 2.66$$
$$\sigma_X = \sqrt{2.66} = 1.63$$
The standard deviation of X is 1.63. On average, a randomly selected team’s number of goals in a randomly selected game will differ from the mean by about 1.63 goals.
CHECK YOUR UNDERSTANDING
A large auto dealership keeps track of sales made during each hour of the day. Let X = the number of cars sold during the first hour of business on a randomly selected Friday. Based on previous records, the probability distribution of X is as follows:
Cars sold:   0   1   2   3
Probability: 0.3 0.4 0.2 0.1
1) Compute and interpret the mean of X.
2) Compute and interpret the standard deviation of X.
CHECK YOUR UNDERSTANDING
1) Compute and interpret the mean of X.
$\mu_X = 1.1$. The long-run average, over many Friday mornings, will be about 1.1 cars sold.
2) Compute and interpret the standard deviation of X.
$\sigma_X = 0.943$. On average, the number of cars sold on a randomly selected Friday will differ from the mean (1.1) by about 0.943 cars sold.
AP ERROR ALERT!
In many cases where you are asked to calculate the mean (expected value) or standard deviation of a random variable, you will lose credit for not showing adequate work, or for not showing work at all. To get full credit, you need to go beyond just reporting the commands you used on your graphing calculator. Show the first couple of terms with an ellipsis (. . .) for the calculation of the mean or standard deviation.
Discrete random variables commonly arise from situations that involve counting something. Situations that involve measuring something often result in a continuous random variable.
Definition:
A continuous random variable X takes on all values in an interval of numbers. The probability distribution of X is described by a density curve. The probability of any event is the area under the density curve and above the values of X that make up the event.
The probability model of a discrete random variable X assigns a probability between 0 and 1 to each possible value of X.
A continuous random variable Y has infinitely many possible values.
All continuous probability models assign probability 0 to every individual outcome. Only intervals of values have positive probability.
Calculator Commands:
The calculator command rand will generate a random number from 0 to 1.
The command 2rand will generate a random number from 0 to 2.
To generate a random number from 1 to 2, use the command rand+1.
You can combine these commands to produce other intervals.
Random Numbers:
The random number generator will spread its output uniformly across the entire interval from 0 to 1 as we allow it to generate a long sequence of random numbers. The results of many trials are represented by the density curve of a uniform distribution. This density curve appears at right. It has height 1 over the interval from 0 to 1.
The area under the density curve is 1, and the probability of any event is the area under the density curve and above the event in question.
Remember that each individual probability is 0.
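The calculator commands and the "probability equals area" idea can be mimicked in Python. This is a sketch under assumptions: `random.Random.random()` stands in for the calculator's rand, and the seed, sample size, and the interval 0.3 to 0.7 are arbitrary choices made for illustration.

```python
import random

rng = random.Random(2024)  # arbitrary seed, only so the run is repeatable

def rand():
    """Stand-in for the calculator's rand: a random number from 0 to 1."""
    return rng.random()

n = 100_000
draws = [rand() for _ in range(n)]
zero_to_two = [2 * u for u in draws]  # like the command 2rand
one_to_two = [u + 1 for u in draws]   # like the command rand+1

# The uniform density curve has height 1, so the probability of an interval
# is its length: P(0.3 <= X <= 0.7) should be close to 0.4.
frac_between = sum(1 for u in draws if 0.3 <= u <= 0.7) / n
print(f"Proportion of draws between 0.3 and 0.7: {frac_between:.3f}")
```

With many draws, the observed proportion settles near the area under the density curve over the interval, which is the interval's length.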
The heights of young women closely follow the Normal distribution with mean μ = 64 inches and standard deviation σ = 2.7 inches. This is a distribution for a large set of data. Choose one young woman at random. Call her height Y. If we repeat the random choice many, many times, the distribution of values of Y is the same Normal distribution that describes the heights of all young women.
Find the probability that the chosen woman is between 68 and 70 inches tall.
Define Y as the height of a randomly chosen young woman. Y is a continuous random variable whose probability distribution is N (64, 2.7).
What is the probability that a randomly chosen young woman has height between 68 and 70 inches?
We want P(68 ≤ Y ≤ 70). Standardize both endpoints:
$$z = \frac{68 - 64}{2.7} = 1.48 \qquad z = \frac{70 - 64}{2.7} = 2.22$$
P(1.48 ≤ Z ≤ 2.22) = P(Z ≤ 2.22) – P(Z ≤ 1.48)
= 0.9868 – 0.9306
= 0.0562
There is about a 5.6% chance that a randomly chosen young woman has a height between 68 and 70 inches.
On the calculator: press 2nd VARS (gives you DISTR) and use normalcdf(68, 70, 64, 2.7).
Define Y as the weight of a randomly-chosen 3-yr.-old female. Y is a continuous random variable whose probability distribution is N (30.7, 3.6).
What is the probability that a randomly chosen 3-yr.-old female weighs at least 30 pounds?
We want P(Y ≥ 30). Standardize:
$$z = \frac{30 - 30.7}{3.6} = -0.19$$
P(Z ≥ -0.19) = 1 – P(Z < -0.19)
= 1 – 0.4247
= 0.5753
There is about a 58% chance that the randomly selected 3-yr.-old female will weigh at least 30 pounds.
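Both Normal calculations can be checked in Python with `statistics.NormalDist` from the standard library (Python 3.8 or later), which plays the role of normalcdf here. This is an illustrative sketch; the variable names are not from the slides.

```python
from statistics import NormalDist

# Heights of young women: N(64, 2.7), in inches
heights = NormalDist(mu=64, sigma=2.7)
p_height = heights.cdf(70) - heights.cdf(68)  # area between 68 and 70

# Weights of 3-year-old females: N(30.7, 3.6), in pounds
weights = NormalDist(mu=30.7, sigma=3.6)
p_weight = 1 - weights.cdf(30)  # area to the right of 30

print(f"P(68 <= Y <= 70) = {p_height:.4f}")
print(f"P(Y >= 30)       = {p_weight:.4f}")
```

The results differ slightly from the table-based answers (0.0562 and 0.5753) because the hand calculations round each z-score to two decimal places before using the table.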
AP ERROR ALERT!
When you solve problems involving random variables, start by defining the random variable of interest. For example, let X = the Apgar score of a randomly selected baby or let Y = the height of a randomly selected young woman. Then state the probability you’re trying to find in terms of the random variable: P(68 ≤ Y ≤ 70) or P(X ≥ 7).
Summary
In this section, we learned that…
A random variable is a variable taking numerical values determined by the outcome of a chance process. The probability distribution of a random variable X tells us what the possible values of X are and how probabilities are assigned to those values.
A discrete random variable has a fixed set of possible values with gaps between them. The probability distribution assigns each of these values a probability between 0 and 1 such that the sum of all the probabilities is exactly 1.
A continuous random variable takes all values in some interval of numbers. A density curve describes the probability distribution of a continuous random variable.
The mean of a random variable is the long-run average value of the variable after many repetitions of the chance process. It is also known as the expected value of the random variable.
The expected value of a discrete random variable X is
$$\mu_X = \sum x_i p_i = x_1 p_1 + x_2 p_2 + x_3 p_3 + \dots$$
The variance of a random variable is the average squared deviation of the values of the variable from their mean. The standard deviation is the square root of the variance. For a discrete random variable X,
$$\sigma_X^2 = \sum (x_i - \mu_X)^2 p_i = (x_1 - \mu_X)^2 p_1 + (x_2 - \mu_X)^2 p_2 + (x_3 - \mu_X)^2 p_3 + \dots$$