
Chapter 6: Random Variables

Section 6.1

Discrete and Continuous Random Variables

The Practice of Statistics, 4th edition – For AP*

STARNES, YATES, MOORE


Chapter 6

Random Variables

6.1 Discrete and Continuous Random Variables

6.2 Transforming and Combining Random Variables

6.3 Binomial and Geometric Random Variables


Section 6.1

Discrete and Continuous Random Variables

Learning Objectives

After this section, you should be able to…

 APPLY the concept of discrete random variables to a variety of statistical settings

 CALCULATE and INTERPRET the mean (expected value) of a discrete random variable

 CALCULATE and INTERPRET the standard deviation (and variance) of a discrete random variable

 DESCRIBE continuous random variables

 Greedy Pig

How can we use probability to make and justify decisions?

The Game: The match consists of 5 rounds. Each round consists of a number of games. Before each game, each player decides whether to stop (and keep the points earned in that round) or continue to play (and win more points, or lose all points gained in the round, depending on the outcome).

•All players get the first two throws for free. A die is tossed twice. The points are added together.

•All players stand up.

•Each player has two options: quit, sit down, and keep the score that is the total of the two dice for Round 1; or continue to play and remain standing.

Round: 1 2 3 4 5 TOTAL

Points:

•If players continue to play, they remain standing. The die is tossed. If a 1, 3, 4, 5, or 6 is rolled, each player still standing adds this number to his or her total for that round. But if a 2 is rolled, the players who are still standing lose all points for that round and record a score of 0 for that round.

•This continues until all players have sat down or a 2 is rolled. Then that round is over.


The object of the game is to determine a strategy that in the long term will maximize the total points.

 Greedy Pig

•What strategies did you use to decide when to stop?

•Choose a strategy and stick to it. Play the game again. How did this strategy work for you?

•Record your results in a back-to-back stem-and-leaf plot. Compare the results.

 Bottled Water vs. Tap Water

1) Look at your index card with your station written on it.

2) Go to the corresponding station. Pick up three cups (one each of A, B, and C) and take them back to your seat.

3) Your task is to determine which one of the three cups contains the bottled water. Drink all the water in Cup A first, then the water in Cup B, and finally the water in Cup C. Write down the letter of the cup that you think held the bottled water. Do not discuss your results with any of your classmates.

4) When Ms. Raskin tells you to do so, go to the board and record your station number and the letter of the cup you identified as containing bottled water.

Station | Bottled Water Cup? | Truth

 Bottled Water vs. Tap Water

5) Let’s assume that no one in the class can distinguish tap water from bottled water. In that case, students would just be guessing which cup of water tastes different. If so, what’s the probability that an individual student would guess correctly?

6) How many correct identifications would you need to see to be convinced that the students in your class aren’t just guessing?

Choose a partner and design a simulation to answer this question.

What do you conclude about your class’s ability to distinguish tap water from bottled water?

When my Nerd Camp class did this activity, 13 out of the 21 teachers made the correct identification. If you assume that the teachers can’t tell bottled water from tap water, you would expect about 7 teachers to guess correctly (1/3 of 21). How likely is it that 13 of the 21 teachers would guess correctly? To answer this, we will need a different kind of probability model than what we have been using.

Here is a dotplot showing 100 trials of a simulation to see how often there are 13 or more correct guesses (using RandInt, letting 1 be a correct guess and 2 and 3 be incorrect, and looking at groups of 21). In the simulation below, there were 4 trials in which three teachers guessed correctly.

[Dotplot: number of correct guesses (horizontal axis, 2 to 14) in each of 100 simulated groups of 21 guessers; each X represents one trial.]
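The RandInt simulation described above can be sketched in Python. This is a minimal sketch, assuming Python's standard `random` module as a stand-in for the calculator's RandInt; the function name `simulate_correct_guesses` and the seed value are hypothetical choices for illustration. Each trial draws 21 integers from 1 to 3 and counts the 1s as correct guesses.

```python
import random

def simulate_correct_guesses(trials, group_size=21, seed=None):
    """Return the number of correct guesses in each simulated group.

    Each guess is an integer from 1 to 3; a 1 counts as a correct guess,
    mirroring the RandInt setup (1 = correct, 2 and 3 = incorrect).
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(trials):
        correct = sum(1 for _ in range(group_size) if rng.randint(1, 3) == 1)
        counts.append(correct)
    return counts

# 100 trials, as in the dotplot; the seed only makes the run repeatable.
counts = simulate_correct_guesses(100, seed=1)
extreme = sum(1 for c in counts if c >= 13)
print(f"{extreme} of {len(counts)} simulated groups had 13 or more correct guesses")
```

With only 100 trials, the count of groups with 13 or more correct guesses will often be 0 or 1. Raising `trials` gives a steadier estimate; with many trials, 13 or more correct guesses shows up in well under 1% of groups when everyone is guessing.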


Random Variable and Probability Distribution

A probability model describes the possible outcomes of a chance process and the likelihood that those outcomes will occur.

A numerical variable that describes the outcomes of a chance process is called a random variable. The probability model for a random variable is its probability distribution.

Definition:

A random variable takes numerical values that describe the outcomes of some chance process. The probability distribution of a random variable gives its possible values and their probabilities.

Example: Consider tossing a fair coin 3 times.

Define X = the number of heads obtained

X = 0: TTT

X = 1: HTT THT TTH

X = 2: HHT HTH THH

X = 3: HHH

Value 0 1 2 3

Probability 1/8 3/8 3/8 1/8
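The table above can be rebuilt by listing every outcome and counting heads. This is a small sketch, assuming a fair coin so that all 8 outcomes are equally likely; it uses only the Python standard library and exact fractions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Every equally likely outcome of tossing a fair coin 3 times, e.g. ('H', 'T', 'T').
outcomes = list(product("HT", repeat=3))

# X = number of heads; tally how many outcomes give each value of X.
heads_counts = Counter(outcome.count("H") for outcome in outcomes)

# Each outcome has probability 1/8, so P(X = x) = (outcomes with x heads) / 8.
distribution = {x: Fraction(n, len(outcomes)) for x, n in sorted(heads_counts.items())}

for value, prob in distribution.items():
    print(value, prob)
# 0 1/8
# 1 3/8
# 2 3/8
# 3 1/8
```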

What’s the probability that you will get at least one “heads” in three tosses?

How would you interpret that probability?

Discrete Random Variables

There are two main types of random variables: discrete and continuous. If we can find a way to list all possible outcomes for a random variable and assign probabilities to each one, we have a discrete random variable.

Shoe size (discrete) vs. foot length (continuous)


Discrete Random Variables and Their Probability Distributions

A discrete random variable X takes a fixed set of possible values with gaps between. The probability distribution of a discrete random variable X lists the values $x_i$ and their probabilities $p_i$:

Value: $x_1$ $x_2$ $x_3$ …

Probability: $p_1$ $p_2$ $p_3$ …

The probabilities p i must satisfy two requirements:

1. Every probability p i is a number between 0 and 1.

2. The sum of the probabilities is 1.

To find the probability of any event, add the probabilities $p_i$ of the particular values $x_i$ that make up the event.

Example: Babies’ Health at Birth (Apgar Scores)

In 1952, Dr. Virginia Apgar suggested five criteria for measuring a baby’s health at birth: skin color, heart rate, muscle tone, breathing, and response to stimuli.

She developed a 0 – 1 – 2 scale to rate a newborn on each of the five criteria.

A baby’s Apgar score is the sum of the ratings on the five scales, which gives a whole-number value from 0 to 10.

What Apgar scores are typical? To find out, researchers recorded the Apgar scores of over 2 million newborn babies in a single year. Imagine selecting one of these newborns at random. (That’s our chance process.)

Define X as the Apgar score of a randomly-selected baby one minute after birth.

The table below gives the probability distribution for X.


(a) Show that the probability distribution for X is legitimate.

(b) Make a histogram of the probability distribution. Describe what you see.

(c) Apgar scores of 7 or higher indicate a healthy baby. What is P(X ≥ 7)?

Value: 0 1 2 3 4 5 6 7 8 9 10

Probability: 0.001 0.006 0.007 0.008 0.012 0.020 0.038 0.099 0.319 0.437 0.053

(a) All probabilities are between 0 and 1, and they add up to 1. This is a legitimate probability distribution.

(b) The left-skewed shape of the distribution suggests that a randomly selected newborn is likely to have an Apgar score at the high end of the scale. There is a small chance of getting a baby with a score of 5 or lower.

(c) P(X ≥ 7) = 0.099 + 0.319 + 0.437 + 0.053 = 0.908. We’d have about a 91% chance of randomly choosing a healthy baby.

Notice the difference between > and ≥. With discrete variables, these are different, but not with continuous variables.
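Parts (a) and (c) can be checked with a few lines of Python. This is a sketch under the assumption that the distribution is stored as a value-to-probability dictionary; `is_legitimate` and `prob_event` are hypothetical helper names, not calculator commands. The last line shows that P(X > 7) is a different event from P(X ≥ 7) for a discrete variable.

```python
# Apgar score distribution from the table above (value -> probability).
apgar = {0: 0.001, 1: 0.006, 2: 0.007, 3: 0.008, 4: 0.012, 5: 0.020,
         6: 0.038, 7: 0.099, 8: 0.319, 9: 0.437, 10: 0.053}

def is_legitimate(dist, tol=1e-9):
    """Check the two requirements: every p_i is in [0, 1] and the p_i sum to 1.

    The small tolerance absorbs floating-point rounding in the sum.
    """
    return all(0 <= p <= 1 for p in dist.values()) and abs(sum(dist.values()) - 1) < tol

def prob_event(dist, event):
    """Add the probabilities of the values x_i that make up the event."""
    return sum(p for x, p in dist.items() if event(x))

print(is_legitimate(apgar))                            # True
print(round(prob_event(apgar, lambda x: x >= 7), 3))   # P(X >= 7) -> 0.908
print(round(prob_event(apgar, lambda x: x > 7), 3))    # P(X > 7)  -> 0.809
```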

Example: NHL Goals

(a) Show that the probability distribution for X is legitimate.

(b) Make a histogram of the probability distribution. Describe what you see.

(c) What is the probability that a randomly selected team scores at least 6 goals in a randomly selected game?

Goals: 0 1 2 3 4 5 6 7 8 9

Probability: 0.061 0.154 0.228 0.229 0.173 0.094 0.041 0.015 0.004 0.001

(a) All probabilities are between 0 and 1, and they add up to 1. This is a legitimate probability distribution.

(b) The histogram is right-skewed, which means that the majority of games are relatively low scoring. It is pretty unusual for a team to score 6 or more goals.

(c) P(X ≥ 6) = 0.041 + 0.015 + 0.004 + 0.001 = 0.061. We’d have a 6.1% chance that the number of goals scored is at least 6.

 CHECK YOUR UNDERSTANDING

North Carolina State University posts the grade distribution for its courses online. Students in Statistics 101 in a recent semester received 26% As, 42% Bs, 20% Cs, 10% Ds, and 2% Fs. Choose a Statistics student at random. The student’s grade on a four-point scale is a discrete random variable X with this probability distribution:

Value of X: 0 1 2 3 4

Probability: 0.02 0.10 0.20 0.42 0.26


1) Say in words what the meaning of P(X ≥ 3) is. What is this probability?

P(X ≥ 3) is the probability that the student gets either an A or a B: P(X ≥ 3) = 0.42 + 0.26 = 0.68.

2) Write the event “the student got a grade worse than C” in terms of values of the random variable X. What is the probability of this event?

P(X < 2) = P(X = 0) + P(X = 1) = 0.02 + 0.10 = 0.12

3) Sketch a graph of the probability distribution. Describe what you see.

The histogram is left-skewed. Higher grades are more likely, but there are a few lower grades.

Mean of a Discrete Random Variable

When analyzing discrete random variables, we’ll follow the same strategy we used with quantitative data – describe the shape, center, and spread, and identify any outliers.

The mean of any discrete random variable is an average of the possible outcomes, with each outcome weighted by its probability.

Definition:

Suppose that X is a discrete random variable whose probability distribution is

Value: $x_1$ $x_2$ $x_3$ …

Probability: $p_1$ $p_2$ $p_3$ …

To find the mean (expected value) of X, multiply each possible value by its probability, then add all the products:

$\mu_X = E(X) = x_1p_1 + x_2p_2 + x_3p_3 + \cdots = \sum x_i p_i$

Example: Apgar Scores – What’s Typical?

Consider the random variable X = Apgar Score

Compute the mean of the random variable X and interpret it in context.

Value: 0 1 2 3 4 5 6 7 8 9 10

Probability: 0.001 0.006 0.007 0.008 0.012 0.020 0.038 0.099 0.319 0.437 0.053

$\mu_X = E(X) = \sum x_i p_i = (0)(0.001) + (1)(0.006) + (2)(0.007) + \cdots + (10)(0.053) = 8.128$

The mean Apgar score of a randomly selected newborn is 8.128. This is the long-term average Apgar score of many, many randomly chosen babies.



Note: The expected value does not need to be a possible value of X or an integer!

It is a long-term average over many repetitions.
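The "long-term average" reading can be illustrated by comparing $\sum x_i p_i$ with the average of many simulated Apgar scores. This is a sketch, assuming Python's `random.Random.choices` to draw values with probabilities $p_i$; `expected_value` and `long_run_average` are hypothetical helper names.

```python
import random

apgar = {0: 0.001, 1: 0.006, 2: 0.007, 3: 0.008, 4: 0.012, 5: 0.020,
         6: 0.038, 7: 0.099, 8: 0.319, 9: 0.437, 10: 0.053}

def expected_value(dist):
    """mu_X = sum of x_i * p_i over all possible values."""
    return sum(x * p for x, p in dist.items())

def long_run_average(dist, n, seed=None):
    """Average of n simulated values of X, each drawn with probability p_i."""
    rng = random.Random(seed)
    draws = rng.choices(list(dist), weights=list(dist.values()), k=n)
    return sum(draws) / n

print(round(expected_value(apgar), 3))            # 8.128
print(long_run_average(apgar, 100_000, seed=2))   # should land close to 8.128
```

Every simulated score is a whole number from 0 to 10, yet their average settles near 8.128, which is the point of the note above.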



Example: Winning (and Losing) at Roulette

Consider the random variable X = net gain from a single $1 bet on red.

Compute the mean of the random variable X and interpret it in context.

Value: -1 1

Probability: 20/38 18/38

$\mu_X = E(X) = \sum x_i p_i = (-\$1)(20/38) + (\$1)(18/38) \approx -\$0.05$

In the long run, the player loses (and the casino gains) five cents per bet.



The ordinary average of -1 and 1 is 0, but $0 isn’t the average winnings, because the player is less likely to win $1 than to lose $1.



Example: More Roulette

Another wager players can make in roulette is called a “corner bet.” To make this bet, a player places his chips on the intersection of four numbered squares on the roulette table. If one of these numbers comes up on the wheel and the player bet $1, the player gets his $1 back plus $8 more. Otherwise, the casino keeps the original $1 bet.

Consider the random variable X = net gain from a single $1 corner bet.

Compute the mean of the random variable X and interpret it in context.

Value: -1 8

Probability: 34/38 4/38

$\mu_X = E(X) = \sum x_i p_i = (-\$1)(34/38) + (\$8)(4/38) \approx -\$0.05$

If a player were to make $1 corner bets many, many times, the average gain would be about -$0.05 per bet.
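Working with exact fractions shows that the two roulette bets have exactly the same expected value, $-2/38 = -1/19$ dollar, which rounds to -$0.05. This is a sketch using Python's `fractions.Fraction`; the dictionary layout and the helper name `expected_value` are illustrative assumptions.

```python
from fractions import Fraction

def expected_value(dist):
    """mu_X = sum of x_i * p_i, kept exact by using Fractions."""
    return sum(x * p for x, p in dist.items())

# Net gain in dollars from a single $1 bet (probabilities from the tables above).
red_bet = {-1: Fraction(20, 38), 1: Fraction(18, 38)}
corner_bet = {-1: Fraction(34, 38), 8: Fraction(4, 38)}

for name, dist in [("red", red_bet), ("corner", corner_bet)]:
    ev = expected_value(dist)
    print(f"{name}: E(X) = {ev} dollars, about {float(ev):.4f}")
# red: E(X) = -1/19 dollars, about -0.0526
# corner: E(X) = -1/19 dollars, about -0.0526
```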



Example: NHL Goals

Consider the random variable X = number of goals scored by a randomly selected team in a randomly selected game.

Goals: 0 1 2 3 4 5 6 7 8 9

Probability: 0.061 0.154 0.228 0.229 0.173 0.094 0.041 0.015 0.004 0.001

$\mu_X = E(X) = \sum x_i p_i = (0)(0.061) + (1)(0.154) + \cdots + (9)(0.001) = 2.851$

If we repeated this chance process many, many times, the mean number of goals scored by a randomly selected team in a randomly selected game would be about 2.851 in the long run.

 AP ERROR ALERT!

Many students incorrectly believe that the expected value of a random variable must be equal to one of the possible values of the variable.

This is not the case. In the roulette examples, the expected value was -$0.05, even though this was not a possible gain from a single $1 bet.

In the Apgar example, the average score, 8.128, is not a possible value of the random variable. If you think of the mean as a long-run average over many repetitions, these facts make sense.



Standard Deviation of a Discrete Random Variable

Since we use the mean as the measure of center for a discrete random variable, we’ll use the standard deviation as our measure of spread. The definition of the variance of a random variable is similar to the definition of the variance for a set of quantitative data.

Definition:

Suppose that X is a discrete random variable whose probability distribution is

Value: $x_1$ $x_2$ $x_3$ …

Probability: $p_1$ $p_2$ $p_3$ …

and that $\mu_X$ is the mean of X. The variance of X is

$\mathrm{Var}(X) = \sigma_X^2 = (x_1 - \mu_X)^2 p_1 + (x_2 - \mu_X)^2 p_2 + (x_3 - \mu_X)^2 p_3 + \cdots = \sum (x_i - \mu_X)^2 p_i$

This formula is given to you on the formula sheet of the AP exam.

To get the standard deviation of a random variable, take the square root of the variance: $\sigma_X = \sqrt{\mathrm{Var}(X)}$.

Example: Apgar Scores – How Variable Are They?

Consider the random variable X = Apgar Score

Compute the standard deviation of the random variable X and interpret it in context. Recall that the mean Apgar score was 8.128.

Value: 0 1 2 3 4 5 6 7 8 9 10

Probability: 0.001 0.006 0.007 0.008 0.012 0.020 0.038 0.099 0.319 0.437 0.053

$\sigma_X^2 = \sum (x_i - \mu_X)^2 p_i = (0 - 8.128)^2(0.001) + (1 - 8.128)^2(0.006) + \cdots + (10 - 8.128)^2(0.053) = 2.066$

The variance is $\sigma_X^2 = 2.066$, so $\sigma_X = \sqrt{2.066} = 1.437$.

The standard deviation of X is 1.437. On average, a randomly selected baby’s Apgar score will differ from the mean 8.128 by about 1.4 units.
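The variance definition translates directly into code. This sketch assumes the same value-to-probability dictionary used for the Apgar table; `mean`, `variance`, and `std_dev` are hypothetical helper names that follow the formula $\sigma_X^2 = \sum (x_i - \mu_X)^2 p_i$ term by term.

```python
import math

apgar = {0: 0.001, 1: 0.006, 2: 0.007, 3: 0.008, 4: 0.012, 5: 0.020,
         6: 0.038, 7: 0.099, 8: 0.319, 9: 0.437, 10: 0.053}

def mean(dist):
    """mu_X = sum of x_i * p_i."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """sigma_X^2 = sum of (x_i - mu_X)^2 * p_i."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def std_dev(dist):
    """sigma_X is the square root of the variance."""
    return math.sqrt(variance(dist))

print(round(variance(apgar), 3))  # 2.066
print(round(std_dev(apgar), 3))   # 1.437
```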

Technology: Analyzing Random Variables on the Calculator

1) Enter the values of the random variable in L1 and the corresponding probabilities in L2. (Practice using the Apgar scores.)

2) To graph a histogram of the probability distribution:
   a) Set up a statistics plot with Xlist: L1 and Freq: L2.
   b) Adjust your window settings to Xmin = -1, Xmax = 11, Xscl = 1, Ymin = -0.1, Ymax = 0.5, Yscl = 0.1.
   c) Press GRAPH (◆ F3 on TI-89).
   When you are done with the histogram, remember to reset Freq back to 1.

3) To calculate the mean and standard deviation of the random variable, use one-variable statistics with the values in L1 and the probabilities in L2:
   a) TI-83/84: Execute the command 1-Var Stats L1, L2.
   b) TI-89: In the Statistics/List Editor, press F4 (Calc) and choose 1: 1-Var Stats. Use the inputs List: list1 and Freq: list2.


Example: NHL Goals

Consider the random variable X = goals scored

Compute the standard deviation of the random variable X and interpret it in context. Recall that the mean goals scored was 2.851.

Goals 0 1 2 3 4 5 6 7 8 9

Probability: 0.061 0.154 0.228 0.229 0.173 0.094 0.041 0.015 0.004 0.001

$\sigma_X^2 = \sum (x_i - \mu_X)^2 p_i = (0 - 2.851)^2(0.061) + (1 - 2.851)^2(0.154) + \cdots + (9 - 2.851)^2(0.001) = 2.66$

$\sigma_X = \sqrt{2.66} = 1.63$

The standard deviation of X is 1.63. On average, a randomly selected team’s number of goals in a randomly selected game will differ from the mean by about 1.63 goals.

 CHECK YOUR UNDERSTANDING

A large auto dealership keeps track of sales made during each hour of the day. Let X = the number of cars sold during the first hour of business on a randomly selected Friday. Based on previous records, the probability distribution of X is as follows:

Cars sold: 0 1 2 3

Probability: 0.3 0.4 0.2 0.1


1) Compute and interpret the mean of X.

$\mu_X = (0)(0.3) + (1)(0.4) + (2)(0.2) + (3)(0.1) = 1.1$. The long-run average, over many Friday mornings, will be about 1.1 cars sold.

2) Compute and interpret the standard deviation of X.

$\sigma_X = \sqrt{(0 - 1.1)^2(0.3) + (1 - 1.1)^2(0.4) + (2 - 1.1)^2(0.2) + (3 - 1.1)^2(0.1)} = \sqrt{0.89} = 0.943$. On average, the number of cars sold on a randomly selected Friday will differ from the mean (1.1) by about 0.943 cars.

 AP ERROR ALERT!

In many cases where you are asked to calculate the mean (expected value) or standard deviation of a random variable, you will lose credit for not showing adequate work, or for not showing work at all. To get full credit, you need to go beyond just reporting the commands you used on your graphing calculator. Show the first couple of terms with an ellipsis (. . .) for the calculation of the mean or standard deviation.
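The "first couple of terms, then an ellipsis" pattern can be generated automatically for checking your written work. This is a hypothetical helper (`show_mean_work` is not a calculator or library function), sketched under the assumption that values and probabilities are given as two parallel lists.

```python
def show_mean_work(values, probs, shown=2):
    """Build a 'show your work' line for E(X): the first few terms, an
    ellipsis, the last term, and the result rounded to 3 decimals."""
    terms = [f"({x})({p})" for x, p in zip(values, probs)]
    mean = sum(x * p for x, p in zip(values, probs))
    if len(terms) <= shown + 1:
        written = " + ".join(terms)  # too few terms to need an ellipsis
    else:
        written = " + ".join(terms[:shown]) + " + ... + " + terms[-1]
    return f"{written} = {mean:.3f}"

values = list(range(11))
probs = [0.001, 0.006, 0.007, 0.008, 0.012, 0.020,
         0.038, 0.099, 0.319, 0.437, 0.053]
print(show_mean_work(values, probs))
# (0)(0.001) + (1)(0.006) + ... + (10)(0.053) = 8.128
```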

Continuous Random Variables

Discrete random variables commonly arise from situations that involve counting something. Situations that involve measuring something often result in a continuous random variable.

Definition:

A continuous random variable X takes on all values in an interval of numbers. The probability distribution of X is described by a density curve. The probability of any event is the area under the density curve and above the values of X that make up the event.

The probability model of a discrete random variable X assigns a probability between 0 and 1 to each possible value of X.

A continuous random variable Y has infinitely many possible values.

All continuous probability models assign probability 0 to every individual outcome. Only intervals of values have positive probability.

Continuous Random Variables

Calculator Commands:

The calculator command rand will generate a random number from 0 to 1.

The command 2rand will generate a random number from 0 to 2.

To generate a random number from 1 to 2, use the command rand+1.

You can combine these commands to produce other intervals.
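The same stretching and shifting works with any uniform random number generator. This sketch assumes Python's `random.Random.random()` as a stand-in for the calculator's rand (it returns a number from 0 up to, but not including, 1); `rand` and `uniform_between` are hypothetical names. The last lines check the density-curve idea: the share of draws landing in an interval should be close to that interval's area under the height-1 curve.

```python
import random

rng = random.Random(3)

def rand():
    """Stand-in for the calculator's rand: a number from 0 to 1."""
    return rng.random()

def uniform_between(a, b):
    """Stretch and shift rand to the interval from a to b: a + (b - a) * rand."""
    return a + (b - a) * rand()

two_rand = 2 * rand()        # like 2rand: from 0 to 2
rand_plus_one = rand() + 1   # like rand+1: from 1 to 2

# Area under the uniform density (height 1) between 0.3 and 0.7 is 0.4,
# so about 40% of many draws should land in that interval.
draws = [rand() for _ in range(100_000)]
share = sum(0.3 <= d <= 0.7 for d in draws) / len(draws)
print(round(share, 2))  # about 0.4
```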

Continuous Random Variables

Random Numbers:

The random number generator will spread its output uniformly across the entire interval from 0 to 1 as we allow it to generate a long sequence of random numbers. The results of many trials are represented by the density curve of a uniform distribution. This density curve appears at right. It has height 1 over the interval from 0 to 1.

The area under the density curve is 1, and the probability of any event is the area under the density curve and above the event in question.

Remember that each individual probability is 0.

Example: Young Women’s Heights

The heights of young women closely follow the Normal distribution with mean µ = 64 inches and standard deviation σ = 2.7 inches. This is a distribution for a large set of data. Choose one young woman at random. Call her height Y. If we repeat the random choice many, many times, the distribution of values of Y is the same Normal distribution that describes the heights of all young women.

Find the probability that the chosen woman is between 68 and 70 inches tall.


Define Y as the height of a randomly chosen young woman. Y is a continuous random variable whose probability distribution is N (64, 2.7).

What is the probability that a randomly chosen young woman has height between 68 and 70 inches?

P(68 ≤ Y ≤ 70) = ???

$z = \frac{68 - 64}{2.7} = 1.48 \qquad z = \frac{70 - 64}{2.7} = 2.22$

$P(1.48 \le Z \le 2.22) = P(Z \le 2.22) - P(Z \le 1.48) = 0.9868 - 0.9306 = 0.0562$

There is about a 5.6% chance that a randomly chosen young woman has a height between 68 and 70 inches.

Example: Young Women’s Heights – Using a Calculator

2nd VARS (gives you DISTR)

Use normalcdf(68, 70, 64, 2.7)
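Outside the calculator, the same area can be found with Python's `statistics.NormalDist`. This is a sketch; the helper name `normalcdf` only imitates the calculator command's argument order (lower, upper, mean, standard deviation) and is not a standard library function. Because it uses unrounded z-scores, the result may differ slightly in the fourth decimal place from the table-based answer of 0.0562.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean, sd):
    """Area under the N(mean, sd) density curve between lower and upper."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)

# P(68 <= Y <= 70) for Y ~ N(64, 2.7)
p = normalcdf(68, 70, 64, 2.7)
print(round(p, 4))
```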

Example: Weights of 3-yr.-old Females

Define Y as the weight of a randomly-chosen 3-yr.-old female. Y is a continuous random variable whose probability distribution is N (30.7, 3.6).

What is the probability that a randomly chosen 3-yr.-old female weighs at least 30 pounds?

P(Y ≥ 30) = ???

$z = \frac{30 - 30.7}{3.6} = -0.19$

$P(Z \ge -0.19) = 1 - P(Z < -0.19) = 1 - 0.4247 = 0.5753$

There is about a 58% chance that the randomly selected 3-yr.-old female will weigh at least 30 pounds.
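The complement step can be checked with `statistics.NormalDist` as well. This is a sketch assuming Y ~ N(30.7, 3.6) as stated above; with an unrounded z-score the answer comes out near 0.577 rather than exactly 0.5753, and both round to about 58%.

```python
from statistics import NormalDist

weights = NormalDist(mu=30.7, sigma=3.6)  # Y ~ N(30.7, 3.6)

# Complement rule: P(Y >= 30) = 1 - P(Y < 30).
# For a continuous variable P(Y = 30) = 0, so < and <= give the same area.
p_at_least_30 = 1 - weights.cdf(30)
print(round(p_at_least_30, 4))
```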

 AP ERROR ALERT!

When you solve problems involving random variables, start by defining the random variable of interest. For example, let X = the Apgar score of a randomly selected baby or let Y = the height of a randomly selected young woman. Then state the probability you’re trying to find in terms of the random variable, such as P(68 ≤ Y ≤ 70) or P(X ≥ 7).


Section 6.1

Discrete and Continuous Random Variables

Summary

In this section, we learned that…

 A random variable is a variable taking numerical values determined by the outcome of a chance process. The probability distribution of a random variable X tells us what the possible values of X are and how probabilities are assigned to those values.

 A discrete random variable has a fixed set of possible values with gaps between them. The probability distribution assigns each of these values a probability between 0 and 1 such that the sum of all the probabilities is exactly 1.

 A continuous random variable takes all values in some interval of numbers. A density curve describes the probability distribution of a continuous random variable.




 The mean of a random variable is the long-run average value of the variable after many repetitions of the chance process. It is also known as the expected value of the random variable.

The expected value of a discrete random variable X is

$\mu_X = \sum x_i p_i = x_1p_1 + x_2p_2 + x_3p_3 + \cdots$

The variance of a random variable is the average squared deviation of the values of the variable from their mean. The standard deviation is the square root of the variance. For a discrete random variable X,

$\sigma_X^2 = \sum (x_i - \mu_X)^2 p_i = (x_1 - \mu_X)^2 p_1 + (x_2 - \mu_X)^2 p_2 + (x_3 - \mu_X)^2 p_3 + \cdots$


Looking Ahead…

In the next Section…

We’ll learn how to determine the mean and standard deviation when we transform or combine random variables.

We’ll learn about

Linear Transformations

Combining Random Variables

Combining Normal Random Variables
