Monday, February 6: Chapter 16: Random Variables

advertisement
Chapter 16: Random Variables
A ___________________________ is a numerical variable whose value depends on the outcome of a
chance experiment.
A random variable is ____________ if its set of possible values is a collection of isolated points on the
number line.
A random variable is _______________ if its set of possible values includes an entire interval on the
number line.
Example: Shoe size vs. foot length
Example: age vs. number of birthdays
A ________________________________, or ___________________________, of a discrete random
variable X lists the possible values of X and their probabilities. This is usually done in a table,
probability histogram, or formula.
Let X = number of heads in two flips of a coin. Then, the probability distribution of X is:
table:
histogram:
The probabilities in a distribution give information about the _______________ behavior of the
variable. That is, probabilities are __________________________________. For example, if P(X = 2)
= .25, then after many observations of X, the value X = 2 will occur about 25% of the time, on average.
The ______ of a random variable X, denoted  x , describes where the probability distribution of X is
centered.
Let X = number of license plates on a randomly selected car. The probability distribution is:
X
0
1 2
P(X) .05 .25 .7
Thus, to find the mean of a discrete random variable X, multiply each X value times its probability and
add them all together.
 x   x  p ( x)
The term ______________________, E(x) is sometimes used in place of mean value,  x .
The expected (mean) value will not necessarily be equal to one of the values in the distribution. Instead,
we interpret the expected value as a long term average. For example, if we were to select many cars, we
would expect there to be 1.65 license plates per car, on average.
Suppose I ask you to play a betting game with me. You bet $5 and if you draw an ace of hearts, I will
pay you $100. If you get any other ace, I will pay you $10 and if you get any other heart, I will give you
your money back. If you get any other card, I keep your money.
1. Create a probability model for the amount you will win.
2. Should you play this game?
HW #1: Read 368-370 Problems 16.1-16.7 (Note: answer to 5a is wrong)
Chapter 16: SD of a Random Variable
Now that we can find out where the distribution of a random variable is centered, we would also like to
know how far the values are from the center, on average. This, of course, is the ___________________
Using the license plate distribution from yesterday, if we had a sample of 100 cars, we would expect 5
0’s, 25 1’s, and 70 2’s.
Note: We divide by n instead of n-1 and use the symbol  x instead of s x since we are not calculating
the standard deviation of a sample. There is no uncertainty about the values in the distribution or their
probabilities.
Note: It is calculated in much the same way as the SD of a data set, except that we weight each value
with its probability, since the values that show up more often should have more influence in the
calculation.
Thus, the standard deviation of a discrete random variable is:
x 
 x    P  x
2
x
The ______________ of a discrete random variable is the square of the standard deviation:
2
 x2    x   x  P  x 
Find and interpret the SD for the betting game from yesterday’s notes.
In a certain class, there are 15 girls and 10 boys. Suppose that you are randomly selecting 2 students for
a committee. Create a probability model for the number of girls in the sample. Then, calculate and
interpret the mean and SD of the number of girls.
HW #2: Read 370-372, Problems 16.15-21 odd
Chapter 16: Transforming + Combining RV’s
Now that we know how to find the mean and standard deviation of a random variable, we would also
like to know the mean and standard deviation of a function of that random variable.
If x = the length of a taxi ride in miles, suppose that  x = 5.2 miles and  x = 2.8 miles. Two taxi
companies have different fare-schedules: company 1 charges $2.50 per mile and company 2 charges $2
per mile plus an initial fee of $5. Thus, if we are interested in the total fare of a taxi ride (y), the
functions are:
How can we find the mean and standard deviation for y in each case?
Recall from chapter 4 that adding a constant to every value in a data set changes the measures of
position (mean, median, min, max, Q1, Q3, etc.) BUT NOT the measures of spread (standard deviation,
range, IQR).
However, multiplying every value in a data set by a constant changes both the measures of position
AND the measures of spread.
In addition to paying the fare, the passenger usually gives the driver a tip (t). Assume that  t = 1.50,  t
= .70, and that the tip does not depend on the fare (that is, t and y are independent).
What are the mean and standard deviation of the new variable c, the total cost of a trip, including tip?
The mean and variance of a combination of random variables:
If x and y are random variables with means  x and  y and standard deviations  x and  y , then:
 x  y   x   y and  x  y   x   y
Note: the mean formulas work regardless of whether x and y are independent.
 x2 y   x2   y2 and  x2 y   x2   y2 thus  x  y   x2   y2 and  x  y   x2   y2
Note: the variance (SD) formulas work ONLY when x and y are independent.
Note: Always remember that the variances add, not the standard deviations
Note: Even when we are subtracting the random variables, we add their variances. More variables
always means more variability!
Note: All 4 of these rules work for combining more than 2 random variables as well.
Find c and  c for company 2.
HW #3: Read 373-375, Problems 16.4, 18, 23, 24, 27, 31
Chapter 16: Combining Normal Random Variables
Any linear combination of normally distributed random variables is also normally distributed.
 If x and y are normally distributed then (x + y), (x – y), (2x + 3y), etc. are also normally
distributed.
If men’s weights (in pounds) are N(170, 40), women’s weights are N(135, 30), and a man and woman
are selected at random, what is the probability that the sum of their weights is at least 300 pounds?
What assumptions must you make to do this problem?
What is the probability that the woman is heavier than the man?
If you were to select 3 men at random, what is the probability that the sum of their weights is at most
500 pounds?
HW #4: Read 376-380 Problems 16. 33-39 odd
Chapter 16: Calculator Hints
Suppose the probability model for grades on the AP Stats test (x) at LSHS is given below:
x
1
2
3
4
5
P(x) .03 .04 .24 .43 .26
Find the mean and SD of x:
Find the mean and SD for the sum of 3 randomly selected scores:
Suppose that the weight of a candy bar has a mean of 4 ounces with a standard deviation of 0.15 ounces.
Also, the King size version of the candy bar has a weight of 7.5 ounces with a SD of 0.4 ounces.
Assuming both distributions are approximately normal, what is the probability that a randomly selected
King size bar weighs more than 2 randomly selected regular bars?
HW #5: Problems: 16.2, 10, 22, 28, 34, 36
Review chapter 16
HW #6: Problems: 16. 15, 25, 29, 32, 38
Chapter 17: The Geometric Model
There are few other commonly used probability models besides the Normal model. Two of these, the
Binomial and Geometric, are based on __________________________. Bernoulli trials have 3
important characteristics:
1. There are only two possible outcomes on each trial (success or failure)
2. The probability of success, p, is the same on every trial.
3. The trials are independent. That is, knowing the results of previous trials doesn’t change the
probability of future trials.
 Note: whenever we sample without replacement, this condition is violated. However, as
long as the sample is not more than 10% of the population, the changes in the
probabilities are too small to worry about.
Some examples of Bernoulli trials include:
The _________________________ tells us the probability that it takes a specific number of Bernoulli
trials to achieve a success. The geometric random variable is always:
x = the number of trials required to get one success
The model is completely determined by one parameter (p) and is denoted: G(p).
For example, suppose I am rolling a die until I get a “5”. Is this Geometric?
What is the probability that I get a “5” on my first roll?
What is the probability that I get my first “5” on the 2nd roll
What is the probability that I get my first “5” on the 3rd roll
What is the probability that I get my first “5” on the 4th roll?
What is the probability that I get my first “5” on the kth roll?
For geometric models in general,
 P(X = x) = q x1 p where q = 1 – p = probability of failure
1
 Expected Value:  
p

Standard Deviation:  
q
p2
Suppose that 30% of LSHS students have a driver’s license. On average, how many students will you
need to randomly select to find one with a DL?
What is the probability it will take less students than expected?
HW #7 Read: 386-390, Problems: 17.1, 7, 9, 11
Chapter 17: The Binomial Model
Instead of asking how many trials it will take to achieve one success, we can also ask how many
successes we expect to see in a fixed number of Bernoulli trials.
The __________________________ tells us the probability that we will achieve a specific number of
successes in a given number of trials. The Binomial RV is always:
x = number of successes
The model is determined by two parameters: the number of trials (n) and the probability of success (p)
and is denoted: B(n, p).
Suppose that 70% of people in the mall will make a purchase and 3 mall visitors are randomly selected.
Use a tree diagram to identify the 8 possible outcomes and their probabilities.
What is the probability that exactly 2 of the 3 shoppers will make a purchase?
Write out the complete probability model:
X
P(X)
0
1
2
3
This problem wasn’t too difficult, but what if the question asked about the probability that exactly 200
of 300 shoppers made a purchase? There would be way too many outcomes to display in a tree diagram.
Is there an easier way? Notice the patterns below:
P  X  0   1.7  .3
0
P  X  1  3 .7  .3
1
3
2
P  X  2   3 .7  .3
2
P  X  3  1.7  .3
3
1
0
If x ~ B(n, p) and x = # of successes:
n
x
n x
P  X  x      p   q  where q = 1 - p
 x
n
n!
Note:   = “n choose x” =
 n Cx (TI83:Math:Prb) = nth row in Pascal’s triangle
x ! n  x !
 x
Suppose you rolled a die 4 times and let x = number of sixes. Find the probability distribution of x and
the probability that you get 2 or more sixes.
HW #8: Read: 390-393, Problems: 17.2, 8, 10, 13, 19
Chapter 17: More on the Binomial Model
Suppose that you are a telemarketer and historically only 15% of people you call will remain on the
phone for at least 1 minute. What is the probability that 1 of the next 5 people you call will remain on
the phone for more than 1 minute?
Using the TI-83 to do binomial calculations:
Shortcuts on TI83: Distribution Menu (DISTR = 2nd VARS)
Assume x ~ B(5, .15)
Dist: binompdf(n, p, k) gives the probability that there will be k successes in n trials.
ex: P(x = 1) = binompdf(5, .15, 1) =
Note: pdf stands for probability density function
Dist: binompdf(n, p) gives the probability of each possible value of k (0, 1, …, n)
ex: binompdf(5, .15) =
Dist: binomcdf(n, p, k) gives the probability that the number of successes is  k
ex: P(x ≤ 1) = binomcdf(5, .15, 1) =
Note: cdf stands for cumulative probability density function (cumulative means added up)
Dist: binomcdf(n, p) gives the cumulative probabilities for each possible value of k (  0,  1, …,  n)
ex: binomcdf(5, .15) =
If I roll a die 30 times, what is the probability that I get at least 5 sixes?
What is the probability I get at least 2 sixes and at most 7 sixes?
The mean and standard deviation of a binomial random variable:
If x is a binomially distributed random variable with n trials and probability of success p, then
  np and   npq where q = 1- p
Suppose that 75% of all customers at a gas station choose 87 octane gas. If you observe the next 50
customers,
a. How many customers do you expect to choose 87 octane gas?
b. What is the standard deviation of the number of customers that choose 87 octane gas?
c. What is the probability that the number of customers who choose 87 octane gas is within 2
standard deviations of the mean?
HW #9: Problems 17.14-17
Chapter 17: The Normal Model to the Rescue

Suppose we were to roll a die one million times. What is the probability that we would get more
than 167000 sixes?
In many cases, it is useful to approximate a discrete distribution with a continuous distribution. In particular, we
often use the normal distribution to approximate the binomial distribution. Historically, this is because we didn’t
have calculators with binomcdf. Even now, there are some problems even too formidable for our TI’s!
Fortunately, the normal distribution approximates the binomial quite well in certain circumstances.
It seems like this approximation works best when p is close to 0.5 and n is large.


Success/Failure Condition : If x ~ B(n, p) and np  10 and nq  10, then x ~ N np, npq . That is, it is OK to
use the normal approximation whenever the expected number of successes and failures are both at least 10.
Why np > 10? If we want to use the normal approximation when p is close to 0, we should be able to go at least 3
SD below the mean and still be in the domain of x (that is, we shouldn’t be in negative territory).
  3  0
np  3 npq  0
np  3 npq
n 2 p 2  9  npq 
np  9q
np  9 (since q is close to 1)
Therefore, to be safe, np  10.
Use the normal approximation to answer the original question.
Even though our calculator can handle any reasonable sample size, we need to use the normal approximation to
the binomial when we do the inference procedures for proportions in a few chapters.
Suppose that 51% of US residents are women. In a random sample of size 1000, what is the probability that
fewer than 500 women are selected?
In a sample of size 10, what is the probability that fewer than 5 women are selected?
HW #10: Read 394-397, Problems 17.20, 21, 23, 24
Chapter 17 Review
HW #11: Problems 17.22, 25, 27, 29
Download