Chapter 5 - PRCACalculus

Chapter 7
Please pick up an
assignment sheet and
notes packet
Random Variable
• A numerical variable whose value depends on the outcome of a chance experiment
OR
• Associates a numerical value with each outcome of a chance experiment
• Two types of random variables
– Discrete
– Continuous
A grocery store manager might be interested in the number of broken eggs in each carton (dozen of eggs).
An environmental scientist might be interested in the amount of ozone in an air sample.
Since these values change and are subject to some uncertainty, these are examples of random variables.
Two Types of Random Variables:
• Discrete – its set of possible values is a collection of isolated points along a number line
This is typically a “count” of something
• Continuous - its set of possible values includes an entire interval on a number line
This is typically a “measure” of something
In this chapter, we will look at different distributions of discrete and continuous random variables.
Identify the following variables
as discrete or continuous
1. The number of broken eggs in each
carton Discrete
2. The amount of ozone in samples of air
Continuous
3. The weight of a pineapple
Continuous
4. The amount of time a customer spends in
a store Continuous
5. The number of gas pumps in use
Discrete
Probability Distributions
for Discrete Random
Variables
A probability distribution is a model that describes the long-run behavior of a variable.
In Wolf City (a fictional place), regulations prohibit more than five dogs or cats per household.
Let x = the number of dogs and cats in a randomly selected household in Wolf City
Is this variable discrete or continuous?
What are the possible values for x?
The Department of Animal Control has collected data over the course of several years. They have estimated the long-run probabilities for the values of x.
x      0    1    2    3    4    5
P(x)  .26  .31  .21  .13  .06  .03
What do you notice about the sum of these probabilities?
This is called a discrete probability distribution. It can also be displayed in a histogram with the probability on the vertical axis.
[Histogram: Number of Pets on the horizontal axis, Probability on the vertical axis]
Discrete Probability Distribution
1) Gives the probabilities associated with
each possible x value
2) Each probability is the long-run relative
frequency of occurrence of the
corresponding x-value when the chance
experiment is performed a very large
number of times
3) Usually displayed in a table, but can be
displayed with a histogram or formula
Properties of Discrete Probability Distributions
1) For every possible x value, 0 ≤ P(x) ≤ 1.
2) For all values of x, Σ P(x) = 1.
Dogs and Cats Revisited . . .
Let x = the number of dogs or cats per household in Wolf City
x      0    1    2    3    4    5
P(x)  .26  .31  .21  .13  .06  .03
What is the probability that a randomly selected household in Wolf City has at most 2 pets?
What does this mean?
Just add the probabilities for 0, 1, and 2.
P(x ≤ 2) = .26 + .31 + .21 = .78
• Finish the dog and cat probability problems on the second page of your notes
Dogs and Cats Revisited . . .
Let x = the number of dogs or cats per household in Wolf City
x      0    1    2    3    4    5
P(x)  .26  .31  .21  .13  .06  .03
What is the probability that a randomly selected household in Wolf City has less than 2 pets?
What does this mean?
Notice that this probability does NOT include 2!
P(x < 2) = .26 + .31 = .57
Dogs and Cats Revisited . . .
Let x = the number of dogs or cats per household in Wolf City
x      0    1    2    3    4    5
P(x)  .26  .31  .21  .13  .06  .03
What is the probability that a randomly selected household in Wolf City has more than 1 but no more than 4 pets?
What does this mean?
P(1 < x ≤ 4) = .21 + .13 + .06 = .40
When calculating probabilities for discrete random variables, you MUST pay close attention to whether certain values are included (≤ or ≥) or not included (< or >) in the calculation.
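The three Wolf City questions above differ only in which x values are included. A minimal Python sketch of that idea follows; the dictionary `pets` and the helper `prob` are names chosen here for illustration, not part of the notes.

```python
# Wolf City pet distribution, copied from the table in the notes.
pets = {0: 0.26, 1: 0.31, 2: 0.21, 3: 0.13, 4: 0.06, 5: 0.03}

def prob(dist, condition):
    """Add P(x) over every x value for which condition(x) is true."""
    return sum(p for x, p in dist.items() if condition(x))

at_most_2 = prob(pets, lambda x: x <= 2)             # 2 is included
less_than_2 = prob(pets, lambda x: x < 2)            # 2 is NOT included
more_than_1_up_to_4 = prob(pets, lambda x: 1 < x <= 4)

print(round(at_most_2, 2), round(less_than_2, 2), round(more_than_1_up_to_4, 2))
```

Changing a single `<=` to `<` changes the answer, which is the point the slide makes about discrete random variables.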
Suppose that each of four randomly selected customers purchasing a hot tub at a certain
store chooses either an electric (E) or a gas (G) model. Assume that these customers
make their choices independently of one another and that 40% of all customers select an
electric model. This implies that for any particular one of the four customers P(E) = 0.40
and P(G) = 0.60. One possible experimental outcome is EGGE, where the first and fourth
customers select electric models and the other two choose gas models. Because the
customers make their choices independently, the multiplication rule for independent
events implies that
P(EGGE) = P(1st chooses E AND 2nd chooses G AND 3rd chooses G AND 4th chooses E)
= P(E)P(G)P(G)P(E)
= (0.4)(0.6)(0.6)(0.4)
= 0.0576
Outcomes and Probabilities for Hot Tub Models
Outcome   Probability   # of electric models sold
GGGG      0.1296        0
EGGG      0.0864        1
GEGG      0.0864        1
GGEG      0.0864        1
GGGE      0.0864        1
EEGG      0.0576        2
EGEG      0.0576        2
EGGE      0.0576        2
GEEG      0.0576        2
GEGE      0.0576        2
GGEE      0.0576        2
GEEE      0.0384        3
EGEE      0.0384        3
EEGE      0.0384        3
EEEG      0.0384        3
EEEE      0.0256        4
P(x = 0) = 0.1296
P(x = 1) = 0.3456
P(x = 2) = 0.3456
P(x = 3) = 0.1536
P(x = 4) = 0.0256
P(2 ≤ x ≤ 4) = 0.3456 + 0.1536 + 0.0256 = 0.5248
P(x ≤ 3) = 0.1296 + 0.3456 + 0.3456 + 0.1536 = 0.9744
[Histogram: Probability of Selling X Electric Hot Tubs per Four Customers; horizontal axis: Number of Electric Hot Tubs Purchased Per Four Customers (0 to 4); vertical axis: Probability (0 to 0.4)]
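The hot tub distribution can be rebuilt by listing all 16 outcomes and grouping them by the number of electric models. The sketch below does this in Python under the same independence assumption stated in the notes; the variable names are illustrative only.

```python
from collections import defaultdict
from itertools import product

# P(E) and P(G) for a single customer, as given in the notes.
P = {"E": 0.4, "G": 0.6}

# dist[k] accumulates P(x = k), where x = number of electric models sold.
dist = defaultdict(float)
for outcome in product("EG", repeat=4):      # ('E','E','E','E'), ..., ('G','G','G','G')
    prob = 1.0
    for choice in outcome:                   # multiplication rule for independent events
        prob *= P[choice]
    dist[outcome.count("E")] += prob

p_2_to_4 = dist[2] + dist[3] + dist[4]
p_at_most_3 = dist[0] + dist[1] + dist[2] + dist[3]

for k in sorted(dist):
    print(k, round(dist[k], 4))
```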
Probability Distributions
for Continuous Random
Variables
Consider the random variable:
x = the weight (in pounds) of a full-term newborn child
What type of variable is this?
Suppose that weight is reported to the nearest pound. The following probability histogram displays the distribution of weights.
What is the sum of the areas of all the rectangles?
The area of the rectangle centered over 7 pounds represents the probability 6.5 < x < 7.5.
Now suppose that weight is reported to the nearest 0.1 pound. This would be the probability histogram.
Notice that the rectangles are narrower and the histogram begins to have a smoother appearance.
The shaded area represents the probability 6 < x < 8.
If weight is measured with greater and greater accuracy, the histogram approaches a smooth curve.
This is an example of a density curve.
Probability Distributions for
Continuous Variables
• Is specified by a curve called a density
curve.
• The function that describes this curve
is denoted by f(x) and is called the
density function.
• The probability of observing a value in a
particular interval is the area under the
curve and above the given interval.
Properties of continuous probability distributions
1. f(x) ≥ 0 (the curve cannot dip below the horizontal axis)
2. The total area under the density curve equals one.
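Let x denote the amount of gravel sold (in tons) during a randomly selected week at a particular sales facility. Suppose that the density curve has a height f(x) above the value x, where
$f(x) = \begin{cases} 2(1 - x) & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$
The density curve is shown in the figure:
[Figure: density curve falling in a straight line from height 2 at 0 tons to height 0 at 1 ton; horizontal axis: Tons; vertical axis: Density]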
Gravel problem continued . . .
What is the probability that at most ½ ton of gravel is sold during a randomly selected week?
The probability would be the shaded area under the curve and above the interval from 0 to 0.5.
This area can be found by use of the formula for the area of a trapezoid:
$A = \frac{1}{2}(b_1 + b_2)h$
OR, more easily, by finding the area of the triangle,
$A = \frac{1}{2}bh$
and subtracting that area from 1.
P(x ≤ ½) = 1 – ½(0.5)(1) = .75
Gravel problem continued . . .
What is the probability that exactly ½ ton of gravel is sold during a randomly selected week?
The probability would be the area under the curve and above 0.5.
How do we find the area of a line segment?
Since a line segment has NO area, then the probability that exactly ½ ton is sold equals 0.
P(x = ½) = 0
Gravel problem continued . . .
What is the probability that less than ½ ton of gravel is sold during a randomly selected week?
P(x < ½) = P(x ≤ ½) = 1 – ½(0.5)(1) = .75
Does the probability change whether the ½ is included or not?
This is different than discrete probability distributions where it does change the probability whether a value is included or not!
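The gravel answers can be checked numerically by adding up thin rectangles under the density curve. The sketch below is one way to do that in Python; the helper `area` (a midpoint-rule sum) and the step count are assumptions made for illustration, and the geometry formulas in the notes remain the intended method.

```python
def f(x):
    """Gravel density from the notes: 2(1 - x) on 0 < x < 1, and 0 otherwise."""
    return 2 * (1 - x) if 0 < x < 1 else 0.0

def area(density, a, b, steps=10_000):
    """Approximate the area under density between a and b with thin rectangles."""
    width = (b - a) / steps
    return sum(density(a + (i + 0.5) * width) for i in range(steps)) * width

total_area = area(f, 0, 1)          # a density curve must enclose a total area of 1
p_at_most_half = area(f, 0, 0.5)    # P(x <= 1/2)
p_exactly_half = area(f, 0.5, 0.5)  # an interval of zero width has zero area

print(round(total_area, 4), round(p_at_most_half, 4), p_exactly_half)
```

Because a single point contributes no area, P(x < ½) and P(x ≤ ½) come out the same here.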
Suppose x is a continuous random variable defined as the amount of time (in minutes) taken by a clerk to process a certain type of application form. Suppose x has a probability distribution with density function:
$f(x) = \begin{cases} 0.5 & 4 < x < 6 \\ 0 & \text{otherwise} \end{cases}$
The following is the graph of f(x), the density curve:
[Figure: horizontal density curve at height 0.5 from 4 to 6; horizontal axis: Time (in minutes); vertical axis: Density]
Application Problem Continued . . .
What is the probability that it takes more than 5.5 minutes to process the application form?
Find the probability by calculating the area of the shaded region (base × height).
P(x > 5.5) = .5(.5) = .25
When the density is constant over an interval (resulting in a horizontal density curve), the probability distribution is called a uniform distribution.
[Figure: density curve of height 0.5 from 4 to 6 with the shaded region; horizontal axis: Time (in minutes); vertical axis: Density]
Other Density Curves
Some density curves resemble the one below. Integral calculus is used to find the area under these curves.
Don’t worry – we will use tables (with the values already calculated). We can also use calculators or statistical software to find the area.
The probability that a continuous random variable x lies between a lower limit a and an upper limit b is
P(a < x < b) = (cumulative area to the left of b) – (cumulative area to the left of a)
P(a < x < b) = P(x < b) – P(x < a)
This will be useful later in this chapter!
Means and Standard Deviations of Probability Distributions
• The mean value of a random variable x, denoted by μx, describes where the probability distribution of x is centered.
• The standard deviation of a random variable x, denoted by σx, describes variability in the probability distribution
Mean and Variance for Discrete Probability Distributions
• Mean is sometimes referred to as the expected value (denoted E(x)).
$\mu_x = \sum x \cdot p(x)$
• Variance is calculated using
$\sigma_x^2 = \sum (x - \mu_x)^2 \cdot p(x)$
• Standard deviation is the square root of the variance.
Dogs and Cats Revisited . . .
Let x = the number of dogs and cats in a randomly selected household in Wolf City
x       0    1    2    3    4    5
P(x)   .26  .31  .21  .13  .06  .03
xP(x)   0   .31  .42  .39  .24  .15
What is the mean number of pets per household in Wolf City?
First multiply each x-value times its corresponding probability.
Next find the sum of these values.
μx = 0 + .31 + .42 + .39 + .24 + .15 = 1.51 pets
Dogs and Cats Revisited . . .
Let x = the number of dogs or cats per household in Wolf City
x      0    1    2    3    4    5
P(x)  .26  .31  .21  .13  .06  .03
What is the standard deviation of the number of pets per household in Wolf City?
First find the deviation of each x-value from the mean. Then square these deviations.
Next multiply by the corresponding probability. Then add these values.
σx² = (0-1.51)²(.26) + (1-1.51)²(.31) + (2-1.51)²(.21) + (3-1.51)²(.13) + (4-1.51)²(.06) + (5-1.51)²(.03)
= 1.7499
This is the variance – take the square root of this value.
σx = 1.323 pets
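The mean, variance, and standard deviation steps for the Wolf City pets can be carried out in a few lines of Python. This is a sketch of the formulas above with illustrative variable names.

```python
from math import sqrt

# Wolf City pet distribution, copied from the table in the notes.
pets = {0: 0.26, 1: 0.31, 2: 0.21, 3: 0.13, 4: 0.06, 5: 0.03}

# Mean: multiply each x by its probability, then add.
mean = sum(x * p for x, p in pets.items())

# Variance: square each deviation from the mean, weight by probability, then add.
variance = sum((x - mean) ** 2 * p for x, p in pets.items())

# Standard deviation: square root of the variance.
sd = sqrt(variance)

print(round(mean, 2), round(variance, 4), round(sd, 3))
```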
Mean and Variance for Continuous Random Variables
For continuous probability distributions, μx and σx can be defined and computed using methods from calculus.
• The mean value μx locates the center of the continuous distribution.
• The standard deviation, σx, measures the extent to which the continuous distribution spreads out around μx.
A company receives concrete of a certain type from two different suppliers.
Let x = compression strength of a randomly selected batch from Supplier 1
y = compression strength of a randomly selected batch from Supplier 2
Suppose that
μx = 4650 pounds/inch², σx = 200 pounds/inch²
μy = 4500 pounds/inch², σy = 275 pounds/inch²
[Figure: the two density curves on an axis marked 4300, 4500, 4700, 4900, with μy and μx labeled]
The first supplier is preferred to the second both in terms of mean value and variability.
Suppose Wolf City Grocery had a total of 14 employees. The following are the monthly salaries of all the employees.
3500 1300 1200 1500 1900 1700 1400
2300 2100 1200 1800 1400 1200 1300
The mean and standard deviation of the monthly salaries are
μx = $1700 and σx = $603.56
Suppose business is really good, so the manager gives everyone a $100 raise per month. The new mean and standard deviation would be
μ = $1800 and σ = $603.56
What happened to the means? What happened to the standard deviations?
Let’s graph boxplots of these monthly salaries to see what happens to the distributions ...
We see that the distribution just shifts to the right 100 units but the spread is the same.
What would happen to the mean and standard deviation if we had to deduct $100 from everyone’s salary because of business being bad?
Wolf City Grocery Continued . . .
μx = $1700 and σx = $603.56
Suppose the manager gives everyone a 20% raise - the new mean and standard deviation would be
μ = $2040 and σ = $724.27
Notice that both the mean and standard deviation increased by a factor of 1.2.
Let’s graph boxplots of these monthly salaries to see what happens to the distributions . . .
Notice that multiplying by a constant stretches the distribution, thus, changing the standard deviation.
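The effect of adding a constant versus multiplying by a constant can be checked directly on the 14 salaries. The sketch below uses Python's `statistics` module; `pstdev` (the population standard deviation) is used because the 14 employees are the whole population, and that choice matches the $603.56 in the notes.

```python
from statistics import mean, pstdev

salaries = [3500, 1300, 1200, 1500, 1900, 1700, 1400,
            2300, 2100, 1200, 1800, 1400, 1200, 1300]

raised = [s + 100 for s in salaries]    # everyone gets a $100 raise
scaled = [s * 1.2 for s in salaries]    # everyone gets a 20% raise

for label, data in [("original", salaries), ("+100", raised), ("x1.2", scaled)]:
    print(label, round(mean(data), 2), round(pstdev(data), 2))
```

Adding 100 moves the mean but leaves the spread alone; multiplying by 1.2 scales both.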
Mean and Standard Deviation of Linear functions
If x is a random variable with mean, μx, and standard deviation, σx, and a and b are numerical constants, and the random variable y is defined by
$y = a + bx$
then
$\mu_y = \mu_{a+bx} = a + b\mu_x$
and
$\sigma_y^2 = \sigma_{a+bx}^2 = b^2\sigma_x^2$ or $\sigma_y = |b|\sigma_x$
Consider the chance experiment in which a customer of a propane gas company is randomly selected. Let x be the number of gallons required to fill a propane tank. Suppose that the mean and standard deviation are 318 gallons and 42 gallons, respectively. The company is considering the pricing model of a service charge of $50 plus $1.80 per gallon. Let y be the random variable of the amount billed.
What is the equation for y?
y = 50 + 1.8x
What are the mean and standard deviation for the amount billed?
μy = 50 + 1.8(318) = $622.40
σy = 1.8(42) = $75.60
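The propane calculation is a direct use of the linear-function rules. A small Python sketch follows; the function name `linear_transform` is made up here for illustration.

```python
def linear_transform(mean_x, sd_x, a, b):
    """Return (mean, sd) of y = a + b*x given the mean and sd of x."""
    mean_y = a + b * mean_x      # the constant a shifts the mean
    sd_y = abs(b) * sd_x         # a does not affect spread; only |b| does
    return mean_y, sd_y

# Propane billing: $50 service charge plus $1.80 per gallon.
mean_y, sd_y = linear_transform(mean_x=318, sd_x=42, a=50, b=1.8)
print(round(mean_y, 2), round(sd_y, 2))
```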
Suppose we are going to play a game called Stat Land! Players spin the two spinners below and move the sum of the two numbers.
[Figure: Spinner A with sections 1, 2, 3, 4; Spinner B with sections 1, 2, 3, 4, 5, 6]
Find the mean and standard deviation for these sums.
Not sure – let’s think about it and return in just a few minutes!
Here are the mean and standard deviation for each spinner.
Spinner A: μA = 2.5, σA = 1.118
Spinner B: μB = 3.5, σB = 1.708
List all the possible sums (A + B).
2  3  4  5  6  7
3  4  5  6  7  8
4  5  6  7  8  9
5  6  7  8  9  10
μA+B = 6
σA+B = 2.041
Notice that the mean of the sums is the sum of the means!
How are the standard deviations related?
Stat Land Continued . . .
Suppose one variation of the game had players move the difference of the spinners.
[Figure: Spinner A with sections 1, 2, 3, 4; Spinner B with sections 1, 2, 3, 4, 5, 6]
Find the mean and standard deviation for these differences.
Spinner A: μA = 2.5, σA = 1.118
Spinner B: μB = 3.5, σB = 1.708
List all the possible differences (B - A).
 0   1   2   3   4   5
-1   0   1   2   3   4
-2  -1   0   1   2   3
-3  -2  -1   0   1   2
μB-A = 1
σB-A = 2.041
Notice that the mean of the differences is the difference of the means!
WOW – this is the same value as the standard deviation of the sums!
How do we find the standard deviation for the sums or differences?
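The Stat Land results can be reproduced by listing every spinner pair. The sketch below assumes, as the slide's tables suggest, that every section of each spinner is equally likely, so all 24 (A, B) pairs are equally likely; variable names are illustrative.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

spinner_a = [1, 2, 3, 4]
spinner_b = [1, 2, 3, 4, 5, 6]

# All 24 equally likely (A, B) pairs.
sums = [a + b for a, b in product(spinner_a, spinner_b)]
diffs = [b - a for a, b in product(spinner_a, spinner_b)]

# Variances add for independent spinners, for sums AND for differences.
combined_sd = sqrt(pstdev(spinner_a) ** 2 + pstdev(spinner_b) ** 2)

print(mean(sums), round(pstdev(sums), 3))
print(mean(diffs), round(pstdev(diffs), 3))
print(round(combined_sd, 3))
```

The standard deviations themselves do not add (1.118 + 1.708 is not 2.041); the variances do.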
Mean and Standard Deviations for Linear Combinations
If x1, x2, …, xn are random variables with means μ1, μ2, …, μn and variances σ1², σ2², …, σn², respectively, and
y = a1x1 + a2x2 + … + anxn
then
1. $\mu_y = a_1\mu_{x_1} + a_2\mu_{x_2} + \dots + a_n\mu_{x_n}$
This result is true regardless of whether the x’s are independent.
2. $\sigma_y^2 = a_1^2\sigma_{x_1}^2 + a_2^2\sigma_{x_2}^2 + \dots + a_n^2\sigma_{x_n}^2$
This result is true ONLY if the x’s are independent.
A commuter airline flies small planes between
San Luis Obispo and San Francisco. For
small planes the baggage weight is a concern.
Suppose it is known that the variable x =
weight (in pounds) of baggage checked by a
randomly selected passenger has a mean and
standard deviation of 42 and 16, respectively.
Consider a flight on which 10 passengers, all
traveling alone, are flying.
The total weight of checked baggage, y, is
y = x1 + x2 + … + x10
Airline Problem Continued . . .
μx = 42 and σx = 16
The total weight of checked baggage, y, is
y = x1 + x2 + … + x10
What is the mean total weight of the checked baggage?
μy = μ1 + μ2 + … + μ10
= 42 + 42 + … + 42
= 420 pounds
Airline Problem Continued . . .
μx = 42 and σx = 16
The total weight of checked baggage, y, is
y = x1 + x2 + … + x10
What is the standard deviation of the total weight of the checked baggage?
Since the 10 passengers are all traveling alone, it is reasonable to think that the 10 baggage weights are unrelated and therefore independent.
σy² = σx1² + σx2² + … + σx10²
= 16² + 16² + … + 16²
= 2560 pounds²
To find the standard deviation, take the square root of this value.
σy = 50.596 pounds
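The airline totals follow from adding means and, because the bags are taken to be independent, adding variances. A short Python sketch with illustrative names:

```python
from math import sqrt

n_passengers = 10
mean_bag = 42     # pounds, mean baggage weight for one passenger
sd_bag = 16       # pounds, standard deviation for one passenger

mean_total = n_passengers * mean_bag     # means always add
var_total = n_passengers * sd_bag ** 2   # variances add only under independence
sd_total = sqrt(var_total)

print(mean_total, var_total, round(sd_total, 3))
```

Note that the standard deviation of the total is not 10 × 16 = 160; it is the square root of the summed variances.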
The Attila Barbell Company makes bars for weight lifting. The
weights of the bars are independent and are normally
distributed with a mean of 720 ounces (45 pounds) and a
standard deviation of 4 ounces. The bars are shipped 10 in a
box to the retailers. The weights of the empty boxes are
normally distributed with a mean of 320 ounces and a standard
deviation of 8 ounces. The weights of the boxes filled with 10
bars are expected to be normally distributed with a mean of
7520 ounces and a standard deviation of:
$\sigma_{x+y} = \sqrt{\sigma_x^2 + \sigma_y^2}$
$\sigma_{\text{bars and box}} = \sqrt{10(4)^2 + 8^2} = \sqrt{224} \approx 14.97$ ounces
Number of Courses   Probability
1                   0.02
2                   0.03
3                   0.09
4                   0.25
5                   0.40
6                   0.16
7                   0.05
For each row, find (Number of Courses)*(Probability), Mean, Deviation, Deviation^2, and (Deviation^2)*(Probability). Then find the Sum, Mean, Variance, and Standard Deviation.
Number of   Probability   (Number of              Mean   Deviation   Deviation^2   (Deviation^2)*(Probability)
Courses                   Courses)*(Probability)
1           0.02          0.02                    4.66   -3.66       13.3956       0.267912
2           0.03          0.06                    4.66   -2.66       7.0756        0.212268
3           0.09          0.27                    4.66   -1.66       2.7556        0.248004
4           0.25          1                       4.66   -0.66       0.4356        0.1089
5           0.4           2                       4.66   0.34        0.1156        0.04624
6           0.16          0.96                    4.66   1.34        1.7956        0.287296
7           0.05          0.35                    4.66   2.34        5.4756        0.27378
Sum         1             4.66                                                     1.4444
Mean                 4.66
Variance             1.4444
Standard Deviation   1.20

P(# of courses > μx) = P(# of courses > 4.66) = p(5) + p(6) + p(7) = 0.61

P(μx - 2σx < # of courses < μx + 2σx) = P(4.66 – 2.40 < # of courses < 4.66 + 2.40)
= P(2.26 < # of courses < 7.06) = p(3) + p(4) + p(5) + p(6) + p(7) = 0.95

P(# of courses < μx - 2σx OR # of courses > μx + 2σx)
= P(# of courses < 2.26 OR # of courses > 7.06) = p(1) + p(2) = 0.05
Special Distributions
Two Discrete Distributions:
Binomial and Geometric
One Continuous Distribution:
Normal Distributions
Suppose we decide to record the gender of the
next 25 newborns at a particular hospital.
These questions can be
answered using a
binomial distribution.
Properties of a Binomial Experiment
1. There are a fixed number of trials
We use n to denote the fixed number of trials.
2. Each trial results in one of two mutually exclusive outcomes. (success/failure)
3. Outcomes of different trials are independent
4. The probability that a trial results in success is the same for all trials
The binomial random variable x is defined as
x = the number of successes observed when a binomial experiment is performed
Are these binomial distributions?
1) Toss a coin 10 times and count
the number of heads
Yes
2) Deal 10 cards from a shuffled deck
and count the number of red cards
No, probability does not remain
constant
3) The number of tickets sold to
children under 12 at a movie
theater in a one hour period
No, no fixed number
Binomial Probability Formula:
Let
n = number of independent trials in a binomial experiment
p = constant probability that any trial results in a success
$P(x) = \frac{n!}{x!(n-x)!} p^x (1-p)^{n-x}$
Where:
$\binom{n}{x} = {}_nC_x = \frac{n!}{x!(n-x)!}$
Appendix Table 9 can be used to find binomial probabilities.
Technology, such as calculators and statistical software, will also perform this calculation.
Instead of recording the gender of the next 25 newborns at a particular hospital, let’s record the gender of the next 5 newborns at this hospital.
Is this a binomial experiment?
Yes, if the births were not multiple births (twins, etc).
Define the random variable of interest.
x = the number of females born out of the next 5 births
What is the probability of “success”?
What are the possible values of x?
x: 0 1 2 3 4 5
Will a binomial random variable always include the value of 0?
What will the largest value of the binomial random variable be?
Newborns Continued . . .
What is the probability that exactly 2 girls will be born out of the next 5 births?
$P(x = 2) = {}_5C_2 (0.5)^2 (0.5)^3 = .3125$
What is the probability that less than 2 girls will be born out of the next 5 births?
$P(x < 2) = p(0) + p(1) = {}_5C_0 (.5)^0 (.5)^5 + {}_5C_1 (.5)^1 (.5)^4 = .1875$
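The two newborn probabilities can be checked with a direct translation of the binomial formula. This is a minimal Python sketch; the function name `binomial_pmf` is chosen here, while `math.comb` is the standard-library routine for nCx.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(x successes in n independent trials, each with success probability p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 5, 0.5   # next 5 births, P(girl) = 0.5 as in the notes

p_exactly_2 = binomial_pmf(2, n, p)
p_less_than_2 = binomial_pmf(0, n, p) + binomial_pmf(1, n, p)   # x = 2 is NOT included

print(p_exactly_2, p_less_than_2)
```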
Newborns Continued . . .
Let’s construct the discrete probability distribution table for this binomial random variable:
x      0        1        2       3       4        5
p(x)  .03125   .15625   .3125   .3125   .15625   .03125
What is the mean number of girls born in the next five births?
Since this is a discrete distribution, we could use:
$\mu_x = \sum x \cdot p(x)$
μx = 0(.03125) + 1(.15625) + 2(.3125) + 3(.3125) + 4(.15625) + 5(.03125) = 2.5
Notice that this is the same as multiplying n × p
Formulas for mean and standard deviation of a binomial distribution
$\mu_x = np$
$\sigma_x = \sqrt{np(1-p)}$
Newborns Continued . . .
How many girls would you expect in the next five births at a particular hospital?
$\mu_x = np = 5(.5) = 2.5$
What is the standard deviation of the number of girls born in the next five births?
$\sigma_x = \sqrt{np(1-p)} = \sqrt{5(.5)(.5)} = 1.118$
Remember, in binomial distributions, trials should be independent.
However, when we sample, we typically sample without replacement, which would mean that the trials are not independent. . .
In this case, the number of successes observed would not be a binomial distribution but rather a hypergeometric distribution.
The calculation of probabilities in a hypergeometric distribution is even more tedious than the binomial formula!
But when the sample size, n, is small and the population size, N, is large, probabilities calculated using binomial distributions and hypergeometric distributions are VERY close!
When sampling without replacement, if n is at most 5% of N, then the binomial distribution gives a good approximation to the probability distribution of x.
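To see how close the two distributions are when n is small relative to N, the sketch below compares them on a made-up example: a population of N = 1000 with 400 successes and a sample of n = 20 (2% of N). These numbers and the function names are hypothetical and are not taken from the notes.

```python
from math import comb

def hypergeom_pmf(x, N, K, n):
    """P(x successes) when drawing n items WITHOUT replacement from N items, K of them successes."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

def binomial_pmf(x, n, p):
    """P(x successes) in n independent trials with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Hypothetical population and sample; n is 2% of N.
N, K, n = 1000, 400, 20

exact = hypergeom_pmf(8, N, K, n)
approx = binomial_pmf(8, n, K / N)
gap = abs(exact - approx)

print(round(exact, 4), round(approx, 4), round(gap, 4))
```

Raising n toward a large fraction of N in this sketch makes the gap grow, which is the reason for the 5% guideline.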
Suppose a particular breed of dog gives birth to a male dog 59% of the time and gives
birth to a female dog 41% of the time.
Let M = event that a male pup is born
Let F = event that a female pup is born
Let x = the number of male pups born in a litter of four pups
Fill in the following table:
Outcome   Probability   Number of Male Pups (x)
FFFF      0.0283        0
MFFF      0.0406        1
FMFF      0.0406        1
FFMF      0.0406        1
FFFM      0.0406        1
MMFF      0.0585        2
MFMF      0.0585        2
MFFM      0.0585        2
FMMF      0.0585        2
FMFM      0.0585        2
FFMM      0.0585        2
MMMF      0.0842        3
MMFM      0.0842        3
MFMM      0.0842        3
FMMM      0.0842        3
MMMM      0.1212        4
[Histogram: Probability of Getting a Number of Male Pups in a Litter of Four Pups; horizontal axis: Number of Male Pups in a Litter of Four (0 to 4); vertical axis: Probability (0 to 0.4)]
Newborns Revisited . . .
Suppose we were not interested in the number of females born out of the next five births, but which birth would result in the first female being born?
How is this question different from a binomial distribution?
Properties of Geometric Distributions:
• There are two mutually exclusive outcomes that result in a success or failure
• Each trial is independent of the others
• The probability of success is the same for all trials.
A geometric random variable x is defined as
x = the number of trials UNTIL the FIRST success is observed (including the success).
So what are the possible values of x?
x: 1, 2, 3, 4, ...
How far will this go? To infinity
Probability Formula for the Geometric Distribution
Let
p = constant probability that any trial results in a success
$p(x) = (1 - p)^{x-1} p$
Where x = 1, 2, 3, …
Suppose that 40% of students who drive to campus at your school or university carry jumper cables. Your car has a dead battery and you don’t have jumper cables, so you decide to stop students as they are headed to the parking lot and ask them whether they have a pair of jumper cables.
Let x = the number of students stopped in order to find one with a pair of jumper cables (counting the student who has them)
Is this a geometric distribution?
Yes
Jumper Cables Continued . . .
Let x = the number of students stopped in order to find one with a pair of jumper cables (counting the student who has them)
p = .4
What is the probability that the third student stopped will be the first student to have jumper cables?
P(x = 3) = (.6)²(.4) = .144
What is the probability that at most three students are stopped in order to find one with jumper cables?
P(x ≤ 3) = P(1) + P(2) + P(3)
= (.6)⁰(.4) + (.6)¹(.4) + (.6)²(.4) = .784
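The jumper cable answers follow from the geometric formula, with x counting the trial on which the first success occurs. A minimal Python sketch, with an illustrative function name:

```python
def geometric_pmf(x, p):
    """P(first success occurs on trial x): x - 1 failures followed by one success."""
    return (1 - p) ** (x - 1) * p

p = 0.4   # proportion of students carrying jumper cables

p_third_is_first = geometric_pmf(3, p)
p_at_most_three = sum(geometric_pmf(k, p) for k in (1, 2, 3))

# Equivalent shortcut: 1 minus the chance that the first three students all lack cables.
p_shortcut = 1 - (1 - p) ** 3

print(round(p_third_is_first, 3), round(p_at_most_three, 3), round(p_shortcut, 3))
```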
Welcome back! Please pick up:
• Notes Packet
• Assignment Sheet
• t-score table
Normal Distributions
• Continuous probability distribution
• Symmetrical bell-shaped (unimodal) density curve defined by μ and σ
• Area under the curve equals 1
• Probability of observing a value in a particular interval is calculated by finding the area under the curve
How is this done mathematically?
To overcome the need for calculus, we rely on technology or on a table of areas for the standard normal distribution
• As σ increases, the curve flattens & spreads out
• As σ decreases, the curve gets taller and thinner
[Figure: two normal curves, A and B, both centered at 6]
Do these two normal curves have the same mean? If so, what is it?
YES (both are centered at 6)
Which normal curve has a standard deviation of 3?
B
Which normal curve has a standard deviation of 1?
A
[Figure: a normal curve with σ marked on each side of the mean]
Notice that the normal curve is curving downwards from the center (mean) to points that are one standard deviation on either side of the mean. At those points, the normal curve begins to turn upward.
Standard Normal Distribution
• Is a normal distribution with μ = 0 and σ = 1
• It is customary to use the letter z to represent a variable whose distribution is described by the standard normal curve (or z curve).
Using the Table of Standard Normal (z) Curve Areas
• For any number z*, from -3.89 to 3.89 and rounded to two decimal places, Appendix Table 2 gives the area under the z curve and to the left of z*.
P(z < z*) = P(z ≤ z*)
Where the letter z is used to represent a random variable whose distribution is the standard normal distribution.
To use the table:
• Find the correct row and column (see the following example)
• The number at the intersection of that row and column is the probability
Suppose we are interested in the probability that z is less than -1.62.
In the table of areas:
• Find the row labeled -1.6
• Find the column labeled .02
• Find the intersection of the row and column
z*      .00     .01     .02
…
-1.7   .0446   .0436   .0427
-1.6   .0548   .0537   .0526
-1.5   .0668   .0655   .0643
…
P(z < -1.62) = .0526
Suppose we are interested in the probability that z is less than 2.31.
z*     .00     .01     .02
…
2.2   .9861   .9864   .9868
2.3   .9893   .9896   .9898
2.4   .9918   .9920   .9922
…
P(z < 2.31) = .9896
Suppose we are interested in the probability that z is greater than 2.31.
z*     .00     .01     .02
…
2.2   .9861   .9864   .9868
2.3   .9893   .9896   .9898
2.4   .9918   .9920   .9922
…
The Table of Areas gives the area to the LEFT of z*.
To find the area to the right, subtract the value in the table from 1.
P(z > 2.31) = 1 - .9896 = .0104
Suppose we are interested in finding the z* for the smallest 2%.
To find z*:
P(z < z*) = .02
Look for the area .0200 in the body of the Table. Follow the row and column back out to read the z-value.
z*      .03     .04     .05
…
-2.1   .0162   .0158   .0154
-2.0   .0207   .0202   .0197
-1.9   .0262   .0256   .0250
…
Since .0200 doesn’t appear in the body of the Table, use the value closest to it (.0202).
z* = -2.05
Suppose we are interested in finding the z* for the largest 5%.
P(z > z*) = .05
Remember the Table of Areas gives the area to the LEFT of z*.
1 – (area to the right of z*) = 1 – .05 = .95
Then look up this value in the body of the table.
Since .9500 is exactly between .9495 (z = 1.64) and .9505 (z = 1.65), we can average the z* for each of these.
z* = 1.645
[Table excerpt: rows 1.5 to 1.7 of the table of z curve areas]
Finding Probabilities for Other Normal Curves
• To find the probabilities for other normal curves, standardize the relevant values and then use the table of z areas.
• If x is a random variable whose behavior is described by a normal distribution with mean μ and standard deviation σ, then
P(x < b) = P(z < b*)
P(x > a) = P(z > a*)
P(a < x < b) = P(a* < z < b*)
Where z is a variable whose distribution is standard normal and
$a^* = \frac{a - \mu}{\sigma}$    $b^* = \frac{b - \mu}{\sigma}$
Data on the length of time to complete registration for classes using an on-line registration system suggest that the distribution of the variable
x = time to register
for students at a particular university can well be approximated by a normal distribution with mean μ = 12 minutes and standard deviation σ = 2 minutes.
Registration Problem Continued . . .
x = time to register
μ = 12 minutes and σ = 2 minutes
What is the probability that it will take a randomly selected student less than 9 minutes to complete registration?
Standardize this value.
$b^* = \frac{9 - 12}{2} = -1.5$
Look this value up in the table.
P(x < 9) = .0668
Registration Problem Continued . . .
x = time to register
μ = 12 minutes and σ = 2 minutes
What is the probability that it will take a randomly selected student more than 13 minutes to complete registration?
Standardize this value.
$a^* = \frac{13 - 12}{2} = .5$
Look this value up in the table and subtract from 1.
P(x > 13) = 1 - .6915 = .3085
Registration Problem Continued . . .
x = time to register
μ = 12 minutes and σ = 2 minutes
What is the probability that it will take a
randomly selected student between 7 and 15
minutes to complete registration?
Standardize these values.
a* = (7 − 12) / 2 = −2.5
b* = (15 − 12) / 2 = 1.5
Look these values up in the table and subtract:
(value for b*) – (value for a*)
P(7 < x < 15) = .9332 − .0062 = .9270
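The three registration probabilities can be reproduced without the table. This is a minimal sketch, assuming Python's standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# x = time to register, approximately normal with mean 12 and sd 2 (minutes)
registration = NormalDist(mu=12, sigma=2)

# P(x < 9): area to the left of 9
p_less_9 = registration.cdf(9)

# P(x > 13): 1 minus the area to the left of 13
p_more_13 = 1 - registration.cdf(13)

# P(7 < x < 15): area to the left of 15 minus area to the left of 7
p_7_to_15 = registration.cdf(15) - registration.cdf(7)

print(round(p_less_9, 4), round(p_more_13, 4), round(p_7_to_15, 4))
```

Rounded to four decimal places, the results agree with the table-based answers .0668, .3085 and .9270.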
Registration Problem Continued . . .
x = time to register
μ = 12 minutes and σ = 2 minutes
Because some students do not log off properly, the
university would like to log off students automatically
after some time has elapsed. It is decided to select this
time so that only 1% of students will be automatically
logged off while still trying to register.
What time should the automatic log off be set at?
P(x > a*) = .01
Look up the area to the left of a* (.99) in the table.
Use the formula for standardizing to find x.
2.33 = (x − 12) / 2
x = a* = 16.66 minutes
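Working backward from an area to an x value can be sketched as follows, assuming Python's standard-library `statistics.NormalDist`. The names `cutoff` and `table_cutoff` are illustrative.

```python
from statistics import NormalDist

registration = NormalDist(mu=12, sigma=2)

# Only 1% of students should still be registering at the cutoff,
# so the area to the LEFT of the cutoff is .99.
cutoff = registration.inv_cdf(0.99)

# Table-based version: z = 2.33 is the closest table entry to area .99,
# then solve 2.33 = (x - 12) / 2 for x.
table_cutoff = 12 + 2.33 * 2

print(round(cutoff, 2), round(table_cutoff, 2))
```

The software value is slightly smaller than 16.66 because the table rounds the z value to 2.33.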
Ways to Assess Normality
Some of the most frequently used statistical
methods are valid only when x1, x2, …, xn have
come from a population distribution that is at least
approximately normal. One way to see
whether an assumption of population normality
is plausible is to construct a normal probability
plot of the data.
A normal probability plot is a scatterplot of
(normal score, observed value) pairs.
What should happen if our data set is normally distributed?
Consider a random sample with n = 5.
To find the appropriate normal scores for a
sample of size 5, divide the standard normal
curve into 5 equal-area regions.
Each region has an area equal to 0.2.
Why are these regions not the same width?
Next, find the median z-score for each region.
Why is the median not in the “middle” of each region?
These are the normal scores that we would plot our data against.
We use technology (calculators or statistical
software) to compute these normal scores.
-1.28   -.524   0   .524   1.28
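The equal-area construction described above can be sketched in code. The function name `equal_area_normal_scores` is hypothetical, and the sketch assumes Python's standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist


def equal_area_normal_scores(n):
    """Median z-score of each of n equal-area regions under the standard normal curve."""
    std_normal = NormalDist()
    # Region i (1-based) runs from cumulative area (i - 1)/n to i/n,
    # so half of its area lies to the left of cumulative area (i - 0.5)/n.
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


scores = equal_area_normal_scores(5)
print([round(z, 3) for z in scores])
```

For n = 5 this reproduces the scores on the slide. Statistical software often uses a slightly different rule, so published normal scores for other sample sizes may not match this sketch exactly.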
Ways to Assess Normality
A strong linear pattern in a normal probability plot
suggests that population normality is plausible.
On the other hand, systematic departure from a
straight-line pattern (such as curvature, which would
indicate skewness in the data, or outliers) indicates
that it is not reasonable to assume that the population
distribution is normal.
Let’s construct a normal probability plot.
The following data represent egg weights (in grams) for a sample of 10 eggs.
53.04   53.50   52.53   53.00   53.07
52.86   52.66   53.23   53.26   53.16
Since the values of the normal scores depend on the sample size n,
the normal scores when n = 10 are below:
-1.539   -1.001   -0.656   -0.376   -0.123
0.123   0.376   0.656   1.001   1.539
Sketch a scatterplot by pairing the smallest normal score with the
smallest observation from the data set & so on.
[Normal probability plot: normal score on the horizontal axis (-1.5 to 1.5), egg weight in grams on the vertical axis (52.5 to 53.5)]
Since the normal probability plot is approximately linear, it is plausible
that the distribution of egg weights is approximately normal.
Using the Correlation
Coefficient to Assess Normality
• The correlation coefficient, r, can be calculated for
the n (normal score, observed value) pairs.
• If r is too much smaller than 1, then normality of
the underlying distribution is questionable.
How smaller is “too much smaller than 1”?
Values to Which r Can be Compared to Check for Normality
n            5     10    15    20    25    30    40    50    60    75
Critical r   .832  .880  .911  .929  .941  .949  .960  .966  .971  .976
Consider these points from the weight of eggs data:
(-1.539, 52.53) (-1.001, 52.66) (-.656, 52.86) (-.376, 53.00) (-.123, 53.04)
(.123, 53.07) (.376, 53.16) (.656, 53.23) (1.001, 53.26) (1.539, 53.50)
Calculate the correlation coefficient for these points.
r = .986
Since r > critical r, it is plausible that the sample of egg weights
came from a distribution that was approximately normal.
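The correlation check for the egg-weight data can be sketched as follows. The helper name `correlation` is illustrative; the normal scores and the critical value .880 for n = 10 are taken from the slides.

```python
from math import sqrt

# Normal scores for n = 10 and the egg weights (grams), both from the slides.
normal_scores = [-1.539, -1.001, -0.656, -0.376, -0.123,
                 0.123, 0.376, 0.656, 1.001, 1.539]
egg_weights = sorted([53.04, 53.50, 52.53, 53.00, 53.07,
                      52.86, 52.66, 53.23, 53.26, 53.16])


def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Sorting the weights pairs the smallest score with the smallest observation.
r = correlation(normal_scores, egg_weights)
critical_r = 0.880  # table value for n = 10

print(round(r, 3), r > critical_r)
```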
Transforming Data to Achieve
Normality
• When the data is not normal, it is common
to use a transformation of the data.
• For data that shows strong positive
skewness (long upper tail), a logarithmic
transformation is usually applied.
• Square root, cube root, and other
transformations can also be applied to the
data to determine which transformation
best normalizes the data.
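The effect of a logarithmic transformation on positively skewed data can be illustrated with a small sketch. The data values below are hypothetical, chosen only to have a long upper tail; they are not the AGT data from the textbook table. The `skewness` helper is also an assumption of this sketch (a simple moment-based measure), not a method named in the slides.

```python
from math import log


def skewness(values):
    """Moment-based skewness: third central moment over (second central moment)^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical, strongly positively skewed data (long upper tail).
raw = [1, 2, 2, 3, 4, 6, 9, 15, 40]
logged = [log(v) for v in raw]

# A value near 0 indicates a roughly symmetric distribution.
print(round(skewness(raw), 2), round(skewness(logged), 2))
```

In this illustration the skewness of the transformed values is much closer to 0 than that of the raw values.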
Consider the data set in Table 7.4 (page 463)
about plasma and urinary AGT levels.
A histogram of the
urinary AGT levels
is strongly
positively skewed.
A logarithmic
transformation is
applied to the
data. The
histogram of the
log urinary AGT
levels is more
Using the Normal Distribution to
Approximate a Discrete Distribution
Suppose the probability distribution of a discrete random
variable x is displayed in the histogram below.
[Figure: probability histogram of x; one bar is centered at x = 6]
The probability of a particular value is the area of the
rectangle centered at that value.
Suppose this bar is centered at x = 6. The bar actually begins
at 5.5 and ends at 6.5. These endpoints will be used in
calculations. This is called a continuity correction.
Often, a probability histogram can be well approximated by a
normal curve. If so, it is customary to say that x has an
approximately normal distribution.
Normal Approximation to a
Binomial Distribution
Let x be a random variable based on n trials
and success probability p, so that:
μ = np
σ = √(np(1 − p))
If n and p are such that:
np > 10 and n(1 – p) > 10
then x has an approximately normal
distribution.
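The condition check and the two formulas can be sketched as a small helper. The function name `normal_approximation` and its `threshold` parameter are hypothetical. The values n = 1000 and p = .02 are the ones used in the premature-births example.

```python
from math import sqrt


def normal_approximation(n, p, threshold=10):
    """Return (ok, mu, sigma) for the normal approximation to a binomial.

    ok is True when both np and n(1 - p) exceed the threshold.
    """
    ok = n * p > threshold and n * (1 - p) > threshold
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return ok, mu, sigma


ok, mu, sigma = normal_approximation(1000, 0.02)
print(ok, round(mu, 3), round(sigma, 3))

# A case where the approximation is not appropriate: np = 5 is too small.
small_ok, _, _ = normal_approximation(50, 0.1)
print(small_ok)
```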
Premature babies are born before 37 weeks,
and those born before 34 weeks are most at
risk. A study reported that 2% of births in the
United States occur before 34 weeks.
Suppose that 1000 births are randomly
selected and that the number of these births
that occurred prior to 34 weeks, x, is to be
determined.
Can the distribution of x be approximated by a
normal distribution?
np = 1000(.02) = 20 > 10
n(1 – p) = 1000(.98) = 980 > 10
Since both are greater than 10, the distribution of x
can be approximated by a normal distribution.
Find the mean and standard deviation for the
approximating normal distribution.
μ = np = 1000(.02) = 20
σ = √(np(1 − p)) = √(1000(.02)(.98)) = 4.427
Premature Babies Continued . . .
μ = 20 and σ = 4.427
What is the probability that the number of
babies in the sample of 1000 born prior to 34
weeks will be between 10 and 25 (inclusive)?
To find the shaded area, standardize the endpoints,
using the continuity correction (9.5 and 25.5).
a* = (9.5 − 20) / 4.427 = −2.37
b* = (25.5 − 20) / 4.427 = 1.24
Look up these values in the table and subtract
the probabilities.
P(10 ≤ x ≤ 25) = .8925 − .0089 = .8836
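The continuity-corrected calculation can be checked with software. This is a minimal sketch, assuming Python's standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 1000, 0.02
mu = n * p                        # 20
sigma = sqrt(n * p * (1 - p))     # about 4.427
approx = NormalDist(mu, sigma)

low, high = 10, 25                # "between 10 and 25 (inclusive)"

# Continuity correction: the bars for 10 and 25 run from 9.5 to 25.5.
prob = approx.cdf(high + 0.5) - approx.cdf(low - 0.5)
print(round(prob, 4))
```

The software result differs from .8836 only in the third or fourth decimal place, because the table-based answer rounds each z value to two decimals before the lookup.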