6.11 – The Normal Distribution

IB Math SL/HL Y1&Y2 - Santowski
(A) Random Variables
Now we wish to combine some basic statistics with some basic probability → we are interested in the numbers associated with situations that involve elements of chance, i.e. in the values of random variables

We also wish to know the probabilities with which these random variables take each of the values in their range, i.e. their probability distributions
So 2 definitions need to be clarified:
(i) a discrete random variable is a variable quantity which occurs randomly in a given experiment and which can assume only certain well-defined values, usually integers → examples: number of bicycles sold in a week, number of defective light bulbs in a shipment
discrete random variables involve a count
(ii) a continuous random variable is a variable quantity which occurs randomly in a given experiment and which can assume any value within a specified range → examples: the heights of men in a basketball league, the volume of rainwater collected in a water tank in a month
continuous random variables involve a measurement
(B) CLASSWORK


CLASSWORK (to review the distinction between the 2 types of random variables):
Math SL text, p 710, Chap 29A, Q1,2,3
Math HL text, p 728, Chap 30A, Q1,2,3
(C) The Normal Distribution


- data obtained by direct measurement (e.g. the heights in a population) is usually continuous rather than discrete (all heights are possible, not just whole numbers)
- continuous data also has statistical distributions, and many physical quantities are distributed symmetrically and unimodally about the mean → statisticians observe this bell-shaped curve so often that its model is known as the normal distribution
the graph of the normal distribution, once it is expressed in z-scores, is referred to as the standard normal curve, and one defining equation for the curve for our purposes is

$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} z^{2}}, \quad -\infty < z < \infty$$

where z refers to a concept called the z-score, which takes into account the mean and standard deviation of a set of data
we can graph the normal distribution as follows, where the x-axis is the number of standard deviations, σ, from the mean/median, μ (the idea behind our z-score)

the total area under the curve is 1 unit (arising from the fact that the probabilities of all possible outcomes must add up to exactly 1, or 100%)

With our z-score, we “set” the mean, μ, to be 0 and each 1 unit of the x-axis is 1 standard deviation, σ.
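The claim that the total area is 1 can be checked numerically. The sketch below assumes the standard normal curve f(z) = e^(-z²/2)/√(2π) and approximates the area with a simple trapezoid rule; the function names, the cut-off at ±8 standard deviations and the number of steps are illustrative choices, not part of the course material.

```python
import math

def f(z: float) -> float:
    """Height of the standard normal curve at z (mean 0, SD 1)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def trapezoid_area(a: float, b: float, steps: int = 10_000) -> float:
    """Approximate the area under f between a and b with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

# The curve is practically zero beyond 8 SDs, so [-8, 8] stands in for (-inf, inf).
total_area = trapezoid_area(-8.0, 8.0)
print(round(total_area, 6))
```

The curve never actually touches the axis, so the finite range gives an approximation, but the missing tails are far too small to show at this precision.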
to find the area under the curve between any two given z-scores, we can rely on graphs
the area under the curve between our two given z-scores gives the proportion of values between those two z-scores
so if we write p(-2 < z < 1) = 0.81859, we mean that the proportion of data values that lie between 2 standard deviation units below the mean and 1 unit above is 0.81859, or as a percentage, 81.859% of our data; equivalently, the probability that a data value lies between 2 SDs below and 1 SD above the mean is 0.81859 → we can illustrate this on a normal distribution graph as follows:
(C) The Normal Distribution – Tables of z-scores





We can work out the previous example without a graph and without shading areas under a graph, by simply using prepared tables:
SL Math text, p735 and HL Math text, p772
So to determine p(-2 < z < 1), we check the table and see that a z value of –2.00 corresponds to a value of 0.0228 → this means that the area shaded under the curve, starting from –2.00 all the way left to -∞, is 0.0228 (or 2.28% of the data is more than 2 SD units below the mean)
Likewise, we check the table for our z value of 1.00 and see the value of 0.8413 → this means that the area shaded under the curve, starting from 1.00 all the way left to -∞, is 0.8413 (or 84.13% of the data is less than 1 SD unit above the mean)
So what do we do with the 2 numbers? Well, the 0.8413 also includes the data more than 2 SD units below the mean, which lies outside our range → so the first value gets subtracted from the second → 0.8413 – 0.0228 = 0.8185, which agrees (up to table rounding) with the 0.81859 we saw before with the graph and graphing software
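The table lookup can also be reproduced in a few lines of Python. This is only a sketch: it assumes the table entries are the cumulative area to the left of z, and it computes that area from the error function in the standard library. The name phi is an illustrative choice.

```python
import math

def phi(z: float) -> float:
    """Cumulative area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

lower = phi(-2.0)        # area to the left of z = -2
upper = phi(1.0)         # area to the left of z = 1
between = upper - lower  # p(-2 < z < 1)

print(round(lower, 4), round(upper, 4), round(between, 4))
# prints 0.0228 0.8413 0.8186
```

The last value rounds to 0.8186 rather than 0.8185 because the subtraction here uses unrounded areas, whereas the table entries are already rounded to 4 decimal places.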
(D) Examples

Use the table to evaluate p(z < 1.5). Interpret the value.

The table gives us the value 0.9332, which means that 93.32% of our data lies at or below 1.5 SD units above the mean → or the probability of getting a random data point that is at most 1.5 SD units above the mean is 0.9332

We can see this illustrated on the graph
(D) Example Using Standard Normal Tables

For the standard normal variable, find:
– (i) p(z < 1)
– (ii) p(z < 0.96)
– (iii) p(z < 0.03)

Some slightly more challenging examples:
– (i) p(z > 1.7)
– (ii) p(z < -0.88)
– (iii) p(z > -1.53)

And now some in-between values:
– (i) p(1.7 < z < 2.5)
– (ii) p(-1.12 < z < 0.67)
– (iii) p(-2.45 < z < -0.08)

We can also do some “inverse” problems:
– (i) p(z < a) = 0.5478
– (ii) p(z > a) = 0.6
– (iii) p(z < a) = 0.05
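Answers to the “inverse” problems can be checked with software once they have been attempted from the tables. The sketch below uses Python's standard-library statistics.NormalDist, whose inv_cdf method returns the z-score with a given area to its left; the variable names are illustrative. It assumes that a “greater than” problem is first rewritten as a “less than” problem using the complement.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, SD 1

# (i) p(z < a) = 0.5478 -> a is the z-score with area 0.5478 to its left
a1 = Z.inv_cdf(0.5478)

# (ii) p(z > a) = 0.6 -> the area to the LEFT of a is 1 - 0.6 = 0.4
a2 = Z.inv_cdf(1 - 0.6)

# (iii) p(z < a) = 0.05
a3 = Z.inv_cdf(0.05)

print(round(a1, 2), round(a2, 2), round(a3, 2))
# prints 0.12 -0.25 -1.64
```

Note that (ii) and (iii) give negative values of a, since less than half of the area lies to the left of a in both cases.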
(E) Homework

SL Math text, Chap 29H.1, p736, Q1-4
HL Math text, Chap 30K.1, p757, Q2-5
(F) Standardizing Normal Distributions



When we apply a normal distribution (e.g. to any continuous random variable such as the height or weight of people), each application has its own mean and standard deviation, and so its own distribution graph
What we wish to accomplish now → can we somehow standardize a normal distribution so that one single standardized normal distribution applies to every possible normal distribution?
We can accomplish this by applying a combination of transformations to our data with its particular normal distribution
So from every data point in our distribution, we will subtract the population’s mean and then divide this difference by the population’s standard deviation → we will call this result a “z”-score
So our “formula” for this data transformation is z = (x − μ)/σ
So we then graph the newly transformed data points and we get a standardized normal distribution curve
The two key features of the standardized normal distribution curve are (i) the mean is 0 and (ii) the standard deviation is 1
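The two key features can be seen on a small data set. The sketch below applies z = (x − μ)/σ to a list of heights and then recomputes the mean and standard deviation of the transformed values. The heights are made up for illustration, and the population standard deviation (pstdev) is used because the slide refers to the population's standard deviation.

```python
from statistics import fmean, pstdev

# Hypothetical heights in cm, invented purely for this illustration.
heights_cm = [170.0, 174.0, 179.0, 181.0, 184.0, 186.0]

mu = fmean(heights_cm)      # population mean
sigma = pstdev(heights_cm)  # population standard deviation

# Subtract the mean from every data point, then divide by the SD.
z_scores = [(x - mu) / sigma for x in heights_cm]

print(round(fmean(z_scores), 6), round(pstdev(z_scores), 6))
```

Whatever the original mean and spread, the transformed values have mean 0 and standard deviation 1 (up to floating-point rounding), which is why one table can serve every normal distribution.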
(G) Graph of Standardized Normal Distribution
(H) Working with a Standardized Normal Distribution
Ex 1 → The heights of all rugby players from India are normally distributed with a mean of 179 cm and a standard deviation of 5 cm. Find the probability that a randomly selected player
(i) was less than 181 cm tall
(ii) was at least 177.5 cm tall
(iii) was between 175 and 190 cm tall
Solution #1(i) is to use the z-score tables

z = (181 − 179)/5 = 0.40
So find 0.40 on the tables, which gives 0.6554
Since the table gives us the cumulative area under the curve up to the specified z-score (0.40), we can conclude that 65.5% of the players would be less than 181 cm tall

Alternatively, we can use a GDC:

We simply select the normalcdf( command and enter the specifics as follows: normalcdf(-1E99, 181, 179, 5), where the E is entered with the EE key
This tells the GDC that you want the heights less than 181 (basically from 181 down to -∞) and that the population mean is 179 and the SD is 5
Our result is 0.6554, which agrees with the result from the table


Solution #1(ii) → use the z-score tables → however we must realize that the table gives us a cumulative area under the curve up to the given z-score → now however we are looking for the area GREATER than the given z-score

So, using the table, simply find the area under the curve BELOW the given z-score
Then, using the “complement” idea, simply subtract the area from 1
z-score = (177.5 − 179)/5 = −0.30
Table value is 0.3821 (so 38.21% of the area under the curve is to the left of −0.30 on the z-axis)
Therefore, the area representing the probability of our players being at least 177.5 cm tall would be 1 – 0.3821 = 0.6179 (so this would be the area under the curve, to the right of z = −0.30)

In using the GDC, we again simply enter the command normalcdf(177.5, 1E99, 179, 5) and get 0.6179 as our answer
Solution #1(iii) → use the z-score tables → however we must realize that the table gives us a cumulative area under the curve up to the given z-score → now however we are looking for the area BETWEEN 2 given values

So our two z-scores for 175 and 190 are z = (175 − 179)/5 = −0.80 and z = (190 − 179)/5 = 2.20

So, again our tables require several steps in the calculation
(i) find the area under the curve that is LESS THAN −0.80 → 0.2119
(ii) Now find the area under the curve that is less than 2.20 → 0.9861
So clearly, the 0.9861 total cumulative area includes the 0.2119 that we DO NOT have within our specified range of z-scores (player heights less than 175 cm)
Which suggests that we need to subtract the 0.2119 from 0.9861 → 0.9861 – 0.2119 = 0.7742
Alternatively, using the GDC, we enter normalcdf(175, 190, 179, 5) and get the same 0.7742
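For anyone without a GDC to hand, the three parts of Ex 1 can be checked with Python's standard-library statistics.NormalDist, whose cdf method plays the same role as the table (cumulative area to the left). This is a sketch only; the variable and function names are illustrative.

```python
from statistics import NormalDist

# Model from Ex 1: heights with mean 179 cm and standard deviation 5 cm.
heights = NormalDist(mu=179, sigma=5)

def z_score(x: float, mu: float = 179.0, sigma: float = 5.0) -> float:
    """Standardize a height: number of SDs above (+) or below (-) the mean."""
    return (x - mu) / sigma

# (i) less than 181 cm: cumulative area to the left of 181
p_less_181 = heights.cdf(181)

# (ii) at least 177.5 cm: complement of the area to the left of 177.5
p_at_least_177_5 = 1 - heights.cdf(177.5)

# (iii) between 175 and 190 cm: difference of two cumulative areas
p_between = heights.cdf(190) - heights.cdf(175)

print(round(p_less_181, 4), round(p_at_least_177_5, 4), round(p_between, 4))
# prints 0.6554 0.6179 0.7742
```

Each line mirrors one table strategy: read the area directly, subtract it from 1, or subtract one cumulative area from another.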
(I) Homework


HL Math text
Chap 30K.2, p759, Q1-3
Chap 30K.3, p760, Q1-4
Chap 30L, p761, Q1-7

SL Math text
Chap 29H.2, p738, Q1-3
Chap 29H.3, p739, Q1-3
Chap 29I, p740, Q1-8