STAT 203 – Lecture 4-1.

The normal distribution is symmetric.
Getting the probability from between two z-scores
Translating standard scores to and from raw scores.
Extreme values beyond the table.
So Majestic!
Text from last Friday:
Say a value X followed the normal distribution, with mean μ
(mu, pronounced ‘mew’) and standard deviation σ (sigma).
We used the z-table to find things like the probability that X is
greater than 1.28 standard deviations above the mean.
In other words, we found Pr( X > μ + 1.28σ)
μ + 1.28σ means a z-score of 1.28.
From the z-table, page 515
z       Area between Mean and z    Area beyond z
…       …                          …
1.27    39.80                      10.20
1.28    39.97                      10.03
1.29    40.15                      9.85
…       …                          …
Since we’re looking at the values farther away from the mean
than the cutoff, we want the area beyond z.
Pr( X > μ + 1.28σ) = 10.03%, or about 10%
Can we find Pr( X > μ - 1.28σ) ?
Hint: Think symmetry.
We can find Pr( X > μ - 1.28σ)
Symmetry: The same on both sides.
The area below μ - 1.28σ is the same as the area above μ + 1.28σ,
which is 10.03%.
So Pr( X > μ - 1.28σ) = 100% - 10.03% = 89.97%, or about 90%
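The symmetry argument can be checked numerically. This is only a sketch: it assumes Python 3.8 or later and uses the standard library's statistics.NormalDist in place of the printed z-table. The helper name area_beyond is made up for illustration.

```python
from statistics import NormalDist

# Standard normal (mean 0, standard deviation 1), standing in for the z-table.
Z = NormalDist(mu=0.0, sigma=1.0)

def area_beyond(z: float) -> float:
    """Tail area beyond |z|, as a proportion (the table's 'Area beyond z')."""
    return 1.0 - Z.cdf(abs(z))

upper_tail = area_beyond(1.28)     # Pr(Z > 1.28)
lower_tail = Z.cdf(-1.28)          # Pr(Z < -1.28), the mirror-image tail
above_minus = 1.0 - lower_tail     # Pr(Z > -1.28)

print(f"Pr(Z >  1.28) = {upper_tail:.4f}")
print(f"Pr(Z < -1.28) = {lower_tail:.4f}")
print(f"Pr(Z > -1.28) = {above_minus:.4f}")
```

The two tails should agree to within rounding, which is all that "the same on both sides" means.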
What is the chance that this value, X, is more than 1 standard
deviation away from the mean in either direction?
Start with Pr( X > μ + 1σ) ,
or, because it’s simpler to write:
Pr( Z > 1)
By the table (page 514)…
z       Area between Mean and z    Area beyond z
…       …                          …
0.99    33.89                      16.11
1.00    34.13                      15.87
1.01    34.38                      15.62
Pr( Z > 1) = .16,
so Pr( Z < -1) = .16 also
Pr( Z > 1) + Pr(Z < -1) = .32
Not surprising, since Pr( -1 < Z < 1) = .68, and .68 + .32 = 1.00
We could have done this the other way too:
Working backwards from
Pr( -1 < Z < 1) = .68
We could get by converse
Pr(Z < -1) + Pr(Z > 1) = .32
… and get by symmetry
Pr(Z > 1) = .16
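Here is the "working backwards" route as a short sketch, again assuming Python's statistics.NormalDist as a stand-in for the table. The variable names are illustrative only.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Start from the middle: Pr(-1 < Z < 1)
within_one = Z.cdf(1.0) - Z.cdf(-1.0)

# Converse: everything that is NOT within one standard deviation.
both_tails = 1.0 - within_one

# Symmetry: the two tails are equal, so each one gets half.
one_tail = both_tails / 2.0

print(f"Pr(-1 < Z < 1)        = {within_one:.2f}")
print(f"Pr(Z < -1) + Pr(Z > 1) = {both_tails:.2f}")
print(f"Pr(Z > 1)             = {one_tail:.2f}")
```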
One other thing to note is that Z = 0 right at the mean, because
the mean is 0 standard deviations above or below the mean.
Let’s try with some uglier z-scores.
Pr( -1.75 < Z < 0.52)
z       Area between Mean and z    Area beyond z
…       …                          …
0.51    19.50                      30.50
0.52    19.85                      30.15
…       …                          …
1.74    45.91                      4.09
1.75    45.99                      4.01
Doing the math…
Pr( -1.75 < Z < 0.52) can be split into two ranges using the
mean as the split point.
Pr( -1.75 < Z < 0 ) + Pr( 0 < Z < 0.52)
Why would we do this? Because the table has everything from
the mean.
Pr(-1.75 < Z < 0) = .4599
Pr(0 < Z < 0.52) = .1985
.4599 + .1985 = .6584
About 66% of the area.
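The split at the mean can be sketched in code. This assumes Python's statistics.NormalDist instead of the printed table, and the helper area_mean_to is a hypothetical name that mimics the table's "Area between Mean and z" column.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def area_mean_to(z: float) -> float:
    """Area between the mean (z = 0) and z, as a proportion.

    Works for negative z too, because the curve is symmetric.
    """
    return abs(Z.cdf(z) - 0.5)

left_part = area_mean_to(-1.75)   # Pr(-1.75 < Z < 0)
right_part = area_mean_to(0.52)   # Pr(0 < Z < 0.52)
total = left_part + right_part    # Pr(-1.75 < Z < 0.52)

# Same area computed in one step, as a cross-check on the split.
direct = Z.cdf(0.52) - Z.cdf(-1.75)

print(f"{left_part:.4f} + {right_part:.4f} = {total:.4f}")
```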
Pic of the 66%
Z-scores, or standard scores, are a bridge between real
data and probabilities surrounding them.
We find z-scores with this (important!):
Z = (X - μ) / σ
X is the value that we’re interested in. We usually want to
know the probability of getting a value below or above X.
X is also called the raw score, meaning we haven’t prepared
it for use at all. Raw as in ‘uncooked’.
μ is the mean, in most cases this will be given to you. Look for
clues like average, and centered around.
σ is the standard deviation, in most cases it’s given or
computed from SPSS.
The Z-Score is the number of standard deviations X is above the
mean (negative when X is below the mean).
Z-Score is also called Standard Score.
Example problem:
The time spent on homework in hours/week for
full time students is normally distributed with
mean 25, and standard deviation = 7
What proportion of students spend more than
20 hours on homework?
Step 1: Identify – μ = 25, σ = 7, x = 20.
We want the proportion, which is like the
probability.
We know the distribution is normal.
These are clues to find the z-score / standard
score, and use it in the z-table to get the
proportion.
Step 2: Apply.
What do we want?! Z!!!!
What do we have?! μ = 25, σ = 7, x = 20!!!!
Use the formula that has Z on one side, and μ, σ, and x on
the other.
Z = (x - μ) / σ = (20 - 25) / 7 = -0.71
-0.71 isn’t on the table, but by symmetry, we can use 0.71.
By the table, 26.11% is between the mean and z=0.71,
and 23.89% is beyond z=0.71.
We want Pr( X > 20), which is Pr(Z > -0.71)…
Method 1: Split
Pr( Z > -0.71) = Pr( Z >0) + Pr(-0.71 < Z < 0)
= .5000 + .2611 = .7611
Method 2: Converse
Pr( Z > -0.71) = 1 – Pr(Z < -0.71) = 1 - .2389
= .7611
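Both methods can be checked with a few lines of code. This is a sketch under the assumption that Python's statistics.NormalDist is an acceptable substitute for the z-table. The z-score is rounded to two decimals first, because that is what the printed table forces us to do.

```python
from statistics import NormalDist

# Values from the example: mean 25 hours/week, standard deviation 7, cutoff 20.
mu, sigma, x = 25.0, 7.0, 20.0

z = (x - mu) / sigma        # raw score -> z-score
z_table = round(z, 2)       # the printed table only lists two decimals

Z = NormalDist()            # standard normal, standing in for the z-table

# Method 1 (split): Pr(Z > 0) + Pr(z_table < Z < 0)
method_split = 0.5 + (Z.cdf(0.0) - Z.cdf(z_table))

# Method 2 (converse): 1 - Pr(Z < z_table)
method_converse = 1.0 - Z.cdf(z_table)

# Without rounding z first, for comparison with the table answer.
unrounded = 1.0 - NormalDist(mu, sigma).cdf(x)

print(f"z = {z_table}")
print(f"split:     {method_split:.4f}")
print(f"converse:  {method_converse:.4f}")
print(f"unrounded: {unrounded:.4f}")
```

The unrounded answer differs slightly from the table answer because the table works with z = -0.71 rather than the full -0.714...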
We can work backwards from a probability to get a value too,
with this: (also important)
X = μ + Zσ
This is the same formula as the z-score (standard score)
formula, but rearranged so that X is the value we get out of it.
Example problem:
Homework/week is normally distributed, μ = 25, σ = 7
What’s the minimum homework I can expect 90% of the class
to do?
In other words Pr(X > ??? ) = .9000
Step 1: Identify.
We have the proportion, and we want the value x.
Again, z-score is going to be our bridge.
Going X → Z → Prob, we used the table last.
Going Prob → Z → X, we’ll use the table first.
We want the Z value such that 10% of the area is beyond that
value.
As z increases, the area beyond that value decreases.
Z       % Area Beyond
0.00    50.00
0.01    49.60
0.02    49.20
0.03    48.80
0.04    48.40
0.05    48.01
…       …
We can use that to find the Z-score with 10% beyond.
(Approximation may be needed)
Z       % Area Beyond
0.00    50.00
0.01    49.60
0.02    49.20
…       …
0.44    33.00
0.45    32.64
0.46    32.28
…       …
1.27    10.20
1.28    10.03
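Now we know Pr( Z > 1.28) = 10.03%. That’s the closest area to
10% in the table.
We want 90% of the class ABOVE the cutoff, so the cutoff is below
the mean. By symmetry, Pr( Z > -1.28) = 89.97%, so we use z = -1.28.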
What do we want?! X!!!!
What do we have?! μ = 25, σ = 7, z = -1.28!!!!
X = μ + zσ = 25 + (-1.28)(7) = 25 - 8.96 = 16.04
So 90% of the full-time students spend 16.04 hours or more on
homework.
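Going Prob → Z → X can be sketched the same way. This assumes Python's statistics.NormalDist, whose inv_cdf method plays the role of reading the table backwards. The variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 25.0, 7.0   # homework hours/week, from the example

Z = NormalDist()        # standard normal, standing in for the z-table

# We want 90% of students ABOVE the cutoff, so 10% are BELOW it.
z_exact = Z.inv_cdf(0.10)        # z-score with 10% of the area below it
z_table = round(z_exact, 2)      # what the two-decimal table gives us

# Standard score -> raw score: X = mu + z * sigma
x_table = mu + z_table * sigma

# Same cutoff without rounding z, for comparison.
x_exact = NormalDist(mu, sigma).inv_cdf(0.10)

print(f"z from table: {z_table}")
print(f"cutoff (table z): {x_table:.2f} hours")
print(f"cutoff (exact z): {x_exact:.2f} hours")
```

The two cutoffs differ by about a hundredth of an hour, which is the price of using the closest table entry.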
What proportion of students spend more than 60 hours/week?
μ = 25, σ = 7, x = 60.
Z = (60 - 25) / 7 = 5
Now we have z = 5, how do we get Pr(Z > 5)?
The table only goes to z = 3.5ish.
Use inference: We want the area beyond z=5, and the area
shrinks as z goes up.
The smallest area in the table is 0.01%, so the area beyond z=5
must be smaller than that. That’s all we can tell from this table.
Fewer than 0.01% of students spend more than 60 hours/week on
homework.
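For interest, software can go past the end of the table. The sketch below assumes Python and uses math.erfc (the complementary error function) rather than 1 minus a cumulative probability, because subtracting from 1 loses precision when the tail is this tiny. This is one reasonable way to do it, not the only one.

```python
import math

mu, sigma, x = 25.0, 7.0, 60.0   # values from the example

z = (x - mu) / sigma             # raw score -> z-score

# Upper-tail area of the standard normal: Pr(Z > z) = 0.5 * erfc(z / sqrt(2))
tail = 0.5 * math.erfc(z / math.sqrt(2.0))

table_floor = 0.0001             # 0.01%, the smallest area in the table

print(f"z = {z}")
print(f"Pr(Z > z) = {tail:.3e}")
print(f"smaller than 0.01%? {tail < table_floor}")
```

This agrees with the inference from the table: the area beyond z = 5 is far below 0.01%.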
(for interest)
Very few data points are going to be more than six standard
deviations above or below the mean. Far less than 0.01%
Six Sigma is a business practice based on making each part in a
machine consistent enough that it will work as long as it’s
within six standard deviations, or 6σ of the mean.
Next time:
- A few more notes on Z-scores
- Discuss Midterm
- We start chapter 6.