Today’s Agenda:
- More examples with probability
- The normal distribution
- Probability and the normal distribution
- Z scores
You should be about halfway through Chapter 5 by now.
(For the assignment as well.)
The probability rules.
Probability = (ways the event occurs) / (ways all events occur)
Probability is always between zero and one.
Zero probability means impossible.
(Never happens)
A probability of one means certain.
(Always happens)
Converse Rule:
Pr(Not A) = 1 – Pr(A)
Addition Rule:
Pr(A or B) = Pr(A) + Pr(B)
When A and B never happen together (they are mutually exclusive).
Multiplication Rule:
Pr(A and B) = Pr(A) x Pr(B)
When A and B are independent.
Say that someone charged with a gun offence is…
convicted of the Gun offence (G) with probability .52,
convicted of a Lesser offence (L) with probability .26, and
found Not guilty (N) with probability .22.
So Pr(G) = .52, Pr(L) = .26, Pr(N) = .22
Also, only one decision can be made, so these events are
mutually exclusive.
We’re assuming that Guilty, Lesser charge, and
Not guilty are the only options. Getting one of
these is certain.
So Pr(G, L, or N) = 1
As a check:
Pr(G, L, or N)
= Pr(G) + Pr(L) + Pr(N)
= .52 + .26 + .22 = 1.00
We want the probability of being convicted on a
gun or lesser charge; Pr(G or L)
Method one: Addition rule
Pr(G or L) = Pr(G) + Pr(L)
= .52 + .26 = .78
We want the probability of being convicted on a
gun or lesser charge; Pr(G or L)
Method two: Converse rule
Convicted of any offence is the opposite of being not
guilty.
So we could write Pr(G or L) as Pr(Not N)
Pr(Not N) = 1 – Pr(N) = 1 - .22 = .78
We expect to see the same answer both ways. There’s
more than one way to get the right solution.
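As a check on the arithmetic, here is a minimal Python sketch of both methods. The dictionary `pr` and the variable names are illustrative choices only, not anything from the textbook.

```python
import math

# Verdict probabilities from the example above.
pr = {"G": 0.52, "L": 0.26, "N": 0.22}

# The three verdicts are mutually exclusive and cover every outcome,
# so their probabilities should add up to 1.
total = sum(pr.values())

# Method one: addition rule for mutually exclusive events.
by_addition = pr["G"] + pr["L"]

# Method two: converse rule, since (G or L) is the same event as (Not N).
by_converse = 1 - pr["N"]

print(math.isclose(total, 1.0))                          # True
print(round(by_addition, 2), round(by_converse, 2))      # 0.78 0.78
```

The comparison uses `math.isclose` rather than `==` because decimal probabilities are stored as binary floating-point numbers and can differ in the last digit.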
If two people are charged in separate trials, we may be
interested in knowing the probability that neither is
convicted.
We want: Pr(N1 and N2)
N1 means Person 1 is not guilty, N2 means Person 2 is not guilty.
Trials are independent; one conviction doesn’t impact the
other.
By multiplication rule:
Pr(N1 and N2)
= Pr(N1) x Pr(N2)
= .22 x .22
= .0484
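One way to see why independence lets us multiply is to simulate many pairs of trials and count how often both verdicts are Not guilty. This is only a rough sketch: the function name `simulate`, the number of simulated pairs, and the seed are all assumptions made for illustration.

```python
import random

P_NOT_GUILTY = 0.22  # Pr(N) from the example above

# Multiplication rule for two independent trials.
exact = P_NOT_GUILTY * P_NOT_GUILTY

def simulate(pairs=200_000, seed=1):
    """Estimate Pr(N1 and N2) by simulating pairs of independent verdicts."""
    rng = random.Random(seed)
    both = 0
    for _ in range(pairs):
        n1 = rng.random() < P_NOT_GUILTY  # Person 1 found not guilty?
        n2 = rng.random() < P_NOT_GUILTY  # Person 2, decided separately
        if n1 and n2:
            both += 1
    return both / pairs

estimate = simulate()
print(round(exact, 4))  # 0.0484
print(estimate)         # should land close to 0.0484
```

The two verdicts are drawn separately, so neither draw depends on the other. That is what "independent" means here.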
Another way to write Pr(N1 and N2) would be
Pr( 2 Not Guilty verdicts out of 2)
These two are equivalent.
What about Pr(1 Not Guilty out of 2),
or Pr(0 Not Guilty out of 2) ?
For the sake of simpler math, let’s use coin flips instead,
with Heads = Not Guilty
and Tails = Everything Else.
Using the two verdicts as an analogy, we know
Pr(2 Heads of 2) = Pr(H) x Pr(H) = ½ x ½ = ¼
And likewise…
Pr(0 Heads of 2) = Pr(T) x Pr(T) = ½ x ½ = ¼
But Pr(1 Head of 2) is not just Pr(T) x Pr(H)…
Pr(T) x Pr(H) = ¼, but 1 Head is the only other possibility, and that
would give us only ¾ probability in total instead of 1.
Pr(1 Head of 2) actually covers two situations:
Head, then Tails, with probability 1/4
Tails, then Head, with probability 1/4
… for a total chance of 1/2.
Pr(0 heads) = 0.25
Pr(1 head ) = 0.5
Pr(2 heads) = 0.25
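The counting above can be sketched in Python by listing every sequence of flips. The function name `heads_distribution` is made up for this illustration. It uses the multiplication rule within a sequence and the addition rule across sequences, and exact fractions keep the arithmetic clean.

```python
from fractions import Fraction
from itertools import product

def heads_distribution(p_heads, flips=2):
    """Return {number of heads: probability} by listing every sequence."""
    p = {"H": p_heads, "T": 1 - p_heads}
    dist = {}
    for seq in product("HT", repeat=flips):
        # Multiplication rule: the flips are independent.
        prob = 1
        for side in seq:
            prob *= p[side]
        # Addition rule: two different sequences never happen together.
        k = seq.count("H")
        dist[k] = dist.get(k, 0) + prob
    return dist

fair = heads_distribution(Fraction(1, 2))
print(fair[0], fair[1], fair[2])  # 1/4 1/2 1/4

# Same idea with Heads = Not Guilty at probability .22 instead of 1/2.
verdicts = heads_distribution(Fraction(22, 100))
print(float(verdicts[2]), float(verdicts[1]), float(verdicts[0]))
```

With the verdict probability of .22, the same listing gives Pr(1 Not Guilty out of 2) = 2 x .22 x .78 = .3432 and Pr(0 Not Guilty out of 2) = .78 x .78 = .6084, which together with .0484 add up to 1.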
Remember this curve? This is the normal curve.
μ, pronounced ‘mu’, is the mean of a normal distribution.
σ, pronounced ‘sigma’, is the standard deviation of a normal distribution.
μ + 2σ refers to the point two standard deviations above the mean.
If data follows the normal curve,
about 2/3 of the data is within 1 standard deviation of the mean, and
about 95% of the data is within 2 standard deviations of the mean.
2/3 and 95% are proportions, or ratios between a part of a
group and that group as a whole.
Proportions are useful because they also imply probability.
If 2/3 of the data is within 1sd, then if I pick a point at random
from that distribution…
… there is a 2/3 chance that it will be within 1 standard
deviation.
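For anyone who wants to check the 2/3 and 95% figures, here is a small sketch using `NormalDist` from Python's standard `statistics` module (Python 3.8 or later). The helper name `within` is an assumption for this example.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)  # measured in standard deviations from the mean

def within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard.cdf(k) - standard.cdf(-k)

print(round(within(1), 4))  # 0.6827, the "about 2/3"
print(round(within(2), 4))  # 0.9545, the "about 95%"
```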
Example: Reading scores
Grade 5s reading scores are normally distributed with mean
120 and standard deviation 25.
Pick a grade 5 student at random…
You have a 95% chance of getting one with a reading score
between 70 and 170.
Example: Reading scores 2
The normal distribution is symmetric: it’s the same on both
sides of the mean/median.
So the chance of picking a grade 5 with reading score 120 or
more: 0.5
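Both reading score examples can be checked with a short sketch, again assuming Python's `statistics.NormalDist`. The variable names are illustrative.

```python
from statistics import NormalDist

# Grade 5 reading scores: mean 120, standard deviation 25.
reading = NormalDist(mu=120, sigma=25)

# 70 and 170 are two standard deviations below and above the mean.
between_70_170 = reading.cdf(170) - reading.cdf(70)

# By symmetry, half of the distribution is at or above the mean.
at_least_120 = 1 - reading.cdf(120)

print(round(between_70_170, 4))  # 0.9545
print(at_least_120)              # 0.5
```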
We can combine these rules and get other ranges.
There is a 2/3 chance (68% if you care about the details) a data
point will be between μ - 1σ and μ + 1σ.
The area above the mean is the same as the area below the
mean…
What if we wanted the probability of being in both of these ranges at
once (within 1σ of the mean and above the mean)?
By symmetry, half of that 68% is above the mean, so…
…we find half of 68%, or 34%.
Pr(value is between mean and 1sd above mean) = 0.34
But what if we wanted…
Pr(Value is less than μ + 1.28σ)
…or something equally awful looking.
Bad news: The formula for finding most areas under the normal
curve is so hard it can’t be written out with ordinary algebra.
(No, seriously: it has ‘no closed form’.)
Good news: We have a table that does most of the work for us.
11th edition: Appendix C, Table A, Pages 513 – 516.
Earlier editions: Look for the “Percentage of Area under the Normal
Curve” table.
No book: Search online for “Standard Normal Table” and look in
images. (This one will likely look different from the textbook one.)
The first page looks like:
z       Area between    Area beyond z
        Mean and z
.00        .00             50.00
.01        .40             49.60
.02        .80             49.20
.03       1.20             48.80
.04       1.60             48.40
.05       1.99             48.01
.06       2.39             47.61
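Even without a closed form, a computer can work out these areas numerically. The sketch below rebuilds the rows above using `math.erf` from Python's standard library. The function names `phi` and `table_row` are assumptions for this example, and this is not necessarily how the textbook table was produced.

```python
import math

def phi(z):
    """Standard normal area to the left of z, computed numerically via erf."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def table_row(z):
    """Return (area between mean and z, area beyond z) in percent, for z >= 0."""
    between = (phi(z) - 0.5) * 100
    beyond = (1 - phi(z)) * 100
    return round(between, 2), round(beyond, 2)

# Print the first seven rows, z = .00 to .06.
for i in range(7):
    z = i / 100
    between, beyond = table_row(z)
    print(f"{z:.2f}  {between:6.2f}  {beyond:6.2f}")
```

The two areas in each row always add up to 50%, because together they cover everything above the mean.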
Z, the z score or standard score, is the number of standard
deviations a value lies above or below the mean.
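In formula form, that definition is z = (value - mean) / standard deviation. A minimal sketch, using the reading score mean and standard deviation from earlier; the function name `z_score` and the raw score of 145 are made up for illustration.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Reading scores: mean 120, standard deviation 25.
print(z_score(145, 120, 25))  # 1.0, one standard deviation above the mean
print(z_score(70, 120, 25))   # -2.0, two standard deviations below the mean
```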
Recall: Pr(value between mean and mean+1sd) = 0.34 = 34%
By the table…
z       Area between    Area beyond z
        Mean and z
…          …               …
.99      33.89            16.11
1.00     34.13            15.87
1.01     34.38            15.62
… it’s actually 34.13%.
We can also use the standard normal table to find the area
past a certain point instead of between the mean and a point.
z       Area between    Area beyond z
        Mean and z
.00        .00             50.00
.01        .40             49.60
.02        .80             49.20

A z-score of zero is right at the mean.
Pr(value 0sd above mean or more) = .50 = 50%
Finally, to find
Pr(Value is less than μ + 1.28σ)
we find the area between μ and μ + 1.28σ…
z       Area between    Area beyond z
        Mean and z
…          …               …
1.27     39.80            10.20
1.28     39.97            10.03
…          …               …

…and add 50% for the lower half that wasn’t counted in the
table: 39.97% + 50% = 89.97%, or about 90%.
We could have also used the probability that the value was
NOT less than μ + 1.28σ and the converse rule.
z       Area between    Area beyond z
        Mean and z
…          …               …
1.28     39.97            10.03

The area beyond z = 1.28 is 10.03%, or about 10%, so:
1 – 0.10 = 0.90
or 100% - 10% = 90%
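Both routes to Pr(Value is less than μ + 1.28σ) can be compared in a short sketch, assuming Python's `statistics.NormalDist`. The variable names are illustrative.

```python
from statistics import NormalDist

z = 1.28
standard = NormalDist()  # standard normal: mean 0, standard deviation 1

between = standard.cdf(z) - 0.5  # "Area between Mean and z"
beyond = 1 - standard.cdf(z)     # "Area beyond z"

method_one = 0.5 + between  # lower half plus the area from the mean up to z
method_two = 1 - beyond     # converse rule

print(round(between * 100, 2), round(beyond * 100, 2))  # 39.97 10.03
print(round(method_one, 4), round(method_two, 4))       # 0.8997 0.8997
```

Rounding 10.03% down to 10% before subtracting is why the slide shows 90% rather than 89.97%.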
Next Monday, more on the z-table, and the relationship
between z-scores and raw scores.
Keep reading Chapter 5.
For the bookless, stick around and I’ll show you your table.