Social Science Reasoning Using Statistics

Tuesday, September 3, 2013
Probability & the Normal
Distribution
First, some loose ends from
last time
Other standardized distributions
Original (X): μ=57, σ=14
Z-Scores: μ=0, σ=1
Standardized: μ=50, σ=10

Original (X):   29   43   57   71   85
Z-Scores:       -2   -1    0    1    2
Standardized:   30   40   50   60   70
In-Class Exercise:
1. Find the standard deviation for the following
population of scores: 1,3,4,4,5,7,9
2. Find the standard deviation for the following sample of
scores: 1,2,2,3,9,10
3. For a distribution with µ=40 and σ=12, find the z-score
for each of the following scores: a. X=36 b. X=46 c. X=56
4. A population with a mean of µ=44 and a standard
deviation of σ=6 is standardized to create a new
distribution with µ=50 and σ=10.
a. What is the new value for an original score of X=47?
b. If the new score is 65, what was the original score?
Exercise 4: Use z scores to relate the two distributions to each other
Original distribution: µ=44, σ=6
Standardized distribution: µ=50, σ=10
a. What is the new (standardized) value for an original score of X=47?
Z = (X-μ)/σ = (47-44)/6 = +0.5
X = Zσ + μ = 0.5*10 + 50 = 55
b. If the new (standardized) score is 65, what was the original score?
Z = (X-μ)/σ = (65-50)/10 = +1.5
X = Zσ + μ = 1.5*6 + 44 = 53
Exercise 3: Use Z = (X-μ)/σ with µ=40 and σ=12
X=36: Z = (36-40)/12 = -4/12 = -0.33
X=46: Z = (46-40)/12 = 6/12 = +0.5
X=56: Z = (56-40)/12 = 16/12 = +1.33
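The z-score conversions worked by hand above can be checked with a short Python sketch. The function names `z_score` and `restandardize` are illustrative choices, not something from the slides.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from mu."""
    return (x - mu) / sigma


def restandardize(x, old_mu, old_sigma, new_mu, new_sigma):
    """Map a score from one distribution to another via its z-score."""
    z = z_score(x, old_mu, old_sigma)
    return z * new_sigma + new_mu


# Exercise 3: z-scores for a distribution with mu=40, sigma=12
z_values = [round(z_score(x, 40, 12), 2) for x in (36, 46, 56)]

# Exercise 4a: original (mu=44, sigma=6) to standardized (mu=50, sigma=10)
new_score = restandardize(47, 44, 6, 50, 10)

# Exercise 4b: standardized back to original
original_score = restandardize(65, 50, 10, 44, 6)

print(z_values, new_score, original_score)
```

Going through the z-score in both directions keeps one formula in play: the same function handles part a and part b with the distributions swapped.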
Exercise 2: Use the formula for s (sample SD):
$s = \sqrt{\frac{\sum (X - M)^2}{n - 1}}$
Scores: 1,2,2,3,9,10
M = 27/6 = 4.5
X-M = -3.5, -2.5, -2.5, -1.5, 4.5, 5.5
(X-M)² = 12.25, 6.25, 6.25, 2.25, 20.25, 30.25
Σ(X-M)² = 77.5
s² = 77.5/(n-1) = 77.5/5 = 15.5
s = √15.5 = 3.94
Or use the Excel STDEV.S function
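As an alternative to Excel, the two standard deviation exercises can be sketched with Python's standard `statistics` module. `pstdev` divides by N (population) and `stdev` divides by n-1 (sample); the variable names are illustrative.

```python
import statistics

population = [1, 3, 4, 4, 5, 7, 9]   # Exercise 1: treated as a full population
sample = [1, 2, 2, 3, 9, 10]         # Exercise 2: treated as a sample

# Population SD: sum of squared deviations divided by N
pop_sd = statistics.pstdev(population)

# Sample SD: sum of squared deviations divided by n - 1
samp_sd = statistics.stdev(sample)

print(round(pop_sd, 2), round(samp_sd, 2))
```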
Today: Probability & the Normal
Distribution
Any questions from last
time?
Topics for today
• Review of probability (Chapter 6)
• Binomial Distribution
• Normal Distribution
Basics of Probability
Probability = (Possible successful outcomes) / (All possible outcomes)
• Probability
– Expected relative frequency of a particular outcome, in a situation in
which several different outcomes are possible
• Outcome
– Could be the result of a coin toss or experiment, could be obtaining
a particular score on a variable of interest
Flipping a coin example
What is the probability of getting a “heads”?
n = 1 flip
Probability = (Possible successful outcomes) / (All possible outcomes)
= (One outcome classified as heads) / (Total of two outcomes)
= 1/2 = 0.5
Flipping a coin example
n = 2 flips

Outcome   Number of heads
HH        2
HT        1
TH        1
TT        0

What is the probability of getting two “heads”?
(One “two heads” outcome) / (Four total outcomes) = 1/4 = 0.25
This situation is known as the binomial.
# of outcomes = 2^n
Flipping a coin example
n = 2 flips
Number of heads in the four possible outcomes: 2, 1, 1, 0
What is the probability of getting “at least one heads”?
(Three “at least one heads” outcomes) / (Four total outcomes) = 3/4 = 0.75
Flipping a coin example
n = 3 flips
2^n = 2^3 = 8 total outcomes

Outcome   Number of heads
HHH       3
HHT       2
HTH       2
HTT       1
THH       2
THT       1
TTH       1
TTT       0
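The outcome tables for the coin-flip examples can be generated by enumeration. This is a minimal sketch assuming a fair coin, so every sequence is equally likely; the function name `heads_distribution` is illustrative.

```python
from collections import Counter
from itertools import product


def heads_distribution(n):
    """Return {number of heads: probability} for n fair coin flips."""
    outcomes = list(product("HT", repeat=n))          # all 2**n sequences
    counts = Counter(o.count("H") for o in outcomes)  # sequences per head count
    total = len(outcomes)
    return {k: counts[k] / total for k in sorted(counts)}


two_flips = heads_distribution(2)
three_flips = heads_distribution(3)

# "At least one heads" in two flips is everything except zero heads
at_least_one = 1 - two_flips[0]

print(two_flips, three_flips, at_least_one)
```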
Outcome   Count
HHH       5
HHT       5
HTH       3
THH       3
HTT       0
THT       2
TTH       2
TTT       6

Totals by number of heads: 3H = 5, 2H = 11, 1H = 4, 0H = 6
Connection between probabilities & graphs
• We usually have a population of scores that
can be displayed in a graph (such as a
histogram)
• Each portion of the graph represents a
different proportion of the population
• The proportion is equivalent to the probability
of obtaining an individual in that portion of the
graph
Example
Population with the following scores:
1,1,2,3,3,4,4,4,5,6
[Figure: frequency histogram of the population scores]
Example
• What is the probability of obtaining a score
greater than 4?
• p(X>4) = ?
Probability = (Possible successful outcomes) / (All possible outcomes)
[Figure: frequency histogram of the population scores, with the scores above 4 highlighted]
p(X > 4) = 2/10 = .2
Example
Find the following probabilities:
• p(X>2) = ?
• p(X>5) = ?
• p(X<3) = ?
Probability = (Possible successful outcomes) / (All possible outcomes)
[Figure: frequency histogram of the population scores]
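The proportion-as-probability idea in these examples can be sketched in Python by counting scores that meet a condition. The helper name `proportion` is an illustrative choice.

```python
scores = [1, 1, 2, 3, 3, 4, 4, 4, 5, 6]  # the population from the example


def proportion(values, condition):
    """Successful outcomes divided by all possible outcomes."""
    return sum(1 for x in values if condition(x)) / len(values)


p_gt_4 = proportion(scores, lambda x: x > 4)
p_gt_2 = proportion(scores, lambda x: x > 2)
p_gt_5 = proportion(scores, lambda x: x > 5)
p_lt_3 = proportion(scores, lambda x: x < 3)

print(p_gt_4, p_gt_2, p_gt_5, p_lt_3)
```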
Check your understanding
• We are about to look at the normal
distribution and see how probability concepts
are related to this specific distribution.
• Before we move on, any questions about
probability, how to compute it, how it is
related to frequency graphs, etc.?
The Normal Distribution
• Normal distribution is a commonly found distribution
that is symmetrical and unimodal.
– Not all unimodal, symmetrical curves are Normal, so
be careful with your descriptions
• It is defined by the following equation:
$Y = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(X-\mu)^2 / 2\sigma^2}$
• The mean, median, and mode are all equal for this
distribution.
The Normal Distribution
$Y = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(X-\mu)^2 / 2\sigma^2}$
This equation relates the x and y coordinates on the
graph of the frequency distribution. You can plug a
given value of x into the formula to find the
corresponding y coordinate. Since the function
describes a symmetrical curve, the same y
(height) is given by two values of x (representing two
scores an equal distance above and below the mean).
The Normal Distribution
$Y = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(X-\mu)^2 / 2\sigma^2}$
As the distance between the observed score (x) and
the mean increases, the value of the expression (i.e.,
the y coordinate) decreases. Thus the frequency of
observed scores that are very high or very low
relative to the mean is low, and as the difference
between the observed score and the mean gets very
large, the frequency approaches 0.
The Normal Distribution
$Y = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(X-\mu)^2 / 2\sigma^2}$
• As the distance between the observed score (x) and
the mean decreases (i.e., as the observed value
approaches the mean), the value of the expression
(i.e., the y coordinate) increases.
• The maximum value of y (i.e., the mode, or the peak
in the curve) is reached when the observed score
equals the mean; hence mean equals mode.
The Normal Distribution
$Y = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(X-\mu)^2 / 2\sigma^2}$
• The integral of the function gives the area under
the curve (remember this if you took calculus?)
• The distribution is asymptotic: the tails approach the
horizontal axis but never reach it.
• There is no closed-form solution for the integral, so
areas are found numerically (e.g., from a table).
• It is still possible to calculate the proportion of the area
under the curve represented by a range of x values
(e.g., for x values between -1 and 1).
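The properties described on these slides (symmetry, a peak at the mean, and area as proportion) can be explored with a small Python sketch. The function `normal_density` is an illustrative name implementing the equation above; `NormalDist` is from the standard library and evaluates the area numerically.

```python
import math
from statistics import NormalDist


def normal_density(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at x for the given mean and SD."""
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / math.sqrt(2 * math.pi * sigma ** 2)


# Same height at equal distances above and below the mean
height_above = normal_density(1.0)
height_below = normal_density(-1.0)

# The peak of the curve is at the mean
peak = normal_density(0.0)

# Proportion of the area between x = -1 and x = 1 (standard normal)
standard = NormalDist(mu=0, sigma=1)
area_within_one_sd = standard.cdf(1) - standard.cdf(-1)

print(height_above, height_below, peak, round(area_within_one_sd, 4))
```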
Check your understanding
• Next we will see how probability concepts are
related to the normal distribution, by learning
about the Unit Normal Table.
• Before we move on, any questions about the
properties of the normal distribution?
The Unit Normal Table (Appendix B)
The normal distribution is often transformed into z-scores.

z      Body     Tail
0      .5000    .5000
:      :        :
0.5    .6915    .3085
:      :        :
1.0    .8413    .1587
:      :        :
2.8    .9974    .0026
2.9    .9981    .0019

• The Unit Normal Table gives the precise
proportion of scores in a Normal distribution
on each side of a given z-score
• For a positive z-score, the Body column is the
proportion to the left of z and the Tail column is
the proportion to the right
• Because the distribution is symmetrical, the table lists
only positive Z scores
• Note that for z=0 (i.e., at the mean), the
proportion of scores to the left is .5
Hence, mean=median.
Using the Unit Normal Table
50%-34%-14% rule
Similar to the 68%-95%-99.7% rule
[Figure: normal curve with z from -2 to 2; 34.13% of scores lie between z = 0 and z = 1, 13.59% between z = 1 and z = 2, and 2.28% beyond z = 2]
At z = +1: 15.87% (13.59% + 2.28%)
of the scores are to the right of the score
100% - 15.87% = 84.13% to the left
Using the Unit Normal Table
• Steps for figuring the percentage above or below a particular raw or Z score:
1. Convert raw score to Z score (if necessary)
2. Draw the normal curve, mark where the Z score falls on it, and shade in the area for which you are finding the percentage
3. Make a rough estimate of the shaded area’s percentage (using the 50%-34%-14% rule)
Using the Unit Normal Table
• Steps for figuring the percentage above or below a particular raw or Z score (continued):
4. Find the exact percentage using the unit normal table
5. If needed, subtract the percentage from 100%.
6. Check that the exact percentage is within the range of the estimate from Step 3
SAT Example problems
• The population parameters for the SAT are:
μ = 500, σ = 100, and it is Normally distributed
Suppose that you got a 630 on the SAT. What percent of
the people who take the SAT get your score or lower?
z = (X - μ)/σ = (630 - 500)/100 = 1.3
From the table: z(1.3) = .9032
So 90.32% got your score or lower.
That’s 9.68% above this score.
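The table lookup in the SAT example can be cross-checked in Python. This sketch uses `statistics.NormalDist` from the standard library in place of the printed table; the variable names are illustrative.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)  # population parameters from the example

z = (630 - sat.mean) / sat.stdev     # convert the raw score to a z-score
at_or_below = sat.cdf(630)           # proportion scoring 630 or lower
above = 1 - at_or_below              # proportion scoring higher

print(z, round(at_or_below, 4), round(above, 4))
```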
Check your understanding
• Before we move on, any questions about the
connection between probabilities and distributions?
• Questions about using the unit normal table to
find the % of a distribution falling above
or below a z score?
• Next we will see how to figure out a z score if
you know the percentile
The Normal Distribution
• You can go in the other direction too
– Steps for figuring Z scores and raw scores from percentages:
1. Draw normal curve, shade in approximate area for the
percentage (using the 50%-34%-14% rule)
2. Make rough estimate of the Z score where the shaded
area starts
3. Find the exact Z score using the unit normal table
4. Check that your Z score is similar to the rough estimate
from Step 2
5. If you want to find a raw score, change it from the Z score
The Normal Distribution
Example: What z score is at the 75th percentile (at or above 75%
of the scores)?
1. Draw normal curve, shade in approximate area for the % (use the
50%-34%-14% rule)
2. Make rough estimate of the Z score where the shaded area starts
(between .5 and 1)
3. Find the exact Z score using the unit normal table (a little less than .7)
4. Check that your Z score is similar to the rough estimate from Step 2
5. If you want to find a raw score, change it from the Z score using mean
and standard deviation info.
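Going from a percentage back to a z-score can also be sketched in Python, with `NormalDist.inv_cdf` standing in for a reverse table lookup. Converting to a raw score here reuses the SAT parameters (mean 500, SD 100) purely as an illustration; that pairing is an assumption, not part of this example.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)

# z-score with 75% of the distribution at or below it
z_75 = standard.inv_cdf(0.75)

# Step 5: change the z-score into a raw score (illustrative parameters)
mu, sigma = 500, 100
raw_75 = z_75 * sigma + mu

print(round(z_75, 4), round(raw_75, 1))
```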
The Normal Distribution
Finding the proportion of scores falling between two
observed scores
1. Convert each score to a z score
2. Draw a graph of the normal distribution and shade the area to be identified.
3. Identify the area below the highest z score using the unit normal table.
4. Identify the area below the lowest z score using the unit normal table.
5. Subtract step 4 from step 3. This is the proportion of scores that falls between the two observed scores.
The Normal Distribution
Example: What proportion of scores falls between the mean and .2
standard deviations above the mean?
1. Convert each score to a z score (mean = 0, other score = .2)
2. Draw a graph of the normal distribution and shade the area to be identified.
3. Identify the area below the highest z score using the unit normal table:
For z=.2, the proportion to the left = .5793
4. Identify the area below the lowest z score using the unit normal table.
For z=0, the proportion to the left = .5
5. Subtract step 4 from step 3:
.5793 - .5 = .0793
About 8% of the observations fall between the mean and .2 SD above the mean.
The Normal Distribution
Example 2: What proportion of scores falls between -.2 standard
deviations and -.6 standard deviations?
1. Convert each score to a z score (-.2 and -.6)
2. Draw a graph of the normal distribution and shade the area to be identified.
3. Identify the area below the highest z score using the unit normal table:
For z=-.2, the proportion to the left = 1 - .5793 = .4207
4. Identify the area below the lowest z score using the unit normal table.
For z=-.6, the proportion to the left = 1 - .7257 = .2743
5. Subtract step 4 from step 3:
.4207 - .2743 = .1464
About 15% of the observations fall between -.2 and -.6 SD.
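The subtract-the-lower-area procedure used in both examples can be written as a short Python sketch. The function name `area_between` is illustrative, and `NormalDist.cdf` replaces the table lookup, so results may differ from the table in the fourth decimal place.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)


def area_between(z_low, z_high):
    """Proportion of a normal distribution between two z-scores."""
    below_high = standard.cdf(z_high)  # step 3: area below the highest z
    below_low = standard.cdf(z_low)    # step 4: area below the lowest z
    return below_high - below_low      # step 5: subtract


example_1 = area_between(0.0, 0.2)
example_2 = area_between(-0.6, -0.2)

print(round(example_1, 3), round(example_2, 3))
```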
Check your understanding
• Next we will see how
the shape of the
binomial distribution is
similar to that of the
normal distribution.
• Before we move on, any
questions about use of
the unit normal table?
Flipping a coin example
Distribution of possible outcomes (n = 3 flips)

X (number of heads)   f   p
3                     1   .125
2                     3   .375
1                     3   .375
0                     1   .125

[Figure: bar graph of probability against number of heads (0 to 3)]
Flipping a coin example
Distribution of possible outcomes (n = 3 flips): p = .125, .375, .375, .125 for 0, 1, 2, 3 heads
Can make predictions about likelihood of outcomes based on this distribution.
What’s the probability of flipping three heads in a row?
p = 0.125
Flipping a coin example
Distribution of possible outcomes (n = 3 flips): p = .125, .375, .375, .125 for 0, 1, 2, 3 heads
Can make predictions about likelihood of outcomes based on this distribution.
What’s the probability of flipping at least two heads in three tosses?
p = 0.375 + 0.125 = 0.50
Flipping a coin example
Distribution of possible outcomes (n = 3 flips): p = .125, .375, .375, .125 for 0, 1, 2, 3 heads
Can make predictions about likelihood of outcomes based on this distribution.
What’s the probability of flipping all heads or all tails in three tosses?
p = 0.125 + 0.125 = 0.25
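The three predictions above can be reproduced from the binomial formula rather than by counting outcomes. This is a minimal sketch; `binom_pmf` is an illustrative name, and `math.comb` supplies the number of sequences with k heads.

```python
from math import comb


def binom_pmf(k, n, p=0.5):
    """Probability of exactly k occurrences in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


three_heads = binom_pmf(3, 3)
at_least_two = binom_pmf(2, 3) + binom_pmf(3, 3)
all_same = binom_pmf(3, 3) + binom_pmf(0, 3)

print(three_heads, at_least_two, all_same)
```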
Binomial Distribution
• Two categories of outcomes (A, B) (e.g., coin toss)
• p = p(A) = Probability of A (e.g., Heads)
• q = p(B) = Probability of B (e.g., Tails)
• p + q = 1.0 (e.g., .5 + .5; could be different values)
• n = number of observations (e.g., coin tosses)
• X = number of times category A occurs in a sample
• If pn > 10 and qn > 10, X follows a nearly normal
distribution with μ = pn and σ = √(npq)
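The normal approximation in the last bullet can be illustrated with a rough Python sketch. The values n = 100 and p = 0.5 are hypothetical, chosen only so that pn and qn both exceed 10, and the 0.5 continuity correction is an added assumption that the slides do not mention.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5          # hypothetical values chosen for illustration
q = 1 - p

mu = p * n               # mean of the approximating normal
sigma = sqrt(n * p * q)  # standard deviation of the approximating normal

# Exact binomial probability of 55 or fewer occurrences of category A
exact = sum(comb(n, k) * p ** k * q ** (n - k) for k in range(56))

# Normal approximation; the +0.5 continuity correction is an assumption
approx = NormalDist(mu, sigma).cdf(55.5)

print(mu, sigma, round(exact, 4), round(approx, 4))
```

With these values the two probabilities should land close together, which is the sense in which X is "nearly normal".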
[Figure: bar chart of the tallied outcomes by number of heads: 3 Heads = 5, 2 Heads = 11, 1 Heads = 4, 0 Heads = 6]