Probability, Sampling, and Inference
Q560: Experimental Methods in Cognitive Science
Lecture 5
What is Probability?
Relationship between samples and populations:
Used to predict what kind of samples are likely to
be obtained from a population
Defining Probability
Probability = proportion of outcome.
Given an outcome A:
$\text{Probability of } A = \frac{\text{Number of outcomes classed as } A}{\text{Total number of outcomes}}$
Examples:
coin toss
deck of cards
Probability Notation
Probability of outcome A = p(A)
Examples:
Probability of “king” = p(king) = 4/52.
Probabilities can be expressed as
fractions, decimals, or as percentages.
4/52 = 0.0769 = 7.69%
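As a small sketch of the proportion definition, the snippet below computes p(king) and prints it in the three forms mentioned above. The helper name `probability` is hypothetical and introduced only for illustration.

```python
from fractions import Fraction

def probability(n_outcomes_a: int, n_total: int) -> Fraction:
    """Proportion of outcomes classed as A among all possible outcomes."""
    return Fraction(n_outcomes_a, n_total)

p_king = probability(4, 52)        # 4 kings in a 52-card deck
print(p_king)                      # fraction, reduced to lowest terms
print(round(float(p_king), 4))     # decimal form
print(f"{float(p_king):.2%}")      # percentage form
```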
Probability and Random Sampling
For a random sample these two conditions must
be met:
1. Each individual has an equal chance of being
selected.
2. If more than one individual is selected, there
must be constant probability for each selection.
(requires sampling with replacement)
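To see why constant probability requires replacement, here is a minimal sketch using the deck-of-cards example. The function name `p_king_second_draw` is an assumption made for this illustration.

```python
from fractions import Fraction

def p_king_second_draw(first_was_king: bool, with_replacement: bool) -> Fraction:
    """Probability that the second card drawn is a king."""
    kings, cards = 4, 52
    if not with_replacement:
        # The first card stays out of the deck, so the counts change.
        cards -= 1
        if first_was_king:
            kings -= 1
    return Fraction(kings, cards)

print(p_king_second_draw(True, with_replacement=True))    # same as the first draw
print(p_king_second_draw(True, with_replacement=False))   # 3 kings left among 51 cards
print(p_king_second_draw(False, with_replacement=False))  # 4 kings left among 51 cards
```

Only with replacement does the second draw keep the original 4/52 probability.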
Explanation…
Location of Scores in a Distribution
X values are transformed into z-scores, such
that …
1. The sign (+, -) indicates location above or
below the mean.
2. The number indicates distance from the
mean in terms of the number of standard
deviations.
IQ scores: μ = 100, σ = 10
$z = \frac{X - \mu}{\sigma} = \frac{\text{deviation score}}{\text{standard deviation}}$
$X = \mu + z\sigma$
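The two transformations can be sketched as a pair of functions, using the IQ parameters from the slide (mean 100, standard deviation 10). The function names and the example scores 120 and -1.5 are illustrative choices, not part of the lecture.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Deviation score divided by the standard deviation."""
    return (x - mu) / sigma

def raw_score(z: float, mu: float, sigma: float) -> float:
    """Inverse transformation: recover X from a z-score."""
    return mu + z * sigma

mu, sigma = 100, 10
print(z_score(120, mu, sigma))     # 2.0: two standard deviations above the mean
print(raw_score(-1.5, mu, sigma))  # 85.0: one and a half standard deviations below
```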
Standardizing a Distribution
What effects does this z-score transformation
have on the original distribution?
1. Shape: stays the same! Individual scores do
not change position.
2. Mean: z-score distribution mean is always zero!
3. Standard deviation: z-score distribution
standard deviation is always 1!
z-Score transformation is like re-labelling the x-axis
…
Standardizing a Distribution
Let’s do a z-score transformation: X: 0, 6, 5, 2, 3, 2
X     X-μ    X²     z
0     -3      0    -1.5
6      3     36     1.5
5      2     25     1
2     -1      4    -0.5
3      0      9     0
2     -1      4    -0.5
$\Sigma X = 18$, $\Sigma (X - \mu) = 0$, $\Sigma X^2 = 78$
$\mu = \frac{\Sigma X}{N} = \frac{18}{6} = 3$
$SS = \Sigma X^2 - \frac{(\Sigma X)^2}{N} = 78 - \frac{18^2}{6} = 24$
$\sigma = \sqrt{\frac{SS}{N}} = \sqrt{4} = 2$
$z = \frac{X - \mu}{\sigma}$
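The worked example can be checked with a short script. This is a sketch that follows the same computational formula for SS and also confirms that the resulting z-scores have mean 0 and standard deviation 1; the variable names are chosen here for illustration.

```python
from math import sqrt

scores = [0, 6, 5, 2, 3, 2]
n = len(scores)

mu = sum(scores) / n                                    # 18 / 6 = 3
ss = sum(x**2 for x in scores) - sum(scores)**2 / n     # 78 - 54 = 24
sigma = sqrt(ss / n)                                    # sqrt(4) = 2

z = [(x - mu) / sigma for x in scores]
print(z)   # [-1.5, 1.5, 1.0, -0.5, 0.0, -0.5]

# The standardized distribution keeps its shape but is re-centred and re-scaled.
z_mean = sum(z) / n
z_sd = sqrt(sum((v - z_mean)**2 for v in z) / n)
print(z_mean, z_sd)   # 0.0 1.0
```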
Standardizing a Distribution
Let’s draw frequency distribution graphs:
Probability and Frequency Graphs
Example: For the population of scores shown below, what is the
probability in a random draw of obtaining a score greater than
4?
p(X>4) =
The Normal Distribution
Diagram:
The Normal Distribution
Proportions of areas within the normal distribution
can be quantified using z-scores:
The Normal Distribution
Note: The normal distribution is symmetrical.
This means that the proportions on both sides of
the mean are identical.
Note: All normal distributions have the same
proportions.
This allows us to solve problems like
the following:
Body height has a normal distribution, with μ = 68
and σ = 6. If we select one person at random,
what is the probability of selecting a person taller
than 80?
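A minimal sketch of this problem, assuming the height parameters are a mean of 68 and a standard deviation of 6. It uses the complementary error function from Python's standard library in place of a table lookup; the helper name `upper_tail` is hypothetical.

```python
from math import erfc, sqrt

def upper_tail(z: float) -> float:
    """Proportion of a standard normal distribution above z."""
    return 0.5 * erfc(z / sqrt(2))

mu, sigma = 68, 6
z = (80 - mu) / sigma
print(z)                          # 2.0
print(round(upper_tail(z), 4))    # 0.0228
```

A height of 80 lies two standard deviations above the mean, so a little over 2% of the population is taller.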
The Normal Distribution
A graphical representation of the same problem:
The Unit Normal Table
Given the standard proportions of normal
distributions we can give probabilities for z-scores
with whole number values.
But what about fractional z-scores?
That’s what the unit normal table is all about …
Or, plenty of online calculators:
http://www.stat.tamu.edu/~west/applets/normaldemo.html
The Unit Normal Table
How the table is organized:
Things to remember when using
the unit normal table:
1. Symmetrical (only positive z-scores are
tabulated).
2. Proportions are always positive.
3. Section > 50% = “body”
4. Section < 50% = “tail”
5. Body+tail = 1.00 (100%).
In a graph:
“area greater than” = “area to the right of”
“area smaller than” = “area to the left of”
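The body and tail columns of a unit normal table can be reproduced numerically. The sketch below is one way to do it with the standard library's error function; the function name `body_and_tail` and the example z-score 1.25 are assumptions made for illustration.

```python
from math import erf, sqrt

def body_and_tail(z: float) -> tuple[float, float]:
    """Return (body, tail) proportions for a z-score, as in a unit normal table."""
    z = abs(z)                             # the table lists only positive z-scores
    body = 0.5 * (1 + erf(z / sqrt(2)))    # the larger section (at least 50%)
    tail = 1 - body                        # the smaller section (at most 50%)
    return body, tail

body, tail = body_and_tail(1.25)
print(round(body, 4), round(tail, 4))
```

Because of symmetry, `body_and_tail(-1.25)` returns the same pair; only which side of the graph each section lies on changes.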
From Specific Scores to The
Unit Normal Table
You are asked a probability associated with a specific
X value (as opposed to a z-score).
Example:
For a normal distribution with μ = 500 and σ = 100,
give the probability of selecting an individual
whose score is above 650.
(= proportion of individuals with a score above 650.)
Procedure to do this: …
From Specific Scores to The
Unit Normal Table
Follow this procedure:
1. Make a rough sketch (μ and σ).
2. Locate and mark specific score X.
3. Shade appropriate proportion.
4. Transform X value into z-score.
5. Look up value for proportion in
unit normal table (using z-score).
Probability from the Unit Normal
The math section of the SAT has μ = 500 and
σ = 100. If you selected a person at random:
a) What is the probability he would have a score
greater than 650?
b) What is the probability he would have a score
between 400 and 500?
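Both parts can be worked by transforming each X value into a z-score and then finding the proportion, here computed rather than looked up. This is a sketch assuming a mean of 500 and a standard deviation of 100; the helper `phi` is a hypothetical name for the cumulative proportion below z.

```python
from math import erf, sqrt

def phi(z: float) -> float:
    """Cumulative proportion of the standard normal distribution below z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 500, 100

# a) p(X > 650): area to the right of z = 1.5
z_650 = (650 - mu) / sigma
p_above_650 = 1 - phi(z_650)

# b) p(400 < X < 500): area between z = -1.0 and z = 0.0
z_400 = (400 - mu) / sigma
z_500 = (500 - mu) / sigma
p_between = phi(z_500) - phi(z_400)

print(round(p_above_650, 4))   # 0.0668
print(round(p_between, 4))     # 0.3413
```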
The Binomial Distribution
The Binomial Distribution
“binomial” = “two names”
Variable exists in two categories only…
heads – tails
true – false
Probabilities for each outcome are often
known…
p(heads) = 0.5
p(tails) = 0.5
Question of interest: how often does an
outcome occur in a sample of observations.
The Binomial Distribution
Notation:
1. Two categories: A, B
2. Probabilities: p = p(A), q = p(B). Note: p+q =
1.00.
3. Number of observations in the sample: n
4. Variable X is number of times that A occurs in
the sample. Note: X ranges between 0 and n.
The binomial distribution shows the probability
associated with each value X from X=0 to
X=n.
The Binomial Distribution
Table of outcomes:
X = Number of heads.
Toss 1    Toss 2    X
Heads     Heads     2
Heads     Tails     1
Tails     Heads     1
Tails     Tails     0
p(X=2) = ¼
p(X=1) = ½
p(X=0) = ¼
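The table of outcomes can be generated by listing every equally likely sequence of tosses and counting heads. The sketch below does this for a fair coin; the function name `heads_distribution` is an assumption for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def heads_distribution(n_tosses: int) -> dict[int, Fraction]:
    """Probability of each number of heads X in n tosses of a fair coin."""
    outcomes = list(product("HT", repeat=n_tosses))        # all equally likely sequences
    counts = Counter(seq.count("H") for seq in outcomes)   # how many sequences give each X
    return {x: Fraction(counts[x], len(outcomes)) for x in range(n_tosses + 1)}

for x, p in heads_distribution(2).items():
    print(x, p)
```

Calling `heads_distribution(16)` gives the theoretical distribution for a 16-toss experiment, which can be compared with counts collected by hand.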
The Binomial Distribution
Draw the binomial distribution:
Class experiment:
Toss a coin 16 times, count the
number of heads.
Shape of the binomial distribution
for large numbers of trials:
[Figure panels: n = 2, n = 8, n = 16, n = 64]
The Binomial Distribution
The binomial distribution tends to approximate
the normal distribution as n gets large, or more
precisely, when pn and qn are both greater than 10.
Then the normal distribution will
have approximately:
$\mu = pn$
$\sigma = \sqrt{npq}$
This means that, given p, q and n, we can directly
derive z-scores:
$z = \frac{X - pn}{\sqrt{npq}}$
The Binomial Distribution
An example graph:
Using a balanced coin,
what is the probability of
obtaining more than 30
heads in 50 tosses?
The Binomial Distribution
p = 0.5, q = 0.5
n = 50, X = 30
$\mu = pn = 0.5(50) = 25$
$\sigma = \sqrt{npq} = \sqrt{50(.5)(.5)} = 3.54$
$z = \frac{X - \mu}{\sigma} = \frac{30 - 25}{3.54} = 1.41$
Probability is .0793
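The calculation above can be reproduced in a few lines. As a cross-check that is not part of the slide, the sketch also sums the exact binomial probabilities for 31 through 50 heads; the variable names are chosen for illustration.

```python
from math import comb, erfc, sqrt

def upper_tail(z: float) -> float:
    """Proportion of a standard normal distribution above z."""
    return 0.5 * erfc(z / sqrt(2))

p, n, x = 0.5, 50, 30
q = 1 - p
mu = p * n                        # 25.0
sigma = sqrt(n * p * q)           # about 3.54
z = round((x - mu) / sigma, 2)    # rounded to two decimals, as for a table lookup
p_normal = upper_tail(z)          # about .0793

# Exact binomial check (an addition, not from the slide):
# "more than 30 heads" means 31, 32, ..., 50 heads.
p_exact = sum(comb(n, k) for k in range(x + 1, n + 1)) / 2**n

print(z, round(p_normal, 4))
print(round(p_exact, 4))
```

The exact value comes out somewhat smaller than the normal approximation evaluated at X = 30, but both are above .05.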
The Binomial Distribution
A friend bets you that he can draw a king more than 8
times in 20 draws (with replacement) from a fair deck
of cards, and he does it. Is this a likely outcome, or
should you conclude that the deck is not “fair”?
p = .077, q = .923
n = 20, X = 8
The Binomial Distribution
p = .077, q = .923
n = 20, X = 8
$\mu = pn = 0.077(20) = 1.54$
$\sigma = \sqrt{npq} = \sqrt{20(.077)(.923)} = 1.19$
$z = \frac{X - \mu}{\sigma} = \frac{8 - 1.54}{1.19} = 5.43$
Probability is ~0
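Here pn = 1.54 is well below 10, so the normal approximation is rough. As a hedged cross-check, the sketch below sums the exact binomial probabilities instead; the function name `binomial_upper_tail` is hypothetical, and the exact sum is an addition to the slide's method rather than a replacement for it.

```python
from math import comb

def binomial_upper_tail(n: int, p: float, x: int) -> float:
    """Exact probability of more than x occurrences of A in n observations."""
    q = 1 - p
    return sum(comb(n, k) * p**k * q**(n - k) for k in range(x + 1, n + 1))

# More than 8 kings in 20 draws with replacement, p(king) = 4/52.
p_more_than_8_kings = binomial_upper_tail(n=20, p=4 / 52, x=8)
print(p_more_than_8_kings)
```

The exact probability is also extremely small, so the conclusion is the same as with the z-score.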
The Binomial Distribution
Baby sea turtles hatch on land and have to
quickly make it to the ocean before they are
picked off by birds. A baby sea turtle has a 1/8
chance of making it to the water safely.
If a mother lays 100 eggs (and they all hatch),
what is the probability that more than half the
hatchlings make it to the ocean safely?
p = 0.125, q = 0.875
n = 100, X = 50
The Binomial Distribution
p = 0.125, q = 0.875
n = 100, X = 50
$\mu = pn = 0.125(100) = 12.5$
$\sigma = \sqrt{npq} = \sqrt{100(.125)(.875)} = 3.31$
$z = \frac{X - \mu}{\sigma} = \frac{50 - 12.5}{3.31} = 11.33$
Probability is close to zero
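The same steps can be wrapped in a reusable function. This is a sketch under the normal approximation used in the lecture; the name `binomial_z` is an assumption. Small differences from the hand calculation in the second decimal place of z come from rounding the standard deviation to 3.31 before dividing.

```python
from math import erfc, sqrt

def binomial_z(x: float, n: int, p: float) -> float:
    """z-score of X under the normal approximation to the binomial."""
    q = 1 - p
    mu = p * n
    sigma = sqrt(n * p * q)
    return (x - mu) / sigma

z = binomial_z(x=50, n=100, p=1 / 8)
tail = 0.5 * erfc(z / sqrt(2))    # proportion of the normal curve above z

print(round(z, 1))   # roughly 11.3 standard deviations above the mean
print(tail)          # vanishingly small
```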
Statistical Significance
It is very unlikely to obtain an individual from the original
population who has a z-score beyond ±1.96.
Less than 5% of a normal population falls in this area under
the curve.
Therefore, we will define an event as “unlikely due to
chance”, or statistically significant, if it has a less than
5% chance of occurrence in a normal population.
Our card magician was “unlikely”, but our coin flip could
still be explained by chance (p not < .05).
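The 5% criterion can be checked numerically and applied to the two z-scores computed earlier (1.41 for the coin flip, 5.43 for the card draw). The sketch below assumes the two-tailed reading of "beyond 1.96 in either direction"; the names `two_tailed_p` and `CRITICAL_Z` are illustrative.

```python
from math import erfc, sqrt

CRITICAL_Z = 1.96

def two_tailed_p(z: float) -> float:
    """Proportion of a normal distribution farther from the mean than |z|."""
    return erfc(abs(z) / sqrt(2))

print(round(two_tailed_p(CRITICAL_Z), 4))   # 0.05

for label, z in [("coin flip", 1.41), ("card draw", 5.43)]:
    print(label, abs(z) > CRITICAL_Z)       # coin flip False, card draw True
```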