File - Teaching psychology

The Standard Normal Distribution;
Sampling Distribution
Seminar 5
An example of a very difficult question
Identify the variables
“Objects and events that are simultaneously
attended with one’s social group are subject to
more elaborative processing, are better
remembered, and are more readily internalized
through social learning. In contrast, none of
these effects are observed when jointly
experiencing an event with nongroup
members.”
Shteynberg et al. (2014). Feeling more together. Emotion.
Your midterm will be easier than this, but
more difficult than your tutorials.
STANDARD NORMAL DISTRIBUTION
Recap
• We learnt how normal distributions are
formed.
• We learnt about z-scores
• How are they related?
Today’s question
• Example: Ramadhar gets a 50 on his Statistics
midterm and a 50 on his Calculus midterm. Did he
do equally well on these two exams, compared to his
classmates?
• Big question: How can we compare a person’s score
on different variables?
Very similar concepts
Normal distribution
vs.
Standard(ized) normal distribution
The Normal Distribution
$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}} $$
Note constants:
π = 3.14159
e = 2.71828
This is a bell-shaped curve with different
centers and spreads
depending on μ and σ
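As a quick check on the formula, this minimal Python sketch evaluates f(x) directly. The function name `normal_pdf` and the (μ, σ) pairs are illustrative choices, not part of the seminar material. The pairs simply echo the exam examples used later.

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the normal curve with mean mu and SD sigma at the point x."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -0.5 * ((x - mu) / sigma) ** 2
    return coefficient * math.exp(exponent)

# The peak sits at x = mu; doubling sigma halves the peak height (lower, wider curve).
for mu, sigma in [(40, 10), (60, 10), (40, 20)]:
    print(f"mu={mu}, sigma={sigma}: peak height = {normal_pdf(mu, mu, sigma):.4f}")
```

The printed peak heights are 0.0399, 0.0399 and 0.0199. Changing μ only shifts the curve, so the height stays the same for μ = 40 and μ = 60. A larger σ flattens and widens the curve.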
The Normal Distribution
It’s a probability density function.
No matter what the values of μ and σ, it must integrate
to 1!
$$ \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\, dx = 1 $$
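The "must integrate to 1" property can be checked numerically. This sketch approximates the integral with a simple midpoint rule, which is our own choice of method. It also assumes the infinite range can be cut off at μ ± 10σ, because the tail area beyond that point is negligibly small.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density, using the same formula as f(x) given earlier."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def area_under_pdf(mu, sigma, width=10.0, steps=100_000):
    """Midpoint-rule estimate of the area under f(x) over mu +/- width*sigma.

    The infinite range is truncated at +/- width SDs; the omitted tail area is tiny.
    """
    lo = mu - width * sigma
    dx = 2 * width * sigma / steps
    return sum(normal_pdf(lo + (i + 0.5) * dx, mu, sigma) for i in range(steps)) * dx

for mu, sigma in [(0, 1), (40, 10), (500, 50)]:
    print(f"mu={mu}, sigma={sigma}: area = {area_under_pdf(mu, sigma):.6f}")
```

Each line reports an area of 1.000000, whatever the centre and spread.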
Case 1
[Figure: distributions of exam grades (x-axis: Grade, 0–100) for Statistics and Calculus]
• Statistics: Ramadhar’s exam score is 10 points above the mean
• Calculus: Ramadhar’s exam score is 10 points below the mean
• How can we interpret Ramadhar’s grade relative to the average performance of the class, for each course?
Mean Statistics = 40
Mean Calculus = 60
*Note*
It is wrong to interpret Ramadhar’s grade relative to the
performance of both courses combined
50°F
100°F
You can’t say the average temperature is 75°F, right?
Case 2
[Figure: distributions of exam grades (x-axis: Grade, 0–100) for Statistics and Calculus]
• Both distributions have the same mean (40), but different standard deviations (5 vs. 20)
• In one case, Ramadhar is performing better than about 98% of the class. In the other, he is performing better than approximately 69% of the class.
• Thus, how we evaluate Ramadhar’s performance depends on how much variability there is in the exam scores
Standard (Z) Scores
• We want to express a person’s score with respect
to both (a) the mean of the group and (b) the
variability of the scores
– how far a person is from the mean = X - M
– variability = SD
Standard score (z-score):
$$ Z_i = \frac{X_i - M}{SD} $$
** How far a person is from the mean, in the metric of
standard deviation units **
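The standard-score formula translates directly into code. The sketch below applies it to Ramadhar's Case 1 scores, where both courses have an SD of 10. The function name `z_score` is our own illustrative choice.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Distance of x from the mean, expressed in standard-deviation units."""
    return (x - mean) / sd

# Case 1: Ramadhar scores 50 in both courses; means are 40 and 60, SD = 10 in each.
print(f"Statistics: z = {z_score(50, 40, 10)}")   # 1.0  -> one SD above the mean
print(f"Calculus:   z = {z_score(50, 60, 10)}")   # -1.0 -> one SD below the mean
```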
Case 1
[Figure: distributions of exam grades (x-axis: Grade, 0–100); mean Statistics = 40, mean Calculus = 60]
Statistics: (50 - 40)/10 = +1, one SD above the mean
Calculus: (50 - 60)/10 = -1, one SD below the mean
Case 2
An example where the means are identical, but the two sets of scores have different spreads
[Figure: distributions of exam grades (x-axis: Grade, 0–100) for Statistics and Calculus]
Statistics Z-score: (50 - 40)/5 = 2
Calculus Z-score: (50 - 40)/20 = .5
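To turn these z-scores into the percentage of the class scoring below Ramadhar, we can use the standard normal cumulative distribution. This assumes the grades are normally distributed. Python's `math.erf` gives that distribution through the identity Φ(z) = ½[1 + erf(z/√2)]. The helper name `percent_below` is hypothetical.

```python
import math

def percent_below(z: float) -> float:
    """Percentage of a normal distribution lying below z (standard normal CDF x 100)."""
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Case 2: same mean (40), different SDs (5 vs. 20), same raw score (50).
for course, sd in [("Statistics", 5), ("Calculus", 20)]:
    z = (50 - 40) / sd
    print(f"{course}: z = {z}, better than {percent_below(z):.1f}% of the class")
```

This prints about 97.7% for Statistics (z = 2) and about 69.1% for Calculus (z = 0.5).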
Three Properties of Standard Scores
1. The mean of a set of z-scores is always 0
2. The SD of a set of standardized scores is always 1
3. The shape of the distribution is identical for unstandardized and
standardized scores.
[Figure: two panels, UNSTANDARDIZED and STANDARDIZED, showing the same distribution before and after conversion to z-scores]
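The first two properties can be checked on any set of scores. The exam scores in this sketch are made up for illustration, and `standardize` is a hypothetical helper. The third property follows because standardizing subtracts a constant and divides by a positive constant. That is a linear rescaling, so it keeps the order and relative spacing of the scores and leaves the shape unchanged.

```python
import statistics

def standardize(scores):
    """Convert raw scores to z-scores using the scores' own mean and SD."""
    m = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample SD (n - 1 denominator)
    return [(x - m) / sd for x in scores]

raw = [35, 42, 50, 50, 57, 61, 73]   # hypothetical exam scores
z = standardize(raw)

# Mean of the z-scores is 0 (up to floating-point rounding) and their SD is 1.
print(f"mean of z = {statistics.mean(z):.6f}, SD of z = {statistics.stdev(z):.6f}")
```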
Two advantages of standard scores
We can use standard scores to find (per)centile
scores: the proportion of people with scores less
than or equal to a particular score.
(per)centile scores ≠ z-scores
(per)centile scores ↔ z-scores
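The area under a normal curve
[Figure: normal curve over z from -4 to 4; 50% of the area lies on each side of the mean, about 34% between the mean and ±1 SD, about 14% between ±1 and ±2 SD, and about 2% beyond ±2 SD]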
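The percentages in the figure are areas under the standard normal curve between pairs of z-values. This sketch recomputes them from the cumulative distribution Φ(z) = ½[1 + erf(z/√2)]. The helper name `phi` is our own.

```python
import math

def phi(z):
    """Standard normal CDF: area to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Area between consecutive cut-points on one side of the mean (the other side mirrors it).
for lo, hi in [(0, 1), (1, 2), (2, math.inf)]:
    print(f"z from {lo} to {hi}: {100 * (phi(hi) - phi(lo)):.1f}%")
```

The printed areas are 34.1%, 13.6% and 2.3%, which round to the 34%, 14% and 2% shown on the slide.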
Two advantages of standard scores
Standard scores provide a way to standardize or
equate different metrics.
We can now interpret Ramadhar’s scores in Statistics and
Calculus on the same z-score metric. (Each score comes from a
distribution with the same mean [zero] and the same standard
deviation [1].)
Two disadvantages of standard scores
Because a person’s score is expressed relative to the
group (X - M), the same person (score) can have
different z-scores when assessed in different samples
Example: Ramadhar’s z-score depends on everyone else’s scores.
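A small sketch of this disadvantage follows. The same raw score of 50 gets a different z-score depending on who else is in the sample. Both class lists are invented for illustration, and `z_in_sample` is a hypothetical helper name.

```python
import statistics

def z_in_sample(score, sample):
    """z-score of `score` relative to the mean and SD of `sample`."""
    return (score - statistics.mean(sample)) / statistics.stdev(sample)

class_a = [40, 45, 50, 55, 60]   # hypothetical class: mean 50
class_b = [50, 60, 70, 80, 90]   # hypothetical class: mean 70

print(f"Class A: z = {z_in_sample(50, class_a):.2f}")   # 0.00  -> exactly average
print(f"Class B: z = {z_in_sample(50, class_b):.2f}")   # -1.26 -> well below average
```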
Two disadvantages of standard scores
If the absolute score (e.g., $, ₹, €) is meaningful or of
psychological interest (e.g., milliseconds), it will be
obscured by transforming it to a relative metric.
We will revisit this concept in Multiple Regression
(Week 11).
Z-scores → Percentile scores
What’s the probability of getting a math SAT score of 575 or
less, if μ = 500 and σ = 50?
$$ Z = \frac{575 - 500}{50} = 1.5 $$
A score of 575 is 1.5 standard deviations above the mean.
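$$ P(X \le 575) = \int_{200}^{575} \frac{1}{(50)\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-500}{50}\right)^{2}}\, dx \approx \int_{-\infty}^{1.5} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}Z^{2}}\, dZ $$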
You don’t have to calculate this. Look up a z-score table.
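A z-table (or software) replaces the integral in practice. The two routes can still be compared numerically to confirm they agree. This sketch assumes normally distributed SAT scores with μ = 500 and σ = 50. It takes 200 as the lowest possible score, matching the lower limit of 200 in the slide's integral. The midpoint rule is simply a convenient numerical method.

```python
import math

def phi(z):
    """Standard normal CDF: area to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, x = 500, 50, 575
z = (x - mu) / sigma
print(f"z = {z}, P(X <= 575) = {phi(z):.4f}")

# Cross-check: integrate the raw-score density from 200 to 575 with the midpoint rule.
def normal_pdf(v):
    return math.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

steps = 100_000
dx = (x - 200) / steps
area = sum(normal_pdf(200 + (i + 0.5) * dx) for i in range(steps)) * dx
print(f"numerical integral = {area:.4f}")
```

Both lines report 0.9332. The part of the curve below 200 (z = -6) is far too small to change the fourth decimal.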
Looking up probabilities in a
standard normal table
What is the area to the
left of Z = 1.51 in a
standard normal
curve?
Area is 93.45%: a score at Z = 1.51
is higher than 93.45% of all scores.
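A printed z-table is just Φ(z) tabulated with the first decimal of z down the rows and the second decimal across the columns. This sketch generates a small slice of such a table. The layout is our own, since published tables vary in format.

```python
import math

def phi(z):
    """Standard normal CDF: area to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Rows: z to one decimal. Columns: second decimal 0.00 ... 0.09.
print("z   | " + " ".join(f"  .0{c}  " for c in range(10)))
for row in (1.4, 1.5, 1.6):
    cells = " ".join(f"{phi(round(row + col / 100, 2)):.4f}" for col in range(10))
    print(f"{row:.1f} | {cells}")

# Row 1.5, column .01 gives the area to the left of Z = 1.51.
print(f"Area left of Z = 1.51: {phi(1.51):.4f}")
```

The last line prints 0.9345, the value read off the table on the slide.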
What does it mean to get a sample from a population?
SAMPLING DISTRIBUTIONS
Recap: Sample vs Population
• Population – A group that includes all the
cases (individuals, objects, or groups) in which
the researcher is interested.
• Sample – A relatively small subset from a
population.
Today’s question
• Are the descriptive statistics we obtain from a
sample the same as the corresponding statistics in a
population?
• Of course not! How different will they be?
• In other words, what is the error associated with
inferring parameters from a sample to a population?
Population inferences can be made...
...by selecting a representative sample from
the population
Why sample?
• Reduces cost of research (e.g. political polls)
• In some cases (e.g. industrial production)
analysis may be destructive
• Generalize findings to population
– Inference from sample to population: inferential
statistics
Features of good inference from samples
Random selection
Don’t confuse this with “random assignment”
Every member of the population has the same
chance of being selected in the sample
(Often violated in psychology… but is this necessarily a problem?)
Henrich et al. (2010). The weirdest people in the world? Behav Brain Sci.
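Random selection means every member of the population has the same chance of ending up in the sample. This sketch draws many simple random samples with Python's `random.sample` and counts how often each person is picked. The population of 20 people and the sample size of 5 are hypothetical. Random assignment, by contrast, would shuffle people who are already in the sample into conditions.

```python
import random
from collections import Counter

population = list(range(1, 21))   # hypothetical population: 20 people, labelled 1..20
n = 5                             # sample size
draws = 20_000                    # number of independent samples

random.seed(1)
counts = Counter()
for _ in range(draws):
    counts.update(random.sample(population, n))   # simple random sample, no replacement

# Each person should appear in roughly n / N = 5 / 20 = 25% of samples.
for person in population[:5]:
    print(f"person {person}: selected in {counts[person] / draws:.3f} of samples")
```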
Features of good inference from samples
Large sample size, N
How large?
Central Limit Theorem: N > 30 (demonstration
on last slide)
Sampling distribution
• When we take one sample, we want to know:
(a) how much error we can expect on average and
(b) how much variation there will be, on average, in the errors
observed
• Sampling distribution: the distribution of a sample
statistic (e.g., a mean) when sampled under known
sampling conditions from a known population.
The real problem
• Often, we don’t know the parameters of a
population. (That’s why we sample!)
• A sample statistic is an estimate of the corresponding population parameter
(philosophically: an estimate of the truth)
Features of sampling distribution
In statistics, we are mostly concerned with the
mean (M) and SD.
• Mean of the sampling distribution
$$ \mu_{\bar{x}} = \mu $$
• Standard deviation (SD) of the sampling
distribution: standard error (SE)
$$ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} $$
σ: population standard deviation
n: sample size
SDs are “errors”.
SEs are essentially “errors” of “errors”.
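The standard-error formula can be worked through for a few sample sizes. In this sketch σ = 10 is a hypothetical population SD. The output shows the square-root relationship: quadrupling n only halves the standard error.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """SD of the sampling distribution of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 10   # hypothetical population SD
for n in (4, 16, 64):
    print(f"n = {n:>2}: SE = {standard_error(sigma, n)}")   # 5.0, 2.5, 1.25
```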
From the previous two formulas…
[Figure: histograms of sample means (x-axis: z) for three sample sizes]
“small” sample: mean of sample means = 10; SD of sample means = 4.16
“medium” sample: mean of sample means = 10; SD of sample means = 2.41
“large” sample: mean of sample means = 10; SD of sample means = 0.87
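The pattern above can be reproduced by simulation: draw many samples, keep each sample's mean, and summarise those means. The slide does not give its population or sample sizes. This sketch therefore assumes a normal population with μ = 10 and σ = 10 and sample sizes of 5, 20 and 100. Its SDs will not match 4.16, 2.41 and 0.87 exactly, but they show the same shrinking pattern and track σ/√n.

```python
import math
import random
import statistics

random.seed(42)
MU, SIGMA = 10, 10   # hypothetical population parameters
REPS = 5_000         # number of samples drawn for each sample size

def sample_means(n):
    """Means of REPS independent random samples of size n from the population."""
    return [statistics.fmean(random.gauss(MU, SIGMA) for _ in range(n)) for _ in range(REPS)]

for label, n in [("small", 5), ("medium", 20), ("large", 100)]:
    means = sample_means(n)
    print(f"{label:>6} (n = {n:>3}): mean of means = {statistics.fmean(means):.2f}, "
          f"SD of means = {statistics.stdev(means):.2f}, "
          f"sigma/sqrt(n) = {SIGMA / math.sqrt(n):.2f}")
```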
Central Limit Theorem
“No matter what we are measuring, the distribution of
any measure across all possible samples is
approximately a normal distribution, as long as the
number of cases in each sample is about 30 or larger.”
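The theorem can be illustrated with a population that is clearly not normal. This sketch uses a strongly right-skewed exponential population (mean 1, SD 1), which is a hypothetical choice. It compares the skewness of raw draws with the skewness of means of samples of 30. If the means are close to normal, about 95% of them should fall within ±1.96 standard errors of the population mean. The `skewness` helper is our own.

```python
import math
import random
import statistics

random.seed(7)

def skewness(xs):
    """Sample skewness (third standardized moment); 0 for a symmetric distribution."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

# Hypothetical, strongly right-skewed population: exponential with mean 1 and SD 1.
raw_scores = [random.expovariate(1.0) for _ in range(20_000)]

n, reps = 30, 5_000
means = [statistics.fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

se = 1.0 / math.sqrt(n)   # sigma / sqrt(n) with sigma = 1
within = sum(abs(m - 1.0) <= 1.96 * se for m in means) / reps

print(f"skewness of raw scores:   {skewness(raw_scores):.2f}")   # roughly 2
print(f"skewness of sample means: {skewness(means):.2f}")        # much closer to 0
print(f"share within ±1.96 SE:    {within:.3f}")                 # near 0.95
```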
What is the probability that
the ball will end up at…
CLT demonstration
Discussion: Why aren’t the small balls normally distributed?
Summary
• Z-scores are useful relative scores in some
cases
• A good sample is one that is random and
large.
• The sampling technique matters too (to be
taught in detail in SRM II)