linear distributions

I. Basic Terms
II. Univariate Descriptive Statistics
E. Three general characteristics of distributions
1. Central tendency
2. Variability
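Sample variance and standard deviation:
$$s^2 = \frac{\sum (Y - \bar{Y})^2}{n - 1} \qquad s = \sqrt{\frac{\sum (Y - \bar{Y})^2}{n - 1}}$$
Population variance and standard deviation:
$$\sigma^2 = \frac{\sum (Y - \mu_Y)^2}{N} \qquad \sigma = \sqrt{\frac{\sum (Y - \mu_Y)^2}{N}}$$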
What does the standard deviation tell us?
For almost all distributions, the value of s (or σ) is somewhere between the
smallest deviation from the mean and the largest deviation from the mean.
For “bell shaped” distributions, about 68% of the cases are within one standard
deviation of the distribution’s mean, about 95% of the cases are within two
standard deviations of the mean, and almost all of the cases are within three
standard deviations of the mean.
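The two versions of the standard deviation (dividing by n - 1 for a sample, by N for a population) can be sketched in a few lines of Python. This is only an illustration: the data set and the function names `variance` and `std_dev` are made up for the example.

```python
import math

def variance(scores, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or N (population)."""
    n = len(scores)
    mean = sum(scores) / n
    sum_of_squares = sum((y - mean) ** 2 for y in scores)
    return sum_of_squares / (n - 1) if sample else sum_of_squares / n

def std_dev(scores, sample=True):
    """The standard deviation is the square root of the variance."""
    return math.sqrt(variance(scores, sample))

scores = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical scores, mean = 5
mean = sum(scores) / len(scores)
abs_deviations = [abs(y - mean) for y in scores]

print(std_dev(scores, sample=False))        # population formula (divide by N)
print(std_dev(scores, sample=True))         # sample formula (divide by n - 1)
print(min(abs_deviations), max(abs_deviations))
```

For these scores the population standard deviation is 2, which falls between the smallest deviation from the mean (0) and the largest (4).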
3. Shape
a. Two distinctions
1) Unimodal vs. multimodal
2) Symmetric vs. asymmetric
b. Common shapes
1) Bell shaped distributions
2) Skewed distributions
3) Uniform distributions
4) U-shaped distributions
c. There are measures that describe how skewed a distribution is, how “flat” it
is, etc.
d. A few notes on how shape is related to the values of the mode, median, mean
and standard deviation
1) For unimodal and symmetric distributions, Mo=Md=Mean
2) For symmetric distributions, Md=Mean. In addition they are both at
the “center” of the distribution
3) For positively skewed distributions, Mo < Md < Mean.
For negatively skewed distributions, Mo > Md > Mean.
4) For variables measured in terms of a ratio scale (so that no score can
fall below zero), if the standard deviation exceeds the mean, the
distribution is positively skewed.
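The ordering of the mode, median and mean in skewed distributions can be checked on a small example. The sketch below uses a made-up positively skewed data set and its mirror image; it illustrates the pattern for these particular numbers and is not a proof of the general rule.

```python
import statistics as st

# Hypothetical positively skewed scores: most cases are low, a few are high.
positive_skew = [1, 1, 1, 2, 2, 3, 4, 5, 8, 13]

# Mirror image of the same scores, which is negatively skewed.
negative_skew = [14 - y for y in positive_skew]

def mode_median_mean(scores):
    """Return the three measures of central tendency as a tuple."""
    return st.mode(scores), st.median(scores), st.mean(scores)

print(mode_median_mean(positive_skew))   # Mo < Md < Mean
print(mode_median_mean(negative_skew))   # Mo > Md > Mean
```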
F. Characterizing scores in terms of the distributions of which they are members
1. Two common problems
a. What does a given score mean?
b. How can I combine scores from different distributions?
2. Two solutions for the first problem
a. Transform the score into a percentile rank.
The percentile rank of score X is the percentage of the cases in the
distribution that have scores lower than X.
Percentile ranks are easy to understand.
Percentile ranks usually distort differences between scores. In general,
they exaggerate small differences in regions of a distribution that have
high frequencies, and they minimize differences in regions that have low
frequencies.
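A short Python sketch makes the distortion concrete. It uses the definition given above (percentage of cases with scores lower than X); the data set and the function name `percentile_rank` are invented for the illustration.

```python
def percentile_rank(scores, x):
    """Percentage of cases in the distribution with scores lower than x."""
    below = sum(1 for y in scores if y < x)
    return 100 * below / len(scores)

# Hypothetical distribution: many cases near 50, few cases in the tails.
scores = [30, 45, 48, 49, 50, 50, 51, 52, 55, 70]

# A 2-point difference in the crowded middle of the distribution...
print(percentile_rank(scores, 49), percentile_rank(scores, 51))

# ...versus a 15-point difference in the sparse upper tail.
print(percentile_rank(scores, 55), percentile_rank(scores, 70))
```

In this example, scores of 49 and 51 differ by 30 percentile points, while scores of 55 and 70 differ by only 10.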
b. Transform the score into a “standard score” (z-score)
$$z = \frac{Y - \mu_Y}{\sigma_Y}$$
A z-score tells you how many standard deviations a given score is away
from the mean of the distribution. For scores above the mean, z-scores are
positive. For scores below the mean, z-scores are negative. For scores at
the mean, the z-score is zero.
Z-score distributions always have a mean of zero and a standard deviation
of one.
The z-score is one kind of linear transformation of raw scores.
$$z = -\frac{\mu_Y}{\sigma_Y} + \frac{1}{\sigma_Y}(Y)$$
Linear transformations involve adding, subtracting, multiplying and/or
dividing all of the raw scores (Y) by one or more constants.
What happens when you add or subtract a constant from all of the scores
in a distribution?
What happens when you multiply or divide all of the scores in a
distribution by a constant?
A key property of linear transformations is that they do not change the
shape (bimodal, skewed, etc.) of the original distribution of raw scores.
As a result, they preserve information about relative distances between
pairs of scores.
There are many “standard” linear transformations (IQ scores, SAT scores,
GRE scores, etc.) in addition to the “standard score transformation.”
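The effect of a linear transformation on the mean and standard deviation can be explored with a short Python sketch. The raw scores and the helper names `linear_transform` and `z_scores` are made up for the example, and the population standard deviation (`statistics.pstdev`) is assumed throughout.

```python
import statistics as st

def linear_transform(scores, a, b):
    """Apply Y' = a + b * Y to every raw score."""
    return [a + b * y for y in scores]

def z_scores(scores):
    """The z-score as a linear transformation: a = -mean/sd, b = 1/sd."""
    mean, sd = st.mean(scores), st.pstdev(scores)
    return linear_transform(scores, -mean / sd, 1 / sd)

raw = [2, 4, 4, 4, 5, 5, 7, 9]              # hypothetical scores: mean 5, s.d. 2

z = z_scores(raw)
shifted = linear_transform(raw, 10, 1)      # add a constant to every score
scaled = linear_transform(raw, 0, 3)        # multiply every score by a constant

for name, scores in [("raw", raw), ("z", z), ("shifted", shifted), ("scaled", scaled)]:
    print(name, st.mean(scores), st.pstdev(scores))
```

Adding a constant moves the mean by that constant and leaves the standard deviation alone; multiplying by a constant multiplies both the mean and the standard deviation. In every case the scores keep their order and their relative spacing.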
3. Comparing and combining scores from different distributions—Don’t use
raw scores!
1st Exam: mean = 50, s.d. = 5
2nd Exam: mean = 50, s.d. = 10

           1st Exam   2nd Exam   Total Pts.
George        50         60         110
You           60         50         110
Does George deserve the same overall course grade as you?
Instead of raw scores, use z-scores

           1st Exam   2nd Exam   Total Pts.
George         0          1          1
You            2          0          2
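The exam comparison can be reproduced with a few lines of Python. The exam means, standard deviations and scores come from the example above; the dictionary layout and the function name `z_score` are just one convenient way to organize them.

```python
# (mean, s.d.) for each exam, taken from the example
exams = {"1st Exam": (50, 5), "2nd Exam": (50, 10)}

raw = {
    "George": {"1st Exam": 50, "2nd Exam": 60},
    "You":    {"1st Exam": 60, "2nd Exam": 50},
}

def z_score(score, mean, sd):
    """Number of standard deviations the score lies from the exam mean."""
    return (score - mean) / sd

raw_totals = {name: sum(scores.values()) for name, scores in raw.items()}
z_totals = {
    name: sum(z_score(score, *exams[exam]) for exam, score in scores.items())
    for name, scores in raw.items()
}

print(raw_totals)   # identical raw totals
print(z_totals)     # different z-score totals
```

The raw totals are tied at 110, but the z-score totals are 1 for George and 2 for you, because scoring 10 points above the mean is a bigger achievement on the exam with the smaller standard deviation.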
III. Probability and Probability Distributions
A. Probability is defined as the long-run relative frequency of an outcome
Pi = fi / n
Pi = the probability of the ith outcome
fi = the number of times the ith outcome occurred over many trials
n = the number of trials (a large number)
1. We could conceivably determine probabilities empirically, but sometimes we
know them a priori.
2. Two properties of probabilities
a. 0 ≤ Pi ≤ 1.0
b. Σ Pi = 1.0
3. Two rules for calculating probabilities of complex events
a. The probability that any one of a set of mutually exclusive outcomes will
occur is equal to the sum of the probabilities of each of those outcomes.
What’s the probability of getting an even number when you roll an honest
die?
b. The probability that a particular combination of outcomes of independent
trials will occur is equal to the product of the probabilities of each of those
outcomes.
What’s the probability of getting two ones when you roll an honest die
twice?
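Both rules can be worked through for the honest die in Python. The sketch below uses exact fractions and, as a cross-check on the multiplication rule, simply counts the 36 equally likely outcomes of two rolls; the variable names are chosen for the illustration.

```python
from fractions import Fraction
from itertools import product

# Honest die: each of the six faces has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Addition rule: 2, 4 and 6 are mutually exclusive, so their probabilities add.
p_even = sum(p for face, p in die.items() if face % 2 == 0)

# Multiplication rule: the two rolls are independent, so probabilities multiply.
p_two_ones = die[1] * die[1]

# Cross-check by listing all 36 equally likely outcomes of two rolls.
rolls = list(product(die, repeat=2))
p_two_ones_check = Fraction(sum(1 for r in rolls if r == (1, 1)), len(rolls))

print(p_even)        # 1/2
print(p_two_ones)    # 1/36
```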
B. Probability distributions
1. The probability distribution for rolling an honest die
number of dots    1     2     3     4     5     6
Pi               1/6   1/6   1/6   1/6   1/6   1/6
a. What’s the shape of this distribution?
b. What’s the mean of this distribution?
The mean of a probability distribution is a special thing and has a special
name: the “expected value”
$$E(Y) = \sum y \, P(y)$$
c. What’s the standard deviation of this distribution?
$$\sigma = \sqrt{\sum \left[ y - E(y) \right]^2 P(y)}$$
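As a worked illustration, the expected value and standard deviation of the honest-die distribution can be computed directly from the definitions: weight each value by its probability for the mean, and weight each squared deviation from the mean by its probability for the variance. The variable names below are chosen for the example.

```python
import math
from fractions import Fraction

# Probability distribution for one roll of an honest die.
die = {y: Fraction(1, 6) for y in range(1, 7)}

# Expected value: each outcome weighted by its probability.
expected_value = sum(y * p for y, p in die.items())

# Variance: each squared deviation from E(Y) weighted by its probability.
variance = sum((y - expected_value) ** 2 * p for y, p in die.items())
std_dev = math.sqrt(variance)

print(expected_value)   # 7/2
print(variance)         # 35/12
print(round(std_dev, 4))
```

The expected value is 3.5, a value the die can never actually show, and the standard deviation is about 1.71.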
2. The probability distribution for the number of females in random samples of one
person when the population is half female (f = .5)
# of females in sample    0     1    Total
Pi                        .5    .5    1.0
What is the shape of this distribution? What is its mean? What is its
standard deviation?
3. The probability distribution for the number of females in random samples of two
people drawn from a population when the population is half female
Here are the possible sample results
F, F
F, M
M, F
M, M
The probability of each of these four possible sample results equals $.5^2$, or .25
(multiplication rule)
Thus, the probability of drawing 2 females equals .25, the probability of
drawing one female equals .50, and the probability of drawing 0 females equals
.25 (addition rule)
# of females in sample    0      1      2     Total
Pi                        .25    .50    .25    1.00
What’s the shape of this distribution? What is its mean and standard deviation?
In the 1700s, Jacob Bernoulli proved that for these kinds of distributions (two
possible outcomes, n independent trials) the mean always equals $np$ and the
standard deviation always equals $\sqrt{np(1 - p)}$. Here n = sample size and
p = probability of drawing a female (.5 in the examples above).
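The same reasoning used for samples of two (list every possible sample, apply the multiplication rule to each, then the addition rule to group them) can be sketched for any sample size. In this sketch `p` stands for the probability of drawing a female, and the function name `female_count_distribution` is made up for the example; the results are compared with the mean and standard deviation formulas for n independent trials.

```python
import math
from fractions import Fraction
from itertools import product

def female_count_distribution(n, p):
    """Probability of each possible number of females in a random sample of n people."""
    dist = {}
    for sample in product("FM", repeat=n):       # every possible ordered sample
        k = sample.count("F")
        prob = p ** k * (1 - p) ** (n - k)       # multiplication rule
        dist[k] = dist.get(k, 0) + prob          # addition rule
    return dict(sorted(dist.items()))

n, p = 2, Fraction(1, 2)
dist = female_count_distribution(n, p)

mean = sum(k * pk for k, pk in dist.items())
variance = sum((k - mean) ** 2 * pk for k, pk in dist.items())

print(dist)                                        # probabilities 1/4, 1/2, 1/4
print(mean, n * p)                                 # both equal 1
print(math.sqrt(variance), math.sqrt(n * p * (1 - p)))
```

Changing `n` (for example to 1 or 10) or `p` gives the other distributions of this kind; listing every sample is only practical for small n, since there are 2 to the power n of them.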