Descriptive Statistics

Frequency Distributions
• a tallying of the number of times
(frequency) each score value (or interval
of score values) is represented in a group
of scores.
• May be in a Table
• May be in a Figure
• May be ungrouped
• May be grouped
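A minimal sketch of the tallying described above, for both the ungrouped and the grouped case. The scores and the interval width of 10 are invented for illustration; they do not come from the slides.

```python
from collections import Counter

# Hypothetical scores, invented for illustration only.
scores = [3, 7, 7, 12, 15, 15, 15, 18, 21, 24]

# Ungrouped: count how often each distinct score value occurs.
ungrouped = Counter(scores)

# Grouped: count scores per interval of width 10 (0-9, 10-19, 20-29).
width = 10  # assumed interval width
grouped = Counter((s // width) * width for s in scores)

print("Score  Frequency")
for value in sorted(ungrouped):
    print(f"{value:>5}  {ungrouped[value]}")

print("Interval  Frequency")
for lower in sorted(grouped):
    print(f"{lower:>2}-{lower + width - 1:<5}  {grouped[lower]}")
```

Either tally can then be shown as a table or handed to a plotting tool to draw a bar graph, histogram, or frequency polygon.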
Table, Ungrouped
Table, Grouped
Bar Graph
Histogram
Frequency Polygon
Three-Quarter Rule
• Height of highest point on the ordinate
should be equal to three quarters of the
length of the abscissa.
• Violating this rule may distort the data.
[Figure: bar graph of Sales in $K (axis from 0 to 25) for Baylen, Jackson, Jones, Smith, and Stern]
Reduce Ordinate & Extend Abscissa
Makes the variability among the bars appear less.
Stretching the
ordinate and
reducing the width
of the abscissa
makes the bars’
heights appear
more variable.
The Gee-Whiz Plot –
leave out a big
portion of the lower
ordinate, making
the differences
between the bars’
heights appear even
greater.
A graph used by Ronald Reagan on July 27, 1981
Here I have added numbers to the ordinate.
Shapes of Frequency
Distributions
• Symmetrical – the left side is the mirror
image of the right side
• Uniform (rectangular)
• U-Shaped
• Bell-Shaped
Uniform Distribution
U-Distribution
Bell-Shaped
Skewed Distributions
• Lopsided
• Negative skewness: most of the scores
are high, but there are a few very low
scores.
• Positive skewness: most of the scores are
low, but there are a few very high scores.
Negative (Left) Skewness
Positive (Right) Skewness
Measures of Location
• aka “Central Tendency”
• Mean, Median, & Mode
• The red distribution is located to the right of
the black one – it tends to have higher
scores.
The Mean
• $\mu = \dfrac{\sum Y}{N}$ (M = sample mean, $\mu$ = population mean)
• $\sum (Y - \mu) = 0$
• $\sum (Y - \mu)^2$ is minimal
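The two properties of the mean can be checked numerically. This is a small sketch using the scores 5, 4, 3, 2, 1; the helper name `sum_sq` and the comparison centers 2.5 and 3.5 are chosen here for illustration.

```python
scores = [5, 4, 3, 2, 1]
mean = sum(scores) / len(scores)

# Property 1: deviations about the mean sum to zero.
sum_dev = sum(y - mean for y in scores)

# Property 2: the sum of squared deviations is smallest about the mean.
def sum_sq(center):
    """Sum of squared deviations of the scores about `center`."""
    return sum((y - center) ** 2 for y in scores)

print("mean:", mean)
print("sum of deviations:", sum_dev)
print("sum of squares about the mean:", sum_sq(mean))
print("sum of squares about 2.5 and 3.5:", sum_sq(2.5), sum_sq(3.5))
```

Any center other than the mean gives a larger sum of squared deviations, which is what "minimal" means here.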
The Median
• the score or point which has half of the
scores above it, half below it
• Arrange the scores in order, from lowest to
highest. The median location (ml) is
(n + 1)/2, where n is the number of scores.
Count in ml scores from the top score or
the bottom score to find the median
• If ml falls between two scores, the median
is the mean of those two scores.
– 10, 6, 4, 3, 1: ml = 6/2 = 3, median = 4.
– 10, 8, 6, 4, 3, 1: ml = 7/2 = 3.5, median = (6 + 4)/2 = 5
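The median-location procedure can be sketched as follows. The function name `median_by_location` is hypothetical; it returns both ml and the median so the two worked examples can be compared directly.

```python
import math

def median_by_location(scores):
    """Return (median location, median) using ml = (n + 1) / 2."""
    ordered = sorted(scores)
    ml = (len(ordered) + 1) / 2          # location counted from 1
    low = ordered[math.floor(ml) - 1]    # score at or just below ml
    high = ordered[math.ceil(ml) - 1]    # score at or just above ml
    # When ml is a whole number, low and high are the same score.
    return ml, (low + high) / 2

print(median_by_location([10, 6, 4, 3, 1]))
print(median_by_location([10, 8, 6, 4, 3, 1]))
```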
The Mode
• the score with the highest frequency
• A bimodal distribution is one which has
two modes.
• A multimodal distribution has three or
more modes.
• 1, 1, 1, 2, 2, 3, 4, 4, 5, 5: Mode = 1.
• 1, 1, 1, 2, 2, 3, 4, 4, 5, 5, 5: Bimodal,
modes are 1 and 5.
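A short sketch of finding the mode or modes, using `statistics.multimode` from the Python standard library (Python 3.8 or later). It returns every score value tied for the highest frequency.

```python
from statistics import multimode

unimodal = [1, 1, 1, 2, 2, 3, 4, 4, 5, 5]
bimodal = [1, 1, 1, 2, 2, 3, 4, 4, 5, 5, 5]

# multimode lists all values that share the highest frequency.
modes_a = multimode(unimodal)
modes_b = multimode(bimodal)

print(modes_a)
print(modes_b)
```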
Skewness, Mean, & Median
• the mean is very sensitive to extreme
scores and will be drawn in the direction of
the skew
• the median is not sensitive to extreme
scores
• if the mean is greater than the median,
positive skewness is likely
• if the mean is less than the median,
negative skewness is likely
• one simple measure of skewness is
(mean - median) / standard deviation
• statistical packages compute g1, estimated
Fisher’s skewness.
– The value 0 represents the absence of
skewness.
– Values between -1 and +1 represent trivial to
small skewness.
• 5, 4, 3, 2, 1: Mean = 3, Median = 3
• 40, 4, 3, 2, 1: Mean = 10, Median = 3.
• The mean is much more affected by
skewness than is the median.
• In skewed distributions, the median may
be the preferred measure of location.
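The two small examples above can be run through the simple skewness measure, (mean - median) / standard deviation. The slides do not say which standard deviation to use; this sketch assumes the population form (`pstdev`), and the function name `simple_skew` is hypothetical.

```python
from statistics import mean, median, pstdev

def simple_skew(scores):
    """(mean - median) / standard deviation, using the population SD (an assumption)."""
    return (mean(scores) - median(scores)) / pstdev(scores)

symmetric = [5, 4, 3, 2, 1]
skewed = [40, 4, 3, 2, 1]

for name, data in (("symmetric", symmetric), ("skewed", skewed)):
    print(name, mean(data), median(data), round(simple_skew(data), 3))
```

Replacing 5 with 40 leaves the median at 3 but pulls the mean up to 10, so the measure moves from zero to a positive value.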
Measures of Dispersion
• aka variability
• How much do the scores differ from each
other?
• Range Statistics
• Variance
• Standard Deviation
Same Locations, Different
Dispersions
Four Small Distributions, Each
with Mean = 3
  X    Y    Z     V
  3    1    0  -294
  3    2    0   -24
  3    3   15     3
  3    4    0    30
  3    5    0   300
Range
• Value of highest score minus value of
lowest score
• X: 3 - 3 = 0
• Y: 5 - 1 = 4
• Z: 15 - 0 = 15
• V: 300 - (-294) = 594
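The four ranges can be reproduced with a few lines. The dictionary name `distributions` is chosen here for illustration; the scores are the four small distributions X, Y, Z, and V.

```python
distributions = {
    "X": [3, 3, 3, 3, 3],
    "Y": [1, 2, 3, 4, 5],
    "Z": [0, 0, 15, 0, 0],
    "V": [-294, -24, 3, 30, 300],
}

# Range = highest score minus lowest score.
ranges = {name: max(v) - min(v) for name, v in distributions.items()}

for name, value in ranges.items():
    print(f"{name}: {value}")
```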
Interquartile Range
• Q3 - Q1, where Q3 is the third quartile (the
value of Y marking off the upper 25% of
scores) and Q1 is the first quartile (the
value of Y marking off the lower 25%).
• The interquartile range is the range of the
middle 50% of the scores.
The Semi-Interquartile Range
• (Q3 – Q1)/2.
• How far you have to go from the middle in
both directions to mark off the middle 50%
of the scores.
• This is also known as the probable error.
• Astronomers have used this statistic to
estimate by how much one is likely to be
off when estimating the value of some
astronomical parameter.
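A sketch of the interquartile and semi-interquartile range for distribution Y (1, 2, 3, 4, 5), using `statistics.quantiles` from the Python standard library. Quartile conventions differ between textbooks and packages; the `method="inclusive"` setting is an assumption made here, and other conventions can give slightly different quartiles.

```python
from statistics import quantiles

scores = [1, 2, 3, 4, 5]  # distribution Y

# n=4 returns the three cut points Q1, Q2 (the median), and Q3.
# method="inclusive" is an assumed convention, not one stated in the slides.
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")

iqr = q3 - q1          # range of the middle 50% of the scores
semi_iqr = iqr / 2     # semi-interquartile range

print("Q1:", q1, "Q3:", q3)
print("IQR:", iqr, "semi-IQR:", semi_iqr)
```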
Mean Absolute Deviation
• $\dfrac{\sum |Y - \mu|}{N}$
• this statistic is rarely used
• X: 0; Y: 6/5; Z: 24/5; V: 648/5
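The mean absolute deviations for the four distributions can be checked with exact fractions. The function name `mean_abs_dev` is hypothetical.

```python
from fractions import Fraction

def mean_abs_dev(scores):
    """Mean absolute deviation about the mean, as an exact fraction."""
    n = len(scores)
    mu = Fraction(sum(scores), n)
    return sum(abs(y - mu) for y in scores) / n

distributions = {
    "X": [3, 3, 3, 3, 3],
    "Y": [1, 2, 3, 4, 5],
    "Z": [0, 0, 15, 0, 0],
    "V": [-294, -24, 3, 30, 300],
}

for name, scores in distributions.items():
    print(name, mean_abs_dev(scores))
```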
Population Variance
$\sigma^2 = \dfrac{\sum (Y - \mu)^2}{N} = \dfrac{SS_Y}{N}$
$SS_Y = \sum (Y - \mu)^2 = \sum Y^2 - \dfrac{(\sum Y)^2}{N}$
Population Standard Deviation
$\sigma = \sqrt{\sigma^2}$
X: 0
Y: 1.414
Z: 6
V: 188.6
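A sketch that computes the sum of squares two ways (from deviations, and from the computational shortcut) and then the population standard deviation for the four distributions. The function names are chosen here for illustration.

```python
import math

def ss_definitional(scores):
    """Sum of squared deviations about the mean."""
    mu = sum(scores) / len(scores)
    return sum((y - mu) ** 2 for y in scores)

def ss_computational(scores):
    """Shortcut: sum of Y squared minus (sum of Y) squared over N."""
    return sum(y * y for y in scores) - sum(scores) ** 2 / len(scores)

def population_sd(scores):
    """Square root of the population variance, SS / N."""
    return math.sqrt(ss_definitional(scores) / len(scores))

distributions = {
    "X": [3, 3, 3, 3, 3],
    "Y": [1, 2, 3, 4, 5],
    "Z": [0, 0, 15, 0, 0],
    "V": [-294, -24, 3, 30, 300],
}

for name, scores in distributions.items():
    print(name, ss_definitional(scores), ss_computational(scores),
          round(population_sd(scores), 3))
```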
Estimating Population Variance
from Sample Data
• computed from a sample, SS / N tends to
underestimate the population variance
• $s^2$ is an unbiased estimate of population
variance:
$s^2 = \dfrac{SS_Y}{N - 1} = \dfrac{\sum (Y - M)^2}{N - 1}$
• s is a relatively unbiased estimate of
population standard deviation:
$s = \sqrt{s^2}$
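The difference between dividing by N and dividing by N - 1 can be seen with the standard library's `statistics` module, where `pvariance` uses N and `variance` and `stdev` use N - 1. The scores 5, 4, 3, 2, 1 are treated here as a sample for illustration.

```python
from statistics import pvariance, variance, stdev

scores = [5, 4, 3, 2, 1]  # SS = 10

ss_over_n = pvariance(scores)   # SS / N
s_squared = variance(scores)    # SS / (N - 1)
s = stdev(scores)               # square root of s squared

print("SS / N:      ", ss_over_n)
print("SS / (N - 1):", s_squared)
print("s:           ", round(s, 3))
```

SS / N is the smaller of the two values, in line with the point that it tends to underestimate the population variance when computed from a sample.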
Range and SD
• in a bell-shaped (normal) distribution
nearly all of the scores fall within plus or
minus 3 standard deviations
• when you have a moderately sized sample
of scores from such a distribution the
standard deviation should be
approximately one-sixth of the range
Z-Scores
• Transform the scores so that they have
mean = 0 and standard deviation = 1.
• This provides a convenient way to
measure how far from the mean a score
is.
$z = \dfrac{Y - \mu}{\sigma}$
Calculations on Y Scores
          Y    (Y - M)   (Y - M)²       z*
          5      +2         4         1.265
          4      +1         1         0.632
          3       0         0         0.000
          2      -1         1        -0.632
          1      -2         4        -1.265
Sum      15       0        10         0
Mean      3       0      σ² = 2       0
*s = SQRT(10/4) = 1.581
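A sketch that reproduces the calculations on the Y scores, using the sample standard deviation s = SQRT(10/4) to compute z. Variable names are chosen here for illustration.

```python
from statistics import mean, stdev

scores = [5, 4, 3, 2, 1]
m = mean(scores)
s = stdev(scores)  # sample SD, SQRT(SS / (N - 1))

deviations = [y - m for y in scores]
squared = [d ** 2 for d in deviations]
z_scores = [d / s for d in deviations]

for y, d, sq, z in zip(scores, deviations, squared, z_scores):
    print(f"{y:>2} {d:>+3} {sq:>3} {z:>7.3f}")

print("sum of squares:", sum(squared))
print("s:", round(s, 3))
```

Rounded to three decimals, 1/1.581 comes out as 0.632. The z scores sum to zero, as deviations about the mean always do.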
Standard Scores
• Standard scores have a predetermined
mean and standard deviation.
• Z scores are standard scores with mean 0
and standard deviation 1.
• IQ scores are standard scores with mean
100 and standard deviation 15.
• Each of the three sections of the SAT is
standardized to mean 500 and standard
deviation 100.
Convert from z to Other
Standard Score
• Suzie Clueless has a raw score that is (z)
two standard deviations below the mean.
What is her IQ score?
• Standard Score = Standard Mean +
(z score)(Standard Standard Deviation).
• IQ = 100 – 2(15) = 70. Suzie qualifies as
a moron (IQ 51 to 70).
• Carlos Luis earned an intelligence test
score 2.75 standard deviations above the
mean. What is his IQ score?
• IQ = 100 + 2.75(15) = 141.25, or about 141. Carlos
qualifies as a genius.
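The conversion rule, Standard Score = Standard Mean + (z score)(Standard Standard Deviation), can be sketched as a one-line function. The function name `standard_score` is hypothetical, and the SAT line uses an invented z of +1 purely for illustration.

```python
def standard_score(z, standard_mean, standard_sd):
    """Convert a z score to a standard score with the given mean and SD."""
    return standard_mean + z * standard_sd

suzie_iq = standard_score(-2, 100, 15)
carlos_iq = standard_score(2.75, 100, 15)
sat_example = standard_score(1, 500, 100)  # hypothetical z of +1 on an SAT section

print("Suzie:", suzie_iq)
print("Carlos:", carlos_iq)
print("SAT section at z = +1:", sat_example)
```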