Education 793 Class Notes

Descriptive Statistics, Central
Tendency and Variability
10 September 2003
Today’s agenda
• Lab details finalized
• Your announcements
• Chapter 3: Frequency Distributions
• Chapter 4: Central Tendency and Variability
Data Matrix
A data matrix is a matrix in which subjects are listed in the rows
and the variables measured on each subject are listed in the
columns.
Example
Case  Sex  Age  Race
1     m    12   white
2     f    11   asian
3     f    13   african american
4     m    13   white
5     m    12   asian
6     f    11   white
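As a rough illustration of the rows-by-columns idea, the example matrix can be written as a list of records. This is only a sketch: the lowercase key names and the `column` helper are assumptions made for the illustration, not part of the course materials.

```python
# Each dict is one row (a subject); each key is one column (a variable).
data_matrix = [
    {"case": 1, "sex": "m", "age": 12, "race": "white"},
    {"case": 2, "sex": "f", "age": 11, "race": "asian"},
    {"case": 3, "sex": "f", "age": 13, "race": "african american"},
    {"case": 4, "sex": "m", "age": 13, "race": "white"},
    {"case": 5, "sex": "m", "age": 12, "race": "asian"},
    {"case": 6, "sex": "f", "age": 11, "race": "white"},
]


def column(matrix, variable):
    """Return every subject's value on one variable (one column)."""
    return [row[variable] for row in matrix]


ages = column(data_matrix, "age")   # reading down a column
third_subject = data_matrix[2]      # reading across a row

print(ages)
print(third_subject)
```

Reading across a row gives everything known about one subject; reading down a column gives one variable for all subjects, which is the starting point for a frequency distribution.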
Basic Terms and Concepts
• Frequency Distribution: Orders values from lowest to
highest and gives the number or percent of subjects with each
value.
• Can be presented with:
• tabular form
• histogram
• frequency polygon
• stem and leaf plot
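A tabular frequency distribution can be sketched in a few lines. The example below uses the Age column from the data matrix example; the function name `frequency_distribution` is hypothetical and chosen only for this illustration.

```python
from collections import Counter

# Age values from the six cases in the data matrix example.
ages = [12, 11, 13, 13, 12, 11]


def frequency_distribution(values):
    """Return (value, count, percent) rows ordered from lowest to highest value."""
    counts = Counter(values)
    n = len(values)
    return [(v, counts[v], 100 * counts[v] / n) for v in sorted(counts)]


table = frequency_distribution(ages)

print("Value  Count  Percent")
for value, count, percent in table:
    print(f"{value:>5}  {count:>5}  {percent:>6.1f}%")
```

With only six cases every age occurs twice, so this particular distribution happens to be flat; a larger sample would normally show peaks.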
Tables and Polygons
Histograms and Bar Charts
Stem and Leaf
Shape of Frequency Distributions
• Symmetric: when
two halves of the
distribution mirror
each other
Modality
• Modality: the number of relative peaks the distribution
exhibits
[Figure panels: Unimodal, Bimodal, Rectangular, Multimodal]
Skew
A. Left (Negative) Skew
B. Right (Positive) Skew
Kurtosis (Peakedness)
High Peak, Long Tails
Flat Peak, No Tails
Central Tendency and Variability
1) Central tendency
Mode
Median
Mean
2) Variation
Range
Semi-interquartile Range
Variance
Standard deviation
Central Tendency
Mode
Most frequently occurring score in a distribution
Median
Point on the distribution below which one-half (50%) of the
scores fall
Mean
Arithmetic average of scores within a distribution
Special properties:
Sum of deviations of scores from the mean is zero
Sum of squared deviations from the mean is smaller than from any other value
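The two special properties of the mean can be checked numerically. The sketch below assumes the nine scores from the variance example later in these notes; the helper names are hypothetical.

```python
# The nine scores from the variance example in these notes.
scores = [12, 17, 22, 14, 12, 19, 13, 15, 11]
mean = sum(scores) / len(scores)


def sum_of_deviations(values, center):
    """Sum of (value - center) over all values."""
    return sum(v - center for v in values)


def sum_of_squares(values, center):
    """Sum of squared deviations of the values from center."""
    return sum((v - center) ** 2 for v in values)


dev_sum = sum_of_deviations(scores, mean)          # property 1: equals zero
ss_at_mean = sum_of_squares(scores, mean)          # property 2: the minimum
ss_nearby = {c: sum_of_squares(scores, c) for c in (14, 16)}

print(mean, dev_sum, ss_at_mean, ss_nearby)
```

Moving the center one point away from the mean in either direction raises the sum of squares, which is what "at a minimum" means here.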
Mean, Median, Mode
A distribution of GRE scores:
340, 450, 510, 580, 580, 600, 620, 660, 670, 710
Mode: 580
Median: 590
Mean: 572
A second distribution of GRE scores:
340, 450, 510, 580, 580, 600, 620, 660, 670, 1710*
Mode: 580
Median: 590
Mean: 672
*Heidi’s score
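The effect of one extreme score on each measure can be checked with Python's standard `statistics` module. This is a sketch: the `summarize` helper is a name invented for the illustration, and the median of an even number of scores is taken as the midpoint of the two middle scores.

```python
import statistics

# First distribution of GRE scores, in ascending order.
first = [340, 450, 510, 580, 580, 600, 620, 660, 670, 710]
# Second distribution: the top score is replaced by the extreme score of 1710.
second = first[:-1] + [1710]


def summarize(scores):
    """Return the mode, median, and mean of a list of scores."""
    return {
        "mode": statistics.mode(scores),
        "median": statistics.median(scores),
        "mean": statistics.mean(scores),
    }


summary_first = summarize(first)
summary_second = summarize(second)

print(summary_first)
print(summary_second)
```

Only the mean changes between the two distributions: the mode and median depend on the middle of the ordered scores, while the mean uses every value and is pulled toward the outlier.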
Box Plots
Shapes and Statistics
Variability
Identifying the middle of a distribution with a measure of
central tendency is of limited value unless one also knows
how much the scores in the distribution differ from each other.
Measures of how much scores differ from each other should
therefore always accompany measures of central tendency.
These are called measures of "spread" or variability.
Basic Measures
• Range
The range is the simplest measure of spread: it is equal to the difference
between the largest and the smallest values.
The range is useful mainly because it is so easily understood. It is seldom
used otherwise in statistical practice because it is based on only two values
and is therefore very sensitive to extreme scores.
• The semi-interquartile range
Computed as one half the difference between the 75th percentile
(often called Q3) and the 25th percentile (Q1), or (Q3 - Q1) / 2.
Because half the scores in a distribution lie between Q1 and Q3, the
semi-interquartile range is 1/2 the distance needed to cover 1/2
the scores. In a symmetric distribution, an interval stretching from
one semi-interquartile range below the median to one
semi-interquartile range above the median will contain 1/2 of the
scores. This will not be true for a skewed distribution, however.
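Both measures can be sketched with the GRE scores from the central tendency example. One assumption to flag: textbooks and software use several slightly different rules for locating Q1 and Q3. The code relies on the default ("exclusive") method of `statistics.quantiles`, which may not match the course text's convention, so hand-computed quartiles can differ a little.

```python
import statistics

first = [340, 450, 510, 580, 580, 600, 620, 660, 670, 710]
second = first[:-1] + [1710]   # same scores, with one extreme value


def value_range(scores):
    """Largest score minus smallest score."""
    return max(scores) - min(scores)


def semi_interquartile_range(scores):
    """(Q3 - Q1) / 2, using the default quartile rule of statistics.quantiles."""
    q1, _median, q3 = statistics.quantiles(scores, n=4)
    return (q3 - q1) / 2


range_first, range_second = value_range(first), value_range(second)
siqr_first = semi_interquartile_range(first)
siqr_second = semi_interquartile_range(second)

print(range_first, range_second)   # the range jumps with the extreme score
print(siqr_first, siqr_second)     # the semi-interquartile range does not
```

The comparison shows why the range is called sensitive to extreme scores: a single outlier changes it completely, while a measure built from the middle half of the scores is unaffected.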
Variance and Standard Deviation
The variance is a widely used measure of spread. It is computed as
roughly the average squared deviation of each score from the mean: the
squared deviations are summed and divided by n - 1. The formula (in
summation notation) for the variance of a sample in which \bar{x} is
the mean and n is the number of scores is:
$$ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} $$
The standard deviation is the square root of the variance. It is the most
commonly used measure of spread.
An important attribute of the standard deviation as a measure of
spread is that it is possible to compute the percentile rank
associated with any given score if the mean and standard
deviation of a normal distribution are known.
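A percentile rank under a normal distribution can be sketched with `statistics.NormalDist` from the Python standard library. The mean of 500, standard deviation of 100, and score of 600 are made-up values for illustration only, and `percentile_rank` is a hypothetical helper name.

```python
from statistics import NormalDist


def percentile_rank(score, mean, sd):
    """Percent of a normal distribution falling below the given score."""
    return 100 * NormalDist(mu=mean, sigma=sd).cdf(score)


# Hypothetical test with mean 500 and standard deviation 100.
rank = percentile_rank(600, mean=500, sd=100)
rank_at_mean = percentile_rank(500, mean=500, sd=100)

print(f"{rank:.1f}")          # a score one standard deviation above the mean
print(f"{rank_at_mean:.1f}")  # a score exactly at the mean
```

This only applies when the scores are at least approximately normally distributed; for a skewed distribution the mean and standard deviation alone do not determine percentile ranks.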
Calculating Standard Deviation
$$ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} $$
Or to calculate by hand there is a shortcut formula
$$ s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1} $$
Numbers
X     X²
12    144
17    289
22    484
14    196
12    144
19    361
13    169
15    225
11    121
Sum:
135   2133
$$ s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1} = \frac{2133 - \frac{135^2}{9}}{9 - 1} = \frac{2133 - 2025}{8} = 13.5 $$
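The hand calculation can be cross-checked in code. The sketch below computes the variance both from the definition and from the shortcut formula, then takes the square root to get the standard deviation; the variable names are chosen only for this illustration.

```python
import math
import statistics

x = [12, 17, 22, 14, 12, 19, 13, 15, 11]
n = len(x)
mean = sum(x) / n

# Definitional formula: sum of squared deviations from the mean over n - 1.
definitional = sum((v - mean) ** 2 for v in x) / (n - 1)

# Shortcut formula: (sum of X squared - (sum of X) squared / N) over N - 1.
sum_x = sum(x)
sum_x_squared = sum(v * v for v in x)
shortcut = (sum_x_squared - sum_x ** 2 / n) / (n - 1)

# The standard deviation is the square root of the variance.
sd = math.sqrt(shortcut)

print(sum_x, sum_x_squared)      # column totals from the table
print(definitional, shortcut)    # both formulas agree
print(round(sd, 2))
```

Both routes give the same variance; the shortcut only needs the two column totals, which is why it is convenient by hand. The standard library's `statistics.variance` uses the same n - 1 divisor.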
Next Week
• Coursepack
– Say It with Figures, Chapter 7: The Cross-Tabulation Refines
• and
Available through JSTOR at www.lib.umich.edu