Measures of Central Tendency

Basic Statistics
Measures of Central Tendency
STRUCTURE OF STATISTICS
STATISTICS
DESCRIPTIVE: TABULAR, GRAPHICAL, NUMERICAL
INFERENTIAL: CONFIDENCE INTERVALS, TESTS OF HYPOTHESIS
Consider the following distribution of scores:
How do the red and blue distributions differ?
How do the red and green distributions differ?
1
2
Characteristics of Distributions
• Location or Center
– Can be indexed by using a measure of central tendency
• Variability or Spread
– Can be indexed by using a measure of variability
Consider the following distributions:
How do they differ?
Consider the following two distributions:
How do the green and red distributions differ?
Characteristics of Distributions
• Location or Central Tendency
• Variability
• Symmetry
• Kurtosis
STRUCTURE OF STATISTICS
NUMERICAL DESCRIPTIVE MEASURES
DESCRIPTIVE: TABULAR, GRAPHICAL, NUMERICAL
NUMERICAL: CENTRAL TENDENCY, VARIABILITY, SYMMETRY, KURTOSIS
Measures of Central Tendency
Summarizing Data
They give you one score or measure that represents,
or is typical of, an entire group of scores.
The Mean
The Median
The Mode
Most scores tend to center toward
a point in the distribution.
[Figure: distribution of frequency against score, labeled Central Tendency]
Frequency Tables & Graphs
Measures of Central Tendency
[Figure: a scatter of raw scores with the labels Tabulating, Graphing, Averaging, and Measurement scales, leading to Frequency Tables, Graphs, The Mean, The Median, and The Mode]
Measures of Central Tendency
Measures of central tendency are statistics that describe
typical, average, or representative scores.
The most common measures of central tendency (mean, median,
and mode) are quite different in conception and calculation.
These three statistics reflect different
notions of the “center” of a distribution.
“The Mode”
In case of ungrouped frequency distribution:
The score that occurs most frequently.
In case of grouped frequency distribution:
When observations have been grouped into classes,
the midpoint of the class with the largest frequency is
used as an estimate of the mode.
The mode of this distribution is estimated to be 52,
the midpoint of the 51-53 class.
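As a minimal sketch of both cases, the Python below finds the mode of ungrouped scores and estimates the mode of grouped data as the midpoint of the modal class. The scores and the class frequencies are made-up illustrations (chosen so that the 51-53 class is the modal one), not the data behind the slide.

```python
from collections import Counter

# Hypothetical ungrouped scores (not taken from the slides).
scores = [47, 52, 35, 52, 41, 52, 47, 39]

# Ungrouped case: the mode is the score with the highest frequency.
mode_ungrouped = Counter(scores).most_common(1)[0][0]

# Hypothetical grouped distribution: (lower limit, upper limit) -> frequency.
grouped = {(45, 47): 4, (48, 50): 7, (51, 53): 11, (54, 56): 6}

# Grouped case: find the class with the largest frequency, then use its midpoint.
modal_class = max(grouped, key=grouped.get)
mode_grouped = (modal_class[0] + modal_class[1]) / 2

print(mode_ungrouped)  # 52
print(mode_grouped)    # 52.0
```

The grouped value is only an estimate, because the individual scores inside the modal class are not known.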
Unimodal Distribution: One Mode
Bimodal Distribution: Two Modes
Mode and Measurement Scales
Can you find a mode for each data?
Nominal Scale: Nationality (1=American, 2=Asian, 3=Mexican)
1 3 3 2 1 2 2 1 3 2 1 3 2 3 3 3 3 1 2 2
Mode = 3
Ordinal Scale: Football Poll (1=first, 2=second, 3=third, 4=fourth)
1 2 3 4
4 3 4 3
2 4 4 2
1 2 4 4
3 2 3 4
Mode = 4
Interval Scale: IQ score
112 132
112 113
112 150
125 114
Mode = 112
Ratio Scale: Weight
68 56 39
56 44 56
45 56 75
81 67 59
Mode = 56
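The four examples can be checked with Python's standard `statistics.mode`. This is a sketch under the assumption that the fused digits on the slide are separate single-digit codes; the variable names are introduced here for illustration only.

```python
from statistics import mode

# Values as read from the slide; the order of the values does not affect the mode.
nationality = [1, 3, 3, 2, 1, 2, 2, 1, 3, 2, 1, 3, 2, 3, 3, 3, 3, 1, 2, 2]
football_poll = [1, 2, 3, 4, 4, 3, 4, 3, 2, 4, 4, 2, 1, 2, 4, 4, 3, 2, 3, 4]
iq_score = [112, 132, 112, 113, 112, 150, 125, 114]
weight = [68, 56, 39, 56, 44, 56, 45, 56, 75, 81, 67, 59]

datasets = {
    "nationality (nominal)": nationality,
    "football poll (ordinal)": football_poll,
    "IQ score (interval)": iq_score,
    "weight (ratio)": weight,
}

# The mode only counts occurrences, so it is defined at every level of measurement.
modes = {name: mode(values) for name, values in datasets.items()}
for name, value in modes.items():
    print(name, value)
```

Because the mode only needs to tell values apart, it works even for the nationality codes, where the numbers are just labels.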
“The Mode”
It is not affected by extremely large or
small values and is therefore a valuable
measure of central tendency when such
values occur.
It can be found for ratio-level, interval-level,
ordinal-level and nominal-level data.
“The Median”
The Median is the 50th percentile of a distribution
- The point where half of the observations fall
below and half of the observations fall above
In any distribution there will always be an equal
number of cases above and below the Median.
Oh my!! Where is the median?
Location
For an odd number of untied scores
(11, 13, 18, 19, 20)
[Figure: number line from 11 to 20]
The Median is the middle score when
scores are arranged in rank order.
Median Location = (N+1)/2 = (5+1)/2 = 3rd
Median Score = 18
For an even number of untied scores
(11, 15, 19, 20)
[Figure: number line from 11 to 20]
The Median is halfway between the two central
values when scores are arranged in rank order.
Median Location = (N+1)/2 = (4+1)/2 = 2.5th
Median Score = (15+19)/2 = 17
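The two cases above can be reproduced with a short Python sketch. The helper name `median_location` is introduced here for illustration; `statistics.median` is the standard-library function.

```python
from statistics import median

def median_location(n):
    """1-based position of the median in rank order: (N + 1) / 2."""
    return (n + 1) / 2

odd_scores = [11, 13, 18, 19, 20]
even_scores = [11, 15, 19, 20]

# Odd N: the location is a whole number, so the median is an actual score.
print(median_location(len(odd_scores)), median(odd_scores))    # 3.0 18

# Even N: the location falls between two ranks, so the two central
# scores are averaged.
print(median_location(len(even_scores)), median(even_scores))  # 2.5 17.0
```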
The Median of a group of scores is that
point on the number line such that the sum
of the absolute distances of all scores to that
point is smaller than the sum of the
distances to any other point.
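A quick numerical check of this minimum-distance property, using the odd-numbered example scores from the earlier slide. The grid of candidate points is an arbitrary choice made for this sketch.

```python
from statistics import median

scores = [11, 13, 18, 19, 20]

def total_distance(point, data):
    """Sum of the absolute distances from every score to `point`."""
    return sum(abs(x - point) for x in data)

md = median(scores)

# Candidate points 10.0, 10.5, ..., 20.5 on the number line.
candidates = [x / 2 for x in range(20, 42)]
best = min(candidates, key=lambda p: total_distance(p, scores))

print(md, total_distance(md, scores))  # 18 15
print(best)                            # 18.0
```

With an even number of scores, every point between the two central values gives the same minimum total, and the halfway point is taken as the median by convention.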
There is a unique median for each data
set.
It is not affected by extremely large or
small values and is therefore a valuable
measure of central tendency when such
values occur.
The Median can be computed for
•Ordinal-level data, or
•Interval-level data, or
•Ratio-level data.
Median and Levels of Measurement
Can you find a median for each type of data?
Nationality: No
1 3 3 2 1 2 2 1 3 2 1 3 2 3 3 3 3 1 2 2
Football Poll: Yes
1 2 3 4
4 3 4 3
2 4 4 2
1 2 4 4
3 2 3 4
IQ score: Yes
112 132
112 113
112 150
125 114
Weight: Yes
68 56 39
56 44 56
45 56 75
81 67 59
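For the three data sets that do have a median, a sketch with the standard `statistics` module follows. The values are as read from the slide, assuming the fused digits are separate codes. Using `median_low` for the ordinal ranks is a choice made here so that the result is an actual rank rather than an average of two ranks.

```python
from statistics import median, median_low

football_poll = [1, 2, 3, 4, 4, 3, 4, 3, 2, 4, 4, 2, 1, 2, 4, 4, 3, 2, 3, 4]
iq_score = [112, 132, 112, 113, 112, 150, 125, 114]
weight = [68, 56, 39, 56, 44, 56, 45, 56, 75, 81, 67, 59]

# Ordinal data: ranks can be ordered, so a middle rank exists.
poll_median = median_low(football_poll)

# Interval and ratio data: the two central values can be averaged.
iq_median = median(iq_score)
weight_median = median(weight)

print(poll_median)    # 3
print(iq_median)      # 113.5
print(weight_median)  # 56.0
```

The nationality codes are left out on purpose: nominal categories have no rank order, so no value is "in the middle".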
The Mean
Definition: For ungrouped data, the population
mean is the sum of all the population values
divided by the total number of population values.
To compute the population mean, use the
following formula.
$$\mu = \frac{\sum X}{N}$$
where μ is the population mean, Σ (sigma) denotes the sum,
X is an individual value, and N is the population size.
THE SAMPLE MEAN
Definition: For ungrouped data, the
sample mean is the sum of all the
sample values divided by the number of
sample values. To compute the sample
mean, use the following formula.
$$\bar{X} = \frac{\sum X}{n}$$
where X-bar is the sample mean, Σ (sigma) denotes the sum,
X is an individual value, and n is the sample size.
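Both definitions describe the same arithmetic: sum the values and divide by how many there are. Only the set of values changes. The sketch below uses made-up numbers; `arithmetic_mean`, `population`, and `sample` are names introduced here for illustration.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

# Hypothetical population: every value is known, N = 5.
population = [4, 8, 6, 2, 10]

# Hypothetical sample of n = 3 values drawn from that population.
sample = [8, 6, 10]

print(arithmetic_mean(population))  # 6.0  (population mean)
print(arithmetic_mean(sample))      # 8.0  (sample mean)
```

The gap between the two results is an instance of sampling error, which the deck returns to later.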
Characteristics of The Mean
Center of Gravity of a Distribution
[Figure: number line from 1 to 8 with the Mean marked]
How much error do you expect
for each case?
Data set   Mean   Deviation Score
25         31     -6
27         31     -4
29         31     -2
31         31      0
33         31      2
35         31      4
37         31      6
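The deviation scores can be recomputed directly. This sketch assumes the data set on the slide is 25, 27, 29, 31, 33, 35, 37, as the scattered numbers suggest.

```python
# Data set as read from the slide.
data = [25, 27, 29, 31, 33, 35, 37]

mean = sum(data) / len(data)

# Deviation score = score minus mean: the "error" made when the mean
# is used to stand in for each individual score.
deviations = [x - mean for x in data]

print(mean)             # 31.0
print(deviations)       # [-6.0, -4.0, -2.0, 0.0, 2.0, 4.0, 6.0]
print(sum(deviations))  # 0.0
```

The negative and positive deviations cancel exactly, which is why the mean acts as the balance point of the distribution.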
The Mean
[Cartoon: “It’s too hot!” “It’s too cold!” “On average, I feel fine.”]
The Mean of a group of scores is the point
on the number line such that the sum of the
squared differences between the scores
and the mean is smaller than the sum of
the squared differences to any other point.
If you summed the differences without
squaring them, the result would be zero.
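A small numerical check of this least-squares property. It reuses the deviation-score data set from the earlier slide as read here (25 through 37 in steps of 2), and the grid of candidate points is an arbitrary choice for this sketch.

```python
data = [25, 27, 29, 31, 33, 35, 37]
mean = sum(data) / len(data)

def sum_of_squares(point, values):
    """Sum of squared differences between every score and `point`."""
    return sum((x - point) ** 2 for x in values)

# Candidate points 25.0, 25.5, ..., 37.0 on the number line.
candidates = [p / 2 for p in range(50, 75)]
best = min(candidates, key=lambda p: sum_of_squares(p, data))

print(best, sum_of_squares(best, data))  # 31.0 112.0
print(sum(x - mean for x in data))       # 0.0
```

Moving the point away from the mean in either direction increases the sum of squares.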
Mean and Measurement Scales
Every set of interval-level and ratio-level data has a mean.
Nominal data: Nationality (1=American, 2=Asian, 3=Mexican)
1 2 2 3
NO
Ordinal data: Football Poll (1=first, 2=second, 3=third)
1 2 2 3
NO
Interval data: IQ Test
1 2 2 3
YES
Ratio data: Weight
1 2 2 3
YES
All the values are included in
computing the mean.
$$\bar{X} = \frac{\sum X}{n}$$
A set of data has a unique mean, and
the mean is affected by unusually
large or small data values.
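To see how strongly one extreme value pulls the mean, the sketch below compares the mean and the median before and after an outlier is added. All numbers are hypothetical and are not taken from the slide.

```python
from statistics import median

typical = [3, 5, 7, 9]           # hypothetical scores
with_outlier = typical + [96]    # the same scores plus one extreme value

def arithmetic_mean(values):
    return sum(values) / len(values)

# Every value enters the sum, so a single outlier shifts the mean a lot...
print(arithmetic_mean(typical), arithmetic_mean(with_outlier))  # 6.0 24.0

# ...while the median only moves to the next value in rank order.
print(median(typical), median(with_outlier))                    # 6.0 7
```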
The Mean
• Every set of interval-level and ratio-level data has a mean.
• All the values are included in computing the mean.
• A set of data has a unique mean.
• The mean is affected by unusually large or small data values.
• The arithmetic mean is the only measure of central tendency where the sum of the deviations of each value from the mean is zero.
The Relationships between
Measures of Central Tendency
and Shape of a Distribution
Normal Distribution
Symmetric
Unimodal
Mean = Median = Mode
Positively Skewed Distribution
Mode
Median
Mean
Mode < Median < Mean
The median falls closer to the mean than to the mode.
With unimodal curves of moderate asymmetry, the distance from the median to
the mode is approximately twice the distance between the median and
the mean.
Negatively Skewed Distribution
Mode
Median
Mean
Mode > Median > Mean
The median falls closer to the mean than to the mode
Bimodal Distribution
Mean=Median
Mode
Mode
Mode1 < Mean=Median < Mode2
If two averages of a moderately
skewed frequency distribution are
known, the third can be approximated.
The formulae are:
Mode = Mean - 3(Mean - Median)
Mean = [3(Median) - Mode]/ 2
Median = [2(Mean) + Mode]/ 3
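The three formulae are rearrangements of one approximate relationship, so they can be checked against each other. The sketch below uses hypothetical values for the mean and median; the function names are introduced here for illustration. The results are approximations that apply only to moderately skewed unimodal distributions.

```python
def approx_mode(mean, median):
    """Mode = Mean - 3(Mean - Median)."""
    return mean - 3 * (mean - median)

def approx_mean(median, mode):
    """Mean = [3(Median) - Mode] / 2."""
    return (3 * median - mode) / 2

def approx_median(mean, mode):
    """Median = [2(Mean) + Mode] / 3."""
    return (2 * mean + mode) / 3

# Hypothetical positively skewed distribution: mean above median.
mean, median = 50.0, 48.0
mode = approx_mode(mean, median)

print(mode)                       # 44.0
print(approx_mean(median, mode))  # 50.0
print(approx_median(mean, mode))  # 48.0
```

The approximated mode (44) lies below the median (48), which lies below the mean (50), matching the ordering Mode < Median < Mean for a positively skewed distribution.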
Measures of Central Tendency
as Inferential Statistics
[Diagram: Parameters (Mean, Median, Mode) and Statistics (Mean, Median, Mode) are linked by Sampling; the Difference Between Parameter and Statistics is the Sampling Error]
As inferential measures, the Mean will be used much
more frequently than the Median or Mode.
Why?
On the average, there is less sampling error
associated with the Mean than with the Median,
and the Mode tends to have more sampling error
than the Median. In other words, the difference
between the sample mean (X-bar) and the population
mean tends to be less than the corresponding
difference between the sample median (Md) and the
population median (Mdpop).
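The claim can be illustrated with a small simulation. This is a sketch under stated assumptions, not a proof. It assumes a normally distributed population, where the population mean and median coincide; the population values, sample size, number of samples, and seed are arbitrary choices made here. For heavy-tailed populations the comparison can come out the other way.

```python
import random
from statistics import mean, median, pstdev

random.seed(0)  # fixed seed so the run is repeatable

POP_MEAN, POP_SD = 100.0, 15.0      # assumed normal population
SAMPLE_SIZE, N_SAMPLES = 25, 2000   # arbitrary simulation settings

sample_means, sample_medians = [], []
for _ in range(N_SAMPLES):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    sample_means.append(mean(sample))
    sample_medians.append(median(sample))

# The spread of a statistic across repeated samples measures its sampling error.
spread_of_means = pstdev(sample_means)
spread_of_medians = pstdev(sample_medians)

print(round(spread_of_means, 2), round(spread_of_medians, 2))
print(spread_of_means < spread_of_medians)
```

Under these assumptions the sample means cluster more tightly around the population value than the sample medians do.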
SUMMARY
There are three common measures of central tendency. The mean is the
most widely used and the most precise for inferential purposes and is the
foundation for statistical concepts that will be introduced in subsequent
classes. The mean is the ratio of the sum of the observations to the number of
observations. The value of the mean is influenced by the value of every score
in a distribution. Consequently, in skewed distributions it is drawn toward
the elongated tail more than is the median or mode.
The median is the 50th percentile of a distribution. It is the point in a
distribution from which the sum of the absolute differences of all scores is
at a minimum. In perfectly symmetrical distributions the median and mean
have the same value. When the mean and median differ greatly, the median is
usually the most meaningful measure of central tendency for descriptive
purposes.
The mode, unlike the mean and median, has descriptive meaning even with
nominal scales of measurement. The mode is the most frequently occurring
observation. When the median or mean is applicable, the mode is the least
useful measure of central tendency. In a symmetrical unimodal distribution the
mode, median, and mean have the same value.