Elementary Statistics
Central Tendency
John M Dusel
jdusel@whittier.edu
Whittier College
Fall 2015
1. Measures of Central Tendency
   The Mean
   Characteristics of the mean
   The Median
   The Mode
2. Finding Central Tendency of Simple Frequency Distributions
   Mean
   Median
   Mode
3. When to use the Mean, Median, or Mode
   Scales of Measurement
   Skewed Distributions
Measures of Central Tendency
The distribution of a random variable X comprises information about the
values of X and their frequencies.
Frequency distributions and graphs show the form of a distribution.
Definition (Measure of Central Tendency)
A descriptive statistic that indicates a typical or representative score of a distribution; a value that, in some sense, summarizes the distribution.
If the whole population is not available, then a parameter cannot be calculated.
Researchers therefore use statistics based on sample data.
Good statistics mimic (estimate) the corresponding parameters.
Sample measures of central tendency are good substitutes for their corresponding population parameters.
The Mean
Definition (Mean of a variable X)
The mean of a sample of X's values is denoted \(\bar{X}\).
Calculate \(\bar{X}\) using the arithmetic average.
Different samples ⟹ different values of \(\bar{X}\) (uncertainty).
The mean of the population of X measures is denoted \(\mu\).
Need all population scores to calculate.
No uncertainty.
Formula for the sample mean
\[ \bar{X} = \frac{\sum X}{N} \]
\(\sum\): summation symbol (Greek capital letter sigma, an "S" for "sum")
X: a score ⟹ \(\sum X\) means add all the X's
N: sample size
Example
A college student goes to the Student Center every day and spends money. How much money should (s)he budget per month to pay for this habit?
Data from two weeks: X = dollars spent on a given day.

Day   1     2     3     4     5     6     7     8     9     10    11    12    13    14
X ($) 3.25  2.50  4.47  0.00  3.81  1.75  0.00  0.00  6.78  2.40  0.00  0.00  8.50  4.20

\(\bar{X} = \frac{\sum X}{N} = \frac{\$37.66}{14} = \$2.69\). Interpretation in terms of sample days.
Therefore \(\mu \approx \$2.69\) (Exercise: Interpret. Hint: Population = ?)
Monthly estimated expenses = \(30 \cdot \bar{X}\) = 30 days · $2.69/day = $80.70.
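The arithmetic above can be checked with a short Python sketch. The variable names are illustrative choices; the values are the 14 daily amounts from the table, and the 30-day month is the same assumption the slide makes.

```python
# Daily Student Center spending (dollars) for days 1-14, from the table above
spent = [3.25, 2.50, 4.47, 0.00, 3.81, 1.75, 0.00,
         0.00, 6.78, 2.40, 0.00, 0.00, 8.50, 4.20]

n = len(spent)        # N = sample size
total = sum(spent)    # sum of X
mean = total / n      # sample mean X-bar
monthly = 30 * mean   # budget for a 30-day month (slide's assumption)

print(f"N = {n}, sum = ${total:.2f}, mean = ${mean:.2f}/day")
# N = 14, sum = $37.66, mean = $2.69/day
print(f"Estimated monthly budget: ${monthly:.2f}")
# Estimated monthly budget: $80.70
```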
Special properties of the mean
1. \(\bar{X}\) is the "balance point" of X's distribution: subtract \(\bar{X}\) from each score in X's distribution, and the sum of these deviations always vanishes:
\[ \sum (X - \bar{X}) = 0 \]
2. \(\bar{X}\) is a "least squares" minimizer:
\[ \sum (X - \bar{X})^2 < \sum (X - Y)^2 \quad \text{for any number } Y \neq \bar{X} \]
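A quick numerical check of both properties on the Student Center data, offered as an illustration rather than a proof. The candidate values of Y below are arbitrary choices (2.45 happens to be the median); any Y other than the mean should give a larger sum of squares.

```python
# Same 14 daily amounts as in the Student Center example
spent = [3.25, 2.50, 4.47, 0.00, 3.81, 1.75, 0.00,
         0.00, 6.78, 2.40, 0.00, 0.00, 8.50, 4.20]
mean = sum(spent) / len(spent)

# Property 1: deviations from the mean sum to zero
# (exactly in theory; here only up to floating-point rounding)
deviation_sum = sum(x - mean for x in spent)

def sum_of_squares(y):
    """Sum of squared deviations of the scores from an arbitrary number y."""
    return sum((x - y) ** 2 for x in spent)

# Property 2: other candidate centers give a larger sum of squares
candidates = [0.0, 2.45, 2.60, 2.80, 5.0]
ss_mean = sum_of_squares(mean)
print(f"sum of deviations = {deviation_sum:.1e}")
for y in candidates:
    print(f"Y = {y:.2f}: SS = {sum_of_squares(y):.4f} (mean gives {ss_mean:.4f})")
```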
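Derivations of the mean's special properties
1. Property \(\sum (X - \bar{X}) = 0\): observe that
\[ \sum (X - \bar{X}) = \sum X - N\bar{X} = \sum X - N \cdot \frac{\sum X}{N} = 0 \]
2. Property \(\sum (X - \bar{X})^2 < \sum (X - Y)^2\) for any number \(Y \neq \bar{X}\):
\(\sum (X - Y)^2\) is quadratic in Y, specifically
\[ \sum (X - Y)^2 = \sum X^2 - 2Y \sum X + NY^2 \]
Calculus: the minimizing value of Y is the solution to \(\frac{d}{dY} \sum (X - Y)^2 = 0\):
\[ -2 \sum X + 2NY = 0 \;\Longrightarrow\; NY = \sum X \;\Longrightarrow\; Y = \frac{\sum X}{N} = \bar{X} \]
Because the coefficient of \(Y^2\) is \(N > 0\), this parabola opens upward, so \(Y = \bar{X}\) is its unique minimum and every other Y gives a strictly larger sum.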
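For readers without calculus, the same inequality also follows from property 1 alone. A short sketch of this alternative (not the derivation used on the slide): add and subtract \(\bar{X}\) inside each squared term and expand.

```latex
\begin{aligned}
\sum (X - Y)^2 &= \sum \big[(X - \bar{X}) + (\bar{X} - Y)\big]^2 \\
  &= \sum (X - \bar{X})^2 + 2(\bar{X} - Y)\sum (X - \bar{X}) + N(\bar{X} - Y)^2 \\
  &= \sum (X - \bar{X})^2 + N(\bar{X} - Y)^2
\end{aligned}
```

The middle term vanishes because \(\sum (X - \bar{X}) = 0\), and \(N(\bar{X} - Y)^2 > 0\) whenever \(Y \neq \bar{X}\), so the mean gives the strictly smallest sum of squared deviations.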
The Median
Definition (Median of a random variable)
The point that divides a distribution of scores in half.
The median is located at position \(\frac{N+1}{2}\) in the ordered arrangement.
(N even: average the scores at positions \(\frac{N}{2}\) and \(\frac{N}{2} + 1\).)
Example (Student Center expenditure data)
Day   13    9     3     14    5     1     2     10    6     4     7     8     11    12
X ($) 8.50  6.78  4.47  4.20  3.81  3.25  2.50  2.40  1.75  0.00  0.00  0.00  0.00  0.00

8.50, 6.78, 4.47, 4.20, 3.81, 3.25, 2.50 | 2.40, 1.75, 0.00, 0.00, 0.00, 0.00, 0.00
(7 scores on each side of the divide)
N = 14 ⟹ average the 7th and 8th scores: Median = \(\frac{\$2.50 + \$2.40}{2} = \$2.45\).
Example (Student Center expenditure data)
One $0.00 dropped: N = 13 ⟹ use the 7th score. Median = $2.50.
8.50, 6.78, 4.47, 4.20, 3.81, 3.25, [2.50], 2.40, 1.75, 0.00, 0.00, 0.00, 0.00
(6 scores on each side of the median)
Score $8.50 dropped: N = 13 ⟹ use the 7th score. Median = $2.40.
6.78, 4.47, 4.20, 3.81, 3.25, 2.50, [2.40], 1.75, 0.00, 0.00, 0.00, 0.00, 0.00
(6 scores on each side of the median)
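The (N+1)/2 position rule translates directly into code. A minimal Python sketch (the function name `median` is an illustrative choice; Python's own `statistics.median` gives the same results). It sorts in ascending order, while the slides list scores from high to low; the median is the same either way.

```python
def median(scores):
    """Median using the (N+1)/2 position rule on the ordered scores."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd N: position (N+1)/2 (1-based) is index n // 2 (0-based)
        return ordered[mid]
    # Even N: average positions N/2 and N/2 + 1 (0-based indices mid-1 and mid)
    return (ordered[mid - 1] + ordered[mid]) / 2

spent = [3.25, 2.50, 4.47, 0.00, 3.81, 1.75, 0.00,
         0.00, 6.78, 2.40, 0.00, 0.00, 8.50, 4.20]

one_zero_dropped = list(spent)
one_zero_dropped.remove(0.00)   # removes only the first 0.00
top_dropped = list(spent)
top_dropped.remove(8.50)

print(f"{median(spent):.2f}")             # 2.45
print(f"{median(one_zero_dropped):.2f}")  # 2.50
print(f"{median(top_dropped):.2f}")       # 2.40
```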
The Mode
Definition (Mode)
The most frequently occurring score. Its relative frequency is (frequency of the mode) / N.
Example (Student Center expenditure data)
Day   13    9     3     14    5     1     2     10    6     4     7     8     11    12
X ($) 8.50  6.78  4.47  4.20  3.81  3.25  2.50  2.40  1.75  0.00  0.00  0.00  0.00  0.00
Mode = $0.00 occurs 5/14 = 36% of the time.
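Counting frequencies is exactly what `collections.Counter` does, so the mode takes only a few lines. In this sketch, `modes` is a hypothetical helper that returns every score tied for the highest frequency, because a distribution can have more than one mode (the standard library's `statistics.multimode` behaves similarly).

```python
from collections import Counter

def modes(scores):
    """Return (sorted list of modal scores, their frequency); ties give several modes."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(s for s, c in counts.items() if c == top), top

spent = [3.25, 2.50, 4.47, 0.00, 3.81, 1.75, 0.00,
         0.00, 6.78, 2.40, 0.00, 0.00, 8.50, 4.20]
modal, freq = modes(spent)
print(modal, freq, f"{freq / len(spent):.0%}")  # [0.0] 5 36%
```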
Definition (Measures of Central Tendency)
Mean: \(\mu\) or \(\bar{X} = \frac{\sum X}{N}\).
Median: located at position \(\frac{N+1}{2}\) in the ordered arrangement (N even: average the scores at positions \(\frac{N}{2}\) and \(\frac{N}{2} + 1\)).
Mode: the most frequently occurring score. Relative frequency = (frequency of the mode) / N.
Finding Central Tendency of Simple Frequency Distributions
Mean
Mean of a variable X is \(\mu\) or \(\bar{X} = \frac{\sum X}{N}\) (N = sample size, X = scores).
Frequency distribution: \(\bar{X} = \frac{\sum fX}{N}\), where f = frequency of a score X.
Example (Raw test scores of a group of N = 24 college students)
X      97  94  93  89  86  85  83  82  79  78  77  75  71  70  68  66  60  57  50
f       1   2   1   1   2   2   1   3   1   1   1   1   1   1   1   1   1   1   1
f·X    97 188  93  89 172 170  83 246  79  78  77  75  71  70  68  66  60  57  50
\(\sum fX = 1889 = 97 + 188 + 93 + 89 + 172 + 170 + 83 + 246 + 79 + \cdots + 57 + 50\)
⟹ \(\bar{X} = \frac{1889}{24} = 78.71\)
Population = students in this class: µ = 78.71.
Population = all college students: µ unknown, but µ ≈ 78.71.
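The frequency-table formula avoids writing out all 24 scores: multiply each score by its frequency, add, and divide by N = Σf. A minimal Python sketch, storing the table as hypothetical (score, frequency) pairs:

```python
# Simple frequency distribution of the N = 24 test scores: (score X, frequency f)
table = [(97, 1), (94, 2), (93, 1), (89, 1), (86, 2), (85, 2), (83, 1),
         (82, 3), (79, 1), (78, 1), (77, 1), (75, 1), (71, 1), (70, 1),
         (68, 1), (66, 1), (60, 1), (57, 1), (50, 1)]

n = sum(f for _, f in table)           # N is the sum of the frequencies
sum_fx = sum(f * x for x, f in table)  # sum of f*X
mean = sum_fx / n
print(n, sum_fx, f"{mean:.2f}")        # 24 1889 78.71
```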
Median
The median is located at position \(\frac{N+1}{2}\).
Example (Raw test scores of a group of N = 24 college students)
X      97  94  93  89  86  85  83  82  79  78  77  75  71  70  68  66  60  57  50
f       1   2   1   1   2   2   1   3   1   1   1   1   1   1   1   1   1   1   1
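The median is located at position \(\frac{N+1}{2} = \frac{25}{2} = 12.5\), i.e., halfway between positions 12 and 13, so average the scores at positions 12 and 13.
Count frequencies until positions 12 and 13 are reached:
97, 94, 94, 93, 89, 86, 86, 85, 85, 83, 82, 82 (#12), 82 (#13)
Median = \(\frac{82 + 82}{2} = 82\).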
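The counting procedure can be sketched in Python by accumulating frequencies until the middle position(s) are reached. `median_from_freq` is a hypothetical helper; it assumes the table is sorted by score (high to low, as here, or low to high).

```python
def median_from_freq(table):
    """Median of a simple frequency distribution via the (N+1)/2 position rule.

    `table` is a list of (score, frequency) pairs sorted by score (either
    direction); frequencies are accumulated instead of listing every score.
    """
    n = sum(f for _, f in table)
    if n == 0:
        raise ValueError("empty distribution")
    # 1-based middle positions: equal when N is odd, consecutive when N is even
    lower, upper = (n + 1) // 2, n // 2 + 1

    def score_at(position):
        cumulative = 0
        for x, f in table:
            cumulative += f
            if cumulative >= position:
                return x
        raise ValueError("position beyond N")

    return (score_at(lower) + score_at(upper)) / 2

table = [(97, 1), (94, 2), (93, 1), (89, 1), (86, 2), (85, 2), (83, 1),
         (82, 3), (79, 1), (78, 1), (77, 1), (75, 1), (71, 1), (70, 1),
         (68, 1), (66, 1), (60, 1), (57, 1), (50, 1)]
print(median_from_freq(table))  # 82.0
```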
Mode
Modal score: the score with the highest frequency.
Example (Raw test scores of a group of N = 24 college students)
X      97  94  93  89  86  85  83  82  79  78  77  75  71  70  68  66  60  57  50
f       1   2   1   1   2   2   1   3   1   1   1   1   1   1   1   1   1   1   1
Mode = 82.
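With a frequency table already built, finding the mode is a matter of scanning the f column for its largest value. A minimal Python sketch using the same hypothetical (score, frequency) layout; it keeps every score tied for the top frequency in case the distribution is multimodal.

```python
table = [(97, 1), (94, 2), (93, 1), (89, 1), (86, 2), (85, 2), (83, 1),
         (82, 3), (79, 1), (78, 1), (77, 1), (75, 1), (71, 1), (70, 1),
         (68, 1), (66, 1), (60, 1), (57, 1), (50, 1)]

top = max(f for _, f in table)                   # highest frequency
modal_scores = [x for x, f in table if f == top] # every score with that frequency
print(modal_scores, top)                         # [82] 3
```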
When to use the Mean, Median, or Mode
Scales of Measurement
The median requires scores that can be ordered, and the mean requires quantitative relationships between the scores.
Nominal: Mode.
Meaningful question: Most frequently occurring area code.
Meaningless question: What is the mean or median area code?
Ordinal: Mode or median.
Meaningful questions: Most frequently occurring class standing
(freshman) or middle class standing (sophomore) at Whittier.
Meaningless question: What is the mean class standing?
Interval: Mode or median or mean.
Ratio: Mode or median or mean.
Skewed Distributions
Mean is affected by extreme values in a distribution. Median less so.
Example (Florida housing development)
X = elevation of lot (ft above sea level).
[Figure 3.1: Elevation of Swampy Acres (20 lots on a hill; the other 80 lots low-lying)]
Positively skewed! Flood water level 25 feet ⟹ 80 lots underwater!
… them by explaining that the average elevation of the lots is 78.5 feet and that the water level has never exceeded 25 feet in that area. On the average, the developer has told the truth, but this average truth is misleading. Look at the actual lay of the land in Figure 3.1 and examine the frequency distribution in Table 3.4.
The mean elevation, as the developer said, is 78.5 feet; however, only 20 lots, all on a hill, are out of the flood zone. The other 80 lots are, on the average, under water. Using the mean is misleading. The median is a much better measure of central tendency here because it is unaffected by the few extreme lots on the hill. The median elevation is 12.5 feet, well below the high-water mark. (Because our interest is in only this one development, the lot elevations constitute a population of data. There is no interest in generalizing from these data to some larger group.)
In summary, use the mean if it is appropriate. To follow this advice you must recognize data for which the mean is not appropriate. Perhaps Table 3.5 will help.
Example (Florida housing development)
X = elevation of lot (ft above sea level).
Elevation X (ft)   Midpoint (ft)   f     fX
348–352            350             20    7000
13–17              15              30    450
8–12               10              30    300
3–7                5               20    100
N = 100, \(\sum fX = 7850\)
Positively skewed! Flood water level 25 feet ⟹ 80 lots underwater!
\(\mu = 7850/100 = 78.5\) ft looks safe (misleading).
Median = 12.5 ft: a more accurate description of reality.
Mean not recommended for skewed distributions!
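Both numbers can be recomputed from the grouped table. This Python sketch assumes the classes are 3–7, 8–12, 13–17 and 348–352 ft with real limits 0.5 ft beyond each stated boundary, so the class midpoints are 5, 10, 15 and 350 ft (consistent with the fX column). For the median it uses linear interpolation within the class containing the N/2 point, one common grouped-data method chosen here for illustration; the helper names are hypothetical.

```python
# (lower real limit, upper real limit, frequency), sorted from low to high
classes = [
    (2.5, 7.5, 20),      # 3-7 ft
    (7.5, 12.5, 30),     # 8-12 ft
    (12.5, 17.5, 30),    # 13-17 ft
    (347.5, 352.5, 20),  # 348-352 ft (the hill)
]

def grouped_mean(classes):
    """Mean of grouped data using class midpoints: sum(f * midpoint) / N."""
    n = sum(f for _, _, f in classes)
    total = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return total / n

def grouped_median(classes):
    """Median by linear interpolation within the class containing the N/2 point."""
    n = sum(f for _, _, f in classes)
    half = n / 2
    cumulative = 0
    for lo, hi, f in classes:
        if cumulative + f >= half:
            return lo + (half - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("empty distribution")

print(grouped_mean(classes))    # 78.5
print(grouped_median(classes))  # 12.5
```

The 20 hill lots pull the mean far above the flood line, while the median stays at the boundary between the 8–12 and 13–17 classes.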
… distribution to be positively skewed. If the mean is smaller than the median, expect negative skew. Figure 3.2 shows the relationship of the mean to the median for a positively skewed and a negatively skewed distribution of continuous data.
The size of the difference between the mean and median usually indicates the degree of skew. The greater the difference, the greater the skew. To illustrate, I added an expenditure of $100.00 to the slightly skewed distribution in Table 3.1. The original mean was $2.69 and the median was $2.45. Adding a score of $100.00 produces a much more skewed distribution with a mean of $9.18 and a median of $2.50. In the original distribution the difference between the mean and median was $0.24; in the …
[Figure 3.2: The effect of skewness on the relative position of the mean and median for continuous data. Panels: "Mean smaller than median: skew is negative" and "Mean larger than median: skew is positive". Axes: Frequency vs. Scores (Low to High).]
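To see how one extreme score pulls the mean but barely moves the median, the $100.00 illustration can be recomputed with Python's standard `statistics` module. The added $100.00 day is the excerpt's hypothetical score; the other 14 values are the Student Center data.

```python
from statistics import mean, median

# Student Center data (14 days) and the same data with one $100.00 day added
spent = [3.25, 2.50, 4.47, 0.00, 3.81, 1.75, 0.00,
         0.00, 6.78, 2.40, 0.00, 0.00, 8.50, 4.20]
with_extreme = spent + [100.00]

for label, data in [("original", spent), ("with $100.00 added", with_extreme)]:
    m, md = mean(data), median(data)
    print(f"{label}: N = {len(data)}, mean = ${m:.2f}, "
          f"median = ${md:.2f}, difference = ${m - md:.2f}")
```

The first line reports a mean of $2.69 and a median of $2.45 (difference $0.24). With the extreme day included (N = 15), the mean jumps to $137.66/15 ≈ $9.18 while the median only moves to $2.50, a difference of about $6.68.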