Topic 2 Measures of Central Tendency

ECO 72 - INTRODUCTION TO ECONOMIC STATISTICS
These slides are copyright © 2003 by Tavis Barr. This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, v1.0 or later (the latest version is presently available at http://www.opencontent.org/openpub/).
Measures of Central Tendency
This chapter looks at three different concepts of how we describe a “typical” element of a data set.
● Mean
● Median
● Mode
There is no one “best” concept for all cases; we will discuss the advantages and disadvantages of each.
Mean
● The mean is what is most commonly called the average.
● If a population is finite, of size N, we can write the population mean as

$$\mu = \frac{1}{N}\sum_{i=1}^{N} X_i = \frac{X_1 + X_2 + \cdots + X_N}{N}$$

Example:
● There are three countries in North America (N = 3). Their land areas are:

Country     Land area (km²)
Canada       9,093,507
Mexico       1,923,040
U.S.         9,161,923
Total       20,178,470

● Average land area: 20,178,470 / 3 = 6,726,157 km² (rounded)
Source: 2005 CIA World Factbook
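As a quick arithmetic check, here is a minimal Python sketch that computes the population mean from the land areas listed above. The dictionary and variable names are illustrative, not from the slides.

```python
# Land areas in km² for the three North American countries (2005 CIA World Factbook).
land_area_km2 = {
    "Canada": 9_093_507,
    "Mexico": 1_923_040,
    "U.S.": 9_161_923,
}

N = len(land_area_km2)               # population size, N = 3
total = sum(land_area_km2.values())  # X_1 + X_2 + X_3
mu = total / N                       # population mean

print(f"Total: {total:,} km²")  # Total: 20,178,470 km²
print(f"Mean:  {mu:,.0f} km²")  # Mean:  6,726,157 km²
```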
Mean – sample mean
● For a sample of size n, we can write the sample mean as

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
Example:
● Ten people are asked how many hours of TV they watched last night.
● Their responses are 1, 2, 0.5, 0, 4, 0, 2, 1.5, 0, 3.
● Mean: (1 + 2 + 0.5 + 0 + 4 + 0 + 2 + 1.5 + 0 + 3) / 10 = 14 / 10 = 1.4
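The same calculation can be sketched in Python, both by hand and with the standard library's `statistics.mean`. This is a minimal illustration; the variable names are our own.

```python
from statistics import mean

# Hours of TV watched last night by the ten respondents.
hours = [1, 2, 0.5, 0, 4, 0, 2, 1.5, 0, 3]

n = len(hours)            # sample size, n = 10
x_bar = sum(hours) / n    # sample mean computed from the formula

print(x_bar)        # 1.4
print(mean(hours))  # 1.4, same result from the standard library
```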
Advantages of the sample mean
1. It takes all values in the sample into account.
2. It is unique: Each sample and population has only one mean.
3. The deviations of the observations from the mean always sum to zero, so the mean acts as a “balancing point.”
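The balancing-point property follows in one line from the definition of the sample mean, writing $\bar{X}$ for the sample mean and using $\sum_{i=1}^{n} X_i = n\bar{X}$:

$$\sum_{i=1}^{n}\left(X_i - \bar{X}\right) = \sum_{i=1}^{n} X_i - n\bar{X} = n\bar{X} - n\bar{X} = 0$$

For the TV data, the deviations from 1.4 are -0.4, 0.6, -0.9, -1.4, 2.6, -1.4, 0.6, 0.1, -1.4, 1.6, which sum to 0.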
Disadvantages of the Mean
1. It only exists for quantitative data.
● What is the mean of good, fair, and poor?
● Of red, yellow, and blue?
2. It can be strongly affected by outliers.
● Example: In Whoville, there are 10 people who earn $10,000 a year and one person who earns $1,000,000.
● What is the mean? Is it a typical income?
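A minimal Python sketch of the Whoville question, using the incomes stated above; the counting line is our own addition to show how untypical the mean is here.

```python
# Whoville incomes: ten people at $10,000 and one at $1,000,000.
incomes = [10_000] * 10 + [1_000_000]

mean_income = sum(incomes) / len(incomes)
# Count how many residents earn less than the mean.
below = sum(1 for x in incomes if x < mean_income)

print(f"Mean income: ${mean_income:,.0f}")                      # Mean income: $100,000
print(f"{below} of {len(incomes)} residents earn less than the mean")  # 10 of 11 ...
```

The single outlier pulls the mean to ten times what almost everyone in Whoville actually earns.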
Weighted Mean
● Weighted means are used when we want to place more importance on some observations than on others.
● They require a weighting variable that indicates how much importance to place on each observation.
● We denote the original variable by X_i and the weighting variable by W_i.
Weighted Mean – Formula
Formula for the weighted mean:

$$\frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i} = \frac{W_1 X_1 + W_2 X_2 + \cdots + W_n X_n}{W_1 + W_2 + \cdots + W_n}$$
Weighted Mean – Example
Life expectancy in a group of northern African countries:

Country     Life Expectancy (years)
Algeria     68
Egypt       66
Libya       73
Morocco     67
Nigeria     41
Sudan       55
Tunisia     72
Sum         442
Mean        442/7 = 63.14
Weighted Mean – Example (cont'd)
Life expectancy in a group of northern African countries:

Country     Life Expectancy (years)   Population (millions)   LE × Population
Algeria     68                        31                      2,108
Egypt       66                        70                      4,620
Libya       73                        5                       365
Morocco     67                        29                      1,943
Nigeria     41                        126                     5,166
Sudan       55                        31                      1,705
Tunisia     72                        10                      720
Sum         442                       302                     16,627
Mean        442/7 = 63.14

Weighted mean: 16,627/302 = 55.06
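The table above can be reproduced with a small Python sketch. The `weighted_mean` helper is a hypothetical function written for this illustration (it just implements the formula ΣW_iX_i / ΣW_i); the data are the life expectancy and population figures from the table.

```python
# Life expectancy (years) and population (millions), from the table above.
life_expectancy = {"Algeria": 68, "Egypt": 66, "Libya": 73, "Morocco": 67,
                   "Nigeria": 41, "Sudan": 55, "Tunisia": 72}
population_mil = {"Algeria": 31, "Egypt": 70, "Libya": 5, "Morocco": 29,
                  "Nigeria": 126, "Sudan": 31, "Tunisia": 10}


def weighted_mean(values, weights):
    """Hypothetical helper: sum of W_i * X_i divided by the sum of W_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


countries = list(life_expectancy)
x = [life_expectancy[c] for c in countries]  # X_i
w = [population_mil[c] for c in countries]   # W_i

unweighted = sum(x) / len(x)
weighted = weighted_mean(x, w)

print(f"Unweighted mean: {unweighted:.2f}")  # Unweighted mean: 63.14
print(f"Weighted mean:   {weighted:.2f}")    # Weighted mean:   55.06
```

Giving every country the same weight (for example, all W_i = 1) turns the weighted mean back into the ordinary mean. Weighting by population pulls the result toward Nigeria, which has by far the largest population and the lowest life expectancy.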
Median
● The median is the midpoint of the data when they are sorted in order (lowest to highest or highest to lowest).
● If there is an even number of observations, take the average of the two middle values.
● Example: Hours of television watched, sorted:
0, 0, 0, 0.5, 1, 1.5, 2, 2, 3, 4
● Median: (1 + 1.5)/2 = 1.25
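The sorting rule above can be written as a short Python function. This is a minimal sketch for illustration (the standard library's `statistics.median` does the same job).

```python
def median(data):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    s = sorted(data)
    n = len(s)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                   # odd n: the single middle value
    return (s[mid - 1] + s[mid]) / 2    # even n: average the two middle values


hours = [1, 2, 0.5, 0, 4, 0, 2, 1.5, 0, 3]
print(median(hours))  # 1.25
```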
Advantages of Median
1. Works on ordered (ordinal) data as well as quantitative data.
● Example: 20 opinions of Hillwood Cafe
Excellent: 3
Good: 6
Fair: 7
Poor: 4
● Pick the midpoint from: Poor, Poor, Poor, Poor, Fair, Fair, Fair, Fair, Fair, Fair, Fair, Good, Good, Good, Good, Good, Good, Excellent, Excellent, Excellent.
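With ordinal data we cannot average two categories, so one option is to look at both middle responses. The Python sketch below assumes we encode the ratings as ranks 0 to 3 (an illustrative choice, not from the slides) and uses `statistics.median_low` and `statistics.median_high`, which return actual data values rather than an average.

```python
from statistics import median_high, median_low

# Ordinal scale from worst to best; a rating's rank is its position in this list.
scale = ["Poor", "Fair", "Good", "Excellent"]
counts = {"Excellent": 3, "Good": 6, "Fair": 7, "Poor": 4}

# Expand the frequency table into one rank per response (20 in total), sorted.
ranks = sorted(scale.index(label) for label, k in counts.items() for _ in range(k))

# With n = 20, the two midpoints are the 10th and 11th responses.
low = scale[median_low(ranks)]
high = scale[median_high(ranks)]
print(low, high)  # Fair Fair
```

Both middle responses are “Fair,” so the median opinion is unambiguous here. If they had differed, the median would lie between two categories.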
Advantages of Median (cont'd)
2. The median is unaffected by how extreme the outliers are:
● There are 11 people in Whoville; ten make $10,000 per year and one makes $1 million per year. What is the median income?
3. The median is unique: a sample has only one.
Disadvantage of Median
● Disadvantage: It is not affected by changes in the data away from the center.
● Example: In Whoville, what would happen to the median income if the millionaire suddenly started making only $25,000?
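Both Whoville questions can be checked with a short Python sketch using the standard `statistics` module. It contrasts the mean and the median before and after the millionaire's pay cut.

```python
from statistics import mean, median

# Whoville incomes before and after the millionaire's pay cut.
before = [10_000] * 10 + [1_000_000]
after = [10_000] * 10 + [25_000]

for label, incomes in [("Millionaire", before), ("After pay cut", after)]:
    print(f"{label}: mean = ${mean(incomes):,.0f}, median = ${median(incomes):,.0f}")
```

The median stays at $10,000 in both cases, which makes it robust to the outlier but also blind to a large change in the top income. The mean drops from $100,000 to about $11,364.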
Mode
● Asks which value is observed most often.
● Example: 20 opinions of Hillwood Cafe
Excellent: 3
Good: 6
Fair: 7
Poor: 4
● Here, “Fair” is the most common response.
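A minimal Python sketch of finding the mode from raw responses, using `collections.Counter` to tally how often each answer appears.

```python
from collections import Counter

# The 20 Hillwood Cafe responses, one entry per person.
responses = ["Excellent"] * 3 + ["Good"] * 6 + ["Fair"] * 7 + ["Poor"] * 4

counts = Counter(responses)
modal_response, freq = counts.most_common(1)[0]  # most frequent value and its count
print(modal_response, freq)  # Fair 7
```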
Mode – Advantage
● Advantage: Works on categorical data.
● Example: Ethnic groups in Ethiopia (millions)
Amharic/Tigray   22.6
Oromo            28.2
Shankella         4.2
Sidamo            6.3
Somali            4.2
Other             4.9
Source: 2005 CIA World Factbook
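When the data are already summarized as a frequency table, as here, the mode is simply the category with the largest count. A minimal Python sketch using the figures above:

```python
# Ethnic groups in Ethiopia, in millions (figures from the table above).
groups = {
    "Amharic/Tigray": 22.6,
    "Oromo": 28.2,
    "Shankella": 4.2,
    "Sidamo": 6.3,
    "Somali": 4.2,
    "Other": 4.9,
}

# The modal category is the key with the largest value.
modal_group = max(groups, key=groups.get)
print(modal_group)  # Oromo
```

Note that `max` silently returns only the first category if two are tied, so it hides the case where the mode is not unique.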
Disadvantages of Mode
● May not be unique. Consider the following sample of 10 people:

Favorite Flavor of Ice Cream   Number of People
Vanilla                        1
Chocolate                      4
Strawberry                     4
Coffee                         1
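Python's `statistics.multimode` (available in Python 3.8 and later) returns every most-common value, which makes the tie visible. A minimal sketch:

```python
from statistics import multimode

# One entry per person in the sample of 10.
flavors = ["Vanilla"] * 1 + ["Chocolate"] * 4 + ["Strawberry"] * 4 + ["Coffee"] * 1

print(multimode(flavors))  # ['Chocolate', 'Strawberry']
```

By contrast, `statistics.mode` in Python 3.8 and later returns only the first mode it encounters (“Chocolate”), which hides the tie.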
Disadvantages of Mode (cont'd)
● May not even exist in a meaningful way for continuous data. Consider the life expectancy data:

Country     Life Expectancy (years)
Algeria     68
Egypt       66
Libya       73
Morocco     67
Nigeria     41
Sudan       55
Tunisia     72

● One could say that every value is a mode, or that none is.
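A minimal Python sketch of the problem, again assuming Python 3.8 or later for `statistics.multimode`: since each life expectancy appears exactly once, every value ties for most common.

```python
from statistics import multimode

life_expectancy = [68, 66, 73, 67, 41, 55, 72]

# Each value occurs once, so all seven are returned as modes.
print(multimode(life_expectancy))  # [68, 66, 73, 67, 41, 55, 72]
```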
Disadvantages of Mode (cont'd)
● In ordered or quantitative data, the mode may not lie near the center of the data at all. Consider our answers about how many hours of television people watched last night:
0, 0, 0, 0.5, 1, 1.5, 2, 2, 3, 4
● The modal response (0) is not a typical one.
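Putting the three measures side by side for the TV data makes the point concrete. A minimal sketch with Python's `statistics` module:

```python
from statistics import mean, median, mode

hours = [1, 2, 0.5, 0, 4, 0, 2, 1.5, 0, 3]

# Mode, median, and mean of the same sample.
print(mode(hours), median(hours), mean(hours))  # 0 1.25 1.4
```

The mode sits at the lowest value in the sample, while the median and mean both fall near the middle.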
Geometric Mean
● Used when looking at growth rates, for example economic growth, interest rates, or population growth.
● Asks what growth rate, if it were constant each year, would get you from the starting value to the ending value.
● We won't use it in this class, but keep it in mind if you're working with time-series data such as financial data.
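To show what “constant growth rate” means, here is a minimal Python sketch (Python 3.8 or later, for `statistics.geometric_mean` and `math.prod`). The three annual growth rates are hypothetical values chosen for illustration, not data from the slides. The average rate is the geometric mean of the growth factors (1 + r) minus 1.

```python
from math import prod
from statistics import geometric_mean

# Hypothetical annual growth rates: +10%, -5%, +3% (illustrative only).
rates = [0.10, -0.05, 0.03]
factors = [1 + r for r in rates]

# Constant annual rate that produces the same overall growth.
g = geometric_mean(factors) - 1

start = 100.0
end = start * prod(factors)              # actual ending value
end_constant = start * (1 + g) ** len(rates)  # ending value at the constant rate

print(f"Average growth rate: {g:.2%}")
print(f"{end_constant:.4f} vs {end:.4f}")  # the two ending values agree
```

The simple arithmetic mean of the rates would overstate the constant growth rate whenever the rates vary from year to year.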