Descriptive Statistics II: Measures of Dispersion

How typical is ‘typical’?


Mean, Median and Mode are measures
designed to convey to the reader the “typical”
observation.
It is often useful for the reader to know just how
typical the “typical” observation is!

If most of the observations fall in the mode, then
the mode is very “typical”.
If most of the observations are close to the mean
or median observation, then the mean/median is
very “typical” or indicative of the distribution.
Measures of Dispersion



Measures of dispersion give us an idea of how
representative the measures of central tendency (mode,
median, mean) are of the entire distribution.
The idea is that the more the data is dispersed – or
spread out – from the central measure (mode, median or
mean), the less indicative the central measure.
In other words, a high measure of dispersion tells us that
the mode/median/mean is not very typical and many
observations are quite different!
Measures of Dispersion

Measures of dispersion – how dispersed are
the observations.





Variation ratio
Range
Interquartile Range
Variance & Standard Deviation
Skewness, Kurtosis
Nominal: Variation Ratio

For nominal variables, the variation ratio is
the percentage of cases which are not the
mode.


Variation ratio = 1 - (number of observations in the mode / total
number of observations)
Infrequently used since the variation ratio
really does not tell the reader anything that
the mode does not already tell the reader.
Example: Variation Ratio and Mode
Canadian Election Study, MBS_B1:
Please circle the number that best reflects your opinion.
The government should:
1. See to it that everyone has a decent standard of
living……1090 (65.7%) = Mode
2. Leave people to get ahead on their own… 384 (23.1%)
8. Not sure ...................185 (11.1%)
Variation Ratio = 1 - 0.657 = 34.3%
Note: Unweighted responses are not reflective of the population.
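
The variation ratio in this example can be checked with a few lines of Python. This is only a sketch: the shortened category labels and the function name `variation_ratio` are assumptions made for illustration, and the counts are the unweighted ones quoted above.

```python
# Unweighted response counts from the example above.
# The short labels are illustrative, not the survey's official wording.
counts = {
    "Decent standard of living": 1090,
    "Get ahead on their own": 384,
    "Not sure": 185,
}

def variation_ratio(counts):
    """Share of cases that are NOT in the modal category."""
    total = sum(counts.values())
    modal_count = max(counts.values())
    return 1 - modal_count / total

mode = max(counts, key=counts.get)  # category with the highest count
vr = variation_ratio(counts)

print(f"Mode: {mode}")
print(f"Variation ratio: {vr:.1%}")
```

Computed directly from the counts, the ratio is about 34.3%. Adding up the separately rounded percentages of the non-modal categories can differ from this in the last digit.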
Range

Minimum value to maximum value



Useful when you want to know all the possible
responses, for aggregate policy measures like
GDP or other interval/ratio data.
Not very useful for closed-ended survey
responses.
In the real GDP example, the range is $338
to $48,589.

What does the range tell us about the mean of
$9,089 or the median of $5,194?
Percentiles, Quantiles and Quartiles


By ordering the values in the distribution, one can
classify observations by where they are in the
distribution.
Percentiles divide the distribution into 100 equal parts.




Lowest values are in the 1st percentile, largest values are in the
99th or 100th percentile.
The median is the 50th percentile.
Deciles divide the distribution into 10 equal parts. (“Quantile” is the
general term for any such division.)
Quartiles divide the distribution into 4 equal parts.


1st Quartile = 25th Percentile, 2nd Quartile = Median, 3rd Quartile =
75th Percentile
This matters because quartiles provide us with a measure of
dispersion…
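
As a rough illustration of how these divisions relate, the sketch below uses `statistics.quantiles` from Python's standard library. The data are made up (the integers 1 to 99, chosen only so the cut points come out as round numbers), and ten equal parts are referred to by their usual name, deciles.

```python
import statistics

# Made-up data for illustration only: the integers 1 through 99.
data = list(range(1, 100))

# quantiles() returns the n - 1 cut points that split the data into n equal parts.
quartiles = statistics.quantiles(data, n=4)      # 3 cut points
deciles = statistics.quantiles(data, n=10)       # 9 cut points
percentiles = statistics.quantiles(data, n=100)  # 99 cut points

print("Quartiles:", quartiles)
print("Median:", statistics.median(data))
print("50th percentile:", percentiles[49])
```

The 2nd quartile, the 5th decile and the 50th percentile all coincide with the median. Statistical packages use slightly different interpolation rules, so cut points on small samples may not match exactly across software.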
Interquartile Range


For closed-ended survey responses, like rating the
Conservative Party, finding the interquartile range (or
IQR) between the observation value at the 25th
percentile and the observation value at the 75th
percentile provides more useful information than the full
range.
IQR measures the range of the middle half of all
observations.


A high IQR relative to the range tells the reader that there are
many observations far from the median.
A low IQR relative to the range tells the reader that at least half of
all observations are very close to the median.
Calculating the interquartile range



Order all of the responses.
Identify the observation at the 25th percentile.
Take the value of this observation.
Identify the observation at the 75th percentile.
Take the value of this observation.
Recall: 50th percentile = median.
Interquartile range = difference between the
value of the observation at the 25th percentile
and the value of the observation at the 75th percentile.
Ex: Finding 25th and 75th Percentile
Rating                Frequency   Percent   Cum. %
Strongly dislike 0    100         8.64      8.64
1                     76          6.55      15.19
2                     136         11.7      26.89    <- 25th percentile
3                     87          7.51      34.41
4                     85          7.33      41.74
5                     182         15.63     57.37    <- Median (50th percentile) = 5
6                     108         9.26      66.63
7                     146         12.57     79.2     <- 75th percentile
8                     143         12.3      91.5
9                     52          4.5       96
Strongly like 10      46          4         100
Total                 1,162       100
Source: Canadian Election Study, 2008, CES_MBS_I10a [National Weight]
Ex: Calculating the IQR
Using the same frequency table of Conservative Party ratings
(0 = strongly dislike, 10 = strongly like; total 1,162):
25th percentile value = 2 (cumulative % reaches 26.89 at a rating of 2)
75th percentile value = 7 (cumulative % reaches 79.2 at a rating of 7)
IQR = 7 - 2 = 5
Source: Canadian Election Study, 2008, CES_MBS_I10a [National Weight]
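
The calculation in this example can be sketched in Python. The frequencies are the weighted counts from the table; the helper name `percentile_value` and the rule "first value whose cumulative percent reaches the target" are assumptions chosen to mirror the steps on the slide, not a standard library routine.

```python
# Weighted frequencies for ratings 0 (strongly dislike) to 10 (strongly like),
# copied from the table above.
freq = {0: 100, 1: 76, 2: 136, 3: 87, 4: 85, 5: 182,
        6: 108, 7: 146, 8: 143, 9: 52, 10: 46}

def percentile_value(freq, pct):
    """Return the first value whose cumulative percent reaches pct."""
    total = sum(freq.values())
    running = 0
    for value in sorted(freq):          # order the responses
        running += freq[value]          # cumulative frequency so far
        if 100 * running / total >= pct:
            return value
    raise ValueError("pct must be between 0 and 100")

q1 = percentile_value(freq, 25)      # value at the 25th percentile
median = percentile_value(freq, 50)  # value at the 50th percentile
q3 = percentile_value(freq, 75)      # value at the 75th percentile
iqr = q3 - q1

print(f"25th: {q1}, median: {median}, 75th: {q3}, IQR: {iqr}")
```

Because the listed frequencies are rounded weighted counts, the cumulative percentages computed here differ very slightly from those in the table, but they cross 25%, 50% and 75% at the same ratings.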
Interpreting IQR


The interquartile range for opinions of the
Conservative Party (2008) was 5.
An IQR of 5 (on an 11-point scale running from 0 to 10) tells
us that at least half of all observations fall into a relatively
narrow range of values.

There are few observations with extremely low or
extremely high opinions of the Conservative Party.
Interpreting IQR: Real GDP Example

In the real GDP example, the range ran from $338
to $48,589, a span of $48,251.





The median was $5,194.
The value of the observation at the 25th percentile is $2,018.
The value of the observation at the 75th percentile is $13,532.
The interquartile range is $13,532 - $2,018= $11,514.
This tells us that half of all observations are in a
relatively narrow range since 11,000 is much smaller
than 48,000.

Most countries are nowhere near as rich as the richest
countries…
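
The comparison of the IQR with the full range can be made explicit with a little arithmetic. The sketch below uses only the figures quoted on this slide; the variable names are illustrative.

```python
# Figures quoted on the slide (real GDP per capita, in dollars).
minimum, maximum = 338, 48_589
q1, q3 = 2_018, 13_532   # values at the 25th and 75th percentiles

data_range = maximum - minimum   # full spread of the data
iqr = q3 - q1                    # spread of the middle half
share = iqr / data_range         # IQR as a fraction of the range

print(f"Range: ${data_range:,}")
print(f"IQR: ${iqr:,}")
print(f"IQR as a share of the range: {share:.0%}")
```

The middle half of the countries occupies roughly a quarter of the distance between the poorest and the richest country.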
Interpreting IQR: % Pop on $2/day




Value of observation at 25th Percentile =
13.1%
Value of observation at 75th Percentile =
73.9%
What is the interquartile range?
Since the range was between 2% and 96.6%,
what does the interquartile range tell us?
Variance


Rather than relying on the location of the
value, variance measures dispersion by
calculating how far observations are from the
mean.
Variance = average of the squared distance of each
observation from the mean.


High variance means that many/most observations
are far from the mean but could be heavily
influenced by outliers.
Low variance means that many/most observations
are close to the mean.
Formula: Variance
$$s^2 = \frac{\sum (X - \bar{X})^2}{N}$$

Where N is the total number of observations
X is the value of each observation
\bar{X} is the mean of the set of data
The difference between each observation’s value and
the mean is squared before being added to eliminate
negative signs.
Result tends to be large relative to the value of the
observations.
Standard deviation

Takes square root of variance to put measure in
the same unit as the observations.


Example: The average rating of the Conservatives is
4.8 and the standard deviation is 2.8.
This tells us that ratings typically differ from the
mean by about 2.8 points on the 11 point scale
used to measure feeling towards the Conservative
Party.

In contrast, the variance is 8.0, which can be interpreted as 8
squared points on the 11 point scale. This explanation is
confusing and has little intuitive power.
Formula: Standard Deviation
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$

Standard deviation is the square root of variance ($s = \sqrt{s^2}$),
so the calculations (and symbols) are exactly the same.

N is the total number of observations
X is the value of each observation
\bar{X} is the mean of the set of data
The difference between each observation’s value and the mean
is squared before being added to eliminate negative signs.
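
To see the two formulas at work, the sketch below applies them to the Conservative Party rating frequencies from the earlier IQR example. It assumes the rounded weighted counts can be treated as ordinary frequencies, and it divides by N as in the formulas above.

```python
import math

# Weighted frequencies for ratings 0 to 10 (Conservative Party, 2008),
# taken from the frequency table in the IQR example.
freq = {0: 100, 1: 76, 2: 136, 3: 87, 4: 85, 5: 182,
        6: 108, 7: 146, 8: 143, 9: 52, 10: 46}

n = sum(freq.values())                           # N: total observations
mean = sum(x * f for x, f in freq.items()) / n   # mean rating

# Variance: average squared distance from the mean (dividing by N).
variance = sum(f * (x - mean) ** 2 for x, f in freq.items()) / n

# Standard deviation: square root of the variance, back in rating points.
std_dev = math.sqrt(variance)

print(f"Mean: {mean:.1f}")
print(f"Variance: {variance:.1f}")
print(f"Std. dev: {std_dev:.1f}")
```

The results round to the mean of 4.8, variance of 8.0 and standard deviation of 2.8 reported for this variable. Note that many packages divide by N - 1 by default (the sample formula); in Python's standard library, `statistics.pvariance` divides by N and `statistics.variance` divides by N - 1.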
Skewness


If observations are symmetric around the mean, there are
as many observations less than the mean as there are
observations greater than the mean.
Skewness measures the extent to which the
observations are asymmetric.



In other words, skewness tells us whether there are many more
observations above or below the mean.
However, skewness does not simply count the observations; it
considers the values of the observations.
Like mean, skew is sensitive to extreme values.
Skewness Implications


Skewness could have normative implications
for policy outcomes and public opinion.
Some bi- and multivariate analyses become
more complicated with a skewed distribution.
Interpreting Skewness

Negative skew = most of the observation
values are above the mean.

Usually this means that most of the observations
(including the median) are above the mean.

Positive skew = most of the observation
values are below the mean.

Usually this means that most of the observations
(including the median) are below the mean.
Skew values close to zero mean that the
distribution is nearly symmetrical.
Conservative Party Skew
Are more observations above or below the mean?
[Figure: bar chart of Conservative Party ratings, vertical axis in % (0 to 18)]
Mean = 4.8
Median = 5
Source: Canadian Election Study, 2008, CES_MBS_I10a [National Weight]
Conservative Party Skew
More above; therefore skew is negative!
[Figure: bar chart of Conservative Party ratings, vertical axis in % (0 to 18)]
Mean < Median
Source: Canadian Election Study, 2008, CES_MBS_I10a [National Weight]
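
The sign of the skew can also be checked numerically. The sketch below uses one common moment-based definition of skewness (the third central moment divided by the variance raised to the power 1.5); the slides do not say which formula the software used, so this choice is an assumption, as is the function name `skewness`.

```python
# Weighted frequencies for Conservative Party ratings 0 to 10,
# taken from the frequency table in the IQR example.
freq = {0: 100, 1: 76, 2: 136, 3: 87, 4: 85, 5: 182,
        6: 108, 7: 146, 8: 143, 9: 52, 10: 46}

def skewness(freq):
    """Moment-based skewness: third central moment / variance ** 1.5."""
    n = sum(freq.values())
    mean = sum(x * f for x, f in freq.items()) / n
    m2 = sum(f * (x - mean) ** 2 for x, f in freq.items()) / n  # variance
    m3 = sum(f * (x - mean) ** 3 for x, f in freq.items()) / n  # third moment
    return m3 / m2 ** 1.5

skew = skewness(freq)
print(f"Skewness: {skew:.2f}")
```

Cubing the distances keeps their signs, so observations far below the mean pull the result negative and observations far above pull it positive. Here the result is slightly negative, in line with the mean sitting just below the median.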
Real GDP Skew
Here, mean > median & skew is positive (1.35).
[Figure: histogram of Real GDP per Capita (horizontal axis, 0 to 50,000) against percent of observations (vertical axis)]
Median = $5,194.48
Mean = $9,089.82
Source: Gleditsch, K. S. 2002 via Quality of Government (QoG) v6, April 2011
Kurtosis


Kurtosis measures how tall or flat the
distribution of the variable is.
Even with the same variance, some
distributions have more observations in a
tall peak near the mean, with the remaining
observations spread further out, while others
have their observations concentrated in a
shallower, broader peak near the mean.

Rarely used in social science.
Kurtosis –
Illustrated

Relative to a ‘normal’ mesokurtic distribution (kurtosis=0)


Positive kurtosis (“leptokurtic”) means that the observations have a
tall peak near the mean.
Negative kurtosis (“platykurtic”, which sounds like ‘flat’) means
that the observations are very spread apart with a broad, shallow
peak.
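
A small numerical sketch may help fix the sign convention. It computes excess kurtosis (fourth central moment divided by the squared variance, minus 3, so that a normal distribution scores 0) for two made-up sets of 11 ratings. Both data sets and the function name `excess_kurtosis` are invented for illustration and do not come from the survey.

```python
def excess_kurtosis(values):
    """Fourth central moment / variance squared, minus 3 (normal = 0)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

# Made-up data: one observation at every rating from 0 to 10 (flat).
flat = list(range(11))

# Made-up data: nearly everyone at 5, plus one observation at each extreme.
peaked = [0] + [5] * 9 + [10]

k_flat = excess_kurtosis(flat)
k_peaked = excess_kurtosis(peaked)

print(f"Flat distribution:   {k_flat:.2f}")
print(f"Peaked distribution: {k_peaked:.2f}")
```

The flat data give a negative value (platykurtic) and the sharply peaked data give a positive value (leptokurtic).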
Using Descriptive Statistics to
Make Comparisons
Compare distributions
How much have these institutions done to help
resolve the conflict in Lebanon (2006)?
[Figure: bar chart of the percentage of respondents (0 to 45) choosing “A lot”, “A little”, “Not very much” or “Nothing at all” for the U.N., U.S.A. and E.U.]
Responses are opinions of Canadian adults.
How would you?


Describe the opinions portrayed in the
previous slide. What would you say?
It may not be very easy.


There is no clear, standard or normal way to make
the descriptions.
This is where descriptive statistics proves its
use.

It is possible to discuss the overall distribution, the
mode, and any apparent differences.
How much have these institutions done to
help resolve the conflict in Lebanon?
Institution   Mean   Std. Dev   Skewness
U.N.          2.5    0.9        -0.04
U.S.A.        2.9    0.9        -0.46
E.U.          2.8    0.8        -0.15
Scale:
1 = A lot
2 = A little
3 = Not very much
4 = Nothing at all
Note: the median = 3 for all three variables
Comparing Medians

The median for all these variables is three,
indicating that:



Most Canadians think that the UN, US and EU
are not doing much or nothing at all
There are NOT large differences in opinion
between variables.
But there are some differences, and the
table clearly indicates what those
differences are in a concise manner.
Comparing Means
Institution   Mean   Std. Dev   Skewness
U.N.          2.5    0.9        -0.04
U.S.A.        2.9    0.9        -0.46
E.U.          2.8    0.8        -0.15
Scale:
1 = A lot
2 = A little
3 = Not very much
4 = Nothing at all
The mean for the U.N. is lower than the mean
response for USA and EU, telling us that Canadians
thought that the U.N. was doing [slightly] more to
resolve the conflict than the EU and the USA.
The low UN mean is sensitive to the relatively high
number of respondents who said the UN was doing “a lot.”
Comparing dispersion
Institution   Mean   Std. Dev   Skewness
U.N.          2.5    0.9        -0.04
U.S.A.        2.9    0.9        -0.46
E.U.          2.8    0.8        -0.15
Scale:
1 = A lot
2 = A little
3 = Not very much
4 = Nothing at all
The standard deviation is
about the same, indicating
that the dispersion of opinion
is about the same.
How much have these institutions done to
help resolve the conflict in Lebanon?
Institution   Mean   Std. Dev   Skewness
U.N.          2.5    0.9        -0.04
U.S.A.        2.9    0.9        -0.46
E.U.          2.8    0.8        -0.15
Scale:
1 = A lot
2 = A little
3 = Not very much
4 = Nothing at all
All three variables skew negative, indicating that
more opinions are “above” the mean. With the scale
used for this variable, this means that more than half
of all respondents thought that the UN, US and EU
were doing “not very much” or “nothing at all.” In
particular, the U.S.A. was seen by many as not doing
very much. Can you see this in the chart?
Comparing attitudes towards the federal parties
Party            Mean   Median   Std. Dev   IQR   Skew
Conservative     4.8    5        2.8        5     -0.1
Liberal          4.7    5        2.3        3     -0.16
NDP              4.3    5        2.5        4     0.10
Greens           3.8    4        2.4        3     0.19
Bloc Quebecois   2.5    2        2.8        5     0.94

Which party, on average, was the most
popular in 2008? Least popular?

Is one party much more or much less popular than
the others?
Source: Canadian Election Study, 2008, CES_MBS_I10a-e [National Weight]
Comparing attitudes towards the federal parties
Party            Mean   Median   Std. Dev   IQR   Skew
Conservative     4.8    5        2.8        5     -0.1
Liberal          4.7    5        2.3        3     -0.16
NDP              4.3    5        2.5        4     0.10

Toward which of the three largest parties is there
the widest range of feelings? The narrowest?


From this table, could you conclude that most
Canadians feel much the same way about one
party?
Do Canadians seem badly divided about any
party?
Source: Canadian Election Study, 2008, CES_MBS_I10a-e [National Weight]