251descr2

251descr2 2/10/06 (Open this document in 'Outline' view!)
G. Measures of Dispersion and Asymmetry.
1. Range
Range = highest number - lowest number, or highest midpoint - lowest midpoint.
Interquartile Range: $IQR = Q_3 - Q_1$. See 251descr2ex2 for example.
2. The Variance and Standard Deviation of Ungrouped Data.
a. The Population Variance - Definitional and Computational Formulas.
The definition of the population variance is ‘the average squared deviation of measurements from the
mean.’ The definitional formula just realizes this definition.
Definitional: $\sigma^2 = \frac{\sum (x-\mu)^2}{N}$
Computational: $\sigma^2 = \frac{\sum x^2}{N} - \mu^2$
Standard Deviation = $\sqrt{\text{variance}}$, that is, $\sigma = \sqrt{\sigma^2}$.
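As a brief sketch of why the definitional and computational forms of the population variance agree, the square can be expanded using only $\sum x = N\mu$ (this is ordinary algebra, not a quotation from the course appendix):

```latex
\begin{aligned}
\sum (x-\mu)^2 &= \sum x^2 - 2\mu \sum x + N\mu^2 \\
               &= \sum x^2 - 2N\mu^2 + N\mu^2 \\
               &= \sum x^2 - N\mu^2 ,
\qquad\text{so}\qquad
\sigma^2 = \frac{\sum (x-\mu)^2}{N} = \frac{\sum x^2}{N} - \mu^2 .
\end{aligned}
```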
b. The Sample Variance.
Definitional: $s^2 = \frac{\sum (x-\bar{x})^2}{n-1}$
Computational: $s^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1}$
The computational formula is one of the most important formulas you will learn. Note that $\sum x^2$ is not the same as $\left(\sum x\right)^2$. For example, if $x$ is 2, 3, 5, $\sum x^2 = 2^2 + 3^2 + 5^2 = 4 + 9 + 25 = 38$, not $(2 + 3 + 5)^2 = 10^2 = 100$.
Example: Use $x = 2, 3, 5$.
Computational Method
    x       x^2
    2        4
    3        9
    5       25
   10       38    (totals)
From this we find $\sum x = 10$, $\sum x^2 = 38$, $\bar{x} = \frac{\sum x}{n} = \frac{10}{3} = 3.33333$.
Definitional Method
    x       x - xbar     (x - xbar)^2
    2       -1.33333      1.77778
    3       -0.33333      0.11111
    5        1.66667      2.77778
   10        0.00001      4.66667    (totals)
From this we find $\sum (x-\bar{x})^2 = 4.66667$. Note that $\sum (x-\bar{x})$ should be zero, but is not because of rounding.
Now, if we use the computational method, we can use $s^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1} = \frac{38 - 3(3.33333)^2}{3-1} = \frac{4.6667}{2} = 2.3333$. (Some texts prefer $s^2 = \frac{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}{n-1} = \frac{38 - \frac{1}{3}(10)^2}{3-1} = \frac{4.66666667}{2} = 2.33333$, which gives us a little more accuracy for a little more work.) If we use the definitional method, $s^2 = \frac{\sum (x-\bar{x})^2}{n-1} = \frac{4.66667}{2} = 2.33333$, but note that we had to do three subtractions instead of 1.
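The three routes to the sample variance in this example can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are my own and nothing here is part of the course materials.

```python
# Sample variance of x = 2, 3, 5 computed three ways.
x = [2, 3, 5]
n = len(x)

sum_x = sum(x)                    # sum of x
sum_x2 = sum(v * v for v in x)    # sum of x squared (not the square of the sum)
mean = sum_x / n

# Definitional: sum of squared deviations from the mean, over n - 1.
s2_definitional = sum((v - mean) ** 2 for v in x) / (n - 1)

# Computational: (sum x^2 - n * mean^2) / (n - 1).
s2_computational = (sum_x2 - n * mean ** 2) / (n - 1)

# Form some texts prefer: (sum x^2 - (sum x)^2 / n) / (n - 1).
s2_alternative = (sum_x2 - sum_x ** 2 / n) / (n - 1)

print(sum_x, sum_x2, sum_x ** 2)
print(s2_definitional, s2_computational, s2_alternative)
```

Because the computer carries many more decimal places than the hand calculation, all three results agree to well beyond the five places shown in the tables.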
c. The Coefficient of Variation.
$C = \frac{\text{std. deviation}}{\text{mean}}$, which for a sample is $C = \frac{s}{\bar{x}}$.
d. Chebyshef’s Inequality and the Empirical Rule


Chebyshef Inequality: $P\left(|x-\mu| \ge k\sigma\right) \le \frac{1}{k^2}$ or $P\left(\mu - k\sigma < x < \mu + k\sigma\right) \ge 1 - \frac{1}{k^2}$. A z-score $z = \frac{x-\mu}{\sigma}$ is the same as $k$. (See explanation below)
Empirical rule: (For Symmetrical Unimodal distributions only)
68% within one standard deviation of the mean, 95% within two and almost all (99.7%) within three.
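To see how the empirical rule relates to Chebyshef's bound, the sketch below compares the two for k = 1, 2, 3. It assumes the empirical-rule percentages are the proportions for a Normal distribution and uses Python's standard `statistics.NormalDist`; the function names are my own.

```python
from statistics import NormalDist

def normal_within(k):
    """Proportion of a Normal distribution within k standard deviations of the mean."""
    z = NormalDist()  # standard Normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

def chebyshef_bound(k):
    """Chebyshef's guaranteed minimum proportion within k standard deviations."""
    return 1 - 1 / k ** 2

for k in (1, 2, 3):
    print(f"k = {k}: Normal {normal_within(k):.4f}, Chebyshef at least {chebyshef_bound(k):.4f}")
```

The Chebyshef figures are much lower because they must hold for any distribution, not just symmetrical unimodal ones.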
3. The Variance and Standard Deviation of Grouped Data.
For grouped data generally substitute $\sum f$ for $\sum$. For example, $\sum x^2$ becomes $\sum f x^2$, and $n = \sum f$.
4. Skewness and Kurtosis.
Define Population Skewness, the 3rd k-statistic, coefficients of Skewness; Population Kurtosis, the 4th k-statistic, the Coefficient of Excess; Leptokurtic, Platykurtic and Mesokurtic distributions.
The usual measurement of skewness is often called the third moment about the mean. (The population variance is the second.) The formula for population skewness is: $\mu_3 = \frac{\sum (x-\mu)^3}{N}$.
The corresponding sample statistic is the third k-statistic, $k_3 = \frac{n}{(n-1)(n-2)}\sum (x-\bar{x})^3$. The corresponding computational formulas are $\mu_3 = \frac{1}{N}\left[\sum x^3 - 3\mu\sum x^2 + 2N\mu^3\right]$ and $k_3 = \frac{n}{(n-1)(n-2)}\left[\sum x^3 - 3\bar{x}\sum x^2 + 2n\bar{x}^3\right]$. To make grouped data formulas, put an $f$ to the right of the $\sum$ sign.
Positive values of these formulas imply skewness to the right, negative values to the left. Note that multiplying all the values of $x$ by two would multiply the values of these coefficients by eight, but would not change the shape of the distribution. If we want to compare shapes, we need measurements that will not change if we multiply all values by a constant. Such a measure would be called the coefficient of relative skewness, with the formulas $\gamma_1 = \frac{\mu_3}{\sigma^3}$ and $g_1 = \frac{k_3}{s^3}$. Note that for the Normal distribution $\gamma_1 = 0$.
Other measures of skewness are Pearson's measures of skewness, $SK_1 = \frac{\text{mean} - \text{mode}}{\text{std. deviation}}$ and $SK_2 = \frac{3(\text{mean} - \text{median})}{\text{std. deviation}}$. These are roughly equivalent, since, for a moderately skewed distribution, $\text{mean} - \text{mode} \approx 3(\text{mean} - \text{median})$. It seems that $-3 \le SK_1 \le 3$ and that values between 1 and -1 are considered to indicate moderate skewness.
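The scaling argument for skewness can be illustrated numerically. The sketch below uses the definitional form of the third k-statistic on a small made-up data set (the values are hypothetical and chosen only for illustration) and shows that doubling every value multiplies $k_3$ by eight while leaving the relative skewness $g_1$ unchanged.

```python
def k3(xs):
    """Third k-statistic: n / ((n - 1)(n - 2)) times the sum of cubed deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return n / ((n - 1) * (n - 2)) * sum((v - mean) ** 3 for v in xs)

def sample_sd(xs):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(xs)
    mean = sum(xs) / n
    return (sum((v - mean) ** 2 for v in xs) / (n - 1)) ** 0.5

def g1(xs):
    """Coefficient of relative skewness: k3 divided by s cubed."""
    return k3(xs) / sample_sd(xs) ** 3

data = [2, 3, 5, 11]              # hypothetical values, skewed to the right
doubled = [2 * v for v in data]   # every value multiplied by two

print(k3(doubled) / k3(data))     # ratio of the two k3 values
print(g1(data), g1(doubled))      # relative skewness before and after
```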
Example:
Profit Rate     f     x (midpoint)     fx      fx^2      fx^3
9-10.99         3     10               30       300      3000
11-12.99        3     12               36       432      5184
13-14.99        5     14               70       980     13720
15-16.99        3     16               48       768     12288
17-18.99        1     18               18       324      5832
Total          15                     202      2804     40024
So $\sum f = n = 15$, $\sum fx = 202$, $\sum fx^2 = 2804$, $\sum fx^3 = 40024$, so that $\bar{x} = \frac{\sum fx}{n} = \frac{202}{15} = 13.467$ and $s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n-1} = \frac{2804 - 15(13.467)^2}{15-1} = \frac{83.733}{14} = 5.981$, which means $s = \sqrt{5.981} = 2.446$. $C = \frac{s}{\bar{x}} = \frac{2.446}{13.467} = 0.182$.
To measure skewness, use one of the following three results.
$k_3 = \frac{n}{(n-1)(n-2)}\left[\sum fx^3 - 3\bar{x}\sum fx^2 + 2n\bar{x}^3\right] = \frac{15}{(14)(13)}\left[40024 - 3(13.467)(2804) + 2(15)(13.467)^3\right] = \frac{15[8.249]}{(14)(13)} = 0.680$, or
Relative Skewness $g_1 = \frac{k_3}{s^3} = \frac{0.680}{(2.446)^3} = 0.046$, or
Pearson's Measure of Skewness $SK_1 = \frac{\text{mean} - \text{mode}}{\text{std. deviation}} = \frac{13.467 - 14}{2.446} = -0.2179$.
Note that, in this case, Pearson's Measure 1 and Relative Skewness contradict each other as to the direction of skewness.
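The grouped-data computational formulas in this example can be reproduced with the short Python sketch below. It assumes the class midpoints and frequencies from the table, and takes the mode to be the midpoint of the class with the largest frequency; the variable names are my own.

```python
from math import sqrt

midpoints = [10, 12, 14, 16, 18]   # x, the class midpoints
freqs = [3, 3, 5, 3, 1]            # f, the class frequencies

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
sum_fx2 = sum(f * x ** 2 for f, x in zip(freqs, midpoints))
sum_fx3 = sum(f * x ** 3 for f, x in zip(freqs, midpoints))

mean = sum_fx / n
s2 = (sum_fx2 - n * mean ** 2) / (n - 1)       # computational variance
s = sqrt(s2)
c = s / mean                                   # coefficient of variation

# Computational third k-statistic and relative skewness.
k3 = n / ((n - 1) * (n - 2)) * (sum_fx3 - 3 * mean * sum_fx2 + 2 * n * mean ** 3)
g1 = k3 / s ** 3

# Pearson's first measure, using the midpoint of the modal class as the mode.
mode = midpoints[freqs.index(max(freqs))]
sk1 = (mean - mode) / s

print(n, sum_fx, sum_fx2, sum_fx3)
print(round(mean, 3), round(s2, 3), round(s, 3), round(c, 3))
print(round(k3, 3), round(g1, 3), round(sk1, 3))
```

Since the mean is kept at full precision here, the bracketed term in $k_3$ does not suffer from the rounding that a hand calculation with 13.467 would introduce.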
The measures of kurtosis are, for populations, $\mu_4 = \frac{\sum (x-\mu)^4}{N} = \frac{1}{N}\left[\sum x^4 - 4\mu\sum x^3 + 6\mu^2\sum x^2 - 3N\mu^4\right]$ and, for samples, $k_4 = \frac{n^2}{(n-1)(n-2)(n-3)}\left[\frac{(n+1)\sum (x-\bar{x})^4}{n} - \frac{3(n-1)^3 s^4}{n^2}\right]$.
$k_4$ can be considered an estimate of $\mu_4 - 3\sigma^4$. To get a measurement of shape use the Coefficient of Excess $\gamma_2 = \frac{\mu_4}{\sigma^4} - 3$ or $g_2 = \frac{k_4}{s^4}$.
Since the Normal distribution has $\mu_4 = 3\sigma^4$, the coefficient of excess is zero for the Normal distribution.
Kurtosis has traditionally been considered a measure of the peakedness of a distribution relative to the
Normal distribution, though there are some exceptions to this interpretation. If the coefficient of excess is
positive, we may call a distribution leptokurtic or sharp-peaked (and long-tailed). If the coefficient of
excess is negative, the distribution can be called platykurtic or flat-peaked (and short-tailed). If the
coefficient of excess is close to zero, we call the distribution mesokurtic, middle-peaked. A symmetric,
mesokurtic distribution is essentially Normal. An alternate measure, called simply the coefficient of kurtosis, is $K = \frac{.5\left(x_{.25} - x_{.75}\right)}{x_{.10} - x_{.90}}$. This is dimension-free and takes values between zero and 0.5. Values below .263 ($K$ for the Normal distribution) indicate a leptokurtic distribution, because long tails stretch the distance between $x_{.10}$ and $x_{.90}$ relative to the interquartile range. Values above .263 indicate a platykurtic distribution.
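The benchmark value of $K$ for the Normal distribution can be checked with Python's standard `statistics.NormalDist`. This sketch assumes that $x_{.25}$ denotes the value with 25% of the distribution above it (so it is the 75th percentile), which is my reading of the notation rather than something stated in this section.

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal; K does not depend on the mean or standard deviation

x_25 = z.inv_cdf(0.75)   # value with 25% of the distribution above it
x_75 = z.inv_cdf(0.25)   # value with 75% of the distribution above it
x_10 = z.inv_cdf(0.90)   # value with 10% above it
x_90 = z.inv_cdf(0.10)   # value with 90% above it

K = 0.5 * (x_25 - x_75) / (x_10 - x_90)
print(round(K, 3))
```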

Example (using definitional formulas):
Profit Rate    f    x (midpoint)   fx    x - xbar   f(x - xbar)   f(x - xbar)^2   f(x - xbar)^3   f(x - xbar)^4
9-10.99        3    10             30    -3.467     -10.400        36.053          -124.985         433.281
11-12.99       3    12             36    -1.467      -4.400         6.453            -9.465          13.882
13-14.99       5    14             70     0.533       2.667         1.422             0.759           0.405
15-16.99       3    16             48     2.533       7.600        19.253            48.775         123.564
17-18.99       1    18             18     4.533       4.533        20.551            93.164         422.348
Total         15                  202                 0.000        83.732             8.249         993.479
So $\sum f = n = 15$, $\sum fx = 202$, $\sum f(x-\bar{x}) = 0$, $\sum f(x-\bar{x})^2 = 83.732$, $\sum f(x-\bar{x})^3 = 8.249$ and $\sum f(x-\bar{x})^4 = 993.479$, so that $\bar{x} = \frac{\sum fx}{n} = \frac{202}{15} = 13.467$ and $s^2 = \frac{\sum f(x-\bar{x})^2}{n-1} = \frac{83.732}{14} = 5.981$, which means $s = \sqrt{5.981} = 2.446$. $C = \frac{s}{\bar{x}} = \frac{2.446}{13.467} = 0.182$.
To measure skewness, use one of the following three results.
$k_3 = \frac{n}{(n-1)(n-2)}\sum f(x-\bar{x})^3 = \frac{15(8.249)}{(14)(13)} = 0.680$, or
Relative Skewness $g_1 = \frac{k_3}{s^3} = \frac{0.680}{(2.446)^3} = 0.046$, or
Pearson's Measure of Skewness $SK_2 = \frac{3(\text{mean} - \text{median})}{\text{std. deviation}} = \frac{3(13.467 - 13.6)}{2.446} = -0.163$, where 13.6 is the median interpolated from the grouped data.
Note that, in this case, Pearson's Measure and Relative Skewness contradict each other as to the direction of skewness.
$k_4 = \frac{n^2}{(n-1)(n-2)(n-3)}\left[\frac{(n+1)\sum f(x-\bar{x})^4}{n} - \frac{3(n-1)^3 s^4}{n^2}\right] = \frac{15^2}{(14)(13)(12)}\left[\frac{16(993.479)}{15} - \frac{3(14)^3(5.981)^2}{15^2}\right] = -25.66$. So $g_2 = \frac{k_4}{s^4} = \frac{-25.66}{(5.981)^2} = -0.717$. The negative sign implies that the distribution is platykurtic.
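As a check on the hand arithmetic in this example, the following Python sketch rebuilds the deviation columns from the class midpoints and frequencies in the table and applies the definitional formulas for $s^2$, $k_3$, $k_4$ and $g_2$. The helper name `central_sum` is my own and is purely illustrative.

```python
midpoints = [10, 12, 14, 16, 18]   # x, the class midpoints
freqs = [3, 3, 5, 3, 1]            # f, the class frequencies

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n

def central_sum(power):
    """Sum of f * (x - mean)**power over all classes."""
    return sum(f * (x - mean) ** power for f, x in zip(freqs, midpoints))

s2 = central_sum(2) / (n - 1)                        # sample variance
k3 = n / ((n - 1) * (n - 2)) * central_sum(3)        # third k-statistic
k4 = n ** 2 / ((n - 1) * (n - 2) * (n - 3)) * (
    (n + 1) * central_sum(4) / n - 3 * (n - 1) ** 3 * s2 ** 2 / n ** 2
)                                                    # fourth k-statistic
g2 = k4 / s2 ** 2                                    # coefficient of excess

for power in (1, 2, 3, 4):
    print(power, round(central_sum(power), 3))       # column totals
print(round(s2, 3), round(k3, 3), round(k4, 3), round(g2, 3))
```

The column totals printed by the loop can be compared directly with the Total row of the table, and the sign of `g2` with the platykurtic conclusion.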

5. Review
a. Grouped Data. See 251dscr_D
b. Ungrouped Data. See 251dscr_D
Appendix: Explanation of Sample Formulas (Not for student consumption until you
know about expected value.) See 251dscr_B .
Appendix: Explanation of Computational Formulas (The part about the variance is
fairly easy, the rest is more difficult) See 251dscr_C .
Appendix: Explanation of Chebyshef’s Inequality
Make a diagram. Show a curve that looks like a Normal curve with the middle marked $\mu$. Mark off two points on either side of $\mu$ on your x axis at equal distances from $\mu$. Label these points $\mu - k\sigma$ and $\mu + k\sigma$. ($k$ can be any number above one, like 1.32 or 5.) The areas below $\mu - k\sigma$ and above $\mu + k\sigma$ are the left and right tails of the distribution. Then the statement, $P\left(|x-\mu| \ge k\sigma\right) \le \frac{1}{k^2}$, means that the total proportion of points that is in these two tails cannot be greater than $\frac{1}{k^2}$. The statement, $P\left(\mu - k\sigma < x < \mu + k\sigma\right) \ge 1 - \frac{1}{k^2}$, means that the proportion of points that is between $\mu - k\sigma$ and $\mu + k\sigma$ must be at least $1 - \frac{1}{k^2}$.
For example, suppose $k = 1.32$, $\mu = 15$ and $\sigma = 3$. Then $\mu - k\sigma = 15 - 1.32(3) = 11.04$, $\mu + k\sigma = 15 + 1.32(3) = 18.96$, $\frac{1}{k^2} = \frac{1}{1.32^2} = \frac{1}{1.7424} = .5739$ and the proportion of points between 11.04 and 18.96 is at least $1 - \frac{1}{k^2} = 1 - .5739 = .4261$ or 42.61%. The proportion of points in the tails is at most 57.39%.
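The arithmetic of this example can be wrapped in a small Python function. This is a sketch only; the function name and return layout are my own choices.

```python
def chebyshef_interval(mu, sigma, k):
    """Return the interval mu +/- k*sigma with Chebyshef's tail and interior bounds."""
    lower = mu - k * sigma
    upper = mu + k * sigma
    tail_max = 1 / k ** 2        # at most this proportion lies in the two tails
    inside_min = 1 - tail_max    # at least this proportion lies inside the interval
    return lower, upper, tail_max, inside_min

lower, upper, tail_max, inside_min = chebyshef_interval(mu=15, sigma=3, k=1.32)
print(round(lower, 2), round(upper, 2))
print(round(tail_max, 4), round(inside_min, 4))
```

Trying a larger value such as k = 5 shows how quickly the tail bound shrinks as the interval widens.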
Measures of Inequality
Measuring Inequality, a PowerPoint presentation which explains various measures of income inequality,
is available on the ECAAR website. This presentation was prepared by Paul Burkholder, ECAAR's Project
Manager as part of ECAAR’s project on "Inequality and Democratic Development." Paul is a recent
graduate of Temple University, with a degree in economics. He does research for current and potential
projects, and assists with media and member outreach.
See http://www.ecaar.org/Inequality/powerpoint/measuring%20inequality_files/frame.htm
Correction?:
In response to a student query (Thank you!), a small correction was made above in the computations for the variance using grouped data and definitional formulas. However, the results $s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n-1} = \frac{2804 - 15(13.467)^2}{15-1} = \frac{83.733}{14} = 5.981$, so that $s = \sqrt{5.981} = 2.446$, bear more explanation. If you used the numbers given here, you would have gotten $s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n-1} = \frac{2804 - 15(13.467)^2}{15-1} = \frac{83.599}{14} = 5.971$ and $s = \sqrt{5.971} = 2.444$. My result occurred because I tend to carry more decimal places than I admit, something that may occur in other places in these notes. Obviously if your answers differ from mine because of this sort of rounding error, I have no business calling them wrong.

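The effect of rounding the mean before squaring it can be seen in a short Python sketch. It assumes the grouped-data totals n = 15, sum of fx = 202 and sum of fx squared = 2804 from the example; the variable names are my own.

```python
n, sum_fx, sum_fx2 = 15, 202, 2804

mean_full = sum_fx / n               # 13.4666..., carried at full precision
mean_rounded = round(mean_full, 3)   # 13.467, as printed in the notes

# Computational variance with the unrounded and the rounded mean.
s2_full = (sum_fx2 - n * mean_full ** 2) / (n - 1)
s2_rounded = (sum_fx2 - n * mean_rounded ** 2) / (n - 1)

print(round(s2_full, 3), round(s2_rounded, 3))
```

A change in the fourth decimal place of the mean moves the variance in the second decimal place, because the computational formula subtracts two large numbers that are nearly equal.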