Lesson 7

Last Update: 16th March 2011
SESSION 17 & 18
Measures of Dispersion
Measures of Variability
Lecturer: Florian Boehlandt
University: University of Stellenbosch Business School
Domain: http://www.hedge-fundanalysis.net/pages/vega.php
Grouped Data – Investment B

Interval         x      f    f(<)     xf
-25 to < -15   -20      2      2     -40
-15 to < -5    -10      5      7     -50
-5 to < 5        0      5     12       0
5 to < 15       10      4     16      40
15 to < 25      20      6     22     120
25 to < 35      30      3     25      90
Total                  25            160
Total / 2            12.5

           Grouped    Actual
Mean         6.400    7.072
Median       6.250    4.700
Mode        19.000    multimodal

Median class: Ome = 5, f(<) = 12, fme = 4
Modal class: Omo = 15, fm = 6, fm-1 = 4, fm+1 = 3
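The slide lists only intermediate values, not the formulas behind them. A minimal sketch, assuming the standard interpolation formulas for grouped data and reading the symbols as follows: Ome and Omo are the lower boundaries of the median and modal classes, f(<) is the cumulative frequency below the median class, fme and fm are the frequencies of the median and modal classes, and fm-1 and fm+1 are the frequencies of the classes on either side of the modal class. Under these assumptions the code reproduces the grouped Mean, Median and Mode above; the function names are hypothetical.

```python
# Grouped data for Investment B, copied from the table above:
# (lower class boundary, frequency); every class is 10 units wide.
classes = [(-25, 2), (-15, 5), (-5, 5), (5, 4), (15, 6), (25, 3)]
WIDTH = 10

def grouped_mean(classes, width):
    """Mean from class midpoints x weighted by frequencies f: sum(xf) / sum(f)."""
    n = sum(f for _, f in classes)
    return sum((lower + width / 2) * f for lower, f in classes) / n

def grouped_median(classes, width):
    """Interpolated median: Ome + (n/2 - f(<)) / fme * width."""
    n = sum(f for _, f in classes)
    cum_below = 0  # f(<): cumulative frequency below the current class
    for lower, f in classes:
        if cum_below + f >= n / 2:  # first class that reaches n/2 is the median class
            return lower + (n / 2 - cum_below) / f * width
        cum_below += f
    raise ValueError("empty frequency table")

def grouped_mode(classes, width):
    """Interpolated mode: Omo + (fm - fm-1) / ((fm - fm-1) + (fm - fm+1)) * width."""
    freqs = [f for _, f in classes]
    k = freqs.index(max(freqs))  # first class with the highest frequency
    f_m = freqs[k]
    f_before = freqs[k - 1] if k > 0 else 0
    f_after = freqs[k + 1] if k + 1 < len(freqs) else 0
    d1, d2 = f_m - f_before, f_m - f_after
    return classes[k][0] + d1 / (d1 + d2) * width

print(round(grouped_mean(classes, WIDTH), 3))    # 6.4
print(round(grouped_median(classes, WIDTH), 3))  # 6.25
print(round(grouped_mode(classes, WIDTH), 3))    # 19.0
```

The mode formula divides by zero when the modal class has the same frequency as both neighbours; this sketch does not handle that case.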
Learning Objectives
1. Measures of relative standing: Median,
Quartiles, Deciles and Percentiles
2. Measures of dispersion: Range
3. Measures of variability: Variance and
Standard Deviation
Percentiles
The Pth percentile is the value below which P
percent of the observations lie and above which
(100 – P) percent lie.
Some special percentiles commonly used
include the median and the quartiles.
Percentiles are measures of relative standing.
Terminology
Percentiles                  Share   Count   Name        Notation
50th                         1/2     1       Median      Q2
25th, 50th, 75th, 100th      1/4     4       Quartiles   Q1, Q2, Q3, Q4
20th, 40th, …, 100th         1/5     5       Quintiles
10th, 20th, …, 100th         1/10    10      Deciles
Percentiles in general are denoted Lp (e.g. L25, L50, L75).
Location of a Percentile
The location L of a percentile is a function of the
required percentile P and the sample size n:
Lp = (n + 1) * (P / 100)
As with the median, all observations must first
be placed in ascending order.
Calculation of Percentile
1. Place all observations in ascending order
2. Calculate the location of the percentile
3. Since the location will often not be a whole
number (e.g. 2.75), multiply the difference
between the two neighbouring observations
by the fractional part of the location
4. Add the result of 3. to the preceding (lower)
observation to obtain the percentile
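The four steps can be sketched in Python as follows. The function name `percentile` is hypothetical, and returning the smallest or largest observation when the location falls below 1 or above n is an assumption that the steps above do not cover.

```python
def percentile(data, p):
    """Pth percentile using the location Lp = (n + 1) * (P / 100)."""
    xs = sorted(data)                      # 1. place observations in ascending order
    n = len(xs)
    loc = (n + 1) * (p / 100)              # 2. location of the percentile (1-based)
    whole = int(loc)                       # number of the preceding observation
    frac = loc - whole                     # 3. fractional part of the location
    # Assumption: clamp locations outside 1..n to the extreme observations.
    if whole < 1:
        return xs[0]
    if whole >= n:
        return xs[-1]
    lower, upper = xs[whole - 1], xs[whole]  # 1-based observation numbers -> 0-based indices
    return lower + (upper - lower) * frac    # 4. add the interpolated distance

hours = [0, 0, 5, 7, 8, 9, 12, 14, 22, 23]
print([percentile(hours, p) for p in (25, 50, 75)])  # [3.75, 8.5, 16.0]
```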
Percentile: An example
The following data show the number of hours
spent on the internet:
0 0 5 7 8 9 12 14 22 23
The values are already in ascending order and the
sample size is n = 10. We wish to determine L25,
L50 and L75 (these correspond to the quartiles Q1,
Q2 and Q3).
Solution – Step 1
Obs    1  2  3  4  5  6   7   8   9  10
Data   0  0  5  7  8  9  12  14  22  23

Percentile P   n    Lp     Formula
25             10   2.75   =(10 + 1) * (25 / 100)
50             10   5.50   =(10 + 1) * (50 / 100)
75             10   8.25   =(10 + 1) * (75 / 100)

Use the formula to calculate the location for each percentile / quartile.
Solution – Step 2
Percentile P   Lp     Fraction   Formula
25             2.75   0.75       =2.75 - 2
50             5.50   0.50       =5.5 - 5
75             8.25   0.25       =8.25 - 8

Determine the fractional part of the location.
Solution – Step 3
Percentile P   Lp     Fraction   Lower   Upper
25             2.75   0.75       0       5
50             5.50   0.50       8       9
75             8.25   0.25       14      22

Determine the next lower and next higher observations associated with the location. For 2.75, these are observation 2 (value 0) and observation 3 (value 5).
Solution – Step 4
Percentile P   Lp     Fraction   Lower   Upper   Solution   Formula
25             2.75   0.75       0       5       3.75       =0 + (5 - 0) * 0.75
50             5.50   0.50       8       9       8.50       =8 + (9 - 8) * 0.5
75             8.25   0.25       14      22      16.00      =14 + (22 - 14) * 0.25

To determine the percentile (here, the quartile) associated with a given location, calculate:
Solution = Lower + (Upper – Lower) * Fraction
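As a check on the four solution steps, the sketch below recomputes Lp, Fraction, Lower, Upper and Solution for the hours data and compares the result with Python's `statistics.quantiles`, whose default `"exclusive"` method places quartiles at the same (n + 1)-based positions; for this data both give 3.75, 8.5 and 16. The loop assumes every location falls between the first and last observation, which holds for P = 25, 50 and 75 with n = 10.

```python
import statistics

# Internet-usage data from the example, already in ascending order.
hours = [0, 0, 5, 7, 8, 9, 12, 14, 22, 23]
n = len(hours)

solutions = []
for p in (25, 50, 75):
    loc = (n + 1) * (p / 100)                       # Step 1: location Lp
    whole = int(loc)                                # observation number just below Lp
    frac = loc - whole                              # Step 2: fractional part
    lower, upper = hours[whole - 1], hours[whole]   # Step 3: 1-based obs -> 0-based index
    solution = lower + (upper - lower) * frac       # Step 4: interpolate
    solutions.append(solution)
    print(f"P={p}: Lp={loc:.2f} Fraction={frac:.2f} "
          f"Lower={lower} Upper={upper} Solution={solution:.2f}")

# Cross-check with the standard library (default method="exclusive").
print(solutions)                             # [3.75, 8.5, 16.0]
print(statistics.quantiles(hours, n=4))      # [3.75, 8.5, 16.0]
```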
Exercises
You may use shortcuts if you want!
1. Determine the first, second and third
quartiles:
5 8 2 9 5 3 7 4 2 7 4 10 4 3 5
2. Determine the third and eighth deciles (30th
and 80th percentiles):
10.5 14.7 15.3 17.7 15.9 12.2 10.0
14.1 13.9 18.5 13.9 15.1 15.7
Range
The range is the difference between the
maximum and minimum observations. It is a
measure of dispersion.
The interquartile range is the difference
between the third and the first quartile:
Interquartile Range = Q3 – Q1
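A minimal sketch applying both measures to the internet-hours data from the percentile example. It assumes `statistics.quantiles` with its default `"exclusive"` method, which for this data matches the (n + 1)-based quartiles (Q1 = 3.75, Q3 = 16).

```python
import statistics

hours = [0, 0, 5, 7, 8, 9, 12, 14, 22, 23]

data_range = max(hours) - min(hours)           # maximum minus minimum
q1, q2, q3 = statistics.quantiles(hours, n=4)  # (n + 1)-based quartiles
iqr = q3 - q1                                  # Interquartile Range = Q3 - Q1

print(data_range)  # 23
print(iqr)         # 12.25
```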
Variance
The variance is based on the sum of the squared
deviations of every single observation from the
sample / population mean, divided by n - 1
(sample) or N (population). All differences are
squared so that positive and negative deviations
from the mean do not cancel out.
The variance is a measure of variability.
Population and Sample Variance
We need to differentiate between the population
variance and the sample variance. Because the
sample mean is itself calculated from the sample,
the sample variance has one degree of freedom
fewer and is calculated with n - 1 in the
denominator. For a population of size N, the mean
is the population parameter itself rather than an
estimate, so the sum is divided by N.
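One way to see the lost degree of freedom: deviations from the sample mean always sum to zero, so once n - 1 of them are known, the last one is fixed. The sketch assumes the ten internet-usage values from this lesson's variance example as they appear in its solution tables (total 80, mean 8), and compares the standard library's `statistics.variance` (divides by n - 1) with `statistics.pvariance` (divides by N, treating the ten values as if they were the whole population).

```python
import statistics

# Internet-usage values from the variance example's solution tables (total 80).
hours = [0, 7, 12, 5, 3, 14, 8, 0, 9, 22]
mean = sum(hours) / len(hours)                  # sample mean: 8.0
deviations = [x - mean for x in hours]

print(sum(deviations))                          # 0.0: deviations cancel out exactly
# The last deviation is determined by the other n - 1 deviations.
print(-sum(deviations[:-1]), deviations[-1])    # 14.0 14.0

print(round(statistics.variance(hours), 3))     # 45.778 (divide by n - 1 = 9)
print(round(statistics.pvariance(hours), 3))    # 41.2   (divide by N = 10)
```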
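Formulas
Sample variance (sample statistic):
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
where n = sample size, x_i = observation, \bar{x} = sample mean.
Population variance (population parameter):
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
where N = total population size, x_i = observation, \mu = population mean.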
Calculation of Variance
1. Calculate the average:
Sum of observations / number of
observations
2. Subtract the average from every observation
3. Square the difference
4. Sum the squared differences
5. Divide the result from 4. by either N
(population) or n-1 (sample)
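The five steps translate directly into a short function. The name `variance` and its `population` flag are hypothetical; the data are the internet-usage values as they appear in the solution tables of the example that follows (total 80, mean 8).

```python
def variance(data, population=False):
    """Variance following steps 1-5: divide by N (population) or n - 1 (sample)."""
    n = len(data)
    mean = sum(data) / n                         # 1. average
    diffs = [x - mean for x in data]             # 2. subtract the average
    squares = [d ** 2 for d in diffs]            # 3. square each difference
    sumsq = sum(squares)                         # 4. sum the squared differences
    return sumsq / (n if population else n - 1)  # 5. divide by N or n - 1

hours = [0, 7, 12, 5, 3, 14, 8, 0, 9, 22]
print(round(variance(hours), 3))                   # 45.778
print(round(variance(hours, population=True), 3))  # 41.2
```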
Variance: An example
The following data show the number of hours
spent on the internet for a sample of n = 10
adults:
0 7 12 5 3 14 8 0 9 22
Calculate the variance.
Solution – Step 1
Obs      Data   Difference   Formula
1          0       -8        =(0 - 8)
2          7       -1        =(7 - 8)
3         12        4        =(12 - 8)
4          5       -3        =(5 - 8)
5          3       -5        =(3 - 8)
6         14        6        =(14 - 8)
7          8        0        =(8 - 8)
8          0       -8        =(0 - 8)
9          9        1        =(9 - 8)
10        22       14        =(22 - 8)
Total     80
n         10
Average    8

Use the mean to calculate the difference between every observation and the mean (Data – Average).
Solution – Step 2
Obs      Data   Difference   Sqr Diff   Formula
1          0       -8           64      =(-8)^2
2          7       -1            1      =(-1)^2
3         12        4           16      =(4)^2
4          5       -3            9      =(-3)^2
5          3       -5           25      =(-5)^2
6         14        6           36      =(6)^2
7          8        0            0      =(0)^2
8          0       -8           64      =(-8)^2
9          9        1            1      =(1)^2
10        22       14          196      =(14)^2
Total     80                   412
n         10
n-1        9
Average    8
Sample variance = 412 / 9 = 45.778

Square all differences. Next, sum the squared differences and divide the sum by n – 1 (sample only).
In the case of a sample, the sum of squares is divided by n - 1; in the case of a population, it is divided by N.
Interpretation Variance
The variance may be difficult to interpret.
Because all differences are squared to prevent
positive and negative deviations from cancelling
out, the variance is expressed in squared units
(here, hours squared). Taking the square root of
the variance returns the statistic to the original
units of the data. This statistic is called the
standard deviation. However, the variances of two
datasets may still be compared to determine which
dataset is more volatile.
Example – Standard Deviation
The population standard deviation:
$$\sigma = \sqrt{\sigma^2}$$
Similarly, the sample standard deviation:
$$s = \sqrt{s^2}$$
Thus, for the internet usage example:
$$s = \sqrt{45.778} = 6.766$$
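The square-root step can be sketched with the standard library. `statistics.stdev` and `statistics.pstdev` compute the sample (n - 1) and population (N) standard deviations; the data are assumed to be the internet-usage values from the solution tables (total 80, mean 8).

```python
import math
import statistics

hours = [0, 7, 12, 5, 3, 14, 8, 0, 9, 22]   # values from the solution tables
n = len(hours)
mean = sum(hours) / n
sumsq = sum((x - mean) ** 2 for x in hours)  # sum of squared differences: 412.0

s = math.sqrt(sumsq / (n - 1))               # sample standard deviation
print(round(s, 3))                           # 6.766
print(round(statistics.stdev(hours), 3))     # 6.766 (divides by n - 1)
print(round(statistics.pstdev(hours), 3))    # 6.419 (divides by N, as if the ten values were the population)
```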
Solution – Step 3
Sum of squared differences: 412
n - 1: 9
Sample variance: 412 / 9 = 45.778
Sqrt: 6.766
Interpretation:
Observations of internet usage within the
sample of ten people typically deviate from
the sample mean by about 6.766 h.
Exercises
1. Calculate the variance and standard
deviation for the following data:
2 8 9 4 1 7 5 4
2. Calculate the variance and standard
deviation for the following data:
7 -5 -3 8 4 -4 1 -5 9 3