Math 211
Introduction to Statistics
Chapter 4 Measures of Dispersion
Dispersion: The degree to which numerical data tend to spread about an average value is
called the dispersion, or variation, of the data. The most common measures of dispersion are the
range, mean deviation, semi-interquartile range, 10-90 percentile range, and standard deviation.
The Range: The difference between the largest and smallest numbers in the set.
The Mean Deviation: (Average deviation) The mean deviation of a set of N numbers
$X_1, X_2, \ldots, X_N$ is denoted by MD and defined as
$$MD = \frac{\sum_{i=1}^{N} |X_i - \bar{X}|}{N} = \frac{1}{N} \sum_{i=1}^{N} |X_i - \bar{X}|,$$
where $\bar{X}$ is the arithmetic mean and $|X_i - \bar{X}|$ is the absolute value of the deviation of $X_i$ from $\bar{X}$.
If $X_1, X_2, \ldots, X_k$ occur with frequencies $f_1, f_2, \ldots, f_k$ respectively, the mean deviation can
be written as
$$MD = \frac{\sum_{i=1}^{k} f_i |X_i - \bar{X}|}{N},$$
where $N = \sum_{i=1}^{k} f_i$. This form is useful for grouped data, where the $X_i$'s represent class marks and
the $f_i$'s are the corresponding class frequencies.
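As a rough illustration of the two formulas for MD, the following Python sketch computes the mean deviation for raw data and for grouped data. The function names `mean_deviation` and `grouped_mean_deviation` and the sample numbers are made up for this illustration and do not come from the notes.

```python
def mean_deviation(values):
    """Mean deviation of raw data: average absolute deviation from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


def grouped_mean_deviation(marks, freqs):
    """Mean deviation of grouped data from class marks and class frequencies."""
    n = sum(freqs)  # N is the total frequency, not the number of classes
    mean = sum(f * x for x, f in zip(marks, freqs)) / n
    return sum(f * abs(x - mean) for x, f in zip(marks, freqs)) / n


if __name__ == "__main__":
    # Illustrative numbers only (not taken from the notes).
    print(mean_deviation([2, 3, 6, 8, 11]))              # mean is 6, MD = 14 / 5
    print(grouped_mean_deviation([1, 2, 3], [1, 2, 1]))  # mean is 2, MD = 2 / 4
```

Note that in the grouped version the divisor is the sum of the frequencies.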
The Semi-Interquartile Range: (Quartile Deviation)
$$Q = \frac{Q_3 - Q_1}{2}$$
The 10-90 Percentile Range: $P_{90} - P_{10}$
Semi-10-90 Percentile Range:
$$\frac{P_{90} - P_{10}}{2}$$
The Standard Deviation: The standard deviation of a set of N numbers $X_1, X_2, \ldots, X_N$ is
denoted by S and defined as
$$S = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N - 1}}.$$
Sonuç Zorlu
Lecture Notes
1
If $X_1, X_2, \ldots, X_k$ occur with frequencies $f_1, f_2, \ldots, f_k$ respectively, the standard deviation can be
written as
$$S = \sqrt{\frac{\sum_{i=1}^{k} f_i (X_i - \bar{X})^2}{N - 1}},$$
where $N = \sum_{i=1}^{k} f_i$. This form is useful for grouped data, where the $X_i$'s represent class marks and
the $f_i$'s are the corresponding class frequencies.
The Variance: The variance of a set of N numbers $X_1, X_2, \ldots, X_N$ is denoted by $S^2$ and defined
as
$$S^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N - 1}.$$
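A minimal Python sketch of the standard deviation and the variance, using the standard library's `statistics` module and assuming the divisor in the definitions above is N - 1. In that module, `variance` and `stdev` divide by N - 1, while `pvariance` and `pstdev` divide by N, which is the divisor that appears in the short-method formulas later in this chapter. The data are the ten numbers used in Example 7.

```python
import statistics

data = [6, 8, 12, 7, 4, 5, 5, 10, 9, 8]  # the numbers used in Example 7

# Divisor N - 1
sample_var = statistics.variance(data)
sample_sd = statistics.stdev(data)

# Divisor N
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

print(f"divisor N-1: S^2 = {sample_var:.4f}, S = {sample_sd:.4f}")
print(f"divisor N:   S^2 = {pop_var:.4f}, S = {pop_sd:.4f}")
```

The two versions differ noticeably for small N and become practically equal as N grows.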
Properties of the Standard Deviation
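(1) The standard deviation can be defined as
$$S = \sqrt{\frac{\sum_{i=1}^{N} (X_i - a)^2}{N - 1}},$$
where $a$ is any average, not necessarily $\bar{X}$. Of all such values, S is minimum when $a = \bar{X}$.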
(2) For normal distributions, 68.27% of the cases lie within 1 s.d. on either side of the mean,
95.45% within 2 s.d. on either side of the mean, and 99.73% within 3 s.d. on either side
of the mean. For moderately skewed distributions, these percentages may hold approximately.
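The percentages for a normal distribution can be reproduced numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `share_within` is made up for this illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} s.d. of the mean: {100 * share_within(k):.2f}%")
```

Because the fractions depend only on k, the same percentages apply to any normal distribution, whatever its mean and standard deviation.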
Short methods for computing the standard deviation
(1)
$$S = \sqrt{\frac{\sum_{i=1}^{N} X_i^2}{N} - \left(\frac{\sum_{i=1}^{N} X_i}{N}\right)^2} = \sqrt{\overline{X^2} - \bar{X}^2}$$
(2) If $d_j = X_j - A$ are the deviations of $X_j$ from some arbitrary constant $A$, then
$$S = \sqrt{\frac{\sum_{j=1}^{N} d_j^2}{N} - \left(\frac{\sum_{j=1}^{N} d_j}{N}\right)^2}$$
(3) (Coding Method) When data are grouped into a frequency distribution whose class
intervals have equal size c, we have $d_j = c u_j$, or $c u_j = X_j - A$, where $u_j = 0, \pm 1, \pm 2, \ldots$, then
$$S = c \sqrt{\frac{\sum_{j=1}^{k} f_j u_j^2}{N} - \left(\frac{\sum_{j=1}^{k} f_j u_j}{N}\right)^2}$$
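The coding method can be sketched in Python as follows. The function name `coded_mean_and_sd` and its parameter names are illustrative choices, not part of the notes. The standard deviation is computed with the divisor N, as in formula (3), and the sample data are the class marks and frequencies of Example 1.

```python
import math


def coded_mean_and_sd(marks, freqs, A, c):
    """Mean and standard deviation (divisor N) by the coding method.

    marks: class marks X_j, freqs: class frequencies f_j,
    A: arbitrary origin (usually a central class mark), c: common class size.
    """
    n = sum(freqs)
    u = [(x - A) / c for x in marks]  # coded values u_j = (X_j - A) / c
    sum_fu = sum(f * uj for f, uj in zip(freqs, u))
    sum_fu2 = sum(f * uj * uj for f, uj in zip(freqs, u))
    mean = A + c * sum_fu / n
    sd = c * math.sqrt(sum_fu2 / n - (sum_fu / n) ** 2)
    return mean, sd


# Class marks and frequencies of the grade distribution in Example 1
marks = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5]
freqs = [2, 5, 8, 11, 8, 5, 2]
mean, sd = coded_mean_and_sd(marks, freqs, A=44.5, c=10)
print(f"mean = {mean:.2f}, S = {sd:.2f}")
```

Any choice of A gives the same mean and S; picking a central class mark simply keeps the coded values small.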
Example 1. Determine the percentage of the students with grades that fall within the ranges
(a) $\bar{X} \pm S$
(b) $\bar{X} \pm 2S$.
Given,

Grades   No. of students (f_i)   X_i    u_i   f_i u_i   u_i^2   f_i u_i^2
10-19             2              14.5   -3      -6        9        18
20-29             5              24.5   -2     -10        4        20
30-39             8              34.5   -1      -8        1         8
40-49            11              44.5    0       0        0         0
50-59             8              54.5    1       8        1         8
60-69             5              64.5    2      10        4        20
70-79             2              74.5    3       6        9        18
Total          N = 41                            0                 92

Let $A = 44.5$ and $c = 10$. Then
$$\bar{X} = A + \left(\frac{\sum f u}{N}\right) c = 44.5 + 0 = 44.5$$
and
$$S = c \sqrt{\frac{\sum_{j=1}^{k} f_j u_j^2}{N} - \left(\frac{\sum_{j=1}^{k} f_j u_j}{N}\right)^2} = 10 \sqrt{\frac{92}{41} - 0^2} \approx 14.98 \approx 15.$$
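(a) $\bar{X} \pm S = 44.5 \pm 15$, that is, the range 29.5 to 59.5.
The number of students in the range 29.5 to 59.5 is 8 + 11 + 8 = 27.
The percentage of grades is $\frac{27}{41} \approx 66\%$.
(b) $\bar{X} \pm 2S = 44.5 \pm 30$, that is, the range 14.5 to 74.5.
Only part of the first and last classes falls in this range, so their frequencies are prorated.
The number of students in the range 14.5 to 74.5 is
$$\left(\frac{19 - 14.5}{10}\right) 2 + 5 + 8 + 11 + 8 + 5 + \left(\frac{74.5 - 70}{10}\right) 2 = 38.8.$$
The percentage of grades is $\frac{38.8}{41} \approx 95\%$.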
Example 2. Consider the following frequency distribution to compute $\bar{X}$, MD and $S$, using the
coding method for $\bar{X}$ and $S$.

Class boundaries   f_i   X_i     u_i   f_i u_i   u_i^2   f_i u_i^2   X_i - \bar{X}   f_i |X_i - \bar{X}|
154.5-158.5          2   156.5   -3      -6        9        18          -13.8              27.6
158.5-162.5          3   160.5   -2      -6        4        12           -9.8              29.4
162.5-166.5          8   164.5   -1      -8        1         8           -5.8              46.4
166.5-170.5         16   168.5    0       0        0         0           -1.8              28.8
170.5-174.5         12   172.5    1      12        1        12            2.2              26.4
174.5-178.5          9   176.5    2      18        4        36            6.2              55.8
178.5-182.5          5   180.5    3      15        9        45           10.2              51.0
Total               55                   25                131                            265.4

With $A = 168.5$ and $c = 4$,
$$\bar{X} = A + \left(\frac{\sum f u}{N}\right) c = 168.5 + \left(\frac{25}{55}\right) 4 \approx 170.32$$
$$MD = \frac{\sum_{i=1}^{k} f_i |X_i - \bar{X}|}{N} = \frac{265.4}{55} \approx 4.83$$
$$S = c \sqrt{\frac{\sum_{j=1}^{k} f_j u_j^2}{N} - \left(\frac{\sum_{j=1}^{k} f_j u_j}{N}\right)^2} = 4 \sqrt{\frac{131}{55} - \left(\frac{25}{55}\right)^2} \approx 5.90$$
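The quantities in Example 2 can be cross-checked directly from the class marks and frequencies, without coding. This is only a sketch: the variable names are illustrative, and S is computed with the divisor N so that it is comparable with the coding-method formula.

```python
import math

# Class marks and frequencies of the Example 2 distribution
marks = [156.5, 160.5, 164.5, 168.5, 172.5, 176.5, 180.5]
freqs = [2, 3, 8, 16, 12, 9, 5]

n = sum(freqs)
mean = sum(f * x for x, f in zip(marks, freqs)) / n
md = sum(f * abs(x - mean) for x, f in zip(marks, freqs)) / n
# Divisor N, matching the coding-method formula
sd = math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(marks, freqs)) / n)

print(f"N = {n}, mean = {mean:.2f}, MD = {md:.2f}, S = {sd:.2f}")
```

Working from the unrounded mean avoids the small rounding differences that arise when tabulated deviations are rounded to one decimal place.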
Empirical Relation between Measures of Dispersions
For moderately skewed distributions, we have the empirical formulae
$$\text{Mean Deviation} = \frac{4}{5} (\text{standard deviation})$$
$$\text{Semi-Interquartile Range} = \frac{2}{3} (\text{standard deviation}).$$
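The factors 4/5 and 2/3 are close to the exact ratios for a normal distribution, where the mean deviation equals $\sqrt{2/\pi}$ standard deviations and the semi-interquartile range equals the 75th percentile of the standard normal distribution. The short sketch below evaluates both ratios with the Python standard library; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Ratios for a normal distribution with standard deviation 1
md_ratio = math.sqrt(2 / math.pi)        # mean deviation / standard deviation
siqr_ratio = NormalDist().inv_cdf(0.75)  # semi-interquartile range / standard deviation

print(f"MD / S = {md_ratio:.4f}  (compare 4/5 = 0.8)")
print(f"Q / S  = {siqr_ratio:.4f}  (compare 2/3 = 0.6667)")
```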
Absolute and Relative Dispersion; coefficient of variation
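Absolute dispersion is the actual variation, as given by the standard deviation or another measure of dispersion.
$$\text{Relative Dispersion} = \frac{\text{absolute dispersion}}{\text{average}}.$$
If the absolute dispersion is $S$ and the average is $\bar{X}$, then the relative dispersion is called the
coefficient of variation,
$$V = \frac{S}{\bar{X}} \quad \text{(expressed as a percentage)}.$$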
Example 3. On a final examination in Statistics, the mean grade of a group of 150 students was
78 and the standard deviation was 8.0. In Calculus, however, the mean grade of the group was 73
and the standard deviation was 7.6. Which subject has the greater
(a) absolute dispersion?
(b) relative dispersion?
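(a) The absolute dispersion of Statistics is $S_s = 8.0$ and of Calculus $S_c = 7.6$.
Therefore, Statistics has the greater absolute dispersion.
(b) The coefficients of variation are
$$V_s = \frac{S_s}{\bar{X}_s} = \frac{8.0}{78} \approx 10.26\%$$
$$V_c = \frac{S_c}{\bar{X}_c} = \frac{7.6}{73} \approx 10.41\%$$
Therefore, Calculus has the greater relative dispersion.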
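The comparison in Example 3 amounts to one division per subject. A small Python sketch, with the hypothetical helper name `coefficient_of_variation`:

```python
def coefficient_of_variation(sd, mean):
    """Relative dispersion S / mean, returned as a percentage."""
    return 100.0 * sd / mean


# Standard deviations and means from Example 3
v_statistics = coefficient_of_variation(8.0, 78)
v_calculus = coefficient_of_variation(7.6, 73)

print(f"Statistics: V = {v_statistics:.2f}%")
print(f"Calculus:   V = {v_calculus:.2f}%")
```

Because V is a ratio, it has no units, which makes it usable for comparing data sets measured on different scales.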
Standardized Variable: Standard Scores
The variable that measures the deviation from the mean in units of the standard deviation is
called a standardized variable and is given by
$$Z = \frac{X - \bar{X}}{S}.$$
Example 4. A student received a grade of 84 on a final examination in Mathematics for which
the mean grade was 76 and the standard deviation was 10. On the final examination in Physics,
for which the mean grade was 82 and the standard deviation was 16, she received a grade of 90.
In which subject was her relative standing higher?
Here $X_{maths} = 84$, $X_{physics} = 90$, $\bar{X}_{maths} = 76$, $\bar{X}_{physics} = 82$, $S_{maths} = 10$, $S_{physics} = 16$.
Since $Z = \frac{X - \bar{X}}{S}$,
$$Z_{maths} = \frac{84 - 76}{10} = 0.8 \quad \text{and} \quad Z_{physics} = \frac{90 - 82}{16} = 0.5.$$
Therefore the relative standing of the student is higher in Mathematics.
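The same comparison can be written as a short Python sketch. The helper name `z_score` is an illustrative choice; the numbers are those of Example 4.

```python
def z_score(x, mean, sd):
    """Standard score: deviation from the mean in units of the standard deviation."""
    return (x - mean) / sd


z_maths = z_score(84, 76, 10)
z_physics = z_score(90, 82, 16)

print(f"Z_maths = {z_maths}, Z_physics = {z_physics}")
print("Higher relative standing:", "Mathematics" if z_maths > z_physics else "Physics")
```

Both grades are 8 points above their means, so the difference in standing comes entirely from the different standard deviations.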
Example 5. Find the mean deviation of the numbers 2, 2, 4, 6, 7, 8, 9, 12.
First we need to find $\bar{X}$. That is,
$$\bar{X} = \frac{2 + 2 + 4 + 6 + 7 + 8 + 9 + 12}{8} = \frac{50}{8} = 6.25$$
Then the mean deviation is
$$MD = \frac{\sum_{i=1}^{N} |X_i - \bar{X}|}{N} = \frac{|2 - 6.25| + |2 - 6.25| + |4 - 6.25| + |6 - 6.25| + |7 - 6.25| + |8 - 6.25| + |9 - 6.25| + |12 - 6.25|}{8}$$
$$= \frac{4.25 + 4.25 + 2.25 + 0.25 + 0.75 + 1.75 + 2.75 + 5.75}{8} = \frac{22}{8} = 2.75$$
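Example 6. Find (a) the Semi-Interquartile Range
(b) 10-90 Percentile Range for the data given in Example 2.
(a) The semi-interquartile range is $Q = \frac{Q_3 - Q_1}{2}$.
In the formulas below, $L_1$ is the lower boundary of the class containing the quartile, $(\sum f)_1$ is the
sum of the frequencies of all classes below that class, $f_{Q}$ is the frequency of that class, and $c = 4$
is the class size. Here $N = 55$, so $N/4 = 13.75$ and $3N/4 = 41.25$.
$$Q_1 = L_1 + \left(\frac{\frac{N}{4} - (\sum f)_1}{f_{Q_1}}\right) c = 166.5 + \left(\frac{13.75 - 13}{16}\right) 4 \approx 166.69$$
$$Q_3 = L_1 + \left(\frac{\frac{3N}{4} - (\sum f)_1}{f_{Q_3}}\right) c = 174.5 + \left(\frac{41.25 - 41}{9}\right) 4 \approx 174.61$$
$$Q = \frac{Q_3 - Q_1}{2} = \frac{174.61 - 166.69}{2} = 3.96$$
Hence 50% of the cases lie between 166.69 and 174.61. So the measure of central tendency is
$$\frac{Q_1 + Q_3}{2} = \frac{166.69 + 174.61}{2} = 170.65.$$
In other words, 50% of the cases lie in the range $170.65 \pm 3.96$.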
(b) The 10-90 Percentile Range is $P_{90} - P_{10}$. With $N = 55$, $10N/100 = 5.5$ and $90N/100 = 49.5$.
$$P_{10} = L_1 + \left(\frac{\frac{10N}{100} - (\sum f)_1}{f_{P_{10}}}\right) c = 162.5 + \left(\frac{5.5 - 5}{8}\right) 4 = 162.75$$
$$P_{90} = L_1 + \left(\frac{\frac{90N}{100} - (\sum f)_1}{f_{P_{90}}}\right) c = 174.5 + \left(\frac{49.5 - 41}{9}\right) 4 \approx 178.28$$
$$P_{90} - P_{10} = 178.28 - 162.75 = 15.53.$$
$$\frac{1}{2}(P_{90} + P_{10}) = \frac{341.03}{2} = 170.515 \quad \text{and} \quad \frac{1}{2}(P_{90} - P_{10}) = \frac{15.53}{2} = 7.765$$
We conclude that 80% of the cases lie in the range $170.515 \pm 7.765$.
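The interpolation used for quartiles and percentiles of grouped data can be sketched as one Python function. The name `grouped_percentile` and its arguments are illustrative, and the function assumes that cases are spread evenly within each class, which is the assumption behind the formula $L_1 + \left(\frac{pN/100 - (\sum f)_1}{f}\right) c$. The data are the class boundaries and frequencies of Example 2.

```python
def grouped_percentile(boundaries, freqs, p):
    """p-th percentile (0 < p <= 100) of grouped data by linear interpolation.

    boundaries: class boundaries (one more entry than freqs)
    freqs: class frequencies
    """
    n = sum(freqs)
    target = p * n / 100  # position of the percentile in the cumulative count
    below = 0             # cumulative frequency of the classes below
    for i, f in enumerate(freqs):
        if f > 0 and below + f >= target:
            lower = boundaries[i]
            width = boundaries[i + 1] - boundaries[i]
            return lower + (target - below) / f * width
        below += f
    raise ValueError("p must lie between 0 and 100")


# Class boundaries and frequencies of the Example 2 distribution
boundaries = [154.5, 158.5, 162.5, 166.5, 170.5, 174.5, 178.5, 182.5]
freqs = [2, 3, 8, 16, 12, 9, 5]

q1 = grouped_percentile(boundaries, freqs, 25)
q3 = grouped_percentile(boundaries, freqs, 75)
p10 = grouped_percentile(boundaries, freqs, 10)
p90 = grouped_percentile(boundaries, freqs, 90)

print(f"Q1 = {q1:.2f}, Q3 = {q3:.2f}, semi-interquartile range = {(q3 - q1) / 2:.2f}")
print(f"P10 = {p10:.2f}, P90 = {p90:.2f}, 10-90 percentile range = {p90 - p10:.2f}")
```

Quartiles are simply the 25th and 75th percentiles, so one function covers both parts of the example.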
Example 7. Find the standard deviation and the variance of the following set of numbers:
6, 8, 12, 7, 4, 5, 5, 10, 9, 8
$$\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{6 + 8 + 12 + 7 + 4 + 5 + 5 + 10 + 9 + 8}{10} = \frac{74}{10} = 7.4$$

X      X - \bar{X}    (X - \bar{X})^2
6         -1.4             1.96
8          0.6             0.36
12         4.6            21.16
7         -0.4             0.16
4         -3.4            11.56
5         -2.4             5.76
5         -2.4             5.76
10         2.6             6.76
9          1.6             2.56
8          0.6             0.36
\sum X = 74               \sum (X_i - \bar{X})^2 = 56.4
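The standard deviation, computed with the divisor $N = 10$ as in the short-method formulas, is
$$S = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N}} = \sqrt{\frac{56.4}{10}} = \sqrt{5.64} \approx 2.37$$
and the variance is $S^2 = 5.64$. With the divisor $N - 1 = 9$, the corresponding values are
$S^2 = 56.4 / 9 \approx 6.27$ and $S \approx 2.50$.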
Example 8: Consider the following frequency distribution.

classes   frequency (f_i)   X_i   u_i   u_i^2   f_i u_i   f_i u_i^2
10-14            7           12   -2      4       -14        28
15-19           11           17   -1      1       -11        11
20-24           14           22    0      0         0         0
25-29           13           27    1      1        13        13
30-34            5           32    2      4        10        20
Total           50                                 -2        72

Use the Coding Method to compute $\bar{X}$ and $S$.
With $A = 22$ and $c = 5$, the mean value is
$$\bar{X} = A + \left(\frac{\sum_{j=1}^{k} f_j u_j}{\sum_{j=1}^{k} f_j}\right) c = 22 + \left(\frac{-2}{50}\right) 5 = 21.8$$
The standard deviation is
$$S = c \sqrt{\frac{\sum_{j=1}^{k} f_j u_j^2}{N} - \left(\frac{\sum_{j=1}^{k} f_j u_j}{N}\right)^2} = 5 \sqrt{\frac{72}{50} - \left(\frac{-2}{50}\right)^2} = 5 \sqrt{1.4384} \approx 6.$$