
251solnG3B 1/31/08 (Open this document in 'Page Layout' view!)
G. Measures of Dispersion and Asymmetry.
1. Range
Downing & Clark, problem 7 above (Use data to find IQR). Review solutions and terms on page 41 (36 in 3rd ed.) of Downing &
Clark.
2. The Variance and Standard Deviation of Ungrouped Data.
Text exercises 3.1b, 3.2b, 3.6, 3.37, 3.24 [3.1b, 3.2b, 3.7, 3.37, 3.23] (3.1b, 3.2b, 3.7, 3.23, 3.33)
3. The Variance and Standard Deviation of Grouped Data.
Text exercises 3.28, 3.30 (3.68, 3.70) (work 3.30 in thousands), Downing & Clark pg 42 or 37, problems 6,7 (Find sample standard
deviation – hint: run problem 6 in hundreds) (Note that you can use the Excel or Minitab techniques in the graded assignment to
compute and sum the $fx$ and $fx^2$ columns in problems 6 and 7.), Problems G1, G2. Graded Assignment 1
4. Skewness and Kurtosis.
Find the standard deviation, coefficient of variation and measures of skewness in Text problem 3.1, 3.2. Problems G3A, G4 (See
251wrksht).
5. Review
a. Grouped Data.
b. Ungrouped Data.
Part of Section 4 is in this document.
----------------------------------------------------------------------------------------------------------------------------.
Problem G3A: Use computational formulas on the data below. Consider the data a sample.
a. Complete the cumulative frequency under F .
b. Calculate the mean.
c. Calculate the median.
d. Calculate the mode.
e. Calculate the variance.
f. Calculate the interquartile range.
g. Calculate the standard deviation.
h. Calculate a statistic showing skewness.
i. Show all the data presented on a histogram with six class intervals.
j. Put a box plot below the histogram.
Now repeat Problem G3A using definitional formulas.
Class           x     F      f     xf    x^2 f    x^3 f
0 - 9.999                   50
10 - 19.999                 50
20 - 29.999                100
30 - 39.999                150
40 - 49.999                 50
Solution: Fill in the table. Note that the conventional way of writing the headings is Class, $x$, $f$, $F$, $fx$,
$fx^2$ and $fx^3$. We use computational formulas first. So if $x = 5$ and $f = 50$, $fx = 50(5) = 250$,
$fx^2 = 250(5) = 1250$ and $fx^3 = 1250(5) = 6250$.
Class          x (midpoint)     f      F       xf     x^2 f       x^3 f
0 - 9.999           5          50     50      250      1250        6250
10 - 19.999        15          50    100      750     11250      168750
20 - 29.999        25         100    200     2500     62500     1562500
30 - 39.999        35         150    350     5250    183750     6431250
40 - 49.999        45          50    400     2250    101250     4556250
Total                         400           11000    360000    12725000
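The table can be checked with a few lines of Python. This is only a sketch of the column arithmetic; the variable names (`midpoints`, `freqs`, `cum_freqs` and so on) are chosen here for illustration and are not part of the original assignment, which suggests Excel or Minitab.

```python
# Midpoints (x) and frequencies (f) copied from the G3A table.
midpoints = [5, 15, 25, 35, 45]
freqs = [50, 50, 100, 150, 50]

# Cumulative frequency F: add down the f column.
cum_freqs = []
running = 0
for f in freqs:
    running += f
    cum_freqs.append(running)

# Column totals needed by the computational formulas.
n = sum(freqs)
sum_fx = sum(f * x for x, f in zip(midpoints, freqs))
sum_fx2 = sum(f * x**2 for x, f in zip(midpoints, freqs))
sum_fx3 = sum(f * x**3 for x, f in zip(midpoints, freqs))

print(cum_freqs)
print(n, sum_fx, sum_fx2, sum_fx3)
```

The printed totals should agree with the bottom row of the table.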
To summarize our results, $\sum f = n = 400$, $\sum fx = 11000$, $\sum fx^2 = 360000$ and $\sum fx^3 = 12725000$.
a. Complete the cumulative frequency under F: (See above.) We add down the $f$ column.
b. Calculate the mean: $\bar{x} = \frac{\sum fx}{n} = \frac{11000}{400} = 27.50$
c. Calculate the median: To get a measure of position in grouped data
first use position $= p(n+1)$, then use $x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right] w$ to find the value. Here
$p = .5$. So $p(n+1) = .5(401) = 200.50$. This location is above 200 and below 350, so use 30
to 39.999. Then $x_{.5} = 30 + \left[\frac{.5(400) - 200}{150}\right] 10 = 30.00$.
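The interpolation used for the median can be sketched in Python as follows. The function name `grouped_fractile` and its arguments are hypothetical, introduced only for this illustration; the code assumes equal class widths and uses $p(n+1)$ to pick the class and $pN - F$ inside the bracket, as in the formula above.

```python
def grouped_fractile(p, lower_limits, freqs, width):
    """Interpolated value with proportion p of the data below it."""
    n = sum(freqs)
    position = p * (n + 1)      # used only to locate the class
    cum_below = 0               # F: cumulative frequency below the class
    for lower, f in zip(lower_limits, freqs):
        if position <= cum_below + f:
            # L_p + [(pN - F) / f_p] * w
            return lower + (p * n - cum_below) / f * width
        cum_below += f
    raise ValueError("position falls outside the table")


lower_limits = [0, 10, 20, 30, 40]
freqs = [50, 50, 100, 150, 50]
median = grouped_fractile(0.5, lower_limits, freqs, 10)
print(median)
```

With the G3A data the position 200.5 falls in the 30 to 39.999 class, so the result should match the 30.00 found by hand.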
d. Calculate the mode:
The group 30-39.999 has a frequency of 150, which is the largest frequency. So the mode is
35.00, its midpoint.
e. Calculate the variance:
$s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n-1} = \frac{360000 - 400(27.50)^2}{399} = 144.1103$
f. Calculate the interquartile range:
For the first quartile, position $= p(n+1) = .25(401) = 100.25$. This location is above 100 and
below 200, so use 20 to 29.999. Then, using $x_{1-p} = L_p + \left[\frac{pN - F}{f_p}\right] w$, we find
$x_{.75} = Q_1 = 20 + \left[\frac{.25(400) - 100}{100}\right] 10 = 20.00$.
For the third quartile, $p(n+1) = .75(401) = 300.75$. This location is above 200 and below 350, so
use 30 to 39.999. Then we find
$x_{.25} = Q_3 = 30 + \left[\frac{.75(400) - 200}{150}\right] 10 = 36.67$. So
$IQR = Q_3 - Q_1 = 36.67 - 20.00 = 16.67$.
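As a quick arithmetic check of the two quartiles, the interpolation can be written out directly once the class has been identified. The names `q1`, `q3` and `iqr` are illustrative only; the class limits and cumulative frequencies are taken from the G3A table.

```python
n = 400       # total frequency
width = 10    # class width w

# First quartile: class 20 to 29.999, F = 100 below it, f_p = 100.
q1 = 20 + (0.25 * n - 100) / 100 * width

# Third quartile: class 30 to 39.999, F = 200 below it, f_p = 150.
q3 = 30 + (0.75 * n - 200) / 150 * width

iqr = q3 - q1
print(round(q1, 2), round(q3, 2), round(iqr, 2))
```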
g. Calculate the standard deviation:
$s = \sqrt{\text{variance}} = \sqrt{144.1103} = 12.005$. Note also the coefficient of variation
$C = \frac{\text{std. deviation}}{\text{mean}} = \frac{12.005}{27.50} = 0.4365$.
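Parts e and g can be reproduced from the column totals alone. The following is a minimal sketch, assuming the totals from the table are already known; the variable names are not from the original.

```python
import math

# Column totals from the G3A table.
n, sum_fx, sum_fx2 = 400, 11000, 360000

mean = sum_fx / n
# Computational formula for the sample variance of grouped data.
variance = (sum_fx2 - n * mean**2) / (n - 1)
std_dev = math.sqrt(variance)
# Coefficient of variation: standard deviation relative to the mean.
coef_var = std_dev / mean

print(round(variance, 4), round(std_dev, 3), round(coef_var, 4))
```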
h. Calculate a statistic showing skewness: There are three possibilities:
1) $k_3 = \frac{n}{(n-1)(n-2)}\left[\sum fx^3 - 3\bar{x}\sum fx^2 + 2n\bar{x}^3\right]$
$= \frac{400}{(399)(398)}\left[12725000 - 3(27.5)(360000) + 2(400)(27.5)^3\right]$
$= .00251886\left[12725000 - 29700000 + 16637500\right]$
$= .00251886(-337500) = -850.115$.
2) $g_1 = \frac{k_3}{s^3} = \frac{-850.115}{\left(\sqrt{144.1103}\right)^3} = \frac{-850.115}{1729.98580} = -0.4914$.
3) Pearson’s Measure of Skewness $SK = \frac{3(\text{mean} - \text{mode})}{\text{std. deviation}} = \frac{3(27.5 - 35)}{12.005} = -1.874$.
Only one of these three is needed. All indicate skewness to the left.
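The three skewness measures for the grouped data can be checked together. This sketch uses the column totals and the mode of 35 found in part d; `k3`, `g1` and `sk` are simply short names for the three statistics, and the Pearson measure is computed with the mode, as in this solution.

```python
import math

# Column totals and mode from Problem G3A.
n, sum_fx, sum_fx2, sum_fx3 = 400, 11000, 360000, 12725000
mode = 35.0

mean = sum_fx / n
s = math.sqrt((sum_fx2 - n * mean**2) / (n - 1))

# 1) Computational formula for k3.
k3 = n / ((n - 1) * (n - 2)) * (sum_fx3 - 3 * mean * sum_fx2 + 2 * n * mean**3)
# 2) Relative skewness.
g1 = k3 / s**3
# 3) Pearson's measure, using the mode.
sk = 3 * (mean - mode) / s

print(round(k3, 3), round(g1, 4), round(sk, 3))
```

All three values come out negative, which agrees with the conclusion that the data are skewed to the left.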
i. Show all the data presented on a histogram with five class intervals.
j. Put a box plot below the histogram. The box will begin at 20 and end at 36.67 with a band at
30 to indicate the median.
(Include a hand-drawn solution to i and j.)
Now we do the problem using definitional formulas. Note how much bigger the table has to be! Once
again, the conventional headings would be Class, $x$, $f$, $fx$, $(x - \bar{x})$, $f(x - \bar{x})$, $f(x - \bar{x})^2$ and $f(x - \bar{x})^3$.
There is no reason to use both the computational and the definitional methods unless specifically
requested, though, of course, one method serves as a check on the other.
Class          x (midpoint)     f      xf      F    x - \bar{x}   (x - \bar{x})f   (x - \bar{x})^2 f   (x - \bar{x})^3 f
0 - 9.999           5          50     250     50      -22.5          -1125            25312.5          -569531.25
10 - 19.999        15          50     750    100      -12.5           -625             7812.5           -97656.25
20 - 29.999        25         100    2500    200       -2.5           -250              625.0            -1562.50
30 - 39.999        35         150    5250    350        7.5           1125             8437.5            63281.25
40 - 49.999        45          50    2250    400       17.5            875            15312.5           267968.75
Total                         400   11000                                0            57500.0          -337500.00

We can summarize the table as follows: $\sum f = n = 400$, $\sum fx = 11000$, $\sum f(x - \bar{x})^2 = 57500$,
$\sum f(x - \bar{x})^3 = -337500$.
e. Calculate the variance:
$s^2 = \frac{\sum f(x - \bar{x})^2}{n-1} = \frac{57500}{399} = 144.1103$
h. Calculate a statistic showing skewness:
$k_3 = \frac{n}{(n-1)(n-2)}\sum f(x - \bar{x})^3 = \frac{400}{(399)(398)}(-337500) = -850.1152$
Other calculations are the same as on the previous page.
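Since one method serves as a check on the other, the definitional sums can be recomputed from the midpoints and frequencies and compared with the computational results. This is a sketch with illustrative names only.

```python
midpoints = [5, 15, 25, 35, 45]
freqs = [50, 50, 100, 150, 50]

n = sum(freqs)
mean = sum(f * x for x, f in zip(midpoints, freqs)) / n

# Sums of frequency-weighted deviations from the mean.
sum_dev = sum(f * (x - mean) for x, f in zip(midpoints, freqs))
sum_dev2 = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs))
sum_dev3 = sum(f * (x - mean) ** 3 for x, f in zip(midpoints, freqs))

variance_def = sum_dev2 / (n - 1)
k3_def = n / ((n - 1) * (n - 2)) * sum_dev3

print(sum_dev, sum_dev2, sum_dev3)
print(round(variance_def, 4), round(k3_def, 4))
```

The first deviation sum should be zero, which is a useful check that the mean was computed correctly.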

Problem G4: For the sample below, compute the following:
b. the mean
c. the median (hint: put in order first!)
d. the mode
e. the variance
f. the interquartile range
g. the standard deviation
h. a statistic showing skewness.
1,2,4,5,6,3,3,7,8,3,1,2
Solution: Both the computational and definitional method are shown. There is no reason to do both
unless specifically requested.
Computational Method

   x     x^2      x^3
   1       1        1
   2       4        8
   4      16       64
   5      25      125
   6      36      216
   3       9       27
   3       9       27
   7      49      343
   8      64      512
   3       9       27
   1       1        1
   2       4        8
  45     227     1359   (totals)

Definitional Method

   x    x - \bar{x}   (x - \bar{x})^2   (x - \bar{x})^3
   1      -2.75           7.5625          -20.79688
   2      -1.75           3.0625           -5.35938
   4       0.25           0.0625            0.01563
   5       1.25           1.5625            1.95313
   6       2.25           5.0625           11.39063
   3      -0.75           0.5625           -0.42188
   3      -0.75           0.5625           -0.42188
   7       3.25          10.5625           34.32813
   8       4.25          18.0625           76.76563
   3      -0.75           0.5625           -0.42188
   1      -2.75           7.5625          -20.79688
   2      -1.75           3.0625           -5.35938
  45       0.00          58.2500           70.87499   (totals)

x in order

index    x
   1     1
   2     1
   3     2
   4     2
   5     3
   6     3
   7     3
   8     4
   9     5
  10     6
  11     7
  12     8

So $n = 12$, $\sum x = 45$, $\sum x^2 = 227$, $\sum x^3 = 1359$, $\sum (x - \bar{x})^2 = 58.250$, and
$\sum (x - \bar{x})^3 = 70.875$.
b. the mean:
$\bar{x} = \frac{\sum x}{n} = \frac{45}{12} = 3.75$
c. the median (hint: put in order first!):
To get a measure of position first use position $= p(n+1) = .5(13) = 6.5$.
This implies that we want the mean of $x_6$ and $x_7$: $x_{.5} = \frac{3+3}{2} = 3$. We can also use the method
for finding any fractile: position $= p(n+1) = .5(13) = 6.5 = a.b$. From this we get $a = 6$ and
$.b = .5$. Now use $x_{1-p} = x_a + .b(x_{a+1} - x_a)$. So $x_{.50} = x_6 + .5(x_7 - x_6) = 3 + .5(3 - 3) = 3$. The
two ways of finding the median of ungrouped data always give identical results.
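The $a.b$ method for any fractile of ungrouped data can be sketched as a small Python function. The name `fractile` is hypothetical, and the sketch assumes the position $p(n+1)$ falls between the first and last observations; it raises an error otherwise.

```python
def fractile(p, data):
    """Value at position p(n+1) in the ordered data, by interpolation."""
    ordered = sorted(data)
    n = len(ordered)
    position = p * (n + 1)   # a.b
    a = int(position)        # whole part a
    b = position - a         # fractional part .b
    if a < 1 or a >= n:
        raise ValueError("position falls outside the data")
    # x_a + .b(x_{a+1} - x_a); Python lists are 0-based, so x_a is ordered[a - 1].
    return ordered[a - 1] + b * (ordered[a] - ordered[a - 1])


sample = [1, 2, 4, 5, 6, 3, 3, 7, 8, 3, 1, 2]
median = fractile(0.5, sample)
print(median)
```

For the G4 sample the position is 6.5, so the function averages the sixth and seventh ordered values.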
d. the mode: The mode is 3, since that appears most.
e. the variance:
i) Computational Formula $s^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1} = \frac{227 - 12(3.75)^2}{11} = 5.29545$
ii) Definitional Formula $s^2 = \frac{\sum (x - \bar{x})^2}{n-1} = \frac{58.2500}{11} = 5.29545$
You need only one of these two. I strongly recommend the first one.
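Both variance formulas can be confirmed to agree on the G4 sample. This is a minimal sketch; the names `var_computational` and `var_definitional` are chosen here for illustration.

```python
sample = [1, 2, 4, 5, 6, 3, 3, 7, 8, 3, 1, 2]
n = len(sample)
mean = sum(sample) / n

# i) Computational formula: (sum of x^2 - n * mean^2) / (n - 1).
var_computational = (sum(x**2 for x in sample) - n * mean**2) / (n - 1)

# ii) Definitional formula: sum of squared deviations / (n - 1).
var_definitional = sum((x - mean) ** 2 for x in sample) / (n - 1)

print(round(var_computational, 5), round(var_definitional, 5))
```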
f. the interquartile range:
First Quartile position $= p(n+1) = .25(13) = 3.25$. From this we get $a = 3$ and
$.b = .25$. Now use $x_{1-p} = x_a + .b(x_{a+1} - x_a)$. So $Q_1 = x_{.75} = x_3 + .25(x_4 - x_3)$
$= 2 + .25(2 - 2) = 2$.
Third Quartile position $= p(n+1) = .75(13) = 9.75$. From this we get $a = 9$ and
$.b = .75$. Now use $x_{1-p} = x_a + .b(x_{a+1} - x_a)$. So $Q_3 = x_{.25} = x_9 + .75(x_{10} - x_9)$
$= 5 + .75(6 - 5) = 5.75$.
$IQR = Q_3 - Q_1 = 5.75 - 2 = 3.75$
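The quartiles can be cross-checked with Python's standard library. To the best of my knowledge, `statistics.quantiles` with its default `method='exclusive'` places the quartiles at positions $i(n+1)/4$ in the ordered data and interpolates linearly, which is the same $p(n+1)$ rule used above; treat that correspondence as an assumption to verify against the library documentation.

```python
import statistics

sample = [1, 2, 4, 5, 6, 3, 3, 7, 8, 3, 1, 2]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(sample, n=4, method="exclusive")
iqr = q3 - q1

print(q1, q2, q3, iqr)
```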
g. the standard deviation: $s = \sqrt{\text{variance}} = \sqrt{5.29545} = 2.30119$.
Note also the coefficient of variation $C = \frac{\text{std. deviation}}{\text{mean}} = \frac{2.30119}{3.75} = 0.6137$.
h. a statistic showing skewness. There are four possibilities:
i) Computational Formula for Skewness $k_3 = \frac{n}{(n-1)(n-2)}\left[\sum x^3 - 3\bar{x}\sum x^2 + 2n\bar{x}^3\right]$
$= \frac{12}{(11)(10)}\left[1359 - 3(3.75)(227) + 2(12)(3.75)^3\right]$
$= \frac{12}{(11)(10)}\left[1359 - 2553.75 + 1265.625\right] = \frac{12}{(11)(10)}(70.875) = 7.73182$.
ii) Definitional Formula for Skewness $k_3 = \frac{n}{(n-1)(n-2)}\sum (x - \bar{x})^3 = \frac{12}{(11)(10)}(70.875) = 7.73182$.
iii) Relative Skewness $g_1 = \frac{k_3}{s^3} = \frac{7.73182}{(2.30119)^3} = 0.6345$.
iv) Pearson’s Measure of Skewness $SK = \frac{3(\text{mean} - \text{mode})}{\text{std. deviation}} = \frac{3(3.75 - 3)}{2.30119} = 0.97775$.
Only one of these four is needed. All of these are positive, indicating skewness to the right.
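The four skewness calculations for the G4 sample can be verified in one short script. This is a sketch only: the variable names are illustrative, and `statistics.mode` is used on the assumption that the sample has a single most frequent value, which is true here.

```python
import math
import statistics

sample = [1, 2, 4, 5, 6, 3, 3, 7, 8, 3, 1, 2]
n = len(sample)
mean = sum(sample) / n
s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
factor = n / ((n - 1) * (n - 2))

# i) Computational formula for k3.
sum_x2 = sum(x**2 for x in sample)
sum_x3 = sum(x**3 for x in sample)
k3_comp = factor * (sum_x3 - 3 * mean * sum_x2 + 2 * n * mean**3)

# ii) Definitional formula for k3.
k3_def = factor * sum((x - mean) ** 3 for x in sample)

# iii) Relative skewness and iv) Pearson's measure using the mode.
g1 = k3_comp / s**3
sk = 3 * (mean - statistics.mode(sample)) / s

print(round(k3_comp, 5), round(k3_def, 5), round(g1, 4), round(sk, 5))
```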