Numerical Measures

Descriptive Statistics: Numerical Measures
Distribution
Chapter 3
BA 201
Slide 1
DISTRIBUTION
Slide 2
Measures of Distribution Shape,
Relative Location, and Detecting Outliers





Distribution Shape
z-Scores
Chebyshev’s Theorem
Empirical Rule
Detecting Outliers
Slide 3
Distribution Shape: Skewness

An important measure of the shape of a distribution
is called skewness.

The formula for the skewness of sample data is

$$ \text{Skewness} = \frac{n}{(n-1)(n-2)} \sum \left( \frac{x_i - \bar{x}}{s} \right)^3 $$
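As a rough illustration of the formula above, the Python sketch below computes sample skewness step by step. The function name sample_skewness and the two small data sets are made up for this example and do not come from the slides.

```python
import math

def sample_skewness(data):
    """Sample skewness: n / ((n-1)(n-2)) * sum(((x_i - mean) / s) ** 3)."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(data) / n
    # Sample standard deviation s (divides by n - 1).
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    # Sum of the cubed standardized deviations.
    cubed = sum(((x - mean) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * cubed

symmetric = [1, 2, 3, 4, 5]          # hypothetical data, mirrored around 3
long_right_tail = [1, 2, 2, 3, 12]   # hypothetical data with one large value

print(sample_skewness(symmetric))        # approximately 0
print(sample_skewness(long_right_tail))  # positive
```

Cubing keeps the sign of each deviation, so a long right tail pushes the sum positive and a long left tail pushes it negative.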
Slide 4
Distribution Shape: Skewness
Symmetric (not skewed)
• Skewness is zero.
• Mean and median are equal.
[Histogram: relative frequency (0 to .35), Skewness = 0]
Slide 5
Distribution Shape: Skewness
Moderately Skewed Left
• Skewness is negative.
• Mean will usually be less than the median.
[Histogram: relative frequency (0 to .35), Skewness = -.31]
Slide 6
Distribution Shape: Skewness
Moderately Skewed Right
• Skewness is positive.
• Mean will usually be more than the median.
[Histogram: relative frequency (0 to .35), Skewness = .31]
Slide 7
Distribution Shape: Skewness

Highly Skewed Right
• Skewness is positive (often above 1.0).
• Mean will usually be more than the median.
[Histogram: relative frequency (0 to .35), Skewness = 1.25]
Slide 8
Distribution Shape: Skewness
Apartment Rents
425  430  430  435  435  435  435  435  440  440
440  440  440  445  445  445  445  445  450  450
450  450  450  450  450  460  460  460  465  465
465  470  470  472  475  475  475  480  480  480
480  485  490  490  490  500  500  500  500  510
510  515  525  525  525  535  549  550  570  570
575  575  580  590  600  600  600  600  615  615
Slide 9
Distribution Shape: Skewness
Apartment Rents
[Histogram: relative frequency (0 to .35), Skewness = 0.92]
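The skewness reported for the apartment rents can be checked with a short Python sketch. It uses the 70 rents listed above and the standard library statistics module; the variable names are chosen for this example only.

```python
import statistics

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

n = len(rents)
mean = statistics.mean(rents)
median = statistics.median(rents)
s = statistics.stdev(rents)  # sample standard deviation (n - 1 divisor)

# Same formula as on the skewness slide.
skewness = n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in rents)

print(f"mean = {mean:.2f}, median = {median:.2f}, skewness = {skewness:.2f}")
# mean = 490.80, median = 475.00, skewness = 0.92
```

The mean is above the median, which matches the pattern described for right-skewed data.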
Slide 10
z-Scores
The z-score is often called the standardized value.
It denotes the number of standard deviations a data
value xi is from the mean.
$$ z_i = \frac{x_i - \bar{x}}{s} $$
Slide 11
z-Scores
An observation’s z-score is a measure of the relative location of the observation in a data set.
[Figure: data values along an axis centered at the mean x̄; z-score < 0 to the left of x̄, z-score = 0 at x̄, z-score > 0 to the right]
Slide 12
z-Scores
Apartment Rents
• z-Score of Smallest Value (425)
$$ z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20 $$
Standardized Values for Apartment Rents
-1.20  -1.11  -1.11  -1.02  -1.02  -1.02  -1.02  -1.02  -0.93  -0.93
-0.93  -0.93  -0.93  -0.84  -0.84  -0.84  -0.84  -0.84  -0.75  -0.75
-0.75  -0.75  -0.75  -0.75  -0.75  -0.56  -0.56  -0.56  -0.47  -0.47
-0.47  -0.38  -0.38  -0.34  -0.29  -0.29  -0.29  -0.20  -0.20  -0.20
-0.20  -0.11  -0.01  -0.01  -0.01   0.17   0.17   0.17   0.17   0.35
 0.35   0.44   0.62   0.62   0.62   0.81   1.06   1.08   1.45   1.45
 1.54   1.54   1.63   1.81   1.99   1.99   1.99   1.99   2.27   2.27
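The standardized values above can be reproduced from the rent data. The Python sketch below is one way to do it, assuming the 70 rents listed earlier; the variable names are illustrative only.

```python
import statistics

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

mean = statistics.mean(rents)
s = statistics.stdev(rents)  # sample standard deviation

# z_i = (x_i - mean) / s for every rent.
z_scores = [(x - mean) / s for x in rents]

print(f"smallest rent: z = {z_scores[0]:.2f}")   # -1.20
print(f"largest rent:  z = {z_scores[-1]:.2f}")  # 2.27
```

Because the rents are sorted, the first and last z-scores are the most extreme values in the table.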
Slide 13
PRACTICE
Z-SCORES
Slide 14
Practice #6 – z-Scores
x̄ = 13
s = 7.4

$$ z_i = \frac{x_i - \bar{x}}{s} $$

x_i    x_i - x̄    z-Score
3      -10
7      -6
11     -2
16     3
18     5
23     10
Slide 15
Chebyshev’s Theorem
At least (1 - 1/k^2) of the items in any data set will be
within k standard deviations of the mean, where k is
any value greater than 1.

Within k standard deviations of mean    % of data values
k = 2                                   75%
k = 3                                   89%
k = 4                                   94%
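The percentages in the table follow directly from the bound 1 - 1/k^2. A minimal Python sketch, with a function name chosen only for this example:

```python
def chebyshev_lower_bound(k):
    """Minimum share of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

for k in (2, 3, 4):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.0%}")
# k = 2: at least 75%
# k = 3: at least 89%
# k = 4: at least 94%
```

The bound is a guaranteed minimum for any data set, so the observed share is often much higher.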
Slide 16
Chebyshev’s Theorem
Apartment Rents
Let k = 1.5 with x̄ = 490.80 and s = 54.74
At least (1 - 1/(1.5)^2) = 1 - 0.44 = 0.56 or 56%
of the rent values must be between
x̄ - k(s) = 490.80 - 1.5(54.74) = 409
and
x̄ + k(s) = 490.80 + 1.5(54.74) = 573
(Actually, 86% of the rent values
are between 409 and 573.)
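The comparison between the guaranteed 56% and the observed 86% can be checked directly. The sketch below assumes the 70 apartment rents listed earlier and counts how many fall inside the interval; the variable names are illustrative.

```python
import statistics

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

k = 1.5
mean = statistics.mean(rents)
s = statistics.stdev(rents)  # sample standard deviation

lower = mean - k * s
upper = mean + k * s
within = sum(lower <= x <= upper for x in rents)  # count of rents in the interval
guaranteed = 1 - 1 / k ** 2                       # Chebyshev's lower bound

print(f"interval: {lower:.0f} to {upper:.0f}")          # interval: 409 to 573
print(f"guaranteed: at least {guaranteed:.0%}")         # guaranteed: at least 56%
print(f"observed: {within / len(rents):.0%}")           # observed: 86%
```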
Slide 17
Empirical Rule
When data approximate a bell-shaped distribution, the
empirical rule can be used to determine the percentage of
data values that must be within a specified number of
standard deviations of the mean.
Within … of the mean          % of data values
+/- 1 standard deviation      68.26%
+/- 2 standard deviations     95.44%
+/- 3 standard deviations     99.72%
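For a perfectly normal (bell-shaped) distribution, these percentages can be computed from the error function in Python's math module. The sketch below is only a check of the table; the function name is made up for this example. The results differ from the slide's figures by 0.01, which is probably a rounding difference from table lookups.

```python
import math

def normal_within(k):
    """P(|Z| < k) for a standard normal variable Z."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within +/- {k} standard deviation(s): {normal_within(k):.2%}")
# within +/- 1 standard deviation(s): 68.27%
# within +/- 2 standard deviation(s): 95.45%
# within +/- 3 standard deviation(s): 99.73%
```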
Slide 18
Empirical Rule
[Figure: bell-shaped curve over x, marked at μ – 3σ, μ – 2σ, μ – 1σ, μ, μ + 1σ, μ + 2σ, μ + 3σ; 68.26% of values within ±1σ, 95.44% within ±2σ, 99.72% within ±3σ]
Slide 19
PRACTICE
CHEBYSHEV’S THEOREM
AND EMPIRICAL RULE
Slide 20
Practice #7 - Chebyshev’s Theorem
x̄ = 1200
s = 110
At least what percentage of the items is within k standard deviations of the mean?
k = 1.25
k = 3.5
Slide 21
Practice #7 – Empirical Rule
x̄ = 1200
s = 110
What is the lower bound for 2 standard deviations?
The upper bound? What percentage of the items falls
between these bounds?
Slide 22
Detecting Outliers
An outlier is an unusually small or unusually large value in a data set.
A data value with a z-score less than -3 or greater than +3 might be considered an outlier.
It might be:
• an incorrectly recorded data value
• a data value that was incorrectly included in the data set
• a correctly recorded data value that belongs in the data set
Slide 23
Detecting Outliers
Apartment Rents
• The most extreme z-scores are -1.20 and 2.27
• Using |z| > 3 as the criterion for an outlier, there
are no outliers in this data set.
Standardized Values for Apartment Rents
-1.20  -1.11  -1.11  -1.02  -1.02  -1.02  -1.02  -1.02  -0.93  -0.93
-0.93  -0.93  -0.93  -0.84  -0.84  -0.84  -0.84  -0.84  -0.75  -0.75
-0.75  -0.75  -0.75  -0.75  -0.75  -0.56  -0.56  -0.56  -0.47  -0.47
-0.47  -0.38  -0.38  -0.34  -0.29  -0.29  -0.29  -0.20  -0.20  -0.20
-0.20  -0.11  -0.01  -0.01  -0.01   0.17   0.17   0.17   0.17   0.35
 0.35   0.44   0.62   0.62   0.62   0.81   1.06   1.08   1.45   1.45
 1.54   1.54   1.63   1.81   1.99   1.99   1.99   1.99   2.27   2.27
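The |z| > 3 criterion is easy to apply in code. The Python sketch below uses a helper named flag_outliers, which is an assumption for this example, together with the 70 apartment rents. The extra rent of 1500 is a hypothetical value added only to show what a flagged observation looks like.

```python
import statistics

def flag_outliers(data, threshold=3.0):
    """Return the values whose |z-score| exceeds the threshold."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation
    return [x for x in data if abs((x - mean) / s) > threshold]

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

print(flag_outliers(rents))           # [] (no outliers, as stated above)
print(flag_outliers(rents + [1500]))  # [1500] (hypothetical extreme rent)
```

A flagged value is only a candidate outlier. As noted earlier, it may still be a correctly recorded value that belongs in the data set.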
Slide 24