Chapter 3
Descriptive Statistics: Numerical Methods
• Measures of Variability
• Measures of Relative Location and Detecting Outliers
• Exploratory Data Analysis
• Measures of Association Between Two Variables
© 2003 South-Western/Thomson LearningTM
Slide 1
Measures of Variability







It is often desirable to consider measures of variability
(dispersion), as well as measures of location.
For example, in choosing supplier A or supplier B we
might consider not only the average delivery time for
each, but also the variability in delivery time for each.
Range
Inter-quartile Range
Variance
Standard Deviation
Coefficient of Variation
Slide 2
Measures of Variation
[Diagram: Variation branches into Range, Interquartile Range, Variance (Population Variance, Sample Variance), Standard Deviation (Population Standard Deviation, Sample Standard Deviation), and Coefficient of Variation.]
Slide 3
Variation

Measures of variation give information on the
spread or variability of the data values.
[Figure: distributions with the same center but different variation]
Slide 4
Range


Simplest measure of variation
Difference between the largest and the smallest
observations:
$\text{Range} = x_{\text{maximum}} - x_{\text{minimum}}$
Example (observations plotted on a number line with ticks from 0 to 14):
Range = 14 - 1 = 13
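The range calculation can be sketched in a few lines of Python. The observations below are hypothetical, chosen only so that the smallest value is 1 and the largest is 14; the function name `data_range` is likewise an illustrative choice.

```python
def data_range(values):
    """Return the range: largest observation minus smallest observation."""
    return max(values) - min(values)


# Hypothetical observations (assumption): smallest value 1, largest value 14.
observations = [1, 3, 4, 7, 7, 9, 12, 14]

print(data_range(observations))  # 13
```

Because only the two extreme values enter the calculation, a single unusual observation can change the range substantially.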
Slide 5
Example: Apartment Rents

Range
Range = largest value - smallest value
Range = 615 - 425 = 190
425  430  430  435  435  435  435  435  440  440
440  440  440  445  445  445  445  445  450  450
450  450  450  450  450  460  460  460  465  465
465  470  470  472  475  475  475  480  480  480
480  485  490  490  490  500  500  500  500  510
510  515  525  525  525  535  549  550  570  570
575  575  580  590  600  600  600  600  615  615
Slide 6
Interquartile Range


The interquartile range of a data set is the difference
between the third quartile and the first quartile.
It is the range for the middle 50% of the data.
Slide 7
Example: Apartment Rents

Interquartile Range
3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80
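The quartiles above can be reproduced with a short Python sketch. The slides do not state which quartile rule is used, so the position rule below (compute i = (p/100)·n; round up if i is not an integer, otherwise average positions i and i + 1) is an assumption; it is chosen because it gives Q1 = 445 and Q3 = 525 for the 70 rents. Other conventions, including the defaults of common libraries, may give slightly different quartiles.

```python
import math

# The 70 apartment rents from the example.
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def percentile(values, p):
    """p-th percentile (0 < p < 100) using the assumed position rule."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    data = sorted(values)
    i = p * len(data) / 100
    if i != int(i):
        # Not an integer: round up and take that (1-based) position.
        return data[math.ceil(i) - 1]
    # Integer: average the values in positions i and i + 1 (1-based).
    i = int(i)
    return (data[i - 1] + data[i]) / 2


q1 = percentile(RENTS, 25)
q3 = percentile(RENTS, 75)
print(q1, q3, q3 - q1)  # 445 525 80
```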
Slide 8
Variance


The variance is a measure of variability that utilizes
all the data.
It is based on the difference between the value of
each observation ($x_i$) and the mean ($\bar{x}$ for a sample, $\mu$
for a population).
Slide 9
Variance


The variance is the average of the squared differences
between each data value and the mean.
If the data set is a sample, the variance is denoted by $s^2$.
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
If the data set is a population, the variance is denoted by $\sigma^2$.
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
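As a minimal sketch, the two variance formulas can be written directly in Python and cross-checked against the standard library's `statistics.variance` and `statistics.pvariance`. The function names and the small data set are illustrative choices, not part of the slides' notation.

```python
import statistics


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def population_variance(values):
    """Sum of squared deviations from the population mean, divided by N."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N


# Small illustrative data set (mean 16).
data = [10, 12, 14, 15, 17, 18, 18, 24]

print(sample_variance(data), statistics.variance(data))
print(population_variance(data), statistics.pvariance(data))
```

The only difference between the two functions is the divisor, so the sample variance is always slightly larger than the population variance computed from the same values.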
Slide 10
Variance for Grouped Data

Sample Data
$s^2 = \frac{\sum f_i (X_i - \bar{x})^2}{n - 1}$
Population Data
$\sigma^2 = \frac{\sum f_i (X_i - \mu)^2}{N}$
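The grouped-data formulas can be sketched as follows. The class midpoints and frequencies are hypothetical, and the sketch assumes that $X_i$ is the class midpoint, that $n$ (or $N$) is the sum of the frequencies, and that the mean is the frequency-weighted mean of the midpoints; the slide itself does not spell these out.

```python
def grouped_mean(midpoints, freqs):
    """Frequency-weighted mean of the class midpoints (assumed definition)."""
    return sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)


def grouped_variance(midpoints, freqs, sample=True):
    """Grouped-data variance; divides by n - 1 for a sample, N for a population."""
    total = sum(freqs)
    mean = grouped_mean(midpoints, freqs)
    ss = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs))
    return ss / (total - 1) if sample else ss / total


# Hypothetical classes (assumption): midpoints and their frequencies.
midpoints = [5, 15, 25, 35]
freqs = [2, 5, 8, 5]

print(grouped_mean(midpoints, freqs))                    # 23.0
print(grouped_variance(midpoints, freqs, sample=False))  # 86.0
print(grouped_variance(midpoints, freqs, sample=True))
```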
Slide 11
Standard Deviation




Most commonly used measure of variation
Shows variation about the mean
The standard deviation of a data set is the positive
square root of the variance.
If the data set is a sample, the standard deviation is denoted $s$.
$s = \sqrt{s^2}$
If the data set is a population, the standard deviation is denoted $\sigma$ (sigma).
$\sigma = \sqrt{\sigma^2}$
Slide 12
Calculation Example: Sample Standard Deviation
Sample data ($X_i$): 10 12 14 15 17 18 18 24
$n = 8$, mean $\bar{x} = 16$
$s = \sqrt{\frac{(10 - \bar{x})^2 + (12 - \bar{x})^2 + (14 - \bar{x})^2 + \cdots + (24 - \bar{x})^2}{n - 1}}$
$= \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}}$
$= \sqrt{\frac{130}{7}} \approx 4.3095$
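The calculation for these eight values can be recomputed step by step in Python. This is a sketch that follows the sample formula (divisor n - 1); the variable names are illustrative.

```python
import math

data = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(data)
mean = sum(data) / n                                 # 16.0
squared_devs = [(x - mean) ** 2 for x in data]       # 36, 16, 4, 1, 1, 4, 4, 64
sum_sq = sum(squared_devs)                           # 130.0
s = math.sqrt(sum_sq / (n - 1))

print(mean, sum_sq, round(s, 4))  # 16.0 130.0 4.3095
```

Listing the squared deviations individually makes it easy to check the numerator by hand before taking the square root.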
Slide 13
Coefficient of Variation

Measures relative variation

Always in percentage (%)

Shows variation relative to mean

Is used to compare two or more sets of data measured
in different units
Population
$CV = \left(\frac{\sigma}{\mu}\right) \times 100\%$
Sample
$CV = \left(\frac{s}{\bar{x}}\right) \times 100\%$
Slide 14
Example: Apartment Rents

Variance
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2{,}996.16$
Standard Deviation
$s = \sqrt{s^2} = \sqrt{2996.16} \approx 54.74$
Coefficient of Variation
$\frac{s}{\bar{x}} \times 100 = \frac{54.74}{490.80} \times 100 = 11.15$
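These three figures can be checked against the 70 rents listed earlier in the example. The sketch below uses Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample (n - 1) divisor; the variable names are illustrative.

```python
import statistics

# The 70 apartment rents from the example.
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

mean = statistics.mean(RENTS)
variance = statistics.variance(RENTS)   # sample variance, divisor n - 1
std_dev = statistics.stdev(RENTS)       # sample standard deviation
cv = std_dev / mean * 100               # coefficient of variation, in percent

print(round(mean, 2), round(variance, 2), round(std_dev, 2), round(cv, 2))
# 490.8 2996.16 54.74 11.15
```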
Slide 15
Measures of Relative Location
and Detecting Outliers


z-Scores
Detecting Outliers
Slide 16
z-Scores


The z-score is often called the standardized value.
It denotes the number of standard deviations a data
value xi is from the mean.
$z_i = \frac{x_i - \bar{x}}{s}$



A data value less than the sample mean will have a
z-score less than zero.
A data value greater than the sample mean will have
a z-score greater than zero.
A data value equal to the sample mean will have a
z-score of zero.
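The z-score definition and the three sign properties can be illustrated with a short Python sketch. The function name `z_scores` and the small sample are illustrative; the sample standard deviation (divisor n - 1) is used, matching the notation $s$.

```python
import statistics


def z_scores(values):
    """Standardize each value: (x - sample mean) / sample standard deviation."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - mean) / s for x in values]


# Small illustrative sample with mean 16.
sample = [10, 12, 14, 15, 17, 18, 18, 24]

for x, z in zip(sample, z_scores(sample)):
    # Values below the mean print a negative z, values above it a positive z.
    print(x, round(z, 2))
```

A useful sanity check is that the z-scores of any data set sum to zero, since the deviations from the mean do.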
Slide 17
Example: Apartment Rents

z-Score of Smallest Value (425)
$z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20$
Standardized Values for Apartment Rents
-1.20  -1.11  -1.11  -1.02  -1.02  -1.02  -1.02  -1.02  -0.93  -0.93
-0.93  -0.93  -0.93  -0.84  -0.84  -0.84  -0.84  -0.84  -0.75  -0.75
-0.75  -0.75  -0.75  -0.75  -0.75  -0.56  -0.56  -0.56  -0.47  -0.47
-0.47  -0.38  -0.38  -0.34  -0.29  -0.29  -0.29  -0.20  -0.20  -0.20
-0.20  -0.11  -0.01  -0.01  -0.01   0.17   0.17   0.17   0.17   0.35
 0.35   0.44   0.62   0.62   0.62   0.81   1.06   1.08   1.45   1.45
 1.54   1.54   1.63   1.81   1.99   1.99   1.99   1.99   2.27   2.27
Slide 18
Detecting Outliers




An outlier is an unusually small or unusually large
value in a data set.
A data value with a z-score less than -3 or greater
than +3 might be considered an outlier.
It might be an incorrectly recorded data value.
It might be a data value that was incorrectly included
in the data set.
Slide 19
Example: Apartment Rents

Detecting Outliers
The most extreme z-scores are -1.20 and 2.27.
Using |z| > 3 as the criterion for an outlier,
there are no outliers in this data set.
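The outlier check can be sketched in Python by standardizing the 70 rents and flagging any value with |z| > 3. The threshold of 3 is the criterion stated above; the variable names are illustrative.

```python
import statistics

# The 70 apartment rents from the example.
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

mean = statistics.mean(RENTS)
s = statistics.stdev(RENTS)  # sample standard deviation
z_values = [(x - mean) / s for x in RENTS]

# Flag any rent whose z-score is more than 3 standard deviations from the mean.
outliers = [x for x, z in zip(RENTS, z_values) if abs(z) > 3]

print(round(min(z_values), 2), round(max(z_values), 2))  # -1.2 2.27
print(outliers)  # []
```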
Slide 20
Exploratory Data Analysis

Five-Number Summary
Slide 21
Five-Number Summary





Smallest Value
First Quartile
Median
Third Quartile
Largest Value
Slide 22
Example: Apartment Rents

Five-Number Summary
Lowest Value = 425
First Quartile = 445
Median = 475
Third Quartile = 525
Largest Value = 615
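A five-number summary for the rents can be sketched in Python. The quartile rule is an assumption (position i = (p/100)·n, rounded up when i is not an integer, otherwise the average of positions i and i + 1); it is used here because it reproduces Q1 = 445 and Q3 = 525 from the interquartile range example. Function names are illustrative.

```python
import math

# The 70 apartment rents from the example.
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def percentile(data, p):
    """p-th percentile of already sorted data, using the assumed position rule."""
    i = p * len(data) / 100
    if i != int(i):
        return data[math.ceil(i) - 1]       # round up, 1-based position
    i = int(i)
    return (data[i - 1] + data[i]) / 2      # average positions i and i + 1


def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest)."""
    data = sorted(values)
    return (data[0], percentile(data, 25), percentile(data, 50),
            percentile(data, 75), data[-1])


print(five_number_summary(RENTS))  # (425, 445, 475.0, 525, 615)
```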
Slide 23
Measures of Association
between Two Variables


Covariance
Correlation Coefficient
Slide 24
Covariance



The covariance is a measure of the linear association
between two variables.
Positive values indicate a positive relationship.
Negative values indicate a negative relationship.
Slide 25
Covariance

If the data sets are samples, the covariance is denoted by $s_{xy}$.
$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
If the data sets are populations, the covariance is denoted by $\sigma_{xy}$.
$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$
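Both covariance formulas can be sketched in one Python function. The paired data below are hypothetical and serve only to show the arithmetic; the function name and the `sample` flag are illustrative choices.

```python
def covariance(xs, ys, sample=True):
    """Covariance of paired data; divisor n - 1 for a sample, N for a population."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n - 1) if sample else cross / n


# Hypothetical paired observations (assumption).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(covariance(x, y))                 # 1.5  (sample)
print(covariance(x, y, sample=False))   # 1.2  (population)
```

The positive result reflects that larger x values tend to pair with larger y values in this made-up data; the magnitude depends on the units of both variables.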
Slide 26
Correlation Coefficient




The coefficient can take on values between -1 and +1.
Values near -1 indicate a strong negative linear
relationship.
Values near +1 indicate a strong positive linear
relationship.
If the data sets are samples, the coefficient is $r_{xy}$.
$r_{xy} = \frac{s_{xy}}{s_x s_y}$
If the data sets are populations, the coefficient is $\rho_{xy}$.
$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$
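The sample correlation coefficient can be sketched in Python by dividing the sample covariance by the product of the two sample standard deviations. The paired data are hypothetical and the function names are illustrative.

```python
import math


def sample_covariance(xs, ys):
    """Sample covariance with divisor n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def sample_std(values):
    """Sample standard deviation with divisor n - 1."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def correlation(xs, ys):
    """Sample correlation: covariance divided by the product of the std devs."""
    return sample_covariance(xs, ys) / (sample_std(xs) * sample_std(ys))


# Hypothetical paired observations (assumption).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(round(correlation(x, y), 4))              # 0.7746
print(round(correlation(x, [2 * v for v in x]), 4))  # 1.0 (perfect positive line)
```

Unlike the covariance, the coefficient is unit-free, which is why it can be compared across data sets measured on different scales.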
Slide 27