Lecture Note 4

ST361: Ch2.2 & 2.3 Numerical Summary Measures of Variability for Data
Topics:
• Measures of variability (spread):
  o Deviation
  o Variance/standard deviation
  o Interquartile range
• Statistical definition of outliers: the 1.5 x IQR criterion for outliers
----------------------------------------------------------------------------------------------------------
Measures of Variability (Spread)
(1) Deviation: the difference between the observation and the mean
• Deviation of $x_i$, denoted as $d_i$, is ______________________
• What is the mean deviation ($\bar{d}$)?
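A minimal Python sketch of these deviations, using the nine homework grades from the example later in these notes. It assumes each deviation is computed as the observation minus the sample mean, as in the definition in (1).

```python
# Deviations d_i = x_i - mean for the nine homework grades used in these notes.
grades = [86, 85, 81, 82, 84, 84, 83, 84, 87]

mean = sum(grades) / len(grades)
deviations = [x - mean for x in grades]
mean_deviation = sum(deviations) / len(deviations)

print(mean)            # sample mean of the grades
print(deviations)      # one deviation per observation
print(mean_deviation)  # average of the deviations
```

The positive and negative deviations cancel, which is one reason the variance works with squared deviations instead.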
(2) Variance / Standard Deviation
• Sample Variance: the sample variance $s^2$ of n observations is ____
________________________________________________________
$$ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} $$
► An alternative form convenient for calculation
$s^2 =$
• Sample Standard Deviation (SD): the sample standard deviation, denoted by ___, is
_______________________________________________________
$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} $$
Ex. Grades of 9 students on a HW assignment: 86, 85, 81, 82, 84, 84, 83, 84, 87.
$\bar{x} = 84$. What is the SD? (Also know that $\sum_{i=1}^{9} x_i^2 = 63532$.)
► Interpretation:
• Comments:
1. SD should be used as a measure of spread only when __________ is used as the
measure of center
2. SD=0 implies ________________________________________________
Why?
3. Like the mean, the SD is strongly influenced by outliers
Ex. If a coding error records 87 as 870, then $\bar{x}$ = 171 and the SD becomes about 262.
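The effect of the coding error can be reproduced with Python's standard `statistics` module. This is a sketch only: `miscoded` is a hypothetical name for the data set in which 87 has been replaced by 870.

```python
import statistics

grades = [86, 85, 81, 82, 84, 84, 83, 84, 87]
# Hypothetical miscoded data set: the grade 87 is recorded as 870.
miscoded = [870 if x == 87 else x for x in grades]

for label, data in [("original", grades), ("miscoded", miscoded)]:
    # statistics.stdev is the sample SD (divisor n - 1).
    print(label, statistics.mean(data), round(statistics.stdev(data), 2))
```

A single bad value moves both the mean and the SD far away from anything typical of the other eight grades.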
(3) Interquartile Range (IQR)
(a) Quartiles:
To find quartiles:
1. Sort the data and divide the data points into 2 halves (if there is an odd number of
observations, _________________________ in each half).
2. Lower quartile: Q1 = _____________________________
3. Upper quartile: Q3 = ____________________________
(b) Inter-Quartile Range (IQR)
• IQR =
• Interpretation of IQR:
Ex. Grades of 9 students on a HW assignment: 86,85,81,82,84,84,83,84,87.
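One way to sketch the median-of-halves procedure in Python is shown below. The function name `quartiles` and the flag `include_median` are hypothetical. How the middle value is handled when the number of observations is odd is left blank in these notes, so the sketch offers both conventions instead of assuming one, and takes the IQR as Q3 - Q1.

```python
from statistics import median

def quartiles(data, include_median=False):
    """Return (Q1, median, Q3) using the median-of-halves method.

    include_median controls what happens to the middle value when the
    number of observations is odd. Which choice the course uses is an
    assumption, so both are offered.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 0:
        lower, upper = xs[:half], xs[half:]
    elif include_median:
        lower, upper = xs[:half + 1], xs[half:]
    else:
        lower, upper = xs[:half], xs[half + 1:]
    return median(lower), median(xs), median(upper)

grades = [86, 85, 81, 82, 84, 84, 83, 84, 87]
for flag in (False, True):
    q1, med, q3 = quartiles(grades, include_median=flag)
    print("include_median =", flag, "Q1 =", q1, "Q3 =", q3, "IQR =", q3 - q1)
```

The two conventions give slightly different quartiles for these nine grades, so it matters to use the rule stated in class.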
Ex. Rainfall in NC in the past 15 months
1 | 0
2 | 25
3 | 45
4 | 11667
5 | 449
6 | 0
7 |
8 | 2
Stem: ones digit
Leaf: tenths digit
Find the quartiles and the IQR.
Remark 1: The 1.5 x IQR Criterion for Outliers
• An observation is called an outlier if _____________________________
___________________________________________________________
• An extreme outlier indicates ____________________________________
_________________________________________________________
Ex. Use the 1.5xIQR rule to check if there is any outlier in the Rainfall dataset.
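A hedged Python sketch of this check follows. It assumes the usual statement of the rule (flag any value below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR) and reads the 15 rainfall values off the stem-and-leaf plot. The helper `iqr_fences` and its `include_median` flag are hypothetical names; the flag covers both ways of splitting an odd-sized data set.

```python
from statistics import median

# Rainfall values read off the stem-and-leaf plot (stem = ones, leaf = tenths).
rainfall = [1.0, 2.2, 2.5, 3.4, 3.5, 4.1, 4.1, 4.6, 4.6, 4.7,
            5.4, 5.4, 5.9, 6.0, 8.2]

def iqr_fences(data, include_median=False):
    """Return (Q1, Q3, IQR, lower fence, upper fence) by median-of-halves."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    odd = n % 2 == 1
    lower = xs[:half + 1] if (odd and include_median) else xs[:half]
    upper = xs[half:] if (not odd or include_median) else xs[half + 1:]
    q1, q3 = median(lower), median(upper)
    iqr = q3 - q1
    return q1, q3, iqr, q1 - 1.5 * iqr, q3 + 1.5 * iqr

for flag in (False, True):
    q1, q3, iqr, lo, hi = iqr_fences(rainfall, include_median=flag)
    outliers = [x for x in rainfall if x < lo or x > hi]
    print("include_median =", flag, "Q1 =", q1, "Q3 =", q3, "outliers:", outliers)
```

Under either splitting convention, the largest value, 8.2, stays just inside the upper fence.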
Remark 2: The 5-number summary
Remark 3: Standard numerical summaries of a data set include sample size, center, and
spread.
• For a reasonably symmetric distribution with no outliers, use _________________
• For all other situations, use _________________________________________
Remark 4: Change of Unit
1. Adding (or subtracting) a constant to each observation WILL / WILL NOT (choose
one) change the measures of spread, such as SD and IQR.
Ex. If the new score = 60 + the old score, what is the spread of the new score?
Cf. Adding/subtracting WILL / WILL NOT (choose one) change the measures of
center and quartiles.
2. Multiplying each observation by a constant (denoted by a) will ___________ the
measures of spread such as SD and IQR by ________.
Conclusion: If the new unit is aX + b, then the new spread is _______________________
(Recall that the new center is ________________________)
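The behavior of SD and IQR under a change of unit can be explored numerically. The sketch below applies the shift from the score example (b = 60) and a hypothetical scale factor a = 2 to the homework grades. The helper `iqr` uses `statistics.quantiles`, whose default interpolation rule may differ from the hand method in these notes; that does not affect the comparison made here.

```python
import statistics

def iqr(data):
    # Quartiles from the standard library; its convention may differ
    # from the median-of-halves method used by hand in the notes.
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

old = [86, 85, 81, 82, 84, 84, 83, 84, 87]
shifted = [x + 60 for x in old]   # new score = 60 + old score
scaled = [2 * x for x in old]     # hypothetical a = 2, b = 0

for label, data in [("old", old), ("shifted", shifted), ("scaled", scaled)]:
    print(label, statistics.mean(data),
          round(statistics.stdev(data), 3), iqr(data))
```

Comparing the three rows shows which summaries move with b and which respond only to a.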
Ex. Temperatures are recorded in Fahrenheit, with SD $s_F$ and IQR $r_F$. What
are the SD and IQR in Centigrade? Note that $C = (F - 32) \times \frac{5}{9}$.
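As a numerical check on this exercise, the sketch below converts a small set of hypothetical Fahrenheit readings (made up for illustration, not course data) to Centigrade and compares the two spreads. Writing the conversion as C = (5/9)F - 160/9 gives a = 5/9 and b = -160/9 in the aX + b form.

```python
import statistics

def iqr(data):
    # Standard-library quartiles; the convention may differ from the hand method.
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

temps_f = [50, 59, 68, 77, 86]                 # hypothetical readings in Fahrenheit
temps_c = [(f - 32) * 5 / 9 for f in temps_f]  # the same readings in Centigrade

s_f, r_f = statistics.stdev(temps_f), iqr(temps_f)
s_c, r_c = statistics.stdev(temps_c), iqr(temps_c)

print(s_c / s_f, r_c / r_f)  # ratio of Centigrade spread to Fahrenheit spread
```

Both ratios come out as 5/9, so the shift by 32 has no effect on either measure of spread.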