CHAPTER 2: METHODS FOR DESCRIBING SETS OF DATA

KEY TERMS & TOPICS
• describing qualitative data:
  - class (or category), class frequency, class relative frequency
  - summary table (frequency distribution)
  - bar graph
  - pie chart
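
A summary table for qualitative data can be sketched in a few lines of Python. This is only an illustration: the helper name `frequency_table` and the blood-type sample are made up for the example and are not part of the chapter.

```python
from collections import Counter

def frequency_table(labels):
    """Map each class to (class frequency, class relative frequency)."""
    counts = Counter(labels)
    n = len(labels)
    return {cls: (freq, freq / n) for cls, freq in counts.items()}

# Hypothetical qualitative data: blood types of 8 donors (made-up values).
blood_types = ["O", "A", "O", "B", "A", "O", "AB", "O"]
table = frequency_table(blood_types)

# Print one row per class: class, frequency, relative frequency.
for cls, (freq, rel) in sorted(table.items()):
    print(f"{cls:>2}  {freq}  {rel:.3f}")
```

The relative frequencies always sum to 1, which is what a pie chart displays as slices and a bar graph as bar heights.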
• graphical methods for describing quantitative data
  - dot plot
  - stem-and-leaf display
  - histogram and relative frequency histogram
Numerical Descriptive Statistics
• summation notation: $\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$
I. Numerical measures of central tendency for a set of data
• mean
    sample data set:      sample mean      $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
    population data set:  population mean  $\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$
• median (sample median, population median)
    middle value (after sorting)
• mode (sample mode, population mode)
    value occurring most often
• midrange (sample midrange, population midrange)
    $\frac{\min + \max}{2}$
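
The four measures of central tendency can be checked on a small data set. The sketch below assumes Python's standard `statistics` module; the data values are made up, and `midrange` is a helper written for this example because the module does not provide one.

```python
import statistics

def midrange(data):
    """Average of the smallest and largest values: (min + max) / 2."""
    return (min(data) + max(data)) / 2

# Hypothetical sample of 7 observations (made-up values).
data = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(data)      # sum of the values divided by n
median = statistics.median(data)  # middle value after sorting
mode = statistics.mode(data)      # value occurring most often
mid = midrange(data)

print(mean, median, mode, mid)
```

For this sample the sum is 42 with n = 7, so the mean is 6; the median is 5, the mode is 4, and the midrange is (2 + 11)/2 = 6.5.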
II. Numerical measures of variability for a data set
• range (sample or population data set)
    $\max - \min$
• variance
    sample variance:      $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
    population variance:  $\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$
• standard deviation
    sample standard deviation:  $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
    pop. standard deviation:    $\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$
• interquartile range (sample or population data set)
    $Q_3 - Q_1$
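
The variability formulas can be written out directly and compared with library results. This is a minimal sketch on made-up data; the function names are chosen for the example. It assumes Python's `statistics.quantiles` with its default settings for the quartiles, and other textbooks or software may use a different quartile convention and give slightly different values of Q1 and Q3.

```python
import math
import statistics

def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)

def population_variance(data):
    """Sum of squared deviations from the population mean, divided by n."""
    n = len(data)
    mu = sum(data) / n
    return sum((x - mu) ** 2 for x in data) / n

# Hypothetical data set of 7 values (made-up values).
data = [2, 4, 4, 5, 7, 9, 11]

data_range = max(data) - min(data)
s2 = sample_variance(data)
sigma2 = population_variance(data)
s = math.sqrt(s2)
sigma = math.sqrt(sigma2)

# Quartiles under the default convention of statistics.quantiles (an assumption).
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(data_range, s2, sigma2, s, sigma, iqr)
```

Here the squared deviations from the mean 6 sum to 60, so the sample variance is 60/6 = 10 and the population variance is 60/7. Dividing by n - 1 rather than n always makes the sample variance the larger of the two.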
III. Interpreting the standard deviation
• Chebyshev's Rule
Let $k > 1$. For any data set, at least $1 - \frac{1}{k^2}$ of the data falls within $k$ standard deviations of the mean.
• Empirical Rule for Bell-Shaped Data Distributions
For a data set with an approximate bell-shaped distribution,
1. Approximately 68% of the data falls within 1 standard deviation of the mean.
2. Approximately 95% of the data falls within 2 standard deviations of the mean.
3. Approximately 99.7% of the data falls within 3 standard deviations of the mean.
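
The two rules can be compared numerically. The sketch below evaluates the Chebyshev lower bound for a few values of k and counts the fraction of a made-up data set that actually lies within k standard deviations of the mean. The helper names are chosen for this example, and the sample standard deviation is used as an assumption.

```python
import statistics

def chebyshev_bound(k):
    """Chebyshev's lower bound 1 - 1/k^2 on the fraction of data within k sd."""
    if k <= 1:
        raise ValueError("Chebyshev's Rule requires k > 1")
    return 1 - 1 / k**2

def fraction_within(data, k):
    """Observed fraction of data within k sample standard deviations of the mean."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    inside = [x for x in data if abs(x - mean) <= k * s]
    return len(inside) / len(data)

# Hypothetical data set (made-up values).
data = [2, 4, 4, 5, 7, 9, 11]

for k in (1.5, 2, 3):
    print(k, chebyshev_bound(k), fraction_within(data, k))
```

Chebyshev guarantees at least 75% within 2 standard deviations and about 88.9% within 3, for any data set. The Empirical Rule figures (95% and 99.7%) are larger, but they apply only when the distribution is approximately bell-shaped.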
IV. Measures of relative position
• percentiles
A $p$-th percentile $x_p$ is a value such that at most $p/100$ of the data are less than $x_p$ and at least $p/100$ of the data are less than or equal to $x_p$.
$Q_1$ = 25th percentile
median = 50th percentile
$Q_3$ = 75th percentile
Using the empirical distribution function to find percentiles.
• z-score
$z = \frac{x - \text{mean}}{\text{standard deviation}}$
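
One way to find a percentile from the empirical distribution function is to take the smallest data value whose cumulative relative frequency reaches p/100. The sketch below uses that convention, which satisfies the definition above but is only one of several in common use. The function names and data are made up for the example, and the z-score uses the sample mean and sample standard deviation as an assumption.

```python
import math
import statistics

def percentile(data, p):
    """Smallest data value x with (count of values <= x) / n >= p / 100."""
    xs = sorted(data)
    n = len(xs)
    k = max(1, math.ceil(p * n / 100))  # rank of the percentile in sorted order
    return xs[k - 1]

def z_score(x, data):
    """Number of sample standard deviations that x lies from the sample mean."""
    return (x - statistics.mean(data)) / statistics.stdev(data)

# Hypothetical data set (made-up values).
data = [2, 4, 4, 5, 7, 9, 11]

q1 = percentile(data, 25)
med = percentile(data, 50)
q3 = percentile(data, 75)
z_max = z_score(11, data)

print(q1, med, q3, z_max)
```

With n = 7, the 25th percentile is the 2nd sorted value (4), the 50th is the 4th (5), and the 75th is the 6th (9). The largest value, 11, lies 5 units above the mean 6, so its z-score is 5 divided by the standard deviation.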
• 5-number summary: minimum, $Q_1$, median, $Q_3$, maximum
• Boxplot
• outliers: data that are extremely large or small relative to the rest of the data
Rule of thumb 1. Call a data value $x$ an outlier if
(i) $x < Q_1 - 1.5\,(Q_3 - Q_1)$
or
(ii) $x > Q_3 + 1.5\,(Q_3 - Q_1)$
Rule of thumb 2. Call a data value $x$ an outlier if
its z-score is less than $-2$ or greater than $2$.
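
Both rules of thumb, together with the 5-number summary that a boxplot draws, can be sketched as follows. The data are made up, the function names are chosen for this example, and the quartiles come from Python's `statistics.quantiles` with default settings, which is an assumption: a different quartile convention may move the fences slightly.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, med, q3 = statistics.quantiles(data, n=4)
    return (min(data), q1, med, q3, max(data))

def iqr_outliers(data):
    """Rule of thumb 1: values beyond 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lower or x > upper]

def z_outliers(data):
    """Rule of thumb 2: values whose z-score is below -2 or above 2."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (an assumption)
    return [x for x in data if abs((x - mean) / s) > 2]

# Hypothetical data set with one unusually large value (made-up values).
data = [2, 4, 4, 5, 7, 9, 11, 30]

print(five_number_summary(data))
print(iqr_outliers(data))
print(z_outliers(data))
```

In this example both rules flag the value 30, but the two rules need not agree in general, since one is based on quartiles and the other on the mean and standard deviation.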
• scatterplot of bivariate data
• misleading descriptive statistics