CHAPTER 2: METHODS FOR DESCRIBING SETS OF DATA

KEY TERMS & TOPICS

• describing qualitative data:
    class (or category), class frequency, class relative frequency
    summary table (frequency distribution)
    bar graph
    pie chart
• graphical methods for describing quantitative data:
    dot plot
    stem-and-leaf display
    histogram and relative frequency histogram

Numerical Descriptive Statistics

• summation notation: $\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$

I. Numerical measures of central tendency for a set of data

• sample mean: $\bar{x} = \sum_{i=1}^{n} x_i / n$
  population mean: $\mu = \sum_{i=1}^{n} x_i / n$
• sample median: middle value (after sorting)
  population median
• sample mode: value occurring most often
  population mode
• sample midrange: $(\min + \max)/2$
  population midrange

II. Numerical measures of variability for a data set

• range: $\max - \min$ (same for sample and population data sets)
• sample variance: $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
  population variance: $\sigma^2 = \dfrac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$
• sample standard deviation: $s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$
  population standard deviation: $\sigma = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$
• sample interquartile range: $Q_3 - Q_1$
  population interquartile range

III. Interpreting the standard deviation

• Chebyshev's Rule
  Let $k > 1$. For any data set, at least $1 - \dfrac{1}{k^2}$ of the data falls within $k$ standard deviations of the mean.
• Empirical Rule for Bell-Shaped Data Distributions
  For a data set with an approximately bell-shaped distribution,
  1. Approximately 68% of the data falls within 1 standard deviation of the mean.
  2. Approximately 95% of the data falls within 2 standard deviations of the mean.
  3. Approximately 99.7% of the data falls within 3 standard deviations of the mean.

IV. Measures of relative position

• percentiles
  A $p$-th percentile $x_p$ is a value such that at most $p/100$ of the data falls below $x_p$ and at least $p/100$ of the data is less than or equal to $x_p$.
    $Q_1$ = 25th percentile
    median = 50th percentile
    $Q_3$ = 75th percentile
  Using the empirical distribution function to find percentiles.
• $z$-score: $z = \dfrac{x - \text{mean}}{\text{standard deviation}}$
• 5-number summary: minimum, $Q_1$, median, $Q_3$, maximum
• boxplot
• outliers: data that are extremely large or small relative to the rest of the data
  Rule of thumb 1. Call a data value $x$ an outlier if
    (i) $x < Q_1 - 1.5\,(Q_3 - Q_1)$, or
    (ii) $x > Q_3 + 1.5\,(Q_3 - Q_1)$.
  Rule of thumb 2. Call a data value $x$ an outlier if its $z$-score is less than $-2$ or greater than $2$.
• scatterplot of bivariate data
• misleading descriptive statistics
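
The sample measures and the two outlier rules of thumb in this outline can be checked on a small data set. The following is a minimal sketch using Python's standard `statistics` module; the function names (`describe`, `z_scores`, `iqr_outliers`, `z_outliers`, `chebyshev_bound`) and the sample data are hypothetical, chosen only for illustration. Quartile conventions differ between textbooks and software, so the quartiles computed here (the module's default "exclusive" method) are an assumption and may not match the convention used in the course.

```python
import statistics as st


def describe(data):
    """Sample measures of central tendency and variability for a list of numbers."""
    xs = sorted(data)
    # Quartiles via the module's default 'exclusive' method (an assumption;
    # other conventions can give slightly different Q1 and Q3).
    q1, _, q3 = st.quantiles(xs, n=4)
    return {
        "mean": st.mean(xs),
        "median": st.median(xs),
        "mode": st.mode(xs),
        "midrange": (xs[0] + xs[-1]) / 2,   # (min + max) / 2
        "range": xs[-1] - xs[0],            # max - min
        "variance": st.variance(xs),        # sample variance, divides by n - 1
        "std": st.stdev(xs),                # sample standard deviation
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


def z_scores(data):
    """z = (x - mean) / standard deviation, using the sample statistics."""
    m, s = st.mean(data), st.stdev(data)
    return [(x - m) / s for x in data]


def iqr_outliers(data):
    """Rule of thumb 1: values beyond 1.5 * IQR from the quartiles."""
    q1, _, q3 = st.quantiles(data, n=4)
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    return [x for x in data if x < low or x > high]


def z_outliers(data):
    """Rule of thumb 2: values whose z-score is below -2 or above 2."""
    return [x for x, z in zip(data, z_scores(data)) if z < -2 or z > 2]


def chebyshev_bound(k):
    """Chebyshev's lower bound on the fraction of data within k std devs (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's Rule requires k > 1")
    return 1 - 1 / k**2


if __name__ == "__main__":
    sample = [2, 4, 4, 5, 7, 9, 40]   # made-up data for illustration
    for name, value in describe(sample).items():
        print(f"{name}: {value}")
    print("IQR-rule outliers:", iqr_outliers(sample))
    print("z-score-rule outliers:", z_outliers(sample))
    print("Chebyshev bound for k = 2:", chebyshev_bound(2))
```

On this made-up sample both rules of thumb flag the value 40, and the Chebyshev bound for k = 2 is 0.75, which is noticeably weaker than the 95% suggested by the Empirical Rule because it holds for any data set, not only bell-shaped ones.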