155S3.4o3 Measures of Relative Standing and Boxplots
MAT 155 Dr. Claude Moore Cape Fear Community College
August 31, 2011

Chapter 3 Statistics for Describing, Exploring, and Comparing Data
3-1 Review and Preview
3-2 Measures of Center
3-3 Measures of Variation
3-4 Measures of Relative Standing and Boxplots

Key Concept
Measures of relative standing are numbers that show the location of data values relative to the other values within a data set. They can be used to compare values from different data sets or to compare values within the same data set. The most important concept is the z score. We will also discuss percentiles and quartiles, as well as a new statistical graph called the boxplot.

Interpreting Z Scores
• z score (or standardized value): the number of standard deviations that a given value x is above or below the mean.
Sample: $z = \frac{x - \bar{x}}{s}$    Population: $z = \frac{x - \mu}{\sigma}$
Round z scores to 2 decimal places.
Whenever a value is less than the mean, its corresponding z score is negative.
Ordinary values: $-2 \le z \le 2$
Unusual values: $z < -2$ or $z > 2$

Percentiles
Percentiles are measures of location. There are 99 percentiles, denoted P1, P2, . . ., P99, which divide a set of data into 100 groups with about 1% of the values in each group.

Converting from the kth Percentile to the Corresponding Data Value
Notation:
n = total number of values in the data set
k = percentile being used
L = locator that gives the position of a value
Pk = kth percentile
Procedure:
1. Sort the data (arrange them from lowest to highest).
2. Compute the locator $L = \frac{k}{100} \cdot n$.
3. If L is a whole number, Pk is midway between the Lth value and the next value in the sorted data set: add the Lth value and the next value, then divide the total by 2.
4. If L is not a whole number, round L up to the next larger whole number; Pk is then the Lth value, counting from the lowest.

Quartiles
Quartiles are measures of location, denoted Q1, Q2, and Q3, which divide a set of data into four groups with about 25% of the values in each group.
• Q1 (First Quartile) separates the bottom 25% of sorted values from the top 75%.
• Q2 (Second Quartile) is the same as the median; it separates the bottom 50% of sorted values from the top 50%.
• Q3 (Third Quartile) separates the bottom 75% of sorted values from the top 25%.
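As a rough illustration of the z-score rule and the percentile conversion above, the Python sketch below standardizes a value and applies the locator method, using numbers from the exercises at the end of this section. The function names are hypothetical, and `value_to_percentile` assumes the textbook's usual rule for going the other way (number of values less than x, divided by n, times 100, rounded to the nearest whole number), which these notes do not spell out.

```python
# Illustrative sketch; function names are hypothetical, not from STATDISK or a TI calculator.

def z_score(x, mean, sd):
    """Number of standard deviations x lies above (positive) or below (negative) the mean.

    Pass (x-bar, s) for a sample or (mu, sigma) for a population; rounded to 2 decimal places.
    """
    return round((x - mean) / sd, 2)


def is_unusual(z):
    """Ordinary if -2 <= z <= 2; unusual if z < -2 or z > 2."""
    return z < -2 or z > 2


def percentile_to_value(sorted_data, k):
    """Locator method for P_k (k a whole number from 1 to 99) with L = (k/100) * n."""
    n = len(sorted_data)
    if (k * n) % 100 == 0:  # integer test avoids floating-point error in k/100 * n
        L = k * n // 100
        # L is whole: average the Lth and (L+1)th values (positions are 1-based)
        return (sorted_data[L - 1] + sorted_data[L]) / 2
    L = -(-k * n // 100)  # L is not whole: round up to the next larger whole number
    return sorted_data[L - 1]


def value_to_percentile(sorted_data, x):
    """ASSUMED textbook rule: (number of values less than x) / n * 100, rounded half up."""
    n = len(sorted_data)
    below = sum(1 for v in sorted_data if v < x)
    return (200 * below + n) // (2 * n)  # integer form of floor(100 * below / n + 0.5)


# Exercise 133/8: Bao Xishun (92.95 in.); men's heights have mean 69.6 in. and s.d. 2.8 in.
z_bao = z_score(92.95, 69.6, 2.8)
print(z_bao, is_unusual(z_bao))  # 8.34 True

# Super Bowl points from Exercises 15-26 (already sorted)
points = [36, 37, 37, 39, 39, 41, 43, 44, 44, 47, 50, 53,
          54, 55, 56, 56, 57, 59, 61, 61, 65, 69, 69, 75]
print(percentile_to_value(points, 80))  # L = 19.2, round up to 20: 20th value is 61
print(percentile_to_value(points, 25))  # L = 6 is whole: (41 + 43) / 2 = 42.0
print(value_to_percentile(points, 65))  # 20 of 24 values are below 65: 83
```

The whole-number test uses integer arithmetic (k · n divisible by 100) rather than comparing a float to an integer, since k/100 · n can carry a tiny rounding error.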
[Figure: Quartiles Q1, Q2, Q3 divide ranked scores into four equal parts.]

Some Other Statistics
Interquartile Range (or IQR): $Q_3 - Q_1$
Semi-interquartile Range: $\frac{Q_3 - Q_1}{2}$
Midquartile: $\frac{Q_1 + Q_3}{2}$
10-90 Percentile Range: $P_{90} - P_{10}$

5-Number Summary
For a set of data, the 5-number summary consists of
1. the minimum value;
2. the first quartile, Q1;
3. the median (or second quartile, Q2);
4. the third quartile, Q3; and
5. the maximum value.

Boxplot
A boxplot (or box-and-whisker diagram) is a graph of a data set that consists of a line extending from the minimum value to the maximum value, and a box with lines drawn at the first quartile, Q1; the median; and the third quartile, Q3.
[Figure: Boxplot of Movie Budget]

Boxplots: Normal Distribution
[Figure: Heights from a Simple Random Sample of Women]

Boxplots: Skewed Distribution
[Figure: Salaries (in thousands of dollars) of NCAA Football Coaches]

Outliers
An outlier is a value that lies very far away from the vast majority of the other values in a data set.

Important Principles
• An outlier can have a dramatic effect on the mean.
• An outlier can have a dramatic effect on the standard deviation.
• An outlier can have a dramatic effect on the scale of the histogram, so that the true nature of the distribution is totally obscured.

Modified Boxplots
Boxplots described earlier are called skeletal (or regular) boxplots. Some statistical packages provide modified boxplots, which represent outliers as special points.

Outliers for Modified Boxplots
For purposes of constructing modified boxplots, we can consider outliers to be data values meeting specific criteria. In modified boxplots, a data value is an outlier if it is . . .
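A minimal sketch of the 5-number summary and the other statistics listed above, assuming quartiles are found with the percentile locator method (Q1 = P25, Q2 = P50, Q3 = P75). Calculators and statistical software may use slightly different quartile rules, so their results can differ. The helper names are hypothetical, and the data are the dictionary and FICO samples from Exercises 134/28 and 134/29.

```python
# Sketch only: helper names are hypothetical; quartiles use the locator method with
# Q1 = P25, Q2 = P50, Q3 = P75 (an assumption; other software may use other rules).

def locate(sorted_data, k):
    """P_k by the locator L = (k/100) * n, for whole-number k from 1 to 99."""
    n = len(sorted_data)
    if (k * n) % 100 == 0:
        L = k * n // 100
        return (sorted_data[L - 1] + sorted_data[L]) / 2  # whole L: mean of Lth and (L+1)th
    return sorted_data[-(-k * n // 100) - 1]  # otherwise round L up and take the Lth value


def five_number_summary(data):
    """(minimum, Q1, median, Q3, maximum)."""
    s = sorted(data)
    return (s[0], locate(s, 25), locate(s, 50), locate(s, 75), s[-1])


def other_statistics(data):
    """IQR, semi-interquartile range, midquartile, and 10-90 percentile range."""
    s = sorted(data)
    q1, q3 = locate(s, 25), locate(s, 75)
    return {
        "IQR": q3 - q1,
        "semi-interquartile range": (q3 - q1) / 2,
        "midquartile": (q1 + q3) / 2,
        "10-90 percentile range": locate(s, 90) - locate(s, 10),
    }


# Exercise 134/28: defined words per dictionary page
words = [34, 36, 39, 43, 51, 53, 62, 63, 73, 79]
print(five_number_summary(words))  # (34, 39, 52.0, 63, 79)
print(other_statistics(words))

# Exercise 134/29: FICO credit rating scores
fico = [664, 693, 698, 714, 751, 753, 779, 789, 802, 818, 834, 836]
print(five_number_summary(fico))  # (664, 706.0, 766.0, 810.0, 836)
```

The median comes out of the same locator: with an even n, L = n/2 is a whole number, so P50 averages the two middle values; with an odd n, rounding L up lands on the middle value.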
• above Q3 by an amount greater than 1.5 × IQR, or
• below Q1 by an amount greater than 1.5 × IQR.

Modified Boxplot Construction
A modified boxplot is constructed with these specifications:
• A special symbol (such as an asterisk) is used to identify outliers.
• The solid horizontal line extends only as far as the minimum data value that is not an outlier and the maximum data value that is not an outlier.

Modified Boxplots: Example
[Figure: Pulse rates of females listed in Data Set 1 in Appendix B]

3-4 Measures of Relative Standing and Boxplots
In this section we have discussed:
• z Scores
• z Scores and unusual values
• Percentiles
• Quartiles
• Converting a percentile to corresponding data values
• Other statistics
• 5-number summary
• Boxplots and modified boxplots
• Effects of outliers

3-4 Measures of Relative Standing and Boxplots
Always consider certain key factors:
• Context of the data
• Source of the data
• Sampling Method
• Measures of Center
• Measures of Variation
• Distribution
• Outliers
• Changing patterns over time
• Conclusions
• Practical Implications

132/3. Boxplots
Shown below is a STATDISK-generated boxplot of the durations (in hours) of flights of NASA's Space Shuttle. What do the values of 0, 166, 215, 269, and 423 tell us?
TI: page 130

133/8. z Score for World's Tallest Man
Bao Xishun is the world's tallest man, with a height of 92.95 in. (or 7 ft, 8.95 in.). Men have heights with a mean of 69.6 in. and a standard deviation of 2.8 in.
a. What is the difference between Bao's height and the mean height of men?
b. How many standard deviations is that (the difference found in part (a))?
c. Convert Bao's height to a z score.
d. Does Bao's height meet the criterion of being unusual by corresponding to a z score that does not fall between −2 and 2?

132/4.
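The 1.5 × IQR criterion and the whisker rule for modified boxplots can be sketched as follows. This is an illustration with hypothetical function names; it reads the five Space Shuttle values in Exercise 132/3 as the boxplot's 5-number summary (minimum, Q1, median, Q3, maximum), which is an assumption about the STATDISK display, and it takes Q1 = 42 and Q3 = 60 for the Super Bowl scores from the percentile locator method.

```python
# Sketch of the 1.5 x IQR outlier rule for modified boxplots; function names are hypothetical.

def outlier_fences(q1, q3):
    """Cutoffs: outliers lie below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outliers(data, q1, q3):
    """Values more than 1.5*IQR below Q1 or above Q3 (strict inequalities)."""
    low, high = outlier_fences(q1, q3)
    return [x for x in sorted(data) if x < low or x > high]


def whisker_ends(data, q1, q3):
    """Ends of the modified boxplot's line: smallest and largest values that are not outliers."""
    low, high = outlier_fences(q1, q3)
    kept = [x for x in data if low <= x <= high]
    return min(kept), max(kept)


# Exercise 132/3: Space Shuttle values 0, 166, 215, 269, 423 read as
# (min, Q1, median, Q3, max); only the min and max can be checked from these.
print(outlier_fences(166, 269))                     # (11.5, 423.5)
print(outliers([0, 166, 215, 269, 423], 166, 269))  # [0]

# Super Bowl points from Exercises 15-26, with Q1 = 42 and Q3 = 60 (locator method)
points = [36, 37, 37, 39, 39, 41, 43, 44, 44, 47, 50, 53,
          54, 55, 56, 56, 57, 59, 61, 61, 65, 69, 69, 75]
print(outliers(points, 42, 60), whisker_ends(points, 42, 60))  # [] (36, 75)
```

Under these assumptions the shuttle minimum of 0 lies more than 1.5 × IQR = 154.5 below Q1, so a modified boxplot would mark it with a special symbol, while the maximum of 423 stays just inside the upper cutoff.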
Boxplot Comparisons
Refer to the two STATDISK-generated boxplots shown below, which are drawn on the same scale. One boxplot represents weights of randomly selected men and the other represents weights of randomly selected women. Which boxplot represents women? How do you know? Which boxplot depicts weights with more variation?
TI: page 130

133/10. z Scores for Heights of Women Soldiers
The U.S. Army requires women's heights to be between 58 in. and 80 in. Women have heights with a mean of 63.6 in. and a standard deviation of 2.5 in. Find the z score corresponding to the minimum height requirement and find the z score corresponding to the maximum height requirement. Determine whether the minimum and maximum heights are unusual.

Percentiles. In Exercises 15–18, use the given sorted values, which are the numbers of points scored in the Super Bowl for a recent period of 24 years. Find the percentile corresponding to the given number of points.
36 37 37 39 39 41 43 44 44 47 50 53 54 55 56 56 57 59 61 61 65 69 69 75
133/16. 65
133/18. 41

In Exercises 19–26, use the same list of 24 sorted values given for Exercises 15–18. Find the indicated percentile or quartile.
134/22. P80
134/26. P95

134/28. Boxplot for Number of English Words
A simple random sample of pages from Merriam-Webster's Collegiate Dictionary, 11th edition, was obtained. Listed below are the numbers of defined words on those pages, arranged in order. Construct a boxplot and include the values of the 5-number summary.
34 36 39 43 51 53 62 63 73 79

134/29. Boxplot for FICO Scores
A simple random sample of FICO credit rating scores was obtained, and the sorted scores are listed below. Construct a boxplot and include the values of the 5-number summary.
664 693 698 714 751 753 779 789 802 818 834 836
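For Exercise 133/10, the z-score formula gives a quick worked check. This treats the stated mean and standard deviation as population values, $\mu = 63.6$ in. and $\sigma = 2.5$ in., which is an assumption since the exercise does not say whether they come from a sample or a population:

```latex
z_{58} = \frac{58 - 63.6}{2.5} = \frac{-5.6}{2.5} = -2.24
\qquad
z_{80} = \frac{80 - 63.6}{2.5} = \frac{16.4}{2.5} = 6.56
```

Both z scores fall outside the range from −2 to 2, so by the ordinary/unusual criterion used in these notes both the minimum and the maximum height requirements are unusual heights.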