Chapter 7
Summarizing and Displaying Measurement Data

Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc.

Turning Data Into Meaningful Information

Data are the statisticians’ raw material and the numbers we use to interpret reality.

All statistical problems involve the collection, description, or analysis of data.

How can we represent data in a meaningful way? How can we see underlying patterns in a heap of numbers?

Picturing Data: Stemplots, Frequency Tables & Histograms

Stemplot: quick and easy way to order numbers and get picture of shape.

Stemplot for Exam Scores
3|2
4|
5|5
6|012448
7|35568899
8|0023458
9|02358
Example: 3|2 = 32

Histogram: better for larger data sets, also provides picture of shape.

[Figure: histogram of the exam scores; horizontal axis: Exam Scores, vertical axis: Frequency]

Creating a Stemplot

Step 1: Create the Stems
Divide range of data into equal units to be used on stem. Have 6 – 15 stem values, representing equally spaced intervals.

Step 1: Creating the stem
3|
4|
5|
6|
7|
8|
9|

Ordered Listing of 28 Exam Scores
32, 55, 60, 61, 62, 64, 64, 68, 73, 75, 75, 76, 78, 78, 79, 79, 80, 80, 82, 83, 84, 85, 88, 90, 92, 93, 95, 98

Creating a Stemplot

Step 2: Attach the Leaves
Attach a leaf to represent each data point. Next digit in number used as leaf; drop remaining digits.

Step 2: Attaching leaves
3|
4|
5|
6|0
7|5
8|
9|35

Step 3: Order leaves on each branch.

Further Details for Creating Stemplots

Splitting Stems: Reusing digits two or five times.

Stemplot B:
5|4
5|7
5|89
6|0
6|233
6|44555
6|677
6|89
7|001
7|2
7|45
7|
7|8

Stemplot A:
5|4
5|789
6|023344
6|55567789
7|00124
7|58

Two times (as in Stemplot A): 1st stem = leaves 0 to 4, 2nd stem = leaves 5 to 9.
Five times (as in Stemplot B): 1st stem = leaves 0 and 1, 2nd stem = leaves 2 and 3, etc.

Obtaining Info from the Stemplot

Determine shape, identify outliers, locate center.

Pulse Rates:
5|4
5|789
6|023344
6|55567789
7|00124
7|58
Bell-shaped, centered in the mid 60s, no outliers.

Exam Scores:
3|2
4|
5|5
6|024418
7|56598398
8|5430820
9|53208
Outlier of 32. Apart from 55, rest uniform from the 60s to 90s.

Median Incomes:
4|66789
5|11344
5|56666688899999
6|011112334
6|556666789
7|01223
7|
8|0022
Wide range with 4 unusually high values. Rest bell-shaped around high $50,000s.

Creating Frequency Table

• Divide range of data into intervals.
• Count how many values fall into each interval; this count is called the frequency.
• Also find the relative frequency by dividing each interval's frequency by the total number of observations.

Interval   Frequency   Relative Frequency
30-39      1           .0357
40-49      0           0
50-59      1           .0357
60-69      6           .2143
70-79      8           .2857
80-89      7           .25
90-99      5           .1786
Total:     28          1

Creating a Histogram

Create a bar that covers each interval and is centered at the midpoint of that interval. The bar's height is the frequency or relative frequency of the interval.

[Figure: two histograms of the exam scores, one with vertical axis Frequency and one with vertical axis Relative Freq.; horizontal axis: Exam Scores]

Forming Intervals

• Use intervals of equal lengths with midpoints and endpoints at convenient round numbers.
• For a smaller data set, use a small number of intervals.
• For a larger data set, use more intervals.
• By increasing the number of intervals we can “stretch out” the shape of the histogram.

Example 4: How Much Do Students Exercise?

How many hours do you exercise per week (nearest ½ hr)?
172 responses from students in intro statistics class.

Most range from 0 to 10 hours with mode of 2 hours. Responses trail out to 30 hours a week.

Defining a Common Language about Shape

• Symmetric: if you draw a line through the center, the picture on one side would be a mirror image of the picture on the other side. Example: bell-shaped data set.
• Skewed to the Right: higher values more spread out than lower values.
• Skewed to the Left: lower values more spread out and higher ones tend to be clumped.

[Figure: three histograms (vertical axis: Frequency) labeled Symmetric, Skewed Right, and Skewed Left]

Summary Statistics

What are statistics? These are simple numerical measurements that summarize and, ideally, characterize the entire data set.

Any set of measurements has two important properties:
• The central or typical value
• The spread (or variability) of the data about the central value

[Figure: two histograms with the same Center, one labeled Narrow Spread and one labeled Wide Spread]

Centrality Statistics

Estimate Mean to be where the “balance point” would be.

[Figure: histogram of the exam scores with the balance point marked at Mean = 75; horizontal axis: Exam Scores, vertical axis: Frequency]
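The balance-point idea can be checked numerically. The sketch below is an illustration in Python, not part of the original slides, and the names `scores`, `mean`, and `deviation_sum` are chosen here for the example. It computes the exact mean of the 28 exam scores and confirms that the deviations above and below it cancel, which is what makes the mean the balance point. The exact value comes out slightly above the visual estimate of 75 read from the histogram.

```python
# The 28 exam scores from the ordered listing in this chapter.
scores = [32, 55, 60, 61, 62, 64, 64, 68, 73, 75, 75, 76, 78, 78,
          79, 79, 80, 80, 82, 83, 84, 85, 88, 90, 92, 93, 95, 98]


def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


exam_mean = mean(scores)

# Balance-point property: deviations from the mean sum to zero
# (up to floating-point rounding).
deviation_sum = sum(x - exam_mean for x in scores)

print(f"exact mean = {exam_mean:.2f}")  # prints "exact mean = 76.04"
print(f"deviations cancel: {abs(deviation_sum) < 1e-9}")
```

Reading the mean off a histogram only locates it to within an interval, so a small gap between the visual estimate and the computed value is expected.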
Centrality Statistics

The median is the midpoint of the data and is obtained by ordering the data from smallest to largest and finding the middle value. If the number of data points is even, so that there is no single middle value, we average the two values around the middle.

Ex. For our example n = 28, therefore we average the 14th and 15th values. In the ordered listing of the 28 exam scores, the 14th value is 78 and the 15th value is 79, so Median = 78.5.

Estimate median from the histogram by dividing graph into equal boxes. Total number of boxes = 28, so the median occurs at the 14th / 15th box. Estimated median = 77.

[Figure: histogram of the exam scores divided into equal boxes; horizontal axis: Exam Scores, vertical axis: Frequency]

Centrality Statistics

The mode is the most common value in the data set.

Ex. In our example, since the scores 64, 75, 78, 79, and 80 all occur twice, these are all considered the modes.

Estimate from Histogram: estimate Mode to be at the midpoint of the interval with the highest bar. Mode = 75

[Figure: histogram of the exam scores; horizontal axis: Exam Scores, vertical axis: Frequency]

Centrality Statistics: Comparing Mean and Median

The Mean is sensitive to outliers, which are extreme values that are not typical of the rest of the data.

How do the Mean and Median change as the shape of the histogram changes?

[Figure: three histograms (vertical axis: Frequency) labeled Symmetric, Skewed Right, and Skewed Left]

Variability Statistics

The Range of the data is the distance between the maximum value and the minimum value.
Range = Max Value – Min Value = 98 - 32 = 66

The Lower and Upper Quartiles of the data are the midpoints of the lower half and the upper half of the data when the data are divided at the median.

The Inter-Quartile Range (IQR) of the data is the distance between the lower quartile and the upper quartile.

IQR = Upper Quartile – Lower Quartile = 84.5 – 66 = 18.5

For the ordered listing of the 28 exam scores: Lower Quartile = 66, Median = 78.5, Upper Quartile = 84.5

Variability Statistics

The five-number summary display

                 Median
Lower Quartile            Upper Quartile
Lowest                    Highest

                 78.5
66                        84.5
32                        98

Creating a Boxplot for Exam Scores

1. Draw a box from lower quartile (66) to upper quartile (84.5).
2. Draw line in box at median of 78.5.
3. Compute IQR = 84.5 - 66 = 18.5.
4. Compute 1.5(IQR) = 1.5(18.5) = 27.75. Outlier is any value below 66 - 27.75 = 38.25, or above 84.5 + 27.75 = 112.25.
5. Draw line from each end of box extending down to 55 but up to 98.
• Draw asterisk at outlier of 32.

[Figure: Box Plot of Exam Scores]

Interpreting Boxplots

• Divide the data into fourths.
• Easily identify outliers.
• Useful for comparing two or more groups.

Outlier: any value more than 1.5(IQR) beyond closest quartile.
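The quartile and outlier arithmetic for the exam scores can be reproduced with a short Python sketch. It is an illustration rather than part of the original slides: the helper names `median` and `five_number_summary` are hypothetical, and for data sets with an odd number of values it assumes the middle value is left out of both halves, a convention the slides do not need for n = 28.

```python
scores = [32, 55, 60, 61, 62, 64, 64, 68, 73, 75, 75, 76, 78, 78,
          79, 79, 80, 80, 82, 83, 84, 85, 88, 90, 92, 93, 95, 98]


def median(sorted_values):
    """Middle value, or the average of the two middle values when n is even."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """Lowest, lower quartile, median, upper quartile, highest."""
    data = sorted(values)
    half = len(data) // 2
    lower_half = data[:half]               # values below the median position
    upper_half = data[len(data) - half:]   # values above the median position
    return {
        "lowest": data[0],
        "lower_quartile": median(lower_half),
        "median": median(data),
        "upper_quartile": median(upper_half),
        "highest": data[-1],
    }


summary = five_number_summary(scores)
iqr = summary["upper_quartile"] - summary["lower_quartile"]

# Outlier rule: more than 1.5 * IQR beyond the closest quartile.
lower_fence = summary["lower_quartile"] - 1.5 * iqr
upper_fence = summary["upper_quartile"] + 1.5 * iqr
outliers = [x for x in scores if x < lower_fence or x > upper_fence]

# The boxplot lines extend to the most extreme values that are not outliers.
inside = [x for x in scores if lower_fence <= x <= upper_fence]
whisker_ends = (min(inside), max(inside))

print(summary)
print("IQR:", iqr, "fences:", lower_fence, upper_fence)
print("outliers:", outliers, "whisker ends:", whisker_ends)
```

For these scores the sketch gives quartiles of 66 and 84.5, an IQR of 18.5, fences at 38.25 and 112.25, a single outlier of 32, and boxplot lines ending at 55 and 98.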
¼ of students scored between 32 and 66
¼ scored between 66 and 78.5
¼ scored between 78.5 and 84.5
¼ scored between 84.5 and 98

Example 6: Who Are Those Crazy Drivers?

What’s the fastest you have ever driven a car? ____ mph.

Five-number summaries:

                         Lowest   Lower Quartile   Median   Upper Quartile   Highest
Males (87 Students)      55       95               110      120              150
Females (102 Students)   30       80               89       95               130

• About 75% of men have driven 95 mph or faster, but only about 25% of women have done so.
• Except for a few outliers (120 and 130), all women’s max speeds are close to or below the median speed for men.

The Standard Deviation and Variance

Consider two sets of numbers, both with mean of 100.

Numbers                     Mean   Standard Deviation
100, 100, 100, 100, 100     100    0
90, 90, 100, 110, 110       100    10

• First set of numbers has no spread or variability at all.
• Second set has some spread to it; on average, the numbers are about 10 points away from the mean.

The standard deviation is roughly the average distance of the observed values from their mean.

Computing the Standard Deviation

1. Find the mean.
2. Find the deviation of each value from the mean. Deviation = value – mean.
3. Square the deviations.
4. Sum the squared deviations.
5. Divide the sum by (the number of values) – 1, resulting in the variance.
6. Take the square root of the variance. The result is the standard deviation.

Computing the Standard Deviation

Try it for the set of values: 90, 90, 100, 110, 110.

Value    Dev. From Mean     Dev. Squared
90       90-100 = -10       (-10)^2 = 100
90       90-100 = -10       (-10)^2 = 100
100      100-100 = 0        0^2 = 0
110      110-100 = 10       10^2 = 100
110      110-100 = 10       10^2 = 100
Total:                      400

Mean = 100
Variance = 400 / (5 – 1) = 100
Standard Dev. = √100 = 10
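The six steps for computing the standard deviation translate directly into code. The following Python sketch is an illustration added here, not part of the original slides; the function name `variance_and_sd` is hypothetical. It follows the steps in order and is run on the two sets of numbers from the comparison above.

```python
import math


def variance_and_sd(values):
    """Follow the six steps: returns (variance, standard deviation)."""
    n = len(values)
    mean = sum(values) / n                       # Step 1: find the mean
    deviations = [x - mean for x in values]      # Step 2: deviation = value - mean
    squared = [d ** 2 for d in deviations]       # Step 3: square the deviations
    total = sum(squared)                         # Step 4: sum the squared deviations
    variance = total / (n - 1)                   # Step 5: divide by (number of values) - 1
    return variance, math.sqrt(variance)         # Step 6: square root of the variance


no_spread = variance_and_sd([100, 100, 100, 100, 100])
some_spread = variance_and_sd([90, 90, 100, 110, 110])

print(no_spread)    # prints (0.0, 0.0)
print(some_spread)  # prints (100.0, 10.0)
```

Python's standard library function `statistics.stdev` also divides by (the number of values) – 1, so it can serve as a cross-check on the hand computation.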