Displaying & Summarizing Quantitative Data CHAPTER 4 Objectives Look at histograms, stem-and-leaf plots, and dotpots Summarize the spread and center of distribution Look at mean, median, range, and interquartile range Examine mean and standard deviation Words of Advice Don’t round or truncate intermediate results. Keep full precision that your technology can carry. Report statistics to one decimal place more than the precision of the data. Focus on the meaning in the Tell section and not on the minor differences in numeric results. Don’t sweat the small differences. Histograms/Distribution The distribution of a quantitative variable slices up all the possible values of the variable into equal width bins and gives the number of values (or counts) falling into each bin Histograms A histogram uses adjacent bars to show the distribution of a quantitative variable. Each bar represents the frequency (or relative frequency) of values falling in each bin. There are no spaces in histograms like bar graphs Any spaces in a histogram are actual gaps in the data indicating a region where there are no values. Histograms A relative frequency histogram can be created replacing the counts on the vertical axis with the percentage of total cases falling in each bin. Stem-and-Leaf Displays Histograms are great but they do not show the actual data. A stem-and-leaf display is like a histogram but it shows the individual values. What do you see if you turn a stem-and-leaf on its side? Dotplot Is a simple display that places a dot along an axis for each case in the data. Like a stem-and-leaf except dots are used in place of digits. Think before you graph You need to think before you choose which type of graph to display data. Although a bar chart and histogram may look the same they are not. You cannot display categorical data in a histogram or quantitative data in a bar chart. Look at the distribution: talk about the shape, center & spread. Shape of Distribution Does the histogram have a single, central hump or separate humps? The humps are called modes. A histogram with one peak is called unimodal, with two peaks bimodal, and with three or more multimodal. Shape of Distribution A histogram that appear to not have any mode and in which all bars are approximately the same height is called uniform. Shape of Distribution Is the histogram symmetric? The thinner ends of a distribution are called tails. If one tail stretches out farther that the other, the histogram us said to be skewed to the side of the longer tail. Example Would you expect distributions of these variables to be uniform, unimodal, or bimodal? Symmetric of skewed? Explain why. Ages of people at a Little League game. Number of siblings of people in your class. Pulse rate of college-age males. Number of times each face of a die shows in 100 tosses. Shape of Distribution Do any unusual features stick out? You should always mention any stragglers or outliers, that stand off away from the body of distribution. Example A credit card company wants to see how much customers in a particular segment of their market use their credit card. They have provided you with data on the amount spent by 500 selected customers during a 3month period and have asked you to summarize the expectations. Describe the shape of this distribution? Center of Distribution: Median When we think of a typical value, we usually look for the center of the distribution. Histograms follow the area principle: the middle value that divides the histogram into two equal areas is called the median. How do you find the median? Spread: Range When we describe a distribution numerically, we always report a measure of it spread along with its center. How do you measure? Use the range. How do you find the range? Spread: Interquartile Range A better way to describe the spread of a variable might be to ignore the extremes and focus on the middle of the data. Divide the data in half, then divide both halves in half cutting the data into four quarters. These new dividing points are called quartiles. One quarter of the data lies below the lower quartile and one quarter of data lies above called the upper quartile. Spread: Interquartile Range The difference between the quartiles is called the interquartile range. (IQR) The IQR is almost always a reasonable summary of the spread of distribution. Even is the distribution is skewed or has outliers, the IQR will provide useful information. The lower & upper quartiles represent the 25th & 75th percentiles of the data. 5-Numbered Summary The 5-numbered summary of a distribution reports its, median, quartiles, and extremes. Example In the Super Bowl, by how many points does the winning team out score the losers? Here are the winning margins for the first 42 Super Bowl games: 25,19,9,16,3,21,7,17,10,4,18,17,4,12,17,5,10,29,22,36,19, 32,4,45,1,13,35,17,23,10,14,7,15,7,27,3,27,3,3,11,12,3 a) Find the median b) Find the quartiles c) Write a description based on the 5-number summary. The Mean The mean is the center because it is the point where the histogram balances. Total y y n n Mean or Median Using the center of balance makes sense if the data is symmetric. Since median only considers order: it is resistant to values that are extraordinarily large or small. If the histogram is symmetric and there are no outliers we prefer the mean. If the histogram is skewed or has outliers we prefer the median. If not sure, report both and explain Spread: Standard Deviation Takes into account how far each value is from the mean. Standard Deviation is appropriate only for symmetric data. Think about spread is to examine how far each data value is from the mean. This difference is called deviation. We could average the deviations but it not helpful. To make the deviation helpful we square it. We added up the squared deviations and find there average (almost), which is called variance. Spread: Standard Deviation s y y 2 Variance = 2 n 1 Spread: Standard Deviation Standard Deviation = y y 2 s n 1 Spread: Standard Deviation Statistics is about variation, spread is important. Measures of spread help us understand what we do not know Homework: pg 72-78 #5, 8, 10, 13, 16, 22,24,29,32, 37,38, 44, 47