Chapter 4: Displaying Quantitative Data
Akshat Patel, Velocity Brown

Histograms: Displaying the Distribution of Price Changes
● First, slice up the entire span of values covered by the quantitative variable into equal-width piles called bins.
● The bins and the counts in each bin give the distribution of the quantitative variable.
● A histogram plots the bin counts as the heights of bars (like a bar chart).
● A relative frequency histogram displays the percentage of cases in each bin instead of the count.

Creating Histograms
● Used with numerical data
● Bars touch on histograms
● Two types
o Discrete: bars are centered over discrete values
o Continuous: bars cover a class (interval) of values
For comparative histograms, use two separate graphs with the same scale on the horizontal axis.

Stem-and-Leaf Displays
■ Stem-and-leaf displays show the distribution of a quantitative variable, like histograms do, while preserving the individual values.
■ Stem-and-leaf displays contain all the information found in a histogram and, when carefully drawn, satisfy the area principle and show the distribution.

How to Make a Stem-and-Leaf Display
■ First, cut each data value into leading digits (“stems”) and trailing digits (“leaves”).
■ Use the stems to label the bins.
■ Use only one digit for each leaf: either round or truncate the data values to one decimal place after the stem.

Dotplots
■ A dotplot is a simple display. It just places a dot along an axis for each case in the data. (It can be vertical or horizontal.)

Shape, Center, and Spread
SHAPE
1. MODES: Does the histogram have a single hump in the middle or several separated humps? (UNIMODAL vs. BIMODAL vs. MULTIMODAL)
a. A histogram with no modes (pretty much flat) is called uniform.
2. SYMMETRY: A histogram is symmetric if you can fold it in half through the middle and have both sides match up.
a. The thinner ends of a distribution are called tails.
b. If one tail stretches out further than the other, the histogram is skewed in the direction of the longer tail.
3. UNUSUAL FEATURES: Always mention outliers (points that stand away from the body of the distribution) as well as gaps in the data when discussing the shape.

Shape, Center, and Spread
CENTER
- If you had to pick a single number to describe all the data, what would you pick?
- It’s easy to find the center when a histogram is unimodal and symmetric: it’s right in the middle. On the other hand, it’s not so easy to find the center of a skewed histogram or a histogram with more than one mode; eyeball it for now.
SPREAD
- Are the values of the distribution tightly clustered around the center or more spread out?

Comparing Distributions
To compare distributions, it is crucial to use the same SCALE; otherwise, you cannot compare the two distributions accurately. Compare distributions by describing differences and similarities in shape, center, and spread.

Time Plots
For some data sets, we are interested in how the data behave over time. In these cases, we construct time plots of the data.

Re-expression
One way to make a skewed distribution more symmetric is to re-express, or transform, the data by applying a simple function. Common transformations include square roots, logarithms, and reciprocals.

#19 Hurricanes
The data below give the number of hurricanes that happened each year from 1944 through 2000, as reported by Science magazine.
3, 2, 1, 2, 4, 3, 7, 2, 3, 3, 2, 5, 2, 2, 4, 2, 2,
6, 0, 2, 5, 1, 3, 1, 0, 3, 2, 1, 0, 1, 2, 3, 2, 1,
2, 2, 2, 3, 1, 1, 1, 3, 0, 1, 3, 2, 1, 2, 1, 1, 0,
5, 6, 1, 3, 5, 3
a. Create a dotplot of these data.
b. Describe the distribution.
Answer (b): Slightly skewed to the right. Unimodal, with the mode at 2; there is possibly a second, smaller mode at 5. No outliers.

#21 Acid Rain
Two researchers measured the pH (a scale on which a value of 7 is neutral and values below 7 are acidic) of water collected from rain and snow over a 6-month period in Allegheny County, PA.
Describe their data with a graph and a few sentences.
4.57 5.62 4.12 5.29 4.64 4.31 4.30 4.39 4.45 5.67 4.39 4.52 4.26
4.26 4.40 5.78 4.73 4.56 5.08 4.41 4.12 5.51 4.82 4.63 4.29 4.60

Stem | Leaf
41 | 22
42 | 669
43 | 0199
44 | 015
45 | 267
46 | 034
47 | 3
48 | 2
49 |
50 | 8
51 |
52 | 9
53 |
54 |
55 | 1
56 | 27
57 | 8
Key: 41 | 2 = 4.12 pH

Skewed to the right. Possibly bimodal, with one mode around 4.4 and another around 5.6. Two values in the gap between the two clusters (5.08 and 5.29) seem not to belong to either group.
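The displays for exercises #19 and #21 can be sketched as plain-text output in Python. This is a minimal sketch, not the authors' method. The function names `text_dotplot` and `stem_and_leaf` are hypothetical. The dotplot is drawn sideways, with one row per hurricane count and one `*` per year. The stem-and-leaf code assumes every pH value has exactly two decimal places. Under that assumption, the stem is the first two digits and the leaf is the hundredths digit.

```python
from collections import Counter, defaultdict

# Exercise #19: hurricanes per year, 1944 through 2000 (57 values).
hurricanes = [
    3, 2, 1, 2, 4, 3, 7, 2, 3, 3, 2, 5, 2, 2, 4, 2, 2,
    6, 0, 2, 5, 1, 3, 1, 0, 3, 2, 1, 0, 1, 2, 3, 2, 1,
    2, 2, 2, 3, 1, 1, 1, 3, 0, 1, 3, 2, 1, 2, 1, 1, 0,
    5, 6, 1, 3, 5, 3,
]

# Exercise #21: pH of rain and snow samples (26 values).
ph = [
    4.57, 5.62, 4.12, 5.29, 4.64, 4.31, 4.30, 4.39, 4.45, 5.67, 4.39, 4.52, 4.26,
    4.26, 4.40, 5.78, 4.73, 4.56, 5.08, 4.41, 4.12, 5.51, 4.82, 4.63, 4.29, 4.60,
]


def text_dotplot(values):
    """Hypothetical helper: one row per integer from min to max, one '*' per case."""
    counts = Counter(values)
    # Rows with no cases are kept so that gaps in the data stay visible.
    return [f"{v} | " + "*" * counts[v] for v in range(min(values), max(values) + 1)]


def stem_and_leaf(values):
    """Hypothetical helper; assumes two-decimal inputs (stem = first two digits)."""
    leaves = defaultdict(list)
    for x in values:
        hundredths = round(x * 100)          # 4.12 -> 412 (round() absorbs float error)
        stem, leaf = divmod(hundredths, 10)  # 412 -> stem 41, leaf 2
        leaves[stem].append(leaf)
    # Every stem between the smallest and largest is listed, even with no leaves.
    return [
        f"{s} | " + "".join(str(d) for d in sorted(leaves[s]))
        for s in range(min(leaves), max(leaves) + 1)
    ]


if __name__ == "__main__":
    print("\n".join(text_dotplot(hurricanes)))
    print()
    print("\n".join(stem_and_leaf(ph)))
    print("Key: 41 | 2 = 4.12 pH")
```

Running the script first prints the hurricane dotplot. It has rows 0 through 7, and the longest row (17 stars) is at 2, so the mode is 2. The rows thin out toward 7, which shows the longer right tail. The stem-and-leaf output should match the display in the answer to #21, including the empty stems that mark the gap between the two clusters.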