Chapter 4: Displaying Quantitative Data

Quantitative variables
• Quantitative variables record measurements or amounts of something. The values must have units, or at least be numbers that act as numerical values.

Types of Displays
• Histogram
• Stem and Leaf Displays
• Dotplots

Histogram
• A histogram uses adjacent bars to show the distribution of values in a quantitative variable.
• It looks very similar to a bar graph, but there are differences.
• The horizontal axis is continuous, not just labeled.

An example
• The histogram shown below gives the number of children who visited a particular zoo.

Histogram
• A histogram is more convenient than a dotplot or a stem-and-leaf plot because you don't have to represent each data point. However, you don't get to see the value of each data point, so a table of data and summary statistics would help people interpret the data.

Be Careful
• A histogram gives the number of data points that fall into intervals of equal width. Care must be taken in choosing the intervals because the choice can affect the shape of the graph and misrepresent the true data.

1st Graph
• The first graph uses intervals of width 10, yielding the intervals 40-50, 50-60, etc. In this case, Yemen had a life expectancy of 50 and was placed in the 50-60 column. Usually, borderline values are placed in the higher column.

2nd Graph
• In the second graph, the intervals are 40-45, 45-50, 50-55, etc. This affects the shape of the graph.

Stem and Leaf Displays
• Shows quantitative data values in a way that sketches the distribution of the data.
• The stem-and-leaf plot below shows the number of students enrolled in a dance class in the past 12 years.
• The numbers of students are 81, 84, 85, 86, 93, 94, 97, 100, 102, 103, 110, and 111.
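Since the plot itself is not reproduced here, the following is a minimal Python sketch of how a stem-and-leaf display could be built from the dance class enrollments listed above. The function name `stem_and_leaf` and its layout (stem, a vertical bar, then the leaves) are illustrative assumptions, not part of the original slides. It assumes non-negative whole numbers, with the tens and hundreds as the stem and the ones digit as the leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf display.

    Hypothetical helper for illustration: assumes non-negative whole
    numbers, stem = value // 10, leaf = ones digit.
    """
    leaves_by_stem = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(int(v), 10)
        leaves_by_stem[stem].append(leaf)

    lines = []
    # Include every stem between the smallest and largest, so that
    # gaps in the distribution stay visible as empty rows.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = " ".join(str(d) for d in leaves_by_stem.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}".rstrip())
    return lines


# Enrollment figures taken from the slide above.
students = [81, 84, 85, 86, 93, 94, 97, 100, 102, 103, 110, 111]

for line in stem_and_leaf(students):
    print(line)
```

For these twelve values the display has four stems (8, 9, 10, 11) with leaves 1 4 5 6, 3 4 7, 0 2 3, and 0 1, so every individual data value can still be read back from the picture.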
Dotplot
• Graphs a dot for each case against a single axis.
• Graph the following numbers: 5, 5, 5, 5, 5, 5, 5, 10, 10, 10, 10, 10, etc.

Dotplot with two sets of data
• Example

Shape
• To describe the shape of a distribution, look for
• Symmetry versus skewness
• Single versus multiple modes

Symmetrical
• A distribution is symmetric if the two halves on either side of the center look approximately like mirror images of each other.

Symmetrical
• Histogram: symmetrical
• Dotplot: dots are mirror images
• Stem and leaf: example

Skewed
• A distribution is skewed if it is not symmetric and one tail stretches out further than the other.
• Skewed left: the longer tail stretches to the left.
• Skewed right: the longer tail stretches to the right.

Examples
• Skewed right
• Skewed left
• Examples in all three displays

Funny example
• http://www.herkimershideaway.org/apstatistics/ymmsum99/ymm111.htm

Stem-and-Leaf Revisited
• Compare the histogram and stem-and-leaf display for the pulse rates of 24 women at a health clinic. Which graphical display do you prefer?

Think Before You Draw, Again
• Remember the “Make a picture” rule? Now that we have options for data displays, you need to think carefully about which type of display to make.
• Before making a stem-and-leaf display, a histogram, or a dotplot, check the Quantitative Data Condition: The data are values of a quantitative variable whose units are known.

Constructing a Stem-and-Leaf Display
• First, cut each data value into leading digits (“stems”) and trailing digits (“leaves”).
• Use the stems to label the bins.
• Use only one digit for each leaf: either round or truncate the data values to one decimal place after the stem.

Center
• A value that attempts the impossible by summarizing the entire distribution with a single number, a “typical” value.
• Measures include the mean and median.
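As a rough illustration of the two measures of center just named, the sketch below computes the mean and median of the dance class enrollments from the stem-and-leaf slide, using Python's standard `statistics` module. The extra value 300 is a made-up straggler, added only to show how the two measures can react differently when a distribution has a long right tail.

```python
from statistics import mean, median

# Enrollment figures from the dance class example earlier in the chapter.
students = [81, 84, 85, 86, 93, 94, 97, 100, 102, 103, 110, 111]

center_mean = mean(students)      # sum of the values divided by the count
center_median = median(students)  # middle value (average of the two middle ones here)
print(center_mean, center_median)

# Hypothetical straggler (not in the original data), added for illustration.
with_straggler = students + [300]
print(mean(with_straggler), median(with_straggler))
```

For the original twelve values the mean and median agree at 95.5. With the made-up value of 300 added, the median moves only to 97 while the mean rises above 111, which is one reason the choice of center matters for skewed data.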
Spread
• A numerical summary of how tightly the values are clustered around the center.
• Measures of spread include the IQR and standard deviation.

Mode
• A hump or local high point in the shape of the distribution of a variable is called a mode. The apparent location of modes can change as the scale of a histogram is changed.

Unimodal
• Having one mode.

Bimodal
• Distribution with two modes.
• Example

Uniform
• A histogram that doesn't appear to have any mode and in which all the bars are approximately the same height is called uniform.

Anything Unusual?
• Do any unusual features stick out? Sometimes it's the unusual features that tell us something interesting or exciting about the data.
• You should always mention any stragglers, or outliers, that stand off away from the body of the distribution.
• Are there any gaps in the distribution? If so, we might have data from more than one group.

Outliers
• The following histogram has outliers: there are three cities in the leftmost bar.

Outliers
• Outliers are extreme values that do not appear to belong with the rest of the data. They may be unusual values that deserve further investigation, or they may be just mistakes; there's no obvious way to tell. Do not delete them. Outliers can affect many statistical analyses, so you should always be alert to them.

Outliers
• Away from the main portion of the data.

Where is the Center of the Distribution?
• If you had to pick a single number to describe all the data, what would you pick?
• It's easy to find the center when a histogram is unimodal and symmetric: it's right in the middle.
• On the other hand, it's not so easy to find the center of a skewed histogram or a histogram with more than one mode.
• For now, we will “eyeball” the center of the distribution. In the next chapter we will find the center numerically.

How Spread Out is the Distribution?
• Variation matters, and Statistics is about variation.
• Are the values of the distribution tightly clustered around the center or more spread out?
• In the next two chapters, we will talk about spread…

Comparing Distributions
• Often we would like to compare two or more distributions instead of looking at one distribution by itself.
• When looking at two or more distributions, it is very important that the histograms have been put on the same scale. Otherwise, we cannot really compare the two distributions.
• When we compare distributions, we talk about the shape, center, and spread of each distribution.

Example
• Compare the following distributions of ages for female and male heart attack patients:

Timeplots: Order, Please!
• For some data sets, we are interested in how the data behave over time. In these cases, we construct timeplots of the data.

*Re-expressing Skewed Data to Improve Symmetry
• One way to make a skewed distribution more symmetric is to re-express or transform the data by applying a simple function (e.g., a logarithmic function).
• Note the change in skewness from the raw data (Figure 4.11) to the transformed data (Figure 4.12):

What Can Go Wrong?
• Don't make a histogram of a categorical variable: bar charts or pie charts should be used for categorical data.
• Don't look for shape, center, and spread of a bar chart.

What Can Go Wrong? (cont.)
• Don't use bars in every display: save them for histograms and bar charts.
• Below is a badly drawn timeplot and the proper histogram for the number of eagles sighted in a collection of weeks:

What Can Go Wrong? (cont.)
• Choose a bin width appropriate to the data. Changing the bin width changes the appearance of the histogram:

What Can Go Wrong? (cont.)
• Avoid inconsistent scales, either within the display or when comparing two displays.
• Label clearly so a reader knows what the plot displays.
• Good intentions, bad plot:

What have we learned?
• We've learned how to make a picture for quantitative data to help us see the story the data have to tell.
• We can display the distribution of quantitative data with a histogram, stem-and-leaf display, or dotplot.
• Tell about a distribution by talking about shape, center, spread, and any unusual features.
• We can compare two quantitative distributions by looking at side-by-side displays (plotted on the same scale).
• Trends in a quantitative variable can be displayed in a timeplot.
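The point that bin width changes the look of a histogram can be checked by counting by hand. The sketch below is a minimal Python illustration, not the method used to draw the chapter's figures: the function `bin_counts` and the twelve data values are made up for the example. It follows the convention mentioned earlier in the chapter that a borderline value is placed in the higher bin.

```python
def bin_counts(values, start, width, n_bins):
    """Count how many values fall into each equal-width bin.

    Hypothetical helper: bin i covers start + i*width up to (but not
    including) start + (i+1)*width, so a value on a boundary goes into
    the higher bin. Values outside the covered range are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


# Made-up data for illustration only (not from the slides).
data = [42, 47, 50, 53, 58, 61, 64, 66, 69, 72, 75, 78]

wide = bin_counts(data, start=40, width=10, n_bins=4)   # 40-50, 50-60, ...
narrow = bin_counts(data, start=40, width=5, n_bins=8)  # 40-45, 45-50, ...

print("width 10:", wide)
print("width  5:", narrow)
```

With bins of width 10 the counts are 2, 3, 4, 3, which reads as a single hump. With bins of width 5 the same data give 1, 1, 2, 1, 2, 2, 1, 2, a much bumpier picture. Note that the value 50 is counted in the 50-60 bin, not the 40-50 bin.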