COMPARING DISTRIBUTIONS
A lot of the more interesting statistical questions involve two or more groups.

ALWAYS discuss shape, center, spread, and possible outliers whenever you compare distributions of a quantitative variable. Always actually compare the groups; don't just list the attributes of each group separately.

Ex. Compare the distributions of energy cost for two types of refrigerators (freezer located on the top or bottom of the fridge).

[Dotplot of EnergyCost vs Type (bottom, top); EnergyCost axis from 56 to 140]

Answer:
Shape: The distribution for bottom freezers looks skewed to the right and possibly bimodal. The distribution for top freezers looks roughly symmetric.
Center: The center of the energy costs for bottom freezers (~$70) is greater than the center for top freezers (~$56).
Spread: There is much more variability in the energy costs for bottom freezers than for top freezers.
Outliers: There are a couple of bottom freezers with unusually high energy costs; there are no unusual energy costs for top freezers.

When graphing quantitative data you are comparing, you can use…
Dotplots – The scales on the axes should match to help in comparison.
Stemplots – Use a back-to-back stemplot with common stems.
o Stems are listed in the middle; leaves are placed on the left for one group and on the right for the other group.
o Leaves increase in value as they move away from the stem.
o Label each side with its group.

HISTOGRAM
The most common graph of the distribution of one quantitative variable. It groups values together, so it is good for large data sets.
How to Construct:
1) Divide the range of the data into classes of equal width. Each data value should fall into exactly one class.
2) Find the count (frequency) or percent (relative frequency) of individuals in each class.
3) Label and scale your axes.
   Horizontal – classes
   Vertical – count/percent
4) Draw the bars; the height of each bar is that class's count/percent.
   - Adjacent bars should touch, unless a class contains no individuals.
   - If the classes don't start at 0, use a "break-in-scale" symbol (//) on the horizontal axis.

Ex. The table shows the average points scored per game (PPG) for the 30 NBA teams in the 2012-2013 regular season.

Team                     PPG   Team                     PPG   Team                     PPG
Atlanta Hawks           98.0   Houston Rockets        106.0   Oklahoma City Thunder  105.7
Boston Celtics          96.5   Indiana Pacers          94.7   Orlando Magic           94.1
Brooklyn Nets           96.9   Los Angeles Clippers   101.1   Philadelphia 76ers      93.2
Charlotte Bobcats       93.4   Los Angeles Lakers     102.2   Phoenix Suns            95.2
Chicago Bulls           93.2   Memphis Grizzlies       93.4   Portland Trail Blazers  97.5
Cleveland Cavaliers     96.5   Miami Heat             102.9   Sacramento Kings       100.2
Dallas Mavericks       101.1   Milwaukee Bucks         98.9   San Antonio Spurs      103.0
Denver Nuggets         106.1   Minnesota Timberwolves  95.7   Toronto Raptors         97.2
Detroit Pistons         94.9   New Orleans Hornets     94.1   Utah Jazz               98.0
Golden State Warriors  101.2   New York Knicks        100.0   Washington Wizards      93.2

Make a histogram of the average points scored per game. Then describe the distribution of the PPG.

Answer:
Graph: The horizontal axis is PPG; the vertical axis can be frequency or percent (# in class/30). The smallest PPG is 93.2 and the largest is 106.1, so the classes could have a width of 2 (or 1, if you want). Either way, the horizontal scale will run from at least 93 to 107.
Shape: slightly skewed to the right.
Center: between 97 and 99 points, judging from the graph alone.
Spread: the average points per game range from 93.2 to 106.1 points (a range of 106.1 - 93.2 = 12.9 points).
Outliers: there appear to be no unusual average points scored per game.

Important things when it comes to histograms
❖ Don't confuse histograms and bar graphs (histograms display a quantitative variable; bar graphs display a categorical variable).
❖ Don't use the counts or percents (of individuals in each class) as data.
❖ There is no single right choice of classes.
   Too few or too many classes won't give a good picture of the shape of the distribution (5 classes is a good minimum).
❖ Be sure the classes/bars all have the same width.
❖ Be careful about letting a computer or calculator choose the classes.
❖ Just because a graph looks nice, it's not necessarily a meaningful display of data.

When comparing distributions with histograms, you must use 2 separate histograms (unlike bar graphs, there is no such thing as a side-by-side histogram).
• Use percents instead of counts on the vertical axis, since the groups may have different sizes.
• The classes and horizontal axis scales must be the same for both histograms.
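As a rough check on the PPG example, here is a minimal Python sketch of histogram construction steps 1 to 4 applied to the PPG table. It assumes classes of width 2 starting at 93 (one of the choices suggested in the answer), with each class including its left endpoint and excluding its right one, and it prints text bars in place of a drawn graph. The helper names frequency_table and print_histogram are hypothetical, not from any library.

```python
import statistics

# Average points per game for the 30 NBA teams, 2012-2013 regular season,
# copied from the PPG table in table order: left column, then middle, then right.
PPG = [
    98.0, 96.5, 96.9, 93.4, 93.2, 96.5, 101.1, 106.1, 94.9, 101.2,
    106.0, 94.7, 101.1, 102.2, 93.4, 102.9, 98.9, 95.7, 94.1, 100.0,
    105.7, 94.1, 93.2, 95.2, 97.5, 100.2, 103.0, 97.2, 98.0, 93.2,
]


def frequency_table(data, start, width):
    """Hypothetical helper: count values in equal-width classes (steps 1 and 2).

    Class k covers [start + k*width, start + (k+1)*width): the left endpoint is
    included and the right one excluded, so each value falls in exactly one class.
    Empty classes are kept with a count of 0 so the bars line up on the axis.
    """
    if min(data) < start:
        raise ValueError("start must be at or below the smallest value")
    num_classes = int((max(data) - start) // width) + 1
    counts = [0] * num_classes
    for x in data:
        counts[int((x - start) // width)] += 1
    return [(start + k * width, start + (k + 1) * width, c)
            for k, c in enumerate(counts)]


def print_histogram(table, n):
    """Hypothetical helper: one text row per class with count, percent, and a bar."""
    for low, high, count in table:
        percent = 100 * count / n  # relative frequency = count / n
        print(f"[{low}, {high})  {count:>2}  {percent:5.1f}%  {'#' * count}")


table = frequency_table(PPG, start=93, width=2)
print_histogram(table, len(PPG))
print("min:", min(PPG), "max:", max(PPG), "median:", statistics.median(PPG))
```

With these assumed classes the counts come out to 9, 5, 5, 2, 5, 1, and 3, consistent with the right skew described in the answer, and the median (97.35) falls in the 97 to 99 class. To compare two groups, call frequency_table on each group with the same start and width and compare the percent columns rather than the counts.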