Uploaded by Rithiv Vikaas

Comparing Distributions: Dotplots, Stemplots, Histograms

COMPARING DISTRIBUTIONS
A lot of the more interesting statistical questions involve two or more groups.
ALWAYS discuss shape, center, spread, and possible outliers whenever you
compare distributions of a quantitative variable (always actually compare;
don’t just list attributes for each group separately).
Ex. Compare the distributions of energy cost for two types of refrigerators
(freezer is located on top or bottom of fridge).
[Figure: Dotplot of EnergyCost vs Type. Horizontal axis: EnergyCost, marked from 56 to 140; one row of dots for each Type (bottom, top).]
Answer to previous slide example:
Shape: The distribution for bottom freezers looks skewed to the right and possibly
bimodal. The distribution for top freezers looks roughly symmetric.
Center: The center of the bottom freezers’ energy costs (~$70) is greater than the
center of the top freezers’ energy costs (~$56).
Spread: There is much more variability in the energy costs for bottom freezers than
for top freezers.
Outliers: There are a couple of bottom freezers with unusually high energy costs;
there are no unusual top freezer costs.
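The comparison above was read off the dotplot. As a rough numerical companion, the sketch below compares center and spread for two groups. The data values are made up for illustration (the slide does not list the actual refrigerator costs), and `summarize` is a hypothetical helper that uses the median for center and the range for spread.

```python
from statistics import median

def summarize(values):
    """Center (median) and spread (range) for one group."""
    return {
        "center": median(values),
        "low": min(values),
        "high": max(values),
        "spread": max(values) - min(values),
    }

# Hypothetical energy costs in dollars, NOT the data behind the slide's dotplot.
top = [50, 52, 54, 55, 56, 56, 57, 58, 60, 62]
bottom = [58, 60, 62, 65, 68, 70, 72, 76, 80, 95, 130, 140]

t, b = summarize(top), summarize(bottom)

# Compare the groups directly instead of listing each one separately.
higher = "bottom" if b["center"] > t["center"] else "top"
wider = "bottom" if b["spread"] > t["spread"] else "top"
print(f"Center: {higher} freezers have the higher median "
      f"(top {t['center']}, bottom {b['center']}).")
print(f"Spread: {wider} freezers vary more "
      f"(top range {t['spread']}, bottom range {b['spread']}).")
```

Shape and outliers still need a look at the graph; the numbers only back up the center and spread statements.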
When graphing quantitative data to compare groups, you can use…
Dotplots – The scales on the axes should match to help in comparison
Stemplots – Use a back-to-back stemplot with common stems
o Stems are listed in the middle, leaves are placed
on the left for one variable and on the right for
the other variable
o Leaves increase in value as they move away
from the stem
o Label each side with variable or group
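The stemplot rules above can be sketched in a few lines of Python. This is only an illustration: `back_to_back_stemplot` is a hypothetical helper, it assumes non-negative whole numbers (stem = tens, leaf = ones digit), and the two data lists are made up.

```python
from collections import defaultdict

def back_to_back_stemplot(left, right, left_label, right_label):
    """Return the lines of a back-to-back stemplot.

    Assumes non-negative whole numbers: stem = value // 10, leaf = value % 10.
    """
    left_leaves, right_leaves = defaultdict(list), defaultdict(list)
    for value in left:
        left_leaves[value // 10].append(value % 10)
    for value in right:
        right_leaves[value // 10].append(value % 10)

    # Common stems: every stem from the smallest to the largest, even if empty.
    low = min(left + right) // 10
    high = max(left + right) // 10

    rows = []
    for stem in range(low, high + 1):
        # Reverse the left side so leaves grow as they move away from the stem.
        left_text = "".join(str(d) for d in sorted(left_leaves[stem], reverse=True))
        right_text = "".join(str(d) for d in sorted(right_leaves[stem]))
        rows.append((left_text, stem, right_text))

    # Label each side, then right-align the left leaves against the stems.
    width = max(len(left_label), max(len(row[0]) for row in rows))
    lines = [f"{left_label:>{width}} |    | {right_label}"]
    for left_text, stem, right_text in rows:
        lines.append(f"{left_text:>{width}} | {stem:>2} | {right_text}".rstrip())
    return lines

# Hypothetical data for two groups.
group_a = [56, 58, 61, 63, 70]
group_b = [62, 65, 71, 74, 88]
for line in back_to_back_stemplot(group_a, group_b, "Group A", "Group B"):
    print(line)
```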
HISTOGRAM
The most common graph of the distribution of one quantitative
variable. It groups values together, so it works well for large sets of
data.
How to Construct:
1) Divide the range of data into classes of equal width. Each data value
should fall into exactly one class.
2) Find the count (frequency) or percent (relative frequency) of individuals
in each class.
3) Label & Scale your axes. Horizontal – classes. Vertical – count/percent
4) Draw the bars. The height of each bar is the class’s count/percent
- adjacent bars should touch, unless a class contains no individuals
- if classes don’t start at 0, use “break-in-scale” symbol (//)
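Steps 1 and 2 above can be sketched as follows. This is a minimal illustration, not a required method: `build_classes` is a hypothetical helper, the data values are made up, and the convention that each class includes its left endpoint but not its right endpoint is an assumption chosen so that every value falls in exactly one class.

```python
def build_classes(data, start, width):
    """Count how many values fall in each equal-width class.

    Class i covers start + i*width up to (but not including)
    start + (i+1)*width, so no value can land in two classes.
    """
    if start > min(data):
        raise ValueError("start must be at or below the smallest value")
    n_classes = int((max(data) - start) // width) + 1
    counts = [0] * n_classes
    for x in data:
        counts[int((x - start) // width)] += 1
    return counts

# Hypothetical data set.
data = [12, 15, 21, 22, 22, 27, 30, 34, 41]
start, width = 10, 10

counts = build_classes(data, start, width)            # frequency
percents = [100 * c / len(data) for c in counts]      # relative frequency

# Text version of the bars: one '*' per individual in the class.
for i, (count, percent) in enumerate(zip(counts, percents)):
    low = start + i * width
    print(f"{low} to <{low + width}: {'*' * count}  ({count}, {percent:.1f}%)")
```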
Ex. The table shows the average points scored per game (PPG) for the 30
NBA teams in the 2012-2013 regular season.
Team                       PPG
Atlanta Hawks             98.0
Boston Celtics            96.5
Brooklyn Nets             96.9
Charlotte Bobcats         93.4
Chicago Bulls             93.2
Cleveland Cavaliers       96.5
Dallas Mavericks         101.1
Denver Nuggets           106.1
Detroit Pistons           94.9
Golden State Warriors    101.2
Houston Rockets          106.0
Indiana Pacers            94.7
Los Angeles Clippers     101.1
Los Angeles Lakers       102.2
Memphis Grizzlies         93.4
Miami Heat               102.9
Milwaukee Bucks           98.9
Minnesota Timberwolves    95.7
New Orleans Hornets       94.1
New York Knicks          100.0
Oklahoma City Thunder    105.7
Orlando Magic             94.1
Philadelphia 76ers        93.2
Phoenix Suns              95.2
Portland Trail Blazers    97.5
Sacramento Kings         100.2
San Antonio Spurs        103.0
Toronto Raptors           97.2
Utah Jazz                 98.0
Washington Wizards        93.2
Make a histogram of the average points scored per game. Then describe the
distribution of the PPG.
Answer to previous slide example:
Graph: The horizontal axis is PPG; the vertical axis can be frequency or percent (# in class/30).
The smallest PPG is 93.2 and the largest is 106.1, so the classes could have a width of 2, or 1 if you prefer.
Either way, the horizontal scale will run at least from 93 to 107.
Shape: slightly skewed to the right
Center: between 97 and 99 points, judging from the graph alone
Spread: the average points per game range from 93.2 to 106.1 points.
Outliers: there appear to be no unusual values of average points scored per game.
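To check the answer against the table, the sketch below tallies the 30 PPG values into classes of width 2 starting at 93 (one of the choices suggested above) and prints a text histogram. Treating each class as including its left endpoint but not its right is an assumption; the notes do not say which side a boundary value belongs to.

```python
from collections import Counter
from statistics import median

# PPG values from the table, in table order.
ppg = [
    98.0, 96.5, 96.9, 93.4, 93.2, 96.5, 101.1, 106.1, 94.9, 101.2,
    106.0, 94.7, 101.1, 102.2, 93.4, 102.9, 98.9, 95.7, 94.1, 100.0,
    105.7, 94.1, 93.2, 95.2, 97.5, 100.2, 103.0, 97.2, 98.0, 93.2,
]

START, WIDTH, N_CLASSES = 93, 2, 7   # classes 93-95, 95-97, ..., 105-107

# Class index for each value; left endpoint included, right endpoint excluded.
tally = Counter(int((x - START) // WIDTH) for x in ppg)
class_counts = [tally[i] for i in range(N_CLASSES)]

for i, count in enumerate(class_counts):
    low = START + i * WIDTH
    print(f"{low:>3} to <{low + WIDTH:<3} | {'#' * count} ({count})")

print("min:", min(ppg), "max:", max(ppg), "median:", median(ppg))
```

The tallest bar is the lowest class (93 to under 95) and the counts thin out toward 107, which matches the description of a right-skewed shape. The median falls between 97 and 99, in line with the center given above.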
Important things when it comes to histograms
❖ Don’t confuse histograms and bar graphs
❖ Don’t use counts or percents (of individuals in each class) as data
❖ There is no single right choice of classes, but too few or too many classes
won’t give a good picture of the shape of the distribution.
(5 classes is a good minimum)
❖ Be sure classes/bars are the same width
❖ Be careful about letting a computer or calculator choose the classes
❖ Just because a graph looks nice, it’s not necessarily a meaningful display of
data
When comparing distributions with histograms, you must use 2
separate histograms (there is no such thing as a side-by-side histogram like with bar
graphs)
• Use percents instead of counts on vertical axis
• The classes & horizontal axis scales must be the same for both histograms
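A small sketch of why percents and shared classes matter when the groups differ in size. The two data sets are made up, and `percent_per_class` is a hypothetical helper; both groups are tallied with the same starting value, class width, and number of classes.

```python
def percent_per_class(data, start, width, n_classes):
    """Percent of the group's values that fall in each equal-width class."""
    counts = [0] * n_classes
    for x in data:
        index = int((x - start) // width)
        if not 0 <= index < n_classes:
            raise ValueError(f"{x} is outside the classes")
        counts[index] += 1
    return [100 * c / len(data) for c in counts]

# Hypothetical groups of different sizes (8 values vs 4 values).
group_a = [52, 55, 61, 63, 64, 68, 71, 77]
group_b = [58, 66, 67, 72]

# Same classes for both groups: 50-60, 60-70, 70-80.
START, WIDTH, N_CLASSES = 50, 10, 3
percents_a = percent_per_class(group_a, START, WIDTH, N_CLASSES)
percents_b = percent_per_class(group_b, START, WIDTH, N_CLASSES)

for i in range(N_CLASSES):
    low = START + i * WIDTH
    print(f"{low} to <{low + WIDTH}: A {percents_a[i]:.0f}%   B {percents_b[i]:.0f}%")
```

With counts, group A's bars would be twice as tall as group B's in every class even though the two distributions have the same shape; percents put both histograms on the same vertical scale.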