Chapter 4: Displaying Quantitative Data

Akshat Patel, Velocity Brown
Histograms: Displaying the Distribution of Price Changes
● First, slice up the entire span of values covered by the quantitative variable
into equal-width piles called bins.
● The bins and the counts in each bin give the distribution of the quantitative
variable.
● A histogram plots the bin counts as the heights of bars (like a bar chart).
● A relative frequency histogram displays the percentage of cases in each
bin instead of the count.
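As a rough illustration of the binning step, the sketch below slices a small data set into equal-width bins, then reports both the count and the percentage for each bin. The data values, the bin width of 5, and the helper name bin_counts are all made up for this example; they do not come from the chapter.

```python
def bin_counts(values, low, width, n_bins):
    """Count how many values fall in each equal-width bin [edge, edge + width)."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - low) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

# Hypothetical data, made up for illustration only.
changes = [1, 3, 4, 6, 7, 7, 8, 12, 13, 19]
counts = bin_counts(changes, low=0, width=5, n_bins=4)

# A relative frequency histogram uses percentages instead of counts.
percents = [100 * c / len(changes) for c in counts]

for i, (c, p) in enumerate(zip(counts, percents)):
    lo = 5 * i
    print(f"{lo:>2} to <{lo + 5:<2} | {'#' * c:<5} count={c}  {p:.0f}%")
```

The bars have the same shape either way; only the labels on the vertical axis change between counts and percentages.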
Creating Histograms
● Used with numerical data
● Bars touch on histograms
● Two types
o Discrete - Bars are centered over discrete values
o Continuous - Bars cover a class (interval) of values
For comparative histograms, use two separate graphs
with the same scale on the horizontal axis.
Stem-and-Leaf Displays
■ Stem-and-leaf displays show the distribution of a quantitative
variable, like histograms do, while preserving the individual values.
■ Stem-and-leaf displays contain all the information found in a
histogram and, when carefully drawn, satisfy the area principle and
show the distribution.
How to Make a Stem-and-Leaf Display
■ First, cut each data value into leading digits (“stems”) and trailing digits (“leaves”).
■ Use the stems to label the bins.
■ Use only one digit for each leaf: either round or truncate the data values to one decimal place after the stem.
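The cutting step above can be sketched in a few lines. This is only one possible way to do it: the helper name split_stem_leaf and its leaf_unit parameter are hypothetical, the sample values are made up, and the sketch truncates rather than rounds.

```python
def split_stem_leaf(value, leaf_unit=1):
    """Cut a whole-number value into (stem, leaf).

    leaf_unit is the place value of the leaf digit. Digits below it are
    truncated, so every leaf is exactly one digit.
    """
    units = value // leaf_unit
    return units // 10, units % 10

print(split_stem_leaf(87))       # (8, 7)
print(split_stem_leaf(432, 10))  # (4, 3): the trailing 2 is truncated
```

Truncating is usually the simpler choice by hand, because each leaf can be read straight off the original value.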
Dotplots
■ A dotplot is a simple display. It just places a dot along an axis
for each case in the data. A dotplot can be drawn vertically or horizontally.
Shape, Center, and Spread
SHAPE
1. MODES - does the histogram have a single hump in the middle or several
separated bumps? (UNIMODAL VS. BIMODAL VS. MULTIMODAL)
a. A histogram with no modes (pretty much flat) is called uniform.
2. SYMMETRY - Histogram is symmetric if you can fold it in half through the
middle and have both sides match up.
a. The thinner ends of a distribution are called tails. If one tail stretches
out further than the other, the histogram is skewed in the direction of
the longer tail.
3. UNUSUAL FEATURES - Always mention outliers, points that stand away
from the body of the distribution, as well as gaps in the data when
discussing the shape.
CENTER
- If you had to pick a single number to describe all the data, what would you pick?
- It’s easy to find the center when a histogram is unimodal and symmetric: it’s right in the middle. On the other hand, it’s not so easy to find the center of a skewed histogram or a histogram with more than one mode. Eyeball it for now.
SPREAD
- Are the values of the distribution tightly clustered around the center or more spread out?
Comparing Distributions and Time Plots
Comparing: To compare distributions, it is crucial to use the same SCALE. Otherwise, you
cannot compare the two distributions accurately.
Compare distributions by describing differences and similarities in shape,
center, and spread.
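To make the point about a shared scale concrete, here is a minimal sketch that builds one set of bins from the combined range of two groups and then counts each group in those same bins. The two groups, the bin width of 10, and the name count_in_bins are assumptions made up for illustration.

```python
# Hypothetical measurements for two groups, made up for illustration.
group_a = [12, 15, 18, 21, 22, 25]
group_b = [31, 33, 35, 38, 41, 44]
width = 10

# Build ONE set of bins from the combined range so both displays share a scale.
combined = group_a + group_b
low = (min(combined) // width) * width
n_bins = (max(combined) - low) // width + 1

def count_in_bins(values):
    """Count the values that fall in each shared bin."""
    counts = [0] * n_bins
    for v in values:
        counts[(v - low) // width] += 1
    return counts

counts_a = count_in_bins(group_a)
counts_b = count_in_bins(group_b)

for i in range(n_bins):
    start = low + i * width
    print(f"{start:>3} to <{start + width:<3} | A {'#' * counts_a[i]:<5} | B {'#' * counts_b[i]}")
```

If each group were binned over its own range instead, the two displays could look alike even though group B sits well above group A.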
Time Plots: For some data sets, we are interested in how the data behave over time. In
these cases, we construct time plots of the data.
Re-expression
- One way to make a skewed distribution more symmetric is to re-express or transform the data by applying a simple function. Common transformations include square roots, logarithms, and reciprocals.
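A small sketch can show why re-expression helps. The values below are made up, chosen so that they stretch out into a long right tail. The helper gaps (a hypothetical name) measures the distance between neighboring values, which keeps growing in the raw data but evens out after taking logarithms.

```python
import math

# Hypothetical right-skewed values, made up for illustration.
values = [1, 10, 100, 1000, 10000]

logs = [math.log10(v) for v in values]   # logarithm re-expression
roots = [math.sqrt(v) for v in values]   # square root re-expression

def gaps(data):
    """Distances between neighboring values, rounded to two decimals."""
    return [round(b - a, 2) for a, b in zip(data, data[1:])]

print("raw gaps: ", gaps(values))
print("sqrt gaps:", gaps(roots))
print("log gaps: ", gaps(logs))
```

For these values the square root pulls the tail in only partly, while the logarithm spaces the values evenly. A reciprocal (1 / v) is stronger still, and it reverses the order of the data. Which re-expression works best depends on how strong the skew is.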
19. Hurricanes
The data below give the number of hurricanes
that happened each year from 1944 through 2000
as reported by Science magazine.
3,2,1,2,4,3,7,2,3,3,2,5,2,2,4,2,2,
6,0,2,5,1,3,1,0,3,2,1,0,1,2,3,2,1,
2,2,2,3,1,1,1,3,0,1,3,2,1,2,1,1,0,
5,6,1,3,5,3
a. Create a dotplot of these data.
b. Describe the distribution.
Answer to b: Slightly skewed to the right.
Unimodal (mode near 2), with a possible second mode at 5.
No outliers.
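For part a, one way to draw the dotplot in plain text is to tally the 57 yearly counts and print one dot per year. This is a sketch, not the textbook's solution; the variable names are arbitrary.

```python
from collections import Counter

# Hurricane counts per year, 1944 through 2000, as listed in the exercise.
hurricanes = [
    3, 2, 1, 2, 4, 3, 7, 2, 3, 3, 2, 5, 2, 2, 4, 2, 2,
    6, 0, 2, 5, 1, 3, 1, 0, 3, 2, 1, 0, 1, 2, 3, 2, 1,
    2, 2, 2, 3, 1, 1, 1, 3, 0, 1, 3, 2, 1, 2, 1, 1, 0,
    5, 6, 1, 3, 5, 3,
]

counts = Counter(hurricanes)

# One row per value, one dot per year: a horizontal dotplot.
for value in range(min(hurricanes), max(hurricanes) + 1):
    print(f"{value} | {'.' * counts[value]}")
```

The tallies agree with the description in part b: 2 is the most common value (17 years), the counts trail off toward 7, and the small bump at 5 (4 years, against 2 years at 4) is the possible second mode.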
21. Acid Rain
Two researchers measured the pH (a scale on
which a value of 7 is neutral and values below 7
are acidic) of water collected from rain and snow
over a 6-month period in Allegheny County, PA.
Describe their data with a graph and a few
sentences.
4.57 5.62 4.12 5.29 4.64 4.31 4.30 4.39 4.45
5.67 4.39 4.52 4.26 4.26 4.40 5.78 4.73 4.56
5.08 4.41 4.12 5.51 4.82 4.63 4.29 4.60
Stem | Leaf
41 | 22
42 | 669
43 | 0199
44 | 015
45 | 267
46 | 034
47 | 3
48 | 2
49 |
50 | 8
51 |
52 | 9
53 |
54 |
55 | 1
56 | 27
57 | 8
Key: 41 | 2 = 4.12 pH
Skewed to the right.
Possibly bimodal, with one
mode around 4.4 and
another around 5.6.
Two outliers in the middle
seem not to belong (5.08 and
5.29).
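The stem-and-leaf display for these data can be rebuilt from the 26 pH values with a short script. This is a sketch under one assumption: each value is converted to whole hundredths first, so the stem is the first two digits and the leaf is the last digit, matching the key 41 | 2 = 4.12 pH.

```python
# pH values as listed in the exercise.
ph = [
    4.57, 5.62, 4.12, 5.29, 4.64, 4.31, 4.30, 4.39, 4.45,
    5.67, 4.39, 4.52, 4.26, 4.26, 4.40, 5.78, 4.73, 4.56,
    5.08, 4.41, 4.12, 5.51, 4.82, 4.63, 4.29, 4.60,
]

# Work in whole hundredths so floating-point error cannot shift a digit.
hundredths = sorted(round(v * 100) for v in ph)

# Group the one-digit leaves under their two-digit stems.
leaves = {}
for h in hundredths:
    leaves.setdefault(h // 10, []).append(h % 10)

# Print every stem from lowest to highest, including empty ones, so gaps show.
for stem in range(min(leaves), max(leaves) + 1):
    row = "".join(str(d) for d in leaves.get(stem, []))
    print(f"{stem} | {row}")
print("Key: 41 | 2 = 4.12 pH")
```

Keeping the empty stems (49, 51, 53, and 54) matters here: they are what separates the isolated values 5.08 and 5.29 from the lower cluster and from the cluster near 5.6.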