Chapter 4
Displaying
Quantitative Data
.
Dealing With a Lot of Numbers…
Summarizing the data will help us when we look
at large sets of quantitative data.
Without summaries of the data, it’s hard to grasp
what the data tell us.
The best thing to do is to make a picture…
We can’t use bar charts or pie charts for
quantitative data, since those displays are for
categorical variables.




.
Slide 4- 2
Histograms: Displaying the Distribution of
Price Changes
The chapter example discusses the changes in
Enron’s stock price from 1997 to 2001.
First, slice up the entire span of values covered
by the quantitative variable into equal-width piles
called bins.
The bins and the counts in each bin give the
distribution of the quantitative variable.
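A minimal Python sketch of the binning step, offered as an illustration rather than the textbook's own procedure. The Enron price changes are not reproduced in these slides, so the sketch uses the Texas bird counts from the stem-and-leaf exercise later in the chapter. The function name bin_counts and the bin choice (width 10, starting at 150) are assumptions.

```python
def bin_counts(values, start, width, n_bins):
    """Count how many values fall in each equal-width bin [lo, lo + width)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

# Texas bird counts from the stem-and-leaf exercise in this chapter.
species = [228, 178, 186, 162, 206, 166, 163,
           183, 181, 206, 177, 175, 167, 162,
           160, 160, 157, 156, 153, 153, 152]

counts = bin_counts(species, start=150, width=10, n_bins=8)
for i, c in enumerate(counts):
    lo = 150 + 10 * i
    # One '#' per case gives a rough sideways histogram.
    print(f"{lo}-{lo + 9}: {'#' * c} ({c})")
```

The bins together with their counts are the distribution; the two empty bins (190-199 and 210-219) appear as gaps in the picture.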



.
Slide 4- 3
Histograms: Displaying the Distribution
of Price Changes (cont.)
A histogram plots
the bin counts as
the heights of bars
(like a bar chart).
Here is a histogram
of the monthly price
changes in Enron
stock:


.
Slide 4- 4
Histograms: Displaying the Distribution
of Price Changes (cont.)
A relative frequency histogram displays the percentage of
cases in each bin instead of the count.
 In this way, relative
frequency histograms
are faithful to the
area principle.
Here is a relative
frequency histogram of
the monthly price
changes in Enron stock:
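As a rough sketch of the count-to-percentage conversion (the Enron data are not available here), the Python below turns a list of bin counts into relative frequencies. The counts are for the chapter's Texas bird counts in bins of width 10 starting at 150; that bin choice and the function name relative_frequencies are assumptions.

```python
# Bin counts for the Texas bird counts (bins of width 10 starting at 150).
# Both the data choice and the bins are illustrative assumptions.
counts = [5, 7, 3, 3, 0, 2, 0, 1]

def relative_frequencies(counts):
    """Convert bin counts to percentages of all cases."""
    total = sum(counts)
    return [100 * c / total for c in counts]

percents = relative_frequencies(counts)
for i, p in enumerate(percents):
    lo = 150 + 10 * i
    print(f"{lo}-{lo + 9}: {p:5.1f}%")
```

The shape of the display does not change; only the vertical axis switches from counts to percentages, which sum to 100.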


.
Slide 4- 5
Creating Histograms




Used with numerical data
Bars touch on histograms
Two types
 Discrete: bars are centered over discrete values
 Continuous: bars cover a class (interval) of values
For comparative histograms, use two separate graphs with the same scale on the horizontal axis
Would a histogram be a good graph for the fastest speed driven by IB Math students? Why or why not?
Would a histogram be a good graph for the number of pieces of gum chewed per day by IB Math students? Why or why not?
.
Slide 4- 6
Activity: Head Circumference
With a partner use a measuring tape to measure
the circumference of both heads (in inches).
Record your data on the board.
Let’s create a histogram!
.
Slide 4- 7
Activity continued…
How do you do this on your calculator?
.
Slide 4- 8
Stem-and-Leaf Displays
Stem-and-leaf displays show the distribution of a
quantitative variable, like histograms do, while
preserving the individual values.
Stem-and-leaf displays contain all the information
found in a histogram and, when carefully drawn,
satisfy the area principle and show the
distribution.


.
Slide 4- 9
Stem-and-Leaf Example
Compare the histogram
and stem-and-leaf display
for the pulse rates of 24
women at a health clinic.
Which graphical display do
you prefer?

.
5 | 6
6 | 0444
6 | 8888
7 | 2222
7 | 6666
8 | 000044
8 | 8
Slide 4- 10
Constructing a Stem-and-Leaf Display
First, cut each data value into leading digits
(“stems”) and trailing digits (“leaves”).
Use the stems to label the bins.
Use only one digit for each leaf—either round or
truncate the data values to one decimal place
after the stem.
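The choice between rounding and truncating can change a leaf, and sometimes the stem. The sketch below is one possible way to split a value, using the tuition figure 3977 from a later example in this chapter with thousands as stems and hundreds as leaves. The function name split_value and its parameters are hypothetical.

```python
def split_value(value, leaf_unit, rounding="truncate"):
    """Split a non-negative value into (stem, one-digit leaf).

    leaf_unit is the place value of the leaf, e.g. 100 for hundreds.
    """
    if rounding == "round":
        scaled = int(value / leaf_unit + 0.5)  # round half up
    else:
        scaled = int(value // leaf_unit)       # drop the extra digits
    # The last digit of the scaled value is the leaf; the rest is the stem.
    return divmod(scaled, 10)

print(split_value(3977, 100, "truncate"))  # (3, 9)
print(split_value(3977, 100, "round"))     # (4, 0)
```

Truncating keeps 3977 on the 3 stem with leaf 9, while rounding moves it to the 4 stem with leaf 0. Either is acceptable as long as one rule is used for every value.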



.
Slide 4- 11
Creating Stem-and-Leaf Plots, pg. 66 #14
The Cornell Lab of Ornithology holds an annual
Christmas Bird Count, in which birdwatchers at
various locations around the country see how
many different species of birds they can spot.
Here are some of the counts reported from sites
in Texas during the 1999 event.
228 178 186 162 206 166 163
183 181 206 177 175 167 162
160 160 157 156 153 153 152
Create a stem-and-leaf display for these data.
.
Slide 4- 12
Creating Stem-and-Leaf Plots, pg. 66 #14
Stem | Leaf
15 | 23367
16 | 0022367
17 | 578
18 | 136
19 |
20 | 66
21 |
22 | 8
Key: stem = tens, leaf = ones
15|2 = 152 species
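For checking a hand-drawn display, here is a short Python sketch that builds the rows for the bird counts. It assumes whole-number data with tens as stems and ones as leaves; the function name stem_and_leaf is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return display rows for whole-number data: tens as stems, ones as leaves."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    rows = []
    # List every stem from smallest to largest, including empty ones,
    # so gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(d) for d in leaves.get(stem, []))
        rows.append(f"{stem:>3} | {digits}".rstrip())
    return rows

species = [228, 178, 186, 162, 206, 166, 163,
           183, 181, 206, 177, 175, 167, 162,
           160, 160, 157, 156, 153, 153, 152]

for row in stem_and_leaf(species):
    print(row)
```

Sorting first puts the leaves in increasing order on each stem, and keeping the empty stems 19 and 21 preserves the gaps a histogram would show.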
.
Slide 4- 13
Example: Creating Stem-and-Leaf Plots
The average tuition and fees at public institutions in
the year 2004 for 20 US states are shown
below. The observations ranged from a low
value of 2724 to a high value of 8260. Create a
stemplot for these data.
3977 3423 4010 3785 5761 2724 3239 3323
6060 5754 7129 8260 4590 3218 7603 4677
6511 8117 2875 3491
.
Slide 4- 14
Example - Creating Stem-and-Leaf Plots




Select one or more leading digits for the stem
values. The trailing digits become the leaves.
List possible stem values in a vertical column.
Record the leaf for every observation beside the
corresponding stem value (separate with commas
if leaves are more than one digit)
Indicate the units for stems and leaves in a key.
.
Slide 4- 15
Example - Creating Stem-and-Leaf Plots
Repeated stems can be used to stretch a display when too many
observations are concentrated on just a few stems.
Median Ages in 2030
The accompanying data on the Census Bureau’s projected median
age in 2030 for the 50 US states and Washington D.C. appeared in
USA Today in 2005.
41.0 32.9 39.3 29.3 37.4 35.6 41.1 43.6 33.7 45.4 35.6 38.7
39.2 37.8 37.7 42.0 39.1 40.0 38.8 46.9 37.5 40.2 40.2 39.0
41.1 39.6 46.0 38.4 39.4 42.1 40.8 44.8 39.9 36.8 43.2 40.2
37.9 39.1 42.1 40.7 41.3 41.5 38.3 34.6 30.4 43.9 37.8 38.5
46.7 41.6 46.4
Create a stemplot for these data.
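One way to sketch repeated stems in Python is to give each tens digit two rows: a low row (L) for leaves 0 to 4 and a high row (H) for leaves 5 to 9. This is an assumption about the intended layout, not the textbook's solution; the tenths are truncated, and the names split_stem_rows, L, and H are hypothetical.

```python
ages = [41.0, 32.9, 39.3, 29.3, 37.4, 35.6, 41.1, 43.6, 33.7, 45.4, 35.6, 38.7,
        39.2, 37.8, 37.7, 42.0, 39.1, 40.0, 38.8, 46.9, 37.5, 40.2, 40.2, 39.0,
        41.1, 39.6, 46.0, 38.4, 39.4, 42.1, 40.8, 44.8, 39.9, 36.8, 43.2, 40.2,
        37.9, 39.1, 42.1, 40.7, 41.3, 41.5, 38.3, 34.6, 30.4, 43.9, 37.8, 38.5,
        46.7, 41.6, 46.4]

def split_stem_rows(values):
    """Group leaves by (stem, is_high_half); tenths are truncated."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(int(v), 10)
        rows.setdefault((stem, leaf >= 5), []).append(leaf)
    return rows

rows = split_stem_rows(ages)
# False sorts before True, so each low row prints before its high row.
for (stem, high), leaves in sorted(rows.items()):
    label = f"{stem}{'H' if high else 'L'}"
    print(f"{label} | {''.join(str(d) for d in leaves)}")
```

With single stems, 41 of the 51 values would pile onto just two rows (the 3 and 4 stems); splitting each stem spreads them over four rows so the shape is easier to see.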
.
Slide 4- 16
Example - Creating Stem-and-Leaf Plots
A comparative stem-and-leaf plot is used when two groups of data are to be analyzed
together. One group will extend to the left of the stem and the other group will extend to
the right.
The UNICEF report “Progress for Children” (April, 2005) included the accompanying data
on the percentage of primary-school-age children who were enrolled in school for 19
countries in Northern Africa and for 23 countries in Central Africa.
Northern Africa
54.6 34.3 48.9 77.8 59.6 88.5 97.4 92.5 83.9 96.9 88.9 91.6 97.8 96.1 92.2 94.9
98.6 86.6
Central Africa
58.3 34.6 35.5 45.4 38.6 63.8 53.9 61.9 69.9 43.0 85.0 63.4 58.4 61.9 40.9 73.9
34.8 74.4 97.4 61.0 66.7 79.6 98.9
Construct a comparative stem-and-leaf display.
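A possible Python sketch of the back-to-back layout follows. It uses the values exactly as printed on this slide (18 for Northern Africa, 23 for Central Africa), takes tens as stems and ones as leaves, and truncates the tenths. Those choices and the function name leaves_by_stem are assumptions.

```python
northern = [54.6, 34.3, 48.9, 77.8, 59.6, 88.5, 97.4, 92.5, 83.9,
            96.9, 88.9, 91.6, 97.8, 96.1, 92.2, 94.9, 98.6, 86.6]
central = [58.3, 34.6, 35.5, 45.4, 38.6, 63.8, 53.9, 61.9, 69.9, 43.0,
           85.0, 63.4, 58.4, 61.9, 40.9, 73.9, 34.8, 74.4, 97.4, 61.0,
           66.7, 79.6, 98.9]

def leaves_by_stem(values):
    """Group ones digits (leaves) under tens digits (stems), dropping tenths."""
    groups = {}
    for v in sorted(values):
        stem, leaf = divmod(int(v), 10)
        groups.setdefault(stem, []).append(leaf)
    return groups

left = leaves_by_stem(northern)
right = leaves_by_stem(central)
stems = range(min(min(left), min(right)), max(max(left), max(right)) + 1)

print(f"{'Northern':>10} |   | Central")
for stem in stems:
    # Left-hand leaves are reversed so they grow outward from the stem.
    left_leaves = "".join(str(d) for d in reversed(left.get(stem, [])))
    right_leaves = "".join(str(d) for d in right.get(stem, []))
    print(f"{left_leaves:>10} | {stem} | {right_leaves}")
```

Sharing one column of stems puts both groups on the same scale, so the concentration of Northern Africa in the 80s and 90s and of Central Africa in the 60s can be compared directly.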
.
Slide 4- 17
Dotplots
A dotplot is a simple
display. It just places a
dot along an axis for
each case in the data.
The dotplot to the right
shows Kentucky Derby
winning times, plotting
each race as its own
dot.
You might see a dotplot
displayed horizontally or
vertically.
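The Kentucky Derby times are not included in these slides, so the following sketch draws a text dotplot of the 24 pulse rates read off the stem-and-leaf example earlier in the chapter. It is only an illustration; the function name dotplot_rows is hypothetical, and the axis lists only values that occur rather than an evenly spaced scale.

```python
from collections import Counter

# Pulse rates of 24 women, read from the chapter's stem-and-leaf display.
pulse = [56, 60, 64, 64, 64, 68, 68, 68, 68, 72, 72, 72, 72,
         76, 76, 76, 76, 80, 80, 80, 80, 84, 84, 88]

def dotplot_rows(values):
    """One row per distinct value, with one dot per case."""
    counts = Counter(values)
    return [f"{value:>4} | {'.' * counts[value]}" for value in sorted(counts)]

for row in dotplot_rows(pulse):
    print(row)
```

This prints the axis down the page, which is the vertically oriented version of the display; stacking the dots upward over a horizontal axis shows the same information.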



.
Slide 4- 18
Distribution Activity with Dotplots
Follow directions carefully!!
Do NOT waste time!!
Work quickly!!
.
Slide 4- 19
Think Before You Draw, Again
Remember the “Make a picture” rule?
Now that we have options for data displays, you
need to Think carefully about which type of
display to make.
Before making a stem-and-leaf display, a
histogram, or a dotplot, check the
 Quantitative Data Condition: The data are
values of a quantitative variable whose units
are known.



.
Slide 4- 20
Shape, Center, and Spread
When describing a distribution, make sure to
always tell about three things: shape, center,
and spread, plus any unusual points…

.
Slide 4- 21
What is the Shape of the Distribution?
1. Does the histogram have a single, central hump
or several separated bumps?
2. Is the histogram symmetric?
3. Do any unusual features stick out?
.
Slide 4- 22
Humps and Bumps
1. Does the histogram have a single, central hump
or several separated bumps?

Humps in a histogram are called modes.

A histogram with one main peak is dubbed
unimodal; histograms with two peaks are
bimodal; histograms with three or more peaks
are called multimodal.
.
Slide 4- 23
Humps and Bumps (cont.)
A bimodal histogram has two apparent peaks:

.
Slide 4- 24
Humps and Bumps (cont.)
A histogram that doesn’t appear to have any mode and
in which all the bars are approximately the same height
is called uniform:

.
Slide 4- 25
Symmetry
2.
Is the histogram symmetric?

If you can fold the histogram along a vertical line
through the middle and have the edges match
pretty closely, the histogram is symmetric.
.
Slide 4- 26
Symmetry (cont.)


.
The (usually) thinner ends of a distribution are called
the tails. If one tail stretches out farther than the other,
the histogram is said to be skewed to the side of the
longer tail.
In the figure below, the histogram on the left is said to
be skewed left, while the histogram on the right is said
to be skewed right.
Slide 4- 27
Anything Unusual?
3. Do any unusual features stick out?

Sometimes it’s the unusual features that tell
us something interesting or exciting about the
data.

You should always mention any stragglers, or
outliers, that stand off away from the body of
the distribution.

Are there any gaps in the distribution? If so,
we might have data from more than one
group.
.
Slide 4- 28
Anything Unusual? (cont.)
The following histogram has outliers—there are
three cities in the leftmost bar:

.
Slide 4- 29
Where is the Center of the Distribution?
If you had to pick a single number to describe all
the data what would you pick?
It’s easy to find the center when a histogram is
unimodal and symmetric—it’s right in the middle.
On the other hand, it’s not so easy to find the
center of a skewed histogram or a histogram with
more than one mode.
For now, we will “eyeball” the center of the
distribution. In the next chapter we will find the
center numerically.




.
Slide 4- 30
How Spread Out is the Distribution?
Variation matters, and Statistics is about
variation.
Are the values of the distribution tightly clustered
around the center or more spread out?
In the next two chapters, we will talk about
spread…



.
Slide 4- 31
Comparing Distributions
Often we would like to compare two or more
distributions instead of looking at one distribution
by itself.
When looking at two or more distributions, it is
very important that the histograms have been put
on the same scale. Otherwise, we cannot really
compare the two distributions.
When we compare distributions, we talk about the
shape, center, and spread of each distribution.



.
Slide 4- 32
Comparing Distributions (cont.)
Compare the
following
distributions of
ages for female
and male heart
attack patients:

.
Slide 4- 33
Timeplots: Order, Please!
For some data sets, we are interested in how the
data behave over time. In these cases, we
construct timeplots of the data.

.
Slide 4- 34
Cumulative Relative Frequency Plot




. . . is used to answer questions about percentiles.
A percentile is the percent of individuals that are
at or below a certain value.
Quartiles mark off every 25% of the ordered data. The
first quartile (Q1) is the 25th percentile, while the
third quartile (Q3) is the 75th percentile. What is
the special name for Q2?
Interquartile Range (IQR) is the range of the
middle half (50%) of the data.
IQR = Q3 – Q1
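The "at or below" definition can be sketched directly in Python. The example uses the Texas bird counts from earlier in this chapter, and the function name percent_at_or_below is hypothetical.

```python
def percent_at_or_below(values, x):
    """Percent of observations that are less than or equal to x."""
    at_or_below = sum(1 for v in values if v <= x)
    return 100 * at_or_below / len(values)

species = [228, 178, 186, 162, 206, 166, 163,
           183, 181, 206, 177, 175, 167, 162,
           160, 160, 157, 156, 153, 153, 152]

# 12 of the 21 sites reported 167 species or fewer.
print(round(percent_at_or_below(species, 167), 1))  # 57.1
```

Plotting this percentage against each observed value gives the cumulative relative frequency plot; the quartiles are the values where the curve reaches 25%, 50%, and 75%.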
.
Slide 4- 35
Example: Degree of Reading Power
There are many ways to measure the reading ability of
children. One frequently used test is the Degree of
Reading Power (DRP). In a research study on third-grade
students, the DRP was administered to 44 students.
Their scores were:
40 26 39 14 42 18 25 43 46 27 19
47 19 26 35 34 15 44 40 38 31 46
52 25 35 35 33 29 34 41 49 28 52
47 35 48 22 33 41 51 27 14 54 45
Construct a cumulative relative frequency plot for these data.
.
Slide 4- 36
Example

What DRP score is at the 20th percentile?

What is the median DRP score?

What is the IQR for the distribution of DRP
scores?
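The Python sketch below shows one way to check answers read from the plot. It builds the cumulative relative frequency table for the 44 DRP scores and takes each percentile to be the smallest score whose cumulative percentage reaches the target. That rule is an assumption: other quartile conventions (for example, the median of the lower half) can give slightly different values, and a reading from the graph will be approximate. The function names are hypothetical.

```python
from collections import Counter

drp = [40, 26, 39, 14, 42, 18, 25, 43, 46, 27, 19,
       47, 19, 26, 35, 34, 15, 44, 40, 38, 31, 46,
       52, 25, 35, 35, 33, 29, 34, 41, 49, 28, 52,
       47, 35, 48, 22, 33, 41, 51, 27, 14, 54, 45]

def cumulative_relative_frequency(values):
    """Rows of (score, cumulative count, cumulative proportion)."""
    n = len(values)
    counts = Counter(values)
    running = 0
    table = []
    for score in sorted(counts):
        running += counts[score]
        table.append((score, running, running / n))
    return table

def score_at_percentile(values, pct):
    """Smallest score with at least pct percent of the data at or below it."""
    n = len(values)
    for score, running, _ in cumulative_relative_frequency(values):
        if running * 100 >= pct * n:  # integer comparison avoids rounding error
            return score

q1 = score_at_percentile(drp, 25)
median = score_at_percentile(drp, 50)
q3 = score_at_percentile(drp, 75)
print("20th percentile:", score_at_percentile(drp, 20))  # 25
print("median:", median)                                 # 35
print("IQR:", q3 - q1)                                   # 44 - 26 = 18
```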
.
Slide 4- 37
*Re-expressing Skewed Data to
Improve Symmetry
Figure 4.11
.
Slide 4- 38
*Re-expressing Skewed Data to
Improve Symmetry (cont.)
One way to make a skewed
distribution more symmetric is
to re-express or transform the
data by applying a simple
function
(e.g., logarithmic function).
Note the change in skewness
from the raw data (Figure
4.11) to the transformed data
(Figure 4.12):
Common transformations
include square roots,
logarithms, and reciprocals.
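To see the effect numerically, here is a small Python sketch with hypothetical right-skewed values (they are not the data behind Figures 4.11 and 4.12). A mean far above the median signals a long right tail; after taking base-10 logarithms the two agree.

```python
import math
import statistics

# Hypothetical right-skewed values, chosen only for illustration.
raw = [1, 10, 100, 1000, 10000]
logged = [math.log10(x) for x in raw]

# Raw data: the large values pull the mean well above the median.
print(statistics.mean(raw), statistics.median(raw))
# Logged data: the values are evenly spaced, so mean and median match.
print(statistics.mean(logged), statistics.median(logged))
```

Logarithms only apply to positive values, so data containing zeros or negative numbers would need a different re-expression.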



.
Figure 4.12
Slide 4- 39
What Can Go Wrong?
Don’t make a histogram of a categorical variable—
bar charts or pie charts should be used for
categorical data.

Don’t look for shape,
center, and spread
of a bar chart.

.
Slide 4- 40
What Can Go Wrong? (cont.)
Don’t use bars in every display—save them for
histograms and bar charts.
Below is a badly drawn timeplot and the proper
histogram for the number of eagles sighted in a
collection of weeks:


.
Slide 4- 41
What Can Go Wrong? (cont.)
Choose a bin width appropriate to the data.
 Changing the bin width changes the
appearance of the histogram:
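The effect can be sketched in Python with the 2004 tuition data from earlier in this chapter, binned two ways. The starting point of 2000, the two widths, and the function name histogram_counts are assumptions made for illustration.

```python
from collections import Counter

tuition = [3977, 3423, 4010, 3785, 5761, 2724, 3239, 3323,
           6060, 5754, 7129, 8260, 4590, 3218, 7603, 4677,
           6511, 8117, 2875, 3491]

def histogram_counts(values, start, width):
    """Counts for equal-width bins beginning at start (values must be >= start)."""
    tally = Counter((v - start) // width for v in values)
    n_bins = max(tally) + 1
    return [tally.get(i, 0) for i in range(n_bins)]

for width in (1000, 2000):
    print(f"bin width {width}:", histogram_counts(tuition, 2000, width))
```

With bins of width 1000 the counts are 2, 7, 3, 2, 2, 2, 2, showing a clear peak in the 3000s followed by a flat stretch. With bins of width 2000 they become 9, 5, 4, 2, a steady decline that hides both features.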

.
Slide 4- 42
What Can Go Wrong? (cont.)
Avoid inconsistent scales,
either within the display or
when comparing two
displays.
Label clearly so a reader
knows what the plot
displays.
 Good intentions, bad
plot:


.
Slide 4- 43
What have we learned?
We’ve learned how to make a picture for quantitative data
to help us see the story the data have to Tell.
We can display the distribution of quantitative data with a
histogram, stem-and-leaf display, or dotplot.
Tell about a distribution by talking about shape, center,
spread, and any unusual features.
We can compare two quantitative distributions by looking
at side-by-side displays (plotted on the same scale).
Trends in a quantitative variable can be displayed in a
timeplot.





.
Slide 4- 44