Chapter 4 - marinoapstat

Chapter 4
Displaying Quantitative Data
Quantitative variables
• Quantitative variables record measurements or amounts of something.
• A quantitative variable must have units, or be a variable in which the numbers act as numerical values.
Types of Displays
• Histogram
• Stem and Leaf Displays
• Dotplots
Histogram
• A histogram uses adjacent bars to show
the distribution of values in a quantitative
variable.
• Looks very similar to a bar graph but there
are differences.
• The horizontal axis is a continuous numerical scale, not just a set of labels.
An example
• The histogram shown below gives the number of children who visited a particular zoo.
Histogram
• A histogram is more convenient than a
dot-plot or a stem and leaf plot because
you don't have to represent each data
point. However, you don't get to see the
value of each data point. So a table of
data and summary statistics would help
people interpret the data.
Be Careful
• A histogram gives the number of data
points that fall into equal intervals. Care
must be taken in choosing the intervals
because it can affect the shape of the
graph and misrepresent the true data.
1st graph
• The first graph uses intervals of size 10
yielding the intervals 40-50, 50-60, etc. In
this case, Yemen had a life expectancy of
50 and was placed in the 50-60 column.
Usually, borderline values are placed in
the higher column.
2nd Graph
• In the second graph, the intervals are 40-45, 45-50, 50-55, etc. This affects the shape of the graph.
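The effect of the interval choice can be sketched in a few lines of Python. This is only an illustration: the life expectancy values are made up (they are not the data behind the two graphs), and the function name bin_counts is a hypothetical helper. A borderline value such as 50 is counted in the higher interval, as described above.

```python
from collections import Counter

def bin_counts(values, start, width):
    """Count values per interval [lo, lo + width); a borderline value goes in the higher bin."""
    counts = Counter()
    for v in values:
        lo = start + ((v - start) // width) * width  # lower edge of the bin holding v
        counts[lo] += 1
    return dict(sorted(counts.items()))

# Hypothetical life expectancies in years (assumption: invented for illustration).
life_expectancy = [42, 47, 50, 53, 54, 58, 61, 64, 66, 69,
                   71, 72, 74, 75, 77, 78, 79, 81]

for width in (10, 5):
    print(f"interval width {width}")
    for lo, n in bin_counts(life_expectancy, 40, width).items():
        print(f"  {lo}-{lo + width}: {'#' * n}")
```

Running it with both widths shows the same data producing two different-looking pictures, which is why the interval choice deserves care.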
Stem and Leaf Displays
• Shows quantitative data values in a way
that sketches the distribution of the data.
• The stem-and-leaf plot below shows the number of students enrolled in a dance class in each of the past 12 years.
• The numbers of students are 81, 84, 85, 86, 93, 94, 97, 100, 102, 103, 110, and 111.
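As a rough sketch, the display for these twelve values can be built in Python by splitting each number into a stem (all digits but the last) and a leaf (the last digit). The function name stem_and_leaf is a hypothetical helper, and the sketch assumes whole-number data.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: [leaves]} where the leaf is the last digit of each whole number."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

# Dance class enrollment values listed above.
enrollment = [81, 84, 85, 86, 93, 94, 97, 100, 102, 103, 110, 111]

plot = stem_and_leaf(enrollment)
for stem in range(min(plot), max(plot) + 1):  # also print stems that have no leaves
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem:>2} | {leaves}")
```

Printing every stem between the smallest and largest, even an empty one, keeps gaps in the data visible.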
Dotplot
• Graphs a dot for each case against a
single axis
• Graph the following numbers: 5, 5, 5, 5, 5, 5, 5, 10, 10, 10, 10, 10, etc.
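A text version of this dotplot can be sketched in Python. The sketch uses only the values actually listed (seven 5s and five 10s, ignoring the "etc."), assumes whole-number data, and text_dotplot is a hypothetical helper name.

```python
from collections import Counter

def text_dotplot(values):
    """Return the rows of a text dotplot (top row first, axis last) for whole numbers."""
    counts = Counter(values)
    positions = range(min(counts), max(counts) + 1)
    rows = []
    # One row per stacking level, from the tallest stack down to the axis.
    for level in range(max(counts.values()), 0, -1):
        rows.append("".join(" o " if counts[x] >= level else "   " for x in positions))
    rows.append("".join(f"{x:^3}" for x in positions))  # axis labels
    return rows

numbers = [5] * 7 + [10] * 5  # the values listed above

for row in text_dotplot(numbers):
    print(row)
```

Each "o" is one case, stacked above its value on the single axis.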
Dotplot with two sets of data
Example
Shape
• To describe the shape of a distribution,
look for
• Symmetry versus skewness
• Single versus multiple modes
Symmetrical
• A distribution is symmetric if the two
halves on either side of the center look
approximately like the mirror images of
each other.
Symmetrical
• Symmetrical Histogram
Dotplot
• Dots are mirror images
Stem and leaf
• Example
Skewed
• A distribution is skewed if it is not symmetric and one tail stretches out farther than the other.
• Skewed left: the longer tail stretches to the left.
• Skewed right: the longer tail stretches to the right.
Examples
• Skewed right
Skewed left
• Left
All three
• Examples
Funny example
• http://www.herkimershideaway.org/apstatistics/ymmsum99/ymm111.htm
New seating chart
• http://www.random.org/integers/
Stem-and-Leaf Revisited
• Compare the histogram and stem-and-leaf display for the pulse rates of 24 women at a health clinic. Which graphical display do you prefer?
Think Before You Draw, Again
• Remember the “Make a picture” rule?
• Now that we have options for data displays, you need to Think carefully about which type of display to make.
• Before making a stem-and-leaf display, a histogram, or a dotplot, check the Quantitative Data Condition: The data are values of a quantitative variable whose units are known.
Constructing a Stem-and-Leaf
Display
• First, cut each data value into leading digits (“stems”) and trailing digits (“leaves”).
• Use the stems to label the bins.
• Use only one digit for each leaf: either round or truncate the data values to one decimal place after the stem.
Center
• A value that attempts the impossible by
summarizing the entire distribution with a
single number, a “typical” value.
• Measures include the mean and median.
Spread
• A numerical summary of how tightly the
values are clustered around the center.
• Measures of spread include the IQR and
standard deviation.
Mode
• A hump or local high point in the shape of the distribution of a variable is called a mode. The apparent location of modes can change as the scale of a histogram is changed.
Unimodal
• Having one mode.
Bimodal
• Distribution with two modes
Example
Uniform
• A histogram that doesn’t appear to have any mode and in which all the bars are approximately the same height is called uniform:
Anything Unusual?
Do any unusual features stick out?
• Sometimes it’s the unusual features that tell us something interesting or exciting about the data.
• You should always mention any stragglers, or outliers, that stand off away from the body of the distribution.
• Are there any gaps in the distribution? If so, we might have data from more than one group.
Outliers
The following histogram has outliers; there are three cities in the leftmost bar:
Outliers
• Are extreme values that do not appear to
belong to the rest of the data. They may
be unusual values that deserve further
investigation, or they may be just
mistakes; there’s no obvious way to tell.
Do not delete them. Outliers can affect
many statistical analyses, so you should
always be alert to them.
Outliers
• Away from the main portion of data
Where is the Center of the
Distribution?
• If you had to pick a single number to describe all the data, what would you pick?
• It’s easy to find the center when a histogram is unimodal and symmetric: it’s right in the middle.
• On the other hand, it’s not so easy to find the center of a skewed histogram or a histogram with more than one mode.
• For now, we will “eyeball” the center of the distribution. In the next chapter we will find the center numerically.
How Spread Out is the
Distribution?
• Variation matters, and Statistics is about variation.
• Are the values of the distribution tightly clustered around the center or more spread out?
• In the next two chapters, we will talk about spread…
Comparing Distributions
• Often we would like to compare two or more distributions instead of looking at one distribution by itself.
• When looking at two or more distributions, it is very important that the histograms have been put on the same scale. Otherwise, we cannot really compare the two distributions.
• When we compare distributions, we talk about the shape, center, and spread of each distribution.
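One way to put two histograms on the same scale is to compute the intervals once, from both groups together, and then count each group against those shared intervals. The Python sketch below does this for whole-number data; the two groups and their values are invented for illustration, and counts_on_shared_bins is a hypothetical helper name.

```python
def counts_on_shared_bins(groups, width):
    """Count each group's whole-number values in the same intervals [start, start + width)."""
    lo = min(min(values) for values in groups.values()) // width * width
    hi = max(max(values) for values in groups.values())
    starts = list(range(lo, hi + 1, width))  # lower edges shared by every group
    table = {}
    for name, values in groups.items():
        row = [0] * len(starts)
        for v in values:
            row[(v - lo) // width] += 1
        table[name] = row
    return starts, table

# Hypothetical data (assumption: made-up values, not taken from any figure).
groups = {
    "group A": [52, 61, 67, 68, 72, 74, 75, 78, 81, 83, 86, 90],
    "group B": [38, 45, 49, 52, 55, 57, 58, 61, 63, 66, 72, 79],
}

width = 10
starts, table = counts_on_shared_bins(groups, width)
for name, row in table.items():
    print(name)
    for start, n in zip(starts, row):
        print(f"  {start}-{start + width}: {'#' * n}")
```

Because both groups are printed against identical intervals, including the empty ones, differences in shape, center, and spread can be read directly from the two pictures.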
Example
Compare the following distributions of ages for female and male heart attack patients:
HOMEWORK!!!!
Web Pages Used
• http://www.fao.org/wairdocs/ilri/x5469e/x5469e38.gif
• http://www.sciencebuddies.org/science-fairprojects/descriptive_statistics_files/BimodalDist.jpg
• http://images.absoluteastronomy.com/images/encyclopediaimages/b/bi/bimodal.png
• http://upload.wikimedia.org/wikipedia/commons/b/bc/Bimodal_geological.PNG
Web Pages Used
• http://mathworld.wolfram.com/images/epsgif/OutlierHistogram_1000.gif
Timeplots: Order, Please!
• For some data sets, we are interested in how the data behave over time. In these cases, we construct timeplots of the data.
*Re-expressing Skewed Data to
Improve Symmetry
• One way to make a skewed distribution more symmetric is to re-express or transform the data by applying a simple function (e.g., logarithmic function).
• Note the change in skewness from the raw data (Figure 4.11) to the transformed data (Figure 4.12):
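The effect of a logarithmic re-expression can be sketched in Python. The raw values below are invented (they are not the data in Figures 4.11 and 4.12), and the symmetry check used here, the gap between mean and median scaled by the range, is only a rough assumption-based indicator, not a method from these slides.

```python
import math
from statistics import mean, median

def mean_median_gap(values):
    """Rough symmetry check: (mean - median) / range. Values near 0 suggest symmetry."""
    return (mean(values) - median(values)) / (max(values) - min(values))

# Hypothetical right-skewed data (assumption: made up, with a long tail to the right).
raw = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610]

# Re-express each value with a base-10 logarithm.
logged = [math.log10(v) for v in raw]

print("raw gap:   ", round(mean_median_gap(raw), 3))
print("logged gap:", round(mean_median_gap(logged), 3))
```

For these values the gap is clearly positive before the transformation (the long right tail pulls the mean above the median) and much closer to zero afterward.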
What Can Go Wrong?
• Don’t make a histogram of a categorical variable; bar charts or pie charts should be used for categorical data.
• Don’t look for shape, center, and spread of a bar chart.
What Can Go Wrong? (cont.)
• Don’t use bars in every display; save them for histograms and bar charts.
• Below is a badly drawn timeplot and the proper histogram for the number of eagles sighted in a collection of weeks:
What Can Go Wrong? (cont.)
• Choose a bin width appropriate to the data.
• Changing the bin width changes the appearance of the histogram:
What Can Go Wrong? (cont.)
• Avoid inconsistent scales, either within the display or when comparing two displays.
• Label clearly so a reader knows what the plot displays.
• Good intentions, bad plot:
What have we learned?
• We’ve learned how to make a picture for quantitative data to help us see the story the data have to Tell.
• We can display the distribution of quantitative data with a histogram, stem-and-leaf display, or dotplot.
• Tell about a distribution by talking about shape, center, spread, and any unusual features.
• We can compare two quantitative distributions by looking at side-by-side displays (plotted on the same scale).
• Trends in a quantitative variable can be displayed in a timeplot.