Chapter 2 - Iona College Employee Telephone Directory

Draft
03/06/16
Draft
Chapter 2
Organizing Data
Gratzer and Jantzen
In the last chapter you were introduced to the vocabulary of statistics and you developed
a greater understanding and appreciation for data and the data collection process. We will now
begin to work with data. The first step in understanding the information contained in data is the
organization and presentation of the data. Lots of unorganized numbers can be very confusing
and often do not convey much useful information. Frequency distributions, which are tables
that show how often each particular score (or category) occurs in the data set, are one commonly
used method for extracting the hidden information contained in a data set. We will use the
following data set to learn more about how to organize data into frequency distributions.
AGES OF EMPLOYEES WHO WISH TO ENROLL
IN COMPANY'S NEW DENTAL PLAN
31  47  44  34  41  27  39  42  42  45
42  44  43  40  47  57  43  33  44  51
34  47  40  49  53  59  46  44  49  54
44  51  48  43  47  57  37  69  54  49
42  58  44  35  31  51  49  55  35  52
47  50  54  36  61  38  43  44  49  54
42  48  40  56  49  55  31  44  49  34
33  40  34  37  33  61  50  35  55  44
35  47  37  55  50  30  31  51  27  39
39  55  50  27  46
Although the above list contains a lot of information, very little of it is informative to the reader.
A frequency distribution for enrollee ages can be created as follows:
1. Divide the range of all scores (high score – low score) into between 6 and 15 classes
of equal width, and organize the classes in ascending order. You can use the
minimum and maximum scores as guides in the creation of classes with pleasing
endpoints. For example, the youngest person above is 27 and the oldest 69. You
could create classes like 27-31, 32-36, 37-41 . . . 62-66 and 67-71 for the above age
numbers, but it would be more pleasing to create classes like 25-29, 30-34, 35-39 . . .
60-64 and 65-69. Commonly used class widths are 2, 5, 10 and multiples of 10. Also,
be sure that the classes don’t overlap, e.g., 30-35 and 35-39 wouldn’t work because
where would you put persons 35 years old? Find the absolute frequency of
observations for each class by tallying how many observations fall into each class.
2. Find the relative frequency of each class. Relative frequencies show what fraction of
the observations falls into each class and can be found by dividing the number of
subjects in each class by the total number of subjects.
3. Find the cumulative relative frequency of each class. Cumulative relative
frequencies show what fraction of the observations have values that are less than or
equal to the highest value contained in each class. The cumulative relative frequency
of any class can be found by adding up all of the relative frequencies for all classes up
to and including that class.
The above steps outline how a frequency distribution can be created for a numerical
(quantitative) variable like ages. Frequency distributions for categorical variables, like gender
(male/female), can be constructed in a similar manner, with a few modifications. Usually since
the number of possible categories is small, it isn’t necessary to group differing categories
together. Absolute frequencies and relative frequencies can be computed for each possible
category, e.g., how many men (women) are there and what percent of the sample is male
(female)? But cumulative relative frequencies are not computed for categorical variables,
because the categories themselves cannot be organized in a hierarchical manner (men don’t have
more or less gender than women).
Using the above steps, we can create a frequency table for the age data above. There are
95 measurements ranging from a low of 27 to a high of 69, hence the data values have a range of
42 years. An interval width of 5 years would produce 9 classes of ages, an acceptable number.
The following frequency distribution describes the age distribution of the dental plan enrollees.
Ages of Employees Applying for Dental Plan
Age      Tally                          Absolute    Relative        Cumulative Relative
                                        Frequency   Frequency       Frequency
25-29    ///                             3          .032 = 3.2%     .032 = 3.2%
30-34    ///// ///// //                 12          .126 = 12.6%    .158 = 15.8%
35-39    ///// ///// //                 12          .126 = 12.6%    .284 = 28.4%
40-44    ///// ///// ///// ///// ///    23          .242 = 24.2%    .526 = 52.6%
45-49    ///// ///// ///// ///          18          .189 = 18.9%    .716 = 71.6%
50-54    ///// ///// ////               14          .147 = 14.7%    .863 = 86.3%
55-59    ///// /////                    10          .105 = 10.5%    .968 = 96.8%
60-64    //                              2          .021 = 2.1%     .989 = 98.9%
65-69    /                               1          .011 = 1.1%     1.00 = 100%
Total:                                  95          1.00 = 100%
Looking at the 40-44 age category, we can see that 23 persons were 40 to 44 years old.
The corresponding relative frequency shows that 40-44 year olds comprised 24.2% (23 divided
by 95) of the total number of dental plan seekers. The cumulative relative frequency for that
class shows that 52.6% (3.2% + 12.6% + 12.6% + 24.2%) of all plan applicants were 44 years
old or younger.
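The three steps can also be carried out by a short program. The following is a minimal sketch in Python, not part of the original procedure; the function name frequency_table and its parameters start and width are our own hypothetical choices, and the sketch assumes whole-number data with no value below start.

```python
from collections import Counter

# Ages of the 95 dental plan applicants, copied from the data set above.
ages = [
    31, 47, 44, 34, 41, 27, 39, 42, 42, 45,
    42, 44, 43, 40, 47, 57, 43, 33, 44, 51,
    34, 47, 40, 49, 53, 59, 46, 44, 49, 54,
    44, 51, 48, 43, 47, 57, 37, 69, 54, 49,
    42, 58, 44, 35, 31, 51, 49, 55, 35, 52,
    47, 50, 54, 36, 61, 38, 43, 44, 49, 54,
    42, 48, 40, 56, 49, 55, 31, 44, 49, 34,
    33, 40, 34, 37, 33, 61, 50, 35, 55, 44,
    35, 47, 37, 55, 50, 30, 31, 51, 27, 39,
    39, 55, 50, 27, 46,
]


def frequency_table(values, start, width):
    """Return rows of (class label, absolute, relative, cumulative relative).

    Hypothetical helper: assumes whole numbers, all of them >= start.
    """
    # Class k covers start + k*width through start + (k+1)*width - 1.
    counts = Counter((v - start) // width for v in values)
    n = len(values)
    rows, running = [], 0
    for k in range(max(counts) + 1):
        low = start + k * width
        absolute = counts.get(k, 0)   # step 1: tally each class
        running += absolute           # running total for step 3
        rows.append((f"{low}-{low + width - 1}",
                     absolute,
                     absolute / n,    # step 2: relative frequency
                     running / n))    # step 3: cumulative relative frequency
    return rows


table = frequency_table(ages, start=25, width=5)
for label, absolute, relative, cumulative in table:
    print(f"{label:>5}  {absolute:>3}  {relative:6.3f}  {cumulative:6.3f}")
```

Changing start or width (for example start=27, width=5) shows how the choice of classes changes the counts.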
We can now begin to make sense of the data. We see the most common and uncommon
age intervals, we get an idea of where the data is centered and we begin to understand what it
takes to be of exceptional age.
Charts
Charts are powerful methods for presenting data, and can be constructed using the
information contained in frequency distributions. Charts, by creating a picture of the data,
provide the reader with information about the values that the data takes on and how often they
occur. A popular method for graphically displaying categorical data is the Bar Graph, while
Histograms can be constructed for numerical variables.
Bar Graphs
Bar Graphs plot either the absolute or the relative frequencies of the differing outcomes
of a categorical variable. Suppose the frequency distribution below shows the hair colors for a
sample of 110 college students.
Hair Color    Absolute     Relative
              Frequency    Frequency
Brown          40          .364 = 36.4%
Blonde         17          .155 = 15.5%
Black          29          .264 = 26.4%
Other          24          .218 = 21.8%
Total:        110          1.00 = 100%
A Bar Graph presenting the distribution of students’ hair color is shown below. The
absolute frequencies for each color category are plotted on the vertical axis, while the hair color
categories are listed on the horizontal axis. Note that the vertical axis is scaled by numbers that
are the same distance from each other (5, 10, 15, 20, etc., not 5, 15, 20, 30, etc.). The highest
value on this scale must also be at least as large as the absolute frequency of the category that
occurs most often. The horizontal axis contains the list of categories equally spaced on the axis.
Above each category label is a bar whose height corresponds to the count of that category.
Ideally, each bar should be of equal width, to avoid drawing the reader’s attention to a specific
category. The bars also don’t meet, emphasizing the fact that the data is discrete in nature.
[Figure: Bar graph of hair color. Vertical axis: Number of Students (scaled 5 to 45 in steps of 5); horizontal axis: Hair Color (Brown, Blonde, Black, Other).]
We could have plotted the relative frequencies instead of the absolute frequencies,
which would have yielded a chart with the same shape but differing values on the vertical axis.
The advantage of plotting relative frequencies is that the chart will show what share of the
sample falls into each category.
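The claim that both versions of the chart have the same shape can be checked with a small sketch. The code below is only an illustration under our own assumptions: it draws horizontal text bars rather than the vertical bars described above, and the helper name text_bar_graph and its scale parameter are hypothetical.

```python
# Hair color counts from the frequency distribution above.
hair_counts = {"Brown": 40, "Blonde": 17, "Black": 29, "Other": 24}

total = sum(hair_counts.values())
relative = {color: count / total for color, count in hair_counts.items()}


def text_bar_graph(frequencies, scale=1.0):
    """Return one text line per category; bar length = round(value * scale)."""
    lines = []
    for category, value in frequencies.items():
        bar = "#" * round(value * scale)
        lines.append(f"{category:<7}| {bar} {value:.3g}")
    return lines


# Absolute frequencies: one '#' per student.
print("\n".join(text_bar_graph(hair_counts)))
print()
# Relative frequencies, scaled so a share of 1.00 would span `total` characters.
print("\n".join(text_bar_graph(relative, scale=total)))
```

The two printouts have bars of identical length; only the numbers printed beside the bars differ.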
Histograms
If a variable is numerical, Histograms can be used to:
1. look for patterns
2. describe the shape of the distribution (bell-shaped, flat, bimodal, symmetric, skewed)
3. approximate the center of a distribution
4. draw attention to outliers.
A histogram can be constructed using the following procedure.
1. Create a frequency distribution table if one is not available.
2. Create an axis system with either absolute or relative frequencies on the vertical axis
and the data classes on the horizontal axis. Like the bar graph, it is essential that the
numbered ticks on the vertical scale be equidistant from each other. The horizontal
classes must also be of uniform width, and include all possible values in the data
range, even if some classes have no observations.
3. Bars with height equal to the absolute frequency (or relative frequency) of each class
covering the range of possible values are drawn in. Note that these bars will touch,
emphasizing the fact that the data is continuous in nature.
It should be noted that if the class widths or minimum values used to create the frequency
distribution are changed, the shape of the histogram will change, because the absolute (and
relative) frequencies will differ. Since the choice of class width and starting values is somewhat
subjective, there is no single best histogram for a given data set.
The frequency distribution for the dental plan data described earlier, and the
corresponding histogram, are presented below.
Ages of Employees Applying for Dental Plan
Age      Tally                          Absolute    Relative        Cumulative Relative
                                        Frequency   Frequency       Frequency
25-29    ///                             3          .032 = 3.2%     .032 = 3.2%
30-34    ///// ///// //                 12          .126 = 12.6%    .158 = 15.8%
35-39    ///// ///// //                 12          .126 = 12.6%    .284 = 28.4%
40-44    ///// ///// ///// ///// ///    23          .242 = 24.2%    .526 = 52.6%
45-49    ///// ///// ///// ///          18          .189 = 18.9%    .716 = 71.6%
50-54    ///// ///// ////               14          .147 = 14.7%    .863 = 86.3%
55-59    ///// /////                    10          .105 = 10.5%    .968 = 96.8%
60-64    //                              2          .021 = 2.1%     .989 = 98.9%
65-69    /                               1          .011 = 1.1%     1.00 = 100%
Total:                                  95          1.00 = 100%
[Figure: Histogram of Ages of Employees Opting for New Dental Plan. Vertical axis: Number of Employees (0 to 24 in steps of 6); horizontal axis: Ages of Employees (20 to 75 in steps of 5).]
Notice that a vertical breaker appears at age 35. This line emphasizes that there are two
distinct categories, namely, 30 to 34 and 35 to 39, into which employees can be placed. Even if
several consecutive categories have the same frequency, vertical breakers are necessary to
indicate that the observations fall into different classes. The histogram shows that employee
ages are centered around 40-44, the most common age category, and that the
number of employees decreases as you move farther away from that center.
Histograms and Area
Careful examination of the histogram in the previous section reveals the interesting
observation that the ratio of the area enclosed by the bar of a category to the total area under the
histogram, (area of a bar)/(total area), is the relative frequency of that category. In this example
the total area is 95 × 5 = 475 square units and the area of the bar enclosing the age group 50 to 54
is 14 × 5 = 70. This yields a ratio of 70/475 = .147, which is equal to the relative frequency of this age
group as seen in the frequency distribution. A similar calculation for the age group 30 to 34
yields a ratio of 60/475 = .126, the relative frequency of this group. That this will always be the
case can be seen by the following argument.
Let $f_i$ = frequency of category $i$.
Let $x$ = common width of the intervals.
Then the area of bar $i$ is $A_i = f_i \times x$.
The total area under the histogram is $TA = \sum (f_i \times x)$.
The ratio of the area of bar $i$ to the total area is
$$\frac{A_i}{TA} = \frac{f_i \times x}{\sum (f_i \times x)} = \frac{f_i}{\sum f_i} = \text{relative frequency of category } i.$$
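The argument can be checked numerically for every class of the dental plan data at once. The following short sketch uses exact fractions so that no rounding is involved; the variable names are our own.

```python
from fractions import Fraction

# Absolute frequencies of the nine age classes, 25-29 through 65-69.
frequencies = [3, 12, 12, 23, 18, 14, 10, 2, 1]
width = 5  # common class width in years

# Area of each bar is height (frequency) times width.
areas = [f * width for f in frequencies]
total_area = sum(areas)

# Compare (area of bar)/(total area) with the relative frequency, class by class.
area_ratios = [Fraction(a, total_area) for a in areas]
relative_frequencies = [Fraction(f, sum(frequencies)) for f in frequencies]

print(total_area)
print(area_ratios == relative_frequencies)
```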
Occasionally one may encounter data that is presented in a frequency distribution
containing unequal intervals. For example, in a study of subjects' reaction to violence on TV, the
researchers’ design may call for the groupings child (ages 5 – 12), teens (ages 13 – 18), college
aged (ages 19 – 24), family building years (ages 25 – 39) and middle age (ages 40 – 65). There
are often practical and valid non-statistical reasons for such groupings. The groupings in the
above example might well have been created to investigate the relationship of a person’s position
in a family and their perceptions of violent behavior. The following example will illustrate some
of the potential problems caused by the presence of unequal interval widths.
A quality control expert examined hospital charts for the number of inconsistencies
between summary reports and original notes. The results of the investigation are reported in the
table below.
Number of Inconsistencies in Hospital Charts
Number of
Inconsistencies    Freq
0 ≤ x < 1          6
1 ≤ x < 2          3
2 ≤ x < 3          4
3 ≤ x < 9          7
If a Histogram is drawn using the numbers in the table above, the following is produced.
[Figure: Histogram of Chart Inconsistencies. Vertical axis: Frequency (0 to 8); horizontal axis: Inconsistencies (0 to 10).]
Does this picture accurately depict what the quality control expert found? In a very real sense it suggests just
the opposite. The bar above the group 3 ≤ x < 9 overshadows the rest of the plot, suggesting that
large numbers of inconsistencies are far more prevalent than small numbers of inconsistencies.
Careful examination of the data in the table informs us that this is not the case. What went
wrong? An examination of area will again provide insight. The total area under this histogram
is calculated to be 55 square units.
             Inconsistencies    Area          Ratio           True Relative Frequency
First bar    0 ≤ x < 1          6 × 1 = 6     6/55 = .109     6/20 = .3
Second       1 ≤ x < 2          3 × 1 = 3     3/55 = .055     3/20 = .15
Third        2 ≤ x < 3          4 × 1 = 4     4/55 = .073     4/20 = .2
Fourth       3 ≤ x < 9          7 × 6 = 42    42/55 = .764    7/20 = .35
The ratio of bar area to total area does not equal the relative frequency of the groups. Thus, the
picture above can not be counted on to provide reliable information. However, reliable
histograms can be constructed for the purpose of picturing data with unequal intervals. The key
step in such a construction is a new measure, frequency per unit. The first step in the creation
of this new measure is the choice of the unit. With age data, for example, the frequencies may be
counted per year, per five-year group, or per decade. We will adopt the convention that the adjustments will be made per
single unit. Thus, in the current example we will measure frequency per inconsistency. The
adjusted measure is then created by dividing the frequency of each category by the number of
units in the category.
Number of          Frequency    Number of Inconsistencies    Chart Frequency
Inconsistencies                 in Interval                  per Inconsistency
0 ≤ x < 1          6            1                            6/1 = 6
1 ≤ x < 2          3            1                            3/1 = 3
2 ≤ x < 3          4            1                            4/1 = 4
3 ≤ x < 9          7            6                            7/6 = 1.167
Using frequency per inconsistency as the scale on the vertical axis, the total area under the
histogram is calculated to be: (6 × 1) + (3 × 1) + (4 × 1) + (7/6 × 6) = 20 square units. The
table below illustrates that this change makes the ratio (area of bar)/(total area) equal to the relative
frequency.
             Inconsistencies    Area           Ratio    Relative Frequency
First bar    0 ≤ x < 1          6 × 1 = 6      6/20     6/20
Second       1 ≤ x < 2          3 × 1 = 3      3/20     3/20
Third        2 ≤ x < 3          4 × 1 = 4      4/20     4/20
Fourth       3 ≤ x < 9          7/6 × 6 = 7    7/20     7/20
The argument which follows shows that this adjustment procedure always works.
Let $f_i$ = frequency of category $i$ and $n_i$ = number of unit widths in the category.
Let $n_i x$ = width of interval $i$, where $x$ is the unit width.
The adjusted frequency is $f_i / n_i$, and the area of bar $i$ is $A_i = (f_i / n_i) \times (n_i x) = f_i \times x$.
The total area under the histogram is $TA = \sum [(f_i / n_i) \times (n_i \times x)] = \sum (f_i \times x)$.
The ratio of the area of bar $i$ to the total area is
$$\frac{A_i}{TA} = \frac{f_i \times x}{\sum (f_i \times x)} = \frac{f_i \times x}{x \times \sum f_i} = \frac{f_i}{\sum f_i} = \text{relative frequency of category } i.$$
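The effect of the adjustment can be reproduced for the hospital chart data. The following is a sketch under our own naming (area_shares is a hypothetical helper, and the classes are written as pairs of lower and upper endpoints); it compares the share of area each bar receives when the bar height is the raw frequency and when it is the frequency per inconsistency.

```python
from fractions import Fraction

# Classes low <= x < high and their frequencies, from the hospital chart example.
classes = [(0, 1), (1, 2), (2, 3), (3, 9)]
frequencies = [6, 3, 4, 7]


def area_shares(heights, classes):
    """Share of the total histogram area covered by each bar."""
    areas = [h * (high - low) for h, (low, high) in zip(heights, classes)]
    total = sum(areas)
    return [Fraction(a) / total for a in areas]


n = sum(frequencies)
relative = [Fraction(f, n) for f in frequencies]

# Unadjusted histogram: bar height is the raw frequency.
naive = area_shares(frequencies, classes)

# Adjusted histogram: bar height is frequency per unit of class width.
per_unit = [Fraction(f, high - low) for f, (low, high) in zip(frequencies, classes)]
adjusted = area_shares(per_unit, classes)

print([str(s) for s in naive])     # shares of area, unadjusted
print([str(s) for s in adjusted])  # shares of area, adjusted
print(adjusted == relative)
```

Only the adjusted heights give each bar a share of the area equal to its relative frequency.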
The adjusted histogram in the Figure below is more representative of reality.
[Figure: Histogram of Charts per Inconsistency. Vertical axis: Charts per Inconsistency (0 to 7); horizontal axis: # of Inconsistencies (0 to 10).]
A Final Example
Age is a measure that must be treated with care, or counts will be off. Age categories are
often listed in the form 1 – 4 or 5 – 9. The reader should notice that the category 5 – 9 includes
the ages 5, 6, 7, 8, and 9. Thus, this category contains five ages and not the sometimes expected
four. The following table summarizes the number of Deaths by accident in Massachusetts in the
year 1960. An adjusted histogram follows.
Deaths from Accident by Age, Massachusetts, 1960
Age        # of Deaths    Class Size    Deaths per Year of Age
Under 1     71             1            71
1 – 4       93             4            23.25
5 – 9       85             5            17
10 – 14     38             5            7.6
15 – 24    239            10            23.9
25 – 34    133            10            13.3
35 – 44    186            10            18.6
45 – 54    199            10            19.9
55 – 64    249            10            24.9
65 – 74    389            10            38.9
75 – 84    471            10            47.1
85 – 99    301            15            20.07
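The last column of the table can be recomputed with a few lines of code. This sketch also shows the counting rule discussed above: an inclusive age class such as 5 – 9 contains high − low + 1 ages. The helper name class_size is our own, and it handles only labels of the form "low-high" (not "Under 1").

```python
# (age class, deaths, class size in years of age), from the table above.
accident_deaths = [
    ("Under 1", 71, 1), ("1-4", 93, 4), ("5-9", 85, 5), ("10-14", 38, 5),
    ("15-24", 239, 10), ("25-34", 133, 10), ("35-44", 186, 10),
    ("45-54", 199, 10), ("55-64", 249, 10), ("65-74", 389, 10),
    ("75-84", 471, 10), ("85-99", 301, 15),
]


def class_size(label):
    """Number of whole ages in an inclusive class such as '5-9'."""
    low, high = (int(part) for part in label.split("-"))
    return high - low + 1


# Adjusted bar heights: deaths per single year of age.
deaths_per_year = {label: deaths / size for label, deaths, size in accident_deaths}

for label, value in deaths_per_year.items():
    print(f"{label:>8}  {value:6.2f}")
```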
[Figure: Histogram of Deaths per Year of Age, Massachusetts, 1960. Vertical axis: Deaths per Year of Age (ticks at 15 through 75); horizontal axis: Age (ticks at 10 through 100).]
The Stem-and-Leaf Plot: The Basics
The stem-and-leaf plot, popularized by John Tukey, is another useful way of displaying
how the data for a numerical variable are distributed. The construction of such a plot will be
illustrated using the following data gathered from 22 students. A statistics teacher asked his/her
students to bring to the final exam the sample problems that they had attempted while studying
for the final. While the students were taking the exam, the instructor counted the number of
problems attempted by each student. The results of this survey are found below.
Number of Review Problems Attempted by 22 Statistics Students
51   94  103  114  100  106  100  122   75   84   95
98   70   81  101  110   85   93  112   90   97   86
In order to create a stem-plot, the numbers must be measured with the same degree of precision,
e.g., to one decimal place, in whole units, in tens, etc. The first step is to separate each
observation’s number into two parts: a stem, containing all of the number’s digits except the
right-most digit, and a leaf consisting of the right-most digit. For example, the smallest number
51 would have a stem of 5 and a leaf of 1. The largest number 122 would have a stem of 12 and
a leaf of 2.
For the second step, make a vertical list of stem values that includes all integer values
that lie between the smallest and largest stem values, arranged in ascending order, and place a
vertical line (a splitter) to the right of the stem list. For this example a list of 8 stems, the
numbers 5, 6, 7, 8, 9, 10, 11 and 12, will span the entire data set. Include all integer values
between the minimum and maximum stem values, even if the data doesn’t contain numbers that
correspond to all of the stem values. We will observe the convention that if the list of stems
contains between 6 and 15 numbers, the reader can go on to the next step. What to do when this
is not the case will be discussed later in this section.
For the third step, place each leaf to the right of the splitter on the line containing the
stem to which it was originally attached. For example, the data point 51 would be recorded as
5 | 1 and the data point 122 as 12 | 2. If a stem line already holds a leaf, the next leaf
is simply placed to the right of the existing one, e.g., 103 and 100 would be recorded as 10 | 30.
For the final step, rearrange the leaves so that they are in ascending order away from (to the right of)
the stems. The initial and rearranged stem-and-leaf plots of the entire class can be seen below.
The stem width of each plot is also noted, and indicates the order of magnitude of the stem
values, e.g., 5 | 1 is 5 tens plus 1 or 51, and 12 | 2 is 12 tens plus 2 or 122.
Initial plot:
 5 | 1
 6 |
 7 | 50
 8 | 4156
 9 | 458307
10 | 30601
11 | 402
12 | 2
stem width = 10

Rearranged plot:
 5 | 1
 6 |
 7 | 05
 8 | 1456
 9 | 034578
10 | 00136
11 | 024
12 | 2
stem width = 10
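The four construction steps can be sketched in code. The function below is an illustration only, assuming whole-number data and a stem width of 10; the name stem_and_leaf and the sort_leaves option are our own.

```python
from collections import defaultdict

# Number of review problems attempted by the 22 students, in the order listed.
problems = [51, 94, 103, 114, 100, 106, 100, 122, 75, 84, 95,
            98, 70, 81, 101, 110, 85, 93, 112, 90, 97, 86]


def stem_and_leaf(values, sort_leaves=True):
    """Return the plot as a list of 'stem | leaves' strings (whole numbers only)."""
    leaves = defaultdict(list)
    for v in values:
        # Step 1: the leaf is the right-most digit, the stem is everything else.
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    lines = []
    # Step 2: list every stem from smallest to largest, even those with no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        row = leaves.get(stem, [])
        if sort_leaves:
            row = sorted(row)  # final step: leaves ascend away from the stem
        # Step 3: write the leaves to the right of the splitter.
        lines.append(f"{stem:>2} | " + "".join(str(d) for d in row))
    return lines


print("\n".join(stem_and_leaf(problems, sort_leaves=False)))  # initial plot
print()
print("\n".join(stem_and_leaf(problems)))                     # rearranged plot
```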
Stem-and-leaf plots can provide the reader with the same information as histograms, on
how the variable is distributed. Stem-and-leaf plots, unlike histograms, also preserve the original
numbers, so that a reader can recreate all of the original data series. A stem-and-leaf plot also
draws the reader’s attention to gaps and sudden dips or peaks in the data stream. Data points
that deviate greatly from the overall perceived pattern in the data are known as outliers. Special
attention must be paid to outliers to determine if they are real or, as is often the case, the result of
measurement or recording error. Finally, the stem-and-leaf plot can be an aid in approximating
the center of a distribution. Thus, the plot above presents the reader with a distribution of
attempted problems that is roughly symmetric, centered around 97, with a small gap in the 60s; that is,
no student attempted between 60 and 69 problems. This gap draws attention to the 51. Was this a
mistake, or did this student give up, or was this student so well prepared that no more review was
needed?
Back-to-Back Stem-and-Leaf Plots
In order to compare two samples, two stem-and-leaf plots can be plotted back-to-back,
creating a back-to-back stem-and-leaf plot. When using this method two splitter lines are used,
one to the left and one to the right of the stem column. The leaves of one sample are recorded to
the left of the left splitter while the leaves of the other sample are recorded to the right of the
right splitter. Consider the hypothetical data below, reflecting the danger of extended exposure
to the sun. This data contains the depth of a melanoma and the sex of the person on whom it was
discovered. Each depth has been rounded to the nearest tenth of a mm for demonstration
purposes.
Subject   Melanoma     Gender        Subject   Melanoma     Gender
          Depth (mm)                           Depth (mm)
 1         2.3         Female        14         1.7         Male
 2         2.6         Male          15         2.8         Female
 3         1.5         Male          16         1.2         Female
 4         4.7         Female        17         3.3         Male
 5         3.1         Male          18         0.2         Female
 6         3.7         Female        19         3.0         Male
 7         3.1         Female        20        10.0         Male
 8         0.3         Female        21         8.6         Male
 9         3.7         Female        22         6.3         Female
10         3.3         Male          23         4.1         Male
11         2.5         Female        24         3.5         Male
12         6.9         Male          25         2.4         Male
13         3.2         Female
The above data can be used to construct a back-to-back stem-and-leaf plot comparing the
depth of the diagnosed melanomas of males and females. You will notice that the data ranges
from 0.2 to 10.0 mm. Thus, stems ranging from 0 to 10, measured in whole units, will be
combined with leaves measured in tenths. Female depths will be recorded to
the left of the stem column and male depths to the right. The first plot contains the initial
unarranged leaves, while the second plot has the leaves arranged in ascending order away
from the stem.
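Initial (unarranged) plot:
 Women | Stem | Men
    23 |   0  |
     2 |   1  | 57
   853 |   2  | 64
  2717 |   3  | 13305
     7 |   4  | 1
       |   5  |
     3 |   6  | 9
       |   7  |
       |   8  | 6
       |   9  |
       |  10  | 0
stem width = units

Arranged plot:
 Women | Stem | Men
    32 |   0  |
     2 |   1  | 57
   853 |   2  | 46
  7721 |   3  | 01335
     7 |   4  | 1
       |   5  |
     3 |   6  | 9
       |   7  |
       |   8  | 6
       |   9  |
       |  10  | 0
stem width = units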
This back-to-back stem-and-leaf plot shows that the melanomas of women tend to have smaller
depths than men’s, perhaps suggesting that women tend to get treatment earlier than men.
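A back-to-back plot can be generated directly from the table of subjects. The following is a minimal sketch, assuming depths recorded to one decimal place; the helper names split_tenths and back_to_back are hypothetical, and the two lists simply separate the depths in the table by gender.

```python
# Melanoma depths in mm from the table of 25 subjects, separated by gender.
women = [2.3, 4.7, 3.7, 3.1, 0.3, 3.7, 2.5, 3.2, 2.8, 1.2, 0.2, 6.3]
men = [2.6, 1.5, 3.1, 3.3, 6.9, 1.7, 3.3, 3.0, 10.0, 8.6, 4.1, 3.5, 2.4]


def split_tenths(depth):
    """Return (stem, leaf) for a depth recorded to one decimal place."""
    # round() guards against floating-point noise such as 0.3 * 10 = 3.0000000000000004.
    return divmod(round(depth * 10), 10)


def back_to_back(left, right):
    """Rows of (left leaves, stem, right leaves); leaves ascend away from the stem."""
    left_parts = [split_tenths(v) for v in left]
    right_parts = [split_tenths(v) for v in right]
    stems = [stem for stem, _ in left_parts + right_parts]
    rows = []
    for stem in range(min(stems), max(stems) + 1):
        left_leaves = sorted(leaf for s, leaf in left_parts if s == stem)
        right_leaves = sorted(leaf for s, leaf in right_parts if s == stem)
        # Reverse the left side so its smallest leaf sits next to the stem.
        rows.append(("".join(map(str, reversed(left_leaves))),
                     stem,
                     "".join(map(str, right_leaves))))
    return rows


print(" Women | Stem | Men")
for left_leaves, stem, right_leaves in back_to_back(women, men):
    print(f"{left_leaves:>6} | {stem:>4} | {right_leaves}")
```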
The Stem-and-Leaf Plot: The not-so Basics
Ideally the number of stems that a stem-and-leaf plot contains should be between 6 and
15. A smaller number hides how the variable is distributed, by lumping too many observations
together. A larger number also obscures the variable’s distribution, because few stems will have more than
a couple of observations.
Consider the following data gathered from a random sample of 38
U. S. hospitals. Each hospital reported what percentage of their inpatients had had their care
paid for by the Medicaid insurance program. The Medicaid program is the principal insurance
program for poor persons in the U.S.
Medicaid Shares (in %) of Hospitals
1, 2, 5, 6, 7, 7, 10, 24, 26,
2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 7, 8, 8, 8, 8, 8, 10, 10,
10, 13, 15, 16, 18, 21, 22, 23, 24, 30, 33
Since the Medicaid percentages lie between 1 and 33 percent, we could construct a stem
and leaf plot with stems of 0, 1, 2 and 3. But 4 stems is less than the recommended minimum of
6 and such a stem and leaf plot would obscure how the Medicaid shares are actually distributed.
We can, however, use split stems to increase the number of stem values. To double the number
of stem values (from 4 to 8 in this example), write each of the stem values twice, and then assign
leaves with the values of 0 thru 4 on the upper stem value and 5 thru 9 on the lower. Below are
two stem and leaf plots for the Medicaid share data, the second of which uses split stems.
0 | 1223334455556677788888
1 | 00003568
2 | 123446
3 | 03
stem width = 10

0 | 12233344
0 | 55556677788888
1 | 00003
1 | 568
2 | 12344
2 | 6
3 | 03
stem width = 10
Note that in the bottom plot, hospitals with percents ranging from 1 through 4 had their leaves
placed on the upper 0 stem, while percents ranging from 5 through 9 had their leaves placed on the
lower 0 stem. The split-stem plot provides more insight into how the Medicaid shares are
distributed, showing that nearly twice as many hospitals had Medicaid shares between 5 and 9%
(14 hospitals) as in the 0 to 4% range (8 hospitals).
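Split stems are easy to produce programmatically. The sketch below is our own illustration (the function name split_stem_plot is hypothetical); it writes each stem twice, placing leaves 0 through 4 on the first copy and 5 through 9 on the second, and stops at the last half-stem that holds data.

```python
# Medicaid shares (in %) of the 38 sampled hospitals, as listed above.
medicaid = [1, 2, 5, 6, 7, 7, 10, 24, 26,
            2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 7, 8, 8, 8, 8, 8, 10, 10,
            10, 13, 15, 16, 18, 21, 22, 23, 24, 30, 33]


def split_stem_plot(values):
    """Stem-and-leaf lines with each stem written twice: leaves 0-4, then 5-9."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = 0 if leaf <= 4 else 1  # 0 = upper copy of the stem, 1 = lower copy
        rows.setdefault((stem, half), []).append(leaf)
    last = max(rows)
    stem, half = min(rows)
    lines = []
    # Walk through every half-stem in order, including empty ones in between.
    while (stem, half) <= last:
        leaves = "".join(map(str, rows.get((stem, half), [])))
        lines.append(f"{stem} | {leaves}")
        stem, half = (stem, 1) if half == 0 else (stem + 1, 0)
    return lines


print("\n".join(split_stem_plot(medicaid)))
```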
Sometimes a variable’s data values will generate a stem and leaf plot that has too many
stem values (more than 15). Consider the following list which shows the number of licensed
beds that the aforementioned 38 hospitals have:
Number of licensed beds
19, 30, 36, 40, 43, 56, 56, 75, 78, 79, 88, 93, 106, 115, 135, 155, 157, 168, 178, 187, 193,
223, 233, 252, 252, 258, 270, 281, 287, 294, 295, 346, 350, 356, 442, 463, 531, 653
Given minimum and maximum bed sizes of 19 and 653, a stem and leaf plot of the above
data would have 65 stems (ranging from 1 to 65). Such a plot wouldn’t be very informative,
because many of the stems would have no leaves and the rest would have only one or two. One
way to reduce the number of stems is to truncate each number by discarding the right most digit.
In this example, we could discard the right most digit of the above series of numbers yielding the
following number series:
1, 3, 3, 4, 4, 5, 5, 7, 7, 7, 8, 9, 10, 11, 13, 15, 15, 16, 17, 18, 19, 22, 23, 25, 25, 25,
27, 28, 28, 29, 29, 34, 35, 35, 44, 46, 53, 65
The first hospital’s size of 19 has been recoded as a 1 (19 discarding the 9), the second’s
size of 30 as 3 (30 discarding the 0), the third’s size of 36 as 3 (36 discarding the 6), etc. Since
the recoded values range from 1 to 65, the stem and leaf plot will have 7 stems, namely 0, 1, 2, 3, 4,
5 and 6. The corresponding plot is below:
0 | 133445577789
1 | 013556789
2 | 2355578899
3 | 455
4 | 46
5 | 3
6 | 5
stem width = 100
The plot shows that the number of hospitals is concentrated at the smaller sizes, with decreasing
numbers in the larger size categories. Note the stem width, which indicates that the stem values
are measured in 100s and the leaves are measured in tens. Hence the bottom row of 6 | 5 is for
the hospital with 653 beds. Also, since the above plot was created by using truncated values, the
original data’s values can only be approximated from the plot. Finally, if truncating only one
digit still yields more than 15 stem values, try truncating another digit.
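Truncation followed by plotting can be sketched as follows. This is an illustration under our own naming (truncated_stem_plot and its digits_dropped parameter are hypothetical); integer division by a power of 10 discards the right-most digits without rounding.

```python
# Number of licensed beds in the 38 sampled hospitals, as listed above.
beds = [19, 30, 36, 40, 43, 56, 56, 75, 78, 79, 88, 93, 106, 115, 135, 155,
        157, 168, 178, 187, 193, 223, 233, 252, 252, 258, 270, 281, 287,
        294, 295, 346, 350, 356, 442, 463, 531, 653]


def truncated_stem_plot(values, digits_dropped=1):
    """Drop the right-most digits, then build a sorted stem-and-leaf plot."""
    # Truncate (do not round): 19 -> 1, 36 -> 3, 653 -> 65 when one digit is dropped.
    truncated = sorted(v // 10 ** digits_dropped for v in values)
    rows = {}
    for t in truncated:
        stem, leaf = divmod(t, 10)
        rows.setdefault(stem, []).append(leaf)
    return [f"{stem} | " + "".join(map(str, rows.get(stem, [])))
            for stem in range(min(rows), max(rows) + 1)]


plot = truncated_stem_plot(beds)
print("\n".join(plot))
print("stem width = 100")
```

Calling truncated_stem_plot with digits_dropped=2 corresponds to the advice to truncate another digit when one is not enough.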