HOW TO CONSTRUCT A HISTOGRAM

BUS210: BUSINESS STATISTICS
North Seattle Community College
Histogram
Pg. 1 of 5
With a calculator, graph paper, and a few sharp pencils, you can easily construct histograms for use
in your quality improvement team's diagnostic efforts. An even easier approach is to use one of the
many graphics software packages that automatically generate histograms from a table of data.
In this section, we will show the step-by-step construction of a histogram, using an example to
illustrate. At the end of this section, you will find a one-page summary of the construction steps, for
easy reference.
Getting Ready
As with all the analytical tools presented in this workbook, a good histogram starts with good data.
The data that we need is any set of measurements of a continuous variable such as time, weight,
length, volume, temperature, ohms, tensile strength, percent of an impurity, etc. Histograms can also
be constructed from discrete counts, such as the number of defects per unit, the number of items in a
container, the number of tasks completed per hour, etc. Of course, you cannot mix different units of
measure on a single histogram.
Since a team often needs to produce stratified histograms later for a thorough analysis, it pays to
spend some time thinking about the other information you may need for future stratification.
Imagine the factors that might result in different histogram patterns: different machines, times of
day, input conditions, etc. (See the chapter on Stratification in this workbook for more information
on this topic.)
If you decide to use sample data (for example, because it is impractical to measure every item), use
the largest sample size possible, consistent with budget and time constraints. Unless your team has
access to statistical expertise for data analysis, we recommend a sample size of at least 30
measurements for constructing a histogram. If you construct more than one histogram, you need at
least 30 observations for each one. If you have at least eight data points but fewer than 30, you
can use the "box plot" to display and analyze the data.
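The sample-size guidance above amounts to a simple decision rule. Below is a minimal Python sketch that only encodes the thresholds stated in this section (30 observations per histogram, 8 to 29 points for a box plot). The name `suggest_display` is hypothetical, and letting the smallest stratum decide is our reading of the "30 observations for each one" rule.

```python
from typing import Sequence


def suggest_display(group_sizes: Sequence[int]) -> str:
    """Suggest a display for one or more groups (strata) of measurements.

    Assumed thresholds, taken from the text: every histogram needs at least
    30 observations; 8 to 29 points are better shown as a box plot.
    With stratified data, the smallest group decides.
    """
    smallest = min(group_sizes)
    if smallest >= 30:
        return "histogram"
    if smallest >= 8:
        return "box plot"
    return "gather more data"


print(suggest_display([120]))      # histogram
print(suggest_display([100, 20]))  # box plot (one stratum has only 20 points)
```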
Sometimes, the data you need already exists in routine inspections, log books, or as the result of a
special study. If the data does not exist, your team should develop a method for gathering it.
To be successful in constructing and analyzing histograms, your data must be:
• Accurate. The pattern of variability in your histogram is a combination of the variability
  inherent in the process or product and the variability in the measurement process. To see the
  variability in the process or product, you must keep the variability from the measurement
  process as small as possible. A double-peaked histogram may result simply from two different
  ways of measuring the characteristic.
• Complete. Remember that there is a good chance that you will want to stratify the data as you
  proceed through your analysis. Be sure to record all the relevant information associated with
  each measurement.
• Representative. Make sure your data represents the normal, or typical, conditions and
  situations in the process.
Steps in Constructing a Histogram
Step 1: On the table of raw data, determine the high value, the low value, and the range.
The table of amplifier gain data is shown below. The high value is 11.7 dB and the low value is
7.8 dB. The range of the data is simply the difference between the high and low values. In our
example, the range is 3.9 dB.
AMPLIFIER GAIN DATA (dB)
 8.1   8.2   9.1  11.5   9.3   8.4   7.9   9.9   8.7   8.1
10.4   8.9   8.4   8.0   9.7   9.1   8.5  10.6   9.8  10.1
 8.8  10.1   9.6   7.9   8.7  10.1   9.2   8.6   8.5   9.6
 9.7   7.8   9.9  11.7   9.4   9.2   7.9   9.5  11.1   7.9
 8.5   8.7   8.3   8.7  10.0   9.4   8.2   8.9   8.6   9.5
 7.8   8.1   8.8   8.0   8.7  10.2   7.9   9.8   9.4   8.8
 8.2  10.5   8.9   9.1   8.4   8.1   8.3   8.0   9.8   9.0
 8.0  10.9   7.8   9.0   9.4   9.2   8.3   9.7   9.5   8.9
 9.3   7.8  10.5   9.2   8.8   8.4   9.0   9.1   8.7   8.1
 9.0   8.3   8.5  10.7   8.3   7.8   9.6   8.0   9.3   9.7

Range = high - low = 11.7 - 7.8 = 3.9 dB
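Step 1 is simple enough to automate. The sketch below assumes the measurements are already in a Python list. `data_range` is a hypothetical helper, and the sample list holds only a few values from the table (including its high and low), not the full data set.

```python
def data_range(values):
    """Return (high, low, range) for a list of measurements.

    The range is rounded to one decimal place because the gain data are
    recorded to 0.1 dB; plain float subtraction can leave a tiny binary
    rounding error (11.7 - 7.8 gives 3.8999999999999995, not 3.9).
    """
    high = max(values)
    low = min(values)
    return high, low, round(high - low, 1)


high, low, spread = data_range([8.1, 8.2, 9.1, 11.5, 11.7, 7.8, 9.3])
print(high, low, spread)  # 11.7 7.8 3.9
```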
Step 2: Decide on the number of cells.
A cell is a subdivision of the range. The table below provides a guide to help determine the number
of cells you will need. Since there are 120 data points in our example, we want approximately eight
cells in our histogram.
Alternatively, you can take the square root of the number of data points to establish an
appropriate number of cells. In this example, the result would be eleven. (The square root of 120
is about 10.95; round to the nearest whole number.) As you can see, these methods provide only
guidelines; the results may vary depending on the method used.
Recommended Number of Histogram Cells

Number of Data Points    Recommended Number of Cells
20* - 50                 6
51 - 100                 7
101 - 200                8
201 - 500                9
501 - 1000               10
Over 1000                11 - 20

* Minimum for a good histogram is 30 points. Fewer points may occur if the original histogram is
stratified.

The number of cells in the histogram is an important decision. The process of subdividing the
range into cells essentially groups the data and discards some information in order to make the
pattern clear. In the extreme, with only one cell, we would have discarded so much information that
even the pattern is lost. At the other extreme, with only one data value in each cell, we would be
no better off than we were with the original table of data. The recommendations above represent a
good compromise, based on experience, between these two extremes.
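Both rules from Step 2 can be written down directly. This is a minimal sketch: `CELL_GUIDE` copies the recommended-cells table, and the helper names are hypothetical. Raising an error below 20 points is our own choice, since the table starts there.

```python
import math

# (largest number of data points, recommended number of cells), copied from
# the "Recommended Number of Histogram Cells" table.
CELL_GUIDE = [(50, 6), (100, 7), (200, 8), (500, 9), (1000, 10)]


def cells_from_table(n_points: int) -> tuple[int, int]:
    """Return the (minimum, maximum) recommended number of cells."""
    if n_points < 20:
        raise ValueError("the guide table starts at 20 data points")
    for upper, cells in CELL_GUIDE:
        if n_points <= upper:
            return cells, cells
    return 11, 20  # the "Over 1000" row gives a span rather than one value


def cells_from_sqrt(n_points: int) -> int:
    """Square-root rule: sqrt(n) rounded to the nearest whole number."""
    return round(math.sqrt(n_points))


print(cells_from_table(120))  # (8, 8)
print(cells_from_sqrt(120))   # 11, since sqrt(120) is about 10.95
```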
Step 3: Calculate the approximate cell width.
Each cell will have the same width. The width of a cell is the difference between the upper and
lower boundaries that define it. The approximate cell width equals the range divided by the
recommended number of cells. In our example:
Approximate cell width = 3.9 dB / 8 = 0.4875 dB, or about 0.49 dB
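The Step 3 arithmetic as a minimal sketch. `approximate_cell_width` is a hypothetical helper name; the inputs are the 3.9 dB range and the eight cells from the example.

```python
def approximate_cell_width(data_range: float, n_cells: int) -> float:
    """Approximate cell width = range / recommended number of cells."""
    if n_cells < 1:
        raise ValueError("need at least one cell")
    return data_range / n_cells


width = approximate_cell_width(3.9, 8)  # amplifier example: 3.9 dB, 8 cells
print(f"{width:.4f} dB")  # 0.4875 dB
print(f"{width:.2f} dB")  # 0.49 dB
```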
Step 4: Round the cell width to a convenient number.
The usual convention is to make the cell width 1, 2, or 5 times a power of ten (for example, 10, 20,
50 or 0.1, 0.2, 0.5). Since these numbers all divide evenly into 10, they will be easy to mark off on
the horizontal axis of the histogram. Sometimes this rounding will add or subtract a cell from the
recommended number; this should not concern us. In our example, we choose a cell width of 0.5 dB,
which gives us eight cells.
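One way to apply the 1-2-5 convention in code. This sketch assumes you want whichever candidate lies closest to the approximate width; some people prefer to always round up instead. `convenient_width` is a hypothetical name.

```python
import math


def convenient_width(approx_width: float) -> float:
    """Round a cell width to the closest 1, 2, 5, or 10 times a power of ten."""
    if approx_width <= 0:
        raise ValueError("cell width must be positive")
    exponent = math.floor(math.log10(approx_width))
    scale = 10.0 ** exponent
    digits = max(0, -exponent)  # decimal places needed to express the candidates
    candidates = [round(m * scale, digits) for m in (1, 2, 5, 10)]
    return min(candidates, key=lambda c: abs(c - approx_width))


width = convenient_width(0.4875)  # approximate width from Step 3
print(width)                      # 0.5
print(3.9 / width)                # 7.8, so roughly eight cells
```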
Step 5: Construct the cells by listing the cell boundaries.
The first cell must include the lowest value in the data. To keep data values from falling on a
boundary, express the cell boundaries to one more decimal place than the data. Since the lowest
value in our data is 7.8 dB, we will begin our first cell at 7.75 dB and construct eight cells of
0.5 dB width (i.e., Cell 1 goes from 7.75 dB to 8.25 dB, Cell 2 from 8.25 dB to 8.75 dB, etc.).
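Step 5 as a minimal sketch. It assumes the data are recorded to a known resolution (0.1 dB in the example) and that the first boundary sits half a resolution step below the lowest value, which is how 7.75 dB arises from 7.8 dB. `cell_boundaries` and its parameters are hypothetical.

```python
import math


def cell_boundaries(low, high, width, resolution, decimals=2):
    """Return (lower, upper) pairs for equal-width cells covering the data.

    low, high   -- smallest and largest data values
    width       -- cell width chosen in Step 4
    resolution  -- smallest step the data are recorded in (0.1 dB here)
    decimals    -- one more decimal place than the data, so that no data
                   value can land exactly on a boundary
    """
    start = round(low - resolution / 2, decimals)  # 7.8 - 0.05 -> 7.75
    n_cells = math.ceil((high - start) / width)
    while start + n_cells * width <= high:  # make sure the top cell holds `high`
        n_cells += 1
    edges = [round(start + i * width, decimals) for i in range(n_cells + 1)]
    return list(zip(edges[:-1], edges[1:]))


for lower, upper in cell_boundaries(low=7.8, high=11.7, width=0.5, resolution=0.1):
    print(f"{lower:.2f} - {upper:.2f}")
```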
Step 6: Tally the number of data points in each cell.
Take each individual data point from the table of raw data, determine which cell it falls in, and put a
tally mark next to that cell. Continue this process through the entire table of data. As a check of this
procedure, make sure that the total number of tally marks equals the total number of data points. The
tally for our example is shown below.
Cell Boundaries (dB)   Tally                                Total
7.75 - 8.25            ///// ///// ///// ///// ////         24
8.25 - 8.75            ///// ///// ///// ///// ///// ///    28
8.75 - 9.25            ///// ///// ///// ///// ///// /      26
9.25 - 9.75            ///// ///// ///// ////               19
9.75 - 10.25           ///// ///// //                       12
10.25 - 10.75          ///// //                             7
10.75 - 11.25          //                                   2
11.25 - 11.75          //                                   2
Total                                                       120
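The tally can be automated with a binary search over the boundaries. This is a minimal sketch, assuming the boundaries come as one sorted list of edges (n + 1 values for n cells). `tally` and `tally_marks` are hypothetical helpers, and the five sample values are illustrative rather than the full 120-point data set.

```python
from bisect import bisect_right


def tally(values, edges):
    """Count how many values fall in each cell.

    `edges` lists the n + 1 sorted cell boundaries, e.g. [7.75, 8.25, ..., 11.75].
    Value v goes to cell i when edges[i] <= v < edges[i + 1]; with boundaries
    chosen as in Step 5, no value ever sits exactly on an edge.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect_right(edges, v) - 1
        if not 0 <= i < len(counts):
            raise ValueError(f"{v} lies outside the cell boundaries")
        counts[i] += 1
    return counts


def tally_marks(count):
    """Render a count as tally marks in groups of five, e.g. 7 -> '///// //'."""
    groups, rest = divmod(count, 5)
    return " ".join(["/////"] * groups + (["/" * rest] if rest else []))


values = [7.8, 7.9, 8.3, 8.7, 9.0]                # illustrative values only
counts = tally(values, [7.75, 8.25, 8.75, 9.25])
print(counts)                                     # [2, 2, 1]
print(sum(counts) == len(values))                 # True: the Step 6 cross-check
print(tally_marks(24))                            # ///// ///// ///// ///// ////
```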
Step 7: Draw and label the horizontal axis.
Add one extra cell to each end of the horizontal axis to leave a space on either side of the
histogram. Divide the horizontal axis into equal divisions, place numeric labels on the axis, and
provide a caption that describes the measurement (in this case, gain) and the units of measurement
(in this case, dB).
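A minimal sketch of Step 7, assuming the cell edges from Step 5 are available as a list. `horizontal_axis` is a hypothetical helper; it adds one empty cell at each end and builds the axis caption from the measurement name and units.

```python
def horizontal_axis(edges, width, measurement, units, decimals=2):
    """Tick positions and caption for the horizontal axis.

    One empty cell is added at each end so the bars do not touch the
    ends of the axis.
    """
    ticks = [round(edges[0] - width, decimals)]
    ticks += list(edges)
    ticks.append(round(edges[-1] + width, decimals))
    caption = f"{measurement} ({units})"
    return ticks, caption


edges = [7.75, 8.25, 8.75, 9.25, 9.75, 10.25, 10.75, 11.25, 11.75]
ticks, caption = horizontal_axis(edges, 0.5, "Gain", "dB")
print(ticks[0], ticks[-1], caption)  # 7.25 12.25 Gain (dB)
```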
Step 8: Draw and label the vertical axis.
The numeric labels on the axis should range from 0 to a multiple of 5 that is greater than the largest
number of data points in any cell. In our example, the largest cell (8.25 - 8.75 dB) has 28 data points,
so we label the axis from 0 to 30. Provide a caption of "number" or "percent".
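The vertical-axis rule as a minimal sketch. It reads "greater than" literally, so a largest count of exactly 30 would give an axis running to 35. The helper names are hypothetical, and the counts are the tally totals from Step 6.

```python
def vertical_axis_top(counts):
    """Smallest multiple of 5 that is greater than the largest cell count."""
    return (max(counts) // 5 + 1) * 5


def vertical_axis_ticks(counts, step=5):
    """Numeric labels from 0 up to the axis top, every `step` units."""
    return list(range(0, vertical_axis_top(counts) + 1, step))


counts = [24, 28, 26, 19, 12, 7, 2, 2]  # tally totals from Step 6
print(vertical_axis_ticks(counts))     # [0, 5, 10, 15, 20, 25, 30]
```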
Step 9: Draw in the bars to represent the number of data points in each cell.
The height of each bar should equal the number of data points in that cell, as measured on the
vertical axis.
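Plotting software will draw the bars for you, but the idea fits in a few lines of standard-library Python. This sketch prints a rough text histogram: one column of # characters per cell, with height equal to the count. `text_histogram` is a hypothetical helper, and the counts in the usage line are the Step 6 tally totals.

```python
def text_histogram(counts, col_width=3):
    """Return the rows of a vertical text histogram, top row first.

    Each cell becomes a column of '#' characters whose height equals its
    count; the vertical axis is labelled every 5 units.
    """
    rows = []
    for level in range(max(counts), 0, -1):
        bars = "".join(("#" if c >= level else " ").center(col_width) for c in counts)
        label = f"{level:>3}" if level % 5 == 0 else "   "
        rows.append(f"{label} |{bars.rstrip()}")
    rows.append("    +" + "-" * (col_width * len(counts)))  # horizontal axis line
    return rows


print("\n".join(text_histogram([24, 28, 26, 19, 12, 7, 2, 2])))
```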
Step 10: Title the chart, indicate the total number of data points, and show nominal
values and limits (if applicable).
You may also wish to add other notes further describing the subject of the measurements and the
conditions under which they were taken. These notes help others interpret the chart, and they serve
as a record of the source of the data.
Step 11: Identify and classify the pattern of variation.
Refer to the common histogram patterns shown below. The histogram of amplifier gain data is a
truncated distribution.
COMMON HISTOGRAM PATTERNS
Step 12: Develop a plausible and relevant explanation for the pattern.
Your interpretation must be based on your team's knowledge and observation of the specific
situation. Also, remember to confirm your theories through additional data gathering and
observation.