BUS210: BUSINESS STATISTICS North Seattle Community College Histogram Pg. 1 of 5

HOW TO CONSTRUCT A HISTOGRAM

With a calculator, graph paper, and a few sharp pencils, you can easily construct histograms for use in your quality improvement team's diagnostic efforts. An even easier approach is to use one of the many graphic software packages that automatically generate histograms from a table of data. In this section, we will show the step-by-step construction of a histogram, using an example to illustrate. At the end of this section, you will find a one-page summary of the construction steps, for easy reference.

Getting Ready

As with all the analytical tools presented in this workbook, a good histogram starts with good data. The data that we need is any set of measurements of a continuous variable such as time, weight, length, volume, temperature, ohms, tensile strength, percent of an impurity, etc. Histograms can also be constructed from counts of discrete items, such as number of defects per unit, number of items in a container, number of tasks completed per hour, etc. Of course, you cannot mix different units of measure on a single histogram.

Since a team often needs to produce stratified histograms later for a thorough analysis, it pays to spend some time thinking about the other information you may need for future stratification. Imagine the factors that might result in different histogram patterns: different machines, times of day, input conditions, etc. (See the chapter on Stratification in this workbook for more information on this topic.)

If you decide to use sample data (for example, because it is impractical to measure every item), use the largest sample size possible, consistent with budget and time constraints. Unless your team has access to statistical expertise for data analysis, we recommend that you use a sample size of at least 30 measurements in constructing a histogram.
If more than one histogram may be constructed, 30 observations are needed for each one. If you have at least eight data points, but fewer than 30, you can use the "box plot" to display and analyze the data.

Sometimes, the data you need already exists in routine inspections, log books, or as the result of a special study. If the data does not exist, your team should develop a method for gathering it. To be successful in constructing and analyzing histograms, your data must be:

• Accurate. The pattern of variability in your histogram is a combination of the variability inherent in the process or product and the variability in the measurement process. In order to see the variability in the process or product, you must keep the variability from the measurement process as small as possible. A double-peaked histogram may result simply from two different ways of measuring the characteristic.

• Complete. Remember that there is a good chance that you will want to stratify the data as you proceed through your analysis. Be sure to record all the relevant information associated with each measurement.

• Representative. Make sure your data represents the normal, or typical, conditions and situations in the process.

Steps in Constructing a Histogram

Step 1: On the table of raw data, determine the high value, the low value, and the range.

The table of amplifier gain data is shown below. The high value is 11.7 dB and the low value is 7.8 dB. The range of the data is simply the difference between the high and low values. In our example, the range is 3.9 dB.
AMPLIFIER GAIN DATA

 8.1   8.2   9.1  11.5   9.3   8.4   7.9   9.9   8.7   8.1
10.4   8.9   8.4   8.0   9.7   9.1   8.5  10.6   9.8  10.1
 8.8  10.1   9.6   7.9   8.7  10.1   9.2   8.6   8.5   9.6
 9.7   7.8   9.9  11.7   9.4   9.2   7.9   9.5  11.1   7.9
 8.5   8.7   8.3   8.7  10.0   9.4   8.2   8.9   8.6   9.5
 7.8   8.1   8.8   8.0   8.7  10.2   7.9   9.8   9.4   8.8
 8.2  10.5   8.9   9.1   8.4   8.1   8.3   8.0   9.8   9.0
 8.0  10.9   7.8   9.0   9.4   9.2   8.3   9.7   9.5   8.9
 9.3   7.8  10.5   9.2   8.8   8.4   9.0   9.1   8.7   8.1
 9.0   8.3   8.5  10.7   8.3   7.8   9.6   8.0   9.3   9.7

Range = high - low = 11.7 - 7.8 = 3.9 dB

Step 2: Decide on the number of cells.

A cell is a subdivision of the range. The table below provides a guide to help determine the number of cells you will need. Since there are 120 data points in our example, we want approximately eight cells in our histogram. Alternatively, you can take the square root of the number of data points to establish an appropriate number of cells. In this example, the result would be eleven. (The square root of 120 is about 10.95; round to the nearest whole number.) As you can see, these methods provide only guidelines; the results may vary, depending on the method used.

Recommended Number of Histogram Cells

Number of Data Points    Recommended Number of Cells
20*-50                   6
51-100                   7
101-200                  8
201-500                  9
501-1000                 10
Over 1000                11-20

* Minimum for a good histogram is 30 points. Fewer points may occur if the original histogram is stratified.

The number of cells in the histogram is an important decision. The process of subdividing the range into cells essentially groups the data and discards some information in order to make the pattern clear. In the extreme, with only one cell we would have discarded so much information that even the pattern is lost. At the other extreme, with only one data value in each cell, we would be no better off than we were with the original table of data. The recommendations above represent a good compromise, based on experience, between these two extremes.
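Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration: it uses the high value, low value, and sample size stated in the text, and the helper names (`cells_from_table`, `cells_from_sqrt`) are made up for this example rather than taken from the workbook.

```python
import math

# Values stated in the text for the amplifier gain example.
HIGH_DB = 11.7
LOW_DB = 7.8
N_POINTS = 120

# Guideline table from Step 2: (fewest points, most points, recommended cells).
CELL_GUIDE = [
    (20, 50, 6),
    (51, 100, 7),
    (101, 200, 8),
    (201, 500, 9),
    (501, 1000, 10),
]


def cells_from_table(n):
    """Look up the recommended number of cells (hypothetical helper)."""
    for fewest, most, cells in CELL_GUIDE:
        if fewest <= n <= most:
            return cells
    if n > 1000:
        return (11, 20)  # the table gives a range, not a single value
    raise ValueError("the guideline table starts at 20 data points")


def cells_from_sqrt(n):
    """Alternative guideline: square root of n, rounded to a whole number."""
    return round(math.sqrt(n))


# Step 1: range = high - low (rounded to the precision of the data).
data_range = round(HIGH_DB - LOW_DB, 1)

print("Range:", data_range, "dB")
print("Cells from table:", cells_from_table(N_POINTS))
print("Cells from square root:", cells_from_sqrt(N_POINTS))
```

The two guidelines give eight and eleven cells for the same data, which matches the point made above that they are starting points rather than rules.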
Step 3: Calculate the approximate cell width.

Each cell will have the same width. The width of a cell is the difference between the high value and the low value that define it. The approximate cell width equals the range divided by the recommended number of cells. In our example:

Approximate cell width = 3.9 / 8 = 0.49 dB

Step 4: Round the cell width to a convenient number.

The usual convention is to make the cell widths 1, 2, or 5 (or 10, 20, 50; 0.1, 0.2, 0.5; etc.). Since these numbers all divide into 10 evenly, they will be easy to mark off on the horizontal axis of the histogram. Sometimes, this rounding will add or subtract a cell from the recommended number. This should not concern us. In our example here, we choose a cell width of 0.5 dB, which gives us eight cells.

Step 5: Construct the cells by listing the cell boundaries.

The boundaries of the first cell should include the lowest value in the data. To avoid the problem of data values falling on the boundary, the cell boundaries can be expressed to one more significant digit than the data. Since the lowest value in our data is 7.8 dB, we will begin our first interval at 7.75 dB and construct eight cells of 0.5 dB width (i.e., Cell 1 goes from 7.75 dB to 8.25 dB; Cell 2 from 8.25 dB to 8.75 dB, etc.).

Step 6: Tally the number of data points in each cell.

Take each individual data point from the table of raw data, determine which cell it falls in, and put a tally mark next to that cell. Continue this process through the entire table of data. As a check of this procedure, make sure that the total number of tally marks equals the total number of data points. The tally for our example is shown below.
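Steps 3 through 6 can also be sketched in Python. The example below is a minimal illustration, not the workbook's own procedure: the function names and the `resolution` parameter (the smallest increment of the measurements, 0.1 dB here) are assumptions, and the tally is run on only the first ten values of the data table to keep the example short.

```python
import math
from bisect import bisect_right


def round_to_1_2_5(width):
    """Step 4: round a positive width to the nearest 1, 2, or 5 times a power of ten."""
    exponent = math.floor(math.log10(width))
    candidates = [m * 10.0 ** e for e in (exponent, exponent + 1) for m in (1, 2, 5)]
    return min(candidates, key=lambda c: abs(c - width))


def cell_boundaries(low, high, width, resolution):
    """Step 5: start half a measurement increment below the low value,
    so that no data point can fall exactly on a boundary."""
    start = round(low - resolution / 2, 10)
    n_cells = math.ceil(round((high - start) / width, 10))
    return [round(start + i * width, 10) for i in range(n_cells + 1)]


def tally(values, boundaries):
    """Step 6: count how many values fall in each cell."""
    counts = [0] * (len(boundaries) - 1)
    for x in values:
        i = bisect_right(boundaries, x) - 1
        if not 0 <= i < len(counts):
            raise ValueError(f"{x} lies outside the cells")
        counts[i] += 1
    return counts


approx_width = 3.9 / 8                      # Step 3: range / recommended cells
width = round_to_1_2_5(approx_width)        # Step 4: convenient width
bounds = cell_boundaries(7.8, 11.7, width, resolution=0.1)

# First ten values of the amplifier gain table, used only for illustration.
sample = [8.1, 8.2, 9.1, 11.5, 9.3, 8.4, 7.9, 9.9, 8.7, 8.1]
counts = tally(sample, bounds)

# Check from Step 6: the tally total must equal the number of data points.
assert sum(counts) == len(sample)
print("Cell width:", width)
print("Boundaries:", bounds)
print("Counts:", counts)
```

Starting the first cell at 7.75 rather than 7.8 is what keeps every one-decimal measurement strictly inside a cell, so the tally never has to decide which side of a boundary a point belongs on.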
Cell Boundaries    Tally                            Total
 7.75 -  8.25      //// //// //// //// ////           24
 8.25 -  8.75      //// //// //// //// //// ///       28
 8.75 -  9.25      //// //// //// //// //// /         26
 9.25 -  9.75      //// //// //// ////                19
 9.75 - 10.25      //// //// //                       12
10.25 - 10.75      //// //                             7
10.75 - 11.25      //                                  2
11.25 - 11.75      //                                  2
Total                                                120

Step 7: Draw and label the horizontal axis.

Add one extra cell to each end of the horizontal axis, to leave a space on either end of the histogram. Divide the horizontal axis into equal divisions, place numeric labels on the axis, and provide a caption to describe the measurement (in this case, gain) and the units of measurement (in this case, dB).

Step 8: Draw and label the vertical axis.

The numeric labels on the axis should range from 0 to a multiple of 5 that is greater than the largest number of data points in any cell. In our example, the largest cell (8.25 - 8.75 dB) has 28 data points, so we label the axis from 0 to 30. Provide a caption of "number" or "percent".

Step 9: Draw in the bars to represent the number of data points in each cell.

The height of each bar should be equal to the number of data points in that cell, as measured on the vertical axis.

Step 10: Title the chart, indicate the total number of data points, and show nominal values and limits (if applicable).

You may also wish to add other notes describing further the subject of the measurements and the conditions under which they were taken. These notes help others interpret the chart, and they serve as a record of the source of the data.

Step 11: Identify and classify the pattern of variation.

Refer to the common histogram patterns shown below. The histogram of amplifier gain data is a truncated distribution.

[Figure: Common Histogram Patterns]

Step 12: Develop a plausible and relevant explanation for the pattern.
Your interpretation must be based on your team's knowledge and observation of the specific situation. Also, remember to confirm your theories through additional data gathering and observation.
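As a rough sketch of Steps 7 through 10, the Python example below draws a text version of the histogram from the cell totals in the tally table. It is an assumption-laden illustration: the bars run horizontally instead of vertically, the function names are invented for this example, and a plotting package could be substituted for the text output.

```python
# Cell boundaries and totals copied from the tally table in Step 6.
CELLS = [
    ("7.75 - 8.25", 24),
    ("8.25 - 8.75", 28),
    ("8.75 - 9.25", 26),
    ("9.25 - 9.75", 19),
    ("9.75 - 10.25", 12),
    ("10.25 - 10.75", 7),
    ("10.75 - 11.25", 2),
    ("11.25 - 11.75", 2),
]


def vertical_axis_max(counts):
    """Step 8: smallest multiple of 5 that is greater than the largest count."""
    return (max(counts) // 5 + 1) * 5


def render_histogram(cells, title, unit):
    """Steps 7, 9, and 10: label each cell, draw its bar, and title the chart."""
    counts = [count for _, count in cells]
    lines = [f"{title} (n = {sum(counts)})"]
    for label, count in cells:
        lines.append(f"{label:>13} {unit} | {'#' * count} {count}")
    lines.append(
        f"Count scale: 0 to {vertical_axis_max(counts)} (number of data points)"
    )
    return lines


chart = render_histogram(CELLS, "Amplifier gain", "dB")
print("\n".join(chart))
```

Printing the total in the title doubles as the Step 6 check, since it should equal the number of data points (120 in this example).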