Graphical Representation of Data

Graphs, charts, and diagrams offer effective display and enable easy comprehension of complex, multifaceted relationships. Gnanadesikan and Wilk [1] point out that "man is a geometrical animal and seems to need and want pictures for parsimony and to stimulate insight." Various forms of statistical graphs have been in use for over 200 years. Beniger and Robyn [2] cite the following as first or near-first uses of statistical graphs: Playfair's [3] use of the bar chart to display Scotland's 1781 imports and exports for 17 countries; Fourier's [4] cumulative distribution of population age in Paris in 1817; Lalanne's [5] contour plot of temperature by hour and month; and Perozzo's [6] stereogram display of Sweden's population for the period 1750–1875 by age groupings. Fienberg [7] notes that Lorenz [8] made the first use of the probability–probability (P–P) plot in 1905. Each of these plots is discussed subsequently. Beniger and Robyn [2], Cox [9], and Fienberg [7] provide additional historical accounts and insights on the evolution of graphics.

Today, graphical methods play an important role in all aspects of a statistical investigation – from the initial exploratory plots, through various stages of analysis, to the final communication and display of results. Many persons consider graphical displays the single most effective, robust statistical tool. Not only are graphical procedures helpful; in many cases they are essential. Tukey [10] claims that "the greatest value of a picture is when it forces us to notice what we never expected to see." This is nowhere better exemplified than by Anscombe's data sets [11], where plots of four equal-size data sets (Figure 1) reveal large differences among the sets even though all sets produce the same linear regression summaries. Mahon [12] maintains that statisticians' responsibilities include communication of their findings to decision makers, who frequently are statistically naive, and that the best way to accomplish this is through the power of the picture.

Good graphs should be simple, self-explanatory, and not deceiving. Cox [9] offers the following guidelines:

1. The axes should be clearly labeled with the names of the variables and the units of measurement.
2. Scale breaks should be used for false origins.
3. Comparison of related diagrams should be easy, for example, by using identical scales of measurement and placing diagrams side by side.
4. Scales should be arranged so that systematic and approximately linear relations are plotted at roughly 45° to the x axis.
5. Legends should make diagrams as nearly self-explanatory (i.e., independent of the text) as is feasible.
6. Interpretation should not be prejudiced by the technique of presentation.

Most of the graphs discussed here, which involve spatial relationships, are implicitly or explicitly on Cartesian or rectangular coordinate grids, with axes that meet at right angles. The horizontal axis is the abscissa or x axis and the vertical axis is the ordinate or y axis. Each point on the grid is uniquely specified by an x and a y value, denoted by the ordered pair (x, y). Ordinary graph paper uses linear scales for both axes. Other scales commonly used are logarithmic (see Figure 8) and inverse distribution functions (see Figure 6). Craver [13] includes a discussion of plotting techniques with over 200 graph papers that may be copied without permission of the publisher.
This discussion includes old and new graphical forms that have broad application or are specialized but commonly used. Taxonomies based on the uses of graphs have been addressed by several authors, including Tukey [14], Fienberg [7], and Schmid and Schmid [15]. This discussion includes references to more than 50 different graphical displays (Table 1) and is organized according to the principal functions of graphical techniques:

- Exploration
- Analysis
- Communication and display of results
- Graphical aids

Some graphical displays are used in a variety of ways; however, each display here is discussed only in the context of its widest use.

[Figure 1: Anscombe's [11] plots of four equal-size data sets (a)–(d), all of which yield the same regression summaries. Reprinted with permission from the American Statistical Association, 1973.]

Exploratory Graphs

Exploratory graphs are used to help diagnose characteristics of the data and to suggest appropriate statistical analyses and models. They usually do not require assumptions about the behavior of the data or the system or mechanism that generated the data.

Data Condensation

A listing or tabulation of data can be very difficult to comprehend, even for relatively small data sets. Data condensation techniques, discussed in most elementary statistics texts, include several types of frequency distributions (see, e.g., Freund [16] and Johnson and Leone [17]). These distributions associate the frequency of occurrence with each distinct value or distinct group of values in a data set. Ordinarily, data from a continuous variable will first be grouped into intervals, preferably of equal length, which completely cover the range of the data without overlap. The number or length of these intervals is usually best determined from the size of the data set, with larger sets effectively able to support more intervals. Table 2 presents four commonly used forms: frequency, relative frequency, cumulative frequency, and cumulative relative frequency. Here carbon monoxide emissions (grams per mile) of 794 cars are grouped into intervals of length 24, where the upper limit is included (denoted by the square upper interval bracket) and the lower limit is not (denoted by the lower open parenthesis). Columns 4 and 6 depict relative frequencies, which are scaled versions (divided by 794) of columns 3 and 5, respectively.

The four distributions tabulated in Table 2 are useful data summaries; however, plots of them can help the data analyst develop an even better understanding of the data. A histogram is a bar graph associating frequencies or relative frequencies with data intervals. The histogram for the carbon monoxide data shown in Figure 2 clearly shows a positively skewed (see Skewness), unimodal distribution with modal interval (72–96]. Other forms of histograms use symbols such as dots (dot-array diagram) or asterisks in place of bars, with each symbol representing a designated number of counts. A frequency polygon is similar to a histogram.
Points are plotted at coordinates representing interval midpoints and the associated frequency; consecutive points are connected with straight lines (e.g., in Table 2, plot column 3 vs column 2). The form of this graph is analogous to that of a probability density function.

Table 1  Graphical displays used in the analysis and interpretation of data

Exploratory plots
  Data condensation: histogram; dot-array diagram; stem and leaf diagram; frequency polygon; ogive; box and whisker plot
  Relationships among variables
    Two variables: scatter plot; sequence plot; autocorrelation plot; cross-correlation plot
    Three or more variables: labeled scatter plot; glyphs and metroglyphs; weathervane plot; biplot; face plots; Fourier plot; cluster trees; similarity and preference maps; multidimensional scaling displays

Graphs used in the analysis of data
  Distribution assessment: probability plot; Q–Q plot; P–P plot; hanging histogram; rootogram; Poissonness plot
  Model adequacy and assumption verification: average versus standard deviation; residual plots; partial residual plot; component-plus-residual plot
  Decision making: control chart; CUSUM chart; Youden plot; half-normal plot; Cp plot; ridge trace

Communication and display of results
  Quantitative graphics: bar chart; pictogram; pie chart; contour plot; stereogram; color map
  Summary of statistical analyses: means plots; sliding reference distribution; notched-box plot
  Factor space/response: interaction plot; contour plot; predicted response plot; confidence region plot

Graphical aids: power curves; sample-size curves; confidence limits; nomographs; graph paper; trilinear coordinates

A disadvantage of a grouped-data histogram is that individual data points cannot be identified, since all the data falling in a given interval are indistinguishable. A display that circumvents this difficulty is the stem and leaf diagram, a modified histogram with "stems" corresponding to interval groups and "leaves" corresponding to bars. Tukey [10] gives a thorough discussion of the stem and leaf diagram and its variations.

An ogive is a graph of the cumulative frequencies (or cumulative relative frequencies) against the upper limits of the intervals (e.g., from Table 2, plot column 5 vs the upper limit of each interval in column 1), where straight lines connect consecutive points. An ogive is a grouped-data analog of a graph of the empirical cumulative distribution function and is especially useful in graphically estimating percentiles (see Quantiles), which are data values associated with specified cumulative percents. Figure 3 shows the ogive for the carbon monoxide data and how it is used to obtain the 25th percentile (i.e., the lower quartile).

[Figure 2: Histogram of Environmental Protection Agency surveillance data (1957–1967) on carbon monoxide emissions from 794 cars; grams per mile versus frequency and relative frequency.]

Table 2  Frequency distributions of carbon monoxide data

     (1) Interval   (2) Midpoint  (3) Frequency  (4) Relative  (5) Cumulative  (6) Cumulative
                                                     frequency      frequency      relative frequency
 1.  (0–24]               12            13           0.016            13             0.016
 2.  (24–48](a)           36            98           0.123           111             0.140
 3.  (48–72]              60           161           0.203           272             0.343
 4.  (72–96]              84           189           0.238           461             0.581
 5.  (96–120]            108           148           0.186           609             0.767
 6.  (120–144]           132            85           0.107           694             0.874
 7.  (144–168]           156            45           0.057           739             0.931
 8.  (168–192]           180            30           0.038           769             0.969
 9.  (192–216]           204            10           0.013           779             0.981
10.  (216–240]           228             5           0.006           784             0.987
11.  (240–264]           252             5           0.006           789             0.994
12.  (264–288]           276             1           0.001           790             0.995
13.  (288–312]           300             2           0.003           792             0.997
14.  (312–336]           324             1           0.001           793             0.999
15.  (336–360]           348             1           0.001           794             1.000

(a) The notation designates inclusion of all values greater than 24 and less than or equal to 48.
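To make the construction concrete, here is a minimal Python sketch that computes the four distributions of Table 2 and reads a percentile from the ogive by linear interpolation. The 794 raw measurements are not listed in the article, so a synthetic, lognormal-like sample stands in for them.

```python
import numpy as np

# Minimal sketch of the grouped distributions of Table 2. The 794 raw
# measurements are not listed in the article, so synthetic lognormal-like
# values stand in for them.
rng = np.random.default_rng(0)
emissions = rng.lognormal(mean=np.log(90.0), sigma=0.6, size=794)

edges = np.arange(0, 361, 24)                  # intervals of length 24
freq, _ = np.histogram(emissions, bins=edges)  # column 3
rel_freq = freq / freq.sum()                   # column 4
cum_freq = np.cumsum(freq)                     # column 5
cum_rel = np.cumsum(rel_freq)                  # column 6

# The ogive pairs each upper interval limit with the cumulative relative
# frequency; interpolating between its points estimates percentiles.
q25 = np.interp(0.25, np.concatenate(([0.0], cum_rel)), edges)
print(f"25th percentile (lower quartile) ~ {q25:.1f} grams per mile")
```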
Another display, which highlights five important characteristics of a data set, is the box and whisker or box plot. The box, usually aligned vertically, encloses the interquartile range (see Descriptive Statistics), with the lower line identifying the 25th percentile (lower quartile; see Quartiles) and the upper one the 75th (upper quartile). A line sectioning the box displays the 50th percentile (median) and its relative position within the interquartile range. The whiskers at either end may extend to the extreme values or, for large data sets, to the 10th/90th or 5th/95th percentiles. These plots are especially convenient for comparing two or more data sets, as shown in Figure 4 for winter snowfalls of Buffalo and Rochester, New York. (See Tukey [10] for further discussion, and McGill et al. [18] for some variations.)

[Figure 3: Ogive of the automobile carbon monoxide emissions data shown in Figure 2; grams per mile versus cumulative frequency and cumulative percent, with the 25th percentile read off the curve.]

[Figure 4: Box and whisker plots comparing winter snowfalls of Buffalo and Rochester, New York (1939–1940 to 1977–1978), demonstrating little distributional difference (contrary to popular belief). Based on local climatological data gathered by the National Oceanic and Atmospheric Administration, National Climatic Center, Asheville, NC.]

Relationships between Two Variables

Often of interest is the relationship, if any, between x and y, or the development of a model to predict y given the value of x. Ordinarily, an (x, y) measurement pair is obtained from the same experimental unit, as shown in Table 3.

Table 3

Unit      x                     y
Object    Diameter              Weight
Person    Age                   Height
Product   Raw material purity   Quality

A usual first step is to construct a scatter plot, a collection of plotted points representing the measurement pairs (xi, yi), i = 1, . . . , n. The importance of scatter plots was seen in Figure 1. These four very different sets of data yield the same regression line (drawn on the plots) and associated statistics [11]. Consequently, the numerical results of an analysis, without the benefit of a look at plots of the data, could result in invalid conclusions.

The objective of regression analyses is to develop a mathematical relationship between a measured response or dependent variable, y, and two or more predictor or independent variables, x1, x2, . . . , xp. A usual initial step is to plot y versus each of the x variables individually and to plot each xi versus each of the other x's, as in the sketch below. This results in a total of p + p(p − 1)/2 plots. Plots of y versus xi enable one to identify the xi variables that appear to have large effects, to assess the form of a relationship between the y and xi variables, and to determine whether any unusual data points are present. Plots of xi versus xj, i ≠ j, help to identify strong correlations that may exist among the predictor variables. It is important to recognize such correlations because least-squares regression techniques work best when these correlations are small [19] (see Collinearity; Data Collection).
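A hedged sketch of these p + p(p − 1)/2 pairwise displays arranged as a scatter-plot matrix; the response, predictors, and sample size are invented for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Scatter-plot matrix: the p plots of y versus each x_i on the first row and
# column, plus the p(p - 1)/2 distinct x_i versus x_j panels. Data synthetic.
rng = np.random.default_rng(1)
n = 50
X = rng.normal(size=(n, 3))                       # p = 3 predictors
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=n)

cols = [y, X[:, 0], X[:, 1], X[:, 2]]
names = ["y", "x1", "x2", "x3"]
k = len(cols)
fig, ax = plt.subplots(k, k, figsize=(8, 8))
for i in range(k):
    for j in range(k):
        if i == j:                                # variable name on the diagonal
            ax[i, j].annotate(names[i], (0.4, 0.45), xycoords="axes fraction")
        else:
            ax[i, j].scatter(cols[j], cols[i], s=8)
plt.show()
```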
In many instances data are collected sequentially in time, and a plot of the data versus the sequence of collection can help identify sources of important effects. Figure 5 shows a sequence plot of gasoline mileage of an automobile versus the sequence of gasoline fill-ups. The large seasonal effect (summer mileage is higher than winter mileage) and the increase in gasoline mileage due to a major tune-up are clearly evident.

[Figure 5: Sequence plot of gasoline mileage data (miles per gallon versus date of fill-up, 1969–1972). Note the seasonal variation and the increased average value and decreased variation in mileage after the tune-up.]

Observations collected sequentially in time, such as the gasoline mileage data plotted in Figure 5, form a time series (see Time Series Analysis). Statistical modeling of a time series is largely accomplished by studying the correlation between observations separated by 1, 2, . . . , n − 1 units in time. For example, the lag-1 autocorrelation is the correlation coefficient between observations collected at time i and time i + 1, i = 1, 2, . . . , n − 1; it measures the linear relationship among all pairs of consecutive observations (see Autocorrelation Function). An autocorrelation plot may be used to study the "correlation structure" in the data: the lag-j autocorrelation coefficient, computed between observations i and i + j, is plotted versus the lag j. A cross-correlation plot between two time series is developed similarly. The lag-j correlation coefficient, computed between observations i in one series and observations i + j in the other series, is plotted versus the lag j. Box and Jenkins [20] discuss the construction and interpretation of these plots, in addition to presenting examples on the use of sequence plots and the modeling of time series.
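The autocorrelation plot is straightforward to compute directly from its definition. A minimal sketch, using a synthetic series in place of real data:

```python
import numpy as np
import matplotlib.pyplot as plt

# Lag-j autocorrelation coefficients plotted against lag j, computed directly
# from the definition; a synthetic, weakly dependent series replaces real data.
rng = np.random.default_rng(2)
e = rng.normal(size=200)
y = np.convolve(e, [1.0, 0.8, 0.5], mode="valid")   # moving-average series

n, ybar = len(y), np.mean(y)
denom = np.sum((y - ybar) ** 2)

def acf(j):
    # correlation between observations i and i + j, i = 1, ..., n - j
    return np.sum((y[: n - j] - ybar) * (y[j:] - ybar)) / denom

lags = np.arange(1, 25)
plt.bar(lags, [acf(j) for j in lags], width=0.3)
plt.xlabel("lag j")
plt.ylabel("autocorrelation")
plt.show()
```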
Relationships among More than Two Variables

Scatter plots directly display relationships between two variables. Values of a third variable can be incorporated in a labeled scatter plot, in which each plotted point (whose location designates the values of two variables) is labeled by a symbol designating a level of the third variable. Anderson [21] extended these to "pictorialized" scatter plots, called glyphs and metroglyphs, where each coordinate point is plotted as a circle and has two or more rays emanating from it; the length of each ray is indicative of the value of the variable associated with that ray. Bruntz et al. [22] developed a variation of the glyph for four variables, called the weathervane plot, where values of two of the variables are again indicated by the plotted coordinates and the other two by using variable-sized plotting symbols and variable-length arrows attached to the symbols.

Tukey [10], Gabriel's biplot [23], and Mandel [24] provide innovative methods for displaying two-way tables of data. All three approaches involve fitting a model to the table of data and then constructing various plots of the coefficients in the fitted model to study the relationships among the variables.

Chernoff [25] used facial characteristics to display values of up to 18 variables through face plots. Each face represents a multivariate datum point, and each variable is represented by a different facial characteristic, such as size of eyes or shape of mouth. Experience has shown that the interpretation of these plots can be affected by how the variables are assigned to facial characteristics.

Exploratory plots of raw data are usually less effective with measurements on four or more variables (e.g., height, weight, age, sex, race, etc., of a subject). The usual approach then is to reduce the dimensionality of the data by grouping variables with common properties or identifying and eliminating unimportant variables. The variables plotted may be functions of the original variables as specified by a statistical model. Exploratory analysis of the data is then conducted with plots of the "reduced" data. Andrews' Fourier plots [26], data clustering [27], similarity and preference mapping [28], and geometrical representations of multidimensional scaling analyses [29] are examples of such procedures. One often attempts to determine whether there are two or more groups (i.e., clusters) of observations within the data set. When several different groups are identified, the next step is usually to determine why the groups are different. These methods are sophisticated and require computer programs for implementation on a routine basis. (See Gnanadesikan [30], Everitt [31].)

Graphs Used in the Analysis of Data

The graphical methods discussed next generally depend on assumptions of the analysis. Decisions made from these displays may be either subjective in nature, such as a visual assessment of an underlying distribution, or objective, such as an out-of-control signal from a control chart.

Distribution Assessment and Probability Plots

The probability plot is a widely used graphical procedure for data analysis. Since other graphical techniques discussed in this article require a basic understanding of it, a brief discussion follows (see Probability Plots). A probability plot on linear rectangular coordinates is a collection of two-dimensional points specifying corresponding quantiles from two distributions. Typically, one distribution is empirical and the other is a hypothesized theoretical one. The primary purpose is to determine visually whether the data could have arisen from the given theoretical distribution. If the empirical distribution is similar to the theoretical one, the expected shape of the plot is approximately a straight line; conversely, large departures from linearity suggest different distributions and may indicate how the distributions differ.

Imagine a sample of size n in which the data y1, . . . , yn are independent observations on a random variable Y having some continuous distribution function (df) F (see Probability Density Function (PDF)). The ordered data y(1) ≤ · · · ≤ y(n) represent sample quantiles (see Quantiles) and are plotted against the theoretical quantiles xi = F⁻¹(pi), where F⁻¹ denotes the inverse of F, the hypothesized df of Y. Moreover, F(y) may involve unknown location (ν) and scale (δ) parameters (not necessarily the mean and the standard deviation) as long as F((y − ν)/δ) is completely specified. If Y has df F, then pi = F(xi) = F((y(i) − ν)/δ); xi is called the reduced variate corresponding to y(i) and is a function of the unknown parameters. Selection of the pi for use in plotting has been much discussed. Suggested choices have been i/(n + 1), (i − 1/2)/n, (2i − 1)/2n, and (i − 3/8)/(n + 1/4). Kimball [32] discusses some choices of pi in the context of probability plots.

Now F⁻¹(pi) is not expressible in closed form for most commonly encountered distributions and thus provides an obstacle to easy evaluation of the xi. An equivalent procedure often employed to avoid this difficulty is to plot y(i) against pi on probability paper, which is rectangular graph paper with an F⁻¹ scale for the p axis. The pi entries on the p axis are commonly called plotting positions. Naturally, a different type of probability paper is needed for each family of distributions F. Normal probability paper, with F⁻¹ based on the normal distribution, is most common, although many others have been developed (see, e.g., King [33]). Normal probability paper is available in two versions: arithmetic probability paper, where the data axis has a linear scale, and logarithmic probability paper, where the data axis has a natural logarithmic scale. The latter version is used to check for a lognormal distribution (see Probability Density Functions).
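Probability paper is rarely needed today; the equivalent construction plots the ordered data against F⁻¹(pi) computed numerically. A sketch for the normal case, with pi = i/(n + 1) and synthetic data standing in for the article's set I:

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Normal probability plot computed numerically: ordered data y_(i) against
# the reduced variates x_i = F^(-1)(p_i) with p_i = i/(n + 1). Synthetic
# normal data stand in for the article's set I.
rng = np.random.default_rng(3)
y = np.sort(rng.normal(loc=6.0, scale=1.0, size=15))   # ordered sample

n = len(y)
p = np.arange(1, n + 1) / (n + 1)      # plotting positions
x = stats.norm.ppf(p)                  # standard normal quantiles

plt.plot(x, y, "o")
plt.xlabel("standard normal quantile")
plt.ylabel("ordered data")
plt.show()
# Near-linearity supports the normal model; a line fitted by eye estimates
# the mean (value at the 50th percentile) and the standard deviation
# (difference between the 84th and 50th percentiles).
```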
To illustrate the procedure, two data sets of size n = 15 have been plotted on the normal (arithmetic) probability paper shown in Figure 6, using pi = i/(n + 1). Here the horizontal axis is labeled in percentage; 100pi has been plotted against the ith smallest observation in each set.

[Figure 6: Comparison of two data sets plotted on normal probability paper (data values versus plotting position 100pi). Set I can be adequately approximated by a normal distribution, whereas set II cannot.]

The plotted points of set I appear to cluster around the straight line drawn through them, visually supportive evidence that these data come from a population that can be adequately approximated by a normal distribution. The points for set II, however, bend upward at the higher percents, suggesting that the data come from a distribution with a longer upper tail (i.e., larger upper quantiles) than the normal distribution.

If the set I data are viewed as sufficiently normal, graphical estimates of the mean (µ) and the standard deviation (σ) are easily obtained by noting that the 50th percentile of the normal distribution corresponds to the mean and that the difference between the 84th and 50th percentiles corresponds to one standard deviation. The respective graphical estimates from the line fitted by eye are 5.7 for µ and 7.0 − 5.7 = 1.3 for σ. The conclusions from Figure 6 were expected because the data from set I are, in fact, random normal deviates with µ = 6 and σ = 1. The data in set II are random lognormal deviates with µ = 0.6 and σ = 1. A plot of the set II data on logarithmic probability paper produces a more nearly linear collection of points.

Visual inference, such as determining here whether the collection of points forms a straight line, is fairly easy, but should be used with theoretical understanding to enhance its reliability. A curvilinear pattern of points based on a large sample offers more evidence against the hypothesized distribution than does the same pattern based on a smaller sample. For example, the plot of the set I data exhibits some asymmetric irregularities, which are due to random fluctuations; a similar pattern of irregularity based on a much larger sample, however, would be much less likely under a normal distribution. Daniel [34] and Daniel and Wood [35] give excellent discussions of the behavior of normal probability plots.
The probability plot is a special case of the quantile–quantile (Q–Q) plot [36] (see Probability Plots), which is a quantile–quantile comparison of two distributions, either or both of which may be empirical or theoretical, whereas a probability plot is typically a display of sample data on probability paper (i.e., empirical vs theoretical). Q–Q plots are particularly useful because a straight line will result when comparing the distributions of X and Y whenever one variable can be expressed as a linear function of the other. Q–Q plots are relatively more discriminating in low-density or low-frequency regions (usually the tails) of a distribution than near high-density regions, since in low-density regions quantiles are rapidly changing functions of p. In the plot, this translates into comparatively larger distances between consecutive quantiles in low-density areas. The quantiles in Figure 6 illustrate this, especially the larger empirical ones of set II.

A related plot considered by Wilk and Gnanadesikan [36] is the P–P plot. Here, for varying xi, pi1 = F1(xi) is plotted against pi2 = F2(xi), where Fj, j = 1, 2, denotes a df (empirical or theoretical). If F1 = F2 for all xi, the resulting plot is a straight line with unit slope through the origin. This plot is especially discriminating near high-density regions, since here the probabilities are more rapidly changing functions of xi than in low-density regions. The P–P plot is not as widely used as the Q–Q plot since it does not remain linear if either variable is transformed linearly (e.g., by a location or scale change).

For large data sets, an obvious approach for comparison of data with a probability model is a graph of a fitted theoretical density (parameters estimated from the data), with the appropriate scale adjustment, superimposed on a histogram. Gross differences between the ordinates of the distributions are easily detected. A translation of the differences to a reference line (instead of a reference curve), to facilitate visual discrimination, is easily accomplished by hanging the bars of the histogram from the density curve [37]. Figure 7 illustrates the hanging histogram, where the histogram for the carbon monoxide data (Figure 2) is hung from a lognormal distribution. Slight, systematic variation about the reference line suggests that the data are slightly more skewed to the right in the high-density area than the lognormal distribution.

[Figure 7: Hanging carbon monoxide histogram from the fitted lognormal distribution; grams per mile versus relative frequency about the zero reference line.]

Further improvements in detecting systematic variation may be achieved by rootograms [37]. A hanging rootogram is analogous to a hanging histogram except that the square roots of the ordinate values are graphed. The suspended rootogram is an upside-down graph of the residuals about the baseline of the hanging rootogram.
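A sketch of the hanging histogram under stated assumptions: synthetic lognormal-like data replace the carbon monoxide measurements, and scipy's lognormal fit supplies the reference curve.

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Hanging histogram: bars of the sample histogram are hung from the fitted
# density so that lack of fit shows up as departures from the zero line.
# Synthetic lognormal-like data replace the carbon monoxide measurements.
rng = np.random.default_rng(4)
data = rng.lognormal(mean=np.log(90.0), sigma=0.6, size=794)

edges = np.arange(0, 361, 24)
mids = 0.5 * (edges[:-1] + edges[1:])
freq, _ = np.histogram(data, bins=edges)
rel_freq = freq / len(data)

# Fitted lognormal density converted to the relative-frequency scale of the
# histogram (density times bin width).
shape, loc, scale = stats.lognorm.fit(data, floc=0)
fitted = stats.lognorm.pdf(mids, shape, loc=loc, scale=scale) * 24.0

# Each bar spans [fitted - observed, fitted]; bars missing the zero
# reference line expose systematic discrepancies.
plt.bar(mids, height=rel_freq, width=20.0, bottom=fitted - rel_freq)
plt.axhline(0.0)
plt.xlabel("grams per mile")
plt.show()
```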
Graphical assessments for discrete distributions can also be made by comparing the histogram of the data to the fitted probability density, p(x). However, as with continuous distributions, curvilinear discrimination may be difficult and linearizing procedures are helpful. A general approach is to determine a function of p(x), say r(x) = r(p(x)), which is linearly related to a function of x, say s(x). Then, using sample data, one calculates relative frequencies to estimate p(x), evaluates r(x), and plots r(x) against s(x). The absence of systematic departures from linearity offers some evidence that the data could arise from the density p(x). The slope and intercept will be functions of the parameters and can be used to estimate the parameters graphically. A suitable r(x) may be obtained by simply transforming p(x); for example, taking logarithms of the density of the discrete Pareto distribution, where p(x) ∝ x⁻λ, gives r(x) = log p(x) and s(x) = log x. In other cases, ratios of consecutive probabilities (e.g., p(x + 1)/p(x)) are linear functions of s(x) [38]. Table 4 summarizes these ratios for three commonly encountered discrete distributions.

Table 4  Linearizing ratios, r(x) = intercept + slope × s(x), for three discrete distributions

Binomial:  p(x) = (n choose x) π^x (1 − π)^(n−x), x = 0, . . . , n;
  r(x) = p(x + 1)/p(x);  intercept −π/(1 − π);  slope (n + 1)π/(1 − π);  s(x) = 1/(x + 1)
Poisson:  p(x) = e⁻λ λ^x / x!, x = 0, 1, 2, . . . ;
  r(x) = p(x)/p(x + 1);  intercept 1/λ;  slope 1/λ;  s(x) = x
Pascal:  p(x) = (x − 1 choose k − 1) π^k (1 − π)^(x−k), x = k, k + 1, . . . ;
  r(x) = p(x)/p(x + 1);  intercept 1/(1 − π);  slope −(k − 1)/(1 − π);  s(x) = 1/x

Ord [39] expands on the foregoing ideas by defining a class of discrete distributions for which r(x) = x p(x)/p(x − 1) is a linear function of x (i.e., s(x) = x), thereby keeping the same abscissa scale. Distributions in this class are the binomial, negative binomial, Poisson, logarithmic, and uniform. These graphical tests for discrete distributions may be difficult to interpret because the sample relative frequencies have nonhomogeneous variances. This difficulty may be compounded when using ratios of the relative frequencies as functions of s(x). These procedures are therefore recommended more as exploratory than confirmatory.

Another graphical technique for the Poisson distribution (see Probability Density Functions) is the Poissonness plot [40], similar in spirit to probability plotting. It can also be applied to truncated Poisson data or any one-parameter exponential family of discrete distributions, such as the binomial. For further insights, see Parzen [41].

Model Adequacy and Assumption Verification

Any statistical analysis is based on certain assumptions. Those usually associated with least-squares regression analysis are that experimental errors are independent and have a homogeneous variance and a normal (Gaussian) distribution. It is standard practice to check these assumptions as part of the analysis. These checks, most often done graphically, have the desirable by-product of forcing the analyst to look at the data critically. This can be effectively accomplished by graphical analysis of both the raw data and the residuals from the fitted model. In addition to assumption verification, this evaluation frequently results in the discovery of unusual observations or unsuspected relationships. Most of the plots discussed below are applications of graphical forms previously discussed.

If repeat observations have been obtained for each of k groups representing different situations or conditions being studied, a scatter plot of the group standard deviation, si, versus the group mean, ȳi, i = 1, . . . , k, will appear random and show little correlation when the homogeneous variance assumption is satisfied. Box et al. [42] point out that if these assumptions are not satisfied, this plot can be used to determine a transformed measurement scale on which the assumptions will be more nearly satisfied (Figure 8).

[Figure 8: Toxic agent data [42, Table 7.11]: group standard deviation versus group average on log–log scales. The linear log–log relationship suggests that a power transformation will produce homogeneous variances; the slope of the line indicates the necessary power. Wiley-Interscience.]

The normal distribution assumption can be checked in this situation from a histogram of the residuals, rij = yij − ȳi, between the observations in each group (yij) and the average of the group (ȳi). This histogram will tend to be bell-shaped if the normal distribution and homogeneous variance assumptions are satisfied. Alternatively, especially for small data sets, the rij's may be plotted on normal probability paper. The expected shape is a straight line if these assumptions are appropriate.
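The spread-versus-level diagnostic of Figure 8 can be sketched as follows; if log si is roughly linear in log ȳi with slope b, the power transformation y^(1 − b) is suggested (log y when b ≈ 1). The groups here are synthetic.

```python
import numpy as np
import matplotlib.pyplot as plt

# Group standard deviation versus group mean on log-log scales. If log s is
# roughly linear in log ybar with slope b, the power transformation
# y -> y^(1 - b) (log y when b is near 1) will tend to stabilize the
# variance (Box et al. [42]). Groups here are synthetic.
rng = np.random.default_rng(5)
true_means = np.array([0.2, 0.3, 0.45, 0.6, 0.9])
groups = [rng.normal(m, 0.3 * m, size=10) for m in true_means]  # sd grows with mean

ybar = np.array([g.mean() for g in groups])
s = np.array([g.std(ddof=1) for g in groups])

b = np.polyfit(np.log(ybar), np.log(s), 1)[0]   # slope of log s on log ybar
print(f"slope ~ {b:.2f}; suggested power 1 - b ~ {1 - b:.2f}")

plt.loglog(ybar, s, "o")
plt.xlabel("group mean")
plt.ylabel("group standard deviation")
plt.show()
```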
Replicate observations often are not available. However, the analysis assumptions and the adequacy of the form of the model can still be checked by constructing plots of the residuals, or standardized residuals [43], from the fitted model. The residual associated with observation yi is ri = yi − ŷi, where ŷi is the value of yi predicted by the model fitted to the data. Four types of residual plots are routinely constructed [44]: plots on normal probability paper, residuals (ri) versus predicted values (ŷi), sequence plots of residuals, and residuals (ri) versus predictor variables (xj). Note that although the residuals are not mutually independent, the effect of the correlation structure on the utility of these plots is negligible [45].

The plot of residuals on normal probability paper provides a check on the normal distribution assumption. Substantive deviations from linearity may be due to a nonnormal distribution of experimental errors, the presence of atypical (i.e., outlying) data points, or an inadequate model (Figure 9).

[Figure 9: Normal probability plots of residuals. Plot (a) shows an outlying data point and plot (b) shows a set of residuals not normally distributed.]

The residuals (ri) versus predicted values (ŷi) plot will show a random distribution of points (no trends, shifts, or peculiar points) if all assumptions are satisfied. Any curvilinear relationship indicates that the model is inadequate (Figure 10). Nonhomogeneous variance is indicated if the spread in ri changes with ŷi. When the spread increases linearly with ŷi, a log transformation (i.e., replacing y in the analysis by log y) will often produce a response scale on which the homogeneous variance assumption will be satisfied (Figure 10).

[Figure 10: Residuals versus fitted values. Plot (a) shows increased residual variability with increasing fitted values, suggesting a log transformation, and plot (b) shows a curvilinear relationship, indicating that higher-order terms are needed.]

Plots of the residuals (ri) versus the raw observations (yi) are of little value because they will always show a linear correlation with a value of (1 − R²)^(1/2), where R² is the coefficient of determination (see Coefficient of Determination (R²)) of the fitted model [46].

If all analysis assumptions are satisfied, the sequence plot of the residuals (ri) can be expected to show a random distribution of points and contain no trends, shifts, or atypical points. Any trends or shifts here suggest that one or more variables not included in the model may have changed during the collection of the data (Figure 11). This plot may also show cycles in the residuals and other dependencies, indicating that the assumption of independence of experimental errors is not appropriate. This assumption can also be checked by constructing an autocorrelation plot of the residuals (see the earlier discussion).

[Figure 11: Sequence plots of residuals. Plot (a) shows an abrupt change and plot (b) shows a gradual change due to factors not accounted for by the model.]
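The four routine residual displays can be drawn in one figure. A minimal sketch for a straight-line fit to synthetic data, taking the run order to be the order of the observations:

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# The four routine residual displays for a straight-line fit to synthetic
# data; the run order is taken to be the order of the observations.
rng = np.random.default_rng(6)
x = np.linspace(0.0, 10.0, 60)
y = 2.0 + 0.5 * x + rng.normal(scale=0.4, size=x.size)

slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x
r = y - fitted                                   # residuals r_i = y_i - yhat_i

fig, ax = plt.subplots(2, 2)
stats.probplot(r, dist="norm", plot=ax[0, 0])    # normal probability plot
ax[0, 1].scatter(fitted, r)                      # vs fitted: funnels, curvature
ax[1, 0].plot(r, marker="o")                     # vs run order: trends, shifts
ax[1, 1].scatter(x, r)                           # vs predictor: model form
for a in ax.flat[1:]:
    a.axhline(0.0)
plt.show()
```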
The residuals (ri) versus predictor variables (xj) plot should also show a random distribution of points. Any smooth patterns or trends suggest that the form of the model may not be appropriate (Figure 12).

[Figure 12: Residuals versus a predictor variable; a smooth trend indicates that higher-order terms are needed.]

The scatter plots of residuals discussed above are not independent of each other. Peculiarities and trends observed in one plot usually show up in one or more of the others. Collectively, these plots provide a good check on data behavior and the model construction process.

With a large number of predictor variables it is sometimes hard to see relationships between y and xi in scatter plots. Larsen and McCleary [47] developed the partial residual plot to overcome this problem. Wood [48] and Daniel and Wood [35] refer to these as component-plus-residual plots. A regression model must be fitted to the data before the plot can be constructed. In effect, the relationships of all the other variables are removed, and the plot of the component-plus-residual versus xi displays only the computed relationship between y and xi and the residual variation in the data.

Plots for Decision Making

At various points in a statistical analysis, decisions concerning the effects of variables and differences among groups of data are made. Statisticians and other scientists have developed a variety of statistical techniques that use graphical displays to make these decisions. In some instances (e.g., control charts, Youden plots) these contain both the data, in raw or reduced form, and a measure of their uncertainty. The user, in effect, makes decisions from the plot rather than by calculating a test statistic. In other situations (e.g., half-normal plot, ridge trace) a measure of uncertainty is not available, but the analyst makes decisions concerning the magnitude of an effect or the appropriateness of a model by assessing deviations from the expected or desired appearance.

The control chart (see Control Charts, Overview) is widely used to control industrial production and analytical measurement processes [49]. It is a sequence plot of a measurement or statistic (average, range, etc.) versus time sequence, together with limits to reflect the expected random variation in the plotted points. For example, on a plot of sample averages, limits of ±3 standard deviations are typically shown about the process average to reflect the uncertainty in the averages. The process is considered to be out of control if a plotted average falls outside the limits. This suggests that a process shift has occurred and a search for an assignable cause should be made. The cumulative sum control chart (see Cumulative Sum (CUSUM) Chart) is another popular process control technique, particularly useful in detecting small process shifts [50, 51]. Ott [53] used the control chart concept to develop his analysis-of-means plotting procedure for the interpretation of data that would ordinarily be analyzed by analysis-of-variance techniques.
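A bare-bones sketch of a control chart for sample averages with ±3-standard-error limits; the sigma estimate used below (the average within-subgroup standard deviation) is a simplification of the usual control-chart factors, and the subgroups are simulated.

```python
import numpy as np
import matplotlib.pyplot as plt

# Bare-bones chart for sample averages with limits of +/-3 standard errors
# about the process average. The sigma estimate below (average within-
# subgroup standard deviation) is a simplification of the usual
# control-chart factors; subgroups are simulated.
rng = np.random.default_rng(7)
samples = rng.normal(loc=10.0, scale=1.0, size=(30, 5))   # 30 subgroups of 5
means = samples.mean(axis=1)

center = means.mean()
se = samples.std(ddof=1, axis=1).mean() / np.sqrt(samples.shape[1])
ucl, lcl = center + 3 * se, center - 3 * se

plt.plot(means, marker="o")
for level in (center, ucl, lcl):
    plt.axhline(level)
plt.xlabel("subgroup")
plt.ylabel("sample average")
plt.show()
# An average outside (lcl, ucl) signals an out-of-control process and a
# search for an assignable cause.
```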
Schilling [54, 55] systematized Ott's procedure and extended it beyond cross-classification designs to incomplete block experiments and studies involving random effects. The analysis-of-means procedure enables those familiar with control chart concepts and technology to quickly develop an ability to analyze the results from experimental designs.

The Youden plot [56] was developed to study the ability of laboratories to perform a test procedure. Samples of similar materials, A and B, are sent to a number of laboratories participating in a collaborative test. Each laboratory runs a predetermined number of replicate tests on each sample for a number of different characteristics. A Youden plot is constructed for each measured characteristic. Each point represents a different laboratory, where the average of the replicate results on material A is plotted versus the average of the results on material B (Figure 13). Differences along a 45° line reflect between-lab variation. Differences in the direction perpendicular to the 45° line reflect within-lab and lab-by-material interaction variation. An uncertainty ellipse can be used to identify problem laboratories. Any point outside this ellipse is an indication that the associated laboratory's results are significantly different from those of the laboratories within the ellipse. Mandel and Lashof [52] generalized and extended the construction and interpretation of Youden's plot. Ott [57] showed how this concept can be used to study paired measurements. For example, "before" and "after" measurements are frequently collected to evaluate a process change or a manufacturing stage of an industrial process.

[Figure 13: Youden plot [52]: stress at 300% elongation for material 105–205 versus material 305–405, each point representing a laboratory. American Society for Quality, 1974.]

Daniel [34] developed the half-normal plot (see Half-Normal Plot) to interpret two-level factorial and fractional factorial experiments (see Factorial Experiments). It displays the absolute values of the n − 1 contrasts (i.e., main effects and interactions) from an n-run experiment versus the probability scale of the half-normal distribution (Figure 14). Identification of large effects (positive or negative) is enhanced by plotting the absolute value of each contrast. Alternatively, to preserve the sign of the contrast, the contrast value may be plotted on normal probability paper [42]. With no significant effects, either plot will appear as a straight line. Significant effects are indicated by the associated contrasts falling off the line. Although this plot is usually assessed visually, uncertainty limits and decision guides have been developed by Zahn [58, 59]. The half-normal plot is not restricted to two-level experiments and can be used in the interpretation of any experiment for which the effects can be described by independent 1-degree-of-freedom contrasts.

[Figure 14: Half-normal plot from a 2⁵⁻¹ factorial experiment showing four important effects on a color response; ordered absolute effects versus the half-normal probability scale, with labeled points including purity, catalyst concentration, pH, solvent concentration, and temperature.]

The half-normal plot was significant in marking the beginning of extensive research on probability plotting methods in the early 1960s. During this time the statistical community became convinced of the usefulness and effectiveness of graphical techniques; many of these developments were discussed earlier.
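A half-normal plot reduces to sorting absolute contrasts and pairing them with half-normal quantiles. A sketch with invented contrast values (the article's effect estimates are not given):

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Half-normal plot: ordered absolute contrasts against the half-normal
# quantiles Finv((1 + p)/2) of the standard normal. Inactive contrasts fall
# on a line through the origin; active ones fall off it. The 15 contrast
# values below are invented, with four large effects.
contrasts = np.array([1.2, -0.8, 0.5, 38.0, 27.0, -2.1, 0.9, 21.0,
                      -1.5, 0.3, 33.0, -0.6, 1.8, 0.2, -1.1])
abs_eff = np.sort(np.abs(contrasts))

n = len(abs_eff)
p = (np.arange(1, n + 1) - 0.5) / n        # plotting positions
q = stats.norm.ppf((1.0 + p) / 2.0)        # half-normal quantiles

plt.plot(q, abs_eff, "o")
plt.xlabel("half-normal quantile")
plt.ylabel("|contrast|")
plt.show()
```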
Research in the 1960s and 1970s also focused on regression analysis and the fitting of equations to data when the predictor variables (x's) are correlated. Two displays, the Cp plot and the ridge trace, were developed as graphical aids in these studies.

The Cp plot, suggested by C. L. Mallows and popularized by Gorman and Toman [60] and Daniel and Wood [35], is used to determine which variables should be included in the regression equation. It is constructed by plotting, for each equation considered, Cp versus p (Figure 15), where

Cp = RSSp/s² − (n − 2p),

p is the number of terms in the equation, RSSp is the residual sum of squares for the p-term equation of interest, n is the total number of observations, and s² is the residual mean square obtained when all the variables are included in the equation. If all the important terms are in the equation, Cp ≈ p. The line Cp = p is included on the plot, and one looks for the equations that have points falling near this line. Points above the line indicate that significant terms have not been included. The objective is to find the equation with the smallest number of terms for which Cp ≈ p.

[Figure 15: Cp plot [60]: Cp versus P, where the six variables in the equation are denoted by a, b, c, d, e, and f, and the number of terms in the equation is denoted by P. Reprinted with permission from the American Statistical Association, 1966.]

The ridge trace, developed by Hoerl and Kennard [19] and discussed by Marquardt and Snee [61], identifies which regression coefficients are poorly estimated because of correlations among the predictor variables (x's). This is accomplished by plotting the regression coefficients versus the bias parameter, k, which is used to calculate the ridge regression coefficients, β̂ = (X′X + kI)⁻¹X′y (Figure 16). Coefficients whose sign and magnitude are affected by correlations among the predictor variables will change rapidly as k increases. Hoerl and Kennard recommend as an appropriate value for k the smallest value for which the coefficients "stabilize," or change very little, as k increases. If the coefficients remain nearly constant for k > 0, then k = 0 is suggested; this indicates that the least-squares coefficients should be used.

[Figure 16: Ridge trace (regression coefficients versus ridge bias k) showing instability of regression coefficients 1 and 2.]
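A ridge trace is a direct application of the formula above. A sketch on synthetic, deliberately collinear predictors:

```python
import numpy as np
import matplotlib.pyplot as plt

# Ridge trace: coefficients (X'X + kI)^(-1) X'y plotted against the bias
# parameter k for standardized, deliberately collinear synthetic predictors.
rng = np.random.default_rng(8)
n = 60
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)          # nearly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
X = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize
y = X @ np.array([1.0, 1.0, 0.5]) + rng.normal(scale=0.5, size=n)
y = y - y.mean()

ks = np.linspace(0.0, 0.6, 61)
trace = np.array([np.linalg.solve(X.T @ X + k * np.eye(3), X.T @ y)
                  for k in ks])

plt.plot(ks, trace)                          # one curve per coefficient
plt.xlabel("ridge bias k")
plt.ylabel("regression coefficient")
plt.show()
# Choose the smallest k at which the trace flattens; if the coefficients are
# nearly constant for k > 0, the least-squares fit (k = 0) is adequate.
```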
Communication and Display of Results

Quantitative Graphics

Quantitative graphics encompass general methods of presenting and summarizing numerical information, usually for easy comprehension by the layman. There can be no complete catalog of these graphics, since their form is limited only by one's ingenuity. Beniger and Robyn [2] trace the historical development of quantitative graphics since the seventeenth century and offer several pictorial examples illustrating the improving sophistication of graphs. Schmid and Schmid [15], in a comprehensive handbook, discuss and illustrate many quantitative graphical forms. A few of the more common forms are described briefly below.

A bar chart is similar in appearance to a histogram (Figure 2) but more general in application. It is frequently used to make quantitative comparisons of qualitative variables, as a company might do to compare revenues among departments. Comparisons within and between companies are easily made by superimposing a similar graph of another comparable company and identifying the bars.

A pictogram is similar to the bar chart except that the "bars" consist of objects related to the response tallied. For example, figures of people might be used in a population comparison, where each figure represents a specified number of people [62]. When partial objects are used (e.g., the bottom half of a person), it should be stated whether the height or the volume of the figure is proportional to frequency.

A pie chart is a useful display for comparing attributes on a relative basis, usually a percentage. The angle formed by each slice of the pie is an indication of that attribute's relative worth. Governments use this device to illustrate the number of pennies of a taxpayer's dollar going into each major budget category.

A contour plot may effectively depict a relationship among three variables by one or more contours. Each contour is a locus of values of two variables associated with a constant value of the third. A relief map that shows latitudinal and longitudinal locations of constant altitude by contours or isolines is a familiar example. Contours displaying a response surface, y = f(x1, x2), are discussed subsequently (Figure 20). A stereogram is another form for displaying the spatial relationship of three variables in two dimensions, similar to a draftsman's three-dimensional perspective drawing in which the viewing angle is not perpendicular to any of the object's surfaces (planes). The field of computer graphics has rapidly expanded, offering many new types of graphics; see Tufte [63].

Summary of Statistical Analyses

Many of the displays previously discussed are useful in summarizing and communicating the results of a study. For example, histograms or dot-array diagrams are used to display the distribution of data. Box plots are useful (see Figure 4) for large data sets or when several groups of data are compared. The results of many analyses may be displayed by a means plot of the group means (ȳ) together with a measure of their uncertainty, such as standard error limits (ȳ ± SE), confidence limits (ȳ ± tSE), or least significant interval limits (LSI = ȳ ± LSD/2). The use of the LSI [65] is particularly advantageous because the intervals provide a useful decision tool (Figure 17): any two averages are significantly different at the assigned probability level if and only if their LSIs do not overlap. Intervals based on the standard error and confidence limits for the mean do not have this straightforward interpretation. Similar intervals can be developed for other multiple comparison procedures, such as those developed by Tukey (honest significant difference), Dunnett, or Scheffé.

[Figure 17: Least significant interval plot [64]: male and female rat weight gains by dose of test compound (ppm). Any two means whose intervals do not overlap are significantly different at the 0.05 probability level. International Biometrics Society, 1979.]
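For equal group sizes and a pooled variance estimate, the LSI half-width is LSD/2 with LSD = t·sqrt(2s²/n). A sketch with synthetic groups:

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Means plot with least significant intervals, LSI = ybar +/- LSD/2: two
# means differ at the assigned level exactly when their intervals do not
# overlap. Sketch for k equal-size groups with a pooled variance; synthetic.
rng = np.random.default_rng(9)
groups = [rng.normal(mu, 1.0, size=8) for mu in (10.0, 10.5, 12.0, 12.2)]

k, n = len(groups), len(groups[0])
ybar = np.array([g.mean() for g in groups])
s2 = np.mean([g.var(ddof=1) for g in groups])   # pooled variance estimate
df = k * (n - 1)

t = stats.t.ppf(0.975, df)                      # two-sided, alpha = 0.05
lsd = t * np.sqrt(2.0 * s2 / n)                 # least significant difference
half = lsd / 2.0                                # LSI half-width

plt.errorbar(range(k), ybar, yerr=half, fmt="o", capsize=4)
plt.xticks(range(k), [f"group {i + 1}" for i in range(k)])
plt.ylabel("mean with LSI")
plt.show()
```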
Box et al. [42] use the sliding reference distribution to display and compare means graphically. The distribution width is determined by the standard error of the means (see Standard Error). Any two means not within the bounds of the distribution are judged to be significantly different. They also use this display in the interpretation of factor effects in two-level factorial experiments.

The notched-box plot of McGill et al. [18] is the nonparametric analog of the LSI-type plots discussed above. The median of the sample and confidence limits for the difference between two medians are displayed in this plot; any two medians whose intervals do not overlap are significantly different. McGill et al. [18] discuss other variations and applications of the box plot.

Factorial experimental designs (see Factorial Experiments) are widely used in science and engineering. In many instances the effects of two and three variables are displayed on factor space/response plots with squares and cubes similar to those shown in Figure 18. The numbers in these figures are the mean responses obtained at designated values of the independent variables studied. These figures show the region, or factor space, over which the experiments were conducted, and they aid the investigator in determining how the response changes as one moves around in the experimental region.

[Figure 18: Factor space/response plots showing the results of 2² and 2³ experiments (brittleness (%) of iron bars by mold temperature and injection temperature; metal etching process weight loss by solution temperature, stirring, and injection cycle). The average responses are shown at the corners of the figures [66].]

An interaction plot is used to study the nature and magnitude of the interaction of two variables from a designed experiment by plotting the response (averaged over all replicates) versus the level of one of the independent variables while holding the other independent variable(s) fixed at a given level. In Figure 19 it is seen from the nonparallel lines that the size of the effect of solution temperature on weight loss depends on whether stirring occurred: stirring has a larger effect at 30 °C than at 0 °C. Solution temperature and stirring are said to interact, which is another way of saying that their effects are not additive. Monlezun [67] discusses ways of plotting and interpreting three-factor interactions.

[Figure 19: Metal etching process: weight loss versus solution temperature, with and without stirring. The nonparallelism of the lines suggests an interaction between stirring and solution temperature.]

The objective of many experiments is to develop a prediction model for the response (y) of the system as a function of the experimental (predictor) variables. A typical two-predictor-variable model, based on a second-order Taylor series, is

y = b0 + b1X1 + b2X2 + b12X1X2 + b11X1² + b22X2²    (1)

where the b's are coefficients estimated by regression analysis techniques. One of the best ways to understand all the effects described by this equation is by a contour plot, which gives the loci of X1 and X2 values associated with a fixed value of the response. By constructing contours on a rectangular X1–X2 grid for a series of fixed values of y, one obtains a global picture of the response surface (Figure 20a).
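Equation (1) is easy to visualize directly. A sketch with illustrative coefficient values (not those of any example in the article):

```python
import numpy as np
import matplotlib.pyplot as plt

# Contour plot of the quadratic response surface of equation (1) with
# illustrative coefficient values (not from any example in the article).
b0, b1, b2, b12, b11, b22 = 80.0, 6.0, 4.0, -3.0, -2.0, -1.5

x1, x2 = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
y = b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2 + b11 * x1**2 + b22 * x2**2

cs = plt.contour(x1, x2, y, levels=10)   # each curve: locus of constant y
plt.clabel(cs)
plt.xlabel("X1")
plt.ylabel("X2")
plt.show()
```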
[Figure 20: (a) Contour plot of a quadratic response surface (enzyme concentration versus pH, with contour curves showing constant sensitivity) and (b) an X1–X2 interaction plot [70]. American Association for Clinical Chemistry, 1979.]

The response surface of a mixture system, such as a gasoline or a paint, can be displayed by a contour plot on a triangular grid [68, 69]. Another use of trilinear coordinates is shown in Figure 22.

Predicted response plots help in interpreting interaction terms in regression equations [71]. The interaction between Xi and Xj is studied by fixing the other variables in the equation at some level (e.g., setting Xk = X̄k, k ≠ i, j) and constructing an interaction plot of the values predicted by the equation for different values of Xi and Xj (Figure 20b). Similar plots are also useful in interpreting response surface models for mixture systems [72] and in the analysis of the results of mixture screening experiments [73].

A contour plot is also the basis for a confidence region plot, used to display a plausible region of simultaneously hypothesized values of two or more parameters. For example, Figure 21 [74] shows a contour representing the joint confidence region (see Confidence Intervals) for regression parameters β1 and β2 associated with each of four coal classes. The figure clearly shows that these four groups do not all have the same values of the regression parameters.

[Figure 21: Joint 95% β1–β2 confidence regions for coal classes A, B, C, and D (volatile content coefficient b2 versus coking rate coefficient b1) [74]. American Society for Quality, 1973.]

Confidence region plots on trilinear coordinates [75] can also be used to display and identify variable dependence in two-way contingency tables where one variable is partitioned into three categories (see also Draper et al. [76]). A two-dimensional plot results (Figure 22) from the constraint that the sum of the three probabilities, π1, π2, and π3, is unity. It is also possible to display three-dimensional response surfaces or confidence regions via a series of two-dimensional slices through three-dimensional space.

[Figure 22: Joint 95% confidence region plots on trilinear coordinates for hair color probabilities (blond; brunette or red; black) for brown, hazel, green, and blue eye colors [75]. Reprinted with permission from the American Statistical Association, 1974.]

Graphical Aids

Statisticians, like chemists, physicists, and other scientists, rely on graphical devices to help them do their job more effectively. The following graphs are used as "tools of the trade" in offering a parsimonious representation of complex functional relationships among two or more variables.

Graphs of power functions (see Power) or operating characteristics (e.g., Natrella [77]) are used for the evaluation of the error probabilities of hypothesis tests (see Hypothesis Testing) when expressed as a function of the unknown parameter(s). Test procedures are easily compared by superimposing two power curves on the same set of axes.
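As a concrete instance, the power function of a one-sided one-sample z test has a closed form, so curves for two sample sizes can be superimposed directly:

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Power function of the one-sided one-sample z test of H0: mu = 0 at level
# alpha, plotted against the true mean; curves for two sample sizes are
# superimposed for comparison.
alpha, sigma = 0.05, 1.0
mu = np.linspace(0.0, 1.5, 100)
z_alpha = stats.norm.ppf(1 - alpha)

for n in (10, 25):
    power = stats.norm.sf(z_alpha - mu * np.sqrt(n) / sigma)
    plt.plot(mu, power, label=f"n = {n}")

plt.xlabel("true mean")
plt.ylabel("power")
plt.legend()
plt.show()
```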
Sample-size curves are constructed from a family of operating characteristic curves, each associated with a different sample size. These curves are useful in planning the size of an experiment to control the chances of wrong decisions. Natrella [77] offers a number of these for some common testing situations.

Similar contour graphics using families of curves indexed by sample size are also useful for determining confidence limits (see Confidence Intervals). These are especially convenient when the endpoints are not expressible in closed form, such as those for the correlation coefficient or the success probability in a Bernoulli sample [78].

Nomographs are graphical representations of mathematical relationships, frequently involving more than three variables. Unlike most graphics, they do not offer a picture of the relationship, but only enable the determination of the value of (usually) any one variable from the specification of the others. Levens [79] discusses techniques for straight-line and curved-scale nomograph construction. Other statistical nomographs have appeared in the Journal of Quality Technology.

References

[1] Gnanadesikan, R. & Wilk, M.B. (1969). In Multivariate Analysis, P.R. Krishnaiah, ed., Academic Press, New York, Vol. 2, pp. 593–637.
[2] Beniger, J.R. & Robyn, D.L. (1978). Quantitative graphics in statistics: a brief history, American Statistician 32, 1–11.
[3] Playfair, W. (1786). The Commercial and Political Atlas, London.
[4] Fourier, J.B.J. (1821). Recherches statistiques sur la ville de Paris et le département de la Seine, Vol. 1, pp. 1–70.
[5] Lalanne, L. (1845). Appendix to Cours Complet de Météorologie de L.F. Kaemtz, translated and annotated by C. Martins, Paris.
[6] Perozzo, L. (1880). Della rappresentazione grafica di una collettività di individui nella successione del tempo, Annali di Statistica 12, 1–16.
[7] Fienberg, S.E. (1979). Graphical methods in statistics, American Statistician 33, 165–178.
[8] Lorenz, M.O. (1905). Methods for measuring concentration of wealth, Journal of the American Statistical Association 9, 209–219.
[9] Cox, D.R. (1978). Some remarks on the role of statistics in graphical methods, Applied Statistics 27, 4–9.
[10] Tukey, J.W. (1977). Exploratory Data Analysis, Addison-Wesley, Reading, Mass.
[11] Anscombe, F.J. (1973). Graphs in statistical analysis, American Statistician 27, 17–21.
[12] Mahon, B.H. (1977). Journal of the Royal Statistical Society A 140, 298–307.
[13] Craver, J.S. (1980). Graph Paper from Your Copier, H.P. Books, Tucson, Ariz.
[14] Tukey, J.W. (1972). In Statistical Papers in Honor of George W. Snedecor, T.A. Bancroft, ed., Iowa State University Press, Ames, Iowa.
[15] Schmid, C.F. & Schmid, S.E. (1979). Handbook of Graphic Presentation, 2nd Edition, John Wiley & Sons, New York.
[16] Freund, J.E. (1976). Statistics: A First Course, 2nd Edition, Prentice-Hall, Englewood Cliffs.
[17] Johnson, N.L. & Leone, F.C. (1977). Statistics and Experimental Design in Engineering and the Physical Sciences, 2nd Edition, Vol. 1, John Wiley & Sons, New York.
[18] McGill, R., Tukey, J.W. & Larsen, W.A. (1978). Variations of box plots, American Statistician 32, 12–16.
[19] Hoerl, A.E. & Kennard, R.W. (1970). Ridge regression: biased estimation for nonorthogonal problems, Technometrics 12, 55–67.
[20] Box, G.E.P. & Jenkins, G.M. (1970). Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco.
[21] Anderson, E. (1960). A semigraphical method for the analysis of complex problems, Technometrics 2, 387–391.
[22] Bruntz, S.M., Cleveland, W.S., Kleiner, B. & Warner, J.L. (1974). Proceedings of the Symposium on Atmospheric Diffusion and Air Pollution, American Meteorological Society, pp. 125–128.
Nomographs are graphical representations of mathematical relationships, frequently involving more than three variables. Unlike most graphics, they do not offer a picture of the relationship, but only enable the determination of the value of (usually) any one variable from the specification of the others. Levens [79] discusses techniques for straight-line and curved-scale nomograph construction. Other statistical nomographs have appeared in the Journal of Quality Technology.

References

[1] Gnanadesikan, R. & Wilk, M.B. (1969). in Multivariate Analysis, P.R. Krishnaiah, ed., Academic Press, New York, Vol. 2, pp. 593–637.
[2] Beniger, J.R. & Robyn, D.L. (1978). Quantitative graphics in statistics: a brief history, American Statistician 32, 1–11.
[3] Playfair, W. (1786). The Commercial and Political Atlas, London.
[4] Fourier, J.B.J. (1821). Recherches Statistiques sur la Ville de Paris et le Département de la Seine, Vol. 1, pp. 1–70.
[5] Lalanne, L. (1845). Appendix to Cours Complet de Météorologie de L.F. Kaemtz, translated and annotated by C. Martins, Paris.
[6] Perozzo, L. (1880). Della rappresentazione grafica di una collettività di individui nella successione del tempo, Annali di Statistica 12, 1–16.
[7] Fienberg, S.E. (1979). Graphical methods in statistics, American Statistician 33, 165–178.
[8] Lorenz, M.O. (1905). Methods for measuring concentration of wealth, Journal of the American Statistical Association 9, 209–219.
[9] Cox, D.R. (1978). Some remarks on the role of statistics in graphical methods, Applied Statistics 27, 4–9.
[10] Tukey, J.W. (1977). Exploratory Data Analysis, Addison-Wesley, Reading, Mass.
[11] Anscombe, F.J. (1973). Graphs in statistical analysis, American Statistician 27, 17–21.
[12] Mahon, B.H. (1977). Journal of the Royal Statistical Society, Series A 140, 298–307.
[13] Craver, J.S. (1980). Graph Paper from Your Copier, H.P. Books, Tucson, Ariz.
[14] Tukey, J.W. (1972). in Statistical Papers in Honor of George W. Snedecor, T.A. Bancroft, ed., Iowa State University Press, Ames, Iowa.
[15] Schmid, C.F. & Schmid, S.E. (1979). Handbook of Graphic Presentation, 2nd Edition, John Wiley & Sons, New York.
[16] Freund, J.E. (1976). Statistics: A First Course, 2nd Edition, Prentice-Hall, Englewood Cliffs.
[17] Johnson, N.L. & Leone, F.C. (1977). Statistics and Experimental Design in Engineering and the Physical Sciences, 2nd Edition, John Wiley & Sons, New York, Vol. 1.
[18] McGill, R., Tukey, J.W. & Larsen, W.A. (1978). Variations of box plots, American Statistician 32, 12–16.
[19] Hoerl, A.E. & Kennard, R.W. (1970). Ridge regression: biased estimation for nonorthogonal problems, Technometrics 12, 55–70.
[20] Box, G.E.P. & Jenkins, G.M. (1970). Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco.
[21] Anderson, E. (1960). A semigraphical method for the analysis of complex problems, Technometrics 2, 387–391.
[22] Bruntz, S.M., Cleveland, W.S., Kleiner, B. & Warner, J.L. (1974). in Proceedings of the Symposium on Atmospheric Diffusion and Air Pollution, American Meteorological Society, pp. 125–128.
[23] Gabriel, K.R. (1971). The biplot graphic display of matrices with application to principal component analysis, Biometrika 58, 453–467.
[24] Mandel, J. (1971). A new analysis of variance model for non-additive data, Technometrics 13, 1–18.
[25] Chernoff, H. (1973). Using faces to represent points in k-dimensional space graphically, Journal of the American Statistical Association 68, 361–368.
[26] Andrews, D.F. (1972). Plots of high-dimensional data, Biometrics 28, 125–136.
[27] Hartigan, J.A. (1975). Clustering Algorithms, Wiley-Interscience, New York.
[28] Green, P.E. & Carmone, F.J. (1970). Multidimensional Scaling and Related Techniques in Marketing Analysis, Allyn & Bacon, Boston.
[29] Lingoes, J.C., Roskam, E.E. & Borg, I. (1979). Geometrical Representations of Relational Data – Readings in Multidimensional Scaling, Mathesis Press, Ann Arbor.
[30] Gnanadesikan, R. (1977). Methods for Statistical Data Analysis of Multivariate Observations, John Wiley & Sons, New York.
[31] Everitt, B.S. (1978). Graphical Techniques for Multivariate Data, North-Holland, Amsterdam.
[32] Kimball, B.F. (1960). On the choice of plotting positions on probability paper, Journal of the American Statistical Association 55, 546–560.
[33] King, J.R. (1971). Probability Charts for Decision Making, Industrial Press, New York.
[34] Daniel, C. (1959). Use of half-normal plots in interpreting factorial two-level experiments, Technometrics 1, 311–341.
[35] Daniel, C. & Wood, F.S. (1980). Fitting Equations to Data, 2nd Edition, John Wiley & Sons, New York.
[36] Wilk, M.B. & Gnanadesikan, R. (1968). Probability plotting methods for the analysis of data, Biometrika 55, 1–17.
[37] Wainer, H. (1974). The suspended rootogram and other visual displays: an empirical validation, American Statistician 28, 143–145.
[38] Dubey, S.D. (1966). Graphical tests for discrete distributions, American Statistician 20, 23–24.
[39] Ord, J.K. (1967). Graphical methods for a class of discrete distributions, Journal of the Royal Statistical Society, Series A 130, 232–238.
[40] Hoaglin, D.C. (1980). A Poissonness plot, American Statistician 34, 146–149.
[41] Parzen, E. (1979). Nonparametric statistical data modeling, Journal of the American Statistical Association 74, 105–121.
[42] Box, G.E.P., Hunter, W.G. & Hunter, J.S. (1978). Statistics for Experimenters, Wiley-Interscience, New York.
[43] Draper, N.R. & Behnken, D.W. (1972). Residuals and their variance patterns, Technometrics 14, 101–111.
[44] Draper, N.R. & Smith, H. (1981). Applied Regression Analysis, 2nd Edition, John Wiley & Sons, New York.
[45] Anscombe, F.J. & Tukey, J.W. (1963). The examination and analysis of residuals, Technometrics 5, 141–160.
[46] Jackson, J.E. & Lawton, W.H. (1967). Technometrics 9, 339–341.
[47] Larsen, W. & McCleary, S. (1972). The use of partial residual plots in regression analysis, Technometrics 14, 781–790.
[48] Wood, F.S. (1973). The use of individual effects and residuals in fitting equations to data, Technometrics 15, 677–695.
[49] Grant, E.L. & Leavenworth, R.S. (1980). Statistical Quality Control, 5th Edition, McGraw-Hill, New York.
[50] Barnard, G.A. (1959). Control charts and stochastic processes, Journal of the Royal Statistical Society, Series B 21, 239–271.
[51] Lucas, J.M. (1976). The design and use of V-mask control schemes, Journal of Quality Technology 8, 1–12.
[52] Mandel, J. & Lashof, T.W. (1974). Interpretation and generalization of Youden's two-sample diagram, Journal of Quality Technology 6, 22–36.
[53] Ott, E.R. (1967). Industrial Quality Control 24, 101–109.
[54] Schilling, E.G. (1973). A systematic approach to the analysis of means, Part 1, Journal of Quality Technology 5, 93–108.
[55] Schilling, E.G. (1973). A systematic approach to the analysis of means, Parts 2 and 3, Journal of Quality Technology 5, 147–159.
[56] Youden, W.J. (1959). Graphical diagnosis of interlaboratory test results, Industrial Quality Control 15, 133–137.
[57] Ott, E.R. (1957). Industrial Quality Control 13, 1–4.
[58] Zahn, D.A. (1975). Modifications of and revised critical values for the half-normal plot, Technometrics 17, 189–200.
[59] Zahn, D.A. (1975). An empirical study of the half-normal plot, Technometrics 17, 201–212.
[60] Gorman, J.W. & Toman, R.J. (1966). Selection of variables for fitting equations to data, Technometrics 8, 27–51.
[61] Marquardt, D.W. & Snee, R.D. (1975). Ridge regression in practice, American Statistician 29, 3–20.
[62] Joiner, B.L. (1975). International Statistical Review 43, 339–340.
[63] Tufte, E.R. (1997). Visual Explanations: Images and Quantities, Evidence and Narrative, Graphics Press, Cheshire.
[64] Snee, R.D., Acuff, S.K. & Gibson, J.R. (1979). A useful method for the analysis of growth studies, Biometrics 35, 835–848.
[65] Andrews, H.P., Snee, R.D. & Sarner, M.H. (1980). Graphical display of means, American Statistician 34, 195–199.
[66] Bennett, C.A. & Franklin, N.L. (1954). Statistical Analysis in Chemistry and the Chemical Industry, John Wiley & Sons, New York.
[67] Monlezun, C.J. (1979). Two-dimensional plots for interpreting interactions in the three-factor analysis of variance model, American Statistician 33, 63.
[68] Cornell, J.A. (1981). Experiments with Mixtures, Wiley-Interscience, New York.
[69] Snee, R.D. (1979). Experimenting with mixtures, Chemtech 9, 702–710.
[70] Rautela, G.S., Snee, R.D. & Miller, W.K. (1979). Response-surface co-optimization of reaction conditions in clinical chemical methods, Clinical Chemistry 25, 1954–1964.
[71] Snee, R.D. (1973). Some aspects of nonorthogonal data analysis, Part I: developing prediction equations, Journal of Quality Technology 5, 67–79.
[72] Snee, R.D. (1975). Technometrics 17, 425–430.
[73] Snee, R.D. & Marquardt, D.W. (1976). Screening concepts and designs for experiments with mixtures, Technometrics 18, 19–29.
[74] Snee, R.D. (1973). Journal of Quality Technology 5, 109–122.
[75] Snee, R.D. (1974). Graphical display of two-way contingency tables, American Statistician 28, 9–12.
[76] Draper, N.R., Hunter, W.G. & Tierney, D.E. (1969). Which product is better? Technometrics 11, 309–320.
[77] Natrella, M.G. (1963). Experimental Statistics, National Bureau of Standards Handbook 91, U.S. Government Printing Office, Washington, DC.
[78] Pearson, E.S. & Hartley, H.O. (1970). Biometrika Tables for Statisticians, Vol. 1, Cambridge University Press, Cambridge.
[79] Levens, A.S. (1959). Nomography, John Wiley & Sons, New York.

RONALD D. SNEE AND CHARLES G. PFEIFER

Article originally published in Encyclopedia of Statistical Sciences, 2nd Edition (2005, John Wiley & Sons, Inc.). Minor revisions for this publication by Jeroen de Mast.