MULTIDIMENSIONAL SCALING: AN INTRODUCTION
Workshop in Methods, Indiana University
December 7, 2012

William G. Jacoby
Department of Political Science, Michigan State University
Inter-university Consortium for Political and Social Research, University of Michigan

I. Basic Objectives of Multidimensional Scaling (MDS)
   A. MDS produces a geometric model of proximities data.
      1. Start with data on similarities (or dissimilarities) among a set of stimulus objects.
      2. MDS represents each stimulus as a point within a space.
      3. Similarities are represented by interpoint distances: greater similarity between two stimuli is shown by a smaller distance between the two points representing those stimuli.
   B. Hopefully, the point configuration produced by MDS will make sense in substantive terms.
      1. Clusters of points may correspond to groups of stimuli that are distinct from each other in terms of their characteristics.
      2. Directions within the space may correspond to properties and characteristics that differentiate among stimuli.
   C. A simple (but real) example, using 2004 electorate perceptions of prominent political figures.
      1. Input data are contained within a square, symmetric matrix of perceptual dissimilarities.
      2. The dissimilarities data are represented almost perfectly by a two-dimensional point configuration.
      3. The geometric model can be interpreted very easily, even without any information about the MDS procedure that produced the "map."
   D. In general, MDS is a very useful and flexible strategy for discerning structure within data.

II. Utility of MDS for Social Research
   A. Dimension reduction
      1. Each of k stimulus objects has a "profile" consisting of its values across q variables (note that q can be a very large number).
      2. In principle, we could use the variables that make up the profiles as coordinate axes, to plot k points (representing the stimuli) in q-dimensional space.
      3. Unfortunately, we encounter the "curse of dimensionality" if q is larger than two (or maybe three).
      4. Instead, use MDS to determine whether the information contained within the q dimensions can be summarized adequately in a much lower-dimensioned, m-dimensional space.
      5. If m is a sufficiently small positive integer, such as two or three, we can draw a picture which plots the stimulus points within the m-dimensional space.
      6. Hopefully, the relative positions of the stimulus points will "make sense" in substantive terms, and summarize the important ways that the stimuli differ from each other.
   B. In survey research contexts, MDS is very useful for modeling respondent perceptions.
      1. Survey questions usually ask respondents to:
         a. Provide affective responses to stimuli (e.g., issue positions);
         b. Make preferential choices among stimulus objects (e.g., presidential candidates);
         c. Evaluate stimuli according to specified criteria (e.g., place themselves, political parties, and/or candidates along a liberal-conservative continuum).
      2. Interpretation of survey responses usually presupposes that respondents' beliefs about the stimuli conform to the researcher's prior expectations. This is not always the case:
         a. Respondents may not perceive stimuli (e.g., issue alternatives or candidates for public office) in the same manner.
         b. Respondents may not actually use the evaluative criteria that are of interest to the researcher (e.g., they may not think about politics in ideological terms).
      3. MDS provides empirical evidence about respondents' perceptual structures and the evaluative criteria they actually employ when thinking about the stimuli in question.
   C. Theory testing
      1. Many social scientific theories can be recast in terms of spatial models.
      2. Could apply MDS to appropriate data in order to estimate the parameters of the relevant spatial model, and determine whether the empirical results conform to the theory-based predictions.
   D. Many different variations of MDS, most of which are relevant for survey research.
      1. Classical MDS (sometimes shown as "CMDS") assumes one set of input data and produces one geometric representation (i.e., it assumes homogeneity across all respondents).
      2. Weighted MDS (sometimes shown as "WMDS") allows for individual differences in perceptions of a common set of stimuli (i.e., it allows dimensions to have greater or lesser "importance" for different subsets of respondents).
      3. Unfolding models (sometimes called "ideal point models") can represent respondents' preferential choices among the stimuli (i.e., respondents are shown as a second set of points in the same space; greater preference for a stimulus corresponds to a smaller distance between the points representing that respondent and that stimulus).
   E. Not too demanding of the input data.
      1. There are many different measures of dissimilarity that can be employed as input data.
      2. Nonmetric MDS only requires ordinal-level input data (though it still produces metric, or interval-level, output).
   F. A very useful measurement tool
      1. Can produce interval-level measurement of respondent characteristics and evaluative criteria, using only ordinal-level response data.
      2. Again, this is important because it enables the researcher to investigate empirically (rather than merely assume) the judgmental standards that respondents bring to bear on the stimuli.
   G. Main results of MDS are graphical in nature and, therefore, usually quite easy to interpret.
      1. Researchers can often discern structure that would otherwise remain hidden in complex data.
      2. The graphical output from MDS can be used very easily to convey analytic results to lay audiences and clients.

III. Metric Multidimensional Scaling
   A. A very simple example, using mileage distances between cities. We will begin by carrying out a familiar task:
      1. Start with a map, which illustrates the relative geographic locations of a set of American cities.
      2. The map is a geometric model in which cities are represented as points in two-dimensional space. The distances between the points are proportional to the geographic proximities of the cities.
      3. Using the map/model, it is easy to construct a square matrix containing the distance between any pair of cities.
      4. The matrix, itself, is analogous to the mileage chart that is often included with road maps.
   B. MDS "reverses" the preceding task (see the R sketch below).
      1. MDS uses the matrix of distances (i.e., the "mileage chart") as input data.
      2. The output from MDS consists of two parts:
         a. A model showing the cities as points in space, with the distances between the points proportional to the entries in the input data matrix (i.e., a map).
         b. A goodness-of-fit measure showing how closely the geometric point configuration corresponds to the data values from the input data matrix.
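   For concreteness, a minimal R illustration of this "reversal," using the eurodist road-distance matrix that ships with R as a stand-in for the American-cities mileage chart (the workshop's own city data are not reproduced here); cmdscale() is base R's metric MDS routine:

      # Metric MDS of a "mileage chart": recover the map from a matrix of distances.
      # eurodist (built into R) stands in here for the American-cities mileage data.
      mds <- cmdscale(eurodist, k = 2, eig = TRUE)

      # Plot the recovered point configuration; reflecting the second axis
      # usually makes the layout match the conventional map orientation.
      plot(mds$points[, 1], -mds$points[, 2], type = "n", asp = 1,
           xlab = "Dimension 1", ylab = "Dimension 2")
      text(mds$points[, 1], -mds$points[, 2], labels = rownames(mds$points), cex = 0.7)

      # Goodness of fit: proportion of the variation reproduced by two dimensions.
      mds$GOF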
   C. Basic definitions and nomenclature
      1. While we typically say that MDS models proximities, we usually assume that the input data are dissimilarities.
         a. Dissimilarities are the "opposite" of proximities, in that larger data values indicate that two objects are less proximal to each other.
         b. While admittedly a bit confusing, the use of dissimilarities guarantees that data values are directly (rather than inversely) related to the distances in the MDS solution.
         c. Proximities can always be converted to dissimilarities by subtracting them from an arbitrary constant, and vice versa.
      2. Information about the proximities between all possible pairs of k stimulus objects is contained in a square, symmetric matrix of order k. This matrix is called $\Delta$, with cell entry $\delta_{ij}$ giving the dissimilarity between the ith and jth stimuli.
      3. The number of dimensions in the MDS solution is shown as m.
         a. In principle, m can be any integer from 1 to k − 1. Larger values of m are possible, but unnecessary.
         b. Ideally, m is a very small integer (i.e., 1, 2, or 3). This is useful because the MDS results can then be presented very easily in pictorial form.
      4. The first part of the MDS output (i.e., the "map" of the stimulus points) consists of the k by m matrix, X. Each cell entry in X, shown as $x_{ip}$, gives the coordinate of the point representing stimulus i along the pth dimension within the m-dimensional space produced by the MDS solution.
      5. In the simplest form of metric MDS, the dissimilarities between the stimuli are assumed to be equal to the interpoint distances in m-dimensional space (the distance between the points representing stimuli i and j is shown as $d_{ij}$):

            $\delta_{ij} = d_{ij} = \left[ \sum_{p=1}^{m} (x_{ip} - x_{jp})^2 \right]^{0.5}$

   D. Restating the objective of the MDS: Find X, using only the information in $\Delta$.
   E. Procedure for metric MDS (developed by Warren S. Torgerson); an R sketch of these steps follows this section.
      1. "Double-center" the $\Delta$ matrix, producing a new matrix, $\Delta^*$, with cell entries $\delta^*_{ij}$.
         a. Double-centering is a simple transformation which changes $\Delta$ so that the row means, the column means, and the grand mean of the entries in the new matrix, $\Delta^*$, are all equal to zero.
         b. The formula for double-centering is simple, but not particularly informative. For each cell in the $\Delta$ matrix, create the corresponding entry in $\Delta^*$ by carrying out the following operation (where $\delta^2_{i.}$ and $\delta^2_{.j}$ are the means of the squared dissimilarities in row i and column j, and $\delta^2_{..}$ is the grand mean of the squared dissimilarities):

               $\delta^*_{ij} = -0.5\,(\delta^2_{ij} - \delta^2_{i.} - \delta^2_{.j} + \delta^2_{..})$

      2. Perform an eigendecomposition on $\Delta^*$, as follows, where V is the matrix of eigenvectors and $\Lambda^2$ is the diagonal matrix of eigenvalues:

            $\Delta^* = V \Lambda^2 V'$

      3. Define X as follows, using only the first m eigenvectors and eigenvalues:

            $X = V \Lambda$

      4. Plot k points representing the rows of X within an m-dimensional coordinate system defined by the columns of X. This is the MDS "map."
      5. Defining a goodness-of-fit measure.
         a. The eigendecomposition is variance-maximizing. That is, each successive dimension (i.e., eigenvector) "explains" the maximum amount of variance remaining in the data, after taking any previous dimensions into account.
         b. The eigenvalues measure the variance explained by each dimension, and the sum of the eigenvalues is equal to the variance of the entries in $\Delta^*$.
         c. The proportion of variance accounted for by the m dimensions in the MDS solution is given by the sum of the first m eigenvalues, divided by the sum of all of the eigenvalues (there will be at most k − 1 nonzero eigenvalues):

               Metric MDS Fit $= \dfrac{\sum_{p=1}^{m} \lambda^2_p}{\sum_{p=1}^{k} \lambda^2_p}$
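   The Torgerson steps can be written out directly in R; a minimal sketch, assuming D is a symmetric dissimilarity matrix with a zero diagonal (it should match cmdscale() up to reflection of the axes):

      # Metric (Torgerson) MDS from scratch: double-center, eigendecompose, rescale.
      torgerson_mds <- function(D, m = 2) {
        D2 <- as.matrix(D)^2                          # squared dissimilarities
        n  <- nrow(D2)
        J  <- diag(n) - matrix(1 / n, n, n)           # centering matrix
        Dstar <- -0.5 * J %*% D2 %*% J                # double-centered matrix, Delta*
        e <- eigen(Dstar, symmetric = TRUE)           # Delta* = V Lambda^2 V'
        X <- e$vectors[, 1:m] %*% diag(sqrt(e$values[1:m]), nrow = m)   # X = V Lambda
        fit <- sum(e$values[1:m]) / sum(pmax(e$values, 0))   # ignore any negative eigenvalues
        list(points = X, fit = fit, eigenvalues = e$values)
      }

      # Example: torgerson_mds(as.matrix(eurodist))$fit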
   F. Important analogy: Metric MDS is accomplished by performing a principal components analysis on the double-centered dissimilarities matrix.
      1. Using the results provided above, we can present the metric MDS solution as follows:

            $\Delta^* = X X'$

      2. The preceding equation shows that we can think of metric MDS as a matrix operation that is analogous to taking the square root of a scalar number. (Note, however, that this operation is not taking the square root of the $\Delta^*$ matrix; that is a completely different matrix operation.)

IV. A Critically Important Idea: Generalizing the Applicability
   A. If MDS works with physical distances, it should also work with "conceptual distances."
   B. A new example.
      1. Use the same set of ten American cities as stimuli.
      2. Define dissimilarities in terms of social, economic, and cultural differences among the cities.
         a. The Places Rated Almanac evaluates cities on a variety of criteria.
         b. For each pair of cities, take the sum of squared differences of their scores across the criteria. Optionally, take the square root.
         c. The result is called a "profile dissimilarity" measure.
   C. Simply perform metric MDS on the matrix of profile dissimilarities (see the R sketch following this section).
   D. Potential problem: Dimensionality is not known in advance.
      1. General idea: Each additional dimension in the MDS solution explains more variance in the dissimilarities data. Use only as many dimensions as are necessary to obtain a satisfactory amount of explained variance.
      2. Proceed by examining eigenvalues. The basic idea is that "meaningful" dimensions should account for variance in the dissimilarities data. Therefore, the dimensionality of the solution should be equal to the number of eigenvectors that have "large" eigenvalues.
      3. A graphical approach is often used to evaluate dimensionality.
         a. A scree plot graphs eigenvalues against the corresponding dimension number. Adjacent points in the scree plot are connected with line segments.
         b. Look for an "elbow" in the scree plot. The dimensionality corresponds to the number of dimensions that falls just prior to the elbow.
      4. There is often a trade-off between low dimensionality (which enhances visualization of the solution) and explained variance (which yields a model that more accurately reproduces the entries in the dissimilarities data).
   E. MDS of conceptual dissimilarities can often be used to discern substantively interesting patterns and structures in the point configuration. This is particularly the case when the dimensionality of the MDS solution is small enough to facilitate a graphical representation of the stimuli.
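   A sketch of this workflow in R, using a hypothetical city-by-criteria ratings matrix in place of the actual Places Rated Almanac scores: build the profile dissimilarities, scale them, and inspect a scree plot of the eigenvalues to judge dimensionality:

      # Hypothetical city-by-criteria ratings (rows = cities, columns = criteria),
      # standing in for the Places Rated Almanac scores discussed above.
      set.seed(1)
      ratings <- matrix(rnorm(10 * 6), nrow = 10,
                        dimnames = list(paste0("City", 1:10), paste0("Criterion", 1:6)))

      # Profile dissimilarities: square root of the sum of squared differences
      # across the (standardized) criteria, i.e., Euclidean distances between profiles.
      delta <- dist(scale(ratings))

      # Metric MDS, retaining the eigenvalues for a scree plot.
      fit <- cmdscale(delta, k = 5, eig = TRUE)

      # Scree plot: look for the "elbow" when choosing the dimensionality.
      plot(fit$eig[1:5], type = "b", xlab = "Dimension", ylab = "Eigenvalue")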
V. Measurement Level: A Potential Problem
   A. Conceptual dissimilarities (e.g., survey respondents' judgments about the similarities of stimuli) are often measured at the ordinal level.
   B. In fact, this is often not a problem at all. Note that it is always the analyst who determines the measurement level of the data.
   C. Could perform MDS on ordinal data, simply "pretending" those data are measured at the interval level.
      1. Going back to the intercity distances, we can use the rank order of the distances between cities, rather than the actual mileages.
      2. Metric MDS of the rank-ordered distances produces a solution that is virtually identical to that obtained from the mileage data.
   D. The preceding approach is problematic (even though it often works quite well).
      1. It really is "cheating" with respect to the characteristics of the data.
      2. Treating the rank-ordered data matrix as interval-level involves an implicit assumption that the increments in dissimilarity between successively ranked stimulus pairs are uniform.
   E. Much more satisfactory solution: Develop an MDS method that uses only the ordinal properties of the dissimilarities data (and still produces a map of the stimuli with interval-level measurement of the distances between points).

VI. Nonmetric Multidimensional Scaling: The Basic Idea
   A. General objective of nonmetric MDS
      1. Find a configuration of stimulus points in m dimensions such that the interpoint distances are monotonically related to the dissimilarities.
      2. In other words, as the dissimilarities between pairs of stimuli increase, the distances between the respective point pairs representing the stimuli never decrease. Beyond this simple criterion, there is no limitation on the relationship between dissimilarities within the input data and distances in the MDS solution.
      3. The monotonic relationship between dissimilarities and scaled distances can be shown formally, as follows (i, j, q, and r are stimulus objects and, as before, $\delta$ represents a pairwise dissimilarity while d represents the scaled distance between two points):

            If $\delta_{ij} < \delta_{qr}$ then $d_{ij} \leq d_{qr}$

   B. Nonmetric MDS requires a new procedure for obtaining the solution.
      1. Recall that metric MDS employed a variance-maximizing procedure which added dimensions, if necessary, to account for a satisfactory amount of variance in the dissimilarities data.
      2. The concept of "variance" is undefined for ordinal data, so a variance-maximizing solution strategy cannot be used in nonmetric MDS.
      3. Instead, specify the dimensionality and obtain a complete MDS solution in a space with the hypothesized number of dimensions. If the scaled configuration of points is "sufficiently monotonic" with respect to the dissimilarities, the analysis is complete. If there are too many violations of the monotonicity requirement (that is, $\delta_{ij} < \delta_{qr}$ but $d_{ij} > d_{qr}$), then increase the dimensionality and try again.
      4. To obtain a nonmetric solution, tentatively locate points in a space of specified dimensionality, then move them around until the interpoint distances correspond closely enough to the dissimilarity information in the data.
      5. If the data conform to the assumptions of the MDS model, the final point locations will be tightly "constrained"; that is, they cannot be moved very much (relative to each other) without violating the monotonicity requirement for the dissimilarity-distance relationship.

VII. Nonmetric Multidimensional Scaling: Intuitive (and Informal) Example
   A. Example uses artificial data on perceptual dissimilarities among four presidential candidates.
      1. Data could be obtained from a single respondent or, more likely, aggregated (e.g., averaged) across a larger sample of respondents.
      2. Data can be arranged within a square symmetric matrix, as before. Cell entries rank-order the dissimilarities. With four candidates, there are six dissimilarities; hence, the values in the matrix cells range from one to six.
      3. For this example, the data will be presented differently: candidate pairs will be arrayed in order, from the least dissimilar (or most similar) pair to the most dissimilar (or least similar) pair.
      4. Caveat: This example is used for instructional purposes only. Nonmetric MDS should never be carried out with only four stimuli!
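   Before working through the example by hand, a small R helper (a sketch; the function name is arbitrary) that counts violations of the monotonicity requirement from section VI for any candidate configuration:

      # Count violations of the monotonicity requirement: a larger dissimilarity
      # should never be paired with a strictly smaller scaled distance.
      count_violations <- function(delta, X) {
        d     <- as.matrix(dist(X))         # scaled interpoint distances
        delta <- as.matrix(delta)
        idx   <- which(upper.tri(delta), arr.ind = TRUE)
        viol  <- 0
        for (a in seq_len(nrow(idx))) {
          for (b in seq_len(nrow(idx))) {
            i <- idx[a, 1]; j <- idx[a, 2]
            q <- idx[b, 1]; r <- idx[b, 2]
            if (delta[i, j] < delta[q, r] && d[i, j] > d[q, r]) viol <- viol + 1
          }
        }
        viol      # zero for a perfectly monotonic configuration
      }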
   B. Start by attempting a unidimensional scaling solution.
      1. Arrange the candidate points along a number line such that points representing more dissimilar pairs of candidates are located farther apart than points representing less dissimilar candidate pairs.
      2. The unidimensional MDS solution fails. It is impossible to locate the points along a single dimension in a way that reflects the dissimilarities among the candidates.
   C. Attempting a two-dimensional scaling solution.
      1. Begin by locating the candidate points randomly within two-dimensional space.
      2. This random configuration will almost certainly not produce distances that are monotonic to the dissimilarities.
      3. The random point arrangement is merely a "starting configuration" for the two-dimensional MDS solution. The scaling procedure will move the points around within the space until the distances are monotonic to the dissimilarities.
   D. In order to guide the point movements, we will generate "target distances" for each pair of stimulus points.
      1. Target distances are usually called "disparities" in the MDS literature. The target distance for the pair of points representing stimuli i and j is shown as $\hat{d}_{ij}$.
      2. Target distances are compared against the current, scaled, distances between the points representing i and j. If $d_{ij} < \hat{d}_{ij}$ then the points need to be moved farther apart from each other. If $d_{ij} > \hat{d}_{ij}$ then the points need to be moved closer together.
   E. Disparities possess two important characteristics:
      1. Disparities are as close as possible to the actual distances in the current MDS solution.
      2. Disparities are always monotonic to the input dissimilarities, even if the current distances are not.
   F. Calculating disparities for the current MDS point configuration.
      1. Sort the interpoint distances according to the dissimilarities between the corresponding candidate pairs.
      2. As we move downward through the sorted array (from least to most dissimilar pairs), the interpoint distances should never decrease. Anytime they do, it is a violation of the monotonicity requirement.
      3. Rules for obtaining disparities:
         a. Wherever possible, simply use the actual distance as the disparity.
         b. When monotonicity is violated, take the mean of the adjacent distances until monotonicity is re-established.
         c. When moving through the array of distances, the rule is "compare forward and average backward."
         d. This procedure is called "Kruskal's monotone regression." The resultant disparities are the values that come as close as possible (in the least-squares sense) to the current scaled distances, but which are still monotonic with the input dissimilarities. (A brief R sketch of this step appears after item I, below.)
   G. After calculating all disparities, move the points (using the comparison of the current distances and the corresponding disparities to guide the movements) in order to create a new configuration.
   H. Calculate disparities for the new point configuration and, once again, move the points using the disparities as guides.
   I. Repeat the process until no further point movements are necessary (i.e., $d_{ij} = \hat{d}_{ij}$ for all possible pairs of stimulus points, i and j). This implies that the scaled distances are monotonic with the dissimilarities. Hence, it is a "perfect" scaling solution.
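   Kruskal's monotone regression (item F.3.d) corresponds to base R's isotonic regression routine; a sketch, assuming dissim is a vector of pairwise dissimilarities and dists the corresponding current scaled distances (ties in the dissimilarities would need additional handling):

      # Disparities via monotone (isotonic) regression: the values closest to the
      # current distances, in the least-squares sense, that are still monotone
      # with the dissimilarities ("compare forward and average backward").
      ord  <- order(dissim)            # sort pairs from least to most dissimilar
      fit  <- isoreg(dists[ord])       # isotonic regression on the sorted distances
      dhat <- numeric(length(dists))
      dhat[ord] <- fit$yf              # fitted values are the disparities, d-hat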
   J. The remarkable feature of the MDS solution is that the relative locations of the points are fairly tightly constrained within the two-dimensional space.
      1. The relative positions of the points cannot be changed very much without violating the monotonicity requirement.
      2. With only four points, the location constraints are not very "tight."
      3. As the number of points increases (relative to the dimensionality), the point locations become more tightly fixed, relative to each other.

VIII. Nonmetric Multidimensional Scaling: Application to "Real" Data
   A. With real data, perfect MDS solutions are rare (but they do occur!).
      1. It is much more typical to obtain an MDS solution in which the interpoint distances are nearly, but not perfectly, monotonic to the dissimilarities.
      2. Therefore, it is necessary to develop a fit measure, which can be used to assess the quality of the current MDS solution.
   B. Kruskal's Stress coefficient (named for Joseph Kruskal, one of the pioneers of nonmetric MDS)
      1. With nonmetric MDS, we try to make the distances as close as possible to the corresponding disparities.
      2. This objective can be formalized by saying that we want to minimize the following expression, where the summation runs over all stimulus pairs:

            $\sum_{\text{pairs}} (d_{ij} - \hat{d}_{ij})^2$

      3. However, the measurement units in the MDS solution are arbitrary, so we will need to standardize the solution somehow.
      4. It is also usually more convenient to deal with distances (and disparities) rather than squared distances.
      5. Based upon the preceding considerations, Kruskal's Stress1 coefficient is defined as follows:

            Stress1 $= \left[ \dfrac{\sum_{\text{pairs}} (d_{ij} - \hat{d}_{ij})^2}{\sum_{\text{pairs}} d_{ij}^2} \right]^{0.5}$

      6. The Stress coefficient is a badness-of-fit measure.
         a. Smaller values indicate better scaling solutions.
         b. The minimum value of Stress is zero.
      7. Kruskal also developed a second Stress coefficient (where $\bar{d}$ is the mean of the scaled distances):

            Stress2 $= \left[ \dfrac{\sum_{\text{pairs}} (d_{ij} - \hat{d}_{ij})^2}{\sum_{\text{pairs}} (d_{ij} - \bar{d})^2} \right]^{0.5}$

         a. Stress2 provides the same kind of information as Stress1, although its value will always be larger.
         b. Stress2 does have a particularly convenient interpretation: its squared value shows the proportion of the variance in the scaled distances that is inconsistent with the monotonicity assumption.
   C. The objective of nonmetric MDS is to find the configuration of points within a given dimensionality that minimizes the Stress coefficient.
   D. Many people are uncomfortable with a badness-of-fit measure like the Stress coefficient. It is possible to develop goodness-of-fit measures, too.
      1. Could take the Spearman rank correlation between the input dissimilarities and the scaled distances.
      2. Could take the Pearson correlation between the disparities and the scaled distances.
      3. Either of the two preceding correlations can be used to assess the degree to which the scaled distances are monotonic to the dissimilarities data.
   E. The Shepard diagram, a graphical diagnostic tool.
      1. The Shepard diagram (named after Roger Shepard, one of the pioneers of nonmetric MDS) is a scatterplot of the scaled distances versus the input dissimilarities.
      2. The points in the Shepard diagram represent stimulus pairs.
      3. The points in the Shepard diagram should conform to a monotonically-increasing pattern. A nonparametric regression curve (e.g., loess) can be fitted to the points in order to characterize the shape of the monotonic relationship.
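   A sketch in R using MASS::isoMDS (Kruskal's nonmetric method) and MASS::Shepard, again with eurodist standing in for a genuine dissimilarity matrix; note that isoMDS reports Stress1 as a percentage:

      library(MASS)

      # Nonmetric MDS in two dimensions; $stress is Kruskal's Stress1 (in percent).
      nm <- isoMDS(eurodist, k = 2)
      nm$stress

      # Goodness-of-fit check: Spearman correlation between dissimilarities
      # and scaled distances.
      cor(as.vector(eurodist), as.vector(dist(nm$points)), method = "spearman")

      # Shepard diagram: scaled distances against input dissimilarities,
      # with the monotone (step) regression overlaid.
      sh <- Shepard(eurodist, nm$points)
      plot(sh$x, sh$y, pch = 20, xlab = "Dissimilarity", ylab = "Scaled distance")
      lines(sh$x, sh$yf, type = "S")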
   F. Determining the appropriate dimensionality for the MDS solution.
      1. The analytic objective is to find a solution in the minimum number of dimensions necessary in order to reproduce the input dissimilarities accurately.
      2. Substantive theory and prior hypotheses are often useful, particularly if the number of stimuli is relatively small.
      3. As a more "objective" guide, create a scree plot.
         a. Instead of the eigenvalues used in metric MDS, substitute the Stress values obtained for nonmetric MDS solutions at increasing dimensionalities.
         b. The assumption is that each additional "meaningful" dimension will produce a substantial improvement in the consistency between input dissimilarities and scaled distances.
         c. Look for the elbow in the scree plot, and take the number of dimensions that corresponds to it.

IX. Steps in a Nonmetric MDS Procedure
   A. Most computer routines for nonmetric MDS work the same way, although the details differ from one program to the next.
      1. This can affect the results of the MDS in that the point configuration generated by one program will probably be slightly different from the point configuration produced by another program.
      2. Note, however, that the differences are generally very small. They rarely affect the substantive conclusions that would be drawn from the analysis.
   B. Steps in a "generic" nonmetric MDS routine (a brief illustration of running such a routine at several dimensionalities follows this list).
      Step 1: Create a starting configuration of k stimulus points within m-dimensional space.
         1. A "random start" simply creates coordinates for the k points using a random number generator.
         2. A "rational start" uses a designated configuration (e.g., perhaps obtained by performing a metric MDS on the ordinal dissimilarities, or derived from prior substantive theory).
      Step 2: Calculate Stress (or some other fit measure) for the starting point configuration.
      Step 3: Calculate the partial derivatives of Stress with respect to the km point coordinates, and use them to move the points, creating a new configuration.
         1. The partial derivatives show the change in Stress that occurs when each point coordinate is changed by a minute amount.
         2. Using the information provided by the partial derivatives, move the points to produce the maximum possible decrease in Stress.
         3. If the point movements would increase Stress, then do not move the points.
      Step 4: Calculate Stress for the new point configuration.
         1. If Stress = 0, then a perfect solution has been achieved. Proceed to Step 5.
         2. If Stress has not changed since the previous configuration, then the MDS solution is not improving with the point movements. Proceed to Step 5.
         3. If Stress for the new point configuration is smaller than Stress for the previous point configuration, then the MDS solution is improving with the point movements. Go back to Step 3 and proceed with further point movements.
      Step 5: Terminate the MDS routine and print the results.
         1. Minimally, print the point coordinates and the Stress value for the final point configuration.
         2. Most MDS routines also provide an "iteration history" showing the Stress value for each iteration.
         3. Optionally, most MDS routines will plot the point configuration and the Shepard diagram for the final scaling solution.
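   Running a routine of this kind at several dimensionalities supplies the Stress values for the scree plot described in item VIII.F; a sketch with MASS::isoMDS:

      library(MASS)

      # Stress-based scree plot: fit nonmetric MDS in 1 through 5 dimensions
      # and plot the resulting Stress values against dimensionality.
      stress_by_dim <- sapply(1:5, function(m) isoMDS(eurodist, k = m, trace = FALSE)$stress)
      plot(1:5, stress_by_dim, type = "b",
           xlab = "Number of dimensions", ylab = "Stress (percent)")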
   C. Interesting to note that the "formal" MDS routine described here and the informal, intuitive approach used earlier produce virtually identical results! Why use the formal approach if the informal strategy works just as well?
      1. There is some subjectivity in the informal procedure which may affect the final results (e.g., which pair of points should we move first?).
      2. The computationally-intensive nature of the strategy makes the pencil-and-paper approach impractical for larger datasets (which are necessary in order to obtain a well-constrained MDS solution).
      3. The partial derivatives summarize the full set of movements for each point coordinate, making them much more efficient than the step-by-step series of pairwise point movements employed in the informal approach.

X. Interpretation of MDS Results
   A. Important to recognize that nonmetric MDS only determines the relative distances between the points in the scaling solution. The locations of the coordinate axes are completely arbitrary.
      1. The final MDS configuration is usually rotated to a varimax orientation (i.e., one that maximizes the variance of the point coordinates along each of the rectangular coordinate axes).
      2. The point coordinate values are usually standardized along each axis (e.g., set to a mean of zero and a variance of 1.0, or some other arbitrary value).
      3. The axes are simply a device on which to "hang" the points within the m-dimensional space. They have no intrinsic substantive importance or interpretation!
   B. A big advantage of MDS is the simplicity of the underlying geometric model; therefore, simply "eyeballing" the results is often sufficient for interpretation.
      1. Look for interesting "directions" within the space, which may correspond to the substantive dimensions underlying the judgments that produced the dissimilarities in the first place.
      2. Look for distinct groups or clusters of points, which may reveal how the data source (presumably, a set of survey respondents) differentiates among the stimulus objects.
      3. The subjectivity inherent in simple visual interpretation of MDS results makes it desirable to use more systematic (and, hopefully, "objective") interpretation methods.
   C. Embedding external information in a point configuration
      1. The researcher often has prior hypotheses about the dimensions that differentiate the stimuli. If so, then it is useful to obtain external measures of these dimensions (i.e., obtained separately from the dissimilarities used to create the MDS solution).
      2. It is easy to embed the external information in the scaling solution by simply regressing the external measure on the point coordinates. The estimated regression coefficients can be used to draw a new axis within the space, corresponding to the external measure.
      3. This strategy is useful for determining whether the MDS results conform to prior expectations.
   D. Cluster analysis provides an objective strategy for identifying groups of stimulus points within the MDS solution; the analyst can then determine whether the clusters correspond to substantively interesting groups of stimuli. (R sketches of both interpretation aids appear after this list.)
      1. There are many varieties of cluster analysis. They all work by joining "similar" objects together into "clusters." Hierarchical clustering methods are the most common.
      2. Begin by considering each stimulus as a separate cluster. Create a new cluster by joining together the two stimuli whose points are closest to each other within the m-dimensional space. Once joined, they are considered a single cluster (the separate stimuli are no longer distinguished from each other). The location of this cluster is some summary of the locations of the original two stimulus points (e.g., perhaps the mean of their coordinates along each axis in the space).
      3. Proceed through k − 1 steps. On each step, join together the two most similar clusters to form a new cluster. Continue until all k objects are together in a single cluster.
      4. A dendrogram (or tree diagram) traces out the process of joining clusters and is usually considered the main output from a cluster analysis.
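   Sketches of the two interpretation aids just described, assuming X holds the scaled point coordinates (one row per stimulus) and ideology is a hypothetical external measure for the same stimuli:

      # (1) Embed an external measure by regressing it on the point coordinates;
      #     the estimated coefficients give the direction of the fitted axis.
      emb <- lm(ideology ~ X[, 1] + X[, 2])
      coef(emb)                          # use these to draw the new axis in the space

      # (2) Hierarchical clustering of the stimulus points, with a dendrogram.
      hc <- hclust(dist(X), method = "average")
      plot(hc)                           # dendrogram (tree diagram)
      groups <- cutree(hc, k = 3)        # e.g., cut the tree into three clusters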
   E. Be aware that most MDS solutions are amenable to several different substantive interpretations.
      1. Objective strategies can be used to show that the scaling results are consistent with some particular interpretation of the space.
      2. Objective methods can never be used to find the single "true" meaning of the MDS-produced point configuration.
      3. While this uncertainty bothers some researchers, it is really no different from the general scientific approach of theory construction and revision through empirical hypothesis-testing.

XI. Data for MDS
   A. Distinguish between rectangular (or "multivariate") data and square (or "proximities") data.
      1. The difference between the two does not involve the physical shape of the data matrix.
      2. With rectangular data, the rows and columns of the data matrix are separate objects (usually observations and variables, respectively; hence, "multivariate" data).
      3. With square data, the rows and columns of the data matrix are the same objects.
   B. MDS usually used with square data
      1. Entries in a square data matrix show the degree to which row and column objects "match" or "correspond to" each other.
      2. Matching/correspondence information is reflected to create the dissimilarities that are employed as input to MDS.
   C. Even though we call the input data "dissimilarities," MDS can actually handle any type of data that can be interpreted as a distance function.
      1. Many kinds of information can be interpreted as distances.
      2. This leads to one of the strong features of MDS: its ability to analyze many different kinds of data.
   D. Assume that D is some function that applies to pairs of objects, say a and b. D is a distance function if the following four properties hold for all possible pairs of objects in a given set of objects:
      1. D(a, b) ≥ 0 (Non-negativity)
      2. D(a, a) = 0 (Identity)
      3. D(a, b) = D(b, a) (Symmetry)
      4. D(a, b) + D(a, c) ≥ D(b, c) (Triangle inequality)
   E. Many different types of data can be interpreted as distance functions.
      1. Direct dissimilarity judgments.
         a. Could have respondents sort pairs of stimuli according to their perceived similarity.
         b. Could have respondents rate the similarity of stimulus pairs on some predefined scale, and take the mean similarity ratings.
      2. Various profile dissimilarity measures.
         a. Profile dissimilarities measure how different stimuli are across a "profile" of characteristics.
         b. A common example in survey research is the "sum of squared differences." Assume that n survey respondents rate objects a and b. If the ratings are called $R_a$ and $R_b$, then the dissimilarity of the two stimuli could be measured by the following:

               $\delta_{ab} = \sum_{i=1}^{n} (R_{ia} - R_{ib})^2$

      3. Measures of temporal stability (from panel data).
         a. Assume a variable with k categories, measured at two time points.
         b. The proportion of respondents who move from one category to the other over the time interval can be interpreted as the similarity of the two categories.
      4. Theory-based measures of spatial separation.
         a. For example, the number of characteristics shared by a pair of respondents (or by two subsets of respondents).
         b. The line-of-sight dissimilarity measure developed by Rabinowitz uses the triangle inequality to convert rating-scale responses toward k stimuli into a matrix of dissimilarities among those stimuli.
      5. Correlation coefficients.
         a. In fact, the correlation coefficient is usually problematic as a dissimilarity measure, because it measures the angular separation of two variable vectors, rather than the distance between two points.
         b. With certain assumptions, correlations can be converted into dissimilarities in various ways (e.g., define $\delta_{ij} = 1 - r_{ij}$).
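   A one-line sketch of the 1 − r conversion, assuming dat is a respondents-by-variables data frame:

      # Convert a correlation matrix among variables into dissimilarities
      # (larger values = less similar), then scale the result.
      delta <- as.dist(1 - cor(dat, use = "pairwise.complete.obs"))
      cmdscale(delta, k = 2)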
XII. Potential Problems That Might Occur in an MDS Analysis
   A. Too few stimuli in a nonmetric MDS
      1. With nonmetric MDS, a fairly large number of stimuli is necessary so that the pairwise dissimilarities impose enough restrictions to guarantee that the point locations are tightly constrained within the m-dimensional space.
      2. Too few stimuli result in unconstrained scaling solutions and, potentially, meaningless results.
      3. General guidelines for the number of stimuli in nonmetric MDS:
         a. Never use fewer than eight stimuli.
         b. The number of stimuli should always be at least four times the number of dimensions (and, preferably, more than that).
   B. In nonmetric MDS, local minima.
      1. The Stress function may temporarily stop decreasing with point movements. In such cases, the MDS software might mistakenly conclude that the "best" solution has been achieved.
      2. A local minimum is signalled by a meaningless point configuration accompanied by a high Stress value.
      3. The problem can usually be fixed by forcing the program to continue iterating.
      4. Local minima are not very common with rational starting configurations (which are used by most modern MDS software).
   C. Degenerate solutions in nonmetric MDS
      1. If the k stimuli in an MDS analysis can be divided into two (or more) completely disjoint subsets, then the scaling solution often causes the stimulus points within each subset to converge to a single position. As a result, it is impossible to differentiate between the objects within each subset in the scaling solution.
      2. If there are enough stimuli, it might be better to perform MDS separately on each of the subsets.
      3. Even one stimulus that exists "in between" the disjoint subsets can be used to overcome the degeneracy.
   D. MDS terminology is not standardized, so it can be very confusing.
      1. For example, the type of metric MDS shown here is sometimes called "classical scaling" and distinguished from other metric MDS procedures which assume interval-level dissimilarities data.
      2. WMDS is sometimes called "individual differences scaling," or "INDSCAL."
      3. Multidimensional scaling, itself, is sometimes called "SSA" or "Smallest Space Analysis" (after a slightly different algorithm and a series of computer programs written by Louis Guttman and James Lingoes).
      4. The best advice is to read the program documentation very carefully!

XIII. Brief Comparison to Factor Analysis
   A. Speaking very generally, the objectives of factor analysis (FA) are similar to those of MDS.
      1. Both procedures analyze a matrix of proximity data.
         a. In MDS, the proximities are dissimilarities among stimuli.
         b. In FA, the proximities are correlations between variables.
      2. Both procedures represent objects in multidimensional space.
         a. MDS locates points representing stimuli.
         b. FA locates vectors representing variables.
   B. MDS and FA differ in the type of model they produce to represent the input proximities.
      1. MDS represents proximities (dissimilarities) as distances between stimulus points.
      2. FA represents proximities (correlations) as angles between vectors representing the variables.
   C. Speaking very informally, MDS and FA are used to model somewhat different kinds of phenomena.
      1. FA is used to explain why the values of variables tend to go up and down together.
      2. MDS is used to explain why objects "match" or "coincide with" each other.
   D. Some practical differences between MDS and FA
      1. The proximity model of MDS is simpler (and, therefore, usually easier to understand and convey) than the scalar products model of FA.
      2. The assumptions underlying an MDS analysis are usually less stringent than those required for FA.
      3. FA is really only applicable to correlations, while MDS can be used with many types of dissimilarities.
      4. When applied to the same data, MDS often produces solutions in lower dimensionality than factor analysis.

XIV. Weighted Multidimensional Scaling (A Brief Introduction)
   A. WMDS is useful when there are several dissimilarities matrices (for the same stimuli), obtained from several different data sources (e.g., different subjects in an experiment, different subsets of survey respondents, surveys taken at different time points, etc.).
      1. The most common WMDS model assumes that all data sources use the same set of dimensions to evaluate the stimuli.
      2. However, the salience or importance of each dimension can vary from one data source to the next.
   B. Stimuli are located in space, with weighted Euclidean distances used to represent the data sources' dissimilarities among the stimulus objects.
      1. For data source s, the dissimilarity between stimuli i and j (represented by $\delta_{ijs}$) is modeled as follows:

            $\delta_{ijs} = \left[ \sum_{p=1}^{m} w_{sp}^2 (x_{ip} - x_{jp})^2 \right]^{0.5}$

      2. For data source s, the coordinate of stimulus i along dimension p is $w_{sp} x_{ip}$.
   C. The weights "stretch" or "shrink" the dimensions of the MDS space differently for each data source.
      1. The more salient a dimension for a data source, the more that dimension is stretched, relative to the other dimensions, and vice versa for less salient dimensions.
      2. Because the stretching/shrinking differs across data sources, it distorts the space into a different form for each data source.
   D. Routines for estimating the parameters of the WMDS model generally employ an "alternating least squares" procedure that successively estimates the dimension weights and the point coordinates. The first version of this procedure was developed by J. D. Carroll and J. J. Chang.
   E. Output from a WMDS routine.
      1. An iteration history and a fit measure for the final scaling solution.
      2. Point coordinates (and, optionally, a graph) showing the stimulus locations in a space with all dimensions weighted equally.
      3. Dimension weights.
         a. These are usually plotted as vectors, relative to the dimensions of the MDS solution (i.e., the same dimensions used to plot the stimulus points).
         b. Each vector summarizes the dimension weights for a single data source.
         c. The smaller the angle between a vector and a dimension, the greater the salience (i.e., the weight) of that dimension for that data source.
   F. Note that the axes in the WMDS space should be substantively meaningful (unlike CMDS), and they cannot be rotated without degrading the fit of the WMDS model to the dissimilarities data.
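   The smacof package (discussed in the software section below) estimates this individual-differences model; a sketch, assuming diss_list is a list of dissimilarity matrices, one per data source (argument and component names follow recent smacof releases and may differ in older versions):

      library(smacof)

      # Weighted (individual differences) MDS: one dissimilarity matrix per data source.
      wfit <- smacofIndDiff(diss_list, ndim = 2, constraint = "indscal")

      wfit$gspace      # group stimulus space (common configuration)
      wfit$cweights    # dimension weights for each data source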
XV. Multidimensional Unfolding (A Brief Introduction)
   A. The unfolding model is useful in situations where the researcher has information about respondents' preferences with respect to the stimuli used in the MDS analysis.
      1. Each respondent is assumed to have a position of maximum preference with respect to the dimensions in the MDS solution. This is usually called that respondent's "ideal point," because a stimulus located at that position would be preferred over all other stimuli.
      2. The ideal point is located such that the distances to the stimulus points correspond to preferences. The greater the respondent's preference for a stimulus, the smaller the distance from the ideal point to that stimulus point, and vice versa.
   B. The objective of the unfolding analysis is to estimate the location of each ideal point, using that respondent's expressed preferences for the stimuli.
      1. This problem is sometimes called the "external unfolding model" because the stimulus point locations are already known (from the MDS) and we only seek to locate the ideal points with respect to the stimuli.
      2. An "internal unfolding analysis" seeks to locate both ideal points and stimulus points simultaneously. Most MDS software can carry out internal unfolding analyses, but the results are generally problematic. Therefore, internal unfolding analyses are not recommended and they are not considered here.
   C. In the most straightforward version of the multidimensional unfolding model, respondent s's preference for stimulus i is expressed as a function of the stimulus points and s's ideal point, as follows:

         $Pref_{si} = \alpha_s - \beta_s \sum_{p=1}^{m} (x_{ip} - y_{sp})^2 + \epsilon_{si}$

      1. In the preceding equation, $Pref_{si}$ is respondent s's measured preference (e.g., a rating scale response) for stimulus i.
      2. Just as before, $x_{ip}$ is the coordinate for the point representing stimulus i along the pth axis in the MDS solution.
      3. $y_{sp}$ is the coordinate for s's ideal point along the same axis.
      4. $\alpha_s$ and $\beta_s$ are coefficients specific to respondent s, while $\epsilon_{si}$ is a random error term.
   D. $Pref_{si}$ and the $x_{ip}$'s are all observed or previously-estimated quantities. The $y_{sp}$'s and the coefficients are all constants that can be estimated, using a simple regression model.
   E. A separate regression is estimated for each respondent. A procedure developed by J. D. Carroll is used to manipulate the OLS coefficients in order to recover estimates of the ideal point coordinates. The R² from the regression shows how well the respondent's preferences for the stimuli can be explained by the interpoint distances (i.e., from the ideal point to the stimulus points). (A rough R sketch of this per-respondent regression appears at the end of this section.)
   F. What does the external unfolding analysis provide?
      1. Minimally, a descriptive, graphical summary of respondent preferences.
      2. Potentially, useful measurement information.
         a. Recall that one substantive objective of MDS is to determine the evaluative dimensions that people bring to bear on a set of stimulus objects.
         b. The unfolded ideal point estimates effectively measure the respondents with respect to the same evaluative dimensions.
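   A rough sketch of the per-respondent regression for a two-dimensional solution, assuming prefs is a respondents-by-stimuli matrix of preference ratings and X the known stimulus coordinates; the conversion of the OLS coefficients into ideal-point coordinates below follows from expanding the quadratic model above, rather than reproducing Carroll's procedure exactly:

      # External unfolding for one respondent: expand the quadratic preference model
      #   Pref = alpha - beta * sum_p (x_p - y_p)^2 + error
      # into a linear regression on the coordinates and their squared length.
      unfold_one <- function(pref_row, X) {
        ss  <- rowSums(X^2)                          # sum of squared coordinates, per stimulus
        fit <- lm(pref_row ~ X[, 1] + X[, 2] + ss)
        b   <- coef(fit)
        beta  <- -b["ss"]                            # beta_s (coefficient on the squared term)
        ideal <- b[2:3] / (2 * beta)                 # ideal-point coordinates y_s1, y_s2
        list(ideal = ideal, beta = beta, r2 = summary(fit)$r.squared)
      }

      # Ideal points for every respondent:
      # ideals <- t(sapply(seq_len(nrow(prefs)), function(s) unfold_one(prefs[s, ], X)$ideal))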
XVI. Software for Multidimensional Scaling
   A. SPSS
      1. SPSS has two MDS routines, ALSCAL and PROXSCAL.
      2. Both procedures are very flexible. They can perform many different varieties of MDS, and they can be used to construct dissimilarity matrices from multivariate data.
   B. SAS
      1. PROC MDS is very flexible and can perform many varieties of MDS (generally modeled after ALSCAL, although the estimation algorithm is a bit different).
      2. PROC MDS does not plot results; instead, it produces a dataset composed of point coordinates, which can be passed along to graphing software (in SAS or other packages).
   C. SYSTAT
      1. SYSTAT has a very flexible and easy-to-use MDS routine. It can be used to perform most varieties of MDS.
      2. SYSTAT also has an excellent graphics system, which integrates well with the output from its MDS routine.
      3. Finally, SYSTAT claims to have made important advances in the estimation of the internal unfolding model.
   D. STATA
      1. Beginning with STATA Version 9.0, there are mds and mdsmat procedures.
         a. The mds procedure assumes multivariate data that must be converted to dissimilarities prior to the analysis. Note that mds carries out the conversion.
         b. The mdsmat procedure assumes that the data are already contained in a dissimilarities matrix.
      2. Beginning with STATA Version 10.0, nonmetric multidimensional scaling is available; earlier versions of STATA provided only metric MDS.
      3. However, mds and mdsmat are well-integrated with STATA's overall system of model estimation and post-estimation commands.
   E. The R Statistical Computing Environment
      1. The Base R installation only includes the function cmdscale, which performs metric MDS.
      2. The MASS package includes the functions isoMDS, sammon, and Shepard, all of which perform nonmetric CMDS.
      3. The smacof package (available in R 2.7.0 and later) provides functions for estimating metric and nonmetric CMDS, WMDS, and unfolding models, using a unified analytic approach.
      4. While beginners sometimes find R a bit difficult, it is well worth learning. The functions available in R implement many state-of-the-art statistical procedures, and the graphics are better than those available in any other software package.
      5. A big advantage of R is that it is open-source software; in other words, it's free! Information about downloading R is available at the following web site: http://www.r-project.org/
   F. ggvis
      1. ggvis is a module within the ggobi software package. ggobi is a program for visualizing high-dimensional data. It is freely available on the web, at: http://ggobi.org  ggobi (and ggvis) can be integrated with the R statistical computing environment, via the rggobi package.
      2. ggvis is an interactive MDS program, in that the user can manipulate directly the scaled configuration of points. This can be useful for evaluating the robustness of an MDS solution.
      3. The data structures required to use ggvis are a bit unusual, compared to other MDS software.
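   As a closing illustration, a minimal call to the smacof routine mentioned in item E.3 (function, argument, and plot-type names follow recent smacof releases and may differ in older versions):

      library(smacof)

      # Nonmetric (ordinal) MDS of a dissimilarity matrix in two dimensions.
      fit <- smacofSym(eurodist, ndim = 2, type = "ordinal")
      fit$stress                         # Stress-1 for the final configuration
      plot(fit)                          # configuration plot
      plot(fit, plot.type = "Shepard")   # Shepard diagram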