NONLINEAR MAPPING: APPROACHES BASED ON OPTIMIZING AN INDEX OF CONTINUITY AND APPLYING CLASSICAL METRIC MDS TO REVISED DISTANCES
By Ulas Akkucuk & J. Douglas Carroll
Rutgers Business School – Newark and New Brunswick

Outline
• Introduction
• Nonlinear Mapping Algorithms
  – Parametric Mapping Approach
  – ISOMAP Approach
  – Other Approaches
• Experimental Design and Methods
  – Error Levels
  – Evaluation of Mapping Performance
  – Problem of Similarity Transformations
• Results
• Discussion and Future Direction

Introduction
• Problem: to determine a smaller set of variables necessary to account for a larger number of observed variables
• PCA and MDS are useful when the relationship is linear
• Alternative approaches are needed when the relationship is highly nonlinear

• Shepard and Carroll (1966)
  – Locally monotone analysis of proximities: nonmetric MDS treating large distances as missing
    • Worked well if the nonlinearities were not too severe (in particular, if the surface is not a closed one such as a circle or sphere)
  – Optimization of an index of "continuity" or "smoothness"
    • Incorporated into a computer program called "PARAMAP" and tested on various sets of data:
      – 20 points on a circle
      – 62 regularly spaced points on a sphere, and the azimuthal equidistant projection of the world
      – 49 points regularly spaced on a torus embedded in four dimensions

• In all cases the local structure is preserved, except at the points where the shape is "cut open" or "punctured"
• Results were successful, but a severe local minimum problem existed
• Adding error to the regular spacing made the local minimum problem worse
• The current work is stimulated by two recent articles on nonlinear mapping (Tenenbaum, de Silva, & Langford, 2000; Roweis & Saul, 2000)

Nonlinear Mapping Algorithms
• Notation:
  – n : number of objects
  – M : dimensionality of the input coordinates, i.e., of the configuration for which we would like to find an underlying lower-dimensional embedding
  – R : dimensionality of the space of the recovered configuration, where R < M
  – Y : the n × M input matrix
  – X : the n × R output matrix
• The distances between point i and point j in the input and output spaces, respectively, are calculated as:

  \delta_{ij}^2 = \sum_{m=1}^{M} (y_{im} - y_{jm})^2, \quad d_{ij}^2 = \sum_{r=1}^{R} (x_{ir} - x_{jr})^2, \quad \text{for all } i, j
  \Delta = [\delta_{ij}], \quad D = [d_{ij}]

Parametric Mapping Approach
• Works via optimizing an index of "continuity" or "smoothness"
• Early application in the context of time-series data (von Neumann, Kent, Bellinson, & Hart, 1941; von Neumann, 1941):

  \frac{\frac{1}{n-1} \sum_{i=1}^{n-1} (y_{i+1} - y_i)^2}{S^2}, \quad \text{with } S^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2

• A more general expression for the numerator is:

  \sum_{i=1}^{n-1} \left( \frac{y_{i+1} - y_i}{x_{i+1} - x_i} \right)^2

• Generalizing to the multidimensional case we reach:

  \kappa = \frac{\sum_{i<j} \delta_{ij}^2 / d_{ij}^4}{\left( \sum_{i<j} 1 / d_{ij}^2 \right)^2}

• Several modifications are needed for the minimization procedure:
  – d_{ij}^2 + Ce^2 is substituted for d_{ij}^2, where C is a constant equal to 2/(n-1) and e takes on values between 0 and 1
  – e has the practical effect of accelerating the numerical process
  – It can be thought of as an extra "specific" dimension; as e gets closer to 0, points are made to approach the "common" part of the space
  – The constant z is introduced in the numerator, and [2/(n(n-1))]^2 in the denominator
• Final form of the function:

  \kappa = \frac{z \sum_{i<j} \delta_{ij}^2 / d_{ij}^4}{\left[ \frac{2}{n(n-1)} \sum_{i<j} 1 / d_{ij}^2 \right]^2}, \quad \text{with } z = \left[ \frac{2}{n(n-1)} \sum_{i<j} \delta_{ij}^4 \right]^{-1/2}

• Implemented in C++ (GNU GCC compiler)
• The program takes as input e, the number of iterations, the dimensionality R to be recovered, and the number of random starts (or a starting input configuration)
• 200 iterations for each of 100 different random starting configurations yield reasonable solutions
• The resulting best solution can then be fine-tuned further by performing more iterations

ISOMAP Approach
• Tries to overcome difficulties in MDS by replacing the Euclidean metric with a new metric
• [Figure illustrating the geodesic vs. Euclidean metric, from Lee, Lendasse, & Verleysen, 2002]
• To approximate the "geodesic" distances, ISOMAP constructs a neighborhood graph that connects the closer points
  – This is done by connecting the k closest neighbors, or the points that lie within some threshold distance ε of each other
• A shortest-path procedure is then applied to the resulting matrix of modified distances
• Finally, classical metric MDS is applied to obtain the configuration in the lower dimensionality

Other Approaches
• Nonmetric MDS: minimizes a cost function
• Needed to implement the locally monotone MDS approach of Shepard (Shepard & Carroll, 1966)

  STRESS_1 = \left[ \frac{\sum_{i<j} (d_{ij} - \hat{d}_{ij})^2}{\sum_{i<j} d_{ij}^2} \right]^{1/2} \quad \text{or} \quad STRESS_2 = \left[ \frac{\sum_{i<j} (d_{ij} - \hat{d}_{ij})^2}{\sum_{i<j} (d_{ij} - \bar{d})^2} \right]^{1/2}

• Sammon's mapping: minimizes a mapping error function
• Kruskal (1971) indicated that certain options used with nonmetric MDS programs would give the same results

  E = \frac{1}{\sum_{i<j} \delta_{ij}} \sum_{i<j} \frac{(\delta_{ij} - d_{ij})^2}{\delta_{ij}}

• Multidimensional scaling by iterative majorization (Webb, 1995)
• Curvilinear Distance Analysis (CDA) (Lee et al., 2002), an analogue of ISOMAP that omits the MDS step, replacing it with a minimization step
• Self-organizing map (SOM) (Kohonen, 1990, 1995)
• Auto-associative feedforward neural networks (AFN) (Baldi & Hornik, 1989; Kramer, 1991)

Experimental Design and Methods
• Primary focus: 62 points located at the intersections of 5 equally spaced parallels and 12 equally spaced meridians, plus the two poles
• Two types of error, A and B:
  – A: 0%, 10%, 20%
  – B: ±0.00, ±0.01, ±0.05, ±0.10, ±0.20
• The two error types control, respectively, the points being irregularly spaced (Type A) and the points falling inside or outside the sphere (Type B)
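To make this design concrete, here is a minimal sketch (not the authors' generator; the exact error model and the treatment of the two poles are assumptions based on the description above) of how the 62 sphere points with Type A and Type B error could be produced:

```cpp
// Sketch: 62-point test sphere = intersections of 5 equally spaced parallels
// and 12 equally spaced meridians, plus the two poles.  Type A error jitters
// the regular angular spacing by a percentage of the grid spacing; Type B
// error displaces each point radially inside or outside the unit sphere.
// (Assumed error model; not the authors' code.)
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

struct Point3 { double x, y, z; };

std::vector<Point3> makeSphere(double typeA, double typeB, unsigned seed = 1) {
    const double pi = 3.14159265358979323846;
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> u(-1.0, 1.0);
    std::vector<Point3> pts;
    pts.push_back({0.0, 0.0, 1.0});    // north pole
    pts.push_back({0.0, 0.0, -1.0});   // south pole
    for (int p = 1; p <= 5; ++p) {         // 5 equally spaced parallels
        for (int m = 0; m < 12; ++m) {     // 12 equally spaced meridians
            double theta = pi * p / 6.0 + typeA * (pi / 6.0) * u(gen);
            double phi = 2.0 * pi * m / 12.0 + typeA * (2.0 * pi / 12.0) * u(gen);
            double r = 1.0 + typeB * u(gen);   // Type B: +/- typeB off the surface
            pts.push_back({r * std::sin(theta) * std::cos(phi),
                           r * std::sin(theta) * std::sin(phi),
                           r * std::cos(theta)});
        }
    }
    return pts;   // 2 + 5 * 12 = 62 points
}

int main() {
    std::vector<Point3> s = makeSphere(0.10, 0.05);  // 10% Type A, +/-0.05 Type B
    std::printf("generated %zu points\n", s.size());
}
```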
• To evaluate mapping performance we calculate the "rate of agreement in local structure", abbreviated "agreement rate" or A
  – Similar to the RAND index used to compare partitions (Rand, 1971; Hubert & Arabie, 1985)
  – Let a_i stand for the number of points that are in the k-nearest-neighbor list for point i in both X and Y. A is then equal to:

  A = \frac{\sum_{i=1}^{n} a_i}{kn}

• Example of calculating the agreement rate (k = 2):

  Point | Neighbors in first configuration | Neighbors in second configuration | a_i
  1     | 2, 3                             | 4, 5                              | 0
  2     | 1, 4                             | 3, 4                              | 1
  3     | 1, 4                             | 2, 4                              | 1
  4     | 2, 3                             | 1, 5                              | 0
  5     | 1, 2                             | 3, 4                              | 0

  Agreement rate = 2/10, or 20%

• Problem of similarity transformations: we use standard software to rotate the different solutions into optimal congruence with a landmark solution (Rohlf & Slice, 1989)
• We use the solution for the error-free and regularly spaced sphere as the landmark
• We also report VAF:

  VAF = 1 - \frac{\sum_{r=1}^{R} \sum_{i=1}^{n} (x_{ir} - \hat{x}_{ir})^2}{\sum_{r=1}^{R} \sum_{i=1}^{n} x_{ir}^2}

• The VAF results may not be very good: the similarity transformation step alone is not enough
• An alternating algorithm is needed that reorders the points on each of the five parallels and then finds the optimal similarity transformation
• We also provide Shepard-like diagrams

[Figure: Why is the similarity transformation not enough? Two recovered configurations: (a) 0% Type A error / 0.00 Type B error; (b) 0% Type A error / 0.01 Type B error, VAF = 0.47]

Results
• Agreement rate for the regularly spaced and errorless sphere: 82.9% (k = 5)
• Over 1000 randomizations of the solution: the average and standard deviation of the agreement rate are 8.1% and 1.9%, respectively; the minimum and maximum are 3.5% and 16.7%

[Figure: recovered configuration plot]

• We can use Chebyshev's inequality, stated as: P(|Z| \ge k) \le 1/k^2
• 82.9% is about 40 standard deviations away from the mean, so an upper bound on the probability that this event happens by chance is 1/40^2 = 0.000625, which is very low!
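A minimal sketch of the agreement rate computation follows (assuming plain Euclidean k-nearest neighbors and no special handling of ties); the small main() also reproduces the Chebyshev-style significance check described above:

```cpp
// Agreement rate A = (sum_i a_i) / (k n), where a_i is the number of points
// appearing in the k-nearest-neighbor list of point i in both the input
// configuration Y and the recovered configuration X.
#include <algorithm>
#include <cstdio>
#include <set>
#include <utility>
#include <vector>

using Config = std::vector<std::vector<double>>;   // n points, any dimensionality

// Indices of the k nearest neighbors of point i (excluding i itself).
std::set<int> kNearest(const Config& c, int i, int k) {
    std::vector<std::pair<double, int>> dist;
    for (int j = 0; j < (int)c.size(); ++j) {
        if (j == i) continue;
        double s = 0.0;
        for (int r = 0; r < (int)c[i].size(); ++r) {
            double diff = c[i][r] - c[j][r];
            s += diff * diff;
        }
        dist.push_back({s, j});
    }
    std::sort(dist.begin(), dist.end());
    std::set<int> nn;
    for (int m = 0; m < k && m < (int)dist.size(); ++m) nn.insert(dist[m].second);
    return nn;
}

double agreementRate(const Config& Y, const Config& X, int k) {
    int n = (int)Y.size();
    int agree = 0;                                   // sum of the a_i
    for (int i = 0; i < n; ++i) {
        std::set<int> inY = kNearest(Y, i, k);
        std::set<int> inX = kNearest(X, i, k);
        for (int j : inY)
            if (inX.count(j)) ++agree;
    }
    return (double)agree / (k * n);                  // A = (sum_i a_i) / (k n)
}

int main() {
    // Significance check from the slides: A = 82.9% against a randomization
    // mean of 8.1% and s.d. of 1.9% is (82.9 - 8.1) / 1.9, roughly 40 standard
    // deviations; Chebyshev's inequality bounds the chance by 1/40^2 = 0.000625.
    std::printf("z = %.1f standard deviations\n", (82.9 - 8.1) / 1.9);
}
```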
[Figures: PARAMAP configurations recovered under the 15 error conditions, panels (a) through (o), covering the combinations of Type A error (0%, 10%, 20%) and Type B error (±0.00, ±0.01, ±0.05, ±0.10, ±0.20); each panel is labeled with its error condition and VAF, with VAF values ranging from about 0.23 to 0.63]
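The VAF values quoted in these panel captions follow the formula given earlier; a minimal sketch of that computation, assuming the recovered configuration has already been rotated into optimal congruence with the landmark (Rohlf & Slice, 1989), is:

```cpp
// VAF = 1 - sum_{i,r} (x_ir - xhat_ir)^2 / sum_{i,r} x_ir^2, comparing a
// recovered configuration X (already aligned to the landmark) with the
// landmark configuration Xhat.  The Procrustes rotation step itself is not shown.
#include <vector>

using Config = std::vector<std::vector<double>>;   // n points, R dimensions

double vaf(const Config& X, const Config& Xhat) {
    double misfit = 0.0, total = 0.0;
    for (int i = 0; i < (int)X.size(); ++i) {
        for (int r = 0; r < (int)X[i].size(); ++r) {
            double diff = X[i][r] - Xhat[i][r];
            misfit += diff * diff;          // squared discrepancy from landmark
            total += X[i][r] * X[i][r];     // total sum of squares
        }
    }
    return 1.0 - misfit / total;
}
```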
[Figure: ISOMAP vs. PARAMAP solutions for the errorless, regularly spaced sphere (0% Type A error / 0.00 Type B error): ISOMAP, A = 48.1%; PARAMAP, A = 82.9%]

[Figure: Shepard-like diagrams, panels (a)–(c), plotting original distances against recovered distances]

SWISS Roll Data – 130 points
• Agreement rate: ISOMAP 59.7%, PARAMAP 70.5%

Discussion and Future Direction
• Disadvantage of PARAMAP: run time
• Advantage of ISOMAP: a noniterative procedure that can be applied to very large data sets with ease
• Disadvantage of ISOMAP: poor performance on closed data sets such as the sphere

• Improvements in the computational efficiency of PARAMAP should be explored:
  – Use of a conjugate gradient algorithm instead of the straight gradient algorithm
  – Use of a conjugate gradient algorithm with restarts (a generic version is sketched below)
  – A possible combination of the straight gradient and conjugate gradient approaches
• Improvements that could benefit both ISOMAP and PARAMAP:
  – A wise selection of landmarks and an interpolation or extrapolation scheme to recover the rest of the data
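The sketch below is a generic Polak-Ribiere conjugate gradient routine with periodic restarts, shown on a toy objective with numerical gradients and a simple backtracking line search. It only illustrates the proposed idea and is not the PARAMAP implementation; in practice the objective would be the continuity index over the full configuration, so each function and gradient evaluation grows with the square of the number of points, which is one reason run time is the main concern above.

```cpp
// Generic nonlinear conjugate gradient (Polak-Ribiere) with restarts.
#include <cmath>
#include <cstdio>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
using Objective = std::function<double(const Vec&)>;

// Central-difference numerical gradient.
Vec numGrad(const Objective& f, const Vec& x, double h = 1e-6) {
    Vec g(x.size());
    for (int i = 0; i < (int)x.size(); ++i) {
        Vec xp = x, xm = x;
        xp[i] += h;
        xm[i] -= h;
        g[i] = (f(xp) - f(xm)) / (2.0 * h);
    }
    return g;
}

Vec conjugateGradientWithRestarts(const Objective& f, Vec x, int maxIter) {
    const int n = (int)x.size();
    Vec g = numGrad(f, x), d(n);
    for (int i = 0; i < n; ++i) d[i] = -g[i];      // start with steepest descent
    for (int it = 0; it < maxIter; ++it) {
        // Simple backtracking line search along direction d.
        double step = 1.0;
        const double f0 = f(x);
        while (step > 1e-12) {
            Vec trial(n);
            for (int i = 0; i < n; ++i) trial[i] = x[i] + step * d[i];
            if (f(trial) < f0) { x = trial; break; }
            step *= 0.5;
        }
        Vec gNew = numGrad(f, x);
        // Polak-Ribiere coefficient.
        double num = 0.0, den = 0.0;
        for (int i = 0; i < n; ++i) {
            num += gNew[i] * (gNew[i] - g[i]);
            den += g[i] * g[i];
        }
        double beta = (den > 0.0) ? num / den : 0.0;
        // Restart with pure steepest descent periodically or when beta < 0.
        if (beta < 0.0 || (it + 1) % (10 * n) == 0) beta = 0.0;
        for (int i = 0; i < n; ++i) d[i] = -gNew[i] + beta * d[i];
        g = gNew;
    }
    return x;
}

int main() {
    // Toy objective (Rosenbrock); minimum at (1, 1).
    Objective rosen = [](const Vec& v) {
        double a = v[1] - v[0] * v[0], b = 1.0 - v[0];
        return 100.0 * a * a + b * b;
    };
    Vec x = conjugateGradientWithRestarts(rosen, {-1.2, 1.0}, 5000);
    std::printf("minimum found near (%.3f, %.3f)\n", x[0], x[1]);
}
```

The line search and restart rule are kept deliberately simple here; a production version would pair the conjugate gradient directions with a proper Wolfe-condition line search and analytic gradients of the continuity index.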