NONLINEAR MAPPING: APPROACHES BASED ON OPTIMIZING AN INDEX OF CONTINUITY AND APPLYING CLASSICAL METRIC MDS TO REVISED DISTANCES
By Ulas Akkucuk & J. Douglas Carroll
Rutgers Business School – Newark and New Brunswick

Outline
• Introduction
• Nonlinear Mapping Algorithms
  – Parametric Mapping Approach
  – ISOMAP Approach
  – Other Approaches
• Experimental Design and Methods
  – Error Levels
  – Evaluation of Mapping Performance
  – Problem of Similarity Transformations
• Results
• Discussion and Future Direction

Introduction
• Problem: to determine a smaller set of variables necessary to account for a larger number of observed variables
• PCA and MDS are useful when the relationship is linear
• Alternative approaches are needed when the relationship is highly nonlinear

• Shepard and Carroll (1966)
  – Locally monotone analysis of proximities: nonmetric MDS treating large distances as missing
    • Worked well if the nonlinearities were not too severe (in particular, if the surface is not a closed one such as a circle or sphere)
  – Optimization of an index of "continuity" or "smoothness"
    • Incorporated into a computer program called "PARAMAP" and tested on various sets of data:
      – 20 points on a circle
      – 62 regularly spaced points on a sphere, and the azimuthal equidistant projection of the world
      – 49 points regularly spaced on a torus embedded in four dimensions

• In all cases the local structure is preserved, except at the points where the shape is "cut open" or "punctured"
• Results were successful, but a severe local minimum problem existed
• Adding error to the regular spacing made the local minimum problem worse
• The current work is stimulated by two recent articles on nonlinear mapping (Tenenbaum, de Silva, & Langford, 2000; Roweis & Saul, 2000)

Nonlinear Mapping Algorithms
• Notation:
  – n : number of objects
  – M : dimensionality of the input coordinates, i.e., of the configuration for which we would like to find an underlying lower-dimensional embedding
  – R : dimensionality of the space of the recovered configuration, where R < M
  – Y : the n × M input matrix
  – X : the n × R output matrix
• The distances between point i and point j in the input and output spaces, respectively, are calculated as:

  \delta_{ij}^2 = \sum_{m=1}^{M} (y_{im} - y_{jm})^2, \quad d_{ij}^2 = \sum_{r=1}^{R} (x_{ir} - x_{jr})^2, \quad \text{for all } i, j
  \Delta = [\delta_{ij}], \quad D = [d_{ij}]

Parametric Mapping Approach
• Works via optimizing an index of "continuity" or "smoothness"
• Early application in the context of time-series data (von Neumann, Kent, Bellinson, & Hart, 1941; von Neumann, 1941):

  \frac{\frac{1}{n-1} \sum_{i=1}^{n-1} (y_{i+1} - y_i)^2}{S^2}, \quad \text{with } S^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2

• A more general expression for the numerator is:

  \sum_{i=1}^{n-1} \left( \frac{y_{i+1} - y_i}{x_{i+1} - x_i} \right)^2

• Generalizing to the multidimensional case we reach:

  \kappa = \frac{\sum_{i<j} \delta_{ij}^2 / d_{ij}^4}{\left( \sum_{i<j} 1 / d_{ij}^2 \right)^2}

• Several modifications are needed for the minimization procedure:
  – d_{ij}^2 + Ce^2 is substituted for d_{ij}^2, where C is a constant equal to 2/(n-1) and e takes on values between 0 and 1
  – e has the practical effect of accelerating the numerical process
  – It can be thought of as an extra "specific" dimension; as e gets closer to 0, points are made to approach the "common" part of the space
  – The constant z is introduced in the numerator, and [2/(n(n-1))]^2 in the denominator
• Final form of the function:

  \kappa = \frac{z \sum_{i<j} \delta_{ij}^2 / d_{ij}^4}{\left[ \frac{2}{n(n-1)} \sum_{i<j} 1 / d_{ij}^2 \right]^2}, \quad \text{with } z = \left[ \frac{2}{n(n-1)} \sum_{i<j} \delta_{ij}^4 \right]^{-1/2}

• Implemented in C++ (GNU GCC compiler)
• The program takes as input e, the number of iterations, the dimensionality R to be recovered, and the number of random starts (or a starting input configuration)
• 200 iterations for each of 100 different random starting configurations yield reasonable solutions
• The resulting best solution can then be fine-tuned further by performing more iterations

ISOMAP Approach
• Tries to overcome difficulties in MDS by replacing the Euclidean metric with a new metric
• [Figure illustrating the geodesic vs. Euclidean metric, from Lee, Lendasse, & Verleysen, 2002]
• To approximate the "geodesic" distances, ISOMAP constructs a neighborhood graph that connects the closer points
  – This is done by connecting the k closest neighbors, or the points that lie within some threshold distance ε of each other
• A shortest-path procedure is then applied to the resulting matrix of modified distances
• Finally, classical metric MDS is applied to obtain the configuration in the lower dimensionality

Other Approaches
• Nonmetric MDS: minimizes a cost function
• Needed to implement the locally monotone MDS approach of Shepard (Shepard & Carroll, 1966)

  STRESS_1 = \left[ \frac{\sum_{i<j} (d_{ij} - \hat{d}_{ij})^2}{\sum_{i<j} d_{ij}^2} \right]^{1/2} \quad \text{or} \quad STRESS_2 = \left[ \frac{\sum_{i<j} (d_{ij} - \hat{d}_{ij})^2}{\sum_{i<j} (d_{ij} - \bar{d})^2} \right]^{1/2}

• Sammon's mapping: minimizes a mapping error function
• Kruskal (1971) indicated that certain options used with nonmetric MDS programs would give the same results

  E = \frac{1}{\sum_{i<j} \delta_{ij}} \sum_{i<j} \frac{(\delta_{ij} - d_{ij})^2}{\delta_{ij}}

• Multidimensional scaling by iterative majorization (Webb, 1995)
• Curvilinear Distance Analysis (CDA) (Lee et al., 2002), an analogue of ISOMAP that omits the MDS step, replacing it with a minimization step
• Self-organizing map (SOM) (Kohonen, 1990, 1995)
• Auto-associative feedforward neural networks (AFN) (Baldi & Hornik, 1989; Kramer, 1991)

Experimental Design and Methods
• Primary focus: 62 points located at the intersections of 5 equally spaced parallels and 12 equally spaced meridians, plus the two poles
• Two types of error, A and B:
  – A: 0%, 10%, 20%
  – B: ±0.00, ±0.01, ±0.05, ±0.10, ±0.20
• The two error types control, respectively, the points being irregularly spaced (Type A) and the points falling inside or outside the sphere (Type B)
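To make this design concrete, here is a minimal sketch (not the authors' generator; the exact error model and the treatment of the two poles are assumptions based on the description above) of how the 62 sphere points with Type A and Type B error could be produced:

```cpp
// Sketch: 62-point test sphere = intersections of 5 equally spaced parallels
// and 12 equally spaced meridians, plus the two poles.  Type A error jitters
// the regular angular spacing by a percentage of the grid spacing; Type B
// error displaces each point radially inside or outside the unit sphere.
// (Assumed error model; not the authors' code.)
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

struct Point3 { double x, y, z; };

std::vector<Point3> makeSphere(double typeA, double typeB, unsigned seed = 1) {
    const double pi = 3.14159265358979323846;
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> u(-1.0, 1.0);
    std::vector<Point3> pts;
    pts.push_back({0.0, 0.0, 1.0});    // north pole
    pts.push_back({0.0, 0.0, -1.0});   // south pole
    for (int p = 1; p <= 5; ++p) {         // 5 equally spaced parallels
        for (int m = 0; m < 12; ++m) {     // 12 equally spaced meridians
            double theta = pi * p / 6.0 + typeA * (pi / 6.0) * u(gen);
            double phi = 2.0 * pi * m / 12.0 + typeA * (2.0 * pi / 12.0) * u(gen);
            double r = 1.0 + typeB * u(gen);   // Type B: +/- typeB off the surface
            pts.push_back({r * std::sin(theta) * std::cos(phi),
                           r * std::sin(theta) * std::sin(phi),
                           r * std::cos(theta)});
        }
    }
    return pts;   // 2 + 5 * 12 = 62 points
}

int main() {
    std::vector<Point3> s = makeSphere(0.10, 0.05);  // 10% Type A, +/-0.05 Type B
    std::printf("generated %zu points\n", s.size());
}
```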
• To evaluate mapping performance we calculate the "rate of agreement in local structure", abbreviated "agreement rate" or A
  – Similar to the RAND index used to compare partitions (Rand, 1971; Hubert & Arabie, 1985)
  – Let a_i stand for the number of points that are in the k-nearest-neighbor list for point i in both X and Y. A is then equal to:

  A = \frac{\sum_{i=1}^{n} a_i}{kn}

• Example of calculating the agreement rate (k = 2):

  Point | Neighbors in first configuration | Neighbors in second configuration | a_i
  1     | 2, 3                             | 4, 5                              | 0
  2     | 1, 4                             | 3, 4                              | 1
  3     | 1, 4                             | 2, 4                              | 1
  4     | 2, 3                             | 1, 5                              | 0
  5     | 1, 2                             | 3, 4                              | 0

  Agreement rate = 2/10, or 20%

• Problem of similarity transformations: we use standard software to rotate the different solutions into optimal congruence with a landmark solution (Rohlf & Slice, 1989)
• We use the solution for the error-free and regularly spaced sphere as the landmark
• We also report VAF:

  VAF = 1 - \frac{\sum_{r=1}^{R} \sum_{i=1}^{n} (x_{ir} - \hat{x}_{ir})^2}{\sum_{r=1}^{R} \sum_{i=1}^{n} x_{ir}^2}

• The VAF results may not be very good: the similarity transformation step alone is not enough
• An alternating algorithm is needed that reorders the points on each of the five parallels and then finds the optimal similarity transformation
• We also provide Shepard-like diagrams

[Figure: Why is the similarity transformation not enough? Two recovered configurations: (a) 0% Type A error / 0.00 Type B error; (b) 0% Type A error / 0.01 Type B error, VAF = 0.47]

Results
• Agreement rate for the regularly spaced and errorless sphere: 82.9% (k = 5)
• Over 1000 randomizations of the solution: the average and standard deviation of the agreement rate are 8.1% and 1.9%, respectively; the minimum and maximum are 3.5% and 16.7%

[Figure: recovered configuration plot]

• We can use Chebyshev's inequality, stated as: P(|Z| \ge k) \le 1/k^2
• 82.9% is about 40 standard deviations away from the mean, so an upper bound on the probability that this event happens by chance is 1/40^2 = 0.000625, which is very low!
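A minimal sketch of the agreement rate computation follows (assuming plain Euclidean k-nearest neighbors and no special handling of ties); the small main() also reproduces the Chebyshev-style significance check described above:

```cpp
// Agreement rate A = (sum_i a_i) / (k n), where a_i is the number of points
// appearing in the k-nearest-neighbor list of point i in both the input
// configuration Y and the recovered configuration X.
#include <algorithm>
#include <cstdio>
#include <set>
#include <utility>
#include <vector>

using Config = std::vector<std::vector<double>>;   // n points, any dimensionality

// Indices of the k nearest neighbors of point i (excluding i itself).
std::set<int> kNearest(const Config& c, int i, int k) {
    std::vector<std::pair<double, int>> dist;
    for (int j = 0; j < (int)c.size(); ++j) {
        if (j == i) continue;
        double s = 0.0;
        for (int r = 0; r < (int)c[i].size(); ++r) {
            double diff = c[i][r] - c[j][r];
            s += diff * diff;
        }
        dist.push_back({s, j});
    }
    std::sort(dist.begin(), dist.end());
    std::set<int> nn;
    for (int m = 0; m < k && m < (int)dist.size(); ++m) nn.insert(dist[m].second);
    return nn;
}

double agreementRate(const Config& Y, const Config& X, int k) {
    int n = (int)Y.size();
    int agree = 0;                                   // sum of the a_i
    for (int i = 0; i < n; ++i) {
        std::set<int> inY = kNearest(Y, i, k);
        std::set<int> inX = kNearest(X, i, k);
        for (int j : inY)
            if (inX.count(j)) ++agree;
    }
    return (double)agree / (k * n);                  // A = (sum_i a_i) / (k n)
}

int main() {
    // Significance check from the slides: A = 82.9% against a randomization
    // mean of 8.1% and s.d. of 1.9% is (82.9 - 8.1) / 1.9, roughly 40 standard
    // deviations; Chebyshev's inequality bounds the chance by 1/40^2 = 0.000625.
    std::printf("z = %.1f standard deviations\n", (82.9 - 8.1) / 1.9);
}
```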
[Figures: PARAMAP configurations recovered under the 15 error conditions, panels (a) through (o), covering the combinations of Type A error (0%, 10%, 20%) and Type B error (±0.00, ±0.01, ±0.05, ±0.10, ±0.20); each panel is labeled with its error condition and VAF, with VAF values ranging from about 0.23 to 0.63]
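The VAF values quoted in these panel captions follow the formula given earlier; a minimal sketch of that computation, assuming the recovered configuration has already been rotated into optimal congruence with the landmark (Rohlf & Slice, 1989), is:

```cpp
// VAF = 1 - sum_{i,r} (x_ir - xhat_ir)^2 / sum_{i,r} x_ir^2, comparing a
// recovered configuration X (already aligned to the landmark) with the
// landmark configuration Xhat.  The Procrustes rotation step itself is not shown.
#include <vector>

using Config = std::vector<std::vector<double>>;   // n points, R dimensions

double vaf(const Config& X, const Config& Xhat) {
    double misfit = 0.0, total = 0.0;
    for (int i = 0; i < (int)X.size(); ++i) {
        for (int r = 0; r < (int)X[i].size(); ++r) {
            double diff = X[i][r] - Xhat[i][r];
            misfit += diff * diff;          // squared discrepancy from landmark
            total += X[i][r] * X[i][r];     // total sum of squares
        }
    }
    return 1.0 - misfit / total;
}
```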
[Figure: ISOMAP vs. PARAMAP solutions for the errorless, regularly spaced sphere (0% Type A error / 0.00 Type B error): ISOMAP, A = 48.1%; PARAMAP, A = 82.9%]

[Figure: Shepard-like diagrams, panels (a)–(c), plotting original distances against recovered distances]

SWISS Roll Data – 130 points
• Agreement rate: ISOMAP 59.7%, PARAMAP 70.5%

Discussion and Future Direction
• Disadvantage of PARAMAP: run time
• Advantage of ISOMAP: a noniterative procedure that can be applied to very large data sets with ease
• Disadvantage of ISOMAP: poor performance on closed data sets such as the sphere

• Improvements in the computational efficiency of PARAMAP should be explored:
  – Use of a conjugate gradient algorithm instead of the straight gradient algorithm
  – Use of a conjugate gradient algorithm with restarts (a generic version is sketched below)
  – A possible combination of the straight gradient and conjugate gradient approaches
• Improvements that could benefit both ISOMAP and PARAMAP:
  – A wise selection of landmarks and an interpolation or extrapolation scheme to recover the rest of the data
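The sketch below is a generic Polak-Ribiere conjugate gradient routine with periodic restarts, shown on a toy objective with numerical gradients and a simple backtracking line search. It only illustrates the proposed idea and is not the PARAMAP implementation; in practice the objective would be the continuity index over the full configuration, so each function and gradient evaluation grows with the square of the number of points, which is one reason run time is the main concern above.

```cpp
// Generic nonlinear conjugate gradient (Polak-Ribiere) with restarts.
#include <cmath>
#include <cstdio>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
using Objective = std::function<double(const Vec&)>;

// Central-difference numerical gradient.
Vec numGrad(const Objective& f, const Vec& x, double h = 1e-6) {
    Vec g(x.size());
    for (int i = 0; i < (int)x.size(); ++i) {
        Vec xp = x, xm = x;
        xp[i] += h;
        xm[i] -= h;
        g[i] = (f(xp) - f(xm)) / (2.0 * h);
    }
    return g;
}

Vec conjugateGradientWithRestarts(const Objective& f, Vec x, int maxIter) {
    const int n = (int)x.size();
    Vec g = numGrad(f, x), d(n);
    for (int i = 0; i < n; ++i) d[i] = -g[i];      // start with steepest descent
    for (int it = 0; it < maxIter; ++it) {
        // Simple backtracking line search along direction d.
        double step = 1.0;
        const double f0 = f(x);
        while (step > 1e-12) {
            Vec trial(n);
            for (int i = 0; i < n; ++i) trial[i] = x[i] + step * d[i];
            if (f(trial) < f0) { x = trial; break; }
            step *= 0.5;
        }
        Vec gNew = numGrad(f, x);
        // Polak-Ribiere coefficient.
        double num = 0.0, den = 0.0;
        for (int i = 0; i < n; ++i) {
            num += gNew[i] * (gNew[i] - g[i]);
            den += g[i] * g[i];
        }
        double beta = (den > 0.0) ? num / den : 0.0;
        // Restart with pure steepest descent periodically or when beta < 0.
        if (beta < 0.0 || (it + 1) % (10 * n) == 0) beta = 0.0;
        for (int i = 0; i < n; ++i) d[i] = -gNew[i] + beta * d[i];
        g = gNew;
    }
    return x;
}

int main() {
    // Toy objective (Rosenbrock); minimum at (1, 1).
    Objective rosen = [](const Vec& v) {
        double a = v[1] - v[0] * v[0], b = 1.0 - v[0];
        return 100.0 * a * a + b * b;
    };
    Vec x = conjugateGradientWithRestarts(rosen, {-1.2, 1.0}, 5000);
    std::printf("minimum found near (%.3f, %.3f)\n", x[0], x[1]);
}
```

The line search and restart rule are kept deliberately simple here; a production version would pair the conjugate gradient directions with a proper Wolfe-condition line search and analytic gradients of the continuity index.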