NONLINEAR MAPPING: APPROACHES BASED ON OPTIMIZING AN INDEX OF CONTINUITY AND APPLYING CLASSICAL METRIC MDS TO REVISED DISTANCES
By Ulas Akkucuk & J. Douglas Carroll
Rutgers Business School – Newark and New Brunswick
Outline
• Introduction
• Nonlinear Mapping Algorithms
• Parametric Mapping Approach
• ISOMAP Approach
• Other Approaches
• Experimental Design and Methods
• Error Levels
• Evaluation of Mapping Performance
• Problem of Similarity Transformations
• Results
• Discussion and Future Direction
Introduction
• Problem: to determine a smaller set of variables that accounts for a larger number of observed variables
• PCA and MDS are useful when the relationship is linear
• Alternative approaches are needed when the relationship is highly nonlinear
• Shepard and Carroll (1966)
– Locally monotone analysis of proximities:
Nonmetric MDS treating large distances as
missing
• Worked well if the nonlinearities were not too severe (in particular, if the surface was not closed, as a circle or sphere is)
– Optimization of an index of “continuity” or “smoothness”
• Incorporated into a computer program called “PARAMAP” and tested on various data sets
• 20 points on a circle
• 62 regularly spaced points on a sphere, and
the azimuthal equidistant projection of the
world
• 49 points regularly spaced on a torus
embedded in four dimensions
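The circle and torus test shapes can be generated as below. This is a minimal NumPy sketch under assumptions the slides do not state: the circle is the unit circle, and the 49 torus points are taken to be a 7 × 7 grid on the flat torus (cos a, sin a, cos b, sin b) embedded in four dimensions.

```python
import numpy as np

def circle_points(n=20):
    """n equally spaced points on the unit circle in R^2."""
    theta = 2 * np.pi * np.arange(n) / n
    return np.column_stack([np.cos(theta), np.sin(theta)])

def flat_torus_points(m=7):
    """m*m grid points on the flat torus S^1 x S^1, embedded in R^4 as
    (cos a, sin a, cos b, sin b). The 7 x 7 grid is an assumption."""
    a = 2 * np.pi * np.arange(m) / m
    A, B = np.meshgrid(a, a, indexing="ij")
    A, B = A.ravel(), B.ravel()
    return np.column_stack([np.cos(A), np.sin(A), np.cos(B), np.sin(B)])

circle = circle_points(20)
torus = flat_torus_points(7)
print(circle.shape, torus.shape)  # (20, 2) (49, 4)
```

Every torus point lies at distance √2 from the origin, so the configuration cannot be flattened into two dimensions without cutting it open somewhere.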
• In all cases the local structure is preserved, except at the points where the shape is “cut open” or “punctured”
• Results were successful, but a severe local-minimum problem existed
• Adding error to the regular spacing made the local-minimum problem worse
• Current work is stimulated by two articles
on nonlinear mapping (Tenenbaum, de
Silva, & Langford, 2000; Roweis & Saul,
2000)
Nonlinear Mapping Algorithms
– n : number of objects
– M : dimensionality of the input coordinates, in other words of the configuration for which we would like to find an underlying lower-dimensional embedding
– R : dimensionality of the recovered configuration, where R < M
– Y : n × M input matrix
– X : n × R output matrix
– The distances between points i and j in the input and output spaces, respectively, are calculated as:
$$\delta_{ij}^2 = \sum_{m=1}^{M} (y_{im} - y_{jm})^2, \qquad d_{ij}^2 = \sum_{r=1}^{R} (x_{ir} - x_{jr})^2, \qquad \forall\, i, j$$
$$\Delta = [\delta_{ij}], \qquad D = [d_{ij}]$$
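The two distance matrices can be computed directly from the coordinate matrices. A minimal NumPy sketch; the random Y and the use of its first two columns as an "output" X are illustrative only.

```python
import numpy as np

def pairwise_distances(Z):
    """Euclidean distance matrix between the rows of Z (n x p)."""
    diff = Z[:, None, :] - Z[None, :, :]      # n x n x p coordinate differences
    return np.sqrt(np.sum(diff ** 2, axis=-1))

rng = np.random.default_rng(0)
Y = rng.normal(size=(6, 3))    # n = 6 objects, M = 3 input dimensions
X = Y[:, :2]                   # an illustrative R = 2 configuration
Delta = pairwise_distances(Y)  # input distances delta_ij
D = pairwise_distances(X)      # output distances d_ij
```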
Parametric Mapping Approach
• Works via optimizing an index of
“continuity” or “smoothness”
• Early application in the context of time-series data (von Neumann, Kent, Bellinson, & Hart, 1941; von Neumann, 1941):
$$\frac{\frac{1}{n-1}\sum_{i=1}^{n-1} (y_{i+1} - y_i)^2}{S^2}, \qquad S^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2$$
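The ratio above can be computed as follows (a sketch that uses the 1/(n − 1) and 1/n divisors of the formula). Small values indicate a smooth, slowly varying series; for independent observations the ratio is close to 2.

```python
import numpy as np

def von_neumann_ratio(y):
    """Mean square successive difference divided by S^2 (1/n divisor)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    msd = np.sum(np.diff(y) ** 2) / (n - 1)   # numerator
    s2 = np.sum((y - y.mean()) ** 2) / n      # S^2
    return msd / s2

smooth = np.sin(np.linspace(0, np.pi, 50))         # slowly varying series
rough = np.random.default_rng(1).normal(size=50)   # independent noise
print(von_neumann_ratio(smooth), von_neumann_ratio(rough))
```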
• A more general expression for the numerator is:
$$\frac{1}{n-1}\sum_{i=1}^{n-1} \left(\frac{y_{i+1} - y_i}{x_{i+1} - x_i}\right)^2$$
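With unequally spaced observations, each successive difference in y is divided by the corresponding difference in x. A short sketch, assuming the x values are sorted in increasing order; with unit spacing it reduces to the numerator of the time-series ratio.

```python
import numpy as np

def general_numerator(y, x):
    """(1/(n-1)) * sum_i ((y_{i+1} - y_i) / (x_{i+1} - x_i))^2."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    slopes = np.diff(y) / np.diff(x)          # successive difference quotients
    return np.sum(slopes ** 2) / (y.size - 1)

y = np.array([0.0, 1.0, 4.0, 9.0])
print(general_numerator(y, np.arange(4)))          # unit spacing
print(general_numerator(y, [0.0, 0.5, 2.0, 2.5]))  # unequal spacing
```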
• Generalizing to the multidimensional case, we reach
$$\kappa = \frac{\sum_{i<j} \delta_{ij}^2 / d_{ij}^4}{\left[\sum_{i<j} 1/d_{ij}^2\right]^2}$$
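The multidimensional index can be evaluated from the two distance matrices. In this sketch the sums run over pairs i < j. Pairs that are close in the output space (small d_ij) receive the largest weights, so κ stays small when points that end up close together in X were also close in the input. Because numerator and denominator both scale as d⁻⁴, κ does not depend on the overall scale of X.

```python
import numpy as np

def continuity_index(Delta, D):
    """kappa = sum_{i<j} delta_ij^2 / d_ij^4 divided by (sum_{i<j} 1/d_ij^2)^2.
    Delta, D: symmetric n x n distance matrices (input and output spaces)."""
    iu = np.triu_indices_from(D, k=1)         # each pair i < j once
    delta2, d2 = Delta[iu] ** 2, D[iu] ** 2
    return np.sum(delta2 / d2 ** 2) / np.sum(1.0 / d2) ** 2

def dist(Z):
    return np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)

rng = np.random.default_rng(0)
Y = rng.normal(size=(10, 3))    # illustrative input configuration
X = rng.normal(size=(10, 2))    # illustrative output configuration
k = continuity_index(dist(Y), dist(X))
```

Rescaling X leaves k unchanged, while rescaling Y by a factor c multiplies it by c².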
• Several modifications are needed for the minimization procedure:
– $d_{ij}^2 + Ce^2$ is substituted for $d_{ij}^2$, where C is a constant equal to $2/(n-1)$ and e takes values between 0 and 1
– In practice, e helps accelerate the numerical process
– e can be thought of as an extra “specific” dimension; as e gets closer to 0, the points are made to approach the “common” part of the space
– Normalizing constants are included: z in the numerator and $[2/(n(n-1))]^2$ in the denominator
• Final form of the function:
$$\kappa = \frac{z \sum_{i<j} \delta_{ij}^2 / d_{ij}^4}{\left[\frac{2}{n(n-1)} \sum_{i<j} \frac{1}{d_{ij}^2}\right]^2}$$
where z is a normalizing constant computed from n and the input distances $\delta_{ij}$
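A sketch of the modified objective with the substitution d_ij² + Ce², C = 2/(n − 1). The exact expression for z is not recoverable from this copy of the slides, so z is a caller-supplied placeholder here; because it depends only on the input distances, it rescales κ without moving its minimizer. With e > 0 the index stays finite even when output points coincide.

```python
import numpy as np

def paramap_kappa(Delta, X, e=0.5, z=1.0):
    """Modified continuity index: d_ij^2 -> d_ij^2 + C e^2 with C = 2/(n-1);
    the denominator averages 1/(d_ij^2 + C e^2) over the n(n-1)/2 pairs.
    z is a placeholder for the normalizing constant (independent of X)."""
    n = X.shape[0]
    C = 2.0 / (n - 1)
    iu = np.triu_indices(n, k=1)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    s = sq[iu] + C * e ** 2                   # regularized squared distances
    num = z * np.sum(Delta[iu] ** 2 / s ** 2)
    den = (2.0 / (n * (n - 1)) * np.sum(1.0 / s)) ** 2
    return num / den

rng = np.random.default_rng(0)
Y = rng.normal(size=(8, 3))
Delta = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
X_collapsed = np.zeros((8, 2))                # all output points coincide
print(paramap_kappa(Delta, X_collapsed, e=0.5))  # finite because e > 0
```

When all output points coincide, every regularized distance equals Ce², and κ reduces to z times the sum of the squared input distances.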
• Implemented in C++ (GNU GCC compiler)
• The program takes as input e, the number of repetitions, the dimensionality R to be recovered, and either the number of random starts or a starting input configuration
• 200 iterations for each of 100 different random configurations yield reasonable solutions
• The best resulting solution can then be fine-tuned by performing more iterations
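A minimal Python sketch of the multi-start strategy described above. It runs steepest descent with a backtracking step size on the modified κ (z taken as 1, since it does not depend on X) from several random starts and keeps the best solution. The analytic gradient, the starting scale of 0.1 and the step rules are assumptions of this sketch, not details of the authors' C++ program.

```python
import numpy as np

def kappa_and_grad(Delta, X, e=0.5):
    """kappa = N / (c Q)^2 with s_ij = d_ij^2 + C e^2, C = 2/(n-1),
    c = 2/(n(n-1)), N = sum_{i<j} delta_ij^2 / s_ij^2, Q = sum_{i<j} 1/s_ij.
    Returns kappa and its gradient with respect to X (z taken as 1)."""
    n = X.shape[0]
    C, c = 2.0 / (n - 1), 2.0 / (n * (n - 1))
    diff = X[:, None, :] - X[None, :, :]
    s = np.sum(diff ** 2, axis=-1) + C * e ** 2
    np.fill_diagonal(s, 1.0)                  # dummy value; diagonal is masked
    off = ~np.eye(n, dtype=bool)
    D2 = Delta ** 2
    N = 0.5 * np.sum(np.where(off, D2 / s ** 2, 0.0))   # each pair once
    Q = 0.5 * np.sum(np.where(off, 1.0 / s, 0.0))
    WN = np.where(off, D2 / s ** 3, 0.0)
    WQ = np.where(off, 1.0 / s ** 2, 0.0)
    # sum_j W_ij (x_i - x_j) equals ((diag(row sums) - W) @ X)_i
    dN = -4.0 * (np.diag(WN.sum(axis=1)) - WN) @ X
    dQ = -2.0 * (np.diag(WQ.sum(axis=1)) - WQ) @ X
    kappa = N / (c * Q) ** 2
    grad = (dN / Q ** 2 - 2.0 * N * dQ / Q ** 3) / c ** 2
    return kappa, grad

def paramap_sketch(Delta, R=2, e=0.5, n_starts=100, n_iter=200, seed=0):
    """Multi-start steepest descent with backtracking; keeps the best start."""
    rng = np.random.default_rng(seed)
    n = Delta.shape[0]
    best_X, best_k = None, np.inf
    for start in range(n_starts):
        X = rng.normal(scale=0.1, size=(n, R))    # random starting configuration
        k, g = kappa_and_grad(Delta, X, e)
        step = 1.0
        for it in range(n_iter):
            while step > 1e-12:                   # halve until kappa decreases
                X_new = X - step * g
                k_new, g_new = kappa_and_grad(Delta, X_new, e)
                if k_new < k:
                    break
                step *= 0.5
            else:
                break                             # no decrease found: stop
            X, k, g = X_new, k_new, g_new
            step *= 2.0                           # let the step grow again
        if k < best_k:
            best_X, best_k = X, k
    return best_X, best_k

theta = 2 * np.pi * np.arange(20) / 20
Y = np.column_stack([np.cos(theta), np.sin(theta)])   # 20 points on a circle
Delta = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
X_best, k_best = paramap_sketch(Delta, R=1, n_starts=5, n_iter=100)
```

The returned best solution could then be passed back in as a starting point for further fine-tuning iterations, mirroring the last bullet above.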
ISOMAP Approach
• Tries to overcome the difficulties of MDS by replacing the Euclidean metric with a new metric
• [Figure from Lee, Lendasse, & Verleysen, 2002]
• To approximate the “geodesic” distances, ISOMAP constructs a neighborhood graph that connects nearby points
– This is done by connecting each point to its k closest neighbors, or to all points within a distance of ε or less
• A shortest-path procedure is then applied to the resulting matrix of modified distances
• Finally, classical metric MDS is applied to obtain the configuration in the lower-dimensional space
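The three ISOMAP steps above can be sketched in NumPy as follows. This is an illustrative implementation, not Tenenbaum et al.'s code: it uses the Floyd-Warshall algorithm for the shortest paths (Dijkstra's algorithm scales better for large n) and Torgerson's double centering for classical metric MDS.

```python
import numpy as np

def isomap_sketch(Y, R=2, k=None, eps=None):
    """Neighborhood graph (k-NN or eps-ball), all-pairs shortest paths,
    then classical metric MDS on the approximate geodesic distances."""
    n = Y.shape[0]
    E = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    G = np.full((n, n), np.inf)               # inf marks "no edge"
    if k is not None:
        nbrs = np.argsort(E, axis=1)[:, 1:k + 1]   # skip the point itself
        for i in range(n):
            G[i, nbrs[i]] = E[i, nbrs[i]]
        G = np.minimum(G, G.T)                # symmetrize the graph
    else:
        mask = E <= eps
        G[mask] = E[mask]
    np.fill_diagonal(G, 0.0)
    for m in range(n):                        # Floyd-Warshall relaxation
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighborhood graph is disconnected")
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (G ** 2) @ J               # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:R]        # R largest eigenvalues
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

theta = np.pi * np.arange(20) / 19
arc = np.column_stack([np.cos(theta), np.sin(theta)])   # half circle
coords = isomap_sketch(arc, R=1, eps=0.25)              # eps-neighborhood graph
```

With ε between the first and second chord lengths, the graph is a simple path, so the recovered one-dimensional coordinates unroll the arc into equally spaced points.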
Other Approaches
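• Nonmetric MDS: minimizes a cost function
• Needed to implement the locally monotone MDS approach of Shepard (Shepard & Carroll, 1966)
$$\text{STRESS}_1 = \sqrt{\frac{\sum_{i<j} (d_{ij} - \hat{d}_{ij})^2}{\sum_{i<j} d_{ij}^2}} \qquad \text{or} \qquad \text{STRESS}_2 = \sqrt{\frac{\sum_{i<j} (d_{ij} - \hat{d}_{ij})^2}{\sum_{i<j} (d_{ij} - \bar{d})^2}}$$
where $\hat{d}_{ij}$ are the disparities obtained by monotone regression and $\bar{d}$ is the mean of the $d_{ij}$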
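Both STRESS formulas need the disparities d̂_ij. A sketch that obtains them by monotone (isotonic) regression with the pool-adjacent-violators algorithm, taking input and output distances as 1-D arrays over pairs i < j. Ties in δ are handled by stable sorting, which is one choice among several. To mimic the locally monotone approach, pairs with large δ could simply be dropped from these arrays before fitting.

```python
import numpy as np

def monotone_fit(delta, d):
    """Pool-adjacent-violators: least-squares d-hat, nondecreasing in delta."""
    order = np.argsort(delta, kind="stable")
    blocks = []                               # each block: [sum, count]
    for v in d[order]:
        blocks.append([v, 1])
        # merge while the previous block mean exceeds the last block mean
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = np.concatenate([np.full(c, s / c) for s, c in blocks])
    dhat = np.empty_like(fitted)
    dhat[order] = fitted                      # back to the original pair order
    return dhat

def stress1(d, dhat):
    return np.sqrt(np.sum((d - dhat) ** 2) / np.sum(d ** 2))

def stress2(d, dhat):
    return np.sqrt(np.sum((d - dhat) ** 2) / np.sum((d - d.mean()) ** 2))

delta = np.array([1.0, 2.0, 3.0, 4.0])   # input dissimilarities (pairs i < j)
d = np.array([1.0, 3.0, 2.0, 4.0])       # output distances for the same pairs
dhat = monotone_fit(delta, d)
print(stress1(d, dhat), stress2(d, dhat))
```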
• Sammon’s mapping: minimizes a mapping error function
• Kruskal (1971) indicated that certain options used with nonmetric MDS programs would give the same results
$$E = \frac{1}{\sum_{i<j} \delta_{ij}} \sum_{i<j} \frac{(\delta_{ij} - d_{ij})^2}{\delta_{ij}}$$
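Sammon's error can be evaluated as below (a sketch; it assumes every δ_ij with i < j is positive). Dividing each squared error by δ_ij gives small input distances more weight, which favors preserving local structure.

```python
import numpy as np

def sammon_error(Delta, D):
    """E = (1 / sum_{i<j} delta_ij) * sum_{i<j} (delta_ij - d_ij)^2 / delta_ij."""
    iu = np.triu_indices_from(Delta, k=1)
    delta, d = Delta[iu], D[iu]
    return np.sum((delta - d) ** 2 / delta) / np.sum(delta)

rng = np.random.default_rng(0)
P = rng.normal(size=(7, 2))
D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
print(sammon_error(D, D), sammon_error(2 * D, D))
```

If every input distance is exactly twice the output distance, each term contributes δ_ij/4, so E = 0.25 regardless of the configuration.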
• Multidimensional scaling by iterative
majorization (Webb, 1995)
• Curvilinear Distance Analysis (CDA) (Lee et al., 2002): an analogue of ISOMAP that omits the MDS step, replacing it with a minimization step
• Self-organizing map (SOM) (Kohonen, 1990, 1995)
• Autoassociative feedforward neural networks (AFN) (Baldi & Hornik, 1989; Kramer, 1991)
Experimental Design and Methods
• Primary focus: 62 points located at the intersections of 5 equally spaced parallels and 12 equally spaced meridians
• Two types of error, A and B:
– A: 0%, 10%, 20%
– B: ±0.00, ±0.01, ±0.05, ±0.10, ±0.20
• These control, respectively, how irregularly the points are spaced and how far they lie inside or outside the sphere
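A sketch of the sphere configuration, assuming parallels at ±60°, ±30° and 0° crossed with 12 meridians 30° apart, plus the two poles where all meridians meet (60 + 2 = 62 points). The slides do not say how the errors were generated, so both perturbations here are hypothetical: type_a jitters latitude and longitude by up to that fraction of the 30° grid step, and type_b rescales each radius by a uniform draw from 1 ± type_b.

```python
import numpy as np

def sphere_62(type_a=0.0, type_b=0.0, seed=0):
    """62 points on the unit sphere with optional (hypothetical) errors."""
    rng = np.random.default_rng(seed)
    lat = np.deg2rad(np.repeat([-60, -30, 0, 30, 60], 12))  # 5 parallels
    lon = np.deg2rad(np.tile(np.arange(0, 360, 30), 5))      # 12 meridians
    lat = np.concatenate([[np.pi / 2], lat, [-np.pi / 2]])   # add the poles
    lon = np.concatenate([[0.0], lon, [0.0]])
    step = np.deg2rad(30.0)
    # hypothetical Type A error: angular jitter (irregular spacing)
    lat = lat + type_a * step * rng.uniform(-1, 1, size=62)
    lon = lon + type_a * step * rng.uniform(-1, 1, size=62)
    lat = np.clip(lat, -np.pi / 2, np.pi / 2)
    # hypothetical Type B error: radial jitter (inside or outside the sphere)
    r = 1.0 + rng.uniform(-type_b, type_b, size=62)
    return np.column_stack([r * np.cos(lat) * np.cos(lon),
                            r * np.cos(lat) * np.sin(lon),
                            r * np.sin(lat)])

P = sphere_62()                                  # error-free landmark sphere
P_err = sphere_62(type_a=0.1, type_b=0.05, seed=1)
```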
• To evaluate mapping performance, we calculate the “rate of agreement in local structure,” abbreviated “agreement rate” or A
– Similar to the Rand index used to compare partitions (Rand, 1971; Hubert & Arabie, 1985)
– Let $a_i$ stand for the number of points that are in the k-nearest-neighbor list for point i in both X and Y. Then
$$A = \frac{\sum_{i=1}^{n} a_i}{kn}$$
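Example of calculating agreement rate (n = 5, k = 2)
Point | 2 nearest neighbors, configuration 1 | 2 nearest neighbors, configuration 2 | a_i
1 | 2, 3 | 4, 5 | 0
2 | 1, 4 | 3, 4 | 1
3 | 1, 4 | 2, 4 | 1
4 | 2, 3 | 1, 5 | 0
5 | 1, 2 | 3, 4 | 0
k = 2, Agreement rate = 2/10 or 20%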
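A sketch of the agreement-rate computation. knn_lists uses a stable sort, so distance ties (common in regularly spaced configurations) are broken by point index, which is an assumption. The usage lines apply agreement_rate to the example's neighbor lists, read with points numbered 1 to 5.

```python
import numpy as np

def knn_lists(Z, k):
    """Indices of the k nearest neighbors of each row of Z (excluding itself)."""
    E = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    np.fill_diagonal(E, np.inf)               # a point is not its own neighbor
    return np.argsort(E, axis=1, kind="stable")[:, :k]

def agreement_rate(nbrs_X, nbrs_Y):
    """A = sum_i a_i / (k n), with a_i = |kNN_X(i) intersect kNN_Y(i)|."""
    n, k = np.shape(nbrs_X)
    a = [len(set(nbrs_X[i]) & set(nbrs_Y[i])) for i in range(n)]
    return sum(a) / (k * n)

# Neighbor lists of the worked example (points 1-5, k = 2)
first = [[2, 3], [1, 4], [1, 4], [2, 3], [1, 2]]
second = [[4, 5], [3, 4], [2, 4], [1, 5], [3, 4]]
print(agreement_rate(first, second))   # 0.2
```

For a recovered configuration X and input Y, the call would be agreement_rate(knn_lists(X, k), knn_lists(Y, k)).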
• Problem of similarity transformations: we use standard software to rotate the different solutions into optimal congruence with a landmark solution (Rohlf & Slice, 1989)
• We use the solution for the error-free, regularly spaced sphere as the landmark
• We also report VAF:
$$\text{VAF} = 1 - \frac{\sum_{r=1}^{R} \sum_{i=1}^{n} (x_{ir} - \hat{x}_{ir})^2}{\sum_{r=1}^{R} \sum_{i=1}^{n} x_{ir}^2}$$
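A sketch of the similarity-transformation step and VAF. It uses ordinary Procrustes analysis (translation, orthogonal rotation or reflection, and uniform scaling via an SVD) rather than the Rohlf and Slice software cited above, and it assumes the landmark configuration is centered, so that the VAF denominator is its total sum of squares.

```python
import numpy as np

def similarity_procrustes(X, landmark):
    """Translate, rotate/reflect and uniformly scale X to best match the
    landmark in the least-squares sense (ordinary Procrustes analysis)."""
    mu_X, mu_L = X.mean(axis=0), landmark.mean(axis=0)
    Xc, Lc = X - mu_X, landmark - mu_L
    U, S, Vt = np.linalg.svd(Xc.T @ Lc)
    Q = U @ Vt                                # optimal orthogonal matrix
    scale = S.sum() / np.sum(Xc ** 2)         # optimal uniform scale
    return scale * Xc @ Q + mu_L

def vaf(landmark, X_hat):
    """VAF = 1 - sum (x_ir - xhat_ir)^2 / sum x_ir^2."""
    return 1.0 - np.sum((landmark - X_hat) ** 2) / np.sum(landmark ** 2)

rng = np.random.default_rng(0)
L = rng.normal(size=(62, 3))
L -= L.mean(axis=0)                           # centered landmark
Qr, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
X = 2.5 * L @ Qr + np.array([1.0, -2.0, 0.5])  # rotated, scaled, shifted copy
fitted = similarity_procrustes(X, L)
print(vaf(L, fitted))
```

A solution that differs from the landmark only by a similarity transformation gets VAF = 1; relabeled or locally scrambled points lower it even after the optimal fit.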
• The VAF results may not be very good
• The similarity transformation step alone is not enough
• An alternating algorithm is needed that reorders the points on each of the five parallels and then finds the optimal similarity transformation
• We also provide Shepard-like diagrams
Why is a similarity transformation not enough?
[Figure: (a) 0% Type A Error / 0.00 Type B Error; (b) 0% Type A Error / 0.01 Type B Error, VAF = 0.47]
Results
• Agreement rate for the regularly spaced, error-free sphere: 82.9% (k = 5)
• Over 1000 randomizations of the solution, the average and standard deviation of the agreement rate are 8.1% and 1.9%, respectively; the minimum and maximum are 3.5% and 16.7%
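The randomization baseline can be sketched as below, assuming that "randomization of the solution" means randomly relabeling the recovered points. Under that assumption each output neighbor of point i is a uniformly random choice among the other n − 1 points, so the expected agreement rate is k/(n − 1), which for n = 62 and k = 5 is 5/61 ≈ 8.2%, in line with the reported average of 8.1%.

```python
import numpy as np

def knn_sets(Z, k):
    """k-nearest-neighbor index sets for each row of Z."""
    E = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    np.fill_diagonal(E, np.inf)
    return [set(row) for row in np.argsort(E, axis=1, kind="stable")[:, :k]]

def permutation_null(Y, X, k=5, reps=1000, seed=0):
    """Agreement rates between Y and randomly relabeled copies of X."""
    rng = np.random.default_rng(seed)
    nY = knn_sets(Y, k)
    n = len(nY)
    rates = np.empty(reps)
    for t in range(reps):
        nX = knn_sets(X[rng.permutation(n)], k)   # random relabeling of X
        rates[t] = sum(len(nY[i] & nX[i]) for i in range(n)) / (k * n)
    return rates

rng = np.random.default_rng(1)
Y = rng.normal(size=(62, 3))       # stand-in input configuration
X = rng.normal(size=(62, 2))       # stand-in recovered solution
rates = permutation_null(Y, X, k=5, reps=300)
print(rates.mean(), rates.std(), rates.min(), rates.max())
```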
• We can use Chebyshev’s inequality, stated as:
$$P(|Z - \mu| \ge k\sigma) \le \frac{1}{k^2}$$
• 82.9% is about 40 standard deviations above the mean, so an upper bound on the probability that this happens by chance is 1/40² = 0.000625, which is very low!
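The arithmetic can be checked in a few lines of Python. With the reported figures, k = (82.9 − 8.1)/1.9 ≈ 39.4, so the Chebyshev bound with the unrounded k is 1/39.4² ≈ 0.00065, slightly above the rounded 1/40² = 0.000625 but still very small.

```python
mean, sd, observed = 8.1, 1.9, 82.9      # agreement rates in %
k = (observed - mean) / sd               # distance from the mean in SDs
bound_exact = 1 / k ** 2                 # Chebyshev bound with unrounded k
bound_rounded = 1 / 40 ** 2              # bound with k rounded to 40
print(round(k, 1), bound_exact, bound_rounded)
```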
[Figure: solutions for the sphere, panels (a) to (o), crossing 0%, 10% and 20% Type A Error with 0.00, 0.01, 0.05, 0.10 and 0.20 Type B Error, each annotated with its VAF; for example (c) 0% Type A / 0.05 Type B Error: VAF = 0.42; (d) 0% Type A / 0.10 Type B Error: VAF = 0.63; (g) 10% Type A / 0.01 Type B Error: VAF = 0.33; (k) 20% Type A / 0.00 Type B Error: VAF = 0.37; (l) 20% Type A / 0.01 Type B Error: VAF = 0.42; (o) 20% Type A / 0.20 Type B Error: VAF = 0.36]
[Figure: ISOMAP and PARAMAP solutions for the sphere with 0% Type A Error / 0.00 Type B Error; agreement rate A = 48.1% for ISOMAP and A = 82.9% for PARAMAP]
Shepard-like Diagrams
[Figure: panels (a), (b) and (c) plotting original distances against recovered distances]
Swiss Roll Data (130 points)
Agreement rate: ISOMAP 59.7%, PARAMAP 70.5%
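Swiss-roll data of this kind can be generated as below. The parameter ranges (angle from 1.5π to 4.5π, height from 0 to 10) and the uniform random sampling are assumptions; the slides give only the number of points. The returned intrinsic coordinates (t, h) could serve as the reference configuration when computing agreement rates.

```python
import numpy as np

def swiss_roll(n=130, seed=0):
    """Hypothetical Swiss-roll generator: angle t in [1.5*pi, 4.5*pi],
    height h in [0, 10]. Returns 3-D points and intrinsic 2-D coordinates."""
    rng = np.random.default_rng(seed)
    t = 1.5 * np.pi * (1 + 2 * rng.uniform(size=n))   # position along the roll
    h = 10.0 * rng.uniform(size=n)                    # position across the roll
    Y = np.column_stack([t * np.cos(t), h, t * np.sin(t)])
    return Y, np.column_stack([t, h])

Y, T = swiss_roll(130)
```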
Discussion and Future Direction
• Disadvantage of PARAMAP: long run time
• Advantage of ISOMAP: noniterative procedure that can be applied to very large data sets with ease
• Disadvantage of ISOMAP: poor performance on closed data sets such as the sphere
• Improvements in the computational efficiency of PARAMAP should be explored:
– Use of a conjugate gradient algorithm instead of a straight gradient algorithm
– Use of a conjugate gradient algorithm with restarts
– Possible combination of straight gradient and conjugate gradient approaches
• Improvements that could benefit both ISOMAP and PARAMAP:
– Careful selection of landmarks, with an interpolation or extrapolation scheme to recover the rest of the data
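A generic sketch of the conjugate gradient with restarts idea: Polak-Ribière (PR+) directions, a restart to steepest descent every n iterations or whenever the direction is not a descent direction, and a backtracking (Armijo) line search. It is shown on a convex quadratic test function; applying it to PARAMAP would require κ and its gradient, and none of this is the authors' implementation.

```python
import numpy as np

def cg_restarts(f, grad, x0, restart_every=None, max_iter=1000, tol=1e-8):
    """Nonlinear conjugate gradient (PR+) with periodic restarts."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    restart_every = restart_every or x.size
    for it in range(1, max_iter + 1):
        if np.linalg.norm(g) < tol:
            break
        if g @ d >= 0:                        # not a descent direction
            d = -g
        step, fx, slope = 1.0, f(x), g @ d
        while step > 1e-16 and f(x + step * d) > fx + 1e-4 * step * slope:
            step *= 0.5                       # Armijo backtracking
        x_new = x + step * d
        g_new = grad(x_new)
        if it % restart_every == 0:
            d = -g_new                        # periodic restart
        else:
            beta = max(0.0, g_new @ (g_new - g) / (g @ g))   # PR+ coefficient
            d = -g_new + beta * d
        x, g = x_new, g_new
    return x

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
A = M @ M.T + 5 * np.eye(5)                   # symmetric positive definite
b = rng.normal(size=5)
x = cg_restarts(lambda v: 0.5 * v @ A @ v - b @ v, lambda v: A @ v - b, np.zeros(5))
```

The minimizer of the quadratic solves A x = b, which gives an easy correctness check for the sketch.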