Introduction to Neural Networks
John Paxton
Montana State University
Summer 2003

Chapter 3: Pattern Association
• Aristotle observed that human memory associates
  – similar items
  – contrary items
  – items close in proximity
  – items close in succession (a song)

Terminology and Issues
• Autoassociative Networks
• Heteroassociative Networks
• Feedforward Networks
• Recurrent Networks
• How many patterns can be stored?

Hebb Rule for Pattern Association
• Architecture: a single-layer feedforward net with inputs x1 ... xn, outputs y1 ... ym, and weights w11 ... wnm [diagram omitted]

Algorithm
1. set wij = 0 (1 <= i <= n, 1 <= j <= m)
2. for each training pair s:t
3.   xi = si
4.   yj = tj
5.   wij(new) = wij(old) + xi*yj

Example
• s1 = (1 -1 -1), s2 = (-1 1 1)
• t1 = (1 -1), t2 = (-1 1)
• w11 = 1*1 + (-1)(-1) = 2
• w12 = 1*(-1) + (-1)(1) = -2
• w21 = (-1)(1) + 1(-1) = -2
• w22 = (-1)(-1) + 1(1) = 2
• w31 = (-1)(1) + 1(-1) = -2
• w32 = (-1)(-1) + 1*1 = 2

Matrix Alternative
• s1 = (1 -1 -1), s2 = (-1 1 1)
• t1 = (1 -1), t2 = (-1 1)

  [ 1 -1 ]   [ 1 -1 ]   [ 2 -2 ]
  [-1  1 ] x [-1  1 ] = [-2  2 ]
  [-1  1 ]              [-2  2 ]

Final Network
• f(yin) = 1 if yin > 0, 0 if yin = 0, else -1
• [diagram omitted: x1, x2, x3 connected to y1, y2 with weights w11 = 2, w12 = -2, w21 = -2, w22 = 2, w31 = -2, w32 = 2]

Properties
• Weights exist if input vectors are linearly independent
• Orthogonal vectors can be learned perfectly
• High weights imply strong correlations

Exercises
• What happens if (-1 -1 -1) is tested? This vector has one mistake.
• What happens if (0 -1 -1) is tested? This vector has one piece of missing data.
• Show an example of training data that is not learnable. Show the learned network.

Delta Rule for Pattern Association
• Works when patterns are linearly independent but not orthogonal
• Introduced in the 1960s for ADALINE
• Produces a least squares solution

Activation Functions
• Delta Rule (identity activation, so the derivative term is 1):
  wij(new) = wij(old) + a(tj - yj)*xi*1
• Extended Delta Rule (general differentiable activation with derivative f'(yin.j)):
  wij(new) = wij(old) + a(tj - yj)*xi*f'(yin.j)

Heteroassociative Memory Net
• Application: associate characters, e.g. A <-> a, B <-> b

Autoassociative Net
• Architecture: n inputs x1 ... xn connected to n outputs y1 ... yn with weights w11 ... wnn [diagram omitted]

Training Algorithm
• Assuming that the training vectors are orthogonal, we can use the Hebb rule algorithm mentioned earlier.
• Application: find out whether an input vector is familiar or unfamiliar. For example, voice input as part of a security system.

Autoassociative Example

  [ 1 ]                [ 1 1 1 ]                      [ 0 1 1 ]
  [ 1 ] x [ 1 1 1 ]  = [ 1 1 1 ]  (zero diagonal) ->  [ 1 0 1 ]
  [ 1 ]                [ 1 1 1 ]                      [ 1 1 0 ]

Evaluation
• What happens if (1 1 1) is presented?
• What happens if (0 1 1) is presented?
• What happens if (0 0 1) is presented?
• What happens if (-1 1 1) is presented?
• What happens if (-1 -1 1) is presented?
• Why are the diagonals set to 0?

Storage Capacity
• 2 vectors: (1 1 1), (-1 -1 -1)
• Recall is perfect

  [ 1 -1 ]   [  1  1  1 ]                      [ 0 2 2 ]
  [ 1 -1 ] x [ -1 -1 -1 ]  (zero diagonal) ->  [ 2 0 2 ]
  [ 1 -1 ]                                     [ 2 2 0 ]

Storage Capacity
• 3 vectors: (1 1 1), (-1 -1 -1), (1 -1 1)
• Recall is no longer perfect

  [ 1 -1  1 ]   [  1  1  1 ]                      [ 0 1 3 ]
  [ 1 -1 -1 ] x [ -1 -1 -1 ]  (zero diagonal) ->  [ 1 0 1 ]
  [ 1 -1  1 ]   [  1 -1  1 ]                      [ 3 1 0 ]

Theorem
• Up to n-1 bipolar vectors of n dimensions can be stored in an autoassociative net.

Iterative Autoassociative Net
• 1 vector: s = (1 1 -1)
• s^T * s (with the diagonal zeroed):

  [  0  1 -1 ]
  [  1  0 -1 ]
  [ -1 -1  0 ]

• (1 0 0) -> (0 1 -1)
• (0 1 -1) -> (2 1 -1) -> (1 1 -1)
• (1 1 -1) -> (2 2 -2) -> (1 1 -1)
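As a worked illustration of the Hebb-rule training and the iterative recall on the preceding slides, here is a minimal NumPy sketch (my own code, not part of the original slides; the names hebb_train and f are made up for this example). It reproduces the heteroassociative weight matrix and the iterative autoassociative recall of s = (1 1 -1).

```python
import numpy as np

def hebb_train(S, T):
    """Hebb rule in matrix form: W = S^T T, the sum of outer products s_p^T t_p."""
    return S.T @ T

def f(y_in):
    """Step activation from the slides: 1 if positive, 0 if zero, -1 if negative."""
    return np.sign(y_in).astype(int)

# Heteroassociative example: s1 = (1 -1 -1) -> t1 = (1 -1), s2 = (-1 1 1) -> t2 = (-1 1)
S = np.array([[ 1, -1, -1],
              [-1,  1,  1]])
T = np.array([[ 1, -1],
              [-1,  1]])
W = hebb_train(S, T)
print(W)           # [[ 2 -2] [-2  2] [-2  2]], matching the slide
print(f(S @ W))    # each stored s recalls its t exactly

# Iterative autoassociative recall of s = (1 1 -1), diagonal zeroed
s = np.array([1, 1, -1])
Wa = np.outer(s, s)
np.fill_diagonal(Wa, 0)

x = np.array([1, 0, 0])    # two components missing
for _ in range(3):
    x = f(x @ Wa)          # (1 0 0) -> (0 1 -1) -> (1 1 -1) -> (1 1 -1)
print(x)
```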
Testing Procedure
1. initialize weights using Hebb learning
2. for each test vector do
3.   set xi = si
4.   calculate ti
5.   set si = ti
6.   go to step 4 if the s vector is new

Exercises
• 1 piece of missing data: (0 1 -1)
• 2 pieces of missing data: (0 0 -1)
• 3 pieces of missing data: (0 0 0)
• 1 mistake: (-1 1 -1)
• 2 mistakes: (-1 -1 -1)

Discrete Hopfield Net
• content addressable problems
• pattern association problems
• constrained optimization problems
• wij = wji
• wii = 0

Characteristics
• Only 1 unit updates its activation at a time
• Each unit continues to receive the external signal
• An energy (Lyapunov) function can be found that allows the net to converge, unlike the previous system
• Autoassociative

Architecture
• [diagram omitted: units y1, y2, y3, each pair connected, with external inputs x1, x2, x3]

Algorithm
1. initialize weights using Hebb rule
2. for each input vector do
3.   yi = xi
4.   do steps 5-6 randomly for each yi
5.     yin.i = xi + Σj yj*wji
6.     calculate f(yin.i)
7.   go to step 2 if the net hasn't converged

Example
• training vector: (1 -1)
• [diagram omitted: units y1 and y2 connected by weight -1, with external inputs x1 and x2]

Example
• input (0 -1)
  update y1 = 0 + (-1)(-1) = 1
  update y2 = -1 + 1(-1) = -2 -> -1
• input (1 -1)
  update y2 = -1 + 1(-1) = -2 -> -1
  update y1 = 1 + (-1)(-1) = 2 -> 1

Hopfield Theorems
• Convergence is guaranteed.
• The number of storable patterns is approximately n / (2 log2 n), where n is the dimension of a vector.

Bidirectional Associative Memory (BAM)
• Heteroassociative Recurrent Net
• Kosko, 1988
• Architecture: an x layer (x1 ... xn) fully connected, in both directions, to a y layer (y1 ... ym) [diagram omitted]

Activation Function
• f(yin) = 1, if yin > 0
• f(yin) = 0, if yin = 0
• f(yin) = -1 otherwise

Algorithm
1. initialize weights using Hebb rule
2. for each test vector do
3.   present s to the x layer
4.   present t to the y layer
5.   while equilibrium is not reached
6.     compute f(yin.j)
7.     compute f(xin.i)

Example
• s1 = (1 1), t1 = (1 -1)
• s2 = (-1 -1), t2 = (-1 1)

  [ 1 -1 ]   [  1 -1 ]   [ 2 -2 ]
  [ 1 -1 ] x [ -1  1 ] = [ 2 -2 ]

Example
• Architecture: [diagram omitted: w11 = 2, w12 = -2, w21 = 2, w22 = -2]
• present (1 1) to the x layer -> (1 -1)
• present (1 -1) to the y layer -> (1 1)

Hamming Distance
• Definition: the number of corresponding bits that differ between two vectors
• For example, H[(1 -1), (1 1)] = 1
• The average Hamming distance here is ½ (1 differing bit out of 2 components).

About BAMs
• Observation: encoding is better when the average Hamming distance of the inputs is similar to the average Hamming distance of the outputs.
• The memory capacity of a BAM is min(n-1, m-1).
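To make the BAM recall loop on the slides above concrete, here is a small NumPy sketch (my own illustration, not from the original slides; the helper name bam_recall is made up). It builds the weight matrix for s1 = (1 1) -> t1 = (1 -1) and s2 = (-1 -1) -> t2 = (-1 1), then bounces activations between the two layers until neither changes.

```python
import numpy as np

def f(v):
    """BAM step activation: 1 if positive, 0 if zero, -1 if negative."""
    return np.sign(v).astype(int)

# Training pairs from the BAM example slides
S = np.array([[ 1,  1],
              [-1, -1]])
T = np.array([[ 1, -1],
              [-1,  1]])
W = S.T @ T        # Hebb rule gives [[2, -2], [2, -2]]

def bam_recall(x, y, max_steps=10):
    """Alternate x -> y and y -> x updates until neither layer changes."""
    for _ in range(max_steps):
        y_new = f(x @ W)          # x layer drives the y layer
        x_new = f(y_new @ W.T)    # y layer drives the x layer back
        if np.array_equal(x_new, x) and np.array_equal(y_new, y):
            break
        x, y = x_new, y_new
    return x, y

# Present (1 1) to the x layer: the y layer settles to (1 -1), as on the slide
x, y = bam_recall(np.array([1, 1]), np.zeros(2, dtype=int))
print(x, y)        # [1 1] [ 1 -1]
```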