CS 182 Sections 101 - 102
Eva Mok (emok@icsi.berkeley.edu)
Feb 11, 2004
bad puns alert! (http://www2.hi.net/s4/strangebreed.htm)

Announcements
• a3 part 1 is due tonight (submit as a3-1).
• The second tester file is up, so please start part 2.
• The quiz is graded (get it after class).

Where we stand
• Last Week
  – Backprop
• This Week
  – Recruitment learning
  – Color
• Coming up
  – Imaging techniques (e.g. fMRI)

The Big (and complicated) Picture
[Course-overview diagram: levels of abstraction running from Biology up through Computational Neurobiology (visual system, neural development), Structured Connectionism (triangle nodes, neural nets, SHRUTI), Computation (the Regier, Bailey, Narayanan, and Chang models), to Cognition and Language (spatial relations, motor control, metaphor, grammar, psycholinguistics experiments), with the quizzes, midterm, and finals marked along the way.]

Quiz
1. What is a localist representation? What is a distributed representation? Why are they both bad?
2. What is coarse-fine encoding? Where is it used in our brain?
3. What can Back-Propagation do that Hebb's Rule can't?
4. Derive the Back-Propagation Algorithm.
5. What (intuitively) does the learning rate do? How about the momentum term?

Distributed vs Localist Rep'n

           distributed   localist
  John     1 1 0 0       1 0 0 0
  Paul     0 1 1 0       0 1 0 0
  George   0 0 1 1       0 0 1 0
  Ringo    1 0 0 1       0 0 0 1

• What are the drawbacks of each representation?
• What happens if you want to represent a group?
• What happens if one neuron dies?
• How many persons can you represent with n bits?
  – distributed: 2^n
  – localist: n

Visual System
• 1000 x 1000 visual map
• For each location, encode:
  – orientation
  – direction of motion
  – speed
  – size
  – color
  – depth
• Blows up combinatorially!

Coarse Coding
• The info you can encode with one fine-resolution unit = the info you can encode with a few coarse-resolution units.
• As long as we need fewer coarse units in total, we're good.

Coarse-Fine Coding
[Diagram: two features, Feature 1 (e.g. orientation) and Feature 2 (e.g. direction of motion). One population of units is coarse in Feature 2 and fine in Feature 1; the other is coarse in Feature 1 and fine in Feature 2. With multiple objects we can run into ghost "images".]

Back-Propagation Algorithm
• Net input and activation of node i:
  xi = ∑j wij yj
  yi = f(xi)
• Sigmoid:
  yi = f(xi) = 1 / (1 + e^-xi)
• ti is the target output for node i; we define the error term for a single node to be ti – yi.

Gradient Descent
[Diagram: error surface over two weights, with the global minimum marked as your goal. It should really be 4-D (3 weights), but you get the idea.]

The output layer
• E = Error = ½ ∑i (ti – yi)2
• Weight update (η is the learning rate):
  Wij ← Wij + ΔWij,  where ΔWij = –η ∂E/∂Wij
• Chain rule:
  ∂E/∂Wij = (∂E/∂yi)(∂yi/∂xi)(∂xi/∂Wij) = –(ti – yi) f'(xi) yj
• The derivative of the sigmoid is just f'(xi) = yi (1 – yi), so
  ΔWij = η (ti – yi) yi (1 – yi) yj = η yj δi,  where δi = (ti – yi) yi (1 – yi)

The hidden layer
• E = Error = ½ ∑i (ti – yi)2
• Weight update:
  ΔWjk = –η ∂E/∂Wjk
• Chain rule:
  ∂E/∂Wjk = (∂E/∂yj)(∂yj/∂xj)(∂xj/∂Wjk)
  ∂E/∂yj = ∑i (∂E/∂yi)(∂yi/∂xi)(∂xi/∂yj) = –∑i (ti – yi) f'(xi) Wij
• Putting it together:
  ∂E/∂Wjk = –[∑i (ti – yi) yi (1 – yi) Wij] yj (1 – yj) yk
  ΔWjk = η [∑i (ti – yi) yi (1 – yi) Wij] yj (1 – yj) yk = η yk δj,
  where δj = yj (1 – yj) ∑i Wij δi

Let's just do an example
• One output node with inputs i1 = 0, i2 = 0 and bias b = 1; weights w01 = 0.8, w02 = 0.6, w0b = 0.5.
• Target function:

  i1 i2 | y0
   0  0 |  0
   0  1 |  1
   1  0 |  1
   1  1 |  1

• Forward pass:
  x0 = w01 i1 + w02 i2 + w0b b = 0.5
  y0 = f(x0) = 1 / (1 + e^-0.5) = 0.6224
• Error (target t0 = 0 for input (0, 0)):
  E = ½ (t0 – y0)2 = ½ (0 – 0.6224)2 = 0.1937
• Error term and weight changes, using ΔWij = η yj δi with δi = (ti – yi) yi (1 – yi):
  δ0 = (t0 – y0) y0 (1 – y0) = (0 – 0.6224)(0.6224)(1 – 0.6224) = –0.1463
  ΔW01 = η i1 δ0 = 0
  ΔW02 = η i2 δ0 = 0
  ΔW0b = η b δ0 = η (–0.1463)
• Suppose η = 0.5:
  ΔW0b = 0.5 × (–0.1463) = –0.0731, so w0b goes from 0.5 to 0.4268.
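
As a sanity check on the arithmetic in the worked example above, here is a minimal Python sketch of the single-node forward pass and delta-rule update. The variable names (w01, w02, w0b, eta) follow the slide; the script itself is not part of the original notes.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# inputs, weights, and target from the worked example (input (0, 0), target 0)
i1, i2, b = 0.0, 0.0, 1.0
w01, w02, w0b = 0.8, 0.6, 0.5
t0 = 0.0
eta = 0.5                            # learning rate

# forward pass
x0 = w01 * i1 + w02 * i2 + w0b * b   # 0.5
y0 = sigmoid(x0)                     # ≈ 0.6224
E = 0.5 * (t0 - y0) ** 2             # ≈ 0.1937

# output-node error term and weight changes (delta rule)
delta0 = (t0 - y0) * y0 * (1 - y0)   # ≈ -0.1463
dw01 = eta * i1 * delta0             # 0, since i1 = 0
dw02 = eta * i2 * delta0             # 0, since i2 = 0
dw0b = eta * b * delta0              # ≈ -0.0731
w0b += dw0b                          # ≈ 0.4269 (0.4268 on the slide, from rounding)

print(round(y0, 4), round(E, 4), round(delta0, 4), round(w0b, 4))
```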
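
The hidden-layer rule from the derivation above can be checked the same way. This is a sketch with made-up weights on a tiny 2-input, 2-hidden, 1-output net, not assignment code; it only illustrates δj = yj (1 – yj) ∑i Wij δi and the update W += η · (pre-synaptic activation) · δ.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

eta = 0.5
yk = [1.0, 0.0]                      # input activations (k indexes inputs)
Wjk = [[0.3, -0.2], [0.7, 0.1]]      # hidden weights, Wjk[j][k] (made-up values)
Wij = [[0.5, -0.4]]                  # output weights, Wij[i][j] (made-up values)
t = [1.0]                            # target for the single output node

# forward pass
yj = [sigmoid(sum(Wjk[j][k] * yk[k] for k in range(2))) for j in range(2)]
yi = [sigmoid(sum(Wij[i][j] * yj[j] for j in range(2))) for i in range(1)]

# output-layer error terms: delta_i = (t_i - y_i) y_i (1 - y_i)
di = [(t[i] - yi[i]) * yi[i] * (1 - yi[i]) for i in range(1)]

# hidden-layer error terms: delta_j = y_j (1 - y_j) * sum_i W_ij delta_i
dj = [yj[j] * (1 - yj[j]) * sum(Wij[i][j] * di[i] for i in range(1))
      for j in range(2)]

# weight updates: W += eta * (pre-synaptic activation) * (post-synaptic delta)
for i in range(1):
    for j in range(2):
        Wij[i][j] += eta * yj[j] * di[i]
for j in range(2):
    for k in range(2):
        Wjk[j][k] += eta * yk[k] * dj[j]
```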
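
For the Visual System / Coarse Coding slides earlier, here is a back-of-the-envelope count of why conjunctive (fine) coding blows up while per-feature (coarse) coding does not. The resolutions per feature are made-up numbers, just to show the product-versus-sum scaling; the slide itself does not give them.

```python
import math

locations = 1000 * 1000
values_per_feature = {"orientation": 8, "direction": 8, "speed": 4,
                      "size": 4, "color": 8, "depth": 4}   # made-up resolutions

# fine: one unit per full conjunction of feature values at each location
fine = locations * math.prod(values_per_feature.values())

# coarse: one small pool of units per feature at each location
coarse = locations * sum(values_per_feature.values())

print(f"fine (conjunctive) units:   {fine:,}")
print(f"coarse (per-feature) units: {coarse:,}")
```

The product grows multiplicatively with each added feature, the sum only additively, which is the sense in which "as long as we need fewer coarse units in total, we're good."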