CS 182
Sections 101 - 102
Eva Mok (emok@icsi.berkeley.edu)
Feb 11, 2004
bad puns alert!
(http://www2.hi.net/s4/strangebreed.htm)
Announcements
• a3 part 1 is due tonight (submit as a3-1)
• The second tester file is up, so please start part 2.
• The quiz is graded (get it after class).
Where we stand
• Last Week
– Backprop
• This Week
– Recruitment learning
– Color
• Coming up
– Imaging techniques (e.g. fMRI)
The Big (and complicated) Picture
[Figure: a map of the course along an abstraction axis. Cognition and Language at the top (psycholinguistics experiments, spatial relations, motor control, metaphor, grammar); Computation below it (the Chang, Bailey, and Narayanan models); then Structured Connectionism (the Regier model, SHRUTI, triangle nodes, neural nets & learning); and Computational Neurobiology / Biology at the bottom (visual system, neural development). The quiz, midterm, and finals are marked along the same axis.]
Quiz
1. What is a localist representation? What is a distributed representation? Why are they both bad?
2. What is coarse-fine encoding? Where is it used in our brain?
3. What can Back-Propagation do that Hebb’s Rule can’t?
4. Derive the Back-Propagation Algorithm.
5. What (intuitively) does the learning rate do? How about the momentum term?
Distributed vs Localist Rep’n
Distributed:              Localist:
  John    1 1 0 0           John    1 0 0 0
  Paul    0 1 1 0           Paul    0 1 0 0
  George  0 0 1 1           George  0 0 1 0
  Ringo   1 0 0 1           Ringo   0 0 0 1
What are the drawbacks of each representation?
Distributed vs Localist Rep’n
(same tables as the previous slide)
• What happens if you want to represent a group?
• What happens if one neuron dies?
• How many persons can you represent with n bits? Distributed: 2^n. Localist: n.
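To make the capacity answers concrete, here is a minimal Python sketch (not from the slides; it just re-uses the 4-bit codes from the tables above):

```python
# Minimal sketch: localist (one-hot) vs distributed binary codes, n = 4 units.

localist = {             # one unit per person: n bits -> at most n people
    "John":   [1, 0, 0, 0],
    "Paul":   [0, 1, 0, 0],
    "George": [0, 0, 1, 0],
    "Ringo":  [0, 0, 0, 1],
}

distributed = {           # patterns over units: n bits -> up to 2^n people
    "John":   [1, 1, 0, 0],
    "Paul":   [0, 1, 1, 0],
    "George": [0, 0, 1, 1],
    "Ringo":  [1, 0, 0, 1],
}

n = 4
print("localist capacity:   ", n)       # 4
print("distributed capacity:", 2 ** n)  # 16

# Drawbacks, illustrated:
# - A group is easy in the localist code (just activate several units), but
#   OR-ing distributed patterns can be confused with other individuals.
# - If one unit dies, a localist code loses one person entirely, while a
#   distributed code degrades more gracefully.
group_localist = [max(a, b) for a, b in zip(localist["John"], localist["Paul"])]
print("group {John, Paul}, localist OR:", group_localist)   # [1, 1, 0, 0]
```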
Visual System
• 1000 x 1000 visual map
• For each location, encode:
  – orientation
  – direction of motion
  – speed
  – size
  – color
  – depth
  – …
• Blows up combinatorially!
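A rough back-of-the-envelope count shows the blow-up; the per-feature value counts below are made-up discretizations chosen only for scale, not numbers from the lecture:

```python
# Hypothetical discretization of each feature (assumed values, for scale only).
locations = 1000 * 1000     # 1000 x 1000 visual map
feature_values = {
    "orientation": 18,
    "direction_of_motion": 8,
    "speed": 5,
    "size": 5,
    "color": 10,
    "depth": 10,
}

combinations_per_location = 1
for v in feature_values.values():
    combinations_per_location *= v

# One dedicated ("localist") unit per feature combination per location:
units_needed = locations * combinations_per_location
print(f"{combinations_per_location:,} combinations per location")      # 360,000
print(f"{units_needed:,} units for the whole map")                     # 360,000,000,000

# Versus a small pool of units per feature per location (coarse coding):
units_factored = locations * sum(feature_values.values())
print(f"{units_factored:,} units if features are encoded separately")  # 56,000,000
```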
Coarse Coding
The info you can encode with one fine-resolution unit = the info you can encode with a few coarse-resolution units.
As long as we need fewer coarse units in total, we’re good.
Coarse-Fine Coding
[Figure: a 2-D feature space, Feature 1 (e.g. orientation) against Feature 2 (e.g. direction of motion). One population of units is coarse in F2 but fine in F1; another is coarse in F1 but fine in F2. Combining the two pins down both features, but we can run into ghost “images”.]
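Here is a toy sketch of the same idea (my own example, not course code): one pool of units is fine in Feature 1 but coarse in Feature 2, another is fine in Feature 2 but coarse in Feature 1, and a single stimulus is recovered from the intersection of the active units; with two stimuli the intersections also produce ghost “images”.

```python
# Toy coarse-fine code over a 20 x 20 grid of (feature1, feature2) values.
# Pool A: 20 units, each responding to one F1 value, any F2 (fine in F1, coarse in F2).
# Pool B: 20 units, each responding to one F2 value, any F1 (fine in F2, coarse in F1).
# 40 units total instead of 20 * 20 = 400 fully "fine" units.

def encode(stimuli):
    """Return the active units in each pool for a set of (f1, f2) stimuli."""
    pool_a = {f1 for f1, _ in stimuli}   # fires on the F1 value, ignores F2
    pool_b = {f2 for _, f2 in stimuli}   # fires on the F2 value, ignores F1
    return pool_a, pool_b

def decode(pool_a, pool_b):
    """Read out every (f1, f2) pair consistent with the active units."""
    return {(f1, f2) for f1 in pool_a for f2 in pool_b}

# One stimulus is decoded exactly:
print(decode(*encode({(3, 7)})))                    # {(3, 7)}

# Two stimuli produce the true points plus two ghost "images":
print(sorted(decode(*encode({(3, 7), (12, 15)}))))
# [(3, 7), (3, 15), (12, 7), (12, 15)]  <- (3, 15) and (12, 7) are ghosts
```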
Back-Propagation Algorithm
[Diagram: inputs y_j feed node i through weights w_ij; the weighted sum x_i goes through f to give the output y_i, which is compared against the target t_i.]

  x_i = ∑_j w_ij y_j
  y_i = f(x_i)

Sigmoid:
  y_i = f(x_i) = 1 / (1 + e^(-x_i))

We define the error term for a single node to be t_i – y_i.
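As a quick Python sketch (variable names follow the slide: x_i is the weighted sum into node i and y_i its output):

```python
import math

def sigmoid(x):
    """f(x) = 1 / (1 + e^-x); handy because f'(x) = f(x) * (1 - f(x))."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(weights, inputs):
    """x_i = sum_j w_ij * y_j, then y_i = f(x_i)."""
    x_i = sum(w_ij * y_j for w_ij, y_j in zip(weights, inputs))
    return sigmoid(x_i)

# e.g. the single unit from the worked example later in this section:
print(forward([0.8, 0.6, 0.5], [0, 0, 1]))   # 0.6224...
```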
Gradient Descent
[Figure: the error surface over two of the weights; the global minimum is your goal. Strictly it should be 4-D here (3 weights), but you get the idea.]
The output layer
[Diagram: layer k feeds hidden layer j through weights W_jk; layer j feeds output layer i through weights W_ij; each output y_i has a target t_i. η is the learning rate.]

E = Error = ½ ∑_i (t_i – y_i)²

  W_ij ← W_ij + ΔW_ij,   ΔW_ij = -η ∂E/∂W_ij

  ∂E/∂W_ij = (∂E/∂y_i) (∂y_i/∂x_i) (∂x_i/∂W_ij) = -(t_i – y_i) · f′(x_i) · y_j

The derivative of the sigmoid is just f′(x_i) = y_i (1 – y_i), so

  ΔW_ij = η · (t_i – y_i) · y_i (1 – y_i) · y_j
  ΔW_ij = η · y_j · δ_i,   where δ_i = (t_i – y_i) · y_i (1 – y_i)
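The same update, written out as a small Python sketch (`eta` is the learning rate η; `y_hidden` holds the outputs y_j feeding output node i):

```python
def output_layer_update(w_i, y_hidden, t_i, y_i, eta):
    """One gradient step on the weights W_ij into output node i.

    delta_i = (t_i - y_i) * y_i * (1 - y_i)
    W_ij    = W_ij + eta * y_j * delta_i
    Returns the updated weights and delta_i (needed by the hidden layer).
    """
    delta_i = (t_i - y_i) * y_i * (1.0 - y_i)
    new_w_i = [w_ij + eta * y_j * delta_i for w_ij, y_j in zip(w_i, y_hidden)]
    return new_w_i, delta_i
```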
The hidden layer
[Diagram: same network; now we update the weights W_jk into hidden node j.]

E = Error = ½ ∑_i (t_i – y_i)²

  ΔW_jk = -η ∂E/∂W_jk

  ∂E/∂W_jk = (∂E/∂y_j) (∂y_j/∂x_j) (∂x_j/∂W_jk)

The error reaching hidden node j sums over every output node i it feeds:

  ∂E/∂y_j = ∑_i (∂E/∂y_i) (∂y_i/∂x_i) (∂x_i/∂y_j) = -∑_i (t_i – y_i) · f′(x_i) · W_ij

So

  ∂E/∂W_jk = -[ ∑_i (t_i – y_i) · f′(x_i) · W_ij ] · f′(x_j) · y_k

  ΔW_jk = η · [ ∑_i (t_i – y_i) · y_i (1 – y_i) · W_ij ] · y_j (1 – y_j) · y_k
  ΔW_jk = η · y_k · δ_j

  δ_j = [ ∑_i (t_i – y_i) · y_i (1 – y_i) · W_ij ] · y_j (1 – y_j) = [ ∑_i W_ij · δ_i ] · y_j (1 – y_j)
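A matching sketch for a hidden node j (it collects the δ_i of every output node i it feeds, weighted by W_ij, exactly as in the δ_j formula above):

```python
def hidden_layer_update(w_j, y_prev, y_j, w_ij_from_j, deltas_i, eta):
    """One gradient step on the weights W_jk into hidden node j.

    w_j:         incoming weights W_jk of hidden node j
    y_prev:      outputs y_k of the layer feeding node j
    y_j:         output of hidden node j
    w_ij_from_j: weights W_ij from node j to each output node i
    deltas_i:    delta_i of each output node i

    delta_j = (sum_i W_ij * delta_i) * y_j * (1 - y_j)
    W_jk    = W_jk + eta * y_k * delta_j
    """
    back_error = sum(w_ij * d_i for w_ij, d_i in zip(w_ij_from_j, deltas_i))
    delta_j = back_error * y_j * (1.0 - y_j)
    new_w_j = [w_jk + eta * y_k * delta_j for w_jk, y_k in zip(w_j, y_prev)]
    return new_w_j, delta_j
```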
Let’s just do an example
[Diagram: a single sigmoid unit with inputs i1 = 0, i2 = 0, and bias b = 1; weights w01 = 0.8, w02 = 0.6, w0b = 0.5.]

  x0 = w01·i1 + w02·i2 + w0b·b = 0.8·0 + 0.6·0 + 0.5·1 = 0.5
  y0 = 1 / (1 + e^(-0.5)) = 0.6224

Training data (target y0 for each input pair, i.e. logical OR):
  i1  i2  y0
  0   0   0
  0   1   1
  1   0   1
  1   1   1

E = Error = ½ ∑_i (t_i – y_i)²
E = ½ (t0 – y0)² = ½ (0 – 0.6224)² = 0.1937

  ΔW_ij = η · y_j · δ_i,   δ_i = (t_i – y_i) · y_i (1 – y_i)

  δ_0 = (t_0 – y_0) · y_0 (1 – y_0) = (0 – 0.6224) · 0.6224 · (1 – 0.6224) = -0.1463

  ΔW_01 = η · y_1 · δ_0 = η · i1 · δ_0 = 0
  ΔW_02 = η · y_2 · δ_0 = η · i2 · δ_0 = 0
  ΔW_0b = η · y_b · δ_0 = η · δ_0

Suppose η (the learning rate) = 0.5:
  ΔW_0b = 0.5 · (-0.1463) = -0.0731, so the updated w0b = 0.5 – 0.0731 ≈ 0.4269
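These numbers are easy to check with a few lines of Python (a sketch that just replays the arithmetic above):

```python
import math

w01, w02, w0b = 0.8, 0.6, 0.5   # initial weights
i1, i2, b = 0, 0, 1             # inputs and bias
t0, eta = 0, 0.5                # target and learning rate

x0 = w01 * i1 + w02 * i2 + w0b * b        # 0.5
y0 = 1.0 / (1.0 + math.exp(-x0))          # 0.6224...
E = 0.5 * (t0 - y0) ** 2                  # 0.1937...
delta0 = (t0 - y0) * y0 * (1.0 - y0)      # -0.1463...

dw01 = eta * i1 * delta0                  # 0 (input i1 was 0)
dw02 = eta * i2 * delta0                  # 0 (input i2 was 0)
dw0b = eta * b * delta0                   # -0.0731...

print(y0, E, delta0, dw0b, w0b + dw0b)    # new w0b ~= 0.4269
```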