Session #1: Introduction
Linear Algebra for Machine Learning
UCSD Course, Part 2
© Bilyana Aleksic 2016, UCSD Extension Online Learning Course
Neuron

[Diagram: a single neuron with inputs $x_1, x_2$, weights $w_1, w_2$, a summation node $\Sigma$, and output $y$]

$$z = \sum_i x_i w_i + b, \qquad y = \sigma(z)$$
Perceptron learning algorithm
• A statistical pattern-recognition system
• Decision rule: the sum of the feature activities times the learned weights must exceed a threshold

How does standard pattern recognition work?
• Convert the raw data into a vector of features
• Learn how to weight each feature
• Decide whether the input vector is a positive example of the target class (see the sketch below)
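A minimal sketch of the perceptron learning rule just described; the training data (logical AND, which is linearly separable, unlike XOR) and the learning rate are illustrative assumptions:

```python
def predict(x, w, b):
    # Decision: positive example iff the weighted feature sum exceeds the threshold
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    return 1 if z > 0 else 0

def train_perceptron(data, n_features, lr=0.1, epochs=20):
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for x, target in data:
            error = target - predict(x, w, b)   # 0 if correct, +/-1 if wrong
            # Perceptron rule: nudge the weights toward misclassified targets
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b

# Learning logical AND
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(data, n_features=2)
```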
Introduce non-linearity
The linear neuron model of the perceptron is limited in what it can do: we had to introduce a step function to model non-linearity, i.e. the decision-making process.
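Concretely, the binary threshold (step) unit makes a hard decision on the linear output $z$; the threshold $\theta$ is often folded into the bias:

```latex
y = \begin{cases} 1 & \text{if } z \ge \theta \\ 0 & \text{otherwise} \end{cases}
```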
Weight space separated by hyperplanes
Linear neurons with threshold units: each training case defines a hyperplane that separates the weight space.
We can now limit the size of the region of "good" weights.
But how do we train a multi-layer neural network?
We cannot use the perceptron learning algorithm, because we don't know the correct output values for the hidden units.
What binary threshold neurons cannot do
• They cannot tell whether two single-bit features are the same, i.e. they cannot solve the XOR problem: the input-output pairs give inequalities on the weights that are impossible to satisfy (see below)
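To make the contradiction concrete, write the four XOR cases for a threshold unit with weights $w_1, w_2$ and threshold $\theta$:

```latex
\begin{aligned}
(0,0) \mapsto 0 &: \quad 0 < \theta \\
(1,1) \mapsto 0 &: \quad w_1 + w_2 < \theta \\
(1,0) \mapsto 1 &: \quad w_1 \ge \theta \\
(0,1) \mapsto 1 &: \quad w_2 \ge \theta
\end{aligned}
```

Adding the last two inequalities gives $w_1 + w_2 \ge 2\theta$, while the first two require $w_1 + w_2 < \theta$ with $\theta > 0$: a contradiction.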
Hidden units
• How do we train a multi-layer network? With the gradient descent algorithm (sketched below)
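A minimal sketch of the idea, using a toy one-weight error surface $E(w) = (w - 3)^2$ chosen purely for illustration:

```python
def grad_E(w):
    # dE/dw for the toy error E(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

w, lr = 0.0, 0.1           # starting weight and learning rate (assumed)
for step in range(100):
    w -= lr * grad_E(w)    # move against the gradient
# w is now close to the minimizer w = 3
```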
Backpropagation
The goal is to optimize the weights so that the neural network learns to map inputs to outputs by minimizing the prediction error.
[Diagram: a 2-2-2 network with inputs $i_1, i_2$, hidden units $h_1, h_2$ (weights $w_1 \dots w_4$, bias $b_1$), and outputs $o_1, o_2$ (weights $w_5 \dots w_8$, bias $b_2$)]

Chain rule:
$$\frac{\partial E}{\partial w_i} = \frac{\partial E}{\partial out_i} \cdot \frac{\partial out_i}{\partial net_i} \cdot \frac{\partial net_i}{\partial w_i}$$

$net_{o1}$ is the signal $z$ from the slide "Introduce non-linearity", the linear output of the neuron:
$$net_{o1} = w_5 h_1 + w_6 h_2 + b_2 \cdot 1$$
Forward Propagation
Start with random weights, apply the input, and calculate each layer's output.
๐‘›๐‘’๐‘ก๐‘œ1 =๐‘ค5 โ„Ž1 + ๐‘ค6 โ„Ž2 + ๐‘2 ∗ 1
๐๐’๐’†๐’•๐’Š
is calculated using linear algebra
๐๐’˜๐’Š
w1 h
0.20 w2
1
0.40
0.15
i
1
0.25
i
2
o 0.1
0.45 w6
1
w3
0.50
h
0.30 w4
2
1
b
1
w5
0.55
w8
2
w7
o
2
๐‘œ๐‘ข๐‘ก๐‘œ1 = ๐‘” (๐‘›๐‘’๐‘ก01 )
0.9 ๐‘”() is nonlinear function,
chosen for specific
9
b
2
application; calculated
using calculus
๐œ•๐ธ
= − ๐‘ก๐‘Ž๐‘Ÿ๐‘”๐‘’๐‘ก − ๐‘œ๐‘ข๐‘ก๐‘๐‘ข๐‘ก ;
๐œ•๐‘œ๐‘ข๐‘ก
๐‘ ๐‘™๐‘–๐‘‘๐‘’ 7
Backwards Pass
Propagate the error backwards.

[Diagram: the same 2-2-2 network with per-output errors $E_{o1}$ and $E_{o2}$]

$$E = E_{o1} + E_{o2}$$
Calculate the new values for the weights ($\eta$ is the learning rate):
$$w_1^{new} = w_1 - \eta \frac{\partial E}{\partial w_1}$$
Encoder problem
• How many layers and how many hidden units?
Solving XOR
• Solution with 2 hidden layers
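One common construction, shown as a sketch below, solves XOR with a single hidden layer of two threshold units computing OR and AND, combined as "OR and not AND" (the slide's own diagram may differ):

```python
def step(z):
    # Binary threshold unit
    return 1 if z > 0 else 0

def xor(x1, x2):
    h1 = step(x1 + x2 - 0.5)     # OR:  fires if at least one input is 1
    h2 = step(x1 + x2 - 1.5)     # AND: fires only if both inputs are 1
    return step(h1 - h2 - 0.5)   # OR and not AND  ->  XOR

assert [xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 0]
```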