# Linear Algebra for Machine Learning

```Session #1: Introduction
Linear Algebra for Machine Learning
UCSD Course, Part 2
© Bilyana Aleksic 2016, UCSD Extension Online Learning Course
Neuron
[Figure: inputs x_1 and x_2, weighted by w_1 and w_2, feed a summation unit (∑) that produces the output y]
z = \sum_i x_i w_i + b
y = f(z)
Perceptron learning algorithm
• A statistical pattern recognition system
• The input is a positive example if the sum of feature activities times the learned weights is greater than a threshold
How does standard pattern recognition work?
• Convert the raw data into a vector of features
• Learn how to weight each feature
• Decide whether the input vector is a positive example of the target class
Introduce non-linearity
The linear neuron model of the perceptron is limited in what it can do.
We had to introduce a step function to model the non-linearity, that is, the decision-making process.
Weight space separated by hyperplanes
For linear neurons with threshold units, each test case defines a hyperplane (a line in two dimensions) that separates the weight space.
We can now limit the size of the space of “good” weights.
But how do we train a multi-layer neural network?
We cannot use the perceptron learning algorithm because we don’t know what the output values of the hidden units should be.
What can binary threshold neurons not do?
• They cannot tell whether two single-bit features are the same, as in an XOR circuit, where the input-output pairs give inequalities that are impossible to satisfy.
Hidden units
• How do we train a multi-layer network?
algorithm
Backpropagation
The goal is to optimize the weights so that the neural network learns to map inputs to outputs by minimizing the prediction error.
[Figure: network with inputs i1, i2; hidden units h1, h2; outputs o1, o2; biases b1, b2; weights w1 to w8]
Chain rule:
\frac{\partial E}{\partial w_5} = \frac{\partial E}{\partial out_{o1}} \cdot \frac{\partial out_{o1}}{\partial net_{o1}} \cdot \frac{\partial net_{o1}}{\partial w_5}
\frac{\partial net_{o1}}{\partial w_5} = h_1
net_{o1} is the signal “z” from slide 2 (“Introduce non-linearity”): the linear output of the neuron.
net_{o1} = w_5 h_1 + w_6 h_2 + b_2 \cdot 1
Forward Propagation
[Figure: network with inputs i1, i2; hidden units h1, h2; outputs o1, o2; biases b1, b2; weights w1 to w8, annotated with example values. Legible pairs: w2 = 0.20, w4 = 0.30, w6 = 0.45; the other values shown are 0.15, 0.25, 0.40, 0.50 and 0.55.]
net_{o1} = w_5 h_1 + w_6 h_2 + b_2 \cdot 1
\frac{\partial net_{o1}}{\partial w_5} is calculated using linear algebra.
out_{o1} = f(net_{o1})
f() is a nonlinear function, chosen for the specific application; calculated using calculus.
\frac{\partial E}{\partial out} = -(target - output); slide 7
Backward Pass
Propagate the error backwards.
[Figure: network with inputs i1, i2; hidden units h1, h2; outputs o1, o2; biases b1, b2; weights w1 to w8; errors E_{o1} and E_{o2} at the outputs]
E = E_{o1} + E_{o2}
Calculate new values for the weights:
w_5^{new} = w_5 - \frac{\partial E}{\partial w_5}
Encoder problem
• How many layers and how many hidden units?
Solving XOR
• Solution with 2 hidden layers
```
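
The forward pass, the chain rule for w5, and the weight update from the slides can be sketched in a few lines of plain Python. This is a minimal sketch under several assumptions, not the course's own code: `sigmoid` stands in for the slides' unspecified f(), the error is taken to be E = ½(target − out)² per output (the usual form whose derivative is −(target − output)), and the wiring of w1 to w4 and of w7, w8 is assumed, since the slides only confirm that w5, w6 and b2 feed o1. All numeric values (inputs, biases, targets, and the assignment of the weight values to w1..w8) are illustrative, and the `lr` step size is a hypothetical parameter; `lr=1.0` corresponds to the update shown on the slide.

```python
import math

# Illustrative values only (assumed, not taken from the slides as stated facts).
X = (0.05, 0.10)                      # inputs i1, i2
T = (0.01, 0.99)                      # targets for o1, o2
W = {1: 0.15, 2: 0.20, 3: 0.25, 4: 0.30,
     5: 0.40, 6: 0.45, 7: 0.50, 8: 0.55}
B = {1: 0.35, 2: 0.60}                # biases b1 (hidden), b2 (output)


def sigmoid(z):
    """Logistic function; an assumed choice for the slides' f()."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w, b):
    """Forward pass; returns ((h1, h2), (out_o1, out_o2))."""
    i1, i2 = x
    # Assumed wiring: w1, w2 feed h1; w3, w4 feed h2.
    h1 = sigmoid(w[1] * i1 + w[2] * i2 + b[1] * 1)
    h2 = sigmoid(w[3] * i1 + w[4] * i2 + b[1] * 1)
    # From the slides: net_o1 = w5*h1 + w6*h2 + b2*1 (w7, w8 into o2 assumed).
    net_o1 = w[5] * h1 + w[6] * h2 + b[2] * 1
    net_o2 = w[7] * h1 + w[8] * h2 + b[2] * 1
    return (h1, h2), (sigmoid(net_o1), sigmoid(net_o2))


def total_error(x, target, w, b):
    """E = E_o1 + E_o2, with each term assumed to be 0.5 * (target - out)^2."""
    _, out = forward(x, w, b)
    return sum(0.5 * (t - o) ** 2 for t, o in zip(target, out))


def grad_w5(x, target, w, b):
    """dE/dw5 as the product of the three chain-rule factors."""
    (h1, _), (out_o1, _) = forward(x, w, b)
    dE_dout = -(target[0] - out_o1)        # dE/dout_o1
    dout_dnet = out_o1 * (1.0 - out_o1)    # derivative of the sigmoid
    dnet_dw5 = h1                          # d(net_o1)/d(w5)
    return dE_dout * dout_dnet * dnet_dw5


def update_w5(x, target, w, b, lr=1.0):
    """Return a copy of the weights with w5 moved against its gradient."""
    new_w = dict(w)
    new_w[5] = w[5] - lr * grad_w5(x, target, w, b)
    return new_w


if __name__ == "__main__":
    print("error before:", total_error(X, T, W, B))
    print("dE/dw5:", grad_w5(X, T, W, B))
    print("error after:", total_error(X, T, update_w5(X, T, W, B), B))
```

Only w5 is updated here, to mirror the single derivative worked through on the slides; a full backward pass would repeat the same chain-rule product for every weight before applying any update.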