# Linear Algebra for Machine Learning

Session #1: Introduction, Part 2
UCSD Extension Online Learning Course
© Bilyana Aleksic 2016
## Neuron

*Figure: a single neuron with inputs $x_1, x_2$, weights $w_1, w_2$, and a summing unit $\sum$ feeding the output.*

$$z = \sum_i x_i w_i + b$$

$$y = \sigma(z)$$
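A minimal sketch of this neuron in NumPy; the slide leaves $\sigma$ generic, so the sigmoid used here is an assumption:

```python
import numpy as np

def sigmoid(z):
    # one common choice for sigma; the slide leaves it generic
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    z = np.dot(x, w) + b   # z = sum_i x_i * w_i + b
    return sigmoid(z)      # y = sigma(z)

# illustrative input, weights, and bias
print(neuron(np.array([0.5, -1.0]), np.array([0.8, 0.2]), 0.1))  # sigma(0.3) ~ 0.574
```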
## Perceptron learning algorithm

- A statistical pattern recognition system.
- The unit gives a positive decision when the sum of feature activities times the learned weights is greater than a threshold (see the sketch below).

How does standard pattern recognition work?

- Convert the raw data into a vector of features.
- Learn how to weight each feature.
- Decide whether the input vector is a positive example of the target class.
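A minimal sketch of the perceptron learning rule in NumPy, assuming 0/1 binary threshold units; the dataset (AND, which is linearly separable) and the epoch count are illustrative:

```python
import numpy as np

def perceptron_train(X, y, epochs=20):
    """Perceptron learning rule: add the input vector to the weights on a
    false negative, subtract it on a false positive, do nothing when the
    prediction is already correct."""
    X = np.hstack([X, np.ones((len(X), 1))])  # fold the threshold/bias into the weights
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w > 0 else 0     # binary threshold unit
            w += (target - pred) * xi         # zero update when pred == target
    return w

# AND is linearly separable, so the rule converges on it
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w = perceptron_train(X, y)
print([1 if np.append(x, 1) @ w > 0 else 0 for x in X])  # [0, 0, 0, 1]
```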
## Introduce non-linearity

The linear neuron model of the perceptron is limited in what it can do: we had to introduce a step function to model the non-linearity of the decision-making process.
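A minimal illustration (my example, not from the slide): the same linear sum $z$ with and without the step non-linearity that turns it into a decision.

```python
import numpy as np

def step(z):
    # step function: models the yes/no decision-making process
    return 1 if z > 0 else 0

x, w, b = np.array([0.5, -1.0]), np.array([0.8, 0.2]), 0.1
z = x @ w + b        # linear neuron output: an unbounded real number
print(z, step(z))    # the step collapses it to a binary decision: ~0.3 -> 1
```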
## Weight space separated by hyperplanes

For linear neurons with threshold units, each training case defines a hyperplane that separates the weight space.

- We can now limit the size of the region of "good" weights (see the note below).
- But how do we train a multi-layer neural network? We cannot use the perceptron learning algorithm, because we do not know the target output values for the hidden units.
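To make the geometry concrete (a standard observation, spelled out here rather than on the slide): for a threshold unit with threshold 0, a training case with input $\mathbf{x}$ that should output 1 constrains the weights to the half-space

$$\mathbf{w} \cdot \mathbf{x} > 0,$$

whose boundary is the hyperplane through the origin with normal vector $\mathbf{x}$; a case that should output 0 constrains the weights to the other side. Each training case therefore slices weight space with a hyperplane, and the "good" weights lie in the intersection of all these half-spaces.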
## What binary threshold neurons cannot do

- They cannot tell whether two single-bit features are the same, i.e. solve an XOR-style circuit: the input-output pairs give inequalities that are impossible to satisfy, as shown below.
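The impossible inequalities can be written out directly (my reconstruction of the standard argument). For a binary threshold unit with weights $w_1, w_2$ and threshold $\theta$, outputting 1 exactly when the two bits are the same requires:

$$(1,1) \mapsto 1:\; w_1 + w_2 \ge \theta \qquad (0,0) \mapsto 1:\; 0 \ge \theta$$

$$(1,0) \mapsto 0:\; w_1 < \theta \qquad (0,1) \mapsto 0:\; w_2 < \theta$$

Adding the first pair gives $w_1 + w_2 \ge 2\theta$; adding the second pair gives $w_1 + w_2 < 2\theta$. No choice of weights and threshold satisfies both.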
## Hidden units

- How do we train a multi-layer network? With the backpropagation algorithm (next slide).
## Backpropagation

The goal is to optimize the weights so that the neural network learns to map inputs to outputs by minimizing the prediction error.
*Figure: a 2-2-2 network with inputs $i_1, i_2$, hidden units $h_1, h_2$, outputs $o_1, o_2$, weights $w_1$–$w_8$, and biases $b_1, b_2$.*

Chain rule:

$$\frac{\partial E}{\partial w_i} = \frac{\partial E}{\partial out_i}\,\frac{\partial out_i}{\partial w_i} = \frac{\partial E}{\partial out_i}\,\frac{\partial out_i}{\partial net_i}\,\frac{\partial net_i}{\partial w_i}$$

$net_{o1}$ is the signal "z" from slide 2, "Introduce non-linearity": the linear output of the neuron.

$$net_{o1} = w_5\, h_1 + w_6\, h_2 + b_2 \cdot 1$$
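Concretely, for $w_5$ (the weight from $h_1$ to $o_1$), the last factor in the chain falls straight out of the linear expression for $net_{o1}$ above:

$$\frac{\partial E}{\partial w_5} = \frac{\partial E}{\partial out_{o1}} \, \frac{\partial out_{o1}}{\partial net_{o1}} \, \frac{\partial net_{o1}}{\partial w_5}, \qquad \frac{\partial net_{o1}}{\partial w_5} = h_1$$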
## Forward Propagation

Start with random weights, apply an input, and calculate each layer's output.
$$net_{o1} = w_5\, h_1 + w_6\, h_2 + b_2 \cdot 1$$

$\dfrac{\partial net_i}{\partial w_i}$ is calculated using linear algebra.

*Figure: the 2-2-2 network with example weight values $w_1 = 0.15$, $w_2 = 0.20$, $w_3 = 0.25$, $w_4 = 0.30$, $w_5 = 0.40$, $w_6 = 0.45$, $w_7 = 0.50$, $w_8 = 0.55$, biases $b_1, b_2$, and target values for $o_1, o_2$.*

$$out_{o1} = g(net_{o1})$$

$g()$ is a nonlinear function, chosen for the specific application; $\dfrac{\partial out_i}{\partial net_i}$ is calculated using calculus.

$$\frac{\partial E}{\partial out} = -(target - output) \qquad \text{(slide 7)}$$
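A minimal sketch of this forward pass in NumPy. The weight values $w_1$–$w_8$ are taken from the figure; the bias values, the input values, and the choice $g=$ sigmoid are my assumptions for illustration:

```python
import numpy as np

def sigmoid(z):                  # one common choice for g(); assumed here
    return 1.0 / (1.0 + np.exp(-z))

W1 = np.array([[0.15, 0.20],     # weights into h1 (w1, w2)
               [0.25, 0.30]])    # weights into h2 (w3, w4)
W2 = np.array([[0.40, 0.45],     # weights into o1 (w5, w6)
               [0.50, 0.55]])    # weights into o2 (w7, w8)
b1, b2 = 0.35, 0.60              # assumed bias values
x = np.array([0.05, 0.10])       # assumed input values

h = sigmoid(W1 @ x + b1)         # hidden layer: out_h = g(net_h)
o = sigmoid(W2 @ h + b2)         # output layer: net_o1 = w5*h1 + w6*h2 + b2*1
print("hidden:", h, "output:", o)
```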
## Backwards Pass

Propagate the error backwards.

*Figure: the same 2-2-2 network, now showing the error at each output, $E_{o1}$ and $E_{o2}$, flowing backwards through the weights.*

$$E = E_{o1} + E_{o2}$$
Calculate new values for the weights:

$$w_1^{new} = w_1 - \eta\,\frac{\partial E}{\partial w_1}$$

where $\eta$ is the learning rate (a sketch of the full pass follows below).
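A minimal sketch of one backwards pass for the same network, repeating the forward pass from the previous sketch; the squared-error loss $E_{oi} = \frac{1}{2}(target_i - out_i)^2$, the target values, and the learning rate are my assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# forward pass (same assumed values as the previous sketch)
W1 = np.array([[0.15, 0.20], [0.25, 0.30]])
W2 = np.array([[0.40, 0.45], [0.50, 0.55]])
b1, b2 = 0.35, 0.60
x = np.array([0.05, 0.10])
h = sigmoid(W1 @ x + b1)
o = sigmoid(W2 @ h + b2)

# backwards pass, assuming E = sum_i 0.5 * (target_i - out_i)^2
target = np.array([0.01, 0.99])            # assumed target values
eta = 0.5                                   # assumed learning rate
dE_dout = -(target - o)                     # dE/dout = -(target - output)
delta_o = dE_dout * o * (1 - o)             # chain rule through the sigmoid
grad_W2 = np.outer(delta_o, h)              # dE/dw for w5..w8: delta_o * h
delta_h = (W2.T @ delta_o) * h * (1 - h)    # error propagated back to h1, h2
grad_W1 = np.outer(delta_h, x)              # dE/dw for w1..w4: delta_h * x
W2 -= eta * grad_W2                         # w_new = w - eta * dE/dw
W1 -= eta * grad_W1
print(grad_W1, grad_W2)
```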
## Encoder problem

- How many layers and how many hidden units?
## Solving XOR

- A solution with two hidden units (a sketch follows below).
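One classic construction (mine, not necessarily the network shown on the slide): two binary threshold hidden units computing OR and AND of the inputs, and an output unit that fires when OR is on but AND is off, which is exactly XOR.

```python
import numpy as np

def step(z):
    # binary threshold unit
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    x = np.array([x1, x2])
    h_or  = step(x @ np.array([1, 1]) - 0.5)   # fires unless both inputs are 0
    h_and = step(x @ np.array([1, 1]) - 1.5)   # fires only when both inputs are 1
    return step(h_or - 2 * h_and - 0.5)        # OR but not AND: exactly one input on

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor_net(a, b))           # 0, 1, 1, 0
```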