Session #1: Introduction
Linear Algebra for Machine Learning, UCSD Course Part 2
© Bilyana Aleksic 2016, UCSD Extension Online Learning Course

Neuron
A neuron forms the weighted sum of its inputs plus a bias, then applies an activation function:

  $z = \sum_i x_i w_i + b$,   $y = f(z)$

Perceptron learning algorithm
• A statistical pattern recognition system.
• The unit makes a positive decision when the sum of feature activities times the learned weights is greater than a threshold.

How standard pattern recognition works:
• Convert the raw data into a vector of features.
• Learn how to weight each feature.
• Decide whether the input vector is a positive example of the target class.
(A worked perceptron sketch appears at the end of these notes.)

Introduce non-linearity
The linear neuron model of the perceptron is limited in what it can do. We introduce a step (threshold) function on top of the weighted sum $z$ to model non-linearity, i.e., the decision-making process.

Weight space separated by hyperplanes
• For linear neurons with threshold units, each training case defines a hyperplane that separates the weight space.
• Each case therefore shrinks the region of "good" weights.
• But how do we train a multi-layer neural network? We cannot use the perceptron learning algorithm, because we do not know the correct output values for the hidden units.

What binary threshold neurons cannot do
• They cannot tell whether two single-bit features are the same; for example, they cannot implement an XOR circuit, because the input/output pairs give inequalities on the weights that are impossible to satisfy.

Hidden units
• How do we train a multi-layer network? With the gradient descent algorithm.

Backpropagation
The goal is to optimize the weights so that the neural network learns to map inputs to outputs by minimizing the prediction error.

[Figure: a 2–2–2 network with inputs i1, i2, hidden units h1, h2, outputs o1, o2, weights w1–w8, and biases b1, b2.]

Chain rule for the gradient of the error $E$ with respect to $w_5$:

  $\frac{\partial E}{\partial w_5} = \frac{\partial E}{\partial out_{o1}} \cdot \frac{\partial out_{o1}}{\partial net_{o1}} \cdot \frac{\partial net_{o1}}{\partial w_5}$

$net_{o1}$ is the signal "z" from the slide "Introduce non-linearity", the linear output of the neuron:

  $net_{o1} = w_5 h_1 + w_6 h_2 + b_2 \cdot 1$

Forward propagation
Start with random weights, apply an input, and calculate each layer's output:

  $net_{o1} = w_5 h_1 + w_6 h_2 + b_2 \cdot 1$   ($net_{o1}$ is calculated using linear algebra)
  $out_{o1} = f(net_{o1})$

$f()$ is a nonlinear function chosen for the specific application; its derivative is calculated using calculus. For the error gradient at the output:

  $\frac{\partial E}{\partial out_{o1}} = -(target_{o1} - out_{o1})$

[Figure: the same 2–2–2 network annotated with example numeric values for the weights and biases.]
(A numeric sketch of this pass appears at the end of these notes.)

Backwards pass
Propagate the error backwards. The total error is the sum of the per-output errors:

  $E = E_{o1} + E_{o2}$

Calculate new values for the weights by gradient descent ($\eta$ is the learning rate):

  $w_{\text{new}} = w_{\text{old}} - \eta \frac{\partial E}{\partial w}$

Encoder problem
• How many layers and how many hidden units do we need?

Solving XOR
• A solution with two hidden units (see the sketch at the end of these notes).
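Worked examples

The first sketch illustrates the neuron model and the perceptron decision rule from the "Neuron" and "Perceptron learning algorithm" slides. It is a minimal Python illustration, not the course's code: the standard perceptron update rule is assumed (the slides name the algorithm but do not spell it out), and the names step, predict, and train, as well as the AND task, are illustrative.

def step(z):
    """Binary threshold: fire when the weighted sum exceeds the threshold (here 0)."""
    return 1 if z > 0 else 0

def predict(x, w, b):
    """y = f(z) with z = sum_i x_i * w_i + b."""
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    return step(z)

def train(samples, w, b, lr=0.1, epochs=20):
    """Perceptron update: move weights toward misclassified positive
    examples and away from misclassified negative ones."""
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(x, w, b)
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b

# Learn the linearly separable AND function as a positive/negative example task.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train(data, w=[0.0, 0.0], b=0.0)
print([predict(x, w, b) for x, _ in data])  # -> [0, 0, 0, 1]

The same loop can never converge on XOR, which is exactly the point of the "What binary threshold neurons cannot do" slide.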
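The next sketch traces one forward pass and the chain-rule gradient for $w_5$ from the "Forward Propagation" and "Backwards Pass" slides. The slides leave $f()$ open, so the logistic sigmoid and the squared error $E = \frac{1}{2}(target - out)^2$ are assumptions here (the latter matches the slide's $\partial E/\partial out = -(target - out)$); all numeric values are placeholders, not the figure's values.

import math

def f(z):
    """Activation function, chosen for the specific application;
    the logistic sigmoid is assumed in this sketch."""
    return 1.0 / (1.0 + math.exp(-z))

# Hidden-layer outputs h1, h2 (from an earlier forward step),
# output-layer weights and bias -- illustrative placeholder values.
h1, h2 = 0.6, 0.55
w5, w6, b2 = 0.4, 0.45, 0.6
target_o1 = 0.9
eta = 0.5  # learning rate

# Forward propagation: net_o1 = w5*h1 + w6*h2 + b2*1, out_o1 = f(net_o1)
net_o1 = w5 * h1 + w6 * h2 + b2 * 1
out_o1 = f(net_o1)

# Chain rule: dE/dw5 = dE/dout_o1 * dout_o1/dnet_o1 * dnet_o1/dw5
dE_dout = -(target_o1 - out_o1)      # from the slide
dout_dnet = out_o1 * (1 - out_o1)    # derivative of the sigmoid
dnet_dw5 = h1                        # net_o1 is linear in w5
dE_dw5 = dE_dout * dout_dnet * dnet_dw5

# Gradient-descent update: w_new = w_old - eta * dE/dw
w5_new = w5 - eta * dE_dw5
print(out_o1, dE_dw5, w5_new)

The same three-factor product, with $\partial net/\partial w$ swapped for the relevant input, gives the gradient for every other weight in the layer.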
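Finally, a sketch for the "Solving XOR" slide: one hidden layer of two binary threshold units whose weights are set by hand rather than learned. The particular weights below are one standard choice, not necessarily those in the slide's figure.

def step(z):
    return 1 if z > 0 else 0

def xor(x1, x2):
    h_or = step(x1 + x2 - 0.5)           # hidden unit 1 fires for OR
    h_and = step(x1 + x2 - 1.5)          # hidden unit 2 fires for AND
    return step(h_or - 2 * h_and - 0.5)  # output: OR and not AND

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor(x1, x2))  # 0,0->0  0,1->1  1,0->1  1,1->0

The two hidden units carve the input space with two hyperplanes, and the output unit combines the resulting regions, which is why the hidden layer escapes the single-hyperplane limit described in "Weight space separated by hyperplanes".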