Session #1: Introduction
Linear Algebra for Machine Learning
UCSD Course, Part 2
© Bilyana Aleksic 2016, UCSD Extension Online Learning Course
Neuron

[Diagram: a single neuron with inputs $x_1, x_2$, weights $w_1, w_2$, a summation node $\Sigma$, and output $y$]

$$z = \sum_i x_i w_i + b, \qquad y = \sigma(z)$$
Perceptron learning algorithm
• A statistical pattern-recognition system
• Decision rule: the sum of the feature activities times the learned weights must exceed a threshold

How does standard pattern recognition work?
• Convert the raw data into a vector of features
• Learn how to weight each feature
• Decide whether the input vector is a positive example of the target class (see the sketch below)
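A minimal sketch of the perceptron learning rule just described; the training data (logical AND, which is linearly separable, unlike XOR) and the learning rate are illustrative assumptions:

```python
def predict(x, w, b):
    # Decision: positive example iff the weighted feature sum exceeds the threshold
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    return 1 if z > 0 else 0

def train_perceptron(data, n_features, lr=0.1, epochs=20):
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for x, target in data:
            error = target - predict(x, w, b)   # 0 if correct, +/-1 if wrong
            # Perceptron rule: nudge the weights toward misclassified targets
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b

# Learning logical AND
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(data, n_features=2)
```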
Introduce non-linearity
The linear neuron model of the perceptron is limited in what it can do: we had to introduce a step function to model non-linearity, i.e. the decision-making process.
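Concretely, the binary threshold (step) unit makes a hard decision on the linear output $z$; the threshold $\theta$ is often folded into the bias:

```latex
y = \begin{cases} 1 & \text{if } z \ge \theta \\ 0 & \text{otherwise} \end{cases}
```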
Weight space separated by hyperplanes
Linear neurons with threshold units: each training case defines a hyperplane that separates the weight space.
We can now limit the size of the region of "good" weights.
But how do we train a multi-layer neural network?
We cannot use the perceptron learning algorithm, because we don't know the correct output values for the hidden units.
What binary threshold neurons cannot do
• They cannot tell whether two single-bit features are the same, i.e. they cannot solve the XOR problem: the input-output pairs give inequalities on the weights that are impossible to satisfy (see below)
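To make the contradiction concrete, write the four XOR cases for a threshold unit with weights $w_1, w_2$ and threshold $\theta$:

```latex
\begin{aligned}
(0,0) \mapsto 0 &: \quad 0 < \theta \\
(1,1) \mapsto 0 &: \quad w_1 + w_2 < \theta \\
(1,0) \mapsto 1 &: \quad w_1 \ge \theta \\
(0,1) \mapsto 1 &: \quad w_2 \ge \theta
\end{aligned}
```

Adding the last two inequalities gives $w_1 + w_2 \ge 2\theta$, while the first two require $w_1 + w_2 < \theta$ with $\theta > 0$: a contradiction.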
Hidden units
• How do we train a multi-layer network? With the gradient descent algorithm (sketched below)
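A minimal sketch of the idea, using a toy one-weight error surface $E(w) = (w - 3)^2$ chosen purely for illustration:

```python
def grad_E(w):
    # dE/dw for the toy error E(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

w, lr = 0.0, 0.1           # starting weight and learning rate (assumed)
for step in range(100):
    w -= lr * grad_E(w)    # move against the gradient
# w is now close to the minimizer w = 3
```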
Backpropagation
The goal is to optimize the weights so that the neural network learns to map inputs to outputs by minimizing the prediction error.
[Diagram: a 2-2-2 network with inputs $i_1, i_2$, hidden units $h_1, h_2$ (weights $w_1 \dots w_4$, bias $b_1$), and outputs $o_1, o_2$ (weights $w_5 \dots w_8$, bias $b_2$)]

Chain rule:
$$\frac{\partial E}{\partial w_i} = \frac{\partial E}{\partial out_i} \cdot \frac{\partial out_i}{\partial net_i} \cdot \frac{\partial net_i}{\partial w_i}$$

$net_{o1}$ is the signal $z$ from the slide "Introduce non-linearity", the linear output of the neuron:
$$net_{o1} = w_5 h_1 + w_6 h_2 + b_2 \cdot 1$$
Forward Propagation
Start with random weights, apply the input, and calculate each layer's output.
๐‘›๐‘’๐‘ก๐‘œ1 =๐‘ค5 โ„Ž1 + ๐‘ค6 โ„Ž2 + ๐‘2 ∗ 1
๐๐’๐’†๐’•๐’Š
is calculated using linear algebra
๐๐’˜๐’Š
w1 h
0.20 w2
1
0.40
0.15
i
1
0.25
i
2
o 0.1
0.45 w6
1
w3
0.50
h
0.30 w4
2
1
b
1
w5
0.55
w8
2
w7
o
2
๐‘œ๐‘ข๐‘ก๐‘œ1 = ๐‘” (๐‘›๐‘’๐‘ก01 )
0.9 ๐‘”() is nonlinear function,
chosen for specific
9
b
2
application; calculated
using calculus
๐œ•๐ธ
= − ๐‘ก๐‘Ž๐‘Ÿ๐‘”๐‘’๐‘ก − ๐‘œ๐‘ข๐‘ก๐‘๐‘ข๐‘ก ;
๐œ•๐‘œ๐‘ข๐‘ก
๐‘ ๐‘™๐‘–๐‘‘๐‘’ 7
Backwards Pass
Propagate the error backwards.

[Diagram: the same 2-2-2 network with per-output errors $E_{o1}$ and $E_{o2}$]

$$E = E_{o1} + E_{o2}$$
Calculate the new values for the weights ($\eta$ is the learning rate):
$$w_1^{new} = w_1 - \eta \frac{\partial E}{\partial w_1}$$
Encoder problem
• How many layers and how many hidden units?
Solving XOR
• Solution with 2 hidden layers
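One common construction, shown as a sketch below, solves XOR with a single hidden layer of two threshold units computing OR and AND, combined as "OR and not AND" (the slide's own diagram may differ):

```python
def step(z):
    # Binary threshold unit
    return 1 if z > 0 else 0

def xor(x1, x2):
    h1 = step(x1 + x2 - 0.5)     # OR:  fires if at least one input is 1
    h2 = step(x1 + x2 - 1.5)     # AND: fires only if both inputs are 1
    return step(h1 - h2 - 0.5)   # OR and not AND  ->  XOR

assert [xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 0]
```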