NEURAL NETWORK THEORY

TABLE OF CONTENTS
• Part 1: The Motivation and History of Neural Networks
• Part 2: Components of Artificial Neural Networks
• Part 3: Particular Types of Neural Network Architectures
• Part 4: Fundamentals on Learning and Training Samples
• Part 5: Applications of Neural Network Theory and Open Problems
• Part 6: Homework
• Part 7: Bibliography
PART 1: THE MOTIVATION AND
HISTORY OF NEURAL NETWORKS
MOTIVATION
• Biologically inspired
• The organization of the brain is considered when constructing network
configurations and algorithms
THE BRAIN
• A human neuron has four elements:
• Dendrites – receive signals from other cells
• Synapses – where information is stored at the contact points between neurons
• Axons – transmit output signals
• Cell body – produces all necessary chemicals for the neuron to function
properly
ASSOCIATION TO NEURAL NETWORKS
• Artificial neurons have
• Input channels
• Cell body
• Output channel
• And synapses are simulated with a weight
MAIN CHARACTERISTICS ADAPTED FROM BIOLOGY
• Self-organization and learning capability
• Generalization capability
• Fault tolerance
THE 100-STEP RULE
• Experiments showed that a human can recognize the picture of a familiar object or person in about 0.1 seconds. With a neuron switching time of roughly 10⁻³ seconds, this corresponds to only about 100 discrete time steps of parallel processing
• A computer following the von Neumann architecture can do practically nothing in 100 assembler steps
WORD TO THE WISE
• We must be careful comparing the nervous system with a complicated contemporary device. In ancient times, the brain was compared to a pneumatic machine; in the Renaissance, to a clockwork; and in the 1900s, to a telephone network
HISTORY OF NEURAL NETWORK THEORY
• 1943 - Warren McCulloch and Walter Pitts introduced models of neurological
networks
• 1947 - Pitts and McCulloch indicated a practical field of application for
neural networks
• 1949 - Karl Lashley defended his thesis that brain information storage is
realized as a distributed system.
HISTORY CONTINUED
• 1960 - Bernard Widrow and Marcian Hoff introduced the first fast and precise adaptive learning system, the first widely commercially used neural network. Hoff later became the co-founder of Intel Corporation
• 1961 - Karl Steinbuch introduced technical realizations of associative memory, which can be seen as predecessors of today's neural associative memories
• 1969 - Marvin Minsky and Seymour Papert published a precise analysis of the perceptron to show the perceptron model was not capable of representing many important problems, and so deduced that the field would be a research "dead end"
HISTORY PART 3
• 1973 - Christoph von der Malsburg used a neuron model that was nonlinear and biologically more motivated
• 1974 - Paul Werbos developed a learning procedure called backpropagation of error
• 1982 - Teuvo Kohonen described the self-organizing feature maps, also known as Kohonen maps
• 1985 - John Hopfield published an article describing a way of finding acceptable solutions for the Travelling Salesman Problem using Hopfield nets
SIMPLE EXAMPLE OF A NEURAL NETWORK
• Assume we have a small robot with n distance sensors from which it extracts input data. Each sensor provides a real numeric value at any time. In this example, the robot can "sense" when it is about to crash: it drives until one of its sensors indicates that it is going to collide with an object
• A neural network allows the robot to "learn when to stop." We treat the network as a "black box": we do not know its structure, we only regard its behavior in practice. We show the robot when to drive on and when to stop; these examples are called training samples, and they are taught to the neural network by learning procedures (an algorithm or a mathematical formula). From these samples, the neural network generalizes and learns when to stop
PART 2: COMPONENTS OF
ARTIFICIAL NEURAL
NETWORKS
FLYNN’S TAXONOMY OF COMPUTER DESIGN
• Single data stream: SISD (single instruction stream) / MISD (multiple instruction streams)
• Multiple data streams: SIMD (single instruction stream) / MIMD (multiple instruction streams)
• Split by program: SPMD (single program) / MPMD (multiple programs)
• Neural computers are a particular case of the MSIMD architecture
• Simplest case: an algorithm represents an operation of multiplying a large-dimensionality vector or matrix by a vector
• The number of operation cycles in the problem-solving process is determined by the physical entity and complexity of the problem
NEURAL “CLUSTERING”
• A "cluster" is a synchronously functioning group of single-bit processors with a special organization that is close to the implementation of the main part of the algorithm
• This provides solutions to two additional problems:
• 1) to minimize or eliminate the information interchange between nodes of the neural computer in the process of problem solving
• 2) to solve weakly formalized problems (e.g. learning for optimal pattern recognition, self-learning clusterization, etc.)
DEFINITIONS
NEURONS
• Neuron – a nonlinear parameterized bounded function y
• y = f(x_1, x_2, …, x_n; w_1, w_2, …, w_p), where {x_i} are the variables and {w_j} are the parameters (or weights) of the function; in the simplest (binary) case the inputs {x_j} take values in {0, 1}
• The variables of the neuron are often called its inputs, and its value is the output
• The function f can be parameterized in any appropriate fashion
• The most frequently used potential v is a weighted sum of the inputs with an additional constant term called the "bias": v = w_0 + Σ_{i=1}^{n−1} w_i·x_i
NEURAL NETWORKS
• Neural Network – a sorted triple (N, V, w)
• N is the set of neurons
• V is the set {(i, j) | i, j ∈ N} whose elements are called connections between neuron i and neuron j
• The function w: V → ℝ defines the weight w(i, j) of the connection between neuron i and neuron j
THE PROPAGATION FUNCTION
• Looking at a neuron j, we will usually find many neurons with a connection to j. For a neuron j, the propagation function receives the outputs o(i_1), …, o(i_n) of the other neurons i_1, …, i_n that are connected to j and transforms them, in consideration of the connecting weights w(i, j), into the network input net(j), which can be further processed by the activation function
• The network input is the result of the propagation function
THRESHOLD FUNCTION
• Neurons get activated if the network input exceeds their threshold value:
• Definition: Let j be a neuron. The threshold value θ_j is uniquely assigned to j and marks the position of the maximum gradient value of the activation function (basically a switching value)
ACTIVATION FUNCTION
• Definition: Let j be a neuron. The activation function is defined as a_j(t) = f(net_j(t), a_j(t−1), θ_j)
• This transforms the network input and the previous activation into a new activation
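A minimal sketch of this activation step, assuming f is the Fermi (logistic) function and, for simplicity, dropping the dependence on the previous activation a_j(t−1):

```python
import math

# New activation computed from the network input net_j and threshold theta_j.
# Assumption: f is the Fermi (logistic) function; a_j(t-1) is ignored here.
def fermi(v):
    return 1.0 / (1.0 + math.exp(-v))

def activation(net_j, theta_j):
    # The threshold marks the switching point: net_j == theta_j gives 0.5.
    return fermi(net_j - theta_j)
```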
FURTHER PROPERTIES OF THE ACTIVATION
FUNCTION
• It is advisable that f, the activation function, be a sigmoid function
• The parameters are assigned to the neuron nonlinearity, i.e. they belong to the very definition of the activation function, as is the case when the function f is a radial basis function (RBF) or a wavelet. For instance, the output of a Gaussian RBF is given by y = exp[ − Σ_{i=1}^{n} (x_i − w_i)² / (2·w_{n+1}²) ]
• where w_i is the position of the center of the Gaussian and w_{n+1} is its standard deviation
• The main difference between the two above categories of neurons is that RBFs and wavelets are local nonlinearities which vanish asymptotically in all directions of input space, whereas neurons that have a potential and a sigmoid nonlinearity have an infinite range of influence along the direction defined by v = 0
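The Gaussian RBF output above translates directly into a short Python sketch (names are illustrative; sigma plays the role of w_{n+1}):

```python
import math

# Gaussian RBF: y = exp(-sum_i (x_i - w_i)^2 / (2 * sigma^2)),
# where w is the centre and sigma the standard deviation.
def gaussian_rbf(x, w, sigma):
    sq_dist = sum((xi - wi) ** 2 for xi, wi in zip(x, w))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))
```

At the centre x = w the output is 1, and it vanishes in all directions of input space, which is the "local nonlinearity" property noted above.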
OPTIMAL CONTROL THEORY
• Zermelo’s problem and the handout
• Example problem
PART 3: PARTICULAR TYPES OF
NEURAL NETWORK ARCHITECTURES
TRANSFER FROM LOGICAL BASIS TO THRESHOLD
BASIS
• In the case of neural computers, the logical basis of the computer system in the simplest case is the basis {a·x, sign}. This basis maximally corresponds to the logical basis of the major solved problems. The neural computer is a maximally parallelized system for a given algorithmic kernel implementation.
• The number of operation cycles in the problem-solving process (the number of adjustment cycles for optimization of the secondary functional) in the neural computer is determined by the physical entity and the complexity of the problem
FERMI OR LOGISTIC EQUATION AND TANH(X)
• Fermi or logistic function: f(x) = 1 / (1 + e^(−x))
• which maps to the range of values (0, 1)
• Hyperbolic tangent: tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
• which maps to (−1, 1)
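Both squashing functions can be transcribed directly from the formulas above (a sketch; the function names are just illustrative):

```python
import math

# Fermi (logistic) function: maps onto (0, 1).
def fermi(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hyperbolic tangent: maps onto (-1, 1). Written out from its definition;
# in practice math.tanh is the idiomatic choice.
def tanh_(x):
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))
```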
NEURAL NETWORK WITH DIRECT CONNECTIONS
NEURAL NETWORKS WITH CROSS CONNECTIONS
NEURAL NETWORKS WITH ORDERED BACKWARD
CONNECTIONS
NEURAL NETWORKS WITH AMORPHOUS
BACKWARD CONNECTIONS
MULTILAYER NEURAL NETWORKS WITH
SEQUENTIAL CONNECTIONS
MULTILAYER NEURAL NETWORK
FEEDFORWARD NETWORKS
• Feedforward neural networks – a nonlinear function of its inputs which is the composition of the functions of its neurons
• A feedforward network with n inputs, N_c hidden neurons and N_0 output neurons computes N_0 nonlinear functions of its n input variables as compositions of the N_c functions computed by the hidden neurons
• Feedforward networks are static: e.g. if the input is constant, so is the output
• Feedforward multilayer networks with sigmoid nonlinearities are often termed multilayer perceptrons, or MLPs
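A toy forward pass illustrating this composition, assuming one hidden layer of sigmoid neurons and linear output neurons (the weight layout, one row [bias, w_1, …, w_n] per neuron, is a hypothetical choice):

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

# N_0 outputs computed as compositions of the N_c hidden-neuron functions.
def feedforward(x, hidden_w, output_w):
    hidden = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
              for w in hidden_w]
    # Linear output neurons, as is common for function approximation.
    return [w[0] + sum(wi * hi for wi, hi in zip(w[1:], hidden))
            for w in output_w]
```

Because the output depends only on the current input, the network is static: a constant input always yields the same constant output.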
FEEDFORWARD NETWORK DIAGRAM
COMPLETELY LINKED NETWORKS (CLIQUE)
• Completely linked networks permit connections between all neurons except for
direct recurrences. Furthermore, the connections must be symmetric. So, every
neuron can become an input neuron. (Clique)
DIRECTED TERMS
• If the function to be computed by the feedforward neural network is thought
to have a significant linear component, it may be useful to add linear terms
(called directed terms) to the above structure
RECURRENT NETWORKS
• General form: neural networks that include cycles. Since the output of a neuron cannot be a function of itself at the same instant of time, we must explicitly take time into account: the output can, however, be a function of its past values. These are considered discrete-time systems
• Each connection of a recurrent neural network is assigned a delay value (possibly equal to zero) in addition to being assigned a weight, as in feedforward networks
CANONICAL FORM OF RECURRENT NETWORKS
• Governed by recurrent discrete-time equations, the general mathematical description of a linear system is the state equations:
• x(k) = A·x(k−1) + B·u(k−1)
• g(k) = C·x(k−1) + D·u(k−1)
• where x(k) is the state vector at time k·T, u(k) is the input vector at time k·T, g(k) is the output vector at time k·T, and A, B, C, D are matrices
• Property: Any recurrent neural network, however complex, can be cast into a canonical form, made of a feedforward neural network, some outputs of which (state outputs) are fed back to the inputs through unit delays
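One time step of the state equations above, written with plain Python lists (a sketch, not an optimized implementation; A, B, C, D are given as lists of rows):

```python
# One step of the canonical linear state equations:
#   x(k) = A x(k-1) + B u(k-1)
#   g(k) = C x(k-1) + D u(k-1)
def state_step(x, u, A, B, C, D):
    def matvec(M, v):
        return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]
    new_x = [a + b for a, b in zip(matvec(A, x), matvec(B, u))]  # next state
    g = [c + d for c, d in zip(matvec(C, x), matvec(D, u))]      # output
    return new_x, g
```

Feeding `new_x` back in as `x` on the next call is exactly the "state outputs fed back through unit delays" of the canonical form.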
CANONICAL FORM OF RECURRENT NETWORK
DIAGRAM
THE ORDER OF NEURAL NETWORKS
• Synchronous activation - all neurons change their values synchronously. i.e. they simultaneously
calculate network inputs, activation and output, and pass them on. Closest to biology, most
generic and can be used with networks of arbitrary topology
• Random order - a neuron i is randomly chosen and its 𝑛𝑒𝑑𝑖 , π‘Žπ‘– and π‘œπ‘– are updated. For n
neurons, a cycle is the n-fold execution of this step. Not always useful
• Random permutation - each neuron is chosen exactly once, but in random order, during one
cycle. This way is used rarely because it is generally useless, and time-consuming
• Topological order of activation - the neurons are updated during one cycle and according to
a fixed order determined by the network topology.
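Synchronous activation can be sketched as follows, assuming a hypothetical weight dictionary keyed by connection (i, j): every neuron computes its network input from the old outputs, and all outputs switch at once:

```python
# Synchronous activation step: all net inputs are computed from the *old*
# outputs before any neuron changes its value.
# outputs: {neuron: value}; weights: {(i, j): w_ij}; activate: activation fn.
def synchronous_step(outputs, weights, activate):
    net = {j: sum(w * outputs[i] for (i, jj), w in weights.items() if jj == j)
           for j in outputs}
    return {j: activate(v) for j, v in net.items()}
```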
WHEN TO USE NEURAL NETWORKS
• The fundamental property of neural networks with supervised training is the parsimonious approximation property, i.e. their ability to approximate any sufficiently regular function with arbitrary accuracy. Therefore, neural networks may be advantageous in any application that requires finding, in a machine learning framework, a nonlinear relation between numerical data
• To do so, make sure that:
• 1) a nonlinear model is necessary
• 2) a neural network is preferable to, for instance, a polynomial approximation, i.e. when the number of variables is large (larger than or equal to 3)
PART 4: FUNDAMENTALS ON
LEARNING AND TRAINING
SAMPLES
THEORETICALLY, A NEURAL NETWORK COULD
LEARN BY
• Developing new connections
• Deleting existing connections
• Changing connecting weights
• Changing the threshold values of neurons
• Varying one or more of the three neuron functions (activation, propagation, output)
• Developing new neurons
• Deleting neurons
• The change of connecting weights is the most common procedure
DIFFERENT TYPES OF TRAINING
• Unsupervised learning - the training set only consists of input patterns, the
network tries, by itself, to detect similarities and to generate pattern classes
• Reinforcement learning - the training set consists of input patterns, after
completion of a sequence a value is returned to the network indicating
whether the result was right or wrong, and possibly, how it was right or wrong.
• Supervised learning - the training set consists of input patterns with correct
results so that the network can receive a precise error vector
SUPERVISED LEARNING STEPS
• Enter the input pattern
• Forward propagation of the input by the network, generation of the output
• Compare the output with the desired output and provide the error vector
• Calculate corrections to the network based on the error vector
• Apply the corrections
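The steps above can be sketched for a single linear neuron using the delta rule (the learning rate, weight layout, and scalar "error vector" are illustrative assumptions, not the slides' prescription):

```python
# One supervised-learning step for a single linear neuron (delta rule).
# w = [bias, w_1, ..., w_n]; lr is an illustrative learning rate.
def train_step(w, x, target, lr=0.1):
    output = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))  # forward pass
    error = target - output                  # error (scalar here)
    w[0] += lr * error                       # correction: bias
    for i, xi in enumerate(x):
        w[i + 1] += lr * error * xi          # correction: weights
    return error
```

Repeating this step over all training samples until the error is small enough is the whole supervised-learning loop.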
ERROR VECTOR
• Usually determined by the root mean square error (RMSE)
• Does not always guarantee a global minimum; may only find a local minimum
• To calculate the RMSE:
• 1) take the error of each data point and square the value
• 2) sum the squared error terms
• 3) divide by the number of data values
• 4) take the square root of that value
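The four steps translate directly into Python (a sketch; it computes the RMSE over a whole data set at once):

```python
import math

# RMSE: square each error, sum, divide by the count, take the square root.
def rmse(data, estimation):
    errors_sq = [(d - e) ** 2 for d, e in zip(data, estimation)]
    return math.sqrt(sum(errors_sq) / len(errors_sq))
```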
PART 5: APPLICATIONS OF
NEURAL NETWORK THEORY
AND OPEN PROBLEMS
OPEN PROBLEMS
• Identifying if the neural network will converge in finite time
• Training the neural network to identify local versus global minimums
• Neural modularity
APPLICATIONS OF NEURAL NETWORK THEORY
• Traveling Salesman problem
• Image Compression
• Character Recognition
• Optimal Control Problems
PART 6: HOMEWORK
• 1) Show that the derivatives of the following functions can be expressed in terms of the functions themselves:
• Fermi function: f′(x) = f(x) · (1 − f(x))
• Hyperbolic tangent function: tanh′(x) = 1 − tanh²(x)
OPTIMAL CONTROL PROBLEM
• 2) min_u ∫₀ᵃ (1 + u²) dt
• such that x′(t) = u(t), x(0) = 0, x(a) = b
FIND THE RMSE OF THE BELOW DATA SET
Sample | Data | Estimation
1      | 1    | -1
2      | 4    | 7
3      | 3    | 1
4      | 0    | -2
5      | 9    | 7
6      | 11   | 11
7      | 12   | 13
8      | 6    | 7
9      | 8    | 8
10     | 20   | 17
11     | 11   | 9
12     | 11   | 13
13     | 2    | 2
14     | 0    | 0
15     | 4    | 5
16     | 9    | 9
17     | 5    | 5
18     | 13   | 14
19     | 15   | 17
20     | 1    | 0
PART 7: BIBLIOGRAPHY
WORKS CITED
• Dreyfus, G. Neural Networks: Methodology and Applications. Berlin: Springer, 2005. Print.
• Galushkin, A. I. Neural Networks Theory. Berlin: Springer, 2007. Print.
• Kriesel, David. A Brief Introduction to Neural Networks. Manuscript, n.d. Web. 28 Mar. 2016.
• Lenhart, Suzanne, and John T. Workman. Optimal Control Applied to Biological Models. Boca Raton: Chapman & Hall/CRC,
2007. Print.
• Ripley, Brian D. Pattern Recognition and Neural Networks. Cambridge: Cambridge UP, 1996. Print.
• Rojas, Raúl. Neural Networks: A Systematic Introduction. Berlin: Springer-Verlag, 1996. Print.
• Wasserman, Philip D. Neural Computing: Theory and Practice. New York: Van Nostrand Reinhold, 1989. Print.
• https://www.researchgate.net/post/What_are_the_most_important_open_problems_in_the_field_of_artificial_neural_networks_for_the_next_ten_years_and_why