NEURAL NETWORK THEORY

TABLE OF CONTENTS
• Part 1: The Motivation and History of Neural Networks
• Part 2: Components of Artificial Neural Networks
• Part 3: Particular Types of Neural Network Architectures
• Part 4: Fundamentals on Learning and Training Samples
• Part 5: Applications of Neural Network Theory and Open Problems
• Part 6: Homework
• Part 7: Bibliography

PART 1: THE MOTIVATION AND HISTORY OF NEURAL NETWORKS

MOTIVATION
• Biologically inspired
• The organization of the brain is considered when constructing network configurations and algorithms

THE BRAIN
• A human neuron has four elements:
• Dendrites – receive signals from other cells
• Synapses – store information at the contact points between neurons
• Axons – transmit output signals
• Cell body – produces all chemicals necessary for the neuron to function properly

ASSOCIATION TO NEURAL NETWORKS
• Artificial neurons have
• Input channels
• A cell body
• An output channel
• Synapses are simulated with a weight

MAIN CHARACTERISTICS ADAPTED FROM BIOLOGY
• Self-organization and learning capability
• Generalization capability
• Fault tolerance

THE 100-STEP RULE
• Experiments showed that a human can recognize the picture of a familiar object or person in about 0.1 seconds. Given a neuron switching time of about 10^-3 seconds, this corresponds to only about 100 discrete time steps of parallel processing.
• A computer following the von Neumann architecture can do practically nothing in 100 assembler steps.

WORD TO THE WISE
• We must be careful comparing the nervous system with a complicated contemporary device.
In ancient times the brain was compared to a pneumatic machine, in the Renaissance to a clockwork, and in the early 1900s to a telephone network.

HISTORY OF NEURAL NETWORK THEORY
• 1943 – Warren McCulloch and Walter Pitts introduced models of neurological networks
• 1947 – Pitts and McCulloch indicated a practical field of application for neural networks
• 1949 – Karl Lashley defended his thesis that brain information storage is realized as a distributed system

HISTORY CONTINUED
• 1960 – Bernard Widrow and Marcian Hoff introduced the first fast and precise adaptive learning system, the first widely commercially used neural network. Hoff later became a co-founder of Intel Corporation.
• 1961 – Karl Steinbuch introduced technical realizations of associative memory, which can be seen as predecessors of today's neural associative memories
• 1969 – Marvin Minsky and Seymour Papert published a precise analysis of the perceptron showing that the perceptron model was not capable of representing many important problems, and so deduced that the field would be a research "dead end"

HISTORY PART 3
• 1973 – Christoph von der Malsburg used a neuron model that was nonlinear and biologically more motivated
• 1974 – Paul Werbos developed, in his Harvard dissertation, a learning procedure called backpropagation of error
• 1982 – Teuvo Kohonen described the self-organizing feature maps, also known as Kohonen maps
• 1985 – John Hopfield published an article describing a way of finding acceptable solutions for the Travelling Salesman problem using Hopfield nets

SIMPLE EXAMPLE OF A NEURAL NETWORK
• Assume we have a small robot with n distance sensors from which it extracts input data. Each sensor provides a real numeric value at any time. In this example, the robot should "sense" when it is about to crash: it drives until one of its sensors indicates that it is going to collide with an object.
• Neural networks let the robot "learn when to stop". We treat the neural network as a "black box": we do not know its structure, but only regard its behavior in practice. We show the robot examples of when to drive on and when to stop; these examples are called training samples, and they are taught to the neural network by learning procedures (an algorithm or a mathematical formula). From these samples the neural network generalizes and learns when to stop.

PART 2: COMPONENTS OF ARTIFICIAL NEURAL NETWORKS

FLYNN'S TAXONOMY OF COMPUTER DESIGN

                       Single instruction stream | Multi instruction stream
  Single data stream   SISD                      | MISD
  Multiple data stream SIMD                      | MIMD

                       Single program | Multiple program
                       SPMD           | MPMD

• Neural computers are a particular case of the MSIMD architecture
• Simplest case: an algorithm represents an operation of multiplying a large-dimensionality vector or matrix by a vector
• The number of operation cycles in the problem-solving process is determined by the physical entity and complexity of the problem

NEURAL "CLUSTERING"
• A "cluster" is a synchronously functioning group of single-bit processors with a special organization that is close to the implementation of the main part of the algorithm
• This provides solutions to two additional problems:
• 1) to minimize or eliminate the information interchange between nodes of the neural computer in the process of problem solving
• 2) to solve weakly formalized problems (e.g. learning for optimal pattern recognition, self-learning clusterization, etc.)

DEFINITIONS

NEURONS
• Neuron – a nonlinear, parameterized, bounded function y
• y = f(x_1, x_2, ..., x_n; w_1, w_2, ..., w_p), where {x_i} are the variables and {w_i} are the parameters (or weights) of the function.
• {x_i} ∈ {0, 1}
• The variables of the neuron are often called its inputs, and its value is the output
• The function f can be parameterized in any appropriate fashion
• The most frequently used potential v is a weighted sum of the inputs with an additional constant term called the "bias":
  v = w_0 + Σ_{i=1}^{n-1} w_i · x_i

NEURAL NETWORKS
• Neural network – a sorted triple (N, V, w)
• N is the set of neurons
• V is the set {(i, j) | i, j ∈ N} whose elements are called connections between neuron i and neuron j
• The function w: V → ℝ defines the weights; w((i, j)), abbreviated w_{i,j}, is the weight of the connection between neuron i and neuron j

THE PROPAGATION FUNCTION
• Looking at a neuron j, we will usually find many neurons with a connection to j. For a neuron j, the propagation function receives the outputs o_{i_1}, ..., o_{i_n} of the other neurons i_1, ..., i_n which are connected to j, and transforms them, in consideration of the connecting weights w_{i,j}, into the network input net_j that can be further processed by the activation function
• The network input is the result of the propagation function

THRESHOLD FUNCTION
• Neurons get activated if the network input exceeds their threshold value
• Definition: Let j be a neuron. The threshold value Θ_j is uniquely assigned to j and marks the position of the maximum gradient value of the activation function (basically a switching value)

ACTIVATION FUNCTION
• Definition: Let j be a neuron. The activation function is defined as
  a_j(t) = f_act(net_j(t), a_j(t − 1), Θ_j)
• This transforms the network input and the previous activation state into a new activation state.

FURTHER PROPERTIES OF THE ACTIVATION FUNCTION
• It is advisable that the activation function f be a sigmoid function
• Alternatively, the parameters may be assigned to the neuron nonlinearity itself, i.e. belong to the very definition of the activation function; such is the case when the function f is a radial basis function (RBF) or a wavelet.
For instance, the output of a Gaussian RBF is given by
  y = exp[ − Σ_{i=1}^{n} (x_i − w_i)^2 / (2 · w_{n+1}^2) ]
• where the {w_i} are the coordinates of the center of the Gaussian and w_{n+1} is its standard deviation
• The main difference between the two above categories of neurons is that RBFs and wavelets are local nonlinearities, which vanish asymptotically in all directions of input space, whereas neurons that have a potential and a sigmoid nonlinearity have an infinite range of influence along the direction defined by v = 0

OPTIMAL CONTROL THEORY
• Zermelo's problem and the handout
• Example problem

PART 3: PARTICULAR TYPES OF NEURAL NETWORK ARCHITECTURES

TRANSFER FROM LOGICAL BASIS TO THRESHOLD BASIS
• In the case of neural computers, the logical basis of the computer system in the simplest case is the basis {a·x, sign}. This basis maximally corresponds to the logical basis of the major solved problems. The neural computer is a maximally parallelized system for a given algorithmic kernel implementation.
• The number of operation cycles in the problem-solving process (the number of adjustment cycles for optimization of the secondary functional) in the neural computer is determined by the physical entity and the complexity of the problem

FERMI OR LOGISTIC EQUATION AND TANH(X)
• Fermi or logistic function:
  f(x) = 1 / (1 + e^(−x))
• which maps to the range of values (0, 1)
• Hyperbolic tangent: tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)), which maps to (−1, 1)

NEURAL NETWORK WITH DIRECT CONNECTIONS

NEURAL NETWORKS WITH CROSS CONNECTIONS

NEURAL NETWORKS WITH ORDERED BACKWARD CONNECTIONS

NEURAL NETWORKS WITH AMORPHOUS BACKWARD CONNECTIONS

MULTILAYER NEURAL NETWORKS WITH SEQUENTIAL CONNECTIONS

MULTILAYER NEURAL NETWORK

FEEDFORWARD NETWORKS
• A feedforward neural network is a nonlinear function of its inputs which is the composition of the functions of its neurons
• A feedforward network with n inputs, N_c hidden neurons and N_o output neurons computes N_o nonlinear functions of its n input variables as compositions of the N_c functions computed by the hidden neurons
• Feedforward networks are static: if the input is constant, so is the output
• Feedforward multilayer networks with sigmoid nonlinearities are often termed multilayer perceptrons, or MLPs

FEEDFORWARD NETWORK DIAGRAM

COMPLETELY LINKED NETWORKS (CLIQUE)
• Completely linked networks permit connections between all neurons except for direct recurrences. Furthermore, the connections must be symmetric. Every neuron can become an input neuron. (Clique)

DIRECTED TERMS
• If the function to be computed by the feedforward neural network is thought to have a significant linear component, it may be useful to add linear terms (called directed terms) to the above structure

RECURRENT NETWORKS
• General form: neural networks that include cycles. Since the output of a neuron cannot be a function of itself at the same instant of time, we must explicitly take time into account: the output can be a function of its past values. These are considered discrete-time systems.
• Each connection of a recurrent neural network is assigned a delay value (possibly equal to zero) in addition to being assigned a weight, as in feedforward networks.
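The feedforward computation described above, where each neuron forms the potential v = w_0 + Σ w_i·x_i and passes it through a sigmoid such as the Fermi/logistic function, and the network output is a composition of the hidden-neuron functions, can be sketched in a few lines of Python. This is a minimal illustration only: the layer sizes and weight values below are arbitrary, not taken from the text.

```python
import math

def logistic(x):
    # Fermi/logistic function: maps any real input into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(inputs, weights, bias):
    # Potential v = w_0 + sum(w_i * x_i), passed through the activation
    v = bias + sum(w * x for w, x in zip(weights, inputs))
    return logistic(v)

def feedforward(inputs, hidden_layer, output_layer):
    # Each layer is a list of (weights, bias) pairs, one per neuron.
    # The network output is the composition of the hidden-neuron
    # functions with the output-neuron functions.
    hidden = [neuron_output(inputs, w, b) for w, b in hidden_layer]
    return [neuron_output(hidden, w, b) for w, b in output_layer]

# Tiny example network: 2 inputs, 2 hidden neurons, 1 output neuron
# (weights chosen arbitrarily for illustration)
hidden_layer = [([0.5, -0.4], 0.1), ([0.3, 0.8], -0.2)]
output_layer = [([1.0, -1.0], 0.0)]
y = feedforward([1.0, 0.5], hidden_layer, output_layer)
```

Note how the static property mentioned above is visible here: calling feedforward twice with the same constant input always yields the same output, since there are no delayed feedback connections.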
CANONICAL FORM OF RECURRENT NETWORKS
• Governed by recurrent discrete-time equations. The general mathematical description of a linear system is the state equations:
  x(k) = A x(k − 1) + B u(k − 1)
  y(k) = C x(k − 1) + D u(k − 1)
• where x(k) is the state vector at time kT, u(k) is the input vector at time kT, y(k) is the output vector at time kT, and A, B, C, D are matrices.
• Property: Any recurrent neural network, however complex, can be cast into a canonical form, made of a feedforward neural network some outputs of which (the state outputs) are fed back to the inputs through unit delays.

CANONICAL FORM OF RECURRENT NETWORK DIAGRAM

THE ORDER OF NEURAL NETWORKS
• Synchronous activation – all neurons change their values synchronously, i.e. they simultaneously calculate network inputs, activations and outputs, and pass them on. Closest to biology, most generic, and usable with networks of arbitrary topology.
• Random order – a neuron i is randomly chosen and its net_i, a_i and o_i are updated. For n neurons, a cycle is the n-fold execution of this step. Not always useful.
• Random permutation – each neuron is chosen exactly once, but in random order, during one cycle. Used rarely because it is generally useless and time-consuming.
• Topological order of activation – the neurons are updated during one cycle according to a fixed order determined by the network topology.

WHEN TO USE NEURAL NETWORKS
• The fundamental property of neural networks with supervised training is the parsimonious approximation property, i.e. their ability to approximate any sufficiently regular function with arbitrary accuracy. Therefore, neural networks may be advantageous in any application that requires finding, in a machine learning framework, a nonlinear relation between numerical data.
• To do so, make sure that:
• 1) a nonlinear model is necessary
• 2) a neural network is preferable to, for instance, a polynomial approximation, i.e.
when the number of variables is large (greater than or equal to 3)

PART 4: FUNDAMENTALS ON LEARNING AND TRAINING SAMPLES

THEORETICALLY, A NEURAL NETWORK COULD LEARN BY
• Developing new connections
• Deleting existing connections
• Changing connecting weights
• Changing the threshold values of neurons
• Varying one or more of the three neuron functions (activation, propagation, output)
• Developing new neurons
• Deleting neurons
• Changing connecting weights is the most common procedure.

DIFFERENT TYPES OF TRAINING
• Unsupervised learning – the training set consists only of input patterns; the network tries, by itself, to detect similarities and to generate pattern classes
• Reinforcement learning – the training set consists of input patterns; after completion of a sequence, a value is returned to the network indicating whether the result was right or wrong and, possibly, how right or wrong it was
• Supervised learning – the training set consists of input patterns together with their correct results, so that the network can receive a precise error vector

SUPERVISED LEARNING STEPS
• Enter the input pattern
• Forward propagation of the input by the network; generation of the output
• Compare the output with the desired output, providing the error vector
• Corrections of the network are calculated based on the error vector
• Corrections are applied

ERROR VECTOR
• Usually determined by the root mean square error (RMSE)
• Does not always guarantee a global minimum; may only find a local minimum
• To calculate the RMSE:
• 1) square the error of each data point
• 2) sum the squared error terms
• 3) divide by the number of data values
• 4) take the square root of that value

PART 5: APPLICATIONS OF NEURAL NETWORK THEORY AND OPEN PROBLEMS

OPEN PROBLEMS
• Identifying whether a neural network will converge in finite time
• Training a neural network to distinguish local versus global minima
• Neural modularity

APPLICATIONS OF NEURAL NETWORK THEORY
• Traveling Salesman problem
• Image compression
• Character recognition
• Optimal control problems

PART 6: HOMEWORK
• 1) Show that the following derivatives can be expressed in terms of the respective functions themselves:
• Fermi function: f'(x) = f(x) · (1 − f(x))
• Hyperbolic tangent: tanh'(x) = 1 − tanh²(x)

OPTIMAL CONTROL PROBLEM
• 2) min_u ∫₀ᵀ (1 + u²) dt
• subject to x'(t) = u(t), x(0) = 0, x(T) = B

FIND THE RMSE OF THE BELOW DATA SET

Sample:      1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
Data:        1   4   3   0   9  11  12   6   8  20  11  11   2   0   4   9   5  13  15   1
Estimation: -1   7   1  -2   7  11  13   7   8  17   9  13   2   0   5   9   5  14  17   0

PART 7: BIBLIOGRAPHY

WORKS CITED
• Dreyfus, G. Neural Networks: Methodology and Applications. Berlin: Springer, 2005. Print.
• Galushkin, A. I. Neural Networks Theory. Berlin: Springer, 2007. Print.
• Kriesel, David. A Brief Introduction to Neural Networks. Manuscript, n.d. Web. 28 Mar. 2016.
• Lenhart, Suzanne, and John T. Workman. Optimal Control Applied to Biological Models. Boca Raton: Chapman & Hall/CRC, 2007. Print.
• Ripley, Brian D. Pattern Recognition and Neural Networks. Cambridge: Cambridge UP, 1996. Print.
• Rojas, Raúl. Neural Networks: A Systematic Introduction. Berlin: Springer-Verlag, 1996. Print.
• Wasserman, Philip D. Neural Computing: Theory and Practice. New York: Van Nostrand Reinhold, 1989. Print.
• https://www.researchgate.net/post/What_are_the_most_important_open_problems_in_the_field_of_artificial_neural_networks_for_the_next_ten_years_and_why