Artificial Neural Network Lecture Note: 6

Multi-class classification
• Multi-class classification can be converted to multiple logistic regressions using ONE-vs-ALL.

What actions can the human brain (BNN) do?
§ Humans can do Classification
 o Ex: when a child sees a pen, he can say that this pen belongs to the class of pens.
§ Humans can do Clustering (grouping similar patterns together)
 o Ex: a child can say that some things belong in the same group because they are similar.
§ Humans can do Mapping (pattern association)
 o Associate a pattern with itself (storing information).
 o Ex: when studying, we save information by associating it with itself.
Note: a child must be trained before asking him to classify something or to cluster similar patterns.
• There is a part of the human brain that allows humans to perform the previous actions, called the Biological Neural Network (BNN).
• The main component of the human nervous system is the neuron cell.
• A neuron can be considered a small processor and memory in the human brain.

Human brain
• The brain is a highly complex, non-linear, parallel information-processing system.
• It performs tasks like pattern recognition, perception, and motor control many times faster than the fastest digital computers.
• It is characterized by being:
 – Robust and fault tolerant
 – Flexible – can adjust to a new environment by learning
 – Able to deal with fuzzy, probabilistic, noisy, or inconsistent information
 – Highly parallel
 – Small, compact, and requiring little power
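The one-vs-all idea from the top of this note can be sketched in code: train one binary logistic-regression classifier per class (class k vs. the rest) and predict the class whose classifier reports the highest probability. This is a minimal illustrative sketch; the tiny dataset, learning rate, and epoch count below are assumptions for demonstration, not values from the note.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights w and bias b of a binary logistic regression by per-sample gradient descent."""
    n = len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi                                   # gradient of the log-loss
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def one_vs_all_fit(X, labels):
    """Train one binary classifier per class: class k vs. all other classes."""
    classes = sorted(set(labels))
    return {k: train_logistic(X, [1 if l == k else 0 for l in labels]) for k in classes}

def one_vs_all_predict(models, x):
    """Pick the class whose classifier gives the highest probability."""
    return max(models, key=lambda k: sigmoid(
        sum(wj * xj for wj, xj in zip(models[k][0], x)) + models[k][1]))

# Hypothetical 2-D points in three well-separated clusters (classes 0, 1, 2).
X = [(0, 0), (0, 1), (5, 5), (5, 6), (0, 5), (1, 5)]
labels = [0, 0, 1, 1, 2, 2]
models = one_vs_all_fit(X, labels)
print(one_vs_all_predict(models, (0.2, 0.3)))
```

Each binary classifier only has to draw one linear decision boundary, which is why a multi-class problem reduces to several two-class (linearly separable) sub-problems.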
Human Brain VS Von Neumann Computer

| | Human brain | Von Neumann computer |
|---|---|---|
| # elements | 10^10 – 10^12 neurons | 10^7 – 10^8 transistors |
| # connections / element | 10^4 | 50 |
| switching frequency | 10^3 Hz | 10^9 Hz |
| energy / operation | 10^-16 Joule | 10^-6 Joule |
| power consumption | 10 Watt | 100 – 500 Watt |
| reliability of elements | low | reasonable |
| reliability of system | high | reasonable |
| data representation | analog | digital |
| memory localization | distributed | localized |
| control | distributed | localized |
| processing | parallel | sequential |
| skill acquisition | learning | programming |

Biological neuron structure
• The brain consists of approximately 10^11 elements called neurons.
• Neurons communicate through a network of long fibers called axons.
• Each axon splits up into a series of smaller fibers, which communicate with other neurons via junctions called synapses; the synapses connect to small fibers called dendrites attached to the main body of the neuron (the soma).
• The basic computational unit is the neuron:
 ► Dendrites (inputs, 1 to 10^4 per neuron)
 ► Soma (cell body)
 ► Axon (output)
 ► Synapses

How neurons work
• A synapse acts like a one-way valve.
• An electrical signal is generated by the neuron, passes down the axon, and is received by the synapses that join onto other neurons' dendrites.
• The electrical signal causes the release of transmitter chemicals, which flow across a small gap in the synapse (the synaptic cleft).
• The chemicals can have an excitatory effect on the receiving neuron (making it more likely to fire) or an inhibitory effect (making it less likely to fire).
• The inhibitory and excitatory inputs to a particular neuron are summed; if this value exceeds the neuron's threshold, the neuron fires, otherwise it does not.

Learning in networks of neurons
• Knowledge is represented in neural networks by the strength of the synaptic connections between neurons (hence "connectionism").
• Learning in neural networks is accomplished by adjusting the synaptic strengths (weights).
• There are three primary categories of neural network learning algorithms:
 1. Supervised — exemplar pairs of inputs and (known, labeled) target outputs are used for training.
 2. Reinforcement — a single good/bad training signal is used for training.
 3. Unsupervised — no training signal; self-organization and clustering are produced by the "training".

Artificial Neural Network (ANN)
► An artificial neural network is an information-processing system that has certain performance characteristics in common with biological neural networks.
► ANNs have been developed as generalizations of mathematical models of human cognition or neural biology.

BNN VS ANN

| Biological neural network (BNN) | Artificial neural network (ANN) |
|---|---|
| Soma | Neuron |
| Dendrite | Input |
| Axon | Output |
| Strength of connection between the neurons | Weight (value for the specific connection) |
| Synapse | Weight |
| Learning the solution to a problem | Changing the connection weights |
| Examples | Training data |

§ [Figure of a neuron]

How ANN Works?
1) Information processing occurs at many simple elements called neurons.
2) Signals are passed between neurons over connection links.
3) Each connection link has an associated weight, which multiplies the signal transmitted.
4) The net input is calculated as the weighted sum of the input signals.
5) Each neuron applies an activation (transfer) function to its net input (the sum of weighted input signals) to determine its output signal.
6) Each neuron has a single threshold value.
7) An output signal is either discrete (e.g., 0 or 1) or a real-valued number (e.g., between 0 and 1).

y = f(net input), where f is the activation function.

Adding Bias
• A linear neuron is a more flexible model if we include a bias.
• A bias unit can be thought of as a unit that always has an output value of 1, connected to the hidden and output layer units via modifiable weights.
• It sometimes helps convergence of the weights to an acceptable solution.
• A bias is exactly equivalent to a weight on an extra input line that always has an activity of 1.

Examples 1–3 (worked in the accompanying figures): calculate the net input (e.g., net input = 3), then apply an activation function — a step function in Examples 1 and 2, a sigmoid function in Example 3 — to obtain the output.

Characteristics of ANN
1. Architecture (structure): the pattern of nodes and the connections between them.
2. Training (learning) algorithm: the method of determining the weights on the connections.
3. Activation function: the function that produces an output based on the input values received by a node.

ý Architecture: the arrangement of neurons into layers and the connection pattern between layers
 1- Feed-forward NN
  • Single layer
  • Multi-layer
 2- Feed-back (recurrent) NN
 3- Associative networks
ý Training algorithm: setting the values of the weights
 1. Supervised training
 2. Unsupervised training
 3. Reinforcement training
ý Activation function
 1. Identity function
 2. Binary step function
 3. Bipolar step function
 4. Binary sigmoid
 5. Bipolar sigmoid

Architecture
1- Feed-Forward NN
 - The neurons are arranged in separate layers (input – hidden – output).
 - There are no connections between neurons in the same layer.
 - The neurons in one layer receive inputs from the previous layer.
 - The neurons in one layer deliver their output to the next layer.
 - Signals travel one way only, from input to output; there is no feedback.
 - Associates input with output.
 - The connections are unidirectional (hierarchical).

A- Single-layer network: has one layer of connection weights.
 ► An input layer of source neurons is connected to the neurons of the output layer.
 ► Input neurons are fully connected to output units but are not connected to other input units.
 ► Output neurons are not connected to other output units.

B- Multi-layer network:
 ► A net with one or more layers (or levels) of neurons (hidden neurons) between the input units and the output layer.
 ► There is a layer of weights between each two adjacent levels of units (input, hidden, or output).
 ► Multilayer nets can solve more complicated problems than single-layer nets, but training may be more difficult.

2- Feed-back (Recurrent) NN:
 ► Some connections are present from a layer back to previous layers.
 ► More biologically realistic.
 ► Signals travel in both directions by introducing loops.
 ► Powerful and complicated.
 ► Dynamic network.
 ► A recurrent network has at least one feedback loop.

3- Associative Network:
 ► There is no hierarchical arrangement.
 ► The connections can be bidirectional.

Training – Learning algorithms
1- Supervised Learning
• The network is presented with inputs together with the target outputs (teacher signal).
• The neural network then tries to produce an output as close as possible to the target signal by adjusting the values of its internal weights.
• The most common supervised learning approach is the "error correction method", using methods such as:
 – Least Mean Square (LMS)
 – Back-propagation

2- Unsupervised Learning
• There is no teacher (target signal) from outside; the network adjusts its weights in response to the input patterns only.
• Competitive learning: the neurons take part in a competition for each input.
• The winner of the competition, and sometimes some other neurons, are allowed to change their weights:
 – In simple competitive learning, only the winner is allowed to learn (change its weights).
 – In self-organizing maps, other neurons in the neighborhood of the winner may also learn.

3- Reinforcement Training
• A generalization of supervised learning.
• Uses some random search strategy until the correct answer is found.
• The teacher scores the performance on the training examples.
• Based on actions; uses the performance score to change weights randomly.

| | 1- Supervised | 2- Unsupervised |
|---|---|---|
| Definition | Each training pattern is associated with a target output vector | Training patterns are not associated with target output vectors |
| Data | (input, desired output) | (input only) |
| Problems | Classification, regression, pattern recognition | Clustering, data reduction |
| NN models | Perceptron, Hebb | Self-organizing maps (SOM), Hopfield |

Activation functions
1- Identity function (linear transfer function)
 • Performs no input squashing: output = input.
 • Not very interesting on its own.

2- Binary step function (threshold function, hard-limit transfer function, unipolar)
 • Converts the net input, a continuously valued variable, to a binary output (1 or 0).
 • The binary step function is also known as the Heaviside function.
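The identity and binary step functions just described can be sketched directly, together with the net-input rule from the "How ANN Works" section. The three-input neuron below is a hypothetical example chosen so that the net input comes out to 3, echoing the note's worked examples:

```python
def identity(net):
    """Identity (linear) activation: performs no squashing, output = input."""
    return net

def binary_step(net, theta=0.0):
    """Binary (Heaviside) step: output 1 if the net input reaches threshold theta, else 0."""
    return 1 if net >= theta else 0

def net_input(inputs, weights, bias=0.0):
    """Net input = bias + weighted sum of the input signals."""
    return bias + sum(x * w for x, w in zip(inputs, weights))

# Hypothetical neuron: three inputs of 1 with unit weights, so net input = 3.
net = net_input([1, 1, 1], [1.0, 1.0, 1.0])
print(identity(net))                 # 3.0
print(binary_step(net, theta=2.0))   # 1, since 3.0 >= 2.0
```

The bias term here is the "extra input line that always has an activity of 1" described in the Adding Bias section: passing `bias=b` is equivalent to appending an input of 1 with weight b.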
3- Bipolar step function
 • output = 1 if net ≥ θ
 • output = −1 if net < θ

4- Binary sigmoid (log-sigmoid)
 • Squashes the neuron's pre-activation between 0 and 1.
 • Always positive.
 • Bounded.
 • Strictly increasing.

5- Hyperbolic tangent ("tanh")
 • Squashes the neuron's pre-activation between −1 and 1.
 • Can be positive or negative.
 • Bounded.
 • Strictly increasing.

6- Rectified linear activation function (ReLU)
 • Bounded below by 0 (always non-negative).
 • Not upper bounded.
 • Monotonically non-decreasing (flat at 0 for negative inputs, linear for positive inputs).

7- Radial basis activation function
 [Figure of the radial basis function]

Artificial Neural Network Development Process
 [Figure of the development process]

Linearly Separable Function
§ The function is capable of assigning all inputs to two categories.
§ Used when the number of classes is 2.
§ Decision boundary: the line that partitions the plane into two decision regions.
§ The decision boundary has the equation

 b + Σ_{i=1}^{n} x_i w_i = 0

 o Positive region: the decision region for output 1, where b + x1·w1 + x2·w2 ≥ 0.
 o Negative region: the decision region for output −1, where b + x1·w1 + x2·w2 < 0.

§ If two classes of patterns can be separated by a decision boundary, they are said to be linearly separable.
§ If no such decision boundary exists, the two classes are said to be linearly inseparable (non-linearly separable).
§ Linearly inseparable problems cannot be solved by the simple network; a more sophisticated architecture is needed.

Capacity of a single neuron
• Can do binary classification (two classes); also known as a logistic regression classifier:
 o If the output is greater than 0.5, predict class 1.
 o Otherwise, predict class 0.
• Can solve linearly separable problems.
• Cannot solve non-linearly separable problems.

The First Artificial Neuron (McCulloch-Pitts network)
• The McCulloch-Pitts neuron is perhaps the earliest artificial neuron.
• The neuron has binary inputs (0 or 1) labelled x_i, where i = 1, 2, …, n.
• The activation (output) of a McCulloch-Pitts neuron is binary.
• The neuron either fires (has an activation of 1) or does not fire (has an activation of 0).
• Each neuron has a fixed threshold value T such that if the net input to the neuron is greater than the threshold, the neuron fires.
• The activation function is the binary step function.

Architecture
• In general, a McCulloch-Pitts neuron Y may receive signals from any number of other neurons.
• Each connection path is either excitatory, with weight w > 0, or inhibitory, with weight w < 0.
• All excitatory connections into a particular neuron have the same weight.
• The output of each neuron follows the binary step rule above.
• [Figure of a McCulloch-Pitts neuron]

Algorithm
• The weights for a McCulloch-Pitts neuron are set by analysis, together with the threshold for the neuron's activation function, rather than learned.
• The analysis is used to determine the values of the weights and the threshold.
• Logic functions are used as simple examples for a number of neural nets.

Examples (worked in the accompanying figures):
 Example 1: AND function
 Example 2: OR function
 Example 3: AND NOT function
 Example 4: NAND function
 Example 5: XOR function

ý Applications of Neural Networks
☺ Financial modelling – predicting stocks, currency exchange rates
☺ Other time series prediction – climate, weather
☺ Computer games – intelligent agents, backgammon
☺ Control systems – autonomous adaptable robotics
☺ Pattern recognition – speech recognition, handwriting recognition
☺ Data analysis – data compression, data mining
☺ Noise reduction – ECG noise reduction
☺ Bioinformatics – DNA sequencing

ý Advantages of Neural Networks
o ANNs are powerful computation systems consisting of many neurons.
o Generalization:
 § can learn from training data and generalize to new data,
 § using responses to prior input patterns to determine the response to a novel input.
o Fault tolerance:
 § able to recognize a pattern that contains some noise,
 § still works when part of the net fails.
o Massively parallel processing:
 § can process more than one pattern at the same time using the same set of weights.
o Distributed memory representation.
o Adaptability:
 § recognition ability increases with more training.
o Low energy consumption.
o Useful for brain modeling.
o Used for pattern recognition.
o Able to learn any complex non-linear mapping.
o Learning instead of programming.
o Robust: can deal with incomplete and/or noisy data.

ý Disadvantages of Neural Networks
o Need training to operate.
o High processing time for training; the learning process can be very time consuming.
o Require specialized hardware and software.
o Lack of understanding of the behavior of the trained network.
o Convergence is not guaranteed (reaching a solution is not guaranteed).
o No mathematical proof for the learning process.
o Difficult to design: there are no clear design rules for arbitrary applications.
o Can over-fit the training data, becoming useless for generalization.

ý Types of Problems Solved by NNs
☺ Classification: determine to which of a discrete number of classes a given input case belongs.
☺ Regression: predict the value of a (usually) continuous variable (e.g., weather).
☺ Time series: predict the value of variables from earlier values of the same or other variables.
☺ Clustering (natural language processing, data mining).
☺ Control (automotive control, robotics).
☺ Function approximation (modelling): modelling of highly non-linear industrial processes, financial market prediction.

ý Who is concerned with NNs?
• Computer scientists want to find out about the properties of non-symbolic information processing with neural nets and about learning systems in general.
• Statisticians use neural nets as flexible, non-linear regression and classification models.
• Engineers of many kinds exploit the capabilities of neural networks in many areas, such as signal processing and automatic control.
• Cognitive scientists view neural networks as a possible apparatus for describing models of thinking and consciousness (high-level brain function).
• Neuro-physiologists use neural networks to describe and explore medium-level brain function (e.g. memory, the sensory system, motorics).
• Physicists use neural networks to model phenomena in statistical mechanics and for many other tasks.
• Biologists use neural networks to interpret nucleotide sequences.
• Philosophers and some other people may also be interested in neural networks for various reasons.
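The McCulloch-Pitts logic-function examples earlier in this note (AND, OR, AND NOT, XOR) can be sketched as code. The specific weights and thresholds below are common textbook choices, assumed here rather than taken from the note's figures; any weights satisfying the firing rule would do equally well:

```python
def mcculloch_pitts(inputs, weights, threshold):
    """McCulloch-Pitts neuron: fires (output 1) iff the net input reaches the fixed threshold T."""
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net >= threshold else 0

# Assumed textbook weight/threshold choices for each logic function:
def AND(x1, x2):
    return mcculloch_pitts([x1, x2], [1, 1], threshold=2)

def OR(x1, x2):
    return mcculloch_pitts([x1, x2], [2, 2], threshold=2)

def AND_NOT(x1, x2):
    # x1 AND (NOT x2): excitatory weight on x1, inhibitory weight on x2.
    return mcculloch_pitts([x1, x2], [2, -1], threshold=2)

def XOR(x1, x2):
    # XOR is not linearly separable, so a single neuron cannot compute it;
    # it needs two layers: x1 XOR x2 = (x1 AND NOT x2) OR (x2 AND NOT x1).
    return OR(AND_NOT(x1, x2), AND_NOT(x2, x1))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b), "XOR:", XOR(a, b))
```

The XOR definition illustrates why the XOR example matters in the note: it is the classic linearly inseparable problem that forces the move from single-layer to multi-layer networks.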