Cognitive Neuroscience and Embodied Intelligence
Universal Learning Models
Based on courses taught by Prof. Randall O'Reilly, University of Colorado, and Prof. Włodzisław Duch, Uniwersytet Mikołaja Kopernika, and on http://wikipedia.org/
http://grey.colorado.edu/CompCogNeuro/index.php/CECN_CU_Boulder_OReilly
http://grey.colorado.edu/CompCogNeuro/index.php/Main_Page
Janusz A. Starzyk
EE141

Task learning
We want to combine Hebbian learning with error-correction learning, hidden units, and biologically justified models.
Hebbian networks model states of the world but not perception-action mappings.
Error correction can learn a mapping. Unfortunately, the delta rule works only for output units and not for hidden units, because every unit it trains has to be given a target.
Backpropagation of errors can train hidden units, but there is no good biological justification for this method.
The idea of backpropagation is simple, but the detailed algorithm requires many calculations.
Main idea: we look for the minimum of an error function that measures the difference between the desired behavior and the behavior realized by the network.

Error function
$E(w)$ is the error function, dependent on all network parameters $w$; it is the sum of the errors $E(X;w)$ over all input patterns (images) $X$.
$o_k(X;w)$: value reached on output $k$ of the network for pattern $X$.
$t_k(X)$: value desired on output $k$ of the network for pattern $X$.
For one pattern $X$ and one parameter $w$:
$E(X;w) = \| t(X) - o(X;w) \|^2$
An error value of 0 is not always attainable: the network may not have enough parameters to learn the desired behavior, so we can only aim for the smallest error.
At the minimum of $E(X;w)$ with respect to the parameter $w$, the derivative vanishes: $dE(X;w)/dw = 0$.
For many parameters we have all the derivatives $\partial E/\partial w_i$, that is, the gradient.

Error propagation
The delta rule minimizes the error for one neuron, e.g. an output neuron, which is reached by signals $s_i$:
$\Delta w_{ik} = \varepsilon\,(t_k - o_k)\,s_i$
What signals should we take for hidden neurons?
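As a concrete reference point for the delta rule stated above, the following is a minimal sketch for a single linear output neuron. The function names, the toy patterns, the learning rate and the use of a constant input as a bias signal are all illustrative assumptions, not part of the course material.

```python
import numpy as np

def delta_rule_step(w, s, t, lr=0.1):
    """Apply one delta-rule update for a single linear output neuron."""
    o = float(np.dot(w, s))        # output o_k for input signals s_i
    delta = t - o                  # signed error (t_k - o_k)
    w_new = w + lr * delta * s     # Delta w_ik = epsilon * (t_k - o_k) * s_i
    return w_new, delta ** 2       # new weights, squared error before the update

def train(samples, n_inputs, lr=0.1, epochs=200):
    """Repeat the update over all patterns for a fixed number of epochs."""
    w = np.zeros(n_inputs)
    for _ in range(epochs):
        for s, t in samples:
            w, _ = delta_rule_step(w, np.asarray(s, dtype=float), t, lr)
    return w

# Two toy patterns (hypothetical); the last input is a constant 1 used as a bias.
samples = [([1.0, 0.0, 1.0], 1.0), ([0.0, 1.0, 1.0], 0.0)]
w = train(samples, n_inputs=3)
outputs = [float(np.dot(w, s)) for s, _ in samples]
```

Each step moves the weights along the negative gradient of the squared error for one pattern, which is why the rule needs a target for every unit it trains.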
First we feed signals into the network and compute the activations and output signals $h$ of the neurons, through all layers, up to the outputs $o_k$ (forward step).
We calculate the errors $\delta_k = (t_k - o_k)$ and the corrections for the output neurons, $\Delta w_{ik} = \varepsilon\,\delta_k\,h_i$.
Error for hidden neurons (backward step, backpropagation of error):
$\delta_j = \left(\sum_k w_{jk}\,\delta_k\right) h_j (1 - h_j)$
The corresponding weight correction again multiplies $\delta_j$ by the learning rate $\varepsilon$ and the incoming signal.
The strongest correction goes to undecided units, those with activation near 0.5, where $h_j(1 - h_j)$ is largest.

GeneRec
Most models used in psychology train multilayer perceptrons with variants of backpropagation (in this way one can learn any function), but the idea of transferring information about errors has no biological justification.
GeneRec (General Recirculation, O'Reilly 1996):
Bi-directional signal propagation; the weights may be asymmetric, $w_{kj} \neq w_{jk}$.
Minus phase: the response of the network to the input activation $x^-$ gives the output $y^-$.
Plus phase: the desired result $y^+$ is observed and propagated back toward the input, giving $x^+$.
The change in weights requires information about the signals from both phases.

GeneRec - learning
The learning rule agrees with the delta rule:
$\Delta w_{ij} = \varepsilon\,(y_j^+ - y_j^-)\,x_i^-$
In comparison with backpropagation, the difference of signals $[y^+ - y^-]$ replaces the aggregate error; (the difference of signals) ~ (the difference of activations) * (the derivative of the activation function), thus it is a gradient rule.
For the biases $b$ the input is $x_i = 1$, so:
$\Delta b_j = \varepsilon\,(y_j^+ - y_j^-)$
Bi-directional information transfer is almost simultaneous; it is responsible for the formation of attractor states, constraint satisfaction, and pattern completion.
The P300 wave, which appears 300 msec after activation, reflects expectations resulting from external activation.
Errors are the result of activity in the whole network; we get slightly better results by taking the average $[x^+ + x^-]/2$ and retaining weight symmetry, which gives the CHL rule (Contrastive Hebbian Learning):
$\Delta w_{ij} = \varepsilon\,(x_i^+ y_j^+ - x_i^- y_j^-)$

Two phases
Where does the error for the correction of synaptic connections come from?
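The GeneRec and CHL updates described above can be written out directly from minus-phase and plus-phase activations. The sketch below is illustrative only: the function names, array shapes (`dw[i, j]` connects sender `i` to receiver `j`) and random test activations are assumptions. It also checks numerically that averaging the two phases and adding the reciprocal (symmetric) update reproduces the CHL form.

```python
import numpy as np

def generec_dw(x_minus, y_minus, y_plus, lr=0.1):
    """Basic GeneRec: dw_ij = lr * (y_j^+ - y_j^-) * x_i^-."""
    return lr * np.outer(x_minus, y_plus - y_minus)

def chl_dw(x_minus, y_minus, x_plus, y_plus, lr=0.1):
    """CHL: dw_ij = lr * (x_i^+ y_j^+ - x_i^- y_j^-)."""
    return lr * (np.outer(x_plus, y_plus) - np.outer(x_minus, y_minus))

def midpoint_symmetric_dw(x_minus, y_minus, x_plus, y_plus, lr=0.1):
    """GeneRec with phase-averaged senders, plus the reciprocal update."""
    x_mid = 0.5 * (x_plus + x_minus)
    y_mid = 0.5 * (y_plus + y_minus)
    forward = np.outer(x_mid, y_plus - y_minus)    # update for w_ij
    backward = np.outer(x_plus - x_minus, y_mid)   # update for w_ji, transposed
    return lr * (forward + backward)

# Hypothetical activations for 4 sending and 3 receiving units.
rng = np.random.default_rng(0)
x_minus, x_plus = rng.random(4), rng.random(4)
y_minus, y_plus = rng.random(3), rng.random(3)

dw_generec = generec_dw(x_minus, y_minus, y_plus)
dw_chl = chl_dw(x_minus, y_minus, x_plus, y_plus)
dw_mid = midpoint_symmetric_dw(x_minus, y_minus, x_plus, y_plus)
```

Both rules use only activations available at the two ends of a connection in the two phases, which is the property that distinguishes them from explicit error backpropagation.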
The layer on the right side = the middle layer at time t+1, e.g.:
a) word pronunciation: external correction of the action;
b) external expectations and someone else's pronunciation;
c) awaiting the results of an action and observing them;
d) reconstruction (awaiting the input).

GeneRec properties
Hebbian learning creates a model of the world, remembering correlations, but it is not capable of learning task execution.
Hidden layers allow a problem to be transformed, and error correction permits learning of difficult tasks, i.e. the relationships between inputs and outputs.
The combination of Hebbian learning (correlations x y) and error-based learning can learn everything in a biologically plausible manner: CHL leads to symmetry, an approximate symmetry suffices, and connections are generally bidirectional.
Err = CHL in the table.
[Table not recovered from the slide.]
Lack of Ca2+ = no learning; little Ca2+ = LTD; much Ca2+ = LTP.
LTD: unfulfilled expectations, only the - phase, no reinforcement from the + phase.

Combination of Hebb + errors
It is good to combine Hebbian learning and CHL error correction.
CHL is like socialism: it tries to correct errors of the whole, limits the motivation of individual units, common responsibility, low effectiveness, planned activity.
Hebbian learning is like capitalism: based on greed, local interests, individualism, efficacy of activity, lack of monitoring of the whole.

                  Advantages                Disadvantages
Hebb (Local)      Autonomous, Reliable      Narrow, Greedy
Error (Remote)    Purposeful, Cooperative   Interdependent, Lazy

Combination of Hebb + errors
Correlations and errors: combination.
Additionally, inhibition within layers is necessary: it creates economical internal representations, units compete with each other, only the best remain and become specialized, and it makes self-organized learning possible.

Simulation of a difficult problem
Genrec.proj.gz, chapt. 5.9
3 hidden units.
Learning is stopped after 5 epochs without error.
Errors during learning show substantial fluctuations: networks with recurrence are sensitive to small changes in weights and explore different solutions.
Compare with learning easy and difficult tasks using only Hebb.

Inhibitory competition as a constraint
Inhibition:
Leads to sparse distributed representations (many representations, only some of which are useful in a concrete situation)
Competition and specialization: survival of the best adapted
Self-organized learning
Often more important than Hebbian learning
Inhibition was also used in the mixture-of-experts framework: gating units are subject to WTA competition and control the outputs of the experts.

Comparison of weight change in learning
View of hidden layer weights in Hebbian learning: the weights are organized with reference to particular inputs.
View of hidden layer weights in error correction learning: the weights seem fairly random when compared with Hebbian learning.

Comparison of weight change in learning
Charts comparing a) training errors and b) number of cycles as functions of the number of training epochs for three different learning methods:
Hebbian (Pure Hebb)
Error correction (Pure Err)
Combination (Hebb & Err), which attained the best results

Full Leabra model
6 principles of intelligent system construction:
1. Biological realism
2. Distributed representations
3. Inhibitory competition
4. Bidirectional activation propagation
5. Error-driven learning
6. Hebbian learning
Inhibition within layers, Hebbian learning + error correction for weights between layers.

Generalization in attractor networks
GeneRec by itself does not generalize well.
Simulation: Ch6, model_and_task.proj.
learn_rule = PURE_ERR, PURE_HEBB or HEBB_AND_ERR
35 training patterns; testing every 5 epochs on the remaining 10 patterns.
Learning by error correction alone does not give good results.
The parameter hebb controls how much CHL and how much Hebbian correlation learning is used.
Pure_err implements only CHL.
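The role of the hebb parameter can be illustrated with a small sketch that blends an error-driven CHL term with a Hebbian term. The slides do not give the Hebbian term or the mixing formula, so the linear blend and the form $y_j^+ (x_i^+ - w_{ij})$ used here are assumptions, as are the function name and the example values.

```python
import numpy as np

def mixed_dw(w, x_minus, y_minus, x_plus, y_plus, hebb=0.01, lr=0.01):
    """Blend Hebbian and CHL weight changes; hebb=0 is pure error correction."""
    # Error-driven (CHL) term: x_i^+ y_j^+ - x_i^- y_j^-
    err = np.outer(x_plus, y_plus) - np.outer(x_minus, y_minus)
    # Hebbian term (assumed form): y_j^+ * (x_i^+ - w_ij)
    heb = y_plus[np.newaxis, :] * (x_plus[:, np.newaxis] - w)
    return lr * (hebb * heb + (1.0 - hebb) * err)

# Hypothetical activations and weights for 4 inputs and 3 hidden units.
rng = np.random.default_rng(1)
w = rng.random((4, 3))
x_minus, x_plus = rng.random(4), rng.random(4)
y_minus, y_plus = rng.random(3), rng.random(3)

dw_pure_err = mixed_dw(w, x_minus, y_minus, x_plus, y_plus, hebb=0.0)
dw_pure_hebb = mixed_dw(w, x_minus, y_minus, x_plus, y_plus, hebb=1.0)
dw_mixed = mixed_dw(w, x_minus, y_minus, x_plus, y_plus, hebb=0.01)
```

With this blend, a small hebb value leaves error correction dominant while still nudging the weights toward input correlations.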
Check the + and - learning phases.
Generalization requires good internal representations = strong correlations; error correction by itself does not lead to sufficiently strong internal representations, only Hebb + kWTA will do it.

Generalization: plots
Black line = cnt_err, training data error.
Red = unq_pats, how many input lines are uniquely represented by the hidden layer (max=10).
Blue = gen_Cnt, evaluates generalization for 10 new lines; 8 errors at the end.
Weights appear random.
Generalization based only on error correction is poor, since there are no conditions forcing good internal representations.
Batch Run repeats 5 times (slow).

Generalization: error correction + Hebb
Fast convergence; good internal representations emerge.

Generalization
How do we deal with things we have never seen before? This happens every time we enter a classroom, at every meeting, with every sentence we hear, etc.
We always encounter new situations, and we generalize to them reasonably.
How do we do this?

Good representations
Internal distributed representations.
New concepts are combinations of existing properties.
Hebbian learning + competition based on inhibition constrain error correction so as to create good representations.

Generalization in attractor networks
The GeneRec rule by itself does not lead to good generalization.
Simulations: model_and_task.proj.gz, Chapt. 6
The hebb parameter controls how much CHL and how much Hebb.
Pure_err realizes only CHL; check the - and + phases.
Compare internal representations for different types of learning.

Deep networks
To learn difficult problems, many transformations are necessary, strongly changing the representation of the problem.
Error signals become weak and learning is difficult.
We must add constraints and self-organizing learning.
Analogy: balancing several connected sticks is difficult, but adding self-organizing learning between the fragments simplifies this significantly, like adding a gyroscope to each element.

Sequential learning
Besides object and relationship recognition and task execution, sequential learning is important, e.g. the sequence of words in sentences:
The dog bit the man.
The man bit the dog.
The child lifted up the toy.
I drove through the intersection because the car on the right was just approaching.
The meaning of words, gestures and behaviors depends on the sequence, the context.
Time plays a fundamental role: the consequences of the appearance of image X may be visible only with a delay, e.g. the consequences of the position of pieces during a game are evident only after a few turns.
Network models react immediately; how do brains do this?

Family tree
Example simulation: family_trees.proj.gz, Chapt. 6.4.1
24 people = agent; relations: husband, wife, son, daughter, father, mother, brother, sister, aunt, uncle, cousin.
Generalization needs to find relations between people.
What is still missing? Temporal and sequential relationships!

Family tree
How to learn family relations? Enter all relations according to the family tree.
We need 40 epochs to learn, but BP needs 80.
Hebbian learning alone is not sufficient.
Init_cluster + Cluster run.

Sequential learning
Cluster plot showing the representation of hidden layer neurons a) before learning and b) after learning using a combined Hebbian and error-correction method.
The trained network has two branches corresponding to the two families.

Sequential learning
Categories of temporal relationships:
Sequences with a given structure
Delayed in time
Continuous trajectories
The context is represented in the frontal lobes of the cortex; it should affect the hidden layer.
We need recurrent networks, which can hold onto context information for a period of time.
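A minimal sketch of one way a recurrent network can hold onto context: a context layer stores a copy of the previous hidden state and feeds it back into the hidden layer at the next step. The class name, layer sizes, random untrained weights and one-hot word codes below are illustrative assumptions; the sketch only shows that the same input produces a different hidden state in a different context.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class SimpleRecurrentNet:
    """Forward pass only; the context layer is a copy of the last hidden state."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.w_in = rng.normal(0.0, 0.5, (n_hidden, n_in))
        self.w_ctx = rng.normal(0.0, 0.5, (n_hidden, n_hidden))
        self.w_out = rng.normal(0.0, 0.5, (n_out, n_hidden))
        self.context = np.zeros(n_hidden)

    def reset(self):
        self.context = np.zeros_like(self.context)

    def step(self, x):
        # The hidden layer sees the current input and the stored context.
        hidden = sigmoid(self.w_in @ x + self.w_ctx @ self.context)
        output = sigmoid(self.w_out @ hidden)
        self.context = hidden.copy()   # copy hidden state for the next step
        return output, hidden

# Hypothetical one-hot codes for three words.
words = {"dog": np.array([1.0, 0.0, 0.0]),
         "bit": np.array([0.0, 1.0, 0.0]),
         "man": np.array([0.0, 0.0, 1.0])}

net = SimpleRecurrentNet(n_in=3, n_hidden=5, n_out=3)

net.reset()
net.step(words["dog"])
_, hidden_bit_after_dog = net.step(words["bit"])

net.reset()
net.step(words["man"])
_, hidden_bit_after_man = net.step(words["bit"])
```

Here the word "bit" is presented twice with identical input activity, yet the hidden state differs because the stored context differs.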
Simple Recurrent Network (SRN), also called an Elman network: the context layer is a copy of the hidden layer.

Sequential learning
Biological justification for context representation:
The frontal lobes of the cortex are responsible for planning and performing temporally extended activities.
People with damaged frontal lobes have trouble performing the sequence of an activity even though they have no problem with its individual steps.
The frontal lobes are responsible for temporal representations.
For example, words such as "fly" or "pole" acquire their meaning from the context.
Context is a function of previously acquired information.
People with schizophrenia can use context directly preceding an ambiguous word but not context from a previous sentence.
Context representations not only lead to sequential behavior but are also necessary for understanding sequentially presented information such as speech.

Examples of sequential learning
Can we discover the rules of sequence creation?
Examples:
BTXSE
BPVPSE
BTSXXTVVE
BPTVPSE
BTXXTTVVE
Are these sequences acceptable?
TSXSE
VVSXE
BSSXSE
A finite-state machine stepping through successive transitions produces such sequences.
As studies have shown, people learn more quickly to recognize letter strings produced according to a specific pattern, even if they do not know the rules being used.

Network realization
The network randomly chooses one of two possible states.
Hidden/context neurons learn to recognize the states of the machine, not only the labels.
Behavior modeling: the same observations but different internal states => different decisions and next states.
Project fsa.proj.gz, chapt. 6.6.3

Temporal delay and reinforcement
The reward (reinforcement) often follows with a delay, e.g. when learning a game or behavioral strategies.
Idea: we have to foresee sufficiently early which events lead to a reward. This is done by the temporal differences algorithm (Temporal Differences, TD; Sutton).
Where does reward come from in the brain?
The midbrain dopaminergic system modulates the activity of the basal ganglia (BG) through the substantia nigra (SN), and of the frontal cortex through the ventral tegmental area (VTA).
It is a rather complicated system, whose actions are related to the evaluation of stimuli/actions from the point of view of value and reward.

Temporal delay and reinforcement
The ventral tegmental area (VTA) is part of the reward system.
VTA neurons deliver the neurotransmitter dopamine (DA) to the frontal lobes and the basal ganglia, modulating learning in these areas, which are responsible for planning and action.
More advanced regions of the brain are responsible for producing this global learning signal.
Studies of patients with damage in the VTA area indicate its role in predicting reward and punishment.

Anticipation of reward and result
[Figure: anticipation of reward and reaction to the decision (Knutson et al., 2001)]

Basal ganglia BG
VTA neurons first learn to react to a reward and then to predict its appearance ahead of time.

Formulation sketch: TD algorithm
We need to determine a value function, the sum over all future rewards, where rewards further away in time are less important (discount factor $0 \le \gamma \le 1$):
$V(t) = \langle \gamma^0 r(t) + \gamma^1 r(t+1) + \gamma^2 r(t+2) + \dots \rangle$
The adaptive critic AC learns how to estimate the value function $V(t)$.
At every point in time, the AC tries to predict the value of the reward. This can be done recursively:
$\hat{V}(t) = r(t) + \gamma \hat{V}(t+1)$
Error of the predicted reward:
$\delta(t) = \left( r(t) + \gamma \hat{V}(t+1) \right) - \hat{V}(t)$
The network tries to reduce this error.
The name of the algorithm, TD (temporal difference), refers to the error in the estimate of the value function over a period of time.

Network implementation
Prediction of activity and error.
Conditioned stimulus CS for t=2
Unconditioned stimulus (reward) US for t=16
rl_cond.proj.gz
Initially there is a large error for Time=16 because the reward r(16) is unexpected.
Adaptive critic AC

Two-phase implementation
(Phase +) computes the expected size of the reward at time t+1 (value r).
(Phase -) in step t-k predicts t-k+1, and at the end r(t_k).
The value function V(t+1) from the + phase is carried over to the value V(t) in the - phase.
[Equation relating $\hat{V}$ in the two phases, not recoverable from the slide.]
CS for t=2
US for t=16
Learning progresses backwards in time, affecting the value of the previous step.

Two-phase implementation
The system learns that the stimulus (a tone) predicts the reward.
Input CSC (Complete Serial Compound) uses unique elements for each stimulus at each point in time.
Chapt. 6.7.3, proj. rl_cond.proj.gz
This is not a very realistic model of classical conditioning.
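The conditioning setup described in these slides (CS at t=2, reward at t=16, one unique input element per time step after the stimulus) can be sketched with a tabular temporal-difference learner. This is a simplified illustration, not the two-phase network of rl_cond.proj.gz: the standard TD error with a discount factor gamma, the learning rate, the number of trials and all function names are assumptions.

```python
import numpy as np

def value(w, t, cs_time):
    """Predicted value at time t; before CS onset no input element is active."""
    return w[t - cs_time] if t >= cs_time else 0.0

def run_td_conditioning(n_steps=20, cs_time=2, us_time=16,
                        gamma=0.9, lr=0.5, n_trials=200):
    # One weight per serial-compound element: a unique unit for each
    # time step from CS onset to the end of the trial.
    w = np.zeros(n_steps - cs_time)
    history = []
    for _ in range(n_trials):
        deltas = np.zeros(n_steps)
        for t in range(n_steps):
            v_t = value(w, t, cs_time)
            v_next = value(w, t + 1, cs_time) if t + 1 < n_steps else 0.0
            r = 1.0 if t == us_time else 0.0
            delta = r + gamma * v_next - v_t      # TD error at time t
            if t >= cs_time:
                w[t - cs_time] += lr * delta      # adjust the active element
            deltas[t] = delta
        history.append(deltas)
    return w, np.array(history)

weights, td_errors = run_td_conditioning()
first_trial = td_errors[0]    # error only at the unexpected reward
last_trial = td_errors[-1]    # error has moved back to the CS onset
```

On the first trial the only nonzero error is at the reward time; with repeated trials the prediction spreads backwards one step at a time until the error appears at the transition into the CS and vanishes at the reward.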